After Getting Used to Voice Input, I Never Want to Type Again
April 4, 2026 · 2394 words · 5 min read · #Fun Series
Let the voice-first era begin. This post covers: why I started using voice input, why I chose Typeless, and the positive feedback it’s brought me.
My first encounter with voice input was in 2024, when WeChat’s keyboard added a voice-to-text function. At that point it was fairly basic — you spoke, it transcribed, word for word. The main requirement back then was accuracy.
But I found the use cases limited. Mostly I’d use it when typing wasn’t convenient, or with people I knew well enough that the informality felt okay. Using it freely felt off — I worried about how it would read on the other end.
Sometime around last year, as AI Agents took off, voice input tools quietly exploded. By now there are dozens of voice-related products on X alone. For coding and writing, voice input, "speaking your output," is a remarkably convenient input method. You don't type anything. You just express a thought.
I mentioned in my previous post that one of my most significant recent changes is going deep with voice input. There are plenty of products out there; I won’t review them all. Today I want to focus on three things: what voice input actually is, why I use it, and the positive feedback it’s generated for me.
Part One: What Is Voice Input?
(A note: in 2024 I picked up a bad habit of not including images in my posts. Text can describe information fairly precisely, and a quick search finds any product. So I’m keeping this image-light.)
Voice input is the simplest thing in the world: open the app, tap the button, start talking. Most of us have used it — WeChat’s voice-to-text is basically everyone’s first experience. Consider that the earliest prototype. What exists now is far more sophisticated.
Both phones and computers have many such products. I use iOS, and I’ve tried WeChat input, Doubao input, Shandian Shuo, and currently use Typeless. Setting other factors aside, Doubao’s accuracy is noticeably better than WeChat’s — despite multiple requests from our group chat, WeChat’s accuracy is still mediocre.
Voice input is faster than typing. Whether you use Shuangpin, Wubi, or standard Pinyin, voice input is quicker — you’re just talking normally, and it automatically adds punctuation. (Somewhat embarrassingly, I still haven’t gotten used to Shuangpin.)
Voice input also pushes you to clean up speech habits. Even though the mode is spoken, the output is written text — so you start noticing things. Verbal tics like “and then, and then, and then” look terrible in print. So before you start speaking, you pause for a few seconds to think about what you actually want to say.
For long text replies, voice-to-text is actually easier to revise than typed text. That’s counterintuitive — typed text seems quicker to edit — but in practice, especially for journaling, I’ve found it’s faster to revise a voice transcription.
Voice input does have some persistent weaknesses. Homophones are hard: distinguishing 他/她/它 (“he/she/it”) is tricky, and 的/得/地 (different grammatical particles that sound the same) are a constant challenge. Mixed Chinese-English is difficult, too. English words need to be pronounced fairly accurately; Chinese phonetic spellings (pinyin) need manual correction afterward. (That was as of 2024.)
Then I found Typeless. I started using it deeply around January 2026 — set it as my default input method. The core advantage: you don’t set anything up. Open it, start talking. When you finish, it lightly processes the content — removing filler words and errors.
The real magic of voice input, I discovered, is on desktop. Normally on a Mac, voice input is just a mode within the keyboard — one option among many. Typeless is different. It’s a standalone app. Once authorized, you call it in any window with a keyboard shortcut. No switching inputs. No interruption to your current text flow. You can even type and speak simultaneously.
That experience is something else. And Typeless does two things better than anything else: automatic filler-word removal, and real-time error correction with reconstruction. Filler-word removal keeps your output clean from the start. Reconstruction is the more interesting one. When you're speaking, thoughts move fast. If I realize mid-sentence that I said something wrong, I can just tell Typeless: "I said something wrong earlier, please fix it." It doesn't transcribe that instruction as text. It reads it as a command and edits the prior content accordingly.
That’s the core reason AI makes voice input genuinely good.
It also has some drawbacks. Network dependency: it's a foreign app, so there's noticeable latency, typically 1–2 seconds of processing after you stop speaking. And the price is steep. Early on, the free tier was 4,000 characters per week. To stretch my quota, I ran two accounts on separate devices (phone and computer), since they don't share data, for an effective 8,000 characters per week. Recently the limit was raised to 8,000 per account, so I consolidated. But 8,000 still isn't enough: by Tuesday of a given week I'd already spoken 6,000+ characters.
So I subscribed. List price is ¥998/year; with a student ID or certain other paths you can get 50% off. The offer I received was valid for 8 years. I paid a little over ¥400 for a year. For what it does to my output capacity, I think it’s extremely good value.
That’s the overview of voice input and Typeless. Since Typeless is my core tool, I’ll leave it at that rather than comparing everything on the market. Different things work for different people.
Part Two: Why I Use Voice Input
The most immediate reason: voice input makes my output dramatically more efficient. I just talk. I don’t have to think about whether my phrasing is correct or whether my idea holds up in real time — the goal is to express the thought, not to produce a polished sentence.
When I type, I can get stuck: a word won’t come, I second-guess an idea, I run slow when I have a lot to say. That friction is hard for me to accept. With voice I don’t stop. My attention doesn’t break. I don’t have to self-edit in the moment, don’t have to make every sentence beautiful. The goal is to get the content down — to make the thought exist somewhere outside my head.
A concrete example: writing a journal. Before, I’d plan the outline, think through the content, second-guess my structure. Now I open the journal app and Typeless, and I just talk from start to finish. When I stop, I often have a thousand words already. I’ve captured the full sweep of the day — memories, reflections, everything — quickly. Typeless handles the light cleanup; the output is maybe 90% what I’d want anyway.
Another example: describing requirements while coding. I don't need to organize everything carefully in advance. As long as I know the prompt structure (background, goals, constraints, standards), I can just walk through it verbally. Open the coding tool; it generates the file structure and starts building. I come back to adjust: wrong style, wrong logic. The same voice-based description specifies the fix. The moment I switch to typing, the pace drops. Efficiency is core to why I use this.
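To make that concrete, a spoken requirement might sound roughly like this (a made-up example, not an actual transcript): "Background: the blog's archive page is getting slow. Goal: paginate it to twenty posts per page. Constraint: don't change the existing URLs. Standard: done means the page loads fast and every old link still resolves."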
Second reason: my Mandarin pronunciation and some English words aren’t perfectly standard, but I want accurate output. Typeless helps with correction. Claude Code is a constant example — it’s easy to transcribe as “Cloud Code.” Some English words are just long: a tool called “Antigravity” is hard to type accurately and even harder to memorize the spelling of. With voice, I speak roughly what I mean; the app corrects it to the right word and right formatting. That’s an experience typed input can’t replicate.
Third reason: voice input has an “input history” feature — everything I say gets logged.
In Typeless, this data is stored locally. I wrote a Skills script to read and parse the local data, then generate content from it. The skill is published now, if anyone wants to try it. The value: excellent corpus accumulation. Traditional typing leaves content scattered across chat logs and specific documents. Voice input is more like a continuous recording tool. Not every app offers this, but the one I use captures everything I say — a word, a sentence, whatever — as a running log.
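The parsing core of that skill is small. Here is a minimal Python sketch of the idea; note that the storage path and the field names ("timestamp", "text") are assumptions for illustration, since Typeless doesn't document its local format and you'd need to inspect your own installation:

```python
import json
from pathlib import Path

# Hypothetical location -- Typeless does not document its local storage,
# so check your own installation for the real path and format.
HISTORY_FILE = Path.home() / "Library/Application Support/Typeless/history.jsonl"

def load_entries(path: Path) -> list[dict]:
    """Parse one JSON record per line into a list of dicts."""
    entries = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                entries.append(json.loads(line))
    return entries

def to_corpus(entries: list[dict]) -> str:
    """Join transcripts into one plain-text corpus, oldest first.
    The 'timestamp' and 'text' keys are assumed field names."""
    ordered = sorted(entries, key=lambda e: e.get("timestamp", ""))
    return "\n\n".join(e["text"] for e in ordered if e.get("text"))

if __name__ == "__main__":
    entries = load_entries(HISTORY_FILE)
    Path("corpus.txt").write_text(to_corpus(entries), encoding="utf-8")
    print(f"Exported {len(entries)} entries to corpus.txt")
```

If your install keeps history in SQLite or some other format instead, only load_entries needs to change; the rest of the pipeline stays the same.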
The exported corpus is extremely valuable to me. In the AI era, I care more and more about accumulating raw personal expression. Feed it to AI and it helps me understand what I used to say. My past experiences, thoughts, and ideas get surfaced again. Very interesting.
Voice input isn’t just a faster way to input. It amplifies my natural thinking. It’s closer to how I actually am, closer to my real expression. It doesn’t bind me to a structure when I’m recording. It doesn’t make me worry about rhetoric. It captures the scattered thoughts first — turns them into raw material. Then that raw material naturally becomes a record. Later I’ll convert these records into reusable, personal data assets.
Part Three: What Positive Feedback Has Voice Input Given Me?
One of the most important sources of positive feedback is actually AI. It can show me the patterns and problems in how I express myself when speaking or thinking. This depends on the content I’ve logged — I feed it to AI for analysis.
I exported my Typeless records. Only about 300 entries from this account — incomplete, because early on I hadn’t set it to save full history, and this is only the home computer; the more important data is split across my phone and work computer. But I gave the 300-entry export to ChatGPT and asked it to analyze.
I asked several questions: What’s my logic of expression? What are my core recurring themes? What suggestions do you have for improvement? What are the problems with how I express myself?
Reading the report, I found it matched my actual situation at probably 90%+.
ChatGPT said:
1. Your expression has a few stable characteristics. First, you’re naturally goal-oriented — your output is rarely for catharsis. It’s to advance something. Even casual remarks land on how to do it, where to put it, how to use it, how to make it more efficient. Second, you care about conclusion first, key points prominent. You have a native resistance to redundancy and performative structure. You keep saying: reduce ordered lists, use paragraphs to make the point whole. Third, you calibrate as you go — the moment you sense something is off, you correct it immediately. This suggests your judgment isn’t slow; in fact it’s fast, often faster than your words.
2. This also shows your expressive strengths clearly. Strong abstraction: you naturally elevate small life or work problems to principle-level. Strong calibration: you usually know specifically what “wrong” means, not just a vague dissatisfaction. Sensitivity to usefulness: you have a sharp sense of what’s worth keeping long-term and what’s just noise. These capacities are actually more visible in voice input than in writing, because voice preserves original thinking rather than a polished version.
3. There are also things worth improving. The main one: you often say more about your thinking process than about the final instruction. The upside is authenticity; the downside is that collaborators may catch the branches and miss the trunk. A small adjustment would help: lead with the conclusion, then the standard, then the background. Say what you want first. Then what “done” looks like. Then why you care. This won’t lose any of your thinking density — it makes it easier for others to enter your frame.
4. Also: you can start crystallizing your high-frequency value judgments into a stable vocabulary. Things like: accuracy first, conclusion-led, reduce redundancy, real pain point, sustainable, economic impact, social impact. The more stable these become, the more your expression sounds like you — and the more easily it’s inherited by AI and memory systems.
The positive feedback from voice input has gone well beyond “I can talk faster now.” It’s brought me closer to myself. It’s made it easier to see myself. It’s preserved my original judgments and revealed what I actually care about. What comes next is making that rawness converge earlier and become more reusable. Then expression isn’t just output — it gradually becomes my own methodology.
From this angle, voice input really isn't just an input tool anymore. It improves my efficiency. It helps me accumulate a corpus, calibrate my expression, and review my thinking. In the AI era, the significance of this keeps growing. Because every raw expression you leave behind may eventually become part of understanding you, assisting you, and inheriting your way of thinking.
A note: Typeless doesn't offer a built-in history export yet. On desktop I work around it with the Skill I wrote; on mobile there's no workaround at all. I submitted feedback, and they said they'd consider it. I'm looking forward to that feature: once it exists, I can merge my data across all devices.
A few habit shifts too.
When you type, you don’t really hear yourself. Voice input means you listen back — is my pronunciation clear? Is my Mandarin standard enough? (It isn’t, but I try.)
To get better accuracy, I’ve learned to make my expression match the context more precisely. The more coherent my spoken input, the less correction I need afterward. That’s something you never feel when typing. So every voice input session is, in a small way, a training session for how you express yourself.
I now genuinely believe: voice input is one of the most powerful tools of the AI era. It makes me more willing to express, and more able to. More importantly, it means what I say no longer just floats past and disappears — it gets captured, organized, understood, and used again.
P.S. I recently saw some discussion in a group about whether AI-assisted content “counts” as AI-generated. This post was produced through Typeless voice input and then lightly adjusted for structure — all content is spoken input. Does that count as AI-generated? I don’t know, and I don’t think it’s worth overthinking. What matters is that my ideas got expressed.
Author: DemoChen
Link: https://demochen.com/en/posts/20260404/
License: Unless otherwise stated, this work is licensed under CC BY-NC-ND 4.0. Please credit the original when sharing.
Support: If you found this helpful, feel free to become a Sponsor — grateful for the connection.