I’ve been thinking about transcription as a service.
I got this feedback:
“i would need that on-demand, when the motivation is there. scheduling a call would be a dealbreaker for me … fwiw i think using a purpose built AI for this is a phenomenal use case to allow scalability. basically just otter, but with some live feedback. I’m quite confident this is possible using the current state of open source technology”
So now I’ve been thinking about transcription as a product, which obviously exists, e.g., Otter and many others.
But I don’t think the existing solutions solve the use cases I have in mind.
One use case is transcription for writing.
Why use transcription for writing?
It’s more than 3x faster.
The average typing speed is around 40 words per minute. The average speaking speed is about 150 words per minute, which is roughly 3.75 times as fast.
It’s a more enjoyable writing experience.
Turn on your microphone and go for a walk or pace around the room. Talk out your thoughts as if you're having a conversation rather than having to sit down at a desk, hunched over your keyboard.
Beat writer’s block.
The blank page is daunting. Getting started feels like jumping over a hurdle. It’s easier to just start talking.
Anytime, anywhere.
If you’re in the car, you can turn on the microphone on your phone, put the phone in the cup holder, and talk while you drive. Turn on the microphone before you start driving, and don’t operate your phone while the car is moving.
Better writing.
As a writer myself, I do some of my best writing via transcription. It’s something about the stream of consciousness, less inhibition, writing the way you talk, less lag between thoughts. Your voice is a different creative channel than using your fingers to type.
AI assistance.
It’s helpful to talk out your ideas with someone else because they can ask questions and provide feedback. An AI can do this in real time as you’re transcribing.
A new way of writing?
Before computer keyboards and typing, we wrote with pens on paper.
Today, I would bet that most people write by typing, either on their computers or on their phones.
In the future, will people write by speaking?
Most of our devices have microphones.
Apple Watches have microphones.
You could be doing pretty much anything and just click the button on your watch and start writing.
Initial thoughts on UI
Blank, white web page
Red record button is the most obvious thing to click
Once you hit record, you can start speaking and your words will type out on the screen in real time, starting in the top left of the blank page
At the bottom of the page, buttons to pause or stop the recording
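To make the real-time typing concrete, here is a minimal sketch of the text buffer behind the page. It assumes the transcription engine sends interim guesses that it later replaces with a final version of each phrase, which is how many streaming recognizers behave but is not something I have confirmed for any specific tool. The names LiveTranscript, update, and render are hypothetical.

```python
from typing import List


class LiveTranscript:
    """Holds finalized phrases plus the current interim guess (hypothetical model)."""

    def __init__(self) -> None:
        self.final_segments: List[str] = []
        self.interim = ""

    def update(self, text: str, is_final: bool) -> None:
        """Record a result from the recognizer (assumed to flag interim vs. final)."""
        if is_final:
            self.final_segments.append(text.strip())
            self.interim = ""  # the interim guess is now superseded
        else:
            self.interim = text.strip()  # overwrite the previous guess

    def render(self) -> str:
        """Text to show on the page: finalized phrases, then the live guess."""
        parts = self.final_segments + ([self.interim] if self.interim else [])
        return " ".join(parts)


page = LiveTranscript()
page.update("talking is", is_final=False)
after_interim = page.render()
page.update("Talking is faster.", is_final=True)
page.update("it is", is_final=False)
after_next = page.render()
```

The page would call render() after every update, so the last few words may change on screen until the recognizer finalizes them.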
AI assistance
Two options for how the UI will look when the AI interacts with the user:
Commenting: more like how the UI looks when you add a comment on a Google Doc or Notion page
Conversation: more like how the UI looks for ChatGPT or text message conversations
The Commenting option might be better so that the AI can ask questions or give feedback as the user is still talking. A portion of the transcript will be highlighted and an arrow will point from the highlight to the comment to the side of the transcript.
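One way the Commenting option could be modeled is a comment that stores the start and end of the highlighted text. This is only a sketch: the Transcript and Comment classes and their method names are hypothetical, and it assumes the transcript only grows at the end while the user is speaking, so earlier highlights stay in place.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Comment:
    """An AI comment anchored to a highlighted span of the transcript."""
    start: int  # index of the first highlighted character
    end: int    # index one past the last highlighted character
    text: str   # the question or feedback shown beside the transcript


@dataclass
class Transcript:
    text: str = ""
    comments: List[Comment] = field(default_factory=list)

    def append(self, words: str) -> None:
        """Add newly transcribed words at the end of the transcript."""
        self.text += words

    def add_comment(self, phrase: str, note: str) -> Comment:
        """Highlight the most recent occurrence of `phrase` and attach `note`."""
        start = self.text.rfind(phrase)
        if start == -1:
            raise ValueError(f"phrase not found in transcript: {phrase!r}")
        comment = Comment(start, start + len(phrase), note)
        self.comments.append(comment)
        return comment

    def highlighted(self, comment: Comment) -> str:
        """Return the text the comment's arrow should point at."""
        return self.text[comment.start:comment.end]


t = Transcript()
t.append("I want to write about my first marathon. ")
c = t.add_comment("my first marathon", "What made you sign up?")
t.append("It was harder than I expected.")  # user keeps talking
still_highlighted = t.highlighted(c)
```

Because the comment stores positions rather than a copy of the text, the highlight stays attached to the same words while the user keeps talking.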
Choose a form of writing
Options:
Journal entry
Blog post
Meditation
Email to a client
Letter to a friend
The user might need to select a form of writing before they start recording, if the AI is going to interact proactively while the user is speaking. The AI needs to know the form of writing to know how to interact. For example, if the user is writing a journal entry, the AI can ask questions that encourage the user to go deeper into a personal issue.
A drop-down selection before you click the record button
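A rough sketch of how the drop-down choice could shape the AI's behavior: each form of writing maps to a line of guidance that gets passed to the AI before recording starts. Only the journal entry guidance comes from the notes above. The other four are placeholder assumptions, and FORM_GUIDANCE and build_instructions are hypothetical names.

```python
# Guidance per form of writing. Only the "Journal entry" wording is taken from
# the notes above; the rest are illustrative placeholders.
FORM_GUIDANCE = {
    "Journal entry": "Ask questions that encourage the user to go deeper into a personal issue.",
    "Blog post": "Ask what the main point is and who the reader is.",
    "Meditation": "Stay quiet unless the user pauses for a long time.",
    "Email to a client": "Check that the request and any deadline are stated clearly.",
    "Letter to a friend": "Suggest personal details or memories worth mentioning.",
}


def build_instructions(form: str) -> str:
    """Turn the drop-down selection into instructions for the AI."""
    try:
        guidance = FORM_GUIDANCE[form]
    except KeyError:
        raise ValueError(f"unknown form of writing: {form!r}") from None
    return f"Form of writing: {form}. {guidance}"


instructions = build_instructions("Journal entry")
```

The keys of FORM_GUIDANCE could double as the drop-down options, so adding a new form of writing would be a one-line change.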
Editing
Should the AI edit the transcript in real time as the user is still speaking? Or wait until the full transcript is finished and then edit?
When the edits are made, the edited portion of the transcript will be highlighted and an arrow will point from the highlight to a comment to the side of the transcript that shows a log of the original version and the edited version. You can choose to accept or reject each edit.
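The accept-or-reject log could be sketched like this. It is an assumption-heavy sketch, not a design decision: Edit, make_edit, and apply_edits are hypothetical names, edits are assumed not to overlap, and undecided edits are treated the same as rejected ones.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Edit:
    start: int                       # where the original text begins
    end: int                         # one past where the original text ends
    original: str                    # what the user actually said
    edited: str                      # what the AI proposes instead
    accepted: Optional[bool] = None  # None means the user has not decided yet


def make_edit(text: str, original: str, edited: str) -> Edit:
    """Locate the first occurrence of `original` and propose `edited` for it."""
    start = text.index(original)
    return Edit(start, start + len(original), original, edited)


def apply_edits(text: str, edits: List[Edit]) -> str:
    """Return the transcript with only the accepted edits applied.

    Assumes edits do not overlap. Applying from the end of the text backwards
    keeps the offsets of earlier edits valid.
    """
    result = text
    for e in sorted(edits, key=lambda e: e.start, reverse=True):
        if text[e.start:e.end] != e.original:
            raise ValueError("edit does not match the transcript")
        if e.accepted:
            result = result[:e.start] + e.edited + result[e.end:]
    return result


raw = "i went too the store and buyed milk"
e1 = make_edit(raw, "too", "to")
e2 = make_edit(raw, "buyed", "bought")
e1.accepted = True   # user accepts the first edit
e2.accepted = False  # user rejects the second edit
draft = apply_edits(raw, [e1, e2])
```

Keeping both versions in each Edit is what lets the side comment show the original next to the edited text, whether the edits arrive in real time or after the recording stops.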
Sending final draft
Once the transcript is recorded and the editing is done, the final draft will be sent to the user.
Should we require an email address in order to access the final draft?
Options for sending:
Email to yourself
Copy and paste
Download as DOC/PDF/TXT
Name ideas
Talk It Out
Write With Your Voice
Write With Voice
Talk To Writer
Talk To Blog
Speech To Blog
Next steps
Figure out how to build and integrate these four parts:
Record button that requests access to the user’s microphone
Open-source transcription software that will turn audio into text in real time
Open-source AI that will comment on the transcript
Open-source AI that will edit the transcript
Test v1 on one of my existing websites
Buy a domain and launch on a new site
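Here is a rough sketch of how the four parts listed above might plug together. Everything in it is a stand-in: the fake_ functions only mimic a transcription model, a commenting AI, and an editing AI, the "audio" chunks are plain text bytes, and run_session is a hypothetical name. It also assumes editing happens once after the recording stops, which is just one of the two options raised under Editing.

```python
from typing import Callable, Iterable, List, Tuple

Transcriber = Callable[[bytes], str]    # audio chunk -> text
Commenter = Callable[[str], List[str]]  # transcript so far -> new comments
Editor = Callable[[str], str]           # full transcript -> edited draft


def run_session(
    chunks: Iterable[bytes],
    transcribe: Transcriber,
    comment: Commenter,
    edit: Editor,
) -> Tuple[str, List[str], str]:
    """Feed recorded chunks through transcription, commenting, and editing."""
    transcript = ""
    comments: List[str] = []
    for chunk in chunks:                       # part 1: audio from the record button
        transcript += transcribe(chunk)        # part 2: audio to text
        comments.extend(comment(transcript))   # part 3: AI comments while recording
    return transcript, comments, edit(transcript)  # part 4: AI edits at the end


def fake_transcribe(chunk: bytes) -> str:
    return chunk.decode("utf-8")  # stand-in: pretend the bytes are speech


def fake_comment(transcript: str) -> List[str]:
    return [f"{len(transcript.split())} words so far"]  # stand-in feedback


def fake_edit(transcript: str) -> str:
    text = transcript.strip()
    return text[:1].upper() + text[1:]  # stand-in: capitalize the first letter


transcript, comments, final_draft = run_session(
    [b"talking is ", b"faster than typing."],
    fake_transcribe, fake_comment, fake_edit,
)
```

Passing the three AI pieces in as plain functions means each one can be swapped for a real open-source model later without changing the rest of the flow.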
Can you build this?
Let me know if you can!
I’d appreciate any feedback on the next steps I’ve outlined above.
I don’t think I want to fully hand over the building because I want to gain an understanding of how it works.
That being said, software development is not my strong suit, so I’d appreciate any help I can get, especially if you have experience with transcription and AI chatbot software.
https://audiopen.ai/ is very similar. It was built by a good friend of mine.