4 Comments
User's avatar
lake's avatar

interesting idea. i am pretty sold on the concept of a transcription assistant that will also help point out inconsistent logic and/or ask questions to help extract the essence of the idea. but i would need that on-demand, when the motivation is there. scheduling a call would be a dealbreaker for me

fwiw i think using a purpose built AI for this is a phenomenal use case to allow scalability. basically just otter, but with some live feedback. I’m quite confident this is possible using the current state of open source technology

as an aside, i’ve been doing transcription based notes for a while at your recommendation. i will talk it out for two or three sessions, then use the transcription of the last one as my outline for writing, since it’s normally much more cohesive

Expand full comment
Cole Feldman's avatar

Great feedback re: on-demand vs. scheduling a call. You might want to check out Pi. I think the website is just pi.ai. It's an AI chatbot (like ChatGPT) but more conversational. You can even turn on the speaker mode and talk to it using your microphone and it will talk back through your speakers. There's an app for your phone. Further, you could probably prompt it with something like: "Please point out the inconsistent logic in what I'm about to say." Or, "Please ask intelligent follow-up questions to help me explore this idea."

I really like what you're thinking in terms of a purpose-built AI. That would make it way more scaleable, you're right. Basically just a fully software product instead of a service that requires human attention. Assuming there's open source transcription software, that would get us to the point of transcribing a users's audio into words. Then these would be the remaining steps that we would need to build:

1. Prompting the AI to (a) ask follow-up questions and/or (b) give feedback, e.g., pointing out inconsistent logic.

2. Prompting the AI to edit the transcript after the "call" is complete, e.g., formatting errors, spelling, and grammar (no changes to the originally spoken words).

I'm imaging the user interface as a website with a blank page and a record button at the bottom. Once you hit record, you can start speaking and your words will type out in front of you. When you pause for longer than 5 seconds, the AI will ask a question or give feedback. When you're done, you click a button next to the record button that says "Done." There's also an option to pause in the middle of recording and then come back and restart. But after you click "Done," the transcript will be edited. You can see the editing taking place in real time. Maybe we should show a log of the edits with pop-up comments next to each edit that show the (a) original transcription and (b) edited transcription. You can choose to accept or deny each edit. Once the editing is done, you can copy/paste, download as DOC/PDF/TXT, or email to yourself.

How can we easily build this?

I know OpenAI announced GPTs, which let you customize ChatGPT for a specific purpose: https://openai.com/blog/introducing-gpts

We could probably customize a GPT to ask questions and give feedback. We could also customize it to put the full conversation in one transcript and edit it. But I don't think it will help with the audio part. Like you could do our full use case if you were just typing. But I think the whole point is for the user to be able to talk. It should feel like you're having a conversation with a really good listener.

How do we build that audio transcription part?

We need an open-source transcription software. Something that will hear the audio and turn it into words in real time. We also want the words to type out on the page of the website for the user to see in real time.

So maybe the audio transcription software is separate from the software that asks questions, gives feedback, and edits the transcript?

Lake - I know you've done some work with the open-source AI stuff. Would appreciate your guidance here.

Love that you've been doing transcription based notes to outline your writing. That would be a great use case too. Actually, I think that might be the best use case to start with. Because "asking questions and giving feedback" is broad, but it's more specific if the focus is to transcribe audio with the goal of creating a draft for writing.

Expand full comment
lake's avatar

isn’t that just what pi.ai does?

Expand full comment
Cole Feldman's avatar

Yea, pretty much, but with a few things missing:

- You need to prompt Pi to interact with the user in the way we've described above. And it would be better if you're specific ... Ask me follow-up questions with the goal of letting me talk. Point out logical inconsistencies in what I'm about to say.

- The UI isn't great for exporting the conversation. For example, for the writing draft use case, you would have to copy and paste the conversation and then re-format it. Actually, now that I think about it, I wonder if you could just prompt the AI: "Now write a 500-word blog post based on everything we just talked about."

- I don't think Pi is designed for the user to be doing the majority of the talking. Like if you talk for 5 minutes straight, is the Pi UI can capture every word you say and transcribe it and put it on the screen?

- Even if Pi does transcribe your 5-minute monologue, the transcription won't be perfect, which is where the editing feature becomes valuable.

Expand full comment