- AI for Business Newsletter
Transcribe audio recordings privately in any language
Meeting note generation is one of AI’s superpowers. But can you trust an AI start-up with your most sensitive content?

Who can you trust?
The adoption of audio and video recording transcription services like Otter, Fireflies and Fathom is growing strongly, particularly among salespeople and business development executives.
However, it’s not clear what level of privacy these services provide.
If you are a business leader, it may make sense to limit the number of cloud infrastructure companies that have access to your most sensitive data. For example, if you already trust Google Cloud with some of your data, it would be logical to use them for meeting transcription and summarization as well.
While it’s not as convenient as using a specialized service, it’s fairly easy to set up. So, let’s review how you can use Google Cloud to transcribe meeting recordings and save the transcripts and summaries in your own environment.
The meeting summarization agent
The meeting summarization agent runs in the background. It monitors the location where meeting recordings are stored (e.g., a Google Drive folder, a Dropbox folder, an Airtable base, or another storage location).
Whenever a new recording file is added to that location, the agent performs the following tasks:
Uploads the recording (.mp3) file to Google Cloud Storage.
Transcribes the meeting verbatim using Google Cloud’s Speech-to-Text API v1 with diarization, which recognizes different speakers.
Creates a summary of decisions and next steps with Gemini.
Stores the meeting summary, while deleting the recording and transcript for privacy reasons.
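The four tasks above can be sketched as a small pipeline. This is a minimal illustration, not the notebook's code: it assumes recordings land in a local folder, and the function names (find_new_recordings, process_recording) and the injected upload, transcribe, summarize and delete_remote callables are hypothetical placeholders for the steps described later in this post.

```python
from pathlib import Path
from typing import Callable, List, Set


def find_new_recordings(folder: Path, seen: Set[str]) -> List[Path]:
    """Return .mp3 files in `folder` whose names are not in `seen`."""
    return sorted(p for p in folder.glob("*.mp3") if p.name not in seen)


def process_recording(
    recording: Path,
    upload: Callable[[Path], str],
    transcribe: Callable[[str], str],
    summarize: Callable[[str], str],
    delete_remote: Callable[[str], None],
    summary_dir: Path,
) -> Path:
    """Run the four tasks for one recording and return the summary path."""
    uri = upload(recording)          # 1. copy the recording to cloud storage
    transcript = transcribe(uri)     # 2. verbatim transcript with speakers
    summary = summarize(transcript)  # 3. decisions and next steps
    summary_dir.mkdir(parents=True, exist_ok=True)
    summary_path = summary_dir / f"{recording.stem}.md"
    summary_path.write_text(summary, encoding="utf-8")  # 4a. keep the summary
    delete_remote(uri)               # 4b. remove the uploaded copy
    recording.unlink()               # 4c. remove the local recording
    # The transcript is only held in memory here, so it is never stored.
    return summary_path
```

Passing the steps in as callables keeps the privacy-relevant part (what is kept and what is deleted) in one place, and lets you swap storage or model providers without touching it.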

Simple implementation steps
Here are the steps to follow to get the agent up and running quickly and easily.
Prerequisite: set up Google Cloud
First, you need to create a Google Cloud project and a service account with the appropriate permissions:
Visit Google Cloud to set up a project and billing.
Visit Google Storage to create a storage bucket. You can choose any name and any region. The default settings are fine. Write down the unique name of the bucket, which will be accessible via the https://storage.cloud.google.com/{bucket_name} URL.
Visit Google Speech-to-Text to select the project and enable the 'Cloud Speech-to-Text API'.
In the sidebar on the left, under IAM & Admin, select Service Accounts and click on Create Service Account. Choose any name, select the Cloud Speech Administrator and Storage Object User roles, and click Done. If you are reusing an existing service account, go to IAM instead to add these roles to the service account.
Then, under the newly created service account, go to the Keys tab and click on “Add Key” to create a JSON file containing the service account credentials. Download the JSON file to your computer for future use.
Finally, create a Google Gemini API key by visiting the AI Studio.
Step 1: Upload to cloud storage
As a first step, the agent must upload any new recording to Google Cloud Storage.
Visit this Colab notebook for the full demo in Python.
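For readers who want a feel for the upload step before opening the notebook, here is a minimal sketch. It assumes the google-cloud-storage Python package and the service account JSON file downloaded in the prerequisites; the function names gcs_uri and upload_recording are illustrative and may differ from what the notebook uses.

```python
from pathlib import Path


def gcs_uri(bucket_name: str, blob_name: str) -> str:
    """Build the gs:// URI used to reference the uploaded file."""
    return f"gs://{bucket_name}/{blob_name}"


def upload_recording(local_path: str, bucket_name: str, key_file: str) -> str:
    """Upload one recording to the bucket and return its gs:// URI."""
    # Imported lazily so gcs_uri() works without the library installed.
    from google.cloud import storage

    client = storage.Client.from_service_account_json(key_file)
    blob_name = Path(local_path).name  # keep the original file name
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.upload_from_filename(local_path)
    return gcs_uri(bucket_name, blob_name)
```

A call might look like upload_recording("call.mp3", "my-meeting-bucket", "service-account.json"), where all three values are placeholders for your own file, bucket and key.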
Step 2: Transcribe
In this step, the agent calls Google Cloud’s Speech-to-Text API v1 with diarization. Each meeting transcription may take 15 minutes or longer.
Visit this Colab notebook for the full demo in Python (same notebook as above).
You can adapt the following script.
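As a rough sketch of what such a script might look like (not the notebook's code), the example below assumes the google-cloud-speech Python package and a recording already uploaded to your bucket. The defaults for encoding, sample rate and speaker count are assumptions to adjust; in particular, check that MP3 input is supported for your project on the v1 API, or convert recordings to FLAC first. The helper format_transcript is a hypothetical name for turning the per-word speaker tags into readable lines.

```python
from itertools import groupby
from typing import Iterable, List, Tuple


def transcribe_with_speakers(
    gcs_uri: str,
    key_file: str,
    language_code: str = "en-US",
    encoding_name: str = "MP3",      # assumption: verify v1 support for MP3
    sample_rate_hertz: int = 44100,  # assumption: match your recordings
    max_speakers: int = 6,           # assumption: upper bound on attendees
    timeout_s: int = 3600,           # long meetings can take 15+ minutes
) -> List[Tuple[int, str]]:
    """Return (speaker_tag, word) pairs for one recording in the bucket."""
    from google.cloud import speech  # lazy import: needs google-cloud-speech

    client = speech.SpeechClient.from_service_account_file(key_file)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding[encoding_name],
        sample_rate_hertz=sample_rate_hertz,
        language_code=language_code,
        diarization_config=speech.SpeakerDiarizationConfig(
            enable_speaker_diarization=True,
            min_speaker_count=2,
            max_speaker_count=max_speakers,
        ),
    )
    audio = speech.RecognitionAudio(uri=gcs_uri)
    operation = client.long_running_recognize(config=config, audio=audio)
    response = operation.result(timeout=timeout_s)  # blocks until finished
    if not response.results:
        return []
    # With diarization on, the last result lists every word with its tag.
    words = response.results[-1].alternatives[0].words
    return [(w.speaker_tag, w.word) for w in words]


def format_transcript(tagged_words: Iterable[Tuple[int, str]]) -> str:
    """Merge consecutive words from the same speaker into one line each."""
    lines = []
    for speaker, group in groupby(tagged_words, key=lambda pair: pair[0]):
        text = " ".join(word for _, word in group)
        lines.append(f"Speaker {speaker}: {text}")
    return "\n".join(lines)
```

The output of format_transcript is plain text with one line per speaker turn, which is a convenient input for the summarization step.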
Step 3: Summarize
In this step, the agent calls Google Cloud’s Gemini API with an affordable model such as Gemini 2.0 Flash, to request a list of decisions/agreements and a list of next steps.
Here is a summarization prompt that you can use with your favorite LLM API.
prompt = f"""Summarize the following text provided between the <text> tags, by generating a Markdown summary in the format provided between the <example> tags.
The Markdown summary consists of two bullet points, 'summary' and 'next steps', each with fewer than 10 sub-bullet points.
Under summary, please make sure to list the main agreements and decisions reached.
Under next steps, please make sure to list the agreed actions and next steps.
Bullet points must start with a star character.
Sub bullet points must be indented with 4 spaces followed by a * character.
Please do not include any other text in your response, other than the list of bullet points.
<text>
{transcript}
</text>
Here is an illustrative example of the output:
<example>
* Summary
    * The meeting participants agreed to pursue business relationships
* Next steps
    * The meeting participants agreed to meet again in two weeks.
    * They will revert back with names of potential team members within 1 week.
</example>
"""
Visit this Colab notebook for the full demo in Python (same notebook as above).
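To show how the prompt might be sent and the reply checked, here is a minimal sketch. It assumes the google-generativeai Python package and an API key stored in an environment variable named GEMINI_API_KEY (a hypothetical name); the notebook may use a different client or setup. The parse_summary helper is also hypothetical: it simply checks that the reply follows the two-level bullet format requested in the prompt.

```python
import os
from typing import Dict, List


def summarize_with_gemini(prompt: str, model_name: str = "gemini-2.0-flash") -> str:
    """Send the filled-in prompt to Gemini and return the Markdown reply."""
    import google.generativeai as genai  # lazy import: needs google-generativeai

    genai.configure(api_key=os.environ["GEMINI_API_KEY"])  # assumed variable name
    model = genai.GenerativeModel(model_name)
    response = model.generate_content(prompt)
    return response.text.strip()


def parse_summary(markdown: str) -> Dict[str, List[str]]:
    """Split the reply into {top-level bullet: [sub-bullets]}."""
    sections: Dict[str, List[str]] = {}
    current = None
    for line in markdown.splitlines():
        if line.startswith("* "):
            # Top-level bullet such as "* Summary" or "* Next steps".
            current = line[2:].strip()
            sections[current] = []
        elif line.startswith("    * ") and current is not None:
            # Sub-bullet indented with 4 spaces, as the prompt requires.
            sections[current].append(line.strip()[2:].strip())
    return sections
```

If parse_summary returns an empty dictionary, or one without the two expected sections, the reply did not follow the requested format and the agent could retry the request before storing anything.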
Takeaway messages
Having an AI meeting note taker at your beck and call is a fantastic time saver. However, if you feel unsure about trusting a 1-year-old startup with your most sensitive data, you can easily build that agent capability in-house using Google Cloud.