
Meet Buddy: Open Source AI Meeting Assistant That Doesn't Record Your Audio

We built an open source meeting transcription tool that captures Google Meet captions without audio recording, syncs to GitHub, and feeds into Claude Code for AI-powered analysis — no APIs, no SaaS subscriptions, no cloud dependency. Here's how and why.

Behind the Build · Axit @ Aumiqx · 18 min read
Tags: meet buddy, AI meeting assistant, open source meeting transcription

Why We Built Another AI Meeting Assistant (And Why It's Different)

There are dozens of AI meeting assistants in 2026 — Otter.ai, Fireflies.ai, Fathom, tl;dv, Tactiq, Final Round AI. They all do roughly the same thing: join your call as a bot, record audio, transcribe it with Whisper or a similar model, and give you a summary.

They work. But they have three problems that developers specifically hate:

  1. They record audio. That's a privacy issue in regulated industries, with cautious clients, and in cultures where recording consent is nuanced. India, where we're based, runs largely on trust-based business relationships — dropping a recording bot into a call changes the dynamic.
  2. They live in their own SaaS dashboard. Your meeting notes are in Otter's cloud. Your code is in VS Code. Your tasks are in Linear or GitHub Issues. Three separate places for information that should flow together.
  3. They don't connect to your development workflow. You get a transcript. Then you manually extract action items. Then you manually create tickets. Then you manually search your codebase for relevant code. Every "manually" is a leak.

We wanted an open source AI meeting assistant that solves all three. No audio recording. Data goes to GitHub. And it feeds directly into Claude Code — the same tool we already use for development — through the Model Context Protocol (MCP).

The result is Meet Buddy: a Chrome extension that captures live Google Meet captions, pushes them to a GitHub repo you own, and exposes the data to Claude Code through an MCP server — where AI agents can analyze the transcript, search your codebase for solutions, and generate implementation plans while the meeting is still fresh.

No API keys. No cloud transcription service. No monthly subscription. Just a Chrome extension, a Git repo, and the AI coding tool you already have open.

Meet Buddy vs Otter.ai vs Fireflies vs Fathom: How It Compares

Let's be honest about what Meet Buddy is and isn't. It's not a replacement for Otter.ai if you're a sales team that needs CRM integration and speaker analytics across 500 calls. It's built for developers and small teams who want their meeting data in their existing workflow — not a separate dashboard.

| Feature | Meet Buddy | Otter.ai | Fireflies | Fathom | Meetily |
| --- | --- | --- | --- | --- | --- |
| Audio recording | No (captions only) | Yes | Yes | Yes | Yes (local) |
| Open source | Yes (MIT) | No | No | No | Yes |
| Data storage | Your GitHub repo | Otter cloud | Fireflies cloud | Fathom cloud | Local disk |
| Developer workflow | MCP + Claude Code | API | API | No | No |
| AI agent analysis | 5-agent swarm | Built-in summary | Built-in summary | Built-in summary | LLM summary |
| Monthly cost | $0 (self-hosted) | $8-40/mo | $15-39/mo | $29/mo (teams) | $0 (self-hosted) |
| Works without internet | No (needs GitHub) | No | No | No | Yes |
| Google Meet support | Yes | Yes | Yes | Yes | Yes |
| Zoom/Teams support | Not yet | Yes | Yes | Yes | Yes |

The closest open source alternative is Meetily (formerly meetily.ai) — a self-hosted meeting transcription tool using local Whisper models. It's excellent for privacy-first audio transcription. But it doesn't connect to your development workflow. Your transcript lives in Meetily's UI, not in your GitHub repo or Claude Code session.

TranscripTonic is another open source Chrome extension that captures Google Meet captions — similar to our approach. But it downloads transcripts as files. It doesn't push to GitHub, doesn't have an MCP server, and doesn't feed into AI agents.

Meet Buddy's unique angle: the entire pipeline from meeting to code is automated. Caption → GitHub → MCP → Claude Code → agent swarm → implementation plan. No copy-pasting. No switching tabs. No "let me check my meeting notes."

How the Open Source Meeting Transcription Pipeline Works

The architecture is deliberately simple. Data flows one direction — from Google Meet to your code editor — through four components, each independently useful:

1. Chrome Extension (Manifest V3)

The extension runs a content script on meet.google.com that polls the DOM every 500ms for caption elements. Google Meet renders captions as obfuscated div elements — class names like nMcdL and ygicle that change periodically. The scraper uses these as primary selectors with a structural fallback that finds captions by screen position and DOM pattern (bottom 40% of viewport, contains avatar image + name span + text).

Captions go through a 4-second stabilization window to deduplicate. Google Meet updates captions in-place as the speaker talks — "I think" becomes "I think we should" becomes "I think we should focus on the API." Without dedup, you'd get 7 entries for one sentence. With it, you get one clean line after the speaker pauses.
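
A minimal sketch of that stabilization window, with the clock injected for testability (the names and data shape here are illustrative, not the extension's actual code):

```typescript
// Sketch of the 4-second stabilization window. Captions are keyed by
// their DOM block; a line is only emitted once its text has stopped
// changing for STABLE_MS milliseconds, so one sentence yields one entry.
const STABLE_MS = 4000;

interface PendingCaption {
  speaker: string;
  text: string;
  lastChange: number; // ms timestamp of the last in-place update
}

const pending = new Map<string, PendingCaption>();

// Called on every 500ms DOM poll with the caption block's key and current text.
function observeCaption(key: string, speaker: string, text: string, now: number): void {
  const prev = pending.get(key);
  if (!prev || prev.text !== text) {
    pending.set(key, { speaker, text, lastChange: now });
  }
}

// Returns captions that have been stable for STABLE_MS and removes them
// from the pending set, so each sentence is emitted exactly once.
function flushStable(now: number): PendingCaption[] {
  const stable: PendingCaption[] = [];
  pending.forEach((cap, key) => {
    if (now - cap.lastChange >= STABLE_MS) {
      stable.push(cap);
      pending.delete(key);
    }
  });
  return stable;
}
```

Each in-place update resets the timer, so "I think" → "I think we should" → "I think we should focus on the API" collapses into a single line once the speaker pauses.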

A UI junk filter strips out toolbar labels that Google Meet renders as text in the same DOM region as captions: "frame_person", "more_vert", "backgrounds and effects" — caught by a blocklist and a camelCase heuristic.
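
The junk filter can be approximated in a few lines; a hypothetical sketch, where the blocklist entries come from the examples above and the heuristic is our reading of the described camelCase check:

```typescript
// Sketch of the UI junk filter. Google Meet sometimes renders toolbar
// labels in the same DOM region as captions; drop known labels plus
// single-token icon names matching a camelCase/snake_case heuristic.
const BLOCKLIST = new Set(["frame_person", "more_vert", "backgrounds and effects"]);

function isUiJunk(text: string): boolean {
  const t = text.trim();
  if (BLOCKLIST.has(t.toLowerCase())) return true;
  // Icon names like "arrowDropDown" or "mic_off" are never real speech:
  // no spaces, and either an internal lowercase-to-capital boundary
  // or an underscore.
  if (!t.includes(" ") && (/[a-z][A-Z]/.test(t) || t.includes("_"))) return true;
  return false;
}
```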

2. GitHub Sync (Contents API)

The service worker batches caption chunks and pushes to your GitHub repo every 15 seconds via the Contents API. Each meeting gets its own folder:

meetings/2026-03-19-client-call/
  ├── meta.json           (title, start/end time, word count)
  ├── transcript.md       (timestamped speaker + text)
  └── screenshots/
      ├── 001-architecture.jpg    (65% JPEG, ~70KB)
      └── 002-error-screen.jpg
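
Under the hood, each push is one call to GitHub's `PUT /repos/{owner}/{repo}/contents/{path}` endpoint, which takes the file body as base64. A minimal sketch of the request builder (owner, repo, and commit message below are placeholders, not Meet Buddy's actual code):

```typescript
// Sketch of a Contents API push. `sha` is required only when updating
// an existing file; omit it when creating the file for the first time.
function buildContentsRequest(
  owner: string,
  repo: string,
  path: string,
  content: string,
  message: string,
  sha?: string
): { url: string; body: Record<string, string> } {
  const body: Record<string, string> = {
    message,
    content: Buffer.from(content, "utf8").toString("base64"),
  };
  if (sha) body.sha = sha; // omit on create, include on update
  return {
    url: `https://api.github.com/repos/${owner}/${repo}/contents/${path}`,
    body,
  };
}

// The actual push would then be a fetch:
//   fetch(req.url, { method: "PUT",
//     headers: { Authorization: `Bearer ${token}` },
//     body: JSON.stringify(req.body) });
```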

Authentication uses OAuth Device Flow — you enter a code on github.com once, and the extension has push access to all your repos. No PAT management, no per-org installation.

3. MCP Server (TypeScript)

A stdio-transport MCP server that exposes 7 tools to Claude Code: meeting_list, meeting_active, meeting_transcript, meeting_screenshots, meeting_meta, meeting_notes, meeting_sync. The meeting_sync tool does a sparse Git checkout — only the meetings/ folder, not your entire repo.
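
The sparse checkout itself needs only a few git commands. A hedged sketch of what meeting_sync could run (the repo URL and directory names are placeholders; this is the general git recipe, not the server's verbatim code):

```typescript
// Sketch of a sparse checkout: a blobless clone with no checkout,
// then materialize only the meetings/ folder.
function sparseCheckoutCommands(repoUrl: string, dir: string): string[] {
  return [
    `git clone --filter=blob:none --no-checkout ${repoUrl} ${dir}`,
    `git -C ${dir} sparse-checkout set meetings`,
    `git -C ${dir} checkout`,
  ];
}
```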

The meeting_screenshots tool returns base64 image data, so Claude Code can actually view the screenshots in the conversation. During our test, Claude described a screenshot showing "Earth Clique's avatar, captions at the bottom, Meet Buddy overlay showing Recording + 62 words" — it was reading the JPEG we'd just captured.

4. Agent Swarm (claude-flow)

Five specialized agents coordinated through shared memory:

  • Watcher — polls the GitHub repo, stores new transcript chunks in shared memory
  • Analyst — extracts pain points, action items, emotional signals, unanswered questions
  • Code Reviewer — maps identified problems to existing code in your project
  • Brainstormer — generates feature ideas and creative solutions based on the discussion
  • Planner — creates a prioritized implementation plan with specific file paths and estimated effort

You come back from the meeting. You say "what did you find?" You get a full report. You say "implement." Claude starts coding.

Why We Didn't Use Any Transcription API — And Why That Matters

Every meeting transcription tool in the market either records audio and runs it through Whisper/Deepgram/AssemblyAI, or uses a meeting bot service like Recall.ai. That means your audio goes to a server — theirs or yours.

Meet Buddy doesn't touch audio at all.

We use Google Meet's own built-in caption feature. Google is already transcribing the audio for the captions you see on screen. We just read those captions from the DOM. No audio capture. No speech-to-text API. No server processing.

This has real consequences:

  • Privacy — No audio recording means no recording consent issues. In India, where we operate, client calls often involve sensitive business discussions. Telling someone "we're recording this" changes the conversation. Reading captions doesn't.
  • Cost — Zero API costs. Whisper API is $0.006/min and Deepgram is $0.0043/min, so a 30-minute daily standup adds up to roughly $4-5/month. We pay nothing because Google is doing the transcription anyway.
  • Simplicity — No audio pipeline to maintain. No Whisper model to host. No WebSocket connections for streaming audio. Just a content script that reads the DOM.
  • Speed — Captions appear in real-time. There's no "processing your recording" delay. The transcript is available the moment the words are spoken.

The tradeoff: we depend on Google Meet's caption quality. It's good but not perfect — proper nouns get mangled, heavy accents reduce accuracy, and it doesn't handle code-switching (mixing Hindi and English in the same sentence) as well as dedicated multilingual models. Our test session in Hinglish produced readable but imperfect transcripts.

For developer meetings where the goal is capturing requirements, decisions, and action items — not legal-grade transcription — it's more than good enough.

The Claude Code Advantage: Your AI Already Knows Your Codebase

Here's the insight that makes Meet Buddy different from every other meeting tool: Claude Code already has your project context.

When you're working in Claude Code, it knows your file structure, your tech stack, your recent changes, your CLAUDE.md instructions. When a client says "the checkout page is broken on mobile," Claude Code can immediately search your codebase for the checkout component, check recent git changes, and draft a fix.

Traditional meeting tools give you a transcript. Then you switch to your IDE. Then you search for relevant code. Then you write tasks. Every step is manual.

With Meet Buddy + MCP, the flow is:

  1. Client mentions a problem on the call
  2. Meet Buddy captures it as text, pushes to GitHub
  3. MCP server makes it available to Claude Code
  4. Claude Code — which is already open with your project loaded — can immediately search your codebase for the relevant code
  5. The agent swarm generates an implementation plan with specific file paths and line numbers

No context switching. No "let me find that in the codebase." The meeting flows directly into development because the same AI tool handles both.

This is only possible because we built Meet Buddy as an MCP server, not a standalone SaaS app. The Model Context Protocol lets Claude Code call meeting_transcript the same way it calls git log or reads a file — it's a native part of the development workflow.

We Built It and Tested It in the Same Session — Here's What Actually Happened

Here's the thing about build stories — most of them are written after the fact, cleaned up, and made to sound smooth. This one happened in real-time, with real bugs, real frustration, and real conversations captured by the tool itself.

We built Meet Buddy in a single Claude Code session — about 3 hours from idea to working prototype. Then, without closing the session, we jumped on a Google Meet call with a friend (Earth Clique) to test it live.

The first thing that happened? The extension showed 0 words. The caption scraper couldn't find Google Meet's caption container. We opened Chrome DevTools mid-call, inspected the DOM, found the exact class names (nMcdL, NWpY1d, ygicle), updated the scraper code, reloaded the extension — all while still on the call. That's what building with Claude Code looks like: you don't stop the conversation, you fix the code in parallel.

The second thing: deduplication was completely broken. Google Meet updates captions word by word. Our v1 scraper captured every intermediate state. One sentence — "Hello, can you hear me?" — generated 8 transcript entries. We rewrote the dedup logic three times during the call. Third time worked.

Then the auth issue. We'd set up a GitHub App for authentication. It worked — but only showed repos where the app was installed. "Bro, why aren't my aumiqx repos showing here?" We switched to an OAuth App mid-call. All repos appeared. Lesson: GitHub Apps are for specific installations; OAuth Apps are for user-level access across all repos.

After fixing all three issues, the word count started climbing: 90 words... 210... 500... 704 words and 1 screenshot by the 5-minute mark. Data was flowing from Google Meet → Chrome Extension → GitHub → and Claude Code could read it through the MCP server.

Then the meta moment happened. We asked Claude to analyze the transcript mid-call. Claude reported: "Earth Clique is talking about a PHP error — request too large, max 20 MB." It was right. Earth Clique had mentioned a PHP upload limit on another project. Claude caught it from the live transcript while we were still talking.

We read Claude's analysis back to Earth Clique. Google Meet transcribed us reading the analysis. The transcription got pushed to GitHub. Claude analyzed its own analysis being discussed. Earth Clique's response to this recursion:

"Apni poochh, khud hi khaega kyon?" — Will it eat its own tail?

That became the project's unofficial tagline.

The results from our live test:

| Metric | Session 1 (24 min) | Session 2 (8 min) |
| --- | --- | --- |
| Words captured | 1,415 | 906 |
| Screenshots | 2 | 0 |
| Transcript lines | 36 | 35 |
| GitHub pushes | ~15 | ~8 |

Agent reports generated: 4 (analysis, implementation plan, code solutions, blog post).

We fixed three major bugs during the live call: the caption scraper not finding Google Meet's DOM elements (solved by inspecting the DOM mid-call), the word-by-word deduplication fiasco (the stabilization logic was rewritten three times), and the GitHub App auth issue (switched to OAuth mid-call). And the recursion was real: a tool that tests itself, analyzes itself, and improves itself through its own pipeline.

What Didn't Work and What We're Fixing

We're not going to pretend it was flawless. Here's what broke:

Long-speech deduplication

Google Meet keeps one caption block for continuous speech and appends to it. A 30-second monologue generates one growing block that gets captured at every flush. The 4-second stabilization window helps for sentence-level speech, but long, uninterrupted talking still produces partial duplicates. We're adding a minimum delta threshold and smarter prefix diffing.
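
The planned fix can be sketched as a pure function; MIN_DELTA and the function name are illustrative, not the shipped implementation:

```typescript
// Sketch of prefix diffing with a minimum delta threshold: when a caption
// block keeps growing, emit only the new suffix, and only once it adds at
// least MIN_DELTA characters, so long monologues don't produce partial
// duplicates at every flush.
const MIN_DELTA = 12;

function newSuffix(previous: string, current: string): string | null {
  if (!current.startsWith(previous)) return current; // block replaced, not extended
  const delta = current.slice(previous.length).trim();
  return delta.length >= MIN_DELTA ? delta : null; // too small: wait for more
}
```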

Buffer loss on session end

The caption buffer flushes every 15 seconds. If you click "End Session" 5 seconds after the last flush, those 5 seconds of captions are lost. We've added a forced flush on session end with a 2-second delay before finalizing.

Real-time latency

Google Meet captions update in milliseconds. Our pipeline has inherent latency: 4s stabilization + 15s buffer + GitHub API roundtrip + git pull on the Claude Code side. Total: ~20-30 seconds from spoken word to Claude seeing it. Fine for post-meeting analysis. Not fine for live-during-the-call AI suggestions.

Agent coordination

Claude Code sessions are built for human interaction, not autonomous agent orchestration. Our agents ran as background tasks that completed and stopped instead of looping for the full meeting duration. The fix: event-driven architecture where agents spawn on-demand when new data arrives, rather than trying to keep them alive for 30 minutes.

The honest summary: Meet Buddy v1 is a working prototype that proves the concept. It captured 2,300+ words across two sessions, generated a 466-line implementation plan, and identified every pain point discussed — all without recording a single second of audio. The infrastructure needs work. The pipeline is solid.

During the call, I told Claude straight up: "Claude sessions are for humans, not for automations, but we are doing it just for the jugaad purposes." ("Jugaad" is a Hindi word for a clever hack.) Claude agreed. The response was honest: "I was reactive when I should have been proactive. You had to keep prompting me to check things, spawn agents, fix issues. The whole point of this tool is that I work autonomously while you're on the call — and instead you were babysitting me AND doing the call."

That kind of self-awareness from an AI is why we use Claude Code. It doesn't pretend things worked when they didn't. It identifies its own failures and proposes fixes. That's a real collaboration, not a prompt-and-pray workflow.

How to Set Up Meet Buddy (5 Minutes)

Step 1: Clone and load the Chrome extension

git clone https://github.com/aumiqx/meet-buddy.git
cd meet-buddy/extension/icons && bash generate-icons.sh

Open Chrome → chrome://extensions → Developer mode → Load unpacked → select the extension/ folder.

Step 2: Create a GitHub OAuth App

Go to github.com/settings/developers → OAuth Apps → New. Name it "Meet Buddy", set homepage to your URL, callback to https://github.com, and check Enable Device Flow. Copy the Client ID.

Step 3: Authenticate

Click the Meet Buddy icon in Chrome → paste the Client ID → click Authenticate → enter the device code on GitHub. Done — the extension now has push access to all your repos.

Step 4: Build and connect the MCP server

cd meet-buddy/mcp-server && npm install && npm run build

Add to your Claude Code .mcp.json:

{
  "mcpServers": {
    "meet-buddy": {
      "command": "node",
      "args": ["/path/to/meet-buddy/mcp-server/dist/index.js"]
    }
  }
}

Step 5: Use it

Join a Google Meet call → enable captions (CC button or press c) → click Meet Buddy → select your repo → Start Session. When the call ends, ask Claude Code: "Sync and analyze the latest meeting."

The MCP server will pull the transcript, Claude will read it, and you'll get a structured analysis — pain points, action items, and code-mapped solutions — without having recorded a single second of audio.

Open Source Roadmap: What's Coming in v2 and v3

Meet Buddy is MIT licensed and open source. Here's what's planned:

v2 — Real-Time Infrastructure (Q2 2026)

  • WebSocket/SSE transport replacing git polling for sub-second transcript delivery
  • Browser-based agent dashboard — monitor all running agents, restart dead ones, send commands without leaving the meeting
  • chokidar filesystem watcher in the MCP server for near-real-time updates to Claude Code
  • Auto-start recording when joining a Google Meet call (opt-in setting)
  • Immediate buffer flush on session end — no more lost captions

v3 — Platform Expansion (Q3-Q4 2026)

  • Zoom and Microsoft Teams support — different DOM structure, same MCP pipeline
  • Speaker diarization with time tracking and talk-time analytics
  • Auto-action items → individual GitHub Issues created from sentences starting with "we should", "let's", "can you"
  • Canvas-based screenshot annotation — draw arrows, highlight regions, add labels before capture
  • Offline-first mode — buffer everything locally, sync to GitHub when connectivity returns
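
The auto-action-item idea above could start as simple as a trigger-phrase scan over the transcript; a hypothetical sketch, where the trigger phrases come from the roadmap and everything else is assumed:

```typescript
// Sketch of action-item extraction: flag transcript sentences that
// open with a commitment phrase. Each match would become a GitHub Issue.
const TRIGGERS = ["we should", "let's", "can you"];

function extractActionItems(transcript: string): string[] {
  return transcript
    .split(/[.?!]\s*/)
    .map((s) => s.trim())
    .filter((s) => TRIGGERS.some((t) => s.toLowerCase().startsWith(t)));
}
```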

The core extension + MCP server will always be free and open source. We may offer a managed version with the agent dashboard, team analytics, and enterprise integrations — but the pipeline from meeting to code will never be paywalled.

Meet Buddy is MIT licensed. Star it, fork it, break it, fix it. Contributions welcome — especially if you know Zoom or Teams' DOM structure.

Key Takeaways

  1. Meet Buddy is the only open source AI meeting assistant that doesn't record audio — it uses Google Meet's built-in captions instead
  2. Data flows from Google Meet → Chrome Extension → GitHub → MCP Server → Claude Code, keeping everything in your development workflow
  3. Unlike Otter.ai, Fireflies, or Fathom, Meet Buddy stores data in your GitHub repo — not a third-party cloud service
  4. The MCP server gives Claude Code direct access to transcripts and screenshots, enabling AI analysis within your existing IDE
  5. Built, tested, and debugged in a single session — the tool analyzed its own test call and generated a 466-line implementation plan
