🇨🇳 View the Chinese documentation
⚠️ This project is under active development and may be subject to change.
🌐 Live Demo: https://voice-memo-ai-phi.vercel.app/
- Overview
- High-Level Architecture
- Architecture
- Components
- Workflow
- Prerequisites
- Deployment
- Technologies Used
- Demo
- Contributors
VoiceMemo AI is a cloud-native system designed to automate the process of transcribing, summarizing, and generating reports from audio recordings. It delivers an end-to-end pipeline from user input to structured insights, reducing manual effort and operational cost.
Organizations spend a significant share of time and resources transcribing and summarizing audio manually; in some cases these tasks consume up to 70% of working time. Call centers and medical institutions, for instance, struggle with the high cost and complexity of manual processing.
VoiceMemo AI solves these issues by offering an automated, scalable workflow powered by Azure's cloud and AI services. It transforms unstructured voice data into measurable, actionable insights.
```mermaid
flowchart TD
%% Styling with elegant colors but clearer connections
classDef user fill:#e3f2fd,stroke:#90caf9,color:#1e88e5,stroke-width:1px
classDef processor fill:#f3e5f5,stroke:#ce93d8,color:#8e24aa,stroke-width:1px
classDef reports fill:#fff8e1,stroke:#ffe082,color:#ffa000,stroke-width:1px
classDef stakeholder fill:#e8f5e9,stroke:#a5d6a7,color:#43a047,stroke-width:1px
classDef inputs fill:#e1f5fe,stroke:#81d4fa,color:#0288d1,stroke-width:1px
classDef outputs fill:#fce4ec,stroke:#f48fb1,color:#d81b60,stroke-width:1px
%% Nodes with minimal styling
User(["User"])
AudioProcessor["Audio Processor"]
Reports["Analysis Reports"]
%% Stakeholder nodes
Management["Management"]
DecisionMakers["Decision Makers"]
Compliance["Compliance Teams"]
Sales["Sales Teams"]
CustomerService["Customer Service"]
%% Input categories
MeetingRecordings["Meeting Recordings"]
CustomerCalls["Customer Calls"]
FieldNotes["Field Notes"]
Interviews["Interviews"]
%% Output categories
ActionItems["Action Items"]
KeyInsights["Key Insights"]
Summaries["Meeting Summaries"]
Analysis["Sentiment Analysis"]
%% Main flow with clearer connections
User -->|"Upload"| AudioTypes
subgraph AudioTypes["Audio Input Types"]
MeetingRecordings
CustomerCalls
FieldNotes
Interviews
end
AudioTypes -->|"Process"| AudioProcessor
AudioProcessor -->|"Generates"| Reports
Reports --> OutputTypes
subgraph OutputTypes["Report Types"]
ActionItems
KeyInsights
Summaries
Analysis
end
OutputTypes -->|"Used by"| Stakeholders
subgraph Stakeholders["Business Stakeholders"]
Management
DecisionMakers
Compliance
Sales
CustomerService
end
%% Apply styles
class User user
class AudioProcessor processor
class Reports reports
class Management,DecisionMakers,Compliance,Sales,CustomerService stakeholder
class MeetingRecordings,CustomerCalls,FieldNotes,Interviews inputs
class ActionItems,KeyInsights,Summaries,Analysis outputs
%% Style subgraphs
style AudioTypes fill:#fafafa,stroke:#e0e0e0,stroke-width:1px
style OutputTypes fill:#fafafa,stroke:#e0e0e0,stroke-width:1px
style Stakeholders fill:#fafafa,stroke:#e0e0e0,stroke-width:1px
%% Make all connection lines darker and more visible
linkStyle default stroke:#555555,stroke-width:1.5px
```
Typical use cases include:
- 🏥 Medical Appointment Summarization
- 📋 Social Worker Session Documentation
- 📞 Call Center Quality Analysis
- ⚖️ Legal Proceedings Transcription
- 🎓 Academic Interview Analysis
- 💼 Business Meeting Summary Generation
VoiceMemo AI uses the following Azure services:
- Azure Static Web Apps: User interface for uploading and viewing reports.
- Azure App Service: Backend logic for handling user and audio workflows.
- Azure Blob Storage: Stores raw audio, transcripts, and results.
- Azure Functions: Event-driven backend for audio processing.
- Azure Speech-to-Text API: Transcribes audio recordings.
- Azure OpenAI GPT-4o: Generates summaries from transcribed text.
- CosmosDB (Serverless): Stores metadata, prompts, and logs.
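For a concrete sense of the CosmosDB side, a minimal sketch of writing a per-job metadata document with the `azure-cosmos` SDK is shown below; the database, container, and field names are illustrative assumptions, not the project's actual schema.

```python
# Hypothetical job-metadata record; all names here are assumptions.
from azure.cosmos import CosmosClient

client = CosmosClient(url="https://<account>.documents.azure.com:443/",
                      credential="<account-key>")
jobs = client.get_database_client("voicememo").get_container_client("jobs")

jobs.upsert_item({
    "id": "job-0001",                      # document id / partition key in this sketch
    "blobName": "recordings/meeting.wav",  # where the raw audio was uploaded
    "promptId": "meeting-summary-v1",      # which stored prompt the function should use
    "status": "uploaded",                  # uploaded -> transcribed -> summarized
})
```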
```mermaid
flowchart TD
%% Styling with softer colors but clear connections
classDef user fill:#e3f2fd,stroke:#90caf9,color:#1e88e5,stroke-width:1px
classDef webApp fill:#e8f5e9,stroke:#a5d6a7,color:#43a047,stroke-width:1px
classDef storage fill:#f3e5f5,stroke:#ce93d8,color:#8e24aa,stroke-width:1px
classDef function fill:#fff8e1,stroke:#ffe082,color:#ffa000,stroke-width:1px
classDef ai fill:#e1f5fe,stroke:#81d4fa,color:#0288d1,stroke-width:1px
classDef db fill:#fce4ec,stroke:#f48fb1,color:#d81b60,stroke-width:1px
%% Nodes with minimal styling, following original layout
User(["User"])
WebApp["Frontend Web App"]
AppService["Backend App Service"]
Blob["Blob Storage
(Recordings)"]
Function["Azure Function"]
Speech["Speech-to-Text"]
Transcripts["Blob Storage
(Transcripts)"]
Cosmos["CosmosDB"]
GPT["Azure OpenAI GPT-4o"]
Results["Blob Storage
(Results)"]
%% Connections exactly like the original
User -->|"Upload Audio"| WebApp
WebApp -->|"Send File"| AppService
AppService -->|"Store File"| Blob
Blob -->|"Trigger"| Function
Function -->|"Transcribe"| Speech
Speech -->|"Return Text"| Function
Function -->|"Store Transcript"| Transcripts
Function -->|"Get Prompt"| Cosmos
Function -->|"Summarize"| GPT
GPT -->|"Return Summary"| Function
Function -->|"Store Summary"| Results
Function -->|"Update Status"| Cosmos
WebApp -->|"Fetch Reports"| AppService
AppService -->|"Fetch Metadata"| Cosmos
User -->|"View Reports"| WebApp
%% Apply styles
class User user
class WebApp,AppService webApp
class Blob,Transcripts,Results storage
class Function function
class GPT,Speech ai
class Cosmos db
%% Make all connection lines darker and more visible
linkStyle default stroke:#555555,stroke-width:1.5px
```
- Azure Static Web App (Frontend) – React app for UI and interactions.
- Azure App Service (Backend) – FastAPI service for auth, file ops, and DB access (sketched below).
- Azure Blob Storage – Stores audio files and processed results.
- Azure Functions – Event-driven transcription and summarization handler.
- Azure Speech-to-Text – Converts speech to raw text.
- Azure OpenAI GPT-4o – Summarizes the transcription.
- CosmosDB – Tracks job metadata, prompts, and results.
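As a rough illustration of the backend's role, the upload path might look like the FastAPI sketch below. The route, container name, and environment variable are assumptions rather than the project's actual code.

```python
# Hypothetical upload endpoint; route and names are illustrative only.
import os

from azure.storage.blob import BlobServiceClient
from fastapi import FastAPI, UploadFile

app = FastAPI()
blob_service = BlobServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"])

@app.post("/api/recordings")
async def upload_recording(file: UploadFile):
    # Landing the file in the recordings container is what triggers the
    # Azure Function further down the pipeline.
    blob = blob_service.get_blob_client(container="recordings", blob=file.filename)
    blob.upload_blob(await file.read(), overwrite=True)
    return {"blob": file.filename, "status": "uploaded"}
```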
- User uploads audio via the web interface.
- File stored in Blob → triggers the Azure Function.
- Function (see the sketch after this list):
  - Submits transcription → polls for completion.
  - Stores transcribed text in Blob.
  - Retrieves prompt from CosmosDB.
  - Sends it to GPT-4o for summarization.
  - Saves result in Blob → updates CosmosDB.
- Frontend fetches and displays results.
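The sketch below shows how such a function could be wired up with the Python v2 programming model, the Speech SDK, and the Azure OpenAI client. Container names, environment variables, the prompt text, and the single-shot `recognize_once()` call are all simplifying assumptions; the actual function also polls long transcriptions for completion and updates CosmosDB as listed above.

```python
# Hypothetical blob-triggered function; names and details are illustrative only.
import os
import tempfile

import azure.cognitiveservices.speech as speechsdk
import azure.functions as func
from azure.storage.blob import BlobServiceClient
from openai import AzureOpenAI

app = func.FunctionApp()

@app.blob_trigger(arg_name="blob", path="recordings/{name}",
                  connection="AzureWebJobsStorage")
def process_recording(blob: func.InputStream):
    # 1. Stage the uploaded audio locally for the Speech SDK.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        tmp.write(blob.read())
        audio_path = tmp.name

    # 2. Transcribe. A real pipeline would use continuous or batch transcription
    #    for full-length recordings; recognize_once() only covers a short utterance.
    speech_config = speechsdk.SpeechConfig(
        subscription=os.environ["SPEECH_KEY"], region=os.environ["SPEECH_REGION"])
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config,
        audio_config=speechsdk.audio.AudioConfig(filename=audio_path))
    transcript = recognizer.recognize_once().text

    # 3. Summarize with GPT-4o through Azure OpenAI. The prompt would normally
    #    be fetched from CosmosDB rather than hard-coded here.
    openai_client = AzureOpenAI(
        api_key=os.environ["AZURE_OPENAI_KEY"],
        api_version="2024-02-01",
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"])
    completion = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Summarize the transcript with key points and action items."},
            {"role": "user", "content": transcript},
        ])
    summary = completion.choices[0].message.content

    # 4. Persist transcript and summary; the real function also updates CosmosDB.
    storage = BlobServiceClient.from_connection_string(
        os.environ["AzureWebJobsStorage"])
    name = os.path.basename(blob.name)
    storage.get_blob_client("transcripts", f"{name}.txt").upload_blob(
        transcript, overwrite=True)
    storage.get_blob_client("results", f"{name}.summary.txt").upload_blob(
        summary, overwrite=True)
```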
- Azure subscription with access to:
- Static Web Apps
- App Service
- Blob Storage
- Functions
- CosmosDB (Serverless)
- Speech-to-Text API
- Azure OpenAI GPT-4o
- Node.js & Python for local dev
- Deploy frontend via Azure Static Web Apps.
- Set up App Service for backend (FastAPI).
- Configure Blob Storage containers: `recordings/` and `results/` (see the container-setup sketch after this list).
- Create Azure Function with Blob Trigger.
- Integrate Speech-to-Text & GPT-4o APIs.
- Use CosmosDB to manage metadata and prompts.
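A minimal container-setup sketch using the `azure-storage-blob` SDK might look like this; the connection-string variable is an assumption, and a `transcripts` container is included because the architecture diagram stores transcripts separately.

```python
# Hypothetical one-off container setup; names follow the layout described above.
import os

from azure.core.exceptions import ResourceExistsError
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"])

for name in ("recordings", "transcripts", "results"):
    try:
        service.create_container(name)
    except ResourceExistsError:
        pass  # container already provisioned
```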
- Frontend: React + Azure Static Web Apps
- Backend: FastAPI + Azure App Service
- Storage: Azure Blob
- Processing: Azure Functions
- AI Services: Azure Speech, OpenAI GPT-4o
- Database: CosmosDB Serverless
| Name |
| --- |
| Rui Tao |
Feel free to submit a PR if you'd like to contribute!