About Transcriptions
The Transcriptions functionality is an offshoot of SPS By The Numbers, a data-oriented site about Seattle Public Schools. The site was created because there was a huge information disparity (and therefore a power disparity in advocacy) between those who had the time and energy to watch board meetings and those who did not.
The first version was a simple adaptation by Albert J Wong of Majdoddin's Colab example for using Whisper and Pyannote speaker diarization to transcribe YouTube videos into a diarized, clickable HTML transcript that would jump the video to the word clicked.
Then with the help of Mark Verrey, this was extended to include:
- timestamped URLs that can be used as citations
- names and tags for speakers
- translations for the transcripts using Google Translate
- Meilisearch integration for full text search
- responsive layout with MUI (Material UI)
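As a rough illustration of the timestamped-citation idea in the list above, here is a minimal sketch that builds a link into a YouTube video at a given offset. The function name `timestampedUrl` is hypothetical and is not taken from the project's code; the site's own citation URLs may well use a different scheme. The only thing assumed here is YouTube's standard `t=<seconds>s` query parameter.

```typescript
// Hypothetical helper for illustration only; not the project's actual code.
// Builds a YouTube watch URL that starts playback at the given offset.
function timestampedUrl(videoId: string, startSeconds: number): string {
  // Clamp negative values to 0 and drop fractional seconds.
  const t = Math.max(0, Math.floor(startSeconds));
  return `https://www.youtube.com/watch?v=${encodeURIComponent(videoId)}&t=${t}s`;
}

// Example: cite a remark made about 1h 2m 5s into a video.
// "VIDEO_ID" is a placeholder, not a real video.
const citation = timestampedUrl("VIDEO_ID", 3725.4);
console.log(citation);
```

A link like this can be pasted into an email or public comment so the reader lands on the cited moment rather than the start of a multi-hour meeting.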
Special shout-out to Joseph Fromel, who cleaned up the basic Docker setups and prototyped the cloud-function-based vast.ai instance management, which allowed transcriptions to happen automatically each night.
The functionality is general enough that it can be applied to any meetings that are published on YouTube. The backend code has been specifically engineered to be highly cacheable and low cost. The frontend fits comfortably in the free tier of Firebase serving. The automated transcription pipeline uses Google Cloud Scheduler to trigger work and then Vast.ai for cheap machines that have lots of cores (the bottleneck is diarization, which is limited by the number of CPU cores) and GPU memory (the Whisper v3 transcription model is large). For the Seattle City Council and Seattle Public Schools Board meetings, this has come out to ~$50 a year in spending for the past couple of years (with the exception of times when the site was crawled by Meta or Google).
The code is on GitHub. It should be possible to download it, modify the values in config/constants.ts to fit your site, and then create a Firebase Hosting app using the code so that you can have transcribed and translated public meetings for your municipality.
For questions, email sps.by.the.numbers@gmail.com.