Build a semantic text-matching engine. Ignore the raw video; extract the transcript and look for deep semantic overlap. Upgrade to 6 GB. Support multiple Whisper requests, or create a way to sequence the work instead of running it in parallel. Let's upgrade from whisper-tiny to whisper-base.
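One way to sequence Whisper requests instead of running them in parallel is a small promise-based queue. It accepts any number of requests but starts each job only after the previous one has settled. This is a minimal sketch that assumes each transcription is exposed as an async function. `SerialQueue` and `fakeTranscribe` are hypothetical names and are not part of any Whisper library.

```typescript
// Hypothetical helper: runs async jobs one at a time, in submission order.
class SerialQueue {
  // Tail of the chain; each new job waits for this promise before starting.
  private tail: Promise<void> = Promise.resolve();

  enqueue<T>(job: () => Promise<T>): Promise<T> {
    // Start this job only after every previously enqueued job has settled.
    const run = this.tail.then(() => job());
    // Swallow the outcome on the internal chain so one failed transcription
    // does not block the jobs queued after it (the caller still sees it via `run`).
    this.tail = run.then(
      () => undefined,
      () => undefined,
    );
    return run;
  }
}

// Stand-in for a real Whisper call (assumption: swap in your actual ASR client).
async function fakeTranscribe(file: string): Promise<string> {
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  return `transcript of ${file}`;
}

const whisperQueue = new SerialQueue();
// Both requests are accepted immediately, but only one transcription runs at a time.
Promise.all(
  ["clip-a.mp4", "clip-b.mp4"].map((file) => whisperQueue.enqueue(() => fakeTranscribe(file))),
).then((texts) => console.log(texts));
```

This design gives up throughput in exchange for keeping roughly one transcription in flight at a time. That may help if memory is the constraint when moving from whisper-tiny to the larger whisper-base.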
Marcus
1
zfrika
2-step matching workflow
========================
1a. Audio extraction and transcription (the words).
1b. Use an ASR (automatic speech recognition) model, such as OpenAI Whisper.
1c. Process the video's audio track and convert the spoken words into a clean text string.
2a. Text embedding (the meaning).
2b. Pass the generated text string through a dedicated text embedding model to output a vector representation of the conversation. Use one of these models: Xenova/all-MiniLM-L6-v2 or Xenova/bge-small-en-v1.5. Both are sentence-embedding models: they map sentences and short paragraphs to vectors, so semantically similar passages end up close together.
CLIP (Contrastive Language-Image Pre-training)
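After both transcripts are embedded, "semantic overlap" can be scored with cosine similarity between their vectors. This is a minimal sketch that assumes the embeddings are already available as plain number arrays, one per transcript chunk (for example, pooled output of one of the models above). `cosineSimilarity`, `bestMatches`, and the 0.8 threshold are hypothetical and would need tuning per model. The toy 3-dimensional vectors stand in for real embeddings, which are 384-dimensional for both listed models.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1]; higher means closer meaning.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface ChunkMatch {
  aIndex: number; // chunk index in transcript A
  bIndex: number; // best-matching chunk index in transcript B
  score: number; // cosine similarity of that pair
}

// Hypothetical helper: for each chunk of transcript A, find the most similar
// chunk of transcript B and keep the pair only if it clears the threshold.
function bestMatches(a: number[][], b: number[][], threshold: number): ChunkMatch[] {
  const matches: ChunkMatch[] = [];
  for (let i = 0; i < a.length; i++) {
    let bestIndex = -1;
    let bestScore = -Infinity;
    for (let j = 0; j < b.length; j++) {
      const score = cosineSimilarity(a[i], b[j]);
      if (score > bestScore) {
        bestScore = score;
        bestIndex = j;
      }
    }
    if (bestIndex >= 0 && bestScore >= threshold) {
      matches.push({ aIndex: i, bIndex: bestIndex, score: bestScore });
    }
  }
  return matches;
}

// Toy vectors standing in for real sentence embeddings of transcript chunks.
const transcriptA = [[1, 0, 0], [0, 1, 0]];
const transcriptB = [[0.9, 0.1, 0], [0, 0, 1]];
console.log(bestMatches(transcriptA, transcriptB, 0.8));
```

Matching chunk by chunk, instead of embedding each whole transcript as one vector, keeps each input short. That matters because these models have a limited input length, and a single vector averaged over a long conversation can blur the specific passages that overlap.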
Create a duplicate environment (done).
1. Move this project to an independent website.