zlacker

[parent] [thread] 3 comments
1. popalc+(OP)[view] [source] 2026-02-04 18:27:54
It doesn't make sense to have a language-restricted transcription model, because of code switching. People aren't machines; we don't stick strictly to our native language. Even monolingual people move in and out of their native language when using "borrowed" words/phrases. A single-language model will often fail to deal with that.
replies(2): >>javier+Y6 >>janals+W21
2. javier+Y6[view] [source] 2026-02-04 18:55:53
>>popalc+(OP)
yeah, one example I run into is getting my Perplexity phone assistant to play a song in Spanish. I cannot for the life of me get a model to transcribe "Play señorita a mi me gusta su style on spotify" correctly
3. janals+W21[view] [source] 2026-02-04 23:48:21
>>popalc+(OP)
Everything is a tradeoff, and different use cases require different tradeoffs:

Option A: this model

Option B: faster model, only 1 language

Option C: same size model, only 1 language but higher quality

My point is that option A isn’t always best.

And on the borrowed words bit, there's no rule that we cannot add borrowed words into the vocab. But you don't need the whole language. I know what déjà vu means but I don't speak French.
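A toy sketch of that point (not any real ASR system's vocab handling, just a closed-vocabulary lookup I made up to illustrate): adding individual borrowed items to an English vocab lets them be recognized without importing the whole source language.

```python
# Toy closed-vocabulary "recognizer": each spoken token either matches a
# vocab entry or falls back to <unk>. Vocab contents are invented examples.
EN_VOCAB = {"play", "on", "spotify"}
BORROWED = {"señorita"}  # borrowed item added individually, no full Spanish vocab

def recognize(tokens, vocab):
    # Map each token to itself if in-vocab, else to the unknown marker.
    return [t if t.lower() in vocab else "<unk>" for t in tokens]

utterance = "play señorita on spotify".split()
print(recognize(utterance, EN_VOCAB))             # señorita drops to <unk>
print(recognize(utterance, EN_VOCAB | BORROWED))  # all tokens recognized
```

The tradeoff from options A–C still applies: every borrowed entry you add grows the vocab, so you pick which foreign items are common enough to pay for.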

replies(1): >>popalc+hA1
4. popalc+hA1[view] [source] [discussion] 2026-02-05 04:28:00
>>janals+W21
that depends entirely on how common the borrowed thing is. And anyway, option A is always going to be insufficient for my code-switching example -- as another commenter pointed out, it is very common to want to refer to a foreign work (song, movie, book) by its foreign language title. Monolingual ASR solutions break over this all the time. Try asking Alexa to play a Spanish language track on Spotify. It fails frequently.

The real world is like that.

[go to top]