Automatic speed for music and speech
I listen to some podcasts at a higher speed. It is very annoying to hear the music parts of them at that non-regular speed.
-
mrto13 commented
"I would propose a first simple implementation: many podcasts use chapter descriptions with keywords for music. The speed could drop back to normal if a word from a custom keyword list appears in the chapter description.
Even false positives are not that critical, and you could even have keyword lists per podcast."
Simple and elegant solution. It would be great to have this functionality. Filtering chapters by words works great.
-
Benjamin Beichler commented
I would propose a first simple implementation: many podcasts use chapter descriptions with keywords for music. The speed could drop back to normal if a word from a custom keyword list appears in the chapter description.
Even false positives are not that critical, and you could even have keyword lists per podcast.
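A rough sketch of how that keyword check might look, in Python. Everything here is an assumption for illustration: the names `DEFAULT_KEYWORDS`, `is_music_chapter` and `chapter_speed` are hypothetical, the default keyword list is made up, and matching is on whole single words only.

```python
import re

# Hypothetical default list; in practice this would be user-editable.
DEFAULT_KEYWORDS = {"music", "song", "track"}

def is_music_chapter(description, keywords=DEFAULT_KEYWORDS):
    """Return True if any keyword appears as a whole word in the description."""
    words = set(re.findall(r"\w+", description.lower()))
    return any(k.lower() in words for k in keywords)

def chapter_speed(description, user_speed, podcast=None, per_podcast_keywords=None):
    """Pick 1.0x for chapters that look like music, else the user's speed.

    per_podcast_keywords is an optional dict mapping a podcast name to its
    own keyword set; podcasts without an entry fall back to the defaults.
    """
    keywords = (per_podcast_keywords or {}).get(podcast, DEFAULT_KEYWORDS)
    return 1.0 if is_music_chapter(description, keywords) else user_speed
```

For example, `chapter_speed("Music: Intro song", 1.8)` gives 1.0, while `chapter_speed("Interview with guest", 1.8)` keeps 1.8. A false positive only means one spoken chapter plays at normal speed.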
-
Robert Cobourn commented
This feature would be great, but it seems like it would be resource-intensive. If it's doable... well, my votes are in.
-
SkorpEN commented
In programming terms: when music is present in the podcast, the player speed should be 1.0x. This could also be optional, but most users will want to hear music at normal speed even when the player speed is set to another value.
-
Ben commented
I'll also say that I've been thinking about this because the sound quality of sped-up speech is adequate, whereas the timewarp algorithm does a ghastly job for music. Another approach might be to see what the state of the art is in timewarping and see if music quality could be brought up to par, but there are other (e.g. artistic / psychoacoustical) reasons to default to playing music at the speed at which it was recorded.
-
Ben commented
YES!
First thing I'd try is apply-and-then-reverse the playback speed adjustment and measure sum of squared error or something... more sophisticated metrics would be better.
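To make the apply-and-then-reverse idea concrete, here is a minimal pure-Python sketch. It assumes a naive overlap-add stretch as a stand-in, which is not whatever algorithm the player actually uses, and the names `time_stretch` and `round_trip_error`, the frame size and the hop size are all hypothetical choices.

```python
import math

def time_stretch(samples, rate, frame=256, hop=64):
    """Naive overlap-add stretch; rate > 1 shortens (speeds up) the signal."""
    # Hann-like window sampled at half-steps so no weight is exactly zero.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * (n + 0.5) / frame)
              for n in range(frame)]
    out_len = int(len(samples) / rate)
    out = [0.0] * out_len
    norm = [0.0] * out_len
    i = 0
    while i * hop < out_len:
        src = int(round(i * hop * rate))  # where this frame is read from
        dst = i * hop                     # where this frame is written to
        for n in range(frame):
            if src + n < len(samples) and dst + n < out_len:
                out[dst + n] += window[n] * samples[src + n]
                norm[dst + n] += window[n]
        i += 1
    # Divide by the summed window weights; uncovered samples stay at 0.0.
    return [o / w if w > 1e-9 else 0.0 for o, w in zip(out, norm)]

def round_trip_error(samples, rate):
    """Stretch by rate, stretch back by 1/rate, return sum of squared error."""
    restored = time_stretch(time_stretch(samples, rate), 1.0 / rate)
    n = min(len(samples), len(restored))
    return sum((samples[k] - restored[k]) ** 2 for k in range(n))
```

The idea would be to compute `round_trip_error` on short segments and treat a high value as a hint that the segment does not survive time-warping well. Where to put the threshold, and whether this separates music from speech reliably, would have to be checked on real audio.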
Might be easier, and computationally cheaper in realtime, to train networks (or find some extant package) to recognise speech vs nonspeech-sounds, and set speed factor to 1 for the latter.
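Assuming some classifier already gives a speech probability per audio frame (that classifier is hypothetical here and not shown), the speed switching on top of it could look roughly like this. The name `speed_schedule` and the two thresholds are illustrative assumptions; the hysteresis is only there to keep the speed from flickering when the probability hovers around a single cut-off.

```python
def speed_schedule(speech_probs, user_speed, enter=0.7, leave=0.3):
    """Map per-frame speech probabilities to a playback speed per frame.

    Switch to user_speed once the probability rises above `enter`, and
    fall back to 1.0 only when it drops below `leave`.
    """
    speeds = []
    in_speech = False  # start at normal speed until speech is detected
    for p in speech_probs:
        if in_speech and p < leave:
            in_speech = False
        elif not in_speech and p > enter:
            in_speech = True
        speeds.append(user_speed if in_speech else 1.0)
    return speeds
```

With probabilities `[0.1, 0.8, 0.5, 0.4, 0.2, 0.5, 0.9]` and a user speed of 2.0, this yields `[1.0, 2.0, 2.0, 2.0, 1.0, 1.0, 2.0]`.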