
Subtitle language detection bug

Posted: Wed Mar 23, 2016 7:55 am
by tine
I have a problem with almost every external subtitle file: UMS detects the subtitle language incorrectly.

The subtitles are in Finnish but get detected as Turkish. It doesn't matter whether I add .fi.srt or .fin.srt to the end of the file name; UMS still detects the subtitles as Turkish.
And it completely screws up my audio/subtitle settings.

I attached one subtitle as an example.

I'm using the latest version, 6.2.0, with Java 8.

-tine

Re: Subtitle language detection bug

Posted: Sun Apr 03, 2016 8:46 pm
by tine
Is it really just me with this problem??

Re: Subtitle language detection bug

Posted: Mon Apr 04, 2016 10:16 am
by valib
This is a problem with the automatic code page detection. We are using external software which is not 100% accurate for all languages, but it is the best we can use. If you mostly use subtitles encoded in your language's code page, you can set it on the Subtitles settings page in the "Non-unicode subtitles encoding" field.
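A likely reason Finnish in particular gets mistaken for Turkish (an assumption, not confirmed in this thread): the non-ASCII letters Finnish uses (ä, ö, å) have exactly the same byte values in ISO-8859-1 and in the Turkish ISO-8859-9, which differ from each other in only six letters. A detector that works from byte statistics therefore has very little to tell the two apart. The minimal Java sketch below is not UMS code, and the sample strings and class name are made up; it only demonstrates the byte overlap:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FinnishVsTurkishBytes {

    // True if the text encodes to identical bytes in both charsets.
    static boolean sameBytes(String text, Charset a, Charset b) {
        return Arrays.equals(text.getBytes(a), text.getBytes(b));
    }

    public static void main(String[] args) {
        Charset latin5 = Charset.forName("ISO-8859-9"); // Turkish (Latin-5)

        // true: a-umlaut, o-umlaut and A-ring share byte values in both charsets
        System.out.println(sameBytes("Hyvää yötä, Åsa!", StandardCharsets.ISO_8859_1, latin5));

        // false: 's-cedilla' exists only in ISO-8859-9; getBytes replaces it with '?' in ISO-8859-1
        System.out.println(sameBytes("Teşekkürler", StandardCharsets.ISO_8859_1, latin5));
    }
}
```

So a Finnish file saved in a Western code page is, byte for byte, also valid Turkish text, and only a dictionary-style guess can separate them.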

Re: Subtitle language detection bug

Posted: Tue Apr 05, 2016 2:52 am
by tine
valib wrote:This is a problem with the automatic code page detection. We are using external software which is not 100% accurate for all languages, but it is the best we can use. If you mostly use subtitles encoded in your language's code page, you can set it on the Subtitles settings page in the "Non-unicode subtitles encoding" field.
Hi valib,
and thanks for replying to my thread!

I have tried Auto detect, CP1252 and ISO-8859-10, but the result is always the same.

I could live with this if UMS respected an ISO 639-1 language code added to the end of the subtitle file name,
e.g. subtitles.fi.srt instead of subtitles.srt,
so that the file would automatically be detected as Finnish.
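The rule being asked for can be sketched as: if the second-to-last dot-separated part of the file name is an ISO 639-1 code (fi) or an ISO 639-2 code (fin), use it and ignore the content-based guess. This is a minimal Java sketch, not UMS's actual implementation; the class and method names are hypothetical, and it relies only on java.util.Locale for the language code tables:

```java
import java.util.Locale;
import java.util.MissingResourceException;
import java.util.Optional;

public class SubtitleLanguageFromName {

    // Returns the ISO 639-1 code tagged in a name like "movie.fi.srt" or "movie.fin.srt".
    static Optional<String> languageFromFileName(String fileName) {
        String[] parts = fileName.split("\\.");
        // Need at least base name, language tag and extension, e.g. "movie", "fi", "srt".
        if (parts.length < 3) {
            return Optional.empty();
        }
        String tag = parts[parts.length - 2].toLowerCase(Locale.ROOT);
        for (String iso1 : Locale.getISOLanguages()) {
            if (iso1.equals(tag)) {
                return Optional.of(iso1); // 2-letter tag such as "fi"
            }
            try {
                if (Locale.forLanguageTag(iso1).getISO3Language().equals(tag)) {
                    return Optional.of(iso1); // 3-letter tag such as "fin"
                }
            } catch (MissingResourceException e) {
                // No 3-letter code known for this language; skip it.
            }
        }
        return Optional.empty();
    }

    // A tag in the file name wins over whatever the content-based detector guessed.
    static String resolveLanguage(String fileName, String detectedLanguage) {
        return languageFromFileName(fileName).orElse(detectedLanguage);
    }

    public static void main(String[] args) {
        System.out.println(resolveLanguage("subtitle.fi.srt", "tr"));  // fi
        System.out.println(resolveLanguage("subtitle.fin.srt", "tr")); // fi
        System.out.println(resolveLanguage("subtitle.srt", "tr"));     // tr
    }
}
```

Giving the file name precedence means a wrong guess from the detector can always be overridden by renaming the file.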

What makes it even more frustrating is that adding this 2-character language code worked perfectly in 5.0.1 at least. It broke somewhere along the way. But 5.0.1 doesn't support the latest and greatest features, so it's otherwise obsolete.

Re: Subtitle language detection bug

Posted: Tue Apr 05, 2016 8:32 am
by valib
If ISO-8859-10 doesn't work, it is not a UMS problem. If you can find in your debug.log that the ffmpeg command contains

charenc=ISO-8859-10
then it is properly set by UMS, but it seems to me that ffmpeg doesn't support it.
Can you provide some subtitles that don't work, as an example for testing?
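Checking the log for the charenc value can also be scripted. A minimal Java sketch, assuming the ffmpeg command sits on a single line of debug.log and passes the encoding through the subtitles filter's charenc option; the sample log line and the class name are hypothetical, and the real UMS log wording may differ:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharencFromLog {

    // Captures the value after "charenc="; stops at ':' (next filter option), ',' or whitespace.
    private static final Pattern CHARENC = Pattern.compile("charenc=([A-Za-z0-9_.-]+)");

    static Optional<String> charencOf(String logLine) {
        Matcher m = CHARENC.matcher(logLine);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical log line for illustration only.
        String line = "ffmpeg -i movie.mkv -vf subtitles=subtitle.fi.srt:charenc=ISO-8859-10 -f mpegts -";
        System.out.println(charencOf(line).orElse("not set")); // ISO-8859-10
    }
}
```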

EDIT: subtitles.fi.srt should work with the automatic detection. Can you follow the instructions written in red above and provide the log?

Re: Subtitle language detection bug

Posted: Wed Apr 06, 2016 5:19 am
by tine
In debug.log there is the line you mentioned:

charenc=ISO-8859-10

I attached the debug files, one subtitle file that gets detected as Turkish, and a screenshot of my TV where the subtitle is recognized as Turkish, even though the file is named subtitle.fi.srt.

If I try the same files with version 5.0.1, there is no problem and the subtitle is recognized as Finnish.
IMG_1320.JPG

Re: Subtitle language detection bug

Posted: Wed Apr 06, 2016 9:25 am
by valib
There is a bug in the subtitle language detection. I am working on it, and the fix will be included in the next release.

Re: Subtitle language detection bug

Posted: Wed Apr 06, 2016 10:44 am
by Madoka
Hi @valib.

Thanks for working on this. I have had similar issues, although my subs get detected as random, different languages. I also noticed that UMS only detects the language of UTF-8 subs; it seems to fail if they are UTF-8-BOM. However, the subs need to be UTF-8-BOM if they contain special characters such as curly quotes or em/en dashes. With plain UTF-8, these characters show up as garbage in the FFmpeg-transcoded video; if I convert the subs to UTF-8-BOM they display OK, but then UMS either no longer tries to detect the language or fails to.

Your new commit to use the language code in the filename would help, since I can then set it manually, but it would be nice if UMS could auto-detect the language. Aegisub automatically saves subs as UTF-8-BOM when I use it.
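One possible explanation for the UTF-8-BOM failure (a guess, not confirmed by the UMS developers in this thread): Java's UTF-8 decoder does not drop a leading byte order mark; it decodes it as the character U+FEFF, so code expecting the text to start with the first subtitle index sees an extra invisible character. A minimal sketch of normalizing this before any parsing or detection; the class and method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;

public class Utf8Bom {

    private static final byte[] BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && data[0] == BOM[0] && data[1] == BOM[1] && data[2] == BOM[2];
    }

    // Decodes UTF-8 text, skipping a leading BOM so later steps see the same text either way.
    static String decodeUtf8(byte[] data) {
        int offset = hasUtf8Bom(data) ? 3 : 0;
        return new String(data, offset, data.length - offset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // One SRT cue containing curly quotes.
        byte[] plain = "1\n00:00:01,000 --> 00:00:02,000\n\u201CHei\u201D\n".getBytes(StandardCharsets.UTF_8);
        byte[] withBom = new byte[plain.length + 3];
        System.arraycopy(BOM, 0, withBom, 0, 3);
        System.arraycopy(plain, 0, withBom, 3, plain.length);

        // A naive decode keeps the BOM as the first character.
        String naive = new String(withBom, StandardCharsets.UTF_8);
        System.out.println(naive.charAt(0) == '\uFEFF');                    // true
        System.out.println(decodeUtf8(withBom).equals(decodeUtf8(plain))); // true
    }
}
```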

Also, FFmpeg does not play embedded VobSub subtitles, even when they are tagged as forced, and UMS will not defer to MEncoder for these even if "Defer to MEncoder" is checked in the FFmpeg settings tab.

Thanks for reading.

Re: Subtitle language detection bug

Posted: Wed Apr 06, 2016 11:34 pm
by tine
Thanks, valib, for your effort on this one!

Really appreciated!

-tine