Subtitle language detection bug

For help and support with Universal Media Server
Please make sure you follow the Problem Reporting Guidelines before posting if you want a reply
tine
Posts: 11
Joined: Sat Feb 02, 2013 6:52 am

Post by tine »

I have a problem with almost every external subtitle file: UMS detects the subtitle language wrongly.

The subtitle is in Finnish but gets detected as Turkish. It doesn't matter whether I add .fi.srt or .fin.srt to the end of the filename; UMS still detects the subtitles as Turkish.
And it totally screws up my audio/subtitle settings.

I attached one subtitle as an example.

I'm using the latest version, 6.2.0, with Java 8.

-tine
Attachments
The.Intern.2015.REMUX.1080p.BluRay.AVC.DTS-HD.MA.5.1-iFT.fi.rar
(42.78 KiB)
tine

Re: Subtitle language detection bug

Post by tine »

Am I really the only one with this problem?
valib
Developer
Posts: 699
Joined: Fri Feb 08, 2013 3:11 am

Re: Subtitle language detection bug

Post by valib »

This is a problem with the automatic code page detection. We are using external software that is not 100% accurate for all languages, but it is the best we can use. If you mostly use subtitles encoded in your language's code page, you can set it on the Subtitles settings page in the "Non-unicode subtitles encoding" field.
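The setting valib describes amounts to a fallback order. Below is a minimal Python sketch under that assumption, not UMS's actual code: use UTF-8 when a byte order mark or valid UTF-8 makes that unambiguous, and otherwise decode with the user's configured code page instead of a statistical guess. The function name decode_subtitle and the CP1252 default are hypothetical.

```python
import codecs
from typing import Tuple


def decode_subtitle(raw: bytes, fallback_encoding: str = "cp1252") -> Tuple[str, str]:
    """Decode subtitle bytes and return (text, encoding_used).

    Hypothetical order: UTF-8 with BOM, then strict UTF-8, then the
    user-configured non-Unicode code page (e.g. cp1252 for Finnish).
    """
    # A UTF-8 byte order mark makes the encoding unambiguous.
    if raw.startswith(codecs.BOM_UTF8):
        return raw.decode("utf-8-sig"), "utf-8-sig"
    try:
        # Strict UTF-8 decoding raises on most legacy 8-bit text with accented letters.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    # No guessing: use the code page the user chose, replacing any undefined bytes.
    return raw.decode(fallback_encoding, errors="replace"), fallback_encoding
```

For example, decode_subtitle("Hyvää päivää".encode("cp1252")) is not valid UTF-8, so it falls through to the CP1252 branch and returns the Finnish text intact.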
tine

Re: Subtitle language detection bug

Post by tine »

valib wrote: This is a problem with the automatic code page detection. We are using external software that is not 100% accurate for all languages, but it is the best we can use. If you mostly use subtitles encoded in your language's code page, you can set it on the Subtitles settings page in the "Non-unicode subtitles encoding" field.
Hi valib,
and thanks for replying to my thread!

I have tried Auto detect, CP1252 and ISO-8859-10, but the result is always the same.
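The thread does not say why Finnish text ends up labelled Turkish, but one plausible factor (an assumption here, not a confirmed diagnosis) is that the Finnish letters ä, ö and å have identical byte values in CP1252, ISO-8859-1, ISO-8859-9 (the Turkish code page), ISO-8859-10 and ISO-8859-15. For typical Finnish text, the bytes alone therefore cannot tell a detector which of these code pages, or which language, was meant, and switching between them changes nothing in the decoded text. A quick check with Python's built-in codecs:

```python
# Code pages that map ä, ö, å, Ä, Ö and Å to the same byte values.
CANDIDATES = ("cp1252", "iso-8859-1", "iso-8859-9", "iso-8859-10", "iso-8859-15")


def decodes_identically(text: str, source: str = "cp1252") -> dict:
    """Encode `text` with `source`, then report which candidate code pages decode it back unchanged."""
    raw = text.encode(source)
    return {encoding: raw.decode(encoding) == text for encoding in CANDIDATES}


print(decodes_identically("Hyvää yötä, Åsa. ÄÖ"))
```

All five entries come out True. By contrast, a Turkish-specific letter such as ş (byte 0xFE in ISO-8859-9) reads as þ in CP1252, which is the kind of difference a detector has to rely on.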

I could live with this if UMS respected an ISO 639-1 language code added to the end of the subtitle filename, e.g. subtitles.fi.srt instead of subtitles.srt, so that the file would automatically be detected as Finnish.
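The filename convention tine describes (and valib says should work) could be handled roughly like this: take the component just before .srt and look it up as an ISO 639-1 (two-letter) or ISO 639-2 (three-letter) code. This is a minimal sketch, not the UMS implementation; language_from_filename and the small lookup table are hypothetical, and a real table would cover all ISO 639 codes.

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical, deliberately tiny lookup table keyed by ISO 639-1 / 639-2 code.
LANGUAGES = {
    "fi": "Finnish", "fin": "Finnish",
    "tr": "Turkish", "tur": "Turkish",
    "en": "English", "eng": "English",
}


def language_from_filename(path: str) -> Optional[str]:
    """Return the language named by a trailing code, e.g. 'movie.fi.srt' -> 'Finnish'."""
    name = PurePath(path).name           # drop any directory part
    parts = name.rsplit(".", 2)          # e.g. ['movie', 'fi', 'srt']
    if len(parts) < 3:
        return None                      # no component between the base name and the extension
    return LANGUAGES.get(parts[1].lower())
```

A non-None result would then take precedence over whatever language is guessed from the file contents, so subtitles.fi.srt and subtitles.fin.srt would both be labelled Finnish, while a name like Room.2015.srt yields None.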

And what makes it even more insane is that adding this two-letter language code worked perfectly, at least in 5.0.1. It broke somewhere along the way, but 5.0.1 doesn't support the latest and greatest, so it's otherwise obsolete.
valib
Developer

Re: Subtitle language detection bug

Post by valib »

If ISO-8859-10 doesn't work, it is not a UMS problem. If you can find in your debug.log that the FFmpeg command contains

charenc=ISO-8859-10

then it is set properly by UMS, but it seems to me that FFmpeg doesn't support it.
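FFmpeg's subtitles filter accepts a charenc option that sets the character encoding of a non-UTF-8 subtitle file, which is what valib is asking tine to look for. Searching debug.log for "charenc=" in a text editor is enough; as a scripted alternative, here is a small Python sketch. The sample log line is invented, since the exact UMS log format isn't shown in this thread, and the pattern assumes the option appears literally as charenc=VALUE.

```python
import re
from typing import List

# The value ends at the first character that is not a letter, digit, '_', '.' or '-'.
CHARENC = re.compile(r"charenc=([A-Za-z0-9_.-]+)")


def charencs_in_log(log_text: str) -> List[str]:
    """Return every charenc value found in the log text, in order of appearance."""
    return CHARENC.findall(log_text)


# Invented example line, not copied from a real UMS debug.log.
sample = "ffmpeg -i movie.mkv -filter_complex subtitles=movie.fi.srt:charenc=ISO-8859-10 -f mpegts -"
print(charencs_in_log(sample))
```

To scan a real log, pass it the file contents, e.g. open("debug.log", encoding="utf-8", errors="replace").read().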
Can you provide some subtitles that don't work, as examples for testing?

EDIT: subtitles.fi.srt should work for the automatic detection. Can you follow the Problem Reporting Guidelines and provide the log?
tine

Re: Subtitle language detection bug

Post by tine »

The debug.log does contain the line you mentioned:

charenc=ISO-8859-10

I attached the debug files, a subtitle file that gets detected as Turkish, and a screenshot of my TV where the subtitle is recognized as Turkish, even though the file is named subtitle.fi.srt.

If I try the same thing with version 5.0.1, there is no problem and the file is recognized as Finnish.
IMG_1320.JPG (1.28 MiB)
Attachments
Room.2015.1080p.BluRay.x264.DTS-WiKi.fi.zip
(25.81 KiB)
ums_dbg.zip
(98.64 KiB)
valib
Developer

Re: Subtitle language detection bug

Post by valib »

There is a bug in the subtitle detection. I am working on it, and the fix will be included in the next release.
Madoka
Posts: 329
Joined: Fri Jun 01, 2012 12:51 pm

Re: Subtitle language detection bug

Post by Madoka »

Hi @valib.

Thanks for working on this. I have had similar issues, although my subs are detected as random different languages. I also noticed that UMS only detects the language of UTF-8 subs; it seems to fail if they are UTF-8-BOM. However, the subs need to be UTF-8-BOM if they contain special characters like curly quotes or em and en dashes. In plain UTF-8, these characters show up as garbage in the FFmpeg-transcoded video. If I convert the subs to UTF-8-BOM, they display OK, but UMS then either no longer tries to detect the language or fails to detect it. Your new commit to use the language in the filename would help, since I could then set it manually, but it would be nice if UMS could auto-detect the language. Aegisub automatically saves subs as UTF-8-BOM when I use it.
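Madoka's observation that detection stops working on UTF-8-BOM files is consistent with a common pitfall (an assumption here, not a confirmed diagnosis of UMS): decoding such a file with a plain UTF-8 decoder leaves the byte order mark in the text as the invisible character U+FEFF, so any check on the first characters sees an unexpected prefix. Python's utf-8-sig codec, by contrast, removes a leading BOM. The SRT snippet and the helper name leading_bom_survives are invented for illustration.

```python
import codecs

# Invented SRT snippet with curly quotes and an en dash.
SRT = "1\n00:00:01,000 --> 00:00:03,000\n\u201cHei\u201d \u2013 moi\n"

# What an editor writes when saving as "UTF-8 with BOM": EF BB BF, then the UTF-8 text.
raw = codecs.BOM_UTF8 + SRT.encode("utf-8")


def leading_bom_survives(data: bytes, encoding: str) -> bool:
    """True if decoding with `encoding` leaves U+FEFF (the BOM) at the start of the text."""
    return data.decode(encoding).startswith("\ufeff")


print(leading_bom_survives(raw, "utf-8"))      # the plain UTF-8 decoder keeps the BOM
print(leading_bom_survives(raw, "utf-8-sig"))  # utf-8-sig strips it
```

The utf-8-sig codec also decodes files without a BOM unchanged, so using it for every UTF-8 file would treat both variants alike before any language detection runs.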

Also, FFmpeg does not display embedded VobSub subtitles, even if they are tagged as forced, and UMS will not defer to MEncoder for these even if "Defer to MEncoder" is checked in the FFmpeg settings tab.

Thanks for reading.
tine

Re: Subtitle language detection bug

Post by tine »

Thanks valib for your effort on this one!

Really appreciated!

-tine