Re: .srt character encoding troubles

Posted: Sat May 25, 2019 1:38 am
by Nadahar
Thanks. Regarding the language, I noticed that the names of the attached files end in "_rus". UMS won't pick that up as a language code; you must use a dot so that the names end with ".rus.srt".
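
If there are many files, the renaming could be scripted. The following is only a rough sketch: the helper names `dotted_language_name` and `rename_subtitles` are made up for illustration, and it assumes every affected file ends in exactly "_rus.srt".

```python
from pathlib import Path
from typing import List


def dotted_language_name(name: str, lang: str = "rus") -> str:
    """Turn 'movie_rus.srt' into 'movie.rus.srt'; leave other names unchanged."""
    suffix = f"_{lang}.srt"
    if name.lower().endswith(suffix):
        return name[: -len(suffix)] + f".{lang}.srt"
    return name


def rename_subtitles(folder: Path, lang: str = "rus") -> List[Path]:
    """Rename every '*_<lang>.srt' file in folder and return the new paths."""
    renamed = []
    for path in sorted(folder.glob("*.srt")):
        new_name = dotted_language_name(path.name, lang)
        if new_name != path.name:
            target = path.with_name(new_name)
            path.rename(target)
            renamed.append(target)
    return renamed
```

Calling `rename_subtitles(Path(...))` on the folder that holds the subtitles would rename them in place, so it is worth trying on a copy first.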

I'll continue to look into the encoding problems.

Re: .srt character encoding troubles

Posted: Sat May 25, 2019 2:11 am
by Nadahar
All the SRT files have a UTF-8 BOM, even though none of them seems to be UTF-8. Have you "converted" these in Notepad++ or similar? If so, do you have the original files, as you downloaded them? The only one whose actual charset I've been able to determine is the "working" one, which is encoded as Windows codepage 1251. Here is the same file, properly converted to UTF-8.
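
For reference, a conversion like this can also be done without an editor. This is a minimal sketch, assuming the content really is Windows-1251 with a stray UTF-8 BOM in front; the function name `cp1251_to_utf8` is hypothetical.

```python
import codecs


def cp1251_to_utf8(raw: bytes) -> bytes:
    """Re-encode Windows-1251 bytes as UTF-8 without a BOM.

    A leading UTF-8 BOM is dropped first, on the assumption that it was
    added by mistake and does not describe the real encoding.
    """
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    # Raises UnicodeDecodeError if a byte is undefined in Windows-1251.
    return raw.decode("cp1251").encode("utf-8")
```

Read the file in binary mode, pass the bytes through the function, and write the result to a new file so the original stays untouched.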

Re: .srt character encoding troubles

Posted: Sat May 25, 2019 3:51 am
by Nadahar
According to your log, the 803 file is detected as UTF-8 (because of the UTF-8 BOM, though it probably would be anyway):

Code:

TRACE 2019-05-24 15:26:21.273 [HTTPv2 Request Worker 6] Returning subtitles with priority External, type: SUBRIP, subtitles track title from metadata: _rus, lang: und, externalFile: Z:\Video\TV\Series\Game of Thrones\S08\TVS\tvs-got-dd51-dl-18p-azhd-avc-803_rus.srt, external file character set: UTF-8: {}
The problem is that it's not valid UTF-8. The same happens for 804:

Code:

Returning subtitles with priority External, type: SUBRIP, subtitles track title from metadata: _rus, lang: und, externalFile: Z:\Video\TV\Series\Game of Thrones\S08\TVS\tvs-got-dd51-dl-18p-azhd-avc-804_rus.srt, external file character set: UTF-8: {}
801 on the other hand is correctly detected:

Code:

TRACE 2019-05-24 15:26:21.074 [HTTPv2 Request Worker 6] Returning subtitles with priority External, type: SUBRIP, subtitles track title from metadata: _rus, lang: ru, externalFile: Z:\Video\TV\Series\Game of Thrones\S08\TVS\tvs-got-dd51-dl-18p-azhd-avc-801_rus.srt, external file character set: WINDOWS-1251: {}
This explains why this file is working properly.

Here is what FFmpeg reports when starting the transcoding of 803:

Code:

DEBUG 2019-05-24 15:27:04.292 [ffmpeg64.exe-10-2] [Parsed_subtitles_0 @ 00000000022a0f00] Unable to open Z:/Video/TV/Series/Game of Thrones/S08/TVS/tvs-got-dd51-dl-18p-azhd-avc-803_rus.srt
DEBUG 2019-05-24 15:27:04.292 [ffmpeg64.exe-10-2] [AVFilterGraph @ 00000000023cf260] Error initializing filter 'subtitles' with args 'Z\:/Video/TV/Series/Game of Thrones/S08/TVS/tvs-got-dd51-dl-18p-azhd-avc-803_rus.srt'
DEBUG 2019-05-24 15:27:04.293 [ffmpeg64.exe-10-2] Error initializing complex filters.
DEBUG 2019-05-24 15:27:04.293 [ffmpeg64.exe-10-2] Invalid data found when processing input
This is without any doubt caused by the fact that it's identified as UTF-8 but doesn't contain valid UTF-8 characters. My guess is that FFmpeg simply disregards the subtitles file as a result, and transcodes without subtitles.
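
A mismatch like this can be checked by hand. The sketch below is not what UMS or FFmpeg do internally; it simply reports whether a file starts with a UTF-8 BOM and whether the rest decodes as well-formed UTF-8. The function name `inspect_utf8` is made up for illustration.

```python
import codecs
from typing import Tuple


def inspect_utf8(raw: bytes) -> Tuple[bool, bool]:
    """Return (has_utf8_bom, body_is_well_formed_utf8) for the given bytes."""
    has_bom = raw.startswith(codecs.BOM_UTF8)
    body = raw[len(codecs.BOM_UTF8):] if has_bom else raw
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        return has_bom, False
    return has_bom, True
```

A result of `(True, False)` would mean the BOM claims UTF-8 while the content is something else. Note that this only catches malformed byte sequences; text that decodes cleanly but reads as gibberish would still pass.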

If you attach the original files, as you downloaded them, I can try to see if I can figure out the original encoding.

Re: .srt character encoding troubles

Posted: Sat May 25, 2019 9:12 pm
by mikeaoller
Nadahar wrote: Sat May 25, 2019 3:51 am If you attach the original files, as you downloaded them, I can try to see if I can figure out the original encoding.
Hi Nadahar,

Thank you very much for your help! All the files I uploaded are the original downloads; I didn't change anything.

Re: .srt character encoding troubles

Posted: Sat May 25, 2019 10:40 pm
by Nadahar
Even if you only open them in Notepad++ and change the encoding, Notepad++ will (in some cases) save the file. If you didn't do this, then I have no other explanation than that these files are already bad at the download location.

Are you able to get the files to show the correct Russian characters in any text editor, word processor or similar on your PC? If not, we can't expect UMS/FFmpeg to figure them out either. I was only able to get the "working" subtitles file to display properly as a text file, and when I look at the "bad" ones with a hex editor, they don't look valid to me. It seems to me that each character takes 16 bits, which would mean UTF-8 (where Cyrillic letters are encoded as two bytes) or UTF-16, but neither gives meaningful output. There might be another 16-bit encoding that I don't know about, but as I've also run them through several "online charset detectors", I doubt that's the case.
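
To illustrate the byte widths involved, here is a small sketch using an arbitrary Russian word as the sample (it is not taken from the subtitle files). The helper `bytes_per_char` is hypothetical.

```python
SAMPLE = "Привет"  # arbitrary Cyrillic sample text, six letters


def bytes_per_char(text: str, encoding: str) -> float:
    """Average number of bytes each character takes in the given encoding."""
    return len(text.encode(encoding)) / len(text)


for name in ("cp1251", "utf-8", "utf-16-le"):
    print(name, bytes_per_char(SAMPLE, name))
```

Windows-1251 uses one byte per Cyrillic letter, while UTF-8 and UTF-16 both use two, which is why a hex dump alone doesn't settle which of the latter two a file is meant to be.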

Re: .srt character encoding troubles

Posted: Sun May 26, 2019 2:15 am
by Nadahar
Could you post a URL for one of the non-working subtitles downloads? If you can't post it publicly, please send me a PM.

Re: .srt character encoding troubles

Posted: Sat Jun 01, 2019 12:07 am
by mikeaoller
https://www.opensubtitles.org

Game of Thrones > season 8 > Russian only > Select episode > Download the most downloaded srt.

Re: .srt character encoding troubles

Posted: Sat Jun 01, 2019 12:23 pm
by Madoka
OpenSubtitles just tried to force install an extension on FF. Nice.

Code:

OpenSubtitles Ads by Subtitle Pro

OpenSubtitles Ads removes Google Ads and inserts our Ads instead.
Yeah, that's useful.

Re: .srt character encoding troubles

Posted: Wed Jun 05, 2019 4:40 am
by Nadahar
@Madoka I really hate it when they do things like this, but I guess everybody craves money :( Anyway, it happened to me too, and it doesn't seem like they actually tried to install it; they simply tried to open the page where you could manually install the plugin. However, the plugin isn't available, for me at least, so it seems like Firefox might have removed it. It's still a terrible move by opensubtitles.org.

@mikeaoller When trying to follow your instructions, I ended up with this file: https://www.opensubtitles.org/en/subtit ... g-night-ru

I downloaded it and checked and it's encoded as Windows 1251. The encoding is correctly identified and it works when I test it. I also used Notepad++ and converted it to UTF-8. This file was also correctly identified and played. I'll attach the two files I used in my test. It seems to me like this can't be the same file that you used.

Re: .srt character encoding troubles

Posted: Sun Oct 13, 2019 7:33 am
by Nadahar
Hi spambot, it's not funny anymore now..