[Transcoding folders] Limiting options

Nadahar
Developer
Posts: 1153
Joined: Tue Jun 09, 2015 5:57 pm

Re: [Transcoding folders] Limiting options

Post by Nadahar » Mon Apr 16, 2018 3:26 am

Thank you for your feedback. A lot has happened lately, so I haven't had the time to answer.
Madoka wrote:
Mon Apr 09, 2018 3:00 pm
I set "Full Subtitle info", and was able to determine that UMS can detect the correct language on ANSI and UTF-8 encoded subs, but not UTF-8-BOM encoded subs.
I didn't remember that there was a "heuristics"-based language evaluation for when the other methods of determining the language fail. It seems that there's some issue with files with a BOM and this detection using ICU4J, which we use for this. It's probably something we could work around or handle in a better way, but I haven't looked at that code at all.
Madoka wrote:
Mon Apr 09, 2018 3:00 pm
Just personal preference, perhaps because I'm used to them this way, but I like the Engine name after the Filename as before.
I think it mostly comes down to what we're used to. This wasn't really consistent before, and I chose to put them first. I agree that it can be a bit strange sometimes, but separating it clearly from the "additional info" that is appended also has its benefits. It's easier for us to explain to a user what to look for when these aren't mixed up. It's not set in stone though, but I think it works now that I've gotten a little bit used to it.
Madoka wrote:
Sun Apr 15, 2018 8:34 am
For what it's worth, the "Subtitle Icon bug when thumbnails are off" that I've mentioned before is gone with this build. I haven't tried the Live Subtitles option as I usually download my own, but I haven't noticed any new bugs so far.
A lot of the work with this build is on the Live Subtitles, and the code is completely new. That said, not everything is finished on that part yet so some issues remain as to the "quality" of the matches and the fact that the same subtitles will often be listed more than once.
Madoka wrote:
Tue Apr 10, 2018 4:11 am
The "add a new subtitle into an opened folder" issue is still there. It rescans the folder but will not use the newly added subtitle, as before. I didn't mention this last time, but just restarting the server with the button doesn't help; I have to quit out and restart UMS for it to see the new subtitle file.
I'm a bit surprised by this, but it's also a bit complicated. There's a much more fundamental problem at play here, which is the way UMS caches media resources. 7.0.0 tries to address some of this, but I'm not sure of the details. The way it works pre 7.0.0 is that whenever a resource is "discovered" it's kept in memory until the program is restarted/quit. This causes several issues, both a lot of memory use for huge libraries and the fact that things don't refresh. A folder is rescanned every time you browse into the folder, but changes and removals aren't handled. New media files are added though. While this has several huge drawbacks and should be redesigned, it's also a matter of performance. It would be very easy to not store this in memory and resolve it every time, but browsing performance would be terrible. Thus, a smarter solution is needed, which both provides caching and the ability to react to changes. As of now, no such solution is implemented.
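
To make the "caching plus the ability to react to changes" idea concrete, here is a minimal sketch in Python. It is not UMS code and not how 7.0.0 does it; `FolderCache`, `folder_signature` and the `parse` callback are hypothetical names made up for illustration. The assumption is that listing a folder is cheap while parsing the media in it is expensive, so the cache only re-parses when the cheap listing changes.

```python
import os
from typing import Callable, Dict, List, Tuple

# A cheap fingerprint of a folder: (name, mtime in ns, size) for every entry.
Signature = Tuple[Tuple[str, int, int], ...]


def folder_signature(path: str) -> Signature:
    """Return a fingerprint that changes when files are added, removed or modified."""
    entries = []
    with os.scandir(path) as it:
        for entry in it:
            st = entry.stat()
            entries.append((entry.name, st.st_mtime_ns, st.st_size))
    return tuple(sorted(entries))


class FolderCache:
    """Caches the (expensive) parsed result per folder until the folder changes."""

    def __init__(self, parse: Callable[[str], List[str]]) -> None:
        self._parse = parse  # hypothetical expensive step, e.g. media parsing
        self._cache: Dict[str, Tuple[Signature, List[str]]] = {}

    def list_resources(self, path: str) -> List[str]:
        signature = folder_signature(path)
        cached = self._cache.get(path)
        if cached is not None and cached[0] == signature:
            return cached[1]  # nothing changed, reuse the cached result
        resources = self._parse(path)  # folder changed (or first visit): re-parse
        self._cache[path] = (signature, resources)
        return resources
```

With something like this, browsing into an unchanged folder stays fast, while an added, changed or removed subtitle file changes the signature and triggers a fresh parse on the next browse.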

The above means that the subtitles information shown behind the name won't change. The cache is per renderer though, so a new renderer should see the updated information. That doesn't really help a lot though, as there is no way you can clear the renderer's cache without restarting UMS. What I did to try to mend this is that the subtitles are rescanned, and which one should be used is reevaluated, every time a new transcode is started. That should mean that even though the "old" subtitles information will be shown on the renderer, the "new" subtitles should be used while transcoding. For streaming subtitles it's not so easy, as the information given to the renderer about the URL of the subtitles file is a part of the non-updating cache. Internal subtitles without transcoding should work fine, but it's hardly relevant to add new internal subtitles ;)

It would be nice if you could confirm or deny whether the added subtitles will actually be used during playback even though the displayed information isn't updated.

Madoka
Posts: 301
Joined: Fri Jun 01, 2012 12:51 pm

Re: [Transcoding folders] Limiting options

Post by Madoka » Mon Apr 16, 2018 5:20 am

Thank you for your reply and explanations.

One main reason I mentioned the UTF-8-BOM subs is that I think Aegisub is one of the most popular external subtitle programs, and it automatically produces UTF-8-BOM encoded subs. It's not a setting you can change so I think most subbers won't even notice. I use Aegisub myself.
Nadahar wrote:
Mon Apr 16, 2018 3:26 am
It would be nice if you could confirm or deny whether the added subtitles will actually be used during playback even though the displayed information isn't updated.
I just tried it again, and I can confirm that it does not automatically use the newly added subs. However, this time I looked into the Transcode folder and found the new subs listed in the folder. Selecting the subs there will result in proper use of the subs. This is a nice workaround, and I am sure I looked in there before, and I could have sworn it wasn't listed in prior versions of UMS, although I may not have checked in recent versions.

I see that UMS 7.0.1 has been released. But I like this build much more as I need subs for practically everything I watch; I'm going to stick with it until the subtitle improvements are added to the main release branch.

One question: does this build recognize Java 9? I know 6.8.0 does, but this build says 6.8.0-SNAPSHOT, so I'm not sure anymore.

Re: [Transcoding folders] Limiting options

Post by Nadahar » Mon Apr 16, 2018 10:29 am

The BOM issue is something that should be looked at, but I haven't done so (yet) so I don't know what the possibilities are. I wasn't aware that this limitation existed, although I would recommend having the language code or name (somevideo.en.srt, somevideo.eng.srt, somevideo.english.srt) in the sub file name anyway. I never trust "auto"detection for something like that ;)
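
As an illustration of the naming convention, here is a small Python sketch that reads the language from the second-to-last dot-separated part of the file name. It is not the UMS implementation; `language_from_filename` and the tiny `KNOWN_LANGUAGES` table are made up for the example, and a real lookup would cover the full ISO 639 lists.

```python
from pathlib import Path
from typing import Optional

# Hypothetical, deliberately tiny lookup table: tag in file name -> ISO 639-2 code.
KNOWN_LANGUAGES = {
    "en": "eng", "eng": "eng", "english": "eng",
    "no": "nor", "nor": "nor", "norwegian": "nor",
    "ja": "jpn", "jpn": "jpn", "japanese": "jpn",
}


def language_from_filename(filename: str) -> Optional[str]:
    """Return the language tagged in e.g. 'somevideo.en.srt', or None if untagged."""
    parts = Path(filename).name.lower().split(".")
    if len(parts) < 3:
        return None  # only 'name.ext', so there is no room for a language tag
    # The tag sits right before the extension; unknown tags are ignored.
    return KNOWN_LANGUAGES.get(parts[-2])
```

Only tags found in the table count, so a name like `some.video.2018.srt` is treated as untagged rather than as language "2018".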

Regarding not using the newly added subs, I'd have to look into that because I was sure it would work. That said, it would only work if the new subtitle has a "higher priority" in some way, e.g. being a language of higher priority in your configuration. When UMS finds multiple subtitles of "equal priority" it will simply use the first in the list, and that will be the one found first aka the file that was there in the first place.
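
A rough Python sketch of that selection rule, under the assumption that priorities are a simple ordered list of language codes in which "und" and "*" match any language. The function names are hypothetical and this is not the actual UMS code.

```python
from typing import Optional, Sequence, Tuple

WILDCARDS = {"und", "*"}  # entries that match any language, including unknown


def priority_of(language: Optional[str], priorities: Sequence[str]) -> Optional[int]:
    """Return the rank of the first matching priority entry (lower is better)."""
    for rank, wanted in enumerate(priorities):
        if wanted == language or wanted in WILDCARDS:
            return rank
    return None  # this language is not wanted at all


def pick_subtitle(
    candidates: Sequence[Tuple[str, Optional[str]]],
    priorities: Sequence[str],
) -> Optional[str]:
    """Pick a file from (filename, language) pairs given in discovery order."""
    best: Optional[Tuple[int, str]] = None
    for filename, language in candidates:
        rank = priority_of(language, priorities)
        if rank is None:
            continue
        # Strict '<' keeps the earlier file when two candidates tie.
        if best is None or rank < best[0]:
            best = (rank, filename)
    return best[1] if best is not None else None
```

With two English subtitles and English as the top priority, the file that was found first wins, which matches the behaviour described above: a newly added file only takes over if it ranks strictly higher.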

I'm afraid using the TRANSCODE folder isn't a safe workaround either. I might remember wrong, but I think it's only populated the first time you enter it because of the same cache mechanism as described above. That means that if you don't enter the TRANSCODE folder before you add the new file, add the file and then enter it, it will work - but only once. The underlying subtitles code does handle the changes though, but it won't result in anything until the cache mechanism is improved.

This version is identical to 6.8.0 plus the subtitles stuff, so it supports everything 6.8.0 does. The SNAPSHOT version and the warning text in the title are standard stuff that's always shown for non-release builds.

It should thus work with Java 9 as well or as badly as 6.8.0 does. That said, Java 9 is a real disaster and I think it will be a long time until it is properly supported, so I would stay with Java 8 for the time being. The reason is that Oracle decided to close access to a lot of crucial functionality that has "always been there" in Java but hasn't officially been a part of the API. They have provided alternatives for some things, but others are simply gone. A lot of software relies on some of these things, including several of the libraries UMS uses.

I don't know what will happen in the long run. I guess a lot of people were hoping that Oracle would come to their senses, but until these problems are resolved and we have libraries available that work properly with Java 9, there will be potential issues when using it. Some versions of Java 9 allow some of the removed stuff but issue warnings, while other versions simply block it completely. UMS' GUI doesn't work at all with the completely blocked versions.

Java 9 is already "old" too, Java 10 has already been released. It has the same problems as Java 9 though. Java 11 will be released in another 6 months. They seem to have gone completely mad. Java 8 is to be supported until 2021ish, far longer than both Java 9 and 10, so for the moment I simply just try to ignore the mess they have made and stick to Java 8 until the dust settles.

Re: [Transcoding folders] Limiting options

Post by Madoka » Mon Apr 16, 2018 12:47 pm

Thank you for the explanations. The advice on having the language name in the file name is very helpful. I didn't know that UMS would prioritize that over auto-detection.

I do find the "newly added subs" issue interesting, as UMS clearly recognized the new subs, having listed them in the Transcode folder. In my case it usually goes from no subtitle associated with the video file to exactly one new one. Since new video files are displayed correctly, would it be possible to force UMS to refresh the cache only when new subtitles are found?

Thank you for the wisdom on Java. I had recently looked to see what is the current version by Googling it, and I got back a page to the official Java 8-u161 downloads, which totally confused me as I had heard about people using Java 9 and even release builds of 10. I'm going to stay with Java 8 then.

Re: [Transcoding folders] Limiting options

Post by Madoka » Mon Apr 16, 2018 2:28 pm

Ok, this is interesting. So I added .en to the subtitles in the folder that I was testing, the ones with the -BOM files. I exited the folder and re-entered. After a rescan, the subtitles are recognized with the {Ext Subs} info tag. I had played the file with subs via the Transcode folder earlier in the day. If I then remove the .en, exit and re-enter, the info tag will disappear, and it will not play with the subs unless I go into the Transcode folder. If I add it back, it will recognize them again.

But it gets weird. If I watch part of the video file (say Ep 3), then the above will not work. It will stay in the state it was in when I played it, i.e., if there were no subs, adding subs will require me to look in the Transcode folder, although they will be labeled as English. Even stranger, if I then add another subtitle to a later episode (say Ep 4) and force a rescan, the tag for Ep 3 will show up (because I played it?).

It seems as if UMS gets confused if it can't ID the language. It does better if I help it when using UTF-8-BOM files.

Edit: I do have "und" listed in my subtitle priorities as the third choice. The -BOM subtitle files are listed as Unk. by UMS, which I think means the same thing.

Re: [Transcoding folders] Limiting options

Post by Nadahar » Wed Apr 18, 2018 9:53 pm

When you have "und" or "*" listed in your subtitles priorities, that means "any language". "Unk." means unknown language, and it only matches "any language".

I've looked a little bit at the BOM issue, and it turns out the problem is quite different. The reason UMS is able to "detect" the language of UTF-8 without BOM is actually a bug. ICU4J isn't able to detect language from Unicode text at all, because it detects language on a per-charset basis using specific byte code patterns for that charset. UTF-8 without BOM is seen as ISO-8859-1 and processed as such. There is a limited set of languages that is expected to be found in this charset, so it will only evaluate: Danish, German, English, Spanish, French, Italian, Dutch, Norwegian, Portuguese and Swedish. Any UTF-8 file without BOM will thus be detected as "the most likely" of these languages and will be completely wrong for the majority of languages.

Making a byte/codepoint-based language detection for Unicode might be possible, but it seems that ICU4J hasn't implemented this. To fix the bug, language detection should actually be removed for UTF-8 files. The problem is that UTF-8 and ISO-8859-1 are identical for one-byte characters, so there's no way to tell them apart unless a multi-byte code point is present, in which case I expect that ICU4J will recognize the file as UTF-8 and not return a language. You can say that there is some logic to the current way: if there are no multi-byte code points present, it means that the text actually sticks to the characters in ISO-8859-1, which means that it's likely that the language is one of those detected. It is far from certain though.

Because there's really no way to distinguish UTF-8 from ISO-8859-1 when only one-byte code points are used, I'd say that the current implementation is the best solution, although far from ideal. Subbers should learn to include the language code in the name though.
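
The ambiguity can be shown with a few lines of Python. This is only an illustration of the byte-level argument, not of how ICU4J works internally; `classify_encoding` and its return labels are made up for the example.

```python
import codecs


def classify_encoding(data: bytes) -> str:
    """Very rough classification of subtitle bytes, for illustration only."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-bom"  # the BOM (EF BB BF) marks the file as UTF-8
    if all(byte < 0x80 for byte in data):
        # Only one-byte code points: the bytes are identical in UTF-8 and
        # ISO-8859-1, so the encoding cannot be told apart.
        return "ascii-only"
    try:
        data.decode("utf-8")
        return "utf-8"  # valid multi-byte sequences are present
    except UnicodeDecodeError:
        return "single-byte"  # high bytes that are not valid UTF-8


# Text without any non-ASCII characters produces the same bytes either way.
assert "Hello".encode("utf-8") == "Hello".encode("iso-8859-1")
```

A file only reveals itself as UTF-8 through its BOM or through at least one multi-byte sequence; without either, a detector has nothing to go on except a guess.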

Re: [Transcoding folders] Limiting options

Post by Madoka » Thu Apr 19, 2018 10:21 am

Thanks for looking into the problem and for your explanation. I agree that it'd be nice if subbers identified the language in their subtitles, but unfortunately I get "Unk." internal SSA subs in anime all the time. For the external ones, it's easy enough for me to add the language to the name.

I really like the changes in this build and look forward to future improvements. I'm going to stick with this build until things are added to the master branch and released.
