UMS getting stuck, some hard to spot bug

General discussion about Universal Media Server (no support or requests)
bodom
Posts: 24
Joined: Wed Nov 29, 2017 9:18 am

Re: UMS getting stuck, some hard to spot bug

Post by bodom »

It happened again. I now have additional info.

The system becomes unusable because it hits the cgroups thread limit, which makes it impossible to fork (and therefore impossible to log in, too). Luckily, this time I had left a shell open :mrgreen:

Long story short, the Java process "java -Xmx768M -Xss2048k -Dfile.encoding=UTF-8 -Djava.net.preferIPv4Stack=true -Djna.nosys=true -classpath update.jar:ums.jar net.pms.PMS trace" has 10806 (!!!) threads. The logs apparently give no clue.

Any ideas?

Edit: after raising the maximum thread count, UMS is back in a working state (without a restart), but it keeps leaking threads (10828 now). Sooner or later it will hit the hard thread limit (32767) anyway and definitely crash the system.
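For reference, the thread count of the UMS process can be checked with something like this (plain shell; "net.pms.PMS" is taken from the command line above):

Code: Select all

# number of native threads (LWPs) of the UMS java process
ps -o nlwp= -p "$(pgrep -of net.pms.PMS)"

# equivalent via /proc
ls "/proc/$(pgrep -of net.pms.PMS)/task" | wc -l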
owlhuang
Posts: 5
Joined: Wed Aug 17, 2016 6:55 pm

Re: UMS getting stuck, some hard to spot bug

Post by owlhuang »

Just want to jump in: I hit the same resource leak. UMS cannot create any new threads and hangs the system. I am running 8.1.0, but I believe it happens on 6.8.0 as well. Since it is not the first time, I also have a log tracking the number of threads UMS is using (via "ps -eLf"). It shows that UMS runs out of all 32767 native threads at around 7 am. I'll keep monitoring the new version.

Code: Select all

2019/05/12 01:00:02-0700 UMS: Total of 59 threads
2019/05/12 02:00:01-0700 UMS: Total of 60 threads
2019/05/12 03:00:01-0700 UMS: Total of 641 threads
2019/05/12 04:00:01-0700 UMS: Total of 9044 threads
2019/05/12 05:00:02-0700 UMS: Total of 16577 threads
2019/05/12 06:00:02-0700 UMS: Total of 24139 threads
2019/05/12 07:00:02-0700 UMS: Total of 31713 threads
 UMS: Total of  threads
 UMS: Total of  threads
2019/05/12 20:00:02-0700 UMS: Total of  threads

Code: Select all

DEBUG 2019-05-12 07:04:02.315 [UPNP-AliveMessageSender] Sending ALIVE...
DEBUG 2019-05-12 07:04:13.910 [UPNP-AliveMessageSender] Sending ALIVE...
DEBUG 2019-05-12 07:04:25.386 [UPNP-AliveMessageSender] Sending ALIVE...
INFO  2019-05-12 07:04:26.745 [cling-2] Exception in thread "cling-2" java.lang.OutOfMemoryError: unable to create new native thread
INFO  2019-05-12 07:04:26.745 [cling-2]         at java.lang.Thread.start0(Native Method)
INFO  2019-05-12 07:04:26.745 [cling-2]         at java.lang.Thread.start(Thread.java:717)
INFO  2019-05-12 07:04:26.745 [cling-2]         at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
INFO  2019-05-12 07:04:26.745 [cling-2]         at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1025)
INFO  2019-05-12 07:04:26.745 [cling-2]         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
INFO  2019-05-12 07:04:26.745 [cling-2]         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
INFO  2019-05-12 07:04:26.746 [cling-2]         at java.lang.Thread.run(Thread.java:748)
DEBUG 2019-05-12 07:04:36.941 [UPNP-AliveMessageSender] Sending ALIVE...
DEBUG 2019-05-12 07:04:48.419 [UPNP-AliveMessageSender] Sending ALIVE...
DEBUG 2019-05-12 07:04:59.893 [UPNP-AliveMessageSender] Sending ALIVE...
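For reference, an hourly thread-count log like the first block above can be produced by running a small script from cron; this is only a sketch, and the script name and log path are made up:

Code: Select all

#!/bin/sh
# log-ums-threads.sh (hypothetical): run hourly from cron
# "ps -eLf" prints one line per LWP, so counting the matching lines gives the thread total;
# the [n] in the pattern keeps grep from matching its own process entry
COUNT=$(ps -eLf | grep -c '[n]et\.pms\.PMS')
echo "$(date '+%Y/%m/%d %H:%M:%S%z') UMS: Total of $COUNT threads" >> /var/log/ums-threads.log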
bodom
Posts: 24
Joined: Wed Nov 29, 2017 9:18 am

Re: UMS getting stuck, some hard to spot bug

Post by bodom »

Yep, it still happens in 8.1.0.

I am now setting up a script to kill and restart it when it approaches the thread limit.
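Something along these lines should do; this is only a sketch, and it assumes UMS runs as a systemd service called "ums" (the threshold is arbitrary):

Code: Select all

#!/bin/sh
# hypothetical watchdog: restart UMS when its thread count approaches the limit
THRESHOLD=14000
PID=$(pgrep -of net.pms.PMS) || exit 0
THREADS=$(ps -o nlwp= -p "$PID" | tr -d ' ')
if [ "$THREADS" -gt "$THRESHOLD" ]; then
    echo "$(date) UMS at $THREADS threads, restarting" >> /var/log/ums-watchdog.log
    systemctl restart ums    # adjust to however UMS is started on your system
fi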
bodom
Posts: 24
Joined: Wed Nov 29, 2017 9:18 am

Re: UMS getting stuck, some hard to spot bug

Post by bodom »

I can't remember clearly, but I have the feeling this bug has "always" been present, since I became a UMS user back in the early 7.x days. It is still present in 9.0.0-b2.

At the beginning it took about a month for UMS to exhaust the thread resources and hang the system; now it takes only a few days. That may not be due to a change in the code, though, but to changes in my library. Also, once I figured out what the problem was, I limited UMS to 15,000 threads so it can't hang the whole system anymore.

UMS 9.0.0-b2 is currently sitting at 14,999 threads and hung. Is there any test I can do to help you spot the issue before restarting it?

Also, I'm not sure whether it is related in any way, but from the log I see UMS is wasting a lot of effort trying to track vanishing '.tmp' files.

Code: Select all

DEBUG 2019-08-16 13:55:43.568 [File watcher] net.pms.util.FileWatcher ENTRY_DELETE (ct=1): /path/files/tmp.tmp
TRACE 2019-08-16 13:55:43.569 [File event] net.pms.formats.FormatFactory Could not match any format to "/path/files/tmp.tmp"
TRACE 2019-08-16 13:55:43.569 [File event] net.pms.dlna.RootFolder File /path/files/tmp.tmp was deleted or moved on the hard drive, removing it from the database
TRACE 2019-08-16 13:55:43.569 [File event] net.pms.dlna.DLNAMediaDatabase Deleting rows from FILES table where the filename is "/path/files/tmp.tmp"
TRACE 2019-08-16 13:55:43.569 [File event] net.pms.formats.FormatFactory Could not match any format to "/path/files/tmp.tmp"
TRACE 2019-08-16 13:55:43.569 [File event] net.pms.formats.FormatFactory Could not match any format to "/path/files/tmp.tmp"
TRACE 2019-08-16 13:55:43.569 [File event] net.pms.formats.FormatFactory Could not match any format to "/path/files/tmp.tmp"
TRACE 2019-08-16 13:55:43.569 [File event] net.pms.dlna.RootFolder File tmp.tmp was not recognized as valid media so was not added to the database
TRACE 2019-08-16 13:55:43.571 [File event] net.pms.dlna.DLNAMediaDatabase Deleted 0 rows from FILES
TRACE 2019-08-16 13:55:43.571 [File event] net.pms.database.TableFilesStatus Removed entries 0 in FILES_STATUS for filename "/path/files/tmp.tmp"
Is there a way to tell UMS to just ignore *.tmp files? If nothing else, it would make my logs cleaner and any relevant message easier to spot.
bodom
Posts: 24
Joined: Wed Nov 29, 2017 9:18 am

Re: UMS getting stuck, some hard to spot bug

Post by bodom »

So, I've been monitoring the UMS thread count for a while, and here is what happens on UMS 9.0.0-b2:

[Image: graph of the UMS thread count over time]

The server stays "good" for a while, then something happens and it runs out of control within two hours.
I have some file events around the time the thread count started to grow… could this somehow be related to the file watcher?
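To check that (assuming a JDK with jstack is available on the box), grouping the thread names from a thread dump should show which component the leaked threads belong to:

Code: Select all

# dump all threads of the UMS process and group them by name prefix
jstack "$(pgrep -of net.pms.PMS)" | grep '^"' | cut -d'"' -f2 | sed 's/[-0-9]*$//' | sort | uniq -c | sort -rn | head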