01-04-2021, 04:58 PM
(This post was last modified: 01-04-2021, 05:04 PM by TheOldPresbyope. Edit Reason: fix typos)
Shortly after Christmas, Alain reported a strange behavior in his moOde 7.0.1 player (initially in posts #37 and #84): after a period of time it became unresponsive. Those posts and the subsequent exchange established that it happens only when he enables moOde's local display with a small HDMI LCD display attached, and that it is a consequence of a Chromium process consuming the available memory. He reported that he does not see the same issue with moOde 6.7.1 installed on the same hardware.
I spent some time over the weekend trying to pin down the mechanism using an RPi3B+. It's deadly dull work because it takes hours for the failure to occur on my system after each change and reboot.
1) I confirmed that the behavior happens with moOde 7.0.1 and that it doesn't happen with moOde 6.7.1. Note that between these two releases, the underlying Raspberry Pi OS went from 10.4 to 10.6 and the installed Chromium went from 78.0.3904.108 to 84.0.4147.141 whereas AFAIK there was no significant change to moOde's implementation of the local display.
2) I confirmed that the behavior happens only when the HDMI output is involved, even if no HDMI display is connected to the RPi. No such behavior occurs with the Raspberry Pi 7" Touch Display, which is also driven by moOde's local display function but which is not an HDMI display (it uses the DSI port). As an aside, I believe this is why the test team didn't notice the behavior until the issue was raised. AFAIK none of us routinely uses an HDMI local display.
3) The startup of the local display goes like this:
a) When the local display function is enabled, xinit reads the commands in /home/pi/.xinitrc and starts an X11 server running an instance of the Chromium browser with a set of suitable options.
b) The Chromium process in turn spawns 8 more Chromium processes, which perform various functions and communicate with each other via shared memory.
- One of these Chromium processes is of type=renderer, started with a number of options established by the primary process (i.e., not set by moOde).
4) I wrote a simple bash script which logs data I considered significant, sleeps for 60 seconds, then repeats.
Note - all elapsed times reported in the following are approximate.
5) While the moOde player sits idle with the local display enabled, this script reports that for roughly 25 minutes of CPU time (about 4 hours of wall-clock time on an RPi3B+ with other processes competing for CPU cycles) everything is pretty stable. The free memory in the system is ca. 260 MB and the VmData segment of the type=renderer process is ca. 170 MB.
6) Then I see the VmData segment of the type=renderer process begin to grow and the free memory in the system begin to fall. The process grabs about 20 MB/min, and its OOM score (/proc/<pid>/oom_score) begins to climb.
7) After about 10 min (wall clock) of this, the system starts shrinking the memory available for buffers and cache in order to keep feeding this process.
8) After about another 20 min (wall clock), so much memory has been consumed that the system struggles to run other processes. The 60-second sleep cycle of my script begins stretching out (2 min... 20 min...) while I/O activity climbs as the system tries to move pages of memory to make room.
9) The status of the type=renderer process changes from a rhythmic S/R (sleeping/running) cycle to "D", usually called uninterruptible sleep.
10) Time stretches seemingly to infinity. The type=renderer process eventually becomes a zombie (maybe the OOM killer got it, but I can't speak to this). Normal memory distribution is restored. At some point, the process is removed from the process table. Somewhere in here, syslog reports errors concerning a blocked kworker task, mmc_rescan, etc.
11) Normal function of moOde is restored at some point, albeit without the local display because its renderer is gone. My logging doesn't say when, but I suppose it is when the memory is redistributed.
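For anyone wanting to reproduce the observations in 5) through 11), here is a minimal Python sketch of such a logger. It is not the bash script mentioned in 4), only an illustration that samples the same kinds of quantities from /proc: MemFree, Buffers and Cached from /proc/meminfo, VmData, Threads and State from /proc/<pid>/status, and /proc/<pid>/oom_score. It assumes the renderer can be recognized by a binary name containing "chromium" plus --type=renderer on its command line. The growth-rate and time-to-exhaustion helpers are a naive linear model added for illustration; all function names are hypothetical.

```python
import os
import time
from typing import Dict, Iterable, List, Sequence, Tuple


def is_renderer(cmdline: bytes) -> bool:
    """True if /proc/<pid>/cmdline looks like a Chromium renderer (assumed rule).

    Chromium children may rewrite their command line as one space-separated
    string, so NUL and whitespace are both treated as separators.
    """
    tokens = cmdline.replace(b"\0", b" ").split()
    if not tokens:
        return False
    exe = os.path.basename(tokens[0]).decode(errors="replace")
    return "chromium" in exe and b"--type=renderer" in tokens[1:]


def find_renderer_pids(proc_root: str = "/proc") -> List[int]:
    """Return the PIDs of all renderer processes found under proc_root."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                if is_renderer(f.read()):
                    pids.append(int(entry))
        except OSError:
            continue  # process exited meanwhile, or is unreadable
    return sorted(pids)


def parse_fields(text: str, keys: Iterable[str]) -> Dict[str, int]:
    """Extract the leading integer from 'Key:  1234 kB' lines for the given keys."""
    wanted = set(keys)
    out = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if name in wanted and fields and fields[0].isdigit():
            out[name] = int(fields[0])
    return out


def _read(path: str) -> str:
    with open(path) as f:
        return f.read()


def sample(pid: int, proc_root: str = "/proc") -> Dict[str, object]:
    """One sample: MemFree/Buffers/Cached and VmData in kB, Threads, State, oom_score."""
    status = _read(os.path.join(proc_root, str(pid), "status"))
    s: Dict[str, object] = {}
    s.update(parse_fields(_read(os.path.join(proc_root, "meminfo")),
                          ("MemFree", "Buffers", "Cached")))
    s.update(parse_fields(status, ("VmData", "Threads")))  # VmData is absent for a zombie
    state = next((ln for ln in status.splitlines() if ln.startswith("State:")), "State: ?")
    s["State"] = state.split()[1]  # e.g. "S", "R", "D" or "Z"
    s["oom_score"] = int(_read(os.path.join(proc_root, str(pid), "oom_score")))
    return s


def growth_rate(points: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of (minute, MB) points, e.g. VmData over time."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in points) / n
    mean_m = sum(m for _, m in points) / n
    den = sum((t - mean_t) ** 2 for t, _ in points)
    if den == 0:
        raise ValueError("samples must be taken at different times")
    return sum((t - mean_t) * (m - mean_m) for t, m in points) / den


def minutes_until_exhausted(free_mb: float, rate_mb_per_min: float) -> float:
    """Naive linear estimate of how long free memory lasts at a constant rate."""
    return free_mb / rate_mb_per_min if rate_mb_per_min > 0 else float("inf")


def log_forever(interval_s: float = 60.0, proc_root: str = "/proc") -> None:
    """Sample the first renderer found every interval_s seconds until it disappears."""
    pids = find_renderer_pids(proc_root)
    if not pids:
        print("no Chromium renderer found")
        return
    pid = pids[0]
    while True:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        try:
            s = sample(pid, proc_root)
        except OSError:
            print(stamp, pid, "gone", flush=True)
            return
        print(stamp, pid, " ".join(f"{k}={v}" for k, v in sorted(s.items())), flush=True)
        time.sleep(interval_s)
```

Calling `log_forever()` prints one line per interval until the renderer's /proc entry disappears. With the figures from 5) and 6), `minutes_until_exhausted(260, 20)` gives 13.0 minutes, in the same ballpark as the roughly 10 minutes before buffers and cache start shrinking in 7); presumably the kernel begins reclaiming cache before MemFree actually reaches zero.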
What's not happening? This is not thread exhaustion: no new processes or threads are spawned. This is not shared-memory exhaustion: the size of /dev/shm does not expand.
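The two checks in the previous paragraph can be made concrete with a short Python sketch, assuming the standard Linux /proc layout and a tmpfs mounted at /dev/shm. It counts a process's threads via /proc/<pid>/task, counts entries in the process table, and measures the bytes used on /dev/shm with `os.statvfs`; the helper names are hypothetical.

```python
import os


def shm_used_bytes(path: str = "/dev/shm") -> int:
    """Bytes in use on the filesystem at path (the 'Used' column of `df /dev/shm`)."""
    st = os.statvfs(path)
    return (st.f_blocks - st.f_bfree) * st.f_frsize


def thread_count(pid: int, proc_root: str = "/proc") -> int:
    """Threads in one process: /proc/<pid>/task has one entry per thread."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "task")))


def process_count(proc_root: str = "/proc") -> int:
    """Processes in the process table: the numeric entries under /proc."""
    return sum(1 for entry in os.listdir(proc_root) if entry.isdigit())
```

Recording these once during the stable idle period and again while VmData is climbing should show them staying flat if the explanation above holds.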
Incidentally, since Alain mentioned in one post that he modified an hdmi setting in /boot/config.txt, I should note that even clearing all the hdmi settings on my installation did not change the behavior. Nor is this issue caused by some library limitation.
Root cause?
So the issue would seem likely due to a change in Chromium between v78 and v84, although I can't begin to interpret the commits made to its code base in that interval. Less likely, but I suppose also possible, it's due to a change in Raspberry Pi OS. I can't explain, for example, why using the DSI interface instead of the HDMI interface would make a difference, and that difference certainly lies down at the OS level.
Internet searches turned up only a very few hits on some of the items I report above, and none of them revealed useful information. Published lists of undocumented Chromium options don't offer anything that appears useful to me. I don't know enough about the Chromium developer community to know how to go about asking for enlightenment, and the Chromium user forums are full of unanswered questions. It still might be that we are somehow misusing Chromium, but if there really is an issue then the devs ought to know about it.
Possible remediation?
1) I suppose the moOde watchdog script could be augmented to monitor some key datapoint (vmstat output, for example) and trigger a response when needed. Possibilities that come to mind:
a) simply restart xinit. This is pretty brutal and would cause the same initial screen-flash sequence we see when we start the local display. I consider it very distracting.
b) if this behavior truly occurs only when Chromium remains idle, then contrive to give it something innocuous to do from time to time. I like this better than a) but I haven't yet determined what might do the trick. Again, any screen activity would be distracting.
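Option 1) could take roughly the following shape. This Python sketch assumes samples are dictionaries of kB values keyed like the /proc fields (VmData from /proc/<pid>/status, MemAvailable from /proc/meminfo). The thresholds and cooldown are placeholders, not values tested on moOde, and `action` is a hypothetical hook standing in for whichever response is chosen, such as restarting xinit per a) or giving Chromium something to do per b). It does not reflect how the actual moOde watchdog script is written.

```python
import time
from typing import Callable, Dict, Optional


class RendererWatchdog:
    """Fire `action` when the renderer's VmData exceeds a limit or MemAvailable
    drops below a floor, at most once per cooldown period.

    Samples are dicts of kB values keyed like the /proc fields. Thresholds are
    placeholders (assumptions), not measured values.
    """

    def __init__(self, action: Callable[[Dict[str, int]], None],
                 vmdata_limit_kb: int = 400 * 1024,
                 min_available_kb: int = 64 * 1024,
                 cooldown_s: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.action = action
        self.vmdata_limit_kb = vmdata_limit_kb
        self.min_available_kb = min_available_kb
        self.cooldown_s = cooldown_s
        self.clock = clock
        self._last_fired: Optional[float] = None

    def check(self, sample: Dict[str, int]) -> bool:
        """Return True if the action was fired for this sample."""
        too_big = sample.get("VmData", 0) > self.vmdata_limit_kb
        # A missing MemAvailable is treated as "at the floor", i.e. not too little.
        too_little = sample.get("MemAvailable", self.min_available_kb) < self.min_available_kb
        now = self.clock()
        cooled = self._last_fired is None or now - self._last_fired >= self.cooldown_s
        if (too_big or too_little) and cooled:
            self._last_fired = now
            self.action(sample)
            return True
        return False
```

The cooldown keeps a brutal response like a) from firing repeatedly while memory is still being recovered; an injectable clock keeps the logic testable without waiting.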
Other alternatives
2) Put the Chromium processes in their own memory-limited cgroup. This might keep the rest of moOde running, but one still has to deal with the OOM issue.
3) Try installing an earlier version of the Chromium package to see if the issue resolves. I'm not interested in pursuing this approach.
4) Try enabling swap on the RPi3B+ or moving to an RPi4B with more memory (Chief Brody to Quint, "You're gonna need a bigger boat!", Jaws, 1975). This might ameliorate the situation, but it seems to me a very unsatisfactory solution.
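For alternative 2), a minimal Python sketch using the cgroup v1 memory controller might look like this. It assumes the controller is mounted at /sys/fs/cgroup/memory and that the script runs as root; on Raspberry Pi kernels the memory controller may be disabled by default and need enabling on the kernel command line in /boot/cmdline.txt. The group name and limit are hypothetical.

```python
import os
from typing import Iterable


def confine(pids: Iterable[int], limit_bytes: int,
            name: str = "moode-chromium",  # hypothetical group name
            cgroup_root: str = "/sys/fs/cgroup/memory") -> str:
    """Create a cgroup-v1 memory group, set its hard limit, and move pids into it.

    On a real cgroupfs, mkdir creates the control files automatically, and each
    write to cgroup.procs moves one process into the group.
    """
    group = os.path.join(cgroup_root, name)
    os.makedirs(group, exist_ok=True)
    with open(os.path.join(group, "memory.limit_in_bytes"), "w") as f:
        f.write(str(limit_bytes))
    for pid in pids:
        # One PID per write(), so reopen the file for each process.
        with open(os.path.join(group, "cgroup.procs"), "w") as f:
            f.write(str(pid))
    return group
```

This moves only the listed processes; children forked afterwards inherit the group, so one would either pass every Chromium PID or start xinit inside the group from the outset. When the group hits its limit, the kernel's OOM killer acts within the group, so the renderer would still be killed (the OOM issue noted in 2), but the rest of moOde should keep its memory.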
Regards,
Kent