This is my third post of my Google Summer of Code 2018 series. Links for the previous posts can be found below:
As I mentioned in my last post, OBS workers were hanging on my local installation. I finally got to the point where the only missing piece of my local instalation (to have a raw OBS install which can build Debian packages) was to figure out this issue with the OBS workers.
The issue here was that the OBS Download on Demand feature would download the package dependencies needed to build the Debian packages imported but would just hang as scheduled forever. This was not an easy issue to debug, since neither the worker nor the schedulers were throwing any outputs to the logs.
After a while, as I inspected OBS code, I realized that OBS workers run on screen by default. I noticed that screen was running on my virtual machines even after I tried to stop the worker deamon. Worse than that, if I’d try to restart the deamon, multiple worker processes would be spawned, and the builds would still remain as scheduled. I then attached my shell to the screen session initialized by the OBS worker and found out that the actual OBS worker exectuable script was not running. The screen session was stuck in a screen with a permission denied message. But why?
The OBS worker checks out some code and configurations from the OBS server
instance it points to. It then saves the code and configurations under the /run
directory. This path is configurable on the /etc/default/obsworker
file,
under the OBS_RUN_DIR
variable. The permission denied message on the
screen session pointed to this file under /run
. Then, I tried running the
commands specified in the deamon init script, which involved running a script
under /run
, and got the same permission denied error.
I kept digging deeper and found that on Debian default installations, /var/run
is a symlink to /run, and that it is mounted with the noexec
flag on. This
means that files under that directory cannot be executed, even if they do have
execution permissions set.
Systemd is the responsible for mounting the /run
directory, and this is
specified in the binaries to splicitly avoid that users change the way this
directory is mounted. See
https://freedesktop.org/wiki/Software/systemd/APIFileSystems for further
reference.
As pointed in a
thread in
the freedesktop mailing lists, the /run directory does not have the noexec
flag on by default because initrd
may place executables in that directory so
it can cleanly disassemble the /
file system. This fact is also reinforced by
one of systemd’s documentation
pages
Now we know that the /run
directory in Debian is mounted with the noexec
flag on by default.
athos@debian$ mount | grep '/run'
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=404932k,mode=755)
We also know that this is not the systemd default. I also checked other systems
to make sure files under /run
would have permission to be executed.
Fedora:
athos@fedora$ mount | grep '/run'
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
Since OBS is developed by OpenSUSE people, having OBS_RUN_DIR=/run
by default
for OBS workers is not a problem and will not get workers stuck on OBS
installations on OpenSUSE machines. This is not true in Debian though.
Two questions remain:
noexec
flag on the /run
directory?To answer question 1, I started by looking at the systemd Debian package,
looking for patches that would set the MS_NOEXEC
flag for the /run partition
on the APIFileSystems code. No patches were found. I had no idea where to go
from there, so I asked the guys on the #debian-systemd IRC channel. Someone
pointed me to the initramfs, and there it was:
athos@debian:/usr/share/initramfs-tools$ cat init | grep '/run'
mount -t tmpfs -o "noexec,nosuid,size=10%,mode=0755" tmpfs /run
I still cannot answer why this is the Debian way of mounting /run
, and I
decided that I would stop my investigations here, since I was heading way out
of the scope of my project.
We now have 3 possible solutions for the issue:
Should OBS set OBS_RUN_DIR=/run
by default? Maybe that should not be the
default directory for that variable, since it executes code in that path. To
answer this question, we must understand why Debian sets the noexec
flag
to the /run
directory, so we can justify a patch upstream.
The OBS package should definitely change the default path to the OBS_RUN_DIR
directory, since the workers will not run on default Debian installations with
the variable set to it’s default value (/run
).
Until we do get our changes into the OBS package, we will change the value of
the OBS_RUN_DIR
variable on the workers configuration file in our salt scripts.
Patches will be available in the next days and links will follow in my next post.
We now have a working instance of OBS running on https://irill8.siege.inria.fr. I will also set a default Debian unstable Download on Demand project there on the upcoming days so we can start building our packages against our OBS instance.