June 6, 2018

This is my third post of my Google Summer of Code 2018 series. Links for the previous posts can be found below:

About the stuck OBS workers

As I mentioned in my last post, OBS workers were hanging on my local installation. I finally got to the point where the only missing piece of my local instalation (to have a raw OBS install which can build Debian packages) was to figure out this issue with the OBS workers.

The problem

The issue here was that the OBS Download on Demand feature would download the package dependencies needed to build the Debian packages imported but would just hang as scheduled forever. This was not an easy issue to debug, since neither the worker nor the schedulers were throwing any outputs to the logs.

Debugging

After a while, as I inspected OBS code, I realized that OBS workers run on screen by default. I noticed that screen was running on my virtual machines even after I tried to stop the worker deamon. Worse than that, if I’d try to restart the deamon, multiple worker processes would be spawned, and the builds would still remain as scheduled. I then attached my shell to the screen session initialized by the OBS worker and found out that the actual OBS worker exectuable script was not running. The screen session was stuck in a screen with a permission denied message. But why?

The OBS worker checks out some code and configurations from the OBS server instance it points to. It then saves the code and configurations under the /run directory. This path is configurable on the /etc/default/obsworker file, under the OBS_RUN_DIR variable. The permission denied message on the screen session pointed to this file under /run. Then, I tried running the commands specified in the deamon init script, which involved running a script under /run, and got the same permission denied error.

I kept digging deeper and found that on Debian default installations, /var/run is a symlink to /run, and that it is mounted with the noexec flag on. This means that files under that directory cannot be executed, even if they do have execution permissions set.

Systemd is the responsible for mounting the /run directory, and this is specified in the binaries to splicitly avoid that users change the way this directory is mounted. See https://freedesktop.org/wiki/Software/systemd/APIFileSystems for further reference.

As pointed in a thread in the freedesktop mailing lists, the /run directory does not have the noexec flag on by default because initrd may place executables in that directory so it can cleanly disassemble the / file system. This fact is also reinforced by one of systemd’s documentation pages

Now we know that the /run directory in Debian is mounted with the noexec flag on by default.

athos@debian$ mount | grep '/run'
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=404932k,mode=755)

We also know that this is not the systemd default. I also checked other systems to make sure files under /run would have permission to be executed.

Fedora:

athos@fedora$ mount | grep '/run'
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755)

Since OBS is developed by OpenSUSE people, having OBS_RUN_DIR=/run by default for OBS workers is not a problem and will not get workers stuck on OBS installations on OpenSUSE machines. This is not true in Debian though.

Two questions remain:

  • Who (what package) is setting the noexec flag on the /run directory?
  • Why?

To answer question 1, I started by looking at the systemd Debian package, looking for patches that would set the MS_NOEXEC flag for the /run partition on the APIFileSystems code. No patches were found. I had no idea where to go from there, so I asked the guys on the #debian-systemd IRC channel. Someone pointed me to the initramfs, and there it was:

athos@debian:/usr/share/initramfs-tools$ cat init | grep '/run'
mount -t tmpfs -o "noexec,nosuid,size=10%,mode=0755" tmpfs /run

I still cannot answer why this is the Debian way of mounting /run, and I decided that I would stop my investigations here, since I was heading way out of the scope of my project.

The solution

We now have 3 possible solutions for the issue:

Patching upstream

Should OBS set OBS_RUN_DIR=/run by default? Maybe that should not be the default directory for that variable, since it executes code in that path. To answer this question, we must understand why Debian sets the noexec flag to the /run directory, so we can justify a patch upstream.

Patching the OBS package

The OBS package should definitely change the default path to the OBS_RUN_DIR directory, since the workers will not run on default Debian installations with the variable set to it’s default value (/run).

Configure OBS workers with salt

Until we do get our changes into the OBS package, we will change the value of the OBS_RUN_DIR variable on the workers configuration file in our salt scripts. Patches will be available in the next days and links will follow in my next post.

OBS staging instance

We now have a working instance of OBS running on https://irill8.siege.inria.fr. I will also set a default Debian unstable Download on Demand project there on the upcoming days so we can start building our packages against our OBS instance.

Next steps (A TODO list to keep on the radar)

  • Write patches for the OBS worker issue described in this post
  • Configure Debian projects on OBS with salt, not manually
  • Change the default builder to perform builds with clang
  • Trigger new builds by using the dak/mailing lists messages
  • Verify the rake-tasks.sh script idempotency and propose patch to opencollab repository
  • Separate salt recipes for workers and server (locally)
  • Properly set hostnames (locally)

Comments

comments powered by Disqus
← Setting up a local OBS development environment | Blog Archive | Some notes on the OBS Documentation →

about

Athos Ribeiro, Software Engineer, contributor at the Fedora Project

subscribe

To receive updates from this site, you can subscribe to the  RSS feed of all updates to the site in an RSS feed reader.

search