Re: [BALUG-Talk] Lubuntu 14.04 guest session login failure

9 Jul 2017


Quoting Elizabeth K. Joseph (lyz@princessleia.com):
...
In case anyone was curious as to what happened with this, I finally
had some time to sit down on site this evening and do some debugging.
Nice detective work.
...
Some background as to how the guest logins work in Lubuntu: A
guest-XXXXX (random characters) user is created upon login, which is
used throughout the session. It is then deleted when the user logs
out.
After some red herrings in the auth logs (mostly PAM errors around KDE
and Gnome keyrings), I did some digging in the lightdm logs.
Eventually I noticed the UID of the guest account trying to be created
was the same every time a login attempt was made: 999. Odd. So I
looked in /etc/passwd and noticed that there were hundreds of
guest-XXXXX accounts. That's no good!
Turns out, at some point the /etc/subgid.lock file got stuck in an
existing state (wasn't deleted when the lock concluded), which meant
the command to delete the user was not completing successfully upon
logout. Users were piling up and never being deleted. Once the UIDs
hit 999 it was failing to create new guest users, so the login would
fail. A quick mv (rm didn't work) of the subgid.lock file and a script
to delete all the guest accounts got us going again.
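A minimal sketch of how the leftover accounts might be listed before cleanup, not the script actually used on site. It assumes the guest names are "guest-" followed by letters and digits (the exact character set is a guess), and the function name find_guest_accounts is made up for illustration.

```python
import re

# Assumption: guest accounts are named "guest-" plus random letters/digits,
# as described above; the real character set may differ.
GUEST_RE = re.compile(r"^guest-[A-Za-z0-9]+$")


def find_guest_accounts(passwd_text):
    """Return (name, uid) pairs for guest accounts found in passwd-format text."""
    guests = []
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        name, uid = fields[0], fields[2]
        if GUEST_RE.match(name) and uid.isdigit():
            guests.append((name, int(uid)))
    return guests


# Demo on made-up sample data rather than the live /etc/passwd.
sample = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "guest-Ab12C:x:998:998:Guest:/tmp/guest-Ab12C:/bin/bash\n"
)
for name, uid in find_guest_accounts(sample):
    # Only print the command; review the list before deleting anything.
    print(f"userdel -r {name}  # uid {uid}")
```

To run it against a real machine, pass the contents of /etc/passwd instead of the sample string, and check the output by hand before feeding it to userdel.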
Next time you encounter that situation, I'd be curious what
rm -fv /etc/subgid.lock
reports.  The '-f' is for force, which honestly won't help here, because
all it does, IIRC, is force rm to omit error reporting if the target
doesn't exist -- I think.  The '-v' is probably more useful: verbose
reporting of what rm encounters when it tries to take the requested
action.
GNU rm is (again, IIRC) just a wrapper around the unlink(2) syscall,
which removes a specified hard link to an inode (whether a regular file,
socket, FIFO, or device node), or, in the case of a symlink, removes the
symlink itself.  So, basically it's about the same as the unlink(1)
command except a bit more featureful.
Ordinarily, I would expect 'rm' (or unlink) to fail only because either
there's a read-only mount status in the way (obviously not the case,
here), or hardware-level blocking (obviously not the case, here), or the
immutable flag having been set (highly unlikely in this case), or
ownership / rights issues.  But I'm not going to hazard a guess, except,
gremlins?  ;->  I'm intrigued, anyway.
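One way to see which of those causes applies is to call unlink directly and look at the errno that comes back. The following is only a sketch: the helper name try_unlink and the hint wording are illustrative, and the errno-to-cause mapping is a rough summary of the unlink(2) man page, not an exhaustive list.

```python
import errno
import os

# Rough hints per errno, loosely following unlink(2); not exhaustive.
HINTS = {
    errno.ENOENT: "no such file",
    errno.EROFS: "filesystem is mounted read-only",
    errno.EACCES: "no write permission on the containing directory",
    errno.EPERM: "not permitted (immutable/append-only flag, or sticky directory)",
    errno.EBUSY: "file is in use by the system (e.g. a mount point)",
    errno.EIO: "I/O error (possible hardware problem)",
}


def try_unlink(path):
    """Attempt unlink(2) on path and return (ok, message)."""
    try:
        os.unlink(path)
    except OSError as e:
        # Translate the numeric errno into its symbolic name plus a hint.
        name = errno.errorcode.get(e.errno, str(e.errno))
        return False, f"{name}: {HINTS.get(e.errno, e.strerror)}"
    return True, "removed"
```

Calling try_unlink("/etc/subgid.lock") as root on an affected machine would then report the symbolic error name, which narrows things down faster than guessing.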
As you suggest, the real long-term fix is a bug report on someone's
buggy code in useradd or in something calling adduser.  I gather that
the latter is a known problem:
https://askubuntu.com/questions/459080/useradd-cannot-lock-etc-subuid-try-ag...
(Note someone's suggestion, in the cited case, that something might be
running multiple instances of useradd simultaneously.)
However, that sort of contention over /etc/subgid.lock ought to show up
in fuser / lsof, which you say doesn't check out -- so I'm back to being
intrigued.
...
I'm considering my options to get us out of this reoccurring issue in
the future. I'm thinking of just a cron job on each machine that
checks for a subgid.lock file sticking around for more than a couple
days and moving it out of the way, but I'll sleep on it. More clever
suggestions welcome ;)
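The cron idea could look roughly like the sketch below. It assumes a lock untouched for longer than the threshold is stale (a live useradd could in principle still hold it), and the function name move_stale_lock and the ".stale.<timestamp>" suffix are made up for illustration.

```python
import os
import time


def move_stale_lock(lock_path, max_age_seconds, now=None):
    """Rename lock_path aside if its mtime is older than max_age_seconds.

    Returns the new path, or None if nothing was moved.
    """
    now = time.time() if now is None else now
    try:
        st = os.stat(lock_path)
    except FileNotFoundError:
        return None  # no lock file, nothing to do
    if now - st.st_mtime < max_age_seconds:
        return None  # lock is recent; leave it alone
    aside = f"{lock_path}.stale.{int(now)}"
    # Same directory, so this is a plain rename(2), like the mv that worked.
    os.rename(lock_path, aside)
    return aside
```

A daily cron entry could call move_stale_lock("/etc/subgid.lock", 2 * 86400) as root. Keeping the renamed file around, rather than deleting it, leaves some evidence for a later bug report.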
Well, not really.  If the unlink syscall (basis of /bin/rm) isn't
working, then I don't know of a different way of making the file
completely go away.  You might think of mv'ing it to a small filesystem
(like a ramfs) and then blowing away and re-creating the filesystem --
but unfortunately /bin/mv uses rename() only when moving/renaming
the file within the same filesystem.  For a cross-filesystem mv, it
instead copies the file to the destination and then does an unlink() on
the original, so it runs into the same obstacle.
Not that you don't know this already, but a more-satisfactory solution
would be to figure out what's bugging /bin/rm.
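To make the mv point concrete: a move across filesystems cannot be a pure rename(2), because the kernel rejects it with EXDEV, and the usual fallback still ends by unlinking the source. The sketch below imitates that behaviour in Python. It is an illustration under that assumption, not the actual /bin/mv code, and the name move_file is hypothetical.

```python
import errno
import os
import shutil


def move_file(src, dst):
    """Move src to dst; return which strategy was used."""
    try:
        os.rename(src, dst)  # works only within a single filesystem
        return "rename"
    except OSError as e:
        if e.errno != errno.EXDEV:
            raise  # some other failure; do not mask it
    # Cross-filesystem case: copy the data, then remove the original.
    shutil.copy2(src, dst)
    os.unlink(src)  # still depends on unlink succeeding
    return "copy+unlink"
```

So if unlink on the lock file is failing for some reason, the cross-filesystem path would fail at the same step, while a same-directory rename sidesteps it.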
