[BALUG-Talk] Lubuntu 14.04 guest session login failure

Rick Moen rick@linuxmafia.com
Sun Jul 9 13:28:55 PDT 2017


Quoting Elizabeth K. Joseph (lyz@princessleia.com):

> In case anyone was curious as to what happened with this, I finally
> had some time to sit down on site this evening and do some debugging.

Nice detective work.

> Some background as to how the guest logins work in Lubuntu: A
> guest-XXXXX (random characters) user is created upon login, which is
> used throughout the session. It is then deleted when the user logs
> out.
> 
> After some red herrings in the auth logs (mostly PAM errors around KDE
> and Gnome keyrings), I did some digging in the lightdm logs.
> Eventually I noticed the UID of the guest account trying to be created
> was the same every time a login attempt was made: 999. Odd. So I
> looked in /etc/passwd and noticed that there were hundreds of
> guest-XXXXX accounts. That's no good!
> 
> Turns out, at some point the /etc/subgid.lock file got stuck in an
> existing state (wasn't deleted when the lock concluded), which meant
> the command to delete the user was not completing successfully upon
> logout. Users were piling up and never being deleted. Once the UIDs
> hit 999 it was failing to create new guest users, so the login would
> fail. A quick mv (rm didn't work) of the subgid.lock file and a script
> to delete all the guest accounts got us going again.
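
[For anyone doing the same cleanup, here is a minimal sketch of the
first half of such a script: listing the leftover accounts.  It assumes
the accounts are all named "guest-" plus random characters, as described
above.  The function name is my own invention, and it only prints
names; it deletes nothing.]

```shell
#!/bin/sh
# List leftover guest accounts from a passwd-format file.
# Assumption: guest accounts are named "guest-<random characters>".
# Defaults to /etc/passwd; pass another file as $1 for a dry run.
list_guest_accounts() {
    passwd_file="${1:-/etc/passwd}"
    # Field 1 of each colon-separated record is the login name.
    awk -F: '$1 ~ /^guest-/ { print $1 }' "$passwd_file"
}
```

[Review the list before feeding it to userdel or deluser, and only
after the stuck lock file is out of the way, or the deletions will
fail the same way they did at logout.]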

Next time you encounter that situation, I'd be curious what 

   rm -fv /etc/subgid.lock 

reports.  The '-f' is for force, which honestly won't help here, because
all it does, IIRC, is suppress prompting and suppress the error if the
target doesn't exist -- I think.  The '-v' is probably more useful:
verbose reporting of what rm encounters when it tries to take the
requested action.

GNU rm is (again, IIRC) just a wrapper around the unlink(2) syscall,
which removes a specified hardlink to an inode (regular file, socket,
FIFO, or device node), or in the case of a symlink, removes the symlink
itself.  So, basically it's about the same as the unlink(1) command
except a bit more featureful.

Ordinarily, I would expect 'rm' (or unlink) to fail only because either
there's a read-only mount status in the way (obviously not the case,
here), or hardware-level blocking (obviously not the case, here), or the
immutable flag having been set (highly unlikely in this case), or
ownership / rights issues (on the containing directory, which is what
unlink actually checks).  But I'm not going to hazard a guess, except,
gremlins?  ;->  I'm intrigued, anyway.
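
A rough sketch of how one might check those usual suspects in one go.
The function name is made up, and lsattr and findmnt are assumed to be
installed (the script skips them if not).  It is read-only and changes
nothing:

```shell
#!/bin/sh
# Print the facts that usually decide whether unlink() on a file works.
# Read-only: nothing here modifies the target.
why_not_removable() {
    target="$1"
    dir=$(dirname -- "$target")

    # unlink() needs write + search permission on the containing
    # directory, not on the file itself.  (As root this test passes
    # regardless of the mode bits.)
    if [ -w "$dir" ] && [ -x "$dir" ]; then
        echo "dir-writable: yes"
    else
        echo "dir-writable: no"
    fi

    # Owner and mode of the directory and of the file.
    ls -ld -- "$dir" "$target" || true

    # Immutable ('i') or append-only ('a') attributes, where the
    # filesystem supports them.
    if command -v lsattr >/dev/null 2>&1; then
        lsattr -d -- "$target" 2>/dev/null || echo "lsattr: not supported here"
    fi

    # Mount options of the filesystem holding the directory; look for "ro".
    if command -v findmnt >/dev/null 2>&1; then
        findmnt -n -o TARGET,OPTIONS -T "$dir" || true
    fi
}
```

Run as "why_not_removable /etc/subgid.lock" the next time the lock gets
stuck, alongside the 'rm -fv' output.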

As you suggest, the real long-term fix is a bug report on someone's
buggy code in useradd or in something calling adduser.  I gather that
the latter is a known problem:
https://askubuntu.com/questions/459080/useradd-cannot-lock-etc-subuid-try-again-later

(Note someone's suggestion, in the cited case, that something might be
running multiple instances of useradd simultaneously.)

However, that sort of contention over /etc/subgid.lock ought to show up
in fuser / lsof, which you say doesn't check out -- so I'm back to being
intrigued.

> I'm considering my options to get us out of this reoccurring issue in
> the future. I'm thinking of just a cron job on each machine that
> checks for a subgid.lock file sticking around for more than a couple
> days and moving it out of the way, but I'll sleep on it. More clever
> suggestions welcome ;)

Well, not really.  If the unlink syscall (basis of /bin/rm) isn't
working, then I don't know of a different way of making the file
completely go away.  You might think of mv'ing it to a small filesystem
(like a ramfs) and then blowing away and re-creating the filesystem --
but unfortunately /bin/mv uses rename() only when moving/renaming
the file within the same filesystem.  For a cross-filesystem mv, it
instead copies the file to the destination and then does an unlink()
on the original, so you'd be right back at the same failing unlink.

Not that you don't know this already, but a more-satisfactory solution
would be to figure out what's bugging /bin/rm.
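
That said, if you do go the cron route in the meantime, the check
itself is short.  A minimal sketch, assuming GNU find and a made-up
function name.  It only reports a stale lock; what to do about it is
left to you:

```shell
#!/bin/sh
# Report a subgid.lock that has been sitting around too long.
# $1 = directory to look in (default /etc), $2 = age threshold in days.
stale_locks() {
    dir="${1:-/etc}"
    days="${2:-2}"
    # -mtime +N matches files whose age, rounded down to whole days,
    # is greater than N.  -maxdepth is a GNU/BSD extension.
    find "$dir" -maxdepth 1 -type f -name 'subgid.lock' -mtime +"$days" -print
}
```

An empty result means no stale lock; anything printed is a candidate
for the mv treatment (and for a look with lsof / fuser first).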




