[BALUG-Talk] symlinks / symbolic links, hard links, ... (was: DNS ... CNAME records ... to CNAMES? How long a chain? Loops? ...)

Michael Paoli Michael.Paoli@cal.berkeley.edu
Sat Oct 15 07:10:32 UTC 2022


On 2022-10-13 00:48, Rick Moen wrote:

>     fixes that.)  This error mode is analogous to the "dangling 
> symlink"
>     problem that teaches all sysadmins "don't use a symlink unless
>     a hard link cannot do the job".

Ah yes, ... symlinks / symbolic links.

Hard links or bust!  ;-)  And yeah, rather literally true - at least
in many ways.  Rather a whole 'nother topic, and I wouldn't quite
go as far as to say symlinks=bad, they do however tend to get
overused.  And effectively, for the most part, symlinks only
offer one significant "solution"/advantage over hard links ...
but with that they also bring a whole lot of problems, risks, and
general breakage.  That's not that hard links are entirely without risk,
but there's a whole lot of problems one can get from symlinks, that
one will never have to worry about with hardlinks.  And yes,
chief among them, the dangling symlink issue(/plague).

So, *much* of the time, hard links are better.  A key advantage:
there will never be a "dangling symlink" issue with a hard
link.  A dangling symlink is the case where a symlink
refers to a path that no longer has anything there.
Basically the target is moved or removed and ... the
source has no clue about it.  Rather like web pages and
404 page not found errors.  In very much the same way, they're
broken links.

So, what's the /one/ significant useful thing that symbolic links can
do that hard links can't?  Yes, they can cross filesystems - notably
refer to something on a different filesystem - just can't do that
with hardlinks ... though, even with hardlinks, there can be ways
to address that, e.g. have one more copy on that other filesystem.
Or bind mount it onto that other filesystem ... e.g. Linux even lets you
mount an ordinary file from one filesystem atop a path on
another filesystem (bind mounts).  Or, you could avoid having them on
separate filesystems in the first place - just use the same filesystem.
Could even have something periodically check, and update copy if/when
needed/relevant.  Anyway, there are alternatives to symlinks for
dealing with that issue - those are at least some of the
possibilities.
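To make the cross-filesystem limitation concrete, a small illustrative sketch (assuming Linux, where /proc is a separate pseudo-filesystem from /tmp, so ln(1) across the two fails with EXDEV; the bind mount shown in the comments is the workaround mentioned above, and needs root):

```shell
# Hard links can't cross filesystems - ln(1) refuses with EXDEV:
d=$(mktemp -d)
if ln /proc/version "$d/version" 2>/dev/null; then
    echo 'unexpected: hard link across filesystems succeeded'
else
    echo 'cross-filesystem hard link refused, as expected'
fi
# A bind mount (root required) is one workaround - it makes the very
# same file appear at a path on the other filesystem, e.g.:
#   mount --bind /etc/critical /otherfs/etc/critical
rm -rf "$d"
```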

Hardlink "hazards"?  Not much.  The key thing to be aware of
with a file having multiple hard links is: it's one file - all
the same file.  The (hard) links are just the names by which
the file is known in the directory(/ies) on the filesystem where
it's present.
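To see that "one file, multiple names" behavior, a quick demo session (file names here are made up for the example):

```shell
d=$(mktemp -d) && cd "$d"
echo 'some data' > original
ln original alias            # a second hard link - same file, second name
# Both names show the same inode number and a link count of 2:
ls -li original alias
# Removing one name doesn't remove the file - the other link remains:
rm original
cat alias                    # still prints: some data
```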

So ... when/where does it get more, uh, "interesting",
complex, and/or messy?

Let's start with symlinks.  They can be relative, or
absolute.  "Of course" if the target is removed, the
source symlink becomes a dangling link - no
surprises there.  But moves get more interesting.
If one uses relative symbolic links, and sources
and targets are likewise moved in same relative manner
under same location in the filesystem hierarchy, they
continue to work and point to the same existing now
moved targets.  But if absolute symlinks were used (target
starts with / in the link), then they break.
However, if the source symlinks are moved, but
the targets don't move, if the symlinks are
absolute they continue to point to the same
(unmoved) targets - but if the symlinks are
relative (don't start with /) then they break.
Also, under chroot, relative symlinks will continue
to work if the chroot point is a common ancestor in
the hierarchy relative to both source and target.
If that's not the case or they're absolute, they break.
So ... no guarantees things won't break, but some
careful thought and planning regarding use of symbolic
links, and relative vs. absolute, and exactly what one
does and doesn't move and how, well, ... can at least
reduce breakage with due planning and attention.
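The relative-vs-absolute move behavior above can be demonstrated directly (names made up for the demo): move a directory containing both the target and the links, and the relative link survives while the absolute one dangles.

```shell
d=$(mktemp -d) && cd "$d"
mkdir sub
echo hello > sub/target
ln -s target sub/rel            # relative symlink
ln -s "$d/sub/target" sub/abs   # absolute symlink
mv sub moved                    # move source and target together
cat moved/rel                   # works: relative link moved with its target
cat moved/abs 2>/dev/null \
  || echo 'moved/abs is now dangling'  # absolute path no longer exists
```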

And ... lovely thing about hard links - can move any of those
anywhere within the same filesystem, and there's absolutely no
breakage of the link relationship ... as they're all still links to the
same file - no matter where those links are on the filesystem or how
they're named on the filesystem.  But yes, "of course", can't do
a hard link across/between separate filesystems.  Though as
mentioned, there are often ways to work around that if/when one
really needs to ... and besides using a symbolic link.
And, yeah, seems to me the more "territory" a symbolic link
traverses (further apart in hierarchy and/or across filesystems),
the more probable it is to break / get broken.  Most notably
when (re)moving the target, the more distant from the source,
the more probable it is source will be forgotten and things broken.
Whereas,
$ ln -s foo bar
When poking around in that directory and considering (re)moving
the target, one is more probable to actually notice and consider
the symlink and its relationship to the target.  Not foolproof,
but ... somewhat more goof resistant?

Now, let's look at some more interesting bits.
Let's say you have a super actively referenced critical configuration
file.  For sake of argument, let's say it's:
/etc/critical
And ... you need to change its configuration.
Let's also say it's at least somewhat large ... more than one
filesystem physical block, maybe even hundreds or thousands or more.
So, if you have lots of stuff reading it essentially continuously,
how do you safely update it?
# vi /etc/critical
Surely you jest.
May vary by editor, but most typical editors will, when writing out
the file, overwrite the original.  Well, *nix is a multi-user
multi-tasking operating system, so, while that file is being
overwritten, stuff can read it.  So, something may read it while it's
been partially overwritten and, e.g., get the start of the new data,
then EOF before getting it all.  Not good for our /etc/critical file
and the host.
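A deterministic way to see why the in-place overwrite is dangerous: the shell's > redirection (like many editors' save) truncates first, then writes, so there's a window in which the path holds a short, partial file.  A sketch, frozen "midway" through the write (scratch file names made up):

```shell
d=$(mktemp -d) && cd "$d"
printf 'OLD-CONFIG-OLD-CONFIG\n' > critical
# An in-place overwrite truncates, then writes.  Freeze it "midway":
printf 'NEW' > critical       # imagine the writer paused right here
wc -c critical                # a reader now sees a 3-byte partial file
# ... the rest of the new contents would only arrive later
```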
So, one of the relatively generic ways of safely doing that is
instead, in the same directory (or at least same filesystem), a new
file is created, with the same ownerships and permissions.
Then, after that file is fully written out, the new file replaces the
old via the rename(2) system call - which is (at least on ordinary
local filesystem types) guaranteed to be atomic.  Anything that opens
/etc/critical to read it will either get the old file and its
contents, or the complete new file and its contents.  There's never an
in-between: the /etc/critical path is always there, is always one of
those two files, and in chronological sequence - no jumping back and
forth.  Once the newer file is placed there, that's what that path
gives; before that, the old file.  At no point does the path fail, nor
will it ever open to the older file again after the rename(2) has
completed - that's it - a nice simple fast clean switch from old to
new?  Right?  Mostly.  ("Of course", also, anything that still has
the old file open still gets the old data - until it (typically
closes it and) reopens it - generally using the same path again.  But
most anything that critically cares about changes to data at that
pathname will generally quickly notice the change (e.g. inode data on
the old file changes), and will then generally promptly reopen by
pathname.  Some daemons/services might need to be explicitly told to
reopen the "file" (pathname, to get the new file and data).)
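The write-new-then-rename(2) pattern sketched above, in shell (scratch paths made up; chmod's --reference flag is a GNU coreutils extension, and the chown line would need privilege, so it's left commented):

```shell
f=$(mktemp -d)/critical
printf 'old contents\n' > "$f"
dir=$(dirname "$f")
tmp=$dir/.critical.new.$$        # new file in the SAME filesystem
printf 'new contents\n' > "$tmp"
chmod --reference="$f" "$tmp"    # GNU chmod: copy the permissions over
# chown --reference="$f" "$tmp"  # likewise ownership (needs privilege)
mv -f "$tmp" "$f"                # rename(2): atomic replacement
cat "$f"                         # prints: new contents
```

Readers either open the old file or the complete new one; there's no moment where the path is missing or partial.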

Now let's modify that scenario slightly.  Let's say that for, e.g.,
reasons of compatibility, /etc/crit is equivalent and should have the
same contents, and some programs use /etc/critical, and some use
/etc/crit.  So, let's say /etc/critical and /etc/crit are hard links
to the same file.  If one invokes an editor, or uses other means that
overwrite the file ... there's the issue of things reading the file
while that's taking place, that may get a bad read of that data -
something that's neither the old nor the new config - not so great.
However, once that write has completed, both links have the same data
- and at the same time - as it's only one file, just two (hard) links.
Okay, what if instead we go the rename(2) way (e.g. as mv(1) will do
in the simpler cases)?  We can replace one of the two hard links with
the new, then likewise replace the other link with a second hard link
to the new.  But there may be an issue with that.  What if we need
both to be updated at the exact same time - and as an atomic
operation - so we never have the two paths being something that could
be opened at the same exact time giving different data results?
Well, two separate rename(2) operations won't suffice, as various
processes can open and read things between those two rename(2)
operations.  So, how to "fix" that and avoid both problems - race
condition issues with overwriting the file, and race condition issues
with separately replacing two pathnames?  Here the solution is ...
uhm, yeah, ...
a symbolic link.  Of the two paths, have one be a symlink
to the other.  Now we leave the symlink alone, and use rename(2)
to replace the other - which is of type ordinary file.
Now the update happens atomically in most regards effectively
for both paths.  The operating system still has to read the
link from the symlink, and open the path that it refers to,
but once it at least gets to that point - so if it's already
made it at least that far, then either via the symlink, or
the direct path, the same - either older, or replaced with
newer - file will be opened and read.
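A sketch of that one-symlink scheme, using the names from the text but in a scratch directory: one rename(2) flips what both paths resolve to.

```shell
d=$(mktemp -d) && cd "$d"
printf 'version 1\n' > critical
ln -s critical crit              # crit is a symlink to critical
printf 'version 2\n' > critical.new
mv -f critical.new critical      # one rename(2): both paths flip together
cat critical                     # prints: version 2
cat crit                         # prints: version 2 - same file, via the symlink
```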

So, symlinks aren't /all/ bad, but they should be carefully and
judiciously used (and not overused).  And many such timings aren't
/nearly/ that critical ... sure, one ought to do atomic updates to be
safe, but much of the time sysadmins fail to do that where they
ought, and for the most part it (almost) never breaks things ...
though some files /are/ significantly more critical, and failing to
do it properly can (and certainly sometimes does) break things if not
properly replaced/updated (e.g. do an overwrite of a critical library
or binary and the host can about instantly be seriously hosed ...
package management software takes care of those details on
replacement to avoid such problems).

So ... directly overwrite the file - generally not atomic, and some
reads may get bad data before the overwrite is complete.
Replace the file via rename(2) - generally better - but you lose the
linking relationship to any additional hard links.  Replace those
separately with additional rename(2) operations and one loses the
synchronicity (both paths won't be updated at the exact same time).
Have one hard link to the ordinary file, and have any others
requiring simultaneity and atomicity on update be symbolic links to
that path - that's about as good as it gets in that scenario ... but
most scenarios don't have requirements that stringent, so most of the
time there are better, simpler solutions.

Though far from universal, some files that need updating and may be
rather to highly critical have their own locking mechanisms and
protocols for dealing with these types of matters.  E.g. for
passwd(5), there's vipw(8); mail files generally have their own
locking protocols; visudo(8) uses locking protocols on the sudoers(5)
files; etc.  But not everything has its own specific locking
protocol - so one shouldn't presume too much.


