warning re: NFS mount hangs and workaround

Ruth Milner rmilner at zia.aoc.nrao.edu
Wed Aug 8 04:08:29 AEST 1990


Most of you are probably familiar with the old problem of innocent
bystanders getting hung when an NFS server goes down, even if they don't
refer to it in any way. The problem arises because the entries in / get
stat(2)ed (for instance when pwd has to work out the current directory),
and the workaround is to put the actual mount points for the NFS
filesystems in a subdirectory a couple of levels down rather than directly
in /.
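
To make the mechanism concrete, here is a rough Python sketch of one step of
the upward walk a getwd()-style routine does: finding the name of a directory
in its parent. It is a simplified illustration, not SunOS's actual libc code,
and the function name name_in_parent is hypothetical. Real implementations can
typically avoid the stat calls when parent and child are on the same
filesystem, but at a mount point they have to stat the parent's entries one by
one until the device and inode numbers match. If an entry examined before the
match is a hard-mounted directory on a dead NFS server, that call blocks, and
so does the user.

```python
import os


def name_in_parent(path):
    """Find the name under which directory `path` appears in its parent,
    the way a getwd()-style routine does one step of its upward walk.

    Returns (name, entries_examined), or (None, entries_examined) if no
    entry matches (e.g. for / itself, whose parent is itself).
    """
    target = os.lstat(path)
    parent = os.path.join(path, "..")
    examined = 0
    # scandir() yields entries in on-disk (readdir) order, not sorted.
    with os.scandir(parent) as entries:
        for entry in entries:
            examined += 1
            # Every entry ahead of the match gets stat()ed; on a dead,
            # hard-mounted NFS mount point this call would block.
            st = os.lstat(os.path.join(parent, entry.name))
            if (st.st_dev, st.st_ino) == (target.st_dev, target.st_ino):
                return entry.name, examined
    return None, examined
```

For example, name_in_parent("/u") should report "u" together with the number
of entries in / it had to stat along the way.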

There is a twist to this, however, that just bit me. If you have symbolic
links in / which point in any semi-direct fashion to the NFS mount point,
then the actual directory entry in / for the symbolic link must come
*after* the directory entries for any local filesystems being referenced.
For example, in /etc/fstab I have

cholla:/mnt	/nfs/cholla/mnt	nfs	rw,bg,intr,noquota	0 0

and in /

lrwxrwxrwx  1 root           11 Aug  7 11:23 /cholla -> /nfs/cholla/

This lets users refer to /cholla/mnt without worrying about the extra
directory. Note that even though this symbolic link doesn't point to the
mount point itself, everyone was still getting hung whenever they tried to
do such things as pwd, cd, read mail, etc. The reason was that the slot in
the directory file "/" occupied by the symbolic link "cholla" came ahead of
the slot occupied by the entry for the local filesystem containing user
files (called "/u").

Note that a plain "ls" only shows you the entries in alphabetical order.
The "-f" option, however, lists them unsorted, in their real order within
/, which is the order in which they are examined. To shuffle them around,
you must rm the offending symbolic link, create enough junk files (touch(1)
suffices for this) to occupy any free slots ahead of all your local
filesystems, and then recreate the link. You can then get rid of the junk
files. From then on, anyone who really isn't referring to the dead NFS
server in any way (i.e. no PATH entries, etc.) will be able to work as
usual.

If anyone out there has attempted the normal workaround without success,
you might want to check whether you are running into this charming little
quirk.
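
One way to check, sketched in Python: list the entries of a directory in
readdir order (what ls -f shows, minus . and ..) and report which local
entries sit behind the symbolic link. The function name slots_after_link is
hypothetical.

```python
import os


def slots_after_link(dirpath, link_name, local_names):
    """Return the names in local_names whose directory entries come after
    link_name in readdir order. An empty list means the link already sits
    behind all of them."""
    with os.scandir(dirpath) as entries:       # unsorted, on-disk order
        order = [entry.name for entry in entries]
    link_pos = order.index(link_name)          # ValueError if the link is missing
    return [name for name in local_names if order.index(name) > link_pos]
```

On the system described above, slots_after_link("/", "cholla", ["u"])
should have returned ["u"] before the link was moved, and an empty list
afterwards.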

BTW, this isn't fixed in SunOS 4.1. One of our users on a diskless client
running 4.1 has just reported exactly the same problem, even though the
dead NFS server is not the one he's dependent on, and not one he's trying
to access.

Ruth Milner
Systems Manager                     NRAO/VLA                     Socorro NM
                            rmilner at zia.aoc.nrao.edu


