1024 specified on YPMAXRECORD the ypmatch can get into an infinite retry
loop when it is requesting the information from the NIS server.
ypmatch(1) will keep returning an error until the command receives a
kill(1).
To avoid this problem, we check MAX_RETRIES, which is set to 20 by
default, and avoid getting into an infinite loop on the client side.
NOTE: FreeBSD nis(8) server doesn't present this issue.
Submitted by: Ravi Pokala <rpokala@panasas.com>,
Lakshmi N. Sundararajan <lakshmi.n@msystechnologies.com>,
Lewis, Fred <flewis@panasas.com>,
Pushkar Kothavade <pushkar.kothavade@msystechnologies.com>
Approved by: bapt (mentor)
MFC after: 1 month
Differential Revision: D4095
This fixes a race condition where another thread may fork() before CLOEXEC
is set, unintentionally passing the descriptor to the child process.
This commit only adds O_CLOEXEC flags to open() or openat() calls where no
fcntl(fd, F_SETFD, FD_CLOEXEC) follows. The separate fcntl() call still
leaves a race window so it should be fixed later.
yp_next as revision 1.50 did. This should fix, or at least very much
reduce the risk of, NIS timing out due to UDP packet loss for NIS
functions.
See also revision 1.50 for more details about the general problem.
Tested by: nosedive, freefall, hub, mx1, brooks
MFC after: 1 week
Approved by: re (mux)
packet loss when talking to a NIS server.
- Set 1 second retry timeout to further realistically handle UDP
packet loss for yp_next packet bursts. If the packet hasn't come
back within 1 second it's rather unlikely to come back at all. There
is still back-off mechanism in RPC so if there is another reason
than packet loss for the lack of response within 1 second, the NIS
server will not be totally bombarded with requests.
This reduces the risk of NIS failing with:
yp_next: clnt_call: RPC: Timed out
considerably. This is mainly a problem if you have larger NIS maps
(like at FreeBSD.org), since enumerating the lists causes UDP packet
bursts in which a few packets do get lost once in a while.
MFC after: 1 week
Discussed with: peter
Problem mainly diagnosed by: peter
technique) so that we don't wind up calling into an application's
version if the application defines them.
Inspired by: qpopper's interfering and buggy version of strlcpy
deprecated in favor of the POSIX-defined lowercase variants.
o Change all occurrences of NTOHL() and associated macros in the
source tree to use the lowercase function variants.
o Add missing license bits to sparc64's <machine/endian.h>.
Approved by: jake
o Clean up <machine/endian.h> files.
o Remove unused __uint16_swap_uint32() from i386's <machine/endian.h>.
o Remove prototypes for non-existent bswapXX() functions.
o Include <machine/endian.h> in <arpa/inet.h> to define the
POSIX-required ntohl() family of functions.
o Do similar things to expose the ntohl() family in libstand, <netinet/in.h>,
and <sys/param.h>.
o Prepend underscores to the ntohl() family to help deal with
complexities associated with having MD (asm and inline) versions, and
having to prevent exposure of these functions in other headers that
happen to make use of endian-specific defines.
o Create weak aliases to the canonical function name to help deal with
third-party software forgetting to include an appropriate header.
o Remove some now unneeded pollution from <sys/types.h>.
o Add missing <arpa/inet.h> includes in userland.
Tested on: alpha, i386
Reviewed by: bde, jake, tmm
change prototypes to be the same as in the original sun tirpc code.
Remove ()P macro in a file where the majority had ()P already removed.
Add them if the majority use ()P macros.
Submitted by: mbr
Requested by: bde
alpha these bugs didn't cause any problems because it was little endian,
but on sparc64, we ended up with garbage for the IP address when we tried
to contact the server. (Usually 3.253.0.0)
Not objected to by: wpaul
adding (weak definitions to) stubs for some of the pthread
functions. If the threads library is linked in, the real
pthread functions will be pulled in.
Use the following convention for system calls wrapped by the
threads library:
__sys_foo - actual system call
_foo - weak definition to __sys_foo
foo - weak definition to __sys_foo
Change all libc uses of system calls wrapped by the threads
library from foo to _foo. In order to define the prototypes
for _foo(), we introduce namespace.h and un-namespace.h
(suggested by bde). All files that need to reference these
system calls should include namespace.h before any standard
includes, then include un-namespace.h after the standard
includes and before any local includes. <db.h> is an exception
and shouldn't be included in between namespace.h and
un-namespace.h. namespace.h will define foo to _foo, and
un-namespace.h will undefine foo.
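The naming convention and the namespace.h trick can be sketched as follows.
This is an illustration only: it assumes GCC/Clang attribute syntax on an ELF
target, __sys_foo() is a stand-in rather than a real system call stub, and
libc_internal_caller() is a hypothetical name.

```c
/*
 * Stand-in for the real system call stub. Identifiers beginning with two
 * underscores are reserved for the implementation; they are used here only
 * to mirror the naming convention.
 */
int
__sys_foo(void)
{
	return (42);
}

/* _foo and foo are weak aliases that resolve to __sys_foo by default. */
int _foo(void) __attribute__((weak, alias("__sys_foo")));
int foo(void) __attribute__((weak, alias("__sys_foo")));

/* What namespace.h does for library-internal code: */
#define foo _foo

int
libc_internal_caller(void)
{
	return (foo());		/* preprocessed into _foo() */
}

/* What un-namespace.h does afterwards: */
#undef foo
```

Because the aliases are weak, a threads library that supplies its own strong
foo() or _foo() takes precedence at link time, while library-internal callers
keep reaching the underscore-prefixed name.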
Try to eliminate some of the recursive calls to MT-safe
functions in libc/stdio in preparation for adding a mutex
to FILE. We have recursive mutexes, but would like to avoid
using them if possible.
Remove unneeded includes of <errno.h> from a few files.
Add $FreeBSD$ to a few files in order to pass commitprep.
Approved by: -arch
just use _foo() <-- foo(). In the case of a libpthread that doesn't do
call conversion (such as linuxthreads and our upcoming libpthread), this
is adequate. In the case of libc_r, we still need three names, which are
now _thread_sys_foo() <-- _foo() <-- foo().
Convert all internal libc usage of: aio_suspend(), close(), fsync(), msync(),
nanosleep(), open(), fcntl(), read(), and write() to _foo() instead of foo().
Remove all internal libc usage of: creat(), pause(), sleep(), system(),
tcdrain(), wait(), and waitpid().
Make thread cancellation fully POSIX-compliant.
Suggested by: deischen
points. For library functions, the pattern is __sleep() <--
_libc_sleep() <-- sleep(). The arrows represent weak aliases. For
system calls, the pattern is _read() <-- _libc_read() <-- read().
- Completely recoded the ypmatch cache code. The old code could leak
memory: it would allow the cache to grow, but never
shrink. The new code imposes the following limits:
o The cache is capped at a limit of 5 entries.
o Each entry expires after five seconds, at which point
its slot is freed.
o If an insertion is to be done and all five slots
are filled, the oldest entry is forcibly expired
to release its slot.
Also, the cache is implemented on a per-binding basis rather than
having a global cache covering all bindings. This means that each
bound domain has its own 5 slot cache.
- Changed clntudp_create() to clntudp_bufcreate() so that the
xmit/recv message buffer sizes can be set explicitly. NIS transactions
are rarely much larger than 1024 bytes since YPMAXRECORD is 1024.
The defaults chosen by clntudp_create() are actually much larger
than needed. I set the xmit buffer to a little over 1024 and the
recv buffer to a little over 2048. This saves a few Kbytes for each
NIS binding.
- Add my name to the copyright. I think I've made enough changes to
this file to merit it. :)
Note: these changes should go into the 2.2.x branch, but I'm waiting
on feedback from a tester to see if the cache fixes solve the reported
memory leak problem.
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.
_yp_dobind() checks to see if a fork() happens (by checking PIDs) and
invalidates all bindings if it finds itself in a newly created child
process. (This avoids sharing RPC client handles and socket descriptors
with the parent, which would be bad.) Unfortunately, it summarily
calls clnt_destroy() on the handles, which may result in the destruction
of a descriptor that isn't really a socket.
This is fixed by replacing the explicit call to clnt_destroy() with a
call to _yp_unbind(), which deals with potentially hosed socket descriptors
in a safe manner.
This is basically a one-liner. Once I confirm that it fixes Christoph's
problem, I'd like permission to put it in the 2.2-RELENG branch.
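The PID comparison can be illustrated with a small sketch. The names
struct bind_state, drop_all_bindings() and check_fork() are hypothetical, and
the bindings are reduced to a counter; the real code walks its binding list
and unbinds each domain.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-process state; the real code tracks this inside yplib. */
struct bind_state {
	pid_t	owner;		/* PID that created the bindings */
	int	nbindings;	/* live bindings, simplified to a counter */
	int	rebuilds;	/* how many times bindings were dropped */
};

/* Stand-in for walking the binding list and unbinding each domain safely. */
static void
drop_all_bindings(struct bind_state *st)
{
	st->nbindings = 0;
	st->rebuilds++;
}

/*
 * Compare the recorded PID with the caller's PID. A mismatch means we are
 * running in a forked child, so inherited bindings must not be reused.
 * Returns 1 if bindings were invalidated, 0 otherwise.
 */
static int
check_fork(struct bind_state *st, pid_t self)
{
	if (st->owner == self)
		return (0);
	drop_all_bindings(st);
	st->owner = self;
	return (1);
}
```

A caller would pass getpid() as the second argument at the start of each
lookup.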
for NULL RPC client handles. This should hopefully fix the problems
Satoshi reported on -current.
- Add socket descriptor sanity checks to _yp_unbind().
- Fix yp_order() so that it handles the RPC_PROCUNAVAIL error gracefully.
NIS+ in YP compat mode doesn't support the YPPROC_ORDER procedure.
This is a 2.2 candidate with bells on.
directly in order to obtain binding information, check that the local
ypbind is using a reserved port and return YPERR_YPBIND if it isn't.
We should not trust any ypbind running on a port >= IPPORT_RESERVED;
it may have been started by a malicious user hoping to trick us into
talking to a bogus ypserv.
Note that we do not check the ypserv port returned to us from ypbind.
It is assumed that ypbind has already done a reserved port test (or not,
depending on whether or not it was started with -s); if we trust the
authenticity of the local ypbind, we should also trust its judgement.
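The reserved-port test itself is a one-line comparison. A sketch, with the
hypothetical helper name port_is_reserved(); IPPORT_RESERVED comes from
<netinet/in.h>:

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Return 1 if the address uses a reserved (privileged) port, 0 otherwise.
 * sin_port is in network byte order, so convert it before comparing.
 */
static int
port_is_reserved(const struct sockaddr_in *sin)
{
	return (ntohs(sin->sin_port) < IPPORT_RESERVED);
}
```

In the scheme described above, a caller would give up with YPERR_YPBIND when
this returns 0 for the local ypbind's address.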
Obtained from: OpenBSD
Now that we preserve RPC handles instead of rebuilding them each time
a ypclnt function is called, we have to be careful about keeping our sockets
in a sane state. It's possible that the caller may call a ypclnt
function, and then decide to close all its file descriptors. This would
also close the socket descriptor held by the yplib code. Worse, it
could re-open the same descriptor number for its own use. If it then calls
another ypclnt function, the subsequent RPC will fail because the socket
will either be gone or replaced with Something Completely Different. The
yplib code will recover by rebinding, but in doing so it may wreck the
descriptor which now belongs to the caller.
To fix this, _yp_dobind() needs to label the descriptor somehow so
that it can test it later to make sure it hasn't been altered between
ypclnt calls. It does this by binding the socket, thus associating a port
number with it. It then saves this port number in the dom_local_port member
of the dom_binding structure for the given domain. When _yp_dobind() is
called again (which it is at the start of each ypclnt function), it checks
to see if the domain is already bound, and if it is, it does a getsockname()
on the socket and compares the port number to the one it saved. If the
getsockname() fails, or the port number doesn't match, it abandons the
socket and sets up a new client handle.
This still incurs some syscall overhead, which is what I was trying to
avoid, but it's still not as bad as before.
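A rough sketch of the labeling idea, under stated assumptions: the helper
names are hypothetical, the socket is bound to the loopback address purely
for illustration, and the saved port plays the role of dom_local_port.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/*
 * Create a UDP socket, bind it to an ephemeral loopback port, and report
 * that port through *port (network byte order). Returns the descriptor,
 * or -1 on failure.
 */
static int
labeled_socket(in_port_t *port)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int s;

	if ((s = socket(AF_INET, SOCK_DGRAM, 0)) == -1)
		return (-1);
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = 0;			/* let the kernel pick a port */
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    getsockname(s, (struct sockaddr *)&sin, &len) == -1) {
		close(s);
		return (-1);
	}
	*port = sin.sin_port;
	return (s);
}

/* Return 1 if fd is still the socket we labeled with port, else 0. */
static int
socket_still_ours(int fd, in_port_t port)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);

	if (getsockname(fd, (struct sockaddr *)&sin, &len) == -1)
		return (0);
	return (len == sizeof(sin) && sin.sin_family == AF_INET &&
	    sin.sin_port == port);
}
```

If the caller has closed the descriptor, or reopened the same number as a
regular file, getsockname() fails and the check reports a mismatch, which is
the cue to abandon the handle instead of destroying it.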
to call clnt_destroy() on a potentially NULL RPC handle. Somebody should
bang on this a bit to make sure the problem is really gone; I seem to
have difficulty reproducing it. Patch provided by Peter Wemm and
slightly tweaked by me.
- Don't call _yp_unbind() in individual ypclnt functions unless we encounter
an RPC error while making a clnt_call().
Each of the ypclnt functions does a _yp_dobind() when it starts and then
a _yp_unbind() when it finishes. This is not strictly necessary and it
wastes cycles: it means we do a new clnt_create() and clnt_destroy()
for each yp_whatever() call. In fact, you can do multiple clnt_call()s
using a single RPC client handle returned by clnt_create(). Ideally we only
have to create a handle to ypserv once (the first time we call a ypclnt
function) and then destroy it and rebind only if a call to ypserv fails.
- Modify _yp_dobind() so that it only creates a new RPC client handle
when establishing a new binding or when one of the ypclnt calls
invalidates an existing binding and calls _yp_dobind() to establish
a new one.
- Modify the various ypclnt functions to only call _yp_unbind() if a
call to ypserv fails.
/var/run resides on an NFS filesystem (flock() always returns 0 in
this case, so we falsely assume that ypbind is dead and bail out).
Settle instead for better failure checking when using clnttcp_create()
and clnt_call() to interact with ypbind. We still try to flock()
/var/yp/binding/$DOMAINNAME.2, but if this doesn't work, we drop into
the code that retrieves the binding information from ypbind directly.
If that also fails, then we're toast. On NFS filesystems, this means
we'll be ignoring the binding file for no reason and always talking to
ypbind even though we don't have to, but at least things will work.
(I could just replace the flock(/var/run/ypbind.lock) check with
an RPC call to ypbind's NULLPROC procedure, but if the flock() of
the binding file doesn't pan out we're going to try to talk to
ypbind later anyway. *sigh* Is NFS file locking ever going to work?)
of a successful map retrieval. (This has to do with a previous change
to xdr_ypresp_all_seq() and ypxfr_get_map(); originally, yp_all()
would look for a return value of YP_FALSE to signal success, but now
it should be looking for YP_NOMORE. It should not be passing YP_NOMORE
back up to the caller though.)
Noticed by: <aagero@aage.priv.no>
There is also another small bug here, which is that the call to
xdr_free() that happens immediately after the clnt_call() in yp_all()
clobbers the return status value. I've worked around this for now,
but I think the xdr_free() is actually bogus and should be removed.
I want to check some more before I do that though.
XDR routines auto-generated by rpcgen don't quite match the format of
the original ones even though they have the same names (that was one of
the things wrong with the old XDR routines).
rpcgen-erated on the fly (just like librpcsvc).
Makefile: Add rule for generating yp_xdr.c and yp.h.
xdryp.c: gut everything except the special ypresp_all XDR function
needed to handle yp_all() (this one can't be created on
the fly), and xdr_datum(), which isn't used internally by
libc, but which is documented as being there in yp_prot.h,
so what the hell. We now get everything else from yp_xdr.c.
yplib.c: change a few structure member names to match those found in
yp.h instead of those declared in yp_prot.h.
it before trying to establish a binding. If /var/run/ypbind.lock
doesn't exist, or if it exists and isn't locked, then ypbind isn't
running, which means NIS is either turned off or hosed.
- Have _yp_check() call yp_unbind() after it successfully calls yp_bind()
to make sure it frees resources correctly. (I don't think there's really
a memory leak here, but it seems somehow wrong to call yp_bind() without
making a corresponding call to yp_unbind() afterwards.)
This makes the NIS code behave a little better in cases where libc makes
calls to NIS, but it isn't running correctly (i.e. there's no ypbind).
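The lock-file test described above might look roughly like this. It is a
sketch with a hypothetical helper name, lockfile_is_held(), and it assumes
flock() semantics on a local filesystem.

```c
#include <sys/file.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Return 1 if another open file description holds a lock on path,
 * 0 if the file is missing or unlocked, -1 on an unexpected error.
 */
static int
lockfile_is_held(const char *path)
{
	int fd, held;

	if ((fd = open(path, O_RDONLY)) == -1)
		return (errno == ENOENT ? 0 : -1);
	if (flock(fd, LOCK_EX | LOCK_NB) == 0) {
		/* We got the lock, so nobody else had it. */
		flock(fd, LOCK_UN);
		held = 0;
	} else {
		held = (errno == EWOULDBLOCK) ? 1 : -1;
	}
	close(fd);
	return (held);
}
```

A result of 0 for /var/run/ypbind.lock would be read as "ypbind is not
running", so the binding attempt can be skipped.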
This cleans up some strange libc behavior that manifests itself if
you have the system domain name set, but aren't actually running NIS.
In this event, the getrpcent(3) code could try to call into NIS and
cause several inexplicable "clnttcp_create error: RPC program not
registered" messages to appear. This happens because _yp_check() checks
if the system domain name is set and, if it is, proceeds to call
yp_bind() to attempt to establish a binding. Since there is no
binding file (remember: ypbind isn't running, so /var/yp/binding
will be empty), _yp_dobind() will attempt to contact ypbind to
prod it into binding the domain. And because ypbind isn't running,
the code generates the 'clnttcp_create' error. Ultimately the
_yp_check() fails and the getrpcent(3) code rolls over to the /etc/rpc
file, but the error messages are annoying, and the code should be
smart enough to forgo the binding attempt when NIS is turned off.
on, which is fine, except that _yp_dobind() is called before we check
the cache. This means we can return from the cache check (if we have
a hit) without calling _yp_unbind().
We should do the cache check first and _then_ drop into the section
that binds the server and does the yp_match query.
Strange as it sounds, it should map to YPERR_DOMAIN instead.
The YP_NODOM protocol error code is generally returned by ypserv when you
ask it for data from a domain that it doesn't support. By contrast,
the YPERR_NODOM error code means 'local domain name not set.'
Consequently, this incorrect mapping leads to yperr_string() generating
a very confusing error message. YPERR_DOMAIN says 'couldn't
bind to a server which serves this domain' which is much closer
to the truth.
ypbind.c:
Make fewer assumptions about the state of the dom_alive and dom_broadcasting
flags in rpc_received().
If select() fails, use syslog() to report the error rather than perror().
Check that all our malloc()s succeed. Report malloc() failure in
ypbindproc_setdom_2() to callers.
yplib.c:
Use #defined constants in ypbinderr_string() rather than hard-coded values.
- Moved to a more client-driven model. We aggressively attempt to keep
the default domain bound (as before) but we give up on non-default
domains if we lose contact with a server and fail to get a response
after one round of broadcasting. This helps drastically reduce the
amount of network bandwidth that ypbind consumes: if a client references
the secondary domain at some later point, this will prod ypbind into
establishing a new binding anyway, so continuously broadcasting without
need is pointless.
Note that we still actively seek out a binding for our default domain
even if no client program has queried us yet. I'm not exactly sure if
this matches SunOS's behavior or not, but I decided to do it this way
since we can get into all sorts of trouble if our default domain comes
unbound. Even so, we're still much quieter than we used to be.
- Removed a bunch of no-longer pertinent comments and a couple of
chunks of #if 0'ed code that no longer fit into the new layout.
- Theo de Raadt must have become frustrated with the callback mechanism
in clnt_broadcast(), because he shamelessly stole the clnt_broadcast()
code right out of the RPC library and hacked it up to suit his needs.
(Comments and all! :)
I can understand why: clnt_broadcast() blocks while awaiting replies.
Changing this behavior requires surgery. However, you can work around
this: fork the broadcast into a child process and relay the results
back to the parent via a pipe. (Careful observation has shown that the
SunOS ypbind forks children for broadcasting too, though I can only
guess what sort of interprocess communication it uses. pipe() seems to
do the job well enough.)
This may seem like the long way around, but it's not really that
hard to implement, and I'd prefer to use documented RPC library functions
wherever possible. We're careful to limit the number of simultaneous
broadcasters to avoid swamping the system (the current limit is 5).
Each clnt_broadcast() call only sends out a small number of packets
at increasing intervals. We're also careful not to spawn more than one
broadcaster for a given domain.
- Used clntudp_bufcreate() and clnt_call() to implement a ping()
function for directly querying a particular server so that we can
check if it's still alive. This lets me completely remove the old
broadcasting code and use actual RPC library calls instead, at the
cost of more than a few handfuls of torn-out hair. (Make no mistake
folks: I *HATE* RPC.) Currently, the ping interval is one minute.
- Fixed another potential 'nfds too big for select()' bug: use
_rpc_dtablesize() instead of getdtablesize().
- Quieted gcc -Wall a bit.
- Probably a bunch of other stuff that I've forgotten.
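The fork-and-pipe arrangement for broadcasting mentioned in the list above
can be sketched as follows. All names here are hypothetical, and
blocking_search() merely stands in for the blocking clnt_broadcast() call;
no RPC is performed.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result record relayed from the child to the parent. */
struct bcast_result {
	int	found;		/* nonzero if a server answered */
	int	server_id;	/* stand-in for the server address */
};

/* Stand-in for the blocking broadcast done in the child. */
static struct bcast_result
blocking_search(void)
{
	struct bcast_result r = { 1, 7 };

	return (r);
}

/*
 * Fork a child to run the blocking search and return the read end of a
 * pipe that will carry the result. The child's PID is stored in *child.
 * Returns -1 on failure.
 */
static int
spawn_broadcaster(pid_t *child)
{
	int fds[2];

	if (pipe(fds) == -1)
		return (-1);
	switch (*child = fork()) {
	case -1:
		close(fds[0]);
		close(fds[1]);
		return (-1);
	case 0: {			/* child: search, report, exit */
		struct bcast_result r = blocking_search();

		close(fds[0]);
		if (write(fds[1], &r, sizeof(r)) != (ssize_t)sizeof(r))
			_exit(1);
		_exit(0);
	}
	default:			/* parent: keep only the read end */
		close(fds[1]);
		return (fds[0]);
	}
}

/* Read one result from the pipe and reap the child. Returns 0 on success. */
static int
collect_result(int rfd, pid_t child, struct bcast_result *out)
{
	ssize_t n = read(rfd, out, sizeof(*out));

	close(rfd);
	(void)waitpid(child, NULL, 0);
	return (n == (ssize_t)sizeof(*out) ? 0 : -1);
}
```

A daemon would normally add the returned descriptor to its select() set
rather than block in read(), and would track live children to enforce the
limit on simultaneous broadcasters.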
ypbind.8:
- Updated man page to reflect modifications.
ypwhich.c:
- Small mind-o fix from last time: decode error results from
ypbind correctly (*groan*)
yplib.c:
- same as above
- Change behavior of _yp_dobind() a little: if we get back a 'Domain
not bound' error for a given domain, retry a few times before giving
up and passing the error back to the caller. We have to sleep for a
few seconds between tries since the 'Domain not bound' error comes
back immediately (by repeatedly looping, we end up pounding on ypbind).
We retry at most 20 times at 5 second intervals. This gives us well over
a minute (up to 100 seconds) to get a response. This seems to deviate a bit from SunOS
behavior -- it appears to wait forever -- but I don't like the idea
of perpetually hanging inside a library call.
Note that this should fix the problems some people have with bindings
not being established fast enough at boot time; sometimes amd is started
in /etc/rc after ypbind has run but before it gets a binding set up. The
automounter gets annoyed at this and tends to exit. By pausing the YP
calls until a binding is ready, we avoid this situation.
- Another _yp_dobind() change: if we determine that our binding files
are unlocked or nonexistent, jump directly to code that pokes ypbind
into reestablishing the binding. Again, if it fails, we'll time out
eventually and return.
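The retry policy for 'Domain not bound' described above can be sketched like
this. The enum, the callback type and both function names are hypothetical;
only the 20 attempts and the 5 second interval come from the message. The
interval is a parameter so the loop can be exercised without waiting.

```c
#include <unistd.h>

#define BIND_TRIES	20	/* attempts, per the commit message */
#define BIND_INTERVAL	5	/* seconds between attempts */

enum bind_status { BIND_OK, BIND_NOT_BOUND, BIND_FATAL };	/* hypothetical */

typedef enum bind_status (*bind_fn)(void *arg);

/*
 * Ask for a binding up to BIND_TRIES times. Only the "not bound" answer
 * is retried; it comes back immediately, so sleep between attempts to
 * avoid pounding on the daemon.
 */
static enum bind_status
bind_with_retry(bind_fn ask, void *arg, unsigned int interval)
{
	enum bind_status st = BIND_NOT_BOUND;

	for (int tries = 0; tries < BIND_TRIES; tries++) {
		st = ask(arg);
		if (st != BIND_NOT_BOUND)
			break;
		if (tries + 1 < BIND_TRIES)
			sleep(interval);
	}
	return (st);
}

/* Example callback: reports "not bound" until the counter reaches zero. */
static enum bind_status
bound_after_n(void *arg)
{
	int *remaining = arg;

	return ((*remaining)-- > 0 ? BIND_NOT_BOUND : BIND_OK);
}
```

A production caller would pass BIND_INTERVAL as the interval; once the
attempts are used up, the last status is handed back to the caller instead of
hanging forever.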
ypbind.c: if a client program asks ypbind for the name of the server
for a particular domain, and there isn't a binding for that domain
available yet, ypbind needs to supply a status value along with its
failure message. Set yprespbody.ypbind_error before returning from
a ypbindproc_domain request.
yplib.c: properly handle the error status messages ypbind now has the
ability to send us. Add a ypbinderr_string() function to decode the
error values.
ypwhich.c: handle ypbind errors correctly: yperr_string() can't handle
ypbind_status messages -- use ypbinderr_string instead.
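A decoder of the kind described for ypbinderr_string() might look like the
sketch below. The enum names, their values and most of the message strings
are illustrative stand-ins, not the constants from the YP headers; only
'Domain not bound' is quoted elsewhere in this log.

```c
#include <stdio.h>

/*
 * Stand-ins for the ypbind error codes. The real constants come from the
 * YP protocol headers; these names and values are illustrative only.
 */
enum demo_ypbind_err {
	DEMO_YPBIND_OK = 0,
	DEMO_YPBIND_INTERNAL,
	DEMO_YPBIND_NOSERV,
	DEMO_YPBIND_RESOURCE
};

/* Translate a ypbind status code into a human-readable message. */
static const char *
demo_ypbinderr_string(int code)
{
	static char unknown[64];

	switch (code) {
	case DEMO_YPBIND_OK:
		return ("Success");
	case DEMO_YPBIND_INTERNAL:
		return ("Internal ypbind error");
	case DEMO_YPBIND_NOSERV:
		return ("Domain not bound");
	case DEMO_YPBIND_RESOURCE:
		return ("System resource allocation failure");
	default:
		snprintf(unknown, sizeof(unknown),
		    "Unknown ypbind error: #%d", code);
		return (unknown);
	}
}
```

Keeping this separate from yperr_string() matters because the two sets of
codes overlap numerically but mean different things.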