MFdragonfly: resolver fix for timeouts on unqualified hostnames

res_search only incremented got_servfail for h_errno == TRY_AGAIN *AND*
  hp->rcode == SERVFAIL.  However, there are cases such as timeouts where
  rcode is not always set to SERVFAIL.  This leads to inconsistent nameserver
  operation during multi-domain and truncated dot searches, especially during
  booting when portions of the network are being brought up simultaneously
  with DNS lookups.

  This patch attempts to correct the problem by unconditionally terminating
  the search if TRY_AGAIN is returned (after res_query has gone through all
  retries and name servers) instead of trying other domain elements in the
  domain search path.

  This patch should fix reported problems (which I can reproduce) with some
  NFS mounts failing during boot.  This occurred because mount_nfs thought the
  host name lookup returned a definitive failure using a non-dotted host name
  when, in fact, it timed out on the first part (host.search.domain.name) and
  got a definitive host-not-found response on the second part (host.).

  Generally speaking, search path name server timeouts can exceed 60 seconds
  per element, and most machines which consistently time out on earlier
  portions of a search path are effectively non-operational due to the imposed
  delays.  It is more important for DNS lookups to return the proper error
  code than
  to be able to recover a valid lookup in later portions of the search path
  in these situations.

Obtained from:	DragonFly
MFC after:	3 weeks
Nate Lawson 2004-04-21 00:56:38 +00:00
parent 75988358a2
commit 1cc11684ac

@@ -273,11 +273,24 @@ res_search(name, class, type, answer, anslen)
 				/* keep trying */
 				break;
 			case TRY_AGAIN:
-				if (hp->rcode == SERVFAIL) {
-					/* try next search element, if any */
-					got_servfail++;
-					break;
-				}
+				/*
+				 * This can occur due to a server failure
+				 * (that is, all listed servers have failed),
+				 * or all listed servers have timed out.
+				 * hp->rcode may not be set to SERVFAIL in the
+				 * case of a timeout.
+				 *
+				 * Either way we must terminate the search
+				 * and return TRY_AGAIN in order to avoid
+				 * non-deterministic return codes.  For
+				 * example, loaded name servers or races
+				 * against network startup/validation (dhcp,
+				 * ppp, etc) can cause the search to timeout
+				 * on one search element, e.g. 'fu.bar.com',
+				 * and return a definitive failure on the
+				 * next search element, e.g. 'fu.'.
+				 */
+				++got_servfail;
 				/* FALLTHROUGH */
 			default:
 				/* anything else implies that we're done */