Random number generator initialization cleanup:
- Introduce new SI_SUB_RANDOM point in boot sequence to make it
clear from where one may start using random(9). It should be as
early as possible, so place it just after SI_SUB_CPU where we
have some randomness on most platforms via get_cyclecount().
- Move stack protector initialization to be after SI_SUB_RANDOM
as before this point we have no randomness at all. This fixes
stack protector to actually protect stack with some random guard
value instead of a well-known one.
Note that this patch doesn't try to address arc4random(9) issues.
With current code, it will be implicitly seeded by stack protector
and hence will get the same entropy as random(9). It will be
securely reseeded once /dev/random is feeded by some entropy from
userland.
Submitted by: Maxim Dounin <mdounin@mdounin.ru>
Approved by: re (kib)
Close a race with caching of -ve name lookups in the NFS client.
Specifically, clients only trust -ve cache entries while the directory
remains unchanged and discard any -ve cache entries for a directory when
they notice that the modification time of a directory entry changes. The
race involves two concurrent lookups as follows:
- Thread A does a lookup for file 'foo' which sends a lookup RPC to the
server. The lookup fails and the server replies.
- The 'foo' file is created (either by the same client or a different
client) updating the modification time on the parent directory of 'foo'.
- Thread B does a lookup for a different file 'bar' which updates the
cached attributes of the parent directory of 'foo' to reflect the new
modification time after 'foo' was created.
- Thread A finally resumes execution to parse the reply from the NFS
server. It adds a -ve cache entry and sets the cached value of the
directory's modification time that is used for invalidating -ve cached
lookups to the new modification time set by thread B.
At this point, future lookups of 'foo' will honor the -ve cached entry
until the cached entry is pushed out of the name cache's LRU or the
modification time of the parent directory is changed again by some other
change. The fix is to read the directory's modification time before
sending the lookup RPC and use that cached modification time when setting
the directory's cached modification time. Also, we do not add a -ve cache
entry if another thread has added -ve cache entry that set the directory's
cached modification time to a newer value than the value we read before
sending the lookup RPC.
Approved by: re (kib)
The flow-table function flowtable_route_flush() may be called
during system initialization time. Since the flow-table is
designed to maintain per CPU flow cache, the existing code
did not check whether "smp_started" is true before calling
sched_bind() and sched_unbind(), which triggers a page fault.
Reviewed by: jeff
Approved by: re
Change from CAM_TID_INVALID to CAM_SEL_TIMEOUT error code when the usb device
has been yanked, this works around a cam recounting bug when
CAM_DEV_UNCONFIGURED is set late in the detach. In certain conditions the
reference to the XPT device would not be released which would cause the usb
explore thread to sleep forever on "simfree", preventing any new usb devices to
be found/ejected on the bus.
Approved by: re (kib)
Remove spurious call to priv_check(PRIV_VM_SWAP_NOQUOTA).
Call priv_check(PRIV_VM_SWAP_NORLIMIT) only when per-uid limit is
actually exceed.
Approved by: re (kensmith)
In the ARP callout timer expiration function, the current time_second
is compared against the entry expiration time value (that was set based
on time_second) to check if the current time is larger than the set
expiration time. Due to the +/- timer granularity value, the comparison
returns false, causing the alternative code to be executed. The
alternative code path freed the memory without removing that entry
from the table list, causing a use-after-free bug.
Reviewed by: discussed with kmacy
Approved by: re
Verified by: rnoland, yongari
fixes a TX hang bug that it could happen when if_start callback didn't
be restarted by full of the output queue.
Tested by: bsduser <bsd at acd.homelinux.org>
MFC r198099:
fixes a TX hang that could be possible to happen when the trasfers are
in the high speed that some drivers don't call if_start callback after
marking ~IFF_DRV_OACTIVE.
Approved by: re (kib)
This patch fixes the following issues in the ARP operation:
1. There is a regression issue in the ARP code. The incomplete
ARP entry was timing out too quickly (1 second timeout), as
such, a new entry is created each time arpresolve() is called.
Therefore the maximum attempts made is always 1. Consequently
the error code returned to the application is always 0.
2. Set the expiration of each incomplete entry to a 20-second
lifetime.
3. Return "incomplete" entries to the application.
4. The return error code was incorrect.
Reviewed by: kmacy
Approved by: re
of problems on non-DELL branded machines with IPMI
support. The proposed fix was committed to HEAD but has
not received much test coverage yet.
Discussed with: bz
Approved by: re (kensmith)
Map PIE binaries at non-zero base address.
MFC r198202:
Honour non-zero mapbase for PIE binaries. Inform interpreter-less PIE
binary about its relocbase.
Approved by: re (kensmith)
Define architectural load bases for PIE binaries.
MFC r198203 (by marius):
Change load base for sparc to match default gcc memory layout model.
Approved by: re (kensmith)
Use zfs_read() instead of xfsread() to read /boot.config. xfsread() fails
short read requests, so the result was that a /boot.config smaller than 512
bytes was ignored. boot2 uses fsread() instead of xfsread() to read
/boot.config already, so this makes zfsboot more like boot2.
Approved by: re (kib)
In function do_rw_wrlock, when a writer got an error and before returning,
check if there are readers blocked by us via URWLOCK_WRITE_WAITERS flag,
and resume the readers. The error must be EAGAIN, otherwise there must
have memory problem, and nobody can rescue the buggy application.
Approved by: re (kib), davidxu
Refine r195509, instead of checking that vnode type is VBAD, that is
set quite late in the revocation path, properly verify that vnode is
not doomed before calling VOP.
Approved by: re (bz)
If provider is open for writing when we taste it, skip it for classes that
depend on on-disk metadata. This was we won't attach to providers that are used
by other classes. For example we don't want to configure partitions on da0 if
it is part of gmirror, what we really want is partitions on mirror/foo.
During regular work it works like this: if provider is open for writing a class
receives the spoiled event from GEOM and detaches, once provider is closed the
taste event is send again and class can rediscover its metadata if it is still
there. This doesn't work that way when new class arrives, because GEOM gives
all existing providers for it to taste, also those open for writing. Classes
have to decided on their own if they want to deal with such providers (eg.
geom_dev) or not (classes modified by this commit).
Reported by: des, Oliver Lehmann <lehmann@ans-netz.de>
Tested by: des, Oliver Lehmann <lehmann@ans-netz.de>
Discussed with: phk, marcel
Reviewed by: marcel
Approved by: re (kib)
r197831:
Fix situation where Mac OS X NFS client creates a file and when it tries
to set ownership and mode in the same setattr operation, the mode was
overwritten by secpolicy_vnode_setattr().
PR: kern/118320
Submitted by: Mark Thompson <info-gentoo@mark.thompson.bz>
r197842:
Fix white-spaces.
r197843:
On FreeBSD it is enough to report provider removal when orphan event is
received, we don't have to do it on every ENXIO error in I/O path.
Solaris has no GEOM so they have to handle it in a less clean way.
r197860:
File system owner is when uid matches and jail matches.
r197861:
Allow file system owner to modify system flags if securelevel permits.
Approved by: re (kib)
Per their definition, atomic instructions used in conjuction with
memory barriers should also ensure that the compiler doesn't reorder paths
where they are used. GCC, however, does that aggressively, even in
presence of volatile operands. The most reliable way GCC offers for avoid
instructions reordering is clobbering "memory".
Not all our memory barriers, right now, clobber memory for GCC-like
compilers.
Fix these cases.
Approved by: re (kib)
When releasing a read/shared lock we need to use a write memory barrier
in order to avoid, on architectures which doesn't have strong ordered
writes, CPU instructions reordering.
Approved by: re (kib)
Fix RTS/CTS flow control, broken by the TTY overhaul. The new TTY
interface is fairly simple WRT dealing with flow control, but
needed 2 new RX buffer functions with "get-char-from-buf" separated
from "advance-buf-pointer" so that the pointer could be advanced
only when ttydisc_rint() succeeded.
Approved by: re (kib)
Remove tcp_input lock statistics; these are intended for debugging only
and are not intended to ship in 8.0 as they dirty additional cache
lines in a performance-critical per-packet path.
Approved by: re (kib, bz)
In tcp_input(), we acquire a global write lock at first only if a
segment is likely to trigger a TCP state change (i.e., FIN/RST/SYN).
If we later have to upgrade the lock, we acquire an inpcb reference
and drop both global/inpcb locks before reacquiring in-order. In
that gap, the connection may transition into TIMEWAIT, so we need
to loop back and reevaluate the inpcb after relocking.
Reported by: Kamigishi Rei <spambox at haruhiism.net>
Reviewed by: bz
Approved by: re (kib)
unifdef NFSCLIENT because the nlm depends on the nfsclient even if NFSCLIENT
is not defined.
Now the nfslockd module works with the nfsclient module.
Reviewed by: kib
Approved by: re (kensmith)
Remove a log message from production code. This log message can be
triggered by a misconfigured host that is sending out gratuious ARPs.
This log message can also be triggered during a network renumbering
event when multiple prefixes co-exist on a single network segment.
Approved by: re
Previously, if an address alias is configured on an interface, and
this address alias has a prefix matching that of another address
configured on the same interface, then the ARP entry for the alias
is not deleted from the ARP table when that address alias is removed.
This patch fixes the aforementioned issue.
PR: kern/139113
Reviewed by: bz
Approved by: re
The flow-table associates TCP/UDP flows and IP destinations with
specific routes. When the routing table changes, for example,
when a new route with a more specific prefix is inserted into the
routing table, the flow-table is not updated to reflect that change.
As such existing connections cannot take advantage of the new path.
In some cases the path is broken. This patch will update the affected
flow-table entries when a more specific route is added. The route
entry is properly marked when a route is deleted from the table.
In this case, when the flow-table performs a search, the stale
entry is updated automatically. Therefore this patch is not
necessary for route deletion.
Reviewed by: bz, kmacy
Approved by: re
Fix some unexpected potential NULL de-references in kernel mode due to
usage of pre-8.0 wifi operations with the ndis driver wrapping a Win32/64
wifi driver.
Submitted by: Paul B Mahol <onemda@gmail.com>
Approved by: re
Use __NO_STRICT_ALIGNMENT to determine whether de(4) have to apply
alignment fixup code for received frames on strict alignment
architectures.
MFC r197463:
Consistently use bus_addr_t.
MFC r197464:
Destroy dmamap in dma cleanup.
MFC r197465:
Align Tx/Rx descriptors on 32 bytes boundary instead of PAGE_SIZE.
Also align setup descriptor on 32 bytes boundary. Tx buffer have no
alignment limitation so create dmamap without alignment
restriction[1]. Rx buffer still seems to require 4 bytes alignment
limitation but we can simply use MCLBYTES for size to map the
buffer instead of TULIP_DATA_PER_DESC as the buffer is allocated
with m_getcl(9).
de(4) supports up to TULIP_MAX_TXSEG segments for Tx buffers,
increase maximum dma segment size to TULIP_MAX_TXSEG * MCLBYTES.
While I'm here remove TULIP_DATA_PER_DESC as it is not used anymore.
This should fix de(4) breakage introduced after r176206.
Submitted by: jhb [1]
Reported by: WATANABE Kazuhiro < CQG00620 <> nifty dot ne dot jp >
Tested by: WATANABE Kazuhiro < CQG00620 <> nifty dot ne dot jp >,
Takahashi Yoshihiro < nyan <> jp dot freebsd dot org >
Approved by: re (kib)
Two more mxge watchdog fixes
1) Restore the PCI Express control register after a watchdog
reset. This is required because the device will come out
of watchdog reset with the pectl reg at its default state,
and important BIOS configuration (like max payload size)
could be lost.
2) Call mxge_start_locked() for every tx queue before dropping
the lock in the watchdog handler. This is required, as
the queue's buf ring may have filled during the reset.
Approved by: re (kib)