Import two PCI quirks from Linux
- Add quirk for ATI SB600 and SB700 to free SMB controller
- Correct the schedule sleep time to 10us on the VIA EHCI controller
We used to force all of the GPIO pins low first and then
enable the ones we want. This has been changed to better
match ADMtek's reference design and to avoid setting the
power-down configuration line of the PHY at the same time
it is reset.
- FIFOs are always opened separately in the read and write directions, even if the
actual device is opened for read and write. Fix the fflags check so that the UFM
and URIO drivers work.
Extract the code to find and map the MADT ACPI table during early kernel
startup and genericize it so it can be reused to map other tables as well:
- Add a routine to walk a list of ACPI subtables such as those used in the
APIC and SRAT tables in the MI acpi(4) driver.
- Move the routines for mapping and unmapping an ACPI table as well as
mapping the RSDT or XSDT and searching for a table with a given signature
out into acpica_machdep.c for both amd64 and i386.
Split the 'video' ACPI lock up into two locks to resolve a LOR with the
sysctl lock. The 'video' lock now protects the 'bus' of video output
devices attached to a graphics adapter. It is used when iterating over
the list of outputs, etc. The 'video_output' lock is used to lock the
output-specific data similar to a driver lock for the individual video
outputs.
Extend the device pager to support different memory attributes on different
pages in an object.
- Add a new variant of d_mmap() currently called d_mmap2() which accepts
an additional in/out parameter that is the memory attribute to use for
the requested page.
- A driver either uses d_mmap() or d_mmap2() for all requests but not both.
The current implementation uses a flag in the cdevsw (D_MMAP2) to indicate
that the driver provides a d_mmap2() handler instead of d_mmap(). This
is done to make the change ABI compatible with existing drivers and
MFC'able to 7 and 8.
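The flag-based dispatch described above can be sketched in userland C. This is a hypothetical model, not the kernel code: the types and constants ending in _sk, the union that lets both handlers share one cdevsw slot, and the two driver functions are assumptions made for illustration. It shows a driver choosing one handler style via the flag, and the pager defaulting the memory attribute so that old-style drivers behave as before.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are not the kernel's real types or values. */
typedef uint64_t paddr_sk;
typedef int memattr_sk;
#define D_MMAP2_SK       0x1   /* plays the role of the D_MMAP2 flag */
#define MEMATTR_DEFAULT  0
#define MEMATTR_WC_SK    2     /* e.g. a write-combining attribute */

typedef int d_mmap_sk(uint64_t off, paddr_sk *pa, int nprot);
typedef int d_mmap2_sk(uint64_t off, paddr_sk *pa, int nprot, memattr_sk *ma);

struct cdevsw_sk {
    int d_flags;
    union {                 /* assumption: both handlers share one slot */
        d_mmap_sk *d_mmap;
        d_mmap2_sk *d_mmap2;
    } h;
};

/* Device-pager side: pick the handler according to the flag. */
static int
dev_pager_lookup(const struct cdevsw_sk *csw, uint64_t off, paddr_sk *pa,
    memattr_sk *ma)
{
    *ma = MEMATTR_DEFAULT;  /* in/out: old-style drivers keep the default */
    if (csw->d_flags & D_MMAP2_SK)
        return csw->h.d_mmap2(off, pa, 1, ma);
    return csw->h.d_mmap(off, pa, 1);
}

/* An existing driver, unchanged. */
static int old_mmap(uint64_t off, paddr_sk *pa, int nprot)
{
    (void)nprot;
    *pa = 0x1000 + off;
    return 0;
}

/* A new driver that wants a different attribute for its upper pages. */
static int new_mmap2(uint64_t off, paddr_sk *pa, int nprot, memattr_sk *ma)
{
    (void)nprot;
    *pa = 0x2000 + off;
    if (off >= 0x100)
        *ma = MEMATTR_WC_SK;
    return 0;
}

static const struct cdevsw_sk old_csw = { .d_flags = 0, .h.d_mmap = old_mmap };
static const struct cdevsw_sk new_csw = { .d_flags = D_MMAP2_SK,
    .h.d_mmap2 = new_mmap2 };
```

Because an old driver never sets the flag, its cdevsw is interpreted exactly as before.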
Random number generator initialization cleanup:
- Introduce new SI_SUB_RANDOM point in boot sequence to make it
clear from where one may start using random(9). It should be as
early as possible, so place it just after SI_SUB_CPU where we
have some randomness on most platforms via get_cyclecount().
- Move stack protector initialization to be after SI_SUB_RANDOM
as before this point we have no randomness at all. This fixes
stack protector to actually protect stack with some random guard
value instead of a well-known one.
Note that this patch doesn't try to address arc4random(9) issues.
With current code, it will be implicitly seeded by stack protector
and hence will get the same entropy as random(9). It will be
securely reseeded once /dev/random is fed some entropy from
userland.
Submitted by: Maxim Dounin <mdounin@mdounin.ru>
Approved by: re (kib)
Close a race with caching of negative (-ve) name lookups in the NFS client.
Specifically, clients only trust -ve cache entries while the directory
remains unchanged, and discard any -ve cache entries for a directory when
they notice that the directory's modification time has changed. The
race involves two concurrent lookups as follows:
- Thread A does a lookup for file 'foo' which sends a lookup RPC to the
server. The lookup fails and the server replies.
- The 'foo' file is created (either by the same client or a different
client) updating the modification time on the parent directory of 'foo'.
- Thread B does a lookup for a different file 'bar' which updates the
cached attributes of the parent directory of 'foo' to reflect the new
modification time after 'foo' was created.
- Thread A finally resumes execution to parse the reply from the NFS
server. It adds a -ve cache entry and sets the cached value of the
directory's modification time that is used for invalidating -ve cached
lookups to the new modification time set by thread B.
At this point, future lookups of 'foo' will honor the -ve cached entry
until the cached entry is pushed out of the name cache's LRU or the
modification time of the parent directory is changed again by some other
change. The fix is to read the directory's modification time before
sending the lookup RPC and to use that value when setting the directory's
cached modification time. Also, we do not add a -ve cache entry if another
thread has already added a -ve cache entry that set the directory's cached
modification time to a newer value than the one we read before sending
the lookup RPC.
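A minimal single-threaded replay of the race, assuming a simplified one-directory model (struct nfs_dir and its fields are hypothetical, not the NFS client's real structures). The interleaving is simulated by running the steps in the order listed above; the buggy variant honours the stale entry, while the fixed one, which uses the mtime read before the RPC, rejects it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, single-directory model of the NFS client's -ve cache. */
struct nfs_dir {
    long attr_mtime;  /* cached directory mtime, refreshed by any reply */
    long neg_mtime;   /* mtime that validates the -ve entries */
    bool neg_foo;     /* is there a -ve entry for "foo"? */
};

/* Fixed version: mtime_before_rpc was read before sending the lookup RPC. */
static void add_negative(struct nfs_dir *dp, long mtime_before_rpc)
{
    /* Another thread already recorded a newer mtime: don't add ours. */
    if (dp->neg_mtime > mtime_before_rpc)
        return;
    dp->neg_mtime = mtime_before_rpc;
    dp->neg_foo = true;
}

/* Buggy version: uses whatever mtime is cached when the reply is parsed,
   which another thread may already have refreshed. */
static void add_negative_buggy(struct nfs_dir *dp)
{
    dp->neg_mtime = dp->attr_mtime;
    dp->neg_foo = true;
}

/* A -ve entry is honoured only while the directory looks unchanged. */
static bool negative_hit(struct nfs_dir *dp)
{
    if (!dp->neg_foo)
        return false;
    if (dp->attr_mtime != dp->neg_mtime) {
        dp->neg_foo = false;  /* directory changed: purge */
        return false;
    }
    return true;
}
```

In the replay, thread A saves attr_mtime, then the creation of 'foo' and thread B's lookup bump attr_mtime, and only then does A record its -ve entry.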
Approved by: re (kib)
The flow-table function flowtable_route_flush() may be called
during system initialization. Because the flow-table maintains a
per-CPU flow cache, the function calls sched_bind() and sched_unbind();
the existing code did not check whether "smp_started" was true before
doing so, which triggers a page fault.
Reviewed by: jeff
Approved by: re
Change from the CAM_TID_INVALID to the CAM_SEL_TIMEOUT error code when the USB
device has been yanked. This works around a CAM refcounting bug when
CAM_DEV_UNCONFIGURED is set late in the detach. In certain conditions the
reference to the XPT device would not be released, which would cause the USB
explore thread to sleep forever on "simfree", preventing any new USB devices
from being found or ejected on the bus.
Approved by: re (kib)
Remove spurious call to priv_check(PRIV_VM_SWAP_NOQUOTA).
Call priv_check(PRIV_VM_SWAP_NORLIMIT) only when the per-uid limit is
actually exceeded.
Approved by: re (kensmith)
In the ARP callout timer expiration function, the current time_second
is compared against the entry expiration time value (that was set based
on time_second) to check if the current time is larger than the set
expiration time. Due to timer granularity, the comparison can return
false, causing the alternative code path to be executed. The
alternative code path freed the memory without removing that entry
from the table list, causing a use-after-free bug.
Reviewed by: discussed with kmacy
Approved by: re
Verified by: rnoland, yongari
Fix a TX hang that could occur when the if_start callback was not
restarted after the output queue filled up.
Tested by: bsduser <bsd at acd.homelinux.org>
MFC r198099:
Fix a TX hang that could occur at high transfer rates, because some
drivers don't call the if_start callback after clearing IFF_DRV_OACTIVE.
Approved by: re (kib)
This patch fixes the following issues in the ARP operation:
1. There is a regression in the ARP code. The incomplete
ARP entry was timing out too quickly (1-second timeout); as
a result, a new entry was created each time arpresolve() was called.
Therefore the maximum number of attempts made was always 1, and
consequently the error code returned to the application was always 0.
2. Set the expiration of each incomplete entry to a 20-second
lifetime.
3. Return "incomplete" entries to the application.
4. The return error code was incorrect.
Reviewed by: kmacy
Approved by: re
of problems on non-DELL branded machines with IPMI
support. The proposed fix was committed to HEAD but has
not received much test coverage yet.
Discussed with: bz
Approved by: re (kensmith)
Map PIE binaries at non-zero base address.
MFC r198202:
Honour non-zero mapbase for PIE binaries. Inform interpreter-less PIE
binary about its relocbase.
Approved by: re (kensmith)
Define architectural load bases for PIE binaries.
MFC r198203 (by marius):
Change load base for sparc to match default gcc memory layout model.
Approved by: re (kensmith)
Use zfs_read() instead of xfsread() to read /boot.config. xfsread() fails
short read requests, so the result was that a /boot.config smaller than 512
bytes was ignored. boot2 uses fsread() instead of xfsread() to read
/boot.config already, so this makes zfsboot more like boot2.
Approved by: re (kib)
In do_rw_wrlock(), when a writer gets an error, check before returning
whether there are readers blocked by us via the URWLOCK_WRITE_WAITERS flag,
and resume those readers. The error must be EAGAIN; otherwise there is a
memory problem, and nothing can rescue the buggy application.
Approved by: re (kib), davidxu
Refine r195509, instead of checking that vnode type is VBAD, that is
set quite late in the revocation path, properly verify that vnode is
not doomed before calling VOP.
Approved by: re (bz)
If a provider is open for writing when we taste it, skip it for classes that
depend on on-disk metadata. This way we won't attach to providers that are used
by other classes. For example, we don't want to configure partitions on da0 if
it is part of a gmirror; what we really want is partitions on mirror/foo.
During regular operation it works like this: if a provider is open for writing, a
class receives the spoiled event from GEOM and detaches; once the provider is
closed, the taste event is sent again and the class can rediscover its metadata
if it is still there. This doesn't work that way when a new class arrives,
because GEOM gives all existing providers to it for tasting, including those open
for writing. Classes have to decide on their own whether they want to deal with
such providers (e.g. geom_dev) or not (the classes modified by this commit).
Reported by: des, Oliver Lehmann <lehmann@ans-netz.de>
Tested by: des, Oliver Lehmann <lehmann@ans-netz.de>
Discussed with: phk, marcel
Reviewed by: marcel
Approved by: re (kib)
r197831:
Fix situation where Mac OS X NFS client creates a file and when it tries
to set ownership and mode in the same setattr operation, the mode was
overwritten by secpolicy_vnode_setattr().
PR: kern/118320
Submitted by: Mark Thompson <info-gentoo@mark.thompson.bz>
r197842:
Fix white-spaces.
r197843:
On FreeBSD it is enough to report provider removal when orphan event is
received, we don't have to do it on every ENXIO error in I/O path.
Solaris has no GEOM so they have to handle it in a less clean way.
r197860:
A user is the file system owner when both the uid and the jail match.
r197861:
Allow file system owner to modify system flags if securelevel permits.
Approved by: re (kib)
Per their definition, atomic instructions used in conjunction with
memory barriers should also ensure that the compiler doesn't reorder paths
where they are used. GCC, however, does that aggressively, even in the
presence of volatile operands. The most reliable way GCC offers to avoid
instruction reordering is clobbering "memory".
Not all our memory barriers, right now, clobber memory for GCC-like
compilers.
Fix these cases.
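For illustration, a GCC/Clang compiler barrier is an empty asm statement with a "memory" clobber. The macro and function names below are hypothetical; the sketch shows only the compiler-level effect described here and emits no CPU fence. A hardware barrier would put its fence instruction inside the same kind of asm statement, with the same clobber.

```c
#include <assert.h>

/* Empty asm with a "memory" clobber: the compiler must assume memory may
   have changed, so it cannot cache values in registers across this point
   or move loads/stores past it. No instruction is emitted. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

static int payload;
static volatile int ready;

static void publish(int v)
{
    payload = v;
    compiler_barrier();   /* keep the payload store before the flag store */
    ready = 1;
}
```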
Approved by: re (kib)
When releasing a read/shared lock we need to use a write memory barrier
in order to avoid CPU instruction reordering on architectures that don't
have strongly ordered writes.
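The same idea expressed portably with C11 atomics, as a toy reader count rather than FreeBSD's rwlock implementation (the variable and function names are hypothetical). Release ordering on the decrement is the C11 analogue of the barrier described above; it is somewhat stronger, since it keeps both the loads and the stores of the critical section from moving past the unlock.

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy reader count; an illustration, not a complete rwlock. */
static atomic_uint readers;
static int shared_value = 7;

static int read_lock_and_load(void)
{
    atomic_fetch_add_explicit(&readers, 1, memory_order_acquire);
    return shared_value;
}

static void read_unlock(void)
{
    /* Release: accesses in the critical section above cannot be reordered
       after this decrement, so a writer that sees the count drop (with
       acquire ordering) cannot overwrite data we are still reading. */
    atomic_fetch_sub_explicit(&readers, 1, memory_order_release);
}
```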
Approved by: re (kib)
Fix RTS/CTS flow control, broken by the TTY overhaul. The new TTY
interface is fairly simple WRT dealing with flow control, but
needed 2 new RX buffer functions with "get-char-from-buf" separated
from "advance-buf-pointer" so that the pointer could be advanced
only when ttydisc_rint() succeeded.
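The split between reading a character and advancing the pointer can be sketched with a small ring buffer. The names rx_get(), rx_advance(), rx_drain() and fake_rint() are hypothetical; fake_rint() stands in for ttydisc_rint() and refuses input once its room runs out. A rejected character stays in the buffer instead of being dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RXBUF_SIZE 8

struct rxbuf {
    unsigned char data[RXBUF_SIZE];
    size_t head, tail;   /* head: next to read, tail: next to write */
};

/* "get-char-from-buf": peek without consuming. */
static bool rx_get(const struct rxbuf *b, unsigned char *c)
{
    if (b->head == b->tail)
        return false;
    *c = b->data[b->head];
    return true;
}

/* "advance-buf-pointer": consume the character returned by rx_get(). */
static void rx_advance(struct rxbuf *b)
{
    b->head = (b->head + 1) % RXBUF_SIZE;
}

/* Hypothetical stand-in for ttydisc_rint(): fails when out of room. */
static int room;
static int fake_rint(unsigned char c)
{
    (void)c;
    if (room == 0)
        return -1;
    room--;
    return 0;
}

static size_t rx_drain(struct rxbuf *b)
{
    unsigned char c;
    size_t n = 0;

    while (rx_get(b, &c)) {
        if (fake_rint(c) != 0)
            break;           /* TTY is full: leave c in the buffer */
        rx_advance(b);       /* advance only after rint succeeded */
        n++;
    }
    return n;
}
```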
Approved by: re (kib)
Remove tcp_input lock statistics; these are intended for debugging only
and are not intended to ship in 8.0 as they dirty additional cache
lines in a performance-critical per-packet path.
Approved by: re (kib, bz)
In tcp_input(), we acquire a global write lock at first only if a
segment is likely to trigger a TCP state change (i.e., FIN/RST/SYN).
If we later have to upgrade the lock, we acquire an inpcb reference
and drop both global/inpcb locks before reacquiring in-order. In
that gap, the connection may transition into TIMEWAIT, so we need
to loop back and reevaluate the inpcb after relocking.
Reported by: Kamigishi Rei <spambox at haruhiism.net>
Reviewed by: bz
Approved by: re (kib)
unifdef NFSCLIENT because the nlm depends on the nfsclient even if NFSCLIENT
is not defined.
Now the nfslockd module works with the nfsclient module.
Reviewed by: kib
Approved by: re (kensmith)
Remove a log message from production code. This log message can be
triggered by a misconfigured host that is sending out gratuitous ARPs.
This log message can also be triggered during a network renumbering
event when multiple prefixes co-exist on a single network segment.
Approved by: re
Previously, if an address alias is configured on an interface, and
this address alias has a prefix matching that of another address
configured on the same interface, then the ARP entry for the alias
is not deleted from the ARP table when that address alias is removed.
This patch fixes the aforementioned issue.
PR: kern/139113
Reviewed by: bz
Approved by: re
The flow-table associates TCP/UDP flows and IP destinations with
specific routes. When the routing table changes, for example,
when a new route with a more specific prefix is inserted into the
routing table, the flow-table is not updated to reflect that change.
As such, existing connections cannot take advantage of the new path.
In some cases the path is broken. This patch will update the affected
flow-table entries when a more specific route is added. The route
entry is properly marked when a route is deleted from the table.
In this case, when the flow-table performs a search, the stale
entry is updated automatically. Therefore this patch is not
necessary for route deletion.
Reviewed by: bz, kmacy
Approved by: re
Fix some unexpected potential NULL de-references in kernel mode due to
usage of pre-8.0 wifi operations with the ndis driver wrapping a Win32/64
wifi driver.
Submitted by: Paul B Mahol <onemda@gmail.com>
Approved by: re
Use __NO_STRICT_ALIGNMENT to determine whether de(4) has to apply
alignment fixup code for received frames on strict-alignment
architectures.
MFC r197463:
Consistently use bus_addr_t.
MFC r197464:
Destroy dmamap in dma cleanup.
MFC r197465:
Align Tx/Rx descriptors on a 32-byte boundary instead of PAGE_SIZE.
Also align the setup descriptor on a 32-byte boundary. Tx buffers have no
alignment limitation, so create their dmamap without an alignment
restriction[1]. Rx buffers still seem to require 4-byte alignment,
but we can simply use MCLBYTES as the size to map the
buffer instead of TULIP_DATA_PER_DESC, as the buffer is allocated
with m_getcl(9).
de(4) supports up to TULIP_MAX_TXSEG segments for Tx buffers,
increase maximum dma segment size to TULIP_MAX_TXSEG * MCLBYTES.
While I'm here remove TULIP_DATA_PER_DESC as it is not used anymore.
This should fix de(4) breakage introduced after r176206.
Submitted by: jhb [1]
Reported by: WATANABE Kazuhiro < CQG00620 <> nifty dot ne dot jp >
Tested by: WATANABE Kazuhiro < CQG00620 <> nifty dot ne dot jp >,
Takahashi Yoshihiro < nyan <> jp dot freebsd dot org >
Approved by: re (kib)
Two more mxge watchdog fixes
1) Restore the PCI Express control register after a watchdog
reset. This is required because the device will come out
of watchdog reset with the pectl reg at its default state,
and important BIOS configuration (like max payload size)
could be lost.
2) Call mxge_start_locked() for every tx queue before dropping
the lock in the watchdog handler. This is required, as
the queue's buf ring may have filled during the reset.
Approved by: re (kib)
EHCI Hardware BUG workaround
The EHCI HW can use the qtd_next field instead of qtd_altnext when a short
packet is received. This contradicts what is stated in the EHCI datasheet.
Also the total-bytes field in the status field of the following TD gets
corrupted upon reception of a short packet! We work around this in software by
not queueing more than one job/TD at a time, of up to 16Kbytes! The bug has been
seen on multiple Intel-based EHCI chips. Other vendors have not been tested
yet.
- Applications using /dev/usb/X.Y.Z, where Z is non-zero, are affected, but not
applications using LibUSB v0.1, v1.2 and v2.0.
- Mass Storage (umass) is affected.
Approved by: re (kib)
As a workaround, for Intel CPUs, do not use CLFLUSH in
pmap_invalidate_cache_range() when self-snoop is apparently not reported
in cpu features.
Approved by: re (bz, kensmith)
Return EOPNOTSUPP instead of EINVAL when doing chflags(2) over an old
format ZFS, as defined in the manual page.
Submitted by: pjd (response of my original patch but bugs are mine)
Approved by: re (kib)
Add no zero mapping feature.
NOTE: Unlike in the other branches where this change will be "merged"
to, the 'no zero mapping' is enabled by default in stable/8.
Errata: FreeBSD-EN-09:05.null
Approved by: re (kib)
Add '#define NFSCLIENT' into opt_nfs.h if the NFSCLIENT variable is 1
(the default is 1).
This makes the nfslockd module work for the NFS client.
Reviewed by: dfr
Approved by: re (kib)
r197512, r197513, r197514, r197515, r197525:
r197287:
Purge namecache for the file system being rolled back, so it doesn't point at
invalid vnodes after the rollback resulting in EIO errors when trying to access
files which are in the namecache.
Reported by: des
r197289:
Purge file system namecache when receiving incremental stream and rolling back
to it.
r197351:
Purge namecache in the same place OpenSolaris does.
r197426:
Restore BSD behaviour - when creating new directory entry use parent directory
gid to set group ownership and not process gid.
This was overlooked during v6 -> v13 switch.
PR: kern/139076
Reported by: Sean Winn <sean@gothic.net.au>
r197458:
Close race in zfs_zget(). We have to increase usecount first and then
check for VI_DOOMED flag. Before this change vnode could be reclaimed
between checking for the flag and increasing usecount.
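The ordering fix can be sketched with C11 atomics. struct vnode_sketch and its doomed flag are hypothetical stand-ins for a vnode and VI_DOOMED; the sketch shows only the order of operations (take the reference first, then check the flag), not the real vnode locking.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a vnode. */
struct vnode_sketch {
    atomic_int usecount;
    atomic_bool doomed;     /* plays the role of VI_DOOMED */
};

/* Returns true with a reference held, or false if the vnode is doomed. */
static bool vget_sketch(struct vnode_sketch *vp)
{
    atomic_fetch_add(&vp->usecount, 1);      /* 1: take the reference first */
    if (atomic_load(&vp->doomed)) {          /* 2: then check the flag */
        atomic_fetch_sub(&vp->usecount, 1);  /* lost the race: back out */
        return false;
    }
    return true;
}
```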
r197459:
Before calling vflush(FORCECLOSE) mark file system as unmounted so the
following vnops will fail. This is very important, because without this change
vnode could be reclaimed at any point, even if we increased usecount. The only
way to ensure that vnode won't be reclaimed was to lock it, which would be very
hard to do in ZFS without changing a lot of code. With this change simply
increasing usecount is enough to be sure vnode won't be reclaimed from under
us. To be precise, it can still be reclaimed, but we won't be able to see it,
because every attempt to enter ZFS through VFS will result in EIO.
The only function that cannot return EIO, because it is needed for vflush() is
zfs_root(). Introduce ZFS_ENTER_NOERROR() macro that only locks
z_teardown_lock and never returns EIO.
r197497:
Switch to fletcher4 as the default checksum algorithm. Fletcher2 was proven to
be a bit weak and OpenSolaris also switched to fletcher4.
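For reference, a minimal Fletcher-4 sketch in the style ZFS uses: four 64-bit running sums over 32-bit words. This version takes already-decoded words and ignores the byte-swapped variant; the function name is hypothetical. Every sum after the first depends on word position, so reordered data changes the checksum even when the plain sum does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Four cascaded 64-bit accumulators over 32-bit input words. */
static void fletcher4_sketch(const uint32_t *w, size_t nwords, uint64_t ck[4])
{
    uint64_t a = 0, b = 0, c = 0, d = 0;

    for (size_t i = 0; i < nwords; i++) {
        a += w[i];
        b += a;
        c += b;
        d += c;
    }
    ck[0] = a;
    ck[1] = b;
    ck[2] = c;
    ck[3] = d;
}
```

For the words {1, 2, 3} the four sums are 6, 10, 15 and 21; for {3, 2, 1} the first sum is still 6 but the second is 14.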
r197498: head/cddl/contrib/opensolaris
Fletcher4 is now the default checksum algorithm.
r197512:
- Don't depend on value returned by gfs_*_inactive(), it doesn't work
well with forced unmounts when GFS vnodes are referenced.
- Make other preparations to GFS for forced unmounts.
PR: kern/139062
Reported by: trasz
r197513:
Use traverse() function to find and return mount point's vnode instead of
covered vnode when snapshot is already mounted.
r197514:
On lookup error VFS expects *vpp to be set to NULL, be sure to do that.
r197515:
Handle cases where virtual (GFS) vnodes are referenced when doing forced
unmount. In that case we cannot depend on the proper order of invalidating
vnodes, so we have to free resources when we have a chance.
PR: kern/139062
Reported by: trasz
r197525:
Ensure that tv_sec is between INT32_MIN and INT32_MAX, so ZFS won't object.
This completes the fix from r185586.
PR: kern/139059
Reported by: Daniel Braniss <danny@cs.huji.ac.il>
Submitted by: Jaakko Heinonen <jh@saunalahti.fi>
Tested by: Daniel Braniss <danny@cs.huji.ac.il>
Approved by: re (kib)
- According to Linux, the ALi M5451 can do 31-bit DMA instead of just
30-bit like the rest of the controllers supported by this driver.
Actually, the ALi M5451 can be set up to generate 32-bit addresses by
setting the 31st bit via the accompanying ISA bridge, which allows
it to work in sparc64 machines whose IOMMUs require at least 32-bit
DMA. Even though other architectures would also benefit from 32-bit
DMA, enabling this bit is limited to sparc64, as bus_dma(9) doesn't
generally guarantee that a low address of BUS_SPACE_MAXADDR_32BIT
results in a buffer in the 32-bit range.
- According to Tatsuo YOKOGAWA's ali(4), the DMA transfer size of
the ALi M5451 is fixed to 64k, and in fact using the default size of 4k
causes the chip to overrun the mapping, triggering uncorrectable
DMA errors on sparc64.
- The 4DWAVE DX and NX require the recording buffer to be 8-byte
aligned so adjust the bus_dma_tag_create(9) accordingly.
- Unlike the rest of the controllers supported by this driver, the
ALi M5451 only has 32 hardware channels instead of 64 so limit the
loop in tr_intr() accordingly. [1]
Submitted by: yongari [1]
Reviewed by: yongari (superset of what is committed)
Approved by: re (kib)
alignment requirements. It is the busdma task to manage proper alignment by
loading data into bounce buffers.
PR: kern/127316
Reviewed by: current@
Tested by: Ryan Rogers
Approved by: re (kib)
Do not call BUS_DRIVER_ADDED() for detached buses (attach failed) on
driver load. This fixes a crash on atapicam module load on systems where
some ATA channels (usually ata1) were probed but failed to attach.
Reviewed by: jhb, imp
Tested by: many
Approved by: re (kib)
the work area was totally unsynchronized which means this driver only
had a chance of working on x86 when no bounce buffers were involved,
which isn't that likely given that support for 64-bit DMA is currently
broken throughout ata(4).
- Add necessary little-endian conversion of accesses to the work area,
making this driver work on big-endian hosts. While at it, use the
alignment-agnostic byte order encoders in order to be on the safe side.
- Clear the reserved member of the SG list entries in order to be on the
safe side. [1]
Submitted by: yongari [1]
Reviewed by: yongari
Approved by: re (kib)
The elements in the component arrays may be direct Package objects rather
than references to objects. In that case, simply use the Package directly.
Approved by: re (kib)
- Split the logic to parse an SMAP entry out into a separate function on
amd64 similar to i386. This fixes a bug on amd64 where overlapping
entries would not cause the SMAP parsing to stop.
- Change the SMAP parsing code to do a sorted insertion into physmap[]
instead of an append to support systems with out-of-order SMAP entries.
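A sketch of sorted insertion with an overlap check, assuming physmap[] holds [base, end) pairs in consecutive slots. The function name, the PHYSMAP_MAX capacity and the "reject on overlap" return value are assumptions for illustration; the real code may do more (for example, coalescing adjacent ranges), which is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PHYSMAP_MAX 16   /* hypothetical capacity, in uint64_t slots */

/* Insert [base, base + length) keeping physmap sorted by base.
   Returns false on overlap (caller stops parsing) or when full. */
static bool add_smap_entry_sketch(uint64_t *physmap, int *npairs,
    uint64_t base, uint64_t length)
{
    uint64_t end = base + length;
    int i, insert;

    if (length == 0)
        return true;                 /* ignore empty entries */

    for (i = 0; i < *npairs; i++)    /* reject any overlapping entry */
        if (base < physmap[2 * i + 1] && physmap[2 * i] < end)
            return false;

    for (insert = 0; insert < *npairs; insert++)
        if (base < physmap[2 * insert])
            break;

    if (*npairs == PHYSMAP_MAX / 2)
        return false;                /* no room left */

    /* Shift later pairs up by one to open a slot at 'insert'. */
    for (i = *npairs; i > insert; i--) {
        physmap[2 * i] = physmap[2 * (i - 1)];
        physmap[2 * i + 1] = physmap[2 * (i - 1) + 1];
    }
    physmap[2 * insert] = base;
    physmap[2 * insert + 1] = end;
    (*npairs)++;
    return true;
}
```

With an append, an SMAP table that lists high memory before low memory would leave physmap[] unsorted; the insertion keeps it ordered regardless of firmware order.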
Approved by: re (kib)
Don't reread the command register to see if enabling I/O or memory
decoding "took". Other OSes that I checked do not do this and it breaks
some amdpm(4) devices. Prior to 7.2 we did not honor the error returned
when this failed anyway, so this in effect restores previous behavior.
Approved by: re (kib)
Allocate space for the group array in a static credential used in
the quota code. One case was correctly handled in r194498, but
this one was missed.
PR: kern/138657
Tested by: PR submitter
MFC after: 3 days
Approved by: re@ (kib)
Re-remove the IBM0057 ID used for PS/2 mouse controllers. The ASL for the
61p includes the hotkey device as IBM0068 and the mouse as IBM0057 similar
to other systems.
Approved by: re (kensmith)
Make the sudden motion sensor work on older models and add a bit of
debugging.
Submitted by: Christoph Langguth <christoph at rosenkeller.org>
Approved by: re (kib)
A wrong variable is used when setting up the interface
address route, which broke source address selection in
some code paths.
Submitted by: noted by bz
Reviewed by: hrs
Approved by: re (kib)
Different sub-kinds of PCI buses may have different rules, and
thus it is up to the bus backends to do proper input checks.
For example, PCIe allows configuration register numbers < 0x1000,
while for PCI proper the limit is 0x100.
And, in fact, the buses already do the checks.
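A sketch of the kind of per-backend check meant here. The enum and function are hypothetical; the limits come from the text above (0x100 for PCI, 0x1000 for PCIe), and checking the access width against the limit is an added assumption.

```c
#include <assert.h>
#include <stdbool.h>

enum bus_kind_sketch { BUS_PCI_SK, BUS_PCIE_SK };

/* Each backend applies its own config space limit. */
static bool cfg_reg_valid(enum bus_kind_sketch kind, int reg, int width)
{
    int limit = (kind == BUS_PCIE_SK) ? 0x1000 : 0x100;

    return reg >= 0 && width > 0 && reg + width <= limit;
}
```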
Reviewed by: jhb
Approved by: re (kib)
Add a few SCSI controllers to GENERIC that can be found in Powermacs.
This allows installation onto SCSI disks as shipped, for example,
as an option with the Powermac G3.
PR: powerpc/138543
Reviewed by: grehan
Approved by: re (kib)
Obtained from: sparc64
Remove some debugging (KTR_VERBOSE) that crept into ppc GENERIC long ago
and is present on no other architectures by default.
Reviewed by: grehan
Approved by: re (kib)
Fix some instances where CAM rescans get hung up or take a long time to
complete.
Also, allow xpt_rescan() to rescan a LUN instead of a full bus.
Sponsored by: Copan Systems, Inc.
Approved by: re (kib)
Fixes to mcast userland API.
--
Fix an API issue in leave processing for IPv4 multicast groups.
* Do not assume that the group lookup performed by imo_match_group()
is valid when ifp is NULL in this case.
* Instead, return EADDRNOTAVAIL if the ifp cannot be resolved for the
membership we are being asked to leave.
Caveat user:
* The way IPv4 multicast memberships are implemented in the inpcb layer
at the moment, has the side-effect that struct ip_moptions will
still hold the membership, under the old ifp, until ip_freemoptions()
is called for the parent inpcb.
* The underlying issue is that the inpcb layer is not notified, in a
thread-safe manner, when an ifp is detached.
This is non-trivial to fix.
--
Fix an obvious logic error in the IPv4 multicast leave processing,
where the filter mode vector was not updated correctly after the leave.
--
Tighten input checking in inp_join_group():
* Don't try to use the source address, when its family is unspecified.
* If we get a join without a source, on an existing inclusive
mode group, this is an error, as it would change the filter mode.
Fix a problem with the handling of in_mfilter for new memberships:
* Do not rely on imf being NULL; it is explicitly initialized to a
non-NULL pointer when constructing a membership.
* Explicitly initialize *imf to EX mode when the source address
is unspecified.
This fixes a problem with in_mfilter slot recycling in the join path.
--
Don't allow joins w/o source on an existing group.
This is almost always pilot error.
We don't need to check for group filter UNDEFINED state at t1,
because we only ever allocate filters with their groups, so we
unconditionally reject such calls with EINVAL.
Trying to change the active filter mode w/o going through IP_MSFILTER
is also disallowed.
Deals with the case described in PR 137164 upfront, cumulative
with the fix in svn rev 197132 which only calls imo_match_source()
if the source address family was not unspecified.
--
Revision 197136 has a text conflict, however it is a comment only change.
PR: 137164, 138689, 138690, 138691
Submitted by: Stef Walter (with fixups)
Approved by: re (kib)
- Prevent a panic on modern controllers by increasing CISS_MAX_PHYSTGT to 256
- Fix MSI and PERFORMANT interrupt programming. Fixes hang on boot.
- Fix locking bugs in ioctl handler
Most of this has been soaking at Yahoo for several months, if not longer. The
quick MFC is due to the impending 8.0-RC1 build.
Approved by: re
Obtained from: Yahoo!
Fix a bug reported by Daniel Mentz:
When authenticating DATA chunks some DATA chunks
might get stuck when the MTU gets decreased via
an ICMP message.
Approved by: re, rrs (mentor)
1) A lock issue: if we ever had to try again,
we would double lock the INP lock.
2) We were allowing (at wrap) associd 0, which we really
cannot allow, since in most socket API calls 0 means
that we wish to affect something on the INP, not the TCB.
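A sketch of an association ID allocator that skips 0 on wrap. The counter and type names are hypothetical, and a real allocator would also have to skip IDs that are still in use, which this sketch ignores.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t assoc_id_sketch_t;   /* hypothetical */

static assoc_id_sketch_t next_assoc_id = 1;

/* 0 refers to the endpoint (INP) in the socket API, so never hand it out,
   even after the counter wraps around. */
static assoc_id_sketch_t alloc_assoc_id(void)
{
    if (next_assoc_id == 0)
        next_assoc_id = 1;
    return next_assoc_id++;
}
```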
Approved by: re, rrs (mentor)
Calculate the number of bytes to copy for select file descriptor masks,
taking into account the size of fd_set for the current process ABI.
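A sketch of the size calculation, assuming the usual howmany() rounding and a mask word size that depends on the process ABI (for example, 4-byte masks for a 32-bit process and 8-byte masks for a native 64-bit one). The function and macro names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define howmany_sketch(x, y) (((x) + ((y) - 1)) / (y))

/* Bytes of one fd_set mask covering nfds descriptors, for an ABI whose
   mask words are mask_bytes bytes wide. */
static size_t fdset_copy_bytes(int nfds, size_t mask_bytes)
{
    size_t bits = mask_bytes * CHAR_BIT;

    return howmany_sketch((size_t)nfds, bits) * mask_bytes;
}
```

Using the native word size for a 32-bit process would round 65 descriptors up to 16 bytes instead of 12, copying past the end of the smaller mask.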
Approved by: re (kensmith)
Clean up Marvell platform code.
Introduce SheevaPlug support.
- The device is based on Marvell 88F6281 system on chip.
- More info about the platform at http://www.plugcomputer.org
- To build the FreeBSD kernel:
make buildkernel TARGET_ARCH=arm KERNCONF=SHEEVAPLUG
- Installation notes at: http://wiki.freebsd.org/FreeBSDMarvell
Submitted by: Michal Hajduk
Approved by: re (kib)
Obtained from: Semihalf
Self pointing routes are installed for configured interface addresses
and address aliases. After an interface is brought down and brought
back up again, those self pointing routes disappeared. This patch
ensures after an interface is brought back up, the loopback routes
are reinstalled properly.
Reviewed by: bz
Approved by: re
The bootp code installs an interface address and the NFS client
module tries to install the same address again. This redundant code,
which was exposed by the removal of a call to in_ifscrub() in r196714,
is removed. The call to in_ifscrub() is put back here
because the SIOCAIFADDR command can be used to change the prefix
length of an existing alias.
r197235 reverts file nfs_vfsops.c
Reviewed by: kmacy
Approved by: re
This patch fixes the following issues:
- Routing messages are not generated when adding and removing
interface address aliases.
- Loopback route installed for an interface address alias is
not deleted from the routing table when that address alias
is removed from the associated interface.
- Function in_ifscrub() is called extraneously.
Reviewed by: gnn, kmacy, sam
Approved by: re
Previously, the local end of a point-to-point interface was not reachable
within the system that owns the interface. Packets destined to
the local end point leaked to the wire towards the default gateway
if one existed. This behavior has been changed as part of the L2/L3
rewrite efforts. The local end point is now reachable within the
system. The inpcb code needs to consider this fact during the
address selection process.
Reviewed by: bz
Approved by: re
Use explicit int values for the device states in order to allow new
states to be added in the future, if necessary, without breaking the ABI
between revisions.
Please note that this is a special condition as we want this fix in
before RC1 as we assume it is critical and so it has been handled
as an instant-merge.
Approved by: re (kib)
Fix sched_switch_migrate(): assume that locks cannot be shared, and fix a
deadlock between 3 different threads by acquiring both runqueue locks
when doing the migration.
Please note that this is a special condition as we want this fix in
before RC1 as we assume it is critical and so it has been handled
as an instant-merge. For the STABLE_7 branch, 1 week before the MFC
is assumed.
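One common way to take two run-queue locks without deadlocking is to always acquire them in a fixed order, for example lower address first. This sketch uses pthread mutexes as stand-ins for the scheduler's spin locks; the function names are hypothetical, and it is not the sched_ule code itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Lock both mutexes in address order so that two threads locking the
   same pair, in either argument order, can never wait on each other. */
static void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    if (a == b) {
        pthread_mutex_lock(a);
        return;
    }
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *t = a;
        a = b;
        b = t;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    if (a != b)
        pthread_mutex_unlock(b);
}
```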
Approved by: re (kib)
The clear_remove() and clear_inodedeps() functions call
vn_start_write(NULL, &mp, V_NOWAIT) on a non-busied mount point. Unmount
might free the ufs-specific mp data, causing ffs_vgetf() to access freed memory.
Busy the mount point before dropping the softdep lock.
Approved by: re (kensmith)
Remove 'ad:' prefix from disk serial number. We don't want serial number
to change when we reconnect the disk in a way that it is accessible through
CAM for example.
Discussed with: trasz
Simplify g_disk_ident_adjust() function and allow any printable character
in serial number.
Discussed with: trasz
Obtained from: Wheel Sp. z o.o. (http://www.wheel.pl)
Make serial numbers of daX disks visible by GEOM.
No objections from: scottl
Obtained from: Wheel Sp. z o.o. (http://www.wheel.pl)
Approved by: re (kib)
r196943,r196944,r196947,r196950,r196953,r196954,r196965,r196978,r196979,
r196980,r196982,r196985,r196992,r197131,r197133,r197150,r197151,r197152,
r197153,r197167,r197172,r197177,r197200,r197201:
r196456:
- Give minclsyspri and maxclsyspri real values (consulted with kmacy).
- Honour 'pri' argument for thread_create().
r196457:
Set priority of vdev_geom threads and zvol threads to PRIBIO.
r196458:
- Hide ZFS kernel threads under zfskern process.
- Use better (shorter) threads names:
'zvol:worker zvol/tank/vol00' -> 'zvol tank/vol00'
'vdev:worker da0' -> 'vdev da0'
r196662:
Add missing mountpoint vnode locking.
This fixes a panic on assertion with DEBUG_VFS_LOCKS and vfs.usermount=1 when
a regular user tries to mount a dataset that user owns.
r196702:
Remove empty directory.
r196703:
Backport the 'dirtying dbuf' panic fix from newer ZFS version.
Reported by: Thomas Backman <serenity@exscape.org>
r196919:
bzero() on-stack argument, so mutex_init() won't misinterpret that the
lock is already initialized if we have some garbage on the stack.
PR: kern/135480
Reported by: Emil Mikulic <emikulic@gmail.com>
r196927:
Changing a provider's size is not really supported by GEOM, but doing so while
the provider is closed should be OK.
When the administrator requests a change of ZVOL size, do it immediately if
the ZVOL is closed, or defer it to the last ZVOL close.
PR: kern/136942
Requested by: Bernard Buri <bsd@ask-us.at>
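The "resize now or on last close" rule can be sketched as follows. The structure and function names are hypothetical and only model the open count and a pending size; they are not the zvol driver's actual fields.

```c
#include <stdint.h>

/* Hypothetical volume state; field names are illustrative only. */
struct sketch_zvol {
	int      open_count;     /* number of active opens */
	uint64_t size;           /* size currently advertised to consumers */
	uint64_t pending_size;   /* requested size, 0 if none pending */
};

/* Apply a resize now if nobody has the volume open, otherwise defer it. */
static void
sketch_zvol_set_size(struct sketch_zvol *zv, uint64_t newsize)
{
	if (zv->open_count == 0) {
		zv->size = newsize;
		zv->pending_size = 0;
	} else {
		zv->pending_size = newsize;
	}
}

static void
sketch_zvol_open(struct sketch_zvol *zv)
{
	zv->open_count++;
}

/* On the last close, apply any deferred resize. */
static void
sketch_zvol_close(struct sketch_zvol *zv)
{
	if (--zv->open_count == 0 && zv->pending_size != 0) {
		zv->size = zv->pending_size;
		zv->pending_size = 0;
	}
}
```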
r196928:
Teach zdb(8) how to obtain GEOM provider size.
PR: kern/133134
Reported by: Philipp Wuensche <cryx-freebsd@h3q.com>
r196943:
- Avoid holding mutex around M_WAITOK allocations.
- Add locking for mnt_opt field.
r196944:
Don't recheck ownership on update mount. This eliminates a LOR between
vfs_busy() and the mount mutex. We check ownership in vfs_domount() anyway.
Noticed by: kib
Reviewed by: kib
r196947:
Defer thread start until we set priority.
Reviewed by: kib
r196950:
Fix detection of a file system being shared. The zfs unshare/destroy/rename
commands will now properly remove exported file systems.
r196953:
When a snapshot's mount point is busy (for example, we are still in it),
we fail to unmount it, but it is also not removed from the tree, so in
that case there is no need to reinsert it.
Reported by: trasz
r196954:
If we have to use avl_find(), optimize a bit and use avl_insert() instead of
avl_add() (the latter is actually a wrapper around avl_find() + avl_insert()).
Fix a similar case in the code that is currently commented out.
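A minimal sketch of the find-then-insert pattern, using a plain unbalanced binary search tree in place of the Solaris AVL tree. tree_find(), tree_insert() and tree_add() are hypothetical names that mirror the shape of avl_find(), avl_insert() and avl_add(): the find call records the insertion point, so a caller that has already searched can insert without a second search.

```c
#include <stddef.h>

struct node {
	int key;
	struct node *left, *right;
};

/* Insertion point recorded by a failed search: the child slot to fill. */
struct where {
	struct node **slot;
};

/* Search for key; on a miss, record where it would be inserted. */
static struct node *
tree_find(struct node **root, int key, struct where *w)
{
	struct node **link = root;

	while (*link != NULL) {
		if (key == (*link)->key)
			return (*link);
		link = key < (*link)->key ? &(*link)->left : &(*link)->right;
	}
	w->slot = link;
	return (NULL);
}

/* Insert at a previously computed point: no second search. */
static void
tree_insert(struct node *n, struct where w)
{
	n->left = n->right = NULL;
	*w.slot = n;
}

/* The add() wrapper is just find() followed by insert(). */
static int
tree_add(struct node **root, struct node *n)
{
	struct where w;

	if (tree_find(root, n->key, &w) != NULL)
		return (-1);    /* duplicate key */
	tree_insert(n, w);
	return (0);
}
```

Calling tree_add() right after a failed tree_find() would search the tree twice; reusing the recorded insertion point avoids that.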
r196965:
Fix a reference count leak in the case where a snapshot's mount point is updated.
r196978:
Call ZFS_EXIT() after locking the vnode.
r196979:
On FreeBSD we don't have to look for the snapshot's mount point,
because the fhtovp method is already called with the proper mount point.
r196980:
When we automatically mount a snapshot, we want the lookup to return the
vnode of the mount point and not the covered vnode. This is one of the fixes
for using .zfs/ over NFS.
r196982:
We don't export individual snapshots, so the mnt_export field in a snapshot's
mount point is NULL. That's why, when snapshots are accessed over NFS, we
use the mnt_export field of the parent file system.
r196985:
Only log successful commands! Without this fix we log even unsuccessful
commands executed by unprivileged users. The action is not actually taken,
but it is logged to the pool history, which might be confusing.
Reported by: Denis Ahrens <denis@h3q.com>
r196992:
Implement __assert() for Solaris-specific code. Until now, Solaris code
used the Solaris prototype for __assert() but FreeBSD's implementation.
The two take different arguments, so we were either dumping core in
assert() or printing garbage.
Reported by: avg
r197131:
Tighten up the race check in zfs_zget(): ZTOV(zp) can not only be NULL,
but can also point to a dead vnode; take that into account.
PR: kern/132068
Reported by: Edward Fisk <7ogcg7g02@sneakemail.com>, kris
Fix based on patch from: Jaakko Heinonen <jh@saunalahti.fi>
r197133:
- Protect reclaim with z_teardown_inactive_lock.
- Be prepared for the dbuf to disappear in zfs_reclaim_complete() and check
whether the z_dbuf field is NULL; this might happen in case of a rollback or
forced unmount between zfs_freebsd_reclaim() and zfs_reclaim_complete().
- On forced unmount, wait for all znodes to be destroyed; destruction can be
done asynchronously via zfs_reclaim_complete().
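Waiting for asynchronous destruction to finish is typically done with a counter and a condition variable. A minimal userland sketch with POSIX threads follows; the sketch_* names are hypothetical and this is not the ZFS code, which uses kernel primitives.

```c
#include <pthread.h>

/* Hypothetical per-mount bookkeeping of live znodes. */
struct sketch_mount {
	pthread_mutex_t lock;
	pthread_cond_t  cv;
	int             nznodes;   /* znodes not yet destroyed */
};

static void
sketch_mount_init(struct sketch_mount *mp)
{
	pthread_mutex_init(&mp->lock, NULL);
	pthread_cond_init(&mp->cv, NULL);
	mp->nznodes = 0;
}

static void
sketch_znode_alloc(struct sketch_mount *mp)
{
	pthread_mutex_lock(&mp->lock);
	mp->nznodes++;
	pthread_mutex_unlock(&mp->lock);
}

/* Called when the (possibly asynchronous) destruction of a znode completes. */
static void
sketch_znode_destroyed(struct sketch_mount *mp)
{
	pthread_mutex_lock(&mp->lock);
	if (--mp->nznodes == 0)
		pthread_cond_broadcast(&mp->cv);
	pthread_mutex_unlock(&mp->lock);
}

/* Forced unmount: block until every znode has been destroyed. */
static void
sketch_forced_unmount_wait(struct sketch_mount *mp)
{
	pthread_mutex_lock(&mp->lock);
	while (mp->nznodes > 0)
		pthread_cond_wait(&mp->cv, &mp->lock);
	pthread_mutex_unlock(&mp->lock);
}
```

The while loop rechecks the counter after every wakeup, which guards against spurious wakeups.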
r197150:
There is a bug where mze_insert() can trigger an assert() by inserting
the same entry twice. The bug is not fixed yet, and it leads to a kernel
panic when we try to access the corrupted directory.
Until the bug is properly fixed, try to recover from it and log that it
happened.
Reported by: marck
OpenSolaris bug: 6709336
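The "recover and log instead of asserting" approach can be sketched on a toy cache. The array-based cache, its size and the return codes are hypothetical; the real code deals with the ZAP micro-entry tree.

```c
#include <stdio.h>

#define SKETCH_MAX_ENTRIES 16

/* Hypothetical directory-entry cache keyed by hash. */
struct sketch_cache {
	unsigned long hashes[SKETCH_MAX_ENTRIES];
	int           n;
};

/*
 * Insert an entry. Instead of asserting that the entry is new (which
 * would panic on a corrupted directory), detect the duplicate, log it
 * and skip it. Returns 0 if inserted, 1 if a duplicate was skipped,
 * -1 if the cache is full.
 */
static int
sketch_cache_insert(struct sketch_cache *c, unsigned long hash)
{
	for (int i = 0; i < c->n; i++) {
		if (c->hashes[i] == hash) {
			fprintf(stderr, "duplicate entry 0x%lx skipped\n", hash);
			return (1);
		}
	}
	if (c->n == SKETCH_MAX_ENTRIES)
		return (-1);
	c->hashes[c->n++] = hash;
	return (0);
}
```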
r197151:
Be sure not to overflow struct fid.
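A minimal sketch of the kind of bounds check involved. The structure is a hypothetical stand-in loosely modeled on a fixed-size file handle; SKETCH_MAXFIDSZ and the error choice are assumptions, not the actual struct fid layout or ZFS code.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size file handle. */
#define SKETCH_MAXFIDSZ 16

struct sketch_fid {
	uint16_t fid_len;                     /* bytes of fid_data in use */
	uint8_t  fid_data[SKETCH_MAXFIDSZ];
};

/* Fill a fid from an opaque handle, refusing handles that do not fit. */
static int
sketch_fid_fill(struct sketch_fid *fid, const void *handle, size_t len)
{
	if (len > sizeof(fid->fid_data))
		return (EOVERFLOW);
	memcpy(fid->fid_data, handle, len);
	fid->fid_len = (uint16_t)len;
	return (0);
}
```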
r197152:
Extend scope of the z_teardown_lock lock for consistency and "just in case".
r197153:
When zfs.ko is compiled with debug, make sure that znode and vnode point at
each other.
r197167:
Work around the READDIRPLUS problem with the .zfs/ and .zfs/snapshot/
directories by just returning EOPNOTSUPP. This will allow the NFS server to
fall back to regular READDIR.
Note that converting an inode number to a snapshot's vnode is an expensive
operation. Snapshots are stored in an AVL tree, but keyed by their names, not
inode numbers, so to convert an inode to a snapshot vnode we have to iterate
over all snapshots.
This is not a problem in OpenSolaris, because in their READDIRPLUS
implementation they use VOP_LOOKUP() on d_name, instead of VFS_VGET() on
d_fileno as we do.
PR: kern/125149
Reported by: Weldon Godfrey <wgodfrey@ena.com>
Analysis by: Jaakko Heinonen <jh@saunalahti.fi>
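The cost asymmetry can be illustrated with a sorted array standing in for the name-keyed AVL tree: a lookup by name is a binary search, while a lookup by inode number has no index and must scan every snapshot. The record type and function names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical snapshot record; the set is kept sorted by name only. */
struct sketch_snap {
	const char *name;
	uint64_t    ino;
};

static int
snap_name_cmp(const void *key, const void *elem)
{
	return (strcmp((const char *)key,
	    ((const struct sketch_snap *)elem)->name));
}

/* Name lookup (VOP_LOOKUP-style): O(log n), the set is ordered by name. */
static const struct sketch_snap *
snap_by_name(const struct sketch_snap *snaps, size_t n, const char *name)
{
	return (bsearch(name, snaps, n, sizeof(*snaps), snap_name_cmp));
}

/* Inode lookup (VFS_VGET-style): no index on ino, so scan every snapshot. */
static const struct sketch_snap *
snap_by_ino(const struct sketch_snap *snaps, size_t n, uint64_t ino)
{
	for (size_t i = 0; i < n; i++)
		if (snaps[i].ino == ino)
			return (&snaps[i]);
	return (NULL);
}
```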
r197172:
Add missing \n.
Reported by: marck
r197177:
Support both cases: when the snapshot is already mounted and when it is not
yet mounted.
r197200:
Modify mount(8) to skip MNT_IGNORE file systems by default, just like df(1)
does. This is not a POLA violation, because no file system in the base
system currently uses MNT_IGNORE, although ZFS snapshots will be mounted
with MNT_IGNORE after the next commit.
Reviewed by: kib
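A portable sketch of the filtering rule. SKETCH_MNT_IGNORE is a hypothetical flag value (the real MNT_IGNORE constant may differ), and the entry type and show_all switch are illustrative rather than mount(8)'s actual code, which works on the statfs array returned by getmntinfo(3).

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag bit standing in for MNT_IGNORE. */
#define SKETCH_MNT_IGNORE 0x1u

struct sketch_mntent {
	const char  *mntonname;
	unsigned int flags;
};

/* Print entries, skipping ignored ones unless show_all is set. */
static int
sketch_list_mounts(const struct sketch_mntent *m, int n, bool show_all)
{
	int shown = 0;

	for (int i = 0; i < n; i++) {
		if (!show_all && (m[i].flags & SKETCH_MNT_IGNORE))
			continue;
		printf("%s\n", m[i].mntonname);
		shown++;
	}
	return (shown);
}
```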
r197201:
- Mount ZFS snapshots with the MNT_IGNORE flag, so they are not visible in regular
df(1) and mount(8) output. This is a bit similar to OpenSolaris and follows the
ZFS approach of not listing snapshots by default in 'zfs list' output.
- Add UPDATING entry to note that ZFS snapshots are no longer visible in
mount(8) and df(1) output by default.
Reviewed by: kib
Approved by: re (bz)
Don't malloc a buffer while holding the prison0 mutex. Instead, use a loop
where we figure out the hostname length under the lock, malloc the buffer
with the lock dropped, then recheck the length under the lock and loop again
if the buffer is now too small.
Approved by: re (kib)
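The loop described above can be sketched in userland with a pthread mutex standing in for the prison0 mutex. host_mtx, host_name and copy_hostname() are hypothetical names; the sketch only shows the measure-locked, allocate-unlocked, recheck-locked pattern, not the actual jail code.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared state standing in for the prison0 hostname. */
static pthread_mutex_t host_mtx = PTHREAD_MUTEX_INITIALIZER;
static char host_name[256] = "example.host";

/*
 * Return a malloc'ed copy of the hostname without ever calling malloc()
 * while host_mtx is held. Caller frees. Returns NULL on allocation failure.
 */
static char *
copy_hostname(void)
{
	char *buf = NULL;
	size_t need, have = 0;

	for (;;) {
		pthread_mutex_lock(&host_mtx);
		need = strlen(host_name) + 1;
		if (buf != NULL && need <= have) {
			/* Buffer is still big enough: copy under the lock. */
			memcpy(buf, host_name, need);
			pthread_mutex_unlock(&host_mtx);
			return (buf);
		}
		pthread_mutex_unlock(&host_mtx);
		/* First pass, or the name grew: allocate with the lock dropped. */
		free(buf);
		buf = malloc(need);
		if (buf == NULL)
			return (NULL);
		have = need;
	}
}
```

The recheck under the lock is what makes the pattern safe: the hostname may change while the lock is dropped for the allocation.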
Add LK_NOWITNESS to the vn_lock() calls done on newly created NFS
vnodes. These vnodes are not linked into the mount queue, so the
vn_lock() cannot cause a deadlock and any LORs it reports are harmless.
Suggested by: kib
Approved by: re (kensmith), kib (mentor)
Change the 'dev.cpu.N.temperature' sysctl from format I (degC) to IK (Kelvin),
to match acpi_thermal(4) and amdtemp(4).
Approved by: re (rwatson)
Reviewed by: rpaulo
Suggested by: ume
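A small worked conversion for the unit change. It assumes the IK convention of reporting temperature in tenths of a kelvin, with 0 degC represented as 2732 (273.15 K rounded to 273.2 K); the macro and function names are hypothetical.

```c
/* Assumed IK offset: 0 degC expressed in tenths of a kelvin. */
#define SKETCH_ZEROC_DK 2732

/* Whole degrees Celsius -> tenths of a kelvin (what an IK sysctl reports). */
static int
celsius_to_dk(int degc)
{
	return (degc * 10 + SKETCH_ZEROC_DK);
}

/* Tenths of a kelvin -> tenths of a degree Celsius, for display. */
static int
dk_to_decicelsius(int dk)
{
	return (dk - SKETCH_ZEROC_DK);
}
```

Under this assumption a CPU reading of 50 degC is exported as 3232, and a consumer such as sysctl(8) can print it back as 50.0C.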
Unlock the image vnode around the call to the pmc PMC_FN_PROCESS_EXEC hook.
The hook calls vn_fullpath(9), which should not be executed with a vnode
lock held.
Approved by: re (kensmith)
In vfs_mark_atime(9), be resistant against reclaimed vnodes.
Assert that the necessary locks are taken, since the VOP might not be called.
Approved by: re (kensmith)
When joining a multicast group, the inp_lookup_mcast_ifp call
does a KASSERT that the group address is multicast, so the
check that this is indeed true, returning EINVAL if not,
must be done before calling inp_lookup_mcast_ifp. This fixes a kernel
crash when calling setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP, ...)
with an invalid group address.
Reviewed by: bms
Approved by: re (kib)
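The ordering fix amounts to validating the address before anything that asserts on it. A minimal userland sketch using the standard IN_MULTICAST() macro, which expects a host-order address; sketch_join_group() is a hypothetical name and the interface lookup itself is only indicated by a comment.

```c
#include <errno.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Validate the group address first; only a multicast address may be
 * passed on to the interface lookup, which asserts on it.
 */
static int
sketch_join_group(struct in_addr group)
{
	if (!IN_MULTICAST(ntohl(group.s_addr)))
		return (EINVAL);
	/* ... only now is it safe to look up the multicast interface ... */
	return (0);
}
```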
for stable branches:
- shift to MALLOC_PRODUCTION
- turn off automatic crash dumps
- Remove kernel debuggers, INVARIANTS*[1], WITNESS* from
GENERIC kernel config files[2]
[1] INVARIANTS* left on for ia64 at the request of marcel
[2] sun4v was left as-is
Reviewed by: marcel, kib
Approved by: re (implicit)
insmntque_stddtr() clears vp->v_data and resets vp->v_op to
dead_vnodeops before calling vgone(). Revert r189706 and the corresponding
part of r186560.
Approved by: re (kensmith)
In fhopen, vfs_ref() the mount point while the vnode is unlocked, to prevent
vn_start_write(NULL, &mp) from operating on potentially freed or reused
struct mount *.
Remove unmatched vfs_rel() in cleanup.
Approved by: re (kensmith)
Some SYSCTLs are inappropriate for daily use of the machine (they are mostly
useful to a developer who wants to run benchmarks on it).
Remove them before the release, since we do not want to ship with
them in.
Now that the SYSCTLs are gone, use real numeric constants instead of static
storage for some values, in order to avoid possible compiler dumbness and
the risk of sharing storage (and therefore a cache line) among CPUs doing
adaptive spinning together.
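A minimal spin-then-yield sketch of the idea, using C11 atomics in userland: the spin bound is a compile-time constant rather than a tunable static variable, so spinning CPUs read no shared storage for it. SPIN_LOOPS, its value and the sketch_* names are hypothetical, and this does not reproduce the kernel's adaptive spinning, which also checks whether the lock owner is running.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Compile-time spin bound (hypothetical value); no shared variable to load. */
enum { SPIN_LOOPS = 10000 };

struct sketch_spinlock {
	atomic_bool held;
};

static bool
sketch_trylock(struct sketch_spinlock *l)
{
	bool expected = false;

	return (atomic_compare_exchange_strong_explicit(&l->held, &expected,
	    true, memory_order_acquire, memory_order_relaxed));
}

static void
sketch_lock(struct sketch_spinlock *l)
{
	for (;;) {
		/* Spin a bounded number of times, reading before writing. */
		for (int i = 0; i < SPIN_LOOPS; i++) {
			if (!atomic_load_explicit(&l->held,
			    memory_order_relaxed) && sketch_trylock(l))
				return;
		}
		/* Still contended: stop spinning and let the owner run. */
		sched_yield();
	}
}

static void
sketch_unlock(struct sketch_spinlock *l)
{
	atomic_store_explicit(&l->held, false, memory_order_release);
}
```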
Please note that the sys/linker_set.h inclusion in the lockmgr and sx lock
support could have been removed, but re@ preferred to keep it in order to
minimize the risk of problems in future merges.
Please note that this patch is not an MFC but an 'edge case': a direct
commit to stable/8, which creates a divergence from HEAD.
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
Approved by: re (kib)
Fix adaptive spinning in lockmgr by using GIANT_RESTORE and the continue
statement correctly, and improve adaptive spinning for sx locks by doing
GIANT_SAVE only once.
Approved by: re (kib)