- add FreeBSD implementation of xdrmem_control needed by zfs
- have zfs define xdr_ops using FreeBSD's definition
- remove solaris xdr files from zfs compile
rather than unconditionally making partially dirty pages fully dirty, only
make partially dirty pages fully dirty if the pmap says that the page has
been modified.
(This change is also a small optimization. It eliminate an unnecessary call
to pmap_is_modified() on pages that are mapped read only.)
Suggested by: tegge
The hypervisor doesn't provide a single "TOD" - it instead provides a
"start time" and a "running time". These are added together to form
the current TOD. The TOD is in UTC.
This RTC is only (initially) designed to be read at startup. There's
some further poking that needs to happen to pick up hypervisor time
changes (ie, by the Dom0 time being adjusted by something). This
time adjustment currently can cause "weird stuff" in the DomU clock;
I'll begin investigating and repairing that in subsequent commits.
PR: 135008
I'm not sure what series of events is leading up to this watch event
being received with no callback info and it should be investigated.
I'm triggering it somehow by registering an RTC device (which will
show up in a subsequent commit.)
Calculate the exact number of vectors we'll use before calling
pci_alloc_msix. Don't grab nine all the time.
Call cxgb_setup_interrupts once per T3, not once per port. Ditto
for cxgb_teardown_interrupts.
Don't leak resources when interrupt setup fails in the middle.
Obtained from: Navdeep Parhar
MFC after: 10 days
do them for NFSv4 and flush writes to the server before doing
the Close(s), as required. Also, use the a_td argument instead of
curthread.
Approved by: kib (mentor)
Last year I added SLIST_REMOVE_NEXT and STAILQ_REMOVE_NEXT, to remove
entries behind an element in the list, using O(1) time. I recently
discovered NetBSD also has a similar macro, called SLIST_REMOVE_AFTER.
In my opinion this approach is a lot better:
- It doesn't have the unused first argument of the list pointer. I added
this, mainly because OpenBSD also had it.
- The _AFTER suffix makes a lot more sense, because it is related to
SLIST_INSERT_AFTER. _NEXT is only used to iterate through the list.
The reason why I want to rename this now, is to make sure we don't
release a major version with the badly named macros.
- add key mappings for fn keys
- byte swapping for certain models
- Fix leds for keyboards which require an ID byte for the HID output structures
Submitted by: Hans Petter Selasky
ip6_input.c, in6.h:
* Add netinet6-specific mbuf flag M_RTALERT_MLD, shadowing M_PROTO6.
* Always set this flag if HBH Router Alert option is present for MLD,
even when not forwarding.
icmp6.c:
* In icmp6_input(), spell m->m_pkthdr.rcvif as ifp to be consistent.
* Use scope ID for verifying input. Do not apply SSM filters here, no inpcb.
* Check for M_RTALERT_MLD when validating MLD traffic, as we can't see
IPv6 hop options outside of ip6_input().
in6_mcast.c:
* Use KAME scope/zone ID in in6_multi.
* Update net.inet6.ip6.mcast.filters implementation to use scope IDs
for comparisons.
* Fix scope ID treatment in multicast socket option processing.
Scope IDs passed in from userland will be ignored as other less
ambiguous APIs exist for specifying the link.
* Tighten userland input checks in IPv6 SSM delta and full-state ops.
* Source filter embedded scope IDs need to be revisited, for now
just clear them and ignore them on input.
* Adapt KAME behaviour of looking up the scope ID in the default zone
for multicast leaves, when the interface is ambiguous.
mld6.c:
* Tighten origin checks on MLD traffic as per RFC3810 Section 6.2:
* ip6_src MAY be the unspecified address for MLDv1 reports.
* ip6_src MAY have link-local address scope for MLDv1 reports,
MLDv1 queries, and MLDv2 queries.
* Perform address field validation *before* accepting queries.
* Use KAME scope/zone ID in query/report processing.
* Break const correctness for mld_v1_input_report(), mld_v1_input_query()
as we temporarily modify the input mbuf chain.
* Clear the scope ID before handoff to userland MLD daemon.
* Fix MLDv1 old querier present timer processing.
With the protocol defaults, hosts should revert to MLDv2 after 260s.
* Add net.inet6.mld.v1enable sysctl, default to on.
ifmcstat.c:
* Use sysctl by default; -K requests kvm(3) if so compiled.
mld.4:
* Connect man page to build.
Tested using PCS.
Once AP's are launched, their MCA state information is stored and later obtainable using a sysctl. Since the size of the MCA state information is unknown, it will be malloc'ed as needed. However, when 'ia64_ap_startup' runs, it's not yet safe to call malloc and this may cause 'panic: blockable sleep lock (sleep mutex) 8192 @ /usr/src/sys/vm/uma_core.c'. This commit avoids this issue by scheduling a separate kthread to obtain this information, which immediately terminates afterwards.
Add support for kernel fault injection using KFAIL_POINT_* macros and
fail_point_* infrastructure. Add example fail point in vfs_bio.c to
simulate VM buf pressure.
Approved by: dfr (mentor)
... by moving two ~2KB structures from stack to heap allocation.
I experienced stack overflow in linux emulation on i386 (8K stack)
when LINUX_DVD_READ_STRUCT ioctl was performed on atapicam cd
device and there was an error that resulted in additional quite
heavy stack use in cam layer.
Reviewed by: dchagin
Approved by: jhb (mentor)
if a local file system supports NFSv4 ACLs. This allows the
NFSHASNFS4ACL() macro to be correctly implemented. The NFSv4 ACL
support should now work when the server exports a ZFS volume.
Approved by: kib (mentor)
and jail_get(2). Jail(8) can now create jails using a "name=value"
format instead of just specifying a limited set of fixed parameters; it
can also modify parameters of existing jails. Jls(8) can display all
parameters of jails, or a specified set of parameters. The available
parameters are gathered from the kernel, and not hard-coded into these
programs.
Small patches on killall(1) and jexec(8) to support jail names with
jail_get(2).
Approved by: bz (mentor)
by creating a child jail, which is visible to that jail and to any
parent jails. Child jails may be restricted more than their parents,
but never less. Jail names reflect this hierarchy, being MIB-style
dot-separated strings.
Every thread now points to a jail, the default being prison0, which
contains information about the physical system. Prison0's root
directory is the same as rootvnode; its hostname is the same as the
global hostname, and its securelevel replaces the global securelevel.
Note that the variable "securelevel" has actually gone away, which
should not cause any problems for code that properly uses
securelevel_gt() and securelevel_ge().
Some jail-related permissions that were kept in global variables and
set via sysctls are now per-jail settings. The sysctls still exist for
backward compatibility, used only by the now-deprecated jail(2) system
call.
Approved by: bz (mentor)
of ip_forward(), if the IPSEC is compiled in. It is possible that there
is an SPD that this packets will go through, even if there is no matching
route. If not, ICMP will be sent anyway, after ip_output().
This is somewhat similar in purpose to r191621, except that one was
for the packets sent from the host, while this one is for packets
being forwarded by the host.
Reviewed by: bz@
Sponsored by: Wheel Sp. z o.o. (http://www.wheel.pl)
to dequeue a packet.
The tx path was trying to ensure that enough Xenbus TX ring slots existed but
it didn't check to see whether the mbuf TX ring slots were also available.
They get freed in xn_txeof() which occurs after transmission, rather than earlier
on in the process. (The same happens under Linux too.)
Due to whatever reason (CPU use, scheduling, memory constraints, whatever) the
mbuf TX ring may not have enough slots free and would allocate slot 0. This is
used as the freelist head pointer to represent "free" mbuf TX ring slots; setting
this to an actual mbuf value rather than an id crashes the code.
This commit introduces some basic code to track the TX mbuf ring use and then
(hopefully!) ensures that enough slots are free in said TX mbuf ring before it
enters the actual work loop.
A few notes:
* Similar logic needs to be introduced to check there are enough actual slots
available in the xenbuf TX ring. There's some logic which is invoked earlier
but it doesn't hard-check against the number of available ring slots.
Its trivial to do; I'll do it in a subsequent commit.
* As I've now commented in the source, it is likely possible to deadlock the
driver under certain conditions where the rings aren't receiving any changes
(which I should enumerate) and thus Xen doesn't send any further software
interrupts. I need to make sure that the timer(s) are running right and
the queues are periodically kicked.
PR: 134926
Slot 0 must always remain "free" and be a pointer to the first free entry in the
mbuf descriptor list. It is thus an error to have code allocate or push slot 0
back into the list.
current NFSv4 ACLs, as defined in sys/acl.h. It still needs a
way to test a mount point for NFSv4 ACL support before it will
work. Until then, the NFSHASNFS4ACL() macro just always returns 0.
Approved by: kib (mentor)
get a quick snapshot of the kernel's symbol table including the symbols
from any loaded modules (the symbols are all merged into one symbol
table). Unlike like other implementations, this ksyms driver maps
memory in the process memory space to store the snapshot at the time
/dev/ksyms is opened. It also checks to see if the process has already
a snapshot open and won't allow it to open /dev/ksyms it again until it
closes first. This prevents kernel and process memory from being
exhausted. Note that /dev/ksyms is used by the lockstat(1) command.
Reviewed by: gallatin kib (freebsd-arch)
Approved by: gnn (mentor)
adds probes for mutexes, reader/writer and shared/exclusive locks to
gather contention statistics and other locking information for
dtrace scripts, the lockstat(1M) command and other potential
consumers.
Reviewed by: attilio jhb jb
Approved by: gnn (mentor)
request to arrive. Previously it would end up returning as soon as the
request length stored in the first two bytes had arrived.
Reviewed by: dwmalone
MFC after: 1 week
about removing a few #ifdefs and providing compatibility wrappers and
VOP implementations to get and set an ACL; ZFS does ACL enforcement all
by itself.
Note that the VOPs are ifdefed out for now, so this change should be
a no-op.
Reviewed by: pjd
that the range of versions of NFS handled by the server can
be limited. The nfsd daemon must be restarted after these
sysctl variables are changed, in order for the change to take
effect.
Approved by: kib (mentor)
that it handles the case of a non-exported NFSv4 root correctly.
Also, delete handling for the case where nd_repstat is already
set in nfs_proc(), since that no longer happens.
Approved by: kib (mentor)
o do not attach DLT_IEEE802_11_RADIO unless both tx and rx headers are
present; this is assumed in the capture code paths
o verify the above with asserts in ieee80211_radiotap_{rx,tx}
o add missing checks for active taps before calling ieee80211_radiotap_rx
PCB, simply embed it in the PCB, avoiding additional memory overhead,
memory allocation overhead, and removing one of the few remaining
uses of dtom() in the network stack.
Restore misplaced spx_ctlinput() from an earlier commit.
MFC after: 1 month
severe silicon bugs that can't handle VLAN hardware tagging as well
as status LE writeback bug. The status LE writeback bug is so
critical we can't trust status word of received frame. To accept
frames on Yukon FE+ A0 msk(4) just do minimal check for received
frames and pass them to upper stack. This means msk(4) can pass
corrupted frames to upper layer. You have been warned!
Also I supposed RX_GMF_FL_THR to be 32bits register but Linux
driver treated it as 16bit register so follow their leads. At least
this does not seem to break msk(4) on Yukon FE+.
Tested by: bz, Tanguy Bouzeloc ( the.zauron <> gmail dot com )
Bruce Cran ( bruce <> cran dot org dot uk )
Michael Reifenberger ( mike <> reifenberger dot com )
Stephen Montgomery-Smith ( stephen <> missouri dot edu )
Yukon FE+ is fast ethernet controller and uses new descriptor
format. Since I don't have this controller, the support code was
written from guess and various feedback from enthusiastic users.
Thanks to all users who patiently tested my initial patches.
Special thanks to Tanguy Bouzeloc who fixed critical bug of initial
patch.
Tested by: bz, Tanguy Bouzeloc ( the.zauron <> gmail dot com )
Bruce Cran ( bruce <> cran dot org dot uk )
Michael Reifenberger ( mike <> reifenberger dot com )
Stephen Montgomery-Smith ( stephen <> missouri dot edu )
The GM_GP_CTRL register may have stale content from previous link
information so clearing it will make hardware update the register
correctly when it established a valid link.
While I'm here remove stale comment.
does not guarantee established link. Also 1000baseT link report for
fast ethernet controller is not valid one so make sure gigabit link
is allowed for this controller.
Whenever we lost link, check whether Rx/Tx MACs were enabled. If both
MAC are not active, do not try to disable it again.
mark controller's capability. Controllers that have jumbo frame
support sets MSK_FLAG_JUMBO, and controllers that does not support
checksum offloading for jumbo frames will set MSK_FLAG_JUMBO_NOCSUM.
For Fast Ethernet controllers it will set MSK_FLAG_FASTETHER and it
would be used in link state handling.
While here, disable Tx checksum offloading if jumbo frame is used
on controllers that does not have Tx checksum offloading capability
for jumbo frame(e.g. Yukon EC Ultra).
FE+ controller. Due to the severe silicon bugs for Yukon FE+,
88E3016 seems to require more workarounds. However I'm not sure
whether the workaround is PHY specific or only applicable to Yukon
FE+. The datasheet for the PHY is publicly available but it lacks
several details for the workaround used in this change. The
workaround information was obtained from Linux. Many thanks to
Yukon FE+ users who helped me add 88E3016 support.
Tested by: bz, Tanguy Bouzeloc ( the.zauron <> gmail dot com )
Bruce Cran ( bruce <> cran dot org dot uk )
Michael Reifenberger ( mike <> reifenberger dot com )
Stephen Montgomery-Smith ( stephen <> missouri dot edu )
advertisement register. Some PHYs such as 88E3016 requires NEXT
Page capability to establish valid link. Also set protocol selector
field which is read only but it makes the intention clearer.
is valid only for auto-negotiation case so check the bit if we know
auto-negotiation is active. While I'm here explicitly checks
current speed with speed mask and set IFM_NONE if resolved speed
is unknown.
checks extended status register to see whether the PHY is fast
ethernet or not. This removes a lot of checks for specific PHY
models and it makes easy to add more PHYs to e1000phy(4).
While I'm here remove setting mii_anegticks as it is set with
mii_phy_add_media().
ReleaseLockOwner operations analagous to what is already
in place for SetClientID and SetClientIDConfirm. These are
the five NFSv4 operations that do not use file handle(s),
so the checks are done using the NFSv4 root export entries
in /etc/exports.
Approved by: kib (mentor)
Remove PAGE_SIZE alignment used in Rx buffer DMA tag creation. The
alignment restriction was used in old local jumbo allocator and
nfe(4) switched to UMA backed page allocator for jumbo frame.
This change should fix jumbo buffer allocation failure.
Reported by: Pascal Braun ( pascal.braun <> continum dot net )
to it for the client side reply. Hopefully this fixes the
problem with using the new krpc for arm for the experimental
nfs client.
Approved by: kib (mentor)
where a client is not allowed NFSv4 access correctly. This
restriction is specified in the "V4: ..." line(s) in
/etc/exports.
Approved by: kib (mentor)
on platforms with strict alignment requirements. In particular, this fixes the
problems with the new RPC transport on the arm platform.
Note: this adds yet another copy of nfs_realign(). I will attempt to refactor
after NFS_LEGACYRPC is removed.
Submitted by: sam
sleep waiting for conditions when the lock may be granted.
To prevent lf_setlock() from accessing possibly freed memory, add reference
counting to the struct lockf_entry. Bump refcount around the sleep.
Make lf_free_lock() return non-zero when structure was freed, and use
this after the sleep to return EINTR to the caller. The error code might
need a clarification, but we cannot return success to usermode, since
the lock is not owned anymore.
Reviewed by: dfr
Tested by: pho
MFC after: 1 month
is acquired. In the lf_purgelocks(), assert that vnode is doomed and set
*statep to NULL before clearing ls_pending list. Otherwise, we allow for
the thread executing lf_advlockasync() to put new pending entry after
state->ls_lock is dropped in lf_purgelocks().
Reviewed by: dfr
Tested by: pho
MFC after: 1 month
In the original MPSAFE TTY code, I changed the behaviour by returning
EBUSY. I thought this made more sense, because it's basically a race to
see who gets the TTY first.
It turns out this is not a good change, because it also causes EBUSY to
be returned when another process is closing the TTY. This can happen
during startup, when /etc/rc (or one of its children) is still busy
draining its data and /sbin/init is attempting to open the TTY to spawn
a getty.
Reported by: bz
Tested by: bz
case of a kerberized mount without a host based principal
name. This will only work for mounts being done by a user
other than root. Support for a host based principal name
will not work until proposed changes to the rpcsec_gss part
of the krpc are committed. It now builds for "options KGSSAPI".
Approved by: kib (mentor)
vimage options along with all the defines made things really hard to
read and get right; thus add comments for the #else/#endif cases.
Discussed with: zec
to optionally have overlapping unit numbers if attached in different
vnets.
At this stage if_loop is the only clonable ifnet class that has been
extended to allow for such overlapping allocation of unit numbers, i.e.
in each vnet it is possible to have a lo0 interface. Other clonable ifnet
classes remain to operate with traditional semantics, i.e. each instance
of a clonable ifnet will be assigned a globally unique unit number,
regardless in which vnet such an ifnet becomes instantiated.
While here, garbage collect unused _lo_list field in struct vnet_net,
as well as improve indentation for #defines in sys/net/vnet.h.
The layout of struct vnet_net has changed, therefore bump
__FreeBSD_version.
This change has no functional impact on nooptions VIMAGE kernel builds.
Reviewed by: bz, brooks
Approved by: julian (mentor)
be named "udp_inpcb" to avoid a naming conflict with tcp[1].
For consistency rename the uma zone for TCP from "inpcb" to "tcp_inpcb".
Found by: rwatson [1]
Discussed with: rwatson
So far the udp_tun_func_t had been (ab)using inp_ppcb for udp in kernel
tunneling callbacks. Move that into the udpcb and add a field for flags
there to be used by upcoming changes instead of sticking udp only flags
into in_pcb flags2.
Bump __FreeBSD_version for ports to detect it and because of vnet* struct
size changes.
Submitted by: jhb (7.x version)
Reviewed by: rwatson
kernel option.
This also permits tuning of the option per virtual network stack, as
well as separately per inet, inet6.
The kernel option is left for a transition period, marked deprecated,
and will be removed soon.
Initially requested by: phk (1 year 1 day ago)
MFC after: 4 weeks
Cryptodev uses UIO structure do get data from userspace and pass it to
cryptographic engines. Initially UIO size is equal to size of data passed to
engine, but if UIO is prepared for hash calculation an additional small space
is created to hold result of operation.
While creating space for the result, UIO I/O vector size is correctly
extended, but uio_resid field in UIO structure is not modified.
As bus_dma code uses uio_resid field to determine size of UIO DMA mapping,
resulting mapping hasn't correct size. This leads to a crash if all the
following conditions are met:
1. Hardware cryptographic accelerator writes result of hash operation
using DMA.
2. Size of input data is less or equal than (n * PAGE_SIZE),
3. Size of input data plus size of hash result is grather than
(n * PAGE_SIZE, where n is the same as in point 2.
This patch fixes this problem by adding size of the extenstion to uio_resid
field in UIO structure.
Submitted by: Piotr Ziecik kosmo ! semihalf dot com
Reviewed by: philip
Obtained from: Semihalf
write fault or while wiring a mapping that must support write access.
In general, this change should reduce the number of traps that occur for
the purpose of setting the modified bit. More specifically, this change
should prevent traps while holding locks in a sysctl handler. See
kern/kern_sysctl.c revisions 1.168 and 1.195 (svn r192160) for further
details.
Tested by: gonzo
the code will build when "options KGSSAPI" is specified
without requiring the proposed changes that add host based
initiator principal support. It will not handle the case where
the client uses a host based initiator principal until those
changes are committed. The code that uses those changes is
#ifdef'd notyet until the krpc rpcsec_changes are committed.
Approved by: kib (mentor)
because struct vnet_net holds the rt_tables[][] for MRT and array size
is compile time dependent. If you had ROUTETABLES set to >1 after
r192011 V_loif was pointing into nonsense leading to strange results
or even panics for some people.
Reviewed by: mz
for reassigning ifnets from one vnet to another.
if_vmove() works by calling a restricted subset of actions normally
executed by if_detach() on an ifnet in the current vnet, and then
switches to the target vnet and executes an appropriate subset of
if_attach() actions there.
if_attach() and if_detach() have become wrapper functions around
if_attach_internal() and if_detach_internal(), where the later
variants have an additional argument, a flag indicating whether a
full attach or detach sequence is to be executed, or only a
restricted subset suitable for moving an ifnet from one vnet to
another. Hence, if_vmove() will not call if_detach() and if_attach()
directly, but will call the if_detach_internal() and
if_attach_internal() variants instead, with the vmove flag set.
While here, staticize ifnet_setbyindex() since it is not referenced
from outside of sys/net/if.c.
Also rename ifccnt field in struct vimage to ifcnt, and do some minor
whitespace garbage collection where appropriate.
This change should have no functional impact on nooptions VIMAGE kernel
builds.
Reviewed by: bz, rwatson, brooks?
Approved by: julian (mentor)
use the newer nmount() style arguments, as is used by mount_nfs.c.
This prepares the kernel code for the use of a mount_nfs.c with
changes for the experimental client integrated into it.
Approved by: kib (mentor)
only if prepping the adapter failed.
Slight adjustment to comments.
Fix a bug whereby downing the interface didn't preven it from
processing packets.
Submitted by: Navdeep Parhar
MFC after: 1 week
writes upon close when a write delegation is held by the client.
This should be safe to do, now that nfsv4 Close operations are
delayed until ncl_inactive() is called for the vnode.
Approved by: kib (mentor)
- controller reset/firmware loading.
- controller level tracing and tracing of capi messages of applications
running with different user credentials.
Reviewed by: rwatson
MFC after: 2 weeks
Some hardware easily comes out of sync with regard to whether the current or
the next control transfer should be stalled, if a stall command is always
issued before receiving the SETUP packet. After this patch the stall command
will only be issued when a transfer should actually be stalled.
Submitted by: Hans Petter Selasky
When enabled all TTY input queue buffers are zeroed when flushing or
closing the TTY. Because TTY input queues are also used to store filled
in passwords, this may be an interesting switch to enable for security
minded people.
1) Add a sysctl that will say what type of PHYs exist on the card.
2) Fix a bug that occurs when an AEL 2005 PHY resets without a transciever
in the card.
3) Unify the PHY link detection code.
Obtained from: Navdeep Parhar
MFC after: 10 days
and down more cleanly. This addresses a problem where if we have the
link flap during boot the driver would lock up the system.
Reviewed by: jhb
MFC after: 1 week
the behaviour alredy present with the further malloc() call in
devctl_notify().
This fixes a bug in the CAM layer where the camisr handler finished to
call camperiphfree() (and subsequently destroy_dev() resulting in a new
dev notify) while the xpt lock is held.
PR: kern/130330
Tested by: Riccardo Torrini <riccardo dot torrini at esaote dot com>
thread. Multiple RAID events in quick succession can cause an additional
bus rescan to be scheduled before an earlier scan has completed. In this
case the driver was attempting to use the same CCB storage for two requests.
PR: kern/130330
Reviewed by: Riccardo Torrini riccardo.torrini | esaote com
MFC after: 1 week
access windows. This eliminates hangs on systems which are configured to use
interleaved mode: prior to this fix we were simply cutting ourselves from
access to the main memory in this case.
Obtained from: Freescale, Semihalf
last year or two's work on routing:
- Combine iproute initialization and flowtable lookup blocks, eliminating
unnecessary tests for known-zero'd iproute fields.
- Add a comment indicating (a) why the route entry returned by the
flowtable is considered stable and (b) that the flowtable lookup must
occur after the setup of the mbuf flow ID.
- Assert the inpcb lock before any use of inpcb fields.
Reviewed by: kmacy
o Header file cleanup.
o bus_dma(9) conversion.
- Removed all consumers of vtophys(9) and converted to use
bus_dma(9).
- 64bit DMA support was disabled because DP83821 is not capable
of handling the DMA request. 64bit DMA request on DP83820
requires different descriptor structures and it's hard to
dynamically change descriptor format at run time so I disabled
it. Note, this is the same behavior as previous one but
previously nge(4) didn't explicitly disable 64bit mode on
DP83820.
- Added Tx/Rx descriptor ring alignment requirements(8 bytes
alignment).
- Limit maximum number of Tx DMA segments to 16. In fact,
controller does not seem to have limitations on number of Tx
DMA segments but 16 should be enough for most cases and
m_collapse(9) will handle highly fragmented frames without
consuming a lot of CPU cycles.
- Added Rx buffer alignment requirements(8 bytes alignment). This
means driver should fixup received frames to align on 16bits
boundary on strict-alignment architectures.
- Nuked driver private data structure in descriptor ring.
- Added endianness support code in Tx/Rx descriptor access.
o Prefer faster memory mapped register access to I/O mapped access.
Added fall-back mechanism to use alternative register access.
The hardware supports both memory and I/O mapped access.
o Added suspend/resume methods but it wasn't tested as controller I
have does not support PCI PME.
o Removed swap argument in nge_read_eeprom() since endianness
should be handled after reading EEPROM.
o Implemented experimental 802.3x full-duplex flow-control. ATM
it was commented out but will be activated after we have generic
flow-control framework in mii(4) layer.
o Rearranged promiscuous mode settings and simplified logic.
o Always disable Rx filter prior to changing Rx filter functions as
indicated in DP83820/DP83821 datasheet.
o Added an explicit DELAY in timeout loop of nge_reset().
o Added a sysctl variable dev.nge.%d.int_holdoff to control
interrupt moderation. Valid ranges are 1 to 255(default 1) in
units of 100us. The actual delivery of interrupt would be delayed
based on the sysctl value. The interface has to be brought down
and up again before a change takes effect. With proper tuning
value, users do not need to resort to polling(4) anymore.
o Added ALTQ(4) support.
o Added missing IFCAP_VLAN_HWCSUM as nge(4) can offload Tx/Rx
checksum calculation on VLAN tagged frames as well as VLAN tag
insertion/stripping. Also add IFCAP_VLAN_MTU capability as nge(4)
can handle VLAN tagged oversized frames.
o Fixed media header length for VLAN.
o Rearranged nge_detach routine such that it's now used for general
clean-up routine.
o Enabled MWI.
o Accessing EEPROM takes very long time so read 6 bytes ethernet
address with one call instead of 3 separate accesses.
o Don't set if_mtu in device attach, it's already set in
ether_ifattach().
o Don't do any special things for TBI interface. Remove TBI
specific media handling in the driver and have gentbi(4) handle
it. Add glue code to read/write TBI PHY registers in miibus
method. This change removes a lot of PHY handling code in driver
and now its functionality is handled by mii(4).
o Alignment fixup code is now applied only for strict-alignment
architectures. Previously the code was applied for all
architectures except i386. With this change amd64 will get
instant Rx performance boost.
o When driver fails to allocate a new mbuf, update if_qdrops so
users can see what was wrong in Rx path.
o Added a workaround for a hardware bug which resulted in short
VLAN tagged frames(e.g. ARP) was rejected as if runt frame was
received. With this workaround nge(4) now accepts the short VLAN
tagged frame and nge(4) can take full advantage of hardware VLAN
tag stripping. I have no idea how this bug wasn't known so far,
without the workaround nge(4) may never work on VLAN
environments.
o Fixed Rx checksum offload logic such that it now honors active
interface capability configured with ifconfig(8).
o In nge_start()/nge_txencap(), always leave at least one free
descriptor as indicated in datasheet. Without this the hardware
would be confused with ring descriptor structure(e.g. no clue
for the end of descriptor ring).
o Removed dead-code that checks interrupts on PHY hardware. The
code was designed to detect link state changes but it was
disabled as driving nge_tick clock would break auto-negotiation
timer. This code is no longer needed as nge(4) now uses mii(4)
and link state change handling is done with mii callback.
o Rearranged ethernet address programming logic such that it works
on strict-alignment architectures.
o Added IFCAP_VLAN_HWTAGGING/IFCAP_VLAN_HWCSUM handler in
nge_ioctl() such that the functionality is configurable with
ifconfig(8). DP83820/DP83821 can do checksum offload for VLAN
tagged frames so enable Tx/Rx checksum offload for VLAN
interfaces.
o Simplified IFCAP_POLLING selection logic in nge_ioctl().
o Fixed module unload panic when bpf listeners are active.
o Tx/Rx descriptor ring address uses 64bit DMA address for
readability. High address part of DMA would be 0 as nge(4)
disabled 64bit DMA transfers so it's ok for DP83821.
o Removed volatile keyword in softc as bus_dmamap_sync(9) should
take care of this.
o Removed extra driver private structures in descriptor ring. These
extra elements are not part of descriptor structure. Embedding
private driver structure into descriptor ring is not good idea
as its size may be different on 32bit/64bit architectures.
o Added miibus_linkchg method handler to catch link state changes.
o Removed unneeded nge_ifmedia in softc. All TBI access is handled
in gentbi(4). There is no difference between TBI and non-TBI case
now.
o Removed "gigabit link up" message handling in nge_tick. Link
state change notification is already performed by mii(4) and
checking link state by accessing PHY registers in periodic timer
handler of driver is wrong. All link state and speed/duplex
monitoring should be handled in PHY driver.
o Use our own timer for watchdog instead of if_watchdog/if_timer
interface.
o Added hardware MAC statistics counter, users canget current MAC
statistics from dev.nge.%d.stats sysctl node(%d is unit number of
a device).
o Removed unused macros, NGE_LASTDESC, NGE_MODE, NGE_OWNDESC,
NGE_RXBYTES.
o Increased number of Tx/Rx descriptors from 128 to 256. From my
experience on gigabit ethernet controllers, number of descriptors
should be 256 or higher to get an optimal performance on gigabit
link.
o Increased jumbo frame length to 9022 bytes to cope with other
gigabit ethernet drivers. Experimentation shows no problems with
9022 bytes.
o Removed unused member variables in softc.
o Switched from bus_space_{read|write}_4 to bus_{read|write}_4.
o Added support for WOL.
lock sysid to be used for non-nlm remote locking. This is required
for the experimental nfsv4 server, so that it can acquire byte
range locks correctly on behalf of nfsv4 clients.
Reviewed by: dfr
Approved by: kib (mentor)
route is also being deleted, the link-layer address table
(arp or nd6) will flush those L2 llinfo entries that match
the removed prefix.
Reviewed by: kmacy