- Document last change to ARP behavior.
- Document several undocumented sysctl variables.
- Fix spelling of few diagnostics.
- Improve the documentation of "proxyall" knob, somewhat: we do not
proxy for hosts that are reachable through the same interface the
request came in from. This feature is mainly for hosts reachable
through some P2P link, e.g. the gif(4) tunnel.
Rework ARP retransmission algorythm so that ARP requests are
retransmitted without suppression, while there is demand for
such ARP entry. As before, retransmission is rate limited to
one packet per second. Details:
- Remove net.link.ether.inet.host_down_time
- Do not set/clear RTF_REJECT flag on route, to
avoid rt_check() returning error. We will generate error
ourselves.
- Return EWOULDBLOCK on first arp_maxtries failed
requests , and return EHOSTDOWN/EHOSTUNREACH
on further requests.
- Retransmit ARP request always, independently from return
code. Ratelimit to 1 pps.
MFC 1.142:
Use monotonic 'time_uptime' instead of 'time_second' as timebase
for rt->rt_rmx.rmx_expire.
- Do not raise IFF_DRV_OACTIVE flag in vlan_start, because this
can lead to stalled interface
- Explain this fact in a comment.
Reviewed by: rwatson, thompsa, yar
Fix several races between socket closure and node/hook
destruction:
- Backout 1.62, since it doesn't fix all possible
problems.
- Upon node creation, put an additional reference on node.
- Add a mutex and refcounter to struct ngsock. Netgraph node,
control socket and data socket all count as references.
- Introduce ng_socket_free_priv() which removes one reference
from ngsock, and frees it when all references has gone.
- No direct pointers between pcbs and node, all pointing
is done via struct ngsock and protected with mutex.
revision 1.117
date: 2005/11/02 15:23:47; author: glebius; state: Exp; lines: +47 -8
Fix two races which happen when netgraph is restructuring:
- Introduce ng_topo_mtx, a mutex to protect topology changes.
- In ng_destroy_node() protect with ng_topo_mtx the process
of checking and pointing at ng_deadnode. [1]
- In ng_con_part2() check that our peer is not a ng_deadnode,
and protect the check with ng_topo_mtx.
- Add KASSERTs to ng_acquire_read/write, to make more
understandible synopsis in case if called on ng_deadnode.
Reported by: Roselyn Lee [1]
----------------------------
revision 1.116
date: 2005/11/02 14:27:24; author: glebius; state: Exp; lines: +106 -121
Rework the ng_item queueing on nodes:
- Introduce a new flags NGQF_QREADER and NGQF_QWRITER,
which tell how the item should be actually applied,
overriding NGQF_READER/NGQF_WRITER flags.
- Do not differ between pending reader or writer. Use only
one flag that is raised, when there are pending items.
- Schedule netgraph ISR in ng_queue_rw(), so that callers
do not need to do this job.
- Fix several comments.
Submitted by: julian
As well as some lesser changes: ng_base.c 1.114, 1.113, 1.107, 1.118.
revision 1.83
date: 2005/11/09 08:43:18; author: yongari; state: Exp; lines: +41 -38
Make em(4) work on big-endian architectures.
- disable jumbo frame support on strict alignment architectures due
to the limitation of hardware. The driver needs a fix-up code for
RX side. The fix will show up in near future.
- fix endian issue for 82544 on PCI-X bus. I couldn't test this as
I don't have the NIC/hardware.
- prefer PCIR_BAR to hardcoded EM_MMBA.
- Properly checks for for 64bit BAR [1]
- replace inl/outl with bus_space(9) [1]
- fix endian issue on VLAN handling.
- reorder header files and remove unnecessary one.
Reviewed by: cognet
No response from: pdeuskar, tackerman
Obtained from: OpenBSD [1]
revision 1.84
date: 2005/11/09 15:23:54; author: glebius; state: Exp; lines: +7 -3
- Introduce two more stat counters, counting number of RX
overruns and number of watchdog timeouts.
- Do not log(9) RX overrun events, since this pessimizes
things under load [1].
- Do not increase if->if_oerrors in em_watchdog(), since
this leads to counter slipping back, when if->if_oerrors
is recalculated in em_update_stats_counters(). Instead
increase watchdog counter in em_watchdog() and take it
into account in em_update_stats_counters().
revision 1.86
date: 2005/11/11 16:04:51; author: ru; state: Exp; lines: +1 -1
- Store pointer to the link-level address right in "struct ifnet"
rather than in ifindex_table[]; all (except one) accesses are
through ifp anyway. IF_LLADDR() works faster, and all (except
one) ifaddr_byindex() users were converted to use ifp->if_addr.
- Stop storing a (pointer to) Ethernet address in "struct arpcom",
and drop the IFP2ENADDR() macro; all users have been converted
to use IF_LLADDR() instead.
revisions 1.85, 1.87
- Introduce tunables em.hw.txd and em.hw.rxd, which allow administrator
to configure number of transmit and receive descriptors.
- Check em.hw.txd and em.hw.rxd against hardware limits [*] and require
them to be multiple of 128.
revision 1.88
date: 2005/11/21 04:17:43; author: yongari; state: Exp; lines: +121 -83
busdma cleanup for em(4).
- don't force busdma to pre-allocate bounce pages for parent tag.
- use system supplied roundup2 macro instead of rolling its own version.
- TX/RX decriptor length should be multiple of 128. There is no
no need to expand the size with the multiple of 4096.
- don't create/destroy DMA maps in TX/RX handlers. Use pre-allocated
DMA maps. Since creating DMA maps on sparc64 is time consuming
operations(resource mananger overhead), this change should boost
performance on sparc64. I could get > 2x speedup on Ultra60.
- TX/RX descriptors could be aligned on 128 boundary. Aligning them
on PAGE_SIZE is waste of resource.
- don't blindly create TX DMA tag with size of MCLBYTES * 8. The size
is only valid under jumbo frame environments. Instead of using the
hardcoded value, re-compute necessary size on the fly.
- RX side bus_dmamap_load_mbuf_sg(9) support.
- remove unused macro EM_ROUNDUP and constant EM_MMBA.
Reviewed by: scottl
Tested by: glebius
revision 1.89
date: 2005/11/24 01:44:48; author: glebius; state: Exp; lines: +131 -77
Merge in new driver version from Intel - 3.2.18.
The most important change is support for adapters based on
82571 and 82572 chips.
Tested on: 82547EI on i386
Tested on: 82540EM on sparc64
revision 1.90
date: 2005/11/24 15:13:47; author: cognet; state: Exp; lines: +3 -1
Remember the bus_dmamap_t where we loaded the mbuf, and sync this map instead
of tx_buffer->map, or we could end up syncing the wrong map.
1. make_index no longer coredumps if input has zero lines.
2. phttpget no longer enters an infinite loop if the TCP connection is
closed while phttpget is reading HTTP headers.
Correct a number of serious and closely related bugs in the UNIX domain
socket file descriptor garbage collection code, which is intended to
detect and clear cycles of orphaned file descriptors that are "in-flight"
in a socket when that socket is closed before they are received. The
algorithm present was both run at poor times (resulting in recursion and
reentrance), and also buggy in the presence of parallelism. In order to
fix these problems, make the following changes:
- When there are in-flight sockets and a UNIX domain socket is destroyed,
asynchronously schedule the garbage collector, rather than running it
synchronously in the current context. This avoids lock order issues
when the garbage collection code reenters the UNIX domain socket code,
avoiding lock order reversals, deadlocks, etc. Run the code
asynchronously in a task queue.
- In the garbage collector, when skipping file descriptors that have
entered a closing state (i.e., have f_count == 0), re-test the FDEFER
flag, and decrement unp_defer. As file descriptors can now transition
to a closed state, while the garbage collector is running, it is no
longer the case that unp_defer will remain an accurate count of
deferred sockets in the mark portion of the GC algorithm. Otherwise,
the garbage collector will loop waiting waiting for unp_defer to reach
zero, which it will never do as it is skipping file descriptors that
were marked in an earlier pass, but now closed.
- Acquire the UNIX domain socket subsystem lock in unp_discard() when
modifying the unp_rights counter, or a read/write race is risked with
other threads also manipulating the counter.
While here:
- Remove #if 0'd code regarding acquiring the socket buffer sleep lock in
the garbage collector, this is not required as we are able to use the
socket buffer receive lock to protect scanning the receive buffer for
in-flight file descriptors on the socket buffer.
- Annotate that the description of the garbage collector implementation
is increasingly inaccurate and needs to be updated.
- Add counters of the number of deferred garbage collections and recycled
file descriptors. This will be removed and is here temporarily for
debugging purposes.
With these changes in place, the unp_passfd regression test now appears
to be passed consistently on UP and SMP systems for extended runs,
whereas before it hung quickly or panicked, depending on which bug was
triggered.
Reported by: Philip Kizer <pckizer at nostrum dot com>
Introduce /etc/rc.d/bluetooth script to start/stop Bluetooth devices. It
will be called from devd(8) in response to device arrival/departure events.
It is also possible to call it by hand to start/stop particular device
without unplugging it.
Introduce generic way to set configuration parameters for Bluetooth devices.
By default /etc/rc.d/bluetooth script has hardwired defaults compatible
with old rc.bluetooth from /usr/share/netgraph/bluetooth/examples. These
can be overridden using /etc/defaults/bluetooth.device.conf file (system
wide defaults). Finally, there could be another device specific override
file located in /etc/bluetooth/$device.conf (where $device is ubt0, btccc0
etc.)
The list of configuration parameters and their meaning described in the
/etc/defaults/bluetooth.device.conf file. Even though Bluetooth device
configuration files are not shell scripts, they must follow basic sh(1) syntax.
Add rc.d scripts for the hcsecd(8) and sdpd(8) daemons. Put defaults into
/etc/defaults/rc.conf. Both daemons can run even if no Bluetooth devices
are attached to the system. Both daemons depend on Bluetooth socket layer.
Fix Bluetooth assigned numbers URL in /etc/bluetooth/protocols file.
MFC revision 1.110
MFC revision 1.109
- Lock the object while traversing the list of it's backing objects
- Use the correct object while calculating offsets
- Conditionally pickup Giant if debug.mpsafevfs == 0 or if the file
system is not marked as being MP safe.
Print (total - used) as the amount of available swap for a swap device
when printing swapinfo output, rather than (total), as that is (strictly
speaking) more accurate.
Pointed out by: Rob <spamrefuse at yahoo dot com>
Modify netstat -mb to use libmemstat when accessing a core dump or live
kernel memory and not using sysctl. Previously, libmemstat was used
only for the live kernel via sysctl paths.
This results in netstat output becoming both more consistent between
core dumps and the live kernel, and also more information in the core
dump case than previously (i.e., mbuf cache information).
Statistics relating to sfbufs still rely on a kvm descriptor as they
are not currently exposed via libmemstat. netstat -m operating on a
core is still unable to print certain sfbuf stats available on the live
kernel.
Begin and end the initialization of pvzone in pmap_init().
Previously, pvzone's initialization was split between pmap_init() and
pmap_init2(). This split initialization was the underlying cause of
some UMA panics during initialization. Specifically, if the UMA boot
pages was exhausted before the pvzone was fully initialized, then UMA,
through no fault of its own, would use an inappropriate back-end
allocator leading to a panic. (Previously, as a workaround, we have
increased the UMA boot pages.) Fortunately, there is no longer any
reason that pvzone's initialization cannot be completed in
pmap_init().
Eliminate a check for whether pv_entry_high_water has been initialized
or not from get_pv_entry(). Since pvzone's initialization is
completed in pmap_init(), this check is no longer needed.
Use cnt.v_page_count, the actual count of available physical pages,
instead of vm_page_array_size to compute the maximum number of pv
entries.
Introduce the vm.pmap.pv_entries tunable on alpha and ia64.
Eliminate some unnecessary white space.
When support for 2MB/4MB pages was added in revision 1.148 an error was
made in pmap_protect(): The pmap's resident count should not be reduced
unless mappings are removed.
The errant change to the pmap's resident count could result in a later
pmap_remove() failing to remove any mappings if the errant change has set
the pmap's resident count to zero.
MFC revision 1.518
Decouple the unrefing of a page table page from the removal of a pv entry.
In other words, change pmap_remove_entry() such that it no longer unrefs
the page table page. Now, it only removes the pv entry.
MFC revision 1.519
Eliminate unneeded diagnostic code.
MFC revision 1.520
Eliminate unneeded diagnostic code.
Eliminate an unused #include. (Kernel stack allocation and deallocation
long ago migrated to the machine-independent code.)
MFC revision 1.521
Simplify the page table page reference counting by pmap_enter()'s change of
mapping case.
Eliminate a stale comment from pmap_enter().
MFC revision 1.522
Correct a performance bug in revision 1.462. The effect of the bug is to
execute the outer loop in procedures such as pmap_protect() many more times
than necessary.
MFC revision 1.523
Introduce pmap_pml4e_to_pdpe() and pmap_pdpe_to_pde() and use them to avoid
recomputation of the pml4e and pdpe in pmap_copy(), pmap_protect(), and
pmap_remove().
MFC revision 1.524
Change pmap_extract() and pmap_extract_and_hold() to use PG_FRAME rather
than ~PDRMASK to extract the physical address of a superpage from a PDE.
The use of ~PDRMASK is problematic if the PDE has PG_NX set. Specifically,
the PG_NX bit will be included in the physical address if ~PDRMASK is used.
MFC revision 1.525
Pass the PDE from pmap_remove() to pmap_remove_page() so that the latter
procedure doesn't have to recompute it.
MFC revision 1.526
Remedy the following three problems:
1. The amd64 pmap, unlike the i386 pmap, maintains a reference count
for each page directory (PD) page. However, in the transformation
of the i386 pmap into the amd64 pmap, operations, such as
pmap_copy() and pmap_object_init_pt(), that create 2MB "superpage"
mappings by setting the PG_PS bit in a PD entry were not modified
to adjust the underlying PD page's reference count. Consequently,
superpage mappings could disappear prematurely.
2. pmap_object_init_pt() could crash or corrupt memory if either the
virtual address range being mapped crosses a 1GB boundary in the
virtual address space or nothing is mapped in the 1GB area.
3. When pmap_allocpte() destroys a 2MB "superpage" mapping it does not
reduce the pmap's resident count accordingly. It should. (This
bug is inherited from i386.)
MFC revision 1.528
Eliminate unnecessary TLB invalidations by pmap_enter(). Specifically,
eliminate TLB invalidations when permissions are relaxed, such as when a
read-only mapping is changed to a read/write mapping. Additionally,
eliminate TLB invalidations when bits that are ignored by the hardware,
such as PG_W ("wired mapping"), are changed.
date: 2005/11/12 11:45:01; author: krion; state: Exp; lines: +9 -2
Add -P flag, it does the same as the -p option, except that the
given prefix is also used recursively for the dependency packages,
if any. If the -P flag appears after any -p flag on the
command line, it overrides it's effect, causing pkg_add to use the
given prefix recursively.
PR: bin/75742
Submitted by: Frerich Raabe <raabe AT kde DOT org>
Add a DDB "show files" command to list the current open file list, some
state about each open file, and identify the first process in the process
table that references the file. This is helpful in debugging leaks of
file descriptors.
Expand the set of details printed for each file descriptor to include
it's garbage collection flags. Reformat generally to make this fit and
leave some room for future expansion.
Add the f_msgcount field to the set of struct file fields printed in show
files.