families.
The protosw[] array of any particular protocol family ("domain") is of fixed size,
defined at compile time. This makes it impossible to dynamically add protocols to
it or remove them from it. We work around this by introducing so-called SPACERs,
which are embedded into the protosw[] array at compile time. The SPACERs have
a special protocol number (32767) to indicate that they are SPACERs, but
are otherwise NULL. Only as many protocols can be dynamically loaded as there are
SPACERs provided in the protosw[] array.
The pr_usrreqs structure is treated specially: it contains pointers to dummy
functions that only return EOPNOTSUPP. This is needed because the kernel usually
does not check those function pointers before calling them, since until now they
were always assumed to be valid. Instead of fixing all potential callers, we just
return a proper error code.
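To illustrate the nousrreqs idea, here is a minimal userland sketch. It assumes a trimmed-down, hypothetical struct sketch_usrreqs with only two methods; the real struct pr_usrreqs has many more members with different signatures. Every slot points at a stub that returns EOPNOTSUPP, so a caller that never checks for NULL still gets a proper error code instead of jumping through a NULL pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct pr_usrreqs. */
struct sketch_usrreqs {
	int (*pru_attach)(void *so, int proto);
	int (*pru_send)(void *so, const void *buf, size_t len);
};

/* Dummy methods in the spirit of the pru_*_notsupp() functions. */
static int
pru_attach_notsupp(void *so, int proto)
{
	(void)so;
	(void)proto;
	return (EOPNOTSUPP);
}

static int
pru_send_notsupp(void *so, const void *buf, size_t len)
{
	(void)so;
	(void)buf;
	(void)len;
	return (EOPNOTSUPP);
}

/* Every slot holds a valid pointer, so callers may invoke it unchecked. */
static struct sketch_usrreqs nousrreqs_sketch = {
	.pru_attach = pru_attach_notsupp,
	.pru_send = pru_send_notsupp,
};

/* A caller that, like much existing kernel code, never tests for NULL. */
static int
call_attach(const struct sketch_usrreqs *ur, void *so, int proto)
{
	assert(ur->pru_attach != NULL);	/* holds for nousrreqs_sketch */
	return (ur->pru_attach(so, proto));
}
```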
Two new functions provide a clean API to register and unregister a protocol. The
register function expects a pointer to a valid and complete struct protosw, including
a pointer to a struct pr_usrreqs provided by the caller. Upon successful registration
the pr_init() function is called to finish initialization of the protocol. The
unregister function puts the SPACER back in place of the protocol. It is the
responsibility of the caller to ensure that all sockets are properly closed and
that all memory allocated by the unloading protocol is freed.
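The SPACER scheme and the register/unregister API can be modeled roughly as follows in a userland sketch. The names sketch_proto_register() and sketch_proto_unregister(), the 4-entry table and the error codes returned are all assumptions for illustration; only the PROTO_SPACER value of 32767 comes from this commit message, and the signatures of the real pf_proto_[un]register() functions may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define PROTO_SPACER	32767	/* value given in the commit message */
#define NPROTOSW	4	/* hypothetical compile-time array size */

/* Hypothetical, trimmed-down stand-in for struct protosw. */
struct sketch_protosw {
	short pr_protocol;
	void (*pr_init)(void);
};

/* Fixed-size table: two built-in protocols followed by two SPACERs. */
static struct sketch_protosw protosw_tab[NPROTOSW] = {
	{ .pr_protocol = 6 },		/* e.g. TCP */
	{ .pr_protocol = 17 },		/* e.g. UDP */
	{ .pr_protocol = PROTO_SPACER },
	{ .pr_protocol = PROTO_SPACER },
};

/* Hypothetical analogue of pf_proto_register(): claim the first SPACER. */
static int
sketch_proto_register(const struct sketch_protosw *npr)
{
	struct sketch_protosw *spacer = NULL;

	if (npr->pr_protocol == PROTO_SPACER)
		return (EINVAL);		/* cannot register a SPACER */
	for (int i = 0; i < NPROTOSW; i++) {
		if (protosw_tab[i].pr_protocol == npr->pr_protocol)
			return (EEXIST);	/* already registered */
		if (spacer == NULL &&
		    protosw_tab[i].pr_protocol == PROTO_SPACER)
			spacer = &protosw_tab[i];
	}
	if (spacer == NULL)
		return (ENOMEM);		/* no free SPACER left */
	*spacer = *npr;
	if (spacer->pr_init != NULL)
		spacer->pr_init();		/* finish initialization */
	return (0);
}

/* Hypothetical analogue of pf_proto_unregister(): put the SPACER back. */
static int
sketch_proto_unregister(short protocol)
{
	if (protocol == PROTO_SPACER)
		return (EPROTONOSUPPORT);
	for (int i = 0; i < NPROTOSW; i++) {
		if (protosw_tab[i].pr_protocol == protocol) {
			protosw_tab[i] = (struct sketch_protosw){
			    .pr_protocol = PROTO_SPACER };
			return (0);
		}
	}
	return (EPROTONOSUPPORT);
}
```

With two SPACERs, two protocols can be loaded; a third registration fails until one of them is unregistered and its SPACER is restored.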
sys/protosw.h
o Define generic PROTO_SPACER to be 32767
o Prototypes for all pru_*_notsupp() functions
o Prototypes for pf_proto_[un]register() functions
kern/uipc_domain.c
o Global struct pr_usrreqs nousrreqs containing valid pointers to the
pru_*_notsupp() functions
o New functions pf_proto_[un]register()
kern/uipc_socket2.c
o New function bodies for all pru_*_notsupp() functions
the ATA pccard locking function. This makes pccard devices like
Compact Flash cards work again.
PR: kern/72805
Submitted by: James E. Flemer <jflemer@alum.rpi.edu>
MFC in: 2 days
frames. BGE hardware with the rx alignment bug will still be handled by the
calls to m_adj() that already exist. m_adj() is probably better suited for
this task anyway. Just as with if_em, this saves a malloc + several locks
per packet and prevents unneeded data copying within busdma.
Since the e1000 DMA engines have no constraints on the alignment of buffer
transfers, there is no reason to tell busdma that there are. This saves a
minimum of 1 malloc call per packet, which translates to eliminating 4 locks.
It also means that buffers are not needlessly bounced when transferred. The
end result is a 38% improvement in pps in a 4-way bridging environment.
Obtained from: Sandvine, Inc.
(usually taking 20 seconds to transmit a packet). No longer fall back
to transmitting only one packet (instead of the entire queue) after we
have processed the entire send queue. I have no idea why we didn't
start seeing this problem ~6 years ago when this code was introduced.
(sorele()/sotryfree()):
- This permits the caller to acquire the accept mutex before the socket
mutex, avoiding sofree() having to drop the socket mutex and re-order,
which could lead to races permitting more than one thread to enter
sofree() after a socket is ready to be freed.
- This also covers clearing of the so_pcb weak socket reference from
the protocol to the socket, preventing races in clearing and
evaluation of the reference such that sofree() might be called more
than once on the same socket.
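The lock ordering can be illustrated with a userland analogue using POSIX threads. This is a sketch under stated assumptions: sketch_socket, sketch_sofree() and the two release paths are hypothetical, and "freeing" only sets a flag instead of releasing memory. Both paths take the accept mutex before the socket mutex, and the weak so_pcb reference is cleared while both are held, so only the last reference holder triggers the free, and only once.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userland stand-ins for the accept mutex and a socket. */
static pthread_mutex_t accept_mtx = PTHREAD_MUTEX_INITIALIZER;

struct sketch_socket {
	pthread_mutex_t so_mtx;
	int so_count;		/* references held by user processes */
	void *so_pcb;		/* weak reference from the protocol */
	bool so_freed;
};

static int sofree_calls;	/* counts how often the free actually ran */

/* Caller must hold accept_mtx and so->so_mtx, in that order. */
static void
sketch_sofree(struct sketch_socket *so)
{
	if (so->so_count != 0 || so->so_pcb != NULL || so->so_freed)
		return;		/* still referenced, or already freed */
	so->so_freed = true;	/* the kernel would deallocate here */
	sofree_calls++;
}

/* Path analogous to tcp_close() after an RST: drop the pcb reference. */
static void
sketch_pcb_detach(struct sketch_socket *so)
{
	pthread_mutex_lock(&accept_mtx);	/* always first */
	pthread_mutex_lock(&so->so_mtx);	/* then the socket */
	so->so_pcb = NULL;			/* cleared under both locks */
	sketch_sofree(so);
	pthread_mutex_unlock(&so->so_mtx);
	pthread_mutex_unlock(&accept_mtx);
}

/* Path analogous to close(2) in the user process: drop a reference. */
static void
sketch_sorele(struct sketch_socket *so)
{
	pthread_mutex_lock(&accept_mtx);	/* same order as above */
	pthread_mutex_lock(&so->so_mtx);
	so->so_count--;
	sketch_sofree(so);
	pthread_mutex_unlock(&so->so_mtx);
	pthread_mutex_unlock(&accept_mtx);
}
```

Because both paths serialize on the same two locks in the same order, whichever path runs second sees the other's update and is the one that frees; a repeated detach finds the socket already freed and does nothing.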
This appears to close a race I was able to easily trigger by repeatedly
opening and resetting TCP connections to a host, in which the
tcp_close() code called as a result of the RST raced with the close()
of the accepted socket in the user process resulting in simultaneous
attempts to de-allocate the same socket. The new locking increases
the overhead for operations that may potentially free the socket, so we
will want to revise the synchronization strategy here as we normalize
the reference counting model for sockets. The use of the accept mutex
in freeing of sockets that are not listen sockets is primarily
motivated by the potential need to remove the socket from the
incomplete connection queue on its parent (listen) socket, so cleaning
up the reference model here may allow us to substantially weaken the
synchronization requirements.
RELENG_5_3 candidate.
MFC after: 3 days
Reviewed by: dwhite
Discussed with: gnn, dwhite, green
Reported by: Marc UBM Bocklet <ubm at u-boot-man dot de>
Reported by: Vlad <marchenko at gmail dot com>
modes on a tty structure.
Both the ".init" and the current settings are initialized, allowing
the function to be used both at attach and open time.
The function takes an argument to decide if echoing should be enabled.
Echoing should not be enabled for regular physical serial ports
unless they are consoles, in which case they should be configured
by ttyconsolemode() instead.
Use the new function throughout.
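A helper of this kind could be sketched in userland with POSIX termios as follows. The function name, the struct layout and the particular default flags and speed are assumptions for illustration; the commit message only says that both the ".init" and current settings are filled in and that an argument controls echoing.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <termios.h>

/* Hypothetical mirror of the two termios copies kept per tty. */
struct sketch_tty {
	struct termios t_init;	/* ".init" settings, applied at open */
	struct termios t_cur;	/* current settings */
};

/* Hypothetical helper: set default modes, optionally enabling echo. */
static void
sketch_tty_init_modes(struct sketch_tty *tp, bool echo)
{
	struct termios t;

	memset(&t, 0, sizeof(t));
	t.c_iflag = ICRNL | IXON;		/* illustrative defaults */
	t.c_oflag = OPOST | ONLCR;
	t.c_cflag = CS8 | CREAD | HUPCL;
	t.c_lflag = ICANON | ISIG | IEXTEN;
	if (echo)				/* e.g. not for plain serial ports */
		t.c_lflag |= ECHO | ECHOE | ECHOK;
	cfsetispeed(&t, B9600);
	cfsetospeed(&t, B9600);

	/* Fill both copies so the helper works at attach and at open time. */
	tp->t_init = t;
	tp->t_cur = t;
}
```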
List of functional changes:
- Make a single device per single node with a single hook.
This gives us parallelism, which can't be achieved on a single
node with many devices/hooks. This also gives us flexibility - we
can play with a particular device node, not affecting others.
- Remove read queue as it is. Use struct ifqueue instead. This change
removes a lot of extra memcpy()ing, m_devget()ting and m_copymem()ming.
In ng_device_receivedata() we enqueue an mbuf and wake readers.
In ngdread() we take one mbuf from the queue and uiomove() it to
userspace. If no mbuf is present we optionally block. [1]
- In ngdwrite() we create an mbuf from uio using m_uiotombuf().
This is faster than uiomove() into a buffer followed by m_copydata(),
and much better than a huge m_pullup().
- Perform locking of device
- Perform locking of connection list.
- Clear out the _rcvmsg method, since it does nothing useful yet.
- Implement NGM_DEVICE_GET_DEVNAME message.
- #if 0 the ioctl method, since it does nothing yet.
- Return immediately from ngdwrite() if uio_resid == 0.
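The new read path (enqueue and wake readers on receive; dequeue one buffer and copy it out on read, blocking only if asked to) can be sketched with a mutex and a condition variable. Everything here is a hypothetical stand-in: sketch_buf replaces an mbuf, sketch_queue replaces struct ifqueue plus the sleep/wakeup channel, memcpy() replaces uiomove(), and EWOULDBLOCK is an assumed error for a non-blocking read on an empty queue. The commit does not say how a short user buffer is handled; this sketch simply drops the rest of the dequeued buffer.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an mbuf: one contiguous buffer. */
struct sketch_buf {
	struct sketch_buf *next;
	size_t len;
	char data[];
};

/* Hypothetical stand-in for struct ifqueue plus its wakeup channel. */
struct sketch_queue {
	pthread_mutex_t mtx;
	pthread_cond_t cv;
	struct sketch_buf *head, *tail;
};

/* Like ng_device_receivedata(): enqueue the buffer and wake readers. */
static void
sketch_enqueue(struct sketch_queue *q, struct sketch_buf *b)
{
	b->next = NULL;
	pthread_mutex_lock(&q->mtx);
	if (q->tail != NULL)
		q->tail->next = b;
	else
		q->head = b;
	q->tail = b;
	pthread_cond_broadcast(&q->cv);
	pthread_mutex_unlock(&q->mtx);
}

/* Like ngdread(): take one buffer, blocking only if nonblock is false. */
static int
sketch_read(struct sketch_queue *q, char *dst, size_t dstlen, bool nonblock,
    size_t *copied)
{
	struct sketch_buf *b;

	pthread_mutex_lock(&q->mtx);
	while ((b = q->head) == NULL) {
		if (nonblock) {
			pthread_mutex_unlock(&q->mtx);
			return (EWOULDBLOCK);
		}
		pthread_cond_wait(&q->cv, &q->mtx);
	}
	q->head = b->next;
	if (q->head == NULL)
		q->tail = NULL;
	pthread_mutex_unlock(&q->mtx);

	*copied = b->len < dstlen ? b->len : dstlen;
	memcpy(dst, b->data, *copied);	/* uiomove() in the kernel */
	free(b);
	return (0);
}
```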
List of tidiness changes:
- Introduce device2priv(), to remove cut'n'paste.
- Use MALLOC/FREE, instead of malloc/free.
- Use unit2minor().
- Use UID_ROOT/GID_WHEEL instead of 0/0.
- Define NGD_DEVICE_DEVNAME, use it.
- Use nicer macros for debugging. [2]
- Return Exxx, not -1.
style(9) changes:
- No "#endif" after short block.
- Break long lines.
- Remove extra spaces, add needed spaces.
[1] Obtained from: if_tun.c
[2] Obtained from: ng_pppoe.c
Reviewed by: marks
Approved by: julian (mentor)
MFC after: 1 month
failure in the NFS server would result in a leaked instance of the NFS
server subsystem lock. Liberally sprinkle assertions in all target
labels for error unwinding to assert the desired locking state.
RELENG_5_3 candidate.
MFC after: 3 days
Reported by: Wilkinson, Alex <alex dot wilkinson at dsto dot defence dot gov dot au>
errors are in rarely executed paths.
1. Each time the retry_alloc path is taken, PG_BUSY must be set again.
Otherwise vm_page_remove() panics.
2. There is no need to set PG_BUSY on the newly allocated page before
freeing it. The page already has PG_BUSY set by vm_page_alloc().
Setting it again could cause an assertion failure.
MFC after: 2 weeks
vm_page_io_finish(). The motivation being to transition synchronization of
the vm_page's busy field from the global page queues lock to the per-object
lock.
invalidate the TLB(s) if the old mapping wasn't used by the CPU. With
network interfaces that implement checksum off-loading, the old mapping is
almost never used by the CPU, only by the device driver for setting up the
DMA operation.
Reviewed by: tegge@
critical_exit as the process is getting scheduled to run. This is suboptimal,
but for now it avoids the LOR between the scheduler and the sleepq systems.
This is a 5.3 candidate.
Submitted by: davidxu
MFC After: 3 days
constrained to a small number of sessions by the small on-card memories found
in newer devices. This is really a stopgap solution as having session state
in main memory incurs a (small but noticeable) performance penalty. The better
solution is to manage session state so that it's cached on chip.
Obtained from: OpenBSD
* Get flags first, in case there is no devclass.
* Reset flags after each probe in case the next driver has no hints so it
doesn't inherit the old ones.
* Set them again before the winning probe.
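The three rules above can be modeled in a small sketch. It assumes a simplified convention that is not FreeBSD's: a probe returns a positive score on a match and 0 otherwise, and the hint lookup leaves the flags variable untouched when a driver has no hint, which is what made flag inheritance possible. sketch_probe_child(), sketch_get_flags() and the example probes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical driver descriptor; probe returns a score, 0 = no match. */
struct sketch_driver {
	int has_hint;		/* does a hint supply flags for this driver? */
	int hint_flags;
	int (*probe)(int devflags);
};

/* Like a hint lookup: leaves *flagsp untouched when there is no hint. */
static void
sketch_get_flags(const struct sketch_driver *d, int *flagsp)
{
	if (d->has_hint)
		*flagsp = d->hint_flags;
}

/* Returns the index of the winning driver, or -1 if none matched. */
static int
sketch_probe_child(const struct sketch_driver *drv, size_t ndrv, int *flagsp)
{
	int best = -1, bestscore = 0;

	*flagsp = 0;
	for (size_t i = 0; i < ndrv; i++) {
		sketch_get_flags(&drv[i], flagsp);	/* get flags first */
		int score = drv[i].probe(*flagsp);
		*flagsp = 0;		/* reset so the next driver can't inherit */
		if (score > bestscore) {
			best = (int)i;
			bestscore = score;
		}
	}
	if (best >= 0) {
		sketch_get_flags(&drv[best], flagsp);	/* set again for winner */
		(void)drv[best].probe(*flagsp);
	}
	return (best);
}

/* Example probes: quirky_probe() only claims the device if it sees 0x4. */
static int generic_probe(int flags) { (void)flags; return (1); }
static int quirky_probe(int flags) { return ((flags & 0x4) ? 2 : 0); }
```

With the drivers ordered as {generic_probe with hint flags 0x4, quirky_probe without hints}, resetting the flags after each probe keeps quirky_probe() from seeing 0x4 and stealing the device.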
Tested ok both with and without ACPI for ISA device flags.
Reviewed by: imp
MFC after: 1 day
providers for tasting. Before this hack, the following race was possible:
	SI_SUB_RAID (no not-fully-configured geoms, so don't block)
	GEOM tasting (now geoms are created)
	SI_SUB_MOUNT_ROOT (if the root file system is placed on a mirror, it is
	possible that this mirror is not fully configured yet)
There is a lot of work to do to avoid such hacks and I need a working
solution before 5.3, sorry.
Reported by: John Hay <jhay@icomtek.csir.co.za>
We have to use our own destroy_geom method, because the default one, which
is part of geom_slice, is broken.
MT5 candidate.
PR: kern/72467
Submitted by: Vladimir Novoseltsev