is called. It looks like there are lots of different mount flags checked
in vfs_domount(), so we need to do the parsing for these particular
mount flags earlier on. The new flags parsed are:
async, force, multilabel, noasync, noatime, noclusterr, noclusterw,
noexec, nosuid, nosymfollow, snapshot, suiddir, sync, union.
Existing code which uses mount() to mount UFS filesystems is not
affected, but new code which uses nmount() to mount UFS filesystems
should behave better.
Add functions to rename objects and to move a subdisk from one drive
to another.
Obtained from: Chris Jones <chris.jones@ualberta.ca>
Sponsored by: Google Summer of Code 2005
MFC in: 1 week
have any know to enable it from userland and could only be enabled by
either setting it to 1 at compile time or through the kernel debugger.
In the future it may be brought back as KTR tracing points.
Discussed with: rwatson
Sponsored by: TCP/IP Optimization Fundraise 2005
directory by default) without requiring the user to load them by hand using
e.g iwicontrol. Get rid of the old ioctl crud.
Updated iwi-firmware port coming soon.
Obtained from: OpenBSD
synonyms for "shortname" and "longname" mount options. The old
(before nmount()) mount_msdosfs program accepted "shortnames" and "longnames",
but the kernel nmount() checked for "shortname" and "longname".
So, make the kernel accept "shortnames", "longnames", "shortname", "longname"
for forwards and backwarsd compatibility.
Discovered by: Rainer Hurling <rhurlin at gwdg dot de>
include ip_options.h into all files making use of IP Options functions.
From ip_input.c rev 1.306:
ip_dooptions(struct mbuf *m, int pass)
save_rte(m, option, dst)
ip_srcroute(m0)
ip_stripoptions(m, mopt)
From ip_output.c rev 1.249:
ip_insertoptions(m, opt, phlen)
ip_optcopy(ip, jp)
ip_pcbopts(struct inpcb *inp, int optname, struct mbuf *m)
No functional changes in this commit.
Discussed with: rwatson
Sponsored by: TCP/IP Optimization Fundraise 2005
to list corruption, which can be difficult to unravel in a post-mortem
analysis. These checks verify that prev and next pointers are consistent
when inserting or removing elements, thus catching any corruption earlier.
Also use TRASHIT to break LIST and SLIST link pointers on element removal,
from mlaier via -hackers.
Reviewed by: mlaier
Approved by: rwatson (mentor)
mystery traps. If we don't have a message for a given trap, just use
UNKNOWN for the message.
- Add trap messages for T_XMMFLT and T_RESERVED.
MFC after: 1 week
have free space in it. Allocate correct mbuf from the beginning.
This allows icmp_error() to quote the entire TCP header in error
messages.
Sponsored by: TCP/IP Optimization Fundraise 2005
callpath via vfs_getopt(), and set the appropriate MNT_* flag:
-> acls, async, force, multilabel, noasync, noatime,
-> noclusterr, noclusterw, snapshot, update
- Allow errmsg as a valid mount option via vfs_getopt(),
so we can later add a hook to propagate mount errors back
to userspace via vfs_mount_error().
the underlying drive had been hot-unplugged from the system. Here
is a specific example. Filesystem code had opened /dev/da1s1e.
Subsequently, the drive was hot-unplugged. This (correctly) caused
all of the associated /dev/da1* entries to be deleted. When the
filesystem later realized that the drive was gone it closed the
device, reducing the write-access counts to 0 on the geom providers
for da1s1e, da1s1, and da1. This caused geom to re-taste the
providers, resulting in the devices being created again. When the
drive was hot-plugged back in, it resulted in duplicate /dev entries
for da1s1e, da1s1, and da1.
This fix adds a new disk_gone() function which is called by CAM when a
drive goes away. It orphans all of the providers associated with the
drive, setting an error condition of ENXIO in each one. In addition,
we prevent a re-taste on last close for writing if an error condition
has been set in the provider.
Sponsored by: Isilon Systems
Reviewed by: phk
MFC after: 1 week
in, and if so, set MNT_UPDATE filesystem flag.
vfs_nmount() calls vfs_domount(), and there is special logic
inside vfs_domount() if MNT_UPDATE is set. This is very important
when we want to do an update mount of the root filesystem, using nmount().
Prevent backup CARP hosts from replying to arp requests, fixes strangeness
with some layer-3 switches. From Bill Marquette.
Tested by: Kazuaki Oda <kaakun highway.ne.jp>
for a notebook with em(4) adapter.
- Introduce tunables em.hw.txd and em.hw.rxd, which allow administrator
to configure number of transmit and receive descriptors.
- Check em.hw.txd and em.hw.rxd against hardware limits [*] and require
them to be multiple of 128.
[*] According to comments in if_em.h the 82540EM/82541ER chips can handle
more than 256 descriptors. Since we don't have this hardware to test,
we decided to mimic NetBSD wm(4) driver, that limits these chips to
256 descriptors.
In collaboration with: yongari
IPI_STOP handling code use atomic_readandclear() to execute the restart
function on the first CPU to resume and restore the behavior of always
executing the restart function on the BSP since this is in fact what the
non-NMI IPI_STOP handler does. I did add back in a statement to clear
the restart function pointer after it is executed to match the behavior
of the non-NMI IPI_STOP handler.
I/O APIC that doesn't exist, then a read of the version register is going
to return -1 which is 0xffffffff not 0xffffff.
Tested on: i386
Tested by: Nikos Ntarmos ntarmos at ceid dot upatras dot gr
MFC after: 1 week
that can potentially be mapped to negative ieee #'s.
NB: before operation on the latter can be supported we need to cleanup
various code that assumes ieee channel #'s are >= 0
stale when called to reset rate control state causing us to
pickup an invalid index, check for this and skip 'em (things
will eventually get fixed up so this is not harmful)
of volumes that might need administrator attention through device
specific sysctl to simplify device monitoring.
Submitted by: Deomid Ryabkov <myself at rojer dot pp dot ru>
execute a ET_DYN binary (shared object).
This does not make much sense, but some linux scripts expect to be able to
execute /lib/ld-linux.so.2 (ldd comes to mind).
The sysctl defaults to 0.
MFC after: 3 days
currently present is minor and offers no real semantic issues, it also
doesn't make sense since an earlier lockless check has already
occurred. Also hold the mutex longer, over a manipulation of
per-process ktrace state, which requires synchronization.
MFC after: 1 month
Pointed out by: jhb
This one simply tries to simplify the logic to select the
buffer sizes. I am not sure it is necessary but the code
seems a bit more readable to me. And at least i have tried
to document how the buffer sizes are computed.
Thanks to luigi for deciphering one of the most cryptic part of
sound driver.
Submitted by: luigi
Approved by: netchild (mentor)
In SNDCTL_DSP_SETFRAGMENT, if you specify both read and
write channels, the existing code first acts on the
read channel, but as a side effect it updates the
arguments (maxfrags, fragsz) passed by the caller according
to acceptable values for the read channel, and then uses the
modified values to act on the write channel.
The problem with this approach is that, given a
(maxfrags, fragsz) user-specified value, the actual
values computed by the read and write channels may differ:
e.g. the read channel might want to allocate more fragments
than what the user specified because it has no side-effects
on the delay and it helps in case of slow readers,
whereas the write channel needs to use as few fragments
as possible to keep the audio latency low (very important
with telephony apps).
This patch stores the values computed by the read channel
into temproary variables so the write channel will use
the actual arguments of the ioctl.
This patch is very helpful with telephony apps such as asterisk.
Submitted by: luigi
Approved by: netchild (mentor)
- Added new codec id for CX20468-21 and VIA1617A.
Submitted by: Chen Lihong <lihong.chen@gmail.com>
- Re-enable SOUND_MIXER_IGAIN, but set the default level as 0 (mute)
Suggested by: luigi
mixer.c:
- Set default value for SOUND_MIXER_IGAIN as 0 (mute) to avoid
feedback problems on some laptops (was disabled by jhb during
ac97.c revision 1.42).
Approved by: netchild (mentor)
erratic system slowdown (beaten to a pulp) and possible panic. This
issue has bugged me for as long as I could remember, until I
realized that it is possible for register base offset to hold zero
value which is definitely a "FALSE".
Approved by: netchild (mentor)
compatible AC97 codec.
- As the driver supports so many variants, create a table ids for
ease of probing and maintenance.
Submitted by: yongari
Reviewed/Tested by: multimedia@
- From luigi:
The code to compute fragment sizes in the ich driver almost
invariably ends up using the full buffer available, no matter
how the user specifies fragment size and number.
With audio telephony (8khz, 16bit-stereo) and the 16k buffer
size this results in an unbearable 500ms delay.
This patch makes sure that we never use more than 4 fragments,
(i don't think we need more unless there are huge interrupt
servicing latencies), and obey to the requested fragment size,
so that latency is acceptable.
Based on this (and after much regression tests), I can conclude
that this driver works best with 2 fragments, thus solving various
long standing issues of ICH driver not capable to flush or play
short files perfectly.
Suggested by: luigi (the idea of smaller fragments)
- MPSAFE conversion.
Approved by: netchild (mentor)
distinct hardware playback channels. DAC configuration can be
accessed through kernel hint - hint.pcm.<unit>.dac="val" with
following possible values:
0 = Enable both DACs (default)
1 = Enable single DAC (DAC1)
2 = Enable single DAC (DAC2)
3 = Enable both DACs, swap position (DAC2 comes first instead
of DAC1)
Special case for ES1370:
Unlike ES1371,2,3/CT5880, volume for each DAC 1 and 2 can be
controlled indepedently (synth for DAC1, pcm for DAC2). It is
possible that user will confuse by this behaviour, since both
DACs are enabled by default. Thus, provide a knob through sysctl
hw.snd.pcm<unit>.single_pcm_mixer:
0 = each DACs will be controlled separately (synth/pcm).
1 = combine both DACs volume mixer controller into a single
"pcm" (default)
As a side note, fixed rate operation (provided by previous
commit) is not a mandatory if the configuration space does not
involve DAC2 (perhaps disabled by user through the above kernel
hint). Unlike DAC2, DAC1 has its own register / control space,
not affected by the speed settings of ADC.
Tested by: multimedia@
Approved by: netchild (mentor)
mask to recdev_l and recdev_r, since each have its own unique mask.
Submitted by: Watanabe Kazuhiro <CQG00620@nifty.ne.jp>
Approved by: netchild (mentor)
verbs. Only the create verb operates on a provider. All other verbs
operate on a GPT geom. Also, the GPT entry oriented verbs require
a non-downgraded GPT.
o Have all verbs take an optional flags parameter. The flags parameter
is a string of single-letter flags. The typical use of these flags
is to enable certain behaviour in support fo the gpt(8) tool.
o Add dummy implementations for the destroy and recover verbs.
This change causes test 2 of the GPT regression test suite to fail.
The presence of a geom parameter is now required even for unknown
verbs.
If the packet is rejected from pfil(9) then continue the loop rather than
returning, this means that we can still try to send it out the remaining
interfaces but more importantly the mbuf is freed and refcount decremented on
exit.
just discard the received frame and reuse the old mbuf.
This should prevent the connection from stalling after high network traffic.
MFC after: 2 weeks
a new mbuf, just discard the received frame and reuse the old mbuf.
This should fix kernel panics on high network traffic.
Obtained from: NetBSD (joerg@)
MFC after: 2 weeks
It may be the case that you may hear some unwanted noise while
playing back with 24/32 bit. This is a problem in the USB system.
Explanation from Hans Petter Selasky:
---snip---
The current USB sound driver only uses one isochronous
buffer, that is restarted when it is completed. This will lead to a short
period of time, +1ms, where no sound data is sent to the external USB device.
Depending on the load of your computer, this can be as much as 50ms. So the
USB sound driver must use 2 isochronous transfers. At the beginning one will
queue both. Then these are restarted on completion. This will result in a
constant-rate data stream to the external sound device, a minimum sound
buffer equal to the size of the isochronous buffer, and possibly the sound
will reach your ears with less delay. Little delay is a result of constant
data rate. Currently only my USB driver will support that. If one tries that
with the USB driver in *BSD, then it will crash at the first moment one gets
a buffer underrun.
---snip---
Submitted by: Kazuhito HONDA <kazuhito@ph.noda.tus.ac.jp>
Mono-recording still not tested by: julian
reliability when tracing fast-moving processes or writing traces to
slow file systems by avoiding unbounded queueuing and dropped records.
Record loss was previously possible when the global pool of records
become depleted as a result of record generation outstripping record
commit, which occurred quickly in many common situations.
These changes partially restore the 4.x model of committing ktrace
records at the point of trace generation (synchronous), but maintain
the 5.x deferred record commit behavior (asynchronous) for situations
where entering VFS and sleeping is not possible (i.e., in the
scheduler). Records are now queued per-process as opposed to
globally, with processes responsible for committing records from their
own context as required.
- Eliminate the ktrace worker thread and global record queue, as they
are no longer used. Keep the global free record list, as records
are still used.
- Add a per-process record queue, which will hold any asynchronously
generated records, such as from context switches. This replaces the
global queue as the place to submit asynchronous records to.
- When a record is committed asynchronously, simply queue it to the
process.
- When a record is committed synchronously, first drain any pending
per-process records in order to maintain ordering as best we can.
Currently ordering between competing threads is provided via a global
ktrace_sx, but a per-process flag or lock may be desirable in the
future.
- When a process returns to user space following a system call, trap,
signal delivery, etc, flush any pending records.
- When a process exits, flush any pending records.
- Assert on process tear-down that there are no pending records.
- Slightly abstract the notion of being "in ktrace", which is used to
prevent the recursive generation of records, as well as generating
traces for ktrace events.
Future work here might look at changing the set of events marked for
synchronous and asynchronous record generation, re-balancing queue
depth, timeliness of commit to disk, and so on. I.e., performing a
drain every (n) records.
MFC after: 1 month
Discussed with: jhb
Requested by: Marc Olzheim <marcolz at stack dot nl>
- several TerraTec TValue [1]
- PixelView PlayTV Pro REV-4C [2]
In case you have the PixelView card, please tell us the "pciconf -v -l"
output on multimedia@FreeBSD.org if it works. There are revisions out there
which may not work and we need to know which ones work.
PR: 53383 [1], 76002 [2]
Submitted by: Tanja Wittke <tawi@gruft.de> [1], barner [1],
Dan Angelescu <mrhsaacdoh@yahoo.com> [2]
MFC after: 2 months
This is the only file of > 1700 files in a buildkernel here doing that.
It makes reproducible builds (same source => same binary) impossible.
Spotted by: devel/ccache
is that gdb knows SIGLWP and will pass it to program, otherwise gdb
will print out "unknown signal" and discard it, and then thread
cancellation won't work for libthr under gdb.
MFC: 3 days
MD class. Previously only the DISK class was dumped. The only
consumer of this sysctl is libdisk (i.e. sysinstall) and it tests
explicitly for instances of the DISK class. Dumping other classes
is therefore harmless.
By also dumping the MD class regression tests can be written that
use the MD class for operations that would normally be done on the
DISK class. The sysctl can now be used to test if those operations
took an effect. An example is partitioning.
from the printer and discarding the data even if the ulpt device
was opened for reading. This resulted in crashes because two
conconcurrent read transfers were using the same transfer structure.
PR: usb/88886
Reported By: Alex Pivovarov
MFC after: 1 week
happiness, as well as correct other bugs:
- Replace notion of current and saved accounting credential/vnode with a
single credential/vnode and an acct_suspended flag. This simplifies the
accounting logic substantially.
- Replace acct_mtx with acct_sx, a sleepable lock held exclusively during
reconfiguration and space polling, but shared during log entry
generation. This avoids holding a mutex over sleepable VFS operations.
- Hold the sx lock over the duration of the I/O so that the vnode I/O
cannot occur after vnode close, which could occur previously if
accounting was disabled as a process exited.
- Write the accounting log entry with Giant conditionally acquired based
on the file system where the log is stored. Previously, the accounting
code relied on the caller acquiring Giant.
- Acquire Giant conditionally in the accounting callout based on the file
system where the accounting log is stored. Run the callout MPSAFE.
- Expose acct_suspended via a read-only sysctl so it is possibly to
programmatically determine whether accounting is suspended or not without
attempting to parse logs.
- Check both acct_vp and acct_suspended lock-free before entering the
accounting sx lock in acct().
- When accounting is disabled due to a VBAD vnode (i.e., forceable unmount),
generate a log message indicating accounting has been disabled.
- Correct a long-standing bug in how free space is calculated and compared
to the required space: generate and compare signed results, not unsigned
results, or negative free space will cause accounting to not be suspended
when required, or worse, incorrectly resumed once negative free space is
reached.
MFC after: 2 weeks
rather than in ifindex_table[]; all (except one) accesses are
through ifp anyway. IF_LLADDR() works faster, and all (except
one) ifaddr_byindex() users were converted to use ifp->if_addr.
- Stop storing a (pointer to) Ethernet address in "struct arpcom",
and drop the IFP2ENADDR() macro; all users have been converted
to use IF_LLADDR() instead.
The following repo-copies were made (by Mark Murray):
sys/i386/isa/spkr.c -> sys/dev/speaker/spkr.c
sys/i386/include/speaker.h -> sys/dev/speaker/speaker.h
share/man/man4/man4.i386/spkr.4 -> share/man/man4/spkr.4
copy of Ethernet address.
- Change iso88025_ifattach() and fddi_ifattach() to accept MAC
address as an argument, similar to ether_ifattach(), to make
this work.
socket file descriptor garbage collection code, which is intended to
detect and clear cycles of orphaned file descriptors that are "in-flight"
in a socket when that socket is closed before they are received. The
algorithm present was both run at poor times (resulting in recursion and
reentrance), and also buggy in the presence of parallelism. In order to
fix these problems, make the following changes:
- When there are in-flight sockets and a UNIX domain socket is destroyed,
asynchronously schedule the garbage collector, rather than running it
synchronously in the current context. This avoids lock order issues
when the garbage collection code reenters the UNIX domain socket code,
avoiding lock order reversals, deadlocks, etc. Run the code
asynchronously in a task queue.
- In the garbage collector, when skipping file descriptors that have
entered a closing state (i.e., have f_count == 0), re-test the FDEFER
flag, and decrement unp_defer. As file descriptors can now transition
to a closed state, while the garbage collector is running, it is no
longer the case that unp_defer will remain an accurate count of
deferred sockets in the mark portion of the GC algorithm. Otherwise,
the garbage collector will loop waiting waiting for unp_defer to reach
zero, which it will never do as it is skipping file descriptors that
were marked in an earlier pass, but now closed.
- Acquire the UNIX domain socket subsystem lock in unp_discard() when
modifying the unp_rights counter, or a read/write race is risked with
other threads also manipulating the counter.
While here:
- Remove #if 0'd code regarding acquiring the socket buffer sleep lock in
the garbage collector, this is not required as we are able to use the
socket buffer receive lock to protect scanning the receive buffer for
in-flight file descriptors on the socket buffer.
- Annotate that the description of the garbage collector implementation
is increasingly inaccurate and needs to be updated.
- Add counters of the number of deferred garbage collections and recycled
file descriptors. This will be removed and is here temporarily for
debugging purposes.
With these changes in place, the unp_passfd regression test now appears
to be passed consistently on UP and SMP systems for extended runs,
whereas before it hung quickly or panicked, depending on which bug was
triggered.
Reported by: Philip Kizer <pckizer at nostrum dot com>
MFC after: 2 weeks
state about each open file, and identify the first process in the process
table that references the file. This is helpful in debugging leaks of
file descriptors.
MFC after: 1 week
The PR and patch have the details. The ultimate fix requires architectural
changes and clarifications to the VFS API, but this will prevent the system
from panicking when someone does "ls /dev" while running in a shell under the
linuxulator.
This issue affects HEAD and RELENG_6 only.
PR: 88249
Submitted by: "Devon H. O'Dell" <dodell@ixsystems.com>
MFC after: 3 days
with the file descriptor. When a file descriptor is closed as a result
of garbage collecting a UNIX domain socket, the file descriptor will
not have any associated thread, so the logic to identify advisory locks
held by that thread is not appropriate. Check the thread for NULL to
avoid this scenario. Expand an existing comment to say a bit more about
this.
MFC after: 1 week
thread context. While it doesn't matter too much at the moment, in
the future we could be back in the same boat if/when more restrictions
are placed (or enforced) in a SWI.
Suggested by: njl, bde, jhb, scottl
overruns and number of watchdog timeouts.
- Do not log(9) RX overrun events, since this pessimizes
things under load [1].
- Do not increase if->if_oerrors in em_watchdog(), since
this leads to counter slipping back, when if->if_oerrors
is recalculated in em_update_stats_counters(). Instead
increase watchdog counter in em_watchdog() and take it
into account in em_update_stats_counters().
Submitted by: ade [1]
- disable jumbo frame support on strict alignment architectures due
to the limitation of hardware. The driver needs a fix-up code for
RX side. The fix will show up in near future.
- fix endian issue for 82544 on PCI-X bus. I couldn't test this as
I don't have the NIC/hardware.
- prefer PCIR_BAR to hardcoded EM_MMBA.
- Properly checks for for 64bit BAR [1]
- replace inl/outl with bus_space(9) [1]
- fix endian issue on VLAN handling.
- reorder header files and remove unnecessary one.
Reviewed by: cognet
No response from: pdeuskar, tackerman
Obtained from: OpenBSD [1]
reclamation synchronously from get_pv_entry() instead of
asynchronously as part of the page daemon. Additionally, limit the
reclamation to inactive pages unless allocation from the PV entry zone
or reclamation from the inactive queue fails. Previously, reclamation
destroyed mappings to both inactive and active pages. get_pv_entry()
still, however, wakes up the page daemon when reclamation occurs. The
reason being that the page daemon may move some pages from the active
queue to the inactive queue, making some new pages available to future
reclamations.
Print the "reclaiming PV entries" message at most once per minute, but
don't stop printing it after the fifth time. This way, we do not give
the impression that the problem has gone away.
Reviewed by: tegge
in the hardware interrupt context (even if it is likely just an
ithread). We don't document that suspend/resume routines are run from
such a context and some of the things that happen in those routines
aren't interrupt safe. Since there's no real need to run from that
context, this restores assumptions that suspend routines have made.
This fixes Thierry Herbelot's 'Trying to sleep while sleeping is
prohibited' problem.
nearly identical to wintel/ia32, with a couple of tweaks. Since it is
so similar to ia32, it is optionally added to a i386 kernel. This
port is preliminary, but seems to work well. Further improvements
will improve the interaction with syscons(4), port Linux nforce driver
and future versions of the xbox.
This supports the 64MB and 128MB boxes. You'll need the most recent
CVS version of Cromwell (the Linux BIOS for the XBOX) to boot.
Rink will be maintaining this port, and is interested in feedback.
He's setup a website http://xbox-bsd.nl to report the latest
developments.
Any silly mistakes are my fault.
Submitted by: Rink P.W. Springer rink at stack dot nl and
Ed Schouten ed at fxq dot nl
to user-space if a parameter named "errmsg" is passed into the iovec.
Used in conjunction with vfs_mount_error(), more useful error messages
than errno can be passed back to userspace when mounting a filesystem
fails.
Discussed with: phk, pjd
have started aio, instead, initialize aio management structure
if it hasn't been done, the reason to adjust this behavior is
to make it a bit friendly for threaded program, consider two
threads, one submits aio_write, and another just calls
aio_waitcomplete to wait any I/O to be completed and recycle the
aio requests, before submitter doing any I/O, the recycler wants
to wait in kernel. This also fixes inconsistency with other aio
syscalls.
This hasn't been true on i386 for at least a decade, probably longer, but
I'm too lazy to look up the exact year that PAE support was introduced.
Thus, this driver doesn't work on PAE.
X-MFC After: now
softc lists and associated mutex are now unused so these have been removed.
Calling if_clone_detach() will now destroy all the cloned interfaces for the
driver and in most cases is all thats needed to unload.
Idea by: brooks
Reviewed by: brooks
- Use curthread for calls to knlist_delete() and add a big comment
explaining why as well as appropriate assertions.
- Use TAILQ_FOREACH and TAILQ_FOREACH_SAFE instead of handrolling them.
- Use fget() family of functions to lookup file objects instead of
grovelling around in file descriptor tables.
- Destroy the aio_freeproc mutex if we are unloaded.
Tested on: i386
-Change unconditional aquisition of Giant to only pickup Giant if the vnode
for the controlling tty resides on a non-mpsafe file system.
-Pickup Giant around executable vnode reference counting operations only if
the executable resides on a non-mpsafe file system.
-If this process is being traced, pickup Giant for trace file reference count
operations only if it resides on a non-mpsafe file system.
Discussed with: jhb
Tested by: kris
the RocketPort unit number in the name of the devices. This means that
unit 0 device names will change from ttyR0 .. ttyRf to ttyR00 .. ttyR0f.
Reviewed by: phk
retransmitted without suppression, while there is demand for
such ARP entry. As before, retransmission is rate limited to
one packet per second. Details:
- Remove net.link.ether.inet.host_down_time
- Do not set/clear RTF_REJECT flag on route, to
avoid rt_check() returning error. We will generate error
ourselves.
- Return EWOULDBLOCK on first arp_maxtries failed
requests , and return EHOSTDOWN/EHOSTUNREACH
on further requests.
- Retransmit ARP request always, independently from return
code. Ratelimit to 1 pps.
For each child process whose status has been changed, a SIGCHLD instance
is queued, if the signal is stilling pending, and process changed status
several times, signal information is updated to reflect latest process
status. If wait() returns because the status of a child process is
available, pending SIGCHLD signal associated with the child process is
discarded. Any other pending SIGCHLD signals remain pending.
The signal information is allocated at the same time when proc structure
is allocated, if process signal queue is fully filled or there is a memory
shortage, it can still send the signal to process.
There is a booting time tunable kern.sigqueue.queue_sigchild which
can control the behavior, setting it to zero disables the SIGCHLD queueing
feature, the tunable will be removed if the function is proved that it is
stable enough.
Tested on: i386 (SMP and UP)
the interface. This allows run-time selection of MMU code, based
on CPU-type detection, or tunable-overrides when testing new code.
Pre-requisite for G5 support.
conf/files.powerpc
- remove pmap.c
- add mmu_if.h, mmu_oea.c, pmap_dispatch.c
powerpc/include/mmuvar.h
- definitions for MMU implementations
powerpc/include/pmap.h
- remove pmap_pte_spill declaration
- add pmap_mmu_install declaration
- size the phys_avail array
- pmap_bootstrapped is now global-scope
powerpc/powerpc/machdep.c
- call kobj_machdep_init early in the boot sequence to allow
kobj usage prior to SI_SUB_LOCK
- install the OEA pmap code. This will be moved to CPU-specific
init code in the future.
powerpc/powerpc/mmu_if.m
- Kobj MMU interface definitions
powerpc/powerpc/pmap_dispatch.c
- central dispatch for pmap calls
- contains the global mmu kobj and the routine to locate the
the mmu implementation and init the kobj
by the zero-copy sockets method, and written to before the transmission
completes, we need to destroy all of the existing mappings to the page,
not just the one that we fault on. Otherwise, the mappings will no longer
be to the same page and changes made through one of the mappings will not
be visible through the others.
Observed by: tegge
acpi_resource change was a minor nit offered as an early candidate for
the recent ACPICA import problem and the acpi.c change is one I need to
test still that makes the ordered probing of system devices actually work
as advertised (probe devices in order based on the type of device rather
than in the order we encounter them in the device tree).
entry that is not zero, assume that it is really a hard-wired IRQ (commonly
used for APIC routing) and not a source index. In practice, we've only
ever seen source indices of 0 for legitimate non-hard-wired _PRT entries.
Reviewed by: njl
Tested by: Alex Lyashkov shadow at psoft dot net
MFC after: 2 weeks
After a number of tests using nop's to change the alignment, it was
confirmed that the mtibat instructions should be cache-aligned.
FreeScale app note AN2540 indicates that the isync before and after
the mtdbat is the right thing to do, but sync/isync isn't required
before the mtibat so it has been removed.
Fix by using a ".balign 32" to pull the code in question to the correct
alignment.
MFC after: 3 days
Intel's web site requires some minor tweaks to get it to work:
- The driver seems to have been released with full WMI tracing enabled,
and makes references to some WMI APIs, namely IoWMIRegistrationControl(),
WmiQueryTraceInformation() and WmiTraceMessage(). Only the first
one is ever called (during intialization). These have been implemented
as do-nothing stubs for now. Also added a definition for STATUS_NOT_FOUND
to ntoskrnl_var.h, which is used as a return code for one of the WMI
routines.
- The driver references KeRaiseIrqlToDpcLevel() and KeLowerIrql()
(the latter as a function, which is unusual because normally
KeLowerIrql() is a macro in the Windows DDK that calls KfLowewIrql()).
I'm not sure why these are being called since they're not really
part of WDM. Presumeably they're being used for backwards
compatibility with old versions of Windows. These have been
implemented in subr_hal.c. (Note that they're _stdcall routines
instead of _fastcall.)
- When querying the OID_802_11_BSSID_LIST OID to get a BSSID list,
you don't know ahead of time how many networks the NIC has found
during scanning, so you're allowed to pass 0 as the list length.
This should cause the driver to return an 'insufficient resources'
error and set the length to indicate how many bytes are actually
needed. However for some reason, the Intel driver does not honor
this convention: if you give it a length of 0, it returns some
other error and doesn't tell you how much space is really needed.
To get around this, if using a length of 0 yields anything besides
the expected error case, we arbitrarily assume a length of 64K.
This is similar to the hack that wpa_supplicant uses when doing
a BSSID list query.
Move what can be moved (UMA zones creation, pv_entry_* initialization) from
pmap_init2() to pmap_init().
Create a new function, pmap_postinit(), called from cpu_startup(), to do the
L1 tables allocation.
pmap_init2() is now empty for arm as well.
sio(4) will claim it. This change therefore only affects how ports
are handled when they are not claimed by sio(4), and in principle
will improve hardware support.
MFC after: 2 months
from there. All others get broken up and free'd individually to the mbuf
and cluster zones.
The packet zone is a secondary zone to the mbuf zone. There is currently
a limitation in UMA which prevents decreasing the packet zone stock when
the mbuf and cluster zone are drained and all their members are part of
packets. When this is fixed this change may be reverted.
- Fix a typo in rsmisc.c and a style change for consistency.
This patch will also appear in future ACPI-CA release.
Submitted by: Robert Moore <robert dot moore at intel dot com>
Tested by: ru
descriptor. This should fix the "memory modified after free" panics. This
patch will appear in a future acpi-ca distribution.
Submitted by: Robert Moore <robert.moore / intel.com>
Tested by: Peter Holm
Previously, pvzone's initialization was split between pmap_init() and
pmap_init2(). This split initialization was the underlying cause of
some UMA panics during initialization. Specifically, if the UMA boot
pages was exhausted before the pvzone was fully initialized, then UMA,
through no fault of its own, would use an inappropriate back-end
allocator leading to a panic. (Previously, as a workaround, we have
increased the UMA boot pages.) Fortunately, there is no longer any
reason that pvzone's initialization cannot be completed in
pmap_init().
Eliminate a check for whether pv_entry_high_water has been initialized
or not from get_pv_entry(). Since pvzone's initialization is
completed in pmap_init(), this check is no longer needed.
Use cnt.v_page_count, the actual count of available physical pages,
instead of vm_page_array_size to compute the maximum number of pv
entries.
Introduce the vm.pmap.pv_entries tunable on alpha and ia64.
Eliminate some unnecessary white space.
Discussed with: tegge (item #1)
Tested by: marcel (ia64)
a synchronous reprogramming of hardware MAC filters if the physical
interface are up and running. Previously, MAC filters would be
reconfigured only when the fec interface was brought up.
- Disallow bundle reconfiguration when virtual
interface is running; otherwise, removing a
port from a running configuration will cause
a panic in the start() method on the next packet
on an assumption that a bundle has an even
number of ports (2 or 4).
- Disallow bringing of virtual interface to a
running state when a bundle size is 0; otherwise,
adding and then removing the port will similarly
cause a panic.
- Add missing initialization of fec_ifstat when
adding a new port and fix media status reporting
when virtual interface isn't yet up (check for
fec_status of 1 rather than != 0).
previously, ifp->if_type was set to IFT_ETHER by
ether_ifattach(), now it's done by if_alloc() so
an assignment of if_type to IFT_PROPVIRTUAL after
if_alloc() but before ether_ifattach() broke it.
This makes arp(8) and friends happy about the fec
interfaces, and will allow us to use if_setlladdr()
on the fec interface.
- Set/reset IFF_DRV_RUNNING/IFF_DRV_OACTIVE in init()
and stop() methods rather than in ioctl(), like the
rest of the drivers do. This fixes a bug when an
"ifconfig fec0 ipv4_address" would not have made
the interface running, didn't launch the ticker
function to track media status of bundled ports,
etc.
used in the base system. This has been much discussed in the past
(typically people giving me a hard time for it). Since all that was
added to config was nocpu, and since we don't use it, we don't need to
bump the version.
current context in the IPI_STOP handler so that we can get accurate stack
traces of threads on other CPUs on these two archs like we do now on i386
and amd64.
Tested on: alpha, sparc64
buffers *and* there are no buffers queued up for writing. The bug
was that NMODIFIED was being cleared even while there were buffers
scheduled to be written out, which leads to all sorts of interesting
bugs - one where the file could shrink (because of a post-op getattr
load, say) causing data in buffer(s) queued for write to be tossed,
resulting in data corruption.
Submitted by: Mohan Srinivasan
both proc pointer and thread pointer, if thread pointer is NULL,
tdsignal automatically finds a thread, otherwise it sends signal
to given thread.
Add utility function psignal_event to send a realtime sigevent
to a process according to the delivery requirement specified in
struct sigevent.
source is first enabled similar to how intr_event's now allocate ithreads
on-demand. Previously, we would map IDT vectors 1:1 to IRQs. Since we
only have 191 available IDT vectors for I/O interrupts, this limited us
to only supporting IRQs 0-190 corresponding to the first 190 I/O APIC
intpins. On many machines, however, each PCI-X bus has its own APIC even
though it only has 1 or 2 devices, thus, we were reserving between 24 and
32 IRQs just for 1 or 2 devices and thus 24 or 32 IDT vectors. With this
change, a machine with 100 IRQs but only 5 in use will only use up 5 IDT
vectors. Also, this change provides an API (apic_alloc_vector() and
apic_free_vector()) that will allow a future MSI interrupt source driver to
request IDT vectors for use by MSI interrupts on x86 machines.
Tested on: amd64, i386
for code to start out on one CPU when thunking into Windows
mode in ctxsw_utow(), and then be pre-empted and migrated to another
CPU before thunking back to UNIX mode in ctxsw_wtou(). This is
bad, because then we can end up looking at the wrong 'thread environment
block' when trying to come back to UNIX mode. To avoid this, we now
pin ourselves to the current CPU when thunking into Windows code.
Few other cleanups, since I'm here:
- Get rid of the ndis_isr(), ndis_enable_interrupt() and
ndis_disable_interrupt() wrappers from kern_ndis.c and just invoke
the miniport's methods directly in the interrupt handling routines
in subr_ndis.c. We may as well lose the function call overhead,
since we don't need to export these things outside of ndis.ko
now anyway.
- Remove call to ndis_enable_interrupt() from ndis_init() in if_ndis.c.
We don't need to do it there anyway (the miniport init routine handles
it, if needed).
- Fix the logic in NdisWriteErrorLogEntry() a little.
- Change some NDIS_STATUS_xxx codes in subr_ntoskrnl.c into STATUS_xxx
codes.
- Handle kthread_create() failure correctly in PsCreateSystemThread().
based jumbo 9k and jumbo 16k cluster support.
All mbuf's with external storage attached are mandatory reference
counted. For clusters and jumbo clusters UMA provides the refcnt
storage directly. It does not have to be separatly allocated. Any
other type of external storage gets its own refcnt allocated from
an UMA mbuf refcnt zone instead of normal kernel malloc.
The refcount API MEXT_ADD_REF() and MEXT_REM_REF() is no longer
publically accessible. The proper m_* functions have to be used.
mb_ctor_clust() and mb_dtor_clust() both handle normal 2K as well
as 9k and 16k clusters.
Clusters and jumbo clusters may be obtained without attaching it
immideatly to an mbuf. This is for high performance cluster
allocation in network drivers where mbufs are attached after the
cluster has been filled.
Tested by: rwatson
Sponsored by: TCP/IP Optimizations Fundraise 2005
destruction:
- Backout 1.62, since it doesn't fix all possible
problems.
- Upon node creation, put an additional reference on node.
- Add a mutex and refcounter to struct ngsock. Netgraph node,
control socket and data socket all count as references.
- Introduce ng_socket_free_priv() which removes one reference
from ngsock, and frees it when all references has gone.
- No direct pointers between pcbs and node, all pointing
is done via struct ngsock and protected with mutex.
- Introduce ng_topo_mtx, a mutex to protect topology changes.
- In ng_destroy_node() protect with ng_topo_mtx the process
of checking and pointing at ng_deadnode. [1]
- In ng_con_part2() check that our peer is not a ng_deadnode,
and protect the check with ng_topo_mtx.
- Add KASSERTs to ng_acquire_read/write, to make more
understandible synopsis in case if called on ng_deadnode.
Reported by: Roselyn Lee [1]
- Introduce a new flags NGQF_QREADER and NGQF_QWRITER,
which tell how the item should be actually applied,
overriding NGQF_READER/NGQF_WRITER flags.
- Do not differ between pending reader or writer. Use only
one flag that is raised, when there are pending items.
- Schedule netgraph ISR in ng_queue_rw(), so that callers
do not need to do this job.
- Fix several comments.
Submitted by: julian
Having an additional MT_HEADER mbuf type is superfluous and redundant
as nothing depends on it. It only adds a layer of confusion. The
distinction between header mbuf's and data mbuf's is solely done
through the m->m_flags M_PKTHDR flag.
Non-native code is not changed in this commit. For compatibility
MT_HEADER is mapped to MT_DATA.
Sponsored by: TCP/IP Optimization Fundraise 2005
ktr_tid as part of gathering of ktr header data for new ktrace
records. The continued use of intptr_t is required for file layout
reasons, and cannot be changed to lwpid_t at this point.
MFC after: 1 month
Reviewed by: davidxu
intptr_t. The buffer length needs to be written to disk as part
of the trace log, but the kernel pointer for the buffer does not.
Add a new ktr_buffer pointer to the kernel-only ktrace request
structure to hold that pointer. This frees up an integer in the
ktrace record format that can be used to hold the threadid,
although older ktrace files will have a garbage ktr_buffer field
(or more accurately, a kernel pointer value).
MFC after: 2 weeks
Space requested by: davidxu
If a copy-on-write fault occurs on the page, the new copy should inherit
a part of the original page's wire count.
Submitted by: tegge
MFC after: 1 week
fails, reclaim a pv entry by destroying a mapping to an inactive
page.
Change the format strings in many of the assertions that were recently
converted from PMAP_DIAGNOSTIC printf()s so that they are compatible
with PAE. Avoid unnecessary differences between the amd64 and i386
format strings.
since the link takes a bit to negotiate, the information is pretty
much never available during the probe. As such, the boot output
pretty much always prints N/A for speed and duplex. Since we print
out the output of ifconfig during the user space boot, this early
boot information is also generally redundant, and added to the noise.
MFC after: 2 weeks
before dereferencing it. Certain corrupt kernel modules might not have
a valid hash table, and would cause a kernel panic when they were loaded.
Instead of panic'ing, the kernel now prints out a warning that it is
missing the symbol hash table.
Tested by: Benjamin Close Benjamin dot Close at clearchain dot com
MFC after: 1 week
PCI-ISA bridge. Thus, when viapm0 or viapropm0 attaches, isab0 dosen't
attach so there is no isa0 bus hung off of that bridge. In the non-ACPI
case, legacy0 will add an isa0 anyway as a fail-safe, but ACPI assumes that
any ISA bus will be enumerated via a bridge. To fix this, call
isab_attach() to attach an isa0 ISA child bus device if the pm or propm
device we are probing is a PCI-ISA bridge. Both drivers now have to
implement the bus_if interface via the generic methods for resource
allocation, etc. to work. Also, we now add 2 new ISA bus drivers that
attach to viapm and viapropm devices.
PR: kern/87363
Reported by: Oliver Fromme olli at secnetix dot de
Tested by: glebius
MFC after: 1 week
- Prefer '_' to ' ', as it results in more easily parsed results in
memory monitoring tools such as vmstat.
- Remove punctuation that is incompatible with using memory type names
as file names, such as '/' characters.
- Disambiguate some collisions by adding subsystem prefixes to some
memory types.
- Generally prefer lower case to upper case.
- If the same type is defined in multiple architecture directories,
attempt to use the same name in additional cases.
Not all instances were caught in this change, so more work is required to
finish this conversion. Similar changes are required for UMA zone names.
in bytes to start off with. This caused the GPT geom sniffer to attempt
a seek just back from the end of the 'disk', which resulted in a > 4G
seek, causing gdb psim to exit since it only supports 32-bit seeks.
The size of the disk should really be specified in the psim device tree,
but for now do the minimal amount of work to get psim to run again.
aac_alloc_sync_fib(). aac_alloc_sync_fib() will assert that the I/O locks
are held. This fixes a panic on system boot up when the aac(4) device's
bus_generic_attach() routine is called.
Reviewed by: scottl
OpenFirmware. FreeBSD/ppc uses SPRG0 as the per-cpu data area pointer,
and SPRG1-3 as temporary registers during exception handling. There
have been a few instances where OpenFirmware does require these to
be part of it's context, such as cd-booting an eMac.
reported by: many
MFC after: 3 days
following the protocol pru_listen() call to solisten_proto(), so
that it occurs under the socket lock acquisition that also sets
SO_ACCEPTCONN. This requires passing the new backlog parameter
to the protocol, which also allows the protocol to be aware of
changes in queue limit should it wish to do something about the
new queue limit. This continues a move towards the socket layer
acting as a library for the protocol.
Bump __FreeBSD_version due to a change in the in-kernel protocol
interface. This change has been tested with IPv4 and UNIX domain
sockets, but not other protocols.
that caused a premature exit after calling a fast interrupt handler
and bypassing a much needed critical_exit() and the scheduling of
the interrupt thread for non-fast handlers. In short: unbreak :-)
- Return EINVAL if play_format or rec_format is set but the corresponding
sample rate is 0.
- Don't try to set the playback or recording format to 0. Previously,
issuing an AIOSFMT ioctl with an all-zeroes snd_chan_param would
trigger a KASSERT in chn_fmtchain(); I'm unsure about the effects on
a kernel without INVARIANTS. After this commit, issuing AIOSFMT with
an all-zeroes snd_chan_param is equivalent to issuing AIOGFMT.
MFC after: 2 weeks
trying to access user-space stack addresses when a user fault
is encountered, as occurs when GEOM KTR code is handling a page fault
and is using stack_save() to capture a trace for debug purposes.
It may be possible to walk beyond the trap-frame if it is a kernel fault,
as db_backtrace() does, but I don't think that complexity is needed in
this routine.
MFC after: 3 days
convert to or from timeval frequently.
Introduce function itimer_accept() to ack a timer signal in signal
acceptance code, this allows us to return more fresh overrun counter
than at signal generating time. while POSIX says:
"the value returned by timer_getoverrun() shall apply to the most
recent expiration signal delivery or acceptance for the timer,.."
I prefer returning it at acceptance time.
Introduce SIGEV_THREAD_ID notification mode, it is used by thread
libary to request kernel to deliver signal to a specified thread,
and in turn, the thread library may use the mechanism to implement
SIGEV_THREAD which is required by POSIX.
Timer signal is managed by timer code, so it can not fail even if
signal queue is full filled by sigqueue syscall.
actual resource values we received from the system rather than the range
we requested. Since we request a range starting at 0, we would record
that number. Later, since this == 0, we'd allocate again. However,
we wouldn't write the new resource into the BAR. This resulted in
a resource leak as well as a BAR that couldn't access the resource at
all since rman_get_start, et al, were wrong.
MFC After: 1 week (assuming RELENG_6 is open for business)
the ifp, so you can't call it before doing if_alloc(). Also, there's
really no need to call it here anyway: the code I originally ported from
OpenBSD incorrectly set the station address only once at device attach
time, instead of setting in txp_init(). This meant you couldn't change
the address with ifconfig txp0 ether xx:xx:xx:xx:xx:xx. I added the
call to txp_set_filter() in txp_init() to correct this, but forgot to
remove the call from txp_attach(). Until now, it never mattered.
With this fix, the txp driver tests good:
txp0: <3Com 3cR990-TX-97 Etherlink with 3XP Processor> port 0xb800-0xb87f mem 0xe6800000-0xe683ffff irq 12 at device 10.0 on pci0
txp0: Ethernet address: 00:01:03:d4:91:4f
and channel to ifconfig. Also use the SSID and channel info from
the association info that we already have instead of using ndis_get_info()
to ask the driver for it again.
memory for request.
I was sure graid3 should handle such situations well, but green@ reported
it is not and we want to fix it before 6.0.
Submitted by: green
kern/87959 cracauer ext2fs: no cp(1) possible, mmap returns EINVAL
ext2fs was missing vnode_create_vobject.
(Reisefs probably has the same problem but I want to get this in quick
for 6-release)
drivers I started quite some time before.
Retire the old i386-only pcf driver, and activate the new general
driver that has been sitting in the tree already for quite some
time.
Build the i2c modules for sparc64 architectures as well (where I've
been developing all this on).
o Fix typo in comment
o s/-100/BUS_PROBE_GENERIC/
o s/err/error/ for consistency
o Remove non-applicable comment
o Allow uart_bus_probe() to return the predefined BUS_PROBE_*
contants. In this case: explicitly test for error > 0.
With this change, the driver tests good (at least on i386):
wb0: <Winbond W89C840F 10/100BaseTX> port 0xb800-0xb87f mem 0xe6800000-0xe680007f irq 12 at device 10.0 on pci0
miibus1: <MII bus> on wb0
amphy0: <Am79C873 10/100 media interface> on miibus1
amphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
wb0: Ethernet address: 00:00:e8:18:2a:02
wb0: link state changed to DOWN
wb0: link state changed to UP
- Add locked variants of init() and start().
- Use callout_*() to manage callout.
- Test IFF_DRV_RUNNING rather than IFF_UP in wb_intr() to see if we are
still active when an interrupt comes in.
I couldn't find any of these cards anywhere to test on myself, and google
turns up references to FreeBSD and OpenBSD manpages for this driver when
trying to locate a card that way. I'm not sure anyone actually uses these
cards with FreeBSD.
Tested by: NO ONE (despite repeated requests)
I had to initialize the ifnet a bit earlier in attach so that the
if_printf()'s in vr_reset() didn't explode with a page fault.
- Use M_ZERO with contigmalloc() rather than an explicit bzero.
even initialized it, but it never used it.
- Use callout_*() to manage the callout.
- Use m_devget() to copy data out of the rx buffers rather than doing it
all by hand.
- Use m_getcl() to allocate mbuf clusters rather than doing it all by hand.
- Don't free the software descriptor for a rx ring entry if we can't
allocate an mbuf cluster for it. We left a dangling pointer and never
reallocated the entry anyway. OpenBSD's code (from which this was
derived) has the same bug.
Tested by: NO ONE (despite repeated requests)
Reviewed by: wpaul (5)