A [hopefully] conforming style(9) revamp of mb_alloc and related code.
(This was possible due to bde's remarkable patience.)
Submitted by: (in large part) bde
Reviewed by: (the other part) bde
see people trip over it. Do not set the FIFO trigger to just before it
would otherwise overflow. Give it a little more slop so characters aren't
lost if the interrupt is delayed by other system activities.
MFC maybe: 7 days
- Temporarily fix a bug in the Intel ACPI CA core code.
- Add OS layer ACPI mutex support. This can be disabled by
specifying option ACPI_NO_SEMAPHORES.
- Add ACPI threading support. We now have a dedicated taskqueue for
ACPI tasks, and more ACPI task threads can be created by specifying option
ACPI_MAX_THREADS.
- Change acpi_EvaluateIntoBuffer() behavior slightly to reuse given
caller's buffer unless AE_BUFFER_OVERFLOW occurs. Also CM battery's
evaluations were changed to use acpi_EvaluateIntoBuffer().
- Add new utility function acpi_ConvertBufferToInteger().
- Add simple locking for CM battery and temperature updating.
- Fix a minor problem on EC locking.
- Make the thermal zone polling rate changeable.
- Change minor things in AcpiOsSignal(); in the ACPI_SIGNAL_FATAL case,
entering the debugger makes the problem easier to investigate than a panic.
and a generic resource_list_print_type() function to print all resources
of a certain type in a resource list.
Use ulmin()/ulmax() instead of min()/max() in two places to handle
u_longs correctly.
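Why the choice of helper matters can be sketched in plain C. The functions below are illustrative stand-ins with a sketch_ prefix, not the kernel's own definitions. The assumption is that the int-sized helper takes unsigned int arguments, so a u_long wider than an int is truncated before the comparison is made.

```c
#include <assert.h>
#include <limits.h>

/* Illustrative stand-ins (hypothetical names) for int-sized and long-sized helpers. */
static unsigned int
sketch_min(unsigned int a, unsigned int b)
{
	return (a < b ? a : b);
}

static unsigned long
sketch_ulmin(unsigned long a, unsigned long b)
{
	return (a < b ? a : b);
}

static unsigned long
sketch_ulmax(unsigned long a, unsigned long b)
{
	return (a > b ? a : b);
}

/* A value with its top bit set; it exceeds UINT_MAX wherever long is wider than int. */
static unsigned long
sketch_big(void)
{
	return (ULONG_MAX / 2 + 1 + 0x10UL);
}

/* What a caller gets when u_longs are squeezed through the unsigned int helper. */
static unsigned long
sketch_min_truncating(unsigned long a, unsigned long b)
{
	return (sketch_min((unsigned int)a, (unsigned int)b));
}
```

On a platform where long is 64 bits and int is 32 bits, sketch_min_truncating(sketch_big(), 0x1000) compares only the low 32 bits and picks the wrong operand, while sketch_ulmin() returns 0x1000 as intended.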
argument specifying the boundary for the resource allocation.
Use ulmin()/ulmax() instead of min()/max() in some places to correctly
deal with the u_long resource range specifications.
code only passed up the connection to the tcp stack when it was complete,
so it went directly into the so_comp (complete) queue. However, with
accept filters, there is an additional phase before calling it "complete".
Reviewed by: jlemon
and its associated state variables: icu_lock with the name "icu". This
renames the imen_mtx for x86 SMP, but also uses the lock to protect
access to the 8259 PIC on x86 UP. This also adds an appropriate lock to
the various Alpha chipsets which fixes problems with Alpha SMP machines
dropping interrupts with an SMP kernel.
against VM_WAIT in the pageout code. Both fixes involve adjusting
the lockmgr's timeout capability so locks obtained with timeouts do not
interfere with locks obtained without a timeout.
Hopefully MFC: before the 4.5 release
for this file, but here goes nothing. This was my first attempt at
tidying up this file. Unfortunately, it just exposes many more horrors
in the code itself that had been masked by the eyesore that was there
before. I think this just needs to be put out of its misery.
the wrong VOP descriptor. This misuse caused VFS-cached vnodes to be
re-cached, resulting in the leak. This commit is an interim fix until DES
has a chance to rework the code involved.
to Phil Kernick:
"The problem is that in full duplex mode, the Conexant chip always reports a
carrier lost error, even when the frame is successfully sent. So, if we
have a Conexant chip, then ignore carrier lost when in full duplex
mode."
Since the Xircom chips seem to have the same issue and since we already
have a workaround for this, just expand the workaround test to also
check for DC_IS_CONEXANT().
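The shape of the widened workaround test can be sketched as follows. This is not the driver's code: the chip enum, the status bit value, and the function name are hypothetical, and only the idea (ignore carrier lost in full duplex mode on the affected chips) comes from the text above.

```c
#include <assert.h>

/* Hypothetical chip identifiers and status bit, for illustration only. */
enum sketch_chip { SKETCH_CHIP_OTHER, SKETCH_CHIP_XIRCOM, SKETCH_CHIP_CONEXANT };
#define SKETCH_TXSTAT_CARRLOST	0x0800u

/*
 * Return the transmit status with the carrier lost bit masked off when
 * the chip is known to report it spuriously in full duplex mode.
 */
static unsigned int
sketch_filter_txstat(enum sketch_chip chip, int full_duplex,
    unsigned int txstat)
{
	int spurious = (chip == SKETCH_CHIP_XIRCOM ||
	    chip == SKETCH_CHIP_CONEXANT);

	if (spurious && full_duplex)
		txstat &= ~SKETCH_TXSTAT_CARRLOST;
	return (txstat);
}
```

In half duplex mode, or on any other chip, the status word passes through unchanged, so a genuine carrier loss is still reported.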
to a label is inside an #ifdef block, then the label should *also* be
inside an #ifdef block. Hide the "done:" label which is only used if
DEVICE_POLLING is enabled under #ifdef DEVICE_POLLING.
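A minimal sketch of the pattern, using a hypothetical handler rather than the driver's real one: both the goto and the "done:" label sit under the same #ifdef, so a kernel built without DEVICE_POLLING never sees a label that nothing jumps to (presumably the cause of the compiler complaint).

```c
#include <assert.h>

/* Hypothetical interrupt handler; returns 1 if it did the normal work. */
static int
sketch_intr(int polling_active)
{
	int handled = 0;

#ifdef DEVICE_POLLING
	/* The only jump to "done" lives under the option... */
	if (polling_active)
		goto done;
#else
	(void)polling_active;
#endif
	handled = 1;

#ifdef DEVICE_POLLING
	/* ...so the label is hidden under the same option. */
done:
#endif
	return (handled);
}
```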
must be cleared to prevent machine hanging (presently afflicts -current
and -stable).
Problem reported by Bruce Montague <brucem@cse.iitkgp.ernet.in>
PR: kern/29769 (probably)
if we've been given an RTA_IFP or changed RTA_IFA sockaddr.
This fixes the following bug:
>/dev/tun100
>/dev/tun101
ifconfig tun100 1.2.3.4 5.6.7.8
ifconfig tun101 1.2.3.4 6.7.8.9
route change 6.7.8.9 -ifa 1.2.3.4 -iface -mtu 500
which erroneously changed tun101's host route to have an ifp of tun100
(rt_getifa() sets the ifp after calling ifa_ifwithnet(1.2.3.4))
This incarnation submitted by: ru
An old route will be NULL at that point if a packet were initially
routed to an interface (using the IP_ROUTETOIF flag.)
Submitted by: Igor Timkin <ivt@gamma.ru>
All TCP ISNs that are sent out are valid cookies, which allows entries
in the syncache to be dropped and still have the ACK accepted later.
As all entries pass through the syncache, there is no sudden switchover
from cache -> cookies when the cache is full; instead, syncache entries
simply have a reduced lifetime. More details may be found in the
"Resisting DoS attacks with a SYN cache" paper in the Usenix BSDCon 2002
conference proceedings.
Sponsored by: DARPA, NAI Labs
the shutdown request at reboot/halt time.
Disable the printf 'vnlru process getting nowhere, pausing...' and instead
export the count to the debug.vnlru_nowhere sysctl.
of polling interfaces at the lowest possible priority
(this might result in softnetisr being scheduled, but there is
no risk of livelock because they have a higher priority than
this thread).
otherwise breaks on the Alpha arch. I think this is wrong since I'd
actually like to probe for a PC architecture, not for a particular CPU
type. Anyway, now it's again the way it used to be.
by me to make it more efficient. The original code had serious balancing
problems and could also deadlock easily. This code relegates the vnode
reclamation to its own kproc and relaxes the vnode reclamation requirements
to better maintain kern.maxvnodes. This code still doesn't balance as well
as it could, but it does a much better job than the original code.
Approved by: re@freebsd.org
Obtained from: ps, peter, dillon
MFS Assuming: Assuming no problems crop up in Yahoo testing
MFC after: 7 days
remove the check from addupc_task(). It would need sched_lock while
testing the flag anyways.
- Always read sticks while holding sched_lock using a temporary variable
where needed.
- Always init prticks to 0 in ast() to quiet a warning.
sense, and mode select into their 10 byte equivalents. Eventually the
da(4) driver will become more intelligent about this, or at least allow
umass(4) to pass quirks in directly. However, this is a functional
workaround until a better fix is implemented.
- Use the 6 to 10 conversion function to allow the ATAPI and UFI command
sets to emulate 6 byte commands with 10 byte commands.
- Use the ATAPI command set rather than UFI for the ScanLogic SL11R-IDE
as it supports the SYNCH_CACHE command.
- Enable ATAPI command set support.
- Pass READ/WRITE_12 commands through for UFI support as the UFI spec
says they should be supported.
- Update a comment in the UFI translation function since we handle
MODE_SELECT.
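For reference, the 6 to 10 byte idea can be sketched for READ and WRITE. This is an illustration built from the standard SCSI CDB layouts, not the conversion function added by this commit; the function and macro names are hypothetical, and MODE SENSE and MODE SELECT are left out.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_READ_6	0x08
#define SKETCH_WRITE_6	0x0a
#define SKETCH_READ_10	0x28
#define SKETCH_WRITE_10	0x2a

/*
 * Convert a 6 byte READ/WRITE CDB into its 10 byte form.
 * Returns 0 on success, -1 if this sketch does not handle the opcode.
 */
static int
sketch_rw6_to_rw10(const unsigned char cdb6[6], unsigned char cdb10[10])
{
	memset(cdb10, 0, 10);
	switch (cdb6[0]) {
	case SKETCH_READ_6:
		cdb10[0] = SKETCH_READ_10;
		break;
	case SKETCH_WRITE_6:
		cdb10[0] = SKETCH_WRITE_10;
		break;
	default:
		return (-1);
	}
	/* 21-bit LBA: low 5 bits of byte 1, then bytes 2 and 3. */
	cdb10[3] = cdb6[1] & 0x1f;
	cdb10[4] = cdb6[2];
	cdb10[5] = cdb6[3];
	/* In the 6 byte form a transfer length of 0 means 256 blocks. */
	if (cdb6[4] == 0)
		cdb10[7] = 0x01;
	else
		cdb10[8] = cdb6[4];
	cdb10[9] = cdb6[5];	/* control byte carries over */
	return (0);
}
```

The zero-length case is the detail most easily missed: a 10 byte command with a length of 0 transfers nothing, so the 256-block meaning has to be made explicit during conversion.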
temporary storage. In the old NFS code it wasn't at all clear if
the value of `tl' was used across or after macro calls, but I'm
fairly confident that the convention was to keep its use local.
Each ex-macro function now uses a local version of this variable,
so all of the double-indirection goes away.
The only exception to the `local use' rule for `tl' is nfsm_clget(),
which is left unchanged by this commit.
Reviewed by: peter
- The MD functions critical_enter/exit are renamed to start with a cpu_
prefix.
- MI wrapper functions critical_enter/exit maintain a per-thread nesting
count and a per-thread critical section saved state set when entering
a critical section while at nesting level 0 and restored when exiting
to nesting level 0. This moves the saved state out of spin mutexes so
that interlocking spin mutexes works properly.
- Most low-level MD code that used critical_enter/exit now uses
cpu_critical_enter/exit. MI code such as device drivers and spin
mutexes use the MI wrappers. Note that since the MI wrappers store
the state in the current thread, they do not have any return values or
arguments.
- mtx_intr_enable() is replaced with a constant CRITICAL_FORK which is
assigned to curthread->td_savecrit during fork_exit().
Tested on: i386, alpha
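The MI wrapper behaviour described above can be sketched in userland C. Everything here is a stand-in: the sketch_ names, the td_critnest field name, and the single flag that models the MD interrupt state are assumptions made for illustration; only td_savecrit and the "save at level 0, restore on return to level 0" rule come from the text.

```c
#include <assert.h>

typedef int sketch_critical_t;

/* Models the MD state (1 = interrupts enabled) that the cpu_ functions toggle. */
static sketch_critical_t sketch_intr_enabled = 1;

struct sketch_thread {
	int			td_critnest;	/* per-thread nesting count */
	sketch_critical_t	td_savecrit;	/* state saved at level 0 */
};

/* MD layer: disable and hand back the previous state. */
static sketch_critical_t
sketch_cpu_critical_enter(void)
{
	sketch_critical_t prev = sketch_intr_enabled;

	sketch_intr_enabled = 0;
	return (prev);
}

/* MD layer: restore a previously saved state. */
static void
sketch_cpu_critical_exit(sketch_critical_t saved)
{
	sketch_intr_enabled = saved;
}

/* MI wrapper: only the outermost entry saves the MD state. */
static void
sketch_critical_enter(struct sketch_thread *td)
{
	if (td->td_critnest == 0)
		td->td_savecrit = sketch_cpu_critical_enter();
	td->td_critnest++;
}

/* MI wrapper: only the outermost exit restores the MD state. */
static void
sketch_critical_exit(struct sketch_thread *td)
{
	assert(td->td_critnest > 0);
	td->td_critnest--;
	if (td->td_critnest == 0)
		sketch_cpu_critical_exit(td->td_savecrit);
}
```

Because the saved state lives in the thread rather than in the caller, the wrappers need no return value or argument beyond the thread itself, which the real code obtains from curthread.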
- Axe inlvtlb_ok as it was completely redundant with smp_active.
- Remove references to non-existent variable and non-existent file
in i386/include/smp.h.
- Don't perform initializations local to each CPU while holding the
ap boot lock on i386 while an AP bootstraps itself.
- Reorganize the AP startup code some to unify the latter half of the
functions to bring an AP up. Eventually this might be broken out into
a MI function in subr_smp.c.
a major slowdown, and re-enable stats overflow interrupts.
For future reference, the bug was in our code, and not
some bug in the 3com chips.
Reviewed by: wpaul
MFC after: 2 days
the link rate - some ich motherboards overclock ac97 out of the box.
Will hopefully replace this with a calibration loop in time for 4.5R
freeze.
Problem reported by Luigi Rizzo and fix derived from his code (put
diff in ich.c rather than ac97.c).
MFC after: 3 days
superblock that is already set up to handle pointer types. This
fixes an accidental change in the superblock size on 64-bit platforms
caused by revision 1.24.
The description field is unused in -stable, so the MFC there is equivalent
to a comment. It can be done at any time; I am just setting a reminder
in 45 days when hopefully we are past 4.5-release.
MFC after: 45 days
variables. Use the -d flag in sysctl(8) to see this information.
Possible extensions to sysctl:
+ report variables that do not have a description
+ given a name, report the oid it maps to.
Note to developers: have a look at your code, there are a number of
variables which do not have a description.
Note to developers: do we want this in 4.5 ? It is a very small change
and very useful for documentation purposes.
Suggested by: Orion Hodson
. The main device node now supports automatic density selection for
commonly used media densities. So you can stuff your 1.44 MB and
720 KB media into your drive and just access /dev/fd0, no questions
asked. It's all that easy, isn't it? :)
. Device density handling has been completely overhauled. The old way
of hardwired kernel density knowledge is no longer there. Instead,
the kernel now implements 16 subdevices per drive. The first
subdevice uses automatic density selection, while the remaining 15
devices are freely programmable. They can be assigned an arbitrary
name of the form /dev/fd[:digit:]+.[:digit:]{1,4}, where the second
number is meant to either implement device names that are mnemonic
for their raw capacity (as it used to be), or they can alternatively
be created as "anonymous" devices like fd0.1 through fd0.15,
depending on the taste of the administrator. After creating a
subdevice, it is initialized to the maximal native density of the
respective drive type, so it needs to be customized for other
densities by using fdcontrol(8). Pseudo-partition devices (fd0a
through fd0h) are still supported as symlinks.
. The old hack to use flags 0x1 to always assume drive 0 was there is
no longer supported; this is now supposed to be done by wiring the
devices down from the loader via device flags. On IA32
architectures, the first two drives are looked up in the CMOS
configuration records though. On PCMCIA (i.e., the Y-E Data
controller of the Toshiba Libretto), a single drive is always
assumed.
. Other specialities like disabling the FIFO and not probing the drive
at boot-time are selected by per-controller or per-drive flags, too.
. Unit attentions (media has been changed) are supposed to be detected
now; density autoselection only occurs after a unit attention. (Can
be turned off by a per-drive flag; this will cause each Fdopen() to
perform the autoselection.)
. FM floppies can be handled now (on controllers that actually support
it -- not all do these days).
. Fdopen() can be told to avoid density selection by setting
O_NONBLOCK; this leaves the descriptor in a half-opened state where
only a few ioctls are accepted. This is necessary to run fdformat
on a device that uses automatic density selection (since you cannot
autoselect on an unformatted medium, obviously).
. Just differentiate between a plain old NE765 and the enhanced chips,
but don't try more; the existing code was wrong and only misdetected
the chips anyway.
BUGS and TODOs:
. All documentation update still needs to be done.
. Formatting not-so-standard format yields unpredictable results; I
have yet to figure out why this happens. "Standard" formats like
720 and 1440 KB do work, however.
. rc scripts are needed to setup device nodes with nonstandard
densities (like the old /dev/fdN.MMM we used to have).
. Obtaining device flags from the kernel environment doesn't work yet,
thus currently only drives that are present in (IA32) CMOS are
really detected. Someone who knows the odds and ends about device
flags is needed here, I can't figure out what I'm doing wrong.
. 2.88 MB still needs to be done.
of mi_switch:
- Set the oncpu value for the current thread.
- Always set switchticks, not just in the SMP case.
- Add a KTR entry for fork_exit that is the same as the "new proc"
entry in mi_switch().
- Release sched_lock a bit later like we do with mi_switch().
select/poll, and therefore with pthreads. I doubt there is any way
to make this 100% semantically identical to the way it behaves in
unthreaded programs with blocking reads, but the solution here
should do the right thing for all reasonable usage patterns.
The basic idea is to schedule a callout for the read timeout when a
select/poll is done. When the callout fires, it ends the select if
it is still in progress, or marks the state as "timed out" if the
select has already ended for some other reason. Additional logic in
bpfread then does the right thing in the case where the timeout has
fired.
Note, I co-opted the bd_state member of the bpf_d structure. It has
been present in the structure since the initial import of 4.4-lite,
but as far as I can tell it has never been used.
PR: kern/22063 and bin/31649
MFC after: 3 days
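The basic idea can be sketched as a small state machine. This is a simplified model and not the bpf code: the state names, the select_active and wakeups fields, and all function names are hypothetical, and setting the "timed out" state even when the callout ends a select in progress is an assumption of this sketch.

```c
#include <assert.h>

/* Hypothetical state names; the text only says bd_state was co-opted. */
enum sketch_bpf_state {
	SKETCH_BPF_IDLE,
	SKETCH_BPF_WAITING,
	SKETCH_BPF_TIMED_OUT
};

struct sketch_bpf_d {
	enum sketch_bpf_state bd_state;
	int	select_active;	/* a select/poll is still sleeping on us */
	int	wakeups;	/* counts simulated select wakeups */
};

/* select/poll path: arm the (simulated) read timeout callout. */
static void
sketch_bpf_poll(struct sketch_bpf_d *d)
{
	d->bd_state = SKETCH_BPF_WAITING;
	d->select_active = 1;
}

/* The select ended for some other reason before the timeout. */
static void
sketch_bpf_select_done(struct sketch_bpf_d *d)
{
	d->select_active = 0;
}

/* Callout handler: end a select in progress and record that we fired. */
static void
sketch_bpf_timed_out(struct sketch_bpf_d *d)
{
	if (d->bd_state != SKETCH_BPF_WAITING)
		return;
	d->bd_state = SKETCH_BPF_TIMED_OUT;
	if (d->select_active) {
		d->select_active = 0;
		d->wakeups++;
	}
}

/* Read path: 1 means return what is buffered instead of sleeping. */
static int
sketch_bpf_read_timed_out(struct sketch_bpf_d *d)
{
	int timed_out = (d->bd_state == SKETCH_BPF_TIMED_OUT);

	d->bd_state = SKETCH_BPF_IDLE;
	return (timed_out);
}
```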
Now that we've increased the size of our send / receive buffers, bursting
an entire window onto the network may cause congestion. As a result,
we will slow start beginning with a flightsize of 4 packets.
Problem reported by: Thomas Zenker <thz@Lennartz-electronic.de>
MFC after: 3 days
Non-SMP, i386-only, no polling in the idle loop at the moment.
To use this code you must compile a kernel with
options DEVICE_POLLING
and at runtime enable polling with
sysctl kern.polling.enable=1
The percentage of CPU reserved to userland can be set with
sysctl kern.polling.user_frac=NN (default is 50)
while the remainder is used by polling device drivers and netisr's.
These are the only two variables that you should need to touch. There
are a few more parameters in kern.polling but the default values
are adequate for all purposes. See the code in kern_poll.c for
more details on them.
Polling in the idle loop will be implemented shortly by introducing
a kernel thread which does the job. Until then, the amount of CPU
dedicated to polling will never exceed (100-user_frac).
The equivalent (actually, better) code for -stable is at
http://info.iet.unipi.it/~luigi/polling/
and also supports polling in the idle loop.
NOTE to Alpha developers:
There is really nothing in this code that is i386-specific.
If you move the 2 lines supporting the new option from
sys/conf/{files,options}.i386 to sys/conf/{files,options} I am
pretty sure that this should work on the Alpha as well, just that
I do not have a suitable test box to try it. If someone feels like
trying it, I would appreciate it.
NOTE to other developers:
sure some things could be done better, and as always I am open to
constructive criticism, which a few of you have already given and
I greatly appreciated.
However, before proposing radical architectural changes, please
take some time to possibly try out this code, or at the very least
read the comments in kern_poll.c, especially re. the reason why I
am using a soft netisr and cannot (I believe) replace it with a
simple timeout.
Quick description of files touched by this commit:
sys/conf/files.i386
new file kern/kern_poll.c
sys/conf/options.i386
new option
sys/i386/i386/trap.c
poll in trap (disabled by default)
sys/kern/kern_clock.c
initialization and hardclock hooks.
sys/kern/kern_intr.c
minor swi_net changes
sys/kern/kern_poll.c
the bulk of the code.
sys/net/if.h
new flag
sys/net/if_var.h
declaration for functions used in device drivers.
sys/net/netisr.h
NETISR_POLL
sys/dev/fxp/if_fxp.c
sys/dev/fxp/if_fxpvar.h
sys/pci/if_dc.c
sys/pci/if_dcreg.h
sys/pci/if_sis.c
sys/pci/if_sisreg.h
device driver modifications
A similar thing has been in -stable for weeks and is completely safe.
This has very good performance implications as it saves some data
copying, and sometimes avoids triggering performance bugs in devices
(such as the "dc" and other Tulip clones) which do not like scattered
data.
for use on machines with untrusted local users, for security as well
as stability reasons.
o Lack of clarity pointed out by: David Rufino <dr@soniq.net> via bugtraq.
When a positively niced process requests a disk I/O, make
it wait for its nice value of ticks before scheduling its
I/O request if there are any other processes with I/O
requests in the disk queue. For all the gory details, see
the ``Running fsck in the Background'' paper in the Usenix
BSDCon 2002 Conference Proceedings, pages 55-64.
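The rule reduces to a one-line decision, sketched here with a hypothetical function name and arguments. How the real code counts queued requests and performs the sleep is not shown and not assumed.

```c
#include <assert.h>

/*
 * Number of ticks a process should wait before its I/O request is
 * scheduled: its nice value if it is positively niced and other
 * requests are queued, otherwise no delay.
 */
static int
sketch_io_delay_ticks(int nice, int others_queued)
{
	if (nice > 0 && others_queued > 0)
		return (nice);
	return (0);
}
```

An idle disk queue therefore costs a niced process nothing; the delay applies only while it competes with other requests.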
commit by Kirk also fixed a softupdates bug that could easily be triggered
by server side NFS.
* An edge case with shared R+W mmap()'s and truncate whereby
the system would inappropriately clear the dirty bits on
still-dirty data. (applicable to all filesystems)
THIS FIX TEMPORARILY DISABLED PENDING FURTHER TESTING.
see vm/vm_page.c line 1641
* The straddle case for VM pages and buffer cache buffers when
truncating. (applicable to NFS client side)
* Possible SMP database corruption due to vm_pager_unmap_page()
not clearing the TLB for the other cpu's. (applicable to NFS
client side but could affect all filesystems). Note: not
considered serious since the corruption occurs beyond the file
EOF.
* When flushing a dirty buffer due to B_CACHE getting cleared,
we were accidentally setting B_CACHE again (that is, bwrite() sets
B_CACHE), when we really want it to stay clear after the write
is complete. This resulted in a corrupt buffer. (applicable
to all filesystems but probably only triggered by NFS)
* We have to call vtruncbuf() when ftruncate()ing to remove
any buffer cache buffers. This is still tentative, I may
be able to remove it due to the second bug fix. (applicable
to NFS client side)
* vnode_pager_setsize() race against nfs_vinvalbuf()... we have
to set n_size before calling nfs_vinvalbuf or the NFS code
may recursively vnode_pager_setsize() to the original value
before the truncate. This is what was causing the user mmap
bus faults in the nfs tester program. (applicable to NFS
client side)
* Fix to softupdates (see ufs/ffs/ffs_inode.c 1.73, commit made
by Kirk).
Testing program written by: Avadis Tevanian, Jr.
Testing program supplied by: jkh / Apple (see Dec2001 posting to freebsd-hackers with Subject 'NFS: How to make FreeBSD fall on its face in one easy step')
MFC after: 1 week
when taking a snapshot. The two time consuming operations are
scanning all the filesystem bitmaps to determine which blocks
are in use and scanning all the other snapshots so as to be able
to expunge their blocks from the view of the current snapshot.
The bitmap scanning is broken into two passes. Before suspending
the filesystem all bitmaps are scanned. After the suspension,
those bitmaps that changed after being scanned the first time
are rescanned. Typically there are few bitmaps that need to be
rescanned. The expunging of other snapshots is now done after
the suspension is released by observing that we can easily
identify any blocks that were allocated to them after the
suspension (they will be marked as `not needing to be copied'
in the just created snapshot). For all the gory details, see
the ``Running fsck in the Background'' paper in the Usenix
BSDCon 2002 Conference Proceedings, pages 55-64.
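The two-pass bitmap scan can be sketched as follows. Detecting "changed since first scan" with a per-bitmap generation counter is an assumption of this sketch, as are all the names; the commit does not say here how the real code tracks changes.

```c
#include <assert.h>

/* Hypothetical per-bitmap bookkeeping. */
struct sketch_bitmap {
	unsigned int	gen;		/* bumped on every modification */
	unsigned int	scanned_gen;	/* generation seen by the last scan */
};

/* Pass 1, before suspension: scan every bitmap. Returns the number scanned. */
static int
sketch_scan_all(struct sketch_bitmap *bm, int n)
{
	int i;

	for (i = 0; i < n; i++)
		bm[i].scanned_gen = bm[i].gen;
	return (n);
}

/* Pass 2, while suspended: rescan only bitmaps that changed since pass 1. */
static int
sketch_rescan_changed(struct sketch_bitmap *bm, int n)
{
	int i, rescanned = 0;

	for (i = 0; i < n; i++) {
		if (bm[i].gen != bm[i].scanned_gen) {
			bm[i].scanned_gen = bm[i].gen;
			rescanned++;
		}
	}
	return (rescanned);
}
```

The time spent with the filesystem suspended is then proportional to the number of bitmaps touched between the two passes rather than to the total number of bitmaps.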
always deriving the credential for a newly accepted connection from
the listen socket. Previously, the selection of the credential
depended on the protocol: UNIX domain sockets would use the
connecting process's credential, and protocols supporting a creation
of the socket before the receiving end called accept() would use
the listening socket. After this change, it is always the listening
credential.
Reviewed by: green
With this change, mounting an smb share (using mount_smb, which is not
yet included in the tree) without any of smbfs, libiconv or libmchain
compiled into the kernel or loaded works.
a KTR log entry. Any KTR requests made while working on an entry are
ignored/discarded to prevent recursion. This is a better fix for the
hack to futz with the CPU mask and call getnanotime() if KTR_LOCK or
KTR_WITNESS was on. It also covers the actual formatting of the log entry
including dumping it to the display which the earlier hacks did not.
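A recursion guard of this kind can be sketched as below. Keeping the "busy" marker in a per-thread structure is an assumption of this sketch, and the names are hypothetical; the text above only establishes that requests made while an entry is being worked on are discarded.

```c
#include <assert.h>

struct sketch_ktr_thread {
	int	in_ktr;		/* set while an entry is being recorded */
};

static int sketch_entries_logged;

/*
 * Record one entry. Returns 1 if it was logged, 0 if it was discarded
 * because we were already in the middle of recording another entry.
 * The nested argument simulates code on the logging path that itself
 * tries to log (for example a lock trace point).
 */
static int
sketch_ktr_log(struct sketch_ktr_thread *td, int nested)
{
	if (td->in_ktr)
		return (0);
	td->in_ktr = 1;
	sketch_entries_logged++;
	if (nested)
		(void)sketch_ktr_log(td, 0);	/* dropped by the guard */
	td->in_ktr = 0;
	return (1);
}
```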
new file end will land in the middle of a file hole. Since the last
block of a file must always be allocated, the hole is filled by
allocating a block at that location. If the hole being filled is
a direct block, then the truncation may eventually reduce the
full sized block down to a fragment. When running with soft
updates, it is necessary to FSYNC the file after allocating the
block and before creating the fragment to avoid triggering a
soft updates inconsistency when the block unexpectedly shrinks.
Found by: Matthew Dillon <dillon@apollo.backplane.com>
MFC after: 1 week
where our security related sysctl tuneables are located. Also, this
will help if/when we move the _security node out from under _kern to help
make _kern less cluttered.
Approved by: rwatson
Review by: rwatson
- The MI portions of struct globaldata have been consolidated into a MI
struct pcpu. The MD per-CPU data are specified via a macro defined in
machine/pcpu.h. A macro was chosen over a struct mdpcpu so that the
interface would be cleaner (PCPU_GET(my_md_field) vs.
PCPU_GET(md.md_my_md_field)).
- All references to globaldata are changed to pcpu instead. In a UP kernel,
this data was stored as global variables which is where the original name
came from. In an SMP world this data is per-CPU and ideally private to each
CPU outside of the context of debuggers. This also included combining
machine/globaldata.h and machine/globals.h into machine/pcpu.h.
- The pointer to the thread using the FPU on i386 was renamed from
npxthread to fpcurthread to be identical with other architectures.
- Make the show pcpu ddb command MI with a MD callout to display MD
fields.
- The globaldata_register() function was renamed to pcpu_init() and now
init's MI fields of a struct pcpu in addition to registering it with
the internal array and list.
- A pcpu_destroy() function was added to remove a struct pcpu from the
internal array and list.
Tested on: alpha, i386
Reviewed by: peter, jake
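The macro approach to MD fields can be sketched like this. The sketch_ and SKETCH_ names, the single static structure standing in for the current CPU's data, and the pc_ member prefix are assumptions made for illustration; my_md_field is the placeholder name used in the text above.

```c
#include <assert.h>

/* What a hypothetical machine/pcpu.h might supply: MD fields as a macro. */
#define SKETCH_PCPU_MD_FIELDS						\
	int	pc_my_md_field;

struct sketch_pcpu {
	int	pc_cpuid;		/* an MI field */
	SKETCH_PCPU_MD_FIELDS		/* MD fields expand in place */
};

/* Single-CPU stand-in for "this CPU's" per-CPU data. */
static struct sketch_pcpu sketch_pcpu0;

/* MI and MD fields are reached the same way, with no md. indirection. */
#define SKETCH_PCPU_GET(member)		(sketch_pcpu0.pc_ ## member)
#define SKETCH_PCPU_SET(member, val)	(sketch_pcpu0.pc_ ## member = (val))
```

Had the MD data been a nested struct instead, every MD access would need the extra md.md_ prefix, which is the less clean interface the commit message refers to.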
a GetAllNext response. Otherwise, we won't unswizzle
it correctly. This was found on linux/PPC.
This mandated creating another inline: isp_get_gan_response.
Introduce an additional device flag for those NICs which require the
transmit buffers to be aligned to 32-bit boundaries.
(The equivalent fix for STABLE is slightly simpler because there are
no supported chips which require this alignment there.)