registers are control bits or depending on the model contain additional
time bits with a different meaning than the lower ones. In order to
only read the desired time bits and not change the upper bits on write
use appropriate masks in the gettime and settime function respectively.
Due to the polarity of the stop oscillator bit and the fact that the
century bits aren't used on sparc64 not masking them didn't cause
problems so far.
- Fix two off-by-one errors in the handling of the day of week. The
genclock code represents the dow as 0 - 6 with 0 being Sunday but the
mk48txx use 1 - 7 with 1 being Sunday. In the settime function when
writing the dow to the clock the range wasn't adjusted accordingly but
the clock apparently played along nicely otherwise the second bug in
the gettime function which mapped 1 - 7 to 0 - 6 but with 0 meaning
Saturday would have been triggered. Fixing these makes the date being
stored in the same format Sun/Solaris uses and cures the "Invalid time
in real time clock. Check and reset the date immediately!" when the
date was set under Solaris prior to booting FreeBSD/sparc64. [1]
Looking at other clock drivers/code e.g. FreeBSD/alpha the former "bug",
i.e. storing the dow as 0 - 6 even when the clock uses 1 - 7, seems to
be common but might be on purpose for compatibility when multi-booting
with other OS which do the same. So it might make sense to add a flag
to handle the dow off-by-one for use of this driver on platforms other
than sparc64.
- Check the state of the battery on mk48txx that support this in the
attach function.
- Add a note that use of the century bit should be implemented but isn't
required at the moment because it isn't used on sparc64.
Problem noted by: joerg [1]
MT5 candidate.
the page table page's wired count rather than its hold count to contain
the reference count. My rationale for this change is based on several
factors:
1. The machine-independent and pmap layers used the same hold count field
in subtly different ways. The machine-independent layer uses the hold
count to implement a form of ephemeral wiring that is used by pipes,
physio, etc. In other words, subsystems where we wish to temporarily
block a page from being swapped out while it is mapped into the kernel's
address space. Such pages are never removed from the page queues.
Instead, the page daemon recognizes a non-zero hold count to mean "hands
off this page." In contrast, page table pages are never in the page
queues; they are wired from birth to death. The hold count was being
used as a kind of reference count, specifically, the number of valid
page table entries within the page. Not surprisingly, these two
different uses imply different synchronization rules: in the machine-
independent layer access to the hold count requires the page queues
lock; whereas in the pmap layer the pmap lock is required. Thus,
continued use by the pmap layer of vm_page_unhold(), which asserts that
the page queues lock is held, made no sense.
2. _pmap_unwire_pte_hold() was too forgiving in its handling of the wired
count. An unexpected wired count on a page table page was ignored and
the underlying page leaked.
3. In a word, microoptimization. Using the wired count exclusively, rather
than a combination of the wired and hold counts, makes the code slightly
smaller and faster.
Reviewed by: tegge@
UMA_ZONE_NOFREE to guarantee type stability, so proc_fini() should
never be called. Move an assertion from proc_fini() to proc_dtor()
and garbage-collect the rest of the unreachable code. I have retained
vm_proc_dispose(), since I consider its disuse a bug.
too much kernel copying, but it is not the right way to do it, and it is
in the way for straightening out the buffer cache.
The right way is to pass the VM page array down through the struct
bio to the disk device driver and DMA directly in to/out off the
physical memory. Once the VM/buf thing is sorted out it is next on
the list.
Retire most of vnode method. ffs_getpages(). It is not clear if what is
left shouldn't be in the default implementation which we now fall back to.
Retire specfs_getpages() as well, as it has no users now.
Completely remove the remaining EFI includes and add our own (type)
definitions instead. While here, abstract more of the internals by
providing interface functions.
Analogous to the drive level, give each volume and plex a worker thread
that picks up and processes incoming and completed BIOs.
This should fix the data corruption issues that have come up a few
weeks ago and improve performance, especially of RAID5 plexes.
The volume level needs a little work, though.
to 4.0 and RELENG_3), the BTX mini-kernel used paging rather than flat
mode and clients were limited to a virtual address space of 16 megabytes.
Because of this limitation, boot2 silently masked all physical addresses
in any binaries it loaded so that they were always loaded into the first
16 Meg. Since BTX no longer has this limitation (and hasn't for a long
time), remove the masking from boot2. This allows boot2 to load kernels
larger than about 12 to 14 meg (12 for non-PAE, 14 for PAE).
Submitted by: Sergey Lyubka devnull at uptsoft dot com
MFC after: 1 month
EFI headers and put them all in <machine/fpu.h>. The Intel EFI headers
conflict with the Intel ACPI headers (duplicate type definitions), so
are being phased out in the kernel.
These normally only manifest if the ndis compat module is statically
compiled into a kernel image by way of 'options NDISAPI'.
Submitted by: Dmitri Nikulin
Approved by: wpaul
PR: kern/71449
MFC after: 1 week
driver. Trim its fingernails by removing some useless bits before
fixing the 'thread not terminated on detach' problem.
o dmacnt is no longer used now that we allocate at attach time. Remove
it from struct fdc_data.
o ISPNP was only ever set, but never tested. It used to be used for the
allocation routines to change how it allocated resources. Since that's
no longer necessary, retire the flag.
o ISPCMICA was only ever tested, but never set. GC it. This removes
a special case in determining the drive type. The drive type is
now set in fdc_pcmcia.c, so the hack isn't needed anymore. Sadly,
this isn't tested with a Y-E Data pcmcia floppy drive because there
are a number of other issues that preclude it from working.
o Fix ifdef for reading from the rtc. I'm of the opinion that this ifdef
should be moved into fdc_isa.c, but not today as ideally there'd be
other fixes to the probing of children. So now we just read it on
i386 ! pc98 (there's no #define for MACHINE_ARCH, just MACHINE, hence
this slightly inelegant kludge) and amd64. The PC98 exclusion likely
isn't meaningful since pc98 uses a different driver, but will be when
merging of the pc98 floppy code into this driver is complete (this is the
other reason I think this block of code belongs outside fdc.c).
All of these changes are safe to MT5.
most if not all of our tty drivers in the future.
Centralizing this stuff enables us to remove about 100 lines of
almost but not quite perfectly copy&paste code from each tty driver.
This is necessary so source upgrades use the correct binary.
MFC after: 3 days
For the record: Problem spotted by Scott Long, who mentioned
that source upgrades from 4.7 to recent 5.x and 6.0 are broken.
Detailed analysis shows that 4.7 has a broken make(1) binary.
A breakage was fixed in RELENG_4 in make/main.c,v 1.35.2.7 by
imp@, though the commit log erroneously stated "MFC 1.68"
while in fact it should have been spelled as "MFC 1.67".
copper NICs, a link change event is posted whenever MII autopolling is
toggled off and on, which happens whenever someone calls
bge_miibus_readreg() or bge_miibus_writereg() to access the PHY
registers. This means anytime someone called the SIOCGIFMEDIA ioctl
on a bge interface, the link would reset. Even a simple "ifconfig bge0"
would do it, though other apps like dhclient or the PPPoE daemon could
trigger it as well. An obvious symptom of this problem is lots of
"bgeX: gigabit link up" messages appearing on the console for no
apparent reason.
Through experimentation, I determined that when a real link change
event occurs, the BGE_MIMODE_AUTOPOLL in the BGE_MI_MODE register
is always set, so now if we have a copper NIC and an link change
event occurs and the BGE_MIMODE_AUTOPOLL bit is clear, we ignore
the event.
Note that this does not apply to the original BCM5700 chip since we
use a different method for sensing link changes with that chip (the
status block method was broken), nor to fiber optic NICs since they
don't use the GMII PHY access registers.
another way to misinterpret the spec. Also, always fall back to the hints
probe on any attach failure, not just when _FDE fails.
Thanks to imp and scottl for finding this.
Tested by: rwatson (minimally)
MFC after: 5 days
functions that can be called from enable/disable pf as well. This improves
switching from non-altq ruleset to altq ruleset (and the other way 'round)
by a great deal and makes pfctl act like the user would except it to.
PR: kern/71746
Tested by: Aurilien "beorn" Rougemont (PR submitter)
MFC after: 3 days
After this change it should be possible to use very big md(4) devices.
- Clean up and simplify the code a bit.
- Use humanize_number(3) to print size of md(4) devices.
- Add 't' suffix which stands for terabyte.
- Make '-S' to really work with all types of devices.
- Other minor changes.
it is only used in one function. While doing so, change its type to
vm_ooffset_t.
We are still limited for swap-backed devices to 16TB on 32-bit architectures
where PAGE_SIZE is 4096 bytes.
family to the ip_protox[] array. The protocol number of IPPROTO_DIVERT is
larger than IPPROTO_MAX and was initializing memory beyond the array.
Catch all these kinds of errors by ignoring protocols that are higher than
IPPROTO_MAX or 0 (zero).
Add more comments ip_init().
This make "_NEC" devices appear as "NEC" which is more corrent.
The reason is tha NEC originally screwed up on the byteorder in the
model string, so now that they have realized that they prefixed the '_'
so that not every ATA driver on the planet would call them "EN C" :)
reserving it at use time is more miserly, low memory (< 16MB)
evaporates quickly on many systems, so there may not be any suitable
buffers available. This specifically doesn't use the newer, fancier
isa_dma_init to ease merging to 5.
Reviewed by: tegge, phk
the firmware status register on the card to see if the firmware is still
running. There is no way to recover from this, but at least it can give
a hint as whether the car has crashed (which happens all too often).
MFC after: 3 days
errors for the attachment process for the floppy controller. This is
a band-aide because it doesn't try any of the fallback methods when
_FDE isn't long enough, but should be sufficient for people
experiencing the dreaded mutex not initialized panic.
not sure yet about 5.x... MFC if needed.
Also fixes small problems with examining some registers and
some specific gdb transfer problems.
As the patch says:
This is not a pretty patch and only meant as a temporary
fix until a better solution is committed.
PR: i386/71715
Submitted by: Stephan Uphoff <ups@tree.com>
MFC after: 1 week
it travels through the IP stack. This wasn't much of a problem because IP
source routing is disabled by default but when enabled together with SMP and
preemption it would have very likely cross-corrupted the IP options in transit.
The IP source route options of a packet are now stored in a mtag instead of the
global variable.
and which takes a M_WAITOK/M_NOWAIT flag argument.
Add compatibility isa_dmainit() macro which whines loudly if
isa_dma_init() fails.
Problem uncovered by: tegge
problems:
1) Add locking for SMP, code provided by Alan Cox
2) While testing Alan's patches, I observed serious problems with
the jumbo buffer allocation code (machine crashed twice), so I gutted
it and rewrote the receive handler to use multiple chained descriptors.
Each RX descriptor gets a single 2K cluster, and the chip will fill in
as many as it needs to hold the complete packet.
User reports that this corrects the data corruption issues previously
observed and discussed on -current.
Note that this driver still needs to be hit with the busdma stick.
I intend to inflict said beating in the near future.
MFC after: 1 week
careful with the skip condition this time. Addresses are only not taken into
account if:
- The interface is POINTTOPOINT
- There is no route installed for the address
- The user specified noalias (:0)
and - We are looking at an IPv4 address.
This should be enough paranoia to not cause any false positives.
PR: misc/69954
Discussed with: yongari
MFC after: 4 days
o Allow for up to 3 resource I/O ranges to be given for the floppy
controller, rather than just two that are allowed for now.
o Make sure that we can work with either a base address of 0x3f0 or 0x3f2.
o Create new inline functions to access the YE DATA's unique BDCR register.
o Update pccard attachment to add the fd device.
o Do some minor style(9) polishing.
# I'm guessing that the fdc pccard attachment broke some time ago, since
# there are a number of issues with it still.
save to call if_attachdomain from if_attach() (as done for if_loop.c). We
will now end up with a properly initialized if_afdata array and the nd6
callout will no longer try to deref a NULL pointer.
Still this is a temp workaround and the locking for if_afdata should be
revisited at a later point.
Requested by: rwatson
Discussed with and tested by: yongari (a while ago)
PR: kern/70393
MFC after: 5 days
filters). After the ipfw to pfil move ip_input() expects M_FASTFWD_OURS
tagged packets to have ip_len and ip_off in host byte order instead of
network byte order.
PR: kern/71652
Submitted by: mlaier (patch)
and sent to the DIVERT socket while the original packet continues with the
next rule. Unlike a normally diverted packet no IP reassembly attemts are
made on tee'd packets and they are passed upwards totally unmodified.
Note: This will not be MFC'd to 4.x because of major infrastucture changes.
PR: kern/64240 (and many others collapsed into that one)
doesn't do this is beyond me, but that will be investigated later. This
results in programming the chip with the correct frequency, which in turn
allows devices to negotiate up to the full 20MB/s.
and the previously malloc'ed snapshot lock.
Malloc struct snapdata instead of just the lock.
Replace snapshot fields in cdev with pointer to snapdata (saves 16 bytes).
While here, give the private readblock() function a vnode argument
in preparation for moving UFS to access GEOM directly.
preparation for integration of p4::phk_bufwork. In the future,
local filesystems will talk to GEOM directly and they will consequently
be able to issue BIO_DELETE directly. Since the removal of the fla
driver, BIO_DELETE has effectively been a no-op anyway.
the loss of a page modified (PG_M) bit in a race between processors.
Quoting Tor:
One scenario where the old code could cause a lost PG_M bit is a
multithreaded linux program (or FreeBSD program using the
linuxthreads port) where one thread was starting a subprocess.
The thread doing fork() would call vmspace_fork(), which would then
call vm_map_copy_entry() which would call pmap_protect() on an area
possibly accessed by other threads.
Additionally, make the clearing of PG_M by pmap_protect() unconditional if
write permission is removed. Previously, PG_M could persist on a read-only
unmanaged page. That seems inconsistent and confusing.
In collaboration with: tegge@
MT5 candidate
PR: 61852
calls in sb_cmd2() and sb_getmixer(). The lock has already be grabbed
before these functions are called.
This is a RELENG_5 candidate.
PR: 71189
Submitted by: stephane
MFC after: 3 days
the pseudo header. We really need the TCP packet length here. This happens
to end up in ip->ip_len in tcp_input.c, but here we should get it from the
len function variable instead.
Submitted by: yongari
Tested by: Nicolas Linard, yongari (sparc64 + hme)
MFC after: 5 days
fully initialed when the pmap layer tries to call sched_pini() early in the
boot and results in an quick panic. Use ke_pinned instead as was originally
done with Tor's patch.
Approved by: julian
value was only enough for 8GB of RAM, the new value can do 16GB. This still
isn't optimal since it doesn't scale. Fixing this for amd64 looks to be
fairly easy, but for i386 will be quite difficult.
Reviewed by: peter
scheduler specific extension to it. Put it in the extension as
the implimentation details of how the pinning is done needn't be visible
outside the scheduler.
Submitted by: tegge (of course!) (with changes)
MFC after: 3 days
VT6122 gigabit ethernet chip and integrated 10/100/1000 copper PHY.
The vge driver has been added to GENERIC for i386, pc98 and amd64,
but not to sparc or ia64 since I don't have the ability to test
it there. The vge(4) driver supports VLANs, checksum offload and
jumbo frames.
Also added the lge(4) and nge(4) drivers to GENERIC for i386 and
pc98 since I was in the neighborhood. There's no reason to leave them
out anymore.
to device driver unloading. Adding these two now and MT5'ing them will
allow us to preserve binary compatibility on RELENG_5 when these facilities
are MFC'ed.
MT5 Candiate.
holds sndstat_lock across a call to uiomove(), which is not legal
to do with a mutex because of the possibility that the data transfer
could sleep because of a page fault. It is not possible to just
unlock the mutex for the uiomove() call without introducing another
locking mechanism to prevent the body of sndstat_read() from being
re-entered. Converting sndstat_lock to an sx lock is the least
complicated change.
This is a candidate for RELENG_5.
LOR: 030
MFC after: 4 days
example the maximum segment size is 64K while the boundary is set
to 8K due to controller limitations. It is impossible to NOT cross
the boundary for any segment size that's larger than the boundary.
So, once we inherited the boundary from the parent tag, make sure
to reduce the maximum segment size to the boundary if it was larger.
MT5 candidate.
branch prediction optimization for LINT, because the kernel was too
large. This commit now removes it altogether since it causes build
failures for GENERIC kernels and the various applicable trends are
such that one can expect that it these failure will cause more
problems than they're worth in the future. These trends include:
1. Alpha was demoted from tier 1 to tier 2 due to lack of active
support. The number of people willing to fix build breakages
is not likely to increase and those developers that do have the
gumption to test MI changes on alpha are not likely to spend
time fixing unexpected build failures first.
2. The kernel will only increase in size. Even though stripped-down
kernels do link without problems now, compiler optimizations (like
inlining) and new (non-optional) functionality will likely cause
stripped-down kernels to break in the future as well.
So, with my asbestos suit on, get rid of potential problems before
they happen.
MT5 candidate.
redundant at this point and should be retired). Don't free subdevs if
we don't attach any devices. This was leaving stale device_t's
around. Don't touch the device if it isn't attached since the name
isn't meaningful then. Switch from strncpy (properly used) to
strlcpy.
From a patch submitted by Peter Pentchev
device_t instances when no driver attaches. They are left around, and
we need to remember them.
# The usbd_device_handle->subdevs[] array likely is completely bogus
# at this point, but one change at a time, since its removal will need
# to have similar code replace it extracted from newbus.
Part of the patch submitted by Peter Pentchev after an excellent
analysis of the underlying problems.
MFC After: 1 week
across frames. Basically, if the current frame is for the
'dblfault_handler' function, then get the next %eip and %ebp values to use
from the original TSS of the thread that has the saved state when the
double fault triggered.
MFC after: 4 days
produced better results for a test program I had here, it didn't
substantially change the number of crashes that I saw. Both the old
code and the new code seemed to produce the same crashes from the usb
layer. Since the new code also solves a close() crash, go with it
until the underlying issues wrt devices going away can be addressed.
it only if we weren't UP before. In some cases xl_init causes long media
re-negotiation, and ppp(8) fails to open PPPoE connection because it sets
IFF_UP every time before opening PPPoE connection.
PR: kern/69133
Patch by: mdodd
Approved by: wpaul, julian (mentor)
MFC after: 1 week
BPFD_LOCK() when removing a descriptor from an interface descriptor
list. Hold both over the operation, and do a better job at
maintaining the invariant that you can't find partially connected
descriptors on an active interface descriptor list.
This appears to close a race that resulted in the kernel performing
a NULL pointer dereference when BPF sessions are detached during
heavy network activity on SMP systems.
RELENG_5 candidate.
to use queue(3) list macros rather than hand-crafted lists. While
here, move to doubly linked lists to eliminate iterating lists in
order to remove entries. This change simplifies and clarifies the
list logic in the BPF descriptor code as a first step towards revising
the locking strategy.
RELENG_5 candidate.
Reviewed by: fenner
(disabled) defid_gen members from u_long to u_int32_t so that alignment
requirements don't cause the structure to become larger than struct fid
on LP64 platforms. This fixes NFS exports of msdos filesystems on at
least amd64.
PR: 71173
Fix a problem in previous: we can't blindly assume that we have
wincnt entries available at the offset the file has been found. If the dos
directory entry is not preceded by appropriate number of long name
entries (happens e.g. when the filesystem is corrupted, or when
the filename complies to DOS rules and doesn't use any long name entry),
we would overwrite random directory entries.
There are still some problems, the whole thing has to be revisited and solved
right.
Submitted by: Xin LI
Fix a panic that occurred when trying to traverse a corrupt msdosfs
filesystem. With this particular corruption, the code in pcbmap()
would compute an offset into an array that was way out of bounds,
so check the bounds before trying to access and return an error if
the offset would be out of bounds.
Submitted by: Xin LI
The reference counts are there to block detach until the sleepers in
read/write/ioctl have gotten out, not to prevent the open device from
going away. Restore the old behavior so that we have a chance to wake
up sleepers when the usb device goes away, so they can properly return
EIO back to the caller when this happens.
Otherwise, we have a guarnateed panic waiting to happen when a device
detaches with an active read channel.
This should be merged to 5 asap.
and was propagated to nearly every platform. The boundary of the child needs
to consider the boundary of the parent and pick the minimum of the two, not
the maximum. However, if either is 0 then pick the appropriate one.
This bug was exposed by a recent change to ATA, which should now be fixed by
this change. The alignment and maxsegsz tag attributes likely also need
a similar review in the near future.
This is a MT5 candidate.
Reviewed by: marcel
Submitted by: sos (in part)
to avoid ABI changes. It is set to the last time the interface
counters were zeroed, currently the time if_attach() was called. It is
intentended to be a valid value for RFC2233's ifCounterDiscontinuityTime
and to make it easier for applications to verify that the interface they
find at a given index is the one that was there last time they looked.
Due to space constraints ifi_epoch is a time_t rather then a struct
timeval. SNMP would prefer higher precision, but this unlikely to be
useful in practice.
the alignment and boundary constraints are being respected, which
fixes the reported ATA problems with SiI chips.
I consider the busdma implementation worrisome nonetheless. Not
only is there too much MI code duplicated in MD files, there's a
lot of questionable code. I smell a wholesale, cross-platform
overhaul coming...
MT5 candidate.
It can be switched back once 5.3 is tested and released. Also turn on
PREEMPTION as many of the stability problems with it have been fixed.
MT5: 3 days.
would turn off all fans when initializing a zone. However, the HP Omnibook
500 generates a notify saying the zone needs to be re-evaluated whenever
its fan is switched on or off. This produced an infinite loop. Also, note
that running _SCP can generate the same notify.
Since we need to make sure old fan references are turned off when getting
new ones, run acpi_tz_monitor() first. This will turn off any unneeded
fans. Then, check for new settings. After that, run acpi_tz_monitor()
again to turn on/off any fans referenced by the new settings.
Tested by: brooks
returned and then infer the state from calls to _ON/_OFF. This works
around a problem in systems that don't correctly report the state (i.e.
the HP Omnibook 500 reports "on" for its fan always after it has been
turned on once).
field.
Replace three instances of longhaired initialization va_filerev fields.
Added XXX comment wondering why we don't use random bits instead of
uptime of the system for this purpose.
I was unable to test this as the PAE kernel crashed with a "cannot copy
LDT" before coming up. When this gets a bit more testing, I'll fix the PAE
conf file to allow isp devices.
PR: 59728
loss links, and 1 second appeared to be too small for high latency links.
If we will receive more complaints, we should make this parameter configurable.
PR: kern/69536
Approved by: archie, julian (mentor)
MFC after: 3 days
happens when a proc exits, but needs to inform the user that this has
happened.. This also means we can remove the check for detached from
proc and sig f_detach functions as this is doing in kqueue now...
MFC after: 5 days
and you botch a call to nmount(2).
This is because there is an INVARIANTS check that asserts that
opt->len must be zero if opt->val is not NULL. The problem is that
the code does not actually follow this invariant if there is an
error while processing mount options.
Fix the code to honor the INVARIANT.
Silence on: fs@
sectorsize in order to avoid a lot of checks around various divisions etc.
Enforce the sectorsize being > 0 with a KASSERT on successful open.
Fix scsi_cd.c to return 2k sectors when no media inserted.
the flags field will be improperly initialized resulting in inconsistent
operation (sometimes with Giant, sometimes without, et al).
RELENG_5 candidate.
state test as well as set, or we risk a race between a socket wakeup
and registering for select() or poll() on the socket. This does
increase the cost of the poll operation, but can probably be optimized
some in the future.
This appears to correct poll() "wedges" experienced with X11 on SMP
systems with highly interactive applications, and might affect a plethora
of other select() driven applications.
RELENG_5 candidate.
Problem reported by: Maxim Maximov <mcsi at mcsi dot pp dot ru>
Debugged with help of: dwhite
cd9660_readdir() to return the address of the file's first data block as
the inode number instead of the address of the directory entry, but
neglected to update cd9660_vget_internal() for the new inode numbering
scheme.
Since the NFS server calls VFS_VGET (cd9660_vget()) with inode numbers
returned through VOP_READDIR (cd9660_readdir()) when servicing a READDIRPLUS
request, these two interfaces must agree on the numbering scheme; failure to
do so caused panics and/or bogus information about the entries to be returned
to clients using READDIRPLUS (Solaris, FreeBSD w/ mount -o rdirplus).
PR: 63446
to RS232 bridges, such as the one found in the DeLorme Earthmate USB GPS
receiver (which is the only device currently supported by this driver).
While other USB to serial drivers in the tree rely heavily on ucom, this
one is self-contained. The reason for that is that ucom assumes that
the bridge uses bulk pipes for I/O, while the Cypress parts actually
register as human interface devices and use HID reports for configuration
and I/O.
The driver is not entirely complete: there is no support yet for flow
control, and output doesn't seem to work, though I don't know if that is
because of a bug in the code, or simply because the Earthmate is a read-
only device.
but with slightly cleaned up interfaces.
The KSE structure has become the same as the "per thread scheduler
private data" structure. In order to not make the diffs too great
one is #defined as the other at this time.
The KSE (or td_sched) structure is now allocated per thread and has no
allocation code of its own.
Concurrency for a KSEGRP is now kept track of via a simple pair of counters
rather than using KSE structures as tokens.
Since the KSE structure is different in each scheduler, kern_switch.c
is now included at the end of each scheduler. Nothing outside the
scheduler knows the contents of the KSE (aka td_sched) structure.
The fields in the ksegrp structure that are to do with the scheduler's
queueing mechanisms are now moved to the kg_sched structure.
(per ksegrp scheduler private data structure). In other words how the
scheduler queues and keeps track of threads is no-one's business except
the scheduler's. This should allow people to write experimental
schedulers with completely different internal structuring.
A scheduler call sched_set_concurrency(kg, N) has been added that
notifies teh scheduler that no more than N threads from that ksegrp
should be allowed to be on concurrently scheduled. This is also
used to enforce 'fainess' at this time so that a ksegrp with
10000 threads can not swamp a the run queue and force out a process
with 1 thread, since the current code will not set the concurrency above
NCPU, and both schedulers will not allow more than that many
onto the system run queue at a time. Each scheduler should eventualy develop
their own methods to do this now that they are effectively separated.
Rejig libthr's kernel interface to follow the same code paths as
linkse for scope system threads. This has slightly hurt libthr's performance
but I will work to recover as much of it as I can.
Thread exit code has been cleaned up greatly.
exit and exec code now transitions a process back to
'standard non-threaded mode' before taking the next step.
Reviewed by: scottl, peter
MFC after: 1 week
write and zero-fill faults to run without holding Giant. It is still
possible to disable Giant-free operation by setting debug.mpsafevm to 0 in
loader.conf.
This essentially merges revs 1.143-1.1445 of sys/pci/if_rl.c.
This now marks the interrupt MPSAFE along with making the mutex non-recursive.
Looked over by: bms
In places where we have long delays that doesn't depend on too accurate
timing, use ata_udelay() instead of DELAY() so we dont uselessly spin
the CPU if not nessesary;
It makes no sense in a PAE environment and is impossible to handle correctly.
This case is also never used right now. This should make the iir(4) driver
ready for PAE.
number generator, which was re-seeded via a timeout. Now centralized
randomness/entropy is used, we can garbage collect the timeout and
re-seeding code (which was largely a no-op).
Discussed with: itojun, suz, JINMEI Tatuya < jinmei at isl dot rdc dot toshiba dot co dot jp >
FULL_PREEMPTION is defined. Add a runtime warning to ULE if PREEMPTION is
enabled (code inspired by the PREEMPTION warning in kern_switch.c). This
is a possible MT5 candidate.
returning incompletely initialized processes. This problem was
eliminated by kern_proc.c:1.215, which causes pfind() not to
return processes in the PRS_NEW state.