masks for files and directories. This should make some
of the Midnight Commander users happy.
Remove an extra ')' in the manual page.
PR: 35699
Submitted by: Eugene Grosbein <eugen@grosbein.pp.ru> (original version)
Tested by: simon
cpu_switch() where both the old and new threads are passed in as
arguments. Only powerpc uses the old conventions now.
- Update comments in the Alpha swtch.s to reflect KSE changes.
Tested by: obrien, marcel
new ATMIOCOPENVCC/CLOSEVCC. This allows us to not only use UBR channels
for IP over ATM, but also CBR, VBR and ABR. Change the format of the
link layer address to specify the channel characteristics. The old
format is still supported and opens UBR channels.
integer value and then to construct the integer from it. This buffer
was sizeof(int) bytes long, which was fine until the (undocumented) 'g'
modifier for 8-byte integers was introduced. Change this to sizeof(uint64_t).
found only many tv-cards.
We currently use more ore less evil hacks (slow_msp_audio sysctl) to
configure the various variants of these chips in order to have
stereo autodetection work. Nevertheless, this doesn't always work
even though it _should_, according to the specs.
This is, for example, the case for some popular Hauppauge models sold
sold in Germany.
However, the Linux driver always worked for me and others. Looking at
the sourcecode you will find that the linux-driver uses a very much
enhanced approach to program the various msp34xx chipset variants,
which is also found in the specs for these chips.
This is a port of the Linux MSP34xx code, written by Gerd Knorr
<kraxel@bytesex.org>, who agreed to re-release his code under a
BSD license for this port.
A new config option "BKTR_NEW_MSP34XX_DRIVER" is added, which is required
to enable the new driver. Otherwise the old code is used.
The msp34xx.c file is diff-reduced to the linux-driver to make later
modifications easier, thus it doesn't follow style(9) in most cases.
Approved by: roger (committing this, no time to test/review),
keichii (code review)
o Differentiate between CPU family and CPU model. There are multiple
Itanium 2 models and it's nice to differentiate between them.
o Seperately export the CPU family and CPU model with sysctl.
o Merced is the only model in the Itanium family.
o Add Madison to the Itanium 2 family. We already knew about McKinley.
o Print the CPU family between parenthesis, like we do with the i386
CPU class.
My prototype now identifies itself as:
CPU: Merced (800.03-Mhz Itanium)
pluto1 and pluto2 will eventually identify themselves as:
CPU: McKinley (900.00-Mhz Itanium 2)
These are 10/100 only NICs found on the IBM Thinkpad R40E and
G40. These seem to be based on the BCM5705 MAC but with a PHY
that doesn't support 1000Mbps modes.
Submitted by: Igor Sviridov <sia@nest.org>
compare the zone element size (+1 for the byte of linkage) against
UMA_SLAB_SIZE - sizeof(struct uma_slab), and not just UMA_SLAB_SIZE.
Add a KASSERT in zone_small_init to make sure that the computed
ipers (items per slab) for the zone is not zero, despite the addition
of the check, just to be sure (this part submitted by: silby)
- UMA_ZONE_VM used to imply BUCKETCACHE. Now it implies
CACHEONLY instead. CACHEONLY is like BUCKETCACHE in the
case of bucket allocations, but in addition to that also ensures that
we don't setup the zone with OFFPAGE slab headers allocated from the
slabzone. This means that we're not allowed to have a UMA_ZONE_VM
zone initialized for large items (zone_large_init) because it would
require the slab headers to be allocated from slabzone, and hence
kmem_map. Some of the zones init'd with UMA_ZONE_VM are so init'd
before kmem_map is suballoc'd from kernel_map, which is why this
change is necessary.
- All those diffs to syscalls.master for each architecture *are*
necessary. This needed clarification; the stub code generation for
mlockall() was disabled, which would prevent applications from
linking to this API (suggested by mux)
- Giant has been quoshed. It is no longer held by the code, as
the required locking has been pushed down within vm_map.c.
- Callers must specify VM_MAP_WIRE_HOLESOK or VM_MAP_WIRE_NOHOLES
to express their intention explicitly.
- Inspected at the vmstat, top and vm pager sysctl stats level.
Paging-in activity is occurring correctly, using a test harness.
- The RES size for a process may appear to be greater than its SIZE.
This is believed to be due to mappings of the same shared library
page being wired twice. Further exploration is needed.
- Believed to back out of allocations and locks correctly
(tested with WITNESS, MUTEX_PROFILING, INVARIANTS and DIAGNOSTIC).
PR: kern/43426, standards/54223
Reviewed by: jake, alc
Approved by: jake (mentor)
MFC after: 2 weeks
From alc:
Move pageable pipe memory to a seperate kernel submap to avoid awkward
vm map interlocking issues. (Bad explanation provided by me.)
From me:
Rework pipespace accounting code to handle this new layout, and adjust
our default values to account for the fact that we now have a solid
limit on allocations.
Also, remove the "maxpipes" limit, as it no longer has a purpose.
(The limit on kva usage solves the problem of having two many pipes.)
application could cause a wired page to be freed. In general,
vm_page_hold() should be preferred for ephemeral kernel mappings of pages
borrowed from a user-level address space. (vm_page_wire() should really be
reserved for indefinite duration pinning by the "owner" of the page.)
Discussed with: silby
Submitted by: tegge
psignal()/tdsignal(). The test was historically in psignal(). It was
changed into a KASSERT, and then later moved to tdsignal() when the
latter was introduced.
Reviewed by: iedowse, jhb
ioctls.
In the particular case of ptrace(), this commit more-or-less reverts
revision 1.53 of sys_process.c, which appears to have been erroneous.
Reviewed by: iedowse, jhb
parameter. The new name better reflects what the function does and
how it is used. The last parameter was always FALSE.
Note: In theory, gcc would perform constant propagation and dead code
elimination to achieve the same effect as removing the last parameter,
which is always FALSE. In practice, recent versions do not. So, there
is little point in letting unused code pessimize execution.
to configure this correctly yields many watchdog timeouts even on lightly
loaded machines. This is a common complaint from users with Dell 1750
servers with built-in dual 5704 NICs.
magic from exec_setregs(). In set_mcontext() we now also don't have
to worry that we entered the kernel with more that 512 bytes of
dirty registers on the kernel stack. Note that we cannot make any
assumptions anymore WRT to NaT collection points in exec_setregs(),
so we have to deal with them now.
word between the 8139C+ and the 8169. The 8139C+ has a 'frame alignment
error bit' (bit 27) but the 8169 does not. Rather than simply mark this
bit as reserved, RealTek removed it completely and shifted the remaining
status bits one space to the left. This was causing rl_rxeofcplus()
to misparse the error and checksum bits.
To workaround this, rl_rxeofcplus() now shifts the rxstat word one
bit to the right before testing any of the status bits (but after
the frame length has been extracted).
the time the card is inserted and the time that the card is
configured. This can lead to interrupt storms. The O2Micro suggested
workaround is to route the card function interrupt to IRQ1. It
appears from my testing that this is an acceptable workaround for most
chipsets (there's still some issue with the ricoh chipset).
Also, only look at the NOT_A_CARD bit when the bridge tells us there's
a card present. At least one test caused this to be true after the
card was removed, but the author couldn't recreate it with the
workaround in place. The change is more conservative than the
previous code, but still has the work around that wasn't present in
the older code.
note the existence of the 8169S and 8110S components. (The 8169
is just a MAC, the 8169S and 8110S contain both a MAC and PHY.)
- Properly handle list and buffer addresses as 64-bit. The RX and
TX DMA list addresses should be bus_addr_t's. Added RL_ADDR_HI()
and RL_ADDR_LO() macros to obtain values for writing into chip
registers.
- Set a slightly different TIMERINT value for 8169 NICs for improved
performance.
- Change left out of previous commit log: added some additional
hardware rev codes for other 10/100 chips and for the 8169S/8110S
'rev C' gigE MACs.
BGE_MACSTAT_MI_COMPLETE bit in the MAC status register as a link
change indicator. We turn this bit on now because some of the newer
chips need it, but it usually just means that reading/writing
an MII/GMII register has completed, not that a link change has
occured.
the standard.
1) When the bridge tells us that we have a card that isn't recognized, we
use the force register to force the CV_TEST to run. This test causes the
bridge to re-evaluate the card. Once this re-evaluation process happens,
we get a new interrupt that may say it is ready to process. We try this up
to 20 times. Tests have shown that this appears to correctly reset the
'Unknown card type' problem that I saw on my Sony PCG-505TS.
2) Take a page from OLDCARD and always read the CSC register in the ISR.
Some TI (and it seems maybe Ricoh) chipsets require this to behave
properly. This work around appears to work due to some power management
protocols that were improperly implemented. Maybe it can be removed when
this driver supports the full PME# protocol described in the standards.
3) Minor additional debug printf when debugging is enabled.
4) Minor additional commentary for things that are obvious only after study.
# I'm committing this from my Sony PCG-505TS using shared PCI interrupts
# and NEWCARD, but there are some issues with the Ricoh bridge still, but
# at least now I can boot with the card inserted and have it work.
a reference to the containing object. The purpose of the reference
being to prevent the destruction of the object and an attempt to free
the wired page. (Wired pages can't be freed.) Unfortunately, this
approach does not work. Some operations, like fork(2) that call
vm_object_split(), can move the wired page to a difference object,
thereby making the reference pointless and opening the possibility
of the wired page being freed.
A solution is to use vm_page_hold() in place of vm_page_wire(). Held
pages can be freed. They are moved to a special hold queue until the
hold is released.
Submitted by: tegge
PCG-505BX (for example) has one of those:
wi0: <Intersil Prism3> mem 0xf8000000-0xf8000fff at device 2.0 on pci2
wi0: 802.11 address: 00:02:8a:94:d8:73
wi0: using RF:PRISM3(Mini-PCI)
wi0: Intersil Firmware: Primary (1.1.1), Station (1.5.6)
wi0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
1) avoid immediately calling bzero() after malloc() by passing M_ZERO
2) do not initialize individual members of the global context to zero
3) remove an unused assignment of ifctx in bootpc_init()
Reviewed by: tegge
has the same product id, but different vendor id. It also appears
that the MELCO's id should be 0x18a instead of 0x8a01. Fix this.
Submitted by: Shizuka Kudo-san
Disabled by default. To enable it, the new "options PIM" must be
added to the kernel configuration file (in addition to MROUTING):
options MROUTING # Multicast routing
options PIM # Protocol Independent Multicast
2. Add support for advanced multicast API setup/configuration and
extensibility.
3. Add support for kernel-level PIM Register encapsulation.
Disabled by default. Can be enabled by the advanced multicast API.
4. Implement a mechanism for "multicast bandwidth monitoring and upcalls".
Submitted by: Pavlin Radoslavov <pavlin@icir.org>
semaphore and doing so can lead to a possible reversal. WITNESS would have
caught this if semaphores were used more often in the kernel.
Submitted by: Ted Unangst <tedu@stanford.edu>, Dawson Engler
and up commands. When configuring the interface down only the
connections that are currently closing are deleted from the connection
table. When the interface is configured up, all connections that
are in the table are re-opened.
connections that have been open (and were not closing) when
the interface was stopped. This makes the behaviour of fatm(4) more like
the behaviour of en(4).
when we create contexts. The meaning of the flags are documented in
<machine/ucontext.h>. I only list them here to help browsing the
commit logs:
_MC_FLAGS_ASYNC_CONTEXT
_MC_FLAGS_HIGHFP_VALID
_MC_FLAGS_KSE_SET_MBOX
_MC_FLAGS_RETURN_VALID
_MC_FLAGS_SCRATCH_VALID
Yes, _MC_FLAGS_KSE_SET_MBOX is a hack and I'm proud of it :-)
o For trap-based upcalls the argument (the kse_mailbox) to
the UTS must be written onto the kernel stack, not the
user stack. While here, deal with the fact that we may
be at a NaT collection point.
- Fix a bug in rl_dma_map_desc(): set the 'end of ring' bit in the
right descriptor (DESC_CNT - 1, not DESC_CNT). The 8139C+ is limited
to 64 descriptors and automatically wraps at 64 descriptors even
if the EOR bit isn't set, but the 8169 NIC can have up to 1024
descriptors per ring, so we must set the wrap point in the right
place.
- RealTek moved the RL_TIMERINT register from offset 0x54 to 0x58 in
the 8169 -- account for this.
- Added rl_gmii_readreg() and rl_gmii_writereg() routines.
- Fix rl_probe() to deal with the case where the base type is
not RL_8139.
The next step is to add jumbo buffer support.
Tested with the Xterasys XN-152 NIC (hard to beat $29 for a gigE NIC).
path into the kernel. Normally it's due to a syscall, but one can
also be created as the result of a clock interrupt (for example).
This now even more looks like exec_setregs().
While here, add an assert that we don't expect more than 8KB of
dirty registers on the kernel stack.
unconditionally restore ar.k7 (kernel memory stack) and ar.k6
(kernel register stack). I don't know what I was smoking then,
but if you unconditionally restore ar.k6, you also want to
compute its value unconditionally. By having the computation
predicated and dependent on whether we return to user mode, we
would end up writing junk (= invalid value for ar.bspstore) if
we would return to kernel mode. But the whole point of the
unconditional restoration was that there is a grey area where
we still need to have ar.k6 restored. If we restore with a junk
value, we would end up wedging the machine on the next interrupt.
So, unconditionally calculate the value we unconditionally write
to ar.k6.
o The previous braino was found while making the following change:
We used to clear the lower 9 bits of the value we write to ar.k6.
The meaning being that we know that the kernel register stack is
at least 512 byte aligned and simply clearing the lower 9 bits
allows us to return to a context of which we don't have dirty
registers on the kernel stack, even though the context that
entered the kernel does have dirty registers on the kernel stack.
By masking-off the lower bits, we correctly obtain the base of
the register stack without having to worry that we didn't actually
reached the base while unwinding it.
The change is to mask off the lower 13 bits, knowing that the
kernel register stack is always 8KB aligned. The advantage is that
we don't have to worry anymore if there's more than 512 bytes of
dirty registers on the kernel stack. A situation that frequently
occurs. In exec_setregs() in machdep.c:1.147 or older, we had to
deal with that situation by copying the active portion of the
register stack down in multiples of 512 bytes. Now that we mask off
the lower 13 bits we don't have to do that at all. Contemporary
IPF processors have a register file that can hold up to 96 stacked
registers (=784 bytes [incl. 2 NaT collections]). With no indication
that register files grow beyond a couple of hundred registers, we
should not have to worry about it anymore... and yes, 640KB is
enough for everybody :-)
This change helps setcontext(2) and cpu_set_upcall_kse() in that
they can return to completely different contexts without having to
mess with the kernel stack. Of course exec_setregs() doesn't need
to do that anymore as well.
queues lock such that it isn't held around the call to get_pv_entry(),
which calls uma_zalloc(). At the point of the call to get_pv_entry(), the
lock isn't necessary and holding it could lead to recursive acquisition,
which isn't allowed.
that the page's busy flag could be relied upon to synchronize access to the
pv list. I don't any longer. See, for example, the call to
pmap_insert_entry() from pmap_copy().)
(short) types for the port arg of inb() (rev.1.56). The warning started
working for u_short types with gcc-3.3. The pessimizations exposed
by this been fixed except for the cx and oltr drivers where the breakage
of the warning has been pushed to the drivers.
completenss. The pessimization is tiny compared with i/o port slowness
except on very old machines, but code that used signed short types for
i/o ports was unpessimized long ago, and the macro that detected it
recently started working for u_short types too. Use of bus space
should have made this moot long ago.
Not tested at runtime by: bde
completenss. The pessimization is tiny compared with i/o port slowness
except on very old machines, but code that used signed short types for
i/o ports was unpessimized long ago, and the macro that detected it
recently started working for u_short types too. Use of bus space
should have made this moot long ago.
Not tested at runtime by: bde
This change allows one to specify almost the complete traffic parameters
for IPoverATM channels through the routing table. Up to now we used
4 byte DL addresses (flag, vpi, vciH, vciL). This format is still allowed.
If the address is longer, however, the 5th byte is interpreted as the
traffic class (UBR, CBR, VBR or ABR) and the remaining bytes are the
parameters for this traffic class:
UBR: 0 byte or 3 byte PCR
CBR: 3 byte PCR
VBR: 3 byte PCR, 3 byte SCR, 3 byte MBS
ABR: 3 byte PCR, 3 byte MCR, 3 byte ICR, 3 byte TBE, 1 byte NRM,
1 byte TRM, 2 bytes ADTF, 1 byte RIF, 1 byte RDF and 1 byte CDF
A script to generate the corresponding 'route add' arguments will follow soon.
connection is to be established asynchronously, behave as in the
case of non-blocking mode:
- keep the SS_ISCONNECTING bit set thus indicating that
the connection establishment is in progress, which is the case
(clearing the bit in this case was just a bug);
- return EALREADY, instead of the confusing and unreasonable
EADDRINUSE, upon further connect(2) attempts on this socket
until the connection is established (this also brings our
connect(2) into accord with IEEE Std 1003.1.)
function prototypes. Use LIST_FOREACH instead of explicit loops.
The indentation of functions indendet by 4 space have been left alone.
2-space indented functions have been re-indented.
Eliminate a lot of checkes to make sure requests are not cross-device
which is unnecessary with the new layout. We know a sequential request
cannot possibly be cross-device because there is a reserved page between
the devices.
Remove a couple of comments which no longer are relevant.
i/o ports by calling the implementation-detail level below inb() and
outb() instead of inb() and outb(). Unpessimizing the types is too
hard since they are mainly used in microcode.
completenss. The pessimization is tiny compared with i/o port slowness
except on very old machines, but code that used signed short types for
i/o ports was unpessimized long ago, and the macro that detected it
recently started working for u_short types too. Use of bus space
should have made this moot long ago.
Not tested at runtime by: bde
- Build SGL's for ATA_PASSTHROUGH commands
- Fallback to using the sgl_offset when the opcode is unknown for building
SGL's/
- Add ioctl calls for adding and removing units.
- Define previously undefined AEN's
- Allocate memory for the ioctl payload in multiples of 512bytes.
MFC after: 1 week
need this for swapcontext(), KSE upcalls initiated from ast()
also need to save them so that we properly return the syscall
results after having had a context switch. Note that we don't
use r11 in the kernel. However, the runtime specification has
defined r8-r11 as return registers, so we put r11 in the context
as well. I think deischen@ was trying to tell me that we should
save the return registers before. I just wasn't ready for it :-)
o The EPC syscall code has 2 return registers and 2 frame markers
to save. The first (rp/pfs) belongs to the syscall stub itself.
The second (iip/cfm) belongs to the caller of the syscall stub.
We want to put the second in the context (note that iip and cfm
relate to interrupts. They are only being misused by the syscall
code, but are not part of a regular context).
This way, when the context is switched to again, we return to
the caller of setcontext(2) as one would expect.
o Deal with dirty registers on the kernel stack. The getcontext()
syscall will flush the RSE, so we don't expect any dirty registers
in that case. However, in thread_userret() we also need to save
the context in certain cases. When that happens, we are sure that
there are dirty registers on the kernel stack.
This implementation simply copies the registers, one at a time,
from the kernel stack to the user stack. NAT collections are not
dealt with. Hence we don't preserve NaT bits. A better solution
needs to be found at some later time.
We also don't deal with this in all cases in set_mcontext. No
temporay solution is implemented because it's not a showstopper.
The problem is that we need to ignore the dirty registers and we
automaticly do that for at most 62 registers. When there are more
than 62 dirty registers we have a memory "leak".
This commit is fundamental for KSE support.
sysctl:
- sysctlbyname("net.inet.ip.mfctable", ...)
- sysctlbyname("net.inet.ip.viftable", ...)
This change is needed so netstat can use sysctlbyname() to read
the data from those tables.
Otherwise, in some cases "netstat -g" may fail to report the
multicast forwarding information (e.g., if we run a multicast
router on PicoBSD).
* Bug fix: when sending IGMPMSG_WRONGVIF upcall to the multicast
routing daemon, set properly "im->im_vif" to the receiving
incoming interface of the packet that triggered that upcall
rather than to the expected incoming interface of that packet.
* Bug fix: add missing increment of counter "mrtstat.mrts_upcalls"
* Few formatting nits (e.g., replace extra spaces with TABs)
Submitted by: Pavlin Radoslavov <pavlin@icir.org>
code used to call rtrequest(RTM_DELETE, ...). This is a problem, because
the function that just has called us (route_output)
is not really happy with the route it just is creating beeing ripped out
from under it. Unfortunately we also cannot return an error from
ifa_rtrequest. Therefore mark the route just as RTF_REJECT.
control whether to accept RAs per-interface basis.
the new stuff ensures the backward compatibility;
- the kernel does not accept RAs on any interfaces by default.
- since the default value of the flag bit is on, the kernel accepts RAs
on all interfaces when net.inet6.ip6.accept_rtadv is 1.
Obtained from: KAME
MFC after: 1 week
preparation for supporting the OPENVCC and CLOSEVCC ioctls which
are needed for ng_atm. This required some re-organisation of the code
(mostly converting array indexes to pointers). This also gives us
an array of open vccs that will help in using the generic GETVCCS handler.
than i386 or AMD64, TP register points to thread mailbox, and they can not
atomically clear km_curthread in kse mailbox, in this case, thread retrieves
its thread pointer from TP register and sets flag TMF_NOUPCALL in its thread
mailbox to indicate a critical region.
the macro definition, and cause the generation of syntactically
incorrect code that gcc happens to accept.
Reviewed by: schweikh (mentor)
MFC after: 4 weeks
use vrele() instead of vput() on the parent directory vnode returned
by namei() in the case where it is equal to the target vnode. This
handles namei()'s somewhat strange (but documented) behaviour of
not locking either vnode when the two vnodes are equal and LOCKPARENT
but not LOCKLEAF is specified.
Note that since a vnode double-unlock is not currently fatal, these
coding errors were effectively harmless.
Spotted by: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
Reviewed by: mckusick
they haven't been counted before. This test was ommitted when bus_dmamap_load()
was merged into this function, and results in the pagesneeded field growing
without bounds when multiple deferrals happen.
Thanks to Paul Saab for beating his head against this for a few hours =-)
user space region. Hence, we need to test if 5 is greater than the
region; not greater equal.
This bug caused us to call ast() while interrupting kernel mode.
malloc and mbuf allocation all not requiring Giant.
1) ostat, fstat and nfstat don't need Giant until they call fo_stat.
2) accept can copyin the address length without grabbing Giant.
3) sendit doesn't need Giant, so don't bother grabbing it until kern_sendit.
4) move Giant grabbing from each indivitual recv* syscall to recvit.
- Move the enabling of interrupts out of assembly and into C a few
instructions later at cpu_critical_fork_exit(). This puts more of the
MD critical section implementation under the MD critical section API
making it easier to test and develop alternative implementations.
set in cpu_critical_fork_exit() anymore.
- As far as I can tell, cpu_thread_link() has never been used, not even
when it was originally added, so remove it.
Also change "Auto mode" to use a "special" value
instead of 0, and define and document it.
I had thought libpthread had already been switched to use auto mode but
it appears that patch hasn't been committed yet.
Discussed with: Davidxu
to not get any cross-device I/O requests. (The unallocated first page
protecting BSD labels already gave us this, but that hack may go away
at some point in time).
Remove the check for cross-device I/O requests in swap_pager_strategy.
Move the repeated statistics updating into flushchainbuf().
larger than normal frames, to account for the case where a bge(4) NIC
is used with VLANs. Since we set the IFCAP_VLAN_MTU flag, we must allow
reception of frames up to 1522 bytes in size rather than 1518.
Note that it is possible to work around this bug by doing:
# ifconfig bge0 mtu 1504
prior to configuring any VLAN interfaces.
o Remove alpha specific timer code (mc146818A) and compiled-out
calibration of said timer.
o Remove i386 inherited timer code (i8253) and related acquire and
release functions.
o Move sysbeep() from clock.c to machdep.c and have it return
ENODEV. Console beeps should be implemented using ACPI or if no
such device is described, using the sound driver.
o Move the sysctls related to adjkerntz, disable_rtc_set and
wall_cmos_clock from machdep.c to clock.c, where the variables
are.
o Don't hardcode a hz value of 1024 in cpu_initclocks() and don't
bother faking a stathz that's 1/8 of that. Keep it simple: hz
defaults to HZ and stathz equals hz. This is also how it's done
for sparc64.
o Keep a per-CPU ITC counter (pc_clock) and adjustment (pc_clockadj)
to calculate ITC skew and corrections. On average, we adjust the
ITC match register once every ~1500 interrupts for a duration of
2 consequtive interruprs. This is to correct the non-deterministic
behaviour of the ITC interrupt (there's a delay between the match
and the raising of the interrupt).
o Add 4 debugging sysctls to monitor clock behaviour. Those are
debug.clock_adjust_edges, debug.clock_adjust_excess,
debug.clock_adjust_lost and debug.clock_adjust_ticks. The first
counts the individual adjustment cycles (when the skew first
crosses the threshold), the second counts the number of times the
adjustment was excessive (any non-zero value is to be considered
a bug), the third counts lost clock interrupts and the last counts
the number of interrupts for which we applied an adjustment
(debug.clock_adjust_ticks / debug.clock_adjust_edges gives the
avarage duration of an individual adjustment -- should be ~2).
While here, remove some nearby (trivial) left-overs from alpha and
other cleanups.
swapbkva. Swapbkva mappings are explicitly managed using pmap_qenter(),
not on-demand by vm_fault(), making kmem_alloc_nofault() more appropriate.
Submitted by: tegge
generate the inode mode from a default ACL and creation mask,
implement ufs_sync_inode_from_acl() using acl_posix1e_newfilemode().
Since ACL_OVERRIDE_MASK/ACL_PRESERVE_MASK are defined, we no
longer need to explicitly pass in a "preserve_mask" field: this
is implicit in the use of POSIX.1e semantics.
Note: this change contains a semantic bugfix for new file creation:
we now intersect the ACL-generated mode and the cmode requested by
the user process. This means permissions on newly created file
objects will now be more conservative. In the future, we may want
to provide alternative semantics (similar to Solaris and Linux) in
which the ACL mask overrides the umask, permitting ACLs to broaden
the rights beyond the requested umask.
PR: 50148
Reported by: Ritz, Bruno <bruno_ritz@gmx.ch>
Obtained from: TrustedBSD Project
support routines in kern_acl.c:
- Define ACL_OVERRIDE_MASK and ACL_PRESERVE_MASK centrally in acl.h: the
mode bits that are (and aren't) stored in the ACL.
- Add acl_posix1e_acl_to_mode(): given a POSIX.1e extended ACL, generate
a compatibility mode (only the bits supported by the POSIX.1e ACL).
- acl_posix1e_newfilemode(): Given a requested creation mode and default
ACL, calculate the mode for the new file system object (only the bits
supported by the POSIX.1e ACL).
PR: 50148
Reported by: Ritz, Bruno <bruno_ritz@gmx.ch>
Obtained from: TrustedBSD Project
cases:
- Setting sticky bit on non-directory
- Setting setgid on a file with a group that isn't in the effective
or extended groups of the authorizing credential
I.e., test the requirement first, then do the privilege test,
rather than doing the privilege test regardless of the need for
privilege.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
interrupting user mode. The net effect of this bug is that a clock
interrupt does not cause rescheduling and processes are not
preempted. It only takes a "while (1);" to render the machine
useless.
This bug was introduced by the context changes and EPC syscall code.
Handling of ASTs was moved to C for clarity and ease of maintenance,
but was not added for the external interrupt case.
This needs to be revisited. We now have calls to do_ast() in trap(),
break_syscall() and ivt_External_Interrupt(). A single call in
exception_restore covers these 3 places without duplication. This
is where we handled ASTs prior to the overhaul, except that the
meat has been moved to do_ast(), a C function. This was the goal
to begin with.
Pointy hat: marcel
Use ->bio_children to count child buffers, rather than abuse the
bio_caller1 pointer.
Expand the relevant bits of waitchainbuf() inline, this clarifies
the code a little bit.
striping to a per device round-robin algorithm.
Because of the policy of not attempting to retain previous swap
allocation on page-out, this means that a newly added swap device
almost instantly takes its 1/N share of the I/O load but it takes
somewhat longer for it to assume it's 1/N share of the pages if there
is plenty of space on the other devices.
Change the 8G total swapspace limitation to 8G per device instead
by using a per device blist rather than one global blist. This
reduces the memory footprint by 75% (typically a couple hundred
kilobytes) for the common case with one swapdevice but NSWAPDEV=4.
Remove the compile time constant limit of number of swap devices,
there is no limit now. Instead of a fixed size array, store the
per swapdev structure in a TAILQ.
Total swap space is still addressed by a 32 bit page number and
therefore the upper limit is now 2^42 bytes = 16TB (for i386).
We still do not allocate the first page of each device in order to
give some amount of protection to any bsdlabel at the start of the
device.
A new device is appended after the existing devices in the swap space,
no attempt is made to fill in holes left behind by swapoff (this can
trivially be changed should it ever become a problem).
The sysctl vm.nswapdev now reflects the number of currently configured
swap devices.
Rename vm_swap_size to swap_pager_avail for consistency with other
exported names.
Change argument type for vm_proc_swapin_all() and swap_pager_isswapped()
to be a struct swdevt pointer rather than an index.
Not changed: we are still using blists to manage the free space,
but since the swapspace is no longer fragmented by the striping
different resource managers might fare better.
concurrent invocations from acquiring the same address(es). Also, in case
of an incomplete allocation, free any allocated pages.
In collaboration with: tegge
sure that uma_dbg_free() is called if we're about to call
uma_zfree_internal() but we're asking it to skip the dtor and
uma_dbg_free() call itself. So, if we're about to call
uma_zfree_internal() from uma_zfree_arg() and skip == 1, call
uma_dbg_free() ourselves.
EFI file system. When booting from a CD and there's already an EFI
system partition on the disk, setting the current device to unit 0
will select the harddisk. This invariably breaks installing FreeBSD
when other operating systems have been installed before.
We obviously want to do the same when we're booting over the network.
Maybe later.
Based on a patch (from memory) from: arun
there is code that blindly allocates LDTEs starting at slot 6
and I quess it doesn't really matter to us if they overwrite the BSDI
syscall slot, since it isn't a BSDI binary. Also add some code to help track
down other such users (commented out for now).
Reviewed by: deischen@
o correct BSSID setup in ah_writeAssocid for 5211 and 5212 (fixes
reception of broadcast frames after association)
o correct transmit retry counts returned by 5211 in ah_procTxDesc
o add missing regulatory domain support that caused use of 11b channels to be
disallowed with some cards (e.g. mini-pci cards in certain IBM laptops)
o miscellaneous fixes to regulatory domain support
o increase size of 5212 ANI table to avoid overflow
o add monitor mode
o remove OS_QSORT support
o fix handling of HAL_RXDESC_INTREQ in ah_setupRxDesc
o rewrite 5212 descriptor handling for portability
o FreeBSD: track alq_open API change
type. We know about header types 0, 1 and 2. Ignore the rest in the
MD i386 code when we're looking for bridges. You cannot look at the
vendor tag. And if you don't you certainly can't look at function > 0
if the device isn't there.
The new soekris boards' GEODE cpu has issues with the old way. This
is reported to have fixed it.
MFC After: 2 days
considered to be good to try when it otherwise has no clue about which
interrupts to try. This is a band-aide and we really should try to
balance the IRQs that we arbitrarily pick, but it should help some
people that would otherwise get bad IRQs.
recompiling the driver. See the comments near the top of "if_em.h"
for descriptions of these delays. Four new loader tunables control
the system-wide default values:
hw.em.tx_int_delay
hw.em.rx_int_delay
hw.em.tx_abs_int_delay
hw.em.rx_abs_int_delay
The tunables are specified in microseconds. The valid range is
0-67108 usec., and 0 means that the timer is disabled.
There are also four new sysctls (actually, a set of four for each
"em" device in the system) to query and change the interrupt delays
after the system is up:
hw.em0.tx_int_delay
hw.em0.rx_int_delay
hw.em0.tx_abs_int_delay (not present for 82542/3/4 adapters)
hw.em0.rx_abs_int_delay (not present for 82542/3/4 adapters)
It seems to be OK to change these values even while the adapter is
passing traffic.
Approved by: Prafulla Deuskar <pdeuskar@FreeBSD.ORG>
MFC after: 4 weeks
- Move isa/ppc* to sys/dev/ppc (repo-copy)
- Add an attachment method to ppc for puc
- In puc we need to walk the chain of parents.
Still to do, is to make ppc(4) & puc(4) work on other platforms. Testers
wanted.
PR: 38372 (in spirit done differently)
Verified by: Make universe (if I messed up a platform please fix)
mac_mls_subject_equal_ok() to mac_mls_subject_privileged(),
which more consistently reflects the fact that this is really
about our notion of privilege in the MLS policy.
Since we don't use suser() for privilege in MLS, remove
the suser check from the ifnet relabel ioctl, and replace it
with an MLS privilege check.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories