kernel map and object in a manner that contigfree() is actually able to
free. Previously contigfree() freed up the KVA space but could not
unwire & free the underlying VM pages due to mismatched pageability between
the map entry and the VM pages.
Submitted by: Thomas Moestl <tmoestl@gmx.net>
Testing by: mark tinguely <tinguely@web.cs.ndsu.nodak.edu>
MFC after: 3 days
would sometimes prevent a dirty page from being cleaned, even when synced,
resulting in the dirty page being re-flushed to disk every 30-60 seconds or
so, forever. The problem is that when the filesystem flushes a page to
its backing file it typically does not clear dirty bits representing areas
of the page that are beyond the file EOF. If the file is also mmap()'d and
a fault is taken, vm_fault (properly, is required to) set the vm_page_t->dirty
bits to VM_PAGE_BITS_ALL. This combination could leave us with an uncleanable,
unfreeable page.
The solution is to have the vnode_pager detect the edge case and manually
clear the dirty bits representing areas beyond the file EOF. The filesystem
does the rest and the page comes up clean after the write completes.
MFC after: 3 days
instruction. Stefan Keller <dres@earth.serd.org> noticed that CPU
identification was broken when compiled with -O2, and tracked it
down to the asm statement, which was storing values into memory
without specifying that memory was modified. He submitted a patch
which added "memory" as a clobber, but I refined it further to
arrive at this version.
MFC after: 3 days
be set. We need to check isr.w before isr.r so that we can correctly
handle a cmpxchg to a copy-on-write page.
This fixes the hang-after-fork problem for dynamically linked programs.
This stops panics on unloading modules which define their own sysctl sets.
However, this also removes the protection against somebody actually
defining a static sysctl with an oid in the range of the dynamic ones,
which would break badly if there is already a dynamic sysctl with
the requested oid.
Apparently, the algorithm for removing sysctl sets needs a bit more work.
For the present, the panic I introduced only leads to Bad Things (tm).
Submitted by: many users of -current :(
Pointy hat to: roam (myself) for not testing rev. 1.112 enough.
- Add proc locking to the jail() syscall. This mostly involved shuffling
a few things around so that blockable things like malloc and copyin
were performed before acquiring the lock and checking the existing
ucred and then updating the ucred as one "atomic" change under the proc
lock.
- crhold() returns a reference to the ucred whose refcount it bumps.
- crcopy() now simply copies the credentials from one credential to
another and has no return value.
- a new crshared() primitive is added which returns true if a ucred's
refcount is > 1 and false (0) otherwise.
C calling conventions. This allows crt1.c to be written nearly without
any inline assembler.
* Initialise cpu_model[] so that the hw.model sysctl works properly.
to do with "dropped packets." Any packets matching rules with the
'log' directive are logged regardless of the action, drop, pass,
divert, pipe, etc.
MFC after: 1 day
Updated by peter following KSE and Giant pushdown.
I've running with this patch for two week with no ill side effects.
PR: kern/12014: Fix SysV Semaphore handling
Submitted by: Peter Jeremy <peter.jeremy@alcatel.com.au>
existing devices (e.g.: tunX). This may need a little more thought.
Create a /dev/netX alias for devices. net0 is reserved.
Allow wiring of net aliases in /boot/device.hints of the form:
hint.net.1.dev="lo0"
hint.net.12.ether="00:a0:c9:c9:9d:63"
for initializing the parts. Since I don't have any of these parts in
any of my working laptops, I'm committing this to allow people to test
it. Will MFC when I receive reports of it working.
a single kern.security.seeotheruids_permitted, describes as:
"Unprivileged processes may see subjects/objects with different real uid"
NOTE: kern.ps_showallprocs exists in -STABLE, and therefore there is
an API change. kern.ipc.showallsockets does not.
- Check kern.security.seeotheruids_permitted in cr_cansee().
- Replace visibility calls to socheckuid() with cr_cansee() (retain
the change to socheckuid() in ipfw, where it is used for rule-matching).
- Remove prison_unpcb() and make use of cr_cansee() against the UNIX
domain socket credential instead of comparing root vnodes for the
UDS and the process. This allows multiple jails to share the same
chroot() and not see each others UNIX domain sockets.
- Remove unused socheckproc().
Now that cr_cansee() is used universally for socket visibility, a variety
of policies are more consistently enforced, including uid-based
restrictions and jail-based restrictions. This also better-supports
the introduction of additional MAC models.
Reviewed by: ps, billf
Obtained from: TrustedBSD Project
code in ipl.s and icu_ipl.s that used them was removed when the
interrupt thread system was committed. Debuggers also knew about
Xresume* because these labels hide the real names of the interrupt
handlers (Xintr*), and debuggers need to special-case interrupt
handlers to get the interrupt frame.
Both gdb and ddb will now use the Xintr* and Xfastintr* symbols to
detect interrupt frames. Fast interrupt frames were never identified
correctly before, so this fixes the problem of the running stack
frame getting lost in a ddb or gdb trace generated from a fast
interrupt - e.g. when debugging a simple infinite loop in the kernel
using a serial console, the frame containing the loop would never
appear in a gdb or ddb trace.
Reviewed by: jhb, bde
processes to attach debugging to themselves even though the
global kern_unprivileged_procdebug_permitted policy might disallow
this.
o Move the kern_unprivileged_procdebug_permitted check above the
(p1==p2) check.
Reviewed by: des
already does the initialization (though it didn't set pca_initialized, so
we always initialized twice) and since attach calls make_dev(), there's no
way that pcaopen() can be called before pcaattach().
because the IN_RENAME flag only fixes a few of the huge number of race
conditions that can result in the source path becoming invalid even
prior to the VOP_RENAME() call. The panics created a serious security
issue whereby an attacker could fairly easily cause the panic to
occur, crashing the machine.
The correct solution requires a great deal of work in the namei
path cache code.
MFC after: 0 days
Until now, the ptrace syscall was implemented as a wrapper that called
various functions in procfs depending on which ptrace operation was
requested. Most of these functions were themselves wrappers around
procfs_{read,write}_{,db,fp}regs(), with only some extra error checks,
which weren't necessary in the ptrace case anyway.
This commit moves procfs_rwmem() from procfs_mem.c into sys_process.c
(renaming it to proc_rwmem() in the process), and implements ptrace()
directly in terms of procfs_{read,write}_{,db,fp}regs() instead of
having it fake up a struct uio and then call procfs_do{,db,fp}regs().
It also moves the prototypes for procfs_{read,write}_{,db,fp}regs()
and proc_rwmem() from proc.h to ptrace.h, and marks all procfs files
except procfs_machdep.c as "optional procfs" instead of "standard".
the target process was being held locked during the uiomove() call. If the
process calling readdir() was the same as the target process (for instance
'ls /proc/curproc/'), and uiomove() caused a page fault, the result would
be a proc lock recursion. I have no idea how long this has been broken -
possibly ever since pfind() was changed to lock the process it returns.
Also replace the one and only call to procfs_findtextvp() with a direct
test of td->td_proc->p_textvp.
appropriate cache flush that provides MEMORY_BARRIER in between handoffs
between host && RISC processor for the shared memory request/response
queues.
Submitted by: dfr@nlsystems.com
confused. Since sa_sigaction and sa_handler alias each other in a
union, the bug was completely harmless. This had been fixed as part
of the SIGCHLD changes in revision 1.125, but it was reverted when
they were backed out in revision 1.126.
- Flesh out ofw_readin routine.
- Add OpenFirmware load and exec routines.
- Make sure memory allocation for the kernel is done correctly.
- Change the way the heap is allocated so as to make it easier to deallocate
when we hand over.
- Add a command to print memory maps similar to the one for ia64.
With this patch, I can now load and hand over to a kernel on my iMac. There
are some problems with OpenFirmware routines failing after the hand over that
still need to be addressed.
The type of bus_space_tag_t is now a pointer to bus_space_tag structure,
and the bus_space_tag structure saves pointers to functions for direct
access and relocate access.
Added bsh_bam member to the bus_space_handle structure, it saves access
method either direct access or relocate access which is called by
bus_space_* functions.
Added the mecia device support. If the bs_da and bs_ra in bus tag are set
NEPC_io_space_tag and NEPC_mem_space_tag respectively, new bus_space stuff
changes the register of mecia automatically for 16bit access.
Obtained from: NetBSD/pc98
'regression.*'.
o Add 'regression.securelevel_nonmonotonic', conditional on 'options
REGRESSION', which allows the securelevel to be lowered for the purposes
of efficient regression testing of securelevel policy decisions.
Regression tests for securelevels will be committed shortly.
NOTE: 'options REGRESSION' should never be used on production machines, as
it permits violation of system invariants so as to improve the ability to
effectively test edge cases, and improve testing efficiency.
syscalls are of type NODEF but not in a way that fits the given
definition of that type. The exact difference of lkmressys and
lkmnosys is unclear, which makes it all the more confusing. A
reevaluation of what we have and what we really need is in order.
Spotted by: Maxime Henrion <mux@qualys.com>
Pointy hat: marcel
to see if there's an interrupt (avoids PCI parity errors
which can occur on the 2312 if you access some registers
from the host at the same time the RISC on the 2312 is
C accessing them).
MFC after: 1 day
This fixes the panic when receiving a packet with an unknown tag, and
also allows reception of packets with known tags.
- Allow overlapping tag number spaces when using multiple hardware-assisted
VLAN parent devices (by comparing the parent interface in
vlan_input_tag() just as in vlan_input() ).
- fix typo in comment
MFC after: 1 week
to send all its data, especially when the data is less than one MSS.
This fixes an issue where the stack was delaying the sending
of data, eventhough there was enough window to send all the data and
the sending of data was emptying the socket buffer.
Problem found by Yoshihiro Tsuchiya (tsuchiya@flab.fujitsu.co.jp)
Submitted by: Jayanth Vijayaraghavan
wait for both read AND write I/O to complete. Only NFS calls vinvalbuf()
on an active vnode (when the server indicates that the file is stale), so
this bug fix only effects NFS clients.
MFC after: 3 days
kern.ipc.showallsockets is set to 0.
Submitted by: billf (with modifications by me)
Inspired by: Dave McKay (aka pm aka Packet Magnet)
Reviewed by: peter
MFC after: 2 weeks
Fix acpi_DeviceIsPresent to check for valid _STA data and to check
the "present" and "functioning" bits.
Use acpi_DeviceIsPresent in acpi_pcib rather than rolling our own
(also broken) version.
been a no-op for as long as our CVS history goes back. Processes in
state SSLEEP could only be counted if p_slptime == 0, but immediately
before loadav() is called, schedcpu() has just incremented p_slptime
on all SSLEEP processes.
missed in the previous commit; a line that exceeded 80 characters. No
functional changes, but the object file's md5 checksum changes because some
lines have been displaced.
1) Allow the sending of more than one control message at a time
over a unix domain socket. This should cover the PR 29499.
2) This requires that unp_{ex,in}ternalize and unp_scan understand
mbufs with more than one control message at a time.
3) Internalize and externalize used to work on the mbuf in-place.
This made life quite complicated and the code for sizeof(int) <
sizeof(file *) could end up doing the wrong thing. The patch always
create a new mbuf/cluster now. This resulted in the change of the
prototype for the domain externalise function.
4) You can now send SCM_TIMESTAMP messages.
5) Always use CMSG_DATA(cm) to determine the start where the data
in unp_{ex,in}ternalize. It was using ((struct cmsghdr *)cm + 1)
in some places, which gives the wrong alignment on the alpha.
(NetBSD made this fix some time ago).
This results in an ABI change for discriptor passing and creds
passing on the alpha. (Probably on the IA64 and Spare ports too).
6) Fix userland programs to use CMSG_* macros too.
7) Be more careful about freeing mbufs containing (file *)s.
This is made possible by the prototype change of externalise.
PR: 29499
MFC after: 6 weeks
We only support the size of frame we are currently allocating, which
is MCLBYTES - sizeof (struct ether_header) usable, so don't set an
MTU that would go over this.
first thought and may require serious work to the VOP_RENAME() api itself.
Basically, by the time the VOP_RENAME() function is called, it's already
too late.
it now really gets in the way.
This allows us to fix several problems- not least of which was problems
of ordering about when you'd have a device softc for an miibus child
available or not. Move some steps of things around.
Put the ifnet/arpcom structure at the head of the softc (PR 29249).
Don't do tx gc in the interrupt service routine- that seems to make
things a bit more efficient.
Enable jumbo support by default- but this version of 'jumbo' is broken
because it really is just using multiple tfd/rfd's to match a packet,
which will never be > CLSIZE anyway.
This should begin the first steps toward cleaning this driver up.
PR: 29249
MFC after: 1 week
with an ifnet structure (so device_get_softc will get one).
If memory allocation fails in mii_phy_probe, don't just march ahead into
a panic- return ENOMEM.
MFC after: 1 week
will be one for struct proc in stable. those drivers needing to have
cross version portability should use d_thread_t instead of inventing
their own means. Non-drivers, and drivers that either only run on
-current or must look under the covers of the struct proc/thread
should must not use this.
As noted in arch@, this minorly violates style(9), but the sys/conf.h
devsw already violates this and all I'm doing is extending the
violation to ease the burdon on device driver writers. It was judged
that this minor violation, which doesn't impact userland or those
people not using it, was preferable to the alternatives (eg #define
proc thread). C does not allow a way to rename or alias structs
easily, so we fall back to using a typedef.
Bump FreeBSD_version to reflect this change (porters guide to be done
in a separate commit).
in vfs_syscalls.c:
if (mp->mnt_stat.f_owner != p->p_ucred->cr_uid &&
(error = suser_td(td)) != 0) {
unwrap_lots_of_stuff();
return (error);
}
to:
if (mp->mnt_stat.f_owner != p->p_ucred->cr_uid) {
error = suser_td(td);
if (error) {
unwrap_lots_of_stuff();
return (error);
}
}
This makes the code more readable when complex clauses are in use,
and minimizes conflicts for large outstanding patchsets modifying the
kernel authorization code (of which I have several), especially where
existing authorization and context code are combined in the same if()
conditional.
Obtained from: TrustedBSD Project
- When the video BIOS is called to clear the region (x, y)-(79, 24)
(by scrolling), the slashed region in Fig.1 is cleared. CD() is
supposed to clear the region shown in Fig.2.
x x
+-------+ +-------+
| | | |
y| ////| y| ////|
| ////| |///////|
| ////| |///////|
+-------+ +-------+
Fig.1 Fig.2
- Don't move the cursor during this operation.
- Be consistent about placing spaces around keywords and
operators; don't mix statements like "if(A==B)" and "if (X == Y)",
"return(0)" and "return (-1)", "P=10" and "Q = 0", etc.
- Consitently indent lines. It's not good to indent by 8 columns
in one part of the file, and by 4 columns in the other part.
to avoid removing higher level directory vnodes from the namecache has
no perceivable effect and will be removed. This is especially true
when vmiodirenable is turned on, which it is by default now. ( vmiodirenable
makes a huge difference in directory caching ). The vfs.vmiodirenable and
vfs.nameileafonly sysctls have been left in to allow further testing, but
I expect to rip out vfs.nameileafonly soon too.
I have also determined through testing that the real problem with numvnodes
getting too large is due to the VM Page cache preventing the vnode from
being reclaimed. The directory stuff made only a tiny dent relative
to Poul's original code, enough so that some tests succeeded. But tests
with several million small files show that the bigger problem is the VM Page
cache. This will have to be addressed by a future commit.
MFC after: 3 days
YA pseudofs megacommit, part 2:
- Merge the pfs_vnode and pfs_vdata structures, and make the vnode cache
a doubly-linked list. This eliminates the need to walk the list in
pfs_vncache_free().
- Add an exit callout which revokes vnodes associated with the process
that just exited. Since it needs to lock the cache when it does this,
pfs_vncache_mutex needs MTX_RECURSE.
- Add a third callback to the pfs_node structure. This one simply returns
non-zero if the specified requesting process is allowed to access the
specified node for the specified target process. This is used in
addition to the usual permission checks, e.g. when certain files don't
make sense for certain (system) processes.
- Make sure that pfs_lookup() and pfs_readdir() don't yap about files
which aren't pfs_visible(). Also check pfs_visible() before performing
reads and writes, to prevent the kind of races reported in SA-00:77 and
SA-01:55 (fork a child, open /proc/child/ctl, have that child fork a
setuid binary, and assume control of it).
- Add some more trace points.
per-command component that we *don't* try and pass thru CAM. CAM just
is too risky and too much of a pain- structures get copied, but not
all info of interest can be considered safely transported thru all
consumers (including user space) from the incoming ATIO to the outgoing
CTIO- it's just much safer to have a buddy structure, identified by the
command's tag which *does* make it thru safely.
Pay attention to link speed and report 200MB/s xfer speed for a
23XX card in 2GPs mode.
MFC after: 1 week
Handle overlap in bcopy.
Add routines for copying and zeroing pages using physical addresses
directly.
Remove all the hacks to account for calling the firmware on its own
trap table, we use the kernel trap table. There is still a problem
with OF_exit().
- Rearrange the flag constants a little to simplify specifying and testing
for readability and writeability.
pseudofs_vnops.c:
- Track the aforementioned change.
- Add checks to pfs_open() to prevent opening read-only files for writing
or vice versa (pfs_{read,write} would block the actual reads and writes,
but it's still a bug to allow the open() to succeed). Also, return
EOPNOTSUPP if the caller attempts to lock the file.
- Add more trace points.
easier and hopefully this code is done changing radically.
Don't use the mmu tlb register to address the kernel page table, nor
the 8k pointer register. The hardware will do some of the page table
lookup by storing the the base address in an internal register and
calculating the address of the tte in the table. However it is limited
to a 1 meg tsb, which only maps 512 megs. The kernel page table only
has one level, so its easy to just do it by hand, which has the advantage
of supporting abitrary amounts of kvm and only costs a few more instructions.
Increase kvm to 1 gig now that its easy to do so and so we don't waste
most of a 4 meg page.
Fix some traces. Fix more proc locking.
Call tsb_stte_promote if we get a soft fault on a mapping in the upper
levels of the tsb. If there is an invalid or unreferenced mapping
in the primary tsb, it will be replaced.
Immediately fail for faults occuring in {f,s}uswintr.
one 4 meg page can map both the kernel and the openfirmware mappings.
Add the openfirmware mappings to the kernel tsb so we can call the firmware
on the kernel trap table and access kernel memory normally.
Implement pmap_swapout_proc, pmap_swapin_proc, pmap_swapout_thread,
pmap_swapin_thread, pmap_activate, pmap_page_exists, and pmap_phys_address.
Add a guard page at the bottom of the kernel stack. Its unclear how easy
it will be to detect these faults and do something useful.
Setup the registers on exec how the c runtime expects.
Implement various {fill,set}_*regs.
Fix proc locking.
when I changed the allocator bits. This implements per-CPU mbtypes
stats by keeping net number of decrements/increments of a given mbtype
per-CPU and then summing all of the per-CPU mbtypes to produce the total
net number of allocated mbufs of the given mbtype.
Counters are carefully balanced to avoid/prevent underflows/overflows.
mbtypes stats are re-enabled with the idea that we may occasionally
(although very rarely) observe slight inconsistencies in the stat
reporting. Most of the time, we should be fine, though.
Also make appropriate modifications to netstat(1) and systat(1) to do
the necessary reporting.
Submitted by: Jiangyi Liu <jyliu@163.net>
built without support for miibus PHYs. Most ed cards don't need
miibus support, so it's useful to be able to avoid the bloat of
all the mii devices for small fixed-purpose kernels.