were sometimes propagated using M_COPY_PKTHDR which actually did
something between a "move" and a "copy" operation. This is replaced
by M_MOVE_PKTHDR (which copies the pkthdr contents and "removes" it
from the source mbuf) and m_dup_pkthdr which copies the packet
header contents including any m_tag chain. This corrects numerous
problems whereby mbuf tags could be lost during packet manipulations.
These changes also introduce arguments to m_tag_copy and m_tag_copy_chain
to specify if the tag copy work should potentially block. This
introduces an incompatibility with openbsd which we may want to revisit.
Note that move/dup of packet headers does not handle target mbufs
that have a cluster bound to them. We may want to support this;
for now we watch for it with an assert.
Finally, M_COPYFLAGS was updated to include M_FIRSTFRAG|M_LASTFRAG.
Supported by: Vernier Networks
Reviewed by: Robert Watson <rwatson@FreeBSD.org>
Note that the original RFC 1323 (PAWS) says in 4.2.1 that the out of
order / reverse-time-indexed packet should be acknowledged as specified
in RFC-793 page 69 then dropped. The original PAWS code in FreeBSD (1994)
simply acknowledged the segment unconditionally, which is incorrect, and
was fixed in 1.183 (2002). At the moment we do not do checks for SYN or FIN
in addition to (tlen != 0), which may or may not be correct, but the
worst that ought to happen should be a retry by the sender.
do this avoid m_getcl so we can copy the packet header to a clean mbuf
before adding the cluster
o move an assert to the right place
Supported by: Vernier Networks
- Add a mtx_destroy() to vm_object_collapse(). (This allows a bzero()
to migrate from _vm_object_allocate() to vm_object_zinit(), where it
will be performed less often.)
__acl_get_link(), __acl_set_link(), acl_delete_link(), and
__acl_aclcheck_link(), with almost identical implementations to
the existing __acl_*_file() variants on these calls. Update
copyright.
Obtained from: TrustedBSD Project
__acl_get_link() Retrieve an ACL by name without following
symbolic links.
__acl_set_link() Set an ACL by name without following
symbolic links.
__acl_delete_link() Delete an ACL by name without following
symbolic links.
__acl_aclcheck_link() Check an ACL against a file by name without
following symbolic links.
These calls are similar in spirit to lstat(), lchown(), lchmod(), etc,
and will be used under similar circumstances.
Obtained from: TrustedBSD Project
work. The interface was gleaned from the Linux driver. Currently only
one RX & one TX buffer are used. Firmware support is not tested so for the
MPI-350 so it is disabled. Signal cache and monitor mode are not supported
yet. Signal cache is not supported since in encapsulation mode ethernet
frames are returned by the chip. LAN monitor mode support will be added
shortly. Thanks to Warner for the MPI-350 card he sent me.
Add support for RSSI map from PR kern/32880 which was incomplete. Enhanced
with the ability to select the cache mode of raw, dbm or per-cent.
Clean up Signal/Noise/Quality structures and units with help from
Marco Molteni.
Change flash to use a malloc'ed buffer when needed.
PR: kern/32880
Submitted by: Douglas S. J. De Couto decouto@pdos.lcs.mit.edu,
Marco Molteni
MFC: 3 weeks
call is in progress on the vnode. When vput() or vrele() sees a
1->0 reference count transition, it now return without any further
action if this flag is set. This flag is necessary to avoid recursion
into VOP_INACTIVE if the filesystem inactive routine causes the
reference count to increase and then drop back to zero. It is also
used to guarantee that an unlocked vnode will not be recycled while
blocked in VOP_INACTIVE().
There are at least two cases where the recursion can occur: one is
that the softupdates code called by ufs_inactive() via ffs_truncate()
can call vput() on the vnode. This has been reported by many people
as "lockmgr: draining against myself" panics. The other case is
that nfs_inactive() can call vget() and then vrele() on the vnode
to clean up a sillyrename file.
Reviewed by: mckusick (an older version of the patch)
MUTEX_PROFILING is in opt_global.h, so this does not introduce a risk of
variant structure sizes unless foreign kernel modules are used.
This saved 16 bytes per vnode and 16 bytes per vm object for a total of
4MB on a 2GB machine.
Idea from: alc
to treat desiredvnodes much more like a limit than as a vague concept.
On a 2GB RAM machine where desired vnodes is 130k, we run out of
kmem_map space when we hit about 190k vnodes.
If we wake up the vnode washer in getnewvnode(), sleep until it is done,
so that it has a chance to offer us a washed vnode. If we don't sleep
here we'll just race ahead and allocate yet a vnode which will never
get freed.
In the vnodewasher, instead of doing 10 vnodes per mountpoint per
rotation, do 10% of the vnodes distributed evenly across the
mountpoints.
here. It manifests itself by sendmail hanging in "fifoow" during
boot on a diskless machine with sendmail disabled.
Giving the sleep a 1sec timout breaks the deadlock, but does not solve
the underlying problem.
XXX comment applied.
- Restore %g6 and %g7 for kernel traps if we are returning to prom code.
This allows complex traps (ones that call into C code) to be handled from
the prom.
files which might be included together.
Things like debuggers and lint-like programs get their knickers in
a twist (rightly so one might add) when they find different locations
for the same named struct depending on which .h file were included
first.
This is a stellar example of Very Bad Thinking on the part of the
standards dudes who wrote that both sys/uio.h and sys/socket.h
should define struct iovec the same way.
Fix this by putting struct iovec into its own miniature sys/_iovec.h
file and #include that from sys/socket.h and sys/uio.h.
Sensible people could just put iovec into sys/_types.h but there
is probably some standard or other which will be violated if we
did something that horrible.
comes along and flushes a file which has been mmap()'d SHARED/RW, with
dirty pages, it was flushing the underlying VM object asynchronously,
resulting in thousands of 8K writes. With this change the VM Object flushing
code will cluster dirty pages in 64K blocks.
Note that until the low memory deadlock issue is reviewed, it is not safe
to allow the pageout daemon to use this feature. Forced pageouts still
use fs block size'd ops for the moment.
MFC after: 3 days
than hard-coded uids and gids.
Switch the device to a group of wheel instead of operator.
Narrow down the permissions on the device to require root privilege
to manipulate the system power state. It may be that we can broaden
access to the device after review of the access control in ACPI.
Submitted by: kris
Reviewed by: takawata
(show thread {address})
Remove the IDLE kse state and replace it with a change in
the way threads sahre KSEs. Every KSE now has a thread, which is
considered its "owner" however a KSE may also be lent to other
threads in the same group to allow completion of in-kernel work.
n this case the owner remains the same and the KSE will revert to the
owner when the other work has been completed.
All creations of upcalls etc. is now done from
kse_reassign() which in turn is called from mi_switch or
thread_exit(). This means that special code can be removed from
msleep() and cv_wait().
kse_release() does not leave a KSE with no thread any more but
converts the existing thread into teh KSE's owner, and sets it up
for doing an upcall. It is just inhibitted from being scheduled until
there is some reason to do an upcall.
Remove all trace of the kse_idle queue since it is no-longer needed.
"Idle" KSEs are now on the loanable queue.
be used for zones that allocate objects of less 1 page. The biggest advantage
of this is that all of a sudden the majority of kernel malloc-ed data doesn't
need kva allocated for it. Besides microbenchmarks I haven't seen a measurable
performance improvement from doing this.
with make_dev(). Use OPERATOR instead of implicit WHEEL to match
other storage devices. Use a mode of 0640 to be consistent
with other storage devices.
Submitted by: kris
Reviewed by: scottl
in network byte order, but icmp_error() expects the IP header to
be in host order and the code here did not perform the necessary
swapping for the bridged case. This bug causes an "icmp_error: bad
length" panic when certain length IP packets (e.g. ip_len == 0x100)
are rejected by the firewall with an ICMP response.
MFC after: 3 days
revision 1.62. It was checking for M_PREPEND() failing, not for the
case of a NULL mbuf pointer being supplied to the macro. Back out
that revision, and fix the NULL dereference by not calling EH_RESTORE()
in the case where the mbuf pointer is NULL because the firewall
rejected the packet.
Remove the setgid bit from the tga device (?).
Synchronize mode with modes used for related frame buffer devices
in MAKEDEV (tga doesn't appear in MAKEDEV).
Submitted by: kris
improves protection consistency with other storage devices (generally
root:operator,660). This driver appears not to have an active
maintainer.
Submitted by: kris
Remove the malloctype from the ufs mount structure, instead add a callback
to the storage method for freeing inodes: UFS_IFREE().
Add vfs_ifree() method function which frees an inode.
Unvariablelize the malloc type used for allocating inodes.
command in case this setting was not saved. Since bandwidth reclamation
(-current only) often results in bus activity continuing to the end
of every frame, most transfers would fail with IOERROR if this
setting is missed.
Reviewed by: n_hibma
MFC after: 1 week
- Fix permission of device node.
fwochi.c, fwohcireg.h
- Detect phy access failure correct way.
- Set root hold-off bit before initiating bus reset.
This should fix the problem with VIA6306.
fwohcivar.h
- Fix over-allocation of array. (fwohcivar.h)
sbp.c
- Return CAM_DEV_NOT_THERE rather than CAM_TID_INVALID to prevent retry.
when julian@ added it, but the commented out code had at least
one bug -- not freeing the allocated mbuf.
Anyway, this comment no longer applies as of revision 1.67, so
remove it.
the entry being removed (ret_nrt != NULL), increment the entry's
rt_refcnt like we do it for RTM_ADD and RTM_RESOLVE, rather than
messing around with 1->0 transitions for rtfree() all over.
useful for accessing more than 1 page of contiguous physical memory, and
to use 4mb tlb entries instead of 8k. This requires that the system only
use the direct mapped addresses when they have the same virtual colour as
all other mappings of the same page, instead of being able to choose the
colour and cachability of the mapping.
- Adapt the physical page copying and zeroing functions to account for not
being able to choose the colour or cachability of the direct mapped
address. This adds a lot more cases to handle. Basically when a page has
a different colour than its direct mapped address we have a choice between
bypassing the data cache and using physical addresses directly, which
requires a cache flush, or mapping it at the right colour, which requires
a tlb flush. For now we choose to map the page and do the tlb flush.
This will allows the direct mapped addresses to be used for more things
that don't require normal pmap handling, including mapping the vm_page
structures, the message buffer, temporary mappings for crash dumps, and will
provide greater benefit for implementing uma_small_alloc, due to the much
greater tlb coverage.
- Added support for ITR (interrupt throttle register). This feature is available on
adapters based on 82545 and above
- Fixed problem with vlan support when traffic has priority bits set. (kern/45907)
PR: kern/45907
MFC after: 1 week
to current leaves because function may vanish the current node.
If parent RTA_GENMASK route has a clone (a "cloning clone"), an
rn_walktree_from() starting from parent will cause another walk
starting from clone. If a function is either rt_fixdelete() or
rt_fixchange(), this recursive walk may vanish the leaf that is
remembered by an outer walk (the "next leaf" above), panicing a
system when it resumes with an outer walk.
The following script paniced my single-user mode booted system:
: sysctl net.inet.ip.forwarding=1
: ipfw add 1 allow ip from any to any
: ifconfig lo0 127.1
: route add -net 10 -genmask 255.255.255.0 127.1
: telnet 10.1 # rt_fixchange() panic
: telnet 10.2
: telnet 10.1
: route delete -net 10 # rt_fixdelete() panic
For the time being, avoid these races by disallowing recursive
walks in rt_fixchange() and rt_fixdelete().
Also, make a slight optimization in the rtrequest(RTM_RESOLVE)
case: there is no reason to call rt_fixchange() in this case.
PR: kern/37606
MFC after: 5 days
I/O port range, then we should ignore a resource if it's NOT
a memory range AND NOT an I/O port range.
The OR in the condition caused us to ignore perfectly valid
memory addresses.
While here, remove redundant parenthesis and reindent the
debug print to avoid long lines.
- Put the kernel tsb before before the kernel load address, below
VM_MIN_KERNEL_ADDRESS, instead of after the kernel where it consumes
usable kva. This is magic mapped so the virtual address is irrelevant,
it just needs to be out of the way.
a mapping belongs to by setting it in the vm_page_t structure that backs
the tsb page that the tte for a mapping is in. This allows the pmap that
a mapping belongs to to be found without keeping a pointer to it in the
tte itself.
- Remove the pmap pointer from struct tte and use the space to make the
tte pv lists doubly linked (TAILQs), like on other architectures. This
makes entering or removing a mapping O(1) instead of O(n) where n is the
number of pmaps a page is mapped by (including kernel_pmap).
- Use atomic ops for setting and clearing bits in the ttes, now that they
return the old value and can be easily used for this purpose.
- Use __builtin_memset for zeroing ttes instead of bzero, so that gcc will
inline it (4 inline stores using %g0 instead of a function call).
- Initially set the virtual colour for all the vm_page_ts to be equal to their
physical colour. This will be more useful once uma_small_alloc is
implemented, but basically pages with virtual colour equal to phsyical
colour are easier to handle at the pmap level because they can be safely
accessed through cachable direct virtual to physical mappings with that
colour, without fear of causing illegal dcache aliases.
In total these changes give a minor performance improvement, about 1%
reduction in system time during buildworld.
identify themselves as serial cards that it would be desirable to
attach a different driver than sio to. Since we are claiming all
serial cards, this is not possible. Instead, return -100 to indicate
that we're willing to take the card, but still allow other drivers to
attach.
Pointed out by: Maksim Yevmenkin
associated with the syncache entry: in case tcp_close() has been
called on the corresponding listening socket, the lock has been
destroyed as a side effect of in_pcbdetach(), causing a panic when
we attempt to lock on it.
Reviewed by: hsu
The duplication is caused by the fact that imgact_elf.c is included
by both imgact_elf32.c and imgact_elf64.c and both are compiled by
default on ia64. Consequently, we have two seperate copies of the
elf_legacy_coredump variable due to them being declared static, and
two entries for the same sysctl in the linker set, both referencing
the unique copy of the elf_legacy_coredump variable. Since the second
sysctl cannot be registered, one of the elf_legacy_coredump variables
can not be tuned (if ordering still holds, it's the ELF64 related one).
The only solution is to create two different sysctl variables, just
like the elf<32|64>_trace sysctl variables. This unfortunately is an
(user) interface change, but unavoidable. Thus, on ELF32 platforms
the sysctl variable is called elf32_legacy_coredump and on ELF64
platforms it is called elf64_legacy_coredump. Platforms that have
both ELF formats have both sysctl variables.
These variables should probably be retired sooner rather than later.
not < the size of the device. This avoids geom complaints.
Fix a serious bug in the handling of the RS_NO_CLEAR_UA quirk. When we
go and insert the test-unit-ready command the umass_cam_quirk_cb() function
sets the status as if the READ_CAPACITY command suceeded when, in fact, it
did not. This leads to the CAM layer trying to use garbage in the return
buffer and panicing the system (or doing other bad things).
Add a quirk entry for MSYSTEMS DISK-ON-KEY, which is sold under the Sony
brand as a solid state disk-on-key usb device. This device requires
several quirks to work properly.
Note that the disk-on-key device will not work properly until CAM also
gets a quirk entry for it, which has been submitted to the CAM maintainer,
and you may have to temporarily uncomment the DELAY() as well. -current
does not properly wait for devices to power up so you may also have
to temporarily uncomment the DELAY(300000) to make your device work.
A solution must be found to that issue.
MFC after: 3 days
X-MFC note: the quirk support must MFCd before this patch can be
the dataphysend calculation could only possibly work if the virtual buffer
is also physically contiguous. Calculate dataphysend by calculating the
ending virtual address first, then converting to a physical address.
The second bug applies only to NetBSD and OpenBSD and involves the curlen
calculation in the two-contiguous-physical-pages case (which we don't support).
Also cleanup the use of the OHIC_PAGE() macro on dataphysend and add a panic
if len goes negative (meaning we lost the physical page translation
representing the end of the buffer).
IMHO the dataphysend is still bokered since it might be misrepresented
by shared userland page mappings. The whole section needs to be rewritten
to use the virtual address range.
MFC after: 3 days
the mbuf allocator flags {M_TRYWAIT, M_DONTWAIT}.
o Fix a bpf_compat issue where malloc() was defined to just call
bpf_alloc() and pass the 'canwait' flag(s) along. It's been changed
to call bpf_alloc() but pass the corresponding M_TRYWAIT or M_DONTWAIT
flag (and only one of those two).
Submitted by: Hiten Pandya <hiten@unixdaemons.com> (hiten->commit_count++)
Note that this is still the wrong type, but we are not ready to break the
ABI; this change simply allows programs which specify a strict SUSv3
namespace to compile. (They may still have semantic errors, since SUSv3
specifies correct types.)
Make sure sector zero is protected if it contains metadata.
Lower WARNS for gbde to 3 on non-i386 archs. rijndael-fst is evil
but appearntly does the right thing and passes the test-vectors.
MFC Candidate.
for request sizes larger than the sectorsize or for multi-key setups.
See warning mailed to current@ for details of recovery.
Found by: Marcus Reid <marcus@blazingdot.com>
o Add typedefs for pid_t, time_t, size_t and ssize_t.
o Hide struct mymsg and msgsys() in the standards case.
o Add some comments about conformance bugs.
o Sort prototypes.
mbuf for a packet looping back to provide alignment guarantees for
KAME. Unfortunately, this code performs a direct copy of the header
rather than using a header copying primitive (largely because we have
sucky header copying primitives). This results in a multiple free
of the MAC label in the header when the same label data is freed
twice when the two mbufs with that header are freed. As a temporary
work-around, clear the initialized flag on the label to prevent the
duplicate free, which prevents panics on large unaligned loopback
IP and IPv6 data. The real fix is to improve and make use of proper
packet header copying routines here.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
of the `machdep.acpi_root' sysctl. This is required on ia64
because the root pointer hardly ever, if at all, lives in the
first MB of memory and also because scanning the first MB of
memory can cause machine checks.
This provides a save and reliable way for ACPI tools to work
with the tables if ACPI support is present in the kernel. On
ia64 ACPI is non-optional.
the last second before the commit.
# likely we can remove this hack now that gcc generates better aligned code
# in the align to word case.
Noticed by: bde
subset of Peter's patchs that are believed to be safe.
Makefile tweaks:
o -fomit-frame-pointer
o Change default to building both UFS1 and UFS2 bootblocks.
Lots of boot2 tweaks:
o lookup is only ever called with kname, so use it directly.
o inline memsize
o getstr are only ever called with cmd, so hardware that.
o tweaks to the parsing code to test after the conversion rather than
before since we tested after anyways.
o eliminate support for %x in printf.
o eliminate a few bytes in printfs.
o Tweak the boot banner.
o eliminate support for wd and " " devices (I might add wd back to
keep bde happy).
o eliminate support for a few arguments.
This takes us from -162 bytes free to 67 bytes free.
I've tested this only on a few systems, so be careful when updating to
this change.
Submitted by: peter, imp, ian
end up with a dump offset that's smaller than the start of the
dump device and either clobber data in preceding partitions or
try to write beyond the end of the medium (unsigned wrap).
Implement legacy behaviour to never write to the first 64KB as
that is where metadata (ie disklabels) may reside.
it possible to use this driver under ia64, sparc64 (though
there may be endianness issues with this one) and other archs.
Tested on: i386, alpha (gallatin)
skipping read-only pages, which can result in valuable non-text-related
data not getting dumped, the ELF loader and the dynamic loader now mark
read-only text pages NOCORE and the coredump code only checks (primarily) for
complete inaccessibility of the page or NOCORE being set.
Certain applications which map large amounts of read-only data will
produce much larger cores. A new sysctl has been added,
debug.elf_legacy_coredump, which will revert to the old behavior.
This commit represents collaborative work by all parties involved.
The PR contains a program demonstrating the problem.
PR: kern/45994
Submitted by: "Peter Edwards" <pmedwards@eircom.net>, Archie Cobbs <archie@dellroad.org>
Reviewed by: jdp, dillon
MFC after: 7 days
But for some reason the block size is different when a different type of
tape is placed in the drive. This commit fixes that.
PR: 46209
Submitted by: Alex Wang <alex@alexwang.com>
Approved by: mjacob
_KERNEL scope from "src/sys/sys/mchain.h".
Replace each occurrence of the above in _KERNEL scope with the
appropriate macro from the set of hto(be|le)(16|32|64) and
(be|le)toh(16|32|64) from "src/sys/sys/endian.h".
Tested by: tjr
Requested by: comment marked with XXX
prototype for trm_detach and freeing all resources.
While I'm there, handle better errors in trm_attach and remove the
PCI_BASE_ADDR0 definition, since it's what PCIR_MAPS is used for.
MFC after: 3 days
so as to work correctly on 64-bit platforms.
Reported-by: Jake Burkholder <jake@locore.ca>
Sponsored by: DARPA & NAI Labs.
Approved by: Ian Dowse <iedowse@maths.tcd.ie>
resource starvation we clean-up as much of the vmspace structure as we
can when the last process using it exits. The rest of the structure
is cleaned up when it is reaped. But since exit1() decrements the ref
count it is possible for a double-free to occur if someone else, such as
the process swapout code, references and then dereferences the structure.
Additionally, the final cleanup of the structure should not occur until
the last process referencing it is reaped.
This commit solves the problem by introducing a secondary reference count,
calling 'vm_exitingcnt'. The normal reference count is decremented on exit
and vm_exitingcnt is incremented. vm_exitingcnt is decremented when the
process is reaped. When both vm_exitingcnt and vm_refcnt are 0, the
structure is freed for real.
MFC after: 3 weeks
Hold the page queues lock when calling pmap_unwire_pte_hold() or
pmap_remove_pte(). Use vm_page_sleep_if_busy() in
_pmap_unwire_pte_hold() so that the page queues lock is released
when sleeping.