Replace the a.out emulation of 'struct linker_set' with something
a little more flexible. <sys/linker_set.h> now provides macros for
accessing elements and completely hides the implementation.
The linker_set.h macros have been on the back burner in various
forms since 1998 and has ideas and code from Mike Smith (SET_FOREACH()),
John Polstra (ELF clue) and myself (cleaned up API and the conversion
of the rest of the kernel to use it).
The macros declare a strongly typed set. They return elements with the
type that you declare the set with, rather than a generic void *.
For ELF, we use the magic ld symbols (__start_<setname> and
__stop_<setname>). Thanks to Richard Henderson <rth@redhat.com> for the
trick about how to force ld to provide them for kld's.
For a.out, we use the old linker_set struct.
NOTE: the item lists are no longer null terminated. This is why
the code impact is high in certain areas.
The runtime linker has a new method to find the linker set
boundaries depending on which backend format is in use.
linker sets are still module/kld unfriendly and should never be used
for anything that may be modular one day.
Reviewed by: eivind
- Replace some very poorly thought out API hacks that should have been
fixed a long while ago.
- Provide some much more flexible search functions (resource_find_*())
- Use strings for storage instead of an outgrowth of the rather
inconvenient temporary ioconf table from config(). We already had a
fallback to using strings before malloc/vm was running anyway.
Following changed was made by previous commit:
- add a pointer to struct mauxtag. two integer was too restrictive.
- add m_aux_{add,find}2.
- make sure to nuke mbuf pointed to m_aux.
This work was based on kame-20010528-freebsd43-snap.tgz and some
critical problem after the snap was out were fixed.
There are many many changes since last KAME merge.
TODO:
- The definitions of SADB_* in sys/net/pfkeyv2.h are still different
from RFC2407/IANA assignment because of binary compatibility
issue. It should be fixed under 5-CURRENT.
- ip6po_m member of struct ip6_pktopts is no longer used. But, it
is still there because of binary compatibility issue. It should
be removed under 5-CURRENT.
Reviewed by: itojun
Obtained from: KAME
MFC after: 3 weeks
. remove stale comments and a stale #define (from the old days of ft(4))
. make MAX_SEC_SIZE (used in isa_dmainit()) a #define
. fix a typo in a string
. use 0 as the blocksize in devstat_add_entry(), since the actual blocksize
is unknown (devstat(9) suggests to use 0 in that case)
out nearly every platform but the one I tested on requires the intpin
to swizzle out the correct intline.
tested by: Martijn Pronk <mpkisbkl@xs4all.nl> (lca_pci)
page of the image to load section headers and if we let the text section
start at zero, it corrupts the section table when its loaded. With this
change, the loader gets as far as the 'ok' prompt.
processes a little earlier to avoid a deadlock. Second, when calculating
the 'largest process' do not just count RSS. Instead count the RSS + SWAP
used by the process. Without this the code tended to kill small
inconsequential processes like, oh, sshd, rather then one of the many
'eatmem 200MB' I run on a whim :-). This fix has been extensively tested on
-stable and somewhat tested on -current and will be MFCd in a few days.
Shamed into fixing this by: ps
greatly improved traceback code from Ross Harvey. This code
requires the use of more traceback friendly temporary labels
at kernel entry points, hence the changes to exception.s and
asm.h
Reviewed by: jhb, dfr
Obtained from: NetBSD
MFC after: 1 week
higher chips. Treat it as if it were a 113x. This is correct as far
as 16-bit cards go, at least how we're using it.
# It appears that my TI-1031 based pci card that YAMAMOTO shigeru-san gave
# me on my trip to Japan now works.
around, use a common function for looking up and extracting the tunables
from the kernel environment. This saves duplicating the same function
over and over again. This way typically has an overhead of 8 bytes + the
path string, versus about 26 bytes + the path string.
use of the extsts field in DMA descriptors. We need this to tell the chip
to calculate TCP/IP checksums in hardware on a per-packet basis.
- Fix the unions in DMA descriptor structures. Breakage on alpha led
me to realize I'd done it wrong the first time.
that device add/remove will work without usbd running. usbd is still
used for execing stuff, but that is all now. Ideally it could be replaced
by a devd some day. Until now, usbd had to be running so that the
USB_DISCOVER ioctl could be called to walk the tree when an attachment
status change was noticed.
Among the changes:
- when a detach happens, remove any pending 'attach' messages or the system
suffers from whiplash from exec moused / kill moused loops if you do lots
of attach/detach and later start usbd.
- tweaks related to kthread differences
- disable the select handler for the old interface (never return success).
I have not removed it yet or old usbd's will abort. That can get removed
later once usbd is cleaned up and things have stabilized for a few weeks.
- get Giant in the kthread.
- a couple of minor potential bug fixes (usb_nevents vs malloc failure etc)
Pre-approved by: n_hibma (ages and ages ago)
One way we can reduce the amount of traffic we send in response to a SYN
flood is to eliminate the RST we send when removing a connection from
the listen queue. Since we are being flooded, we can assume that the
majority of connections in the queue are bogus. Our RST is unwanted
by these hosts, just as our SYN-ACK was. Genuine connection attempts
will result in hosts responding to our SYN-ACK with an ACK packet. We
will automatically return a RST response to their ACK when it gets to us
if the connection has been dropped, so the early RST doesn't serve the
genuine class of connections much. In summary, we can reduce the number
of packets we send by a factor of two without any loss in functionality
by ensuring that RST packets are not sent when dropping a connection
from the listen queue.
Submitted by: Mike Silbersack <silby@silby.com>
Reviewed by: jesper
MFC after: 2 weeks
not behaving correctly. Fix by attaching to the correct socket.
Also call so{rw}wakeup in addition to the fifo wakeup, so that any
kqfilters attached to the socket buffer get poked.
is usually (always?) used in expressions like (KTR_COMPILE & KTR_FOO).
Defining it as KTR_INTR|KTR_PROC gave the wrong value in approximately
8497 places according to error output for compiling LINT.
ep driver. The rest of the patch will wait until I can put the time
into it to get it righter than the kludge it is.
This protects us against card eject problems at all times,e xecpt when
we're in the epintr ISR.
PCN_BCR_CLRBIT(sc, PCN_BCR_MIICTL, PCN_MIICTL_DANAS);
should be:
PCN_BCR_SETBIT(sc, PCN_BCR_MIICTL, PCN_MIICTL_DANAS);
Turning this bit on is what disables MII autoneg, not turning it off.
Without this, manually setting the media doesn't work.
Noticed by: Jim Browne <jbrowne@jbrowne.com>
true. This permits better interoperability with programs which register
filters on their stdin/stdout handles.
Submitted by: Niels Provos <provos@citi.umich.edu>
I think bde even reviewed it once.
Also, change the name of ActionTEC pat to more generic Lucent Kermit
chip. Add stub for Xircom card. Add cardbus attachment too.
compliant. All the variable definitions and function names are
reasonably consistent, and the functions which should be static (i.e.,
all of them) are. Other assorted fixes were made. The majority of
the delta is indentation fixes.
Partially reviewed by: bde
incorrect due to a missing check for some dependency. This change
avoids the freelist corruption (but not the temporarily inconsistent
state of the file system).
A message is printed as a reminder of the under lying problem when a
pagedep structure is not freed due to the NEWBLOCK flag being set.
Submitted by: Tor.Egge@fast.no
can be made userland-visible as <dev/ic/...>. Also, those files are
not supposed to contain any bus-specific details at all, so placing
them under .../isa/ has been a misnomer from the beginning.
The files in src/sys/dev/ic/ have been repo-copied from their old
location (this commit is a forced null commit there to record this
message).
elected to do this in the probe rather than the attach so that we don't
disturb things which this might reset. different cards have different
quirks, according to their datasheets.
This should fix the "I booted in windows and rebooted to FreeBSD and
now things don't work" problem.
PR: 4847, 20670
route in ifa_ifwithroute(), as the last resort, look up the route to
the gateway, not destination (to derive the interface from).
PR: kern/27852
Submitted by: Iasen Kostoff <tbyte@tbyte.org>
MFC after: 2 weeks
card bus bridges.
We now always use pci interrupts for pci cards. This will allow us to
more easily configure things. You must change your IRQ lines in
/etc/pccard.conf to match what we've probed. I'm not sure the right
way to deal with this right now.
Development of pci pcmcia has been funded by Monzoon Networks AG. I
am grateful for their generosity.
certain cases, and a close() by another process could potentially rip the
pipe out from under the (blocked) locking operation.
Reported-by: Alexander Viro <viro@math.psu.edu>
for card change interrupts is different than the pci stuff that's
coming soon. Set the management irq in different ways. If
pci_parallel interrutp routing, then use the PCI way of getting
interrupts. Move polling mode into pcic_isa since when we're routing
via pci polling doesn't work because many bridges (systems hang solid).
If we're routing interrupts via pci, they can be shared, so flag them
as such.
Note, this doesn't actually change anything since the pci attachment
isn't quite ready to be committed.
A attacker sending a lot of bogus fragmented packets to the target
(with different IPv4 identification field - ip_id), may be able
to put the target machine into mbuf starvation state.
By setting a upper limit on the number of reassembly queues we
prevent this situation.
This upper limit is controlled by the new sysctl
net.inet.ip.maxfragpackets which defaults to 200,
as the IPv6 case, this should be sufficient for most
systmes, but you might want to increase it if you have
lots of TCP sessions.
I'm working on making the default value dependent on
nmbclusters.
If you want old behaviour (no upper limit) set this sysctl
to a negative value.
If you don't want to accept any fragments (not recommended)
set the sysctl to 0 (zero).
Obtained from: NetBSD
MFC after: 1 week
if_up() must be called at splnet or higher.
Second, set the IFF_RUNNING flag on an interface after its
resources (i.e. tunnel source and destination addresses)
have been set. Note that we don't set IFF_UP because it is
if_up()'s job to do that.
PR: kern/27851
Submitted by: Horacio J. PeÓa <horape@compendium.com.ar>
The 3C509-TX card apparently had a slightly different version of the
chip, and has problems when this register is set. The problem does
not appear on the 3C509{BC} cards, but since only the fxp driver needs
specific bits set, conditionalize on that.
canonical: define a versioned struct xswdev, and add a sysctl node
handler that allows the user to get this structure for a certain device
index by specifying this index as last element of the MIB.
This new node handler, vm.swap_info, replaces the old vm.nswapdev
and vm.swapdevX.* (where X was the index) sysctls.
memory I/O space. Otherwise, our resource allocation system might
mistakenly assign pccard, plug and play devices or other things
addresses that conflict with ROMs.
I cleaned up his code a little from the submited driver: style(9)
issues, commentary on why something that looks incorrect really is
correct. Also noted that while a checksum field is defined for the
ROMs, enough common hardware neglects it to make it not worthwhile
checking.
Submitted by: Nikolai Saoukh <nms@otdel-1.org>
PR: 22078
the interface conversion to platform.pci_intr_route(). I've left the
platform.pci_intr_route() function pointer in place, as well as
alpha_pci_route_interrupt(), but no platform currently implements it.
To work around the removal of alpha_platform_assign_pciintr(cfg);
from the pci probe code, I've hooked in calls to platform.pci_intr_map()
in pcib_read_config (similar to the x86 APIC_IO ifdef in pci_cfgregread)
for every chipset that has a platform which needs it.
While here, I've removed the interupt mapping/routing code from the
AS2x00 platform because its not required (it has never been present in
-stable).
Tested on: UP1000, Miata(GL), XP1000, AS2100, AS500
Only tun0 -> tun32767 may now be opened as struct ifnet's if_unit
is a short.
It's now possible to open /dev/tun and get a handle back for an available
tun device (use devname to find out what you got).
The implementation uses rman by popular demand (and against my judgement)
to track opened devices and uses the new dev_depends() to ensure that
all make_dev()d devices go away before the module is unloaded.
Reviewed by: phk
- move the sysctl code to kern_intr.c
- do not use INTRCNT_COUNT, but rather eintrcnt - intrcnt to determine
the length of the intrcnt array
- move the declarations of intrnames, eintrnames, intrcnt and eintrcnt
from machine-dependent include files to sys/interrupt.h
- remove the hw.nintr sysctl, it is not needed.
- fix various style bugs
Requested by: bde
Reviewed by: bde (some time ago)
This closes a minor information leak which allows a remote observer to
determine the rate at which the machine is generating packets, since the
default behaviour is to increment a counter for each packet sent.
Reviewed by: -net
Obtained from: OpenBSD
setting the 'max packet size' register in window 3. This only
works for cards based on the cyclone or newer chipsets (i.e. it
won't work with the original 3c905/boomerang cards).
There is a trick which will work with the boomerang, which is to turn
on the 'large packets ok' bit in the MAC control register, however this
lets the chip accept any frame up to 4K in length, which is larger than
the mbuf cluster buffers we use to receive frames. If somebody sends us
such a frame and the chip DMAs it to us, it could write past the end
of the cluster buffer and clobber something.
PR: kern/27742
A attacker sending a lot of bogus fragmented packets to the target
(with different IPv4 identification field - ip_id), may be able
to put the target machine into mbuf starvation state.
By setting a upper limit on the number of reassembly queues we
prevent this situation.
This upper limit is controlled by the new sysctl
net.inet.ip.maxfragpackets which defaults to NMBCLUSTERS/4
If you want old behaviour (no upper limit) set this sysctl
to a negative value.
If you don't want to accept any fragments (not recommended)
set the sysctl to 0 (zero)
Obtained from: NetBSD (partially)
MFC after: 1 week
all alphas with devices behind ppb's. I'm working on a better solution now.
Note that all alphas that use per-platform interrupt mapping are broken
again (as they have been for several months)
gigabit ethernet controller chip. This device is used on some
fiber optic gigE cards from SMC, D-Link and Addtron. Jumbograms and
TCP/IP checksum offload on receive are supported. Hardware VLAN
filtering is not, because it doesn't play well with our existing
VLAN code. Also add manual page.
There is a 4.x version of this driver available at
http://www.freebsd.org/~wpaul/Level1/4.x if anyone feels adventurous
and wants to test it. I still need to do performance testing and
tuning with this device.
(For my next trick, I will make the 3Com 3cR990 sit up and beg.)
any response to our third SYN to work-around some broken
terminal servers (most of which have hopefully been retired)
that have bad VJ header compression code which trashes TCP
segments containing unknown-to-them TCP options.
PR: kern/1689
Submitted by: jesper
Reviewed by: wollman
MFC after: 2 weeks
this works on cs4630 chips, and should implement the clkrun hack for
thinkpads- this will display diagnostic messages when triggered until its
correctness is established.
from cpu_switch(), curproc has been changed, but the sched_lock owner will
not be updated until we return to mi_switch(), thus we deadlock against
ourselves. As a workaround, push the acquire and release of sched_lock out
to the callers of set_user_ldt(). Note that we can't use a mtx_assert() in
set_user_ldt for the same reason.
Sleuting by: tmm
Tested by: tmm, dougb
For FTP control connection, keep the CRLF end-of-line termination
status in there.
Fixed the bug when the first FTP command in a session was ignored.
PR: 24048
MFC after: 1 week
* all members of msginfo from sysv_msg.c;
* msqids from sysv_msg.c;
* sema from sysv_sem.c; and
* shmsegs from sysv_shm.c;
These will be used by ipcs(1) in non-kvm mode.
Reviewed by: tmm
they can be used with cell operators like !.
As I did this, I noticed the whole CELL thing might have problems with
big endian architectures with sizeof(int)!=sizeof(void*).
UDP checksums too, not just IP. The chip only tells us if the checksum
is ok, it does not give us a copy of the partial checksum for later
processing. We have to deal with this the right way, but we can deal
with it.
- Use __func__ instead of __FUNCTION.
- Support power-off to S3 or S5 (takawata)
- Enable ACPI debugging earlier (with a sysinit)
- Fix a deadlock in the EC code (takawata)
- Improve arithmetic and reduce the risk of spurious wakeup in
AcpiOsSleep.
- Add AcpiOsGetThreadId.
- Simplify mutex code (still disabled).
attempting to remove nonexistant exports with MNT_DELEXPORT returns
an error; before this change it always succeeded. This caused
mountd(8) to log "can't delete exports for /whatever" warnings.
Change the error code from EINVAL to a more specific ENOENT, and
make mountd ignore this error when deleting the export list. I
could have just restored the previous behaviour of returning success,
but I think an error return is a useful diagnostic.
Reviewed by: phk
----
Make a device for each ISP- really usable only with devfs and add an ioctl
entry point (this can be used to (re)set debug levels, reset the HBA,
rescan the fabric, issue lips, etc).
----
Add in a kernel thread for Fibre Channel cards. The purpose of this
thread is to be woken up to clean up after Fibre Channel events
block things. Basically, any FC event that casts doubt on the
location or identify of FC devices blocks the queues. When, and
if, we get the PORT DATABASE CHANGED or NAME SERVER DATABASE CHANGED
async event, we activate the kthread which will then, in full thread
context, re-evaluate the local loop and/or the fabric. When it's
satisfied that things are stable, it can then release the blocked
queues and let commands flow again.
The prior mechanism was a lazy evaluation. That is, the next command
to come down the pipe after change events would pay the full price
for re-evaluation. And if this was done off of a softcall, it really
could hang up the system.
These changes brings the FreeBSD port more in line with the Solaris,
Linux and NetBSD ports. It also, more importantly, gets us being
more proactive about topology changes which could then be reflected
upwards to CAM so that the periph driver can be informed sooner
rather than later when things arrive or depart.
---
Add in the (correct) usage of locking macros- we now have lock transition
macros which allow us to transition from holding the CAM lock (Giant)
and grabbing the softc lock and vice versa. Switch over to having this
HBA do real locking. Some folks claim this won't be a win. They're right.
But you have to start somewhere, and this will begin to teach us how
to DTRT for HBAs, etc.
--
Start putting in prototype 2300 support. Add back in LIP
and Loop Reset as async events that each platform will handle.
Add in another int_bogus instrumentation point.
Do some more substantial target mode cleanups.
MFC after: 8 weeks
it becomes possible to trap in ptsstop() in kern/tty_pty.c
if the slave side has never been opened during the life of a kernel.
What happens is that calls to ttyflush() done from ptyioctl() for the
controlling side end up calling ptsstop() [via (*tp->t_stop)(tp, <X>)]
which evaluates the following:
struct pt_ioctl *pti = tp->t_dev->si_drv1;
In order for tp->t_dev to be set, the slave device must first be
opened in ttyopen() [kern/tty.c].
It appears that the only problem is calls to (*tp->t_stop)(tp, <n>),
so this could also happen with other ioctls initiated by the
controlling side before the slave has been opened.
PR: 27698
Submitted by: David Bein bein@netapp.com
MFC after: 6 days
it with vfsload("msdos").
(The proper fix would be to rename the `msdos' file system to
`msdosfs' in VFS_SET(), and mount_msdos(8) to mount_msdosfs(8).
But that would break too many existing fstab(5) setups, and
would require a lot of unnecessary documentation and code
msdos -> msdosfs changes.)
Noticed by: markm
csc_route and func_route to hold the way that each interrupt is
routed. csc is Card Status Change in the datasheets and standard, but
is called "Management Interrupt" in FreeBSDese. There are three types
of interrupt routing: ISA parallel, PCI parallel and ISA serial (some
chipsets support other types as well, but I don't plan on supporting
them).
When we try to allocate an interrupt, and the type for that interrupt
is pci_parallel, allow it to be shared by oring in RF_SHAREABLE to the
flags argument. Introduce pcic_alloc_resource to allow this to
happen.
file is processed by passing its name in argv[1]:
return(mod_loadobj(typestr, argv[1]));
however, it is not tested to see if argv[1] actually is defined.
At best, mod_loadobj() near line 244 returns an error like
"can't find 'garbage'" but if the "filename" entered is sufficiently
long, some buffer gets overrun. Of course, "load -t filename" is
actually a typo because we meant to type "load -t mfs_root filename";
nevertheless, a hung machine seems like too harsh a punishment for
such a small typo...
PR: i386/27693
Submitted by: Adrian Steinmann <ast@marabu.ch>
MFC after: 1 week
breakage:
- call PCIB_ROUTE_INTERRUPT() regardless of how valid the intline looks.
Some alphas leave garbage in the intline and leave the intr mapping
to OS platform support routines that map slots/buses to intlines
- Down in the alpha pci code, first try platform.pci_intr_route() and
if it doesn't exist or returns garbage, just read the intline out of
config space.
tested on AS500 (garbage in intline) and UP1000 (PC-like, intline is valid)
Note that a nice little hack like the APIC_IO section of pci_cfgregread()
is not workable. This is because the calling interface for
alpha_pci_route_interrupt() requires us to figure out the bus/slot/etc
from a device_t. At pci_read_device() time, we don't have a device_t
for the bus/slot/func in question.
instead of using two malloced arrays for storing channel lists, use an
slist. convert the sndstat device to use sbufs and optionally provide more
detail about channel state.
vchans are software mixed playback channels. they are not enabled by this
commit. they use the feeder infrastructure to emulate normal playback
channels in a manner transparent to applications, whilst providing as many
channels are desired, especially suitable for devices with only one hardware
playback channel. in the future they will provide additional features.
those wishing to test this functionality will need to add vchan.c to
sys/conf/files and use 'sysctl -w hw.snd.pcm0.vchans' to enable it.
blocksize and auto-rate selection are not yet supported.
SC_DEV isn't NULL; if it is, evaluate to NULL and don't dereference
NULL. Callers of VIRTUAL_TTY must already check for the result being
NULL since si_tty can be NULL, so this should be safe.
This fixes a panic when trying to switch to a different vty in an
environment such as userconfig (-c option to the kernel).
PR: 26508
the saved uid and gid during execve(). Unfortunately, the optimizations
were incorrect in the case where the credential was updated, skipping
the setting of the saved uid and gid when new credentials were generated.
This change corrects that problem by handling the newcred!=NULL case
correctly.
Reported/tested by: David Malone <dwmalone@maths.tcd.ie>
Obtained from: TrustedBSD Project
despite the fact that most people want to set exactly the same settings
regardless of which card they have. It has been repeatidly suggested
that this configuration should be done via ifconfig. This patch
implements the required functionality in ifconfig and add support to the
wi and an drivers. It also provides partial, untested support for the
awi driver.
PR: 25577
Submitted by: Brooks Davis <brooks@one-eyed-alien.net>
dev_t. The dev_depends(dev_t, dev_t) function is for tying them
to each other.
When destroy_dev() is called on a dev_t, all dev_t's depending
on it will also be destroyed (depth first order).
Rewrite the make_dev_alias() to use this dependency facility.
kern/subr_disk.c:
Make the disk mini-layer use dependencies to make sure all
relevant dev_t's are removed when the disk disappears.
Make the disk mini-layer precreate some magic sub devices
which the disk/slice/label code expects to be there.
kern/subr_disklabel.c:
Remove some now unneeded variables.
kern/subr_diskmbr.c:
Remove some ancient, commented out code.
kern/subr_diskslice.c:
Minor cleanup. Use name from dev_t instead of dsname()
by both OLDCARD and NEWCARD.
# didn't make the tables the same because oldcard supports more devices than
# newcard and newcard's 16-bit stuff needs some work.
initialized on the file system before trying to grab the lock of the
per-mount extattr structure, as this lock is unitialized in that case.
This is needed because ufs_extattr_vnode_inactive is called from
ufs_inactive, which is also used by EA-unaware file systems such as
ext2fs.
Reviewed by: rwatson
real uid, saved uid, real gid, and saved gid to ucred, as well as the
pcred->pc_uidinfo, which was associated with the real uid, only rename
it to cr_ruidinfo so as not to conflict with cr_uidinfo, which
corresponds to the effective uid.
o Remove p_cred from struct proc; add p_ucred to struct proc, replacing
original macro that pointed.
p->p_ucred to p->p_cred->pc_ucred.
o Universally update code so that it makes use of ucred instead of pcred,
p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo,
cr_{r,sv}{u,g}id instead of p_*, etc.
o Remove pcred0 and its initialization from init_main.c; initialize
cr_ruidinfo there.
o Restruction many credential modification chunks to always crdup while
we figure out locking and optimizations; generally speaking, this
means moving to a structure like this:
newcred = crdup(oldcred);
...
p->p_ucred = newcred;
crfree(oldcred);
It's not race-free, but better than nothing. There are also races
in sys_process.c, all inter-process authorization, fork, exec, and
exit.
o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid;
remove comments indicating that the old arrangement was a problem.
o Restructure exec1() a little to use newcred/oldcred arrangement, and
use improved uid management primitives.
o Clean up exit1() so as to do less work in credential cleanup due to
pcred removal.
o Clean up fork1() so as to do less work in credential cleanup and
allocation.
o Clean up ktrcanset() to take into account changes, and move to using
suser_xxx() instead of performing a direct uid==0 comparision.
o Improve commenting in various kern_prot.c credential modification
calls to better document current behavior. In a couple of places,
current behavior is a little questionable and we need to check
POSIX.1 to make sure it's "right". More commenting work still
remains to be done.
o Update credential management calls, such as crfree(), to take into
account new ruidinfo reference.
o Modify or add the following uid and gid helper routines:
change_euid()
change_egid()
change_ruid()
change_rgid()
change_svuid()
change_svgid()
In each case, the call now acts on a credential not a process, and as
such no longer requires more complicated process locking/etc. They
now assume the caller will do any necessary allocation of an
exclusive credential reference. Each is commented to document its
reference requirements.
o CANSIGIO() is simplified to require only credentials, not processes
and pcreds.
o Remove lots of (p_pcred==NULL) checks.
o Add an XXX to authorization code in nfs_lock.c, since it's
questionable, and needs to be considered carefully.
o Simplify posix4 authorization code to require only credentials, not
processes and pcreds. Note that this authorization, as well as
CANSIGIO(), needs to be updated to use the p_cansignal() and
p_cansched() centralized authorization routines, as they currently
do not take into account some desirable restrictions that are handled
by the centralized routines, as well as being inconsistent with other
similar authorization instances.
o Update libkvm to take these changes into account.
Obtained from: TrustedBSD Project
Reviewed by: green, bde, jhb, freebsd-arch, freebsd-audit
Add a CAPI (hardware independent) driver i4bcapi(4) and hardware driver
iavc (4) to support active CAPI-based BRI and PRI cards (currently AVM
B1 and T1 cards) to isdn4bsd.
interrupts on other buses. Right now it isn't used, but will be for
the pci attachment.
# Add copyright by me for this year since I've changed so much.
Tor created a while ago, removes the raw I/O piece (that has cache coherency
problems), and adds a buffer cache / VM freeing piece.
Essentially this patch causes O_DIRECT I/O to not be left in the cache, but
does not prevent it from going through the cache, hence the 80%. For
the last 20% we need a method by which the I/O can be issued directly to
buffer supplied by the user process and bypass the buffer cache entirely,
but still maintain cache coherency.
I also have the code working under -stable but the changes made to sys/file.h
may not be MFCable, so an MFC is not on the table yet.
Submitted by: tegge, dillon
o If the class is PCIC_BRIDGE, subclass is PCIS_BRIDGE_PCMCIA and
programming interface is 0, assume that it is a generic PCMCIA PCI
chip we can program. I don't think there are any of these that
we don't know about, but you never know.
o If the class is PCIC_BRIDGE, subclass is PCIS_BRIDGE_CARDBUS and
programming interface is 0, assume that it is a YENTA cardbus bridge
that we know how to cope with. There are likely some cardbus bridges
that haven't it made it in here yet.
pcic_{get,put}b_io. There are some pci bridges (the CL-PD6729 and
maybe others) that do not have memory mapped registers, so we'll need
these in both places. Declare them in pcicvar.h.
- Assert Giant in vm_pageout_scan() for the vnode hacking that it does.
- Don't hold vm_mtx around vget() or vput().
- Lock Giant when calling vm_pageout_scan() from the pagedaemon. Also,
lock curproc while setting the P_BUFEXHAUST flag.
- For now we still hold Giant for all of the vm_daemon. When process
limits are locked we will be only need Giant for swapout_procs().
- Restore the previous order of setting up a new vm_object. The previous
had a small bug where we zero'd out the flags after we set the
OBJ_ONEMAPPING flag.
- Add several asserts of vm_mtx.
- Assert Giant is held rather than locking and unlocking it in a few
places.
- Add in some #ifdef objlocks code to lock individual vm objects when
vm objects each have their own lock someday.
- Don't bother acquiring the allproc lock for a ddb command. If DDB
blocked on the lock, that would be worse than having an inconsistent
allproc list.
- Add a few KTR tracepoints to track the addition and removal of
vm_map_entry's and the creation adn free'ing of vmspace's.
- Adjust a few portions of code so that we update the process' vmspace
pointer to its new vmspace before freeing the old vmspace.
- Don't lock Giant in the scheduler() function except for when calling
faultin().
- In swapout_procs(), lock the VM before the proccess to avoid a lock order
violation.
- In swapout_procs(), release the allproc lock before calling swapout().
We restart the process scan after swapping out a process.
- In swapout_procs(), un #if 0 the code to bump the vmspace reference count
and lock the process' vm structures. This bug was introduced by me and
could result in the vmspace being free'd out from under a running
process.
- Fix an old bug where the vmspace reference count was not free'd if we
failed the swap_idle_threshold2 test.
acquired.
- Assert Giant is held in the strategy, getpages, and putpages methods and
the getchainbuf, flushchainbuf, and waitchainbuf functions.
- Always call flushchainbuf() w/o the VM lock.
- Always call vfs_setdirty() with vm_mtx held.
- Fix an old comment: vm_hold_unload_pages is called vm_hold_free_pages()
nowadays.
- Always call vm_hold_free_pages() w/o vm_mtx held.
vnodes.
- Fix an old bug that would leak a reference to a fd if the vnode being
mmap'd wasn't of type VREG or VCHR.
- Lock Giant in vm_mmap() around calls into the VM that can call into
pager routines that need Giant or into other VM routines that need
Giant.
- Replace code that used a goto to jump around the else branch of a test
to use an else branch instead.
the built-in 1000baseX interface in the Level 1 LXT1001 chip. The Level 1
PHY comes up with the isolate bit in the control register set by default,
but it also has the autonegotiate bit set. When you tell the xmphy driver
to select IFM_AUTO mode, it sees that the autoneg bit is already on, and
thus doesn't bother updating the control register. However this means that
the isolate bit is never turned off (unless you manually select 1000baseSX
full or half duplex mode, which does result in the control register being
modified and the ISO bit being turned off).
This subtle and unusual behavioral difference stopped me from being able
to receive packets on the SMC9462TX card for several days, since isolating
the PHY disconnects it from the MAC's data interface. The fix is to omit
the 'is the autoneg big set?' test, since it doesn't really provide much
of an optimization anyway.
This commit also updates the xmphy driver to support the Jato/Level 1
internal PHY. (I'm not sure how Jato Technologies is related to Level 1:
all I know is the OUI from the PHY ID registers maps to Jato in the OUI
database.) This will be used once I add the if_lge driver to support
the LXT10010 chip.
- Don't bother releasing Giant while doing a lookup on the vm_map of
initproc while starting up init. We have to grab it again right after
the lookup anyways.
the chipset. This is already how the multi-hose systems handle resource
allocation and it fixes a bug where dense and bwx memory allocations were
not handled properly.
Reviewed by: gallatin
at the top of the minute, whichever comes first. It seems
logtimeout() is only called once after the kernel log is opened
and then never again after that. So I guess syslogd only gets
kernel log messages by virtue of syncer(4)'s flushes ...?
PR: 27361
Submitted by: pkern@utcc.utoronto.ca
MFC after: 1 week
This fixes a number of warnings relating to removed cloned devices.
It also makes it possible to recreate deleted devices with
mknod(2). The major/minor arguments are ignored.
systems were repo-copied from sys/miscfs to sys/fs.
- Renamed the following file systems and their modules:
fdesc -> fdescfs, portal -> portalfs, union -> unionfs.
- Renamed corresponding kernel options:
FDESC -> FDESCFS, PORTAL -> PORTALFS, UNION -> UNIONFS.
- Install header files for the above file systems.
- Removed bogus -I${.CURDIR}/../../sys CFLAGS from userland
Makefiles.
requiring fewer header files for userland programs.
Remove the gross debug device/non-debug device hack used to recognize
whether the kernel module was in sync with the userland module.
compiled with debug support. This can be used by userland programs to
recognize which ioctls the module supports.
As a result, remove the gross debug device/non-debug device hack used
to recognize whether the kernel module was in sync with the userland
module.
Replace explicit references to major/minor numbers of vinum
superdevice with the VINUM_SUPERDEV macro written for that purpose.
have a slightly different 3.3V support than the other clones, so
compensate as best we can. Note: 3.3V support is untested since I do
not have any 3.3V cards that I know of to test it with.
needs instead of relying on idiosyncratic hacks in the tty subsystem.
Also add module code since this can now be compiled as a module.
Silence by: -hackers, -audit
simpler for npx exceptions that start as traps (no assembly required...)
and works better for npx exceptions that start as interrupts (there is
no longer a problem for nested interrupts).
Submitted by: original (pre-SMPng) version by luoqi
shm_deallocate_segment because shmexit_myhook calls it, and the latter
should always be called with it already held.
Submitted by: dwmalone, dd
Approved by: alfred
gets incremented every time the kernel-userland interface changes.
This enables vinum(8) to check for the correct kernel version and to
produce a useful message if it doesn't match.
Requested by: Too many to count.
Move the definitions of struct drive, sd, plex and volume to
vinumobj.h.
Add a new debug flag, DEBUG_LOCKREQS, which logs only lock requests.
with more than one plex, the data will be accessed
multiple times. During this time, userland code could
potentially modify the buffer, thus causing data
corruption. In the case of a multi-plexed volume this
might be cosmetic, but in the case of a RAID-[45] plex it
can cause severe data corruption which only becomes
evident after a drive failure. Avoid this situation by
making a copy of the data buffer before using it.
Note that this solution does not guarantee any particular
content of the buffer, just that it remains unchanged for
the duration of the request.
Suggested by: alfred
Use this instead of DEBUG_LASTREQS to decide whether to log lock
requests.
MFS:
vinumlock: Catch a potential race condition where one process is
waiting for a lock, and between the time it is woken and
it retries the lock, another process gets it and places it
in the first entry in the table.
This problem has not been observed, but it's possible, and
it's easy enough to fix.
Submitted by: tegge
vinumunlock: Catch a real bug capable of hanging a system. When
releasing a lock, vinumunlock() called wakeup_one. This
caused wakeups to sometimes get lost. After due
consideration, we think that this is due to the fact that
you can't guarantee that some other process is also
waiting on the same address. This makes wakeup_one a
very dangerous function to use.
Requested by: bde
Add retryerrors keyword.
vinum_scandisk: Print a different message if an inadvertent start
command did not find any additional drives. The previous message "no
drives found" confused and worried many people.
MFS:
vinum_open: Recognize Mylex devices as storage devices.
In case of error, check the VF_RETRYERRORS flag in the subdisk and
don't take the subdisk down if it's set, just retry the I/O.
Requested by: peter
If the buffer has been copied (XFR_COPYBUF), release the copied
buffer when the I/O completes.
Suggested by: alfred
Desired by: bde
This commit is the first of a general cleanup of the header files..
It won't be enough to make bde happy.
Move debug definitions from vinumhdr.h.
Create a new struct rangelockinfo. In revision 1.21 of vinumlock.c,
the plex info was removed from struct rangelock, since it wasn't
needed there. It *is* needed for trace information, however, so use
struct rangelockinfo for that.
- Don't release the vm mutex early in pipespace() but instead hold it
across vm_object_deallocate() if vm_map_find() returns an error and
across pipe_free_kmem() if vm_map_find() succeeds.
- Add a XXX above a zfree() since zalloc already has its own locking,
one would hope that zfree() wouldn't need the vm lock.
flags if it is safe to do so, otherwise it will just alter the pmap state
(eg, clear the appropriate PG_FOx bits).
This gets alpha booting in the face of the vm_mtx introduction.
Reviewed by: dfr
- pc98_getmemsize() function returns available memory size under 16MB.
- getmemsize() function is merged from PC-AT's one.
Submitted by: chi@bd.mbn.or.jp (Chiharu Shibata) and
NOKUBI Hirotaka <nokubi@ff.iij4u.or.jp>
Reviewed by: hm
Bug in i4btel driver read routine corrected. The conditions in the
while() clause caused the receive queue to be referenced before checking
if a channel is connected, leading to kernel panic (do a 'dd
if=/dev/i4btel0 of=/dev/null' on an unconnected tel device, panic will
follow). Correction was to reorder the while clause conditions to check
for connectedness first.
Work through the various power commands and convert them from a "is
this a foo controller or a foo' controller or a foo''' controller" to
a cabability based scheme. We have bits in the softc that tell us
what kind of power control scheme the controller uses, rather than
relying on being able to enumerate them all. Cardbus bridges are
numerous, but nearly all implement the i82365sl-DF scheme (well, a few
implement cirrus CL-PD67xx, but those were made by Cirrus Logic!).
Add a pointer back to the softc in each pcic_slot so we can access
these flags.
Add comments that talk about the issues here. Also note in passing
that there are two differ Vpp schemes in use and that we may need to
adjust the code to deal with both of them. Note why it usually works
now.
We have 5 power management modes right now: KING, AB, DF, PD and VG.
AB is for the i82365 stpes A, B and C. DF is for step DF. PD is the
cirrus logic extensions for 3.3V while VG is the VADEM extensions for
3.3V. KING is for the IBM KING controller found on some old cards.
# I'm looking for one of those old cards or a laptop that has the KING
# bridge in it.
We have to still cheat and treat the AB parts like the DF parts
because pci isn't here yet. As far as I can tell, this is harmless
for actual old parts and necessary to work with 3.3V cards in some
laptops.
This almost eliminates all tests for controller in the code. There
are still a few unrelated to power that need taming as well.
o Introduce flags word to the softc. This will be used to control various
aspects of the driver. Right now there are two bits defined, PCIC_IO_MAPPED
and PCIC_MEM_MAPPED. One for ISA cards that are I/O mapped, the other is
for PCI cards that are memory mapped. Only the ISA side is implemented
with this commit.
o Introduce a pcic_dealloc which will cleanly dealloc resources used. Right
now it is only supported when called from probe/attach.
o Keep track of resources allocated in the pcic_softc.
o move pcictimeout_ch to the softc so we can support multiple devices
in polling mode.
o In ISA probe, set PCIC_IO_MAPPED.
o Introduce and compute the slot mask. This will be used later when
we expand the number of slots on ISA from 2 to 4. In such a case, we
appear to have to use polling mode otherwise we get two different cards
trying to drive the same interrupt line. I don't have hardware to
test this configuration, so I'll stop here.
o Add defines for the VS[12]# bits in register 0x16.
o Add comment about what we're doing reading register 0x16 (PCIC_CDGC)
in the DF case.
o Check bit VS1# rather than a random bit I was checking due to a bogus
transcrition on my part from nakagawa-san's article.
o Add note about IBM KING and 3.3V operation from information larned from
wildboard.
npxsave() went to great lengths to excecute fnsave with interrupts
enabled in case executing it froze the CPU. This case can't happen,
at least for Intel CPU/NPX's. Spurious IRQ13's don't imply spurious
freezes. Anyway, the complications were usually no-ops because IRQ13
is not used on i486's and newer CPUs, and because SMPng broke them in
rev.1.84. Forcible enabling of interrupts was changed to
write_eflags(old_eflags), but since SMPng usually calls npxsave() from
cpu_switch() with interrupts disabled, write_eflags() usually just
kept interrupts disabled.
npxinit() didn't have the usual race because it doesn't save to curpcb,
but it may have had a worse form of it since it uses the npx when it
doesn't "own" it. I'm not sure if locking prevented this. npxinit()
is normally caled with the proc lock but not sched_lock.
Use a critical region to protect pushing of curproc's npx state to
curpcb in npxexit(). Not doing so was harmless since it at worst
saved a wrong state to a dieing pcb.
Not doing this was fairly harmless because savectx() is only called
for panic dumps and the bug could at worse reset the state.
savectx() is still missing saving of (volatile) debug registers, and
still isn't called for core dumps.
machines. The code formerly read:
long val;
if (val < (long)-0x80000000 || ...)
return EINVAL;
The constant 0x80000000 has type unsigned int. The unary `-'
operator does not change the type (or the value, in this case).
Therefore the promotion to long is done by 0-extension, giving
0x0000000080000000 instead of the desired 0xffffffff80000000. I
got rid of the `-' and changed the cast to (int32_t) to give proper
sign-extension on all architectures and to better reflect the fact
that we are range-checking a 32-bit value.
This commit also makes the analogous changes to ng_int{8,16}_parse
for consistency.
MFC after: 3 days
committed to disk before clearing them. More specifically, when
free_newdirblk is called, we know that the inode claims the new
directory block. However, if the associated pagedep is still linked
onto the directory buffer dependency chain, then some of the entries
on the pd_pendinghd list may not be committed to disk yet. In this
case, we will simply note that the inode claims the block and let
the pd_pendinghd list be processed when the pagedep is next written.
If the pagedep is no longer on the buffer dependency chain, then
all the entries on the pd_pending list are committed to disk and
we can free them in free_newdirblk. This corrects a window of
vulnerability introduced in the code added in version 1.95.
things to get 3.3V. It appears that some cardbus chipsets have id
registers that say they are C step parts, but they really support the
DF step 3.3V functionality.
# Need to verify that IBM KING is handled properly since the MISC1
# register is really a cirrus logic only register.
82C146. The Intel i82365SL-DF supports 3.3V cards. The Step A/B/C
parts do not appear to support this. This is hard to know for sure
since it was deduced from "compatible" parts' data sheets and the
article mentioned below.
Rework the VLSI detection to be a little nicer and not depend on
scanning cards twice. This would allow bad VLSI cards to coexist with
a good intel card, for example. We now detect i82365SL-DF cards where
before we'd detect a VLSI. For the most part, this is good, but we
run a small chance of detecting a single slot 82C146 as a i82365SL-DF.
Since I can't find a datasheet for the 82c146, I don't know if this is
a problem or not.
This work is based on an excellent article, in Japanese, by NAKAGAWA,
Yoshihisa-san that appeared in FreeBSD Press Number 4. He provided a
patch against PAO3 in his article. Since the pcic.c code has changed
some since then, I've gone ahead and cleaned up his patch somewhat and
changed how the code detects the buggy '146 cards.
I also removed the comment asking if there were other cards that
matched the 82C146 since we found one and additional information isn't
necessary.
vm_mtx does not recurse and is required for most low level
vm operations.
faults can not be taken without holding Giant.
Memory subsystems can now call the base page allocators safely.
Almost all atomic ops were removed as they are covered under the
vm mutex.
Alpha and ia64 now need to catch up to i386's trap handlers.
FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).
Reviewed (partially) by: jake, jhb
copies out the current contents of the video buffer for a syscons terminal,
providing a snapshot of the text and attributes.
Based heavily on work originally submitted by Joel Holveck <joelh@gnu.org>
for 2.2.x almost 30 months ago, which I cleaned up a little, and forward
ported to -current.
See also the usr.bin/scrshot utility.
vn_start_write() on the given vnode will be successful. VOP_LEASE() may
help to solve this problem, but its return value ignored nearly everywhere.
For now just assume that the missing upper layer on write means insufficient
access rights (which is correct for most cases).
wakeup proc0 by hand to enforce the timeout.
- When swapping out a process, keep the process locked via the proc lock
from the first checks up until we clear PS_INMEM and set PS_SWAPPING in
swapout(). The swapout() function now must be called with the proc lock
held and releases it before returning.
- Comment out the code to attempt to lock a process' VM structures before
swapping out. It is broken in that it releases the lock after obtaining
it. If it does grab the lock, it needs to hand it off to swapout()
instead of releasing it. This can be revisisted when the VM is locked
as this is a valid test to perform. It also causes a lock order reversal
for the time being, which is the immediate cause for temporarily
disabling it.
the process in question locked as soon as we find it and determine it to
be eligible until we actually kill it. To avoid deadlock, we don't block
on the process lock but skip any process that is already locked during our
search.
lock. Since we won't actually block on a try lock operation, it's not
a problem. Add a comment explaining why it is safe to skip lock order
checking with try locks.
- Remove the ithread list lock spin lock from the order list.