sysctl requests to avoid wiring too much user memory. Only grab this
lock if the user's old buffer is larger than a page as a tradeoff to
allow more concurrency for common small requests.
- Just use a shared lock on the sysctl tree for user sysctl requests now.
MFC after: 1 week
- Remove vga0 and the disabled uart2/uart3 hints from both platforms.
- Remove hints for ISA adv0, bt0, aha0, aic0, ed0, cs0, sn0, ie0, fe0, and
le0 from i386. All these hints were marked 'disabled' and thus already
did not work "out of the box".
Discussed with: imp
With the arrival of 128+ cores it is necessary to handle more than that.
One of the first thing to change is the support for cpumask_t that needs
to handle more than 32 bits masking (which happens now). Some places,
however, still assume that cpumask_t is a 32 bits mask.
Fix that situation by using always correctly cpumask_t when needed.
While here, remove the part under STOP_NMI for the Xen support as it
is broken in any case.
Additively make ipi_nmi_pending as static.
Reviewed by: jhb, kmacy
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
- make mftb() shared, rewrite in C, provide complementary mttb()
- adjust SMP startup per the above, additional comments, minor naming
changes
- eliminate redundant TB defines, other minor cosmetics
Reviewed by: marcel, nwhitehorn
Obtained from: Freescale, Semihalf
chipset-specific code to attach chipset-specific data.
- Use chipset-specific data in the acard and promise chipsets rather than
changing the ivars of ATA PCI devices. ivars are reserved for use by the
parent bus driver and are _not_ available for use by devices directly.
This fixes a panic during sysctl -a with certain Promise controllers with
ACPI enabled.
Reviewed by: mav
Tested by: Magnus Kling (kingfon @ gmail) (on 7)
MFC after: 3 days
error due to copyout failure or short buffer.
The later breaks the usermode iterators of the sysctl results that pack
arbitrary number of variable-sized structures. Iterator expects that
kernel filled exactly oldlen bytes, and tries to interpret half-filled
or garbage structure at the end of the buffer. In particular,
kinfo_getfile(3) segfaulted.
Reported and tested by: pho
MFC after: 3 weeks
fget_unlocked().
- Save old file descriptor tables created on expansion until
the entire descriptor table is freed so that pointers may be
followed without regard for expanders.
- Mark the file zone as NOFREE so we may attempt to reference
potentially freed files.
- Convert several fget_locked() users to fget_unlocked(). This
requires us to manage reference counts explicitly but reduces
locking overhead in the common case.
new platform module. These are probed in early boot, and have the
responsibility of determining the layout of physical memory, determining
the CPU timebase frequency, and handling the zoo of SMP mechanisms
found on PowerPC.
Reviewed by: marcel, raj
Book-E parts by: raj
- For CPUs that only support MCE (the machine check exception) but not MCA
(i.e. Pentium), all this does is print out the value of the machine check
registers and then panic when a machine check exception occurs.
- For CPUs that support MCA (the machine check architecture), the support is
a bit more involved.
- First, there is limited support for decoding the CPU-independent MCA
error codes in the kernel, and the kernel uses this to output a short
description of any machine check events that occur.
- When a machine check exception occurs, all of the MCx banks on the
current CPU are scanned and any events are reported to the console
before panic'ing.
- To catch events for correctable errors, a periodic timer kicks off a
task which scans the MCx banks on all CPUs. The frequency of these
checks is controlled via the "hw.mca.interval" sysctl.
- Userland can request an immediate scan of the MCx banks by writing
a non-zero value to "hw.mca.force_scan".
- If any correctable events are encountered, the appropriate details
are stored in a 'struct mca_record' (defined in <machine/mca.h>).
The "hw.mca.count" is a count of such records and each record may
be queried via the "hw.mca.records" tree by specifying the record
index (0 .. count - 1) as the next name in the MIB similar to using
PIDs with the kern.proc.* sysctls. The idea is to export machine
check events to userland for more detailed processing.
- The periodic timer and hw.mca sysctls are only present if the CPU
supports MCA.
Discussed with: emaste (briefly)
MFC after: 1 month
direct dispatch policy for specific protocols (NETISR_USB). We leave
the additional 'flags' argument to netisr_register() for the time being,
even though it is no longer required.
NULL or change it. We initialize it before we set if_ioctl. It can
therefore never be NULL, and most other drivers don't bother with this
sanity check.
introduced in amd64 revision 1.540 and i386 revision 1.547. However, it
had no harmful effects until after a recent change, r189698, on amd64.
(In other words, the error is harmless in RELENG_7.)
The error is triggered by the failure to allocate a pv entry for the one
and only mapping in a page table page. I am addressing the error by
changing pmap_copy() to abort if either pv entry allocation or page
table page allocation fails. This is appropriate because the creation of
mappings by pmap_copy() is optional. They are a (possible) optimization,
and not a requirement.
Correct a nearby whitespace error in the i386 pmap_copy().
Crash reported by: jeff@
MFC after: 6 weeks
following changes:
Rename vfs_page_set_valid() to vfs_page_set_validclean() to reflect
what this function actually does. Suggested by: tegge
Introduce a new version of vfs_page_set_valid() that does no more than
what the function's name implies. Specifically, it does not update
the page's dirty mask, and thus it does not require the page queues
lock to be held.
Update two of the three callers to the old vfs_page_set_valid() to
call vfs_page_set_validclean() instead because they actually require
the page's dirty mask to be cleared.
Introduce vm_page_set_valid().
Reviewed by: tegge
registers contents.
- Use memory barriers to preserve the order of buffer space operations.
This might be needed if we'll ever use this driver on architectures
where ordering is not guaranteed.
major/minor numbers of the devices.
Pretending that the vnodes are character devices prevents file tree
walkers from descending into the directories opened by current process.
Also, not doing stat on the filedescriptors prevents the recursive entry
into the VFS.
Requested by: kientzle
Discussed with: Jilles Tjoelker <jilles stack nl>
representing too large integer, instead of overflowing and possibly
returning a random but valid vnode.
Noted by: Jilles Tjoelker <jilles stack nl>
MFC after: 3 days
to a non loopback/ppp link types) through the loopback interface. Prior
to the new L2/L3 rewrite, this host route is implicitly added by the L2
code during RTM_RESOLVE of that interface address. This host route is
deleted when that interface is removed.
Reviewed by: kmacy
commit and fix it correctly by removing the _KERNEL check entirely. Now
the kernel always sees the same value of NULL as userland meaning that it
sees __null, 0, or 0L when compiled as C++, and '(void *)0' when compiled
as C.
As an experiment, I changed snp(4) to use a mutex instead of an sx lock.
We can't enable this right now, because Syscons still picks up Giant.
It's nice to already have the framework there.
'(void *)0' for NULL for C++ compilers compiling kernel code. Together this
makes it easier to build kernel modules using C++.
Reviewed by: imp
MFC after: 3 days
the VFS. Now all the VFS_* functions and relating parts don't want the
context as long as it always refers to curthread.
In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.
While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.
VFS KPI is heavilly changed by this commit so thirdy parts modules needs
to be recompiled. Bump __FreeBSD_version in order to signal such
situation.
suffer from the race condition that motivated revision 1.94. Consequently,
the work-around that was implemented by revision 1.94 is no longer needed.
Moreover, reverting this work-around eliminates the need for
vfs_busy_pages() to acquire the page queues lock when preparing a buffer
for read.
Reviewed by: tegge
directly in cpu_reset() in order to idle the APs before exiting
the kernel and letting the BSP enter the firmware so that processes
like init(8) which still might be running on an AP at that point
don't cause a panic there when it crashes due to the fact it no
longer can be supported by the kernel.
MFC after: 3 days
adapted to MPSAFE cam(4) to a isp(4) specific callout structure.
Thanks to Florian Smeets for providing access to a machine exhibiting
this problem for debugging.
Approved by: mjacob
MFC after: 3 days
to 2.4.0, as it has appeared in the 2.4.0-rc7 first time.
Being exported, AT_CLKTCK is returned by sysconf(_SC_CLK_TCK),
glibc falls back to the hard-coded CLK_TCK value when aux entry
is not present.
Glibc versions prior to 2.2.1 always use hard-coded CLK_TCK value.
For older applications/libc's which depends on hard-coded CLK_TCK
value user should set compat.linux.osrelease less than 2.4.0.
Approved by: kib (mentor)
designation of the emulated kernel version.
linux_kernver() returns integer value formatted as 'VVVMMMIII' where
VVV - version, MMM - major revision, III - minor revision.
Approved by: kib (mentor)
The frequency of the statistics clock is given by stathz.
Use stathz if it is available, otherwise use hz.
Pointed out by: bde
Approved by: kib (mentor)
- Just use #error when including <sys/ioctl.h> in the kernel. Code
hasn't used this header for years now and probably doesn't compile
anyway, because of -Werror.
- Get rid of struct ttysize, TIOCGSIZE and TIOCSSIZE. All code nowadays
use both TIOC[GS]SIZE and TIOC[GS]WINSZ. Because we have other popular
systems that don't implement the first, it's of little use to support
interfaces nowadays.
Credential might need to hang around longer than its parent and be used
outside of mnt_explock scope controlling netcred lifetime. Use separate
reference-counted ucred allocated separately instead.
While there, extend mnt_explock coverage in vfs_stdexpcheck and clean-up
some unused declarations in new NFS code.
Reported by: John Hickey
PR: kern/133439
Reviewed by: dfr, kib
At first I allowed ioctl_compat.h to be included, but it just returned
an empty file. I had to do this, to keep kdump happy. I really want to
raise a compiler error when including this header, so now it will just
throw an error if you don't set COMPAT_43TTY.
that vnode_pager_input_smlfs() zeroes the page, it should not mark the page
as valid until after the page is zeroed. Otherwise, the page could be
mapped for read access (e.g., by vm_map_pmap_enter()) before the page is
zeroed. Reviewed by: tegge
Eliminate gratuitous clearing of the page's dirty mask by
vnode_pager_input_smlfs(). Instead, assert that the page is clean.
Reviewed by: tegge
Eliminate some blank lines.
Eliminate pointless calls to pmap_clear_modify() and vm_page_undirty() from
vnode_pager_input_old(). The page is not mapped. Therefore, it cannot have
any page table entries that are modified.
Eliminate an incorrect comment from vnode_pager_generic_getpages().
'net.inet.ip.fw.default_to_accept'. The current value can also be queried
via a read-only sysctl of the same name.
Requested by: plosher
MFC after: 1 week
I really don't want any pieces of code to include ioctl_compat.h, so let
the ibcs2 and svr4 compat leave sgtty alone. If they want to support
sgtty, they should emulate it on top of termios, not sgtty.
The code has been marked with BURN_BRIDGES for a long time. ibcs2 and
svr4 are not really popular pieces of code anyway.
virtualized instances of hostname and domainname, as well as a new top-level
virtualization struct vimage, which holds pointers to struct vnet and struct
vprocg. Struct vprocg is likely to become replaced in the near future with
a new jail management API import.
As a consequence of this change, change struct ucred to point to a struct
vimage, instead of directly pointing to a vnet.
Merge vnet / vimage / ucred refcounting infrastructure from p4 / vimage
branch.
Permit kldload / kldunload operations to be executed only from the default
vimage context.
This change should have no functional impact on nooptions VIMAGE kernel
builds.
Reviewed by: bz
Approved by: julian (mentor)
This is based on a fix that went in to opensolaris on March 9th. However, it uses a dedicated
thread instead of a Solaris' taskq to avoid doing a blocking memory allocation with the vnode
interlock held.
This fixes a long-time deadlock in ZFS. This is not, strictly speaking, an LOR. The spa_zio
thread releases a vnode, this calls in to vn_reclaim which in turn needs to acquire range locks
to sync dirty data out to disk. The range locks are already held by a user-level process waiting
on a condition variable that it the process is waiting on a spa_zio thread to signal it on. The
process could not be signalled because the spa_zio thread could not proceed.
The nature of this problem was not apparent due to ZFS locks opting out of witness which meant
that DDB did not know about the locks that were held by ZFS.
Reviewed by: pjd
MFC after: 7 days
OSD-based jail extensions. This allows the Linux MIB to accessed via
jail_set and jail_get, and serves as a demonstration of adding jail support
to a module.
Reviewed by: dchagin, kib
Approved by: bz (mentor)
It turns out if we called cfmakeraw() on a TTY with only a rint handler
in place, it could inject data into the TTY, even though it should be
redirected. Always take a look at the hooks before looking at the
termios flags.
which is available for Glibc as sysconf(_SC_CLK_TCK). If AT_CLKTCK entry is
not exported, Glibc uses 100.
linux_times() shall use the value that is exported to user space.
Pointyhat to: dchagin
PR: kern/134251
Approved by: kib (mentor)
MFC after: 2 weeks
The entire world seems to use the non-standard TIOCSCTTY ioctl to make a
TTY a controlling terminal of a session. Even though tcsetsid(3) is also
non-standard, I think it's a lot better to use in our own source code,
mainly because it's similar to tcsetpgrp(), tcgetpgrp() and tcgetsid().
I stole the idea from QNX. They do it the other way around; their
TIOCSCTTY is just a wrapper around tcsetsid(). tcsetsid() then calls
into an IPC framework.
Use the protocol family constants for the domain argument validation.
Return EAFNOSUPPORT in case when the incorrect domain argument
is specified.
Return EPROTONOSUPPORT instead of passing values that are not 0
to the BSD layer.
Suggested by: rwatson
Approved by: kib (mentor)
MFC after: 1 month
sc_rixmap is an inverse map
NB: could eliminate the check for an invalid rate by filling in 0 for
invalid entries but the rate control modules use it to identify
bogus rates so leave it for now
This is necessary for two reasons:
1) In order to avoid collisions with the use of a BIOs flags set by a consumer
or a provider
2) Because GV_BIO_DONE was used to mark a BIO as done, not enough flags was
available, so the consumer flags of a BIO had to be misused in order to
support enough flags. The new queue makes it possible to recycle the
GV_BIO_DONE flag into GV_BIO_GROW.
As a consequence, gvinum will now work with any other GEOM class under it or
on top of it.
- Use bio_pflags for storing internal flags on downgoing BIOs, as the requests
appear to come from a consumer of a gvinum volume. Use bio_cflags only for
cloned BIOs.
- Move gv_post_bio to be used internally for maintenance requests.
- Remove some cases where flags where set without need.
PR: kern/133604
as they do not really relate and to prepare for an additional queue to be
covered by the BIO queue mutex.
- Implement wrappers for fetching the next element from the event queue as well
as for putting a new element into the BIO queue.
(i.e. seems to be) already set.
This should reduce console noise due to curvnet recursion reports.
This change has no impact on nooptions VIMAGE builds.
Approved by: julian (mentor)
previously always pointing to the default vnet context, to a
dynamically changing thread-local one. The currvnet context
should be set on entry to networking code via CURVNET_SET() macros,
and reverted to previous state via CURVNET_RESTORE(). Recursions
on curvnet are permitted, though strongly discuouraged.
This change should have no functional impact on nooptions VIMAGE
kernel builds, where CURVNET_* macros expand to whitespace.
The curthread->td_vnet (aka curvnet) variable's purpose is to be an
indicator of the vnet context in which the current network-related
operation takes place, in case we cannot deduce the current vnet
context from any other source, such as by looking at mbuf's
m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc. Moreover, so
far curvnet has turned out to be an invaluable consistency checking
aid: it helps to catch cases when sockets, ifnets or any other
vnet-aware structures may have leaked from one vnet to another.
The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros
was a result of an empirical iterative process, whith an aim to
reduce recursions on CURVNET_SET() to a minimum, while still reducing
the scope of CURVNET_SET() to networking only operations - the
alternative would be calling CURVNET_SET() on each system call entry.
In general, curvnet has to be set in three typicall cases: when
processing socket-related requests from userspace or from within the
kernel; when processing inbound traffic flowing from device drivers
to upper layers of the networking stack, and when executing
timer-driven networking functions.
This change also introduces a DDB subcommand to show the list of all
vnet instances.
Approved by: julian (mentor)
to module builds. This avoids having to have the module builds walk up
the tree to find the kernel sources. It also allows a kernel + module
build to succeed when a new level of module subdirectories is added without
requiring that the /usr/share/mk/bsd.kmod.mk file on the machine be patched.
MFC after: 1 week
fix SMP topology detection. On i386, we extend it to cover Core, Core 2,
and Core i7 processors, not just Pentium 4 family, and move it to better
place. On amd64, all supported Intel CPUs should have this MSR.
support for NFSv4 as well as NFSv2 and 3.
It lives in 3 subdirs under sys/fs:
nfs - functions that are common to the client and server
nfsclient - a mutation of sys/nfsclient that call generic functions
to do RPCs and handle state. As such, it retains the
buffer cache handling characteristics and vnode semantics that
are found in sys/nfsclient, for the most part.
nfsserver - the server. It includes a DRC designed specifically for
NFSv4, that is used instead of the generic DRC in sys/rpc.
The build glue will be checked in later, so at this point, it
consists of 3 new subdirs that should not affect kernel building.
Approved by: kib (mentor)
and hide it inside of atrtc driver. Add new tunable hint.atrtc.0.clock
controlling it. Setting it to 0 disables using RTC clock as stat-/
profclock sources.
Teach i386 and amd64 SMP platforms to emulate stat-/profclocks using i8254
hardclock, when LAPIC and RTC clocks are disabled.
This allows to reduce global interrupt rate of idle system down to about
100 interrupts per core, permitting C3 and deeper C-states provide maximum
CPU power efficiency.
Broadcom BCM43xx chipsets. This driver uses the v3 firmware that
needs to be fetched separately. A port will be committed to create
the bwi firmware module.
The driver matches the following chips: Broadcom BCM4301, BCM4307,
BCM4306, BCM4309, BCM4311, BCM4312, BCM4318, BCM4319
The driver works for 802.11b and 802.11g.
Limitations:
This doesn't support the 802.11a or 802.11n portion of radios.
Some BCM4306 and BCM4309 cards don't work with Channel 1, 2 or 3.
Documenation for this firmware is reverse engineered from
http://bcm.sipsolutions.net/
V4 of the firmware is needed for 11a or 11n support
http://bcm-v4.sipsolutions.net/
Firmware needs to be fetched from a third party, port to be committed
# I've tested this with a BCM4319 mini-pci and a BCM4318 CardBus card, and
# not connected it to the build until the firmware port is committed.
Obtained from: DragonFlyBSD, //depot/projects/vap
Reviewed by: sam@, thompsa@
leading to a bug, when C-state does not decrease on sleep shorter then
declared transition latency. Fixing this deprecates workaround for broken
C-states on some hardware.
By the way, change state selecting logic a bit. Instead of last sleep
time use short-time average of it. Global interrupts rate in system is a
quite random value, to corellate subsequent sleeps so directly.