to SYSINIT routines if they can be resolved via symbol look up in DDB.
To avoid false positives, only honor a name if the symbol resolves
exactly to the pointer value (no offset).
MFC after: 1 week
perform copyin/copyout of the file data into the usermode
buffer. Typical filesystem hold vnode lock and some buffer locks over
the VOP_READ() and VOP_WRITE() operations, and since page fault
handler may need to recurse into VFS to get the page content, a
deadlock is possible.
The facility works by disabling page faults handling for the current
thread and attempting to execute i/o while allowing uiomove() to
access the usermode mapping of the i/o buffer. If all buffer pages are
resident, uiomove() is successfull and request is finished. If EFAULT
is returned from uiomove(), the pages backing i/o buffer are faulted
in and held, and the copyin/out is performed using uiomove_fromphys()
over the held pages for the second attempt of VOP call.
Since pages are hold in chunks to prevent large i/o requests from
starving free pages pool, and since vnode lock is only taken for
i/o over the current chunk, the vnode lock no longer protect atomicity
of the whole i/o request. Use newly added rangelocks to provide the
required atomicity of i/o regardind other i/o and truncations.
Filesystems need to explicitely opt-in into the scheme, by setting the
MNTK_NO_IOPF struct mount flag, and optionally by using
vn_io_fault_uiomove(9) helper which takes care of calling uiomove() or
converting uio into request for uiomove_fromphys().
Reviewed by: bf (comments), mdf, pjd (previous version)
Tested by: pho
Tested by: flo, Gustau P?rez <gperez entel upc edu> (previous version)
MFC after: 2 months
the i/o regions of the vnode data space. The implementation is quite
simple-minded, it uses the list of the lock requests, ordered by
arrival time. Each request may be for read or for write. The
implementation is fair FIFO.
MFC after: 2 month
implementation specific vs. the common architecture definition.
Bring PPC4XX defines (PSL, SPR, TLB). Note the new definitions under
BOOKE_PPC4XX are not used in the code yet.
This change set is not supposed to affect existing E500 support, it's just
another reorg step before bringing support for E500mc, E5500 and PPC465.
Obtained from: AppliedMicro, Freescale, Semihalf
This combination doesn't make sense, unit numbers should be hardwired
only in context of a known driver. The wildcard devices should have
wildcard unit numbers.
Reviewed by: jhb
MFC after: 2 weeks
this is a VNET-kernel or not. gcc used to put the static symbol into
the symbol table, clang does not. This fixes the 'netstat: no namelist'
error seen on clang+VNET systems.
'flags' field is added to the end of bpf_if structure. Currently the only
flag is BPFIF_FLAG_DYING which is set on bpf detach and checked by bpf_attachd()
Problem can be easily triggered on SMP stable/[89] by the following command (sort of):
'while true; do ifconfig vlan222 create vlan 222 vlandev em0 up ; tcpdump -pi vlan222 & ; ifconfig vlan222 destroy ; done'
Fix possible use-after-free when BPF detaches itself from interface, freeing bpf_bif memory,
while interface is still UP and there can be routes via this interface.
Freeing is now delayed till ifnet_departure_event is received via eventhandler(9) api.
Convert bpfd rwlock back to mutex due lack of performance gain (currently checking if packet
matches filter is done without holding bpfd lock and we have to acquire write lock if packet matches)
Approved by: kib(mentor)
MFC in: 4 weeks
Most part is merged from amd64.
- i386/acpica/acpi_wakecode.S
Replaced with amd64 code (from realmode to paging enabling code).
- i386/acpica/acpi_wakeup.c
Replaced with amd64 code (except for wakeup_pagetables stuff).
- i386/include/pcb.h
- i386/i386/genassym.c
Added PCB new members (CR0, CR2, CR4, DS, ED, FS, SS, GDT, IDT, LDT
and TR) needed for suspend/resume, not for context switch.
- i386/i386/swtch.s
Added suspendctx() and resumectx().
Note that savectx() was not changed and used for suspending (while
amd64 code uses it).
BSP and AP execute the same sequence, suspendctx(), acpi_wakecode()
and resumectx() for suspend/resume (in case of UP system also).
- i386/i386/apic_vector.s
Added cpususpend().
- i386/i386/mp_machdep.c
- i386/include/smp.h
Added cpususpend_handler().
- i386/include/apicvar.h
- kern/subr_smp.c
- sys/smp.h
Added IPI_SUSPEND and suspend_cpus().
- i386/i386/initcpu.c
- i386/i386/machdep.c
- i386/include/md_var.h
- pc98/pc98/machdep.c
Moved initializecpu() declarations to md_var.h.
MFC after: 3 days
Entries with zero inode number are considered placeholders by libc and
UFS. Fix remaining uses of VOP_READDIR in kernel: vop_stdvptocnp,
unionfs.
Sponsored by: Google Summer of Code 2011
compatible with the sched provider implemented by Solaris and its open-
source derivatives. Full documentation of the sched provider can be found
on Oracle's DTrace wiki pages.
Note that for compatibility with scripts originally written for Solaris,
serveral probes are defined that will never fire. These probes are defined
to fire when Solaris-specific features perform certain actions. As these
features are not present in FreeBSD, the probes can never fire.
Also, I have added a two probes that are not defined in Solaris, lend-pri
and load-change. These probes have been added to make it possible to
collect schedgraph data with DTrace.
Finally, a few probes are defined in Solaris to take a cpuinfo_t *
argument. As it was not immediately clear to me how to translate that to
FreeBSD, currently those probes are passed NULL in place of a cpuinfo_t *.
Sponsored by: Sandvine Incorporated
MFC after: 2 weeks
is running on other cpu, the CALLOUT_PENDING flag is temporarily
cleared. Then, callout_stop() on this, in fact active, callout fails
because CALLOUT_PENDING is not set, and callout_stop() returns 0.
Now, in sleepq_check_timeout(), the failed callout_stop() causes the
sleepq code to execute mi_switch() without even setting the wmesg,
since the switch-out is supposed to be transient. In fact, the thread
is put off the CPU for full timeout interval, instead of being put on
runq immediately. Until timeout fires, the process is unkillable for
obvious reasons.
Fix this by marking the migrating callouts with CALLOUT_DFRMIGRATION
flag. The flag is cleared by callout_stop_safe() when the function
detects a migration, besides returning the success. The softclock()
rechecks the flag for migrating callout and cancels its execution if
the flag was cleared meantime.
PR: misc/166340
Reported, debugging traces provided and tested by:
Christian Esken <christian.esken trivago com>
Reviewed by: avg, jhb
MFC after: 1 week
if the accounting log file is atomically replaced with a new file
(such as during log rotation).
- Simplify accounting log rotation a bit. There is no need to re-run
accton(8) after renaming the new log file to it's real name.
PR: kern/167321
Tested by: Jeremy Chadwick
to the process id. It follows the ptrace(2) interface and allows debugging
libraries to use thread ids directly, without slow and verbose conversion
of thread id into pid.
The PGET_NOTID flag is provided to allow a specific sysctl to disallow
this behaviour. All current callers of pget(9) have useful semantic to
operate on tid and do not need this flag.
Reviewed by: jhb, trocini
MFC after: 1 week
over just the active vnodes associated with a mount point to replace
MNT_VNODE_FOREACH_ALL in the vfs_msync, ffs_sync_lazy, and qsync
routines.
The vfs_msync routine is run every 30 seconds for every writably
mounted filesystem. It ensures that any files mmap'ed from the
filesystem with modified pages have those pages queued to be
written back to the file from which they are mapped.
The ffs_lazy_sync and qsync routines are run every 30 seconds for
every writably mounted UFS/FFS filesystem. The ffs_lazy_sync routine
ensures that any files that have been accessed in the previous
30 seconds have had their access times queued for updating in the
filesystem. The qsync routine ensures that any files with modified
quotas have those quotas queued to be written back to their
associated quota file.
In a system configured with 250,000 vnodes, less than 1000 are
typically active at any point in time. Prior to this change all
250,000 vnodes would be locked and inspected twice every minute
by the syncer. For UFS/FFS filesystems they would be locked and
inspected six times every minute (twice by each of these three
routines since each of these routines does its own pass over the
vnodes associated with a mount point). With this change the syncer
now locks and inspects only the tiny set of vnodes that are active.
Reviewed by: kib
Tested by: Peter Holm
MFC after: 2 weeks
a mount point. Active vnodes are those with a non-zero use or hold
count, e.g., those vnodes that are not on the free list. Note that
this list is in addition to the list of all the vnodes associated
with a mount point.
To avoid adding another set of linkage pointers to the vnode
structure, the active list uses the existing linkage pointers
used by the free list (previously named v_freelist, now renamed
v_actfreelist).
This update adds the MNT_VNODE_FOREACH_ACTIVE interface that loops
over just the active vnodes associated with a mount point (typically
less than 1% of the vnodes associated with the mount point).
Reviewed by: kib
Tested by: Peter Holm
MFC after: 2 weeks
used only as a helper function in that file. Replace sole call to
vbusy() with inline code in vholdl(). Replace sole calls to vfree()
and vdestroy() with inline code in vdropl().
The Clang compiler already inlines these functions, so they do not
show up in a kernel backtrace which is confusing. Also you cannot
set their frame in kgdb which means that it is impossible to view
their local variables. So, while the produced code is unchanged,
the debugging should be easier.
Discussed with: kib
MFC after: 2 weeks
The primary changes are that the user of the interface no longer
needs to manage the mount-mutex locking and that the vnode that
is returned has its mutex locked (thus avoiding the need to check
to see if its is DOOMED or other possible end of life senarios).
To minimize compatibility issues for third-party developers, the
old MNT_VNODE_FOREACH interface will remain available so that this
change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH
will be removed in head.
The reason for this update is to prepare for the addition of the
MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the
active vnodes associated with a mount point (typically less than
1% of the vnodes associated with the mount point).
Reviewed by: kib
Tested by: Peter Holm
MFC after: 2 weeks
allow the owner to read and write ACL and file attributes when there
was no entry with subject matching the owner. In other words,
'getfacl meh' shouldn't fail for the owner if the ACL looks like this:
# file: meh
# owner: trasz
# group: wheel
user:root:------a-------:------:allow
Reported by: kientzle
like the one triggered by this:
# kldload geom_vinum
# pwait `pgrep -S gv_worker` &
# kldunload geom_vinum
or this:
GEOM_JOURNAL: Shutting down geom gjournal 3464572051.
panic: destroying non-empty racct: 1 allocated for resource 6
which were tracked by jh@ to be caused by checking p->p_flag,
while it wasn't initialised yet. Basically, during fork, the code
checked p_flag, concluded the process isn't marked as P_SYSTEM,
incremented the counter, and later on, when exiting, checked that
the process was marked as P_SYSTEM, and thus didn't decrement it.
Also, I believe there wasn't any good reason for checking P_SYSTEM
in the first place.
Tested by: jh