we are processing has a base address of zero. Note that this will only
change behavior for devices where all the BARs of a given type have a base
address of 0 since we will enable the appropriate access when we encounter
the first BAR with a base that is not 0. Specifically, this allows certain
Toshiba laptops to no longer require 'hw.pci.enable_io_modes=0' to avoid
hangs during boot.
PR: kern/20040
PR: i386/63776 (possibly)
PR: i386/68900 (possibly)
PR: i386/74532 (possibly)
MFC after: 1 week
file's access time should be updated when it gets executed. A while
ago the mechanism used to exec was changed to use a more mmap based
mechanism and this behavior was broken as a side-effect of that.
A new vnode flag is added that gets set when the file gets executed,
and the VOP_SETATTR() vnode operation gets called. The underlying
filesystem is expected to handle it based on its own semantics, some
filesystems don't support access time at all. Those that do should
handle it in a way that does not block, does not generate I/O if possible,
etc. In particular vn_start_write() has not been called. The UFS code
handles it the same way as it would normally handle the access time if
a file was read - the IN_ACCESS flag gets set in the inode but no other
action happens at this point. The actual time update will happen later
during a sync (which handles all the necessary locking).
Got me into this: cperciva
Discussed with: a lot with bde, a little with kan
Showed patches to: phk, jeffr, standards@, arch@
Minor discussion on: arch@
audit event identifier associated with each system call, which will
be stored by makesyscalls.sh in the sy_auevent field of struct sysent.
For now, default the audit identifier on all system calls to AUE_NULL,
but in the near future, other BSM event identifiers will be used. The
mapping of system calls to event identifiers is many:one due to
multiple system calls that map to the same end functionality across
compatibility wrappers, ABI wrappers, etc.
Submitted by: wsalamon
Obtained from: TrustedBSD Project
are subtle differences in the read and write completion path. Instead,
grab an extra write ref so the write path can drop it when we recursively
call bufdone(). I believe this may be the source of the wrong bufobj
panics.
Reported by: pho, kkenn
structure, sysent. This field will hold the default audit event
to generate when the system call is entered. Currently, it will
default to 0 due to allocation in bss.
Submitted by: wsalamon
Obtained from: TrustedBSD Project
a nested include of param.h is required so that MAXCPU is visible to all
consumers of sys/malloc.h. In an earlier version of the patch, the
malloc_type_internal structure was only conditionally visible.
Pointed out by: delphij
in order to modify the system call table to include event identifiers.
The full audit.h will be merged at a later date.
Obtained from: TrustedBSD Project
allocators: a set of power-of-two UMA zones for small allocations, and the
VM page allocator for large allocations. In order to maintain unified
statistics for specific malloc types, kernel malloc maintains a separate
per-type statistics pool, which can be monitored using vmstat -m. Prior
to this commit, each pool of per-type statistics was protected using a
per-type mutex associated with the malloc type.
This change modifies kernel malloc to maintain per-CPU statistics pools
for each malloc type, and protects writing those statistics using critical
sections. It also moves to unsynchronized reads of per-CPU statistics
when generating coalesced statistics. To do this, several changes are
implemented:
- In the previous world order, the statistics memory was allocated by
the owner of the malloc type structure, allocated statically using
MALLOC_DEFINE(). This embedded the definition of the malloc_type
structure into all kernel modules. Move to a model in which a pointer
within struct malloc_type points at a UMA-allocated
malloc_type_internal data structure owned and maintained by
kern_malloc.c, and not part of the exported ABI/API to the rest of
the kernel. For the purposes of easing a possible MFC, re-use an
existing pointer in 'struct malloc_type', and maintain the current
malloc_type structure size, as well as layout with respect to the
fields reused outside of the malloc subsystem (such as ks_shortdesc).
There are several unused fields as a result of no longer requiring
the mutex in malloc_type.
- Struct malloc_type_internal contains an array of malloc_type_stats,
of size MAXCPU. The structure defined above avoids hard-coding a
kernel compile-time value of MAXCPU into kernel modules that interact
with malloc.
- When accessing per-cpu statistics for a malloc type, surround read -
modify - update requests with critical_enter()/critical_exit() in
order to avoid races during write. The per-CPU fields are written
only from the CPU that owns them.
- Per-CPU stats now maintained "allocated" and "freed" counters for
number of allocations/frees and bytes allocated/freed, since there is
no longer a coherent global notion of the totals. When coalescing
malloc stats, accept a slight race between reading stats across CPUs,
and avoid showing the user a negative allocation count for the type
in the event of a race. The global high watermark is no longer
maintained for a malloc type, as there is no global notion of the
number of allocations.
- While tearing up the sysctl() path, also switch to using sbufs. The
current "export as text" sysctl format is retained with the same
syntax. We may want to change this in the future to export more
per-CPU information, such as how allocations and frees are balanced
across CPUs.
This change results in a substantial speedup of kernel malloc and free
paths on SMP, as critical sections (where usable) out-perform mutexes
due to avoiding atomic/bus-locked operations. There is also a minor
improvement on UP due to the slightly lower cost of critical sections
there. The cost of the change to this approach is the loss of a
continuous notion of total allocations that can be exploited to track
per-type high watermarks, as well as increased complexity when
monitoring statistics.
Due to carefully avoiding changing the ABI, as well as hardening the ABI
against future changes, it is not necessary to recompile kernel modules
for this change. However, MFC'ing this change to RELENG_5 will require
also MFC'ing optimizations for soft critical sections, which may modify
exposed kernel ABIs. The internal malloc API is changed, and
modifications to vmstat in order to restore "vmstat -m" on core dumps will
follow shortly.
Several improvements from: bde
Statistics approach discussed with: ups
Tested by: scottl, others