12685 Commits

Author SHA1 Message Date
Pawel Jakub Dawidek
effb6326a1 Remove redundant include.
MFC after:	1 month
2012-06-10 20:24:01 +00:00
Pawel Jakub Dawidek
297f11037f Style: move opt_*.h includes in the proper place.
MFC after:	1 month
2012-06-10 20:22:10 +00:00
Pawel Jakub Dawidek
69d7614850 When we are closing capability during dup2(), we want to call mq_fdclose()
on the underlying object and not on the capability itself.

Discussed with:	rwatson
Sponsored by:	FreeBSD Foundation
MFC after:	1 month
2012-06-10 14:57:18 +00:00
Pawel Jakub Dawidek
1b693d7494 Merge two ifs into one. Other minor style fixes.
MFC after:	1 month
2012-06-10 13:10:21 +00:00
Pawel Jakub Dawidek
8849ae7256 Simplify fdtofp().
MFC after:	1 month
2012-06-10 06:31:54 +00:00
Kirk McKusick
75c898f2a4 When synchronously syncing a device (MNT_WAIT), wait for buffers
to become available. Otherwise we may excessively spin and fail
with ``fsync: giving up on dirty''.

Reviewed by: kib
Tested by:   Peter Holm
MFC after:   1 week
2012-06-09 22:26:53 +00:00
Pawel Jakub Dawidek
e59a97362d There is no need to drop the FILEDESC lock around malloc(M_WAITOK) anymore, as
we now use sx lock for filedesc structure protection.

Reviewed by:	kib
MFC after:	1 month
2012-06-09 18:50:32 +00:00
Pawel Jakub Dawidek
68abac4337 Remove now unused variable.
MFC after:	1 month
MFC with:	r236820
2012-06-09 18:48:06 +00:00
Pawel Jakub Dawidek
380513aaae Make some of the loops more readable.
Reviewed by:	tegge
MFC after:	1 month
2012-06-09 18:03:23 +00:00
Pawel Jakub Dawidek
5d02ed91e9 Correct panic message.
MFC after:	1 month
MFC with:	r236731
2012-06-09 12:27:30 +00:00
Mitsuru IWASAKI
fb864578af Add x86/acpica/acpi_wakeup.c for amd64 and i386. Differences in
suspend/resume procedures between them are minimized.

common:
- Add global cpuset suspended_cpus to indicate APs are suspended/resumed.
- Remove acpi_waketag and acpi_wakemap from acpivar.h (no longer used).
- Add some variables in acpi_wakecode.S in order to minimize the difference
  among amd64 and i386.
- Disable load_cr3() because now CR3 is restored in resumectx().

amd64:
- Add suspend/resume related members (such as MSR) in PCB.
- Modify savectx() for above new PCB members.
- Merge acpi_switch.S into cpu_switch.S as resumectx().

i386:
- Merge (and remove) suspendctx() into savectx() in order to match the
  amd64 code.

Reviewed by:	attilio@, acpi@
2012-06-09 00:37:26 +00:00
John Baldwin
7ac1b61aac Split the second half of vn_open_cred() (after a vnode has been found via
a lookup or created via VOP_CREATE()) into a new vn_open_vnode() function
and use this function in fhopen() instead of duplicating code from
vn_open_cred() directly.

Tested by:	pho
Reviewed by:	kib
MFC after:	2 weeks
2012-06-08 18:32:09 +00:00
Mateusz Guzik
3b5da8d609 Plug socket refcount leak on error in sys_sctp_peeloff.
Reviewed by:	tuexen
Approved by:	trasz (mentor)
MFC after:	3 days
2012-06-08 08:04:51 +00:00
Pawel Jakub Dawidek
bf3e37ef15 In fdalloc() f_ofileflags for the newly allocated descriptor has to be 0.
Assert that instead of setting it to 0.

Sponsored by:	FreeBSD Foundation
MFC after:	1 month
2012-06-07 23:33:10 +00:00
Pawel Jakub Dawidek
d3644b04ca Eliminate redundant variable.
Sponsored by:	FreeBSD Foundation
MFC after:	1 week
2012-06-07 23:08:18 +00:00
Pawel Jakub Dawidek
f6ed2ff79d Plug file reference leak in capability failure case.
Sponsored by:	FreeBSD Foundation
MFC after:	3 days
2012-06-07 22:49:09 +00:00
Gleb Smirnoff
36eeafa0e5 style(9) for r236563.
2012-06-05 05:16:04 +00:00
Gleb Smirnoff
8955d2720f Microoptimisation of code from r236560, also coming from Nginx Inc.
Submitted by:	ru
2012-06-04 14:18:13 +00:00
Gleb Smirnoff
835d890042 Optimise kern_sendfile(): skip cycling through the entire mbuf chain in
m_cat(), storing pointer to last mbuf in chain in local variable and
attaching new mbuf to the end of chain.

Submitter reports that CPU load dropped by more than 10% on a web server
serving large files with this optimisation.

Submitted by:	Sergey Budnevitch <sb nginx.com>
2012-06-04 12:49:21 +00:00
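
The optimisation described in 835d890042 can be illustrated with a minimal userspace sketch. It is not the kernel code: `struct buf`, `struct chain`, `chain_append()` and `chain_len()` are hypothetical names standing in for mbufs and the sendfile loop. The point is only that caching a pointer to the last element makes each append independent of the length of the existing chain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an mbuf: only the chain link and a length. */
struct buf {
	struct buf *next;
	int len;
};

/* Hypothetical chain descriptor that caches the last buffer. */
struct chain {
	struct buf *head;
	struct buf *tail;
};

/*
 * Append b (one buffer or a short sub-chain) to c. The walk below covers
 * only the appended part, never the existing chain.
 */
static void
chain_append(struct chain *c, struct buf *b)
{
	if (b == NULL)
		return;
	if (c->head == NULL) {
		c->head = b;
	} else {
		assert(c->tail != NULL && c->tail->next == NULL);
		c->tail->next = b;
	}
	while (b->next != NULL)
		b = b->next;
	c->tail = b;
}

/* Sum the lengths of all buffers in the chain (used only for checking). */
static int
chain_len(const struct chain *c)
{
	int total = 0;

	for (const struct buf *b = c->head; b != NULL; b = b->next)
		total += b->len;
	return total;
}
```

Without the cached tail, every append would have to walk from the head, so building a chain of n buffers would cost on the order of n squared link traversals.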
Konstantin Belousov
bba080854d Add a knob to disable vn_io_fault.
MFC after:	1 month
2012-06-03 16:19:37 +00:00
Konstantin Belousov
bb2f52a61d Count and export the number of times prefaulting happens.
MFC after:	 1 month
2012-06-03 16:06:56 +00:00
Andriy Gapon
7adc598a15 free wdog_kern_pat calls in post-panic paths from under SW_WATCHDOG
Those calls are useful with hardware watchdog drivers too.

MFC after:	3 weeks
2012-06-03 08:01:12 +00:00
Konstantin Belousov
d1b07fd498 Fix typo [1]. Use commas to separate flag printouts, in style with
other parts of function.

Submitted by: bf [1]
MFC after:   1 week
2012-06-02 19:39:12 +00:00
Konstantin Belousov
705de7c19e Update the print mask for decoding b_flags. Add print masks for
b_vflags and b_xflags_t and print them as well.

MFC after:   1 week
2012-06-02 18:44:40 +00:00
John Baldwin
b871e6613b Extend VERBOSE_SYSINIT to also print out the name of variables passed
to SYSINIT routines if they can be resolved via symbol look up in DDB.
To avoid false positives, only honor a name if the symbol resolves
exactly to the pointer value (no offset).

MFC after:	1 week
2012-06-01 15:42:37 +00:00
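
The "exact match only" rule from b871e6613b can be sketched as follows. This is an illustration under assumptions, not the DDB lookup itself: `struct sym` and `resolve_exact()` are hypothetical, and the symbol table is a plain array. A nearest-symbol lookup is performed, and the name is honoured only when the offset from the symbol is zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol table entry. */
struct sym {
	uintptr_t addr;
	const char *name;
};

/*
 * Find the nearest symbol at or below ptr and return its name only if it
 * resolves to ptr exactly (offset 0). Otherwise return NULL, so that
 * "symbol+offset" matches are never reported as the variable's name.
 */
static const char *
resolve_exact(const struct sym *tab, size_t n, uintptr_t ptr)
{
	const struct sym *best = NULL;

	assert(tab != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (tab[i].addr <= ptr &&
		    (best == NULL || tab[i].addr > best->addr))
			best = &tab[i];
	}
	if (best == NULL || best->addr != ptr)
		return NULL;
	return best->name;
}
```

A pointer that lands a few bytes past a symbol most likely refers to something else (for example, a field inside a structure), which is why such matches are discarded.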
Pawel Jakub Dawidek
5edfa04b94 Regenerate after r236361.
MFC after:	3 days
2012-05-31 19:34:53 +00:00
Pawel Jakub Dawidek
6ba7e8178a Add missing system calls.
MFC after:	3 days
2012-05-31 19:32:37 +00:00
Pawel Jakub Dawidek
243f67938e There is no rmdirat system call. Weird, I know.
MFC after:	3 days
2012-05-31 19:31:28 +00:00
Warner Losh
a241707e7a Unlock in the error path to prevent a lock leak.
PR:		162174
Submitted by:	Ian Lepore
MFC after:	2 weeks
2012-05-31 17:27:05 +00:00
Konstantin Belousov
41014d996a vn_io_fault() is a facility to prevent page faults while filesystems
perform copyin/copyout of the file data into the usermode
buffer. Typical filesystems hold the vnode lock and some buffer locks over
the VOP_READ() and VOP_WRITE() operations, and since page fault
handler may need to recurse into VFS to get the page content, a
deadlock is possible.

The facility works by disabling page faults handling for the current
thread and attempting to execute i/o while allowing uiomove() to
access the usermode mapping of the i/o buffer. If all buffer pages are
resident, uiomove() is successful and the request is finished. If EFAULT
is returned from uiomove(), the pages backing i/o buffer are faulted
in and held, and the copyin/out is performed using uiomove_fromphys()
over the held pages for the second attempt of VOP call.

Since pages are held in chunks to prevent large i/o requests from
starving the free page pool, and since the vnode lock is only taken for
i/o over the current chunk, the vnode lock no longer protects the atomicity
of the whole i/o request. Use the newly added rangelocks to provide the
required atomicity of i/o with regard to other i/o and truncations.

Filesystems need to explicitly opt in to the scheme by setting the
MNTK_NO_IOPF struct mount flag, and optionally by using
vn_io_fault_uiomove(9) helper which takes care of calling uiomove() or
converting uio into request for uiomove_fromphys().

Reviewed by:	bf (comments), mdf, pjd (previous version)
Tested by:	pho
Tested by:	flo, Gustau Pérez <gperez entel upc edu> (previous version)
MFC after:	2 months
2012-05-30 16:42:08 +00:00
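
The two-attempt scheme described in 41014d996a can be modelled in userspace roughly as below. This is a simplified sketch, not the kernel implementation: `struct ubuf`, `copy_nofault()`, `copy_held_chunks()`, `io_copyout()` and the `CHUNK_PAGES` limit are all hypothetical, page residency is simulated with a boolean array, and the retry of the VOP call is reduced to a second copy loop.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ     4096
#define CHUNK_PAGES 2	/* hypothetical cap on pages held at once */

/* Hypothetical model of a user buffer with per-page residency. */
struct ubuf {
	char *data;
	size_t len;
	bool *resident;		/* one flag per page of data */
};

/* First attempt: faults are "disabled", so any non-resident page fails. */
static int
copy_nofault(struct ubuf *u, const char *src, size_t n)
{
	for (size_t off = 0; off < n; off += PAGE_SZ)
		if (!u->resident[off / PAGE_SZ])
			return EFAULT;
	memcpy(u->data, src, n);
	return 0;
}

/* Second attempt: fault in and hold a bounded chunk of pages, then copy. */
static void
copy_held_chunks(struct ubuf *u, const char *src, size_t n)
{
	const size_t chunk_max = (size_t)CHUNK_PAGES * PAGE_SZ;

	for (size_t off = 0; off < n; off += chunk_max) {
		size_t chunk = n - off < chunk_max ? n - off : chunk_max;

		/* Models faulting in and holding the pages of this chunk. */
		for (size_t p = off; p < off + chunk; p += PAGE_SZ)
			u->resident[p / PAGE_SZ] = true;
		memcpy(u->data + off, src + off, chunk);
	}
}

/* Returns true if the slow (held pages) path had to be used. */
static bool
io_copyout(struct ubuf *u, const char *src, size_t n)
{
	assert(n <= u->len);
	if (copy_nofault(u, src, n) == 0)
		return false;
	copy_held_chunks(u, src, n);
	return true;
}
```

Because the slow path works chunk by chunk, nothing in this model keeps the whole request atomic, which is the gap the commit message says rangelocks are meant to close.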
Konstantin Belousov
8f0e91308a Add a rangelock implementation, intended for range-locking
the i/o regions of the vnode data space. The implementation is quite
simple-minded: it uses a list of the lock requests, ordered by
arrival time. Each request may be for read or for write. The
implementation is fair FIFO.

MFC after:     2 months
2012-05-30 16:06:38 +00:00
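
A fair FIFO range lock of the kind described in 8f0e91308a can be sketched as follows. This is a single-threaded model under assumptions, not the kernel code: all names (`rl_queue`, `rl_enqueue()`, `rl_release()`, `rl_granted()`, `RL_MAX`) are hypothetical, and sleeping and wakeups are replaced by a `granted` flag. A request is granted once no earlier request in arrival order overlaps it with at least one of the two being a write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RL_MAX 16	/* hypothetical fixed capacity, to keep the sketch short */

struct rl_req {
	int id;
	long start, end;	/* half-open byte range [start, end) */
	bool write;
	bool granted;
};

struct rl_queue {
	struct rl_req reqs[RL_MAX];	/* kept in arrival order */
	size_t n;
	int next_id;
};

/* Two requests conflict if their ranges overlap and either is a write. */
static bool
rl_conflict(const struct rl_req *a, const struct rl_req *b)
{
	bool overlap = a->start < b->end && b->start < a->end;

	return overlap && (a->write || b->write);
}

/* Grant every request that conflicts with no earlier request in the queue. */
static void
rl_regrant(struct rl_queue *q)
{
	for (size_t i = 0; i < q->n; i++) {
		bool ok = true;

		for (size_t j = 0; j < i && ok; j++)
			ok = !rl_conflict(&q->reqs[j], &q->reqs[i]);
		if (ok)
			q->reqs[i].granted = true;
	}
}

/* Queue a request at the tail and return its id. */
static int
rl_enqueue(struct rl_queue *q, long start, long end, bool write)
{
	struct rl_req *r;

	assert(q->n < RL_MAX && start < end);
	r = &q->reqs[q->n++];
	r->id = q->next_id++;
	r->start = start;
	r->end = end;
	r->write = write;
	r->granted = false;
	rl_regrant(q);
	return r->id;
}

static bool
rl_granted(const struct rl_queue *q, int id)
{
	for (size_t i = 0; i < q->n; i++)
		if (q->reqs[i].id == id)
			return q->reqs[i].granted;
	return false;
}

/* Remove a request, preserving arrival order, and regrant the rest. */
static void
rl_release(struct rl_queue *q, int id)
{
	for (size_t i = 0; i < q->n; i++) {
		if (q->reqs[i].id != id)
			continue;
		for (size_t j = i + 1; j < q->n; j++)
			q->reqs[j - 1] = q->reqs[j];
		q->n--;
		break;
	}
	rl_regrant(q);
}
```

Checking against all earlier requests, including ones still waiting, is what makes the model fair: a reader that arrives after a blocked writer waits behind it instead of overtaking it.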
Konstantin Belousov
6c5d7af158 Assert that TDP_NOFAULTING and TDP_NOSLEEPING thread flags do not leak
when thread returns from a syscall to usermode.

Tested by:	pho
MFC after:	1 week
2012-05-30 13:44:42 +00:00
Rafal Jaworowski
17f4cae4a5 Let us manage differences of Book-E PowerPC variations i.e. vendor /
implementation specific vs. the common architecture definition.

Bring PPC4XX defines (PSL, SPR, TLB). Note the new definitions under
BOOKE_PPC4XX are not used in the code yet.

This change set is not supposed to affect existing E500 support, it's just
another reorg step before bringing support for E500mc, E5500 and PPC465.

Obtained from:	AppliedMicro, Freescale, Semihalf
2012-05-27 10:25:20 +00:00
Konstantin Belousov
371778a333 Fix ki_cow for compat32 binaries.
MFC after:	3 days
2012-05-27 05:24:53 +00:00
Konstantin Belousov
9768156746 Stop treating td_sigmask specially for the purposes of new thread
creation. Move it into the copied region of the struct thread.

Update some comments.

Requested by:	bde
X-MFC after:	never
2012-05-26 20:03:47 +00:00
Konstantin Belousov
292520f710 Add a vn_bmap_seekhole(9) vnode helper which can be used by any
filesystem which supports VOP_BMAP(9) to implement SEEK_HOLE/SEEK_DATA
commands for lseek(2).

MFC after:	2 weeks
2012-05-26 05:28:47 +00:00
Ed Schouten
4412ad4887 Regenerate system call tables.
2012-05-25 21:52:57 +00:00
Ed Schouten
520b6a84f6 Remove use of non-ISO-C integer types from system call tables.
These files already use ISO-C-style integer types, so make them less
inconsistent by preferring the standard types.
2012-05-25 21:50:48 +00:00
Andriy Gapon
fa44da0995 device_add_child: protect against child device with no driver but fixed unit number
This combination doesn't make sense, unit numbers should be hardwired
only in context of a known driver.  The wildcard devices should have
wildcard unit numbers.

Reviewed by:	jhb
MFC after:	2 weeks
2012-05-25 07:32:26 +00:00
Alexander Motin
20654f4ef4 MFprojects/zfsd:
Hide warning behind bootverbose. Average user has nothing to do about it.
2012-05-24 11:24:44 +00:00
Gleb Kurtsou
76dcec5d09 Add kern_fhstat(), adjust sys_fhstat() to use it.
Extend kern_getdirentries() to accept uio segflag and optionally return
buffer residue.

Sponsored by:	Google Summer of Code 2011
2012-05-24 08:00:26 +00:00
Konstantin Belousov
4d34e019c4 Calculate the count of per-process cow faults. Export the count to
userspace using the obscure spare int field in struct kinfo_proc.

Submitted by:	Andrey Zonov <andrey zonov org>
MFC after:	1 week
2012-05-23 18:10:54 +00:00
Edward Tomasz Napierala
1fb2497499 Fix use-after-free in kern_jail_set() triggered e.g. by attempts
to clear "persist" flag from empty persistent jail, like this:

jail -c persist=1
jail -n 1 -m persist=0

Submitted by:	Mateusz Guzik <mjguzik at gmail dot com>
MFC after:	2 weeks
2012-05-22 19:43:20 +00:00
Edward Tomasz Napierala
e30345e790 Don't leak locks in prison_racct_modify().
Submitted by:	Mateusz Guzik <mjguzik at gmail dot com>
MFC after:	2 weeks
2012-05-22 17:30:02 +00:00
Edward Tomasz Napierala
ab27d5d88a Fix panic with RACCT that could occur in low memory (or out of swap)
situations, due to fork1() calling racct_proc_exit() without calling
racct_proc_fork() first.

Submitted by:	Mateusz Guzik <mjguzik at gmail dot com> (earlier version)
Reviewed by:	Mateusz Guzik <mjguzik at gmail dot com>
2012-05-22 15:58:27 +00:00
Hartmut Brandt
ac6e25ec7d Make dumptid non-static. It is used by libkvm to detect whether
this is a VNET-kernel or not. gcc used to put the static symbol into
the symbol table, clang does not. This fixes the 'netstat: no namelist'
error seen on clang+VNET systems.
2012-05-22 07:23:41 +00:00
Alexander V. Chernikov
afa85850e7 Fix old panic when BPF consumer attaches to destroying interface.
'flags' field is added to the end of bpf_if structure. Currently the only
flag is BPFIF_FLAG_DYING which is set on bpf detach and checked by bpf_attachd()
Problem can be easily triggered on SMP stable/[89] by the following command (sort of):
'while true; do ifconfig vlan222 create vlan 222 vlandev em0 up ; tcpdump -pi vlan222 & ; ifconfig vlan222 destroy ; done'

Fix possible use-after-free when BPF detaches itself from interface, freeing bpf_bif memory,
while interface is still UP and there can be routes via this interface.
Freeing is now delayed till ifnet_departure_event is received via eventhandler(9) api.

Convert bpfd rwlock back to mutex due to lack of performance gain (currently checking if packet
matches filter is done without holding bpfd lock and we have to acquire write lock if packet matches).

Approved by:      kib(mentor)
MFC in:            4 weeks
2012-05-21 22:17:29 +00:00
Mitsuru IWASAKI
e3fd0bc1b2 Add SMP/i386 suspend/resume support.
Most of it is merged from amd64.

- i386/acpica/acpi_wakecode.S
Replaced with amd64 code (from realmode to paging enabling code).

- i386/acpica/acpi_wakeup.c
Replaced with amd64 code (except for wakeup_pagetables stuff).

- i386/include/pcb.h
- i386/i386/genassym.c
Added new PCB members (CR0, CR2, CR4, DS, ES, FS, SS, GDT, IDT, LDT
and TR) needed for suspend/resume, not for context switch.

- i386/i386/swtch.s
Added suspendctx() and resumectx().
Note that savectx() was not changed and is not used for suspending (while
amd64 code uses it).
BSP and AP execute the same sequence, suspendctx(), acpi_wakecode()
and resumectx() for suspend/resume (in case of UP system also).

- i386/i386/apic_vector.s
Added cpususpend().

- i386/i386/mp_machdep.c
- i386/include/smp.h
Added cpususpend_handler().

- i386/include/apicvar.h
- kern/subr_smp.c
- sys/smp.h
Added IPI_SUSPEND and suspend_cpus().

- i386/i386/initcpu.c
- i386/i386/machdep.c
- i386/include/md_var.h
- pc98/pc98/machdep.c
Moved initializecpu() declarations to md_var.h.

MFC after:	3 days
2012-05-18 18:55:58 +00:00
Gleb Kurtsou
ac13a90c4b Skip directory entries with zero inode number during traversal.
Entries with zero inode number are considered placeholders by libc and
UFS.  Fix remaining uses of VOP_READDIR in kernel: vop_stdvptocnp,
unionfs.

Sponsored by:	Google Summer of Code 2011
2012-05-16 10:44:09 +00:00
Sergey Kandaurov
2aaae99d96 Fix typo in function name SDT_PROBE4 and unbreak 4BSD UP.
2012-05-15 10:58:17 +00:00