13870 Commits

Author SHA1 Message Date
kib
d972eee1e7 Fix fcntl(2) compat32 after r270691. The copyin and copyout of
struct flock were done in sys_fcntl(), which meant that compat32 code
accessed userland pointers directly.

Move the code from sys_fcntl() into a new wrapper, kern_fcntl_freebsd(),
which performs the necessary userland memory accesses, and use it from
both the native and compat32 fcntl syscalls.

Reported by:	jhibbits
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2014-09-25 21:07:19 +00:00
kib
0529718f1d In kern_linkat() and kern_renameat(), do not call namei(9) while
holding a write reference on the filesystem.  Try to obtain the write
reference in a non-blocking way after all vnodes are resolved; if that
fails, drop all locks and retry after waiting for the suspension to end.

The VFS_UNMOUNT() methods for UFS and tmpfs try to establish
suspension on unmount while the covered vnode is locked by the VFS,
which prevents namei() from stepping over the mount point.  The thread
doing namei() sleeps on the covered vnode lock while owning the write ref.

Reported by:	bdrewery
Tested by:	bdrewery (previous version), pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-09-25 20:42:25 +00:00
jhibbits
6208989a41 Stage one of multipass suspend/resume
Summary:
Add the beginnings of multipass suspend/resume, by introducing
BUS_SUSPEND_CHILD/BUS_RESUME_CHILD, and move the PCI driver to this.

Reviewers: jhb

Reviewed By: jhb

Differential Revision: https://reviews.freebsd.org/D590
2014-09-23 02:56:40 +00:00
jhb
8f082668d0 Add a new fo_fill_kinfo fileops method to add type-specific information to
struct kinfo_file.
- Move the various fill_*_info() methods out of kern_descrip.c and into the
  various file type implementations.
- Rework the support for kinfo_ofile to generate a suitable kinfo_file object
  for each file and then convert that to a kinfo_ofile structure rather than
  keeping a second, different set of code that directly manipulates
  type-specific file information.
- Remove the shm_path() and ksem_info() layering violations.

Differential Revision:	https://reviews.freebsd.org/D775
Reviewed by:	kib, glebius (earlier version)
2014-09-22 16:20:47 +00:00
jhb
d08fb7f877 Convert from timeout(9) to callout(9).
2014-09-22 14:27:26 +00:00
hselasky
bdacf9ba4d Improve the TCP segmentation offload (TSO) limitation algorithm in
general. The current TSO limitation feature only takes the total number
of bytes in an mbuf chain into account and does not limit the number
of mbufs in a chain. Some kinds of hardware are limited by two
factors: one is the fragment length and the other is the fragment
count. Both of these limits need to be taken into account when doing
TSO. Otherwise some kinds of hardware might have to drop perfectly valid
mbuf chains because they cannot be loaded into the given hardware's DMA
engine. Following input from other FreeBSD developers, the new way of
doing TSO limitation has been made backwards compatible and will use
defaults for values that are not set.

Reviewed by:	adrian, rmacklem
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2014-09-22 08:27:27 +00:00
sbruno
3118b08dd6 svn revisions r269964 and r269963 seem to have impaired small memory
footprint systems (32M/64M) and did not leave enough free memory to load
modules, because they set up page tables for sizes that are never used on
these small boards.

Set kmem_zmax to PAGE_SIZE on these smaller systems (< 128M) to keep this
from happening. Verified on mips32 h/w.

PR:             193465
Submitted by:   delphij
Reviewed by:    adrian
2014-09-22 05:07:22 +00:00
mav
d4e6695660 Rephrase r271616 comments.
Submitted by:	alc
MFC after:	1 month
2014-09-17 17:43:32 +00:00
adrian
e4c630d701 Migrate ie->ie_assign_cpu and associated code to use an int for CPU rather
than u_char.

Migrate post_filter to use an int for a CPU rather than u_char.

Change intr_event_bind() to use an int for CPU rather than u_char.

It touches the ppc, sparc64, arm and mips machdep code but it should
(hah!) be a no-op.

Tested:

* i386, AMD64 laptops

Reviewed by:	jhb
2014-09-17 17:33:22 +00:00
adrian
d3fedbed40 Modify cpuset_setithread() to take a CPU ID as an integer, not a char.
We're going to end up having > 254 CPUs at some point.
2014-09-16 01:21:47 +00:00
ngie
356c289c25 Validate the mode argument in access, eaccess, and faccessat for optional
POSIX compliance and to improve compatibility with Linux and NetBSD

The issue was identified with lib/libc/sys/t_access:access_inval from
NetBSD

Update the manpage accordingly

PR: 181155
Reviewed by: jilles (code), jmmv (code), wblock (manpage), wollman (code)
MFC after: 4 weeks
Phabric: D678 (code), D786 (manpage)
Sponsored by: EMC / Isilon Storage Division
2014-09-16 00:56:47 +00:00
mav
023a2a140b Add comments describing r271604 change.
MFC after:	3 days
2014-09-15 11:17:36 +00:00
mav
1cfaa7d62b Add a couple of memory barriers to serialize tdq_cpu_idle and tdq_load
accesses. This change fixes transient performance drops in some of my
benchmarks, which vanished as soon as I tried to collect any stats from the
scheduler. It looks like reordered access to those variables sometimes
caused the loss of IPI_PREEMPT, which delayed thread execution until some
later interrupt.

MFC after:	3 days
2014-09-14 22:13:19 +00:00
melifaro
0a46d9d7d5 Fix error handling in cpuset_setithread() introduced in r267716.
Noted by:	kib
MFC after:	1 week
2014-09-13 13:46:16 +00:00
jhb
4cd91e9d81 Fix various issues with invalid file operations:
- Add invfo_rdwr() (for read and write), invfo_ioctl(), invfo_poll(),
  and invfo_kqfilter() for use by file types that do not support the
  respective operations.  Home-grown versions of invfo_poll() were
  universally broken (they returned an errno value; invfo_poll()
  uses poll_no_poll() to return an appropriate event mask).  Home-grown
  ioctl routines also tended to return an incorrect errno (invfo_ioctl
  returns ENOTTY).
- Use the invfo_*() functions instead of local versions for
  unsupported file operations.
- Reorder fileops members to match the order in the structure definition
  to make it easier to spot missing members.
- Add several missing methods to linuxfileops used by the OFED shim
  layer: fo_write(), fo_truncate(), fo_kqfilter(), and fo_stat().  Most
  of these used invfo_*(), but a dummy fo_stat() implementation was
  added.
2014-09-12 21:29:10 +00:00
jhb
94540ef6a5 Tweak pipe_truncate() to more closely match pipe_chown() and pipe_chmod()
by checking PIPE_NAMED and using invfo_truncate() for unnamed pipes.
2014-09-12 21:20:36 +00:00
jhb
a17a2d5156 Simplify vntype_to_kinfo() by returning as soon as the desired value
is found, instead of breaking out of the loop and then checking the loop
index to determine whether a match was found and the proper value can be
returned.

While here, use nitems().
2014-09-12 20:56:09 +00:00
glebius
5939c729a8 Remove unused arguments for VOP_GETPAGES(), VOP_PUTPAGES().
2014-09-10 12:36:41 +00:00
trasz
76b4ee1f43 Avoid unlocking an unlocked mutex in the RCTL jail code. A specific
test case is attached to the PR.

PR:		193457
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2014-09-09 16:05:33 +00:00
hrs
a8a6b19fee - Make hhook_run_socket() vnet-aware instead of adding CURVNET_SET() around
the function calls.
- Fix a memory leak and the stats accounting when hhook_run_socket()
  fails in soalloc().

PR:	193265
2014-09-08 09:04:22 +00:00
dumbbell
af6fb4b67f pause_sbt(): Take the cold path (i.e., use DELAY()) if KDB is active
This fixes a panic in the i915 driver when one uses debug.kdb.enter=1
under vt(4).

PR:		193269
Reported by:	emaste@
Submitted by:	avg@
MFC after:	3 days
2014-09-08 08:44:50 +00:00
glebius
bd6b462b17 Fix for r271182.
Submitted by:	mjg
Pointy hat to:	me, submitter and everyone who urged me to commit
2014-09-07 05:44:14 +00:00
mjg
64b244d971 Remove unnecessary fp assignments in kern_fcntl.
No functional changes.
2014-09-05 23:56:25 +00:00
glebius
8122a06e8c Set vnet context before accessing V_socket_hhh[].
Submitted by:	"Hiroo Ono (小野寛生)" <hiroo.ono+freebsd gmail.com>
2014-09-05 19:50:18 +00:00
sbruno
aac1b9fa39 Allow multiple image activators to run on the same execution by changing
imgp->interpreted to a bitmask instead of, functionally, a bool. Each
imgactivator now requires its own flag in interpreted to indicate whether
or not it has already examined argv[0].

Change imgp->interpreted to an unsigned char to add one extra bit for
future use.

With this change, one can execute a shell script from a 64bit host native
make and still get the binmisc image activator to fire for the script
interpreter.  Prior to this, execution would fail.

Phabric:	https://reviews.freebsd.org/D696
Reviewed by:	jhb@
MFC after:	4 weeks
2014-09-04 21:31:25 +00:00
glebius
0e01998d71 Change some very strange code in m_demote() to a simple assertion.
Sponsored by:	Nginx, Inc.
2014-09-04 19:27:30 +00:00
glebius
9bd9cb7c21 Provide m_catpkt(), a wrapper around m_cat() that deals with M_PKTHDR mbufs.
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-04 09:07:14 +00:00
mjg
9877edbb6d Plug a hypothetical use after free in sysctl kern.proc.groups.
MFC after:	1 week
2014-09-04 01:21:33 +00:00
benno
a11ed5d080 Add KASSERTs to catch the case where a developer may have forgotten to
set bo_bsize on a bufobj.

This is a slight modification of the patch provided.

PR:		193146
Submitted by:	Conrad Meyer <conrad.meyer@isilon.com>
Sponsored by:	EMC Isilon Storage Division
2014-09-04 00:10:06 +00:00
kib
f6ccc75aa8 Style.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-09-03 08:40:16 +00:00
kib
7e3c6a1f22 Retire thread_unthread(), it has only one caller. Update comment in
the block of code before the previous call to thread_unthread().

Discussed with:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-09-03 08:35:42 +00:00
kib
401c0a1c8f Right now, thread_single(SINGLE_EXIT) returns after the p_numthreads
reaches 1. The p_numthreads counter is decremented in thread_exit() by
a call to thread_unlink(). This means that the exiting threads may
still execute on other CPUs when thread_single(SINGLE_EXIT) returns.
As a result, the vmspace could be destroyed while paging structures are
still used on other CPUs by exiting threads.

Delay the return from thread_single(SINGLE_EXIT) until all threads are
really destroyed by thread_stash() after the last switch out. The
p_exitthreads counter already provides the required mechanism; move
the wait from thread_wait() (which is called from the wait(2) code)
into thread_single().

Reported by:	many (as "panic: pmap active <addr>")
Reviewed by:	alc, jhb
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-09-03 08:18:07 +00:00
glebius
e58a48ab6d Fix dereference after NULL check.
CID:		1234607
Sponsored by:	Nginx, Inc.
2014-09-03 08:14:07 +00:00
mjg
25686bf39b Fix up proc_realparent to always return correct process.
Prior to the change it would always return initproc for non-traced processes.

This fixes ps apparently always returning 1 as ppid.

Pointy hat:	mjg
Reported by:	many
MFC after:	1 week
2014-09-03 06:25:34 +00:00
alc
16a3d82ef4 Automatically prefault a limited number of mappings to resident pages in
shmat(2), just like mmap(2).

MFC after:	5 days
Sponsored by:	EMC / Isilon Storage Division
2014-08-31 17:38:41 +00:00
mjg
4cf719a9ee Add missing proctree locking to fill_kinfo_proc consumers.
This fixes r270444.

Pointy hat:	mjg
Reported by:	many
MFC after:	1 week
2014-08-30 03:10:55 +00:00
andreast
d50aa761e5 Rename shm_dict_init to shm_init to fix a compiler warning.
Reviewed by:	jhb
2014-08-29 21:50:32 +00:00
jhb
3314eb81f3 Use a unit number allocator to provide suitable st_dev and st_ino values
for POSIX shared memory descriptors.  The implementation is similar to
that used for pipes.

MFC after:	1 week
2014-08-29 18:18:29 +00:00
kib
d7608dcfd6 Add a function and a wrapper to switch lockmgr and vnode locks back to
auto-promotion of shared to exclusive.

Tested by:	hrs, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-08-29 09:02:01 +00:00
mjg
ec92f2e61c Return real parent pid in kinfo (used by e.g. ps)
Add a separate field which exports tracer pid and add a new keyword
("tracer") for ps to display it.

This is a follow up to r270444.

Reviewed by:	kib
MFC after:	1 week
Relnotes:	yes
2014-08-28 08:41:11 +00:00
dumbbell
afc6b94804 vt(4): Add cngrab() and cnungrab() callbacks
They are used, for instance, when a panic occurs or when entering a DDB
session.

cngrab() forces a vt-switch to the console window, regardless of whether
the original window is another terminal or an X session. However,
cnungrab() currently does not vt-switch back to the original window.

MFC after:	1 week
2014-08-27 10:04:10 +00:00
glebius
1ac724b05e - Remove socket file operations declaration from sys/file.h.
- Make them static in sys_socket.c.
- Provide generic invfo_truncate() instead of soo_truncate().

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-08-26 14:44:08 +00:00
mjg
c5c0a26f76 Fix up races with f_seqcount handling.
It was possible that the kernel would overwrite a user-supplied hint.

Abuse vnode lock for this purpose.

In collaboration with: kib
MFC after:	1 week
2014-08-26 08:17:22 +00:00
kib
921aca5fb8 Revert the handling of all siginfo sa_flags except SA_SIGINFO to the
pre-r270321 behaviour.  Namely, the flags are preserved for SIG_DFL and SIG_IGN
dispositions.

Requested and reviewed by:	jilles
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-08-24 16:37:50 +00:00
mjg
b110d1e264 Plug a memory leak in case of failed lookups in capability mode.
Put common cnp cleanup into one function and use it for this purpose.

MFC after:	1 week
2014-08-24 12:51:12 +00:00
mjg
a06f202015 Use refcount_init in sigacts_alloc.
This change is a no-op, but fixes up an inconsistency introduced with
r268634.

MFC after:	3 days
2014-08-24 09:24:37 +00:00
mjg
9405234344 Fix getppid for traced processes.
Traced processes always have the tracer set as the parent.
Utilize proc_realparent to obtain the right process when needed.

Reviewed by:	kib
MFC after:	1 week
2014-08-24 09:04:09 +00:00
mjg
eda187d10a Properly reparent traced processes when the tracer dies.
Previously they were unconditionally reparented to init. In effect
it was possible that the tracee was never returned to its original parent.

Reviewed by:	kib
MFC after:	1 week
2014-08-24 09:02:16 +00:00
mav
f7d8522961 Restore the pre-r239157 handling of sched_yield(), in which the
thread's time slice was aborted, allowing other threads to run.  Without
this change the thread is just rescheduled again, as illustrated by the
provided test tool.

PR:		192926
Submitted by:	eric@vangyzen.net
MFC after:	2 weeks
2014-08-23 17:31:56 +00:00
kib
fbb1e58092 In do_lock_pi(), do not override the error from umtxq_sleep_pi() when
doing the suspend check.  This restores the pre-r251684 behaviour of
retrying once after the signal is detected.

PR:	kern/192918
Submitted by:	Elliott Rabe, Dell Inc., Eric van Gyzen <eric@vangyzen.net>
Obtained from:	Dell Inc.
MFC after:	1 week
2014-08-22 18:42:14 +00:00