7365 Commits

Author SHA1 Message Date
Attilio Rao
bc2258da88 Complete MPSAFE VFS interface and remove MNTK_MPSAFE flag.
Porters should refer to __FreeBSD_version 1000021 for this change as
it may have happened at the same timeframe.
2012-11-09 18:02:25 +00:00
Brooks Davis
5b6478b077 After years of hard work by many FreeBSD and LLVM developers, make
clang the default compiler on i386 and amd64 systems.

Special thanks to:	dim, ed, rdivacky
2012-11-05 19:08:18 +00:00
Hans Petter Selasky
d30d96ea57 Add a jitter buffer in the common USB serial driver code which
temporarily stores characters if the TTY buffer is full when
used a as a console. This can happen when a console is suspended.
Also properly do the flow stop signalling when this happens and
flow start when the condition changes back to normal again.

Bump __FreeBSD_version to force external kernel modules
to be recompiled. No kernel API changes.

MFC after:	1 week
Suggested by:	ed @
2012-11-05 17:50:40 +00:00
Ed Schouten
305921c48e Add tty_set_winsize().
This removes some of the signalling magic from the Syscons driver and
puts it in the TTY layer, where it belongs.
2012-11-03 22:21:37 +00:00
Attilio Rao
19d4153329 Merge r242395,242483 from mutex implementation:
give rwlock(9) the ability to crunch different type of structures, with
the only constraint that they have a lock cookie named rw_lock.
This name, then, becames reserved from the struct that wants to use
the rwlock(9) KPI and other locking primitives cannot reuse it for
their members.

Namely such structs are the current struct rwlock and the new struct
rwlock_padalign. The new structure will define an object which has the
same layout of a struct rwlock but will be allocated in areas aligned
to the cache line size and will be as big as a cache line.

For further details check comments on above mentioned revisions.

Reviewed by:	jimharris, jeff
2012-11-03 15:57:37 +00:00
Attilio Rao
e10acbc4d2 Tweak comment to make more clear why it will fail.
Submitted by:	jimharris
2012-11-02 16:31:01 +00:00
Alfred Perlstein
bad7e7f3dd Provide a device name in the sysctl tree for programs to query the
state of crashdump target devices.

This will be used to add a "-l" (ell) flag to dumpon(8) to list the
currently configured dumpdev.

Reviewed by:	phk
2012-11-01 17:01:05 +00:00
Attilio Rao
7f44c61839 Give mtx(9) the ability to crunch different type of structures, with the
only constraint that they have a lock cookie named mtx_lock.
This name, then, becames reserved from the struct that wants to use the
mtx(9) KPI and other locking primitives cannot reuse it for their
members.

Namely such structs are the current struct mtx and the new
struct mtx_padalign.  The new structure will define an object which is
the same as the same layout of a struct mtx but will be allocated in
areas aligned to the cache line size and will be as big as a cache line.

This is supposed to give higher performance for highly contented mutexes
both spin or sleep (because of the adaptive spinning), where the cache
line contention results in too much traffic on the system bus.

The struct mtx_padalign can be used in a completely transparent way
with the mtx(9) KPI.

At the moment, a possibility to MFC the patch should be carefully
evaluated because this patch breaks the low level KPI
(not its representation though).

Discussed with:	jhb
Reviewed by:	jeff, andre
Reviewed by:	mdf (earlier version)
Tested by:	jimharris
2012-10-31 13:38:56 +00:00
Attilio Rao
d6073f0627 Compiler have a precise knowledge of the content of sched_pin() and
sched_unpin() as they are functions static and inline.  This way it
can do two dangerous things:
- Reorder instructions around both of them, taking out from the safe
  path operations that are supposed to be (ie. per-cpu accesses)
- Cache the value of td_pinned in CPU registers not making visible
  in kernel context to the scheduler once it is scanning the runqueue,
  as td_pinned is not marked volatile.

In order to avoid both possible bugs explicitly, protect the safe path
with compiler memory barriers. This will prevent reordering and caching
by the compiler about td_pinned operations.

Generally this could lead to suboptimal code traversing the pinnings
but this is not the case as can be easilly verified:
http://lists.freebsd.org/pipermail/svn-src-projects/2012-October/005797.html

Discussed with:	jeff, jhb
MFC after:	2 weeks
2012-10-29 01:35:17 +00:00
Edward Tomasz Napierala
f1988d463c Fix two problems that caused instant panic when the device mounted
with softupdates went away.  Note that this does not fix the problem
entirely; I'm committing it now to make it easier for someone to pick
up the work.

Reviewed by:	mckusick
2012-10-28 18:53:28 +00:00
Edward Tomasz Napierala
36af98697d Add CPU percentage limit enforcement to RCTL. The resouce name is "pcpu".
It was implemented by Rudolf Tomori during Google Summer of Code 2012.
2012-10-26 16:01:08 +00:00
Ed Schouten
1da7bb41ed Correct SIGTTIN handling.
In the old TTY layer, SIGTTIN was correctly handled like this:

	while (data should be read) {
		send SIGTTIN if not foreground process group
		read data
	}

In the new TTY layer, however, this behaviour was changed, based on a
false interpretation of the standard:

	send SIGTTIN if not foreground process group
	while (data should be read) {
		read data
	}

Correct this by pushing tty_wait_background() into the ttydisc_read_*()
functions.

Reported by:	koitsu
PR:		kern/173010
MFC after:	2 weeks
2012-10-25 09:05:21 +00:00
Gleb Smirnoff
8f134647ca Switch the entire IPv4 stack to keep the IP packet header
in network byte order. Any host byte order processing is
done in local variables and host byte order values are
never[1] written to a packet.

  After this change a packet processed by the stack isn't
modified at all[2] except for TTL.

  After this change a network stack hacker doesn't need to
scratch his head trying to figure out what is the byte order
at the given place in the stack.

[1] One exception still remains. The raw sockets convert host
byte order before pass a packet to an application. Probably
this would remain for ages for compatibility.

[2] The ip_input() still subtructs header len from ip->ip_len,
but this is planned to be fixed soon.

Reviewed by:	luigi, Maxim Dounin <mdounin mdounin.ru>
Tested by:	ray, Olivier Cochard-Labbe <olivier cochard.me>
2012-10-22 21:09:03 +00:00
Konstantin Belousov
8859ec84c5 Bump __FreeBSD_version and make a note in UPDATING about removal of
the support for non-MPSAFE filesystems.
2012-10-22 17:54:32 +00:00
Konstantin Belousov
5050aa86cf Remove the support for using non-mpsafe filesystem modules.
In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.

Conducted and reviewed by:	attilio
Tested by:	pho
2012-10-22 17:50:54 +00:00
Eitan Adler
078d07726f Fix build if COMPAT_43 is defined without one of
COMPAT_FREEBSD[4567]

Approved by:	cperciva
2012-10-22 02:59:55 +00:00
Sean Bruno
fabe02f5f3 Update hwpmc to support the Xeon class of Sandybridge processors.
(Model 0x2D     /* Per Intel document 253669-044US 08/2012. */)

Add manpage to document all the goodness that is available in this
processor model.

No support for uncore events at this time.

Submitted by:	hiren panchasara <hiren.panchasara@gmail.com>
Reviewed by:	jimharris@ fabient@
Obtained from:	Yahoo! Inc.
MFC after:	  2 weeks
2012-10-19 17:01:27 +00:00
Andre Oppermann
a8c9f6fd75 Remove splimp() comment from sysinit table and attribute SI_SUB_PROTO_BEGIN
and SI_SUB_PROTO_END to VNET related initializations.

MFC after:	3 days
2012-10-19 10:04:43 +00:00
Attilio Rao
2e564269d0 Disconnect non-MPSAFE SMBFS from the build in preparation for dropping
GIANT from VFS. In addition, disconnect also netsmb, which is a base
requirement for SMBFS.

In the while SMBFS regular users can use FUSE interface and smbnetfs
port to work with their SMBFS partitions.

Also, there are ongoing efforts by vendor to support in-kernel smbfs,
so there are good chances that it will get relinked once properly locked.

This is not targeted for MFC.
2012-10-18 12:04:56 +00:00
Gleb Smirnoff
42a58907c3 Make the "struct if_clone" opaque to users of the cloning API. Users
now use function calls:

  if_clone_simple()
  if_clone_advanced()

to initialize a cloner, instead of macros that initialize if_clone
structure.

Discussed with:		brooks, bz, 1 year ago
2012-10-16 13:37:54 +00:00
Konstantin Belousov
9b233e2307 Add a KPI to allow to reserve some amount of space in the numvnodes
counter, without actually allocating the vnodes. The supposed use of
the getnewvnode_reserve(9) is to reclaim enough free vnodes while the
code still does not hold any resources that might be needed during the
reclamation, and to consume the slack later for getnewvnode() calls
made from the innards. After the critical block is finished, the
caller shall free any reserve left, by getnewvnode_drop_reserve(9).

Reviewed by:	avg
Tested by:	pho
MFC after:	1 week
2012-10-14 19:43:37 +00:00
Attilio Rao
3a4730256a Add an unified macro to deny ability from the compiler to reorder
instruction loads/stores at its will.
The macro __compiler_membar() is currently supported for both gcc and
clang, but kernel compilation will fail otherwise.

Reviewed by:	bde, kib
Discussed with:	dim, theraven
MFC after:	2 weeks
2012-10-09 14:32:30 +00:00
Gleb Smirnoff
21d172a3f1 A step in resolving mess with byte ordering for AF_INET. After this change:
- All packets in NETISR_IP queue are in net byte order.
  - ip_input() is entered in net byte order and converts packet
    to host byte order right _after_ processing pfil(9) hooks.
  - ip_output() is entered in host byte order and converts packet
    to net byte order right _before_ processing pfil(9) hooks.
  - ip_fragment() accepts and emits packet in net byte order.
  - ip_forward(), ip_mloopback() use host byte order (untouched actually).
  - ip_fastforward() no longer modifies packet at all (except ip_ttl).
  - Swapping of byte order there and back removed from the following modules:
    pf(4), ipfw(4), enc(4), if_bridge(4).
  - Swapping of byte order added to ipfilter(4), based on __FreeBSD_version
  - __FreeBSD_version bumped.
  - pfil(9) manual page updated.

Reviewed by:	ray, luigi, eri, melifaro
Tested by:	glebius (LE), ray (BE)
2012-10-06 10:02:11 +00:00
Andriy Gapon
102548d143 mount.h: MNTK_VGONE_UPPER and MNTK_VGONE_WAITER were supposed to be different
... otherwise a waiter is never woken up.

Reported by:	swills
Discussed with:	jhb
Approved by:	kib
MFC after:	3 days
2012-10-05 14:42:38 +00:00
Tijl Coosemans
9cdf77375c Define clang feature test macro __has_extension. It's used in stdatomic.h. 2012-10-04 08:53:05 +00:00
Pawel Jakub Dawidek
55711729f3 - Enforce CAP_MKFIFO on mkfifoat(2), not on mknodat(2). Without this change
mkfifoat(2) was not restricted.
- Introduce CAP_MKNOD and enforce it on mknodat(2).

Sponsored by:	FreeBSD Foundation
MFC after:	2 weeks
2012-10-01 05:43:24 +00:00
Gleb Smirnoff
063efed28c The drbr(9) API appeared to be so unclear, that most drivers in
tree used it incorrectly, which lead to inaccurate overrated
if_obytes accounting. The drbr(9) used to update ifnet stats on
drbr_enqueue(), which is not accurate since enqueuing doesn't
imply successful processing by driver. Dequeuing neither mean
that. Most drivers also called drbr_stats_update() which did
accounting again, leading to doubled if_obytes statistics. And
in case of severe transmitting, when a packet could be several
times enqueued and dequeued it could have been accounted several
times.

o Thus, make drbr(9) API thinner. Now drbr(9) merely chooses between
  ALTQ queueing or buf_ring(9) queueing.
  - It doesn't touch the buf_ring stats any more.
  - It doesn't touch ifnet stats anymore.
  - drbr_stats_update() no longer exists.

o buf_ring(9) handles its stats itself:
  - It handles br_drops itself.
  - br_prod_bytes stats are dropped. Rationale: no one ever
    reads them but update of a common counter on every packet
    negatively affects performance due to excessive cache
    invalidation.
  - buf_ring_enqueue_bytes() reduced to buf_ring_enqueue(), since
    we no longer account bytes.

o Drivers handle their stats theirselves: if_obytes, if_omcasts.

o mlx4(4), igb(4), em(4), vxge(4), oce(4) and  ixv(4) no longer
  use drbr_stats_update(), and update ifnet stats theirselves.

o bxe(4) was the most correct driver, it didn't call
  drbr_stats_update(), thus it was the only driver accurate under
  moderate load. Now it also maintains stats itself.

o ixgbe(4) had already taken stats from hardware, so just
  - drop software stats updating.
  - take multicast packet count from hardware as well.

o mxge(4) just no longer needs NO_SLOW_STATS define.

o cxgb(4), cxgbe(4) need no change, since they obtain stats
  from hardware.

Reviewed by:	jfv, gnn
2012-09-28 18:28:27 +00:00
Pawel Jakub Dawidek
45a1f1e1ff Add rounddown2() macro similar to the roundup2() macro. 2012-09-22 17:49:25 +00:00
Attilio Rao
6a612df12c Remove namespace pollution in _rmlock.h by defining rm_queue structure
directly in _rmlock.h and then including it (and its dependencies)
in pcpu.h. This leads to few _*.h headers to be included in pcpu.h
but this is not considered a big deal.

Really pc_rm_queue should be implemented as a dynamic member with
DPCPU interface, but we really want to keep the read acquisition as
fast as possible, so even the further pc_dynamic indirection should be
avoided, and the pollution is dealt like this.

Discussed with:	jhb
MFC after:	1 week
2012-09-18 00:43:15 +00:00
Ed Schouten
5b5d76847d Rename __member2struct() to __containerof().
Compared to __member2struct(), this macro has the following advantages:

- It ensures that the type of the pointer is compatible with the member
  field of the structure (or a void pointer).
- It works properly in combination with volatile and const, though
  unfortunately it drops these qualifiers from the returned value.

mdf@ proposed to add the container_of() macro, just like Linux has.
Eventually I decided against this, as <sys/param.h> is included all over
the place. It seems container_of() on Linux is specific to the kernel,
not userspace. I'd rather not pollute userspace with this.

I also thought about adding __container_of(), but this would have two
advantages. Xorg seems to already have a __container_of(), which is not
compatible with this version. Also, the underscore in the middle
conflicts with our existing macros (__offsetof, __rangeof, etc).

I'm changing member2struct() to use its old code, as the extra
strictness of this new macro conflicts with existing code (read: cxgb).

MFC after:	1 month
2012-09-13 08:13:01 +00:00
Ed Schouten
e48063402c Correctness: use __member2struct() on the correct fields.
The prev-pointers point to the next-pointers of the previous element --
not the ENTRY structure. The next-pointers are stored in the ENTRY
structures first, so the code would already work correctly. Still, it is
more accurate to use the next-fields.

To prevent misuse of __member2struct() in the future, I've got a patch
that requires the pointer to be passed to this macro to be compatible
with the member of the structure. I'll commit this patch after I've
tested it properly.

MFC after:	1 month.
2012-09-12 22:54:11 +00:00
Ed Schouten
4170b08388 Implement LIST_PREV().
Regular LISTs have been implemented in such a way that the prev-pointer
does not point to the previous element, but to the next-pointer stored
in the previous element. This is done to simplify LIST_REMOVE(). This
macro can be implemented without knowing the address of the list head.

Unfortunately this makes it harder to implement LIST_PREV(), which is
why this macro was never here. Still, it is possible to implement this
macro. If the prev-pointer points to the list head, we return NULL.
Otherwise we simply subtract the offset of the prev-pointer within the
structure.

It's not as efficient as traversing forward of course, but in practice
it shouldn't be that bad. In almost all use cases, people will want to
compare the value returned by LIST_PREV() against NULL, so an optimizing
compiler will not emit code that does more branching than TAILQs.

While there, make the code a bit more readable by introducing
__member2struct(). This makes STAILQ_LAST() far more readable.

MFC after:	1 month
2012-09-12 21:03:48 +00:00
Konstantin Belousov
bcd5bb8e57 Add a facility for vgone() to inform the set of subscribed mounts
about vnode reclamation. Typical use is for the bypass mounts like
nullfs to get a notification about lower vnode going away.

Now, vgone() calls new VFS op vfs_reclaim_lowervp() with an argument
lowervp which is reclaimed. It is possible to register several
reclamation event listeners, to correctly handle the case of several
nullfs mounts over the same directory.

For the filesystem not having nullfs mounts over it, the overhead
added is a single mount interlock lock/unlock in the vnode reclamation
path.

In collaboration with:	pho
MFC after:	3 weeks
2012-09-09 19:17:15 +00:00
Konstantin Belousov
84c3cd4f19 Add MNTK_LOOKUP_EXCL_DOTDOT struct mount flag, which specifies to the
lookup code that dotdot lookups shall override any shared lock
requests with the exclusive one. The flag is useful for filesystems
which sometimes need to upgrade shared lock to exclusive inside the
VOP_LOOKUP or later, which cannot be done safely for dotdot, due to
dvp also locked and causing LOR.

In collaboration with:	    pho
MFC after:	3 weeks
2012-09-09 19:11:52 +00:00
Gleb Smirnoff
d6d3f01e0a Merge the projects/pf/head branch, that was worked on for last six months,
into head. The most significant achievements in the new code:

 o Fine grained locking, thus much better performance.
 o Fixes to many problems in pf, that were specific to FreeBSD port.

New code doesn't have that many ifdefs and much less OpenBSDisms, thus
is more attractive to our developers.

  Those interested in details, can browse through SVN log of the
projects/pf/head branch. And for reference, here is exact list of
revisions merged:

r232043, r232044, r232062, r232148, r232149, r232150, r232298, r232330,
r232332, r232340, r232386, r232390, r232391, r232605, r232655, r232656,
r232661, r232662, r232663, r232664, r232673, r232691, r233309, r233782,
r233829, r233830, r233834, r233835, r233836, r233865, r233866, r233868,
r233873, r234056, r234096, r234100, r234108, r234175, r234187, r234223,
r234271, r234272, r234282, r234307, r234309, r234382, r234384, r234456,
r234486, r234606, r234640, r234641, r234642, r234644, r234651, r235505,
r235506, r235535, r235605, r235606, r235826, r235991, r235993, r236168,
r236173, r236179, r236180, r236181, r236186, r236223, r236227, r236230,
r236252, r236254, r236298, r236299, r236300, r236301, r236397, r236398,
r236399, r236499, r236512, r236513, r236525, r236526, r236545, r236548,
r236553, r236554, r236556, r236557, r236561, r236570, r236630, r236672,
r236673, r236679, r236706, r236710, r236718, r237154, r237155, r237169,
r237314, r237363, r237364, r237368, r237369, r237376, r237440, r237442,
r237751, r237783, r237784, r237785, r237788, r237791, r238421, r238522,
r238523, r238524, r238525, r239173, r239186, r239644, r239652, r239661,
r239773, r240125, r240130, r240131, r240136, r240186, r240196, r240212.

I'd like to thank people who participated in early testing:

Tested by:	Florian Smeets <flo freebsd.org>
Tested by:	Chekaluk Vitaly <artemrts ukr.net>
Tested by:	Ben Wilber <ben desync.com>
Tested by:	Ian FREISLICH <ianf cloudseed.co.za>
2012-09-08 06:41:54 +00:00
John Baldwin
b9b256e49a Remove NetBSD compat shims for drivers originally shared with NetBSD/pc98.
NetBSD/pc98 was never merged into the main NetBSD tree and is no longer
developed.  Adding locking to these drivers would have made the compat
shims hard to impossible to maintain, so remove the shims to ease
future changes.

These changes were verified by md5.  Some additional shims can be removed
that do affect the compiled results that I will probably do in another
round.

Approved by:	nyan (tentatively)
2012-09-06 18:53:33 +00:00
Fabien Thomas
1e862e5ad0 Add Intel Ivy Bridge support to hwpmc(9).
Update offcore RSP token for Sandy Bridge.
Note: No uncore support.

Will works on Family 6 Model 3a.

MFC after: 1 month
Tested by: bapt, grehan
2012-09-06 13:54:01 +00:00
Gleb Smirnoff
62208ca5d2 - Move jenkins.h to jenkins_hash.c
- Provide missing function that can do hashing of arbitrary sized buffer.
- Refetch lookup3.c and do only minimal edits to it, so that diff between
  our jenkins_hash.c and lookup3.c is minimal.
- Add declarations for jenkins_hash(), jenkins_hash32() to sys/hash.h.
- Document these functions in hash(9)

Obtained from:	http://burtleburtle.net/bob/c/lookup3.c
2012-09-04 12:07:33 +00:00
Ed Schouten
4a8914c627 While there, remove an unneeded blank line.
MFC after:	1 month
2012-09-01 08:45:58 +00:00
Ed Schouten
ef42458103 Fix whitespace.
MFC after:	1 month
2012-09-01 08:45:19 +00:00
Attilio Rao
d4a2ab8c07 Post r222812 KTR_CPUMASK started being initialized only as a tunable
handler and not more statically.

Unfortunately, it seems that this is not ideal for new platform bringup
and boot low level development (which needs ktr_cpumask to be effective
before tunables can be setup).

Because of this, add a way to statically initialize cpusets, by passing
an list of initializers, divided by commas. Also, provide a way to enforce
an all-set mask, for above mentioned initializers.

This imposes some differences on how KTR_CPUMASK is setup now as a
kernel option, and in particular this makes the words specifications
backward wrt. what is currently in -CURRENT. In order to avoid mismatches
between KTR_CPUMASK definition and other way to setup the mask
(tunable, sysctl) and to print it, change the ordering how
cpusetobj_print() and cpusetobj_scan() acquire the words belonging
to the set.
Please give a look to sys/conf/NOTES in order to understand how the
new format is supposed to work.

Also, ktr manpages will be updated shortly by gjb which volountereed
for this.

This patch won't be merged because it changes a POLA (at least
from the theoretical standpoint) and this is however a patch that
proves to be effective only in development environments.

Requested by:	rpaulo
Reviewed by:	jeff, rpaulo
2012-08-30 21:22:47 +00:00
Ed Schouten
fa4dd27847 Remove unused SI_* flags.
The SI_DEVOPEN, SI_CONSOPEN and SI_CANDELETE flags are not used by any
piece of code in the tree.
2012-08-28 19:30:29 +00:00
Jim Harris
34bca4aef6 Remove unncessary atomic operation when reading process flags in
PMC_PROC_IS_USING_PMCS macro.

Invocations of this macro are not synchronized with setting/clearing
of P_HWPMC flag, so the atomic operation here isn't needed.  Removing
the atomic operation provides noticeable improvement (5-6%) on
some scheduler-intensive workloads with HWPMC_HOOKS enabled on an
8C Sandy Bridge Xeon system.

Sponsored by: Intel
Reviewed by: jhb
MFC after: 1 week
2012-08-22 20:22:55 +00:00
David E. O'Brien
bac7a76e6f Missing one in r239505. 2012-08-21 17:06:36 +00:00
David E. O'Brien
b8d5e2e661 Restore the style of r195843 to that of pre-r194498
to reduce gratuitous diffs to older sources.
2012-08-21 17:05:10 +00:00
David Xu
e31eb35c3f regen. 2012-08-17 02:47:16 +00:00
David Xu
d65f1abca7 Implement syscall clock_getcpuclockid2, so we can get a clock id
for process, thread or others we want to support.
Use the syscall to implement POSIX API clock_getcpuclock and
pthread_getcpuclockid.

PR:	168417
2012-08-17 02:26:31 +00:00
Alan Cox
bed229c902 Eliminate some unused declarations. 2012-08-15 22:25:57 +00:00
Konstantin Belousov
02c6fc2114 Add a sysctl kern.pid_max, which limits the maximum pid the system is
allowed to allocate, and corresponding tunable with the same
name. Note that existing processes with higher pids are left intact.

MFC after:	1 week
2012-08-15 15:56:21 +00:00
Hans Petter Selasky
c01fc06ee9 Revert r239178 and implement two new functions, namely
"device_free_softc()" and "device_claim_softc()",
to allow USB serial drivers refcounting the softc.
These functions are used to grab the softc from
auto-free and to free the softc back to the correct
malloc type, respectivly.

Discussed with:	jhb
MFC after:	2 weeks
2012-08-15 15:42:57 +00:00