Commit Graph

2570 Commits

Author SHA1 Message Date
John Baldwin
6273c7420d Set si_addr to badvaddr for TLB faults.
Reviewed by:	kib
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D25775
2020-07-23 20:08:42 +00:00
Conrad Meyer
4ae224c663 Revert r240317 to prevent leaking pmap entries
Subsequent to r240317, kmem_free() was replaced with kva_free() (r254025).
kva_free() releases the KVA allocation for the mapped region, but no longer
clears the pmap (pagetable) entries.

An affected pmap_unmapdev operation would leave the still-pmap'd VA space
free for allocation by other KVA consumers.  However, this bug easily
avoided notice for ~7 years because most devices (1) never call
pmap_unmapdev and (2) on amd64, mostly fit within the DMAP and do not need
KVA allocations.  Other affected arch are less popular: i386, MIPS, and
PowerPC.  Arm64, arm32, and riscv are not affected.

Reported by:	Don Morris <dgmorris AT earthlink.net>
Submitted by:	Don Morris (amd64 part)
Reviewed by:	kib, markj, Don (!amd64 parts)
MFC after:	I don't intend to, but you might want to
Sponsored by:	Dell Isilon
Differential Revision:	https://reviews.freebsd.org/D25689
2020-07-16 23:29:26 +00:00
Mark Johnston
e64080e79c Switch from SCTP to SCTP_SUPPORT in GENERIC configs.
This removes SCTP from in-tree kernel configuration files.  Now, SCTP
can be enabled by simply loading the module, as discussed on
freebsd-net@.

Reviewed by:	tuexen
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25611
2020-07-16 15:09:04 +00:00
Adrian Chadd
0155d8f69d [ar71xx] fix watchdog to work on subsequent SoCs
The AR9341 AHB runs at 225MHz, much faster than the 33MHz of the
AR71xx AHB.  So not only is the math going to do weird things, it
will also wrap rather than being clamped.

So:

* clamp! don't wrap!
* tidy up some debugging
* add an option to throw an NMI rather than reset!

Tested:

* AR9341 SoC (TP-Link TL-WDR4300), patting/not patting the watchdog!
2020-07-15 19:34:19 +00:00
Andrew Turner
518da7ace8 Add dwc_otg_acpi
Create an acpi attachment for the DWC USB OTG device. This is present in
the Raspberry Pi 4 in the USB-C port normally used to power the board. Some
firmware presents the kernel with ACPI tables rather than FDT so we need
an ACPI attachment.

Submitted by:	Greg V <greg_unrelenting.technology>
Approved by:	hselasky (removal of All rights reserved)
Differential Revision:	https://reviews.freebsd.org/D25203
2020-06-30 15:58:29 +00:00
Brandon Bergren
40b664f64b [PowerPC] More relocation fixes
It turns out relocating the symbol table itself can cause issues, like fbt
crashing because it applies the offsets to the kernel twice.

This had been previously brought up in rS333447 when the stoffs hack was
added, but I had been unaware of this and reimplemented symtab relocation.

Instead of relocating the symbol table, keep track of the relocation base
in ddb, so the ddb symbols behave like the kernel linker-provided symbols.

This is intended to be NFC on platforms other than PowerPC, which do not
use fully relocatable kernels. (The relbase will always be 0)

 * Remove the rest of the stoffs hack.
 * Remove my half-baked displace_symbol_table() function.
 * Extend ddb initialization to cope with having a relocation offset on the
   kernel symbol table.
 * Fix my kernel-as-initrd hack to work with booke64 by using a temporary
   mapping to access the data.
 * Fix another instance of __powerpc__ that is actually RELOCATABLE_KERNEL.
 * Change the behavior or X_db_symbol_values to apply the relocation base
   when updating valp, to match link_elf_symbol_values() behavior.

Reviewed by:	jhibbits
Sponsored by:	Tag1 Consulting, Inc.
Differential Revision:	https://reviews.freebsd.org/D25223
2020-06-21 03:39:26 +00:00
John Baldwin
822d2d6ac9 Various fixes to TLS for MIPS.
- Clear the current thread's TLS pointer on exec. Previously the TLS
  pointer (and register) remain unchanged.

- Explicitly clear the TLS pointer when new threads are created.

- Make md_tls_tcb_offset per-process instead of per-thread.

  The layout of the TLS and TCB are identical for all threads in a
  process, it is only the TLS pointer values themselves that vary by
  thread.  This also makes setting md_tls_tcb_offset in
  cpu_set_user_tls() redundant with the setting in exec_setregs(), so
  only set it in exec_setregs().

Submitted by:	Alfredo Mazzinghi (1)
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D24957
2020-06-12 21:21:18 +00:00
John Baldwin
1138b87ae6 Add some default cases for unreachable code to silence compiler warnings.
This was caused by r361481 when the buffer type was changed from an
int to an enum.

Reported by:	mjg, rpokala
Sponsored by:	Chelsio Communications
2020-06-10 00:09:31 +00:00
John Baldwin
a3d565a118 Add a crypto capability flag for accelerated software drivers.
Use this in GELI to print out a different message when accelerated
software such as AESNI is used vs plain software crypto.

While here, simplify the logic in GELI a bit for determing which type
of crypto driver was chosen the first time by examining the
capabilities of the matched driver after a single call to
crypto_newsession rather than making separate calls with different
flags.

Reviewed by:	delphij
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25126
2020-06-09 22:26:07 +00:00
John Baldwin
cea399ec0e Mark padlock(4) and cryptocteon(4) as software drivers.
Both already return the accelerated software priority from
cryptodev_probesession.

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25125
2020-06-09 22:19:36 +00:00
Adrian Chadd
5687a376df [mips] fix up the assembly generation of unaligned exception loads
I noticed that unaligned accesses were returning garbage values.

Give test data like this:

char testdata[] = { 0x12, 0x34, 0x56, 0x78, 0x9a, 0xbc, 0xde, 0xf1, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef, 0x5a };

Iterating through uint32_t space 1 byte at a time should
look like this:

freebsd-carambola2:/mnt# ./test
Hello, world!
offset 0 pointer 0x410b00 value 0x12345678 0x12345678
offset 1 pointer 0x410b01 value 0x3456789a 0x3456789a
offset 2 pointer 0x410b02 value 0x56789abc 0x56789abc
offset 3 pointer 0x410b03 value 0x789abcde 0x789abcde
offset 4 pointer 0x410b04 value 0x9abcdef1 0x9abcdef1
offset 5 pointer 0x410b05 value 0xbcdef123 0xbcdef123
offset 6 pointer 0x410b06 value 0xdef12345 0xdef12345
offset 7 pointer 0x410b07 value 0xf1234567 0xf1234567

.. but to begin with it looked like this:

offset 0 value 0x12345678
offset 1 value 0x00410a9a
offset 2 value 0x00419abc
offset 3 value 0x009abcde
offset 4 value 0x9abcdef1
offset 5 value 0x00410a23
offset 6 value 0x00412345
offset 7 value 0x00234567

The amusing reason? The compiler is generating the lwr/lwl incorrectly.
Here's an example after I tried to replace the two macros with a single
invocation and offset, rather than having the compiler compile in addiu
to s3 - but the bug is the same:

1044: 8a620003 lwl v0,0(s3)
1048: 9a730000 lwr s3,3(s3)

.. which is just totally trashy and wrong.

This explicitly tells the compiler to treat the output as being read
and written to, which is what lwl/lwr does with the destination
register.

I think a subsequent commit should unify these macros to skip an addiu,
but that can be a later commit.

Reviewed by:	jhb
Differential Revision:	https://reviews.freebsd.org/D25040
2020-05-29 00:05:43 +00:00
John Baldwin
b04748bef2 Update cryptocteon(4) and nlmsec(4) for changes in r361481.
This does not add support for separate output buffers but updates the
drivers to cope with the changes.

Pointy hat to:	jhb
2020-05-25 23:49:46 +00:00
Conrad Meyer
852c303b61 copystr(9): Move to deprecate (attempt #2)
This reapplies logical r360944 and r360946 (reverting r360955), with fixed
copystr() stand-in replacement macro.  Eventually the goal is to convert
consumers and kill the macro, but for a first step it helps if the macro is
correct.

Prior commit message:

Unlike the other copy*() functions, it does not serve to copy from one
address space to another or protect against potential faults.  It's just
an older incarnation of the now-more-common strlcpy().

Add a coccinelle script to tools/ which can be used to mechanically
convert existing instances where replacement with strlcpy is trivial.
In the two cases which matched, fuse_vfsops.c and union_vfsops.c, the
code was further refactored manually to simplify.

Replace the declaration of copystr() in systm.h with a small macro
wrapper around strlcpy (with correction from brooks@ -- thanks).

Remove N redundant MI implementations of copystr.  For MIPS, this
entailed inlining the assembler copystr into the only consumer,
copyinstr, and making the latter a leaf function.

Reviewed by:		jhb (earlier version)
Discussed with:		brooks (thanks!)
Differential Revision:	https://reviews.freebsd.org/D24672
2020-05-25 16:40:48 +00:00
John Baldwin
2aa1dc7e3b Print CPU informtion later in boot.
Match other architectures and print CPU information during
cpu_startup().  In particular, this prints the information after the
message buffer is initialized which allows it to be retrieved after
boot via dmesg(8).

While here, add some extern declarations to <machine/md_var.h> in
place of duplicated declarations in various source files.

Reviewed by:	brooks
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D24936
2020-05-20 21:16:54 +00:00
John Baldwin
6adcdf6577 Simplify hot-patching cpu_switch() for lack of UserLocal register.
Rather than walking all of cpu_switch looking for the sequence of
instructions to patch, add a global label at the location that needs
the patch applied.

Reviewed by:	brooks, Alfredo Mazzinghi <alfredo.mazzinghi_cl.cam.ac.uk>
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D24931
2020-05-20 21:15:43 +00:00
John Baldwin
871c7dd2aa Merge freebsd32_exec_setregs() into exec_setregs() on MIPS.
The stack pointer was being decremented by 64k twice previously.

Reviewed by:	brooks
Obtained from:	CheriBSD
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D24930
2020-05-20 19:51:39 +00:00
Conrad Meyer
051fc58cb3 Revert r360944 and r360946 until reported issues can be resolved
Reported by:	cy
2020-05-12 04:34:26 +00:00
Conrad Meyer
580744621f copystr(9): Move to deprecate [2/2]
Unlike the other copy*() functions, it does not serve to copy from one
address space to another or protect against potential faults.  It's just
an older incarnation of the now-more-common strlcpy().

Add a coccinelle script to tools/ which can be used to mechanically
convert existing instances where replacement with strlcpy is trivial.
In the two cases which matched, fuse_vfsops.c and union_vfsops.c, the
code was further refactored manually to simplify.

Replace the declaration of copystr() in systm.h with a small macro
wrapper around strlcpy.

Remove N redundant MI implementations of copystr.  For MIPS, this
entailed inlining the assembler copystr into the only consumer,
copyinstr, and making the latter a leaf function.

Reviewed by:	jhb
Differential Revision:	https://reviews.freebsd.org/D24672
2020-05-11 22:57:21 +00:00
John Baldwin
63823cac92 Remove MD5 HMAC from OCF.
There are no in-kernel consumers.

Reviewed by:	cem
Relnotes:	yes
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D24775
2020-05-11 22:08:08 +00:00
John Baldwin
0e00c709d7 Remove support for DES and Triple DES from OCF.
It no longer has any in-kernel consumers via OCF.  smbfs still uses
single DES directly, so sys/crypto/des remains for that use case.

Reviewed by:	cem
Relnotes:	yes
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D24773
2020-05-11 21:34:29 +00:00
John Baldwin
33fb013e16 Remove support for the ARC4 algorithm from OCF.
There are no longer any in-kernel consumers.  The software
implementation was also a non-functional stub.

Reviewed by:	cem
Relnotes:	yes
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D24771
2020-05-11 21:17:08 +00:00
John Baldwin
9b5631807e Remove incomplete support for plain MD5 from OCF.
Although a few drivers supported this algorithm, there were never any
in-kernel consumers.  cryptosoft and cryptodev never supported it,
and there was not a software xform auth_hash for it.

Reviewed by:	cem
Relnotes:	yes
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D24767
2020-05-11 20:40:30 +00:00
Adrian Chadd
b8aa77b74d [atheros] [if_arge] Various fixes to avoid TX stalls and bad sized packets
This is stuff I've been running for a couple years.  It's inspired by changes
I found in the linux ag71xx ethernet driver.

* Delay between stopping DMA and checking to see if it's stopped; this gives
  the hardware time to do its thing.

* Non-final frames in the chain need to be a multiple of 4 bytes in size.
  Ensure this is the case when assembling a TX DMA list.

* Add counters for tx/rx underflow and too-short packets.

* Log if TX/RX DMA couldn't be stopped when resetting the MAC.

* Add some more debugging / logging around TX/RX ring bits.

Tested:

* AR7240, AR7241
* AR9344 (TL-WDR3600/TL-WDR4300 APs)
* AR9331 (Carambola 2)
2020-05-10 03:36:11 +00:00
Mark Johnston
06459adc89 Fix a race in pmap_emulate_modified().
pmap_emulate_modify() was assuming that no changes to the pmap could
take place between the TLB signaling the fault and
pmap_emulate_modify()'s acquisition of the pmap lock, but that's clearly
not even true in the uniprocessor case, nevermind the SMP case.

Submitted by:	Nathaniel Filardo <nwf20@cl.cam.ac.uk>
Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D24523
2020-04-24 21:21:49 +00:00
Mark Johnston
117d10f3c1 Fix a race between _pmap_unwire_ptp() and MipsDoTLBMiss().
MipsDoTLBMiss() will load a segmap entry or pde, check that it isn't
zero, and then chase that pointer to a physical page. If that page has
been freed in the interim, it will read garbage and go on to populate
the TLB with it.

This can happen because pmap_unwire_ptp zeros out the pde and
vm_page_free_zero()s the ptp (or, recursively, zeros out the segmap
entry and vm_page_free_zero()s the pdp) without interlocking against
MipsDoTLBMiss(). The pmap is locked, and pvh_global_lock may or may not
be held, but this is not enough. Solve this issue by inserting TLB
shootdowns within _pmap_unwire_ptp(); as MipsDoTLBMiss() runs with IRQs
deferred, the IPIs involved in TLB shootdown are sufficient to ensure
that MipsDoTLBMiss() sees either a zero segmap entry / pde or a non-zero
entry and the pointed-to page still not freed.

Submitted by:	Nathaniel Filardo <nwf20@cl.cam.ac.uk>
Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D24491
2020-04-24 21:21:23 +00:00
John Baldwin
5c4309b474 Handle non-dtrace-triggered kernel breakpoint traps in mips.
If DTRACE is enabled at compile time, all kernel breakpoint traps are
first given to dtrace to see if they are triggered by a FBT probe.
Previously if dtrace didn't recognize the trap, it was silently
ignored breaking the handling of other kernel breakpoint traps such as
the debug.kdb.enter sysctl.  This only returns early from the trap
handler if dtrace recognizes the trap and handles it.

Submitted by:	Nicolò Mazzucato <nicomazz97@gmail.com>
Reviewed by:	markj
Obtained from:	CheriBSD
Differential Revision:	https://reviews.freebsd.org/D24478
2020-04-21 17:38:07 +00:00
John Baldwin
29fe41ddd7 Retire the CRYPTO_F_IV_GENERATE flag.
The sole in-tree user of this flag has been retired, so remove this
complexity from all drivers.  While here, add a helper routine drivers
can use to read the current request's IV into a local buffer.  Use
this routine to replace duplicated code in nearly all drivers.

Reviewed by:	cem
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D24450
2020-04-20 22:24:49 +00:00
John Baldwin
490befd40a Use the right type for 64-bit coprocessor registers.
The use of "int" here caused the compiler to believe that it needs to
insert a "sll $n, $n, 0" to sign extend as part of the implicit cast
to uint64_t.

Submitted by:	Nathaniel Filardo <nwf20@cl.cam.ac.uk>
Reviewed by:	brooks, arichardson
Obtained from:	CheriBSD
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D24457
2020-04-17 18:24:47 +00:00
John Baldwin
fdbb2410cb Add 'gpio' since mmc now requires gpio_if.h. 2020-04-16 20:45:54 +00:00
Brooks Davis
d0fa673e4d Don't directly access userspace memory.
Rather then using the racy useracc() followed by direct access to
userspace memory, perform a copyin() and use the result if it succeeds.

Reviewed by:	jhb
MFC after:	3 weeks
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D24410
2020-04-15 16:33:27 +00:00
John Baldwin
59838c1a19 Retire procfs-based process debugging.
Modern debuggers and process tracers use ptrace() rather than procfs
for debugging.  ptrace() has a supserset of functionality available
via procfs and new debugging features are only added to ptrace().
While the two debugging services share some fields in struct proc,
they each use dedicated fields and separate code.  This results in
extra complexity to support a feature that hasn't been enabled in the
default install for several years.

PR:		244939 (exp-run)
Reviewed by:	kib, mjg (earlier version)
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D23837
2020-04-01 19:22:09 +00:00
Ravi Pokala
144af011b4 Fix build for mips.XLP64 kernel, by re-ordering headers
The log for the failure contained errors like this:

| In file included from ${SRCTOP}/sys/mips/nlm/dev/net/xlpge.c:34:
| In file included from ${SRCTOP}/sys/sys/systm.h:44:
| In file included from ./machine/atomic.h:849:
| ${SRCTOP}/sys/sys/_atomic_subword.h:222:37: error: unknown type name 'u_long'; did you mean 'long'?
| atomic_testandset_acq_long(volatile u_long *p, u_int v)
|                                     ^~~~~~
|                                     long

And similar "unknown type name" errors for u_int, not recognizing bool as a type, etc.

This was caused by including <sys/param.h> too far down; move it up where it belongs.

While here, add a blank line after '__FBSDID()', in keeping with convention.

Reviewed by:	emaste
Sponsored by:	Panasas
Differential Revision:	https://reviews.freebsd.org/D24242
2020-03-31 20:09:20 +00:00
John Baldwin
c034143269 Refactor driver and consumer interfaces for OCF (in-kernel crypto).
- The linked list of cryptoini structures used in session
  initialization is replaced with a new flat structure: struct
  crypto_session_params.  This session includes a new mode to define
  how the other fields should be interpreted.  Available modes
  include:

  - COMPRESS (for compression/decompression)
  - CIPHER (for simply encryption/decryption)
  - DIGEST (computing and verifying digests)
  - AEAD (combined auth and encryption such as AES-GCM and AES-CCM)
  - ETA (combined auth and encryption using encrypt-then-authenticate)

  Additional modes could be added in the future (e.g. if we wanted to
  support TLS MtE for AES-CBC in the kernel we could add a new mode
  for that.  TLS modes might also affect how AAD is interpreted, etc.)

  The flat structure also includes the key lengths and algorithms as
  before.  However, code doesn't have to walk the linked list and
  switch on the algorithm to determine which key is the auth key vs
  encryption key.  The 'csp_auth_*' fields are always used for auth
  keys and settings and 'csp_cipher_*' for cipher.  (Compression
  algorithms are stored in csp_cipher_alg.)

- Drivers no longer register a list of supported algorithms.  This
  doesn't quite work when you factor in modes (e.g. a driver might
  support both AES-CBC and SHA2-256-HMAC separately but not combined
  for ETA).  Instead, a new 'crypto_probesession' method has been
  added to the kobj interface for symmteric crypto drivers.  This
  method returns a negative value on success (similar to how
  device_probe works) and the crypto framework uses this value to pick
  the "best" driver.  There are three constants for hardware
  (e.g. ccr), accelerated software (e.g. aesni), and plain software
  (cryptosoft) that give preference in that order.  One effect of this
  is that if you request only hardware when creating a new session,
  you will no longer get a session using accelerated software.
  Another effect is that the default setting to disallow software
  crypto via /dev/crypto now disables accelerated software.

  Once a driver is chosen, 'crypto_newsession' is invoked as before.

- Crypto operations are now solely described by the flat 'cryptop'
  structure.  The linked list of descriptors has been removed.

  A separate enum has been added to describe the type of data buffer
  in use instead of using CRYPTO_F_* flags to make it easier to add
  more types in the future if needed (e.g. wired userspace buffers for
  zero-copy).  It will also make it easier to re-introduce separate
  input and output buffers (in-kernel TLS would benefit from this).

  Try to make the flags related to IV handling less insane:

  - CRYPTO_F_IV_SEPARATE means that the IV is stored in the 'crp_iv'
    member of the operation structure.  If this flag is not set, the
    IV is stored in the data buffer at the 'crp_iv_start' offset.

  - CRYPTO_F_IV_GENERATE means that a random IV should be generated
    and stored into the data buffer.  This cannot be used with
    CRYPTO_F_IV_SEPARATE.

  If a consumer wants to deal with explicit vs implicit IVs, etc. it
  can always generate the IV however it needs and store partial IVs in
  the buffer and the full IV/nonce in crp_iv and set
  CRYPTO_F_IV_SEPARATE.

  The layout of the buffer is now described via fields in cryptop.
  crp_aad_start and crp_aad_length define the boundaries of any AAD.
  Previously with GCM and CCM you defined an auth crd with this range,
  but for ETA your auth crd had to span both the AAD and plaintext
  (and they had to be adjacent).

  crp_payload_start and crp_payload_length define the boundaries of
  the plaintext/ciphertext.  Modes that only do a single operation
  (COMPRESS, CIPHER, DIGEST) should only use this region and leave the
  AAD region empty.

  If a digest is present (or should be generated), it's starting
  location is marked by crp_digest_start.

  Instead of using the CRD_F_ENCRYPT flag to determine the direction
  of the operation, cryptop now includes an 'op' field defining the
  operation to perform.  For digests I've added a new VERIFY digest
  mode which assumes a digest is present in the input and fails the
  request with EBADMSG if it doesn't match the internally-computed
  digest.  GCM and CCM already assumed this, and the new AEAD mode
  requires this for decryption.  The new ETA mode now also requires
  this for decryption, so IPsec and GELI no longer do their own
  authentication verification.  Simple DIGEST operations can also do
  this, though there are no in-tree consumers.

  To eventually support some refcounting to close races, the session
  cookie is now passed to crypto_getop() and clients should no longer
  set crp_sesssion directly.

- Assymteric crypto operation structures should be allocated via
  crypto_getkreq() and freed via crypto_freekreq().  This permits the
  crypto layer to track open asym requests and close races with a
  driver trying to unregister while asym requests are in flight.

- crypto_copyback, crypto_copydata, crypto_apply, and
  crypto_contiguous_subsegment now accept the 'crp' object as the
  first parameter instead of individual members.  This makes it easier
  to deal with different buffer types in the future as well as
  separate input and output buffers.  It's also simpler for driver
  writers to use.

- bus_dmamap_load_crp() loads a DMA mapping for a crypto buffer.
  This understands the various types of buffers so that drivers that
  use DMA do not have to be aware of different buffer types.

- Helper routines now exist to build an auth context for HMAC IPAD
  and OPAD.  This reduces some duplicated work among drivers.

- Key buffers are now treated as const throughout the framework and in
  device drivers.  However, session key buffers provided when a session
  is created are expected to remain alive for the duration of the
  session.

- GCM and CCM sessions now only specify a cipher algorithm and a cipher
  key.  The redundant auth information is not needed or used.

- For cryptosoft, split up the code a bit such that the 'process'
  callback now invokes a function pointer in the session.  This
  function pointer is set based on the mode (in effect) though it
  simplifies a few edge cases that would otherwise be in the switch in
  'process'.

  It does split up GCM vs CCM which I think is more readable even if there
  is some duplication.

- I changed /dev/crypto to support GMAC requests using CRYPTO_AES_NIST_GMAC
  as an auth algorithm and updated cryptocheck to work with it.

- Combined cipher and auth sessions via /dev/crypto now always use ETA
  mode.  The COP_F_CIPHER_FIRST flag is now a no-op that is ignored.
  This was actually documented as being true in crypto(4) before, but
  the code had not implemented this before I added the CIPHER_FIRST
  flag.

- I have not yet updated /dev/crypto to be aware of explicit modes for
  sessions.  I will probably do that at some point in the future as well
  as teach it about IV/nonce and tag lengths for AEAD so we can support
  all of the NIST KAT tests for GCM and CCM.

- I've split up the exising crypto.9 manpage into several pages
  of which many are written from scratch.

- I have converted all drivers and consumers in the tree and verified
  that they compile, but I have not tested all of them.  I have tested
  the following drivers:

  - cryptosoft
  - aesni (AES only)
  - blake2
  - ccr

  and the following consumers:

  - cryptodev
  - IPsec
  - ktls_ocf
  - GELI (lightly)

  I have not tested the following:

  - ccp
  - aesni with sha
  - hifn
  - kgssapi_krb5
  - ubsec
  - padlock
  - safe
  - armv8_crypto (aarch64)
  - glxsb (i386)
  - sec (ppc)
  - cesa (armv7)
  - cryptocteon (mips64)
  - nlmsec (mips64)

Discussed with:	cem
Relnotes:	yes
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D23677
2020-03-27 18:25:23 +00:00
Alex Richardson
c5247ac505 Use a GCC-compatile compiler flag in files.xlp
Seems like GCC doesn't like -Qunused-arguments, so use
-Wno-unused-command-line-argument, which is supported by both GCC and Clang.
2020-03-22 22:18:06 +00:00
Brandon Bergren
3069380898 [PowerPC][Book-E] Fix missing load base in elf_cpu_parse_dynamic().
When I implemented MD DYNAMIC parsing, I was originally passing a
linker_file_t so that the MD code could relocate pointers.

However, it turns out this isn't even filled in until later, so it was
always 0.

Just pass the load base (ef->address) directly, as that's really the only
thing we were interested in in the first place.

This fixes a crash on RB800 where it was trying to write to an unmapped
address when updating the GOT.

Reviewed by:	jhibbits
Sponsored by:	Tag1 Consulting, Inc.
Differential Revision:	https://reviews.freebsd.org/D24105
2020-03-18 02:58:18 +00:00
Alex Richardson
e40375374c Fix build of XLP MIPS kernel with clang
Clang does not recognize some of the GCC optimization options that are
used to compile ucore_app.bin. This is required to switch MIPS to
compile with LLVM by default (D23204).

Reviewed By:	imp
Differential Revision: https://reviews.freebsd.org/D24092
2020-03-17 11:59:40 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Gleb Smirnoff
e87c494015 Although most of the NIC drivers are epoch ready, due to peer pressure
switch over to opt-in instead of opt-out for epoch.

Instead of IFF_NEEDSEPOCH, provide IFF_KNOWSEPOCH. If driver marks
itself with IFF_KNOWSEPOCH, then ether_input() would not enter epoch
when processing its packets.

Now this will create recursive entrance in epoch in >90% network
drivers, but will guarantee safeness of the transition.

Mark several tested drivers as IFF_KNOWSEPOCH.

Reviewed by:		hselasky, jeff, bz, gallatin
Differential Revision:	https://reviews.freebsd.org/D23674
2020-02-24 21:07:30 +00:00
Kyle Evans
1881ae23a3 mips: fix kernel build after r357804
Drop the padding down the size of a single uintptr_t to account for
pc_zpcpu_offset
2020-02-14 20:25:04 +00:00
Ruslan Bukin
d987842d1e Enter the network epoch in the xdma interrupt handler if required
by a peripheral device driver.

Sponsored by:	DARPA, AFRL
2020-02-08 23:07:29 +00:00
Ed Maste
fb2644971d beri: correct kernel printf typo
(From review D23453)

Submitted by:	Gordon Bergling <gbergling_gmail.com>
Reviewed by:	rwatson
2020-02-05 19:15:36 +00:00
Mark Johnston
1c29da0279 Reimplement stack capture of running threads on i386 and amd64.
After r355784 the td_oncpu field is no longer synchronized by the thread
lock, so the stack capture interrupt cannot be delievered precisely.
Fix this using a loop which drops the thread lock and restarts if the
wrong thread was sampled from the stack capture interrupt handler.

Change the implementation to use a regular interrupt instead of an NMI.
Now that we drop the thread lock, there is no advantage to the latter.

Simplify the KPIs.  Remove stack_save_td_running() and add a return
value to stack_save_td().  On platforms that do not support stack
capture of running threads, stack_save_td() returns EOPNOTSUPP.  If the
target thread is running in user mode, stack_save_td() returns EBUSY.

Reviewed by:	kib
Reported by:	mjg, pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23355
2020-01-31 15:43:33 +00:00
Gleb Smirnoff
0921628ddc Introduce flag IFF_NEEDSEPOCH that marks Ethernet interfaces that
supposedly may call into ether_input() without network epoch.

They all need to be reviewed before 13.0-RELEASE.  Some may need
be fixed.  The flag is not planned to be used in the kernel for
a long time.
2020-01-23 01:41:09 +00:00
John Baldwin
d0cacf5d12 Preserve the inherited value of the status register in cpu_set_upcall().
Instead of re-deriving the value of SR using logic similar to
exec_set_regs(), just inherit the value from the existing thread
similar to fork().

Reviewed by:	brooks
Obtained from:	CheriBSD
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D23059
2020-01-14 18:00:04 +00:00
John Baldwin
9f669cf0a2 Simplify arguments to signal handlers on mips.
- Use ksi_addr directly as si_addr in the siginfo instead of the
  'badvaddr' register.
- Remove a duplicate assignment of si_code.
- Use ksi_addr as the 4th argument to the old-style handler instead of
  'badvaddr'.

Reviewed by:	brooks, kevans
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D23013
2020-01-06 18:02:02 +00:00
Brandon Bergren
9aafc7c052 [PowerPC] [MIPS] Implement 32-bit kernel emulation of atomic64 operations
This is a lock-based emulation of 64-bit atomics for kernel use, split off
from an earlier patch by jhibbits.

This is needed to unblock future improvements that reduce the need for
locking on 64-bit platforms by using atomic updates.

The implementation allows for future integration with userland atomic64,
but as that implies going through sysarch for every use, the current
status quo of userland doing its own locking may be for the best.

Submitted by:	jhibbits (original patch), kevans (mips bits)
Reviewed by:	jhibbits, jeff, kevans
Differential Revision:	https://reviews.freebsd.org/D22976
2020-01-02 23:20:37 +00:00
Kyle Evans
06b367b2b6 sc(4) md bits: stop setting sc->kbd entirely
The machdep parts no longer need to touch keyboard parts after r356043;
sc->kbd will be 0-initialized and this works as expected.
2019-12-30 02:07:55 +00:00
Adrian Chadd
863dc8aff0 [ar71xx] generate a random mac address using eth_gen_addr()
This removes a hard-coded random mac address generator and
uses the (not so) new system routine.

Tested:

* TP-Link WDR-4300 (AR934x + AR9580)
2019-12-28 06:56:21 +00:00
Brandon Bergren
38f69a619e Unbreak build. It seems that mips and amd64 still pull in link_elf.c, so
we need to have elf_cpu_parse_dynamic() everywhere after all to avoid
an undefined symbol.
2019-12-24 16:52:10 +00:00
Scott Long
757d4fbaa7 Introduce the concept of busdma tag templates. A template can be allocated
off the stack, initialized to default values, and then filled in with
driver-specific values, all without having to worry about the numerous
other fields in the tag. The resulting template is then passed into
busdma and the normal opaque tag object created.  See the man page for
details on how to initialize a template.

Templates do not support tag filters.  Filters have been broken for many
years, and only existed for an ancient make/model of hardware that had a
quirky DMA engine.  Instead of breaking the ABI/API and changing the
arugment signature of bus_dma_tag_create() to remove the filter arguments,
templates allow us to ignore them, and also significantly reduce the
complexity of creating and managing tags.

Reviewed by:	imp, kib
Differential Revision:	https://reviews.freebsd.org/D22906
2019-12-24 14:48:46 +00:00
Kyle Evans
117deb3fc4 sc: fix arm/mips/sparc64 MD bits
r356043 missed a couple of references in machdep parts... arguably, these
lines could probably be dropped as the softc is likely still zero'd at this
point.

Pointy hat:	kevans
2019-12-23 21:41:04 +00:00
Warner Losh
fa9b4635f0 Two minor issues:
(1) Don't define load/store 64 atomics for o32. They aren't atomic
there.
(2) Add comment about why we need 64 atomic define on n32 only.
2019-12-17 03:20:37 +00:00
Adrian Chadd
240e4eebaf [atheros] [mips] Add the GPIO driver (back) to the TL-WDR3600/TL-WDR4300 kernel.
So it turns out that sometime in the past I removed the GPIO bits here
and was going to move it into a module in order to save a little space.
However, it turns out that was a mistake on this particular AP - it
uses a pair of GPIO lines to control the two receive LNAs on the 2GHz
radio and without them enabled the radio is a LOT DEAF.

With this re-introduced (and some replacement userland tools to save
space, *cough* cpio/libarchive) I can actually use these chipsets
again as a 2G station.  Without the LNA the AP was seeing a per-radio
RSSI upstairs here of around 3-5dB, with the LNA on it's around 15dB,
more than enough to actually use wifi upstairs and also in line with
the other Atheros / Intel devices I have up here.

Big oopsie to Adrian.  Big, big oopsie.
2019-12-17 00:00:03 +00:00
Jeff Roberson
a94ba188c3 Repeat the spinlock_enter/exit pattern from amd64 on other architectures to
fix an assert violation introduced in r355784.  Without this spinlock_exit()
may see owepreempt and switch before reducing the spinlock count.  amd64
had been optimized to do a single critical enter/exit regardless of the
number of spinlocks which avoided the problem and this optimization had
not been applied elsewhere.

Reported by:	emaste
Suggested by:	rlibby
Discussed with:	jhb, rlibby
Tested by:	manu (arm64)
2019-12-16 20:15:04 +00:00
Jeff Roberson
686bcb5c14 schedlock 4/4
Don't hold the scheduler lock while doing context switches.  Instead we
unlock after selecting the new thread and switch within a spinlock
section leaving interrupts and preemption disabled to prevent local
concurrency.  This means that mi_switch() is entered with the thread
locked but returns without.  This dramatically simplifies scheduler
locking because we will not hold the schedlock while spinning on
blocked lock in switch.

This change has not been made to 4BSD but in principle it would be
more straightforward.

Discussed with:	markj
Reviewed by:	kib
Tested by:	pho
Differential Revision: https://reviews.freebsd.org/D22778
2019-12-15 21:26:50 +00:00
Jeff Roberson
61a74c5ccd schedlock 1/4
Eliminate recursion from most thread_lock consumers.  Return from
sched_add() without the thread_lock held.  This eliminates unnecessary
atomics and lock word loads as well as reducing the hold time for
scheduler locks.  This will eventually allow for lockless remote adds.

Discussed with:	kib
Reviewed by:	jhb
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22626
2019-12-15 21:11:15 +00:00
Mark Johnston
5cff1f4dc3 Introduce vm_page_astate.
This is a 32-bit structure embedded in each vm_page, consisting mostly
of page queue state.  The use of a structure makes it easy to store a
snapshot of a page's queue state in a stack variable and use cmpset
loops to update that state without requiring the page lock.

This change merely adds the structure and updates references to atomic
state fields.  No functional change intended.

Reviewed by:	alc, jeff, kib
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D22650
2019-12-10 18:14:50 +00:00
Warner Losh
f86e60008b Regularize my copyright notice
o Remove All Rights Reserved from my notices
o imp@FreeBSD.org everywhere
o regularize punctiation, eliminate date ranges
o Make sure that it's clear that I don't claim All Rights reserved by listing
  All Rights Reserved on same line as other copyright holders (but not
  me). Other such holders are also listed last where it's clear.
2019-12-04 16:56:11 +00:00
John Baldwin
31174518d2 Use uintptr_t instead of register_t * for the stack base.
- Use ustringp for the location of the argv and environment strings
  and allow destp to travel further down the stack for the stackgap
  and auxv regions.
- Update the Linux copyout_strings variants to move destp down the
  stack as was done for the native ABIs in r263349.
- Stop allocating a space for a stack gap in the Linux ABIs.  This
  used to hold translated system call arguments, but hasn't been used
  since r159992.

Reviewed by:	kib
Tested on:	md64 (amd64, i386, linux64), i386 (i386, linux)
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22501
2019-12-03 23:17:54 +00:00
Emmanuel Vadot
357145a0ce Remove "all rights reserved" from copyright for the file that Jared McNeill
own. He gave me permission to do this.
2019-12-03 21:05:33 +00:00
Ryan Libby
aa4e741821 mips busdma: bzero map on alloc
Maps from the mips busdma dmamap_zone were not completely initialized.
In particular, pagesneeded and pagesreserved were not initialized.  This
could cause a crash.

Remove some dead fields from mips struct bus_dmamap while here.

Reported by:	brooks
Reviewed by:	ian
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D22638
2019-12-03 17:43:52 +00:00
Warner Losh
5bebf8b402 Remove two obsolete comments that reference splhigh/splx. 2019-11-21 18:49:54 +00:00
John Baldwin
6b51bdf38c Combine ELF sysvecs for MIPS to reduce code duplication.
Reviewed by:	brooks, kevans
Tested on:	mips, mips64
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22357
2019-11-15 19:00:20 +00:00
John Baldwin
e353233118 Add a sv_copyout_auxargs() hook in sysentvec.
Change the FreeBSD ELF ABIs to use this new hook to copyout ELF auxv
instead of doing it in the sv_fixup hook.  In particular, this new
hook allows the stack space to be allocated at the same time the auxv
values are copied out to userland.  This allows us to avoid wasting
space for unused auxv entries as well as not having to recalculate
where the auxv vector is by walking back up over the argv and
environment vectors.

Reviewed by:	brooks, emaste
Tested on:	amd64 (amd64 and i386 binaries), i386, mips, mips64
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22355
2019-11-15 18:42:13 +00:00
Brooks Davis
301b49d2e5 Fix a typo in the PMAP_PTE_SET_CACHE_BITS macro.
The second argument should have been "pa" not "ps".  It worked by
accident because the argument was always "pa" which was an in-scope
local variable.

Submitted by:	sson
Reviewed by:	jhb, kevans
Obtained from:	CheriBSD
MFC after:	3 days
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D22338
2019-11-13 18:10:42 +00:00
Brooks Davis
aaa3243fa0 Remove an outdated assertion.
The exclusive lock assertion became incorrect due to changes in lock
scope in r354155.

Discussed with:	jeffr
Sponsored by:	DARPA, AFRL
2019-11-04 21:06:06 +00:00
Mark Johnston
01cef4caa7 Remove page locking from pmap_mincore().
After r352110 the page lock no longer protects a page's identity, so
there is no purpose in locking the page in pmap_mincore().  Instead,
if vm.mincore_mapped is set to the non-default value of 0, re-lookup
the page after acquiring its object lock, which holds the page's
identity stable.

The change removes the last callers of vm_page_pa_tryrelock(), so
remove it.

Reviewed by:	kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21823
2019-10-16 22:03:27 +00:00
John Baldwin
a2b282151f Use -march=octeon+ for OCTEON1.
External binutils requires octeon+ for saa.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D22033
2019-10-15 17:28:26 +00:00
John Baldwin
6ed6d6931a Fix a write-only variable warning from external GCC.
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D22032
2019-10-15 17:17:16 +00:00
John Baldwin
017128c99b Don't set the OUTPUT_FORMAT explicitly but let ld derive it.
This fixes an error with modern ld.bfd and is inline with the changes in
r215251 and r217612.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D22031
2019-10-15 17:14:30 +00:00
John Baldwin
64f1604a76 Update MIPS kernel builds to work with mips-gcc.
- Use a default -march of mips64 on N64 and N32 kernels.
- Set the endianness (via MIPS_ENDIAN) and ABI (via MIPS_ABI) in
  CFLAGS from MACHINE_ARCH.  ARCH_FLAGS now only sets a different
  -march value if needed.
- TRAMP_ARCH_FLAGS inherits MIPS_ENDIAN from MACHINE_ARCH but does
  not set the ABI since XLPN32 needs an N64 ABI for the trampoline
  loader.  When TRAMP_ARCH_FLAGS is used it must set both -march
  and -mabi.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D22030
2019-10-15 17:11:42 +00:00
Jeff Roberson
638f867814 (6/6) Convert pmap to expect busy in write related operations now that all
callers hold it.

This simplifies pmap code and removes a dependency on the object lock.

Reviewed by:    kib, markj
Tested by:      pho
Sponsored by:   Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D21596
2019-10-15 03:51:46 +00:00
Jeff Roberson
205be21d99 (3/6) Add a shared object busy synchronization mechanism that blocks new page
busy acquires while held.

This allows code that would need to acquire and release a very large number
of page busy locks to use the old mechanism where busy is only checked and
not held.  This comes at the cost of false positives but never false
negatives which the single consumer, vm_fault_soft_fast(), handles.

Reviewed by:    kib
Tested by:      pho
Sponsored by:   Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D21592
2019-10-15 03:41:36 +00:00
Andriy Gapon
eab7984cfe add atomic_load_64 for mipsn32
It's just an alias for atomic_load_acq_64 (same as on i386).

MFC after:	1 week
2019-10-07 07:42:26 +00:00
Kyle Evans
281ec62c97 mips: use generic sub-word atomic *cmpset
Most of this diff is refactoring to reduce duplication between the different
acq_ and rel_ variants.

Differential Revision:	https://reviews.freebsd.org/D21822
2019-10-02 17:07:59 +00:00
Kyle Evans
22c2c971a6 mips: fcmpset: do not spin on sc failure
For ll/sc architectures, atomic(9) allows failure modes where *old == val
due to write failure and callers should compensate for this. Do not retry on
failure, just leave 0 in ret and fail the operation if we couldn't sc it.
This lets the caller determine if it should retry or not.

Reviewed by:	kib
Looks ok:	imp
Differential Revision:	https://reviews.freebsd.org/D21836
2019-10-02 15:13:40 +00:00
Konstantin Belousov
df08823d07 Improve MD page fault handlers.
Centralize calculation of signal and ucode delivered on unhandled page
fault in new function vm_fault_trap().  MD trap_pfault() now almost
always uses the signal numbers and error codes calculated in
consistent MI way.

This introduces the protection fault compatibility sysctls to all
non-x86 architectures which did not have that bug, but apparently they
were already much more wrong in selecting delivered signals on
protection violations.

Change the delivered signal for accesses to mapped area after the
backing object was truncated.  According to POSIX description for
mmap(2):
   The system shall always zero-fill any partial page at the end of an
   object. Further, the system shall never write out any modified
   portions of the last page of an object which are beyond its
   end. References within the address range starting at pa and
   continuing for len bytes to whole pages following the end of an
   object shall result in delivery of a SIGBUS signal.

   An implementation may generate SIGBUS signals when a reference
   would cause an error in the mapped object, such as out-of-space
   condition.
Adjust according to the description, keeping the existing
compatibility code for SIGSEGV/SIGBUS on protection failures.

For situations where kernel cannot handle page fault due to resource
limit enforcement, SIGBUS with a new error code BUS_OBJERR is
delivered.  Also, provide a new error code SEGV_PKUERR for SIGSEGV on
amd64 due to protection key access violation.

vm_fault_hold() is renamed to vm_fault().  Fixed some nits in
trap_pfault()s like mis-interpreting Mach errors as errnos.  Removed
unneeded truncations of the fault addresses reported by hardware.

PR:	211924
Reviewed by:	alc
Discussed with:	jilles, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21566
2019-09-27 18:43:36 +00:00
Mark Johnston
b119329d81 Complete the removal of the "wire_count" field from struct vm_page.
Convert all remaining references to that field to "ref_count" and update
comments accordingly.  No functional change intended.

Reviewed by:	alc, kib
Sponsored by:	Intel, Netflix
Differential Revision:	https://reviews.freebsd.org/D21768
2019-09-25 16:11:35 +00:00
Kyle Evans
2e6a21bbd8 mips: fix XLPN32 after r352434
SYSINIT usage was added, but the <sys/kernel.h> dependency was not added.
This worked by coincidence, as most of the mips configs have DDB enabled and
pmap.c gets <sys/kernel.h> via ddb.h pollution.

Reported by:	dim
2019-09-23 12:43:08 +00:00
Kyle Evans
2946ed83c0 octeon1: suppress a couple of warnings under clang
These appear in octeon-sdk -- there are new releases, but they don't seem to
address the running issues in octeon-sdk. GCC4.2 is more than happy, but
clang is much less-so and most of them are fairly innocuous and perhaps a
by-product of their style guide, which may make some of the changes harder
to upstream (if this is even possible anymore).
2019-09-22 18:30:19 +00:00
Jason A. Harmening
9d45af5c09 mips: move support for temporary mappings above KSEG0 to per-CPU data
This is derived from similar work done in r310481 for i386 and r312610 for
armv6/armv7. Additionally, use a critical section to keep the thread
pinned for per-CPU operations instead of completely disabling local interrupts.

No objections from:	adrian, jmallett, imp
Differential Revision: 	https://reviews.freebsd.org/D18593
2019-09-17 03:39:31 +00:00
Mark Johnston
e8bcf6966b Revert r352406, which contained changes I didn't intend to commit. 2019-09-16 15:04:45 +00:00
Mark Johnston
41fd4b9422 Fix a couple of nits in r352110.
- Remove a dead variable from the amd64 pmap_extract_and_hold().
- Fix grammar in the vm_page_wire man page.

Reported by:	alc
Reviewed by:	alc, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21639
2019-09-16 15:03:12 +00:00
Mark Johnston
fee2a2fa39 Change synchonization rules for vm_page reference counting.
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator.  In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficent as
well.  These references are protected by the page lock, which must
therefore be acquired for many per-page operations.  This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.

Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter.  A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held.  As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.

The vm_page_wire() and vm_page_unwire() KPIs are changed.  The former
requires that either the object lock or the busy lock is held.  The
latter no longer has a return value and may free the page if it releases
the last reference to that page.  vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate.  vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold().  It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler.  vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state).  In particular, synchronization details are no longer
leaked into the caller.

The change excises the page lock from several frequently executed code
paths.  In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock.  In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.

__FreeBSD_version is bumped.  The DRM ports have been updated to
accomodate the KPI changes.

Reviewed by:	jeff (earlier version)
Tested by:	gallatin (earlier version), pho
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20486
2019-09-09 21:32:42 +00:00
Kyle Evans
4b3b82a756 mips: fix some mcount nits
The symbol version for _mcount was removed 12 years ago in r169525 from
gmon/Symbol.map, to be added to the per-arch Symbol.map. mips was overlooked
in this, so _mcount has no symver. Add it back to where it should have been,
rather than where it would go if it were added today, since we're correcting
a historical mistake.

Additionally, _mcount is getting thrown into .mdebug.abi32 in the llvm80/90
world as it's not getting explicitly thrown into .text, so do this now. This
fixes the libc build that was previously failing due to relocations in
.mdebug.abi32. This is specifically due to the way clang's integrated AS
works and that they emit the .mdebug.abiNN section early in the process. An
LLVM bug has been submitted[0] and an agreement has been made that the
mips backend should switch to .text following .mdebug.abiNN for
compatibility.

[0] https://bugs.llvm.org/show_bug.cgi?id=43119

Reviewed by:	imp, arichardson
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D21435
2019-09-02 01:55:55 +00:00
Konstantin Belousov
a2a0f90654 Centralize __pcpu definitions.
Many extern struct pcpu <something>__pcpu declarations were
copied/pasted in sources.  The issue is that the definition is MD, but
it cannot be provided by machine/pcpu.h due to actual struct pcpu
defined in sys/pcpu.h later than the inclusion of machine/pcpu.h.
This forced the copying when other code needed direct access to
__pcpu.  There is no way around it, due to machine/pcpu.h supplying
part of struct pcpu fields.

To work around the problem, add a new machine/pcpu_aux.h header, which
should fill any needed MD definitions after struct pcpu definition is
completed. This allows to remove copies of __pcpu spread around the
source.  Also on x86 it makes it possible to remove work arounds like
OFFSETOF_CURTHREAD or clang specific warnings supressions.

Reported and tested by:	lwhsu, bcran
Reviewed by:	imp, markj (previous version)
Discussed with:	jhb
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D21418
2019-08-29 07:25:27 +00:00
Kyle Evans
e21f96a811 mips: hide regnum definitions behind _KERNEL/_WANT_MIPS_REGNUM
machine/regnum.h ends up being included by sys/procfs.h and sys/ptrace.h via
machine/reg.h. Many of the regnum definitions are too short and too generic
to be exposing to any userland application including one of these two
headers. Moreover, these actively cause build failures in googletest
(template <typename T1 ...> expanding to template <typename 9 ...>).

Hide the definitions behind _KERNEL or _WANT_MIPS_REGNUM, and patch all of
the userland consumers to define as needed.

Discussed with:	imp, jhb
Reviewed by:	imp, jhb
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D21330
2019-08-22 21:43:21 +00:00
Kyle Evans
7d7fb3dc01 mips: avoid empty mdproc struct
Compiling with a more modern toolchain than GCC 4.2 in base warns about the
empty struct. Take a hint and comment from r350902+r350953 by luporl@.
2019-08-19 18:15:17 +00:00
Jeff Roberson
2194393787 Move phys_avail definition into MI code. It is consumed in the MI layer and
doing so adds more flexibility with less redundant code.

Reviewed by:	jhb, markj, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21250
2019-08-16 00:45:14 +00:00
Mark Johnston
918988576c Avoid relying on header pollution from sys/refcount.h.
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2019-07-29 20:26:01 +00:00
Ruslan Bukin
951e058411 o Add support for BERI IOMMU device
o Add an experimental IOMMU support to xDMA framework

The BERI IOMMU device is the part of CHERI device-model project [1]. It
translates memory addresses for various BERI peripherals modelled in
software. It accepts FreeBSD/mips64 page directories format and manages
BERI TLB.

1. https://github.com/CTSRD-CHERI/device-model

Sponsored by:	DARPA, AFRL
2019-07-22 16:01:20 +00:00
John Baldwin
c18ca74916 Don't pass error from syscallenter() to syscallret().
syscallret() doesn't use error anymore.  Fix a few other places to permit
removing the return value from syscallenter() entirely.
- Remove a duplicated assertion from arm's syscall().
- Use td_errno for amd64_syscall_ret_flush_l1d.

Reviewed by:	kib
MFC after:	1 month
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D2090
2019-07-15 21:25:16 +00:00
Konstantin Belousov
30b3018d48 Provide protection against starvation of the ll/sc loops when accessing userpace.
Casueword(9) on ll/sc architectures must be prepared for userspace
constantly modifying the same cache line as containing the CAS word,
and not loop infinitely.  Otherwise, rogue userspace livelocks the
kernel.

To fix the issue, change casueword(9) interface to return new value 1
indicating that either comparision or store failed, instead of relying
on the oldval == *oldvalp comparison.  The primitive no longer retries
the operation if it failed spuriously.  Modify callers of
casueword(9), all in kern_umtx.c, to handle retries, and react to
stops and requests to terminate between retries.

On x86, despite cmpxchg should not return spurious failures, we can
take advantage of the new interface and just return PSL.ZF.

Reviewed by:	andrew (arm64, previous version), markj
Tested by:	pho
Reported by:	https://xenbits.xen.org/xsa/advisory-295.txt
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D20772
2019-07-12 18:43:24 +00:00
Warner Losh
e787ccc3f9 Fix compile errors with the CI20
Fix mutex includes and fix a typo. The CI20 kernel is not built as
part of universe.

PR: 239115
Submitted by: Kai Nacke
2019-07-10 17:21:59 +00:00
Mark Johnston
eeacb3b02f Merge the vm_page hold and wire mechanisms.
The hold_count and wire_count fields of struct vm_page are separate
reference counters with similar semantics.  The remaining essential
differences are that holds are not counted as a reference with respect
to LRU, and holds have an implicit free-on-last unhold semantic whereas
vm_page_unwire() callers must explicitly determine whether to free the
page once the last reference to the page is released.

This change removes the KPIs which directly manipulate hold_count.
Functions such as vm_fault_quick_hold_pages() now return wired pages
instead.  Since r328977 the overhead of maintaining LRU for wired pages
is lower, and in many cases vm_fault_quick_hold_pages() callers would
swap holds for wirings on the returned pages anyway, so with this change
we remove a number of page lock acquisitions.

No functional change is intended.  __FreeBSD_version is bumped.

Reviewed by:	alc, kib
Discussed with:	jeff
Discussed with:	jhb, np (cxgbe)
Tested by:	pho (previous version)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19247
2019-07-08 19:46:20 +00:00
Warner Losh
4924bcd36e Implement missing MMCBR ivars
All MMCBR bridges have to implement all the MMCBR variables. This
implements them for everybody that currently doesn't.

A common routine for this should be written.
2019-07-04 14:15:04 +00:00
Navdeep Parhar
57f317e60a Display the approximate space needed when a minidump fails due to lack
of space.

Reviewed by:	kib@
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D20801
2019-06-30 03:14:04 +00:00
Ruslan Bukin
2593e9dcb2 Add support for extended descriptor format to Altera mSGDMA driver.
The format to use depends on hardware configuration (synthesis-time),
so make it compile-time kernel option.

Extended format allows DMA engine to operate with 64-bit memory addresses.

Sponsored by:	DARPA, AFRL
2019-06-27 18:08:18 +00:00
Conrad Meyer
c363b16c63 sys: Remove DEV_RANDOM device option
Remove 'device random' from kernel configurations that reference it (most).
Replace perhaps mistaken 'nodevice random' in two MIPS configs with 'options
RANDOM_LOADABLE' instead.  Document removal in UPDATING; update NOTES and
random.4.

Reviewed by:	delphij, markm (previous version)
Approved by:	secteam(delphij)
Differential Revision:	https://reviews.freebsd.org/D19918
2019-06-21 00:16:30 +00:00
Mark Johnston
88ea538a98 Replace uses of vm_page_unwire(m, PQ_NONE) with vm_page_unwire_noq(m).
These calls are not the same in general: the former will dequeue the
page if it is enqueued, while the latter will just leave it alone.  But,
all existing uses of the former apply to unmanaged pages, which are
never enqueued in the first place.  No functional change intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20470
2019-06-07 18:23:29 +00:00