VM_NUMA_ALLOC is used to enable use of domain-aware memory allocation in
the virtual memory system. DEVICE_NUMA is used to enable affinity
reporting for devices such as bus_get_domain().
MAXMEMDOM must still be set to a value greater than for any NUMA support
to be effective. Note that 'cpuset -gd' always works if MAXMEMDOM is
enabled and the system supports NUMA.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D5782
Previously, the ACPI PCI bus driver did a single pass over the devices in
the namespace that were a child of a given PCI bus to associate the
PCI bus-enumerated device_t devices with the corresponding ACPI handles.
However, this meant that handles were only established at runtime for devices
found during the initial PCI bus scan.
PCI_IOV adds devices that show up after the initial PCI bus scan, and coming
changes to add a bus rescan can also add devices after the initial scan.
This change adds a pci_child_added() callback to the ACPI PCI bus that walks
the namespace to find the ACPI handle for each device that is added. Using
a callback means that the handle is correctly set for any device no matter
how it is added (initial scan, IOV, or a bus rescan).
Instead of providing a wrapper around device_delete_child() that the PCI
bus and child bus drivers must call explicitly, move the bulk of the logic
from pci_delete_child() into a bus_child_deleted() method
(pci_child_deleted()). This allows PCI devices to be safely deleted via
device_delete_child().
- Add a bus_child_deleted method to the ACPI PCI bus which clears the
device_t associated with the corresponding ACPI handle in addition to
the normal PCI bus cleanup.
- Change cardbus_detach_card to call device_delete_children() and move
CardBus-specific delete logic into a new cardbus_child_deleted() method.
- Use device_delete_child() instead of pci_delete_child() in the SRIOV code.
- Add a bus_child_deleted method to the OpenFirmware PCI bus drivers which
frees the OpenFirmware device info for each PCI device.
Reviewed by: imp
Tested on: amd64 (CardBus and PCI-e hotplug)
Differential Revision: https://reviews.freebsd.org/D5831
On some architectures, u_long isn't large enough for resource definitions.
Particularly, powerpc and arm allow 36-bit (or larger) physical addresses, but
type `long' is only 32-bit. This extends rman's resources to uintmax_t. With
this change, any resource can feasibly be placed anywhere in physical memory
(within the constraints of the driver).
Why uintmax_t and not something machine dependent, or uint64_t? Though it's
possible for uintmax_t to grow, it's highly unlikely it will become 128-bit on
32-bit architectures. 64-bit architectures should have plenty of RAM to absorb
the increase on resource sizes if and when this occurs, and the number of
resources on memory-constrained systems should be sufficiently small as to not
pose a drastic overhead. That being said, uintmax_t was chosen for source
clarity. If it's specified as uint64_t, all printf()-like calls would either
need casts to uintmax_t, or be littered with PRI*64 macros. Casts to uintmax_t
aren't horrible, but it would also bake into the API for
resource_list_print_type() either a hidden assumption that entries get cast to
uintmax_t for printing, or these calls would need the PRI*64 macros. Since
source code is meant to be read more often than written, I chose the clearest
path of simply using uintmax_t.
Tested on a PowerPC p5020-based board, which places all device resources in
0xfxxxxxxxx, and has 8GB RAM.
Regression tested on qemu-system-i386
Regression tested on qemu-system-mips (malta profile)
Tested PAE and devinfo on virtualbox (live CD)
Special thanks to bz for his testing on ARM.
Reviewed By: bz, jhb (previous)
Relnotes: Yes
Sponsored by: Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D4544
acpi_GetInteger() execution. Intel DMAR interrupt remapping code
needs to know UID of the HPET to properly route the FSB interrupts
from the HPET, even when interrupt remapping is disabled, and the code
is executed under some non-sleepable mutexes.
Cache HPET UIDs in the device softc at the attach time and provide
lock-less method to get UID, use the method from the dmar hpet
handling code instead of calling GetInteger().
Reported and tested by: Larry Rosenman <ler@lerctr.org>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
This simplifies checking for default resource range for bus_alloc_resource(),
and improves readability.
This is part of, and related to, the migration of rman_res_t from u_long to
uintmax_t.
Discussed with: jhb
Suggested by: marcel
Summary:
Migrate to using the semi-opaque type rman_res_t to specify rman resources. For
now, this is still compatible with u_long.
This is step one in migrating rman to use uintmax_t for resources instead of
u_long.
Going forward, this could feasibly be used to specify architecture-specific
definitions of resource ranges, rather than baking a specific integer type into
the API.
This change has been broken out to facilitate MFC'ing drivers back to 10 without
breaking ABI.
Reviewed By: jhb
Sponsored by: Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D5075
to shut down; close laptop lid" scenario which otherwise tended to end
with a laptop overheating or the battery dying.
The implementation uses a new sysctl, kern.suspend_blocked; init(8) sets
this while rc.suspend runs, and the ACPI sleep code ignores requests while
the sysctl is set.
Discussed on: freebsd-acpi (35 emails)
MFC after: 1 week
When the system has more than a single PCI domain, the bus numbers
are not unique, thus they cannot be used for "pci" device numbering.
Change bus numbers to -1 (i.e. to-be-determined automatically)
wherever the code did not care about domains.
Reviewed by: jhb
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3406
drivers, one for fdt, one for acpi. It then uses this to decide if it will
use fdt or acpi.
The GICv2 (interrupt controller) and Generic Timer drivers have been
updated to handle both cases.
As this is early code we still need FDT to find the kernel console, and
some parts are still missing, including PCI support.
Differential Revision: https://reviews.freebsd.org/D2463
Reviewed by: jhb, jkim, emaste
Obtained from: ABT Systems Ltd
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
years for head. However, it is continuously misused as the mpsafe argument
for callout_init(9). Deprecate the flag and clean up callout_init() calls
to make them more consistent.
Differential Revision: https://reviews.freebsd.org/D2613
Reviewed by: jhb
MFC after: 2 weeks
bridges only supported Intel Pentium and Pentium II era processors and there
is no reason for hardware virtualizations to emulate these quirks.
MFC after: 1 week
interacts with interrupts, query ACPI and use MWAIT for entrance into
Cx sleep states. Support C1 "I/O then halt" mode. See Intel'
document 302223-007 "Intelб╝ Processor Vendor-Specific ACPI Interface
Specification" for description.
Move the acpi_cpu_c1() function into x86/cpu_machdep.c and use
it instead of inlining "sti; hlt" sequence in several places.
In the acpi(4) man page, besides documenting the dev.cpu.N.cx_methods
sysctl, correct the names for dev.cpu.N.{cx_usage,cx_lowest,cx_supported}
sysctls.
Both jkim and avg have some other patches implementing the mwait
functionality; this work is unrelated. Linux does not rely on the
ACPI to provide correct tables describing Cx modes. Instead, the
driver has pre-defined knowledge of the CPU models, it was supplied by
Intel.
Tested by: pho (previous versions)
Sponsored by: The FreeBSD Foundation
its use in upcoming code.
This is inspired by something in jhb's NUMA IRQ allocation patchset.
However, the tricky bit here is that the PXM lookup for a node may
fail, requiring a lookup on the parent node. So if it doesn't
exist, don't fail - just go up to the parent. Only error out of the
lookup is the ACPI lookup returns an error.
Sponsored by: Norse Corp, Inc.
"Intelб╝ Processor Vendor-Specific ACPI Interface Specification",
issied Dec 2014. Previous revision 005 was from Sep 2006.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
under bootverbose. Every example I've seen to date has been due to
an ACPI system resource device reserving a range that overlaps with
system memory (which ram0 attempts to reserve) or a local or I/O APIC
(which apic0 attempts to reserve). These are always harmless but look
scary to users.
MFC after: 1 week
Implement the interace to create SR-IOV Virtual Functions (VFs).
When a driver registers that they support SR-IOV by calling
pci_setup_iov(), the SR-IOV code creates a new node in /dev/iov
for that device. An ioctl can be invoked on that device to
create VFs and have the driver initialize them.
At this point, allocating memory I/O windows (BARs) is not
supported.
Differential Revision: https://reviews.freebsd.org/D76
Reviewed by: jhb
MFC after: 1 month
Sponsored by: Sandvine Inc.
identifies the tested condition for _PRT as "BYTE value of 0", so the
remaining part of the conditionals is sufficient.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
allows the user to request administrative changes to individual devices
such as attach or detaching drivers or disabling and re-enabling devices.
- Add a new /dev/devctl2 character device which uses ioctls for device
requests. The ioctls use a common 'struct devreq' which is somewhat
similar to 'struct ifreq'.
- The ioctls identify the device to operate on via a string. This
string can either by the device's name, or it can be a bus-specific
address. (For unattached devices, a bus address is the only way to
locate a device.) Bus drivers register an eventhandler to claim
unrecognized device names that the driver recognizes as a valid address.
Two buses currently support addresses: ACPI recognizes any device
in the ACPI namespace via its full path starting with "\" and
the PCI bus driver recognizes an address specification of
'pci[<domain>:]<bus>:<slot>:<func>' (identical to the PCI selector
strings supported by pciconf).
- To make it easier to cut and paste, change the PnP location string
in the PCI bus driver to output a full PCI selector string rather
than 'slot=<slot> function=<func>'.
- Add a devctl(3) interface in libdevctl which provides a wrapper around
the ioctls and is the preferred interface for other userland code.
- Add a devctl(8) program which is a simple wrapper around the requests
supported by devctl(3).
- Add a device_is_suspended() function to check DF_SUSPENDED.
- Add a resource_unset_value() function that can be used to remove a
hint from the kernel environment. This is used to clear a
hint.<driver>.<unit>.disabled hint when re-enabling a boot-time
disabled device.
Reviewed by: imp (parts)
Requested by: imp (changing PCI location string)
Relnotes: yes
Also, split power_suspend into power_suspend and power_suspend_early.
power_suspend_early is called before the userland is frozen.
power_suspend is called after the userland is frozen.
Currently only VT switching is hooked to power_suspend_early.
This is needed because switching away from X server requires its
cooperation, so obviously X server must not be frozen when that happens.
Freezing userland during ACPI suspend is useful because not all drivers
correctly handle suspension concurrent with other activity. This is
especially applicable to drivers ported from other operating systems
that suspend all software activity between placing drivers and hardware
into suspended state.
In particular drm2/radeon (radeonkms) depends on the described
procedure. The driver does not have any internal synchronization
between suspension activities and processing of userland requests.
Many thanks to kib for the code that allows to freeze and thaw all
userland threads.
Note that ideally we also need to park / inhibit (non-special) kernel
threads as well to ensure that they do not call into drivers.
MFC after: 17 days
accidentally enable non-existent states.
This bug was triggered if ACPI advertises the presence of a C2 state
which we fail to parse via acpi_PkgGas due to our lack of support for
FFixedHW resources, and causes an immediate panic when an attempt is
made to enter the (NULL) state.
One affected platform is the EC2 c4.8xlarge VM instance type; there
may be others.
MFC after: 1 week
Thanks to: jkim, @_msw_
may also halt in C2 and not just C3 (it seems that in some cases the BIOS
advertises its C3 state as a C2 state in _CST). Just play it safe and
disable both C2 and C3 states if a user forces the use of the TSC as the
timecounter on such CPUs.
PR: 192316
Differential Revision: https://reviews.freebsd.org/D1441
No objection from: jkim
MFC after: 1 week
state said device should go into.
This was a snafu introduced in the ACPI/PCI awareness separation.
When putting a device into a power state, the bus (and thus firmware,
eg ACPI) should be asked before hand to check whether the device
can indeed go into that power state.
There's a set of nodes in ACPI under each device - the _SxD nodes - which
state which ACPI power state to put the device into when the system is
going into power save state 'x'. So when going into S3, the existence
of an _S3D node would override whatever the system was trying to do.
By default the PCI code wants to put devices into D3 before suspending.
I have a laptop here (Asus Zenbook - check the PR) whose EHCI controller
really wants to be in D2 during suspend, not D3. So if we put it into
D3 and then try to enter S3, everything hangs. The device itself
can go into D3 - it just can't be there when the call to ACPI to enter
S3 occurs. The PCI patch fixes this.
jkim@ noticed that the same is needed for the ACPI child device
enumeration.
Thankyou to Matt Dillon (the programmer, not the actor) for buying me
this particular laptop so I could debug the issues with the Atheros
AR9485 that is in it. It's his fault that I ended up with this
laptop and was sufficiently annoyed by the lack of USB suspend
to go down this rabbit hole.
Tested:
* Thinkpad T400
* Thinkpad X230
* Thinkpad T42
* Thinkpad T60
* Asus Zenbook (see PR)
* Asus EEEPC 701
* Asus EEEPC 1001PX
TODO:
* Figure out what we should do about devices we unload drivers for
that want to be in a specific state when entering S3 / S4 -
the "put devices into D3 if they're not bound to a driver" option
may also mess with things.
PR: kern/194884
Reviewed by: jhb, jkim
MFC after: 1 week
Relnotes: yes
Sponsored by: Matt Dillon <dillon@apollo.backplane.com> (hardware)
directly accessed. Although this will work on some platforms, it can
throw an exception if the pointer is invalid and then panic the kernel.
Add a missing SYSCTL_IN() of "SCTP_BASE_STATS" structure.
MFC after: 3 days
Sponsored by: Mellanox Technologies
It had two bugs: one where mmap was still allowed and another where
D_TRACKCLOSE doesn't handle all cases.
Thanks to jhb and kib for pointing them out.
MFC after: 1 week
In some cases, TSC is broken and special applications might benefit
from memory mapping HPET and reading the registers to count time.
Most often the main HPET counter is 32-bit only[1], so this only gives
the application a 300 second window based on the default HPET
interval.
Other applications, such as Intel's DPDK, expect /dev/hpet to be
present and use it to count time as well.
Although we have an almost userland version of gettimeofday() which
uses rdtsc in userland, it's not always possible to use it, depending
on how broken the multi-socket hardware is.
Install the acpi_hpet.h so that applications can use the HPET register
definitions.
[1] I haven't found a system where HPET's main counter uses more than
32 bit. There seems to be a discrepancy in the Intel documentation
(claiming it's a 64-bit counter) and the actual implementation (a
32-bit counter in a 64-bit memory area).
MFC after: 1 week
Relnotes: yes
in userland rename in-kernel getenv()/setenv() to kern_setenv()/kern_getenv().
This fixes a namespace collision with libc symbols.
Submitted by: kmacy
Tested by: make universe