the old encodings for the lower 16 and 32 bits and only using the
higher 32 bits for unusually large major and minor numbers. This
change breaks compatibility with the previous encoding (which was only
used in -current).
Fix truncation to (essentially) 16-bit dev_t in newnfs v3.
Any encoding of device numbers gives an ABI, so it can't be changed
without translations for compatibility. Extra bits give the much
larger complication that the translations need to compress into fewer
bits. Fortunately, more than 32 bits are rarely needed, so
compression is rarely needed except for 16-bit linux dev_t where it
was always needed but never done.
The previous encoding moved the major number into the top 32 bits.
Almost no translation code handled this, so the major number was blindly
truncated away in most 32-bit encodings. E.g., for ffs, mknod(8) with
major = 1 and minor = 2 gave dev_t = 0x10000002; ffs cannot represent
this and blindly truncated it to 2. But if this mknod was run on any
released version of FreeBSD, it gives dev_t = 0x102. ffs can represent
this, but in the previous encoding it was not decoded, giving major = 0,
minor = 0x102.
The presence of bugs was most obvious for exporting dev_t's from an
old system to -current, since bugs in newnfs augment them. I fixed
oldnfs to support 32-bit dev_t in 1996 (r16634), but this regressed
to 16-bit dev_t in newnfs, first to the old 16-bit encoding and then
further in -current. E.g., old ad0 with major = 234, minor = 0x10002
had the correct (major, minor) number on the wire, but newnfs truncated
this to (234, 2) and then the previous encoding shifted the major
number into oblivion as seen by ffs or old applications.
I first tried to fix this by translating on every ABI/API boundary, but
there are too many boundaries and too many sloppy translations by blind
truncation. So use the old encoding for the low 32 bits so that sloppy
translations work no worse than before provided the high 32 bits are
not set. Add some error checking for when bits are lost. Keep not
doing any error checking for translations for almost everything in
compat/linux.
compat/freebsd32/freebsd32_misc.c:
Optionally check for losing bits after possibly-truncating assignments as
before.
compat/linux/linux_stats.c:
Depend on the representation being compatible with Linux's (or just with
itself for local use) and spell some of the translations as assignments in
a macro that hides the details.
fs/nfsclient/nfs_clcomsubs.c:
Essentially the same fix as in 1996, except there is now no possible
truncation in makedev() itself. Also fix nearby style bugs.
kern/vfs_syscalls.c:
As for freebsd32. Also update the sysctl description to include file
numbers, and change it to describe device ids as device numbers.
sys/types.h:
Use inline functions (wrapped by macros) since the expressions are now
a bit too complicated for plain macros. Describe the encoding and
some of the reasons for it. 16-bit compatibility didn't leave many
reasonable choices for the 32-bit encoding, and 32-bit compatibility
doesn't leave many reasonable choices for the 64-bit encoding. My
choice is to put the 8 new minor bits in the low 8 bits of the top 32
bits. This minimizes discontiguities.
Reviewed by: kib (except for rewrite of the comment in linux_stats.c)
allocation, I could identify that actually we use this pointer on pci_emul.c as
well as on vga.c source file.
I have reworked the logic here to make it more readable and also add a warn to
explicit show the function where the memory allocation error could happen,
also sort headers.
Also CID 1194192 was marked as "Intentional".
Obtained from: TrueOS
MFC after: 4 weeks.
Sponsored by: iXsystems Inc.
- remove __pure annotations I added earlier for some functions. One
writes to the the arguments as "out" pointers. The
other reads from an array, which while const within the function might
be mutated externally.
- total_change is modified to be at 1, if previously 0, so no if check
is needed.
of needed interface when many gre interfaces are present.
Remove rmlock from gre_softc, use epoch(9) and CK_LIST instead.
Move more AF-related code into AF-related locations. Use hash table to
speedup lookup of needed softc.
This fixes the race when first core sets up the pagetables, while
secondary cores do translating the address of __riscv_boot_ap.
This now allows us to smpboot in QEMU with 8 cores just fine.
Sponsored by: DARPA, AFRL
Continue my parade on introspection tools by fixing:
- failed to check for null after reallocf
- avoid the comma operator
- mark usage as dead
- correct size of len
of 64-bit dev_t's (but not ones involving dev_t's).
st_size was supposed to be clamped in cvtstat() and linux's copy_stat(),
but the clamping code wasn't aware that st_size is signed, and also had
an obfuscated off-by-1 value for the unsigned limit, so its effect was
to produce a bizarre negative size instead of clamping.
Change freebsd32's copy_ostat() to be no worse than cvtstat(). It was
missing clamping and bzero()ing of padding.
Reviewed by: kib (except a final fix of the clamp to the signed maximum)
Some casts from pointers to uint64_t and back in lio_main.c cause base
gcc on i386 to warn "cast from pointer to integer of different size",
and vice versa. Add additional casts to uintptr_t to suppress these.
Reviewed by: sbruno
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D15754
When hash table lookups are not serialized with in_pcbfree it will be
possible for callers to find an inpcb that has been marked free. We
need to check for this and return NULL.
- initialize all maybe uninitialized vars with bogus values. This shuts
up the compiler, and causes crashes if it changes later.
- mark noreturn as noreturn
- removed unused macro
- handle x_procstate as runtime rather than pre-processor
- avoid using void functions in condtionals
Tested with clang, gcc 7, gcc 9
without this and running vnets with a TCP stack that uses
some of the features is a recipe for panic (without this commit).
Reported by: Larry Rosenman
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D15757
Deferring the actual free of the inpcb until after a grace
period has elapsed will allow us to convert the inpcbinfo
info and hash read locks to epoch.
Reviewed by: gallatin, jtl
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15510
Generating the pnp info have the side effect to include all nodes even
if the status isn't "okay".
That means that loading the module will load but not attach as it checks
the status in the probe function.
On pine64 before :
root@pine64-lts:~ # devmatch -u
unattached on ofwbus pnpinfo name=memory
unattached on ofwbus pnpinfo name=chosen
unattached on ofwbus pnpinfo name=sound_spdif compat=simple-audio-card
unattached on ofwbus pnpinfo name=spdif-out compat=linux,spdif-dit
unattached on simplebus pnpinfo name=dma-controller@1c02000 compat=allwinner,sun50i-a64-dma
unattached on simplebus pnpinfo name=mmc@1c10000 compat=allwinner,sun50i-a64-mmc
unattached on simplebus pnpinfo name=usb@1c19000 compat=allwinner,sun8i-a33-musb
unattached on simplebus pnpinfo name=spdif@1c21000 compat=allwinner,sun50i-a64-spdif
unattached on simplebus pnpinfo name=i2s@1c22000 compat=allwinner,sun50i-a64-i2s
unattached on simplebus pnpinfo name=i2s@1c22400 compat=allwinner,sun50i-a64-i2s
unattached on simplebus pnpinfo name=serial@1c28400 compat=snps,dw-apb-uart
unattached on simplebus pnpinfo name=serial@1c28800 compat=snps,dw-apb-uart
unattached on simplebus pnpinfo name=serial@1c28c00 compat=snps,dw-apb-uart
unattached on simplebus pnpinfo name=serial@1c29000 compat=snps,dw-apb-uart
unattached on simplebus pnpinfo name=i2c@1c2ac00 compat=allwinner,sun6i-a31-i2c
unattached on simplebus pnpinfo name=i2c@1c2b000 compat=allwinner,sun6i-a31-i2c
unattached on simplebus pnpinfo name=i2c@1c2b400 compat=allwinner,sun6i-a31-i2c
unattached on ofwbus pnpinfo name=aliases
unattached on ofwbus pnpinfo name=symbols
All simplebus node are disabled
After :
root@pine64-lts:~ # devmatch -u
unattached on ofwbus pnpinfo name=memory
unattached on ofwbus pnpinfo name=chosen
unattached on ofwbus pnpinfo name=sound_spdif compat=simple-audio-card
unattached on ofwbus pnpinfo name=spdif-out compat=linux,spdif-dit
unattached on simplebus pnpinfo name=dma-controller@1c02000 compat=allwinner,sun50i-a64-dma
unattached on simplebus pnpinfo name=usb@1c19000 compat=allwinner,sun8i-a33-musb
unattached on ofwbus pnpinfo name=aliases
unattached on ofwbus pnpinfo name=symbols
Reviewed by: imp (with some objection)
Differential Revision: https://reviews.freebsd.org/D15770
There is a type promotion that transform count = -1 into a unsigned int causing
the default TCE SEG SIZE not being returned on a Boston POWER9 machine.
This machine does not have the 'ibm,supported-tce-sizes' entries, thus, count
is set to -1, and the function continue to execute instead of returning.
Reviewed by: jhibbits, wma
Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D15763
This code merge adds a pNFS service to the NFSv4.1 server. Although it is
a large commit it should not affect behaviour for a non-pNFS NFS server.
Some documentation on how this works can be found at:
http://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt
and will hopefully be turned into a proper document soon.
This is a merge of the kernel code. Userland and man page changes will
come soon, once the dust settles on this merge.
It has passed a "make universe", so I hope it will not cause build problems.
It also adds NFSv4.1 server support for the "current stateid".
Here is a brief overview of the pNFS service:
A pNFS service separates the Read/Write oeprations from all the other NFSv4.1
Metadata operations. It is hoped that this separation allows a pNFS service
to be configured that exceeds the limits of a single NFS server for either
storage capacity and/or I/O bandwidth.
It is possible to configure mirroring within the data servers (DSs) so that
the data storage file for an MDS file will be mirrored on two or more of
the DSs.
When this is used, failure of a DS will not stop the pNFS service and a
failed DS can be recovered once repaired while the pNFS service continues
to operate. Although two way mirroring would be the norm, it is possible
to set a mirroring level of up to four or the number of DSs, whichever is
less.
The Metadata server will always be a single point of failure,
just as a single NFS server is.
A Plan B pNFS service consists of a single MetaData Server (MDS) and K
Data Servers (DS), all of which are recent FreeBSD systems.
Clients will mount the MDS as they would a single NFS server.
When files are created, the MDS creates a file tree identical to what a
single NFS server creates, except that all the regular (VREG) files will
be empty. As such, if you look at the exported tree on the MDS directly
on the MDS server (not via an NFS mount), the files will all be of size 0.
Each of these files will also have two extended attributes in the system
attribute name space:
pnfsd.dsfile - This extended attrbute stores the information that
the MDS needs to find the data storage file(s) on DS(s) for this file.
pnfsd.dsattr - This extended attribute stores the Size, AccessTime, ModifyTime
and Change attributes for the file, so that the MDS doesn't need to
acquire the attributes from the DS for every Getattr operation.
For each regular (VREG) file, the MDS creates a data storage file on one
(or more if mirroring is enabled) of the DSs in one of the "dsNN"
subdirectories. The name of this file is the file handle
of the file on the MDS in hexadecimal so that the name is unique.
The DSs use subdirectories named "ds0" to "dsN" so that no one directory
gets too large. The value of "N" is set via the sysctl vfs.nfsd.dsdirsize
on the MDS, with the default being 20.
For production servers that will store a lot of files, this value should
probably be much larger.
It can be increased when the "nfsd" daemon is not running on the MDS,
once the "dsK" directories are created.
For pNFS aware NFSv4.1 clients, the FreeBSD server will return two pieces
of information to the client that allows it to do I/O directly to the DS.
DeviceInfo - This is relatively static information that defines what a DS
is. The critical bits of information returned by the FreeBSD
server is the IP address of the DS and, for the Flexible
File layout, that NFSv4.1 is to be used and that it is
"tightly coupled".
There is a "deviceid" which identifies the DeviceInfo.
Layout - This is per file and can be recalled by the server when it
is no longer valid. For the FreeBSD server, there is support
for two types of layout, call File and Flexible File layout.
Both allow the client to do I/O on the DS via NFSv4.1 I/O
operations. The Flexible File layout is a more recent variant
that allows specification of mirrors, where the client is
expected to do writes to all mirrors to maintain them in a
consistent state. The Flexible File layout also allows the
client to report I/O errors for a DS back to the MDS.
The Flexible File layout supports two variants referred to as
"tightly coupled" vs "loosely coupled". The FreeBSD server always
uses the "tightly coupled" variant where the client uses the
same credentials to do I/O on the DS as it would on the MDS.
For the "loosely coupled" variant, the layout specifies a
synthetic user/group that the client uses to do I/O on the DS.
The FreeBSD server does not do striping and always returns
layouts for the entire file. The critical information in a layout
is Read vs Read/Writea and DeviceID(s) that identify which
DS(s) the data is stored on.
At this time, the MDS generates File Layout layouts to NFSv4.1 clients
that know how to do pNFS for the non-mirrored DS case unless the sysctl
vfs.nfsd.default_flexfile is set non-zero, in which case Flexible File
layouts are generated.
The mirrored DS configuration always generates Flexible File layouts.
For NFS clients that do not support NFSv4.1 pNFS, all I/O operations
are done against the MDS which acts as a proxy for the appropriate DS(s).
When the MDS receives an I/O RPC, it will do the RPC on the DS as a proxy.
If the DS is on the same machine, the MDS/DS will do the RPC on the DS as
a proxy and so on, until the machine runs out of some resource, such as
session slots or mbufs.
As such, DSs must be separate systems from the MDS.
Tested by: james.rose@framestore.com
Relnotes: yes
- fix debugging for family on AMD cpus and add useful debugging for
which file is being selected for update.
Reviewed by: cem
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15574
an IPI.
This does not work however yet in QEMU. As a temporary workaround set
software interrupt pending bit manually on a local core to ensure WFI
doesn't halt the hart.
This is required to smpboot in QEMU.
Sponsored by: DARPA, AFRL
the system to make use of USB device mode / USB OTG to provide a "virtual
serial port" on release images.
Reviewed by: gjb@
MFC after: 2 weeks
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D15602