Remove if_ar, if_ray, if_sr, if_ppp, if_sl to reflect the current modules
available, they were removed due to NEEDSGIANT.
While I'm there, add if_et which was missed quite a while ago.
Add support for SPARC64 V (and where it already makes sense for other
HAL/Fujitsu) CPUs. For the most part this consists of fleshing out the
MMU and cache handling, it doesn't add pmap optimizations possible with
these CPU, yet, though.
With these changes FreeBSD runs stable on Fujitsu Siemens PRIMEPOWER 250
and likely also other models based on SPARC64 V like 450, 650 and 850.
Thanks go to Michael Moll for providing access to a PRIMEPOWER 250.
Add driver for Silicon Integrated Systems SiS190/191 Fast/Gigabit Ethernet.
This driver was written by Alexander Pohoyda and greatly enhanced
by Nikolay Denev. I don't have these hardwares but this driver was
tested by Nikolay Denev and xclin.
Because SiS didn't release data sheet for this controller, programming
information came from Linux driver and OpenSolaris. Unlike other open
source driver for SiS190/191, sge(4) takes full advantage of TX/RX
checksum offloading and does not require additional copy operation in
RX handler.
The controller seems to have advanced offloading features like VLAN
hardware tag insertion/stripping, TCP segmentation offload(TSO) as
well as jumbo frame support but these features are not available
yet. Special thanks to xclin <xclin<> cs dot nctu dot edu dot tw>
who sent fix for receiving VLAN oversized frames.
r205134,r205231,r205253,r205264,r205346,r206051,r206667,r206792,r206793,
r206794,r206795,r206796,r206797:
r203504:
Open provider for writting when we find the right one. Opening too much
providers for writing provokes huge traffic related to taste events send
by GEOM on close. This can lead to various problems with opening GEOM
providers that are created on top of other GEOM providers.
Reorted by: Kurt Touet <ktouet@gmail.com>, mr
Tested by: mr, Baginski Darren <kickbsd@ya.ru>
r204067:
Update comment. We also look for GPT partitions.
r204073:
Add tunable and sysctl to skip hostid check on pool import.
r204101:
Don't set f_bsize to recordsize. It might confuse some software (like squid).
Submitted by: Alexander Zagrebin <alexz@visp.ru>
r204804:
Remove racy assertion.
Reported by: Attila Nagy <bra@fsn.hu>
Obtained from: OpenSolaris, Bug ID 6827260
r205079:
Remove bogus assertion.
Reported by: Johan Ström <johan@stromnet.se>
Obtained from: OpenSolaris, Bug ID 6920880
r205080:
Force commit to correct Bug ID:
Obtained from: OpenSolaris, Bug ID 6920880
r205132:
Don't bottleneck on acquiring the stream locks - this avoids a massive
drop off in throughput with large numbers of simultaneous reads
r205133:
fix compilation under ZIO_USE_UMA
r205134:
make UMA the default allocator for ZFS buffers - this avoids
a great deal of contention in kmem_alloc
r205231:
- reduce contention by breaking up ARC state locks in to 16 for data
and 16 for metadata
- export L2ARC tunables as sysctls
- add several kstats to track L2ARC state more precisely
- avoid holding a contended lock when atomically incrementing a
contended counter (no lock protection needed for atomics)
r205253:
use CACHE_LINE_SIZE instead of hardcoding 128 for lock pad
pointed out by Marius Nuennerich and jhb@
r205264:
- cache line align arcs_lock array (h/t Marius Nuennerich)
- fix ARCS_LOCK_PAD to use architecture defined CACHE_LINE_SIZE
- cache line align buf_hash_table ht_locks array
r205346:
The same code is used to import and to create pool.
The order of operations is the following:
1. Try to open vdev by remembered path and guid.
2. If 1 failed, try to find vdev which guid matches and ignore the path.
3. If 2 failed this means either that the vdev we're looking for is gone
or that pool is being created and vdev doesn't contain proper guid yet.
To be able to handle pool creation we open vdev by path anyway.
Because of 3 it is possible that we open wrong vdev on import which can lead to
confusions.
The solution for this is to check spa_load_state. On pool creation it will be
equal to SPA_LOAD_NONE and we can open vdev only by path immediately and if it
is not equal to SPA_LOAD_NONE we first open by path+guid and when that fails,
we open by guid. We no longer open wrong vdev on import.
r206051:
IOCPARM_MAX defines maximum size of a structure that can be passed
directly to ioctl(2). Because of how ioctl command is build using _IO*()
macros we have only 13 bits to encode structure size. So the structure
can be up to 8kB-1.
Currently we define IOCPARM_MAX as PAGE_SIZE.
This is IMHO wrong for three main reasons:
1. It is confusing on archs with page size larger than 8kB (not really
sure if we support such archs (sparc64?)), as even if PAGE_SIZE is
bigger than 8kB, we won't be able to encode anything larger in ioctl
command.
2. It is a waste. Why the structure can be only 4kB on most archs if we
have 13 bits dedicated for that, not 12?
3. It shouldn't depend on architecture and page size. My ioctl command
can work on one arch, but can't on the other?
Increase IOCPARM_MAX to 8kB and make it independed of PAGE_SIZE and
architecture it is compiled for. This allows to use all the bits on all the
archs for size. Note that this doesn't mean we will copy more on every ioctl(2)
call. No. We still copyin(9)/copyout(9) only exact number of bytes encoded in
ioctl command.
Practical use for this change is ZFS. zfs_cmd_t structure used for ZFS
ioctls is larger than 4kB.
Silence on: arch@
r206667:
Fix 3-way deadlock that can happen because of ZFS and vnode lock
order reversal.
thread0 (vfs_fhtovp) thread1 (vop_getattr) thread2 (zfs_recv)
-------------------- --------------------- ------------------
vn_lock
rrw_enter_read
rrw_enter_write (hangs)
rrw_enter_read (hangs)
vn_lock (hangs)
Reported by: Attila Nagy <bra@fsn.hu>
r206792:
Set ARC_L2_WRITING on L2ARC header creation.
Obtained from: OpenSolaris
r206793:
Remove racy assertion.
Obtained from: OpenSolaris
r206794:
Extend locks scope to match OpenSolaris.
r206795:
Add missing list and lock destruction.
r206796:
Style fixes.
r206797:
Restore previous order.
Some machines can not only consist of CPUs running at different speeds
but also of different types, f.e. Sun Fire V890 can be equipped with a
mix of UltraSPARC IV and IV+ CPUs, requiring different MMU initialization
and different workarounds for model specific errata. Therefore move the
CPU implementation number from a global variable to the per-CPU data.
Functions which are called before the latter is available are passed the
implementation number as a parameter now.
Use the SUNW,{d,i}tlb-load methods for entering locked TLB entries like
OpenBSD and OpenSolaris do instead of fiddling with the MMUs ourselves.
Unlike direct access the firmware methods don't automatically use the
next free (?) TLB slot, instead the slot to be used has to be specified.
We allocate the TLB slots for the kernel top-down as OpenSolaris suggests
that the firmware will always allocate the ones for its own use bottom-up.
Besides being simpler, according to OpenBSD using the firmware methods is
required to allow booting on Sun Fire E10K with multi-systemboard domains.
- Assert that HEAPSZ is a multiple of PAGE_SIZE as at least the firmware
of Sun Fire V1280 doesn't round up the size itself but instead lets
claiming of non page-sized amounts of memory fail.
- Change parameters and variables related to the TLB slots to unsigned
which is more appropriate.
- Search the whole OFW device tree instead of only the children of the
root nexus device for the BSP as starting with UltraSPARC IV the 'cpu'
nodes hang off of from 'cmp' (chip multi-threading processor) or 'core'
or combinations thereof. Also in large UltraSPARC III based machines
the 'cpu' nodes hang off of 'ssm' (scalable shared memory) nodes which
group snooping-coherency domains together instead of directly from the
nexus.
- Add support for UltraSPARC IV and IV+ BSPs. Due to the fact that these
are multi-core each CPU has two Fireplane config registers and thus the
module/target ID has to be determined differently so the one specific
to a certain core is used. Similarly, starting with UltraSPARC IV the
individual cores use a different property in the OFW device tree to
indicate the CPU/core ID as it no longer is in coincidence with the
shared slot/socket ID.
While at it additionally distinguish between CPUs with Fireplane and
JBus interconnects as these also use slightly different sizes for the
JBus/agent/module/target IDs.
- Check the return value of init_heap(). This requires moving it after
cons_probe() so we can panic when appropriate. This should be fine as
the PowerPC OFW loader uses that order for quite some time now.
Enable NETIF_OPEN_CLOSE_ONCE on PowerPC OFW. This fixes netbooting on
PowerPC Book-S hardware, which had been broken for a very long time.
Submitted by: Andreas Tobler
Replace the static NGROUPS=NGROUPS_MAX+1=1024 with a dynamic
kern.ngroups+1. kern.ngroups can range from NGROUPS_MAX=1023 to
somewhere in the neighborhood of INT_MAX/4 one a system with sufficent
RAM and memory bandwidth. Given that the Windows group limit is
1024, this range should be sufficient for most applications
r202342:
Only allocate the space we need before calling kern_getgroups instead
of allocating what ever the user asks for up to "ngroups_max + 1". On
systems with large values of kern.ngroups this will be more efficient.
The now redundant check that the array is large enough in
kern_getgroups() is deliberate to allow this change to be merged to
stable/8 without breaking potential third party consumers of the API.
(S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument.
Fix some wrong usages.
Note: this does not affect generated binaries as this argument is not used.
PR: 137213
Submitted by: Eygene Ryabinkin (initial version)
Space cleanup for revision 202669 committed separately for easier review.
This commit is purely space changes.
Submitted by: Matt Reimer
Sponsored by: VPOP Technologies, Inc.
Instead of assuming all vdevs are healthy, check the newest vdev label
for each vdev's status. Booting from a degraded vdev should now be
more robust.
Submitted by: Matt Reimer <mattjreimer at gmail.com>
Sponsored by: VPOP Technologies, Inc.
- Add code allowing a network device to only be open and closed once
by keeping it opened after the first open and closing it via the
cleanup handler when NETIF_OPEN_CLOSE_ONCE is defined in order to
avoid the open-close-dance on every file access which with firmware
that for example performs an auto-negotiation on every open causes
netbooting to take horribly long. Basically the behavior with this
knob enabled resembles the one employed between r60506 and r177108
(and for sparc64 also again since r182919) with the addition that
the network device now is closed eventually before entering the
kernel and before rebooting. Actually I think this should be the
desired MI behavior, however the U-Boot loader actually requires
net_close() to be called after every transaction in order for some
local shutdown operations to be performed (and which I think thus
will break on concurrent opens, i.e. when netdev_opens is > 1, like
the loader does at least for disks when LOADER_GZIP_SUPPORT is
enabled).
- Use NETIF_OPEN_CLOSE_ONCE to replace the hack, which artificially
increased netdev_opens for sparc64 in order to keep the network
device opened forever, as at least some firmware versions require
the network device to be closed eventually before entering the
kernel or otherwise will DMA received packets to stale memory.
The powerpc OFW loader probably wants NETIF_OPEN_CLOSE_ONCE to be
set as well for the same reasons.
Reimplement the boot2 for pc98 completely.
It's based on the newest i386's one and has the advantage of:
- ELF binary support.
- UFS2 filesystem support.
- Many FreeBSD slices support on a disk.
Revert r183628 as with the current ata(4) ATAPI DMA with AcerLabs
M5229 appears to be once again fixed. If this happens to return
we probably should disable ATAPI DMA in ataacerlabs(4) instead
just like the Linux libATA does.
Don't warn about an RSDP with a corrupt checksum. The kernel does a better
job about warning about these things later and this message can be
confusing.
Fix a confusing typo in the EDD packet structure used in gptboot and
gptzfsboot. I got the segment and offset fields reversed in the structure,
but I also succeeded in crossing the assignments so the actual EDD packet
ended up correct.
- Port bios_getmem() from libi386 to {gpt,}zfsboot() and use it to
safely allocate a heap region above 1MB. This enables {gpt,}zfsboot()
to allocate much larger buffers than before.
- Use a larger buffer (1MB instead of 128K) for temporary ZFS buffers. This
allows more reliable reading of compressed files in a raidz/raidz2 pool.
- Various small whitespace and style fixes.
- Improve the algorithm the loader uses to choose a memory range for its
heap when using a range above 1MB.
Create a seperate ZFS enabled loader.
This adds zfsloader which will be called by zfsboot/gptzfsboot code rather
than the tradional loader. This eliminates the need to set the
LOADER_ZFS_SUPPORT variable in order to get a ZFS enabled loader.
Note however, that you must reinstall your bootcode (zfsboot/gptzfsboot)
in order for the boot process to use the new loader.
New installations will no longer be required to build a ZFS enabled
loader for a working ZFS boot system. Installing zfsboot/gptzfsboot is
sufficient for acknowledging the use of CDDL code and therefore the ZFS
enabled loader.
lindev(4) [1] is supposed to be a collection of linux-specific pseudo
devices that we also support, just not by default (thus only LINT or
module builds by default).
While currently there is only "/dev/full" [2], we are planning to see more
in the future. We may decide to change the module/dependency logic in the
future should the list grow too long.
This is not part of linux.ko as also non-linux binaries like kFreeBSD
userland or ports can make use of this as well.
Suggested by: rwatson [1] (name)
Submitted by: ed [2]
Discussed with: markm, ed, rwatson, kib (weeks ago)
Reviewed by: rwatson, brueffer (prev. version)
PR: kern/68961
Provide an effective (relocated) address when building modules metadata.
This lets modules loaded dynamically in loader(8) work for U-Boot-based
platforms.
Introduce the new loader compile-time option BOOT_PROMPT_123 which allows
to enter the loader prompt just after entering the sequence "123".
Sponsored by: Sandvine Incorporated
Correct some issues with zfs boot.
- Teach it to read gang blocks. (essentially untested)
If you see "ZFS: gang block detected!", please let
me know, so we can either remove the printf if it
works, or fix it if it doesn't.
- If multiple partitions exist on a disk, probe them all.
We also need to reset dsk->start to 0 to read the right
sector here.
- With GPT, we can have 128 partitions.
- If the bootfs property has ever been set on a pool
it seems that it never goes away. zpool won't allow
you to add to the pool with the bootfs property set.
However, if you clear the property back to default
we end up getting 0 for the object number and read
a bogus block pointer and fail to boot.
- Fix some error printfs. The printf in the loader is
only capable of c,s and u formats.
- Teach printf how to display %llu
If the pxe client is told to use / as the root path, honour that rather
of trying to mount /pxeroot instead.
PR: i386/106493
Submitted by: Andrey Russev
Use zfs_read() instead of xfsread() to read /boot.config. xfsread() fails
short read requests, so the result was that a /boot.config smaller than 512
bytes was ignored. boot2 uses fsread() instead of xfsread() to read
/boot.config already, so this makes zfsboot more like boot2.
Approved by: re (kib)
Fix parse() so that the partition to boot (load /boot/loader) from can
be set. The syntax as printed in main() is used: 0:ad(0p3)/boot/loader
Reviewed by: jhb
Approved by: re (kib)
things a bit:
- use dpcpu data to track the ifps with packets queued up,
- per-cpu locking and driver flags
- along with .nh_drainedcpu and NETISR_POLICY_CPU.
- Put the mbufs in flight reference count, preventing interfaces
from going away, under INVARIANTS as this is a general problem
of the stack and should be solved in if.c/netisr but still good
to verify the internal queuing logic.
- Permit changing the MTU to virtually everythinkg like we do for loopback.
Hook epair(4) up to the build.
Approved by: re (kib)
slicei, Apple EFI hardware), the bootloader will fail to recognize the GPT
if it finds anything else but the EFI partition. Change the check to continue
detecting the GPT by looking at the EFI partition on the MBR but
stopping successfuly after finding it.
PR: kern/134590
Submitted by: Christoph Langguth <christoph at rosenkeller.org>
Reviewed by: jhb
MFC after: 2 weeks
Approved by: re (kib)
DP83065 Saturn Gigabit Ethernet controllers. These are the successors
of the Sun GEM controllers and still have a similar but extended transmit
logic. As such this driver is based on gem(4).
Thanks to marcel@ for providing a Sun Quad GigaSwift Ethernet UTP (QGE)
card which was vital for getting this driver to work on architectures
not using Open Firmware.
Approved by: re (kib)
MFC after: 2 weeks