not be used outside of the reassembly queue implementation. Provide a new
function to flush all segments from a reassembly queue and call it from the
appropriate places instead of manipulating the queue directly.
Sponsored by: FreeBSD Foundation
Reviewed by: andre, gnn, rpaulo
MFC after: 2 weeks
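The idea of routing all queue teardown through a single flush function can be sketched roughly as follows. This is only an illustration: `struct seg`, `struct reass_queue`, `reass_insert()` and `reass_flush()` are hypothetical names, not the ones used in the committed code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a queued segment and its queue head. */
struct seg {
	struct seg *next;
	int len;
};

struct reass_queue {
	struct seg *head;
	int count;
};

/* Push one segment onto the front; 0 on success, -1 if allocation fails. */
static int
reass_insert(struct reass_queue *q, int len)
{
	struct seg *s;

	assert(q != NULL);
	s = malloc(sizeof(*s));
	if (s == NULL)
		return (-1);
	s->len = len;
	s->next = q->head;
	q->head = s;
	q->count++;
	return (0);
}

/* Free every queued segment and leave the queue empty. */
static void
reass_flush(struct reass_queue *q)
{
	struct seg *s;

	assert(q != NULL);
	while ((s = q->head) != NULL) {
		q->head = s->next;
		free(s);
		q->count--;
	}
}
```

Callers then invoke `reass_flush()` instead of walking the list themselves, which keeps the queue layout private to its implementation.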
these could be made dependent on either of the octusb or octe options, but
making them standard fixes a number of option combinations that were previously
broken.
clean up most layering violations:
sys/boot/i386/common/rbx.h:
RBX_* defines
OPT_SET()
OPT_CHECK()
sys/boot/common/util.[ch]:
memcpy()
memset()
memcmp()
bcpy()
bzero()
bcmp()
strcmp()
strncmp() [new]
strcpy()
strcat()
strchr()
strlen()
printf()
sys/boot/i386/common/cons.[ch]:
ioctrl
putc()
xputc()
putchar()
getc()
xgetc()
keyhit() [now takes number of seconds as an argument]
getstr()
sys/boot/i386/common/drv.[ch]:
struct dsk
drvread()
drvwrite() [new]
drvsize() [new]
sys/boot/common/crc32.[ch] [new]
sys/boot/common/gpt.[ch] [new]
- Teach gptboot and gptzfsboot about new files. I haven't touched the
rest, but there is still a lot of code duplication to be removed.
- Implement full GPT support. Currently we just read primary header and
partition table and don't care about checksums, etc. After this change we
verify checksums of primary header and primary partition table and if
there is a problem we fall back to backup header and backup partition
table.
- Clean up most messages to use prefix of boot program, so in case of an
error we know where the error comes from, e.g.:
gptboot: unable to read primary GPT header
- If we can't boot, print boot prompt only once and not every five
seconds.
- Honour newly added GPT attributes:
bootme - this is bootable partition
bootonce - try to boot from this partition only once
bootfailed - we failed to boot from this partition
- Change boot order of gptboot to the following:
1. Try to boot from all the partitions that have both 'bootme'
and 'bootonce' attributes one by one.
2. Try to boot from all the partitions that have only 'bootme'
attribute one by one.
3. If there are no partitions with 'bootme' attribute, boot from
the first UFS partition.
- The 'bootonce' functionality is implemented in the following way:
1. Walk through all the partitions and when 'bootonce'
attribute is found without 'bootme' attribute, remove
'bootonce' attribute and set 'bootfailed' attribute.
'bootonce' attribute alone means that we tried to boot from
this partition, but boot failed after leaving gptboot and
machine was restarted.
2. Find partition with both 'bootme' and 'bootonce' attributes.
3. Remove 'bootme' attribute.
4. Try to execute /boot/loader or /boot/kernel/kernel from that
partition. If this succeeds, we stop here.
5. If execution failed, remove 'bootonce' and set 'bootfailed'.
6. Go to 2.
If the whole boot succeeds, a new /etc/rc.d/gptboot script (coming soon)
will log all partitions that we failed to boot from (the ones with the
'bootfailed' attribute) and will remove this attribute. It will also
find the partition with the 'bootonce' attribute; this is the partition we
booted from successfully. The script will log success and remove the
attribute.
All the GPT updates we do here go to both the primary and the backup GPT if
they are valid. We don't touch headers or partition tables when the
checksum doesn't match.
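The verify-then-fall-back logic described above might look roughly like the following sketch. It assumes a heavily simplified, hypothetical header (`struct hdr`) rather than the real GPT header layout; only the CRC-32 routine (reflected polynomial 0xEDB88320) and the convention of computing the header checksum with the checksum field zeroed follow the GPT format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 with the reflected polynomial 0xEDB88320. */
static uint32_t
crc32_buf(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t crc = 0xFFFFFFFFu;

	assert(buf != NULL || len == 0);
	while (len-- > 0) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return (~crc);
}

/* Hypothetical, simplified header: a self-checksum plus some payload. */
struct hdr {
	uint32_t crc_self;
	uint32_t payload[4];
};

/* Compute the checksum with the crc_self field treated as zero. */
static uint32_t
hdr_crc(const struct hdr *h)
{
	struct hdr tmp = *h;

	tmp.crc_self = 0;
	return (crc32_buf(&tmp, sizeof(tmp)));
}

/* Prefer the primary copy, fall back to the backup, NULL if both are bad. */
static const struct hdr *
hdr_pick(const struct hdr *primary, const struct hdr *backup)
{
	if (hdr_crc(primary) == primary->crc_self)
		return (primary);
	if (hdr_crc(backup) == backup->crc_self)
		return (backup);
	return (NULL);
}
```

A caller that gets NULL back has no trustworthy copy and, in the spirit of the text above, should leave both copies untouched.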
Reviewed by: arch (Message-ID: <20100917234542.GE1902@garage.freebsd.pl>)
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
MFC after: 2 weeks
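The 'bootonce' steps listed in the entry above can be sketched as a few operations on an array of per-partition attribute words. The attribute names and bit positions are the ones given in this log; the helper functions (`mark_stale_attempts()`, `next_bootonce()`, `mark_failed()`) are hypothetical and leave out all on-disk I/O.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	GPT_ENT_ATTR_BOOTME	(1ULL << 59)
#define	GPT_ENT_ATTR_BOOTONCE	(1ULL << 58)
#define	GPT_ENT_ATTR_BOOTFAILED	(1ULL << 57)

/* Step 1: 'bootonce' without 'bootme' marks an attempt that never finished. */
static void
mark_stale_attempts(uint64_t *attr, size_t n)
{
	assert(attr != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if ((attr[i] & GPT_ENT_ATTR_BOOTONCE) != 0 &&
		    (attr[i] & GPT_ENT_ATTR_BOOTME) == 0) {
			attr[i] &= ~GPT_ENT_ATTR_BOOTONCE;
			attr[i] |= GPT_ENT_ATTR_BOOTFAILED;
		}
	}
}

/* Steps 2-3: find a 'bootme'+'bootonce' partition and drop 'bootme'. */
static int
next_bootonce(uint64_t *attr, size_t n)
{
	const uint64_t both = GPT_ENT_ATTR_BOOTME | GPT_ENT_ATTR_BOOTONCE;

	assert(attr != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if ((attr[i] & both) == both) {
			attr[i] &= ~GPT_ENT_ATTR_BOOTME;
			return ((int)i);
		}
	}
	return (-1);	/* no candidate left */
}

/* Step 5: the attempt failed before control left the boot program. */
static void
mark_failed(uint64_t *attr)
{
	assert(attr != NULL);
	*attr &= ~GPT_ENT_ATTR_BOOTONCE;
	*attr |= GPT_ENT_ATTR_BOOTFAILED;
}
```

A boot loop would call `mark_stale_attempts()` once, then alternate `next_bootonce()` and, on failure, `mark_failed()` until `next_bootonce()` returns -1.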
attribute (it should be allowed only to unset it), but for test purposes it
might be useful, so the current code allows it.
Reviewed by: arch@ (Message-ID: <20100917234542.GE1902@garage.freebsd.pl>)
MFC after: 2 weeks
GPT_ENT_ATTR_BOOTME - this is bootable partition
GPT_ENT_ATTR_BOOTONCE - try to boot only once from this partition
GPT_ENT_ATTR_BOOTFAILED - set this flag if we cannot boot from partition
containing GPT_ENT_ATTR_BOOTONCE flag; note that if we cannot
boot from partition that contains only GPT_ENT_ATTR_BOOTME flag,
the GPT_ENT_ATTR_BOOTFAILED flag won't be set
According to Wikipedia, Microsoft TechNet says that the attributes are divided
into two halves: the lower 4 bytes represent partition-independent attributes,
and the upper 4 bytes are partition-type dependent. Microsoft is already using
bits 60 (read-only), 62 (hidden) and 63 (do not automount) and I'd like not to
collide with those, so we are using bits 59 (bootme), 58 (bootonce) and 57
(bootfailed).
Reviewed by: arch (Message-ID: <20100917234542.GE1902@garage.freebsd.pl>)
MFC after: 2 weeks
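The bit assignment from the entry above can be written down as follows. The three GPT_ENT_ATTR_* names and bit numbers come from the text; the `MS_ATTR_*` names and the `gpt_attr_isset()` helper are hypothetical and exist only to make the non-collision visible.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as stated in the text above. */
#define	GPT_ENT_ATTR_BOOTME	(1ULL << 59)
#define	GPT_ENT_ATTR_BOOTONCE	(1ULL << 58)
#define	GPT_ENT_ATTR_BOOTFAILED	(1ULL << 57)

/* Hypothetical names for the Microsoft bits mentioned above. */
#define	MS_ATTR_READONLY	(1ULL << 60)
#define	MS_ATTR_HIDDEN		(1ULL << 62)
#define	MS_ATTR_NOAUTOMOUNT	(1ULL << 63)

/* Return nonzero if every bit of 'flag' is set in 'attr'. */
static int
gpt_attr_isset(uint64_t attr, uint64_t flag)
{
	assert(flag != 0);
	return ((attr & flag) == flag);
}
```

All three new flags sit in the upper 32 bits, the half described above as partition-type dependent.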
in the kernel (just as inet_ntoa() and inet_aton() are) and sync their
prototypes with those of the already mentioned functions.
Sponsored by: Sandvine Incorporated
Reviewed by: emaste, rstone
Approved by: dfr
MFC after: 2 weeks
driver to try to switch interrupt handlers at setup. It's not a very
good implementation of bus_teardown_intr, though.
o) Set cache line size and latency timers for PCI devices per Linux.
o) Reset and configure the bus from scratch rather than expecting U-Boot to
do it for us. Values and configuration from Linux, U-Boot and comments
in the Cavium Simple Executive sources.
o) Do a resource assignment and bus numbering pass in the absence of a PCI
BIOS or firmware that will do it for us.
XXX This has to be the third or fourth instance of this in FreeBSD and
it would be nice to have it become part of the PCI bus driver itself,
like it is on Linux.
o) Fix interrupt mapping for and adjust bus configuration for the Lanner
MR-955, based on information provided by Lanner.
too many bge(4) controllers there and model name does not
necessarily match asic/chip revision. Relying on VPD string made
it hard to identify exact asic/chip revision so the first step to
debug bge(4) was getting exact asic/chip information with verbose
boot which may not be available on production server.
or some variation in the path, the new version assumes that $0 is
newvers.sh path, and that dirname $0/.. is the same as $S aka $SYSDIR.
It also removes knowledge of ${MACHINE} and ${MACHINE_ARCH}, which is
also good.
# I've had this in my tree for about 6 months now, which is why I
# didn't notice that I broke it in r209510 and that was fixed in
# r212954. This should finally resolve the issues people had with
# r204824 as well as address the issues that motivated r204824.
driver-maintained ifnet fields (such as if_drv_flags).
- Use the softc lock as the mutex that protects each interface's knote list
rather than using the global knote list lock. Also, use the softc
for kn_hook instead of the cdev.
- Use mtx_sleep() instead of tsleep() when blocking in the read routines.
This fixes a lost wakeup race.
- Remove D_NEEDGIANT now that the cdevsw routines use the softc lock
where locking is needed.
- Lock IFQ when calculating the result for FIONREAD in tap(4). tun(4)
already did this.
- Remove remaining spl calls.
Submitted by: Marcin Cieslak saper of saper|info (3)
MFC after: 2 weeks
the location, apply elf_relocaddr to the symbol value to have right
values for the symbols from dpcpu segment.
PR: kern/147769
Discussed with: avg
Tested by: marius
MFC after: 2 weeks
This is a followup to r212964.
The stack_print call chain obtains the linker sx lock and thus may
lead to a deadlock, depending on the kind of panic.
stack_print_ddb doesn't acquire any locks and it doesn't use any
facilities of ddb backend.
Using stack_print_ddb outside of DDB ifdef required taking a number of
helper functions from under it as well.
It is a good idea to rename linker_ddb_* and stack_*_ddb functions to
have 'unlocked' component in their name instead of 'ddb', because those
functions do not use any DDB services, but instead they provide unlocked
access to linker symbol information. The latter was previously needed
only for DDB, hence the 'ddb' name component.
An alternative is to ditch the unlocked versions altogether after
implementing proper panic handling:
1. stop other cpus upon a panic
2. make all non-spinlock lock operations (mutex, sx, rwlock) be a no-op
when panicstr != NULL
Suggested by: mdf
Discussed with: attilio
MFC after: 2 weeks
address spaces
There has been no need to do that starting with ACPICA 20040427 as
AcpiEnableSubsystem() installs the handlers automatically.
Additionally, explicitly calling AcpiInstallAddressSpaceHandler before
AcpiEnableSubsystem is not supported by ACPICA and leads to too early
execution of _REG methods in some DSDTs, which may result in problems.
Big thanks to Robert Moore of ACPICA/Intel for explaining the above.
Reported by: Daniel Bilik <daniel.bilik@neosystem.cz>
Tested by: Daniel Bilik <daniel.bilik@neosystem.cz>
Reviewed by: jkim
Suggested by: "Moore, Robert" <robert.moore@intel.com>
MFC after: 1 week
and sys/boot/pc98/boot2, do not simply assign 'gcc' to CC, since compile
flags are sometimes passed via this variable, for example during the
build32 stage on amd64. This caused the 32-bit libobjc build on amd64
to fail.
Instead, only replace the first instance of clang (if any, including
optional path) with gcc, and leave the arguments alone.
Approved-by: rpaulo (mentor)
StarFire controller does not require controller reinitialization to
program perfect filters. While here, make driver immediately exit
from interrupt/polling handler if driver reinitialized controller.
PR: kern/87506
If timer capabilities force us to change the periodicity mode, try to restore
it later, as soon as the newly chosen timer is capable of it. Without this,
a timer change like HPET->RTC->HPET always results in enabling periodic mode.
code:
- Accept devfs_mount and devfs_dirent as the arguments instead of a
vnode. This generalizes the function so that it can be used from
contexts where vnode references are not available.
- Accept NULL cnp argument. No '/' will be appended, if a NULL cnp is
provided.
- Make the function global and add its prototype to devfs.h.
Reviewed by: kib
the first line of a script exceeded MAXSHELLCMDLEN characters, then
exec_imgact_shell() silently truncated the line and passed on the truncated
interpreter name or argument. Now, exec_imgact_shell() will fail and return
ENOEXEC, which is the commonly used errno among Unix variants for this type
of error. (2) Previously, exec_imgact_shell()'s check on the length of the
interpreter's name was ineffective. In other words, exec_imgact_shell()
could not possibly fail and return ENAMETOOLONG. The reason is that the
length of the interpreter name had to exceed MAXSHELLCMDLEN characters for
ENAMETOOLONG to be returned, but the search for the end of the
interpreter name stops after at most MAXSHELLCMDLEN - 2 characters are
scanned. (In the end, this particular error is eventually discovered
outside of exec_imgact_shell() and ENAMETOOLONG is returned. So, the real
effect of this second change is that the error is detected earlier, in
exec_imgact_shell().)
Update the definition of MAXINTERP to the actual limit on the size of
the interpreter name that has been in effect since r142453 (from
2005).
In collaboration with: kib
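The "fail instead of truncating" behaviour described in the entry above can be illustrated with a small userland sketch. It is not the kernel code: `parse_shebang()`, `SKETCH_MAXLINE` and `SKETCH_MAXINTERP` are hypothetical stand-ins (with deliberately tiny limits) for exec_imgact_shell(), MAXSHELLCMDLEN and MAXINTERP.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define	SKETCH_MAXLINE		32	/* hypothetical first-line limit */
#define	SKETCH_MAXINTERP	16	/* hypothetical interpreter-name limit */

/*
 * Extract the interpreter name from a "#!" line held in buf[0..len).
 * Returns 0 on success, ENOEXEC if the input is not a "#!" line that ends
 * within SKETCH_MAXLINE bytes, or ENAMETOOLONG if the name is too long.
 */
static int
parse_shebang(const char *buf, size_t len, char *interp, size_t interpsz)
{
	const char *eol, *start, *end;
	size_t limit, n;

	assert(buf != NULL && interp != NULL && interpsz > 0);
	if (len < 2 || buf[0] != '#' || buf[1] != '!')
		return (ENOEXEC);

	/* Fail, rather than truncate, when the line exceeds the limit. */
	limit = len < SKETCH_MAXLINE ? len : SKETCH_MAXLINE;
	eol = memchr(buf, '\n', limit);
	if (eol == NULL) {
		if (len > SKETCH_MAXLINE)
			return (ENOEXEC);
		eol = buf + len;	/* short input without a newline */
	}

	/* Skip blanks after "#!", then scan to the end of the name. */
	start = buf + 2;
	while (start < eol && (*start == ' ' || *start == '\t'))
		start++;
	end = start;
	while (end < eol && *end != ' ' && *end != '\t')
		end++;

	n = (size_t)(end - start);
	if (n == 0)
		return (ENOEXEC);
	if (n > SKETCH_MAXINTERP || n >= interpsz)
		return (ENAMETOOLONG);
	memcpy(interp, start, n);
	interp[n] = '\0';
	return (0);
}
```

Because the name-length check runs on the full name rather than on a truncated scan, the ENAMETOOLONG case is reachable here, which mirrors the intent of change (2).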
The idea is to add KDB and KDB_TRACE options to GENERIC kernels on
stable branches, so that at least the minimal information is produced
for non-specific panics like traps on page faults.
The GENERICs in stable branches seem to already include STACK option.
Reviewed by: attilio
MFC after: 2 weeks
is running on the "dummy" time counter. But to function properly in one-shot
mode, the event timer management code requires a working time counter. The
slow-moving "dummy" time counter delays the first hardclock() call by a few
seconds on my systems, even though timer interrupts were correctly kicking the
kernel. That causes a delay of a few seconds during boot with one-shot mode
enabled. To break this loop, explicitly call tc_windup() once during the
initialization process to let it switch to some real time counter.
zfs_map_page/zfs_unmap_page are mostly called around potential I/O paths,
and doing CPU pinning there does not seem to be a good idea.
Suggested by: kib
MFC after: 2 weeks
acl_is_trivial_np(3) properly recognize the new trivial ACLs. From
the user point of view, that means "ls -l" no longer shows plus signs
for all the files when running ZFS v28.
- Add a single sysctl procedure to all three drivers to read an arbitrary
register (the register is passed as arg2). Use it to replace existing
routines in igb(4) that used a separate routine for each register, and
to add support for missing stats in em(4) and lem(4).
- Move the 'rx_overruns' and 'watchdog_timeouts' stats out of the MAC stats
section as they are driver stats, not MAC counters.
- Simplify the code that creates per-queue stats in igb(4) to use a single
loop and remove duplicated code.
- Properly read all 64 bits of the 'good octets received/transmitted' in
em(4) and lem(4).
- Actually read the interrupt count registers in em(4), and drop the
'host to card' sysctl stats from em(4) as they are not implemented in
any of the hardware this driver supports.
- Restore several stats to em(4) that were lost in the earlier stats
conversion including per-queue stats.
- Export several MAC stats in em(4) that were exported in igb(4) but not
in em(4).
- Export stats in lem(4) using individual sysctls as in em(4) and igb(4).
Reviewed by: jfv
MFC after: 1 week
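Two ideas from the list above, a single generic read routine parameterised by the register and the assembly of a 64-bit counter from two 32-bit halves, can be sketched as below. `struct fake_hw`, `read_reg()` and `read_reg64()` are hypothetical; a plain array stands in for the device, and any device-specific read ordering or clear-on-read behaviour is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	FAKE_NREGS	8

/* Hypothetical register file standing in for memory-mapped registers. */
struct fake_hw {
	uint32_t regs[FAKE_NREGS];
};

/* One generic reader; the register index plays the role of arg2. */
static uint32_t
read_reg(const struct fake_hw *hw, size_t reg)
{
	assert(hw != NULL && reg < FAKE_NREGS);
	return (hw->regs[reg]);
}

/* Combine a low/high register pair into one 64-bit counter value. */
static uint64_t
read_reg64(const struct fake_hw *hw, size_t lo_reg, size_t hi_reg)
{
	uint64_t lo = read_reg(hw, lo_reg);
	uint64_t hi = read_reg(hw, hi_reg);

	return ((hi << 32) | lo);
}
```

With one reader taking the register as a parameter, adding a statistic becomes a matter of registering another index rather than writing another handler.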
- 64 bit fixes for ifnlge.c
- Use m_nextpkt to save mbuf vaddr on 64 bit, we cannot store the
64 bit vaddr in the 40bit freeback field.
- remove unused code and unnecessary variables.
- use xlr_io_mmio macro instead of adding io base address
- rewrite GPIO related code to fixup nlge using xlr_write_reg and DELAY
- support for engg boards major num 11 and 12
- add xlr_paddr_lw() to load 32bit value from physical address, fix
inline assembly
- style fixes
write to nonetheless be mapped PROT_WRITE and MAP_PRIVATE, i.e.,
copy-on-write.
(This is a regression in the new implementation of POSIX shared memory
objects that is used by HEAD and RELENG_8. This bug does not exist in
RELENG_7's user-level, file-based implementation.)
PR: 150260
MFC after: 3 weeks
vm_map_unlock_nodefer() part of the synchronization interface for maps.
Add comments to vm_map_unlock_and_wait() and vm_map_wakeup() describing
how they should be used. In particular, describe the deferred deallocations
issue with vm_map_unlock_and_wait().
Redo the implementation of vm_map_unlock_and_wait() so that it passes
along the caller's file and line information, just like the other map
locking primitives.
Reviewed by: kib
X-MFC after: r212824
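Passing the caller's file and line through a wrapper, as described in the entry above, is commonly done with a macro that captures `__FILE__` and `__LINE__` at the call site. The sketch below uses hypothetical names (`struct dbg_lock`, `dbg_unlock_impl()`, `dbg_unlock_and_note()`) and does no real locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lock that remembers where it was last released. */
struct dbg_lock {
	int held;
	const char *file;
	int line;
};

/* Lowest layer: records the location it is given. */
static void
dbg_unlock_impl(struct dbg_lock *l, const char *file, int line)
{
	assert(l != NULL && l->held);
	l->held = 0;
	l->file = file;
	l->line = line;
}

/* Wrapper layer: forwards the caller's location instead of its own. */
static void
dbg_unlock_and_note_impl(struct dbg_lock *l, const char *file, int line)
{
	dbg_unlock_impl(l, file, line);
}

/* The macro captures the location of the code that uses it. */
#define	dbg_unlock_and_note(l)						\
	dbg_unlock_and_note_impl((l), __FILE__, __LINE__)
```

Without the forwarding, every release would be attributed to the wrapper's own source line, which makes lock diagnostics much less useful.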
Instead of adding custom checks to wait for DCD on open(), just modify
the termios structure to set CLOCAL. This means SIGHUP is no longer
generated when losing DCD as well.
Reviewed by: kib@
MFC after: 1 week
This makes /dev/console more fail-safe and prevents a potential console
lock-up during boot.
Discussed on: stable@
Tested by: koitsu@
MFC after: 1 week
the device:
- unobscure some of the code by moving it into its own functions
- get rid of some magic numbers
- create similar structure as the reference driver has, this should
make further syncs easier
seem to be problems both with the on-board Ethernet interfaces and the em(4)
interfaces on PCI under FreeBSD.
Thanks to Lanner for providing access to hardware.
it frees local locks correctly upon close. In order for
nfsrv_localunlock() to work correctly, the lock can no longer be in
the lockowner's stateid list. As such, nfsrv_freenfslock() has to
be called before nfsrv_localunlock(), to get rid of the lock structure
on the lockowner's stateid list. This only affected operation when
local locks (vfs.newnfs.enable_locallocks=1) are enabled, which is
not the default at this time.
MFC after: 1 week
unlock operations correctly. It was passing in F_SETLK instead of
F_UNLCK as the operation for the unlock case. This only affected
operation when local locking (vfs.newnfs.enable_locallocks=1) was enabled.
MFC after: 1 week
on map unlock to the lock downgrade and later read unlock operation.
System map entries cannot be backed by OBJT_VNODE objects, no need to
defer deallocation for them. Map entries from user maps do not require
the owner map for deallocation, and can be accumulated in the
thread-local list for freeing when a user map is unlocked.
Move the collection of entries for deferred reclamation into
vm_map_delete(). Create helper vm_map_process_deferred(), that is
called from locations where processing is feasible. Do not process
deferred entries in vm_map_unlock_and_wait() since map_sleep_mtx is
held.
Reviewed by: alc, rstone (previous versions)
Tested by: pho
MFC after: 2 weeks
was eliminated: all references to sockets are explicitly managed by sorele()
and the protocols. As such, garbage collect sotryfree(), and update
sofree() comments to make the new world order more clear.
MFC after: 3 days
Reported by: Anuranjan Shukla <anshukla at juniper dot net>
This is just a cosmetic change for prettier output.
'indent' variable/parameter serves two purposes: it specifies whitespace
indentation level and also implies cpu group level/depth.
It would have been better to split those two uses,
but for now just a simple change.
MFC after: 1 week
not necessary. This avoids a time counter jump of up to 1/18s when the
base frequency is slightly tuned via the machdep.i8254_freq sysctl.
Fix a few style things.
Suggested by: bde
sending IPI to other CPUs. Otherwise, other CPUs will try to honor the stale
value, programming the timer for a zero interval. If the timer is fast enough,
this caused an extra interrupt before the timer was correctly reprogrammed by
the BSP.
particular edge case where X-axis resolution is not multiple of font width.
Now we just advance enough scan lines, then deduct a partial scan line.
It is more intuitive than the previous code. Apply the same wisdom to EGA
and VGA planar renderers for consistency.
Reported by: David DEMELIER (demelier dot david at gmail dot com)
it possible to boot from ZFS RAIDZ for example from within VirtualBox.
The problem with VirtualBox is that its BIOS reports only one disk present.
If we choose to ignore this report, we can find all the disks available.
We can't turn this work-around on by default, because some broken
BIOSes report the true number of disks, but present the same disk
multiple times.
separate the decision logic, of whether we can do TSO, and the
calculation of the burst length into two distinct parts.
Change the way the TSO burst length calculation is done. While
TSO could do bursts of 65535 bytes, that can't be represented in
ip_len together with the IP and TCP header. Account for that and
use IP_MAXPACKET instead of TCP_MAXWIN as base constant (both
have the same value of 64K). When more data is available prevent
less than MSS sized segments from being sent during the current
TSO burst.
Add two more KASSERTs to ensure the integrity of the packets.
Tested by: Ben Wilber <ben-at-desync com>
MFC after: 10 days
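The burst length calculation described in the entry above can be sketched as a pure function. `tso_burst_len()` and its parameters are hypothetical and the real code works on TCP control block state; only the IP_MAXPACKET base constant and the two rules (leave room for the headers, avoid a sub-MSS tail when more data is queued) are taken from the text.

```c
#include <assert.h>
#include <stdint.h>

#define	IP_MAXPACKET	65535	/* largest value ip_len can hold */

/*
 * len:       payload bytes we would like to send in this burst
 * hdrlen:    combined IP and TCP header length
 * mss:       payload bytes per segment
 * more_data: nonzero if data remains queued beyond this burst
 */
static uint32_t
tso_burst_len(uint32_t len, uint32_t hdrlen, uint32_t mss, int more_data)
{
	assert(mss > 0 && hdrlen < IP_MAXPACKET);

	/* Headers plus payload must fit into ip_len. */
	if (len > IP_MAXPACKET - hdrlen) {
		len = IP_MAXPACKET - hdrlen;
		more_data = 1;	/* the remainder goes into a later burst */
	}
	/* With more data pending, do not end the burst on a short segment. */
	if (more_data && len > mss)
		len -= len % mss;
	return (len);
}
```

For example, with 52 bytes of headers and an MSS of 1448, a 100000-byte request is first capped at 65483 bytes and then rounded down to 45 full segments, i.e. 65160 bytes.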
When the driver is completely saturated with commands (1024 in the
case of the SAS2008 in my test system), I/O stops. If we tell CAM
that we have one less command slot than we have actually allocated,
everything works fine. We also need a few extra command slots to
allow for aborts and other task management commands to be sent down.
This needs more investigation to determine the root cause, but for
now this fixes things in my testing.
mps.c: Change a printf() to mps_printf().
mps_sas.c: Subtract 5 command slots when we tell CAM how many
commands we can handle.
Add some commented-out logic to print the contents
of the CDBs for timed-out commands. This can help
in debugging devices that are timing out. This
will be uncommented once I bring some CAM changes in.
Reported by: Andrew Boyer <aboyer at averesystems dot com>
- Process some tx done messages in the transmit path, to ensure that
the XLR NA tx done FIFO does not overflow.
- Add a message ring handler API to process at most a given number of
messages from a specified bucket mask. This will be used to process
the tx done messages.
- Add a callout to restart transmit in the case transmit gets blocked.
- Update enable_msgring_int() and disable_msgring_int(), remove unused
args and make static.
Obtained from: Sriram Gorti (srgorti at netlogicmicro dot com)
KVA space is abundant on amd64, so there is no reason to limit kernel
map size to a fraction of available physical memory. In fact, it could
be larger than physical memory.
This should help with memory auto-tuning for ZFS and shouldn't affect
other workloads.
This should reduce number of circumstances for "kmem_map too small"
panics, but probably won't eliminate them entirely due to potential kmem
fragmentation.
In fact, you might want/need to limit maximum ARC size after this commit
if you need to reserve more memory for applications.
This change was discussed on arch@ and nobody said "don't do it".
MFC after: 6 weeks
Those checks are not present in upstream code and they are enforced in
actual calculations of delta by which ARC size can be grown or should be
reduced.
MFC after: 3 weeks
vm_paging_target() is not a trigger of any kind for pagedaemon, but
rather a "soft" target for it when it's already triggered.
Thus, trying to keep 2048 pages above that level at the expense of ARC
was simply driving ARC size into the ground even with normal memory
loads.
Instead, use a threshold at which a pagedaemon scan is triggered, so
that ARC reclaiming helps with pagedaemon's task, but the latter still
recycles active and inactive pages.
PR: kern/146410, kern/138790
MFC after: 3 weeks
Unfortunately, one-shot mode is impossible when the same hardware is used for
time counting. Introduce a new tunable, hint.attimer.0.timecounter; setting
it to 0 disables the i8254 time counter and allows one-shot mode. Note
that on some systems there may be no other sufficiently reliable time
counters, so this tunable should be used with care.
According to the MPT2 spec, task management commands are
serialized, and so no I/O should start while task management
commands are active.
So, to comply with that, freeze the SIM queue before we send any
task management commands (abort, target reset, etc.) down to the
IOC. We unfreeze the queue once the task management command
completes.
It isn't clear from the spec whether multiple simultaneous task
management commands are supported. Right now it is possible to
have multiple outstanding task management commands, especially in
the abort case. Multiple outstanding aborts do complete
successfully, so it may be supported.
We also don't yet have any recovery mechanism (e.g. reset the IOC)
if the task management command fails.
to give way for the pluggable congestion control framework. It is
the task of the congestion control algorithm to set the congestion
window and amount of inflight data without external interference.
In 'struct tcpcb' the variables previously used by the inflight
limiter are renamed to spares to keep the ABI intact and to have
some more space for future extensions.
In 'struct tcp_info' the variable 'tcpi_snd_bwnd' is not removed to
preserve the ABI. It is always set to 0.
In siftr.c in 'struct pkt_node' the variable 'snd_bwnd' is not removed
to preserve the ABI. It is always set to 0.
These unused variables in the various structures may be reused in the
future or garbage collected before the next release or at some other
point when an ABI change happens anyway for other reasons.
No MFC is planned. The inflight bandwidth limiter stays disabled by
default in the other branches but remains available.
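The ABI-preserving approach described above, renaming dead fields to spares instead of deleting them, can be illustrated with two toy structures. Both structures and `conn_new_init()` are hypothetical and far smaller than the real 'struct tcpcb'; the point is only that size and field offsets stay the same.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout with two fields owned by a removed feature. */
struct conn_old {
	uint32_t snd_cwnd;
	uint32_t snd_bandwidth;
	uint32_t snd_bwnd;
	uint32_t snd_wnd;
};

/* "After" layout: the dead fields become spares, so nothing moves. */
struct conn_new {
	uint32_t snd_cwnd;
	uint32_t spare[2];
	uint32_t snd_wnd;
};

/* Spares are kept zeroed so they can be given a meaning later. */
static void
conn_new_init(struct conn_new *c, uint32_t cwnd, uint32_t wnd)
{
	assert(c != NULL);
	c->snd_cwnd = cwnd;
	c->spare[0] = 0;
	c->spare[1] = 0;
	c->snd_wnd = wnd;
}
```

Code compiled against the old layout still finds `snd_wnd` at the same offset, which is why the fields are kept rather than removed.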
- Compile fixes for 9.0, the previous version of this driver was
for FreeBSD 6.
- Add virtual address field in OperationDescriptor_t, we cannot use
MIPS_PHYS_TO_KSEG0 on physical address.
- Fixes for new message ring API
- Remove unused sys/mips/rmi/dev/sec/stats.h
- Whitespace fixes
- Move RMI MIPS extension to atomic increment word (LDADDWU) to common
header file sys/mips/rmi/rmi_mips_exts.h
- Fix xlr_ldaddwu() for 64 bit, it is a 32 bit operation, use
unsigned int* instead of unsigned long* argument
- Provide dummy xlr_enable_kx/xlr_restore_kx for n32 and n64.
- Provide xlr_paddr_ld() instead of xlr_paddr_lw(), so that the
descriptor formats are same for 32 and 64 bit
- update nlge and rge for the changes
These changes are also needed by the security driver which will be
added later.
already updated after allocating mbuf so driver had to use the last
index instead of using next producer index. This should fix driver
hang which may happen under high network load.
Reported by: Igor Sysoev <is <> rambler-co dot ru>, Vlad Galu <dudu <> dudu dot ro>
Tested by: Igor Sysoev <is <> rambler-co dot ru>, Vlad Galu <dudu <> dudu dot ro>
MFC after: 10 days
Add a drain function for struct sysctl_req, and use it for a variety
of handlers, some of which had to do awkward things to get a large
enough SBUF_FIXEDLEN buffer.
Note that some sysctl handlers were explicitly outputting a trailing
NUL byte. This behaviour was preserved, though it should not be
necessary.
Reviewed by: phk (original patch)
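The benefit of a drain function, namely that a small fixed buffer can emit arbitrarily long output, can be shown with a userland sketch. Everything here (`struct dbuf`, `dbuf_cat()`, `dbuf_flush()`, `sink_drain()`) is hypothetical and only mimics the idea; it is not the sbuf(9) or sysctl API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Drain callback: consume len bytes, return 0 on success. */
typedef int drain_fn(void *arg, const char *data, size_t len);

struct dbuf {
	char buf[16];		/* deliberately small fixed buffer */
	size_t used;
	drain_fn *drain;
	void *drain_arg;
};

static void
dbuf_init(struct dbuf *b, drain_fn *fn, void *arg)
{
	assert(b != NULL && fn != NULL);
	b->used = 0;
	b->drain = fn;
	b->drain_arg = arg;
}

/* Hand the buffered bytes to the drain callback and empty the buffer. */
static int
dbuf_flush(struct dbuf *b)
{
	int error = 0;

	if (b->used > 0) {
		error = b->drain(b->drain_arg, b->buf, b->used);
		b->used = 0;
	}
	return (error);
}

/* Append data, draining whenever the fixed buffer fills up. */
static int
dbuf_cat(struct dbuf *b, const char *data, size_t len)
{
	while (len > 0) {
		size_t room = sizeof(b->buf) - b->used;
		size_t n = len < room ? len : room;
		int error;

		memcpy(b->buf + b->used, data, n);
		b->used += n;
		data += n;
		len -= n;
		if (b->used == sizeof(b->buf)) {
			error = dbuf_flush(b);
			if (error != 0)
				return (error);
		}
	}
	return (0);
}

/* Example sink: accumulate drained bytes in a larger destination. */
struct sink {
	char out[128];
	size_t len;
};

static int
sink_drain(void *arg, const char *data, size_t len)
{
	struct sink *s = arg;

	if (len > sizeof(s->out) - s->len)
		return (-1);
	memcpy(s->out + s->len, data, len);
	s->len += len;
	return (0);
}
```

The handler no longer has to guess an upper bound for its output; it only needs a final `dbuf_flush()` once it is done writing.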
a small difference in the last paragraph though) as suggested by jhb.
Clarify that the 'reviewed by' in r212653 by lstewart was for the
functional change, not the comments in the committed version.
- Change putc_func_t to use a char instead of an int for the character.
- Make functions and variables not used outside of this source file static.
- Remove unused prototypes and variables.
- The OFW read and seek methods take 3 and not 4 input arguments.
- Don't probe for PHYs if we already know to use a SERDES. Unlike with
cas(4), this only serves to speed up the device attach though, and can
only be determined via the OFW device tree but not from the VPD.
- Don't touch the MIF when using a SERDES.
- Add some missing bus space barriers, mainly in the PCS code path.
which are similar to the previous ones, and one for user maps, which
are arrays of pointers into the SLB tree. This change makes user SLB
updates atomic, closing a window for memory corruption. While here,
rearrange the allocation functions to make context switches faster.
hardware with a lockless sparse tree design. This marginally improves
the performance of PMAP and allows copyin()/copyout() to run without
acquiring locks when used on wired mappings.
Submitted by: mdf
additionally takes advantage of the prefetch cache of these CPUs.
Unlike the uncommitted US-III version, which provided no measurable
speedup or even resulted in a slight slowdown on certain CPU models
compared to using the US-I version with these, the SPARC64 version
actually results in a slight improvement.