fragment to zero the valid parts of a VM_IO buffer.
RE would like this to be part of 4.10-RC3 so this will be MFC-ed immediately.
Reviewed by: alc, tegge
is "void *" (it isn't) or that the default promotion of pid_t is int.
Instead, assume that casting "struct foo *" to "void *" and printing the
result with %p is useful, and that all pid_t's are representable as longs.
Fixed some minor style bugs (mainly spelling errors in comments).
It only supports sa1110 (on simics) right now, but xscale support should come
soon.
Some of the initial work has been provided by :
Stephane Potvin <sepotvin at videotron.ca>
Most of this comes from NetBSD.
Wind River. In the IPv4 output path, one of the tests in ip_output()
checks how many slots are actually available in the interface output
queue before attempting to send a packet. If, for example, we need
to transmit a packet of 32K bytes over an interface with an MTU of
1500, we know it's going to take about 21 fragments to do it. If
there's less than 21 slots left in the output queue, there's no point
in transmitting anything at all: IP does not do retransmission, so
sending only some of the fragments would just be a waste of bandwidth.
(In an extreme case, if you're sending a heavy stream of fragmented
packets, you might find yourself sending nothing by the first fragment
of all your packets.) So if ip_output() notices there's not enough
room in the output queue to send the frame, it just dumps the packet
and returns ENOBUFS to the app.
It turns out ip6_output() lacks this code. Consequently, this caused
the netperf UDPIPV6_STREAM test to produce very poor results with large
write sizes. This commit adds code to check the remaining space in the
output queue and junk fragmented packets if they're too big to be
sent, just like with IPv4. (I can't imagine anyone's running an NFS
server using UDP over IPv6, but if they are, this will likely make them
a lot happier. :)
fills its field (6 characters). In that case the OEMID is not
null-terminated, and the sprintf that was used would copy up to the
next null byte, which could be pretty far away.
registers, so add a register offset array to the softc. We key off the
device ID to determine which set of register offsets. Currently the 8385
host bridge used on amd64 is the only bridge to use the AGP3_VIA_*
register offsets and all other bridges use the AGP_VIA_* offsets. It is
currently unclear if the AGP3_VIA_* offsets are for VIA bridges that
implement AGP 3.0 bridges or just for amd64 bridges.
Submitted by: Kenneth Culver culverk at sweetdreamsracing dot biz
removes a specific thread from a sleep queue. sleepq_resume_thread()
resumes scheduling of a thread that has been previously removed from a
sleep queue.
- sleepq_catch_signals() just removes a thread from the queue it was just
added to when a pending signal is found.
- sleepq_signal() and sleepq_broadcast() remove threads from a queue,
drop the queue lock, and then resume all the previously removed threads.
This doesn't completely fix the sched_lock <-> sleepq chain LOR, but it
makes it a little better as we no longer call setrunnble() with a sleep
queue lock held meaning if setrunnable() tries to wakeup the swapper we
don't try to lock two sleep queue chains at the same time.
proper locking to be checked at runtime.
Remove sb_lock() and sb_unlock() calls from sb_reset_dsp() because the
latter is called from sb_setup() with the lock already held. Add a
call to sb_lockassert().
Surround the call to sb_reset_dsp() in sb16_attach() with sb_lock()
and sb_unlock() calls.
Tested by: Bartek Marcinkiewicz <junior AT p233.if.pwr.wroc.pl>
an option. Note that this option doesn't follow the normal USB_ or
Uxxx_ convention. That's because it is this way in the upstream
provider and I didn't want to change that.
Some of the conditions that caused vm_page_select_cache() to deactivate a
page were wrong. For example, deactivating an unmanaged or wired page is a
nop. Thus, if vm_page_select_cache() had ever encountered an unmanaged or
wired page, it would have looped forever. Now, we assert that the page is
neither unmanaged nor wired.
Allow 500us between pauses in ahd_pause_and_flushwork().
The maximum we will wait is now 500ms.
In the same routine, remove any attempt to clear ENSELO.
Let the firmware do it once the current selection has
completed. This avoids some race conditions having to
do with non-packetized completions and the auto-clearing
of ENSELO on packetized completions.
Also avoid attempts to clear critical sections when
interrups are pending. We are going to loop again
anyway, so clearing critical sections is a waste of
time. It also may not be possible to clear a critical
section if the source of the interrupt was a SEQINT.
aic79xx_pci.c:
Use the Generic 9005 mask when looking for generic 7901B
parts. This allows the driver to attach to 7901B parts
on motherboards using a non-Adaptec subvendor ID.
aic79xx_inline.h:
Test for the SCBRAM_RD_BUG against the bugs
field, not the flags field in the softc.
aic79xx.c:
Cancel pending transactions on devices that
respond with a selection timeout. This decreases
the duration of timeout recovery when a device
disappears.
aic79xx.c:
Don't bother forcing renegotiation on a selection
timeout now that we use the device reset handler
to abort any pending commands on the target.
The device reset handler already takes us down
to async narrow and forces a renegotiation.
In the device reset handlers, only send a
BDR sent async event if the status is not
CAM_SEL_TIMEOUT. This avoids sending this
event in the selection timeout case
aic79xx.c:
Modify the Core timeout handler to verify that another
command has the potential to timeout before passing off
a command timeout as due to some other command. This
safety measure is added in response to a timeout recovery
failure on H2B where it appears that incoming reselection
status was lost during a drive pull test. In that case,
the recovery handler continued to wait for the command
that was active on the bus indefinetly. While the root
cause of the above issue is still being determined seems
a prudent safeguard.
aic79xx_pci.c:
Add a specific probe entry for the Dell OEM 39320(B).
aic79xx.c:
aic79xx.h:
aic79xx.reg:
aic79xx.seq:
Modify the aic79xx firmware to never cross a cacheline or
ADB boundary when DMA'ing completion entries to the host.
In PCI mode, at least in 32/33 configurations, the SCB
DMA engine may lose its place in the data-stream should
the target force a retry on something other than an
8byte aligned boundary. In PCI-X mode, we do this to
avoid split transactions since many chipsets seem to be
unable to format proper split completions to continue
the data transfer.
The above change allows us to drop our completion entries
from 8 bytes to 4. We were using 8 byte entries to ensure
that PCI retries could only occur on an 8byte aligned
boundary. Now that the sequencer guarantees this by splitting
up completions, we can safely drop the size to 4 bytes (2
byte tag, one byte SG_RESID, one byte pad).
Both the split-completion and PCI retry problems only show
up under high tag load when interrupt coalescing is being
especially effective. The switch from a 2byte completion
entry to an 8 byte entry to solve the PCI problem increased
the chance of incurring a split in PCI-X mode when multiple
transactions were completed at once. Dropping the completion
size to 4 bytes also means that we can complete more commands
in a single DMA (128byte FIFO -> 32 commands instead of 16).
aic79xx.c:
Modify the SCSIINT handler to defer clearing
sequencer critical sections to the individual
interrupt handlers. This allows us to
immediately disable any outgoing selections in
the case of an unexpected busfree so we don't
inadvertantly clear ENSELO *after* a new selection
has started. Doing so may cause the sequencer
to miss a successful selection.
In ahd_update_pending_scbs(), only clear ENSELO if
the bus is currently busy and a selection is not
already in progress or the sequencer has yet to
handle a pending selection. While we want to ensure
that the selection for the SCB at the head of the
selection queue is restarted so that any change in
negotiation request can take effect, we can't clobber
pending selection state without confusing the sequencer
into missing a selection.
sequencer interrupt codes. These codes are only
relevant to the code that was last being executed
and that context is cleared when we reset the
program counter. This addresses a race condition
between a sequencer interrupt and any SCSI event
that causes us to restart the sequencer.
o When running the untagged-Q, we must start the
timer for any transaction we queue.
o Give the firmware half a millisecond between
pauses to flush work out. This should give us
around half a second of total delay before flagging
an issue with pausing and flushing controller work.
Only attempt to clear critical sections if there
are no pending interrupts in the pause and flush
loop. If the sequencer has issued an INTSTAT, we
may not be able to step out of the critical section.
o Cancel pending transactions on devices that
respond with a selection timeout. This decreases
the duration of timeout recovery when a device
disappears.
Don't bother forcing renegotiation on a selection
timeout now that we use the device reset handler
to abort any pending commands on the target.
The device reset handler already takes us down
to async narrow and forces a renegotiation.
o In the device reset handlers, only send a
BDR sent async event if the status is not
CAM_SEL_TIMEOUT. This avoids sending this
event in the selection timeout case.
o Modify the Core timeout handler to verify that another
command has the potential to timeout before passing off
a command timeout as due to some other command.
are used.
- Reduce duplication of a couple of macros removing the duplicates from
ich.h.
- Remove unused macros from icu.h as well as locore protection as this
header is no longer included in assembly sources.
- Require the APIC enumerators to explicitly enable mixed mode by calling
ioapic_enable_mixed_mode(). Calling this function tells the apic driver
that the PC-AT 8259A PICs are present and routable through the first I/O
APIC via an ExtINT pin. The mptable enumerator always calls this
function for now. The MADT enumerator only enables mixed mode if the
PC-AT compatability flag is set in the MADT header.
- Allow mixed mode to be enabled or disabled via a 'hw.apic.mixed_mode'
tunable. By default this tunable is set to 1 (true). The kernel option
NO_MIXED_MODE changes the default to 0 to preserve existing behavior, but
adding 'hw.apic.mixed_mode=0' to loader.conf achieves the same effect.
- Only use mixed mode to route IRQ 0 if it is both enabled by the APIC
enumerator and activated by the loader tunable. Note that both
conditions must be true, so if the APIC enumerator does not enable mixed
mode, then you can't set the tunable to try to override the enumerator.
to map. If the checksum fails, the table is unmapped and a NULL pointer
returned.
- For ACPI version >= 2.0, check the extended checksum of the RSDP.
AcpiOsGetRootPointer() already checks the version 1.0 checksum.
- Remap the full MADT table at the end of madt_probe() so that we verify
its checksum before saying it is really there.
Requested by: njl
the swizzle method for routing PCI interrupts across the bridge. This
fixes problems with motherboards (typically laptops) whose BIOS doesn't
provide a PRT for the AGP bridge even though there is a device entry for
the bridge in the ACPI namespace.
Tested by: Kenneth Culver culverk at sweetdreamsracing dot biz
it back to userspace, so it does not break bind(2) on raw sockets in jails.
Currently some processes, like traceroute(8) construct a routing request
to determine its source address based on the destination. This sockaddr
data is fed directly to bind(2). When bind calls ifa_ifwithaddr(9) to
make sure the address exists on the interface, the comparison will
fail causing bind(2) to return EADDRNOTAVAIL if the data wasnt zero'ed
before initialization.
Approved by: bmilekic (mentor)
"options OFW_NEWPCI").
This is a bit overdue, the new sparc64 OFW PCI code which is
meant to replace the old one is in place for 10 months and
enabled by default in GENERIC for 8 months. FreeBSD 5.2 and
5.2.1 also shipped with the new code enabled by default.
- Some minor clean-up, e.g. remove functions that encapsulated
the #ifdefs for OFW_NEWPCI, remove unused resp. no longer
required includes, etc.
Approved by: tmm, no objections on freebsd-sparc64
It's not quite correct from a posix Point Of view, but it is a lot better
than what was there before. This will be revisited later
when we decide what form our priority extensions will take. Posix doesn't
specify how a system scope thread can change its priority so you need to
add non-standard extensions to be able to do it..
For now make this slightly non standard to allow it to be done.
Submitted by: Dan Eischen originally, changed by myself.
there's not dependencies on pccard symboles, such a dependency is not
necessary. This means that drivers that have multiple attachments can
not drag bogus devices into the kernel at load time.
We can't (yet) do this with pci and isa. Drivers written for them
actually do seem to have symbols that depend on these busses'
implementation code.
ndis not touched until other things can be tested.
the kernel. We can guarantee this by resetting the FP status register.
This masks all FP traps. The reason we did get FP traps was that we
didn't reset the FP status register in all cases.
Make sure to reset the FP status register in syscall(). This is one of
the places where it was forgotten.
While on the subject, reset the FP status register only when we trapped
from user space.
Previously, mlockall(2) usage would leak MAP_FUTUREWIRE of the process's
vmspace::vm_map and subsequent processes would wire all of their memory.
Coupled with a wired-page leak in vm_fault_unwire(), this would run the
system out of free pages and cause programs to randomly SIGBUS when
faulting in new pages.
(Note that this is not the fix for the latter part; pages are still
leaked when a wired area is unmapped in some cases.)
Reviewed by: alc
PR kern/62930
of IP options.
net.inet.ip.process_options=0 Ignore IP options and pass packets unmodified.
net.inet.ip.process_options=1 Process all IP options (default).
net.inet.ip.process_options=2 Reject all packets with IP options with ICMP
filter prohibited message.
This sysctl affects packets destined for the local host as well as those
only transiting through the host (routing).
IP options do not have any legitimate purpose anymore and are only used
to circumvent firewalls or to exploit certain behaviours or bugs in TCP/IP
stacks.
Reviewed by: sam (mentor)
devices it cannot attach to. This gets rid of extraneous but harmless
device_probe_and_attach() errors. While I'm here, make the device
description more useful. The !acpi case for cpu is handled by legacy0.
and Rx frames up to 8191 octets, so it is perfectly capable of supporting
vlan(4)-style VLAN natively.
Thus, make it support VLAN `oversize' frames.
Reviewed by: tmm
that the OHCI driver uses. Broken OHCI devices (like the controller
in my laptop, apparently) like to set this bit at times. Research
through google shows that this problem has shown up on other systems
as well.
As the scheduling overrun handler doesn't actually do anything, and
the only effect is console spamming, disabling the interrupt seems
to be the right thing to do. (And it is also what linux 2.6 does.)
allocation and deallocation. This flag's principal use is shortly after
allocation. For such cases, clearing the flag is pointless. The only
unusual use of PG_ZERO is in vfs_bio_clrbuf(). However, allocbuf() never
requests a prezeroed page. So, vfs_bio_clrbuf() never sees a prezeroed
page.
Reviewed by: tegge@
individual asm versions. The global lock is shared between the BIOS and
OS and thus cannot use our mutexes. It is defined in section 5.2.9.1 of
the ACPI specification.
Reviewed by: marcel, bde, jhb
o The ieee80211_media_status() function updates the ifi_link_state field
and calls rt_ifmsg() to notify listeners on the routing socket.
Approved by: sam
2. Note that ct device uses ctau name as driver name (due to name conflict
with ct driver) and also mark it as a driver inside the CVS tree.
MFC after: 10 days
segment, remove the groping around in the Option ROM segments, remove the
bogus tests for bcopy vs. copyout. There really is no reason for a
management app to know these things other than to create l33t info tables
for the user.
a significant functional change, but it further cleans up the code and
brings it closer to being portable. Thanks to Don Bowman for helping to
test this.
and type for printing info about the device that didn't probe from child, not
parent.
This fixes a panic on systems where not yet supported devices hang off of the
nexus, e.g. on E450.
Reported by: joerg
host-PCI bridge device and find a valid $PIR.
- Make pci_pir_parse() private to pci_pir.c and have pir0's attach routine
call it instead of having legacy_pcib_attach() call it.
- Implement suspend/resume support for the $PIR by giving pir0 a resume
method that calls the BIOS to reroute each link that was already routed
before the machine was suspended.
- Dump the state of the routed flag in the links display code.
- If a link's IRQ is set by a tunable, then force that link to be re-routed
the first time it is used.
- Move the 'Found $PIR' message under bootverbose as the pir0 description
line lists the number of entries already. The pir0 line also only shows
up if we are actually using the $PIR which is a bonus.
- Use BUS_CONFIG_INTR() to ensure that any IRQs used by a PCI link are
set to level/low trigger/polarity.
active low polarity when using the PIC interrupt model. This should fix
broken SCI interrupts on machines when not using the APIC where the BIOS
doesn't program the ELCR to level trigger for the ACPI SCI.
Requested by: njl
polarity for a specified IRQ. The intr_config_intr() function wraps
this pic method hiding the IRQ to interrupt source lookup.
- Add a config_intr() method to the atpic(4) driver that reconfigures
the interrupt using the ELCR if possible and returns an error otherwise.
- Add a config_intr() method to the apic(4) driver that just logs any
requests that would change the existing programming under bootverbose.
Currently, the only changes the apic(4) driver receives are due to bugs
in the acpi(4) driver and its handling of link devices, hence the reason
for such requests currently being ignored.
- Have the nexus(4) driver on i386 implement the bus_config_intr() function
by calling intr_config_intr().
and intr_polarity enums for passing around interrupt trigger modes and
polarity rather than using the magic numbers 0 for level/low and 1 for
edge/high.
- Convert the mptable parsing code to use the new ELCR wrapper code rather
than reading the ELCR directly. Also, use the ELCR settings to control
both the trigger and polarity of EISA IRQs instead of just the trigger
mode.
- Rework the MADT's handling of the ACPI SCI again:
- If no override entry for the SCI exists at all, use level/low trigger
instead of the default edge/high used for ISA IRQs.
- For the ACPI SCI, use level/low values for conforming trigger and
polarity rather than the edge/high values we use for all other ISA
IRQs.
- Rework the tunables available to override the MADT. The
hw.acpi.force_sci_lo tunable is no longer supported. Instead, there
are now two tunables that can independently override the trigger mode
and/or polarity of the SCI. The hw.acpi.sci.trigger tunable can be
set to either "edge" or "level", and the hw.acpi.sci.polarity tunable
can be set to either "high" or "low". To simulate hw.acpi.force_sci_lo,
set hw.acpi.sci.trigger to "level" and hw.acpi.sci.polarity to "low".
If you are having problems with ACPI either causing an interrupt storm
or not working at all (e.g., the power button doesn't turn invoke a
shutdown -p now), you can try tweaking these two tunables to find the
combination that works.
IRQ is edge triggered or level triggered. For ISA interrupts, we assume
that edge triggered interrupts are always active high and that level
triggered interrupts are always active low.
- Don't disable an edge triggered interrupt in the PIC. This avoids
outb instructions to the actual PIC for traditional ISA IRQs such as
IRQ 1, 6, 14, and 15. (Fast interrupts such as IRQs 0 and 8 don't mask
their source, so this doesn't change anything for them.)
- For MCA systems we assume that all interrupts are level triggered and
thus need masking. Otherwise, we probe the ELCR. If it exists we trust
what it tells us regarding which interrupts are level triggered. If it
does not exist, we assume that IRQs 0, 1, 2, and 8 are edge triggered
and that all other IRQs are level triggered and need masking.
- Instruct the ELCR mini-driver to restore its saved state during resume.
register controlled the trigger mode and polarity of EISA interrupts.
However, it appears that most (all?) PCI systems use the ELCR to manage
the trigger mode and polarity of ISA interrupts as well since ISA IRQs used
to route PCI interrupts need to be level triggered with active low
polarity. We check to see if the ELCR exists by sanity checking the value
we get back ensuring that IRQS 0 (8254), 1 (atkbd), 2 (the link from the
slave PIC), and 8 (RTC) are all clear indicating edge trigger and active
high polarity.
This mini-driver will be used by the atpic driver to manage the trigger and
polarity of ISA IRQs. Also, the mptable parsing code will use this mini
driver rather than examining the ELCR directly.
not as a pending interrupt status, but as a matter of status quo.
Consequently, when there's no data to be transmitted the condition
is not cleared and uart_intr() is stuck in an infinite loop trying
to clear the UART_IPEND_TXIDLE status.
The z8530_bus_ipend() function is changed to return idle only once
after having sent any data.
The root cause for this problem is that we cannot use the interrupt
status bits of the SCC itself. The register that holds the interrupt
status can only be accessed by channel A and holds the status for
both channels. Using the interrupt status register would complicate
the driver because we need to synchronize access to the SCC between
the channels.
Elementary testing: marius
labeling new mbufs created from sockets/inpcbs in IPv4. This helps avoid
the need for socket layer locking in the lower level network paths
where inpcb locks are already frequently held where needed. In
particular:
- Use the inpcb for label instead of socket in raw_append().
- Use the inpcb for label instead of socket in tcp_output().
- Use the inpcb for label instead of socket in tcp_respond().
- Use the inpcb for label instead of socket in tcp_twrespond().
- Use the inpcb for label instead of socket in syncache_respond().
While here, modify tcp_respond() to avoid assigning NULL to a stack
variable and centralize assertions about the inpcb when inp is
assigned.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, McAfee Research
lookup for the label tag fails, return NULL rather than something close
to NULL. This scenario occurs if mbuf header labeling is optional and
a policy requiring labeling is loaded, resulting in some mbufs having
labels and others not. Previously, 0x14 would be returned because the
NULL from m_tag_find() was not treated specially.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, McAfee Research
returns okay when HW probe fails. This happens when comconsole flag is
set but VGA console is used instead.
Back out requested by: bde (He will be looking at other solutions from scratch)
test the label pointer for NULL before testing the label slot for
permitted values. When loading mac_test dynamically with conditional
mbuf labels, the label pointer may be NULL if the mbuf was
instantiated while labels were not required on mbufs by any policy.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, McAfee Research
synchronization protecting against dynamic load and unload of MAC
policies, and instead simply blocks load and unload. In a static
configuration, this allows you to avoid the synchronization costs
associated with introducing dynamicism.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, McAfee Research
interrupt source.
- Only do an outb() to the PIC to clear a bit in imen if the bit is set.
- Add a NUM_ISA_IRQS macro to replace uglier
'sizeof(array) / sizeof(member)' expressions along with a CTASSERT() to
ensure that the macro is correct.
than using legacy_pcib_attach(). The MP Table drivers don't use the $PIR,
and the legacy_pcib_attach() function probes and parses the $PIR in
addition to adding the pci bus child device.
o New function ip_findroute() to reduce code duplication for the
route lookup cases. (luigi)
o Store ip_len in host byte order on the stack instead of using
it via indirection from the mbuf. This allows to defer the host
byte conversion to a later point and makes a quicker fallback to
normal ip_input() processing. (luigi)
o Check if route is dampned with RTF_REJECT flag and drop packet
already here when ARP is unable to resolve destination address.
An ICMP unreachable is sent to inform the sender.
o Check if interface output queue is full and drop packet already
here. No ICMP notification is sent because signalling source quench
is depreciated.
o Check if media_state is down (used for ethernet type interfaces)
and drop the packet already here. An ICMP unreachable is sent to
inform the sender.
o Do not account sent packets to the interface address counters. They
are only for packets with that 'ia' as source address.
o Update and clarify some comments.
Submitted by: luigi (most of it)
o Extend the if_data structure with an ifi_link_state field and
provide the corresponding defines for the valid states.
o The mii_linkchg() callback updates the ifi_link_state field
and calls rt_ifmsg() to notify listeners on the routing socket
in addition to the kqueue KNOTE.
o If vlans are configured on a physical interface notify and update
all vlan pseudo devices as well with the vlan_link_state() callback.
No objections by: sam, wpaul, ru, bms
Brucification by: bde
the falling edge of a media state change.
This is in preparation for media state change notification to the
routing socket.
No objections by: sam, wpaul, ru, bms
Brucification by: bde
o Fix and improve comments and references,
o Add PFIL_HOOKS, UFS_ACL and UFS_DIRHASH,
o Switch from SCHED_4BSD to SCHED_ULE,
o Remove SCSI_DELAY (there's no SCSI support),
- change pf_get_pool() argument rule_number type from u_int32_t
to u_int8_t, fixes corruption of address pools with large
rulesets (mcbride@)
- prevent endless loops with route-to (dhartmei@)
- limit option length to 2 octets max (frantzen@)
Obtained from: OpenBSD
Approved by: mlaier(mentor), bms(mentor)
uncommitted):
Rename ip_claim_next_hop() to m_claim_next_hop(), give it an extra arg
(the type of tag to claim) and push it out of ip_var.h into mbuf.h
alongside all of the other macros that work ok mbuf's and tag's.
parameter).
Keep using it only in the i386 NOTES for now. It is fairly MI, but it
doesn't use bus-space and has a couple of i386 i/o instructions in pci
intitialization.
intent was to make sure that message structs allocated off of the stack were
4-byte aligned. However, the macros as defined did absolutely nothing.
And since I2O forces you to manually copy messages down to the hardware, there
really is no point of enforced alignment anyways.
rename the control device from rasr%d to asr%d. This starts us down the
path of divorcing ourselves from a very bogus design in the management
apps. Since the apps are open source now, they will likely be updated
and fixed before 5.3.
Adjust staticness and a variable name for the split of cy.c into cy.c and
cy_isa.c. Use the new header required for the split to avoid repeating
declarations in cy_pci.c.
Put @includes in a better spot. Fix many cases of 2 space indents and spaces
between a function name and the parens. Use KASSERT instead of a home-rolled
ASSERT. Remove some undeeded caddr casts.
- Define option FORCECONSPEED to force the serial console to
be CONSPEED. I've run into a lot of boards in which
the detect for prior speed doesn't work and ends up with
broken console since it is at the wrong speed.
- If a serial port is marked as a console, but console=vidconsole
and if the serial ports doesn't exist it will be probed and
attached at a 8250 chip. Then writes to that will freeze the
system.
- Add an option flags 0x400000 to mark this as a potential
comconsole in-case the one flaged with 0x10 does not exist
in the system.
This makes it easier to deploy on systems with one or two serial ports.
Obtained from: IronPort
- Use the dh_inserted member of the dispatch header in the Windows
timer structure to indicate that the timer has been "inserted into
the timer queue" (i.e. armed via timeout()). Use this as the value
to return to the caller in KeCancelTimer(). Previously, I was using
callout_pending(), but you can't use that with timeout()/untimeout()
without creating a potential race condition.
- Make ntoskrnl_init_timer() just a wrapper around ntoskrnl_init_timer_ex()
(reduces some code duplication).
- Drop Giant when entering if_ndis.c:ndis_tick() and
subr_ntorkrnl.c:ntoskrnl_timercall(). At the moment, I'm forced to
use system callwheel via timeout()/untimeout() to handle timers rather
than the callout API (struct callout is too big to fit inside the
Windows struct KTIMER, so I'm kind of hosed). Unfortunately, all
the callouts in the callwhere are not marked as MPSAFE, so when
one of them fires, it implicitly acquires Giant before invoking the
callback routine (and releases it when it returns). I don't need to
hold Giant, but there's no way to stop the callout code from acquiring
it as long as I'm using timeout()/untimeout(), so for now we cheat
by just dropping Giant right away (and re-acquiring it right before
the routine returns so keep the callout code happy). At some point,
I will need to solve this better, but for now this should be a suitable
workaround.
- Remove second license, the first was not that different and should be
fine.
- Add nexus_attach(), and do not perform its task in nexus_probe() any
more.
- Remove nexus_write_ivar(), since it was quite pointless.
- Remove superfluous devinfo members.
- Clean up some comments, minor style issues and prototypes.
as dependent on binutils features/quirks as the current one. This one
loads plain .o files without having to mess with shared object mode.
This happens to be essential on amd64, because binutils hasn't implemented
all the quirks/features that we need for producing the hack non-PIC shared
objects. As it turned out, .o format isn't all that inconvenient after
all. It looks like the ability to use the same .o files for linking
directly into a static kernel or loading as a module might be worth it.
It is still very much a work-in-progress, but it is almost usable. Other
changes are still needed in order to use it though, these have not been
committed yet. There is still a memory corruption/overrun bug somewhere.
For example, test modules load and work, but the machine explodes a few
minutes later in vm_forkproc() or the like. Notable missing things
include kldxref support, and loader(8) support. I wanted to figure out
a working baseline set of code first.
things which compare /etc/fstab entries to results from
getfsstat(). The real way to fix this is to make 'ufs2'
a recognized filesystem (for real, no beating around the
bush).
This should fix things like 'umount -a -t ufs' now.
Appologies for the previous breakage.
boot0sio.s was repo-copied to boot0.S.
- Rename boot0ext.s to boot0ext.S, to stay consistent
with other preprocessed asm files around here, and
for better portability.
Repocopied by: joe
condition where kse_wakeup() doesn't yet see them in (interruptible)
sleep queues. Also add an upcall check to sleepqueue_catch_signals()
suggested by jhb.
This commit should fix recent mysql hangs.
Reviewed by: jhb, davidxu
Mysql'd by: Robin P. Blanchard <robin.blanchard at gactr uga edu>
switch to using C99-style comments everywhere in preprocessed
assembler. The reason is that lines starting with the regexp
'^[[:space:]]#' are treated as preprocessing directives, and
while it seems to work now with GCC, it's not necessarily has
to work. Use C99 comments `//' for the trailing comments to
save whitespace.
bridges, the EBus bridge has resource ranges it claims exclusively to
map its children into in its BARs. Hence, we need to allocate these
completely and manage them for the children, instead of just passing
allocations through to the PCI layer as we did before.
While being there, split ebus_probe(), which did also contain code
normally belonging into the attach method, into ebus_probe() and
ebus_attach(), and perform some minor cleanups.
correct interrupt source.
- Cache a pointer to the i8254_intsrc's pending method to avoid several
pointer indirections in i8254_get_timecount().
Reported by: bde
Merge boot0.s and boot0sio.s into boot0_512.s controlled by "#ifdef SIO".
Add Makefile magic to generate boot0.s and boot0sio.s from boot0_512.s.
The compile boot0 and boot0sio have unchanged MD5 checksums.
channels. This also work when PCI native mode has been selected
(patch for /sys/dev/pci/pci.c needed for that) since pci_get_progif
uses the saved value for progif, not the one stored after we may have
changed from legacy mode to native PCI mode.
jail, which is less restrictive but allows for more flexible
jail usage (for those who are willing to make the sacrifice).
The default is off, but allowing raw sockets within jails can
now be accomplished by tuning security.jail.allow_raw_sockets
to 1.
Turning this on will allow you to use things like ping(8)
or traceroute(8) from within a jail.
The patch being committed is not identical to the patch
in the PR. The committed version is more friendly to
APIs which pjd is working on, so it should integrate
into his work quite nicely. This change has also been
presented and addressed on the freebsd-hackers mailing
list.
Submitted by: Christian S.J. Peron <maneo@bsdpro.com>
PR: kern/65800
libufs, which only works for Charlie root.
This change reverts the introduction of libufs and moves the
check into the kernel. Since the f_fstypename is the same
for both ufs and ufs2, we check fs_magic for presence of
ufs2 and copy "ufs2" explicitly instead.
Submitted by: Christian S.J. Peron <maneo@bsdpro.com>
possible while maintaining compatibility with the widest range of TCP stacks.
The algorithm is as follows:
---
For connections in the ESTABLISHED state, only resets with
sequence numbers exactly matching last_ack_sent will cause a reset,
all other segments will be silently dropped.
For connections in all other states, a reset anywhere in the window
will cause the connection to be reset. All other segments will be
silently dropped.
---
The necessity of accepting all in-window resets was discovered
by jayanth and jlemon, both of whom have seen TCP stacks that
will respond to FIN-ACK packets with resets not meeting the
strict last_ack_sent check.
Idea by: Darren Reed
Reviewed by: truckman, jlemon, others(?)
1) In pci.c, we need to check the child device's state, not the parent
device's state.
2) In acpi_pci.c, we have to run the power state change after the acpi
method when the old_state is > new state, not the other way around.
Submitted by: Dmitry Remesov
PR: 65694
1. rt_check() cleanup:
rt_check() is only necessary for some address families to gain access
to the corresponding arp entry, so call it only in/near the *resolve()
routines where it is actually used -- at the moment this is
arpresolve(), nd6_storelladdr() (the call is embedded here),
and atmresolve() (the call is just before atmresolve to reduce
the number of changes).
This change will make it a lot easier to decouple the arp table
from the routing table.
There is an extra call to rt_check() in if_iso88025subr.c to
determine the routing info length. I have left it alone for
the time being.
The interface of arpresolve() and nd6_storelladdr() now changes slightly:
+ the 'rtentry' parameter (really a hint from the upper level layer)
is now passed unchanged from *_output(), so it becomes the route
to the final destination and not to the gateway.
+ the routines will return 0 if resolution is possible, non-zero
otherwise.
+ arpresolve() returns EWOULDBLOCK in case the mbuf is being held
waiting for an arp reply -- in this case the error code is masked
in the caller so the upper layer protocol will not see a failure.
2. arpcom untangling
Where possible, use 'struct ifnet' instead of 'struct arpcom' variables,
and use the IFP2AC macro to access arpcom fields.
This mostly affects the netatalk code.
=== Detailed changes: ===
net/if_arcsubr.c
rt_check() cleanup, remove a useless variable
net/if_atmsubr.c
rt_check() cleanup
net/if_ethersubr.c
rt_check() cleanup, arpcom untangling
net/if_fddisubr.c
rt_check() cleanup, arpcom untangling
net/if_iso88025subr.c
rt_check() cleanup
netatalk/aarp.c
arpcom untangling, remove a block of duplicated code
netatalk/at_extern.h
arpcom untangling
netinet/if_ether.c
rt_check() cleanup (change arpresolve)
netinet6/nd6.c
rt_check() cleanup (change nd6_storelladdr)
- Fix some comments; remove numerous superfluous or outdated ones.
- Correctly pass on the requesting device when handing requests up
to the parent bus.
- Use the complete device name, including unit number, to build the
IOMMU instance name.
- Inline a function that was only used once, and was trivial.
consistently with the rest of the code, use IFP2AC(ifp) to access
the arpcom structure given the ifp.
In this case also fix a difference in assumptions WRT the rest of
the net/ sources: it is not the 'struct *softc' that starts with a
'struct arpcom', but a 'struct arpcom' that starts with a
'struct ifnet'
- use ifp instead if &ac->ac_if in a couple of nd6* calls;
this removes a useless dependency.
- use IFP2AC(ifp) instead of an extra variable to point to the struct arpcom;
this does not remove the nesting dependency between arpcom and ifnet but
makes it more evident.
caller to vm_page_grab(). Although this gives VM_ALLOC_ZERO a
different meaning for vm_page_grab() than for vm_page_alloc(), I feel
such change is necessary to accomplish other goals. Specifically, I
want to make the PG_ZERO flag immutable between the time it is
allocated by vm_page_alloc() and freed by vm_page_free() or
vm_page_free_zero() to avoid locking overheads. Once we gave up on
the ability to automatically recognize a zeroed page upon entry to
vm_page_free(), the ability to mutate the PG_ZERO flag became useless.
Instead, I would like to say that "Once a page becomes valid, its
PG_ZERO flag must be ignored."
added an arbitrary delay to our readings, causing us to use the ACPI-safe
read method when not necessary. Submitted by: bde
Old:
ACPI timer looks GOOD min = 3, max = 5, width = 2
ACPI timer looks BAD min = 3, max = 19, width = 16
ACPI timer looks GOOD min = 3, max = 5, width = 2
ACPI timer looks GOOD min = 3, max = 5, width = 2
ACPI timer looks GOOD min = 3, max = 5, width = 2
ACPI timer looks GOOD min = 3, max = 4, width = 1
ACPI timer looks GOOD min = 3, max = 5, width = 2
ACPI timer looks BAD min = 3, max = 19, width = 16
ACPI timer looks GOOD min = 3, max = 5, width = 2
ACPI timer looks GOOD min = 3, max = 4, width = 1
Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
New:
ACPI timer looks GOOD min = 3, max = 4, width = 1
ACPI timer looks GOOD min = 3, max = 4, width = 1
ACPI timer looks GOOD min = 3, max = 4, width = 1
ACPI timer looks GOOD min = 3, max = 4, width = 1
ACPI timer looks GOOD min = 3, max = 4, width = 1
ACPI timer looks GOOD min = 3, max = 4, width = 1
ACPI timer looks GOOD min = 3, max = 4, width = 1
ACPI timer looks GOOD min = 3, max = 4, width = 1
ACPI timer looks GOOD min = 3, max = 4, width = 1
ACPI timer looks GOOD min = 3, max = 4, width = 1
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
Also, reduce unnecesary overhead in ACPI-fast by remove the barrier for
reads. The timer in the ACPI-fast case is known to increase monotonically
so there is no need to serialize access to it.
misplaced forward declarations of structs). This also reduces namespace
pollution (the misplaced declarations were declared in the !_KERNEL case
when they are not used).
would actually map the file with read access enabled. According to
http://www.opengroup.org/onlinepubs/007904975/functions/mmap.html this is
an error. Similarly, an madvise(..., MADV_WILLNEED) would enable read
access on a virtual address range that was PROT_NONE.
The solution implemented herein is (1) to pass a vm_prot_t to
vm_map_pmap_enter() describing the allowed access and (2) to make
vm_map_pmap_enter() responsible for understanding the limitations of
pmap_enter_quick().
Submitted by: "Mark W. Krentel" <krentel@dreamscape.com>
PR: kern/64573
from tcp_hostcache would have overridden a (now) lower MTU of
an interface or route that changed since first PMTU discovery.
The bug would have caused TCP to redo the PMTU discovery when
not strictly necessary.
Make a comment about already pre-initialized default values
more clear.
Reviewed by: sam
state. Apparently it happens when both devices try to disconnect RFCOMM
multiplexor channel at the same time.
The scenario is as follows:
- local device initiates RFCOMM connection to the remote device. This
creates both RFCOMM multiplexor channel and data channel;
- remote device terminates RFCOMM data channel (inactivity timeout);
- local device acknowledges RFCOMM data channel termination. Because
there is no more active data channels and local device has initiated
connection it terminates RFCOMM multiplexor channel;
- remote device does not acknowledges RFCOMM multiplexor channel
termination. Instead it sends its own request to terminate RFCOMM
multiplexor channel. Even though local device acknowledges RFCOMM
multiplexor channel termination the remote device still keeps
L2CAP connection open.
Because of hanging RFCOMM multiplexor channel subsequent RFCOMM
connections between local and remote devices will fail.
Reported by: Johann Hugo <jhugo@icomtek.csir.co.za>
modules is a very nice way to produce hard-to-find panics. Who would look for
a bug in a Makefile anyway?
Has anyone seen the pointy hat? :-o
Approved by: njl (mentor)
ip_id again. ip_id is already set to the ip_id of the encapsulated packet.
Make a comment about mbuf allocation failures more realistic.
Reviewed by: sobomax
resource pre-allocation. The problem is that the BARs of the EBus bridges
contain the ranges for the resources for the EBus devices beyond the bridge.
So when the EBus code tries to allocate the resource for an EBus device
it's already allocated by the PCI code.
To be removed again as soon as we have a proper solution in the EBus Code.
Reviewed by: tmm
Approved by: marcel (mentor)
source address of a packet exists in the routing table. The
default route is ignored because it would match everything and
render the check pointless.
This option is very useful for routers with a complete view of
the Internet (BGP) in the routing table to reject packets with
spoofed or unrouteable source addresses.
Example:
ipfw add 1000 deny ip from any to any not versrcreach
also known in Cisco-speak as:
ip verify unicast source reachable-via any
Reviewed by: luigi
secondary bus is 0, we program the primary bus, the secondary bus and
the suborindate bus. This isn't ideal, since we start at parent_bus +
1 and store this in a static.
Ideally, we'd walk the tree and assign bus numbers. However, that's
harder to accomplish without some help from the bus layer which we're
not planning on doing that until 6.
This fixes my CardBus problems on my Sony PCG-Z1WA, and might fix the
Dells that have had problems.
sched_ule, in January 2004. Looking at this, "pagezero" is (one of) the
culprit(s). We had no provision for processes with P_NOLOAD set. With
pagezero not running at PRI_ITHD, kseq_load_{add,rem} count pagezero as
another-normal-process, thus the "expected-plus-one" load reported in
the above thread.
Submitted by: Nikos Ntarmos <ntarmos@ceid.upatras.gr>
gadgets (hotkeys, lcd, ...) on Asus laptops. I aim to closely track the
acpi4asus project which implements these features in the Linux kernel.
If this breaks your laptop, please let me know how it does it :-)
Approved by: njl (mentor)
there are a lot of other dependencies that preclude the kernel from
working). Instead, have a more generic note that isa should not be
removed. This should be less confusing for users.
(I hope.)
My original instinct to make ndis_return_packet() asynchronous was correct.
Making ndis_rxeof() submit packets to the stack asynchronously fixes
one recursive spinlock acquisition, but it's also possible for it to
happen via the ndis_txeof() path too. So:
- In if_ndis.c, revert ndis_rxeof() to its old behavior (and don't bother
putting ndis_rxeof_serial() back since we don't need it anymore).
- In kern_ndis.c, make ndis_return_packet() submit the call to the
MiniportReturnPacket() function to the "ndis swi" thread so that
it always happens in another context no matter who calls it.
workaround was for hardware where the clock was not latched, not for
hardware that was too slow. Also, make variable names more specific for ddb
printing.
While I would have prefered to have a solution that didn't move
knowledge of this into the pci layer. However, this is literally the
only exception that's listed in the PCI standard to the usual way of
decoding BARs. atapci devices in legacy mode now ignore the first 4
bars and hard code the values to the legacy ide values (well, for each
of the controllers that are in legacy mode). The 5th bar is handled
normally.
Remove the zero bar handling. zero bars should be ignored at all
other times, and since we handle that specially, we don't need the
older workaround.
what the ACPI-safe workaround is intended to fix. Requested by phk.
Set the bushandle and tag when attaching the timer, don't do it each time
in read_counter(). Pointed out by bde.
Move test_counter() to the end. Staticize acpi_timer_reg.
Clearly comment the assumptions on the structure of keys (addresses)
and masks, and introduce a macro, LEN(p), to extract the size of these
objects instead of using *(u_char *)p which might be confusing.
Comment the confusion in the types used to pass around pointers
to keys and masks, as a reminder to fix that at some point.
Add a few comments on what some functions do.
Comment a probably inefficient (but still correct) section of code
in rn_walktree_from()
The object code generated after this commit is the same as before.
At some point we should also change same variable identifiers such
as "t, tt, ttt" to fancier names such as "root, left, right" (just
in case someone wants to understand the code!), replace misspelling
of NULL as 0, remove 'register' declarations that make little sense
these days.
like "the foo(4) manual page" to "foo(4)". Uniformized the remaining
instances of "manual page" and "manpage" to "man page". Uniformized
some nearby sentence breaks. Reformatted the whole paragraph containing
these changes only for DUMMYNET.
of you with other cards, please do review and test the drivers for
MP-safety and disable Giant in the interrupt routines when you are
sure of proper functionality.
wireless ever since I added the new spinlock code. Previously, I added
a special ndis_rxeof_serial() function to insure that when we receive
a packet, we never end up calling the MiniportReturnPacket() routine
until after the receive handler has finished. I set things up so that
ndis_rxeof_serial() would only be used for serialized miniports since
they depend on this property. Well, it turns out deserialized miniports
depend on a similar property: you can't let MiniportReturnPacket() be
called from the same context as the receive handler at all. The 2100B
driver happens to use a single spinlock for all of its synchronization,
and it tries to acquire it both while in MiniportHandleInterrupt() and
in MiniportReturnPacket(), so if we call MiniportReturnPacket() from
the MiniportHandleInterrupt() context, we will end up trying to acquire
the spinlock recursively, which you can't do.
To fix this, I made the ndis_rxeof_serial() handler the default. An
alternate solution would be to make ndis_return_packet() submit
the call to MiniportReturnPacket() to the NDIS task queue thread.
I may do that in the future, after I've tested things a bit more.
supported. Symptoms of this bug included unnecessary use of ACPI-safe
and a dmesg that has deltas of about 2^24:
ACPI timer looks BAD min = 2, max = 16777206, width = 16777204
ACPI timer looks BAD min = 2, max = 7, width = 5
ACPI timer looks GOOD min = 4, max = 5, width = 1
ACPI timer looks BAD min = 2, max = 16777206, width = 16777204
ACPI timer looks BAD min = 2, max = 7, width = 5
ACPI timer looks BAD min = 2, max = 16777210, width = 16777208
ACPI timer looks BAD min = 4, max = 16777189, width = 16777185
ACPI timer looks GOOD min = 4, max = 5, width = 1
ACPI timer looks BAD min = 2, max = 7, width = 5
ACPI timer looks BAD min = 4, max = 16777189, width = 16777185
To fix this:
* Use a 32 bit timecounter mask when the timer is 32 bits.
* In test_counter(), use the acpi_TimerDelta function which handles 24/32
bit timers and wraparound.
Miscellaneous fixes:
* Use C99 initializers for timecounter struct.
* Use u_int and uint32_t where appropriate instead of unsigned.
* Remove whitespace-only lines
* Remove the old PIIX4 PCI workaround. The timecounter testing code has
been in use for long enough to prove it's functional.
globally available. acpi_TimerDelta() subtracts two readings from the
ACPI PM timer and returns the difference. It properly distinguishes between
24-bit and 32-bit timers and handles wraparound.
2. Document that this means that kernel modules must be rebuilt.
3. While I'm here, fix my sorting error in callout.h
Requested by: many [1], scottl [2], bde [3]
it checked for rt == NULL after dereferencing the pointer).
We never check for those events elsewhere, so probably these checks
might go away here as well.
Slightly simplify (and document) the logic for memory allocation
in rt_setgate().
The rest is mostly style changes -- replace 0 with NULL where appropriate,
remove the macro SA() that was only used once, remove some useless
debugging code in rt_fixchange, explain some odd-looking casts.
implementation taken directly from OpenBSD.
I've resisted committing this for quite some time because of concern over
TIME_WAIT recycling breakage (sequential allocation ensures that there is a
long time before ports are recycled), but recent testing has shown me that
my fears were unwarranted.
TIME_WAIT recycling cases I was able to generate with http testing tools.
In short, as the old algorithm relied on ticks to create the time offset
component of an ISN, two connections with the exact same host, port pair
that were generated between timer ticks would have the exact same sequence
number. As a result, the second connection would fail to pass the TIME_WAIT
check on the server side, and the SYN would never be acknowledged.
I've "fixed" this by adding random positive increments to the time component
between clock ticks so that ISNs will *always* be increasing, no matter how
quickly the port is recycled.
Except in such contrived benchmarking situations, this problem should never
come up in normal usage... until networks get faster.
No MFC planned, 4.x is missing other optimizations that are needed to even
create the situation in which such quick port recycling will occur.