Verify interface up status using its link state only
if the interface has such capability. The interface
capability flag indicates whether such capability
exists. This approach is much more backward compatible, since
interfaces whose drivers do not report link state are unaffected.
Physical device driver changes will be part of another
commit.
Also updated the ifconfig utility to show the LINKSTATE
capability if present.
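As a minimal sketch of the check described above, the user-space C model below consults the link state only when the capability bit is set. The `sketch_` names and the flag value are hypothetical stand-ins for the kernel's LINKSTATE capability flag and `if_link_state`; this is not the actual route.h macro.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the LINKSTATE capability bit. */
#define SKETCH_IFCAP_LINKSTATE 0x1u

enum sketch_link_state {
    SKETCH_LINK_STATE_UNKNOWN,
    SKETCH_LINK_STATE_DOWN,
    SKETCH_LINK_STATE_UP
};

/* Only the two fields the check needs. */
struct sketch_ifnet {
    unsigned int if_capabilities;
    enum sketch_link_state if_link_state;
};

/* Consult the link state only if the driver advertises that it maintains
 * it; otherwise treat the interface as up, as before. */
static bool
sketch_link_is_up(const struct sketch_ifnet *ifp)
{
    if ((ifp->if_capabilities & SKETCH_IFCAP_LINKSTATE) == 0)
        return true;
    return ifp->if_link_state == SKETCH_LINK_STATE_UP;
}
```

Defaulting to "up" when the capability is absent is what keeps drivers that never touch the link state working unchanged.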
Reviewed by: rwatson, imp, juli
The if_tap interface is of IFT_ETHERNET type, but it
does not set or update the if_link_state variable.
As such RT_LINK_IS_UP() fails for the if_tap interface.
Also, RT_LINK_IS_UP() needs to bypass all loopback
interfaces because loopback interfaces are considered
up logically as long as the system is running.
This patch fixes the above issues by setting and updating
the if_link_state variable when the tap interface is
opened or closed, respectively. A similar approach is
already used in the if_tun device.
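The two changes can be modeled in a few lines of user-space C: loopback interfaces short-circuit the link test, and the tap device flips its link state on open and close. All names and the flag value are hypothetical; in the kernel the state change would presumably go through `if_link_state_change()` rather than a direct assignment.

```c
#include <stdbool.h>

/* Hypothetical loopback flag value for this model. */
#define TAP_SKETCH_IFF_LOOPBACK 0x8u

enum tap_link_state { TAP_LINK_STATE_DOWN, TAP_LINK_STATE_UP };

struct tap_ifnet {
    unsigned int if_flags;
    enum tap_link_state if_link_state;
};

/* Loopback interfaces are logically up while the system is running,
 * whatever their link state field says. */
static bool
tap_rt_link_is_up(const struct tap_ifnet *ifp)
{
    if (ifp->if_flags & TAP_SKETCH_IFF_LOOPBACK)
        return true;
    return ifp->if_link_state == TAP_LINK_STATE_UP;
}

/* Opening the tap control device brings the link up... */
static void
tap_model_open(struct tap_ifnet *ifp)
{
    ifp->if_link_state = TAP_LINK_STATE_UP;
}

/* ...and closing it takes the link down again. */
static void
tap_model_close(struct tap_ifnet *ifp)
{
    ifp->if_link_state = TAP_LINK_STATE_DOWN;
}
```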
One of the advantages of enabling ECMP (a.k.a. RADIX_MPATH) is to
allow for connection load balancing across interfaces. Currently
the address alias handling method conflicts with the ECMP code.
For example, when two interfaces are configured on the same prefix,
only one prefix route is installed. So connection load balancing
among the available interfaces is not possible.
The other advantage of ECMP is failover. The issue with the
current code is that the interface link state is not reflected
in the route entry. For example, if there are two interfaces on
the same prefix and the cable on one interface is unplugged, new and
existing connections should switch over to the other interface.
This is not done today, and packets go into a black hole.
Also, there is a small bug in the kernel where deleting ECMP routes
from userland always returns an error even though the command
succeeds.
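The failover behavior asked for above can be illustrated with a small path-selection sketch: hash each flow onto only those paths whose interface link is up, so an unplugged interface drops out instead of black-holing traffic. This is not the RADIX_MPATH code; the struct, the function, and the modulo-based choice are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical next-hop record for one path of a multipath route. */
struct ecmp_path {
    int ifindex;
    bool link_up;   /* mirrors the outgoing interface's link state */
};

/* Pick a path for a flow by hashing it onto the live paths only.
 * Returns NULL when no path has its link up. */
static const struct ecmp_path *
ecmp_select(const struct ecmp_path *paths, size_t npaths, uint32_t flowhash)
{
    size_t nup = 0, i;

    for (i = 0; i < npaths; i++)
        if (paths[i].link_up)
            nup++;
    if (nup == 0)
        return NULL;

    /* Return the (flowhash % nup)-th path whose link is up. */
    size_t want = flowhash % nup;
    for (i = 0; i < npaths; i++) {
        if (!paths[i].link_up)
            continue;
        if (want == 0)
            return &paths[i];
        want--;
    }
    return NULL;    /* not reached */
}
```

A plain modulo over the live paths is the simplest choice, but it remaps some existing flows whenever a path comes or goes; schemes such as consistent or highest-random-weight hashing reduce that churn.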
introduce a local variable rte acting as a cache of ro->ro_rt
within ip_output, achieving (in random order of importance):
- a reduction of the number of 'r's in the source code;
- improved legibility;
- a reduction of 64 bytes in the .text
The flow-table module retrieves the destination and source
address as well as the transport protocol port information
from the outbound packets. The routing code is generic and
compares every byte in the given sockaddr object. Therefore
the temporary sockaddr objects must be zeroed first, or their padding
bytes will cause spurious mismatches. In addition, the port information
must be stripped, or the route search will either fail or return the
wrong route entry.
Unit testing is done using OpenVPN over the if_tun interface.
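The sketch below shows why the temporary objects have to be cleared, using the POSIX `struct sockaddr_in` and a byte-wise `memcmp()` as a stand-in for the generic routing comparison. The helper names are hypothetical. On BSD systems `sin_len` would also be set; it is left out here so the code builds on systems whose `sockaddr_in` has no such field.

```c
#include <arpa/inet.h>   /* htonl()/htons() for callers */
#include <netinet/in.h>
#include <string.h>

/* Build a route lookup key from a packet's address: zero the whole object
 * so padding (sin_zero and any compiler padding) is deterministic, copy
 * only the address, and leave the port at 0 since routes ignore ports. */
static void
make_route_key(struct sockaddr_in *key, const struct sockaddr_in *pkt_addr)
{
    memset(key, 0, sizeof(*key));
    key->sin_family = AF_INET;
    key->sin_addr = pkt_addr->sin_addr;
    /* key->sin_port intentionally stays 0 */
}

/* Byte-wise comparison, like a generic lookup that compares every byte. */
static int
key_matches(const struct sockaddr_in *a, const struct sockaddr_in *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}
```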
- The firmware of the Sun Fire V1280 has the misfeature of setting %wstate
to 7, which corresponds to WSTATE_KMIX in OpenSolaris, whenever it is
called into. This breaks us even when %wstate is restored afterwards, as
spill/fill traps can happen while in OFW. The rather hackish OpenBSD
approach of just setting the equivalent of WSTATE_KERNEL to 7 is not an
option either, as we treat %wstate as a bit field. So, in order to deal
with this problem, actually implement spill/fill handlers for %wstate 7,
which act just like the WSTATE_KERNEL ones except that they theoretically
also handle 32-bit; turn off interrupts completely so we don't even take
IPIs while in OFW, which should ensure that spill/fill traps are the only
traps we take; and restore %wstate after calling into OFW once we have
taken over the trap table. While at it, actually set WSTATE_{,PROM}_KMIX
before calling into OFW, just like OpenSolaris does, which should at least
help with testing this change on machines other than the V1280.
- Remove comments referring to the %wstate usage in BSD/OS.
- Remove the no longer used RSF_ALIGN_RETRY macro.
- Correct some trap table addresses in comments.
- Ensure %wstate is set to WSTATE_KERNEL when taking over the trap table.
- Ensure PSTATE_AM is off when entering or exiting to OFW, and that
interrupts are also completely off when exiting to OFW, as the firmware
trap table shouldn't be used to handle our interrupts.
Update the page table locking for the 64-bit PMAP. One of these revisions
largely reverted the other, so there is a small amount of churn and the
addition of some mtx_assert()s.
Fix two small bugs. The PowerPC 970 does not support non-coherent memory
access, and reflects this by autonomously writing LPTE_M into PTE entries.
As such, we should not panic if LPTE_M changes by itself. While here,
fix a harmless typo in moea64_sync_icache().
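Tolerating an autonomously set bit amounts to masking it out before comparing the expected and actual PTE. In this sketch the value of `SKETCH_LPTE_M` is an assumption (the conventional WIMG position of the M bit) and should be checked against the real `LPTE_M` definition; the function name is hypothetical and not taken from moea64.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of the "memory coherence required" bit; verify against
 * the real LPTE_M before relying on it. */
#define SKETCH_LPTE_M UINT64_C(0x0000000000000010)

/* Report whether a PTE changed in a way software did not request.
 * Bits the CPU may set on its own (here only M, as the 970 does) are
 * masked out first, so their appearance is not treated as corruption. */
static bool
pte_changed_unexpectedly(uint64_t expected, uint64_t actual)
{
    const uint64_t hw_may_set = SKETCH_LPTE_M;

    return ((expected ^ actual) & ~hw_may_set) != 0;
}
```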
- When we run our trap cleanup handler, echo that we are running this
handler to make it more clear why we are 'suddenly' running df,
umount, and mdconfig.
- Remove trap handler again after we have unconfigured the memory
device etc. Previously, we could end up running the trap handler if a
later stage failed, which was a bit confusing and not really useful.
MFC after: 2 weeks
Check that gl_pathc is bigger than zero before dereferencing gl_pathv.
When gl_pathc == 0, the content of gl_pathv is undefined.
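A minimal sketch of the corrected pattern with POSIX `glob()`: test `gl_pathc` before indexing `gl_pathv`. The helper name and the choice to print only the first match are illustrative assumptions; the tool patched by this change is not shown.

```c
#include <glob.h>
#include <stdio.h>

/* Print the first path matching pattern, if any; returns 1 if one was
 * printed. gl_pathv is only indexed after gl_pathc > 0 is confirmed. */
static int
print_first_match(const char *pattern)
{
    glob_t g;
    int found = 0;

    if (glob(pattern, 0, NULL, &g) == 0 && g.gl_pathc > 0) {
        printf("%s\n", g.gl_pathv[0]);
        found = 1;
    }
    globfree(&g);   /* POSIX sets gl_pathc/gl_pathv even on failure */
    return found;
}
```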
PR: bin/144761
Submitted by: David BERARD <contact davidberard fr>
Obtained from: OpenBSD
r205066:
Log:
- restructure flowtable to support ipv6
- add a name argument to flowtable_alloc for printing with ddb commands
- extend ddb commands to print destination address or 4-tuples
- don't parse ports in ulp header if FL_HASH_ALL is not passed
- add kern_flowtable_insert to enable more generic use of flowtable
(e.g. system calls for adding entries)
- don't hash loopback addresses
- cleanup whitespace
- keep statistics per-cpu for per-cpu flowtables to avoid cache line contention
- add sysctls to accumulate stats and report aggregate
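Per-CPU statistics with an aggregating reader can be sketched as follows: each CPU increments only its own cache-line-aligned slot, and the reporting path sums the slots. The CPU count, the 64-byte line size, and all names are assumptions; a kernel version would index by the current CPU and expose the sum through a sysctl handler.

```c
#include <stdint.h>

#define SKETCH_NCPU 4   /* hypothetical CPU count */

/* Aligning the first member to 64 bytes pads each slot to a full
 * (assumed 64-byte) cache line, so CPUs never share a line. */
struct flow_stats {
    _Alignas(64) uint64_t hits;
    uint64_t misses;
};

static struct flow_stats stats[SKETCH_NCPU];

/* Hot path: a CPU touches only its own slot. */
static void
stats_hit(int cpu)
{
    stats[cpu].hits++;
}

static void
stats_miss(int cpu)
{
    stats[cpu].misses++;
}

/* Reporting path: sum all per-CPU slots into one aggregate. */
static struct flow_stats
stats_aggregate(void)
{
    struct flow_stats total = { 0 };

    for (int cpu = 0; cpu < SKETCH_NCPU; cpu++) {
        total.hits += stats[cpu].hits;
        total.misses += stats[cpu].misses;
    }
    return total;
}
```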
r205069:
Log:
fix stats reporting sysctl
r205093:
Log:
re-update copyright to 2010
pointed out by danfe@
r205097:
Log:
flowtable_get_hashkey is only used by a DDB function - move under #ifdef DDB
pointed out by jkim@
r205488:
Log:
- boot-time size the ipv4 flowtable and the maximum number of flows
- increase flow cleaning frequency and decrease flow caching time
when near the flow limit
- stop allocating new flows when within 3% of maxflows; don't start
allocating again until below 12.5%
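The last item describes hysteresis. The sketch below assumes "within 3%" means the flow count exceeds `maxflows - maxflows/32` and "below 12.5%" means it drops under `maxflows - maxflows/8`; both the interpretation and the names are assumptions, not the flowtable code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical allocation-limit state. */
struct flow_limit {
    uint32_t maxflows;
    bool full;      /* true while new flow allocation is suspended */
};

/* Update and return the "full" state for the current flow count.
 * maxflows >> 5 is 1/32 (about 3.1%); maxflows >> 3 is 1/8 (12.5%).
 * The gap between the two thresholds keeps the state from flapping. */
static bool
flow_limit_full(struct flow_limit *fl, uint32_t nflows)
{
    uint32_t stop_above = fl->maxflows - (fl->maxflows >> 5);
    uint32_t resume_below = fl->maxflows - (fl->maxflows >> 3);

    if (!fl->full && nflows > stop_above)
        fl->full = true;
    else if (fl->full && nflows < resume_below)
        fl->full = false;
    return fl->full;
}
```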
Improve the KVA space sizing of r186682; on machines with large dTLBs we
can actually use all of the available lockable entries of the tiny dTLB
for the kernel TSB. With this change the KVA space sizing happens to be
more in line with the MI one, so that on machines with up to at least 24GB
of RAM the KVA doesn't need to be limited manually. This is just another
stopgap, though; the
real solution is to take advantage of ASI_ATOMIC_QUAD_LDD_PHYS on CPUs
providing it so we don't need to lock the kernel TSB pages into the dTLB
in the first place.
- Add TTE and context register bits for the additional page sizes supported
by UltraSparc-IV and -IV+ as well as SPARC64 V, VI, VII and VIIIfx CPUs.
- Replace TLB_PCXR_PGSZ_MASK and TLB_SCXR_PGSZ_MASK with TLB_CXR_PGSZ_MASK
which is simply the complement of TLB_CXR_CTX_MASK instead of trying to
assemble it from the page size bits which vary across CPUs.
- Add macros for the remainder of the SFSR bits, which are useful for at
least debugging purposes.
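A complement-based page-size mask can be sketched as below, assuming the context number occupies the low 13 bits of the context register. The `SKETCH_` macro names are hypothetical; the point is that preserving the page-size bits needs no knowledge of which CPU-specific page-size fields exist.

```c
#include <stdint.h>

/* Assumed layout: context number in the low 13 bits; everything above is
 * treated as page-size (or reserved) bits, whose meaning varies by CPU. */
#define SKETCH_CXR_CTX_BITS  13
#define SKETCH_CXR_CTX_MASK  ((UINT64_C(1) << SKETCH_CXR_CTX_BITS) - 1)
#define SKETCH_CXR_PGSZ_MASK (~SKETCH_CXR_CTX_MASK)

/* Install a new context number while keeping whatever page-size bits
 * are currently set in the register value. */
static uint64_t
cxr_set_context(uint64_t cxr, uint64_t ctx)
{
    return (cxr & SKETCH_CXR_PGSZ_MASK) | (ctx & SKETCH_CXR_CTX_MASK);
}
```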
Some machines can consist not only of CPUs running at different speeds
but also of CPUs of different types, e.g. the Sun Fire V890 can be equipped with a
mix of UltraSPARC IV and IV+ CPUs, requiring different MMU initialization
and different workarounds for model specific errata. Therefore move the
CPU implementation number from a global variable to the per-CPU data.
Functions which are called before the latter is available are passed the
implementation number as a parameter now.
Print the memory model of the video mode, except for the planar memory model.
'P', 'D', 'C', 'H', and 'V' mean the packed pixel, direct color, CGA, Hercules,
and VGA X memory models, respectively, which have a fixed number of planes.
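The letter mapping can be sketched as a `switch` over the memory-model numbers from the VESA BIOS Extensions specification. The enum and function names are hypothetical, and returning `'\0'` for the planar model (whose plane count would presumably be printed instead) is an assumption.

```c
/* Memory model numbers as assigned by the VBE specification. */
enum vbe_memory_model {
    VBE_MM_TEXT   = 0x00,
    VBE_MM_CGA    = 0x01,
    VBE_MM_HGC    = 0x02,   /* Hercules */
    VBE_MM_PLANAR = 0x03,
    VBE_MM_PACKED = 0x04,
    VBE_MM_VGAX   = 0x05,   /* non-chain 4, 256 colors ("mode X") */
    VBE_MM_DIRECT = 0x06,
};

/* One-letter tag for models with a fixed plane count; '\0' for planar
 * and for anything not covered. */
static char
memory_model_tag(int model)
{
    switch (model) {
    case VBE_MM_PACKED: return 'P';
    case VBE_MM_DIRECT: return 'D';
    case VBE_MM_CGA:    return 'C';
    case VBE_MM_HGC:    return 'H';
    case VBE_MM_VGAX:   return 'V';
    default:            return '\0';
    }
}
```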
Sync. pixel mode support for VESA and VGA frame buffers with HEAD.
- Map the entire video memory again. Although we do not use all of it directly,
it seems the VGA renderer may access unmapped memory regions and cause a kernel
panic.
- Fall back to the VGA palette functions if the VESA function failed and the
DAC is still in 6-bit mode. Although we should check the non-VGA compatibility
bit here, there seem to be too many broken VESA BIOSes out there to rely on it.
- Be careful when determining the bytes per scan line. We compare the
mode table data against the minimum value; if the mode table does not make
sense, we set the minimum in the mode info.
- Teach VGA framebuffer about 8-bit palette format for VESA.
- Add my copyright here.
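The scan-line sanity check from the list above reads naturally as "take the BIOS value unless it is smaller than one line of pixels". This sketch covers only packed-pixel and direct-color modes (planar modes would use a per-plane stride of width/8); the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical subset of the mode info relevant to the check. */
struct mode_info {
    uint16_t width;           /* pixels per line */
    uint8_t  bits_per_pixel;  /* packed-pixel or direct-color mode */
    uint16_t bytes_per_line;  /* as reported by the BIOS mode table */
};

/* Smallest stride that can hold one line of pixels, rounded up. */
static uint32_t
min_bytes_per_line(const struct mode_info *mi)
{
    return ((uint32_t)mi->width * mi->bits_per_pixel + 7) / 8;
}

/* Keep the BIOS value if it makes sense; otherwise use the minimum. */
static void
sanitize_bytes_per_line(struct mode_info *mi)
{
    uint32_t min = min_bytes_per_line(mi);

    if (mi->bytes_per_line < min)
        mi->bytes_per_line = (uint16_t)min;
}
```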
Sync. pixel mode support for syscons(4) with HEAD.
- Separate 24-bit pixel drawing from the 32-bit case. Although it is slower, we
do not want to write a useless zero into an inaccessible memory region.
- We only want the dummy palette for direct color mode.
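Drawing a 24-bit pixel with three single-byte stores, instead of one 32-bit store, is what avoids the stray zero byte. The function name and the byte order (blue in the lowest byte of a `0xRRGGBB` value) are assumptions for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Store one 24-bit pixel as exactly three bytes, lowest byte first.
 * A 32-bit store would also write a fourth, zero byte, clobbering the
 * next pixel or, for the last pixel, running past the mapped memory. */
static void
put_pixel24(uint8_t *fb, size_t index, uint32_t rgb)
{
    uint8_t *p = fb + index * 3;

    p[0] = (uint8_t)(rgb & 0xff);
    p[1] = (uint8_t)((rgb >> 8) & 0xff);
    p[2] = (uint8_t)((rgb >> 16) & 0xff);
}
```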
Sync. x86bios with HEAD.
- Detect illegal access to unmapped memory within real mode emulator.
- Map EBDA if available and support memory wraparound above 1MB as VM86 does.
- Set initial %ds to 0x40 as X.org int10 handler does.
- Print the initial memory map when bootverbose is set.
- Optimize real mode page table lookup.
- Add strictly aligned memory access for the distant future.
- Update copyright date.
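Several of the items above (the wraparound, the table lookup, and the unmapped-access check) fit in one small sketch of real-mode address translation. The 4KB page size, the 256-entry table, and all names are assumptions, not the x86bios code; masking with `0xFFFFF` reproduces the 8086 (A20 disabled) wraparound.

```c
#include <stddef.h>
#include <stdint.h>

#define RM_PAGE_SHIFT 12
#define RM_NPAGES     (0x100000 >> RM_PAGE_SHIFT)   /* 256 pages = 1MB */

/* Hypothetical emulator page table: one host pointer per 4KB page of the
 * real-mode address space, NULL where nothing is mapped. */
static uint8_t *rm_pages[RM_NPAGES];

/* seg:off can reach 0x10FFEF; masking wraps addresses above 1MB to 0. */
static uint32_t
rm_linear(uint16_t seg, uint16_t off)
{
    return (((uint32_t)seg << 4) + off) & 0xFFFFF;
}

/* Direct lookup by page index; NULL lets the caller flag an illegal
 * access to unmapped memory instead of touching it. */
static uint8_t *
rm_lookup(uint16_t seg, uint16_t off)
{
    uint32_t lin = rm_linear(seg, off);
    uint8_t *page = rm_pages[lin >> RM_PAGE_SHIFT];

    if (page == NULL)
        return NULL;
    return page + (lin & ((1u << RM_PAGE_SHIFT) - 1));
}
```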
Add a seatbelt to the Nested TLB Fault handler to give us a chance
to panic when we have an unexpected TLB fault while interrupt
collection is disabled.
o Introduce vm_sync_icache() for making the I-cache coherent with
the memory or D-cache, depending on the semantics of the platform.
vm_sync_icache() is basically a wrapper around pmap_sync_icache(),
that translates the vm_map_t argument to pmap_t.
o Introduce pmap_sync_icache() to all PMAP implementations. For powerpc
it replaces the pmap_page_executable() function, added to solve
the I-cache problem in uiomove_fromphys().
o In proc_rwmem() call vm_sync_icache() when writing to a page that
has execute permissions. This ensures that when breakpoints are
written, the I-cache will be coherent and the process will actually
hit the breakpoint.
o This also fixes the Book-E PMAP implementation that was missing
necessary locking while trying to deal with the I-cache coherency
in pmap_enter() (read: mmu_booke_enter_locked).
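The relationship between the two functions, and the decision in `proc_rwmem()`, can be modeled in user space as below. The `model_` types and functions are hypothetical stand-ins loosely modeled on `vm_sync_icache()` and `pmap_sync_icache()`; the I-cache flush is replaced by a counter so the behavior can be checked.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a pmap that counts I-cache syncs, owned by a map. */
struct model_pmap {
    int icache_syncs;
};

struct model_vm_map {
    struct model_pmap *pmap;
};

#define MODEL_PROT_EXECUTE 0x4

/* A real pmap would make the I-cache coherent for [va, va + sz) here. */
static void
model_pmap_sync_icache(struct model_pmap *pm, uintptr_t va, size_t sz)
{
    (void)va;
    (void)sz;
    pm->icache_syncs++;
}

/* The wrapper: translate the map to its pmap and delegate. */
static void
model_vm_sync_icache(struct model_vm_map *map, uintptr_t va, size_t sz)
{
    model_pmap_sync_icache(map->pmap, va, sz);
}

/* Debugger-style write into a page of another process: after storing the
 * bytes, sync the I-cache only if the page is executable, so a freshly
 * written breakpoint instruction is actually fetched. */
static void
model_proc_write(struct model_vm_map *map, uint8_t *page, int prot,
    uintptr_t va, size_t off, const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        page[off + i] = buf[i];
    if (prot & MODEL_PROT_EXECUTE)
        model_vm_sync_icache(map, va + off, len);
}
```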
- Add the 'cmp' and 'core' pseudo-busses which are used to group CPU cores
to the exclusion lists as the CPU nodes aren't handled as regular devices
either. Also add the pseudo-devices found in Sun Fire V1280.
- Allow nexus_attach() and nexus_alloc_resource() to be used by drivers
derived from nexus(4) for subordinate busses.
- Don't add the zero-sized memory resources of glue devices to the resource
lists.
- Search the whole OFW device tree instead of only the children of the
root nexus device for the CPUs as starting with UltraSPARC IV the 'cpu'
nodes hang off of 'cmp' (chip multi-threading processor) or 'core'
or combinations thereof. Also in large UltraSPARC III based machines
the 'cpu' nodes hang off of 'ssm' (scalable shared memory) nodes which
group snooping-coherency domains together instead of directly from the
nexus.
It would be great if we could use newbus to deal with the different ways
the 'cpu' devices can hang off of pseudo ones but unfortunately both
cpu_mp_setmaxid() and sparc64_init() have to work prior to regular device
probing.
- Add support for UltraSPARC IV and IV+ CPUs. As these are multi-core,
each CPU has two Fireplane config registers, and thus the module/target ID
has to be determined differently so that the one specific to a certain core
is used. Similarly, starting with UltraSPARC IV the individual cores use a
different property in the OFW device tree to indicate the CPU/core ID, as it
no longer coincides with the shared slot/socket ID.
This involves changing the MD KTR code to not directly read the UPA
module ID either. We use the MID stored in the per-CPU data instead of
calling cpu_get_mid() as a replacement in order to prevent clobbering any
registers as a side effect in the assembler version. This requires the CATR()
invocations from mp_startup() prior to mapping the per-CPU pages to be
removed, though.
While at it, additionally distinguish between CPUs with Fireplane and
JBus interconnects as these also use slightly different sizes for the
JBus/agent/module/target IDs.
- Make sparc64_shutdown_final() static as it's not used outside of
machdep.c.
- At least the trap table of the Sun Fire V1280 firmware apparently has
no cleanwindows handler, so just remove the attempt to trigger it from
_start and the AP trampoline code, as that leads to a crash there. This
should be okay, as leaking data from the OFW via the CPU registers on start
of the kernel should be of no real concern.
- Make the comments of _start and the AP trampoline code regarding the
initializations they perform match each other and reality.
- Make the comments of the AP trampoline code regarding iTLB accesses
refer to the right macro.
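The whole-tree CPU search described in the list above can be sketched as a recursive walk over child/peer links, which works before newbus is available. The in-memory node type is hypothetical; against real firmware the links would come from the OFW tree (e.g. `OF_child()` and `OF_peer()`), and matching on the node name alone is a simplification.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory device tree with OFW-style child/peer links. */
struct dt_node {
    const char *name;
    struct dt_node *child;  /* first child */
    struct dt_node *peer;   /* next sibling */
};

/* Depth-first walk over node and all its siblings and descendants,
 * calling cb (if non-NULL) for each node named "cpu", whether it hangs
 * off the root, a 'cmp'/'core' node, or an 'ssm' node. Returns the
 * number of CPU nodes found. */
static int
find_cpus(const struct dt_node *node, void (*cb)(const struct dt_node *))
{
    int n = 0;

    for (; node != NULL; node = node->peer) {
        if (strcmp(node->name, "cpu") == 0) {
            if (cb != NULL)
                cb(node);
            n++;
        }
        n += find_cpus(node->child, cb);
    }
    return n;
}
```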