driver was compiled with.
Remove debug printf from ndis_assign_pcirsc(). It doesn't serve
much purpose.
Implement NdisMIndicateStatus() and NdisMIndicateStatusComplete()
as functions in subr_ndis.c. In NDIS 4.0, they were functions. In
NDIS 5.0 and later, they're just macros.
Allocate a few extra packets/buffers beyond what the driver asks
for since sometimes it seems they can lie about how many they really
need, and some extra stupid ones don't check to see if NdisAllocatePacket()
and/or NdisAllocateBuffer() actually succeed.
calling the haltfunc. If an interrupt is triggered by the init
or halt func, the IFF_UP flag must be set in order for us to be able
to service it.
In kern_ndis.c: implement a handler for NdisMSendResourcesAvailable()
(currently does nothing since we don't really need it).
In subr_ndis.c:
- Correct ndis_init_string() and ndis_unicode_to_ansi(),
which were both horribly broken.
- Implement NdisImmediateReadPciSlotInformation() and
NdisImmediateWritePciSlotInformation().
- Implement NdisBufferLength().
- Work around my first confirmed NDIS driver bug.
The SMC 9462 gigE driver (natsemi 83820-based copper)
incorrectly creates a spinlock in its DriverEntry()
routine and then destroys it in its MiniportHalt()
handler. This is wrong: spinlocks should be created
in MiniportInit(). In a Windows environment, this is
often not a problem because DriverEntry()/MiniportInit()
are called once when the system boots and MiniportHalt()
or the shutdown handler is called when the system halts.
With this stuff in place, this driver now seems to work:
ndis0: <SMC EZ Card 1000> port 0xe000-0xe0ff mem 0xda000000-0xda000fff irq 10 at device 9.0 on pci0
ndis0: assign PCI resources...
ndis_open_file("FLASH9.hex", 18446744073709551615)
ndis0: Ethernet address: 00:04:e2:0e:d3:f0
subr_ndis.c: implement NdisDprAllocatePacket() and NdisDprFreePacket()
(which are aliased to NdisAllocatePacket() and NdisFreePacket()), and
bump the value we return in ndis_mapreg_cnt() to something ridiculously
large, since some drivers apparently expect to be able to allocate
way more than just 64.
These changes allow the Level 1 1000baseSX driver to work for
the following card:
ndis0: <SMC TigerCard 1000 Adapter> port 0xe000-0xe0ff mem 0xda004000-0xda0043ff irq 10 at device 9.0 on pci0
ndis0: Ethernet address: 00:e0:29:6f:cc:04
This is already supported by the lge(4) driver, but I decided
to take a try at making the Windows driver that came with it work too,
since I still had the floppy diskette for it lying around.
the NTx86 section decoration).
subr_ndis.c: correct the behavior of ndis_query_resources(): if the
caller doesn't provide enough space to return the resources, tell it
how much it needs to provide and return an error.
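A minimal sketch of the size-negotiation behavior described above, not the actual ndis_query_resources() code. The names sketch_query_resources() and SKETCH_ERR_TOO_SHORT are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_OK		0
#define SKETCH_ERR_TOO_SHORT	1	/* hypothetical error code */

/*
 * Copy 'reslen' bytes of resource data into the caller's buffer.
 * On entry *buflen holds the size of the caller's buffer. If that
 * is too small, store the required size in *buflen and fail
 * without copying anything.
 */
static int
sketch_query_resources(void *buf, size_t *buflen, const void *res,
    size_t reslen)
{
	assert(buflen != NULL && res != NULL);
	if (buf == NULL || *buflen < reslen) {
		*buflen = reslen;	/* tell the caller how much to provide */
		return (SKETCH_ERR_TOO_SHORT);
	}
	memcpy(buf, res, reslen);
	*buflen = reslen;
	return (SKETCH_OK);
}
```

A caller would typically retry with a buffer of the reported size after the first call fails.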
subr_hal.c & subr_ntoskrnl.c: implement/stub a bunch of new routines;
ntoskrnl:
KefAcquireSpinLockAtDpcLevel
KefReleaseSpinLockFromDpcLevel
MmMapLockedPages
InterlockedDecrement
InterlockedIncrement
IoFreeMdl
KeInitializeSpinLock
HAL:
KfReleaseSpinLock
KeGetCurrentIrql
KfAcquireSpinLock
Lastly, correct spelling of "_aullshr" in the ntoskrnl functable.
copyrights to the inf parser files.
Add a -n flag to ndiscvt to allow the user to override the default
device name of NDIS devices. Instead of "ndis0, ndis1, etc..."
you can have "foo0, foo1, etc..." This allows you to have more than
one kind of NDIS device in the kernel at the same time.
Convert from printf() to device_printf() in if_ndis.c, kern_ndis.c
and subr_ndis.c.
Create UMA zones for ndis_packet and ndis_buffer structs allocated
on transmit. The zones are created and destroyed in the modevent
handler in kern_ndis.c.
printf() and UMA changes submitted by green@freebsd.org
peter and jhb: use __volatile__ to prevent gcc from possibly reordering
code, use a null inline instruction instead of a no-op movl (I would
have done this myself if I knew it was allowed) and combine two register
assignments into a single asm statement.
- if_ndis.c: set the NDIS_STATUS_PENDING flag on all outgoing packets
in ndis_start(), make the resource allocation code a little smarter
about how it selects the altmem range, correct a lock order reversal
in ndis_tick().
ndis_var.h
- In kern_ndis.c:ndis_send_packets(), avoid dereferencing NULL pointers
created when the driver's send routine immediately calls the txeof
handler (which releases the packets for us anyway).
- In if_ndis.c:ndis_80211_setstate(), implement WEP support.
method with something a little more intelligent: use BUS_GET_RESOURCE_LIST()
to run through all resources allocated to us and map them as needed. This
way we know exactly what resources need to be mapped and what their RIDs
are without having to guess. This simplifies both ndis_attach() and
ndis_convert_res(), and eliminates the unfriendly "ndisX: couldn't map
<foo>" messages that are sometimes emitted during driver load.
nb_size field in an ndis_buffer is meant to represent, but it does not
represent the original allocation size, so the sanity check doesn't
make any sense now that we're using the Windows-mandated initialization
method.
Among other things, this makes the following card work with the
NDISulator:
ndis0: <NETGEAR PA301 Phoneline10X PCI Adapter> mem 0xda004000-0xda004fff irq 10 at device 9.0 on pci0
This is that notoriously undocumented 10Mbps HomePNA Broadcom chipset
that people wanted support for many moons ago. Sadly, the only other
HomePNA NIC I have handy is a 1Mbps device, so I can't actually do
any 10Mbps performance tests, but it talks to my 1Mbps ADMtek card
just fine.
For received packets, a status of NDIS_STATUS_RESOURCES means we need
to copy the packet data and return the ndis_packet to the driver immediately.
NDIS_STATUS_SUCCESS means we get to hold onto the packet, but we have
to set the status to NDIS_STATUS_PENDING so the driver knows we're
going to hang onto it for a while.
For transmit packets, NDIS_STATUS_PENDING means the driver will
asynchronously return the packet to us via the ndis_txeof() routine,
and NDIS_STATUS_SUCCESS means the driver sent the frame, and NDIS
(i.e. the OS) retains ownership of the packet and can free it
right away.
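The ownership rules above can be sketched as two small decision functions. This is an illustration only: the enum values and function names are hypothetical stand-ins, not the real NDIS_STATUS_* constants or if_ndis code, and treating every receive status other than RESOURCES like SUCCESS is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the NDIS_STATUS_* values named above. */
enum sketch_status { SKETCH_SUCCESS, SKETCH_PENDING, SKETCH_RESOURCES };

enum sketch_action {
	SKETCH_COPY_AND_RETURN,	/* copy data, hand the packet back now */
	SKETCH_HOLD,		/* keep the packet, return it later */
	SKETCH_WAIT_TXEOF,	/* driver returns the packet via txeof */
	SKETCH_FREE_NOW		/* frame was sent, packet can be freed */
};

/* Receive side: decide what to do and update the packet status. */
static enum sketch_action
sketch_rx_action(enum sketch_status *status)
{
	assert(status != NULL);
	if (*status == SKETCH_RESOURCES)
		return (SKETCH_COPY_AND_RETURN);
	/* Assumption: anything else is handled like SUCCESS. */
	*status = SKETCH_PENDING;	/* tell the driver we are holding it */
	return (SKETCH_HOLD);
}

/* Transmit side: who gives the packet back? */
static enum sketch_action
sketch_tx_action(enum sketch_status status)
{
	return (status == SKETCH_PENDING ?
	    SKETCH_WAIT_TXEOF : SKETCH_FREE_NOW);
}
```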
evaluate them. Whatever they're meant to do, they're doing it wrong.
Also:
- Clean up last bits of NULL fallout in subr_pe
- Don't let ndis_ifmedia_sts() do anything if the IFF_UP flag isn't set
- Implement NdisSystemProcessorCount() and NdisQueryMapRegisterCount().
packet being freed has NDIS_STATUS_PENDING in the status field of
the OOB data. Finish implementing the "alternative" packet-releasing
function so it doesn't crash.
For those that are curious about ndis0: <ORiNOCO 802.11abg ComboCard Gold>:
1123 packets transmitted, 1120 packets received, 0% packet loss
round-trip min/avg/max/stddev = 3.837/6.146/13.919/1.925 ms
Not bad!
The log message for rev.1.160 of kern/uipc_syscalls.c and associated
changes only claimed to add restrict qualifiers (which have no effect in
the kernel so they probably shouldn't be added), but the following
interface changes were also made:
- caddr_t to `void *' and `struct sockaddr_t *'
- `int *' to `socklen_t *'.
These interface changes are not quite null, and this fix is quick (like
the changes in uipc_syscalls 1.160) because it uses bogus casts instead
of complete bounds-checked conversions.
Things should be fixed better when the conversions can be done without
using the stack gap. linux_check_hdrincl() already uses the stack gap
and is fixed completely though the type mismatches in it were not fatal
(there were only fatal type mismatches from unopaquing pointers to
[o]sockaddr't's -- the difference between accept()'s args and oaccept()'s
args is now non-opaque, but this is not reflected in their args structs).
mbuf<->packet housekeeping. Instead, add a couple of extra fields
to the end of ndis_packet. These should be invisible to the Windows
driver module.
This also lets me get rid of a little bit of evil from ndis_ptom()
(frobbing of the ext_buf field instead of relying on the MEXTADD()
macro).
- Fix ndis_time().
- Implement NdisGetSystemUpTime().
- Implement RtlCopyUnicodeString() and RtlUnicodeStringToAnsiString().
- In ndis_getstate_80211(), use sc->ndis_link to determine connect
status.
Submitted by: Brian Feldman <green@freebsd.org>
- Add explicit cardbus attachment in if_ndis.c
- Clean up after moving bus_setup_intr() in ndis_attach().
- When setting an ssid, program an empty ssid as a 1-byte string
with a single 0 byte. The Microsoft documentation says this is
how you're supposed to tell the NIC to attach to 'any' ssid.
- Keep track of callout handles for timers externally from the
ndis_miniport_timer structs, and run through and clobber them
all after invoking the haltfunc just in case the driver left one
running. (We need to make sure all timers are cancelled on driver
unload.)
- Handle the 'cancelled' argument in ndis_cancel_timer() correctly.
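The 'any' ssid convention from the list above (an empty ssid is programmed as a 1-byte string holding a single 0 byte) could look roughly like this. The struct layout, the 32-byte limit and the function name are assumptions made for the sketch, not the real NDIS definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_SSID_MAX	32	/* assumed maximum ssid length */

/* Hypothetical ssid container: length followed by the bytes. */
struct sketch_ssid {
	uint32_t	len;
	uint8_t		bytes[SKETCH_SSID_MAX];
};

static void
sketch_set_ssid(struct sketch_ssid *s, const uint8_t *name, uint32_t len)
{
	assert(s != NULL && len <= SKETCH_SSID_MAX);
	memset(s, 0, sizeof(*s));
	if (len == 0) {
		/* 'any' ssid: length 1, single byte with value 0 */
		s->len = 1;
		return;
	}
	memcpy(s->bytes, name, len);
	s->len = len;
}
```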
NDIS_80211_NET_INFRA_BSS: I accidentally reversed them during
transcription from the Microsoft headers. Note that the
driver will default to BSS mode, and you need to specify
'mediaopt adhoc' to get it into IBSS mode.
supposed to be opaque to the driver, however it is exposed through
several macros which expect certain behavior. In my original
implementation, I used the mappedsystemva member of the structure
to hold a pointer to the buffer and bytecount to hold the length.
It turns out you must use the startva pointer to point to the
page containing the start of the buffer and set byteoffset to
the offset within the page where the buffer starts. So, for a buffer
with address 'baseva,' startva is baseva & ~(PAGE_SIZE -1) and
byteoffset is baseva & (PAGE_SIZE -1). We have to maintain this
convention everywhere that ndis_buffers are used.
Fortunately, Microsoft defines some macros for initializing and
manipulating NDIS_BUFFER structures in ntddk.h. I adapted some
of them for use here and used them where appropriate.
This fixes the discrepancy I observed between how RX'ed packet sizes
were being reported in the Broadcom wireless driver and the sample
ethernet drivers that I've tested. This should also help the
Intel Centrino wireless driver work.
Also try to properly initialize the 802.11 BSS and IBSS channels.
(Sadly, the channel value is meaningless since there's no way
in the existing NDIS API to get/set the channel, but this should
take care of any 'invalid channel (NULL)' messages printed on
the console.)
In NdisQueryBuffer() and NdisQueryBufferSafe(), the vaddr argument is
optional, so test it before trying to dereference it.
Also correct NdisGetFirstBufferFromPacket()/NdisGetFirstBufferFromPacketSafe():
we need to use nb_mappedsystemva from the buffer, not nb_systemva.
routines: NdisUnchainBufferAtBack(), NdisGetFirstBufferFromPacketSafe()
and NdisGetFirstBufferFromPacket(). This should bring us a little
closer to getting the Intel centrino wireless NIC to work.
Note: I have not actually tested these additions since I don't
have a driver that calls them, however they're pretty simple, and
one of them is taken pretty much directly from the Windows ndis.h
header file, so I'm fairly confident they work, but disclaimers
apply.
- Make ndis_get_info()/ndis_set_info() sleep on the setdone/getdone
routines if they get back NDIS_STATUS_PENDING.
- Add a bunch of net80211 support so that 802.11 cards can be twiddled
with ifconfig. This still needs more work and is not guaranteed to
work for everyone. It works on my 802.11b/g card anyway.
The problem here is that Microsoft doesn't provide a good way to a) learn
all the rates that a card supports (if it has more than 8, you're
kinda hosed) or b) distinguish between 802.11b, 802.11b/g and
802.11a/b/g cards, so you sort of have to guess.
Setting the SSID and switching between infrastructure/adhoc modes
should work. WEP still needs to be implemented. I can't find any API
for getting/setting the channel other than the registry/sysctl keys.
definitions for more than one device (usually differentiated by
the PCI subvendor/subdevice ID). Each device also has its own tree
of registry keys. In some cases, each device has the same keys, but
sometimes each device has a unique tree but with overlap. Originally,
I just had ndiscvt(8) dump out all the keys it could find, and we
would try to apply them to every device we could find. Now, each key
has an index number that matches it to a device in the device ID list.
This lets us create just the keys that apply to a particular device.
I also added an extra field to the device list to hold the subvendor
and subdevice ID.
Some devices are generic, i.e. there is no subsystem definition. If
we have a device that doesn't match a specific subsystem value and
we have a generic entry, we use the generic entry.
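A rough sketch of the matching rule above: prefer an entry whose subsystem value matches, otherwise fall back to the generic entry. The table layout, the use of 0 to mark a generic entry and all names here are assumptions for illustration; the real device list is generated by ndiscvt(8).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SUBSYS_GENERIC 0u  /* assumption: 0 means "no subsystem" */

/* Hypothetical device ID list entry. */
struct sketch_devid {
	uint16_t	vendor;
	uint16_t	device;
	uint32_t	subsys;		/* subvendor/subdevice, or generic */
	const char	*name;
};

static const struct sketch_devid *
sketch_match(const struct sketch_devid *tbl, size_t n, uint16_t vendor,
    uint16_t device, uint32_t subsys)
{
	const struct sketch_devid *generic = NULL;
	size_t i;

	assert(tbl != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (tbl[i].vendor != vendor || tbl[i].device != device)
			continue;
		if (tbl[i].subsys == subsys)
			return (&tbl[i]);	/* specific match wins */
		if (tbl[i].subsys == SKETCH_SUBSYS_GENERIC && generic == NULL)
			generic = &tbl[i];	/* remember the fallback */
	}
	return (generic);	/* NULL if nothing matched at all */
}
```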
make it more robust. This should fix problems with crashes under
heavy traffic loads that have been reported. Also add a 'query done'
callback handler to satisfy the e100bex.sys sample Intel driver.
NdisAnsiStringToUnicodeString(), NdisWriteConfiguration().
Also add stubs for NdisMGetDeviceProperty(), NdisTerminateWrapper(),
NdisOpenConfigurationKeyByName() and NdisOpenConfigurationKeyByIndex().
- fix ndis_time() so that it returns a time based on the proper
epoch (wacky though it may be)
- implement NdisInitializeString() and NdisFreeString(), and add
stub for NdisMRemoveMiniport()
ntoskrnl_var.h:
- add missing member to the general_lookaside struct (gl_listentry)
subr_ntoskrnl.c:
- Fix arguments to the interlocked push/pop routines: 'head' is an
slist_header *, not an slist_entry *
- Kludge up _fastcall support for the push/pop routines. The _fastcall
convention is similar to _stdcall, except the first two available
DWORD-sized arguments are passed in %ecx and %edx, respectively.
One kludge for this is __attribute__ ((regparm(3))), however this
isn't entirely right, as it assumes %eax, %ecx and %edx will be
used (regparm(2) assumes %eax and %edx). Another kludge is to
declare the two fastcall-ed args as local register variables and
explicitly assign them to %ecx and %edx, but experimentation showed
that gcc would not guard %ecx and %edx against being clobbered.
Thus, I came up with a 3rd kludge, which is to use some inline
assembly of the form:
void *arg1;
void *arg2;
__asm__("movl %%ecx, %%ecx" : "=c" (arg1));
__asm__("movl %%edx, %%edx" : "=d" (arg2));
This lets gcc know that we're going to reference %ecx and %edx and
that it should make an effort not to let it get trampled. This wastes
an instruction (movl %reg, %reg is a no-op) but ensures proper
behavior. It's possible there's a better way to do this though:
this is the first time I've used inline assembler in this fashion.
The above fixes to ntoskrnl_var.h and subr_ntoskrnl.c make lookaside
lists work for the two drivers I have that use them, one of which
is an NDIS 5.0 miniport and another which is 5.1.
subr_ndis.c: NdisGetCurrentSystemTime() which, according to the
Microsoft documentation returns "the number of 100 nanosecond
intervals since January 1, 1601." I have no idea what's so special
about that epoch or why they chose 100 nanosecond ticks. I don't
know the proper offset to convert nanotime() from the UNIX epoch
to January 1, 1601, so for now I'm just doing the unit conversion
to 100s of nanoseconds.
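For reference, a sketch of the full conversion, including the epoch offset that the text above leaves open. The constant is the number of seconds between January 1, 1601 and January 1, 1970 (134774 days, or 11644473600 seconds). The function name is hypothetical, and the sketch takes plain seconds/nanoseconds rather than calling nanotime().

```c
#include <assert.h>
#include <stdint.h>

/* Seconds from 1601-01-01 to 1970-01-01: 134774 days * 86400. */
#define SKETCH_EPOCH_DELTA_SECS	11644473600ULL
#define SKETCH_TICKS_PER_SEC	10000000ULL	/* 100ns intervals */

/*
 * Convert a UNIX-epoch time (seconds plus nanoseconds) into the
 * number of 100 nanosecond intervals since January 1, 1601.
 */
static uint64_t
sketch_unix_to_nt_ticks(uint64_t secs, uint32_t nsecs)
{
	assert(nsecs < 1000000000u);
	return ((secs + SKETCH_EPOCH_DELTA_SECS) * SKETCH_TICKS_PER_SEC +
	    nsecs / 100);
}
```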
subr_ntoskrnl.c: memcpy(), memset(), ExInterlockedPopEntrySList(),
ExInterlockedPushEntrySList().
The latter two are different from InterlockedPopEntrySList()
and InterlockedPushEntrySList() in that they accept a spinlock to
hold while executing, whereas the non-Ex routines use a lock
internal to ntoskrnl. I also modified ExInitializePagedLookasideList()
and ExInitializeNPagedLookasideList() to initialize mutex locks
within the lookaside structures. It seems that in NDIS 5.0,
the lookaside allocate/free routines use ExInterlockedPopEntrySList()
and ExInterlockedPushEntrySList(), which require the use of the
per-lookaside spinlock, whereas in NDIS 5.1, the per-lookaside
spinlock is deprecated. We need to support both cases.
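The difference between the two flavors can be sketched like this: the Ex variants take the lock to hold as an argument, while the plain variants fall back to one lock private to the implementation. This is a userland illustration with a C11 atomic_flag standing in for a spinlock; all names are hypothetical and none of this is the subr_ntoskrnl.c code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct sketch_entry { struct sketch_entry *next; };
struct sketch_head  { struct sketch_entry *first; };

/* Stand-in for the lock internal to ntoskrnl. */
static atomic_flag sketch_internal_lock = ATOMIC_FLAG_INIT;

static void
sketch_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set(l))
		;	/* spin until the flag was clear */
}

static void
sketch_unlock(atomic_flag *l)
{
	atomic_flag_clear(l);
}

/* "Ex" flavor: caller supplies the lock. Returns the old first entry. */
static struct sketch_entry *
sketch_ex_push(struct sketch_head *h, struct sketch_entry *e,
    atomic_flag *lock)
{
	struct sketch_entry *old;

	assert(h != NULL && e != NULL && lock != NULL);
	sketch_lock(lock);
	old = h->first;
	e->next = old;
	h->first = e;
	sketch_unlock(lock);
	return (old);
}

/* "Ex" flavor pop: returns NULL when the list is empty. */
static struct sketch_entry *
sketch_ex_pop(struct sketch_head *h, atomic_flag *lock)
{
	struct sketch_entry *e;

	assert(h != NULL && lock != NULL);
	sketch_lock(lock);
	e = h->first;
	if (e != NULL)
		h->first = e->next;
	sketch_unlock(lock);
	return (e);
}

/* Non-Ex flavor: same operations, but using the internal lock. */
static struct sketch_entry *
sketch_push(struct sketch_head *h, struct sketch_entry *e)
{
	return (sketch_ex_push(h, e, &sketch_internal_lock));
}

static struct sketch_entry *
sketch_pop(struct sketch_head *h)
{
	return (sketch_ex_pop(h, &sketch_internal_lock));
}
```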
Note that I appear to be doing something wrong with
ExInterlockedPopEntrySList() and ExInterlockedPushEntrySList():
they don't appear to obtain proper pointers to their arguments,
so I'm probably doing something wrong in terms of their calling
convention (they're declared to be FASTCALL in Windows, and I'm
not sure what that means for gcc). It happens that in my stub
lookaside implementation, they don't need to do any work anyway,
so for now I've hacked them to always return NULL, which avoids
corrupting the stack. I need to do this right though.
it's an error to set the buffer bytecount to anything larger than
the buffer's original allocation size, but anything less than that
is ok.
Also, in ndis_ptom(), use the same logic: if the bytecount is
larger than the allocation size, consider the bytecount invalid
and the allocation size as the packet fragment length (m_len)
instead of the bytecount.
This corrects a consistency problem between the Broadcom wireless
driver and some of the ethernet drivers I've tested: the ethernet
drivers all report the packet frag sizes in buf->nb_bytecount, but
the Broadcom wireless driver reports them in buf->nb_size. This
seems like a bug to me, but it clearly must work in Windows, so
we have to deal with it here too.
is provided to NDIS via the miniport characteristics structure
supplied in the call to NdisMRegisterMiniport(). But in NDIS 5.0
and earlier, you had to call NdisMRegisterAdapterShutdownHandler()
and supply both a function pointer and context pointer.
We try to handle both cases in ndis_shutdown_nic(). If the
driver registered a shutdown routine and a context, then use
that context, otherwise pass it the adapter context from
NdisMSetAttributesEx().
This fixes a panic on shutdown with the sample Intel 82559 e100bex.sys
driver from the Windows DDK.
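The context selection described above for ndis_shutdown_nic() might be sketched like this. The struct, its field names and the example handler are hypothetical; only the rule itself (registered context first, adapter context otherwise) comes from the text.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*sketch_shutdown_fn)(void *ctx);

/* Hypothetical per-adapter state relevant to shutdown. */
struct sketch_adapter {
	sketch_shutdown_fn	shutdown_func;	/* may be NULL */
	void			*shutdown_ctx;	/* set by the older-style call */
	void			*adapter_ctx;	/* from the attributes call */
};

/* Example handler for illustration: counts calls through 'ctx'. */
static void
sketch_count_handler(void *ctx)
{
	(*(int *)ctx)++;
}

/* Call the shutdown handler and return the context that was used. */
static void *
sketch_shutdown_nic(struct sketch_adapter *sc)
{
	void *ctx;

	assert(sc != NULL);
	if (sc->shutdown_func == NULL)
		return (NULL);
	/* Prefer the explicitly registered context if there is one. */
	ctx = sc->shutdown_ctx != NULL ? sc->shutdown_ctx : sc->adapter_ctx;
	sc->shutdown_func(ctx);
	return (ctx);
}
```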
function pointer
Yes, it's what you think it is. Yes, you should run away now.
This is a special compatibility module for allowing Windows NDIS
miniport network drivers to be used with FreeBSD/x86. This provides
_binary_ NDIS compatibility (not source): you can run NDIS driver
code, but you can't build it. There are three main parts:
sys/compat/ndis: the NDIS compat API, which provides binary
compatibility functions for many routines in NDIS.SYS, HAL.dll
and ntoskrnl.exe in Windows (these are the three modules that
most NDIS miniport drivers use). The compat module also contains
a small PE relocator/dynalinker which relocates the Windows .SYS
image and then patches in our native routines.
sys/dev/if_ndis: the if_ndis driver wrapper. This module makes
use of the ndis compat API and can be compiled with a specially
prepared binary image file (ndis_driver_data.h) containing the
Windows .SYS image and registry key information parsed out of the
accompanying .INF file. Once if_ndis.ko is built, it can be loaded
and unloaded just like a native FreeBSD kernel module.
usr.sbin/ndiscvt: a special utility that converts foo.sys and foo.inf
into an ndis_driver_data.h file that can be compiled into if_ndis.o.
Contains an .inf file parser graciously provided by Matt Dodd (and
mercilessly hacked upon by me) that strips out device ID info and
registry key info from a .INF file and packages it up with a binary
image array. The ndiscvt(8) utility also does some manipulation of
the segments within the .sys file to make life easier for the kernel
loader. (Doing the manipulation here saves the kernel code from having
to move things around later, which would waste memory.)
ndiscvt is only built for the i386 arch. Only files.i386 has been
updated, and none of this is turned on in GENERIC. It should probably
work on pc98. I have no idea about amd64 or ia64 at this point.
This is still a work in progress. I estimate it's about 85% done, but
I want it under CVS control so I can track subsequent changes. It has
been tested with exactly three drivers: the LinkSys LNE100TX v4 driver
(Lne100v4.sys), the sample Intel 82559 driver from the Windows DDK
(e100bex.sys) and the Broadcom BCM43xx wireless driver (bcmwl5.sys). It
still needs to have net80211 stuff added to it. To use it, you would
do something like this:
# cd /sys/modules/ndis
# make; make load
# cd /sys/modules/if_ndis
# ndiscvt -i /path/to/foo.inf -s /path/to/foo.sys -o ndis_driver_data.h
# make; make load
# sysctl -a | grep ndis
All registry keys are mapped to sysctl nodes. Sometimes drivers refer
to registry keys that aren't mentioned in foo.inf. If this happens,
the NDIS API module creates sysctl nodes for these keys on the fly so
you can tweak them.
An example usage of the Broadcom wireless driver would be:
# sysctl hw.ndis0.EnableAutoConnect=1
# sysctl hw.ndis0.SSID="MY_SSID"
# sysctl hw.ndis0.NetworkType=0 (0 for bss, 1 for adhoc)
# ifconfig ndis0 <my ipaddr> netmask 0xffffff00 up
Things to be done:
- get rid of debug messages
- add in ndis80211 support
- defer transmissions until after a status update with
NDIS_STATUS_CONNECTED occurs
- Create smarter lookaside list support
- Split off if_ndis_pci.c and if_ndis_pccard.c attachments
- Make sure PCMCIA support works
- Fix ndiscvt to properly parse PCMCIA device IDs from INF files
- write ndisapi.9 man page
with the sendsig code in the MD area. It is not safe to assume that all
the register conventions will be the same. Also, the way of producing
32 bit code (.code32 directives) in this file is amd64 specific.
The split-up code is derived from the ia64 code originally.
Note that I have only compile-tested this, not actually run-tested it.
The ia64 side of the force is missing some significant chunks of signal
delivery code.
purpose and the resulting vattr structure was ignored. In addition,
the VOP_GETATTR call was made with no vnode lock held, resulting in
vnode locking violation panic with debug kernels.
Reported by: truckman
Approved by: re@ (rwatson)
- improve sysinfo(2) syscall;
- add dummy fadvise64(2) syscall;
- add dummy *xattr(2) family of syscalls;
- add protos for the syscalls 222-225, 238-249 and 253-267;
- add exit_group(2) syscall, which is currently just wired to exit(2).
Obtained from: OpenBSD
MFC after: 2 weeks
is highly MD in an emulation environment since it operates on the host
environment. Although the setregs functions are really for exec support
rather than signals, they deal with the same sorts of context and include
files. So I put it there rather than create yet another file.
1.36 +73 -60 src/sys/compat/linux/linux_ipc.c
1.83 +102 -48 src/sys/kern/sysv_shm.c
1.8 +4 -0 src/sys/sys/syscallsubr.h
That change was intended to support vmware3, but the
wantrem parameter is useless because vmware3 uses SYSV shared memory
to talk to the X server, and the X server is a native application.
The patch worked because the check for wantrem was not valid
(wantrem and SHMSEG_REMOVED were never checked for SHMSEG_ALLOCATED segments).
Add kern.ipc.shm_allow_removed (integer, rw) sysctl (default 0) which, when
set to 1, allows shm_find_segment_by_shmid() and
shm_find_segment_by_shmidx() to return removed segments.
MFC after: 1 week
if_xname, if_dname, and if_dunit. if_xname is the name of the interface
and if_dname/unit are the driver name and instance.
This change paves the way for interface renaming and enhanced pseudo
device creation and configuration semantics.
Approved By: re (in principle)
Reviewed By: njl, imp
Tested On: i386, amd64, sparc64
Obtained From: NetBSD (if_xname)
structures come out the right size.
Fix the ones that broke. stat32 had some missing fields from the end
and statfs32 was broken due to the strange definition of MNAMELEN
(which is dependent on sizeof(long))
I'm not sure if this fixes any actual problems or not.
- Return NULL instead of returning memory outside of the stackgap
in stackgap_alloc() (FreeBSD-SA-00:42.linux)
- Check for stackgap_alloc() returning NULL in svr4_emul_find(),
and clean_pipe().
- Avoid integer overflow on large nfds argument in svr4_sys_poll()
- Reject negative nbytes argument in svr4_sys_getdents()
- Don't copy out past the end of the struct componentname
pathname buffer in svr4_sys_resolvepath()
- Reject out-of-range signal numbers in svr4_sys_sigaction(),
svr4_sys_signal(), and svr4_sys_kill().
- Don't malloc() user-specified lengths in show_ioc() and
show_strbuf(), place arbitrary limits instead.
- Range-check lengths in si_listen(), ti_getinfo(), ti_bind(),
svr4_do_putmsg(), svr4_do_getmsg(), svr4_stream_ti_ioctl().
Some fixes obtained from OpenBSD.
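As an illustration of the nfds overflow item in the list above, a multiplication of a user-supplied count by an element size can be guarded like this. This is a generic sketch with hypothetical names, not the code from svr4_sys_poll().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compute nelem * elemsize into *out. Returns 0 on success and -1
 * if the product would not fit in a size_t, in which case *out is
 * left untouched.
 */
static int
sketch_array_bytes(size_t nelem, size_t elemsize, size_t *out)
{
	assert(out != NULL);
	if (elemsize != 0 && nelem > SIZE_MAX / elemsize)
		return (-1);	/* would overflow: reject the request */
	*out = nelem * elemsize;
	return (0);
}
```

The check has to happen before the multiplication, since the wrapped result of an overflowing product can look like a perfectly reasonable small size.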
- Allocate storage for uap->msg always because it is copyin()'ed in
native sendmsg().
- Convert sockopt level from Linux to FreeBSD after native recvmsg() calling.
- Some cleanups.
Tested with: Oracle 9i shared server connection mode.
MFC after: 1 week
systems where the data/stack/etc limits are too big for a 32 bit process.
Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c.
Supply an ia32_fixlimits function. Export the clip/default values to
sysctl under the compat.ia32 hierarchy.
Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max
value rather than the sysctl tweakable variable. This allows mmap to
place mappings at sensible locations when limits have been reduced.
Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same
method as mmap(0, ...) now does.
Note that we cannot remove all references to the sysctl tweakable
maxdsiz etc variables because /etc/login.conf specifies a datasize
of 'unlimited'. And that causes exec etc to fail since it can no
longer find space to mmap things.
with 64-bit longs again. This was fixed in rev.1.42 but the fix
rotted non-fatally in rev.1.105 and fatally in rev.1.137.
Many more non-egregious casts are strictly required for conversions
from semi-opaque types to pointers, but we avoid most of them by using
types that are almost certain to be compatible with uintptr_t for
representing pointers (e.g., vm_offset_t). Here we don't really want
the u_longs, but we have them because a.out.h and its support code
doesn't use typedefs (it uses unsigned in V7 and unsigned long in
FreeBSD) and is too obsolete to fix now.
from the ia32 specific stuff. Some of this still needs to move to the MI
freebsd32 area, and some needs to move to the MD area. This is still
work-in-progress.
like we have on other platforms. Move savectx() to <machine/pcb.h>.
A lot of files got these MD prototypes through the indirect inclusion
of <machine/cpu.h> and now need to include <machine/md_var.h>. The
number of which is unexpectedly large...
osf1_misc.c especially is tricky because szsigcode is redefined in
one of the osf1 header files. Reordering of the include files was
needed.
linprocfs.c now needs an explicit extern declaration.
Tested with: LINT
- cut the version string at the newline, suppressing information about
who built the kernel and in what directory. Most of this information
was already lost to truncation.
- on i386, return the precise CPU class (if known) rather than just
"i386". Linux software which uses this information to select
which binary to run often does not know what to make of "i386".
contain the filedescriptor number on opens from userland.
The index is used rather than a "struct file *" since it conveys a bit
more information, which may be useful, in particular to fdescfs and /dev/fd/*
For now pass -1 all over the place.
paging space and how much of it is in use (in pages).
Use this interface from the Linuxolator instead of groping around in the
internals of the swap_pager.
the VMIN and VTIME members of the c_cc array. These members are not
special control characters. By not excluding these members we
changed the noncanonical mode input processing when both members
were 0 on entry (=LINUX_POSIX_VDISABLE) as we would remap them to 255
(=_POSIX_VDISABLE). See termios(4) case A for how that screws up
your terminal I/O.
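A sketch of the translation rule above: a disabled control character (0 on the Linux side) is remapped to 255, except in the VMIN and VTIME slots, where 0 is an ordinary value. The array size and slot indices are made-up values, and the sketch assumes both sides use the same slot layout, which the real translation code does not.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NCCS		8	/* hypothetical array size */
#define SKETCH_VMIN		5	/* hypothetical slot index */
#define SKETCH_VTIME		6	/* hypothetical slot index */
#define SKETCH_LINUX_VDISABLE	0x00	/* LINUX_POSIX_VDISABLE per the text */
#define SKETCH_BSD_VDISABLE	0xff	/* _POSIX_VDISABLE per the text */

static void
sketch_translate_cc(const unsigned char *lcc, unsigned char *bcc)
{
	size_t i;

	assert(lcc != NULL && bcc != NULL);
	for (i = 0; i < SKETCH_NCCS; i++) {
		/* VMIN and VTIME are counts, not control characters. */
		if (i != SKETCH_VMIN && i != SKETCH_VTIME &&
		    lcc[i] == SKETCH_LINUX_VDISABLE)
			bcc[i] = SKETCH_BSD_VDISABLE;
		else
			bcc[i] = lcc[i];
	}
}
```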
PR: 23173
Originator: Bjarne Blichfeldt <bbl@dk.damgaard.com>
Patch by: Boris Nikolaus <bn@dali.tellique.de> (original submission)
Philipp Mergenthaler <philipp.mergenthaler@stud.uni-karlsruhe.de>
Reminders by: Joseph Holland King <gte743n@cad.gatech.edu>
MFC after: 5 days
Several of the subtypes have an associated vnode which is used for
stuff like the f*() functions.
By giving the vnode a separate field, a number of checks for the specific
subtype can be replaced simply with a check for f_vnode != NULL, and
we can later free f_data up to subtype specific use.
At this point in time, f_data still points to the vnode, so any code I
might have overlooked will still work.
having their stack at the 512GB mark. Give 4GB of user VM space for 32
bit apps. Note that this is significantly more than on i386 which gives
only about 2.9GB of user VM to a process (1GB for kernel, plus page
table pages which eat user VM space).
Approved by: re (blanket)
load_gs() calls into a single place that is less likely to go wrong.
Eliminate the per-process context switching of MSR_GSBASE, because it
should be constant for a single cpu. Instead, save/restore it during
the loading of the new %gs selector for the new process.
Approved by: re (amd64/* blanket)
stolen from the ia64/ia32 code (indeed there was a repocopy), but I've
redone the MD parts and added and fixed a few essential syscalls. It
is sufficient to run i386 binaries like /bin/ls, /usr/bin/id (dynamic)
and p4. The ia64 code has not implemented signal delivery, so I had
to do that.
Before you say it, yes, this does need to go in a common place. But
we're in a freeze at the moment and I didn't want to risk breaking ia64.
I will sort this out after the freeze so that the common code is in a
common place.
On the AMD64 side, this required adding segment selector context switch
support and some other support infrastructure. The %fs/%gs etc code
is hairy because loading %gs will clobber the kernel's current MSR_GSBASE
setting. The segment selectors are not used by the kernel, so they're only
changed at context switch time or when changing modes. This still needs
to be optimized.
Approved by: re (amd64/* blanket)
- Move struct sigacts out of the u-area and malloc() it using the
M_SUBPROC malloc bucket.
- Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(),
sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared().
- Remove the p_sigignore, p_sigacts, and p_sigcatch macros.
- Add a mutex to struct sigacts that protects all the members of the struct.
- Add sigacts locking.
- Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now
that sigacts is locked.
- Several in-kernel functions such as psignal(), tdsignal(), trapsignal(),
and thread_stopped() are now MP safe.
Reviewed by: arch@
Approved by: re (rwatson)
argument to the functions shm{at,ctl}1 and shm_find_segment_by_shmid{x}.
The BSD semantics didn't allow the usage of shared segment after
being marked for removal through IPC_RMID.
The patch involves the following functions:
- shmat
- shmctl
- shm_find_segment_by_shmid
- shm_find_segment_by_shmidx
- linux_shmat
- linux_shmctl
Submitted by: Orlando Bassotto <orlando.bassotto@ieo-research.it>
Reviewed by: marcel
do for newstat_copyout().
Lie about disk drives which are character devices
in FreeBSD but block devices under Linux.
PR: 37227
Submitted by: Vladimir B. Grebenschikov <vova@sw.ru>
Reviewed by: phk
MFC after: 2 weeks
FreeBSD flags instead of just adding one to the Linux flags. This should
be identical to the previous version except that I have at least one report
of this patch fixing problems people were having with Linux apps after my
last commit to this file. It is safer to use the switch than to make
assumptions about the flag values anyways, esp. since we currently use
MD defines for the values of the flags and this is MI code.
Tested by: Michael Class <michael_class@gmx.net>
kern_sigprocmask() in the various binary compatibility emulators.
- Replace calls to sigsuspend(), sigaltstack(), sigaction(), and
sigprocmask() that used the stackgap with calls to the corresponding
kern_sig*() functions instead without using the stackgap.
by allprison_mtx), a unique prison/jail identifier field, two path
fields (pr_path for reporting and pr_root vnode instance) to store
the chroot() point of each jail.
o Add jail_attach(2) to allow a process to bind to an existing jail.
o Add change_root() to perform the chroot operation on a specified
vnode.
o Generalize change_dir() to accept a vnode, and move namei() calls
to callers of change_dir().
o Add a new sysctl (security.jail.list) which is a group of
struct xprison instances that represent a snapshot of active jails.
Reviewed by: rwatson, tjr
a follow on commit to kern_sig.c
- signotify() now operates on a thread since unmasked pending signals are
stored in the thread.
- PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.
functions are now all basically identical except that alpha linux uses
Elf64 arguments and svr4 and i386 linux use Elf32. The fixups include
changing the first argument to be a register_t ** to match the prototype
for fixup functions, asserting that the process in the image_params struct
is always curproc and removing unnecessary locking to read credentials as a
result, and a few style fixes.
but I decided that it was important for this patch to not bit-rot, and
since it is mainly moving code around, the total amount of entropy is
epsilon /phk)
This is a patch to move the common parts of linux_getcwd() back into
kern/vfs_cache.c so that the standard FreeBSD libc getcwd() can use its
extended functionality. The linux syscall linux_getcwd() in
compat/linux/linux_getcwd.c has been rewritten to use it too. It should
be possible to simplify libc's getcwd() after this. No doubt this code
needs some cleaning up, since I've left in the sysctl variables I used
for debugging.
PR: 48169
Submitted by: James Whitwell <abacau@yahoo.com.au>
take a thread instead of a proc for their first argument.
- Add a mutex to protect the system-wide Linux osname, osrelease, and
oss_version variables.
- Change linux_get_prison() to take a thread instead of a proc for its
first argument and to use td_ucred rather than p_ucred. This is ok
because a thread's prison does not change even though its ucred might.
- Also, change linux_get_prison() to return a struct prison * instead of
a struct linux_prison * since it returns with the struct prison locked
and this makes it easier to safely unlock the prison when we are done
messing with it.
sched_lock around accesses to p_stats->p_timer[] to avoid a potential
race with hardclock. getitimer(), setitimer() and the realitexpire()
callout are now Giant-free.
so be more careful about calling stackgap_init.
Tested by: Fred Souza <fred@storming.org>
2) Linux_sendmsg was forgetting to fill out the bsd_args struct.
Reviewed by: ume
3) The args to linux_connect have differently named types on alpha and
i386, so add a cast to stop gcc complaining.
Spotted by: peter
pointer types, and remove a huge number of casts from code using it.
Change struct xfile xf_data to xun_data (ABI is still compatible).
If we need to add a #define for f_data and xf_data we can, but I don't
think it will be necessary. There are no operational changes in this
commit.
code, make the emulator use it.
Rename unsupported_msg() to unimplemented_syscall(). Rename some arguments
for clarity.
Fixup grammar.
Requested by: bde
the same as fcntl() except that it supports the new 64-bit file
locking commands (LINUX_F_GETLK64 etc) that use the `flock64'
structure. We had been interpreting all flock structures passed to
fcntl64() as `struct flock64' instead of only the ones from F_*64
commands.
The glibc in linux_base-7 uses fcntl64() by default, but the bug
was often non-fatal since the misinterpretation typically only
causes junk to appear in the `l_len' field and most junk values are
accepted as valid range lengths. The result is occasional EINVAL
errors from F_SETLK and a few bytes after the supplied `struct
flock' getting clobbered during F_GETLK.
PR: kern/37656
Reviewed by: marcel
Approved by: re
MFC after: 1 week
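The fix described above amounts to choosing the structure layout from the command rather than from the system call that was used. A minimal sketch of that dispatch follows. The struct layouts, the command numbers, and the function name are illustrative assumptions, not the emulator's actual definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative 32-bit and 64-bit lock layouts (assumed, not authoritative). */
struct lx_flock {
	short l_type, l_whence;
	int32_t l_start, l_len, l_pid;
};
struct lx_flock64 {
	short l_type, l_whence;
	int64_t l_start, l_len;
	int32_t l_pid;
};

/* Layout-independent form used after decoding. */
struct lock_range {
	int type, whence;
	int64_t start, len;
};

/* Illustrative command numbers. */
enum {
	LX_F_GETLK = 5, LX_F_SETLK = 6, LX_F_SETLKW = 7,
	LX_F_GETLK64 = 12, LX_F_SETLK64 = 13, LX_F_SETLKW64 = 14
};

/*
 * Decode the caller's lock argument. Only the *64 commands are read as
 * struct lx_flock64; the plain commands keep the 32-bit layout even when
 * they arrive through the 64-bit entry point. Returns 0, or -1 if cmd is
 * not a locking command.
 */
int
lx_decode_lock(int cmd, const void *arg, struct lock_range *out)
{
	switch (cmd) {
	case LX_F_GETLK: case LX_F_SETLK: case LX_F_SETLKW: {
		struct lx_flock fl;

		memcpy(&fl, arg, sizeof(fl));
		out->type = fl.l_type;
		out->whence = fl.l_whence;
		out->start = fl.l_start;
		out->len = fl.l_len;
		return (0);
	}
	case LX_F_GETLK64: case LX_F_SETLK64: case LX_F_SETLKW64: {
		struct lx_flock64 fl;

		memcpy(&fl, arg, sizeof(fl));
		out->type = fl.l_type;
		out->whence = fl.l_whence;
		out->start = fl.l_start;
		out->len = fl.l_len;
		return (0);
	}
	default:
		return (-1);
	}
}
```

Reading a 32-bit structure with the 64-bit layout picks up bytes beyond the intended field, which matches the junk `l_len' values and the clobbered trailing bytes reported above.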
(1) Permit userland applications to request a change of label atomic
with an execve() via mac_execve(). This is required for the
SEBSD port of SELinux/FLASK. Attempts to invoke this without
MAC compiled in result in ENOSYS, as with all other MAC system
calls. Complexity, if desired, is present in policy modules,
rather than the framework.
(2) Permit policies to have access to both the label of the vnode
being executed as well as the interpreter if it's a shell
script or related UNIX nonsense. Because we can't hold both
vnode locks at the same time, cache the interpreter label.
SEBSD relies on this because it supports secure transitioning
via shell script executables. Other policies might want to
take both labels into account during an integrity or
confidentiality decision at execve()-time.
Approved by: re
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
describes an image activation instance. Instead, make use of the
existing fname structure entry, and introduce two new entries,
userspace_argv, and userspace_envv. With the addition of
mac_execve(), this divorces the image structure from the specifics
of the execve() system call, removes a redundant pointer, etc.
No semantic change from current behavior, but it means that the
structure doesn't depend on syscalls.master-generated includes.
There seems to be some redundant initialization of imgact entries,
which I have maintained, but which could probably use some cleaning
up at some point.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
It is never used. I left it there from pre-KSE days as I didn't know
if I'd need it or not but now I know I don't. Its functionality
is in TDI_IWAIT in the thread.
This is for the not-quite-ready signal/fpu abi stuff. It may not see
the light of day, but I'm certainly not going to be able to validate it
when getting shot in the foot due to syscall number conflicts.
execve_secure() system call, which permits a process to pass in a label
for a label change during exec. This permits SELinux to change the
label for the resulting exec without a race following a manual label
change on the process. Because this interface uses our general purpose
MAC label abstraction, we call it execve_mac(), and wrap our port of
SELinux's execve_secure() around it with appropriate sid mappings.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
- add wrappers for mmap2(2) and ftruncate64(2) system calls;
- don't spam console with printf's when VFAT_READDIR_BOTH ioctl(2) is invoked;
- add support for SOUND_MIXER_READ_STEREODEVS ioctl(2);
- make msgctl(IPC_STAT) and IPC_SET actually work by converting from
BSD msqid_ds to Linux and vice versa;
- properly return EINVAL if semget(2) is called with nsems being negative.
Reviewed by: marcel
Approved by: marcel
Tested with: LSB runtime test
checks from the MAC tree: allow policies to perform access control
for the ability of a process to send and receive data via a socket.
At some point, we might also pass in additional address information
if an explicit address is requested on send.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
gets signals operating based on a TailQ, and is good enough to run X11,
GNOME, and do job control. There are some intricate parts which could be
more refined to match the sigset_t versions, but those require further
evaluation of directions in which our signal system can expand and contract
to fit our needs.
After this has been in the tree for a while, I will make in-kernel API
changes, most notably to trapsignal(9) and sendsig(9), to use ksiginfo
more robustly, such that we can actually pass information with our
(queued) signals to the userland. That will also result in using a
struct ksiginfo pointer, rather than a signal number, in a lot of
kern_sig.c, to refer to an individual pending signal queue member, but
right now there is no defined behaviour for such.
CODAFS is unfinished in this regard because the logic is unclear in
some places.
Sponsored by: New Gold Technology
Reviewed by: bde, tjr, jake [an older version, logic similar]
constants VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS, USRSTACK and PS_STRINGS.
This is mainly so that they can be variable even for the native abi, based
on different machine types. Get stack protections from the sysentvec too.
This makes it trivial to map the stack non-executable for certain abis, on
machines that support it.
NVIDIA API calls; more specifically, it adds an ioctl() handler for
the range of possible NVIDIA ioctl numbers.
Submitted by: Christian Zander <zander@minion.de>
available at module compile time. Do not #include the bogus
opt_kstack_pages.h at this point and instead refer to the variables that
are also exported via sysctl.
sysentvec. Initialized all fields of all sysentvecs, which will allow
them to be used instead of constants in more places. Provided stack
fixup routines for emulations that previously used the default.
linux_emul_find() that does not use stack gap storage but instead
always returns the resulting path in a malloc'd kernel buffer.
Implement linux_emul_find() in terms of this function. Also add
LCONVPATH* macros that wrap linux_emul_convpath in the same way
that the CHECKALT* macros wrap linux_emul_find().
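The idea of returning the converted path in an allocated buffer, rather than in stack gap storage, can be sketched in userland as follows. This is only an approximation under stated assumptions: the function name, the prefix argument, and the use of access(2) as the existence check are all hypothetical and do not reflect how the kernel performs the lookup.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Return a malloc'd path: prefix + path if that file exists, otherwise a
 * copy of path itself. The caller owns the buffer and must free() it.
 * Returns NULL only when allocation fails.
 */
char *
convpath_sketch(const char *prefix, const char *path)
{
	size_t altlen = strlen(prefix) + strlen(path) + 1;
	size_t pathlen = strlen(path) + 1;
	char *alt, *copy;

	alt = malloc(altlen);
	if (alt == NULL)
		return (NULL);
	snprintf(alt, altlen, "%s%s", prefix, path);
	if (access(alt, F_OK) == 0)
		return (alt);		/* alternate path exists, use it */
	free(alt);

	copy = malloc(pathlen);		/* fall back to the original path */
	if (copy == NULL)
		return (NULL);
	memcpy(copy, path, pathlen);
	return (copy);
}
```

Since the result always lives in a buffer the caller frees, the same code path works whether or not the alternate root contained the file.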
compat code. Clean up accounting for multiple segments. Part 1/2.
Submitted by: Andrey Alekseyev <uitm@zenon.net> (with some modifications)
MFC after: 3 days
accept an 'active_cred' argument reflecting the credential of the thread
initiating the ioctl operation.
- Change fo_ioctl() to accept active_cred; change consumers of the
fo_ioctl() interface to generally pass active_cred from td->td_ucred.
- In fifofs, initialize filetmp.f_cred to ap->a_cred so that the
invocations of soo_ioctl() are provided access to the calling f_cred.
Pass ap->a_td->td_ucred as the active_cred, but note that this is
required because we don't yet distinguish file_cred and active_cred
in invoking VOP's.
- Update kqueue_ioctl() for its new argument.
- Update pipe_ioctl() for its new argument, pass active_cred rather
than td_ucred to MAC for authorization.
- Update soo_ioctl() for its new argument.
- Update vn_ioctl() for its new argument, use active_cred rather than
td->td_ucred to authorize VOP_IOCTL() and the associated VOP_GETATTR().
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
fo_read() and fo_write(): explicitly use the cred argument to fo_poll()
as "active_cred" using the passed file descriptor's f_cred reference
to provide access to the file credential. Add an active_cred
argument to fo_stat() so that implementers have access to the active
credential as well as the file credential. Generally modify callers
of fo_stat() to pass in td->td_ucred rather than fp->f_cred, which
was redundantly provided via the fp argument. This set of modifications
also permits threads to perform these operations on behalf of another
thread without modifying their credential.
Trickle this change down into fo_stat/poll() implementations:
- badfo_poll(), badfo_stat(): modify/add arguments.
- kqueue_poll(), kqueue_stat(): modify arguments.
- pipe_poll(), pipe_stat(): modify/add arguments, pass active_cred to
MAC checks rather than td->td_ucred.
- soo_poll(), soo_stat(): modify/add arguments, pass fp->f_cred rather
than cred to pru_sopoll() to maintain current semantics.
- sopoll(): modify arguments.
- vn_poll(), vn_statfile(): modify/add arguments, pass new arguments
to vn_stat(). Pass active_cred to MAC and fp->f_cred to VOP_POLL()
to maintain current semantics.
- vn_close(): rename cred to file_cred to reflect reality while I'm here.
- vn_stat(): Add active_cred and file_cred arguments to vn_stat()
and consumers so that this distinction is maintained at the VFS
as well as 'struct file' layer. Pass active_cred instead of
td->td_ucred to MAC and to VOP_GETATTR() to maintain current semantics.
- fifofs: modify the creation of a "filetemp" so that the file
credential is properly initialized and can be used in the socket
code if desired. Pass ap->a_td->td_ucred as the active
credential to soo_poll(). If we teach the vnop interface about
the distinction between file and active credentials, we would use
the active credential here.
Note that current inconsistent passing of active_cred vs. file_cred to
VOP's is maintained. It's not clear why GETATTR would be authorized
using active_cred while POLL would be authorized using file_cred at
the file system level.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
make a series of modifications to the credential arguments relating
to file read and write operations to clarify which credential is
used for what:
- Change fo_read() and fo_write() to accept "active_cred" instead of
"cred", and change the semantics of consumers of fo_read() and
fo_write() to pass the active credential of the thread requesting
an operation rather than the cached file cred. The cached file
cred is still available in fo_read() and fo_write() consumers
via fp->f_cred. These changes are largely in sys_generic.c.
For each implementation of fo_read() and fo_write(), update cred
usage to reflect this change and maintain current semantics:
- badfo_readwrite() unchanged
- kqueue_read/write() unchanged
- pipe_read/write() now authorize MAC using active_cred rather
than td->td_ucred
- soo_read/write() unchanged
- vn_read/write() now authorize MAC using active_cred but
VOP_READ/WRITE() with fp->f_cred
Modify vn_rdwr() to accept two credential arguments instead of a
single credential: active_cred and file_cred. Use active_cred
for MAC authorization, and select a credential for use in
VOP_READ/WRITE() based on whether file_cred is NULL or not. If
file_cred is provided, authorize the VOP using that cred,
otherwise the active credential, matching current semantics.
Modify current vn_rdwr() consumers to pass a file_cred if used
in the context of a struct file, and to always pass active_cred.
When vn_rdwr() is used without a file_cred, pass NOCRED.
These changes should maintain current semantics for read/write,
but avoid a redundant passing of fp->f_cred, as well as making
it more clear what the origin of each credential is in file
descriptor read/write operations.
Follow-up commits will make similar changes to other file descriptor
operations, and modify the MAC framework to pass both credentials
to MAC policy modules so they can implement either semantic for
revocation.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
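The credential selection described for vn_rdwr() above can be sketched as a small pure function. The types and names below are hypothetical stand-ins, not the kernel's struct ucred or the real vn_rdwr() signature. NULL plays the role of NOCRED in this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a credential. */
struct cred_sketch {
	int uid;
};

/* Which credential is used for which check. */
struct cred_choice {
	const struct cred_sketch *mac_cred;	/* used for MAC authorization */
	const struct cred_sketch *vop_cred;	/* passed to the read/write VOP */
};

/*
 * MAC is always authorized with the active credential. The VOP uses the
 * file credential when one is supplied and falls back to the active
 * credential when file_cred is NULL.
 */
struct cred_choice
choose_rdwr_creds(const struct cred_sketch *active_cred,
    const struct cred_sketch *file_cred)
{
	struct cred_choice c;

	c.mac_cred = active_cred;
	c.vop_cred = (file_cred != NULL) ? file_cred : active_cred;
	return (c);
}
```

Keeping the two roles in separate fields makes the origin of each credential explicit at the call site, which is the stated goal of the change.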
SVR4 emulation relating to readdir() and fd_revoke(). All other
services appear to be implemented by simply wrapping existing
FreeBSD native system call implementations, so don't require local
instrumentation in the emulator module.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
struct mount is not cached as *mp at this point, so use
vp->v_mount directly, following the check that it's non-NULL.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
- v_vflag is protected by the vnode lock and is used when synchronization
with VOP calls is needed.
- v_iflag is protected by interlock and is used for dealing with vnode
management issues. These flags include X/O LOCK, FREE, DOOMED, etc.
- All accesses to v_iflag and v_vflag have either been locked or marked with
mp_fixme's.
- Many ASSERT_VOP_LOCKED calls have been added where the locking was not
clear.
- Many functions in vfs_subr.c were restructured to provide for stronger
locking.
Idea stolen from: BSD/OS
kernel access control.
Invoke appropriate MAC entry points for a number of VFS-related
operations in the Linux ABI module. In particular, handle uselib
in a manner similar to open() (more work is probably needed here),
as well as handle statfs(), and linux readdir()-like calls.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
handler in the kernel at the same time. Also, allow for the
exec_new_vmspace() code to build a different sized vmspace depending on
the executable environment. This is a big help for execing i386 binaries
on ia64. The ELF exec code grows the ability to map partial pages when
there is a page size difference, eg: emulating 4K pages on 8K or 16K
hardware pages.
Flesh out the i386 emulation support for ia64. At this point, the only
binary that I know of that fails is cvsup, because the cvsup runtime
tries to execute code in pages not marked executable.
Obtained from: dfr (mostly, many tweaks from me).
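The partial-page case mentioned above (4K emulated pages on 8K or 16K hardware pages) comes down to splitting a segment into the part that lines up with whole host pages and the leftover head and tail. The sketch below shows only that arithmetic. The struct and function names are hypothetical, and it makes no claim about how the ELF exec code actually handles the leftover pieces.

```c
#include <assert.h>
#include <stdint.h>

/* Result of splitting [start, start + len) on host page boundaries. */
struct page_split {
	uint64_t head_len;	/* bytes before the first full host page */
	uint64_t body_start;	/* address of the first full host page */
	uint64_t body_len;	/* bytes covered by full host pages */
	uint64_t tail_len;	/* bytes after the last full host page */
};

/*
 * host_page must be a power of two. If the range contains no full host
 * page, the whole range is reported as head_len.
 */
struct page_split
split_by_host_page(uint64_t start, uint64_t len, uint64_t host_page)
{
	struct page_split s = { 0, start, 0, 0 };
	uint64_t mask = host_page - 1;
	uint64_t end = start + len;
	uint64_t body_start = (start + mask) & ~mask;	/* round up */
	uint64_t body_end = end & ~mask;		/* round down */

	assert(host_page != 0 && (host_page & mask) == 0);
	if (body_start >= body_end) {
		s.head_len = len;
		return (s);
	}
	s.head_len = body_start - start;
	s.body_start = body_start;
	s.body_len = body_end - body_start;
	s.tail_len = end - body_end;
	return (s);
}
```

For example, a segment of 0x6000 bytes at 0x1000 on 8K host pages yields a 0x1000-byte head, a 0x4000-byte body starting at 0x2000, and a 0x1000-byte tail.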