5561 Commits

Author SHA1 Message Date
Julian Elischer
05b0fdac8c Change two variables to size_t to improve portability.
Submitted by:	Xin Li
2008-05-10 15:02:56 +00:00
Julian Elischer
8b07e49a00 Add code to allow the system to handle multiple routing tables.
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x)

Currently the only protocol that can make use of the multiple tables is IPv4
Similar functionality exists in OpenBSD and Linux.

From my notes:

-----

  One thing where FreeBSD has been falling behind, and which by chance I
  have some time to work on is "policy based routing", which allows
  different
  packet streams to be routed by more than just the destination address.

  Constraints:
  ------------

  I want to make some form of this available in the 6.x tree
  (and by extension 7.x) , but FreeBSD in general needs it so I might as
  well do it in -current and back port the portions I need.

  One of the ways that this can be done is to have the ability to
  instantiate multiple kernel routing tables (which I will now
  refer to as "Forwarding Information Bases" or "FIBs" for political
  correctness reasons). Which FIB a particular packet uses to make
  the next hop decision can be decided by a number of mechanisms.
  The policies these mechanisms implement are the "Policies" referred
  to in "Policy based routing".

  One of the constraints I have if I try to back port this work to
  6.x is that it must be implemented as a EXTENSION to the existing
  ABIs in 6.x so that third party applications do not need to be
  recompiled in timespan of the branch.

  This first version will not have some of the bells and whistles that
  will come with later versions. It will, for example, be limited to 16
  tables in the first commit.
  Implementation method, Compatible version. (part 1)
  -------------------------------
  For this reason I have implemented a "sufficient subset" of a
  multiple routing table solution in Perforce, and back-ported it
  to 6.x. (also in Perforce though not  always caught up with what I
  have done in -current/P4). The subset allows a number of FIBs
  to be defined at compile time (8 is sufficient for my purposes in 6.x)
  and implements the changes needed to allow IPV4 to use them. I have not
  done the changes for ipv6 simply because I do not need it, and I do not
  have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.

  Other protocol families are left untouched and should there be
  users with proprietary protocol families, they should continue to work
  and be oblivious to the existence of the extra FIBs.

  To understand how this is done, one must know that the current FIB
  code starts everything off with a single dimensional array of
  pointers to FIB head structures (One per protocol family), each of
  which in turn points to the trie of routes available to that family.

  The basic change in the ABI compatible version of the change is to
  extent that array to be a 2 dimensional array, so that
  instead of protocol family X looking at rt_tables[X] for the
  table it needs, it looks at rt_tables[Y][X] when for all
  protocol families except ipv4 Y is always 0.
  Code that is unaware of the change always just sees the first row
  of the table, which of course looks just like the one dimensional
  array that existed before.

  The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
  are all maintained, but refer only to the first row of the array,
  so that existing callers in proprietary protocols can continue to
  do the "right thing".
  Some new entry points are added, for the exclusive use of ipv4 code
  called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
  which have an extra argument which refers the code to the correct row.

  In addition, there are some new entry points (currently called
  rtalloc_fib() and friends) that check the Address family being
  looked up and call either rtalloc() (and friends) if the protocol
  is not IPv4 forcing the action to row 0 or to the appropriate row
  if it IS IPv4 (and that info is available). These are for calling
  from code that is not specific to any particular protocol. The way
  these are implemented would change in the non ABI preserving code
  to be added later.

  One feature of the first version of the code is that for ipv4,
  the interface routes show up automatically on all the FIBs, so
  that no matter what FIB you select you always have the basic
  direct attached hosts available to you. (rtinit() does this
  automatically).

  You CAN delete an interface route from one FIB should you want
  to but by default it's there. ARP information is also available
  in each FIB. It's assumed that the same machine would have the
  same MAC address, regardless of which FIB you are using to get
  to it.

  This brings us as to how the correct FIB is selected for an outgoing
  IPV4 packet.

  Firstly, all packets have a FIB associated with them. if nothing
  has been done to change it, it will be FIB 0. The FIB is changed
  in the following ways.

  Packets fall into one of a number of classes.

  1/ locally generated packets, coming from a socket/PCB.
     Such packets select a FIB from a number associated with the
     socket/PCB. This in turn is inherited from the process,
     but can be changed by a socket option. The process in turn
     inherits it on fork. I have written a utility call setfib
     that acts a bit like nice..

         setfib -3 ping target.example.com # will use fib 3 for ping.

     It is an obvious extension to make it a property of a jail
     but I have not done so. It can be achieved by combining the setfib and
     jail commands.

  2/ packets received on an interface for forwarding.
     By default these packets would use table 0,
     (or possibly a number settable in a sysctl(not yet)).
     but prior to routing the firewall can inspect them (see below).
     (possibly in the future you may be able to associate a FIB
     with packets received on an interface..  An ifconfig arg, but not yet.)

  3/ packets inspected by a packet classifier, which can arbitrarily
     associate a fib with it on a packet by packet basis.
     A fib assigned to a packet by a packet classifier
     (such as ipfw) would over-ride a fib associated by
     a more default source. (such as cases 1 or 2).

  4/ a tcp listen socket associated with a fib will generate
     accept sockets that are associated with that same fib.

  5/ Packets generated in response to some other packet (e.g. reset
     or icmp packets). These should use the FIB associated with the
     packet being reponded to.

  6/ Packets generated during encapsulation.
     gif, tun and other tunnel interfaces will encapsulate using the FIB
     that was in effect withthe proces that set up the tunnel.
     thus setfib 1 ifconfig gif0 [tunnel instructions]
     will set the fib for the tunnel to use to be fib 1.

  Routing messages would be associated with their
  process, and thus select one FIB or another.
  messages from the kernel would be associated with the fib they
  refer to and would only be received by a routing socket associated
  with that fib. (not yet implemented)

  In addition Netstat has been edited to be able to cope with the
  fact that the array is now 2 dimensional. (It looks in system
  memory using libkvm (!)). Old versions of netstat see only the first FIB.

  In addition two sysctls are added to give:
  a) the number of FIBs compiled in (active)
  b) the default FIB of the calling process.

  Early testing experience:
  -------------------------

  Basically our (IronPort's) appliance does this functionality already
  using ipfw fwd but that method has some drawbacks.

  For example,
  It can't fully simulate a routing table because it can't influence the
  socket's choice of local address when a connect() is done.

  Testing during the generating of these changes has been
  remarkably smooth so far. Multiple tables have co-existed
  with no notable side effects, and packets have been routes
  accordingly.

  ipfw has grown 2 new keywords:

  setfib N ip from anay to any
  count ip from any to any fib N

  In pf there seems to be a requirement to be able to give symbolic names to the
  fibs but I do not have that capacity. I am not sure if it is required.

  SCTP has interestingly enough built in support for this, called VRFs
  in Cisco parlance. it will be interesting to see how that handles it
  when it suddenly actually does something.

  Where to next:
  --------------------

  After committing the ABI compatible version and MFCing it, I'd
  like to proceed in a forward direction in -current. this will
  result in some roto-tilling in the routing code.

  Firstly: the current code's idea of having a separate tree per
  protocol family, all of the same format, and pointed to by the
  1 dimensional array is a bit silly. Especially when one considers that
  there is code that makes assumptions about every protocol having the
  same internal structures there. Some protocols don't WANT that
  sort of structure. (for example the whole idea of a netmask is foreign
  to appletalk). This needs to be made opaque to the external code.

  My suggested first change is to add routing method pointers to the
  'domain' structure, along with information pointing the data.
  instead of having an array of pointers to uniform structures,
  there would be an array pointing to the 'domain' structures
  for each protocol address domain (protocol family),
  and the methods this reached would be called. The methods would have
  an argument that gives FIB number, but the protocol would be free
  to ignore it.

  When the ABI can be changed it raises the possibilty of the
  addition of a fib entry into the "struct route". Currently,
  the structure contains the sockaddr of the desination, and the resulting
  fib entry. To make this work fully, one could add a fib number
  so that given an address and a fib, one can find the third element, the
  fib entry.

  Interaction with the ARP layer/ LL layer would need to be
  revisited as well. Qing Li has been working on this already.

  This work was sponsored by Ironport Systems/Cisco

Reviewed by:    several including rwatson, bz and mlair (parts each)
Obtained from:  Ironport systems/Cisco
2008-05-09 23:03:00 +00:00
Coleman Kane
c4ca06b9b3 Update the lib/expat tree for the new v2.0.1 expat import. The bsdxml.h
header is now in two parts: bsdxml.h and bsdxml_external.h, representing
the expat.h and expat_external.h headers. Updated the info on the man
page as well. Also, fixed a type-error in a printf in
sbin/ifconfig/regdomain.c that would cause a compiler warning.

Approved by:	sam, phk
2008-05-08 14:01:42 +00:00
Robert Watson
0693424576 Add "ddb capture print" and "ddb capture status" commands do ddb(8),
alowing the DDB output capture buffer to be easily extracted from
user space.  Both of these commands include -M/-N arguments, allowing
them to be used with kernel crash dumps (or /dev/mem).

This makes it easier to use DDB scripting and output capture with
minidumps or full dumps rather than with text dumps, allowing DDB
output (scripted or otherwise) to be easily extracted from a crash
dump.

MFC after:	1 week
Discussed with:	brooks, jhb
2008-04-25 17:34:09 +00:00
Sam Leffler
b032f27c36 Multi-bss (aka vap) support for 802.11 devices.
Note this includes changes to all drivers and moves some device firmware
loading to use firmware(9) and a separate module (e.g. ral).  Also there
no longer are separate wlan_scan* modules; this functionality is now
bundled into the wlan module.

Supported by:	Hobnob and Marvell
Reviewed by:	many
Obtained from:	Atheros (some bits)
2008-04-20 20:35:46 +00:00
Brooks Davis
8ca3089abc When sending packets directly to the DHCP server, use a socket and send
directly rather than bogusly sending it out as a link layer broadcast
(which fails to be received on some networks).

PR:		bin/96018
MFC after:	2 weeks
2008-04-15 22:48:56 +00:00
Kirk McKusick
aed3576d26 restore(8) does not check for write failure while building two temp
files containing directory and ownership data. If /tmp fills, the
console is blasted with zillions of "file system full" errors, and
restore continues on, even though directory and/or ownership data
has been lost. This is particularly likely to happen when running
from the live CD, which has little /tmp space.

PR:         bin/93603, also probably bin/107213
Fix from:   Ken Lalonde
2008-04-14 20:15:53 +00:00
Marcel Moolenaar
4d32fcb42b Add the bootcode verb for installing boot code. Boot code
is supported for the MBR, GPT and PC98 schemes, where GPT
installs boot code into the PMBR.
2008-04-13 19:54:54 +00:00
Remko Lodder
afa09cae88 I keep taking timemachines to get back in time. Update the
year to 2008.

Noticed by:	ceri
2008-04-13 11:05:59 +00:00
Remko Lodder
722bb56802 Add missing device in tunefs entry.
PR:		docs/122702
Submitted by:	Yoshihiro Ota <ota@j.email.ne.jp>
MFC After:	3 days
2008-04-13 07:48:05 +00:00
Kirk McKusick
cfbb5cdd50 Avoid printing spurious ``Header with wrong dumpdate.'' message. 2008-04-11 21:51:53 +00:00
Kirk McKusick
c028393d70 Correctly set file group when restore is run by a user other than root. 2008-04-11 21:48:14 +00:00
Xin LI
14320f1e7f Add a new flag, '-C' which enables a special mode that is intended for
catastrophic recovery.  Currently, this mode only validates whether a
cylindergroup has good signature data, and prompts the user to decide
whether to clear it as a whole.

This mode is useful when there is data damage on a disk and you are
working on copy of the original disk, as fsck_ffs(8) tends to abnormally
exit in such case, as a last resort to recover data from the disk.
2008-04-10 23:49:23 +00:00
Ruslan Ermilov
d4f2098b47 Fix printing of sockaddr prefixes in verbose mode.
PR:		bin/122403
Submitted by:	az
MFC after:	3 days
2008-04-10 12:16:20 +00:00
John Baldwin
dc4ee2cf88 Add 'zfs' as an alias for the FreeBSD ZFS UUID.
MFC after:	3 days
PR:		bin/119976
Submitted by:	Cian Hughes  Ci of nhugh.es
2008-04-07 18:23:28 +00:00
Ruslan Ermilov
85018ba57b - Normalize usage(), add "ddb pathname" syntax.
- Revise the manpage.
2008-04-04 07:31:43 +00:00
Craig Rodrigues
205e074f2c Add comment about specifying "ro" mount option when
doing an update mount on a read-only file system.

Requested by:	yar
2008-04-04 01:50:58 +00:00
Warner Losh
52b370fe8e Use safer string handling.
Reviewed by: security-team
2008-04-03 20:37:38 +00:00
Sam Leffler
2fa02c5fb7 Fix handling of create operation together with setting other parameters:
o mark cmds/parameters to indicate they are potential arguments to a clone
  operation (e.g. vlantag)
o when handling a create/clone operation do the callback on seeing the first
  non-clone cmd line argument so the new device is created and can be used;
  and re-setup operating state to reflect the newly created device

Reviewed by:	Eugene Grosbein
MFC after:	2 weeks
2008-03-31 15:38:07 +00:00
Brooks Davis
d5354256b6 Add a new function is_default_interface() which determines if this
interface is one with the default route (or there isn't one).  Use it to
decide if we should adjust the default route and /etc/resolv.conf.

Fix the delete of the default route.  The if statement was totally bogus
and the delete only worked due to a typo. [1]

Reported by:	Jordan Coleman <jordan at JordanColeman dot com> [1]
MFC after:	1 week
2008-03-30 02:42:39 +00:00
Ruslan Ermilov
dbdb679c6f Remove options MK_LIBKSE and DEFAULT_THREAD_LIB now that we no longer
build libkse.  This should fix WITHOUT_LIBTHR builds as a side effect.
2008-03-29 17:44:40 +00:00
Craig Rodrigues
67361fdf2d Remove comment about "-r" flag from readlabel. "-r" is a no-op.
The is comment is left over from the old disklabel command.

Reviewed by:	phk
2008-03-23 03:01:10 +00:00
Sam Leffler
61063e478a Defer state change on disassociate to avoid unnecessarily dropping the
lease: track the current bssid and if it changes (as reported in an
assoc/reassoc) event only then kick the state machine.  This gives us
immediate response when roaming but otherwise causes us to fallback on
the normal state machine.

Reviewed by:	brooks, jhb
MFC after:	3 weeks
2008-03-22 16:24:02 +00:00
Sam Leffler
043f1935e0 correct syslog mask so LOG_DEBUG msgs are not lost
MFC after:	2 weeks
2008-03-22 16:18:07 +00:00
Remko Lodder
6764f54349 In route.c in newroute() there's a call to exit(0) if the command was
'get'. Since rtmsg() always gets called and returns 0 on success and -1
on failure, it's possible to exit with a suitable exit code by calling
exit(ret != 0) instead, as is done at the end of newroute().

PR:		bin/112303
Submitted by:	bruce@cran.org.uk
MFC after:	1 week
2008-03-22 12:50:43 +00:00
Warner Losh
d6aed19dfb No need to be gratuitously style(9) non-compliant here, even though
C++ lets me get away with it.
2008-03-21 20:38:28 +00:00
Remko Lodder
eba8219e9b Replace reference from vinum.8 to gvinum.8, it was advised in the PR to
replace this with vinum.4, but that's the kernel interface manual, which
is not appropriate in my understanding.  I think that gvinum is a suitable
replacement for this.

PR:		docs/121938
Submitted by:	"Federico" <federicogalvezdurand at yahoo dot com>
MFC after:	3 days
2008-03-21 20:16:25 +00:00
Poul-Henning Kamp
72d945abcc Add a "spindown" facility to ata-disks: If no requests have been received
for a configurable number of seconds, spin the disk down.  Spin it back
up on the next request.

Notice that the timeout is only armed by a request, so to spin down a
disk you may have to do:

	atacontrol spindown ad10 5
	dd if=/dev/ad10 of=/dev/null count=1

To disable spindown, set timeout to zero:

	atacontrol spindown ad10 0

In order to debug any trouble caused, this code is somewhat noisy on the
console.

Enabling spindown on a disk containing / or /var/log/messages is not
going to do anything sensible.

Spinning a disk up and down all the time will wear it out, use sensibly.

Approved by:	sos
2008-03-17 10:33:23 +00:00
Poul-Henning Kamp
f64275189c Un-cut&paste argument processing, fix things lint found. 2008-03-16 17:54:55 +00:00
Christian Brueffer
51b6df2d16 - Use an uppercase provider name in the example, to make the name change
after labeling the provider more obvious. (1)
- Correct nomenclature usage

PR:		121487 (1)
Submitted by:	Anatoly Borodin <anatoly.borodin@gmail.com>
MFC after:	3 days
2008-03-13 15:37:02 +00:00
Tom McLaughlin
3592acb12e - Update with a better example which shows that options specific to a
file system may be passed using -o.

Approved by:	remko, rodrigc
2008-03-12 02:09:22 +00:00
Tom McLaughlin
a818f1140f - Also change the /sbin/mount_unionfs line I managed to miss just two
lines down to '-o below'.

Approved by:	remko
Noticed by:	rodrigc
Pointyhat by:	me
2008-03-10 20:44:27 +00:00
Tom McLaughlin
39d63c55af - unionfs -b option is deprecated in favor of '-o below' as per
mount_unionfs(8).

Approved by:	remko
2008-03-10 19:03:55 +00:00
Christian Brueffer
058ddc3099 Fix typos.
PR:		121486
Submitted by:	Anatoly Borodin <anatoly.borodin@gmail.com>
MFC after:	3 days
2008-03-08 12:13:00 +00:00
Xin LI
bc69d66f2f Make it possible to build glabel into rescue geom(8) utility.
Ok'ed by:	marcel
No objection:	-current@
2008-03-05 23:31:49 +00:00
Xin LI
a6a568708b Use calloc(). 2008-03-05 23:17:19 +00:00
Brooks Davis
14084ab9bb Add the ability to read a file of commands to ddb(8) modeled after the
feature in ipfw(8).
2008-03-05 17:51:06 +00:00
Pawel Jakub Dawidek
081632cffb Add info about few missing GEOM classes that use geom(8). 2008-03-05 11:51:13 +00:00
Craig Rodrigues
d8f7b008a7 For a mounted file system which is read-only, when
doing the MNT_RELOAD, pass in "ro" and "update"
string mount options to nmount() instead of MNT_RDONLY and MNT_UPDATE flags.

Due to the complexity of the mount parsing code especially
with respect to the root file system, passing in MNT_RDONLY and MNT_UPDATE
flags would do weird things and would cause fsck to convert the root
file system from a read-only mount to read-write.

To test:
 - boot into single user mode
 - show mounted file systems with: mount
 - root file system should be mounted read-only
 - fsck /
 - show mounted file systems with: mount
 - root file system should still be mounted read-only

PR:		120319
MFC after:	1 month
Reported by:	yar
2008-03-05 08:25:49 +00:00
Craig Rodrigues
22a122f315 Remove hacks to filter out MNT_ROOTFS, since we now
do that internally inside nmount() in revision 1.267 of vfs_mount.c.
2008-03-05 06:24:42 +00:00
Sam Leffler
5ce09a9e9c explain that you must set a default transmit key for WEP
Submitted by:	Jeremie Le Hen <jeremie@le-hen.org>
MFC after:	1 week
2008-02-29 20:42:17 +00:00
David Malone
2b2c3b23d1 Dummynet has a limit of 100 slots queue size (or 1MB, if you give
the limit in bytes) hard coded into both the kernel and userland.
Make both these limits a sysctl, so it is easy to change the limit.
If the userland part of ipfw finds that the sysctls don't exist,
it will just fall back to the traditional limits.

(100 packets is quite a small limit these days. If you want to test
TCP at 100Mbps, 100 packets can only accommodate a DBP of 12ms.)

Note these sysctls in the man page and warn against increasing them
without thinking first.

MFC after:      3 weeks
2008-02-27 13:52:33 +00:00
Xin LI
1aaf5f440f In pass1(), cap inosused to fs_ipg rather than allowing arbitrary
number read from cylinder group.  Chances that we read a smarshed
cylinder group, and we can not 100% trust information it has
supplied.  fsck_ffs(8) will crash otherwise for some cases.
2008-02-26 03:08:22 +00:00
Xin LI
33663c7246 In pass2check(): Be more strict with the inode information before further
processing the information.  chk1 is more prone to crash when insane
information is provided by the on-disk inode, and does not even work
if the inode is being smarshed badly.
2008-02-26 03:05:48 +00:00
Xin LI
8f0931174a Be more careful when checking superblock. We have already checked
whether fs_bsize is larger than MINBSIZE, which is larger than the
value that is used to compared with fs_bsize, the sizeof fs, so the
check followed, will be always true.

By inspecting the code and some old commit log, I believe that the
check must be that *fs_sbsize* is larger than sizeof fs.  We round
up the size to nearest dev_bsize, as the smallest accepted fs_sbsize,
personally, I think this can be even changed to equal, because this
number is mostly an invariant in file systems.

With this check, fsck_ffs(8) will be more picky and has better
chance rejecting bad first superblock rather than referring to bad
value it supplied, thus gives better chance for it to check the
filesystem carefully.
2008-02-26 03:03:17 +00:00
Mike Silbersack
de625c605d Decrease ping6's minimum allowed interval
from .01 to .000001.

Note that due to the architecture of ping6,
you are still limited to kern.hz pings per
second.

MFC after: 2 weeks
2008-02-25 10:45:25 +00:00
Paolo Pisati
f94a7fc0b5 Add table/tablearg support to ipfw's nat.
MFC After: 1 week
2008-02-24 15:37:45 +00:00
Paolo Pisati
d956bdf35e -Fix display of nat range.
-Whitespace elimination.

Bug spotted by: Luiz Otavio O Souza
MFC After: 3 days
2008-02-21 22:55:54 +00:00
Ruslan Ermilov
b7498df286 getopt(3) returns -1, not EOF. 2008-02-19 07:09:19 +00:00
Yaroslav Tykhiy
c6446de05d Undo the damage I did in sys/kern/vfs_mount.c #1.274 and
sbin/mount_nfs/mount_nfs.c #1.76.  Let the dragons sleep.

Requested by:	rodrigc, des
PR:		kern/120319 (welcome the bug back)
2008-02-18 20:58:57 +00:00