because we need to do ioctl(2)s, which are not permitted in the capability
mode. What we do now is to chroot(2) to /var/empty, which restricts access
to file system name space and we drop privileges to hast user and hast
group.
This still allows to access to other name spaces, like list of processes,
network and sysvipc.
To address that, use jail(2) instead of chroot(2). Using jail(2) will restrict
access to process table, network (we use ip-less jails) and sysvipc (if
security.jail.sysvipc_allowed is turned off). This provides much better
separation.
MFC after: 1 week
hastd process and workers, remove unused one and set different range
of numbers. This is done in order not to confuse them with HASTCTL_CMD
defines, used for conversation between hastctl and hastd, and to avoid
bugs like the one fixed in in r221075.
Approved by: pjd (mentor)
MFC after: 1 week
secondary role. It is possible that the remote node is primary, but only
because there was a role change and it didn't finish cleaning up (unmounting
file systems, etc.). If we detect such situation, wait for the remote node
to switch the role to secondary before accepting I/Os. If we don't wait for
it in that case, we will most likely cause split-brain.
MFC after: 1 week
- We have two nodes connected and synchronized (local counters on both sides
are 0).
- We take secondary down and recreate it.
- Primary connects to it and starts synchronization (but local counters are
still 0).
- We switch the roles.
- Synchronization restarts but data is synchronized now from new primary
(because local counters are 0) that doesn't have new data yet.
This fix this issue we bump local counter on primary when we discover that
connected secondary was recreated and has no data yet.
Reported by: trociny
Discussed with: trociny
Tested by: trociny
MFC after: 1 week
hast_proto_recv_hdr() may be used. This also fixes the issue
(introduced by r220523) with hastctl, which crashed on assert in
hast_proto_recv_data().
Suggested and approved by: pjd (mentor)
this means that request timed out. Translate the meaningless EAGAIN to
ETIMEDOUT to give administrator a hint that he might need to increase timeout
in configuration file.
MFC after: 1 month
Before it could change later and we were sending invalid mapsize.
Some time ago I added optimization where when nodes are connected for the
first time and there were no writes to them yet, there is no initial full
synchronization. This bug prevented it from working.
MFC after: 1 week
equal to secondary counters:
primary_localcnt = secondary_remotecnt
primary_remotecnt = secondary_localcnt
Previously it was done wrong and split-brain was observed after
primary had synchronized up-to-date data from secondary.
Approved by: pjd (mentor)
MFC after: 1 week
We can use capsicum for secondary worker processes and hastctl.
When working as primary we drop privileges using chroot+setgid+setuid
still as we need to send ioctl(2)s to ggate device, for which capsicum
doesn't allow (yet).
X-MFC after: capsicum is merged to stable/8
our info about worker processes if any of them was terminated in the meantime.
This fixes the problem with 'hastctl status' running from a hook called on
split-brain:
1. Secondary calls a hooks and terminates.
2. Hook asks for resource status via 'hastctl status'.
3. The main hastd handles the status request by sending it to the secondary
worker who is already dead, but because signals weren't checked yet he
doesn't know that and we get EPIPE.
MFC after: 1 week
This way we know how to connect to secondary node when we are primary.
The same variable is used by the secondary node - it only accepts
connections from the address stored in 'remote' variable.
In cluster configurations it is common that each node has its individual
IP address and there is one addtional shared IP address which is assigned
to primary node. It seems it is possible that if the shared IP address is
from the same network as the individual IP address it might be choosen by
the kernel as a source address for connection with the secondary node.
Such connection will be rejected by secondary, as it doesn't come from
primary node individual IP.
Add 'source' variable that allows to specify source IP address we want to
bind to before connecting to the secondary node.
MFC after: 1 week
connection so the worker will exit if it does not receive packets from
the primary during this interval.
Reported by: Christian Vogt <Christian.Vogt@haw-hamburg.de>
Tested by: Christian Vogt <Christian.Vogt@haw-hamburg.de>
Approved by: pjd (mentor)
MFC after: 1 week
- Load support for %T for pritning time.
- Add support for %N for printing number in human readable form.
- Add support for %S for printing sockaddr structure (currently only AF_INET
family is supported, as this is all we need in HAST).
- Disable gcc compile-time format checking as this will no longer work.
MFC after: 2 weeks
- HOLE - it simply turns all-zero blocks into few bytes header;
it is extremely fast, so it is turned on by default;
it is mostly intended to speed up initial synchronization
where we expect many zeros;
- LZF - very fast algorithm by Marc Alexander Lehmann, which shows
very decent compression ratio and has BSD license.
MFC after: 2 weeks
1. The descriptor is the one we are listening on (not the one when we connect
as a client and not the one which is created on accept(2)).
2. Descriptor was created by us (PID matches with the PID stored on bind(2)).
Reported by: Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 1 week
worker can ask the main privileged process to connect in worker's behalf
and then we can migrate descriptor using this socketpair to worker.
This is not really needed now, but will be needed once we start to use
capsicum for sandboxing.
MFC after: 1 week
proto_connection_{send,recv} and change them to return proto_conn
structure. We don't operate directly on descriptors, but on
proto_conns.
- Add wrap method to wrap descriptor with proto_conn.
- Remove methods to send and receive descriptors and implement this
functionality as additional argument to send and receive methods.
MFC after: 1 week
If timeout argument to proto_connect() is -1, then the caller needs to use
this new function to wait for connection.
This change is in preparation for capsicum, where sandboxed worker wants
to ask main process to connect in worker's behalf and pass descriptor
to the worker. Because we don't want the main process to wait for the
connection, it will start async connection and pass descriptor to the
worker who will be responsible for waiting for the connection to finish.
MFC after: 1 week
to syslog if we run in background.
- Asserts in proto.c that method we want to call is implemented and remove
dummy methods from protocols implementation that are only there to abort
the program with nice message.
MFC after: 1 week
Accepting connections and handshaking in secondary is still done before
dropping privileges. It should be implemented by only accepting connections in
privileged main process and passing connection descriptors to the worker, but
is not implemented yet.
MFC after: 1 week
- chrooting to /var/empty (user hast home directory),
- setting groups to 'hast' (user hast primary group),
- setting real group id, effective group id and saved group id to 'hast',
- setting real user id, effective user id and saved user id to 'hast'.
At the end verify that those operations where successfull.
MFC after: 1 week
we expect to be open. Also assert that they point at expected type.
Because openlog(3) API is unable to tell us descriptor number it is using, we
have to close syslog socket, remember assert message in local buffer and if we
fail on assertion, reopen syslog socket and log the message.
MFC after: 1 week
PJDLOG_RVERIFY() - always check expression and on false log the given message
and exit.
PJDLOG_RASSERT() - check expression when NDEBUG is not defined and on false log
given message and exit.
PJDLOG_ABORT() - log the given message and exit.
MFC after: 1 week
master process only and pass changes to the worker processes over control
socket. This removes access to global namespace in preparation for capsicum
sandboxing.
MFC after: 2 weeks
signal action is restored to default in child after fork(2).
In this case there is no need to do anything with dummy SIGCHLD handler,
because after fork(2) it will be automatically reverted to SIG_IGN.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
MFC after: 3 days
It is possible that the parent process ignores some of them and sigtimedwait()
will never see them, eventhough they are masked.
The most common situation for this to happen is boot process where init(8)
ignores SIGHUP before starting to execute /etc/rc. This in turn caused
hastd(8) to ignore SIGHUP.
Reported by: trasz
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
MFC after: 3 days
more useful information. Instead of:
hastd: remote address not configured for resource foo
Print the following:
No resource foo configuration for this node (acceptable node names: freefall, freefall.freebsd.org, 44333332-4c44-4e31-4a30-313920202020).
MFC after: 3 days
races - in this case a keepalive packet was send from wrong thread which
lead to connection dropping, because of corrupted packet.
Fix it by sending keepalive packets directly from the send thread.
As a bonus we now send keepalive packets only when connection is idle.
Submitted by: Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 3 days
initialize all the data. This is huge waste of time and resources if
there were no writes yet, as there is no real data to synchronize.
Optimize this by sending "virgin" argument to secondary, which gives it a hint
that synchronization is not needed.
In the common case (where noth nodes are configured at the same time) instead
of synchronizing everything, we don't synchronize at all.
MFC after: 1 week
error messages, so when we clean up after child process, we have to check if
the event socketpair is still there.
Submitted by: Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 3 days
I'm unable to reproduce the race described in comment anymore and also the
comment is incorrect - localfd represents local component from configuration
file, eg. /dev/da0 and not HAST provider.
Reported by: Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 1 week
masking it.
This fixes bogus reports about hooks running for too long and other problems
related to garbage-collecting child processes.
Reported by: Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 3 days
This way the primary process inherits signal mask from the main process,
which fixes a race where signal is delivered to the primary process before
configuring signal mask.
Reported by: Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 3 days
while the main process sends control message to the worker process, but worker
process hasn't started control thread yet, because it waits for reply from the
main process.
The fix is to start the control thread before sending any events.
Reported and fix suggested by: Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 3 days
limited to async-signal safe functions in the child process), move all hooks
execution to the main (non-threaded) process.
Do it by maintaining connection (socketpair) between child and parent
and sending events from the child to parent, so it can execute the hook.
This is step in right direction for others reasons too. For example there is
one less problem to drop privs in worker processes.
MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
This fixes various races and eliminates use of pthread* API in signal handler.
Pointed out by: kib
With help from: jilles
MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
function to make code more readable.
- Be sure not to reconnect too often in case of signal delivery, etc.
MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
process, once it start to use hooks.
- Add hook_check_one() in case the caller expects different child processes
and once it can recognize it, it will pass pid and status to hook_check_one().
MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
- Keep all hooks we're running in a global list, so we can report when
they finish and also report when they are running for too long.
MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
node failures quickly for HAST resources that are rarely modified.
Remove XXX from a comment now that the guard thread never sleeps infinitely.
MFC after: 2 weeks
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
- Load added resources.
- Stop and forget removed resources.
- Update modified resources in least intrusive way, ie. don't touch
/dev/hast/<name> unless path to local component or provider name were
modified.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
MFC after: 1 month
- Don't exit on errors if not requested.
- Don't keep configuration in global variable, but allocate memory for
configuration.
- Call yyrestart() before yyparse() so that on error in configuration file
we will start from the begining next time and not from the place we left of.
MFC after: 1 month
PJDLOG_ASSERT() and PJDLOG_VERIFY() that will check the given condition
and log the problem where appropriate. The difference between those
two is that PJDLOG_VERIFY() always work and PJDLOG_ASSERT() can be
turned off by defining NDEBUG.
MFC after: 1 month