273 Commits

Author SHA1 Message Date
Pawel Jakub Dawidek
911a2aa37a Use int16 for error.
MFC after:	1 week
2011-01-22 22:33:27 +00:00
Pawel Jakub Dawidek
5ed118d861 - On primary worker reload, update hr_exec field.
- Update comment.

MFC after:	1 week
2011-01-22 22:31:55 +00:00
Pawel Jakub Dawidek
ac7b0b09f3 execve(2), not fork(2) resets signal handler to the default value (if it isn't
ignored). Correct comment talking about that.

Pointed out by:	kib
MFC after:	3 days
2011-01-12 16:16:54 +00:00
Pawel Jakub Dawidek
bcaa0b6789 Add a note that when custom signal handler is installed for a signal,
signal action is restored to default in child after fork(2).
In this case there is no need to do anything with dummy SIGCHLD handler,
because after fork(2) it will be automatically reverted to SIG_IGN.

Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
MFC after:	3 days
2011-01-12 14:38:17 +00:00
Pawel Jakub Dawidek
9cc97e5803 Install default signal handlers before masking signals we want to handle.
It is possible that the parent process ignores some of them and sigtimedwait()
will never see them, eventhough they are masked.

The most common situation for this to happen is boot process where init(8)
ignores SIGHUP before starting to execute /etc/rc. This in turn caused
hastd(8) to ignore SIGHUP.

Reported by:	trasz
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
MFC after:	3 days
2011-01-12 14:35:29 +00:00
Pawel Jakub Dawidek
a7130d73a6 Detect when resource is configured more than once.
MFC after:	3 days
2010-12-26 19:08:41 +00:00
Pawel Jakub Dawidek
66db33a13b When node-specific configuration is missing in resource section, provide
more useful information. Instead of:

	hastd: remote address not configured for resource foo

Print the following:

	No resource foo configuration for this node (acceptable node names: freefall, freefall.freebsd.org, 44333332-4c44-4e31-4a30-313920202020).

MFC after:	3 days
2010-12-26 19:07:58 +00:00
Pawel Jakub Dawidek
fba1bf5a2c The 'ret' variable is of type ssize_t and we use proper format for it (%zd), so
no (bogus) cast is needed.

MFC after:	3 days
2010-12-16 19:48:03 +00:00
Pawel Jakub Dawidek
cd7b7ee577 Improve problems logging.
MFC after:	3 days
2010-12-16 07:30:47 +00:00
Pawel Jakub Dawidek
7208920499 Don't ignore errors from remote requests.
MFC after:	3 days
2010-12-16 07:29:58 +00:00
Pawel Jakub Dawidek
347bde360a Log the fact of launching and include protocol version number.
MFC after:	3 days
2010-12-16 07:28:40 +00:00
Rebecca Cran
e267ef95d5 Don't generate input() since it's not used. 2010-11-22 14:16:22 +00:00
Pawel Jakub Dawidek
d448536ceb Move timeout.tv_sec initialization outside the loop - sigtimedwait(2) won't
modify it.

Submitted by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	3 days
2010-11-15 03:07:42 +00:00
Pawel Jakub Dawidek
1dd5a4bfa2 1. Exit when we cannot create incoming connection.
2. Improve logging to inform which connection can't be created.

Submitted by:	[1] Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	3 days
2010-11-15 03:05:33 +00:00
Pawel Jakub Dawidek
448efa9421 Send packets to remote node only via the send thread to avoid possible
races - in this case a keepalive packet was send from wrong thread which
lead to connection dropping, because of corrupted packet.

Fix it by sending keepalive packets directly from the send thread.
As a bonus we now send keepalive packets only when connection is idle.

Submitted by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	3 days
2010-11-02 22:13:08 +00:00
Pawel Jakub Dawidek
ce837469ba Before this change on first connect between primary and secondary we
initialize all the data. This is huge waste of time and resources if
there were no writes yet, as there is no real data to synchronize.

Optimize this by sending "virgin" argument to secondary, which gives it a hint
that synchronization is not needed.

In the common case (where noth nodes are configured at the same time) instead
of synchronizing everything, we don't synchronize at all.

MFC after:	1 week
2010-10-24 17:28:25 +00:00
Pawel Jakub Dawidek
b9ffbb0a94 Implement nv_exists() function that returns true if argument of the given
name exists.

MFC after:	3 days
2010-10-24 17:24:08 +00:00
Pawel Jakub Dawidek
3dea75d2a8 Move all NV defines into nv.c, they are not used externally thus there is
no need to make then visible from outside.

MFC after:	3 days
2010-10-24 17:22:34 +00:00
Pawel Jakub Dawidek
1f39b27946 Simplify code a bit.
MFC after:	3 days
2010-10-24 15:44:23 +00:00
Pawel Jakub Dawidek
d7be7905ae Plug memory leak.
MFC after:	3 days
2010-10-24 15:42:16 +00:00
Pawel Jakub Dawidek
584a9bc3f8 Plug memory leaks.
Found with:	valgrind
MFC after:	3 days
2010-10-24 15:41:23 +00:00
Pawel Jakub Dawidek
2964aeb34a Load geom_gate.ko module after parsing arguments.
MFC after:	3 days
2010-10-24 15:38:58 +00:00
Pawel Jakub Dawidek
6c71649c5f Use closefrom(2) instead of close(2) in a loop.
MFC after:	1 week
2010-10-20 21:10:01 +00:00
Pawel Jakub Dawidek
3f562cce40 Log correct connection when canceling half-open connection.
Submitted by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	3 days
2010-10-17 15:47:27 +00:00
Pawel Jakub Dawidek
bb317aa6ea Use one fprintf() instead of two.
MFC after:	3 days
2010-10-16 22:50:12 +00:00
Pawel Jakub Dawidek
c0a124e6ce Clear signal mask before executing a hook.
Submitted by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	3 days
2010-10-16 22:48:48 +00:00
Pawel Jakub Dawidek
51c63dce86 We can't zero out ggio request, as we have some fields in there we initialize
once during start-up.

Reported by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	3 days
2010-10-08 15:05:39 +00:00
Pawel Jakub Dawidek
022f07b682 We close the event socketpair early in the mainloop to prevent spaming with
error messages, so when we clean up after child process, we have to check if
the event socketpair is still there.

Submitted by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	3 days
2010-10-08 15:02:15 +00:00
Pawel Jakub Dawidek
4e47b646bb Clear ggate structures before using them. We don't initialize all the field
and there can be some garbage from the stack.

MFC after:	1 week
2010-10-07 18:23:28 +00:00
Pawel Jakub Dawidek
783ee75392 Log error message when we fail to destroy ggate provider.
MFC after:	3 days
2010-10-07 18:20:16 +00:00
Pawel Jakub Dawidek
4a88128b01 Start the guard thread first, so we can handle signals from the very begining.
Reported by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	1 week
2010-10-07 18:19:02 +00:00
Pawel Jakub Dawidek
b46198a5db Don't close local component on exit as we can hang waiting on g_waitidle.
I'm unable to reproduce the race described in comment anymore and also the
comment is incorrect - localfd represents local component from configuration
file, eg. /dev/da0 and not HAST provider.

Reported by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	1 week
2010-10-07 18:16:22 +00:00
Pawel Jakub Dawidek
428ad0a9c4 Decrease report interval to 5 seconds, as this also means we will check for
signals every 5 seconds and not every 10 seconds as before.

MFC after:	3 days
2010-10-04 21:44:26 +00:00
Pawel Jakub Dawidek
5f24b330df hook_check() is now only used to report about long-running hooks, so the
argument is redundant, remove it.

MFC after:	3 days
2010-10-04 21:43:06 +00:00
Pawel Jakub Dawidek
41013c0b21 We can't mask ignored signal, so install dummy signal hander for SIGCHLD before
masking it.

This fixes bogus reports about hooks running for too long and other problems
related to garbage-collecting child processes.

Reported by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	3 days
2010-10-04 21:41:18 +00:00
Pawel Jakub Dawidek
b71de2e057 Plug memory leak on fork(2) failure.
Submitted by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	3 days
2010-09-26 10:39:01 +00:00
Pawel Jakub Dawidek
9dd5a6cb0f Switch to sigprocmask(2) API also in the main process and secondary process.
This way the primary process inherits signal mask from the main process,
which fixes a race where signal is delivered to the primary process before
configuring signal mask.

Reported by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	3 days
2010-09-22 19:08:11 +00:00
Pawel Jakub Dawidek
196abd3518 Assert that descriptor numbers are sane.
MFC after:	3 days
2010-09-22 19:05:54 +00:00
Pawel Jakub Dawidek
8b70e6ae9c Fix possible deadlock where worker process sends an event to the main process
while the main process sends control message to the worker process, but worker
process hasn't started control thread yet, because it waits for reply from the
main process.

The fix is to start the control thread before sending any events.

Reported and fix suggested by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	3 days
2010-09-22 19:03:11 +00:00
Pawel Jakub Dawidek
0c24d8e2a1 Fix descriptor leaks: when child exits, we have to close control and event
socket pairs. We did that only in one case out of three.

MFC after:	3 days
2010-09-22 18:57:06 +00:00
Pawel Jakub Dawidek
c56cf19ebf If we are unable to receive control message is most likely because the main
process died. Instead of entering infinite loop, terminate.

MFC after:	3 days
2010-09-22 18:39:43 +00:00
Pawel Jakub Dawidek
351b9a37a4 Sort includes.
MFC after:	3 days
2010-09-22 18:38:02 +00:00
Pawel Jakub Dawidek
e43e02f1a4 Add __dead2 to functions that we know they are going to exit.
MFC after:	3 days
2010-09-20 13:23:43 +00:00
Pawel Jakub Dawidek
6d19256b15 Include process PID in log messages.
Submitted by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	2 weeks
2010-08-31 12:05:13 +00:00
Pawel Jakub Dawidek
8ecdeae9d9 Correct error message.
Submitted by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	2 weeks
2010-08-31 12:03:29 +00:00
Pawel Jakub Dawidek
71c895eb1f Forgot to add event.c and event.h in r212038.
Pointed out by:	pluknet <pluknet@gmail.com>
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-31 09:38:43 +00:00
Pawel Jakub Dawidek
852ac373cb Mask only those signals that we want to handle.
Suggested by:	jilles
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-31 06:22:03 +00:00
Pawel Jakub Dawidek
5bdff860e7 Because it is very hard to make fork(2) from threaded process safe (we are
limited to async-signal safe functions in the child process), move all hooks
execution to the main (non-threaded) process.

Do it by maintaining connection (socketpair) between child and parent
and sending events from the child to parent, so it can execute the hook.

This is step in right direction for others reasons too. For example there is
one less problem to drop privs in worker processes.

MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-30 23:26:10 +00:00
Pawel Jakub Dawidek
6b276294af We only want to know if descriptors are ready for reading.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-30 23:19:21 +00:00
Pawel Jakub Dawidek
eea2deaad0 When someone gives NULL as data, assume this is because he want to declare
connection side only.

MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-30 23:16:45 +00:00