90 Commits

Author SHA1 Message Date
pjd
b85b0868d9 hook_check() is now only used to report long-running hooks, so the
argument is redundant; remove it.

MFC after:	3 days
2010-10-04 21:43:06 +00:00
pjd
0651a7ac68 We can't mask an ignored signal, so install a dummy signal handler for SIGCHLD
before masking it.

This fixes bogus reports about hooks running for too long and other problems
related to garbage-collecting child processes.

Reported by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	3 days
2010-10-04 21:41:18 +00:00
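For a blocked signal whose action is to ignore it, POSIX leaves it unspecified whether the signal is discarded at generation or kept pending, and SIGCHLD's default action is to ignore. A minimal sketch of the workaround described above, assuming only standard sigaction(2), sigprocmask(2) and sigtimedwait(2); sighandler_dummy() and sigchld_wait_demo() are hypothetical names, not hastd's code:

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Dummy handler: it never runs while SIGCHLD is blocked, but installing it
 * means SIGCHLD is no longer "ignored", so it can stay pending.
 */
static void
sighandler_dummy(int sig)
{
	(void)sig;
}

/*
 * Install the dummy handler, block SIGCHLD, fork a child that exits at
 * once and collect the pending SIGCHLD with sigtimedwait(2).
 * Returns the child's exit status, or -1 on failure.
 */
static int
sigchld_wait_demo(void)
{
	struct sigaction sa;
	struct timespec timeout = { 5, 0 };
	sigset_t mask;
	pid_t pid;
	int status;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sighandler_dummy;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGCHLD, &sa, NULL) == -1)
		return (-1);

	sigemptyset(&mask);
	sigaddset(&mask, SIGCHLD);
	if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
		return (-1);

	pid = fork();
	if (pid == -1)
		return (-1);
	if (pid == 0)
		_exit(7);

	/* SIGCHLD stays pending until we explicitly wait for it. */
	if (sigtimedwait(&mask, NULL, &timeout) != SIGCHLD)
		return (-1);
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return (-1);
	return (WEXITSTATUS(status));
}
```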
pjd
22936fe435 Plug memory leak on fork(2) failure.
Submitted by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	3 days
2010-09-26 10:39:01 +00:00
pjd
67279d16ee Switch to the sigprocmask(2) API in the main process and the secondary process too.
This way the primary process inherits the signal mask from the main process,
which fixes a race where a signal is delivered to the primary process before
the signal mask is configured.

Reported by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	3 days
2010-09-22 19:08:11 +00:00
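The race disappears because a child created by fork(2) starts with its parent's signal mask, so there is no window in which the signal is unblocked. A minimal sketch of that property, assuming SIGTERM as the example signal; mask_inherited_demo() is a hypothetical name:

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Block SIGTERM in the parent *before* fork(2) and let the child report
 * whether it started with SIGTERM already blocked.
 * Returns 1 if the mask was inherited, 0 if not, -1 on error.
 */
static int
mask_inherited_demo(void)
{
	sigset_t mask, current;
	pid_t pid;
	int status;

	sigemptyset(&mask);
	sigaddset(&mask, SIGTERM);
	if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
		return (-1);

	pid = fork();
	if (pid == -1)
		return (-1);
	if (pid == 0) {
		/* Passing NULL as the new set only queries the current mask. */
		if (sigprocmask(SIG_BLOCK, NULL, &current) == -1)
			_exit(2);
		_exit(sigismember(&current, SIGTERM) == 1 ? 1 : 0);
	}
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return (-1);
	return (WEXITSTATUS(status) == 1 ? 1 : 0);
}
```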
pjd
2eee4ca70d Assert that descriptor numbers are sane.
MFC after:	3 days
2010-09-22 19:05:54 +00:00
pjd
9433a082e8 Fix a possible deadlock where the worker process sends an event to the main
process while the main process sends a control message to the worker process, but
the worker process hasn't started its control thread yet, because it waits for a
reply from the main process.

The fix is to start the control thread before sending any events.

Reported and fix suggested by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	3 days
2010-09-22 19:03:11 +00:00
pjd
3657e3ff87 Fix descriptor leaks: when a child exits, we have to close the control and
event socket pairs. We did that in only one case out of three.

MFC after:	3 days
2010-09-22 18:57:06 +00:00
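One way to avoid this class of leak is to route every child-exit path through a single cleanup function. A minimal sketch, assuming a hypothetical struct child_conn holding both socket pairs (field and function names are illustrative, not hastd's):

```c
#define _POSIX_C_SOURCE 200809L

#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical per-child bookkeeping. */
struct child_conn {
	int ctrl[2];	/* control socketpair */
	int event[2];	/* event socketpair */
};

/* Close both ends of a pair and mark them as closed. */
static void
close_pair(int fds[2])
{
	for (int i = 0; i < 2; i++) {
		if (fds[i] >= 0) {
			(void)close(fds[i]);
			fds[i] = -1;
		}
	}
}

/* Call this from every code path that handles a child's exit. */
static void
child_cleanup(struct child_conn *cc)
{
	close_pair(cc->ctrl);
	close_pair(cc->event);
}

static int
child_conn_open(struct child_conn *cc)
{
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, cc->ctrl) == -1)
		return (-1);
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, cc->event) == -1) {
		close_pair(cc->ctrl);
		return (-1);
	}
	return (0);
}
```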
pjd
33133813bc If we are unable to receive a control message, it is most likely because the
main process died. Instead of entering an infinite loop, terminate.

MFC after:	3 days
2010-09-22 18:39:43 +00:00
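When the peer of a stream socket is gone, recv(2) returns 0 (EOF) on every call, so a loop that simply retries spins forever. A minimal sketch of the "terminate instead" behaviour, assuming a plain receive loop; control_loop() and EX_PEER_GONE are hypothetical, and the function is marked with C11 _Noreturn, the standard counterpart of FreeBSD's __dead2:

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define EX_PEER_GONE	2	/* hypothetical exit code for this sketch */

/*
 * Receive control messages; if the main process has died (EOF) or the
 * socket is broken, exit instead of retrying forever.
 */
static _Noreturn void
control_loop(int fd)
{
	char buf[64];
	ssize_t n;

	for (;;) {
		n = recv(fd, buf, sizeof(buf), 0);
		if (n > 0)
			continue;	/* a real loop would handle the message */
		if (n == -1 && errno == EINTR)
			continue;
		fprintf(stderr, "Unable to receive control message, exiting.\n");
		exit(EX_PEER_GONE);
	}
}
```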
pjd
e7991e6689 Sort includes.
MFC after:	3 days
2010-09-22 18:38:02 +00:00
pjd
999124921a Add __dead2 to functions that we know are going to exit.
MFC after:	3 days
2010-09-20 13:23:43 +00:00
pjd
1c05a32422 Include process PID in log messages.
Submitted by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	2 weeks
2010-08-31 12:05:13 +00:00
pjd
d9a5627136 Correct error message.
Submitted by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	2 weeks
2010-08-31 12:03:29 +00:00
pjd
fdecdfad04 Forgot to add event.c and event.h in r212038.
Pointed out by:	pluknet <pluknet@gmail.com>
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-31 09:38:43 +00:00
pjd
7476d01cc9 Mask only those signals that we want to handle.
Suggested by:	jilles
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-31 06:22:03 +00:00
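Building the mask from an explicit list instead of sigfillset(3) leaves every other signal alone; one general reason is that POSIX makes it undefined to block SIGSEGV, SIGBUS, SIGILL or SIGFPE when they are generated by a fault. A minimal sketch; the exact list of signals below is an assumption for illustration, not hastd's:

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <stddef.h>

/*
 * Build a mask containing only the signals we intend to collect
 * synchronously; everything else keeps its default behaviour.
 */
static int
build_signal_mask(sigset_t *mask)
{
	static const int handled[] = { SIGHUP, SIGINT, SIGTERM, SIGCHLD };
	size_t i;

	if (sigemptyset(mask) == -1)
		return (-1);
	for (i = 0; i < sizeof(handled) / sizeof(handled[0]); i++) {
		if (sigaddset(mask, handled[i]) == -1)
			return (-1);
	}
	return (0);
}
```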
pjd
9b4ae63e78 Because it is very hard to make fork(2) from a threaded process safe (we are
limited to async-signal-safe functions in the child process), move all hook
execution to the main (non-threaded) process.

Do it by maintaining a connection (socketpair) between the child and the parent
and sending events from the child to the parent, so the parent can execute the hook.

This is a step in the right direction for other reasons too. For example, there is
one less problem when dropping privileges in worker processes.

MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-30 23:26:10 +00:00
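The event channel above can be sketched with a socketpair and a fixed-size record. This assumes a hypothetical one-byte struct event_msg; hastd's real event format and function names are not shown here:

```c
#define _POSIX_C_SOURCE 200809L

#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical one-byte event record. */
struct event_msg {
	uint8_t type;	/* e.g. connect, disconnect, split-brain */
};

static int
event_send(int fd, uint8_t type)
{
	struct event_msg ev = { type };

	return (write(fd, &ev, sizeof(ev)) == (ssize_t)sizeof(ev) ? 0 : -1);
}

static int
event_recv(int fd, uint8_t *typep)
{
	struct event_msg ev;

	if (read(fd, &ev, sizeof(ev)) != (ssize_t)sizeof(ev))
		return (-1);
	*typep = ev.type;
	return (0);
}

/*
 * The child (in hastd, a threaded worker) only reports the event; the
 * single-threaded parent receives it and is the one that would fork/exec
 * the hook. Returns the event type seen by the parent, or -1 on error.
 */
static int
event_demo(uint8_t type)
{
	int fds[2], rv, status;
	uint8_t got = 0;
	pid_t pid;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1)
		return (-1);
	pid = fork();
	if (pid == -1) {
		close(fds[0]);
		close(fds[1]);
		return (-1);
	}
	if (pid == 0) {
		close(fds[0]);
		_exit(event_send(fds[1], type) == 0 ? 0 : 1);
	}
	close(fds[1]);
	rv = event_recv(fds[0], &got);
	close(fds[0]);
	(void)waitpid(pid, &status, 0);
	return (rv == 0 ? (int)got : -1);
}
```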
pjd
8a7b72b9d3 We only want to know if descriptors are ready for reading.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-30 23:19:21 +00:00
pjd
95ca781a2e When someone passes NULL as data, assume this is because they want to declare
the connection side only.

MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-30 23:16:45 +00:00
pjd
107a540b8b Use pjdlog_exit() before fork().
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-30 22:28:04 +00:00
pjd
f116a70c0d Constify arguments we can constify.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-30 22:26:42 +00:00
pjd
e14a354a91 Execute hook when connection between the nodes is established or lost.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-30 00:31:30 +00:00
pjd
2357642204 Execute hook when split-brain is detected.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-30 00:12:10 +00:00
pjd
72d737839c Use sigtimedwait(2) for signal handling in the primary process.
This fixes various races and eliminates the use of the pthread* API in a signal handler.

Pointed out by:	kib
With help from:	jilles
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-30 00:06:05 +00:00
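With sigtimedwait(2) the handled signals stay blocked and are collected synchronously in the main loop, so no pthread* call ever runs in signal context, and the timeout lets the loop wake up periodically. A minimal sketch; wait_signal() is a hypothetical helper, not hastd's code:

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <time.h>

/*
 * Wait up to timeout_sec seconds for one of the (already blocked) signals
 * in mask. Returns the signal number, 0 on timeout, or -1 on error.
 * On EINTR the wait restarts with the full timeout.
 */
static int
wait_signal(const sigset_t *mask, time_t timeout_sec)
{
	struct timespec ts = { timeout_sec, 0 };
	int signo;

	do {
		signo = sigtimedwait(mask, NULL, &ts);
	} while (signo == -1 && errno == EINTR);
	if (signo == -1)
		return (errno == EAGAIN ? 0 : -1);
	return (signo);
}
```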
pjd
4ad9896077 - Move the functionality responsible for checking one connection to a separate
function to make the code more readable.
- Be sure not to reconnect too often in case of signal delivery, etc.

MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-29 22:55:21 +00:00
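The "don't reconnect too often" rule can be expressed as a small rate limiter keyed on the time of the last attempt, so a burst of wakeups (for example from signal delivery) cannot cause a burst of reconnects. A minimal sketch; reconnect_allowed() and the 5-second RECONNECT_INTERVAL are assumptions, and a caller would normally pass a monotonic time such as one from clock_gettime(CLOCK_MONOTONIC):

```c
#include <stdbool.h>
#include <time.h>

#define RECONNECT_INTERVAL	5	/* seconds; assumed value */

/*
 * Allow a reconnect attempt only if at least RECONNECT_INTERVAL seconds
 * have passed since the previous one; *lastp == 0 means "never tried".
 */
static bool
reconnect_allowed(time_t *lastp, time_t now)
{
	if (*lastp != 0 && now - *lastp < RECONNECT_INTERVAL)
		return (false);
	*lastp = now;
	return (true);
}
```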
pjd
39e5544fc3 Disconnect after logging errors.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-29 22:17:53 +00:00
pjd
1d4a51dd2d - Call hook on role change.
- Document new event.

MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-29 21:42:45 +00:00
pjd
70a52f0307 Allow running hooks from the main hastd process.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-29 21:41:53 +00:00
pjd
9a66bc9a30 - Add hook_fini(), which should be called after fork() from the main hastd
process once it starts to use hooks.
- Add hook_check_one() for the case where the caller expects different child
  processes: once it can recognize the child, it passes its pid and status to hook_check_one().

MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-29 21:39:49 +00:00
pjd
4a3477caff Implement mtx_destroy() and rw_destroy().
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-29 21:37:21 +00:00
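A minimal sketch of what such destroy functions can look like as thin wrappers over POSIX threads, in a kernel-like locking style where a failing call is treated as a bug and asserted on. The wrapper set and the SYNCH_CHECK macro are illustrative assumptions, not a copy of hastd's synch.h:

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <pthread.h>

/* Evaluate the call even under NDEBUG; assert that it succeeded. */
#define SYNCH_CHECK(call) do {		\
	int error_ = (call);		\
					\
	assert(error_ == 0);		\
	(void)error_;			\
} while (0)

static inline void
mtx_init(pthread_mutex_t *lock)
{
	SYNCH_CHECK(pthread_mutex_init(lock, NULL));
}

static inline void
mtx_lock(pthread_mutex_t *lock)
{
	SYNCH_CHECK(pthread_mutex_lock(lock));
}

static inline void
mtx_unlock(pthread_mutex_t *lock)
{
	SYNCH_CHECK(pthread_mutex_unlock(lock));
}

static inline void
mtx_destroy(pthread_mutex_t *lock)
{
	SYNCH_CHECK(pthread_mutex_destroy(lock));
}

static inline void
rw_init(pthread_rwlock_t *lock)
{
	SYNCH_CHECK(pthread_rwlock_init(lock, NULL));
}

static inline void
rw_wlock(pthread_rwlock_t *lock)
{
	SYNCH_CHECK(pthread_rwlock_wrlock(lock));
}

static inline void
rw_unlock(pthread_rwlock_t *lock)
{
	SYNCH_CHECK(pthread_rwlock_unlock(lock));
}

static inline void
rw_destroy(pthread_rwlock_t *lock)
{
	SYNCH_CHECK(pthread_rwlock_destroy(lock));
}
```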
pjd
98dd369bdd When SIGTERM or SIGINT is received, terminate worker processes.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 21:28:02 +00:00
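A minimal sketch of the shutdown path: signal every known worker first, then reap each one, so no orphaned workers remain. It assumes the main process keeps worker pids in an array; terminate_workers() is a hypothetical name:

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Send SIGTERM to every worker, then wait for each of them.
 * Returns the number of workers reaped.
 */
static int
terminate_workers(const pid_t *pids, size_t n)
{
	size_t i;
	int reaped = 0, status;
	pid_t r;

	for (i = 0; i < n; i++) {
		if (pids[i] > 0)
			(void)kill(pids[i], SIGTERM);
	}
	for (i = 0; i < n; i++) {
		if (pids[i] <= 0)
			continue;
		do {
			r = waitpid(pids[i], &status, 0);
		} while (r == -1 && errno == EINTR);
		if (r == pids[i])
			reaped++;
	}
	return (reaped);
}
```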
pjd
db793cba89 When logging to stdout/stderr, flush after each log.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 21:26:55 +00:00
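When stdout/stderr is redirected to a file or pipe, stdio buffers output, and an unflushed buffer can be lost on abnormal exit or copied into a child by fork(2) and printed twice. A minimal sketch of flushing after every message; log_line() is a hypothetical helper, not pjdlog's API:

```c
#define _POSIX_C_SOURCE 200809L

#include <stdarg.h>
#include <stdio.h>

/* Write one formatted message plus newline, then flush immediately. */
static void
log_line(FILE *fp, const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vfprintf(fp, fmt, ap);
	va_end(ap);
	fputc('\n', fp);
	fflush(fp);
}
```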
pjd
ae9ec59c50 Correct when we log interrupted synchronization.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 21:20:32 +00:00
pjd
e55034b622 Check if no signals were delivered just before going to sleep.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 20:49:06 +00:00
pjd
bd949b7dfc Add hooks execution.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 20:48:12 +00:00
pjd
4b6cfc055c Document new 'exec' parameter.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 15:20:31 +00:00
pjd
74741a8c60 Allow executing a specified program on various HAST events.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 15:16:52 +00:00
pjd
aeab5efe07 - Run hooks in background - don't block waiting for them to finish.
- Keep all hooks we're running in a global list, so we can report when
  they finish and also report when they are running for too long.

MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 14:38:12 +00:00
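A minimal sketch of running hooks in the background while tracking them in a global list, reaping finished ones with waitpid(2) and WNOHANG. It also frees the list record when fork(2) fails, the leak fixed in a later commit. The struct hook_proc layout and the hook_start()/hook_check() signatures are hypothetical, not hastd's hooks.c:

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical record of one running hook. */
struct hook_proc {
	pid_t			pid;
	time_t			started;
	struct hook_proc	*next;
};

static struct hook_proc *hooks;	/* global list of running hooks */

/* Start "sh -c cmd" without waiting for it. Returns the pid or -1. */
static pid_t
hook_start(const char *cmd)
{
	struct hook_proc *hp;
	pid_t pid;

	hp = malloc(sizeof(*hp));
	if (hp == NULL)
		return (-1);
	pid = fork();
	if (pid == -1) {
		free(hp);	/* don't leak the record on fork(2) failure */
		return (-1);
	}
	if (pid == 0) {
		execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
		_exit(127);
	}
	hp->pid = pid;
	hp->started = time(NULL);
	hp->next = hooks;
	hooks = hp;
	return (pid);
}

/*
 * Reap finished hooks without blocking and report each one via done().
 * Returns the number of hooks still running. A real implementation could
 * also warn here when time(NULL) - hp->started grows too large.
 */
static int
hook_check(void (*done)(pid_t, int))
{
	struct hook_proc **hpp, *hp;
	int status, running = 0;
	pid_t r;

	hpp = &hooks;
	while ((hp = *hpp) != NULL) {
		r = waitpid(hp->pid, &status, WNOHANG);
		if (r == hp->pid || (r == -1 && errno == ECHILD)) {
			if (r == hp->pid)
				done(hp->pid, status);
			*hpp = hp->next;
			free(hp);
			continue;
		}
		running++;
		hpp = &hp->next;
	}
	return (running);
}
```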
pjd
dd3961e615 When logging to stdout/stderr, don't close those descriptors after fork().
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 14:35:39 +00:00
pjd
ac5c9c9216 Reduce indent where possible.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 14:28:39 +00:00
pjd
ead19aaef1 Implement keepalive mechanism inside HAST protocol so we can detect secondary
node failures quickly for HAST resources that are rarely modified.

Remove XXX from a comment now that the guard thread never sleeps infinitely.

MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 14:26:37 +00:00
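The keepalive idea reduces to two time computations in the guard thread: whether a keepalive is due, and how long it may sleep before checking again (always a finite time). A minimal sketch; the 10-second KEEPALIVE_INTERVAL and both function names are assumptions, and the caller is assumed to track the time of the last successful send:

```c
#include <stdbool.h>
#include <time.h>

#define KEEPALIVE_INTERVAL	10	/* seconds; assumed value */

/* True if nothing has been sent for KEEPALIVE_INTERVAL seconds. */
static bool
keepalive_due(time_t last_send, time_t now)
{
	return (now - last_send >= KEEPALIVE_INTERVAL);
}

/* Seconds the guard thread may sleep before the next check. */
static time_t
guard_sleep_time(time_t last_send, time_t now)
{
	time_t left = KEEPALIVE_INTERVAL - (now - last_send);

	return (left > 0 ? left : 0);
}
```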
pjd
8729a28322 - Remove redundant and incorrect 'old' word from debug message.
- Log disconnects as warnings.

MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 14:12:53 +00:00
pjd
0a7a46d1e3 Don't increase the number of synchronized bytes in case of an error.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 14:10:25 +00:00
pjd
a3721f8d1b Log that synchronization was interrupted in a proper place.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 14:08:10 +00:00
pjd
b51d684000 We have a sync_start() function to start synchronization; introduce a sync_stop()
function to stop it.

MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 14:06:00 +00:00
pjd
79f0171a3e Add QUEUE_INSERT() and QUEUE_TAKE() macros that simplify the code a bit.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 14:01:28 +00:00
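Macros like these typically bundle the lock, list operation and condition variable that every producer and consumer would otherwise repeat. A minimal sketch using queue(3) TAILQ macros; struct req, the single global queue and these exact macro bodies are assumptions for illustration, not hastd's definitions:

```c
#include <pthread.h>
#include <sys/queue.h>

/* Hypothetical request type and queue. */
struct req {
	int		id;
	TAILQ_ENTRY(req) next;
};

static TAILQ_HEAD(, req) reqq = TAILQ_HEAD_INITIALIZER(reqq);
static pthread_mutex_t reqq_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t reqq_cond = PTHREAD_COND_INITIALIZER;

/* Append under the lock and wake one waiting consumer. */
#define QUEUE_INSERT(r) do {					\
	pthread_mutex_lock(&reqq_lock);				\
	TAILQ_INSERT_TAIL(&reqq, (r), next);			\
	pthread_cond_signal(&reqq_cond);			\
	pthread_mutex_unlock(&reqq_lock);			\
} while (0)

/* Sleep until the queue is non-empty, then remove its head into r. */
#define QUEUE_TAKE(r) do {					\
	pthread_mutex_lock(&reqq_lock);				\
	while (((r) = TAILQ_FIRST(&reqq)) == NULL)		\
		pthread_cond_wait(&reqq_cond, &reqq_lock);	\
	TAILQ_REMOVE(&reqq, (r), next);				\
	pthread_mutex_unlock(&reqq_lock);			\
} while (0)
```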
pjd
ef9c1a15b4 Add mtx_owned() implementation.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 13:58:38 +00:00
pjd
29f3bd82d2 Make comment more readable.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 13:54:17 +00:00
pjd
395a43623f For some setups sending data in 128kB chunks makes communication very slow. No
idea why. 32kB on the other hand seems to work properly everywhere.

Reported by:	Thomas Steen Rasmussen <thomas@gibfest.dk>
MFC after:	3 weeks
2010-08-18 12:09:27 +00:00
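The commit does not explain the slowdown, so the following only shows the mechanical change: capping each send at 32kB and looping, which also copes with short writes. The sendfn callback (standing in for, e.g., a wrapper around send(2)) and the MAX_SEND_SIZE name are hypothetical:

```c
#include <stddef.h>
#include <sys/types.h>

#define MAX_SEND_SIZE	32768	/* 32kB per send */

/*
 * Send len bytes through sendfn in pieces of at most MAX_SEND_SIZE bytes.
 * sendfn returns the number of bytes written or -1. Returns 0 or -1.
 */
static int
send_chunked(const unsigned char *buf, size_t len,
    ssize_t (*sendfn)(void *, const void *, size_t), void *arg)
{
	size_t off = 0, n;
	ssize_t done;

	while (off < len) {
		n = len - off;
		if (n > MAX_SEND_SIZE)
			n = MAX_SEND_SIZE;
		done = sendfn(arg, buf + off, n);
		if (done <= 0)
			return (-1);
		off += (size_t)done;	/* advance by what was actually sent */
	}
	return (0);
}
```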
pjd
46021d25fa The 'size' variable is there to limit how many bytes we want to copy from
'addr'. It is very likely that the size of 'addr' is larger than 'size', so checking
the strlcpy() return value is bogus.

MFC after:	3 weeks
2010-08-16 21:59:56 +00:00
joel
dd1fff9bcb Fix typos, spelling, formatting and mdoc mistakes found by Nobuyuki while
translating these manual pages.  Minor corrections by me.

Submitted by:	Nobuyuki Koganemaru <n-kogane@syd.odn.ne.jp>
2010-08-16 15:18:30 +00:00
pjd
d71ba1ed02 Document 'none' value for remote.
Reviewed by:	dougb
MFC after:	1 month
2010-08-05 19:54:57 +00:00