Commit Graph

368 Commits

Author SHA1 Message Date
Pawel Jakub Dawidek
a61f579394 Provides three states for pjdlog_initialized, so we can also tell that
this is fist initialization ever.

MFC after:	2 weeks
2011-03-07 10:33:52 +00:00
Pawel Jakub Dawidek
8cd3d45ad9 Allow to compress on-the-wire data using two algorithms:
- HOLE - it simply turns all-zero blocks into few bytes header;
	it is extremely fast, so it is turned on by default;
	it is mostly intended to speed up initial synchronization
	where we expect many zeros;
- LZF - very fast algorithm by Marc Alexander Lehmann, which shows
	very decent compression ratio and has BSD license.

MFC after:	2 weeks
2011-03-06 23:09:33 +00:00
Pawel Jakub Dawidek
1fee97b01f Allow to checksum on-the-wire data using either CRC32 or SHA256.
MFC after:	2 weeks
2011-03-06 22:56:14 +00:00
Pawel Jakub Dawidek
493812ee6e When we decide to unlink socket file, sun_path must be set. If it is set,
but there is problem unlinking the file, log a warning.

MFC after:	1 week
2011-02-09 08:01:10 +00:00
Pawel Jakub Dawidek
0d8d37212b Explicitly include <sys/types.h> as suggested by getpid(2) and don't rely on
<sys/un.h> including what's needed.

MFC after:	1 week
2011-02-08 23:16:19 +00:00
Pawel Jakub Dawidek
f431ab182a Unlink UNIX domain socket file only if:
1. The descriptor is the one we are listening on (not the one when we connect
   as a client and not the one which is created on accept(2)).
2. Descriptor was created by us (PID matches with the PID stored on bind(2)).

Reported by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	1 week
2011-02-08 23:08:20 +00:00
Pawel Jakub Dawidek
e84a29b629 Now that we break the loop on fstat(2) failure we no longer need to satisfy
gcc's imperfections.

MFC after:	1 week
2011-02-06 14:17:08 +00:00
Pawel Jakub Dawidek
207ee3cdea Add (void) cast before snprintf(3)s for which we are not interested in return
values.

MFC after:	1 week
2011-02-06 14:09:19 +00:00
Pawel Jakub Dawidek
ee3a876c18 Treat fstat(2) failure (different than EBADF) as fatal error.
Reported by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	1 week
2011-02-06 14:07:58 +00:00
Pawel Jakub Dawidek
18d6e1a5f6 Open syslog when logging sysconf(3) failure.
Reported by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	1 week
2011-02-06 14:06:37 +00:00
Pawel Jakub Dawidek
5aa85abd1d Close more descriptors that can be open if the worker process for the given
resource is already running.

Submitted by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	1 week
2011-02-06 12:21:29 +00:00
Pawel Jakub Dawidek
32ecf62028 Setup another socketpair between parent and child, so that primary sandboxed
worker can ask the main privileged process to connect in worker's behalf
and then we can migrate descriptor using this socketpair to worker.
This is not really needed now, but will be needed once we start to use
capsicum for sandboxing.

MFC after:	1 week
2011-02-03 11:39:49 +00:00
Pawel Jakub Dawidek
21e7bc5e52 Add missing locking after moving keepalive_send() to remote send thread
in r214692.

MFC after:	1 week
2011-02-03 11:33:32 +00:00
Pawel Jakub Dawidek
f4c96f944c Let the caller log info about successful privilege drop.
We don't want to log this in hastctl.

MFC after:	1 week
2011-02-03 10:37:44 +00:00
Pawel Jakub Dawidek
01ab52c021 - Rename proto_descriptor_{send,recv}() functions to
proto_connection_{send,recv} and change them to return proto_conn
  structure. We don't operate directly on descriptors, but on
  proto_conns.
- Add wrap method to wrap descriptor with proto_conn.
- Remove methods to send and receive descriptors and implement this
  functionality as additional argument to send and receive methods.

MFC after:	1 week
2011-02-02 15:53:09 +00:00
Pawel Jakub Dawidek
1c1933226f Add proto_connect_wait() to wait for connection to finish.
If timeout argument to proto_connect() is -1, then the caller needs to use
this new function to wait for connection.

This change is in preparation for capsicum, where sandboxed worker wants
to ask main process to connect in worker's behalf and pass descriptor
to the worker. Because we don't want the main process to wait for the
connection, it will start async connection and pass descriptor to the
worker who will be responsible for waiting for the connection to finish.

MFC after:	1 week
2011-02-02 15:46:28 +00:00
Pawel Jakub Dawidek
9d70b24b93 Allow to specify connection timeout by the caller.
MFC after:	1 week
2011-02-02 15:42:00 +00:00
Pawel Jakub Dawidek
5ee1703532 Move protocol allocation and deallocation to separate functions.
MFC after:	1 week
2011-02-02 15:23:07 +00:00
Pawel Jakub Dawidek
8dd94e231b Be prepared that hp_client or hp_server might be NULL now.
MFC after:	1 week
2011-02-02 08:24:26 +00:00
Pawel Jakub Dawidek
292c424d6e Do not set socket send and receive buffer. It will be auto-tuned.
Confirmed by:	rwatson
MFC after:	1 week
2011-02-01 07:58:43 +00:00
Pawel Jakub Dawidek
94486ae22d Fix build on ia64.
I found no way how to use CMSG_NXTHDR() macro on ia64 without alignment
warnings.

MFC after:	1 week
2011-01-31 23:46:36 +00:00
Pawel Jakub Dawidek
2c450cb873 Until I fix the build on ia64 comment out problematic lines.
Those lines are part of the (for now) unused functions.
2011-01-31 23:08:26 +00:00
Pawel Jakub Dawidek
8046c499ab Implement two new functions for sending descriptor and receving descriptor
over UNIX domain sockets and socket pairs.
This is in preparation for capsicum.

MFC after:	1 week
2011-01-31 18:35:17 +00:00
Pawel Jakub Dawidek
2ec483c58e - Use pjdlog for assertions and aborts as this will log assert/abort message
to syslog if we run in background.
- Asserts in proto.c that method we want to call is implemented and remove
  dummy methods from protocols implementation that are only there to abort
  the program with nice message.

MFC after:	1 week
2011-01-31 18:32:17 +00:00
Pawel Jakub Dawidek
05a6b8de87 Rename pjdlog_verify() to pjdlog_abort() as it better describes what the
the function does and mark it with __dead2.

MFC after:	1 week
2011-01-31 15:52:00 +00:00
Pawel Jakub Dawidek
6d7967de8a Drop privileges in worker processes.
Accepting connections and handshaking in secondary is still done before
dropping privileges. It should be implemented by only accepting connections in
privileged main process and passing connection descriptors to the worker, but
is not implemented yet.

MFC after:	1 week
2011-01-28 22:35:46 +00:00
Pawel Jakub Dawidek
49499e981e Implement function that drops privileges by:
- chrooting to /var/empty (user hast home directory),
- setting groups to 'hast' (user hast primary group),
- setting real group id, effective group id and saved group id to 'hast',
- setting real user id, effective user id and saved user id to 'hast'.
At the end verify that those operations where successfull.

MFC after:	1 week
2011-01-28 22:33:47 +00:00
Pawel Jakub Dawidek
f463896e5e Use newly added descriptors_assert() function to ensure only expected
descriptors are open.

MFC after:	1 week
2011-01-28 21:57:42 +00:00
Pawel Jakub Dawidek
579fd4b2ff Add function to assert that the only descriptors we have open are the ones
we expect to be open. Also assert that they point at expected type.

Because openlog(3) API is unable to tell us descriptor number it is using, we
have to close syslog socket, remember assert message in local buffer and if we
fail on assertion, reopen syslog socket and log the message.

MFC after:	1 week
2011-01-28 21:56:47 +00:00
Pawel Jakub Dawidek
da1783ea29 Close all unneeded descriptors after fork(2).
MFC after:	1 week
2011-01-28 21:52:37 +00:00
Pawel Jakub Dawidek
d64c0992e4 Add comments to places where we treat errors as ciritical, but it is possible
to handle them more gracefully.

MFC after:	1 week
2011-01-28 21:51:40 +00:00
Pawel Jakub Dawidek
c3c56f8e41 Add function to close all unneeded descriptors after fork(2).
MFC after:	1 week
2011-01-28 21:48:15 +00:00
Pawel Jakub Dawidek
70db96bf67 Initialize all global variables on pjdlog_init().
MFC after:	1 week
2011-01-28 21:36:01 +00:00
Pawel Jakub Dawidek
19654a238e Remember created control connection so on fork(2) we can close it in child.
Found with:	procstat(1)
MFC after:	1 week
2011-01-27 19:33:57 +00:00
Pawel Jakub Dawidek
c0dbce0016 Close the control socket before exiting, so it will be unlinked.
MFC after:	1 week
2011-01-27 19:31:35 +00:00
Pawel Jakub Dawidek
94bf851dc1 Extend pjdlog_verify() to support the following additional macros:
PJDLOG_RVERIFY() - always check expression and on false log the given message
	and exit.
PJDLOG_RASSERT() - check expression when NDEBUG is not defined and on false log
	given message and exit.
PJDLOG_ABORT() - log the given message and exit.

MFC after:	1 week
2011-01-27 19:28:29 +00:00
Pawel Jakub Dawidek
eeb3cd677d Add functions to initialize/finalize pjdlog. This allows to open/close log
file at will.

MFC after:	1 week
2011-01-27 19:24:07 +00:00
Pawel Jakub Dawidek
6ef7ddd788 Use my copyright for 2011 work.
MFC after:	1 week
2011-01-27 19:18:42 +00:00
Pawel Jakub Dawidek
c62457374f Add LOG_NDELAY flag to openlog(3) - we want descriptor to be immediately open
so there are no surprises once we start chrooting or using capsicum.

MFC after:	1 week
2011-01-27 19:15:25 +00:00
Pawel Jakub Dawidek
c1410d7a90 - Remove obvious NOTREACHED comment after abort() call.
- Remove redundant newline at the end of the file.

MFC after:	1 week
2011-01-27 19:12:44 +00:00
Pawel Jakub Dawidek
6062588f8d Remove __dead2 from pjdlog_verify() prototype, it does return sometimes.
MFC after:	1 week
2011-01-27 19:10:24 +00:00
Pawel Jakub Dawidek
115f4e5c3e Don't open configuration file from worker process. Handle SIGHUP in the
master process only and pass changes to the worker processes over control
socket. This removes access to global namespace in preparation for capsicum
sandboxing.

MFC after:	2 weeks
2011-01-24 15:04:15 +00:00
Pawel Jakub Dawidek
79e82fe290 Add missing logs.
MFC after:	1 week
2011-01-22 23:30:01 +00:00
Pawel Jakub Dawidek
eed4e65fdb Add nv_assert() which allows to assert that the given name exists.
MFC after:	1 week
2011-01-22 22:38:18 +00:00
Pawel Jakub Dawidek
09d6ae1b34 Use more consistent function name with the others (pjdlogv_prefix_set()
instead of pjdlog_prefix_setv()).

MFC after:	1 week
2011-01-22 22:35:08 +00:00
Pawel Jakub Dawidek
911a2aa37a Use int16 for error.
MFC after:	1 week
2011-01-22 22:33:27 +00:00
Pawel Jakub Dawidek
5ed118d861 - On primary worker reload, update hr_exec field.
- Update comment.

MFC after:	1 week
2011-01-22 22:31:55 +00:00
Pawel Jakub Dawidek
ac7b0b09f3 execve(2), not fork(2) resets signal handler to the default value (if it isn't
ignored). Correct comment talking about that.

Pointed out by:	kib
MFC after:	3 days
2011-01-12 16:16:54 +00:00
Pawel Jakub Dawidek
bcaa0b6789 Add a note that when custom signal handler is installed for a signal,
signal action is restored to default in child after fork(2).
In this case there is no need to do anything with dummy SIGCHLD handler,
because after fork(2) it will be automatically reverted to SIG_IGN.

Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
MFC after:	3 days
2011-01-12 14:38:17 +00:00
Pawel Jakub Dawidek
9cc97e5803 Install default signal handlers before masking signals we want to handle.
It is possible that the parent process ignores some of them and sigtimedwait()
will never see them, eventhough they are masked.

The most common situation for this to happen is boot process where init(8)
ignores SIGHUP before starting to execute /etc/rc. This in turn caused
hastd(8) to ignore SIGHUP.

Reported by:	trasz
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
MFC after:	3 days
2011-01-12 14:35:29 +00:00
Pawel Jakub Dawidek
a7130d73a6 Detect when resource is configured more than once.
MFC after:	3 days
2010-12-26 19:08:41 +00:00
Pawel Jakub Dawidek
66db33a13b When node-specific configuration is missing in resource section, provide
more useful information. Instead of:

	hastd: remote address not configured for resource foo

Print the following:

	No resource foo configuration for this node (acceptable node names: freefall, freefall.freebsd.org, 44333332-4c44-4e31-4a30-313920202020).

MFC after:	3 days
2010-12-26 19:07:58 +00:00
Pawel Jakub Dawidek
fba1bf5a2c The 'ret' variable is of type ssize_t and we use proper format for it (%zd), so
no (bogus) cast is needed.

MFC after:	3 days
2010-12-16 19:48:03 +00:00
Pawel Jakub Dawidek
cd7b7ee577 Improve problems logging.
MFC after:	3 days
2010-12-16 07:30:47 +00:00
Pawel Jakub Dawidek
7208920499 Don't ignore errors from remote requests.
MFC after:	3 days
2010-12-16 07:29:58 +00:00
Pawel Jakub Dawidek
347bde360a Log the fact of launching and include protocol version number.
MFC after:	3 days
2010-12-16 07:28:40 +00:00
Rebecca Cran
e267ef95d5 Don't generate input() since it's not used. 2010-11-22 14:16:22 +00:00
Pawel Jakub Dawidek
d448536ceb Move timeout.tv_sec initialization outside the loop - sigtimedwait(2) won't
modify it.

Submitted by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	3 days
2010-11-15 03:07:42 +00:00
Pawel Jakub Dawidek
1dd5a4bfa2 1. Exit when we cannot create incoming connection.
2. Improve logging to inform which connection can't be created.

Submitted by:	[1] Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	3 days
2010-11-15 03:05:33 +00:00
Pawel Jakub Dawidek
448efa9421 Send packets to remote node only via the send thread to avoid possible
races - in this case a keepalive packet was send from wrong thread which
lead to connection dropping, because of corrupted packet.

Fix it by sending keepalive packets directly from the send thread.
As a bonus we now send keepalive packets only when connection is idle.

Submitted by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	3 days
2010-11-02 22:13:08 +00:00
Pawel Jakub Dawidek
ce837469ba Before this change on first connect between primary and secondary we
initialize all the data. This is huge waste of time and resources if
there were no writes yet, as there is no real data to synchronize.

Optimize this by sending "virgin" argument to secondary, which gives it a hint
that synchronization is not needed.

In the common case (where noth nodes are configured at the same time) instead
of synchronizing everything, we don't synchronize at all.

MFC after:	1 week
2010-10-24 17:28:25 +00:00
Pawel Jakub Dawidek
b9ffbb0a94 Implement nv_exists() function that returns true if argument of the given
name exists.

MFC after:	3 days
2010-10-24 17:24:08 +00:00
Pawel Jakub Dawidek
3dea75d2a8 Move all NV defines into nv.c, they are not used externally thus there is
no need to make then visible from outside.

MFC after:	3 days
2010-10-24 17:22:34 +00:00
Pawel Jakub Dawidek
1f39b27946 Simplify code a bit.
MFC after:	3 days
2010-10-24 15:44:23 +00:00
Pawel Jakub Dawidek
d7be7905ae Plug memory leak.
MFC after:	3 days
2010-10-24 15:42:16 +00:00
Pawel Jakub Dawidek
584a9bc3f8 Plug memory leaks.
Found with:	valgrind
MFC after:	3 days
2010-10-24 15:41:23 +00:00
Pawel Jakub Dawidek
2964aeb34a Load geom_gate.ko module after parsing arguments.
MFC after:	3 days
2010-10-24 15:38:58 +00:00
Pawel Jakub Dawidek
6c71649c5f Use closefrom(2) instead of close(2) in a loop.
MFC after:	1 week
2010-10-20 21:10:01 +00:00
Pawel Jakub Dawidek
3f562cce40 Log correct connection when canceling half-open connection.
Submitted by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	3 days
2010-10-17 15:47:27 +00:00
Pawel Jakub Dawidek
bb317aa6ea Use one fprintf() instead of two.
MFC after:	3 days
2010-10-16 22:50:12 +00:00
Pawel Jakub Dawidek
c0a124e6ce Clear signal mask before executing a hook.
Submitted by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	3 days
2010-10-16 22:48:48 +00:00
Pawel Jakub Dawidek
51c63dce86 We can't zero out ggio request, as we have some fields in there we initialize
once during start-up.

Reported by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	3 days
2010-10-08 15:05:39 +00:00
Pawel Jakub Dawidek
022f07b682 We close the event socketpair early in the mainloop to prevent spaming with
error messages, so when we clean up after child process, we have to check if
the event socketpair is still there.

Submitted by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	3 days
2010-10-08 15:02:15 +00:00
Pawel Jakub Dawidek
4e47b646bb Clear ggate structures before using them. We don't initialize all the field
and there can be some garbage from the stack.

MFC after:	1 week
2010-10-07 18:23:28 +00:00
Pawel Jakub Dawidek
783ee75392 Log error message when we fail to destroy ggate provider.
MFC after:	3 days
2010-10-07 18:20:16 +00:00
Pawel Jakub Dawidek
4a88128b01 Start the guard thread first, so we can handle signals from the very begining.
Reported by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	1 week
2010-10-07 18:19:02 +00:00
Pawel Jakub Dawidek
b46198a5db Don't close local component on exit as we can hang waiting on g_waitidle.
I'm unable to reproduce the race described in comment anymore and also the
comment is incorrect - localfd represents local component from configuration
file, eg. /dev/da0 and not HAST provider.

Reported by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	1 week
2010-10-07 18:16:22 +00:00
Pawel Jakub Dawidek
428ad0a9c4 Decrease report interval to 5 seconds, as this also means we will check for
signals every 5 seconds and not every 10 seconds as before.

MFC after:	3 days
2010-10-04 21:44:26 +00:00
Pawel Jakub Dawidek
5f24b330df hook_check() is now only used to report about long-running hooks, so the
argument is redundant, remove it.

MFC after:	3 days
2010-10-04 21:43:06 +00:00
Pawel Jakub Dawidek
41013c0b21 We can't mask ignored signal, so install dummy signal hander for SIGCHLD before
masking it.

This fixes bogus reports about hooks running for too long and other problems
related to garbage-collecting child processes.

Reported by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	3 days
2010-10-04 21:41:18 +00:00
Pawel Jakub Dawidek
b71de2e057 Plug memory leak on fork(2) failure.
Submitted by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	3 days
2010-09-26 10:39:01 +00:00
Pawel Jakub Dawidek
9dd5a6cb0f Switch to sigprocmask(2) API also in the main process and secondary process.
This way the primary process inherits signal mask from the main process,
which fixes a race where signal is delivered to the primary process before
configuring signal mask.

Reported by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	3 days
2010-09-22 19:08:11 +00:00
Pawel Jakub Dawidek
196abd3518 Assert that descriptor numbers are sane.
MFC after:	3 days
2010-09-22 19:05:54 +00:00
Pawel Jakub Dawidek
8b70e6ae9c Fix possible deadlock where worker process sends an event to the main process
while the main process sends control message to the worker process, but worker
process hasn't started control thread yet, because it waits for reply from the
main process.

The fix is to start the control thread before sending any events.

Reported and fix suggested by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	3 days
2010-09-22 19:03:11 +00:00
Pawel Jakub Dawidek
0c24d8e2a1 Fix descriptor leaks: when child exits, we have to close control and event
socket pairs. We did that only in one case out of three.

MFC after:	3 days
2010-09-22 18:57:06 +00:00
Pawel Jakub Dawidek
c56cf19ebf If we are unable to receive control message is most likely because the main
process died. Instead of entering infinite loop, terminate.

MFC after:	3 days
2010-09-22 18:39:43 +00:00
Pawel Jakub Dawidek
351b9a37a4 Sort includes.
MFC after:	3 days
2010-09-22 18:38:02 +00:00
Pawel Jakub Dawidek
e43e02f1a4 Add __dead2 to functions that we know they are going to exit.
MFC after:	3 days
2010-09-20 13:23:43 +00:00
Pawel Jakub Dawidek
6d19256b15 Include process PID in log messages.
Submitted by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	2 weeks
2010-08-31 12:05:13 +00:00
Pawel Jakub Dawidek
8ecdeae9d9 Correct error message.
Submitted by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	2 weeks
2010-08-31 12:03:29 +00:00
Pawel Jakub Dawidek
71c895eb1f Forgot to add event.c and event.h in r212038.
Pointed out by:	pluknet <pluknet@gmail.com>
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-31 09:38:43 +00:00
Pawel Jakub Dawidek
852ac373cb Mask only those signals that we want to handle.
Suggested by:	jilles
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-31 06:22:03 +00:00
Pawel Jakub Dawidek
5bdff860e7 Because it is very hard to make fork(2) from threaded process safe (we are
limited to async-signal safe functions in the child process), move all hooks
execution to the main (non-threaded) process.

Do it by maintaining connection (socketpair) between child and parent
and sending events from the child to parent, so it can execute the hook.

This is step in right direction for others reasons too. For example there is
one less problem to drop privs in worker processes.

MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-30 23:26:10 +00:00
Pawel Jakub Dawidek
6b276294af We only want to know if descriptors are ready for reading.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-30 23:19:21 +00:00
Pawel Jakub Dawidek
eea2deaad0 When someone gives NULL as data, assume this is because he want to declare
connection side only.

MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-30 23:16:45 +00:00
Pawel Jakub Dawidek
6be3a25c85 Use pjdlog_exit() before fork().
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-30 22:28:04 +00:00
Pawel Jakub Dawidek
b938cdcc9b Constify arguments we can constify.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-30 22:26:42 +00:00
Pawel Jakub Dawidek
5b41e64486 Execute hook when connection between the nodes is established or lost.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-30 00:31:30 +00:00
Pawel Jakub Dawidek
2be8fd75ff Execute hook when split-brain is detected.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-30 00:12:10 +00:00
Pawel Jakub Dawidek
6d0c801ea9 Use sigtimedwait(2) for signals handling in primary process.
This fixes various races and eliminates use of pthread* API in signal handler.

Pointed out by:	kib
With help from:	jilles
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-30 00:06:05 +00:00
Pawel Jakub Dawidek
ff6bb1f8b3 - Move functionality responsible for checking one connection to separate
function to make code more readable.
- Be sure not to reconnect too often in case of signal delivery, etc.

MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-29 22:55:21 +00:00
Pawel Jakub Dawidek
ee087cdf97 Disconnect after logging errors.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-29 22:17:53 +00:00
Pawel Jakub Dawidek
a870e771b9 - Call hook on role change.
- Document new event.

MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-29 21:42:45 +00:00
Pawel Jakub Dawidek
ecc99c890e Allow to run hooks from the main hastd process.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-29 21:41:53 +00:00
Pawel Jakub Dawidek
25ec2e3e2b - Add hook_fini() which should be called after fork() from the main hastd
process, once it start to use hooks.
- Add hook_check_one() in case the caller expects different child processes
  and once it can recognize it, it will pass pid and status to hook_check_one().

MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-29 21:39:49 +00:00
Pawel Jakub Dawidek
572cdb2216 Implement mtx_destroy() and rw_destroy().
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-29 21:37:21 +00:00
Pawel Jakub Dawidek
5da2320932 When SIGTERM or SIGINT is received, terminate worker processes.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 21:28:02 +00:00
Pawel Jakub Dawidek
4767ee29f1 When logging to stdout/stderr, flush after each log.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 21:26:55 +00:00
Pawel Jakub Dawidek
b9cf0cf5fa Correct when we log interrupted synchronization.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 21:20:32 +00:00
Pawel Jakub Dawidek
eba09893fd Check if no signals were delivered just before going to sleep.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 20:49:06 +00:00
Pawel Jakub Dawidek
01125a9381 Add hooks execution.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 20:48:12 +00:00
Pawel Jakub Dawidek
ac59403c39 Document new 'exec' parameter.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 15:20:31 +00:00
Pawel Jakub Dawidek
0becad39a7 Allow to execute specified program on various HAST events.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 15:16:52 +00:00
Pawel Jakub Dawidek
1cdaf10c45 - Run hooks in background - don't block waiting for them to finish.
- Keep all hooks we're running in a global list, so we can report when
  they finish and also report when they are running for too long.

MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 14:38:12 +00:00
Pawel Jakub Dawidek
e64887c4d6 When logging to stdout/stderr don't close those descriptors after fork().
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 14:35:39 +00:00
Pawel Jakub Dawidek
3f828c18e5 Reduce indent where possible.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 14:28:39 +00:00
Pawel Jakub Dawidek
f7fe83f9f8 Implement keepalive mechanism inside HAST protocol so we can detect secondary
node failures quickly for HAST resources that are rarely modified.

Remove XXX from a comment now that the guard thread never sleeps infinitely.

MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 14:26:37 +00:00
Pawel Jakub Dawidek
8f8c798c13 - Remove redundant and incorrect 'old' word from debug message.
- Log disconnects as warnings.

MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 14:12:53 +00:00
Pawel Jakub Dawidek
e23d2d0187 Don't increase number synchronized bytes in case of an error.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 14:10:25 +00:00
Pawel Jakub Dawidek
53d9b386eb Log that synchronization was interrupted in a proper place.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 14:08:10 +00:00
Pawel Jakub Dawidek
55ce1e7c8b We have sync_start() function to start synchronization, introduce sync_stop()
function to stop it.

MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 14:06:00 +00:00
Pawel Jakub Dawidek
16bd7026a2 Add QUEUE_INSERT() and QUEUE_TAKE() macros that simplify the code a bit.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 14:01:28 +00:00
Pawel Jakub Dawidek
6e5f008ac4 Add mtx_owned() implementation.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 13:58:38 +00:00
Pawel Jakub Dawidek
7087d13fae Make comment more readable.
MFC after:	2 weeks
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
2010-08-27 13:54:17 +00:00
Pawel Jakub Dawidek
28df1f238a For some setups sending data in 128kB chunks makes communication very slow. No
idea why. 32kB on the other hand seems to work properly everywhere.

Reported by:	Thomas Steen Rasmussen <thomas@gibfest.dk>
MFC after:	3 weeks
2010-08-18 12:09:27 +00:00
Pawel Jakub Dawidek
471bb09914 The 'size' variable is there to limit how many bytes we want to copy from
'addr'. It is very likely that size of 'addr' is larger than 'size', so checking
strlcpy() return value is bogus.

MFC after:	3 weeks
2010-08-16 21:59:56 +00:00
Joel Dahl
c2025a7660 Fix typos, spelling, formatting and mdoc mistakes found by Nobuyuki while
translating these manual pages.  Minor corrections by me.

Submitted by:	Nobuyuki Koganemaru <n-kogane@syd.odn.ne.jp>
2010-08-16 15:18:30 +00:00
Pawel Jakub Dawidek
44d63cff2e Document 'none' value for remote.
Reviewed by:	dougb
MFC after:	1 month
2010-08-05 19:54:57 +00:00
Pawel Jakub Dawidek
0989854d45 Implement configuration reload on SIGHUP. This includes:
- Load added resources.
- Stop and forget removed resources.
- Update modified resources in least intrusive way, ie. don't touch
  /dev/hast/<name> unless path to local component or provider name were
  modified.

Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
MFC after:	1 month
2010-08-05 19:16:31 +00:00
Pawel Jakub Dawidek
bbbb114cda Prepare configuration parsing code to be called multiple times:
- Don't exit on errors if not requested.
- Don't keep configuration in global variable, but allocate memory for
  configuration.
- Call yyrestart() before yyparse() so that on error in configuration file
  we will start from the begining next time and not from the place we left of.

MFC after:	1 month
2010-08-05 19:08:54 +00:00
Pawel Jakub Dawidek
a00829bb71 Make control_set_role() more public. We will need it soon.
MFC after:	1 month
2010-08-05 19:04:29 +00:00
Pawel Jakub Dawidek
f377917cdc Allow to use 'none' keywork as remote address in case second cluster node
is not setup yet.

MFC after:	1 month
2010-08-05 19:01:57 +00:00
Pawel Jakub Dawidek
a2ef0636b4 Reset signal handlers after fork().
MFC after:	1 month
2010-08-05 18:58:00 +00:00
Pawel Jakub Dawidek
005f438bf5 - Use pjdlog_exitx() to log errors and exit instead of errx().
- Use 'unable to' (instead of 'cannot') consistently.

MFC after:	1 month
2010-08-05 18:56:24 +00:00
Pawel Jakub Dawidek
2c5dadc9cf Assert that various buffers we are large enough.
MFC after:	1 month
2010-08-05 18:27:41 +00:00
Pawel Jakub Dawidek
524840d8d0 Problem with assertion is that it logs on stderr. Add two macros:
PJDLOG_ASSERT() and PJDLOG_VERIFY() that will check the given condition
and log the problem where appropriate. The difference between those
two is that PJDLOG_VERIFY() always work and PJDLOG_ASSERT() can be
turned off by defining NDEBUG.

MFC after:	1 month
2010-08-05 18:26:38 +00:00
Pawel Jakub Dawidek
6b97e48326 Keep $FreeBSD$ in __FBSDID() only for C files.
MFC after:	1 month
2010-08-05 18:23:43 +00:00
Pawel Jakub Dawidek
e3031161eb Mark two more places that we won't reach.
MFC after:	1 month
2010-08-05 18:21:45 +00:00
Pawel Jakub Dawidek
9bf24e1a00 Now that TCP will be checked last we don't need any knowledge about other
protocols.

MFC after:	1 month
2010-08-05 17:57:59 +00:00
Pawel Jakub Dawidek
50692f84c6 Add an argument to the proto_register() function which allows protocol to
declare it is the default and be placed at the end of the queue so it is
checked last.

MFC after:	1 month
2010-08-05 17:56:41 +00:00
Joel Dahl
a53bb70bda Spelling fixes. 2010-07-31 21:09:49 +00:00
Pawel Jakub Dawidek
1a517c3ec5 Actually, only the fullsync mode is implemented, not memsync mode.
Correct manual page.

MFC after:	3 days
2010-07-22 08:30:14 +00:00
Pawel Jakub Dawidek
f3bd74124a Correct various log messages.
Submitted by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	3 days
2010-06-14 21:46:48 +00:00
Pawel Jakub Dawidek
96610dd9b6 Fix typos.
MFC after:	3 days
2010-06-14 21:44:58 +00:00
Pawel Jakub Dawidek
328e0f4b04 Initialize gctl_seq for synchronization requests.
Reported by:	hiroshi@soupacific.com
Analysed by:	Mikolaj Golub <to.my.trociny@gmail.com>
Tested by:	hiroshi@soupacific.com, Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	3 days
2010-06-14 21:44:20 +00:00
Pawel Jakub Dawidek
7a716e072e Plug memory leak.
Found by:	Coverity Prevent
CID:		7057
MFC after:	3 days
2010-06-14 21:41:22 +00:00
Pawel Jakub Dawidek
b0dfbe5b27 Plug memory leak.
Found by:	Coverity Prevent
CID:		7056
MFC after:	3 days
2010-06-14 21:37:25 +00:00
Pawel Jakub Dawidek
9f31eddba0 Plug memory leak.
Found by:	Coverity Prevent
CID:		7051
MFC after:	3 days
2010-06-14 21:33:18 +00:00
Pawel Jakub Dawidek
6744284aec Plug memory leaks.
Found by:	Coverity Prevent
CID:		7052, 7053, 7054, 7055
MFC after:	3 days
2010-06-14 21:25:20 +00:00
Pawel Jakub Dawidek
9fab3c1b94 Remove macros that are not really needed. The idea was to have them in case
we grow more descriptors, but I'll reconsider readding them once we get there.

Passing (a = b) expression to FD_ISSET() is bad idea, as FD_ISSET() evaluates
its argument twice.

Found by:	Coverity Prevent
CID:		5243
MFC after:	3 days
2010-06-14 21:18:58 +00:00
Pawel Jakub Dawidek
a58b195e35 Eliminate dead code.
Found by:	Coverity Prevent
CID:		5158
MFC after:	3 days
2010-06-14 21:01:13 +00:00
Ulrich Spörlein
0b31f1f731 mdoc: move remaining sections into consistent order
This pertains mostly to FILES, HISTORY, EXIT STATUS and AUTHORS sections.

Found by:	mdocml lint run
Reviewed by:	ru
2010-05-13 12:08:11 +00:00
Pawel Jakub Dawidek
d92714eea8 Default connection timeout is way too long. To make it shorter we have to
make socket non-blocking, connect() and if we get EINPROGRESS, we have to
wait using select(). Very complex, but I know no other way to define
connection timeout for a given socket.

Reported by:	hiroshi@soupacific.com
MFC after:	3 days
2010-04-29 21:55:20 +00:00
Pawel Jakub Dawidek
c6ddcbe009 - Check if the worker process was killed by signal and restart it.
- Improve logging.

Pointed out by:	Garrett Cooper <yanefbsd@gmail.com>
MFC after:	3 days
2010-04-29 15:42:24 +00:00
Pawel Jakub Dawidek
5571414ca8 Fix a problem where hastd will stuck in recv(2) after sending request to
secondary, which died between send(2) and recv(2). Do it by adding timeout
to recv(2) for primary incoming and outgoing sockets and secondary outgoing
socket.

Reported by:	Mikolaj Golub <to.my.trociny@gmail.com>
Tested by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	3 days
2010-04-29 15:36:32 +00:00
Pawel Jakub Dawidek
83a5671405 Restart worker thread only if the problem was temporary.
In case of persistent problem we don't want to loop forever.

MFC after:	3 days
2010-04-28 22:41:06 +00:00
Pawel Jakub Dawidek
5abfc9c145 Mark temporary issues as such.
MFC after:	3 days
2010-04-28 22:39:47 +00:00
Pawel Jakub Dawidek
06c117d1d1 Use WEXITSTATUS() to obtain real exit code.
MFC after:	3 days
2010-04-28 22:26:30 +00:00
Pawel Jakub Dawidek
1228041bd5 Don't assume that "resource" property is in metadata.
Reported by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	3 days
2010-04-28 22:23:29 +00:00
Pawel Jakub Dawidek
36df4f8d05 Fix compilation with WITHOUT_CRYPT or WITHOUT_OPENSSL options.
Reported by:	Andrei V. Lavreniyuk <andy.lavr@reactor-xg.kiev.ua>
MFC after:	3 days
2010-04-22 19:18:10 +00:00
Pawel Jakub Dawidek
20ec52dc4b Fix log size calculation which caused message truncation.
Submitted by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	3 days
2010-04-16 06:49:12 +00:00
Pawel Jakub Dawidek
09398e9bd4 Fix control socket leak when worker process exits.
Submitted by:	Mikolaj Golub <to.my.trociny@gmail.com>
MFC after:	3 days
2010-04-16 06:47:29 +00:00
Pawel Jakub Dawidek
20b77db949 Increase ggate queue size to maximum value.
HAST was not able to stand heavy random load.

Reported by:	Hiroyuki Yamagami
MFC after:	3 days
2010-04-15 17:04:08 +00:00
Pawel Jakub Dawidek
0d9014f354 Don't hold connection lock when doing reconnects as it makes I/Os wait for
connection timeouts.

Reported by:	Kevin Day <toasty@dragondata.com>
2010-03-27 16:35:07 +00:00
Ulrich Spörlein
7729e3ba40 Remove redundant WARNS?=6 overrides and inherit the WARNS setting from
the toplevel directory.

This does not change any WARNS level and survives a make universe.

Approved by:        ed (co-mentor)
2010-03-02 18:44:08 +00:00
Ruslan Ermilov
c59ee18a21 Fixed static linkage. 2010-02-26 09:41:16 +00:00
Pawel Jakub Dawidek
2e1facf96f Changing proto_socketpair.c compilation and linking order revealed
a problem - we should simply ignore proto_server() if address
doesn't start with socketpair://, and not abort.
2010-02-21 19:56:47 +00:00
Pawel Jakub Dawidek
32115b105a Please welcome HAST - Highly Avalable Storage.
HAST allows to transparently store data on two physically separated machines
connected over the TCP/IP network. HAST works in Primary-Secondary
(Master-Backup, Master-Slave) configuration, which means that only one of the
cluster nodes can be active at any given time. Only Primary node is able to
handle I/O requests to HAST-managed devices. Currently HAST is limited to two
cluster nodes in total.

HAST operates on block level - it provides disk-like devices in /dev/hast/
directory for use by file systems and/or applications. Working on block level
makes it transparent for file systems and applications. There in no difference
between using HAST-provided device and raw disk, partition, etc. All of them
are just regular GEOM providers in FreeBSD.

For more information please consult hastd(8), hastctl(8) and hast.conf(5)
manual pages, as well as http://wiki.FreeBSD.org/HAST.

Sponsored by:	FreeBSD Foundation
Sponsored by:	OMCnet Internet Service GmbH
Sponsored by:	TransIP BV
2010-02-18 23:16:19 +00:00