freebsd-dev

Author	SHA1	Message	Date
Pawel Jakub Dawidek	584a9bc3f8	Plug memory leaks. Found with: valgrind MFC after: 3 days	2010-10-24 15:41:23 +00:00
Pawel Jakub Dawidek	51c63dce86	We can't zero out ggio request, as we have some fields in there we initialize once during start-up. Reported by: Mikolaj Golub <to.my.trociny@gmail.com> MFC after: 3 days	2010-10-08 15:05:39 +00:00
Pawel Jakub Dawidek	4e47b646bb	Clear ggate structures before using them. We don't initialize all the fields and there can be some garbage from the stack. MFC after: 1 week	2010-10-07 18:23:28 +00:00
Pawel Jakub Dawidek	783ee75392	Log error message when we fail to destroy ggate provider. MFC after: 3 days	2010-10-07 18:20:16 +00:00
Pawel Jakub Dawidek	4a88128b01	Start the guard thread first, so we can handle signals from the very beginning. Reported by: Mikolaj Golub <to.my.trociny@gmail.com> MFC after: 1 week	2010-10-07 18:19:02 +00:00
Pawel Jakub Dawidek	b46198a5db	Don't close local component on exit as we can hang waiting on g_waitidle. I'm unable to reproduce the race described in comment anymore and also the comment is incorrect - localfd represents local component from configuration file, eg. /dev/da0 and not HAST provider. Reported by: Mikolaj Golub <to.my.trociny@gmail.com> MFC after: 1 week	2010-10-07 18:16:22 +00:00
Pawel Jakub Dawidek	9dd5a6cb0f	Switch to sigprocmask(2) API also in the main process and secondary process. This way the primary process inherits signal mask from the main process, which fixes a race where signal is delivered to the primary process before configuring signal mask. Reported by: Mikolaj Golub <to.my.trociny@gmail.com> MFC after: 3 days	2010-09-22 19:08:11 +00:00
Pawel Jakub Dawidek	8b70e6ae9c	Fix possible deadlock where worker process sends an event to the main process while the main process sends control message to the worker process, but worker process hasn't started control thread yet, because it waits for reply from the main process. The fix is to start the control thread before sending any events. Reported and fix suggested by: Mikolaj Golub <to.my.trociny@gmail.com> MFC after: 3 days	2010-09-22 19:03:11 +00:00
Pawel Jakub Dawidek	e43e02f1a4	Add __dead2 to functions that we know are going to exit. MFC after: 3 days	2010-09-20 13:23:43 +00:00
Pawel Jakub Dawidek	852ac373cb	Mask only those signals that we want to handle. Suggested by: jilles MFC after: 2 weeks Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com	2010-08-31 06:22:03 +00:00
Pawel Jakub Dawidek	5bdff860e7	Because it is very hard to make fork(2) from threaded process safe (we are limited to async-signal safe functions in the child process), move all hooks execution to the main (non-threaded) process. Do it by maintaining connection (socketpair) between child and parent and sending events from the child to parent, so it can execute the hook. This is a step in the right direction for other reasons too. For example there is one less problem to drop privs in worker processes. MFC after: 2 weeks Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com	2010-08-30 23:26:10 +00:00
Pawel Jakub Dawidek	6be3a25c85	Use pjdlog_exit() before fork(). MFC after: 2 weeks Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com	2010-08-30 22:28:04 +00:00
Pawel Jakub Dawidek	5b41e64486	Execute hook when connection between the nodes is established or lost. MFC after: 2 weeks Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com	2010-08-30 00:31:30 +00:00
Pawel Jakub Dawidek	2be8fd75ff	Execute hook when split-brain is detected. MFC after: 2 weeks Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com	2010-08-30 00:12:10 +00:00
Pawel Jakub Dawidek	6d0c801ea9	Use sigtimedwait(2) for signals handling in primary process. This fixes various races and eliminates use of pthread* API in signal handler. Pointed out by: kib With help from: jilles MFC after: 2 weeks Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com	2010-08-30 00:06:05 +00:00
Pawel Jakub Dawidek	ff6bb1f8b3	- Move functionality responsible for checking one connection to separate function to make code more readable. - Be sure not to reconnect too often in case of signal delivery, etc. MFC after: 2 weeks Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com	2010-08-29 22:55:21 +00:00
Pawel Jakub Dawidek	ee087cdf97	Disconnect after logging errors. MFC after: 2 weeks Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com	2010-08-29 22:17:53 +00:00
Pawel Jakub Dawidek	ecc99c890e	Allow to run hooks from the main hastd process. MFC after: 2 weeks Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com	2010-08-29 21:41:53 +00:00
Pawel Jakub Dawidek	b9cf0cf5fa	Correct when we log interrupted synchronization. MFC after: 2 weeks Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com	2010-08-27 21:20:32 +00:00
Pawel Jakub Dawidek	eba09893fd	Check if no signals were delivered just before going to sleep. MFC after: 2 weeks Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com	2010-08-27 20:49:06 +00:00
Pawel Jakub Dawidek	01125a9381	Add hooks execution. MFC after: 2 weeks Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com	2010-08-27 20:48:12 +00:00
Pawel Jakub Dawidek	0becad39a7	Allow to execute specified program on various HAST events. MFC after: 2 weeks Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com	2010-08-27 15:16:52 +00:00
Pawel Jakub Dawidek	f7fe83f9f8	Implement keepalive mechanism inside HAST protocol so we can detect secondary node failures quickly for HAST resources that are rarely modified. Remove XXX from a comment now that the guard thread never sleeps infinitely. MFC after: 2 weeks Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com	2010-08-27 14:26:37 +00:00
Pawel Jakub Dawidek	8f8c798c13	- Remove redundant and incorrect 'old' word from debug message. - Log disconnects as warnings. MFC after: 2 weeks Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com	2010-08-27 14:12:53 +00:00
Pawel Jakub Dawidek	e23d2d0187	Don't increase number of synchronized bytes in case of an error. MFC after: 2 weeks Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com	2010-08-27 14:10:25 +00:00
Pawel Jakub Dawidek	53d9b386eb	Log that synchronization was interrupted in a proper place. MFC after: 2 weeks Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com	2010-08-27 14:08:10 +00:00
Pawel Jakub Dawidek	55ce1e7c8b	We have sync_start() function to start synchronization, introduce sync_stop() function to stop it. MFC after: 2 weeks Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com	2010-08-27 14:06:00 +00:00
Pawel Jakub Dawidek	0989854d45	Implement configuration reload on SIGHUP. This includes: - Load added resources. - Stop and forget removed resources. - Update modified resources in least intrusive way, ie. don't touch /dev/hast/<name> unless path to local component or provider name were modified. Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com MFC after: 1 month	2010-08-05 19:16:31 +00:00
Pawel Jakub Dawidek	f377917cdc	Allow to use 'none' keyword as remote address in case second cluster node is not set up yet. MFC after: 1 month	2010-08-05 19:01:57 +00:00
Pawel Jakub Dawidek	a2ef0636b4	Reset signal handlers after fork(). MFC after: 1 month	2010-08-05 18:58:00 +00:00
Pawel Jakub Dawidek	328e0f4b04	Initialize gctl_seq for synchronization requests. Reported by: hiroshi@soupacific.com Analysed by: Mikolaj Golub <to.my.trociny@gmail.com> Tested by: hiroshi@soupacific.com, Mikolaj Golub <to.my.trociny@gmail.com> MFC after: 3 days	2010-06-14 21:44:20 +00:00
Pawel Jakub Dawidek	b0dfbe5b27	Plug memory leak. Found by: Coverity Prevent CID: 7056 MFC after: 3 days	2010-06-14 21:37:25 +00:00
Pawel Jakub Dawidek	5571414ca8	Fix a problem where hastd will get stuck in recv(2) after sending request to secondary, which died between send(2) and recv(2). Do it by adding timeout to recv(2) for primary incoming and outgoing sockets and secondary outgoing socket. Reported by: Mikolaj Golub <to.my.trociny@gmail.com> Tested by: Mikolaj Golub <to.my.trociny@gmail.com> MFC after: 3 days	2010-04-29 15:36:32 +00:00
Pawel Jakub Dawidek	5abfc9c145	Mark temporary issues as such. MFC after: 3 days	2010-04-28 22:39:47 +00:00
Pawel Jakub Dawidek	20b77db949	Increase ggate queue size to maximum value. HAST was not able to stand heavy random load. Reported by: Hiroyuki Yamagami MFC after: 3 days	2010-04-15 17:04:08 +00:00
Pawel Jakub Dawidek	0d9014f354	Don't hold connection lock when doing reconnects as it makes I/Os wait for connection timeouts. Reported by: Kevin Day <toasty@dragondata.com>	2010-03-27 16:35:07 +00:00
Pawel Jakub Dawidek	32115b105a	Please welcome HAST - Highly Available Storage. HAST allows to transparently store data on two physically separated machines connected over the TCP/IP network. HAST works in Primary-Secondary (Master-Backup, Master-Slave) configuration, which means that only one of the cluster nodes can be active at any given time. Only Primary node is able to handle I/O requests to HAST-managed devices. Currently HAST is limited to two cluster nodes in total. HAST operates on block level - it provides disk-like devices in /dev/hast/ directory for use by file systems and/or applications. Working on block level makes it transparent for file systems and applications. There is no difference between using HAST-provided device and raw disk, partition, etc. All of them are just regular GEOM providers in FreeBSD. For more information please consult hastd(8), hastctl(8) and hast.conf(5) manual pages, as well as http://wiki.FreeBSD.org/HAST. Sponsored by: FreeBSD Foundation Sponsored by: OMCnet Internet Service GmbH Sponsored by: TransIP BV	2010-02-18 23:16:19 +00:00

37 Commits