Commit Graph

322 Commits

Author SHA1 Message Date
trociny
cc424717c7 MFC r257155, r257582, r259191, r259192, r259193, r259194, r259195, r259196:
r257155:

Make hastctl list command output current queue sizes.

Reviewed by:	pjd

r257582 (pjd):

Correct alignment.

r259191:

For memsync replication, hio_countdown is used not only as an
indication when a request can be moved to done queue, but also for
detecting the current state of memsync request.

This approach has problems, e.g. leaking a request if memsynk ack from
the secondary failed, or racy usage of write_complete, which should be
called only once per write request, but for memsync can be entered by
local_send_thread and ggate_send_thread simultaneously.

So the following approach is implemented instead:

1) Use hio_countdown only for counting components we waiting to
   complete, i.e. initially it is always 2 for any replication mode.

2) To distinguish between "memsync ack" and "memsync fin" responses
   from the secondary, add and use hio_memsyncacked field.

3) write_complete() in component threads is called only before
   releasing hio_countdown (i.e. before the hio may be returned to the
   done queue).

4) Add and use hio_writecount refcounter to detect when
   write_complete() can be called in memsync case.

Reported by:	Pete French petefrench ingresso.co.uk
Tested by:	Pete French petefrench ingresso.co.uk

r259192:

Add some macros to make the code more readable (no functional chages).

r259193:

Fix compiler warnings.

r259194:

In remote_send_thread, if sending a request fails don't take the
request back from the receive queue -- it might already be processed
by remote_recv_thread, which lead to crashes like below:

  (primary) Unable to receive reply header: Connection reset by peer.
  (primary) Unable to send request (Connection reset by peer):
      WRITE(954662912, 131072).
  (primary) Disconnected from kopusha:7772.
  (primary) Increasing localcnt to 1.
  (primary) Assertion failed: (old > 0), function refcnt_release,
      file refcnt.h, line 62.

Taking the request back was not necessary (it would properly be
processed by the remote_recv_thread) and only complicated things.

r259195:

Send wakeup to threads waiting on empty queue before releasing the
lock to decrease spurious wakeups.

Submitted by:	davidxu

r259196:

Check remote protocol version only for the first connection (when it
is actually sent by the remote node).

Otherwise it generated confusing "Negotiated protocol version 1" debug
messages when processing the second connection.
2013-12-28 19:21:22 +00:00
trociny
6952838572 MFC r257154:
Merging local and remote bitmaps must be protected by hr_amp lock.

This is believed to fix hastd crashes, which might occur during
synchronization, triggered by the failed assertion:

 Assertion failed: (amp->am_memtab[ext] > 0),
 function activemap_write_complete, file activemap.c, line 351.

Approved by:	re (glebius)
2013-10-31 20:30:26 +00:00
trociny
5db794f1d0 Fix comments.
Approved by:	re (marius)
MFC after:	3 days
2013-09-19 20:20:59 +00:00
trociny
5ea0541321 When updating the map of dirty extents, most recently used extents are
kept dirty to reduce the number of on-disk metadata updates. The
sequence of operations is:

1) acquire the activemap lock;
2) update in-memory map;
3) if the list of keepdirty extents is changed, update on-disk metadata;
4) release the lock.

On-disk updates are not frequent in comparison with in-memory updates,
while require much more time. So situations are possible when one
thread is updating on-disk metadata and another one is waiting for the
activemap lock just to update the in-memory map.

Improve this by introducing additional, on-disk map lock: when
in-memory map is updated and it is detected that the on-disk map needs
update too, the on-disk map lock is acquired and the on-memory lock is
released before flushing the map.

Reported by:	Yamagi Burmeister yamagi.org
Tested by:	Yamagi Burmeister yamagi.org
Reviewed by:	pjd
Approved by:	re (marius)
MFC after:	2 weeks
2013-09-19 20:19:08 +00:00
trociny
2604d523b6 Use cv_broadcast() instead of cv_signal() when waking up threads
waiting on an empty queue as the queue may have several consumers.

Before the fix the following scenario was possible: 2 threads are
waiting on empty queue, 2 threads are inserting simultaneously. The
first inserting thread detects that the queue is empty and is going to
send the signal, but before it sends the second thread inserts
too. When the first sends the signal only one of the waiting threads
receive it while the other one may wait forever.

The scenario above is is believed to be the cause of the observed
cases, when ggate_recv_thread() was getting stuck on taking free
request, while the free queue was not empty.

Reviewed by:	pjd
Tested by:	Yamagi Burmeister yamagi.org
Approved by:	re (marius)
MFC after:	2 weeks
2013-09-19 20:15:24 +00:00
pjd
029a6f5d92 Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

	struct cap_rights {
		uint64_t	cr_rights[CAP_RIGHTS_VERSION + 2];
	};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.

To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.

	#define	CAP_PDKILL	CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine few rights, but the rights have to belong
to the same array element, eg:

	#define	CAP_LOOKUP	CAPRIGHT(0, 0x0000000000000400ULL)
	#define	CAP_FCHMOD	CAPRIGHT(0, 0x0000000000002000ULL)

	#define	CAP_FCHMODAT	(CAP_FCHMOD | CAP_LOOKUP)

There is new API to manage the new cap_rights_t structure:

	cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
	void cap_rights_set(cap_rights_t *rights, ...);
	void cap_rights_clear(cap_rights_t *rights, ...);
	bool cap_rights_is_set(const cap_rights_t *rights, ...);

	bool cap_rights_is_valid(const cap_rights_t *rights);
	void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
	void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
	bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:

	cap_rights_t rights;

	cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:

	#define	cap_rights_set(rights, ...)				\
		__cap_rights_set((rights), __VA_ARGS__, 0ULL)
	void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

	cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belongs to the same array's element this way is
correct, but is not advised. It should only be used for aliases definition.

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by:	The FreeBSD Foundation
2013-09-05 00:09:56 +00:00
trociny
5092fcd640 Make hastctl(1) ('list' command) output a worker pid.
Reviewed by:	pjd
MFC after:	3 days
2013-07-01 18:41:07 +00:00
schweikh
19b98edeaa Correct some grammar. 2013-06-30 17:59:40 +00:00
ed
3764c06cce Don't let hastd use C11 atomics.
Due to possible concerns about the stability of C11 atomics, use our
existing atomics API instead.

Requested by:	pjd
2013-06-29 20:13:39 +00:00
ed
4ea862e26e Let hastd use C11 atomics.
C11 atomics now work on all the architectures. Have at least a single
piece of software in our base system that uses C11 atomics. This
somewhat makes it less likely that we break it because of LLVM imports,
etc.
2013-06-15 22:17:59 +00:00
jkim
d752b745df Improve compatibility with old flex and fix build with GCC. 2013-05-22 17:47:45 +00:00
trociny
f39c963dc2 Get rid of libl dependency. We needed it only to provide yywrap. But
yywrap is not necessary when parsing a single hast.conf file.

Suggested by:	kib
Reviewed by:	pjd
2013-05-11 09:51:44 +00:00
ed
a09c9f5550 Partially revert my last change.
I forgot that I still had a locally applied patch to my copy of Clang
that needs to be pushed in before we should use C11 atomics.
2013-04-27 05:06:25 +00:00
ed
e802849740 Use C11 <stdatomic.h> instead of our non-standard <machine/atomic.h>.
Reviewed by:	pjd
2013-04-27 05:01:29 +00:00
ed
120125784a Add the Clang specific -Wmissing-variable-declarations to WARNS=6.
This compiler flag enforces that that people either mark variables
static or use an external declarations for the variable, similar to how
-Wmissing-prototypes works for functions.

Due to the fact that Yacc/Lex generate code that cannot trivially be
changed to not warn because of this (lots of yy* variables), add a
NO_WMISSING_VARIABLE_DECLARATIONS that can be used to turn off this
specific compiler warning.

Announced on:	toolchain@
2013-04-19 19:45:00 +00:00
pjd
3d6eb97a85 Now that ioctl(2) is allowed in capability mode and we can limit ioctls for the
given descriptors, use Capsicum sandboxing for hastd in primary and secondary
modes. Allow for DIOCGDELETE and DIOCGFLUSH ioctls on provider descriptor and
for G_GATE_CMD_MODIFY, G_GATE_CMD_START, G_GATE_CMD_DONE and G_GATE_CMD_DESTROY
on GEOM Gate descriptor.

Sponsored by:	The FreeBSD Foundation
2013-03-14 23:14:47 +00:00
pjd
aa00097868 Minor corrections. 2013-03-14 23:11:52 +00:00
pjd
5f4ba049f3 Delete requests can be larger than MAXPHYS. 2013-03-14 23:03:48 +00:00
trociny
8690e69f6a Add i/o error counters to hastd(8) and make hastctl(8) display
them.  This may be useful for detecting problems with HAST disks.

Discussed with and reviewed by:	pjd
MFC after:	1 week
2013-02-25 20:09:07 +00:00
pjd
0bf19fd812 - Add support for 'memsync' mode. This is the fastest replication mode that's
why it will now be the default.
- Bump protocol version to 2 and add backward compatibility for version 1.
- Allow to specify hosts by kern.hostid as well (in addition to hostname and
  kern.hostuuid) in configuration file.

Sponsored by:	Panzura
Tested by:	trociny
2013-02-17 21:12:34 +00:00
kevlo
804c67a486 Fix socket calls on error post-r243965.
Submitted by:	Garrett Cooper
2012-12-21 15:54:13 +00:00
pjd
712fa475ac Revert r228695. We use __func__ here as a format to distinguish between
abort and assert. It would be cleaner to use NULL or "" here, but gcc
complains in both cases.
2012-11-05 00:38:14 +00:00
trociny
48789932a6 Metaflush on/off values don't need quotes.
Reviewed by:	pjd
MFC after:	3 days
2012-07-16 20:43:28 +00:00
pjd
1fb1092d75 Make use of GEOM Gate direct reads feature. This allows HAST to serve
reads with native speed of the underlying provider.
There are three situations when direct reads are not used:
1. Data is being synchronized and synchronization source is the secondary
   node, which means secondary node has more recent data and we should read
   from it.
2. Local read failed and we have to try to read from the secondary node.
3. Local component is unavailable and all I/O requests are served from the
   secondary node.

Sponsored by:	Panzura, http://www.panzura.com
MFC after:	1 month
2012-07-04 20:20:48 +00:00
pjd
c34c233e83 Check if there is cmsg at all.
MFC after:	3 days
2012-07-01 16:26:07 +00:00
hselasky
6a1a47c355 Revert: r236909
Pointyhat: me
2012-06-11 20:27:52 +00:00
hselasky
6b0d4730f1 Use the correct clock source when computing timeouts.
MFC after:	1 week
2012-06-11 19:20:59 +00:00
pjd
2143a387af Simplify the code by using snprlcat().
MFC after:	3 days
2012-06-03 10:50:46 +00:00
wblock
9fa9a2acad Fixes to man8 groff mandoc style, usage mistakes, or typos.
PR:		168016
Submitted by:	Nobuyuki Koganemaru
Approved by:	gjb
MFC after:	3 days
2012-05-24 02:24:03 +00:00
bapt
310ab6d7ff Fix world after byacc import:
- old yacc(1) use to magicially append stdlib.h, while new one don't
- new yacc(1) do declare yyparse by itself, fix redundant declaration of
  'yyparse'

Approved by:	des (mentor)
2012-05-22 16:33:10 +00:00
gjb
67d88d49d4 General mdoc(7) and typo fixes.
PR:		167804
Submitted by:	Nobuyuki Koganemaru (kogane!jp.freebsd.org)
MFC after:	3 days
2012-05-12 15:08:22 +00:00
trociny
5780d99776 If hastd is invoked with "-P pidfile" option always create pidfile
regardless of whether -F (foreground) option is set or not.

Also, if -P option is specified, ignore pidfile setting from configuration
not only on start but on reload too. This fixes the issue when for hastd
run with -P option reload caused the pidfile change.

Reviewed by:	pjd
MFC after:	1 week
2012-03-29 20:11:16 +00:00
trociny
4eba690e91 Fix typo.
MFC after:	3 days
2012-03-23 20:18:48 +00:00
pjd
3e86e21237 Nice range comparison.
MFC after:	3 days
2012-02-11 16:41:52 +00:00
trociny
1d510ea553 If a local write request is from the synchronization thread, when it
is synchronizing data that is out of date on the local component, we
should not send G_GATE_CMD_DONE acknowledge to the kernel.

This fixes the issue, observed in async mode, when on synchronization
from the remote component the worker terminated with "G_GATE_CMD_DONE
failed" error.

Reported by:	Artem Kajalainen <artem kayalaynen ru>
Reviewed by:	pjd
MFC after:	1 week
2012-02-05 15:23:32 +00:00
trociny
d2786f0a2c Fix the regression introduced in r226859: if the local component is
out of date BIO_READ requests got lost instead of being sent to the
remote component.

Reviewed by:	pjd
MFC after:	1 week
2012-02-05 15:21:08 +00:00
pjd
306b4958cf Fix typo in comment.
MFC after:	3 days
2012-02-04 07:59:12 +00:00
pjd
538ff2e745 - Fix documentation to note that /etc/hast.conf is the default configuration
file for hastd(8) and hastctl(8) and not hast.conf.
- In copyright statement correct that this file is documentation, not software.
- Bump date.

MFC after:	3 days
2012-01-24 23:43:13 +00:00
pjd
983cddf8a6 Free memory that won't be used in child.
MFC after:	1 week
2012-01-22 11:20:42 +00:00
pjd
bf37868a59 Fix minor memory leak.
MFC after:	3 days
2012-01-21 20:13:37 +00:00
pjd
62f2fd64bb Remove another unused token.
MFC after:	3 days
2012-01-20 21:49:56 +00:00
pjd
8e6c8deb43 Remove unused token 'port'.
MFC after:	3 days
2012-01-20 21:45:24 +00:00
pjd
b07aec8b25 Style cleanups.
MFC after:	3 days
2012-01-13 23:25:35 +00:00
pjd
e437b28043 - Fix a bug where pidfile was removed in SIGHUP when it hasn't changed in
configuration file.
- Log the fact that pidfile has changed.

MFC after:	3 days
2012-01-10 22:41:09 +00:00
pjd
c5fe5a76f2 For functions that return -1 on failure check exactly for -1 and not for
any negative number.

MFC after:	3 days
2012-01-10 22:39:07 +00:00
pjd
29f76d890e Don't touch pidfiles when running in foreground. Before that change we
would create an empty pidfile on start and check if it changed on SIGHUP.

MFC after:	3 days
2012-01-10 22:24:57 +00:00
uqs
5f1ca9b982 Spelling fixes for sbin/ 2012-01-07 16:09:33 +00:00
pjd
cbcf1832ad fork(2) returns -1 on failure, not some random negative number.
MFC after:	3 days
2012-01-06 23:44:26 +00:00
pjd
a272071bf0 Constify argument.
MFC after:	3 days
2012-01-06 12:27:17 +00:00
dim
9d35683411 Use NO_WCAST_ALIGN for usr.bin/hastctl and usr.bin/hastd; the alignment
warnings in sbin/hastd/lzf.c are only emitted for i386 and amd64, and
there they can be safely ignored.

MFC after:	1 week
2011-12-19 15:46:15 +00:00