ip_mroute: rework ip_mroute

Approved by:     mw
Obtained from:   Semihalf
Sponsored by:    Stormshield
Differential Revision: https://reviews.freebsd.org/D30354

Changes:
1. add spinlock to bw_meter

If two contexts read and modify bw_meter values at the
same time, the values may be corrupted.
Guard only the code fragments which do read-and-modify.
Contexts which only read are not placed inside the
spinlock block. The only side effect that can happen is
a value outdated by one packet being reported back to userspace.
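
A minimal userspace sketch of this locking rule, not the kernel code
itself. It assumes a hypothetical struct meter and a C11 atomic_flag
standing in for the bm_spin spin mutex; in the kernel the equivalent
primitives would be mtx_lock_spin()/mtx_unlock_spin(). Only the
read-and-modify path takes the lock, the read-only path does not.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the counters kept in struct bw_meter. */
struct meter {
	atomic_flag spin;		/* plays the role of bm_spin */
	_Atomic uint64_t packets;
	_Atomic uint64_t bytes;
};

static void
meter_lock(struct meter *m)
{
	/* Busy-wait until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&m->spin,
	    memory_order_acquire))
		;
}

static void
meter_unlock(struct meter *m)
{
	atomic_flag_clear_explicit(&m->spin, memory_order_release);
}

/* Read-and-modify path: load, add, store; must be serialized. */
static void
meter_account(struct meter *m, uint64_t len)
{
	meter_lock(m);
	uint64_t p = atomic_load_explicit(&m->packets, memory_order_relaxed);
	uint64_t b = atomic_load_explicit(&m->bytes, memory_order_relaxed);
	atomic_store_explicit(&m->packets, p + 1, memory_order_relaxed);
	atomic_store_explicit(&m->bytes, b + len, memory_order_relaxed);
	meter_unlock(m);
}

/* Read-only path: no lock; the value may lag by one in-flight update. */
static uint64_t
meter_packets(struct meter *m)
{
	return atomic_load_explicit(&m->packets, memory_order_relaxed);
}

static void *
meter_writer(void *arg)
{
	for (int i = 0; i < 25000; i++)
		meter_account(arg, 100);
	return NULL;
}

/* Run four concurrent writers and return the final packet count. */
static uint64_t
meter_demo(void)
{
	struct meter m = { ATOMIC_FLAG_INIT, 0, 0 };
	pthread_t t[4];

	for (int i = 0; i < 4; i++) {
		int error = pthread_create(&t[i], NULL, meter_writer, &m);
		assert(error == 0);
		(void)error;
	}
	for (int i = 0; i < 4; i++)
		pthread_join(t[i], NULL);
	/* Every packet added exactly 100 bytes, so the pair is consistent. */
	assert(atomic_load(&m.bytes) == 100 * meter_packets(&m));
	return meter_packets(&m);
}
```

Without the lock around meter_account(), two writers could load the
same old value and one increment would be lost.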

2. replace all locks with a single RWLOCK

Multiple locks caused a performance issue in the routing
hot path, when two of them had to be taken. All locks
were replaced with a single RWLOCK, which lets the hot
path take only shared access to the lock most of
the time.
All configuration routines have to take the exclusive lock
(as was done before), but these operations are very rare
compared to packet routing.

3. redesign MFC expire and UPCALL expire

Use a generic kthread and cv_wait/cv_signal for deferring
work. Previously, upcalls could be sent from two contexts,
which complicated the design. All upcall sending is now
done in a kthread, which allows the hot path to work more
efficiently in some rare cases.
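
The deferral pattern can be sketched in userspace with a pthread
worker and a condition variable, as an analogue of a kthread using
cv_wait/cv_signal. All names here (struct deferred, upcall_post(),
upcall_worker()) are hypothetical, and "sending" is reduced to
counting.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical state shared between the hot path and the worker. */
struct deferred {
	pthread_mutex_t mtx;
	pthread_cond_t cv;
	int pending;	/* work posted but not yet handled */
	int stop;	/* set once to ask the worker to exit */
	int sent;	/* total handled by the worker */
};

/* Worker: the only context that "sends" upcalls. */
static void *
upcall_worker(void *arg)
{
	struct deferred *d = arg;

	pthread_mutex_lock(&d->mtx);
	for (;;) {
		while (d->pending == 0 && !d->stop)
			pthread_cond_wait(&d->cv, &d->mtx);
		if (d->pending == 0)
			break;		/* stop requested, nothing left */
		d->sent += d->pending;	/* drain everything posted so far */
		d->pending = 0;
	}
	pthread_mutex_unlock(&d->mtx);
	return NULL;
}

/* Hot path: record the work and wake the worker; no sending here. */
static void
upcall_post(struct deferred *d)
{
	pthread_mutex_lock(&d->mtx);
	d->pending++;
	pthread_cond_signal(&d->cv);
	pthread_mutex_unlock(&d->mtx);
}

/* Post n items, shut the worker down, return how many it handled. */
static int
deferred_demo(int n)
{
	struct deferred d = { .pending = 0, .stop = 0, .sent = 0 };
	pthread_t t;

	pthread_mutex_init(&d.mtx, NULL);
	pthread_cond_init(&d.cv, NULL);
	if (pthread_create(&t, NULL, upcall_worker, &d) != 0)
		return -1;
	for (int i = 0; i < n; i++)
		upcall_post(&d);
	pthread_mutex_lock(&d.mtx);
	d.stop = 1;
	pthread_cond_signal(&d.cv);
	pthread_mutex_unlock(&d.mtx);
	pthread_join(t, NULL);
	pthread_cond_destroy(&d.cv);
	pthread_mutex_destroy(&d.mtx);
	return d.sent;
}
```

Because the worker drains all pending work before honoring the stop
flag, nothing posted before shutdown is lost.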

4. replace mutex-guarded linked list with lock-free buf_ring

All messages and data are now passed over a lockless buf_ring.
This made it possible to remove some heavy locking that was
needed when linked lists were used.
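
To illustrate why a ring needs no mutex, here is a simplified
userspace sketch using C11 atomics. It is an assumption-laden
analogue, not the kernel buf_ring: it supports only one producer and
one consumer, and struct ring, ring_enqueue() and ring_dequeue() are
hypothetical names.

```c
#include <stdatomic.h>
#include <stddef.h>

#define RING_SIZE 8	/* must be a power of two */

/* Hypothetical single-producer, single-consumer pointer ring. */
struct ring {
	_Atomic size_t head;	/* next slot to write, owned by producer */
	_Atomic size_t tail;	/* next slot to read, owned by consumer */
	void *slots[RING_SIZE];
};

/* Producer side: returns 0 on success, -1 if the ring is full. */
static int
ring_enqueue(struct ring *r, void *item)
{
	size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == RING_SIZE)
		return -1;
	r->slots[head & (RING_SIZE - 1)] = item;
	/* Publish the slot only after it has been written. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return 0;
}

/* Consumer side: returns the oldest item, or NULL if the ring is empty. */
static void *
ring_dequeue(struct ring *r)
{
	size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
	void *item;

	if (head == tail)
		return NULL;
	item = r->slots[tail & (RING_SIZE - 1)];
	/* Release the slot back to the producer after reading it. */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return item;
}
```

Each index is written by exactly one side, so the acquire/release
pairs are enough to hand items over without a lock.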
Wojciech Macek 2021-05-17 12:45:18 +02:00
parent 947bd2479b
commit d40cd26a86
2 changed files with 420 additions and 330 deletions

File diff suppressed because it is too large

@@ -199,7 +199,7 @@ struct bw_upcall {
};
/* max. number of upcalls to deliver together */
-#define BW_UPCALLS_MAX 128
+#define BW_UPCALLS_MAX 1024
/* min. threshold time interval for bandwidth measurement */
#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_SEC 3
#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_USEC 0
@@ -264,6 +264,10 @@ struct vif {
u_long v_pkt_out; /* # pkts out on interface */
u_long v_bytes_in; /* # bytes in on interface */
u_long v_bytes_out; /* # bytes out on interface */
+#ifdef _KERNEL
+struct mtx v_spin; /* Spin mutex for pkt stats */
+char v_spin_name[32];
+#endif
};
#if defined(_KERNEL) || defined (_NETSTAT)
@@ -287,8 +291,7 @@ struct mfc {
for Lower-or-EQual case */
struct bw_meter *mfc_bw_meter_geq; /* list of bandwidth meters
for Greater-or-EQual case */
-u_long mfc_nstall; /* # of packets awaiting mfc */
-TAILQ_HEAD(, rtdetq) mfc_stall; /* q of packets awaiting mfc */
+struct buf_ring *mfc_stall_ring; /* ring of awaiting mfc */
};
#endif /* _KERNEL */
@@ -349,6 +352,8 @@ struct bw_meter {
#ifdef _KERNEL
struct callout bm_meter_callout; /* Periodic callout */
void* arg; /* custom argument */
+struct mtx bm_spin; /* meter spin lock */
+char bm_spin_name[32];
#endif
};