Commit Graph

10 Commits

Author SHA1 Message Date
Ed Maste
33482dae89 libc: fix undefined behavior from signed overflow in strstr and memmem
unsigned char promotes to int, which can overflow when shifted left by
24 bits or more. this has been reported multiple times but then
forgotten. it's expected to be benign UB, but can trap when built with
explicit overflow catching (ubsan or similar). fix it now.

note that promotion to uint32_t is safe and portable even outside of
the assumptions usually made in musl, since either uint32_t has rank
at least unsigned int, so that no further default promotions happen,
or int is wide enough that the shift can't overflow. this is a
desirable property to have in case someone wants to reuse the code
elsewhere.

musl commit: 593caa456309714402ca4cb77c3770f4c24da9da

Obtained from:	musl
2020-11-19 00:03:15 +00:00
Ed Maste
7dbcd06e63 libc: optimize memmem two-way bad character shift
first, the condition (mem && k < p) is redundant, because mem being
nonzero implies the needle is periodic with period exactly p, in which
case any byte that appears in the needle must appear in the last p
bytes of the needle, bounding the shift (k) by p.

second, the whole point of replacing the shift k by mem (=l-p) is to
prevent shifting by less than mem when discarding the memory on shift,
in which case linear time could not be guaranteed. but as written, the
check also replaced shifts greater than mem by mem, reducing the
benefit of the shift. there is no possible benefit to this reduction of
the shift; since mem is being cleared, the full shift is valid and
more optimal. so only replace the shift by mem when it would be less
than mem.

musl commits:
8f5a820d147da36bcdbddd201b35d293699dacd8
122d67f846cb0be2c9e1c3880db9eb9545bbe38c

Obtained from:	musl
MFC after:	2 weeks
2020-11-19 00:02:12 +00:00
Ed Maste
4874ddfd37 clang-format libc string functions imported from musl
We have adopted these and don't consider them 'contrib' code, so bring
them closer to style(9).  This is a followon to r315467 and r351700.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2020-11-18 22:01:34 +00:00
Gleb Smirnoff
9004dbddea Avoid OOB reads in memmem(3).
commit 51bdcdc424bd7169c8cccdc2de7cad17f5ea0f70
Author: Alexander Monakov <amonakov@ispras.ru>
Date:   Fri Jun 30 00:35:33 2017 +0300

    fix OOB reads in Xbyte_memmem

    Reported by Leah Neukirchen.

Reviewed by:	emaste
Approved by:	re (kib)
2018-10-15 20:20:57 +00:00
Pedro F. Giffuni
d915a14ef0 libc: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using mis-identified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-25 17:12:48 +00:00
Ed Maste
01dc206b22 libc: add reference to two-way algorithm and bad shift table in memmem/strstr
Requested by:	ed
2017-03-18 00:53:24 +00:00
Ed Maste
88521634e9 libc: Use musl's O(n) memmem and strstr
It is O(n) in the length of the haystack (big) string, and has special
cases for short needle (little) strings, of one to four bytes, to avoid
excessive overhead.

There are a small set of nearly trivial cases where the startup overhead
of the musl implementation makes it slightly slower -- for example, a 31
byte needle that matches the beginning of the haystack.  It's faster for
non-trivial cases, and significantly so for inputs that trigger worst-
case behaviour of the previous implementation.  As an example, in my
tests a 16K needle that matches the end of a 64K haystack is nearly
2000x faster with this implementation.

Reviewed by:	bapt (earlier), ed (earlier)
Obtained from:	musl (snapshot at commit c718f9fc)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D2601
2017-03-18 00:51:39 +00:00
Ed Maste
16150352f5 memmem(3): empty little string matches the beginning of the big string
This function originated in glibc, and this matches their behaviour
(and NetBSD, OpenBSD, and musl).

An empty big string (arg "l") is handled by the existing
l_len < s_len test.

Reviewed by:	bapt, ngie
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D2657
2015-05-26 21:16:07 +00:00
Daniel Gerzo
bd604b4b06 - ANSIfy function definitions
- use nul when we are looking for a terminating character where appropriate

Approved by:	imp
2009-02-03 17:58:20 +00:00
Andre Oppermann
6050c8fe05 Add the function memmem(3) as found in glibc and others.
It is the binary equivalent to strstr(3).

 void *memmem(const void *big, size_t big_len,
	const void *little, size_t little_len);

Submitted by:	Pascal Gloor <pascal.gloor at spale.com>
MFC after:	3 days
2005-08-25 18:26:58 +00:00