Vendor import of xz-5.2.5 (trimmed).

commit b89a971493
parent 9657691eff
Date:   2020-03-21 19:13:22 +00:00
72 changed files with 2011 additions and 462 deletions
--- a/1332
+++ b/1332
--- a/114
+++ b/114
@ -9,7 +9,7 @@ XZ Utils
       1.3. Documentation for liblzma
    2. Version numbering
    3. Reporting bugs
-    4. Translating the xz tool
+    4. Translations
    5. Other implementations of the .xz format
    6. Contact information

@ -55,9 +55,11 @@ XZ Utils
    Similarly, it is possible that some day there is a filter that will
    compress better than LZMA2.

-    XZ Utils doesn't support multithreaded compression or decompression
-    yet. It has been planned though and taken into account when designing
-    the .xz file format.
+    XZ Utils supports multithreaded compression. XZ Utils doesn't support
+    multithreaded decompression yet. It has been planned though and taken
+    into account when designing the .xz file format. In the future, files
+    that were created in threaded mode can be decompressed in threaded
+    mode too.


 1. Documentation
@ -103,14 +105,13 @@ XZ Utils
    and data type as Doxygen tags. These docs should be quite OK as
    a quick reference.

-    I have planned to write a bunch of very well documented example
-    programs, which (due to comments) should work as a tutorial to
-    various features of liblzma. No such example programs have been
-    written yet.
+    There are a few example/tutorial programs that should help in
+    getting started with liblzma. In the source package the examples
+    are in "doc/examples" and in binary packages they may be under
+    "examples" in the same directory as this README.

-    For now, if you have never used liblzma, libbzip2, or zlib, I
-    recommend learning the *basics* of the zlib API. Once you know that,
-    it should be easier to learn liblzma.
+    Since the liblzma API has similarities to the zlib API, some people
+    may find it useful to read the zlib docs and tutorial too:

        http://zlib.net/manual.html
        http://zlib.net/zlib_how.html
@ -192,91 +193,18 @@ XZ Utils
    system.


-4. Translating the xz tool
--------------------------
+4. Translations
+---------------

-    The messages from the xz tool have been translated into a few
-    languages. Before starting to translate into a new language, ask
-    the author whether someone else hasn't already started working on it.
+    The xz command line tool and all man pages can be translated.
+    The translations are handled via the Translation Project. If you
+    wish to help translating xz, please join the Translation Project:

-    Test your translation. Testing includes comparing the translated
-    output to the original English version by running the same commands
-    in both your target locale and with LC_ALL=C. Ask someone to
-    proof-read and test the translation.
+        https://translationproject.org/html/translators.html

-    Testing can be done e.g. by installing xz into a temporary directory:
-
-        ./configure --disable-shared --prefix=/tmp/xz-test
-        # <Edit the .po file in the po directory.>
-        make -C po update-po
-        make install
-        bash debug/translation.bash | less
-        bash debug/translation.bash | less -S  # For --list outputs
-
-    Repeat the above as needed (no need to re-run configure though).
-
-    Note especially the following:
-
-      - The output of --help and --long-help must look nice on
-        an 80-column terminal. It's OK to add extra lines if needed.
-
-      - In contrast, don't add extra lines to error messages and such.
-        They are often preceded with e.g. a filename on the same line,
-        so you have no way to predict where to put a \n. Let the terminal
-        do the wrapping even if it looks ugly. Adding new lines will be
-        even uglier in the generic case even if it looks nice in a few
-        limited examples.
-
-      - Be careful with column alignment in tables and table-like output
-        (--list, --list --verbose --verbose, --info-memory, --help, and
-        --long-help):
-
-          * All descriptions of options in --help should start in the
-            same column (but it doesn't need to be the same column as
-            in the English messages; just be consistent if you change it).
-            Check that both --help and --long-help look OK, since they
-            share several strings.
-
-          * --list --verbose and --info-memory print lines that have
-            the format "Description:   %s". If you need a longer
-            description, you can put extra space between the colon
-            and %s. Then you may need to add extra space to other
-            strings too so that the result as a whole looks good (all
-            values start at the same column).
-
-          * The columns of the actual tables in --list --verbose --verbose
-            should be aligned properly. Abbreviate if necessary. It might
-            be good to keep at least 2 or 3 spaces between column headings
-            and avoid spaces in the headings so that the columns stand out
-            better, but this is a matter of opinion. Do what you think
-            looks best.
-
-      - Be careful to put a period at the end of a sentence when the
-        original version has it, and don't put it when the original
-        doesn't have it. Similarly, be careful with \n characters
-        at the beginning and end of the strings.
-
-      - Read the TRANSLATORS comments that have been extracted from the
-        source code and included in xz.pot. If they suggest testing the
-        translation with some type of command, do it. If testing needs
-        input files, use e.g. tests/files/good-*.xz.
-
-      - When updating the translation, read the fuzzy (modified) strings
-        carefully, and don't mark them as updated before you actually
-        have updated them. Reading through the unchanged messages can be
-        good too; sometimes you may find a better wording for them.
-
-      - If you find language problems in the original English strings,
-        feel free to suggest improvements. Ask if something is unclear.
-
-      - The translated messages should be understandable (sometimes this
-        may be a problem with the original English messages too). Don't
-        make a direct word-by-word translation from English especially if
-        the result doesn't sound good in your language.
-
-    In short, take your time and pay attention to the details. Making
-    a good translation is not a quick and trivial thing to do. The
-    translated xz should look as polished as the English version.
+    Several strings will change in a future version of xz, so if you
+    wish to start a new translation, look at the code in the xz git
+    repository instead of a 5.2.x release.


 5. Other implementations of the .xz format
--- a/10
+++ b/10
@ -23,6 +23,7 @@ has been important. :-) In alphabetical order:
  - Milo Casagrande
  - Marek Černocký
  - Tomer Chachamu
+  - Antoine Cœur
  - Gabi Davar
  - Chris Donawa
  - Andrew Dudman
@ -45,6 +46,7 @@ has been important. :-) In alphabetical order:
  - Peter Ivanov
  - Jouk Jansen
  - Jun I Jin
+  - Kiyoshi Kanazawa
  - Per Øyvind Karlsen
  - Thomas Klausner
  - Richard Koch
@ -58,10 +60,13 @@ has been important. :-) In alphabetical order:
  - Andraž 'ruskie' Levstik
  - Cary Lewis
  - Wim Lewis
+  - Xin Li
  - Eric Lindblad
  - Lorenzo De Liso
  - Bela Lubkin
  - Gregory Margo
+  - Julien Marrec
+  - Martin Matuška
  - Jim Meyering
  - Arkadiusz Miskiewicz
  - Conley Moorhous
@ -72,6 +77,7 @@ has been important. :-) In alphabetical order:
  - Jonathan Nieder
  - Andre Noll
  - Peter O'Gorman
+  - Filip Palian
  - Peter Pallinger
  - Rui Paulo
  - Igor Pavlov
@ -92,10 +98,12 @@ has been important. :-) In alphabetical order:
  - Alexandre Sauvé
  - Benno Schulenberg
  - Andreas Schwab
+  - Bhargava Shastry
  - Dan Shechter
  - Stuart Shelton
  - Sebastian Andrzej Siewior
  - Brad Smith
+  - Bruce Stark
  - Pippijn van Steenhoven
  - Jonathan Stott
  - Dan Stromberg
@ -103,9 +111,11 @@ has been important. :-) In alphabetical order:
  - Paul Townsend
  - Mohammed Adnène Trojette
  - Alexey Tourbin
+  - Loganaden Velvindron
  - Patrick J. Volkerding
  - Martin Väth
  - Adam Walling
+  - Jeffrey Walton
  - Christian Weisgerber
  - Bert Wesarg
  - Fredrik Wikstrom
--- a/src/common/sysdefs.h
+++ b/src/common/sysdefs.h
@ -44,9 +44,7 @@

 // Some pre-C99 systems have SIZE_MAX in limits.h instead of stdint.h. The
 // limits are also used to figure out some macros missing from pre-C99 systems.
-#ifdef HAVE_LIMITS_H
-#	include <limits.h>
-#endif
+#include <limits.h>

 // Be more compatible with systems that have non-conforming inttypes.h.
 // We assume that int is 32-bit and that long is either 32-bit or 64-bit.
@ -153,9 +151,7 @@ typedef unsigned char _Bool;

 // string.h should be enough but let's include strings.h and memory.h too if
 // they exist, since that shouldn't do any harm, but may improve portability.
-#ifdef HAVE_STRING_H
-#	include <string.h>
-#endif
+#include <string.h>

 #ifdef HAVE_STRINGS_H
 #	include <strings.h>
@ -193,7 +189,8 @@ typedef unsigned char _Bool;
 #	define ARRAY_SIZE(array) (sizeof(array) / sizeof((array)[0]))
 #endif

-#if (__GNUC__ == 4 && __GNUC_MINOR__ >= 3) || __GNUC__ > 4
+#if defined(__GNUC__) \
+		&& ((__GNUC__ == 4 && __GNUC_MINOR__ >= 3) || __GNUC__ > 4)
 #	define lzma_attr_alloc_size(x) __attribute__((__alloc_size__(x)))
 #else
 #	define lzma_attr_alloc_size(x)
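The new condition checks `defined(__GNUC__)` before reading the GCC version macros. Inside `#if`, an undefined identifier quietly evaluates to 0, so the old form still worked on non-GCC compilers, but it relied on that rule, and compilers that warn about undefined identifiers in `#if` (the kind of check GCC and Clang expose as `-Wundef`) would flag it. Avoiding that is presumably the point of the change. A minimal sketch of the same guard; `demo_attr_alloc_size` and `demo_alloc` are hypothetical names, not part of liblzma:

```c
#include <stdlib.h>

/* Same guard shape as in sysdefs.h: only look at the GCC version
 * macros when __GNUC__ is actually defined. */
#if defined(__GNUC__) \
		&& ((__GNUC__ == 4 && __GNUC_MINOR__ >= 3) || __GNUC__ > 4)
#	define demo_attr_alloc_size(x) __attribute__((__alloc_size__(x)))
#else
#	define demo_attr_alloc_size(x)
#endif

/* Hypothetical allocator: the attribute tells the compiler that the
 * returned buffer is as large as argument 1, which can enable extra
 * object-size checks where supported. */
static void *demo_alloc(size_t size) demo_attr_alloc_size(1);

static void *
demo_alloc(size_t size)
{
	/* malloc(0) may return NULL; always ask for at least one byte. */
	return malloc(size == 0 ? 1 : size);
}
```

Where the check fails, the macro expands to nothing, so the declaration remains plain C.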
--- a/src/common/tuklib_cpucores.c
+++ b/src/common/tuklib_cpucores.c
@ -56,14 +56,14 @@ tuklib_cpucores(void)
 #elif defined(TUKLIB_CPUCORES_SCHED_GETAFFINITY)
 	cpu_set_t cpu_mask;
 	if (sched_getaffinity(0, sizeof(cpu_mask), &cpu_mask) == 0)
-		ret = CPU_COUNT(&cpu_mask);
+		ret = (uint32_t)CPU_COUNT(&cpu_mask);

 #elif defined(TUKLIB_CPUCORES_CPUSET)
 	cpuset_t set;
 	if (cpuset_getaffinity(CPU_LEVEL_WHICH, CPU_WHICH_PID, -1,
 			sizeof(set), &set) == 0) {
 #	ifdef CPU_COUNT
-		ret = CPU_COUNT(&set);
+		ret = (uint32_t)CPU_COUNT(&set);
 #	else
 		for (unsigned i = 0; i < CPU_SETSIZE; ++i)
 			if (CPU_ISSET(i, &set))
@ -77,7 +77,7 @@ tuklib_cpucores(void)
 	size_t cpus_size = sizeof(cpus);
 	if (sysctl(name, 2, &cpus, &cpus_size, NULL, 0) != -1
 			&& cpus_size == sizeof(cpus) && cpus > 0)
-		ret = cpus;
+		ret = (uint32_t)cpus;

 #elif defined(TUKLIB_CPUCORES_SYSCONF)
 #	ifdef _SC_NPROCESSORS_ONLN
@ -88,12 +88,12 @@ tuklib_cpucores(void)
 	const long cpus = sysconf(_SC_NPROC_ONLN);
 #	endif
 	if (cpus > 0)
-		ret = cpus;
+		ret = (uint32_t)cpus;

 #elif defined(TUKLIB_CPUCORES_PSTAT_GETDYNAMIC)
 	struct pst_dynamic pst;
 	if (pstat_getdynamic(&pst, sizeof(pst), 1, 0) != -1)
-		ret = pst.psd_proc_cnt;
+		ret = (uint32_t)pst.psd_proc_cnt;
 #endif

 	return ret;
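The added `(uint32_t)` casts make each narrowing conversion to the function's `uint32_t` result explicit. The diff doesn't say why, but a likely reason is keeping `-Wconversion`/`-Wsign-conversion` builds quiet. A sketch of only the `sysconf()` branch, assuming a system that defines `_SC_NPROCESSORS_ONLN` (glibc-based Linux does, for example); `demo_cpucores` is a hypothetical name:

```c
#include <stdint.h>
#include <unistd.h>

/* Returns the number of online CPUs, or 0 if it cannot be determined. */
static uint32_t
demo_cpucores(void)
{
	uint32_t ret = 0;

#ifdef _SC_NPROCESSORS_ONLN
	const long cpus = sysconf(_SC_NPROCESSORS_ONLN);

	/* sysconf() returns -1 on error, so only accept positive counts,
	 * and narrow with an explicit cast like the patched tuklib code. */
	if (cpus > 0)
		ret = (uint32_t)cpus;
#endif

	return ret;
}
```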
--- a/src/common/tuklib_exit.c
+++ b/src/common/tuklib_exit.c
@ -14,6 +14,7 @@

 #include <stdlib.h>
 #include <stdio.h>
+#include <string.h>

 #include "tuklib_gettext.h"
 #include "tuklib_progname.h"
--- a/src/common/tuklib_integer.h
+++ b/src/common/tuklib_integer.h
@ -6,22 +6,26 @@
 /// This file provides macros or functions to do some basic integer and bit
 /// operations.
 ///
-/// Endianness related integer operations (XX = 16, 32, or 64; Y = b or l):
+/// Native endian inline functions (XX = 16, 32, or 64):
+///   - Unaligned native endian reads: readXXne(ptr)
+///   - Unaligned native endian writes: writeXXne(ptr, num)
+///   - Aligned native endian reads: aligned_readXXne(ptr)
+///   - Aligned native endian writes: aligned_writeXXne(ptr, num)
+///
+/// Endianness-converting integer operations (these can be macros!)
+/// (XX = 16, 32, or 64; Y = b or l):
 ///   - Byte swapping: bswapXX(num)
-///   - Byte order conversions to/from native: convXXYe(num)
-///   - Aligned reads: readXXYe(ptr)
-///   - Aligned writes: writeXXYe(ptr, num)
-///   - Unaligned reads (16/32-bit only): unaligned_readXXYe(ptr)
-///   - Unaligned writes (16/32-bit only): unaligned_writeXXYe(ptr, num)
+///   - Byte order conversions to/from native (byteswaps if Y isn't
+///     the native endianness): convXXYe(num)
+///   - Unaligned reads (16/32-bit only): readXXYe(ptr)
+///   - Unaligned writes (16/32-bit only): writeXXYe(ptr, num)
+///   - Aligned reads: aligned_readXXYe(ptr)
+///   - Aligned writes: aligned_writeXXYe(ptr, num)
 ///
-/// Since they can macros, the arguments should have no side effects since
-/// they may be evaluated more than once.
+/// Since the above can be macros, the arguments should have no side
+/// effects because they may be evaluated more than once.
 ///
-/// \todo       PowerPC and possibly some other architectures support
-///             byte swapping load and store instructions. This file
-///             doesn't take advantage of those instructions.
-///
-/// Bit scan operations for non-zero 32-bit integers:
+/// Bit scan operations for non-zero 32-bit integers (inline functions):
 ///   - Bit scan reverse (find highest non-zero bit): bsr32(num)
 ///   - Count leading zeros: clz32(num)
 ///   - Count trailing zeros: ctz32(num)
@ -42,13 +46,26 @@
 #define TUKLIB_INTEGER_H

 #include "tuklib_common.h"
+#include <string.h>
+
+// Newer Intel C compilers require immintrin.h for _bit_scan_reverse()
+// and such functions.
+#if defined(__INTEL_COMPILER) && (__INTEL_COMPILER >= 1500)
+#	include <immintrin.h>
+#endif


-////////////////////////////////////////
-// Operating system specific features //
-////////////////////////////////////////
+///////////////////
+// Byte swapping //
+///////////////////

-#if defined(HAVE_BYTESWAP_H)
+#if defined(HAVE___BUILTIN_BSWAPXX)
+	// GCC >= 4.8 and Clang
+#	define bswap16(n) __builtin_bswap16(n)
+#	define bswap32(n) __builtin_bswap32(n)
+#	define bswap64(n) __builtin_bswap64(n)
+
+#elif defined(HAVE_BYTESWAP_H)
 	// glibc, uClibc, dietlibc
 #	include <byteswap.h>
 #	ifdef HAVE_BSWAP_16
@ -97,45 +114,33 @@
 #	endif
 #endif

-
-////////////////////////////////
-// Compiler-specific features //
-////////////////////////////////
-
-// Newer Intel C compilers require immintrin.h for _bit_scan_reverse()
-// and such functions.
-#if defined(__INTEL_COMPILER) && (__INTEL_COMPILER >= 1500)
-#	include <immintrin.h>
-#endif
-
-
-///////////////////
-// Byte swapping //
-///////////////////
-
 #ifndef bswap16
-#	define bswap16(num) \
-		(((uint16_t)(num) << 8) | ((uint16_t)(num) >> 8))
+#	define bswap16(n) (uint16_t)( \
+		  (((n) & 0x00FFU) << 8) \
+		| (((n) & 0xFF00U) >> 8) \
+	)
 #endif

 #ifndef bswap32
-#	define bswap32(num) \
-		( (((uint32_t)(num) << 24)                       ) \
-		| (((uint32_t)(num) <<  8) & UINT32_C(0x00FF0000)) \
-		| (((uint32_t)(num) >>  8) & UINT32_C(0x0000FF00)) \
-		| (((uint32_t)(num) >> 24)                       ) )
+#	define bswap32(n) (uint32_t)( \
+		  (((n) & UINT32_C(0x000000FF)) << 24) \
+		| (((n) & UINT32_C(0x0000FF00)) << 8) \
+		| (((n) & UINT32_C(0x00FF0000)) >> 8) \
+		| (((n) & UINT32_C(0xFF000000)) >> 24) \
+	)
 #endif

 #ifndef bswap64
-#	define bswap64(num) \
-		( (((uint64_t)(num) << 56)                               ) \
-		| (((uint64_t)(num) << 40) & UINT64_C(0x00FF000000000000)) \
-		| (((uint64_t)(num) << 24) & UINT64_C(0x0000FF0000000000)) \
-		| (((uint64_t)(num) <<  8) & UINT64_C(0x000000FF00000000)) \
-		| (((uint64_t)(num) >>  8) & UINT64_C(0x00000000FF000000)) \
-		| (((uint64_t)(num) >> 24) & UINT64_C(0x0000000000FF0000)) \
-		| (((uint64_t)(num) >> 40) & UINT64_C(0x000000000000FF00)) \
-		| (((uint64_t)(num) >> 56)                               ) )
+#	define bswap64(n) (uint64_t)( \
+		  (((n) & UINT64_C(0x00000000000000FF)) << 56) \
+		| (((n) & UINT64_C(0x000000000000FF00)) << 40) \
+		| (((n) & UINT64_C(0x0000000000FF0000)) << 24) \
+		| (((n) & UINT64_C(0x00000000FF000000)) << 8) \
+		| (((n) & UINT64_C(0x000000FF00000000)) >> 8) \
+		| (((n) & UINT64_C(0x0000FF0000000000)) >> 24) \
+		| (((n) & UINT64_C(0x00FF000000000000)) >> 40) \
+		| (((n) & UINT64_C(0xFF00000000000000)) >> 56) \
+	)
 #endif

 // Define conversion macros using the basic byte swapping macros.
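The rewritten fallbacks mask out each byte first, shift it into its new position, and cast the combined result to the target width. Compilers that provide `__builtin_bswapXX` never use them. A self-contained copy of the 16- and 32-bit fallbacks under hypothetical `demo_` names, so they can be checked in isolation:

```c
#include <stdint.h>

/* Mask first, then shift; the outer cast fixes the result type. */
#define demo_bswap16(n) (uint16_t)( \
	  (((n) & 0x00FFU) << 8) \
	| (((n) & 0xFF00U) >> 8) \
)

#define demo_bswap32(n) (uint32_t)( \
	  (((n) & UINT32_C(0x000000FF)) << 24) \
	| (((n) & UINT32_C(0x0000FF00)) << 8) \
	| (((n) & UINT32_C(0x00FF0000)) >> 8) \
	| (((n) & UINT32_C(0xFF000000)) >> 24) \
)
```

Because these are macros, the argument is evaluated several times, which is why the header warns against arguments with side effects.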
@ -180,76 +185,76 @@
 #endif


-//////////////////////////////
-// Aligned reads and writes //
-//////////////////////////////
+////////////////////////////////
+// Unaligned reads and writes //
+////////////////////////////////
+
+// The traditional way of casting e.g. *(const uint16_t *)uint8_pointer
+// is bad even if the uint8_pointer is properly aligned because this kind
+// of cast breaks strict aliasing rules and results in undefined behavior.
+// With unaligned pointers it's even worse: compilers may emit vector
+// instructions that require aligned pointers even if non-vector
+// instructions work with unaligned pointers.
+//
+// Using memcpy() is the standard compliant way to do unaligned access.
+// Many modern compilers inline it so there is no function call overhead.
+// For those compilers that don't handle the memcpy() method well, the
+// old casting method (that violates strict aliasing) can be requested at
+// build time. A third method, casting to a packed struct, would also be
+// an option but isn't provided to keep things simpler (it's already a mess).
+// Hopefully this is flexible enough in practice.

 static inline uint16_t
-read16be(const uint8_t *buf)
+read16ne(const uint8_t *buf)
 {
-	uint16_t num = *(const uint16_t *)buf;
-	return conv16be(num);
-}
-
-
-static inline uint16_t
-read16le(const uint8_t *buf)
-{
-	uint16_t num = *(const uint16_t *)buf;
-	return conv16le(num);
+#if defined(TUKLIB_FAST_UNALIGNED_ACCESS) \
+		&& defined(TUKLIB_USE_UNSAFE_TYPE_PUNNING)
+	return *(const uint16_t *)buf;
+#else
+	uint16_t num;
+	memcpy(&num, buf, sizeof(num));
+	return num;
+#endif
 }


 static inline uint32_t
-read32be(const uint8_t *buf)
+read32ne(const uint8_t *buf)
 {
-	uint32_t num = *(const uint32_t *)buf;
-	return conv32be(num);
-}
-
-
-static inline uint32_t
-read32le(const uint8_t *buf)
-{
-	uint32_t num = *(const uint32_t *)buf;
-	return conv32le(num);
+#if defined(TUKLIB_FAST_UNALIGNED_ACCESS) \
+		&& defined(TUKLIB_USE_UNSAFE_TYPE_PUNNING)
+	return *(const uint32_t *)buf;
+#else
+	uint32_t num;
+	memcpy(&num, buf, sizeof(num));
+	return num;
+#endif
 }


 static inline uint64_t
-read64be(const uint8_t *buf)
+read64ne(const uint8_t *buf)
 {
-	uint64_t num = *(const uint64_t *)buf;
-	return conv64be(num);
+#if defined(TUKLIB_FAST_UNALIGNED_ACCESS) \
+		&& defined(TUKLIB_USE_UNSAFE_TYPE_PUNNING)
+	return *(const uint64_t *)buf;
+#else
+	uint64_t num;
+	memcpy(&num, buf, sizeof(num));
+	return num;
+#endif
 }


-static inline uint64_t
-read64le(const uint8_t *buf)
-{
-	uint64_t num = *(const uint64_t *)buf;
-	return conv64le(num);
-}
-
-
-// NOTE: Possible byte swapping must be done in a macro to allow GCC
-// to optimize byte swapping of constants when using glibc's or *BSD's
-// byte swapping macros. The actual write is done in an inline function
-// to make type checking of the buf pointer possible similarly to readXXYe()
-// functions.
-
-#define write16be(buf, num) write16ne((buf), conv16be(num))
-#define write16le(buf, num) write16ne((buf), conv16le(num))
-#define write32be(buf, num) write32ne((buf), conv32be(num))
-#define write32le(buf, num) write32ne((buf), conv32le(num))
-#define write64be(buf, num) write64ne((buf), conv64be(num))
-#define write64le(buf, num) write64ne((buf), conv64le(num))
-
-
 static inline void
 write16ne(uint8_t *buf, uint16_t num)
 {
+#if defined(TUKLIB_FAST_UNALIGNED_ACCESS) \
+		&& defined(TUKLIB_USE_UNSAFE_TYPE_PUNNING)
 	*(uint16_t *)buf = num;
+#else
+	memcpy(buf, &num, sizeof(num));
+#endif
 	return;
 }

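The comment at the top of this section recommends `memcpy()` over pointer casts for unaligned access. A minimal sketch of that default path for 32-bit values, i.e. what `read32ne()` and `write32ne()` do unless both `TUKLIB_FAST_UNALIGNED_ACCESS` and `TUKLIB_USE_UNSAFE_TYPE_PUNNING` are defined; the `demo_` names are hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Native-endian load from a possibly unaligned address. Copying into a
 * local uint32_t avoids both alignment faults and strict-aliasing UB;
 * optimizing compilers typically turn this into a single load. */
static inline uint32_t
demo_read32ne(const uint8_t *buf)
{
	uint32_t num;
	memcpy(&num, buf, sizeof(num));
	return num;
}

/* Native-endian store to a possibly unaligned address. */
static inline void
demo_write32ne(uint8_t *buf, uint32_t num)
{
	memcpy(buf, &num, sizeof(num));
}
```

Writing at `buf + 1` exercises an address that is misaligned for `uint32_t` on typical targets; only the four target bytes change.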
@ -257,7 +262,12 @@ write16ne(uint8_t *buf, uint16_t num)
 static inline void
 write32ne(uint8_t *buf, uint32_t num)
 {
+#if defined(TUKLIB_FAST_UNALIGNED_ACCESS) \
+		&& defined(TUKLIB_USE_UNSAFE_TYPE_PUNNING)
 	*(uint32_t *)buf = num;
+#else
+	memcpy(buf, &num, sizeof(num));
+#endif
 	return;
 }

@ -265,90 +275,114 @@ write32ne(uint8_t *buf, uint32_t num)
 static inline void
 write64ne(uint8_t *buf, uint64_t num)
 {
+#if defined(TUKLIB_FAST_UNALIGNED_ACCESS) \
+		&& defined(TUKLIB_USE_UNSAFE_TYPE_PUNNING)
 	*(uint64_t *)buf = num;
+#else
+	memcpy(buf, &num, sizeof(num));
+#endif
 	return;
 }


-////////////////////////////////
-// Unaligned reads and writes //
-////////////////////////////////
-
-// NOTE: TUKLIB_FAST_UNALIGNED_ACCESS indicates only support for 16-bit and
-// 32-bit unaligned integer loads and stores. It's possible that 64-bit
-// unaligned access doesn't work or is slower than byte-by-byte access.
-// Since unaligned 64-bit is probably not needed as often as 16-bit or
-// 32-bit, we simply don't support 64-bit unaligned access for now.
-#ifdef TUKLIB_FAST_UNALIGNED_ACCESS
-#	define unaligned_read16be read16be
-#	define unaligned_read16le read16le
-#	define unaligned_read32be read32be
-#	define unaligned_read32le read32le
-#	define unaligned_write16be write16be
-#	define unaligned_write16le write16le
-#	define unaligned_write32be write32be
-#	define unaligned_write32le write32le
-
-#else
-
 static inline uint16_t
-unaligned_read16be(const uint8_t *buf)
+read16be(const uint8_t *buf)
 {
+#if defined(WORDS_BIGENDIAN) || defined(TUKLIB_FAST_UNALIGNED_ACCESS)
+	uint16_t num = read16ne(buf);
+	return conv16be(num);
+#else
 	uint16_t num = ((uint16_t)buf[0] << 8) | (uint16_t)buf[1];
 	return num;
+#endif
 }


 static inline uint16_t
-unaligned_read16le(const uint8_t *buf)
+read16le(const uint8_t *buf)
 {
+#if !defined(WORDS_BIGENDIAN) || defined(TUKLIB_FAST_UNALIGNED_ACCESS)
+	uint16_t num = read16ne(buf);
+	return conv16le(num);
+#else
 	uint16_t num = ((uint16_t)buf[0]) | ((uint16_t)buf[1] << 8);
 	return num;
+#endif
 }


 static inline uint32_t
-unaligned_read32be(const uint8_t *buf)
+read32be(const uint8_t *buf)
 {
+#if defined(WORDS_BIGENDIAN) || defined(TUKLIB_FAST_UNALIGNED_ACCESS)
+	uint32_t num = read32ne(buf);
+	return conv32be(num);
+#else
 	uint32_t num = (uint32_t)buf[0] << 24;
 	num |= (uint32_t)buf[1] << 16;
 	num |= (uint32_t)buf[2] << 8;
 	num |= (uint32_t)buf[3];
 	return num;
+#endif
 }


 static inline uint32_t
-unaligned_read32le(const uint8_t *buf)
+read32le(const uint8_t *buf)
 {
+#if !defined(WORDS_BIGENDIAN) || defined(TUKLIB_FAST_UNALIGNED_ACCESS)
+	uint32_t num = read32ne(buf);
+	return conv32le(num);
+#else
 	uint32_t num = (uint32_t)buf[0];
 	num |= (uint32_t)buf[1] << 8;
 	num |= (uint32_t)buf[2] << 16;
 	num |= (uint32_t)buf[3] << 24;
 	return num;
+#endif
 }


+// NOTE: Possible byte swapping must be done in a macro to allow the compiler
+// to optimize byte swapping of constants when using glibc's or *BSD's
+// byte swapping macros. The actual write is done in an inline function
+// to make type checking of the buf pointer possible.
+#if defined(WORDS_BIGENDIAN) || defined(TUKLIB_FAST_UNALIGNED_ACCESS)
+#	define write16be(buf, num) write16ne(buf, conv16be(num))
+#	define write32be(buf, num) write32ne(buf, conv32be(num))
+#endif
+
+#if !defined(WORDS_BIGENDIAN) || defined(TUKLIB_FAST_UNALIGNED_ACCESS)
+#	define write16le(buf, num) write16ne(buf, conv16le(num))
+#	define write32le(buf, num) write32ne(buf, conv32le(num))
+#endif
+
+
+#ifndef write16be
 static inline void
-unaligned_write16be(uint8_t *buf, uint16_t num)
+write16be(uint8_t *buf, uint16_t num)
 {
 	buf[0] = (uint8_t)(num >> 8);
 	buf[1] = (uint8_t)num;
 	return;
 }
+#endif


+#ifndef write16le
 static inline void
-unaligned_write16le(uint8_t *buf, uint16_t num)
+write16le(uint8_t *buf, uint16_t num)
 {
 	buf[0] = (uint8_t)num;
 	buf[1] = (uint8_t)(num >> 8);
 	return;
 }
+#endif


+#ifndef write32be
 static inline void
-unaligned_write32be(uint8_t *buf, uint32_t num)
+write32be(uint8_t *buf, uint32_t num)
 {
 	buf[0] = (uint8_t)(num >> 24);
 	buf[1] = (uint8_t)(num >> 16);
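When the host is not of the requested byte order and `TUKLIB_FAST_UNALIGNED_ACCESS` is not defined, `read32be()` and `read32le()` build the value from individual bytes. Since that fallback only shifts byte values, it gives the same result on any host, which makes it easy to test on its own. A copy with hypothetical `demo_` names:

```c
#include <stdint.h>

/* Most significant byte first. */
static inline uint32_t
demo_read32be(const uint8_t *buf)
{
	uint32_t num = (uint32_t)buf[0] << 24;
	num |= (uint32_t)buf[1] << 16;
	num |= (uint32_t)buf[2] << 8;
	num |= (uint32_t)buf[3];
	return num;
}

/* Least significant byte first. */
static inline uint32_t
demo_read32le(const uint8_t *buf)
{
	uint32_t num = (uint32_t)buf[0];
	num |= (uint32_t)buf[1] << 8;
	num |= (uint32_t)buf[2] << 16;
	num |= (uint32_t)buf[3] << 24;
	return num;
}
```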
@ -356,10 +390,12 @@ unaligned_write32be(uint8_t *buf, uint32_t num)
 	buf[3] = (uint8_t)num;
 	return;
 }
+#endif


+#ifndef write32le
 static inline void
-unaligned_write32le(uint8_t *buf, uint32_t num)
+write32le(uint8_t *buf, uint32_t num)
 {
 	buf[0] = (uint8_t)num;
 	buf[1] = (uint8_t)(num >> 8);
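The `write16be` through `write32le` macros are only defined when a native store plus `convXXYe()` is cheap; otherwise the `#ifndef` blocks supply inline functions that store one byte at a time. A sketch of the two 32-bit byte-by-byte writers, under hypothetical `demo_` names:

```c
#include <stdint.h>

/* Store most significant byte first, independent of host byte order. */
static inline void
demo_write32be(uint8_t *buf, uint32_t num)
{
	buf[0] = (uint8_t)(num >> 24);
	buf[1] = (uint8_t)(num >> 16);
	buf[2] = (uint8_t)(num >> 8);
	buf[3] = (uint8_t)num;
}

/* Store least significant byte first. */
static inline void
demo_write32le(uint8_t *buf, uint32_t num)
{
	buf[0] = (uint8_t)num;
	buf[1] = (uint8_t)(num >> 8);
	buf[2] = (uint8_t)(num >> 16);
	buf[3] = (uint8_t)(num >> 24);
}
```

In the fast path the header keeps the byte swap in a macro so that, per its NOTE, the compiler can optimize byte swapping of constant arguments.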
@ -367,10 +403,184 @@ unaligned_write32le(uint8_t *buf, uint32_t num)
 	buf[3] = (uint8_t)(num >> 24);
 	return;
 }
-
 #endif


+//////////////////////////////
+// Aligned reads and writes //
+//////////////////////////////
+
+// Separate functions for aligned reads and writes are provided since on
+// strict-align archs aligned access is much faster than unaligned access.
+//
+// Just like in the unaligned case, memcpy() is needed to avoid
+// strict aliasing violations. However, on archs that don't support
+// unaligned access the compiler cannot know that the pointers given
+// to memcpy() are aligned which results in slow code. As of C11 there is
+// no standard way to tell the compiler that we know that the address is
+// aligned but some compilers have language extensions to do that. With
+// such language extensions the memcpy() method gives excellent results.
+//
+// What to do on a strict-align system when no known language extensions
+// are available? Falling back to byte-by-byte access would be safe but ruin
+// optimizations that have been made specifically with aligned access in mind.
+// As a compromise, aligned reads will fall back to non-compliant type punning
+// but aligned writes will be byte-by-byte, that is, fast reads are preferred
+// over fast writes. This obviously isn't great but hopefully it's a working
+// compromise for now.
+//
+// __builtin_assume_aligned is supported by GCC >= 4.7 and clang >= 3.6.
+#ifdef HAVE___BUILTIN_ASSUME_ALIGNED
+#	define tuklib_memcpy_aligned(dest, src, size) \
+		memcpy(dest, __builtin_assume_aligned(src, size), size)
+#else
+#	define tuklib_memcpy_aligned(dest, src, size) \
+		memcpy(dest, src, size)
+#	ifndef TUKLIB_FAST_UNALIGNED_ACCESS
+#		define TUKLIB_USE_UNSAFE_ALIGNED_READS 1
+#	endif
+#endif
+
+
+static inline uint16_t
+aligned_read16ne(const uint8_t *buf)
+{
+#if defined(TUKLIB_USE_UNSAFE_TYPE_PUNNING) \
+		|| defined(TUKLIB_USE_UNSAFE_ALIGNED_READS)
+	return *(const uint16_t *)buf;
+#else
+	uint16_t num;
+	tuklib_memcpy_aligned(&num, buf, sizeof(num));
+	return num;
+#endif
+}
+
+
+static inline uint32_t
+aligned_read32ne(const uint8_t *buf)
+{
+#if defined(TUKLIB_USE_UNSAFE_TYPE_PUNNING) \
+		|| defined(TUKLIB_USE_UNSAFE_ALIGNED_READS)
+	return *(const uint32_t *)buf;
+#else
+	uint32_t num;
+	tuklib_memcpy_aligned(&num, buf, sizeof(num));
+	return num;
+#endif
+}
+
+
+static inline uint64_t
+aligned_read64ne(const uint8_t *buf)
+{
+#if defined(TUKLIB_USE_UNSAFE_TYPE_PUNNING) \
+		|| defined(TUKLIB_USE_UNSAFE_ALIGNED_READS)
+	return *(const uint64_t *)buf;
+#else
+	uint64_t num;
+	tuklib_memcpy_aligned(&num, buf, sizeof(num));
+	return num;
+#endif
+}
+
+
+static inline void
+aligned_write16ne(uint8_t *buf, uint16_t num)
+{
+#ifdef TUKLIB_USE_UNSAFE_TYPE_PUNNING
+	*(uint16_t *)buf = num;
+#else
+	tuklib_memcpy_aligned(buf, &num, sizeof(num));
+#endif
+	return;
+}
+
+
+static inline void
+aligned_write32ne(uint8_t *buf, uint32_t num)
+{
+#ifdef TUKLIB_USE_UNSAFE_TYPE_PUNNING
+	*(uint32_t *)buf = num;
+#else
+	tuklib_memcpy_aligned(buf, &num, sizeof(num));
+#endif
+	return;
+}
+
+
+static inline void
+aligned_write64ne(uint8_t *buf, uint64_t num)
+{
+#ifdef TUKLIB_USE_UNSAFE_TYPE_PUNNING
+	*(uint64_t *)buf = num;
+#else
+	tuklib_memcpy_aligned(buf, &num, sizeof(num));
+#endif
+	return;
+}
+
+
+static inline uint16_t
+aligned_read16be(const uint8_t *buf)
+{
+	uint16_t num = aligned_read16ne(buf);
+	return conv16be(num);
+}
+
+
+static inline uint16_t
+aligned_read16le(const uint8_t *buf)
+{
+	uint16_t num = aligned_read16ne(buf);
+	return conv16le(num);
+}
+
+
+static inline uint32_t
+aligned_read32be(const uint8_t *buf)
+{
+	uint32_t num = aligned_read32ne(buf);
+	return conv32be(num);
+}
+
+
+static inline uint32_t
+aligned_read32le(const uint8_t *buf)
+{
+	uint32_t num = aligned_read32ne(buf);
+	return conv32le(num);
+}
+
+
+static inline uint64_t
+aligned_read64be(const uint8_t *buf)
+{
+	uint64_t num = aligned_read64ne(buf);
+	return conv64be(num);
+}
+
+
+static inline uint64_t
+aligned_read64le(const uint8_t *buf)
+{
+	uint64_t num = aligned_read64ne(buf);
+	return conv64le(num);
+}
+
+
+// These need to be macros like in the unaligned case.
+#define aligned_write16be(buf, num) aligned_write16ne((buf), conv16be(num))
+#define aligned_write16le(buf, num) aligned_write16ne((buf), conv16le(num))
+#define aligned_write32be(buf, num) aligned_write32ne((buf), conv32be(num))
+#define aligned_write32le(buf, num) aligned_write32ne((buf), conv32le(num))
+#define aligned_write64be(buf, num) aligned_write64ne((buf), conv64be(num))
+#define aligned_write64le(buf, num) aligned_write64ne((buf), conv64le(num))
+
+
+////////////////////
+// Bit operations //
+////////////////////
+
 static inline uint32_t
 bsr32(uint32_t n)
 {
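`tuklib_memcpy_aligned()` passes the source pointer through `__builtin_assume_aligned()`, which lets the compiler use an aligned load even on strict-alignment targets. The sketch below applies the same idea to the caller's buffer for both a 32-bit read and a write. It assumes GCC >= 4.7 or Clang (the header instead tests `HAVE___BUILTIN_ASSUME_ALIGNED`, presumably set at configure time), and `DEMO_ASSUME_ALIGNED` and the `demo_` functions are hypothetical:

```c
#include <stdint.h>
#include <string.h>

#if defined(__clang__) || (defined(__GNUC__) \
		&& (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7)))
	/* Promise the compiler that p is aligned to n bytes. */
#	define DEMO_ASSUME_ALIGNED(p, n) __builtin_assume_aligned(p, n)
#else
	/* No known extension: plain memcpy() is still correct, just slower. */
#	define DEMO_ASSUME_ALIGNED(p, n) (p)
#endif

/* buf must really be 4-byte aligned. */
static inline uint32_t
demo_aligned_read32ne(const uint8_t *buf)
{
	uint32_t num;
	memcpy(&num, DEMO_ASSUME_ALIGNED(buf, sizeof(num)), sizeof(num));
	return num;
}

/* buf must really be 4-byte aligned. */
static inline void
demo_aligned_write32ne(uint8_t *buf, uint32_t num)
{
	memcpy(DEMO_ASSUME_ALIGNED(buf, sizeof(num)), &num, sizeof(num));
}
```

The caller is responsible for alignment: once the assumption is in place, a misaligned `buf` is undefined behavior rather than merely slow.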
@ -383,44 +593,42 @@ bsr32(uint32_t n)
 	// multiple architectures. On x86, __builtin_clz() ^ 31U becomes
 	// either plain BSR (so the XOR gets optimized away) or LZCNT and
 	// XOR (if -march indicates that SSE4a instructions are supported).
-	return __builtin_clz(n) ^ 31U;
+	return (uint32_t)__builtin_clz(n) ^ 31U;

 #elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
 	uint32_t i;
 	__asm__("bsrl %1, %0" : "=r" (i) : "rm" (n));
 	return i;

-#elif defined(_MSC_VER) && _MSC_VER >= 1400
-	// MSVC isn't supported by tuklib, but since this code exists,
-	// it doesn't hurt to have it here anyway.
-	uint32_t i;
-	_BitScanReverse((DWORD *)&i, n);
+#elif defined(_MSC_VER)
+	unsigned long i;
+	_BitScanReverse(&i, n);
 	return i;

 #else
 	uint32_t i = 31;

-	if ((n & UINT32_C(0xFFFF0000)) == 0) {
+	if ((n & 0xFFFF0000) == 0) {
 		n <<= 16;
 		i = 15;
 	}

-	if ((n & UINT32_C(0xFF000000)) == 0) {
+	if ((n & 0xFF000000) == 0) {
 		n <<= 8;
 		i -= 8;
 	}

-	if ((n & UINT32_C(0xF0000000)) == 0) {
+	if ((n & 0xF0000000) == 0) {
 		n <<= 4;
 		i -= 4;
 	}

-	if ((n & UINT32_C(0xC0000000)) == 0) {
+	if ((n & 0xC0000000) == 0) {
 		n <<= 2;
 		i -= 2;
 	}

-	if ((n & UINT32_C(0x80000000)) == 0)
+	if ((n & 0x80000000) == 0)
 		--i;

 	return i;
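The portable branch of `bsr32()` is a binary search: each step checks whether the top 16, 8, 4, 2, and finally 1 bits of the remaining value are all zero, and if so shifts past them and lowers the bit index. A standalone copy under the hypothetical name `demo_bsr32`; like the original, it requires non-zero input:

```c
#include <stdint.h>

/* Index of the highest set bit; n must be non-zero. */
static inline uint32_t
demo_bsr32(uint32_t n)
{
	uint32_t i = 31;

	if ((n & 0xFFFF0000) == 0) {
		n <<= 16;
		i = 15;
	}

	if ((n & 0xFF000000) == 0) {
		n <<= 8;
		i -= 8;
	}

	if ((n & 0xF0000000) == 0) {
		n <<= 4;
		i -= 4;
	}

	if ((n & 0xC0000000) == 0) {
		n <<= 2;
		i -= 2;
	}

	if ((n & 0x80000000) == 0)
		--i;

	return i;
}
```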
@ -435,7 +643,7 @@ clz32(uint32_t n)
 	return _bit_scan_reverse(n) ^ 31U;

 #elif TUKLIB_GNUC_REQ(3, 4) && UINT_MAX == UINT32_MAX
-	return __builtin_clz(n);
+	return (uint32_t)__builtin_clz(n);

 #elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
 	uint32_t i;
@ -444,35 +652,35 @@ clz32(uint32_t n)
 		: "=r" (i) : "rm" (n));
 	return i;

-#elif defined(_MSC_VER) && _MSC_VER >= 1400
-	uint32_t i;
-	_BitScanReverse((DWORD *)&i, n);
+#elif defined(_MSC_VER)
+	unsigned long i;
+	_BitScanReverse(&i, n);
 	return i ^ 31U;

 #else
 	uint32_t i = 0;

-	if ((n & UINT32_C(0xFFFF0000)) == 0) {
+	if ((n & 0xFFFF0000) == 0) {
 		n <<= 16;
 		i = 16;
 	}

-	if ((n & UINT32_C(0xFF000000)) == 0) {
+	if ((n & 0xFF000000) == 0) {
 		n <<= 8;
 		i += 8;
 	}

-	if ((n & UINT32_C(0xF0000000)) == 0) {
+	if ((n & 0xF0000000) == 0) {
 		n <<= 4;
 		i += 4;
 	}

-	if ((n & UINT32_C(0xC0000000)) == 0) {
+	if ((n & 0xC0000000) == 0) {
 		n <<= 2;
 		i += 2;
 	}

-	if ((n & UINT32_C(0x80000000)) == 0)
+	if ((n & 0x80000000) == 0)
 		++i;

 	return i;
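The generic `clz32()` runs the same binary search as `bsr32()` but counts upward. For non-zero `n` the two are related by `clz32(n) == bsr32(n) ^ 31`, which is also why the Intel compiler branch returns `_bit_scan_reverse(n) ^ 31U`. A sketch that cross-checks a copy of the fallback (hypothetical `demo_clz32`) against a naive shift loop (hypothetical `demo_clz32_naive`):

```c
#include <stdint.h>

/* Fallback from the patch: leading zero count; n must be non-zero. */
static inline uint32_t
demo_clz32(uint32_t n)
{
	uint32_t i = 0;

	if ((n & 0xFFFF0000) == 0) {
		n <<= 16;
		i = 16;
	}

	if ((n & 0xFF000000) == 0) {
		n <<= 8;
		i += 8;
	}

	if ((n & 0xF0000000) == 0) {
		n <<= 4;
		i += 4;
	}

	if ((n & 0xC0000000) == 0) {
		n <<= 2;
		i += 2;
	}

	if ((n & 0x80000000) == 0)
		++i;

	return i;
}

/* Obviously correct reference: shift left until the top bit is set. */
static uint32_t
demo_clz32_naive(uint32_t n)
{
	uint32_t i = 0;

	while ((n & UINT32_C(0x80000000)) == 0) {
		n <<= 1;
		++i;
	}

	return i;
}
```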
@ -487,42 +695,42 @@ ctz32(uint32_t n)
 	return _bit_scan_forward(n);

 #elif TUKLIB_GNUC_REQ(3, 4) && UINT_MAX >= UINT32_MAX
-	return __builtin_ctz(n);
+	return (uint32_t)__builtin_ctz(n);

 #elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
 	uint32_t i;
 	__asm__("bsfl %1, %0" : "=r" (i) : "rm" (n));
 	return i;

-#elif defined(_MSC_VER) && _MSC_VER >= 1400
-	uint32_t i;
-	_BitScanForward((DWORD *)&i, n);
+#elif defined(_MSC_VER)
+	unsigned long i;
+	_BitScanForward(&i, n);
 	return i;

 #else
 	uint32_t i = 0;

-	if ((n & UINT32_C(0x0000FFFF)) == 0) {
+	if ((n & 0x0000FFFF) == 0) {
 		n >>= 16;
 		i = 16;
 	}

-	if ((n & UINT32_C(0x000000FF)) == 0) {
+	if ((n & 0x000000FF) == 0) {
 		n >>= 8;
 		i += 8;
 	}

-	if ((n & UINT32_C(0x0000000F)) == 0) {
+	if ((n & 0x0000000F) == 0) {
 		n >>= 4;
 		i += 4;
 	}

-	if ((n & UINT32_C(0x00000003)) == 0) {
+	if ((n & 0x00000003) == 0) {
 		n >>= 2;
 		i += 2;
 	}

-	if ((n & UINT32_C(0x00000001)) == 0)
+	if ((n & 0x00000001) == 0)
 		++i;

 	return i;
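The generic #else branches of clz32() and ctz32() above find the first set bit by binary search. Each branch tests the top (or bottom) 16 bits, then 8, 4, 2 and 1, shifting the value and adding to the count. The standalone sketch below repeats that technique under hypothetical names (my_clz32, my_ctz32) so it compiles outside tuklib. As with the originals, the input must be nonzero. For zero, both functions return 31, which is not a meaningful bit count.

```c
#include <stdint.h>

/* Count leading zero bits. The input must be nonzero. */
static uint32_t
my_clz32(uint32_t n)
{
	uint32_t i = 0;

	/* If the top 16 bits are all zero, shift them out and count 16,
	 * then repeat the test with 8, 4, 2 and 1 bits. */
	if ((n & 0xFFFF0000) == 0) {
		n <<= 16;
		i = 16;
	}
	if ((n & 0xFF000000) == 0) {
		n <<= 8;
		i += 8;
	}
	if ((n & 0xF0000000) == 0) {
		n <<= 4;
		i += 4;
	}
	if ((n & 0xC0000000) == 0) {
		n <<= 2;
		i += 2;
	}
	if ((n & 0x80000000) == 0)
		++i;

	return i;
}

/* Count trailing zero bits with the same search from the low end.
 * The input must be nonzero. */
static uint32_t
my_ctz32(uint32_t n)
{
	uint32_t i = 0;

	if ((n & 0x0000FFFF) == 0) {
		n >>= 16;
		i = 16;
	}
	if ((n & 0x000000FF) == 0) {
		n >>= 8;
		i += 8;
	}
	if ((n & 0x0000000F) == 0) {
		n >>= 4;
		i += 4;
	}
	if ((n & 0x00000003) == 0) {
		n >>= 2;
		i += 2;
	}
	if ((n & 0x00000001) == 0)
		++i;

	return i;
}
```

For example, my_clz32(0x00010000) walks through the 8, 4, 2 and 1 steps and returns 15.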
--- a/src/common/tuklib_mbstr.h
+++ b/src/common/tuklib_mbstr.h
@ -1,6 +1,6 @@
 ///////////////////////////////////////////////////////////////////////////////
 //
-/// \file       tuklib_mstr.h
+/// \file       tuklib_mbstr.h
 /// \brief      Utility functions for handling multibyte strings
 ///
 /// If not enough multibyte string support is available in the C library,
--- a/src/common/tuklib_mbstr_fw.c
+++ b/src/common/tuklib_mbstr_fw.c
@ -1,6 +1,6 @@
 ///////////////////////////////////////////////////////////////////////////////
 //
-/// \file       tuklib_mstr_fw.c
+/// \file       tuklib_mbstr_fw.c
 /// \brief      Get the field width for printf() e.g. to align table columns
 //
 //  Author:     Lasse Collin
--- a/src/common/tuklib_mbstr_width.c
+++ b/src/common/tuklib_mbstr_width.c
@ -1,6 +1,6 @@
 ///////////////////////////////////////////////////////////////////////////////
 //
-/// \file       tuklib_mstr_width.c
+/// \file       tuklib_mbstr_width.c
 /// \brief      Calculate width of a multibyte string
 //
 //  Author:     Lasse Collin
@ -11,6 +11,7 @@
 ///////////////////////////////////////////////////////////////////////////////

 #include "tuklib_mbstr.h"
+#include <string.h>

 #if defined(HAVE_MBRTOWC) && defined(HAVE_WCWIDTH)
 #	include <wchar.h>
@ -50,7 +51,7 @@ tuklib_mbstr_width(const char *str, size_t *bytes)
 		if (wc_width < 0)
 			return (size_t)-1;

-		width += wc_width;
+		width += (size_t)wc_width;
 	}

 	// Require that the string ends in the initial shift state.
--- a/src/liblzma/api/lzma.h
+++ b/src/liblzma/api/lzma.h
@ -224,7 +224,8 @@
 #		else
 #			define lzma_nothrow throw()
 #		endif
-#	elif __GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 3)
+#	elif defined(__GNUC__) && (__GNUC__ > 3 \
+			|| (__GNUC__ == 3 && __GNUC_MINOR__ >= 3))
 #		define lzma_nothrow __attribute__((__nothrow__))
 #	else
 #		define lzma_nothrow
@ -241,7 +242,7 @@
 * break anything if these are sometimes enabled and sometimes not, only
 * affects warnings and optimizations.
 */
-#if __GNUC__ >= 3
+#if defined(__GNUC__) && __GNUC__ >= 3
 #	ifndef lzma_attribute
 #		define lzma_attribute(attr) __attribute__(attr)
 #	endif
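The lzma.h change tests defined(__GNUC__) before comparing __GNUC__. An undefined identifier in #if evaluates to 0, which happens to give the right answer. However, GCC and Clang warn about it under -Wundef. With the defined() test first, the comparison is not evaluated on compilers that lack __GNUC__. The sketch below shows a minimal version of the same guard, using a hypothetical MY_ATTRIBUTE macro in place of lzma_attribute.

```c
/* Use __attribute__ only on GCC-compatible compilers. Because the
 * defined() test comes first, the version comparison is not evaluated
 * when __GNUC__ is not defined, which keeps -Wundef quiet. */
#if defined(__GNUC__) && __GNUC__ >= 3
#	define MY_ATTRIBUTE(attr) __attribute__(attr)
#else
#	define MY_ATTRIBUTE(attr)
#endif

/* __const__ says the result depends only on the arguments. It is an
 * optimization hint, so the function is correct either way. */
static int square(int x) MY_ATTRIBUTE((__const__));

static int
square(int x)
{
	return x * x;
}
```

On a compiler that does not define __GNUC__, MY_ATTRIBUTE expands to nothing and square() is an ordinary function.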
--- a/src/liblzma/api/lzma/block.h
+++ b/src/liblzma/api/lzma/block.h
@ -448,7 +448,7 @@ extern LZMA_API(lzma_vli) lzma_block_total_size(const lzma_block *block)
 *              - LZMA_MEM_ERROR
 *              - LZMA_OPTIONS_ERROR
 *              - LZMA_UNSUPPORTED_CHECK: block->check specifies a Check ID
- *                that is not supported by this buid of liblzma. Initializing
+ *                that is not supported by this build of liblzma. Initializing
 *                the encoder failed.
 *              - LZMA_PROG_ERROR
 */
--- a/src/liblzma/api/lzma/filter.h
+++ b/src/liblzma/api/lzma/filter.h
@ -341,9 +341,10 @@ extern LZMA_API(lzma_ret) lzma_properties_encode(
 * \param       filter      filter->id must have been set to the correct
 *                          Filter ID. filter->options doesn't need to be
 *                          initialized (it's not freed by this function). The
- *                          decoded options will be stored to filter->options.
- *                          filter->options is set to NULL if there are no
- *                          properties or if an error occurs.
+ *                          decoded options will be stored in filter->options;
+ *                          it's application's responsibility to free it when
+ *                          appropriate. filter->options is set to NULL if
+ *                          there are no properties or if an error occurs.
 * \param       allocator   Custom memory allocator used to allocate the
 *                          options. Set to NULL to use the default malloc(),
 *                          and in case of an error, also free().
--- a/src/liblzma/api/lzma/hardware.h
+++ b/src/liblzma/api/lzma/hardware.h
@ -6,7 +6,7 @@
 * ways to limit the resource usage. Applications linking against liblzma
 * need to do the actual decisions how much resources to let liblzma to use.
 * To ease making these decisions, liblzma provides functions to find out
- * the relevant capabilities of the underlaying hardware. Currently there
+ * the relevant capabilities of the underlying hardware. Currently there
 * is only a function to find out the amount of RAM, but in the future there
 * will be also a function to detect how many concurrent threads the system
 * can run.
--- a/src/liblzma/api/lzma/lzma12.h
+++ b/src/liblzma/api/lzma/lzma12.h
@ -301,7 +301,7 @@ typedef struct {
 	 * (2^ pb =2^2=4), which is often a good choice when there's
 	 * no better guess.
 	 *
-	 * When the aligment is known, setting pb accordingly may reduce
+	 * When the alignment is known, setting pb accordingly may reduce
 	 * the file size a little. E.g. with text files having one-byte
 	 * alignment (US-ASCII, ISO-8859-*, UTF-8), setting pb=0 can
 	 * improve compression slightly. For UTF-16 text, pb=1 is a good
--- a/src/liblzma/api/lzma/version.h
+++ b/src/liblzma/api/lzma/version.h
@ -22,7 +22,7 @@
 */
 #define LZMA_VERSION_MAJOR 5
 #define LZMA_VERSION_MINOR 2
-#define LZMA_VERSION_PATCH 4
+#define LZMA_VERSION_PATCH 5
 #define LZMA_VERSION_STABILITY LZMA_VERSION_STABILITY_STABLE

 #ifndef LZMA_VERSION_COMMIT
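LZMA_VERSION_PATCH is bumped to 5 here. version.h also combines the components into a single integer, LZMA_VERSION, which lzma_version_number() returns at run time. As far as we understand it, the layout is major * 10000000 + minor * 10000 + patch * 10 + stability, where stability 2 means stable, so 5.2.5 stable is 50020052. The sketch below mirrors that arithmetic with hypothetical MY_ macros instead of including lzma.h, and it also shows how to split the number back into its components.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the stability values in version.h. */
#define MY_STABILITY_ALPHA  0
#define MY_STABILITY_BETA   1
#define MY_STABILITY_STABLE 2

/* Combine version components into one decimal-coded integer. */
#define MY_VERSION_NUMBER(major, minor, patch, stability) \
	((major) * UINT32_C(10000000) + (minor) * UINT32_C(10000) \
	+ (patch) * UINT32_C(10) + (stability))

/* Split such a number back into its components. */
static void
my_version_split(uint32_t v, uint32_t *major, uint32_t *minor,
		uint32_t *patch, uint32_t *stability)
{
	*major = v / 10000000;
	*minor = v / 10000 % 1000;
	*patch = v / 10 % 1000;
	*stability = v % 10;
}
```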
--- a/src/liblzma/api/lzma/vli.h
+++ b/src/liblzma/api/lzma/vli.h
@ -54,7 +54,7 @@
 *
 * Valid VLI values are in the range [0, LZMA_VLI_MAX]. Unknown value is
 * indicated with LZMA_VLI_UNKNOWN, which is the maximum value of the
- * underlaying integer type.
+ * underlying integer type.
 *
 * lzma_vli will be uint64_t for the foreseeable future. If a bigger size
 * is needed in the future, it is guaranteed that 2 * LZMA_VLI_MAX will
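.xz headers and indexes store lzma_vli values as multibyte integers. Each byte carries seven bits, least significant group first, and every byte except the last has its high bit set. Nine bytes carry 63 bits, which is why LZMA_VLI_MAX is 2^63 - 1. The sketch below encodes and decodes that form. The function names are hypothetical. It is not liblzma's lzma_vli_encode()/lzma_vli_decode(), which can also resume across calls. The sketch rejects an encoding whose last byte is 0x00, unless that is the only byte, because the same value could have been written in fewer bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define MY_VLI_MAX (UINT64_MAX / 2)	/* 2^63 - 1, like LZMA_VLI_MAX */
#define MY_VLI_BYTES_MAX 9

/* Returns the number of bytes written (1 to 9), or 0 if vli is too big. */
static size_t
my_vli_encode(uint64_t vli, uint8_t out[MY_VLI_BYTES_MAX])
{
	if (vli > MY_VLI_MAX)
		return 0;

	size_t i = 0;
	while (vli >= 0x80) {
		/* Low seven bits plus the "more bytes follow" flag. */
		out[i++] = (uint8_t)(vli | 0x80);
		vli >>= 7;
	}
	out[i++] = (uint8_t)vli;
	return i;
}

/* Returns the number of bytes consumed, or 0 if the input is truncated,
 * longer than nine bytes, or not minimally encoded. */
static size_t
my_vli_decode(const uint8_t *in, size_t in_size, uint64_t *vli)
{
	uint64_t value = 0;

	for (size_t i = 0; i < in_size && i < MY_VLI_BYTES_MAX; ++i) {
		value |= (uint64_t)(in[i] & 0x7F) << (i * 7);
		if ((in[i] & 0x80) == 0) {
			/* A trailing 0x00 byte would be redundant. */
			if (in[i] == 0x00 && i > 0)
				return 0;
			*vli = value;
			return i + 1;
		}
	}
	return 0;
}
```

For example, 300 encodes as the two bytes 0xAC 0x02.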
--- a/src/liblzma/check/crc32_fast.c
+++ b/src/liblzma/check/crc32_fast.c
@ -49,7 +49,7 @@ lzma_crc32(const uint8_t *buf, size_t size, uint32_t crc)

 		// Calculate the CRC32 using the slice-by-eight algorithm.
 		while (buf < limit) {
-			crc ^= *(const uint32_t *)(buf);
+			crc ^= aligned_read32ne(buf);
 			buf += 4;

 			crc = lzma_crc32_table[7][A(crc)]
@ -57,7 +57,7 @@ lzma_crc32(const uint8_t *buf, size_t size, uint32_t crc)
 			    ^ lzma_crc32_table[5][C(crc)]
 			    ^ lzma_crc32_table[4][D(crc)];

-			const uint32_t tmp = *(const uint32_t *)(buf);
+			const uint32_t tmp = aligned_read32ne(buf);
 			buf += 4;

 			// At least with some compilers, it is critical for
--- a/src/liblzma/check/crc32_table.c
+++ b/src/liblzma/check/crc32_table.c
@ -12,6 +12,9 @@

 #include "common.h"

+// Having the declaration here silences clang -Wmissing-variable-declarations.
+extern const uint32_t lzma_crc32_table[8][256];
+
 #ifdef WORDS_BIGENDIAN
 #	include "crc32_table_be.h"
 #else
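The extern declaration added above gives clang a prior declaration of the non-static lzma_crc32_table, which is what -Wmissing-variable-declarations looks for. The sketch below uses the same declare-then-define pattern for a hypothetical table (my_crc32_table). It fills the table at run time and computes the CRC-32 used by .xz (reflected polynomial 0xEDB88320) one byte at a time. This is much slower than liblzma's slice-by-eight code but is handy for cross-checking. The standard check value for the ASCII string "123456789" is 0xCBF43926. Like lzma_crc32(), the function takes the previous CRC as its last argument, starting from 0.

```c
#include <stddef.h>
#include <stdint.h>

/* A prior declaration of a non-static table, as in the patch. */
extern uint32_t my_crc32_table[256];
uint32_t my_crc32_table[256];

/* Fill the table for the reflected CRC-32 polynomial 0xEDB88320. */
static void
my_crc32_init(void)
{
	for (uint32_t b = 0; b < 256; ++b) {
		uint32_t r = b;
		for (int k = 0; k < 8; ++k)
			r = (r & 1) ? (r >> 1) ^ UINT32_C(0xEDB88320) : r >> 1;
		my_crc32_table[b] = r;
	}
}

/* Byte-at-a-time CRC-32. Pass 0 as crc for the first call, then the
 * previous result to continue over more data. */
static uint32_t
my_crc32(const uint8_t *buf, size_t size, uint32_t crc)
{
	crc = ~crc;
	while (size-- > 0)
		crc = my_crc32_table[(crc ^ *buf++) & 0xFF] ^ (crc >> 8);
	return ~crc;
}
```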
--- a/src/liblzma/check/crc64_fast.c
+++ b/src/liblzma/check/crc64_fast.c
@ -47,9 +47,9 @@ lzma_crc64(const uint8_t *buf, size_t size, uint64_t crc)
 		while (buf < limit) {
 #ifdef WORDS_BIGENDIAN
 			const uint32_t tmp = (crc >> 32)
-					^ *(const uint32_t *)(buf);
+					^ aligned_read32ne(buf);
 #else
-			const uint32_t tmp = crc ^ *(const uint32_t *)(buf);
+			const uint32_t tmp = crc ^ aligned_read32ne(buf);
 #endif
 			buf += 4;

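crc32_fast.c and crc64_fast.c no longer dereference `*(const uint32_t *)(buf)`; they call aligned_read32ne() instead. Reading a uint8_t buffer through a uint32_t pointer is undefined behavior when the address is not suitably aligned or when the access breaks the strict-aliasing rules. A common portable replacement copies the bytes with memcpy(), which optimizing compilers usually reduce to a single load. The sketch below shows that approach for native-endian ("ne") reads. The names are hypothetical, and tuklib_integer.h provides more variants than these two.

```c
#include <stdint.h>
#include <string.h>

/* Read a native-endian 32-bit value from a possibly unaligned address.
 * memcpy() has no alignment or aliasing requirements. */
static inline uint32_t
my_read32ne(const uint8_t *buf)
{
	uint32_t num;
	memcpy(&num, buf, sizeof(num));
	return num;
}

/* The same for 64-bit values. */
static inline uint64_t
my_read64ne(const uint8_t *buf)
{
	uint64_t num;
	memcpy(&num, buf, sizeof(num));
	return num;
}
```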
--- a/src/liblzma/check/crc64_table.c
+++ b/src/liblzma/check/crc64_table.c
@ -12,6 +12,9 @@

 #include "common.h"

+// Having the declaration here silences clang -Wmissing-variable-declarations.
+extern const uint64_t lzma_crc64_table[4][256];
+
 #ifdef WORDS_BIGENDIAN
 #	include "crc64_table_be.h"
 #else
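Below is a bitwise reference for the CRC-64 that crc64_fast.c accelerates with tables. It uses the reflected ECMA-182 polynomial 0xC96C5795D7870F42, an all-ones initial value and a final inversion, which is the CRC-64 variant used by .xz. Its check value for the ASCII string "123456789" is 0x995DC9BBDF1939FA. The function name is hypothetical. Like lzma_crc64(), it takes the previous CRC as its last argument, starting from 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time CRC-64 with the reflected ECMA-182 polynomial.
 * Pass 0 as crc for the first call. */
static uint64_t
my_crc64(const uint8_t *buf, size_t size, uint64_t crc)
{
	const uint64_t poly = UINT64_C(0xC96C5795D7870F42);

	crc = ~crc;
	while (size-- > 0) {
		crc ^= *buf++;
		for (int k = 0; k < 8; ++k)
			crc = (crc & 1) ? (crc >> 1) ^ poly : crc >> 1;
	}
	return ~crc;
}
```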
--- a/src/liblzma/common/alone_decoder.c
+++ b/src/liblzma/common/alone_decoder.c
@ -50,8 +50,7 @@ typedef struct {


 static lzma_ret
-alone_decode(void *coder_ptr,
-		const lzma_allocator *allocator lzma_attribute((__unused__)),
+alone_decode(void *coder_ptr, const lzma_allocator *allocator,
 		const uint8_t *restrict in, size_t *restrict in_pos,
 		size_t in_size, uint8_t *restrict out,
 		size_t *restrict out_pos, size_t out_size,
--- a/src/liblzma/common/alone_encoder.c
+++ b/src/liblzma/common/alone_encoder.c
@ -1,7 +1,7 @@
 ///////////////////////////////////////////////////////////////////////////////
 //
-/// \file       alone_decoder.c
-/// \brief      Decoder for LZMA_Alone files
+/// \file       alone_encoder.c
+/// \brief      Encoder for LZMA_Alone files
 //
 //  Author:     Lasse Collin
 //
@ -31,8 +31,7 @@ typedef struct {


 static lzma_ret
-alone_encode(void *coder_ptr,
-		const lzma_allocator *allocator lzma_attribute((__unused__)),
+alone_encode(void *coder_ptr, const lzma_allocator *allocator,
 		const uint8_t *restrict in, size_t *restrict in_pos,
 		size_t in_size, uint8_t *restrict out,
 		size_t *restrict out_pos, size_t out_size,
@ -122,7 +121,7 @@ alone_encoder_init(lzma_next_coder *next, const lzma_allocator *allocator,
 	if (d != UINT32_MAX)
 		++d;

-	unaligned_write32le(coder->header + 1, d);
+	write32le(coder->header + 1, d);

 	// - Uncompressed size (always unknown and using EOPM)
 	memset(coder->header + 1 + 4, 0xFF, 8);
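Before write32le() stores the dictionary size in bytes 1 to 4 of the 13-byte .lzma header, alone_encoder_init() rounds the size up to the form 2^n or 2^n + 2^(n-1). Only the final ++d step is visible in this hunk. The memset() then marks the uncompressed size as unknown with eight 0xFF bytes. The sketch below shows a bit-smearing way to do that rounding, ending with the same `d != UINT32_MAX` check as the patch, together with a header writer. Both functions are hypothetical. Byte 0 is assumed to be an already encoded properties byte; 0x5D corresponds to lc=3, lp=0, pb=2.

```c
#include <stdint.h>
#include <string.h>

/* Round a nonzero dictionary size up to 2^n or 2^n + 2^(n-1).
 * Starting from size - 1, the ORs copy the highest set bit downward so
 * that every bit below the top two becomes 1. The second-highest bit
 * keeps its value, so adding one carries into the next allowed size. */
static uint32_t
my_round_dict_size(uint32_t dict_size)
{
	uint32_t d = dict_size - 1;
	d |= d >> 2;
	d |= d >> 3;
	d |= d >> 4;
	d |= d >> 8;
	d |= d >> 16;
	if (d != UINT32_MAX)
		++d;
	return d;
}

/* Fill a 13-byte .lzma (LZMA_Alone) header. */
static void
my_alone_header(uint8_t header[13], uint8_t props_byte, uint32_t dict_size)
{
	const uint32_t d = my_round_dict_size(dict_size);

	header[0] = props_byte;
	for (int k = 0; k < 4; ++k)
		header[1 + k] = (uint8_t)(d >> (8 * k));	/* little endian */

	memset(header + 5, 0xFF, 8);	/* uncompressed size unknown */
}
```

For example, 5 MiB rounds up to 6 MiB (2^22 + 2^21), while 8 MiB is already of the allowed form and stays unchanged.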
--- a/src/liblzma/common/block_header_decoder.c
+++ b/src/liblzma/common/block_header_decoder.c
@ -67,7 +67,7 @@ lzma_block_header_decode(lzma_block *block,
 	const size_t in_size = block->header_size - 4;

 	// Verify CRC32
-	if (lzma_crc32(in, in_size, 0) != unaligned_read32le(in + in_size))
+	if (lzma_crc32(in, in_size, 0) != read32le(in + in_size))
 		return LZMA_DATA_ERROR;

 	// Check for unsupported flags.
@ -98,7 +98,7 @@ lzma_block_header_decode(lzma_block *block,
 		block->uncompressed_size = LZMA_VLI_UNKNOWN;

 	// Filter Flags
-	const size_t filter_count = (in[1] & 3) + 1;
+	const size_t filter_count = (in[1] & 3U) + 1;
 	for (size_t i = 0; i < filter_count; ++i) {
 		const lzma_ret ret = lzma_filter_flags_decode(
 				&block->filters[i], allocator,
--- a/src/liblzma/common/block_header_encoder.c
+++ b/src/liblzma/common/block_header_encoder.c
@ -126,7 +126,7 @@ lzma_block_header_encode(const lzma_block *block, uint8_t *out)
 	memzero(out + out_pos, out_size - out_pos);

 	// CRC32
-	unaligned_write32le(out + out_size, lzma_crc32(out, out_size, 0));
+	write32le(out + out_size, lzma_crc32(out, out_size, 0));

 	return LZMA_OK;
 }
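Here and in the other header code, unaligned_read32le() and unaligned_write32le() become read32le() and write32le(). Multi-byte fields in .xz headers, such as the CRC32 that follows the Block Header, are little endian regardless of the host. The byte-wise sketch below, with hypothetical names, does not depend on host byte order or alignment. Each byte is cast to uint32_t before shifting because a uint8_t operand is otherwise promoted to int, and 0xFF << 24 would overflow a 32-bit int.

```c
#include <stdint.h>

/* Read a little-endian 32-bit value byte by byte. */
static inline uint32_t
my_read32le(const uint8_t *buf)
{
	return (uint32_t)buf[0]
			| ((uint32_t)buf[1] << 8)
			| ((uint32_t)buf[2] << 16)
			| ((uint32_t)buf[3] << 24);
}

/* Write a 32-bit value in little-endian order. */
static inline void
my_write32le(uint8_t *buf, uint32_t num)
{
	buf[0] = (uint8_t)num;
	buf[1] = (uint8_t)(num >> 8);
	buf[2] = (uint8_t)(num >> 16);
	buf[3] = (uint8_t)(num >> 24);
}
```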
--- a/src/liblzma/common/block_util.c
+++ b/src/liblzma/common/block_util.c
@ -1,6 +1,6 @@
 ///////////////////////////////////////////////////////////////////////////////
 //
-/// \file       block_header.c
+/// \file       block_util.c
 /// \brief      Utility functions to handle lzma_block
 //
 //  Author:     Lasse Collin
--- a/src/liblzma/common/common.c
+++ b/src/liblzma/common/common.c
@ -1,6 +1,6 @@
 ///////////////////////////////////////////////////////////////////////////////
 //
-/// \file       common.h
+/// \file       common.c
 /// \brief      Common functions needed in many places in liblzma
 //
 //  Author:     Lasse Collin
@ -99,7 +99,11 @@ lzma_bufcpy(const uint8_t *restrict in, size_t *restrict in_pos,
 	const size_t out_avail = out_size - *out_pos;
 	const size_t copy_size = my_min(in_avail, out_avail);

-	memcpy(out + *out_pos, in + *in_pos, copy_size);
+	// Call memcpy() only if there is something to copy. If there is
+	// nothing to copy, in or out might be NULL and then the memcpy()
+	// call would trigger undefined behavior.
+	if (copy_size > 0)
+		memcpy(out + *out_pos, in + *in_pos, copy_size);

 	*in_pos += copy_size;
 	*out_pos += copy_size;
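lzma_bufcpy() copies as much as fits from the input buffer to the output buffer and advances both positions. The sketch below mirrors that behavior under a hypothetical name, including the new guard. When there is nothing to copy, in or out may be NULL, and passing a null pointer to memcpy() is undefined even when the length is zero. This sketch returns the number of bytes copied.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy min(available input, free output) bytes and advance both
 * positions. memcpy() and the pointer arithmetic are skipped when
 * copy_size is 0, so NULL buffers are fine in that case. */
static size_t
my_bufcpy(const uint8_t *in, size_t *in_pos, size_t in_size,
		uint8_t *out, size_t *out_pos, size_t out_size)
{
	const size_t in_avail = in_size - *in_pos;
	const size_t out_avail = out_size - *out_pos;
	const size_t copy_size = in_avail < out_avail ? in_avail : out_avail;

	if (copy_size > 0)
		memcpy(out + *out_pos, in + *in_pos, copy_size);

	*in_pos += copy_size;
	*out_pos += copy_size;
	return copy_size;
}
```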
--- a/src/liblzma/common/filter_common.h
+++ b/src/liblzma/common/filter_common.h
@ -1,6 +1,6 @@
 ///////////////////////////////////////////////////////////////////////////////
 //
-/// \file       filter_common.c
+/// \file       filter_common.h
 /// \brief      Filter-specific stuff common for both encoder and decoder
 //
 //  Author:     Lasse Collin
--- a/src/liblzma/common/filter_decoder.h
+++ b/src/liblzma/common/filter_decoder.h
@ -1,6 +1,6 @@
 ///////////////////////////////////////////////////////////////////////////////
 //
-/// \file       filter_decoder.c
+/// \file       filter_decoder.h
 /// \brief      Filter ID mapping to filter-specific functions
 //
 //  Author:     Lasse Collin
--- a/src/liblzma/common/filter_flags_encoder.c
+++ b/src/liblzma/common/filter_flags_encoder.c
@ -1,7 +1,7 @@
 ///////////////////////////////////////////////////////////////////////////////
 //
 /// \file       filter_flags_encoder.c
-/// \brief      Decodes a Filter Flags field
+/// \brief      Encodes a Filter Flags field
 //
 //  Author:     Lasse Collin
 //
--- a/src/liblzma/common/hardware_physmem.c
+++ b/src/liblzma/common/hardware_physmem.c
@ -19,7 +19,7 @@ extern LZMA_API(uint64_t)
 lzma_physmem(void)
 {
 	// It is simpler to make lzma_physmem() a wrapper for
-	// tuklib_physmem() than to hack appropriate symbol visiblity
+	// tuklib_physmem() than to hack appropriate symbol visibility
 	// support for the tuklib modules.
 	return tuklib_physmem();
 }
--- a/src/liblzma/common/index.c
+++ b/src/liblzma/common/index.c
@ -105,7 +105,7 @@ typedef struct {


 typedef struct {
-	/// Every index_stream is a node in the tree of Sreams.
+	/// Every index_stream is a node in the tree of Streams.
 	index_tree_node node;

 	/// Number of this Stream (first one is 1)
@ -166,7 +166,7 @@ struct lzma_index_s {
 	lzma_vli index_list_size;

 	/// How many Records to allocate at once in lzma_index_append().
-	/// This defaults to INDEX_GROUP_SIZE but can be overriden with
+	/// This defaults to INDEX_GROUP_SIZE but can be overridden with
 	/// lzma_index_prealloc().
 	size_t prealloc;

@ -825,8 +825,8 @@ lzma_index_cat(lzma_index *restrict dest, lzma_index *restrict src,
 				s->groups.root = &newg->node;
 			}

-			if (s->groups.rightmost == &g->node)
-				s->groups.rightmost = &newg->node;
+			assert(s->groups.rightmost == &g->node);
+			s->groups.rightmost = &newg->node;

 			lzma_free(g, allocator);

--- a/src/liblzma/common/memcmplen.h
+++ b/src/liblzma/common/memcmplen.h
@ -61,8 +61,7 @@ lzma_memcmplen(const uint8_t *buf1, const uint8_t *buf2,
 	// to __builtin_clzll().
 #define LZMA_MEMCMPLEN_EXTRA 8
 	while (len < limit) {
-		const uint64_t x = *(const uint64_t *)(buf1 + len)
-				- *(const uint64_t *)(buf2 + len);
+		const uint64_t x = read64ne(buf1 + len) - read64ne(buf2 + len);
 		if (x != 0) {
 #	if defined(_M_X64) // MSVC or Intel C compiler on Windows
 			unsigned long tmp;
@ -99,15 +98,7 @@ lzma_memcmplen(const uint8_t *buf1, const uint8_t *buf2,
 			_mm_loadu_si128((const __m128i *)(buf2 + len))));

 		if (x != 0) {
-#	if defined(__INTEL_COMPILER)
-			len += _bit_scan_forward(x);
-#	elif defined(_MSC_VER)
-			unsigned long tmp;
-			_BitScanForward(&tmp, x);
-			len += tmp;
-#	else
-			len += __builtin_ctz(x);
-#	endif
+			len += ctz32(x);
 			return my_min(len, limit);
 		}

@ -120,8 +111,7 @@ lzma_memcmplen(const uint8_t *buf1, const uint8_t *buf2,
 	// Generic 32-bit little endian method
 #	define LZMA_MEMCMPLEN_EXTRA 4
 	while (len < limit) {
-		uint32_t x = *(const uint32_t *)(buf1 + len)
-				- *(const uint32_t *)(buf2 + len);
+		uint32_t x = read32ne(buf1 + len) - read32ne(buf2 + len);
 		if (x != 0) {
 			if ((x & 0xFFFF) == 0) {
 				len += 2;
@ -143,8 +133,7 @@ lzma_memcmplen(const uint8_t *buf1, const uint8_t *buf2,
 	// Generic 32-bit big endian method
 #	define LZMA_MEMCMPLEN_EXTRA 4
 	while (len < limit) {
-		uint32_t x = *(const uint32_t *)(buf1 + len)
-				^ *(const uint32_t *)(buf2 + len);
+		uint32_t x = read32ne(buf1 + len) ^ read32ne(buf2 + len);
 		if (x != 0) {
 			if ((x & 0xFFFF0000) == 0) {
 				len += 2;
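The generic branches of lzma_memcmplen() now load words with read32ne()/read64ne() instead of pointer casts. On a mismatch, they use bit tricks to locate the first differing byte. The sketch below is a simpler, endianness-neutral variant and does not reproduce the upstream algorithm. It compares four bytes at a time through memcpy() and, once a word differs, finishes byte by byte. Unlike the upstream function, it never reads past limit, so it needs no extra readable bytes after the buffers; as we read it, that slack is what LZMA_MEMCMPLEN_EXTRA accounts for. As at the encoder call sites, bytes before len are assumed to match already, and len must not exceed limit. The name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the length of the common prefix of buf1 and buf2, scanning
 * from len (earlier bytes are assumed equal) up to limit. */
static uint32_t
my_memcmplen(const uint8_t *buf1, const uint8_t *buf2,
		uint32_t len, uint32_t limit)
{
	while (len + 4 <= limit) {
		uint32_t a, b;
		memcpy(&a, buf1 + len, 4);
		memcpy(&b, buf2 + len, 4);
		if ((a ^ b) != 0)
			break;	/* the first mismatch is in these 4 bytes */
		len += 4;
	}

	while (len < limit && buf1[len] == buf2[len])
		++len;

	return len;
}
```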
--- a/src/liblzma/common/stream_encoder_mt.c
+++ b/src/liblzma/common/stream_encoder_mt.c
@ -700,7 +700,7 @@ stream_encode_mt(void *coder_ptr, const lzma_allocator *allocator,
 				ret = coder->thread_error;
 				if (ret != LZMA_OK) {
 					assert(ret != LZMA_STREAM_END);
-					break;
+					break; // Break out of mythread_sync.
 				}

 				// Try to read compressed data to out[].
@ -958,7 +958,7 @@ stream_encoder_mt_init(lzma_next_coder *next, const lzma_allocator *allocator,
 	// Validate the filter chain so that we can give an error in this
 	// function instead of delaying it to the first call to lzma_code().
 	// The memory usage calculation verifies the filter chain as
-	// a side effect so we take advatange of that.
+	// a side effect so we take advantage of that.
 	if (lzma_raw_encoder_memusage(filters) == UINT64_MAX)
 		return LZMA_OPTIONS_ERROR;

--- a/src/liblzma/common/stream_flags_decoder.c
+++ b/src/liblzma/common/stream_flags_decoder.c
@ -38,7 +38,7 @@ lzma_stream_header_decode(lzma_stream_flags *options, const uint8_t *in)
 	// and unsupported files.
 	const uint32_t crc = lzma_crc32(in + sizeof(lzma_header_magic),
 			LZMA_STREAM_FLAGS_SIZE, 0);
-	if (crc != unaligned_read32le(in + sizeof(lzma_header_magic)
+	if (crc != read32le(in + sizeof(lzma_header_magic)
 			+ LZMA_STREAM_FLAGS_SIZE))
 		return LZMA_DATA_ERROR;

@ -67,7 +67,7 @@ lzma_stream_footer_decode(lzma_stream_flags *options, const uint8_t *in)
 	// CRC32
 	const uint32_t crc = lzma_crc32(in + sizeof(uint32_t),
 			sizeof(uint32_t) + LZMA_STREAM_FLAGS_SIZE, 0);
-	if (crc != unaligned_read32le(in))
+	if (crc != read32le(in))
 		return LZMA_DATA_ERROR;

 	// Stream Flags
@ -75,7 +75,7 @@ lzma_stream_footer_decode(lzma_stream_flags *options, const uint8_t *in)
 		return LZMA_OPTIONS_ERROR;

 	// Backward Size
-	options->backward_size = unaligned_read32le(in + sizeof(uint32_t));
+	options->backward_size = read32le(in + sizeof(uint32_t));
 	options->backward_size = (options->backward_size + 1) * 4;

 	return LZMA_OK;
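The Stream Footer stores Backward Size as (real size / 4) - 1 in four little-endian bytes. The decoder above computes (backward_size + 1) * 4, and the encoder in stream_flags_encoder.c writes backward_size / 4 - 1. This covers multiples of four from 4 bytes up to 2^34 bytes (16 GiB). The sketch below shows the conversion with hypothetical names. The decoder widens to 64 bits before adding 1 so that the largest stored value cannot wrap around.

```c
#include <stdint.h>

/* Valid sizes are multiples of four in the range [4, 2^34]. */
static int
my_backward_size_valid(uint64_t size)
{
	return size >= 4 && size <= (UINT64_C(1) << 34) && size % 4 == 0;
}

/* Caller must check my_backward_size_valid() first. */
static uint32_t
my_backward_size_encode(uint64_t size)
{
	return (uint32_t)(size / 4 - 1);
}

static uint64_t
my_backward_size_decode(uint32_t stored)
{
	return ((uint64_t)stored + 1) * 4;
}
```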
--- a/src/liblzma/common/stream_flags_encoder.c
+++ b/src/liblzma/common/stream_flags_encoder.c
@ -46,8 +46,8 @@ lzma_stream_header_encode(const lzma_stream_flags *options, uint8_t *out)
 	const uint32_t crc = lzma_crc32(out + sizeof(lzma_header_magic),
 			LZMA_STREAM_FLAGS_SIZE, 0);

-	unaligned_write32le(out + sizeof(lzma_header_magic)
-			+ LZMA_STREAM_FLAGS_SIZE, crc);
+	write32le(out + sizeof(lzma_header_magic) + LZMA_STREAM_FLAGS_SIZE,
+			crc);

 	return LZMA_OK;
 }
@ -66,7 +66,7 @@ lzma_stream_footer_encode(const lzma_stream_flags *options, uint8_t *out)
 	if (!is_backward_size_valid(options))
 		return LZMA_PROG_ERROR;

-	unaligned_write32le(out + 4, options->backward_size / 4 - 1);
+	write32le(out + 4, options->backward_size / 4 - 1);

 	// Stream Flags
 	if (stream_flags_encode(options, out + 2 * 4))
@ -76,7 +76,7 @@ lzma_stream_footer_encode(const lzma_stream_flags *options, uint8_t *out)
 	const uint32_t crc = lzma_crc32(
 			out + 4, 4 + LZMA_STREAM_FLAGS_SIZE, 0);

-	unaligned_write32le(out, crc);
+	write32le(out, crc);

 	// Magic
 	memcpy(out + 2 * 4 + LZMA_STREAM_FLAGS_SIZE,
--- a/src/liblzma/common/vli_decoder.c
+++ b/src/liblzma/common/vli_decoder.c
@ -72,7 +72,7 @@ lzma_vli_decode(lzma_vli *restrict vli, size_t *vli_pos,
 		// corrupt.
 		//
 		// If we need bigger integers in future, old versions liblzma
-		// will confusingly indicate the file being corrupt istead of
+		// will confusingly indicate the file being corrupt instead of
 		// unsupported. I suppose it's still better this way, because
 		// in the foreseeable future (writing this in 2008) the only
 		// reason why files would appear having over 63-bit integers
--- a/src/liblzma/delta/delta_decoder.c
+++ b/src/liblzma/delta/delta_decoder.c
@ -70,7 +70,7 @@ lzma_delta_props_decode(void **options, const lzma_allocator *allocator,
 		return LZMA_MEM_ERROR;

 	opt->type = LZMA_DELTA_TYPE_BYTE;
-	opt->dist = props[0] + 1;
+	opt->dist = props[0] + 1U;

 	*options = opt;

--- a/src/liblzma/lz/lz_decoder.c
+++ b/src/liblzma/lz/lz_decoder.c
@ -91,11 +91,17 @@ decode_buffer(lzma_coder *coder,
 				in, in_pos, in_size);

 		// Copy the decoded data from the dictionary to the out[]
-		// buffer.
+		// buffer. Do it conditionally because out can be NULL
+		// (in which case copy_size is always 0). Calling memcpy()
+		// with a null-pointer is undefined even if the third
+		// argument is 0.
 		const size_t copy_size = coder->dict.pos - dict_start;
 		assert(copy_size <= out_size - *out_pos);
-		memcpy(out + *out_pos, coder->dict.buf + dict_start,
-				copy_size);
+
+		if (copy_size > 0)
+			memcpy(out + *out_pos, coder->dict.buf + dict_start,
+					copy_size);
+
 		*out_pos += copy_size;

 		// Reset the dictionary if so requested by coder->lz.code().
@ -125,8 +131,7 @@ decode_buffer(lzma_coder *coder,


 static lzma_ret
-lz_decode(void *coder_ptr,
-		const lzma_allocator *allocator lzma_attribute((__unused__)),
+lz_decode(void *coder_ptr, const lzma_allocator *allocator,
 		const uint8_t *restrict in, size_t *restrict in_pos,
 		size_t in_size, uint8_t *restrict out,
 		size_t *restrict out_pos, size_t out_size,
@ -241,7 +246,7 @@ lzma_lz_decoder_init(lzma_next_coder *next, const lzma_allocator *allocator,
 	if (lz_options.dict_size < 4096)
 		lz_options.dict_size = 4096;

-	// Make dictionary size a multipe of 16. Some LZ-based decoders like
+	// Make dictionary size a multiple of 16. Some LZ-based decoders like
 	// LZMA use the lowest bits lzma_dict.pos to know the alignment of the
 	// data. Aligned buffer is also good when memcpying from the
 	// dictionary to the output buffer, since applications are
--- a/src/liblzma/lz/lz_encoder_hash.h
+++ b/src/liblzma/lz/lz_encoder_hash.h
@ -39,7 +39,7 @@
 // Endianness doesn't matter in hash_2_calc() (no effect on the output).
 #ifdef TUKLIB_FAST_UNALIGNED_ACCESS
 #	define hash_2_calc() \
-		const uint32_t hash_value = *(const uint16_t *)(cur)
+		const uint32_t hash_value = read16ne(cur)
 #else
 #	define hash_2_calc() \
 		const uint32_t hash_value \
--- a/src/liblzma/lz/lz_encoder_mf.c
+++ b/src/liblzma/lz/lz_encoder_mf.c
@ -113,7 +113,7 @@ normalize(lzma_mf *mf)
 	// may be match finders that use larger resolution than one byte.
 	const uint32_t subvalue
 			= (MUST_NORMALIZE_POS - mf->cyclic_size);
-				// & (~(UINT32_C(1) << 10) - 1);
+				// & ~((UINT32_C(1) << 10) - 1);

 	for (uint32_t i = 0; i < mf->hash_count; ++i) {
 		// If the distance is greater than the dictionary size,
--- a/src/liblzma/lzma/fastpos.h
+++ b/src/liblzma/lzma/fastpos.h
@ -101,7 +101,7 @@ extern const uint8_t lzma_fastpos[1 << FASTPOS_BITS];
 	(UINT32_C(1) << (FASTPOS_BITS + fastpos_shift(extra, n)))

 #define fastpos_result(dist, extra, n) \
-	lzma_fastpos[(dist) >> fastpos_shift(extra, n)] \
+	(uint32_t)(lzma_fastpos[(dist) >> fastpos_shift(extra, n)]) \
 			+ 2 * fastpos_shift(extra, n)


--- a/src/liblzma/lzma/fastpos_tablegen.c
+++ b/src/liblzma/lzma/fastpos_tablegen.c
@ -11,7 +11,6 @@
 //
 ///////////////////////////////////////////////////////////////////////////////

-#include <sys/types.h>
 #include <inttypes.h>
 #include <stdio.h>
 #include "fastpos.h"
--- a/src/liblzma/lzma/lzma2_decoder.c
+++ b/src/liblzma/lzma/lzma2_decoder.c
@ -136,7 +136,7 @@ lzma2_decode(void *coder_ptr, lzma_dict *restrict dict,
 		break;

 	case SEQ_UNCOMPRESSED_2:
-		coder->uncompressed_size += in[(*in_pos)++] + 1;
+		coder->uncompressed_size += in[(*in_pos)++] + 1U;
 		coder->sequence = SEQ_COMPRESSED_0;
 		coder->lzma.set_uncompressed(coder->lzma.coder,
 				coder->uncompressed_size);
@ -148,7 +148,7 @@ lzma2_decode(void *coder_ptr, lzma_dict *restrict dict,
 		break;

 	case SEQ_COMPRESSED_1:
-		coder->compressed_size += in[(*in_pos)++] + 1;
+		coder->compressed_size += in[(*in_pos)++] + 1U;
 		coder->sequence = coder->next_sequence;
 		break;

@ -297,8 +297,8 @@ lzma_lzma2_props_decode(void **options, const lzma_allocator *allocator,
 	if (props[0] == 40) {
 		opt->dict_size = UINT32_MAX;
 	} else {
-		opt->dict_size = 2 | (props[0] & 1);
-		opt->dict_size <<= props[0] / 2 + 11;
+		opt->dict_size = 2 | (props[0] & 1U);
+		opt->dict_size <<= props[0] / 2U + 11;
 	}

 	opt->preset_dict = NULL;
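The LZMA2 properties are a single byte p that encodes the dictionary size. The value 40 means UINT32_MAX. Smaller values give (2 | (p & 1)) << (p / 2 + 11), that is, 2^(p/2 + 12) for even p and 3 * 2^(p/2 + 11) for odd p. The 1U suffixes added in the patch keep that arithmetic unsigned. The sketch below uses a hypothetical name. Values above 40 are invalid, and the sketch returns 0 for them as its own error convention.

```c
#include <stdint.h>

/* Decode the LZMA2 dictionary-size byte. Returns 0 if p is invalid. */
static uint32_t
my_lzma2_dict_size(uint8_t p)
{
	if (p > 40)
		return 0;
	if (p == 40)
		return UINT32_MAX;

	uint32_t d = 2 | (p & 1U);	/* 2 or 3 */
	d <<= p / 2U + 11;
	return d;
}
```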
--- a/src/liblzma/lzma/lzma_common.h
+++ b/src/liblzma/lzma/lzma_common.h
@ -122,7 +122,8 @@ typedef enum {
 ///     byte; and
 ///   - the highest literal_context_bits bits of the previous byte.
 #define literal_subcoder(probs, lc, lp_mask, pos, prev_byte) \
-	((probs)[(((pos) & lp_mask) << lc) + ((prev_byte) >> (8 - lc))])
+	((probs)[(((pos) & (lp_mask)) << (lc)) \
+			+ ((uint32_t)(prev_byte) >> (8U - (lc)))])


 static inline void
--- a/src/liblzma/lzma/lzma_decoder.c
+++ b/src/liblzma/lzma/lzma_decoder.c
@ -398,7 +398,7 @@ lzma_decode(void *coder_ptr, lzma_dict *restrict dictptr,
 				// ("match byte") to "len" to minimize the
 				// number of variables we need to store
 				// between decoder calls.
-				len = dict_get(&dict, rep0) << 1;
+				len = (uint32_t)(dict_get(&dict, rep0)) << 1;

 				// The usage of "offset" allows omitting some
 				// branches, which should give tiny speed
@ -569,7 +569,7 @@ lzma_decode(void *coder_ptr, lzma_dict *restrict dictptr,
 #ifdef HAVE_SMALL
 					do {
 						rc_bit(probs[symbol], ,
-							rep0 += 1 << offset,
+							rep0 += 1U << offset,
 							SEQ_DIST_MODEL);
 					} while (++offset < limit);
 #else
@ -577,25 +577,25 @@ lzma_decode(void *coder_ptr, lzma_dict *restrict dictptr,
 					case 5:
 						assert(offset == 0);
 						rc_bit(probs[symbol], ,
-							rep0 += 1,
+							rep0 += 1U,
 							SEQ_DIST_MODEL);
 						++offset;
 						--limit;
 					case 4:
 						rc_bit(probs[symbol], ,
-							rep0 += 1 << offset,
+							rep0 += 1U << offset,
 							SEQ_DIST_MODEL);
 						++offset;
 						--limit;
 					case 3:
 						rc_bit(probs[symbol], ,
-							rep0 += 1 << offset,
+							rep0 += 1U << offset,
 							SEQ_DIST_MODEL);
 						++offset;
 						--limit;
 					case 2:
 						rc_bit(probs[symbol], ,
-							rep0 += 1 << offset,
+							rep0 += 1U << offset,
 							SEQ_DIST_MODEL);
 						++offset;
 						--limit;
@ -607,7 +607,7 @@ lzma_decode(void *coder_ptr, lzma_dict *restrict dictptr,
 						// the unneeded updating of
 						// "symbol".
 						rc_bit_last(probs[symbol], ,
-							rep0 += 1 << offset,
+							rep0 += 1U << offset,
 							SEQ_DIST_MODEL);
 					}
 #endif
@ -635,7 +635,7 @@ lzma_decode(void *coder_ptr, lzma_dict *restrict dictptr,
 					do {
 						rc_bit(coder->pos_align[
 								symbol], ,
-							rep0 += 1 << offset,
+							rep0 += 1U << offset,
 							SEQ_ALIGN);
 					} while (++offset < ALIGN_BITS);
 #else
@ -1049,7 +1049,7 @@ lzma_lzma_props_decode(void **options, const lzma_allocator *allocator,
 	// All dictionary sizes are accepted, including zero. LZ decoder
 	// will automatically use a dictionary at least a few KiB even if
 	// a smaller dictionary is requested.
-	opt->dict_size = unaligned_read32le(props + 1);
+	opt->dict_size = read32le(props + 1);

 	opt->preset_dict = NULL;
 	opt->preset_dict_size = 0;
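The .lzma/LZMA1 properties are five bytes. Byte 0 packs lc, lp and pb as (pb * 5 + lp) * 9 + lc, and bytes 1 to 4 hold the dictionary size, which this hunk reads with read32le(props + 1). The decoding sketch below uses a hypothetical struct instead of lzma_options_lzma. It only checks that pb fits (at most 4), and liblzma's own validation may be stricter.

```c
#include <stdint.h>

/* Hypothetical result type, not lzma_options_lzma. */
struct my_lzma_props {
	uint32_t lc, lp, pb, dict_size;
};

/* Returns 0 on success, -1 if byte 0 is out of range. */
static int
my_lzma_props_decode(const uint8_t props[5], struct my_lzma_props *out)
{
	uint32_t b = props[0];

	if (b > (4 * 5 + 4) * 9 + 8)	/* pb would exceed 4 */
		return -1;

	out->lc = b % 9;
	b /= 9;
	out->lp = b % 5;
	out->pb = b / 5;

	/* Dictionary size: 32-bit little endian. */
	out->dict_size = (uint32_t)props[1]
			| ((uint32_t)props[2] << 8)
			| ((uint32_t)props[3] << 16)
			| ((uint32_t)props[4] << 24);
	return 0;
}
```

For example, the common header bytes 5D 00 00 80 00 decode to lc=3, lp=0, pb=2 and an 8 MiB dictionary.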
--- a/src/liblzma/lzma/lzma_encoder.c
+++ b/src/liblzma/lzma/lzma_encoder.c
@ -663,7 +663,7 @@ lzma_lzma_props_encode(const void *options, uint8_t *out)
 	if (lzma_lzma_lclppb_encode(opt, out))
 		return LZMA_PROG_ERROR;

-	unaligned_write32le(out + 1, opt->dict_size);
+	write32le(out + 1, opt->dict_size);

 	return LZMA_OK;
 }
--- a/src/liblzma/lzma/lzma_encoder_optimum_normal.c
+++ b/src/liblzma/lzma/lzma_encoder_optimum_normal.c
@ -636,9 +636,10 @@ helper2(lzma_lzma1_encoder *coder, uint32_t *reps, const uint8_t *buf,
 		uint32_t len_test_2 = len_test + 1;
 		const uint32_t limit = my_min(buf_avail_full,
 				len_test_2 + nice_len);
-		for (; len_test_2 < limit
-				&& buf[len_test_2] == buf_back[len_test_2];
-				++len_test_2) ;
+		// NOTE: len_test_2 may be greater than limit so the call to
+		// lzma_memcmplen() must be done conditionally.
+		if (len_test_2 < limit)
+			len_test_2 = lzma_memcmplen(buf, buf_back, len_test_2, limit);

 		len_test_2 -= len_test + 1;

@ -732,9 +733,12 @@ helper2(lzma_lzma1_encoder *coder, uint32_t *reps, const uint8_t *buf,
 				const uint32_t limit = my_min(buf_avail_full,
 						len_test_2 + nice_len);

-				for (; len_test_2 < limit &&
-						buf[len_test_2] == buf_back[len_test_2];
-						++len_test_2) ;
+				// NOTE: len_test_2 may be greater than limit
+				// so the call to lzma_memcmplen() must be
+				// done conditionally.
+				if (len_test_2 < limit)
+					len_test_2 = lzma_memcmplen(buf, buf_back,
+							len_test_2, limit);

 				len_test_2 -= len_test + 1;

--- a/src/liblzma/lzma/lzma_encoder_private.h
+++ b/src/liblzma/lzma/lzma_encoder_private.h
@ -25,8 +25,7 @@
 // MATCH_LEN_MIN bytes. Unaligned access gives tiny gain so there's no
 // reason to not use it when it is supported.
 #ifdef TUKLIB_FAST_UNALIGNED_ACCESS
-#	define not_equal_16(a, b) \
-		(*(const uint16_t *)(a) != *(const uint16_t *)(b))
+#	define not_equal_16(a, b) (read16ne(a) != read16ne(b))
 #else
 #	define not_equal_16(a, b) \
 		((a)[0] != (b)[0] || (a)[1] != (b)[1])
--- a/src/liblzma/simple/arm.c
+++ b/src/liblzma/simple/arm.c
@ -22,9 +22,9 @@ arm_code(void *simple lzma_attribute((__unused__)),
 	size_t i;
 	for (i = 0; i + 4 <= size; i += 4) {
 		if (buffer[i + 3] == 0xEB) {
-			uint32_t src = (buffer[i + 2] << 16)
-					| (buffer[i + 1] << 8)
-					| (buffer[i + 0]);
+			uint32_t src = ((uint32_t)(buffer[i + 2]) << 16)
+					| ((uint32_t)(buffer[i + 1]) << 8)
+					| (uint32_t)(buffer[i + 0]);
 			src <<= 2;

 			uint32_t dest;
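The ARM branch converter looks for BL instructions, which are little-endian words whose top byte is 0xEB. It rewrites their 24-bit word offset between relative form and an absolute target, so that repeated calls to the same function produce identical bytes. With the casts added here, the whole src expression is computed in uint32_t. Without them, each byte would be promoted to int first and the int result converted on assignment. The sketch below completes the conversion, assuming that the rest of arm_code() (not shown) adds or subtracts the position plus 8, which is the value the ARM PC reads as. The function name is hypothetical. Encoding and then decoding at the same position restores the original bytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Convert ARM BL instructions between relative and absolute form.
 * now_pos is the stream offset of buffer[0], assumed to be a multiple
 * of four. Returns the number of bytes processed. */
static size_t
my_arm_convert(uint8_t *buffer, size_t size, uint32_t now_pos,
		bool is_encoder)
{
	size_t i;

	for (i = 0; i + 4 <= size; i += 4) {
		if (buffer[i + 3] != 0xEB)
			continue;

		/* 24-bit word offset, assembled in uint32_t. */
		uint32_t src = ((uint32_t)buffer[i + 2] << 16)
				| ((uint32_t)buffer[i + 1] << 8)
				| (uint32_t)buffer[i + 0];
		src <<= 2;

		/* The PC reads 8 bytes ahead of the current instruction. */
		const uint32_t pc = now_pos + (uint32_t)i + 8;
		uint32_t dest = is_encoder ? pc + src : src - pc;
		dest >>= 2;

		buffer[i + 2] = (uint8_t)(dest >> 16);
		buffer[i + 1] = (uint8_t)(dest >> 8);
		buffer[i + 0] = (uint8_t)dest;
	}

	return i;
}
```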
--- a/src/liblzma/simple/armthumb.c
+++ b/src/liblzma/simple/armthumb.c
@ -23,10 +23,10 @@ armthumb_code(void *simple lzma_attribute((__unused__)),
 	for (i = 0; i + 4 <= size; i += 2) {
 		if ((buffer[i + 1] & 0xF8) == 0xF0
 				&& (buffer[i + 3] & 0xF8) == 0xF8) {
-			uint32_t src = ((buffer[i + 1] & 0x7) << 19)
-					| (buffer[i + 0] << 11)
-					| ((buffer[i + 3] & 0x7) << 8)
-					| (buffer[i + 2]);
+			uint32_t src = (((uint32_t)(buffer[i + 1]) & 7) << 19)
+				| ((uint32_t)(buffer[i + 0]) << 11)
+				| (((uint32_t)(buffer[i + 3]) & 7) << 8)
+				| (uint32_t)(buffer[i + 2]);

 			src <<= 1;

--- a/src/liblzma/simple/ia64.c
+++ b/src/liblzma/simple/ia64.c
@ -70,7 +70,7 @@ ia64_code(void *simple lzma_attribute((__unused__)),
 				inst_norm |= (uint64_t)(dest & 0x100000)
 						<< (36 - 20);

-				instruction &= (1 << bit_res) - 1;
+				instruction &= (1U << bit_res) - 1;
 				instruction |= (inst_norm << bit_res);

 				for (size_t j = 0; j < 6; j++)
--- a/src/liblzma/simple/powerpc.c
+++ b/src/liblzma/simple/powerpc.c
@ -25,10 +25,11 @@ powerpc_code(void *simple lzma_attribute((__unused__)),
 		if ((buffer[i] >> 2) == 0x12
 				&& ((buffer[i + 3] & 3) == 1)) {

-			const uint32_t src = ((buffer[i + 0] & 3) << 24)
-					| (buffer[i + 1] << 16)
-					| (buffer[i + 2] << 8)
-					| (buffer[i + 3] & (~3));
+			const uint32_t src
+				= (((uint32_t)(buffer[i + 0]) & 3) << 24)
+				| ((uint32_t)(buffer[i + 1]) << 16)
+				| ((uint32_t)(buffer[i + 2]) << 8)
+				| ((uint32_t)(buffer[i + 3]) & ~UINT32_C(3));

 			uint32_t dest;
 			if (is_encoder)
--- a/src/liblzma/simple/simple_coder.c
+++ b/src/liblzma/simple/simple_coder.c
@ -118,7 +118,15 @@ simple_code(void *coder_ptr, const lzma_allocator *allocator,
 		// coder->pos and coder->size yet. This way the coder can be
 		// restarted if the next filter in the chain returns e.g.
 		// LZMA_MEM_ERROR.
-		memcpy(out + *out_pos, coder->buffer + coder->pos, buf_avail);
+		//
+		// Do the memcpy() conditionally because out can be NULL
+		// (in which case buf_avail is always 0). Calling memcpy()
+		// with a null-pointer is undefined even if the third
+		// argument is 0.
+		if (buf_avail > 0)
+			memcpy(out + *out_pos, coder->buffer + coder->pos,
+					buf_avail);
+
 		*out_pos += buf_avail;

 		// Copy/Encode/Decode more data to out[].
--- a/src/liblzma/simple/simple_decoder.c
+++ b/src/liblzma/simple/simple_decoder.c
@ -28,7 +28,7 @@ lzma_simple_props_decode(void **options, const lzma_allocator *allocator,
 	if (opt == NULL)
 		return LZMA_MEM_ERROR;

-	opt->start_offset = unaligned_read32le(props);
+	opt->start_offset = read32le(props);

 	// Don't leave an options structure allocated if start_offset is zero.
 	if (opt->start_offset == 0)
--- a/src/liblzma/simple/simple_encoder.c
+++ b/src/liblzma/simple/simple_encoder.c
@ -32,7 +32,7 @@ lzma_simple_props_encode(const void *options, uint8_t *out)
 	if (opt == NULL || opt->start_offset == 0)
 		return LZMA_OK;

-	unaligned_write32le(out, opt->start_offset);
+	write32le(out, opt->start_offset);

 	return LZMA_OK;
 }
--- a/src/liblzma/simple/x86.c
+++ b/src/liblzma/simple/x86.c
@ -97,7 +97,7 @@ x86_code(void *simple_ptr, uint32_t now_pos, bool is_encoder,
 				if (!Test86MSByte(b))
 					break;

-				src = dest ^ ((1 << (32 - i * 8)) - 1);
+				src = dest ^ ((1U << (32 - i * 8)) - 1);
 			}

 			buffer[buffer_pos + 4]
--- a/src/xz/args.c
+++ b/src/xz/args.c
@ -88,7 +88,7 @@ parse_block_list(char *str)
 			// There is no string, that is, a comma follows
 			// another comma. Use the previous value.
 			//
-			// NOTE: We checked earler that the first char
+			// NOTE: We checked earlier that the first char
 			// of the whole list cannot be a comma.
 			assert(i > 0);
 			opt_block_list[i] = opt_block_list[i - 1];
@ -218,7 +218,7 @@ parse_real(args_info *args, int argc, char **argv)
 		// Compression preset (also for decompression if --format=raw)
 		case '0': case '1': case '2': case '3': case '4':
 		case '5': case '6': case '7': case '8': case '9':
-			coder_set_preset(c - '0');
+			coder_set_preset((uint32_t)(c - '0'));
 			break;

 		// --memlimit-compress
@ -683,7 +683,7 @@ args_parse(args_info *args, int argc, char **argv)
 		// We got at least one filename from the command line, or
 		// --files or --files0 was specified.
 		args->arg_names = argv + optind;
-		args->arg_count = argc - optind;
+		args->arg_count = (unsigned int)(argc - optind);
 	}

 	return;
--- a/src/xz/coder.c
+++ b/src/xz/coder.c
@ -612,6 +612,20 @@ split_block(uint64_t *block_remaining,
 }


+static bool
+coder_write_output(file_pair *pair)
+{
+	if (opt_mode != MODE_TEST) {
+		if (io_write(pair, &out_buf, IO_BUFFER_SIZE - strm.avail_out))
+			return true;
+	}
+
+	strm.next_out = out_buf.u8;
+	strm.avail_out = IO_BUFFER_SIZE;
+	return false;
+}
+
+
 /// Compress or decompress using liblzma.
 static bool
 coder_normal(file_pair *pair)
@ -635,7 +649,7 @@ coder_normal(file_pair *pair)
 	// only a single block is created.
 	uint64_t block_remaining = UINT64_MAX;

-	// next_block_remining for when we are in single-threaded mode and
+	// next_block_remaining for when we are in single-threaded mode and
 	// the Block in --block-list is larger than the --block-size=SIZE.
 	uint64_t next_block_remaining = 0;

@ -697,7 +711,7 @@ coder_normal(file_pair *pair)
 					action = LZMA_FULL_BARRIER;
 			}

-			if (action == LZMA_RUN && flush_needed)
+			if (action == LZMA_RUN && pair->flush_needed)
 				action = LZMA_SYNC_FLUSH;
 		}

@ -706,29 +720,23 @@ coder_normal(file_pair *pair)

 		// Write out if the output buffer became full.
 		if (strm.avail_out == 0) {
-			if (opt_mode != MODE_TEST && io_write(pair, &out_buf,
-					IO_BUFFER_SIZE - strm.avail_out))
+			if (coder_write_output(pair))
 				break;
-
-			strm.next_out = out_buf.u8;
-			strm.avail_out = IO_BUFFER_SIZE;
 		}

 		if (ret == LZMA_STREAM_END && (action == LZMA_SYNC_FLUSH
 				|| action == LZMA_FULL_BARRIER)) {
 			if (action == LZMA_SYNC_FLUSH) {
 				// Flushing completed. Write the pending data
-				// out immediatelly so that the reading side
+				// out immediately so that the reading side
 				// can decompress everything compressed so far.
-				if (io_write(pair, &out_buf, IO_BUFFER_SIZE
-						- strm.avail_out))
+				if (coder_write_output(pair))
 					break;

-				strm.next_out = out_buf.u8;
-				strm.avail_out = IO_BUFFER_SIZE;
-
-				// Set the time of the most recent flushing.
-				mytime_set_flush_time();
+				// Mark that we haven't seen any new input
+				// since the previous flush.
+				pair->src_has_seen_input = false;
+				pair->flush_needed = false;
 			} else {
 				// Start a new Block after LZMA_FULL_BARRIER.
 				if (opt_block_list == NULL) {
@ -758,9 +766,7 @@ coder_normal(file_pair *pair)
 				// as much data as possible, which can be good
 				// when trying to get at least some useful
 				// data out of damaged files.
-				if (opt_mode != MODE_TEST && io_write(pair,
-						&out_buf, IO_BUFFER_SIZE
-							- strm.avail_out))
+				if (coder_write_output(pair))
 					break;
 			}

@ -897,21 +903,23 @@ coder_run(const char *filename)
 			// is used.
 			if (opt_mode == MODE_TEST || !io_open_dest(pair)) {
 				// Remember the current time. It is needed
-				// for progress indicator and for timed
-				// flushing.
+				// for progress indicator.
 				mytime_set_start_time();

 				// Initialize the progress indicator.
+				const bool is_passthru = init_ret
+						== CODER_INIT_PASSTHRU;
 				const uint64_t in_size
-						= pair->src_st.st_size <= 0
-						? 0 : pair->src_st.st_size;
-				message_progress_start(&strm, in_size);
+					= pair->src_st.st_size <= 0
+					? 0 : (uint64_t)(pair->src_st.st_size);
+				message_progress_start(&strm,
+						is_passthru, in_size);

 				// Do the actual coding or passthru.
-				if (init_ret == CODER_INIT_NORMAL)
-					success = coder_normal(pair);
-				else
+				if (is_passthru)
 					success = coder_passthru(pair);
+				else
+					success = coder_normal(pair);

 				message_progress_end(success);
 			}
--- a/src/xz/file_io.c
+++ b/src/xz/file_io.c
@ -170,8 +170,11 @@ static void
 io_sandbox_enter(int src_fd)
 {
 	if (!sandbox_allowed) {
-		message(V_DEBUG, _("Sandbox is disabled due "
-				"to incompatible command line arguments"));
+		// This message is more often annoying than useful so
+		// it's commented out. It can be useful when developing
+		// the sandboxing code.
+		//message(V_DEBUG, _("Sandbox is disabled due "
+		//		"to incompatible command line arguments"));
 		return;
 	}

@ -213,7 +216,8 @@ io_sandbox_enter(int src_fd)
 #	error ENABLE_SANDBOX is defined but no sandboxing method was found.
 #endif

-	message(V_DEBUG, _("Sandbox was successfully enabled"));
+	// This message is annoying in xz -lvv.
+	//message(V_DEBUG, _("Sandbox was successfully enabled"));
 	return;

 error:
@ -266,11 +270,8 @@ io_wait(file_pair *pair, int timeout, bool is_reading)
 			return IO_WAIT_ERROR;
 		}

-		if (ret == 0) {
-			assert(opt_flush_timeout != 0);
-			flush_needed = true;
+		if (ret == 0)
 			return IO_WAIT_TIMEOUT;
-		}

 		if (pfd[0].revents != 0)
 			return IO_WAIT_MORE;
@ -360,13 +361,14 @@ io_copy_attrs(const file_pair *pair)
 	// Try changing the owner of the file. If we aren't root or the owner
 	// isn't already us, fchown() probably doesn't succeed. We warn
 	// about failing fchown() only if we are root.
-	if (fchown(pair->dest_fd, pair->src_st.st_uid, -1) && warn_fchown)
+	if (fchown(pair->dest_fd, pair->src_st.st_uid, (gid_t)(-1))
+			&& warn_fchown)
 		message_warning(_("%s: Cannot set the file owner: %s"),
 				pair->dest_name, strerror(errno));

 	mode_t mode;

-	if (fchown(pair->dest_fd, -1, pair->src_st.st_gid)) {
+	if (fchown(pair->dest_fd, (uid_t)(-1), pair->src_st.st_gid)) {
 		message_warning(_("%s: Cannot set the file group: %s"),
 				pair->dest_name, strerror(errno));
 		// We can still safely copy some additional permissions:
@ -751,6 +753,8 @@ io_open_src(const char *src_name)
 		.src_fd = -1,
 		.dest_fd = -1,
 		.src_eof = false,
+		.src_has_seen_input = false,
+		.flush_needed = false,
 		.dest_try_sparse = false,
 		.dest_pending_sparse = 0,
 	};
@ -1109,16 +1113,16 @@ io_fix_src_pos(file_pair *pair, size_t rewind_size)


 extern size_t
-io_read(file_pair *pair, io_buf *buf_union, size_t size)
+io_read(file_pair *pair, io_buf *buf, size_t size)
 {
 	// We use small buffers here.
 	assert(size < SSIZE_MAX);

-	uint8_t *buf = buf_union->u8;
-	size_t left = size;
+	size_t pos = 0;

-	while (left > 0) {
-		const ssize_t amount = read(pair->src_fd, buf, left);
+	while (pos < size) {
+		const ssize_t amount = read(
+				pair->src_fd, buf->u8 + pos, size - pos);

 		if (amount == 0) {
 			pair->src_eof = true;
@ -1135,10 +1139,15 @@ io_read(file_pair *pair, io_buf *buf_union, size_t size)

 #ifndef TUKLIB_DOSLIKE
 			if (IS_EAGAIN_OR_EWOULDBLOCK(errno)) {
-				const io_wait_ret ret = io_wait(pair,
-						mytime_get_flush_timeout(),
-						true);
-				switch (ret) {
+				// Disable the flush-timeout if no input has
+				// been seen since the previous flush and thus
+				// there would be nothing to flush after the
+				// timeout expires (avoids busy waiting).
+				const int timeout = pair->src_has_seen_input
+						? mytime_get_flush_timeout()
+						: -1;
+
+				switch (io_wait(pair, timeout, true)) {
 				case IO_WAIT_MORE:
 					continue;

@ -1146,7 +1155,8 @@ io_read(file_pair *pair, io_buf *buf_union, size_t size)
 					return SIZE_MAX;

 				case IO_WAIT_TIMEOUT:
-					return size - left;
+					pair->flush_needed = true;
+					return pos;

 				default:
 					message_bug();
@ -1160,11 +1170,15 @@ io_read(file_pair *pair, io_buf *buf_union, size_t size)
 			return SIZE_MAX;
 		}

-		buf += (size_t)(amount);
-		left -= (size_t)(amount);
+		pos += (size_t)(amount);
+
+		if (!pair->src_has_seen_input) {
+			pair->src_has_seen_input = true;
+			mytime_set_flush_time();
+		}
 	}

-	return size - left;
+	return pos;
 }


@ -1272,8 +1286,15 @@ io_write(file_pair *pair, const io_buf *buf, size_t size)
 		// if the file ends with sparse block, we must also return
 		// if size == 0 to avoid doing the lseek().
 		if (size == IO_BUFFER_SIZE) {
-			if (is_sparse(buf)) {
-				pair->dest_pending_sparse += size;
+			// Even if the block was sparse, treat it as non-sparse
+			// if the pending sparse amount is large compared to
+			// the size of off_t. In practice this only matters
+			// on 32-bit systems where off_t isn't always 64 bits.
+			const off_t pending_max
+				= (off_t)(1) << (sizeof(off_t) * CHAR_BIT - 2);
+			if (is_sparse(buf) && pair->dest_pending_sparse
+					< pending_max) {
+				pair->dest_pending_sparse += (off_t)(size);
 				return false;
 			}
 		} else if (size == 0) {
--- a/src/xz/file_io.h
+++ b/src/xz/file_io.h
@ -20,7 +20,10 @@


 /// is_sparse() accesses the buffer as uint64_t for maximum speed.
-/// Use an union to make sure that the buffer is properly aligned.
+/// The u32 and u64 members must only be accessed through this union
+/// to avoid strict aliasing violations. Taking a pointer to u8
+/// should be fine as long as uint8_t maps to unsigned char which
+/// can alias anything.
 typedef union {
 	uint8_t u8[IO_BUFFER_SIZE];
 	uint32_t u32[IO_BUFFER_SIZE / sizeof(uint32_t)];
@ -46,6 +49,13 @@ typedef struct {
 	/// True once end of the source file has been detected.
 	bool src_eof;

+	/// For --flush-timeout: True if at least one byte has been read
+	/// since the previous flush or the start of the file.
+	bool src_has_seen_input;
+
+	/// For --flush-timeout: True when flushing is needed.
+	bool flush_needed;
+
 	/// If true, we look for long chunks of zeros and try to create
 	/// a sparse file.
 	bool dest_try_sparse;
--- a/src/xz/main.c
+++ b/src/xz/main.c
@ -159,7 +159,7 @@ main(int argc, char **argv)
 	// Initialize handling of error/warning/other messages.
 	message_init();

-	// Set hardware-dependent default values. These can be overriden
+	// Set hardware-dependent default values. These can be overridden
 	// on the command line, thus this must be done before args_parse().
 	hardware_init();

@ -326,5 +326,5 @@ main(int argc, char **argv)
 	if (es == E_WARNING && no_warn)
 		es = E_SUCCESS;

-	tuklib_exit(es, E_ERROR, message_verbosity_get() != V_SILENT);
+	tuklib_exit((int)es, E_ERROR, message_verbosity_get() != V_SILENT);
 }
--- a/src/xz/message.c
+++ b/src/xz/message.c
@ -56,6 +56,11 @@ static bool progress_active = false;
 /// Pointer to lzma_stream used to do the encoding or decoding.
 static lzma_stream *progress_strm;

+/// This is true if we are in passthru mode (not actually compressing or
+/// decompressing) and thus cannot use lzma_get_progress(progress_strm, ...).
+/// That is, we are using coder_passthru() in coder.c.
+static bool progress_is_from_passthru;
+
 /// Expected size of the input stream is needed to show completion percentage
 /// and estimate remaining time.
 static uint64_t expected_in_size;
@ -241,11 +246,12 @@ message_filename(const char *src_name)


 extern void
-message_progress_start(lzma_stream *strm, uint64_t in_size)
+message_progress_start(lzma_stream *strm, bool is_passthru, uint64_t in_size)
 {
 	// Store the pointer to the lzma_stream used to do the coding.
 	// It is needed to find out the position in the stream.
 	progress_strm = strm;
+	progress_is_from_passthru = is_passthru;

 	// Store the expected size of the file. If we aren't printing any
 	// statistics, then is will be unused. But since it is possible
@ -434,8 +440,8 @@ progress_remaining(uint64_t in_pos, uint64_t elapsed)
 	// Calculate the estimate. Don't give an estimate of zero seconds,
 	// since it is possible that all the input has been already passed
 	// to the library, but there is still quite a bit of output pending.
-	uint32_t remaining = (double)(expected_in_size - in_pos)
-			* ((double)(elapsed) / 1000.0) / (double)(in_pos);
+	uint32_t remaining = (uint32_t)((double)(expected_in_size - in_pos)
+			* ((double)(elapsed) / 1000.0) / (double)(in_pos));
 	if (remaining < 1)
 		remaining = 1;

@ -507,7 +513,15 @@ progress_pos(uint64_t *in_pos,
 		uint64_t *compressed_pos, uint64_t *uncompressed_pos)
 {
 	uint64_t out_pos;
-	lzma_get_progress(progress_strm, in_pos, &out_pos);
+	if (progress_is_from_passthru) {
+		// In passthru mode the progress info is in total_in/out but
+		// the *progress_strm itself isn't initialized and thus we
+		// cannot use lzma_get_progress().
+		*in_pos = progress_strm->total_in;
+		out_pos = progress_strm->total_out;
+	} else {
+		lzma_get_progress(progress_strm, in_pos, &out_pos);
+	}

 	// It cannot have processed more input than it has been given.
 	assert(*in_pos <= progress_strm->total_in);
--- a/src/xz/message.h
+++ b/src/xz/message.h
@ -150,7 +150,8 @@ extern void message_filename(const char *src_name);
 /// \param      strm      Pointer to lzma_stream used for the coding.
 /// \param      in_size   Size of the input file, or zero if unknown.
 ///
-extern void message_progress_start(lzma_stream *strm, uint64_t in_size);
+extern void message_progress_start(lzma_stream *strm,
+		bool is_passthru, uint64_t in_size);


 /// Update the progress info if in verbose mode and enough time has passed
--- a/src/xz/mytime.c
+++ b/src/xz/mytime.c
@ -17,7 +17,6 @@
 #endif

 uint64_t opt_flush_timeout = 0;
-bool flush_needed;

 static uint64_t start_time;
 static uint64_t next_flush;
@ -39,11 +38,11 @@ mytime_now(void)
 	while (clock_gettime(clk_id, &tv))
 		clk_id = CLOCK_REALTIME;

-	return (uint64_t)(tv.tv_sec) * UINT64_C(1000) + tv.tv_nsec / 1000000;
+	return (uint64_t)tv.tv_sec * 1000 + (uint64_t)(tv.tv_nsec / 1000000);
 #else
 	struct timeval tv;
 	gettimeofday(&tv, NULL);
-	return (uint64_t)(tv.tv_sec) * UINT64_C(1000) + tv.tv_usec / 1000;
+	return (uint64_t)tv.tv_sec * 1000 + (uint64_t)(tv.tv_usec / 1000);
 #endif
 }

@ -52,8 +51,6 @@ extern void
 mytime_set_start_time(void)
 {
 	start_time = mytime_now();
-	next_flush = start_time + opt_flush_timeout;
-	flush_needed = false;
 	return;
 }

@ -69,7 +66,6 @@ extern void
 mytime_set_flush_time(void)
 {
 	next_flush = mytime_now() + opt_flush_timeout;
-	flush_needed = false;
 	return;
 }

--- a/src/xz/mytime.h
+++ b/src/xz/mytime.h
@ -21,10 +21,6 @@
 extern uint64_t opt_flush_timeout;


-/// \brief      True when flushing is needed due to expired timeout
-extern bool flush_needed;
-
-
 /// \brief      Store the time when (de)compression was started
 ///
 /// The start time is also stored as the time of the first flush.
@ -43,5 +39,5 @@ extern void mytime_set_flush_time(void);
 ///
 /// This returns -1 if no timed flushing is used.
 ///
-/// The return value is inteded for use with poll().
+/// The return value is intended for use with poll().
 extern int mytime_get_flush_timeout(void);
--- a/src/xz/options.c
+++ b/src/xz/options.c
@ -258,7 +258,7 @@ set_lzma(void *options, unsigned key, uint64_t value, const char *valuestr)
 		if (valuestr[0] < '0' || valuestr[0] > '9')
 			error_lzma_preset(valuestr);

-		uint32_t preset = valuestr[0] - '0';
+		uint32_t preset = (uint32_t)(valuestr[0] - '0');

 		// Currently only "e" is supported as a modifier,
 		// so keep this simple for now.
--- a/src/xz/private.h
+++ b/src/xz/private.h
@ -1,7 +1,7 @@
 ///////////////////////////////////////////////////////////////////////////////
 //
 /// \file       private.h
-/// \brief      Common includes, definions, and prototypes
+/// \brief      Common includes, definitions, and prototypes
 //
 //  Author:     Lasse Collin
 //
--- a/src/xz/signals.c
+++ b/src/xz/signals.c
@ -23,7 +23,7 @@ volatile sig_atomic_t user_abort = false;
 /// been done.
 static volatile sig_atomic_t exit_signal = 0;

-/// Mask of signals for which have have established a signal handler to set
+/// Mask of signals for which we have established a signal handler to set
 /// user_abort to true.
 static sigset_t hooked_signals;

@ -152,7 +152,7 @@ signals_unblock(void)
 extern void
 signals_exit(void)
 {
-	const int sig = exit_signal;
+	const int sig = (int)exit_signal;

 	if (sig != 0) {
 #if defined(TUKLIB_DOSLIKE) || defined(__VMS)
@ -166,7 +166,7 @@ signals_exit(void)
 		sigfillset(&sa.sa_mask);
 		sa.sa_flags = 0;
 		sigaction(sig, &sa, NULL);
-		raise(exit_signal);
+		raise(sig);
 #endif
 	}

--- a/src/xz/util.c
+++ b/src/xz/util.c
@ -79,7 +79,7 @@ str_to_uint64(const char *name, const char *value, uint64_t min, uint64_t max)
 		result *= 10;

 		// Another overflow check
-		const uint32_t add = *value - '0';
+		const uint32_t add = (uint32_t)(*value - '0');
 		if (UINT64_MAX - add < result)
 			goto error;

@ -142,14 +142,24 @@ round_up_to_mib(uint64_t n)
 }


-/// Check if thousand separator is supported. Run-time checking is easiest,
-/// because it seems to be sometimes lacking even on POSIXish system.
+/// Check if thousands separator is supported. Run-time checking is easiest
+/// because it seems to be sometimes lacking even on a POSIXish system.
+/// Note that trying to use thousands separators when snprintf() doesn't
+/// support them results in undefined behavior. This has just happened to
+/// work well enough in practice.
+///
+/// DJGPP 2.05 added support for thousands separators but it's broken
+/// at least under WinXP with Finnish locale that uses a non-breaking space
+/// as the thousands separator. Work around this by disabling thousands
+/// separators for DJGPP builds.
 static void
 check_thousand_sep(uint32_t slot)
 {
 	if (thousand == UNKNOWN) {
 		bufs[slot][0] = '\0';
+#ifndef __DJGPP__
 		snprintf(bufs[slot], sizeof(bufs[slot]), "%'u", 1U);
+#endif
 		thousand = bufs[slot][0] == '1' ? WORKS : BROKEN;
 	}

@ -243,7 +253,7 @@ my_snprintf(char **pos, size_t *left, const char *fmt, ...)
 		*left = 0;
 	} else {
 		*pos += len;
-		*left -= len;
+		*left -= (size_t)(len);
 	}

 	return;
--- a/src/xz/xz.1
+++ b/src/xz/xz.1
@ -5,7 +5,7 @@
 .\" This file has been put into the public domain.
 .\" You can do whatever you want with this file.
 .\"
-.TH XZ 1 "2017-04-19" "Tukaani" "XZ Utils"
+.TH XZ 1 "2020-02-01" "Tukaani" "XZ Utils"
 .
 .SH NAME
 xz, unxz, xzcat, lzma, unlzma, lzcat \- Compress or decompress .xz and .lzma files
@ -1071,7 +1071,7 @@ if using more threads would exceed the memory usage limit.
 Currently the only threading method is to split the input into
 blocks and compress them independently from each other.
 The default block size depends on the compression level and
-can be overriden with the
+can be overridden with the
 .BI \-\-block\-size= size
 option.
 .IP ""
@ -1570,7 +1570,7 @@ The old BCJ filters will still be useful in embedded systems,
 because the decoder of the new filter will be bigger
 and use more memory.
 .IP ""
-Different instruction sets have have different alignment:
+Different instruction sets have different alignment:
 .RS
 .RS
 .PP
--- a/src/xzdec/xzdec.c
+++ b/src/xzdec/xzdec.c
@ -37,7 +37,7 @@

 /// Error messages are suppressed if this is zero, which is the case when
 /// --quiet has been given at least twice.
-static unsigned int display_errors = 2;
+static int display_errors = 2;


 static void lzma_attribute((__format__(__printf__, 1, 2)))