cf1eeb33be
This speeds up buildworld by 16% on my system and buildkernel by 35%. Rather than calling mkdep(1), which is just a wrapper around 'cc -E', use the modern -MD -MT -MF flags to gather and generate dependencies during compilation. This flag was introduced in GCC "a long time ago", in GCC 3.0, and is also supported by Clang. (It appears that ICC also supports this but I do not have access to test it). This avoids running the preprocessor *twice* for every build, in both 'make depend' and 'make all'. This is especially noticeable when using ccache since it does not cache preprocessor results from mkdep(1) / 'cc -E', but still speeds up compilation with the -MD flags. For 'make depend' a tree-walk is still done to ensure that all DPSRCS are generated when expected, and that beforedepend/afterdepend and _EXTRADEPEND are all still respected. In time this may change but for now I've been conservative. The time for a tree-walk with -j combined with SUBDIR_PARALLEL is not significant. For example, it takes about 9 seconds with -j15 to walk all of src/ for 'make depend' now on my system. A .depend file is still generated with the various rules that apply to the final target, or custom rules. Otherwise there are now per-built-object-file .depend files, such as .depend.filename.o. These are included directly by make rather than populating .depend with a loop and .depend lines, which only added overhead to the now almost-NOP 'make depend' phase. Before this I experimented with having mkdep(1) called in parallel per-file. While this improved the kernel and lib/libc 'make depend' phase, it resulted in slower build times overall. The -M flags are removed from CFLAGS when linking since they have no effect. Enabling this by default, for src or out-of-src, can be done once more testing has been done, such as a ports exp-run, and with more compilers. The system I used for testing was: WITNESS Build options: -j20 WITH_LLDB=yes WITH_DEBUG_FILES=yes WITH_FAST_DEPEND=yes DISK: ZFS 3-way mirror with very slow disks using SSD l2arc/log. The arc was fully populated with src tree files. RAM: 76GiB CPU: Intel(R) Xeon(R) CPU L5520 @2.27GHz 2 package(s) x 4 core(s) x 2 SMT threads = hw.ncpu=16 buildworld: x buildworld-before + buildworld-fastdep +-------------------------------------------------------------------------------+ |+ | |+ | |+ xx x| | |_MA___|| |A | +-------------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 3 3744.13 3794.31 3752.25 3763.5633 26.935139 + 3 3153.34 3155.16 3154.2 3154.2333 0.91045776 Difference at 95.0% confidence -609.33 +/- 43.1943 -16.1902% +/- 1.1477% (Student's t, pooled s = 19.0569) buildkernel: x buildkernel-before + buildkernel-fastdep +-------------------------------------------------------------------------------+ |+ x | |++ xx| | A|| |A| | +-------------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 3 571.57 573.94 571.79 572.43333 1.3094401 + 3 369.12 370.57 369.3 369.66333 0.79033748 Difference at 95.0% confidence -202.77 +/- 2.45131 -35.4225% +/- 0.428227% (Student's t, pooled s = 1.0815) Sponsored by: EMC / Isilon Storage Division MFC after: 3 weeks Relnotes: yes