freebsd-dev/sys/amd64/include/md_var.h
Bruce Evans 1a5735873e On amd64, declare sse2_pagezero() and start using it again, but only
for zeroing pages in idle where nontemporal writes are clearly best.
This is almost a no-op since zeroing in idle works does nothing good
and is off by default.  Fix END() statement forgotten in previous
commit.

Align the loop in sse2_pagezero().  Since it writes to main memory,
the loop doesn't have to be very carefully written to keep up.
Unrolling it was considered useless or harmful and was not done on
i386, but that was too careless.

Timing for i386: the loop was not unrolled at all, and moved only 4
bytes/iteration.  So on a 2GHz CPU, it needed to run at 2 cycles/
iteration to keep up with a memory speed of just 4GB/sec.  But when
it crossed a 16-byte boundary, on old CPUs it ran at 3 cycles/
iteration so it gave a maximum speed of 2.67GB/sec and couldn't even
keep up with PC3200 memory.  Fix the alignment so that it keep up with
4GB/sec memory, and unroll once to get nearer to 8GB/sec.  Further
unrolling might be useless or harmful since it would prevent the loop
fitting in 16-bytes.  My test system with an old CPU and old DDR1 only
needed 5+ GB/sec.  My test system with a new CPU and DDR3 doesn't need
any changes to keep up ~16GB/sec.

Timing for amd64: with 8-byte accesses and newer faster CPUs it is
easy to reach 16GB/sec but not so easy to go much faster.  The
alignment doesn't matter much if the CPU is not very old.  The loop
was already unrolled 4 times, but needs 32 bytes and uses a fancy
method that doesn't work for 2-way unrolling in 16 bytes.  Just
align it to 32-bytes.
2016-08-29 13:07:21 +00:00

65 lines
2.8 KiB
C

/*-
* Copyright (c) 1995 Bruce D. Evans.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* 3. Neither the name of the author nor the names of contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
* $FreeBSD$
*/
#ifndef _MACHINE_MD_VAR_H_
#define _MACHINE_MD_VAR_H_
#include <x86/x86_var.h>
extern uint64_t *vm_page_dump;
struct savefpu;
void amd64_db_resume_dbreg(void);
void amd64_syscall(struct thread *td, int traced);
void doreti_iret(void) __asm(__STRING(doreti_iret));
void doreti_iret_fault(void) __asm(__STRING(doreti_iret_fault));
void ld_ds(void) __asm(__STRING(ld_ds));
void ld_es(void) __asm(__STRING(ld_es));
void ld_fs(void) __asm(__STRING(ld_fs));
void ld_gs(void) __asm(__STRING(ld_gs));
void ld_fsbase(void) __asm(__STRING(ld_fsbase));
void ld_gsbase(void) __asm(__STRING(ld_gsbase));
void ds_load_fault(void) __asm(__STRING(ds_load_fault));
void es_load_fault(void) __asm(__STRING(es_load_fault));
void fs_load_fault(void) __asm(__STRING(fs_load_fault));
void gs_load_fault(void) __asm(__STRING(gs_load_fault));
void fsbase_load_fault(void) __asm(__STRING(fsbase_load_fault));
void gsbase_load_fault(void) __asm(__STRING(gsbase_load_fault));
void fpstate_drop(struct thread *td);
void pagezero(void *addr);
void setidt(int idx, alias_for_inthand_t *func, int typ, int dpl, int ist);
void sse2_pagezero(void *addr);
struct savefpu *get_pcb_user_save_td(struct thread *td);
struct savefpu *get_pcb_user_save_pcb(struct pcb *pcb);
#endif /* !_MACHINE_MD_VAR_H_ */