benchmarks: readd README.md
This commit is contained in:
parent
fc7b5f832b
commit
a1edccd647
3 changed files with 301 additions and 0 deletions
1
.gitignore
vendored
1
.gitignore
vendored
|
@ -6,6 +6,7 @@
|
|||
!*.S
|
||||
!Makefile
|
||||
!README
|
||||
!README.md
|
||||
!LICENSE
|
||||
.*
|
||||
_*
|
||||
|
|
231
benchmarks/coremark/README.md
Normal file
231
benchmarks/coremark/README.md
Normal file
|
@ -0,0 +1,231 @@
|
|||
# Coremark
|
||||
|
||||
'''
|
||||
File: CoreMark
|
||||
|
||||
Topic: Welcome
|
||||
Copyright <20> 2009 EEMBC All rights reserved.
|
||||
CoreMark is a trademark of EEMBC and EEMBC is a registered trademark of the Embedded Microprocessor Benchmark Consortium.
|
||||
|
||||
CoreMark<EFBFBD>s primary goals are simplicity and providing a method for testing only a processor<6F>s core features.
|
||||
|
||||
For more information about EEMBC's comprehensive embedded benchmark suites, please see www.eembc.org.
|
||||
|
||||
Topic: Building and running
|
||||
Download the release files from the www.coremark.org.
|
||||
You can verify the download using the coremark_<version>.md5 file
|
||||
> md5sum -c coremark_<version>.md5
|
||||
|
||||
Unpack the distribution (tar -vzxf coremark_<version>.tgz && tar -vzxf coremark_<version>_docs.tgz)
|
||||
then change to the coremark_<version> folder.
|
||||
|
||||
To build and run the benchmark, type
|
||||
> make
|
||||
Full results are available in the files run1.log and run2.log.
|
||||
CoreMark result can be found in run1.log.
|
||||
|
||||
For self hosted Linux or Cygwin platforms, a simple make should work.
|
||||
|
||||
Cross Compile:
|
||||
For cross compile platforms please adjust <core_portme.mak>, <core_portme.h> (and possibly <core_portme.c>)
|
||||
according to the specific platform used.
|
||||
When porting to a new platform, it is recommended to copy one of the default port folders
|
||||
(e.g. mkdir <platform> && cp linux/* <platform>), adjust the porting files, and run
|
||||
> make PORT_DIR=<platform>
|
||||
|
||||
Systems without make:
|
||||
The following files need to be compiled:
|
||||
- <core_list_join.c>
|
||||
- <core_main.c>
|
||||
- <core_matrix.c>
|
||||
- <core_state.c>
|
||||
- <core_util.c>
|
||||
- <PORT_DIR>/<core_portme.c>
|
||||
|
||||
For example
|
||||
> gcc -O2 -o coremark.exe core_list_join.c core_main.c core_matrix.c core_state.c core_util.c simple/core_portme.c -DPERFORMANCE_RUN=1 -DITERATIONS=1000
|
||||
> ./coremark.exe > run1.log
|
||||
The above will compile the benchmark for a performance run and 1000 iterations. Output is redirected to run1.log.
|
||||
|
||||
Make targets:
|
||||
run - Default target, creates run1.log and run2.log.
|
||||
run1.log - Run the benchmark with performance parameters, and output to run1.log
|
||||
run2.log - Run the benchmark with validation parameters, and output to run2.log
|
||||
run3.log - Run the benchmark with profile generation parameters, and output to run3.log
|
||||
compile - compile the benchmark executable
|
||||
link - link the benchmark executable
|
||||
check - test MD5 of sources that may not be modified
|
||||
clean - clean temporary files
|
||||
|
||||
ITERATIONS:
|
||||
By default, the benchmark will run between 10-100 seconds.
|
||||
To override, use ITERATIONS=N
|
||||
> make ITERATIONS=10
|
||||
Will run the benchmark for 10 iterations.
|
||||
It is recommended to set a specific number of iterations in certain situations e.g.:
|
||||
- Running with a simulator
|
||||
- Measuring power/energy
|
||||
- Timing cannot be restarted
|
||||
|
||||
Minimum required run time:
|
||||
Results are only valid for reporting if the benchmark ran for at least 10 secs!
|
||||
|
||||
XCFLAGS:
|
||||
To add compiler flags from the command line, use XCFLAGS e.g.
|
||||
> make XCFLAGS="-g -DMULTITHREAD=4 -DUSE_FORK=1"
|
||||
|
||||
o CORE_DEBUG
|
||||
|
||||
Define to compile for a debug run if you get incorrect CRC.
|
||||
> make XCFLAGS="-DCORE_DEBUG=1"
|
||||
|
||||
o Parallel Execution
|
||||
|
||||
Use XCFLAGS=-DMULTITHREAD=N where N is number of threads to run in parallel.
|
||||
Several implementations are available to execute in multiple contexts,
|
||||
or you can implement your own in <core_portme.c>.
|
||||
> make XCFLAGS="-DMULTITHREAD=4 -DUSE_PTHREAD"
|
||||
Above will compile the benchmark for execution on 4 cores, using POSIX Threads API.
|
||||
|
||||
REBUILD:
|
||||
To force rebuild, add the flag REBUILD to the command line
|
||||
> make REBUILD=1
|
||||
|
||||
Check core_portme.mak for more important options.
|
||||
|
||||
Run parameters for the benchmark executable:
|
||||
Coremark executable takes several parameters as follows (if main accepts arguments).
|
||||
1st - A seed value used for initialization of data.
|
||||
2nd - A seed value used for initialization of data.
|
||||
3rd - A seed value used for initialization of data.
|
||||
4th - Number of iterations (0 for auto : default value)
|
||||
5th - Reserved for internal use.
|
||||
6th - Reserved for internal use.
|
||||
7th - For malloc users only, ovreride the size of the input data buffer.
|
||||
|
||||
The run target from make will run coremark with 2 different data initialization seeds.
|
||||
|
||||
Alternative parameters:
|
||||
If not using malloc or command line arguments are not supported, the buffer size
|
||||
for the algorithms must be defined via the compiler define TOTAL_DATA_SIZE.
|
||||
TOTAL_DATA_SIZE must be set to 2000 bytes (default) for standard runs.
|
||||
The default for such a target when testing different configurations could be ...
|
||||
> make XCFLAGS="-DTOTAL_DATA_SIZE=6000 -DMAIN_HAS_NOARGC=1"
|
||||
|
||||
Topic: Documentation
|
||||
When you unpack the documentation (tar -vzxf coremark_<version>_docs.tgz) a docs folder will be created.
|
||||
Check the file docs/html/index.html and the website http://www.coremark.org for more info.
|
||||
|
||||
Topic: Submitting results
|
||||
CoreMark results can be submitted on the web.
|
||||
|
||||
Open a web browser and go to http://www.coremark.org/benchmark/index.php?pg=benchmark
|
||||
Select the link to add a new score and follow the instructions.
|
||||
|
||||
Topic: Run rules
|
||||
What is and is not allowed.
|
||||
|
||||
Required:
|
||||
1 - The benchmark needs to run for at least 10 seconds.
|
||||
2 - All validation must succeed for seeds 0,0,0x66 and 0x3415,0x3415,0x66,
|
||||
buffer size of 2000 bytes total.
|
||||
o If not using command line arguments to main:
|
||||
> make XCFLAGS="-DPERFORMANCE_RUN=1" REBUILD=1 run1.log
|
||||
> make XCFLAGS="-DVALIDATION_RUN=1" REBUILD=1 run2.log
|
||||
3 - If using profile guided optimization, profile must be generated using seeds of 8,8,8,
|
||||
and buffer size of 1200 bytes total.
|
||||
> make XCFLAGS="-DTOTAL_DATA_SIZE=1200 -DPROFILE_RUN=1" REBUILD=1 run3.log
|
||||
4 - All source files must be compiled with the same flags.
|
||||
5 - All data type sizes must match size in bits such that:
|
||||
o ee_u8 is an 8 bits datatype.
|
||||
o ee_s16 is an 16 bits datatype.
|
||||
o ee_u16 is an 16 bits datatype.
|
||||
o ee_s32 is an 32 bits datatype.
|
||||
o ee_u32 is an 32 bits datatype.
|
||||
|
||||
Allowed:
|
||||
- Changing number of iterations
|
||||
- Changing toolchain and build/load/run options
|
||||
- Changing method of acquiring a data memory block
|
||||
- Changing the method of acquiring seed values
|
||||
- Changing implementation in core_portme.c
|
||||
- Changing configuration values in core_portme.h
|
||||
- Changing core_portme.mak
|
||||
|
||||
Not allowed:
|
||||
- Changing of source file other then core_portme* (use make check to validate)
|
||||
|
||||
Topic: Reporting rules
|
||||
How to report results on a data sheet?
|
||||
|
||||
CoreMark 1.0 : N / C [/ P] [/ M]
|
||||
|
||||
N - Number of iterations per second with seeds 0,0,0x66,size=2000)
|
||||
C - Compiler version and flags
|
||||
P - Parameters such as data and code allocation specifics
|
||||
- This parameter *may* be omitted if all data was allocated on the heap in RAM.
|
||||
- This parameter *may not* be omitted when reporting CoreMark/MHz
|
||||
M - Type of parallel execution (if used) and number of contexts
|
||||
This parameter may be omitted if parallel execution was not used.
|
||||
|
||||
e.g.
|
||||
> CoreMark 1.0 : 128 / GCC 4.1.2 -O2 -fprofile-use / Heap in TCRAM / FORK:2
|
||||
or
|
||||
> CoreMark 1.0 : 1400 / GCC 3.4 -O4
|
||||
|
||||
If reporting scaling results, the results must be reported as follows:
|
||||
|
||||
CoreMark/MHz 1.0 : N / C / P [/ M]
|
||||
|
||||
P - When reporting scaling results, memory parameter must also indicate memory frequency:core frequency ratio.
|
||||
- If the core has cache and cache frequency to core frequency ratio is configurable, that must also be included.
|
||||
|
||||
e.g.
|
||||
> CoreMark/MHz 1.0 : 1.47 / GCC 4.1.2 -O2 / DDR3(Heap) 30:1 Memory 1:1 Cache
|
||||
|
||||
|
||||
Topic: Log File Format
|
||||
The log files have the following format
|
||||
(start example)
|
||||
2K performance run parameters for coremark. (Run type)
|
||||
CoreMark Size : 666 (Buffer size)
|
||||
Total ticks : 25875 (platform dependent value)
|
||||
Total time (secs) : 25.875000 (actual time in seconds)
|
||||
Iterations/Sec : 3864.734300 (Performance value to report)
|
||||
Iterations : 100000 (number of iterations used)
|
||||
Compiler version : GCC3.4.4 (Compiler and version)
|
||||
Compiler flags : -O2 (Compiler and linker flags)
|
||||
Memory location : Code in flash, data in on chip RAM
|
||||
seedcrc : 0xe9f5 (identifier for the input seeds)
|
||||
[0]crclist : 0xe714 (validation for list part)
|
||||
[0]crcmatrix : 0x1fd7 (validation for matrix part)
|
||||
[0]crcstate : 0x8e3a (validation for state part)
|
||||
[0]crcfinal : 0x33ff (iteration dependent output)
|
||||
Correct operation validated. See readme.txt for run and reporting rules. (*Only when run is successful*)
|
||||
CoreMark 1.0 : 6508.490622 / GCC3.4.4 -O2 / Heap (*Only on a successful performance run*)
|
||||
(end example)
|
||||
|
||||
Topic: Legal
|
||||
See LICENSE.txt or the word document file under docs/LICENSE.doc.
|
||||
For more information on your legal rights to use this benchmark, please see
|
||||
http://www.coremark.org/download/register.php?pg=register
|
||||
|
||||
Topic: Credits
|
||||
Many thanks to all of the individuals who helped with the development or testing of CoreMark including (Sorted by company name)
|
||||
o Alan Anderson, ADI
|
||||
o Adhikary Rajiv, ADI
|
||||
o Elena Stohr, ARM
|
||||
o Ian Rickards, ARM
|
||||
o Andrew Pickard, ARM
|
||||
o Trent Parker, CAVIUM
|
||||
o Shay Gal-On, EEMBC
|
||||
o Markus Levy, EEMBC
|
||||
o Ron Olson, IBM
|
||||
o Eyal Barzilay, MIPS
|
||||
o Jens Eltze, NEC
|
||||
o Hirohiko Ono, NEC
|
||||
o Ulrich Drees, NEC
|
||||
o Frank Roscheda, NEC
|
||||
o Rob Cosaro, NXP
|
||||
o Shumpei Kawasaki, RENESAS
|
||||
'''
|
69
benchmarks/microbench/README.md
Normal file
69
benchmarks/microbench/README.md
Normal file
|
@ -0,0 +1,69 @@
|
|||
# MicroBench
|
||||
|
||||
CPU正确性和性能测试用基准程序。对AbstractMachine的要求:
|
||||
|
||||
1. 需要实现TRM和IOE的API。
|
||||
2. 在IOE的全部实现均留空的情况下仍可运行。如果有正确实现的`AM_TIMER_UPTIME`,可以输出正确的统计时间。若这个功能没有实现(返回`0`),仍可进行正确性测试。
|
||||
3. 使用`putch(ch)`输出。
|
||||
4. 堆区`heap`必须初始化(堆区可为空)。如果`heap.start == heap.end`,即分配了空的堆区,只能运行不使用堆区的测试程序。每个基准程序会预先指定堆区的大小,堆区不足的基准程序将被忽略。
|
||||
|
||||
## 使用方法
|
||||
|
||||
同一组程序分成三组:test,train和ref。
|
||||
test数据规模很小,作为测试用,不计时不评分。
|
||||
train数据规模中等,可用于在仿真环境研究微结构行为,计时不评分。
|
||||
ref数据规模较大,作为衡量CPU性能用,计时并评分。
|
||||
|
||||
默认运行ref数据规模,使用
|
||||
```bash
|
||||
make ARCH=native run mainargs=test
|
||||
```
|
||||
运行test数据规模,使用
|
||||
```bash
|
||||
make ARCH=native run mainargs=train
|
||||
```
|
||||
运行train数据规模。
|
||||
|
||||
## 评分根据
|
||||
|
||||
每个benchmark都记录以`REF_CPU`为基础测得的运行时间微秒数。每个benchmark的评分是相对于`REF_CPU`的运行速度,与基准处理器一样快的得分为`REF_SCORE=100000`。
|
||||
|
||||
所有benchmark的平均得分是整体得分。
|
||||
|
||||
## 已有的基准程序
|
||||
|
||||
| 名称 | 描述 | ref堆区使用 |
|
||||
| ----- | ------------------------------------ | ----- |
|
||||
| qsort | 快速排序随机整数数组 | 640KB |
|
||||
| queen | 位运算实现的n皇后问题 | 0 |
|
||||
| bf | Brainf**k解释器,快速排序输入的字符串 | 32KB |
|
||||
| fib | Fibonacci数列f(n)=f(n-1)+…+f(n-m)的矩阵求解 | 256KB |
|
||||
| sieve | Eratosthenes筛法求素数 | 2MB |
|
||||
| 15pz | A*算法求解4x4数码问题 | 2MB |
|
||||
| dinic | Dinic算法求解二分图最大流 | 1MB |
|
||||
| lzip | Lzip数据压缩 | 4MB |
|
||||
| ssort | Skew算法后缀排序 | 4MB |
|
||||
| md5 | 计算长随机字符串的MD5校验和 | 16MB |
|
||||
|
||||
## 增加一个基准程序`foo`
|
||||
|
||||
在`src/`目录下建立名为`foo`的目录,将源代码文件放入。
|
||||
|
||||
每个基准程序需要实现三个函数:
|
||||
|
||||
* `void bench_foo_prepare();`:进行准备工作,如初始化随机数种子、为数组分配内存等。运行时环境不保证全局变量和堆区的初始值,因此基准程序使用的全局数据必须全部初始化。
|
||||
* `void bench_foo_run();`:实际运行基准程序。只有这个函数会被计时。
|
||||
* `int bench_foo_validate();`:验证基准程序运行结果。正确返回1,错误返回0。
|
||||
|
||||
在`benchmark.h`的`BENCHMARK_LIST`中增加相应的`def`项,格式参考已有的benchmark。
|
||||
|
||||
## 基准程序可以使用的库函数
|
||||
|
||||
虽然klib中提供了一些函数,但不同的klib实现会导致性能测试结果有差异。
|
||||
因此MicroBench中内置一些简单的库函数:
|
||||
|
||||
* `bench_memcpy(void *dst, const void *src, size_t n)`: 内存复制。
|
||||
* `bench_srand(uint seed)`:用seed初始化随机数种子。
|
||||
* `bench_rand()`:返回一个0..32767之间的随机数。
|
||||
* `bench_alloc`/`bench_free`:内存分配/回收。目前回收是空操作。
|
||||
|
Loading…
Reference in a new issue