benchmarks: readd README.md

Zihao Yu 2020-09-28 13:42:17 +08:00
parent fc7b5f832b
commit a1edccd647
3 changed files with 301 additions and 0 deletions

.gitignore

@@ -6,6 +6,7 @@
!*.S
!Makefile
!README
!README.md
!LICENSE
.*
_*

@@ -0,0 +1,231 @@
# CoreMark
'''
File: CoreMark
Topic: Welcome
Copyright © 2009 EEMBC. All rights reserved.
CoreMark is a trademark of EEMBC and EEMBC is a registered trademark of the Embedded Microprocessor Benchmark Consortium.
CoreMark's primary goals are simplicity and providing a method for testing only a processor's core features.
For more information about EEMBC's comprehensive embedded benchmark suites, please see www.eembc.org.
Topic: Building and running
Download the release files from www.coremark.org.
You can verify the download using the coremark_<version>.md5 file
> md5sum -c coremark_<version>.md5
Unpack the distribution (tar -vzxf coremark_<version>.tgz && tar -vzxf coremark_<version>_docs.tgz)
then change to the coremark_<version> folder.
To build and run the benchmark, type
> make
Full results are available in the files run1.log and run2.log.
The CoreMark result can be found in run1.log.
For self hosted Linux or Cygwin platforms, a simple make should work.
Cross Compile:
For cross compile platforms please adjust <core_portme.mak>, <core_portme.h> (and possibly <core_portme.c>)
according to the specific platform used.
When porting to a new platform, it is recommended to copy one of the default port folders
(e.g. mkdir <platform> && cp linux/* <platform>), adjust the porting files, and run
> make PORT_DIR=<platform>
Systems without make:
The following files need to be compiled:
- <core_list_join.c>
- <core_main.c>
- <core_matrix.c>
- <core_state.c>
- <core_util.c>
- <PORT_DIR>/<core_portme.c>
For example
> gcc -O2 -o coremark.exe core_list_join.c core_main.c core_matrix.c core_state.c core_util.c simple/core_portme.c -DPERFORMANCE_RUN=1 -DITERATIONS=1000
> ./coremark.exe > run1.log
The above will compile the benchmark for a performance run and 1000 iterations. Output is redirected to run1.log.
Make targets:
run - Default target, creates run1.log and run2.log.
run1.log - Run the benchmark with performance parameters, and output to run1.log
run2.log - Run the benchmark with validation parameters, and output to run2.log
run3.log - Run the benchmark with profile generation parameters, and output to run3.log
compile - compile the benchmark executable
link - link the benchmark executable
check - test MD5 of sources that may not be modified
clean - clean temporary files
ITERATIONS:
By default, the benchmark runs for 10 to 100 seconds.
To override, use ITERATIONS=N
> make ITERATIONS=10
This runs the benchmark for 10 iterations.
It is recommended to set a specific number of iterations in certain situations e.g.:
- Running with a simulator
- Measuring power/energy
- Timing cannot be restarted
Minimum required run time:
Results are only valid for reporting if the benchmark ran for at least 10 seconds.
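One way a port could choose an iteration count automatically (the "0 for auto" default) while still meeting the 10-second minimum is sketched below. It is a hypothetical calibration in integer microseconds: grow the count by 10x until a trial takes at least 1 s, then scale it so the full run lasts more than 10 s. The function names and the fixed per-iteration cost are assumptions for illustration. CoreMark's own calibration in its sources may differ.

```c
#include <stdint.h>

/* Hypothetical timing model: every iteration costs a fixed number of
   microseconds. A real port would time actual benchmark runs instead. */
static uint64_t trial_usec(uint64_t iterations, uint64_t usec_per_iter) {
    return iterations * usec_per_iter;
}

/* Pick an iteration count so that the timed run lasts more than 10 s.
   Returns 0 when no calibration is possible. */
uint64_t auto_iterations(uint64_t usec_per_iter) {
    if (usec_per_iter == 0)
        return 0; /* no usable timing, cannot calibrate */

    uint64_t iterations = 1;
    uint64_t elapsed = 0;

    /* Grow by 10x until one trial takes at least one second. */
    while (elapsed < 1000000u) {
        iterations *= 10;
        elapsed = trial_usec(iterations, usec_per_iter);
    }

    /* Whole seconds taken by the last trial (at least 1 here). */
    uint64_t secs = elapsed / 1000000u;

    /* Since secs * (1 + 10 / secs) > 10, the scaled run exceeds 10 s. */
    return iterations * (1 + 10 / secs);
}
```

With 300 µs per iteration, the trial first reaches 3 s at 10,000 iterations. The sketch then returns 40,000 iterations, which is about 12 s.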
XCFLAGS:
To add compiler flags from the command line, use XCFLAGS e.g.
> make XCFLAGS="-g -DMULTITHREAD=4 -DUSE_FORK=1"
o CORE_DEBUG
Define to compile for a debug run if you get an incorrect CRC.
> make XCFLAGS="-DCORE_DEBUG=1"
o Parallel Execution
Use XCFLAGS=-DMULTITHREAD=N where N is the number of threads to run in parallel.
Several implementations are available to execute in multiple contexts,
or you can implement your own in <core_portme.c>.
> make XCFLAGS="-DMULTITHREAD=4 -DUSE_PTHREAD"
The above compiles the benchmark for execution on 4 cores, using the POSIX Threads API.
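If you implement your own parallel scheme in <core_portme.c>, the usual POSIX Threads pattern is to start one thread per context and join all of them before stopping the timer. The sketch below is hypothetical. `run_contexts`, `context_work`, and `context_t` are illustrative names, not CoreMark's port API or the implementation selected by -DUSE_PTHREAD, and the work function stands in for one benchmark context.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_CONTEXTS 16

/* Per-context state; a real port would keep each context's results here. */
typedef struct {
    int id;
    long result;
} context_t;

/* Stand-in workload for one context (hypothetical). */
static void *context_work(void *arg) {
    context_t *ctx = arg;
    long sum = 0;
    for (long i = 0; i < 1000; i++)
        sum += i;
    ctx->result = sum + ctx->id; /* each thread writes only its own slot */
    return NULL;
}

/* Run n contexts in parallel and wait for all of them.
   Returns 0 on success, -1 on bad input or thread creation failure. */
int run_contexts(context_t *ctxs, int n) {
    pthread_t threads[MAX_CONTEXTS];
    if (ctxs == NULL || n < 1 || n > MAX_CONTEXTS)
        return -1;

    int started = 0;
    for (; started < n; started++) {
        ctxs[started].id = started;
        ctxs[started].result = 0;
        if (pthread_create(&threads[started], NULL, context_work,
                           &ctxs[started]) != 0)
            break;
    }

    /* Join every thread that did start, even after a creation failure. */
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    return started == n ? 0 : -1;
}
```

Build with `-pthread` on toolchains that require it.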
REBUILD:
To force rebuild, add the flag REBUILD to the command line
> make REBUILD=1
Check core_portme.mak for more important options.
Run parameters for the benchmark executable:
The CoreMark executable takes several parameters as follows (if main accepts arguments):
1st - A seed value used for initialization of data.
2nd - A seed value used for initialization of data.
3rd - A seed value used for initialization of data.
4th - Number of iterations (0 for auto : default value)
5th - Reserved for internal use.
6th - Reserved for internal use.
7th - For malloc users only, override the size of the input data buffer.
The run target from make will run coremark with 2 different data initialization seeds.
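The positional parameters above could be read along the lines of the hypothetical sketch below. `run_params_t` and `parse_run_params` are illustrative names. The defaults are assumptions taken from this README: performance-run seeds 0,0,0x66, 0 iterations (auto), and the standard 2000-byte buffer. CoreMark's actual argument parser may behave differently.

```c
#include <stdlib.h>

/* Hypothetical container for the positional run parameters. */
typedef struct {
    long seed1, seed2, seed3;
    long iterations;   /* 0 means "auto" */
    long data_size;    /* 7th argument; only meaningful with malloc */
} run_params_t;

/* Parse argv[i] if present, else return the default. Base 0 lets strtol
   accept decimal ("1000") and hex ("0x66") values. */
static long arg_or_default(int argc, char **argv, int i, long def) {
    if (i < argc && argv[i] != NULL && argv[i][0] != '\0')
        return strtol(argv[i], NULL, 0);
    return def;
}

void parse_run_params(int argc, char **argv, run_params_t *p) {
    /* argv[0] is the program name; positions 1..7 follow the list above. */
    p->seed1 = arg_or_default(argc, argv, 1, 0);
    p->seed2 = arg_or_default(argc, argv, 2, 0);
    p->seed3 = arg_or_default(argc, argv, 3, 0x66);
    p->iterations = arg_or_default(argc, argv, 4, 0);
    /* Positions 5 and 6 are reserved for internal use and skipped here. */
    p->data_size = arg_or_default(argc, argv, 7, 2000);
}
```

For example, `./coremark.exe 0x3415 0x3415 0x66 1000` would select the validation seeds from the run rules with 1000 iterations.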
Alternative parameters:
If malloc is not used or command line arguments are not supported, the buffer size
for the algorithms must be defined via the compiler define TOTAL_DATA_SIZE.
TOTAL_DATA_SIZE must be set to 2000 bytes (default) for standard runs.
For such a target, a build that tests a different configuration could look like this:
> make XCFLAGS="-DTOTAL_DATA_SIZE=6000 -DMAIN_HAS_NOARGC=1"
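When the buffer size comes from a compile-time define, a port would typically reserve a static buffer of that size. The sketch below shows this pattern with a hypothetical `data_block` array and `get_data_block` accessor. These names are illustrative and are not taken from CoreMark.

```c
#include <stddef.h>

/* Fall back to the standard 2000-byte size when the build does not
   pass -DTOTAL_DATA_SIZE=... */
#ifndef TOTAL_DATA_SIZE
#define TOTAL_DATA_SIZE 2000
#endif

/* Hypothetical statically allocated input buffer for builds without malloc. */
static unsigned char data_block[TOTAL_DATA_SIZE];

/* Return the buffer and report its size to the caller. */
unsigned char *get_data_block(size_t *size) {
    *size = sizeof data_block;
    return data_block;
}
```

Building with `-DTOTAL_DATA_SIZE=6000` enlarges the buffer without any change to the source.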
Topic: Documentation
When you unpack the documentation (tar -vzxf coremark_<version>_docs.tgz) a docs folder will be created.
Check the file docs/html/index.html and the website http://www.coremark.org for more info.
Topic: Submitting results
CoreMark results can be submitted on the web.
Open a web browser and go to http://www.coremark.org/benchmark/index.php?pg=benchmark
Select the link to add a new score and follow the instructions.
Topic: Run rules
What is and is not allowed.
Required:
1 - The benchmark needs to run for at least 10 seconds.
2 - All validation must succeed for seeds 0,0,0x66 and 0x3415,0x3415,0x66,
buffer size of 2000 bytes total.
o If not using command line arguments to main:
> make XCFLAGS="-DPERFORMANCE_RUN=1" REBUILD=1 run1.log
> make XCFLAGS="-DVALIDATION_RUN=1" REBUILD=1 run2.log
3 - If using profile guided optimization, the profile must be generated using seeds of 8,8,8
and a buffer size of 1200 bytes total.
> make XCFLAGS="-DTOTAL_DATA_SIZE=1200 -DPROFILE_RUN=1" REBUILD=1 run3.log
4 - All source files must be compiled with the same flags.
5 - All data types must have the following sizes in bits:
o ee_u8 is an 8-bit datatype.
o ee_s16 is a 16-bit datatype.
o ee_u16 is a 16-bit datatype.
o ee_s32 is a 32-bit datatype.
o ee_u32 is a 32-bit datatype.
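A port can make the build fail if any of these widths is wrong. The sketch below uses C11 `_Static_assert`. The typedefs are an assumed example of what a core_portme.h might contain and are not copied from any particular port. With `<stdint.h>` exact-width types the checks pass trivially. They become useful when a port maps the types onto base types such as `short` or `int`.

```c
#include <limits.h>
#include <stdint.h>

/* Example typedefs a port might place in core_portme.h (assumed). */
typedef uint8_t  ee_u8;
typedef int16_t  ee_s16;
typedef uint16_t ee_u16;
typedef int32_t  ee_s32;
typedef uint32_t ee_u32;

/* Fail the build if any type has the wrong width in bits (C11). */
_Static_assert(sizeof(ee_u8)  * CHAR_BIT == 8,  "ee_u8 must be 8 bits");
_Static_assert(sizeof(ee_s16) * CHAR_BIT == 16, "ee_s16 must be 16 bits");
_Static_assert(sizeof(ee_u16) * CHAR_BIT == 16, "ee_u16 must be 16 bits");
_Static_assert(sizeof(ee_s32) * CHAR_BIT == 32, "ee_s32 must be 32 bits");
_Static_assert(sizeof(ee_u32) * CHAR_BIT == 32, "ee_u32 must be 32 bits");
```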
Allowed:
- Changing number of iterations
- Changing toolchain and build/load/run options
- Changing method of acquiring a data memory block
- Changing the method of acquiring seed values
- Changing implementation in core_portme.c
- Changing configuration values in core_portme.h
- Changing core_portme.mak
Not allowed:
- Changing any source file other than core_portme* (use make check to validate)
Topic: Reporting rules
How to report results on a data sheet?
CoreMark 1.0 : N / C [/ P] [/ M]
N - Number of iterations per second (seeds 0,0,0x66, size=2000)
C - Compiler version and flags
P - Parameters such as data and code allocation specifics
- This parameter *may* be omitted if all data was allocated on the heap in RAM.
- This parameter *may not* be omitted when reporting CoreMark/MHz
M - Type of parallel execution (if used) and number of contexts
This parameter may be omitted if parallel execution was not used.
e.g.
> CoreMark 1.0 : 128 / GCC 4.1.2 -O2 -fprofile-use / Heap in TCRAM / FORK:2
or
> CoreMark 1.0 : 1400 / GCC 3.4 -O4
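The report line format above can be assembled mechanically. The hypothetical helper below, `format_coremark_report`, is not part of CoreMark. It requires C and omits P or M when they are passed as NULL. It prints N with `%g`, which keeps whole numbers such as 1400 unchanged but rounds to 6 significant digits.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper that builds "CoreMark 1.0 : N / C [/ P] [/ M]".
   Pass NULL for params or parallel to omit that optional field.
   Returns the string length, or -1 on error or truncation. */
int format_coremark_report(char *buf, size_t len, double iters_per_sec,
                           const char *compiler, const char *params,
                           const char *parallel) {
    /* C (compiler version and flags) is mandatory. */
    if (buf == NULL || compiler == NULL || strlen(compiler) == 0)
        return -1;

    int n = snprintf(buf, len, "CoreMark 1.0 : %g / %s",
                     iters_per_sec, compiler);
    if (n < 0 || (size_t)n >= len)
        return -1;
    size_t used = (size_t)n;

    /* Append the optional P and M fields in order, if present. */
    const char *optional[2] = { params, parallel };
    for (int i = 0; i < 2; i++) {
        if (optional[i] == NULL)
            continue;
        n = snprintf(buf + used, len - used, " / %s", optional[i]);
        if (n < 0 || (size_t)n >= len - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

Called with 1400, "GCC 3.4 -O4", NULL, NULL, it reproduces the second example line above.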
If reporting scaling results, the results must be reported as follows:
CoreMark/MHz 1.0 : N / C / P [/ M]
P - When reporting scaling results, the memory parameter must also indicate the memory-to-core frequency ratio.
- If the core has cache and cache frequency to core frequency ratio is configurable, that must also be included.
e.g.
> CoreMark/MHz 1.0 : 1.47 / GCC 4.1.2 -O2 / DDR3(Heap) 30:1 Memory 1:1 Cache
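The scaling figure divides the iterations per second by the core clock frequency in MHz. The numbers in the worked example are hypothetical and only show the arithmetic: a core at 1000 MHz achieving 1470 iterations/s would report 1.47.

```latex
\text{CoreMark/MHz} = \frac{\text{Iterations/Sec}}{f_{\text{core}}\ [\text{MHz}]},
\qquad \text{e.g.}\quad \frac{1470}{1000} = 1.47
```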
Topic: Log File Format
The log files have the following format
(start example)
2K performance run parameters for coremark. (Run type)
CoreMark Size : 666 (Buffer size)
Total ticks : 25875 (platform dependent value)
Total time (secs) : 25.875000 (actual time in seconds)
Iterations/Sec : 3864.734300 (Performance value to report)
Iterations : 100000 (number of iterations used)
Compiler version : GCC3.4.4 (Compiler and version)
Compiler flags : -O2 (Compiler and linker flags)
Memory location : Code in flash, data in on chip RAM
seedcrc : 0xe9f5 (identifier for the input seeds)
[0]crclist : 0xe714 (validation for list part)
[0]crcmatrix : 0x1fd7 (validation for matrix part)
[0]crcstate : 0x8e3a (validation for state part)
[0]crcfinal : 0x33ff (iteration dependent output)
Correct operation validated. See readme.txt for run and reporting rules. (*Only when run is successful*)
CoreMark 1.0 : 6508.490622 / GCC3.4.4 -O2 / Heap (*Only on a successful performance run*)
(end example)
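The derived log fields follow from the raw ones. In the example, 25875 ticks correspond to 25.875 s, which suggests 1000 ticks per second on that platform. This is an inference, because the tick rate is platform dependent. Iterations/Sec is then Iterations divided by Total time (100000 / 25.875 ≈ 3864.7343). The hypothetical `summarize_run` helper below performs these computations and also checks the 10-second run rule.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of one run, derived from raw log fields. */
typedef struct {
    double total_secs;
    double iters_per_sec;
    bool   long_enough;   /* run rule: at least 10 seconds */
} run_summary_t;

/* ticks_per_sec is platform dependent and must be supplied by the caller. */
run_summary_t summarize_run(uint64_t total_ticks, uint64_t ticks_per_sec,
                            uint64_t iterations) {
    run_summary_t s = { 0.0, 0.0, false };
    if (ticks_per_sec == 0 || total_ticks == 0)
        return s; /* no usable timing information */

    s.total_secs = (double)total_ticks / (double)ticks_per_sec;
    s.iters_per_sec = (double)iterations / s.total_secs;
    s.long_enough = s.total_secs >= 10.0;
    return s;
}
```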
Topic: Legal
See LICENSE.txt or the word document file under docs/LICENSE.doc.
For more information on your legal rights to use this benchmark, please see
http://www.coremark.org/download/register.php?pg=register
Topic: Credits
Many thanks to all of the individuals who helped with the development or testing of CoreMark, including (sorted by company name):
o Alan Anderson, ADI
o Adhikary Rajiv, ADI
o Elena Stohr, ARM
o Ian Rickards, ARM
o Andrew Pickard, ARM
o Trent Parker, CAVIUM
o Shay Gal-On, EEMBC
o Markus Levy, EEMBC
o Ron Olson, IBM
o Eyal Barzilay, MIPS
o Jens Eltze, NEC
o Hirohiko Ono, NEC
o Ulrich Drees, NEC
o Frank Roscheda, NEC
o Rob Cosaro, NXP
o Shumpei Kawasaki, RENESAS
'''

@@ -0,0 +1,69 @@
# MicroBench
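Benchmark programs for testing CPU correctness and performance. Requirements on AbstractMachine:
1. The TRM and IOE APIs must be implemented.
2. The benchmarks still run when every IOE function is left empty. If `AM_TIMER_UPTIME` is implemented correctly, correct timing statistics are printed. If it is not implemented (returns `0`), correctness testing still works.
3. Output is produced with `putch(ch)`.
4. The heap area `heap` must be initialized (it may be empty). If `heap.start == heap.end`, i.e. an empty heap was allocated, only benchmarks that do not use the heap can run. Each benchmark declares its heap size in advance, and benchmarks whose heap requirement cannot be met are skipped.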
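The skip rule in item 4 amounts to comparing each benchmark's declared heap requirement with the size of the heap area. The following is a minimal sketch under stated assumptions. The local `area_t` type stands in for AbstractMachine's heap descriptor, and `heap_fits` is a hypothetical helper, not MicroBench's actual code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for AbstractMachine's heap descriptor (assumed layout). */
typedef struct {
    void *start, *end;
} area_t;

/* Hypothetical check: can a benchmark needing `need` bytes run on this heap? */
bool heap_fits(area_t heap, size_t need) {
    size_t avail = (size_t)((char *)heap.end - (char *)heap.start);
    return need <= avail;
}
```

With an empty heap (`start == end`), only benchmarks that declare 0 bytes pass this check.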
## Usage
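The same set of programs is run with three input sizes: test, train, and ref.
* test: small input size, used for testing; not timed and not scored.
* train: medium input size, useful for studying microarchitectural behavior in a simulation environment; timed but not scored.
* ref: large input size, used to measure CPU performance; timed and scored.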
By default, the ref input size is run. To run the test input size, use:
```bash
make ARCH=native run mainargs=test
```
To run the train input size, use:
```bash
make ARCH=native run mainargs=train
```
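## Scoring
Each benchmark records its running time in microseconds as measured on `REF_CPU`. A benchmark's score is its speed relative to `REF_CPU`, so a processor exactly as fast as the reference processor scores `REF_SCORE=100000`.
The overall score is the average of all benchmark scores.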
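The scoring rule can be expressed in a few lines of C. The sketch below uses hypothetical function names. It uses integer arithmetic and an arithmetic mean to match the "average" described above. MicroBench's own source may compute and round the score differently.

```c
#include <stddef.h>
#include <stdint.h>

#define REF_SCORE 100000u

/* Score of one benchmark: REF_SCORE scaled by the speed relative to REF_CPU.
   ref_usec is the REF_CPU time and usec the measured time, in microseconds. */
uint64_t bench_score(uint64_t ref_usec, uint64_t usec) {
    if (usec == 0)
        usec = 1; /* guard against a zero timer reading */
    return (uint64_t)REF_SCORE * ref_usec / usec;
}

/* Overall score: arithmetic mean of the per-benchmark scores. */
uint64_t overall_score(const uint64_t *scores, size_t n) {
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += scores[i];
    return n ? sum / n : 0;
}
```

A benchmark that takes twice as long as on `REF_CPU` scores 50000.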
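## Available benchmarks
| Name | Description | ref heap usage |
| ----- | ------------------------------------ | ----- |
| qsort | Quicksort of a random integer array | 640KB |
| queen | N-queens problem solved with bit operations | 0 |
| bf | Brainf**k interpreter running a quicksort on an input string | 32KB |
| fib | Matrix-based computation of the generalized Fibonacci sequence f(n)=f(n-1)+…+f(n-m) | 256KB |
| sieve | Sieve of Eratosthenes for finding primes | 2MB |
| 15pz | A* search solving the 4x4 sliding puzzle | 2MB |
| dinic | Dinic's algorithm for maximum flow on a bipartite graph | 1MB |
| lzip | Lzip data compression | 4MB |
| ssort | Suffix sorting with the Skew algorithm | 4MB |
| md5 | MD5 checksum of a long random string | 16MB |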
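## Adding a benchmark `foo`
Create a directory named `foo` under `src/` and put the source files in it.
Each benchmark must implement three functions:
* `void bench_foo_prepare();`: performs preparation, such as initializing the random seed and allocating memory for arrays. The runtime environment does not guarantee the initial values of global variables or of the heap, so all global data used by the benchmark must be initialized explicitly.
* `void bench_foo_run();`: actually runs the benchmark. Only this function is timed.
* `int bench_foo_validate();`: validates the benchmark result. Returns 1 if it is correct and 0 otherwise.
Add a corresponding `def` entry to `BENCHMARK_LIST` in `benchmark.h`, following the format of the existing benchmarks.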
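A hypothetical benchmark `foo` that sums an array could implement the three functions as follows. So that the sketch compiles on its own, `malloc` stands in for MicroBench's `bench_alloc`, and the workload is purely illustrative. Validation compares the result against a closed form, so it does not simply repeat the timed computation.

```c
#include <stdlib.h>

#define FOO_N 1000

static unsigned *foo_data;
static unsigned long foo_sum;

void bench_foo_prepare(void) {
    /* Globals and the heap are not guaranteed to be initialized,
       so set everything explicitly here. */
    foo_sum = 0;
    foo_data = malloc(FOO_N * sizeof *foo_data); /* stand-in for bench_alloc */
    if (foo_data == NULL)
        return;
    for (unsigned i = 0; i < FOO_N; i++)
        foo_data[i] = i;
}

void bench_foo_run(void) {
    /* Only this function is timed. */
    if (foo_data == NULL)
        return;
    unsigned long sum = 0;
    for (unsigned i = 0; i < FOO_N; i++)
        sum += foo_data[i];
    foo_sum = sum;
}

int bench_foo_validate(void) {
    /* 0 + 1 + ... + (FOO_N - 1) = FOO_N * (FOO_N - 1) / 2 */
    return foo_data != NULL &&
           foo_sum == (unsigned long)FOO_N * (FOO_N - 1) / 2;
}
```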
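## Library functions available to benchmarks
Although klib provides some functions, different klib implementations can lead to different performance results.
MicroBench therefore includes a few simple built-in library functions:
* `bench_memcpy(void *dst, const void *src, size_t n)`: copies memory.
* `bench_srand(uint seed)`: initializes the random number generator with `seed`.
* `bench_rand()`: returns a random number between 0 and 32767.
* `bench_alloc`/`bench_free`: allocate and free memory. Freeing is currently a no-op.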
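To show what these helpers can look like, here is a self-contained sketch under stated assumptions, not MicroBench's actual code. The random generator is assumed to be the linear congruential generator from the C standard's sample `rand`, which yields values in 0..32767. The allocator is a bump allocator over a hypothetical 64 KiB static pool, and its free function does nothing. `uint32_t` replaces the README's `uint`.

```c
#include <stddef.h>
#include <stdint.h>

/* Random numbers: assumed LCG modeled on the C standard's sample rand(). */
static uint32_t rand_state = 1;

void bench_srand(uint32_t seed) { rand_state = seed; }

/* Returns a value in 0..32767. */
int bench_rand(void) {
    rand_state = rand_state * 1103515245u + 12345u;
    return (int)((rand_state / 65536u) % 32768u);
}

/* Bump allocator over a fixed, 8-byte-aligned pool (hypothetical size). */
static uint64_t pool[8 * 1024];      /* 64 KiB of backing storage */
static size_t pool_used = 0;

void *bench_alloc(size_t size) {
    /* Round up to 8 bytes so returned pointers stay aligned. */
    size_t aligned = (size + 7) & ~(size_t)7;
    if (aligned < size || aligned > sizeof pool - pool_used)
        return NULL; /* overflow or pool exhausted */
    void *p = (unsigned char *)pool + pool_used;
    pool_used += aligned;
    return p;
}

void bench_free(void *ptr) {
    (void)ptr; /* no-op: memory is never reclaimed */
}
```

Because freeing is a no-op, each benchmark's total allocations must fit in the heap size it declares.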