Project – stage 1.
The SPO600 final project is to choose one open source package and optimize it.
For the first stage, I choose an open source package and build the software, benchmark the performance of the current implementation on AArch64 and x86-64 systems, and lastly experiment with build options to determine whether they have any impact on performance.
1. Choose an open source package
I chose Zopfli (https://github.com/google/zopfli). It is a compression algorithm made by Google. Zopfli is written in C for portability. It is a compression-only library. Zopfli is bit-stream compatible with the compression used in gzip, Zip, PNG, HTTP requests, and others.
Compared with other compression tools, Zopfli is slow. Against gzip -9, the fastest tool in the comparison cited below, Zopfli is more than 80 times slower.
*https://www.lifehacker.com.au/2013/03/a-look-at-zopfli-googles-open-source-compression-algorithm/
The Zopfli GitHub page also says: “Zopfli Compression Algorithm is a compression library programmed in C to perform very good, but slow, deflate or zlib compression.”
So, I want to optimize this one.
2. Build the software
2-1. x86_64
1) clone the code to server
I cloned the source code from the Zopfli GitHub repository.
2) add image to the server
For my testing, I chose a 10mb PNG file from https://www.sample-videos.com/download-sample-png-image.php.
3) Benchmark the performance
To benchmark the performance, I used 10mb, 20mb and 30mb files.
Real: elapsed real (wall clock) time used by the process, in seconds.
User: total number of CPU-seconds that the process used directly (in user mode), in seconds.
Sys: total number of CPU-seconds used by the system on behalf of the process (in kernel mode), in seconds.
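The three values above can also be collected from a script. The following is a minimal sketch, assuming a Unix-like system; the helper name `time_command` is hypothetical, and the short Python workload is only a placeholder for the real Zopfli run.

```python
import resource
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd once and return (real, user, sys) in seconds, like the shell's `time`."""
    # CPU time consumed so far by already-finished child processes.
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    real = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user = after.ru_utime - before.ru_utime      # CPU-seconds in user mode
    sys_time = after.ru_stime - before.ru_stime  # CPU-seconds in kernel mode
    return real, user, sys_time


if __name__ == "__main__":
    # Placeholder workload; replace the list with the actual compression command.
    real, user, sys_time = time_command([sys.executable, "-c", "sum(range(10**6))"])
    print(f"real {real:.3f}s  user {user:.3f}s  sys {sys_time:.3f}s")
```

For a single-threaded, CPU-bound program, real should be close to user + sys; a much larger real value usually means the process was waiting for something else.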
| time | 10mb | 20mb | 30mb |
| real | 2m19.740s | 3m38.955s | 5m38.456s |
| user | 1m22.996s | 3m38.287s | 5m37.188s |
| sys | 0m0.163s | 0m0.320s | 0m0.728s |
I chose the 10mb file and ran it 5 times with the -O3 build option.
| 10mb | 1st | 2nd | 3rd | 4th | 5th |
| Real | 1m23.846s | 1m23.742s | 1m23.551s | 1m24.355s | 4m58.047s |
| User | 1m23.571s | 1m23.469s | 1m23.282s | 1m24.060s | 1m33.316s |
| Sys | 0m0.132s | 0m0.133s | 0m0.132s | 0m0.150s | 0m0.154s |
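With repeated runs, a single slow run can distort the average, so it helps to look at the median as well. Here is a small sketch that converts the real times of the five x86_64 runs to seconds and summarizes them. The helper `to_seconds` and the "1.5 times the median" outlier rule are my own assumptions for illustration, and the third value is written with a decimal point.

```python
import re
import statistics

# Matches the shell `time` format, e.g. "1m23.846s".
TIME_RE = re.compile(r"^(\d+)m(\d+(?:\.\d+)?)s$")


def to_seconds(text):
    """Convert a string like '1m23.846s' to seconds as a float."""
    match = TIME_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized time format: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Real (wall clock) times of the five -O3 runs on x86_64.
real_runs = ["1m23.846s", "1m23.742s", "1m23.551s", "1m24.355s", "4m58.047s"]
seconds = [to_seconds(t) for t in real_runs]

median = statistics.median(seconds)
mean = statistics.mean(seconds)
# Assumed rule of thumb: flag runs more than 1.5 times slower than the median.
outliers = [s for s in seconds if s > 1.5 * median]

print(f"median {median:.3f}s, mean {mean:.3f}s, outliers {outliers}")
```

The fifth run stands out in real time while its user time stays close to the others, which suggests the process was waiting rather than computing during that run.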
4) Experiment with build options.
I used the 10mb.png file with various build options.
-O0 – no optimization
-O1 – first level optimization
-O2 – second level optimization
-O3 – highest standard optimization level
-Ofast – -O3 plus optimizations that ignore strict standards compliance (such as -ffast-math)
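To try each level in turn, the compile command can be generated per option. This is only a sketch under assumptions: it calls gcc directly instead of using the project's Makefile, the source path `src/zopfli/*.c` and the output names such as `zopfli-O3` are my guesses, and the functions `build_command` and `all_build_commands` are hypothetical helpers that only build the command strings without running them.

```python
OPT_LEVELS = ["-O0", "-O1", "-O2", "-O3", "-Ofast"]


def build_command(opt_level, sources="src/zopfli/*.c", output="zopfli"):
    """Return a shell command that compiles the sources at one optimization level.

    The source glob and output name are assumptions, not taken from the project.
    """
    if opt_level not in OPT_LEVELS:
        raise ValueError(f"unsupported optimization level: {opt_level}")
    # One binary per level (e.g. zopfli-O3), so builds do not overwrite each other.
    return f"gcc {sources} {opt_level} -lm -o {output}{opt_level}"


def all_build_commands():
    """Return one compile command per optimization level, in order."""
    return [build_command(level) for level in OPT_LEVELS]


if __name__ == "__main__":
    for command in all_build_commands():
        print(command)
```

Keeping one binary per level makes it easy to time them all against the same input file afterwards.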
| time | -O0 | -O1 | -O2 | -O3 | -Ofast |
| real | 3m39.489s | 1m41.897s | 1m33.908s | 2m19.740s | 1m23.712s |
| user | 3m39.023s | 1m41.606s | 1m33.620s | 1m22.996s | |
| sys | 0m0.142s | 0m0.127s | 0m0.138s | 0m0.163s | 0m0.119s |
2-2. AArch64
1) Benchmark the performance
To benchmark the performance, I used 10mb, 20mb and 30mb files.
| time | 10mb | 20mb | 30mb |
| real | 8m8.914s | 18m56.510s | 29m23.732s |
| user | 8m7.776s | 18m53.535s | 29m18.217s |
| sys | 0m0.229s | 0m0.847s | 0m2.001s |
I chose the 10mb file and ran it 5 times with the -O3 build option.
| 10mb | 1st | 2nd | 3rd | 4th | 5th |
| Real | 8m19.454s | 8m42.835s | 8m31.470s | 8m24.040s | 8m8.914s |
| User | 8m18.075s | 8m41.506s | 8m30.148s | 8m22.796s | 8m7.776s |
| Sys | 0m0.339s | 0m0.299s | 0m0.339s | 0m0.319s | 0m0.229s |
2) Experiment with build options.
I used the 10mb.png file with various build options.
| time | -O0 | -O1 | -O2 | -O3 | -Ofast |
| real | 23m39.314s | 10m5.855s | 8m49.292s | 8m8.914s | 8m7.720s |
| user | 23m36.622s | 10m4.457s | 8m48.034s | 8m7.776s | 8m6.445s |
| sys | 0m0.330s | 0m0.329s | 0m0.310s | 0m0.229s | 0m0.349s |
When I change the build option, the running time also changes. No optimization (-O0) is the slowest and -Ofast is the fastest. The performance changes even though the code itself does not change, only the build option. It is really interesting to me.
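To put a number on the difference, the real times from the AArch64 build-option table can be turned into a speedup relative to -O0. This is a small sketch; the dictionary name `aarch64_real` and the helper `seconds` are hypothetical names for illustration.

```python
# Real (wall clock) times from the AArch64 build-option table, as (minutes, seconds).
aarch64_real = {
    "-O0": (23, 39.314),
    "-O1": (10, 5.855),
    "-O2": (8, 49.292),
    "-O3": (8, 8.914),
    "-Ofast": (8, 7.720),
}


def seconds(minutes, secs):
    """Convert a (minutes, seconds) pair to seconds."""
    return minutes * 60 + secs


baseline = seconds(*aarch64_real["-O0"])
# Speedup = time without optimization divided by time with the given option.
speedups = {flag: baseline / seconds(*t) for flag, t in aarch64_real.items()}

for flag, speedup in speedups.items():
    print(f"{flag:>6}: {speedup:.2f}x relative to -O0")
```

On these numbers, most of the gain comes from going from -O0 to -O1, and the gap between -O3 and -Ofast is very small.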
For stage 2, I will profile the software to determine which part of the code is doing most of the work.