SPO600 - Hyouk Sun Kwon: Project

This is the last part of the optimization project.
I choose the "Zopfli" open source for optimization.
As a result of profiling, the hottest part of Zopfli is the "ZopfliUpdateHash()" function.

Here is the result of profiling.

Here is the hottest part in the "ZopfliUpdateHash()" function.

Here is the "ZopfliUpdateHash" function code in hash.c .

I plan to optimize "while loof" in this function as profiling result with several options.

First, GCC optimization options.

When I change GCC option with -o flag, the result is very interesting. I changed only the option 0 to 3 and fast, the results are quite different. when choose the -o fast, real and sys time is faster. Aslo, when compare option o0 to o3, when the option was increased, the speed also faster.

-O0- no optimization

-O1- first level optimization

-O2 – second level optimization

-O3 – highest optimization

-Ofast – optimize for speed only

time	-O0	-O1	-O2	-O3	-Ofast
real	3m39.489s	1m41.897s	1m33.908s	2m19,740s	1m23.712s
user	3m39.023s	1m41.606s	1m33.620s	1m22.996s	1m23.455s
sys	0m0.142s	0m0.127s	0m0.138s	0m0.163s	0m0.119s

Second, Code Rewriting optimizations

* Original code

while (pos + amount + 1 < end && array[pos] == array [pos + amount +1]

&& amount < (unsigned short)(-1)){

amount++;

}

1) Common subexpression elimination.

In this while loof code, subexpression(pos + amount +1) is evaluated twice. It could be evaluated once and used twice, saving an expensive it.

size_t newVal = pos +1;

while( newVal < end && array[pos] == array[bewVa] && amount < )unsigned short)(-1){

amount ++;

newVal= newVal + amount ;

}

2) Short Circuit

In this while loof, there are three conditions. when evaluating a condition with a logical AND, both halves of the condition must be true for the entire condition to be true. therore, it is not necessary to evaluate the second, third of the condition if the first is false.

int newVal = pose + 1;

while( newVal < end){

while (array[pos] == array[bewVa]){

while (amount < )unsigned short)(-1){

amount ++;

newVal= newVal+ amount ;

}}}

Third, SIMD

Vectorization is the unrolling of a loop combined with the generation of packed SIMD instructions by the compiler. Because the packed instructions operate on more than one data element at a time, the loop can execute more efficiently. It is sometimes referred to as auto-vectorization, to emphasize that the compiler identifies and optimizes suitable loops on its own, without requiring any special action by the programmer.

* Original code

void ZopfliUpdateHash(const unsigned char* array, size_t pos, size_t end, ZopfliHash* h) { .......

......

while (pos + amount + 1 < end &&

array[pos] == array[pos + amount + 1] && amount < (unsigned short)(-1)) {

amount++;

}

.....

......

}

* optimization code

Inform to the compiler that no overlap is possible. This is done in standard C by using the restrict qualifier for the pointers.

void ZopfliUpdateHash(const unsigned char* restrict array, size_t pos, size_t end, ZopfliHash* restrict h) { .......

......

__builtin_assume_aligned(array, 16);
__builtin_assume_aligned(h, 16);

while (pos + amount + 1 < end &&

array[pos] == array[pos + amount + 1] && amount < (unsigned short)(-1)) {

amount++;

}

.....

......

}

Finally, I will try to optimize "zopfliUpdateHash()" with below code.

void ZopfliUpdateHash(const unsigned char* restrict array, size_t pos, size_t end, ZopfliHash* restrict h) { .......

......

__builtin_assume_aligned(array, 16);
__builtin_assume_aligned(h, 16);

size_t newVal= pos + 1;

while (newVal < end ){

while ( array[pos] == array[newVal ] {

while( amount < (unsigned short)(-1)) {

amount++;

newVal = newVal + amount ;

}}}

.....

......

}

What I learn...

During this project, I learned two things.

First, when I do coding, I have to think whether my code is optimized. My coding style can affect to the processing speed. so, during programming, I have to consider about the speed. Second, there are variety way for optimization. My optimization plan are some of them. I will try to learn other way. This project is very interesting.

SPO600 - Hyouk Sun Kwon

Saturday, April 18, 2020

Project - Stage 3

First, GCC optimization options.

Second, Code Rewriting optimizations

1) Common subexpression elimination.

2) Short Circuit

Third, SIMD

Finally, I will try to optimize "zopfliUpdateHash()" with below code.

No comments:

Post a Comment