Saturday, April 18, 2020

Project - Stage 3





This is the last part of the optimization project.
 I choose the "Zopfli" open source for optimization.
 As a result of profiling, the hottest part of Zopfli is the "ZopfliUpdateHash()" function.

 Here is the result of profiling.







Here is the hottest part in the "ZopfliUpdateHash()" function.

Here is the "ZopfliUpdateHash" function code in hash.c .


I plan to optimize "while loof" in this function as profiling result with several options. 

First, GCC optimization options.

When I change GCC option with -o flag,  the result is very interesting. I changed only the option 0 to 3 and fast, the results are quite different. when choose the -o fast, real and sys time is faster. Aslo, when compare option o0 to o3, when the option was increased, the speed also faster.
-O0- no optimization
-O1- first level optimization
-O2 – second level optimization
-O3 – highest optimization
-Ofast – optimize for speed only

time
-O0
-O1
-O2
-O3
-Ofast
real
3m39.489s
1m41.897s1m33.908s2m19,740s1m23.712s
user
3m39.023s
1m41.606s1m33.620s1m22.996s
 1m23.455s
sys
0m0.142s
0m0.127s0m0.138s0m0.163s0m0.119s


Second, Code Rewriting optimizations

* Original code
 while (pos + amount + 1 < end && array[pos] == array [pos + amount +1] 
           && amount < (unsigned short)(-1)){
             amount++;
}

1) Common subexpression elimination.

In this while loof code, subexpression(pos + amount +1) is evaluated twice. It could be evaluated once and used twice, saving an expensive it.

 size_t  newVal = pos +1; 
while( newVal < end && array[pos] == array[bewVa] && amount < )unsigned short)(-1){
           amount ++;
           newVal= newVal + amount  ;
}


2) Short Circuit

In this while loof, there are three conditions. when evaluating a condition with a logical AND, both halves of the condition must be true for the entire condition to be true. therore, it is not necessary to evaluate the second, third of the condition if the first is false.

 int  newVal = pose + 1; 
while( newVal < end){
while (array[pos] == array[bewVa]){
while (amount < )unsigned short)(-1){
           amount ++;
           newVal= newVal+ amount ;
}}}


Third,  SIMD

Vectorization  is the unrolling of a loop combined with the generation of packed SIMD instructions by the compiler. Because the packed instructions operate on more than one data element at a time, the loop can execute more efficiently. It is sometimes referred to as auto-vectorization, to emphasize that the compiler identifies and optimizes suitable loops on its own, without requiring any special action by the programmer.

 * Original code
 void ZopfliUpdateHash(const unsigned char* array, size_t pos, size_t end, ZopfliHash* h) { .......
   ......
   while (pos + amount + 1 < end &&
array[pos] == array[pos + amount + 1] && amount < (unsigned short)(-1)) {
amount++;
}
   .....
   ......
}

 * optimization code
Inform to the compiler that no overlap is possible. This is done in standard C by using the restrict qualifier for the pointers. 
 void ZopfliUpdateHash(const unsigned char* restrict array,  size_t  pos, size_t end, ZopfliHash* restrict h) { .......
   ......

__builtin_assume_aligned(array, 16);
__builtin_assume_aligned(h, 16);
   while (pos + amount + 1 < end &&
array[pos] == array[pos + amount + 1] && amount < (unsigned short)(-1)) {
amount++;
}
   .....
   ......
}

Finally, I will try to optimize "zopfliUpdateHash()" with below code.

 void ZopfliUpdateHash(const unsigned char* restrict array,  size_t  pos, size_t end, ZopfliHash* restrict h) { .......
   ......

__builtin_assume_aligned(array, 16);
__builtin_assume_aligned(h, 16);

size_t newVal= pos + 1;
   while (newVal < end ){
    while ( array[pos] == array[newVal ] {
     while( amount < (unsigned short)(-1)) {
amount++;
                newVal = newVal + amount  ;
}}}
   .....
   ......
}


What I learn...
During this project, I learned two things.
First, when I do coding, I have to think whether my code is optimized. My coding style can affect to the processing speed. so, during programming, I have to consider about the speed. Second, there are variety way for optimization. My optimization plan are some of them. I will try to learn other way. This project is very interesting.




No comments:

Post a Comment