Auto-Vectorization and How to Make it Happen

March 05, 2017

What is Auto-Vectorization?

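Auto-vectorization is when the compiler takes a loop that does the same operation on a set of numbers (e.g. adding the elements in two parallel arrays) and rewrites it so that each instruction works on several elements at once.

Without vectorization, the logic looks like this: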

Loop through the elements, start with element 0
Load the current element of array 1 into a register on the CPU
Load the current element of array 2 into a register on the CPU
Add Array1element + Array2element
Move on to the next element

Vectorization Logic:

Loop through the elements, start with element 0
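Load elements 0 - 7 of array 1 into a vector register on the CPU
Load elements 0 - 7 of array 2 into a vector register on the CPU
Add Array1elements + Array2elements with a single instruction
Move on to the next 8 elements

(8 is just an example. How many elements fit in one register depends on the element size and the register width.)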
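
To make the two versions of the logic concrete, here is a minimal sketch in plain C. The function names `add_scalar` and `add_chunked`, the `int16_t` element type, and the chunk size of 8 are assumptions picked for illustration. The chunked version only mimics, in ordinary C, the shape of what the compiler generates; a real vectorized loop replaces the inner loop with one vector instruction.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* assumed chunk size: eight 16-bit values fill a 128-bit register */

/* Non-vectorized logic: one element per trip through the loop. */
void add_scalar(const int16_t *a, const int16_t *b, int16_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (int16_t)(a[i] + b[i]);
}

/* Vectorized logic, written out by hand: LANES elements per trip. */
void add_chunked(const int16_t *a, const int16_t *b, int16_t *out, size_t n)
{
    size_t i = 0;

    /* Main loop: handle one full group of LANES elements at a time.
     * A vectorizing compiler turns this inner loop into a single add. */
    for (; i + LANES <= n; i += LANES)
        for (size_t lane = 0; lane < LANES; lane++)
            out[i + lane] = (int16_t)(a[i + lane] + b[i + lane]);

    /* Leftover loop: the elements that do not fill a whole group. */
    for (; i < n; i++)
        out[i] = (int16_t)(a[i] + b[i]);
}
```

Both functions give the same results for any length. The leftover loop at the end is the price of working in groups: when the array length is not a multiple of the group size, the last few elements still have to be handled one at a time.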

You can see why vectorization can make your code around 4-10 times faster. (For measurements, see my blog post timing various algorithms compiled with -O3 vs. the -O0 flag: Algorithm Timing.)

How to make it happen

Auto-vectorization is done by the compiler under 3 conditions:

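Compiler flags:
- -O3 is specified. This turns on an aggressive set of optimizations, vectorization included. -O3 by itself should not change your results; the “risk” of getting skewed results comes from flags such as -ffast-math, which some floating-point loops (sums, for example) need before they can be vectorized.
  
  OR
- The individual flags for auto-vectorization are used, like -ftree-vectorize and -fvect-cost-model
Assurance that none of the arrays overlap in memory, so that writing to one array cannot change values that are still to be read from another.
Assurance that all of the arrays are aligned in memory. This means that each array starts at an address that is a multiple of the vector register size (e.g. 16 bytes), so a whole group of elements can be loaded in one go. Although this can waste a little space on padding, it’s worth it so that it’s easier to load each group of elements.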
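
Here is a minimal sketch of how the last two assurances can be written down in C. The function name `add_arrays` and the 16-byte alignment are assumptions for illustration. `restrict` is standard C99, while `__builtin_assume_aligned` is a GCC/Clang extension, so other compilers may need a different spelling.

```c
#include <stddef.h>

/*
 * restrict: promises the compiler that a, b and out never overlap.
 * __builtin_assume_aligned: promises that each pointer is 16-byte aligned
 * (GCC/Clang extension). Breaking either promise is undefined behavior,
 * so the caller really has to pass separate, aligned arrays.
 */
void add_arrays(const int *restrict a, const int *restrict b,
                int *restrict out, size_t n)
{
    const int *pa = __builtin_assume_aligned(a, 16);
    const int *pb = __builtin_assume_aligned(b, 16);
    int *po = __builtin_assume_aligned(out, 16);

    /* Same operation on every element: a good candidate for vectorization. */
    for (size_t i = 0; i < n; i++)
        po[i] = pa[i] + pb[i];
}
```

On the caller's side, the arrays can be declared with C11's `_Alignas(16)` so that the alignment promise holds. With GCC, compiling with -O3 and adding -fopt-info-vec should print a note for each loop that was vectorized, which is a quick way to check whether the hints helped.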

Assembler Code Walkthrough on AARCH64 - Auto-Vectorized

Here’s the C code we’re going to vectorize:

Let’s go through some Auto-Vectorized code and understand what’s going on in the assembly:

Go out there and Auto-vectorize!
