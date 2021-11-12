Every so often a breakthrough comes along that improves the database industry. The latest is vectorization, which, in essence, refers to converting an algorithm from operating on a single value at a time to operating on a set of values at a time. In other words, it is the ability to perform a single mathematical operation on a list (or “vector”) of numbers in one step.
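
To make the scalar-versus-vector distinction concrete, here is a minimal Python sketch. It does not use real SIMD hardware. Instead it models a hypothetical 4-lane vector unit (`LANES = 4` is an assumption, roughly what a 128-bit register holds for 32-bit values) and counts how many "instructions" each approach issues to add two lists element by element.

```python
from typing import List, Tuple

LANES = 4  # hypothetical vector width: values processed per vector instruction


def scalar_add(a: List[float], b: List[float]) -> Tuple[List[float], int]:
    """Add element by element: one value per instruction."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    out: List[float] = []
    instructions = 0
    for x, y in zip(a, b):
        out.append(x + y)  # one scalar add per element
        instructions += 1
    return out, instructions


def vector_add(a: List[float], b: List[float]) -> Tuple[List[float], int]:
    """Add up to LANES values per instruction, modelling a SIMD add."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    out: List[float] = []
    instructions = 0
    for start in range(0, len(a), LANES):
        lanes_a = a[start:start + LANES]
        lanes_b = b[start:start + LANES]
        # One "instruction" adds every lane of the chunk at once.
        out.extend(x + y for x, y in zip(lanes_a, lanes_b))
        instructions += 1
    return out, instructions


a = [float(i) for i in range(10)]
b = [10.0] * 10
print(scalar_add(a, b)[1])  # 10
print(vector_add(a, b)[1])  # 3 (chunks of 4, 4 and 2 values)
```

Both functions produce the same sums; only the number of instructions differs. The last chunk here holds just two values, and real SIMD code typically handles such a leftover "tail" with a short scalar loop or a masked instruction.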

Although vectorization is attracting a lot of attention right now, it has a long history: it made its presence felt in supercomputing in the 1970s and 1980s, when vector machines such as the Cray-1 brought vectorized arithmetic to scientific computing. Vectorization is attractive because the same operation can be applied to many values in parallel, by supercomputers and multiprocessors alike, resulting in big gains in performance and throughput at lower cost.

Indeed, these days all desktop Central Processing Units (CPUs) provide some form of support for vector operations, in which a Single Instruction is applied to Multiple Data (SIMD). The term SIMD dates from 1966, when Michael Flynn introduced it in his classification of computer architectures. Vectorization enables data scientists and database administrators to do more with less: the extra processing power permits simpler data structures, which in turn reduces data storage costs, increases flexibility and cuts the time spent engineering the data.

A new publication, “Vectorization The New Era of Big Data Parallelism”, by Kinetica (“The Database for Time & Space”), the Arlington, Virginia-based specialist in the real-time analysis of massive, fast-moving data sets, provides an interesting and detailed account of where vectorization came from and where it is going. Vector processing is gaining popularity as new software emerges that efficiently exploits modern hardware. The report explains that “a vector points to a memory position in a large array of rows and columns. The columns in the memory could be a simple stream of variables (e.g. sensor readings, GPS coordinates). Or the array can be a relational database table (e.g. customer transaction records).” You can see the potential.

A little further into the report, Kinetica waxes lyrical, comparing vector processing to an orchestra: “Vector processing is like an orchestra. The control unit is the conductor, the instructions are a musical score. The processors are the violins and cellos. Each vector has only one control unit plus dozens of small processors. Each small processor receives the same instruction from the control unit. Each processor operates on a different section of memory. Hence, every processor has its own vector pointer. Here’s the magic: that one instruction is applied to every element in the array. Only then does the processor move on to the next instruction.” That’s why SIMD is often called “data parallel” processing. The report continues: “What vectors do best is stepping through a large block of memory with a serial list of instructions. Like the conductor in the orchestra, the entire score is completed before any other music is considered.”
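
The orchestra analogy can be modelled in a few lines of Python. In this sketch (all names are hypothetical and this is not Kinetica's code), the control unit walks a list of instructions (the score), each processor holds a pointer to the start of its own section of a shared memory array, and every processor applies the current instruction to its whole section before the control unit moves on to the next instruction. The processors run one after another here; in hardware they would work simultaneously.

```python
from typing import Callable, List

Instruction = Callable[[float], float]


def run_simd(memory: List[float], score: List[Instruction], n_processors: int) -> None:
    """Apply each instruction to the entire array before starting the next one."""
    section = -(-len(memory) // n_processors)  # ceiling division: elements per processor
    # Each processor gets its own pointer (start offset) into the shared memory.
    pointers = [p * section for p in range(n_processors)]
    for instruction in score:  # the control unit (conductor) reads the score in order
        for start in pointers:  # every processor receives the same instruction
            # Each processor touches only its own section of memory.
            for i in range(start, min(start + section, len(memory))):
                memory[i] = instruction(memory[i])
        # Only once every element has been processed does the next instruction begin.


memory = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
score = [lambda x: x * 2, lambda x: x + 1]  # "multiply by 2", then "add 1"
run_simd(memory, score, n_processors=4)
print(memory)  # [3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0, 17.0]
```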

Vector processing is particularly well suited to machine learning, data compression and decompression (of images, for example), cryptography, multimedia including audio and video, speech and handwriting analysis, and database workloads such as sorting, calculations and aggregations.
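
As an illustration of the database case, here is a hedged Python sketch of a filtered aggregation in the spirit of `SELECT SUM(amount) FROM transactions WHERE amount > 100`, run over a column stored as a contiguous array. The table and column names, the 4-lane width and the data are invented for illustration; a real engine would execute the compare and the masked add as hardware SIMD instructions rather than Python loops.

```python
from typing import List

LANES = 4  # hypothetical vector width


def filtered_sum(column: List[float], threshold: float) -> float:
    """Sum the values in a column that exceed threshold, one chunk of lanes at a time.

    A vector compare turns each chunk into a 0/1 mask, and a masked add folds
    the selected values into per-lane partial sums.
    """
    acc = [0.0] * LANES  # one partial sum per lane
    for start in range(0, len(column), LANES):
        chunk = column[start:start + LANES]
        mask = [1.0 if v > threshold else 0.0 for v in chunk]  # vector compare
        for lane, (v, m) in enumerate(zip(chunk, mask)):  # masked vector add
            acc[lane] += v * m
    return sum(acc)  # final horizontal reduction across lanes


amount = [50.0, 120.0, 80.0, 300.0, 99.0, 101.0, 10.0]  # made-up column values
print(filtered_sum(amount, 100.0))  # 521.0
```

Keeping one partial sum per lane and combining them only at the end mirrors how SIMD aggregation code typically avoids moving data between lanes inside the main loop.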