Columnar Database

A column-oriented DBMS is a database management system (DBMS) that stores its content bycolumn rather than by row. It serializes all of the value of a column together, then the values of the next column, and so on. So, only the columns in the query need to be retrieved.

A columnar database has an advantage over row oriented dtabases in terms of efficiency of hard-disk access.

Aside from that, you can also get the following benefits when you use a columnar database:

1. Better analytic performance - the row-oriented approach allows better performance in running a large number of simultaneous queries.

2. Rapid joins and aggregations - the data access streaming along column-oriented data allows for incrementally computing the results of aggregate functions, which is critical for data warehouse applications.

3. Suitability for compression - it eliminates storage of multiple indexes, views, and aggregations, and facilitates vast improvements in compression.

4. Rapid data loading - in a columnar arrangement, the system effectively allows one to separate storage by column. This means that each column is built in one pass, and stored separately, allowing the database system to load columns in parallel using multiple threads.

Further, related performance characteristics of join processing built atop a column store is often sufficiently fast that the load-time joining required to create fact tables is unnecessary, shortening the latency from receipt of new data to availability for query processing.

Finally, since columns are store separately, entire table columns can be added and dropped without downing the system, and without the need to returning the system following the change.

Image Source

However, there are also some drawbacks/challenges because no one-size-fits all system.

In terms of load time, converting the data source into columnar format can be unbearably slow where tens or hundreds of gigabytes of data are involved.

The incremental loads can be performance problematic.

For data compression, some columnar systems greatly compress the source data. However uncompressing the data to read it can slow performance.

There are also structural limitations. Columnar databases use different techniques to simulate a relational structure. Some require the same primary key on all tables, meaning the database hierarchy is limited to two levels. The limits imposed by a particular system may not seem to matter, but remember that your needs may change tomorrow. Constraints that seem acceptable now could prevent you from expanding the system in the future.

And lastly for scability. Columnar databases major advantage is to get good performance on large databases. But how about when you are dealing with common size database? Is it reasonable enough to use columnar database?