Materialization Strategies in a Column-Oriented DBMS

Unknown author (2006-11-27)

There has been renewed interest in column-oriented database architectures in recent years. For read-mostly query workloads such as those found in data warehouse and decision support applications, ``column-stores'' have been shown to perform particularly well relative to ``row-stores.'' In order for column-stores to be readily adopted as a replacement for row-stores, however, they must present the same interface to client applications as do row stores, which implies that they must output row-store-style tuples.Thus, the input columns stored on disk must be converted to rows at some point in the query plan, but the optimal point at which to do the conversion is not obvious. This problem can be considered as the opposite of the projection problem in row-store systems: while row-stores need to determine where in query plans to place projection operators to make tuples narrower, column-stores need to determine when to combine single-column projections into wider tuples. This paper describes a variety of strategies for tuple construction and intermediate result representations and provides a systematic evaluation of these strategies.