Cassandra Compaction Strategy

SizeTieredCompactionStrategy - Write-heavy workloads on spinning drives

ALTER TABLE visited WITH compaction = {'class' : 'SizeTieredCompactionStrategy', 'min_threshold' : 7 };

SizeTieredCompactionStrategy (STCS): The default compaction strategy. It triggers a minor compaction once the number of similar-sized SSTables on disk reaches the table subproperty min_threshold (4 by default). A minor compaction does not involve all the tables in a keyspace. Also see STCS compaction subproperties.
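
As a rough sketch of how the main STCS subproperties fit together, the statement below sets them explicitly on the visited table from the example above. min_threshold 7 mirrors that example; max_threshold, bucket_low and bucket_high are shown at their documented defaults and are included only to illustrate where "similar sized" is defined. Treat the values as illustrative, not as tuning advice.

```cql
-- min_threshold : SSTables of similar size needed to trigger a minor compaction
-- max_threshold : upper bound on SSTables merged in one minor compaction (default 32)
-- bucket_low / bucket_high : an SSTable counts as "similar sized" when its size
--   falls within [average * bucket_low, average * bucket_high] of its bucket
--   (defaults 0.5 and 1.5)
ALTER TABLE visited WITH compaction = {
  'class'         : 'SizeTieredCompactionStrategy',
  'min_threshold' : 7,
  'max_threshold' : 32,
  'bucket_low'    : '0.5',
  'bucket_high'   : '1.5'
};
```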

LeveledCompactionStrategy - Read-heavy workloads, on SSD drives only

cqlsh:engine> ALTER TABLE name  WITH compaction = { 'class' :  'LeveledCompactionStrategy' };
cqlsh:engine> ALTER TABLE visit WITH compaction = { 'class' :  'LeveledCompactionStrategy' };
cqlsh:engine> ALTER TABLE link  WITH compaction = { 'class' :  'LeveledCompactionStrategy' };

LeveledCompactionStrategy (LCS): The leveled compaction strategy creates SSTables of a fixed, relatively small size (160 MB by default) that are grouped into levels. Within each level, SSTables are guaranteed to be non-overlapping. Each level (L0, L1, L2 and so on) is 10 times as large as the previous one. Disk I/O is more uniform and predictable on higher levels than on lower ones, because SSTables are continuously being compacted into progressively larger levels. At each level, row keys are merged into non-overlapping SSTables. This can improve read performance, because Cassandra can determine which SSTables in each level to check for a given row key. This compaction strategy is modeled after Google's LevelDB implementation. Also see LCS compaction subproperties.
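
The SSTable size mentioned above is controlled by the sstable_size_in_mb subproperty. The sketch below sets it explicitly on the visit table from the example above; 160 is simply the default restated, and the level capacities in the comments are approximate figures derived from the 10x growth rule, not measured values.

```cql
-- With 160 MB SSTables and each level 10 times the previous one:
--   L1 holds roughly 10 SSTables   (~1.6 GB)
--   L2 holds roughly 100 SSTables  (~16 GB)
--   L3 holds roughly 1000 SSTables (~160 GB)
ALTER TABLE visit WITH compaction = {
  'class'              : 'LeveledCompactionStrategy',
  'sstable_size_in_mb' : 160
};
```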

DateTieredCompactionStrategy - Time Series Data

ALTER TABLE health WITH compaction = { 'class' :  'DateTieredCompactionStrategy', 'min_threshold' : 7 };

DateTieredCompactionStrategy (DTCS): Available in Cassandra 2.0.11 and 2.1.1 and later. This strategy is particularly useful for time series data. DateTieredCompactionStrategy stores data written within a certain period of time in the same SSTable. For example, Cassandra can store your last hour of data in one SSTable time window, and the next 4 hours of data in another time window, and so on. Compactions are triggered when the min_threshold (4 by default) for SSTables in those windows is reached. The most common queries for time series workloads retrieve the last hour/day/month of data. Cassandra can limit SSTables returned to those having the relevant data. Also, Cassandra can store data that has been set to expire using TTL in an SSTable with other data scheduled to expire at approximately the same time. Cassandra can then drop the SSTable without doing any compaction. Also see DTCS compaction subproperties and DateTieredCompactionStrategy: Compaction for Time Series Data.
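
A minimal sketch of the windowing and TTL behaviour described above, using the health table from the example. base_time_seconds sets the size of the first time window (3600 seconds matches the "last hour" example), min_threshold 4 is the default, and the 30-day default_time_to_live is an assumed value chosen only for illustration.

```cql
-- First window covers one hour; 4 SSTables in a window trigger a compaction.
ALTER TABLE health WITH compaction = {
  'class'             : 'DateTieredCompactionStrategy',
  'base_time_seconds' : 3600,
  'min_threshold'     : 4
};

-- Expire rows 30 days (30 * 86400 = 2592000 seconds) after they are written,
-- so that SSTables holding only expired data can be dropped whole.
ALTER TABLE health WITH default_time_to_live = 2592000;
```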

Disabling Compaction (when we have very few deletes and tombstones)

ALTER TABLE <table_name> WITH COMPACTION = {'class': 'SizeTieredCompactionStrategy', 'enabled': 'false'};
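
To turn compaction back on later, the same subproperty can be set to true. The sketch below assumes the visited table from the STCS example; substitute the table on which compaction was disabled.

```cql
-- Re-enable automatic compaction (table name is an assumption for illustration).
ALTER TABLE visited WITH compaction = {
  'class'   : 'SizeTieredCompactionStrategy',
  'enabled' : 'true'
};
```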