Weather - Time Series Data
Weather is classified as time-series data because every weather event is tied to a specific date and time.
The Risk of using Redis for Weather Time Series Data
Weather data is constantly inserted, updated, and deleted. In addition, editors edit weather alerts, which creates new alert records specific to their channels. As a result, the data held in the Redis in-memory database keeps growing.
As the data grows, queries over the weather time series will most likely become congested and stall because of extensive scanning and frequent swapping of large amounts of memory. The solution therefore does not scale: the more data is added to Redis, the slower retrieval becomes, because each scan query has to walk through an ever larger set of key/value pairs.
Industry solutions for Weather Time Series Data
There are many solutions for time-series data: PostgreSQL (through the TimescaleDB extension) uses adaptive space-time chunking, MySQL can take a similar approach with table partitioning, Cassandra uses partition keys and clustering keys, and there are others.
Understanding Time Series Retrieval:
To minimize seeks through time-series data, a system needs two indexes:
- An index that holds the key.
- An index that eliminates the majority of the data, where no data for the selected time is present.
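The two indexes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `TimeSeriesIndex`, its methods, and the sample key are hypothetical, and timestamps are fixed-format strings so that plain string comparison matches chronological order.

```python
from bisect import bisect_left, bisect_right


class TimeSeriesIndex:
    """Toy two-level index: a key lookup, then a time-sorted list per key."""

    def __init__(self):
        # First index: key -> (sorted timestamps, values in the same order).
        self._by_key = {}

    def insert(self, key, timestamp, value):
        times, values = self._by_key.setdefault(key, ([], []))
        # Second index: keep timestamps sorted so a range is found by binary search.
        pos = bisect_right(times, timestamp)
        times.insert(pos, timestamp)
        values.insert(pos, value)

    def range(self, key, start, end):
        """Return values with start < timestamp < end for a single key."""
        if key not in self._by_key:
            return []
        times, values = self._by_key[key]
        lo = bisect_right(times, start)  # first position with time > start
        hi = bisect_left(times, end)     # first position with time >= end
        return values[lo:hi]


idx = TimeSeriesIndex()
idx.insert("NH:ZONE:014", "2018-07-11 08:00:00", "inside window")
idx.insert("NH:ZONE:014", "2018-07-10 08:00:00", "too old")
result = idx.range("NH:ZONE:014", "2018-07-11 07:03:00", "2018-07-12 07:03:00")
```

The key lookup discards every other zone in one step, and the binary search discards every row outside the requested time window without visiting it.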
Redis shortcoming when dealing with Time Series Data:
Redis looks up each key/value pair through a single key, which creates a problem for time-series data. Time-series data needs a second index to eliminate the majority of records that hold no data for the requested time. Without that index, Redis must scan all the records for each request, generating many seeks through memory.
Time Complexity = O(n)
Example: scanning for a zone in Minneapolis requires scanning every key/value pair in Redis, which takes O(n) comparisons. This is very slow and does not scale as the data grows.
$ redis-cli -h localhost --scan --pattern '*MNZ*'
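To make the O(n) cost of the pattern scan concrete, here is a rough Python model of it. It does not talk to Redis: the dictionary stands in for the keyspace, the key names are made up, and `fnmatchcase` is only an approximation of Redis glob matching.

```python
from fnmatch import fnmatchcase


def scan_keys(store, pattern):
    """Return the matching keys and the number of keys that were examined."""
    examined = 0
    matches = []
    for key in store:  # every key is visited, whether it matches or not
        examined += 1
        if fnmatchcase(key, pattern):
            matches.append(key)
    return matches, examined


# Hypothetical keyspace: 1000 unrelated keys plus one Minneapolis zone key.
store = {f"alert:{i}:XXZ{i:03d}": "{}" for i in range(1000)}
store["alert:MNZ060"] = "{}"

matches, examined = scan_keys(store, "*MNZ*")
```

Finding the single matching key still requires examining all 1001 keys, and that count grows in step with the data.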
Time Series Data Modeling using Cassandra Distributed Database:
In the industry, there are many ways to implement time-series data.
Below, I illustrate a solution using the distributed NoSQL database Apache Cassandra.
Time Series Pattern - Reverse Order Timeseries with Expiring Columns (7 days):
The following is an example of creating a table in a distributed database with a composite index that includes:
- Partition Index: event_alert - ( identifies on which server and partition the data will reside )
Time complexity O(1)
- Clustering Index: event_time - ( the storage engine sorts the data within each partition on the clustering columns, in ascending or descending order as defined for the table )
Time complexity O(n log n) to sort n rows, as in Merge Sort
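-- The column list is missing from the original listing; the types below are assumptions.
CREATE TABLE weather_header (
    event_alert text,
    event_time timestamp,
    temperature float,
    json text,
    PRIMARY KEY (event_alert, event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
-- The values are also missing, so bind markers are used; 604800 seconds = 7 days.
INSERT INTO weather_header (event_alert, event_time, temperature)
VALUES (?, ?, ?)
USING TTL 604800;
SELECT event_alert, event_time, json
FROM weather_header
WHERE event_alert = 'NH:ZONE:014'
AND event_time > '2018-07-11 07:03:00'
AND event_time < '2018-07-12 07:03:00';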
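The behaviour of this pattern (rows kept newest first inside a partition, each expiring seven days after it was written) can be imitated in plain Python. This is a toy model, not Cassandra code: the class `WeatherPartition`, its methods, and the use of integer epoch seconds for times are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

SEVEN_DAYS = 7 * 24 * 60 * 60  # 604800 seconds, the TTL from the INSERT


class WeatherPartition:
    """Toy stand-in for one partition: rows newest first, each with an expiry."""

    def __init__(self):
        self._keys = []  # negated event times, ascending (so event times descend)
        self._rows = []  # (event_time, expires_at, payload), same order as _keys

    def insert(self, event_time, payload, now, ttl=SEVEN_DAYS):
        pos = bisect_right(self._keys, -event_time)
        self._keys.insert(pos, -event_time)
        self._rows.insert(pos, (event_time, now + ttl, payload))

    def select(self, after, before, now):
        """Live rows with after < event_time < before, newest first."""
        lo = bisect_right(self._keys, -before)  # skip rows with event_time >= before
        hi = bisect_left(self._keys, -after)    # stop at rows with event_time <= after
        return [(t, p) for (t, expires, p) in self._rows[lo:hi] if expires > now]


# One partition per event_alert value; the dict plays the role of the partition index.
table = {"NH:ZONE:014": WeatherPartition()}
part = table["NH:ZONE:014"]
part.insert(100, "a", now=100)
part.insert(200, "b", now=200)
part.insert(300, "c", now=300)

fresh = part.select(after=50, before=300, now=400)
later = part.select(after=50, before=300, now=100 + SEVEN_DAYS)
```

In this model `fresh` holds rows "b" and "a" in that order, newest first, while `later` holds only "b", because row "a" has reached its seven-day expiry by then.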