In a previous post, we explained how Siodb organizes user data in columns. Column storage is an excellent fit for OLAP environments, where long-running queries scan multi-dimensional tables. This post looks at how Siodb still finds all the cells of a given row quickly.
The Cell Address Set
To identify any cell quickly, Siodb gives each cell a unique address. The addresses of all the cells that belong to the same row are grouped into a Cell Address Set. In other words, Siodb maintains one set of cell addresses (a Cell Address Set) per row. The Cell Address Set is stored in the Master Column.
The Master Column
The Master Column is a hidden column that Siodb maintains automatically. Whenever Siodb produces a Cell Address Set, it stores the set in the Master Column. Siodb therefore always knows where the cells of a row are physically stored and can reach them with a minimal number of I/O operations. To identify each row, Siodb also stores a Table Row Id per row in the Master Column.
Table Row Id
A Table Row Id is a unique numerical identifier (an unsigned 64-bit integer). Siodb increments the Table Row Id of a table at each insert transaction, so every newly created row in a table is uniquely identified by its Table Row Id.
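To make the relationship between the Table Row Id, the Cell Address Set, and the Master Column more concrete, here is a minimal Python sketch. It is only an illustration, not Siodb's actual implementation: the class names, the `block_id`/`offset` shape of a cell address, and the choice to start the counter at 1 are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MAX_TRID = 2**64 - 1  # a Table Row Id is an unsigned 64-bit integer


@dataclass(frozen=True)
class CellAddress:
    """Hypothetical physical location of one cell (assumed layout)."""
    block_id: int
    offset: int


@dataclass
class MasterColumnRecord:
    """One Master Column entry: the row's id plus its Cell Address Set."""
    trid: int
    cell_addresses: Dict[str, CellAddress]  # column name -> cell address


@dataclass
class Table:
    name: str
    last_trid: int = 0  # assumption: the first inserted row gets id 1
    master_column: List[MasterColumnRecord] = field(default_factory=list)

    def insert(self, cell_addresses: Dict[str, CellAddress]) -> int:
        """Register a new row and return its Table Row Id."""
        if self.last_trid >= MAX_TRID:
            raise OverflowError("Table Row Id space exhausted")
        self.last_trid += 1  # incremented per table at each insert
        record = MasterColumnRecord(self.last_trid, dict(cell_addresses))
        self.master_column.append(record)
        return record.trid


table = Table("table_1")
first = table.insert({"column_1": CellAddress(0, 0), "column_8": CellAddress(3, 128)})
second = table.insert({"column_1": CellAddress(0, 16), "column_8": CellAddress(3, 160)})
print(first, second)  # 1 2
```

Each table carries its own counter, so ids are unique within a table but not across tables.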
How the Master Column identifies rows and links cells together:
In the diagram above, we can see the link that the Master Column creates between the cells of the same row. This gives a virtual row-oriented view on top of the physical column storage. For instance, when you run this kind of query:
select * from table_1 where column_8 = 'firstname.lastname@example.org' ;
Siodb first seeks the cells of column_8 in table_1 that match the filter value. For each matching cell, Siodb reads the row's Cell Address Set from the Master Column and uses it to obtain the addresses of the remaining cells in that row.
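The lookup path described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Siodb code: columns are plain dictionaries keyed by a cell address, each cell is assumed to carry its Table Row Id so that the Master Column entry can be found, and `select_all_where` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a column file: cell address -> (Table Row Id, value)
Column = Dict[int, Tuple[int, str]]
# Hypothetical Master Column: Table Row Id -> Cell Address Set (column name -> address)
MasterColumn = Dict[int, Dict[str, int]]


def select_all_where(
    columns: Dict[str, Column],
    master_column: MasterColumn,
    filter_column: str,
    wanted: str,
) -> List[Dict[str, str]]:
    """Scan one column, then rebuild each matching row via the Master Column."""
    rows = []
    for trid, value in columns[filter_column].values():
        if value != wanted:
            continue
        # The Cell Address Set tells us where every cell of this row lives.
        address_set = master_column[trid]
        rows.append({name: columns[name][addr][1] for name, addr in address_set.items()})
    return rows


columns = {
    "column_1": {0: (1, "Alice"), 1: (2, "Bob")},
    "column_8": {0: (1, "alice@example.com"), 1: (2, "bob@example.com")},
}
master_column = {
    1: {"column_1": 0, "column_8": 0},
    2: {"column_1": 1, "column_8": 1},
}

result = select_all_where(columns, master_column, "column_8", "bob@example.com")
print(result)  # [{'column_1': 'Bob', 'column_8': 'bob@example.com'}]
```

Only the filtered column is scanned. The other columns are touched solely at the addresses listed in the matching rows' Cell Address Sets.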
Data changes tracking
Siodb also maintains several pieces of metadata for each row in the Master Column: for instance, the creation or modification date and the user who performed the DML transaction on that version of the row.
You can then query this metadata with standard SQL, as you would a regular column in a table, so you always know what happened to your data. Only Siodb writes this metadata and, by design, it cannot be modified.
Data TTL and data privacy
One piece of metadata maintained by Siodb is an expiration timestamp, which can be used to define a TTL (Time to Live) on rows. When the TTL has expired, Siodb physically destroys the data for that row on disk. The destruction is then propagated to backups at the next backup synchronization.
This is a convenient and unique way to comply with data destruction requests that end users make under data privacy regulations.
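As a rough illustration of the TTL idea, the sketch below removes every row whose expiration timestamp has passed. It is a toy model under stated assumptions: `RowMetadata`, `purge_expired`, and the in-memory dictionaries are hypothetical, and real physical destruction on disk and propagation to backups are outside its scope.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


@dataclass
class RowMetadata:
    """Hypothetical per-row metadata kept alongside the Master Column entry."""
    trid: int
    expires_at: Optional[datetime]  # None means the row has no TTL


def purge_expired(
    master_column: Dict[int, RowMetadata],
    cells: Dict[int, Dict[str, str]],
    now: datetime,
) -> List[int]:
    """Delete every row whose TTL has expired; return the purged Table Row Ids."""
    expired = [
        trid
        for trid, meta in master_column.items()
        if meta.expires_at is not None and meta.expires_at <= now
    ]
    for trid in expired:
        del cells[trid]          # stand-in for destroying the cell data
        del master_column[trid]  # the row is no longer addressable
    return expired


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
master_column = {
    1: RowMetadata(trid=1, expires_at=now - timedelta(days=1)),   # already expired
    2: RowMetadata(trid=2, expires_at=now + timedelta(days=30)),  # still live
    3: RowMetadata(trid=3, expires_at=None),                      # no TTL set
}
cells = {
    1: {"column_8": "old@example.com"},
    2: {"column_8": "new@example.com"},
    3: {"column_8": "keep@example.com"},
}

purged = purge_expired(master_column, cells, now)
print(purged)  # [1]
```

Passing `now` explicitly keeps the sketch deterministic. A real system would compare against the server clock.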
🚀 Contribute to the Siodb project 👉👉 https://github.com/siodb/siodb