Data model
Table mode
The SDS data model consists of tables, records, and attributes.
In SDS, a table is a collection of records, and a record is a collection of attributes (also called properties).
Tables in SDS have a predefined schema. The schema model is strong: users set the table name, the entity group key, the primary key, the names of the other attributes, and their data types.
Table name (required) is unique for each user.
Entity group key (optional) consists of one or more attributes. It is used to support local secondary indexes and hash distribution.
Primary key (required) also consists of one or more attributes. The entity group key plus the primary key identifies one unique record.
Remaining attributes (optional). Each attribute in a record is a name/value pair. An attribute can hold a single value or a set of multiple values.
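The schema elements listed above can be sketched as a small data structure. This is only an illustration: `TableSchema`, `validate`, and `record_identity` are hypothetical names, not part of any SDS SDK, and the example table is made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TableSchema:
    """Hypothetical in-memory description of an SDS table schema."""
    name: str                      # required, unique per user
    primary_key: List[str]         # required, one or more attributes
    attributes: Dict[str, str]     # attribute name -> data type name
    entity_group_key: List[str] = field(default_factory=list)  # optional

    def validate(self) -> None:
        if not self.name:
            raise ValueError("table name is required")
        if not self.primary_key:
            raise ValueError("primary key needs at least one attribute")
        for key_attr in self.entity_group_key + self.primary_key:
            if key_attr not in self.attributes:
                raise ValueError(f"key attribute {key_attr!r} has no declared type")

    def record_identity(self, record: dict) -> tuple:
        """Entity group key + primary key values identify one unique record."""
        return tuple(record[a] for a in self.entity_group_key + self.primary_key)


schema = TableSchema(
    name="notes",
    entity_group_key=["userId"],
    primary_key=["noteId"],
    attributes={"userId": "STRING", "noteId": "INT64", "title": "STRING"},
)
schema.validate()
identity = schema.record_identity({"userId": "u1", "noteId": 7, "title": "hello"})
```

Two records with the same `identity` tuple would refer to the same stored record.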
Properties support the following data types:
Data type | Java SDK | PHP SDK | Python SDK | Go SDK | C++ SDK | node.js SDK | Notes |
---|---|---|---|---|---|---|---|
BOOL | boolean | boolean | bool | bool | bool | boolean | - |
INT8 | byte | integer | int | int8 | int8_t | number | - |
INT16 | short | integer | int | int16 | int16_t | number | - |
INT32 | int | integer | int | int32 | int32_t | number | - |
INT64 | long | integer | long | int64 | int64_t | number | - |
FLOAT | float | not supported | float | float32 | not supported | number | - |
DOUBLE | double | double | float | float64 | double | number | - |
STRING | String | string | str | string | string | string | Cannot contain '\0' |
BINARY | byte[] | string | string | []byte | string | string | - |
RAWBINARY | byte[] | not supported | not supported | []byte | not supported | not supported | Cannot be used as entity group key, primary key or secondary index properties |
Collection type | List | array | list | slice | vector | array | Supports secondary index |
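As a rough client-side illustration of the constraints in the table, the sketch below checks a few Python values against a declared data type. It assumes the integer types are signed two's-complement (as the Java column suggests) and covers only some of the types; `fits_type` is a hypothetical helper, not an SDK function.

```python
INT_BITS = {"INT8": 8, "INT16": 16, "INT32": 32, "INT64": 64}


def fits_type(data_type: str, value) -> bool:
    """Return True if a Python value is acceptable for the given data type."""
    if data_type == "BOOL":
        return isinstance(value, bool)
    if data_type in INT_BITS:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            return False
        limit = 1 << (INT_BITS[data_type] - 1)  # assumed signed range
        return -limit <= value < limit
    if data_type == "STRING":
        # STRING values cannot contain the NUL character.
        return isinstance(value, str) and "\0" not in value
    raise ValueError(f"type not covered by this sketch: {data_type}")


checks = [
    fits_type("INT8", 127),
    fits_type("INT8", 128),
    fits_type("STRING", "abc"),
    fits_type("STRING", "a\0b"),
]
```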
Table data supports two index structures in its physical organization:
Primary key index - Required; consists of one or more table attributes
- All other attributes are stored in the order of entity group key (if any) + primary key. A write must specify the entity group key (if any) and the primary key, and each record written consumes 1 unit of write quota. Each record read consumes 1 unit of read quota
Local secondary index - Optional; to use a local secondary index, the table must define an entity group key that consists of one or more table attributes. There are three index types: Lazy, Eager, and Immutable
- Lazy index - Does not support projection properties or unique indexes
- Eager index - Supports unique indexes. It can also define a set of properties as a projection, which is stored along with the index. A projection can be regarded as a copy of the corresponding attributes in the main record, and its values are kept strongly consistent with the main record
- Immutable index - Suitable for read-only data that is not modified after it is written (the user must guarantee this). Supports projection but not unique indexes
Each index type incurs a different level of extra quota consumption, so choose the type that matches the actual application. A Lazy index suits workloads with many writes and few reads, modifications, and deletions. An Eager index suits workloads with few writes and many reads. An Immutable index suits read-only data that is not modified once written.
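The capabilities of the three index types described above can be written down as a lookup table. The following is a minimal sketch for checking an index definition before creating a table; `INDEX_CAPABILITIES` and `check_index` are hypothetical names used only for illustration.

```python
# Capability matrix taken from the index descriptions above.
INDEX_CAPABILITIES = {
    "Lazy": {"projection": False, "unique": False},
    "Eager": {"projection": True, "unique": True},
    "Immutable": {"projection": True, "unique": False},
}


def check_index(index_type: str, projections=(), unique: bool = False) -> None:
    """Raise ValueError if the index type cannot support the requested options."""
    caps = INDEX_CAPABILITIES[index_type]
    if projections and not caps["projection"]:
        raise ValueError(f"{index_type} index does not support projection")
    if unique and not caps["unique"]:
        raise ValueError(f"{index_type} index does not support unique index")


check_index("Eager", projections=["title"], unique=True)   # accepted
check_index("Immutable", projections=["title"])            # accepted

try:
    check_index("Lazy", projections=["title"])
    lazy_rejected = False
except ValueError:
    lazy_rejected = True
```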
In addition, the table definition may designate one or more table attributes as the entity group key in the physical organization. The entity group key is the prefix of both the record primary key and the secondary index.
The entity group key can optionally use hash distribution. Once this option is enabled, a hash value of 256 is added to achieve load balancing, which can eliminate request hotspots. Enabling it is recommended. Note that when this option is on, data in the table no longer has a guaranteed global order. In general, unless there is a special reason not to, define an entity group key and turn hashing on.
When designing the table schema and estimating request volume, keep the peak read/write quota consumed per unit time by each entity group as low as possible. In principle it should not exceed 1000 per second; otherwise the read/write quota for the entire table cannot be guaranteed. The number of entity groups does not affect performance. Therefore, as long as the business queries can still be served, minimize the read/write quota consumed on each entity group to avoid hotspots, and in particular avoid heavy simultaneous reads and writes concentrated on one or a few entity groups.
When the entity group key uses hash distribution, the primary key must contain at least one attribute. If no attribute in the schema can serve as the primary key, use a default placeholder attribute. For example, if a table has only userId as its entity group key attribute, add a recordId attribute as the primary key with a constant value of 0, indicating that it is the first record under a user.
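To illustrate why hash distribution spreads load but gives up global order, here is a small sketch. It rests on two assumptions that the text does not confirm: that "a hash value of 256" means the key is mapped to one of 256 buckets, and that the hash is CRC32 (chosen here only because it is deterministic and in the Python standard library). `hashed_prefix` is a hypothetical helper.

```python
import zlib

NUM_BUCKETS = 256  # assumption: 256 hash buckets


def hashed_prefix(entity_group_key: bytes) -> bytes:
    """Prepend a one-byte hash bucket to the entity group key (illustrative only)."""
    bucket = zlib.crc32(entity_group_key) % NUM_BUCKETS  # assumed hash function
    return bytes([bucket]) + entity_group_key


user_ids = [b"user0001", b"user0002", b"user0003", b"user0004"]

# Rows are stored sorted by the prefixed key, so neighbouring user IDs
# usually land in different buckets and no longer sit next to each other.
stored_order = sorted(user_ids, key=hashed_prefix)
```

All records of one entity group still share the same prefix, so they stay contiguous; only the order between entity groups is scattered.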
For entity group key, primary key, and secondary index attributes (KeySpec), the asc option selects whether the encoding is ascending or descending. For example, with descending integer encoding the order is ..., 2, 1, 0, -1, ..., the opposite of ascending order. A scan can run forward or in reverse by setting the reverse option, but a forward scan is more efficient than a reverse scan. Therefore, when defining the table structure, choose the encoding order based on the query pattern on the critical path, so that critical-path queries use forward scans as much as possible.
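One common way to obtain ascending and descending key encodings is an order-preserving byte encoding in which descending order inverts every bit. The sketch below shows this for 64-bit integers. It is an assumed technique for illustration, not the documented SDS encoding, and `encode_int64` is a hypothetical name.

```python
import struct


def encode_int64(value: int, asc: bool = True) -> bytes:
    """Encode a signed 64-bit integer so that byte order matches numeric order."""
    # Shift the signed range to unsigned so negatives sort before positives.
    raw = struct.pack(">Q", (value + (1 << 63)) & 0xFFFFFFFFFFFFFFFF)
    if not asc:
        # Inverting every bit reverses the sort order of the encoded bytes.
        raw = bytes(b ^ 0xFF for b in raw)
    return raw


values = [2, -1, 0, 1]
ascending = sorted(values, key=lambda v: encode_int64(v, asc=True))
descending = sorted(values, key=lambda v: encode_int64(v, asc=False))
# ascending  -> [-1, 0, 1, 2]
# descending -> [2, 1, 0, -1]
```

With a descending encoding, a query that wants the newest or largest values first can use a forward scan instead of a reverse scan.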
The main record (primary key) row is organized as:
[Entity group prefix] [Primary key attribute 1] ... [Primary key attribute m]:
{Table property 1, ..., Table property p}
Physical storage follows the order defined above: first the entity group prefix, then primary key attributes 1 through m, each encoded according to its attribute definition (asc/desc). The entity group key and primary key attributes are encoded into the rowkey and stored in sorted order.
The entity group prefix is optional. When a table defines an entity group key and a secondary index, the secondary index row is organized as:
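The main record layout can be sketched with tuples standing in for encoded rowkeys. This simplified example assumes every key attribute uses ascending encoding and that hash distribution is off; `main_row_key` and the sample records are hypothetical.

```python
def main_row_key(record: dict, entity_group_key: list, primary_key: list) -> tuple:
    """Build [entity group prefix][primary key attributes] as a sortable tuple."""
    prefix = tuple(record[a] for a in entity_group_key)
    return prefix + tuple(record[a] for a in primary_key)


records = [
    {"userId": "bob", "noteId": 2, "title": "b2"},
    {"userId": "alice", "noteId": 3, "title": "a3"},
    {"userId": "bob", "noteId": 1, "title": "b1"},
    {"userId": "alice", "noteId": 1, "title": "a1"},
]

# Physical order: sorted by entity group prefix, then by primary key.
stored = sorted(records, key=lambda r: main_row_key(r, ["userId"], ["noteId"]))
titles = [r["title"] for r in stored]
# titles -> ["a1", "a3", "b1", "b2"]
```

Records of the same entity group end up adjacent, which is what makes a scan within one entity group a contiguous range read.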
[Entity group prefix] [Index ID] [Secondary index attribute 1] ... [Secondary index attribute n] [Primary key attribute 1] ... [Primary key attribute m]:
{Projection property 1, ..., Projection property q}
As mentioned earlier, a projection property is a copy of an attribute in the main record row and is always kept consistent with the main data row. Lazy indexes cannot define projection properties; Eager and Immutable indexes can.
Secondary index rows are likewise physically stored in the order defined above.
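The secondary index row layout can be sketched in the same spirit, again using tuples in place of encoded keys. The function name `index_row`, the index ID value, and the sample record are assumptions made for illustration only.

```python
def index_row(record, entity_group_key, index_id, index_attrs, primary_key, projections):
    """Build one secondary index row: (key tuple, projected attribute copy)."""
    key = (
        tuple(record[a] for a in entity_group_key)   # entity group prefix
        + (index_id,)                                # index ID
        + tuple(record[a] for a in index_attrs)      # secondary index attributes
        + tuple(record[a] for a in primary_key)      # primary key attributes
    )
    value = {a: record[a] for a in projections}      # projection properties
    return key, value


note = {"userId": "alice", "noteId": 1, "mtime": 100, "title": "a1"}
key, value = index_row(
    note,
    entity_group_key=["userId"],
    index_id=1,
    index_attrs=["mtime"],
    primary_key=["noteId"],
    projections=["title"],
)
# key   -> ("alice", 1, 100, 1)
# value -> {"title": "a1"}
```

Because the index row carries the primary key, a query on the index can locate the main record, and with a projection it can often answer the query without reading the main record at all.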
The entity group prefix is composed as:
[Entity group hash] [Entity group key attribute 1] ... [Entity group key attribute k]
Developers can choose whether to use entity group keys and whether to define secondary indexes based on how the data is read. The secondary index type should be chosen according to the read/write ratio. For example, a Lazy index suits workloads with many writes and few reads that are less sensitive to latency. An Eager index, used with projections, suits frequent reads or workloads that are more sensitive to read latency. An Immutable index suits read-only data.
Note that the following cannot currently be modified on an existing table: the entity group key (including the hash distribution option), the primary key, secondary index options other than the index type, and the data type and encoding of attributes. Choose carefully when creating the table. Normal attributes can be added or removed, but data access pauses briefly (2-3 seconds on average) while the change is applied, so it is recommended to make such changes when business traffic is low.
The type of a secondary index can be modified, but only from Immutable to Eager. This is appropriate when the data in a table changes from non-updatable to writable. A common scenario is importing existing data when the table is first built: during the import the data is read-only and no updates are allowed (the developer must guarantee this), so the index is defined as Immutable, which helps save read quota. After the import completes and before online service starts, the index can be changed to Eager.
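The index type rule above (only Immutable can be changed, and only to Eager) is easy to encode as a pre-flight check in tooling that manages table definitions. This is a minimal sketch; `can_change_index_type` is a hypothetical helper, not an SDS API.

```python
# The only index type change the text describes as supported.
ALLOWED_INDEX_TYPE_CHANGES = {("Immutable", "Eager")}


def can_change_index_type(current: str, target: str) -> bool:
    """Return True if changing the index type from current to target is allowed."""
    return (current, target) in ALLOWED_INDEX_TYPE_CHANGES


after_import = can_change_index_type("Immutable", "Eager")   # bulk import finished
back_again = can_change_index_type("Eager", "Immutable")     # not supported
from_lazy = can_change_index_type("Lazy", "Eager")           # not supported
```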
Consistency
SDS storage provides strong consistency: a read that follows a write always sees the previously written data. Data is kept in three copies and is asynchronously replicated to a standby cluster. The standby cluster currently provides read-only access for offline analysis.
Transactional
Row-level transaction guarantee
Atomicity is guaranteed for the record data written by a single (non-batch) put. For example, if two concurrent puts write (p1, q1) and (p2, q2) to properties p and q of the same row, the final result is never (p1, q2) or (p2, q1).
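The row-level guarantee can be modelled in memory with a per-row lock. The sketch below is not how SDS implements it; the `Row` class is hypothetical and the lock merely stands in for the server-side guarantee, to show what outcomes are possible when two puts race.

```python
import threading


class Row:
    """In-memory stand-in for one record with atomic puts."""

    def __init__(self):
        self._lock = threading.Lock()   # models row-level atomicity
        self._values = {}

    def put(self, values: dict) -> None:
        # All attributes of one put become visible together.
        with self._lock:
            self._values.update(values)

    def get(self) -> dict:
        with self._lock:
            return dict(self._values)


row = Row()
writers = [
    threading.Thread(target=row.put, args=({"p": "p1", "q": "q1"},)),
    threading.Thread(target=row.put, args=({"p": "p2", "q": "q2"},)),
]
for w in writers:
    w.start()
for w in writers:
    w.join()

result = row.get()
# result is either {"p": "p1", "q": "q1"} or {"p": "p2", "q": "q2"},
# never a mix of the two puts.
```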
Transaction guarantees within entity group
Records belonging to the same entity group can support batch atomicity. Whether to enable it is configured when creating the table. To define a secondary index, batch atomicity must be enabled. Local secondary indexes are supported within the same entity group, and the index and the main record are strongly consistent.
Auto-increment operation
Auto-increment operations are currently supported on integer data types.
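The value of a server-side increment is that concurrent callers do not lose updates, which a client-side read followed by a write could. The following in-memory sketch illustrates the semantics only; `CounterRow` and `increment` are hypothetical names and the lock stands in for the atomicity the service provides.

```python
import threading


class CounterRow:
    """In-memory stand-in for a record with an integer attribute."""

    def __init__(self):
        self._lock = threading.Lock()   # models server-side atomicity
        self._values = {}

    def increment(self, attribute: str, amount: int = 1) -> int:
        """Atomically add amount to an integer attribute and return the new value."""
        with self._lock:
            new_value = self._values.get(attribute, 0) + amount
            self._values[attribute] = new_value
            return new_value


row = CounterRow()


def worker():
    for _ in range(1000):
        row.increment("views")


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = row.increment("views", 0)   # read the current value without changing it
# total -> 4000
```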
Conditional modification
SDS supports conditional modifications (put and delete), so applications can implement their own synchronization logic (such as locks) at the application level.
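As an example of application-level synchronization, a simple lock can be built from a conditional put (acquire only if nobody owns the lock) and a conditional delete (release only if the caller still owns it). The store below is an in-memory stand-in; `ConditionalStore`, `put_if`, and `delete_if` are hypothetical names and do not reflect the actual SDK signatures.

```python
import threading


class ConditionalStore:
    """In-memory stand-in for a table that supports conditional put and delete."""

    def __init__(self):
        self._guard = threading.Lock()   # models server-side row atomicity
        self._rows = {}

    def put_if(self, key, attribute, expected, record) -> bool:
        """Write record only if the stored attribute equals expected (None = absent)."""
        with self._guard:
            if self._rows.get(key, {}).get(attribute) != expected:
                return False
            self._rows[key] = dict(record)
            return True

    def delete_if(self, key, attribute, expected) -> bool:
        """Delete the record only if the stored attribute equals expected."""
        with self._guard:
            if self._rows.get(key, {}).get(attribute) != expected:
                return False
            self._rows.pop(key, None)
            return True


store = ConditionalStore()
first = store.put_if("lock:report", "owner", None, {"owner": "worker-a"})    # acquires
second = store.put_if("lock:report", "owner", None, {"owner": "worker-b"})   # rejected
wrong_release = store.delete_if("lock:report", "owner", "worker-b")          # rejected
released = store.delete_if("lock:report", "owner", "worker-a")               # releases
```

A production lock would normally also need an expiry time so that a crashed owner does not hold the lock forever; that is left out of this sketch.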