Choosing the Right Storage for Application Data
What types of data are you dealing with? We will roughly classify them into the following five categories. Naturally, this classification is not comprehensive, but it will help us understand the options and approaches we have to keep in mind.
- Homogeneous data arrays containing elements of the same type
- Multimedia — audio, video and graphics files
- Interim data for internal use (logs of various types, caches)
- Streams of calculated data of various types (e.g. a recorded video stream or massive computation results)
- Documents (simple or compound)
The ways of storing such data are as follows:
- Files in a file system
- Databases
- Structured storages
- Archives (as a specific form of structured storage)
- Remote (distributed, cloud) storages
Let us now discuss which storage mechanism is best suited to each of the types of data mentioned above.
Homogeneous data arrays
Homogeneous data arrays contain elements of the same type. Examples include a simple table, temperature data over time, or last year's stock values.
- For homogeneous data arrays, regular files do not offer convenient or fast search. You have to create, maintain and constantly update special index files. Modifying the data structure is almost impossible. Meta-information is limited. There is no built-in run-time compression or encryption of data.
- Relational databases are well suited for homogeneous data. They comprise a set of predefined records with a rigid internal format. The main advantages of relational databases are the ability to locate data quickly by a specified criterion and transactional support for data integrity. Their significant shortcoming is that they do not work well for large data of variable length (BLOB fields are usually stored separately from the rest of the record). Moreover, keeping data in a relational database requires: a) use of a specific DBMS, which severely limits the portability of the data and of the application itself; b) pre-planning of the database structure, including relationships between tables and the indexing policy; c) research into peak loads for efficient database development, which may also be a serious overhead.
- Structured storages are somewhat analogous to a file system: a storage is a set of named streams (files) wrapped in a single container. Such a storage can be kept at any location, for example in a single file on a disk, in a database record, or even in RAM. The main advantage of this approach is that it allows data to be added to or deleted from an existing storage efficiently and handles data of various sizes (from small to huge) effectively. Storages are separate units (files) and can therefore be easily relocated, copied, duplicated and backed up. There is no need to track all the files generated by an application. Moreover, journaling makes it possible to restore the content completely or partially, thus mitigating the consequences of accidents or failures. The disadvantage may be relatively slow search inside huge data arrays.
- ZIP archives, as a specific form of structured storage, can be used for storing homogeneous data arrays, but only when most access is read-only. The standardized nature of the ZIP format makes it easy to use, especially in cross-platform applications, but the format is not suited to data that is modified after packing: adding and deleting data are time-consuming operations.
- Remote and distributed storages are the next level of storage, in which a dedicated layer encapsulates the actual data location and the access mechanics. In such storages the data can actually be kept in databases or distributed among different file systems, but the actual storage organization does not matter to the end user. The user sees only a set of objects accessed through an API or, alternatively, through file system calls. Cloud storage is a good example. These types of storage are intended for use in large software systems. Their advantages include unified data access, with no need to think about how the data is actually stored. Their disadvantages are that they cannot be efficiently managed and controlled, and that backup or migration of data is complicated.
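To make the relational-database option above more concrete, here is a minimal sketch in Python using the standard sqlite3 module. SQLite is only an assumed stand-in for whichever DBMS you choose, and the table name temperature and the helper functions build_temperature_db and readings_between are hypothetical names introduced for illustration. The sketch stores a temperature-over-time array, declares an index up front (the pre-planning mentioned above), inserts the rows inside a transaction, and locates data by a criterion.

```python
import sqlite3


def build_temperature_db(readings):
    """Create an in-memory table of (timestamp, celsius) rows.

    The schema and the index are hypothetical; they stand for the
    structure and indexing policy that must be planned in advance.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE temperature (ts INTEGER NOT NULL, celsius REAL NOT NULL)"
    )
    # The index is what lets the database search by timestamp
    # without scanning every record.
    conn.execute("CREATE INDEX idx_temperature_ts ON temperature (ts)")
    # Used as a context manager, the connection commits on success
    # and rolls back if an exception is raised: either all rows
    # are stored or none are.
    with conn:
        conn.executemany(
            "INSERT INTO temperature (ts, celsius) VALUES (?, ?)", readings
        )
    return conn


def readings_between(conn, start_ts, end_ts):
    """Return all readings whose timestamp lies in [start_ts, end_ts]."""
    cursor = conn.execute(
        "SELECT ts, celsius FROM temperature "
        "WHERE ts BETWEEN ? AND ? ORDER BY ts",
        (start_ts, end_ts),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    # Ten sample readings, one per minute, made up for the demonstration.
    sample = [(1000 + i * 60, 20.0 + i * 0.5) for i in range(10)]
    connection = build_temperature_db(sample)
    print(readings_between(connection, 1060, 1180))
```

With plain files, the same range query would require either scanning the whole file or maintaining a separate index file by hand, which is the drawback described in the first item of the list.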
Audio, video and graphic files
Storing one or several multimedia files is simple. Complexities appear when you need to maintain a large number of files and want to search across the multimedia collection.