Size of HSQL database - hsqldb

My current project uses HSQLDB in production.
Since I want to store files and pictures in the database as BLOBs, I would like to know
what the maximum size of an HSQLDB database can be.
Also, how does HSQLDB perform when handling BLOB data?
Regards,
Satya

HSQLDB 2.0 and later stores BLOB (and CLOB) data in a separate file ending with .lobs. The theoretical capacity of this file is 2^31 units of 32 KB (64 TB in total). Internal tables store the directory information for the lobs. These tables are stored in memory by default and are therefore limited to a few hundred thousand individual lobs (100,000 is a safe limit). If you plan to store millions of lobs, you can change the lob tables to CACHED tables in the latest versions.
http://hsqldb.org/doc/2.0/guide/deployment-chapt.html#dec_lob_mem_use
Therefore it is practical to store several million lobs. With typical files and pictures, lob access performance is related more to OS file access and caching than to any other factor. Average access speeds of over 20 MB per second are typical for multi-gigabyte lob databases.
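For reference, here is a minimal JDBC sketch of streaming a file into a BLOB column. The jdbc:hsqldb:file URL, the images table, and its columns are placeholders for illustration, not details from the question; the BLOB content itself ends up in the .lobs file described above.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BlobUploadExample {
    public static void main(String[] args) throws Exception {
        Path picture = Paths.get(args[0]);

        // File-mode HSQLDB database; URL and credentials are placeholders.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hsqldb:file:/data/appdb", "SA", "");
             InputStream in = Files.newInputStream(picture)) {

            // Hypothetical table: CREATE TABLE images (id BIGINT PRIMARY KEY, data BLOB)
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO images (id, data) VALUES (?, ?)")) {
                ps.setLong(1, 1L);
                // Streaming avoids loading the whole file into heap memory.
                ps.setBinaryStream(2, in, Files.size(picture));
                ps.executeUpdate();
            }
        }
    }
}
```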

I don't think there is a limit to how big your database can be, but access performance depends on how you structure your data and how you query it. I've used HSQLDB in several projects and it works fine with BLOB data, but my overall database size in those projects was never more than a few gigabytes, so there may be a limit to how efficient HSQLDB can be depending on the database size.

Related

Choose to store table exclusively on disk in Apache Ignite

I understand the native persistence mode of Apache Ignite allows storing as much data as possible in memory, with any remaining data on disk.
Is it possible to manually choose which table I want to store in memory and which I want to store EXCLUSIVELY on disk? If I want to save costs, should I just give Ignite a lot of disk space and just a small amount of memory? What if I know some tables should return results as fast as possible while other tables have lower priorities in terms of speed (even if they are accessed more often)? Is there any feature to prioritize data storage into memory at table level or any other level?
You can define two different data regions: one with a small amount of memory and persistence enabled, and a second without persistence but with a bigger maximum memory size: https://apacheignite.readme.io/docs/memory-configuration
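A minimal Java sketch of that two-region setup follows; the region names, sizes, and cache names are made up for illustration and are not from the question.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class TwoRegionsExample {
    public static void main(String[] args) {
        // Small region backed by native persistence: most of its data lives on disk.
        DataRegionConfiguration diskHeavy = new DataRegionConfiguration()
                .setName("diskHeavyRegion")
                .setMaxSize(256L * 1024 * 1024)       // 256 MB of RAM
                .setPersistenceEnabled(true);

        // Larger in-memory region for latency-sensitive tables.
        DataRegionConfiguration fast = new DataRegionConfiguration()
                .setName("fastRegion")
                .setMaxSize(4L * 1024 * 1024 * 1024); // 4 GB of RAM

        DataStorageConfiguration storage = new DataStorageConfiguration()
                .setDataRegionConfigurations(diskHeavy, fast);

        IgniteConfiguration cfg = new IgniteConfiguration()
                .setDataStorageConfiguration(storage);

        try (Ignite ignite = Ignition.start(cfg)) {
            ignite.cluster().active(true); // required when persistence is enabled

            // Assign each cache (table) to the region that matches its priority.
            ignite.getOrCreateCache(new CacheConfiguration<Long, String>("bulkTable")
                    .setDataRegionName("diskHeavyRegion"));
            ignite.getOrCreateCache(new CacheConfiguration<Long, String>("hotTable")
                    .setDataRegionName("fastRegion"));
        }
    }
}
```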
You can't have a cache (which contains the rows for a table) stored exclusively on disk.
When you add a row to a table, it gets stored in Durable Memory, which is always located in RAM. Later it may be flushed to disk via the checkpointing process, which uses a checkpoint page buffer that is also in RAM. So you can have a separate region with low memory usage (see the other answer), but you can't have data exclusively on disk.
When you access data, it will always be pulled from disk into Durable Memory as well.

Best solution for storing / accessing large Integer arrays for a web application

I have a Web Application (Java backend) that processes a large amount of raw data that is uploaded from a hardware platform containing a number of sensors.
Currently the raw data is uploaded, decompressed, and stored as a 'text' field in a PostgreSQL database, which allows the users to log in and generate various graphs/charts of the data (using a JS charting library client-side).
Example string...
[45,23,45,32,56,75,34....]
The arrays will typically contain ~300,000 values, but this could be up to 1,000,000 depending on how long the sensors are recording, so the size of the string being stored could be a few hundred kilobytes.
This currently works fine as there are only ~200 uploads per day, but as I look at the scalability of the application and the ability to back up the data, I am looking at alternatives for storing it.
DynamoDB looked like a great option, as I could carry on storing the upload details in my SQL table and just save a URL endpoint to be called to retrieve the arrays... but then I noticed the item size is limited to 64 KB.
As I am sure there are a million and one ways to do this, I would like to put this out to the SO community to hear what others would recommend, either web services or locally stored... considering performance, scalability, maintainability, etc.
Thanks in advance!
UPDATE:
Just to clarify, the data shown above is just the 'Y' values; since the data is time-sampled, the X values are taken as the position in the array, so I don't think storing the data as tuples would have any benefit.
If you are looking to store such strings, you probably want to use S3 (one object containing the array string); in this case you get "backup" out of the box by enabling bucket versioning.
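A small sketch of that approach with the AWS SDK for Java v1; the bucket name, object key, and sample payload are placeholders, and versioning itself would be enabled on the bucket (via the console or API) rather than in this code.

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class SensorArrayS3Store {
    public static void main(String[] args) {
        // Bucket and key names are placeholders.
        String bucket = "sensor-uploads";
        String key = "upload-12345/values.json";
        String arrayJson = "[45,23,45,32,56,75,34]";

        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        // One object per upload; with bucket versioning enabled,
        // every overwrite keeps the previous version as a backup.
        s3.putObject(bucket, key, arrayJson);

        // The application table then only needs to store the bucket/key (or a URL)
        // so the frontend can fetch the array when rendering a chart.
        String stored = s3.getObjectAsString(bucket, key);
        System.out.println(stored.length() + " characters retrieved");
    }
}
```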
You can try a combination of Couchbase and Elasticsearch. Couchbase is a very fast document-oriented NoSQL database; several thousand insert operations per second is normal for it, and item size is limited to 20 MB. "Get" performance is in the tens of thousands of operations per second. There is one disadvantage: you can query data only by id (there are "views", but I think they would be too difficult to adapt for plotting). Elasticsearch can compensate for this deficiency, as it can perform arbitrary queries very quickly. Data in both Couchbase and Elasticsearch is stored as JSON documents.
I have just come across Google Cloud Datastore, which allows me to store single String items up to 1 MB (unindexed); it seems like a good alternative to DynamoDB.
Maybe you should use Redis or SSDB; both are designed to store large lists (arrays) of data. The difference between these two databases is that Redis is memory-only (with disk for backup), while SSDB is disk-based and uses memory as a cache.
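For illustration, a small Jedis-based sketch of the Redis list approach; the host, port, and key naming scheme are assumptions, not part of the answer above.

```java
import java.util.List;
import redis.clients.jedis.Jedis;

public class SensorArrayRedisStore {
    public static void main(String[] args) {
        // Host/port and key name are placeholders.
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            String key = "upload:12345:values";

            // Store each sample as a list element instead of one big string;
            // the index in the list plays the role of the X value.
            jedis.rpush(key, "45", "23", "45", "32", "56", "75", "34");

            // Fetch the whole series (or a sub-range) for charting.
            List<String> all = jedis.lrange(key, 0, -1);
            List<String> window = jedis.lrange(key, 0, 999); // first 1,000 samples
            System.out.println(all.size() + " samples, window of " + window.size());
        }
    }
}
```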

Optimize tempdb log files

Have a dedicated DB server with 16 cores and 200+ GB of memory.
Use tempdb a lot but it typically stays under 4 GB
Recently added a dedicated SSD stripe for tempdb
Based on this page, I should create multiple files:
Optimizing tempdb Performance
I understand the recommendation for multiple row (data) files.
Here is my question:
Should I also create multiple tempdb log files?
It does say "create one data file for each CPU".
So my thought is that data means row (not log) files.
No. As with all databases, SQL Server can only use one log file at a time, so there is no benefit at all in having multiple log files.
The best thing you can do with log files is keep them on separate drives from the data files, as they have different IO requirements. Pre-size them so they don't have to auto-grow, and if they do have to auto-grow, make sure they do so at a sensible increment to manage the number of virtual log files created inside them.

How to limit the RAM used by multiple embedded HSQLDB DB instances as a whole?

Given:
HSQLDB embedded
50 distinct databases (I have 50 different data sources)
All the databases are of the file:/ kind
All the tables are CACHED
The amount of RAM that all the embedded DB instances combined are allowed to use is limited and is set when the Java process starts.
The LOG file is disabled (no need to recover upon crash)
My understanding is that the RAM used by a single DB instance is comprised of the following pieces:
The cache of all the tables (all my tables are CACHED)
The DB instance internal state
Also, as far as I can see I have these two properties to control the total size of the cache of a single DB instance:
SET FILES CACHE SIZE
SET FILES CACHE ROWS
However, they control only the cache part of the RAM used by a DB instance. Plus, they are per DB instance, whereas I would like to limit all the instances as a whole.
So, I wonder whether it is possible to instruct HSQLDB to stay within the specified amount of RAM in total including all the DB instances?
You can only limit the CACHE memory use per database instance. Each instance is independent of the other.
You can reduce the CACHE SIZE and CACHE ROWS per database to suit your application.
HSQLDB does not use a lot of other memory, but when it does, it uses the memory of the JVM, which is shared among the different database instances.
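As an illustration of those two settings, here is a minimal per-instance tuning sketch over JDBC; the file URL and the chosen values are placeholders, and each of the 50 databases would need its own connection and statements.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HsqldbCacheTuning {
    public static void main(String[] args) throws Exception {
        // One of the 50 file-mode databases; the URL is a placeholder.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hsqldb:file:/data/source01/db", "SA", "");
             Statement st = conn.createStatement()) {

            // Upper bound on the CACHED-table cache, in units of 1 KB (here ~8 MB).
            st.execute("SET FILES CACHE SIZE 8192");
            // Upper bound on the number of cached rows.
            st.execute("SET FILES CACHE ROWS 10000");
            // Disable the transaction log, as in the question's setup.
            st.execute("SET FILES LOG FALSE");
        }
    }
}
```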

Core Data sqlite file growing

I have a synchronization process and I'm using Core Data to store a lot of information. Several times I have downloaded the actual SQLite database file with the Organizer to check whether the data is correct.
Some days ago I noticed that the size difference between two SQLite files was huge. One file was 80 MB, the other about 100 MB. When I checked the data with an SQLite viewer, there was no difference: the same tables, same indexes, same rows. How can that be? Is it possible that some data remains in the file when I delete objects through Core Data?
EDIT:
The solution is an option flag that can be put into an options NSDictionary and passed as a parameter to the addPersistentStore method.
NSSQLiteManualVacuumOption
Option key to rebuild the store file, forcing a database wide defragmentation when the store is added to the coordinator.
This invokes SQLite's VACUUM command. It is ignored by stores other than the SQLite store.
SQLite does not proactively return unused disk space when data is deleted, for performance reasons. This could be why you see the difference. See this link for more info:
SQLite FAQ