Delete or shrink the lobs file in an HSQLDB database - hsqldb

I am using HSQLDB 2.3.0. I have a database with the following schema:
CREATE TABLE MEASUREMENT (ID INTEGER NOT NULL PRIMARY KEY IDENTITY, OBJ CLOB);
When I fill this table with test data, the LOBS file in my database grows:
ls -lath
-rw-rw-r-- 1 hsqldb hsqldb 35 May 6 16:37 msdb.log
-rw-rw-r-- 1 hsqldb hsqldb 85 May 6 16:37 msdb.properties
-rw-rw-r-- 1 hsqldb hsqldb 16 May 6 16:37 msdb.lck
drwxrwxr-x 2 hsqldb hsqldb 4.0K May 6 16:37 msdb.tmp
-rw-rw-r-- 1 hsqldb hsqldb 1.6M May 6 16:37 msdb.script
-rw-rw-r-- 1 hsqldb hsqldb 625M May 6 16:35 msdb.lobs
After running the following commands:
TRUNCATE SCHEMA public AND COMMIT;
CHECKPOINT DEFRAG;
SHUTDOWN COMPACT;
The lobs file is still the same size:
-rw-rw-r-- 1 hsqldb hsqldb 84 May 6 16:44 msdb.properties
-rw-rw-r-- 1 hsqldb hsqldb 1.6M May 6 16:44 msdb.script
-rw-rw-r-- 1 hsqldb hsqldb 625M May 6 16:35 msdb.lobs
What is the best way to truncate the schema and get all the disk space back?

I have an application with the same problem using HSQLDB 2.3.3. The .lobs file seems to grow indefinitely, even after calling "checkpoint defrag". My scenario is that I insert 1,000 blobs of 300 bytes each, periodically delete them all, and then insert 1,000 new blobs of about the same size. After a number of rounds of this, my .lobs file is now 1.3 GB in size, while it is really only storing around 300 kB of data. In spite of calling checkpoint defrag, the .lobs file just grows and grows. Is this behaviour a bug?

The database engine is designed for continuous use in real applications. If you have an application that uses lobs and deletes some of them, the space will be reused for future lobs after each checkpoint.
In normal application use, the DELETE statement is used to delete rows. This statement deallocates the lob space for reuse after each checkpoint.
You can design your tests in a way that recreates the database, rather than reusing the old database after removing its data.
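As a hedged illustration of the reuse behaviour (using the MEASUREMENT table from the question; the comments describe expected behaviour, not a guaranteed outcome):
-- delete the rows; the lob entries they reference are deallocated
DELETE FROM MEASUREMENT;
COMMIT;
-- after the next checkpoint the freed lob space can be reused by new lobs,
-- but the msdb.lobs file itself keeps its size on disk
CHECKPOINT;
To actually get the disk space back in a test setup, shut the database down, delete the msdb.* files, and recreate the schema.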

Related

Improve ETL from COBOL file to SQL

I have a multi-server/multi-process/multi-threaded solution which can parse and extract over 7 million records from a 6 GB EBCDIC COBOL file into 27 SQL tables, all in under 20 minutes. The problem: the actual parsing and extraction of the data only takes about 10 minutes, using bulk inserts into staging tables. It then takes almost another 10 minutes to copy the data from the staging tables to their final tables. Any ideas on how I can improve the second half of my process? I've tried using in-memory tables, but it blows out the SQL Server.
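For reference, the staging-to-final copy described above is typically a plain INSERT ... SELECT; a hedged sketch (the table and column names are hypothetical, and minimal logging depends on the recovery model and the structure of the target table) would be:
-- hypothetical names; TABLOCK allows minimal logging into a heap under the
-- simple or bulk-logged recovery model, which reduces log overhead for large copies
INSERT INTO dbo.FinalTable WITH (TABLOCK) (col1, col2, col3)
SELECT col1, col2, col3
FROM dbo.StagingTable;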

Aerospike space still occupied after deleting a set

Initially I have 2 sets (tables), each containing 45 GB of data, which is 90 GB in total in 1 namespace (database). I decided to remove 1 set to free up RAM, but after deleting the set it still shows 90 GB; the RAM usage did not change. Without a restart of the Aerospike server, is there a way to flush the deleted data to free up my RAM?
Thanks in advance!
From Aerospike CE 3.12 onward you should be using the truncate command to truncate the data in a namespace, or in a set of a namespace.
The aerospike/delete-set repo is an ancient workaround (it hasn't been updated in 2 years). In the Java client, simply use the AerospikeClient.truncate() method.

Controlling mappers with a Hive table having around 800 part files

I have a Hive table to which data gets added every day.
Around 5 files get added each day, so we have now ended up with 800 part files under this table.
The issue I have is that joining or using this table anywhere triggers 800 mappers, as the number of mappers is proportional to the number of files.
But I do have to use the entire table for my jobs.
Is there a way to use the entire table without triggering too many mappers?
The files look like this:
-rw-rw-r-- 3 XXXX hdfs 106610 2015-12-15 05:39 /apps/hive/warehouse/prod.db/TABLE1/000000_0_copy_1.deflate
-rw-rw-r-- 3 XXXX hdfs 106602 2015-12-23 12:31 /apps/hive/warehouse/prod.db/TABLE1/000000_0_copy_10.deflate
-rw-rw-r-- 3 XXXX hdfs 157686 2016-03-06 05:20 /apps/hive/warehouse/prod.db/TABLE1/000000_0_copy_100.deflate
-rw-rw-r-- 3 XXXX hdfs 163580 2016-03-07 05:22 /apps/hive/warehouse/prod.db/TABLE1/000000_0_copy_101.deflate
I would prefer to partition the table, so that the data is stored in partition directories; whenever the partition columns are used in a query, only the files under the matching partitions are accessed, and only the corresponding mappers are triggered by the Hive queries.
The other option is to bucket the table (CLUSTERED BY in the DDL) to distribute the data into a fixed number of bucket files, reducing the number of files that are accessed while querying.
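A hedged HiveQL sketch of both options (the column names id, val and the partition column load_dt are placeholders, not taken from the original table; prod.table1 refers to the existing table):
-- option 1: partition by a date column so that queries filtering on the
-- partition column only read (and map over) the matching directories
CREATE TABLE prod.table1_part (id INT, val STRING)
PARTITIONED BY (load_dt STRING);
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE prod.table1_part PARTITION (load_dt)
SELECT id, val, load_dt FROM prod.table1;
-- option 2: bucket the data into a fixed number of files
CREATE TABLE prod.table1_bucketed (id INT, val STRING, load_dt STRING)
CLUSTERED BY (id) INTO 32 BUCKETS;
SET hive.enforce.bucketing=true;  -- needed on older Hive versions
INSERT OVERWRITE TABLE prod.table1_bucketed
SELECT id, val, load_dt FROM prod.table1;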

TempDB growing big on a staging server running SQL Server 2005

I have a concern regarding tempdb growth on one of my staging servers. The current size of tempdb has grown to nearly 42 GB.
Per our standards we have split the data into a primary .mdf and 7 .ndf files, along with an .ldf. Currently all 7 .ndf files are more than 4 GB each, and the primary is over 6 GB.
I tried restarting, but it fills up fast again.
DBCC SHRINKFILE on the .ndf files is also not helping, as it only shrinks them by a few MB.
Kindly help on how to resolve this.
Thanks
Kapil..
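As a hedged, generic diagnostic (not specific to this server), the tempdb space-usage DMV shows whether the space is held by user objects, internal objects (sorts, hashes, spools) or the version store, which is usually worth checking before attempting a shrink:
-- breaks tempdb usage down by category (values are 8 KB pages, converted to MB)
SELECT SUM(user_object_reserved_page_count)     * 8 / 1024 AS user_objects_mb,
       SUM(internal_object_reserved_page_count) * 8 / 1024 AS internal_objects_mb,
       SUM(version_store_reserved_page_count)   * 8 / 1024 AS version_store_mb,
       SUM(unallocated_extent_page_count)       * 8 / 1024 AS free_mb
FROM tempdb.sys.dm_db_file_space_usage;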

Why would a nightly full backup of our SQL Server database grow 30 GB overnight and then shrink again the next day?

We run SQL Server 2005 and have a database that's about 100 GB (the MDF is 100 GB and the LDF is 34 GB).
Our maintenance plan takes a full database back up every night. It's set up to
The backup size is usually around 95-100 GB, but it suddenly grew to 120 GB, then 124 GB, then 130 GB, and then back to 100 GB over 4 consecutive days.
Does anyone know what could cause this? I don't believe we added and then removed that much data in such a short period of time.
If your backup is larger than the MDF, it means a lot of log activity was recorded too: SQL Server notes data changes that happen during a full backup and does a "mini" log backup to capture them.
I'd say you need to change your index maintenance and backup timings; index rebuilds touch a large share of the pages, so a full backup that overlaps with them captures far more change.
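To confirm the pattern, the backup history in msdb records the size of each backup; a hedged sketch (standard msdb tables, nothing specific to this server) to line the sizes up against your maintenance schedule:
-- full backups only ('D'); compare sizes against the nights index maintenance ran
SELECT database_name,
       backup_start_date,
       backup_size / 1024 / 1024 AS backup_size_mb
FROM msdb.dbo.backupset
WHERE type = 'D'
ORDER BY backup_start_date DESC;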