How can we reduce the file size of an Aerospike .dat file?
Our current config is:
namespace test {
    memory-size 20G                  # Maximum memory allocation for data and
                                     # primary and secondary indexes.
    storage-engine device {          # Configure the storage-engine to use
        file /opt/aerospike/test.dat # Location of data file on server.
        filesize 100G                # Max size of each file in GiB.
    }
}
The current file size of test.dat is 90GB as per ls -ltrh, but the AMC UI shows only 50GB used.
I want to reduce the file size to 80GB. I tried following this doc:
Decrease filesize: Decreasing the size of the files with an Aerospike service restart will potentially end up deleting random data, which can result in unexpected behavior on the Aerospike cluster due to the truncation, maybe even landing in a low available percentage on the node. Thus, you would need to delete the file itself and let the data be migrated from the other nodes in the cluster.
Stop the Aerospike server.
Delete the file and update the configuration with the new filesize.
Start the Aerospike server.
But when I start the server after deleting the data file, startup fails with this error:
Jan 20 2022 03:44:50 GMT: WARNING (drv_ssd): (drv_ssd.c:3784) unable to open file /opt/aerospike/test.dat: No such file or directory
I have a few questions regarding this:
Is there a way to restart the process with no initial data and let it take data from the other nodes in the cluster?
If I wanted to reduce the size from 100G to 95G, would I still have to do the same thing, considering the current file size is only 90GB? Is there still a risk of losing data?
Stopping the Aerospike server, deleting the file and restarting is the way to go. There are other ways (like cold starting empty -- cold-start-empty), but the way you have done it is the recommended one. It seems there is a permission issue preventing the server from creating that file in that directory.
Yes, you would have to do the same thing to reduce the file size, as mentioned in the document you referred to.
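For reference, a minimal sketch of that procedure on one node, assuming a systemd-managed install where the service is called aerospike, the daemon runs as the aerospike user, and the config lives at /etc/aerospike/aerospike.conf (adjust names and paths to your environment):
sudo systemctl stop aerospike
sudo rm /opt/aerospike/test.dat
# edit /etc/aerospike/aerospike.conf and change "filesize 100G" to "filesize 80G"
sudo chown aerospike:aerospike /opt/aerospike   # the daemon must be able to create the new file here
sudo systemctl start aerospike
# migrations from the other nodes will then repopulate this node's data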
I am using PutHive3Streaming to load Avro data from NiFi into Hive. As a sample, I am sending 10 MB of JSON data to NiFi, converting it to Avro (reducing the size to 118 KB) and using PutHive3Streaming to write to a managed Hive table. However, I see that the data is not compressed in Hive.
hdfs dfs -du -h -s /user/hive/warehouse/my_table*
32.1 M /user/hive/warehouse/my_table (<-- replication factor 3)
At the table level, I have:
STORED AS ORC
TBLPROPERTIES (
    'orc.compress'='ZLIB',
    'orc.compression.strategy'='SPEED',
    'orc.create.index'='true',
    'orc.encoding.strategy'='SPEED',
    'transactional'='true');
and I have also enabled:
hive.exec.dynamic.partition=true
hive.optimize.sort.dynamic.partition=true
hive.exec.dynamic.partition.mode=nonstrict
avro.output.codec=zlib
hive.exec.compress.intermediate=true
hive.exec.compress.output=true
It looks like despite this, compression is not enabled in Hive. Any pointers to enable this?
Hive does not compress data inserted through the Streaming Data Ingest API.
The data is compressed when compaction runs.
See https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest+V2#StreamingDataIngestV2-APIUsage
If you don't want to wait, use ALTER TABLE your_table PARTITION(key=value) COMPACT "MAJOR".
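As a hedged example against the table from the question (my_table; the partition spec ds='2021-01-01' is hypothetical, and the PARTITION clause is dropped entirely if the table is unpartitioned):
-- queue a major compaction for one partition (or the whole table if unpartitioned)
ALTER TABLE my_table PARTITION (ds='2021-01-01') COMPACT 'major';
-- watch the request move through the compactor's states
SHOW COMPACTIONS;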
Yes, #K.M is correct insofar as compaction needs to be used.
a) Hive compaction strategies need to be used to manage the size of the data. Only after compaction is the data encoded and compressed. Below are the default properties for auto-compaction.
hive.compactor.delta.num.threshold=10
hive.compactor.delta.pct.threshold=0.1
b) Despite this being the default, one of the challenges I had with compaction is that the delta files written by NiFi were not accessible (deletable) by the compaction cleaner after the compaction itself. I fixed this by making the hive user the table owner, as well as giving the hive user rights to the delta files as per the standards laid out by Kerberos.
c) Another challenge I continue to face is triggering auto-compaction jobs. In my case, as delta files continue to be streamed into Hive for a given table/partition, the very first major compaction job completes successfully, deletes the deltas and creates a base file. But after that point, auto-compaction jobs are not triggered, and Hive accumulates a huge number of delta files (which have to be cleaned up manually, which is not desirable).
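For reference, a hedged sketch of the metastore-side settings that govern auto-compaction. They normally go in hive-site.xml on the Hive metastore host; hive.compactor.initiator.on must be true and hive.compactor.worker.threads must be greater than 0 for anything to be scheduled automatically, and the two thresholds shown are simply the defaults quoted above, not tuned values:
hive.compactor.initiator.on=true
hive.compactor.worker.threads=1
hive.compactor.delta.num.threshold=10
hive.compactor.delta.pct.threshold=0.1
Running SHOW COMPACTIONS; from a Hive session is then the quickest way to see whether the initiator is actually queuing anything for a given table or partition.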
I have a data warehouse with an 800 GB log file. Now I want to shrink it, or find another solution, to reduce the disk space occupied by the Log.ldf file.
I tried shrinking the file in several ways. I took a full backup and a transaction log backup, changed the recovery model, and ran the DBCC command, but none of them affected the size of the log file on disk. I also detached the database and then deleted the log file, but because of the memory-optimized file container I got an error when I attempted to attach it again (I read that SQL Server will automatically add a log file, but apparently not when the database has a memory-optimized filegroup).
After all these attempts my log file is still 800 GB, and I don't know what else to do to free the disk space used by the log file.
Does anyone have a suggestion? Or did I forget to do something in my approaches?
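For context, the usual shrink sequence looks roughly like the sketch below (MyDw and MyDw_log are hypothetical names; the logical log name comes from sys.database_files). The log can only be shrunk back past virtual log files that are no longer active, so the log backup followed by the shrink sometimes has to be repeated:
USE MyDw;
GO
-- find the logical name and current size (in MB) of the log file
SELECT name, size * 8 / 1024 AS size_mb
FROM sys.database_files
WHERE type_desc = 'LOG';
GO
-- under the FULL recovery model, back up the log so the active portion can be reused
BACKUP LOG MyDw TO DISK = N'D:\Backups\MyDw_log.trn';
GO
-- then shrink the log file to a target size in MB (10240 MB = 10 GB here)
DBCC SHRINKFILE (MyDw_log, 10240);
GO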
The Issue
I've been running a particularly large query, generating millions of records to be inserted into a table. Each time I run the query I get an error reporting that the transaction log file is full.
I've managed to get a test query to run with a reduced set of results by using SELECT INTO instead of INSERT INTO a pre-built table. This reduced set of results generated a 20 GB table with 838,978,560 rows.
When trying to INSERT into the pre-built table I've also tried it with and without a clustered index. Both failed.
Server Settings
The server is running SQL Server 2005 (full edition, not Express).
The database being used is set to the SIMPLE recovery model, and there is space available (around 100 GB) on the drive that the file sits on.
The transaction log file is set to grow in 250 MB increments, up to a maximum of 2,097,152 MB.
The log file appears to grow as expected until it reaches 4729 MB.
When the issue first appeared the file grew to a lower value; however, I've since reduced the size of other log files on the same server, and this appears to allow this transaction log file to grow further, by the same amount as the reduction in the other files.
I've now run out of ideas on how to solve this. If anyone has any suggestions or insight into what to do, it would be much appreciated.
First, you want to avoid auto-growth whenever possible; auto-growth events are huge performance killers. If you have 100 GB available, why not change the log file size to something like 20 GB (just temporarily, while you troubleshoot this)? My policy has always been to use 90%+ of the disk space allocated for a specific MDF/NDF/LDF file. There's no reason not to.
If you are using SIMPLE recovery, SQL Server is supposed to manage the task of returning unused space, but sometimes it does not do a great job. Before running your query, check the available free log space. You can do this by:
right-clicking the DB > Tasks > Shrink > Files
changing the type to "Log"
This will help you understand how much unused space you have. You can set "Reorganize pages before releasing unused space > Shrink File" to 0. Moving forward, you can also release unused space using CHECKPOINT; this may be something to include as a first step before your query runs.
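A minimal T-SQL sketch of the same ideas (MyDb and MyDb_log are hypothetical names; the sizes are only examples):
-- see how full each database's transaction log currently is
DBCC SQLPERF (LOGSPACE);
GO
USE MyDb;
GO
-- under the SIMPLE recovery model, a checkpoint lets the inactive part of the log be reused
CHECKPOINT;
GO
-- pre-size the log once instead of relying on repeated 250 MB auto-growth events
ALTER DATABASE MyDb MODIFY FILE (NAME = MyDb_log, SIZE = 20480MB);
GO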
I've been using Redis on a Windows server for the last 10 months without any issues, but this morning I checked my website and saw that it was completely empty!
After a few minutes of investigation I realised that the Redis database was empty.
Luckily I use Redis as a caching solution, so I still have all the data in an MS SQL database, and I've managed to recover the content of my website.
But I realised that Redis had stopped saving data to dump.rdb. The last time the file was updated was 20.11.2015 at 11:35.
The Redis config file has:
save 900 1
save 300 10
save 60 10000
and just by reloading everything from MS SQL this morning I had more than 15,000 writes. So the file should have been updated, right?
I ran redis-check-dump dump.rdb and got:
Processed 7924 valid opcodes
I even ran the SAVE command manually and got:
OK <2.12>
But the file size and modification date of dump.rdb are still the same: 20.11.2015.
I just want to highlight that between 20.11.2015 and today I haven't changed anything in the Redis configuration or restarted the server.
Any ideas?
It's not a real answer, but at least I've managed to get Redis to start dumping data to disk again.
Using the console I set a new dbfilename, and Redis is now writing snapshots to the new file.
It would be great if someone had a clue why it stopped dumping data to the original dump file.
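For anyone hitting the same thing, this is roughly what that console workaround looks like via redis-cli (the new filename is arbitrary; CONFIG SET only changes the running instance, so also persist the setting in redis.conf if it should survive a restart):
redis-cli CONFIG GET dir                      # directory where the RDB file is written
redis-cli CONFIG SET dbfilename dump-new.rdb  # point snapshots at a fresh file
redis-cli BGSAVE                              # force a background snapshot now
redis-cli LASTSAVE                            # unix timestamp of the last successful save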
First, sorry for my approximate English.
I'm a little lost with using HSQLDB.
I need to save a large amount of data (3 GB+) to a local database in a minimum of time.
So I did the following:
CREATE CACHED TABLE ...; to store the data in the .data file
SET FILES LOG FALSE; to avoid writing to the .log file and save time
SHUTDOWN COMPACT; to save the records to local disk
I know there are other parameters to set to increase the .data file size and to increase data access speed, such as:
hsqldb.cache_scale=
hsqldb.cache_size_scale=
SET FILES NIO SIZE xxxx
But I don't know how to set these for such a large store.
Thanks for your help.
When you use SET FILES LOG FALSE, data changes are not saved until you execute SHUTDOWN or CHECKPOINT.
The other parameters can be left at their default values. If you want to use more memory and gain some speed, you can multiply the default values of the parameters by 2 or 4.
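A minimal sketch of what that might look like as HSQLDB session statements (the values are only illustrative -- roughly the defaults multiplied by 4, as suggested above -- not measured recommendations):
-- skip the incremental .log file while bulk loading
SET FILES LOG FALSE;
-- allow larger .data files to be memory-mapped (value is in MB)
SET FILES NIO SIZE 1024;
-- enlarge the row cache used for CACHED tables: total size in KB, then maximum row count
SET FILES CACHE SIZE 40000;
SET FILES CACHE ROWS 200000;

-- ... bulk insert into the CACHED table here ...

-- persist changes at intermediate points if needed
CHECKPOINT;
-- at the end, write everything out and compact the .data file
SHUTDOWN COMPACT;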