Index corruption on large table - lucene

I have a large table with around 123 million records in CrateDB. I noticed that during a snapshot to S3 (or indeed to the file system) an index corruption occurs on each shard, which in turn causes a partial snapshot. Once Crate is restarted the table doesn't load, on account of the corrupted index. I have to remove the corrupted file and a file lock from the index folder, and the table heals. I have tried to recreate tables by moving everything to another table and swapping (using the ALTER CLUSTER command), but the corruption still occurs on the new table as well.
Is there anything else I can try to fully snapshot the cluster and avoid corruption?
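For context, the snapshots were created with statements along these lines (the repository, bucket, and snapshot names below are placeholders, not the real ones):

-- Placeholder names: repository "backups", bucket "my-backup-bucket", snapshot "snapshot1".
CREATE REPOSITORY backups TYPE s3
  WITH (bucket = 'my-backup-bucket', base_path = 'crate-snapshots');
-- Snapshot every table in the cluster and wait for the result.
CREATE SNAPSHOT backups.snapshot1 ALL WITH (wait_for_completion = true);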

The Crate team found a bug: https://github.com/crate/crate/pull/9318
Resolved in 4.0.8

Related

databricks error IllegalStateException: The transaction log has failed integrity checks

I have a table that I need to drop, delete the transaction log for, and recreate, but when I try to drop it I get the following error.
I have run a REPAIR TABLE statement on this table, which could be responsible for the error, but I'm not sure.
IllegalStateException: The transaction log has failed integrity checks. We recommend you contact Databricks support for assistance. To disable this check, set spark.databricks.delta.state.corruptionIsFatal to false. Failed verification of:
Table size (bytes) - Expected: 0 Computed: 63233
Number of files - Expected: 0 Computed: 1
We think this may just be related to S3 eventual consistency. Please try waiting a few extra minutes after deleting the Delta directory before writing new data to it. Also, a normal MSCK REPAIR TABLE doesn't do anything for Delta, as Delta doesn't use the Hive Metastore to store the partitions. There is an FSCK REPAIR TABLE, but that is for removing file entries from the transaction log of a Databricks Delta table that can no longer be found in the underlying file system.
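For completeness, the FSCK variant mentioned above is run like this (the table name is a placeholder); it only removes transaction-log entries for files that no longer exist, so it will not repair the error above:

-- Placeholder table name; DRY RUN only reports what would be removed.
FSCK REPAIR TABLE my_delta_table DRY RUN;
FSCK REPAIR TABLE my_delta_table;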
We don't recommend overwriting a Delta table in place, like you might with a normal Spark table. Delta is not like a normal table - it's a table, plus a transaction log, and many versions of your data (unless fully vacuumed). If you want to overwrite parts of the table, or even the whole table, you should use Delta's delete functionality. If you want to completely change the table, consider writing to an entirely new directory, such as /table/v2/... and separately deleting the other table.
To stop this check from blocking you, you can use the command below (PySpark notebook):
spark.conf.set("spark.databricks.delta.state.corruptionIsFatal", False)
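If you are working in a SQL cell instead, the same session setting can be applied with SET:

-- Disables the Delta state integrity check for the current session.
SET spark.databricks.delta.state.corruptionIsFatal = false;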

Mnesia: DAT file huge, remove old records to resize it

I'm trying to reduce the DAT file size of a Mnesia table (disc_copies), but I haven't found a solution yet. I can't truncate the file or create a new one; I have to keep working with this table while the system is running.
The procedure I considered was to delete the oldest records, but this isn't enough because the DAT file size remains the same until I restart the node.
So, is there a way to force the sync to disk without restarting the entire node?

How to add or route PostgreSQL Data to New Hard Drive

I'm using Windows Server 2008 R2 Standard.
I'm running PostgreSQL 9.0.1, compiled by Visual C++ build 1500, 32-bit.
I have a C:/ and a D:/ drive:
C:/ --> 6.7GB free space (almost full, and my server performance is suffering)
D:/ --> 141GB free space
Currently my PostgreSQL data is stored on C:/. Now I want to route or add a path to D:/ without migrating the data from C:/ to D:/, because my PostgreSQL data is already around 148 GB, which is too heavy to move.
If this works, I should still be able to run a query like SELECT * FROM table_bla_bla and have it return results from both drives?
Please do not suggest that I switch from PostgreSQL to another database.
I'm running 39,763 GPS meter devices that send data to my server.
I have to take care of this server myself because our expert has passed away.
You need to use tablespaces.
Create the tablespace, for example CREATE TABLESPACE second_drive LOCATION 'D:/postgresdata/' (see this other answer if you get permission denied errors)
ALTER TABLE table_bla_bla SET TABLESPACE second_drive
Tablespaces allow you to decide which tables go on which drives. That can help performance by letting you control where reads and writes go, but it also helps with space.
Postgres places individual tables in tablespaces (each of which lives on a single disk), which is enough if you have multiple tables: you can achieve what you need by moving some tables to the other disk.
On the other hand, if you have a large table that you need to split over multiple disks, you need to use Postgres's Horizontal Partitioning capability.
This builds on tablespaces by allowing you to create a master table table_bla_bla which is actually just a facade on top of two or more tables which actually hold the data. Those data tables can then be put on different tablespaces, effectively splitting your data over disks.
For this you would:
Rename your current table_bla_bla to something like table_bla_bla_c.
Create a new table_bla_bla master table.
Alter table_bla_bla_c to mark that it inherits from table_bla_bla.
Create a new table_bla_bla_d table that inherits from table_bla_bla and specify its tablespace as the D drive.
Apply partitioning triggers and check constraints as per the partitioning documentation.
Once this is in place, you can arrange it so that any inserts into table_bla_bla cause new records to be created on the D drive. Selects on table_bla_bla will read from both disks.
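Putting those steps together, a minimal sketch might look like this; it assumes plpgsql is available and leaves out the CHECK constraints and any routing logic you would add per the partitioning documentation:

-- 1. Keep the existing data where it is, under a new name.
ALTER TABLE table_bla_bla RENAME TO table_bla_bla_c;

-- 2. Create the new master table (a facade that itself holds no rows).
CREATE TABLE table_bla_bla (LIKE table_bla_bla_c INCLUDING ALL);

-- 3. Attach the old table as a child of the master.
ALTER TABLE table_bla_bla_c INHERIT table_bla_bla;

-- 4. Create the new child table on the D: drive tablespace.
CREATE TABLE table_bla_bla_d () INHERITS (table_bla_bla) TABLESPACE second_drive;

-- 5. Route all new inserts on the master into the D: drive child.
CREATE OR REPLACE FUNCTION table_bla_bla_insert() RETURNS trigger AS $$
BEGIN
    INSERT INTO table_bla_bla_d VALUES (NEW.*);
    RETURN NULL;  -- the master itself stores nothing
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER table_bla_bla_insert_trg
    BEFORE INSERT ON table_bla_bla
    FOR EACH ROW EXECUTE PROCEDURE table_bla_bla_insert();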

SQL Server: when rebuilding an index on a table, hard disk free space hit 0

I've got a major problem.
I tried to rebuild an index on a table with 300M records.
I had 100GB of free storage.
While the process was running, free space went to 0 and the rebuild got stuck.
Now I can't access any data in this specific table
(the SQL log file now seems to be as large as the table was before).
Does anyone have a suggestion for how to fix this?
I shrank the database log file, and from almost 700GB it shrank to almost nothing.
Somehow the table had its data again; it probably resumed the reindexing after I did that.
Back to normal.
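For anyone who hits the same thing, shrinking the log file is typically done along these lines (the database and logical log file names are placeholders; look up the real ones in sys.database_files):

USE MyDatabase;
-- Find the logical name of the log file first.
SELECT name, size FROM sys.database_files;
-- Shrink the log file to roughly 1 GB (target size is given in MB).
DBCC SHRINKFILE (MyDatabase_log, 1024);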

How to safely release unallocated space for 1 table in a database?

Our production database is on SQL Server 2008 R2. One of our tables, Document_Details, stores documents that users upload via our application (VB). They are stored in varbinary(max) format. There are over 20k files in PDF format and many of these are large (some are 50MB each), so overall this table is 90GB. We then ran an exe that compressed these PDF files down to 10GB.
However, here lies the problem: the table is still 90GB in size. The unallocated space hasn't been released. How do I deallocate this space so that the table is 10GB?
I tried moving the table to a new filegroup and then back to original filegroup but in either case it didn't release any space.
I also tried rebuilding the index on the table but that didn't work either.
What did work (but I heard it isn't recommended) was to change the recovery model to Simple, shrink the filegroup, then set recovery back to Full.
Could I move this table to a new filegroup and then shrink that filegroup (i.e. just the Document_Details table)? I know the shrink command affects performance but if it's just 1 table would it still be a problem? Or is there anything else I can try?
Thanks.
Moving a table to a filegroup has one problem: by default the TEXTIMAGE data (the blobs) is not moved! A table's rows can reside on one filegroup and the blobs on another. This is a crazy defect in SQL Server. Maybe by rebuilding the table the blobs were simply not touched.
Use one of the well-known methods to move the LOB data as well. That would rebuild the LOBs and shrink them.
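One such method is to rebuild the table on the new filegroup with TEXTIMAGE_ON pointing there as well, copy the rows across, and swap the names. The sketch below assumes a hypothetical schema (an identity key plus a varbinary(max) column) and an already-created filegroup named DOCS_FG, so adjust it to the real table definition:

-- Hypothetical columns and filegroup; only Document_Details is from the question.
CREATE TABLE dbo.Document_Details_New
(
    DocumentId  int IDENTITY(1,1) NOT NULL,
    FileName    nvarchar(260)     NOT NULL,
    FileContent varbinary(max)    NULL,
    CONSTRAINT PK_Document_Details_New PRIMARY KEY CLUSTERED (DocumentId)
) ON DOCS_FG TEXTIMAGE_ON DOCS_FG;   -- rows AND blobs land on the new filegroup

SET IDENTITY_INSERT dbo.Document_Details_New ON;
INSERT INTO dbo.Document_Details_New (DocumentId, FileName, FileContent)
SELECT DocumentId, FileName, FileContent FROM dbo.Document_Details;
SET IDENTITY_INSERT dbo.Document_Details_New OFF;

-- Swap names once the copy is verified, then drop the old table.
EXEC sp_rename 'dbo.Document_Details', 'Document_Details_Old';
EXEC sp_rename 'dbo.Document_Details_New', 'Document_Details';

After dropping the old table you can shrink the original filegroup's files with DBCC SHRINKFILE if you still need the space back.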