Best way to duplicate a ClickHouse replica? - replication

I want to create another replica from an existing one by copying it.
I made a snapshot in AWS and created a new server, so all my data now has a copy on the new server.
I fixed the replica macro in the config.
When I start the server, it throws "No node" in the error log for the first table it finds and then stalls, repeating the same error once in a while.
<Error>: Application: Coordination::Exception: No node, path: /clickhouse/tables/0/test_pageviews/replicas/replica3/metadata: Cannot attach table `default`.`pageviews2` from metadata file . . .
I suspect this is because the node for this replica does not exist in ZooKeeper (obviously, it was not created, because I did not run CREATE TABLE for this replica, as it is just a duplicate of another replica).
What is the correct way to duplicate a replica?
I would like to avoid copying the data and have the new replica pull only the data that was added after the moment the snapshot was created.

I would like to avoid copying the data
It's not possible.
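For context, the ZooKeeper path in the error is built from the ReplicatedMergeTree engine arguments plus the macros, and the .../replicas/replica3 node is only created when CREATE TABLE is actually executed on that replica. A minimal sketch of what such a definition could look like (the columns and ORDER BY key are invented for illustration; only the path and the {replica} macro come from the error above):
CREATE TABLE default.pageviews2 (ts DateTime, url String)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/test_pageviews', '{replica}')
ORDER BY ts; -- running this on the new host is what registers the replica under .../replicas/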

Related

Will truncate in ScyllaDB also delete data on other nodes

I have three nodes running, which contain some data we need to purge. Since we cannot identify the records, we would like to truncate the table and fill it from scratch.
Now I was wondering: when I issue a truncate statement, will it only truncate the current instance, or will it also clear the other nodes?
So basically, do I have to issue the truncate statement on each node and also fill each one there, or is it sufficient to do it on one node and have it propagate to the others? We would load the data from a CSV file via the COPY command.
The table will be truncated in the entire cluster.
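For example, from cqlsh on any single node (the keyspace, table, and file names below are placeholders), the truncate takes effect cluster-wide and the reload only needs to be run once:
TRUNCATE ks.mytable;
COPY ks.mytable FROM 'data.csv' WITH HEADER = true;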

Can I copy data table folders in QuestDb to another instance?

I am running QuestDB on a production server that constantly writes data to a table, 24x7. The table is partitioned by day.
I want to copy the data to another instance and update it there incrementally, since the data from older days never changes. Sometimes the copy works, but sometimes the data gets corrupted, reading from the second instance fails, and I have to retry copying all the table data, which is huge and takes a lot of time.
Is there a way to back up / restore QuestDB without interrupting continuous data ingestion?
QuestDB appends data in the following sequence:
1. Append to the column files inside the partition directory
2. Append to the symbol files inside the root table directory
3. Mark the transaction as committed in the _txn file
There is no fixed order between steps 1 and 2, but step 3 always happens last. To incrementally copy data to another box, you should copy in the opposite order:
1. Copy the _txn file first
2. Copy the root symbol files
3. Copy the partition directories
Do this while your slave QuestDB server is down; then, on start, the table should have data up to the point in time when you started copying the _txn file.
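As a rough sanity check (assuming a hypothetical table named pageviews with a designated timestamp column ts), you can query the copy after it starts and confirm the newest rows line up with the time you began copying the _txn file:
SELECT max(ts) FROM pageviews;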

Databricks error IllegalStateException: The transaction log has failed integrity checks

I have a table that I need drop, delete transaction log and recreate, but while I am trying to drop I get following error.
I have ran repair table statement on this one and could be responsible for error but not sure.
IllegalStateException: The transaction log has failed integrity checks. We recommend you contact Databricks support for assistance. To disable this check, set spark.databricks.delta.state.corruptionIsFatal to false. Failed verification of:
Table size (bytes) - Expected: 0 Computed: 63233
Number of files - Expected: 0 Computed: 1
We think this may just be related to s3 eventual consistency. Please try waiting a few extra minutes after deleting the Delta directory before writing new data to it. Also, normal MSCK REPAIR TABLE doesn't do anything for Delta, as Delta doesn't use the Hive Metastore to store the partitions. There is an FSCK REPAIR TABLE, but that is for removing the file entries from the transaction log of a Databricks Delta table that can no longer be found in the underlying file system.
We don't recommend overwriting a Delta table in place, like you might with a normal Spark table. Delta is not like a normal table - it's a table, plus a transaction log, and many versions of your data (unless fully vacuumed). If you want to overwrite parts of the table, or even the whole table, you should use Delta's delete functionality. If you want to completely change the table, consider writing to an entirely new directory, such as /table/v2/... and separately deleting the other table.
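For reference, the Delta-specific delete and FSCK repair mentioned above look roughly like this in SQL (the table name and predicate are made up, and neither is presented as a fix for this particular integrity-check error):
DELETE FROM my_delta_table WHERE event_date < '2020-01-01';
FSCK REPAIR TABLE my_delta_table DRY RUN;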
To stop the issue from occurring, you can use the command below (PySpark notebook):
spark.conf.set("spark.databricks.delta.state.corruptionIsFatal", False)

How to remove shards in crate DB?

I am new to crate.io, I am not very familiar with the term "shard", and I am trying to understand why running my local DB creates 4 different shards.
I need to reduce this to a single shard because it causes problems when I try to export the data from Crate into JSON files (it creates 4 different shards!).
Most users run CrateDB on multiple servers. To distribute the records of a table between multiple servers, it needs to be split up. One piece of a table is called a shard.
To make sure the records stay available, CrateDB by default creates one replica of each shard: a copy of the data that is located on a different server.
While the system doesn't have full copies of the shards, the cluster state is yellow / underreplicated.
CrateDB running on a single node will never be able to create a redundant copy (because there is only one server).
To change the number of replicas, you can use the command ALTER TABLE my_table SET (number_of_replicas = ...).
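On a single-node setup, for example, you could drop the replica requirement entirely so the shards are no longer reported as underreplicated (my_table is a placeholder):
ALTER TABLE my_table SET (number_of_replicas = 0);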

HSQLDB clear table data after restart

I want to keep some temporary data in memory that should be removed after the server shuts down.
HSQLDB has temporary tables, but their data is removed immediately after the transaction commits, which is too short-lived for me. On the other hand, a memory table keeps a script log file and restores the data when the server starts again. It takes time and space to maintain such script logs, which are useless in my situation.
What I need is a type of table where only the table structure is persisted to disk, while the data and the data operations stay entirely in memory. Otherwise, why would I need an in-memory DB instead of MySQL?
Is there such type of table in HSQLDB?
thanks
Create the file: database, then create the tables. Perform SHUTDOWN. Edit the .properties file for the database, add the setting below and save.
files_readonly=true
When you perform your tests with this database, no data is written to disk.
Alternatively, with the latest versions of HSQLDB 2.2.x, you can specify this property on the connection URL during the tests. For example:
jdbc:hsqldb:file:myfilepath;files_readonly=true
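As a small sketch of the setup step described above (the table name and columns are invented), the structure is created once against the file: database and then persisted with SHUTDOWN before the files_readonly property is set:
CREATE TABLE session_data (id BIGINT PRIMARY KEY, payload VARCHAR(1024));
SHUTDOWN;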