How to split big sql dump file into small chunks and maintain each record in origin files despite later other records deletions

How to split big sql dump file into small chunks and maintain each record in origin files despite later other records deletions - sql

Here's what I want to do with (MySQL example):
dumping only structure - structure.sql
dumping all table data - data.sql
spliting data.sql and putting each table data info seperate files - table1.sql, table2, sql, table3.sql ... tablen.sql
splitting each table into smaller files (1k lines per file)
commiting all files in my local git repository
coping all dir out to remote secure serwerwer
I have a problem with #4 step.
For instance I split table1.sql into 3 files: table1_a.sql and table1_b.sql and table1_c.sql.
If on new dump there are new records that is fine - it's just added to table1_b.sql.
But if there are deleted records that were in table1_a.sql all next records will move and git will treat files table1_b.sql and table1_c.sql as changed and that not OK.
Basicly it destroys whole idea keeping sql backup in SCM.
My question: How to split big sql dump file into small chunks and maintain each record in origin files despite later other records deletions?

To Split SQL Dumps in files of 500 lines execute in your terminal:
$ split -l 5000 hit_2017-09-28_20-07-25.sql dbpart-

Don't split them at all. Or split them by ranges of PK values. Or split them right down to 1 db row per file (and name the file after tablename + the content of the primary key).
(That apart from the even more obvious XY answer, which was my instinctive reaction.)

Related

Can I copy data table folders in QuestDb to another instance?

I am running QuestDb on production server which constantly writes data to a table, 24x7. The table is daily partitioned.
I want to copy data to another instance and update it there incrementally since the old days data never changes. Sometimes the copy works but sometimes the data gets corrupted and reading from the second instance fails and I have to retry coping all the table data which is huge and takes a lot of time.
Is there a way to backup / restore QuestDb while not interrupting continuous data ingestion?

QuestDB appends data in following sequence
Append to column files inside partition directory
Append to symbol files inside root table directory
Mark transaction as committed in _txn file
There is no order between 1 and 2 but 3 always happens last. To incrementally copy data to another box you should copy in opposite manner:
Copy _txn file first
Copy root symbol files
Copy partition directory
Do it while your slave QuestDB sever is down and then on start the table should have data up to the point when you started copying _txn file.

Creating data files in SQL Server - how would the data/tables be distributed?

My database has 1 data file (.mdf) and 1 log file (.ldf). We have about 500 tables in this database. If I create more data files, say 3 more (.ndf files), what would exactly happen? Would these new data files remain unused until the .mdf runs out of space? Or how would the data/tables be distributed?

New pages are distributed along all files in a file group. It is strongly recommended to not ad files later if you want proper distribution - that really is ahrd to achieve. Better make a new filegroup, add files, copy tables over (recreate clustered index on new filespace).

BigQuery faster way to insert million of rows

I'm using bq command line and trying to insert large amount of json files with one table per day.
My approach:
list all file to be push (date named YYYMMDDHHMM.meta1.meta2.json)
concatenate in the same day file => YYYMMDD.ndjson
split YYYMMDD.ndjson file (500 lines files each) YYYMMDD.ndjson_splittedij
loop over YYYMMDD.ndjson_splittedij and run
bq insert --template_suffix=20160331 --dataset_id=MYDATASET TEMPLATE YYYMMDD.ndjson_splittedij
This approach works. I just wonder if it is possible to improve it.

Again you are confusing streaming inserts and job loads.
You don't need to split each file in 500 rows (that applies to streaming insert).
You can have very large files for insert, see the Command line tab examples listed here: https://cloud.google.com/bigquery/loading-data#loading_csv_files
You have to run only:
bq load --source_format=NEWLINE_DELIMITED_JSON --schema=personsDataSchema.json mydataset.persons_data personsData.json
JSON file compressed must be under 4 GB if uncompressed must be under 5 TB, so larger files are better. Always try with 10 line sample file until you get the command working.

Extract specific data from full mysqldump backup

I am making regular backups of my MySQL database with mysqldump. This gives me a .sql file with CREATE TABLE and INSERT statements, allowing me to restore my database on demand. However, I have yet to find a good way to extract specific data from this backup, e.g. extract all rows from a certain table matching certain conditions.
Thus, my current approach is to restore the entire file into a new temporary database, extract the data I actually want with a new mysqldump call, delete the temporary database and then import the extracted lines into my real database.
Is this really the best way to do this? Is there some sort of script that can directly parse the .sql file and extract the relevant lines? I don't think there is an easy solution with grep and friends unfortunately, as mysqldump generates INSERT statements that insert many values per line.

The solution to this just ended up being to import the whole file, extract the data I needed and drop the database again.

SQL Server 2008: Filestream how to physically delete uploaded file from filestreamgroup?

I have created filestream group at C:\Test\FilestreamGroup1
and a table with varBinary Filstream column
Now when file is uploaded then it physically stored at FilestreamGroup1...
Now here I want to know two things
In which format file is stored at FilestreamGroup1 (for every single uploaded file I found 2 encoded file)?
secondly how to delete uploaded file physically (i.e. deleting a record from the table is like execute delete command, but doing this I'll not result in physical deletion of file from NTFS...so how can I delete a file physically)

If you want to delete files from FileSystem instanly then you need to force garbage collection manually by using checkpoint
Link

This is not a StackOverflow question, this belongs to ServerFault (admin). It toucehs dev though-
i.e. deleting a record from the table is like execute delete command, but doing this I'll not result in physical
deletion of file from NTFS...so how can I delete a file physically
Do you know what the primary reason is to hav a database? Guarantee data integrity.
A delete must keep the data around until a backup is taken. What is your backup policy? YOU may note that when you make an update, another copy of the file is created.... for that simple reason. The old one must still b e available for backup, and that is just how they integrate it.
In which format file is stored at FilestreamGroup1 (for every single uploaded file I found 2 encoded file)?
No, files are stored raw. What would be the sense to encode them... if there are SQL functions to get the path and it is a supported scenario that the client does not use SQL to load the file (but: asks SQL for the file name and path, then accesses it via NTFS file share). This also supports interop (as any program loading from a network can be fed a SQL driven location.
I strongly assume you have 1 copy only and somehow make an update resulting in a second file written.
http://msdn.microsoft.com/en-us/library/cc645962.aspx
has an explanation how to access FileSTream data with SQL.
http://technet.microsoft.com/en-us/library/cc645940(v=sql.105).aspx
has an explanation how to access FIleStream data using Win32.
FILESTREAM files being left behind after row deleted
explains while files are left behind when a row is deleted. I found that using the extremely trivial goodle search for "sql filestream delete file" and it was item 1 on the result list - did you even try google?

secondly how to delete uploaded file physically (i.e. deleting a record from the table is like execute delete command, but doing this I'll not result in physical deletion of file from NTFS...so how can I delete a file physically)
Checkpoint does not remove the files, files are removed in a backgroundprocess and it can take quite a while. To force deletion use
sp_filestream_force_garbage_collection
EDIT: works with SQL Server 2012 only

Write "checkpoint" after deleting a row. it will remove physical existence of file.
Run the below query and check, the file getting deleted from file system automatically
DELETE FROM TableName CHECKPOINT
Thanks.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas