I run the following query in Google Big Query:
DELETE FROM mydataset.mytable_wrong
WHERE time = "2019-09-01 13:00:00 UTC"
and then I realised it was the wrong table. Can I recover those rows somehow or undo the query?
The table still exists.
Thanks
Edit to add more info:
Table is partitioned.
You can use a snapshot decorator to query a point-in-time snapshot of your data, and you can revert the change from it. Check how to restore a deleted table; I think you can extrapolate that approach to restore your rows.
Also, here's a similar Stack Overflow answer with more info.
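For example, the standard SQL equivalent of a snapshot decorator is FOR SYSTEM_TIME AS OF. A sketch that materializes the table's state from just before the DELETE (the recovered table name is made up, and this only works while BigQuery still retains the snapshot):

```sql
-- Query the table as it was one hour ago (before the DELETE ran)
-- and save the result into a new table for inspection.
CREATE OR REPLACE TABLE mydataset.mytable_recovered AS
SELECT *
FROM mydataset.mytable_wrong
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);
```

Adjust the interval to any point before the bad query ran but within the retention window.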
Related
When we run a BigQuery Data Transfer (YouTube) backfill, how does the backfill guarantee that no duplicate records are inserted?
BigQuery is not a database where you could do "insert if new, update if old". A BigQuery transfer is delete and insert, right?
So would a backfill delete the old data for the scheduled backfill date and then re-insert it?
I am trying to figure out why I sometimes get zero data even though the transfer status is complete.
In my many tests, the old data was not deleted. But in one test I did see the old data deleted during the backfill (I could not reproduce it, though).
BigQuery DTS backfills are indeed write truncate operations (delete and insert).
It sounds like you resolved your own question: two transfers both backfilling to the same table may be causing the issue.
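If duplication from overlapping transfers is suspected, a quick sanity check could look like this (a sketch with made-up names; replace `event_id` and the table with your actual key column and transfer destination table):

```sql
-- Count how many times each key appears; any row returned is a duplicate.
SELECT event_id, COUNT(*) AS copies
FROM mydataset.transfer_table
GROUP BY event_id
HAVING COUNT(*) > 1;
```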
Is there a way to find the last updated date of a table without using sys.dm_db_index_usage_stats? I have been searching for an hour now, but all the answers I found use this DMV, which seems to be reset on a SQL Server restart.
Thanks.
You can use that DMV (which is the generally advised approach).
Or you can write your own AFTER UPDATE trigger that populates a homemade tracking table.
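A minimal sketch of that trigger approach in T-SQL (all table and trigger names are illustrative):

```sql
-- Homemade tracking table: one row per monitored table.
CREATE TABLE dbo.TableLastUpdated (
    TableName   sysname      NOT NULL PRIMARY KEY,
    LastUpdated datetime2(3) NOT NULL
);
GO
-- Stamp the tracking table whenever dbo.MyTable changes.
CREATE TRIGGER dbo.trg_MyTable_LastUpdated
ON dbo.MyTable
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE dbo.TableLastUpdated
    SET LastUpdated = SYSUTCDATETIME()
    WHERE TableName = N'dbo.MyTable';

    -- First change ever: insert the tracking row instead.
    IF @@ROWCOUNT = 0
        INSERT dbo.TableLastUpdated (TableName, LastUpdated)
        VALUES (N'dbo.MyTable', SYSUTCDATETIME());
END
```

Unlike the DMV, this survives restarts, but it adds a small cost to every write on the monitored table.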
Also, if you just wish to collect some data about current usage, you can set up a SQL Profiler trace that will do the job (then parse the results somehow, with Excel or whatever).
As a last option, restore your backups successively (onto a copy) until you find the change. Hopefully you have enough backup retention to find the data you're searching for.
While importing data from SQL Server (or any RDBMS) to Hadoop using Sqoop, we can get newly appended or modified records using incremental import in append or lastmodified mode, or with free-form queries.
Is there any way to identify deleted records? Once a record is deleted, it no longer exists in the SQL table.
One workaround is to load the full table using Sqoop and compare it with the previous table in Hive.
Is there a better way to do this?
No, you cannot get deleted records using Sqoop.
A better workaround could be:
Create a boolean status field (default true) in your SQL Server table.
Whenever you need to delete a record, don't delete it; just update it, setting status to false.
If you are using incremental import in lastmodified mode, you will get this changed data in HDFS.
Later (after the Sqoop import) you can delete all the records with status false.
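On the SQL Server side, the soft delete could look like this (table and column names are illustrative; bumping the last-modified column is what lets the lastmodified incremental import pick the row up):

```sql
-- Instead of: DELETE FROM dbo.orders WHERE order_id = 42;
UPDATE dbo.orders
SET status = 0,                       -- mark the row as logically deleted
    last_modified = GETUTCDATE()      -- so the next lastmodified import sees it
WHERE order_id = 42;
```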
If you are syncing an entire partition or table, then you can identify deleted records after the Sqoop import, before merging, using a full join with the existing target partition or table: records that exist in the target table/partition but not in the imported data are those deleted on the source database since the last sync.
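The deleted-record half of that comparison can be written as a left anti join. A sketch in Hive QL with hypothetical names, where `target_table` is the previously synced copy and `imported_table` is the fresh Sqoop import:

```sql
-- Rows present in the target but absent from the new import
-- were deleted on the source since the last sync.
SELECT t.id
FROM target_table t
LEFT JOIN imported_table i
  ON t.id = i.id
WHERE i.id IS NULL;
```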
Incremental sqooping does not handle deleted records out of the box. There are two approaches you may want to consider.
Please look at this post.
How can I restore an accidentally overwritten table in BigQuery? I have tried to copy (bq cp) a snapshot at a timestamp prior to the overwrite, but it does not work.
bq cp my_table#1480406400000 new_table
This might be equivalent to undeleting a table, which is possible within two days (and is performed on a best-effort basis, with no guarantee).
The timestamp in your question looks 3-4 days old, so that might be the explanation.
As an option, try to query that snapshot and see if the old data is still there.
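For example, a legacy SQL query using a snapshot decorator with the same timestamp as your bq cp attempt (this only returns data while BigQuery still retains that snapshot):

```sql
#legacySQL
SELECT *
FROM [mydataset.my_table@1480406400000]  -- table state at that epoch-millis timestamp
```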
I have a huge schema containing billions of records. I want to purge data older than 13 months from it and keep that data as a backup, in such a way that it can be recovered whenever required.
What is the best way to do this in SQL Server? Can we create a separate copy of this schema and add a delete trigger on all tables, so that when the trigger fires, the purged data gets inserted into the new schema?
If we use triggers, will there be only one record per DELETE statement, or will all the deleted records be inserted?
Can we somehow use bulk copy?
I would suggest this is a perfect use case for the Stretch Database feature in SQL Server 2016.
More info: https://msdn.microsoft.com/en-gb/library/dn935011.aspx
The cold data can be moved to the cloud according to your date criteria, without any applications or users being aware of it when querying the database. No backups are required, and it is very easy to set up.
There is no need for triggers; you can use a job that runs every day and moves outdated data into archive tables.
The best way, I guess, is to create a copy of the current schema. In the main part, delete everything older than 13 months; in the archive part, delete everything from the last 13 months.
Then create a stored procedure (or several) that collects the data, puts it into the archive, and deletes it from the main table. Put this into a daily job.
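A minimal sketch of such a procedure's core, using T-SQL's OUTPUT clause to archive and delete in one statement (table names, the date column, and the batch size are all illustrative; assumes an archive table with a matching schema and no enabled triggers):

```sql
-- Move one batch of outdated rows into the archive in a single statement.
DELETE TOP (10000) FROM dbo.MyTable
OUTPUT DELETED.* INTO dbo.MyTable_Archive
WHERE CreatedDate < DATEADD(MONTH, -13, SYSUTCDATETIME());
-- Repeat in a loop until @@ROWCOUNT = 0, then schedule the loop as the daily job.
```

Batching keeps the transaction log and lock footprint small compared with one giant DELETE.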
The cleanest and fastest way to do this (with billions of rows) is to create a partitioned table, probably partitioned on a date column by month. Switching out a given partition is a metadata-only operation and is extremely fast (if the partition scheme and function are set up properly). I have managed 300 GB tables using partitioning, and it has been very effective. Be careful with the partition function so that dates at each boundary are handled correctly.
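A sketch of the switch itself (assumes dbo.MyTable is partitioned by month, partition 1 holds the oldest month, and dbo.MyTable_Staging has an identical schema and constraints on the same filegroup; all names are illustrative):

```sql
-- Metadata-only operation: instantly moves the entire oldest partition
-- out of the main table into the staging table for archiving.
ALTER TABLE dbo.MyTable
SWITCH PARTITION 1 TO dbo.MyTable_Staging;
```

The staging table can then be backed up or moved to an archive database without touching the main table.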
Some of the other proposed solutions involve deleting millions of rows which could take a long, long time to execute. Model the different solutions using profiler and/or extended events to see which is the most efficient.
I agree with the above: do not create a trigger. Triggers fire on every insert/update/delete, which slows those operations down.
You may be best served by a data-archiving stored procedure.
Consider using multiple databases: the current database holds your current data, and one or more archive databases receive the records moved out of the current database by, say, a nightly or monthly stored procedure.
The archive can use the exact same schema as your production system.
If the data is already in the database, there is no need for a bulk copy. From there you can back up your archive database so it is off the SQL Server, and restore it if needed to make the data available again. This is much faster and more manageable than bulk copy.
According to Microsoft's documentation on Stretch DB (found here - https://learn.microsoft.com/en-us/azure/sql-server-stretch-database/), you can't update or delete rows that have been migrated to cold storage or rows that are eligible for migration.
So while Stretch DB does look like a capable technology for archive, the implementation in SQL 2016 does not appear to support archive and purge.