Can I prevent duplicate data in bigquery? - google-bigquery

I'm playing with BigQuery: I created a table and inserted some data, then reinserted the same data and it created duplicates. I'm sure I'm missing something, but is there something I can do to ignore incoming data if it already exists in the table?
My use case is that I get a stream of data from various clients, and sometimes their data will include some data they previously sent (I have no control over what they submit).
Is there a way to prevent duplicates when certain conditions are met? The easy case is when the entire row is the same, but what about when only certain columns match?

It's difficult to answer your question without a clear idea of the table structure, but it feels like you could be interested in the MERGE statement: ref here.
With this DML statement you can perform a mix of INSERT, UPDATE, and DELETE operations in a single statement, and hence do exactly what you are describing.
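For the insert-only-if-new case, a minimal sketch might look like this (the dataset, table, and column names here are placeholders for your own schema, and `event_id` is assumed to be the key that identifies a previously sent row):

```sql
-- Insert only the staged rows whose key is not already in the target.
-- Rows that already exist are silently ignored.
MERGE `my_dataset.events` AS T
USING `my_dataset.incoming_events` AS S
ON T.event_id = S.event_id
WHEN NOT MATCHED THEN
  INSERT (event_id, client_id, payload)
  VALUES (S.event_id, S.client_id, S.payload)
```

If "duplicate" means the entire row rather than a single key, the ON clause can compare every column instead of just `event_id`.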

Related

When target and source values are the same does Merge Command update?

I wasn't really able to find the answer to this anywhere. I'm trying to understand whether, when using the MERGE command, a row in my target table that is identical to the corresponding row in my source table will have its values updated from the source table anyway.
In other words, suppose the target table already matches the source table exactly. Will the MERGE still run an update on the target table in that situation?
What I'm trying to do: if the target table equals the source table, do nothing.
Only apply the update/insert/delete operations if there is a true difference between the tables.
AND BONUS POINTS, IF POSSIBLE, only run an update on the specific column that is different not the entire row.
I'm afraid that currently, when the "matched" condition is met, it will update the values regardless of whether they are in fact the same.
Now, I understand that even if the values are updated they won't be incorrect, but I'm trying to keep track of true adjustments to the table via insert/update/delete operations.
MERGE target_table USING source_table
ON merge_condition
WHEN MATCHED
THEN update_statement
WHEN NOT MATCHED
THEN insert_statement
WHEN NOT MATCHED BY SOURCE
THEN DELETE;
It appears the row is still marked as updated in terms of log activity, but the data itself doesn't change (apart from a few situations noted in the link below). Please see this question on the DBA Stack Exchange:
DBA StackExchange - Non-Updating Updates
The Impact of Non-Updating Updates
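A common workaround for the asker's goal is to add an extra predicate to the MATCHED clause so the update only fires when something actually differs. A sketch, using the asker's placeholder table names and hypothetical columns `col1`/`col2` as the key-plus-payload:

```sql
MERGE target_table AS T
USING source_table AS S
  ON T.id = S.id
-- update only when at least one column really differs
WHEN MATCHED AND (T.col1 <> S.col1 OR T.col2 <> S.col2) THEN
  UPDATE SET T.col1 = S.col1, T.col2 = S.col2
WHEN NOT MATCHED THEN
  INSERT (id, col1, col2) VALUES (S.id, S.col1, S.col2)
WHEN NOT MATCHED BY SOURCE THEN
  DELETE;
```

Note that `<>` does not compare NULLs the way you might want; on SQL Server a NULL-safe variant of the extra predicate is `WHEN MATCHED AND EXISTS (SELECT T.col1, T.col2 EXCEPT SELECT S.col1, S.col2)`.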

Db2 for i: How to select rows while deleting?

I've found this article that explains how to get the deleted record with the OLD TABLE keywords.
https://www.ibm.com/support/knowledgecenter/en/SSEPEK_10.0.0/apsg/src/tpc/db2z_selectvaluesdelete.html
However, this doesn't seem to work in Db2 for i (version 7.2).
Do you know any alternatives to get the same result?
Thanks
As you have discovered, this syntax is not valid for DB2 for i. But, I can think of a couple ways to do what you want.
You can use two statements: one to retrieve the records to be deleted into a temporary table, then one to perform the delete (just use the same WHERE clause for both). Unfortunately, there is a chance, however small, that you will delete more than you read: if additional records matching your WHERE clause are inserted between the SELECT and the DELETE, your log will not be accurate.
You can use a delete trigger to insert records into the log as they are deleted from the table. This might be the best way, as it will always log deletes, no matter how the records are deleted. But that cuts both ways: if you only want deletes logged from within certain processes, you will need to build dependencies between your trigger and those processes, making both more complex.
You can use a stored procedure with a cursor and a positioned delete, as mentioned by Mark Bairinstein in the comments above. This will allow you to delete records with logging, and also prevents the issue with the first option. But it leaves users the opportunity to delete records in a way that is not logged, which may be good or bad depending on your requirements.
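The delete-trigger option (the second one above) might be sketched like this on Db2 for i; the table and log names are hypothetical:

```sql
-- Log every deleted row, no matter which process deletes it
CREATE TRIGGER log_order_deletes
  AFTER DELETE ON orders
  REFERENCING OLD AS o
  FOR EACH ROW
  INSERT INTO deleted_orders_log (order_id, customer_id, deleted_at)
  VALUES (o.order_id, o.customer_id, CURRENT TIMESTAMP);
```

The REFERENCING OLD clause gives the trigger body access to the column values of the row being deleted, which is roughly what the OLD TABLE keywords provide in the Db2 for z/OS syntax from the linked article.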

how can I update a table in one schema to match a table in a second schema

How can I update a table in one schema to match a table in a second schema assuming the only difference is additional fields and indexes in the second. I do not want to change any of the data in the table. Hoping to do it without laboriously identifying the missing fields.
An elegant solution to this can be a DDL trigger that fires on an ALTER or CREATE DDL event and applies the same changes to the first table (in one schema) as were made to the second table (in the other schema), in the same transaction.
Link --> https://docs.oracle.com/cd/E11882_01/appdev.112/e25519/triggers.htm#LNPLS2008
A little-known but interesting recent addition to the Oracle DBMS artillery is DBMS_COMPARISON.
https://docs.oracle.com/cd/B28359_01/appdev.111/b28419/d_comparison.htm
I haven't tried it myself, but according to the documentation it should be able to get you the information without any heavy scripting.
I've been doing this sort of thing since Oracle7 and always had to resort to complex scripting.
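If DBMS_COMPARISON is not an option, the missing columns can at least be listed from the data dictionary without heavy scripting. A sketch (the owner and table names are placeholders):

```sql
-- Columns present in SCHEMA_B's copy of the table but missing
-- (or differently typed) in SCHEMA_A's copy
SELECT column_name, data_type, data_length
  FROM all_tab_columns
 WHERE owner = 'SCHEMA_B' AND table_name = 'MY_TABLE'
MINUS
SELECT column_name, data_type, data_length
  FROM all_tab_columns
 WHERE owner = 'SCHEMA_A' AND table_name = 'MY_TABLE';
```

Each row of the result can then be turned into an `ALTER TABLE ... ADD` statement; the same MINUS approach against `all_indexes`/`all_ind_columns` covers the missing indexes.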

SQL: splitting up WITH statements into read-only tables

I am working on a project in SQLite where several INSERT statements involve recursive WITH statements. These INSERT statements are within triggers that are watching for INSERT statements being called on other databases.
I've found that this leads to the recalculation of many values every time the WITH statement is triggered. Rather than introduce logic into the WITH statement to recalculate only the values that need it (which creates sprawl and is difficult for me to maintain), I'd rather take some of the temporary views defined in the WITH statement and make them permanent tables that cache values. I'd then chain triggers so that an update of table_1 leads to an update of table_2, which leads to an update of table_3, etc. This makes the code easier to debug and maintain.
The issue is that, in the project, I want to make a clear distinction between tables that are user editable and those that are meant for caching intermediary values. One way to do this is (perhaps) to make the "caching" tables read-only EXCEPT when called via certain triggers. I've seen a few questions about similar issues in other dialects of SQL but nothing pertaining to SQLite and nothing looking to solve this problem, thus the new question.
Any help would be appreciated!
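One way to approximate "read-only except via certain triggers" in SQLite is a single-row guard table consulted by a BEFORE trigger. A sketch under assumed names (`base_table`, `cache_table`, `key`, and `value` are placeholders):

```sql
-- Single-row guard: 0 = cache tables locked, 1 = trusted write in progress
CREATE TABLE write_guard (allow INTEGER NOT NULL DEFAULT 0);
INSERT INTO write_guard (allow) VALUES (0);

-- Reject direct writes to the cache table while the guard is closed
CREATE TRIGGER cache_readonly
BEFORE UPDATE ON cache_table
WHEN (SELECT allow FROM write_guard) = 0
BEGIN
  SELECT RAISE(ABORT, 'cache_table is read-only; write to base_table instead');
END;

-- Trusted path: the chained trigger opens the guard, writes, closes it again
CREATE TRIGGER refresh_cache
AFTER INSERT ON base_table
BEGIN
  UPDATE write_guard SET allow = 1;
  UPDATE cache_table SET value = NEW.value WHERE key = NEW.key;
  UPDATE write_guard SET allow = 0;
END;
```

Matching BEFORE INSERT and BEFORE DELETE triggers on `cache_table` would be needed to close off the other write paths; it's a convention-level guard rather than true access control, but it makes accidental edits of the caching tables fail loudly.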

Logging the results of a MERGE statement

I have 2 tables: a temporary table with raw data, in which rows may repeat (appear more than once), and a target table with actual data (every row is unique).
I'm transferring rows using a cursor. Inside the cursor I use a MERGE statement. How can I print to the console, using DBMS_OUTPUT.PUT_LINE, which rows are updated and which are deleted?
According to the official documentation there is no such feature for this statement.
Is there any workaround?
I don't understand why you would want to do this. The output of DBMS_OUTPUT requires someone to be there to look at it. Not only that, it requires someone to look through all of the output, otherwise it's pointless; if there are more than, say, 20 rows then no one will bother to do so. If no one looks through all the output to verify it, but you need to actually log it, then you are actively harming yourself by doing it this way.
If you really need to log which rows are updated or deleted there are a couple of options; both involve performance hits though.
You could switch to BULK COLLECT, which enables you to build a collection of ROWIDs from the temporary table: you BULK COLLECT a JOIN of your two tables into it. You then update/delete from the target table based on ROWID and according to your business logic, and finally update the temporary table with a flag of some kind to indicate the operation performed.
You create a trigger on your target table which logs what's happening to another table.
In reality unless it is important that the number of updates / deletes is known then you should not do anything. Create your MERGE statement in a manner that ensures that it errors if anything goes wrong and use the error logging clause to log any errors that you receive. Those are more likely to be the things you should be paying attention to.
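The first option might be sketched roughly as follows (the table and column names are hypothetical, and the business logic is reduced to flagging the matched rows; the corresponding UPDATE/DELETE of the target table follows the same FORALL pattern):

```sql
DECLARE
  TYPE t_rowid_tab IS TABLE OF ROWID;
  l_rowids t_rowid_tab;
BEGIN
  -- Collect the staging rows that already exist in the target
  SELECT s.ROWID
    BULK COLLECT INTO l_rowids
    FROM staging_table s
    JOIN target_table t ON t.id = s.id;

  -- Flag the matched staging rows so you know which ones were updates
  FORALL i IN 1 .. l_rowids.COUNT
    UPDATE staging_table
       SET processed_flag = 'U'   -- 'U' = row caused an update
     WHERE ROWID = l_rowids(i);

  DBMS_OUTPUT.PUT_LINE(l_rowids.COUNT || ' staging rows flagged as updates');
END;
/
```

After the run, the flags in `staging_table` record exactly which rows were updated versus inserted, without printing each row to the console.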
Previous posters already said that this approach is suspicious, both because of the cursor/loop and the output log for review.
On SQL Server, there is an OUTPUT clause in the MERGE statement that allows you to insert a row in another table with the $action taken (insert,update,delete) and any columns from the inserted or deleted/overwritten data you want. This lets you summarize exactly as you asked.
The equivalent Oracle RETURNING clause may not work for MERGE but does for UPDATE and DELETE.
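On SQL Server, the OUTPUT clause mentioned above might be used like this (the table, column, and log-table names are placeholders):

```sql
MERGE target_table AS T
USING source_table AS S
  ON T.id = S.id
WHEN MATCHED THEN
  UPDATE SET T.value = S.value
WHEN NOT MATCHED THEN
  INSERT (id, value) VALUES (S.id, S.value)
WHEN NOT MATCHED BY SOURCE THEN
  DELETE
-- $action is 'INSERT', 'UPDATE', or 'DELETE' for each affected row;
-- deleted rows have no "inserted" image, hence the ISNULL
OUTPUT $action,
       ISNULL(inserted.id, deleted.id)
  INTO merge_log (action_taken, row_id);
```

The `merge_log` table can then be summarized or inspected at leisure, which avoids the problems with DBMS_OUTPUT-style console logging described above.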