Compare two tables or insert unique values - google-bigquery

Dear users of BigQuery.
I have a table with millions of records (table 2), and some of its data has been lost. So I generated another table containing all the data (table 1).
I need to either integrate the lost data from table 1 into table 2, or integrate all the data from table 1 into table 2 and remove every duplicate record; there are several ways to do that.
In your opinion, what is the best way to do this?
Thanks for your help.

The best way to do this would be to use UNION DISTINCT.
Your query will look something like:
Select name, timestamp, value from Project.Dataset.Table2
UNION DISTINCT
Select name, timestamp, value from Project.Dataset.Table1
This returns every row from both tables, keeping rows that are identical in all selected columns only once.
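Note that the UNION DISTINCT query only returns the merged rows; to repair table 2 itself, the result still has to be written somewhere. Below is a minimal sketch of both options from the question, using Python's built-in sqlite3 module as a stand-in for BigQuery (an assumption: SQLite spells UNION DISTINCT as plain UNION and EXCEPT DISTINCT as plain EXCEPT). The in-memory tables and sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (name TEXT, timestamp TEXT, value REAL);  -- complete data
    CREATE TABLE table2 (name TEXT, timestamp TEXT, value REAL);  -- data with gaps
    INSERT INTO table1 VALUES ('a', '2020-01-01', 1.0), ('b', '2020-01-02', 2.0),
                              ('c', '2020-01-03', 3.0);
    INSERT INTO table2 VALUES ('a', '2020-01-01', 1.0), ('c', '2020-01-03', 3.0);
""")

# Option 1: merge both tables and drop exact duplicates.
# SQLite's UNION corresponds to BigQuery's UNION DISTINCT.
merged = conn.execute("""
    SELECT name, timestamp, value FROM table2
    UNION
    SELECT name, timestamp, value FROM table1
    ORDER BY name
""").fetchall()

# Option 2: append to table2 only the rows that table1 has and table2 lacks.
# SQLite's EXCEPT corresponds to BigQuery's EXCEPT DISTINCT.
conn.execute("""
    INSERT INTO table2 (name, timestamp, value)
    SELECT name, timestamp, value FROM table1
    EXCEPT
    SELECT name, timestamp, value FROM table2
""")
repaired = conn.execute(
    "SELECT name, timestamp, value FROM table2 ORDER BY name"
).fetchall()

print(merged)
print(repaired)
```

Option 2 touches only the missing rows, which may matter when table 2 holds millions of records; option 1 rebuilds the full result every time.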

Related

MS Access - trying to find duplicates across 4 tables based on Column1 and Column2

MS Access - trying to find duplicates across 4 tables based on the info in Column1 and Column2. I would also like the resulting query to show me Column3, Column4 and Column5 for easy review. I followed a YouTube video on union queries and was successful, but that's as far as I can go. I tried to follow some of the answers, but I can't make them work. Please note that I have zero programming knowledge. Thanks very much in advance!
Column1 = Unique reference
Column2 = Loss date (DOL)
A duplicate is a row with the same unique reference and the same DOL as another row. This can happen within one table or across tables: for example, one entry in Table2019 and another in Table2022, or two entries in Table2019 with four more spread across the other tables.
SELECT [t2019].ID, [t2019].[ClaimNo], [t2019].DOL, [t2019].[Amount], [t2019].[Cause], [t2019].[Ref], [t2019].[Regn], [t2019].Remarks
FROM [t2019]
UNION
SELECT [t2020].ID, [t2020].[ClaimNo], [t2020].DOL, [t2020].[Amount], [t2020].[Cause], [t2020].[Ref], [t2020].[Regn], [t2020].Remarks
FROM [t2020]
UNION
SELECT [t2021].ID, [t2021].[ClaimNo], [t2021].DOL, [t2021].[Amount], [t2021].[Cause], [t2021].[Ref], [t2021].[Regn], [t2021].Remarks
FROM [t2021]
UNION
SELECT [t2022].ID, [t2022].[ClaimNo], [t2022].DOL, [t2022].[Amount], [t2022].[Cause], [t2022].[Ref], [t2022].[Regn], [t2022].Remarks
FROM [t2022];
Access has a wizard that writes the relatively difficult SQL for finding duplicate records. So first gather all the records that need to be searched for duplicates into one query, then run the wizard on that query.
To gather the records, open the query designer, choose the Union query type (which takes you to SQL view), and adapt the union query above, joining the SELECT statements with UNION ALL instead of UNION.
Unfortunately, there is no graphical interface to help with union queries.
Get typing, and don't forget the semicolon. UNION combines the results of SELECT statements, so here we are combining everything from all the tables. The ALL is important: on its own, UNION discards any row whose columns all exactly match an earlier row. We are looking for duplicates, so we add ALL to keep those rows.
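The difference between UNION and UNION ALL can be checked with a small sketch. It uses Python's sqlite3 module rather than Access (an assumption: SQLite's UNION and UNION ALL treat exact duplicate rows the same way), with two hypothetical year tables that share one completely identical row.

```python
import sqlite3

# Hypothetical year tables with a few of the question's columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t2019 (Ref TEXT, DOL TEXT, Amount REAL);
    CREATE TABLE t2020 (Ref TEXT, DOL TEXT, Amount REAL);
    INSERT INTO t2019 VALUES ('R1', '2019-05-01', 100), ('R2', '2019-06-01', 50);
    -- Exact copy of an earlier row, stored in another year's table.
    INSERT INTO t2020 VALUES ('R1', '2019-05-01', 100), ('R3', '2020-02-01', 75);
""")

union_sql = "SELECT Ref, DOL, Amount FROM t2019 {op} SELECT Ref, DOL, Amount FROM t2020"

# Plain UNION silently collapses the exact duplicate we are trying to find.
union_count = len(conn.execute(union_sql.format(op="UNION")).fetchall())
# UNION ALL keeps every row, so the duplicate survives for the wizard step.
union_all_count = len(conn.execute(union_sql.format(op="UNION ALL")).fetchall())

print(union_count, union_all_count)
```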
Once you have all the rows, go to the Query Wizard on the Create tab and run the Find Duplicates Query Wizard.
Here is the resulting SQL for my example data:
SELECT Query1.[ID], Query1.[DOL], Query1.[ClaimNo], Query1.[Amount], Query1.[Cause], Query1.[Ref], Query1.[Regn], Query1.[Remarks]
FROM Query1
WHERE (((Query1.[ID]) In (SELECT [ID] FROM [Query1] As Tmp GROUP BY [ID],[DOL] HAVING Count(*)>1 And [DOL] = [Query1].[DOL])))
ORDER BY Query1.[ID], Query1.[DOL]
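The wizard's query boils down to grouping on the duplicate key and keeping the groups with more than one row. Below is a hedged sketch of the same idea in Python's sqlite3 (standing in for Access). It assumes Ref is the unique-reference column, groups on (Ref, DOL) as the question describes, and joins back to the grouped keys instead of using the wizard's correlated IN subquery. The view all_claims plays the role of Query1; the tables and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t2019 (Ref TEXT, DOL TEXT, ClaimNo TEXT, Amount REAL);
    CREATE TABLE t2022 (Ref TEXT, DOL TEXT, ClaimNo TEXT, Amount REAL);
    INSERT INTO t2019 VALUES ('R1', '2019-05-01', 'C1', 100),
                             ('R1', '2019-05-01', 'C2', 100),
                             ('R2', '2019-06-01', 'C3', 50);
    INSERT INTO t2022 VALUES ('R1', '2019-05-01', 'C4', 100),
                             ('R2', '2019-07-01', 'C5', 50);
    -- Saved union query, like Query1 (UNION ALL keeps every row).
    CREATE VIEW all_claims AS
        SELECT '2019' AS Src, Ref, DOL, ClaimNo, Amount FROM t2019
        UNION ALL
        SELECT '2022' AS Src, Ref, DOL, ClaimNo, Amount FROM t2022;
""")

# Every row whose (Ref, DOL) pair occurs more than once, with its other
# columns shown for review: same idea as GROUP BY ... HAVING Count(*)>1.
dupes = conn.execute("""
    SELECT a.Src, a.Ref, a.DOL, a.ClaimNo, a.Amount
    FROM all_claims AS a
    JOIN (SELECT Ref, DOL FROM all_claims
          GROUP BY Ref, DOL HAVING COUNT(*) > 1) AS d
      ON a.Ref = d.Ref AND a.DOL = d.DOL
    ORDER BY a.Ref, a.DOL, a.ClaimNo
""").fetchall()

for row in dupes:
    print(row)
```

R2 appears twice but with different loss dates, so it is not reported; the Src column shows which year table each duplicate came from.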
Note:
In Access, ID is by default an AutoNumber primary key, so using it as the duplicate key looks suspicious here. If the default settings are intact and you are entering data in Access, every table starts at ID 1, so the same IDs appear in every table. Instead, I would normally combine all these year tables into one table with a year column; this also avoids the union query. I would only use separate year tables if I had millions of records and couldn't afford the space for a year column.

Best way to compare two tables in SQL by matching string?

I have a program whose goal is to take data from an API and capture the minute-to-minute differences in that data. It involves three tables: Table 1 (for new data), Table 2 (for the previous minute's data), and a Results table (for the results).
The sequence of the program is like this:
Update table 1 -> Calculate the differences from table 2 and update a "Results" table with the differences -> Copy table 1 to table 2.
Then it repeats! It's simple and it works.
Here is my SQL query:
-- '$now' is presumably interpolated by the calling script before the query runs
Insert into Results (symbol, bid, ask, description, Vol_Dif, Price_Dif, Time) Select * FROM(
Select symbol, bid, ask, description, Vol_Dif, Price_Dif, '$now' as Time FROM (
Select t1.symbol, t1.bid, t1.ask, t1.description, (t1.volume - t2.volume) AS Vol_Dif, (t1.totalPrice - t2.totalPrice) AS Price_Dif
FROM `Table_1` t1
Inner Join (
Select id, volume, ask, totalPrice FROM Table_2) t2
ON t2.id = t1.id) as test
) AS diffs;
The tables are identical in structure, obviously. The primary key is the 'id' field that auto-increments. And as you can see, I am comparing both tables on the basis of these 'id' fields being equal.
The PROBLEM is that the API seems to be inconsistent. One API call returns 50,000 entries, the next returns 51,000, and the new entries are not just appended at the end or the beginning; they are mixed into the middle.
So whenever two API calls return a different number of entries, matching on equal IDs means I end up comparing entries for DIFFERENT data.
The data whose differences I am trying to capture is 'bid', 'ask', 'Vol_Dif' and 'Price_Dif' from minute to minute. Many rows share the same 'symbol', so I can't match on that. The ONLY other way to match entries between the tables, besides matching IDs, would be to match the "description" fields.
I have tried this. The query is almost the same as above, except that the join condition at the end is
ON t2.description = t1.description
The problem is that matching on the description field takes 3 minutes for 50,000 entries, whereas matching on IDs takes 1 second.
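One likely cause of that gap is that id, being the primary key, is indexed, whereas description probably is not, so each lookup by description may have to scan the other table. Below is a minimal sketch of the diff step matched on description, in Python's sqlite3 standing in for MySQL (an assumption), with an index on Table_2.description added as the assumed fix. Table and column names follow the question; the rows and the timestamp value are made up. It also assumes each description is unique within one snapshot; if it is not, the join will pair rows incorrectly. In MySQL, a TEXT column would need a prefix length in the index definition, while a VARCHAR column would not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_1 (id INTEGER PRIMARY KEY, symbol TEXT, bid REAL, ask REAL,
                          description TEXT, volume REAL, totalPrice REAL);
    CREATE TABLE Table_2 (id INTEGER PRIMARY KEY, symbol TEXT, bid REAL, ask REAL,
                          description TEXT, volume REAL, totalPrice REAL);
    CREATE TABLE Results (symbol TEXT, bid REAL, ask REAL, description TEXT,
                          Vol_Dif REAL, Price_Dif REAL, Time TEXT);
    -- Assumed fix: index the join column so rows can be looked up, not scanned.
    CREATE INDEX idx_table2_description ON Table_2 (description);

    -- Previous minute: two entries.
    INSERT INTO Table_2 (symbol, bid, ask, description, volume, totalPrice) VALUES
        ('X', 1.0, 1.1, 'X-call', 10, 100), ('X', 2.0, 2.1, 'X-put', 20, 200);
    -- This minute: a new entry lands in the middle, so the ids no longer line up.
    INSERT INTO Table_1 (symbol, bid, ask, description, volume, totalPrice) VALUES
        ('X', 1.0, 1.1, 'X-call', 15, 130), ('Y', 5.0, 5.1, 'Y-call', 1, 5),
        ('X', 2.0, 2.1, 'X-put', 26, 260);
""")

# The timestamp is passed as a bound parameter instead of being pasted into the SQL.
conn.execute("""
    INSERT INTO Results (symbol, bid, ask, description, Vol_Dif, Price_Dif, Time)
    SELECT t1.symbol, t1.bid, t1.ask, t1.description,
           t1.volume - t2.volume, t1.totalPrice - t2.totalPrice, ?
    FROM Table_1 AS t1
    JOIN Table_2 AS t2 ON t2.description = t1.description
""", ("2024-01-01 12:00",))

results = conn.execute(
    "SELECT description, Vol_Dif, Price_Dif FROM Results ORDER BY description"
).fetchall()
print(results)
```

The new 'Y-call' entry has no match in the previous snapshot, so the inner join leaves it out instead of pairing it with the wrong row.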
Is there a better, faster way to do what I'm trying to do? Thanks in advance. Any help is appreciated.

Joining Different Database Tables

I have two tables in Access pulling from databases. There is no primary key linking the two tables, because the databases pull from different programs.
Using SQL, I need all of the information from both tables to pull into a query, and this is where I have problems. The two tables pull the same data, but the column titles might not be the same. For now, I'm assuming they are. How can I get the data from both tables to pull into the same, correct columns?
Here's an example of code (I can't post the real code for certain reasons):
SELECT system1_vehiclecolor, system1_vehicleweight, system1_licenseplate, system2_vehiclecolor, system2_vehicleweight, system2_licenseplate
FROM system1, system2
To explain further: I want the result to have one column each for vehiclecolor, vehicleweight, and licenseplate, combining the information from both tables. Currently, the way I have it, it creates a separate column for each name in each table, which isn't what I want.
You can use two queries to get this done:
SELECT col1 AS c1, col2 AS c2 INTO resulttable FROM table1;
INSERT INTO resulttable (c1, c2) SELECT colX AS c1, colY AS c2 FROM table2;
Hope this helps.
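A sketch of the answer's two-step approach in Python's sqlite3, using the column names from the question's example. SQLite has no SELECT ... INTO, so CREATE TABLE ... AS SELECT stands in for the first query (an assumption about the dialect; in Access the answer's SELECT ... INTO form applies). The sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE system1 (system1_vehiclecolor TEXT, system1_vehicleweight INTEGER,
                          system1_licenseplate TEXT);
    CREATE TABLE system2 (system2_vehiclecolor TEXT, system2_vehicleweight INTEGER,
                          system2_licenseplate TEXT);
    INSERT INTO system1 VALUES ('red', 1200, 'ABC123');
    INSERT INTO system2 VALUES ('blue', 1500, 'XYZ789');
""")

# Query 1: create the result table from system1, renaming columns to shared names.
conn.execute("""
    CREATE TABLE resulttable AS
    SELECT system1_vehiclecolor  AS vehiclecolor,
           system1_vehicleweight AS vehicleweight,
           system1_licenseplate  AS licenseplate
    FROM system1
""")
# Query 2: append system2's rows, mapping each source column to its target column.
conn.execute("""
    INSERT INTO resulttable (vehiclecolor, vehicleweight, licenseplate)
    SELECT system2_vehiclecolor, system2_vehicleweight, system2_licenseplate
    FROM system2
""")

rows = conn.execute("SELECT * FROM resulttable ORDER BY licenseplate").fetchall()
print(rows)
```

Because the INSERT names its target columns explicitly, the source columns can be titled differently in each system; only their order in the SELECT has to line up.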

Need SQL to shift entries from one table to another

Here's the situation. I have two tables with these schemas:
ID | COMPANY_NAME | DESC | CONTACT
ID | COMPANY_ID | X_COORDINATE | Y_COORDINATE
The first table contains a list of companies, and the second contains the coordinates of those companies.
I want to merge the data in these tables with the data in another set of tables that have a similar structure but are already populated with data. The IDs are auto-incrementing.
So if we have, let's say, companies numbered 1-1000 in table 1 and companies numbered 1-500 in table 2, we need them merged so that ID 1 in table 2 becomes ID 1001 when migrated to the other table. At the same time, we also want to migrate the entries in the coordinates table so that they map to the new IDs. Can this be done in SQL, or do I need to resort to a script for this kind of work?
I'm not sure I understand how many tables there are and which one is table 1 or table 2, but the problem is pretty clear. I think the easy way is:
1. Back up your entire database before you start this process.
2. Add a column to the destination table that will hold the original ID.
3. Insert all the records you want to merge (the source) into the destination table, putting the original ID in the column you added.
4. Now you can update the X, Y coordinate data using the original ID.
5. After everything is done and verified, you can remove the original ID column.
EDIT: In reply to your comment, I'll add the code here, since it's more readable.
Adapted from SQL Books Online (inserting rows from another table):
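-- DESC is a reserved word, so it must be bracketed when used as a column name
INSERT INTO MyNewTable (TheOriginalID, COMPANY_NAME, [DESC], CONTACT)
SELECT ID, COMPANY_NAME, [DESC], CONTACT
FROM OldTable;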
Then you can do an update to the new table based on values from the old table like so:
UPDATE MyNewTable
SET X = OldTable.X, Y = OldTable.Y
FROM MyNewTable
INNER JOIN OldTable ON MyNewTable.TheOriginalID = OldTable.ID;
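The steps above can be sketched end to end in Python's sqlite3 (an assumption; the answer's code targets SQL Server). Table names and rows are hypothetical, loosely following the question's schema, and the destination is pre-filled with three companies so the new IDs visibly continue after 3. As a variation on the answer's UPDATE step, this version inserts the source coordinate rows into the destination coordinates table, looking up each company's new ID through the OriginalID column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Destination tables, already populated (company IDs 1-3).
    CREATE TABLE companies (ID INTEGER PRIMARY KEY, COMPANY_NAME TEXT);
    CREATE TABLE coords (ID INTEGER PRIMARY KEY, COMPANY_ID INTEGER,
                         X_COORDINATE REAL, Y_COORDINATE REAL);
    INSERT INTO companies (COMPANY_NAME) VALUES ('A'), ('B'), ('C');
    INSERT INTO coords (COMPANY_ID, X_COORDINATE, Y_COORDINATE) VALUES (1, 0, 0);

    -- Source tables whose IDs (1-2) collide with the destination's.
    CREATE TABLE src_companies (ID INTEGER PRIMARY KEY, COMPANY_NAME TEXT);
    CREATE TABLE src_coords (ID INTEGER PRIMARY KEY, COMPANY_ID INTEGER,
                             X_COORDINATE REAL, Y_COORDINATE REAL);
    INSERT INTO src_companies VALUES (1, 'D'), (2, 'E');
    INSERT INTO src_coords (COMPANY_ID, X_COORDINATE, Y_COORDINATE)
        VALUES (1, 10.5, 20.5), (2, 30.5, 40.5);
""")

# Step 2: temporary column remembering each migrated row's original ID.
conn.execute("ALTER TABLE companies ADD COLUMN OriginalID INTEGER")
# Step 3: copy companies; the destination assigns fresh IDs (typically 4 and 5).
conn.execute("""
    INSERT INTO companies (COMPANY_NAME, OriginalID)
    SELECT COMPANY_NAME, ID FROM src_companies
""")
# Step 4: copy coordinates, translating old company IDs to the new ones.
conn.execute("""
    INSERT INTO coords (COMPANY_ID, X_COORDINATE, Y_COORDINATE)
    SELECT c.ID, s.X_COORDINATE, s.Y_COORDINATE
    FROM src_coords AS s
    JOIN companies AS c ON c.OriginalID = s.COMPANY_ID
""")
conn.commit()
# Step 5 (dropping OriginalID) is left out here: ALTER TABLE ... DROP COLUMN
# requires SQLite 3.35 or later.

migrated = conn.execute("""
    SELECT c.ID, c.COMPANY_NAME, k.X_COORDINATE, k.Y_COORDINATE
    FROM companies AS c JOIN coords AS k ON k.COMPANY_ID = c.ID
    WHERE c.OriginalID IS NOT NULL
    ORDER BY c.ID
""").fetchall()
print(migrated)
```

So the remapping can be done in plain SQL without a script, as long as the temporary OriginalID column exists while the coordinates are copied.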

Query select a bulk of IDs from a table - SQL

I have a table which holds ~1M rows. My application has a list of ~100K IDs which belong to that table (the list being generated by the application layer).
Is there a common method for querying all of these IDs? ~100K separate SELECT queries? Or a temporary table into which I insert the ~100K IDs, then a SELECT that joins it to the required table?
Thanks,
Doori Bar
You could do it in one query, something like
SELECT * FROM large_table WHERE id IN (...)
Insert a comma-separated list of IDs where I put the ...
Unfortunately, there is no easy way that I know of to parameterize this, so you need to be extra careful to avoid SQL injection vulnerabilities.
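One common way to parameterize the list anyway is to generate one placeholder per ID and pass the IDs as bound parameters, so no value is ever spliced into the SQL text. Below is a minimal sketch with Python's sqlite3 (an assumption; other drivers use other placeholder styles, such as %s). Databases cap the number of bound parameters per statement (SQLite's default is 999 before version 3.32.0 and 32766 from then on), so the IDs are sent in chunks. The payload column and the chunk size of 500 are hypothetical choices.

```python
import sqlite3
from typing import Iterable, List, Tuple


def fetch_by_ids(conn: sqlite3.Connection, ids: Iterable[int],
                 chunk_size: int = 500) -> List[Tuple]:
    """Fetch rows of large_table whose id is in `ids`, using bound parameters."""
    id_list = list(ids)
    rows: List[Tuple] = []
    for start in range(0, len(id_list), chunk_size):
        chunk = id_list[start:start + chunk_size]
        # Only "?" markers are built into the SQL string; the IDs travel separately.
        placeholders = ", ".join("?" for _ in chunk)
        sql = f"SELECT id, payload FROM large_table WHERE id IN ({placeholders})"
        rows.extend(conn.execute(sql, chunk).fetchall())
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE large_table (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO large_table VALUES (?, ?)",
                 ((i, f"row{i}") for i in range(1, 5001)))

wanted = range(2, 5001, 2)  # 2500 IDs, so several chunks are needed
found = fetch_by_ids(conn, wanted)
print(len(found))
```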
A temporary table that holds the 100k IDs seems like a good solution. Don't insert them one by one, though; MySQL's INSERT ... VALUES syntax accepts multiple rows in a single statement.
By the way, where do your 100k IDs come from, if not from the database? If they come from a preceding request, I'd suggest having that request fill the temporary table directly.
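A sketch of the temporary-table approach in Python's sqlite3, standing in for MySQL (an assumption). Instead of MySQL's multi-row INSERT ... VALUES, it uses the driver's executemany inside one transaction, which likewise avoids re-preparing the statement and committing once per ID. The table name wanted_ids and the sample IDs are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE large_table (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO large_table VALUES (?, ?)",
                 ((i, f"row{i}") for i in range(1, 10001)))

# IDs produced by the application layer (hypothetical sample):
# includes a duplicate and an ID that is not in the table.
app_ids = [7, 42, 42, 9999, 123456]

# A TEMP table is private to this connection and dropped when it closes.
conn.execute("CREATE TEMP TABLE wanted_ids (id INTEGER PRIMARY KEY)")
with conn:  # commit once for the whole batch
    # INSERT OR IGNORE skips the duplicate ID instead of raising an error.
    conn.executemany("INSERT OR IGNORE INTO wanted_ids (id) VALUES (?)",
                     ((i,) for i in app_ids))

rows = conn.execute("""
    SELECT t.id, t.payload
    FROM large_table AS t
    JOIN wanted_ids AS w ON w.id = t.id
    ORDER BY t.id
""").fetchall()
print(rows)
```

Declaring wanted_ids.id as the primary key gives the join an index on both sides, and IDs missing from large_table simply drop out of the result.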
Edit: For a more portable way to insert multiple rows:
INSERT INTO mytable (col1, col2) SELECT 'foo', 0 UNION ALL SELECT 'bar', 1;
Do those IDs actually reference the table with 1M rows?
If so, you could use SELECT ids FROM <1M table>
where ids is the ID column and "1M table" is the name of the table that holds the 1M rows.
But I don't think I really understand your question...