How to merge two tables using ORDER BY? - sql

While merging two tables, how do I insert the unmatched rows in a specific order? For example, table_2 has a column "Type" (sample values 1, 2, 3, etc.), so when I insert the unmatched codes I need to insert the records with Type 1 first, then 2, and so on.
So far I have tried the code below:
WITH tab1 AS
(
select * From TABLE_2 order by Type
)
merge tab1 as Source using TABLE_1 as Target on Target.Code=Source.Code
when matched then update set Target.Description=Source.Description
when not matched then insert (Code,Description,Type)
values (Source.Code,Source.Description,Source.Type);
But I get the error "The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified." because of the ORDER BY in the subquery.
So how do I insert records in a given order while merging two tables?
Thanks in advance.

Change
select *
to
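select top 100 percent *
That makes the ORDER BY syntactically valid in the CTE. Note that SQL Server does not guarantee the rows are then processed or inserted in that order; an order is only guaranteed by an ORDER BY on the query that reads the data back.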
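As a rough illustration of "update the matched rows, insert the unmatched ones sorted by Type", here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no MERGE statement, so this swaps in an UPDATE followed by INSERT ... SELECT ... ORDER BY; it is not the SQL Server syntax from the question. The sample rows are made up, and relying on rowid to observe insertion order is SQLite-specific behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (code TEXT PRIMARY KEY, description TEXT, type INTEGER);
    CREATE TABLE table_2 (code TEXT PRIMARY KEY, description TEXT, type INTEGER);
    -- Hypothetical sample data: 'A' exists in both tables, the rest only in table_2.
    INSERT INTO table_1 VALUES ('A', 'old A', 1);
    INSERT INTO table_2 VALUES ('C', 'new C', 2), ('A', 'new A', 1),
                               ('B', 'new B', 1), ('D', 'new D', 3);
""")

# "WHEN MATCHED THEN UPDATE": refresh descriptions for codes present in both tables.
conn.execute("""
    UPDATE table_1
    SET description = (SELECT s.description FROM table_2 AS s
                       WHERE s.code = table_1.code)
    WHERE code IN (SELECT code FROM table_2)
""")

# "WHEN NOT MATCHED THEN INSERT": add the missing codes, sorted by type.
conn.execute("""
    INSERT INTO table_1 (code, description, type)
    SELECT s.code, s.description, s.type
    FROM table_2 AS s
    WHERE s.code NOT IN (SELECT code FROM table_1)
    ORDER BY s.type, s.code
""")

# rowid reflects the insertion sequence in SQLite; other engines need an
# explicit ordering column to recover it.
rows = conn.execute(
    "SELECT code, description, type FROM table_1 ORDER BY rowid"
).fetchall()
print(rows)
```

Even where the insert order can be controlled like this, a query that needs the rows by Type should still say ORDER BY Type when reading.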

Related

How to Identify matching records in two tables?

I have two tables with the same column names. There are a total of 40 columns in each table. Both tables have the same unique IDs. If I perform an inner join on the ID columns I get a match on 80% of the data. However, I would like to see whether the matched rows have exactly the same data in every column.
If there were only a few rows, say 50-100, I could have performed a simple union ordered by ID and checked the data manually. But both tables contain more than 5000 records.
Is a join on each of the columns a valid solution for this, or do I need to perform concatenation?
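If you have N columns, you can combine both tables with UNION ALL in a derived table and GROUP BY COL1, COL2, ..., COLN. Rows that are identical in both tables then show up with a count greater than 1:
select COL1, COL2, ... , COLN
from (
select * from table1
union all
select * from table2
) t
group by COL1, COL2, ... , COLN
having count(*)>1;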
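To make the grouping idea concrete, here is a small sketch using Python's built-in sqlite3 module. The three-column schema and the sample rows are invented for illustration, and the approach assumes neither table contains duplicate rows of its own. The EXCEPT query is an addition to the answer above that lists the rows of table1 with no exact counterpart in table2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: three columns stand in for the 40 in the question.
    CREATE TABLE table1 (id INTEGER, col1 TEXT, col2 INTEGER);
    CREATE TABLE table2 (id INTEGER, col1 TEXT, col2 INTEGER);
    INSERT INTO table1 VALUES (1, 'a', 10), (2, 'b', 20), (3, 'c', 30);
    INSERT INTO table2 VALUES (1, 'a', 10), (2, 'b', 99), (4, 'd', 40);
""")

# Rows that appear with identical values in both tables.
identical = conn.execute("""
    SELECT id, col1, col2
    FROM (SELECT * FROM table1
          UNION ALL
          SELECT * FROM table2) AS combined
    GROUP BY id, col1, col2
    HAVING COUNT(*) > 1
    ORDER BY id
""").fetchall()

# Rows of table1 that have no exact match in table2.
only_in_table1 = conn.execute("""
    SELECT * FROM table1
    EXCEPT
    SELECT * FROM table2
    ORDER BY id
""").fetchall()

print(identical)       # rows equal in every column
print(only_in_table1)  # rows that differ or are missing in table2
```

Unlike a join on every column, GROUP BY and EXCEPT treat two NULLs as equal, which usually matches what "same data" means in a comparison like this.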

Bigquery- problem with the combination of insert into and 2 level order by

I'm trying to populate a table using INSERT INTO.
The data I want to insert is ordered on two levels (two ORDER BY columns).
For some reason, after the run the target table reflects only the first level of the ordering.
The ordering works perfectly when I run the query without the INSERT INTO part.
Any thoughts?
To demonstrate, my query is:
insert into table_b
select * from table_a order by column_a , column_b desc
thanks
Do you mean that in the BigQuery preview of table_b the records do not show up in the order you want?
The preview of table_b is not necessarily in the order of your insert operations.
If you want to see the records in table_b in the order you want, you need to run:
select * from table_b order by column_a , column_b desc
SQL tables represent unordered sets. There is no such thing as "ordering" in a SQL table unless a column specifies the ordering.
You have no control over the rows shown in the preview pane (well, they might come from the first partition on the table). And if you do a select * from t with no order by the results are in an indeterminate order -- and that ordering might change each time you run the query.
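A minimal sketch of the same point, using Python's built-in sqlite3 module rather than BigQuery (the sample values are made up): the rows are copied without any ordering, and the two-level order is requested when the target table is read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (column_a INTEGER, column_b INTEGER);
    CREATE TABLE table_b (column_a INTEGER, column_b INTEGER);
    -- Hypothetical sample rows, deliberately unsorted.
    INSERT INTO table_a VALUES (2, 1), (1, 5), (2, 9), (1, 7);
    -- Copy the rows; no ORDER BY is needed (or meaningful) here.
    INSERT INTO table_b SELECT * FROM table_a;
""")

# Ask for the two-level ordering at read time.
ordered = conn.execute("""
    SELECT column_a, column_b
    FROM table_b
    ORDER BY column_a, column_b DESC
""").fetchall()
print(ordered)
```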

Deduplicate rows in complex schema in a bigquery partition

I have read some threads but I know too little sql to solve my problem.
I have a table with a complex schema with records and nested fields.
Below you see a query which finds the exact row that I need to deduplicate.
SELECT *
FROM my-data-project-214805.rfid_data.rfid_data_table
WHERE DATE(_PARTITIONTIME) = "2020-02-07"
AND DetectorDataMessage.Header.MessageID ='478993053'
DetectorDataMessage.Header.MessageID is supposed to be unique.
How can I delete one of these rows? (there are two)
If possible I would like to deduplicate the whole table, but it's partitioned and I can't get it right. I tried the suggestions in the threads below, but I get this error: Column DetectorDataMessage of type STRUCT cannot be used in...
Threads of interest:
Deduplicate rows in a BigQuery partition
Delete duplicate rows from a BigQuery table
Any suggestions? Can you guide me in the right direction?
Try using a MERGE to remove the existing duplicate rows and insert a single identical one. In this case I'm going for a specific date and id, as in the question:
MERGE `temp.many_random` t
USING (
# choose a single row to replace the duplicates
SELECT a.*
FROM (
SELECT ANY_VALUE(a) a
FROM `temp.many_random` a
WHERE DATE(_PARTITIONTIME)='2018-10-01'
AND DetectorDataMessage.Header.MessageID ='478993053'
GROUP BY _PARTITIONTIME, DetectorDataMessage.Header.MessageID
)
)
ON FALSE
WHEN NOT MATCHED BY SOURCE
# delete the duplicates
AND DATE(_PARTITIONTIME)='2018-10-01'
AND DetectorDataMessage.Header.MessageID ='478993053'
THEN DELETE
WHEN NOT MATCHED BY TARGET THEN INSERT ROW
Based on this answer:
Deduplicate rows in a BigQuery partition
If all of the values in the duplicate rows are the same, just use 'SELECT distinct'.
If not, I would use the ROW_NUMBER() function to create a rank for each unique index, and then just choose the first rank.
I don't know what your columns are, but here's an example:
WITH subquery as
(select MessageId,
ROW_NUMBER() OVER(partition by MessageId order by MessageId ASC) AS rn
from your_table -- placeholder: substitute your own table
)
select *
from subquery
where rn = 1
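The ROW_NUMBER() approach can be tried end to end with Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The flat `messages` table, its columns, and the sample rows are hypothetical stand-ins for the nested BigQuery schema in the question, so treat this as a sketch of the technique only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical flat table standing in for the nested schema.
    CREATE TABLE messages (message_id TEXT, payload TEXT, received_at TEXT);
    INSERT INTO messages VALUES
        ('478993053', 'first copy',  '2020-02-07 10:00:00'),
        ('478993053', 'second copy', '2020-02-07 10:00:05'),
        ('478993054', 'only copy',   '2020-02-07 10:01:00');
""")

# Number the rows within each message_id and keep only the first one.
deduped = conn.execute("""
    WITH ranked AS (
        SELECT message_id, payload, received_at,
               ROW_NUMBER() OVER (PARTITION BY message_id
                                  ORDER BY received_at) AS rn
        FROM messages
    )
    SELECT message_id, payload
    FROM ranked
    WHERE rn = 1
    ORDER BY message_id
""").fetchall()
print(deduped)
```

Ordering the window by a column that differs between the duplicates (here the assumed `received_at`) makes the choice of surviving row deterministic; ordering by the partition key itself leaves the choice arbitrary.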

How do I merge one SQL 2005 Table with another Table?

I have two tables both with one column each. I want to copy/merge the data from those two tables into another table with both columns. So in the example below I want the data from Table1 and Table2 to go into Table3.
I used this query:
INSERT TABLE3 (BIGNUMBER)
SELECT BIGNUMBER
FROM TABLE1;
INSERT TABLE3 (SMALLNUMBER)
SELECT SMALLNUMBER
FROM TABLE2;
When I did this it copied the data from Table1 and Table2 but didn't put the data on the same lines. So it ended up like this:
I am trying to get the data to line up... match. So BIGNUMBER 1234567812345678 should have SMALLNUMBER 123456 next to it. If I am querying I could do this with a JOIN and a LIKE 'SMALLNUMBER%' but I am not sure how to do that here to make the data end up like this:
It doesn't have to be fancy comparing the smallnumber to the bignumber. When I BULK insert data into TABLE1 and TABLE2 they are in the same order so simply copying the data into TABLE3 without caring if SMALL is the start of BIG is fine with me.
There is no relationship at all between these tables. This is the simplest form I can think of: basically two flat tables that need to be merged side by side. There is no logic to implement... start at row 1 and go to the end on BIGNUMBER. Start at row 1 again and go to the end on SMALLNUMBER. All that matters is that if BIGNUMBER has 50 rows and SMALLNUMBER has 50 rows, in the end there are still only 50 rows.
When I was using the query above I was going off of a page I was reading on MERGE. Now that I look over this I don't see MERGE anywhere... so maybe I just need to understand how to use MERGE.
If the order of the numbers is not important and you don't want to add another field to your source tables as jcropp suggested, you can use the ROW_NUMBER() function within a CTE to assign a number to each row and then join on those numbers:
WITH C1 AS(
SELECT ROW_NUMBER() OVER (ORDER BY TABLE1.BIGNUMBER) AS Rn1
,BIGNUMBER
FROM TABLE1
)
,C2 AS(
SELECT ROW_NUMBER() OVER (ORDER BY TABLE2.SMALLNUMBER) AS Rn2
,SMALLNUMBER
FROM TABLE2
)
INSERT INTO TABLE3
SELECT C1.BIGNUMBER
,C2.SMALLNUMBER
FROM C1
INNER JOIN C2 ON C1.Rn1 = C2.Rn2
More information about ROW_NUMBER(), CTE and INSERT INTO SELECT
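Here is a runnable sketch of that pairing using Python's built-in sqlite3 module (SQLite 3.25+ for window functions) instead of SQL Server. The sample numbers are invented and stored as text. As in the answer, rows are paired by their sorted position, which matches the original load order only if the data happened to be loaded sorted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (bignumber TEXT);
    CREATE TABLE table2 (smallnumber TEXT);
    CREATE TABLE table3 (bignumber TEXT, smallnumber TEXT);
    -- Hypothetical sample values.
    INSERT INTO table1 VALUES ('1234567812345678'), ('2345678923456789'),
                              ('3456789034567890');
    INSERT INTO table2 VALUES ('123456'), ('234567'), ('345678');
""")

# Number each table's rows, then join on the row numbers to pair them up.
conn.execute("""
    WITH c1 AS (
        SELECT ROW_NUMBER() OVER (ORDER BY bignumber) AS rn, bignumber
        FROM table1
    ),
    c2 AS (
        SELECT ROW_NUMBER() OVER (ORDER BY smallnumber) AS rn, smallnumber
        FROM table2
    )
    INSERT INTO table3 (bignumber, smallnumber)
    SELECT c1.bignumber, c2.smallnumber
    FROM c1
    INNER JOIN c2 ON c1.rn = c2.rn
""")

merged = conn.execute(
    "SELECT bignumber, smallnumber FROM table3 ORDER BY bignumber"
).fetchall()
print(merged)
```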
In order to use a JOIN statement to merge the two tables they each have to have a column that has common data. You don’t have that, but you may be able to introduce it:
Edit the structure of the first table. Add a column named something like id and set the attributes of the id column to autonumber (in SQL Server, an IDENTITY column).
Browse the table to make sure that the id column has been assigned numbers in the correct order.
Do the same for the second table.
After you’ve done a thorough check to ensure that the rows are numbered correctly, run a query to merge the tables:
SELECT TABLE1.id, TABLE1.BIGNUMBER, TABLE2.SMALLNUMBER INTO TABLE3
FROM TABLE1 INNER JOIN TABLE2 ON TABLE1.id = TABLE2.id

How do I filter a column with multiple values?

How do I filter a column col1 on multiple values? I started with
select * from table where col1=2 and col1=4 and userID='740b9738-63d2-67ff-ba21-801b65dd0ae1'
Then I tried
select * from table where col1=2 or col1=4 and userID='740b9738-63d2-67ff-ba21-801b65dd0ae1'
The results of both queries are incorrect:
the first one gives zero results,
the second one gives 3 results,
and the correct answer is 2 results.
The SQL will run against a SQLite db.
AND is evaluated before OR, so your query is equivalent to:
select *
from table
where col1=2 or (col1=4 and userID='740b9738-63d2-67ff-ba21-801b65dd0ae1')
You need to explicitly group the conditions when mixing AND and OR:
select *
from table
where (col1=2 or col1=4) and userID='740b9738-63d2-67ff-ba21-801b65dd0ae1'
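The difference is easy to reproduce with Python's built-in sqlite3 module. The table name `items`, the shortened user IDs, and the sample rows are made up for this sketch; `IN (2, 4)` is shown as an equivalent way to write the grouped condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: two users, a few col1 values.
    CREATE TABLE items (col1 INTEGER, userID TEXT);
    INSERT INTO items VALUES
        (2, 'user-a'), (4, 'user-a'), (2, 'user-b'), (3, 'user-a');
""")

def count(where_clause):
    """Return how many rows of items match the given WHERE clause."""
    sql = "SELECT COUNT(*) FROM items WHERE " + where_clause
    return conn.execute(sql).fetchone()[0]

# AND binds tighter than OR, so user-b's row with col1=2 slips through.
without_parens = count("col1=2 OR col1=4 AND userID='user-a'")

# Explicit grouping restricts both values to the one user.
with_parens = count("(col1=2 OR col1=4) AND userID='user-a'")

# IN expresses the same filter without any OR.
with_in = count("col1 IN (2, 4) AND userID='user-a'")

print(without_parens, with_parens, with_in)
```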