Read Rows But search first - sql

I want to build an import system that looks into one data source and copies new records into another data source.
Every month I want to copy some tables' data from one data source to the other.
SourceTableName : srcTable
DestinationTableName : destTable
Suppose that in the first month the source table contains:
Id Name
1 John
3 Rahul
5 Andrew
All three rows will be copied into destTable.
Suppose that in the second month the source table contains:
Id Name
1 John
3 Rahul
5 Andrew
6 Vikas
7 Sonam
8 Divya
First, the SQL should get the last row of destTable,
match that row in srcTable,
and then extract all new records from srcTable and copy them into destTable.
.....
Please let me know how I can write a query to do this. If there is a shorter approach, that would be helpful too.

Since you only care about adding new records, and don't need to handle updates or deletes... You can simply add the record from the source table if it doesn't exist in the destination table:
INSERT INTO destTable (ID, Name)
SELECT s.ID, s.Name
FROM
srcTable s
LEFT OUTER JOIN destTable d ON d.ID = s.ID
WHERE
d.ID IS NULL
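To see the anti-join at work on the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database and the helper name monthly_import are assumptions made for illustration; the SQL is the statement above, unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: a throwaway in-memory database
conn.executescript("""
    CREATE TABLE srcTable  (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE destTable (ID INTEGER PRIMARY KEY, Name TEXT);
""")

# Anti-join: keep only the source rows that have no match in destTable.
COPY_NEW_ROWS = """
    INSERT INTO destTable (ID, Name)
    SELECT s.ID, s.Name
    FROM srcTable s
    LEFT OUTER JOIN destTable d ON d.ID = s.ID
    WHERE d.ID IS NULL
"""

def monthly_import(conn):
    """Copy rows missing from destTable and return how many were inserted."""
    with conn:  # commits on success, rolls back on error
        return conn.execute(COPY_NEW_ROWS).rowcount

# Month 1: three rows in the source.
conn.executemany("INSERT INTO srcTable VALUES (?, ?)",
                 [(1, "John"), (3, "Rahul"), (5, "Andrew")])
first_run = monthly_import(conn)

# Month 2: three more rows appear in the source.
conn.executemany("INSERT INTO srcTable VALUES (?, ?)",
                 [(6, "Vikas"), (7, "Sonam"), (8, "Divya")])
second_run = monthly_import(conn)

dest_ids = [row[0] for row in conn.execute("SELECT ID FROM destTable ORDER BY ID")]
print(first_run, second_run, dest_ids)
```

Because the join is on ID rather than on "the last row", running the import twice in the same month inserts nothing the second time.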

You can write a stored procedure to do this and execute it whenever you want.
For this you can use the query below
(Part 1 inserts new data, Part 2 updates changed data):
Insert Into DestinationTable(ID, Name)
Select ID, Name
From SourceTable
Where Not Exists
(Select *
From DestinationTable
Where DestinationTable.ID = SourceTable.ID)
Go
Update DestinationTable
Set DestinationTable.Name = SourceTable.Name
From DestinationTable, SourceTable
Where DestinationTable.ID = SourceTable.ID
I hope it's helpful.


How to delete records in BigQuery based on values in an array?

In Google BigQuery, I would like to delete a subset of records, based on the value of a specific column. It's a query that I need to run repeatedly and that I would like to run automatically.
The problem is that this specific column is of the form STRUCT<column_1 ARRAY<STRING>, column_2 ARRAY<STRING>, ... >, and I don't know how to use such a column in the WHERE clause of a DELETE statement.
Here is basically what I am trying to do (this code does not work):
DELETE
FROM dataset.table t
LEFT JOIN UNNEST(t.category.column_1) AS type
WHERE t.partition_date = '2020-07-22'
AND type = 'some_value'
The error that I'm getting is: Syntax error: Expected end of input but got keyword LEFT at [3:1]
If I replace the DELETE with SELECT *, it does work:
SELECT *
FROM dataset.table t
LEFT JOIN UNNEST(t.category.column_1) AS type
WHERE t.partition_date = '2020-07-22'
AND type = 'some_value'
Does somebody know how to use such a column to delete a subset of records?
EDIT:
Here is some code to create a reproducible example with some silly data (fill in your own dataset and table name in all queries):
Suppose you want to delete all rows where category.type contains the value 'food'.
1 - create a table:
CREATE TABLE <DATASET>.<TABLE_NAME>
(
article STRING,
category STRUCT<
color STRING,
type ARRAY<STRING>
>
);
2 - Insert data into the new table:
INSERT <DATASET>.<TABLE_NAME>
SELECT "apple" AS article, STRUCT('red' AS color, ['fruit','food'] as type) AS category
UNION ALL
SELECT "cabbage" AS article, STRUCT('blue' AS color, ['vegetable', 'food'] as type) AS category
UNION ALL
SELECT "book" AS article, STRUCT('red' AS color, ['object'] as type) AS category
UNION ALL
SELECT "dog" AS article, STRUCT('green' AS color, ['animal', 'pet'] as type) AS category;
3 - Show that select works (return all rows where category.type contains the value 'food'; these are the rows I want to delete):
SELECT *
FROM <DATASET>.<TABLE_NAME>
LEFT JOIN UNNEST(category.type) type
WHERE type = 'food'
Initial Result
4 - My attempt at deleting rows where category.type contains 'food' does not work:
DELETE
FROM <DATASET>.<TABLE_NAME>
LEFT JOIN UNNEST(category.type) type
WHERE type = 'food'
Syntax error: Unexpected keyword LEFT at [3:1]
Desired Result
This is the code I used to delete the desired records (the records where category.type contains the value 'food'.)
DELETE
FROM <DATASET>.<TABLE_NAME> t1
WHERE EXISTS(SELECT 1 FROM UNNEST(t1.category.type) t2 WHERE t2 = 'food')
The embarrassing thing is that I've seen this kind of answer on similar questions (for example on update queries). But I come from Oracle SQL, and I think that there you are required to connect your subquery with your main query in the WHERE clause of the subquery (i.e. connect t1 with t2), so I didn't understand these answers. That's why I posted this question.
However, I learned that the subquery is already connected to the main query: UNNEST(t1.category.type) reads the array of the current t1 row, so you don't have to join t1 and the 'table' t2 explicitly.
Now it is possible to still do this (perhaps even recommended?):
DELETE
FROM <DATASET>.<TABLE_NAME> t1
WHERE EXISTS (SELECT 1 FROM <DATASET>.<TABLE_NAME> t2 LEFT JOIN UNNEST(t2.category.type) AS type WHERE type = 'food' AND t1.article=t2.article)
but a second difficulty for me was that my ID in my actual data is somehow hidden in an array>struct-construction, so I got stuck connecting t1 & t2. Fortunately this is not always an absolute necessity.
Since you did not provide any sample data I am going to explain using some dummy data. In case you add your sample data, I can update the answer.
Firstly, according to your description, you have only a STRUCT, not an Array[Struct <col_1, col_2>]. For this reason, you do not need to use UNNEST to access the values within the data. Below is an example of how to access particular data within a STRUCT.
WITH data AS (
SELECT 1 AS id, STRUCT("Alex" AS name, 30 AS age, "NYC" AS city) AS info UNION ALL
SELECT 1 AS id, STRUCT("Leo" AS name, 18 AS age, "Sydney" AS city) AS info UNION ALL
SELECT 1 AS id, STRUCT("Robert" AS name, 25 AS age, "Paris" AS city) AS info UNION ALL
SELECT 1 AS id, STRUCT("Mary" AS name, 28 AS age, "London" AS city) AS info UNION ALL
SELECT 1 AS id, STRUCT("Ralph" AS name, 45 AS age, "London" AS city) AS info
)
SELECT * FROM data
WHERE info.city = "London"
Notice that the STRUCT is named info and the data we accessed is city and used it in the WHERE clause.
Now, in order to delete the rows that contain a specific value within the STRUCT (in your case I assume it would be your_struct.column_1), you can use DELETE or MERGE and DELETE. I have saved the above data in a table to execute the examples below, which have the same output.
First method: DELETE
DELETE FROM `project.dataset.table`
WHERE info.city = "Sydney"
Second method: MERGE and DELETE
MERGE `project.dataset.table` a
USING (SELECT * FROM `project.dataset.table` WHERE info.city = "Sydney") b
ON a.info.city = b.info.city
WHEN MATCHED AND b.id = 1 THEN
DELETE
And the output for both queries,
Row id info.name info.age info.city
1 1 Alex 30 NYC
2 1 Robert 25 Paris
3 1 Ralph 45 London
4 1 Mary 28 London
As you can see the row where info.city = "Sydney" was deleted in both cases.
It is important to point out that the data is permanently deleted from your source table, so you should be careful.
Note: Since you want to run this process every day, you could use scheduled queries in the BigQuery console, appending or overwriting the results after each run. Also, it is good practice not to delete data from your source table. Thus, consider creating a new table from your source table without the rows you do not want.

SQL Query based on specific conditions

Figure 1 denotes the current state of the TABLE A and TABLE B.
The current implementation is to fetch the new MIds to Table B and copy the SqlQuery from base process, in case of a new market or an existing market. Below query is used for this:
SELECT A.MId, B1.Loop, B1.Segment, B1.SqlQuery, B1.UseDefault
FROM TableB B1 WITH (NOLOCK)
INNER JOIN TableA A WITH (NOLOCK) ON B1.MId IN (100, 200)
AND B1.MId = A.BaseMarket
AND ISNULL(A.POCId, 0) > 0
LEFT JOIN TableB B2 WITH (NOLOCK) ON A.MId = B2.MId
WHERE B2.MId IS NULL
Figure 2 shows the updated data in Table A and the desired state of Table B. The required implementation would be:
To fetch the new MIds to Table B and copy the SqlQuery from Base Process, if it's a new market (XYZ Market - 2001, 2002)
If the market configuration already exists in Table B (Market ABC - 1001 and 1002), then copy the existing configuration's SqlQuery.
Here's the complete flow for Table A and B. The base configurations (100 and 200) in both tables were inserted manually initially including the loop and segments.
A new market is introduced and a new MId is created in Table A. Let's assume that to be 1001 and 1002 for Market ABC.
Corresponding records are inserted in Table B for each MId and it copies data from Base Configuration in Table B. Inserted Records (SqlId - 3 and 4)
SqlQuery column in Table B is updated manually due to a specific business request. (SqlId - 3 and 4). Hence, the different query.
Market ABC is updated in front end, which creates two new entries in Table A. (MId - 1003 and 1004). Also, new market XYZ (MId - 2001 and 2002) is created.
Corresponding entries created in Table B should refer to the Base Configuration for Market XYZ (SqlId - 7 and 8), since it's a new market, but should copy the existing configuration for Market ABC (MId - 1001 and 1002), since its configuration already existed.
I am looking for suggestions on whether a single query can implement this requirement using a CASE statement. I'll appreciate your help!
I guess by market configuration already exists you actually mean the combination of MarketName and Type. So here's the query
SELECT
A.NewId, B.Loop, B.Segment, B.SqlQuery, B.UseDefault
FROM (
SELECT
A1.MId AS NewId, A2.MId AS RefId
FROM
TableA A1
INNER JOIN
TableA A2
ON
(A1.MarketName = A2.MarketName AND A1.Type = A2.Type) -- use your market configuration logic here
OR
A1.BaseMarket = A2.BaseMarket
WHERE
A1.Mid NOT IN (SELECT MId FROM TableB)
) As A
INNER JOIN
TableB B
ON (A.RefId = B.MID)
At first we are self-joining TableA to get the reference MId as RefId here. Then we are joining the new derived table with TableB.
Hope this helps. Thank you!

Handling duplicates while updating records in SQL using where clause

I have a situation where I need to update a row in a table and, when faced with a duplicate entry, make a decision based on another column's value.
For example let's say my table is like this : salary_table
Salary Username Usersurname Last_entry_date
3000 abc bak 20-feb-13
4000 sdf kup 20-mar-15
5000 abc bak 20-mar-15
So my update query is something like this:
update salary_table
set salary=9000
where username='abc'
and usersurname='bak';
For records like row 2, where the entry is unique, this will not cause any problem,
but for records like row 1 it matches multiple rows (1 and 3), and I only want to update one of them. In this case I would like to check last_entry_date, and the entry with the latest date (row 3 in this case) should get updated.
How can it be done?
Any help would be highly appreciated.
Update salary_table
set salary = 9000
where username = 'abc'
and usersurname = 'bak'
and Last_entry_date = (select max(s.Last_entry_date)
from salary_table s
where s.username = salary_table.username
and s.usersurname = salary_table.usersurname);
You have to add a condition to the WHERE clause for the row you want, in this case
"last_entry_date = ??".
Without a proper filter, how would you identify which row is to be updated?
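A minimal sketch of the same idea, run against the sample rows with Python's sqlite3 module. It assumes the dates are stored as ISO-8601 text (so that MAX compares them chronologically) and that SQLite is an acceptable stand-in for the real database; the function name update_latest_salary is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE salary_table (
    salary INTEGER, username TEXT, usersurname TEXT, last_entry_date TEXT)""")
# Assumption: dates are stored as ISO-8601 strings, which sort chronologically.
conn.executemany("INSERT INTO salary_table VALUES (?, ?, ?, ?)", [
    (3000, "abc", "bak", "2013-02-20"),
    (4000, "sdf", "kup", "2015-03-20"),
    (5000, "abc", "bak", "2015-03-20"),
])

def update_latest_salary(conn, username, usersurname, new_salary):
    """Update only the most recent row for the given user; return rows changed."""
    with conn:
        cur = conn.execute("""
            UPDATE salary_table
            SET salary = ?
            WHERE username = ? AND usersurname = ?
              AND last_entry_date = (
                  SELECT MAX(t.last_entry_date)
                  FROM salary_table t
                  WHERE t.username = salary_table.username
                    AND t.usersurname = salary_table.usersurname)
        """, (new_salary, username, usersurname))
        return cur.rowcount

updated = update_latest_salary(conn, "abc", "bak", 9000)
rows = conn.execute("""SELECT salary, last_entry_date FROM salary_table
                       WHERE username = 'abc' ORDER BY last_entry_date""").fetchall()
print(updated, rows)
```

If two rows for the same user share the latest date, both are updated, so a further tie-breaker would be needed in that case.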

TSQL Inserting records and track ID

I would like to insert records in a table below (structure of table with example data). I have to use TSQL to achieve this:
MasterCategoryID MasterCategoryDesc SubCategoryDesc SubCategoryID
1 Housing Elderly 4
1 Housing Adult 5
1 Housing Child 6
2 Car Engine 7
2 Car Engine 7
2 Car Window 8
3 Shop owner 9
So for example if I enter in a new record with MasterCategoryDesc = 'Town' it will insert '4' in MasterCategoryID with the respective SubCategoryDesc + ID.
CAN I SIMPLIFY THIS QUESTION BY REMOVING THE SubCategoryDesc and SubCategoryID columns. How can I achieve this now just with the 2 columns MasterCategoryID and MasterCategoryDesc
INSERT into Table1
([MasterCategoryID], [MasterCategoryDesc], [SubCategoryDesc], [SubCategoryID])
select TOP 1
case when 'Town' not in (select [MasterCategoryDesc] from Table1)
then (select max([MasterCategoryID])+1 from Table1)
else (select [MasterCategoryID] from Table1 where [MasterCategoryDesc]='Town')
end as [MasterCategoryID]
,'Town' as [MasterCategoryDesc]
,'owner' as [SubCategoryDesc]
,case when 'owner' not in (select [SubCategoryDesc] from Table1)
then (select max([SubCategoryID])+1 from Table1)
else (select [SubCategoryID] from Table1 where [SubCategoryDesc]='owner')
end as [SubCategoryID]
from Table1
SQL FIDDLE
If you want, I can create a stored procedure too, but you said you want T-SQL.
This will take three steps, preferably in a single Stored Procedure. Make sure it's within a transaction.
a) Check if the MasterCategoryDesc you are trying to insert already exists. If so, take its ID. If not, find the highest MasterCategoryID, increase by one, and save it to a variable.
b) The same with SubCategoryDesc and SubCategoryID.
c) Insert the new record with the two variables you created in steps a and b.
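Steps a) to c) can be sketched as follows. This uses Python's sqlite3 module rather than a T-SQL stored procedure, so it only illustrates the logic; the helper names existing_or_next_id and insert_category are hypothetical, and BEGIN IMMEDIATE stands in for whatever locking the real transaction would use.

```python
import sqlite3

# isolation_level=None: transactions are controlled explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("""CREATE TABLE Table1 (
    MasterCategoryID INTEGER, MasterCategoryDesc TEXT,
    SubCategoryDesc TEXT, SubCategoryID INTEGER)""")
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?, ?)", [
    (1, "Housing", "Elderly", 4), (1, "Housing", "Adult", 5),
    (1, "Housing", "Child", 6), (2, "Car", "Engine", 7),
    (2, "Car", "Window", 8), (3, "Shop", "owner", 9),
])

def existing_or_next_id(conn, id_col, desc_col, desc):
    """Return the ID already paired with desc, otherwise MAX(id) + 1."""
    # Column names are interpolated, so they must come from trusted code only.
    row = conn.execute(
        f"SELECT {id_col} FROM Table1 WHERE {desc_col} = ? LIMIT 1", (desc,)
    ).fetchone()
    if row is not None:
        return row[0]
    (max_id,) = conn.execute(
        f"SELECT COALESCE(MAX({id_col}), 0) FROM Table1").fetchone()
    return max_id + 1

def insert_category(conn, master_desc, sub_desc):
    """Steps a), b) and c) inside one transaction."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading MAX
    try:
        master_id = existing_or_next_id(
            conn, "MasterCategoryID", "MasterCategoryDesc", master_desc)  # a)
        sub_id = existing_or_next_id(
            conn, "SubCategoryID", "SubCategoryDesc", sub_desc)           # b)
        conn.execute("INSERT INTO Table1 VALUES (?, ?, ?, ?)",
                     (master_id, master_desc, sub_desc, sub_id))          # c)
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return master_id, sub_id

town = insert_category(conn, "Town", "owner")   # new master, existing sub
wheel = insert_category(conn, "Car", "Wheel")   # existing master, new sub
print(town, wheel)
```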
Create a table for the MasterCategory and a table for the SubCategory. Make an ___ID column for each one that is identity (1,1). When loading, insert new rows for nonexistent values and then look up existing values for the INSERT.
Messing around with finding the Max and looking up data in the existing table is, in my opinion, a recipe for failure.
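A rough sketch of the lookup-table design, again in Python with sqlite3. SQLite's AUTOINCREMENT is used here as a stand-in for identity (1,1), and the helper get_or_create is a hypothetical name; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- AUTOINCREMENT plays the role of identity (1,1) in this sketch.
    CREATE TABLE MasterCategory (
        MasterCategoryID   INTEGER PRIMARY KEY AUTOINCREMENT,
        MasterCategoryDesc TEXT NOT NULL UNIQUE);
    CREATE TABLE SubCategory (
        SubCategoryID   INTEGER PRIMARY KEY AUTOINCREMENT,
        SubCategoryDesc TEXT NOT NULL UNIQUE);
""")

def get_or_create(conn, table, id_col, desc_col, desc):
    """Insert desc if it is new, then look up and return its generated ID."""
    # Table and column names are interpolated, so pass trusted values only.
    with conn:
        conn.execute(
            f"INSERT OR IGNORE INTO {table} ({desc_col}) VALUES (?)", (desc,))
        return conn.execute(
            f"SELECT {id_col} FROM {table} WHERE {desc_col} = ?", (desc,)
        ).fetchone()[0]

housing = get_or_create(conn, "MasterCategory", "MasterCategoryID",
                        "MasterCategoryDesc", "Housing")
car = get_or_create(conn, "MasterCategory", "MasterCategoryID",
                    "MasterCategoryDesc", "Car")
housing_again = get_or_create(conn, "MasterCategory", "MasterCategoryID",
                              "MasterCategoryDesc", "Housing")
print(housing, car, housing_again)
```

The database hands out the IDs, so no MAX lookup is needed and two concurrent loaders cannot pick the same number.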

Need a SQL statement focus on combination of tables but entries always with unique ID

I need SQL code to solve the table-combination problem described below:
Table old data: table old
name version status lastupdate ID
A 0.1 on 6/8/2010 1
B 0.1 on 6/8/2010 2
C 0.1 on 6/8/2010 3
D 0.1 on 6/8/2010 4
E 0.1 on 6/8/2010 5
F 0.1 on 6/8/2010 6
G 0.1 on 6/8/2010 7
Table new data: table new
name version status lastupdate ID
A 0.1 on 6/18/2010
#B entry deleted
C 0.3 on 6/18/2010 #version_updated
C1 0.1 on 6/18/2010 #new_added
D 0.1 on 6/18/2010
E 0.1 off 6/18/2010 #status_updated
F 0.1 on 6/18/2010
G 0.1 on 6/18/2010
H 0.1 on 6/18/2010 #new_added
H1 0.1 on 6/18/2010 #new_added
The differences between the new data and the old data:
B entry deleted
C entry version updated
E entry status updated
C1/H/H1 entry new added
What I want is to always keep the ID - name mapping from the old data table, no matter how the data changes later; in other words, each name always has a unique ID number bound to it.
If an entry has been updated, update the data; if an entry is newly added, insert it into the table and assign it a new unique ID. If an entry was deleted, delete the entry and do not reuse that ID later.
However, I can only write SQL with simple select or update statements, so it may be too hard for me to write such code. I hope someone with expertise can give me some direction; no details on the differences between SQL variants are needed, standard SQL code as a sample is enough.
Thanks in advance!
Rgs
KC
========
I listed my draft SQL here, but I am not sure if it works; could someone with expertise please comment? Thanks!
1.duplicate old table as tmp for store updates
create table tmp as
select * from old
2.update into tmp where the "name" is same in old and new table
update tmp
where name in (select name from new)
3.insert different "name" (old vs new) into tmp and assign new ID
insert into tmp (name version status lastupdate ID)
set idvar = max(select max(id) from tmp) + 1
select * from
(select new.name new.version new.status new.lastupdate new.ID
from old, new
where old.name <> new.name)
4. delete the deleted entries from tmp table (such as B)
delete from tmp
where
(select ???)
You never mentioned what DBMS you are using but if you are using SQL Server, one really good one is the SQL MERGE statement. See: http://www.mssqltips.com/tip.asp?tip=1704
"The MERGE statement basically works as separate insert, update, and delete statements all within the same statement. You specify a "Source" record set and a "Target" table, and the join between the two. You then specify the type of data modification that is to occur when the records between the two data are matched or are not matched. MERGE is very useful, especially when it comes to loading data warehouse tables, which can be very large and require specific actions to be taken when rows are or are not present."
Example:
MERGE Products AS TARGET
USING UpdatedProducts AS SOURCE
ON (TARGET.ProductID = SOURCE.ProductID)
--When records are matched, update
--the records if there is any change
WHEN MATCHED AND TARGET.ProductName <> SOURCE.ProductName
OR TARGET.Rate <> SOURCE.Rate THEN
UPDATE SET TARGET.ProductName = SOURCE.ProductName,
TARGET.Rate = SOURCE.Rate
--When no records are matched, insert
--the incoming records from source
--table to target table
WHEN NOT MATCHED BY TARGET THEN
INSERT (ProductID, ProductName, Rate)
VALUES (SOURCE.ProductID, SOURCE.ProductName, SOURCE.Rate)
--When there is a row that exists in target table and
--same record does not exist in source table
--then delete this record from target table
WHEN NOT MATCHED BY SOURCE THEN
DELETE
--$action specifies a column of type nvarchar(10)
--in the OUTPUT clause that returns one of three
--values for each row: 'INSERT', 'UPDATE', or 'DELETE',
--according to the action that was performed on that row
OUTPUT $action,
DELETED.ProductID AS TargetProductID,
DELETED.ProductName AS TargetProductName,
DELETED.Rate AS TargetRate,
INSERTED.ProductID AS SourceProductID,
INSERTED.ProductName AS SourceProductName,
INSERTED.Rate AS SourceRate;
SELECT @@ROWCOUNT;
GO
Let me start from the end:
In #4 you would delete all rows in tmp; what you wanted to say there is WHERE tmp.name NOT IN (SELECT name FROM new); similarly #3 is not correct syntax, but if it was it would try to insert all rows.
Regarding #2, why not use auto increment on the ID?
Regarding #1, if your tmp table is the same as new the queries #2-#4 make no sense, unless you change (update, insert, delete) new table in some way.
But (!), if you do update the table new and it has an auto increment field on ID and if you are properly updating the table (using ID) from the application then your whole procedure is unnecessary (!).
So, the important thing is that you should not design the system to work like above.
To get the concept of updating data in the database from the application side take a look at examples here (php/mysql).
Also, to get the syntax correct on your queries go through the basic version of SET, INSERT, DELETE and SELECT commands (no way around this).
Note - if you are concerned about performance you can skip this whole answer :-)
If you can redesign, have two tables: one with the data and the other with the name - ID linkage. Something like
table_original
name version status lastupdate
A 0.1 on 6/8/2010
B 0.1 on 6/8/2010
C 0.1 on 6/8/2010
D 0.1 on 6/8/2010
E 0.1 on 6/8/2010
F 0.1 on 6/8/2010
G 0.1 on 6/8/2010
and name_id
name ID
A 1
B 2
C 3
D 4
E 5
F 6
G 7
When you get the table_new with the new set of data:
1. TRUNCATE table_original
2. INSERT INTO name_id (names from table_new not in name_id)
3. copy table_new to table_original
Note: I think there's a bit of ambiguity about the deletion here:
"If the entry was deleted, delete the entry and do not reuse that ID later."
If name A gets deleted, and it turns up again in a later set of updates do you want to a. reuse the original ID tagged to A, or b. generate a new ID?
If it's b. you need a column Deleted? in name_id and a last step
4 . set Deleted? = Y where name not in table_original
and 2. would exclude Deleted? = Y records.
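The truncate / insert-missing-names / copy sequence can be sketched like this with Python's sqlite3 module. It assumes SQLite (which has no TRUNCATE, so DELETE is used instead) and covers only the case where a name that comes back keeps its original ID; load_snapshot is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_original (name TEXT PRIMARY KEY, version TEXT,
                                 status TEXT, lastupdate TEXT);
    CREATE TABLE table_new      (name TEXT PRIMARY KEY, version TEXT,
                                 status TEXT, lastupdate TEXT);
    -- AUTOINCREMENT: SQLite never hands out an ID from this table twice.
    CREATE TABLE name_id (ID INTEGER PRIMARY KEY AUTOINCREMENT,
                          name TEXT NOT NULL UNIQUE);
""")

def load_snapshot(conn, rows):
    """Replace table_original with rows, assigning IDs only to unseen names."""
    with conn:  # the whole refresh is one transaction
        conn.execute("DELETE FROM table_new")
        conn.executemany("INSERT INTO table_new VALUES (?, ?, ?, ?)", rows)
        conn.execute("DELETE FROM table_original")  # SQLite has no TRUNCATE
        conn.execute("""INSERT INTO name_id (name)
                        SELECT name FROM table_new
                        WHERE name NOT IN (SELECT name FROM name_id)
                        ORDER BY name""")
        conn.execute("INSERT INTO table_original SELECT * FROM table_new")

def current_ids(conn):
    """Map each name currently in table_original to its ID."""
    return dict(conn.execute("""SELECT o.name, n.ID FROM table_original o
                                JOIN name_id n ON n.name = o.name"""))

load_snapshot(conn, [(n, "0.1", "on", "6/8/2010") for n in "ABCDEFG"])
ids_before = current_ids(conn)

load_snapshot(conn, [
    ("A", "0.1", "on", "6/18/2010"), ("C", "0.3", "on", "6/18/2010"),
    ("C1", "0.1", "on", "6/18/2010"), ("D", "0.1", "on", "6/18/2010"),
    ("E", "0.1", "off", "6/18/2010"), ("F", "0.1", "on", "6/18/2010"),
    ("G", "0.1", "on", "6/18/2010"), ("H", "0.1", "on", "6/18/2010"),
    ("H1", "0.1", "on", "6/18/2010"),
])
ids_after = current_ids(conn)
print(ids_after)
```

B disappears from table_original after the second load, but its row stays in name_id, so its ID is never given to another name.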
You could also do the same thing without the name_id table based on the logic that the only thing you need from table_old is the name - ID links. Everything else you need is in table_new,
This works in Informix and gives exactly the display you require. Same or similar should work in MySQL, one would think. The trick here is to get the union of all names into a temp table and left join on that so that the values from the other two can be compared.
SELECT DISTINCT name FROM old
UNION
SELECT DISTINCT name FROM new
INTO TEMP _tmp;
SELECT
CASE WHEN b.name IS NULL THEN ''
ELSE aa.name
END AS name,
CASE WHEN b.version IS NULL THEN ''
WHEN a.version = b.version THEN a.version
ELSE b.version
END AS version,
CASE WHEN a.status = b.status THEN a.status
WHEN b.status IS NULL THEN ''
ELSE b.status
END AS status,
CASE WHEN a.lastupdate = b.lastupdate THEN a.lastupdate
WHEN b.lastupdate IS NULL THEN null
ELSE b.lastupdate
END AS lastupdate,
CASE WHEN a.name IS NULL THEN '#new_added'
WHEN b.name IS NULL THEN '#' || aa.name || ' entry deleted'
WHEN a.version <> b.version THEN '#version_updated'
WHEN a.status <> b.status THEN '#status_updated'
ELSE ''
END AS change
FROM _tmp aa
LEFT JOIN old a
ON a.name = aa.name
LEFT JOIN new b
ON b.name = aa.name;
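A draft approach; I have not verified that it works:
CREATE TRIGGER auto_next_id
AFTER INSERT ON table FOR EACH ROW
BEGIN
-- "table" is a placeholder for the real table name; NEW.rowid assumes SQLite
UPDATE table SET uid = (SELECT COALESCE(MAX(uid), 0) + 1 FROM table) WHERE rowid = NEW.rowid;
END;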
If I understood correctly what you need from the comments in the two tables, I think you can simplify your problem a lot by not merging or updating the old table: what you need is table new, with the IDs from table old where they exist and new IDs where they do not, right?
New records: table new has the new records already - OK (but they need a new ID)
Deleted Records: they are not in table new - OK
Updated Records: already updated in table new - OK (need to copy ID from table old)
Unmodified records: already in table new - OK (need to copy ID from table old)
So the only thing you need to do is to:
(a) copy the IDs from table old to table new when they exist
(b) create new IDs in table new when they do not exist in table old
(c) copy table new to table old.
(a) UPDATE new SET ID = IFNULL((SELECT ID FROM old WHERE new.name = old.name),0);
(b) UPDATE new SET ID = FUNCTION_TO_GENERATE_ID(new.name) WHERE ID = 0;
(c) Drop table old;
CREATE TABLE old (select * from new);
As I don't know which SQL database you are using, in (b) you can use an SQL function to generate the unique ID, depending on the database: with SQL Server, newid(); with PostgreSQL (not too old versions), now() seems a good choice as its precision looks sufficient (but not in other databases such as MySQL, where I think the precision is limited to seconds).
Edit: Sorry, I hadn't seen you're using sqlite and python. In this case you can use str(uuid.uuid4()) function (uuid module) in python to generate the uuid and fill the ID in new table where ID = 0 in step (b). This way you'll be able to join 2 independent databases if needed without conflicts on the IDs.
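Put together for sqlite and Python, steps (a) to (c) might look like the sketch below. It deviates from the statements above in two labelled ways: unmatched rows get NULL rather than 0 as the placeholder ID, and the ID column is stored as text so that existing IDs and UUID strings can coexist. The function name carry_ids_forward and the sample rows are illustrative assumptions.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "old" (name TEXT, version TEXT, status TEXT, lastupdate TEXT, ID TEXT);
    CREATE TABLE "new" (name TEXT, version TEXT, status TEXT, lastupdate TEXT, ID TEXT);
""")
conn.executemany('INSERT INTO "old" VALUES (?, ?, ?, ?, ?)', [
    ("A", "0.1", "on", "6/8/2010", "1"),
    ("B", "0.1", "on", "6/8/2010", "2"),
])
conn.executemany('INSERT INTO "new" (name, version, status, lastupdate) VALUES (?, ?, ?, ?)', [
    ("A", "0.1", "on", "6/18/2010"),   # unchanged entry: must keep ID "1"
    ("H", "0.1", "on", "6/18/2010"),   # new entry: needs a fresh ID
])

def carry_ids_forward(conn):
    """Steps (a)-(c): copy known IDs, generate missing ones, replace old."""
    with conn:
        # (a) copy the ID from old where the name matches (NULL when it does not)
        conn.execute('''UPDATE "new" SET ID =
                        (SELECT o.ID FROM "old" o WHERE o.name = "new".name)''')
        # (b) give every row that is still without an ID a new UUID
        missing = conn.execute('SELECT name FROM "new" WHERE ID IS NULL').fetchall()
        for (name,) in missing:
            conn.execute('UPDATE "new" SET ID = ? WHERE name = ?',
                         (str(uuid.uuid4()), name))
        # (c) replace old with the contents of new
        conn.execute('DROP TABLE "old"')
        conn.execute('CREATE TABLE "old" AS SELECT * FROM "new"')

carry_ids_forward(conn)
ids = dict(conn.execute('SELECT name, ID FROM "old"'))
print(ids)
```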
Why don't you use a UUID for this? Generate it once for a plug-in, and incorporate/keep it into the plug-in, not into the DB. Now that you mention python, here's how to generate it:
import uuid
UID = str(uuid.uuid4()) # this will yield new UUID string
Sure, it does not guarantee global uniqueness, but the chance of getting the same string twice in your project is pretty low.