Updating Relational Tables using merge - sql

In a hypothetical example, say I have two tables: FARM and FRUIT
FARM is organized like:
FARM_ID Size
1 50
2 100
3 200
...
and FRUIT is organized like:
Reference_ID FRUIT
1 Banana
1 Grape
1 Orange
2 Banana
2 Strawberry
The FRUIT table is populated from a parameter @fruit passed in from Excel, which is a string delimited by '/'.
For example, @fruit = 'Banana/Grape/Orange'
It is loaded using a statement like:
INSERT INTO FRUIT(
Fruit,
Reference_ID
)
SELECT Fruit, SCOPE_IDENTITY() FROM split_string(@fruit, '/')
where split_string is a function.
My goal is to check for updates. I want to take in a Farm_ID and @fruit and check whether the fruit for that farm has changed:
1) If the values haven't changed, don't do anything.
2) If a new fruit was added, add it to the FRUIT table with the Farm_ID.
3) If there is a fruit in the FRUIT table that is not in the new delimited list for the respective FARM_ID, remove it from the FRUIT table.
I think a MERGE statement would probably work, but I am open to suggestions. Let me know if anything is unclear. Thank you.
EDIT
I'm fairly new to SQL but have tried using a MERGE...
Declare @foo tinyint
Merge Fruit as Target
Using (Select Fruit , @workingID From split_string(@fruit, '/')) As source (fruit, ID)
--@workingID is just a way to get the ID from other parts of the sproc.
ON (TARGET.fruit = source.fruit)
WHEN MATCHED THEN
SET @foo = 1
WHEN NOT MATCHED
THEN DELETE
WHEN NOT MATCHED THEN
INSERT INTO FRUIT(
Reference_ID,
Fruit
)
VALUES(
Then I am a bit stuck on how to get unique, new values

Your input already contains the complete new fruit list for the farm ID, so a simpler option is to delete the existing rows for that farm ID and insert the new list.
A sample script is given below.
-- load the input into a temp table
SELECT Fruit, @referenceid AS ReferenceId -- farm ID corresponding to the fruit list
INTO #temp
FROM Split_string(@fruit,'/')
-- delete the existing data for the given farm ID
DELETE f
FROM fruit f
WHERE EXISTS ( SELECT 1 FROM #temp t
WHERE f.Reference_id=t.ReferenceId)
-- insert the new list
INSERT INTO fruit (Fruit, Reference_ID)
SELECT Fruit, ReferenceId
FROM #temp
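If rewriting every row for the farm is undesirable, the three cases from the question (unchanged, added, removed) can also be handled as a set difference. The following is only a minimal sketch, not SQL Server code: it uses Python's sqlite3 module as a stand-in database, Python's str.split in place of split_string, and a hypothetical helper name sync_fruit.

```python
import sqlite3

def sync_fruit(conn, farm_id, fruit_list, sep="/"):
    """Make the FRUIT rows for farm_id match the delimited fruit_list.

    Hypothetical helper: returns (added, removed) so the caller can see
    whether anything changed.
    """
    wanted = {name.strip() for name in fruit_list.split(sep) if name.strip()}
    current = {row[0] for row in conn.execute(
        "SELECT Fruit FROM FRUIT WHERE Reference_ID = ?", (farm_id,))}
    to_add = sorted(wanted - current)      # case 2: new fruit
    to_remove = sorted(current - wanted)   # case 3: fruit no longer listed
    with conn:  # one transaction: both changes commit, or neither does
        conn.executemany(
            "INSERT INTO FRUIT (Reference_ID, Fruit) VALUES (?, ?)",
            [(farm_id, fruit) for fruit in to_add])
        conn.executemany(
            "DELETE FROM FRUIT WHERE Reference_ID = ? AND Fruit = ?",
            [(farm_id, fruit) for fruit in to_remove])
    return to_add, to_remove

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FRUIT (Reference_ID INTEGER, Fruit TEXT)")
conn.executemany("INSERT INTO FRUIT VALUES (?, ?)", [
    (1, "Banana"), (1, "Grape"), (1, "Orange"),
    (2, "Banana"), (2, "Strawberry")])

# Farm 1 drops Orange and gains Kiwi (Kiwi is made-up sample data).
added, removed = sync_fruit(conn, 1, "Banana/Grape/Kiwi")
```

When the list is unchanged both returned lists are empty and no rows are touched, which corresponds to case 1.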

Related

SQL query for key value table with 1:n relation

I have a table in which I want to store images. Each image has arbitrary properties that I want to store in a key-value table.
The table structure looks like this
id  fk_picture_id  key      value
1   1              camera   iphone
2   1              year     2001
3   1              country  Germany
4   2              camera   iphone
5   2              year     2020
6   2              country  United States
Now I want a query to find all pictures taken with an iPhone. I could do something like this:
select
fk_picture_id
from
my_table
where
key = 'camera'
and
value = 'iphone';
This works without any problems. But as soon as I want to add another key to my query, I get stuck. Let's say I want all pictures taken with an iPhone in the year 2020. I cannot do something like
select
distinct(fk_picture_id)
from
my_table
where
(
key = 'camera'
and
value = 'iphone'
)
or
(
key = 'year'
and
value = '2020'
)
...because this matches the rows with id 1, 4 and 5, so picture 1 is returned as well even though it is from 2001.
In the end I might have 20 - 30 different criteria to look for, so I don't think a sub-select per criterion would work.
I'm still in the design phase, which means I can still adjust the data model as well. But I can't think of any way to do this in a reasonable way - except to include the individual properties as columns in my main table.
A pattern you can consider here is to build a table of search parameters, then simply join this to your target table.
You would first create a temporary table with key and value columns then insert into it the search criteria values, any number of values you wish.
Using a CTE in place of a temporary table might look like:
with s as (
select 'camera' key, 'iphone' value
union all
select 'year', '2020'
)
select distinct t.fk_picture_id
from s
join t on t.key=s.key and t.value=s.value
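Note that a plain join against the criteria returns every picture that matches at least one criterion. To require all of them, one option is to group by picture and compare the number of matched rows with the number of criteria. The sketch below assumes SQLite (via Python's sqlite3) as a stand-in database, a temporary table named search, a hypothetical helper pictures_matching_all, and that each picture has at most one row per key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE my_table '
             '(id INTEGER, fk_picture_id INTEGER, "key" TEXT, "value" TEXT)')
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", [
    (1, 1, "camera", "iphone"), (2, 1, "year", "2001"),
    (3, 1, "country", "Germany"), (4, 2, "camera", "iphone"),
    (5, 2, "year", "2020"), (6, 2, "country", "United States")])

def pictures_matching_all(conn, criteria):
    """Return picture ids whose key/value rows satisfy every (key, value) pair."""
    conn.execute('CREATE TEMP TABLE IF NOT EXISTS search ("key" TEXT, "value" TEXT)')
    conn.execute("DELETE FROM search")  # reuse the temp table between calls
    conn.executemany("INSERT INTO search VALUES (?, ?)", criteria)
    rows = conn.execute('''
        SELECT t.fk_picture_id
        FROM search s
        JOIN my_table t ON t."key" = s."key" AND t."value" = s."value"
        GROUP BY t.fk_picture_id
        -- keep only pictures that matched as many rows as there are criteria
        HAVING COUNT(*) = (SELECT COUNT(*) FROM search)
        ORDER BY t.fk_picture_id
    ''').fetchall()
    return [row[0] for row in rows]

iphone_2020 = pictures_matching_all(conn, [("camera", "iphone"), ("year", "2020")])
iphone_only = pictures_matching_all(conn, [("camera", "iphone")])
```

Adding a 20th or 30th criterion only adds a row to the search table; the query text itself stays the same.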
The solution I found, thanks to the article "How to query data based on multiple 'tags' in SQL?", was to make some changes to the database model.
picture
id  name
1   Picture 1
2   Picture 2
And then I created a table for the tags
tag
id   tag
100  Germany
101  IPhone
102  United States
And the cross table
picture_tag
fk_picture_id  fk_tag_id
1              100
1              101
2              101
2              102
For a better understanding of the datasets
Picture    Tagname
Picture 1  Germany & Iphone
Picture 2  United States & IPhone
Now I can use the following statement
SELECT *
FROM picture
INNER JOIN (
SELECT fk_picture_id
FROM picture_tag
WHERE fk_tag_id IN (100, 101)
GROUP BY fk_picture_id
HAVING COUNT(fk_tag_id) = 2
) AS picture_tag
ON picture.id = picture_tag.fk_picture_id;
The only thing I need to do before the query is to collect the IDs of the tags I want to search for and put the number of tags in the having count statement.
If someone needs the example data, here are the sql statements for the tables and data
create table picture (
id integer,
name char(100)
);
create table tag (
id integer,
tag char(100)
);
create table picture_tag (
fk_picture_id integer,
fk_tag_id integer
);
insert into picture values (1, 'Picture 1');
insert into picture values (2, 'Picture 2');
insert into tag values (100, 'Germany');
insert into tag values (101, 'iphone');
insert into tag values (102, 'United States');
insert into picture_tag values (1, 100);
insert into picture_tag values (1, 101);
insert into picture_tag values (2, 101);
insert into picture_tag values (2, 102);
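The two manual steps (collecting the tag IDs and filling in the HAVING count) can also be derived from a list of tag names. This is a rough sketch under assumptions: it runs against SQLite through Python's sqlite3 module, the helper name pictures_with_all_tags is made up for illustration, and tag names are assumed to be unique in the tag table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table picture (id integer, name char(100));
    create table tag (id integer, tag char(100));
    create table picture_tag (fk_picture_id integer, fk_tag_id integer);
    insert into picture values (1, 'Picture 1'), (2, 'Picture 2');
    insert into tag values (100, 'Germany'), (101, 'iphone'), (102, 'United States');
    insert into picture_tag values (1, 100), (1, 101), (2, 101), (2, 102);
""")

def pictures_with_all_tags(conn, tag_names):
    """Return names of pictures that carry every tag in tag_names."""
    names = sorted(set(tag_names))
    # One "?" per tag name; only the placeholders are interpolated, never values.
    placeholders = ", ".join("?" for _ in names)
    sql = f"""
        SELECT p.name
        FROM picture p
        JOIN picture_tag pt ON pt.fk_picture_id = p.id
        JOIN tag t ON t.id = pt.fk_tag_id
        WHERE t.tag IN ({placeholders})
        GROUP BY p.id, p.name
        HAVING COUNT(DISTINCT t.id) = ?
        ORDER BY p.id
    """
    return [row[0] for row in conn.execute(sql, (*names, len(names)))]

german_iphone = pictures_with_all_tags(conn, ["Germany", "iphone"])
any_iphone = pictures_with_all_tags(conn, ["iphone"])
```

The count passed to HAVING is simply the number of distinct names requested, so it cannot drift out of sync with the IN list.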

Hive sql loop through table comparing values

I have a table in hive that looks like the below
fruit value
apple 2
apple 3
apple 4
plum 2
plum 3
plum 4
I want to loop through the table, compare each row's fruit with the previous row's fruit, and create a new column (total) based on that comparison. This would be the logic:
if [fruit] = previous[fruit] then total = previous[value]
The new table should look like this
fruit value total
apple 2
apple 3 2
apple 4 3
plum 2
plum 3 2
plum 4 3
How can I achieve this using SQL in Hive?
Also, I have ordered the results in my query so that they are grouped by fruit with ascending values.
SQL tables represent unordered sets. There is no "previous" row unless a column specifies the ordering. Assuming you have such a column, then you can use lag():
select t.*,
lag(value) over (partition by fruit order by ?) as prev_value
from t;
The ? is for the name of the column that specifies the ordering.
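For the sample data in the question, the value column itself can serve as the ordering column. The sketch below is an assumption-laden illustration rather than Hive code: it runs the same lag() window function in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer), and it names the result column total to match the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (fruit TEXT, value INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    ("apple", 2), ("apple", 3), ("apple", 4),
    ("plum", 2), ("plum", 3), ("plum", 4)])

# lag(value) looks one row back within the same fruit; the first row of
# each fruit has no predecessor, so its total is NULL (None in Python).
rows = conn.execute("""
    SELECT fruit, value,
           LAG(value) OVER (PARTITION BY fruit ORDER BY value) AS total
    FROM t
    ORDER BY fruit, value
""").fetchall()

for row in rows:
    print(row)
```

Partitioning by fruit is what replaces the "if fruit = previous fruit" check: rows from a different fruit are never visible to lag().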
Adding to the previous answer, you can artificially create an order by writing to a temp table like this (SQL Server syntax):
create table #holding (rowid int identity, fruit varchar(max), value int)
insert #holding (fruit, value)
select fruit, value from your_table
order by fruit, value
This assigns row IDs in the order of the original table, so you can order by rowid and do what Gordon said above.

Fruits are running away ! Stored Procedure SQL?

I have a table like this,
Table - Fruits
ID
Fruit_Truck_ID
Fruit_Crate_ID (can be null)
Mango (column exists but might get deleted) (these columns are bit)
Apple
etc etc...
Now I want to create a stored procedure that takes either (truck ID + fruit name) or (truck ID + crate ID + fruit name).
Fruit columns can be added and deleted, so the stored procedure needs to do the following:
If the fruit crate ID in the table is not null, select using the truck ID, the crate ID and the given column name. For example, GetFruitMango(truck_ID, Crate_ID, "Mango") should return the value of the Mango column, which is a bit (0 or 1).
If the Crate_ID provided is not in the table or is null, use the truck_ID and the column name to get the value of the Mango column, etc.
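The lookup described above might be sketched as follows. This is not a stored procedure; it is a Python/sqlite3 illustration of the same logic, and several details are assumptions: the function name get_fruit, the sample rows, the rule that the truck-only fallback takes the first row by ID, and the check of the column name against the live schema (column names cannot be passed as bound parameters, so they have to be validated before being placed in the SQL text).

```python
import sqlite3

KEY_COLUMNS = {"ID", "Fruit_Truck_ID", "Fruit_Crate_ID"}

def get_fruit(conn, truck_id, crate_id, fruit_column):
    """Return the bit stored in fruit_column, by truck + crate or truck only."""
    # Read the current columns so added or dropped fruit columns are respected.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(Fruits)")}
    if fruit_column not in columns - KEY_COLUMNS:
        raise ValueError(f"unknown fruit column: {fruit_column!r}")
    if crate_id is not None:
        row = conn.execute(
            f'SELECT "{fruit_column}" FROM Fruits '
            "WHERE Fruit_Truck_ID = ? AND Fruit_Crate_ID = ?",
            (truck_id, crate_id)).fetchone()
        if row is not None:
            return row[0]
    # Crate missing or not found: fall back to the truck alone (assumed rule).
    row = conn.execute(
        f'SELECT "{fruit_column}" FROM Fruits WHERE Fruit_Truck_ID = ? '
        "ORDER BY ID LIMIT 1", (truck_id,)).fetchone()
    return None if row is None else row[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Fruits (ID INTEGER, Fruit_Truck_ID INTEGER, "
             "Fruit_Crate_ID INTEGER, Mango INTEGER, Apple INTEGER)")
conn.executemany("INSERT INTO Fruits VALUES (?, ?, ?, ?, ?)", [
    (1, 10, 5, 1, 0),      # made-up sample rows
    (2, 11, None, 0, 1)])

mango_by_crate = get_fruit(conn, 10, 5, "Mango")
mango_by_truck = get_fruit(conn, 10, 99, "Mango")  # crate 99 does not exist
```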

TSQL Inserting records and track ID

I would like to insert records in a table below (structure of table with example data). I have to use TSQL to achieve this:
MasterCategoryID MasterCategoryDesc SubCategoryDesc SubCategoryID
1 Housing Elderly 4
1 Housing Adult 5
1 Housing Child 6
2 Car Engine 7
2 Car Engine 7
2 Car Window 8
3 Shop owner 9
So for example if I enter in a new record with MasterCategoryDesc = 'Town' it will insert '4' in MasterCategoryID with the respective SubCategoryDesc + ID.
Can I simplify this question by removing the SubCategoryDesc and SubCategoryID columns? How can I achieve this with just the two columns MasterCategoryID and MasterCategoryDesc?
INSERT into Table1
([MasterCategoryID], [MasterCategoryDesc], [SubCategoryDesc], [SubCategoryID])
select TOP 1
case when 'Town' not in (select [MasterCategoryDesc] from Table1)
then (select max([MasterCategoryID])+1 from Table1)
else (select [MasterCategoryID] from Table1 where [MasterCategoryDesc]='Town')
end as [MasterCategoryID]
,'Town' as [MasterCategoryDesc]
,'owner' as [SubCategoryDesc]
,case when 'owner' not in (select [SubCategoryDesc] from Table1)
then (select max([SubCategoryID])+1 from Table1)
else (select [SubCategoryID] from Table1 where [SubCategoryDesc]='owner')
end as [SubCategoryID]
from Table1
If you want, I can create a stored procedure too, but you said you want T-SQL.
This will take three steps, preferably in a single Stored Procedure. Make sure it's within a transaction.
a) Check if the MasterCategoryDesc you are trying to insert already exists. If so, take its ID. If not, find the highest MasterCategoryID, increase by one, and save it to a variable.
b) The same with SubCategoryDesc and SubCategoryID.
c) Insert the new record with the two variables you created in steps a and b.
Create a table for the MasterCategory and a table for the SubCategory. Make an ___ID column for each one that is identity (1,1). When loading, insert new rows for nonexistent values and then look up existing values for the INSERT.
Messing around with finding the Max and looking up data in the existing table is, in my opinion, a recipe for failure.
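The lookup-table approach can be sketched like this. It is an illustration under assumptions, not T-SQL: SQLite's INTEGER PRIMARY KEY AUTOINCREMENT stands in for identity(1,1), the table names MasterCategory and SubCategory and the helpers get_or_create and add_record are made up, and the sketch does not address concurrent writers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MasterCategory (
        MasterCategoryID INTEGER PRIMARY KEY AUTOINCREMENT,
        MasterCategoryDesc TEXT NOT NULL UNIQUE);
    CREATE TABLE SubCategory (
        SubCategoryID INTEGER PRIMARY KEY AUTOINCREMENT,
        SubCategoryDesc TEXT NOT NULL UNIQUE);
    CREATE TABLE Table1 (
        MasterCategoryID INTEGER REFERENCES MasterCategory,
        SubCategoryID INTEGER REFERENCES SubCategory);
""")

def get_or_create(conn, table, id_col, desc_col, desc):
    """Return the id for desc, inserting a new lookup row if needed.

    table and column names come from the fixed calls below, never from
    user input, so interpolating them into the SQL text is safe here.
    """
    row = conn.execute(
        f"SELECT {id_col} FROM {table} WHERE {desc_col} = ?", (desc,)).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(f"INSERT INTO {table} ({desc_col}) VALUES (?)", (desc,))
    return cur.lastrowid  # id generated by the database, no MAX()+1 needed

def add_record(conn, master_desc, sub_desc):
    with conn:  # lookups and insert share one transaction
        master_id = get_or_create(conn, "MasterCategory", "MasterCategoryID",
                                  "MasterCategoryDesc", master_desc)
        sub_id = get_or_create(conn, "SubCategory", "SubCategoryID",
                               "SubCategoryDesc", sub_desc)
        conn.execute("INSERT INTO Table1 VALUES (?, ?)", (master_id, sub_id))
    return master_id, sub_id

first = add_record(conn, "Housing", "Elderly")
second = add_record(conn, "Housing", "Adult")
third = add_record(conn, "Car", "Engine")
```

Because the database hands out the IDs, an existing description always maps back to the same ID and a new one gets the next free value.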

SQL to search duplicates

I have a table for animals like
Lion
Tiger
Elephant
Jaguar
Cheetah
Puma
Rhino
I want to insert new animals into this table, and I am reading the animal names from a CSV file.
Suppose I got the following names in the file
Lion,Tiger,Jaguar
As these animals are already in the "Animals" table, what single SQL query would determine which animals already exist in the table?
Revision 1
I want a query that will give me the list of animals that are already in the table. I do not want a query to insert the animals.
I just want the duplicate animals.
To just check if a Lion is already in the table:
select count(*) from animals where name = 'Lion'
You can do the check and the insert in one query with a where clause:
insert into animals (name)
select 'Lion'
where not exists
(
select * from animals where name = 'Lion'
)
In reply to your comment, to select a sub-list of animals:
select name from animals where name in ('Lion', 'Tiger', 'Jaguar')
This would return up to 3 rows, one for each animal that already exists.
SELECT COUNT(your_animal_column) FROM tblAnimals WHERE your_animal_column = ?;
The question mark gets filled with each of your CSV values.
If this statement returns more than 0, then the value already exists.
If your incoming file is already in a normalized column format, instead of comma separated like you have displayed, it should be easy. Create a temp table for the insert, then something like..
insert into YourLiveTable ( animalname )
select Tmp.animalname
from YourTempInsertTable Tmp
where Tmp.animalname not in
( select Live.animalname
from YourLiveTable Live )
To match your revised request... just use the select portion and change "NOT IN" to "IN"
select Tmp.animalname
from YourTempInsertTable Tmp
where Tmp.animalname IN
( select Live.animalname
from YourLiveTable Live )
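Put together, the temp-table check could look roughly like the following. It is a sketch under assumptions: SQLite through Python's sqlite3 module stands in for the real database, the temp table is called incoming, the helper name already_present is made up, and Zebra is added to the CSV line only to show a name that is not yet in the table.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animals (name TEXT)")
conn.executemany("INSERT INTO animals VALUES (?)", [
    (name,) for name in
    ["Lion", "Tiger", "Elephant", "Jaguar", "Cheetah", "Puma", "Rhino"]])

def already_present(conn, csv_text):
    """Return the names from csv_text that already exist in animals."""
    names = [cell.strip()
             for row in csv.reader(io.StringIO(csv_text))
             for cell in row if cell.strip()]
    # Stage the incoming names, one per row, then compare in a single query.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS incoming (name TEXT)")
    conn.execute("DELETE FROM incoming")
    conn.executemany("INSERT INTO incoming VALUES (?)", [(n,) for n in names])
    rows = conn.execute("""
        SELECT DISTINCT i.name
        FROM incoming i
        WHERE i.name IN (SELECT a.name FROM animals a)
        ORDER BY i.name
    """).fetchall()
    return [row[0] for row in rows]

found = already_present(conn, "Lion,Tiger,Jaguar,Zebra")
```

Nothing is inserted into animals; the function only reports which incoming names would be duplicates.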
How about
SELECT ANIMAL_NAME, COUNT(*)
FROM ANIMALS
GROUP BY ANIMAL_NAME
HAVING COUNT(*) > 1
This should give you all the duplicate entries