SQL to search duplicates

SQL to search duplicates - sql

I have a table for animals like
Lion
Tiger
Elephant
Jaguar
List item
Cheetah
Puma
Rhino
I want to insert new animals in this table and I am t reading the animal names from a CSV file.
Suppose I got following names in the file
Lion,Tiger,Jaguar
as these animals are already in "Animals" table, What should be a single SQL query that will determine if the animals are already exist in the table.
Revision 1
I want a query that will give me the list of animals that are already in table. I donot want a query to insert that animal.
I just want duplicate animals

To just check if a Lion is already in the table:
select count(*) from animals where name = 'Lion'
You can do the check and the insert in one query with a where clause:
insert into animals (name)
select 'Lion'
where not exists
(
select * from animals where name = 'Lion'
)
In reply to your comment, to select a sub-list of animals:
select name from animals where name in ('Lion', 'Tiger', 'Jaguar')
This would return up to 3 rows for each animal that already exists.

SELECT COUNT(your_animal_column) FROM tblAnimals WHERE your_animal_column = ?;
The question marks get filled by your csv values.
If this statement returns more than 0 there the value already exists

If your incoming file is already in a normalized column format, instead of comma separated like you have displayed, it should be easy. Create a temp table for the insert, then something like..
insert into YourLiveTable ( animalname )
select Tmp.animalname
from YourTempInsertTable Tmp
where Tmp.animalname not in
( select Live.animalname
from YourLiveTable Live )
To match your revised request... just use the select portion and change "NOT IN" to "IN"
select Tmp.animalname
from YourTempInsertTable Tmp
where Tmp.animalname IN
( select Live.animalname
from YourLiveTable Live )

How about
SELECT ANIMAL_NAME, COUNT(*)
FROM ANIMALS
GROUP BY ANIMAL_NAME
HAVING COUNT(*) > 1
This should give you all the duplicate entries

Related

Inserting data into a table(mutliple columns) which has primary key from another data which has data except primary key

I have a table that has 3 columns ID(Primary Key), Name, City.
I need to import data from another table that has only Name and City.
I can write insert into table 1(Name, City) select Name, City from table2.
But then I need ID in table 1 which needs to be inserted using a sequence.
I tried this:
insert into table1(ID, Name,City) values(seq.nextval, select distinct name, city from table2). But I am receiving an error saying an insufficient number of values.
I am trying it in SQL Oracle. Can someone please help me with this?

You are mixing the insert ... values and insert ... select syntax.
You edited your question to include distinct, implying you have duplicate name/city pairs that you want to suppress; but neither version gets the error you reported. If you don't have duplicates then you can just do:
insert into table1(ID, Name,City)
select seq.nextval, name, city from table2;
If you do have duplicates then you can't just add the distinct keyword, but you can use a subquery:
insert into table1 (id, name, city)
select seq.nextval, name, city
from (
select distinct name, city
from table2
);
db<>fiddle
You could also set the ID via a trigger. If you we're on a recent version you could use an identity column instead - but you tagged the question with Oracle 11g, where those are not available.

How to delete records in BigQuery based on values in an array?

In Google BigQuery, I would like to delete a subset of records, based on the value of a specific column. It's a query that I need to run repeatedly and that I would like to run automatically.
The problem is that this specific column is of the form STRUCT<column_1 ARRAY (STRING), column_2 ARRAY (STRING), ... >, and I don't know how to use such a column in the where-clause when using the delete-command.
Here is basically what I am trying to do (this code does not work):
DELETE
FROM dataset.table t
LEFT JOIN UNNEST(t.category.column_1) AS type
WHERE t.partition_date = '2020-07-22'
AND type = 'some_value'
The error that I'm getting is: Syntax error: Expected end of input but got keyword LEFT at [3:1]
If I replace the DELETE with SELECT *, it does work:
SELECT *
FROM dataset.table t
LEFT JOIN UNNEST(t.category.column_1) AS type
WHERE t.partition_date = '2020-07-22'
AND type = 'some_value'
Does somebody know how to use such a column to delete a subset of records?
EDIT:
Here is some code to create a reproducible example with some silly data (fill in your own dataset and table name in all queries):
Suppose you want to delete all rows where category.type contains the value 'food'.
1 - create a table:
CREATE TABLE <DATASET>.<TABLE_NAME>
(
article STRING,
category STRUCT<
color STRING,
type ARRAY<STRING>
>
);
2 - Insert data into the new table:
INSERT <DATASET>.<TABLE_NAME>
SELECT "apple" AS article, STRUCT('red' AS color, ['fruit','food'] as type) AS category
UNION ALL
SELECT "cabbage" AS article, STRUCT('blue' AS color, ['vegetable', 'food'] as type) AS category
UNION ALL
SELECT "book" AS article, STRUCT('red' AS color, ['object'] as type) AS category
UNION ALL
SELECT "dog" AS article, STRUCT('green' AS color, ['animal', 'pet'] as type) AS category;
3 - Show that select works (return all rows where category.type contains the value 'food'; these are the rows I want to delete):
SELECT *
FROM <DATASET>.<TABLE_NAME>
LEFT JOIN UNNEST(category.type) type
WHERE type = 'food'
Initial Result
4 - My attempt at deleting rows where category.type contains 'food' does not work:
DELETE
FROM <DATASET>.<TABLE_NAME>
LEFT JOIN UNNEST(category.type) type
WHERE type = 'food'
Syntax error: Unexpected keyword LEFT at [3:1]
Desired Result

This is the code I used to delete the desired records (the records where category.type contains the value 'food'.)
DELETE
FROM <DATASET>.<TABLE_NAME> t1
WHERE EXISTS(SELECT 1 FROM UNNEST(t1.category.type) t2 WHERE t2 = 'food')
The embarrasing thing is that I've seen these kind of answers on similar questions (for example on update-queries). But I come from Oracle-SQL and I think that there you are required to connect your subquery with your main query in the WHERE-statement of the subquery (ie. connect t1 with t2), so I didn't understand these answers. That's why I posted this question.
However, I learned that BigQuery automatically understands how to connect table t1 and 'table' t2; you don't have to explicitly connect them.
Now it is possible to still do this (perhaps even recommended?):
DELETE
FROM <DATASET>.<TABLE_NAME> t1
WHERE EXISTS (SELECT 1 FROM <DATASET>.<TABLE_NAME> t2 LEFT JOIN UNNEST(t2.category.type) AS type WHERE type = 'food' AND t1.article=t2.article)
but a second difficulty for me was that my ID in my actual data is somehow hidden in an array>struct-construction, so I got stuck connecting t1 & t2. Fortunately this is not always an absolute necessity.

Since you did not provide any sample data I am going to explain using some dummy data. In case you add your sample data, I can update the answer.
Firstly,according to your description, you have only a STRUCT not an Array[Struct <col_1, col_2>].For this reason, you do not need to use UNNEST to access the values within the data. Below is an example how to access particular data within a STRUCT.
WITH data AS (
SELECT 1 AS id, STRUCT("Alex" AS name, 30 AS age, "NYC" AS city) AS info UNION ALL
SELECT 1 AS id, STRUCT("Leo" AS name, 18 AS age, "Sydney" AS city) AS info UNION ALL
SELECT 1 AS id, STRUCT("Robert" AS name, 25 AS age, "Paris" AS city) AS info UNION ALL
SELECT 1 AS id, STRUCT("Mary" AS name, 28 AS age, "London" AS city) AS info UNION ALL
SELECT 1 AS id, STRUCT("Ralph" AS name, 45 AS age, "London" AS city) AS info
)
SELECT * FROM data
WHERE info.city = "London"
Notice that the STRUCT is named info and the data we accessed is city and used it in the WHERE clause.
Now, in order to delete the rows that contains an specific value within the STRUCT , in your case I assume it would be your_struct.column_1, you can use DELETE or MERGE and DELETE. I have saved the above data in a table to execute the below examples, which have the same output,
First method: DELETE
DELETE FROM `project.dataset.table`
WHERE info.city = "Sydney"
Second method: MERGE and DELETE
MERGE `project.dataset.table` a
USING (SELECT * from `project.dataset.table` WHERE info.city ="London") b
ON a.info.city =b.info.city
WHEN matched and b.id=1 then
Delete
And the output for both queries,
Row id info.name info.age info.city
1 1 Alex 30 NYC
2 1 Robert 25 Paris
3 1 Ralph 45 London
4 1 Mary 28 London
As you can see the row where info.city = "Sydney" was deleted in both cases.
It is important to point out that your data is excluded from your source table. Therefore, you should be careful.
Note: Since you want to run this process everyday, you could use Schedule Query within BigQuery Console, appending or overwriting the results after each run. Also, it is a good practice not deleting data from your source table. Thus, consider creating a new table from your source table without the rows you do not desire.

How can I make this an insert statement

I am new to SQL, So I have a table called 'DealerShip' and I have a many different car ID's . I have a dealership called 'Hondo' and another one called 'Mitch' . I would like to insert about 80 records into 'Mitch' that 'Hondo' has . For instance with this Query
select * from DealerShips where name='Hondo' and CarType=63
That query above contains about 70 records, How can I create an insert statement that will insert all the returned records from that query above ? The insert will go into the same table above except that the name will be 'Mitch' . I am using MSSQL 2012

INSERT INTO SELECT
INSERT INTO yourtable (FIELDS...) ---- fields here should match the select fields
SELECT FIELDS... FROM
DealerShips where name = 'Hondo' and CarType=63

Just make sure you list the columns in the insert and the select in the same order.
INSERT INTO DealerShips (name, cartype, more columns)
SELECT
'Mitch'
, cartype
, more columns
from DealerShips where name='Hondo' and CarType=63

Normalizing a table, from one to the other

I'm trying to normalize a mysql database....
I currently have a table that contains 11 columns for "categories". The first column is a user_id and the other 10 are category_id_1 - category_id_10. Some rows may only contain a category_id up to category_id_1 and the rest might be NULL.
I then have a table that has 2 columns, user_id and category_id...
What is the best way to transfer all of the data into separate rows in table 2 without adding a row for columns that are NULL in table 1?
thanks!

You can create a single query to do all the work, it just takes a bit of copy and pasting, and adjusting the column name:
INSERT INTO table2
SELECT * FROM (
SELECT user_id, category_id_1 AS category_id FROM table1
UNION ALL
SELECT user_id, category_id_2 FROM table1
UNION ALL
SELECT user_id, category_id_3 FROM table1
) AS T
WHERE category_id IS NOT NULL;
Since you only have to do this 10 times, and you can throw the code away when you are finished, I would think that this is the easiest way.

One table for users:
users(id, name, username, etc)
One for categories:
categories(id, category_name)
One to link the two, including any extra information you might want on that join.
categories_users(user_id, category_id)
-- or with extra information --
categories_users(user_id, category_id, date_created, notes)
To transfer the data across to the link table would be a case of writing a series of SQL INSERT statements. There's probably some awesome way to do it in one go, but since there's only 11 categories, just copy-and-paste IMO:
INSERT INTO categories_users
SELECT user_id, 1
FROM old_categories
WHERE category_1 IS NOT NULL

Insert into select and update in single query

I have 4 tables: tempTBL, linksTBL and categoryTBL, extra
on my tempTBL I have: ID, name, url, cat, isinserted columns
on my linksTBL I have: ID, name, alias columns
on my categoryTBL I have: cl_id, link_id,cat_id
on my extraTBL I have: id, link_id, value
How do I do a single query to select from tempTBL all items where isinsrted = 0 then insert them to linksTBL and for each record inserted, pickup ID (which is primary) and then insert that ID to categoryTBL with cat_id = 88. after that insert extraTBL ID for link_id and url for value.
I know this is so confusing, put I'll post this anyhow...
This is what I have so far:
INSERT IGNORE INTO linksTBL (link_id,link_name,alias)
VALUES(NULL,'tex2','hello'); # generate ID by inserting NULL
INSERT INTO categoryTBL (link_id,cat_id)
VALUES(LAST_INSERT_ID(),'88'); # use ID in second table
I would like to add here somewhere that it only selects items where isinserted = 0 and iserts those records, and onse inserted, will change isinserted to 1, so when next time it runs, it will not add them again.

As longneck said, you cannot do multiple things in one query, but you can in a stored procedure.

http://dev.mysql.com/doc/refman/5.1/en/insert-select.html
INSERT INTO linksTBL (link_id,link_name,alias)
SELECT field1, field2, field3
FROM othertable
WHERE inserted=0;

this is not possible to do in a single query. you will have to insert the rows, then run a separate update statement.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL to search duplicates - sql

SELECT COUNT(your_animal_column) FROM tblAnimals WHERE your_animal_column = ?; The question marks get filled by your csv values. If this statement returns more than 0 there the value already exists

How about SELECT ANIMAL_NAME, COUNT() FROM ANIMALS GROUP BY ANIMAL_NAME HAVING COUNT() > 1 This should give you all the duplicate entries

Related

Inserting data into a table(mutliple columns) which has primary key from another data which has data except primary key

How to delete records in BigQuery based on values in an array?

How can I make this an insert statement

Normalizing a table, from one to the other

Insert into select and update in single query

Categories

Resources

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL to search duplicates - sql

SELECT COUNT(your_animal_column) FROM tblAnimals WHERE your_animal_column = ?; The question marks get filled by your csv values. If this statement returns more than 0 there the value already exists

How about SELECT ANIMAL_NAME, COUNT(*) FROM ANIMALS GROUP BY ANIMAL_NAME HAVING COUNT(*) > 1 This should give you all the duplicate entries

Related

Inserting data into a table(mutliple columns) which has primary key from another data which has data except primary key

How to delete records in BigQuery based on values in an array?

How can I make this an insert statement

Normalizing a table, from one to the other

Insert into select and update in single query

Categories

Resources

How about SELECT ANIMAL_NAME, COUNT() FROM ANIMALS GROUP BY ANIMAL_NAME HAVING COUNT() > 1 This should give you all the duplicate entries