How to "filter" duplicate item in select statement


I have 2 tables : "Hotels" and "Area".
"Hotels" table has the following columns: INDEX, NAME, AREA
Sample data:
(1,'hotel bla', 4)
"Area" table has the following columns: INDEX, NAME, CODE
Sample data:
(4,'TEL-AVIV','TLV')
This means that "hotel bla" is in Tel Aviv.
I need to create a list of hotels that have the same name and area ("duplicate hotels").
For example:
Hotels has 5 records:
1,'hotel a',1
2,'hotel a',1
3,'hotel b',2
4,'hotel b',2
5,'hotel c',1
Area has 2 records
1,'tel-aviv','TLV'
2,'haifa','HAF'
The output should be something like:
'hotel a','1'
'hotel b','2'
Update:
If there is only 1 record of a hotel in the table, I don't want to return it.
All of the answers so far also return it.
See the corrected example above.

You need to use the DISTINCT keyword to get only distinct records.
select distinct name, area from hotels
This gives you only unique rows if you need just the name and area columns.
If you need the ID column as well, you can try
select min(id) as id, name, area from hotels group by name, area;
or
select first(id) as id, name, area from hotels group by name, area;
depending on which DBMS you're using (first() is available in MS Access, for example).

select name, area
from hotels
group by name, area
having count(1) > 1;
This will give you hotels having same name and belonging to same area.
Demo at sqlfiddle.
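As a rough check of the difference between the two answers, here is a minimal sketch that loads the question's five sample hotels into an in-memory SQLite database through Python's sqlite3 module. The table layout and the column name id (instead of INDEX) are assumptions made for illustration. DISTINCT still returns 'hotel c', which has only one record, while GROUP BY with HAVING returns only the duplicated hotels.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hotels (id INTEGER PRIMARY KEY, name TEXT, area INTEGER)")
conn.executemany(
    "INSERT INTO hotels VALUES (?, ?, ?)",
    [(1, "hotel a", 1), (2, "hotel a", 1), (3, "hotel b", 2),
     (4, "hotel b", 2), (5, "hotel c", 1)],
)

# DISTINCT collapses duplicates but keeps hotels that occur only once.
distinct_rows = conn.execute(
    "SELECT DISTINCT name, area FROM hotels ORDER BY name"
).fetchall()

# HAVING COUNT(*) > 1 keeps only (name, area) pairs that occur more than once.
duplicate_rows = conn.execute(
    "SELECT name, area FROM hotels GROUP BY name, area "
    "HAVING COUNT(*) > 1 ORDER BY name"
).fetchall()

print(distinct_rows)
print(duplicate_rows)
```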

Related

How to delete records in BigQuery based on values in an array?

In Google BigQuery, I would like to delete a subset of records, based on the value of a specific column. It's a query that I need to run repeatedly and that I would like to run automatically.
The problem is that this specific column is of the form STRUCT<column_1 ARRAY<STRING>, column_2 ARRAY<STRING>, ... >, and I don't know how to use such a column in the WHERE clause of a DELETE statement.
Here is basically what I am trying to do (this code does not work):
DELETE
FROM dataset.table t
LEFT JOIN UNNEST(t.category.column_1) AS type
WHERE t.partition_date = '2020-07-22'
AND type = 'some_value'
The error that I'm getting is: Syntax error: Expected end of input but got keyword LEFT at [3:1]
If I replace the DELETE with SELECT *, it does work:
SELECT *
FROM dataset.table t
LEFT JOIN UNNEST(t.category.column_1) AS type
WHERE t.partition_date = '2020-07-22'
AND type = 'some_value'
Does somebody know how to use such a column to delete a subset of records?
EDIT:
Here is some code to create a reproducible example with some silly data (fill in your own dataset and table name in all queries):
Suppose you want to delete all rows where category.type contains the value 'food'.
1 - create a table:
CREATE TABLE <DATASET>.<TABLE_NAME>
(
article STRING,
category STRUCT<
color STRING,
type ARRAY<STRING>
>
);
2 - Insert data into the new table:
INSERT <DATASET>.<TABLE_NAME>
SELECT "apple" AS article, STRUCT('red' AS color, ['fruit','food'] as type) AS category
UNION ALL
SELECT "cabbage" AS article, STRUCT('blue' AS color, ['vegetable', 'food'] as type) AS category
UNION ALL
SELECT "book" AS article, STRUCT('red' AS color, ['object'] as type) AS category
UNION ALL
SELECT "dog" AS article, STRUCT('green' AS color, ['animal', 'pet'] as type) AS category;
3 - Show that select works (return all rows where category.type contains the value 'food'; these are the rows I want to delete):
SELECT *
FROM <DATASET>.<TABLE_NAME>
LEFT JOIN UNNEST(category.type) type
WHERE type = 'food'
4 - My attempt at deleting rows where category.type contains 'food' does not work:
DELETE
FROM <DATASET>.<TABLE_NAME>
LEFT JOIN UNNEST(category.type) type
WHERE type = 'food'
Syntax error: Unexpected keyword LEFT at [3:1]

This is the code I used to delete the desired records (the records where category.type contains the value 'food'.)
DELETE
FROM <DATASET>.<TABLE_NAME> t1
WHERE EXISTS(SELECT 1 FROM UNNEST(t1.category.type) t2 WHERE t2 = 'food')
The embarrassing thing is that I had seen this kind of answer on similar questions (for example, on UPDATE queries). But I come from Oracle SQL, where I think you are required to connect the subquery to the main query in the subquery's WHERE clause (i.e. connect t1 with t2), so I didn't understand those answers. That's why I posted this question.
However, I learned that no explicit join condition is needed here: UNNEST(t1.category.type) refers to the array of the current t1 row, so the subquery is already correlated with the outer query.
It is still possible to connect them explicitly (perhaps even recommended?):
DELETE
FROM <DATASET>.<TABLE_NAME> t1
WHERE EXISTS (SELECT 1 FROM <DATASET>.<TABLE_NAME> t2 LEFT JOIN UNNEST(t2.category.type) AS type WHERE type = 'food' AND t1.article=t2.article)
A second difficulty for me was that the ID in my actual data is hidden inside a nested array/struct construction, so I got stuck connecting t1 and t2. Fortunately, connecting them explicitly is not always necessary.
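The correlated EXISTS pattern can be tried without a BigQuery project. The sketch below is only an analogy, not BigQuery code: it uses SQLite through Python's sqlite3 module, stores the type list as a JSON array in a hypothetical types column, and uses SQLite's json_each table-valued function in the role that UNNEST plays in BigQuery. It assumes the SQLite build includes the JSON functions, which is the default in recent versions. As in the BigQuery query, the subquery reads the array of the current outer row, so no separate join condition is written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 'types' holds a JSON array; it stands in for BigQuery's ARRAY<STRING> field.
conn.execute("CREATE TABLE articles (article TEXT, types TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [("apple", '["fruit", "food"]'),
     ("cabbage", '["vegetable", "food"]'),
     ("book", '["object"]'),
     ("dog", '["animal", "pet"]')],
)

# json_each(articles.types) expands the array of the row being examined,
# which makes the EXISTS subquery correlated with the outer DELETE.
conn.execute(
    "DELETE FROM articles "
    "WHERE EXISTS (SELECT 1 FROM json_each(articles.types) AS t "
    "WHERE t.value = 'food')"
)

remaining = [row[0] for row in
             conn.execute("SELECT article FROM articles ORDER BY article")]
print(remaining)
```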

Since you did not provide any sample data, I am going to explain using some dummy data. If you add your sample data, I can update the answer.
First, according to your description, you have only a STRUCT, not an ARRAY<STRUCT<col_1, col_2>>. For this reason, you do not need to use UNNEST to access the values within the data. Below is an example of how to access particular data within a STRUCT.
WITH data AS (
SELECT 1 AS id, STRUCT("Alex" AS name, 30 AS age, "NYC" AS city) AS info UNION ALL
SELECT 1 AS id, STRUCT("Leo" AS name, 18 AS age, "Sydney" AS city) AS info UNION ALL
SELECT 1 AS id, STRUCT("Robert" AS name, 25 AS age, "Paris" AS city) AS info UNION ALL
SELECT 1 AS id, STRUCT("Mary" AS name, 28 AS age, "London" AS city) AS info UNION ALL
SELECT 1 AS id, STRUCT("Ralph" AS name, 45 AS age, "London" AS city) AS info
)
SELECT * FROM data
WHERE info.city = "London"
Notice that the STRUCT is named info, and that we accessed its city field in the WHERE clause.
Now, to delete the rows that contain a specific value within the STRUCT (in your case I assume it would be your_struct.column_1), you can use DELETE, or MERGE with DELETE. I have saved the above data in a table to execute the examples below, which produce the same output.
First method: DELETE
DELETE FROM `project.dataset.table`
WHERE info.city = "Sydney"
Second method: MERGE and DELETE
MERGE `project.dataset.table` a
USING (SELECT * FROM `project.dataset.table` WHERE info.city = "Sydney") b
ON a.info.city = b.info.city
WHEN MATCHED AND b.id = 1 THEN
DELETE
The output for both queries is:
Row  id  info.name  info.age  info.city
1    1   Alex       30        NYC
2    1   Robert     25        Paris
3    1   Ralph      45        London
4    1   Mary       28        London
As you can see, the row where info.city = "Sydney" was deleted in both cases.
It is important to point out that DELETE removes the rows from your source table itself, so you should be careful.
Note: Since you want to run this process every day, you could use a scheduled query in the BigQuery console, appending or overwriting the results after each run. Also, it is good practice not to delete data from your source table. Consider instead creating a new table from your source table without the rows you do not want.

How to change Select List values based on other Select List value using oracle apex?

I have two select lists in my page, Region and Country.
Country should display values based on the value selected from the Region list. But the problem is, it's displaying blank in the Country list, even after selecting a value from the Region list.
Region table columns: Id, Name
Country table columns: Group_Id, Name, Id
For Region, I have written the query as follows:
Select Name d,Id r from Regions
The query for Country as:
Select Name d,Id r from Country where Group_Id = :P3_REGION
How can I get the correct values?

Enter P3_REGION as the Cascading LOV Parent Item(s) of the Country select list.

sql find average from counting?

So I have the query:
select states, count(numberofcounty)
from countytable
group by states
order by count(distinct numberofcounty) ASC;
which returns a table of 2 columns: the state and its number of counties, ordered from least to most.
How can I get the average number of counties per state as a single real number?
The table structure is:
create table numberofcounty(
numberofcounty text,
states text references states(states),
);
create table states (
states text primary key,
name text,
admitted_to_union text
);

This might be it?
countytable:
state|county
a aa
a ab
a ac
b ba
SELECT AVG(counties) FROM (SELECT state, COUNT(county) as counties FROM countytable GROUP BY state) as temp
result: 2

You can use your current query as a subquery to one that gets the average of the number of counties per state
Select avg (c) average_counties
From (select count(numberofcounty) c
from countytable
group by states) s

If I am not missing anything, this should give essentially the same result as the double-grouping method in the other answers (multiplying by 1.0 avoids integer division in DBMSs that truncate):
SELECT COUNT(numberofcounty) * 1.0 / COUNT(DISTINCT states)
FROM countytable;
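To compare the approaches from these answers, here is a small sketch using an in-memory SQLite database through Python's sqlite3 module. The three sample rows are made up for illustration (state a with two counties, state b with one). In SQLite, as in PostgreSQL, dividing two integer counts truncates, so the single-query ratio only matches the subquery average once one operand is made a real number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE countytable (states TEXT, numberofcounty TEXT)")
# Hypothetical sample: state 'a' has 2 counties, state 'b' has 1.
conn.executemany(
    "INSERT INTO countytable VALUES (?, ?)",
    [("a", "aa"), ("a", "ab"), ("b", "ba")],
)

# Average of the per-state counts via a subquery.
avg_subquery = conn.execute(
    "SELECT AVG(c) FROM (SELECT COUNT(numberofcounty) AS c "
    "FROM countytable GROUP BY states)"
).fetchone()[0]

# Total counties divided by number of states: integer division truncates.
ratio_int = conn.execute(
    "SELECT COUNT(numberofcounty) / COUNT(DISTINCT states) FROM countytable"
).fetchone()[0]

# Multiplying by 1.0 forces real-number division.
ratio_real = conn.execute(
    "SELECT COUNT(numberofcounty) * 1.0 / COUNT(DISTINCT states) FROM countytable"
).fetchone()[0]

print(avg_subquery, ratio_int, ratio_real)
```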

Pick One Row per Unique ID from duplicate records

In MS Access 2010, I have a query like the one below that displays duplicate records. The problem is that even though the IDs are the same, one of the fields has different data in each row, since I have combined two separate tables in this query. I just want to display one row per ID and eliminate the other rows. It doesn't matter which row I pick. See below:
ID - NAME - FAVCOLOR
1242 - John - Blue
1242 - John - Red
1378 - Mary - Green
I just want to pick any one of the rows with the same ID. It doesn't matter which row I pick, as long as I display one row per ID.
ID - NAME - FAVCOLOR
1242 - John - Red
1378 - Mary - Green

Use the SQL from your current query as a subquery and then GROUP BY ID and NAME. You can retrieve the minimum FAVCOLOR since you want only one and don't care which.
SELECT sub.ID, sub.NAME, Min(sub.FAVCOLOR)
FROM
(
SELECT ID, [NAME], FAVCOLOR
FROM TABLE1
UNION ALL
SELECT ID, [NAME], FAVCOLOR
FROM TABLE2
) AS sub
GROUP BY sub.ID, sub.NAME;
Note NAME is a reserved word. Bracket that name or prefix it with the table name or alias to avoid confusing the db engine.

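Try using UNION without the ALL keyword and see if you get the desired result. Note that UNION only removes rows that are identical in every column, so rows that differ in FAVCOLOR will still both appear.
Your new query would look like
"SELECT ID, NAME, FAVCOLOR FROM TABLE1 UNION SELECT ID, NAME, FAVCOLOR FROM TABLE2;"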

If you just want the IDs, why is the color in the query? Maybe I'm missing something.
The only thing I could suggest is to use some aggregate function (min, max) to get one color.
select
id,
name,
max(favcolor)
from (
select * from table1
union
select * from table2
) t
group by
id,
name
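The answers in this thread can be compared with a small sketch in SQLite via Python's sqlite3 module. How the three sample rows are split across table1 and table2 is an assumption, since the question only shows the combined result. Grouping by ID and NAME with an aggregate on FAVCOLOR yields one row per ID, while a plain UNION keeps both of John's rows because they differ in FAVCOLOR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed split of the question's sample rows across the two source tables.
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT, favcolor TEXT);
    CREATE TABLE table2 (id INTEGER, name TEXT, favcolor TEXT);
    INSERT INTO table1 VALUES (1242, 'John', 'Blue'), (1378, 'Mary', 'Green');
    INSERT INTO table2 VALUES (1242, 'John', 'Red');
""")

# One row per ID: group the combined rows and pick an arbitrary color via MIN.
one_per_id = conn.execute("""
    SELECT sub.id, sub.name, MIN(sub.favcolor) AS favcolor
    FROM (
        SELECT id, name, favcolor FROM table1
        UNION ALL
        SELECT id, name, favcolor FROM table2
    ) AS sub
    GROUP BY sub.id, sub.name
    ORDER BY sub.id
""").fetchall()

# UNION alone removes only fully identical rows.
union_only = conn.execute("""
    SELECT id, name, favcolor FROM table1
    UNION
    SELECT id, name, favcolor FROM table2
    ORDER BY id, favcolor
""").fetchall()

print(one_per_id)
print(union_only)
```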

How can I write this SQL SELECT query for this table?

I have this situation in a certain table:
id | name
1 'Test'
2 'Test'
3 'Test'
How can I write a query that selects the distinct names? I also need the ID column, taking the first occurrence of each name, e.g. "if the name repeats, give me the first record with that name."

select name, MIN(ID)
from aCertainTable
group by name
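A quick way to try this query is the sketch below, which uses an in-memory SQLite database through Python's sqlite3 module. The fourth row ('Other') is a hypothetical addition so that the grouping has more than one name to show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aCertainTable (id INTEGER PRIMARY KEY, name TEXT)")
# Rows 1-3 come from the question; row 4 is a made-up extra name.
conn.executemany(
    "INSERT INTO aCertainTable VALUES (?, ?)",
    [(1, "Test"), (2, "Test"), (3, "Test"), (4, "Other")],
)

# MIN(id) returns the lowest ID, i.e. the first occurrence, for each name.
first_ids = conn.execute(
    "SELECT name, MIN(id) AS first_id FROM aCertainTable "
    "GROUP BY name ORDER BY name"
).fetchall()

print(first_ids)
```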
