How to get unique records from 3 tables - sql

I have 3 tables and I am trying to get unique results from all 3 tables (including other columns from each table).
I have tried union approach but that approach only works when I have single column selected from each table.
As soon as I want another corresponding column value from each table, I don't get unique values for the field I am trying to get.
Sample Database and query available here as well: http://www.sqlfiddle.com/#!18/1b9a6/10
Here are the example tables I have created.
CREATE TABLE TABLEA
(
id int,
city varchar(6)
);
INSERT INTO TABLEA ([id], [city])
VALUES
(1, 'A'),
(2, 'B'),
(3, 'C');
CREATE TABLE TABLEB
(
id int,
city varchar(6)
);
INSERT INTO TABLEB ([id], [city])
VALUES
(1, 'B'),
(2, 'C'),
(3, 'D');
CREATE TABLE TABLEC
(
id int,
city varchar(6)
);
INSERT INTO TABLEC ([id], [city])
VALUES
(1, 'C'),
(2, 'D'),
(2, 'E');
Desired result:
A,B,C,D,E
Unique city from all 3 table combined. By unique, I am referring to DISTINCT city from the combination of all 3 tables. Yes, the id is different for common values between tables but it doesn't matter in my use-case if id is coming from table A, B OR C, as long as I am getting DISTINCT (aka UNIQUE) city across all 3 tables.
I tried this query but no luck (city B is missing in the output):
SELECT city, id
FROM
(SELECT city, id
FROM TABLEA
WHERE city NOT IN (SELECT city FROM TABLEB
UNION
SELECT city FROM TABLEC)
UNION
SELECT city, id
FROM TABLEB
WHERE city NOT IN (SELECT city FROM TABLEA
UNION
SELECT city FROM TABLEC)
UNION
SELECT city, id
FROM TABLEC) AS mytable

Try this. It should give you each distinct city together with the lowest id at which it appears:
select distinct min(id) over(partition by city) id, city from (
select * from TABLEA
union all
select * from TABLEB
union all
select * from TABLEC ) uni

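You had the right idea: wrap the UNION results in a subquery or CTE and then apply DISTINCT. UNION removes duplicate (city, id) pairs, not duplicate cities, so DISTINCT has to be applied to city by itself.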
WITH TABLEE AS (
SELECT city, id FROM TABLEA
UNION
SELECT city, id FROM TABLEB
UNION
SELECT city, id FROM TABLEC
)
SELECT DISTINCT city
FROM TABLEE
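
If you also need one id per city, a GROUP BY over the combined rows is another way to express the same idea. The following is a minimal sketch, not taken from the answers above: it uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made only so the example is self-contained), loads the sample data from the question, and keeps the lowest id for each city.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the SQL Server fiddle.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLEA (id INT, city VARCHAR(6));
    CREATE TABLE TABLEB (id INT, city VARCHAR(6));
    CREATE TABLE TABLEC (id INT, city VARCHAR(6));
    INSERT INTO TABLEA VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO TABLEB VALUES (1, 'B'), (2, 'C'), (3, 'D');
    INSERT INTO TABLEC VALUES (1, 'C'), (2, 'D'), (2, 'E');
""")

# Stack all rows with UNION ALL, then collapse to one row per city,
# keeping the smallest id seen for that city.
unique_cities = conn.execute("""
    SELECT city, MIN(id) AS id
    FROM (
        SELECT id, city FROM TABLEA
        UNION ALL
        SELECT id, city FROM TABLEB
        UNION ALL
        SELECT id, city FROM TABLEC
    ) AS uni
    GROUP BY city
    ORDER BY city
""").fetchall()

print(unique_cities)
# [('A', 1), ('B', 1), ('C', 1), ('D', 2), ('E', 2)]
```

The SELECT statement itself uses only standard SQL, so it should carry over to SQL Server unchanged, although that has not been verified here.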

How to filter a table based on queried ids from another table in Snowflake

I'm trying to filter a table based on the queried result from another table.
create temporary table test_table (id number, col_a varchar);
insert into test_table values
(1, 'a'),
(2, 'b'),
(3, 'aa'),
(4, 'a'),
(6, 'bb'),
(7, 'a'),
(8, 'c');
create temporary table test_table_2 (id number, col varchar);
insert into test_table_2 values
(1, 'aa'),
(2, 'bb'),
(3, 'cc'),
(4, 'dd'),
(6, 'ee'),
(7, 'ff'),
(8, 'gg');
Here I want to find all the ids in test_table that have the value "a" in col_a, and then filter test_table_2 for rows with one of these ids. I tried it the way shown below, but got an error: SQL compilation error: syntax error line 6 at position 39 unexpected 'cte'.
with cte as
(
select id from test_table
where col_a = 'a'
)
select * from test_table_2 where id in cte;
The approach below does work, but with large tables it tends to be very slow. Is there a more efficient way that scales to very large tables?
with cte as
(
select id from test_table
where col_a = 'a'
)
select t2.* from test_table_2 t2 join cte on t2.id=cte.id;
I would express this using exists logic:
SELECT id
FROM test_table_2 t2
WHERE EXISTS (
SELECT 1
FROM test_table t1
WHERE t2.id = t1.id AND
t1.col_a = 'a'
);
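One advantage over a join is that an EXISTS check only needs to know whether a match exists: Snowflake can stop probing test_table for a given test_table_2 row as soon as it finds one, and duplicate ids in test_table never multiply the output rows.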
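Your first error can be fixed as shown below: IN expects a parenthesized subquery (or a value list), so the CTE has to be selected from rather than named directly. Note that joins are usually better suited to lookups than EXISTS or IN clauses if you have a large table.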
with cte as
(
select id from test_table
where col_a = 'a'
)
select * from test_table_2 where id in (select distinct id from cte);
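
The three formulations (EXISTS, IN with a subquery, and JOIN) differ in one observable way: a join repeats a test_table_2 row once per matching test_table row. The sketch below illustrates this with Python's built-in sqlite3 module as a stand-in for Snowflake, which is an assumption made only to keep the example runnable. It says nothing about Snowflake's performance. One duplicate row (1, 'a') is added to test_table purely for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test_table (id INT, col_a TEXT);
    INSERT INTO test_table VALUES
        (1, 'a'), (2, 'b'), (3, 'aa'), (4, 'a'), (6, 'bb'), (7, 'a'), (8, 'c'),
        (1, 'a');  -- extra duplicate row, added only for this illustration
    CREATE TABLE test_table_2 (id INT, col TEXT);
    INSERT INTO test_table_2 VALUES
        (1, 'aa'), (2, 'bb'), (3, 'cc'), (4, 'dd'), (6, 'ee'), (7, 'ff'), (8, 'gg');
""")

# Semi-join via EXISTS: each test_table_2 row appears at most once.
exists_rows = conn.execute("""
    SELECT t2.* FROM test_table_2 t2
    WHERE EXISTS (
        SELECT 1 FROM test_table t1
        WHERE t1.id = t2.id AND t1.col_a = 'a'
    )
    ORDER BY t2.id
""").fetchall()

# Semi-join via IN: the subquery must be wrapped in parentheses.
in_rows = conn.execute("""
    SELECT * FROM test_table_2
    WHERE id IN (SELECT id FROM test_table WHERE col_a = 'a')
    ORDER BY id
""").fetchall()

# Plain inner join: the duplicated id 1 produces two output rows.
join_rows = conn.execute("""
    SELECT t2.* FROM test_table_2 t2
    JOIN test_table t1 ON t1.id = t2.id
    WHERE t1.col_a = 'a'
    ORDER BY t2.id
""").fetchall()

print(exists_rows)  # [(1, 'aa'), (4, 'dd'), (7, 'ff')]
print(in_rows)      # same three rows
print(join_rows)    # four rows: (1, 'aa') appears twice
```

If the ids in test_table are guaranteed to be unique for the filtered rows, all three queries return the same result and the choice comes down to how the engine plans them.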

Union two queries ordered by newid

I have a table that stores employees (id, name, and gender). I need to randomly get two men and two women.
CREATE TABLE employees
(
id INT,
name VARCHAR (10),
gender VARCHAR (1),
);
INSERT INTO employees VALUES (1, 'Mary', 'F');
INSERT INTO employees VALUES (2, 'Jake', 'M');
INSERT INTO employees VALUES (3, 'Ryan', 'M');
INSERT INTO employees VALUES (4, 'Lola', 'F');
INSERT INTO employees VALUES (5, 'Dina', 'F');
INSERT INTO employees VALUES (6, 'Paul', 'M');
INSERT INTO employees VALUES (7, 'Tina', 'F');
INSERT INTO employees VALUES (8, 'John', 'M');
My attempt is the following:
SELECT TOP 2 *
FROM employees
WHERE gender = 'F'
ORDER BY NEWID()
UNION
SELECT TOP 2 *
FROM employees
WHERE gender = 'M'
ORDER BY NEWID()
But it doesn't work since I can't put two order by in the same query.
Why not just use row_number()? One method without a subquery is:
SELECT TOP (4) WITH TIES e.*
FROM employees e
WHERE gender IN ('M', 'F')
ORDER BY ROW_NUMBER() OVER (PARTITION BY gender ORDER BY newid());
This is slightly less performant than using ROW_NUMBER() in a subquery.
Or, a fun method would use APPLY:
select e.*
from (values ('M'), ('F')) v(gender) cross apply
(select top (2) e.*
from employees e
where e.gender = v.gender
order by newid()
) e;
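
The ROW_NUMBER()-in-a-subquery variant mentioned above is not spelled out, so here is a minimal sketch of what it might look like. It runs on Python's built-in sqlite3 module rather than SQL Server, so RANDOM() stands in for NEWID(); both substitutions are assumptions made for the sake of a runnable example. Window functions require SQLite 3.25 or later.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INT, name VARCHAR(10), gender VARCHAR(1));
    INSERT INTO employees VALUES
        (1, 'Mary', 'F'), (2, 'Jake', 'M'), (3, 'Ryan', 'M'), (4, 'Lola', 'F'),
        (5, 'Dina', 'F'), (6, 'Paul', 'M'), (7, 'Tina', 'F'), (8, 'John', 'M');
""")

# Number the rows of each gender in random order, then keep the first two.
picked = conn.execute("""
    SELECT id, name, gender
    FROM (
        SELECT e.*,
               ROW_NUMBER() OVER (PARTITION BY gender ORDER BY RANDOM()) AS rn
        FROM employees e
    ) AS ranked
    WHERE rn <= 2
""").fetchall()

# Count how many rows were picked per gender.
gender_counts = Counter(gender for _, _, gender in picked)
print(picked)         # four rows; which ones varies between runs
print(gender_counts)  # two 'F' and two 'M'
```

In SQL Server the inner ORDER BY would use NEWID() instead of RANDOM(); the rest of the query uses standard window-function syntax.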
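You cannot attach an ORDER BY to the individual queries combined by a UNION; only one ORDER BY is allowed, at the very end, and it sorts the combined result. However, you can use ORDER BY inside each query if you turn it into a table expression (a derived table).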
For example:
select *
from (
SELECT TOP 2 *
FROM employees
WHERE gender = 'F'
ORDER BY newid()
) x
UNION ALL
select *
from (
SELECT TOP 2 *
FROM employees
WHERE gender = 'M'
ORDER BY newid()
) y
Result (one possible outcome; the rows vary between runs because of NEWID()):
id name gender
--- ----- ------
5 Dina F
4 Lola F
2 Jake M
3 Ryan M
See running example at SQL Fiddle.

How do I insert values into a table with a column-wise uniqueness check?

Create table
CREATE TABLE `my_table`
(
id Uint64,
name String,
PRIMARY KEY (id)
);
Insert values
INSERT INTO `my_table`
( id, name )
VALUES (1, 'name1'),
(2, 'name2'),
(3, 'name3');
#  id  name
0  1   "name1"
1  2   "name2"
2  3   "name3"
How can I add VALUES (4, 'name1') but skip adding VALUES (3, 'name1')?
The available syntax is described here: https://cloud.yandex.com/docs/ydb/yql/reference/syntax/insert_into
From the documentation link that you provided in the comments, I see that the database you use does not support a statement equivalent to INSERT OR IGNORE... to suppress errors if a unique constraint is violated.
As an alternative you can use INSERT ... SELECT.
If your database supports EXISTS:
INSERT INTO my_table
SELECT 3, 'name1'
WHERE NOT EXISTS (SELECT * FROM my_table WHERE id = 3);
Or you can use a LEFT JOIN:
INSERT INTO my_table
SELECT t.id, t.name
FROM (SELECT 3 AS id, 'name1' AS name) AS t
LEFT JOIN my_table AS m
ON m.id = t.id
WHERE m.id IS NULL;
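
To see the INSERT ... SELECT ... WHERE NOT EXISTS pattern behave as intended, here is a small sketch that runs it on Python's built-in sqlite3 module. SQLite is only a stand-in here (an assumption; YDB's YQL may accept a different subset of this syntax), and the helper name insert_if_id_absent is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO my_table (id, name)
    VALUES (1, 'name1'), (2, 'name2'), (3, 'name3');
""")


def insert_if_id_absent(connection, row_id, name):
    """Insert (row_id, name) only if no row with that id exists.

    Returns True when a row was actually inserted.
    """
    changes_before = connection.total_changes
    connection.execute(
        """
        INSERT INTO my_table (id, name)
        SELECT ?, ?
        WHERE NOT EXISTS (SELECT 1 FROM my_table WHERE id = ?)
        """,
        (row_id, name, row_id),
    )
    return connection.total_changes > changes_before


inserted_3 = insert_if_id_absent(conn, 3, "name1")  # id 3 exists, so skipped
inserted_4 = insert_if_id_absent(conn, 4, "name1")  # id 4 is new, so inserted
rows = conn.execute("SELECT id, name FROM my_table ORDER BY id").fetchall()

print(inserted_3, inserted_4)  # False True
print(rows)  # [(1, 'name1'), (2, 'name2'), (3, 'name3'), (4, 'name1')]
```

The check is on id only, matching the answer above. If uniqueness should be enforced on another column, the condition inside NOT EXISTS would change accordingly.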

Distinct with where condition

I have a table as below:
I want to select distinct cities, but if a city is duplicated, return the row that has the maximum ref_id. The result should contain all the columns.
Test data:
DECLARE @t_temp TABLE (ID smallint,
name varchar(10),
city varchar(10),
ref_id smallint);
INSERT INTO @t_temp
VALUES
(1, 'xyz', 'a', 101),
(2, 'pqr', 'a', 102),
(3, 'ijk', 'a', 103),
(4, 'abc', 'b', 104),
(5, 'ahg', 'c', 10);
Actual query:
SELECT ID
, name
, city
, ref_id
FROM (SELECT *
, ROW_NUMBER() OVER (PARTITION BY city ORDER BY ref_id DESC) Ranking
FROM @t_temp) base
WHERE Ranking = 1;
Result:
ID name city ref_id
------ ---------- ---------- ------
3 ijk a 103
4 abc b 104
5 ahg c 10
Basically, what I'm doing is assigning a 'ranking' to all your records, partitioned by city and ordered by ref_id descending, and then retaining only the "number one" record per city. This is an alternative to what Rahul proposed, which is also a valid solution to your problem. The only difference between the two is that Rahul's query returns multiple records if several rows share the same city and the same highest ref_id, whereas the solution above returns only a single record per city. To get the same behavior as Rahul's query, you can change ROW_NUMBER() to RANK() or DENSE_RANK().
Try this:
SELECT tb1.*
FROM Table1 AS tb1
INNER JOIN (
SELECT city, MAX(ref_id) AS ref_id
FROM Table1
GROUP BY city
) AS tb2
ON tb1.city = tb2.city AND tb1.ref_id = tb2.ref_id
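
The difference between ROW_NUMBER() and RANK() only shows up when two rows tie for the highest ref_id in a city. The sketch below makes that visible using Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption, and it needs SQLite 3.25 or later for window functions). The row with ID 6 is hypothetical and was added to create a tie; the helper top_per_city is likewise only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_temp (ID INT, name VARCHAR(10), city VARCHAR(10), ref_id INT);
    INSERT INTO t_temp VALUES
        (1, 'xyz', 'a', 101), (2, 'pqr', 'a', 102), (3, 'ijk', 'a', 103),
        (4, 'abc', 'b', 104), (5, 'ahg', 'c', 10),
        (6, 'tie', 'a', 103);  -- hypothetical extra row that ties with ID 3
""")


def top_per_city(ranking_function):
    """Return the rows ranked first per city by the given window function."""
    # ranking_function is only ever one of the fixed names used below,
    # never user input, so interpolating it into the SQL text is safe here.
    query = f"""
        SELECT ID, name, city, ref_id
        FROM (
            SELECT t.*,
                   {ranking_function}() OVER (
                       PARTITION BY city ORDER BY ref_id DESC
                   ) AS ranking
            FROM t_temp t
        ) AS base
        WHERE ranking = 1
        ORDER BY city, ID
    """
    return conn.execute(query).fetchall()


row_number_rows = top_per_city("ROW_NUMBER")  # exactly one row per city
rank_rows = top_per_city("RANK")              # tied rows are all kept

print(len(row_number_rows))  # 3
print(rank_rows)
# [(3, 'ijk', 'a', 103), (6, 'tie', 'a', 103), (4, 'abc', 'b', 104), (5, 'ahg', 'c', 10)]
```

With ROW_NUMBER(), which of the two tied rows survives for city 'a' is not defined unless a tie-breaker column is added to the ORDER BY.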

PostgreSQL: How to insert multiple values without multiple selects?

I have the problem that I need to insert rows into a table with 2 columns, where one value is constant but fetched from another table, and the other one is the actual content that changes.
Currently I have something like
INSERT INTO table (id, content) VALUES
((SELECT id FROM customers WHERE name = 'Smith'), 1),
((SELECT id FROM customers WHERE name = 'Smith'), 2),
((SELECT id FROM customers WHERE name = 'Smith'), 5),
...
As this is super ugly, how can I do the above in Postgres without the constant SELECT repetition?
Yet another solution:
insert into table (id, content)
select id, unnest(array[1, 2, 5]) from customers where name = 'Smith';
You can cross join the result of the select with your values:
INSERT INTO table (id, content)
select c.id, d.nr
from (
select id
from customers
where name = 'Smith'
) as c
cross join (values (1), (2), (5) ) as d (nr);
This assumes that the name is unique (but so does your original solution).
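
The cross join idea can be tried out quickly without a PostgreSQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in (an assumption; SQLite does not accept the "AS d (nr)" column alias on a VALUES list, so the values go into a CTE instead). The table name customer_content and the sample customers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO customers (id, name) VALUES (7, 'Jones'), (42, 'Smith');
    -- hypothetical target table standing in for "table" in the question
    CREATE TABLE customer_content (id INT, content INT);
""")

# Look up the customer id once and pair it with every content value.
conn.execute("""
    WITH d(nr) AS (VALUES (1), (2), (5))
    INSERT INTO customer_content (id, content)
    SELECT c.id, d.nr
    FROM customers AS c
    CROSS JOIN d
    WHERE c.name = 'Smith'
""")

inserted = conn.execute(
    "SELECT id, content FROM customer_content ORDER BY content"
).fetchall()
print(inserted)  # [(42, 1), (42, 2), (42, 5)]
```

If more than one customer were named 'Smith', this form would insert the full set of values for each of them, whereas the scalar subqueries in the question would raise an error.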
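Well, I believe you can do something like this in PL/pgSQL, wrapped in an anonymous DO block:
DO $$
DECLARE
-- named differently from the id column to avoid ambiguity
customer_id customers.id%TYPE;
BEGIN
SELECT c.id INTO customer_id FROM customers c WHERE c.name = 'Smith';
INSERT INTO table (id, content) VALUES
(customer_id, 1),
(customer_id, 2),
....
END;
$$;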