Adding values to a column in SQL (Snowflake) from another table

I have two tables, A and B.
Table A:
uid category
1 a
1 b
1 c
2 b
2 d
Table B:
category
d
e
Table A contains the user id and category.
Table B contains the 2 categories most often selected by users.
How can I add the categories from table B to the category column in table A for every uid, keeping only distinct values?
Final result
uid category
1 a
1 b
1 c
1 d
1 e
2 b
2 d
2 e

It is possible to generate the missing rows by performing a CROSS JOIN of the distinct UIDs from tableA with the categories from tableB:
WITH cte AS (
SELECT A.UID, B.CATEGORY
FROM (SELECT DISTINCT UID FROM tableA) AS A
CROSS JOIN tableB AS B
)
SELECT A.UID, A.CATEGORY
FROM tableA AS A
UNION ALL
SELECT C.UID, C.CATEGORY
FROM cte AS c
WHERE (c.UID, c.category) NOT IN (SELECT A.UID, A.CATEGORY
FROM tableA AS A)
ORDER BY 1,2;
Sample input:
CREATE OR REPLACE TABLE tableA(uid INT, category TEXT)
AS
SELECT 1,'a' UNION ALL
SELECT 1,'b' UNION ALL
SELECT 1,'c' UNION ALL
SELECT 2,'b' UNION ALL
SELECT 2,'d';
CREATE OR REPLACE TABLE tableB(category TEXT)
AS
SELECT 'd' UNION ALL SELECT 'e';
Output:
uid category
1 a
1 b
1 c
1 d
1 e
2 b
2 d
2 e

Let UNION take care of the duplicates (here t1 and t2 stand for tables A and B):
select uid, category
from t1
union
select uid, category
from (select distinct uid from t1) t1 cross join t2
order by uid, category
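To check the UNION approach against the sample data without a Snowflake account, here is a minimal sketch that runs the same query on an in-memory SQLite database through Python's sqlite3 module. The choice of SQLite and the alias u are assumptions for illustration only; Snowflake-specific behaviour is not tested here.

```python
import sqlite3

# In-memory SQLite database standing in for Snowflake (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (uid INTEGER, category TEXT);
    INSERT INTO tableA VALUES (1,'a'),(1,'b'),(1,'c'),(2,'b'),(2,'d');
    CREATE TABLE tableB (category TEXT);
    INSERT INTO tableB VALUES ('d'),('e');
""")

# UNION (not UNION ALL) removes pairs that already exist in tableA.
rows = conn.execute("""
    SELECT uid, category FROM tableA
    UNION
    SELECT u.uid, b.category
    FROM (SELECT DISTINCT uid FROM tableA) AS u
    CROSS JOIN tableB AS b
    ORDER BY uid, category
""").fetchall()

for uid, category in rows:
    print(uid, category)
```

The pair (2, 'd') appears in both branches and is returned only once, which is the deduplication the question asks for.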

Related

SQL recursive query to find all records in table with two ID columns, where ID can exist in either column

I have a table that basically looks like:
pkid origLegCallIdentifier destLegIdentifier
1 A B
2 A C
3 B D
4 E A
5 F G
6 D H
7 I J
I want to be able to select a PKID and then recursively find all associated rows, that is, rows that share an origLegCallIdentifier or destLegIdentifier value with a row already found. If I selected PKID 6, the end result would be:
pkid origLegCallIdentifier destLegIdentifier
1 A B
2 A C
3 B D
4 E A
6 D H
I've written the following query, which achieves what I want, but it's very slow to run:
select pkid, origLegCallIdentifier, destLegIdentifier
into #tempA
from RawCDR2
where pkid = '185b76e8-f8b2-4fde-a393-24cad6800bc6'
select distinct id
into #tempB
from
(select origLegCallIdentifier as id from #tempA
union
select destLegIdentifier as id from #tempA) as t
while exists (select pkid, origLegCallIdentifier, destLegIdentifier
from rawcdr2 r
where (origLegCallIdentifier in (select id as origLegCallIdentifier from #tempB)
or destLegIdentifier in (select id as destLegIdentifier from #tempB))
and pkid not in (select pkid from #tempA)
)
begin
insert into #tempA (pkid, origLegCallIdentifier, destLegIdentifier)
select pkid, origLegCallIdentifier, destLegIdentifier
from rawcdr2 r
where (origLegCallIdentifier in (select id as origLegCallIdentifier from #tempB)
or destLegIdentifier in (select id as destLegIdentifier from #tempB))
insert into #tempB (id)
select distinct id
from
(select origLegCallIdentifier as id from #tempA
union
select destLegIdentifier as id from #tempA) as t
end
select distinct pkid
into #tempC
from RawCDR2
where origLegCallIdentifier in (select distinct id from #tempB)
or destLegIdentifier in (select id from #tempB)
select
rawcdr2.pkid,
origLegCallIdentifier,
destLegIdentifier
from
#tempC
left join
RawCDR2 on rawcdr2.pkid = #tempC.pkid
drop table #tempA
drop table #tempB
drop table #tempC
I'm sure the performance can be improved using a CTE, but I don't know how.
Any idea how to speed this query up?
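One way to express the loop as a single statement is a recursive CTE that walks the identifiers rather than the rows. The sketch below is an illustration on SQLite (via Python's sqlite3), not SQL Server: the CTE names edges and reach are made up for the example, and the integer pkids come from the sample table rather than the GUIDs in the real data. SQLite allows UNION inside the recursive CTE, which stops the walk from revisiting identifiers; SQL Server requires UNION ALL there, so a port would need its own cycle guard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE RawCDR2 (pkid INTEGER, origLegCallIdentifier TEXT, destLegIdentifier TEXT);
    INSERT INTO RawCDR2 VALUES
        (1,'A','B'),(2,'A','C'),(3,'B','D'),(4,'E','A'),
        (5,'F','G'),(6,'D','H'),(7,'I','J');
""")

query = """
    WITH RECURSIVE
    -- Treat every row as an undirected edge between its two identifiers.
    edges(a, b) AS (
        SELECT origLegCallIdentifier, destLegIdentifier FROM RawCDR2
        UNION
        SELECT destLegIdentifier, origLegCallIdentifier FROM RawCDR2
    ),
    -- Identifiers reachable from the starting row; UNION drops repeats,
    -- so the recursion stops once no new identifier is found.
    reach(id) AS (
        SELECT origLegCallIdentifier FROM RawCDR2 WHERE pkid = ?
        UNION
        SELECT e.b FROM reach AS r JOIN edges AS e ON e.a = r.id
    )
    SELECT pkid, origLegCallIdentifier, destLegIdentifier
    FROM RawCDR2
    WHERE origLegCallIdentifier IN (SELECT id FROM reach)
       OR destLegIdentifier IN (SELECT id FROM reach)
    ORDER BY pkid
"""

linked = conn.execute(query, (6,)).fetchall()
for row in linked:
    print(row)
```

Starting from pkid 6, the walk collects the identifiers D, B, H, A, C and E, so rows 5 and 7 are left out.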

Fill a select with null when join isn't possible

I'm trying to do a select in n tables and show a few columns of each, but sometimes I can't match some columns and instead of getting a line with "null" the entire line is omitted.
For example:
table_a
id ...
1
2
3
table_b
id name ...
1 a1 ...
2 b2 ...
3 c3 ...
table_c
name ...
a1 ...
And then I do the following select:
select
a.id,
c.name
from
table_a a,
table_b b,
table_c c
where
( 1 = 1 )
and a.id = b.id
and b.name = c.name
I'm getting:
id name ...
1 a1 ...
I'm looking for:
id name ...
1 a1 ...
2 null ...
3 null ...
How do I do that? I checked a few answers around including this one but I didn't get how to solve it.
You can use an OUTER JOIN:
SELECT a.id,
c.name
FROM table_a a
LEFT OUTER JOIN table_b b
ON (a.id = b.id)
LEFT OUTER JOIN table_c c
ON (b.name = c.name)
or, depending on precedence of the joins:
SELECT a.id,
c.name
FROM table_a a
LEFT OUTER JOIN (
table_b b
INNER JOIN table_c c
ON (b.name = c.name)
)
ON (a.id = b.id)
Which, for the sample data:
CREATE TABLE table_a (id) AS
SELECT 1 FROM DUAL UNION ALL
SELECT 2 FROM DUAL UNION ALL
SELECT 3 FROM DUAL;
CREATE TABLE table_b (id, name) AS
SELECT 1, 'a1' FROM DUAL UNION ALL
SELECT 2, 'b1' FROM DUAL UNION ALL
SELECT 3, 'c1' FROM DUAL;
CREATE TABLE table_c (name) AS
SELECT 'a1' FROM DUAL;
Would both output:
ID NAME
1 a1
2 null
3 null
You should use a left join. I'm not sure about Oracle specifically, but it would look something like:
select
a.id,
c.name
from
table_a a
LEFT JOIN table_b b ON (a.id = b.id)
LEFT JOIN table_c c ON (b.name = c.name)
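For anyone who wants to try the LEFT JOIN version quickly, here is a small sketch using Python's sqlite3 with the data from the question. SQLite is only a stand-in for Oracle here (an assumption for illustration); the join syntax is standard SQL, and NULL shows up as None on the Python side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER);
    CREATE TABLE table_b (id INTEGER, name TEXT);
    CREATE TABLE table_c (name TEXT);
    INSERT INTO table_a VALUES (1),(2),(3);
    INSERT INTO table_b VALUES (1,'a1'),(2,'b2'),(3,'c3');
    INSERT INTO table_c VALUES ('a1');
""")

# Every row of table_a is kept; unmatched columns from table_c come back as NULL.
joined = conn.execute("""
    SELECT a.id, c.name
    FROM table_a AS a
    LEFT JOIN table_b AS b ON a.id = b.id
    LEFT JOIN table_c AS c ON b.name = c.name
    ORDER BY a.id
""").fetchall()

print(joined)
```

With the comma-style joins from the question, the filter b.name = c.name in the WHERE clause discards ids 2 and 3; moving that condition into a LEFT JOIN's ON clause is what keeps them.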

Multiple Unioned Self-joins in BigQuery

I have a table with id, name and parent_id where parent_id is a parent hierarchy relating to id, see below.
id name parent_id
0 A null
1 B 0
2 C 1
3 D 1
4 E 2
I'm trying to create a nicer looking table with each id and its parent_id, including multiple levels up in the hierarchy. I use UNION and self-join to accomplish this, but I have a feeling there should be a nicer way of querying it with BigQuery's Standard SQL.
In the query below I go two levels, but you can imagine I want to go 5-6 levels.
WITH T1 as (
select 0 as id, 'A' as name, null as parent_id union all
select 1 as id, 'B' as name, 0 as parent_id union all
select 2 as id, 'C' as name, 1 as parent_id union all
select 3 as id, 'D' as name, 1 as parent_id union all
select 4 as id, 'E' as name, 2 as parent_id
)
SELECT
a.id as id,
a.name as req_name,
FROM T1 as a
UNION ALL
SELECT
a.id as id,
b.name as req_name,
FROM T1 as a
JOIN T1 as b ON a.parent_id = b.id
UNION ALL
SELECT
a.id as id,
c.name as req_name,
FROM T1 as a
JOIN T1 as b on a.parent_id = b.id
JOIN T1 as c on b.parent_id = c.id
resulting in the table
id req_name
0 A
1 B
2 C
3 D
4 E
2 A
3 A
4 B
1 A
2 B
3 B
4 C
I would be thankful for any insights!
When this answer was written, BigQuery did not support recursive or hierarchical queries (recursive CTEs via WITH RECURSIVE were added later), so your approach is actually fine. You can condense it, if you like, using left joins:
with t as (
select 0 as id, 'A' as name, null as parent_id union all
select 1 as id, 'B' as name, 0 as parent_id union all
select 2 as id, 'C' as name, 1 as parent_id union all
select 3 as id, 'D' as name, 1 as parent_id union all
select 4 as id, 'E' as name, 2 as parent_id
)
select distinct id, t1.name
from t t1 left join
t t2
on t2.parent_id = t1.id left join
t t3
on t3.parent_id = t2.id cross join
unnest(array[t1.id, t2.id, t3.id]) id
where id is not null;
You still need explicit joins to the maximum depth of the data.
The other alternative is to use a looping construct, which is available in the scripting language.
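Where recursive CTEs are available, the fixed chain of self-joins can be replaced by one recursive query that climbs to any depth. The sketch below runs on SQLite through Python's sqlite3 purely as an illustration; the CTE name anc and its column next_parent are invented for the example, and the syntax has not been checked against BigQuery here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, name TEXT, parent_id INTEGER);
    INSERT INTO t VALUES (0,'A',NULL),(1,'B',0),(2,'C',1),(3,'D',1),(4,'E',2);
""")

# Start with each row paired with its own name, then keep stepping to the parent.
# The join on a NULL next_parent matches nothing, which ends the recursion.
pairs = conn.execute("""
    WITH RECURSIVE anc(id, req_name, next_parent) AS (
        SELECT id, name, parent_id FROM t
        UNION ALL
        SELECT anc.id, p.name, p.parent_id
        FROM anc JOIN t AS p ON p.id = anc.next_parent
    )
    SELECT id, req_name FROM anc
    ORDER BY id, req_name
""").fetchall()

for pair in pairs:
    print(pair)
```

Unlike the two-level UNION in the question, this goes all the way to the root, so id 4 is also paired with A.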

Count on UNION in Oracle

I have 3 tables and I need to get the details from 2 of them where the count of the UNION is greater than 1, but I need to apply certain conditions as well.
Table A
id entity_id name category
1 45 abcd win_1
2 46 efgh win_2
3 47 efgh1 win_2
4 48 dfgh win_5
5 49 adfgh win_4
Table B
id product_id name parent_id
1 P123 asdf win_1
2 P234 adfgh win_4
Table 3 category_list
id cat_id name
1 win_1 Households
2 win_2 Outdoors
3 win_3 Mixed
4 win_4 Omni
Now I need to have the count of UNION from Table A and Table B where they have count of cat_id greater than 1 and Table A.name != Table B.name
The result which I require is
p_id name cat_id
45 abcd win_1
P123 asdf win_1
46 efgh win_2
47 efgh1 win_2
win_5 is excluded because its count is one, and win_4 should be excluded because the name in Table A and Table B is the same.
I have run out of ideas, as I am relatively new to Oracle and databases. Any help is appreciated.
I think you can use exists to ensure that the cat_id is present in both tables
select to_char(entity_id) as p_id, name, category as cat_id -- to_char assumes entity_id is numeric
from table_a a
where exists (select null from table_b where a.category = table_b.parent_id)
union
select product_id, name, parent_id -- table_b has product_id, not entity_id
from table_b b
where exists (select null from table_a where b.parent_id = table_a.category)
I believe you are looking for something like this -
Select T2.*
from
(Select category
from
(Select name, category from TableA
Union all
Select name, parent_id as category from TableB) t
group by category
having count(distinct name) > 1) T1
Join
(Select entity_id as Pid, name, category from TableA
Union
Select product_id as Pid, name, parent_id as category from TableB) T2
ON T1.category = T2.category;
Try this code.
The first CTE (common table expression), list_union, takes from each table the records whose names do not appear in the other table and unions them. The second CTE, list_cnt, counts the rows per category, and the final SELECT keeps the rows with cnt > 1, as in your expected result.
With
list_union AS (
SELECT
id,
----------
TO_CHAR(entity_id) entity_id,
----------
name,
category
FROM table_A a
WHERE NOT EXISTS(SELECT 1 FROM table_B b WHERE a.name=b.name)
----------
UNION ALL
----------
SELECT
id,
product_id,
name,
parent_id
FROM table_B b
WHERE NOT EXISTS(SELECT 1 FROM table_A a WHERE a.name=b.name)
)
,list_cnt AS (
SELECT
l.*,
----------
COUNT(*) over (PARTITION BY category) cnt
----------
FROM list_union l
)
SELECT
entity_id AS p_id,
name,
category AS cat_id
FROM list_cnt
WHERE cnt>1
ORDER BY cat_id ASC, p_id ASC
;
Just use a union all and window functions:
select ab.*
from (select ab.*,
count(distinct name) over (partition by category) as cnt
from ((select a.* from a
) union all
(select b.* from b
)
) ab
) ab
where cnt > 1;
Although you describe the problem as:
Now I need to have the count of UNION from Table A and Table B where they have count of cat_id greater than 1 and Table A.name != Table B.name
You seem to just want cat_ids that have different names across the two tables. Your sample data includes cat_id = 'win_2', which is not even in the second table.
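The "categories with more than one distinct name" reading can be checked against the sample data with a short sketch. It uses SQLite via Python's sqlite3 rather than Oracle, and because SQLite does not accept COUNT(DISTINCT ...) as a window function, the count is done with GROUP BY and HAVING instead, as in the second answer. The CTE name u and the CAST to TEXT are choices made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, entity_id INTEGER, name TEXT, category TEXT);
    CREATE TABLE table_b (id INTEGER, product_id TEXT, name TEXT, parent_id TEXT);
    INSERT INTO table_a VALUES
        (1,45,'abcd','win_1'),(2,46,'efgh','win_2'),(3,47,'efgh1','win_2'),
        (4,48,'dfgh','win_5'),(5,49,'adfgh','win_4');
    INSERT INTO table_b VALUES (1,'P123','asdf','win_1'),(2,'P234','adfgh','win_4');
""")

# Stack both tables, then keep only categories that carry more than one distinct name.
result = conn.execute("""
    WITH u AS (
        SELECT CAST(entity_id AS TEXT) AS p_id, name, category AS cat_id FROM table_a
        UNION ALL
        SELECT product_id, name, parent_id FROM table_b
    )
    SELECT p_id, name, cat_id
    FROM u
    WHERE cat_id IN (
        SELECT cat_id FROM u GROUP BY cat_id HAVING COUNT(DISTINCT name) > 1
    )
    ORDER BY cat_id, p_id
""").fetchall()

for row in result:
    print(row)
```

win_4 drops out because both tables use the name adfgh for it, and win_5 drops out because it has a single row.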

Getting all the values in one query that aren't in another with a group by

Given that I am using Redshift, how would I get the counts for a query that asks:
Given table A and table B, give me, for each grouping, the count of values in table A that aren't in table B;
So if table A and B look like:
Table A
Id | Value
==========
1 | "A"
1 | "B"
2 | "C"
And table B:
Id | Value
==========
1 | "A"
1 | "D"
2 | "C"
I would want:
Id | Count
==========
1 | 1
2 | 0
You can use left join and group by:
select a.id, sum( (b.id is null)::int )
from a left join
b
on a.id = b.id and a.value = b.value
group by a.id;
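A quick way to sanity-check the left join idea on the sample data is the sketch below, which uses SQLite through Python's sqlite3 as a stand-in for Redshift (an assumption for illustration). The boolean cast is written as a CASE expression so it does not depend on the ::int syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, value TEXT);
    CREATE TABLE b (id INTEGER, value TEXT);
    INSERT INTO a VALUES (1,'A'),(1,'B'),(2,'C');
    INSERT INTO b VALUES (1,'A'),(1,'D'),(2,'C');
""")

# Rows of a with no partner in b have b.id NULL after the left join;
# summing a 1 for each of them gives the per-id count, including zeros.
counts = conn.execute("""
    SELECT a.id,
           SUM(CASE WHEN b.id IS NULL THEN 1 ELSE 0 END) AS cnt
    FROM a
    LEFT JOIN b ON a.id = b.id AND a.value = b.value
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()

print(counts)
```

Because the grouping is done on table a, ids whose values all exist in b still appear with a count of 0, which a plain EXCEPT followed by COUNT would not give.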
Use except and subquery
with a as
(
select 1 as id, 'A' as v
union all
select 1,'B'
union all
select 2,'C'
),b as
(
select 1 as id, 'A' as v
union all
select 1,'D'
union all
select 2,'C'
), c as
(
select id,v from a except select id,v from b
)
select id, sum((select count(*) from c where c.id=a.id and c.v=a.v)) as cnt
from a group by id
output
id cnt
1 1
2 0