Hierarchical query with some joins - sql

I am struggling to write a well-performing query that combines the data from one sub-select with hierarchically retrieved data from another table, based on the rows from that first sub-select.
So, I have some data retrieved from multiple tables with joins, which finally boils down to the following:
CREATE TABLE TBL1 (UUID, MiscData, KeyToLookup, ConditionClause ) AS
SELECT 13, 'ATM', 12345, null FROM DUAL UNION ALL
SELECT 447, 'Balance Inquiry', 67890, 'BALANCE_INQUIRY_FEE' FROM DUAL UNION ALL
SELECT 789, 'Credit', 22321, 'CREDIT_FEE' FROM DUAL;
Now, I have another table which stores the hierarchical structure of fees:
CREATE TABLE TBL2 ( TariffDomainID, FeeType, UpperTariffDomainID, ID ) AS
SELECT 1543, 'WHATEVER_FEE', 154, 1 FROM DUAL UNION ALL
SELECT 1543, 'BALANCE_INQUIRY_FEE', 154, 2 FROM DUAL UNION ALL
SELECT 154, 'SMTHELSE_FEE', 15, 3 FROM DUAL UNION ALL
SELECT 154, 'CREDIT_FEE', 15, 4 FROM DUAL UNION ALL
SELECT 15, 'BALANCE_INQUIRY_FEE', null, 5 FROM DUAL;
There is also a way to link the first selection to the lowest row in the hierarchy of the second table. It takes a few joins, but in the end it looks like this:
CREATE TABLE TBL3 ( ID, FirstTblKey, SecondTblKey ) AS
SELECT 1, 67890, 1543 FROM DUAL UNION ALL
SELECT 2, 22321, 1543 FROM DUAL;
The important point is that it's not guaranteed there will be a row with this KeyToLookup directly in the second table, as directed by the TBL3.
E.g. in the example above:
row TBL1.UUID=789 is linked via TBL3 to TBL2 row with TariffDomainID=1543,
but there is no row in TBL2 with TariffDomainID=1543 and FeeType=CREDIT_FEE;
however TBL2 contains a link to the same table but upper level, UpperTariffDomainID=154,
and there is a row in TBL2 with TariffDomainID=154 and FeeType=CREDIT_FEE.
In the end I need to connect the info from TBL1 with all occurrences of this key in TBL2 hierarchically, numbered by depth in the hierarchy.
So I expect to get this:
| UUID | MiscData | KeyToLookup | ConditionClause | TariffDomainIDWithPresence | Depth |
|------|-----------------|-------------|---------------------|----------------------------|-------|
| 13 | ATM | 12345 | null | null | null |
| 447 | Balance Inquiry | 67890 | BALANCE_INQUIRY_FEE | 1543 | 1 |
| 447 | Balance Inquiry | 67890 | BALANCE_INQUIRY_FEE | 15 | 3 |
| 789 | Credit | 22321 | CREDIT_FEE | 154 | 2 |
Could anyone please teach me how to make such a hierarchical query?

You can use a hierarchical query joined to the other two tables:
SELECT DISTINCT
t1.uuid,
t1.miscdata,
t1.keytolookup,
t1.conditionclause,
t2.tariffdomainid,
t2.depth
FROM tbl1 t1
LEFT OUTER JOIN tbl3 t3
ON ( t1.keytolookup = t3.firsttblkey )
OUTER APPLY (
SELECT tariffdomainid,
LEVEL AS depth
FROM tbl2 t2
WHERE t2.tariffdomainid = t3.secondtblkey
START WITH
t2.feetype = t1.conditionclause
CONNECT BY
PRIOR TariffDomainID = UpperTariffDomainID
) t2
ORDER BY
uuid,
depth
Which, for the sample data, outputs:
UUID | MISCDATA | KEYTOLOOKUP | CONDITIONCLAUSE | TARIFFDOMAINID | DEPTH
---: | :-------------- | ----------: | :------------------ | -------------: | ----:
13 | ATM | 12345 | null | null | null
447 | Balance Inquiry | 67890 | BALANCE_INQUIRY_FEE | 1543 | 1
447 | Balance Inquiry | 67890 | BALANCE_INQUIRY_FEE | 1543 | 3
789 | Credit | 22321 | CREDIT_FEE | 1543 | 2
(Note: you need the DISTINCT as there are multiple 1543 and 154 entries in TBL2 so the hierarchical query can take multiple paths to get from the start to the end condition. If your actual data does not have these duplicates then you should be able to remove the DISTINCT clause.)
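As a cross-check, here is a minimal sketch of the same lookup that walks upward from the linked domain and reports the domain where the fee type is actually found. It is not the query above: it swaps CONNECT BY for a recursive CTE and runs on SQLite through Python's sqlite3 module, purely so that it is self-contained. The CTE names domains and chain are made up for this illustration. The same idea should carry over to Oracle's recursive subquery factoring with minor syntax changes, but that is an assumption that has not been tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl1 (uuid INTEGER, miscdata TEXT, keytolookup INTEGER, conditionclause TEXT);
INSERT INTO tbl1 VALUES
  (13,  'ATM',             12345, NULL),
  (447, 'Balance Inquiry', 67890, 'BALANCE_INQUIRY_FEE'),
  (789, 'Credit',          22321, 'CREDIT_FEE');
CREATE TABLE tbl2 (tariffdomainid INTEGER, feetype TEXT, uppertariffdomainid INTEGER, id INTEGER);
INSERT INTO tbl2 VALUES
  (1543, 'WHATEVER_FEE',        154,  1),
  (1543, 'BALANCE_INQUIRY_FEE', 154,  2),
  (154,  'SMTHELSE_FEE',        15,   3),
  (154,  'CREDIT_FEE',          15,   4),
  (15,   'BALANCE_INQUIRY_FEE', NULL, 5);
CREATE TABLE tbl3 (id INTEGER, firsttblkey INTEGER, secondtblkey INTEGER);
INSERT INTO tbl3 VALUES (1, 67890, 1543), (2, 22321, 1543);
""")

query = """
WITH RECURSIVE
-- One row per domain, so duplicate TBL2 rows cannot multiply the paths.
domains (tariffdomainid, uppertariffdomainid) AS (
    SELECT DISTINCT tariffdomainid, uppertariffdomainid FROM tbl2
),
-- Start at the domain TBL3 points to (depth 1) and climb to its parents.
chain (uuid, tariffdomainid, uppertariffdomainid, depth) AS (
    SELECT t1.uuid, d.tariffdomainid, d.uppertariffdomainid, 1
    FROM tbl1 t1
    JOIN tbl3 t3 ON t3.firsttblkey = t1.keytolookup
    JOIN domains d ON d.tariffdomainid = t3.secondtblkey
    UNION ALL
    SELECT c.uuid, d.tariffdomainid, d.uppertariffdomainid, c.depth + 1
    FROM chain c
    JOIN domains d ON d.tariffdomainid = c.uppertariffdomainid
)
SELECT t1.uuid, t1.miscdata, t1.keytolookup, t1.conditionclause,
       f.tariffdomainid, f.depth
FROM tbl1 t1
LEFT JOIN (
    -- Attach the fee types that exist at each domain on the chain.
    SELECT c.uuid, c.tariffdomainid, c.depth, t2.feetype
    FROM chain c
    JOIN tbl2 t2 ON t2.tariffdomainid = c.tariffdomainid
) f ON f.uuid = t1.uuid AND f.feetype = t1.conditionclause
ORDER BY t1.uuid, f.depth
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With the sample data this returns the fee domains 1543 (depth 1) and 15 (depth 3) for UUID 447, and 154 (depth 2) for UUID 789, while UUID 13 keeps NULLs. Deduplicating the domains first is a design choice that avoids needing DISTINCT on the final result.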


All records from first table and extra records from second table

I have 2 tables, and I want to take all records from the 1st table plus the extra records from the 2nd table.
Table A
+-----+---------+---------+
| ID | NAME | TASK |
+-----+---------+---------+
| 101 | Alan | Prepare |
+-----+---------+---------+
| 102 | Fabien | Approve |
+-----+---------+---------+
| 103 | Christy | Plan |
+-----+---------+---------+
| 104 | David | Approve |
+-----+---------+---------+
| 105 | Eric | Set |
+-----+---------+---------+
Table B
+-----+---------+---------+
| ID | NAME | TASK |
+-----+---------+---------+
| 101 | Richy | Prepare |
+-----+---------+---------+
| 103 | Girish | Plan |
+-----+---------+---------+
| 106 | Fleming | Approve |
+-----+---------+---------+
| 107 | Ian | Set |
+-----+---------+---------+
Expected output
+-----+---------+---------+
| ID | NAME | TASK |
+-----+---------+---------+
| 101 | Alan | Prepare |
+-----+---------+---------+
| 102 | Fabien | Approve |
+-----+---------+---------+
| 103 | Christy | Plan |
+-----+---------+---------+
| 104 | David | Approve |
+-----+---------+---------+
| 105 | Eric | Set |
+-----+---------+---------+
| 106 | Fleming | Approve |
+-----+---------+---------+
| 107 | Ian | Set |
+-----+---------+---------+
I have tried using LEFT JOIN, but I only get the records from the left table.
select * from A left join B on A.ID=B.ID and B.ID is NULL
I have also tried UNION and UNION ALL, but since NAME can differ between the 2 tables I get both records. One solution could be NOT IN, but that would be heavy for me, because tables A and B here stand in for big queries. I don't know what I'm missing. It should be very simple, but it is not coming to me right now. Please help.
I'm thinking a union with the help of ROW_NUMBER and a computed column:
WITH cte AS (
SELECT ID, NAME, TASK, 1 AS SRC FROM TableA
UNION ALL
SELECT ID, NAME, TASK, 2 FROM TableB
),
cte2 AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY SRC) rn
FROM cte
)
SELECT ID, NAME, TASK
FROM cte2
WHERE rn = 1;
The idea here is to build an intermediate table containing all records from both tables. We introduce a computed column which keeps track of the table source, and give A records a higher priority than B records. Using ROW_NUMBER allows us to select the A records over B records having the same ID.
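To see the priority rule in action, here is a minimal sketch that runs the same query against the sample rows from the question. It uses SQLite through Python's sqlite3 module only to be self-contained (the original question does not name a DBMS), and it assumes a SQLite build with window-function support (3.25 or later). The final ORDER BY is added just to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (ID INTEGER, NAME TEXT, TASK TEXT);
INSERT INTO TableA VALUES
  (101, 'Alan', 'Prepare'), (102, 'Fabien', 'Approve'), (103, 'Christy', 'Plan'),
  (104, 'David', 'Approve'), (105, 'Eric', 'Set');
CREATE TABLE TableB (ID INTEGER, NAME TEXT, TASK TEXT);
INSERT INTO TableB VALUES
  (101, 'Richy', 'Prepare'), (103, 'Girish', 'Plan'),
  (106, 'Fleming', 'Approve'), (107, 'Ian', 'Set');
""")

rows = conn.execute("""
WITH cte AS (
    -- SRC records which table a row came from; lower value = higher priority.
    SELECT ID, NAME, TASK, 1 AS SRC FROM TableA
    UNION ALL
    SELECT ID, NAME, TASK, 2 FROM TableB
),
cte2 AS (
    -- Within each ID, the TableA row (SRC = 1) is numbered first if it exists.
    SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY SRC) AS rn
    FROM cte
)
SELECT ID, NAME, TASK
FROM cte2
WHERE rn = 1
ORDER BY ID
""").fetchall()

for row in rows:
    print(row)
```

IDs 101 and 103 exist in both tables, and the TableA names (Alan, Christy) win; IDs 106 and 107 exist only in TableB and are carried through.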
A full outer join will work, as it returns all the matching and non-matching records from both tables:
;with tablea as
(
select 101 as id, 'Alan' name, 'Prepare ' as task
union select 102 , 'Fabien' , 'Approve'
union select 103 , 'Christy' , 'Plan '
union select 104 , 'David' , 'Approve '
union select 105 , 'Eric' , 'Set ')
,tableb as (
select 101 as ID ,'Richy ' as NAME ,' Prepare ' as TASK
union select 103 ,'Girish ',' Plan '
union select 106 ,'Fleming',' Approve '
union select 107 ,'Ian ',' Set '
)
select isnull(a.id,b.id) as id, isnull(a.name,b.name) as name, isnull(a.task,b.TASK) from tablea a
full outer join tableb b on a.id = b.ID

SQL blank rows between rows

I am trying to output a blank row after each row.
For example:
SELECT id,job,amount FROM table
+----+-----+--------+
| id | job | amount |
+----+-----+--------+
| 1 | 100 | 123 |
| 2 | 200 | 321 |
| 3 | 300 | 421 |
+----+-----+--------+
To the following:
+----+-----+--------+
| id | job | amount |
+----+-----+--------+
| 1 | 100 | 123 |
| | | |
| 2 | 200 | 321 |
| | | |
| 3 | 300 | 421 |
+----+-----+--------+
I know I can do similar things with a UNION like:
SELECT null AS id, null AS job, null AS amount
UNION
SELECT id,job,amount FROM table
Which would give me a blank row at the beginning, but for the life of me I can't figure out how to do it every second row. A nested SELECT/UNION? - Have tried but nothing seemed to work.
The DBMS is SQL Server 2016
This is an awkward requirement that would most probably be better handled on the application side. Here is, however, one way to do it:
select id, job, amount
from (
select id, job, amount, id order_by from mytable
union all
select null, null, null, id from mytable
) t
order by order_by, id desc
The trick is to add an additional column to the unioned query that keeps track of the original id and can be used to sort the records in the outer query. You can then use id desc as the second sorting criterion, which puts the null row after the real row for each id.
Demo on DB Fiddle:
with mytable as (
select 1 id, 100 job, 123 amount
union all select 2, 200, 321
union all select 3, 300, 421
)
select id, job, amount
from (
select id, job, amount, id order_by from mytable
union all
select null, null, null, id from mytable
) t
order by order_by, id desc;
id | job | amount
---: | ---: | -----:
1 | 100 | 123
null | null | null
2 | 200 | 321
null | null | null
3 | 300 | 421
null | null | null
In SQL Server, you can just use apply:
select v.id, v.job, v.amount
from t cross apply
(values (id, job, amount, id, 1),
(null, null, null, id, 2)
) v(id, job, amount, ord1, ord2)
order by ord1, ord2;

Add nested column to BigQuery table, joining on value of another nested column in standard SQL

I have a reasonably complex dataset being pulled into a BigQuery table via an Airflow DAG which cannot easily be adjusted.
This job pulls data into a table with this format:
| Line_item_id | Device |
|--------------|----------------|
| 123 | 202; 5; 100 |
| 124 | 100; 2 |
| 135 | 504; 202; 2 |
At the moment, I am using this query (written in standard SQL within the BQ Web UI) to split the device ids into individual nested rows:
SELECT
Line_item_id,
ARRAY(SELECT AS STRUCT(SPLIT(RTRIM(Device,';'),'; '))) as Device,
Output:
| Line_item_id | Device |
|--------------|--------|
| 123 | 202 |
| | 203 |
| | 504 |
| 124 | 102 |
| | 2 |
| 135 | 102 |
The difficulty I am facing is I have a separate match table containing the device ids and their corresponding names. I need to add the device names to the above table, as nested values next to their corresponding ids.
The match table looks something like this (with many more rows):
| Device_id | Device_name |
|-----------|-------------|
| 202 | Smartphone |
| 203 | AppleTV |
| 504 | Laptop |
The ideal output I am looking for would be:
| Line_item_id | Device_id | Device_name |
|--------------|-----------|-------------|
| 123 | 202 | Android |
| | 203 | AppleTV |
| | 504 | Laptop |
| 124 | 102 | iphone |
| | 2 | Unknown |
| 135 | 102 | iphone |
If anybody knows how to achieve this I would be grateful for help.
EDIT:
Gordon's solution works perfectly, but in addition to this, if anybody wants to re-nest the data afterwards (so you effectively end up with the same table and additional nested rows), this was the query I finally ended up with:
select t.line_item_id, ARRAY_AGG(STRUCT(d as id, ot.name as name)) as device
from first_table t cross join
unnest(split(Device, '; ')) d join
match_table ot
on ot.id = d
GROUP BY line_item_id
You can move the parsing logic to the from clause and then join in what you want:
select *
from (select 124 as line_item_id, '203; 100; 6; 2' as device) t cross join
unnest(split(device, '; ')) d join
other_table ot
on ot.device = d;
Below is for BigQuery Standard SQL. No GROUP BY required ...
#standardSQL
SELECT * EXCEPT(Device),
ARRAY(
SELECT AS STRUCT Device_id AS id, Device_name AS name
FROM UNNEST(SPLIT(REPLACE(Device, ' ', ''), ';')) Device_id WITH OFFSET
JOIN `project.dataset.devices`
USING(Device_id)
ORDER BY OFFSET
) Device
FROM `project.dataset.items`
To test, I applied the query to the sample data from your question, declared as follows:
WITH `project.dataset.items` AS (
SELECT 123 Line_item_id, '202; 5; 100' Device UNION ALL
SELECT 124, '100; 2' UNION ALL
SELECT 135, '504; 202; 2'
), `project.dataset.devices` AS (
SELECT '202' Device_id, 'Smartphone' Device_name UNION ALL
SELECT '203', 'AppleTV' UNION ALL
SELECT '504', 'Laptop' UNION ALL
SELECT '5', 'abc' UNION ALL
SELECT '100', 'xyz' UNION ALL
SELECT '2', 'zzz'
)
What you need is to UNNEST the contents of your devices array, and then to roll it back up after joining with the devices metatable:
select
line_item_id,
array_agg(struct(device_id as device_id, device_name as device_name)) as devices
from (
select
d.line_item_id,
device_id,
n.device_name
from `mydataset.basetable` d, unnest(d.device_ids) as device_id
left join `mydataset.devices_table` n on n.device_id = device_id
)
group by line_item_id
Hope this helps.
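For readers who want to see the data flow without a BigQuery project at hand, here is a plain-Python sketch of the same three steps: split the string, look up each id, and re-nest per line item. It is not BigQuery code. The function name nest_devices and the "Unknown" fallback for unmatched ids are assumptions made for this illustration (the fallback mirrors the "Unknown" entry in the question's ideal output), and the lookup uses only the three match-table rows shown in the question.

```python
# Sample rows from the question: (Line_item_id, Device).
items = [
    (123, "202; 5; 100"),
    (124, "100; 2"),
    (135, "504; 202; 2"),
]

# The three match-table rows shown in the question (Device_id -> Device_name).
device_names = {"202": "Smartphone", "203": "AppleTV", "504": "Laptop"}


def nest_devices(items, device_names, default="Unknown"):
    """Split each Device string, look up every id, and re-nest per line item."""
    nested = {}
    for line_item_id, device in items:
        # Equivalent of UNNEST(SPLIT(...)): one element per id, order preserved.
        ids = [part.strip() for part in device.split(";") if part.strip()]
        # Equivalent of a LEFT JOIN plus ARRAY_AGG(STRUCT(id, name)).
        nested[line_item_id] = [
            {"id": device_id, "name": device_names.get(device_id, default)}
            for device_id in ids
        ]
    return nested


result = nest_devices(items, device_names)
for line_item_id, devices in result.items():
    print(line_item_id, devices)
```

Using a lookup with a default behaves like a left join, so ids missing from the match table are kept rather than dropped, which an inner join would do.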

Oracle: How can I pivot an EAV table with a dynamic cardinality for certain keys?

I have the following Entity–attribute–value (EAV) table in Oracle:
| ID | Key | Value |
|----|-------------|--------------|
| 1 | phone_num_1 | 111-111-1111 |
| 1 | phone_num_2 | 222-222-2222 |
| 1 | contact_1 | friend |
| 1 | contact_2 | family |
| 1 | first_name | mike |
| 1 | last_name | smith |
| 2 | phone_num_1 | 333-333-3333 |
| 2 | phone_num_2 | 444-444-4444 |
| 2 | contact_1 | family |
| 2 | contact_2 | friend |
| 2 | first_name | john |
| 2 | last_name | adams |
| 3 | phone_num_1 | 555-555-5555 |
| 3 | phone_num_2 | 666-666-6666 |
| 3 | phone_num_3 | 777-777-7777 |
| 3 | contact_1 | work |
| 3 | contact_2 | family |
| 3 | contact_3 | friend |
| 3 | first_name | mona |
| 3 | last_name | lisa |
Notice that some keys are indexed and therefore have an association with other indexed keys. For example, phone_num_1 is to be associated with contact_1.
Note: There is no hard limit to the number of indexes. There can be 10, 20, or even 50 phone_num_*, but it's guaranteed that for each phone_num_N, there is a corresponding contact_N
This is my desired result:
| ID | Phone_Num | Contact | First_Name | Last_Name |
|----|--------------|---------|------------|-----------|
| 1 | 111-111-1111 | friend | mike | smith |
| 1 | 222-222-2222 | family | mike | smith |
| 2 | 333-333-3333 | family | john | adams |
| 2 | 444-444-4444 | friend | john | adams |
| 3 | 555-555-5555 | work | mona | lisa |
| 3 | 666-666-6666 | family | mona | lisa |
| 3 | 777-777-7777 | friend | mona | lisa |
What have I tried/looked at:
I have looked into the pivot function of Oracle; however, I don't believe that can solve my problem since I don't have a fixed number of attributes that I want to pivot on.
I've looked at these posts:
SQL Query to return multiple key value pairs from a single table in one row
Pivot rows to columns without aggregate
Question:
Is what I'm trying to accomplish at all possible purely with SQL? If so, how can it be done? If not, please explain why.
Any help is much appreciated and here's the with table to help you get started:
with
table_1 ( id, key, value ) as (
select 1,'phone_num_1','111-111-1111' from dual union all
select 1,'phone_num_2','222-222-2222' from dual union all
select 1,'contact_1','friend' from dual union all
select 1,'contact_2','family' from dual union all
select 1,'first_name','mike' from dual union all
select 1,'last_name','smith' from dual union all
select 2,'phone_num_1','333-333-3333' from dual union all
select 2,'phone_num_2','444-444-4444' from dual union all
select 2,'contact_1','family' from dual union all
select 2,'contact_2','friend' from dual union all
select 2,'first_name','john' from dual union all
select 2,'last_name','adams' from dual union all
select 3,'phone_num_1','555-555-5555' from dual union all
select 3,'phone_num_2','666-666-6666' from dual union all
select 3,'phone_num_3','777-777-7777' from dual union all
select 3,'contact_1','work' from dual union all
select 3,'contact_2','family' from dual union all
select 3,'contact_3','friend' from dual union all
select 3,'first_name','mona' from dual union all
select 3,'last_name','lisa' from dual
)
select * from table_1;
This is not a dynamic pivot as you have a fixed set of keys - you just need to separate the enumeration of the keys from the keys themselves first.
You need to:
Separate the phone_num and contact key prefixes from the enumerated item; then
Pivot the common keys that have no enumeration so that they are associated with each enumerated key; and finally,
Pivot a second time to get the enumerated keys in a row together.
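The three steps above can be sketched outside Oracle as well. The following is a minimal illustration on SQLite through Python's sqlite3 module, using the rows for IDs 1 and 3 from the question. SQLite has no PIVOT clause, so both pivots are written as conditional aggregation (MAX over CASE) instead. The CTE names split and common and the column name base_key are made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (id INTEGER, key TEXT, value TEXT)")
conn.executemany("INSERT INTO table_1 VALUES (?, ?, ?)", [
    (1, "phone_num_1", "111-111-1111"), (1, "phone_num_2", "222-222-2222"),
    (1, "contact_1", "friend"), (1, "contact_2", "family"),
    (1, "first_name", "mike"), (1, "last_name", "smith"),
    (3, "phone_num_1", "555-555-5555"), (3, "phone_num_2", "666-666-6666"),
    (3, "phone_num_3", "777-777-7777"),
    (3, "contact_1", "work"), (3, "contact_2", "family"), (3, "contact_3", "friend"),
    (3, "first_name", "mona"), (3, "last_name", "lisa"),
])

rows = conn.execute("""
WITH split AS (
    -- Step 1: separate the key prefix from its enumeration number.
    -- 'phone_num_' is 10 characters long and 'contact_' is 8.
    SELECT id,
           CASE WHEN key LIKE 'phone_num_%' THEN 'phone_num'
                WHEN key LIKE 'contact_%'   THEN 'contact'
                ELSE key END AS base_key,
           CASE WHEN key LIKE 'phone_num_%' THEN CAST(substr(key, 11) AS INTEGER)
                WHEN key LIKE 'contact_%'   THEN CAST(substr(key, 9) AS INTEGER)
           END AS item,
           value
    FROM table_1
),
common AS (
    -- Step 2: pivot the keys without enumeration, one row per id.
    SELECT id,
           MAX(CASE WHEN base_key = 'first_name' THEN value END) AS first_name,
           MAX(CASE WHEN base_key = 'last_name'  THEN value END) AS last_name
    FROM split
    GROUP BY id
)
-- Step 3: pivot the enumerated keys so each (id, item) pair becomes one row.
SELECT s.id,
       MAX(CASE WHEN s.base_key = 'phone_num' THEN s.value END) AS phone_num,
       MAX(CASE WHEN s.base_key = 'contact'   THEN s.value END) AS contact,
       c.first_name,
       c.last_name
FROM split s
JOIN common c ON c.id = s.id
WHERE s.item IS NOT NULL
GROUP BY s.id, s.item, c.first_name, c.last_name
ORDER BY s.id, s.item
""").fetchall()

for row in rows:
    print(row)
```

Because the set of key prefixes is fixed, the number of phone_num_N entries per id can grow without any change to the query.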
Oracle Setup:
CREATE TABLE table_1 ( id, key, value ) as
select 1,'phone_num_1','111-111-1111' from dual union all
select 1,'phone_num_2','222-222-2222' from dual union all
select 1,'contact_1','friend' from dual union all
select 1,'contact_2','family' from dual union all
select 1,'first_name','mike' from dual union all
select 1,'last_name','smith' from dual union all
select 2,'phone_num_1','333-333-3333' from dual union all
select 2,'phone_num_2','444-444-4444' from dual union all
select 2,'contact_1','family' from dual union all
select 2,'contact_2','friend' from dual union all
select 2,'first_name','john' from dual union all
select 2,'last_name','adams' from dual union all
select 3,'phone_num_1','555-555-5555' from dual union all
select 3,'phone_num_2','666-666-6666' from dual union all
select 3,'phone_num_3','777-777-7777' from dual union all
select 3,'contact_1','work' from dual union all
select 3,'contact_2','family' from dual union all
select 3,'contact_3','friend' from dual union all
select 3,'first_name','mona' from dual union all
select 3,'last_name','lisa' from dual
Query:
SELECT *
FROM (
SELECT id,
CASE
WHEN key LIKE 'phone_num_%' THEN 'phone_num'
WHEN key LIKE 'contact_%' THEN 'contact'
ELSE key
END AS key,
CASE
WHEN key LIKE 'phone_num_%'
OR key LIKE 'contact_%'
THEN TO_NUMBER( SUBSTR( key, INSTR( key, '_', -1 ) + 1 ) )
ELSE NULL
END AS item,
value,
MAX( CASE key WHEN 'first_name' THEN value END )
OVER ( PARTITION BY id ) AS first_name,
MAX( CASE key WHEN 'last_name' THEN value END )
OVER ( PARTITION BY id ) AS last_name
FROM table_1
)
PIVOT( MAX( value ) FOR key IN ( 'contact' AS contact, 'phone_num' AS phone_num ) )
WHERE item IS NOT NULL
ORDER BY id, item
Output:
ID | ITEM | FIRST_NAME | LAST_NAME | CONTACT | PHONE_NUM
-: | ---: | :--------- | :-------- | :------ | :-----------
1 | 1 | mike | smith | friend | 111-111-1111
1 | 2 | mike | smith | family | 222-222-2222
2 | 1 | john | adams | family | 333-333-3333
2 | 2 | john | adams | friend | 444-444-4444
3 | 1 | mona | lisa | work | 555-555-5555
3 | 2 | mona | lisa | family | 666-666-6666
3 | 3 | mona | lisa | friend | 777-777-7777
If you can refactor the table then a simple improvement would be to add an extra column to hold the enumeration of the keys and use NULL when it is a value common to every enumeration:
CREATE TABLE table_1 ( id, key, line, value ) as
select 1, 'phone_num', 1, '111-111-1111' from dual union all
select 1, 'phone_num', 2, '222-222-2222' from dual union all
select 1, 'contact', 1, 'friend' from dual union all
select 1, 'contact', 2, 'family' from dual union all
select 1, 'first_name', NULL, 'mike' from dual union all
select 1, 'last_name', NULL, 'smith' from dual
Then your set of keys is always fixed and you do not need to extract the enumeration value from the key.
This is ugly, but I think it does what you need:
select t1.* , t2.value, t3.n, t3.f
from table_1 t1
inner join table_1 t2 on t1.id = t2.id and REPLACE(t1.key, 'phone_num_', '') = REPLACE(t2.key, 'contact_', '')
inner join (
select ID, min(case when Key = 'first_name' then Value end) as n, min(case when Key = 'last_name' then Value end) as f
from table_1
group by ID
) t3 on t1.id = t3.id
where
t1.Key not in('first_name','last_name')
SELECT id,
phone,
contact,
first_value(last) IGNORE NULLS over (partition BY id order by id DESC range BETWEEN CURRENT row AND unbounded following ) last_name,
first_value(FIRST) IGNORE NULLS over (partition BY id order by id DESC range BETWEEN CURRENT row AND unbounded following ) first_name
FROM
(SELECT id,
value,
row_number() over ( partition BY id,SUBSTR(KEY,1 ,instr(KEY,'_',1)-1) order by KEY) rn,
SUBSTR(KEY,1 ,instr(KEY,'_',1) -1) KEY
FROM table_1
) pivot ( MAX(value) FOR KEY IN ( 'phone' AS phone,'last' AS last,'first' AS FIRST,'contact' AS contact))
ORDER BY id;

How can I omit groups of records where the attribute is based on a 'like' clause across 2 different fields

I need a query to omit groups where the number 16 appears in two different records and in different attributes within the group. Basically, if we have a 16 in different attributes on different records, then we know what accounts for these groups, and no further analysis is needed on them. We would like to keep groups where 16 occurs in only one record (in either attribute), groups where 16 does not occur at all, and groups that contain nulls but do not have 16 in 2 records in different attributes.
Here is an example:
---------------------------------------------
| groupid | category | test_results |
---------------------------------------------
| 001 | red13tall | |
| 001 | | blue16small |
| 002 | green16small| |
| 002 | | blue16small |
| 003 | yellow3tall | |
| 003 | | green2giant |
| 004 | orange16tall | |
| 004 | | blue16tall |
| 005 | red16short | |
| 005 | green12bald | orange14tall |
| 006 | blue3short | red16big |
| 006 | green16flat | |
---------------------------------------------
This is the result we are looking for:
---------------------------------------------
| groupid | category | test_results |
---------------------------------------------
| 001 | red13tall | |
| 001 | | blue16small |
| 003 | yellow3tall | |
| 003 | | green2giant |
| 005 | red16short | |
| 005 | green12bald | orange14tall |
------------------------------------------
Assuming your table is called your_table and has a primary key of id, then
SELECT t3.groupid, t3.category, t3.test_results
FROM your_table t3
WHERE t3.groupid NOT IN (
SELECT t1.groupid
FROM your_table t1, your_table t2
WHERE t1.id <> t2.id
AND t1.groupid = t2.groupid
AND t1.category LIKE '%16%'
AND t2.test_results LIKE '%16%'
)
Note, this assumes you're looking for 16 to appear in two different rows in the 2 different columns. If you don't care if they appear in the same row then you can remove the t1.id <> t2.id condition.
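Here is a minimal sketch that runs this query against the sample rows from the question, using SQLite through Python's sqlite3 module so that it is self-contained. The id primary key is the assumption stated above; in the sketch it is simply auto-assigned in insertion order. The comma join is written as an explicit JOIN, which is equivalent, and an ORDER BY is added to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE your_table (
    id INTEGER PRIMARY KEY,   -- assumed key, auto-assigned in insertion order
    groupid TEXT,
    category TEXT,
    test_results TEXT
)
""")
conn.executemany(
    "INSERT INTO your_table (groupid, category, test_results) VALUES (?, ?, ?)",
    [
        ("001", "red13tall", None), ("001", None, "blue16small"),
        ("002", "green16small", None), ("002", None, "blue16small"),
        ("003", "yellow3tall", None), ("003", None, "green2giant"),
        ("004", "orange16tall", None), ("004", None, "blue16tall"),
        ("005", "red16short", None), ("005", "green12bald", "orange14tall"),
        ("006", "blue3short", "red16big"), ("006", "green16flat", None),
    ],
)

rows = conn.execute("""
SELECT t3.groupid, t3.category, t3.test_results
FROM your_table t3
WHERE t3.groupid NOT IN (
    -- Groups where one row has 16 in category and a DIFFERENT row
    -- of the same group has 16 in test_results.
    SELECT t1.groupid
    FROM your_table t1
    JOIN your_table t2
      ON t1.groupid = t2.groupid
     AND t1.id <> t2.id
    WHERE t1.category LIKE '%16%'
      AND t2.test_results LIKE '%16%'
)
ORDER BY t3.id
""").fetchall()

kept_groups = sorted({groupid for groupid, _, _ in rows})
print(kept_groups)
for row in rows:
    print(row)
```

On the sample data, groups 002, 004 and 006 are dropped and groups 001, 003 and 005 are kept, which matches the result the question asks for.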
One way or another you need conditional counting. If you use analytic functions you can avoid joins, which are often a performance drag.
For the solution below I interpreted your words literally: each group has exactly two rows, and a group is excluded if all three conditions are met: BOTH rows have 16 at least once (in category or in test_results); 16 appears in category at least once; and 16 appears in test_results at least once.
You can modify the query very easily if you don't need the condition on each row of the group having 16 at least once (remove all references to r_ct).
with
test_data ( groupid, category, test_results ) as (
select '001', 'red13tall' , null from dual union all
select '001', null , 'blue16small' from dual union all
select '002', 'green16small', null from dual union all
select '002', null , 'blue16small' from dual union all
select '003', 'yellow3tall' , null from dual union all
select '003', null , 'green2giant' from dual union all
select '004', 'orange16tall', null from dual union all
select '004', null , 'blue16tall' from dual union all
select '005', 'red16short' , null from dual union all
select '005', 'green12bald' , 'orange14tall' from dual union all
select '006', 'blue3short' , 'red16big' from dual union all
select '006', 'green16flat' , null from dual
)
-- end of test data (not part of solution); SQL query begins below this line
select groupid, category, test_results
from (
select groupid, category, test_results,
count(case when category like '%16%' then 1
when test_results like '%16%' then 1 end)
over (partition by groupid) as r_ct,
count(case when category like '%16%' then 1 end)
over (partition by groupid) as c_ct,
count(case when test_results like '%16%' then 1 end)
over (partition by groupid) as t_ct
from test_data
)
where r_ct < 2 or c_ct = 0 or t_ct = 0
order by groupid -- if needed
;
Output:
GROUPID CATEGORY TEST_RESULTS
------- ------------ ------------
001 red13tall
001 blue16small
003 yellow3tall
003 green2giant
005 red16short
005 green12bald orange14tall
6 rows selected.