how to get this sql query - sql

hello everyone I have a problem for this query, I need to get the rows where the id has the same numbers:
id2 | num
----+------
28 | 6
28 | 104
28 | 106
50 | 6
50 | 104
expected result:
id2 | num
----+-----
28 | 6
28 | 104
50 | 6
50 | 104
result doesn't include 28 106 because there's no 50 106 .
case 2:
id2 | num
----+-----
29 | 1
30 | 1
31 | 1
expected result:
id2 | num
----+-----
29 | 1
30 | 1
31 | 1
retrieves all because all the ids have num equal to 1
these numbers are random the condition is that if there are more than two ids they must have the same numbers in column 2

One way to do this is to count the occurrences of each num value, and compare it with the number of DISTINCT id2 values. If they are the same, then that num value occurs for every id2 value. You can then SELECT rows from the table which match those num values:
SELECT *
FROM data
WHERE num IN (SELECT num
FROM data
GROUP BY num
HAVING COUNT(*) = (SELECT COUNT(DISTINCT id2) FROM data))
ORDER BY id2, num
Output (for first dataset):
id2 num
28 6
28 104
50 6
50 104
Output (for second dataset):
id2 num
29 1
30 1
31 1
Demo on SQLFiddle

Another way is to use an OVER clause to count the occurrences of each value in num and compare to count of distinct values in id2 - which can be laterally joined:
CREATE TABLE mytable(
id2 VARCHAR(11)
,num INTEGER
);
INSERT INTO mytable(id2,num) VALUES ('28',6);
INSERT INTO mytable(id2,num) VALUES ('28',104);
INSERT INTO mytable(id2,num) VALUES ('28',106);
INSERT INTO mytable(id2,num) VALUES ('50',6);
INSERT INTO mytable(id2,num) VALUES ('50',104);
select id2, num
from (
select
id2, num
, count(*) over(partition by num) c_num
, ca.c_id2
from mytable
left join lateral (select count(distinct id2) c_id2 from mytable) ca on true
) d
where c_num = c_id2
;
id2 | num
:-- | --:
28 | 6
50 | 6
28 | 104
50 | 104
CREATE TABLE mytable(
id2 VARCHAR(11)
,num INTEGER
);
INSERT INTO mytable(id2,num) VALUES ('29',1);
INSERT INTO mytable(id2,num) VALUES ('30',1);
INSERT INTO mytable(id2,num) VALUES ('31',1);
select id2, num
from (
select
id2, num
, count(*) over(partition by num) c_num
, ca.c_id2
from mytable
left join lateral (select count(distinct id2) c_id2 from mytable) ca on true
) d
where c_num = c_id2
;
id2 | num
:-- | --:
29 | 1
30 | 1
31 | 1
db<>fiddle here

Basically, you want to count the number of distinct id2 values in the data and the number of distinct id2 values on each num. If only Postgres supported count(distinct) as a window function, you could do:
select id2, num
from (select t.*,
count(distinct t.id2) over (partition by t.num) as cnt_id2_on_num,
count(distinct t.id2) over () as cnt_id2
from t
) t
where cnt_id2_on_num = cnt_id2;
There is a simple work-around, which is the sum of dense_rank()s:
select id2, num
from (select t.*,
(dense_rank() over (partition by t.num order by t.id2) +
dense_rank() over (partition by t.num order by t.id2 desc)
) as cnt_id2_on_num,
(dense_rank() over (order by t.id2) +
dense_rank() over (order by t.id2 desc)
) as cnt_id2
from mytable t
) d
where cnt_id2_on_num = cnt_id2;
If you know there are no duplicates, you can write this as:
select id2, num
from (select t.*,
count(*) (partition by t.num) as cnt_id2_on_num,
(dense_rank() over (order by t.id2) +
dense_rank() over (order by t.id2 desc)
) as cnt_id2
from mytable t
) d
where cnt_id2_on_num = cnt_id2;

Related

BigQuery - Recursive query to get leaf rows

I have a table in BigQuery which contains self referencing data like -
id | name | parent_id
----------------------------
1 | Gross Margin | null
2 | Revenue | 1
3 | Sales A | 2
4 | Sales B | 2
5 | 1001 | 3
6 | 1002 | 4
7 | OPEX | null
8 | Salaries | 7
9 | Payroll | 8
10 | Allowances | 9
11 | Commissions | 9
I want to write a query that returns the leaf rows of any row. For example if I give Gross Margin (or 1) as input, the query should return 1001 and 1002 (or 5 and 6) as output. Similarly if I give OPEX (or 7) as input, the query should return Allowances and Commissions (or 10 and 11) as output.
Updated script with improved convergence logic
DECLARE run_away_stop INT64 DEFAULT 0;
DECLARE flag BYTES;
CREATE TEMP TABLE ttt AS
SELECT parent_id, ARRAY_AGG(id order by id) children FROM `project.dataset.table` WHERE NOT parent_id IS NULL GROUP BY parent_id;
LOOP
SET (run_away_stop, flag) = (SELECT AS STRUCT run_away_stop + 1, md5(to_json_string(array_agg(t order by parent_id))) FROM ttt t);
CREATE OR REPLACE TEMP TABLE ttt1 AS
SELECT parent_id, ARRAY(SELECT DISTINCT id FROM UNNEST(children) id order by id) children
FROM (
SELECT parent_id, ARRAY_CONCAT_AGG(children) children
FROM (
SELECT t2.parent_id, ARRAY_CONCAT(t1.children, t2.children) children
FROM ttt t1, ttt t2
WHERE (SELECT COUNTIF(t1.parent_id = id) FROM UNNEST(t2.children) id) > 0
) GROUP BY parent_id
);
CREATE OR REPLACE TEMP TABLE ttt AS
SELECT * FROM ttt1 UNION ALL
SELECT * FROM ttt WHERE NOT parent_id IN (SELECT parent_id FROM ttt1);
IF (flag = (SELECT md5(to_json_string(array_agg(t order by parent_id))) FROM ttt t)) OR run_away_stop > 20 THEN BREAK; END IF;
END LOOP;
CREATE OR REPLACE TEMP TABLE ttt AS
SELECT id,
(
SELECT STRING_AGG(CAST(id AS STRING) order by id)
FROM ttt.children id
) children_as_list
FROM (SELECT DISTINCT id FROM `project.dataset.table`) d
LEFT JOIN ttt ON id = parent_id;
SELECT t1.id, STRING_AGG(child) leafs_list
FROM ttt t1, UNNEST(SPLIT(children_as_list)) child
JOIN (SELECT id FROM ttt WHERE children_as_list IS NULL) t2
ON t2.id = CAST(child AS INT64)
GROUP BY t1.id
ORDER BY id;
When applied to sample data in your question - it took 3 iterations and output is
Also, when applied to bigger example from your comments - it took 5 iterations and output is
BigQuery Team just introduced Recursive CTE! Hooray!!
With recursive cte you can use below approach
with recursive iterations as (
select id, parent_id from your_table
where not id in (
select parent_id from your_table
where not parent_id is null
)
union all
select b.id, a.parent_id
from your_table a join iterations b
on b.parent_id = a.id
)
select parent_id, string_agg('' || id order by id) as leafs_list
from iterations
where not parent_id is null
group by parent_id
If applied to sample data in your question - output is
Hope you agree it is more manageable and effective then when we were "forced" to use scripts for such logic!

Count Top 5 Elements spread over rows and columns

Using T-SQL for this table:
+-----+------+------+------+-----+
| No. | Col1 | Col2 | Col3 | Age |
+-----+------+------+------+-----+
| 1 | e | a | o | 5 |
| 2 | f | b | a | 34 |
| 3 | a | NULL | b | 22 |
| 4 | b | c | a | 55 |
| 5 | b | a | b | 19 |
+-----+------+------+------+-----+
I need to count the TOP 3 names (Ordered by TotalCount DESC) across all rows and columns, for 3 Age groups: 0-17, 18-49, 50-100. Also, how do I ignore the NULLS from my results?
If it's possible, how I can also UNION the results for all 3 age groups into one output table to get 9 results (TOP 3 x 3 Age groups)?
Output for only 1 Age Group: 18-49 would look like this:
+------+------------+
| Name | TotalCount |
+------+------------+
| b | 4 |
| a | 3 |
| f | 1 |
+------+------------+
You need to unpivot first your table and then exclude the NULLs. Then do a simple COUNT(*):
WITH CteUnpivot(Name, Age) AS(
SELECT x.*
FROM tbl t
CROSS APPLY ( VALUES
(col1, Age),
(col2, Age),
(col3, Age)
) x(Name, Age)
WHERE x.Name IS NOT NULL
)
SELECT TOP 3
Name, COUNT(*) AS TotalCount
FROM CteUnpivot
WHERE Age BETWEEN 18 AND 49
GROUP BY Name
ORDER BY COUNT(*) DESC
ONLINE DEMO
If you want to get the TOP 3 for each age group:
WITH CteUnpivot(Name, Age) AS(
SELECT x.*
FROM tbl t
CROSS APPLY ( VALUES
(col1, Age),
(col2, Age),
(col3, Age)
) x(Name, Age)
WHERE x.Name IS NOT NULL
),
CteRn AS (
SELECT
AgeGroup =
CASE
WHEN Age BETWEEN 0 AND 17 THEN '0-17'
WHEN Age BETWEEN 18 AND 49 THEN '18-49'
WHEN Age BETWEEN 50 AND 100 THEN '50-100'
END,
Name,
COUNT(*) AS TotalCount
FROM CteUnpivot
GROUP BY
CASE
WHEN Age BETWEEN 0 AND 17 THEN '0-17'
WHEN Age BETWEEN 18 AND 49 THEN '18-49'
WHEN Age BETWEEN 50 AND 100 THEN '50-100'
END,
Name
)
SELECT
AgeGroup, Name, TotalCount
FROM(
SELECT *,
rn = ROW_NUMBER() OVER(PARTITION BY AgeGroup, Name ORDER BY TotalCount DESC)
FROM CteRn
) t
WHERE rn <= 3;
ONLINE DEMO
The unpivot technique using CROSS APPLY and VALUES:
An Alternative (Better?) Method to UNPIVOT (SQL Spackle) by Dwain Camps
You can check below multiple-CTE SQL select statement
Row_Number() with Partition By clause is used ordering records within each group categorized by ages
/*
CREATE TABLE tblAges(
[No] Int,
Col1 VarChar(10),
Col2 VarChar(10),
Col3 VarChar(10),
Age SmallInt
)
INSERT INTO tblAges VALUES
(1, 'e', 'a', 'o', 5),
(2, 'f', 'b', 'a', 34),
(3, 'a', NULL, 'b', 22),
(4, 'b', 'c', 'a', 55),
(5, 'b', 'a', 'b', 19);
*/
;with cte as (
select
col1 as col, Age
from tblAges
union all
select
col2, Age
from tblAges
union all
select
col3, Age
from tblAges
), cte2 as (
select
col,
case
when age < 18 then '0-17'
when age < 50 then '18-49'
else '50-100'
end as grup
from cte
where col is not null
), cte3 as (
select
grup,
col,
count(grup) cnt
from cte2
group by
grup,
col
)
select * from (
select
grup, col, cnt, ROW_NUMBER() over (partition by grup order by cnt desc) cnt_grp
from cte3
) t
where cnt_grp <= 3
order by grup, cnt

Using Case in a select statement

Consider the following table
create table temp (id int, attribute varchar(25), value varchar(25))
And values into the table
insert into temp select 100, 'First', 234
insert into temp select 100, 'Second', 512
insert into temp select 100, 'Third', 320
insert into temp select 101, 'Second', 512
insert into temp select 101, 'Third', 320
I have to deduce a column EndResult which is dependent on 'attribute' column. For each id, I have to parse through attribute values in the order
First, Second, Third and choose the very 1st value which is available i.e. for id = 100, EndResult should be 234 for the 1st three records.
Expected result:
| id | EndResult |
|-----|-----------|
| 100 | 234 |
| 100 | 234 |
| 100 | 234 |
| 101 | 512 |
| 101 | 512 |
I tried with the following query in vain:
select id, case when isnull(attribute,'') = 'First'
then value
when isnull(attribute,'') = 'Second'
then value
when isnull(attribute,'') = 'Third'
then value
else '' end as EndResult
from
temp
Result
| id | EndResult |
|-----|-----------|
| 100 | 234 |
| 100 | 512 |
| 100 | 320 |
| 101 | 512 |
| 101 | 320 |
Please suggest if there's a way to get the expected result.
You can use analytical function like dense_rank to generate a numbering, and then select those rows that have the number '1':
select
x.id,
x.attribute,
x.value
from
(select
t.id,
t.attribute,
t.value,
dense_rank() over (partition by t.id order by t.attribute) as priority
from
Temp t) x
where
x.priority = 1
In your case, you can conveniently order by t.attribute, since their alphabetical order happens to be the right order. In other situations you could convert the attribute to a number using a case, like:
order by
case t.attribute
when 'One' then 1
when 'Two' then 2
when 'Three' then 3
end
In case the attribute column have different values which are not in alphabetical order as is the case above you can write as:
with cte as
(
select id,
attribute,
value,
case attribute when 'First' then 1
when 'Second' then 2
when 'Third' then 3 end as seq_no
from temp
)
, cte2 as
(
select id,
attribute,
value,
row_number() over ( partition by id order by seq_no asc) as rownum
from cte
)
select T.id,C.value as EndResult
from temp T
join cte2 C on T.id = C.id and C.rownum = 1
DEMO
Here is how you can achieve this using ROW_NUMBER():
WITH t
AS (
SELECT *
,ROW_NUMBER() OVER (
PARTITION BY id ORDER BY (CASE attribute WHEN 'First' THEN 1
WHEN 'Second' THEN 2
WHEN 'Third' THEN 3
ELSE 0 END)
) rownum
FROM TEMP
)
SELECT id
,(
SELECT value
FROM t t1
WHERE t1.id = t.id
AND rownum = 1
) end_result
FROM t;
For testing purpose, please see SQL Fiddle demo here:
SQL Fiddle Example
keep it simple
;with cte as
(
select row_number() over (partition by id order by (select 1)) row_num, id, value
from temp
)
select t1.id, t2.value
from temp t1
left join cte t2
on t1.Id = t2.id
where t2.row_num = 1
Result
id value
100 234
100 234
100 234
101 512
101 512

pgsql 1 to n relation into json

Long story short, how can i use 1 to n select data to build json like shown in example:
SELECT table1.id AS id1,table2.id AS id2,t_id,label
FROM table1 LEFT JOIN table2 ON table2.t_id = table1.id
result
|id1|id2|t_id|label|
+---+---+----+-----+
|1 | 1 | 1 | a |
| | 2 | 1 | b |
| | 3 | 1 | c |
| | 4 | 1 | d |
|2 | 5 | 2 | x |
| | 6 | 2 | y |
turn into this
SELECT table1.id, build_json(table2.id,table2.label) AS json_data
FROM table1 JOIN table2 ON table2.t_id = table1.id
GROUP BY table1.id
|id1|json_data
+--+-----------------
|1 |{"1":"a","2":"b","3":"c","4":"d"}
|2 |{"5":"x","6":"y"}
My guess the best start woulb be building an array from columns
Hstore instead of json would be ok too
well your table structure is a bit strange (is looks more like report than table), so I see two tasks here:
Replace nulls with correct id1. You can do it like this
with cte1 as (
select
sum(case when id1 is null then 0 else 1 end) over (order by t_id) as id1_partition,
id1, id2, label
from Table1
), cte2 as (
select
first_value(id1) over(partition by id1_partition) as id1,
id2, label
from cte1
)
select *
from cte2
Now you have to aggregate data into json. As far as I remember, there's no such a function in PostgreSQL, so you have to concatenate data manually:
with cte1 as (
select
sum(case when id1 is null then 0 else 1 end) over (order by t_id) as id1_partition,
id1, id2, label
from Table1
), cte2 as (
select
first_value(id1) over(partition by id1_partition) as id1,
id2, label
from cte1
)
select
id1,
('{' || string_agg('"' || id2 || '":' || to_json(label), ',') || '}')::json as json_data
from cte2
group by id1
sql fiddle demo
And if you want to convert into hstore:
with cte1 as (
select
sum(case when id1 is null then 0 else 1 end) over (order by t_id) as id1_partition,
id1, id2, label
from Table1
), cte2 as (
select
first_value(id1) over(partition by id1_partition) as id1,
id2, label
from cte1
)
select
c.id1, hstore(array_agg(c.id2)::text[], array_agg(c.label)::text[])
from cte2 as c
group by c.id1
sql fiddle demo

Oracle grouping/changing rows to columns

I have the following table named foo:
ID | KEY | VAL
----------------
1 | 47 | 97
2 | 47 | 98
3 | 47 | 99
4 | 48 | 100
5 | 48 | 101
6 | 49 | 102
I want to run a select query and have the results show like this
UNIQUE_ID | KEY | ID1 | VAL1 | ID2 | VAL2 | ID3 | VAL3
--------------------------------------------------------------
47_1:97_2:98_3:99| 47 | 1 | 97 | 2 | 98 | 3 | 99
48_4:100_5:101 | 48 | 4 | 100 | 5 | 101 | |
49_6:102 | 49 | 6 | 102 | | | |
So, basically all rows with the same KEY get collapsed into 1 row. There can be anywhere from 1-3 rows per KEY value
Is there a way to do this in a sql query (without writing a stored procedure or scripts)?
If not, I could also work with the less desirable choice of
UNIQUE_ID | KEY | IDS | VALS
--------------------------------------------------------------
47_1:97_2:98_3:99| 47 | 1,2,3 | 97,98,99
48_4:100_5:101 | 48 | 4,5 | 100, 101
49_6:102 | 49 | 6 | 102
Thanks!
UPDATE:
Unfortunately my real-world problem seems to be much more difficult than this example, and I'm having trouble getting either example to work :( My query is over 120 lines so it's not very easy to post. It kind of looks like:
with v_table as (select ...),
v_table2 as (select foo from v_table where...),
v_table3 as (select foo from v_table where ...),
...
v_table23 as (select foo from v_table where ...)
select distinct (...) as "UniqueID", myKey, myVal, otherCol1, ..., otherCol18
from tbl1 inner join tbl2 on...
...
inner join tbl15 on ...
If I try any of the methods below it seems that I cannot do group-bys correctly because of all the other data being returned.
Ex:
with v_table as (select ...),
v_table2 as (select foo from v_table where...),
v_table3 as (select foo from v_table where ...),
...
v_table23 as (select foo from v_table where ...)
select "Unique ID",
myKey, max(decode(id_col,1,id_col)) as id_1, max(decode(id_col,1,myVal)) as val_1,
max(decode(id_col,2,id_col)) as id_2,max(decode(id_col,2,myVal)) as val_2,
max(decode(id_col,3,id_col)) as id_3,max(decode(id_col,3,myVal)) as val_3
from (
select distinct (...) as "UniqueID", myKey, row_number() over (partition by myKey order by id) as id_col, id, myVal, otherCol1, ..., otherCol18
from tbl1 inner join tbl2 on...
...
inner join tbl15 on ...
) group by myKey;
Gives me the error: ORA-00979: not a GROUP BY expression
This is because I am selecting the UniqueID from the inner select. I will need to do this as well as select other columns from the inner table.
Any help would be appreciated!
Take a look ath this article about Listagg function, this will help you getting the comma separated results, it works only in the 11g version.
You may try this
select key,
max(decode(id_col,1,id_col)) as id_1,max(decode(id_col,1,val)) as val_1,
max(decode(id_col,2,id_col)) as id_2,max(decode(id_col,2,val)) as val_2,
max(decode(id_col,3,id_col)) as id_3,max(decode(id_col,3,val)) as val_3
from (
select key, row_number() over (partition by key order by id) as id_col,id,val
from your_table
)
group by key
As #O.D. suggests, you can generate the less desirable version with LISTAGG, for example (using a CTE to generate your sample data):
with foo as (
select 1 as id, 47 as key, 97 as val from dual
union select 2,47,98 from dual
union select 3,47,99 from dual
union select 4,48,100 from dual
union select 5,48,101 from dual
union select 6,49,102 from dual
)
select key ||'_'|| listagg(id ||':' ||val, '_')
within group (order by id) as unique_id,
key,
listagg(id, ',') within group (order by id) as ids,
listagg(val, ',') within group (order by id) as vals
from foo
group by key
order by key;
UNIQUE_ID KEY IDS VALS
----------------- ---- -------------------- --------------------
47_1:97_2:98_3:99 47 1,2,3 97,98,99
48_4:100_5:101 48 4,5 100,101
49_6:102 49 6 102
With a bit more manipulation you can get your preferred results:
with foo as (
select 1 as id, 47 as key, 97 as val from dual
union select 2,47,98 from dual
union select 3,47,99 from dual
union select 4,48,100 from dual
union select 5,48,101 from dual
union select 6,49,102 from dual
)
select unique_id, key,
max(id1) as id1, max(val1) as val1,
max(id2) as id2, max(val2) as val2,
max(id3) as id3, max(val3) as val3
from (
select unique_id,key,
case when r = 1 then id end as id1, case when r = 1 then val end as val1,
case when r = 2 then id end as id2, case when r = 2 then val end as val2,
case when r = 3 then id end as id3, case when r = 3 then val end as val3
from (
select key ||'_'|| listagg(id ||':' ||val, '_')
within group (order by id) over (partition by key) as unique_id,
key, id, val,
row_number() over (partition by key order by id) as r
from foo
)
)
group by unique_id, key
order by key;
UNIQUE_ID KEY ID1 VAL1 ID2 VAL2 ID3 VAL3
----------------- ---- ---- ---- ---- ---- ---- ----
47_1:97_2:98_3:99 47 1 97 2 98 3 99
48_4:100_5:101 48 4 100 5 101
49_6:102 49 6 102
Can't help feeling there ought to be a simpler way though...