I have a table in postgres like below
I want an SQL query in Postgres that counts, for each combination of 2 columns, the rows where both columns are Y.
Expecting an output like
Combination Count
AB 2
AC 1
AD 2
AZ 1
BC 1
BD 3
BZ 2
CD 2
CZ 0
DZ 1
Can anyone help me?
WITH stacked AS (
SELECT id
, unnest(array['A', 'B', 'C', 'D', 'Z']) AS col_name
, unnest(array[a, b, c, d, z]) AS col_value
FROM test t
)
SELECT combo, sum(cnt) AS count
FROM (
SELECT t1.id, t1.col_name || t2.col_name AS combo
, (CASE WHEN t1.col_value = 'Y' AND t2.col_value = 'Y' THEN 1 ELSE 0 END) AS cnt
FROM stacked t1
INNER JOIN stacked t2
ON t1.id = t2.id
AND t1.col_name < t2.col_name) t3
GROUP BY combo
ORDER BY combo
yields
| combo | count |
|-------+-------|
| AB | 2 |
| AC | 1 |
| AD | 2 |
| AZ | 2 |
| BC | 1 |
| BD | 3 |
| BZ | 2 |
| CD | 2 |
| CZ | 0 |
| DZ | 1 |
The unnesting recipe for unpivoting the table comes from Stew's post, here.
To count occurrences of YYY among 3 columns you could use:
WITH stacked AS (
SELECT id
, unnest(array['A', 'B', 'C', 'D', 'Z']) AS col_name
, unnest(array[a, b, c, d, z]) AS col_value
FROM test t
)
SELECT combo, sum(cnt) AS count
FROM (
SELECT t1.id, t1.col_name || t2.col_name || t3.col_name AS combo
, (CASE WHEN t1.col_value = 'Y'
AND t2.col_value = 'Y'
AND t3.col_value = 'Y' THEN 1 ELSE 0 END) AS cnt
FROM stacked t1
INNER JOIN stacked t2
ON t1.id = t2.id
AND t1.col_name < t2.col_name
INNER JOIN stacked t3
ON t2.id = t3.id
AND t2.col_name < t3.col_name
) t4
GROUP BY combo
ORDER BY combo
;
which yields
| combo | count |
|-------+-------|
| ABC | 0 |
| ABD | 1 |
| ABZ | 2 |
| ACD | 1 |
| ACZ | 0 |
| ADZ | 1 |
| BCD | 1 |
| BCZ | 0 |
| BDZ | 1 |
| CDZ | 0 |
Or, to handle combinations of N columns, you could use WITH RECURSIVE:
For example, for N = 3,
WITH RECURSIVE result AS (
WITH stacked AS (
SELECT id
, unnest(array['A', 'B', 'C', 'D', 'Z']) AS col_name
, unnest(array[a, b, c, d, z]) AS col_value
FROM test t)
SELECT id, array[col_name] AS path, array[col_value] AS path_val, col_name AS last_name
FROM stacked
UNION
SELECT r.id, path || s.col_name, path_val || s.col_value, s.col_name
FROM result r
INNER JOIN stacked s
ON r.id = s.id
AND s.col_name > r.last_name
WHERE array_length(r.path, 1) < 3) -- Change 3 to your value for N
SELECT combo, sum(cnt)
FROM (
SELECT id, array_to_string(path, '') AS combo, (CASE WHEN 'Y' = all(path_val) THEN 1 ELSE 0 END) AS cnt
FROM result
WHERE array_length(path, 1) = 3) t -- Change 3 to your value for N
GROUP BY combo
ORDER BY combo
Note that N = 3 is used in 2 places in the SQL above.
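As a cross-check outside the database, the same N-column counting can be sketched in a few lines of Python. This is only a sketch: the question's table is not shown, so the `rows` data below is made up for illustration, and `count_combinations` is a hypothetical helper name.

```python
from itertools import combinations

# Hypothetical sample rows (the question's table is not shown): id -> column flags.
rows = {
    1: {"A": "Y", "B": "Y", "C": "N", "D": "Y", "Z": "N"},
    2: {"A": "Y", "B": "N", "C": "N", "D": "N", "Z": "Y"},
    3: {"A": "N", "B": "Y", "C": "Y", "D": "Y", "Z": "N"},
}

def count_combinations(rows, n):
    """For every n-column combination, count the rows where all n columns are 'Y'."""
    columns = sorted(next(iter(rows.values())))
    # Start every combination at zero so combinations with no match are still reported.
    counts = {"".join(c): 0 for c in combinations(columns, n)}
    for flags in rows.values():
        for combo in combinations(columns, n):
            if all(flags[col] == "Y" for col in combo):
                counts["".join(combo)] += 1
    return counts

pairs = count_combinations(rows, 2)
triples = count_combinations(rows, 3)
print(pairs)
print(triples)
```

`combinations` yields column names in sorted order, which plays the same role as the `col_name <` conditions in the SQL: each unordered combination appears exactly once.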
I would do this using a lateral join:
with vals as (
select t.id, v.*
from t cross join lateral
(values ('A', A), ('B', B), ('C', C), ('D', D), ('Z', Z)
) v(which, val)
)
select (v1.which || v2.which) as combo,
sum( (v1.val = 'Y' and v2.val = 'Y')::int ) as count
from vals v1 join
vals v2
-- pair values from the same row only (assumes t has a unique id column)
on v1.id = v2.id and v1.which < v2.which
group by combo
order by combo;
I consider lateral joins to be a more direct way to unpivot the values. There is no need to convert the values to an array and unnest it, much less unnest two arrays and align the values.
I have two tables that I want to join. I've tried the usual left and right joins, but neither gives the result I want.
TABLE A
ID_A VALUE_A
-----------------
A 1
B 2
TABLE B
ID_B ID_A VALUE_B
-------------------------
90 A 1
90 C 1
90 E 1
91 A 1
91 B 1
92 B 1
92 E 1
92 F 1
I want to get this result:
ID_A VALUE_A ID_B ID_A VALUE_B
-------------------------------------------------
A 1 90 A 1
B 2 90 NULL NULL
A 1 91 A 1
B 2 91 B 1
A 1 92 NULL NULL
B 2 92 B 1
If I understand correctly, you want all combinations of id_a and value_a from the first table along with all distinct id_b from the second table. If so:
select iv.id_a, iv.value_a, ib.id_b, b.id_a, b.value_b
from (select distinct id_a, value_a from a) iv cross join
(select distinct id_b from b) ib left join
b
on b.id_b = ib.id_b and b.id_a = iv.id_a;
The cross join generates the rows. The left join brings in the additional columns.
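To check this against the sample data, here is a minimal sketch that runs the same query in an in-memory SQLite database through Python's `sqlite3` module. SQLite and the added `ORDER BY` are assumptions made for the sake of a reproducible check; the question does not say which database is in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id_a TEXT, value_a INTEGER);
    CREATE TABLE b (id_b INTEGER, id_a TEXT, value_b INTEGER);
    INSERT INTO a VALUES ('A', 1), ('B', 2);
    INSERT INTO b VALUES (90, 'A', 1), (90, 'C', 1), (90, 'E', 1),
                         (91, 'A', 1), (91, 'B', 1),
                         (92, 'B', 1), (92, 'E', 1), (92, 'F', 1);
""")

rows = conn.execute("""
    SELECT iv.id_a, iv.value_a, ib.id_b, b.id_a, b.value_b
    FROM (SELECT DISTINCT id_a, value_a FROM a) iv
    CROSS JOIN (SELECT DISTINCT id_b FROM b) ib   -- every (id_a, id_b) pair
    LEFT JOIN b                                   -- NULLs where no match exists
           ON b.id_b = ib.id_b AND b.id_a = iv.id_a
    ORDER BY ib.id_b, iv.id_a                     -- added only for a stable order
""").fetchall()

for row in rows:
    print(row)
```

The six rows correspond to 2 distinct `id_a` values times 3 distinct `id_b` values; unmatched pairs come back with `None` (SQL `NULL`) in the last two columns.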
I usually break things like this down into CTEs:
DDL
use tempdb
CREATE TABLE Table1
([ID_A] varchar(1), [VALUE_A] int)
;
INSERT INTO Table1
([ID_A], [VALUE_A])
VALUES
('A', 1),
('B', 2)
;
CREATE TABLE Table2
([ID_B] int, [ID_A] varchar(1), [VALUE_B] int)
;
INSERT INTO Table2
([ID_B], [ID_A], [VALUE_B])
VALUES
(90, 'A', 1),
(90, 'C', 1),
(90, 'E', 1),
(91, 'A', 1),
(91, 'B', 1),
(92, 'B', 1),
(92, 'E', 1),
(92, 'F', 1)
;
Answer
with a as (
select distinct id_b
from Table2
),
b as (
select id_a, value_a, id_b
from Table1 cross join a
)
select b.id_a, b.value_a, b.id_b, t2.id_a, t2.value_b
from b left join Table2 t2
on b.id_a = t2.id_a
and b.id_b = t2.id_b
Results
+------+---------+------+------+---------+
| id_a | value_a | id_b | id_a | value_b |
+------+---------+------+------+---------+
| A | 1 | 90 | A | 1 |
| B | 2 | 90 | NULL | NULL |
| A | 1 | 91 | A | 1 |
| B | 2 | 91 | B | 1 |
| A | 1 | 92 | NULL | NULL |
| B | 2 | 92 | B | 1 |
+------+---------+------+------+---------+
I couldn't work out the exact logic or match the desired results exactly, but I presume you'd like to get something like:
SELECT a.ID_A, COALESCE(a.VALUE_A,b.VALUE_B) VALUE_A, b.ID_B, a.ID_A,
(CASE WHEN a.ID_A IS NULL THEN a.ID_A ELSE CAST(b.VALUE_B as VARCHAR(1)) END)
as VALUE_B
FROM TABLE_A a FULL OUTER JOIN TABLE_B b
ON ( a.ID_A = b.ID_A )
GROUP BY a.ID_A, a.VALUE_A, b.ID_B, a.ID_A, b.VALUE_B
ORDER BY 3, 2, 1;
Try this:
SELECT A.ID_A , A.VALUE_A , B.ID_B , B.ID_A , B.VALUE_B
FROM TABLE_A A
LEFT OUTER JOIN TABLE_B B
ON A.ID_A = B.ID_A ;
EDIT: Typos corrected following sticky bit note (thanks!!).
I have a table that looks like the following:
col1 | col2 | col3 | col4
A | 1 | 2 | 4
A | 2 | 5 | 3
A | 5 | 1 | 6
B | 3 | 1 | 2
B | 4 | 4 | 4
I have another table where the records are unique and looks like the following:
col1 | col2
A | 2
B | 1
I want to query Table 1 so that I keep only n records for each category, where n is the value that category has in Table 2.
Based on Table 2 I need to extract 2 records for A and 1 record for B. I need the resulting queried table to look like the following:
col1 | col2 | col3 | col4
A | 2 | 5 | 3
A | 1 | 2 | 4
B | 3 | 1 | 2
The records are chosen by col4 in ascending order. I am currently trying to do this on BigQuery.
You can use row_number() and join:
select t1.col1, t1.col2, t1.col3, t1.col4
from (select t1.*, row_number() over (partition by col1 order by col4) as seqnum
from table1 t1
) t1 join
table2 t2
on t2.col1 = t1.col1 and t1.seqnum <= t2.col2
order by t1.col1, t1.col4;
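A minimal sketch to try the row_number() approach on the sample data, run here in an in-memory SQLite database via Python's `sqlite3` module rather than BigQuery (an assumption made only so the check is self-contained; it needs SQLite 3.25 or later for window functions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 TEXT, col2 INTEGER, col3 INTEGER, col4 INTEGER);
    CREATE TABLE table2 (col1 TEXT, col2 INTEGER);
    INSERT INTO table1 VALUES ('A', 1, 2, 4), ('A', 2, 5, 3), ('A', 5, 1, 6),
                              ('B', 3, 1, 2), ('B', 4, 4, 4);
    INSERT INTO table2 VALUES ('A', 2), ('B', 1);
""")

rows = conn.execute("""
    SELECT t1.col1, t1.col2, t1.col3, t1.col4
    FROM (SELECT t.*,
                 -- number the rows of each category by ascending col4
                 ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col4) AS seqnum
          FROM table1 t) t1
    JOIN table2 t2
      ON t2.col1 = t1.col1
     AND t1.seqnum <= t2.col2      -- keep only the first n rows per category
    ORDER BY t1.col1, t1.col4
""").fetchall()

for row in rows:
    print(row)
```

The per-category limit lives in the join condition, so changing a value in table2 changes how many rows that category returns without touching the query.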
Below is for BigQuery Standard SQL
#standardSQL
SELECT t.*
FROM (
SELECT ARRAY_AGG(t1 ORDER BY t1.col4) arr, MIN(t2.col2) cnt
FROM table1 t1 JOIN table2 t2 ON t1.col1 = t2.col1
GROUP BY t1.col1
), UNNEST(arr) t WITH OFFSET num
WHERE num < cnt
You can test it using the sample data from your question, as below:
#standardSQL
WITH `table1` AS (
SELECT 'A' col1, 1 col2, 2 col3, 4 col4 UNION ALL
SELECT 'A', 2, 5, 3 UNION ALL
SELECT 'A', 5, 1, 6 UNION ALL
SELECT 'B', 3, 1, 2 UNION ALL
SELECT 'B', 4, 4, 4
), `table2` AS (
SELECT 'A' col1, 2 col2 UNION ALL
SELECT 'B', 1
)
SELECT t.*
FROM (
SELECT ARRAY_AGG(t1 ORDER BY t1.col4) arr, MIN(t2.col2) cnt
FROM table1 t1 JOIN table2 t2 ON t1.col1 = t2.col1
GROUP BY t1.col1
), UNNEST(arr) t WITH OFFSET num
WHERE num < cnt
with output as
Row col1 col2 col3 col4
1 A 2 5 3
2 A 1 2 4
3 B 3 1 2
I want to do the following:
If a column is 1, then copy the column name (to a new column). For example, for ID 1, the Name1 is 1, then we copy 'Name1' (to the 'Name' column). Else, do nothing.
If two columns (Name1, Name2) are both 1, then we will have two rows for each name. For example, ID 3.
Input
ID Name1 Name2
1 1 0
2 0 1
3 1 1
Output
ID Name
1 Name1
2 Name2
3 Name1
3 Name2
Do I need some advanced keywords to do that?
You should be able to use the UNPIVOT function to get the result. This converts your columns into rows, then you can filter the final result based on whether the value of the original column is 0 or 1:
select Id, Name
from <yourtable>
unpivot
(
value for
name in (Name1, Name2)
) u
where value <> 0
One way is using union all
select id,
'Name1' as name
from your_table
where name1 = 1
union all
select id,
'Name2' as name
from your_table
where name2 = 1
You could also use cross apply if there are more columns:
select t.id, x.name
from your_table t
cross apply (
values (case when t.name1 = 1 then 'Name1' end),
(case when t.name2 = 1 then 'Name2' end),
(case when t.name3 = 1 then 'Name3' end)
) x (name)
where x.name is not null;
You can use cross apply and get this as below:
select Id, nam as [Name] from #yournames
cross apply ( values (name1, 'name1'),(name2, 'name2')) v(n, nam)
where n = 1
Output:
+----+-------+
| Id | Name |
+----+-------+
| 1 | name1 |
| 2 | name2 |
| 3 | name1 |
| 3 | name2 |
+----+-------+
If there are only 3 columns, use union all:
select id, 'Name1' as Name from Input where Name1=1
union all
select id, 'Name2' as Name from Input where Name2=1
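The union all variant is plain enough SQL to run almost anywhere. Here is a minimal sketch against the question's sample data, using an in-memory SQLite database through Python's `sqlite3` module (SQLite and the added `ORDER BY` are assumptions for the sake of a reproducible check):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE input (id INTEGER, name1 INTEGER, name2 INTEGER);
    INSERT INTO input VALUES (1, 1, 0), (2, 0, 1), (3, 1, 1);
""")

rows = conn.execute("""
    SELECT id, 'Name1' AS name FROM input WHERE name1 = 1
    UNION ALL
    SELECT id, 'Name2' AS name FROM input WHERE name2 = 1
    ORDER BY id, name   -- added only for a stable order
""").fetchall()

for row in rows:
    print(row)
```

Each branch of the union contributes one row per flag that is set, which is why ID 3 appears twice.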
My table in SQL is like:-
RN Name value1 value2 Timestamp
1 Mark 110 210 20160119
1 Mark 106 205 20160115
1 Mark 103 201 20160112
2 Steve 120 220 20151218
2 Steve 111 210 20151210
2 Steve 104 206 20151203
Desired Output:-
RN Name value1Lag1 value1lag2 value2lag1 value2lag2
1 Mark 4 3 5 4
2 Steve 9 7 10 4
The difference is calculated from the most recent to the second recent and then from second recent to the third recent for RN 1
value1lag1 = 110-106 =4
value1lag2 = 106-103 = 3
value2lag1 = 210-205 = 5
value2lag2 = 205-201 = 4
similarly for other RN's also.
Note: For each RN there are 3 and only 3 rows.
I have tried in several ways by taking help from similar posts but no luck.
I've assumed that RN and Name are linked here. It's a bit messy, but if each RN always has 3 values and you always want to check them in this order, then something like this should work.
SELECT
t1.Name
, AVG(CASE WHEN table_ranked.Rank = 1 THEN table_ranked.value1 ELSE NULL END) - AVG(CASE WHEN table_ranked.Rank = 2 THEN table_ranked.value1 ELSE NULL END) value1Lag1
, AVG(CASE WHEN table_ranked.Rank = 2 THEN table_ranked.value1 ELSE NULL END) - AVG(CASE WHEN table_ranked.Rank = 3 THEN table_ranked.value1 ELSE NULL END) value1Lag2
, AVG(CASE WHEN table_ranked.Rank = 1 THEN table_ranked.value2 ELSE NULL END) - AVG(CASE WHEN table_ranked.Rank = 2 THEN table_ranked.value2 ELSE NULL END) value2Lag1
, AVG(CASE WHEN table_ranked.Rank = 2 THEN table_ranked.value2 ELSE NULL END) - AVG(CASE WHEN table_ranked.Rank = 3 THEN table_ranked.value2 ELSE NULL END) value2Lag2
FROM table t1
INNER JOIN
(
SELECT
t1.Name
, t1.value1
, t1.value2
, COUNT(t2.TimeStamp) Rank
FROM table t1
INNER JOIN table t2
ON t2.name = t1.name
AND t1.TimeStamp <= t2.TimeStamp
GROUP BY t1.Name, t1.value1, t1.value2
) table_ranked
ON table_ranked.Name = t1.Name
GROUP BY t1.Name
There are other answers here, but I think your problem is calling for analytic functions, specifically LAG():
select
rn,
name,
-- calculate the differences
value1 - v1l1 value1lag1,
v1l1 - v1l2 value1lag2,
value2 - v2l1 value2lag1,
v2l1 - v2l2 value2lag2
from (
select
rn,
name,
value1,
value2,
timestamp,
-- these two are the values from the row before this one ordered by timestamp (ascending)
lag(value1) over(partition by rn, name order by timestamp asc) v1l1,
lag(value2) over(partition by rn, name order by timestamp asc) v2l1,
-- these two are the values from two rows before this one ordered by timestamp (ascending)
lag(value1, 2) over(partition by rn, name order by timestamp asc) v1l2,
lag(value2, 2) over(partition by rn, name order by timestamp asc) v2l2
from (
select
1 rn, 'Mark' name, 110 value1, 210 value2, '20160119' timestamp
from dual
union all
select
1 rn, 'Mark' name, 106 value1, 205 value2, '20160115' timestamp
from dual
union all
select
1 rn, 'Mark' name, 103 value1, 201 value2, '20160112' timestamp
from dual
union all
select
2 rn, 'Steve' name, 120 value1, 220 value2, '20151218' timestamp
from dual
union all
select
2 rn, 'Steve' name, 111 value1, 210 value2, '20151210' timestamp
from dual
union all
select
2 rn, 'Steve' name, 104 value1, 206 value2, '20151203' timestamp
from dual
) data
)
where
-- return only the rows that have defined values
v1l1 is not null and
v1l2 is not null and
v2l1 is not null and
v2l2 is not null
This approach has the benefit that Oracle does all the necessary buffering internally, avoiding self-joins and the like. For big data sets this can be important from a performance viewpoint.
As an example, the explain plan for that query would be something like
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6 | 150 | 13 (8)| 00:00:01 |
|* 1 | VIEW | | 6 | 150 | 13 (8)| 00:00:01 |
| 2 | WINDOW SORT | | 6 | 138 | 13 (8)| 00:00:01 |
| 3 | VIEW | | 6 | 138 | 12 (0)| 00:00:01 |
| 4 | UNION-ALL | | | | | |
| 5 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 6 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 7 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 8 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 9 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 10 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("V1L1" IS NOT NULL AND "V1L2" IS NOT NULL AND "V2L1" IS
Note that there are no joins, just a WINDOW SORT that buffers the necessary data from the "data source" (in our case, the VIEW 3 that is the UNION ALL of our SELECT ... FROM DUAL) to partition and calculate the different lags.
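The LAG() approach is not Oracle-specific. As a rough sketch, the same logic can be tried in an in-memory SQLite database through Python's `sqlite3` module (SQLite 3.25 or later is assumed for window functions; the table name `readings` and the column name `ts` are made up for this example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (rn INTEGER, name TEXT, value1 INTEGER, value2 INTEGER, ts TEXT);
    INSERT INTO readings VALUES
        (1, 'Mark',  110, 210, '20160119'),
        (1, 'Mark',  106, 205, '20160115'),
        (1, 'Mark',  103, 201, '20160112'),
        (2, 'Steve', 120, 220, '20151218'),
        (2, 'Steve', 111, 210, '20151210'),
        (2, 'Steve', 104, 206, '20151203');
""")

rows = conn.execute("""
    SELECT rn, name,
           value1 - v1l1 AS value1lag1,
           v1l1 - v1l2   AS value1lag2,
           value2 - v2l1 AS value2lag1,
           v2l1 - v2l2   AS value2lag2
    FROM (
        SELECT rn, name, value1, value2,
               LAG(value1)    OVER w AS v1l1,   -- previous row by timestamp
               LAG(value1, 2) OVER w AS v1l2,   -- two rows back
               LAG(value2)    OVER w AS v2l1,
               LAG(value2, 2) OVER w AS v2l2
        FROM readings
        WINDOW w AS (PARTITION BY rn, name ORDER BY ts)
    )
    -- only the most recent row of each group has both lags defined
    WHERE v1l2 IS NOT NULL AND v2l2 IS NOT NULL
    ORDER BY rn
""").fetchall()

for row in rows:
    print(row)
```

Because each RN has exactly 3 rows, filtering on the two-rows-back lag leaves one row per RN, which matches the desired output in the question.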
For just this case it's not that difficult. You need these steps:
1. Self-join and compute the differences:
select t1.RN,
t1.Name,
t1.rm,
-- t1 is the more recent row, t2 the one just before it
t1.value1-t2.value1 as value1,
t1.value2-t2.value2 as value2
from
(select RN,Name,value1,value2,
row_number() over (partition by Name order by Timestamp desc) as rm from table)t1
left join
(select RN,Name,value1,value2,
row_number() over (partition by Name order by Timestamp desc) as rm from table) t2
-- match rows of the same Name only
on t1.Name = t2.Name and t1.rm = t2.rm-1
where t2.RN is not null
Save this result as a table, say table3.
2. Pivot it:
select * from (
-- select only the column being pivoted; any extra column would act as a grouping column
select t3.RN, t3.Name,t3.rm,t3.value1 from table3 t3
)
pivot
(
max(value1)
for rm in (1,2)
)v1
3. Build one pivot table for value1 and another for value2 in the same way, then join them to get the result.
There may be a better way. I'm not sure whether pivoted results can be joined directly in the same statement, so I would join after getting each pivot result, which means 2 more tables. It's not elegant, but it's the best I can do.
-- test data
with data(rn,
name,
value1,
value2,
timestamp) as
(select 1, 'Mark', 110, 210, to_date('20160119', 'YYYYMMDD')
from dual
union all
select 1, 'Mark', 106, 205, to_date('20160115', 'YYYYMMDD')
from dual
union all
select 1, 'Mark', 103, 201, to_date('20160112', 'YYYYMMDD')
from dual
union all
select 2, 'Steve', 120, 220, to_date('20151218', 'YYYYMMDD')
from dual
union all
select 2, 'Steve', 111, 210, to_date('20151210', 'YYYYMMDD')
from dual
union all
select 2, 'Steve', 104, 206, to_date('20151203', 'YYYYMMDD') from dual),
-- first transform value1, value2 to value_id (1,2), value
data2 as
(select d.rn, d.name, 1 as val_id, d.value1 as value, d.timestamp
from data d
union all
select d.rn, d.name, 2 as val_id, d.value2 as value, d.timestamp
from data d)
select * -- find previous row P of row D, evaluate difference and build column name as desired
from (select d.rn,
d.name,
d.value - p.value as value,
'value' || d.val_id || 'Lag' || row_number() over(partition by d.rn, d.val_id order by d.timestamp desc) as col
from data2 p, data2 d
where p.rn = d.rn
and p.val_id = d.val_id
and p.timestamp =
(select max(pp.timestamp)
from data2 pp
where pp.rn = p.rn
and pp.val_id = p.val_id
and pp.timestamp < d.timestamp))
-- pivot
pivot(sum(value) for col in('value1Lag1',
'value1Lag2',
'value2Lag1',
'value2Lag2'));