SQL query with array of counts? - sql

I have a table which looks like this:
record no firstType secondtype win?
1 X A 1
2 X A 0
3 X B 1
4 Y B 0
5 Y B 1
6 X B 1
7 X B 1
and what I need output is this.
firstType secondType winCounts
X [A,B] [A:1,B:3]
Y [B] [B:1]
So notice how the arrays under secondType show which values OCCURRED with each firstType, while the arrays under winCounts show how many wins of each secondType came with each firstType.
I can make the arrays using ARRAY_AGG but I'm lost for any possible way to make the winCounts column.

Use two levels of aggregation:
select firsttype, array_agg(secondtype order by secondtype),
array_agg(secondtype || ':' || wins order by secondtype)
from (select firsttype, secondtype, sum(win) as wins
from t
group by firsttype, secondtype
) t
group by firsttype;
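For anyone who wants to try the two-level idea without the original engine, here is a minimal sketch using Python's built-in sqlite3 module. It is not the poster's setup: group_concat stands in for array_agg, and the table name t, the column name win (instead of "win?") and the helper win_counts are assumptions made for this example.

```python
import sqlite3

# Sample rows from the question: (record no, firstType, secondType, win).
ROWS = [
    (1, "X", "A", 1), (2, "X", "A", 0), (3, "X", "B", 1), (4, "Y", "B", 0),
    (5, "Y", "B", 1), (6, "X", "B", 1), (7, "X", "B", 1),
]


def win_counts(rows):
    """Return {firstType: (secondTypes, winCounts)} via two levels of aggregation."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE t (record_no INTEGER, firsttype TEXT, secondtype TEXT, win INTEGER)"
    )
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT firsttype,
               group_concat(secondtype)                AS second_types,
               group_concat(secondtype || ':' || wins) AS win_counts
        FROM (SELECT firsttype, secondtype, SUM(win) AS wins   -- level 1: wins per pair
              FROM t
              GROUP BY firsttype, secondtype) AS per_pair
        GROUP BY firsttype                                     -- level 2: collect per firsttype
    """
    result = {}
    for first, seconds, counts in con.execute(query):
        # SQLite does not promise an order inside group_concat, so sort in Python.
        result[first] = (sorted(seconds.split(",")), sorted(counts.split(",")))
    con.close()
    return result
```

Calling win_counts(ROWS) gives ['A', 'B'] with ['A:1', 'B:3'] for X, and ['B'] with ['B:1'] for Y, which matches the output asked for in the question.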

Here's a more-complicated solution with a lambda method, because why not:
SELECT
PP.firstType AS "firstType"
, ARRAY_DISTINCT(
ARRAY_AGG(PP.secondType)
) AS "secondType"
, ZIP_WITH(
ARRAY_DISTINCT(
ARRAY_AGG(PP.secondType)
),
ARRAY_AGG(PP.count_str),
(x, y) -> x || ':' || y
) AS "winCount"
FROM (
SELECT
firstType
, secondType
, CAST(SUM("win?") AS VARCHAR(5))
FROM dataTable
WHERE "win?" > 0
GROUP BY
firstType
, secondType
) AS PP (firstType, secondType, count_str)
GROUP BY PP.firstType;

Related

How to unnest and pivot two columns in BigQuery

Say I have a BQ table containing the following information
id  test.name  test.score
1   a          5
    b          7
2   a          8
    c          3
Where test is nested. How would I pivot test into the following table?
id  a  b  c
1   5  7
2   8     3
I cannot pivot test directly, as I get the following error message at pivot(test): Table-valued function not found. Previous questions (1, 2) don't deal with nested columns or are outdated.
The following query looks like a useful first step:
select a.id, t
from `table` as a,
unnest(test) as t
However, this just provides me with:
id test.name test.score
1 a 5
1 b 7
2 a 8
2 c 3
Conditional aggregation is a good approach. If your tables are large, you might find that this has the best performance:
select t.id,
(select max(tt.score) from unnest(t.test) tt where tt.name = 'a') as a,
(select max(tt.score) from unnest(t.test) tt where tt.name = 'b') as b,
(select max(tt.score) from unnest(t.test) tt where tt.name = 'c') as c
from `table` t;
The reason I recommend this is because it avoids the outer aggregation. The unnest() happens without shuffling the data around -- and I have found that this is a big win in terms of performance.
One option could be using conditional aggregation
select id,
max(case when t.name = 'a' then t.score end) as a,
max(case when t.name = 'b' then t.score end) as b,
max(case when t.name = 'c' then t.score end) as c
from
(
select a.id, t
from `table` as a,
unnest(test) as t
) as flat
group by id
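A rough way to see conditional aggregation in action outside BigQuery is the sketch below, which uses Python's sqlite3 module. SQLite has no nested columns, so the rows are assumed to be already unnested into a hypothetical table flat(id, name, score); the helper name pivot_scores is also made up for this example.

```python
import sqlite3

# Already-unnested rows: one (id, name, score) row per element of `test`.
FLAT_ROWS = [(1, "a", 5), (1, "b", 7), (2, "a", 8), (2, "c", 3)]


def pivot_scores(rows):
    """Pivot name/score pairs into columns a, b, c with conditional aggregation."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE flat (id INTEGER, name TEXT, score INTEGER)")
    con.executemany("INSERT INTO flat VALUES (?, ?, ?)", rows)
    query = """
        SELECT id,
               MAX(CASE WHEN name = 'a' THEN score END) AS a,
               MAX(CASE WHEN name = 'b' THEN score END) AS b,
               MAX(CASE WHEN name = 'c' THEN score END) AS c
        FROM flat
        GROUP BY id
        ORDER BY id
    """
    out = con.execute(query).fetchall()
    con.close()
    return out
```

With the sample data, pivot_scores(FLAT_ROWS) returns [(1, 5, 7, None), (2, 8, None, 3)]: a CASE with no matching row yields NULL, and MAX over only NULLs stays NULL.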
Below is generic/dynamic way to handle your case
EXECUTE IMMEDIATE (
SELECT """
SELECT id, """ ||
STRING_AGG("""MAX(IF(name = '""" || name || """', score, NULL)) AS """ || name, ', ')
|| """
FROM `project.dataset.table` t, t.test
GROUP BY id
"""
FROM (
SELECT DISTINCT name
FROM `project.dataset.table` t, t.test
ORDER BY name
)
);
Applied to the sample data from your question, the output is:
Row id a b c
1 1 5 7 null
2 2 8 null 3
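The same "build the column list from the data, then run it" idea can be sketched outside BigQuery. The example below is only an illustration in Python with sqlite3, not the EXECUTE IMMEDIATE mechanism itself; the table flat(id, name, score) and the helpers quote_ident and dynamic_pivot are assumptions made for this sketch.

```python
import sqlite3

# Already-unnested rows: one (id, name, score) row per element of `test`.
FLAT_ROWS = [(1, "a", 5), (1, "b", 7), (2, "a", 8), (2, "c", 3)]


def quote_ident(name):
    """Quote a value for use as an SQL identifier (doubles embedded quotes)."""
    return '"' + name.replace('"', '""') + '"'


def dynamic_pivot(rows):
    """Discover the distinct names first, then generate one pivot column per name."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE flat (id INTEGER, name TEXT, score INTEGER)")
    con.executemany("INSERT INTO flat VALUES (?, ?, ?)", rows)
    names = [r[0] for r in con.execute("SELECT DISTINCT name FROM flat ORDER BY name")]
    # Names appear twice: as bound parameters in the comparison and as quoted aliases.
    cols = ", ".join(
        f"MAX(CASE WHEN name = ? THEN score END) AS {quote_ident(n)}" for n in names
    )
    cur = con.execute(f"SELECT id, {cols} FROM flat GROUP BY id ORDER BY id", names)
    header = [d[0] for d in cur.description]
    data = cur.fetchall()
    con.close()
    return header, data
```

For the sample rows this returns the header ['id', 'a', 'b', 'c'] and the data [(1, 5, 7, None), (2, 8, None, 3)]. A new name in the data adds a column without touching the query text.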

Get RowsWise data with Columns Name ColumnWise

I have a Table Structure like:
ApplicationId IsFO20Submitted IsFO08Submitted IsFO07Submitted IsFO09Submitted IsFO10Submitted
CBA202000001 Y Y Y Y Y
CBA202000002 Y Y Y Y Y
CBA202000007 Y Y Y Y Y
I want my Result to be like:
ApplicationId CBA202000001 CBA202000002 CBA202000007
IsFO20Submitted Y Y Y
IsFO08Submitted Y Y Y
IsFO07Submitted Y Y Y
IsFO09Submitted Y Y Y
IsFO10Submitted Y Y Y
Is there anything I can try in SQL to get such a result?
You can use APPLY :
SELECT TT.ApplicationId,
MAX(CASE WHEN t.ApplicationId = 'CBA202000001' then ApplicationVal END) AS [CBA202000001],
MAX(CASE WHEN t.ApplicationId = 'CBA202000002' then ApplicationVal END) AS [CBA202000002],
MAX(CASE WHEN t.ApplicationId = 'CBA202000007' then ApplicationVal END) AS [CBA202000007]
FROM table t CROSS APPLY
( VALUES ('IsFO20Submitted', IsFO20Submitted),
('IsFO08Submitted',IsFO08Submitted),
. . .
('IsFO10Submitted',IsFO10Submitted)
) TT(ApplicationId, ApplicationVal)
GROUP BY TT.ApplicationId;
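To check the "unpivot, then pivot back" logic on a small scale, here is a sketch in Python with sqlite3. SQLite has no CROSS APPLY, so the unpivot step is swapped for a UNION ALL; the table name apps, the helper transpose, the reduced set of three flag columns and the Y/N values are all made up for this example (the question's data is all Y, which would hide mistakes).

```python
import sqlite3

# Hypothetical subset of the flag columns and made-up Y/N values.
FLAGS = ["IsFO20Submitted", "IsFO08Submitted", "IsFO07Submitted"]
APPS = [
    ("CBA202000001", "Y", "Y", "N"),
    ("CBA202000002", "Y", "N", "Y"),
    ("CBA202000007", "N", "Y", "Y"),
]


def transpose(apps):
    """Turn flag columns into rows and application ids into columns."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE apps (ApplicationId TEXT, IsFO20Submitted TEXT, "
        "IsFO08Submitted TEXT, IsFO07Submitted TEXT)"
    )
    con.executemany("INSERT INTO apps VALUES (?, ?, ?, ?)", apps)
    # Step 1 (unpivot): one row per (application, flag name, flag value).
    # FLAGS is a fixed list of known column names, so formatting it in is safe here.
    unpivot = " UNION ALL ".join(
        f"SELECT ApplicationId AS app, '{flag}' AS flag, {flag} AS val FROM apps"
        for flag in FLAGS
    )
    # Step 2 (pivot): conditional aggregation, one column per application id.
    query = f"""
        SELECT flag,
               MAX(CASE WHEN app = 'CBA202000001' THEN val END) AS "CBA202000001",
               MAX(CASE WHEN app = 'CBA202000002' THEN val END) AS "CBA202000002",
               MAX(CASE WHEN app = 'CBA202000007' THEN val END) AS "CBA202000007"
        FROM ({unpivot}) AS u
        GROUP BY flag
    """
    out = {row[0]: row[1:] for row in con.execute(query)}
    con.close()
    return out
```

transpose(APPS) maps each flag name to its values in application order, for example 'IsFO20Submitted' to ('Y', 'Y', 'N').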
Check the PIVOT and UNPIVOT operators. You'll have to do the unpivot first and then pivot the result. I didn't test it, though, so you'll have to try it first.
Pivot example:
SELECT * FROM
(
SELECT column1, column2
FROM tables
WHERE conditions
)
PIVOT
(
aggregate_function(column2)
FOR column2
IN ( expr1, expr2, ... expr_n) | subquery
)
ORDER BY expression [ ASC | DESC ];
Unpivot example:
SELECT *
FROM unpivot_test
UNPIVOT (quantity FOR product_code IN (product_code_a AS 'A', product_code_b AS 'B', product_code_c AS 'C', product_code_d AS 'D'));

Select where x in multi column subquery

I want to select rows where the value is in a specific column of a subquery. How do I specify which column the IN should check?
Select x from foo
where x in (Select y, Max(z) as MaxEntry
From bar
Group By y)
say I'm selecting from this:
Y Z
1 2
1 4
2 7
2 8
I want to see if x is in the set of 4 and 8 (the largest Z for each Y).
If you only check x column then you can write like this.
Select x from foo
where x in (Select x from bar where blah)
but if you want to check x and y then you can write like this.
Select x from foo
where exists(Select * from bar where bar.x = foo.x and bar.y = foo.y )
for your query, you don't need to write y at the select
Select x from foo
where x in (Select Max(z) as MaxEntry
From bar
Group By y)
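A quick way to convince yourself that the single-column subquery does what the question asks is the sketch below, in Python with sqlite3. The bar rows come from the question; the foo values and the helper name matching_x are made up for the example.

```python
import sqlite3

# bar rows from the question: (y, z).
BAR_ROWS = [(1, 2), (1, 4), (2, 7), (2, 8)]


def matching_x(foo_values, bar_rows):
    """Return the x values that equal the largest z of some y group."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE foo (x INTEGER)")
    con.execute("CREATE TABLE bar (y INTEGER, z INTEGER)")
    con.executemany("INSERT INTO foo VALUES (?)", [(v,) for v in foo_values])
    con.executemany("INSERT INTO bar VALUES (?, ?)", bar_rows)
    query = """
        SELECT x
        FROM foo
        WHERE x IN (SELECT MAX(z)      -- only the column IN should compare against
                    FROM bar
                    GROUP BY y)
        ORDER BY x
    """
    out = [row[0] for row in con.execute(query)]
    con.close()
    return out
```

matching_x([2, 4, 7, 8, 9], BAR_ROWS) returns [4, 8]: the subquery still groups by y, it just no longer selects y.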
I think you want:
SELECT x
FROM foo
WHERE EXISTS
(SELECT 1
FROM bar
GROUP BY y
HAVING MAX(bar.z) = foo.x)
If I understood your question correctly, then this should work:
Select x from foo
join (Select y, Max(z) as MaxEntry
From bar
Group By y)
as m on m.MaxEntry = x

Combining Rows in SQL Via Recursive Query

I have the following table.
Animal Vaccine_Date Vaccine
Cat 2/1/2016 y
Cat 2/1/2016 z
Dog 2/1/2016 z
Dog 1/1/2016 x
Dog 2/1/2016 y
I would like to get the results to be as shown below.
Animal Vaccine_Date Vaccine
Dog 1/1/2016 x
Dog 2/1/2016 y,z
Cat 2/1/2016 y,z
I have the following code which was supplied via my other post at "Combine(concatenate) rows based on dates via SQL"
WITH RECURSIVE recCTE AS
(
SELECT
animal,
vaccine_date,
CAST(min(vaccine) as VARCHAR(50)) as vaccine, --big enough to hold concatenated list
cast (1 as int) as depth --used to determine the largest/last group_concate (the full group) in the final select
FROM TableOne
GROUP BY 1,2
UNION ALL
SELECT
recCTE.animal,
recCTE.vaccine_date,
trim(trim(recCTE.vaccine)|| ',' ||trim(TableOne.vaccine)) as vaccine,
recCTE.depth + cast(1 as int) as depth
FROM recCTE
INNER JOIN TableOne ON
recCTE.animal = TableOne.animal AND
recCTE.vaccine_date = TableOne.vaccine_date and
TableOne.vaccine > recCTE.vaccine
WHERE recCTE.depth < 5
)
--Now select the result with the largest depth for each animal/vaccine_date combo
SELECT * FROM recCTE
QUALIFY ROW_NUMBER() OVER (PARTITION BY animal,vaccine_date ORDER BY depth desc) =1
But this results in the following.
Animal Vaccine_Date vaccine depth
Cat 2/1/2016 y,z,z,z,z 5
Dog 1/1/2016 x 1
Dog 2/1/2016 y,z,z,z,z 5
The "z" keeps repeating. This is because the code is saying anything greater than the minimum vaccine. To account for this, the code was changed to the following.
WITH RECURSIVE recCTE AS
(
SELECT
animal,
vaccine_date,
CAST(min(vaccine) as VARCHAR(50)) as vaccine, --big enough to hold concatenated list
cast (1 as int) as depth, --used to determine the largest/last group_concate (the full group) in the final select
vaccine as vaccine_check
FROM TableOne
GROUP BY 1,2,5
UNION ALL
SELECT
recCTE.animal,
recCTE.vaccine_date,
trim(trim(recCTE.vaccine)|| ',' ||trim(TableOne.vaccine)) as vaccine,
recCTE.depth + cast(1 as int) as depth,
TableOne.vaccine as vaccine_check
FROM recCTE
INNER JOIN TableOne ON
recCTE.animal = TableOne.animal AND
recCTE.vaccine_date = TableOne.vaccine_date and
TableOne.vaccine > recCTE.vaccine and
vaccine_check <> recCTE.vaccine_check
WHERE recCTE.depth < 5
)
--Now select the result with the largest depth for each animal/vaccine_date combo
SELECT * FROM recCTE
QUALIFY ROW_NUMBER() OVER (PARTITION BY animal,vaccine_date ORDER BY depth desc) =1
However, this resulted in the following.
Animal Vaccine_Date vaccine depth vaccine_check
Cat 2/1/2016 y 1 y
Dog 1/1/2016 x 1 x
Dog 2/1/2016 y 1 y
What is missing in the code to get the following desired results?
Animal Vaccine_Date Vaccine
Dog 1/1/2016 x
Dog 2/1/2016 y,z
Cat 2/1/2016 y,z
Hmmm. I don't have Teradata on hand but this is a major shortcoming in the project (in my opinion). I think this will work for you, but it might need some tweaking:
with tt as (
select t.*,
row_number() over (partition by animal, vaccine_date order by vaccine) as seqnum, -- ordering by vaccine makes the list order deterministic
count(*) over (partition by animal, vaccine_date) as cnt
from TableOne t
),
recursive cte as (
select animal, vaccine_date, vaccine as vaccines, seqnum, cnt
from tt
where seqnum = 1
union all
select cte.animal, cte.vaccine_date, cte.vaccines || ',' || tt.vaccine, tt.seqnum, tt.cnt
from cte join
tt
on tt.animal = cte.animal and
tt.vaccine_date = cte.vaccine_date and
tt.seqnum = cte.seqnum + 1
)
select cte.*
from cte
where seqnum = cnt;
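Since this was not run on Teradata, here is a small sketch of the same "number the rows, then append one per recursion step" idea that can be run with Python's sqlite3 module. It is an illustration under assumptions, not a Teradata solution: the helper name combine_vaccines is made up, the sequence number comes from a correlated COUNT instead of ROW_NUMBER (so it also runs on older SQLite builds), and vaccine values are assumed to be unique within each animal/date group.

```python
import sqlite3

# Sample rows from the question: (animal, vaccine_date, vaccine).
ROWS = [
    ("Cat", "2/1/2016", "y"), ("Cat", "2/1/2016", "z"), ("Dog", "2/1/2016", "z"),
    ("Dog", "1/1/2016", "x"), ("Dog", "2/1/2016", "y"),
]


def combine_vaccines(rows):
    """Concatenate vaccines per animal and date with a recursive CTE."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE TableOne (animal TEXT, vaccine_date TEXT, vaccine TEXT)")
    con.executemany("INSERT INTO TableOne VALUES (?, ?, ?)", rows)
    query = """
        WITH RECURSIVE
        tt AS (
            SELECT t.animal, t.vaccine_date, t.vaccine,
                   -- position of this vaccine inside its group (1-based)
                   (SELECT COUNT(*) FROM TableOne o
                     WHERE o.animal = t.animal AND o.vaccine_date = t.vaccine_date
                       AND o.vaccine <= t.vaccine) AS seqnum,
                   -- size of the group
                   (SELECT COUNT(*) FROM TableOne o
                     WHERE o.animal = t.animal AND o.vaccine_date = t.vaccine_date) AS cnt
            FROM TableOne t
        ),
        cte AS (
            SELECT animal, vaccine_date, vaccine AS vaccines, seqnum, cnt
            FROM tt
            WHERE seqnum = 1
            UNION ALL
            -- each step appends exactly the next vaccine, so nothing repeats
            SELECT cte.animal, cte.vaccine_date, cte.vaccines || ',' || tt.vaccine,
                   tt.seqnum, tt.cnt
            FROM cte
            JOIN tt ON tt.animal = cte.animal
                   AND tt.vaccine_date = cte.vaccine_date
                   AND tt.seqnum = cte.seqnum + 1
        )
        SELECT animal, vaccine_date, vaccines
        FROM cte
        WHERE seqnum = cnt          -- keep only the fully built list
        ORDER BY animal, vaccine_date
    """
    out = con.execute(query).fetchall()
    con.close()
    return out
```

Joining on seqnum = cte.seqnum + 1 is what prevents the repeated "z" seen in the question: every recursion step can match only one row per group.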
If your Teradata Database version is 14.10 or higher it supports XML data type. This also means that XMLAGG function is supported which would be useful for your case and would let you avoid recursion.
Check whether the XMLAGG function exists; it is installed with XML Services as a UDF:
SELECT * FROM dbc.FunctionsV WHERE FunctionName = 'XMLAGG'
If it does, then the query would look like:
SELECT
animal,
vaccine_date,
TRIM(TRAILING ',' FROM CAST(XMLAGG(vaccine || ',' ORDER BY vaccine) AS VARCHAR(10000)))
FROM
tableone
GROUP BY 1,2
I have no way of testing this atm, but I believe this should work with possibility of minor tweaks.
I was able to get the desired results with the following SQL. It doesn't seem very efficient and is not dynamic. However, I can add extra subqueries as needed to combine more vaccines by animal by date.
select
qrya.animal
,qrya.vaccine_date
,case when qrya.vac1 is not null then qrya.vac1 else null end ||','||case when qrya.animal=qryb.animal and qrya.vaccine_date=qryb.vaccine_date then qryb.Vac2 else 'End' end as vaccine_List
from
(
select
qry1.Animal
,qry1.Vaccine_Date
,case when qry1.Vaccine_Rank = 1 then qry1.vaccine end as Vac1
from
(
select
animal
,vaccine_date
,vaccine
,row_number() over (partition by animal,vaccine_date order by vaccine) as Vaccine_Rank
from TableOne
) as qry1
where vac1 is not null
group by qry1.Animal,
qry1.Vaccine_Date
,case when qry1.Vaccine_Rank = 1 then qry1.vaccine end
) as qrya
join
(
select
qry1.Animal
,qry1.Vaccine_Date
,case when qry1.Vaccine_Rank = 2 then qry1.vaccine end as Vac2
from
(
select
animal
,vaccine_date
,vaccine
,row_number() over (partition by animal,vaccine_date order by vaccine) as Vaccine_Rank
from TableOne
) as qry1
where vac2 is not null
group by qry1.Animal,
qry1.Vaccine_Date
,case when qry1.Vaccine_Rank = 2 then qry1.vaccine end
) as qryb
on qrya.Animal=qryb.Animal

Issue with recursive CTE in PostgreSQL

This query generates the numbers from 1 to 4.
with recursive z(q) as (
select 1
union all
select q + 1 from z where q < 4
)
select * from z;
But, if I modify it to this,
with x as (
select 1 y
),
recursive z(q) as (
select y from x
union all
select q + 1 from z where q < 4
)
select * from z;
It gives
ERROR: syntax error at or near "z"
What did I do wrong here?
I think this is because RECURSIVE is a modifier of the WITH clause as a whole, not a property of the common table expression z, so it has to come directly after WITH. You can use it like this:
with recursive
x as (
select 1 y
),
z(q) as (
select y from x
union all
select q + 1 from z where q < 4
)
select * from z;
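SQLite accepts the same shape (RECURSIVE once, right after WITH, followed by both CTEs), so the corrected form can be tried locally with Python's sqlite3 module. This is only a sketch run against SQLite rather than PostgreSQL; the helper name count_up and its limit parameter are made up for the example.

```python
import sqlite3


def count_up(limit=4):
    """Generate 1..limit with a recursive CTE that is seeded from another CTE."""
    con = sqlite3.connect(":memory:")
    query = """
        WITH RECURSIVE               -- the keyword belongs to WITH, written once
        x AS (
            SELECT 1 AS y
        ),
        z(q) AS (
            SELECT y FROM x          -- non-recursive seed taken from x
            UNION ALL
            SELECT q + 1 FROM z WHERE q < ?
        )
        SELECT q FROM z ORDER BY q
    """
    out = [row[0] for row in con.execute(query, (limit,))]
    con.close()
    return out
```

count_up() returns [1, 2, 3, 4]. The non-recursive CTE x simply sits in the same list as the recursive z.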