How to pivot and get aggregates for two columns in BigQuery

So I have a table with some raw data that looks like this:
| event_date | country_code | platform | user_id |
| 2022-10-01 | UK | android | 1 |
| 2022-10-01 | UK | android | 4 |
| 2022-10-02 | FR | ios | 5 |
| 2022-11-02 | UK | android | 144 |
| 2022-12-01 | GR | android | 154 |
And I would like to get the aggregates per day and:
per country (only for UK, FR and ES, but also for NOT UK)
per platform
as well as the total installs just for the platforms (without the country combinations)
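| event_date | count_ios | count_android | count_ios_non_uk | count_ios_uk | count_ios_fr | count_ios_es | count_android_non_uk | count_android_uk | count_android_fr | count_android_es |
| 2022-10-01 | | | | | | | | | | |
| 2022-10-02 | | | | | | | | | | |
| 2022-11-02 | | | | | | | | | | |
| 2022-12-01 | | | | | | | | | | |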
I've tried with PIVOT:
SELECT * FROM
(
    SELECT
        event_date,
        platform,
        country_code
    FROM my_table
)
PIVOT
(
    COUNT(*) AS count
    FOR LOWER(country_code) IN ('uk', 'fr', 'ie')
)
ORDER BY event_date DESC;
but this only gives me the combinations for country codes. Even for that, I am not sure how to handle the non-UK case or how to get the counts per platform only.

Your aggregations use different conditions, so you will probably have a hard time getting all the counts you show from a single PIVOT.
It's not an elegant solution, but should do the trick:
WITH
my_table AS (
SELECT "2022-10-01" event_date, 'UK' country_code, 'android' platform, 1 user_id union all
SELECT "2022-10-01", 'UK', 'android', 4 user_id union all
SELECT "2022-10-02", 'FR', 'ios' , 5 user_id union all
SELECT "2022-11-02", 'UK', 'android', 144 user_id union all
SELECT "2022-12-01", 'GR', 'android', 154 user_id
)
SELECT DISTINCT
event_date,
(select count(*) from my_table t1 where t1.platform = 'ios' and LOWER(t1.country_code) IN ('uk', 'fr', 'es') and t1.event_date = t0.event_date) count_ios,
(select count(*) from my_table t1 where t1.platform = 'android' and LOWER(t1.country_code) IN ('uk', 'fr', 'es') and t1.event_date = t0.event_date) count_android,
(select count(*) from my_table t1 where t1.platform = 'ios' and LOWER(t1.country_code) NOT IN ('uk') and t1.event_date = t0.event_date) count_ios_non_uk,
(select count(*) from my_table t1 where t1.platform = 'ios' and LOWER(t1.country_code) IN ('uk') and t1.event_date = t0.event_date) count_ios_uk,
(select count(*) from my_table t1 where t1.platform = 'ios' and LOWER(t1.country_code) IN ('fr') and t1.event_date = t0.event_date) count_ios_fr,
(select count(*) from my_table t1 where t1.platform = 'ios' and LOWER(t1.country_code) IN ('es') and t1.event_date = t0.event_date) count_ios_es,
(select count(*) from my_table t1 where t1.platform = 'android' and LOWER(t1.country_code) NOT IN ('uk') and t1.event_date = t0.event_date) count_android_non_uk,
(select count(*) from my_table t1 where t1.platform = 'android' and LOWER(t1.country_code) IN ('uk') and t1.event_date = t0.event_date) count_android_uk,
(select count(*) from my_table t1 where t1.platform = 'android' and LOWER(t1.country_code) IN ('fr') and t1.event_date = t0.event_date) count_android_fr,
(select count(*) from my_table t1 where t1.platform = 'android' and LOWER(t1.country_code) IN ('es') and t1.event_date = t0.event_date) count_android_es,
FROM
my_table t0
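The correlated subqueries above can usually be collapsed into a single pass with conditional aggregation. The following is only a sketch under stated assumptions, not the answerer's method: it runs portable SQL (SUM over CASE) against an in-memory SQLite database through Python's sqlite3 module so that it can be executed as is, it shows only a subset of the requested columns, and it assumes that count_ios and count_android should cover every country. In BigQuery the same expressions can be written more compactly with COUNTIF(condition).

```python
import sqlite3

# Sample rows copied from the question.
rows = [
    ("2022-10-01", "UK", "android", 1),
    ("2022-10-01", "UK", "android", 4),
    ("2022-10-02", "FR", "ios", 5),
    ("2022-11-02", "UK", "android", 144),
    ("2022-12-01", "GR", "android", 154),
]

# One GROUP BY; each output column is a conditional count.
# Assumption: the platform totals are not restricted by country.
QUERY = """
SELECT
    event_date,
    SUM(CASE WHEN platform = 'ios' THEN 1 ELSE 0 END) AS count_ios,
    SUM(CASE WHEN platform = 'android' THEN 1 ELSE 0 END) AS count_android,
    SUM(CASE WHEN platform = 'ios' AND LOWER(country_code) <> 'uk'
             THEN 1 ELSE 0 END) AS count_ios_non_uk,
    SUM(CASE WHEN platform = 'ios' AND LOWER(country_code) = 'uk'
             THEN 1 ELSE 0 END) AS count_ios_uk,
    SUM(CASE WHEN platform = 'android' AND LOWER(country_code) <> 'uk'
             THEN 1 ELSE 0 END) AS count_android_non_uk,
    SUM(CASE WHEN platform = 'android' AND LOWER(country_code) = 'uk'
             THEN 1 ELSE 0 END) AS count_android_uk
FROM my_table
GROUP BY event_date
ORDER BY event_date
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table ("
    "event_date TEXT, country_code TEXT, platform TEXT, user_id INTEGER)"
)
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", rows)
result = conn.execute(QUERY).fetchall()
conn.close()

for row in result:
    print(row)
```

The per-country columns for fr and es follow the same pattern with a different comparison value.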

Related

Find exactly equal rows in 2 tables, both in terms of value and number

I have two tables, each with two fields (provinceid, cityid).
I want to find the provinceid values that have exactly the same set of cityid values in both tables.
For example, I have these tables:
table1:
| provinceid | cityid |
| 1 | 1 |
| 1 | 2 |
| 2 | 3 |
| 2 | 4 |
| 3 | 6 |
table2:
| provinceid | cityid |
| 1 | 1 |
| 1 | 5 |
| 2 | 3 |
| 2 | 4 |
| 3 | 6 |
| 3 | 7 |
I want a query that returns only provinceid = 2 with cityid = 3 and 4.
I tried this query and it gives the right result, but I would like a better one:
select t1.provinceid, t1.cityid
from t1
left join t2 on t1.provinceid = t2.provinceid and t1.cityid = t2.cityid
where t2.provinceid is not null and t2.cityid is not null
and t1.provinceid not in (select t2.provinceid
    from t2
    left join t1 on t1.provinceid = t2.provinceid and t1.cityid = t2.cityid
    where t1.provinceid is not null and t1.cityid is not null)
Thank you.

Try this :
select t1.provinceid ,t1.cityid
from table1 t1 join table2 t2
on t1.provinceid=t2.provinceid
and t1.cityid=t2.cityid
and t1.provinceid in (
select distinct(t1.provinceid)
from
(select provinceid, count(provinceid) as cnt from table1 group by provinceid) as t1
cross join
(select provinceid ,count(provinceid) as cnt from table2 group by provinceid) as t2
where t1.cnt = t2.cnt);
Output:
| provinceid | cityid |
| 1 | 1 |
| 2 | 3 |
| 2 | 4 |

The simplest method for an exact match is to use string aggregation. The exact syntax varies by database, but in Standard SQL this looks like:
select t1.provinceid, t2.provinceid
from (select provinceid,
listagg(cityid, ',') within group (order by cityid) as cities
from t1
group by provinceid
) t1 join
(select provinceid,
listagg(cityid, ',') within group (order by cityid) as cities
from t2
group by provinceid
) t2
on t1.cities = t2.cities;
If you want the provinceids to be the same as well, just add t1.provinceid = t2.provinceid to the on clause.
Or, if you want the provinceids to be the same, you can use full join instead:
select provinceid
from t1 full join
t2
using (provinceid, cityid)
group by provinceid
having count(*) = count(t1.cityid) and count(*) = count(t2.cityid);
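The full join check above boils down to comparing three counts per province: the rows in each table and the rows that match in both. For engines without FULL JOIN, the same idea can be sketched with an inner join and two grouped counts. This is an illustrative variant rather than the answerer's query; it runs against an in-memory SQLite database through Python's sqlite3 module, the CTE names c1, c2 and m are made up for the example, and it assumes that neither table contains duplicate (provinceid, cityid) pairs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (provinceid INTEGER, cityid INTEGER)")
conn.execute("CREATE TABLE table2 (provinceid INTEGER, cityid INTEGER)")
# Sample data from the question.
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 3), (2, 4), (3, 6)])
conn.executemany("INSERT INTO table2 VALUES (?, ?)",
                 [(1, 1), (1, 5), (2, 3), (2, 4), (3, 6), (3, 7)])

# A province matches exactly when the number of pairs found in both tables
# equals the number of pairs it has in table1 and in table2.
QUERY = """
WITH c1 AS (SELECT provinceid, COUNT(*) AS n FROM table1 GROUP BY provinceid),
     c2 AS (SELECT provinceid, COUNT(*) AS n FROM table2 GROUP BY provinceid),
     m  AS (SELECT a.provinceid, COUNT(*) AS n
            FROM table1 a
            JOIN table2 b
              ON a.provinceid = b.provinceid AND a.cityid = b.cityid
            GROUP BY a.provinceid)
SELECT t.provinceid, t.cityid
FROM table1 t
JOIN m  ON m.provinceid  = t.provinceid
JOIN c1 ON c1.provinceid = t.provinceid
JOIN c2 ON c2.provinceid = t.provinceid
WHERE m.n = c1.n AND m.n = c2.n
ORDER BY t.provinceid, t.cityid
"""

matches = conn.execute(QUERY).fetchall()
conn.close()
print(matches)  # [(2, 3), (2, 4)]
```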

Besides matching on provid and cityid, we also need the sets of rows per provid to match exactly. There are many ways to do this. I prefer to compare the aggregated list of cities for each provid as a string, in addition to joining on provid and cityid, which removes provid/cityid pairs that exist in both tables but do not belong to an exactly matching set.
WITH table1 AS(
SELECT 1 AS PROVID, 1 AS CITYID FROM DUAL UNION ALL
SELECT 1 AS PROVID, 2 AS CITYID FROM DUAL UNION ALL
SELECT 2 AS PROVID, 3 AS CITYID FROM DUAL UNION ALL
SELECT 2 AS PROVID, 4 AS CITYID FROM DUAL UNION ALL
SELECT 3 AS PROVID, 6 AS CITYID FROM DUAL
),
table2 AS (
SELECT 1 AS PROVID, 1 AS CITYID FROM DUAL UNION ALL
SELECT 1 AS PROVID, 5 AS CITYID FROM DUAL UNION ALL
SELECT 2 AS PROVID, 3 AS CITYID FROM DUAL UNION ALL
SELECT 2 AS PROVID, 4 AS CITYID FROM DUAL UNION ALL
SELECT 3 AS PROVID, 6 AS CITYID FROM DUAL UNION ALL
SELECT 3 AS PROVID, 7 AS CITYID FROM DUAL
),
listed_table1 AS (
SELECT
a.provid,
listagg(cityid,',') within GROUP (ORDER BY cityid) list_city
FROM table1 a
GROUP BY a.provid
),
listed_table2 AS (
SELECT
a.provid,
listagg(cityid,',') within GROUP (ORDER BY cityid) list_city
FROM table2 a
GROUP BY a.provid
)
SELECT
t1.provid, t1.cityid
FROM
(SELECT x.*, x1.list_city FROM table1 x, listed_table1 x1 WHERE x.provid = x1.provid) t1,
(SELECT y.*, y1.list_city FROM table2 y, listed_table2 y1 WHERE y.provid = y1.provid) t2
WHERE t1.provid = t2.provid AND t1.cityid = t2.cityid AND t1.list_city = t2.list_city
;

You can use (union ...) except (inner join ...) to detect non-matches. Step by step:
with u12 as (
select PROVID, CITYID from table1
union
select PROVID, CITYID from table2
),
c12 as (
select t1.PROVID, t2.CITYID
from table1 t1
join table2 t2 on t1.PROVID=t2.PROVID and t1.CITYID=t2.CITYID
),
nonMatch as (
select distinct PROVID
from (
select PROVID, CITYID from u12
except
select PROVID, CITYID from c12
) t
)
select *
from table1 t
where not exists (
select 1
from nonMatch n
where n.PROVID = t.PROVID);
If the number of duplicate rows matters, count them first:
with t1 as (
select PROVID, CITYID, count(*) n
from table1
group by PROVID, CITYID
),
t2 as (
select PROVID, CITYID, count(*) n
from table2
group by PROVID, CITYID
),
u12 as (
select PROVID, CITYID, n from t1
union
select PROVID, CITYID, n from t2
),
c12 as (
select t1.PROVID, t1.CITYID, t1.n
from t1
join t2 on t1.PROVID = t2.PROVID and t1.CITYID = t2.CITYID and t1.n = t2.n
),
nonMatch as (
select distinct PROVID
from (
select PROVID, CITYID, n from u12
except
select PROVID, CITYID, n from c12
) t
)
select *
from table1 t
where not exists (
select 1
from nonMatch n
where n.PROVID = t.PROVID)

Left join only on first row

I have the following sample query:
WITH a As
(
SELECT '2020-04-01' as date,'test123' as id,'abc' as foo,10 as purchases Union all
SELECT '2020-04-01', 'test123', 'abc', 0 Union all
SELECT '2020-04-01', 'test123', 'abc', 0
),
b as
(
SELECT '2020-04-01' as date,'test123' as id,'abc' as foo,50 as budget
)
select
a.date,a.id,a.foo,a.purchases,budget
from a
LEFT JOIN b
ON
concat(a.date,a.id)=concat(b.date,b.id)
and I'd like the following output
Row date,id,foo,purchases,budget
1 2020-04-01,test123,abc,10,50
2 2020-04-01,test123,abc,0,null
3 2020-04-01,test123,abc,0,null
I read many questions on the similar topic but I wasn't able to make it work.

You can use row_number():
select a.date, a.id, a.foo, a.purchases,
       (case when a.seqnum = 1 then b.budget end) as budget
from (select a.*, row_number() over (partition by date, id order by purchases desc) as seqnum
      from a
     ) a left join
     b
     using (date, id);
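To see the row_number() approach produce the requested output, here is a minimal runnable sketch using the sample data from the question. It is an illustration under assumptions: it uses an in-memory SQLite database through Python's sqlite3 module (window functions need SQLite 3.25 or later), the alias r is made up for the example, and an explicit ON clause replaces USING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (date TEXT, id TEXT, foo TEXT, purchases INTEGER)")
conn.execute("CREATE TABLE b (date TEXT, id TEXT, foo TEXT, budget INTEGER)")
conn.executemany("INSERT INTO a VALUES (?, ?, ?, ?)", [
    ("2020-04-01", "test123", "abc", 10),
    ("2020-04-01", "test123", "abc", 0),
    ("2020-04-01", "test123", "abc", 0),
])
conn.execute("INSERT INTO b VALUES ('2020-04-01', 'test123', 'abc', 50)")

# Number the rows of a within each (date, id) group, then expose the
# budget only on the first row of each group.
QUERY = """
SELECT r.date, r.id, r.foo, r.purchases,
       CASE WHEN r.seqnum = 1 THEN b.budget END AS budget
FROM (
    SELECT a.*,
           ROW_NUMBER() OVER (PARTITION BY date, id
                              ORDER BY purchases DESC) AS seqnum
    FROM a
) AS r
LEFT JOIN b ON r.date = b.date AND r.id = b.id
ORDER BY r.seqnum
"""

joined = conn.execute(QUERY).fetchall()
conn.close()
for row in joined:
    print(row)
```

The two rows with 0 purchases tie in the window ordering, so which of them gets sequence number 2 or 3 is arbitrary; both end up with a NULL budget either way.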

In the b inner query add this to the columns being selected:
ROW_NUMBER() OVER(PARTITION BY Date ORDER BY Purchases DESC) AS SequenceNumber
Then in your LEFT JOIN add "AND b.SequenceNumber = 1"
If you really are hard-coding everything and not selecting from an actual table, it would be:
WITH a As
(
SELECT '2020-04-01' as date,'test123' as id,'abc' as foo,10 as purchases Union all
SELECT '2020-04-01', 'test123', 'abc', 0 Union all
SELECT '2020-04-01', 'test123', 'abc', 0
),
b as
(
SELECT '2020-04-01' as date,'test123' as id,'abc' as foo,50 as budget
)
select
a.date,a.id,a.foo,a.purchases,budget
from a
LEFT JOIN b
ON a.date = b.date
AND a.id = b.id
AND a.purchases > 0
Otherwise, if the inner "b" query selects from an actual table, it would be something like this:
WITH a As
(
SELECT '2020-04-01' as date,'test123' as id,'abc' as foo,10 as purchases Union all
SELECT '2020-04-01', 'test123', 'abc', 0 Union all
SELECT '2020-04-01', 'test123', 'abc', 0
),
b as
(
SELECT '2020-04-01' as date,'test123' as id,'abc' as foo,50 as budget,
ROW_NUMBER() OVER(PARTITION BY date ORDER BY Id) SequenceNumber
)
select
a.date,a.id,a.foo,a.purchases,budget
from a
LEFT JOIN b
ON a.date = b.date
AND a.id = b.id
AND b.SequenceNumber = 1

How to give preference to null value during select

I have a table with
Id value
1000 null
1000 En
1000 Fr
1000 Es
1001 En
1001 Fr
1001 Es
The output of the select query should be as follows. (Since 1000 has a row with a null value, select only that row for it.)
Id value
1000 null
1001 En
1001 Fr
1001 Es

You can use NOT EXISTS and a correlated subquery to check for the non-existence of a NULL for an ID. Include these rows and also rows where value is NULL.
SELECT t1.id,
t1.value
FROM elbat t1
WHERE NOT EXISTS (SELECT *
FROM elbat t2
WHERE t2.id = t1.id
AND t2.value IS NULL)
OR t1.value IS NULL;
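As a quick check of the NOT EXISTS query against the sample data from the question, the following sketch loads the rows into an in-memory SQLite database through Python's sqlite3 module. The table name elbat is taken from the answer; the ORDER BY is added only to make the printed result deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE elbat (id INTEGER, value TEXT)")
conn.executemany("INSERT INTO elbat VALUES (?, ?)", [
    (1000, None), (1000, "En"), (1000, "Fr"), (1000, "Es"),
    (1001, "En"), (1001, "Fr"), (1001, "Es"),
])

# Keep a row if its id has no NULL value at all, or if the row itself
# is the NULL one.
QUERY = """
SELECT t1.id, t1.value
FROM elbat t1
WHERE NOT EXISTS (SELECT *
                  FROM elbat t2
                  WHERE t2.id = t1.id
                    AND t2.value IS NULL)
   OR t1.value IS NULL
ORDER BY t1.id, t1.value
"""

preferred = conn.execute(QUERY).fetchall()
conn.close()
for row in preferred:
    print(row)
```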

with
t (id, value) as (
select 1000, null from dual union all
select 1000, 'En' from dual union all
select 1000, 'Fr' from dual union all
select 1000, 'Es' from dual union all
select 1001, 'En' from dual union all
select 1001, 'Fr' from dual union all
select 1001, 'Es' from dual
)
select id, value
from (
select t.*,
dense_rank() over (partition by id order by nvl2(value, 1, 0)) rnk
from t
)
where rnk = 1
;
ID VA
---------- --
1000
1001 En
1001 Fr
1001 Es
Functions used in this query:
NVL2() https://docs.oracle.com/database/121/SQLRF/functions132.htm#SQLRF00685
DENSE_RANK() https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions043.htm

In the most recent versions of Oracle, you can actually do this without a subquery:
select t.*
from t
order by rank() over (partition by id order by (case when value is null then 1 else 2 end))
fetch first 1 row with ties;

You can use an analytic function as follows:
Select id , value from
(Select t.*,
Coalesce(Sum(case when value is null then 1 end) over (partition by id), 0) as cnt
From your_table)
Where (cnt = 1 and value is null)
or cnt = 0
Cheers!!

How to use a column from one subquery in another subquery

I have two subqueries. I want to use p.price from the first subquery in the second subquery (at the place marked XXX), but I get the error ORA-00904: "P"."PRICE": invalid identifier
select
p.product_id,
p.price,
l.delegation,
l.state
from users
inner join (
select
start_date,
price,
product_id,
row_number() over (partition by serial order by start_date desc ) as rn
from prices
) p on users.serial = p.serial
inner join (
select
sso_id ,
delegation,
state,
updated_at,
row_number() over (partition by state order by updated_at asc) as tl
from payments
where state = 'green' or (state = 'yellow' and delegation > XXX)
) l on users.sso_id= l.sso_id
where
p.rn = 1
and l.tl = 1

Join directly to the PAYMENTS table, exclude the rows with an invalid state/delegation combination in the join condition, then generate the ROW_NUMBER and keep only the first row per partition:
Oracle Setup:
CREATE TABLE users ( serial, sso_id ) AS
SELECT 1, 1 FROM DUAL UNION ALL
SELECT 2, 2 FROM DUAL UNION ALL
SELECT 3, 3 FROM DUAL;
CREATE TABLE prices ( serial, start_date, price, product_id ) AS
SELECT 1, DATE '2019-01-01', 20, 1 FROM DUAL UNION ALL
SELECT 1, DATE '2019-02-01', 30, 2 FROM DUAL UNION ALL
SELECT 1, DATE '2019-03-01', 25, 3 FROM DUAL UNION ALL
SELECT 2, DATE '2019-01-01', 20, 1 FROM DUAL UNION ALL
SELECT 2, DATE '2019-02-01', 25, 2 FROM DUAL UNION ALL
SELECT 2, DATE '2019-03-01', 30, 3 FROM DUAL UNION ALL
SELECT 3, DATE '2019-01-01', 40, 3 FROM DUAL;
CREATE TABLE payments ( sso_id, delegation, state, updated_at ) AS
SELECT 1, 20, 'green', DATE '2019-01-01' FROM DUAL UNION ALL
SELECT 1, 30, 'green', DATE '2019-02-01' FROM DUAL UNION ALL
SELECT 1, 27, 'green', DATE '2019-03-01' FROM DUAL UNION ALL
SELECT 1, 22, 'green', DATE '2019-04-01' FROM DUAL UNION ALL
SELECT 1, 26, 'yellow', DATE '2019-05-01' FROM DUAL UNION ALL
SELECT 2, 31, 'yellow', DATE '2019-01-01' FROM DUAL UNION ALL
SELECT 2, 31, 'green', DATE '2019-02-01' FROM DUAL UNION ALL
SELECT 3, 30, 'green', DATE '2019-01-01' FROM DUAL UNION ALL
SELECT 3, 30, 'yellow', DATE '2019-01-01' FROM DUAL UNION ALL
SELECT 3, 50, 'yellow', DATE '2019-02-01' FROM DUAL;
Query:
SELECT product_id,
price,
delegation,
state
FROM (
select p.product_id,
p.price,
l.delegation,
l.state,
row_number() over ( partition by l.sso_id, l.state order by l.updated_at asc) as tl
from users
inner join (
select serial,
start_date,
price,
product_id,
row_number() over (partition by serial order by start_date desc ) as rn
from prices
) p
on ( users.serial = p.serial AND p.rn = 1 )
inner join payments l
on ( users.sso_id = l.sso_id AND ( l.state = 'green' or (l.state = 'yellow' and l.delegation > p.price ) ) )
)
where tl = 1
Output:
PRODUCT_ID | PRICE | DELEGATION | STATE
---------: | ----: | ---------: | :-----
3 | 25 | 20 | green
3 | 25 | 26 | yellow
3 | 30 | 31 | green
3 | 30 | 31 | yellow
3 | 40 | 30 | green
3 | 40 | 50 | yellow

Use it in the main WHERE clause
select
p.product_id,
p.price,
l.delegation,
l.state
from users
inner join (
select
serial, -- needed for the join on users.serial
start_date,
price,
product_id,
row_number() over (partition by serial order by start_date desc ) as rn
from prices
) p on users.serial = p.serial
inner join (
select
sso_id ,
delegation,
state,
updated_at,
row_number() over (partition by state order by updated_at asc) as tl
from payments
where state = 'green' or state = 'yellow'
) l on users.sso_id= l.sso_id
where
p.rn = 1
and l.tl = 1
-- add following condition
and l.delegation > p.price

We need to join users and prices to payments inside the second subquery so that a matching p.price is available there:
SELECT p.product_id,
       p.price,
       l.delegation,
       l.state
FROM users
INNER JOIN
    (SELECT serial,
            start_date,
            price,
            product_id,
            ROW_NUMBER() OVER (PARTITION BY serial ORDER BY start_date DESC) AS rn
     FROM prices) p ON users.serial = p.serial
INNER JOIN
    (SELECT y.sso_id,
            y.delegation,
            y.state,
            y.updated_at,
            ROW_NUMBER() OVER (PARTITION BY y.state ORDER BY y.updated_at ASC) AS tl
     FROM payments y
     INNER JOIN users u ON u.sso_id = y.sso_id
     INNER JOIN prices p ON p.serial = u.serial
     WHERE y.state = 'green' OR (y.state = 'yellow' AND y.delegation > p.price)) l ON users.sso_id = l.sso_id
WHERE p.rn = 1 AND l.tl = 1

How to insert values which are not present in the table?

I have a table with rows that look like this:
| ID | NAME | LOCALE |
| x | name | en |
| x | name | ru |
| y | name1| en |
| y | name1| ru |
And so on. But some rows are present in only one locale. I need to insert the missing rows, so that every ID and NAME has 2 rows, one per locale.

Assuming that each name would only ever have two locales present, then here is a straightforward option:
INSERT INTO yourTable (ID, NAME, LOCALE)
SELECT
ID,
NAME,
CASE WHEN LOCALE = 'en' THEN 'ru' ELSE 'en' END
FROM
(
SELECT ID, NAME, MAX(LOCALE) AS LOCALE
FROM yourTable
GROUP BY ID, NAME
HAVING COUNT(*) = 1
) t;
If you actually have more than two locales, then I think we would have to assume that there is some table containing all locales. The query for that case would be more complicated than what I wrote above.
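For the case with more than two locales, one possible shape of that query is a cross join of the distinct (ID, NAME) pairs with the list of locales, filtered by NOT EXISTS. The sketch below is an assumption-laden illustration: the locales table is hypothetical (the question does not mention one), the third locale de is invented for the example, and the SQL runs against an in-memory SQLite database through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (ID TEXT, NAME TEXT, LOCALE TEXT)")
# Hypothetical reference table listing every locale that must exist.
conn.execute("CREATE TABLE locales (LOCALE TEXT)")
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", [
    ("x", "name", "en"), ("x", "name", "ru"),
    ("y", "name1", "en"),
    ("z", "ZZZZ", "ru"),
])
conn.executemany("INSERT INTO locales VALUES (?)", [("en",), ("ru",), ("de",)])

# Build every (ID, NAME, LOCALE) combination and insert only the ones
# that are not in the table yet.
conn.execute("""
INSERT INTO yourTable (ID, NAME, LOCALE)
SELECT k.ID, k.NAME, l.LOCALE
FROM (SELECT DISTINCT ID, NAME FROM yourTable) k
CROSS JOIN locales l
WHERE NOT EXISTS (
    SELECT 1
    FROM yourTable t
    WHERE t.ID = k.ID AND t.NAME = k.NAME AND t.LOCALE = l.LOCALE
)
""")

filled = conn.execute(
    "SELECT ID, NAME, LOCALE FROM yourTable ORDER BY ID, LOCALE"
).fetchall()
conn.close()
for row in filled:
    print(row)
```

Because the insert only adds combinations that are missing, running it a second time adds no rows.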

If I understand correctly, you may need something like the following:
test case:
create table someTable(ID, NAME, LOCALE) as (
select 'x', 'name' ,'en' from dual union all
select 'x', 'name' ,'ru' from dual union all
select 'y', 'name1' ,'en' from dual union all
select 'y', 'name1' ,'ru' from dual union all
select 'z', 'ZZZZ' ,'ru' from dual
)
add missing rows:
merge into someTable s
using(
select *
from
(select 'en' LOCALE from dual union
select 'ru' LOCALE from dual
)
cross join
( select distinct ID, name from someTable)
) x
on (x.id = s.id and x.name = s.name and s.locale = x.locale)
when not matched then
insert values (x.id, x.name, x.locale)
The result:
ID NAME LOCALE
-- ----- ------
x name en
x name ru
y name1 en
y name1 ru
z ZZZZ ru
z ZZZZ en

Add missing name entries:
INSERT INTO <YourTable>
(ID, NAME, LOCALE)
(
SELECT t1.ID, 'name', t1.LOCALE
FROM <YourTable> t1
WHERE NOT EXISTS
(
SELECT t2.LOCALE
FROM <YourTable> t2
WHERE t2.NAME = 'name' AND t1.LOCALE = t2.LOCALE
)
)
Add missing name1 entries:
INSERT INTO <YourTable>
(ID, NAME, LOCALE)
(
SELECT t1.ID, 'name1', t1.LOCALE
FROM <YourTable> t1
WHERE NOT EXISTS
(
SELECT t2.LOCALE
FROM <YourTable> t2
WHERE t2.NAME = 'name1' AND t1.LOCALE = t2.LOCALE
)
)
If the strings name and name1 don't play a role and you just need a second row for any locale that exists only once, you can use:
INSERT INTO <YourTable>
(ID, NAME, LOCALE)
(
SELECT t1.ID, 'Placeholder for locale', t1.LOCALE
FROM <YourTable> t1
WHERE
(
SELECT COUNT(*)
FROM <YourTable> t2
WHERE t1.LOCALE = t2.LOCALE
) = 1
)
