BigQuery unnest percentile struct to a table with columns - google-bigquery

I have a table that looks like this:
id city price
1 Bacelona 300
2 Barcelona 200
3 London 1000
4 London 2000
I want to create a table with percentiles per city.
I am creating a struct with the query below:
SELECT AS STRUCT City, APPROX_QUANTILES(CAST(Price as INTEGER), 100) AS quants
FROM T
where Price > 0
group by City
How can I flatten (unnest) the struct into a table with the columns
City, Offset, Quant
that includes all percentiles for all available cities in the source table T?
I will appreciate your help
Thank you

First, I generate your table with T as ().
Then I use your query to obtain the quantiles (only 10 here, to keep the example readable; APPROX_QUANTILES with 10 returns 11 values per city, the minimum plus 10 boundaries). The array is then unnested, and row_number gives each value an index within its city. Pivot puts each index in its own column. The column numbers have to be written by hand (1 to 101 for 100 quantiles).
With T as (
Select 1 id, "Bacelona" city, 300 price UNION ALL
select 2,"Barcelona", 200 UNION ALL
select 3,"London", 1000 UNION ALL
select 4,"London", 2000
)
SELECT *
from(
-- pos keeps the array order so that row_number is deterministic
SELECT city,quant,row_number() over (partition by city order by pos) as r
from (
SELECT AS STRUCT City, APPROX_QUANTILES(CAST(Price as INTEGER), 10) AS quants
FROM T
where Price > 0
group by City
), unnest(quants) as quant with offset as pos
)
-- 10 quantiles give 11 values, so 11 columns
pivot( min(quant) for r in ( 1,2,3,4,5,6,7,8,9,10,11 ) )
If the pivoted columns should have descriptive names instead of numbers:
SELECT city,quant,concat("quant",cast(row_number() over (partition by city) as string)) as r
...
pivot( min(quant) for r in ( "quant1","quant2","quant3") )
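The reshaping in this answer can also be sketched outside BigQuery. The snippet below is a plain Python illustration, not BigQuery code: the `quants` dictionary holds made-up stand-in arrays for what APPROX_QUANTILES would return, and `unnest_with_offset` and `pivot` are hypothetical helper names. It shows the long form (city, offset, quant) that the question asks for and the wide form that the pivot produces.

```python
from collections import defaultdict

# Stand-in for the per-city arrays returned by APPROX_QUANTILES (made-up values).
quants = {
    "Barcelona": [200, 200, 300],
    "London": [1000, 1000, 2000],
}

def unnest_with_offset(arrays):
    """Long form: one (city, offset, quant) row per array element."""
    return [
        (city, offset, quant)
        for city, values in arrays.items()
        for offset, quant in enumerate(values)
    ]

def pivot(rows):
    """Wide form: one dict per city with one column per offset."""
    wide = defaultdict(dict)
    for city, offset, quant in rows:
        wide[city][f"quant{offset}"] = quant
    return dict(wide)

long_rows = unnest_with_offset(quants)
wide_rows = pivot(long_rows)
print(long_rows)
print(wide_rows)
```

If only the long form is needed, the pivot step can be skipped entirely; in BigQuery that would presumably be the unnest step alone, with WITH OFFSET supplying the Offset column.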

Related

SQL find the 2 highest score for each country

I have two tables: maps_query and map_time like below:
CREATE TABLE maps_query (
id int,
day varchar,
search_query varchar,
country varchar,
query_score int
);
CREATE TABLE map_time (
id int,
am_pm varchar
);
The question is to find the 2 highest scores for each country. The desired output looks like this:
country search_query query_score
CA Target 1000
CA Store 900
FR Eiffiel 1500
FR London 800
I was trying to use row_number() over but don't know how to complete the query.
Select t1.country, t1.search_query, t1.score_rank,
from (select *, (row_number() over(partition by country order by query_score) as score_rank from maps_search) t1
where score_rank = 1 and score_rank=2
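This can be achieved with rank() instead of row_number(). Note that rank() keeps ties, so a country can return more than two rows; row_number() with the same ordering returns exactly two.
select
*
from
(
select
*,
-- alias the window column so the outer filter can refer to it
rank() over (
PARTITION by country
order by
query_score desc
) as score_rank
from
maps_query
) q
where
score_rank <= 2;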
A good reference article: https://spin.atomicobject.com/2016/03/12/select-top-n-per-group-postgresql/
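To see the difference between rank() and row_number() on tied scores, here is a small sketch run through Python's built-in sqlite3 module (assuming a bundled SQLite of 3.25 or later, which is needed for window functions). The 'Mall' and 'Paris' rows are made up to create a tie, and `top_two` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE maps_query (country TEXT, search_query TEXT, query_score INT)"
)
conn.executemany(
    "INSERT INTO maps_query VALUES (?, ?, ?)",
    [
        ("CA", "Target", 1000),
        ("CA", "Store", 900),
        ("CA", "Mall", 900),    # made-up row: ties with Store for second place
        ("FR", "Eiffiel", 1500),
        ("FR", "London", 800),
        ("FR", "Paris", 700),   # made-up row: third place, should be filtered out
    ],
)

def top_two(window_function):
    """Return the rows whose per-country rank is 1 or 2."""
    sql = f"""
        SELECT country, search_query, query_score
        FROM (
            SELECT *,
                   {window_function}() OVER (
                       PARTITION BY country
                       ORDER BY query_score DESC
                   ) AS score_rank
            FROM maps_query
        ) AS q
        WHERE score_rank <= 2
        ORDER BY country, query_score DESC, search_query
    """
    return conn.execute(sql).fetchall()

with_rank = top_two("rank")              # keeps both rows tied for second
with_row_number = top_two("row_number")  # exactly two rows per country
print(with_rank)
print(with_row_number)
```

With row_number(), which of the tied rows survives is arbitrary unless the ORDER BY inside the window includes a tie-breaker.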

Select row with max value from each group in Oracle SQL [duplicate]

I have table people containing people, their city and their money balance:
id city_id money
1 1 25
2 1 13
3 2 97
4 2 102
5 2 37
Now, I would like to select richest person from each city. How can I do that using Oracle SQL? Desired result is:
id city_id money
1 1 25
4 2 102
Something like that would be useful:
SELECT * as tmp FROM people GROUP BY city_id HAVING money = MAX(money)
You should be thinking "filtering", not "aggregation", because you want the entire row. You can use a subquery:
select p.*
from people p
where p.money = (select max(p2.money) from people p2 where p2.city_id = p.city_id);
You can use the DENSE_RANK() analytic function, partitioning by city_id and ordering by money in descending order within the subquery, and then keep only the rows ranked 1 in the main query. This returns the richest person in each city, including ties (people with the same amount of money in a city):
SELECT id, city_id, money
FROM( SELECT p.*,
DENSE_RANK() OVER ( PARTITION BY city_id ORDER BY money DESC ) AS dr
FROM people p )
WHERE dr = 1
You can use RANK(), which is flexible: depending on the filter, the same query returns the richest person or the top N richest per city.
SELECT
id, city_id, money
FROM (
SELECT
p.* ,RANK() OVER (PARTITION BY city_id ORDER BY money DESC ) as rank_per_city
FROM
people p )
WHERE
rank_per_city = 1
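As a quick check of the answers above, here is a minimal sketch that runs the correlated-subquery and DENSE_RANK() variants against the question's sample data. It uses Python's built-in sqlite3 module rather than Oracle (an assumption made only so the example is self-contained; it needs SQLite 3.25 or later for window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INT, city_id INT, money INT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, 1, 25), (2, 1, 13), (3, 2, 97), (4, 2, 102), (5, 2, 37)],
)

# Filtering with a correlated subquery: keep rows whose money is the city maximum.
subquery_rows = conn.execute("""
    SELECT p.id, p.city_id, p.money
    FROM people p
    WHERE p.money = (SELECT MAX(p2.money)
                     FROM people p2
                     WHERE p2.city_id = p.city_id)
    ORDER BY p.city_id
""").fetchall()

# Ranking with a window function: keep rows ranked first within their city.
ranked_rows = conn.execute("""
    SELECT id, city_id, money
    FROM (
        SELECT p.*,
               DENSE_RANK() OVER (PARTITION BY city_id ORDER BY money DESC) AS dr
        FROM people p
    )
    WHERE dr = 1
    ORDER BY city_id
""").fetchall()

print(subquery_rows)
print(ranked_rows)
```

Both variants return every person tied for the maximum in a city; on this data there are no ties, so each city yields one row.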

SQL change the specific row

In the table below, I want to change the position of a specific row.
For example,
ID Name Count
1 X 50
2 Y 30
3 other 25
4 Z 20
It is ordered by Count descending, and I would like to see X, Y, Z in order, with 'other' last. Also, 'other' should hold the total; in other words, its count should be 125.
You can use union all to append the 'other' row at the end.
Something like this:
select id, name, count from table where name <> 'other'
union all
-- id 5 so that 'other' sorts after Z (id 4)
select 5 as id, 'other' as name, 125 as count
order by 1
or, if you want to sum it:
select id, name, count from table where name <> 'other'
union all
select 5 as id, 'other' as name, sum(count) as count from table
order by 1
You can put some logic in the order by clause:
select id, name, count
from table
order by case when id <> 3 then 1 else 2 end, id
This way, the first ordering criterion is "rows X, Y, Z first, then 'other'"; within each group you then order the rows the way you want, and in your case either id or name will work.
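Try this:
SELECT ID,
Name,
CASE
WHEN Name = 'other' THEN (SELECT SUM(Count) FROM YOUR_TABLE)
ELSE SUM(Count)
END AS Count
FROM YOUR_TABLE
GROUP BY ID, Name
-- 'other' last, the remaining names alphabetically
ORDER BY CASE WHEN Name = 'other' THEN 1 ELSE 0 END, Name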
I think union all may be the simplest approach, but like this:
select id, name, count
from ((select id, name, count, 1 as ord
from t
where name in ('X', 'Y', 'Z')
) union all
(select 4, 'other', sum(count), 2 as ord
from t
)
) t
order by ord, name;
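Here is a minimal runnable sketch of the union all approach on the question's data, using Python's built-in sqlite3 module (the table name t and the ord helper column follow the last answer; using SQLite is an assumption made for the sake of a self-contained example).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INT, name TEXT, count INT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "X", 50), (2, "Y", 30), (3, "other", 25), (4, "Z", 20)],
)

rows = conn.execute("""
    SELECT name, count
    FROM (
        -- regular rows keep their own count and sort first (ord = 1)
        SELECT name, count, 1 AS ord FROM t WHERE name <> 'other'
        UNION ALL
        -- 'other' carries the grand total and sorts last (ord = 2)
        SELECT 'other', SUM(count), 2 AS ord FROM t
    ) AS u
    ORDER BY ord, name
""").fetchall()

print(rows)
```

The ord column does the placement; the total of 125 comes from summing all four rows, including the original 'other' row.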

Aggregation and Total

SELECT Region ,
flag ,
Name,
COUNT(ID) AS 'CountWithFlag'
FROM Table
GROUP BY flag
This query gives me the following results. I am grouping by flag, and I get the counts for English/non-English based on flag. I also want to display the total counts of English and non-English next to those counts.
OUTPUT:
Region Flag Name CountWithFlag
a 0 English 100
b 1 Non-English 200
c 0 English 100
d 1 Non-English 200
DESIRED OUTPUT:
Region Flag Name CountWithFlag Total
a 0 English 100 200
b 1 Non-English 200 400
c 0 English 100 200
d 1 Non-English 200 400
How can I do that? I want to apply group by for specific counts with flag. But I also want to get total counts in same query!
Any inputs on how I can do that?
Another way would be something like this:
;
WITH agg1
AS (
SELECT region,
flag,
name,
COUNT(id) AS 'CountWithFlag'
FROM [dbo].[t2]
GROUP BY region,
flag,
name
),
agg2
AS (
SELECT [name],
COUNT(id) AS CountByName
FROM [dbo].[t2]
GROUP BY [name]
)
SELECT [agg1].[region],
[agg1].[flag],
[agg1].[name],
[agg1].[CountWithFlag],
[agg2].[CountByName]
FROM [agg1]
INNER JOIN [agg2]
ON [agg2].[name] = [agg1].[name]
try this
;
WITH cte
AS ( SELECT DISTINCT
Region ,
flag ,
Name ,
COUNT(ID) OVER ( PARTITION BY flag, Region, Name ) AS [CountWithFlag]
FROM [Table]
)
SELECT Region ,
flag ,
Name ,
[CountWithFlag] ,
SUM([CountWithFlag]) OVER ( PARTITION BY Name ) AS Total
FROM cte
If you want to avoid using window functions, you can do this:
SELECT
Region,
flag,
Name,
COUNT(ID) AS CountWithFlag,
(select count(ID) from Table as tbl1 where tbl1.Name=tbl.Name) AS Total
from Table as tbl
group by Region, flag, Name
But in my opinion, window aggregation should be much faster.
If you want to use window aggregation, do this:
select
Region,
flag,
Name,
CountWithFlag,
sum(CountWithFlag) over(partition by Name) as Total
from (
SELECT
Region,
flag,
Name,
COUNT(ID) AS CountWithFlag
from Table as tbl
group by Region, flag, Name
) as tbl
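The group-then-window pattern from the last query can be tried on a small made-up data set. This is a sketch using Python's built-in sqlite3 module (SQLite 3.25 or later for window functions); the table name t2 is borrowed from an earlier answer, and the rows are hypothetical, chosen so the counts are easy to verify by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t2 (id INTEGER PRIMARY KEY, region TEXT, flag INT, name TEXT)"
)
# Hypothetical rows: 3 English (2 in region a, 1 in c), 4 Non-English (3 in b, 1 in d).
rows_in = (
    [("a", 0, "English")] * 2
    + [("c", 0, "English")] * 1
    + [("b", 1, "Non-English")] * 3
    + [("d", 1, "Non-English")] * 1
)
conn.executemany("INSERT INTO t2 (region, flag, name) VALUES (?, ?, ?)", rows_in)

result = conn.execute("""
    SELECT region, flag, name, CountWithFlag,
           SUM(CountWithFlag) OVER (PARTITION BY name) AS Total
    FROM (
        -- step 1: one row per region/flag/name with its own count
        SELECT region, flag, name, COUNT(id) AS CountWithFlag
        FROM t2
        GROUP BY region, flag, name
    ) AS grouped
    ORDER BY region
""").fetchall()

for row in result:
    print(row)
```

The inner query does the grouping, and the outer window sum adds the per-name total without collapsing the rows again.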

SQL query to return only 1 record per group ID

I'm looking for a way to handle the following scenario. I have a database table from which I need to return only one record for each "group id"; furthermore, the record selected within each group should be the oldest person in the household.
ID Group ID Name Age
1 134 John Bowers 37
2 134 Kerri Bowers 33
3 135 John Bowers 44
4 135 Shannon Bowers 42
So in the sample data provided above I would need ID 1 and 3 returned, as they are the oldest people within each group id.
This is being queried against a SQL Server 2005 database.
SELECT t.*
FROM (
SELECT DISTINCT groupid
FROM mytable
) mo
CROSS APPLY
(
SELECT TOP 1 *
FROM mytable mi
WHERE mi.groupid = mo.groupid
ORDER BY
age DESC
) t
or this:
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY groupid ORDER BY age DESC) rn
FROM mytable
) x
WHERE x.rn = 1
This will return at most one record per group even in case of ties.
See this article in my blog for performance comparisons of both methods:
SQL Server: Selecting records holding group-wise maximum
Use:
SELECT DISTINCT
t.groupid,
t.name
FROM TABLE t
JOIN (SELECT t.groupid,
MAX(t.age) 'max_age'
FROM TABLE t
GROUP BY t.groupid) x ON x.groupid = t.groupid
AND x.max_age = t.age
So what if there's 2+ people with the same age for a group? It'd be better to store the birthdate rather than age - you can always calculate the age for presentation.
Try this (assuming Group is synonym for Household)
Select * From Table t
Where Age = (Select Max(Age)
From Table
Where GroupId = t.GroupId)
If there are two or more "oldest" people in some household (they are all the same age and there is no one older), then this will return all of them, not just one at random.
If this is an issue, then you need to add another subquery to return an arbitrary key value for one person in that set.
Select * From Table t
Where Id =
(Select Max(Id) From Table
Where GroupId = t.GroupId
And Age =
(Select Max(Age) From Table
Where GroupId = t.GroupId))
SELECT GroupID, Name, Age
FROM table
INNER JOIN
(
SELECT GroupID, MAX(Age) AS OLDEST
FROM table
) AS OLDESTPEOPLE
ON
table.GroupID = OLDESTPEOPLE.GroupID
AND
table.Age = OLDESTPEOPLE.OLDEST
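The subquery in the query above needs a GROUP BY to return one maximum per group, and GroupID in the select list has to be qualified because both sides of the join have that column:
SELECT table.GroupID, Name, Age
FROM table
INNER JOIN
(
SELECT GroupID, MAX(Age) AS OLDEST
FROM table
GROUP BY GroupID
) AS OLDESTPEOPLE
ON
table.GroupID = OLDESTPEOPLE.GroupID
AND
table.Age = OLDESTPEOPLE.OLDEST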
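The tie behaviour discussed in this thread can be checked with a short sketch. It uses Python's built-in sqlite3 module instead of SQL Server 2005 (an assumption made so the example is self-contained; SQLite 3.25 or later is needed for ROW_NUMBER). Shannon's age is changed from 42 to 44 here, purely to create a tie within group 135.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INT, groupid INT, name TEXT, age INT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (1, 134, "John Bowers", 37),
        (2, 134, "Kerri Bowers", 33),
        (3, 135, "John Bowers", 44),
        (4, 135, "Shannon Bowers", 44),  # changed from 42 to create a tie
    ],
)

# Join against MAX(age) per group: returns every person tied for oldest.
max_join_ids = [r[0] for r in conn.execute("""
    SELECT t.id
    FROM mytable t
    JOIN (SELECT groupid, MAX(age) AS max_age
          FROM mytable
          GROUP BY groupid) x
      ON x.groupid = t.groupid AND x.max_age = t.age
    ORDER BY t.id
""")]

# ROW_NUMBER with id as tie-breaker: exactly one row per group.
row_number_ids = [r[0] for r in conn.execute("""
    SELECT id
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY groupid
                                  ORDER BY age DESC, id) AS rn
        FROM mytable
    ) x
    WHERE rn = 1
    ORDER BY id
""")]

print(max_join_ids)
print(row_number_ids)
```

Adding id to the window's ORDER BY makes the choice among tied rows deterministic (lowest id wins here); without it, the row picked for a tied group is arbitrary.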