array rank and group by in a SQL - google-bigquery

Team
Greetings!
Here is my BigQuery table.
customer  state  questions  response_displayed       datetime
--------------------------------------------------------------------------------
john      NY     q1         answer1,answer2,answer3  22-jan
jack      NJ     q2         answer1,answer2,answer3  23-jan
mario     OH     q3         answer1,answer2          24-jan
john      NY     q4         answer1,answer120        25-jan
jack      NJ     q5         answer2                  26-jan
Here I am trying to count all questions asked by each customer and state combination, and get the most frequent response_displayed. We have to split and unnest response_displayed, then take the count of each response and keep the one with rank 1.
Here is the sample output
customer  state  total_question  frequent_response_dispalyed
john      ny     2               answer1
jack      nj     2               answer2
mario     oh     1               answer1,answer2
I am unable to use unnest(split(response_displayed)) together with GROUP BY.
Any pointers would be appreciated.

Consider the approach below:
select * except(responses),
  array_to_string(array(
    select response from (
      select response, count(*) frequency
      from unnest(responses) response
      group by response
    )
    where true
    qualify rank() over(order by frequency desc) = 1
  ), ',') as frequent_response_dispalyed
from (
  select customer, state, count(*) as total_question,
    array_concat_agg(split(response_displayed)) as responses
  from `project.dataset.table`
  group by customer, state
)
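Applied to the sample data in your question, this should return one row per customer and state, with the total number of questions and the most frequent response (or responses, when several are tied).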
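To make the logic of the query easier to follow outside BigQuery, here is a minimal Python sketch of the same steps: count rows per (customer, state), split and flatten the responses, count each response, and keep every response tied for the highest count (the equivalent of rank() = 1). The function name `summarize` is made up for this illustration, and the tied responses are sorted only to make the output deterministic; the SQL above does not guarantee an order.

```python
from collections import Counter, defaultdict

# Sample rows from the question: (customer, state, question, response_displayed)
rows = [
    ("john", "NY", "q1", "answer1,answer2,answer3"),
    ("jack", "NJ", "q2", "answer1,answer2,answer3"),
    ("mario", "OH", "q3", "answer1,answer2"),
    ("john", "NY", "q4", "answer1,answer120"),
    ("jack", "NJ", "q5", "answer2"),
]


def summarize(rows):
    """Return {(customer, state): (total_question, most_frequent_responses)}."""
    question_counts = Counter()
    response_counts = defaultdict(Counter)
    for customer, state, _question, displayed in rows:
        key = (customer, state)
        question_counts[key] += 1                           # count(*) per group
        response_counts[key].update(displayed.split(","))   # split + unnest + count

    result = {}
    for key, counts in response_counts.items():
        top = max(counts.values())
        # Like rank() = 1: keep every response tied for the highest frequency.
        winners = sorted(r for r, n in counts.items() if n == top)
        result[key] = (question_counts[key], ",".join(winners))
    return result


summary = summarize(rows)
for (customer, state), (total, frequent) in summary.items():
    print(customer, state, total, frequent)
```

Note that mario has two responses with the same count, so both are kept, which matches the expected output in the question.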

Related

How to compute number of friends of friends in a BigQuery table with repeated records?

I have a BigQuery table with the following format:
person  friends.name  friends.year
John    Mary          1977
        Mike          1984
Mary    John          1980
Mike    John          1977
        Jane          1971
I want to compute, for each person, the maximum year in a separate column, and also for each friends record I would like to get the number of friends that each of the friends has (which would be achieved either with a self join, or with a window function).
I am not sure how to write this query, my approach so far has been:
SELECT person,
  ARRAY(SELECT AS STRUCT f.name, f.year FROM UNNEST(friends) f),
  ARRAY_LENGTH(friends) AS number_friends
FROM table
However, this does not compute the number of friends for each array struct value. This is the output I am expecting:
person  friends.name  friends.year  friends.num_friends  max_year
John    Mary          1977          1                    1984
        Mike          1984          2
Mary    John          1980          2                    1980
Mike    John          1977          2                    1977
        Jane          1971          0
How can I write this query in an optimised way?
Consider the approach below:
with friends_count as (
  select person, ifnull(num_friends, 0) num_friends from (
    select distinct name as person
    from your_table, unnest(friends)
  ) left join (
    select person, array_length(friends) num_friends
    from your_table
  ) using(person)
)
select person, array(
    select as struct name, year, ifnull(num_friends, 0) num_friends
    from t.friends join friends_count on name = person
  ) friends,
  (select max(year) from t.friends) max_year
from your_table t
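Applied to the sample data in your question, this should produce the output you expect: each friend carries their own number of friends (0 for someone like Jane who has no row of their own), and each person gets the maximum year across their friends.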
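The same idea can be sketched in plain Python to check the expected numbers: first build a lookup of how many friends each person has, then attach that count to every friend record and compute the maximum year per person. This is only an illustration of the logic, not BigQuery code; the names `table` and `enrich` are made up for the example.

```python
# Sample data from the question: person -> list of (friend name, year)
table = {
    "John": [("Mary", 1977), ("Mike", 1984)],
    "Mary": [("John", 1980)],
    "Mike": [("John", 1977), ("Jane", 1971)],
}


def enrich(table):
    """Attach each friend's own friend count and the max year per person."""
    # Friend count per person, like array_length(friends) in the query.
    num_friends = {person: len(friends) for person, friends in table.items()}
    rows = []
    for person, friends in table.items():
        # People who only ever appear as a friend (Jane) default to 0,
        # mirroring the left join + ifnull in the query.
        detailed = [(name, year, num_friends.get(name, 0)) for name, year in friends]
        max_year = max(year for _name, year in friends)
        rows.append((person, detailed, max_year))
    return rows


result = enrich(table)
for person, friends, max_year in result:
    print(person, friends, max_year)
```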

Advanced Sql query solution required

player        team           start_date  end_date    points
John Jacob    SportsBallers  2015-01-01  2015-03-31  100
John Jacob    SportsKings    2015-04-01  2015-12-01  115
Joe Smith     PointScorers   2014-01-01  2016-12-31  125
Bill Johnson  SportsKings    2015-01-01  2015-06-31  175
Bill Johnson  AllStarTeam    2015-07-01  2016-12-31  200
The above table has many more rows. I was asked the questions below in an interview.
1.) For each player, which team did they play for on 2015-01-01?
I could not answer this one.
2.) For each player, how can we get the team for which they scored the most points?
select team from Players
where points in (select max(points) from players group by player)
Please provide solutions for both.
1
select *
from PlayerTeams
where start_date <= '2015-01-01' and end_date >= '2015-01-01'
2
select player, team, points
from (
  -- rn = 1 marks the highest-scoring row for each player
  select *, row_number() over (partition by player order by points desc) as rn
  from PlayerTeams
) as ranked
where rn = 1
For #1:
Select Player
      ,Team
From table
Where '2015-01-01' between start_date and end_date
For #2:
select t.Player
      ,t.Team
from table t
inner join (select Player
                  ,Max(points) as points
            from table
            group by Player) m
  on t.Player = m.Player
 and t.points = m.points
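Both answers can be tried end to end with a small Python sketch. It uses SQLite (via the standard sqlite3 module) as a stand-in for the unspecified database, so it assumes a SQLite build with window functions (3.25 or later). The table name `player_teams` is hypothetical, dates are stored as ISO text so they compare correctly as strings, and the invalid sample date 2015-06-31 is entered as 2015-06-30.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE player_teams ("
    "player TEXT, team TEXT, start_date TEXT, end_date TEXT, points INTEGER)"
)
conn.executemany(
    "INSERT INTO player_teams VALUES (?, ?, ?, ?, ?)",
    [
        ("John Jacob", "SportsBallers", "2015-01-01", "2015-03-31", 100),
        ("John Jacob", "SportsKings", "2015-04-01", "2015-12-01", 115),
        ("Joe Smith", "PointScorers", "2014-01-01", "2016-12-31", 125),
        # The question lists 2015-06-31, which is not a real date.
        ("Bill Johnson", "SportsKings", "2015-01-01", "2015-06-30", 175),
        ("Bill Johnson", "AllStarTeam", "2015-07-01", "2016-12-31", 200),
    ],
)

# Question 1: the team whose date range contains 2015-01-01.
team_on_date = conn.execute(
    """
    SELECT player, team
    FROM player_teams
    WHERE '2015-01-01' BETWEEN start_date AND end_date
    ORDER BY player
    """
).fetchall()

# Question 2: the team with the highest points per player.
top_team = conn.execute(
    """
    SELECT player, team, points
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY player ORDER BY points DESC) AS rn
        FROM player_teams
    )
    WHERE rn = 1
    ORDER BY player
    """
).fetchall()

print(team_on_date)
print(top_team)
```

ROW_NUMBER returns exactly one row per player even if two teams tie on points; the join on MAX(points) would return every tied row instead.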

Issue with returning distinct records based on single column (Oracle)

If I have the table "members" (shown below), how would I go about getting the record of the first occurrence of each membership_id in Oracle?
Expected results
123  John     Doe        A  P
313  Michael  Casey      A  A
113  Luke     Skywalker  A  P
Table - members
membership_id  first_name  last_name  status  type
123            John        Doe        A       P
313            Michael     Casey      A       A
113            Luke        Skywalker  A       P
123            Bob         Dole       A       A
313            Lucas       Smith      A       A
SELECT membership_id,
       first_name,
       last_name,
       status,
       type
  FROM( SELECT membership_id,
               first_name,
               last_name,
               status,
               type,
               rank() over (partition by membership_id
                                order by type desc) rnk
          FROM members )
 WHERE rnk = 1
This works when each membership_id has a single row with the highest type. If you can have ties, that is, multiple rows with the same membership_id and the same maximum type (in your sample data, membership_id 313 has two rows of type A), this query will return all of those rows. If you only want one row per membership_id, either add further criteria to the order by so that every tie is broken, or use the row_number function instead of rank, which breaks ties arbitrarily.
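The difference between rank and row_number on tied rows can be seen with a short Python sketch. It runs against SQLite rather than Oracle (an assumption made so the example is self-contained; it needs SQLite 3.25 or later for window functions), and it uses SQLite's rowid as the tie-breaker to stand in for "first occurrence", which is itself an assumption about what "first" means.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE members ("
    "membership_id INTEGER, first_name TEXT, last_name TEXT, status TEXT, type TEXT)"
)
conn.executemany(
    "INSERT INTO members VALUES (?, ?, ?, ?, ?)",
    [
        (123, "John", "Doe", "A", "P"),
        (313, "Michael", "Casey", "A", "A"),
        (113, "Luke", "Skywalker", "A", "P"),
        (123, "Bob", "Dole", "A", "A"),
        (313, "Lucas", "Smith", "A", "A"),
    ],
)

# rank(): both rows for 313 share type 'A', so both get rank 1.
with_rank = conn.execute(
    """
    SELECT membership_id, first_name, last_name
    FROM (
        SELECT m.*, RANK() OVER (PARTITION BY membership_id ORDER BY type DESC) AS rnk
        FROM members m
    )
    WHERE rnk = 1
    ORDER BY membership_id, first_name
    """
).fetchall()

# row_number() with rowid as an explicit tie-breaker: exactly one row per id.
with_row_number = conn.execute(
    """
    SELECT membership_id, first_name, last_name
    FROM (
        SELECT m.*, ROW_NUMBER() OVER (
            PARTITION BY membership_id ORDER BY type DESC, rowid
        ) AS rn
        FROM members m
    )
    WHERE rn = 1
    ORDER BY membership_id
    """
).fetchall()

print(len(with_rank), with_rank)
print(len(with_row_number), with_row_number)
```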
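-- Oracle has no first() aggregate and does not accept AS before a table alias,
-- so KEEP (DENSE_RANK FIRST ...) is used here, taking rowid order as "first".
Select A.*
FROM Members A inner join
  (Select membership_id,
          min(first_name) keep (dense_rank first order by rowid) AS FN,
          min(last_name) keep (dense_rank first order by rowid) AS LN
   From Members
   Group by membership_id) B
ON A.membership_id=B.membership_id and A.first_name=B.FN and A.last_name=B.LN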
Hope that helps!
select *
from members
where rowid in (
  select min(rowid)
  from members
  group by membership_id
)

Retrieve highest value from sql table

Given this data:
Name     Title      Profit
Peter    CEO        2
Robert   A.D        3
Michael  Vice       5
Peter    CEO        4
Robert   Admin      5
Robert   CEO        13
Adrin    Promotion  8
Michael  Vice       21
Peter    CEO        3
Robert   Admin      15
how can I get this:
Peter........4
Robert.......15
Michael......21
Adrin........8
I want to get the highest profit value for each name.
If a name appears multiple times, always take the highest value.
select name,max(profit) from table group by name
Since this type of request almost always follows with "now can I include the title?" - here is a query that gets the highest profit for each name but can include all the other columns without grouping or applying arbitrary aggregates to those other columns:
;WITH x AS
(
  SELECT Name, Title, Profit, rn = ROW_NUMBER()
    OVER (PARTITION BY Name ORDER BY Profit DESC)
  FROM dbo.table
)
SELECT Name, Title, Profit
FROM x
WHERE rn = 1;
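Both queries can be checked against the sample data with a small Python sketch. SQLite stands in for SQL Server here (an assumption; window functions need SQLite 3.25 or later), the table name `profits` is hypothetical, and the T-SQL form `rn = ROW_NUMBER()` is written as `ROW_NUMBER() ... AS rn`, since SQLite would read the former as a comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profits (name TEXT, title TEXT, profit INTEGER)")
conn.executemany(
    "INSERT INTO profits VALUES (?, ?, ?)",
    [
        ("Peter", "CEO", 2), ("Robert", "A.D", 3), ("Michael", "Vice", 5),
        ("Peter", "CEO", 4), ("Robert", "Admin", 5), ("Robert", "CEO", 13),
        ("Adrin", "Promotion", 8), ("Michael", "Vice", 21), ("Peter", "CEO", 3),
        ("Robert", "Admin", 15),
    ],
)

# Plain aggregate: highest profit per name, no other columns.
max_per_name = conn.execute(
    "SELECT name, MAX(profit) FROM profits GROUP BY name ORDER BY name"
).fetchall()

# ROW_NUMBER variant: keeps the title of the row that holds the highest profit.
top_rows = conn.execute(
    """
    WITH x AS (
        SELECT name, title, profit,
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY profit DESC) AS rn
        FROM profits
    )
    SELECT name, title, profit
    FROM x
    WHERE rn = 1
    ORDER BY name
    """
).fetchall()

print(max_per_name)
print(top_rows)
```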

How to produce detail, not summary, report sorted by count(*)?

Oracle 11g:
I want the results listed by highest count, then by ch_id. When I use group by to get the count, I lose the granularity of the detail. Is there an analytic function I could use?
SALES
ch_id  desc     customer
=========================
ANAR   Anari    BOB
SWIS   Swiss    JOE
SWIS   Swiss    AMY
BRUN   Brunost  SAM
BRUN   Brunost  ANN
BRUN   Brunost  ROB
Desired Results
count  ch_id  customer
===========================================
3      BRUN   ANN
3      BRUN   ROB
3      BRUN   SAM
2      SWIS   AMY
2      SWIS   JOE
1      ANAR   BOB
Use the analytic count(*):
select * from
(
  select count(*) over (partition by ch_id) cnt,
         ch_id, customer
  from sales
)
order by cnt desc, ch_id, customer
select b.total, s.ch_id, s.customer
from sales s
inner join (select count(*) total, ch_id from sales group by ch_id) b
  on b.ch_id = s.ch_id
order by b.total desc, s.ch_id
OK, the other answer posted at the same time, which uses partition, is the better solution for Oracle. But this one works regardless of the database.
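As a rough check that the analytic and the join approach give the same detail rows, here is a Python sketch that runs both against the sample data. It uses SQLite instead of Oracle 11g (an assumption for the sake of a self-contained example; SQLite 3.25 or later is needed for the window function), and it leaves out the desc column because DESC is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (ch_id TEXT, customer TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("ANAR", "BOB"), ("SWIS", "JOE"), ("SWIS", "AMY"),
        ("BRUN", "SAM"), ("BRUN", "ANN"), ("BRUN", "ROB"),
    ],
)

# Analytic count: every detail row carries the size of its ch_id group.
analytic = conn.execute(
    """
    SELECT cnt, ch_id, customer
    FROM (
        SELECT COUNT(*) OVER (PARTITION BY ch_id) AS cnt, ch_id, customer
        FROM sales
    )
    ORDER BY cnt DESC, ch_id, customer
    """
).fetchall()

# Portable alternative: join the detail rows to a grouped subquery.
joined = conn.execute(
    """
    SELECT b.total, s.ch_id, s.customer
    FROM sales s
    INNER JOIN (SELECT COUNT(*) AS total, ch_id FROM sales GROUP BY ch_id) b
        ON b.ch_id = s.ch_id
    ORDER BY b.total DESC, s.ch_id, s.customer
    """
).fetchall()

for row in analytic:
    print(row)
```

Sorting by customer as the last key is an addition here so that the rows come out in the same order as the desired results in the question.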