How to compute number of friends of friends in a BigQuery table with repeated records? - sql

I have a BigQuery table with the following format:
person
friends.name
friends.year
John
Mary
1977
Mike
1984
Mary
John
1980
Mike
John
1977
Jane
1971
I want to compute, for each person, the maximum year in a separate column, and also for each friends record I would like to get the number of friends that each of the friends has (which would be achieved either with a self join, or with a window function).
I am not sure how to write this query, my approach so far has been:
SELECT person,
ARRAY(SELECT AS STRUCT f.name, f.year FROM UNNEST (Friends) f),
ARRAY_LENGTH(friends) AS number_friends
FROM table
However, this does not compute the number of friends for each array struct value. This is the output I am expecting:
person
friends.name
friends.year
friends.num_friends
max_year
John
Mary
1977
1
1984
Mike
1984
2
Mary
John
1980
2
1980
Mike
John
1977
2
1977
Jane
1971
0
How can I write this query in an optimised way?

Consider below approach
with friends_count as (
select person, ifnull(num_friends, 0) num_friends from (
select distinct name as person
from your_table, unnest(friends)
) left join (
select person, array_length(friends) num_friends
from your_table
) using(person)
)
select person, array(
select as struct name, year, ifnull(num_friends, 0) num_friends
from t.friends join friends_count on name = person
) friends,
(select max(year) from t.friends) max_year
from your_table t
if applied to sample data in your question - output is

Related

array rank and group by in a SQL

Team
Greetings!
Here is my BigQuery table.
customer state questions response_displayed datetime
--------------------------------------------------------------------------------
john NY q1 answer1,answer2,answer3 22-jan
jack NJ q2 answer1,answer2,answer3 23-jan
mario OH q3 answer1,answer2 24-jan
john NY q4 answer1,answer120 25-jan
jack NJ q5 answer2 26-jan
Here I am trying to sum all questions asked by a customer, state combination, and get the most frequent response_displayed. we have to split & unnest response_dispalyed and then take the count of each response and get the 1st rank.
Here is the sample output
customer state total_question frequent_response_dispalyed
john ny 2 answer1
jack nj 2 answer2
mario oh 1 answer1,answer2
I am unable to use the unnest(split(response_dispalyed)) along with the group by function.
Any pointers would be appreciated.
Consider below approach
select * except(responses),
array_to_string(array(
select response from (
select response, count(*) frequency
from unnest(responses) response
group by response
)
where true
qualify rank() over(order by frequency desc) = 1
), ',') as frequent_response_dispalyed
from (
select customer, state, count(*) as total_question,
array_concat_agg(split(response_displayed)) as responses
from `project.dataset.table`
group by customer, state
)
if applied to sample data in your question - output is

SQL query to get only rows match the condition based on two separated columns under one 'group by'

The simple SELECT query would return the data as below:
Select ID, User, Country, TimeLogged from Data
ID User Country TimeLogged
1 Samantha SCO 10
1 John UK 5
1 Andrew NZL 15
2 John UK 20
3 Mark UK 10
3 Mark UK 20
3 Steven UK 10
3 Andrew NZL 15
3 Sharon IRL 5
4 Andrew NZL 25
4 Michael AUS 5
5 Jessica USA 30
I would like to return a sum of time logged for each user grouped by ID
But for only ID numbers where both of these values Country = UK and User = Andrew are included within their rows.
So the output in the above example would be
ID User Country TimeLogged
1 John UK 5
1 Andrew NZL 15
3 Mark UK 30
3 Steven UK 10
3 Andrew NZL 15
First you need to identify which IDs you're going to be returning
SELECT ID FROM MyTable WHERE Country='UK'
INTERSECT
SELECT ID FROM MyTable WHERE [User]='Andrew';
and based on that, you can then filter to aggregate the expected rows.
SELECT ID,
[User],
Country,
SUM(Timelogged) as Timelogged
FROM mytable
WHERE (Country='UK' OR [User]='Andrew')
AND ID IN( SELECT ID FROM MyTable WHERE Country='UK'
INTERSECT
SELECT ID FROM MyTable WHERE [User]='Andrew')
GROUP BY ID, [User], country;
So, you have described what you need to write almost perfectly but not quite. Your result table indicates that you want Country = UK OR User = Andrew, rather than AND
You need to select and group by, then include a WHERE:-
Select ID, User, Country, SUM(Timelogged) as Timelogged from mytable
WHERE Country='UK' OR User='Andrew'
Group by ID, user, country

Issue with returning distinct records based on single column (Oracle)

If I have the table "members" (shown below), how would I go about getting the record of the first occurrence of a membership_id (Oracle).
Expected results
123 John Doe A P
313 Michael Casey A A
113 Luke Skywalker A P
Table - members
membership_id first_name last_name status type
123 John Doe A P
313 Michael Casey A A
113 Luke Skywalker A P
123 Bob Dole A A
313 Lucas Smith A A
SELECT membership_id,
first_name,
last_name,
status,
type
FROM( SELECT membership_id,
first_name,
last_name,
status,
type,
rank() over (partition by membership_id
order by type desc) rnk
FROM members )
WHERE rnk = 1
will work for your sample data set. If you can have ties-- that is, multiple rows with the same membership_id and the same maximum type-- this query will return all those rows. If you only want to return one of the rows where there is a tie, you would either need to add additional criteria to the order by to ensure that all ties are broken or you would need to use the row_number function rather than rank which will arbitrarily break ties.
Select A.*
FROM Members AS A inner join
(Select membership_id, first(first_name) AS FN, first(last_name) AS LN
From Members
Group by membership_id) AS B
ON A.membership_id=B.membership_id and A.first_name=B.FN and A.last_name=B.LN
Hope that helps!
select *
from members
where rowid in (
select min(rowid)
from members
group by membership_id
)

How to produce detail, not summary, report sorted by count(*)?

Oracle 11g:
I want results to list by highest count, then ch_id. When I use group by to get the count then I loose the granularity of the detail. Is there an analytic function I could use?
SALES
ch_id desc customer
=========================
ANAR Anari BOB
SWIS Swiss JOE
SWIS Swiss AMY
BRUN Brunost SAM
BRUN Brunost ANN
BRUN Brunost ROB
Desired Results
count ch_id customer
===========================================
3 BRUN ANN
3 BRUN ROB
3 BRUN SAM
2 SWIS AMY
2 SWIS JOE
1 ANAR BOB
Use the analytic count(*):
select * from
(
select count(*) over (partition by ch_id) cnt,
ch_id, customer
from sales
)
order by cnt desc
select total, ch_id, customer
from sales s
inner join (select count(*) total, ch_id from sales group by ch_id) b
on b.ch_id = s.chi_id
order by total, ch_id
ok - the other post that happened at the same time, using partition, is the better solution for Oracle. But this one works regardless of DB.

Select query which returns exect no of rows as compare in values of sub query

I have got a table named student. I have written this query:
select * From student where sname in ('rajesh','rohit','rajesh')
In the above query it's returning me two records; one matching 'rajesh' and another matching: 'rohit'.
But i want there to be 3 records: 2 for 'rajesh' and 1 for 'rohit'.
Please provide me some solution or tell me where i am missing.
NOTE: the count of result of sub query is not fix there can be many words there some distinct and some multiple occurrence .
Thanks
Your requirements are not clear, and I'll try to explain why.
Let's define table students
ID FirstName LastName
1 John Smith
2 Mike Smith
3 Ben Bray
4 John Bray
5 John Smith
6 Bill Lynch
7 Bill Smith
Query with WHERE clause:
FirstName in ('Mike', 'Ben', 'Mike')
will return 2 rows only, because it could be rewritten as:
FirstName='Mike' or FirstName='Ben' or FirstName='Mike'
WHERE is filtering clause that just says if existing row satisfy given conditions or not (for each of rows created by FROM clause.
Let's say we have subquery that returns any number of non distinct FirstNames
In case if SQ contains 'Mike', 'Ben', 'Mike' using inner join you can get those 3 rows without problem
Select ST.* from Students ST
Inner Join (Select name from …. <your subquery>) SQ
On ST.FirstName=SQ.name
Result will be:
ID FirstName LastName
2 Mike Smith
2 Mike Smith
3 Ben Bray
Note data are not ordered by order of names returning by SQ. If you want that, SQ should return some ordering number, eg.:
Ord Name
1. Mike
2. Ben
3. Mike
In that case query should be:
Select ST.* from Students ST
Inner Join (Select ord, name from …. <your subquery>) SQ
On ST.FirstName=SQ.name
Order By SQ.ord
And result:
ID FirstName LastName
2 Mike Smith (1)
3 Ben Bray (2)
2 Mike Smith (3)
Now, let's se what will happen if subquery returns
Ord Name
1. Mike
2. Bill
3. Mike
You will end up with
ID FirstName LastName
2 Mike Smith (1)
6 Bill Lynch (2)
7 Bill Smith (2)
2 Mike Smith (3)
Even worse, if you have something like:
Ord Name
1. John
2. Bill
3. John
Result is:
ID FirstName LastName
1 John Smith (1)
4 John Bray (1)
5 John Smith (1)
6 Bill Lynch (2)
7 Bill Smith (2)
1 John Smith (3)
4 John Bray (3)
5 John Smith (3)
This is an complex situation, and you have to clarify precisely what requirement is.
If you need only one student with the same name, for each of rows in SQ, you can use something like SQL 2005+):
;With st1 as
(
Select Row_Number() over (Partition by SQ.ord Order By ID) as rowNum,
ST.ID,
ST.FirstName,
ST.LastName,
SQ.ord
from Students ST
Inner Join (Select ord, name from …. <your subquery>) SQ
On ST.FirstName=SQ.name
)
Select ID, FirstName, LastName
From st1
Where rowNum=1 -- that was missing row, added later
Order By ord
It will return (for SQ values John, Bill, John)
ID FirstName LastName
1 John Smith (1)
6 Bill Lynch (2)
1 John Smith (3)
Note, numbers (1),(2),(3) are shown to display value of ord although they are not returned by query.
If you can split the where clause in your calling code, you could perform a UNION ALL on each clause.
SELECT * FROM Student WHERE sname = 'rajesh'
UNION ALL SELECT * FROM Student WHERE sname = 'rohit'
UNION ALL SELECT * FROM Student WHERE sname = 'rajesh'
Try using a JOIN:
SELECT ...
FROM Student s
INNER JOIN (
SELECT 'rajesh' AS sname
UNION ALL
SELECT 'rohit'
UNION ALL
SELECT 'rajesh') t ON s.sname = t.sname
just because you've got a criteria in there two times doesn't mean that it will return 1 result per criteria. SQL engines usually just use the unique criteria - thus, from your example, there will be 2 criteria in IN clause: 'rajesh','rohit'
WHY do you need to return 2 results? are there two rajesh in your table? they should BOTH return then. You don't need to ask for rajesh twice for that to happen. What does your data look like? What do you want to see returned?
Hi i am query just as you give above and it give me all data that matches in the condition of in clause. just like your post
select * from person
where personid in (
'Carson','Kim','Carson'
)
order by FirstName
and its give me all records which fulfill this Criteria