Get the attributes of the most recent row in BigQuery?

I'm working in BigQuery. I have a table t1 which has address, postcode, price and date fields. I want to group this by address and postcode, and find the price of the most recent row for each address.
How can I do this in BigQuery? I know how to get the address, postcode and most recent date:
SELECT
ADDRESS, POSTCODE, MAX(DATE)
FROM
[mytable]
GROUP BY
ADDRESS,
POSTCODE
But I don't know how to get the price of the rows matching these fields. This is my best guess, which does produce results. Will it be correct?
SELECT
t1.address, t1.postcode, t1.date, t2.price
FROM [mytable] t2
JOIN
(SELECT
ADDRESS, POSTCODE, MAX(DATE) AS date
FROM
[mytable]
GROUP BY
ADDRESS,
POSTCODE) t1
ON t1.address=t2.address
AND t1.postcode=t2.postcode
AND t1.date=t2.date
This seems to me like it should work, but some of the similar questions have solutions that are much more complex.

Just use row_number():
SELECT t.*
FROM (SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY ADDRESS, POSTCODE
ORDER BY DATE DESC
) as seqnum
FROM [mytable] t
) t
WHERE seqnum = 1;
This is not an aggregation query. You want to filter the rows, keeping only the most recent one for each address and postcode.
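To see the ROW_NUMBER() filter in action, here is a small sketch that runs the same query with Python's built-in sqlite3 module (SQLite stands in for BigQuery, which also supports ROW_NUMBER()). It assumes a table named mytable loaded with the dummy rows from the Standard SQL answer below.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (an assumption for testing).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (address TEXT, postcode TEXT, date TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        ("address_1", "postcode_1", "2017-01-01", 1),
        ("address_1", "postcode_1", "2017-01-02", 2),
        ("address_1", "postcode_1", "2017-01-03", 3),
        ("address_1", "postcode_1", "2017-01-04", 4),
        ("address_2", "postcode_2", "2017-01-01", 5),
        ("address_3", "postcode_1", "2017-01-01", 6),
        ("address_3", "postcode_1", "2017-01-02", 7),
        ("address_3", "postcode_1", "2017-01-03", 8),
    ],
)

# Number each (address, postcode) group's rows newest-first, then keep row 1.
latest = conn.execute("""
    SELECT address, postcode, date, price
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY address, postcode
                                    ORDER BY date DESC) AS seqnum
          FROM mytable t) t
    WHERE seqnum = 1
    ORDER BY address
""").fetchall()
print(latest)
```

Each group keeps exactly one row, the one with the latest date, together with its price.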

Try the query below for BigQuery Standard SQL:
#standardSQL
SELECT row.* FROM (
SELECT ARRAY_AGG(t ORDER BY date DESC LIMIT 1)[OFFSET(0)] AS row
FROM `yourTable` AS t
GROUP BY address, postcode
)
You can play with or test it using dummy data as below:
#standardSQL
WITH yourTable AS (
SELECT 'address_1' AS address, 'postcode_1' AS postcode, '2017-01-01' AS date, 1 AS price UNION ALL
SELECT 'address_1', 'postcode_1', '2017-01-02', 2 UNION ALL
SELECT 'address_1', 'postcode_1', '2017-01-03', 3 UNION ALL
SELECT 'address_1', 'postcode_1', '2017-01-04', 4 UNION ALL
SELECT 'address_2', 'postcode_2', '2017-01-01', 5 UNION ALL
SELECT 'address_3', 'postcode_1', '2017-01-01', 6 UNION ALL
SELECT 'address_3', 'postcode_1', '2017-01-02', 7 UNION ALL
SELECT 'address_3', 'postcode_1', '2017-01-03', 8
)
SELECT row.* FROM (
SELECT ARRAY_AGG(t ORDER BY date DESC LIMIT 1)[OFFSET(0)] AS row
FROM `yourTable` AS t
GROUP BY address, postcode
)
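As for whether the join in the question is correct: it returns the right price as long as each address/postcode has only one row on its latest date. If two rows share that date, the join returns both of them, while ROW_NUMBER() always returns one. A minimal SQLite sketch (the tied rows are hypothetical, and the price DESC tie-breaker is an arbitrary choice added for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (address TEXT, postcode TEXT, date TEXT, price INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", [
    ("address_1", "postcode_1", "2017-01-01", 1),
    ("address_1", "postcode_1", "2017-01-02", 2),
    ("address_1", "postcode_1", "2017-01-02", 3),  # hypothetical tie on the latest date
    ("address_2", "postcode_2", "2017-01-01", 5),
])

# The question's approach: join every row back to its group's MAX(date).
join_rows = conn.execute("""
    SELECT t1.address, t1.postcode, t1.date, t2.price
    FROM mytable t2
    JOIN (SELECT address, postcode, MAX(date) AS date
          FROM mytable
          GROUP BY address, postcode) t1
      ON t1.address = t2.address
     AND t1.postcode = t2.postcode
     AND t1.date = t2.date
    ORDER BY t1.address, t2.price
""").fetchall()

# ROW_NUMBER() keeps one row per group; price DESC decides between tied dates.
rn_rows = conn.execute("""
    SELECT address, postcode, date, price
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY address, postcode
                                    ORDER BY date DESC, price DESC) AS seqnum
          FROM mytable t) t
    WHERE seqnum = 1
    ORDER BY address
""").fetchall()

print(join_rows)
print(rn_rows)
```

Without a tie-breaker in ORDER BY, which of the tied rows ROW_NUMBER() keeps is not guaranteed.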

Related

SQL Query for finding longest streak of wins

I have data like the following:
Year,winning_country
2001,IND
2002,IND
2003,IND
2004,AUS
2005,AUS
2006,SA
2007,SA
2008,SA
2009,IND
2010,IND
2011,IND
2012,IND
2013,AUS
2014,AUS
2015,SA
2016,NZ
2017,SL
2018,IND
The task is to find the longest streak of wins for each country. The desired output is:
Country,no_of_wins
IND,4
AUS,2
SA,3
SL,1
NZ,1
Can someone help?
This is a gaps and islands problem, but the simplest method is to subtract a sequence from the year. So, to get all the sequences:
select country, count(*) as streak,
min(year) as from_year, max(year) as to_year
from (select year, country,
row_number() over (partition by country order by year) as seqnum
from t
) t
group by country, (year - seqnum);
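Here is a sketch of the year - seqnum trick run against the sample data in SQLite via Python's sqlite3 module. It assumes a table t(year, country), matching the column names the query uses. Within one country, consecutive years get consecutive row numbers, so year - seqnum stays constant across a run and changes after every gap.

```python
import sqlite3

WINS = [
    (2001, "IND"), (2002, "IND"), (2003, "IND"), (2004, "AUS"), (2005, "AUS"),
    (2006, "SA"), (2007, "SA"), (2008, "SA"), (2009, "IND"), (2010, "IND"),
    (2011, "IND"), (2012, "IND"), (2013, "AUS"), (2014, "AUS"), (2015, "SA"),
    (2016, "NZ"), (2017, "SL"), (2018, "IND"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (year INTEGER, country TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", WINS)

# One output row per unbroken run ("island") of wins.
streaks = conn.execute("""
    SELECT country, COUNT(*) AS streak, MIN(year) AS from_year, MAX(year) AS to_year
    FROM (SELECT year, country,
                 ROW_NUMBER() OVER (PARTITION BY country ORDER BY year) AS seqnum
          FROM t) t
    GROUP BY country, year - seqnum
    ORDER BY from_year
""").fetchall()
for row in streaks:
    print(row)
```

This relies on there being exactly one row per year with no missing years, which holds for the sample data.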
To get the longest per country, aggregate again or use window functions:
select country, streak
from (select country, count(*) as streak,
min(year) as from_year, max(year) as to_year,
row_number() over (partition by country order by count(*) desc) as seqnum_2
from (select year, country,
row_number() over (partition by country order by year) as seqnum
from t
) t
group by country, (year - seqnum)
) cy
where seqnum_2 = 1;
I prefer using row_number() to get the longest streak because it allows you to also get the years when it occurred.
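A sketch of the "longest per country, with its years" variant in SQLite, under the same assumed table t(year, country). To keep the SQL simple, it splits the islands and the ranking into two CTEs instead of nesting the window function in the aggregate. The tie-break on from_year (AUS has two runs of 2) is an arbitrary choice that keeps the earlier run.

```python
import sqlite3

WINS = [
    (2001, "IND"), (2002, "IND"), (2003, "IND"), (2004, "AUS"), (2005, "AUS"),
    (2006, "SA"), (2007, "SA"), (2008, "SA"), (2009, "IND"), (2010, "IND"),
    (2011, "IND"), (2012, "IND"), (2013, "AUS"), (2014, "AUS"), (2015, "SA"),
    (2016, "NZ"), (2017, "SL"), (2018, "IND"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (year INTEGER, country TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", WINS)

longest = conn.execute("""
    WITH islands AS (
        -- one row per unbroken run of wins
        SELECT country, COUNT(*) AS streak,
               MIN(year) AS from_year, MAX(year) AS to_year
        FROM (SELECT year, country,
                     ROW_NUMBER() OVER (PARTITION BY country ORDER BY year) AS seqnum
              FROM t) t
        GROUP BY country, year - seqnum
    ),
    ranked AS (
        -- longest run first; equal-length runs go to the earlier one
        SELECT islands.*,
               ROW_NUMBER() OVER (PARTITION BY country
                                  ORDER BY streak DESC, from_year) AS seqnum_2
        FROM islands
    )
    SELECT country, streak, from_year, to_year
    FROM ranked
    WHERE seqnum_2 = 1
    ORDER BY streak DESC, country
""").fetchall()
print(longest)
```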
This looks like a gaps-and-islands problem.
The SQL below calculates a group key as the difference between two ROW_NUMBER() values: one over all years, and one per country.
Then it's just a matter of grouping.
SELECT q2.Country, MAX(q2.no_of_wins) AS no_of_wins
FROM
(
SELECT q1.winning_country as Country,
COUNT(*) AS no_of_wins
FROM
(
SELECT t.Year, t.winning_country,
(ROW_NUMBER() OVER (ORDER BY t.Year ASC) -
ROW_NUMBER() OVER (PARTITION BY t.winning_country ORDER BY t.Year)) AS rnk
FROM yourtable t
) q1
GROUP BY q1.winning_country, q1.rnk
) q2
GROUP BY q2.Country
ORDER BY MAX(q2.no_of_wins) DESC
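The two-ROW_NUMBER() query can be checked in SQLite through Python's sqlite3 module, assuming a table yourtable(year, winning_country) holding the sample data. Because it only compares row positions, this variant does not need the years themselves to be consecutive integers.

```python
import sqlite3

WINS = [
    (2001, "IND"), (2002, "IND"), (2003, "IND"), (2004, "AUS"), (2005, "AUS"),
    (2006, "SA"), (2007, "SA"), (2008, "SA"), (2009, "IND"), (2010, "IND"),
    (2011, "IND"), (2012, "IND"), (2013, "AUS"), (2014, "AUS"), (2015, "SA"),
    (2016, "NZ"), (2017, "SL"), (2018, "IND"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (year INTEGER, winning_country TEXT)")
conn.executemany("INSERT INTO yourtable VALUES (?, ?)", WINS)

# rnk is constant within a run: both row numbers advance together
# until another country wins a year in between.
rows = conn.execute("""
    SELECT q2.country, MAX(q2.no_of_wins) AS no_of_wins
    FROM (SELECT q1.winning_country AS country, COUNT(*) AS no_of_wins
          FROM (SELECT t.year, t.winning_country,
                       ROW_NUMBER() OVER (ORDER BY t.year)
                     - ROW_NUMBER() OVER (PARTITION BY t.winning_country
                                          ORDER BY t.year) AS rnk
                FROM yourtable t) q1
          GROUP BY q1.winning_country, q1.rnk) q2
    GROUP BY q2.country
    ORDER BY no_of_wins DESC, q2.country
""").fetchall()
print(rows)
```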
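Redshift supports analytic (window) functions, so the query below should work there. The sample-data CTE selects from dual, which is Oracle's dummy table; on databases that don't provide it, drop the "from dual" clauses.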
with t1 as
(
select 2001 as year,'IND' as cntry from dual union
select 2002,'IND' from dual union
select 2003,'IND' from dual union
select 2004,'AUS' from dual union
select 2005,'AUS' from dual union
select 2006,'SA' from dual union
select 2007,'SA' from dual union
select 2008,'SA' from dual union
select 2009,'IND' from dual union
select 2010,'IND' from dual union
select 2011,'IND' from dual union
select 2012,'IND' from dual union
select 2013,'AUS' from dual union
select 2014,'AUS' from dual union
select 2015,'SA' from dual union
select 2016,'NZ' from dual union
select 2017,'SL' from dual union
select 2018,'IND' from dual) ,
t2 as (select year, cntry, year - row_number() over (partition by cntry order by year) as grpBy from t1 order by cntry),
t3 as (select cntry, count(grpBy) as consWins from t2 group by cntry, grpBy),
res as (select cntry, consWins, row_number() over (partition by cntry order by consWins desc) as rnk from t3)
select cntry, consWins from res where rnk=1;
Hope this helps.
Here is a solution that uses a Redshift Python UDF.
There may be simpler ways to achieve the same result, but this is a good example of how to create a simple UDF.
create table temp_c (competition_year int ,winning_country varchar(4));
insert into temp_c (competition_year, winning_country)
values
(2001,'IND'),
(2002,'IND'),
(2003,'IND'),
(2004,'AUS'),
(2005,'AUS'),
(2006,'SA'),
(2007,'SA'),
(2008,'SA'),
(2009,'IND'),
(2010,'IND'),
(2011,'IND'),
(2012,'IND'),
(2013,'AUS'),
(2014,'AUS'),
(2015,'SA'),
(2016,'NZ'),
(2017,'SL'),
(2018,'IND')
;
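create or replace function find_longest_streak(InputStr varchar)
returns integer
stable
as $$
  # InputStr is a comma-separated list of years, sorted ascending
  MaxStreak = 0
  ThisStreak = 0
  LastYear = None
  for ThisYearStr in InputStr.split(','):
    ThisYear = int(ThisYearStr)
    if LastYear is not None and ThisYear == LastYear + 1:
      # consecutive year: extend the current streak
      ThisStreak += 1
    else:
      # gap: close the current streak and start a new one
      MaxStreak = max(MaxStreak, ThisStreak)
      ThisStreak = 1
    LastYear = ThisYear
  # the last streak is never closed inside the loop, so compare it here
  return max(MaxStreak, ThisStreak)
$$ language plpythonu;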
select winning_country,
find_longest_streak(listagg(competition_year,',') within group (order by competition_year))
from temp_c
group by winning_country
order by 2 desc
;
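The loop inside the UDF is plain Python, so its logic can be checked locally before creating the function in Redshift. This standalone sketch mirrors it (the snake_case names are illustrative). Note the final max(): a streak that is still running when the input ends, as in "2001,2005,2006", would otherwise never be counted.

```python
def find_longest_streak(input_str: str) -> int:
    """Longest run of consecutive years in a sorted, comma-separated list."""
    max_streak = 0
    this_streak = 0
    last_year = None
    for year_str in input_str.split(","):
        year = int(year_str)
        if last_year is not None and year == last_year + 1:
            this_streak += 1          # consecutive year extends the run
        else:
            max_streak = max(max_streak, this_streak)
            this_streak = 1           # a gap starts a new run
        last_year = year
    return max(max_streak, this_streak)  # include the final run


print(find_longest_streak("2001,2002,2003,2009,2010,2011,2012,2018"))  # IND
```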
How about something like...
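SELECT
winning_country,
COUNT(*)
FROM yourtable -- hypothetical table name
GROUP BY winning_country
HAVING MAX(year) - MIN(year) = COUNT(year) - 1
This assumes no duplicate entries. It also only returns countries whose wins form a single unbroken run (NZ and SL in the sample data), so countries with several streaks are left out.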
Creating a session abstraction does the trick:
WITH winning_changes AS (
SELECT *,
CASE WHEN LAG(winning_country) OVER (ORDER BY year) <> winning_country THEN 1 ELSE 0 END AS winner_changed
FROM winners
),
sequences AS (
SELECT *,
SUM(winner_changed) OVER (ORDER BY year) AS winning_session
FROM winning_changes
),
streaks AS (
SELECT winning_country AS country,
winning_session,
COUNT(*) streak
FROM sequences
GROUP BY 1,2
)
SELECT country,
MAX(streak) AS no_of_wins
FROM streaks
GROUP BY 1;
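A sketch of the session approach in SQLite, assuming a table winners(year, winning_country) with the sample data. The flag is 1 whenever the winner differs from the previous year's (LAG() returns NULL for the first year, so the CASE falls through to 0), and its running SUM() gives every unbroken run its own session number. Column names replace the positional GROUP BY 1, 2 here only for readability.

```python
import sqlite3

WINS = [
    (2001, "IND"), (2002, "IND"), (2003, "IND"), (2004, "AUS"), (2005, "AUS"),
    (2006, "SA"), (2007, "SA"), (2008, "SA"), (2009, "IND"), (2010, "IND"),
    (2011, "IND"), (2012, "IND"), (2013, "AUS"), (2014, "AUS"), (2015, "SA"),
    (2016, "NZ"), (2017, "SL"), (2018, "IND"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE winners (year INTEGER, winning_country TEXT)")
conn.executemany("INSERT INTO winners VALUES (?, ?)", WINS)

wins = conn.execute("""
    WITH winning_changes AS (
        SELECT *,
               CASE WHEN LAG(winning_country) OVER (ORDER BY year) <> winning_country
                    THEN 1 ELSE 0 END AS winner_changed
        FROM winners
    ),
    sequences AS (
        SELECT *, SUM(winner_changed) OVER (ORDER BY year) AS winning_session
        FROM winning_changes
    ),
    streaks AS (
        SELECT winning_country AS country, winning_session, COUNT(*) AS streak
        FROM sequences
        GROUP BY winning_country, winning_session
    )
    SELECT country, MAX(streak) AS no_of_wins
    FROM streaks
    GROUP BY country
    ORDER BY no_of_wins DESC, country
""").fetchall()
print(wins)
```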

Select a third column based on two distinct columns within the same table

I want to select a third column based on two distinct columns within the same table.
I could only think of this:
select tl.thirdcolumn
from table1 t1
WHERE
EXISTS
(
Select distinct tl.firstcolumn , t1.secondcolumn
From t1
)
This:
select distinct tl.thirdcolumn
from table t1
won't work, as I don't want the distinct third column. I want the third column to be based on the first two columns being distinct.
I guess it's some kind of nested SQL statement with a SELECT TOP 1, but I'm not sure.
CATEGORY  NAME                         Query
---------------------------------------------------
STUDENTS  NUMBER_OF_CHAPTERS           QueryA
STUDENTS  NUMBER_OF_STUDENT_MEMBERS    QueryB
STUDENTS  NUMBER_OF_STUDENT_MEMBERS    QueryB
MEMBERS   NUMBER_OF_MEMBERS_WORLDWIDE  QueryC
MEMBERS   NUMBER_OF_MEMBERS_WORLDWIDE  QueryC
Your question is rather hard to follow, but I think you might simply want group by:
select t1.firstcolumn, t1.secondcolumn, max(t1.thirdcolumn)
from table1 t1
group by t1.firstcolumn, t1.secondcolumn;
If you want rows where the pair of values only appears once, then add having count(*) = 1:
select t1.firstcolumn, t1.secondcolumn, max(t1.thirdcolumn)
from table1 t1
group by t1.firstcolumn, t1.secondcolumn
having count(*) = 1;
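Both GROUP BY variants can be tried on the sample table from the question with SQLite via Python's sqlite3 module. The lowercase column names (category, name, query_text) are assumptions standing in for CATEGORY, NAME and Query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (category TEXT, name TEXT, query_text TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [
    ("STUDENTS", "NUMBER_OF_CHAPTERS", "QueryA"),
    ("STUDENTS", "NUMBER_OF_STUDENT_MEMBERS", "QueryB"),
    ("STUDENTS", "NUMBER_OF_STUDENT_MEMBERS", "QueryB"),
    ("MEMBERS", "NUMBER_OF_MEMBERS_WORLDWIDE", "QueryC"),
    ("MEMBERS", "NUMBER_OF_MEMBERS_WORLDWIDE", "QueryC"),
])

# One row per (category, name) pair, with the largest (here the only) query value.
all_pairs = conn.execute("""
    SELECT category, name, MAX(query_text) FROM table1
    GROUP BY category, name
    ORDER BY category, name
""").fetchall()

# Only the pairs that occur exactly once.
unique_pairs = conn.execute("""
    SELECT category, name, MAX(query_text) FROM table1
    GROUP BY category, name
    HAVING COUNT(*) = 1
""").fetchall()

print(all_pairs)
print(unique_pairs)
```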
Query -
SELECT
CATEGORY,NAME,QUERY
FROM
(
WITH TAB AS (
SELECT
'STUDENTS' AS CATEGORY,
'NUMBER_OF_CHAPTERS' AS NAME,
'QUERYA' AS QUERY
FROM
DUAL
UNION ALL
SELECT
'STUDENTS' AS CATEGORY,
'NUMBER_OF_STUDENT_MEMBERS' AS NAME,
'QUERYB' AS QUERY
FROM
DUAL
UNION ALL
SELECT
'STUDENTS' AS CATEGORY,
'NUMBER_OF_STUDENT_MEMBERS' AS NAME,
'QUERYB' AS QUERY
FROM
DUAL
UNION ALL
SELECT
'MEMBERS' AS CATEGORY,
'NUMBER_OF_MEMBERS_WORLDWIDE' AS NAME,
'QUERYC' AS QUERY
FROM
DUAL
UNION ALL
SELECT
'MEMBERS' AS CATEGORY,
'NUMBER_OF_MEMBERS_WORLDWIDE' AS NAME,
'QUERYC' AS QUERY
FROM
DUAL
) SELECT
CATEGORY,
NAME,
QUERY,
COUNT(*) OVER(PARTITION BY
CATEGORY,
NAME
ORDER BY
CATEGORY,
NAME,
QUERY
) AS RNK
FROM
TAB
)
WHERE
RNK = 1;
Output -
"CATEGORY","NAME","QUERY"
"STUDENTS","NUMBER_OF_CHAPTERS","QUERYA"

Join based on min

I have two tables.
Table1:
id, date
Table2:
id,date
Both tables contain information about ids. Each table can have extra rows whose id is not present in the other table.
Example:
Table1:
1,15-Jun
2,16-Jun
4,17-Jun
Table2
1,14-Jun
2,17-Jun
3,18-Jun
I need a summarized result that gives the minimum date for each id.
Expected result:
1,14-Jun
2,16-Jun
3,18-Jun
4,17-Jun
select id, min(date_) from (
select id, date_ from table1
union all
select id, date_ from table2
) x group by id;
SELECT id, MIN(date)
FROM (SELECT id, date
FROM Table1
UNION
SELECT id, date
FROM Table2) t
GROUP BY id
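The UNION + MIN idea can be checked with the example rows in SQLite via Python's sqlite3 module. The dates are rewritten as ISO strings with an assumed year (2023) so that string comparison matches date order; that year is not from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (id INTEGER, date TEXT)")
conn.execute("CREATE TABLE Table2 (id INTEGER, date TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?)",
                 [(1, "2023-06-15"), (2, "2023-06-16"), (4, "2023-06-17")])
conn.executemany("INSERT INTO Table2 VALUES (?, ?)",
                 [(1, "2023-06-14"), (2, "2023-06-17"), (3, "2023-06-18")])

# Stack both tables, then take the earliest date per id.
result = conn.execute("""
    SELECT id, MIN(date) FROM (
        SELECT id, date FROM Table1
        UNION ALL
        SELECT id, date FROM Table2
    ) t
    GROUP BY id
    ORDER BY id
""").fetchall()
print(result)
```

UNION and UNION ALL give the same MIN here; UNION ALL just skips the duplicate-removal step.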
with a as(select t.i_id,t.dt_date from t
union
select b.i_id,b.dt_date from b)
select a.i_id,min(a.dt_date) from a group by a.i_id order by a.i_id;

How do i convert columns into rows for each status

Data is saved in a table as below.
I need to show the data as below.
Please suggest a query.
This is how you would do it in SQL Server:
SELECT Name, 'Joined' AS [ACTION], JOIN_DT AS ACTION_DATE
FROM SomeTable
UNION ALL
SELECT Name, 'Started', START_DTTM
FROM SomeTable
UNION ALL
SELECT Name, 'Ended', END_DT
FROM SomeTable
Try this:
SELECT Name, Action, Action_Date FROM (
SELECT Name, 'Joined' as Action, JOIN_DT as ACTION_DATE FROM TableA
UNION ALL
SELECT Name, 'Started', START_DT FROM TableA
UNION ALL
SELECT Name, 'Ended', END_DT FROM TableA) AS t
ORDER BY Name;
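The UNION ALL unpivot can be tried in SQLite via Python's sqlite3 module. The table TableA and its columns JOIN_DT, START_DT and END_DT are taken from the answer above; the question's actual column names aren't shown, and the two people and their dates are made up. Ordering by the date as well keeps each person's actions in chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (Name TEXT, JOIN_DT TEXT, START_DT TEXT, END_DT TEXT)")
conn.executemany("INSERT INTO TableA VALUES (?, ?, ?, ?)", [
    ("Alice", "2020-01-01", "2020-02-01", "2020-12-31"),  # hypothetical rows
    ("Bob", "2019-05-01", "2019-06-01", "2019-09-30"),
])

# Each UNION ALL branch turns one date column into rows labelled with an action.
rows = conn.execute("""
    SELECT Name, Action, Action_Date FROM (
        SELECT Name, 'Joined' AS Action, JOIN_DT AS Action_Date FROM TableA
        UNION ALL
        SELECT Name, 'Started', START_DT FROM TableA
        UNION ALL
        SELECT Name, 'Ended', END_DT FROM TableA
    ) AS t
    ORDER BY Name, Action_Date
""").fetchall()
for row in rows:
    print(row)
```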

A simple way to sum a result from UNION in MySQL

I have a union of three tables (t1, t2, t3).
Each returns exactly the same number of records; the first column is id, the second is amount:
1 10
2 20
3 20
1 30
2 30
3 10
1 20
2 40
3 50
Is there a simple way in SQL to sum it up, i.e. to only get:
1 60
2 80
3 80
select id, sum(amount) from (
select id,amount from table_1 union all
select id,amount from table_2 union all
select id,amount from table_3
) x group by id
SELECT id, SUM(amount) FROM
(
SELECT id, SUM(amount) AS `amount` FROM t1 GROUP BY id
UNION ALL
SELECT id, SUM(amount) AS `amount` FROM t2 GROUP BY id
UNION ALL
SELECT id, SUM(amount) AS `amount` FROM t3 GROUP BY id
) `x`
GROUP BY `id`
I grouped each table before the union because I think it might be faster, but you should try both solutions.
Subquery:
SELECT id, SUM(amount)
FROM ( SELECT * FROM t1
UNION ALL SELECT * FROM t2
UNION ALL SELECT * FROM t3
) AS u
GROUP BY id
Not sure whether MySQL supports common table expressions, but I would do this in Postgres:
WITH total AS(
SELECT id, amount FROM table_1 UNION ALL
SELECT id, amount FROM table_2 UNION ALL
SELECT id, amount FROM table_3
)
SELECT id, sum(amount)
FROM total
GROUP BY id
I think that should do the trick as well.
As it's not very clear from previous answers, remember to give the derived table an alias (on MySQL/MariaDB) or you'll get the error:
Every derived table must have its own alias
select id, sum(amount) from (
select id,amount from table_1 union all
select id,amount from table_2 union all
select id,amount from table_3
) AS aliasWhichIsNeeded
group by id
Yes! It works, thanks!
My final code:
SELECT SUM(total)
FROM (
(SELECT 1 as id, SUM(e.valor) AS total FROM entrada AS e)
UNION ALL
(SELECT 1 as id, SUM(d.valor) AS total FROM despesa AS d)
UNION ALL
(SELECT 1 as id, SUM(r.valor) AS total FROM recibo AS r WHERE r.status = 'Pago')
) x group by id
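Why UNION ALL matters when summing subtotals: UNION removes duplicate rows, so if two subtotals happen to be equal, one of them silently disappears from the sum. A small SQLite sketch via Python's sqlite3 module, with hypothetical values chosen so that entrada and despesa both total 100 (SQLite does not accept parentheses around the branches of a compound SELECT, so they are omitted here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entrada (valor INTEGER)")
conn.execute("CREATE TABLE despesa (valor INTEGER)")
conn.execute("CREATE TABLE recibo (valor INTEGER, status TEXT)")
conn.executemany("INSERT INTO entrada VALUES (?)", [(60,), (40,)])  # total 100
conn.executemany("INSERT INTO despesa VALUES (?)", [(100,)])        # total 100
conn.executemany("INSERT INTO recibo VALUES (?, ?)",
                 [(50, "Pago"), (30, "Pendente")])                  # 'Pago' total 50


def grand_total(set_op: str) -> int:
    """Sum the three subtotals, combining them with set_op ('UNION' or 'UNION ALL')."""
    sql = f"""
        SELECT SUM(total) FROM (
            SELECT 1 AS id, SUM(e.valor) AS total FROM entrada AS e
            {set_op}
            SELECT 1 AS id, SUM(d.valor) AS total FROM despesa AS d
            {set_op}
            SELECT 1 AS id, SUM(r.valor) AS total FROM recibo AS r WHERE r.status = 'Pago'
        ) x GROUP BY id
    """
    return conn.execute(sql).fetchone()[0]


with_union = grand_total("UNION")          # the second (1, 100) row is removed
with_union_all = grand_total("UNION ALL")  # all three subtotals are kept
print(with_union, with_union_all)
```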
SELECT BANKEMPNAME, workStation, SUM (CALCULATEDAMOUNT) FROM(
SELECT BANKEMPNAME, workStation, SUM(CALCULATEDAMOUNT) AS CALCULATEDAMOUNT,SALARYMONTH
FROM dbo.vw_salaryStatement
WHERE (ITEMCODE LIKE 'A%')
GROUP BY BANKEMPNAME,workStation, SALARYMONTH
union all
SELECT BANKEMPNAME, workStation, SUM(CALCULATEDAMOUNT) AS CALCULATEDAMOUNT,SALARYMONTH
FROM dbo.vw_salaryStatement
WHERE (ITEMCODE NOT LIKE 'A%')
GROUP BY BANKEMPNAME, workStation, SALARYMONTH) as t1
WHERE SALARYMONTH BETWEEN '20220101' AND '20220131'
group by BANKEMPNAME, workStation
order by BANKEMPNAME asc
In MSSQL you can write it this way, but with UNION ALL the columns must match in both queries (same number of columns, in the same order, with compatible types).
I have given this example so that you can understand the process.