SQL : How to select the most recent value by country - sql

I have a table with 3 columns : day, country, value. There are many values by country with different dates. For example :
DAY COUNTRY VALUE
04-SEP-19 BELGIUM 2124
15-MAR-19 BELGIUM 2135
21-MAY-19 SPAIN 1825
18-JUL-19 SPAIN 1724
26-MAR-19 ITALY 4141
I want to select the most recent value by country. For example :
DAY COUNTRY VALUE
04-SEP-19 BELGIUM 2124
18-JUL-19 SPAIN 1724
26-MAR-19 ITALY 4141
What is the sql query I can use?
Thank you for your help.

You can use the row_number() window function (if your DBMS supports it).
SELECT x.day,
x.country,
x.value
FROM (SELECT t.day,
t.country,
t.value,
row_number() OVER (PARTITION BY t.country
ORDER BY t.day DESC) rn
FROM elbat t) x
WHERE x.rn = 1;
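To try the row_number() approach without a database server, here is a minimal sketch using Python's built-in sqlite3 module (SQLite 3.25 or later is needed for window functions). The table name `readings` and the ISO-formatted dates are assumptions made for this example; they are not from the question.

```python
import sqlite3

# Hypothetical table name; dates stored as ISO text so they sort chronologically.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (day TEXT, country TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("2019-09-04", "BELGIUM", 2124),
        ("2019-03-15", "BELGIUM", 2135),
        ("2019-05-21", "SPAIN", 1825),
        ("2019-07-18", "SPAIN", 1724),
        ("2019-03-26", "ITALY", 4141),
    ],
)

# Number the rows of each country from newest to oldest, then keep row 1.
latest = conn.execute(
    """
    SELECT day, country, value
    FROM (
        SELECT day, country, value,
               ROW_NUMBER() OVER (PARTITION BY country ORDER BY day DESC) AS rn
        FROM readings
    )
    WHERE rn = 1
    ORDER BY country
    """
).fetchall()

for row in latest:
    print(row)
```

If two rows of one country share the same latest day, ROW_NUMBER() keeps only one of them, and which one is not defined unless a tie-breaker is added to the ORDER BY.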

Another way of doing this is to use a window function (SQL Server, MySQL 8, etc.),
e.g.
ROW_NUMBER() OVER ( PARTITION BY COUNTRY ORDER BY CONVERT(DATE, [Day]) DESC )
Then just filter to the rows where this function returns 1.
Full example (SQL Server syntax):
WITH TestData
AS ( SELECT '04-SEP-19' AS [Day], 'BELGIUM' AS [COUNTRY], 2124 AS [VALUE]
UNION
SELECT '15-MAR-19' AS [Day], 'BELGIUM' AS [COUNTRY], 2135 AS [VALUE]
UNION
SELECT '21-MAY-19' AS [Day], 'SPAIN' AS [COUNTRY], 1825 AS [VALUE]
UNION
SELECT '18-JUL-19' AS [Day], 'SPAIN' AS [COUNTRY], 1724 AS [VALUE]
UNION
SELECT '26-MAR-19' AS [Day], 'ITALY' AS [COUNTRY], 4141 AS [VALUE] ),
TestDataRanked
AS ( SELECT *,
ROW_NUMBER() OVER ( PARTITION BY COUNTRY ORDER BY CONVERT(DATE, [Day]) DESC ) AS SelectionRank
FROM TestData )
SELECT [Day],
COUNTRY,
[VALUE]
FROM TestDataRanked
WHERE SelectionRank = 1;
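The CONVERT(DATE, [Day]) in the ORDER BY matters when the day is stored as text such as '04-SEP-19', because text compares character by character rather than chronologically. The following plain-Python sketch illustrates the difference; the helper `parse_day` and the assumption that all years fall in 2000-2099 are mine, not part of the answer.

```python
from datetime import date

MONTHS = {
    name: number
    for number, name in enumerate(
        ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
         "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"],
        start=1,
    )
}


def parse_day(text):
    """Parse 'DD-MON-YY' (hypothetical helper; assumes years 2000-2099)."""
    dd, mon, yy = text.split("-")
    return date(2000 + int(yy), MONTHS[mon.upper()], int(dd))


rows = [
    ("04-SEP-19", "BELGIUM", 2124),
    ("15-MAR-19", "BELGIUM", 2135),
    ("21-MAY-19", "SPAIN", 1825),
    ("18-JUL-19", "SPAIN", 1724),
    ("26-MAR-19", "ITALY", 4141),
]

latest_by_text = {}  # compares the raw strings (wrong for this format)
latest_by_date = {}  # compares real dates
for day, country, value in rows:
    if country not in latest_by_text or day > latest_by_text[country][0]:
        latest_by_text[country] = (day, value)
    current = latest_by_date.get(country)
    if current is None or parse_day(day) > parse_day(current[0]):
        latest_by_date[country] = (day, value)

print(latest_by_text["BELGIUM"])  # picks 15-MAR-19 because '1' > '0'
print(latest_by_date["BELGIUM"])  # picks 04-SEP-19
```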

As I understand the problem, you want the most recent value for each country, where a country can appear more than once in the table:
select distinct t1.DAY, t1.COUNTRY, t1.VALUE
FROM day_test t1
inner join day_test t2 on t1.day in
(select max(day) from day_test t3 where t1.country = t3.country )
and t1.country = t2.country
I tested it and it works.

Let's suppose that the type of the day column is date.
In the subquery, you can find the tuple (country, max date) for each country; to add the value, you can join as mentioned in the comments or use IN:
SELECT DISTINCT day, country, value
FROM yourTable
WHERE (country, day)
IN (
SELECT country, MAX(day) AS day
FROM yourTable
GROUP BY country
)
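As a quick check of the (country, max day) idea, here is a minimal sketch with Python's sqlite3 module (row-value IN needs SQLite 3.15 or later). The table name `readings` and the ISO date strings are assumptions for the example. Note that the subquery groups by country alone, so it yields exactly one latest day per country.

```python
import sqlite3

# Hypothetical table name; ISO date text so MAX(day) is the latest day.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (day TEXT, country TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("2019-09-04", "BELGIUM", 2124),
        ("2019-03-15", "BELGIUM", 2135),
        ("2019-05-21", "SPAIN", 1825),
        ("2019-07-18", "SPAIN", 1724),
        ("2019-03-26", "ITALY", 4141),
    ],
)

# Keep only the rows whose (country, day) pair is that country's latest day.
latest_rows = conn.execute(
    """
    SELECT day, country, value
    FROM readings
    WHERE (country, day) IN (
        SELECT country, MAX(day)
        FROM readings
        GROUP BY country
    )
    ORDER BY country
    """
).fetchall()

print(latest_rows)
```

Unlike the row_number() variant, this returns every row that ties for the latest day within a country.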

You can use the following query:
Just replace TABLE_NAME with the name of your table.
SELECT
t.COUNTRY,
t.VALUE,
t.DAY AS "MostRecent"
FROM TABLE_NAME t
WHERE t.DAY = (SELECT MAX(t2.DAY)
FROM TABLE_NAME t2
WHERE t2.COUNTRY = t.COUNTRY);

Related

Top 25 Pageviews in each Country SQL

I am looking for the top 25 blog pages by pageviews in each country.
Please help me out with this. Thanks in advance.
with Result as ( select
sum(Pageviews) Total_Page,page_path,date
,case
when "PROFILE" = 44399579 then 'India'
when "PROFILE" = 36472271 then 'China'
when "PROFILE" = 41751607 then 'Russia'
else null
end COUNTRY,
Dense_rank() over(PARTITION BY Country order by sum(Pageviews) desc) as Test
From ""GOOGLE_ANALYTICS_PHASE1"."PAGES"
where PAGE_PATH like '%blog%' //and PAGE_PATH = '/blog?category_id=8&page=3'
group by page_path,country,date)
select top 100 Total_Page,
page_path,country,test,date
from result
where test <= 25 and Date between '2022-05-01' and '2022-05-31'
Snowflake SQL:
If you want the top 25 page views per country, counting only the pages within the defined date period,
then using this fake data:
with PAGES(pageviews, profile, page_path, date) as (
select * from values
(100, 44399579, 'blog1', '2022-05-31'::date),
(1000, 44399579, 'blog1', '2022-05-30'::date),
(200, 44399579, 'blog2', '2022-05-31'::date),
(2000, 44399579, 'blog2', '2022-04-01'::date)
)
and with 25 changed to 1 to "show it working"
SELECT
b.total_page
,b.page_path
,b.date
,b.country
FROM (
SELECT a.*
,SUM(total_page) over(partition by country, page_path) as tt
FROM (
SELECT
SUM(pageviews) AS total_page
,page_path
,date
,CASE profile
WHEN 44399579 THEN 'United States'
WHEN 36472271 THEN 'New Zealand'
WHEN 41751607 THEN 'Australia'
ELSE null
END AS country
FROM pages //"FIVETRAN_DATABASE_COMVITA"."GOOGLE_ANALYTICS_PHASE1"."PAGES"
WHERE page_path LIKE '%blog%'
AND Date BETWEEN '2022-05-01' AND '2022-05-31'
//and PAGE_PATH = '/blog?category_id=8&page=3'
GROUP BY 2,3,4
) as A
) as B
QUALIFY DENSE_RANK() OVER (PARTITION BY country ORDER BY tt desc) <= 1
gives:
TOTAL_PAGE PAGE_PATH DATE COUNTRY
100 blog1 2022-05-31 United States
1,000 blog1 2022-05-30 United States
Whereas if you want the all-time top pages, but only want to show the rows that fall in the current period:
SELECT
b.total_page
,b.page_path
,b.date
,b.country
FROM (
SELECT a.*
,SUM(total_page) over(partition by country, page_path) as tt
FROM (
SELECT
SUM(pageviews) AS total_page
,page_path
,date
,CASE profile
WHEN 44399579 THEN 'United States'
WHEN 36472271 THEN 'New Zealand'
WHEN 41751607 THEN 'Australia'
ELSE null
END AS country
FROM pages //"FIVETRAN_DATABASE_COMVITA"."GOOGLE_ANALYTICS_PHASE1"."PAGES"
WHERE page_path LIKE '%blog%'
//and PAGE_PATH = '/blog?category_id=8&page=3'
GROUP BY 2,3,4
) as A
) as B
WHERE Date BETWEEN '2022-05-01' AND '2022-05-31'
QUALIFY DENSE_RANK() OVER (PARTITION BY country ORDER BY tt desc) <= 1
now returns:
TOTAL_PAGE PAGE_PATH DATE COUNTRY
200 blog2 2022-05-31 United States
Because blog2 has the all time record, but the 200 views is the only one in the window of interest.
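The difference between the two queries comes down to where the date filter sits relative to the windowed SUM. Below is a rough sketch of both variants using Python's sqlite3 module. SQLite has no QUALIFY, so the rank is filtered in an outer query instead; the `country` column is stored directly rather than derived from `profile`, which is a simplification assumed for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages (pageviews INTEGER, country TEXT, page_path TEXT, day TEXT)"
)
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?, ?)",
    [
        (100, "United States", "blog1", "2022-05-31"),
        (1000, "United States", "blog1", "2022-05-30"),
        (200, "United States", "blog2", "2022-05-31"),
        (2000, "United States", "blog2", "2022-04-01"),
    ],
)

# Variant 1: filter to May first, so totals and ranks only see May rows.
in_period = conn.execute(
    """
    SELECT pageviews, page_path, day
    FROM (
        SELECT *, DENSE_RANK() OVER (PARTITION BY country ORDER BY tt DESC) AS rnk
        FROM (
            SELECT *, SUM(pageviews) OVER (PARTITION BY country, page_path) AS tt
            FROM pages
            WHERE day BETWEEN '2022-05-01' AND '2022-05-31'
        )
    )
    WHERE rnk <= 1
    ORDER BY day
    """
).fetchall()

# Variant 2: total over all time, then keep May rows, then rank what is left.
all_time = conn.execute(
    """
    SELECT pageviews, page_path, day
    FROM (
        SELECT *, DENSE_RANK() OVER (PARTITION BY country ORDER BY tt DESC) AS rnk
        FROM (
            SELECT *, SUM(pageviews) OVER (PARTITION BY country, page_path) AS tt
            FROM pages
        )
        WHERE day BETWEEN '2022-05-01' AND '2022-05-31'
    )
    WHERE rnk <= 1
    ORDER BY day
    """
).fetchall()

print(in_period)
print(all_time)
```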

conditional running sum

I'm trying to return the number of unique users that converted over time.
So I have the following query:
WITH CTE
As
(
SELECT '2020-04-01' as date,'userA' as user,1 as goals Union all
SELECT '2020-04-01','userB',0 Union all
SELECT '2020-04-01','userC',0 Union all
SELECT '2020-04-03','userA',1 Union all
SELECT '2020-04-05','userC',1 Union all
SELECT '2020-04-06','userC',0 Union all
SELECT '2020-04-06','userB',0
)
select
date,
COUNT(DISTINCT
IF
(goals >= 1,
user,
NULL)) AS cad_converters
from CTE
group by date
I'm trying to count distinct users, but I need a way to apply the distinct count across all dates up to the current one. I probably need something like a cumulative sum...
expected result would be something like this
date, goals, total_unique_converted_users
'2020-04-01',1,1
'2020-04-01',0,1
'2020-04-01',0,1
'2020-04-03',1,2
'2020-04-05',1,2
'2020-04-06',0,2
'2020-04-06',0,2
Below is for BigQuery Standard SQL
#standardSQL
SELECT t.date, t.goals, total_unique_converted_users
FROM `project.dataset.table` t
LEFT JOIN (
SELECT a.date,
COUNT(DISTINCT IF(b.goals >= 1, b.user, NULL)) AS total_unique_converted_users
FROM `project.dataset.table` a
CROSS JOIN `project.dataset.table` b
WHERE a.date >= b.date
GROUP BY a.date
)
USING(date)
I would approach this by tagging when the first goal is scored for each name. Then simply do a cumulative sum:
select cte.* except (seqnum), countif(seqnum = 1) over (order by date)
from (select cte.*,
(case when goals = 1 then row_number() over (partition by user, goals order by date) end) as seqnum
from cte
) cte;
I realize this can be expressed without the case in the subquery:
select cte.* except (seqnum), countif(seqnum = 1 and goals = 1) over (order by date)
from (select cte.*,
row_number() over (partition by user, goals order by date) as seqnum
from cte
) cte;
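The "tag the first conversion, then take a running sum" idea can be tried outside BigQuery as well. Here is a minimal sketch with Python's sqlite3 module; SQLite has no COUNTIF, so SUM(CASE ...) stands in for it, and the column names `event_date` and `user_id` are renamed for this example. With the question's sample data, userA's second conversion on 2020-04-03 does not raise the total, since userA was already counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_date TEXT, user_id TEXT, goals INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("2020-04-01", "userA", 1),
        ("2020-04-01", "userB", 0),
        ("2020-04-01", "userC", 0),
        ("2020-04-03", "userA", 1),
        ("2020-04-05", "userC", 1),
        ("2020-04-06", "userC", 0),
        ("2020-04-06", "userB", 0),
    ],
)

# seqnum = 1 on a converting row marks that user's first conversion.
# The running SUM counts those marks up to and including each date;
# rows sharing a date are peers in the default frame and get the same total.
running = conn.execute(
    """
    SELECT event_date, user_id, goals,
           SUM(CASE WHEN goals >= 1 AND seqnum = 1 THEN 1 ELSE 0 END)
               OVER (ORDER BY event_date) AS total_unique_converted_users
    FROM (
        SELECT event_date, user_id, goals,
               ROW_NUMBER() OVER (
                   PARTITION BY user_id, goals >= 1
                   ORDER BY event_date
               ) AS seqnum
        FROM events
    )
    ORDER BY event_date, user_id
    """
).fetchall()

for row in running:
    print(row)
```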

SQL Addition Formula

Noob alert...
I have an example table as follows.
I am trying to create a column in SQL that shows what percentage of size S each customer had per year.
So output should be something like:
(Correction: the customer C for 2019 Percentage should be 1)
Window functions will get you there.
CREATE TABLE #TestData
(
[Customer] NVARCHAR(2)
, [CustomerYear] INT
, [CustomerCount] INT
, [CustomerSize] NVARCHAR(2)
);
INSERT INTO #TestData (
[Customer]
, [CustomerYear]
, [CustomerCount]
, [CustomerSize]
)
VALUES ( 'A', 2017, 1, 'S' )
, ( 'A', 2017, 1, 'S' )
, ( 'B', 2017, 1, 'S' )
, ( 'B', 2017, 1, 'S' )
, ( 'B', 2018, 1, 'S' )
, ( 'A', 2018, 1, 'S' )
, ( 'C', 2017, 1, 'S' )
, ( 'C', 2019, 1, 'S' );
SELECT DISTINCT [Customer]
, [CustomerYear]
, SUM([CustomerCount]) OVER ( PARTITION BY [Customer]
, [CustomerYear]
) AS [CustomerCount]
, SUM([CustomerCount]) OVER ( PARTITION BY [CustomerYear] ) AS [TotalCount]
, SUM([CustomerCount]) OVER ( PARTITION BY [Customer]
, [CustomerYear]
) * 1.0 / SUM([CustomerCount]) OVER ( PARTITION BY [CustomerYear] ) AS [CustomerPercentage]
FROM #TestData
ORDER BY [CustomerYear]
, [Customer];
Will give you
Customer CustomerYear CustomerCount TotalCount CustomerPercentage
-------- ------------ ------------- ----------- ---------------------------------------
A 2017 2 5 0.400000000000
B 2017 2 5 0.400000000000
C 2017 1 5 0.200000000000
A 2018 1 2 0.500000000000
B 2018 1 2 0.500000000000
C 2019 1 1 1.000000000000
Assuming there are no duplicate rows for a customer in a year, you can use window functions:
select t.*,
sum(count) over (partition by year) as year_cnt,
count * 1.0 / sum(count) over (partition by year) as ratio
from t;
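If a customer can have several rows in a year, one option is to aggregate per customer and year first and apply the window function to that result. The sketch below does this with Python's sqlite3 module; the table name `sales` and column name `cnt` are assumptions for the example, and the sample rows are the ones from the first answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, year INTEGER, cnt INTEGER, size TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("A", 2017, 1, "S"), ("A", 2017, 1, "S"),
        ("B", 2017, 1, "S"), ("B", 2017, 1, "S"),
        ("B", 2018, 1, "S"), ("A", 2018, 1, "S"),
        ("C", 2017, 1, "S"), ("C", 2019, 1, "S"),
    ],
)

# Inner query: one row per customer and year.
# Outer query: divide by the yearly total; * 1.0 avoids integer division.
shares = conn.execute(
    """
    SELECT customer, year, customer_count,
           customer_count * 1.0
               / SUM(customer_count) OVER (PARTITION BY year) AS share
    FROM (
        SELECT customer, year, SUM(cnt) AS customer_count
        FROM sales
        WHERE size = 'S'
        GROUP BY customer, year
    )
    ORDER BY year, customer
    """
).fetchall()

for row in shares:
    print(row)
```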
Break it apart into tasks - that's probably the best rule to follow when it comes to SQL. So, I created a temporary table #tmp, which I populated with your sample data, and started out with this query:
select
customer,
year
from #tmp
where size = 'S'
group by customer, year
... this gets a row for each customer/year combo for 'S' entries.
Next, I want the total count for that customer/year combo:
select
customer,
year,
SUM(itemCount) as customerItemCount
from #tmp
where size = 'S'
group by customer, year
... now, how do we get the count for all customers for a specific year? We need a subquery - and we need that subquery to reference the year from the main query.
select
customer,
year,
SUM(itemCount) as customerItemCount,
(select SUM(itemCount) from #tmp t2 where year=t.year) as FullTotalForYear
from #tmp t
where size = 'S'
GROUP BY customer, year
... does that make sense? That new line in the ()'s is a subquery - it's hitting the table again, but this time it's just getting a SUM() over the particular year that matches the main table.
Finally, we just need to divide one of those columns by the other to get the actual percent (making sure not to make it int/int - which will always be an int), and we'll have our final answer:
select
customer,
year,
cast(SUM(itemCount) as float) /
(select SUM(itemCount) from #tmp t2 where year=t.year)
as PercentageOfYear
from #tmp t
where size = 'S'
GROUP BY customer, year
Make sense?
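The int/int pitfall mentioned above is easy to see in isolation. This small sketch uses Python's sqlite3 module; SQLite truncates integer division much like SQL Server does, though its floating-point type is called REAL rather than float.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Integer / integer truncates; making either operand a real number fixes it.
int_div, cast_div, mult_div = conn.execute(
    "SELECT 2 / 5, CAST(2 AS REAL) / 5, 2 * 1.0 / 5"
).fetchone()

print(int_div)   # integer division result
print(cast_div)  # division after an explicit cast
print(mult_div)  # division after multiplying by 1.0
```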
With a join of 2 groupings:
the 1st by size, year, customer and
the 2nd by size, year.
select
t.customer, t.year, t.count, t.size,
ty.total_count, 1.0 * t.count / ty.total_count percentage
from (
select t.customer, t.year, sum(t.count) count, t.size
from tablename t
group by t.size, t.year, t.customer
) t inner join (
select t.year, sum(t.count) total_count, t.size
from tablename t
group by t.size, t.year
) ty
on ty.size = t.size and ty.year = t.year
order by t.size, t.year, t.customer;

combine two sql queries having each a with clause

I have two queries that do the job for me, but I would like to combine them to have the results in one table instead of copy-pasting the results into Excel.
The first query gives me the number of users with at least one subscription that expired, per month in 2018:
WITH UniqueUsers AS
(
SELECT DISTINCT MONTH(ValidTo) ExpireMonth, UserId
FROM UserInAppPurchase
WHERE YEAR(ValidTo) = 2018
)
SELECT ExpireMonth, COUNT(UserId) UserCount
FROM UniqueUsers
GROUP BY ExpireMonth order by ExpireMonth;
The second query gives me the number of users who made at least one subscription purchase, per month in 2018:
WITH UniqueUsers AS
(
SELECT DISTINCT MONTH(PurchaseDate) PurchaseMonth, UserId
FROM UserInAppPurchase
WHERE YEAR(PurchaseDate) = 2018
)
SELECT PurchaseMonth, COUNT(UserId) UserCount
FROM UniqueUsers
GROUP BY PurchaseMonth order by PurchaseMonth;
Actually the PurchaseMonth and ExpireMonth are the same.
My expected output is: 1st column: months of 2018
2nd column: results from first query
3rd column: results from second query
It is not a big harm to just copy the two results and combine them manually, but I am curious how to do it directly in SQL.
Thanks for the help
I would unpivot the dates and just do aggregation:
SELECT MONTH(dte) as mon,
COUNT(DISTINCT ExpireUserId) as numExpiredUsers,
COUNT(DISTINCT PurchaseUserId) as numPurchaseUsers
FROM UserInAppPurchase uiap CROSS APPLY
(VALUES (ValidTo, UserId, NULL),
(PurchaseDate, NULL, UserId)
) v(dte, ExpireUserId, PurchaseUserId)
WHERE dte >= '2018-01-01' AND dte < '2019-01-01'
GROUP BY MONTH(dte)
ORDER BY MONTH(dte);
No subqueries, explicit JOINs, or CTEs are really needed for this logic.
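CROSS APPLY (VALUES ...) is SQL Server syntax. To experiment with the same unpivot-then-aggregate idea elsewhere, a UNION ALL can play the role of the unpivot, which is a swapped-in technique rather than the answer's exact query. The sketch below uses Python's sqlite3 module with made-up sample rows; the snake_case table and column names are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_in_app_purchase (user_id INTEGER, purchase_date TEXT, valid_to TEXT)"
)
# Made-up sample rows, one month of validity each.
conn.executemany(
    "INSERT INTO user_in_app_purchase VALUES (?, ?, ?)",
    [
        (1, "2018-01-10", "2018-02-10"),
        (1, "2018-02-10", "2018-03-10"),
        (2, "2018-01-15", "2018-02-15"),
        (2, "2018-01-20", "2018-02-20"),
        (3, "2017-12-05", "2018-01-05"),
    ],
)

# Each source row becomes two rows: one carrying the expiry date and user,
# one carrying the purchase date and user. COUNT(DISTINCT ...) ignores NULLs.
per_month = conn.execute(
    """
    SELECT CAST(strftime('%m', dte) AS INTEGER) AS mon,
           COUNT(DISTINCT expire_user_id) AS num_expired_users,
           COUNT(DISTINCT purchase_user_id) AS num_purchase_users
    FROM (
        SELECT valid_to AS dte, user_id AS expire_user_id, NULL AS purchase_user_id
        FROM user_in_app_purchase
        UNION ALL
        SELECT purchase_date, NULL, user_id
        FROM user_in_app_purchase
    )
    WHERE dte >= '2018-01-01' AND dte < '2019-01-01'
    GROUP BY mon
    ORDER BY mon
    """
).fetchall()

print(per_month)
```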
You can use something like this:
with months as
(
SELECT MONTH(ValidTo) as month_
FROM UserInAppPurchase
WHERE YEAR(ValidTo) = 2018
union
SELECT MONTH(PurchaseDate)
FROM UserInAppPurchase
WHERE YEAR(PurchaseDate) = 2018
),
UniqueUsers AS
(
SELECT MONTH(ValidTo) as ExpireMonth,
COUNT(DISTINCT UserId) UserCount
FROM UserInAppPurchase
WHERE YEAR(ValidTo) = 2018
GROUP BY MONTH(ValidTo)
),
UniqueUsers1 AS
(
SELECT MONTH(PurchaseDate) as PurchaseMonth,
COUNT(DISTINCT UserId) UserCount1
FROM UserInAppPurchase
WHERE YEAR(PurchaseDate) = 2018
GROUP BY MONTH(PurchaseDate)
)
select m.month_,
u.UserCount,
u1.UserCount1
from months m
left join UniqueUsers u
on m.month_ = u.ExpireMonth
left join UniqueUsers1 u1
on m.month_ = u1.PurchaseMonth
order by m.month_;
Note that MONTH() returns an integer, so the order by clause sorts the months numerically (1, 2, ..., 12). I just rewrote your queries in order to combine them.
Here you are, you can join CTEs
;WITH UniqueUsers1
AS (
SELECT DISTINCT MONTH(ValidTo) ExpireMonth
,UserId
FROM UserInAppPurchase
WHERE YEAR(ValidTo) = 2018
)
,UniqueUsers2
AS (
SELECT DISTINCT MONTH(PurchaseDate) PurchaseMonth
,UserId
FROM UserInAppPurchase
WHERE YEAR(PurchaseDate) = 2018
)
SELECT *
FROM (
SELECT ExpireMonth
,COUNT(UserId) UserCount
FROM UniqueUsers1 c1
GROUP BY ExpireMonth
) First
INNER JOIN (
SELECT PurchaseMonth
,COUNT(UserId) UserCount
FROM UniqueUsers2
GROUP BY PurchaseMonth
) Sec ON First.ExpireMonth = Sec.PurchaseMonth
You can store each result in a separate temporary table; for example, create two temporary tables like this:
CREATE Table #ExpiryDates (ExpireMonth int, TotalUsers int)
CREATE Table #PurchaseDates (PurchaseMonth int, TotalUsers int)
And then store the first query's result in the temporary table #ExpiryDates:
;WITH UniqueUsers AS
(
SELECT DISTINCT MONTH(ValidTo) ExpireMonth, UserId
FROM UserInAppPurchase
WHERE YEAR(ValidTo) = 2018
)
INSERT INTO #ExpiryDates (ExpireMonth, TotalUsers)
SELECT ExpireMonth, COUNT(UserId) UserCount
FROM UniqueUsers
GROUP BY ExpireMonth order by ExpireMonth;
Second Query will be like this:
;WITH UniqueUsers AS
(
SELECT DISTINCT MONTH(PurchaseDate) PurchaseMonth, UserId
FROM UserInAppPurchase
WHERE YEAR(PurchaseDate) = 2018
)
INSERT INTO #PurchaseDates (PurchaseMonth, TotalUsers)
SELECT PurchaseMonth, COUNT(UserId) UserCount
FROM UniqueUsers
GROUP BY PurchaseMonth order by PurchaseMonth;
Finally, combine these into one query:
SELECT p.PurchaseMonth MonthNumber, p.TotalUsers As UsersMadePurchases,
e.ExpireMonth, e.TotalUsers As UsersExpiredSubs
FROM #PurchaseDates p
LEFT JOIN #ExpiryDates e
ON e.ExpireMonth = p.PurchaseMonth
Result will be:
MonthNumber UsersMadePurchases ExpireMonth UsersExpiredSubs
4 1 NULL NULL
5 2 5 2
6 2 6 2
7 1 7 1
8 1 8 1
Let me know if this solves your problem.

Grouping duplicate rows and calculating effective and end dates

As per the attached sample, I have data of repeated rows with different date values. I would like to combine the duplicate records to reduce the number of rows and at the same time would like to calculate the end date of record.
“CountryCode” column should be used to combine the records and value changes in “CountryRiskLevel” or “RegionRiskLevel” columns should be used to define the start and end date ranges.
Database - SQL Server.
Try this query. I used slightly different sample data, but the query will work for you as well:
;with SampleData as(
select 1 CountryCode,
1 RegionCode,
5 CountryRiskLevel,
5 RegionRiskLevel,
CONVERT(date, '2018-01-01') EffectiveDate
union all
select 1,1,5,5,CONVERT(date, '2018-01-02')
union all
select 1,1,5,5,CONVERT(date, '2018-01-03')
union all
select 1,1,5,5,CONVERT(date, '2018-01-04')
union all
select 1,1,2,2,CONVERT(date, '2018-01-05')
union all
select 1,1,5,5,CONVERT(date, '2018-01-06')
union all
select 1,1,5,5,CONVERT(date, '2018-01-07')
union all
select 1,1,5,3,CONVERT(date, '2018-01-08')
union all
select 1,1,5,5,CONVERT(date, '2018-01-09')
union all
select 1,1,5,5,CONVERT(date, '2018-01-10')
union all
select 1,1,5,5,CONVERT(date, '2018-01-11')
)
select CountryCode,
RegionCode,
CountryRiskLevel,
RegionRiskLevel,
MIN(effectiveDate) EffectiveStartDate,
case when MAX(effectiveDate) = MIN(effectiveDate) then MAX(dt) else MAX(effectiveDate) end EffectiveEndDate
from (
select *,
ROW_NUMBER() over (partition by CountryCode, RegionCode, CountryRiskLevel, RegionRiskLevel order by EffectiveDate) rn1,
ROW_NUMBER() over (order by EffectiveDate) rn2,
case when COUNT(*) over (partition by countrycode, RegionCode, CountryRiskLevel, RegionRiskLevel) = 1
then LEAD(effectivedate) over (order by effectivedate) end dt
from SampleData
) a group by CountryCode, RegionCode, CountryRiskLevel, RegionRiskLevel, rn2 - rn1
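The grouping key rn2 - rn1 is the core of this query: within a run of consecutive rows that share the same values, the overall row number and the per-value row number increase together, so their difference stays constant. Here is a simplified sketch with Python's sqlite3 module, using a single hypothetical `level` column and a table name of my choosing. It takes the plain MIN and MAX date of each run and leaves out the answer's special handling of the end date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE risk_history (day TEXT, level INTEGER)")
conn.executemany(
    "INSERT INTO risk_history VALUES (?, ?)",
    [
        ("2018-01-01", 5),
        ("2018-01-02", 5),
        ("2018-01-03", 5),
        ("2018-01-04", 5),
        ("2018-01-05", 2),
        ("2018-01-06", 5),
        ("2018-01-07", 5),
    ],
)

# grp is constant within each unbroken run of the same level.
runs = conn.execute(
    """
    SELECT level, MIN(day) AS start_day, MAX(day) AS end_day
    FROM (
        SELECT day, level,
               ROW_NUMBER() OVER (ORDER BY day)
               - ROW_NUMBER() OVER (PARTITION BY level ORDER BY day) AS grp
        FROM risk_history
    )
    GROUP BY level, grp
    ORDER BY start_day
    """
).fetchall()

for row in runs:
    print(row)
```

Level 5 appears twice in the result because the row with level 2 on 2018-01-05 splits it into two separate runs.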