Selecting only last part of the data - sql

I have the following data.
I want the outcome to be like this:
For Albania, I want to select the minimum and maximum values of date for the last value of City_Code (here, 20008). The minimum date for 20008 is 18.01.2013 and the maximum is 20.01.2013. For Croatia, the last value of City_Code is zero, so nothing should be selected (if the last value of City_Code is zero, skip that country entirely). For Slovenia, the last value of City_Code is 70005, so we select the minimum and maximum of the corresponding dates (here both are 22.01.2013). What should the query look like? I don't have any idea. Thanks in advance.

Try this...
;
WITH cte_countries ( Country, City_code, CurrentDate, LatestRank )
AS ( SELECT Country ,
City_code ,
CurrentDate ,
RANK() OVER ( PARTITION BY country ORDER BY CurrentDate DESC ) AS LatestRank
FROM #countries
WHERE City_code != 0
)
SELECT *
FROM cte_countries
WHERE LatestRank = 1

SELECT Country,
max(City_code),
min(DATE),
max(Date)
FROM T as T1
WHERE City_code = (SELECT TOP 1 City_Code
FROM T WHERE T.Country=T1.Country
ORDER BY Date DESC)
GROUP BY Country
HAVING max(City_Code)<>'0'

Try this:
With Cities
AS (
select Country, City_Code, Min([Date]) Date1, Max([Date]) Date2,
ROW_NUMBER() OVER(PARTITION BY Country ORDER BY Country, City_Code DESC) Seq
from MyCountryCityTable t
group by t.Country, t.City_Code
)
Select
Country,
NULLIF(City_Code,0) City_Code,
CASE WHEN City_Code = 0 THEN NULL ELSE Date1 END Date1,
CASE WHEN City_Code = 0 THEN NULL ELSE Date2 END Date2
From Cities Where Seq = 1
Order by Country
EDIT:
Version without the common table expression (WITH):
Select
Country,
NULLIF(City_Code,0) City_Code,
CASE WHEN City_Code = 0 THEN NULL ELSE Date1 END Date1,
CASE WHEN City_Code = 0 THEN NULL ELSE Date2 END Date2
From (select Country, City_Code, Min([Date]) Date1, Max([Date]) Date2,
ROW_NUMBER() OVER(PARTITION BY Country ORDER BY Country, City_Code DESC) Seq
from MyCountryCityTable t
group by t.Country, t.City_Code) Cities
Where Seq = 1
Order by Country
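For anyone who wants to try the idea end to end, here is a minimal sketch using Python's built-in sqlite3 module. The table name t, the column name dt and the sample rows are assumptions, made up to be consistent with the description in the question (the original data is not shown). The query follows the correlated-subquery approach above, with SQLite's LIMIT 1 standing in for TOP 1 and the zero check moved into the WHERE clause.

```python
import sqlite3

# Hypothetical rows that match the question's description; dates are ISO strings
rows = [
    ("Albania", 20005, "2013-01-15"),
    ("Albania", 20008, "2013-01-18"),
    ("Albania", 20008, "2013-01-19"),
    ("Albania", 20008, "2013-01-20"),
    ("Croatia", 50001, "2013-01-16"),
    ("Croatia", 0, "2013-01-21"),
    ("Slovenia", 70001, "2013-01-17"),
    ("Slovenia", 70005, "2013-01-22"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (country TEXT, city_code INTEGER, dt TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

query = """
SELECT country, city_code, MIN(dt) AS first_date, MAX(dt) AS last_date
FROM t AS t1
WHERE city_code = (SELECT t2.city_code          -- latest city code of this country
                   FROM t AS t2
                   WHERE t2.country = t1.country
                   ORDER BY t2.dt DESC
                   LIMIT 1)
  AND city_code <> 0                            -- skip countries whose last code is zero
GROUP BY country, city_code
ORDER BY country
"""
last_city_ranges = conn.execute(query).fetchall()
for row in last_city_ranges:
    print(row)
```

With these made-up rows, Croatia drops out because its latest City_Code is 0, while Albania and Slovenia each return one row with the date range of their latest code.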

Related

Top 25 Pageviews in each Country SQL

I am looking for the top 25 blog pages by pageviews in each country.
Please help me out with this. Thanks in advance.
with Result as ( select
sum(Pageviews) Total_Page,page_path,date
,case
when "PROFILE" = 44399579 then 'India'
when "PROFILE" = 36472271 then 'China'
when "PROFILE" = 41751607 then 'Russia'
else null
end COUNTRY,
Dense_rank() over(PARTITION BY Country order by sum(Pageviews) desc) as Test
From "GOOGLE_ANALYTICS_PHASE1"."PAGES"
where PAGE_PATH like '%blog%' //and PAGE_PATH = '/blog?category_id=8&page=3'
group by page_path,country,date)
select top 100 Total_Page,
page_path,country,test,date
from result
where test <= 25 and Date between '2022-05-01' and '2022-05-31'
In Snowflake SQL:
If you want the top 25 pages per country, ranked only by the views inside the defined date period, then
using this fake data:
with PAGES(pageviews, profile, page_path, date) as (
select * from values
(100, 44399579, 'blog1', '2022-05-31'::date),
(1000, 44399579, 'blog1', '2022-05-30'::date),
(200, 44399579, 'blog2', '2022-05-31'::date),
(2000, 44399579, 'blog2', '2022-04-01'::date)
)
and with 25 changed to 1 to show it working:
SELECT
b.total_page
,b.page_path
,b.date
,b.country
FROM (
SELECT a.*
,SUM(total_page) over(partition by country, page_path) as tt
FROM (
SELECT
SUM(pageviews) AS total_page
,page_path
,date
,CASE profile
WHEN 44399579 THEN 'United States'
WHEN 36472271 THEN 'New Zealand'
WHEN 41751607 THEN 'Australia'
ELSE null
END AS country
FROM pages //"FIVETRAN_DATABASE_COMVITA"."GOOGLE_ANALYTICS_PHASE1"."PAGES"
WHERE page_path LIKE '%blog%'
AND Date BETWEEN '2022-05-01' AND '2022-05-31'
//and PAGE_PATH = '/blog?category_id=8&page=3'
GROUP BY 2,3,4
) as A
) as B
QUALIFY DENSE_RANK() OVER (PARTITION BY country ORDER BY tt desc) <= 1
gives:
TOTAL_PAGE  PAGE_PATH  DATE        COUNTRY
100         blog1      2022-05-31  United States
1,000       blog1      2022-05-30  United States
Whereas if you want the all-time top pages, but showing only their rows from the current period:
SELECT
b.total_page
,b.page_path
,b.date
,b.country
FROM (
SELECT a.*
,SUM(total_page) over(partition by country, page_path) as tt
FROM (
SELECT
SUM(pageviews) AS total_page
,page_path
,date
,CASE profile
WHEN 44399579 THEN 'United States'
WHEN 36472271 THEN 'New Zealand'
WHEN 41751607 THEN 'Australia'
ELSE null
END AS country
FROM pages //"FIVETRAN_DATABASE_COMVITA"."GOOGLE_ANALYTICS_PHASE1"."PAGES"
WHERE page_path LIKE '%blog%'
//and PAGE_PATH = '/blog?category_id=8&page=3'
GROUP BY 2,3,4
) as A
) as B
WHERE Date BETWEEN '2022-05-01' AND '2022-05-31'
QUALIFY DENSE_RANK() OVER (PARTITION BY country ORDER BY tt desc) <= 1
now returns:
TOTAL_PAGE  PAGE_PATH  DATE        COUNTRY
200         blog2      2022-05-31  United States
This is because blog2 holds the all-time record, but the row with 200 views is its only one inside the window of interest.
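The difference between the two variants comes down to whether the date filter runs before or after the per-page total is computed. Here is a rough sketch of that using Python's built-in sqlite3 module and the same fake data. SQLite has no QUALIFY, so the rank is filtered in an outer query instead. The GROUP BY step is left out because every fake row is already unique per page and date, and partitioning by profile instead of the derived country name is a simplification. The TEMPLATE and MAY names are only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages (pageviews INTEGER, profile INTEGER, page_path TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?, ?)",
    [
        (100, 44399579, "blog1", "2022-05-31"),
        (1000, 44399579, "blog1", "2022-05-30"),
        (200, 44399579, "blog2", "2022-05-31"),
        (2000, 44399579, "blog2", "2022-04-01"),
    ],
)

# The date filter is slotted in either before the window total (inner_filter)
# or after it (outer_filter); the outer query plays the role of QUALIFY.
TEMPLATE = """
SELECT pageviews, page_path, date
FROM (
    SELECT b.*, DENSE_RANK() OVER (PARTITION BY profile ORDER BY tt DESC) AS rnk
    FROM (
        SELECT p.*, SUM(pageviews) OVER (PARTITION BY profile, page_path) AS tt
        FROM pages AS p
        WHERE page_path LIKE '%blog%' {inner_filter}
    ) AS b
    WHERE 1 = 1 {outer_filter}
)
WHERE rnk <= 1
ORDER BY date DESC, page_path
"""
MAY = "AND date BETWEEN '2022-05-01' AND '2022-05-31'"

# Rank by views inside the period only
filter_then_rank = conn.execute(
    TEMPLATE.format(inner_filter=MAY, outer_filter="")
).fetchall()

# Rank by all-time views, then show only rows inside the period
rank_then_filter = conn.execute(
    TEMPLATE.format(inner_filter="", outer_filter=MAY)
).fetchall()

print(filter_then_rank)
print(rank_then_filter)
```

The first result holds the two blog1 rows and the second holds the single blog2 row, mirroring the two outputs shown above.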

Get last date of modification in database by value

How can I get the date of the last change in value in this table?
id  date        value
1   01.01.2021  0.0
1   02.01.2021  10.0
1   03.01.2021  15.0
1   04.01.2021  25.0
1   05.01.2021  25.0
1   06.01.2021  25.0
Of course I could use a WHERE clause and it would work, but I have a lot of rows, and for some of them I don't know the exact day when the change happened.
The result should be:
id  date        value
1   04.01.2021  25.0
Try this one:
with mytable as (
select 1 as id, date '2021-01-01' as date, 0 as value union all
select 1, date '2021-01-02', 10 union all
select 1, date '2021-01-03', 15 union all
select 1, date '2021-01-04', 25 union all
select 1, date '2021-01-05', 25 union all
select 1, date '2021-01-06', 25
)
select id, array_agg(struct(date, value) order by last_change_date desc limit 1)[offset(0)].*
from (
select *, if(value != lag(value) over (partition by id order by date), date, null) as last_change_date
from mytable
)
group by id
In this scenario I would use two fields in the table, created_at and updated_at, both of type timestamp. You can then simply fetch your records ordered by the updated_at field.
See what this gives you:
SELECT MAX(date) OVER (PARTITION BY(value)) AS lastChange
FROM Table
WHERE id = 1
The following query and reproducible example on db-fiddle works. I've also included some additional test records.
CREATE TABLE my_data (
`id` INTEGER,
`date` date,
`value` INTEGER
);
INSERT INTO my_data
(`id`, `date`, `value`)
VALUES
('1', '01.01.2021', '0.0'),
('1', '02.01.2021', '10.0'),
('1', '03.01.2021', '15.0'),
('1', '04.01.2021', '25.0'),
('1', '05.01.2021', '25.0'),
('1', '06.01.2021', '25.0'),
('2', '05.01.2021', '25.0'),
('2', '06.01.2021', '23.0'),
('3', '03.01.2021', '15.0'),
('3', '04.01.2021', '25.0'),
('3', '05.01.2021', '17.0'),
('3', '06.01.2021', '17.0');
Query #1
SELECT
id,
date,
value
FROM (
SELECT
*,
row_number() over (partition by id order by date desc) as id_rank
FROM (
SELECT
id,
m1.date,
m1.value,
rank() over (partition by id,m1.value order by date asc) as id_value_rank,
CASE
WHEN (m1.date = (max(m1.date) over (partition by id,m1.value ))) THEN 1
ELSE 0
END AS is_max_date_for_group,
CASE
WHEN (m1.date = (max(m1.date) over (partition by id ))) THEN 1
ELSE 0
END AS is_max_date_for_id
from
my_data m1
) m2
WHERE (m2.is_max_date_for_group = m2.is_max_date_for_id and is_max_date_for_group <> 0 and id_value_rank=1) or (id_value_rank=1 and is_max_date_for_id=0)
) t
where t.id_rank=1
order by id, date, value;
id  date        value
1   04.01.2021  25
2   06.01.2021  23
3   05.01.2021  17
I actually find that the simplest method is to enumerate the rows by id/date and by id/date/value in descending order. These are the same for the last group . . . and the rest is aggregation:
select id, min(date), value
from (select t.*,
row_number() over (partition by id order by date desc) as seqnum,
row_number() over (partition by id, value order by date desc) as seqnum_2
from t
) t
where seqnum = seqnum_2
group by id, value;
If you use lag(), I would recommend using qualify for performance:
select t.*
from (select t.*
from t
qualify lag(value) over (partition by id order by date) <> value or
lag(value) over (partition by id order by date) is null
) t
qualify row_number() over (partition by id order by date desc) = 1;
Note: Both of these work if the value is the same for all rows. Other methods may not work in that situation.
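To see the two-row-number idea in action, here is a small sketch using Python's built-in sqlite3 module. The rows for id 1 come from the question, written as ISO date strings. The rows for id 4 are a made-up extra case where the value never changes, added to check the note above. SQLite has no QUALIFY, so only the row_number version is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, date TEXT, value REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "2021-01-01", 0.0),
        (1, "2021-01-02", 10.0),
        (1, "2021-01-03", 15.0),
        (1, "2021-01-04", 25.0),
        (1, "2021-01-05", 25.0),
        (1, "2021-01-06", 25.0),
        # Hypothetical id whose value is the same on every row
        (4, "2021-01-01", 7.0),
        (4, "2021-01-02", 7.0),
    ],
)

query = """
SELECT id, MIN(date) AS last_change, value
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS seqnum,
           ROW_NUMBER() OVER (PARTITION BY id, value ORDER BY date DESC) AS seqnum_2
    FROM t
) AS s
WHERE seqnum = seqnum_2      -- true only for the trailing run of equal values
GROUP BY id, value
ORDER BY id
"""
last_changes = conn.execute(query).fetchall()
for row in last_changes:
    print(row)
```

Counting backwards from the latest date, the two row numbers stay equal only while the value has not changed, so the earliest date among the matching rows is the date of the last change.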

Identify date range and merge into max and min dates

I have data ( int, date , date types)
SELECT * FROM
(
VALUES
(1700171048,'2020-12-21','2021-01-03'),
(1700171048,'2021-01-05','2021-01-12'),
(1700171048,'2021-01-13','2021-01-17'),
(1700171048,'2021-01-18','2021-01-19'),
(1700171048,'2021-01-22','2021-01-27'),
(1700171048,'2021-01-28','2021-02-17'),
(1700171049,'2020-12-21','2021-01-03'),
(1700171049,'2021-01-04','2021-01-05'),
(1700171049,'2021-01-06','2021-01-17'),
(1700171049,'2021-01-18','2021-01-19'),
(1700171049,'2021-01-20','2021-01-27'),
(1700171049,'2021-01-28','2021-02-17')
) AS c (id1, st, endt )
I need this output (i.e. if the start and end dates are contiguous, merge the rows into one group):
id1         st            endt
1700171048  '2020-12-21'  '2021-01-03'
1700171048  '2021-01-05'  '2021-01-19'
1700171048  '2021-01-22'  '2021-02-17'
1700171049  '2020-12-21'  '2021-02-17'
I tried this, but it doesn't work:
select id, case when min(b.st) = max(b.endt) + 1 then min(b.st) end,
case when min(b.endt) = min(b.st) + 1 then max(b.st) end
from c a join c b
group by id
This is a type of gaps-and-islands problem. Use lag() to check whether each row overlaps or directly follows the previous one. Then take a cumulative sum that counts the rows starting a new group, and aggregate:
select id1, min(st), max(endt)
from (select t.*,
sum(case when prev_endt >= st + interval '-1 day' then 0 else 1 end) over (partition by id1 order by st) as grp
from (select t.*,
lag(endt) over (partition by id1 order by st) as prev_endt
from t
) t
) t
group by id1, grp;
Here is a db<>fiddle.
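As a way to check the logic without a database server, the same query can be run against the question's rows with Python's built-in sqlite3 module. This is only a sketch: the dates are stored as ISO text, and SQLite's date(st, '-1 day') is used in place of the interval arithmetic, which is a dialect substitution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE c (id1 INTEGER, st TEXT, endt TEXT)")
conn.executemany(
    "INSERT INTO c VALUES (?, ?, ?)",
    [
        (1700171048, "2020-12-21", "2021-01-03"),
        (1700171048, "2021-01-05", "2021-01-12"),
        (1700171048, "2021-01-13", "2021-01-17"),
        (1700171048, "2021-01-18", "2021-01-19"),
        (1700171048, "2021-01-22", "2021-01-27"),
        (1700171048, "2021-01-28", "2021-02-17"),
        (1700171049, "2020-12-21", "2021-01-03"),
        (1700171049, "2021-01-04", "2021-01-05"),
        (1700171049, "2021-01-06", "2021-01-17"),
        (1700171049, "2021-01-18", "2021-01-19"),
        (1700171049, "2021-01-20", "2021-01-27"),
        (1700171049, "2021-01-28", "2021-02-17"),
    ],
)

query = """
SELECT id1, MIN(st) AS first_st, MAX(endt) AS last_endt
FROM (
    SELECT x.*,
           -- running count of rows that do NOT continue the previous range
           SUM(CASE WHEN prev_endt >= date(st, '-1 day') THEN 0 ELSE 1 END)
               OVER (PARTITION BY id1 ORDER BY st) AS grp
    FROM (
        SELECT c.*, LAG(endt) OVER (PARTITION BY id1 ORDER BY st) AS prev_endt
        FROM c
    ) AS x
) AS y
GROUP BY id1, grp
ORDER BY id1, first_st
"""
merged = conn.execute(query).fetchall()
for row in merged:
    print(row)
```

The first row of each id has no previous end date, so the comparison is NULL and the CASE falls through to 1, which opens the first group.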

How to get the validity date range of a price from individual daily prices in SQL

I have some prices for the month of January.
Date,Price
1,100
2,100
3,115
4,120
5,120
6,100
7,100
8,120
9,120
10,120
Now, the output I need is a non-overlapping date range for each price.
price,from,To
100,1,2
115,3,3
120,4,5
100,6,7
120,8,10
I need to do this using SQL only.
For now, if I simply group by and take min and max dates, I get the below, which is an overlapping range:
price,from,to
100,1,7
115,3,3
120,4,10
This is a gaps-and-islands problem. The simplest solution is the difference of row numbers:
select price, min(date), max(date)
from (select t.*,
row_number() over (order by date) as seqnum,
row_number() over (partition by price order by date) as seqnum2
from t
) t
group by price, (seqnum - seqnum2)
order by min(date);
Why this works is a little hard to explain. But if you look at the results of the subquery, you will see how the adjacent rows are identified by the difference in the two values.
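To make the subquery's output easier to inspect, here is a small sketch that runs the same idea on the question's ten prices with Python's built-in sqlite3 module. The names numbered, diffs and ranges are only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date INTEGER, price INTEGER)")
prices = [100, 100, 115, 120, 120, 100, 100, 120, 120, 120]
# Days are numbered 1..10, as in the question
conn.executemany("INSERT INTO t VALUES (?, ?)", list(enumerate(prices, start=1)))

numbered = """
SELECT date, price,
       ROW_NUMBER() OVER (ORDER BY date) AS seqnum,
       ROW_NUMBER() OVER (PARTITION BY price ORDER BY date) AS seqnum2
FROM t
"""

# Intermediate view: the difference is constant within each run of equal prices
diffs = conn.execute(
    f"SELECT date, price, seqnum - seqnum2 FROM ({numbered}) ORDER BY date"
).fetchall()

# Final result: one row per (price, difference) pair
ranges = conn.execute(f"""
    SELECT price, MIN(date) AS from_date, MAX(date) AS to_date
    FROM ({numbered})
    GROUP BY price, seqnum - seqnum2
    ORDER BY from_date
""").fetchall()

for row in diffs:
    print(row)
print(ranges)
```

In the intermediate rows, days 4 to 7 all have a difference of 3 even though the price changes from 120 to 100 in between, which is why price has to be part of the GROUP BY alongside the difference.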
SELECT Lag.price,Lag.[date] AS [From], MIN(Lead.[date]-Lag.[date])+Lag.[date] AS [to]
FROM
(
SELECT [date],[Price]
FROM
(
SELECT [date],[Price],LAG(Price) OVER (ORDER BY DATE,Price) AS LagID FROM #table1 A
)B
WHERE CASE WHEN Price <> ISNULL(LagID,1) THEN 1 ELSE 0 END = 1
)Lag
JOIN
(
SELECT [date],[Price]
FROM
(
SELECT [date],Price,LEAD(Price) OVER (ORDER BY DATE,Price) AS LeadID FROM [#table1] A
)B
WHERE CASE WHEN Price <> ISNULL(LeadID,1) THEN 1 ELSE 0 END = 1
)Lead
ON Lag.[Price] = Lead.[Price]
WHERE Lead.[date]-Lag.[date] >= 0
GROUP BY Lag.[date],Lag.[price]
ORDER BY Lag.[date]
Another method using ROWS UNBOUNDED PRECEDING
SELECT price, MIN([date]) AS [from], [end_date] AS [To]
FROM
(
SELECT *, MIN([abc]) OVER (ORDER BY DATE DESC ROWS UNBOUNDED PRECEDING ) end_date
FROM
(
SELECT *, CASE WHEN price = next_price THEN NULL ELSE DATE END AS abc
FROM
(
SELECT a.* , b.[date] AS next_date, b.price AS next_price
FROM #table1 a
LEFT JOIN #table1 b
ON a.[date] = b.[date]-1
)AA
)BB
)CC
GROUP BY price, end_date

SQL - Row Sequencing

I have a table playday with the following columns: date_played, winner, loser
and the following values:
(Jun-03-14, USA, China)
(Jun-05-14, USA, Russia)
(Jun-06-14, France, Germany)
.
.
.
.
(Jun-09-14, USA, Russia)
I need to obtain all instances where USA has won exactly 3 games in a row.
I tried the following query:
Select
date, winner, loser,
RANK() OVER (PARTITION BY winner ORDER BY date rows 2 preceding) as rank
from playday;
You can use the following query.
select winner, loser, date, cnt
from (select winner, loser, date,
date - lag(date, 3) over (order by date) as cnt
from playday)
where cnt >= 3
First, find the last time they lost.
Second, count the number of wins after the date of that last loss.
Third, return all rows after the last loss if that count is exactly 3.
Sorry, I don't have an SQL parser in front of me to write this out properly.
-- Sketch only: variable syntax differs by dialect (T-SQL style assumed here)
DECLARE @team_name varchar(50) = 'USA';
select date, winner, loser
from playday
where (select count(*) as wins_since_loss from playday
where playday.winner = @team_name
and playday.date >
(select max(date) as losing_date from playday where playday.loser = @team_name)) = 3
This query pulls the sequences of rows where USA won exactly 3 times in a row, no fewer and no more (I used date1 as the name of the date column):
select date1, winner, loser from
(
select count (*) over (partition by change) as id, date1,winner,loser from
(
select date1,winner,loser,lag_loser, sum(case when loser <> lag_loser and (loser='USA' or lag_loser='USA') then 1 else 0 end) over (order by date1 rows unbounded preceding) as change from
(
select date1, winner,loser, lag(loser) over (order by date1) as lag_loser from
(
select date1, winner, loser from playday
where winner ='USA' or loser = 'USA'
ORDER BY date1 ASC
)
)
)
)
where winner ='USA' and id =3
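For comparison, here is a shorter sketch of the same streak idea, using the difference of two row numbers and run with Python's built-in sqlite3 module. It assumes, like the query above, that "in a row" means consecutive among the games USA took part in. The rows against Japan, Brazil and Italy are hypothetical fillers, since the question elides most of its data, and the dates are written as ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE playday (date1 TEXT, winner TEXT, loser TEXT)")
conn.executemany(
    "INSERT INTO playday VALUES (?, ?, ?)",
    [
        ("2014-06-03", "USA", "China"),
        ("2014-06-05", "USA", "Russia"),
        ("2014-06-06", "France", "Germany"),
        ("2014-06-07", "USA", "Japan"),     # hypothetical row
        ("2014-06-08", "Brazil", "USA"),    # hypothetical row
        ("2014-06-09", "USA", "Russia"),
        ("2014-06-10", "USA", "Italy"),     # hypothetical row
    ],
)

query = """
WITH usa AS (
    -- Only games USA played; number them overall and within win / loss
    SELECT date1, winner, loser,
           ROW_NUMBER() OVER (ORDER BY date1) AS rn_all,
           ROW_NUMBER() OVER (PARTITION BY winner = 'USA' ORDER BY date1) AS rn_same
    FROM playday
    WHERE winner = 'USA' OR loser = 'USA'
),
streaks AS (
    -- rn_all - rn_same is constant inside each unbroken run of wins (or losses)
    SELECT usa.*,
           COUNT(*) OVER (PARTITION BY winner = 'USA', rn_all - rn_same) AS streak_len
    FROM usa
)
SELECT date1, winner, loser
FROM streaks
WHERE winner = 'USA' AND streak_len = 3
ORDER BY date1
"""
three_in_a_row = conn.execute(query).fetchall()
for row in three_in_a_row:
    print(row)
```

With these rows USA has one streak of three wins followed by a loss and then a streak of two, so only the first three wins come back.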