I am trying to solve one query which I have already solved in SQL Server.
Write a SQL query to find continuous dates appear at least three times.
SQLfiddle
Table: orders
*------------*
| mdate |
*------------*
|'2012-05-01'|
|'2012-05-02'|
|'2012-05-03'|
|'2012-05-06'|
|'2012-05-07'|
|'2012-05-10'|
|'2012-05-11'|
*------------*
SQL Server:
select
mdate
from
(
select
mdate,
count(gap) over (partition by gap) as total
from
(
select
mdate,
dateadd(day, - row_number() over (order by mdate), mdate) as gap
from orders
) t
) tt
where total >= 3
Result:
*------------*
| mdate |
*------------*
|'2012-05-01'|
|'2012-05-02'|
|'2012-05-03'|
*------------*
I cannot use dateadd() function in PostgreSQL so how can I achieve same result in it?
In SQL Server, this would look like:
select mdate
from (select o.*,
count(*) over (partition by dateadd(day, - seqnum, mdate)) as cnt
from (select o.*,
row_number() over (order by mdate) as seqnum
from orders o
) o
) o
where cnt >= 3;
In Postgres:
select mdate
from (select o.*,
count(*) over (partition by mdate - seqnum * interval '1 day') as cnt
from (select o.*,
row_number() over (order by mdate) as seqnum
from orders o
) o
) o
where cnt >= 3;
The only difference is the date arithmetic.
You can make an interval out of your ROW_NUMBER computation by CONCAT with ' day' and typecasting; then you can subtract that from mdate. Your query remains the same other than changing
dateadd(day, - row_number() over (order by mdate), mdate) as gap
to
mdate - concat(row_number() over (order by mdate), ' day')::interval as gap
Demo on SQLFiddle
Related
I have a big query program below;
WITH cte AS(
SELECT *
FROM (
SELECT project_name,
SUM(reward_value) AS total_reward_value,
DATE_TRUNC(date_signing, MONTH) as month,
date_signing,
Row_number() over (partition by DATE_TRUNC(date_signing, MONTH)
order by SUM(reward_value) desc) AS rank
FROM `deals`
WHERE CAST(date_signing as DATE) > '2019-12-31'
AND CAST(date_signing as DATE) < '2020-02-01'
AND target_category = 'achieved'
AND project_name IS NOT NULL
GROUP BY project_name, month, date_signing
)
)
SELECT * FROM cte WHERE rank <= 5
that returns the following result:
While I expect to have each unique project to be SUM within each month and then I filter only the top 5.
Something like this:
I got the following error if the date_signing grouping is removed
PARTITION BY expression references column date_signing which is neither grouped nor aggregated at [16:48]
Any hints what should be corrected will be appreciated!
One more subquery maybe then?
WITH cte AS(
SELECT project_name,
SUM(reward_value) as reward_sum,
DATE_TRUNC(date_signing, MONTH) as month
FROM `deals`
WHERE CAST(date_signing as DATE) > '2019-12-31'
AND CAST(date_signing as DATE) < '2020-02-01'
AND target_category = 'achieved'
AND project_name IS NOT NULL
GROUP BY project_name, month
),
ranks AS (
SELECT
project_name,
reward_sum,
month,
ROW_NUMBER() over (PARTITION BY month ORDER BY reward_sum DESC) AS rank
)
SELECT *
FROM ranks
WHERE rank <= 5
yeah you can't do that , yo can show the last signing date instead:
WITH cte AS(
SELECT project_name,
SUM(reward_value),
DATE_TRUNC(date_signing, MONTH) as month,
MAX(date_signing) as last_signing_date,
Row_number() over (partition by DATE_TRUNC(date_signing, MONTH)
order by SUM(reward_value) desc) AS rank
FROM `deals`
WHERE CAST(date_signing as DATE) > '2019-12-31'
AND CAST(date_signing as DATE) < '2020-02-01'
AND target_category = 'achieved'
AND project_name IS NOT NULL
GROUP BY project_name, month
)
SELECT * FROM cte WHERE rank <= 5
I'm working with a table that has an id and date column. For each id, there's a 90-day window where multiple transactions can be made. The 90-day window starts when the first transaction is made and the clock resets once the 90 days are over. When the new 90-day window begins triggered by a new transaction I want to start the count from the beginning at one. I would like to generate something like this with the two additional columns (window and count) in SQL:
id date window count
name1 7/7/2019 first 1
name1 12/31/2019 second 1
name1 1/23/2020 second 2
name1 1/23/2020 second 3
name1 2/12/2020 second 4
name1 4/1/2020 third 1
name2 6/30/2019 first 1
name2 8/14/2019 first 2
I think getting the rank of the window can be done with a CASE statement and MIN(date) OVER (PARTITION BY id). This is what I have in mind for that:
CASE WHEN MIN(date) OVER (PARTITION BY id) THEN 'first'
WHEN DATEDIFF(day, date, MIN(date) OVER (PARTITION BY id)) <= 90 THEN 'first'
WHEN DATEDIFF(day, date, MIN(date) OVER (PARTITION BY id)) > 90 AND DATEDIFF(day, date, MIN(date) OVER (PARTITION BY id)) <= 180 THEN 'third'
WHEN DATEDIFF(day, date, MIN(date) OVER (PARTITION BY id)) > 180 AND DATEDIFF(day, date, MIN(date) OVER (PARTITION BY id)) <= 270 THEN 'fourth'
ELSE NULL END
And incrementing the counts within the windows would be ROW_NUMBER() OVER (PARTITION BY id, window)?
You cannot solve this problem with window functions only. You need to iterate through the dataset, which can be done with a recursive query:
with
tab as (
select t.*, row_number() over(partition by id order by date) rn
from mytable t
)
cte as (
select id, date, rn, date date0 from tab where rn = 1
union all
select t.id, t.date, t.rn, greatest(t.date, c.date + interval '90' day)
from cte c
inner join tab t on t.id = c.id and t.rn = c.rn + 1
)
select
id,
date,
dense_rank() over(partition by id order by date0) grp,
count(*) over(partition by id order by date0, date) cnt
from cte
The first query in the with clause ranks records having the same id by increasing date; then, the recursive query traverses the data set and computes the starting date of each group. The last step is numbering the groups and computing the window count.
GMB is totally correct that a recursive CTE is needed. I offer this as an alternative form for two reasons. First, because it uses SQL Server syntax, which appears to be the database being used in the question. Second, because it directly calculates window and count without window functions:
with t as (
select t.*, row_number() over (partition by id order by date) as seqnum
from tbl t
),
cte as (
select t.id, t.date, dateadd(day, 90, t.date) as window_end, 1 as window, 1 as count, seqnum
from t
where seqnum = 1
union all
select t.id, t.date,
(case when t.date > cte.window_end then dateadd(day, 90, t.date)
else cte.window_end
end) as window_end,
(case when t.date > cte.window_end then window + 1 else window end) as window,
(case when t.date > cte.window_end then 1 else cte.count + 1 end) as count,
t.seqnum
from cte join
t
on t.id = cte.id and
t.seqnum = cte.seqnum + 1
)
select id, date, window, count
from cte
order by 1, 2;
Here is a db<>fiddle.
ref to this post: link, I used the answer provided by #Gordon Linoff:
select taxi, count(*)
from (select t.taxi, t.client, count(*) as num_times
from (select t.*,
row_number() over (partition by taxi order by time) as seqnum,
row_number() over (partition by taxi, client order by time) as seqnum_c
from t
) t
group by t.taxi, t.client, (seqnum - seqnum_c)
having count(*) >= 2
)
group by taxi;
and got my answer perfectly like this:
Tom 3 (AA count as 1, AAA count as 1 and BB count as 1, so total of 3 count)
Bob 1
But now I would like to add one more condition which is the time between two consecutive clients for same taxi should not be longer than 2hrs.
I know that I should probably use row_number() again and calculate the time difference with datediff. But I have no idea where to add and how to do.
So any suggestion?
This requires a bit more logic. In this case, I would use lag() to calculate the groups:
select taxi, count(*)
from (select t.taxi, t.client, count(*) as num_times
from (select t.*,
sum(case when prev_client = client and
prev_time > time - interval '2 hour'
then 1
else 0
end) over (partition by client order by time) as grp
from (select t.*,
lag(client) over (partition by taxi order by time) as prev_client,
lag(time) over (partition by taxi order by time) as prev_time
from t
) t
) t
group by t.taxi, t.client, grp
having count(*) >= 2
)
group by taxi;
Note: You don't specify the database, so this uses ISO/ANSI standard syntax for date/time comparisons. You can adjust this for your actual database.
I have a table:
Value Date
100 01/01/2000
110 01/05/2002
100 01/10/2003
100 01/12/2004
I want to group the data in this way
Value StartDate EndDate
100 01/01/2000 30/04/2002
110 01/05/2002 30/09/2003
100 01/10/2003 NULL --> or value like '01/01/2099'
How can I accomplish this?
Can a CTE be useful and how?
For RDBMS supported window functions (example on MS SQL database):
with Test(value, dt) as(
select 100, cast('2000-01-01' as date) union all
select 110, cast('2002-05-01' as date) union all
select 100, cast('2003-10-01' as date) union all
select 100, cast('2004-12-01' as date)
)
select max(value) value, min(dt) startDate, max(end_dt) endDate
from (
select a.*, sum(brk) over(order by dt) grp
from (
select t.*,
case when value!=lag(value) over(order by dt) then 1 else 0 end brk,
DATEADD(DAY,-1,lead(dt,1,cast('2099-01-02' as date)) over(order by dt)) end_dt
from Test t
) a
) b
group by grp
order by startDate
I think the difference of row numbers is simpler in this case:
select value, min(date) as endDate,
dateadd(day, -1, lead(min(date)) over (order by min(date))) as endDate
from (select t.*,
row_number() over (order by date) as seqnum,
row_number() over (partition by value order by date) as seqnum_v
from t
) t
group by value, (seqnum - seqnum_v);
The difference of the row numbers defines the groups you want. This is a bit hard to see at first . . . if you stare at the results of the subquery, you'll see how it works.
I am running a query against MS SQL Server 2008 and am selecting an accountnumber and the max of the column mydate grouped by accountnumber:
select AccountNumber,
max(mydate),
from #SampleData
group by AccountNumber
I want to add a column to the result that contains the second highest mydate that is associated with the AccountNumber group. I know it would have to be something like:
select max(mydate)
from #SampleData
where mydate < (select max(mydate) from #SampleData)
But how do I get both the max and 2nd max in one select query?
You didn't specify your DBMS so this is ANSI SQL:
select accountnumber,
rn,
mydate
from (
select accountnumber,
mydate,
row_number() over (partition by accountnumber order by mydate desc) as rn
from #SampleData
) t
where rn <= 2;
Try this
Select AccountNumber,
MAX(Case when Rnum = 1 Then mydate END) mydate_1,
MAX(Case when Rnum = 2 Then mydate END) mydate_2
From
(
select
AccountNumber, mydate,
ROW_NUMBER() OVER (PARTITION By AccountNumber ORDER BY mydate DESC) as Rnum
from #SampleData
) V
Group By AccountNumber
Something like this should select the second highest:
select AccountNumber,
max(mydate),
(select max(SD2.mydate) from #SampleData SD2 where SD2.AccountNumber=#SampleData.AccountNumber AND SD2.mydate<max(#SampleData.mydate))
from #SampleData
group by AccountNumber
You could also use a TOP N clause combined with an order by:
select
TOP 2
accountnumber,
mydate,
row_number() over (partition by accountnumber order by mydate desc) as rn
from #SampleData
ORDER BY
row_number() over (partition by accountnumber order by mydate desc)