Find maximum number of days between consecutive events - sql

I am trying to find the maximum number of days between any consecutive events for each company. I have a table events with fields company,eventid,date.
|eventid|company |date|
|1 | Company1 |2020-10-15|
|2 | Company2 |2018-03-22|
|3 | Company2 |2019-12-02|
|4 | Company3 |2021-01-02|
|5 | Company3 |2019-06-20|
|6 | Company1 |2018-07-21|
|7 | Company2 |2016-10-18|
|8 | Company2 |2017-04-12|
|9 | Company1 |2020-05-07|
|10| Company3 |2021-11-03|
I have managed to get a column of amount of days between each consecutive event:
select e1.company, e1.date, (e1.date - min(e2.date)) as daysbetween
from events e1 join events e2 on (e1.company=e2.company and e2.date > e1.date)
group by e1.company,e1.date;
This returns 10 results, but I only need the maximum from the daysbetween column for each company which would give 3 results. However since I used min() to get the daysbetween column I cannot use max() again on that column to find the maximum for each company.
I have been stuck on this for a few days now and cannot work out how I can find the maximum number of days between consecutive events for each company.

You can use the LEAD function.
Here is a PostgreSQL example, since I noticed you tagged PostgreSQL.
The CTE is broken into steps to make it clear how it works:
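-- Table and column names follow the question (events, company, date).
with cte1 as (
    -- pair each event with the date of the next event of the same company
    select company, date,
           lead(date, 1) over (partition by company order by date) as next_date
    from events
),
cte2 as (
    -- subtracting two dates yields a number of days in PostgreSQL
    select company, next_date - date as days_between
    from cte1
),
cte3 as (
    -- max() ignores the NULL produced for each company's last event
    select company, max(days_between) as max_days_between
    from cte2
    group by company
)
select *
from cte3;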

You can find the number of days between events for each company using lag function through a subquery, and then find the maximum number of days for each company in the main query.
Select Company, Max(daysbetween)
From
(Select Company, date - Lag(date) Over (Partition by Company Order by date) As daysbetween
From events) As T
Group by Company
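
For anyone who wants to try the window-function approach without a PostgreSQL server, here is a minimal sketch that runs the LAG query from Python against an in-memory SQLite database. It is an illustration under assumptions, not part of either answer: SQLite 3.25 or newer is assumed (for window functions), dates are stored as ISO text, and julianday() stands in for PostgreSQL's date subtraction. The name max_gap is made up for this example.

```python
import sqlite3

# Sample rows copied from the question's events table.
rows = [
    (1, "Company1", "2020-10-15"),
    (2, "Company2", "2018-03-22"),
    (3, "Company2", "2019-12-02"),
    (4, "Company3", "2021-01-02"),
    (5, "Company3", "2019-06-20"),
    (6, "Company1", "2018-07-21"),
    (7, "Company2", "2016-10-18"),
    (8, "Company2", "2017-04-12"),
    (9, "Company1", "2020-05-07"),
    (10, "Company3", "2021-11-03"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (eventid INTEGER, company TEXT, date TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

# Inner query: days since the previous event of the same company
# (NULL for each company's first event). Outer query: largest gap per company.
# Assumption: julianday() replaces PostgreSQL's "date - date".
query = """
SELECT company, CAST(MAX(daysbetween) AS INTEGER) AS max_days
FROM (
    SELECT company,
           julianday(date)
             - julianday(LAG(date) OVER (PARTITION BY company ORDER BY date))
             AS daysbetween
    FROM events
) AS t
GROUP BY company
ORDER BY company
"""
max_gap = dict(conn.execute(query).fetchall())
print(max_gap)
```

The result has one entry per company, which is the three-row output the question asks for.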

SUM function and between two max dates

I am trying to find out the total amount between 2 dates with MAX function in dates grouping by ID.
column 365date is the difference of column sessiondate-364.
Currently with this query I am getting the total amount, but I want to find out the amount between these 2-column date (i.e. 365 days).
This is my query:
SELECT
DATEADD(day, -364, (MAX(sessiondate))) AS 365date,
MAX(sessiondate)) AS lastdate,
SUM(Amount) AS amount,
ID
FROM
tablename
WHERE
date BETWEEN 365date AND lastdate
GROUP BY
MemberID
date | LastDate| Amount| ID| output amount(only last 365 days)| Total amount(all year)
29/07/2020 |28/07/2021 |100 |1 |1500 |63000
29/08/2020 |28/07/2021 |500 |1
02/05/2020 |28/07/2021 |600 |1
15/01/2020 |28/07/2021 |300 |1
10/10/2000 |28/07/2021 |50000 |1
10/10/1989 |28/07/2021 |10000 |1
So I need to take max(lastdate) for this ID, which is 28/07/2021, subtract 365 days from it, then take all the rows whose dates lie within that window (29/07/2020, 29/08/2020, 02/05/2020, 15/01/2020), sum their amounts, and show the result in the last 365 days column.
For the Total amount (all year) column, all amounts need to be added regardless of the 365-day window.
Logic:
calculate date column
(MAX(date))-364
calculate lastdate column
Max(lastdate)
calculate last365 amount column
Sum (amount) Between (MAX(date))-364 and Max(lastdate)
calculate Total amount(all year)
sum(amount)
I need only 1 row which is highlighted. Not sure whats wrong with the query.
Can someone please help with this?
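You can use a CTE. I have converted your query so that the aggregates are computed in the CTE and the date filter is applied when joining back to the table, because a column alias cannot be referenced in the WHERE clause of the same SELECT.
-- Assumes the date column is sessiondate and the grouping key is ID.
;with cte AS (
    SELECT
        ID,
        DATEADD(day, -364, MAX(sessiondate)) AS [365date],
        MAX(sessiondate) AS lastdate,
        SUM(Amount) AS totalamount
    FROM tablename
    GROUP BY ID
)
SELECT c.ID, c.[365date], c.lastdate,
       SUM(t.Amount) AS last365amount,  -- only rows inside the window
       c.totalamount                    -- all rows for the ID
FROM cte c
JOIN tablename t
  ON t.ID = c.ID
 AND t.sessiondate BETWEEN c.[365date] AND c.lastdate
GROUP BY c.ID, c.[365date], c.lastdate, c.totalamount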
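
As a runnable illustration of the logic described in the question (window sum and overall sum in one row per ID), here is a minimal sketch using Python's sqlite3 module. Everything in it is an assumption made for the example: the table name sessions, the sample rows, ISO-formatted dates, and SQLite's date(x, '-364 days') in place of SQL Server's DATEADD. It uses conditional aggregation, which is one possible way to get both sums, not necessarily the approach intended above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id INTEGER, sessiondate TEXT, amount INTEGER)")
# Hypothetical sample data, chosen so that the window boundary is exercised.
conn.executemany("INSERT INTO sessions VALUES (?, ?, ?)", [
    (1, "2021-07-28", 100),    # latest session for id 1
    (1, "2021-01-15", 500),
    (1, "2020-07-29", 600),    # first day inside the 365-day window
    (1, "2020-07-28", 300),    # one day too old
    (1, "2000-10-10", 50000),
])

query = """
WITH bounds AS (
    -- per id: the latest date and the start of the 365-day window
    SELECT id,
           MAX(sessiondate) AS lastdate,
           date(MAX(sessiondate), '-364 days') AS startdate
    FROM sessions
    GROUP BY id
)
SELECT b.id, b.startdate, b.lastdate,
       -- sum only the amounts that fall inside the window
       SUM(CASE WHEN s.sessiondate BETWEEN b.startdate AND b.lastdate
                THEN s.amount ELSE 0 END) AS last365_amount,
       -- sum every amount for the id
       SUM(s.amount) AS total_amount
FROM bounds b
JOIN sessions s ON s.id = b.id
GROUP BY b.id, b.startdate, b.lastdate
"""
totals = conn.execute(query).fetchall()
print(totals)
```

Comparing dates as text only works here because ISO dates (YYYY-MM-DD) sort in chronological order; dd/mm/yyyy strings like those in the question's table would need converting first.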

Group By multiple columns with conditions Spark SQL

Can anyone shed some light on how I should tackle this problem?
Current data
|Name|Code|Date      |Count|
|A   |1A  |2020-05-03|34   |
|A   |1A  |2020-04-02|25   |
|B   |3D  |2021-04-23|24   |
|C   |2X  |2021-04-01|01   |
|C   |2X  |2021-03-31|01   |
Desired Output:
|Name|Code|Date      |Count|
|A   |1A  |2020-05-03|34   |
|B   |3D  |2021-04-23|24   |
|C   |2X  |2021-04-01|01   |
|C   |2X  |2021-03-31|01   |
Output from my code:
|Name|Code|Date      |Count|
|A   |1A  |2020-05-03|34   |
|B   |3D  |2021-04-23|24   |
|C   |2X  |2021-04-01|01   |
Below is my code:
SELECT
name,
code,
MAX(date) AS dates,
MAX(Cases_Number) AS Max_Num
FROM(
SELECT
lhd_2010_name AS name,
lhd_2010_code AS code,
notification_date AS date,
FLOOR(SUM(num)) as Cases_Number
FROM cases
GROUP BY
notification_date,
lhd_2010_name,
lhd_2010_code
ORDER BY Cases_Number DESC, notification_date, lhd_2010_name DESC
) AS innertable
GROUP BY name,code ORDER BY Max_Num DESC
In the inner table I had to sum up the counts, grouped by name, code and date, because every count was 1 beforehand. Then in the outer query I have to find the max count for each name+code combination. If several rows of the same name+code combination share the max count, all of them should be output.
I understand the reason for the missing row is because I have used max(date), but this is the only way for me to be able to group by name and code, and also showing the dates. If I try to group by name, code, and dates it will show all other rows.
Thanks in Advance
Let's call your main table main. We can first group by name, code and count to count the duplicates; we alias that count as countDup and keep only the groups with countDup > 1. Basically, we need these kinds of rows:
|C |2X |1 |2020-04-01|
The code looks like this:
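import org.apache.spark.sql.functions.{col, count}

// groups of (name, code, count) that occur more than once
val ds2 = main.groupBy("name", "code", "count")
  .agg(count("*").alias("countDup"))
  .where(col("countDup").gt(1))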
Preview of the result:
+----+----+-----+--------+
|name|code|count|countDup|
+----+----+-----+--------+
| C| 2X| 1| 2|
+----+----+-----+--------+
Then we left join the main table with ds2, add a column holding the maximum count per name and code (a windowed max, named ranking here), and filter to keep only the rows we want:
main
.join(ds2, Seq("name", "code", "count"), "left")
.withColumn("ranking", expr("max(count) over (partition by name,code)"))
.filter(col("countDup").isNotNull || col("count").equalTo(col("ranking")))
.drop("countDup", "ranking")
.orderBy("name")
Final output (with order in name):
+----+----+-----+----------+
|name|code|count|date |
+----+----+-----+----------+
|A |1A |34 |2020-05-03|
|B |3D |24 |2020-04-23|
|C |2X |1 |2020-04-01|
|C |2X |1 |2020-03-31|
+----+----+-----+----------+
I hope this is what you need!
SPARK SQL VERSION
First, we create the temp table:
main.createTempView("main")
Then apply the following SQL:
SELECT name,code,date,count FROM (
SELECT m.name,m.code,m.date,m.count,r.countDup,MAX(m.count) OVER (PARTITION BY m.name,m.code) AS ranking FROM main m LEFT JOIN (
SELECT name,code,count,COUNT(*) AS countDup FROM main GROUP BY name,code,count HAVING COUNT(*) > 1) r
ON m.name = r.name AND m.code = r.code AND m.count = r.count)
WHERE countDup > 0 OR count == ranking ORDER BY name
Result is the same as above!
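
For comparison, here is a minimal sketch of the windowed-max part on its own, run through Python's sqlite3 module rather than Spark so that it is self-contained. It is a simplification, not the answer's exact method: it keeps every row whose count equals the maximum for its name and code, which also keeps ties. The assumptions are SQLite 3.25 or newer, the question's sample rows, and a column named cnt instead of count (chosen for this example to avoid clashing with the COUNT function).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main (name TEXT, code TEXT, date TEXT, cnt INTEGER)")
# Rows from the question's "Current data" table.
conn.executemany("INSERT INTO main VALUES (?, ?, ?, ?)", [
    ("A", "1A", "2020-05-03", 34),
    ("A", "1A", "2020-04-02", 25),
    ("B", "3D", "2021-04-23", 24),
    ("C", "2X", "2021-04-01", 1),
    ("C", "2X", "2021-03-31", 1),
])

query = """
SELECT name, code, date, cnt
FROM (
    SELECT name, code, date, cnt,
           -- highest count within each (name, code) group
           MAX(cnt) OVER (PARTITION BY name, code) AS max_cnt
    FROM main
) AS t
WHERE cnt = max_cnt          -- ties are kept, so both C rows survive
ORDER BY name, date DESC
"""
kept = conn.execute(query).fetchall()
for row in kept:
    print(row)
```

The same SELECT with MAX(...) OVER (PARTITION BY ...) should also be valid Spark SQL, though that is not verified here.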

group by value but only for continue value

OK, the title is far from obvious, I could not explain it better.
Let's consider the table with columns (date, xvalue, some other columns), what I need is to group them by xvalue but only when they are not interrupted considering time (column date), so for example, for:
Date |xvalue |yvalue|
1 Mar |10 |1 |
2 Mar |10 |2 |
3 Mar |20 |6 |
4 Mar |20 |1 |
5 Mar |10 |4 |
6 Mar |10 |2 |
From the above data, I would like to get three rows, for the first xvalue==10, for xvalue==20 and again for xvalue==10 and for each group aggregate of the other values, for example for sum:
1 Mar, 10, 3
3 Mar, 20, 7
5 Mar, 10, 6
It's like query:
select min(date), xvalue, sum(yvalue) from t group by xvalue
Except above will merge 1,2,5 and 6th of March and I want them separately
This is an example of a gaps-and-islands problem. But you need an ordering column. With such a column, you can use the difference of row numbers:
select min(date), xvalue, sum(yvalue)
from (select t.*,
row_number() over (partition by xvalue order by date) as seqnum_d,
row_number() over (order by date) as seqnum
from t
) t
group by xvalue, (seqnum - seqnum_d)
order by min(date)
Here is a db<>fiddle.
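
To see why the difference of row numbers identifies each run, here is a minimal sketch that runs the same query from Python's sqlite3 module on the question's six rows. The assumptions are SQLite 3.25 or newer, ISO dates with a made-up year (2021), and a column named day instead of date; the name islands is likewise only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (day TEXT, xvalue INTEGER, yvalue INTEGER)")
# The question's rows; the year is an assumption.
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("2021-03-01", 10, 1),
    ("2021-03-02", 10, 2),
    ("2021-03-03", 20, 6),
    ("2021-03-04", 20, 1),
    ("2021-03-05", 10, 4),
    ("2021-03-06", 10, 2),
])

# seqnum numbers all rows by day; seqnum_d numbers rows within each xvalue.
# Inside an uninterrupted run both counters advance together, so their
# difference is constant; a break in the run changes the difference.
query = """
SELECT MIN(day), xvalue, SUM(yvalue)
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY xvalue ORDER BY day) AS seqnum_d,
           ROW_NUMBER() OVER (ORDER BY day) AS seqnum
    FROM t
) AS numbered
GROUP BY xvalue, (seqnum - seqnum_d)
ORDER BY MIN(day)
"""
islands = conn.execute(query).fetchall()
for row in islands:
    print(row)
```

For xvalue 10 the differences are 0, 0, 2, 2, so the first two days and the last two days land in separate groups even though they share the same xvalue.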
Data in a database is logically stored as mathematical sets, in which there is absolutely no order and no way to have a default ordering. They are comparable to bags in which objects can move around during use.
So there is no way to answer your query until you add a specific column that gives the sort order the user needs.

How to merge dates between 2 locations using SQL?

TIA for any assistance towards this problem, I am rather new to SQL/SSMS.
I would like to understand how I can create date rows for 1 location based on another. For example, I have the following table:
ClientFK | LocationFK | Month | Sales
---------|------------|-----------|-------
15 |1 |2019-04-01 | $100
15 |2 |2019-04-01 | $50
15 |2 |2019-03-01 | $30
15 |2 |2019-02-01 | $20
How can I create rows in location 1 in which location 2 had sales? The output would look like this:
ClientFK | LocationFK | Month | Sales
---------|------------|-----------|-------
15 |1 |2019-04-01 | $100
15 |1 |2019-03-01 | $0
15 |1 |2019-02-01 | $0
15 |2 |2019-04-01 | $50
15 |2 |2019-03-01 | $30
15 |2 |2019-02-01 | $20
My goal is to make this dynamic, so I'm not trying to work with this specific client/location, it is just an example. Ideally this should work for any client/location combo.
Again, I'm rather new to this, and wasn't sure how to best ask this question. Any advice on how to clarify what I'm asking would be much appreciated as well. Thanks!
This will give you all months for all locations and all clients.
with AvailableDates as(
select distinct [Month] from MonthlySales
),
Locations as(
select distinct LocationFk from MonthlySales
),
Clients as(
select distinct ClientFk from MonthlySales
)
select c.ClientFk, l.LocationFk, ad.Month, IsNull(ms.Sales,0) as Sales
from Locations l
left join AvailableDates ad on 1=1
left join clients c on 1=1
left join MonthlySales ms on
l.LocationFk=ms.LocationFk
and c.ClientFk = ms.ClientFk
and ad.Month = ms.Month
order by locationFK, Month desc
Here is a way to go about this. First find all of the unique months for location 2.
Then take the Cartesian product of those months with the location 1 data.
After that, the cartesian_prod block holds one location 1 row for every location 2 month; left join it with the original data and set sales to 0 for the months that are missing.
with loc2_data
as (select distinct month
from tbl
where locationfk=2
and ClientFK=15
)
,cartesian_prod
as (
select distinct a.ClientFK,a.LocationFK,b.Month
from tbl a
join loc2_data b
on 1=1
where a.locationfk=1
and a.ClientFK=15
)
select a.ClientFK,a.LocationFK,a.Month,isnull(v.Sales,0) as sales
from cartesian_prod a
left join tbl v
on a.ClientFK=v.ClientFK
and a.LocationFK=v.LocationFK
and a.Month=v.Month
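
A minimal sketch of the general idea (build every client/location/month combination, then left join the real sales onto it), run from Python's sqlite3 module. The assumptions are a table named monthly_sales, sales stored as plain integers rather than currency strings, COALESCE in place of SQL Server's IsNull, and months scoped per client so that one client's months are not forced onto another; none of these details come from the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE monthly_sales "
    "(clientfk INTEGER, locationfk INTEGER, month TEXT, sales INTEGER)"
)
# Rows from the question, with the dollar sign dropped.
conn.executemany("INSERT INTO monthly_sales VALUES (?, ?, ?, ?)", [
    (15, 1, "2019-04-01", 100),
    (15, 2, "2019-04-01", 50),
    (15, 2, "2019-03-01", 30),
    (15, 2, "2019-02-01", 20),
])

query = """
WITH months AS (
    -- every month in which the client had sales at any location
    SELECT DISTINCT clientfk, month FROM monthly_sales
),
locations AS (
    -- every location the client has
    SELECT DISTINCT clientfk, locationfk FROM monthly_sales
)
SELECT l.clientfk, l.locationfk, m.month, COALESCE(s.sales, 0) AS sales
FROM locations l
JOIN months m ON m.clientfk = l.clientfk          -- all combinations per client
LEFT JOIN monthly_sales s
       ON s.clientfk = l.clientfk
      AND s.locationfk = l.locationfk
      AND s.month = m.month                       -- real sales where they exist
ORDER BY l.locationfk, m.month DESC
"""
filled = conn.execute(query).fetchall()
for row in filled:
    print(row)
```

Because nothing is hard-coded to client 15 or locations 1 and 2, the same query covers any client/location combination present in the table.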

PostgreSQL: How to write a query for this scenario

I have this below table.
+_______+________+__________+________+
|Playid |billid| amount | Date |
+_______+________+__________+________+
|123 | 345 | 144.9 | 2015-09|
|123 | 456 | 200 | 2015-10|
+_______+________+__________+________+
I need to write a query to show only the bill amount that has most recent transaction date (Date) like below.
+_______+________+__________+________+
|Playid |billid| amount | Date |
+_______+________+__________+________+
|123 | 456 | 200 | 2015-10|
+_______+________+__________+________+
Please help me how do I do it.
MAX(Date) can be used if you want to display only the playid and the most recent date.
However, the issue with what you are trying to do is that you want to display all the columns, and this is where the ranking functions come into play. In this case you can use the row_number function like this:
SELECT PlayId, billid, amount, date
FROM
(
SELECT
PlayId, billid, amount, date,
row_number() over(partition by playid order by date desc) as rn
FROM tablename
) t
where rn = 1
The row_number() over(partition by playid order by date desc) expression numbers the rows within each playid group, so that row number 1 (the lowest) is the one with the most recent date. Then you just need to filter on the row number equal to 1.
Postgres offers distinct on. This is simpler to write and often has the best performance:
select distinct on (playid) t.*
from t
order by playid, date desc;
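
Since distinct on is specific to Postgres, here is a minimal sketch of the portable row_number approach, run from Python's sqlite3 module on the question's two rows. The assumptions are SQLite 3.25 or newer, a table named bills, and the date kept as YYYY-MM text (which sorts chronologically); the name latest is only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bills (playid INTEGER, billid INTEGER, amount REAL, date TEXT)"
)
# The two rows from the question.
conn.executemany("INSERT INTO bills VALUES (?, ?, ?, ?)", [
    (123, 345, 144.9, "2015-09"),
    (123, 456, 200, "2015-10"),
])

query = """
SELECT playid, billid, amount, date
FROM (
    SELECT playid, billid, amount, date,
           -- rn = 1 marks the most recent row within each playid
           ROW_NUMBER() OVER (PARTITION BY playid ORDER BY date DESC) AS rn
    FROM bills
) AS t
WHERE rn = 1
"""
latest = conn.execute(query).fetchall()
print(latest)
```

If two bills for the same playid share the latest date, row_number picks one of them arbitrarily; adding a tie-breaker such as billid to the ORDER BY makes the choice deterministic.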