I have a table month_totals that looks like this:
Name DateFrom Total
a 2017-01-01 34
b 2017-01-01 54
a 2017-02-01 22
b 2017-02-01 12
a 2017-03-01 34
b 2017-03-01 54
How can I select the latest Total per Name where DateFrom < '2017-03-01' (possibly using analytic functions)?
The following statement does not work as expected:
SELECT name,
First_value(total)
OVER (
ORDER BY Max(datefrom) DESC)
FROM month_totals
WHERE datefrom < '2017-03-01'
GROUP BY NAME
The desired result is:
Name Total
a 22
b 12
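You can try a MIN aggregation. Note that this returns the smallest total per name, which happens to match the latest total only in this sample data:
SELECT name, min(total)
FROM month_totals
WHERE datefrom < '2017-03-01'
GROUP BY name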
Or you can use row_number(), numbering each name's rows from the latest datefrom down:
select name, total from
(
SELECT name, total,
row_number() over (partition by name order by datefrom desc) rn
FROM month_totals
WHERE datefrom < '2017-03-01'
) A where rn = 1
Although you can use window functions, I think a correlated subquery is a simple enough way to write the query and should have good performance:
select mt.*
from month_totals mt
where mt.datefrom = (select max(mt2.datefrom)
from month_totals mt2
where mt2.name = mt.name and mt2.datefrom < '2017-03-01'
);
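A quick way to check the correlated-subquery approach is to run it against an in-memory SQLite database through Python's standard sqlite3 module. This is only a sketch: the function name latest_totals is hypothetical, and the dates are stored as ISO text, which SQLite compares in the correct order.

```python
import sqlite3


def latest_totals(cutoff):
    """Return (name, total) for each name's latest datefrom before `cutoff`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE month_totals (name TEXT, datefrom TEXT, total INTEGER)")
    conn.executemany(
        "INSERT INTO month_totals VALUES (?, ?, ?)",
        [
            ("a", "2017-01-01", 34), ("b", "2017-01-01", 54),
            ("a", "2017-02-01", 22), ("b", "2017-02-01", 12),
            ("a", "2017-03-01", 34), ("b", "2017-03-01", 54),
        ],
    )
    # Keep only the row whose datefrom equals the latest datefrom
    # before the cutoff for the same name (correlated subquery).
    rows = conn.execute(
        """
        SELECT mt.name, mt.total
        FROM month_totals mt
        WHERE mt.datefrom = (SELECT MAX(mt2.datefrom)
                             FROM month_totals mt2
                             WHERE mt2.name = mt.name AND mt2.datefrom < ?)
        ORDER BY mt.name
        """,
        (cutoff,),
    ).fetchall()
    conn.close()
    return rows


print(latest_totals("2017-03-01"))  # [('a', 22), ('b', 12)]
```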
Here is an example:
Id|price|Date
1|2|2022-05-21
1|3|2022-06-15
1|2.5|2022-06-19
It needs to look like this:
Id|Date|price
1|2022-05-21|2
1|2022-05-22|2
1|2022-05-23|2
...
1|2022-06-15|3
1|2022-06-16|3
1|2022-06-17|3
1|2022-06-18|3
1|2022-06-19|2.5
1|2022-06-20|2.5
...
Until today
1|2022-08-30|2.5
I tried using lag(price) over (partition by id order by date), but I can't get it right.
I'm not familiar with Azure, but it looks like you need to use a calendar table or generate the missing dates using a recursive CTE.
To get started with a recursive CTE, number the rows of each id (assuming there are multiple id values) in the source data, ordered by date. The rows with row number 1 (those with the minimum date for the corresponding id) serve as the starting point for the recursion. Then use the DATEADD function to generate the row for the next day. To use the price values from the original data, use a subquery to get the price for this new date; if there is no such value (no row for this date), fall back to the previous price from the CTE with the COALESCE function.
For SQL Server, the query can look like this:
WITH cte AS (
SELECT
id,
date,
price
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn
FROM tbl
) t
WHERE rn = 1
UNION ALL
SELECT
cte.id,
DATEADD(d, 1, cte.date),
COALESCE(
(SELECT tbl.price
FROM tbl
WHERE tbl.id = cte.id AND tbl.date = DATEADD(d, 1, cte.date)),
cte.price
)
FROM cte
WHERE DATEADD(d, 1, cte.date) <= GETDATE()
)
SELECT * FROM cte
ORDER BY id, date
OPTION (MAXRECURSION 0)
Note that I added OPTION (MAXRECURSION 0) so that the recursion runs through all the steps; the default limit is 100 recursion levels, which is not enough to cover the whole date range.
The same approach works in MySQL (you need MySQL 8.0 or later to use a CTE):
WITH RECURSIVE cte AS (
SELECT
id,
date,
price
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn
FROM tbl
) t
WHERE rn = 1
UNION ALL
SELECT
cte.id,
DATE_ADD(cte.date, interval 1 day),
COALESCE(
(SELECT tbl.price
FROM tbl
WHERE tbl.id = cte.id AND tbl.date = DATE_ADD(cte.date, interval 1 day)),
cte.price
)
FROM cte
WHERE DATE_ADD(cte.date, interval 1 day) <= NOW()
)
SELECT * FROM cte
ORDER BY id, date
Both queries produce the same results; the only difference is the engine-specific date functions.
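The recursive-CTE idea can also be tried locally with SQLite through Python's sqlite3 module. This sketch deliberately differs from the queries above in a few ways: it uses a fixed end date instead of GETDATE()/NOW() so the output is reproducible, it finds each id's first row with MIN(date) instead of ROW_NUMBER(), and it looks up the next day's price with a LEFT JOIN rather than a correlated subquery. The function name fill_prices is hypothetical, and the sample rows are the ones used in the calendar-table example of this answer.

```python
import sqlite3

# Sample rows (id, price, date) from the pseudo-calendar example
ROWS = [
    (1, 2, "2022-05-21"), (1, 3, "2022-05-25"), (1, 2.5, "2022-05-28"),
    (2, 10, "2022-05-25"), (2, 100, "2022-05-30"),
]


def fill_prices(end_date):
    """Return (id, date, price) for every day from each id's first date up to
    end_date, carrying the last known price forward."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id INTEGER, price REAL, date TEXT)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(
        """
        WITH RECURSIVE cte(id, date, price) AS (
            -- Anchor: the earliest row of every id
            SELECT t.id, t.date, t.price
            FROM tbl t
            WHERE t.date = (SELECT MIN(t2.date) FROM tbl t2 WHERE t2.id = t.id)
            UNION ALL
            -- Recursive step: move one day ahead; take that day's price if a
            -- row exists, otherwise keep the previous price
            SELECT cte.id,
                   date(cte.date, '+1 day'),
                   COALESCE(tbl.price, cte.price)
            FROM cte
            LEFT JOIN tbl
              ON tbl.id = cte.id AND tbl.date = date(cte.date, '+1 day')
            WHERE date(cte.date, '+1 day') <= ?
        )
        SELECT id, date, price FROM cte ORDER BY id, date
        """,
        (end_date,),
    ).fetchall()
    conn.close()
    return rows


for row in fill_prices("2022-05-30"):
    print(row)
```

With the end date 2022-05-30 it should print the same 16 rows as the results of the calendar-table query further down.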
MySQL versions below 8.0 have no CTE support, so you cannot generate the required date range on the fly; you can use a calendar table instead.
Assuming the calendar table has a column that stores date values (let's call it date for simplicity), you can use CROSS JOIN to generate, for each id value in your table, the range of existing calendar dates. Then use a subquery to get the latest price stored for that id on or before the corresponding date.
The query would look like this:
SELECT
d.id,
d.date,
(SELECT
price
FROM tbl
WHERE tbl.id = d.id AND tbl.date <= d.date
ORDER BY tbl.date DESC
LIMIT 1
) price
FROM (
SELECT
t.id,
c.date
FROM calendar c
CROSS JOIN (SELECT DISTINCT id FROM tbl) t
WHERE c.date BETWEEN (
SELECT
MIN(date) min_date
FROM tbl
WHERE tbl.id = t.id
)
AND NOW()
) d
ORDER BY id, date
Using my pseudo-calendar table with date values ranging from 2022-05-20 to 2022-05-30, and source data within that range:
id|price|date
1|2|2022-05-21
1|3|2022-05-25
1|2.5|2022-05-28
2|10|2022-05-25
2|100|2022-05-30
the query produces the following results:
id|date|price
1|2022-05-21|2
1|2022-05-22|2
1|2022-05-23|2
1|2022-05-24|2
1|2022-05-25|3
1|2022-05-26|3
1|2022-05-27|3
1|2022-05-28|2.5
1|2022-05-29|2.5
1|2022-05-30|2.5
2|2022-05-25|10
2|2022-05-26|10
2|2022-05-27|10
2|2022-05-28|10
2|2022-05-29|10
2|2022-05-30|100
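For comparison, here is a sketch of the calendar-table query running on SQLite via Python's sqlite3 module with the same sample data. The calendar table is built in memory for 2022-05-20 to 2022-05-30, and the fixed date '2022-05-30' stands in for NOW(); the function name fill_with_calendar is hypothetical.

```python
import sqlite3
from datetime import date, timedelta

# Sample rows (id, price, date) from the example above
ROWS = [
    (1, 2, "2022-05-21"), (1, 3, "2022-05-25"), (1, 2.5, "2022-05-28"),
    (2, 10, "2022-05-25"), (2, 100, "2022-05-30"),
]


def fill_with_calendar(cal_start, cal_end, end_date):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id INTEGER, price REAL, date TEXT)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", ROWS)
    # Calendar table: one ISO date string per day
    conn.execute("CREATE TABLE calendar (date TEXT PRIMARY KEY)")
    n_days = (cal_end - cal_start).days + 1
    conn.executemany(
        "INSERT INTO calendar VALUES (?)",
        [((cal_start + timedelta(days=i)).isoformat(),) for i in range(n_days)],
    )
    rows = conn.execute(
        """
        SELECT d.id,
               d.date,
               -- latest price stored on or before the calendar date
               (SELECT tbl.price
                FROM tbl
                WHERE tbl.id = d.id AND tbl.date <= d.date
                ORDER BY tbl.date DESC
                LIMIT 1) AS price
        FROM (
            -- every calendar date from the id's first date up to end_date
            SELECT t.id, c.date
            FROM calendar c
            CROSS JOIN (SELECT DISTINCT id FROM tbl) t
            WHERE c.date BETWEEN (SELECT MIN(tbl.date) FROM tbl WHERE tbl.id = t.id)
                             AND ?
        ) d
        ORDER BY d.id, d.date
        """,
        (end_date,),
    ).fetchall()
    conn.close()
    return rows


for row in fill_with_calendar(date(2022, 5, 20), date(2022, 5, 30), "2022-05-30"):
    print(row)
```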
These are some sample queries I wrote:
SELECT
CAST(datecolumn AS DATE) AS DateColumn,
COUNT(*) AS count
FROM
dbo.myTableName
WHERE
status = 'stage1'
GROUP BY CAST(datecolumn AS DATE) ORDER BY DateColumn DESC;
SELECT
CAST(datecolumn AS DATE) AS DateColumn,
COUNT(*) AS count
FROM
dbo.myTableName
WHERE
status = 'stage2'
GROUP BY CAST(datecolumn AS DATE) ORDER BY DateColumn DESC;
This is the output from the 1st query:
DateColumn count
------------------
2022-05-26 23
2022-05-25 51
2022-05-24 39
2022-05-23 55
2022-05-22 27
2022-05-21 90
and this is the output from the 2nd query:
DateColumn count
-----------------
2022-05-26 31
2022-05-25 67
2022-05-24 38
2022-05-23 54
2022-05-22 28
I want a single query that outputs this:
DateColumn stage1count stage2count
-----------------------------------
2022-05-26 23 31
2022-05-25 51 67
2022-05-24 39 38
2022-05-23 55 54
2022-05-22 27 28
Can you try this:
select cast(datecolumn as DATE) as DateColumn,
sum(case when status = 'stage1' then 1 else 0 end) as stage1count,
sum(case when status = 'stage2' then 1 else 0 end) as stage2count
from dbo.myTableName
where status in ('stage1', 'stage2')
group by cast(datecolumn as DATE)
order by DateColumn DESC
Another note: most SQL systems treat datecolumn and DateColumn as the same identifier, so it is somewhat ambiguous which one the group by and order by clauses actually use. I think the order by uses the cast value from the select list, while the group by spells out cast(datecolumn as DATE) on the base column, but I'm not sure about that. To avoid the ambiguity, you can give the alias a name that differs from the column name (for example, CastDate).
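The conditional-aggregation query can be sketched on SQLite from Python as follows. It assumes a few made-up rows, uses SQLite's date() function in place of CAST(... AS DATE), and names the alias day so that it does not clash with the column name; the function name stage_counts is hypothetical.

```python
import sqlite3


def stage_counts(rows):
    """rows: iterable of (datecolumn, status).
    Returns [(day, stage1count, stage2count)], newest day first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myTableName (datecolumn TEXT, status TEXT)")
    conn.executemany("INSERT INTO myTableName VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT date(datecolumn) AS day,
               -- each CASE yields 1 for a matching row, so SUM counts them
               SUM(CASE WHEN status = 'stage1' THEN 1 ELSE 0 END) AS stage1count,
               SUM(CASE WHEN status = 'stage2' THEN 1 ELSE 0 END) AS stage2count
        FROM myTableName
        WHERE status IN ('stage1', 'stage2')
        GROUP BY date(datecolumn)
        ORDER BY day DESC
        """
    ).fetchall()
    conn.close()
    return result


sample = [
    ("2022-05-26 09:15:00", "stage1"),
    ("2022-05-26 10:00:00", "stage2"),
    ("2022-05-26 11:30:00", "stage2"),
    ("2022-05-25 08:00:00", "stage1"),
    ("2022-05-25 12:00:00", "stage3"),  # filtered out by the WHERE clause
]
print(stage_counts(sample))  # [('2022-05-26', 1, 2), ('2022-05-25', 1, 0)]
```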
@hewszz, you mentioned that you also need this for the case where you have two tables. This might do the job if you have two tables:
select coalesce(t1.DateColumn, t2.DateColumn) as DateColumn,
coalesce(stage1count, 0) as stage1count,
coalesce(stage2count, 0) as stage2count
from (select cast(datecolumn as DATE) as DateColumn,
count(*) as stage1count
from dbo.myTableName1
where status = 'stage1'
group by cast(datecolumn as DATE)) t1
full outer join
(select cast(datecolumn as DATE) as DateColumn,
count(*) as stage2count
from dbo.myTableName2
where status = 'stage2'
group by cast(datecolumn as DATE)) t2
on t1.DateColumn = t2.DateColumn
order by coalesce(t1.DateColumn, t2.DateColumn) DESC
By grouping each table separately we make sure that DateColumn is unique on each side, so each row will join with at most one row from the other grouped query. By using a full outer join we make sure no rows get lost when we have only a stage1 or a stage2 record for a given day.
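The effect of the full outer join can be illustrated in plain Python (this is not SQL): every day that appears on either side is kept once, and a missing side shows up as None, which corresponds to NULL. The function name full_outer_join_counts is hypothetical; the counts are the ones from the question, where 2022-05-21 only has stage1 rows.

```python
def full_outer_join_counts(stage1, stage2):
    """Combine two {day: count} dicts the way a FULL OUTER JOIN on the day does:
    each day from either side appears once, and a missing side becomes None."""
    days = sorted(set(stage1) | set(stage2), reverse=True)  # newest first
    return [(day, stage1.get(day), stage2.get(day)) for day in days]


stage1 = {"2022-05-26": 23, "2022-05-25": 51, "2022-05-24": 39,
          "2022-05-23": 55, "2022-05-22": 27, "2022-05-21": 90}
stage2 = {"2022-05-26": 31, "2022-05-25": 67, "2022-05-24": 38,
          "2022-05-23": 54, "2022-05-22": 28}

for row in full_outer_join_counts(stage1, stage2):
    print(row)
```

The last row printed is ('2022-05-21', 90, None): with an inner join that day would be dropped entirely.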
I have a problem with writing a query.
The raw data is as follows:
DATE CUSTOMER_ID AMOUNT
20170101 1 150
20170201 1 50
20170203 1 200
20170204 1 250
20170101 2 300
20170201 2 70
I want to know on which date the sum of AMOUNT for each CUSTOMER_ID becomes more than 350.
How can I write a query that produces this result?
CUSTOMER_ID MAX_DATE
1 20170203
2 20170201
Simply use ANSI/ISO standard window functions to calculate the running sum:
select t.*
from (select t.*,
sum(t.amount) over (partition by t.customer_id order by t.date) as running_amount
from t
) t
where running_amount - amount < 350 and
running_amount >= 350;
If for some reason your database doesn't support this functionality, you can use a correlated subquery:
select t.*
from (select t.*,
(select sum(t2.amount)
from t t2
where t2.customer_id = t.customer_id and
t2.date <= t.date
) as running_amount
from t
) t
where running_amount - amount < 350 and
running_amount >= 350;
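A minimal sketch of the correlated-subquery version on SQLite via Python's sqlite3 module, using the sample data from the question. The condition running_amount - amount < threshold AND running_amount >= threshold keeps the row where the running total first reaches the threshold, assuming positive amounts and one row per customer and date. The function name first_date_over is hypothetical, and the dates stay as YYYYMMDD text, which sorts correctly.

```python
import sqlite3


def first_date_over(threshold):
    """Return (customer_id, date) where each customer's running total of
    amount first reaches `threshold`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (date TEXT, customer_id INTEGER, amount INTEGER)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?, ?)",
        [
            ("20170101", 1, 150), ("20170201", 1, 50), ("20170203", 1, 200),
            ("20170204", 1, 250), ("20170101", 2, 300), ("20170201", 2, 70),
        ],
    )
    rows = conn.execute(
        """
        SELECT r.customer_id, r.date AS max_date
        FROM (SELECT t.*,
                     -- running total: all amounts of this customer up to this date
                     (SELECT SUM(t2.amount)
                      FROM t t2
                      WHERE t2.customer_id = t.customer_id
                        AND t2.date <= t.date) AS running_amount
              FROM t) r
        -- the previous running total was below the threshold, this one is not
        WHERE r.running_amount - r.amount < ? AND r.running_amount >= ?
        ORDER BY r.customer_id
        """,
        (threshold, threshold),
    ).fetchall()
    conn.close()
    return rows


print(first_date_over(350))  # [(1, '20170203'), (2, '20170201')]
```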
ANSI SQL
Used for the test: TSQL and MS SQL Server 2012
select
"CUSTOMER_ID",
min("DATE")
FROM
(
select
"CUSTOMER_ID",
"DATE",
(
SELECT
sum(T02."AMOUNT") AMOUNT
FROM "TABLE01" T02
WHERE
T01."CUSTOMER_ID" = T02."CUSTOMER_ID"
AND T02."DATE" <= T01."DATE"
) "AMOUNT"
from "TABLE01" T01
) T03
where
T03."AMOUNT" > 350
group by
"CUSTOMER_ID"
GO
CUSTOMER_ID | (No column name)
----------: | :------------------
1 | 03/02/2017 00:00:00
2 | 01/02/2017 00:00:00
SELECT
tmp.`CUSTOMER_ID`,
MIN(tmp.`DATE`) as MAX_DATE
FROM
(
SELECT
`DATE`,
`CUSTOMER_ID`,
`AMOUNT`,
(
SELECT SUM(`AMOUNT`) FROM tbl t2 WHERE t2.`DATE` <= t1.`DATE` AND `CUSTOMER_ID` = t1.`CUSTOMER_ID`
) AS SUM_UP
FROM
`tbl` t1
ORDER BY
`DATE` ASC
) tmp
WHERE
tmp.`SUM_UP` > 350
GROUP BY
tmp.`CUSTOMER_ID`
Explanation:
First I select all rows, and for each row a subquery computes the SUM of AMOUNT over all rows of the same customer whose DATE is less than or equal to the current row's DATE. From this derived table I select the MIN date whose running sum is greater than 350.
I think this is not an easy calculation, and you have to compute some intermediate results. I know it may look a little messy, but I want to calculate step by step. If this works for your scenario as a first step, I believe its performance can be improved. If anybody can improve my query, please edit my post.
Unfortunately I could not test the solution below on a computer, but I expect it to give you the expected result:
-- Get the start date of each customer
SELECT CUSTOMER_ID
,MIN([DATE]) AS START_DATE
INTO #table
FROM [TABLE]
GROUP BY CUSTOMER_ID
-- For every transaction date, sum the customer's amounts from the start date up to that date
SELECT t1.CUSTOMER_ID
,(SELECT SUM(t3.AMOUNT) FROM [TABLE] t3
WHERE t3.CUSTOMER_ID = t1.CUSTOMER_ID
AND t3.[DATE] BETWEEN t1.START_DATE AND t2.[DATE]) AS total
,t2.[DATE] AS [DATE]
INTO #tableCalculated
FROM #table t1
INNER JOIN [TABLE] t2 ON t1.CUSTOMER_ID = t2.CUSTOMER_ID
-- Select the first date per CUSTOMER_ID where the sum of amount is greater than 350
SELECT CUSTOMER_ID, MIN([DATE]) AS [DATE]
FROM #tableCalculated
WHERE total > 350
GROUP BY CUSTOMER_ID
SELECT CUSTOMER_ID, MIN(DATE) AS GOALDATE
FROM ( SELECT cd1.*, (SELECT SUM(AMOUNT)
FROM CustData cd2
WHERE cd2.CUSTOMER_ID = cd1.CUSTOMER_ID
AND cd2.DATE <= cd1.DATE) AS RUNNINGTOTAL
FROM CustData cd1) AS custdata2
WHERE RUNNINGTOTAL >= 350
GROUP BY CUSTOMER_ID
I have a table with more than 5 million rows of sales transactions. I would like to find, for each customer, the sum of the date intervals between their three most recent purchases.
Suppose my table looks like this:
CustomerID ProductID ServiceStartDate ServiceExpiryDate
A X1 2010-01-01 2010-06-01
A X2 2010-08-12 2010-12-30
B X4 2011-10-01 2012-01-15
B X3 2012-04-01 2012-06-01
B X7 2012-08-01 2013-10-01
A X5 2013-01-01 2015-06-01
The result I'm looking for looks like this:
CustomerID IntervalDays
A 802
B 135
I know the query needs to first retrieve the 3 most recent transactions of each customer (based on ServiceStartDate) and then calculate the intervals between the StartDate and ExpiryDate values of his/her transactions.
You want to calculate the difference between the previous row's ServiceExpiryDate and the current row's ServiceStartDate based on descending dates and then sum up the last two differences:
with cte as
(
select tab.*,
row_number()
over (partition by customerId
order by ServiceStartDate desc
, ServiceExpiryDate desc -- don't know if this 2nd column is necessary
) as rn
from tab
)
select t2.customerId,
sum(datediff(day, t1.ServiceExpiryDate, t2.ServiceStartDate)) as Intervaldays
,count(*) as purchases
from cte as t2 left join cte as t1
on t1.customerId = t2.customerId
and t1.rn = t2.rn+1 -- previous and current row
and t1.rn <= 3 -- only pairs within the last three rows
where t2.rn <= 3 -- last three rows
group by t2.customerId;
Same result using LEAD:
with cte as
(
select tab.*,
row_number()
over (partition by customerId
order by ServiceStartDate desc) as rn
,lead(ServiceExpiryDate)
over (partition by customerId
order by ServiceStartDate desc
) as prevEnd
from tab
)
select customerId,
sum(case when rn < 3 then datediff(day, prevEnd, ServiceStartDate) end) as Intervaldays -- only gaps within the last three rows
,count(*) as purchases
from cte
where rn <= 3
group by customerId;
Neither query returns the expected result unless you subtract purchases (or max(rn)) from Intervaldays. But since you only sum two differences, that doesn't seem correct to me either...
Additional logic must be applied based on your rules regarding:
customer has less than 3 purchases
overlapping intervals
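To make the gap calculation concrete, here is a plain-Python sketch of the same idea (not T-SQL): sort each customer's purchases by ServiceStartDate descending, keep the three most recent, and subtract the older purchase's ServiceExpiryDate from the newer purchase's ServiceStartDate. The function name interval_days is hypothetical. On the question's data it gives 805 days for A and 138 for B, three days more than the expected result in each case, which matches the remark above about subtracting purchases.

```python
from datetime import date

# (CustomerID, ProductID, ServiceStartDate, ServiceExpiryDate) from the question
PURCHASES = [
    ("A", "X1", date(2010, 1, 1), date(2010, 6, 1)),
    ("A", "X2", date(2010, 8, 12), date(2010, 12, 30)),
    ("B", "X4", date(2011, 10, 1), date(2012, 1, 15)),
    ("B", "X3", date(2012, 4, 1), date(2012, 6, 1)),
    ("B", "X7", date(2012, 8, 1), date(2013, 10, 1)),
    ("A", "X5", date(2013, 1, 1), date(2015, 6, 1)),
]


def interval_days(purchases, last_n=3):
    by_customer = {}
    for customer, _product, start, expiry in purchases:
        by_customer.setdefault(customer, []).append((start, expiry))
    result = {}
    for customer, rows in by_customer.items():
        # Most recent first, like ORDER BY ServiceStartDate DESC (rn = 1, 2, 3)
        recent = sorted(rows, reverse=True)[:last_n]
        # Pair each purchase with the next older one (what LEAD(ServiceExpiryDate) returns)
        result[customer] = sum(
            (newer[0] - older[1]).days for newer, older in zip(recent, recent[1:])
        )
    return result


print(interval_days(PURCHASES))  # {'A': 805, 'B': 138}
```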
Assuming there are no overlaps, I think you want this:
select customerId,
sum(datediff(day, ServiceStartDate, ServiceExpiryDate)) as Intervaldays
from (select t.*, row_number() over (partition by customerId
order by ServiceStartDate desc) as seqnum
from yourTable t
) t
where seqnum <= 3
group by customerId;
Try this:
SELECT dt.CustomerID,
SUM(DATEDIFF(DAY, dt.PrevExpiry, dt.ServiceStartDate)) As IntervalDays
FROM (
SELECT *
, ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY ServiceStartDate DESC) AS rn
, (SELECT Max(ti.ServiceExpiryDate)
FROM yourTable ti
WHERE t.CustomerID = ti.CustomerID
AND ti.ServiceStartDate < t.ServiceStartDate) As PrevExpiry
FROM yourTable t) dt
WHERE dt.rn < 3 -- only gaps between the three most recent purchases
GROUP BY dt.CustomerID
Result will be:
CustomerId | IntervalDays
-----------+--------------
A | 805
B | 138
I have a table that looks like this (these are just some of the records, and there are more columns too, but these are the ones I care about):
nbr amt date
1 10 10/30/2012
1 15 1/30/2012
1 50 11/30/2012
2 10 4/30/2012
2 1000 5/30/2012
2 45 1/15/2012
4 90 12/30/2012
4 89 8/30/2012
3 100 7/30/2012
I'm trying to select the nbr,amt, and date that corresponds to the max(amt) for each nbr using SQL Server 2012.
So far I have a query that groups by nbr and selects max(amt), but it won't let me select date because it's not in an aggregate function. If I put date in an aggregate function, it selects max(date), which doesn't correspond to the actual date that goes with the amt:
,topamt as (
select
nbr
,amt
,date
,amtrank = row_number() over (partition by ah.member_nbr order by ah.tran_amt desc)
from HISTORY ah
where amt>=10
and id=6061
and date between '11-01-2012' and '12-31-2012'
)
So if I change the query to this, where am I telling it to grab the max(amt)? At least, the results aren't showing the max.
Try using a ranking function:
with TopAmt as
(
select *
, amtRank = row_number() over (partition by nbr order by amt desc)
from HISTORY
)
select nbr
, amt
, date
from TopAmt
where amtRank = 1
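The ranking approach can be checked on SQLite via Python's sqlite3 module, using the sample rows from the question with the dates rewritten in ISO format. This sketch assumes SQLite 3.25 or later (the first version with window functions) and uses the standard AS amtRank alias syntax, since the T-SQL amtRank = ... form is not supported there; the table name history and the function name top_amount_per_nbr are assumptions.

```python
import sqlite3


def top_amount_per_nbr():
    """Return (nbr, amt, date) of the row with the largest amt for each nbr."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE history (nbr INTEGER, amt INTEGER, date TEXT)")
    conn.executemany(
        "INSERT INTO history VALUES (?, ?, ?)",
        [
            (1, 10, "2012-10-30"), (1, 15, "2012-01-30"), (1, 50, "2012-11-30"),
            (2, 10, "2012-04-30"), (2, 1000, "2012-05-30"), (2, 45, "2012-01-15"),
            (4, 90, "2012-12-30"), (4, 89, "2012-08-30"), (3, 100, "2012-07-30"),
        ],
    )
    rows = conn.execute(
        """
        WITH TopAmt AS (
            -- rank each nbr's rows by amt, largest first
            SELECT nbr, amt, date,
                   ROW_NUMBER() OVER (PARTITION BY nbr ORDER BY amt DESC) AS amtRank
            FROM history
        )
        SELECT nbr, amt, date
        FROM TopAmt
        WHERE amtRank = 1
        ORDER BY nbr
        """
    ).fetchall()
    conn.close()
    return rows


print(top_amount_per_nbr())
# [(1, 50, '2012-11-30'), (2, 1000, '2012-05-30'), (3, 100, '2012-07-30'), (4, 90, '2012-12-30')]
```

Because the date is read from the same row that has the top rank, it always belongs to the max(amt), which is what the GROUP BY attempt could not guarantee.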