Separate by month, with each month having data from beginning of time to end of month in question - SQL

I currently have a window function that takes all my data, finds the latest value (amount) for each account, and then averages this across all accounts.
Now I want to segment by month. The problem is that if an account has no data in a given month, we need to use its last available value. Therefore each month's segment must run from the beginning of the data to the end of that month. Currently the query returns a single value, 'average amount'. Ideally I would like this average for each month since inception.
SELECT AVG(amount) as "average amount"
FROM (
SELECT *
FROM(
SELECT account_no,amount,_date,row_number() over(partition by account_no order by _date desc) as rn, source
FROM ('another subquery too long to write out fully') k
) j
WHERE j.rn = 1
) l
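One possible way to get a per-month figure is to join a list of months against every row dated on or before the end of each month, and then keep the latest row per account within each month. The sketch below is only an illustration under assumptions: it uses SQLite through Python so it can be run as-is, the table `k` stands in for the elided subquery, and the sample rows are made up. Months in which no account has any data would not appear in the output; a calendar table would be needed for those.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data standing in for the elided subquery "k"
conn.executescript("""
CREATE TABLE k (account_no TEXT, amount REAL, _date TEXT);
INSERT INTO k VALUES
  ('A', 100, '2023-01-10'),
  ('A', 200, '2023-03-05'),
  ('B',  50, '2023-02-20');
""")

query = """
WITH months AS (
    -- one row per calendar month that has any data
    SELECT DISTINCT strftime('%Y-%m', _date) AS month FROM k
),
ranked AS (
    -- for each month, rank every row dated in or before that month, newest first
    SELECT m.month, k.account_no, k.amount,
           ROW_NUMBER() OVER (
               PARTITION BY m.month, k.account_no
               ORDER BY k._date DESC
           ) AS rn
    FROM months m
    JOIN k ON strftime('%Y-%m', k._date) <= m.month
)
SELECT month, AVG(amount) AS average_amount
FROM ranked
WHERE rn = 1          -- latest known value per account as of that month
GROUP BY month
ORDER BY month
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In February, account A has no new row, so its January value (100) is carried forward and averaged with B's 50.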

Related

SQL List previous sales of dates in the current month

On the SQL side, as seen in the table below, I want to subtract the previous day's sale from each day's sale and show the result in a separate column.
For example: subtract the sale on 2022-04-17 from the sale on 2022-04-18 and write the result in the adjacent column.
Repeat this for each earlier day until the last date in the table is reached.
You can try the LEAD window function with a CASE WHEN expression:
SELECT *,
CASE WHEN LEAD(n_Sales) OVER(PARTITION BY CustomerId,ProductId ORDER BY Date DESC) IS NULL
THEN Sales ELSE
Sales - LEAD(n_Sales) OVER(PARTITION BY CustomerId,ProductId ORDER BY Date DESC)
END
FROM T
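To see what the LEAD-over-descending-date idea does on concrete rows, here is a small runnable sketch. It is an assumption-laden illustration, not the original table: it runs on SQLite through Python, uses a single hypothetical `Sales` column (the query above mixes `n_Sales` and `Sales`), and replaces the CASE expression with COALESCE, which gives the same result when there is no earlier day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; column names follow the answer above
conn.executescript("""
CREATE TABLE T (CustomerId INTEGER, ProductId INTEGER, Date TEXT, Sales INTEGER);
INSERT INTO T VALUES
  (1, 10, '2022-04-16', 5),
  (1, 10, '2022-04-17', 8),
  (1, 10, '2022-04-18', 12);
""")

rows = conn.execute("""
SELECT Date, Sales,
       -- with ORDER BY Date DESC, LEAD() returns the previous day's value;
       -- the earliest day has no predecessor, so 0 is subtracted
       Sales - COALESCE(
           LEAD(Sales) OVER (PARTITION BY CustomerId, ProductId ORDER BY Date DESC),
           0) AS Diff
FROM T
ORDER BY Date
""").fetchall()
for row in rows:
    print(row)
```

LAG with an ascending sort would be an equivalent way to write the same thing.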

SQL Summarizing the number of days in each month from a data set

I have a table that looks like this and I want to be able to summarize by ReportID the following. There should be one listing for a ReportID and type and the number of days in each month under the listed months. I don't want to have to figure out the begin and end date for the dataset, it should be automatic.
If you are working with SQL Server and need to calculate the number of days between StartDate and EndDate within each ReportId, you can use the window function SUM together with the date function DATEDIFF (to count days):
select Type,ReportID,sum(datediff(dd,StartDate,EndDate))
over (partition by ReportId order by StartDate rows unbounded preceding)count_days_rep
from Table
Or, if you need the running day count within each ReportId and Type:
select Type,ReportID,sum(datediff(dd,StartDate,EndDate))
over (partition by ReportId,Type order by StartDate rows unbounded preceding) count_days_rep_type
from Table
EDIT:
First we count the days for StartDate and EndDate, then use CROSS APPLY to put the day counts into one column.
After that we just sum the values for each ReportId and Type
(see the comments):
--counting days for each month group by ReportId,Type
select ReportId,Type,Tab.month_num,
sum(Tab.count_days)count_days
from
(
select *,
--start month: if StartDate and EndDate fall in the same month, count the days between them (inclusive);
--otherwise count the days from StartDate to the end of its month
case when datepart(month,StartDate)=datepart(month,EndDate) then datediff(dd,StartDate,EndDate)+1
else datepart(dd,EOMONTH(StartDate))-datepart(dd,StartDate)+1 end CountStartDays,
--StartDate's month
datename(month,StartDate)MonthStartDate,
--end month: 0 if StartDate and EndDate fall in the same month (already counted in CountStartDays);
--otherwise count the days from the beginning of EndDate's month up to EndDate
case when datepart(month,StartDate)=datepart(month,EndDate) then 0 else datepart(dd,EndDate) end CountEndDays,
--EndDate's month
datename(month,EndDate)MonthEndDate
from Table
)y
CROSS APPLY
(values (MonthStartDate,CountStartDays),
(MonthEndDate, CountEndDays) ) Tab (month_num,count_days)
group by Type,ReportId,Tab.month_num
I hope you can appreciate my efforts.

I'm trying to calculate the difference between two weeks but I'm getting a weird peak when plotting the results ( SQL / BigQuery )

I have a daily table that contains the number of visitors per store for each day.
My tables columns are:
Date
Store
Number_of_Visitors
Views : number of views of the stores' ads.
I started by aggregating my table into a weekly table so that I can calculate the variance between one week and the next.
Here is how I defined variance:
Variance = Number of Visitors in WEEK N+1 / Number of Visitors in WEEK N
I wrote the following query to do that (new table called: weekly):
SELECT
year_week,
min(date) as date,
Store,
SUM(Number_Of_Visitors) AS TOTAL_VISITORS
FROM (
SELECT
*,
CONCAT(cast(extract(YEAR from date) as string), LPAD(cast(extract(WEEK from date) as string), 2, '0')) AS year_week
FROM `my-project`)
GROUP BY
year_week, Store
ORDER BY year_week
Then, in order to calculate the variance, I used the following query as well:
SELECT
base.*,
((base.TOTAL_VISITORS-lw.TOTAL_VISITORS)/lw.TOTAL_VISITORS) AS VAR_FF,
FROM
`weekly` base
JOIN (
SELECT
* EXCEPT (date),
DATE_ADD(DATE(TIMESTAMP(date)), INTERVAL 1 Week)AS n_date
FROM
`weekly` ) lw
ON
base.date = lw.n_date
AND base.Store= lw.Store
When I plot the variance (VAR_FF) in Data Studio, I get the following plot, which doesn't seem to make sense because of the high peak in the middle:
I am thinking your code should look like this:
SELECT date_trunc(date, week) as year_week,
Store,
SUM(Number_Of_Visitors) AS TOTAL_VISITORS,
(1 -
(LAG(SUM(Number_Of_Visitors)) OVER (PARTITION BY Store ORDER BY MIN(date)) /
SUM(Number_Of_Visitors)
)
) as VAR_FF,
FROM `my-project`
GROUP BY year_week, Store
ORDER BY year_week;
I'm not sure what your weird calculations for calculating the week are really doing. This is based on the previous week in the data.
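As a rough illustration of the "aggregate by week, then compare with the previous row" pattern, here is a sketch that runs on SQLite through Python rather than BigQuery. The table name `daily`, the sample rows, and the use of `strftime('%Y-%W', ...)` as the week key are all assumptions made for the example. It computes the relative change `(current - previous) / previous` from the question. Note that LAG looks at the previous week present in the data for that store, which is the previous calendar week only if no weeks are missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical daily data: two days in one week, one day in the next
conn.executescript("""
CREATE TABLE daily (date TEXT, Store TEXT, Number_Of_Visitors INTEGER);
INSERT INTO daily VALUES
  ('2023-01-02', 'S1', 10),
  ('2023-01-03', 'S1', 10),
  ('2023-01-09', 'S1', 30);
""")

rows = conn.execute("""
WITH weekly AS (
    -- weekly totals per store
    SELECT Store,
           strftime('%Y-%W', date) AS year_week,
           SUM(Number_Of_Visitors) AS total_visitors
    FROM daily
    GROUP BY Store, year_week
)
SELECT Store, year_week, total_visitors,
       -- 1.0 forces floating-point division; NULL for the first week of each store
       1.0 * (total_visitors - LAG(total_visitors) OVER w)
           / LAG(total_visitors) OVER w AS var_ff
FROM weekly
WINDOW w AS (PARTITION BY Store ORDER BY year_week)
ORDER BY Store, year_week
""").fetchall()
for row in rows:
    print(row)
```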

Microsoft SQL Server : getting highest cost for last purchase date

It's been a while since I used SQL, so I'm a bit rusty. Let's say you want to compare the cost of things purchased in the previous month with this month. An example would be a data table like this...
An item purchased in October cost $3, but the same item cost $2 and $1 in September. So you'd take the max cost on the max date (which would be the $2, not the $1). This would happen for every row of data.
I've done this with a stored scalar-valued function, but with 100K+ rows of data it is nowhere near fast enough. How would you do this within a single select query? What I did before was select both maxes in a select statement, return only one of them, and then call that function in a select statement. I want to do the same without stored procedures or functions, for speed reasons. I know the following query won't work because a subquery like this can only return one value, but it shows what I'm going for.
Select
Purchase, Item, USD,
(select MAX(Purchase), MAX(USD) from Table
where Item = 845 and MONTH(Purchase) = MONTH(Purchase) -1) LastCost
from Table
An example of what it should display can be portrayed as this.
What would be the best way to approach this?
Attention:
Select MAX(Purchase), MAX(USD) from Table will not return the highest cost for the highest date, but will return the highest date and the highest cost (no matter of what date).
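The difference is easy to see on a few rows. The sketch below is only a demonstration with made-up data, run on SQLite through Python instead of SQL Server; the table name `tbl` and the sample values are assumptions. The two independent MAX() calls pick the latest date and the highest cost separately, while a ROW_NUMBER() sorted by date and then cost picks the highest cost on the latest date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical purchases for one item: the highest cost is NOT on the latest date
conn.executescript("""
CREATE TABLE tbl (Purchase TEXT, Item INTEGER, USD INTEGER);
INSERT INTO tbl VALUES
  ('2018-09-05', 845, 5),
  ('2018-09-20', 845, 2),
  ('2018-09-20', 845, 1);
""")

# Two independent aggregates: latest date and highest cost, unrelated to each other
naive = conn.execute(
    "SELECT MAX(Purchase), MAX(USD) FROM tbl WHERE Item = 845"
).fetchone()

# Highest cost on the latest date: sort by date, then cost, and keep the first row
correct = conn.execute("""
SELECT Purchase, USD
FROM (
    SELECT Purchase, USD,
           ROW_NUMBER() OVER (PARTITION BY Item
                              ORDER BY Purchase DESC, USD DESC) AS rn
    FROM tbl
    WHERE Item = 845
) x
WHERE rn = 1
""").fetchone()

print(naive)    # date and cost come from different rows
print(correct)  # date and cost come from the same row
```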
This is how I would do this (on at least SQL Server 2012):
To get only one record per month and item (with the highest cost on the latest date), I use a numbering for the purchase date and cost (per item and month) with a descending sort order, first by date, then by cost. In the next step, I filter out only those records where the numbering is 1 (max cost for max date per item and month) and use the LAG function to access the previous cost:
WITH
numbering (Purchase, Item, Cost, p_no) AS (
SELECT Purchase,Item, Cost
,ROW_NUMBER() OVER (PARTITION BY Item, EOMONTH(Purchase) ORDER BY Purchase DESC, Cost DESC)
FROM tbl
)
SELECT Purchase, Item, Cost
, LAG(Cost) OVER (PARTITION BY Item ORDER BY Purchase) AS LastCost
FROM numbering
WHERE p_no = 1
SELECT Date, item, usd,
LAG(Date, 1) OVER(Order by date asc) as FormerDate,
LAG(usd, 1) OVER(Order by date asc) as FormerUsd
from (select date, item, max(usd) as usd from Data group by date, item) t
This basically returns the day before the current entry with its max price.
For SQL server 2017 below query will work for sample data
select purchase,item,
substring(usd,CHARINDEX(',',usd),len(usd)) as USD,
substring(usd,1,CHARINDEX(',',usd)) as lastcost from
(select max(purchase) as purchase,item, STRING_AGG (usd, ',') AS usd
from
(
select purchase,item,max(usd) as usd from t
group by purchase,item
) as T group by item
) T1
For your results, you need to use MAX() and ROW_NUMBER() with OVER(), partitioning the records by Item, Year, and Month. This ensures that the sort is applied to each item, for each year and each month. ROW_NUMBER() acts as a simple way to put the last records at the top of the results, so you select row number 1 for each item to get the latest cost. After that, you use it as a subquery and refine it as needed. For a start (your sample), you'll need to use CASE in order to split the USD (previous and last cost). Then you do the rest from there (simple methods).
Note that it's important to sort the records by year first, then by month; if you need to include the day, add it after that. This way you ensure the records are sorted correctly.
So the query would look something like this:
SELECT
MAX(Purchase) Purchase
, MAX(Item) Item
, MAX(CASE WHEN LastCost > USD THEN LastCost ELSE NULL END) USD
, MAX(CASE WHEN LastCost = USD THEN LastCost ELSE NULL END) LastCost
FROM (
SELECT
Purchase
, Item
, USD
, MAX(USD) OVER(PARTITION BY Item, YEAR(Purchase), MONTH(Purchase)) LastCost
, ROW_NUMBER() OVER(PARTITION BY Item, YEAR(Purchase), MONTH(Purchase) ORDER BY MONTH(Purchase)) RN
FROM Table
) D
WHERE
RN = 1
with data as (
select Item, eomonth(Purchase) as PurchaseMonth, max(USD) as MaxUSD
from T
group by Item, eomonth(Purchase)
)
select
PurchaseMonth, Item,
lag(MaxUSD) over (partition by Item order by PurchaseMonth) as PriorUSD
from data;

Top 10 based on last month showing 6 previous months

I want to show a graph of income from different parties over the last 6 months, but only for the 10 people with the highest income in the last month.
This can change each month, since the top 10 people change when others deposit more money. The graph should show these 10 people's deposits over the last 6 months, with the top 10 determined by the last month's deposits only.
I have already tried using a LAG function and a RANK() OVER (PARTITION BY ...) function.
I don't understand why you'll need rank or lag functions.
You can simply use an IN statement:
SELECT * FROM YourTable t
WHERE t.depositDate between StartRangeDate and EndRangeDate
AND t.ID in(select ID from(SELECT s.id,sum(s.depositAmount) as total
from YourTable s
where s.date between ThisMonthStart and ThisMonthEnd
group by s.id) x
order by total desc
limit 10)
You can adjust the outer SELECT to return whatever you need, for example by adding a GROUP BY and summing the deposits.
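Here is a small runnable sketch of the IN-subquery idea with monthly totals in the outer query. It is an illustration under assumptions only: it runs on SQLite through Python, the table `deposits`, its columns, the date boundaries, and the sample rows are hypothetical, and `top_n` is set to 2 so the effect is visible on a tiny data set. Some databases do not allow LIMIT inside an IN subquery, in which case a join against a ranked derived table would be needed instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical deposits; June 2023 plays the role of "last month"
conn.executescript("""
CREATE TABLE deposits (id TEXT, depositDate TEXT, depositAmount REAL);
INSERT INTO deposits VALUES
  ('alice', '2023-05-10', 100), ('alice', '2023-06-15', 10),
  ('bob',   '2023-05-12',  20), ('bob',   '2023-06-03', 300),
  ('carol', '2023-04-01', 500), ('carol', '2023-06-20', 200),
  ('dave',  '2023-06-05',   5);
""")

query = """
SELECT id, strftime('%Y-%m', depositDate) AS month, SUM(depositAmount) AS total
FROM deposits
WHERE depositDate >= :range_start AND depositDate < :range_end
  AND id IN (
      -- the top depositors, ranked by last month's deposits only
      SELECT id
      FROM deposits
      WHERE depositDate >= :last_month_start AND depositDate < :range_end
      GROUP BY id
      ORDER BY SUM(depositAmount) DESC
      LIMIT :top_n
  )
GROUP BY id, month
ORDER BY id, month
"""
params = {
    "range_start": "2023-01-01",       # start of the 6-month window
    "range_end": "2023-07-01",         # exclusive end of the window
    "last_month_start": "2023-06-01",
    "top_n": 2,
}
rows = conn.execute(query, params).fetchall()
for row in rows:
    print(row)
```

Alice deposited the most in May but is excluded, because only June decides who is in the top group.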