Calculating difference in column value from one row to the next - sql

I am using an Access DB to keep track of utility usage on hundreds of accounts. The meters in these accounts have one consumption reading per month. I need to take this month's meter reading and subtract the previous month's reading from it to get the consumption for this month. I know that in SQL Server there are LEAD/LAG functions that can calculate those differences. Is there a similar function in Access, or is there a simple way to subtract the value in one row from the one in the row above it?
Ex.

Billed Date   Meter Reading   Consumption
1/26/2014     0               1538
2/25/2014     3163            1625
3/27/2014     4567            1404
4/28/2014     5672            1105
5/26/2014     7065            1393
7/29/2014     8468            1403

I do not quite get some of your results, but I think you want something like:
SELECT Meters.MeterDate,
       Meters.MeterReading,
       (SELECT TOP 1 MeterReading
        FROM Meters m
        WHERE m.MeterDate < Meters.MeterDate
        ORDER BY m.MeterDate DESC) AS LastReading,
       [MeterReading] - Nz([LastReading], 0) AS MonthResult
FROM Meters
ORDER BY Meters.MeterReading;
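With the sample data above, and assuming each row of Meters holds one billed date and its meter reading, the query should return something like:

MeterDate   MeterReading   LastReading   MonthResult
1/26/2014   0                            0
2/25/2014   3163           0             3163
3/27/2014   4567           3163          1404
4/28/2014   5672           4567          1105
5/26/2014   7065           5672          1393
7/29/2014   8468           7065          1403

Each month's consumption is the current reading minus the previous one (Nz turns the missing previous reading on the first row into 0); this matches your last four consumption figures but not the first two.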

Related

Perdurance of a mean over a threshold

I hope I can make this understandable, sorry if my English isn't perfect.
I have a database composed of dated data (measured every 5 minutes since March 2017).
My boss wants me to work in C# and/or SQL, but I'm still a beginner in those (I've always worked in R).
The goal is to find the moments where the mean (over an hour or more) exceeds a threshold, and for how long.
I've tried doing this by first computing a moving average:
SELECT DATEPART(YEAR, [Date]), DATEPART(MONTH, [Date]),
       DATEPART(DAY, [Date]), DATEPART(HOUR, [Date]),
       DATEPART(MINUTE, [Date]) AS "minute",
       AVG(Mesure) OVER (ORDER BY DATEPART(YEAR, [Date])
                         ROWS BETWEEN 11 PRECEDING AND CURRENT ROW) AS "moving_average"
FROM My_data_base
WHERE Code_Prm = 920
I do have to keep the "Where" clause because it's how I can select only the value I need to work on .
From here I don't know if I could find a way to add the "Perdurance" of the mean, for example by concatenating when multiples rows return an average superior to X.
Or if I should rely on C# with multiple if conditions to try and get what I want
Hope this is understandable, thanks
EDIT:
The data is stored in 3 fields (I don't know if there is a better way to show it):
Date                  Code_prm   Mesure
2017-03-10 11:18:00   920        X
2017-03-10 11:18:00   901        X
2017-03-10 11:18:00   903        X
2017-03-10 11:23:00   920        X
The expected result would be the average over an hour, for example from 11:18 to 12:18, but only where that average is above X (I think I more or less have that with the moving average).
The next step, and what I'm looking for, is how to know whether the mean stays above X for more than an hour, and then for how long.
"Hour" means any hour, I guess, so 12 rows, since there is a value every 5 minutes, and I'm sure there are no missing values!
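A possible direction in SQL Server (2012+), sticking with the table and column names from the question; the threshold value 50 is only a placeholder for X. The idea is the usual gaps-and-islands trick: flag rows whose hourly moving average is above the threshold, group consecutive flagged rows together, and keep groups spanning at least 12 five-minute readings. This is a sketch under those assumptions, not a tested solution:

WITH hourly AS (
    -- hourly moving average per 5-minute reading (12 rows = 1 hour)
    SELECT [Date],
           AVG(CAST(Mesure AS float)) OVER (ORDER BY [Date]
                                            ROWS BETWEEN 11 PRECEDING AND CURRENT ROW) AS moving_average
    FROM My_data_base
    WHERE Code_Prm = 920
),
flagged AS (
    -- consecutive rows on the same side of the threshold get the same grp value
    SELECT [Date],
           moving_average,
           CASE WHEN moving_average > 50 THEN 1 ELSE 0 END AS above_threshold,
           ROW_NUMBER() OVER (ORDER BY [Date])
             - ROW_NUMBER() OVER (PARTITION BY CASE WHEN moving_average > 50 THEN 1 ELSE 0 END
                                  ORDER BY [Date]) AS grp
    FROM hourly
)
SELECT MIN([Date])   AS run_start,
       MAX([Date])   AS run_end,
       COUNT(*) * 5  AS run_minutes        -- how long the mean stayed above X
FROM flagged
WHERE above_threshold = 1
GROUP BY grp
HAVING COUNT(*) >= 12;                     -- at least a full hour of 5-minute readings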

SQL performance issues with window functions on daily basis

Given ~23 million users, what is the most efficient way to compute the cumulative number of logins within the last X months for any given day (even when no login was performed)? A customer's start date is their first ever login; the end date is today.
Desired output
c_id   day          nb_logins_past_6_months
--------------------------------------------
1      2019-01-01   10
1      2019-01-02   10
1      2019-01-03   9
...
1      today        5
➔ One line per user per day with the number of logins between current day and 179 days in the past
Approach 1
1. Cross join each customer ID with calendar table
2. Left join on login table on day
3. Compute window function (i.e. `sum(nb_logins) over (partition by c_id order by day rows between 179 preceding and current row)`)
+ Easy to understand and maintain
- Really heavy, close to impossible to run on a daily basis
- Incremental runs do not bring much benefit: we still have to go 179 days into the past
Approach 2
1. Cross join each customer ID with calendar table
2. Left join on login table on day between today and 179 days in the past
3. Group by customer ID and day to get nb logins within 179 days
+ Easier to do incremental
- Table at step 2 exceeds 300 billion rows
What is the common way to deal with this, knowing that this is not the only use case; we have to compute other columns like this (nb logins in the past 12 months, etc.)?
In standard SQL, you would use:
select l.*,
count(*) over (partition by customerid
order by login_date
range between interval '6 month' preceding and current row
) as num_logins_180day
from logins l;
This assumes that the logins table has a date of the login with no time component.
I see no reason to multiply 23 million users by 180 days to generate a result set in excess of 4 billion rows to answer this question.
For performance, don't do the entire task all at once. Instead, gather subtotals at the end of each month (or day or whatever makes sense for your data). Then SUM up the subtotals to provide the 'report'.
More discussion (with a focus on MySQL): http://mysql.rjweb.org/doc.php/summarytables
(You should tag questions with the specific product; different products have different syntax/capability/performance/etc.)
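To make the subtotal idea concrete, here is a minimal sketch assuming hypothetical names (a raw logins(c_id, login_date) table and a daily_logins summary); the syntax is standard-SQL flavoured, and the CREATE TABLE ... AS form and interval frame will vary by product:

-- 1) Keep a small per-day summary instead of re-scanning the raw login table.
CREATE TABLE daily_logins AS
SELECT c_id, login_date AS day, COUNT(*) AS nb_logins
FROM logins
GROUP BY c_id, login_date;

-- 2) The rolling 180-day figure then only reads the summary table.
--    (Days with no logins still need the calendar cross join from Approach 1,
--     but it now joins against a far smaller table.)
SELECT c_id,
       day,
       SUM(nb_logins) OVER (PARTITION BY c_id
                            ORDER BY day
                            RANGE BETWEEN INTERVAL '179' DAY PRECEDING AND CURRENT ROW)
           AS nb_logins_past_6_months
FROM daily_logins;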

How to calculate on created fields? Why is the calculation wrong?

I am working on a workforce analysis project, and I did some CASE WHEN conditional calculations in Google Data Studio. However, after I successfully created the new fields, I couldn't do further calculations based on the fields I created.
Based on my raw data, I generated start_headcount, new_hires, terminated, and end_headcount by applying CASE WHEN conditional calculations. However, I failed in the next step, calculating the turnover rate and retention rate.
The formula for turnover rate is
terms/((start_headcount+end_headcount)/2)
and for retention it is
end_headcount/start_headcount
However, the result is wrong. Part of my table is as below:
Supervisor   sheadcount   newhire   terms   eheadcount   turnover   Retention
A            1            3         1       3            200%       0%
B            6            2         2       6            200%       500%
C            6            1         3       4            600%       300%
So the result is wrong. The turnover rate for A should be 1/((1+3)/2)=50%; For B should be 2/((6+6)/2)=33.33%.
I don't know why it is going wrong. Can anyone help?
For example, here is what I wrote for start_headcount for each employee:
CASE
  WHEN Last Hire Date < '2018-01-01' AND Termination Date >= '2018-01-01'
    OR Last Hire Date < '2018-01-01' AND Termination Date IS NULL
  THEN 1
  ELSE 0
END
which means an employee who meets the above criteria gets 1, and the records are then all grouped under a supervisor. I think this might be why the summed turnover rate is wrong: the rate is not calculated on the grouped data but on each record, and then summed up.
Most likely you are trying to do both steps within the same query, so newly created fields like start_headcount etc. are not yet visible within the same SELECT statement. Instead, you need to put the first calculation into a subquery, as in the example below.
#standardSQL
SELECT *, terms/((start_headcount+end_headcount)/2) AS turnover
FROM (
<query for your first step>
)
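To make the grouping point concrete: the ratios should be computed from headcounts summed per supervisor, not from per-record ratios. A sketch along those lines, where the inner query is your first-step CASE WHEN calculation and every field name is a placeholder:

#standardSQL
SELECT supervisor,
       SUM(start_headcount) AS start_headcount,
       SUM(terms)           AS terms,
       SUM(end_headcount)   AS end_headcount,
       -- e.g. supervisor A: 1 / ((1 + 3) / 2) = 50%
       SUM(terms) / ((SUM(start_headcount) + SUM(end_headcount)) / 2) AS turnover,
       SUM(end_headcount) / SUM(start_headcount) AS retention
FROM (
  <query for your first step>
)
GROUP BY supervisor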

Accessing data from multiple rows

So I am having some trouble writing a SQL query. It is a financial stock problem: we have two tables, one named A and one named B. The dates are divided into periods, and we want to calculate the investment value for the next period based on some criteria of the current period.
For instance, to calculate the investment value of period 2, we first compare the stock price on the first date of period 1 from table A with the strike price for the same date from table B and take the larger of the two. The investment on the first date is given as 10,000, so the number of shares is 10000/max(price, strike), and the investment value for period 2 is that number of shares multiplied by the next period's max(price, strike). I know how to get the max, which can be done with either CASE or a max, but the difficulty I am facing is how to get the investment value of the previous period. The example above is an exception because we know the value of the first day; however, to calculate the investment value for period 3, you first need the value of period 2, and this is where I am stuck.
EDIT
Table A
Date      Price
1/16/15   206
2/20/15   208
3/20/15   205
Table B
Date      Strike
1/16/15   195
2/20/15   201
3/20/15   206
For example, the number of shares on 2/20/2015 is 10000/206 = 48.54
And the investment value is 48.54 * 208 = 10096.
206 is the max of 206 and 195, and 208 is the max of 208 and 201.
Thanks in advance!
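One common way to carry a value from one period into the next is a recursive CTE. The sketch below assumes SQL Server syntax, placeholder table names TableA/TableB with one row per period, and the rule taken from the worked example above: each period's value equals the previous value divided by the previous period's max(price, strike) and multiplied by the current period's max(price, strike). It illustrates the pattern, not a drop-in solution:

WITH periods AS (
    -- one row per period: the larger of price and strike on that period's first date
    SELECT ROW_NUMBER() OVER (ORDER BY a.[Date]) AS period_no,
           CASE WHEN a.Price > b.Strike THEN a.Price ELSE b.Strike END AS max_price
    FROM TableA a
    JOIN TableB b ON b.[Date] = a.[Date]
),
invest AS (
    -- period 1 starts with the given 10,000
    SELECT period_no, max_price, CAST(10000 AS float) AS investment
    FROM periods
    WHERE period_no = 1
    UNION ALL
    -- shares bought at the previous period's max, revalued at the current period's max
    SELECT p.period_no, p.max_price,
           (i.investment / i.max_price) * p.max_price
    FROM invest i
    JOIN periods p ON p.period_no = i.period_no + 1
)
SELECT period_no, investment
FROM invest
ORDER BY period_no;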

Identifying trends and classifying using SQL

I have a table xyz with three columns: rcvr_id, mth_id and tpv. rcvr_id is an id given to a customer; mth_id is a column which stores the month number (mth_id is calculated as (year - 1900) * 12 + month, so for example Dec 2011 has mth_id 1344, Jan 2012 has 1345, etc.); tpv is a variable which shows the customer's transaction amount.
Example table
rcvr_id   mth_id   tpv
1         1344     23
2         1344     27
3         1344     54
1         1345     98
3         1345     102
...and so on
P.S. If a customer does not have a transaction in a given month, his row for that month won't exist.
Now, the question. Based on transactions for the months 1327 to 1350, I need to classify a customer as steady or sporadic.
Here is a description (given as an image, which is for one customer; I have millions of customers).
How do I go about it? I have no clue how to identify trends in SQL, or rather how to do it the best way possible.
Also, I am working on Teradata.
OK, I have found out how to get the standard deviation. Now the important question is: how do I set a standard deviation limit on my own? I can't just arbitrarily say "if the standard deviation is above 40% he is sporadic, else steady". I thought of calculating the average standard deviation across all customers and, if a customer is above that, classifying him as sporadic, otherwise steady. But I feel there could be better logic.
I would suggest the STDDEV_POP function - a higher value indicates a greater variation in values.
select rcvr_id, STDDEV_POP(tpv)
from yourtable
group by rcvr_id
STDDEV_POP is the function for Standard Deviation
If this doesn't differentiate enough, you may need to look at regression functions and variance.
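Building on that, one way to express the idea from the question (compare each customer to the average across all customers) is to use the coefficient of variation, i.e. the standard deviation divided by the mean, so customers who simply transact larger amounts are not penalised. A sketch only, reusing the table name and month range from the question and assuming a Teradata version that supports WITH and NULLIFZERO; the cutoff is the asker's own suggestion, not a recommendation:

WITH per_customer AS (
    SELECT rcvr_id,
           STDDEV_POP(tpv) / NULLIFZERO(AVG(tpv)) AS cv   -- relative variation per customer
    FROM xyz
    WHERE mth_id BETWEEN 1327 AND 1350
    GROUP BY rcvr_id
)
SELECT p.rcvr_id,
       p.cv,
       CASE WHEN p.cv > o.avg_cv THEN 'sporadic' ELSE 'steady' END AS classification
FROM per_customer p
CROSS JOIN (SELECT AVG(cv) AS avg_cv FROM per_customer) o;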