Cumulative Difference - SQL

I have a table, Meter_Reading:

MeterID | Reading | DateRead
--------+---------+-----------
1       | 10      | 1-Jan-2012
1       | 20      | 2-Feb-2012
1       | 30      | 1-Mar-2012
1       | 60      | 2-Apr-2012
1       | 80      | 1-May-2012
The reading is a cumulative value, so I need to calculate the difference between the previous month's reading and the current month's reading.
Could you help me figure out how to generate a view where I can see the consumption (current month reading - previous month reading) for each month?
I tried using BETWEEN:

select address, reading as Consumption, dateread
from ServiceAddress, Reading, Meter
where address like '53 Drip Drive%'
  and dateread between to_date('01-JAN-2012', 'DD-MON-YYYY')
                   and to_date('30-SEP-2012', 'DD-MON-YYYY')
  and serviceaddress.serviceaddid = meter.serviceaddid
  and meter.meterid = reading.meterid;
but all I got was the readings for each month, not the difference.
How can I make it list the monthly consumption?

Try analytic functions. Something like this should do the trick:
SELECT meterid, dateread,
       reading - LAG(reading, 1, 0) OVER (PARTITION BY meterid ORDER BY dateread) AS consumption
FROM meter_reading
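Applied to the sample data above, the consumption column works out to 10, 10, 10, 30, 20: the third argument to LAG supplies a default of 0 for the first row, and each later row subtracts the previous month's reading from the current one.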

You can use the LAG function to get the reading for the prior month. The query you posted references three tables (ServiceAddress, Reading, and Meter), none of which is the Meter_Reading table you posted the structure and data for. I'll ignore that query, since I'm not sure what the data in those tables looks like, and focus on the Meter_Reading table you posted data for:
SELECT MeterID,
       DateRead,
       Reading,
       PriorReading,
       Reading - PriorReading AS AmountUsed
FROM (SELECT MeterID,
             DateRead,
             Reading,
             nvl(lag(Reading) over (partition by MeterID
                                    order by DateRead),
                 0) AS PriorReading
      FROM meter_reading)
I assume that when there is no prior reading, you want to treat the prior reading as 0.
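If you want this as a view, here is a minimal sketch wrapping the query above; the view name monthly_consumption is just a placeholder:

-- view name is a placeholder
CREATE OR REPLACE VIEW monthly_consumption AS
SELECT MeterID,
       DateRead,
       Reading,
       PriorReading,
       Reading - PriorReading AS AmountUsed
FROM (SELECT MeterID,
             DateRead,
             Reading,
             nvl(lag(Reading) over (partition by MeterID
                                    order by DateRead),
                 0) AS PriorReading
      FROM meter_reading);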

Related

Cross apply historical date range in BigQuery

I have a growing table of orders which looks something like this:

units_sold | timestamp
-----------+---------------------
1          | 2021-03-02 10:00:00
2          | 2021-03-02 11:00:00
4          | 2021-03-02 12:00:00
3          | 2021-03-03 13:00:00
9          | 2021-03-03 14:00:00
I am trying to partition the table by day and gather statistics on units sold on each day and on the day before. I can pretty easily get the units sold today and yesterday for just today, but I need to cross apply a date range for every date in my orders table.
The expected result would look like this:

units_sold_yesterday | units_sold_today | date_measured
---------------------+------------------+--------------
12                   | 7                | 2021-03-02
NULL                 | 12               | 2021-03-03
One way of doing it is to create or append the order data to a new table every day. However, that table could grow very large, and I also need the historical data.
In my mind's eye, I know I have to cascade the data, so that BigQuery compares the data to "today's date" as it shifts across all the dates in the table.
I'm thinking this shift could come from a cross apply of all the distinct dates in the table: I would get a copy of the orders table for each date, but with a different "today's date" column, and I could derive the units_sold_today data by date-diffing the sales date against that column.
This would still, however, create a massive amount of data to process, and I guess maybe there is a simple function for this in BigQuery or standard SQL syntax.
This sounds like aggregation and lag():
select timestamp_trunc(timestamp, day), sum(units_sold) as sold_today,
       lag(sum(units_sold)) over (order by min(timestamp)) as sold_yesterday
from t
group by 1
order by 1;
Note: This assumes that you have data for every day.
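If you don't have data for every day, one workaround (a sketch of my own, not part of the answer above) is to left join the daily totals onto a calendar generated with GENERATE_DATE_ARRAY, so that LAG always looks at the previous calendar day; the date range below is illustrative:

select d as date_measured,
       ifnull(t.units_sold_today, 0) as units_sold_today,
       lag(ifnull(t.units_sold_today, 0)) over (order by d) as units_sold_yesterday
from unnest(generate_date_array('2021-03-01', '2021-03-31')) d  -- illustrative range
left join (
  select date(timestamp) as date_measured,
         sum(units_sold) as units_sold_today
  from `project.dataset.table`
  group by date_measured
) t
on t.date_measured = d
order by d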
Consider below
select date_measured, units_sold_today,
       lag(units_sold_today) over (order by date_measured) as units_sold_yesterday
from (
  select date(timestamp) as date_measured,
         sum(units_sold) as units_sold_today
  from `project.dataset.table`
  group by date_measured
)
If applied to the sample data in your question, the output is:

date_measured | units_sold_today | units_sold_yesterday
--------------+------------------+---------------------
2021-03-02    | 7                | NULL
2021-03-03    | 12               | 7

Grouping by last day of each month - inefficient query

I am attempting to pull month-end balances from all accounts a customer has, for every month. Here is what I've written. It runs correctly and gives me what I want, but it runs extremely slowly. How would you recommend speeding it up?
SELECT DISTINCT
[AccountNo]
,SourceDate
,[AccountType]
,[AccountSub]
,[Balance]
FROM [DW].[dbo].[dwFactAccount] AS fact
WHERE SourceDate IN (
SELECT MAX(SOURCEDATE)
FROM [DW].[dbo].[dwFactAccount]
GROUP BY MONTH(SOURCEDATE),
YEAR (SOURCEDATE)
)
ORDER BY SourceDate DESC
I'd try a window function:
SELECT * FROM (
SELECT
[AccountNo]
,[SourceDate]
,[AccountType]
,[AccountSub]
,[Balance]
,ROW_NUMBER() OVER(
PARTITION BY accountno, EOMONTH(sourcedate)
ORDER BY sourcedate DESC
) as rn
FROM [DW].[dbo].[dwFactAccount]
)x
WHERE x.rn = 1
The row number establishes an incrementing counter in order of sourcedate descending. The counter restarts from 1 when the month in sourcedate changes (or the account number changes), because the EOMONTH function quantises any date in a given month to the last date of that month (2020-03-09 12:34:56 becomes 2020-03-31, as do all other datetimes in March). Any similar tactic that quantises to a fixed date in the month would also work, such as partitioning by YEAR(sourcedate), MONTH(sourcedate).
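For example, the ROW_NUMBER call above could equivalently partition on year and month:

ROW_NUMBER() OVER(
    PARTITION BY accountno, YEAR(sourcedate), MONTH(sourcedate)
    ORDER BY sourcedate DESC
) as rn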
You need to build a date dimension table with the date as its primary key, and have SourceDate in the fact table reference that dimension table.
The date dimension table can have month, year, week, is_weekend, is_holiday, etc. columns. You join your fact table to the date dimension table, and you can then group the data by any column in the date table you want.
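A minimal sketch of that join, assuming a hypothetical DimDate table; the Date and IsMonthEnd column names here are illustrative, not an existing schema:

SELECT f.AccountNo, f.SourceDate, f.AccountType, f.AccountSub, f.Balance
FROM [DW].[dbo].[dwFactAccount] f
JOIN [DW].[dbo].[DimDate] d               -- hypothetical dimension table
  ON d.[Date] = CAST(f.SourceDate AS date)
WHERE d.IsMonthEnd = 1                    -- assumed flag column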
Your absolute first step should be to view the execution plan for the query and determine why the query is slow.
The following explains how to see a graphical execution plan:
Display an Actual Execution Plan
The steps for interpreting the plan and optimizing the query are too much for an SO answer, but you should be able to find some good articles on the topic by Googling. You could also post the plan in an edit to your question and get some real feedback on what steps to take to improve query performance.

How to check if dates overlap on different lines in SQL Server?

I have a database with electricity meter readings. Sometimes people get a new meter; the original meter then gets an end date, and the new meter gets a start date with its end date remaining NULL. This can happen multiple times in a year, and I want to know whether there are any gaps in measurement. In other words, I need to figure out if end date 1 is the same as start date 2, and so on.
Sample data:
cust_id  meter_id  start_date  end_date
---------------------------------------
a        1         2017-01-01  2017-05-02
a        2         2017-05-02  NULL
b        3         2017-01-01  2017-06-01
b        4         2017-06-05  NULL
This is what the data looks like and the result I am looking for is that for customer a the end date of meter 1 is equal to the start date of meter 2. For customer b however, there are 4 days between the end date of meter 3 and the start date of meter 4. That is something I want to flag.
I found customers for whom this can happen up to 8 times in the period I am researching. I tried something with nested queries and very complex CASE expressions, but even I lost my way in it, so I was wondering if someone here has an idea of how to get to the answer a little more smartly.
You can get the offending rows using lag():

select r.*
from (select r.*,
             lag(end_date) over (partition by cust_id order by start_date) as prev_end_date,
             row_number() over (partition by cust_id order by start_date) as seqnum
      from readings r
     ) r
where prev_end_date <> start_date or
      (prev_end_date is null and seqnum > 1);

Note that the partition is by cust_id alone: each meter has only one row, so partitioning by cust_id, meter_id would make prev_end_date NULL on every row.
Guessing there is now a better way to pull this off using LEAD and LAG, but back on SQL Server 2008 R2 I wrote an article called T-SQL: Identify bad dates in a time series; you can modify the big CTE in the middle of the article to handle your definition of a bad date.
Good luck. There's too much detail in the article to post in a single SO answer, otherwise I'd do that here.
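For reference, a LEAD-based sketch of the same check (using the readings table from the answer above); it flags any meter whose end_date does not line up with the next meter's start_date:

select *
from (select r.*,
             lead(start_date) over (partition by cust_id order by start_date) as next_start
      from readings r
     ) r
where end_date is not null
  and next_start <> end_date;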

Display a rolling 12 weeks chart in SSRS report

I am calling the data query in SSRS like this:
SELECT * FROM [DATABASE].[dbo].[mytable]
The current week is the last week from the query (e.g. 3/31 - 4/4), each earlier number represents the week before, back to 12 weeks prior to this week, and the result is displayed in a point chart.
How can I group all the visits for all locations by week and add them to the chart?
I suggest updating your SQL query to group by a descending DENSE_RANK of DATEPART(week, ARRIVED_DATE). In this example, I have one column for Visits because I couldn't tell which columns you were using to get your visit count:
-- load some test data
if object_id('tempdb..#MyTable') is not null
    drop table #MyTable

create table #MyTable(ARRIVED_DATE datetime, Visits int)

while (select count(*) from #MyTable) < 1000
begin
    insert into #MyTable values
        (dateadd(day, round(rand()*100, 0), '2014-01-01'), round(rand()*1000, 0))
end

-- Sum Visits by week number relative to today's week number
select
    dense_rank() over(order by datepart(week, ARRIVED_DATE) desc) [Week],
    sum(Visits) Visits
from #MyTable
where datepart(week, ARRIVED_DATE) >= datepart(week, getdate()) - 11
group by datepart(week, ARRIVED_DATE)
order by datepart(week, ARRIVED_DATE)
Let me know if I can provide any more detail to help you out.
You are going to want to do the grouping of the visits within SQL. You should be able to add a calculated column to your table, something like WorkWeek, calculated from the day difference to a fixed day of the week such as Sunday. This column will then be your X value, rather than the date field you were using.
Here is a good article that goes into first day of week: First Day of Week
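A minimal sketch of such a calculated column against the #MyTable test table above, using 2014-01-05 (a Sunday) as an arbitrary anchor date:

-- integer division buckets each Sun-Sat range into one week number
alter table #MyTable
    add WorkWeek as (datediff(day, '2014-01-05', ARRIVED_DATE) / 7)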

Oracle SQL sub-query: how to use the result from query A to restrict query B

Goal:
Combine two queries I currently run.
Have the WEEK from query 1 be a filtering criterion for query 2.
Query 1:
----------------------------------------------------
-- ************************************************
-- Accounts Receivable (WEEKLY) snapshot
-- ************************************************
----------------------------------------------------
SELECT
    TRUNC(TX.ORIG_POST_DATE, 'WW') AS WEEK,
    SUM(TX.AMOUNT) AS OUTSTANDING
FROM
    TX
WHERE
    --Transaction types
    (TX.DETAIL_TYPE = 'Charges' OR
     TX.DETAIL_TYPE = 'Payments' OR
     TX.DETAIL_TYPE = 'Adjustments')
GROUP BY
    TRUNC(TX.ORIG_POST_DATE, 'WW')
ORDER BY
    TRUNC(TX.ORIG_POST_DATE, 'WW')
Output Query 1:
WEEK OUTSTANDING
1/1/2012 18203.95
1/8/2012 17605
1/15/2012 19402.33
1/22/2012 18693.45
1/29/2012 19100
Query 2:
----------------------------------------------------
-- ************************************************
-- Weekly Charge AVG over previous 13 weeks based on WEEK above
-- ************************************************
----------------------------------------------------
SELECT
    SUM(TX.AMOUNT) / 91
FROM
    TX
WHERE
    --Post date
    TX.ORIG_POST_DATE <= WEEK AND
    TX.ORIG_POST_DATE >= WEEK - 91 AND
    --Charges
    (TX.DETAIL_TYPE = 'Charge')
Output Query 2:
Thirteen_Week_Avg
1890.15626
Desired Output
WEEK OUTSTANDING Thirteen_Week_Avg
1/1/2012 18203.95 1890.15626
1/8/2012 17605 1900.15626
1/15/2012 19402.33 1888.65132
1/22/2012 18693.45 1905.654
1/29/2012 19100 1900.564
Note that the Thirteen_Week_Avg covers the 13 weeks prior to the WEEK field, so it changes each week as the window of the average moves forward.
Also, what tutorials do you know of that I could read to better understand the solution to this type of question?
Try using an analytic function such as:

select WEEK, sum(OUTSTANDING) as OUTSTANDING, THIRTEEN_WEEK_AVG
from (select trunc(TX.ORIG_POST_DATE, 'WW') as WEEK,
             AMOUNT as OUTSTANDING,
             avg(TX.AMOUNT)
               over (order by trunc(TX.ORIG_POST_DATE, 'WW')
                     range numtodsinterval(7 * 13, 'day') preceding)
               as THIRTEEN_WEEK_AVG
      from TX
      where (TX.DETAIL_TYPE = 'Charges'
             or TX.DETAIL_TYPE = 'Payments'
             or TX.DETAIL_TYPE = 'Adjustments'))
group by WEEK, THIRTEEN_WEEK_AVG
order by WEEK
An introduction to analytic functions can be found here, and how NUMTODSINTERVAL works is explained here.
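As a quick illustration (my own example), NUMTODSINTERVAL just converts a number into a day-to-second interval, which is what gives the window its 91-day width:

select numtodsinterval(7 * 13, 'day') as thirteen_weeks
from dual;
-- returns the interval +91 00:00:00 (exact display varies by client)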
My first thought is that this is best handled by a stored procedure that sets up two cursors, one for each of the queries, with each cursor taking a week parameter. You could call the first cursor, which outputs the week and outstanding amount, and have it loop through however many times, moving back one week each time through. Then pass that week to the thirteen-week-average cursor and let it output the average amount.
If you just want it on the screen, you can use dbms_output.put_line. If you want to write it to a file such as a CSV, then you need to set up a file handler and all the associated plumbing to create/open/write/save a file.
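A rough sketch of that cursor loop, assembled from the two queries in the question (an outline only; table and column names are taken from the question):

declare
  v_avg number;
begin
  for rec in (select trunc(orig_post_date, 'WW') as week,
                     sum(amount) as outstanding
              from tx
              where detail_type in ('Charges', 'Payments', 'Adjustments')
              group by trunc(orig_post_date, 'WW')
              order by 1)
  loop
    -- thirteen-week charge average ending at this week
    select sum(amount) / 91
      into v_avg
      from tx
     where detail_type = 'Charge'
       and orig_post_date <= rec.week
       and orig_post_date >= rec.week - 91;
    dbms_output.put_line(rec.week || ' ' || rec.outstanding || ' ' || v_avg);
  end loop;
end;
/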
O'Reilly has a pretty good PL/SQL book that explains procedures and cursors well.