SQL calculate time difference with previous row - sql

I have code that provides the order number, the estimated time of delivery, the actual time of delivery and the difference between the two times.
If the order is late I need to take that difference and add it to the next order to display the new estimated time of delivery.
How can I have SQL reach back to the previous row and get the calculated difference to add to the estimated time of delivery? LAG is not available since we are using 2012 SQL Shell.

This gets the DATEDIFF between the current row's estimated time and the previous row's actual time:
WITH orders AS
(SELECT *, ROW_NUMBER() OVER (ORDER BY datetimecolumn) AS rownum
FROM mytable
)
SELECT DATEDIFF(second, curr.est_tod, prev.act_tod)
FROM orders curr
INNER JOIN orders prev
ON prev.rownum = curr.rownum - 1
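The self-join idea above can be tried end to end with a small sketch. This one uses Python's sqlite3 module as a stand-in for SQL Server, so strftime('%s', ...) arithmetic replaces DATEDIFF and datetime(...) replaces DATEADD. The order_no column and the sample rows are made up for illustration. It only shows one way to carry the previous order's lateness onto the current estimate, not a definitive solution.

```python
import sqlite3

# In-memory SQLite database with a few made-up orders (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (order_no INTEGER, est_tod TEXT, act_tod TEXT);
INSERT INTO mytable VALUES
  (1, '2020-01-01 10:00:00', '2020-01-01 10:05:00'),
  (2, '2020-01-01 10:30:00', '2020-01-01 10:30:00'),
  (3, '2020-01-01 11:00:00', '2020-01-01 11:02:00');
""")

query = """
WITH orders AS (
    -- Number the rows so each one can be joined to its predecessor.
    SELECT *, ROW_NUMBER() OVER (ORDER BY est_tod) AS rownum
    FROM mytable
),
delays AS (
    -- prev is the row immediately before curr; the first row has no prev.
    SELECT curr.order_no, curr.est_tod, curr.rownum,
           COALESCE(CAST(strftime('%s', prev.act_tod) AS INTEGER)
                  - CAST(strftime('%s', prev.est_tod) AS INTEGER), 0) AS prev_delay_s
    FROM orders curr
    LEFT JOIN orders prev ON prev.rownum = curr.rownum - 1
)
-- Only a positive delay (a late previous order) shifts the estimate.
SELECT order_no, prev_delay_s,
       datetime(est_tod, '+' || MAX(prev_delay_s, 0) || ' seconds') AS new_est_tod
FROM delays
ORDER BY rownum
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The LEFT JOIN keeps the first order in the result even though it has no previous row. The sketch applies only the immediately preceding order's delay and does not cascade delays through a chain of late orders.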

Related

Grouping by last day of each month—inefficient running

I am attempting to pull month end balances from all accounts a customer has for every month. Here is what I've written. This runs correctly and gives me what I want—but it also runs extremely slowly. How would you recommend speeding it up?
SELECT DISTINCT
[AccountNo]
,SourceDate
,[AccountType]
,[AccountSub]
,[Balance]
FROM [DW].[dbo].[dwFactAccount] AS fact
WHERE SourceDate IN (
SELECT MAX(SOURCEDATE)
FROM [DW].[dbo].[dwFactAccount]
GROUP BY MONTH(SOURCEDATE),
YEAR (SOURCEDATE)
)
ORDER BY SourceDate DESC
I'd try a window function:
SELECT * FROM (
SELECT
[AccountNo]
,[SourceDate]
,[AccountType]
,[AccountSub]
,[Balance]
,ROW_NUMBER() OVER(
PARTITION BY accountno, EOMONTH(sourcedate)
ORDER BY sourcedate DESC
) as rn
FROM [DW].[dbo].[dwFactAccount]
)x
WHERE x.rn = 1
The row number will establish an incrementing counter in order of sourcedate descending. The counter will restart from 1 when the month in sourcedate changes (or the account number changes), thanks to the EOMONTH function quantising any date in a given month to the last date of that month (2020-03-09 12:34:56 becomes 2020-03-31, as do all other datetimes in March). Any similar tactic that quantises to a fixed date in the month would also work, such as using YEAR(sourcedate), MONTH(sourcedate).
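To see the counter restart per account and month, here is a small runnable sketch. It uses Python's sqlite3 module rather than SQL Server, so strftime('%Y-%m', ...) stands in for EOMONTH as the quantising expression. The table contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dwFactAccount (AccountNo TEXT, SourceDate TEXT, Balance INTEGER);
INSERT INTO dwFactAccount VALUES
  ('A', '2020-03-09', 100),
  ('A', '2020-03-31', 150),
  ('A', '2020-04-15', 200),
  ('B', '2020-03-10', 40),
  ('B', '2020-03-30', 50);
""")

query = """
SELECT AccountNo, SourceDate, Balance
FROM (
    SELECT AccountNo, SourceDate, Balance,
           -- rn = 1 marks the latest row for each account within each month
           ROW_NUMBER() OVER (
               PARTITION BY AccountNo, strftime('%Y-%m', SourceDate)
               ORDER BY SourceDate DESC
           ) AS rn
    FROM dwFactAccount
) x
WHERE x.rn = 1
ORDER BY AccountNo, SourceDate
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each account keeps one row per month, namely the row with the latest SourceDate in that month, even when that date is not the calendar month end.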
You need to build a date dimension table with Date as the primary key, and have SourceDate in the fact table reference it.
The date dimension table can have month, year, week, is_weekend, is_holiday, etc. columns. You join your fact table with the date dimension table, and you can then group the data by any of the date table's columns.
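A rough sketch of what such a date dimension could look like, again using Python's sqlite3 module as a stand-in for SQL Server. The table names dim_date and fact_account, the column names, and the sample rows are all assumptions made for this example.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_key   TEXT PRIMARY KEY,   -- ISO date, one row per calendar day
    year       INTEGER,
    month      INTEGER,
    is_weekend INTEGER
);
CREATE TABLE fact_account (
    AccountNo  TEXT,
    SourceDate TEXT REFERENCES dim_date(date_key),
    Balance    INTEGER
);
""")

# Populate the dimension for a small, made-up date range.
day = date(2020, 3, 1)
while day <= date(2020, 4, 30):
    conn.execute(
        "INSERT INTO dim_date VALUES (?, ?, ?, ?)",
        (day.isoformat(), day.year, day.month, int(day.weekday() >= 5)),
    )
    day += timedelta(days=1)

conn.executemany(
    "INSERT INTO fact_account VALUES (?, ?, ?)",
    [("A", "2020-03-09", 100), ("A", "2020-03-31", 150),
     ("A", "2020-04-15", 200), ("B", "2020-03-10", 40),
     ("B", "2020-03-30", 50)],
)

# Group by precomputed year/month columns instead of calling date functions per row.
rows = conn.execute("""
SELECT f.AccountNo, d.year, d.month, MAX(f.SourceDate) AS last_date_in_month
FROM fact_account f
JOIN dim_date d ON d.date_key = f.SourceDate
GROUP BY f.AccountNo, d.year, d.month
ORDER BY f.AccountNo, d.year, d.month
""").fetchall()

dim_count = conn.execute("SELECT COUNT(*) FROM dim_date").fetchone()[0]
for row in rows:
    print(row)
```

Whether this is faster than the window-function version depends on indexing and data volume, so it is worth checking against the execution plan.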
Your absolute first step should be to view the execution plan for the query and determine why the query is slow.
The following explains how to see a graphical execution plan:
Display an Actual Execution Plan
The steps to interpreting the plan and optimizing the query are too much for an SO answer, but you should be able to find some good articles on the topic by Googling. You could also post the plan in an edit to your question and get some real feedback on what steps to take to improve query performance.

How would I get the average of a previous date and update it?

I want to write a query that computes an average (that won't be hard), but once I have that average I want to save it somewhere. Let's say I have an average saved from last month in table_a.last_month_average. Now I run the query again, and the result is the current_month_average. I want to compare these two columns and see whether current_month_average increased from last_month_average.
After I compare I would like to output the biggest average number from those two. After I do this I would like to move the current_month_average to last_month_average so that one becomes the old average when next month the query runs.
Is this possible in SQL? Or maybe there is a better way to do this? Any suggestions will help.
"After I compare I would like to output the biggest average number from those two. After I do this I would like to move the current_month_average to last_month_average so that one becomes the old average when next month the query runs."
As I understand it, this operation selects the maximum monthly average from all historical records, so you don't need to keep separate current_month_average and last_month_average values. Instead, a table holding every month's average is more helpful. Assuming there is a table named monthaverage with columns (Id, Month, Average), you can query:
SELECT TOP 1 T1.*
, CASE WHEN
T1.Average > (SELECT TOP 1 T2.Average
FROM monthaverage T2
WHERE T2.Month < T1.Month
ORDER BY Month DESC)
THEN 'Increased'
ELSE 'Not Increased'
END
FROM monthaverage T1
ORDER BY T1.Average DESC
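If you can run it on SQL Server 2012 or later, you can use the LAG function to read the previous month's average instead (LAST_VALUE with its default window frame would only return the current row's own value). The query looks like:
SELECT TOP 1 *, CASE WHEN Average > LAG(Average) OVER (ORDER BY Month) THEN 'Increased' ELSE 'Not Increased' END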
FROM monthaverage
ORDER BY Average DESC
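The month-over-month comparison can be sketched with LAG as follows. This runs on SQLite through Python's sqlite3 module, so LIMIT 1 takes the place of TOP 1. The monthaverage rows are invented, and Month is stored as 'YYYY-MM' text purely for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE monthaverage (Id INTEGER PRIMARY KEY, Month TEXT, Average REAL);
INSERT INTO monthaverage (Month, Average) VALUES
  ('2020-01', 10.0),
  ('2020-02', 12.5),
  ('2020-03', 11.0);
""")

trend_sql = """
SELECT Month, Average,
       -- LAG reads the Average of the previous month; it is NULL for the first month,
       -- which makes the comparison fall through to the ELSE branch.
       CASE WHEN Average > LAG(Average) OVER (ORDER BY Month)
            THEN 'Increased' ELSE 'Not Increased' END AS trend
FROM monthaverage
"""

# Every month with its trend, oldest first.
all_rows = conn.execute(trend_sql + " ORDER BY Month").fetchall()

# The single month with the highest average, with its trend.
top_row = conn.execute(trend_sql + " ORDER BY Average DESC LIMIT 1").fetchone()

print(all_rows)
print(top_row)
```

Because every month stays in the table, nothing has to be moved from a "current" column to a "last" column when the next month is added.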

Get a Row if within certain time period of other row

I have a SQL statement that I am currently using to return a number of rows from a database:
SELECT
as1.AssetTagID, as1.TagID, as1.CategoryID,
as1.Description, as1.HomeLocationID, as1.ParentAssetTagID
FROM Assets AS as1
INNER JOIN AssetsReads AS ar ON as1.AssetTagID = ar.AssetTagID
WHERE
(ar.ReadPointLocationID='Readpoint1' OR ar.ReadPointLocationID='Readpoint2')
AND (ar.DateScanned between 'LastScan' AND 'Now')
AND as1.TagID!='000000000000000000000000'
I want a query that will get the row with the oldest DateScanned from this query and also get another row from the database if there was one within a certain period of time from this row (say 5 seconds, for example). Getting the oldest record would be relatively simple by selecting the first record in an ascending sort, but how would I also get the second record if it was within a certain time period of the first?
I know I could do this process with multiple queries, but is there any way to combine this process into one query?
The database that I am using is SQL Server 2008 R2.
Also please note that the DateScanned times are just placeholders and I am taking care of that in the application that will be using this query.
Here is a fairly general way to approach it. Get the oldest scan date using min() as a window function, then use date arithmetic to get any rows you want:
select t.* -- or whatever fields you want
from (SELECT as1.AssetTagID, as1.TagID, as1.CategoryID,
as1.Description, as1.HomeLocationID, as1.ParentAssetTagID,
min(DateScanned) over () as minDateScanned, DateScanned
FROM Assets AS as1
INNER JOIN AssetsReads AS ar ON as1.AssetTagID = ar.AssetTagID
WHERE (ar.ReadPointLocationID='Readpoint1' OR ar.ReadPointLocationID='Readpoint2')
AND (ar.DateScanned between 'LastScan' AND 'Now')
AND as1.TagID!='000000000000000000000000'
) t
where datediff(second, minDateScanned, DateScanned) <= 5;
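A cut-down, runnable version of the same pattern, using Python's sqlite3 module instead of SQL Server. Only a simplified AssetsReads table with made-up rows is used, and strftime('%s', ...) subtraction stands in for DATEDIFF(second, ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AssetsReads (AssetTagID TEXT, DateScanned TEXT);
INSERT INTO AssetsReads VALUES
  ('tag1', '2020-01-01 10:00:00'),
  ('tag2', '2020-01-01 10:00:03'),
  ('tag3', '2020-01-01 10:00:09');
""")

query = """
SELECT AssetTagID, DateScanned
FROM (
    SELECT AssetTagID, DateScanned,
           -- The empty OVER () makes the minimum visible on every row.
           MIN(DateScanned) OVER () AS minDateScanned
    FROM AssetsReads
) t
WHERE CAST(strftime('%s', DateScanned) AS INTEGER)
    - CAST(strftime('%s', minDateScanned) AS INTEGER) <= 5
ORDER BY DateScanned
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The oldest row always qualifies (its difference is 0), and any other row scanned within 5 seconds of it comes back in the same result set.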
I am not really sure of the SQL Server syntax, but you can do something like this:
SELECT * FROM (
SELECT
TOP 2
as1.AssetTagID,
as1.TagID,
as1.CategoryID,
as1.Description,
as1.HomeLocationID,
as1.ParentAssetTagID ,
ar.DateScanned,
LAG(ar.DateScanned) OVER (order by ar.DateScanned desc) AS lagging
FROM
Assets AS as1
INNER JOIN AssetsReads AS ar
ON as1.AssetTagID = ar.AssetTagID
WHERE (ar.ReadPointLocationID='Readpoint1' OR ar.ReadPointLocationID='Readpoint2')
AND (ar.DateScanned between 'LastScan' AND 'Now')
AND as1.TagID!='000000000000000000000000'
ORDER BY
ar.DateScanned DESC
) t
WHERE
lagging IS NULL OR DATEDIFF(second, DateScanned, lagging) < 5
I sort the results by DateScanned desc and take just the top 2 rows. I then use the LAG() function on the DateScanned field to get the DateScanned value of the previous row. For the topmost row the lagging value will be NULL, as it is the first record, but for the second row it will be the DateScanned value of the first row. You can then compare the two values to decide whether to display the second row or not.
More info on the LAG function: http://blog.sqlauthority.com/2011/11/15/sql-server-introduction-to-lead-and-lag-analytic-functions-introduced-in-sql-server-2012/

How can I make this query run efficiently?

In BigQuery, we're trying to run:
SELECT day, AVG(value)/(1024*1024) FROM (
SELECT value, UTC_USEC_TO_DAY(timestamp) as day,
PERCENTILE_RANK() OVER (PARTITION BY day ORDER BY value ASC) as rank
FROM [Datastore.PerformanceDatum]
WHERE type = "MemoryPerf"
) WHERE rank >= 0.9 AND rank <= 0.91
GROUP BY day
ORDER BY day desc;
which returns a relatively small amount of data. But we're getting the message:
Error: Resources exceeded during query execution. The query contained a GROUP BY operator, consider using GROUP EACH BY instead. For more details, please see https://developers.google.com/bigquery/docs/query-reference#groupby
What is making this query fail, the size of the subquery? Is there some equivalent query we can do which avoids the problem?
Edit in response to comments: If I add GROUP EACH BY (and drop the outer ORDER BY), the query fails, claiming GROUP EACH BY is not parallelizable here.
I wrote an equivalent query that works for me:
SELECT day, AVG(value)/(1024*1024) FROM (
SELECT data value, UTC_USEC_TO_DAY(dtimestamp) as day,
PERCENTILE_RANK() OVER (PARTITION BY day ORDER BY value ASC) as rank
FROM [io_sensor_data.moscone_io13]
WHERE sensortype = "humidity"
) WHERE rank >= 0.9 AND rank <= 0.91
GROUP BY day
ORDER BY day desc;
If I run only the inner query, I get 3,660,624 results. Is your dataset bigger than that?
The outer select gives me only 4 results when grouped by day. I'll try a different grouping to see if I can hit a limit there:
SELECT day, AVG(value)/(1024*1024) FROM (
SELECT data value, dtimestamp / 1000 as day,
PERCENTILE_RANK() OVER (PARTITION BY day ORDER BY value ASC) as rank
FROM [io_sensor_data.moscone_io13]
WHERE sensortype = "humidity"
) WHERE rank >= 0.9 AND rank <= 0.91
GROUP BY day
ORDER BY day desc;
Runs too, now with 57,862 different groups.
I tried different combinations to get to the same error. I was able to get the same error as you by doubling the amount of initial data. An easy "hack" to double the amount of data is changing:
FROM [io_sensor_data.moscone_io13]
To:
FROM [io_sensor_data.moscone_io13], [io_sensor_data.moscone_io13]
Then I get the same error. How much data do you have? Can you apply an additional filter? As you are already partitioning the percentile_rank by day, can you add an additional query to only analyze a fraction of the days (for example, only last month)?

analyze range and if true tell me

I want to see if the price of a stock has changed by 5% this week. I have data that captures the price every day. I can get the rows from the last 7 days by doing the following:
select price from data where date(capture_timestamp)>date(current_timestamp)-7;
But then how do I analyze that and see if the price has increased or decreased 5%? Is it possible to do all this with one sql statement? I would like to be able to then insert any results of it into a new table but I just want to focus on it printing out in the shell first.
Thanks.
It seems odd to have only one stock in a table called data. What you need to do is bring the two rows together for last week's and today's values, as in the following query:
select d.price
from data d cross join
data dprev
where cast(d.capture_timestamp as date) = date(current_timestamp) and
cast(dprev.capture_timestamp as date) = cast(current_timestamp as date)-7 and
d.price > dprev.price * 1.05
If the data table contains the stock ticker, the cross join would be an equijoin.
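Here is a small sketch of that equijoin variant, run on SQLite through Python's sqlite3 module. The ticker column, the sample prices, and the fixed :today parameter (used so the example is repeatable) are assumptions. It also checks for a move of more than 5% in either direction, since the question asks about both increases and decreases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data (ticker TEXT, price REAL, capture_timestamp TEXT);
INSERT INTO data VALUES
  ('ACME', 100.0, '2020-03-01 16:00:00'),
  ('ACME', 106.0, '2020-03-08 16:00:00'),
  ('BETA',  50.0, '2020-03-01 16:00:00'),
  ('BETA',  51.0, '2020-03-08 16:00:00'),
  ('GAMA', 200.0, '2020-03-01 16:00:00'),
  ('GAMA', 180.0, '2020-03-08 16:00:00');
""")

query = """
SELECT d.ticker, dprev.price AS prev_price, d.price AS cur_price,
       (d.price - dprev.price) * 100.0 / dprev.price AS pct_change
FROM data d
JOIN data dprev ON dprev.ticker = d.ticker          -- equijoin on the ticker
WHERE date(d.capture_timestamp) = date(:today)
  AND date(dprev.capture_timestamp) = date(:today, '-7 days')
  AND ABS(d.price - dprev.price) > dprev.price * 0.05   -- moved more than 5% either way
ORDER BY d.ticker
"""

# A fixed "today" keeps the example deterministic; use the current date in practice.
rows = conn.execute(query, {"today": "2020-03-08"}).fetchall()
for row in rows:
    print(row)
```

To store the matches instead of printing them, the same SELECT could feed an INSERT INTO ... SELECT statement.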
You may be able to query from the following subquery for whatever calculations you want to do. This assumes one record per day; the window literally covers the 7 preceding rows plus the current row, not 7 calendar days.
SELECT ticker, price, capture_ts
,MIN(price) OVER (PARTITION BY ticker ORDER BY capture_ts ROWS BETWEEN 7 PRECEDING AND CURRENT ROW) AS min_prev_7_records
,MAX(price) OVER (PARTITION BY ticker ORDER BY capture_ts ROWS BETWEEN 7 PRECEDING AND CURRENT ROW) AS max_prev_7_records
FROM data