SQL: accommodating meter rollover in utility meter readings

The job is actually a machine cycle count that rolls over to zero at 32,000 but the utility / electricity / odometer analogy gets the idea across.
Let's say we have a three digit meter. After 999 it will roll over to 0.
Reading  Value  Difference
1        990    -
2        992    2
3        997    5
4        003    6 *
5        008    5
I have a CTE query generating the difference between rows but the line
Cur.Value - Prv.Value as Difference
on reading 4 above returns -994 due to the clock rollover. (It should return '6'.)
Can anyone suggest an SQL trick to accommodate the rollover?
e.g., here's a trick to get around SQL's lack of a GREATEST function:
-- SQL doesn't have LEAST/GREATEST functions so we use a math trick
-- to return the greater number:
-- 0.5*((A+B) + abs(A-B))
0.5 * (Cur._VALUE - Prv._VALUE + ABS(Cur._VALUE - Prv._VALUE)) AS Difference
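(Quick check with A = 5, B = 3: 0.5 × ((5 + 3) + |5 − 3|) = 0.5 × 10 = 5, the greater of the two; the line above applies it with B = 0, which clamps a negative difference at zero.)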
Can anyone suggest a similar trick for the rollover problem?
Fiddle: http://sqlfiddle.com/#!3/ce9d4/10

You could use a CASE statement to detect the negative value (which indicates a rollover condition) and compensate for it:
--Create CTE
;WITH tblDifference AS
(
    SELECT Row_Number() OVER (ORDER BY Reading) AS RowNumber,
           Reading, Value
    FROM t1
)
SELECT
    Cur.Reading AS This,
    Cur.Value AS ThisRead,
    Prv.Value AS PrevRead,
    CASE WHEN Cur.Value - Prv.Value < 0    -- this happens during a rollover
         THEN Cur.Value - Prv.Value + 1000 -- compensate for the rollover
         ELSE Cur.Value - Prv.Value
    END AS Difference
FROM tblDifference Cur
LEFT OUTER JOIN tblDifference Prv
    ON Cur.RowNumber = Prv.RowNumber + 1
ORDER BY Cur.Reading
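For the record, modulo arithmetic gives a CASE-free alternative; a minimal sketch, assuming a genuine difference between consecutive readings is always smaller than the meter range of 1000:
-- (Cur.Value - Prv.Value) is -994 at the rollover; adding the range and
-- taking % 1000 folds it back to 6. A real jump of 1000+ would be misread.
(Cur.Value - Prv.Value + 1000) % 1000 AS Difference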

Related

WHILE Window Operation with Different Starting Point Values From Column - SQL Server [duplicate]

In SQL there are aggregation operators, like AVG, SUM, COUNT. Why doesn't it have an operator for multiplication? "MUL" or something.
I was wondering, does it exist for Oracle, MSSQL, MySQL? If not, is there a workaround that would give this behaviour?
By MUL do you mean progressive multiplication of values?
Even with 100 rows of some small size (say around 10), your MUL(column) is going to overflow any exact numeric type: a hundred values of about 10 multiply out to roughly 10^100, far beyond BIGINT or DECIMAL. With such a high probability of misuse, and very limited scope for use, it does not need to be a SQL Standard. As others have shown, there are mathematical ways of working it out, just as there are many, many ways to do tricky calculations in SQL using only standard (and common-use) methods.
Sample data:
Column
1
2
4
8
COUNT : 4 items (1 for each non-null)
SUM : 1 + 2 + 4 + 8 = 15
AVG : 3.75 (SUM/COUNT)
MUL : 1 x 2 x 4 x 8 ? ( =64 )
For completeness, here are the Oracle, MSSQL and MySQL core implementations*:
Oracle : EXP(SUM(LN(column))) or POWER(N,SUM(LOG(column, N)))
MSSQL : EXP(SUM(LOG(column))) or POWER(N,SUM(LOG(column)/LOG(N)))
MySQL : EXP(SUM(LOG(column))) or POW(N,SUM(LOG(N,column)))
Take care when using EXP/LOG in SQL Server; watch the return type: http://msdn.microsoft.com/en-us/library/ms187592.aspx
The POWER form allows for larger numbers (using bases larger than Euler's number); in cases where the result grows too large to convert back using POWER, you can return just the logarithmic value and calculate the actual number outside of the SQL query.
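A minimal sketch of that last idea (yourColumn/yourTable as in the answers below):
-- return only the log-sum; exponentiate outside SQL, where arbitrary
-- precision arithmetic is available
SELECT SUM(LOG(yourColumn)) AS LogProduct FROM yourTable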
* LOG(0) and LOG of a negative number are undefined. The below shows only how to handle this in SQL Server; equivalents can be found for the other SQL flavours using the same concept.
create table MUL(data int)
insert MUL select 1 union all
           select 2 union all
           select 4 union all
           select 8 union all
           select -2 union all
           select 0

select CASE WHEN MIN(abs(data)) = 0 then 0
            ELSE EXP(SUM(Log(abs(nullif(data,0)))))                     -- the base mathematics
                 * round(0.5-count(nullif(sign(sign(data)+0.5),1))%2,0) -- pairs up negatives
       END
from MUL
Ingredients:
Taking abs() of the data: if the minimum is 0, multiplying by anything else is futile, the result is 0.
When data is 0, NULLIF converts it to NULL; abs() and log() then both return NULL, so the row is excluded from the SUM().
If data is not 0, abs() lets us multiply a negative number using the LOG method; we keep track of the negativity elsewhere.
Working out the final sign:
sign(data) returns 1 for >0, 0 for 0 and -1 for <0.
We add another 0.5 and take the sign() again, so 0 and 1 are now both classified as 1, and only -1 stays -1.
We again use NULLIF to remove the 1's from the COUNT(), since we only need to count up the negatives.
% 2 against the count() of negative numbers returns either
--> 1 if there is an odd number of negative numbers
--> 0 if there is an even number of negative numbers
More mathematical tricks: we subtract that 1 or 0 from 0.5, so that the above becomes
--> (0.5 - 1 = -0.5 => rounds to -1) if there is an odd number of negative numbers
--> (0.5 - 0 = 0.5 => rounds to 1) if there is an even number of negative numbers
We multiply this final 1/-1 against the SUM-PRODUCT value for the real result.
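For example, with the six sample rows above (1, 2, 4, 8, -2, 0) the expression returns 0, because the zero row wins; delete the zero row and it returns 1 × 2 × 4 × 8 × (−2) = −128, the single negative flipping the sign.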
No, but you can use Mathematics :)
if yourColumn is always bigger than zero:
select EXP(SUM(LOG(yourColumn))) As ColumnProduct from yourTable
I see an Oracle answer is still missing, so here it is:
SQL> with yourTable as
2 ( select 1 yourColumn from dual union all
3 select 2 from dual union all
4 select 4 from dual union all
5 select 8 from dual
6 )
7 select EXP(SUM(LN(yourColumn))) As ColumnProduct from yourTable
8 /
COLUMNPRODUCT
-------------
64
1 row selected.
With PostgreSQL, you can create your own aggregate functions, see http://www.postgresql.org/docs/8.2/interactive/sql-createaggregate.html
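A minimal sketch of such an aggregate, assuming PostgreSQL's built-in numeric_mul function as the state transition:
-- product aggregate: the state starts at 1 and is multiplied by each input value
CREATE AGGREGATE mul(numeric) (
    SFUNC    = numeric_mul,
    STYPE    = numeric,
    INITCOND = '1'
);
-- usage: SELECT mul(yourColumn) FROM yourTable;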
To create an aggregate function on MySQL, you'll need to build an .so (linux) or .dll (windows) file. An example is shown here: http://www.codeproject.com/KB/database/mygroupconcat.aspx
I'm not sure about MSSQL and Oracle, but I bet they have options to create custom aggregates as well.
You'll break any datatype fairly quickly as numbers mount up.
Using LOG/EXP is tricky because of numbers <= 0, which will fail when passed to LOG. I wrote a solution in this question that deals with this.
Using CTE in MS SQL:
CREATE TABLE Foo(Id int, Val int)
INSERT INTO Foo VALUES(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)

;WITH cte AS
(
    SELECT Id, Val AS Multiply, row_number() over (order by Id) as rn
    FROM Foo
    WHERE Id = 1
    UNION ALL
    SELECT ff.Id, cte.Multiply * ff.Val AS Multiply, ff.rn
    FROM (SELECT f.Id, f.Val, row_number() over (order by f.Id) as rn
          FROM Foo f) ff
    INNER JOIN cte ON ff.rn - 1 = cte.rn
)
SELECT * FROM cte
Not sure about Oracle or sql-server, but in MySQL you can just use * like you normally would.
mysql> select count(id), count(id)*10 from tablename;
+-----------+--------------+
| count(id) | count(id)*10 |
+-----------+--------------+
| 961 | 9610 |
+-----------+--------------+
1 row in set (0.00 sec)

How to isolate a row that contains a value unlike other values in that column in SQL Server query?

I am trying to write a query in SSMS 2016 that will isolate the value(s) for a group that are unlike the other values within a column. I can explain better with an example:
Each piece of equipment in our fleet has an hour meter reading that gets recorded from a handheld device. Sometimes people in the field enter in a typo meter reading which skews our hourly readings.
So a unit's meter history may look like this:
10/1/2019: 2000
10/2/2019: 2208
10/4/2019: 2208
10/7/2019: 2212
10/8/2019: 2
10/8/2019: 2225
...etc.
It's obvious that the "2" is a bad record because an hour meter can never decrease.
edit: Sometimes the opposite extreme may occur, where they enter a reading like "22155", and then I would need the query to adapt to find values that are too high and isolate those as well.
This data is stored in a meter history table where there is a single row for each meter reading. I am tasked with creating some type of procedure that will automatically isolate the bad data and delete those rows from the table. How can I write a query that understands the context of the meter history and knows that the 2 is bad?
Any tips welcome, thanks in advance.
You can use lag() and a filter to get rid of "decreases":
select t.*
from (select t.*, lag(col2) over (order by col1) as prev_col2
from t
) t
where prev_col2 < col2;
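(Flipping the comparison to where col2 < prev_col2 returns only the suspect rows instead; either way, the first row per meter drops out because its prev_col2 is NULL.)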
I would not advise "automatically deleting" such records.
Automatically deleting data is risky, so I'm not certain I'd recommend unleashing that without some serious thought, but here's my idea based on your sample data showing that it's usually a pretty consistent number.
DECLARE @Median numeric(22,0);

;WITH CTE AS
(
    select t.*, row_number() over (order by t.value) as "rn" from t
)
select @Median = cte.value
from cte
where cte.rn = (select (MAX(rn) + MIN(rn)) / 2 from cte); -- floors if dividing an odd number

select * from dataReadings where reading_value < (0.8 * @Median) OR reading_value > (1.2 * @Median);
The goal of this is to give you a +/- 20% range of the median value, which shouldn't be as skewed by mistakes as an average would be. Again, this assumes that your values should fall into an acceptable range.
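As an aside, since you appear to be on a version that supports it, SQL Server 2012+ can compute the median directly; a sketch, assuming the dataReadings table and reading_value column from the query above:
-- PERCENTILE_CONT(0.5) interpolates the median over the whole result set
SELECT DISTINCT
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY reading_value) OVER () AS Median
FROM dataReadings;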
If this is meant to be an always-increasing reading and you shouldn't ever encounter lower values, Gordon's answer is perfect.
I would think to look at the variation of each reading from the mean reading value. (I picked up the lag() check from @Gordon Linoff's reply too.) For example:
create table #test (the_date date, reading int)
insert #test (the_date, reading) values ('10/1/2019', 2000)
, ('10/2/2019', 2208)
, ('10/4/2019', 2208)
, ('10/7/2019', 2212)
, ('10/8/2019', 2)
, ('10/8/2019', 2225)
, ('10/8/2019', 2224)
, ('10/9/2019', 22155)
declare @avg int, @stdev float

select @avg = avg(reading)
     , @stdev = stdev(reading) * 0.5
from #test

select t.*
     , case when reading < @avg - @stdev then 'SUSPICIOUS - too low'
            when reading > @avg + @stdev then 'SUSPICIOUS - too high'
            when reading < prev_reading then 'SUSPICIOUS - decrease'
       end Comment
from (select t.*, lag(reading) over (order by the_date) as prev_reading
      from #test t
     ) t
Which results in:
the_date    reading  prev_reading  Comment
2019-10-01  2000     NULL          NULL
2019-10-02  2208     2000          NULL
2019-10-04  2208     2208          NULL
2019-10-07  2212     2208          NULL
2019-10-08  2        2212          SUSPICIOUS - too low
2019-10-08  2225     2             NULL
2019-10-08  2224     2225          SUSPICIOUS - decrease
2019-10-09  22155    2224          SUSPICIOUS - too high

SQL Server query order by sequence series

I am writing a query and I want it to do a order by a series. The first seven records should be ordered by 1,2,3,4,5,6 and 7. And then it should start all over.
I have tried OVER ... PARTITION BY and LAST_VALUE, but I can't figure it out.
This is the SQL code:
set language swedish;
select
tblridgruppevent.id,
datepart(dw,date) as daynumber,
tblRidgrupper.name
from
tblRidgruppEvent
join
tblRidgrupper on tblRidgrupper.id = tblRidgruppEvent.ridgruppid
where
ridgruppid in (select id from tblRidgrupper
where corporationID = 309 and Removeddate is null)
and tblridgruppevent.terminID = (select id from tblTermin
where corporationID = 309 and removedDate is null and isActive = 1)
and tblridgrupper.removeddate is null
order by
datepart(dw, date)
and this is an example of the result:
5887 1 J2
5916 1 J5
6555 2 Junior nybörjare
6004 2 Morgonridning
5911 3 J2
6467 3 J5
and this is what I would expect:
5887 1 J2
6555 2 Junior nybörjare
5911 3 J2
5916 1 J5
6004 2 Morgonridning
6467 3 J5
You might get some value by zooming out a little further and considering what you're trying to do and how else you might do it. SQL tends to perform very poorly with row-by-row processing, as well as with operations where a row borrows details from the row before it. You could also run into problems if you need to change the range you repeat at (switching from 7 to 10 or 4, etc.).
If you need a number there somewhat arbitrarily still, you could add ROW_NUMBER combined with a modulo to get a repeating increment, then add it to your select/where criteria. It would look something like this:
((ROW_NUMBER() OVER(ORDER BY column ASC) -1) % 7) + 1 AS Number
The outer +1 is to display the results as 1-7 instead of 0-6, and the inner -1 deals with the off-by-one issue (without it, the column would start at 2 instead of 1). I feel like there's a better way to deal with that, but it's not coming to me at the moment.
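As a rough sketch of that in context (table name from the question; ordering by id is an assumption):
SELECT id,
       ((ROW_NUMBER() OVER (ORDER BY id) - 1) % 7) + 1 AS Number
FROM tblRidgruppEvent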
edit: Looking over your post again, it looks like you're dealing with days of the week. You can order by Date even if it's not shown in the select statement, that might be all you need to get this working.
The first seven records should be ordered by 1,2,3,4,5,6 and 7. And then it should start all over.
You can use row_number():
order by row_number() over (partition by DATEPART(dw, date) order by tblridgruppevent.id),
datepart(dw, date)
The second key keeps the order within a group.
You don't specify how the rows should be chosen for each group. It is not clear from the question.

How to find daily differences over a flexible time period?

I have a very simple set of data as follows:
CustomerId char(6)
Points int
PointsDate date
with example data such as:
000021     0   01-JAN-2014
000021    10   02-JAN-2014
000021    20   03-JAN-2014
000021    30   06-JAN-2014
000021    40   07-JAN-2014
000021    10   12-JAN-2014
000034     0   04-JAN-2014
000034    40   05-JAN-2014
000034    20   06-JAN-2014
000034    40   08-JAN-2014
000034    60   10-JAN-2014
000034    80   21-JAN-2014
000034    10   22-JAN-2014
So, the PointsDate component is NOT consistent, nor is it contiguous (it's based around some "activity" happening)
I am trying to get, for each customer, the total amount of positive and negative differences in points, the number of positive and negative changes, as well as Max and Min...but ignoring the very first instance of the customer - which will always be zero.
e.g.
CustomerId  Pos  Neg  Count(pos)  Count(neg)  Max  Min
000021       40   30           3           1   40   10
000034      100   90           4           2   80   10
...but I have not a single clue how to achieve this!
I would put it in a cube, but a) there is only a single table and no other references and b) I know almost nothing about cubes!
The problem can be solved in regular TSQL with a common table expression that numbers the lines per customer, along with a self join that compares each row with the previous one:
WITH cte AS (
SELECT customerid, points,
ROW_NUMBER() OVER (PARTITION BY customerid ORDER BY pointsdate) rn
FROM mytable
)
SELECT cte.customerid,
SUM(CASE WHEN cte.points > old.points THEN cte.points - old.points ELSE 0 END) pos,
SUM(CASE WHEN cte.points < old.points THEN old.points - cte.points ELSE 0 END) neg,
SUM(CASE WHEN cte.points > old.points THEN 1 ELSE 0 END) [Count(pos)],
SUM(CASE WHEN cte.points < old.points THEN 1 ELSE 0 END) [Count(neg)],
MAX(cte.points) max,
MIN(cte.points) min
FROM cte
JOIN cte old
ON cte.rn = old.rn + 1
AND cte.customerid = old.customerid
GROUP BY cte.customerid
An SQLfiddle to test with.
The query would have been somewhat simplified using SQL Server 2012's more extensive analytic functions.
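For illustration, a minimal sketch of that simplification using SQL Server 2012's LAG() in place of the self join (table name mytable as in the query above):
WITH diffs AS (
    SELECT CustomerId, Points,
           Points - LAG(Points) OVER (PARTITION BY CustomerId
                                      ORDER BY PointsDate) AS Diff
    FROM mytable
)
SELECT CustomerId,
       SUM(CASE WHEN Diff > 0 THEN Diff  ELSE 0 END) AS Pos,
       SUM(CASE WHEN Diff < 0 THEN -Diff ELSE 0 END) AS Neg,
       SUM(CASE WHEN Diff > 0 THEN 1 ELSE 0 END)     AS [Count(pos)],
       SUM(CASE WHEN Diff < 0 THEN 1 ELSE 0 END)     AS [Count(neg)],
       MAX(Points) AS [Max], MIN(Points) AS [Min]
FROM diffs
WHERE Diff IS NOT NULL   -- drops each customer's first reading, as required
GROUP BY CustomerId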
An approach similar to the one of Joachim Isaksson, but with more work in the CTE and less in the main query:
WITH A AS (
SELECT c.CustomerID, c.Points, c.PointsDate
, Diff = c.Points - l.Points
, l.PointsDate lPointsDate
FROM Customer c
CROSS APPLY (SELECT TOP 1
Points, PointsDate
FROM Customer cu
WHERE c.CustomerID = cu.CustomerID
AND c.PointsDate > cu.PointsDate
ORDER BY cu.PointsDate Desc) l
)
SELECT CustomerID
, Pos = SUM(Diff * CAST(Sign(Diff) + 1 AS BIT))
, Neg = SUM(Diff * (1 - CAST(Sign(Diff) + 1 AS BIT)))
, [Count(pos)] = SUM(0 + CAST(Sign(Diff) + 1 AS BIT))
, [Count(neg)] = SUM(1 - CAST(Sign(Diff) + 1 AS BIT))
, Max(Points) [Max], Min(Points) [Min]
FROM A
GROUP BY CustomerID
SQLFiddle Demo
The condition that removes the first day is the JOIN (CROSS APPLY) in the CTE: the first day has no previous day, so it is filtered out.
In the main query, instead of using a CASE to filter the positive and negative differences, I preferred the SIGN function:
this function returns -1 for negative, 0 for zero and +1 for positive
shifting the value with Sign(Diff) + 1 means that the new return values are 0, 1 and 2
the CAST to BIT compresses those to 0 for negative and 1 for zero or positive.
The 0 + in the definition of [Count(pos)] creates an implicit conversion to an integer value, as BIT cannot be summed.
The 1 - used to SUM and COUNT the negative differences is equivalent to a NOT: it inverts the BIT sign to 1 for negative and 0 for zero or positive.
I'll copy my comment from above: I know literally nothing about cubes, but it sounds like what you're looking for is just a cursor, is it not? I know everyone hates cursors, but that's the best way I know to compare consecutive rows without loading it down onto a client machine (which is obviously worse).
I see you mentioned in your response to me that you'd be okay setting it off to run overnight, so if you're willing to accept that sort of performance, I definitely think a cursor will be the easiest and quickest to implement. If this is just something you do here or there, I'd definitely do that. It's nobody's favorite solution, but it'd get the job done.
Unfortunately, yeah, at twelve million records, you'll definitely want to spend some time optimizing your cursor. I work frequently with a database that's around that size, and I can only imagine how long it'd take. Although depending on your usage, you might want to filter based on user, in which case the cursor will be easier to write, and I doubt you'll be facing enough records to cause much of a problem. For instance, you could just look at the top twenty users and test their records, then do more as needed.

Selecting SUM of TOP 2 values within a table with multiple GROUP in SQL

I've been playing with sets in SQL Server 2000 and have the following table structure for one of my temp tables (#Periods):
RestCTR HoursCTR Duration Rest
----------------------------------------
1 337 2 0
2 337 46 1
3 337 2 0
4 337 46 1
5 338 1 0
6 338 46 1
7 338 2 0
8 338 46 1
9 338 1 0
10 339 46 1
...
What I'd like to do is to calculate the Sum of the 2 longest Rest periods for each HoursCTR, preferably using sets and temp tables (rather than cursors, or nested subqueries).
Here's the dream query that just won't work in SQL (no matter how many times I run it):
Select HoursCTR, SUM ( TOP 2 Duration ) as LongestBreaks
FROM #Periods
WHERE Rest = 1
Group By HoursCTR
The HoursCTR can have any number of Rest periods (including none).
My current solution is not very elegant and basically involves the following steps:
Get the max duration of rest, group by HoursCTR
Select the first (min) RestCTR row that returns this max duration for each HoursCTR
Repeat step 1 (excluding the rows already collected in step 2)
Repeat step 2 (again, excluding rows collected in step 2)
Combine the RestCTR rows (from step 2 and 4) into single table
Get SUM of the Duration pointed to by the rows in step 5, grouped by HoursCTR
If there are any set functions that cut this process down, they would be very welcome.
The best way to do this in SQL Server is with a common table expression, numbering the rows in each group with the windowing function ROW_NUMBER():
WITH NumberedPeriods AS (
SELECT HoursCTR, Duration, ROW_NUMBER()
OVER (PARTITION BY HoursCTR ORDER BY Duration DESC) AS RN
FROM #Periods
WHERE Rest = 1
)
SELECT HoursCTR, SUM(Duration) AS LongestBreaks
FROM NumberedPeriods
WHERE RN <= 2
GROUP BY HoursCTR
edit: I've added an ORDER BY clause in the partitioning, to get the two longest rests.
Mea culpa, I did not notice that you need this to work in Microsoft SQL Server 2000. That version doesn't support CTE's or windowing functions. I'll leave the answer above in case it helps someone else.
In SQL Server 2000, the common advice is to use a correlated subquery:
SELECT p1.HoursCTR, (SELECT SUM(t.Duration) FROM
(SELECT TOP 2 p2.Duration FROM #Periods AS p2
WHERE p2.HoursCTR = p1.HoursCTR
ORDER BY p2.Duration DESC) AS t) AS LongestBreaks
FROM #Periods AS p1
SQL 2000 does not have CTE's, nor ROW_NUMBER().
Correlated subqueries can need an extra step when using group by.
This should work for you:
SELECT
F.HoursCTR,
MAX (F.LongestBreaks) AS LongestBreaks -- Dummy max() so that groupby can be used.
FROM
(
SELECT
Pm.HoursCTR,
(
SELECT
COALESCE (SUM (S.Duration), 0)
FROM
(
SELECT TOP 2 T.Duration
FROM #Periods AS T
WHERE T.HoursCTR = Pm.HoursCTR
AND T.Rest = 1
ORDER BY T.Duration DESC
) AS S
) AS LongestBreaks
FROM
#Periods AS Pm
) AS F
GROUP BY
F.HoursCTR
Unfortunately for you, Alex, you've got the right solution: correlated subqueries, depending upon how they're structured, will end up firing multiple times, potentially giving you hundreds of individual query executions.
Put your current solution into the Query Analyzer, enable "Show Execution Plan" (Ctrl+K), and run it. You'll have an extra tab at the bottom which will show you how the engine went about the process of gathering your results. If you do the same with the correlated subquery, you'll see what that option does.
I believe that it's likely to hammer the #Periods table about as many times as you have individual rows in that table.
Also, something's off about the correlated subquery, it seems to me. Since I avoid them like the plague, knowing that they're evil, I'm not sure how to go about fixing it up.
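For what it's worth, a hedged guess at what's off: the subquery never filters on Rest = 1, and the outer query emits one row per period row rather than one per HoursCTR. A sketch of a SQL 2000-safe variant, moving the TOP 2 into an IN over the RestCTR key:
SELECT p1.HoursCTR, SUM(p1.Duration) AS LongestBreaks
FROM #Periods AS p1
WHERE p1.Rest = 1
  AND p1.RestCTR IN (SELECT TOP 2 p2.RestCTR
                     FROM #Periods AS p2
                     WHERE p2.HoursCTR = p1.HoursCTR
                       AND p2.Rest = 1            -- the missing rest filter
                     ORDER BY p2.Duration DESC)
GROUP BY p1.HoursCTR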