Split float between list of numbers - sql

I have problem with splitting 0.00xxx float values between numbers.
Here is example of input data
0 is sum of 1-3 float numbers.
As result I want to see rounded numbers without loosing sum of 1-3:
IN:
0 313.726
1 216.412
2 48.659
3 48.655
OUT:
0 313.73
1 216.41
2 48.66
3 48.66
How it should work:
Idea is to split the lowest rest(in our example it's 0.002 from value 216.412) between highest. 0.001 to 48.659 = 48.66 and 0.001 to 48.655 = 48.656 after this we can round numbers without loosing data.
After sitting on this problem yesterday I found the solution. The query as I think should look like this.
select test.*,
sum(value - trunc(value, 2)) over (partition by case when id = 0 then 0 else 1 end) part,
row_number() over(partition by case when id = 0 then 0 else 1 end order by value - trunc(value, 2) desc) rn,
case when row_number() over(partition by case when id = 0 then 0 else 1 end order by value - trunc(value, 2) desc) / 100 <=
round(sum(value - trunc(value, 2)) over (partition by case when id = 0 then 0 else 1 end), 2) then trunc(value, 2) + 0.01 else trunc(value, 2) end result
from test;
But still for me it's strange to add const value "0.01" while getting the result.
Any ideas to improve this query?

You could use the round() sql function when presenting results. Round()'s second argument is the number of significant digits you want to round the number to. Issuing this select on the test table:
select id, round(value, 2) from test;
gives you the following result
0 313.73
1 216.41
2 48.66
3 48.65
Generally, you can use the stored numbers for summations and then use the round() function for presentation of the results: Here is a way to do the sum with the full significant digits and then use the round() function for presenting the final result:
select sum(value) from test where id != 0
gives the result: 313.726
select round(sum(value), 2) from test where id != 0
gives the result: 313.73
By the way allow me two observations:
1) the rounding you give for id = 3 is confusing to me: 48.654 rounds to 48.65 rather than 48.66 in two significant digits. Am I missing something?
2) Strictly speaking this issue is not a pl/sql issue as labeled. It is totally in the realm of sql. However there is a round() function in pl/sql as well and the same principles apply.

select id, value,
case when id <> max(id) over () then round(value, 2)
else round(value, 2) - sum(round(value, 2)) over () +
round(first_value(value) over (order by id), 2) * 2
end val_rnd
from test
Output:
ID VALUE VAL_RND
------ ---------- ----------
0 313.726 313.73
1 216.413 216.41
2 48.659 48.66
3 48.654 48.66
Above query works, but it moves all difference to last row. And this is not "honest" and maybe not what you are after for other scenarios.
The most "unhonest" behavior is observable with big number of values, all equal 0.005.
To make full distribution you need to:
sum all original values in sub-rows and subtract rounded total value from row with id 0,
use row_number() to sort sub-rows in order of difference between rounded value and original value (maybe descending, it depends on sign of difference, use sign(), abs),
assign to each row value increased by .01 (or decreased if difference < 0 ) until it reaches difference/.01 (use case when ),
union row with id = 0 containing rounded sum
optionally sort results.
It's hard (but achievable) in one query. Alternative is some PL/SQL procedure or function, which might be more readable.

If I get you correct, you don't want to use round because rounding the partial numbers don't match the rounded total.
In this case simple trick is applied. You use round for all but the last number. The last fraction is calculated as a difference between the rounded sum and the rounded parts so far (all but the last one).
You may express this with analytical function as follows
WITH total AS
(SELECT id, value, ROUND(value,2) value_rounded FROM test WHERE id = 0
),
rounded AS
( SELECT id, value, ROUND(value,2) value_rounded FROM test WHERE id != 0
)
SELECT id, value_rounded FROM total
UNION ALL
SELECT id,
CASE
WHEN row_number() over (order by id) != COUNT(*) over ()
THEN
/* not the last row - regular result */
value_rounded
ELSE
/* last row - corrected result */
(select value_rounded from total) - SUM(value_rounded) over (order by id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
END AS value
FROM rounded
ORDER BY id;
Note that this is the test for the last numer
row_number() over (order by id) != COUNT(*) over ()
and this is the sum of all parts from begin (UNBOUNDED PRECEDING) up to the one but last ( 1 PRECEDING)
SUM(value_rounded) over (order by id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
I splitted your data in two source total - one row with the total and and rounded parts.
UPDATE
In some case the last corrected number shows an "ugly" large difference to the original value,
as the differences in one rounding direction are higher that in the opposite one.
The following select takes this in account and distributes the difference between the parts.
The example bellow illustrated this on teh example with lot of 0.05s
WITH nums AS
(SELECT rownum id, 0.005 value FROM dual connect by level <= 5
),
rounded AS
( SELECT id, value, ROUND(value,2) value_rounded FROM nums
),
with_diff as
(SELECT id, value, value_rounded,
-- difference so far - between the exact SUM and SUM of rounded parts
-- cut to two decimal points
floor(100* (
sum(value) over (order by id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) -
sum(value_rounded) over (order by id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)))
/ 100 diff_so_far
FROM rounded),
delta_diff as
(select id, value, value_rounded,DIFF_SO_FAR,
DIFF_SO_FAR - LAG(DIFF_SO_FAR,1,0) over (order by ID) as diff_delta
from with_diff)
SELECT id, value,
CASE
WHEN row_number() over (order by id) != COUNT(*) over ()
THEN
/* not the last row - take the rounded value and ... */
value_rounded +
/* ... add or subtract the delta difference */
diff_delta
ELSE
/* last row - corrected result */
round(sum(value) over(),2) - SUM(value_rounded + diff_delta) over (order by id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
END AS value_rounded, diff_delta
FROM delta_diff
ORDER BY id;
ID VALUE VALUE_ROUNDED DIFF_DELTA
---------- ---------- ------------- ----------
1 ,005 0 -0,01
2 ,005 ,01 0
3 ,005 0 -0,01
4 ,005 ,01 0
5 ,005 ,01 -0,01

pragamtic solution based on following rules:
1) check the difference between the rounded sum and sum of rounded parts.
select round(sum(value),2) - sum(round(value,2)) from test where id != 0;
2) apply this difference
e.g. if you get 0.01, this means one rounded part must be increased by 0.01
if you get -.02, it means two rounded parts must be decreased by 0.01
The query below simple correct the last N parts:
with diff as (
select round(sum(value),2) - sum(round(value,2)) diff from test where id != 0
), diff_values as
(select sign(diff)*.01 diff_value, abs(100*diff) corr_cnt
from diff)
select id, round(value,2)
+ case when row_number() over (order by id desc) <= corr_cnt then diff_value else 0 end result
from test, diff_values where id != 0
order by id;
ID RESULT
---------- ----------
1 216,41
2 48,66
3 48,66
If the number of corrected records in much higher than two, check the data and the rounding precision.

Related

Get column sum and use to calculate percent of total, why doesn't work with CTEs

I did this following query, however it gave the the result of 0 for each orderStatusName, does anyone know where is the problem?
with tbl as (
select s.orderstatusName, c.orderStatusId,count(c.orderId) counts
from [dbo].[ci_orders] c left join
[dbo].[ci_orderStatus] s
on s.orderStatusId = c.orderStatusId
where orderedDate between '2018-10-01' and '2018-10-29'
group by orderStatusName, c.orderStatusId
)
select orderstatusName, counts/(select sum(counts) from tbl as PofTotal) from tbl
the result is :0
You're using what is known as integer math. When using 2 integers in SQL (Server) the return value is an integer as well. For example, 2 + 2 = 4, 5 * 5 = 25. The same applies to division 8 / 10 = 0. That's because 0.8 isn't an integer, but the return value will be one (so the decimal points are lost).
The common way to change this behaviour is to multiply one of the expressions by 1.0. For example:
counts/(select sum(counts) * 1.0 from tbl) as PofTotal
If you need more precision, you can increase the precision of the decimal value of 1.0 (i.e. to 1.000, 1.0000000, etc).
Use window functions and proper division:
select orderstatusName, counts * 1.0 / total_counts
from (select t.*, sum(counts) over () as total_counts
from tbl
) t;
The reason you are getting 0 is because SQL Server does integer division when the operands are integers. So, 1/2 = 0, not 0.5.

Find overlapping range in PL/SQL

Sample data below
id start end
a 1 3
a 5 6
a 8 9
b 2 4
b 6 7
b 9 10
c 2 4
c 6 7
c 9 10
I'm trying to come up with a query that will return all the overlap start-end inclusive between a, b, and c (but extendable to more). So the expected data will look like the following
start end
2 3
6 6
9 9
The only way I can picture this is with a custom aggregate function that tracks the current valid intervals then computes the new intervals during the iterate phase. However I can't see this approach being practical when working with large datasets. So if some bright mind out there have a query or some innate function that I'm not aware of I would greatly appreciate the help.
You can do this using aggregation and a join. Assuming no internal overlaps for "a" and "b":
select greatest(ta.start, tb.start) as start,
least(ta.end, tb.end) as end
from t ta join
t tb
on ta.start <= tb.end and ta.end >= tb.start and
ta.id = 'a' and tb.id = 'b';
This is a lot uglier and more complex than Gordon's solution, but I think it gives the expected answer better and should extend to work with more ids:
WITH NUMS(N) AS ( --GENERATE NUMBERS N FROM THE SMALLEST START VALUE TO THE LARGEST END VALUE
SELECT MIN("START") N FROM T
UNION ALL
SELECT N+1 FROM NUMS WHERE N < (SELECT MAX("END") FROM T)
),
SEQS(N,START_RANK,END_RANK) AS (
SELECT N,
CASE WHEN IS_START=1 THEN ROW_NUMBER() OVER (PARTITION BY IS_START ORDER BY N) ELSE 0 END START_RANK, --ASSIGN A RANK TO EACH RANGE START
CASE WHEN IS_END=1 THEN ROW_NUMBER() OVER (PARTITION BY IS_END ORDER BY N) ELSE 0 END END_RANK --ASSIGN A RANK TO EACH RANGE END
FROM (
SELECT N,
CASE WHEN NVL(LAG(N) OVER (ORDER BY N),N) + 1 <> N THEN 1 ELSE 0 END IS_START, --MARK N AS A RANGE START
CASE WHEN NVL(LEAD(N) OVER (ORDER BY N),N) -1 <> N THEN 1 ELSE 0 END IS_END /* MARK N AS A RANGE END */
FROM (
SELECT DISTINCT N FROM ( --GET THE SET OF NUMBERS N THAT ARE INCLUDED IN ALL ID RANGES
SELECT NUMS.*,T.*,COUNT(*) OVER (PARTITION BY N) N_CNT,COUNT(DISTINCT "ID") OVER () ID_CNT
FROM NUMS
JOIN T ON (NUMS.N >= T."START" AND NUMS.N <= T."END")
) WHERE N_CNT=ID_CNT
)
) WHERE IS_START + IS_END > 0
)
SELECT STARTS.N "START",ENDS.N "END" FROM SEQS STARTS
JOIN SEQS ENDS ON (STARTS.START_RANK=ENDS.END_RANK AND STARTS.N <= ENDS.N) ORDER BY "START"; --MATCH CORRESPONDING RANGE START/END VALUES
First we generate all the numbers between the smallest start value and the largest end value.
Then we find the numbers that are included in all the provided "id" ranges by joining our generated numbers to the ranges, and selecting each number "n" that appears once for each "id".
Then we determine whether each of these values "n" starts or ends a range. To determine that, for each N we say:
If the previous value of N does not exist or is not 1 less than current N, current N starts a range. If the next value of N does not exist or is not 1 greater than current N, current N ends a range.
Next, we assign a "rank" to each start and end value so we can match them up.
Finally, we self-join where the ranks match (and where the start <= the end) to get our result.
EDIT: After some searching, I came across this question which shows a better way to find the start/ends and refactored the query to:
WITH NUMS(N) AS ( --GENERATE NUMBERS N FROM THE SMALLEST START VALUE TO THE LARGEST END VALUE
SELECT MIN("START") N FROM T
UNION ALL
SELECT N+1 FROM NUMS WHERE N < (SELECT MAX("END") FROM T)
)
SELECT MIN(N) "START",MAX(N) "END" FROM (
SELECT N,ROW_NUMBER() OVER (ORDER BY N)-N GRP_ID
FROM (
SELECT DISTINCT N FROM ( --GET THE SET OF NUMBERS N THAT ARE INCLUDED IN ALL ID RANGES
SELECT NUMS.*,T.*,COUNT(*) OVER (PARTITION BY N) N_CNT,COUNT(DISTINCT "ID") OVER () ID_CNT
FROM NUMS
JOIN T ON (NUMS.N >= T."START" AND NUMS.N <= T."END")
) WHERE N_CNT=ID_CNT
)
)
GROUP BY GRP_ID ORDER BY "START";

Case statement not supporting horizontal search with column name in query

I am new to ORACLE SQL and I am trying to learn it quickly.
I have following table definition:
Create table Sales_Biodata
(
Saler_Id INTEGER NOT NULL UNIQUE,
Jan_Sales INTEGER NOT NULL,
Feb_Sales INTEGER NOT NULL,
March_Sales INTEGER NOT NULL
);
Insert into Sales_Biodata (SALER_ID,JAN_SALES,Feb_Sales,March_Sales)
values ('101',22,525,255);
Insert into Sales_Biodata (SALER_ID,JAN_SALES,Feb_Sales,March_Sales)
values ('102',22,55,25);
Insert into Sales_Biodata (SALER_ID,JAN_SALES,Feb_Sales,March_Sales)
values ('103',45545,5125,2865);
My objective is the following:
1- Searching the highest sales and second highest sales against each saler_id.
For example in our above case:
For saler_id =101 highest sales is 525 and second highest sales is 255
similary for saler_id=102 highest sales is 55 and second highest sales is 25
For my above approach I am using the following query:
Select Saler_Id,
(
CASE
WHEN JAN_SALES>FEB_SALES AND JAN_SALES>MARCH_SALES THEN JAN_SALES
WHEN FEB_SALES>JAN_SALES AND FEB_SALES>MARCH_SALES THEN FEB_SALES
WHEN MARCH_SALES>JAN_SALES AND MARCH_SALES>FEB_SALES THEN MARCH_SALES
WHEN JAN_SALES=FEB_SALES AND JAN_SALES=MARCH_SALES THEN JAN_SALES
WHEN JAN_SALES=FEB_SALES AND JAN_SALES>MARCH_SALES THEN JAN_SALES
WHEN JAN_SALES=MARCH_SALES AND JAN_SALES>FEB_SALES THEN JAN_SALES
WHEN FEB_SALES=JAN_SALES AND FEB_SALES>MARCH_SALES THEN FEB_SALES
WHEN FEB_SALES=MARCH_SALES AND FEB_SALES>JAN_SALES THEN FEB_SALES
WHEN MARCH_SALES=JAN_SALES AND MARCH_SALES>FEB_SALES THEN MARCH_SALES
WHEN MARCH_SALES=FEB_SALES AND MARCH_SALES>JAN_SALES THEN MARCH_SALES
ELSE 'NEW_CASE_FOUND'
END
) FIRST_HIGHEST,
(
CASE
WHEN JAN_SALES>FEB_SALES AND FEB_SALES>MARCH_SALES THEN FEB_SALES
WHEN FEB_SALES>JAN_SALES AND JAN_SALES>MARCH_SALES THEN JAN_SALES
WHEN JAN_SALES>MARCH_SALES AND MARCH_SALES>FEB_SALES THEN MARCH_SALES
ELSE 'NEW_CASE_FOUND'
END
) SECOND_HIGHEST
from
Sales_Biodata;
but I am getting the following errors:
ORA-00932: inconsistent datatypes: expected NUMBER got CHAR
00932. 00000 - "inconsistent datatypes: expected %s got %s"
*Cause:
*Action:
Error at Line: 60 Column: 6
Please guide me on the following:
1- How to search the data horizontally for maximum and second maximum.
2- Please guide me on alternate approaches for searching data for a row horizontally.
Getting the maximum value is simply:
select greatest(jan_sales, feb_sales, mar_sales)
If you want the second value:
select (case when jan_sales = greatest(jan_sales, feb_sales, mar_sales)
then greatest(feb_sales, mar_sales)
when feb_sales = greatest(jan_sales, feb_sales, mar_sales)
then greatest(jan_sales, mar_sales)
else greatest(jan_sales, feb_sales)
end)
However, this is the wrong approach to the whole problem. The main issues is that you have the wrong data structure. Store values in rows not columns. So, you need to unpivot your data and re-aggregation, such as:
select saler_id,
max(case when seqnum = 1 then sales end) as sales_1,
max(case when seqnum = 2 then sales end) as sales_2,
max(case when seqnum = 3 then sales end) as sales_3
from (select s.*, dense_rank() over (partition by saler_id order by sales desc) as seqnum
from (select saler_id, jan_sales as sales Sales_Biodata union all
select saler_id, feb_sales Sales_Biodata union all
select saler_id, mar_sales Sales_Biodata
) s
) s
group by saler_id;
Your data model is wrong.
The first thing I would do is to unpivot data using this query:
select * from sales_biodata
unpivot (
val for mon in ( JAN_SALES,FEB_SALES,MARCH_SALES )
)
;
and after this, getting two top values is relatively easy:
SELECT *
FROM (
SELECT t.*,
dense_rank() over (partition by saler_id order by val desc ) x
FROM (
select * from sales_biodata
unpivot (
val for mon in ( JAN_SALES,FEB_SALES,MARCH_SALES )
)
) t
)
WHERE x <= 2
the above query will give a result in this format:
SALER_ID MON VAL X
---------- ----------- ---------- ----------
101 FEB_SALES 525 1
101 MARCH_SALES 255 2
102 FEB_SALES 55 1
102 MARCH_SALES 25 2
103 JAN_SALES 45545 1
103 FEB_SALES 5125 2
If you have more month than 3 months, you can easily extend this query changing this part:
val for mon in ( JAN_SALES,FEB_SALES,MARCH_SALES, April_sales, MAY_SALES, JUNE_SALES, JULY_SALES, ...... NOVEMBER_SALES, DECEMBER_SALES )
If you want both two values in one row, you need to pivot data back:
WITH src_data AS(
SELECT saler_id, val, x
FROM (
SELECT t.*,
dense_rank() over (partition by saler_id order by val desc ) x
FROM (
select * from sales_biodata
unpivot (
val for mon in ( JAN_SALES,FEB_SALES,MARCH_SALES )
)
) t
)
WHERE x <= 2
)
SELECT *
FROM src_data
PIVOT(
max(val) FOR x IN ( 1 As "First value", 2 As "Second value" )
);
This gives a result in this form:
SALER_ID First value Second value
---------- ----------- ------------
101 525 255
102 55 25
103 45545 5125
EDIT - why MAX is used in the PIVOT query
The short answer is: because the syntax reuires an aggregate function here.
See this link for the syntax: http://docs.oracle.com/cd/E11882_01/server.112/e41084/statements_10002.htm#CHDCEJJE
A broader answer:
The PIVOT clause is only a syntactic sugar that simplifies a general "classic" pivot query which is using aggregate function and GROUP BY clause, like this:
SELECT id,
max( CASE WHEN some_column = 'X' THEN value END ) As x,
max( CASE WHEN some_column = 'Y' THEN value END ) As y,
max( CASE WHEN some_column = 'Z' THEN value END ) As z
FROM table11
GROUP BY id
More on PIVOT queries you can find on the net, there is a lot of excelent explanations how the pivot query works.
The above pivot query, written in "standard" SQL, is equivalent to this Oracle's query:
SELECT *
FROM table11
PIVOT (
max(value) FOR some_column IN ( 'X', 'Y', 'Z' )
)
These PIVOT queries transform records like this:
ID SOME_COLUMN VALUE
---------- ----------- ----------
1 X 10
1 X 15
1 Y 20
1 Z 30
into one record (for each id) like this:
ID 'X' 'Y' 'Z'
---------- ---------- ---------- ----------
1 15 20 30
Please note, that the source table contains two values for id=1 and some_column='X' -> 10 and 15. PIVOT queries uses aggregate function to support that "general" case, where there could be many source records for one record in the output. In this example 'MAX' function is used to pick greater value 15.
However PIVOT queries supports also your specific case where there is only one source record for each value in the result.
You are coming across the error as the string 'new case found' is added in the else part and the rest of the case statement deals with number . data type in the when and else clause should match.
Coming to alternate approaches you may use unpivot and get the months sales data into a single row and use analytical functions to get the 1st highest or second highest.
As others have said, the problem is that the WHEN clauses in your CASE statement are returning INTEGER values, but the ELSE is returning a character string. I completely agree with the comments regarding normalization but if you really just want to make this query work you'll need to convert the results of each WHEN clause to character, as in:
Select Saler_Id,
(
CASE
WHEN JAN_SALES>FEB_SALES AND JAN_SALES>MARCH_SALES THEN TO_CHAR(JAN_SALES)
WHEN FEB_SALES>JAN_SALES AND FEB_SALES>MARCH_SALES THEN TO_CHAR(FEB_SALES)
WHEN MARCH_SALES>JAN_SALES AND MARCH_SALES>FEB_SALES THEN TO_CHAR(MARCH_SALES)
WHEN JAN_SALES=FEB_SALES AND JAN_SALES=MARCH_SALES THEN TO_CHAR(JAN_SALES)
WHEN JAN_SALES=FEB_SALES AND JAN_SALES>MARCH_SALES THEN TO_CHAR(JAN_SALES)
WHEN JAN_SALES=MARCH_SALES AND JAN_SALES>FEB_SALES THEN TO_CHAR(JAN_SALES)
WHEN FEB_SALES=JAN_SALES AND FEB_SALES>MARCH_SALES THEN TO_CHAR(FEB_SALES)
WHEN FEB_SALES=MARCH_SALES AND FEB_SALES>JAN_SALES THEN TO_CHAR(FEB_SALES)
WHEN MARCH_SALES=JAN_SALES AND MARCH_SALES>FEB_SALES THEN TO_CHAR(MARCH_SALES)
WHEN MARCH_SALES=FEB_SALES AND MARCH_SALES>JAN_SALES THEN TO_CHAR(MARCH_SALES)
ELSE 'NEW_CASE_FOUND'
END
) FIRST_HIGHEST,
(
CASE
WHEN JAN_SALES>FEB_SALES AND FEB_SALES>MARCH_SALES THEN TO_CHAR(FEB_SALES)
WHEN FEB_SALES>JAN_SALES AND JAN_SALES>MARCH_SALES THEN TO_CHAR(JAN_SALES)
WHEN JAN_SALES>MARCH_SALES AND MARCH_SALES>FEB_SALES THEN TO_CHAR(MARCH_SALES)
ELSE 'NEW_CASE_FOUND'
END
) SECOND_HIGHEST
from
Sales_Biodata;
Best of luck.

MYSQL query to get 'n' rows nearby given row

I have a MySQL table by name 'videos', where one of the column is 'cat' (INT) and 'id' is the PRIMARY KEY.
So, if 'x' is the row number,and 'n' is the category id, I need to get nearby 15 rows
Case 1: There are many rows in the category before and after 'x'.. Just get 7 each rows before and after 'x'
SELECT * FROM videos WHERE cat=n AND id<x ORDER BY id DESC LIMIT 0,7
SELECT * FROM videos WHERE cat=n AND id>x LIMIT 0,7
Case 2: If 'x' is in the beginning/end of the the table -> Print all (suppose 'y' rows) the rows before/after 'x' and later print 15-y rows after/before 'x'
Case 1 is not a problem but I am stuck with Case 2. Is there any generic method to get 'p' rows nearby a row 'x' ??
This query will always position N (exact id match) at the centre of the data, unless there are no more rows (in either direction), in which case rows will be added from the prior/next sections as required, while still preserving data from prior/next (as much as available).
set #n := 28;
SELECT * FROM
(
SELECT * FROM
(
(SELECT v.*, 0 as prox FROM videos v WHERE cat=1 AND id = #n)
union all
(SELECT v.*, #rn1:=#rn1+1 FROM (select #rn1:=0) x, videos v WHERE cat=1 AND id < #n ORDER BY id DESC LIMIT 15)
union all
(SELECT v.*, #rn2:=#rn2+1 FROM (select #rn2:=0) y, videos v WHERE cat=1 AND id > #n ORDER BY id LIMIT 15)
) z
ORDER BY prox
LIMIT 15
) w
order by id
For example, if you had 30 ids for cat=1, and you were looking at item #28, it will show items 16 through 30, #28 is the 3rd row from the bottom.
Some explanation:
SELECT v.*, 0 as prox FROM videos v WHERE cat=1 AND id = #n
v.* means to select all columns in the table/alias v. In this case, v is the alias for the table videos.
0 as prox means to create a column named prox, and it will contain just the value 0
The next query:
SELECT v.*, #rn1:=#rn1+1 FROM (select #rn1:=0) x, videos v WHERE cat=1 AND id < #n ORDER BY id DESC LIMIT 15
v.* - as above
#rn1:=#rn1+1 uses a variable to return a sequence number for each record in this subquery. It starts with 1 and for each record, following the ORDER BY id DESC, it will be numbered 2, then 3 etc.
(select #rn1:=0) x This creates a subquery aliased as x, all it does is ensures the variable #rn1 starts with the value 1 for the first row.
The end result is that the variable and 0 as prox ranks each row based on how close it is to the value #n. The clause order by prox limit 15 takes the 15 that are closest to N.

How to find the average time difference between rows in a table?

I have a mysql database that stores some timestamps. Let's assume that all there is in the table is the ID and the timestamp. The timestamps might be duplicated.
I want to find the average time difference between consecutive rows that are not duplicates (timewise). Is there a way to do it in SQL?
If your table is t, and your timestamp column is ts, and you want the answer in seconds:
SELECT TIMESTAMPDIFF(SECOND, MIN(ts), MAX(ts) )
/
(COUNT(DISTINCT(ts)) -1)
FROM t
This will be miles quicker for large tables as it has no n-squared JOIN
This uses a cute mathematical trick which helps with this problem. Ignore the problem of duplicates for the moment. The average time difference between consecutive rows is the difference between the first timestamp and the last timestamp, divided by the number of rows -1.
Proof: The average distance between consecutive rows is the sum of the distance between consective rows, divided by the number of consecutive rows. But the sum of the difference between consecutive rows is just the distance between the first row and last row (assuming they are sorted by timestamp). And the number of consecutive rows is the total number of rows -1.
Then we just condition the timestamps to be distinct.
Are the ID's contiguous ?
You could do something like,
SELECT
a.ID
, b.ID
, a.Timestamp
, b.Timestamp
, b.timestamp - a.timestamp as Difference
FROM
MyTable a
JOIN MyTable b
ON a.ID = b.ID + 1 AND a.Timestamp <> b.Timestamp
That'll give you a list of time differences on each consecutive row pair...
Then you could wrap that up in an AVG grouping...
Here's one way:
select avg(timestampdiff(MINUTE,prev.datecol,cur.datecol))
from table cur
inner join table prev
on cur.id = prev.id + 1
and cur.datecol <> prev.datecol
The timestampdiff function allows you to choose between days, months, seconds, and so on.
If the id's are not consecutive, you can select the previous row by adding a rule that there are no other rows in between:
select avg(timestampdiff(MINUTE,prev.datecol,cur.datecol))
from table cur
inner join table prev
on prev.datecol < cur.datecol
and not exists (
select *
from table inbetween
where prev.datecol < inbetween.datecol
and inbetween.datecol < cur.datecol)
)
OLD POST but ....
Easies way is to use the Lag function and TIMESTAMPDIFF
SELECT
id,
TIMESTAMPDIFF('MINUTES', PREVIOUS_TIMESTAMP, TIMESTAMP) AS TIME_DIFF_IN_MINUTES
FROM (
SELECT
id,
TIMESTAMP,
LAG(TIMESTAMP, 1) OVER (ORDER BY TIMESTAMP) AS PREVIOUS_TIMESTAMP
FROM TABLE_NAME
)
Adapted for SQL Server from this discussion.
Essential columns used are:
cmis_load_date: A date/time stamp associated with each record.
extract_file: The full path to a file from which the record was loaded.
Comments:
There can be many records in each file. Records have to be grouped by the files loaded on the extract_file column. Intervals of days may pass between one file and the next being loaded. There is no reliable sequential value in any column, so the grouped rows are sorted by the minimum load date in each file group, and the ROW_NUMBER() function then serves as an ad hoc sequential value.
SELECT
AVG(DATEDIFF(day, t2.MinCMISLoadDate, t1.MinCMISLoadDate)) as ElapsedAvg
FROM
(
SELECT
ROW_NUMBER() OVER (ORDER BY MIN(cmis_load_date)) as RowNumber,
MIN(cmis_load_date) as MinCMISLoadDate,
CASE WHEN NOT CHARINDEX('\', extract_file) > 0 THEN '' ELSE RIGHT(extract_file, CHARINDEX('\', REVERSE(extract_file)) - 1) END as ExtractFile
FROM
TrafTabRecordsHistory
WHERE
court_id = 17
and
cmis_load_date >= '2019-09-01'
GROUP BY
CASE WHEN NOT CHARINDEX('\', extract_file) > 0 THEN '' ELSE RIGHT(extract_file, CHARINDEX('\', REVERSE(extract_file)) - 1) END
) t1
LEFT JOIN
(
SELECT
ROW_NUMBER() OVER (ORDER BY MIN(cmis_load_date)) as RowNumber,
MIN(cmis_load_date) as MinCMISLoadDate,
CASE WHEN NOT CHARINDEX('\', extract_file) > 0 THEN '' ELSE RIGHT(extract_file, CHARINDEX('\', REVERSE(extract_file)) - 1) END as ExtractFile
FROM
TrafTabRecordsHistory
WHERE
court_id = 17
and
cmis_load_date >= '2019-09-01'
GROUP BY
CASE WHEN NOT CHARINDEX('\', extract_file) > 0 THEN '' ELSE RIGHT(extract_file, CHARINDEX('\', REVERSE(extract_file)) - 1) END
) t2 on t2.RowNumber + 1 = t1.RowNumber