Percentiles in Oracle - sql

I have the following query, in which I'm attempting to work out percentiles for the days between a letter being sent and today's date:
SELECT PERCENTILE_DISC(0.1) WITHIN GROUP (ORDER BY SUM(TRUNC(SYSDATE) -
( TO_DATE( SUBSTR(M.LETTER_SENT, 1, 11), 'YYYY-MON-DD') )) ASC) AS PERCENTILE_10,
PERCENTILE_DISC(0.9) WITHIN GROUP (ORDER BY SUM(TRUNC(SYSDATE) -
( TO_DATE( SUBSTR(M.LETTER_SENT, 1, 11), 'YYYY-MON-DD') )) ASC) AS PERCENTILE_90
FROM MV_TABLE M
WHERE M.LETTER_SENT != 'N'
GROUP BY M.LETTER_SENT;
Am I wrong in thinking that it should return the 10th and 90th percentiles for the result set?
M.LETTER_SENT is in the format YYYY-MON-DD: USER_ID. So my query uses SYSDATE - TO_DATE(SUBSTR(M.LETTER_SENT,1, 11), 'YYYY-MON-DD') to work out the number of days between.
For the 4-day rows in the result set below, the actual M.LETTER_SENT value is, for example, 2015-Feb-27: rstone
That query returns the following result set:
242
4
4
4
39
11
18
361
My understanding of percentiles is that if you want the 90th percentile the following occurs.
number of records * percentile = percentile => (round up) = index value
So in my situation it's:
8 * 0.1 = 0.8 => (round up) = 1
8 * 0.9 = 7.2 => (round up) = 8
The 1st value on the ordered result set is: 4
The 8th value on the ordered result set is: 361
Oracle, however, returns 11 as the 10th percentile.
When I ask for the 0.2 and 0.8 percentiles I get 12 and 242 respectively. I always understood there to be a few different ways to calculate percentiles. So how does Oracle calculate these results, and am I wrong about what the percentiles should be?
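The hand calculation described in the question (index = number of records * percentile, rounded up) can be sketched in Python. This is only an illustration of that nearest-rank rule on the eight listed values; the function name nearest_rank_percentile is hypothetical, and the sketch makes no claim about what Oracle does with the grouped query.

```python
import math

def nearest_rank_percentile(values, p):
    """Return the value at 1-based index ceil(n * p) of the sorted values."""
    ordered = sorted(values)
    index = max(1, math.ceil(len(ordered) * p))  # round up, never below 1
    return ordered[index - 1]

# The eight day counts listed in the question
days = [242, 4, 4, 4, 39, 11, 18, 361]
print(nearest_rank_percentile(days, 0.1))  # 4
print(nearest_rank_percentile(days, 0.9))  # 361
```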

Related

SQL query compare value with average of similar records

The table has 3 columns : Category, Value(int), Date
What I want the SQL query to do is check for each record belonging to a specific category, if the value lies within a specific tolerance range (say t) of the average of value over last 100 records which have the same weekday (monday, tuesday, etc) and same category as that of the concerned record.
I was able to implement this partially, as I know the Category beforehand, but the weekday depends on the record being queried. Also, I am currently only checking whether the value is greater than the average, whereas I need to check whether it lies within a certain tolerance.
SELECT Value, Date,
CASE WHEN
value > (SELECT AVG(value) FROM Table WHERE Category = 'CategoryX' and Date BETWEEN current_date - 700 and current_date - 1) THEN 1
ELSE 0
END AS check_avg
FROM Table
WHERE Category = 'CategoryX'
Sample:
Category Value Date
CategoryX 5000 2022-06-29
CategoryX 4500 2022-06-27
CategoryX 1000 2022-06-22
CategoryY 4500 2022-06-15
CategoryX 2000 2022-06-15
CategoryX 3000 2022-06-08
Expected Result :
Value in Record with today's date : 5000.
Average of values in records with same weekday and same category : (1000 + 2000 + 3000) / 3 = 2000.
If tolerance is 50%, then the allowed value should be between 1000 and 3000.
So result should be 0
Make sure both the outer query and the subquery evaluate the same category and the same weekday. Then sort the values used to compute the average by date and take only the immediately preceding 100 records. Finally, check that the difference between the current value and the average is below the tolerance epsilon.
SELECT Value, Date,
CASE WHEN
ABS(Value - (SELECT AVG(Value)
FROM (SELECT TOP 100 Value FROM Table
WHERE Category = t.Category
AND DATEPART(WEEKDAY, Date) = DATEPART(WEEKDAY, t.Date)
AND Date < t.Date -- strictly earlier records only
ORDER BY Date DESC) AS prev)) < epsilon THEN 1
ELSE 0
END AS check_avg
FROM Table t
WHERE Category = 'CategoryX'
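To check the logic against the sample data, here is a small Python sketch of the same idea. It is not the SQL itself: the function name within_tolerance and the tuple layout are made up for illustration, and it assumes the tolerance is a fraction of the average (0.5 for 50%, as in the question's example) rather than an absolute epsilon.

```python
from datetime import date

def within_tolerance(rows, current, tolerance, limit=100):
    """rows and current are (category, value, day) tuples; tolerance is a fraction such as 0.5."""
    category, value, day = current
    # Earlier records with the same category and weekday, newest first, at most `limit`
    previous = sorted(
        (r for r in rows
         if r[0] == category and r[2].weekday() == day.weekday() and r[2] < day),
        key=lambda r: r[2],
        reverse=True,
    )[:limit]
    if not previous:
        return None  # nothing to compare against
    average = sum(r[1] for r in previous) / len(previous)
    return 1 if abs(value - average) <= tolerance * average else 0

rows = [
    ("CategoryX", 5000, date(2022, 6, 29)),
    ("CategoryX", 4500, date(2022, 6, 27)),
    ("CategoryX", 1000, date(2022, 6, 22)),
    ("CategoryY", 4500, date(2022, 6, 15)),
    ("CategoryX", 2000, date(2022, 6, 15)),
    ("CategoryX", 3000, date(2022, 6, 8)),
]
print(within_tolerance(rows, rows[0], 0.5))  # 0, since 5000 is outside 1000..3000
```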

Query using group by with steps/range over large data

I have a table that stores sensor temperature readings every few seconds.
Sample data looks like this
nId nOperationId strDeviceIp nIfIndex nValue nTimestamp
97 2 192.168.99.252 1 26502328 1593828551
158 2 192.168.99.252 1 26501704 1593828667
256 2 192.168.99.252 1 26501860 1593828788
354 2 192.168.99.250 1 26501704 1593828908
452 2 192.168.99.250 1 26501692 1593829029
I want to have the average temperature per device so I ran the following query
select strDeviceIp, AVG(CAST(nValue as bigint)) as val1
from myTable
where nOperationId = 2 and nTimestamp >= 1593828600 and nTimestamp <= 1593838600
group by strDeviceIp;
Where I can pass the time range I want.
My issue is that this gives me one overall average, whereas I want the averages in steps or ranges.
Instead of a single row, I'd like one row per step of, say, 5 minutes.
P.S. I'm trying to show a graph.
Running the query above I get
strSwitchIp average
192.168.99.252 26501731
But I would like to get
strSwitchIp average timestamp
192.168.99.252 26201731 1593828600
192.168.99.252 26532731 1593828900
192.168.99.252 24501721 1593829200
192.168.99.252 26506531 1593829500
In this example I would like to get a row every 300 seconds, per device.
Since your nTimestamp is a number of seconds, you can simply add it to the GROUP BY. Dividing by 300 gives you 300-second (5-minute) intervals. In SQL Server, / between two integers is integer division, which discards the fractional part.
select
strSwitchIp
,AVG(CAST(nValue as bigint)) as val1
,(nTimestamp / 300) * 300 AS Timestamp
from myTable
where
nOperationId = 2 and nTimestamp >= 1593828600 and nTimestamp <= 1593838600
group by
strSwitchIp
,nTimestamp / 300
;
nTimestamp / 300 gives an integer, the number of 5-minute intervals since 1970; / discards the fractional part here.
When this number is multiplied back by 300, it again becomes a number of seconds since 1970, but rounded down to the start of its 5-minute interval, just as you showed in the expected result in the question.
For example:
1593828667 / 300 = 5312762.2233333333333333333333333
discard fractional part
1593828667 / 300 = 5312762
5312762 * 300 = 1593828600
So, all timestamps between 1593828600 and 1593828899 become 1593828600 and all values for these timestamps are grouped into one row and averaged.
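The same bucketing arithmetic can be tried outside the database. The Python sketch below is only an illustration; the names bucket_start and average_per_bucket are hypothetical, and the rows are two readings taken from the sample data in the question.

```python
from collections import defaultdict

def bucket_start(ts, width=300):
    """Round a Unix timestamp down to the start of its `width`-second interval."""
    return (ts // width) * width

def average_per_bucket(rows, width=300):
    """rows are (device, value, timestamp) tuples; returns {(device, bucket): average}."""
    sums = defaultdict(lambda: [0, 0])  # key -> [total, count]
    for device, value, ts in rows:
        key = (device, bucket_start(ts, width))
        sums[key][0] += value
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

rows = [
    ("192.168.99.252", 26501704, 1593828667),
    ("192.168.99.252", 26501860, 1593828788),
]
print(bucket_start(1593828667))  # 1593828600
print(average_per_bucket(rows))  # {('192.168.99.252', 1593828600): 26501782.0}
```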
You can use a partition like this:
select strDeviceIp, AVG(CAST(nValue as bigint)) as val1,
ROW_NUMBER() over(partition by nTimestamp order by nTimestamp desc) as ROW_NO from AmyTable) Q where q.ROW_NO%5=0
....

Total percentage of a series of positive and negative percentages

Postgres 9.6.6, latest Ubuntu LTS.
I have a column with daily growth (+/-) percentages, like:
Trader_Id Date 8_AM 8_PM Growth%
1 1/1 290 248 -14,48
1 2/1 225 880 291,11
1 3/1 732 512 -30,05
1 4/1 621 602 -3,06
1 5/1 314 314 0,0
1 6/1 0 0 0,0
1 7/1 294 95 -67,69
What is the correct query to sum and subtract a sequence of percentages to get the total percentage of growth(+-) of the selected trader?
In that case, select a Trader_Id, sort by Date ASC and calculate a total growth percentage from the first day available.
This is the sequence of manual calculations:
Growth% Calculation Result
-14,48 1+(-14,48/100) 0,8552
291,11 0,8552+(291,11/100*0,8552) 3,34477272
-30,05 3,34477272+(-30,05/100*3,34477272) 2,339668518
-3,06 2,33966851764+(-3,06/100*2,33966851764) 2,268074661
0 2,26807466100022+(0/100*2,26807466100022) 2,268074661
0 2,26807466100022+(0/100*2,26807466100022) 2,268074661
-67,69 2,26807466100022+(-67,69/100*2,26807466100022) 0,732814923
(0,73281492296917-1)*100 -26,7185077
The final expected result of SELECT SOMETHING(Growth% ORDER BY Date) is -26,72%
Figured out the correct formula to do that:
Now the remaining problem is how to translate it into correct SQL...
For lack of information, I am assuming your column growth is of type numeric and that you want numeric precision for the calculation as well.
(Calculating with double precision is cheaper, but may increase rounding errors.)
Create an aggregate function to generate a serial product once:
CREATE AGGREGATE numeric_mul(numeric) (
sfunc = numeric_mul,
stype = numeric
);
Then the query is as simple as:
SELECT 100 * numeric_mul(growth *.01 + 1) - 100
FROM tbl;
The order of input rows has no bearing on the result.
Related:
Product Aggregate in PostgreSQL
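As a cross-check of the manual calculation in the question, the same serial product can be computed in Python with Decimal. This is only a sketch; total_growth_percent is a hypothetical name and the list holds the seven Growth% values from the sample table.

```python
from decimal import Decimal
from functools import reduce

def total_growth_percent(daily_percentages):
    """Compound daily growth percentages: 100 * product(1 + g/100) - 100."""
    factor = reduce(
        lambda acc, g: acc * (1 + g / Decimal(100)),
        daily_percentages,
        Decimal(1),
    )
    return (factor - 1) * 100

growth = [Decimal(g) for g in
          ("-14.48", "291.11", "-30.05", "-3.06", "0", "0", "-67.69")]
print(total_growth_percent(growth).quantize(Decimal("0.01")))  # -26.72
```

Because multiplication is commutative, reversing the list gives the same total, which matches the remark that row order does not matter.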
I believe you have a column referring to the day and you would like to know the sum of percentages for the current week.
You could do
SELECT SUM(day_percent)
FROM t
WHERE calendar_day >= date_trunc('week', CURRENT_DATE)
AND calendar_day < date_trunc('week', CURRENT_DATE) + INTERVAL '7 days';

Access query to partition data and sum each partition?

I have a query with the fields date hour and value.
It looks something like this
date hour value
xx/xx/xx 15 100
xx/xx/xx 30 122
xx/xx/xx 45 50
... 100 100
... 115 23
... ... ...
... ... ...
... 2400 400
... 15 23
Basically, date is the date, hour is the hour, and value is the value for that particular 15-minute interval. What I have been trying to figure out is a way to take each hour (so 15, 30, 45, and 100, or 1015, 1030, 1045, and 1100; as you can see, hours are military-style, so 1:00pm is 1300 and midnight is 2400) and sum its values together. So I am looking to return something like this:
xx/xx/xx 100 372
xx/xx/xx 200 23 + (130 data) + (145 data) + (200 data)
And so on...
The table has on average around 100 days and they all start from 15 to 2400 incrementing by 15 with varying numbers for the value column.
I have thought about using a partition, group by, etc., with no real idea how to tackle it. Essentially I have to take 4 rows (an hour), sum their values, spit out the date, hour, and summed value, then repeat for every day. I am not asking for code, just some help with what I should be using, since this seems like a simple problem once you have the key to solving it.
Any help is greatly appreciated, Thank you!
Grouping by Hour/100 will almost get you there. Subtracting 1 from the hour first makes 1 AM (100) fall to 99, so it is included in the same group as 15, 30 and 45. This gives a query that looks like this:
SELECT Table1.Dte, Int(([tme]-1)/100) AS Hr, Sum(Table1.Val) AS TotVal
FROM Table1
GROUP BY Table1.Dte, Int(([tme]-1)/100);
I may have misremembered the Access syntax, but this should work (note Int, which truncates; CInt rounds to the nearest integer and would split each hour across two groups):
Select
[date],
100 * (1 + Int(([Hour] - 1) / 100)),
Sum(Value)
From
Query
Group By
[date],
100 * (1 + Int(([Hour] - 1) / 100))
Order By
1, 2
SELECT
DateCol,
Int(HourCol / -100) * -100 AS Hr,
Sum(Value) AS TotalValue
FROM
YourTable
GROUP BY
DateCol,
Int(HourCol / -100) * -100
Or you can use ((HourCol + 99) \ 100) * 100.
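The rounding-up arithmetic is easy to verify outside Access. The Python sketch below mirrors the ((hour + 99) \ 100) * 100 form with integer division; the names hour_bucket and sum_by_hour are hypothetical, and the rows are the first five sample rows from the question with a placeholder date.

```python
def hour_bucket(hhmm):
    """Map an interval-ending time (15, 45, 100, 1015, ...) to the hour it belongs to."""
    return ((hhmm + 99) // 100) * 100

def sum_by_hour(rows):
    """rows are (day, hhmm, value) tuples; returns {(day, hour): total}."""
    totals = {}
    for day, hhmm, value in rows:
        key = (day, hour_bucket(hhmm))
        totals[key] = totals.get(key, 0) + value
    return totals

rows = [("d1", 15, 100), ("d1", 30, 122), ("d1", 45, 50), ("d1", 100, 100), ("d1", 115, 23)]
print(sum_by_hour(rows))  # {('d1', 100): 372, ('d1', 200): 23}
```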

Group DateTime into 5,15,30 and 60 minute intervals

I am trying to group some records into 5-, 15-, 30- and 60-minute intervals:
SELECT AVG(value) as "AvgValue",
sample_date/(5*60) as "TimeFive"
FROM DATA
WHERE id = 123 AND sample_date >= 3/21/2012
I want to run several queries, each of which would group my average values into the desired time increments. So the 5-min query would return results like this:
AvgValue TimeFive
6.90 1995-01-01 00:05:00
7.15 1995-01-01 00:10:00
8.25 1995-01-01 00:15:00
The 30-min query would result in this:
AvgValue TimeThirty
6.95 1995-01-01 00:30:00
7.40 1995-01-01 01:00:00
The datetime column is in yyyy-mm-dd hh:mm:ss format
I am getting implicit conversion errors on my datetime column. Any help is much appreciated!
Using
datediff(minute, '1990-01-01T00:00:00', yourDatetime)
will give you the number of minutes since 1990-1-1 (you can use the desired base date).
Then you can divide by 5, 15, 30 or 60, and group by the result of this division.
I've checked that it is evaluated as an integer division, so you'll get an integer you can group by.
i.e.
group by datediff(minute, '1990-01-01T00:00:00', yourDatetime) /5
UPDATE As the original question was edited to require the data to be shown in date-time format after the grouping, I've added this simple query that will do what the OP wants:
-- This converts the period to date-time format
SELECT
-- note the 5, the "minute", and the starting point to convert the
-- period back to original time
DATEADD(minute, AP.FiveMinutesPeriod * 5, '2010-01-01T00:00:00') AS Period,
AP.AvgValue
FROM
-- this groups by the period and gets the average
(SELECT
P.FiveMinutesPeriod,
AVG(P.Value) AS AvgValue
FROM
-- This calculates the period (five minutes in this instance)
(SELECT
-- note the division by 5 and the "minute" to build the 5 minute periods
-- the '2010-01-01T00:00:00' is the starting point for the periods
datediff(minute, '2010-01-01T00:00:00', T.Time)/5 AS FiveMinutesPeriod,
T.Value
FROM Test T) AS P
GROUP BY P.FiveMinutesPeriod) AP
NOTE: I've divided this into 3 subqueries for clarity. You should read it from the inside out. It could, of course, be written as a single, compact query.
NOTE: if you change the period and the starting date-time you can get any interval you need, such as weeks starting from a given day.
If you want to generate test data for this query use this:
CREATE TABLE Test
( Id INT IDENTITY PRIMARY KEY,
Time DATETIME,
Value FLOAT)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:00:22', 10)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:03:22', 10)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:04:45', 10)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:07:21', 20)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:10:25', 30)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:11:22', 30)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:14:47', 30)
The result of executing the query is this:
Period AvgValue
2012-03-22 00:00:00.000 10
2012-03-22 00:05:00.000 20
2012-03-22 00:10:00.000 30
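For readers who want to check the period arithmetic without a database, here is a rough Python equivalent. It is a sketch only: period_start and average_by_period are hypothetical names, and the floor of elapsed minutes matches DATEDIFF(minute, ...) only for timestamps at or after a base that falls on a whole minute, which holds for the test data above.

```python
from collections import defaultdict
from datetime import datetime, timedelta

BASE = datetime(2010, 1, 1)  # starting point for the periods, as in the query

def period_start(ts, minutes=5, base=BASE):
    """Round ts down to the start of its period, counted in whole minutes from base."""
    elapsed = int((ts - base).total_seconds() // 60)  # whole minutes since base
    return base + timedelta(minutes=(elapsed // minutes) * minutes)

def average_by_period(samples, minutes=5):
    """samples are (timestamp, value) pairs; returns {period start: average value}."""
    groups = defaultdict(list)
    for ts, value in samples:
        groups[period_start(ts, minutes)].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(groups.items())}

samples = [
    (datetime(2012, 3, 22, 0, 0, 22), 10), (datetime(2012, 3, 22, 0, 3, 22), 10),
    (datetime(2012, 3, 22, 0, 4, 45), 10), (datetime(2012, 3, 22, 0, 7, 21), 20),
    (datetime(2012, 3, 22, 0, 10, 25), 30), (datetime(2012, 3, 22, 0, 11, 22), 30),
    (datetime(2012, 3, 22, 0, 14, 47), 30),
]
for start, avg in average_by_period(samples).items():
    print(start, avg)
```

Passing minutes=15, 30 or 60 gives the other interval sizes asked about in the question.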
Building on @JotaBe's answer (on which I cannot comment, otherwise I would), you could also try something like this, which does not require a subquery.
SELECT
AVG(value) AS 'AvgValue',
-- Add the rounded minutes back onto the base date to get the rounded time
DATEADD(
MINUTE,
(DATEDIFF(MINUTE, '1990-01-01T00:00:00', your_date) / 30) * 30,
'1990-01-01T00:00:00'
) AS 'TimeThirty'
FROM YourTable
-- WHERE your_date > some max lookback period
GROUP BY
(DATEDIFF(MINUTE, '1990-01-01T00:00:00', your_date) / 30)
This change removes temp tables and subqueries. It uses the same core logic for grouping by 30 minute intervals but, when presenting the data back as part of the result I'm just reversing the interval calculation to get the rounded date & time.
So, in case you googled this but need to do it in MySQL, which was my case:
In MySQL you can do
GROUP BY
CONCAT(
DATE_FORMAT(`timestamp`,'%m-%d-%Y %H:'),
FLOOR(DATE_FORMAT(`timestamp`,'%i')/5)*5
)
In SQL Server 2022 and later, you can use DATE_BUCKET, which rounds the value down to the start of the specified interval.
SELECT
DATE_BUCKET(minute, 5, d.sample_date) AS TimeFive,
AVG(d.value) AS AvgValue
FROM DATA d
WHERE d.id = 123
AND d.sample_date >= '20120321'
GROUP BY
DATE_BUCKET(minute, 5, d.sample_date);
You can use the following statement. It removes the seconds component, calculates how many minutes the value is past the five-minute mark, and subtracts them to round down to the start of the time block. This is convenient if you want to change your window: simply change the mod value.
select dateadd(minute, - datepart(minute, [YOURDATE]) % 5, dateadd(minute, datediff(minute, 0, [YOURDATE]), 0)) as [TimeBlock]
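The modulo approach can be illustrated in Python as well. This is a sketch with a hypothetical function name, time_block; like the statement above, it assumes the window divides 60 evenly, because it only looks at the minute-of-hour.

```python
from datetime import datetime, timedelta

def time_block(ts, window=5):
    """Drop the seconds, then step back by the minutes past the last window mark."""
    whole_minute = ts.replace(second=0, microsecond=0)
    return whole_minute - timedelta(minutes=ts.minute % window)

print(time_block(datetime(2012, 3, 22, 0, 7, 21)))       # 2012-03-22 00:05:00
print(time_block(datetime(2012, 3, 22, 10, 44, 59), 15))  # 2012-03-22 10:30:00
```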
This should do exactly what you want.
Replace dt with your datetime column, c with your call field, and astro_transit1 with your table. 300 means 5 minutes in seconds, so add 300 for each further 5 minutes of gap you want.
SELECT FROM_UNIXTIME( 300 * ROUND( UNIX_TIMESTAMP( r.dt ) /300 ) ) AS 5datetime, ( SELECT r.c FROM astro_transit1 ra WHERE ra.dt = r.dt ORDER BY ra.dt DESC LIMIT 1 ) AS first_val FROM astro_transit1 r GROUP BY UNIX_TIMESTAMP( r.dt ) DIV 300 LIMIT 0 , 30