SQL find median of col2 for every distinct value of col1 - sql

I'm trying to calculate the median time for every distinct value in column1, which stores some kind of IDs. The second column stores time in milliseconds. I want to calculate the median of the records for every ID. I have this:
DECLARE @Median varchar(max)
SELECT @Median = PERCENTILE_CONT(0.5)
    WITHIN GROUP (ORDER BY ExecTime) OVER ()
FROM
(
    SELECT ExecTime
    FROM logs
    WHERE Message LIKE '<%'
) AS median
SELECT @Median AS Median --, Name
This calculates the median of all values in col2 (I deleted extra conditions that are not relevant here). I think it's just one step away from the solution, but I can't figure it out.

I think you are looking for the partition by clause:
SELECT DISTINCT column1,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY ExecTime) OVER (PARTITION BY column1) as median
FROM logs l
WHERE l.Message Like '<%';
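To see what a per-group median returns, here is a minimal sketch that can be run without SQL Server. It uses Python's sqlite3 module, and because SQLite has no PERCENTILE_CONT, it swaps in a different technique: number the rows inside each partition and average the middle one or two. The sample rows are made up, and window functions are assumed to be available (SQLite 3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (column1 INTEGER, ExecTime REAL)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?)",
    [
        (1, 10), (1, 30), (1, 20),                  # odd count: middle value is 20
        (2, 100), (2, 400), (2, 200), (2, 300),     # even count: middle pair is 200, 300
    ],
)

# Emulation of a per-group median (not PERCENTILE_CONT itself):
# rn numbers the rows of each partition in ExecTime order, cnt is the
# partition size, and integer division picks the middle row(s).
query = """
SELECT column1, AVG(ExecTime) AS median
FROM (
    SELECT column1, ExecTime,
           ROW_NUMBER() OVER (PARTITION BY column1 ORDER BY ExecTime) AS rn,
           COUNT(*)     OVER (PARTITION BY column1)                   AS cnt
    FROM logs
) AS ranked
WHERE rn IN ((cnt + 1) / 2, (cnt + 2) / 2)
GROUP BY column1
ORDER BY column1
"""
medians = conn.execute(query).fetchall()
print(medians)
```

The PARTITION BY column1 in both window functions plays the same role as in the answer above: each id gets its own ordering and its own count.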

Related

How do I reset a sum() over () in a SQL Server query?

I have a derived table that looks like this example:
{select * from tb_data}
I want the results to have an additional summation column. The catch is that the summation column must reset its running value whenever the info column value = 'reset'.
{select *, (I assume some variation on sum(number) over (partition by id order by date desc)) as summation from tb_data}
and here's what the output should look like:
The actual derived table covers thousands of ids which is why it needs to be partitioned by the id and ordered by date desc and each has a different number of reset points.
What SQL query will get me the output I need?
You could first do a conditional window sum to define the groups: every time a reset is found, a new group starts. Then you can simply do a window sum of the numbers within each group.
select
id,
date,
info,
number,
sum(number) over(partition by id, grp order by date) summation
from (
select
t.*,
sum(case when info = 'reset' then 1 else 0 end)
over(partition by id order by date) grp
from mytable t
) t
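The two-step idea can be tried on a small in-memory table. This is a sketch under assumptions: it runs the same query through Python's sqlite3 module (SQLite 3.25 or later for window functions), and the sample rows are invented since the original data was not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, date TEXT, info TEXT, number INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (1, "2020-01-01", "", 5),
        (1, "2020-01-02", "", 3),
        (1, "2020-01-03", "reset", 2),   # a new group starts here
        (1, "2020-01-04", "", 4),
        (2, "2020-01-01", "", 7),
        (2, "2020-01-02", "reset", 1),   # a new group starts here
    ],
)

# Inner query: running count of 'reset' rows per id gives a group number.
# Outer query: running sum of number, restarted for every (id, grp) pair.
query = """
SELECT id, date, info, number,
       SUM(number) OVER (PARTITION BY id, grp ORDER BY date) AS summation
FROM (
    SELECT t.*,
           SUM(CASE WHEN info = 'reset' THEN 1 ELSE 0 END)
               OVER (PARTITION BY id ORDER BY date) AS grp
    FROM mytable t
) AS t
ORDER BY id, date
"""
rows = conn.execute(query).fetchall()
summations = [row[4] for row in rows]
print(summations)
```

For id 1 the running total goes 5, 8, then restarts at 2 on the reset row and continues to 6. If the sum should run from newest to oldest, as in the question, both ORDER BY date clauses would need DESC.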

How to SELECT TOP 95% of the row in a table

I want to create a performance report based on table data.
I don't know how many rows there are in the table. I would like to have the top 95% of the rows based on some WHERE condition.
Table Structure -
Column Name - txid , start_time, end_time
For my performance report I need the average of end_time - start_time. The typical value of (end_time - start_time) ranges from 100 ms to 1 sec.
However, there are a few transactions (less than 2%) that took around 100 to 2K sec due to some technical error.
I want to exclude those rows to get a fair average. Including those rows in my report raises a huge concern.
You can use a subquery. I would just go for row_number() and count(*), although other window functions such as ntile(), percentile_cont(), and percentile_disc() could be used for this purpose:
select t.*
from (select t.*,
row_number() over (order by <ordering col>) as seqnum,
count(*) over () as cnt
from t
where . . .
) t
where seqnum <= 0.95 * cnt;
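A small runnable sketch of the row_number() / count(*) approach, using Python's sqlite3 module (SQLite 3.25 or later assumed). The column duration_ms is hypothetical and stands in for end_time - start_time; the 20 sample rows are invented, with one extreme outlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (txid INTEGER, duration_ms INTEGER)")

# 19 ordinary transactions (100 ms .. 280 ms) and one outlier.
durations = [100 + 10 * i for i in range(19)] + [2_000_000]
conn.executemany("INSERT INTO t VALUES (?, ?)", list(enumerate(durations, start=1)))

# Rank rows by duration, keep the lowest 95%, then average what is left.
query = """
SELECT AVG(duration_ms), COUNT(*)
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (ORDER BY duration_ms) AS seqnum,
           COUNT(*)     OVER ()                     AS cnt
    FROM t
) AS ranked
WHERE seqnum <= 0.95 * cnt
"""
trimmed_avg, kept = conn.execute(query).fetchone()
print(trimmed_avg, kept)
```

Ordering by the duration itself is the design choice that makes the cut meaningful here: the slowest 5% are the rows that fall off the end.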
Supposing you have a table TABLE with a field id:
select top (
(select count(Id) FROM [TABLE])*95/100
) id FROM [TABLE]
In TSQL:
DECLARE @ourCount as Int
DECLARE @topNinetyFive as Int
Select @ourCount = count(1) FROM [ourDatabase].[dbo].[ourTable]
Set @topNinetyFive = round(0.95 * @ourCount, 0)
Select TOP (@topNinetyFive) * FROM [ourDatabase].[dbo].[ourTable]
-- NOTE: a more meaningful criteria could be based on one of the columns with a 'where' clause

SQL select segment

I'm using SQL Server 2008.
I have a table with x amount of rows. I would like to always divide x by 5 and select the 3rd group of records.
Let's say there are 100 records in the table:
100 / 5 = 20
the 3rd segment will be record 41 to 60.
How can I calculate and select only this 3rd segment in SQL?
Thanks.
You can use NTILE.
Distributes the rows in an ordered partition into a specified number of groups.
Example:
SELECT col1, col2, ..., coln
FROM
(
SELECT
col1, col2, ..., coln,
NTILE(5) OVER (ORDER BY id) AS groupno
FROM yourtable
) AS t -- SQL Server requires an alias on a derived table
WHERE groupno = 3
That's a perfect use for the NTILE ranking function.
Basically, you define your query inside a CTE and add an NTILE to your rows - a number going from 1 to n (the argument to NTILE). You order your rows by some column, and then you get the n groups of rows you're looking for, and you can operate on any one of those "groups" of data.
So try something like this:
;WITH SegmentedData AS
(
SELECT
(list of your columns),
GroupNo = NTILE(5) OVER (ORDER BY SomeColumnOfYours)
FROM dbo.YourTable
)
SELECT *
FROM SegmentedData
WHERE GroupNo = 3
Of course, you can also use an UPDATE statement after the CTE to update those rows.
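The 100-row case from the question can be checked with a short sketch. It runs the CTE through Python's sqlite3 module (SQLite 3.25 or later assumed); the table holds only an id column, which is an assumption made to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO yourtable VALUES (?)", [(i,) for i in range(1, 101)])

# NTILE(5) assigns each row a group number from 1 to 5 in id order;
# filtering on group 3 keeps the third fifth of the table.
query = """
WITH SegmentedData AS (
    SELECT id, NTILE(5) OVER (ORDER BY id) AS GroupNo
    FROM yourtable
)
SELECT id
FROM SegmentedData
WHERE GroupNo = 3
ORDER BY id
"""
segment = [row[0] for row in conn.execute(query)]
print(segment[0], segment[-1], len(segment))
```

When the row count is not divisible by 5, NTILE places the extra rows in the lower-numbered groups, so the segment boundaries shift slightly compared with plain integer division.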

Evaluating the mean absolute deviation of a set of numbers in Oracle

I'm trying to implement a procedure to evaluate the median absolute deviation of a set of numbers (usually obtained via a GROUP BY clause).
An example of a query where I'd like to use this is:
select id, mad(values) from mytable group by id;
I'm going by the aggregate function example but am a little confused since the function needs to know the median of all the numbers before all the iterations are done.
Any pointers to how such a function could be implemented would be much appreciated.
In Oracle 10g+:
SELECT MEDIAN(ABS(value - med))
FROM (
SELECT value, MEDIAN(value) OVER() AS med
FROM mytable
)
Or the same with GROUP BY:
SELECT id, MEDIAN(ABS(value - med))
FROM (
SELECT id, value, MEDIAN(value) OVER(PARTITION BY id) AS med
FROM mytable
)
GROUP BY
id
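The reason the query needs two levels is that the median must be known before any deviation can be computed. A minimal sketch of the same two passes in plain Python, using the standard statistics module; the function name mad_by_id and the sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import median


def mad_by_id(rows):
    """Median absolute deviation per id for (id, value) pairs."""
    groups = defaultdict(list)
    for id_, value in rows:
        groups[id_].append(value)

    result = {}
    for id_, values in groups.items():
        med = median(values)                                  # pass 1: group median
        result[id_] = median(abs(v - med) for v in values)    # pass 2: median of deviations
    return result


rows = [(1, v) for v in (1, 1, 2, 2, 4, 6, 9)] + [(2, v) for v in (10, 20, 30, 40)]
print(mad_by_id(rows))
```

Pass 1 corresponds to MEDIAN(value) OVER (PARTITION BY id) in the inner query, and pass 2 to the outer MEDIAN(ABS(value - med)) with GROUP BY id.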

SQL Command for the following table

I have a table named "Sales" with the following columns:
Sales_ID|Product_Code|Zone|District|State|Distributor|Total_Sales
Now I want to generate a sales summary showing the total sales by zone, then by district, then by state, and by which distributor, for the past month.
How can I write a SQL statement to do this? Can anyone help, please? Thanks in advance.
I also have another question: how can I select the second-largest or third-largest value from any column of a table?
Have a look at using the ROLLUP GROUP BY option.
Generates the simple GROUP BY aggregate rows, plus subtotal or super-aggregate rows,
and also a grand total row.
The number of groupings that is returned equals the number of expressions
in the <composite element list> plus one. For example, consider the following statement.
SELECT a, b, c, SUM ( <expression> )
FROM T
GROUP BY ROLLUP (a,b,c)
One row with a subtotal is generated for each unique combination of values of
(a, b, c), (a, b), and (a). A grand total row is also calculated.
Columns are rolled up from right to left.
The column order affects the output groupings of ROLLUP and can affect the number
of rows in the result set.
Something like
DECLARE @Table TABLE(
    Zone VARCHAR(10),
    District VARCHAR(10),
    State VARCHAR(10),
    Sales FLOAT
)
INSERT INTO @Table SELECT 'A','A','A',1
INSERT INTO @Table SELECT 'A','A','B',1
INSERT INTO @Table SELECT 'A','B','A',1
INSERT INTO @Table SELECT 'B','A','A',1
SELECT Zone,
    District,
    State,
    SUM(Sales)
FROM @Table
WHERE <Your Condition here> --THIS IS WHERE YOU USE THE WHERE CLAUSE
GROUP BY ROLLUP (Zone,District,State)
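To make the rolled-up groupings concrete, here is a sketch that spells them out by hand on the same four sample rows. It is an emulation, not ROLLUP itself: SQLite (used here through Python's sqlite3 module) has no ROLLUP, so each grouping level is written as its own SELECT and combined with UNION ALL. The table name sales is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (Zone TEXT, District TEXT, State TEXT, Sales REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [("A", "A", "A", 1), ("A", "A", "B", 1), ("A", "B", "A", 1), ("B", "A", "A", 1)],
)

# One SELECT per grouping level, rolled up from right to left:
# (Zone, District, State), (Zone, District), (Zone), and the grand total.
# NULL marks a column that has been rolled up.
query = """
SELECT Zone, District, State, SUM(Sales) FROM sales GROUP BY Zone, District, State
UNION ALL
SELECT Zone, District, NULL, SUM(Sales) FROM sales GROUP BY Zone, District
UNION ALL
SELECT Zone, NULL, NULL, SUM(Sales) FROM sales GROUP BY Zone
UNION ALL
SELECT NULL, NULL, NULL, SUM(Sales) FROM sales
"""
rollup = conn.execute(query).fetchall()
for row in rollup:
    print(row)
```

With three grouping columns there are four levels, which matches the "number of expressions plus one" rule quoted above.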
To get the second and third largest values, you can use either ROW_NUMBER (Transact-SQL):
;WITH Vals AS (
SELECT *,
ROW_NUMBER() OVER (ORDER BY RequiredCol DESC) RowNum
FROM YourTable
)
SELECT *
FROM Vals
WHERE RowNum IN (2,3)
or
SELECT TOP 2
*
FROM (
SELECT TOP 3
*
FROM YourTable
ORDER BY RequiredCol DESC
) sub
ORDER BY RequiredCol
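A quick way to check the ROW_NUMBER variant is to run it on a handful of values. This sketch uses Python's sqlite3 module (SQLite 3.25 or later assumed) and five invented values in a single-column YourTable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (RequiredCol INTEGER)")
conn.executemany("INSERT INTO YourTable VALUES (?)", [(v,) for v in (10, 40, 30, 20, 50)])

# Number the rows from largest to smallest, then keep positions 2 and 3.
query = """
WITH Vals AS (
    SELECT RequiredCol,
           ROW_NUMBER() OVER (ORDER BY RequiredCol DESC) AS RowNum
    FROM YourTable
)
SELECT RequiredCol
FROM Vals
WHERE RowNum IN (2, 3)
ORDER BY RowNum
"""
second_and_third = [row[0] for row in conn.execute(query)]
print(second_and_third)
```

ROW_NUMBER gives duplicate values separate positions. If equal values should count as one rank, DENSE_RANK() can be used in place of ROW_NUMBER().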
SELECT SUM(Total_Sales) FROM sales GROUP BY (X)
Replace X with Zone, District, State or Distributor.