how to calculate count(*) in various percentiles

Say, I have a table holding integer values from 0 up to 9,999 and I want to make a distribution plot of the population of values in each percentile.
Below is what comes to mind. Is there a better way?
CREATE TABLE A(x INTEGER);
SELECT
(SELECT COUNT(*) FROM A WHERE x>=0 AND x<10) AS prcntl_01,
(SELECT COUNT(*) FROM A WHERE x>=10 AND x<20) AS prcntl_02,
(SELECT COUNT(*) FROM A WHERE x>=20 AND x<30) AS prcntl_03,
(SELECT COUNT(*) FROM A WHERE x>=30 AND x<40) AS prcntl_04,
(SELECT COUNT(*) FROM A WHERE x>=40 AND x<50) AS prcntl_05,
...
(SELECT COUNT(*) FROM A WHERE x>=990 AND x<1000) AS prcntl_100,
The size of the SQL statement is not a consideration as I can generate it on the fly. I am just wondering if there is an idiomatic way to get population counts in each percentile.

Use conditional aggregation instead of multiple queries:
SELECT sum(case when x >= 0 AND x < 10 then 1 else 0 end) as prcntl_01,
sum(case when x >= 10 AND x < 20 then 1 else 0 end) as prcntl_02,
. . .
sum(case when x >= 990 AND x < 1000 then 1 else 0 end) as prcntl_100
FROM A;
If you want the values in separate rows rather than columns, you can simply do:
select n as which,
sum(case when x >= (n - 1)*10 and x < n*10 then 1 else 0 end) as percentile
from A cross join
generate_series(1, 100) as n
group by n;
This limits the amount of code you have to write.
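When every bucket has the same width, the bucket number can also be computed from the value itself and used as the grouping key. The sketch below is an illustration, not part of the answer above: it runs against an in-memory SQLite database through Python's sqlite3 module, the sample values are made up, and it relies on SQLite's integer division (x / 10). Buckets with no rows are simply absent from the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A(x INTEGER)")
# Made-up sample values, for illustration only.
sample = [0, 3, 9, 10, 15, 25, 990, 999]
conn.executemany("INSERT INTO A(x) VALUES (?)", [(v,) for v in sample])

# x / 10 is integer division in SQLite, so 0..9 -> 0, 10..19 -> 1, and so on.
# Adding 1 makes the bucket numbers line up with prcntl_01 .. prcntl_100.
rows = conn.execute(
    """
    SELECT x / 10 + 1 AS bucket, COUNT(*) AS population
    FROM A
    GROUP BY x / 10 + 1
    ORDER BY bucket
    """
).fetchall()

for bucket, population in rows:
    print(bucket, population)
```

If empty buckets must appear with a count of 0, the generate_series cross join shown above is the better fit on databases that provide it.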

Related

Optimize SQL query for given Conditions

I have a query of the form:
select SUM(some_column) from (table)
where
IF x then a
ELSE y then b
ELSE z then c
...
In my Java code, I call this query once for each value (x, y, z, ...), and each call returns the required sum. My objective is to calculate the total sum over all those values, i.e.
Total = SUM_for_x + SUM_for_y + SUM_for_z + ....
Of course, this hits the DB once per value, which is costly. Can I optimize this into a single query that does the job while hitting the DB just once?

Assuming that you are only interested in the total sum and not in the partial sums and the conditions are mutually exclusive, you can do this:
SELECT SUM(some_column)
FROM (table)
WHERE (a)
OR (b)
OR (c)
OR ...
Another way (if your conditions are not mutually exclusive) would be:
SELECT SUM(some_column)
FROM(SELECT SUM(some_column) AS some_column FROM (table) WHERE (a) UNION ALL
SELECT SUM(some_column) AS some_column FROM (table) WHERE (b) UNION ALL
SELECT SUM(some_column) AS some_column FROM (table) WHERE (c) -- UNION ALL and so on
) AS partial_sums

SELECT SUM(case when x then a end) sum_a,
SUM(case when y then b end) sum_b,
SUM(case when z then c end) sum_c
FROM <table>
and sum up in Java
or get total in SQL:
SELECT COALESCE(SUM(case when x then a end), 0) +
COALESCE(SUM(case when y then b end), 0) +
COALESCE(SUM(case when z then c end), 0) AS total
FROM <table>
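As a rough, runnable illustration of getting the partial sums and the total in one round trip, here is a sketch against an in-memory SQLite database via Python's sqlite3 module. The table name payments, the columns kind and amount, and the three conditions are all hypothetical stand-ins for the question's table, some_column, and x/y/z.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: "kind" stands in for the conditions, "amount" for some_column.
conn.execute("CREATE TABLE payments(kind TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO payments(kind, amount) VALUES (?, ?)",
    [("x", 10), ("x", 5), ("y", 7), ("z", 1), ("other", 100)],
)

# One pass over the table; COALESCE turns the NULL of an empty SUM into 0.
sum_x, sum_y, sum_z, total = conn.execute(
    """
    SELECT COALESCE(SUM(CASE WHEN kind = 'x' THEN amount END), 0) AS sum_x,
           COALESCE(SUM(CASE WHEN kind = 'y' THEN amount END), 0) AS sum_y,
           COALESCE(SUM(CASE WHEN kind = 'z' THEN amount END), 0) AS sum_z,
           COALESCE(SUM(CASE WHEN kind IN ('x', 'y', 'z') THEN amount END), 0) AS total
    FROM payments
    """
).fetchone()
print(sum_x, sum_y, sum_z, total)
```

Because these sample conditions are mutually exclusive, the total equals the sum of the three partial sums; with overlapping conditions the two would differ.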

SQL - subtract value from same column

I have a table as follows
fab_id x y z m
12 14 10 3 5
12 10 10 3 4
Here I'm using a GROUP BY clause on id. Now I want to subtract the column values of rows that share the same id.
e.g. group by on id (12), then subtract (14-10) for x, (10-10) for y, (3-3) for z, (5-4) for m.
I know there is an aggregate function SUM for addition, but is there any function I can use to subtract these values?
Or is there any other method to achieve the result?
Note: there is a chance that the result may come out negative. Does any function handle this?
one more example - (order by correction_date desc so result will show recent correction first)
fab_id x y z m correction_date
14 20 12 4 4 2014-05-05 09:03
14 24 12 4 3 2014-05-05 08:05
14 26 12 4 6 2014-05-05 07:12
so result to achieve group by on id (14). Now to subtract (26-20)X, (12-12)Y, (4-4)z, (6-4)m

Now that you have given more information on how to deal with more records and revealed that there is a time column involved, here is a possible solution. The query selects the first and last record per fab_id and subtracts the values:
select
fab_info.fab_id,
earliest_fab.x - latest_fab.x,
earliest_fab.y - latest_fab.y,
earliest_fab.z - latest_fab.z,
earliest_fab.m - latest_fab.m
from
(
select
fab_id,
min(correction_date) as min_correction_date,
max(correction_date) as max_correction_date
from fab
group by fab_id
) as fab_info
inner join fab as earliest_fab on
earliest_fab.fab_id = fab_info.fab_id and
earliest_fab.correction_date = fab_info.min_correction_date
inner join fab as latest_fab on
latest_fab.fab_id = fab_info.fab_id and
latest_fab.correction_date = fab_info.max_correction_date;
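To check the first/last-record idea against the question's sample rows for fab_id 14, here is a small sketch using an in-memory SQLite database via Python's sqlite3 module. Storing correction_date as text is an assumption made for the demo; the format used sorts chronologically as a string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fab(fab_id INTEGER, x INTEGER, y INTEGER, z INTEGER, "
    "m INTEGER, correction_date TEXT)"
)
# Rows for fab_id 14 from the question; the text timestamps sort chronologically.
conn.executemany(
    "INSERT INTO fab VALUES (?, ?, ?, ?, ?, ?)",
    [
        (14, 20, 12, 4, 4, "2014-05-05 09:03"),
        (14, 24, 12, 4, 3, "2014-05-05 08:05"),
        (14, 26, 12, 4, 6, "2014-05-05 07:12"),
    ],
)

# Find the first and last correction_date per fab_id, join back to fetch
# those two rows, and subtract the latest values from the earliest ones.
result = conn.execute(
    """
    SELECT info.fab_id,
           e.x - l.x AS x, e.y - l.y AS y, e.z - l.z AS z, e.m - l.m AS m
    FROM (SELECT fab_id,
                 MIN(correction_date) AS first_date,
                 MAX(correction_date) AS last_date
          FROM fab
          GROUP BY fab_id) AS info
    JOIN fab AS e ON e.fab_id = info.fab_id AND e.correction_date = info.first_date
    JOIN fab AS l ON l.fab_id = info.fab_id AND l.correction_date = info.last_date
    """
).fetchall()
print(result)
```

For these rows the query yields (26-20), (12-12), (4-4), (6-4), which matches the result the question asks for.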

Provided you always want to subtract the least value from the greatest value:
select
fab_id,
max(x) - min(x),
max(y) - min(y),
max(z) - min(z),
max(m) - min(m)
from fab
group by fab_id;

Seeing as you say there will always be two rows, you can simply do a 'self join' and subtract the values from each other:
SELECT t1.fab_id, t1.x - t2.x as diffx, t1.y - t2.y as diffy, <remainder columns here>
from <table> t1
inner join <table> t2 on t1.fab_id = t2.fab_id and t1.correction_date > t2.correction_date
If you have more than two rows, you'll need subqueries or window ranking functions to find the largest and smallest correction_date for each fab_id; you can then do the same as above by joining those two subqueries together instead of joining the table to itself.

Unfortunately, it's SQL Server 2012 that has the handy FIRST_VALUE()/LAST_VALUE() OLAP functions, so in the case of more than 2 rows we have to do something a little different:
SELECT fab_id, SUM(CASE WHEN latest = 1 THEN -x ELSE x END) AS x,
SUM(CASE WHEN latest = 1 THEN -y ELSE y END) AS y,
SUM(CASE WHEN latest = 1 THEN -z ELSE z END) AS z,
SUM(CASE WHEN latest = 1 THEN -m ELSE m END) AS m
FROM (SELECT fab_id, x, y, z, m,
ROW_NUMBER() OVER(PARTITION BY fab_id
ORDER BY correction_date ASC) AS earliest,
ROW_NUMBER() OVER(PARTITION BY fab_id
ORDER BY correction_date DESC) AS latest
FROM myTable) fab
WHERE earliest = 1
OR latest = 1
GROUP BY fab_id
HAVING COUNT(*) >= 2
(and working fiddle. Thanks to #AK47 for the initial setup.)
Which yields the expected:
FAB_ID X Y Z M
12 4 0 0 1
14 6 0 0 2
Note that HAVING COUNT(*) >= 2 restricts the result to fab_ids with at least two rows. A fab_id with a single row would otherwise show up with its values negated, because that one row is both the earliest and the latest.

;with Ordered as
(
select
fab_id,x,y,z,m,correction_date,
row_Number() over (partition by fab_id order by correction_date desc) as Latest,
row_Number() over (partition by fab_id order by correction_date) as Oldest
from fab
)
select
O1.fab_id,
O1.x-O2.x,
O1.y-O2.y,
O1.z-O2.z,
O1.m-O2.m
from Ordered O1
join Ordered O2 on
O1.fab_id = O2.fab_id
where O1.latest = 1 and O2.oldest = 1

I think if you have a consistent set of two rows, then the following code should work for you.
select fab_id ,max(x) - min(x) as x
,max(y) - min(y) as y
,max(z) - min(z) as z
,max(m) - min(m) as m
from Mytable
group by fab_id
It will work even if you get more than 2 rows in a group, but the subtraction will always be the min value from the max value. Hope it helps you.
EDIT : SQL Fiddle DEMO

A CTE could help:
WITH cte AS (
SELECT
-- Get the row numbers per fab_id ordered by the correction date
ROW_NUMBER() OVER (PARTITION BY fab_id ORDER BY correction_date ASC) AS rid
, fab_id, x, y, z, m
FROM
YourTable
)
SELECT
fab_id
-- If the row number is 1 then, this is our base value
-- If the row number is not 1 then, we want to subtract it (or add the negative value)
, SUM(CASE WHEN rid = 1 THEN x ELSE x * -1 END) AS x
, SUM(CASE WHEN rid = 1 THEN y ELSE y * -1 END) AS y
, SUM(CASE WHEN rid = 1 THEN z ELSE z * -1 END) AS z
, SUM(CASE WHEN rid = 1 THEN m ELSE m * -1 END) AS m
FROM
cte
GROUP BY
fab_id
Remember, 40-10-20 is the same as 40 + (-10) + (-20)

Get ratio between the length of a table and one of its subsets via SQL

I have a table named A that contains a column named x. What I'm trying to do is to count the number of items that belong to a certain subset of A (more precisely, the ones that satisfy the x > 4 condition) via a single SELECT query, for example:
SELECT COUNT(*)
FROM A
WHERE x > 4;
From thereon, I'd like to calculate the ratio between the size of this particular subset of A and A as a whole, i.e. perform the following division:
size_subset / size_A
My question is - how would I combine all of these pieces into a single SQL SELECT query?

My server is down, so I can't verify the answer below:
SELECT count(case when x > 4 then x else null end) / COUNT(*) FROM A;
This is slightly better because it's just a count, not a sum (NULLs will not be counted),
but I prefer to do:
select (SELECT count(*) FROM A where x > 4)/(SELECT count(*) FROM A);
as I guess it can run faster.

You want conditional aggregation:
SELECT sum(case when x > 4 then 1 else 0 end) / COUNT(*)
FROM A;

There's probably a less clunky way of doing this, but:
SELECT SUM(CASE WHEN x > 4 THEN 1 ELSE 0 END) / COUNT(*) FROM A
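One thing worth checking on your own database is how it divides two integers. The sketch below uses an in-memory SQLite database through Python's sqlite3 module, where integer / integer truncates; other engines may behave differently, so treat this as an illustration under that assumption. The sample values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A(x INTEGER)")
# Made-up values: four of the eight are greater than 4.
conn.executemany("INSERT INTO A(x) VALUES (?)", [(v,) for v in [1, 2, 5, 6, 7, 3, 9, 4]])

# Integer / integer truncates in SQLite, so this ratio collapses to 0.
truncated = conn.execute(
    "SELECT SUM(CASE WHEN x > 4 THEN 1 ELSE 0 END) / COUNT(*) FROM A"
).fetchone()[0]

# Using 1.0 makes the sum a floating-point value, so the division keeps the fraction.
ratio = conn.execute(
    "SELECT SUM(CASE WHEN x > 4 THEN 1.0 ELSE 0 END) / COUNT(*) FROM A"
).fetchone()[0]
print(truncated, ratio)
```

Writing the numerator as 1.0 (or casting either side to a decimal or floating-point type) is a common way to keep the fractional part.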

Get the distinct count of values from a table with multiple where clauses

My table structure is this
id last_mod_dt nr is_u is_rog is_ror is_unv
1 x uuid1 1 1 1 0
2 y uuid1 1 0 1 1
3 z uuid2 1 1 1 1
I want the count of rows with:
is_ror=1 or is_rog =1
is_u=1
is_unv=1
All in a single query. Is it possible?
The problem I am facing is that there can be same values for nr as is the case in the table above.

Case statements provide mondo flexibility...
SELECT
sum(case
when is_ror = 1 or is_rog = 1 then 1
else 0
end) FirstCount
,sum(case
when is_u = 1 then 1
else 0
end) SecondCount
,sum(case
when is_unv = 1 then 1
else 0
end) ThirdCount
from MyTable

You can use UNION ALL to get multiple results, e.g.
select count(*) from table where is_ror=1 or is_rog =1
union all
select count(*) from table where is_u=1
union all
select count(*) from table where is_unv=1
Then the result set will contain three rows, each with one of the counts. (UNION ALL rather than UNION, so that equal counts are not merged into a single row.)

Sounds pretty simple if "all in a single query" does not disqualify subselects:
SELECT
(SELECT COUNT(DISTINCT nr) FROM table1 WHERE is_ror=1 OR is_rog=1) cnt_ror_reg,
(SELECT COUNT(DISTINCT nr) FROM table1 WHERE is_u=1) cnt_u,
(SELECT COUNT(DISTINCT nr) FROM table1 WHERE is_unv=1) cnt_unv;
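The distinct counts can also be combined with conditional aggregation so the table is read once. This is a sketch rather than the answer's own method: it loads the question's three sample rows into an in-memory SQLite database via Python's sqlite3 module, and the column types are assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1(id INTEGER, last_mod_dt TEXT, nr TEXT, "
    "is_u INTEGER, is_rog INTEGER, is_ror INTEGER, is_unv INTEGER)"
)
# The three sample rows from the question.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "x", "uuid1", 1, 1, 1, 0),
        (2, "y", "uuid1", 1, 0, 1, 1),
        (3, "z", "uuid2", 1, 1, 1, 1),
    ],
)

# CASE yields nr when the condition holds and NULL otherwise;
# COUNT(DISTINCT ...) ignores NULLs, so each column counts distinct matching nr values.
counts = conn.execute(
    """
    SELECT COUNT(DISTINCT CASE WHEN is_ror = 1 OR is_rog = 1 THEN nr END) AS cnt_ror_rog,
           COUNT(DISTINCT CASE WHEN is_u = 1 THEN nr END) AS cnt_u,
           COUNT(DISTINCT CASE WHEN is_unv = 1 THEN nr END) AS cnt_unv
    FROM table1
    """
).fetchone()
print(counts)
```

With the sample data every condition matches rows for both uuid1 and uuid2, so each distinct count is 2 even though the first two conditions match three rows.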

how about something like
SELECT
SUM(IF(is_u > 0 AND is_rog > 0, 1, 0)) AS count_something,
...
from table
group by nr
I think it will do the trick
I am of course not sure what you want exactly, but I believe you can use the logic to produce your desired result.

SQL query to test my programming capabilities in different way

I have one table with one column in it. There are 15 rows (integers). I want to count
the positive numbers and negative numbers, and also get the sum of all the numbers, in one query.
Can anyone help me?

Or...
SELECT
COUNT(CASE WHEN Col > 0 THEN 1 END) AS NumPositives,
COUNT(CASE WHEN Col < 0 THEN 1 END) AS NumNegatives,
SUM(Col) AS Tot
FROM TableName;
Or you could consider using SIGN(Col), which gives 1 for positive numbers and -1 for negative numbers.

I'll give you pseudocode to help you with your homework.
3 aggregates:
SUM
SUM (CASE < 0)
SUM (CASE > 0)

select (select sum(mycolumn) from mytable where mycolumn > 0) as positive_sum,
(select sum(mycolumn) from mytable where mycolumn < 0) as negative_sum,
sum(mycolumn) as total_sum
from mytable

Try this
SELECT SUM(CASE WHEN Col > 0 THEN 1 ELSE 0 END) AS Pos,
SUM(CASE WHEN Col < 0 THEN 1 ELSE 0 END) AS Neg,
SUM(Col) AS Tot
FROM Table
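For a quick sanity check of the conditional-aggregation version, here is a sketch against an in-memory SQLite database via Python's sqlite3 module. The table name numbers, the column name col, and the values are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names; the values are made up.
conn.execute("CREATE TABLE numbers(col INTEGER)")
conn.executemany(
    "INSERT INTO numbers(col) VALUES (?)",
    [(v,) for v in [5, -3, 0, 12, -7, -1, 4]],
)

# Each CASE contributes 1 for a matching row and 0 otherwise, so SUM acts as a count.
pos, neg, tot = conn.execute(
    """
    SELECT SUM(CASE WHEN col > 0 THEN 1 ELSE 0 END) AS pos,
           SUM(CASE WHEN col < 0 THEN 1 ELSE 0 END) AS neg,
           SUM(col) AS tot
    FROM numbers
    """
).fetchone()
print(pos, neg, tot)
```

Note that a zero is counted as neither positive nor negative, so pos + neg can be smaller than the number of rows.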
