Count rows based on row value - sql

I am calling a stored procedure that returns a table with two columns, an ident (integer) and a score (float4). The integer column will be unique values. I now want to know how many rows in the table have a larger/smaller score than the ident with a given value. I'm struggling to figure out how to do that in SQL. If it were something like PHP I'd sort the returned data by score, find the index of the row that has the ident I am looking for, and then subtract that from the total # of rows, for example. In PostgreSQL 9.1.15, I'm not sure how to do that.
SELECT COUNT(*)
FROM my_stored_proc()
WHERE score > *Score of person with given ident*
ORDER BY score;

If you only care about ident = 2, you can do:
select sum(case when t.score < t2.score then 1 else 0 end) as LessThan,
sum(case when t.score > t2.score then 1 else 0 end) as GreaterThan
from table t cross join
(select t.* from table t where ident = 2) t2;
If you only want to reference the table once (as you would if accessing it were expensive), you could do the above with a CTE or you could do:
select sum(case when score < score2 then 1 else 0 end) as LessThan,
sum(case when score > score2 then 1 else 0 end) as GreaterThan
from (select t.*,
max(case when ident = 2 then score end) over () as score2
from table t
) t

Use window functions:
SELECT worse, better
FROM (
SELECT
ident,
COUNT(*) OVER (ORDER BY score ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) worse,
COUNT(*) OVER (ORDER BY score ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) better
FROM my_stored_proc()
) t
WHERE ident = 2; -- replace with the "ident" you care about
This will simply count the number of rows in the result set that are above or below the current row if ordered by score.
Anyway, Gordon's solution might be slightly better, as it takes into account the possibility of an ident being returned more than once from your my_stored_proc(), and takes each ident's maximum score.
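To try the conditional-aggregation approach without a PostgreSQL server, here is a minimal sketch that runs the same idea through Python's sqlite3 module. The scores table, its sample rows, and the count_around helper are made up for illustration and stand in for the output of my_stored_proc().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical stand-in for the rows returned by my_stored_proc()
    CREATE TABLE scores (ident INTEGER PRIMARY KEY, score REAL);
    INSERT INTO scores VALUES (1, 0.5), (2, 1.5), (3, 2.5), (4, 3.5), (5, 1.5);
""")

def count_around(ident):
    """Return (rows scoring lower, rows scoring higher) than the given ident."""
    return conn.execute(
        """
        SELECT COALESCE(SUM(CASE WHEN t.score < t2.score THEN 1 ELSE 0 END), 0),
               COALESCE(SUM(CASE WHEN t.score > t2.score THEN 1 ELSE 0 END), 0)
        FROM scores t
        CROSS JOIN (SELECT score FROM scores WHERE ident = ?) t2
        """,
        (ident,),
    ).fetchone()

less_than, greater_than = count_around(2)
print(less_than, greater_than)  # 1 2
```

Note that ident 5 ties with ident 2 and is counted on neither side here. The ROWS-based window version counts positions in the sort order instead, so a tied row would land on one side or the other.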

Related

calculate 2 cumulative sums for 2 different groups

I have a table that looks like this:
id position value
5 senior 10000
6 senior 20000
8 senior 30000
9 junior 5000
4 junior 7000
3 junior 10000
It is sorted by position and value (asc) already. I want to calculate the number of seniors and juniors that can fit in a budget of 50,000 such that preference is given to seniors.
So for example, here 2 seniors (first and second) + 3 juniors can fit in the budget of 50,000.
id position value cum_sum
5 senior 10000 10000
6 senior 20000 30000
8 senior 30000 60000 ----not possible because it is more than 50000
----------------------------------- --- so out of 50k, 30k is used for 2 seniors.
9 junior 5000 5000
4 junior 7000 12000
1 junior 7000 19000 ---with the remaining 20k, these 3 juniors can also fit
3 junior 10000 29000
so the output should look like this:
juniors seniors
3 2
How can I achieve this in SQL?
Here's one possible solution: DB Fiddle
with seniorsCte as (
select id, position, value, total
from budget b
inner join (
select id, position, value, (sum(value) over (order by value, id)) total
from people
where position = 'senior'
) as s
on s.total <= b.amount
)
, juniorsCte as (
select j.id, j.position, j.value, j.total + r.seniorsTotal as total
from (
select coalesce(max(total), 0) seniorsTotal
, max(b.amount) - coalesce(max(total), 0) remainingAmount
from budget b
left join seniorsCte on true -- left join keeps the budget row even when no senior fits
) as r
inner join (
select id, position, value, (sum(value) over (order by value, id)) total
from people
where position = 'junior'
) as j
on j.total <= r.remainingAmount
)
/* use this if you want the specific records
select *
from seniorsCte
union all
select *
from juniorsCte
*/
select (select count(1) from seniorsCte) seniors
, (select count(1) from juniorsCte) juniors
From your question I suspect you're already familiar with window functions, but in case not: the query below pulls back all rows from the people table where the position is senior and adds a column, total, holding the cumulative total of value across the returned rows, starting with the lowest value and ascending. Sorting by id as well ensures consistent behaviour if there are multiple rows with the same value, though that's not strictly required if we're happy to take those in an arbitrary order.
select id, position, value, (sum(value) over (order by value, id)) total
from people
where position = 'senior'
The budget table just holds a single row/value saying what our cutoff is; this avoids hardcoding the 50k value you mentioned, so we can easily amend it as required.
The common table expressions (CTEs) let us filter the juniors subquery based on the output of the seniors subquery (we only want those juniors that fit in the difference between the budget and the seniors' total), whilst still letting us return the results for juniors and seniors independently. If we wanted the actual rows rather than just the counts, we could perform a union all between the two sets, as demonstrated in the commented-out code.
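For anyone who wants to poke at the running-total idea locally, here is a rough sketch using Python's sqlite3 module (it needs SQLite 3.25 or later for window functions). It is a simplification rather than a port of the query above: the two stages are chained in Python instead of CTEs, the fitting_rows helper is my own name, and the sample rows are the ones from the question's second listing, including id 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER, position TEXT, value INTEGER);
    INSERT INTO people VALUES
        (5, 'senior', 10000), (6, 'senior', 20000), (8, 'senior', 30000),
        (9, 'junior', 5000), (4, 'junior', 7000), (1, 'junior', 7000),
        (3, 'junior', 10000);
""")

def fitting_rows(position, limit):
    """Rows of one position whose running total stays within the limit."""
    return conn.execute(
        """
        SELECT id, value, total
        FROM (
            SELECT id, value, SUM(value) OVER (ORDER BY value, id) AS total
            FROM people
            WHERE position = ?
        )
        WHERE total <= ?
        ORDER BY total
        """,
        (position, limit),
    ).fetchall()

budget = 50000  # hardcoded here; the answer keeps it in a budget table
seniors = fitting_rows("senior", budget)
spent_on_seniors = seniors[-1][2] if seniors else 0
juniors = fitting_rows("junior", budget - spent_on_seniors)
print(len(seniors), len(juniors))  # 2 3
```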
For it to work, the sum has to be not only cumulative, but also selective. As mentioned in the comment, you can achieve that with a recursive cte: online demo
with recursive
ordered as --this will be fed to the actual recursive cte
( select *,
row_number() over (order by position desc,value asc)
from test_table)
,recursive_cte as
( select id,
position,
value,
value*(value<50000)::int as cum_sum,
value<50000 as is_hired,
2 as next_i
from ordered
where row_number=1
union
select o.id,
o.position,
o.value,
case when o.value+r.cum_sum<50000 then o.value+r.cum_sum else r.cum_sum end,
(o.value+r.cum_sum)<50000 as is_hired,
r.next_i+1 as next_i
from recursive_cte r,
ordered o
where o.row_number=next_i
)
select count(*) filter (where position='junior') as juniors,
count(*) filter (where position='senior') as seniors
from recursive_cte
where is_hired;
row_number() over () is a window function
count(*) filter (where ...) is an aggregate filter. It's a faster variant of the sum(case when expr then a else 0 end) or count(nullif(expr)) approach, for when you only wish to aggregate a specific subset of values. Here it just puts the counts in columns, as in your expected result; the same could be done with select position, count(*) from recursive_cte where is_hired group by position, which returns the counts stacked in rows.
All it does is order your list according to your priorities in the first cte, then go through it row by row in the second one, collecting the cumulative sum, based on whether it's still below your limit/budget.
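The row-by-row walk that the recursive CTE performs can be sketched in a few lines of plain Python, which may make the "selective" part easier to see: a row that would break the budget is skipped, and the walk continues with the next row. The count_hired function and the tuple layout are assumptions of mine; the rows are the ones from the question's second listing, and the strict < mirrors the comparison used in the CTE.

```python
def count_hired(rows, budget):
    """Walk rows in priority order, adding each value only if it still fits."""
    # Seniors first, then cheapest first (ORDER BY position DESC, value ASC).
    ordered = sorted(rows, key=lambda r: (r[1] != "senior", r[2]))
    spent = 0
    hired = {"senior": 0, "junior": 0}
    for _id, position, value in ordered:
        if spent + value < budget:  # same strict comparison as the CTE
            spent += value
            hired[position] += 1
        # otherwise skip this row and carry the running sum forward unchanged
    return hired

rows = [
    (5, "senior", 10000), (6, "senior", 20000), (8, "senior", 30000),
    (9, "junior", 5000), (4, "junior", 7000), (1, "junior", 7000),
    (3, "junior", 10000),
]
print(count_hired(rows, 50000))  # {'senior': 2, 'junior': 3}
```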
PostgreSQL supports the window function SUM(col) OVER ():
with cte as (
SELECT *, SUM(value) OVER(PARTITION BY position ORDER BY id) AS cumulative_sum
FROM mytable
)
select position, count(1)
from cte
where cumulative_sum < 50000
group by position
Another way to do it, returning the results in a single row:
with cte as (
SELECT *, SUM(value) OVER(PARTITION BY position ORDER BY id) AS cumulative_sum
FROM mytable
),
cte2 as (
select position, count(1) as _count
from cte
where cumulative_sum < 50000
group by position
)
select
sum(case when position = 'junior' then _count else null end) juniors,
sum(case when position = 'senior' then _count else null end) seniors
from cte2
Demo here
Here is an example using a running total:
select
count(case when chek_sum_jun > 0 and position = 'junior' then position else null end) chek_jun,
count(case when chek_sum_sen > 0 and position = 'senior' then position else null end) chek_sen
from (
select position,
20000 - sum(case when position = 'junior' then value else 0 end) over (partition by position order by value asc rows between unbounded preceding and current row ) chek_sum_jun,
50000 - sum(case when position = 'senior' then value else 0 end) over (partition by position order by value asc rows between unbounded preceding and current row ) chek_sum_sen
from test_table) x
demo : https://dbfiddle.uk/ZgOoSzF0

Sum by Groups Before and after appear Number in column SQL

I have records grouped by id and ordered by date ('Date'). For each id I want to sum the amounts in two groups: one for the rows before any number appears in 'Condition', and another for the rows after the first number. It doesn't matter if a 0 appears after a number: everything before the first number goes into the first sum, and everything after it goes into the second.
You can use conditional aggregation. For example:
select
id,
sum(case when s = 0 then amount else 0 end) as amount_before,
sum(case when s <> 0 then amount else 0 end) as amount_after
from (
select t.*,
sum(abs(condition)) over(partition by id order by date) as s
from t
) x
group by id
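Since the question's data isn't shown, here is a small sketch with made-up rows, run through Python's sqlite3 module (SQLite 3.25 or later for the window function). The table name t matches the answer; the column names event_date and cond and all the values are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical sample data; cond stands in for the 'Condition' column
    CREATE TABLE t (id INTEGER, event_date TEXT, cond INTEGER, amount INTEGER);
    INSERT INTO t VALUES
        (1, '2024-01-01', 0, 10), (1, '2024-01-02', 0, 20),
        (1, '2024-01-03', 3, 30), (1, '2024-01-04', 0, 40),
        (1, '2024-01-05', 2, 50),
        (2, '2024-01-01', 0, 5),  (2, '2024-01-02', 0, 5);
""")

result = conn.execute("""
    SELECT id,
           SUM(CASE WHEN s = 0 THEN amount ELSE 0 END)  AS amount_before,
           SUM(CASE WHEN s <> 0 THEN amount ELSE 0 END) AS amount_after
    FROM (
        -- running sum of |cond|: stays 0 until the first non-zero value
        SELECT t.*,
               SUM(ABS(cond)) OVER (PARTITION BY id ORDER BY event_date) AS s
        FROM t
    ) x
    GROUP BY id
    ORDER BY id
""").fetchall()
print(result)  # [(1, 30, 120), (2, 10, 0)]
```

With this query the row that carries the first non-zero value is counted in amount_after, because the running sum is already non-zero on that row.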

Select the unique records from duplicates record using group by

I have a table with duplicate records, for example multiple records with the same account number.
Now I want to select only the record ids that satisfy the conditions below, in priority order:
Select the account number for which the prim_cust is X
If X is null, then select the account number which has a non-null dept_id.
If both are null, then select the min(id).
Here we will have to group the account number and perform the above conditions.
I just want single record with unique account number with above conditions satisfied.
The condition should follow the priority
I think you have a prioritization query, where you want one row per acct_nbr subject to your various rules.
For this type of problem, row_number() is quite handy:
select t.*
from (select t.*,
row_number() over (partition by acct_nbr
order by (case when prim_cust = 'X' then 1 else 2 end),
(case when dept_id is not null then 1 else 2 end),
id
) as seqnum
from t
) t
where seqnum = 1;
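As a quick way to check the priority ordering, here is a sketch using Python's sqlite3 module (SQLite 3.25 or later). The accounts table and its rows are invented for illustration: account 100 has a row with prim_cust = 'X', account 200 only has a row with a dept_id, and account 300 has neither.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical sample data covering the three priority rules
    CREATE TABLE accounts (id INTEGER, acct_nbr INTEGER, prim_cust TEXT, dept_id INTEGER);
    INSERT INTO accounts VALUES
        (1, 100, NULL, NULL), (2, 100, 'X', NULL), (3, 100, NULL, 7),
        (4, 200, NULL, NULL), (5, 200, NULL, 9),
        (6, 300, NULL, NULL), (7, 300, NULL, NULL);
""")

picked = conn.execute("""
    SELECT id, acct_nbr
    FROM (
        SELECT a.*,
               ROW_NUMBER() OVER (
                   PARTITION BY acct_nbr
                   ORDER BY (CASE WHEN prim_cust = 'X' THEN 1 ELSE 2 END),
                            (CASE WHEN dept_id IS NOT NULL THEN 1 ELSE 2 END),
                            id
               ) AS seqnum
        FROM accounts a
    ) ranked
    WHERE seqnum = 1
    ORDER BY acct_nbr
""").fetchall()
print(picked)  # [(2, 100), (5, 200), (6, 300)]
```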

SQL QUERY to count repeats with 2 conditions

I want to find repeated items only when they satisfy two conditions. In this example, count repeats of an item type for each customer_id only when it has order size "Big" and its corresponding date is before the other instances. The first condition and the repeats can be achieved with this code:
Select Customer_id, Item_Type, COUNT(*)
from table
group by Customer_id, Item_Type
having count(*) > 1 and sum(case when Order_Size = 'Big' then 1 else 0 end) > 0;
How do I include the date aspect as well?
I would do this as:
select t.customer_id, t.item_type, count(*)
from (select t.*,
min(case when Order_Size = 'Big' then date end) over (partition by customer_id, item_type) as min_big
from t
) t
where date > min_big
group by t.customer_id, t.item_type;
I believe you could use a window function in a subquery to decide which rows to count, then count them in your main query. Something like:
Select
customer_id, item_type, sum(count_pass) as Count
FROM
(
Select Customer_id,
Item_Type,
CASE
WHEN Order_Size = 'Big' THEN 0
WHEN MIN(Order_Size) OVER (PARTITION BY Customer_ID, Item_Type ORDER BY DateField ASC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) = 'Big' THEN 1
ELSE 0
END as count_pass
FROM table
) subqry
GROUP BY 1,2
That big case statement breaks down like:
If this record is 'Big' then ignore it
If you order all the records by date for each group of customer_id/item_type and look at all the records that precede this record, and the min(order_size) in that group of records (sorted lexicographically) is 'Big' then you have a preceding date with big and can count this record
Otherwise... you can't count it. Which would just be records with order_size='small' without a preceding 'big'.
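To see how the "earliest Big date" idea behaves on concrete rows, here is a sketch of the first query (the min-over-partition one) run through Python's sqlite3 module (SQLite 3.25 or later). The orders table, the order_date column name, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical sample data
    CREATE TABLE orders (customer_id INTEGER, item_type TEXT,
                         order_size TEXT, order_date TEXT);
    INSERT INTO orders VALUES
        (1, 'A', 'Small', '2024-01-01'), (1, 'A', 'Big',   '2024-01-02'),
        (1, 'A', 'Small', '2024-01-03'), (1, 'A', 'Small', '2024-01-04'),
        (1, 'A', 'Big',   '2024-01-05'),
        (1, 'B', 'Small', '2024-01-01'), (1, 'B', 'Small', '2024-01-02'),
        (2, 'A', 'Big',   '2024-01-01'), (2, 'A', 'Small', '2024-01-02');
""")

repeats = conn.execute("""
    SELECT customer_id, item_type, COUNT(*)
    FROM (
        -- earliest 'Big' date per customer/item; NULL when there is none
        SELECT o.*,
               MIN(CASE WHEN order_size = 'Big' THEN order_date END)
                   OVER (PARTITION BY customer_id, item_type) AS min_big
        FROM orders o
    ) marked
    WHERE order_date > min_big
    GROUP BY customer_id, item_type
    ORDER BY customer_id, item_type
""").fetchall()
print(repeats)  # [(1, 'A', 3), (2, 'A', 1)]
```

The two answers differ on later 'Big' rows: this query counts every row after the first 'Big', including the second 'Big' for customer 1, while the CASE-based version skips rows that are themselves 'Big' and would report 2 for that group.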

Aggregate function to detect trend in PostgreSQL

I'm using a PostgreSQL DB to store a data structure like so:
datapoint(userId, rank, timestamp)
where timestamp is the Unix Epoch milliseconds timestamp.
In this structure I store the rank of each user each day, so it's like:
UserId Rank Timestamp
1 1 1435366459
1 2 1435366458
1 3 1435366457
2 8 1435366456
2 6 1435366455
2 7 1435366454
So, in the sample data above, userId 1 is improving its rank with each measurement, which means it has a positive trend, while userId 2 is dropping in rank, which means it has a negative trend.
What I need to do is to detect all users that have a positive trend based on the last N measurements.
One approach would be to perform a linear regression on each user's rank, and check if the slope is positive or negative. Luckily, PostgreSQL has a builtin function to do that - regr_slope:
SELECT user_id, regr_slope (rank1, timestamp1) AS slope
FROM my_table
GROUP BY user_id
This query gives you the basic functionality. Now, you can dress it up a bit with case expressions if you like:
SELECT user_id,
CASE WHEN slope > 0 THEN 'positive'
WHEN slope < 0 THEN 'negative'
ELSE 'steady' END AS trend
FROM (SELECT user_id, regr_slope (rank1, timestamp1) AS slope
FROM my_table
GROUP BY user_id) t
Edit:
Unfortunately, regr_slope doesn't have a built in way to handle "top N" type requirements, so this should be handled separately, e.g., by a subquery with row_number:
-- Decoration outer query
SELECT user_id,
CASE WHEN slope > 0 THEN 'positive'
WHEN slope < 0 THEN 'negative'
ELSE 'steady' END AS trend
FROM (-- Inner query to calculate the slope
SELECT user_id, regr_slope (rank1, timestamp1) AS slope
FROM (-- Inner query to get top N
SELECT user_id, rank1, timestamp1,
ROW_NUMBER() OVER (PARTITION BY user_id
ORDER BY timestamp1 DESC) AS rn
FROM my_table) t
WHERE rn <= N -- Replace N with the number of rows you need
GROUP BY user_id) t2
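SQLite has no regr_slope, so to show what the slope looks like on the question's sample data here is a plain-Python sketch of a least-squares slope over the latest N points. The function names and the (rank, timestamp) tuple layout are my own, and the arithmetic is the textbook formula rather than PostgreSQL's actual implementation.

```python
def least_squares_slope(ys, xs):
    """Slope of the least-squares line of y on x, or None if undefined."""
    n = len(xs)
    if n == 0:
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # a single point, or all x values identical
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def slope_of_last_n(points, n):
    """points: (rank, timestamp) pairs; only the n most recent are used."""
    latest = sorted(points, key=lambda p: p[1], reverse=True)[:n]
    return least_squares_slope([r for r, _ in latest], [t for _, t in latest])

user1 = [(1, 1435366459), (2, 1435366458), (3, 1435366457)]
user2 = [(8, 1435366456), (6, 1435366455), (7, 1435366454)]
print(slope_of_last_n(user1, 3))  # -1.0
print(slope_of_last_n(user2, 3))  # 0.5
```

Because rank 1 is the best rank in the sample data, the improving user (userId 1) comes out with a negative slope, so the sign may need flipping depending on what you want to call a positive trend.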
You can use analytic functions for this. Overall approach:
compute the previous rank using lag()
use case to decide whether the trend is positive or not (0 or 1)
use min() to get the minimum trend over the preceding N rows; if the trend was positive for N rows, this returns 1, otherwise 0. To limit it to N rows, use the between N preceding and 0 following clause of the windowing function
Code:
select v2.*,
min(positive_trend) over (partition by userid order by timestamp1
rows between 3 preceding and 0 following) as trend_overall
from (
select v1.*,
(case when prev_rank < rank1 then 0 else 1 end) as positive_trend
from (
select userid,
rank1,
timestamp1,
lag(rank1) over (partition by userid order by timestamp1) as prev_rank
from t1
order by userid, timestamp1
) v1
) v2
SQL Fiddle
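The three steps (previous rank, per-row flag, minimum over a sliding window) can also be sketched in plain Python for a single user. The function name is hypothetical; the window of 4 rows corresponds to "3 preceding and 0 following" in the query, and the flag logic copies the case expression, including treating the first row (no previous rank) as 1.

```python
def trend_overall(ranks_in_time_order, window=4):
    """Per row: 1 if no rank in the window got numerically larger, else 0."""
    flags = []
    prev_rank = None
    for rank in ranks_in_time_order:
        # case when prev_rank < rank1 then 0 else 1 (NULL comparison falls to 1)
        flags.append(0 if prev_rank is not None and prev_rank < rank else 1)
        prev_rank = rank
    # min() over the current row and up to window - 1 preceding rows
    return [min(flags[max(0, i - window + 1): i + 1]) for i in range(len(flags))]

# Sample data from the question, oldest measurement first
print(trend_overall([3, 2, 1]))  # [1, 1, 1]
print(trend_overall([7, 6, 8]))  # [1, 1, 0]
```

The last element of each list is the value for the most recent measurement, which is the one the UPDATE part of the answer picks out with rn = 1.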
UPDATE
To only get the userid with the overall trend and the delta for the rank, you'll have to add another call to lag(.., N+1) to get the nth previous rank and row_number() to get a numbering within the same userid:
select v3.userid, v3.trend_overall, delta_rank
from (
select v2.*,
min(positive_trend) over (partition by userid order by timestamp1
rows between 3 preceding and 0 following) as trend_overall,
latest_rank - prev_N_rank as delta_rank
from (
select v1.*,
(case when prev_rank < rank1 then 0 else 1 end) as positive_trend,
max(case when v1.rn = 1 then rank1 else NULL end) over (partition by userid) as latest_rank
from (
select userid,
rank1,
timestamp1,
lag(rank1) over (partition by userid order by timestamp1) as prev_rank,
lag(rank1, 4) over (partition by userid order by timestamp1) as prev_N_rank,
row_number() over (partition by userid order by timestamp1 desc) as rn
from t1
order by userid, timestamp1
) v1
) v2
) v3
where rn = 1
group by userid, trend_overall, delta_rank
order by userid, trend_overall, delta_rank
Updated SQL Fiddle