Retrieve Result from comparing multiple columns in a single table - sql

FID RP Area Count
1 100 0.780 1
2 100 0.906 2
2 500 0.094 2
3 100 1.000 1
4 100 1.000 1
5 100 0.784 2
5 500 0.916 2
6 100 0.332 3
6 500 0.780 3
6 555 0.643 3
In the above table, I want to retrieve the rows where Area > 0.4. This retrieves 8 rows, but I want the answer in a different form.
Look at the case where FID = 5. Here, the Area values for RP 100 and RP 500 both satisfy the criterion, but the output should give higher weightage to RP = 100. For FID = 6, RP = 100 does not satisfy the criterion, but RP = 500 and RP = 555 do. I want the weightage to be given to RP = 500.
Required Result:
FID RP Area Count
1 100 0.78007 1
2 100 0.90626 2
3 100 1 1
4 100 1 1
5 100 0.7835 2
6 500 0.78 3

So, you want the first row for each FID where the value of Area exceeds 0.4, where "first" is ordered by RP.
Window functions provide the mechanism to do this. Most databases support row_number():
select FID, RP, Area, "Count"
from (select t.*,
             row_number() over (partition by fid order by rp) as seqnum
      from t
      where Area > 0.4
     ) t
where seqnum = 1;
The subquery filters the rows so that only rows with Area > 0.4 are included. The row_number() function assigns sequential values to the rows within each fid (because of the partition by clause), in order by rp (because of the order by clause). The outer query then keeps only the row numbered 1 in each fid.
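To see the row_number() approach run end to end, here is a minimal sketch that loads the sample rows into an in-memory SQLite database through Python's standard sqlite3 module and runs essentially the same query. SQLite is a stand-in for whatever database the question uses (it needs SQLite 3.25 or later for window functions), the threshold is passed as a parameter instead of the literal 0.4, and the helper name first_row_per_fid is hypothetical.

```python
import sqlite3


def first_row_per_fid(rows, threshold=0.4):
    """Return (FID, RP, Area, Count) for the lowest-RP row per FID with Area > threshold."""
    con = sqlite3.connect(":memory:")
    con.execute('create table t (FID integer, RP integer, Area real, "Count" integer)')
    con.executemany("insert into t values (?, ?, ?, ?)", rows)
    query = """
        select FID, RP, Area, "Count"
        from (select t.*,
                     row_number() over (partition by FID order by RP) as seqnum
              from t
              where Area > ?) t
        where seqnum = 1
        order by FID
    """
    result = con.execute(query, (threshold,)).fetchall()
    con.close()
    return result


# Sample data from the question (FID, RP, Area, Count).
rows = [
    (1, 100, 0.780, 1), (2, 100, 0.906, 2), (2, 500, 0.094, 2),
    (3, 100, 1.000, 1), (4, 100, 1.000, 1), (5, 100, 0.784, 2),
    (5, 500, 0.916, 2), (6, 100, 0.332, 3), (6, 500, 0.780, 3),
    (6, 555, 0.643, 3),
]

for row in first_row_per_fid(rows):
    print(row)
```

FID 6 comes back with RP 500 because the RP 100 row is removed by the filter before the numbering happens, which is why the filter sits inside the subquery rather than in the outer query.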

Related

Select column's occurrence order without group by

I currently have two tables, users and coupons
users:
id first_name
1 Roberta
2 Oliver
3 Shayna
4 Fechin
coupons:
id discount user_id
1 20% 1
2 40% 2
3 15% 3
4 30% 1
5 10% 1
6 70% 4
What I want to do is select from the coupons table until I've selected X users.
So if I choose X = 2, the resulting table would be:
id discount user_id
1 20% 1
2 40% 2
4 30% 1
5 10% 1
I've tried using both dense_rank and row_number, but they return the count of occurrences of each user_id, not its order.
SELECT id,
discount,
user_id,
dense_rank() OVER (PARTITION BY user_id)
FROM coupons
I'm guessing I need to do it in multiple subqueries (which is fine), where the first subquery would return something like:
id discount user_id order_of_occurence
1 20% 1 1
2 40% 2 2
3 15% 3 3
4 30% 1 1
5 10% 1 1
6 70% 4 4
which I can then use to filter by what I need.
PS: I'm using PostgreSQL.
You've stated that you want to parameterize the query so that you can retrieve X users. I'm reading that as all coupons for the first X distinct user_ids in coupon id column order.
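It appears your attempt was close, and dense_rank() is the right idea. Since you want to rank over the entire table, you can't use partition by in the ranking itself. A sorting column is also required to determine the ranking, but ordering by id alone gives every coupon its own rank, because id is unique. Instead, rank each user_id by the id of its first coupon, computed with min() as a window function (window functions can't be nested, hence the two steps):
with firsts as (
    select *,
           min(id) over (partition by user_id) as first_id
    from coupons
), ranked as (
    select *,
           dense_rank() over (order by first_id) as dr
    from firsts
)
select id, discount, user_id
from ranked
where dr <= <X>
order by id;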
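A quick way to sanity-check the "first X users in order of appearance" idea is to run it against the sample coupons in an in-memory SQLite database from Python's standard sqlite3 module. SQLite stands in for PostgreSQL here (window functions need SQLite 3.25 or later), and the helper name coupons_for_first_users is hypothetical. The query ranks each user_id by the smallest coupon id it appears with, then keeps the users whose dense rank is at most x.

```python
import sqlite3


def coupons_for_first_users(coupons, x):
    """Return all coupons belonging to the first x distinct user_ids, in coupon id order."""
    con = sqlite3.connect(":memory:")
    con.execute("create table coupons (id integer, discount text, user_id integer)")
    con.executemany("insert into coupons values (?, ?, ?)", coupons)
    query = """
        with firsts as (
            -- id of each user's first coupon, repeated on every row of that user
            select *, min(id) over (partition by user_id) as first_id
            from coupons
        ), ranked as (
            -- users ranked 1, 2, 3, ... by when they first appear
            select *, dense_rank() over (order by first_id) as dr
            from firsts
        )
        select id, discount, user_id
        from ranked
        where dr <= ?
        order by id
    """
    result = con.execute(query, (x,)).fetchall()
    con.close()
    return result


# Sample coupons from the question (id, discount, user_id).
coupons = [
    (1, "20%", 1), (2, "40%", 2), (3, "15%", 3),
    (4, "30%", 1), (5, "10%", 1), (6, "70%", 4),
]

for row in coupons_for_first_users(coupons, 2):
    print(row)
```

With x = 2 this yields coupons 1, 2, 4 and 5, matching the expected table in the question.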

Postgres: PL/pgSQL aggregate function that filters each group by length

I need help with a PL/pgSQL aggregate function; I'm not sure whether this can be done. Thanks in advance for your help.
Table
_id group_id content num len
0 2 tab 1 3
1 2 name 2 4
2 1 tag 1 3
3 1 bag 2 3
4 1 a 3 1
5 2 b 3 1
6 1 bo 4 2
7 2 an 4 2
I want to implement an aggregate function that aggregates by group_id and processes rows in num order, skipping any row whose len is less than or equal to 2, and then returns the specified number of rows from each group.
Example:
with sorted_table as(select * from Table order by num)
select my_func(content, len, 2(required_num)) from sorted_table group by group_id;
Expected result:
_id group_id content num len
0 2 tab 1 3
1 2 name 2 4
2 1 tag 1 3
3 1 bag 2 3
For example, I need the top 10 (required_num) rows in each group, sorted by num within the group, and I compare their contents in turn. If two contents are too similar (I can use a similarity function to judge this), one is filtered out, and so on until each group has 10 rows. The result could also look like this:
group_id result
2 [{"num":1,"content":"tab","len":3,"_id":0},{"num":2,"content":"name","len":4,"_id":1}]
1 [{"num":1,"content":"tag","len":3,"_id":2},{"num":2,"content":"bag","len":3,"_id":3}]
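As far as I understand the question, you don't really need a custom aggregate: an ordinary aggregate with a FILTER clause that keeps only the rows whose len is greater than 2 does the job:
select group_id,
       jsonb_agg(t order by num) filter (where len > 2) as result
from the_table t
group by group_id;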
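The question also asks for at most required_num rows per group, which a FILTER alone does not enforce. A minimal sketch of one way to add that cap, assuming "first" means lowest num: number the surviving rows per group with row_number() and keep those numbered up to required_num. It runs in an in-memory SQLite database via Python's sqlite3 module (SQLite 3.25 or later) as a stand-in for PostgreSQL, builds the JSON in Python rather than with jsonb_agg, and uses a hypothetical helper name top_rows_per_group. The similarity-based filtering mentioned in the question is not covered.

```python
import json
import sqlite3


def top_rows_per_group(rows, required_num, min_len=2):
    """Per group_id, keep the first required_num rows (by num) whose len exceeds min_len."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table the_table (_id integer, group_id integer, content text, num integer, len integer)"
    )
    con.executemany("insert into the_table values (?, ?, ?, ?, ?)", rows)
    query = """
        select _id, group_id, content, num, len
        from (select t.*,
                     row_number() over (partition by group_id order by num) as rn
              from the_table t
              where len > ?)
        where rn <= ?
        order by group_id, num
    """
    result = {}
    for _id, group_id, content, num, length in con.execute(query, (min_len, required_num)):
        result.setdefault(group_id, []).append(
            {"num": num, "content": content, "len": length, "_id": _id}
        )
    con.close()
    return result


# Sample data from the question (_id, group_id, content, num, len).
rows = [
    (0, 2, "tab", 1, 3), (1, 2, "name", 2, 4), (2, 1, "tag", 1, 3),
    (3, 1, "bag", 2, 3), (4, 1, "a", 3, 1), (5, 2, "b", 3, 1),
    (6, 1, "bo", 4, 2), (7, 2, "an", 4, 2),
]

for group_id, items in top_rows_per_group(rows, required_num=2).items():
    print(group_id, json.dumps(items))
```

Because the len filter runs before row_number(), skipped rows do not use up any of the required_num slots.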

Replace a column value with random values

I want to replace values in a column with randomized values
NO LINE
-- ----
1 1
1 2
1 3
1 4
2 1
2 2
3 1
4 1
4 2
I want to replace the values in column NO with random values. I have 5 million records, and the script below gives me 5 million unique NO values; but, as you can see, NO is not unique, and I want the same random value assigned to every row that shares a NO.
UPDATE table1
SET NO= abs(checksum(NewId())) % 100000000
I want the resulting dataset to look like this:
NO LINE
------ ----
99 1
99 2
99 3
99 4
1092 1
1092 2
3456 1
41098 1
41098 2
I would recommend rand() with a seed:
UPDATE table1
SET NO = FLOOR(rand(NO) * 100000000);
This runs a risk of collisions: two different NO values could get the same new value.
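To put a rough number on that risk, one can treat each new value as an independent, uniform draw from the $N = 10^8$ possible outputs of FLOOR(rand(NO) * 100000000). This is an idealization (RAND seeded with nearby integers is not guaranteed to behave like independent draws), but under it the standard birthday approximation for $k$ distinct NO values is:

```latex
P(\text{at least one collision}) \approx 1 - \exp\!\left(-\frac{k(k-1)}{2N}\right),
\qquad N = 10^{8}

% Example: k = 10\,000 distinct NO values
\frac{k(k-1)}{2N} = \frac{10\,000 \cdot 9\,999}{2 \cdot 10^{8}} \approx 0.5
\quad\Rightarrow\quad P \approx 1 - e^{-0.5} \approx 0.39
```

So the risk is only slight while the number of distinct NO values stays well below $\sqrt{N} = 10^4$; with hundreds of thousands of distinct values a collision is practically certain.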
If the numbers do not need to be "random" you can give them consecutive values in an arbitrary order and avoid collisions:
with toupdate as (
    select t1.*,
           dense_rank() over (order by rand(NO), NO) as new_no
    from table1 t1
)
update toupdate
set NO = new_no;
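The collision-free variant boils down to: list the distinct NO values, hand out the numbers 1..k to them in a random order, and apply that mapping to every row. Here is a minimal Python sketch of that idea (not the SQL Server code above; the helper name renumber_randomly and the seed parameter are hypothetical), using the sample rows from the question:

```python
import random


def renumber_randomly(rows, seed=None):
    """Replace each distinct NO with a value in 1..k (k = number of distinct NOs).

    The values are handed out in random order, every row that shares a NO
    gets the same new value, and no two distinct NOs collide.
    """
    distinct = sorted({no for no, _line in rows})
    new_values = list(range(1, len(distinct) + 1))
    random.Random(seed).shuffle(new_values)
    mapping = dict(zip(distinct, new_values))
    return [(mapping[no], line) for no, line in rows]


# Sample (NO, LINE) rows from the question.
rows = [(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2), (3, 1), (4, 1), (4, 2)]

for new_no, line in renumber_randomly(rows, seed=7):
    print(new_no, line)
```

Because the mapping is computed once per distinct NO rather than once per row, the "same NO, same new value" requirement holds by construction.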

Oracle SQL find row crossing limit

I have a table with four columns:
ID
SUB_ID (one ID has multiple SUB_IDs)
REVENUE
PAY (always less than or equal to REVENUE)
Running select * from A order by ID, SUB_ID gives the following data:
ID SUB_ID REVENUE PAY
100 1 10 8
100 2 12 9
100 3 9 7
100 4 11 11
101 1 6 5
101 2 4 4
101 3 3 2
101 4 8 7
101 5 4 3
101 6 3 3
I have a constant LIMIT value of 20. For each ID, I need to find the SUB_ID at which the running sum of REVENUE (in increasing SUB_ID order) crosses the LIMIT, and then the total PAY up to and including that row. In this example:
For ID 100, the limit is crossed at SUB_ID 2 (10 + 12 = 22), so the total PAY is 17 (8 + 9).
For ID 101, the limit is crossed at SUB_ID 4 (6 + 4 + 3 + 8 = 21), so the total PAY is 18 (5 + 4 + 2 + 7).
Basically, I need to find the row that crosses the limit.
Fiddle: http://sqlfiddle.com/#!4/4f12a/4/0
with sub as
  (select x.*,
          sum(revenue) over (partition by id order by sub_id) as run_rev,
          sum(pay) over (partition by id order by sub_id) as run_pay
   from tbl x)
select *
from sub s
where s.run_rev = (select min(x.run_rev)
                   from sub x
                   where x.id = s.id
                     and x.run_rev > 20);
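Here is the same running-sum query executed against the sample data, as a minimal check. It uses an in-memory SQLite database through Python's sqlite3 module as a stand-in for Oracle (window functions need SQLite 3.25 or later), keeps the table name tbl from the answer, passes the limit as a parameter instead of the literal 20, and wraps everything in a hypothetical helper first_crossing.

```python
import sqlite3

LIMIT = 20


def first_crossing(rows, limit=LIMIT):
    """Per id, return (id, sub_id, run_rev, run_pay) for the first row whose running revenue exceeds limit."""
    con = sqlite3.connect(":memory:")
    con.execute("create table tbl (id integer, sub_id integer, revenue integer, pay integer)")
    con.executemany("insert into tbl values (?, ?, ?, ?)", rows)
    query = """
        with sub as (
            select x.*,
                   sum(revenue) over (partition by id order by sub_id) as run_rev,
                   sum(pay) over (partition by id order by sub_id) as run_pay
            from tbl x
        )
        select id, sub_id, run_rev, run_pay
        from sub s
        where s.run_rev = (select min(x.run_rev)
                           from sub x
                           where x.id = s.id
                             and x.run_rev > ?)
        order by id
    """
    result = con.execute(query, (limit,)).fetchall()
    con.close()
    return result


# Sample data from the question (ID, SUB_ID, REVENUE, PAY).
rows = [
    (100, 1, 10, 8), (100, 2, 12, 9), (100, 3, 9, 7), (100, 4, 11, 11),
    (101, 1, 6, 5), (101, 2, 4, 4), (101, 3, 3, 2), (101, 4, 8, 7),
    (101, 5, 4, 3), (101, 6, 3, 3),
]

for row in first_crossing(rows):
    print(row)
```

Taking the minimum running sum above the limit picks out the first crossing row only because the running sum never decreases; if REVENUE could be 0, a later row could repeat the same running sum and both rows would match.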

TSQL - divide rows into groups based on one field

This is a modified version of my earlier question: TSQL equally divide resultset to groups and update them
My database has two tables:
The Orders table has data like this:
OrderID OperatorID GroupID OrderDesc Status Cash ...
--------------------------------------------------------------------------
1 1 1 small_order 1 300
2 1 1 another_order 1 0
3 1 2 xxxxxxxxxxx 2 1000
5 2 2 yyyyyyyyyyy 2 150
9 5 1 xxxxxxxxxxx 1 50
10 NULL 2 xxxxxxxxxxx 1 150
11 NULL 3 xxxxxxxxxxx 1 -50
12 4 1 xxxxxxxxxxx 1 200
Operators table:
OperatorID Name GroupID Active
---------------------------------------
1 John 1 1
2 Kate 1 1
4 Jack 2 1
5 Will 1 0
6 Sam 3 0
I can divide my recordset into equal groups using the query below:
SELECT o.*, op.operatorName AS NewOperator, op.operatorID AS NewOperatorId
FROM (SELECT o.*, (ROW_NUMBER() over (ORDER BY newid()) % numoperators) + 1 AS randseqnum
FROM Orders o CROSS JOIN
(SELECT COUNT(*) AS numoperators FROM operators WHERE operators.active=1) op
WHERE o.status in (1,3)
) o JOIN
(SELECT op.*, ROW_NUMBER() over (ORDER BY newid()) AS seqnum
FROM Operators op WHERE op.active=1
) op
ON o.randseqnum = op.seqnum ORDER BY o.orderID
Demo available at: http://sqlfiddle.com/#!3/ff47b/1
Using the script above, I can divide Orders into (almost) equal groups based on the number of Orders per Operator, but I need to modify it so that it assigns Operators to Orders based on the sum of Cash for the orders.
For example:
If I have 6 Orders with Cash values 300, 0, 50, 150, -50, 200, their sum is 650.
My script should assign 2 Orders to each of the 3 Operators, so that each Operator's Cash total is close to the average.
For example, I would like to assign 300 and -50 to the first operator, 200 and 0 to the second, and 150 and 50 to the third.
Hope this sounds clear :)
Here is example output that I expect to get:
ORDERID OPERATORID GROUPID DESCRIPTION STATUS CASH NEWOPERATORID
------------------------------------------------------------------------
1 1 1 small_order 1 300 2
2 1 1 another_order 1 0 1
9 5 1 xxxxxxxxxxx 1 50 4
10 (null) 2 xxxxxxxxxxx 1 150 4
11 (null) 3 xxxxxxxxxxx 1 -50 2
12 4 1 xxxxxxxxxxx 1 200 1
How can I (if it is possible at all) assign Operators to my Orders so that each Operator's sum of Cash is as close as possible to the average?
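If I'm understanding this right, could you get the result you want by ordering the Cash column by the biggest, then the smallest, then the next biggest, then the next smallest, and so on, and giving each consecutive pair to one operator? Lower in the query, number the orders by Cash and count them:
ROW_NUMBER() over (ORDER BY Cash) as CashOrder,
COUNT(*) over () as numOrders
Then the biggest and the smallest share pair 1, the next biggest and the next smallest share pair 2, and so on:
CASE WHEN CashOrder <= numOrders + 1 - CashOrder THEN CashOrder ELSE numOrders + 1 - CashOrder END as splitCash
With 6 orders and 3 operators, splitCash takes the values 1 to 3, so you can match it directly against the operators' seqnum (for the example above: 300 and -50 get 1, 200 and 0 get 2, 150 and 50 get 3). This only works when there are exactly two orders per operator.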
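The pairing idea (i-th smallest Cash with i-th largest) can be checked quickly outside the database. Below is a minimal Python sketch of that heuristic; the helper name pair_extremes is hypothetical, it assumes exactly two orders per operator, and it is a balancing heuristic rather than a guaranteed optimal split in general.

```python
def pair_extremes(cash):
    """Pair the i-th largest Cash value with the i-th smallest.

    Assumes an even number of orders (two per operator); pair i goes to operator i.
    """
    ordered = sorted(cash)
    n = len(ordered)
    if n % 2:
        raise ValueError("expects an even number of orders")
    return [(ordered[n - 1 - i], ordered[i]) for i in range(n // 2)]


# Cash values of the six open orders from the question.
pairs = pair_extremes([300, 0, 50, 150, -50, 200])

for operator, pair in enumerate(pairs, start=1):
    print(operator, pair, sum(pair))
```

On the sample data this reproduces the split the question asks for: (300, -50), (200, 0) and (150, 50), with totals of 250, 200 and 200 against an average of about 216.7.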