Hive, ordering lines using a variable lag


I have the following hive table:
product | price
A | 100
B | 102
C | 220
D | 240
E | 242
F | 410
For every line I would like to divide the lower price by the current price. If the result is greater than or equal to 0.9, I would like to increment a row number. If the result is lower than 0.9, the row number should be 1 for this line and the current price becomes the new lower price; then iterate.
Result should look like:
product | price | row_number
A | 100 | 1
B | 102 | 2
C | 220 | 1
D | 240 | 2
E | 242 | 3
F | 410 | 1
Because:
lower price = 100: product A gets 1 as row_number
100/102 >= 0.9: product B gets 2 as row_number
100/220 < 0.9: product C gets 1 as row_number, lower price = 220
220/240 >= 0.9: product D gets 2 as row_number
220/242 >= 0.9: product E gets 3 as row_number
220/410 < 0.9: product F gets 1 as row_number, lower price = 410
I was thinking about creating a temporary_row_number just ordered by price:
product | price | temp_row_number
A | 100 | 1
B | 102 | 2
C | 220 | 3
D | 240 | 4
E | 242 | 5
F | 410 | 6
And then:
select
    product,
    price,
    case
        when lag(price, temp_row_number - 1, 0) over (order by price) / price >= 0.9
            then lag(price, temp_row_number - 1, 0) over (order by price)
        else price
    end as test
from my_table
This will retrieve:
product | price | test
A | 100 | 100
B | 102 | 100
C | 220 | 220
D | 240 | 240
E | 242 | 242
F | 410 | 410
But ideally I would like to retrieve
product | price | test
A | 100 | 100
B | 102 | 100
C | 220 | 220
D | 240 | 220
E | 242 | 220
F | 410 | 410
Then I could compute the row_number column using the row_number() function, ordered by product and price, and get the expected result.

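-- Hard-coded price buckets: this only works when the group boundaries are known in advance.
WITH CTE AS
(
    SELECT product, price,
           (CASE WHEN price BETWEEN 100 AND 200 THEN 1
                 WHEN price BETWEEN 200 AND 300 THEN 2
                 WHEN price BETWEEN 300 AND 400 THEN 3
            END) AS RN
    FROM #test
)
-- Order by Price inside each bucket so the numbering is deterministic.
SELECT Product, Price,
       ROW_NUMBER() OVER (PARTITION BY RN ORDER BY Price) AS row_num
FROM CTE
ORDER BY Product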
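
The fixed price ranges in the query above only fit this sample. As a reference for what the question actually describes, here is a minimal procedural sketch in Python of the iteration (keep a "lower price", compare it with each price in ascending order, restart the counter when the ratio drops below 0.9). The function name `number_rows` and the `ratio` parameter are hypothetical names chosen for illustration; this is not a Hive solution, only a way to check any candidate query against the expected result.

```python
def number_rows(prices, ratio=0.9):
    """Return (price, row_number, lower_price) tuples, prices taken in ascending order."""
    result = []
    lower_price = None  # the price that opened the current group
    row_number = 0
    for price in sorted(prices):
        if lower_price is not None and lower_price / price >= ratio:
            # Still close enough to the group's lower price: keep counting.
            row_number += 1
        else:
            # Ratio fell below the threshold (or first row): start a new group.
            lower_price = price
            row_number = 1
        result.append((price, row_number, lower_price))
    return result


rows = number_rows([100, 102, 220, 240, 242, 410])
for price, rn, lower in rows:
    print(price, rn, lower)
```

The third value in each tuple corresponds to the `test` column the question is trying to build, so partitioning by it and numbering by price would give the desired row_number.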

Related

Partition By - Sum all values Excluding Maximum Value

I have data as follows
+----+------+--------+
| ID | Code | Weight |
+----+------+--------+
| 1 | M | 200 |
| 1 | 2A | 50 |
| 1 | 2B | 50 |
| 2 | | 350 |
| 2 | M | 350 |
| 2 | 3A | 120 |
| 2 | 3B | 120 |
| 3 | 5A | 100 |
| 4 | | 200 |
| 4 | | 100 |
+----+------+--------+
For ID 1 the max weight is 200. I want to subtract from it the sum of all the other weights for ID 1, that is, everything except the max value of 200.
There might be a case where 2 rows contain the max value for the same ID. For example, ID 2 has 2 rows containing the max value, 350. In that scenario I still want to sum all values except the max value, but I would set the weight to 0 for one of the 2 rows containing the max value: the one where Code is NULL/blank.
Where there is only 1 row for an ID, the row is kept as is.
Another scenario is one where there is only one row containing the max weight but its Code is NULL/blank. In that case we simply do what we did for ID 1: sum all values except the max value and subtract that sum from the row containing the max value.
Desired Output
+----+------+--------+---------------+
| ID | Code | Weight | Actual Weight |
+----+------+--------+---------------+
| 1 | M | 200 | 100 |
| 1 | 2A | 50 | 50 |
| 1 | 2B | 50 | 50 |
| 2 | | 350 | 0 |
| 2 | M | 350 | 110 |
| 2 | 3A | 120 | 120 |
| 2 | 3B | 120 | 120 |
| 3 | 5A | 100 | 100 |
| 4 | | 200 | 100 |
| 4 | | 100 | 100 |
+----+------+--------+---------------+
I want to create column Actual Weight as shown above. I can't find a way to apply partition by excluding max value and create column Actual Weight.

Use dense_rank() to identify the rows with the max weight (dr = 1 marks them).
Use row_number() to distinguish the max-weight row whose Code is blank from the others.
with cte as
(
    select *,
           dr = dense_rank() over (partition by ID order by [Weight] desc),
           rn = row_number() over (partition by ID order by [Weight] desc, Code desc)
    from   tbl
)
select *,
       ActWeight = case when dr = 1 and rn <> 1
                        then 0
                        when dr = 1 and rn = 1
                        then [Weight]
                           - sum(case when dr <> 1 then [Weight] else 0 end) over (partition by ID)
                        else [Weight]
                   end
from   cte

Hmmm . . . I think you just want window functions and conditional logic:
select t.*,
       (case when 1 = row_number() over (partition by id order by weight desc, (case when code <> '' then 1 else 2 end))
             then weight - sum(case when weight <> max_weight then weight else 0 end) over (partition by id)
             when weight = max_weight
             then 0
             else weight
        end) as actual_weight
from (select t.*,
             max(weight) over (partition by id) as max_weight
      from t
     ) t
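
To make the rules in the question easier to check against any query, here is a small Python sketch of the same logic. It assumes rows are `(id, code, weight)` tuples and that a blank Code is an empty string or `None`; the function name `actual_weights` is hypothetical.

```python
from collections import defaultdict


def actual_weights(rows):
    """rows: list of (id, code, weight) tuples. Returns rows with actual weight appended."""
    groups = defaultdict(list)
    for idx, (id_, _code, _weight) in enumerate(rows):
        groups[id_].append(idx)

    actual = [None] * len(rows)
    for idxs in groups.values():
        max_w = max(rows[i][2] for i in idxs)
        max_rows = [i for i in idxs if rows[i][2] == max_w]
        others = sum(rows[i][2] for i in idxs if rows[i][2] != max_w)
        # The max row that keeps a value: prefer one with a non-blank code.
        keeper = next((i for i in max_rows if rows[i][1]), max_rows[0])
        for i in idxs:
            if i == keeper:
                actual[i] = rows[i][2] - others
            elif rows[i][2] == max_w:
                actual[i] = 0  # duplicate max row (blank code) is zeroed
            else:
                actual[i] = rows[i][2]
    return [row + (a,) for row, a in zip(rows, actual)]


data = [
    (1, "M", 200), (1, "2A", 50), (1, "2B", 50),
    (2, "", 350), (2, "M", 350), (2, "3A", 120), (2, "3B", 120),
    (3, "5A", 100),
    (4, "", 200), (4, "", 100),
]
result = actual_weights(data)
for row in result:
    print(row)
```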

SQL server 2008: join 3 tables and select last entered record from child table against each parent record

I have the following 3 tables and want the last entered reasoncode from the Reasons table for each claimno in the Claims table.
Reasons:
Rid  |chargeid| enterydate  | user   | reasoncode
-----|--------|-------------|--------|----------
1 | 210 | 04/03/2018 | john | 99
2 | 212 | 05/03/2018 | juliet | 24
5 | 212 | 26/12/2018 | umar | 55
3 | 212 | 07/03/2018 | borat | 30
4 | 211 | 03/03/2018 | Juliet | 20
6 | 213 | 03/03/2018 | borat | 50
7 | 213 | 24/12/2018 | umer | 60
8 | 214 | 01/01/2019 | john | 70
Charges:
chargeid |claim# | amount
---------|-------|---------
210 | 1 | 10
211 | 1 | 24.2
212 | 2 | 5.45
213 | 2 | 76.30
214 | 1 | 2.10
Claims:
claimno | Code | Code
--------|-------|------
1 | AH22 | AH22
2 | BB32 | BB32
Expected result would be like this:
claimno | enterydate | user | reasoncode
--------|-------------|--------|-----------
1 | 01/01/2019 | john | 70
2 | 26/12/2018 | umar | 55
I have tried many solutions with no luck. The following is my latest attempt in SQL Server 2008, but it still gives an incorrect result.
With x As
(
select r.chargeid,r.enterydate,ch.claimno from charges ch
join (select chargeid,max(enterydate) enterydate,user from Reasons group by chargeid) r on r.chargeid = ch.chargeid
)
select x.*,r1.user, r1.reasoncode from x
left outer join Reasons r1 on r1.chargeid = x.chargeid and r1.enterydate = x.enterydate
--group by x.claimno

Is this what you want?
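-- [user] is bracketed because USER is a built-in function name in SQL Server.
-- Assumes enterydate is stored as a date/datetime, not as text.
select claimno, enterydate, [user], reasoncode
from (select c.claimno, r.*,
             row_number() over (partition by c.claimno order by r.enterydate desc) as seqnum
      from charges c join
           reasons r
           on c.chargeid = r.chargeid
     ) cr
where seqnum = 1;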

You can try using row_number()
select * from
(
    select r1.chargeid, r1.enterydate, ch.claimno, r1.[user], r1.reasoncode,
           row_number() over (partition by ch.claimno order by r1.enterydate desc) as rn
    from charges ch left outer join Reasons r1 on r1.chargeid = ch.chargeid
) A where rn = 1
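
As a cross-check of what "last entered reason per claim" should return for the sample data, here is a short Python sketch. It assumes the dates are in dd/mm/yyyy format (26/12/2018 suggests so) and that charges map one-to-one to a claim number; the names `latest_reason_per_claim` and `charge_to_claim` are hypothetical.

```python
from datetime import datetime

# (Rid, chargeid, enterydate, user, reasoncode), copied from the Reasons table
reasons = [
    (1, 210, "04/03/2018", "john", 99),
    (2, 212, "05/03/2018", "juliet", 24),
    (5, 212, "26/12/2018", "umar", 55),
    (3, 212, "07/03/2018", "borat", 30),
    (4, 211, "03/03/2018", "Juliet", 20),
    (6, 213, "03/03/2018", "borat", 50),
    (7, 213, "24/12/2018", "umer", 60),
    (8, 214, "01/01/2019", "john", 70),
]
# chargeid -> claim number, copied from the Charges table
charge_to_claim = {210: 1, 211: 1, 212: 2, 213: 2, 214: 1}


def latest_reason_per_claim(reasons, charge_to_claim):
    """Return {claimno: (enterydate, user, reasoncode)} for the newest reason of each claim."""
    latest = {}
    for _rid, chargeid, entered, user, code in reasons:
        claimno = charge_to_claim.get(chargeid)
        if claimno is None:
            continue  # reason for a charge that belongs to no known claim
        when = datetime.strptime(entered, "%d/%m/%Y")  # assumed dd/mm/yyyy
        if claimno not in latest or when > latest[claimno][0]:
            latest[claimno] = (when, entered, user, code)
    return {claim: values[1:] for claim, values in latest.items()}


latest = latest_reason_per_claim(reasons, charge_to_claim)
for claim in sorted(latest):
    print(claim, *latest[claim])
```

Parsing the dates matters here: if enterydate were compared as text, "26/12/2018" would sort after "01/01/2019".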

PostgreSQL, Cumulative amount with interval

Hello there, I have this example dataset:
employee_id | amount | cumulative_amount
-------------+------------+-----------------
2 | 100 | 100
6 | 220 | 320
7 | 45 | 365
8 | 50 | 415
9 | 110 | 525
16 | 300 | 825
17 | 250 | 1075
18 | 200 | 1275
And an interval, let's say 300.
I'd like to pick only the rows that match the interval, with this condition:
pick a row if its cumulative amount is >= the previously picked value + interval
(e.g. if the start value is 100, the next matching row is the first one where the cumulative amount is >= 400, and so on):
employee_id | amount | cumulative_amount
-------------+------------+-----------------
2 | 100 | 100 <-- $Start
6 | 220 | 320 - 400
7 | 45 | 365 - 400
8 | 50 | 415 <-- 1
9 | 110 | 525 - 715 (prev value (415)+300)
16 | 300 | 825 <-- 2
17 | 250 | 1075 - 1125 (825+300)
18 | 200 | 1275 <-- 3
so final result would be :
employee_id | amount | cumulative_amount
-------------+------------+-----------------
2 | 100 | 100
8 | 50 | 415
16 | 300 | 825
18 | 200 | 1275
How can I achieve this in PostgreSQL in the most efficient way?
The cumulative_amount column is the running sum of the amount column.
It is calculated in another query, whose result is the dataset above; the table is ordered by employee_id.
Regards.

I'm not saying it is the most efficient way, but it is probably the easiest:
s=# create table s1(a int, b int, c int);
CREATE TABLE
Time: 10.262 ms
s=# copy s1 from stdin delimiter '|';
...
s=# with g as (select generate_series(100,1300,300) s)
, o as (select *,sum(b) over (order by a) from s1)
, c as (select *, min(sum) over (partition by g.s)
from o
join g on sum >= g.s and sum < g.s + 300
)
select a,b,sum from c
where sum = min
;
a | b | sum
----+-----+------
2 | 100 | 100
8 | 50 | 415
16 | 300 | 825
17 | 250 | 1075
(4 rows)
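Here I used order by a, as you said your cumulative sum follows the first column (which reconciles with the third column). Note that this query uses fixed 300-wide buckets starting at 100, so its last row is 17 | 1075 rather than the 18 | 1275 shown in the question.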
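
For comparison, the rule as stated in the question (pick a row once its cumulative amount reaches the last picked value plus the interval) depends on the previously picked row, so it is naturally a sequential scan. A minimal Python sketch of that scan follows; the function name `pick_by_interval` is hypothetical, and the rows are assumed to arrive already ordered by employee_id with cumulative_amount precomputed.

```python
def pick_by_interval(rows, interval):
    """rows: (employee_id, amount, cumulative_amount) in order. Returns the picked rows."""
    picked = []
    threshold = None  # cumulative amount the next picked row must reach
    for employee_id, amount, cumulative in rows:
        if threshold is None or cumulative >= threshold:
            picked.append((employee_id, amount, cumulative))
            # The next threshold is based on the row just picked, not on a fixed grid.
            threshold = cumulative + interval
    return picked


dataset = [
    (2, 100, 100), (6, 220, 320), (7, 45, 365), (8, 50, 415),
    (9, 110, 525), (16, 300, 825), (17, 250, 1075), (18, 200, 1275),
]
picked = pick_by_interval(dataset, 300)
for row in picked:
    print(row)
```

Because each threshold moves with the picked row, this yields employee 18 (1275) as the last row, which is the difference from a fixed-bucket approach.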

SQL Server 2008 - accumulating column

I would like to accumulate my data as shown below. The source table is Table 1.
What is the best query to do this?
Is it possible to do this dynamically, so that it still works when I add more types of terms?
Table 1
ID | term | value
-----------------------
1 | I | 100
2 | I | 200
3 | II | 100
4 | II | 50
5 | II | 75
6 | III | 50
7 | III | 65
8 | IV | 30
9 | IV | 45
And the result should be like below:
YTD | Acc Value
------------------
I-I | 300
I-II | 525
I-III| 640
I-IV | 715
Thanks

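-- Note: term is compared as a string, which happens to order I, II, III, IV correctly.
select
    (select min(term) from yourtable) + '-' + term as YTD,
    (select sum(value) from yourtable t1 where t1.term <= t.term) as [Acc Value]
from yourtable t
group by term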
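
The same accumulation can be sketched procedurally, which also shows why it stays dynamic when new terms are added: nothing is hard-coded per term. This Python sketch assumes the rows arrive in term order (as in Table 1, ordered by ID); the function name `accumulate_by_term` is hypothetical.

```python
def accumulate_by_term(rows):
    """rows: (id, term, value) in term order. Returns [(label, running_total), ...]."""
    totals = {}
    for _id, term, value in rows:
        # dicts keep insertion order, so terms stay in first-seen order
        totals[term] = totals.get(term, 0) + value
    if not totals:
        return []
    first_term = next(iter(totals))
    running = 0
    result = []
    for term, total in totals.items():
        running += total
        result.append((f"{first_term}-{term}", running))
    return result


table1 = [
    (1, "I", 100), (2, "I", 200),
    (3, "II", 100), (4, "II", 50), (5, "II", 75),
    (6, "III", 50), (7, "III", 65),
    (8, "IV", 30), (9, "IV", 45),
]
acc = accumulate_by_term(table1)
for label, total in acc:
    print(label, total)
```

Relying on row order instead of comparing term strings avoids the trap where a later numeral such as "V" would sort after "IX" as text.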

Reduce rows in SQL

I have a select query that will return something like the following table:
start | stop | id
------------------
0 | 100 | 1
1 | 101 | 1
2 | 102 | 1
2 | 102 | 2
5 | 105 | 1
7 | 107 | 2
...
300 | 400 | 1
370 | 470 | 1
450 | 550 | 1
Where stop = start + n; n = 100 in this case.
I would like to merge the overlaps for each id:
start | stop | id
------------------
0 | 105 | 1
2 | 107 | 2
...
300 | 550 | 1
id 1 does not give 0 - 550 because the start 300 is after stop 105.
There will be hundreds of thousands of records returned by the first query and n can go up to tens of thousands, so the faster it can be processed the better.
Using PostgreSQL btw.

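WITH bounds AS
(
    SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY start) AS rn
    FROM (
        -- each row paired with the previous row's stop for the same id
        SELECT id, LAG(stop) OVER (PARTITION BY id ORDER BY start) AS pstop, start
        FROM q
        UNION ALL
        -- sentinel row that closes the last range of each id
        SELECT id, MAX(stop), NULL
        FROM q
        GROUP BY id
    ) q2
    -- keep only rows that open a new range, plus the sentinel
    WHERE start > pstop OR pstop IS NULL OR start IS NULL
)
SELECT b2.start, b1.pstop AS stop, b2.id
FROM bounds b1
JOIN bounds b2
  ON b1.id = b2.id
 AND b1.rn = b2.rn + 1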
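
To verify the merged ranges independently of the query, here is a short Python sketch of the classic sort-and-sweep interval merge, applied per id. It assumes rows are `(start, stop, id)` tuples and treats a range that starts exactly where the previous one stops as overlapping, matching the `start > pstop` test above; the function name `merge_overlaps` is hypothetical.

```python
def merge_overlaps(rows):
    """rows: iterable of (start, stop, id). Returns merged (start, stop, id) tuples."""
    merged = []
    # Sort by id, then start, so each id's ranges are swept left to right.
    for start, stop, id_ in sorted(rows, key=lambda r: (r[2], r[0])):
        last = merged[-1] if merged else None
        if last is not None and last[2] == id_ and start <= last[1]:
            # Overlaps (or touches) the open range for this id: extend it.
            last[1] = max(last[1], stop)
        else:
            # Gap or new id: open a new range.
            merged.append([start, stop, id_])
    return [tuple(m) for m in merged]


sample = [
    (0, 100, 1), (1, 101, 1), (2, 102, 1), (2, 102, 2), (5, 105, 1),
    (7, 107, 2), (300, 400, 1), (370, 470, 1), (450, 550, 1),
]
merged = merge_overlaps(sample)
for row in merged:
    print(row)
```

The sort dominates the cost, so the sweep is O(n log n) in the number of rows regardless of how large n (the stop - start width) gets.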

We Keep Coding
