How to choose the first value then ignore the rest? - SQL

I have a table with data as follows:
id  activity  amount
1   unknown   20
2   storage   20
3   storage   20
4   swift     20
5   delivery  50
6   storage   20
I want to create a query which gives me the "calculated" sum. For the example above, the desired result is:
id  activity  amount  calculatedsum
1   unknown   20      0
2   storage   20      20
3   storage   20      20
4   swift     20      20
5   delivery  50      70   (had 20 and 50 arrived)
6   storage   20      70
The logic is simple: find the first row whose activity is 'storage'; its amount is the calculatedsum. Whenever a 'delivery' row is encountered, add its amount, and that becomes the new calculatedsum.
This is what I tried:
select *,
       sum(case when activity = 'Storage' then amount
                when activity = 'delivery' then amount
                else 0
           end) over (order by id)
from A
However, this doesn't work. How can I get the expected result?
Edit: id is a column which was created by: select row_number() over (order by .... nulls last) as id
The table contains the result of that query, and every time the query runs the table is rebuilt by it, so the id is always the actual row number.

If you are only counting the first 'storage', then you need to identify it. You can do that using row_number():
select *,
       sum(case when activity = 'Storage' and seqnum = 1 then amount
                when activity = 'delivery' then amount
                else 0
           end) over (order by id) as calculatedsum
from (select a.*,
             row_number() over (partition by activity order by id) as seqnum
      from A a
     ) a
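For the sample data, this is how the pieces behave (seqnum comes from the inner query, "contributes" is the amount picked up by the case expression, and the running sum is the calculatedsum); note that the comparison above uses 'Storage' while the data shows lowercase 'storage', which matters on case-sensitive systems. The walk-through itself is mine:
id  activity  amount  seqnum  contributes  calculatedsum
1   unknown   20      1       0            0
2   storage   20      1       20           20
3   storage   20      2       0            20
4   swift     20      1       0            20
5   delivery  50      1       50           70
6   storage   20      3       0            70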
A totally weird way of doing this without a subquery, assuming that the amounts for storage are all the same:
select *,
       (sum(case when activity = 'delivery' then amount
                 else 0
            end) over (order by id) +
        -- 0 until the first 'storage' row appears, then the (constant) storage amount
        max(case when activity = 'storage' then amount
                 else 0
            end) over (order by id)
       ) as calculatedsum
from A a;

Related

Remove duplicates from query

I have the below table:
item    area  qty
item 1  a     10
item 1  b     17
item 2  b     20
item 3  a     10
item 2  c     8
I am looking to have a result in SQL as below (a unique item and a unique area):
item    area a  area b  area c
item 1  10      17      0
item 2  0       20      8
item 3  10      0       0
I do have the query below, but it is not giving me what I am looking for when the areas change or more areas are added; it was also written for a two-column table, not three:
select item,
       max(case when seqnum = 1 then area end) as area_1,
       max(case when seqnum = 2 then area end) as area_2,
       max(case when seqnum = 3 then area end) as area_3
from (select A.*,
             row_number() over (partition by item order by area) as seqnum
      from A
     ) A
group by item;
Looking forward to your kind help.
If you have a fixed list of areas, then there is no need for window functions; you can explicitly filter on each individual value inside max(). Another fix to your query is to take the max of qty rather than of area (whose value is already fixed by the filter).
select item,
       coalesce(max(case when area = 'a' then qty end), 0) as area_a,
       coalesce(max(case when area = 'b' then qty end), 0) as area_b,
       coalesce(max(case when area = 'c' then qty end), 0) as area_c
from mytable
group by item;
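For example, against the sample rows above (inlined here with union all, since the question names neither the table nor the database, so the setup syntax may need small tweaks), the query returns exactly the matrix you described; the inline data and the result comment are mine:
with mytable as (
    select 'item 1' as item, 'a' as area, 10 as qty union all
    select 'item 1', 'b', 17 union all
    select 'item 2', 'b', 20 union all
    select 'item 3', 'a', 10 union all
    select 'item 2', 'c', 8
)
select item,
       coalesce(max(case when area = 'a' then qty end), 0) as area_a,
       coalesce(max(case when area = 'b' then qty end), 0) as area_b,
       coalesce(max(case when area = 'c' then qty end), 0) as area_c
from mytable
group by item;
-- item 1: 10, 17, 0
-- item 2:  0, 20, 8
-- item 3: 10,  0, 0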

how to convert a table to another in SQL (similar to pivot, but not exactly)

I have a database table that looks like the below, in SQL Server 2016:
ProjectKey  Type  Percentage
----------  ----  ----------
40          8     100%
50          6     40%
50          9     60%
60          3     30%
60          8     30%
60          9     40%
(the maximum number of rows for the same ProjectKey is 3)
I want to write a query to be able to convert the above table to the following:
ProjectKey  Type1  Percentage1  Type2  Percentage2  Type3  Percentage3
----------  -----  -----------  -----  -----------  -----  -----------
40          8      100%         null   null         null   null
50          6      40%          9      60%          null   null
60          3      30%          8      30%          9      40%
If this can be achieved with a SQL query, that would be great. Can anyone help? Thank you very much!
You can use row_number() and conditional aggregation:
select projectkey,
       max(case when seqnum = 1 then type end) as type_1,
       max(case when seqnum = 1 then percentage end) as percentage_1,
       max(case when seqnum = 2 then type end) as type_2,
       max(case when seqnum = 2 then percentage end) as percentage_2,
       max(case when seqnum = 3 then type end) as type_3,
       max(case when seqnum = 3 then percentage end) as percentage_3
from (select t.*,
             row_number() over (partition by projectkey order by type) as seqnum
      from t
     ) t
group by projectkey;
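Since this is SQL Server 2016, you can smoke-test the query against the sample rows with a table value constructor before pointing it at the real table; the inline data, the stand-in name t, and storing the percentages as text are assumptions made only for this sketch:
with t as (
    select *
    from (values (40, 8, '100%'),
                 (50, 6, '40%'),
                 (50, 9, '60%'),
                 (60, 3, '30%'),
                 (60, 8, '30%'),
                 (60, 9, '40%')
         ) v(projectkey, type, percentage)
)
select projectkey,
       max(case when seqnum = 1 then type end) as type_1,
       max(case when seqnum = 1 then percentage end) as percentage_1,
       max(case when seqnum = 2 then type end) as type_2,
       max(case when seqnum = 2 then percentage end) as percentage_2,
       max(case when seqnum = 3 then type end) as type_3,
       max(case when seqnum = 3 then percentage end) as percentage_3
from (select t.*,
             row_number() over (partition by projectkey order by type) as seqnum
      from t
     ) t
group by projectkey;
-- 40: 8, 100%, null, null, null, null
-- 50: 6, 40%,  9,    60%,  null, null
-- 60: 3, 30%,  8,    30%,  9,    40%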

how to set auto increment column value with condition

I have a table like this:
value  nextValue
1      2
2      3
3      20
20     21
21     22
22     23
23     NULL
value is ordered ascending, and nextValue is the next row's value. The requirement is to start a new group whenever nextValue - value > 10, and to count how many values are in each group. For example, there should be two groups, (1,2,3) and (20,21,22,23); the first group's count is 3 and the second group's count is 4.
I'm trying to mark each group with a unique number, so I can group by these marks:
value  nextValue  mark
1      2          1
2      3          1
3      20         1
20     21         2
21     22         2
22     23         2
23     NULL       2
But I don't know how to produce the mark column; I need a counter that increments whenever nextValue - value > 10. Can I make this happen in Hive? Or is there a better solution for the requirement?
If I understand correctly, you can use a cumulative sum. The idea is to set a flag when next_value - value > 10. This identifies the groups. So, this query adds a group number:
select t.*,
       sum(case when nextvalue > value + 10 then 1 else 0 end) over (order by value desc) as mark
from t
order by value;
You might not find this solution satisfying, because the numbering is in descending order. So, a bit more arithmetic fixes that:
select t.*,
       (sum(case when nextvalue > value + 10 then 1 else 0 end) over () + 1 -
        sum(case when nextvalue > value + 10 then 1 else 0 end) over (order by value desc)
       ) as mark
from t
order by value;
Here is a db<>fiddle.
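To see why the arithmetic works on the sample data: only the row with value = 3 is flagged (its nextValue 20 is more than 10 above it), so the windowed total is 1; the descending running sum is 1 for values 1-3 and 0 for 20-23, and total + 1 minus that running sum gives marks 1 and 2 in ascending order. A sketch that exposes the intermediate columns (the names flag, total_flags and desc_running are mine):
select t.*,
       case when nextvalue > value + 10 then 1 else 0 end as flag,
       sum(case when nextvalue > value + 10 then 1 else 0 end) over () as total_flags,
       sum(case when nextvalue > value + 10 then 1 else 0 end)
           over (order by value desc) as desc_running,
       (sum(case when nextvalue > value + 10 then 1 else 0 end) over () + 1 -
        sum(case when nextvalue > value + 10 then 1 else 0 end) over (order by value desc)
       ) as mark
from t
order by value;
-- values 1, 2, 3  : desc_running = 1, mark = 1 + 1 - 1 = 1
-- values 20 .. 23 : desc_running = 0, mark = 1 + 1 - 0 = 2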
Calculate the previous value, then set new_group_flag when value - prev_value > 10, then take the cumulative sum of new_group_flag to get the group number (mark). Finally, you can calculate the group count using an analytic function or a group-by (in my example an analytic count is used so you can see the full dataset with all intermediate calculations). See the comments in the code.
Demo:
with your_data as ( --use your table instead of this
    select stack(10, --the number of tuples generated
                 1,
                 2,
                 3,
                 20,
                 21,
                 22,
                 23,
                 40,
                 41,
                 42
           ) as value
)
select --4. calculate the group count, etc.
       value, prev_value, new_group_flag, group_number,
       count(*) over (partition by group_number) as group_count
from
(
    select --3. calculate the cumulative sum of new_group_flag to get the group number
           value, prev_value, new_group_flag,
           sum(new_group_flag) over (order by value rows between unbounded preceding and current row) + 1 as group_number
    from
    (
        select --2. calculate new_group_flag
               value, prev_value,
               case when value - prev_value > 10 then 1 else 0 end as new_group_flag
        from
        (
            select --1. calculate the previous value
                   value, lag(value) over (order by value) as prev_value
            from your_data
        ) s
    ) s
) s
Result:
value  prev_value  new_group_flag  group_number  group_count
1      \N          0               1             3
2      1           0               1             3
3      2           0               1             3
20     3           1               2             4
21     20          0               2             4
22     21          0               2             4
23     22          0               2             4
40     23          1               3             3
41     40          0               3             3
42     41          0               3             3
This works for me. It needs "rows between unbounded preceding and current row" in my case:
select t.*,
       sum(case when nextvalue > value + 10 then 1 else 0 end)
           over (order by value desc rows between unbounded preceding and current row) as mark
from t
order by value;

COUNT() OVER possible using DISTINCT and WINDOWING IN HIVE

I want to calculate the number of distinct port numbers that exist between the current row and the X previous rows (a sliding window), where X can be any integer.
For instance, if the input is:
ID  PORT
1   21
2   22
3   23
4   25
5   25
6   21
The output should be:
ID  PORT  COUNT
1   21    1
2   22    2
3   23    3
4   25    4
5   25    4
6   21    4
I am using Hive (over RapidMiner) and I have tried the following:
select id, port,
       count(*) over (partition by srcport order by id rows between 5 preceding and current row)
This must work for big data and when X is a large integer.
Any feedback would be appreciated.
I don't think there is an easy way. One method uses lag():
select t.id, t.port,
       ( (case when port_5 <> -1 then 1 else 0 end) +
         (case when port_4 <> -1 and port_4 not in (port_5) then 1 else 0 end) +
         (case when port_3 <> -1 and port_3 not in (port_5, port_4) then 1 else 0 end) +
         (case when port_2 <> -1 and port_2 not in (port_5, port_4, port_3) then 1 else 0 end) +
         (case when port_1 <> -1 and port_1 not in (port_5, port_4, port_3, port_2) then 1 else 0 end) +
         (case when port is not null and port not in (port_5, port_4, port_3, port_2, port_1) then 1 else 0 end)
       ) as cumulative_distinct_count
from (select t.*,
             -- -1 stands in for "no earlier row", so the comparisons never see NULL
             lag(port, 5, -1) over (partition by srcport order by id) as port_5,
             lag(port, 4, -1) over (partition by srcport order by id) as port_4,
             lag(port, 3, -1) over (partition by srcport order by id) as port_3,
             lag(port, 2, -1) over (partition by srcport order by id) as port_2,
             lag(port, 1, -1) over (partition by srcport order by id) as port_1
      from t
     ) t
This is a complicated query, but the performance should be ok.
Note: I assume port and srcport are the same thing; the partition by srcport above simply borrows from your query.
One way to do it is with a self join, since distinct isn't supported in window functions.
select t1.id, count(distinct t2.port) as cnt
from tbl t1
join tbl t2
  on t1.id - t2.id >= 0
 and t1.id - t2.id <= 5 --change this number per requirements
group by t1.id
order by t1.id
This assumes the ids are in sequential order. If not, first get the row numbers and use the same logic as above. It would look like this:
with rownums as (
    select id, port, row_number() over (order by id) as rnum
    from tbl
)
select r1.id, count(distinct r2.port)
from rownums r1
join rownums r2
  on r1.rnum - r2.rnum >= 0
 and r1.rnum - r2.rnum <= 5
group by r1.id
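If you also need the port column in the output (as in the expected result above), you can carry it through the join; here is a sketch based on the second query, where only the select list and the group by change:
with rownums as (
    select id, port, row_number() over (order by id) as rnum
    from tbl
)
select r1.id, r1.port, count(distinct r2.port) as cnt
from rownums r1
join rownums r2
  on r1.rnum - r2.rnum >= 0
 and r1.rnum - r2.rnum <= 5 --change this number per requirements
group by r1.id, r1.port
order by r1.id;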

SQL Level the data conditionally

I have the following puzzle to solve (an urgent business assignment, to be exact).
SQL SERVER 2008
I have a table of this form:
ID  Market  SubMarket  Value
1   1       1          3
2   1       2          6
3   1       3          2
4   2       23         1
5   2       24         9
I have specific MarketIDs, and every MarketID has specific SubMarketIDs (maximum 5; I know how many for each).
E.g. MarketID 1 has SubMarketIDs 1, 2, 3; MarketID 2 has SubMarketIDs 23, 24; etc.
Each SubMarketID has a variable value.
I must transform my data into a fixed table of this type:
MarketID  SubMarketAValue  SubMarketBValue  SubMarketCValue  ...  SubMarketEValue
1         3                6                2                     null
2         1                9                null                  null
SubMarketAValue must contain the value of the smallest SubMarketID, SubMarketBValue the value of the next bigger SubMarketID, and so on.
You did not specify the RDBMS, but you can use the following in SQL Server 2005+, Oracle and PostgreSQL:
select market,
       max(case when rn = 1 then value end) as SubMarketAValue,
       max(case when rn = 2 then value end) as SubMarketBValue,
       max(case when rn = 3 then value end) as SubMarketCValue,
       max(case when rn = 4 then value end) as SubMarketDValue,
       max(case when rn = 5 then value end) as SubMarketEValue
from (select id, market, submarket, value,
             row_number() over (partition by market order by market, submarket) as rn
      from yourtable
     ) x
group by market
see SQL Fiddle with Demo