SQL: count last equal values - sql

I need to solve this problem in pure SQL:
I have to count all the records with a specific value.
My table has a column flag with values 0 or 1. I need to count all the 1s after the last 0 and sum the amount column values of those records.
Example:
Flag | Amount
0 | 5
1 | 8
0 | 10
1 | 20
1 | 30
Output:
2 | 50
If the last value is 0, I don't need to return anything.
I should add that the query needs to be fast (ideally reading the table only once).

I assumed that your example table is logically ordered by Amount. Then you can do this:
select
count(*) as cnt
,sum(Amount) as Amount
from yourTable
where Amount > (select max(Amount) from yourTable where Flag = 0)
If the biggest value is from a row where Flag = 0 then nothing will be returned.

If your table may not contain any zeros, then you are safer with:
select count(*) as cnt, sum(Amount) as Amount
from t
where Amount > all (select Amount from t where Flag = 0)
Or, using window functions:
select count(*) as cnt, sum(amount) as amount
from (select t.*, max(case when flag = 0 then amount end) over () as flag0_amount
from t
) t
where flag0_amount is null or amount > flag0_amount
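To check the window-function variant against the sample data, here is a minimal sketch that runs it in an in-memory SQLite database from Python. The table name t, the helper count_after_last_zero, and the choice of SQLite (3.25 or later, for window functions) are assumptions made for illustration; like the answers above, it assumes the rows are logically ordered by Amount.

```python
import sqlite3

def count_after_last_zero(rows):
    """rows: iterable of (flag, amount) pairs, assumed to be ordered by amount."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t (flag integer, amount integer)")
    con.executemany("insert into t values (?, ?)", rows)
    query = """
        select count(*) as cnt, sum(amount) as amount
        from (select t.*,
                     -- largest amount among the flag = 0 rows, repeated on every row
                     max(case when flag = 0 then amount end) over () as flag0_amount
              from t
             ) t
        where flag0_amount is null or amount > flag0_amount
    """
    result = con.execute(query).fetchone()
    con.close()
    return result

sample = [(0, 5), (1, 8), (0, 10), (1, 20), (1, 30)]
print(count_after_last_zero(sample))
```

When the last row has Flag = 0, no row passes the filter, so the aggregate returns a count of 0 and a NULL sum.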

I found a solution myself:
select decode(lv,0,0,tot-prog) somma ,decode(lv,0,0,cnt-myrow) count
from(
select * from
(
select pan,dt,flag,am,
last_value(flag) over() lv,
row_number() OVER (order by dt) AS myrow,
count(*) over() cnt,
case when lead(flag) OVER (ORDER BY dt) != flag then rownum end AS change,
sum(am) over() tot,
sum(am) over(order by dt) prog
from test
where pan=:pan and dt > :dt and flag is not null
order by dt
) t
where change is not null
order by change desc
) where rownum =1

Related

How would I extract only the latest week from a select over statement in Hiveql?

I need some help. I've created a query that keeps a running total of whether an element returns a 1 or 0 against a specific measure, with the running total resetting to 0 whenever the measure is 0. Example below:
year_week element measure running_total
2020_40 A 1 1
2020_41 A 1 2
2020_42 A 1 3
2020_43 A 0 0
2020_44 A 1 1
2020_45 A 1 2
2020_40 B 1 1
2020_41 B 1 2
2020_42 B 1 3
2020_43 B 1 4
2020_44 B 1 5
2020_45 B 1 6
The above is achieved using this query:
SELECT element,
year_week,
measure,
SUM(measure) OVER (PARTITION BY element, flag_sum ORDER BY year_week ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM (
SELECT *,
SUM(measure_flag) OVER (PARTITION BY element ORDER BY year_week ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS flag_sum
FROM (
SELECT *,
CASE WHEN measure = 1 THEN 0 ELSE 1 END AS measure_flag
FROM database.table ) x ) y
This is great and works - but I'd like to provide only the latest weeks data for each element. So in the above example it would be:
year_week element measure running_total
2020_45 A 1 2
2020_45 B 1 6
Essentially I need to keep the logic the same but limit the returned data set. I've attempted this however it changes the result from the correct running total to a 1 or 0.
Any help is greatly appreciated!
You can add another level of nesting, and filter the latest record per element with row_number().
I would suggest:
select element, year_week, measure, running_total
from (
select t.*,
sum(measure) over(partition by element, grp order by year_week) as running_total
from (
select t.*,
sum(1 - measure) over(partition by element order by year_week) as grp,
row_number() over(partition by element order by year_week desc) as rn
from mytable t
) t
) t
where rn = 1
I simplified the query a little, relying on the fact that measure has values 0 and 1 only, as shown in your sample data. If that's not the case, then:
select element, year_week, measure, running_total
from (
select t.*,
sum(measure) over(partition by element, grp order by year_week) as running_total
from (
select t.*,
sum(case when measure = 0 then 1 else 0 end) over(partition by element order by year_week) as grp,
row_number() over(partition by element order by year_week desc) as rn
from mytable t
) t
) t
where rn = 1
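As a quick check of the second query, the sketch below loads the sample rows into an in-memory SQLite database and runs it from Python. SQLite (3.25 or later) stands in for Hive here, which is an assumption for illustration only; the table name mytable follows the answer, and the final order by is added just to make the output order deterministic.

```python
import sqlite3

weeks = ["2020_40", "2020_41", "2020_42", "2020_43", "2020_44", "2020_45"]
rows = [(w, "A", 0 if w == "2020_43" else 1) for w in weeks]
rows += [(w, "B", 1) for w in weeks]

con = sqlite3.connect(":memory:")
con.execute("create table mytable (year_week text, element text, measure integer)")
con.executemany("insert into mytable values (?, ?, ?)", rows)

query = """
    select element, year_week, measure, running_total
    from (
        select t.*,
               -- running total restarts inside each (element, grp) island
               sum(measure) over (partition by element, grp order by year_week) as running_total
        from (
            select t.*,
                   -- grp increases by one at every zero, which starts a new island
                   sum(case when measure = 0 then 1 else 0 end)
                       over (partition by element order by year_week) as grp,
                   -- rn = 1 marks the latest week per element
                   row_number() over (partition by element order by year_week desc) as rn
            from mytable t
        ) t
    ) t
    where rn = 1
    order by element
"""
latest = con.execute(query).fetchall()
con.close()
print(latest)
```

The running total is computed over all rows before the rn = 1 filter is applied in the outer query, which is why limiting the output does not change the totals.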

COUNT() OVER possible using DISTINCT and WINDOWING IN HIVE

I want to calculate the number of distinct port numbers that exist between the current row and the X previous rows (sliding window), where x can be any integer number.
For instance,
If the input is:
ID PORT
1 21
2 22
3 23
4 25
5 25
6 21
The output should be:
ID PORT COUNT
1 21 1
2 22 2
3 23 3
4 25 4
5 25 4
6 21 4
I am using Hive, over RapidMiner and I have tried the following:
select id, port,
count (*) over (partition by srcport order by id rows between 5 preceding and current row)
This must work for big data and when X is big integer number.
Any feedback would be appreciated.
I don't think there is an easy way. One method uses lag():
select ( (case when port_5 is not null then 1 else 0 end) +
(case when port_4 is not null and port_4 not in (port_5) then 1 else 0 end) +
(case when port_3 is not null and port_3 not in (port_5, port_4) then 1 else 0 end) +
(case when port_2 is not null and port_2 not in (port_5, port_4, port_3) then 1 else 0 end) +
(case when port_1 is not null and port_1 not in (port_5, port_4, port_3, port_2) then 1 else 0 end) +
(case when port is not null and port not in (port_5, port_4, port_3, port_2, port_1) then 1 else 0 end)
) as cumulative_distinct_count
from (select t.*,
lag(port, 5) over (partition by srcport order by id) as port_5,
lag(port, 4) over (partition by srcport order by id) as port_4,
lag(port, 3) over (partition by srcport order by id) as port_3,
lag(port, 2) over (partition by srcport order by id) as port_2,
lag(port, 1) over (partition by srcport order by id) as port_1
from t
) t
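This is a complicated query, but the performance should be OK. One caveat: NOT IN never evaluates to true when its list contains a NULL, so rows with fewer than five predecessors in the partition are undercounted unless the NULL lags are handled explicitly.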
Note: port and srcport I assume are the same thing, but this borrows from your query.
One way to do it is with a self join as distinct isn't supported in window functions.
select t1.id,count(distinct t2.port) as cnt
from tbl t1
join tbl t2 on t1.id-t2.id>=0 and t1.id-t2.id<=5 --change this number per requirements
group by t1.id
order by t1.id
This assumes the ids are sequential with no gaps.
If they are not, assign row numbers first and apply the same logic to them, for example:
with rownums as (select id,port,row_number() over(order by id) as rnum
from tbl)
select r1.id,count(distinct r2.port)
from rownums r1
join rownums r2 on r1.rnum-r2.rnum>=0 and r1.rnum-r2.rnum<=5
group by r1.id
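The row-number variant can be tried against the sample input with the sketch below, which runs it in an in-memory SQLite database from Python. SQLite (3.25 or later) stands in for Hive, and the helper name sliding_distinct_ports and its window parameter are assumptions made for illustration.

```python
import sqlite3

def sliding_distinct_ports(rows, window):
    """Distinct ports among the current row and the `window` preceding rows."""
    con = sqlite3.connect(":memory:")
    con.execute("create table tbl (id integer, port integer)")
    con.executemany("insert into tbl values (?, ?)", rows)
    query = """
        with rownums as (
            select id, port, row_number() over (order by id) as rnum
            from tbl
        )
        select r1.id, count(distinct r2.port) as cnt
        from rownums r1
        join rownums r2
          on r1.rnum - r2.rnum between 0 and ?   -- r2 is r1 itself or one of the preceding rows
        group by r1.id
        order by r1.id
    """
    result = con.execute(query, (window,)).fetchall()
    con.close()
    return result

sample = [(1, 21), (2, 22), (3, 23), (4, 25), (5, 25), (6, 21)]
print(sliding_distinct_ports(sample, 5))
```

Each row is joined to up to window + 1 rows, so the intermediate result grows with the window size; that cost is worth measuring on real data before relying on this for a large X.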

SQL Server : group by sum of column

I need to aggregate data by one column which contains numeric data.
I have data like:
ID | Amount
---+-------
1 | 44
2 | 15
3 | 16
4 | 8
5 | 16
Result, which I expect is:
ID | Amount
---+-------
1 | 44
2 | 31
4 | 24
The query should walk the rows in ID order and group them so that the sum of Amount in each group is at most 32. A single row whose Amount is greater than 32 forms a group of its own. The result should contain MIN(ID) and SUM(Amount) for each group; the sum can exceed 32 only when the group holds a single record.
The only way that I know how to accomplish this is using iteration (although in your case if you have enough single values over 32, then you might be able to use a more efficient approach).
Iteration in SQL Server queries is handled by recursive CTEs (once you forswear cursors):
with v as (
select *
from (values (1, 44), (2, 15), (3, 16), (4, 8), (5, 16) ) v(id, amount)
),
t as (
select v.*, row_number() over (order by id) as seqnum
from v
),
cte as (
select seqnum, id, amount, id as grp
from t
where seqnum = 1
union all
select t.seqnum, t.id,
(case when t.amount + cte.amount > 32 then t.amount else t.amount + cte.amount end) as amount,
(case when t.amount + cte.amount > 32 then t.id else cte.grp end) as grp
from cte join
t
on t.seqnum = cte.seqnum + 1
)
select grp, max(amount)
from cte
group by grp;
I should note that the use of max(amount) in the outer query assumes that the values are never negative. A slight modification can handle that situation.
Also, the intermediate result using t is not strictly necessary for the data you have provided. It ensures that the columns used in the join actually have no gaps.
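To make the iteration explicit, here is a minimal Python sketch of the same greedy rule the recursive CTE applies: walk the rows in id order and start a new group whenever adding the next amount would push the running sum above the cap. The function name group_by_capped_sum and the cap parameter are hypothetical names used only for this illustration.

```python
def group_by_capped_sum(rows, cap=32):
    """rows: (id, amount) pairs. Returns (first id in group, group total) pairs."""
    groups = []
    for row_id, amount in sorted(rows):
        if groups and groups[-1][1] + amount <= cap:
            # the row still fits: extend the running sum of the current group
            groups[-1][1] += amount
        else:
            # first row, or the cap would be exceeded: start a new group at this id
            groups.append([row_id, amount])
    return [tuple(group) for group in groups]

print(group_by_capped_sum([(1, 44), (2, 15), (3, 16), (4, 8), (5, 16)]))
```

Each row's group depends on the running sum left by the previous row, which is why a plain GROUP BY or a single window function cannot express this rule.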
You can try this version, which assigns row numbers first and joins each row to the previous one in a recursive CTE. Whenever the running sum would exceed 32, a new group starts.
with rownums as (select t.*,row_number() over(order by id) as rnum from t)
,cte(rnum,id,amount,runningsum,grp) as (select rnum,id,amount,amount,id from rownums where rnum=1
union all
select t.rnum,t.id,t.amount
,case when c.runningsum+t.amount > 32 then t.amount else c.runningsum+t.amount end
,case when c.runningsum+t.amount > 32 then t.id else c.grp end
from cte c
join rownums t on t.rnum=c.rnum+1
)
select grp as id,max(runningsum) as amount
from cte
group by grp

Count consecutive duplicate values in SQL

I have a table like so
ID OrdID Value
1 1 0
2 2 0
3 1 1
4 2 1
5 1 1
6 2 0
7 1 0
8 2 0
9 2 1
10 1 0
11 2 0
I want to get the count of consecutive value where the value is 0. Using the example above the result will be 3 (Rows 6, 7 and 8). I am using sql server 2008 r2.
I am going to presume that id is unique and increasing. You can get counts of consecutive values by using the difference of row numbers. The following counts all sequences:
select grp, value, min(id), max(id), count(*) as cnt
from (select t.*,
(row_number() over (order by id) - row_number() over (partition by value order by id)
) as grp
from table t
) t
group by grp, value;
If you want the longest sequence of 0s:
select top 1 grp, value, min(id), max(id), count(*) as cnt
from (select t.*,
(row_number() over (order by id) - row_number() over (partition by value order by id)
) as grp
from table t
) t
group by grp, value
having value = 0
order by count(*) desc
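The difference-of-row-numbers trick can be verified on the sample table with the sketch below, which runs it in an in-memory SQLite database from Python. This is an illustration under assumptions: SQLite (3.25 or later) replaces SQL Server, so limit 1 is used instead of top 1, and the table is named mytable because table is a reserved word.

```python
import sqlite3

rows = [(1, 1, 0), (2, 2, 0), (3, 1, 1), (4, 2, 1), (5, 1, 1), (6, 2, 0),
        (7, 1, 0), (8, 2, 0), (9, 2, 1), (10, 1, 0), (11, 2, 0)]

con = sqlite3.connect(":memory:")
con.execute("create table mytable (id integer, ordid integer, value integer)")
con.executemany("insert into mytable values (?, ?, ?)", rows)

query = """
    select min(id) as first_id, max(id) as last_id, count(*) as cnt
    from (select t.*,
                 -- the difference stays constant within a run of equal values
                 row_number() over (order by id)
                 - row_number() over (partition by value order by id) as grp
          from mytable t
         ) t
    where value = 0
    group by grp
    order by cnt desc
    limit 1
"""
longest_zero_run = con.execute(query).fetchone()
con.close()
print(longest_zero_run)
```

Within a run of equal values both row numbers advance by one per row, so their difference is constant; a row with the other value advances only the overall row number, which shifts the difference for the next run.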
A query using not exists to find consecutive 0s:
select top 1 min(t2.id), max(t2.id), count(*)
from mytable t
join mytable t2 on t2.id <= t.id
where not exists (
select 1 from mytable t3
where t3.id between t2.id and t.id
and t3.value <> 0
)
group by t.id
order by count(*) desc
http://sqlfiddle.com/#!3/52989/3

Second maximum and minimum values

Given a table with multiple rows of an int field and the same identifier, is it possible to return the 2nd maximum and 2nd minimum value from the table?
A table consists of
ID | number
------------------------
1 | 10
1 | 11
1 | 13
1 | 14
1 | 15
1 | 16
Final Result would be
ID | nMin | nMax
--------------------------------
1 | 11 | 15
You can use row_number to assign a ranking per ID. Then you can group by id and pick the rows with the ranking you're after. The following example picks the second lowest and third highest:
select id
, max(case when rnAsc = 2 then number end) as SecondLowest
, max(case when rnDesc = 3 then number end) as ThirdHighest
from (
select ID
, number
, row_number() over (partition by ID order by number) as rnAsc
, row_number() over (partition by ID order by number desc) as rnDesc
from YourTable
) as SubQueryAlias
group by
id
The max is just to pick out the one non-null value; you can replace it with min or even avg and it would not affect the outcome.
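For the exact result asked for in the question (second lowest and second highest), the same pattern can be sketched as follows and run in an in-memory SQLite database from Python. The table name numbers, the output column names, and the use of SQLite (3.25 or later) are assumptions made for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table numbers (id integer, number integer)")
con.executemany("insert into numbers values (?, ?)",
                [(1, n) for n in (10, 11, 13, 14, 15, 16)])

query = """
    select id,
           max(case when rn_asc = 2 then number end) as second_lowest,
           max(case when rn_desc = 2 then number end) as second_highest
    from (select id, number,
                 row_number() over (partition by id order by number) as rn_asc,
                 row_number() over (partition by id order by number desc) as rn_desc
          from numbers
         ) ranked
    group by id
"""
second_values = con.execute(query).fetchall()
con.close()
print(second_values)
```

With duplicate values, row_number counts each tied row separately; dense_rank could be swapped in if the second distinct value is wanted instead.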
This will work, but see caveats:
SELECT Id, number
INTO #T
FROM (
SELECT 1 ID, 10 number
UNION
SELECT 1 ID, 10 number
UNION
SELECT 1 ID, 11 number
UNION
SELECT 1 ID, 13 number
UNION
SELECT 1 ID, 14 number
UNION
SELECT 1 ID, 15 number
UNION
SELECT 1 ID, 16 number
) U;
WITH EX AS (
SELECT Id, MIN(number) MinNumber, MAX(number) MaxNumber
FROM #T
GROUP BY Id
)
SELECT #T.Id, MIN(number) nMin, MAX(number) nMax
FROM #T INNER JOIN
EX ON #T.Id = EX.Id
WHERE #T.number <> MinNumber AND #T.number <> MaxNumber
GROUP BY #T.Id
DROP TABLE #T;
If you have two MAX values that are the same value, this will not pick them up. So depending on how your data is presented you could be losing the proper result.
You could select the next minimum value by using the following method:
SELECT MAX(Number)
FROM
(
SELECT top 2 (Number)
FROM table1 t1
WHERE ID = {MyNumber}
order by Number
)a
It only works if you can restrict the inner query with a where clause.
This would be a better way. I quickly put this together, but if you can combine the two queries, you will get exactly what you were looking for.
-- second lowest number
select *
from
(
select
myID,
myNumber,
row_number() over (order by myNumber) as myRowNumber
from MyTable
) x
where x.myRowNumber = 2;
-- second highest number
select *
from
(
select
myID,
myNumber,
row_number() over (order by myNumber desc) as myRowNumber
from MyTable
) y
where y.myRowNumber = 2;
Let the table name be tblName:
select max(number) from tblName where number not in (select max(number) from tblName);
The same works for the second minimum: replace max with min in both places.
As I myself learned just today the solution is to use LIMIT. You order the results so that the highest values are on top and limit the result to 2. Then you select that subselect and order it the other way round and only take the first one.
SELECT somefield FROM (
SELECT somefield from table
ORDER BY somefield DESC LIMIT 2) AS top2
ORDER BY somefield ASC LIMIT 1