Complex query involving average of a column over the month - sql

I have a table like this one, named tv_v2.tv_momentum:
tv_date instrument_name factor
2019-07-22 cbc267f7-6ace-4357-a803-7aaf96a2cc48 50.1228599355797
2019-07-23 cbc267f7-6ace-4357-a803-7aaf96a2cc5a 50.0851750766468
2019-07-24 cbc267f7-6ace-4357-a803-7aaf96a2cc48 50.0474332287848
2019-07-25 cbc267f7-6ace-4357-a803-7aaf96a2cc31 50.0096342626235
2019-07-26 cbc267f7-6ace-4357-a803-7aaf96a2cc48 50.312332423432343
2019-07-27 cbc267f7-6ace-4357-a803-7aaf96a2cc48 23.424234234234
2019-07-28 cbc267f7-6ace-4357-a803-7aaf96a77777 15.33333332332323
2019-07-29 cbc267f7-6ace-4357-a803-7aaf96a2cc48 66.3333333333333
2019-07-30 cbc267f7-6ace-4357-a803-7aaf96a2cc4f 77.322332323223
2019-07-31 cbc267f7-6ace-4357-a803-7aaf96a2cc4s 50
I would like to get the average factor per instrument per month, plus the factor on just the last day of the month. Can you help me design the query? The desired output looks like this:
YEAR MONTH END_OF_MONTH_DAY INSTRUMENT_NAME AVERAGE_FACTOR_OVER_THE_MONTH END_OF_THE_MONTH_FACTOR
2019 7 31-7 cbc267f7-6ace-4357-a803-7aaf96a2cc48 50.11 50
2019 8 31-8 cbc267f7-6ace-4357-a803-7aaf96a2cc48 33 56

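You can use conditional aggregation, numbering each instrument's rows within a month from the latest date backwards so that seqnum = 1 marks the last one:
select year(tv_date) as yr, month(tv_date) as mon, max(tv_date) as last_day,
instrument_name,
avg(factor) as avg_factor,
max(case when seqnum = 1 then factor end) as last_factor
from (select t.*,
row_number() over (partition by instrument_name, year(tv_date), month(tv_date) order by tv_date desc) as seqnum
from tv_v2.tv_momentum t
) t
group by year(tv_date), month(tv_date), instrument_name;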
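To check the idea end to end, here is a minimal sketch that runs the same pattern against an in-memory SQLite database from Python. It rests on several assumptions: SQLite has no year() or month(), so strftime() stands in for them; the table is called tv_momentum without the schema prefix; and the instrument names "A" and "B" and all factor values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tv_momentum (tv_date text, instrument_name text, factor real)"
)
# Hypothetical rows: two instruments in July, one of them again in August.
conn.executemany(
    "insert into tv_momentum values (?, ?, ?)",
    [
        ("2019-07-29", "A", 10.0),
        ("2019-07-30", "A", 20.0),
        ("2019-07-31", "A", 30.0),
        ("2019-07-30", "B", 5.0),
        ("2019-07-31", "B", 7.0),
        ("2019-08-30", "A", 40.0),
    ],
)

query = """
select strftime('%Y', tv_date) as yr,
       strftime('%m', tv_date) as mon,
       max(tv_date) as last_day,
       instrument_name,
       avg(factor) as avg_factor,
       -- seqnum = 1 is the latest row of this instrument in this month
       max(case when seqnum = 1 then factor end) as last_factor
from (select t.*,
             row_number() over (partition by instrument_name,
                                             strftime('%Y-%m', tv_date)
                                order by tv_date desc) as seqnum
      from tv_momentum t
     ) t
group by yr, mon, instrument_name
order by yr, mon, instrument_name
"""
monthly = conn.execute(query).fetchall()
for row in monthly:
    print(row)
```

Note that "last day" here means the last date that actually has a row for the instrument in that month, which is not necessarily the calendar month end.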

Related

IF Else or Case Function for SQL select problem

Hi, I would like to write a select expression using CASE or IF/ELSE. It seems simple from a logic perspective, but I can't get it to work. I am joining two tables: the first is the customer record table, with a date filter called min_del_date, and the second is the model scoring table, with BIN and update_date columns.
There are two pieces of logic I want to implement:
1. Pick the model score from the month before min_del_date.
2. If the model score from the month before delivery is greater than 50 (BIN > 50), pick the model score for the same month as min_del_date instead.
My code for the 1st logic is below:
with cust as (
select
distinct cust_no, max(del_date) as del_date, min(del_date) as min_del_date, (EXTRACT(YEAR FROM min(del_date)) -1900)*12 + EXTRACT(MONTH FROM min(del_date)) AS upd_seq
from customer.cust_history
group by 1
)
,model as (
select party_id, update_date, upd_seq, bin, var_data8, var_data2
from
(
select
party_id, update_date, bin, var_data8, var_data2,
(EXTRACT(YEAR FROM UPDATE_DATE) -1900)*12 + EXTRACT(MONTH FROM UPDATE_DATE) AS upd_seq,
dense_Rank() over (partition by (EXTRACT(YEAR FROM UPDATE_DATE) -1900)*12 + EXTRACT(MONTH FROM UPDATE_DATE) order by update_date desc) as rank1
from
(
select party_id,update_date, bin, var_data8, var_data2
from model.rpm_model
group by party_id,update_date, bin, var_data8, var_data2
) model
)model_final
where rank1 = 1
)
-- Add model scores
-- 1st logic Picking the model score that was the month before delivery date
select *
from
(
select cust.cust_no, cust.del_date, cust.min_del_date, model.upd_seq, model.bin
from cust
left join model
on cust.cust_no = model.party_id
and cust.upd_seq = model.upd_seq + 1
)a
I am now struggling to add the 2nd logic to the same query. Any assistance would be appreciated.
cust table
cust_no min_del_date upd_seq
123 2021-01-11 1453
234 2020-06-29 1446
456 2020-07-20 1447
model table
party_id update_date upd_seq BIN
123 2020-11-30 1451 22
123 2020-12-25 1452 54
123 2020-01-11 1453 14
234 2020-05-23 1445 76
234 2020-06-18 1446 48
234 2020-07-23 1447 12
456 2020-06-18 1446 23
456 2020-07-23 1447 39
456 2020-08-21 1448 21
desired results
cust_no min_del_date model.upd_seq update_date BIN
123 2021-01-11 1453 2020-01-11 14
234 2020-06-29 1446 2020-06-18 48
456 2020-07-20 1446 2020-06-18 23
Update
I managed to find the solution myself; thanks to everyone who looked at this question. The solution is below:
select a.cust_no, a.del_date, a.min_del_date, b.update_date, b.upd_seq, b.bin
from
(
select cust.cust_no, cust.del_date, cust.min_del_date,
CASE WHEN model.BIN <=50 THEN model.upd_seq WHEN BIN > 50 THEN model.upd_seq +1 ELSE NULL END as upd_seq
from cust
inner join model
on cust.cust_no = model.party_id
and cust.upd_seq = model.upd_seq + 1
)a
inner join model b
on a.cust_no = b.party_id
and a.upd_seq = b.upd_seq
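As a sanity check, the two-step join can be run against the sample data from the question. This is only a sketch under assumptions: it uses an in-memory SQLite database from Python, the cust and model CTEs are replaced by plain tables holding just the sample rows, and del_date is left out because the sample data does not include it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table cust (cust_no int, min_del_date text, upd_seq int)")
conn.execute(
    "create table model (party_id int, update_date text, upd_seq int, bin int)"
)
conn.executemany(
    "insert into cust values (?, ?, ?)",
    [(123, "2021-01-11", 1453), (234, "2020-06-29", 1446), (456, "2020-07-20", 1447)],
)
conn.executemany(
    "insert into model values (?, ?, ?, ?)",
    [
        (123, "2020-11-30", 1451, 22), (123, "2020-12-25", 1452, 54),
        (123, "2020-01-11", 1453, 14), (234, "2020-05-23", 1445, 76),
        (234, "2020-06-18", 1446, 48), (234, "2020-07-23", 1447, 12),
        (456, "2020-06-18", 1446, 23), (456, "2020-07-23", 1447, 39),
        (456, "2020-08-21", 1448, 21),
    ],
)

query = """
select a.cust_no, a.min_del_date, b.upd_seq, b.update_date, b.bin
from (select cust.cust_no, cust.min_del_date,
             -- step 1: look at the previous month's score and decide
             -- which month's score to report
             case when model.bin <= 50 then model.upd_seq
                  when model.bin > 50 then model.upd_seq + 1
             end as upd_seq
      from cust
      inner join model
        on cust.cust_no = model.party_id
       and cust.upd_seq = model.upd_seq + 1
     ) a
-- step 2: fetch the score for the month chosen in step 1
inner join model b
  on a.cust_no = b.party_id
 and a.upd_seq = b.upd_seq
order by a.cust_no
"""
picked = conn.execute(query).fetchall()
for row in picked:
    print(row)
```

With the sample rows this reproduces the desired results table. Because both joins are inner joins, a customer with no model row for the previous month drops out of the result entirely.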

Computing rolling average and standard deviation by dates

I have the table below and need to compute the rolling average and standard deviation based on the dates. The current table and the expected results are listed below. I am trying to compute the rolling average for an id based on date; rollAvgA is computed from metricA. For the first occurrence of an id, the result should be zero, as there are no preceding values. How can this be accomplished?
Current Table :
Date id metricA
8/1/2019 100 2
8/2/2019 100 3
8/3/2019 100 2
8/1/2019 101 2
8/2/2019 101 3
8/3/2019 101 2
8/4/2019 101 2
Expected Table :
Date id metricA rollAvgA
8/1/2019 100 2 0
8/2/2019 100 3 2.5
8/3/2019 100 2 2.3
8/1/2019 101 2 0
8/2/2019 101 3 2.5
8/3/2019 101 2 2.3
8/4/2019 101 2 2.25
You seem to want a cumulative average. This is basically:
select t.*,
avg(metricA * 1.0) over (partition by id order by date) as rollingavg
from t;
The only caveat is that the first value is an average of one value. To handle this, use a case expression:
select t.*,
(case when row_number() over (partition by id order by date) > 1
then avg(metricA * 1.0) over (partition by id order by date)
else 0
end) as rollingavg
from t;
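The case-expression version can be tried on the sample data. The sketch below is an assumption-laden check rather than the asker's setup: it uses an in-memory SQLite database from Python and stores the dates as ISO strings so they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (date text, id int, metricA int)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        ("2019-08-01", 100, 2), ("2019-08-02", 100, 3), ("2019-08-03", 100, 2),
        ("2019-08-01", 101, 2), ("2019-08-02", 101, 3), ("2019-08-03", 101, 2),
        ("2019-08-04", 101, 2),
    ],
)

query = """
select t.*,
       (case when row_number() over (partition by id order by date) > 1
             then avg(metricA * 1.0) over (partition by id order by date)
             else 0
        end) as rollingavg
from t
order by id, date
"""
rows = conn.execute(query).fetchall()
# Round only for display; the query itself returns full precision.
rolling = [round(r[3], 2) for r in rows]
print(rolling)
```

The rounded values line up with the expected table (2.33 appears there as 2.3).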

oracle Filter duplicate column values

Below is a sample extract of my data. I want to exclude the duplicate row (the last one in this example). How can I easily select the data without that extra record?
ID YEAR CNT VOLUME INT_VOLUME RATE INT_RATE GM GM_RCNT
545 2016 12 5508 5508 1604 1604 0.71 NULL
545 2017 5 1138 2731 824 1977 0.28 -50.42
545 2018 NULL NULL -45 2351 NULL NULL NULL
626 2016 12 679862 679862 252693 252693 0.63 NULL
626 2017 12 705365 705365 282498 282498 0.6 3.75
626 2018 12 707472 707472 291762 291762 0.59 0.3
626 2018 NULL NULL 711372 NULL 295186 NULL NULL --Filter such rows in select
You can choose one row for each id and year using row_number():
select t.*
from (select t.*,
row_number() over (partition by id, year order by id) as seqnum
from t
) t
where seqnum = 1;
This chooses an arbitrary row to keep. You can adjust the order by to refine which row you want to keep. You can order by rowid, but there is no guarantee that it is the "earliest" row. You need a date or sequence column for that purpose.
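As an illustration of adjusting the order by, the sketch below prefers rows whose CNT is not null, so that the mostly empty duplicate is the one dropped. That preference is an assumption about which row should win, not something stated in the question. The example runs on an in-memory SQLite database from Python and keeps only the first five columns of the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (id int, year int, cnt int, volume int, int_volume int)"
)
conn.executemany(
    "insert into t values (?, ?, ?, ?, ?)",
    [
        (545, 2016, 12, 5508, 5508),
        (545, 2017, 5, 1138, 2731),
        (545, 2018, None, None, -45),
        (626, 2016, 12, 679862, 679862),
        (626, 2017, 12, 705365, 705365),
        (626, 2018, 12, 707472, 707472),
        (626, 2018, None, None, 711372),  # the duplicate to filter out
    ],
)

query = """
select id, year, cnt, volume, int_volume
from (select t.*,
             row_number() over (
                 partition by id, year
                 -- assumed tie-break: rows with a non-null cnt come first
                 order by case when cnt is null then 1 else 0 end
             ) as seqnum
      from t
     ) t
where seqnum = 1
order by id, year
"""
kept = conn.execute(query).fetchall()
for row in kept:
    print(row)
```

The (545, 2018) row survives even though its CNT is null, because it is the only row for that id and year.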

How to get last quarterly and last half yearly average of balance for each month in hive?

I have a table with columns cust_id, year_, month_, monthly_txn, monthly_bal. I need to calculate avg(monthly_txn) and variance(monthly_bal) over the previous three months and the previous six months, for each month. My query returns the average and variance over the last three (or six) months only for the most recent month, not for each month. I am not good with analytic functions in Hive.
SELECT cust_id, avg(monthly_txn) y, variance(monthly_bal) x FROM (
SELECT cust_id, monthly_txn, monthly_bal,
row_number() over (partition by cust_id order by year_ desc, month_ desc) r
from mytable) b WHERE r <= 3 GROUP BY cust_id
But I want something like below.
input:
cust_id year_ month_ monthly_txn monthly_bal
1 2018 1 456 8979289
1 2018 2 675 4567
1 2018 3 645 4890
1 2017 1 342 44522
1 2017 2 378 9898900
1 2017 2 456 234492358
1 2017 4 3535 789
1 2017 5 456 345
1 2017 6 598 334
Expected output:
For example, the quarterly and half-yearly averages of monthly_txn would look like this; the same applies to the variance:
cust_id year_ month_ monthly_txn monthly_bal q_avg_txn h_avg_txn
1 2018 1 456 8979289 avg(456,598,4561) avg(456,598,4561,3535,4536,378)
1 2018 2 675 4567 avg(675,456,598) avg(675,456,3535,4561,598,4536)
1 2018 3 645 4890 avg(645,675,645) avg(645,675,645,3535,4561,598)
1 2017 1 342 44522 avg(342) avg(342)
1 2017 2 378 9898900 avg(378,342) avg(378,342)
1 2017 3 4536 234492358 avg(4536,372,342) avg(4536,378,342)
1 2017 4 3535 789 avg(3535,4536,378) avg(3535,4536,378,342)
1 2017 5 4561 345 avg(4561,3535,4536) avg(4561,3535,4536,342,378)
1 2017 6 598 334 avg(598,4561,3535) avg(598,4561,3535,4536,342,378)
Use analytic functions with a ROWS ... PRECEDING window frame to get the quarterly and half-yearly values, and then use a subquery to get the results.
See also: What is ROWS UNBOUNDED PRECEDING used for in Teradata?
If you have data for every month of interest (i.e., no gaps), then this should work:
select t.*,
avg(monthly_txn) over (partition by cust_id
order by year_, month_
rows between 2 preceding and current row
) as avg_3,
avg(monthly_txn) over (partition by cust_id
order by year_, month_
rows between 5 preceding and current row
) as avg_6,
variance(monthly_bal) over (partition by cust_id
order by year_, month_
rows between 2 preceding and current row
) as variance_3,
variance(monthly_bal) over (partition by cust_id
order by year_, month_
rows between 5 preceding and current row
) as variance_6
from mytable t;
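To see how the row-based frames behave at the start of a partition, here is a small sketch on an in-memory SQLite database from Python. The assumptions: only the averages of monthly_txn are shown, because SQLite has no built-in variance() aggregate, and the four rows are the 2017 months 1 to 4 as they appear in the expected-output table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mytable (cust_id int, year_ int, month_ int, "
    "monthly_txn int, monthly_bal int)"
)
conn.executemany(
    "insert into mytable values (?, ?, ?, ?, ?)",
    [
        (1, 2017, 1, 342, 44522),
        (1, 2017, 2, 378, 9898900),
        (1, 2017, 3, 4536, 234492358),
        (1, 2017, 4, 3535, 789),
    ],
)

query = """
select year_, month_,
       -- current month plus the two before it
       avg(monthly_txn) over (partition by cust_id
                              order by year_, month_
                              rows between 2 preceding and current row) as avg_3,
       -- current month plus the five before it
       avg(monthly_txn) over (partition by cust_id
                              order by year_, month_
                              rows between 5 preceding and current row) as avg_6
from mytable
order by year_, month_
"""
rows = conn.execute(query).fetchall()
avg_3 = [round(r[2], 2) for r in rows]
avg_6 = [round(r[3], 2) for r in rows]
print(avg_3)
print(avg_6)
```

When fewer than three (or six) rows precede the current one, the frame simply shrinks to the rows that exist, which matches the avg(342), avg(378,342) pattern in the expected output.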

Get MAX count but keep the repeated calculated value if highest

I have the following table; I am using SQL Server 2008:
BayNo FixDateTime FixType
1 04/05/2015 16:15:00 tyre change
1 12/05/2015 00:15:00 oil change
1 12/05/2015 08:15:00 engine tuning
1 04/05/2016 08:11:00 car tuning
2 13/05/2015 19:30:00 puncture
2 14/05/2015 08:00:00 light repair
2 15/05/2015 10:30:00 super op
2 20/05/2015 12:30:00 wiper change
2 12/05/2016 09:30:00 denting
2 12/05/2016 10:30:00 wiper repair
2 12/06/2016 10:30:00 exhaust repair
4 12/05/2016 05:30:00 stereo unlock
4 17/05/2016 15:05:00 door handle repair
For any given day, I need to find the highest number of fixes made in each bay; if that highest number occurs on more than one day, each of those days should appear in the result set.
I would like to see the result set as follows:
BayNo FixDateTime noOfFixes
1 12/05/2015 00:15:00 2
2 12/05/2016 09:30:00 2
4 12/05/2016 05:30:00 1
4 17/05/2016 15:05:00 1
I managed to get the counts for each, but I am struggling to get the max while keeping repeated occurrences of the highest value. Can someone help, please?
Use window functions.
Get the count for each day by bayno and also find the min fixdatetime for each day per bayno.
Then use dense_rank to compute the highest ranked row for each bayno based on the number of fixes.
Finally get the highest ranked rows.
select distinct bayno,minfixdatetime,no_of_fixes
from (
select bayno,minfixdatetime,no_of_fixes
,dense_rank() over(partition by bayno order by no_of_fixes desc) rnk
from (
select t.*,
count(*) over(partition by bayno,cast(fixdatetime as date)) no_of_fixes,
min(fixdatetime) over(partition by bayno,cast(fixdatetime as date)) minfixdatetime
from tablename t
) x
) y
where rnk = 1
You are looking for rank() or dense_rank(). I would write the query like this:
select bayno, thedate, numFixes
from (select bayno, cast(fixdatetime as date) as thedate,
count(*) as numFixes,
rank() over (partition by bayno order by count(*) desc) as seqnum
from t
group by bayno, cast(fixdatetime as date)
) b
where seqnum = 1;
Note that this returns the date in question. The date does not have a time component.
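The per-bay ranking can be checked against the sample rows. This is a sketch under assumptions, not SQL Server code: it runs on an in-memory SQLite database from Python, stores the timestamps as ISO strings, uses SQLite's date() in place of cast(... as date), and splits the aggregation and the ranking into two nested queries for clarity. The FixType column is omitted because it plays no part in the count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (bayno int, fixdatetime text)")
conn.executemany(
    "insert into t values (?, ?)",
    [
        (1, "2015-05-04 16:15:00"), (1, "2015-05-12 00:15:00"),
        (1, "2015-05-12 08:15:00"), (1, "2016-05-04 08:11:00"),
        (2, "2015-05-13 19:30:00"), (2, "2015-05-14 08:00:00"),
        (2, "2015-05-15 10:30:00"), (2, "2015-05-20 12:30:00"),
        (2, "2016-05-12 09:30:00"), (2, "2016-05-12 10:30:00"),
        (2, "2016-06-12 10:30:00"), (4, "2016-05-12 05:30:00"),
        (4, "2016-05-17 15:05:00"),
    ],
)

query = """
select bayno, thedate, numfixes
from (select bayno, thedate, numfixes,
             -- rank() gives every day tied for the top count rank 1
             rank() over (partition by bayno order by numfixes desc) as seqnum
      from (select bayno, date(fixdatetime) as thedate, count(*) as numfixes
            from t
            group by bayno, date(fixdatetime)
           ) d
     ) b
where seqnum = 1
order by bayno, thedate
"""
top_days = conn.execute(query).fetchall()
for row in top_days:
    print(row)
```

Bay 4 returns two rows because both of its days tie at one fix, which is the "repeated value" case from the question.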