oracle Filter duplicate column values - sql

Below is a sample data extract. I want to exclude the duplicate row (the last one in this example), as marked below. How can I easily fetch the data without that extra record in a select query?
ID YEAR CNT VOLUME INT_VOLUME RATE INT_RATE GM GM_RCNT
545 2016 12 5508 5508 1604 1604 0.71 NULL
545 2017 5 1138 2731 824 1977 0.28 -50.42
545 2018 NULL NULL -45 2351 NULL NULL NULL
626 2016 12 679862 679862 252693 252693 0.63 NULL
626 2017 12 705365 705365 282498 282498 0.6 3.75
626 2018 12 707472 707472 291762 291762 0.59 0.3
626 2018 NULL NULL 711372 NULL 295186 NULL NULL --Filter such rows in select

You can keep one row for each id and year using row_number():
select t.*
from (select t.*,
             row_number() over (partition by id, year order by id) as seqnum
      from t
     ) t
where seqnum = 1;
This chooses an arbitrary row to keep. You can adjust the order by to refine which row you want to keep. You can order by rowid, but there is no guarantee that it is the "earliest" row. You need a date or sequence column for that purpose.
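To illustrate adjusting the order by, here is a minimal sketch that prefers the row whose CNT is not NULL. It runs the query through Python's sqlite3 module rather than Oracle, and the table name t, the reduced column list, and the "non-NULL CNT wins" rule are assumptions made for the example only.

```python
import sqlite3

# In-memory stand-in for the asker's table; only a few columns are kept.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, year INTEGER, cnt INTEGER, volume INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (626, 2017, 12, 705365),
        (626, 2018, 12, 707472),
        (626, 2018, None, None),  # the duplicate row to filter out
    ],
)

# (cnt IS NULL) evaluates to 0 for real values and 1 for NULL,
# so rows with a real CNT get seqnum = 1 within each (id, year).
query = """
SELECT id, year, cnt, volume
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY id, year
                                ORDER BY (cnt IS NULL)) AS seqnum
      FROM t) AS ranked
WHERE seqnum = 1
ORDER BY id, year
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(626, 2017, 12, 705365), (626, 2018, 12, 707472)]
```

In Oracle the same ordering can be written as order by case when cnt is null then 1 else 0 end.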

Related

Retain all the records grouped by an ID comparing only the values with similar strings within the group that has the minimum value

Given this data:
Bolt_Table:
PID          UNIQ ID  GROUP_ID  Distance
PID_24_2225  14       13        1141
PID_5_1444E  3214     13        652
PID_5_14454  3152     13        802
PID_24_2225  15       14        1141
PID_5_14454  3151     14        802
PID_5_1444E  3213     14        652
PID_26_21FC  536      2300      597
PID_5_13388  4121     2300      620
PID_5_13382  4169     2300      802
This is the desired result:
PID          UNIQ_ID  GROUP_ID  Distance
PID_5_1444E  3214     13        652
PID_5_1444E  3213     14        652
PID_5_13388  4121     2300      620
Explanation:
First record (GROUP_ID = 13): find the PIDs that are similar to each other, here PID_5_1444E and PID_5_14454, and compare their distances (652 and 802). Since 652 is the smaller one, the row with PID_5_1444E is retained; this is record 1 of the desired table.
What would the SQL query be (Microsoft Access)?
I tried using LIKE, MID(String,1,4), GROUP BY and HAVING, but nothing seems to work. How should I write the query for this?
The closest I got was with a hard-coded GROUP_ID; I would like to do it for each GROUP_ID:
SELECT TOP 1 PERCENT PID, UNIQ_ID, GROUP_ID, Distance
FROM
(
SELECT
a.PID, a.UNIQ_ID, a.GROUP_ID, ID, a.Distance,
(select count(PID) as counter from Bolt_Table where GROUP_ID = a.GROUP_ID and LEFT(PID, 9) = LEFT(a.PID, 9)) as counter from Bolt_Table a WHERE a.GROUP_ID = 13
)
where counter > 1
order by Distance
SELECT b.pid, b.[uniq id], b.group_id, b.distance
FROM bolt_table as b
INNER JOIN (SELECT group_id, min(distance) as mindist
            FROM bolt_table
            GROUP BY group_id) as a
on b.group_id = a.group_id AND b.distance = a.mindist
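A plain per-group minimum picks 597 (PID_26_21FC) for group 2300 in the sample data, whereas the desired result keeps 620. The sketch below therefore restricts the minimum to PIDs that share a prefix within the group. It assumes that "similar" means "same first 9 characters", as in the LEFT(PID, 9) attempt above, and it runs on SQLite through Python's sqlite3 (so substr replaces LEFT, and the column is named uniq_id); it has not been tested on Access.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bolt_table (pid TEXT, uniq_id INTEGER, group_id INTEGER, distance INTEGER)"
)
conn.executemany(
    "INSERT INTO bolt_table VALUES (?, ?, ?, ?)",
    [
        ("PID_24_2225", 14, 13, 1141),
        ("PID_5_1444E", 3214, 13, 652),
        ("PID_5_14454", 3152, 13, 802),
        ("PID_24_2225", 15, 14, 1141),
        ("PID_5_14454", 3151, 14, 802),
        ("PID_5_1444E", 3213, 14, 652),
        ("PID_26_21FC", 536, 2300, 597),
        ("PID_5_13388", 4121, 2300, 620),
        ("PID_5_13382", 4169, 2300, 802),
    ],
)

# keyed:  every row plus its 9-character prefix (assumed similarity key).
# shared: per group, only prefixes used by more than one row, with their minimum distance.
query = """
WITH keyed AS (
    SELECT pid, uniq_id, group_id, distance, substr(pid, 1, 9) AS prefix
    FROM bolt_table
),
shared AS (
    SELECT group_id, prefix, MIN(distance) AS mindist
    FROM keyed
    GROUP BY group_id, prefix
    HAVING COUNT(*) > 1
)
SELECT k.pid, k.uniq_id, k.group_id, k.distance
FROM keyed AS k
JOIN shared AS s
  ON k.group_id = s.group_id
 AND k.prefix = s.prefix
 AND k.distance = s.mindist
ORDER BY k.group_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On the sample data this returns the three rows of the desired result. If two similar PIDs tie on the minimum distance, both are returned.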

IF Else or Case Function for SQL select problem

Hi, I would like to write a select expression using CASE or IF/ELSE. It seems simple from a logic perspective, but I can't get it to work. I am joining two tables: the first is the customer record, with a date filter called min_del_date, and the second is the model scoring table, with BIN and update_date columns.
There are two pieces of logic I want to apply:
1. Pick the model score from the month before min_del_date.
2. If the model score from the month before delivery is greater than 50 (BIN > 50), pick the model score for the same month as min_del_date instead.
My code for the 1st logic is below:
with cust as (
select
distinct cust_no, max(del_date) as del_date, min(del_date) as min_del_date, (EXTRACT(YEAR FROM min(del_date)) -1900)*12 + EXTRACT(MONTH FROM min(del_date)) AS upd_seq
from customer.cust_history
group by 1
)
,model as (
select party_id, model_id, update_date, upd_seq, bin, var_data8, var_data2
from
(
select
party_id, update_date, bin, var_data8, var_data2,
(EXTRACT(YEAR FROM UPDATE_DATE) -1900)*12 + EXTRACT(MONTH FROM UPDATE_DATE) AS upd_seq,
dense_Rank() over (partition by (EXTRACT(YEAR FROM UPDATE_DATE) -1900)*12 + EXTRACT(MONTH FROM UPDATE_DATE) order by update_date desc) as rank1
from
(
select party_id,update_date, bin, var_data8, var_data2
from model.rpm_model
group by party_id,update_date, bin, var_data8, var_data2
) model
)model_final
where rank1 = 1
)
-- Add model scores
-- 1st logic Picking the model score that was the month before delivery date
select *
from
(
select cust.cust_no, cust.del_date, cust.min_del_date, model.upd_seq, model.bin
from cust
left join model
on cust.cust_no = model.party_id
and cust.upd_seq = model.upd_seq + 1
)a
Now I am struggling to add the 2nd logic to the same query. Any assistance would be appreciated.
cust table
cust_no  min_del_date  upd_seq
123      2021-01-11    1453
234      2020-06-29    1446
456      2020-07-20    1447
model table
party_id  update_date  upd_seq  BIN
123       2020-11-30   1451     22
123       2020-12-25   1452     54
123       2020-01-11   1453     14
234       2020-05-23   1445     76
234       2020-06-18   1446     48
234       2020-07-23   1447     12
456       2020-06-18   1446     23
456       2020-07-23   1447     39
456       2020-08-21   1448     21
desired results
cust_no  min_del_date  model.upd_seq  update_date  BIN
123      2021-01-11    1453           2020-01-11   14
234      2020-06-29    1446           2020-06-18   48
456      2020-07-20    1446           2020-06-18   23
Update
I managed to find the solution myself; thanks to everyone who looked at this question. The solution is below:
select a.cust_no, a.del_date, a.min_del_date, b.update_date, b.upd_seq, b.bin
from
(
select cust.cust_no, cust.del_date, cust.min_del_date,
CASE WHEN model.BIN <=50 THEN model.upd_seq WHEN BIN > 50 THEN model.upd_seq +1 ELSE NULL END as upd_seq
from cust
inner join model
on cust.cust_no = model.party_id
and cust.upd_seq = model.upd_seq + 1
)a
inner join model b
on a.cust_no = b.party_id
and a.upd_seq = b.upd_seq
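As a quick check, the two-step join can be run against the sample rows from the question. This is only a sketch on SQLite through Python's sqlite3: the cust and model CTEs are replaced by plain tables holding the sample data as posted (upd_seq is stored directly instead of being computed, and del_date is left out).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust (cust_no INTEGER, min_del_date TEXT, upd_seq INTEGER)")
conn.execute(
    "CREATE TABLE model (party_id INTEGER, update_date TEXT, upd_seq INTEGER, bin INTEGER)"
)
conn.executemany(
    "INSERT INTO cust VALUES (?, ?, ?)",
    [(123, "2021-01-11", 1453), (234, "2020-06-29", 1446), (456, "2020-07-20", 1447)],
)
conn.executemany(
    "INSERT INTO model VALUES (?, ?, ?, ?)",
    [
        (123, "2020-11-30", 1451, 22), (123, "2020-12-25", 1452, 54),
        (123, "2020-01-11", 1453, 14), (234, "2020-05-23", 1445, 76),
        (234, "2020-06-18", 1446, 48), (234, "2020-07-23", 1447, 12),
        (456, "2020-06-18", 1446, 23), (456, "2020-07-23", 1447, 39),
        (456, "2020-08-21", 1448, 21),
    ],
)

# Step 1 (subquery a): look at the previous month's score and decide which
# month to report: the previous month if BIN <= 50, else the delivery month.
# Step 2 (join to b): fetch the model row for the month chosen in step 1.
query = """
SELECT a.cust_no, a.min_del_date, b.upd_seq, b.update_date, b.bin
FROM (SELECT cust.cust_no, cust.min_del_date,
             CASE WHEN model.bin <= 50 THEN model.upd_seq
                  WHEN model.bin > 50 THEN model.upd_seq + 1
             END AS upd_seq
      FROM cust
      JOIN model
        ON cust.cust_no = model.party_id
       AND cust.upd_seq = model.upd_seq + 1) AS a
JOIN model AS b
  ON a.cust_no = b.party_id
 AND a.upd_seq = b.upd_seq
ORDER BY a.cust_no
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The output matches the desired results table. Because both joins are inner joins, a customer with no model row for the previous month drops out of the result entirely.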

Complex query involving average of a column over the month

I have a table like this one, named tv_v2.tv_momentum:
tv_date instrument_name factor
2019-07-22 cbc267f7-6ace-4357-a803-7aaf96a2cc48 50.1228599355797
2019-07-23 cbc267f7-6ace-4357-a803-7aaf96a2cc5a 50.0851750766468
2019-07-24 cbc267f7-6ace-4357-a803-7aaf96a2cc48 50.0474332287848
2019-07-25 cbc267f7-6ace-4357-a803-7aaf96a2cc31 50.0096342626235
2019-07-26 cbc267f7-6ace-4357-a803-7aaf96a2cc48 50.312332423432343
2019-07-27 cbc267f7-6ace-4357-a803-7aaf96a2cc48 23.424234234234
2019-07-28 cbc267f7-6ace-4357-a803-7aaf96a77777 15.33333332332323
2019-07-29 cbc267f7-6ace-4357-a803-7aaf96a2cc48 66.3333333333333
2019-07-30 cbc267f7-6ace-4357-a803-7aaf96a2cc4f 77.322332323223
2019-07-31 cbc267f7-6ace-4357-a803-7aaf96a2cc4s 50
I would like to get the average factor per instrument per month, and the factor on just the last day of the month. Can you help me design the query? The desired output looks like this:
YEAR MONTH END_OF_MONTH_DAY INSTRUMENT_NAME AVERAGE_FACTOR_OVER_THE_MONTH END_OF_THE_MONTH_FACTOR
2019 7 31-7 cbc267f7-6ace-4357-a803-7aaf96a2cc48 50.11 50
2019 8 31-8 cbc267f7-6ace-4357-a803-7aaf96a2cc48 33 56
You can use conditional aggregation:
select year(tv_date), month(tv_date), max(tv_date),
instrument_name,
avg(factor),
max(case when seqnum = 1 then factor end)
from (select t.*,
row_number() over (partition by year(tv_date), month(tv_date) order by tv_date desc) as seqnum
from t
) t
group by year(tv_date), month(tv_date), instrument_name;
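The following sketch shows the conditional-aggregation idea on a tiny made-up data set. It differs from the query above in one respect that is an assumption about the intent: the row_number() partition also includes instrument_name, so "last day" means the latest row of each instrument within the month. It runs on SQLite through Python's sqlite3, so strftime stands in for year() and month(), and the instrument names A and B and their factors are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tv_momentum (tv_date TEXT, instrument_name TEXT, factor REAL)")
conn.executemany(
    "INSERT INTO tv_momentum VALUES (?, ?, ?)",
    [
        ("2019-07-29", "A", 10.0),
        ("2019-07-30", "A", 20.0),
        ("2019-07-31", "A", 30.0),
        ("2019-07-30", "B", 4.0),
        ("2019-07-31", "B", 6.0),
    ],
)

# seqnum = 1 marks the latest row of each instrument in each month;
# MAX(CASE ...) then picks that row's factor inside the monthly group.
query = """
SELECT strftime('%Y', tv_date) AS yr,
       strftime('%m', tv_date) AS mon,
       instrument_name,
       MAX(tv_date) AS last_day,
       AVG(factor) AS avg_factor,
       MAX(CASE WHEN seqnum = 1 THEN factor END) AS last_factor
FROM (SELECT m.*,
             ROW_NUMBER() OVER (PARTITION BY instrument_name,
                                             strftime('%Y-%m', tv_date)
                                ORDER BY tv_date DESC) AS seqnum
      FROM tv_momentum AS m) AS ranked
GROUP BY strftime('%Y', tv_date), strftime('%m', tv_date), instrument_name
ORDER BY instrument_name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# ('2019', '07', 'A', '2019-07-31', 20.0, 30.0)
# ('2019', '07', 'B', '2019-07-31', 5.0, 6.0)
```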

How to use aggregate functions in my criteria in SQL Server?

I have table called VoucherEntry
These are my records,
ID VoucherOnlineID TransactionNumber Store Amount
-------------------------------------------------------------
120 137 26 1001 100
126 137 22 2000 -56
128 137 30 3000 -20
133 137 11 2000 -5
Now I want to add two columns holding the carry amount and the balance amount. If VoucherEntry.Amount = 100, the Carry column should be 0; otherwise it should hold the previous row's balance, as shown below.
Expected output
ID VoucherOnlineID TransactionNumber Store Carry Amount Balance
---------------------------------------------------------------------------------
120 137 26 1001 0 100 100
126 137 22 2000 100 -56 44
128 137 30 3000 44 -20 24
133 137 11 2000 24 -5 19
Update
We can sort the records by the ID column or a Date column; once sorted, the records appear in the order shown above.
You need two variations of a Cumulative Sum:
SELECT
VoucherOnlineID
,TransactionNumber
,Store
,Coalesce(Sum(Amount) -- Cumulative Sum of previous rows
Over (PARTITION BY VoucherOnlineID
ORDER BY DATE -- or whatever determines correct order
ROWS BETWEEN Unbounded Preceding AND 1 Preceding), 0) AS Carry
,Amount
,Sum(Amount) -- Cumulative Sum including current row
Over (PARTITION BY VoucherOnlineID
ORDER BY DATE -- or whatever determines correct order
ROWS Unbounded Preceding) AS Balance
FROM VoucherEntry
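The two window frames can be checked against the sample rows. This is a minimal sketch on SQLite through Python's sqlite3 rather than SQL Server; it orders by ID, which the question's update says is acceptable, and lowercases the table and column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE voucher_entry "
    "(id INTEGER, voucher_online_id INTEGER, transaction_number INTEGER, "
    "store INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO voucher_entry VALUES (?, ?, ?, ?, ?)",
    [
        (120, 137, 26, 1001, 100),
        (126, 137, 22, 2000, -56),
        (128, 137, 30, 3000, -20),
        (133, 137, 11, 2000, -5),
    ],
)

# carry:   running sum over all earlier rows (frame ends at 1 PRECEDING);
#          it is NULL on the first row, hence the COALESCE to 0.
# balance: running sum up to and including the current row.
query = """
SELECT id,
       COALESCE(SUM(amount) OVER (PARTITION BY voucher_online_id
                                  ORDER BY id
                                  ROWS BETWEEN UNBOUNDED PRECEDING
                                           AND 1 PRECEDING), 0) AS carry,
       amount,
       SUM(amount) OVER (PARTITION BY voucher_online_id
                         ORDER BY id
                         ROWS UNBOUNDED PRECEDING) AS balance
FROM voucher_entry
ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (120, 0, 100, 100)
# (126, 100, -56, 44)
# (128, 44, -20, 24)
# (133, 24, -5, 19)
```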
For SQL Server 2008 and below:
declare @t table(ID int,VoucherOnlineID int,TransactionNumber int,Store int,Amount int)
insert into @t VALUES
(120,137,26,1001,100)
,(126,137,22,2000,-56)
,(128,137,30,3000,-20)
,(133,137,11,2000,-5 )
select *
-- sum of all earlier rows of the same voucher
,isnull((Select sum(Amount) from @t t1
where t1.VoucherOnlineID=t.VoucherOnlineID
and t1.id<t.id ) ,0) as Carry
-- sum of all rows up to and including the current one
,isnull((Select sum(Amount) from @t t1
where t1.VoucherOnlineID=t.VoucherOnlineID
and t1.id<=t.id ) ,0) as Balance
from @t t

add column based on a column value in one row

I have this table with the following data:
user Date Dist Start
1 2014-09-03 150 12500
1 2014-09-04 220 null
1 2014-09-05 100 null
2 2014-09-03 290 18000
2 2014-09-04 90 null
2 2014-09-05 170 null
Based on the value in the Start column, I need to add another column that repeats the non-null value for every row of the same user.
The resulting table should be as below:
user Date Dist Start StartR
1 2014-09-03 150 12500 12500
1 2014-09-04 220 null 12500
1 2014-09-05 100 null 12500
2 2014-09-03 290 18000 18000
2 2014-09-04 90 null 18000
2 2014-09-05 170 null 18000
Can someone please help me with this query? I have no idea how to do it.
For the data you have, you can use a window function:
select t.*, min(t.start) over (partition by user) as StartR
from table t
You can readily update using the same idea:
with toupdate as (
select t.*, min(t.start) over (partition by user) as new_StartR
from table t
)
update toupdate
set StartR = new_StartR;
Note: this works for the data in the question and how you have phrased the question. It would not work if there were multiple Start values for a given user, or if there were NULL values that you wanted to keep before the first non-NULL Start value.
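Here is a small runnable sketch of the window-function version, using SQLite through Python's sqlite3. The table name trips and the column names user_id and start_value are stand-ins chosen for the example, since user, start and table are awkward identifiers in several databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips (user_id INTEGER, trip_date TEXT, dist INTEGER, start_value INTEGER)"
)
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?, ?)",
    [
        (1, "2014-09-03", 150, 12500),
        (1, "2014-09-04", 220, None),
        (1, "2014-09-05", 100, None),
        (2, "2014-09-03", 290, 18000),
        (2, "2014-09-04", 90, None),
        (2, "2014-09-05", 170, None),
    ],
)

# MIN ignores NULLs, so within each user's partition it returns
# the single non-NULL start value and repeats it on every row.
query = """
SELECT user_id, trip_date, dist, start_value,
       MIN(start_value) OVER (PARTITION BY user_id) AS start_r
FROM trips
ORDER BY user_id, trip_date
"""
rows = conn.execute(query).fetchall()
start_r = [row[4] for row in rows]
print(start_r)  # [12500, 12500, 12500, 18000, 18000, 18000]
```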
You can use COALESCE/ISNULL and a correlated sub-query:
SELECT [user], [Date], [Dist], [Start],
StartR = ISNULL([Start], (SELECT MIN([Start])
FROM dbo.TableName t2
WHERE t.[User] = t2.[User]
AND t2.[Start] IS NOT NULL))
FROM dbo.TableName t
I have used MIN([Start]) since you haven't said what should happen if there are multiple Start values for one user that are not NULL.