Remove rows having swapped columns in Hive - hive

I have a table with 10 columns, but only 3 of them need to be considered:
Id from_value to_value
1234 ABC CDR
1234 CDR ABC
3456 XYZ PQR
3456 PQR XYZ
OUTPUT should be:
Id from_value to_value
1234 ABC CDR
3456 XYZ PQR

SELECT id,from_value,to_value
FROM (
SELECT id,from_value,to_value,row_number() over (partition by id order by timecol desc) as row_num
from table) table
WHERE row_num = 1
You need to use a window function with row_number() to achieve this.
Replace timecol with your own timestamp column.
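The query above keeps one row per id, which is enough for the sample data. If an id can carry several distinct pairs, one option is to partition on the unordered pair as well. The following is a minimal sketch of that idea, run against SQLite through Python rather than Hive. The table name pairs and the seq column (standing in for timecol) are assumptions made for the illustration, and SQLite's two-argument min()/max() play the role that least()/greatest() would play in Hive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "pairs" and "seq" are hypothetical names; seq stands in for timecol.
conn.execute(
    "CREATE TABLE pairs (seq INTEGER, id INTEGER, from_value TEXT, to_value TEXT)"
)
conn.executemany(
    "INSERT INTO pairs VALUES (?, ?, ?, ?)",
    [
        (1, 1234, "ABC", "CDR"),
        (2, 1234, "CDR", "ABC"),
        (3, 3456, "XYZ", "PQR"),
        (4, 3456, "PQR", "XYZ"),
    ],
)

# Rows whose two values are swapped share the same (smaller, larger) pair,
# so they fall into one partition and only the first row (by seq) is kept.
query = """
SELECT id, from_value, to_value
FROM (
    SELECT id, from_value, to_value,
           row_number() OVER (
               PARTITION BY id,
                            min(from_value, to_value),
                            max(from_value, to_value)
               ORDER BY seq
           ) AS row_num
    FROM pairs
) t
WHERE row_num = 1
ORDER BY id
"""
deduped = conn.execute(query).fetchall()
print(deduped)
```

Ordering by seq ascending keeps the row that was stored first; ordering descending would keep the most recent one instead.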

Related

find out the records - first two date records of each member of every year- db2udb

I need help with the following scenario. I have a table testdata with the columns
memberid (varchar), codetype (varchar), effectivedate (datetime).
The table has 20k records, covering the years 2015 to 2021.
I need to find the first two records by date for each member in every year (only memberid is unique).
E.g.
memberid codetype effectivedate
123      ABC      1/2/2015
123      ABC      1/2/2015
123      ABC      8/15/2015
123      EFG      9/15/2015
123      EFG      2/15/2018
345      EFG      3/14/2018
345      EFG      3/17/2018
345      ABC      9/19/2020
456      EFG      12/20/2021
The result should look like this:
memberid codetype effectivedate
123      ABC      1/2/2015
123      ABC      1/2/2015
123      ABC      2/15/2018
345      EFG      3/14/2018
345      EFG      3/17/2018
345      ABC      9/19/2020
456      EFG      12/20/2021
I have tried a lot of ways but had no luck so far.
Try this:
with u as
(select memberid, codetype, effectivedate,
-- number the rows of each member and year, earliest date first
row_number() over(partition by memberid, year(effectivedate) order by
effectivedate) as rownum
from testdata)
(select memberid, codetype, effectivedate from u
where rownum <= 2)
Basically, you number the records within each memberid and year, ordered by effectivedate, and then keep only the records where rownum is 1 or 2.
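As a quick check of this approach, here is a small sketch that runs the same kind of query on the sample rows using SQLite through Python. It is an illustration under assumptions, not DB2 code: the dates are stored as ISO strings, and strftime('%Y', ...) stands in for year().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE testdata (memberid TEXT, codetype TEXT, effectivedate TEXT)"
)
# Sample rows from the question, with dates rewritten as ISO strings.
conn.executemany(
    "INSERT INTO testdata VALUES (?, ?, ?)",
    [
        ("123", "ABC", "2015-01-02"),
        ("123", "ABC", "2015-01-02"),
        ("123", "ABC", "2015-08-15"),
        ("123", "EFG", "2015-09-15"),
        ("123", "EFG", "2018-02-15"),
        ("345", "EFG", "2018-03-14"),
        ("345", "EFG", "2018-03-17"),
        ("345", "ABC", "2020-09-19"),
        ("456", "EFG", "2021-12-20"),
    ],
)

# Number the rows per member and year by date, then keep the first two.
query = """
WITH u AS (
    SELECT memberid, codetype, effectivedate,
           row_number() OVER (
               PARTITION BY memberid, strftime('%Y', effectivedate)
               ORDER BY effectivedate
           ) AS rn
    FROM testdata
)
SELECT memberid, codetype, effectivedate
FROM u
WHERE rn <= 2
ORDER BY memberid, effectivedate
"""
first_two = conn.execute(query).fetchall()
for row in first_two:
    print(row)
```

When several rows share the same date, row_number() picks among them arbitrarily; adding a tie-breaker column to the ORDER BY makes the choice deterministic.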

How to join tables in sql to exclude rows already matched from further consideration

I have 2 tables as shown below. I am trying to join table 1 to table 2 on order_code such that once a row matches, the corresponding id from either table should not show up again on the joined table. The match should happen in ascending order of date from both tables. Expected results are shown as well:
Table 1:
PK1 order_code Date1
1   XYZ        1/1/2021
2   ABC        1/2/2021
3   ABC        1/3/2021
4   XYZ        1/4/2021
Table 2:
PK2 order_code Date2
1   ABC        2/7/2021
2   XYZ        2/6/2021
3   ABC        2/5/2021
4   XYZ        2/8/2021
5   ABC        2/11/2021
6   XYZ        2/14/2021
Expected result:
PK1 order_code Date1    PK2 order_code Date2
1   XYZ        1/1/2021 2   XYZ        2/6/2021
2   ABC        1/2/2021 3   ABC        2/5/2021
3   ABC        1/3/2021 1   ABC        2/7/2021
4   XYZ        1/4/2021 4   XYZ        2/8/2021
Please let me know if more clarity is needed and I can try explaining this better. All help is appreciated!
Join on order_code and on each row's position within its order_code, ordered by date:
select t1.PK1, t1.order_code, t1.Date1, t2.PK2, t2.order_code, t2.Date2
from (
-- rank table1 rows by date within each order_code
select *, row_number() over(partition by order_code order by Date1) rn
from table1
) t1
join (
-- rank table2 rows the same way, so that rank n pairs with rank n
select *, row_number() over(partition by order_code order by Date2) rn
from table2
) t2 on t1.order_code = t2.order_code and t1.rn = t2.rn;
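To see the pairing on the sample data, here is a minimal sketch using SQLite through Python. The dates are stored as ISO strings so that they sort correctly, which is an assumption of the sketch rather than something stated in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (PK1 INTEGER, order_code TEXT, Date1 TEXT)")
conn.execute("CREATE TABLE table2 (PK2 INTEGER, order_code TEXT, Date2 TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, "XYZ", "2021-01-01"), (2, "ABC", "2021-01-02"),
     (3, "ABC", "2021-01-03"), (4, "XYZ", "2021-01-04")],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?)",
    [(1, "ABC", "2021-02-07"), (2, "XYZ", "2021-02-06"),
     (3, "ABC", "2021-02-05"), (4, "XYZ", "2021-02-08"),
     (5, "ABC", "2021-02-11"), (6, "XYZ", "2021-02-14")],
)

# Pair the n-th earliest row of each order_code in table1 with the
# n-th earliest row of the same order_code in table2.
query = """
SELECT t1.PK1, t1.order_code, t1.Date1, t2.PK2, t2.order_code, t2.Date2
FROM (
    SELECT *, row_number() OVER (PARTITION BY order_code ORDER BY Date1) AS rn
    FROM table1
) t1
JOIN (
    SELECT *, row_number() OVER (PARTITION BY order_code ORDER BY Date2) AS rn
    FROM table2
) t2 ON t1.order_code = t2.order_code AND t1.rn = t2.rn
ORDER BY t1.PK1
"""
matched = conn.execute(query).fetchall()
for row in matched:
    print(row)
```

Because this is an inner join, table2 rows with no partner at the same position (PK2 5 and 6 here) are dropped; a left or full outer join would keep unmatched rows.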

SQL Access -- Keep record only with most recent timestamp

I have a table that appears as follows:
Time Name Cust_ID Num_Calls Num_Orders
12.00 ABC 100 20 10
12.25 PQR 102 23 12
12.30 ABC 100 26 15
01.00 ABC 100 26 18
02.00 PQR 102 23 14
04.00 PQR 102 25 20
How do I delete the earlier records for each "Name & Cust_ID" and keep only the most recent one? The other fields in the record may change as I run them through my Access database, but Name and Cust_ID remain the same.
My output at the end of the day should be:
Time Name Cust_ID Num_Calls Num_Orders
01.00 ABC 100 26 18
04.00 PQR 102 25 20
I think if your cust_id is unique, you should make it a primary key in the table.
Then whenever you have a new entry, first check and see if the current cust_id already exists.
If yes, update that entry in the table.
Else do an insert.
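The update-or-insert idea can be sketched as follows, using SQLite through Python rather than Access. The table name calls, the column name call_time, and the helper upsert are hypothetical names chosen for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# cust_id is the primary key, so each customer has at most one row.
conn.execute(
    "CREATE TABLE calls (cust_id INTEGER PRIMARY KEY, name TEXT, "
    "call_time TEXT, num_calls INTEGER, num_orders INTEGER)"
)

def upsert(conn, call_time, name, cust_id, num_calls, num_orders):
    """Update the customer's row if it exists, otherwise insert a new one."""
    cur = conn.execute(
        "UPDATE calls SET name = ?, call_time = ?, num_calls = ?, num_orders = ? "
        "WHERE cust_id = ?",
        (name, call_time, num_calls, num_orders, cust_id),
    )
    if cur.rowcount == 0:  # no existing row was updated
        conn.execute(
            "INSERT INTO calls (cust_id, name, call_time, num_calls, num_orders) "
            "VALUES (?, ?, ?, ?, ?)",
            (cust_id, name, call_time, num_calls, num_orders),
        )

# Feed the sample records in arrival order.
for record in [
    ("12.00", "ABC", 100, 20, 10),
    ("12.25", "PQR", 102, 23, 12),
    ("12.30", "ABC", 100, 26, 15),
    ("01.00", "ABC", 100, 26, 18),
    ("02.00", "PQR", 102, 23, 14),
    ("04.00", "PQR", 102, 25, 20),
]:
    upsert(conn, *record)

latest = conn.execute(
    "SELECT call_time, name, cust_id, num_calls, num_orders "
    "FROM calls ORDER BY cust_id"
).fetchall()
print(latest)
```

This way the table never accumulates older records, so no cleanup query is needed; it relies on records arriving in chronological order.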
Try this.
It should give you the most recent record for each Name and Cust_ID, based on max(time). You could then delete the complement of this set.
SELECT * FROM YOUR_TABLE A
INNER JOIN
( SELECT MAX(time) MAX_time
, NAME , CUST_ID
FROM YOUR_TABLE
GROUP BY
NAME , CUST_ID )B
ON A.NAME=B.Name
and A.CUST_ID=B.CUst_ID
and A.time =B.max_time
So you would delete the following records:
DELETE FROM YOUR_TABLE A
WHERE A.TIME <> ( SELECT MAX(C.TIME) FROM YOUR_TABLE C
WHERE C.NAME = A.NAME
AND C.CUST_ID = A.CUST_ID )
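Here is a small sketch of the delete, again run on SQLite through Python instead of Access. The table name calls and the column call_time are hypothetical, and the times are rewritten as 24-hour strings (an assumption of the sketch) so that MAX picks the latest one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calls (call_time TEXT, name TEXT, cust_id INTEGER, "
    "num_calls INTEGER, num_orders INTEGER)"
)
# Sample rows from the question, with times as sortable 24-hour strings.
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?, ?, ?)",
    [
        ("12:00", "ABC", 100, 20, 10),
        ("12:25", "PQR", 102, 23, 12),
        ("12:30", "ABC", 100, 26, 15),
        ("13:00", "ABC", 100, 26, 18),
        ("14:00", "PQR", 102, 23, 14),
        ("16:00", "PQR", 102, 25, 20),
    ],
)

# Delete every row that is not the latest one for its (name, cust_id).
conn.execute(
    """
    DELETE FROM calls
    WHERE call_time <> (
        SELECT MAX(c.call_time)
        FROM calls AS c
        WHERE c.name = calls.name
          AND c.cust_id = calls.cust_id
    )
    """
)
remaining = conn.execute(
    "SELECT call_time, name, cust_id, num_calls, num_orders "
    "FROM calls ORDER BY name"
).fetchall()
print(remaining)
```

If the time column is stored as text in a 12-hour format, MAX compares the strings and may not return the latest record, so a real date/time type is the safer choice.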

How to bring together multiple delta tables?

I have a table with IDs and primary information. I also have two delta tables keyed on ID and date of change. I need to build a view that merges these three tables together indicating all changes over time.
Main Table:
ID Name
-- ------------------
1 Bob Jones
2 Dave Smith
First Attribute Table:
ID Date Attr1
-- ---------- -----
1 01/01/2013 25
1 02/15/2013 33
1 02/17/2013 47
1 03/02/2013 58
2 02/01/2013 1
...
Second Attribute Table
ID Date Attr2
-- ---------- -----
1 01/01/2013 ABC
1 01/05/2013 DEF
1 01/15/2013 RST
1 02/10/2013 XYZ
1 02/15/2013 Foo
1 03/05/2013 Blah
2 02/01/2013 Two
...
Based on that data, for Bob Jones, I need the view to return the following:
ID Name Date Attr1 Attr2
-- ----------- ---------- ----- -----
1 Bob Jones 01/01/2013 25 ABC
1 Bob Jones 01/05/2013 25 DEF
1 Bob Jones 01/15/2013 25 RST
1 Bob Jones 02/10/2013 25 XYZ
1 Bob Jones 02/15/2013 33 Foo
1 Bob Jones 02/17/2013 47 Foo
1 Bob Jones 03/02/2013 58 Foo
1 Bob Jones 03/05/2013 58 Blah
I tried outer joining the attribute tables to get all change values ordered by date and then used an outer join on the entire query with itself to get "prior" records:
with qry as (
select
rownum = ROW_NUMBER() OVER (ORDER BY m.ID, a.DATE),
m.ID,
m.Name,
a.DATE,
a.Attr1,
a.Attr2
from Main m
inner join (
select
COALESCE(a1.ID, a2.ID) as ID,
COALESCE(a1.DATE, a2.DATE) as DATE,
a1.Attr1,
a2.Attr2
from Attributes1 a1
full outer join Attributes2 a2
on (a1.ID = a2.ID and a1.DATE = a2.DATE)
) a on (a.ID = m.ID)
)
select
COALESCE(qry.ID, prev.ID) as ID,
COALESCE(qry.Name, prev.Name) as Name,
COALESCE(qry.DATE, prev.DATE) as DATE,
COALESCE(qry.Attr1, prev.Attr1) as Attr1,
COALESCE(qry.Attr2, prev.Attr2) as Attr2
from qry
left join qry prev
on (prev.rownum = qry.rownum - 1)
order by ID, DATE
However, that doesn't work when one attribute table changes more often than the other. Attributes that didn't change on a given date are null in the result of the attribute table join, so when two nulls show up back-to-back, the coalesce with the prior row still returns null, whereas I need the last non-null value that appeared in that column.
Can this even be done in a view in SQL Server 2012?
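One way to get the last non-null value is to collect every change date from both attribute tables and, for each date, look up the most recent value at or before it. The sketch below shows this on the sample data using SQLite through Python. It is an illustration under assumptions, not a tested SQL Server view: dates are stored as ISO strings, and SQLite's LIMIT 1 would need to become TOP 1 in SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Main (ID INTEGER, Name TEXT)")
conn.execute("CREATE TABLE Attributes1 (ID INTEGER, Date TEXT, Attr1 INTEGER)")
conn.execute("CREATE TABLE Attributes2 (ID INTEGER, Date TEXT, Attr2 TEXT)")
conn.executemany("INSERT INTO Main VALUES (?, ?)",
                 [(1, "Bob Jones"), (2, "Dave Smith")])
conn.executemany("INSERT INTO Attributes1 VALUES (?, ?, ?)", [
    (1, "2013-01-01", 25), (1, "2013-02-15", 33),
    (1, "2013-02-17", 47), (1, "2013-03-02", 58), (2, "2013-02-01", 1),
])
conn.executemany("INSERT INTO Attributes2 VALUES (?, ?, ?)", [
    (1, "2013-01-01", "ABC"), (1, "2013-01-05", "DEF"),
    (1, "2013-01-15", "RST"), (1, "2013-02-10", "XYZ"),
    (1, "2013-02-15", "Foo"), (1, "2013-03-05", "Blah"),
    (2, "2013-02-01", "Two"),
])

# Every date on which either attribute changed, then for each date the
# latest value of each attribute recorded at or before that date.
query = """
WITH dates AS (
    SELECT ID, Date FROM Attributes1
    UNION
    SELECT ID, Date FROM Attributes2
)
SELECT m.ID, m.Name, d.Date,
       (SELECT a1.Attr1 FROM Attributes1 a1
         WHERE a1.ID = d.ID AND a1.Date <= d.Date
         ORDER BY a1.Date DESC LIMIT 1) AS Attr1,
       (SELECT a2.Attr2 FROM Attributes2 a2
         WHERE a2.ID = d.ID AND a2.Date <= d.Date
         ORDER BY a2.Date DESC LIMIT 1) AS Attr2
FROM Main m
JOIN dates d ON d.ID = m.ID
ORDER BY m.ID, d.Date
"""
history = conn.execute(query).fetchall()
for row in history:
    print(row)
```

The correlated lookups replace the self-join on the previous row, so a run of unchanged dates of any length still resolves to the last recorded value.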

Find duplicate records in two group of data in one sql server table

I have one SQL Server (2008) table containing several groups of data.
Source PersonId Date Description Code IsDup
------ -------- ----- ----------- ------ -----
EHR 1234 1/1/2012 Fever 120.12
EHR 1234 6/1/2012 Fracture 101.00
EHR 1234 11/4/2012 Hypertension 223.15
RAM 1234 1/1/2012 Fever 120.12 <-- Duplicate
RAM 1234 6/1/2012 Fracture 101.00 <-- Duplicate
RAM 1234 4/1/2012 Diabetic 601.00
TAR 1234 2/1/2012 Asthma 456.00
TAR 1234 1/1/2012 Fever 120.12 <-- Duplicate
I need to compare the data between the different groups. "EHR" being the master group, I need to check if any other group has data exactly matching that in "EHR" master group within the table. And then it should update the IsDup column with 1.
Expected Result:
Source PersonId Date Description Code IsDup
------ -------- ----- ----------- ------ -----
EHR 1234 1/1/2012 Fever 120.12
EHR 1234 6/1/2012 Fracture 101.00
EHR 1234 11/4/2012 Hypertension 223.15
RAM 1234 1/1/2012 Fever 120.12 1
RAM 1234 6/1/2012 Fracture 101.00 1
RAM 1234 4/1/2012 Diabetic 601.00
TAR 1234 2/1/2012 Asthma 456.00
TAR 1234 1/1/2012 Fever 120.12 1
I know how to check for duplicates within the table, but I am not sure how to do the comparison while keeping one group static.
I got this from a Stack Overflow thread to identify duplicates, but how do I add the grouped comparison?
with x as (select *, rn = row_number()
over(PARTITION BY [PersonId], [Date], [Description], [Code] order by [PersonId], [Date], [Description], [Code])
from Results)
select * from x
where rn > 1
You can update your table using a self join:
update r1 set isDup = 1
from results r1 join results r2 on
r1.PersonId = r2.PersonId and r1.Date = r2.Date and
r1.Description = r2.Description and r1.Code = r2.Code
where r1.Source <> 'EHR' and r2.Source = 'EHR'
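The rule can be checked on the sample data with a small sketch using SQLite through Python. This is an illustration, not SQL Server code: the self join is written as a correlated EXISTS here, and the codes are stored as text, which is an assumption of the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Results (Source TEXT, PersonId INTEGER, Date TEXT, "
    "Description TEXT, Code TEXT, IsDup INTEGER)"
)
conn.executemany(
    "INSERT INTO Results (Source, PersonId, Date, Description, Code) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("EHR", 1234, "1/1/2012", "Fever", "120.12"),
        ("EHR", 1234, "6/1/2012", "Fracture", "101.00"),
        ("EHR", 1234, "11/4/2012", "Hypertension", "223.15"),
        ("RAM", 1234, "1/1/2012", "Fever", "120.12"),
        ("RAM", 1234, "6/1/2012", "Fracture", "101.00"),
        ("RAM", 1234, "4/1/2012", "Diabetic", "601.00"),
        ("TAR", 1234, "2/1/2012", "Asthma", "456.00"),
        ("TAR", 1234, "1/1/2012", "Fever", "120.12"),
    ],
)

# Flag every non-EHR row that has an identical row in the EHR group.
conn.execute(
    """
    UPDATE Results
    SET IsDup = 1
    WHERE Source <> 'EHR'
      AND EXISTS (
          SELECT 1
          FROM Results AS m
          WHERE m.Source = 'EHR'
            AND m.PersonId = Results.PersonId
            AND m.Date = Results.Date
            AND m.Description = Results.Description
            AND m.Code = Results.Code
      )
    """
)
flagged = conn.execute(
    "SELECT Source, Description FROM Results WHERE IsDup = 1 "
    "ORDER BY Source, Description"
).fetchall()
print(flagged)
```

Rows in the EHR group are never flagged, because the outer condition excludes them before the comparison runs.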
This should do:
UPDATE A
SET IsDup = 1
FROM YourTable A
WHERE [Source] != 'EHR'
AND EXISTS (SELECT 1 FROM YourTable
WHERE [Source] = 'EHR'
AND PersonId = A.PersonId
AND [Date] = A.[Date]
AND Description = A.Description
AND Code = A.Code)
Try this:
;With rootQuery as
(
Select SOURCE, PersonId, Date, Description, Code
From MedicalHistory
Where Source = 'EHR'
)
Update mhd
Set IsDup = 1
From rootquery mh
Join MedicalHistory mhd on mh.PersonId = mhd.PersonId
Where mh.Description = mhd.Description
And mh.Code = mhd.Code
And mh.Date = mhd.Date
And mhd.Source != 'EHR'
Try this, please:
update tab1
set isDup = 1
from table1 tab1, table1 tab2
where
tab1.PersonId = tab2.PersonId and
tab1.Date = tab2.Date and
tab1.Description = tab2.Description and
tab1.Code = tab2.Code and
-- flag only non-EHR rows that match a row in the EHR master group
tab1.Source <> 'EHR' and
tab2.Source = 'EHR'