Oracle SQL - fetch a particular column based on two other columns

Table 1
I_ID S_id E_id
1000 1234 123
1002 1235 12
1002 1235 13
1003 3456 234
1004 1256 236
1004 1257 236
1005 1239 236
Table2
Desc SS_id EE_id
aaaa 1234 125
bbbb 1235 13
cccc 2222 234
hhhh 4444 236
jjjj 1239 236
1. First, match S_id of Table 1 with SS_id of Table 2 and pick the corresponding Desc.
2. If the count of matches in step 1 is greater than 1, match (S_id, E_id) of Table 1 with (SS_id, EE_id) of Table 2 and pick the corresponding Desc.
3. When S_id of Table 1 is not present in SS_id of Table 2, match E_id of Table 1 with EE_id of Table 2 and pick the corresponding Desc.
4. If the count of matches in step 3 is greater than 1, match (S_id, E_id) of Table 1 with (SS_id, EE_id) of Table 2 and pick the corresponding Desc.
5. Else populate null.
Output
I_ID Desc
1000 aaaa
1002 bbbb
1003 cccc
1004 null
1005 jjjj
Can you help me write the SQL query?

Looking at your expected result, it seems you only need:
select table1.I_ID, table2."Desc"
from table1
inner join table2 on table1.S_id = table2.SS_id
(otherwise, please post a more representative example)
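If the full five-step priority logic is needed, one way to sketch it is with pre-aggregated outer joins and a CASE that walks the rules in order. This is only a sketch: it assumes the tables are literally named table1 and table2, quotes "Desc" because DESC is a reserved word in Oracle, and uses MIN("Desc") merely to collapse each lookup to one row per key.

```sql
SELECT DISTINCT
       t1.i_id,
       CASE
         WHEN s.cnt = 1 THEN s.descr    -- rule 1: unique S_id match
         WHEN s.cnt > 1 THEN se.descr   -- rule 2: ambiguous, use (S_id, E_id)
         WHEN e.cnt = 1 THEN e.descr    -- rule 3: no S_id match, unique E_id match
         WHEN e.cnt > 1 THEN se.descr   -- rule 4: ambiguous, use (S_id, E_id)
       END AS descr                     -- rule 5: otherwise NULL
FROM table1 t1
LEFT JOIN (SELECT ss_id, MIN("Desc") AS descr, COUNT(*) AS cnt
           FROM table2 GROUP BY ss_id) s
       ON s.ss_id = t1.s_id
LEFT JOIN (SELECT ee_id, MIN("Desc") AS descr, COUNT(*) AS cnt
           FROM table2 GROUP BY ee_id) e
       ON e.ee_id = t1.e_id
LEFT JOIN (SELECT ss_id, ee_id, MIN("Desc") AS descr
           FROM table2 GROUP BY ss_id, ee_id) se
       ON se.ss_id = t1.s_id AND se.ee_id = t1.e_id;
```

Walking the sample data through this: 1000 and 1005 hit rule 1, 1003 hits rule 3, and 1004 falls through rule 4 to NULL because neither (1256, 236) nor (1257, 236) exists in Table 2, which matches the expected output.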

Related

hive - Duplicate counts check associated from one to another column

I have a table and am trying to fetch counts of distinct values in one column by comparing against another column. The data runs from millions to billions of rows for each TMKEY partition column.
ID TNUM TMKEY
23455 ABCD 1001
23456 ABCD 1001
23455 ABCD 1001
112233 BCDE 1001
113322 BCDE 1001
9009 DDEE 1001
9009 DDEE 1001
1009 FFGG 1001
Looking for desired output:
total_distinct_TNUM_count count_of_TNUM_with_more_than_one_distinct_ID TMKEY
4 2 1001
Here, when TNUM is DDEE, the ID 9009 appears twice; such duplicates of the same ID shouldn't be picked up when calculating the count of TNUM values that map to more than one distinct ID. All I'm looking for here is group-concat counts. Any suggestions, please? As I have more than 3 to 4 billion rows, my approach is completely different and I'm stuck.
select a.tnum, a.group_id, a.time_week
from (
    select time_week, tnum, count(*) as num_of_rows,
           concat_ws('|', collect_set(id)) as group_id
    from source_table_test1
    where time_week = 1001
    group by tnum, time_week
) as a
where length(a.group_id) > 16 and num_of_rows > 1
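If only the two counts are needed (rather than the concatenated ID lists), a two-level aggregation avoids building the strings entirely, which matters at billions of rows. A sketch in Hive SQL, assuming the table and column names from the query above:

```sql
SELECT tmkey,
       COUNT(*) AS total_distinct_tnum_count,               -- inner query is one row per TNUM
       SUM(CASE WHEN distinct_ids > 1 THEN 1 ELSE 0 END)
           AS count_of_tnum_with_more_than_one_distinct_id  -- DDEE's repeated 9009 counts once
FROM (
    SELECT tmkey, tnum, COUNT(DISTINCT id) AS distinct_ids
    FROM source_table_test1
    WHERE tmkey = 1001
    GROUP BY tmkey, tnum
) per_tnum
GROUP BY tmkey;
```

On the sample data this should give 4 and 2 for TMKEY 1001: ABCD and BCDE each have two distinct IDs, while DDEE (9009 twice) and FFGG have one.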

Duplicates with condition (SQL)

I would like to get the number of duplicates for article_id for each merchant_id, where the zip_code is identical. Please see example below:
Table
merchant_id article_id zip_code
1 4555 1000
1 4555 1003
1 4555 1000
1 3029 1000
2 7539 1005
2 7539 1005
2 7539 1002
2 1232 1006
3 5555 1000
3 5555 1001
3 5555 1001
3 5555 1001
Output Table
merchant_id count_duplicate zip_code
1 2 1000
2 2 1005
3 3 1001
This is the query that I am currently using but I am struggling to include the zip_code condition:
SELECT mt.merchant_id
      ,mt_1.duplicate_count
FROM main_table mt
JOIN (select article_id, count(*) AS duplicate_count
      from main_table
      group by article_id
      having count(article_id) > 1) mt_1
  ON mt_1.article_id = mt.article_id
This seems to return what you want. I'm not sure why article_id is not included in the result set:
select merchant_id, zip_code, count(*)
from main_table
group by merchant_id, article_id, zip_code
having count(*) > 1
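If, as in the desired output, only the single largest duplicate group per merchant is wanted, the grouped query can be wrapped with ROW_NUMBER(). A sketch, assuming a dialect that supports window functions:

```sql
SELECT merchant_id, count_duplicate, zip_code
FROM (
    SELECT merchant_id, zip_code,
           COUNT(*) AS count_duplicate,
           -- rank each merchant's (article_id, zip_code) groups by size
           ROW_NUMBER() OVER (PARTITION BY merchant_id
                              ORDER BY COUNT(*) DESC) AS rn
    FROM main_table
    GROUP BY merchant_id, article_id, zip_code
) d
WHERE rn = 1
  AND count_duplicate > 1;
```

On the sample data this keeps one row per merchant: (1, 2, 1000), (2, 2, 1005), and (3, 3, 1001). Note ties between equally sized groups are broken arbitrarily unless the ORDER BY is extended.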

Select certain records and add them to a queue table

I have a table (Table1) that stores several snapshots for each account. Every day we may receive/insert new snapshots for the accounts if values change in any of the columns val1, val2, val3, val4, val5.
Table1
T1ID Account# snapshotDate val1 val2 val3 val4 val5
1 1001 1/1/2017 1111 2222 3333 4224 5551
2 1001 1/1/2018 1111 2222 3333 4444 5551
3 1001 1/1/2019 1111 2222 3333 4444 5550
4 2002 1/1/2017 123 1234 12345 123456 3434
5 2002 1/1/2018 123 1212 12345 123456 3434
6 2002 1/2/2019 333 1212 62626 252525 3434
I want to pull from Table1 the updated snapshots for these accounts every week and add them to a table/Queue (Table2) only if it is the first snapshot or if certain columns change (val2 or val5)
Table2
T2ID T1ID
01 1
02 3
03 4
04 5
T1ID 1 for account# 1001 was added because it's the first snapshot.
T1ID 2 for account# 1001 was NOT added because there was no change to val2 or val5.
T1ID 3 for account# 1001 was added because of the change to val5.
T1ID 4 for account# 2002 was added because it's the first snapshot.
T1ID 5 for account# 2002 was added because of the change to val2.
T1ID 6 for account# 2002 was NOT added because there was no change to val2 or val5.
Table2 will be used as a queue of changes for each account that will be sent to another process.
What is the best optimized query that I can use for this?
Use LAG() and ROW_NUMBER().
In a subquery, you can recover the previous value of each of the two tracked columns within account partitions, ordered by date. Then the outer query can bring in the first record of each group, along with the records where either of the two tracked values changed.
SELECT
ROW_NUMBER() OVER(ORDER BY t1id) t2id,
t1id
FROM (
SELECT
t.*,
ROW_NUMBER() OVER(PARTITION BY [Account#] ORDER BY snapshotDate) rn,
LAG(val2) OVER(PARTITION BY [Account#] ORDER BY snapshotDate) lval2,
LAG(val5) OVER(PARTITION BY [Account#] ORDER BY snapshotDate) lval5
FROM mytable t
) x
WHERE
rn = 1
OR NOT (val2 = lval2)
OR NOT (val5 = lval5)
Demo on DB Fiddle:
t2id | t1id
:--- | ---:
1 | 1
2 | 3
3 | 4
4 | 5
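To actually enqueue the rows, the same query can feed an INSERT. A sketch, assuming T2ID is an identity column populated automatically, and with a NOT EXISTS guard (an addition not in the original answer) so the weekly run doesn't re-queue rows it already sent:

```sql
INSERT INTO Table2 (T1ID)
SELECT x.T1ID
FROM (
    SELECT t.T1ID, t.val2, t.val5,
           ROW_NUMBER() OVER(PARTITION BY [Account#] ORDER BY snapshotDate) rn,
           LAG(val2)    OVER(PARTITION BY [Account#] ORDER BY snapshotDate) lval2,
           LAG(val5)    OVER(PARTITION BY [Account#] ORDER BY snapshotDate) lval5
    FROM Table1 t
) x
WHERE (x.rn = 1                    -- first snapshot for the account
       OR NOT (x.val2 = x.lval2)   -- val2 changed
       OR NOT (x.val5 = x.lval5))  -- val5 changed
  AND NOT EXISTS (                 -- skip rows already queued
      SELECT 1 FROM Table2 q WHERE q.T1ID = x.T1ID);
```

The parentheses around the OR conditions matter: without them, AND binds tighter and the NOT EXISTS guard would apply only to the last branch.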

SQL transpose data

I need to transpose data that looks like the following. I don't need to use any aggregate functions, just want to transpose columns to rows.
Current view:
Name | Code1 | Code2 | Code3 | Pct1 | Pct2 | Pct3 | Amt1 | Amt2 | Amt3
Name1 123 124 125 50 25 25 1000 1500 1555
Name2 123 124 125 50 25 25 1222 1520 1600
What I Need:
AccountName | Code# | Pct | Amt
Name1 123 50 1000
Name1 124 25 1500
Name1 125 25 1555
Name2 123 50 1222
Name2 124 25 1520
Name2 125 25 1600
if this is possible, could you also include where I would place my joins if I need to use data in a different table?
I'm using SQL Server Management Studio 2014 and I don't have the permission to create tables
This is a neat trick using a table value constructor with CROSS APPLY:
SELECT [Name], ca.*
From myTable
CROSS APPLY (Values
(Code1, Pct1, Amt1),
(Code2, Pct2, Amt2),
(Code3, Pct3, Amt3)
) ca([Code#], [Pct], [Amt])
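Regarding where the joins would go: other tables can be joined after the CROSS APPLY, and the join condition can even reference the unpivoted columns. A sketch, where OtherTable and its columns are hypothetical stand-ins for whatever table you need:

```sql
SELECT t.[Name]     AS AccountName,
       ca.[Code#], ca.Pct, ca.Amt,
       o.SomeColumn                -- hypothetical column from the joined table
FROM myTable t
CROSS APPLY (VALUES
    (t.Code1, t.Pct1, t.Amt1),
    (t.Code2, t.Pct2, t.Amt2),
    (t.Code3, t.Pct3, t.Amt3)
) ca([Code#], Pct, Amt)
LEFT JOIN OtherTable o            -- hypothetical table
       ON o.Code = ca.[Code#];
```

No CREATE TABLE permission is needed for any of this; the VALUES constructor exists only for the duration of the query.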
select
Name,
case n when 1 then Code1 when 2 then Code2 when 3 then Code3 end as Code,
case n when 1 then Pct1 when 2 then Pct2 when 3 then Pct3 end as Pct,
case n when 1 then Amt1 when 2 then Amt2 when 3 then Amt3 end as Amt
from T cross join (values (1), (2), (3)) multiplier(n)
The basic idea is to triplicate the rows and then use case to pick out the correct values.

Finding the last duplicate row in (Oracle) SQL

We have a history table which is defined as follows:
--ID (pk) -----------Object ID--------------Work ID--------date------
1 1111 AAAA 1/1/2010
2 1111 AAAA 1/2/2010
3 2222 BBBB 1/1/2010
4 3333 CCCC 1/1/2010
5 1111 DDDD 1/3/2010
We need the latest (date-based NOT id-based) row PER Work ID. Note that an object ID can have multiple work id's and we need the latest for EACH work ID.
What we need as our result set:
ID (pk) -----------Object ID--------------Work ID--------date------
2 1111 AAAA 1/2/2010
3 2222 BBBB 1/1/2010
4 3333 CCCC 1/1/2010
5 1111 DDDD 1/3/2010
Thoughts/Ideas?
SELECT *
FROM (
SELECT h.*,
ROW_NUMBER() OVER (PARTITION BY workID ORDER BY date DESC) AS rn
FROM history
)
WHERE rn = 1
select *
from your_table a
where (a.date, a.work_id) in (
    select max(b.date), b.work_id
    from your_table b
    where a.work_id = b.work_id
    group by b.work_id
)
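On Oracle specifically, an aggregate-only alternative is MAX(...) KEEP (DENSE_RANK LAST ...), which avoids the self-join and the analytic subquery. A sketch, with column names assumed from the table layout above and "date" quoted because DATE is a reserved word:

```sql
SELECT work_id,
       MAX(id)        KEEP (DENSE_RANK LAST ORDER BY "date") AS id,
       MAX(object_id) KEEP (DENSE_RANK LAST ORDER BY "date") AS object_id,
       MAX("date")                                           AS last_date
FROM history
GROUP BY work_id;
```

This returns exactly one row per Work ID, taking each column's value from the latest-dated row; if two rows tie on the date, MAX breaks the tie per column, whereas the ROW_NUMBER() answer picks one whole row.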