SQL Query - ID Join, Just Duplicates - sql

I am working with Oracle SQL. I have two tables. One has ItemID and DatePurchased and the other has ItemID, CustomerID. I'm trying to join the tables so that I can see only those customers with multiple items.
In other words, if I had:
TABLE 1
ItemID---DatePurchased
1 MAR15
2 JUN10
3 APR02
and
TABLE 2
ItemID---CustomerID
1 1
2 1
3 2
I would want this returned:
TABLE 3
ItemID--DatePurchased--CustomerID
1 MAR15 1
2 JUN10 1
(Customer 2 is left out because he only has one item (ItemID=3)).
Any ideas on how to do this in SQL?

select ItemID, DatePurchased, CustomerID
from
(
select
T1.ItemID, T1.DatePurchased, T2.CustomerID,
count(*) over (partition by T2.CustomerId) as ItemCnt
from TABLE2 T2
join TABLE1 T1 on T1.ItemID = T2.ItemID
) dt
where ItemCnt > 1

select
T2.ItemID, T2.CustomerID, T1.DatePurchased
from TABLE2 T2
inner join TABLE1 T1 on T1.ItemID = T2.ItemID
where
T2.CustomerID in
(
select TT.CustomerID
from TABLE2 TT
group by TT.CustomerID
having count(*) > 1
)
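For anyone who wants to try the GROUP BY / HAVING approach against the sample data from the question, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for Oracle (that is an assumption made purely so it runs anywhere); the table and column names are the ones from the question.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE1 (ItemID INTEGER, DatePurchased TEXT);
    CREATE TABLE TABLE2 (ItemID INTEGER, CustomerID INTEGER);
    INSERT INTO TABLE1 VALUES (1, 'MAR15'), (2, 'JUN10'), (3, 'APR02');
    INSERT INTO TABLE2 VALUES (1, 1), (2, 1), (3, 2);
""")

# Keep only items whose customer appears more than once in TABLE2.
rows = conn.execute("""
    SELECT T2.ItemID, T1.DatePurchased, T2.CustomerID
    FROM TABLE2 T2
    JOIN TABLE1 T1 ON T1.ItemID = T2.ItemID
    WHERE T2.CustomerID IN (
        SELECT CustomerID
        FROM TABLE2
        GROUP BY CustomerID
        HAVING COUNT(*) > 1
    )
    ORDER BY T2.ItemID
""").fetchall()

print(rows)  # [(1, 'MAR15', 1), (2, 'JUN10', 1)]
```

Customer 2 drops out because the subquery only returns customers with more than one row in TABLE2.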


How to randomly update a column if the number of records between the 2 join tables are not equal in Postgres sql

Table 1 (5 records):
id | name | date      | units
1  | abc  | 3/16/2021 |
1  | abc  | 3/17/2021 |
1  | abc  | 3/18/2021 |
1  | abc  | 3/19/2021 |
1  | abc  | 3/20/2021 |
Table 2 (3 records):
id | name | startdate | enddate    | units
1  | abc  | 3/16/2021 | 03/23/2021 | 2
1  | abc  | 3/16/2021 | 03/23/2021 | 2
1  | abc  | 3/16/2021 | 03/23/2021 | 2
Below is the join condition:
select * from Table1 a right join Table2 b on
(a.id = b.id) and (a.name = b.name) and (a.date between b.startdate and b.enddate)
I am trying to update the units columns in Table 1 from Table 2. My requirement is since there are 3 records in Table 2, only 3 records in Table 1 should be updated based on the above join condition. It can be random. But the number of records updated should not go above 3.
I tried doing this.
with e as
(select *,
row_number() over(partition by a.id
order by id) as rn
from Table1 a right join Table 2 b on (a.id = b.id) and (a.name = b.name) and (a.date between b.startdate and b.enddate)
)
update table1
set units = e.units
from e
where e.rn = 1
However, in this case all 5 records get updated. How do I resolve this? Any help is appreciated. Thank you.
Join the tables together. Then choose one row from table2 for each row in table1 and do the update:
update table1 t1
set units = tt1.units
from (select distinct on (t1.id, t1.name, t1.date)
t1.id, t1.name, t1.date, t2.units
from table1 t1 join
table2 t2
on t2.id = t1.id and t2.name = t1.name and
t1.date between t2.startdate and t2.enddate
order by t1.id, t1.name, t1.date, random()
) tt1
where tt1.name = t1.name and
tt1.id = t1.id and
tt1.date = t1.date;

Merge 2 tables without repeat data

I need to merge the below tables.
Table 1
UserID |TopicName
1 |Topic1
1 |Topic2
2 |Topic1
2 |Topic2
2 |Topic3
Table2
UserID |Levelname
1 |level1
1 |level2
1 |level3
1 |level4
2 |level1
Output
UserID |TopicName|LevelName
1 |Topic1 |Level1
1 |Topic2 |Level2
1 | |Level3
1 | |Level4
It looks like you want to match rows that have the same userid, based on their position. You could enumerate the rows in a subquery, then left join.
For this to work consistently, you need a column that defines the position of each row in each table. Since nothing in your data shows such a column, I ordered by the other column of each table (topicname and levelname), but you might want to change that.
select t2.userid, t1.topicname, t2.levelname
from (
select t2.*, row_number() over(partition by userid order by levelname) rn
from table2 t2
) t2
left join (
select t1.*, row_number() over(partition by userid order by topicname) rn
from table1 t1
) t1 on t1.userid = t2.userid and t1.rn = t2.rn
You can add a where clause at the end of the query to filter on a given userid, as shown in your expected results:
where t2.userid = 1
If there might be "missing" rows in both tables, then a full join is a better pick:
select coalesce(t1.userid, t2.userid) userid, t1.topicname, t2.levelname
from (
select t2.*, row_number() over(partition by userid order by levelname) rn
from table2 t2
) t2
full join (
select t1.*, row_number() over(partition by userid order by topicname) rn
from table1 t1
) t1 on t1.userid = t2.userid and t1.rn = t2.rn
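Here is a small runnable sketch of the "number the rows, then join on the number" idea, using the sample data from the question. It runs on SQLite through Python's sqlite3 module, which is an assumption on my part (the question does not name a database) and needs SQLite 3.25 or later for window functions. The aliases `l` and `t` are only illustrative. It shows the left join variant filtered on userid 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (userid INTEGER, topicname TEXT);
    CREATE TABLE table2 (userid INTEGER, levelname TEXT);
    INSERT INTO table1 VALUES
        (1, 'Topic1'), (1, 'Topic2'),
        (2, 'Topic1'), (2, 'Topic2'), (2, 'Topic3');
    INSERT INTO table2 VALUES
        (1, 'level1'), (1, 'level2'), (1, 'level3'), (1, 'level4'),
        (2, 'level1');
""")

# Number the rows per user in each table, then pair them up by position.
rows = conn.execute("""
    SELECT l.userid, t.topicname, l.levelname
    FROM (
        SELECT userid, levelname,
               ROW_NUMBER() OVER (PARTITION BY userid ORDER BY levelname) AS rn
        FROM table2
    ) l
    LEFT JOIN (
        SELECT userid, topicname,
               ROW_NUMBER() OVER (PARTITION BY userid ORDER BY topicname) AS rn
        FROM table1
    ) t ON t.userid = l.userid AND t.rn = l.rn
    WHERE l.userid = 1
    ORDER BY l.rn
""").fetchall()

for row in rows:
    print(row)
```

Levels 3 and 4 come back with a NULL (None in Python) topic, because user 1 only has two topics to pair them with.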

comparing sum of 2 column values in one table with another column value in second table sql server

I have two tables...
table1 ( id, item, price ) values:
id | item | price
-------------
10 | book | 20
20 | copy | 30
30 | pen | 10
table2 ( id, item, price ) values:
id | item | price
-------------
10 | book | 20
10 | book | 30
Now, if I do not have a record with id 10, item book and price (20+30) in table1, then I want to insert that row with the sum (20+30) into a new table.
If I haven't misunderstood your requirements, this should do it:
SELECT T2.ID, T2.ITEM,T2.SUMPRICE FROM
(SELECT ID, ITEM, SUM(PRICE) AS SUMPRICE FROM table1 GROUP BY ID, ITEM) AS T1
INNER JOIN
(SELECT ID, ITEM, SUM(PRICE) AS SUMPRICE FROM table2 GROUP BY ID, ITEM) AS T2
ON T1.ID = T2.id AND T1.item = T2.item WHERE T1.SUMPRICE <> T2.SUMPRICE
If your 3rd table is already created you could just use an INSERT INTO SELECT statement. Otherwise you could use a SELECT INTO like this:
SELECT T2.ID, T2.ITEM,T2.SUMPRICE as price into table3 FROM
(SELECT ID, ITEM, SUM(PRICE) AS SUMPRICE FROM table1 GROUP BY ID, ITEM) AS
T1
INNER JOIN
(SELECT ID, ITEM, SUM(PRICE) AS SUMPRICE FROM table2 GROUP BY ID, ITEM) AS
T2
ON T1.ID = T2.id AND T1.item = T2.item WHERE T1.SUMPRICE <> T2.SUMPRICE
Hope it helps!
EDIT 1
In case you want to get all the unmatched rows, that is:
Rows from table 2 where the id, item and price don't all match simultaneously with a given row in table 1
Rows from table 2 where none of their columns match a given row in table 1
You could use the EXCEPT operator, for example:
(SELECT ID, ITEM, SUM(PRICE) AS SUMPRICE FROM table2 GROUP BY ID, ITEM)
EXCEPT
(SELECT ID, ITEM, SUM(PRICE) AS SUMPRICE FROM table1 GROUP BY ID, ITEM)
This returns:
ID ITEM SUMPRICE
---- -------------------- -----------
10 book 50
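If you want to check the EXCEPT version and the insert into a new table in one go, here is a minimal sketch on the sample data. It uses SQLite through Python's sqlite3 module instead of SQL Server (an assumption so that it is self-contained), and `table3` is a hypothetical name for the new table. Note that SQLite does not accept parentheses around the two halves of the EXCEPT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, item TEXT, price INTEGER);
    CREATE TABLE table2 (id INTEGER, item TEXT, price INTEGER);
    CREATE TABLE table3 (id INTEGER, item TEXT, price INTEGER);  -- hypothetical target
    INSERT INTO table1 VALUES (10, 'book', 20), (20, 'copy', 30), (30, 'pen', 10);
    INSERT INTO table2 VALUES (10, 'book', 20), (10, 'book', 30);
""")

# Summed rows of table2 that have no identical (id, item, sum) row in table1.
conn.execute("""
    INSERT INTO table3 (id, item, price)
    SELECT id, item, SUM(price) FROM table2 GROUP BY id, item
    EXCEPT
    SELECT id, item, SUM(price) FROM table1 GROUP BY id, item
""")

rows = conn.execute("SELECT id, item, price FROM table3").fetchall()
print(rows)  # [(10, 'book', 50)]
```

The (10, book, 50) row is inserted because table1 only holds (10, book, 20), so the summed row has no exact match there.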
Try the following
SELECT T2.id,T2.item,T2.Price as Table2_Price,T1.Price as Table1_Price FROM (
SELECT id,Item,Sum(Price) as Price
FROM Table2
Group BY id,Item ) AS T2 LEFT JOIN Table1 AS T1
ON T1.Item = T2.Item and T1.ID = T2.ID
After reading your comments (when the sum of these two does not equal the one in the other table, it should return the sum from table 2), you are asking for the following logic:
SELECT T2.id,T2.Item, CASE WHEN T1.Price IS NULL THEN T2.Price
WHEN T2.Price <> T1.Price THEN T2.Price
ELSE T1.Price END as Price FROM (
SELECT id,Item,Sum(Price) as Price
FROM Table2
Group BY id,Item ) AS T2 LEFT JOIN Table1 AS T1
ON T1.Item = T2.Item and T1.ID = T2.ID
But this logic is redundant: if Sum(Table2_Price) <> Table1_Price you select Sum(Table2_Price), and if Sum(Table2_Price) = Table1_Price you select Table1_Price, which is the same value. So you always end up choosing Sum(Table2_Price).

Waterfall join conditions

I have two tables similar to:
Table 1 --unique ID's
ID Date
1 3/8/2017
2 3/8/2017
3 3/8/2017
Table 2
ID Date SourceID
1 3/8/2017 1
1 3/8/2017 2
1 3/8/2017 3
2 3/8/2017 2
3 3/8/2017 1
3 3/8/2017 3
And I want to write a query that has a result like:
Result
ID SourceID
1 2
2 2
3 1
Where the source ID ordering should be 2, 1, 3
I have:
select Table1.ID
, COALESCE(Join1.SourceID, Join2.SourceID, Join3.SourceID) as SourceID
from Table1
left outer join Table2 Join1
on Table1.date = Join1.date
and Table1.ID = Join1.ID
and Join1.SourceID = 2
left outer join Table2 Join2
on Table1.date = Join2.date
and Table1.ID = Join2.ID
and Join2.SourceID = 1
and Join1.SourceID is null
left outer join Table2 Join3
on Table1.date = Join3.date
and Table1.ID = Join3.ID
and Join3.SourceID = 3
and Join1.SourceID is null
and Join2.SourceID is null
But this currently just keeps the records where sourceid = 2 and does not add in the other sourceid's.
Thanks in advance for any help. Let me know if you need any clarification. Using SQL-Server. I only need a small, fixed number of sources, so I am avoiding using a cursor.
This is a prioritization query. I would do it using outer apply:
select t1.*, t2.sourceId
from table1 t1 outer apply
(select top 1 t2.*
from table2 t2
where t2.id = t1.id and t2.date = t1.date
order by (case t2.sourceid when 2 then 1 when 1 then 2 when 3 then 3 end)
) t2;
Note: For readability, you can simplify the order by to:
order by charindex(cast(t2.sourceId as varchar(255)), '2,1,3')
If you are uncomfortable with outer apply, you can do the same thing with a single join:
select t1.*, t2.sourceId
from table1 t1 join
(select t2.*,
row_number() over (partition by id, date
order by (case t2.sourceid when 2 then 1 when 1 then 2 when 3 then 3 end)
) as seqnum
from table2 t2
) t2
on t2.id = t1.id and t2.date = t1.date and t2.seqnum = 1;
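To see the prioritization on the sample data, here is a rough sketch that runs on SQLite through Python's sqlite3 module. That is an assumption made for portability: SQLite has no OUTER APPLY or TOP, so a correlated scalar subquery with LIMIT 1 plays the same role here. The dates are stored as ISO strings, which is also my choice for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID INTEGER, Date TEXT);
    CREATE TABLE Table2 (ID INTEGER, Date TEXT, SourceID INTEGER);
    INSERT INTO Table1 VALUES
        (1, '2017-03-08'), (2, '2017-03-08'), (3, '2017-03-08');
    INSERT INTO Table2 VALUES
        (1, '2017-03-08', 1), (1, '2017-03-08', 2), (1, '2017-03-08', 3),
        (2, '2017-03-08', 2),
        (3, '2017-03-08', 1), (3, '2017-03-08', 3);
""")

# For each Table1 row, pick the matching source with the best priority (2, 1, 3).
rows = conn.execute("""
    SELECT t1.ID,
           (SELECT t2.SourceID
            FROM Table2 t2
            WHERE t2.ID = t1.ID AND t2.Date = t1.Date
            ORDER BY CASE t2.SourceID WHEN 2 THEN 1 WHEN 1 THEN 2 WHEN 3 THEN 3 END
            LIMIT 1) AS SourceID
    FROM Table1 t1
    ORDER BY t1.ID
""").fetchall()

print(rows)  # [(1, 2), (2, 2), (3, 1)]
```

ID 3 has no source 2, so it falls through to source 1, which matches the result asked for in the question.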

SQL: 3 self-joins and then join them together

I have 2 tables to join in a specific way. I think my query is right, but not sure.
select t1.userID, t3.Answer Unit, t5.Answer Demo
FROM
table1 t1
inner join (select * from table2) t3 ON t1.userID = t3.userID
inner join (select * from table2) t5 ON t1.userID = t5.userID
where
NOT EXISTS (SELECT * FROM table1 t2 WHERE t2.userID = t1.userID AND t2.date > t1.date)
and NOT EXISTS (SELECT * FROM table2 t4 WHERE t4.userID = t3.userID and t4.counter > t3.counter)
and NOT EXISTS (SELECT * FROM table2 t6 WHERE t6.userID = t5.userID and t6.counter > t5.counter)
and t1.date_submitted >'1/1/2009'
and t3.question = Unit
and t5.question = Demo
order by
t1.userID
From table1 I want distinct userID where date > 1/1/2009
table1
userID Date
1 1/2/2009
1 1/2/2009
2 1/2/2009
3 1/2/2009
4 1/1/2008
So The result I want from table1 should be this:
userID
1
2
3
I then want to join this on userID with table2, which looks like this:
table2
userID question answer counter
1 Unit A 1
1 Demo x 1
1 Prod 100 1
2 Unit B 1
2 Demo Y 1
3 Prod 100 1
4 Unit A 1
1 Unit B 2
1 Demo x 2
1 Prod 100 2
2 Unit B 2
2 Demo Z 2
3 Prod 100 2
4 Unit A 2
I want to join table1 with table2 with this result:
userID Unit Demo
1 B X
2 B Z
In other words,
select distinct userID from table2 where question = Unit for the highest counter
and then
select distinct userID from table2 where question = Demo for the highest counter.
I think what I've done is 3 self-joins then joined those 3 together.
Do you think it's right?
SELECT du.userID, unit.answer, demo.answer
FROM (
SELECT DISTINCT userID
FROM table1
WHERE date > '1/1/2009'
) du
LEFT JOIN
table2 unit
ON (userID, question, counter) IN
(
SELECT du.userID, 'Unit', MAX(counter)
FROM table2 td
WHERE userID = du.userID
AND question = 'Unit'
)
LEFT JOIN
table2 demo
ON (userID, question, counter) IN
(
SELECT du.userID, 'Demo', MAX(counter)
FROM table2 td
WHERE userID = du.userID
AND question = 'Demo'
)
Having an index on table2 (userID, question, counter) will greatly improve this query.
Since you mentioned SQL Server 2005, the following will be easier and more efficient:
SELECT du.userID,
(
SELECT TOP 1 answer
FROM table2 ti
WHERE ti.userID = du.userID
AND ti.question = 'Unit'
ORDER BY
counter DESC
) AS unit_answer,
(
SELECT TOP 1 answer
FROM table2 ti
WHERE ti.userID = du.userID
AND ti.question = 'Demo'
ORDER BY
counter DESC
) AS demo_answer
FROM (
SELECT DISTINCT userID
FROM table1
WHERE date > '1/1/2009'
) du
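A quick way to try the "latest answer per question" subqueries on the sample data is the sketch below. It runs on SQLite through Python's sqlite3 module rather than SQL Server 2005 (an assumption for portability), so `LIMIT 1` replaces `TOP 1`, and the dates are stored as ISO strings so that the comparison sorts correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (userID INTEGER, date TEXT);
    CREATE TABLE table2 (userID INTEGER, question TEXT, answer TEXT, counter INTEGER);
    INSERT INTO table1 VALUES
        (1, '2009-01-02'), (1, '2009-01-02'), (2, '2009-01-02'),
        (3, '2009-01-02'), (4, '2008-01-01');
    INSERT INTO table2 VALUES
        (1, 'Unit', 'A', 1), (1, 'Demo', 'x', 1), (1, 'Prod', '100', 1),
        (2, 'Unit', 'B', 1), (2, 'Demo', 'Y', 1), (3, 'Prod', '100', 1),
        (4, 'Unit', 'A', 1),
        (1, 'Unit', 'B', 2), (1, 'Demo', 'x', 2), (1, 'Prod', '100', 2),
        (2, 'Unit', 'B', 2), (2, 'Demo', 'Z', 2), (3, 'Prod', '100', 2),
        (4, 'Unit', 'A', 2);
""")

# One row per qualifying user, with the highest-counter answer per question.
rows = conn.execute("""
    SELECT du.userID,
           (SELECT answer FROM table2 ti
            WHERE ti.userID = du.userID AND ti.question = 'Unit'
            ORDER BY counter DESC LIMIT 1) AS unit_answer,
           (SELECT answer FROM table2 ti
            WHERE ti.userID = du.userID AND ti.question = 'Demo'
            ORDER BY counter DESC LIMIT 1) AS demo_answer
    FROM (
        SELECT DISTINCT userID
        FROM table1
        WHERE date > '2009-01-01'
    ) du
    ORDER BY du.userID
""").fetchall()

print(rows)  # [(1, 'B', 'x'), (2, 'B', 'Z'), (3, None, None)]
```

User 3 comes back with NULLs because it has neither a Unit nor a Demo row. If you only want users that have both answers, as in the expected result in the question, you would need to filter those rows out.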
To aggregate:
SELECT answer, COUNT(*)
FROM (
SELECT DISTINCT userID
FROM table1
WHERE date > '1/1/2009'
) du
JOIN table2 t2
ON t2.userID = du.userID
AND t2.question = 'Unit'
GROUP BY
answer