SQL query to merge 2 tables with additional conditions? - sql

I have two tables with identical columns: user_id, name, age, date_added.
The user_id column may contain duplicate IDs.
I need to merge those two tables into one with the following condition:
If there are multiple records with an identical 'name' for the same user, keep only the LATEST (by date_added) record.
This script will be used with MSSQL 2005, but I would also appreciate a version that does not use ROW_NUMBER(). I need this script to reload a broken table once, so performance is not critical.
example:
table1:
1,'john',21,01/01/2010
1,'john',15,01/01/2005
1,'john',71,01/01/2001
table2:
1,'john',81,01/01/2007
1,'john',15,01/01/2005
1,'john',11,01/01/2008
result:
1,'john',21,01/01/2010
UPDATE:
I think that I've found my own solution. It is based on an answer for my previous question given by Larry Lustig and Joe Stefanelli.
with tmp2 as
(
SELECT * FROM table1
UNION
SELECT * FROM table2
)
SELECT * FROM tmp2 c1
WHERE (SELECT COUNT(*) FROM tmp2 c2
WHERE c2.user_id = c1.user_id AND
c2.name = c1.name AND
c2.date_added >= c1.date_added) <= 1
Could you please help me convert this query into one without the 'WITH' clause?

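Here's a variant of @Andomar's answer:
; with all_users as
(
select *
from table1 u1
union all
select *
from table2 u2
)
, ranker as (
-- rank 1 = latest date_added per (user_id, name); rank() keeps ties,
-- use row_number() if exactly one row per group is required
select *,
rank() over (partition by user_id, name order by date_added desc) as [r]
from all_users
)
select * from ranker where [r] = 1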

Just in the interests of giving a different approach...
WITH distinctlist
As (SELECT user_id,
name
FROM table1
UNION
SELECT user_id,
name
FROM table2)
SELECT C.*
FROM distinctlist d
CROSS APPLY (SELECT TOP 1 *
FROM (SELECT *
FROM (SELECT TOP 1 *
FROM table1
WHERE user_id = d.user_id
AND name = d.name
ORDER BY date_added DESC) T1
UNION ALL
SELECT *
FROM (SELECT TOP 1 *
FROM table2
WHERE user_id = d.user_id
AND name = d.name
ORDER BY date_added DESC) T2) T
ORDER BY date_added DESC) C

You could use not exists, like:
; with all_users as
(
select *
from table1 u1
union all
select *
from table2 u2
)
select *
from all_users u1
where not exists
(
select *
from all_users u2
where u1.user_id = u2.user_id
and u1.name = u2.name
and u1.date_added < u2.date_added
)
If the database doesn't support CTEs, expand all_users in the two places it is used.
P.S. If you only need user_id, name and the latest date_added, and no other columns, you could use an even simpler solution:
select user_id
, name
, MAX(date_added) as date_added
from (
select *
from table1 u1
union all
select *
from table2 u2
) sub
group by
user_id
, name
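Expanding all_users inline, as suggested above for databases without CTE support, might look like the following sketch. It runs the question's sample rows through SQLite via Python's sqlite3 module purely as a stand-in engine (an assumption; the question targets MSSQL 2005), stores the dates as ISO strings, and uses UNION rather than UNION ALL so that the row present in both tables is collapsed.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (user_id INTEGER, name TEXT, age INTEGER, date_added TEXT);
CREATE TABLE table2 (user_id INTEGER, name TEXT, age INTEGER, date_added TEXT);
INSERT INTO table1 VALUES
    (1, 'john', 21, '2010-01-01'),
    (1, 'john', 15, '2005-01-01'),
    (1, 'john', 71, '2001-01-01');
INSERT INTO table2 VALUES
    (1, 'john', 81, '2007-01-01'),
    (1, 'john', 15, '2005-01-01'),
    (1, 'john', 11, '2008-01-01');
""")

# The CTE body is repeated as a derived table in both places it was used.
LATEST_WITHOUT_CTE = """
SELECT u1.user_id, u1.name, u1.age, u1.date_added
FROM (SELECT * FROM table1 UNION SELECT * FROM table2) AS u1
WHERE NOT EXISTS (
    SELECT 1
    FROM (SELECT * FROM table1 UNION SELECT * FROM table2) AS u2
    WHERE u2.user_id = u1.user_id
      AND u2.name = u1.name
      AND u2.date_added > u1.date_added
)
"""

latest_rows = conn.execute(LATEST_WITHOUT_CTE).fetchall()
print(latest_rows)
```

On the sample data this leaves the single 2010 row, matching the expected result in the question. If two different rows shared the latest date for the same user and name, both would be returned.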

Related

SQL Case Condition On Inner Join

I am currently trying to join a table to itself to check if for one email there exist two or more Ids.
I am trying to join my table with itself on its email. I then wanted to query my table with a case condition saying if the count of the email in the nested query > 1 then select the latest modified record in the outer table.
SELECT *
FROM table1 -- outer table
WHERE email IN
(SELECT email, COUNT(*)
FROM table1 as src
INNER JOIN table1 ON src.Email = table1.Email AND src.Id = table1.id
GROUP BY src.Email)
How can I write a query to say if the count for the given email is greater than 1 then select the latest record from the outer table?
Why would you go through all that trouble? How about just selecting the last modified record:
select t1.*
from table1 t1
where t1.modified_dt = (select max(tt1.modified_dt)
from table1 tt1
where tt1.email = t1.email
);
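As a quick check of the correlated MAX subquery above, here is a minimal sketch run against SQLite through Python's sqlite3 module. The engine choice and the three sample rows are illustrative assumptions; only the table and column names come from the answer.

```python
import sqlite3

# In-memory SQLite database with three hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER, email TEXT, modified_dt TEXT);
INSERT INTO table1 VALUES
    (1, 'testemail@none.com',  '2019-12-01'),
    (2, 'testemail@none.com',  '2019-11-19'),
    (3, 'otheremail@none.com', '2019-12-15');
""")

# Keep a row only if its modified_dt is the maximum for its email.
LATEST_PER_EMAIL = """
SELECT t1.id, t1.email, t1.modified_dt
FROM table1 AS t1
WHERE t1.modified_dt = (SELECT MAX(tt1.modified_dt)
                        FROM table1 AS tt1
                        WHERE tt1.email = t1.email)
ORDER BY t1.id
"""

latest_per_email = conn.execute(LATEST_PER_EMAIL).fetchall()
print(latest_per_email)
```

Emails with a single row pass the filter trivially, so no separate COUNT check is needed. Two rows for the same email sharing the maximum modified_dt would both be returned.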
Another way to do it using window functions:
DECLARE @Tab TABLE (ID INT, Email VARCHAR(100), LastModified DATE)
INSERT @Tab
VALUES (1,'testemail@none.com','2019-12-01'),
(2,'testemail@none.com','2019-11-19'),
(3,'otheremail@none.com','2019-12-15')
SELECT *
FROM(
SELECT ROW_NUMBER() OVER(PARTITION BY t.Email ORDER BY t.LastModified DESC) rn, t.*
FROM @Tab t
) t2
WHERE t2.rn = 1
If by latest you mean the latest id number (the maximum number) then this should help you
With cte AS
(
SELECT email,
COUNT(id) OVER (PARTITION BY email) AS CountOfIDs,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY ID DESC) AS IdIndex
FROM table1
)
SELECT *
FROM cte
WHERE CountOfIDs > 1 AND IdIndex = 1

Optimization of a select containing UNION

I have an easy select:
define account_id = 7
select * from A where ACCOUNT_ID = &account_id
UNION
select * from B where ACCOUNT_ID = &account_id;
I would like to have account_id as input from another select and I did it this way:
select * from A where ACCOUNT_ID in(select account_id from ACCOUNTS where EMAIL like 'aa@aa.com') -- id 7 returned
UNION
select * from B where ACCOUNT_ID in(select account_id from ACCOUNTS where EMAIL like 'aa@aa.com')
How could this be optimized so that select account_id from ACCOUNTS where EMAIL like 'aa@aa.com' is called only once?
My first question is whether the union can be replaced by the union all. So, my first attempt would be to use exists and union all:
select a.*
from a
where exists (select 1
from accounts aa
where aa.account_id = a.account_id and
aa.email = 'aa@aa.com'
)
union all
select b.*
from b
where exists (select 1
from accounts aa
where aa.account_id = b.account_id and
aa.email = 'aa@aa.com'
);
For this structure, you want an index on accounts(account_id, email). The exists simply looks up the values in the index. This does require scanning a and b.
If the query is returning a handful of rows and you want to remove duplicates, then use union in place of union all. If it is returning a large set of rows (with no duplicates within each table and an easy way to identify duplicates across the tables), then you can instead do:
with cte_a as (
select a.*
from a
where exists (select 1
from accounts aa
where aa.account_id = a.account_id and
aa.email = 'aa@aa.com'
)
)
select cte_a.*
from cte_a
union all
select b.*
from b
where exists (select 1
from accounts aa
where aa.account_id = b.account_id and
aa.email = 'aa@aa.com'
) and
not exists (select 1
from cte_a
where cte_a.? = b.? -- whatever is needed to identify duplicates
);
This is where WITH comes in handy
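WITH ids AS (select account_id from ACCOUNTS where EMAIL like 'aa@aa.com')
select * from A where ACCOUNT_ID in (select account_id from ids)
UNION ALL
select * from B where ACCOUNT_ID in (select account_id from ids);
I also changed it to UNION ALL, which skips the duplicate-removal step and is therefore usually faster. Keep UNION if duplicate rows across A and B must be removed.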
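To see the shared-CTE idea run end to end, here is a small sketch using SQLite through Python's sqlite3 module. The engine, the lowercase table names, the note column and all rows are assumptions made up for illustration; the question appears to use Oracle-style substitution variables.

```python
import sqlite3

# In-memory SQLite database with hypothetical tables and rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (account_id INTEGER, email TEXT);
CREATE TABLE a (account_id INTEGER, note TEXT);
CREATE TABLE b (account_id INTEGER, note TEXT);
INSERT INTO accounts VALUES (7, 'aa@aa.com'), (8, 'bb@bb.com');
INSERT INTO a VALUES (7, 'a-row'), (8, 'other');
INSERT INTO b VALUES (7, 'b-row'), (9, 'unrelated');
""")

# The account lookup is written once in the CTE and referenced by both branches.
ROWS_FOR_EMAIL = """
WITH ids AS (SELECT account_id FROM accounts WHERE email = 'aa@aa.com')
SELECT 'a' AS src, account_id, note
FROM a
WHERE account_id IN (SELECT account_id FROM ids)
UNION ALL
SELECT 'b', account_id, note
FROM b
WHERE account_id IN (SELECT account_id FROM ids)
ORDER BY src
"""

rows_for_email = conn.execute(ROWS_FOR_EMAIL).fetchall()
print(rows_for_email)
```

Writing the lookup once removes the duplication from the query text. Whether the engine also evaluates it only once depends on the optimizer, so it is worth checking the execution plan on the real database.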

SQL Server - Exclude Records from other tables

I used the search function which brought me to the following solution.
Starting Point is the following: I have one table A which stores all data.
From that table I select a certain amount of records and store it in table B.
In a new statement I want to select new records from table A that do not appear in table B and store them in table C. I tried to solve this with an AND ... NOT IN condition.
But I still receive records in table C that are in table B.
Important: I can only work with select statements, each statement needs to start with select as well.
Does anybody have an idea where the problem in the following statement could be:
Select *
From
(Select TOP 10000 *
FROM [table_A]
WHERE Email like '%@domain_A%'
AND Id NOT IN (SELECT Id
FROM [table_B]))
Union
(Select TOP 7500 *
FROM [table_A]
WHERE Email like '%@domain_B%'
AND Id NOT IN (SELECT Id
FROM [table_B]))
Union
(SELECT TOP 5000 *
FROM [table_A]
WHERE Email like '%@domain_C%'
AND Id NOT IN (SELECT Id
FROM [table_B]))
Try NOT EXISTS instead of NOT IN
SELECT
*
FROM TableA A
WHERE NOT EXISTS
(
SELECT 1 FROM TableB WHERE Id = A.Id
)
So basically the idea here is to select everything from table A that doesn't exist in table B and insert all of that into table C?
INSERT INTO Table_C
SELECT a.column1, a.column2,......
FROM [table_A] a
LEFT JOIN [table_B] b ON a.id = b.ID
WHERE a.Email like '%@domain_A%' AND b.id IS NULL
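The NOT EXISTS and LEFT JOIN ... IS NULL forms above are both anti-joins and should agree with each other. NOT IN can differ from them when the subquery returns a NULL. The sketch below illustrates this with SQLite through Python's sqlite3 module; the engine, the rows, and in particular the NULL Id in table_B are assumptions for illustration, since the question does not say whether table_B contains NULLs.

```python
import sqlite3

# In-memory SQLite database; table_b deliberately contains a NULL id (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (id INTEGER, email TEXT);
CREATE TABLE table_b (id INTEGER);
INSERT INTO table_a VALUES
    (1, 'x@domain_a.com'), (2, 'y@domain_a.com'),
    (3, 'z@domain_a.com'), (4, 'w@domain_a.com');
INSERT INTO table_b VALUES (1), (2), (NULL);
""")


def ids(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]


# NOT IN: comparing against a NULL yields unknown, so no row qualifies.
not_in = ids("""
SELECT id FROM table_a
WHERE id NOT IN (SELECT id FROM table_b)
ORDER BY id
""")

# NOT EXISTS: only asks whether a matching row exists, so NULLs are harmless.
not_exists = ids("""
SELECT a.id FROM table_a AS a
WHERE NOT EXISTS (SELECT 1 FROM table_b AS b WHERE b.id = a.id)
ORDER BY a.id
""")

# LEFT JOIN ... IS NULL: keeps the rows of table_a that found no partner.
left_join = ids("""
SELECT a.id FROM table_a AS a
LEFT JOIN table_b AS b ON a.id = b.id
WHERE b.id IS NULL
ORDER BY a.id
""")

print(not_in, not_exists, left_join)
```

With these rows NOT IN returns nothing, while the other two forms return ids 3 and 4. When table_B has no NULL ids, all three forms return the same rows.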
Thank you guys all for your feedback, from which I learned a lot.
I was able to fix the statement with your help. Below is the statement, which now works and gives the desired results:
Select Id
From
(Select TOP 10000 * FROM Table_A
WHERE Email like '%@domain_a%'
AND Id NOT IN (SELECT Id
FROM Table_B)
order by No desc) t1
Union
Select Id
From
(Select TOP 7500 * FROM Table_A
WHERE Email like '%@domain_b%'
AND Id NOT IN (SELECT Id
FROM Table_B)
order by No desc) t2
Union
Select Id
From
(SELECT TOP 5000 * FROM Table_A
WHERE Email like '%@domain_c%'
AND Id NOT IN (SELECT Id
FROM Table_B)
order by No desc) t3

Subqueries with different universes

I have an Oracle DB and I need to run a select with sub selects, however, none of them share the same table universe, therefore, I would need to do something like this:
SELECT (
SELECT COUNT(*)
FROM user_table
) AS tot_user,
(
SELECT COUNT(*)
FROM cat_table
) AS tot_cat,
(
SELECT COUNT(*)
FROM course_table
) AS tot_course
I know this is possible in other databases, but I need something like this for Oracle.
Can someone help?
To make this work in Oracle, add FROM dual to the end:
SELECT (SELECT COUNT(*)
FROM user_table
) AS tot_user,
(SELECT COUNT(*)
FROM cat_table
) AS tot_cat,
(SELECT COUNT(*)
FROM course_table
) AS tot_course
FROM dual;
A database independent way of writing the query is:
select tot_user, tot_cat, tot_course
from (SELECT COUNT(*) as tot_user
FROM user_table
) u cross join
(SELECT COUNT(*) as tot_cat
FROM cat_table
) c cross join
(SELECT COUNT(*) as tot_course
FROM course_table
) ct;
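The cross-join form can be tried on any engine. The sketch below uses SQLite through Python's sqlite3 module as one example; the engine choice, the single id column and the row counts are assumptions for illustration.

```python
import sqlite3

# In-memory SQLite database with hypothetical contents: 3 users, 2 categories, 1 course.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_table (id INTEGER);
CREATE TABLE cat_table (id INTEGER);
CREATE TABLE course_table (id INTEGER);
INSERT INTO user_table VALUES (1), (2), (3);
INSERT INTO cat_table VALUES (1), (2);
INSERT INTO course_table VALUES (1);
""")

# Each derived table yields exactly one row, so the cross join also yields one row.
TOTALS = """
SELECT tot_user, tot_cat, tot_course
FROM (SELECT COUNT(*) AS tot_user FROM user_table) u
CROSS JOIN (SELECT COUNT(*) AS tot_cat FROM cat_table) c
CROSS JOIN (SELECT COUNT(*) AS tot_course FROM course_table) ct
"""

totals = conn.execute(TOTALS).fetchone()
print(totals)
```

Because COUNT(*) without GROUP BY always returns one row, even for an empty table, the cross join cannot multiply or drop rows here.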

Logical AND between table elements in T-SQL

I have n tables, all with the same fields: Username and Value. The same Username can have multiple rows in each table, but the combination Username/Value is unique within each one.
I want to join the tables into a single one which contains all the users who appear on all the tables with all the different (Username/Value) pairs.
Example
Table A: {(User1,Value1);(User1,Value2);(User2,Value2);(User3,Value4)}
Table B: {(User1,Value4);(User3,Value5)}
Table C: {(User1,Value5);(User1,Value2);(User2,Value7);(User3,Value8)}
Desired output
Table D: {(User1,Value1);(User1,Value2);(User1,Value4);(User1,Value5);(User3,Value4);(User3,Value5);(User3,Value8)}
Now I'm doing multiple joins (using perl) like this
SELECT *
INTO $target_table
FROM (SELECT *
FROM $table1
WHERE bname IN (SELECT DISTINCT bname FROM $table2)
UNION
SELECT *
FROM $table2
WHERE bname IN (SELECT DISTINCT bname FROM $table1)
) UN
and then doing the same join between a third table and target_table and so on, but I think there should be a better way.
Any hints?
You can use UNION for this:
SELECT username, value
FROM $table1
UNION
SELECT username, value
FROM $table2
...
SELECT username, value
FROM $tablex
SQL Fiddle Demo
This will return you distinct records. If you are interested in duplicates, use UNION ALL.
Given your edits, it appears you only want to return records if the user is in all the tables.
Breaking that down, you need to do a few things. First, combine all your records together again, but this time denote which table each one comes from. Then you need to know the count of tables each user is in. Finally you need to check that number against the overall number of tables.
Here's one way using a few CTEs:
WITH CTE AS (
SELECT username, value, 1 AS tbl
FROM t1
UNION
SELECT username, value, 2 AS tbl
FROM t2
UNION
SELECT username, value, 3 AS tbl
FROM t3
),
CTECnt AS (
SELECT username, COUNT(DISTINCT tbl) tblCnt
FROM CTE
GROUP BY username
),
CTEMaxCnt AS (
SELECT COUNT(DISTINCT tbl) MaxCnt
FROM CTE
)
SELECT C.username, C.value
FROM CTE C
JOIN CTECnt C2 ON C.username = C2.username
JOIN CTEMaxCnt C3 ON C2.tblCnt = C3.MaxCnt
Another SQL Fiddle Demo
With Combined As
(
Select 'A' As TableName, Username, Value
From TableA
Union All
Select 'B', Username, Value
From TableB
Union All
Select 'C', Username, Value
From TableC
)
Select C.Username, C.Value
From Combined As C
Join (
Select C1.Username
From Combined As C1
Group By C1.Username
Having Count(Distinct C1.TableName) = 3
) As Z
On Z.Username = C.Username
Group By C.Username, C.Value
SQL Fiddle version
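To check the last approach against the example from the question, here is a sketch that loads Tables A, B and C into SQLite through Python's sqlite3 module and runs the combined query. The engine choice and the snake_case table names are assumptions; the data is the question's example.

```python
import sqlite3

# In-memory SQLite database holding the three example tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (username TEXT, value TEXT);
CREATE TABLE table_b (username TEXT, value TEXT);
CREATE TABLE table_c (username TEXT, value TEXT);
INSERT INTO table_a VALUES
    ('User1', 'Value1'), ('User1', 'Value2'), ('User2', 'Value2'), ('User3', 'Value4');
INSERT INTO table_b VALUES
    ('User1', 'Value4'), ('User3', 'Value5');
INSERT INTO table_c VALUES
    ('User1', 'Value5'), ('User1', 'Value2'), ('User2', 'Value7'), ('User3', 'Value8');
""")

# Tag each row with its source table, keep users seen in all 3 sources,
# then collapse repeated (username, value) pairs with GROUP BY.
USERS_IN_ALL_TABLES = """
WITH combined AS (
    SELECT 'A' AS table_name, username, value FROM table_a
    UNION ALL
    SELECT 'B', username, value FROM table_b
    UNION ALL
    SELECT 'C', username, value FROM table_c
)
SELECT c.username, c.value
FROM combined AS c
JOIN (
    SELECT username
    FROM combined
    GROUP BY username
    HAVING COUNT(DISTINCT table_name) = 3
) AS z ON z.username = c.username
GROUP BY c.username, c.value
ORDER BY c.username, c.value
"""

table_d = conn.execute(USERS_IN_ALL_TABLES).fetchall()
print(table_d)
```

The result matches the desired Table D: User2 is dropped because it is missing from Table B, and (User1,Value2), which occurs in both A and C, appears once. The literal 3 has to be kept in step with the number of tables being combined.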