Query to find field duplicate between 9 tables - sql

I have 9 tables, each with values like this:
Level_1_tab
Code Name
ae1 hdgdgd
ae2 dhdh
level_2_tab
code Name
2 jfjfjf
3 fkfjfjf
The same goes for level_3_tab, level_4_tab, level_5_tab and so on up to level_9_tab.
I am inserting the code column into a new table and checking for duplicates.
SELECT
code, name, COUNT(*)
FROM
new_table
GROUP BY
code, name
HAVING
COUNT(*) > 1;
Can I write a query that compares the code column across these 9 tables and checks for duplicates? All rows with duplicate code values should be retrieved.

You can do a union all of the 9 tables and run your same query on that.
select code, name, count(*) as cnt from
(select code, name from level_1_tab union all
select code, name from level_2_tab union all
select code, name from level_3_tab union all
select code, name from level_4_tab
-- ..... continue with union all through level_9_tab
) as all_levels
group by code, name
having count(*) > 1;
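As a rough, runnable illustration of the union all approach, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in database. The sample rows, and the use of only three of the nine tables, are assumptions made to keep it short.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE level_1_tab (code TEXT, name TEXT);
    CREATE TABLE level_2_tab (code TEXT, name TEXT);
    CREATE TABLE level_3_tab (code TEXT, name TEXT);
    -- Hypothetical sample rows, not taken from the question.
    INSERT INTO level_1_tab VALUES ('ae1', 'alpha'), ('ae2', 'beta');
    INSERT INTO level_2_tab VALUES ('2', 'gamma'), ('ae1', 'alpha');
    INSERT INTO level_3_tab VALUES ('3', 'delta'), ('ae2', 'beta'), ('ae2', 'beta');
""")

# Stack the tables with UNION ALL (which keeps duplicates), then count
# how often each (code, name) pair occurs across all of them.
duplicates = conn.execute("""
    SELECT code, name, COUNT(*) AS cnt
    FROM (
        SELECT code, name FROM level_1_tab
        UNION ALL
        SELECT code, name FROM level_2_tab
        UNION ALL
        SELECT code, name FROM level_3_tab
    ) AS all_levels
    GROUP BY code, name
    HAVING COUNT(*) > 1
    ORDER BY code
""").fetchall()

print(duplicates)
```

Note that a plain union would remove the repeated rows before they could be counted, which is why union all is needed here.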

Everyone's suggestion of union all is a great way to build the initial table in which to look for duplicates. But you say you already have a temp table with all of the values from the 9 tables, which is perfect and another good way of doing it if your dataset isn't huge.
The only step you're missing to get the actual duplicate rows is to use your duplicate query above to re-query your temp table and return the rows you want. A good way of doing this is a common table expression (CTE), which lets you build a query on top of another query without another temp table. So use a CTE and join back to your temp table.
;WITH CommonTableExpression AS (
SELECT
code, name, COUNT(*) AS RecordCount
FROM
new_table
GROUP BY
code, name
HAVING
COUNT(*) > 1
)
SELECT t.*
FROM
new_table t
INNER JOIN CommonTableExpression c
ON t.code = c.code
AND t.name = c.name
If you want to check each of the 9 tables independently rather than your temp table, place the duplicates into another temp table and join each table to it.
SELECT
code, name, COUNT(*) AS RecordCount
INTO #Duplicates
FROM
new_table
GROUP BY
code, name
HAVING
COUNT(*) > 1;
SELECT
l.*
FROM
level_1_tab l
INNER JOIN #Duplicates d
ON l.Code = d.Code
AND l.name = d.name;
Since everyone loves union all, here is a way to do it without temp tables, using CTEs and a series of union alls. I wonder which would be the more optimized query, though.
;WITH cteAllCodeValues AS (
select code, name from level_1_tab union all
select code, name from level_2_tab union all
select code, name from level_3_tab union all
select code, name from level_4_tab
-- ..... continue with union all through level_9_tab
)
, cteDuplicates AS (
SELECT code, name, RecordCount = COUNT(*)
FROM
cteAllCodeValues
GROUP BY
code, name
HAVING
COUNT(*) > 1
)
SELECT c.*
FROM
cteDuplicates d
INNER JOIN cteAllCodeValues c
ON d.code = c.code
AND d.name = c.name
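The question asks about duplicate code values specifically, so one possible variation is to group by code alone and tag each row with the table it came from. The sketch below does that with Python's sqlite3 module; the src label column, the sample rows, and the use of only two tables are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE level_1_tab (code TEXT, name TEXT);
    CREATE TABLE level_2_tab (code TEXT, name TEXT);
    -- Hypothetical rows: 'ae1' appears in both tables under different names.
    INSERT INTO level_1_tab VALUES ('ae1', 'first'), ('ae2', 'second');
    INSERT INTO level_2_tab VALUES ('2', 'third'), ('ae1', 'renamed');
""")

# all_codes stacks the tables and records the source table of each row;
# dup_codes keeps only the codes that occur more than once overall;
# the final join returns every row that carries such a code.
rows = conn.execute("""
    WITH all_codes AS (
        SELECT 'level_1_tab' AS src, code, name FROM level_1_tab
        UNION ALL
        SELECT 'level_2_tab' AS src, code, name FROM level_2_tab
    ),
    dup_codes AS (
        SELECT code
        FROM all_codes
        GROUP BY code
        HAVING COUNT(*) > 1
    )
    SELECT a.src, a.code, a.name
    FROM all_codes AS a
    JOIN dup_codes AS d ON d.code = a.code
    ORDER BY a.code, a.src
""").fetchall()

for row in rows:
    print(row)
```

Grouping by code and name together, as in the queries above, would not flag 'ae1' here because the names differ.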

Related

Compare the results of a ROW COUNT

I have 2 databases on the same server and I need to compare the records in each one, since one of the databases is not importing all the information.
I was trying to do a row count but it's not working.
Currently I am exporting batches of approximately 100,000 rows and looking them up in Excel.
Let's say I want a query that counts the rows for each ID in TABLE A and compares that count with the count for the same ID in TABLE B. Since the IDs are the same, the counts should match, and I want the query to return the IDs for which the counts differ.
--this table will contain the count of occurrences of each ID in tableA
declare @TableA_Results table(
ID bigint,
Total bigint
);
insert into @TableA_Results
select ID,count(*) from database1.dbo.TableA --assumes the dbo schema
group by ID;
--this table will contain the count of occurrences of each ID in tableB
declare @TableB_Results table(
ID bigint,
Total bigint
);
insert into @TableB_Results
select ID,count(*) from database2.dbo.TableB --assumes the dbo schema
group by ID;
--this table will contain the IDs that don't have the same count in both tables
declare @Discordances table(
ID bigint,
TotalA bigint,
TotalB bigint
);
insert into @Discordances
select TA.ID,TA.Total,TB.Total
from @TableA_Results TA
inner join @TableB_Results TB on TA.ID=TB.ID and TA.Total!=TB.Total;
--the final output
select * from @Discordances;
The question is vague, but maybe this SQL Code might help nudge you in the right direction.
It grabs the IDs and Counts of each ID from database one, the IDs and counts of IDs from database two, and compares them, listing out all the rows where the counts are DIFFERENT.
WITH DB1Counts AS (
SELECT ID, COUNT(ID) AS CountOfIDs
FROM DatabaseOne.dbo.TableOne
GROUP BY ID
), DB2Counts AS (
SELECT ID, COUNT(ID) AS CountOfIDs
FROM DatabaseTwo.dbo.TableTwo
GROUP BY ID
)
SELECT a.ID, a.CountOfIDs AS DBOneCount, b.CountOfIDs AS DBTwoCount
FROM DB1Counts a
INNER JOIN DB2Counts b ON a.ID = b.ID
WHERE a.CountOfIDs <> b.CountOfIDs
This SQL selects from the specific tables using the "Database.Schema.Table" notation. So replace "DatabaseOne" and "DatabaseTwo" with the names of your two databases. And of course replace TableOne and TableTwo with the names of your tables (I'm assuming they're the same). This sets up two selects, one for each database, each grouping by ID to get the count per ID. It then joins these two selects on ID and returns all rows where the counts are different. Because it is an inner join, IDs that exist in only one of the two tables are not reported.
You could full outer join two aggregate queries and pull out ids that are either missing in one table, or for which the record count is different:
select coalesce(ta.id, tb.id) as id, ta.cnt, tb.cnt
from
(select id, count(*) cnt from tableA group by id) ta
full outer join (select id, count(*) cnt from tableB group by id) tb
on ta.id = tb.id
where
coalesce(ta.cnt, -1) <> coalesce(tb.cnt, -1)
You seem to want aggregation and a full join:
select coalesce(a.id, b.id) as id, a.cnt, b.cnt
from (select id, count(*) as cnt
from a
group by id
) a full join
(select id, count(*) as cnt
from b
group by id
) b
on a.id = b.id
where coalesce(a.cnt, 0) <> coalesce(b.cnt, 0);
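For engines or versions without a full join, a different technique gives the same kind of report: stack both tables with union all, tag each row with its source, and use conditional aggregation. This is a swapped-in alternative, not what the answers above do. The sketch runs it through Python's sqlite3 module; the table names table_a and table_b and the sample ids are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER);
    CREATE TABLE table_b (id INTEGER);
    -- Hypothetical ids: 3 exists only in A, 4 only in B, 5 matches in both.
    INSERT INTO table_a VALUES (1), (1), (2), (3), (5);
    INSERT INTO table_b VALUES (1), (2), (2), (4), (5);
""")

# Each row is tagged with its source table; the SUM(CASE ...) expressions
# then count rows per source for every id. An id missing from one side
# simply gets a count of 0 there, so no outer join is needed.
mismatches = conn.execute("""
    SELECT id,
           SUM(CASE WHEN src = 'a' THEN 1 ELSE 0 END) AS cnt_a,
           SUM(CASE WHEN src = 'b' THEN 1 ELSE 0 END) AS cnt_b
    FROM (
        SELECT 'a' AS src, id FROM table_a
        UNION ALL
        SELECT 'b' AS src, id FROM table_b
    ) AS tagged
    GROUP BY id
    HAVING SUM(CASE WHEN src = 'a' THEN 1 ELSE 0 END)
        <> SUM(CASE WHEN src = 'b' THEN 1 ELSE 0 END)
    ORDER BY id
""").fetchall()

for row in mismatches:
    print(row)
```

The same SELECT should carry over to SQL Server with three-part table names, though that has not been checked here.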

How to compare two tables in Hive based on counts

I have below hive tables
Table_1
ID
1
1
2
Table_2
ID
1
2
2
I am comparing the two tables based on the count of each ID in both tables, and I need output like this:
ID
1 - 2records in table 1 and 1 record in Table 2
2 - one record in Table 1 and 2 records in table 2
Table_1 is the parent table.
I am using the queries below:
select count(*),ID from Table_1 group by ID;
select count(*),ID from Table_2 group by ID;
Just do a full outer join on your queries with the on condition as X.id = Y.id, and then select * from the resultant table checking for nulls on either side.
select coalesce(X.id, Y.id) as id,
concat(coalesce(X.cnt1, 0), ' entries in table 1, ', coalesce(Y.cnt2, 0), ' entries in table 2')
from (select count(*) as cnt1, id from table1 group by id) X
full outer join (select count(*) as cnt2, id from table2 group by id) Y
on X.id = Y.id;
Try this. You may use a CASE expression to choose between "record" and "records".
SELECT m.id,
CONCAT (COALESCE(a.ct, 0), ' record in table 1, ', COALESCE(b.ct, 0),
' record in table 2')
FROM (SELECT id
FROM table_1
UNION
SELECT id
FROM table_2) m
LEFT JOIN (SELECT Count(*) AS ct,
id
FROM table_1
GROUP BY id) a
ON m.id = a.id
LEFT JOIN (SELECT Count(*) AS ct,
id
FROM table_2
GROUP BY id) b
ON m.id = b.id;
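The CASE idea mentioned above could look roughly like the following. It is sketched with Python's sqlite3 module rather than Hive, so string concatenation uses SQLite's || operator where Hive would use concat; the data is the sample from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id INTEGER);
    CREATE TABLE table_2 (id INTEGER);
    INSERT INTO table_1 VALUES (1), (1), (2);
    INSERT INTO table_2 VALUES (1), (2), (2);
""")

# Same shape as the query above: every id from either table, left-joined
# to the per-table counts. CASE picks the singular or plural noun.
summary = conn.execute("""
    SELECT m.id,
           COALESCE(a.ct, 0)
           || (CASE WHEN COALESCE(a.ct, 0) = 1 THEN ' record' ELSE ' records' END)
           || ' in table 1, '
           || COALESCE(b.ct, 0)
           || (CASE WHEN COALESCE(b.ct, 0) = 1 THEN ' record' ELSE ' records' END)
           || ' in table 2' AS description
    FROM (SELECT id FROM table_1 UNION SELECT id FROM table_2) AS m
    LEFT JOIN (SELECT id, COUNT(*) AS ct FROM table_1 GROUP BY id) AS a
           ON m.id = a.id
    LEFT JOIN (SELECT id, COUNT(*) AS ct FROM table_2 GROUP BY id) AS b
           ON m.id = b.id
    ORDER BY m.id
""").fetchall()

for row in summary:
    print(row)
```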
You could use this Python program to do a full comparison of 2 Hive tables:
https://github.com/bolcom/hive_compared_bq
If you want a quick comparison just based on counts, then pass the "--just-count" option (you can also specify the group by column with "--group-by-column").
The script also allows you to visually see all the differences on all rows and all columns if you want a complete validation.

Value present in more than one table

I have 3 tables. All of them have a column, id. I want to find out whether any value is common across the tables. Assuming that the tables are named a, b and c, if id value 3 is present in a and b, there is a problem. The query can/should exit at the first such occurrence. There is no need to probe further. What I have now is something like
( select id from a intersect select id from b )
union
( select id from b intersect select id from c )
union
( select id from a intersect select id from c )
Obviously, this is not very efficient. Database is PostgreSQL, version 9.0
id is not unique in the individual tables. It is OK to have duplicates in the same table. But if a value is present in just 2 of the 3 tables, that also needs to be flagged and there is no need to check for existence in the third table, or check if there are more such values. One value, present in more than one table, and I can stop.
Although id is not unique within any given table, it should be unique across the tables; a union of distinct id should be unique, so:
select id from (
select distinct id from a
union all
select distinct id from b
union all
select distinct id from c) x
group by id
having count(*) > 1
Note the use of union all, which preserves duplicates (plain union removes duplicates).
I would suggest a simple join (as written, it returns an id that is present in all three tables):
select a.id
from a join
b
on a.id = b.id join
c
on a.id = c.id
limit 1;
If you have a query that uses union or group by (or order by, but that is not relevant here), then you need to process all the data before returning a single row. A join can start returning rows as soon as the first values are found.
An alternative, but similar method is:
select a.id
from a
where exists (select 1 from b where a.id = b.id) and
exists (select 1 from c where a.id = c.id);
If a is the smallest table and id is indexed in b and c, then this could be quite fast.
Try this
select id from
(
select distinct id, 1 as t from a
union all
select distinct id, 2 as t from b
union all
select distinct id, 3 as t from c
) as t
group by id having count(t)=3
It is OK to have duplicates in the same table.
The query can/should exit at the first such occurrence.
SELECT 'OMG!' AS danger_bill_robinson
WHERE EXISTS (SELECT 1
FROM a,b,c -- maybe there is a place for old-style joins ...
WHERE a.id = b.id
OR a.id = c.id
OR c.id = b.id
);
Update: it appears the optimiser does not like Cartesian joins with 3 OR conditions. The query below is a bit faster:
SELECT 'WTF!' AS danger_bill_robinson
WHERE exists (select 1 from a JOIN b USING (id))
OR exists (select 1 from a JOIN c USING (id))
OR exists (select 1 from c JOIN b USING (id))
;
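A small way to try the pairwise EXISTS idea is sketched below with Python's sqlite3 module standing in for PostgreSQL. The helper name any_shared_id and the sample ids are hypothetical. Whether the engine actually stops at the first matching row depends on its planner, so treat the early exit as likely rather than guaranteed.

```python
import sqlite3

# One EXISTS per pair of tables; the result is 1 if any pair shares an id.
QUERY = """
    SELECT EXISTS (SELECT 1 FROM a JOIN b USING (id))
        OR EXISTS (SELECT 1 FROM a JOIN c USING (id))
        OR EXISTS (SELECT 1 FROM b JOIN c USING (id))
"""


def any_shared_id(a_ids, b_ids, c_ids):
    """Return True if some id occurs in at least two of the three tables."""
    conn = sqlite3.connect(":memory:")
    for table, ids in (("a", a_ids), ("b", b_ids), ("c", c_ids)):
        conn.execute(f"CREATE TABLE {table} (id INTEGER)")
        conn.executemany(f"INSERT INTO {table} VALUES (?)", [(i,) for i in ids])
    (flag,) = conn.execute(QUERY).fetchone()
    conn.close()
    return bool(flag)


# Duplicates inside one table are fine; 4 is shared by b and c.
print(any_shared_id([1, 2, 2], [3, 4], [4, 5]))
# No id is shared between any two tables.
print(any_shared_id([1, 1], [2], [3]))
```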

Logical AND between table elements in T-SQL

I have n tables, all with the same fields: Username and Value. The same Username can have multiple records in each table, but the combination Username/Value is unique within each one.
I want to combine the tables into a single one that contains, for every user who appears in all the tables, all of that user's distinct (Username/Value) pairs.
Example
Table A: {(User1,Value1);(User1,Value2);(User2,Value2);(User3,Value4)}
Table B: {(User1,Value4);(User3,Value5)}
Table C: {(User1,Value5);(User1,Value2);(User2,Value7);(User3,Value8)}
Desired output
Table D: {(User1,Value1);(User1,Value2);(User1,Value4);(User1,Value5);(User3,Value4);(User3,Value5);(User3,Value8)}
Now I'm doing multiple joins (using perl) like this
SELECT *
INTO $target_table
FROM (SELECT *
FROM $table1
WHERE bname IN (SELECT DISTINCT bname FROM $table2)
UNION
SELECT *
FROM $table2
WHERE bname IN (SELECT DISTINCT bname FROM $table1)
) UN
and then doing the same join between a third table and target_table and so on, but I think there should be a better way.
Any hints?
You can use UNION for this:
SELECT username, value
FROM $table1
UNION
SELECT username, value
FROM $table2
...
SELECT username, value
FROM $tablex
This will return you distinct records. If you are interested in duplicates, use UNION ALL.
Given your edits, it appears you only want to return records if the user is in all the tables.
Breaking that down, you need to do a few things. First, combine all your records together again, but this time denote which table each are coming from. Then you need to know the count of tables each user is in. Finally you need to check that number against the overall number of tables.
Here's one way using a few CTEs:
WITH CTE AS (
SELECT username, value, 1 AS tbl
FROM t1
UNION
SELECT username, value, 2 AS tbl
FROM t2
UNION
SELECT username, value, 3 AS tbl
FROM t3
),
CTECnt AS (
SELECT username, COUNT(DISTINCT tbl) tblCnt
FROM CTE
GROUP BY username
),
CTEMaxCnt AS (
SELECT COUNT(DISTINCT tbl) MaxCnt
FROM CTE
)
SELECT C.username, C.value
FROM CTE C
JOIN CTECnt C2 ON C.username = C2.username
JOIN CTEMaxCnt C3 ON C2.tblCnt = C3.MaxCnt
With Combined As
(
Select 'A' As TableName, Username, Value
From TableA
Union All
Select 'B', Username, Value
From TableB
Union All
Select 'C', Username, Value
From TableC
)
Select C.Username, C.Value
From Combined As C
Join (
Select C1.Username
From Combined As C1
Group By C1.Username
Having Count(Distinct C1.TableName) = 3
) As Z
On Z.Username = C.Username
Group By C.Username, C.Value
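To check the "user must appear in every table" logic against the example from the question, here is a minimal sketch using Python's sqlite3 module in place of SQL Server. The table names table_a, table_b and table_c are assumptions; the rows are the ones listed in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (username TEXT, value TEXT);
    CREATE TABLE table_b (username TEXT, value TEXT);
    CREATE TABLE table_c (username TEXT, value TEXT);
    INSERT INTO table_a VALUES ('User1','Value1'), ('User1','Value2'),
                               ('User2','Value2'), ('User3','Value4');
    INSERT INTO table_b VALUES ('User1','Value4'), ('User3','Value5');
    INSERT INTO table_c VALUES ('User1','Value5'), ('User1','Value2'),
                               ('User2','Value7'), ('User3','Value8');
""")

# Tag every row with its source table, keep only users whose rows come
# from as many distinct tables as exist in total, and return each
# distinct (username, value) pair for those users.
result = conn.execute("""
    WITH combined AS (
        SELECT 'A' AS tbl, username, value FROM table_a
        UNION ALL
        SELECT 'B' AS tbl, username, value FROM table_b
        UNION ALL
        SELECT 'C' AS tbl, username, value FROM table_c
    )
    SELECT DISTINCT username, value
    FROM combined
    WHERE username IN (
        SELECT username
        FROM combined
        GROUP BY username
        HAVING COUNT(DISTINCT tbl) = (SELECT COUNT(DISTINCT tbl) FROM combined)
    )
    ORDER BY username, value
""").fetchall()

for row in result:
    print(row)
```

User2 is dropped because it never appears in table B, which matches the desired output in the question.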

How to get same results without using distinct in query

I have a table with data like so:
[ID, Name]
1, Bob
1, Joe
1, Joe
1, Bob
I want to retrieve a list of records showing the relationship between the records with the same ID.
For instance, I want the following result set from my query:
Bob, Joe
Joe, Bob
Bob, Bob
Joe, Joe
This shows me the "from" and "to" for every item in the table.
I can get this result by using the following query:
SELECT DISTINCT [NAME]
FROM TABLE A
INNER JOIN TABLE B ON A.ID = B.ID
Is there any way for me to achieve the same result set without using "distinct" in the select statement? If I don't include the distinct, I get back 16 records, not 4.
The reason you get duplicate rows without DISTINCT is because every row of ID = x will be joined with every other row with ID = x. Since the original table has (1, "Bob") twice, both of those will be joined to every row in the other table with ID = 1.
Removing duplicates before doing a join will do two things: decrease the time to run the query, and prevent duplicate rows from showing up in the result.
Something like (using MySQL version of SQL):
SELECT L.NAME, R.NAME
FROM (SELECT DISTINCT ID, NAME FROM A) AS L
INNER JOIN (SELECT DISTINCT ID, NAME FROM B) AS R
ON L.ID = R.ID
Edit: is B an alias for table A?
In SQL and MySQL:
SELECT COLUMN_NAME FROM TABLE_NAME group by COLUMN_NAME
Have you tried using a group by clause?
select name
from table a
inner join table b
on a.id=b.id
group by name
That should get you the same thing as your distinct query above. As for the result set that you want, a simple self join should do it:
select name1,name2
from(
select id,name as name1
from table
group by 1,2
)a
join(
select id,name as name2
from table
group by 1,2
)b
using(id)
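The 16-versus-4 row counts from the question can be reproduced with a short sketch. It uses Python's sqlite3 module, a hypothetical table name people, and the four rows from the question; GROUP BY takes the place of DISTINCT in both halves of the self join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER, name TEXT);
    INSERT INTO people VALUES (1, 'Bob'), (1, 'Joe'), (1, 'Joe'), (1, 'Bob');
""")

# Plain self join: each of the 4 rows pairs with all 4 rows sharing its id.
raw_pairs = conn.execute("""
    SELECT l.name, r.name
    FROM people AS l
    JOIN people AS r ON l.id = r.id
""").fetchall()

# De-duplicate each side with GROUP BY before joining, so only the
# 2 x 2 distinct name combinations remain.
unique_pairs = conn.execute("""
    SELECT l.name, r.name
    FROM (SELECT id, name FROM people GROUP BY id, name) AS l
    JOIN (SELECT id, name FROM people GROUP BY id, name) AS r
      ON l.id = r.id
    ORDER BY l.name, r.name
""").fetchall()

print(len(raw_pairs))
print(unique_pairs)
```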
Eliminating duplicate values with union without using distinct
Declare @TableWithDuplicateValue Table(Name Varchar(255))
Insert Into @TableWithDuplicateValue Values('Cat'),('Dog'),('Cat'),('Dog'),('Lion')
Select Name From @TableWithDuplicateValue
union
select null where 1=0
Go
Output
---------
Cat
Dog
Lion
For more alternatives, see my blog:
http://www.w3hattrick.com/2016/05/getting-distinct-rows-or-value-using.html