SELECT COUNT(DISTINCT [name]) from several tables - sql

In SQL Server I can count the distinct (non-repeating) names in a column of one table like so:
SELECT COUNT(DISTINCT [Name]) FROM [MyTable]
But what if I have more than one table (all of them contain a name column called [Name]) and I need to know the count of non-repeating names across two or more tables?
If I run something like this:
SELECT COUNT(DISTINCT [Name]) FROM [MyTable1], [MyTable2], [MyTable3]
I get an error, "Ambiguous column name 'Name'".
PS. All three tables [MyTable1], [MyTable2], [MyTable3] are a product of a previous selection.

After the clarification, use:
SELECT x.name, COUNT(x.[name])
FROM (SELECT [name]
FROM [MyTable1]
UNION ALL
SELECT [name]
FROM [MyTable2]
UNION ALL
SELECT [name]
FROM [MyTable3]) x
GROUP BY x.name
If I understand correctly, use:
SELECT x.name, COUNT(DISTINCT x.[name])
FROM (SELECT [name]
FROM [MyTable1]
UNION ALL
SELECT [name]
FROM [MyTable2]
UNION ALL
SELECT [name]
FROM [MyTable3]) x
GROUP BY x.name
UNION will remove duplicates; UNION ALL will not, and is faster for it.
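For the single number the question asks about (how many distinct names exist across all three tables), a minimal sketch follows. It runs the SQL through Python's sqlite3 module against an in-memory SQLite database rather than SQL Server, and the rows are made up for illustration; the table and column names are taken from the question.

```python
import sqlite3

# In-memory stand-ins for the three tables; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable1 (Name TEXT);
    CREATE TABLE MyTable2 (Name TEXT);
    CREATE TABLE MyTable3 (Name TEXT);
    INSERT INTO MyTable1 VALUES ('Ann'), ('Bob'), ('Ann');
    INSERT INTO MyTable2 VALUES ('Bob'), ('Cid');
    INSERT INTO MyTable3 VALUES ('Dee'), ('Ann');
""")

# UNION removes duplicates, so counting its rows counts distinct names.
count_via_union = conn.execute("""
    SELECT COUNT(*)
    FROM (SELECT Name FROM MyTable1
          UNION
          SELECT Name FROM MyTable2
          UNION
          SELECT Name FROM MyTable3) AS x
""").fetchone()[0]

# UNION ALL keeps duplicates; COUNT(DISTINCT ...) removes them afterwards.
count_via_distinct = conn.execute("""
    SELECT COUNT(DISTINCT Name)
    FROM (SELECT Name FROM MyTable1
          UNION ALL
          SELECT Name FROM MyTable2
          UNION ALL
          SELECT Name FROM MyTable3) AS x
""").fetchone()[0]

print(count_via_union, count_via_distinct)
```

Both variants agree on this data. They can differ when Name is NULL in some rows: COUNT(*) over the UNION counts the NULL as one row, while COUNT(DISTINCT Name) ignores NULLs.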

EDIT: Had to change after seeing recent comment.
Does this give you what you want? This gives a count for each person after combining the rows from all tables.
SELECT [NAME], COUNT(*) as TheCount
FROM
(
SELECT [Name] FROM [MyTable1]
UNION ALL
SELECT [Name] FROM [MyTable2]
UNION ALL
SELECT [Name] FROM [MyTable3]
) AS [TheNames]
GROUP BY [NAME]

Here's another way:
SELECT x.name, SUM(x.cnt)
FROM ( SELECT [name], COUNT(*) AS cnt
FROM [MyTable1]
GROUP BY [name]
UNION ALL
SELECT [name], COUNT(*) AS cnt
FROM [MyTable2]
GROUP BY [name]
UNION ALL
SELECT [name], COUNT(*) AS cnt
FROM [MyTable3]
GROUP BY [name]
) AS x
GROUP BY x.name

In case you have different amounts of columns per table, like:
table1 has 3 columns,
table2 has 2 columns,
table3 has 1 column
And you want to count the distinct values of differently named columns, what was useful to me in AthenaSQL was a CROSS JOIN: since each subquery returns only one row, the join produces just one combination:
SELECT * FROM (
SELECT COUNT(DISTINCT name1) as amt_name1,
COUNT(DISTINCT name2) as amt_name2,
COUNT(DISTINCT name3) as amt_name3
FROM table1 ) t1
CROSS JOIN
(SELECT COUNT(DISTINCT name4) as amt_name4,
COUNT(DISTINCT name5) as amt_name5,
MAX(t3.amt_name6) as amt_name6
FROM table2
CROSS JOIN
(SELECT COUNT(DISTINCT name6) as amt_name6
FROM table3) t3) t2
This would return a table with one row holding the counts:
amt_name1 | amt_name2 | amt_name3 | amt_name4 | amt_name5 | amt_name6
4123 | 675 | 564 | 2346 | 18667 | 74567

Related

How to find missing data from duplicate table without using 'EXCEPT'

I used to use 'EXCEPT' to find missing data from 2 tables that should have the same data but was told not to use it anymore. I found a solution but I'm not entirely sure how it works. Could someone explain it to me or help me with another solution?
This is a basic example of my query:
SELECT MIN(C.TABLE_NAME) as TABLE_NAME,columnid,column
FROM
(
SELECT DISTINCT 'Source' as TABLE_NAME,columnid,column
FROM table1
UNION ALL
SELECT DISTINCT 'Output' as TABLE_NAME,columnid,column
FROM table2
) AS C
GROUP BY columnid,column
HAVING COUNT(*) = 1;
The output result shouldn't display any rows if the data is matching. The above code works as intended as I tested it on a table where I know the data is matching and not matching. I'm just not sure how it works. Sorry for the simple question. I'm new to this.
Edit:
I quickly made some sample data if it helps.
WITH salesman AS
(
SELECT 5005 AS id, 'Pit Alex' AS [name]
UNION ALL
SELECT 5006 AS id, 'Mc Lyon' AS [name]
UNION ALL
SELECT 5011 AS id, 'Lauson Hen' AS [name]
UNION ALL
SELECT 5007 AS id, 'Paul Adam' AS [name]
) ,
salesmancopy AS
(
SELECT 5005 AS id, 'Pit Alex' AS [name]
UNION ALL
SELECT 5006 AS id, 'Mc Lyon' AS [name]
UNION ALL
SELECT 5010 AS id, 'Lauson Hen' AS [name]
)
SELECT MIN(C.TABLE_NAME) as TABLE_NAME,id,[name]
FROM
(
SELECT DISTINCT 'original' as TABLE_NAME,id,[name]
FROM salesman
UNION ALL
SELECT DISTINCT 'copy' as TABLE_NAME,id,[name]
FROM salesmancopy
) AS C
GROUP BY id,[name]
HAVING COUNT(*) = 1;
If you want the rows from table1 that are not in table2, then your solution will work only if table2 does not contain any rows of its own that are missing from table1. In other words, every row in table2 has to exist in table1. Another solution is to use NOT EXISTS:
select *
from table1 t1
where not exists (
select 1
from table2 t2
where t1.columnid = t2.columnid and
t1.column = t2.column
)
Here you can see a comparison of different approaches to this problem, where the NOT EXISTS solution is preferred over the LEFT JOIN + IS NULL solution.
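As a rough sketch of how NOT EXISTS can report the differences in both directions, the example below reuses the salesman / salesmancopy sample rows from the question. It is run through Python's sqlite3 module on an in-memory SQLite database (an assumption made for runnability, not SQL Server), and the `source` column is a label added only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE salesman (id INTEGER, name TEXT);
    CREATE TABLE salesmancopy (id INTEGER, name TEXT);
    INSERT INTO salesman VALUES
        (5005, 'Pit Alex'), (5006, 'Mc Lyon'),
        (5011, 'Lauson Hen'), (5007, 'Paul Adam');
    INSERT INTO salesmancopy VALUES
        (5005, 'Pit Alex'), (5006, 'Mc Lyon'), (5010, 'Lauson Hen');
""")

# First branch: rows only in salesman. Second branch: rows only in salesmancopy.
# ORDER BY 2 sorts the combined result by its second column (id).
missing = conn.execute("""
    SELECT 'original' AS source, s.id, s.name
    FROM salesman AS s
    WHERE NOT EXISTS (SELECT 1 FROM salesmancopy AS c
                      WHERE c.id = s.id AND c.name = s.name)
    UNION ALL
    SELECT 'copy' AS source, c.id, c.name
    FROM salesmancopy AS c
    WHERE NOT EXISTS (SELECT 1 FROM salesman AS s
                      WHERE s.id = c.id AND s.name = c.name)
    ORDER BY 2
""").fetchall()

for row in missing:
    print(row)
```

One difference from EXCEPT worth keeping in mind: the `=` comparisons never match NULL to NULL, whereas EXCEPT treats two NULLs as the same value, so nullable columns need extra handling with this approach.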
EXCEPT is the fastest method to determine whether rows exist on only one side.
But if you want to check both tables in a single go, you could use FULL OUTER JOIN:
IF OBJECT_ID('tempdb..#t1') IS NOT NULL
DROP TABLE #t1;
IF OBJECT_ID('tempdb..#t2') IS NOT NULL
DROP TABLE #t2;
SELECT *
INTO #t1
FROM (SELECT 1 AS num UNION SELECT 2 AS num UNION SELECT 3 AS num) d;
SELECT *
INTO #t2
FROM (SELECT 1 AS num UNION SELECT 2 AS num UNION SELECT 5 AS num) d;
SELECT *
FROM #t1
FULL OUTER JOIN #t2
ON #t2.num = #t1.num
WHERE #t1.num IS NULL
OR #t2.num IS NULL;
Output (row order may vary):
num | num
3 | NULL
NULL | 5
To get around the issue mentioned by Radim Bača, i.e. when you need to figure out the differences between two tables where rows exist in table1 but not in table2, you can choose the following option:
1. Create two columns to indicate whether the record is from the original or the copy.
2. Group by the columns you wish to compare.
3. Use the clause HAVING COUNT(orig) <> COUNT(copy).
Use the same query as before with small changes:
WITH salesman AS
(
SELECT 5005 AS id, 'Pit Alex' AS [name]
UNION ALL
SELECT 5006 AS id, 'Mc Lyon' AS [name]
UNION ALL
SELECT 5011 AS id, 'Lauson Hen' AS [name]
UNION ALL
SELECT 5007 AS id, 'Paul Adam' AS [name]
) ,
salesmancopy AS
(
SELECT 5005 AS id, 'Pit Alex' AS [name]
UNION ALL
SELECT 5006 AS id, 'Mc Lyon' AS [name]
UNION ALL
SELECT 5010 AS id, 'Lauson Hen' AS [name]
)
SELECT c.id
,c.name
,count(orig) as present_in_orig
,count(copy) as present_in_copy
FROM
(
SELECT 'original' as orig
,null as copy
,id
,[name]
FROM salesman
UNION ALL
SELECT null as orig
,'copy' as copy
,id
,[name]
FROM salesmancopy
) AS C
GROUP BY id
,[name]
HAVING COUNT(copy)<> count(orig)
order by 1,2
See the following link from Stew who details this method very nicely.
https://stewashton.wordpress.com/2014/02/04/compare-and-sync-tables-tom-kyte-and-group-by/
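Since the question also asks how the GROUP BY ... HAVING COUNT(*) = 1 query works, here is a small sketch that shows the per-group counts before the HAVING filter is applied. It uses the question's sample rows, runs on an in-memory SQLite database through Python's sqlite3 module (an assumption for runnability), and applies the final filter in Python so the intermediate result stays visible; the `copies` column name is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE salesman (id INTEGER, name TEXT);
    CREATE TABLE salesmancopy (id INTEGER, name TEXT);
    INSERT INTO salesman VALUES
        (5005, 'Pit Alex'), (5006, 'Mc Lyon'),
        (5011, 'Lauson Hen'), (5007, 'Paul Adam');
    INSERT INTO salesmancopy VALUES
        (5005, 'Pit Alex'), (5006, 'Mc Lyon'), (5010, 'Lauson Hen');
""")

# Each table contributes a given (id, name) at most once because of DISTINCT,
# so after UNION ALL a group has 2 rows if both tables hold it, 1 otherwise.
groups = conn.execute("""
    SELECT MIN(c.table_name) AS table_name, c.id, c.name, COUNT(*) AS copies
    FROM (SELECT DISTINCT 'original' AS table_name, id, name FROM salesman
          UNION ALL
          SELECT DISTINCT 'copy' AS table_name, id, name FROM salesmancopy) AS c
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

# Equivalent of HAVING COUNT(*) = 1: keep groups that occur in one table only.
unmatched = [g for g in groups if g[3] == 1]

for g in groups:
    print(g)
```

For a group of one row, MIN(table_name) is simply that row's label, which is how the query reports which table the unmatched row came from.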

Identify duplicates rows based on multiple columns

#SQL Experts,
I am trying to fetch duplicate records from SQL table where 1st Column and 2nd Column values are same but 3rd column values should be different.
Below is my table
ID NAME DEPT
--------------------
1 VRK CSE
1 VRK ECE
2 AME MEC
3 BMS CVL
From the above table, I am trying to fetch the first 2 rows. Below is my query; please suggest why it isn't giving the correct results.
SELECT A.ID, A.NAME, A.DEPT
FROM TBL A
INNER JOIN TBL B ON A.ID = B.ID
AND A.NAME = B.NAME
AND A.DEPT <> B.DEPT
Somehow I am not getting the expected results.
Your sample data does not make it completely clear what you want here. Assuming you want to target groups of records having duplicate first/second columns with all third column values being unique, then we may try:
SELECT ID, NAME, DEPT
FROM
(
SELECT ID, NAME, DEPT,
COUNT(*) OVER (PARTITION BY ID, NAME) cnt,
MIN(DEPT) OVER (PARTITION BY ID, NAME) min_dept,
MAX(DEPT) OVER (PARTITION BY ID, NAME) max_dept
FROM yourTable
) t
WHERE cnt > 1 AND min_dept <> max_dept;
UPDATE
select *
from
(
select *,
COUNT(*) over (partition by id, [name]) cnt1,
COUNT(*) over (partition by id, [name], dept) cnt2
from dbo.T
) x
where x.cnt1 > 1 and x.cnt2 < x.cnt1;
To find the id/name pairs that have more than one distinct dept:
select x.id, x.name, count(*)
from
(select distinct a.id, a.name, a.dept
from tab a) x
group by x.id, x.name
having count(*) > 1
If you want the original rows, I would just go for exists:
select t.*
from tbl t
where exists (select 1
from tbl t2
where t2.id = t.id and t2.name = t.name and
t2.dept <> t.dept
);
If you just want the id/name pairs:
select t.id, t.name
from tbl t
group by t.id, t.name
having min(t.dept) <> max(t.dept);
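A quick way to check the EXISTS and MIN/MAX variants against the question's sample table is sketched below. It assumes an in-memory SQLite database driven from Python's sqlite3 module rather than SQL Server; the table name `tbl` and the rows come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (id INTEGER, name TEXT, dept TEXT);
    INSERT INTO tbl VALUES
        (1, 'VRK', 'CSE'), (1, 'VRK', 'ECE'),
        (2, 'AME', 'MEC'), (3, 'BMS', 'CVL');
""")

# Original rows that have a sibling with the same id/name but another dept.
rows = conn.execute("""
    SELECT t.id, t.name, t.dept
    FROM tbl AS t
    WHERE EXISTS (SELECT 1 FROM tbl AS t2
                  WHERE t2.id = t.id AND t2.name = t.name
                    AND t2.dept <> t.dept)
    ORDER BY t.id, t.dept
""").fetchall()

# Only the id/name pairs: a group has differing depts when MIN <> MAX.
pairs = conn.execute("""
    SELECT id, name
    FROM tbl
    GROUP BY id, name
    HAVING MIN(dept) <> MAX(dept)
""").fetchall()

print(rows)
print(pairs)
```

Note that the inner alias (`t2`) must differ from the outer one (`t`); otherwise the correlated conditions cannot refer to both copies of the table.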

SQL - How to Order By in UNION query

Is there a way to union two tables, but keep the rows from the first table appearing first in the result set? However, the ORDER BY column is not in the select list.
For example:
Table 1
name surname
-------------------
John Doe
Bob Marley
Ras Tafari
Table 2
name surname
------------------
Lucky Dube
Abby Arnold
Expected result:
name surname
-------------------
John Doe
Bob Marley
Ras Tafari
Lucky Dube
Abby Arnold
I am bringing Data by following query
SELECT name,surname FROM TABLE 1 ORDER BY ID
UNION
SELECT name,surname FROM TABLE 2
The above query does not keep the ORDER BY after the UNION.
P.S. I don't want to show ID in my select query.
I am getting the ORDER BY column by joining tables. The following is my real query:
SELECT tbl_Event_Type_Sort_Orders.Appraisal_Event_Type_ID AS Appraisal_Event_Type_ID , ISNULL(tbl_Appraisal_Event_Types.Appraisal_Event_Type_Display_Name, 'UnCategorized') AS Appraisal_Event_Type_Display_Name
INTO #temptbl
FROM tbl_Event_Type_Sort_Orders
INNER JOIN tbl_Appraisal_Event_Types
ON tbl_Event_Type_Sort_Orders.Appraisal_Event_Type_ID = tbl_Appraisal_Event_Types.Appraisal_Event_Type_ID
WHERE 1=1
AND User_Name='abc'
ORDER BY tbl_Event_Type_Sort_Orders.Sort_Order
SELECT * FROM #temptbl
UNION
SELECT DISTINCT (tbl_Appraisal_Event_Types.Appraisal_Event_Type_ID) AS Appraisal_Event_Type_ID , ISNULL(tbl_Appraisal_Event_Types.Appraisal_Event_Type_Display_Name, 'UnCategorized') AS Appraisal_Event_Type_Display_Name
FROM tbl_Appraisal_Event_Types
INNER JOIN tbl_Appraisal_Events
ON tbl_Appraisal_Event_Types.Appraisal_Event_Type_ID = tbl_Appraisal_Events.Event_Type_ID
INNER JOIN tbl_Appraisals
ON tbl_Appraisal_Events.Appraisal_ID = tbl_Appraisals.Appraisal_ID
WHERE 1=1
AND ((tbl_Appraisals.Assigned_To_Staff_User) = 'abc' OR (tbl_Appraisals.Assigned_To_Staff_User2) = 'abc' OR (tbl_Appraisals.Assigned_To_Staff_User3) = 'abc')
Put the UNION ALL in a derived table. To keep duplicate elimination, use SELECT DISTINCT and also add a NOT EXISTS to the second select, to avoid returning the same person twice if found in both tables:
select name, surname
from
(
select distinct name, surname, 1 as tno
from table1
union all
select distinct name, surname, 2 as tno
from table2 t2
where not exists (select * from table1 t1
where t2.name = t1.name
and t2.surname = t1.surname)
) dt
order by tno, surname, name
You can use a column for the table and one for the ID to order by:
SELECT x.name, x.surname FROM (
SELECT ID, TableID = 1, name, surname
FROM table1
UNION ALL
SELECT ID = -1, TableID = 2, name, surname
FROM table2
) x
ORDER BY x.TableID, x.ID
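The tagging idea can be tried out with the sketch below, which uses the names from the question's example tables. It is written against an in-memory SQLite database via Python's sqlite3 module (an assumption; the original targets SQL Server), the `id` values are invented, and `table_id` is an illustrative column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT, surname TEXT);
    CREATE TABLE table2 (id INTEGER, name TEXT, surname TEXT);
    INSERT INTO table1 VALUES
        (1, 'John', 'Doe'), (2, 'Bob', 'Marley'), (3, 'Ras', 'Tafari');
    INSERT INTO table2 VALUES
        (1, 'Lucky', 'Dube'), (2, 'Abby', 'Arnold');
""")

# table_id and id exist only inside the derived table: they drive the sort
# but are left out of the outer select list, so they never reach the output.
rows = conn.execute("""
    SELECT x.name, x.surname
    FROM (SELECT id, 1 AS table_id, name, surname FROM table1
          UNION ALL
          SELECT id, 2 AS table_id, name, surname FROM table2) AS x
    ORDER BY x.table_id, x.id
""").fetchall()

for row in rows:
    print(row)
```

The ORDER BY belongs on the outermost query; an ORDER BY inside one branch of a UNION does not determine the order of the combined result.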
You can write it as below. If you are OK with duplicate data, use UNION ALL instead; it will be faster:
SELECT NAME, surname FROM (
SELECT ID,name,surname FROM TABLE 1
UNION
SELECT ID,name,surname FROM TABLE 2 ) t ORDER BY ID
This will order the first row set first, then by anything else you need
(I haven't tested the code):
;with cte_1
as
(SELECT ID,name,surname,1 as table_id FROM TABLE 1
UNION
SELECT ID,name,surname,2 as table_id FROM TABLE 2 )
SELECT name, surname
FROM cte_1
ORDER BY table_id,ID
Simply use a UNION clause without ORDER BY:
SELECT name,surname FROM TABLE 1
UNION
SELECT name,surname FROM TABLE 2
If you want to order the first table, use the query below:
;WITH cte_1
AS
(SELECT name,surname,ROW_NUMBER()OVER(ORDER BY Id)b FROM TABLE 1 )
SELECT name,surname
FROM cte_1
UNION
SELECT name,surname
FROM TABLE 2

SQL Having count logic

I need help with HAVING COUNT. I have the result set below:
CREATE TABLE #tmpTest1 (Code VARCHAR(50), Name VARCHAR(100))
INSERT INTO [#tmpTest1]
(
[Code],
[Name]
)
SELECT '160215-039','ROBIN'
UNION ALL SELECT '160215-039','ROBIN'
UNION ALL SELECT '160215-046','SENGAROB'
UNION ALL SELECT '160215-046','BABYPANGET'
UNION ALL SELECT '160215-045','JONG'
UNION ALL SELECT '160215-045','JAPZ'
UNION ALL SELECT '160215-044','AGNES'
UNION ALL SELECT '160215-044','AGNES'
UNION ALL SELECT '160215-041','BABYTOT'
UNION ALL SELECT '160215-041','BABYTOT'
UNION ALL SELECT '160215-041','BABYTOT'
I want to show only the rows that share a code but have different names, so in this case my expected result is:
160215-045 JAPZ
160215-045 JONG
160215-046 BABYPANGET
160215-046 SENGAROB
But when I group by the two columns and then use HAVING COUNT, as in the query below:
SELECT [Code], [Name] FROM [#tmpTest1]
GROUP BY [Code], [Name] HAVING COUNT([Code]) > 1
It gives me the wrong result below: the rows that have the same code and the same name, which is the opposite of what I want.
160215-044 AGNES
160215-041 BABYTOT
160215-039 ROBIN
How can I get my expected output?
Thanks in advance; any help would be much appreciated.
I believe this query will give you the result you want, although your original question is a bit unclear.
SELECT t1.[Code], t1.[Name]
FROM [#tmpTest1] t1
INNER JOIN
(
SELECT [Code]
FROM [#tmpTest1]
GROUP BY [Code]
HAVING COUNT(DISTINCT [Name]) > 1
) t2
ON t1.[Code] = t2.[Code]
If you want rows with the same code and name, then use window functions:
select t.*
from (select t.*, count(*) over (partition by code, name) as cnt
from #tmpTest1 t
) t
where cnt >= 2;
From your comment
if there is 1 different name for the codes , i want to show those
records for me to know that there is one differs to others..
This sounds like an exists query because you want to check if another row with the same code but different name exists.
select * from [#tmpTest1] t1
where exists (
select 1 from [#tmpTest1] t2
where t2.code = t1.code
and t2.name <> t1.name
)
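To check the EXISTS approach against the expected output in the question, here is a small sketch with the question's data. It assumes an in-memory SQLite database accessed through Python's sqlite3 module, so the temp table `#tmpTest1` becomes a plain table named `tmpTest1`; the DISTINCT and ORDER BY are additions for a stable, duplicate-free result.

```python
import sqlite3

data = [
    ("160215-039", "ROBIN"), ("160215-039", "ROBIN"),
    ("160215-046", "SENGAROB"), ("160215-046", "BABYPANGET"),
    ("160215-045", "JONG"), ("160215-045", "JAPZ"),
    ("160215-044", "AGNES"), ("160215-044", "AGNES"),
    ("160215-041", "BABYTOT"), ("160215-041", "BABYTOT"),
    ("160215-041", "BABYTOT"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmpTest1 (Code TEXT, Name TEXT)")
conn.executemany("INSERT INTO tmpTest1 VALUES (?, ?)", data)

# Keep a row only if some other row has the same Code but a different Name.
rows = conn.execute("""
    SELECT DISTINCT t1.Code, t1.Name
    FROM tmpTest1 AS t1
    WHERE EXISTS (SELECT 1 FROM tmpTest1 AS t2
                  WHERE t2.Code = t1.Code AND t2.Name <> t1.Name)
    ORDER BY t1.Code, t1.Name
""").fetchall()

for row in rows:
    print(row)
```

On this data the result matches the four rows listed as the expected output in the question.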

Select count(*) from multiple tables

How can I select count(*) from two different tables (call them tab1 and tab2) having as result:
Count_1 Count_2
123 456
I've tried this:
select count(*) Count_1 from schema.tab1 union all select count(*) Count_2 from schema.tab2
But all I have is:
Count_1
123
456
SELECT (
SELECT COUNT(*)
FROM tab1
) AS count1,
(
SELECT COUNT(*)
FROM tab2
) AS count2
FROM dual
As additional information, to accomplish the same thing in SQL Server, you just need to remove the "FROM dual" part of the query.
Just because it's slightly different:
SELECT 'table_1' AS table_name, COUNT(*) FROM table_1
UNION
SELECT 'table_2' AS table_name, COUNT(*) FROM table_2
UNION
SELECT 'table_3' AS table_name, COUNT(*) FROM table_3
It gives the answers transposed (one row per table instead of one column), otherwise I don't think it's much different. I think performance-wise they should be equivalent.
My experience is with SQL Server, but could you do:
select (select count(*) from table1) as count1,
(select count(*) from table2) as count2
In SQL Server I get the result you are after.
Other slightly different methods:
with t1_count as (select count(*) c1 from t1),
t2_count as (select count(*) c2 from t2)
select c1,
c2
from t1_count,
t2_count
/
select c1,
c2
from (select count(*) c1 from t1) t1_count,
(select count(*) c2 from t2) t2_count
/
select
t1.Count_1,t2.Count_2
from
(SELECT count(1) as Count_1 FROM tab1) as t1,
(SELECT count(1) as Count_2 FROM tab2) as t2
A quick stab came up with:
Select (select count(*) from Table1) as Count1, (select count(*) from Table2) as Count2
Note: I tested this in SQL Server, so From Dual is not necessary (hence the discrepancy).
For a bit of completeness - this query will create a query to give you a count of all of the tables for a given owner.
select
DECODE(rownum, 1, '', ' UNION ALL ') ||
'SELECT ''' || table_name || ''' AS TABLE_NAME, COUNT(*) ' ||
' FROM ' || table_name as query_string
from all_tables
where owner = :owner;
The output is something like
SELECT 'TAB1' AS TABLE_NAME, COUNT(*) FROM TAB1
UNION ALL SELECT 'TAB2' AS TABLE_NAME, COUNT(*) FROM TAB2
UNION ALL SELECT 'TAB3' AS TABLE_NAME, COUNT(*) FROM TAB3
UNION ALL SELECT 'TAB4' AS TABLE_NAME, COUNT(*) FROM TAB4
Which you can then run to get your counts. It's just a handy script to have around sometimes.
Since I can't see any other answer bringing this up: if you don't like sub-queries and have primary keys in each table, you can do this:
select count(distinct tab1.id) as count_t1,
count(distinct tab2.id) as count_t2
from tab1, tab2
But performance wise I believe that Quassnoi's solution is better, and the one I would use.
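One caveat of the comma-join variant may be worth seeing in action: if either table is empty, the cross join has no rows, so both counts come back as 0. The sketch below compares it with the scalar-subquery form. It assumes an in-memory SQLite database used through Python's sqlite3 module, with made-up rows and an intentionally empty tab2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (id INTEGER PRIMARY KEY);
    CREATE TABLE tab2 (id INTEGER PRIMARY KEY);
    INSERT INTO tab1 VALUES (1), (2), (3);
    -- tab2 is intentionally left empty
""")

# Cross join: 3 x 0 = 0 rows, so both distinct counts are 0.
cross_join_counts = conn.execute("""
    SELECT COUNT(DISTINCT tab1.id), COUNT(DISTINCT tab2.id)
    FROM tab1, tab2
""").fetchone()

# Scalar subqueries count each table independently.
subquery_counts = conn.execute("""
    SELECT (SELECT COUNT(*) FROM tab1) AS count_1,
           (SELECT COUNT(*) FROM tab2) AS count_2
""").fetchone()

print(cross_join_counts, subquery_counts)
```

The cross join also materialises rows(tab1) x rows(tab2) combinations before counting, which is one plausible reason the subquery form tends to perform better on large tables.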
SELECT (SELECT COUNT(*) FROM table1) + (SELECT COUNT(*) FROM table2) FROM dual;
Here are a few options to share:
Option 1 - counting from same domain from different table
select distinct(select count(*) from domain1.table1) "count1", (select count(*) from domain1.table2) "count2"
from domain1.table1, domain1.table2;
Option 2 - counting from different domain for same table
select distinct(select count(*) from domain1.table1) "count1", (select count(*) from domain2.table1) "count2"
from domain1.table1, domain2.table1;
Option 3 - counting from different domain for same table with "union all" to have rows of count
select 'domain 1'"domain", count(*)
from domain1.table1
union all
select 'domain 2', count(*)
from domain2.table1;
Enjoy the SQL, I always do :)
select (select count(*) from tab1) count_1, (select count(*) from tab2) count_2 from dual;
--============= FIRST WAY (Shows as Multiple Row) ===============
SELECT 'tblProducts' [TableName], COUNT(P.Id) [RowCount] FROM tblProducts P
UNION ALL
SELECT 'tblProductSales' [TableName], COUNT(S.Id) [RowCount] FROM tblProductSales S
--============== SECOND WAY (Shows in a Single Row) =============
SELECT
(SELECT COUNT(Id) FROM tblProducts) AS ProductCount,
(SELECT COUNT(Id) FROM tblProductSales) AS SalesCount
If the tables (or at least a key column) are of the same type just make the union first and then count.
select count(*)
from (select tab1key as key from schema.tab1
union all
select tab2key as key from schema.tab2
)
Or take your statement and put another sum() around it.
select sum(amount) from
(
select count(*) amount from schema.tab1 union all select count(*) amount from schema.tab2
)
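When summing per-table counts this way, UNION ALL matters: plain UNION removes duplicate rows, so two tables with the same row count would collapse into a single row before the SUM. A minimal sketch of the difference, assuming an in-memory SQLite database via Python's sqlite3 module and invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (id INTEGER);
    CREATE TABLE tab2 (id INTEGER);
    INSERT INTO tab1 VALUES (1), (2);
    INSERT INTO tab2 VALUES (10), (20);
""")

# UNION ALL keeps both count rows (2 and 2), so the sum is the true total.
total_union_all = conn.execute("""
    SELECT SUM(amount)
    FROM (SELECT COUNT(*) AS amount FROM tab1
          UNION ALL
          SELECT COUNT(*) AS amount FROM tab2) AS t
""").fetchone()[0]

# UNION merges the two identical count rows into one, under-reporting the total.
total_union = conn.execute("""
    SELECT SUM(amount)
    FROM (SELECT COUNT(*) AS amount FROM tab1
          UNION
          SELECT COUNT(*) AS amount FROM tab2) AS t
""").fetchone()[0]

print(total_union_all, total_union)
```

The same reasoning applies to any query that unions bare counts without a distinguishing column such as a table-name label.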
DECLARE @all int
SET @all = (select COUNT(*) from tab1) + (select count(*) from tab2)
PRINT @all
or
SELECT (select COUNT(*) from tab1) + (select count(*) from tab2)
JOIN with different tables
SELECT COUNT(*) FROM (
SELECT DISTINCT table_a.ID FROM table_a JOIN table_c ON table_a.ID = table_c.ID );
SELECT (
SELECT COUNT(*)
FROM tbl1
)
+
(
SELECT COUNT(*)
FROM tbl2
)
as TotalCount
If you're using Google BigQuery this will work.
SELECT
date,
SUM(Table_1_Id_Count) AS Table_1_Id_Count,
SUM(Table_2_Id_Count) AS Table_2_Id_Count
FROM
(
SELECT
Id AS Table_1_Id,
date,
COUNT(Id) AS Table_1_Id_Count,
0 AS Table_2_Id_Count
FROM
`your_project_name.Table_1`
GROUP BY
Id,
date
UNION ALL
SELECT
Id AS Table_2_Id,
date,
0 AS Table_1_Id_Count,
COUNT(Id) AS Table_2_Id_Count
FROM
`your_project_name.Table_2`
GROUP BY
Id,
date
)
GROUP BY
date
select
(select count(*) from tab1 where field like 'value') +
(select count(*) from tab2 where field like 'value')
count
select @count = sum(data) from
(
select count(*) as data from #tempregion
union all
select count(*) as data from #tempmetro
union all
select count(*) as data from #tempcity
union all
select count(*) as data from #tempzips
) a