count (distinct ...) VS count from (select distinct ...) VS group by - sql

On Snowflake, I'm getting different results depending on how I count the distinct values from the same table. I used to think of them as equivalent. Given the discrepancies, first I'd like to know in which scenarios these strategies cannot be interchanged, and second, how to tell which strategy returns the right number.
I include the query I'm using to test this:
select 'count_distinct_subquery' as strat,count(*) from (
select distinct
plan_code,
fis_we_dt,
sku_no,
pog_segment_name,
shelf_no,
position_id
from src
)
union all
select 'count_distinct' as strat,count(
distinct
plan_code,
fis_we_dt,
sku_no,
pog_segment_name,
shelf_no,
position_id
)
from src
union all
select 'group_by_subquery' as strat, count(*) from (
select *
from src
group by
plan_code,
fis_we_dt,
sku_no,
pog_segment_name,
shelf_no,
position_id
)
The output is as in the image

The second version count (distinct expr1, ...) skips NULLs.
CREATE OR REPLACE TABLE src
AS
SELECT NULL AS plan_code, 1 AS fis_we_dt;
select 'count_distinct_subquery' as strat,count(*) from (
select distinct
plan_code,
fis_we_dt
from src
)
union all
select 'count_distinct' as strat,count(
distinct
plan_code,
fis_we_dt
)
from src
union all
select 'group_by_subquery' as strat, count(*) from (
select *
from src
group by
plan_code,
fis_we_dt
);
The Snowflake documentation gives the COUNT syntax as:
COUNT(*)
COUNT( [ DISTINCT ] <expr1> [ , <expr2> ... ] )
where <expr1> can also be Alias.*, which indicates that the function should return the number of rows that do not contain any NULLs. See Examples for an example.
The documentation section Aggregate Functions and NULL Values adds:
Some aggregate functions can be passed more than one column. For example:
SELECT COUNT(col1, col2) FROM table1;
In these instances, the aggregate function ignores a row if any individual column is NULL.

As per Lukasz's answer:
with data(col1, col2) as (
select * from values
(1, 10),
(1, 10),
(1, null),
(null, null),
(null, null)
), unions as (
select
'count_distinct_subquery' as strat,
count(*) as count
from (
select distinct col1, col2
from data
)
union all
select
'count_distinct' as strat,
count(distinct col1, col2)
from data
union all
select
'group_by_subquery' as strat,
count(*)
from (
select *
from data
group by 1,2
)
)
select * from unions;
gives:
STRAT                    COUNT
count_distinct_subquery  3
count_distinct           1
group_by_subquery        3
The first and last are the same thing. The sub-select DISTINCT and the GROUP BY both find the distinct combinations, AND they respect NULLs as values, and then you count the number of rows present, which is 3.
The middle is asking for the count of distinct rows containing no NULLs, which is 1.
The documentation says expr1 should be either:
A column name, which can be a qualified name (e.g. database.schema.table.column_name).
Alias.*, which indicates that the function should return the number of rows that do not contain any NULLs. See Examples for an example.
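As a side note (my own sketch, not part of this answer), that Alias.* form gives the NULL-free row count directly; against the src table from the question it would look like:
-- counts only the rows of src in which no column is NULL
select count(s.*) from src as s;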
This can be further seen by taking the sub-selects from the first and third queries and wrapping those in a count(distinct col1, col2) instead of the count(*):
select
'group_by_subquery_count_cols' as strat,
count(distinct col1, col2)
from (
select *
from data
group by 1,2
)
union all
select
'count_distinct_subquery_count_cols' as strat,
count(distinct col1, col2)
from (
select distinct col1, col2
from data
)
and now we get 1s, not 3s, from the same data:
STRAT                               COUNT
count_distinct_subquery             3
count_distinct                      1
group_by_subquery                   3
group_by_subquery_count_cols        1
count_distinct_subquery_count_cols  1
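If you need the NULL-respecting distinct count as a single expression rather than via a subquery, one option (my own sketch, not part of the original answer) is to hash the column combination first: Snowflake's HASH accepts multiple arguments and never returns NULL, so it yields 3 for the sample data (ignoring the vanishingly small chance of hash collisions).
with data(col1, col2) as (
select * from values (1, 10), (1, 10), (1, null), (null, null), (null, null)
)
select count(distinct hash(col1, col2)) as null_respecting_count
from data;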

Related

If an Array_Agg of structs has duplicate data, how can we delete it in BigQuery?

If an Array_Agg of structs has duplicate data, how can we delete it in BigQuery? E.g.:
(1, [1,2,3]), (2, [3,4,5]), (1, [1,2,3])
Required output: (1, [1,2,3]), (2, [3,4,5])
To remove duplicates in your array of structs, you can unnest your arrays to flatten the data and then select distinct values. When duplicates are removed, recreate the arrays and structs. See approach below:
with sample_data as (
select struct(1 as col1,[1,2,3] as col2) as test
union all select struct(2 as col1 ,[3,4,5] as col2) as test
union all select struct(1 as col1,[1,2,3] as col2) as test
),
t1 as (
select array_agg(test) as arr_of_structs from sample_data
),
-- disregard query above since it is for generating the sample data
remove_dups as (
select
distinct d_1.col1, d_2
from t1,
unnest(arr_of_structs) as d_1,
unnest(d_1.col2) as d_2
),
rebuild_struct as (
select
struct(col1,array_agg(d_2) as col2) as test
from remove_dups
group by col1
)
select array_agg(test) as arr_of_structs from rebuild_struct
Output:
one more option:
with your_table as (
select ([struct(1 as id ,[1,2,3] as arr), (2, [3,4,5]), (1, [1,2,3])]) array_agg_col
)
select array(
select any_value(x)
from t.array_agg_col x
group by format('%t', x)
) new_col
from your_table t
with output
You may consider this:
WITH sample_table AS (
SELECT [STRUCT(1 AS int_field, [1,2,3] AS arr_field), (2, [3,4,5]), (1, [1,2,3])] array_agg_col
)
SELECT ARRAY(
SELECT AS STRUCT *
FROM t.array_agg_col t1
QUALIFY ROW_NUMBER() OVER (PARTITION BY TO_JSON_STRING(t1)) = 1
) deduped_col
FROM sample_table t;

Selecting distinct values within a group

I want to select distinct values of one variable within a group defined by another variable. What is the easiest way?
My first thought was to combine group by and distinct but it does not work. I tried something like:
select distinct col2, col1 from myTable
group by col1
I have looked at this one here but can't seem to solve my problem: Using DISTINCT along with GROUP BY in SQL Server
Table example
If your requirement is to pick distinct combinations of COL1 and COL2, then there is no need to group by; just use
SELECT DISTINCT COL1, COL2 FROM TABLE1;
But if you want to GROUP BY, then one record per group is automatically displayed, and you have to use an aggregate function on one of the columns, i.e.
SELECT COL1, COUNT(COL2)
FROM TABLE1 GROUP BY COL1;
No need to group by; just use distinct:
select distinct col2, col1 from myTable
create table t as
with inputs(val, id) as
(
select 'A', 1 from dual union all
select 'A', 1 from dual union all
select 'A', 2 from dual union all
select 'B', 1 from dual union all
select 'B', 2 from dual union all
select 'C', 3 from dual
)
select * from inputs;
The above creates your table and the below is the solution (12c and later):
select * from t
match_recognize
(
partition by val
order by id
all rows per match
pattern ( a {- b* -} )
define b as val = a.val and id = a.id
);
Output:

How to find whether all column records are the same or not within a GROUP BY column in SQL

How to find whether all values of a column are the same within a GROUP BY of rows in a table.
CREATE TABLE #Temp (ID int,Value char(1))
insert into #Temp (ID, Value)
Select 1, 'A' union all
Select 1, 'W' union all
Select 1, 'I' union all
Select 2, 'I' union all
Select 2, 'I' union all
Select 3, 'A' union all
Select 3, 'B' union all
Select 3, '1'
select * from #Temp
Sample Table:
How to find whether all values of the 'Value' column are the same or not when grouping by the 'ID' column.
Ex: select ID from #Temp group by ID
For ID 1 - Value column records are A, W, I - Not Same
For ID 2 - Value column records are I, I - Same
For ID 3 - Value column records are A, B, 1 - Not Same
I want the query to get a result like below
When all items in the group are the same, COUNT(DISTINCT Value) would be 1:
SELECT Id
, CASE WHEN COUNT(DISTINCT Value)=1 THEN 'Same' ELSE 'Not Same' END AS Result
FROM MyTable
GROUP BY Id
If you're using T-SQL, perhaps this will work for you:
SELECT t.ID,
CASE WHEN MAX(t.RN) = COUNT(*) THEN 'Same' ELSE 'Not Same' END AS GroupResults -- all rows share one value only when that value repeats COUNT(*) times
FROM(
SELECT *, ROW_NUMBER() OVER(PARTITION BY ID, VALUE ORDER BY ID) RN
FROM #Temp
) t
GROUP BY t.ID
Usually that's rather easy: aggregate per ID and either count distinct values or compare the minimum and maximum value.
However, neither COUNT(DISTINCT value) nor MIN(value) nor MAX(value) takes NULLs into consideration. So for an ID having the values 'A' and NULL, these would detect uniqueness. Maybe this is what you want, or NULLs don't even occur in your data.
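For completeness, a sketch of that min/max variant against the #temp table from the question (the same NULL caveat applies):
select id, case when min(value) = max(value) then 'same' else 'not same' end as result
from #temp
group by id
order by id;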
But if you want nulls to count as a value, then select distinct values first (where null gets a row too) and count then:
select id, case when count(*) = 1 then 'same' else 'not same' end as result
from (select distinct id, value from #temp) dist
group by id
order by id;
Rextester demo: http://rextester.com/KCZD88697

How to join two tables with the same number of rows in SQLite?

I have almost the same problem as described in this question. I have two tables with the same number of rows, and I would like to join them together one by one.
The tables are ordered, and I would like to keep this order after the join, if it is possible.
There is a rowid-based solution for MSSQL, but in SQLite rowid cannot be used if the table comes from a WITH statement (or a recursive WITH).
It is guaranteed that the two tables have exactly the same number of rows, but this number is not known beforehand. It is also important to note that the same element may occur more than twice. The results are ordered, but none of the columns are unique.
Example code:
WITH
table_a (n) AS (
SELECT 2
UNION ALL
SELECT 4
UNION ALL
SELECT 5
),
table_b (s) AS (
SELECT 'valuex'
UNION ALL
SELECT 'valuey'
UNION ALL
SELECT 'valuez'
)
SELECT table_a.n, table_b.s
FROM table_a
LEFT JOIN table_b ON ( table_a.rowid = table_b.rowid )
The result I would like to achieve is:
(2, 'valuex'),
(4, 'valuey'),
(5, 'valuez')
SQLFiddle: http://sqlfiddle.com/#!5/9eecb7/6888
This is quite complicated in SQLite -- because you are allowing duplicates. But you can do it. Here is the idea:
Summarize the table by the values.
For each value, get the count and offset from the beginning of the values.
Then use a join to associate the values and figure out the overlap.
Finally use a recursive CTE to extract the values that you want.
The following code assumes that n and s are ordered -- as you specify in your question. However, it would work (with small modifications) if another column specified the ordering.
You will notice that I have included duplicates in the sample data:
WITH table_a (n) AS (
SELECT 2 UNION ALL
SELECT 4 UNION ALL
SELECT 4 UNION ALL
SELECT 4 UNION ALL
SELECT 5
),
table_b (s) AS (
SELECT 'valuex' UNION ALL
SELECT 'valuey' UNION ALL
SELECT 'valuey' UNION ALL
SELECT 'valuez' UNION ALL
SELECT 'valuez'
),
a as (
select a.n, count(*) as a_cnt,
(select count(*) from table_a a2 where a2.n < a.n) as a_offset
from table_a a
group by a.n
),
b as (
select b.s, count(*) as b_cnt,
(select count(*) from table_b b2 where b2.s < b.s) as b_offset
from table_b b
group by b.s
),
ab as (
select a.*, b.*,
max(a.a_offset, b.b_offset) as offset,
min(a.a_offset + a.a_cnt, b.b_offset + b.b_cnt) - max(a.a_offset, b.b_offset) as cnt
from a join
b
on a.a_offset + a.a_cnt - 1 >= b.b_offset and
a.a_offset <= b.b_offset + b.b_cnt - 1
),
cte as (
select n, s, offset, cnt, 1 as ind
from ab
union all
select n, s, offset, cnt, ind + 1
from cte
where ind < cnt
)
select n, s
from cte
order by n, s;
Here is a DB Fiddle showing the results.
I should note that this would be much simpler in almost any other database, using window functions (or perhaps variables in MySQL).
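To illustrate that remark (my own sketch, not part of the original answer): in any database with window functions, including SQLite 3.25 and later, the pairing collapses to a ROW_NUMBER() join, assuming n and s define the ordering as above:
WITH table_a (n) AS (
SELECT 2 UNION ALL SELECT 4 UNION ALL SELECT 5
),
table_b (s) AS (
SELECT 'valuex' UNION ALL SELECT 'valuey' UNION ALL SELECT 'valuez'
),
a AS (SELECT n, ROW_NUMBER() OVER (ORDER BY n) AS rn FROM table_a),
b AS (SELECT s, ROW_NUMBER() OVER (ORDER BY s) AS rn FROM table_b)
SELECT a.n, b.s
FROM a
LEFT JOIN b ON b.rn = a.rn;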
Since the tables are ordered, you can add row_id values by comparing the n values.
But still, the best way to get good performance would be to insert the ID values while creating the tables.
http://sqlfiddle.com/#!5/9eecb7/7014
WITH
table_a_a (n, id) AS
(
WITH table_a (n) AS
(
SELECT 2
UNION ALL
SELECT 4
UNION ALL
SELECT 5
)
SELECT table_a.n, (select count(1) from table_a b where b.n <= table_a.n) id
FROM table_a
) ,
table_b_b (n, id) AS
(
WITH table_a (n) AS
(
SELECT 'valuex'
UNION ALL
SELECT 'valuey'
UNION ALL
SELECT 'valuez'
)
SELECT table_a.n, (select count(1) from table_a b where b.n <= table_a.n) id
FROM table_a
)
select table_a_a.n,table_b_b.n from table_a_a,table_b_b where table_a_a.ID = table_b_b.ID
Or convert the input set to a comma-separated list and try it like this:
http://sqlfiddle.com/#!5/9eecb7/7337
WITH RECURSIVE table_b( id,element, remainder ) AS (
SELECT 0,NULL AS element, 'valuex,valuey,valuz,valuz' AS remainder
UNION ALL
SELECT id+1,
CASE
WHEN INSTR( remainder, ',' )>0 THEN
SUBSTR( remainder, 0, INSTR( remainder, ',' ) )
ELSE
remainder
END AS element,
CASE
WHEN INSTR( remainder, ',' )>0 THEN
SUBSTR( remainder, INSTR( remainder, ',' )+1 )
ELSE
NULL
END AS remainder
FROM table_b
WHERE remainder IS NOT NULL
),
table_a( id,element, remainder ) AS (
SELECT 0,NULL AS element, '2,4,5,7' AS remainder
UNION ALL
SELECT id+1,
CASE
WHEN INSTR( remainder, ',' )>0 THEN
SUBSTR( remainder, 0, INSTR( remainder, ',' ) )
ELSE
remainder
END AS element,
CASE
WHEN INSTR( remainder, ',' )>0 THEN
SUBSTR( remainder, INSTR( remainder, ',' )+1 )
ELSE
NULL
END AS remainder
FROM table_a
WHERE remainder IS NOT NULL
)
SELECT table_b.element, table_a.element FROM table_b, table_a WHERE table_a.element IS NOT NULL and table_a.id = table_b.id;
SQL
SELECT a1.n, b1.s
FROM table_a a1
LEFT JOIN table_b b1
ON (SELECT COUNT(*) FROM table_a a2 WHERE a2.n <= a1.n) =
(SELECT COUNT(*) FROM table_b b2 WHERE b2.s <= b1.s)
Explanation
The query simply counts the number of rows up until the current one for each table (based on the ordering column) and joins on this value.
Demo
See SQL Fiddle demo.
Assumptions
A single column is used for the ordering in each table. (But the query could easily be modified to allow multiple ordering columns.)
The ordering values in each table are unique.
The values in the ordering column aren't necessarily the same between the two tables.
It is known that table_a contains either the same or more rows than table_b. (If this isn't the case then a FULL OUTER JOIN would need to be emulated since SQLite doesn't provide one; a sketch of such an emulation follows this list.)
No further changes to the table structure are allowed. (If they are, it would be more efficient to have pre-populated columns for the ordering).
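Regarding the FULL OUTER JOIN emulation mentioned above, here is a rough sketch of how it could look (my assumption, reusing the question's table_a/table_b and the same counting join; note that recent SQLite versions, 3.39+, support FULL OUTER JOIN natively):
SELECT a1.n, b1.s
FROM table_a a1
LEFT JOIN table_b b1
ON (SELECT COUNT(*) FROM table_a a2 WHERE a2.n <= a1.n) =
(SELECT COUNT(*) FROM table_b b2 WHERE b2.s <= b1.s)
UNION ALL
-- add the table_b rows that found no partner in table_a
SELECT a1.n, b1.s
FROM table_b b1
LEFT JOIN table_a a1
ON (SELECT COUNT(*) FROM table_a a2 WHERE a2.n <= a1.n) =
(SELECT COUNT(*) FROM table_b b2 WHERE b2.s <= b1.s)
WHERE a1.n IS NULL;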
Either way...
Use something like
WITH
v_table_a (n, rowid) AS (
SELECT 2, 1
UNION ALL
SELECT 4, 2
UNION ALL
SELECT 5, 3
),
v_table_b (s, rowid) AS (
SELECT 'valuex', 1
UNION ALL
SELECT 'valuey', 2
UNION ALL
SELECT 'valuez', 3
)
SELECT v_table_a.n, v_table_b.s
FROM v_table_a
LEFT JOIN v_table_b ON ( v_table_a.rowid = v_table_b.rowid );
for "virtual" tables (with WITH or without),
WITH RECURSIVE vr_table_a (n, rowid) AS (
VALUES (2, 1)
UNION ALL
SELECT n + 2, rowid + 1 FROM vr_table_a WHERE rowid < 3
)
, vr_table_b (s, rowid) AS (
VALUES ('I', 1)
UNION ALL
SELECT s || 'I', rowid + 1 FROM vr_table_b WHERE rowid < 3
)
SELECT vr_table_a.n, vr_table_b.s
FROM vr_table_a
LEFT JOIN vr_table_b ON ( vr_table_a.rowid = vr_table_b.rowid );
for "virtual" tables using recursive WITHs (in this example the values are others then yours, but I guess you get the point) and
CREATE TABLE p_table_a (n INT);
INSERT INTO p_table_a VALUES (2), (4), (5);
CREATE TABLE p_table_b (s VARCHAR(6));
INSERT INTO p_table_b VALUES ('valuex'), ('valuey'), ('valuez');
SELECT p_table_a.n, p_table_b.s
FROM p_table_a
LEFT JOIN p_table_b ON ( p_table_a.rowid = p_table_b.rowid );
for physical tables.
I'd be careful with the last one though. A quick test shows that rowid values are a) reused -- when some rows are deleted and others are inserted, the inserted rows can get the rowids of the old rows (i.e. rowid in SQLite isn't unique past the lifetime of a row, whereas e.g. Oracle's rowid AFAIR is) -- and b) correspond to the order of insertion. But I don't know, and didn't find a clue in the documentation, whether that's guaranteed or is subject to change in other/future implementations. Or maybe it's just a mere coincidence in my test environment.
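A minimal way to reproduce the reuse observation (my own sketch; this is observed behaviour of SQLite tables without AUTOINCREMENT, not a documented guarantee):
CREATE TABLE demo (v TEXT);
INSERT INTO demo VALUES ('a'), ('b'), ('c');   -- rowids 1, 2, 3
DELETE FROM demo WHERE v = 'c';                -- removes the row holding the largest rowid
INSERT INTO demo VALUES ('d');                 -- typically receives rowid 3 again
SELECT rowid, v FROM demo ORDER BY rowid;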
(In general the physical order of rows may be subject to change (even within the same database using the same DBMS, as a result of some reorganization) and is therefore not a good thing to rely on. Nor is it guaranteed that a query will return the result ordered by physical position in the table (it might use the order of some index instead, or have a partial result ordered some other way that influences the output's order). Consider designing your tables with common (sort) keys in corresponding rows to order by and join on.)
You can create temp tables to carry the CTE data rows, then JOIN them on the SQLite rowid column.
CREATE TEMP TABLE temp_a(n integer);
CREATE TEMP TABLE temp_b(n VARCHAR(255));
WITH table_a(n) AS (
SELECT 2 n
UNION ALL
SELECT 4
UNION ALL
SELECT 5
UNION ALL
SELECT 5
)
INSERT INTO temp_a (n) SELECT n FROM table_a;
WITH table_b (n) AS
(
SELECT 'valuex'
UNION ALL
SELECT 'valuey'
UNION ALL
SELECT 'valuez'
UNION ALL
SELECT 'valuew'
)
INSERT INTO temp_b (n) SELECT n FROM table_b;
SELECT *
FROM temp_a a
INNER JOIN temp_b b on a.rowid = b.rowid;
sqlfiddle:http://sqlfiddle.com/#!5/9eecb7/7252
It is possible to use the rowid inside a with statement but you need to select it and make it available to the query using it.
Something like this:
with tablea AS (
select id, rowid AS rid from someids),
tableb AS (
select details, rowid AS rid from somedetails)
select tablea.id, tableb.details
from
tablea
left join tableb on tablea.rid = tableb.rid;
It is, however, as they have already warned you, a really bad idea. What if the app breaks after inserting into one table but before the other one? What if you delete an old row? If you want to join two tables, you need to specify the field to do so. There are so many things that could go wrong with this design. The most similar thing to this would be an incremental id field that you save in the table and use in your application. Even simpler, make those into one table.
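A sketch of that suggestion, using the same table and column names as the example above (the rid values here are ordinary columns that the application fills in for both tables):
CREATE TABLE someids (rid INTEGER PRIMARY KEY, id INTEGER);
CREATE TABLE somedetails (rid INTEGER PRIMARY KEY, details TEXT);
-- the application inserts matching rid values into both tables, then:
SELECT someids.id, somedetails.details
FROM someids
LEFT JOIN somedetails ON somedetails.rid = someids.rid;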
Read this link for more information about the rowid: https://www.sqlite.org/lang_createtable.html#rowid
sqlfiddle: http://sqlfiddle.com/#!7/29fd8/1
The problem statement indicates:
The tables are ordered
If this means that the ordering is defined by the ordering of the values in the UNION ALL statements, and if SQLite respects that ordering, then the following solution may be of interest because, apart from small tweaks to the last three lines of the sample program, it adds just two lines:
A(rid,n) AS (SELECT ROW_NUMBER() OVER ( ORDER BY 1 ) rid, n FROM table_a),
B(rid,s) AS (SELECT ROW_NUMBER() OVER ( ORDER BY 1 ) rid, s FROM table_b)
That is, table A is table_a augmented with a rowid, and similarly for table B.
Unfortunately, there is a caveat, though it might just be the result of my not having found the relevant specifications. Before delving into that, however, here is the full proposed solution:
WITH
table_a (n) AS (
SELECT 2
UNION ALL
SELECT 4
UNION ALL
SELECT 5
),
table_b (s) AS (
SELECT 'valuex'
UNION ALL
SELECT 'valuey'
UNION ALL
SELECT 'valuez'
),
A(rid,n) AS (SELECT ROW_NUMBER() OVER ( ORDER BY 1 ) rid, n FROM table_a),
B(rid,s) AS (SELECT ROW_NUMBER() OVER ( ORDER BY 1 ) rid, s FROM table_b)
SELECT A.n, B.s
FROM A LEFT JOIN B
ON ( A.rid = B.rid );
Caveat
The proposed solution has been tested against a variety of data sets using sqlite version 3.29.0, but whether or not it is, and will continue to be, "guaranteed" to work is unclear to me.
Of course, if SQLite offers no guarantees with respect to the ordering of the UNION ALL statements (that is, if the question is based on an incorrect assumption), then it would be interesting to see a well-founded reformulation.

How to obtain count of record differences in the same table, where there are distinct and nearly-distinct records

I've a table TABLEA with data as below
field1 field2 field3.......field16
123 10-JAN-12 0.8.......ABC
123 10-JAN-12 0.8.......ABC
.
.
.
123 10-JAN-12 0.7.......ABC
245 11-JAN-12 0.3.......CDE
245 11-JAN-12 0.3.......CDE
245 11-JAN-12 0.3.......XYZ
...
<unique rows>
When I do a
select field1, field2, ...field16
from TABLEA
I obtain M records,and when I do a
select distinct field1, field2...field16
from TABLEA
I obtain M-x records, where M is in the millions and x is a much smaller number.
I am trying to write SQL to get the x records (eventually, just get the count).
I've tried all Set operator keywords like
select field1...field16
from TABLEA
EXCEPT
select distinct field1..field16
from TABLEA
Or using UNION ALL instead of EXCEPT. But none of them return x; instead they all return 0 rows.
You can select the rows that are not distinct by
SELECT field1, ... , field16
FROM tablea
GROUP BY field1, ... , field16
HAVING count(*) > 1
Edit: Another approach would be to use the analytic function ROW_NUMBER(), partitioning by all your field columns. The first (i.e. distinct) row for a given set of fields has ROW_NUMBER = 1, the second = 2, the third = 3, etc. So you can select the x rows with WHERE ROW_NUMBER > 1.
CREATE TABLE tablea (
field1 NUMBER, field2 DATE, field3 NUMBER, field16 VARCHAR2(10)
);
INSERT INTO tablea VALUES (123, DATE '2012-01-10', 0.8, 'ABC');
INSERT INTO tablea VALUES (123, DATE '2012-01-10', 0.8, 'ABC');
INSERT INTO tablea VALUES (123, DATE '2012-01-10', 0.7, 'ABC');
INSERT INTO tablea VALUES (245, DATE '2012-01-11', 0.3, 'CDE');
INSERT INTO tablea VALUES (245, DATE '2012-01-11', 0.3, 'CDE');
INSERT INTO tablea VALUES (245, DATE '2012-01-11', 0.3, 'XYZ');
To select the duplicate rows x:
SELECT *
FROM (
SELECT field1, field2, field3, field16,
ROWID AS rid,
ROW_NUMBER() OVER (PARTITION BY
field1, field2, field3, field16 ORDER BY ROWID) as rn
FROM tablea
)
WHERE rn > 1;
123 10.01.2012 0.8 ABC AAAJ6mAAEAAAAExAAB 2
245 11.01.2012 0.3 CDE AAAJ6mAAEAAAAExAAE 2
You will get what you want with your own EXCEPT query that you have posted above. But you must include the ALL keyword in your EXCEPT, as EXCEPT DISTINCT is the default. So I have just added the ALL keyword below in your query itself:
select field1...field16
from TABLEA
EXCEPT ALL
select distinct field1..field16
from TABLEA
If you want a count of the records of M-x then make the above query a subquery in the FROM clause of another query and have count in that outer query and you would get the count as shown below:
Select count(*)
From
(
select field1...field16
from TABLEA
EXCEPT ALL
select distinct field1..field16
from TABLEA
) B
Guess this is what you are looking for.
You are not going to get a count of rows that are not in your DISTINCT result if your column choices are the same. DISTINCT shows each distinct possibility among all results, so doing a UNION ALL is just going to repeat it, and EXCEPT is never going to find anything because you are limiting out your rows. What are you even trying to do? Count where the duplicates are happening? The answer you got from Wolfgang does that already.
CREATE TABLE #Table (personID int identity, person varchar(8));
insert into #Table values ('Brett'),('Brett'),('Brett'),('John'),('John'),('Peter');
-- gives me all results
select person
from #Table
-- gives me distinct results (no repeats)
Select distinct person
from #Table
-- gives me nothing as nothing exists that is distinct that is not in total
select person
from #Table
except
select distinct person
from #Table
-- shows me counts of rows repeated by pivoting on one column and counting resultant rows from that. Having clause adds predicate specific logic to hunt for.
-- in this case duplicates or rows greater than one
Select person, count(*)
from #Table
group by person
having count(*) > 1
EDIT: you can get the difference of the distinct count from the total if that is what you mean:
with dupes as
(
Select count(*) as cnts, sum(count(*)) over() as TotalDupes
from #Table
group by person
having count(*) > 1 -- dupes are defined by rows repeating
)
, uniques as
(
Select count(*) as cnts, sum(count(*)) over() as TotalUniques
from #Table
group by person
having count(*) = 1 -- non dupes are rows of only a single resulting row
)
select distinct TotalDupes - TotalUniques as DifferenceFromRepeatsToUniques
from Dupes, Uniques