BigQuery Count Unique and Count Distinct - google-bigquery

I am looking for SQL to count the unique values in a column.
I am aware of DISTINCT, which tells me how many different values there are. However, I want to know how many values occur exactly once.
So if my data is Letters: {A,A,A,B,B,B,C,D}, I am looking to get:
Count Distinct = 4 {A,B,C,D} and
Count Unique = 2 {C,D} <== this is what I am looking for
I am working with BigQuery.
Thank You,
Do

The query below returns only the values that occur exactly once in the column.
SELECT col
FROM UNNEST(SPLIT('A,A,A,B,B,B,C,D')) col
GROUP BY 1 HAVING COUNT(1) = 1;
Then, you can simply count rows.
WITH uniques AS (
SELECT col
FROM UNNEST(SPLIT('A,A,A,B,B,B,C,D')) col
GROUP BY 1 HAVING COUNT(1) = 1
)
SELECT COUNT(*) cnt FROM uniques;

Another option
select count(*) from (
select * from your_table
qualify 1 = count(*) over(partition by col)
)
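If both numbers from the question are needed at once, one possible sketch is to aggregate per value first and then count over that result. It assumes BigQuery's COUNTIF function and reuses the inline sample data from the answer above; the names per_value, n, count_distinct and count_unique are made up for illustration.

```sql
-- Sketch (BigQuery): one row per value with its number of occurrences,
-- then count all values and the values that occur exactly once.
WITH per_value AS (
  SELECT col, COUNT(*) AS n
  FROM UNNEST(SPLIT('A,A,A,B,B,B,C,D')) AS col
  GROUP BY col
)
SELECT
  COUNT(*) AS count_distinct,      -- 4 for the sample data: A, B, C, D
  COUNTIF(n = 1) AS count_unique   -- 2 for the sample data: C, D
FROM per_value;
```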

Related

How to select only values that are not repeating in a column, for example if I have a column with the values "a,b,c,a,c" I have to select only b

How to select only values that are not repeating in a column? For example, if I have a table with the following values, I expect to return only the id value b:
id
--
a
b
c
a
c
Aggregation provides one approach:
SELECT id
FROM yourTable
GROUP BY id
HAVING COUNT(*) = 1;
In the subselect you select the values that have only one record. Then, in the outer select, you fetch all the data for the values returned by the subselect.
SELECT *
FROM yourTable
WHERE id IN (SELECT id
FROM yourTable
GROUP BY id
HAVING COUNT(id) = 1)

SQL script to identify row based on min value

How to write a SQL statement (in SQL Server) to get a row with minimum value based on two columns?
For example:
Type  Rank  Val1    val2
------------------------------
A     6     486.57  38847
B     6     430     56345
C     5     390     99120
D     5     329     12390
E     4     350     11109
E     4     320     11870
The SQL statement should return the last row in the table above, because it has the minimum value for Rank and, within that Rank, the minimum Val1.
Something like this:
select *
from Table1
where rank = (select min(rank) from Table1)
and Val1 = (select min(Val1)
from Table1
where rank = (select min(rank) from Table1))
Or this, if you like a simple life:
select top 1 *
from Table1
order by rank asc, Val1 asc
with cte as (
select *, row_number() over (order by rank, val1) as rn
from dbo.yourTable
)
select *
from cte
where rn = 1;
The idea here is that I'm assigning a 1..n enumeration to the rows based on rank and, in the case of ties, Val1. I return the row that takes the value of 1. If there is the possibility of a tie, use rank() instead of row_number().
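As a sketch of the tie-handling variant mentioned above: with rank(), rows that share the lowest (Rank, Val1) pair all receive the value 1, so every tied row is returned. The table name dbo.yourTable is taken from the query above; the alias rk is made up, and bracketing [Rank] is only a precaution because the column shares its name with the function.

```sql
-- Sketch (SQL Server): return every row tied for the lowest Rank, then lowest Val1.
WITH cte AS (
    SELECT *,
           RANK() OVER (ORDER BY [Rank], Val1) AS rk  -- tied rows get the same rk
    FROM dbo.yourTable
)
SELECT *
FROM cte
WHERE rk = 1;
```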
I'm assuming that Type is the primary key for your table, and that you only want a row that has both the lowest Val1 and lowest Val2 (so if one row has the lowest Val1, but not the lowest Val2, this returns no data). I'm not sure about these assumptions, but your question could probably be clarified a bit.
Here's the code:
-- keep only rows whose Val1 and val2 both equal the table-wide minimum
SELECT
*
FROM
Table1
WHERE
Val1 = (SELECT MIN(Val1) FROM Table1)
AND val2 = (SELECT MIN(val2) FROM Table1)

Find the unique value in column MS SQL database

I have a set of data as below
number quantity
1 4
2 6
3 7
4 9
2 1
1 2
5 4
I need to find the unique value in the column "number"
The output should look like this:
number quantity
3 7
4 9
5 4
Any help would be appreciated. I am using MS SQL
In the inner query, get the numbers that occur only once, then join again with the main table to get your expected results.
select o.*
from mytable o , (select number
from mytable
group by number
having count(*) = 1) dist
where o.number = dist.number
One way to go is to write an aggregate query that counts the number of occurrences of each number and use it in a subquery:
SELECT number, quantity
FROM my_table
WHERE number IN (SELECT number
FROM my_table
GROUP BY number
HAVING COUNT(*) = 1)
If your column name is my_column in table my_table, the query is:
SELECT my_column, COUNT(*) as count
FROM my_table
GROUP BY my_column
HAVING COUNT(*) > 1
This will return all records that have duplicate my_column content, as well as how many times this content occurs in the database.
You can use the code below to list each value with its count; the rows with a count of 1 are the unique ones:
SELECT DISTINCT(my_column), COUNT(*) as count
FROM my_table
GROUP BY my_column
Try this :
SELECT *
FROM yourtable t1
WHERE (SELECT Count(*)
FROM yourtable t2
WHERE t1.number = t2.number) = 1
The query in the WHERE clause returns the number of occurrences of each number, and comparing it with 1 keeps only the rows whose number occurs exactly once in the table.
You can probably use ROW_NUMBER() analytic function like
select * from
(
select number,
quantity,
ROW_NUMBER() OVER(PARTITION BY number ORDER BY number) AS rn
from table1
) tab where rn = 1;
Try this:
create table #TableName(number int, quantity int)
insert into #TableName values(1, 2)
insert into #TableName values(1, 4)
insert into #TableName values(2, 4)
SELECT number, quantity
FROM #TableName
WHERE number
IN(SELECT number
FROM #TableName
GROUP BY number
HAVING COUNT(NUMBER) = 1)
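The same filter can also be sketched with a window count instead of a subquery on the table. This assumes SQL Server 2008 or later for the VALUES table constructor, which is used here only to inline the sample data from the question; the aliases t, counted and occurrences are made up.

```sql
-- Sketch (SQL Server): count the rows per number, keep numbers that occur once.
SELECT number, quantity
FROM (
    SELECT number,
           quantity,
           COUNT(*) OVER (PARTITION BY number) AS occurrences
    FROM (VALUES (1, 4), (2, 6), (3, 7), (4, 9), (2, 1), (1, 2), (5, 4))
         AS t(number, quantity)
) AS counted
WHERE occurrences = 1;
-- For the sample data this yields (3, 7), (4, 9) and (5, 4);
-- add an ORDER BY if the row order matters.
```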

How to select all columns for rows where I check if just 1 or 2 columns contain duplicate values

I'm having difficulty with what I figure should be an easy problem. I want to select all the columns in a table for which one particular column has duplicate values.
I've been trying to use aggregate functions, but that's constraining me as I want to just match on one column and display all values. Using aggregates seems to require that I 'group by' all columns I'm going to want to display.
If I understood you correctly, this should do:
SELECT *
FROM YourTable A
WHERE EXISTS(SELECT 1
FROM YourTable
WHERE Col1 = A.Col1
GROUP BY Col1
HAVING COUNT(*) > 1)
You can join on a derived table where you aggregate and determine "col" values which are duplicated:
SELECT a.*
FROM Table1 a
INNER JOIN
(
SELECT col
FROM Table1
GROUP BY col
HAVING COUNT(1) > 1
) b ON a.col = b.col
This query gives you a chance to ORDER BY cola in ascending or descending order and change Cola output.
Here's a Demo on SqlFiddle.
with cl
as
(
select *, ROW_NUMBER() OVER(partition by colb order by cola ) as rn
from tbl)
select *
from cl
where rn > 1

Select DISTINCT, return entire row

I have a table with 10 columns.
I want to return all rows for which Col006 is distinct, but return all columns...
How can I do this?
if column 6 appears like this:
| Column 6 |
| item1 |
| item1 |
| item2 |
| item1 |
I want to return two rows, one of the records with item1 and the other with item2, along with all other columns.
In SQL Server 2005 and above:
;WITH q AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY col6 ORDER BY id) rn
FROM mytable
)
SELECT *
FROM q
WHERE rn = 1
In SQL Server 2000, provided that you have a primary key column:
SELECT mt.*
FROM (
SELECT DISTINCT col6
FROM mytable
) mto
JOIN mytable mt
ON mt.id =
(
SELECT TOP 1 id
FROM mytable mti
WHERE mti.col6 = mto.col6
-- ORDER BY
-- id
-- Uncomment the lines above if the order matters
)
Update:
Check your database version and compatibility level:
SELECT @@VERSION
SELECT COMPATIBILITY_LEVEL
FROM sys.databases
WHERE name = DB_NAME()
The key word "DISTINCT" in SQL has the meaning of "unique value". When applied to a column in a query it will return as many rows from the result set as there are unique, different values for that column. As a consequence it creates a grouped result set, and values of other columns are random unless defined by other functions (such as max, min, average, etc.)
If you meant to say you want to return all rows for which Col006 has a specific value, then use the "where Col006 = value" clause.
If you meant to say you want to return all rows for which Col006 is different from all other values of Col006, then you still need to specify what that value is => see above.
If you want to say that the value of Col006 can only be evaluated once all rows have been retrieved, then use the "having Col006 = value" clause. This has the same effect as the "where" clause, but "where" gets applied when rows are retrieved from the raw tables, whereas "having" is applied once all other calculations have been made (i.e. aggregation functions have been run etc.) and just before the result set is returned to the user.
UPDATE:
After having seen your edit, I have to point out that if you use any of the other suggestions, you will end up with random values in all other 9 columns for the row that contains the value "item1" in Col006, due to the constraint further up in my post.
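To illustrate the difference between "where" and "having" described in this answer, here is a minimal sketch. TheTable and the column names follow the examples in this thread; the condition Col001 > 0 and the alias row_count are hypothetical and only there to show where each filter applies.

```sql
-- Sketch: WHERE filters individual rows before grouping,
-- HAVING filters whole groups after the aggregates are computed.
SELECT Col006, COUNT(*) AS row_count
FROM TheTable
WHERE Col001 > 0        -- hypothetical row-level filter
GROUP BY Col006
HAVING COUNT(*) > 1;    -- keeps only Col006 values with more than one remaining row
```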
You can group on Col006 to get the distinct values, but then you have to decide what to do with the multiple records in each group.
You can use aggregates to pick a value from the records. Example:
select Col006, min(Col001), max(Col002)
from TheTable
group by Col006
order by Col006
If you want the values to come from a specific record in each group, you have to identify it somehow. Example of using Col002 to identify the record in each group:
select t.Col006, t.Col001, t.Col002
from TheTable t
inner join (
select Col006, min(Col002) as Col002
from TheTable
group by Col006
) x on t.Col006 = x.Col006 and t.Col002 = x.Col002
order by t.Col006
SELECT *
FROM (SELECT DISTINCT YourDistinctField FROM YourTable) AS A
CROSS APPLY
( SELECT TOP 1 * FROM YourTable B
WHERE B.YourDistinctField = A.YourDistinctField ) AS NewTableName
I tried the answers posted above with no luck... but this does the trick!
select * from yourTable where column6 in (select distinct column6 from yourTable);
SELECT *
FROM harvest
GROUP BY estimated_total;
You can use GROUP BY and MIN() to get a more specific result.
Let's say that you have id as the primary key.
Say we want all the DISTINCT values of a column, for example estimated_total, and we also need one complete sample row for each distinct value. The following query should do the trick.
SELECT *, min(id)
FROM harvest
GROUP BY estimated_total;
create table #temp
(C1 TINYINT,
C2 TINYINT,
C3 TINYINT,
C4 TINYINT,
C5 TINYINT,
C6 TINYINT)
INSERT INTO #temp
SELECT 1,1,1,1,1,6
UNION ALL SELECT 1,1,1,1,1,6
UNION ALL SELECT 3,1,1,1,1,3
UNION ALL SELECT 4,2,1,1,1,6
SELECT * FROM #temp
SELECT *
FROM(
SELECT ROW_NUMBER() OVER (PARTITION BY C6 Order by C1) ID,* FROM #temp
)T
WHERE ID = 1