SQL/HIVE - Distinct count query - How does SELECT COUNT(DISTINCT columns, ...) differ from SELECT COUNT(*) over a subquery of DISTINCT records?

In Hive, I tried getting the count of distinct rows in two ways:
SELECT COUNT (*) FROM (SELECT DISTINCT columns FROM table);
SELECT COUNT (DISTINCT columns) FROM table;
The two queries return different results.
The count from the first query is greater than the count from the second.
How do they work differently?
Thanks in advance.

Make a slight change to your query, i.e. give your subquery a name (Hive requires an alias on every subquery in the FROM clause), e.g.:
SELECT COUNT (*) FROM (SELECT DISTINCT columns FROM table) myquery;

Try this in Hive:
SELECT COUNT (DISTINCT nvl(columns,'NA')) FROM table;
or:
SELECT COUNT (DISTINCT coalesce(columns,'NA')) FROM table;
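Either query's output will be the same as that of:
SELECT COUNT (*) FROM (SELECT DISTINCT columns FROM table) t;
The difference comes from NULLs: COUNT(DISTINCT ...) skips every row in which a counted expression is NULL, while SELECT DISTINCT keeps such rows and treats NULL as an ordinary value when comparing them. Replacing NULL with 'NA' makes the two counts agree, provided 'NA' does not already occur as a real value. With several columns, wrap each column in nvl or coalesce separately.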
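A minimal sketch of the NULL effect, using Python's built-in sqlite3 module with SQLite standing in for Hive; the table t, its columns a and b, and the rows are hypothetical. SQLite's COUNT(DISTINCT ...) accepts only one argument, so the two-column Hive form is emulated here by filtering out rows that contain a NULL, on the assumption that this mirrors Hive's COUNT(DISTINCT a, b), which counts only rows whose expressions are all non-NULL.

```python
import sqlite3

# In-memory SQLite database; table t and columns a, b are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "x"), (1, "x"), (2, None), (None, "y"), (3, "z")],
)


def scalar(sql):
    """Run a query that returns one value and return that value."""
    return conn.execute(sql).fetchone()[0]


# Single column: the DISTINCT subquery keeps NULL as one value, COUNT(DISTINCT) drops it.
single_subquery = scalar("SELECT COUNT(*) FROM (SELECT DISTINCT a FROM t) q")
single_count_distinct = scalar("SELECT COUNT(DISTINCT a) FROM t")
single_coalesced = scalar("SELECT COUNT(DISTINCT COALESCE(a, 'NA')) FROM t")

# Two columns: DISTINCT keeps (2, NULL) and (NULL, 'y') as rows of their own.
multi_subquery = scalar("SELECT COUNT(*) FROM (SELECT DISTINCT a, b FROM t) q")
# Emulates Hive's COUNT(DISTINCT a, b): rows with any NULL are skipped.
multi_non_null = scalar(
    "SELECT COUNT(*) FROM (SELECT DISTINCT a, b FROM t "
    "WHERE a IS NOT NULL AND b IS NOT NULL) q"
)

print(single_subquery, single_count_distinct, single_coalesced)  # 4 3 4
print(multi_subquery, multi_non_null)  # 4 2
```

The subquery count is larger exactly by the number of distinct NULL-containing rows, which matches the symptom in the question.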

Related

PostgreSQL create count, count distinct columns

I'm fairly new to PostgreSQL and am trying out a few count queries. I'm looking to count, and count distinct, all values in a table. Pretty straightforward:
CountD  Count
351     400
With a query like this:
SELECT COUNT(*)
COUNT(id) AS count_id,
COUNT DISTINCT(id) AS count_d_id
FROM table
I see that I can create a single column this way:
SELECT COUNT(*) FROM (SELECT DISTINCT id FROM table) AS count_d_id
But the column title (count_d_id) doesn't come through properly, and I'm unsure how to add an additional column. Guidance appreciated.
This is the correct syntax:
SELECT COUNT(id) AS count_id,
COUNT(DISTINCT id) AS count_d_id
FROM table
Your original query aliases the subquery rather than the column. You seem to want:
SELECT COUNT(*) AS count_d_id                -- column alias
FROM (SELECT DISTINCT id FROM table) t       -- subquery alias
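A small sketch, assuming Python's sqlite3 module as a stand-in for PostgreSQL and a hypothetical table items (table itself is a reserved word), showing that both counts fit in one SELECT and that each AS alias comes through as the result column's name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO items VALUES (?)", [(1,), (1,), (2,), (3,), (3,), (3,)])

# Both counts in one SELECT; each AS names a result column.
cur = conn.execute(
    "SELECT COUNT(id) AS count_id, COUNT(DISTINCT id) AS count_d_id FROM items"
)
both_names = [col[0] for col in cur.description]
both_values = cur.fetchone()

# Subquery form: AS count_d_id names the column, t names the subquery.
cur = conn.execute(
    "SELECT COUNT(*) AS count_d_id FROM (SELECT DISTINCT id FROM items) t"
)
sub_names = [col[0] for col in cur.description]
sub_values = cur.fetchone()

print(both_names, both_values)  # ['count_id', 'count_d_id'] (6, 3)
print(sub_names, sub_values)    # ['count_d_id'] (3,)
```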

I want to run an EXCEPT query with a row count

I want to get the row count of both tables and then subtract one from the other, so the result should be zero. Can you please provide me the right syntax? I am using SQL Server; the source table is in Oracle and the target is in Teradata. Thanks in advance.
Currently I am using the following syntax:
SELECT COUNT (*) FROM Table1.[BATCH] except SELECT count (*) FROM table2;
Just do two subqueries and subtract them:
SELECT (SELECT COUNT (*) FROM Table1.[BATCH]) - (SELECT count (*) FROM table2);
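Alternatively, you can run your EXCEPT query as a subquery. This counts the distinct rows of Table1 that have no identical row in Table2, so a result of zero means every row of Table1 also exists in Table2: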
SELECT Count(*)
FROM (SELECT *
FROM Table1
EXCEPT
SELECT *
FROM Table2) T
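A sketch contrasting the three forms, with SQLite (via Python's sqlite3 module) standing in for SQL Server and hypothetical tables src and tgt that have the same row count but one differing row. It shows why the original query does not return zero: EXCEPT between two one-row count results returns no row when the counts match and the source count when they differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER, name TEXT)")  # hypothetical source
conn.execute("CREATE TABLE tgt (id INTEGER, name TEXT)")  # hypothetical target
conn.executemany("INSERT INTO src VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
conn.executemany("INSERT INTO tgt VALUES (?, ?)", [(1, "a"), (2, "b"), (4, "d")])

# Original attempt: set difference of two one-row count results.
original = conn.execute(
    "SELECT COUNT(*) FROM src EXCEPT SELECT COUNT(*) FROM tgt"
).fetchall()

# Subtracting two scalar subqueries gives 0 when the row counts match.
difference = conn.execute(
    "SELECT (SELECT COUNT(*) FROM src) - (SELECT COUNT(*) FROM tgt)"
).fetchone()[0]

# Counting EXCEPT rows finds source rows missing from the target.
missing = conn.execute(
    "SELECT COUNT(*) FROM (SELECT * FROM src EXCEPT SELECT * FROM tgt) t"
).fetchone()[0]

print(original, difference, missing)  # [] 0 1
```

Equal counts therefore do not prove that the tables hold the same rows; the EXCEPT form checks content as well.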

Can count(*) return a different result from count(column)?

For example, can this query
select count(*) from table
return something different from this query?
select count(column) from table
COUNT(*) counts all rows.
COUNT(column) counts non-NULL values only.
So yes: the two results differ whenever column contains NULLs.
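A minimal check with Python's sqlite3 module; the table t, the column col, and its values are hypothetical, with two of the four rows NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col INTEGER)")  # hypothetical table and column
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (3,), (None,)])

# COUNT(*) counts rows; COUNT(col) skips the rows where col is NULL.
all_rows = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
non_null = conn.execute("SELECT COUNT(col) FROM t").fetchone()[0]

print(all_rows, non_null)  # 4 2
```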

COUNT of DISTINCT items in a column

Here's the SQL that runs (strangely), but it still returns the COUNT of all items, not the COUNT of DISTINCT items in the column:
SELECT DISTINCT(COUNT(columnName)) FROM tableName;
SELECT COUNT(*) FROM tableName
counts all rows in the table,
SELECT COUNT(columnName) FROM tableName
counts all the rows in the table where columnName is not null, and
SELECT COUNT(DISTINCT columnName) FROM tableName
counts the distinct non-null values of columnName (each value is counted once, however many rows contain it).
SELECT DISTINCT(COUNT(columnName)) FROM tableName
is the second query (returning, say, 42) with DISTINCT applied after the rows are counted, so it still returns 42.
You need
SELECT COUNT(DISTINCT columnName) AS Cnt
FROM tableName;
The query in your question computes the COUNT (i.e. a result set with one row) and then applies DISTINCT to that single-row result, which has no effect.
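Alternatively, count the rows of a DISTINCT subquery; unlike COUNT(DISTINCT ...), this also counts NULL as one value if the column contains any:
SELECT COUNT(*) FROM (SELECT DISTINCT columnName FROM tableName) AS t;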
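A sketch of the three queries side by side using Python's sqlite3 module; tableName and columnName come from the question, while the rows (three 'a', one 'b', one NULL) are made up so that every query gives a different answer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableName (columnName TEXT)")
conn.executemany(
    "INSERT INTO tableName VALUES (?)", [("a",), ("a",), ("a",), ("b",), (None,)]
)


def scalar(sql):
    """Run a query that returns one value and return that value."""
    return conn.execute(sql).fetchone()[0]


# DISTINCT applies to the single result row, so this is just COUNT(columnName).
distinct_of_count = scalar("SELECT DISTINCT(COUNT(columnName)) FROM tableName")
# Counts each non-NULL value once.
count_of_distinct = scalar("SELECT COUNT(DISTINCT columnName) FROM tableName")
# Counts the rows of the DISTINCT result, including the NULL row.
subquery_count = scalar(
    "SELECT COUNT(*) FROM (SELECT DISTINCT columnName FROM tableName) AS t"
)

print(distinct_of_count, count_of_distinct, subquery_count)  # 4 2 3
```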

SELECT *, COUNT(*) in SQLite

If I perform a standard query in SQLite:
SELECT * FROM my_table
I get all records in my table as expected. If I perform the following query:
SELECT *, 1 FROM my_table
I get all records as expected, with the rightmost column holding 1 in every record. But if I perform the query:
SELECT *, COUNT(*) FROM my_table
I get only ONE row (whose rightmost column holds the correct count).
Why does this happen? I'm not very good at SQL; maybe this behavior is expected? It seems very strange and illogical to me :(
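SELECT *, COUNT(*) FROM my_table is not what you want, and it's not really valid SQL: you have to GROUP BY all the columns that are not aggregates. Without a GROUP BY, the aggregate treats the whole table as a single group, which is why you get exactly one row; SQLite tolerates the extra non-aggregated columns and fills them from an arbitrary row of that group, whereas many other databases reject the query.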
You'd want something like
SELECT somecolumn,someothercolumn, COUNT(*)
FROM my_table
GROUP BY somecolumn,someothercolumn
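A sketch of both behaviors with Python's sqlite3 module; my_table comes from the question, while its columns (name, color) and rows are hypothetical. Without GROUP BY the aggregate collapses the table into one row; with GROUP BY there is one row per group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (name TEXT, color TEXT)")  # columns are hypothetical
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("apple", "red"), ("cherry", "red"), ("banana", "yellow")],
)

# One group (the whole table) -> one row; SQLite fills name/color from some row.
ungrouped = conn.execute("SELECT *, COUNT(*) FROM my_table").fetchall()

# One row per distinct color, each with its own count.
grouped = conn.execute(
    "SELECT color, COUNT(*) FROM my_table GROUP BY color ORDER BY color"
).fetchall()

print(len(ungrouped), ungrouped[0][-1])  # 1 3
print(grouped)  # [('red', 2), ('yellow', 1)]
```

Only the count in the ungrouped row is reliable; which row supplies name and color is not specified.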
If you want to count the number of records in your table, simply run:
SELECT COUNT(*) FROM your_table;
COUNT(*) is an aggregate function. Aggregate functions need to be grouped to give meaningful results. See also: count columns group by
If what you want is the total number of records in the table appended to each row, you can do something like:
SELECT *
FROM my_table
CROSS JOIN (SELECT COUNT(*) AS COUNT_OF_RECS_IN_MY_TABLE
FROM MY_TABLE)
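A sketch of the CROSS JOIN approach with Python's sqlite3 module, reusing hypothetical columns for my_table. The subquery alias c and the ORDER BY are additions for this example: SQLite accepts the unaliased subquery, but some other engines require an alias, and the ORDER BY only makes the row order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (name TEXT, color TEXT)")  # columns are hypothetical
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("apple", "red"), ("cherry", "red"), ("banana", "yellow")],
)

# The one-row count subquery is paired with every row of my_table.
rows = conn.execute(
    "SELECT * FROM my_table "
    "CROSS JOIN (SELECT COUNT(*) AS COUNT_OF_RECS_IN_MY_TABLE FROM my_table) AS c "
    "ORDER BY name"
).fetchall()

for row in rows:
    print(row)
# ('apple', 'red', 3)
# ('banana', 'yellow', 3)
# ('cherry', 'red', 3)
```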