Apply the same aggregate to every column in a table - sql

I am using a proprietary MPP database that was forked from PostgreSQL 8.3. I am trying to apply a simple count to a wide table (around 450 columns), so I was wondering whether the best way to do this is with a simple SQL function. I am just counting the number of distinct values in a given column as well as the count of the number of null values in the column. The query I want to generalize for every column is, for example, the following. If I want to run it against the names column, I write:
select
count(distinct names) d_names,
sum(case when names is not null then 1 else 0 end) n_s_ip
from table;
How do I generalize the query above to iterate through all 450 columns in the table without writing out each column name by hand?

First, since COUNT() only counts non-null values, your query can be simplified:
SELECT count(DISTINCT names) AS unique_names
,count(names) AS names_not_null
FROM table;
But that's the number of non-null values and contradicts your description:
count of the number of null values in the column
For that you would use:
count(*) - count(names) AS names_null
This works because count(*) counts all rows, while count(names) counts only the rows where names is not null.
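A minimal sketch of the three counts, run against Python's built-in SQLite as a stand-in for the question's PostgreSQL fork. The table people and its sample rows are hypothetical; the point is only that count(DISTINCT ...) and count(col) skip NULLs while count(*) does not.

```python
import sqlite3

# Hypothetical sample data: three non-null names (two distinct) and two NULLs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (names TEXT)")
conn.executemany("INSERT INTO people VALUES (?)",
                 [("ann",), ("bob",), ("ann",), (None,), (None,)])

row = conn.execute("""
    SELECT count(DISTINCT names)   AS unique_names,   -- NULL is not a counted value
           count(names)            AS names_not_null, -- skips NULLs
           count(*) - count(names) AS names_null      -- all rows minus non-null rows
    FROM people
""").fetchone()
print(row)  # (2, 3, 2)
```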
To automate this for all columns, build the SQL statement dynamically from the system catalog table pg_attribute. You can use EXECUTE in a PL/pgSQL function to run it immediately. Full code examples, with links to the manual and explanations, are in these closely related questions:
How to perform the same aggregation on every column, without listing the columns?
postgresql - count (no null values) of each column in a table
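The same idea (read the column list from the catalog, build one query, execute it) can be sketched outside PostgreSQL. This is not the PL/pgSQL/EXECUTE version referenced above: it swaps in Python string building plus SQLite, where PRAGMA table_info plays the role of pg_attribute. The table wide, its columns, and the d_/n_ alias prefixes are hypothetical.

```python
import sqlite3


def quote_ident(name):
    """Double-quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def per_column_counts(conn, table):
    """Run one query with a distinct count and a null count for every column."""
    # PRAGMA table_info stands in for pg_attribute / information_schema.columns;
    # each row is (cid, name, type, notnull, dflt_value, pk), so r[1] is the name.
    cols = [r[1] for r in conn.execute(f"PRAGMA table_info({quote_ident(table)})")]
    select_list = []
    for col in cols:
        c = quote_ident(col)
        select_list.append(f"count(DISTINCT {c}) AS {quote_ident('d_' + col)}")
        select_list.append(f"count(*) - count({c}) AS {quote_ident('n_' + col)}")
    # Joining the pieces with ", " leaves no trailing comma to clean up.
    sql = f"SELECT {', '.join(select_list)} FROM {quote_ident(table)}"
    cur = conn.execute(sql)
    names = [d[0] for d in cur.description]
    return dict(zip(names, cur.fetchone()))


# Hypothetical three-column table standing in for the 450-column one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wide (a INTEGER, b TEXT, c REAL)")
conn.executemany("INSERT INTO wide VALUES (?, ?, ?)",
                 [(1, "x", None), (1, None, None), (2, "y", 0.5)])
stats = per_column_counts(conn, "wide")
print(stats)
# {'d_a': 2, 'n_a': 0, 'd_b': 2, 'n_b': 1, 'd_c': 1, 'n_c': 2}
```

All aggregates go into a single SELECT, so the table is scanned once rather than once per column.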

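You can generate the repetitive part of the query from information_schema.columns. Give each generated expression its own alias, otherwise every column in the result would be named d_names and n_s_ip:
select 'count(distinct '||quote_ident(column_name)||') as '||quote_ident('d_'||column_name)||', sum(case when '||quote_ident(column_name)||' is not null then 1 else 0 end) as '||quote_ident('nn_'||column_name)||','
from information_schema.columns where table_name='table'
order by ordinal_position;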
The query above generates a count(...) and a sum(...) for each column of the table. Use the result as the select list for your query by copying and pasting it into the following query:
select
-- paste here
from table;
After pasting, remove the trailing comma from the last line.
In this way, you can avoid writing select-list for 450 columns.

Related

Grouping by large number of columns with ordinal reference

I have a lengthy query in Postgres that requires grouping by 30 different columns referenced in the query.
At the moment, I have manually listed the GROUP BY columns by their ordinal positions in the query:
SELECT ...
GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30
Does a better way of doing this exist? I have tried
SELECT...
GROUP BY array(generate_series(1, 30))
But this method fails. Is there a way to use generate_series in the GROUP BY clause to reference columns in the query in cases where queries require a large number of group by columns?

SQL: How to disable result of aggregate on empty table?

When applying SQL aggregate functions (COUNT, MAX, etc.) on an empty table, I would like to get an empty result set (no rows) to simplify processing in the ORM.
Currently, the special return values (0 for COUNT, NULL for all other aggregates) are returned (assuming an empty table user):
sqlite> SELECT COUNT(id) FROM user;
count(id)
0
I know there is the trick to use GROUP BY plus HAVING clause to filter empty results, but this is rather cumbersome and I am unsure about performance:
sqlite> SELECT COUNT(id) FROM user GROUP BY 1=1 HAVING COUNT(id) > 0;
sqlite>
Thus the questions:
Is it possible to prevent aggregate functions from returning a row when the source table is empty?
Is there a performance impact of using a GROUP BY clause that is true for all entries?
A SQL aggregation query with no group by returns one row. This is by definition. It is how SQL works. Usually, this is considered a good thing and actually makes applications work better.
For instance, it is easier to check a single count column than to check the count when there are rows and separately handle the case where no row comes back.
In SQLite, you can do what you want by adding a GROUP BY. So:
select . . . -- aggregation functions only
from . . .
group by null;
This is grouping by a constant, which is functionally equivalent to no group by, unless there are no rows. This version returns an empty result set.
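A small check of the difference, using Python's bundled SQLite (the engine from the question); the output comments reflect SQLite's behaviour and the inserted ids are made up. With no GROUP BY the aggregate always yields one row; grouping by a constant yields one group per non-empty input and none for an empty table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER)")  # empty, as in the question

plain = "SELECT count(id) FROM user"
grouped = "SELECT count(id) FROM user GROUP BY NULL"

empty_plain = conn.execute(plain).fetchall()      # one row: [(0,)]
empty_grouped = conn.execute(grouped).fetchall()  # no groups, so no rows: []

conn.executemany("INSERT INTO user VALUES (?)", [(1,), (2,)])
filled_grouped = conn.execute(grouped).fetchall()  # one group: [(2,)]

print(empty_plain, empty_grouped, filled_grouped)
```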

Does select count(*)+count(*) produce a result or an error?

I was asked this question in an interview, with the options: error, 1, 2, 3.
Running it, I got the result 2:
select count(*)+COUNT(*)
result is 2
Normally all selects are of the form SELECT [columns, scalar computations on columns, grouped computations on columns, or scalar computations] FROM [table or joins of tables, etc]
Because this allows plain scalar computations we can do something like SELECT 1 + 1 FROM SomeTable and it will return a recordset with the value 2 for every row in the table SomeTable.
Now, if we didn't care about any table but just wanted to do a scalar computation, we might want to write something like SELECT 1 + 1. This isn't allowed by the standard, but it is useful and most databases allow it (Oracle historically did not, and uses its one-row DUAL table instead).
Hence such bare SELECTs are treated as if they had a FROM clause specifying a table with one row and no columns (impossible of course, but it does the trick). Hence SELECT 1 + 1 becomes SELECT 1 + 1 FROM ImaginaryTableWithOneRow, which returns a single row with a single column with the value 2.
Mostly we don't think about this, we just get used to the fact that bare SELECTs give results and don't even think about the fact that there must be some one-row thing selected to return one row.
In doing SELECT COUNT(*)+COUNT(*) you did the equivalent of SELECT COUNT(*)+COUNT(*) FROM ImaginaryTableWithOneRow: each COUNT(*) counts that single row and returns 1, so the sum is 2.
Reference: Why MySQL COUNT without table name gives 1
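The "imaginary one-row table" explanation can be checked directly in SQLite (via Python's sqlite3), which also accepts a SELECT without FROM. The third query is an extra probe, not from the original answer: filtering out the single imaginary row makes COUNT(*) drop to 0, as the explanation predicts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def scalar(sql):
    """Return the first column of the first row of a query."""
    return conn.execute(sql).fetchone()[0]


bare = scalar("SELECT count(*)")               # 1: the one imaginary row
doubled = scalar("SELECT count(*) + count(*)")  # 2: 1 + 1
filtered = scalar("SELECT count(*) WHERE 1 = 0")  # 0: that row is filtered out
print(bare, doubled, filtered)
```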

The Subquery which returns multiple rows in Oracle SQL

I have a complex SQL query with multiple subqueries. The query returns a very large amount of data. The tables are dynamic and get updated every day. Yesterday the query failed because one of the subqueries returned multiple rows.
The subquery would be something like this.
Select Value1 from Table1 where Table1.ColumnName = 123456
Table1.ColumnName will be fetched dynamically, nothing will be hardcoded. Table1.ColumnName will be fetched from another subquery which runs perfectly.
My question is:
How to find which value in the particular subquery returned two rows.
You need to check whether each subquery returns a single row or multiple rows for a given value. You can use the COUNT function to verify this:
select column_name, count(*) from table_name
group by column_name
having count(*) > 1
The query above counts the rows for each value of the column; any value that returns more than one row is the culprit.
Once you know which subquery and which column are the culprit, you could then use ROWNUM or analytic functions to limit the number of rows.
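A runnable sketch of the duplicate check, using SQLite through Python instead of Oracle. The Table1 rows are hypothetical. Note that SQLite silently takes the first row of a multi-row scalar subquery rather than raising ORA-01427 as Oracle does, so this only demonstrates the detection query, not the original failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical lookup table: key 123456 appears twice, which would make the
# scalar subquery "SELECT Value1 FROM Table1 WHERE ColumnName = 123456" fail in Oracle.
conn.execute("CREATE TABLE Table1 (ColumnName INTEGER, Value1 TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?)",
                 [(123456, "a"), (123456, "b"), (777, "c")])

# One output row per key value that occurs more than once.
dupes = conn.execute("""
    SELECT ColumnName, count(*)
    FROM Table1
    GROUP BY ColumnName
    HAVING count(*) > 1
""").fetchall()
print(dupes)  # [(123456, 2)]
```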

SELECT COUNT(*) ;

I have a database, database1, with two tables (Table 1, Table2) in it.
There are 3 rows in Table1 and 2 rows in Table2. Now if I execute the following SQL query SELECT COUNT(*); on database1, then the output is "1".
Does anyone have an idea what this "1" signifies?
The definition of the two tables is as below.
CREATE TABLE Table1
(
ID INT PRIMARY KEY,
NAME NVARCHAR(20)
)
CREATE TABLE Table2
(
ID INT PRIMARY KEY,
NAME NVARCHAR(20)
)
Normally all selects are of the form SELECT [columns, scalar computations on columns, grouped computations on columns, or scalar computations] FROM [table or joins of tables, etc]
Because this allows plain scalar computations we can do something like SELECT 1 + 1 FROM SomeTable and it will return a recordset with the value 2 for every row in the table SomeTable.
Now, if we didn't care about any table but just wanted to do a scalar computation, we might want to write something like SELECT 1 + 1. This isn't allowed by the standard, but it is useful and most databases allow it (Oracle historically did not, and uses its one-row DUAL table instead).
Hence such bare SELECTs are treated as if they had a FROM clause specifying a table with one row and no columns (impossible of course, but it does the trick). Hence SELECT 1 + 1 becomes SELECT 1 + 1 FROM ImaginaryTableWithOneRow, which returns a single row with a single column with the value 2.
Mostly we don't think about this, we just get used to the fact that bare SELECTs give results and don't even think about the fact that there must be some one-row thing selected to return one row.
In doing SELECT COUNT(*) you did the equivalent of SELECT COUNT(*) FROM ImaginaryTableWithOneRow which of course returns 1.
Along similar lines the following also returns a result.
SELECT 'test'
WHERE EXISTS (SELECT *)
The explanation for that behavior (from this Connect item) also applies to your question.
In ANSI SQL, a SELECT statement without a FROM clause is not permitted; you need to specify a table source. So the statement "SELECT 'test' WHERE EXISTS(SELECT *)" should give a syntax error. This is the correct behavior.
With respect to the SQL Server implementation, the FROM clause is optional and it has always worked this way. So you can do "SELECT 1" or "SELECT @v" and so on without requiring a table. In other database systems, there is a dummy table called "DUAL" with one row that is used for SELECT statements like "SELECT 1 FROM dual;" or "SELECT @v FROM dual;". Now, coming to the EXISTS clause: the project list doesn't matter in terms of the syntax or result of the query, and SELECT * is valid in a sub-query. Couple this with the fact that we allow SELECT without FROM, and you get the behavior that you see. We could fix it, but there is not much value in doing it and it might break existing application code.
It's because you have executed select count(*) without specifying a table.
The count function returns the number of rows in the specified dataset. If you don't specify a table to select from, a single select will only ever return a single row - therefore count(*) will return 1. (In some versions of SQL, such as Oracle, you have to specify a table or similar database object; Oracle includes a dummy table (called DUAL) which can be selected from when no specific table is required.)
You wouldn't normally execute select count(*) without specifying a table to query against. Your database server is probably giving you a count of "1" based on a default system table it is querying.
Try using
select count(*) from Table1
Without a table name it makes no sense.
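To see that the bare "1" has nothing to do with the tables in database1, here is a sketch that recreates the question's two tables in SQLite (standing in for SQL Server; the NAME values are hypothetical) and compares the bare count with the per-table counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table definitions from the question; SQLite accepts NVARCHAR(20) as a TEXT-affinity type.
conn.execute("CREATE TABLE Table1 (ID INT PRIMARY KEY, NAME NVARCHAR(20))")
conn.execute("CREATE TABLE Table2 (ID INT PRIMARY KEY, NAME NVARCHAR(20))")
conn.executemany("INSERT INTO Table1 VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
conn.executemany("INSERT INTO Table2 VALUES (?, ?)", [(1, "a"), (2, "b")])

results = {}
for sql in ("SELECT COUNT(*)",               # no table: the one-row pseudo-source
            "SELECT COUNT(*) FROM Table1",   # 3 rows
            "SELECT COUNT(*) FROM Table2"):  # 2 rows
    results[sql] = conn.execute(sql).fetchone()[0]
    print(sql, "->", results[sql])
```

The bare count stays at 1 no matter how many tables or rows the database holds.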
Without a table name it always returns 1, whatever the database.
Since this is tagged SQL Server, MSDN states:
COUNT always returns an int data type value.
Also,
COUNT(*) returns the number of items in a group. This includes NULL values and duplicates.
Thus, since you didn't provide a table to do a COUNT from, the default (assumption) is that it returns a 1.
The COUNT function returns the number of rows as its result. If you don't specify any table, it returns 1 by default, i.e., COUNT(*), COUNT(1), COUNT(2), ... will always return 1.
Select * without a FROM clause is "Select ALL from the Universe", since you have filtered out nothing.
In your case, you are asking "How many universes?"
This is exactly how I would teach it. I would write on the board on the first day,
Select * and ask what it means. Answer: Give me the world.
And from there I would teach how to filter the universe down to something meaningful.
I must admit, I never thought of Select Count(*), which would make it more interesting but still brings back a true answer. We have only one world.
Without consulting Stephen Hawking, SQL will have to contend with only 1.
The result of the query is correct.