SQL - group by both bits - sql

I have an SQL table with one bit column. If only one bit value (in my case 1) occurs in the rows of the table, how can I make a SELECT statement which shows for, example, the occurance of both bit values in the table, even if the other does not occur? This is the result I'm trying to achieve:
+----------+--------+
| IsItTrue | AMOUNT |
+----------+--------+
| 1 | 12 |
| 0 | NULL |
+----------+--------+
I already tried to google the answer but without success, as English is not my native language and I am not that familiar with SQL jargon.

select IsItTrue, count(id) as Amount from
(select IsItTrue, id from table
union
select 1 as IsItTrue, null as id
union
select 0 as IsItTrue, null as id) t
group by bool

Related

How to create a table to count with a conditional

I have a database with a lot of columns with pass, fail, blank indicators
I want to create a function to count each type of value and create a table from the counts. The structure I am thinking is something like
| Value | x | y | z |
|-------|------------------|-------------------|---|---|---|---|---|---|---|
| pass | count if x=pass | count if y=pass | count if z=pass | | | | | | |
| fail | count if x=fail | count if y=fail |count if z=fail | | | | | | |
| blank | count if x=blank | count if y=blank | count if z=blank | | | | | | |
| total | count(x) | count(y) | count (z) | | | | | | |
where x,y,z are columns from another table.
I don't know which could be the best approach for this
thank you all in advance
I tried this structure but it shows syntax error
CREATE FUNCTION Countif (columnx nvarchar(20),value_compare nvarchar(10))
RETURNS Count_column_x AS
BEGIN
IF columnx=value_compare
count(columnx)
END
RETURN
END
Also, I don't know how to add each count to the actual table I am trying to create
Conditional counting (or any conditional aggregation) can often be done inline by placing a CASE expression inside the aggregate function that conditionally returns the value to be aggregated or a NULL to skip.
An example would be COUNT(CASE WHEN SelectMe = 1 THEN 1 END). Here the aggregated value is 1 (which could be any non-null value for COUNT(). (For other aggregate functions, a more meaningful value would be provided.) The implicit ELSE returns a NULL which is not counted.
For you problem, I believe the first thing to do is to UNPIVOT your data, placing the column name and values side-by-side. You can then group by value and use conditional aggregation as described above to calculate your results. After a few more details to add (1) a totals row using WITH ROLLUP, (2) a CASE statement to adjust the labels for the blank and total rows, and (3) some ORDER BY tricks to get the results right and we are done.
The results may be something like:
SELECT
CASE
WHEN GROUPING(U.Value) = 1 THEN 'Total'
WHEN U.Value = '' THEN 'Blank'
ELSE U.Value
END AS Value,
COUNT(CASE WHEN U.Col = 'x' THEN 1 END) AS x,
COUNT(CASE WHEN U.Col = 'y' THEN 1 END) AS y
FROM #Data D
UNPIVOT (
Value
FOR Col IN (x, y)
) AS U
GROUP BY U.Value WITH ROLLUP
ORDER BY
GROUPING(U.Value),
CASE U.Value WHEN 'Pass' THEN 1 WHEN 'Fail' THEN 2 WHEN '' THEN 3 ELSE 4 END,
U.VALUE
Sample data:
x
y
Pass
Pass
Pass
Fail
Pass
Fail
Sample results:
Value
x
y
Pass
3
1
Fail
1
1
Blank
0
2
Total
4
4
See this db<>fiddle for a working example.
I think you don't need a generic solution like a function with value as parameter.
Perhaps, you could create a view grouping your data and after call this view filtering by your value.
Your view body would be something like that
select value, count(*) as Total
from table_name
group by value
Feel free to explain your situation better so I could help you.
You can do this by grouping by the status column.
select status, count(*) as total
from some_table
group by status
Rather than making a whole new table, consider using a view. This is a query that looks like a table.
create view status_counts as
select status, count(*) as total
from some_table
group by status
You can then select total from status_counts where status = 'pass' or the like and it will run the query.
You can also create a "materialized view". This is like a view, but the results are written to a real table. SQL Server is special in that it will keep this table up to date for you.
create materialized view status_counts with distribution(hash(status))
select status, count(*) as total
from some_table
group by status
You'd do this for performance reasons on a large table which does not update very often.

ORACLE SELECT DISTINCT VALUE ONLY IN SOME COLUMNS

+----+------+-------+---------+---------+
| id | order| value | type | account |
+----+------+-------+---------+---------+
| 1 | 1 | a | 2 | 1 |
| 1 | 2 | b | 1 | 1 |
| 1 | 3 | c | 4 | 1 |
| 1 | 4 | d | 2 | 1 |
| 1 | 5 | e | 1 | 1 |
| 1 | 5 | f | 6 | 1 |
| 2 | 6 | g | 1 | 1 |
+----+------+-------+---------+---------+
I need get a select of all fields of this table but only getting 1 row for each combination of id+type (I don't care the value of the type). But I tried some approach without result.
At the moment that I make an DISTINCT I cant include rest of the fields to make it available in a subquery. If I add ROWNUM in the subquery all rows will be different making this not working.
Some ideas?
My better query at the moment is this:
SELECT ID, TYPE, VALUE, ACCOUNT
FROM MYTABLE
WHERE ROWID IN (SELECT DISTINCT MAX(ROWID)
FROM MYTABLE
GROUP BY ID, TYPE);
It seems you need to select one (random) row for each distinct combination of id and type. If so, you could do that efficiently using the row_number analytic function. Something like this:
select id, type, value, account
from (
select id, type, value, account,
row_number() over (partition by id, type order by null) as rn
from your_table
)
where rn = 1
;
order by null means random ordering of rows within each group (partition) by (id, type); this means that the ordering step, which is usually time-consuming, will be trivial in this case. Also, Oracle optimizes such queries (for the filter rn = 1).
Or, in versions 12.1 and higher, you can get the same with the match_recognize clause:
select id, type, value, account
from my_table
match_recognize (
partition by id, type
all rows per match
pattern (^r)
define r as null is null
);
This partitions the rows by id and type, it doesn't order them (which means random ordering), and selects just the "first" row from each partition. Note that some analytic functions, including row_number(), require an order by clause (even when we don't care about the ordering) - order by null is customary, but it can't be left out completely. By contrast, in match_recognize you can leave out the order by clause (the default is "random order"). On the other hand, you can't leave out the define clause, even if it imposes no conditions whatsoever. Why Oracle doesn't use a default for that clause too, only Oracle knows.

Counting the total number of rows with SELECT DISTINCT ON without using a subquery

I have performing some queries using PostgreSQL SELECT DISTINCT ON syntax. I would like to have the query return the total number of rows alongside with every result row.
Assume I have a table my_table like the following:
CREATE TABLE my_table(
id int,
my_field text,
id_reference bigint
);
I then have a couple of values:
id | my_field | id_reference
----+----------+--------------
1 | a | 1
1 | b | 2
2 | a | 3
2 | c | 4
3 | x | 5
Basically my_table contains some versioned data. The id_reference is a reference to a global version of the database. Every change to the database will increase the global version number and changes will always add new rows to the tables (instead of updating/deleting values) and they will insert the new version number.
My goal is to perform a query that will only retrieve the latest values in the table, alongside with the total number of rows.
For example, in the above case I would like to retrieve the following output:
| total | id | my_field | id_reference |
+-------+----+----------+--------------+
| 3 | 1 | b | 2 |
+-------+----+----------+--------------+
| 3 | 2 | c | 4 |
+-------+----+----------+--------------+
| 3 | 3 | x | 5 |
+-------+----+----------+--------------+
My attemp is the following:
select distinct on (id)
count(*) over () as total,
*
from my_table
order by id, id_reference desc
This returns almost the correct output, except that total is the number of rows in my_table instead of being the number of rows of the resulting query:
total | id | my_field | id_reference
-------+----+----------+--------------
5 | 1 | b | 2
5 | 2 | c | 4
5 | 3 | x | 5
(3 rows)
As you can see it has 5 instead of the expected 3.
I can fix this by using a subquery and count as an aggregate function:
with my_values as (
select distinct on (id)
*
from my_table
order by id, id_reference desc
)
select count(*) over (), * from my_values
Which produces my expected output.
My question: is there a way to avoid using this subquery and have something similar to count(*) over () return the result I want?
You are looking at my_table 3 ways:
to find the latest id_reference for each id
to find my_field for the latest id_reference for each id
to count the distinct number of ids in the table
I therefore prefer this solution:
select
c.id_count as total,
a.id,
a.my_field,
b.max_id_reference
from
my_table a
join
(
select
id,
max(id_reference) as max_id_reference
from
my_table
group by
id
) b
on
a.id = b.id and
a.id_reference = b.max_id_reference
join
(
select
count(distinct id) as id_count
from
my_table
) c
on true;
This is a bit longer (especially the long thin way I write SQL) but it makes it clear what is happening. If you come back to it in a few months time (somebody usually does) then it will take less time to understand what is going on.
The "on true" at the end is a deliberate cartesian product because there can only ever be exactly one result from the subquery "c" and you do want a cartesian product with that.
There is nothing necessarily wrong with subqueries.

Rewriting the HAVING clause and COUNT function

I'm trying to conceptually understand how to rewrite the HAVING clause and COUNT function.
I was asked "Find the names of all reviewers who have contributed three or more ratings. (As an extra challenge, try writing the query without HAVING or without COUNT.)" in relation this this simple database: http://sqlfiddle.com/#!5/35779/2/0
The query with HAVING and COUNT is easy. Without, I'm having difficulty.
Help would be very much appreciated. Thank you.
One option would be to use SUM(1) in place of COUNT in a subquery, and using WHERE instead of HAVING:
SELECT b.name
FROM (SELECT rID,SUM(1) Sum1
FROM rating
GROUP BY rID
)a
JOIN reviewer b
ON a.rID = b.rID
WHERE Sum1 >= 3
Demo: SQL Fiddle
Update: Some explanation of SUM(1):
Adding a constant to a SELECT statement will result in that value being repeated for every row returned, for example:
SELECT rID
,1 as Col1
FROM rating
Returns:
| rID | Col1 |
|-----|------|
| 201 | 1 |
| 201 | 1 |
| 202 | 1 |
| 203 | 1 |
| 203 | 1 |
......
SUM(1) is applying a constant 1 to every row and aggregating it.

In SQL, what's the difference between count(column) and count(*)?

I have the following query:
select column_name, count(column_name)
from table
group by column_name
having count(column_name) > 1;
What would be the difference if I replaced all calls to count(column_name) to count(*)?
This question was inspired by How do I find duplicate values in a table in Oracle?.
To clarify the accepted answer (and maybe my question), replacing count(column_name) with count(*) would return an extra row in the result that contains a null and the count of null values in the column.
count(*) counts NULLs and count(column) does not
[edit] added this code so that people can run it
create table #bla(id int,id2 int)
insert #bla values(null,null)
insert #bla values(1,null)
insert #bla values(null,1)
insert #bla values(1,null)
insert #bla values(null,1)
insert #bla values(1,null)
insert #bla values(null,null)
select count(*),count(id),count(id2)
from #bla
results
7 3 2
Another minor difference, between using * and a specific column, is that in the column case you can add the keyword DISTINCT, and restrict the count to distinct values:
select column_a, count(distinct column_b)
from table
group by column_a
having count(distinct column_b) > 1;
A further and perhaps subtle difference is that in some database implementations the count(*) is computed by looking at the indexes on the table in question rather than the actual data rows. Since no specific column is specified, there is no need to bother with the actual rows and their values (as there would be if you counted a specific column). Allowing the database to use the index data can be significantly faster than making it count "real" rows.
The explanation in the docs, helps to explain this:
COUNT(*) returns the number of items in a group, including NULL values and duplicates.
COUNT(expression) evaluates expression for each row in a group and returns the number of nonnull values.
So count(*) includes nulls, the other method doesn't.
We can use the Stack Exchange Data Explorer to illustrate the difference with a simple query. The Users table in Stack Overflow's database has columns that are often left blank, like the user's Website URL.
-- count(column_name) vs. count(*)
-- Illustrates the difference between counting a column
-- that can hold null values, a 'not null' column, and count(*)
select count(WebsiteUrl), count(Id), count(*) from Users
If you run the query above in the Data Explorer, you'll see that the count is the same for count(Id) and count(*)because the Id column doesn't allow null values. The WebsiteUrl count is much lower, though, because that column allows null.
The COUNT(*) sentence indicates SQL Server to return all the rows from a table, including NULLs.
COUNT(column_name) just retrieves the rows having a non-null value on the rows.
Please see following code for test executions SQL Server 2008:
-- Variable table
DECLARE #Table TABLE
(
CustomerId int NULL
, Name nvarchar(50) NULL
)
-- Insert some records for tests
INSERT INTO #Table VALUES( NULL, 'Pedro')
INSERT INTO #Table VALUES( 1, 'Juan')
INSERT INTO #Table VALUES( 2, 'Pablo')
INSERT INTO #Table VALUES( 3, 'Marcelo')
INSERT INTO #Table VALUES( NULL, 'Leonardo')
INSERT INTO #Table VALUES( 4, 'Ignacio')
-- Get all the collumns by indicating *
SELECT COUNT(*) AS 'AllRowsCount'
FROM #Table
-- Get only content columns ( exluce NULLs )
SELECT COUNT(CustomerId) AS 'OnlyNotNullCounts'
FROM #Table
COUNT(*) – Returns the total number of records in a table (Including NULL valued records).
COUNT(Column Name) – Returns the total number of Non-NULL records. It means that, it ignores counting NULL valued records in that particular column.
Basically the COUNT(*) function return all the rows from a table whereas COUNT(COLUMN_NAME) does not; that is it excludes null values which everyone here have also answered here.
But the most interesting part is to make queries and database optimized it is better to use COUNT(*) unless doing multiple counts or a complex query rather than COUNT(COLUMN_NAME). Otherwise, it will really lower your DB performance while dealing with a huge number of data.
Further elaborating upon the answer given by #SQLMeance and #Brannon making use of GROUP BY clause which has been mentioned by OP but not present in answer by #SQLMenace
CREATE TABLE table1 (
id INT
);
INSERT INTO table1 VALUES
(1),
(2),
(NULL),
(2),
(NULL),
(3),
(1),
(4),
(NULL),
(2);
SELECT * FROM table1;
+------+
| id |
+------+
| 1 |
| 2 |
| NULL |
| 2 |
| NULL |
| 3 |
| 1 |
| 4 |
| NULL |
| 2 |
+------+
10 rows in set (0.00 sec)
SELECT id, COUNT(*) FROM table1 GROUP BY id;
+------+----------+
| id | COUNT(*) |
+------+----------+
| 1 | 2 |
| 2 | 3 |
| NULL | 3 |
| 3 | 1 |
| 4 | 1 |
+------+----------+
5 rows in set (0.00 sec)
Here, COUNT(*) counts the number of occurrences of each type of id including NULL
SELECT id, COUNT(id) FROM table1 GROUP BY id;
+------+-----------+
| id | COUNT(id) |
+------+-----------+
| 1 | 2 |
| 2 | 3 |
| NULL | 0 |
| 3 | 1 |
| 4 | 1 |
+------+-----------+
5 rows in set (0.00 sec)
Here, COUNT(id) counts the number of occurrences of each type of id but does not count the number of occurrences of NULL
SELECT id, COUNT(DISTINCT id) FROM table1 GROUP BY id;
+------+--------------------+
| id | COUNT(DISTINCT id) |
+------+--------------------+
| NULL | 0 |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
+------+--------------------+
5 rows in set (0.00 sec)
Here, COUNT(DISTINCT id) counts the number of occurrences of each type of id only once (does not count duplicates) and also does not count the number of occurrences of NULL
It is best to use
Count(1) in place of column name or *
to count the number of rows in a table, it is faster than any format because it never go to check the column name into table exists or not
There is no difference if one column is fix in your table, if you want to use more than one column than you have to specify that how much columns you required to count......
Thanks,
As mentioned in the previous answers, Count(*) counts even the NULL columns, whereas count(Columnname) counts only if the column has values.
It's always best practice to avoid * (Select *, count *, …)