Any uses of allowing literal NULL with operators? - sql

Some databases support using literal NULL as an operand while others do not. As an example:
SELECT 1 + NULL
Snowflake: null
BigQuery: error
MySQL: null
Postgres: null
SQLServer: null
I'm trying to determine how I should handle this in an application, and was wondering if there are ever any (valid) use cases for when it might be useful to have a literal null in an expression? This could also include testing.
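One place a literal NULL is handy is in quick tests of how an engine propagates NULL. The sketch below uses Python's built-in sqlite3 module; SQLite is not one of the engines listed above, so treat it as a stand-in rather than evidence about any of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Arithmetic and comparisons with a literal NULL yield NULL (None in Python).
null_sum = conn.execute("SELECT 1 + NULL").fetchone()[0]
null_eq = conn.execute("SELECT NULL = NULL").fetchone()[0]

# IS NULL is the test that gives a definite answer (1 means true in SQLite).
null_is_null = conn.execute("SELECT NULL IS NULL").fetchone()[0]

print(null_sum, null_eq, null_is_null)
```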

Writing the expression 1 + NULL by itself is fairly meaningless, as we would expect it to always evaluate to NULL (except, apparently, on BigQuery, where it errors out). However, 1 + NULL could arise as the result of some other calculation. Consider the following data and query:
id | val
1 | NULL
2 | 5
2 | 10
3 | NULL
3 | 7
and the query:
SELECT id, 1 + SUM(val) AS total
FROM yourTable
GROUP BY id;
Here for id = 1 the aggregate total would evaluate to 1 + NULL, which would be NULL on most databases. One way around this would be to use COALESCE():
SELECT id, 1 + COALESCE(SUM(val), 0) AS total
FROM yourTable
GROUP BY id;
Now for id groups having only NULL values, we would replace that NULL sum by zero.
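The two queries above can be tried side by side. This is a minimal sketch on SQLite through Python's sqlite3 module (an assumption; the answer does not name an engine), loading the sample rows shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INTEGER, val INTEGER)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?)",
    [(1, None), (2, 5), (2, 10), (3, None), (3, 7)],
)

# Without COALESCE: a group holding only NULLs sums to NULL, so 1 + NULL is NULL.
plain = conn.execute(
    "SELECT id, 1 + SUM(val) FROM yourTable GROUP BY id ORDER BY id"
).fetchall()

# With COALESCE: the NULL sum is replaced by 0 before adding 1.
fixed = conn.execute(
    "SELECT id, 1 + COALESCE(SUM(val), 0) FROM yourTable GROUP BY id ORDER BY id"
).fetchall()

print(plain)
print(fixed)
```

Note that id = 3 is unaffected either way: SUM skips its single NULL and returns 7.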

Related

How to create a table to count with a conditional

I have a database with a lot of columns that hold pass, fail, or blank indicators.
I want to create a function to count each type of value and create a table from the counts. The structure I am thinking is something like
| Value | x | y | z |
|-------|---|---|---|
| pass | count if x=pass | count if y=pass | count if z=pass |
| fail | count if x=fail | count if y=fail | count if z=fail |
| blank | count if x=blank | count if y=blank | count if z=blank |
| total | count(x) | count(y) | count(z) |
where x,y,z are columns from another table.
I don't know what the best approach for this would be.
Thank you all in advance.
I tried this structure, but it gives a syntax error:
CREATE FUNCTION Countif (columnx nvarchar(20),value_compare nvarchar(10))
RETURNS Count_column_x AS
BEGIN
IF columnx=value_compare
count(columnx)
END
RETURN
END
Also, I don't know how to add each count to the actual table I am trying to create
Conditional counting (or any conditional aggregation) can often be done inline by placing a CASE expression inside the aggregate function that conditionally returns the value to be aggregated or a NULL to skip.
An example would be COUNT(CASE WHEN SelectMe = 1 THEN 1 END). Here the aggregated value is 1, which could be any non-null value for COUNT(). (For other aggregate functions, a more meaningful value would be provided.) The implicit ELSE returns a NULL, which is not counted.
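As a quick check of that behaviour, here is a small sketch on SQLite via Python's sqlite3 module. The table name items and its rows are made up for illustration; only the COUNT(CASE ...) expression comes from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: SelectMe is 1, 0, or NULL.
conn.execute("CREATE TABLE items (SelectMe INTEGER)")
conn.executemany("INSERT INTO items VALUES (?)", [(1,), (0,), (1,), (None,)])

# COUNT(*) counts every row; the CASE version counts only rows where the
# condition holds, because the implicit ELSE yields NULL and COUNT skips NULLs.
all_rows, selected = conn.execute(
    "SELECT COUNT(*), COUNT(CASE WHEN SelectMe = 1 THEN 1 END) FROM items"
).fetchone()

print(all_rows, selected)
```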
For your problem, I believe the first thing to do is to UNPIVOT your data, placing the column names and values side by side. You can then group by value and use conditional aggregation as described above to calculate your results. A few more details finish the job: (1) a totals row using WITH ROLLUP, (2) a CASE expression to adjust the labels for the blank and total rows, and (3) some ORDER BY tricks to get the rows in the right order.
The query may be something like:
SELECT
CASE
WHEN GROUPING(U.Value) = 1 THEN 'Total'
WHEN U.Value = '' THEN 'Blank'
ELSE U.Value
END AS Value,
COUNT(CASE WHEN U.Col = 'x' THEN 1 END) AS x,
COUNT(CASE WHEN U.Col = 'y' THEN 1 END) AS y
FROM #Data D
UNPIVOT (
Value
FOR Col IN (x, y)
) AS U
GROUP BY U.Value WITH ROLLUP
ORDER BY
GROUPING(U.Value),
CASE U.Value WHEN 'Pass' THEN 1 WHEN 'Fail' THEN 2 WHEN '' THEN 3 ELSE 4 END,
U.VALUE
Sample data:
x | y
Pass | Pass
Pass | Fail
Pass | (blank)
Fail | (blank)
Sample results:
Value | x | y
Pass | 3 | 1
Fail | 1 | 1
Blank | 0 | 2
Total | 4 | 4
See this db<>fiddle for a working example.
I don't think you need a generic solution such as a function that takes the value as a parameter.
Instead, you could create a view that groups your data, and then query that view, filtering by the value you want.
The body of the view would be something like this:
select value, count(*) as Total
from table_name
group by value
Feel free to explain your situation better so I could help you.
You can do this by grouping by the status column.
select status, count(*) as total
from some_table
group by status
Rather than making a whole new table, consider using a view. This is a query that looks like a table.
create view status_counts as
select status, count(*) as total
from some_table
group by status
You can then select total from status_counts where status = 'pass' or the like and it will run the query.
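To see that a view is just a stored query that runs again on every select, here is a minimal sketch on SQLite through Python's sqlite3 module (an assumption; the answer targets SQL Server). The rows in some_table are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (status TEXT)")
conn.executemany(
    "INSERT INTO some_table VALUES (?)",
    [("pass",), ("pass",), ("fail",), ("",)],
)

# The view stores the query, not its results.
conn.execute(
    "CREATE VIEW status_counts AS "
    "SELECT status, COUNT(*) AS total FROM some_table GROUP BY status"
)

query = "SELECT total FROM status_counts WHERE status = 'pass'"
before = conn.execute(query).fetchone()[0]

# New rows in the base table show up the next time the view is queried.
conn.execute("INSERT INTO some_table VALUES ('pass')")
after = conn.execute(query).fetchone()[0]

print(before, after)
```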
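You can also create a "materialized view". This is like a view, but the results are written to a real table. SQL Server is special in that it will keep this table up to date for you. The statement below uses the Azure Synapse Analytics form of CREATE MATERIALIZED VIEW; on a regular SQL Server instance the equivalent feature is an indexed view.
create materialized view status_counts with (distribution = hash(status)) as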
select status, count(*) as total
from some_table
group by status
You'd do this for performance reasons on a large table which does not update very often.

select count by value

Given a table messages with the following fields:
id | Number
customer_id | Number
source | VARCHAR2
...
I want to know how many messages each customer has, but I want to differentiate between messages where source equals to 'xml' and all other sources.
My query so far
SELECT customer_id,
case when source = 'xml' then 'xml' else 'manual' end as xml,
count(*)
FROM MESSAGES
GROUP BY customer_id,
case when source = 'xml' then 'xml' else 'manual' end;
which gives me a result similar to this:
customer_id | xml | count
----------------------------
1 | xml | 12
1 | manual | 34
2 | xml | 54
3 | xml | 77
3 | manual | 1
...
This is rather ugly in two ways:
I have to repeat the case statement in both the field list and in the group list
I now have two rows per customer.
Q: Is it possible to formulate a query, such that the result looks like this instead?
customer_id | xml | manual
--------------------------
1 | 12 | 34
2 | 54 | 0
3 | 77 | 1
You are looking for conditional aggregation:
SELECT customer_id,
count(case when source = 'xml' then 1 end) as xml_count,
count(case when source <> 'xml' then 1 end) as manual_count
FROM MESSAGES
GROUP BY customer_id
This works because aggregates ignore NULL values and the result of the CASE will be NULL if source does not contain the value from the case condition.
Use conditional aggregation.
SELECT customer_id,
sum(case when source = 'xml' then 1 else 0 end) as xml,
sum(case when source <> 'xml' then 1 else 0 end) as manual
FROM MESSAGES
GROUP BY customer_id
This assumes the source column is non null. If it can be null use coalesce or nvl in the case expression so the comparison gives you expected results.
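The NULL caveat is easy to reproduce. The sketch below runs on SQLite via Python's sqlite3 module (an assumption; the question is about Oracle, where NVL would also work). The sample rows, including the 'email' source, are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (customer_id INTEGER, source TEXT)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    [(1, "xml"), (1, "email"), (1, None), (2, "xml")],
)

# source <> 'xml' is NULL (not true) for a NULL source, so that row is missed.
naive = conn.execute(
    "SELECT customer_id, "
    "       SUM(CASE WHEN source = 'xml' THEN 1 ELSE 0 END), "
    "       SUM(CASE WHEN source <> 'xml' THEN 1 ELSE 0 END) "
    "FROM messages GROUP BY customer_id ORDER BY customer_id"
).fetchall()

# COALESCE turns NULL into an empty string first, so the row counts as manual.
fixed = conn.execute(
    "SELECT customer_id, "
    "       SUM(CASE WHEN source = 'xml' THEN 1 ELSE 0 END), "
    "       SUM(CASE WHEN COALESCE(source, '') <> 'xml' THEN 1 ELSE 0 END) "
    "FROM messages GROUP BY customer_id ORDER BY customer_id"
).fetchall()

print(naive)
print(fixed)
```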
This will work, since it doesn't appear that you have a source called 'manual'. COUNT or SUM will give you the same result.
SELECT
customer_id
,ISNULL(COUNT(CASE WHEN source = 'xml' THEN 1 END),0) xml
,ISNULL(COUNT(CASE WHEN source <> 'xml' OR source IS NULL THEN 1 END),0) manual
FROM Messages
GROUP BY customer_id
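This shows a zero where you might otherwise see a NULL value; your sample has a zero rather than a null. (COUNT itself never returns NULL, so the ISNULL wrapper is only a safeguard; it would matter if you switched to a SUM without ELSE 0.)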
Here is a fancy solution (it does almost exactly what vkp's solution does), using the PIVOT operation introduced in Oracle 11.1. Note how the distinction between 'xml' and all others (including NULL) is dealt with in the subquery.
select *
from (select customer_id, case when source = 'xml' then 'xml' else 'other' end as source
from messages)
pivot (count(*) for source in ('xml' as xml, 'other' as other))
;
There is another way, using the DECODE function instead of CASE:
SELECT customer_id,
COUNT(DECODE(source,'xml','xml')) "XML",
COUNT(DECODE(source,'manual','manual')) "manual"
FROM MESSAGES
GROUP BY customer_id;
But, this won't show result when you have null as source.

select max value in a group of consecutive values

How do you retrieve only the max value of each group of consecutive values?
I have a telephone database with only unique values, and I want to get only the highest number of each group of consecutive telephone numbers (TelNr). I am struggling.
id | TeNr | Position
1 | 100 | SLMO2.1.3
2 | 101 | SLMO2.3.4
3 | 103 | SLMO2.4.1
4 | 104 | SLMO2.3.2
5 | 200 | SLMO2.5.1
6 | 201 | SLMO2.5.2
7 | 204 | SLMO2.5.5
8 | 300 | SLMO2.3.5
9 | 301 | SLMO2.6.2
10 | 401 | SLMO2.4.8
Result should be:
TelNr
101
104
201
204
301
401
I have tried almost every tip I could find so far, and I either get all TelNr or no number at all, which is useless in my case.
Any brilliant idea to run this with SQLITE?
So you're searching for gaps and want to get the first value of those gaps.
This is probably the best way to get them, try to check for a row with the current TeNr plus 1 and if there's none you found it:
select t1.TeNr, t1.TeNr + 1 as unused_TeNr
from tab as t1
left join Tab as t2
on t2.TeNr = t1.TeNr + 1
where t2.TeNr is null
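Run against the question's data, the TeNr column of this query is exactly the requested list. A minimal sketch using Python's sqlite3 module; the table name tab follows the answer, and the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id INTEGER, TeNr INTEGER, Position TEXT)")
rows = [
    (1, 100, "SLMO2.1.3"), (2, 101, "SLMO2.3.4"), (3, 103, "SLMO2.4.1"),
    (4, 104, "SLMO2.3.2"), (5, 200, "SLMO2.5.1"), (6, 201, "SLMO2.5.2"),
    (7, 204, "SLMO2.5.5"), (8, 300, "SLMO2.3.5"), (9, 301, "SLMO2.6.2"),
    (10, 401, "SLMO2.4.8"),
]
conn.executemany("INSERT INTO tab VALUES (?, ?, ?)", rows)

# A number is the top of its consecutive group when TeNr + 1 does not exist.
group_max = [
    r[0]
    for r in conn.execute(
        "SELECT t1.TeNr "
        "FROM tab AS t1 "
        "LEFT JOIN tab AS t2 ON t2.TeNr = t1.TeNr + 1 "
        "WHERE t2.TeNr IS NULL "
        "ORDER BY t1.TeNr"
    )
]

print(group_max)
```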
Edit:
To get the range of missing values you need to use some old-style SQL, as older SQLite versions (before 3.25, which added window functions) don't support ROW_NUMBER, etc.
select
TeNr + 1 as RangeStart,
nextTeNr - 1 as RangeEnd,
nextTeNr - TeNr - 1 as cnt
from
(
select TeNr,
( select min(TeNr) from tab as t2
where t2.TeNr > t1.TeNr ) as nextTeNr
from tab as t1
) as dt
where nextTeNr > TeNr + 1
It's probably not very efficient, but it might be OK if the number of rows is small and/or there's an index on TeNr.
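Here is the range query applied to the question's numbers, as a sketch with Python's sqlite3 module. Only the TeNr column is loaded, and an ORDER BY is added so the output order is fixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (TeNr INTEGER)")
conn.executemany(
    "INSERT INTO tab VALUES (?)",
    [(n,) for n in (100, 101, 103, 104, 200, 201, 204, 300, 301, 401)],
)

# For each number, look up the next existing number; a gap exists when
# the next number is more than one step away.
gaps = conn.execute(
    "SELECT TeNr + 1 AS RangeStart, "
    "       nextTeNr - 1 AS RangeEnd, "
    "       nextTeNr - TeNr - 1 AS cnt "
    "FROM ( "
    "  SELECT TeNr, "
    "         (SELECT MIN(TeNr) FROM tab AS t2 WHERE t2.TeNr > t1.TeNr) AS nextTeNr "
    "  FROM tab AS t1 "
    ") AS dt "
    "WHERE nextTeNr > TeNr + 1 "
    "ORDER BY RangeStart"
).fetchall()

for gap in gaps:
    print(gap)
```

The highest number (401) has no successor, so its nextTeNr is NULL and the WHERE clause drops it.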
Getting each value in the gap as a row in your result set is harder, but it is possible if your version of SQLite supports recursive queries:
with recursive cte (TeNr, missing, maxTeNr) as
(
select
min(TeNr) as TeNr, -- start of range of existing numbers
0 as missing, -- 0 = TeNr exists, 1 = TeNr is missing
max(TeNr) as maxTeNr -- end of range of existing numbers
from tab
union all
select
cte.TeNr + 1, -- next TeNr, if it doesn't exist tab.TeNr will be NULL
case when tab.TeNr is not null then 0 else 1 end,
maxTeNr
from cte left join tab
on tab.TeNr = cte.TeNr + 1
where cte.TeNr + 1 < maxTeNr
)
select TeNr
from cte
where missing = 1
Depending on your data this might return a huge amount of rows.
You might also use the result of the previous RangeStart/RangeEnd query as input to this recursion.
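A simplified variation on the recursive idea, not the exact query above: generate every number between the lowest and highest TeNr with a recursive CTE, then keep the ones that are absent from the table. The sketch uses Python's sqlite3 module; the CTE name seq and column n are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (TeNr INTEGER)")
conn.executemany(
    "INSERT INTO tab VALUES (?)",
    [(n,) for n in (100, 101, 103, 104, 200, 201, 204, 300, 301, 401)],
)

# seq counts from MIN(TeNr) up to MAX(TeNr); the outer query filters out
# the numbers that already exist, leaving one row per missing number.
missing = [
    r[0]
    for r in conn.execute(
        "WITH RECURSIVE seq(n) AS ( "
        "  SELECT MIN(TeNr) FROM tab "
        "  UNION ALL "
        "  SELECT n + 1 FROM seq WHERE n < (SELECT MAX(TeNr) FROM tab) "
        ") "
        "SELECT n FROM seq "
        "WHERE n NOT IN (SELECT TeNr FROM tab) "
        "ORDER BY n"
    )
]

print(len(missing), missing[:3], missing[-1])
```

Even on this ten-row sample the result has a few hundred rows, which illustrates the warning about result size.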

SQL Count empty fields

I'm not sure if this is possible or if it is, how to do it -
I have the following data in a database -
id | improve | timeframe | criteria | impact
-------+------------+-------------+-----------+---------
1 | | Test | Test | Test
2 | Test | | Test |
3 | | Test | |
-------+------------+-------------+-----------+---------
Ignoring the id column, how can I determine the number of fields out of the remaining 12 that are not null using an SQL query?
I have started with -
SELECT improve, timeframe, impact, criteria
FROM data
WHERE improve IS NOT NULL
AND timeframe IS NOT NULL
AND impact IS NOT NULL
AND criteria IS NOT NULL;
This only returns the number of rows, ie. 3.
Any ideas?
Thanks.
SELECT count(improve) + count(timeframe) + count(impact) + count(criteria) FROM data
Something like this may get you going in the right direction
SELECT
SUM(CASE WHEN improve IS NULL THEN 0 ELSE 1 END +
CASE WHEN timeframe IS NULL THEN 0 ELSE 1 END +
CASE WHEN criteria IS NULL THEN 0 ELSE 1 END +
CASE WHEN impact IS NULL THEN 0 ELSE 1 END)
from
data
SELECT id, COUNT(improve) + COUNT(timeframe) + COUNT(impact) + COUNT(criteria) FROM data GROUP BY id;
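Both the overall total and the per-id variant can be checked on the question's table. This sketch uses Python's sqlite3 module and assumes the blank cells are NULL; if they were empty strings instead, COUNT would include them, and wrapping each column as COUNT(NULLIF(col, '')) would be one way to skip them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (id INTEGER, improve TEXT, timeframe TEXT, "
    "criteria TEXT, impact TEXT)"
)
# Blank cells from the question are stored as NULL (an assumption).
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?, ?)",
    [
        (1, None, "Test", "Test", "Test"),
        (2, "Test", None, "Test", None),
        (3, None, "Test", None, None),
    ],
)

counts = "COUNT(improve) + COUNT(timeframe) + COUNT(impact) + COUNT(criteria)"

# Non-NULL fields across the whole table.
total = conn.execute(f"SELECT {counts} FROM data").fetchone()[0]

# Non-NULL fields per row (one group per id).
per_id = conn.execute(
    f"SELECT id, {counts} FROM data GROUP BY id ORDER BY id"
).fetchall()

print(total)
print(per_id)
```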
If you're using SQL Server, use DATALENGTH().
SELECT improve, timeframe, impact, criteria
FROM data
WHERE DATALENGTH(improve) > 0
AND DATALENGTH(timeframe) > 0
AND DATALENGTH(impact) > 0
AND DATALENGTH(criteria) >0;
DATALENGTH returns the length of the string in bytes, including trailing spaces. It sounded as though you're OK with blank fields, so DATALENGTH does the job. Otherwise, you could also use LEN(), which would trim any trailing space.
If you are using MySQL, you can use CHARACTER_LENGTH, which gives you a character count of the field you want to check. Note that it does not trim the value itself; trailing spaces in a VARCHAR value are counted.
SELECT Sum(case when improve is null then 0 else 1 end +
case when timeframe is null then 0 else 1 end +
case when impact is null then 0 else 1 end +
case when criteria is null then 0 else 1 end)
FROM data
group by improve, timeframe, impact, criteria

In SQL, what's the difference between count(column) and count(*)?

I have the following query:
select column_name, count(column_name)
from table
group by column_name
having count(column_name) > 1;
What would be the difference if I replaced all calls to count(column_name) to count(*)?
This question was inspired by How do I find duplicate values in a table in Oracle?.
To clarify the accepted answer (and maybe my question), replacing count(column_name) with count(*) would return an extra row in the result that contains a null and the count of null values in the column.
count(*) counts NULLs and count(column) does not
[edit] added this code so that people can run it
create table #bla(id int,id2 int)
insert #bla values(null,null)
insert #bla values(1,null)
insert #bla values(null,1)
insert #bla values(1,null)
insert #bla values(null,1)
insert #bla values(1,null)
insert #bla values(null,null)
select count(*),count(id),count(id2)
from #bla
results
7 3 2
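The same experiment can be repeated outside SQL Server. This is a sketch on SQLite through Python's sqlite3 module, using an ordinary table named bla in place of the #bla temp table (an adaptation, since the # prefix is SQL Server syntax).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bla (id INTEGER, id2 INTEGER)")
conn.executemany(
    "INSERT INTO bla VALUES (?, ?)",
    [
        (None, None), (1, None), (None, 1), (1, None),
        (None, 1), (1, None), (None, None),
    ],
)

# COUNT(*) counts rows; COUNT(column) counts rows where that column is not NULL.
counts = conn.execute("SELECT COUNT(*), COUNT(id), COUNT(id2) FROM bla").fetchone()

print(counts)
```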
Another minor difference, between using * and a specific column, is that in the column case you can add the keyword DISTINCT, and restrict the count to distinct values:
select column_a, count(distinct column_b)
from table
group by column_a
having count(distinct column_b) > 1;
A further and perhaps subtle difference is that in some database implementations the count(*) is computed by looking at the indexes on the table in question rather than the actual data rows. Since no specific column is specified, there is no need to bother with the actual rows and their values (as there would be if you counted a specific column). Allowing the database to use the index data can be significantly faster than making it count "real" rows.
The explanation in the docs helps to clarify this:
COUNT(*) returns the number of items in a group, including NULL values and duplicates.
COUNT(expression) evaluates expression for each row in a group and returns the number of nonnull values.
So count(*) includes nulls, the other method doesn't.
We can use the Stack Exchange Data Explorer to illustrate the difference with a simple query. The Users table in Stack Overflow's database has columns that are often left blank, like the user's Website URL.
-- count(column_name) vs. count(*)
-- Illustrates the difference between counting a column
-- that can hold null values, a 'not null' column, and count(*)
select count(WebsiteUrl), count(Id), count(*) from Users
If you run the query above in the Data Explorer, you'll see that the count is the same for count(Id) and count(*) because the Id column doesn't allow null values. The WebsiteUrl count is much lower, though, because that column allows null.
The COUNT(*) expression tells SQL Server to count all the rows in a table, including rows that contain NULLs.
COUNT(column_name) counts only the rows that have a non-null value in that column.
Please see the following test code for SQL Server 2008:
-- Table variable
DECLARE @Table TABLE
(
CustomerId int NULL
, Name nvarchar(50) NULL
)
-- Insert some records for tests
INSERT INTO @Table VALUES( NULL, 'Pedro')
INSERT INTO @Table VALUES( 1, 'Juan')
INSERT INTO @Table VALUES( 2, 'Pablo')
INSERT INTO @Table VALUES( 3, 'Marcelo')
INSERT INTO @Table VALUES( NULL, 'Leonardo')
INSERT INTO @Table VALUES( 4, 'Ignacio')
-- Count all the rows by indicating *
SELECT COUNT(*) AS 'AllRowsCount'
FROM @Table
-- Count only rows with a non-NULL CustomerId (excludes NULLs)
SELECT COUNT(CustomerId) AS 'OnlyNotNullCounts'
FROM @Table
COUNT(*) – Returns the total number of records in a table (Including NULL valued records).
COUNT(Column Name) – Returns the total number of Non-NULL records. It means that, it ignores counting NULL valued records in that particular column.
Basically, COUNT(*) counts all the rows in a table, whereas COUNT(COLUMN_NAME) excludes rows where that column is null, as everyone here has already answered.
The more interesting part is optimization: unless you are doing multiple counts or a complex query, it is better to use COUNT(*) rather than COUNT(COLUMN_NAME). Otherwise, it can lower your DB performance when dealing with a huge amount of data.
Further elaborating upon the answers given by @SQLMenace and @Brannon, this makes use of the GROUP BY clause, which was mentioned by the OP but is not present in the answer by @SQLMenace:
CREATE TABLE table1 (
id INT
);
INSERT INTO table1 VALUES
(1),
(2),
(NULL),
(2),
(NULL),
(3),
(1),
(4),
(NULL),
(2);
SELECT * FROM table1;
+------+
| id |
+------+
| 1 |
| 2 |
| NULL |
| 2 |
| NULL |
| 3 |
| 1 |
| 4 |
| NULL |
| 2 |
+------+
10 rows in set (0.00 sec)
SELECT id, COUNT(*) FROM table1 GROUP BY id;
+------+----------+
| id | COUNT(*) |
+------+----------+
| 1 | 2 |
| 2 | 3 |
| NULL | 3 |
| 3 | 1 |
| 4 | 1 |
+------+----------+
5 rows in set (0.00 sec)
Here, COUNT(*) counts the number of occurrences of each type of id including NULL
SELECT id, COUNT(id) FROM table1 GROUP BY id;
+------+-----------+
| id | COUNT(id) |
+------+-----------+
| 1 | 2 |
| 2 | 3 |
| NULL | 0 |
| 3 | 1 |
| 4 | 1 |
+------+-----------+
5 rows in set (0.00 sec)
Here, COUNT(id) counts the number of occurrences of each type of id but does not count the number of occurrences of NULL
SELECT id, COUNT(DISTINCT id) FROM table1 GROUP BY id;
+------+--------------------+
| id | COUNT(DISTINCT id) |
+------+--------------------+
| NULL | 0 |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
+------+--------------------+
5 rows in set (0.00 sec)
Here, COUNT(DISTINCT id) counts the number of occurrences of each type of id only once (does not count duplicates) and also does not count the number of occurrences of NULL
It is best to use
Count(1) in place of a column name or *
to count the number of rows in a table. The claim is that it is faster than the other forms because the database never has to check whether a column name exists in the table, although many engines optimize COUNT(*) and COUNT(1) to the same plan, so measure before relying on it.
There is no difference if the column you count never holds NULL. If you want to count more than one column, you have to specify each column you need to count.
Thanks,
As mentioned in the previous answers, Count(*) counts even the NULL columns, whereas count(Columnname) counts only if the column has values.
It's always best practice to avoid * (Select *, count *, …)