Generalized way to remove empty row given by aggregate functions in SQLite - sql

The query below returns a single row containing NULL when the table is empty or when no row satisfies the WHERE clause.
SELECT MAX(Number) AS Number
FROM Table
WHERE Number > 10;
The NULL row result looks like this:-
Number
1
NULL
So to detect if the query gives actual data or not I had to do this:-
SELECT 1
FROM (
SELECT MAX(Number) AS Number
FROM Table
WHERE Number > 10
)
WHERE Number IS NOT NULL;
Now this returns a row containing 1 if a max number exists, and no row at all otherwise (the table is empty, or no Number is greater than 10); wrapped in EXISTS, that becomes 1 or 0.
So, is this how the empty row produced by an aggregate function is generally tackled, or is there a more general way to do it?
An example SQLite fiddle showcasing the use case.
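For anyone who wants to reproduce the NULL row locally, here is a minimal sketch using Python's built-in sqlite3 module. The table name `numbers` and the helper `max_over_threshold` are made up for illustration; only the shape of the query comes from the question.

```python
import sqlite3

def max_over_threshold(values, threshold=10):
    """Run MAX() over the filtered values and return every result row."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for the question's table.
    conn.execute("CREATE TABLE numbers (number INTEGER)")
    conn.executemany("INSERT INTO numbers VALUES (?)", [(v,) for v in values])
    rows = conn.execute(
        "SELECT MAX(number) AS number FROM numbers WHERE number > ?",
        (threshold,),
    ).fetchall()
    conn.close()
    return rows

print(max_over_threshold([]))       # empty table: [(None,)]
print(max_over_threshold([3, 7]))   # WHERE matches nothing: [(None,)]
print(max_over_threshold([3, 42]))  # a real maximum: [(42,)]
```

An aggregate without GROUP BY always produces exactly one row, so the "no data" case shows up as a NULL value rather than as an empty result.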

You are overcomplicating things with the use of the aggregate function MAX().
This part of your code:
EXISTS (
SELECT 1
FROM (
SELECT MAX(TourCompletionDateTime) AS TourCompletionDateTime
FROM Details
WHERE TourCompletionDateTime < '2022-07-26T09:36:00.730589Z'
)
WHERE TourCompletionDateTime IS NOT NULL
)
is equivalent to just:
EXISTS (
SELECT 1
FROM Details
WHERE TourCompletionDateTime < '2022-07-26T09:36:00.730589Z'
)
See the demo.
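If you want to convince yourself that the two forms agree, a rough check along these lines should do. It assumes a one-column `Details` table storing the timestamps as TEXT; the helper `both_checks` and the constants are hypothetical names, not part of the original fiddle.

```python
import sqlite3

CUTOFF = "2022-07-26T09:36:00.730589Z"

# The question's version: aggregate first, then discard the NULL row.
VIA_MAX = """
SELECT EXISTS (
    SELECT 1
    FROM (
        SELECT MAX(TourCompletionDateTime) AS TourCompletionDateTime
        FROM Details
        WHERE TourCompletionDateTime < ?
    )
    WHERE TourCompletionDateTime IS NOT NULL
)
"""

# The simplified version: no aggregate, so no NULL row to discard.
DIRECT = """
SELECT EXISTS (
    SELECT 1
    FROM Details
    WHERE TourCompletionDateTime < ?
)
"""

def both_checks(timestamps):
    """Return (result of VIA_MAX, result of DIRECT) for the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Details (TourCompletionDateTime TEXT)")
    conn.executemany("INSERT INTO Details VALUES (?)", [(t,) for t in timestamps])
    via_max = conn.execute(VIA_MAX, (CUTOFF,)).fetchone()[0]
    direct = conn.execute(DIRECT, (CUTOFF,)).fetchone()[0]
    conn.close()
    return via_max, direct

print(both_checks([]))
print(both_checks(["2022-07-25T00:00:00Z", "2022-07-27T00:00:00Z"]))
```

Both elements of the returned pair should match for any input, since a row that passes the WHERE filter is exactly what makes MAX() non-NULL.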

Related

Access query finding the duplicates without using DISTINCT

I have this query:
SELECT PersonalInfo.id, PersonalInfo.[k-commission], Abs(Not IsNull([PersonalInfo]![k-commission].[Value])) AS CommissionAbsent
FROM PersonalInfo;
PersonalInfo.k-commission is a multi-value field, and CommissionAbsent shows duplicate values for each k-commission value. When I use DISTINCT I get an error saying that the keyword cannot be used with a multi-value field.
I want to remove the duplicates and show only one result for each. I tried using a WHERE clause but I don't know how.
Edit: I have a lot more columns; the example shows only the few I need.
You can use GROUP BY and COUNT to solve your problem. Here is an example:
SELECT clmn1, clmn2, COUNT(*) as count
FROM table
GROUP BY clmn1, clmn2
HAVING COUNT(*) > 1;
The query groups the rows of the table by clmn1 and clmn2 and counts the rows in each group. The HAVING clause then keeps only the groups whose count is greater than 1, which are the duplicates.
If you want to select every column of the duplicated rows, join back to the grouped result (Access does not accept a row-value comparison such as (clmn1, clmn2) IN (...)):
SELECT t.*
FROM table AS t
INNER JOIN (SELECT clmn1, clmn2
FROM table
GROUP BY clmn1, clmn2
HAVING COUNT(*) > 1) AS d
ON t.clmn1 = d.clmn1 AND t.clmn2 = d.clmn2;
Applied to your query, the grouping version would be:
SELECT PersonalInfo.id, PersonalInfo.[k-commission], Abs(Not IsNull([PersonalInfo]![k-commission].[Value])) AS CommissionAbsent
FROM PersonalInfo
GROUP BY PersonalInfo.id, PersonalInfo.[k-commission], Abs(Not IsNull([PersonalInfo]![k-commission].[Value]))
HAVING COUNT(*) > 1
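To see the GROUP BY / HAVING pattern in action, here is a small sketch. It runs against SQLite through Python's sqlite3 module rather than Access, so it does not model multi-value fields at all; the table `items` and the helper `duplicate_groups` are hypothetical.

```python
import sqlite3

def duplicate_groups(rows):
    """Return (clmn1, clmn2, count) for every pair that occurs more than once."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (clmn1 TEXT, clmn2 TEXT)")
    conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT clmn1, clmn2, COUNT(*) AS cnt
        FROM items
        GROUP BY clmn1, clmn2
        HAVING COUNT(*) > 1
        ORDER BY clmn1, clmn2
        """
    ).fetchall()
    conn.close()
    return result

sample = [("a", "x"), ("a", "x"), ("b", "y"), ("c", "z"), ("c", "z"), ("c", "z")]
print(duplicate_groups(sample))  # [('a', 'x', 2), ('c', 'z', 3)]
```

Dropping the HAVING line would instead return one row per distinct pair, which is closer to what DISTINCT would have given.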

Why count doesn't return 0 on empty table

I need to count a table's rows, but I ran into unusual behavior of count(*).
count(*) does not return a result when I use a multi-column select on an empty table, but it returns the expected result (a count of 0) if I remove the other columns from the select statement (single-column select).
In the code below you will find multiple tests that show what I'm talking about.
The structure of the code below is:
1) Creation of a table
2) Multi-column select on empty table tests, which return unexpected results
3) Single-column select on empty table test, which returns the expected result
4) Multi-column select on filled table test, which returns the expected result
Question
Given these results, my question is:
Why doesn't a multi-column select on an empty table return 0, while a single-column select does?
Expected Results definition
By expected results I mean:
If a table is empty, count(*) returns 0.
If a table is not empty, count(*) returns the row count.
--CREATE TEST TABLE
CREATE TABLE #EMPTY_TABLE(
ID INT
)
DECLARE @ID INT
DECLARE @ROWS INT
--MULTI COLUMN SELECT WITH EMPTY TABLE
--assignment attempt (Multi-column SELECT)
SELECT @ID = ID, @ROWS = COUNT(*)
FROM #EMPTY_TABLE
--returns NULL instead of 0
SELECT @ROWS Test_01, ISNULL(@ROWS, 1) 'IS NULL'
--Set variable with random value, just to show that not even the assignment is happening
SET @ROWS = 29
--assignment attempt (Multi-column SELECT)
SELECT @ID = ID, @ROWS = COUNT(*)
FROM #EMPTY_TABLE
--returns 29 instead of 0
SELECT @ROWS Test_02
--SINGLE COLUMN SELECT WITH EMPTY TABLE
--assignment attempt (Single-column SELECT)
SELECT @ROWS = COUNT(*)
FROM #EMPTY_TABLE
--returns 0, the expected result
SELECT @ROWS Test_03
--MULTI COLUMN SELECT WITH FILLED TABLE
--insert a row
INSERT INTO #EMPTY_TABLE(ID)
SELECT 1
--assignment attempt
SELECT @ID = ID, @ROWS = COUNT(*)
FROM #EMPTY_TABLE
--returns 1
SELECT @ROWS Test_04
So I read up on the grouping mechanisms of Sybase and came to the conclusion that your query contains a "Transact-SQL extended column" (see the docs on group by under Usage -> Transact-SQL extensions to group by and having):
A select list that includes aggregates can include extended columns that are not arguments of aggregate functions and are not included in the group by clause. An extended column affects the display of final results, since additional rows are displayed.* (emphasis mine)
(Regarding the *: this last statement is actually wrong in your specific case, since one row turns into zero rows.)
Also, in the docs on group by under Usage -> How group by and having queries with aggregates work, you'll find:
The group by clause collects the remaining rows into one group for each unique value in the group by expression. Omitting group by creates a single group for the whole table. (emphasis mine)
So essentially:
Having a COUNT(*) makes the whole query an aggregate query, since COUNT is an aggregate function (causing an implicit GROUP BY NULL).
Adding ID to the SELECT clause then expands the single group (consisting of no rows) into its contained rows (none) and joins them with the aggregate result columns.
In your case the count is 0, but because you also query for the ID, one row is generated for every ID, with the count appended to it. Since your table has no rows, there are no result rows whatsoever, and thus no assignments. (Some examples are in the linked docs, and since there is no id and an existing id must be in the id column of your result, ...)
To always get the count, you should probably SELECT @ROWS = COUNT(*) on its own and select the IDs separately.
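The "count on its own, IDs separately" advice can be sketched as below. Note the assumptions: this runs on SQLite via Python's sqlite3 module, not on Sybase, so it only illustrates the two-query shape and does not reproduce the extended-column behaviour; `empty_table` and `count_and_first_id` are hypothetical names.

```python
import sqlite3

def count_and_first_id(ids):
    """Fetch the row count and the smallest id with two separate queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE empty_table (id INTEGER)")
    conn.executemany("INSERT INTO empty_table VALUES (?)", [(i,) for i in ids])
    # The aggregate on its own always yields exactly one row, even with no data.
    row_count = conn.execute("SELECT COUNT(*) FROM empty_table").fetchone()[0]
    # The plain column is fetched separately; no rows simply means no value.
    first = conn.execute(
        "SELECT id FROM empty_table ORDER BY id LIMIT 1"
    ).fetchone()
    conn.close()
    return row_count, (first[0] if first else None)

print(count_and_first_id([]))   # (0, None)
print(count_and_first_id([1]))  # (1, 1)
```

Keeping the two concerns apart means the count never depends on whether an ID happens to exist.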
If you are counting rows and also trying to get an ID when there may be no rows, you need to check whether any rows exist.
Something like this:
SELECT COUNT(*),
(CASE WHEN EXISTS(SELECT ID FROM EMPTY_TABLE) THEN (SELECT ID FROM EMPTY_TABLE) ELSE 0 END) AS n_id
FROM EMPTY_TABLE
If the table has more than one row, you will get a subquery error.
This query:
SELECT @ID = ID, @ROWS = COUNT(*)
FROM #EMPTY_TABLE
The problem is that COUNT(*) makes this an aggregation query, but you also want to return ID. There is no GROUP BY.
I suspect your ultimate problem is that you are ignoring such errors.
This SQL Fiddle uses SQL Server (which is similar to Sybase). However, the failure is quite general and due to a query that would not work in almost any database.

Use of HAVING without GROUP BY not working as expected

I am starting to learn SQL Server. The documentation on MSDN states:
HAVING is typically used with a GROUP BY clause. When GROUP BY is not used, there is an implicit single, aggregated group.
This made me think that we can use HAVING without a GROUP BY clause, but when I try to write such a query it does not work.
I have a table like this
CREATE TABLE [dbo].[_abc]
(
[wage] [int] NULL
) ON [PRIMARY]
GO
INSERT INTO [dbo].[_abc] (wage)
VALUES (4), (8), (15), (30), (50)
GO
Now when I run this query, I get an error
select *
from [dbo].[_abc]
having sum(wage) > 5
Error:
The documentation is correct; i.e. you could run this statement:
select sum(wage) sum_of_all_wages
, count(1) count_of_all_records
from [dbo].[_abc]
having sum(wage) > 5
The reason your statement doesn't work is the select *, which means select every column's value. When there is no group by, all records are aggregated; i.e. you only get 1 record in your result set, which has to represent every record. As such, you can only* include values produced by applying aggregate functions to your columns, not the columns themselves.
* of course, you can also provide constants, so select 'x' constant, count(1) cnt from myTable would work.
There aren't many use cases I can think of where you'd want to use having without a group by, but certainly it can be done as shown above.
NB: If you wanted all rows where the wage was greater than 5, you'd use the where clause instead:
select *
from [dbo].[_abc]
where wage > 5
Equally, if you want the sum of all wages greater than 5 you can do this
select sum(wage) sum_of_wage_over_5
from [dbo].[_abc]
where wage > 5
Or if you wanted to compare the sum of wages over 5 with those under:
select case when wage > 5 then 1 else 0 end wage_over_five
, sum(wage) sum_of_wage
from [dbo].[_abc]
group by case when wage > 5 then 1 else 0 end
See runnable examples here.
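As a rough local check of the last query, here is a sketch that loads the question's five wages into SQLite through Python's sqlite3 module (an assumption; the thread itself is about SQL Server). The table name `abc` and the helper `wage_sums` are illustrative only.

```python
import sqlite3

def wage_sums(wages):
    """Sum wages separately for the 'over 5' and 'not over 5' groups."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE abc (wage INTEGER)")
    conn.executemany("INSERT INTO abc VALUES (?)", [(w,) for w in wages])
    rows = conn.execute(
        """
        SELECT CASE WHEN wage > 5 THEN 1 ELSE 0 END AS wage_over_five,
               SUM(wage) AS sum_of_wage
        FROM abc
        GROUP BY CASE WHEN wage > 5 THEN 1 ELSE 0 END
        ORDER BY wage_over_five
        """
    ).fetchall()
    conn.close()
    return rows

print(wage_sums([4, 8, 15, 30, 50]))  # [(0, 4), (1, 103)]
print(wage_sums([]))                  # [] : with an explicit GROUP BY, no rows means no groups
```

The empty-table case is worth noticing: with an explicit group by there is no group at all, whereas an aggregate without group by still returns its single row.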
Update based on comments:
Do you need having to use aggregate functions?
No. You can run select sum(wage) from [dbo].[_abc]. When an aggregate function is used without a group by clause, it's as if you're grouping by a constant; i.e. select sum(wage) from [dbo].[_abc] group by 1.
The documentation merely means that whilst normally you'd have a having statement with a group by statement, it's OK to exclude the group by; in such cases the having statement, like the select statement, will treat your query as if you'd specified group by 1.
What's the point?
It's hard to think of many good use cases, since you're only getting one row back and the having statement is a filter on that.
One use case could be code that monitors your licenses for some software: if you have fewer users than per-user licenses, all's good and you don't need to see a result. If you have more users, you want to know about it. E.g.
declare @totalUserLicenses int = 100
select count(1) NumberOfActiveUsers
, @totalUserLicenses NumberOfLicenses
, count(1) - @totalUserLicenses NumberOfAdditionalLicensesToPurchase
from [dbo].[Users]
where enabled = 1
having count(1) > @totalUserLicenses
Isn't the select irrelevant to the having clause?
Yes and no. Having is a filter on your aggregated data. Select says what columns/information to bring back. As such you have to ask "what would the result look like?" i.e. Given we've had to effectively apply group by 1 to make use of the having statement, how should SQL interpret select *? Since your table only has one column this would translate to select wage; but we have 5 rows, so 5 different values of wage, and only 1 row in the result to show this.
I guess you could say "I want to return all rows if their sum is greater than 5; otherwise I don't want to return any rows". Were that your requirement, it could be achieved in a variety of ways, one of which would be:
select *
from [dbo].[_abc]
where exists
(
select 1
from [dbo].[_abc]
having sum(wage) > 5
)
However, we have to write the code to meet the requirement, rather than expect the code to understand our intent.
Another way to think about having is as being a where statement applied to a subquery. I.e. your original statement effectively reads:
select wage
from
(
select sum(wage) sum_of_wage
from [dbo].[_abc]
group by 1
) singleRowResult
where sum_of_wage > 5
That won't run because wage is not available to the outer query; only sum_of_wage is returned.
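The "where applied to a subquery" reading can be tried out directly. This is only a sketch under assumptions: it uses SQLite via Python's sqlite3 module instead of SQL Server, it drops the conceptual group by 1, and it selects sum_of_wage (the column that is actually available) rather than wage. The names `abc` and `total_if_over` are hypothetical.

```python
import sqlite3

def total_if_over(wages, threshold):
    """Return the single aggregate row only if the total exceeds the threshold."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE abc (wage INTEGER)")
    conn.executemany("INSERT INTO abc VALUES (?)", [(w,) for w in wages])
    rows = conn.execute(
        """
        SELECT sum_of_wage
        FROM (
            SELECT SUM(wage) AS sum_of_wage
            FROM abc
        ) AS single_row_result
        WHERE sum_of_wage > ?
        """,
        (threshold,),
    ).fetchall()
    conn.close()
    return rows

print(total_if_over([4, 8, 15, 30, 50], 5))     # [(107,)]
print(total_if_over([4, 8, 15, 30, 50], 1000))  # []
print(total_if_over([], 5))                     # [] : SUM is NULL, and NULL > 5 is not true
```

The inner query always yields one row; the outer filter then keeps or drops it, which is the zero-or-one-row behaviour of having without group by.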
HAVING without GROUP BY clause is perfectly valid but here is what you need to understand:
The result will contain zero or one row
The implicit GROUP BY will return exactly one row even if the WHERE condition matched zero rows
HAVING will keep or eliminate that single row based on the condition
Any column in the SELECT clause needs to be wrapped inside an aggregate function
You can also specify an expression as long as it is not functionally dependent on the columns
Which means you can do this:
SELECT SUM(wage)
FROM employees
HAVING SUM(wage) > 100
-- One row containing the sum if the sum is greater than 100
-- Zero rows otherwise
Or even this:
SELECT 1
FROM employees
HAVING SUM(wage) > 100
-- One row containing "1" if the sum is greater than 100
-- Zero rows otherwise
This construct is often used when you're interested in checking if a match for the aggregate was found:
SELECT *
FROM departments
WHERE EXISTS (
SELECT 1
FROM employees
WHERE employees.department = departments.department
HAVING SUM(wage) > 100
)
-- all departments whose employees earn more than 100 in total
In SQL you cannot select non-aggregated columns alongside aggregate functions directly; you need to group by the non-aggregated fields, as in the example below:
USE AdventureWorks2012 ;
GO
SELECT SalesOrderID, SUM(LineTotal) AS SubTotal
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderID
HAVING SUM(LineTotal) > 100000.00
ORDER BY SalesOrderID ;
In your case the table has no identity column; you could add one as below:
ALTER TABLE [dbo].[_abc]
ADD Id_new INT IDENTITY(1, 1)
GO

Sqlite: Setting default value for a max sub-query if result is null

I want to increment a sequence number for subgroups within a table, but if the subgroup does not exist then the sequence should start with 1:
For example, in the following, we want sequence to be set to 1 if no records with class=5 exist in the table; if such records exist, sequence should take the maximum sequence in the subgroup class=5, plus 1:
update order set class=5, sequence=(select max(sequence) from order
where class=5)+1 where order_id=104;
The problem is the above doesn't work for the initial case.
In these situations, the COALESCE() function comes in very handy (order is a reserved word, so the table name is quoted here):
UPDATE "order"
SET class = 5,
sequence = coalesce(
(SELECT max(sequence)
FROM "order"
WHERE class=5),
0
) + 1
WHERE order_id = 104;
Another good thing about COALESCE is that it is supported by most other SQL engines: MySQL, Postgres, etc.
Just surround your query with the IFNULL( QUERY, 0 ) function
http://www.sqlite.org/lang_corefunc.html#ifnull
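Here is a quick way to try the COALESCE variant against an in-memory SQLite database from Python. The column types, the quoted table name and the helper `assign_sequence` are assumptions made for the sketch; the question does not show its schema.

```python
import sqlite3

def assign_sequence(existing, order_id, new_class):
    """Move one order into new_class and give it the next sequence number.

    existing is a list of (order_id, class, sequence) rows to preload.
    """
    conn = sqlite3.connect(":memory:")
    # "order" is quoted because ORDER is a reserved word in SQLite.
    conn.execute(
        'CREATE TABLE "order" '
        "(order_id INTEGER PRIMARY KEY, class INTEGER, sequence INTEGER)"
    )
    conn.executemany('INSERT INTO "order" VALUES (?, ?, ?)', existing)
    conn.execute(
        """
        UPDATE "order"
        SET class = :cls,
            sequence = COALESCE(
                (SELECT MAX(sequence) FROM "order" WHERE class = :cls),
                0
            ) + 1
        WHERE order_id = :oid
        """,
        {"cls": new_class, "oid": order_id},
    )
    row = conn.execute(
        'SELECT class, sequence FROM "order" WHERE order_id = ?', (order_id,)
    ).fetchone()
    conn.close()
    return row

# No order has class 5 yet, so MAX() is NULL and COALESCE falls back to 0.
print(assign_sequence([(104, None, None)], 104, 5))
# Class 5 already holds sequences 1 and 2, so the next one is 3.
print(assign_sequence([(101, 5, 1), (102, 5, 2), (104, None, None)], 104, 5))
```

Replacing COALESCE with IFNULL in the statement should behave the same way in SQLite, since both take the first non-NULL argument here.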

Fast approximate counting in Postgres

I'm querying my database (Postgres 8.4) with something like the following:
SELECT COUNT(*) FROM table WHERE indexed_varchar LIKE 'bar%';
The complexity of this is O(N) because Postgres has to count each row. Postgres 9.2 has index-only scans, but upgrading isn't an option, unfortunately.
However, getting an exact count of rows seems like overkill, because I only need to know which of the following three cases is true:
Query returns no rows.
Query returns one row.
Query returns two or more rows.
So I don't need to know that the query returns 10,421 rows, just that it returns two or more.
I know how to handle the first two cases:
SELECT EXISTS (SELECT 1 FROM table WHERE indexed_varchar LIKE 'bar%');
Which will return true if one or more rows exist and false if none exist.
Any ideas on how to expand this to encompass all three cases in an efficient manner?
SELECT COUNT(*) FROM (
SELECT * FROM table WHERE indexed_varchar LIKE 'bar%' LIMIT 2
) t;
Should be simple. You can use LIMIT to do what you want and return the count through a CASE expression.
SELECT CASE WHEN c = 2 THEN 'more than one' ELSE CAST(c AS TEXT) END
FROM
(
SELECT COUNT(*) AS c
FROM (
SELECT 1 AS c FROM table WHERE indexed_varchar LIKE 'bar%' LIMIT 2
) t
) v
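The LIMIT 2 trick is easy to try out. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for Postgres 8.4, so it shows the logic only and says nothing about the query plan; the table `items` and the helper `match_category` are hypothetical.

```python
import sqlite3

def match_category(values, prefix):
    """Classify the number of prefix matches as none, one, or two or more."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (indexed_varchar TEXT)")
    conn.execute("CREATE INDEX items_idx ON items (indexed_varchar)")
    conn.executemany("INSERT INTO items VALUES (?)", [(v,) for v in values])
    # The inner LIMIT stops the scan after two matches, so the outer
    # COUNT(*) can only ever be 0, 1 or 2.
    capped = conn.execute(
        """
        SELECT COUNT(*)
        FROM (
            SELECT 1 FROM items WHERE indexed_varchar LIKE ? LIMIT 2
        ) AS t
        """,
        (prefix + "%",),
    ).fetchone()[0]
    conn.close()
    return {0: "none", 1: "one"}.get(capped, "two or more")

print(match_category([], "bar"))
print(match_category(["barn", "foo"], "bar"))
print(match_category(["barn", "bark", "bars"], "bar"))
```

Because the work is bounded by the LIMIT rather than by the number of matching rows, the cost no longer grows with the size of the match set.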