Access SQL unique records with latest date including null dates from single table

I have a table with the following sample structure:
Identifier| Latitude | Longitude |...many columns...|DateWhenStatusObserved|ID|
----------+----------+---------- +------------------+----------------------+--+
2823DC012 | 28.76285 | 23.70195 | ... | 1994/10/28| 1|
2823DC012 | 28.76285 | 23.70195 | ... | 1995/04/05| 2|
2822DD030 | 28.76147 | 22.98270 | ... | NULL| 3|
...
There are many more columns, but these columns do not have to be evaluated, all columns should just be returned from the query.
I would like the SQL query to return only unique records for the Identifier column, with the latest date per unique Identifier. Unfortunately there are also records where the date is NULL in the DateWhenStatusObserved column, and in many instances the only record for an Identifier (geosite) has a NULL date.
There are already many answers for similar SQL questions such as:
How can I include null values in a MIN or MAX?
SELECT only rows with either the MAX date or NULL
http://bytes.com/topic/access/answers/719627-create-query-evaluate-max-date-recognizing-null-high-value
These, however, do not show exactly how to use IIf with the aggregate Max function so that NULL-date records pass through while each identifier (geosite) still appears only once.
With a subquery and a combination of Max(IIf()) I only get non-NULL max date records returned. I eventually got a reasonable result from a basic subquery that relies on WHERE clauses instead of joins, but it returns duplicate identifier records for NULL dates, because I have to use OR instead of AND to get any rows returned.
Here is one of my attempts returning only non-NULL max date records:
SELECT BasicInfoTable.*
FROM Basic_information_WUA AS BasicInfoTable
INNER JOIN
(
SELECT Identifier, MAX (IIF(DateWhenStatusObserved IS NULL, 0, DateWhenStatusObserved)) AS MaxDate
FROM Basic_information_WUA
GROUP BY Identifier
)
AS Table2 ON BasicInfoTable.Identifier = Table2.Identifier AND BasicInfoTable.DateWhenStatusObserved = Table2.MaxDate;
So why is this not working for the NULL date cases?
I would appreciate any help with finding the near-optimal query for this problem.
Thanks

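You need to apply the same IS NULL logic to the join condition BasicInfoTable.DateWhenStatusObserved = Table2.MaxDate. NULLs cannot be compared with =: for a NULL-date row the left side is NULL while Table2.MaxDate is 0, so the comparison never evaluates to true and the row is dropped.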
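To make the point concrete, here is a minimal sketch that applies the same NULL substitution on both sides of the date comparison. It runs against SQLite through Python's sqlite3 module rather than Access, so COALESCE stands in for IIf(... IS NULL, ...), the dates are stored as ISO text, and the table is cut down to three columns; all of that is an assumption made for illustration.

```python
import sqlite3

# In-memory stand-in for the Access table; only the relevant columns are kept.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Basic_information_WUA (
        Identifier TEXT,
        DateWhenStatusObserved TEXT,   -- ISO dates, so text order = date order
        ID INTEGER PRIMARY KEY
    );
    INSERT INTO Basic_information_WUA VALUES
        ('2823DC012', '1994-10-28', 1),
        ('2823DC012', '1995-04-05', 2),
        ('2822DD030', NULL,         3);
""")

# The NULL replacement ('' here) is applied inside MAX() AND in the join
# condition, so a NULL-date row can match its own group's "max".
query = """
    SELECT b.Identifier, b.DateWhenStatusObserved, b.ID
    FROM Basic_information_WUA AS b
    INNER JOIN (
        SELECT Identifier,
               MAX(COALESCE(DateWhenStatusObserved, '')) AS MaxDate
        FROM Basic_information_WUA
        GROUP BY Identifier
    ) AS t
      ON b.Identifier = t.Identifier
     AND COALESCE(b.DateWhenStatusObserved, '') = t.MaxDate
    ORDER BY b.ID
"""
rows = conn.execute(query).fetchall()
print(rows)
```

With this sample data the query returns the 1995 row for 2823DC012 and the NULL-date row for 2822DD030. An identifier that has several rows with only NULL dates would still come back more than once, so a tie-breaker (for example on ID) would be needed in that case.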

Related

Is the ordering of a GROUP BY with a MAX aggregate well defined?

Let's assume I run the following in SQLite:
CREATE TABLE my_table
(
id INTEGER PRIMARY KEY,
NAME VARCHAR(20),
date DATE,
num INTEGER,
important VARCHAR(20)
);
INSERT INTO my_table (NAME, date, num, important)
VALUES ('A', '2000-01-01', 10, 'Important 1');
INSERT INTO my_table (NAME, date, num, important)
VALUES ('A', '2000-02-01', 20, 'Important 2');
INSERT INTO my_table (NAME, date, num, important)
VALUES ('A', '1999-12-01', 30, 'Important 3');
The table looks like this:
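+----+------+------------+-----+-------------+
| id | NAME | date       | num | important   |
+----+------+------------+-----+-------------+
| 1  | A    | 2000-01-01 | 10  | Important 1 |
| 2  | A    | 2000-02-01 | 20  | Important 2 |
| 3  | A    | 1999-12-01 | 30  | Important 3 |
+----+------+------------+-----+-------------+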
If I execute:
SELECT id
FROM my_table
GROUP BY NAME;
the results are:
+----+
| id |
+----+
| 1 |
+----+
If I execute:
SELECT id, MAX(date)
FROM my_table
GROUP BY NAME;
The results are:
+----+------------+
| id | max(date) |
+----+------------+
| 2 | 2000-02-01 |
+----+------------+
And if I execute:
SELECT id,
MAX(date),
MAX(num)
FROM my_table
GROUP BY NAME;
The results are:
+----+------------+----------+
| id | max(date) | max(num) |
+----+------------+----------+
| 3 | 2000-02-01 | 30 |
+----+------------+----------+
My question is, is this well defined? Specifically, am I guaranteed to always get id = 2 when doing the second query (with the single Max(date) aggregate), or is this just a side effect of how SQLite is likely ordering the table to grab the Max before grouping?
I ask this because I specifically do want id = 2. I will then execute another query that selects the important field for that row (for my actual problem the first query would return multiple ids and I'd select all important fields for all those rows at once).
Additionally, this is all happening in an iOS Core Data query, so I'm not able to do more complicated subqueries. If I knew that the ordering of a GROUP BY is defined by an aggregate then I'd feel pretty confident my queries wouldn't break (until Apple moves away from SQLite for Core Data).
Thanks!

From the SQLite manual:
2.5. Bare columns in an aggregate query
The usual case is that all column names in an aggregate query are either arguments to aggregate functions or else appear in the GROUP BY clause. A result column which contains a column name that is not within an aggregate function and that does not appear in the GROUP BY clause (if one exists) is called a "bare" column. Example:
SELECT a, b, sum(c) FROM tab1 GROUP BY a;
In the query above, the "a" column is part of the GROUP BY clause and so each row of the output contains one of the distinct values for "a". The "c" column is contained within the sum() aggregate function and so that output column is the sum of all "c" values in rows that have the same value for "a". But what is the result of the bare column "b"? The answer is that the "b" result will be the value for "b" in one of the input rows that form the aggregate. The problem is that you usually do not know which input row is used to compute "b", and so in many cases the value for "b" is undefined.
Special processing occurs when the aggregate function is either min() or max(). Example:
SELECT a, b, max(c) FROM tab1 GROUP BY a;
When the min() or max() aggregate functions are used in an aggregate query, all bare columns in the result set take values from the input row which also contains the minimum or maximum. So in the query above, the value of the "b" column in the output will be the value of the "b" column in the input row that has the largest "c" value. There is still an ambiguity if two or more of the input rows have the same minimum or maximum value or if the query contains more than one min() and/or max() aggregate function. Only the built-in min() and max() functions work this way.
If bare columns appear in an aggregate query that lacks a GROUP BY clause, and the number of input rows is zero, then the values of the bare columns are arbitrary. For example, in this query:
SELECT count(*), b FROM tab1;
If the tab1 table contains no rows (so count(*) evaluates to 0) then the bare column "b" will have an arbitrary and meaningless value.
Most other SQL database engines disallow bare columns. If you include a bare column in a query, other database engines will usually raise an error. The ability to include bare columns in a query is an SQLite-specific extension.
https://www.sqlite.org/lang_select.html

"am I guaranteed to always get id = 2 when doing the second query (with the single Max(date) aggregate), or is this just a side effect of how SQLite is likely ordering the table to grab the Max before grouping?"
Yes, the result that you get is guaranteed because it is documented in Bare columns in an aggregate query.
The value for the column id that you get is from the row that contains the max date.
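The documented behaviour can be checked with a short script. The sketch below rebuilds the question's table in an in-memory SQLite database via Python's sqlite3 module, runs the bare-column query, and compares it with a correlated subquery that does not depend on the SQLite extension. The subquery variant is an added illustration, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (
        id INTEGER PRIMARY KEY,
        NAME VARCHAR(20),
        date DATE,
        num INTEGER,
        important VARCHAR(20)
    );
    INSERT INTO my_table (NAME, date, num, important) VALUES
        ('A', '2000-01-01', 10, 'Important 1'),
        ('A', '2000-02-01', 20, 'Important 2'),
        ('A', '1999-12-01', 30, 'Important 3');
""")

# Bare column "id" with a single MAX(): SQLite takes id from the row
# that holds the maximum date.
bare = conn.execute(
    "SELECT id, MAX(date) FROM my_table GROUP BY NAME"
).fetchall()

# Portable variant: pick the row whose date equals its group's maximum.
portable = conn.execute("""
    SELECT t.id, t.date
    FROM my_table AS t
    WHERE t.date = (SELECT MAX(m.date) FROM my_table AS m WHERE m.NAME = t.NAME)
""").fetchall()

print(bare, portable)
```

Both queries return id 2 here. As the manual excerpt notes, the bare-column result becomes ambiguous again if two rows share the maximum date or if the query contains more than one min()/max().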

SQL Query with part of the key possibly being NULL

I've been working on a SQL query which needs to pull a value with a two-column key, where one of the columns may be null. If it's null, I want to pick that value only if there is no row with the specific key.
For example:
CUSTOM | PLAN | COST
:----- | :--- | ---:
VENDCO | LMNK | 50
VENDCO | null | 25
BALLCO | null | 10
I'm trying to run a query that will pull this into one field, i.e., the value of VENDCO at 50 and the value of BALLCO at 10, ignoring the VENDCO row with 25. This would be part of a joined subquery, so I can't hard-code the actual keys (VENDCO, BALLCO, etc.). Essentially: pick the cost value with the plan if it exists, otherwise the one where the plan is null.
It might also be worthwhile to point out that if I "select * from table where PLAN is null" I don't get results -- I have to select where PLAN=''. I'm not sure if that indicates anything weird about the data.
Hope I'm making myself clear.

I think that not exists should do what you want:
select t.*
from mytable t
where
plan is not null
or not exists (
select 1 from mytable t1 where t1.custom = t.custom and t1.plan is not null
)
Basically this gives priority to rows where plan is not null in groups sharing the same custom.
Demo on DB Fiddle:
CUSTOM | PLAN | COST
:----- | :--- | ---:
VENDCO | LMNK | 50
BALLCO | null | 10
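A minimal sketch of the same NOT EXISTS idea, run against SQLite through Python's sqlite3 module. Because the question mentions that the "null" plans only match PLAN = '', the sketch wraps plan in NULLIF(plan, '') so that empty strings and real NULLs are treated alike; that wrapper and the sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (custom TEXT, plan TEXT, cost INTEGER);
    INSERT INTO mytable VALUES
        ('VENDCO', 'LMNK', 50),
        ('VENDCO', '',     25),   -- "missing" plan stored as empty string
        ('BALLCO', NULL,   10);   -- "missing" plan stored as NULL
""")

# Keep a row if it has a real plan, or if no row for the same custom has one.
query = """
    SELECT t.custom, t.plan, t.cost
    FROM mytable AS t
    WHERE NULLIF(t.plan, '') IS NOT NULL
       OR NOT EXISTS (
            SELECT 1
            FROM mytable AS t1
            WHERE t1.custom = t.custom
              AND NULLIF(t1.plan, '') IS NOT NULL
       )
    ORDER BY t.cost DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The VENDCO row with cost 25 is dropped because another VENDCO row carries a plan, while BALLCO keeps its only row.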

How do I select the latest rows for all users?

I have a table similar to the following:
=> \d table
Table "public.table"
Column | Type | Modifiers
-------------+-----------------------------+-------------------------------
id | integer | not null default nextval( ...
user | bigint | not null
timestamp | timestamp without time zone | not null
field1 | double precision |
As you can see, it contains many field1 values over time for all users. Is there a way to efficiently get the latest field1 value for all users in one query (i.e. one row per user)? I'm thinking I might have to use some combination of group by and select first.

Simplest with DISTINCT ON in Postgres:
SELECT DISTINCT ON ("user")
"user", "timestamp", field1
FROM tbl
ORDER BY "user", "timestamp" DESC;
More details:
https://dba.stackexchange.com/questions/49540/how-do-i-efficiently-get-the-most-recent-corresponding-row/49555#49555
Select first row in each GROUP BY group?
Aside: Don't use timestamp as column name. It's a reserved word in SQL and a basic type name in Postgres.
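DISTINCT ON is specific to Postgres. For comparison, here is a sketch of a portable alternative: join the table to a grouped subquery holding each user's latest timestamp. It runs on SQLite via Python's sqlite3 module, and the table name readings, the column names user_id and ts, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        ts TEXT NOT NULL,          -- ISO timestamps, so text order = time order
        field1 REAL
    );
    INSERT INTO readings (user_id, ts, field1) VALUES
        (100, '2024-01-01 10:00:00', 1.0),
        (100, '2024-01-02 10:00:00', 2.0),
        (200, '2024-01-01 09:00:00', 3.0);
""")

# One row per user: the row whose ts equals that user's MAX(ts).
query = """
    SELECT r.user_id, r.ts, r.field1
    FROM readings AS r
    JOIN (
        SELECT user_id, MAX(ts) AS max_ts
        FROM readings
        GROUP BY user_id
    ) AS latest
      ON r.user_id = latest.user_id
     AND r.ts = latest.max_ts
    ORDER BY r.user_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Unlike DISTINCT ON, this form returns several rows for a user whose latest timestamp occurs more than once.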

Why do WHERE and HAVING exist as separate clauses in SQL?

I understand the distinction between WHERE and HAVING in a SQL query, but I don't see why they are separate clauses. Couldn't they be combined into a single clause that could handle both aggregated and non-aggregated data?

Here's the rule: if a condition refers to an aggregate function, put it in the HAVING clause; otherwise, use the WHERE clause.
Here's another rule: HAVING is normally used together with GROUP BY. Without GROUP BY, the whole result is treated as a single group.
The main difference is that WHERE cannot be used on a grouped item (such as SUM(number)), whereas HAVING can. The reason is that WHERE is applied before the grouping and HAVING is applied after it.
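The before/after ordering can be seen in a small experiment. This sketch uses SQLite through Python's sqlite3 module with a hypothetical orders table: WHERE removes individual rows first, the remaining rows are grouped, and HAVING then filters the groups by their aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount INTEGER);
    INSERT INTO orders VALUES ('a', 5), ('a', 20), ('b', 8), ('b', 3);
""")

# WHERE drops the row ('b', 3) before grouping; the groups are then
# a = 25 and b = 8, and HAVING keeps only the groups whose sum exceeds 10.
rows = conn.execute("""
    SELECT customer, SUM(amount)
    FROM orders
    WHERE amount > 4
    GROUP BY customer
    HAVING SUM(amount) > 10
""").fetchall()

# An aggregate inside WHERE is rejected, because no groups exist yet
# at the point where WHERE is evaluated.
try:
    conn.execute(
        "SELECT customer FROM orders WHERE SUM(amount) > 10 GROUP BY customer"
    )
    aggregate_in_where_ok = True
except sqlite3.OperationalError:
    aggregate_in_where_ok = False

print(rows, aggregate_in_where_ok)
```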
Another difference (shown here with MySQL): the WHERE clause requires the condition to refer to a column of the table, but the HAVING clause can use both columns and aliases.
Here's the difference:
SELECT `value` v FROM `table` WHERE `v`>5;
Error #1054 - Unknown column 'v' in 'where clause'
SELECT `value` v FROM `table` HAVING `v`>5; -- Get 5 rows
This is because the WHERE clause filters rows before the select list is evaluated, while the HAVING clause filters them afterwards.
So putting conditions in the WHERE clause is more efficient if the table has many rows.
Try EXPLAIN to see the key difference:
EXPLAIN SELECT `value` v FROM `table` WHERE `value`>5;
+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
| 1 | SIMPLE | table | range | value | value | 4 | NULL | 5 | Using where; Using index |
+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
EXPLAIN SELECT `value` v FROM `table` having `value`>5;
+----+-------------+-------+-------+---------------+-------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-------+---------+------+------+-------------+
| 1 | SIMPLE | table | index | NULL | value | 4 | NULL | 10 | Using index |
+----+-------------+-------+-------+---------------+-------+---------+------+------+-------------+
You can see that both WHERE and HAVING use the index, but the number of rows examined differs (5 versus 10).
So there is a need for both of them, especially when we need grouping and additional filters.

The question seems to overlook that WHERE and HAVING each lack up to half of the information needed to fully process a query.
Consider the following SQL:
drop table if exists foo;
create table foo (
ID int,
bar int
);
insert into foo values (1, 1);
select now() as d, bar as b
from foo
where bar = 1 and d <= now()
having bar = 1 and ID = 1
;
In the where clause, d is not available because the selected items have not been processed to create it yet.
In the having clause ID has been discarded because it was not selected. In aggregate queries ID may not even have meaning in context of multiple rows combined into one. ID may also be meaningless when joining different tables into a single result.

Could it be done? Sure, but on the back end it would do the same as it does now, because you have to aggregate something before you can filter on that aggregation. Ultimately that's the reason: it's a logical separation of different processes. Why waste resources aggregating records you could have filtered out with a WHERE?

The question could only be fully answered by the designer since it asks intent. But the implication is that both clauses do the same thing only against aggregated vs. non-aggregated data. That's not true. "The HAVING clause is typically used together with the GROUP BY clause to filter the results of aggregate values. However, HAVING can be specified without GROUP BY."
As I understand it, the important thing is that "The HAVING clause specifies additional filters that are applied after the WHERE clause filters."
http://technet.microsoft.com/en-us/library/ms179270(v=sql.105).aspx

Oracle - SQL - Count multiple fields

Using Oracle 10G
Say, for example, I have a table with three fields in it. I'd like one query that selects the count of non-null values in each column. Field names:
----------------------------------
| strTest1 | strTest2 | strTest3 |
----------------------------------
I know how to get the count of each one individually:
select count(*) from tablename where strTest1 is not null
but I'd like to know if it's possible to do this within one query for all 3 fields.
Thanks

It sounds like you want:
SELECT COUNT(STRTEST1), COUNT(STRTEST2), COUNT(STRTEST3) FROM YOUR_TABLE
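This works because COUNT(column) counts only the rows where that column is not NULL, unlike COUNT(*). A quick sketch of that behaviour, using SQLite through Python's sqlite3 module instead of Oracle, with a hypothetical table name and sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE your_table (strTest1 TEXT, strTest2 TEXT, strTest3 TEXT);
    INSERT INTO your_table VALUES
        ('a',  NULL, 'x'),
        ('b',  'c',  NULL),
        (NULL, NULL, 'z');
""")

# COUNT(col) skips NULLs; COUNT(*) counts every row.
counts = conn.execute("""
    SELECT COUNT(strTest1), COUNT(strTest2), COUNT(strTest3), COUNT(*)
    FROM your_table
""").fetchone()
print(counts)
```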
