Setting NULL values to a custom value in Access-SQL - sql

When LEFT JOINing two tables, is there a way to set the cells which cannot be matched (NULL) to a custom value? So e.g. when the result returns, the NULL cells actually HAVE a value, e.g. "N/A" or "Not found"?
I want to do this in MS Access 2003
Example:
| id | value | | id | other value |
|----|-------| LEFT JOIN |----|-------------|
| 1 | hello | -- id --> | 2 | world |
| 2 | you |
results in:
| id | value | other value |
| 1 | hello | NULL |
| 2 | you | world |
but should be:
| id | value | other value |
| 1 | hello | custom-val |
| 2 | you | world |

You can use Nz() to substitute an arbitrary value for a NULL:
SELECT Nz(F, "Not Present") FROM T
This returns either the value of field F, or "Not Present" if F is NULL.
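Applied to the LEFT JOIN from the question, the same substitution can be sketched as follows. This is a minimal illustration, not Access code: it runs against an in-memory SQLite database from Python, and because SQLite has no Nz() it uses IFNULL as a stand-in. The table names t1 and t2 and the column name other_value are assumptions made for the example.

```python
import sqlite3

# In-memory stand-in for the question's two tables (t1/t2 are hypothetical names).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, value TEXT);
    CREATE TABLE t2 (id INTEGER, other_value TEXT);
    INSERT INTO t1 VALUES (1, 'hello'), (2, 'you');
    INSERT INTO t2 VALUES (2, 'world');
""")

# IFNULL plays the role of Access's Nz() here: it replaces the NULL that the
# LEFT JOIN produces for the unmatched row with a custom value.
rows = conn.execute("""
    SELECT t1.id, t1.value, IFNULL(t2.other_value, 'custom-val') AS other_value
    FROM t1
    LEFT JOIN t2 ON t1.id = t2.id
    ORDER BY t1.id
""").fetchall()

print(rows)  # [(1, 'hello', 'custom-val'), (2, 'you', 'world')]
```

In Access itself the expression would be written as Nz(T2.other_value, "custom-val") in the SELECT list.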

Bear in mind that SQL’s outer join is a kind of relational union which is explicitly designed to project null values. You want to avoid using the null value (a good thing too, in my opinion), therefore you should avoid using outer joins. Note that modern relational languages have dispensed with the concept of null and outer join entirely (see endnote).
This outer join:
SELECT DISTINCT T1.id, T1.value, T2.other_value
FROM T1
LEFT OUTER JOIN T2
ON T1.id = T2.id;
…is semantically equivalent to this SQL code:
SELECT T1.id, T1.value, T2.other_value
FROM T1
INNER JOIN T2
ON T1.id = T2.id
UNION
SELECT T1.id, T1.value, NULL
FROM T1
WHERE NOT EXISTS (
SELECT *
FROM T2
WHERE T1.id = T2.id
);
The second query may look long-winded but that’s only because of the way SQL has been designed/evolved. The above is merely a natural join, a union and an antijoin (semidifference). However, SQL has no antijoin operator, requires you to specify column lists in the SELECT clause and to write JOIN clauses if your product hasn’t implemented Standard SQL’s NATURAL JOIN syntax (Access hasn’t), which results in a lot of code to express something quite simple.
Therefore, you could write code such as the second query above but using an actual default value rather than the null value.
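A minimal sketch of that idea, again run against an in-memory SQLite database from Python rather than Access. The table names t1 and t2, the column name other_value and the default 'custom-val' are assumptions taken from the example in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, value TEXT);
    CREATE TABLE t2 (id INTEGER, other_value TEXT);
    INSERT INTO t1 VALUES (1, 'hello'), (2, 'you');
    INSERT INTO t2 VALUES (2, 'world');
""")

# Matched rows come from the inner join; unmatched rows from t1 are added by
# the NOT EXISTS branch, carrying a default value instead of NULL.
rows = conn.execute("""
    SELECT t1.id, t1.value, t2.other_value
    FROM t1
    INNER JOIN t2 ON t1.id = t2.id
    UNION
    SELECT t1.id, t1.value, 'custom-val'
    FROM t1
    WHERE NOT EXISTS (SELECT * FROM t2 WHERE t1.id = t2.id)
    ORDER BY 1
""").fetchall()

print(rows)  # [(1, 'hello', 'custom-val'), (2, 'you', 'world')]
```

No NULL is ever projected, so no Nz() call is needed in this form.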
The only relational game in town is the specification of a D language known as "The Third Manifesto" by Chris Date and Hugh Darwen. It explicitly rejects Codd's nulls (latterly Codd proposed two kinds of null) and doesn't accommodate an outer join operator (in more recent writings the authors have proposed relation-valued attributes as an alternative to outer join). Specific citations:
C. J. Date (2009): SQL and Relational Theory: How to Write Accurate SQL Code: Ch 4, 'A remark on outer join' (p.84)
Darwen, Hugh (2003): The Importance of Column Names: "Note that in Tutorial D, the only 'join' operator is called JOIN, and it means 'natural join'." (p.16)
C. J. Date and Hugh Darwen (2006): Databases, Types and the Relational Model: The Third Manifesto: Proscription 4: "D shall include no concept of a 'relation' in which some 'tuple' includes some 'attribute' that does not have a value."

Related

How to join two tables when a matching row may not potentially be in one of the tables

I need to merge two tables, spring_stats and summer_stats, into one table in which some of the columns are the same and thus should be summed.
Each table contains (among others) the fields hunter_id, year, fowl, deer, bears where the last three represent numeric amount each hunter has caught.
The end result should be
hunter_id, year, spring.fowl + summer.fowl, spring.deer + summer.deer, etc
HOWEVER, some of the hunters may not have participated in the summer session but participated in the spring session (or vice versa). In this case the standard
SELECT hunter_id, year, spring.fowl + summer.fowl AS total_fowl, ... FROM spring, summer
WHERE spring.hunter_id = summer.hunter_id AND spring.year = summer.year
would not work as hunters who were active in only the spring or summer session would not be recorded and included, whereas I need all hunters included, regardless of whether they were active in only one session or both.
You are using an ancient type of table join. Instead adopt the newer (since the early 90s) join syntax. Here you want a FULL OUTER JOIN:
SELECT COALESCE(summer.hunter_id, spring.hunter_id) as hunter_id,
COALESCE(summer.year, spring.year) as year,
COALESCE(spring.fowl, 0) + COALESCE(summer.fowl, 0) AS total_fowl, ...
FROM spring
FULL OUTER JOIN summer
ON spring.hunter_id = summer.hunter_id
AND spring.year = summer.year
The PostgreSQL documentation describes FULL OUTER JOIN as follows:
First, an inner join is performed. Then, for each row in T1 that does
not satisfy the join condition with any row in T2, a joined row is
added with null values in columns of T2. Also, for each row of T2 that
does not satisfy the join condition with any row in T1, a joined row
with null values in the columns of T1 is added.
The COALESCE() function will first use the hunter_id from the summer table unless it's NULL (due to the FULL OUTER JOIN) in which case it will pick the hunter_id from the spring table.
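One detail worth checking with real data: for a hunter who appears in only one table, the other table's counts are NULL, and NULL plus a number is NULL, so the counts need COALESCE as well. The sketch below illustrates this with Python's sqlite3 module. It is not the FULL OUTER JOIN itself: older SQLite versions lack that syntax, so the same result is assembled from a LEFT JOIN plus a UNION ALL of the summer-only rows. The sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE spring (hunter_id INTEGER, year INTEGER, fowl INTEGER, deer INTEGER);
    CREATE TABLE summer (hunter_id INTEGER, year INTEGER, fowl INTEGER, deer INTEGER);
    -- Hunter 1 is in both sessions, hunter 2 only in spring, hunter 3 only in summer.
    INSERT INTO spring VALUES (1, 2020, 3, 1), (2, 2020, 2, 0);
    INSERT INTO summer VALUES (1, 2020, 4, 2), (3, 2020, 5, 1);
""")

# Naive sum: hunter 2 has no summer row, so spring.fowl + NULL yields NULL.
naive = conn.execute("""
    SELECT sp.hunter_id, sp.fowl + su.fowl
    FROM spring sp
    LEFT JOIN summer su ON sp.hunter_id = su.hunter_id AND sp.year = su.year
    ORDER BY sp.hunter_id
""").fetchall()

# Emulated full outer join: every spring row (with COALESCE on the counts),
# plus the summer rows that have no spring counterpart.
totals = conn.execute("""
    SELECT sp.hunter_id, sp.year,
           COALESCE(sp.fowl, 0) + COALESCE(su.fowl, 0) AS total_fowl,
           COALESCE(sp.deer, 0) + COALESCE(su.deer, 0) AS total_deer
    FROM spring sp
    LEFT JOIN summer su ON sp.hunter_id = su.hunter_id AND sp.year = su.year
    UNION ALL
    SELECT su.hunter_id, su.year, su.fowl, su.deer
    FROM summer su
    WHERE NOT EXISTS (
        SELECT 1 FROM spring sp
        WHERE sp.hunter_id = su.hunter_id AND sp.year = su.year
    )
    ORDER BY 1, 2
""").fetchall()

print(naive)   # [(1, 7), (2, None)]
print(totals)  # [(1, 2020, 7, 3), (2, 2020, 2, 0), (3, 2020, 5, 1)]
```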
I think it's also worth mentioning that having tables specific to your seasons may not make sense. Instead a single table where Season is just an added column may be a better schema:
season | year | hunter_id | animal | animal_count
summer | 2020 | 1 | fowl | 3
spring | 2020 | 1 | deer | 2
summer | 2020 | 2 | fowl | 4
spring | 2021 | 3 | fowl | 1
Now your query is:
SELECT
hunter_id,
year,
sum(CASE WHEN animal='fowl' THEN animal_count END) as fowl_total,
sum(CASE WHEN animal='deer' THEN animal_count END) as deer_total
FROM this_new_table
GROUP BY hunter_id, year
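As a quick check of the single-table design, here is a small sketch using Python's sqlite3 module with the four sample rows shown above. The table name hunt_stats is a hypothetical stand-in for this_new_table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hunt_stats (
        season TEXT, year INTEGER, hunter_id INTEGER,
        animal TEXT, animal_count INTEGER
    );
    INSERT INTO hunt_stats VALUES
        ('summer', 2020, 1, 'fowl', 3),
        ('spring', 2020, 1, 'deer', 2),
        ('summer', 2020, 2, 'fowl', 4),
        ('spring', 2021, 3, 'fowl', 1);
""")

# Conditional aggregation: each CASE keeps only the rows for one animal.
# A hunter with no rows for an animal gets NULL for that total.
rows = conn.execute("""
    SELECT hunter_id, year,
           SUM(CASE WHEN animal = 'fowl' THEN animal_count END) AS fowl_total,
           SUM(CASE WHEN animal = 'deer' THEN animal_count END) AS deer_total
    FROM hunt_stats
    GROUP BY hunter_id, year
    ORDER BY hunter_id, year
""").fetchall()

print(rows)  # [(1, 2020, 3, 2), (2, 2020, 4, None), (3, 2021, 1, None)]
```

If zeros are preferred over NULL for animals a hunter never caught, adding ELSE 0 to each CASE is one way to get them.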

Get total count and first 3 columns

I have the following SQL query:
SELECT TOP 3 accounts.username
,COUNT(accounts.username) AS count
FROM relationships
JOIN accounts ON relationships.account = accounts.id
WHERE relationships.following = 4
AND relationships.account IN (
SELECT relationships.following
FROM relationships
WHERE relationships.account = 8
);
I want to return the total count of accounts.username and the first 3 accounts.username (in no particular order). Unfortunately accounts.username and COUNT(accounts.username) cannot coexist. The query works fine when either one of them is removed. I don't want to send the request twice with different select bodies. The count column could span to 1000+ so I would prefer to calculate it in SQL rather than in code.
The current query returns the error Column 'accounts.username' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. which has not led me anywhere and this is different to other questions as I do not want to use the 'group by' clause. Is there a way to do this with FOR JSON AUTO?
The desired output could be:
+-------+----------+
| count | username |
+-------+----------+
| 1551 | simon1 |
| 1551 | simon2 |
| 1551 | simon3 |
+-------+----------+
or
+----------------------------------------------------------------+
| JSON_F52E2B61-18A1-11d1-B105-00805F49916B |
+----------------------------------------------------------------+
| [{"count": 1551, "usernames": ["simon1", "simon2", "simon3"]}] |
+----------------------------------------------------------------+
If you want to display the total count of rows that satisfy the filter conditions (and where username is not null) in an additional column in your resultset, then you could use window functions:
SELECT TOP 3
a.username,
COUNT(a.username) OVER() AS cnt
FROM relationships r
JOIN accounts a ON r.account = a.id
WHERE
r.following = 4
AND EXISTS (
SELECT 1 FROM relationships r1 WHERE r1.account = 8 AND r1.following = r.account
)
;
Side notes:
if username is not nullable, use COUNT(*) rather than COUNT(a.username): this is more efficient since it does not require the database to check every value for nullity
table aliases make the query easier to write, read and maintain
I usually prefer EXISTS over IN (but here this is mostly a matter of taste, as both techniques should work fine for your use case)
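The behaviour of the window count can be sketched with Python's sqlite3 module. This is an illustration under assumptions, not SQL Server code: SQLite uses LIMIT 3 where SQL Server uses TOP 3, window functions need SQLite 3.25 or later, and the sample accounts and relationships are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE relationships (account INTEGER, following INTEGER);
    INSERT INTO accounts VALUES
        (1, 'user1'), (2, 'user2'), (3, 'user3'), (4, 'user4'),
        (5, 'user5'), (6, 'user6'), (7, 'user7'), (8, 'user8');
    -- Accounts 1, 2, 3, 5 and 6 follow account 4.
    INSERT INTO relationships VALUES (1, 4), (2, 4), (3, 4), (5, 4), (6, 4);
    -- Account 8 follows accounts 1, 2, 3, 5 and 7.
    INSERT INTO relationships VALUES (8, 1), (8, 2), (8, 3), (8, 5), (8, 7);
""")

# COUNT(*) OVER () is evaluated over all rows that pass the WHERE clause,
# before LIMIT trims the output to three rows.
rows = conn.execute("""
    SELECT a.username, COUNT(*) OVER () AS cnt
    FROM relationships r
    JOIN accounts a ON r.account = a.id
    WHERE r.following = 4
      AND EXISTS (
          SELECT 1 FROM relationships r1
          WHERE r1.account = 8 AND r1.following = r.account
      )
    LIMIT 3
""").fetchall()

print(rows)
```

Four accounts satisfy the filter (1, 2, 3 and 5), so each of the three returned rows carries a cnt of 4.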

SQL multipart messages in two tables. Doesn't work if second table is empty

I am working on a query that will fetch multipart messages from 2 tables. However, it only works IF there are multiple parts. If there is only a one-part message then the join condition won't be true anymore. How could I make it work for both single and multipart messages?
Right now it fails if there is an entry in outbox and nothing in outbox_multipart.
My first table is "outbox" that looks like this.
TextDecoded | ID | CreatorID
Helllo, m.. | 123 | Martin
Yes, I wi.. | 124 | Martin
My second table is "outbox_multipart" that looks very similar.
TextDecoded | ID | SequencePosition
my name i.. | 123 | 2
s Martin. | 123 | 3
ll do tha.. | 124 | 2
t tomorrow. | 124 | 3
My query so far
SELECT
CONCAT(ob.TextDecoded,
GROUP_CONCAT(obm.TextDecoded
ORDER BY obm.SequencePosition ASC
SEPARATOR ''
)
) AS TextDecoded,
ob.ID,
ob.creatorID
FROM outbox AS ob
JOIN outbox_multipart AS obm ON obm.ID = ob.ID
GROUP BY
ob.ID,
ob.creatorID
Use a LEFT JOIN instead of the (implicit) inner join. Then wrap the GROUP_CONCAT in COALESCE so that an empty string (and not NULL) is appended when a message has no extra parts; in MySQL, CONCAT returns NULL if any of its arguments is NULL.
SELECT
CONCAT(ob.TextDecoded,
COALESCE(GROUP_CONCAT(obm.TextDecoded
ORDER BY obm.SequencePosition
SEPARATOR ''), '')) AS TextDecoded,
ob.ID,
ob.creatorID
FROM outbox AS ob
LEFT JOIN outbox_multipart AS obm
ON obm.ID = ob.ID
GROUP BY
ob.ID,
ob.creatorID,
ob.TextDecoded;
Note: Strictly speaking, outbox.TextDecoded should also appear in the GROUP BY clause, since it is not an aggregate. I have made this change in the query.
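The difference between the two joins can be sketched with Python's sqlite3 module. This is an approximation of the MySQL query, with assumptions: SQLite concatenates with || instead of CONCAT, and the ORDER BY inside GROUP_CONCAT is left out because older SQLite versions do not accept it (the multipart message below has only one extra part, so ordering does not matter). The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE outbox (TextDecoded TEXT, ID INTEGER, CreatorID TEXT);
    CREATE TABLE outbox_multipart (TextDecoded TEXT, ID INTEGER, SequencePosition INTEGER);
    -- Message 123 has a second part; message 125 is a single-part message.
    INSERT INTO outbox VALUES ('Hello, ', 123, 'Martin'), ('Single part.', 125, 'Martin');
    INSERT INTO outbox_multipart VALUES ('world.', 123, 2);
""")

# Same query text for both runs; only the join type changes.
query = """
    SELECT ob.TextDecoded || COALESCE(GROUP_CONCAT(obm.TextDecoded, ''), '') AS TextDecoded,
           ob.ID,
           ob.CreatorID
    FROM outbox AS ob
    {join} outbox_multipart AS obm ON obm.ID = ob.ID
    GROUP BY ob.ID, ob.CreatorID, ob.TextDecoded
    ORDER BY ob.ID
"""

inner_rows = conn.execute(query.format(join="JOIN")).fetchall()
left_rows = conn.execute(query.format(join="LEFT JOIN")).fetchall()

print(inner_rows)  # [('Hello, world.', 123, 'Martin')]
print(left_rows)   # [('Hello, world.', 123, 'Martin'), ('Single part.', 125, 'Martin')]
```

The inner join drops message 125 entirely, while the LEFT JOIN keeps it and COALESCE prevents its text from collapsing to NULL.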

Left Outer Joining Tables with Multiple Property Keys

Long time lurker, first time poster.
I have two tables 'case' and 'case_char'.
case
case_id | status | date
1 | closed | 01/01/2014
2 | open | 02/01/2014
case_char
case_id | property_key | value
1 | email | xx#xx.com
1 | phone | 1234567
2 | email | x2#xx.com
2 | phone | 987654
2 | issue | Unhappy
Say I want to return the 'issue' for each case. Not all cases have issues so I will need to do a left outer join. Unfortunately it is not working for me, it is returning only cases with the 'issue' characteristic. I need it to return all cases regardless of whether the 'issue' characteristic exists for a case in the case_char table.
Below is an example of the way I have written the code ( bearing in mind I am using an Oracle DB).
Could any of you whizzes help a brother out?
SELECT c.case_id, char.value
FROM case c, case_char char
WHERE c.case_id = char.case_id (+)
AND char.property_key = 'issue'
Just add a Join(+) to your property key as below:
SELECT C.CASE_ID, CHAR.VALUE
FROM CASE C, CASE_CHAR CHAR
WHERE
C.CASE_ID = CHAR.CASE_ID (+)
AND
CHAR.PROPERTY_KEY(+) = 'issue';
^
|
You should use an explicit join, and put the property_key in the ON clause.
SELECT c.case_id, char.value
FROM case c
LEFT JOIN case_char char ON c.case_id = char.case_id AND char.property_key = 'issue'
I'm not very familiar with the syntax for implicit outer joins. My guess is you need to put (+) after char.property_key in the 'issue' comparison to keep it from filtering out the null rows.
I assume you want 1 row per case regardless of whether or not it has an issue, but all the issues for each case?
If that's what you want, something close to this should work (I'm a SQL Server guy, so I'm not totally sure that this will work with Oracle).
SELECT
c.case_id
,char.value
FROM case c
LEFT JOIN case_char char
ON
c.case_id = char.case_id
AND char.property_key = 'issue'
Basically, we've moved the filter logic to the join condition, otherwise the WHERE clause will filter out anything that's not an 'issue'.
Does that answer your question?
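The effect of moving the filter from WHERE to ON can be sketched with Python's sqlite3 module, using the sample rows from the question. This is not Oracle code: the case table is renamed to cases and the alias char to ch, both assumptions made to avoid keyword clashes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cases (case_id INTEGER, status TEXT);
    CREATE TABLE case_char (case_id INTEGER, property_key TEXT, value TEXT);
    INSERT INTO cases VALUES (1, 'closed'), (2, 'open');
    INSERT INTO case_char VALUES
        (1, 'email', 'xx#xx.com'), (1, 'phone', '1234567'),
        (2, 'email', 'x2#xx.com'), (2, 'phone', '987654'),
        (2, 'issue', 'Unhappy');
""")

# Filter in WHERE: rows without an 'issue' are removed after the join,
# which turns the outer join back into an inner join.
where_rows = conn.execute("""
    SELECT c.case_id, ch.value
    FROM cases c
    LEFT JOIN case_char ch ON c.case_id = ch.case_id
    WHERE ch.property_key = 'issue'
    ORDER BY c.case_id
""").fetchall()

# Filter in ON: only 'issue' rows are joined, and unmatched cases are kept.
on_rows = conn.execute("""
    SELECT c.case_id, ch.value
    FROM cases c
    LEFT JOIN case_char ch
        ON c.case_id = ch.case_id AND ch.property_key = 'issue'
    ORDER BY c.case_id
""").fetchall()

print(where_rows)  # [(2, 'Unhappy')]
print(on_rows)     # [(1, None), (2, 'Unhappy')]
```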

How can I optimize this query...?

I have two tables, one for routes and one for airports.
Routes contains just over 9000 rows and I have indexed every column.
Airports only 2000 rows and I have also indexed every column.
When I run this query it can take up to 35 seconds to return 300 rows:
SELECT routes_build.*, a1.name as origin_name, a2.name as destination_name FROM routes_build
LEFT JOIN airports a1 ON a1.IATA = routes_build.origin
LEFT JOIN airports a2 ON a2.IATA = routes_build.destination
WHERE routes_build.carrier = "Carrier Name"
Running it with "DESCRIBE" I get the following info, but I'm not 100% sure on what it's telling me.
id | Select Type | Table | Type | possible_keys | Key | Key_len | ref | rows | Extra
--------------------------------------------------------------------------------------------------------------------------------------
1 | SIMPLE | routes_build | ref | carrier,carrier_2 | carrier | 678 | const | 26 | Using where
--------------------------------------------------------------------------------------------------------------------------------------
1 | SIMPLE | a1 | ALL | NULL | NULL | NULL | NULL | 5389 |
--------------------------------------------------------------------------------------------------------------------------------------
1 | SIMPLE | a2 | ALL | NULL | NULL | NULL | NULL | 5389 |
--------------------------------------------------------------------------------------------------------------------------------------
The only alternative I can think of is to run two separate queries and join them up with PHP although, I can't believe something like this being something that could kill a mysql server. So as usual, I suspect I'm doing something stupid. SQL is my number 1 weakness.
Personally, I would start by removing the left joins and replacing them with inner joins as each route must have a start and end point.
It's telling you that it's not using an index for joining on the airports table. See how the "rows" column is so huge, 5000-odd? That's how many rows it's having to read to answer your query.
I don't know why, as you have claimed you have indexed every column. What is IATA? Is it Unique? I believe if mysql decides the index is inefficient it may ignore it.
EDIT: if IATA is a unique string, maybe try indexing half of it only? (You can select how many characters to index) That may give mysql an index it can use.
SELECT routes_build.*, a1.name as origin_name, a2.name as destination_name
FROM routes_build
LEFT JOIN
airports a1
ON a1.IATA = routes_build.origin
LEFT JOIN
airports a2
ON a2.IATA = routes_build.destination
WHERE routes_build.carrier = "Carrier Name"
From your EXPLAIN PLAN I can see that you don't have an index on airports.IATA.
You should create it for the query to work fast.
Name also suggests that it should be a UNIQUE index, since IATA codes are unique.
Update:
Please post your table definition. Issue this query to show it:
SHOW CREATE TABLE airports
Also I should note that your FULLTEXT index on IATA is useless unless you have set ft_min_word_len in the MySQL configuration to 3 or less.
By default, it's 4.
IATA codes are 3 characters long, and MySQL doesn't search for such short words using FULLTEXT with default settings.
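The suggested fix, a regular UNIQUE index on airports.IATA, can be sketched as follows. This uses Python's sqlite3 module as a stand-in for MySQL, so the plan output format differs from MySQL's EXPLAIN. The index name idx_airports_iata and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE airports (IATA TEXT, name TEXT);
    CREATE TABLE routes_build (carrier TEXT, origin TEXT, destination TEXT);
    INSERT INTO airports VALUES ('LHR', 'London Heathrow'), ('JFK', 'New York JFK');
    INSERT INTO routes_build VALUES ('Carrier Name', 'LHR', 'JFK');
    -- An ordinary (B-tree) unique index, not a FULLTEXT one.
    CREATE UNIQUE INDEX idx_airports_iata ON airports (IATA);
""")

query = """
    SELECT routes_build.*, a1.name AS origin_name, a2.name AS destination_name
    FROM routes_build
    LEFT JOIN airports a1 ON a1.IATA = routes_build.origin
    LEFT JOIN airports a2 ON a2.IATA = routes_build.destination
    WHERE routes_build.carrier = 'Carrier Name'
"""

# The last column of each EXPLAIN QUERY PLAN row is a human-readable step.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
rows = conn.execute(query).fetchall()

for step in plan:
    print(step)
print(rows)
```

With the index in place, the plan steps for a1 and a2 should mention idx_airports_iata rather than a full scan of airports. In MySQL the equivalent check is to rerun EXPLAIN and confirm that the key column for a1 and a2 is no longer NULL.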
After you implement Martin Robins's excellent advice (i.e. remove every instance of the word LEFT from your query), try giving routes_build a compound index on carrier, origin, and destination.
It really depends on what information you're trying to get to. You probably don't need to join airports twice and you probably don't need to use left joins. Also, if you can search on a numeric field rather than a text field, that would speed things up as well.
So what are you trying to fetch?