Left Join is filtering rows out of my query in MySQL 5.7 without any left join columns in the where clause - sql

I have a query that joins 4 tables. It returns 35 rows every time I run it. Here it is..
SELECT Lender.id AS LenderId,
Loans.Loan_ID AS LoanId,
Parcels.Parcel_ID AS ParcelId,
tr.Tax_ID AS TaxRecordId,
tr.Tax_Year AS TaxYear
FROM parcels
INNER JOIN Loans ON (Parcels.Loan_ID = Loans.Loan_ID AND Parcels.Escrow = 1)
INNER JOIN Lender ON (Lender.id = Loans.Bank_ID)
INNER JOIN Tax_Record tr ON (tr.Parcel_ID = Parcels.Parcel_ID AND tr.Tax_Year = :taxYear)
WHERE Loans.Active = 1
AND Loans.Date_Submitted IS NOT NULL
AND Parcels.Municipality = :municipality
AND Parcels.County = :county
AND Parcels.State LIKE :stateCode
If I left join a table (using a subquery in the on clause of the join), MySQL does some very unexpected things. Here's the modified query with the left join...
SELECT Lender.id AS LenderId,
Loans.Loan_ID AS LoanId,
Parcels.Parcel_ID AS ParcelId,
tr.Tax_ID AS TaxRecordId,
tr.Tax_Year AS TaxYear
FROM parcels
INNER JOIN Loans ON (Parcels.Loan_ID = Loans.Loan_ID AND Parcels.Escrow = 1)
INNER JOIN Lender ON (Lender.id = Loans.Bank_ID)
INNER JOIN Tax_Record tr ON (tr.Parcel_ID = Parcels.Parcel_ID AND tr.Tax_Year = :taxYear)
LEFT OUTER JOIN taxrecordpayment trp ON trp.taxRecordId = tr.Tax_ID AND trp.paymentId = (
SELECT p.id
FROM taxrecordpayment trpi
JOIN payments p ON p.id = trpi.paymentId
WHERE trpi.taxRecordId = tr.Tax_ID AND p.isFullYear = 0
ORDER BY p.dueDate, p.paymentSendTo
LIMIT 1
)
WHERE Loans.Active = 1
AND Loans.Date_Submitted IS NOT NULL
AND Parcels.Municipality = :municipality
AND Parcels.County = :county
AND Parcels.State LIKE :stateCode
I would like to note that the left join table does not appear in the where clause of the query at all, and I am not using the left join table in the select clause. In real life, I actually use the left join records in the select clause, but in my effort to get to the essential elements causing this problem, I have simplified the query and removed everything but the essential parts that cause trouble.
Here's what is happening...
Where I used to get 35 records, now I get a random number of records approaching 35. Sometimes, I get 33. Other times, I get 27, or 29, or 31, and so on. I would never expect a left join like this to filter out any records from my result set. A left join should only add additional columns to the result set, particularly when - as is the case here - the left join table is not part of the where clause.
I have determined that the problem really only happens if the subquery has a non-deterministic sort. In other words, if I have two taxrecordpayment records that match the subquery and both have the same due date and the same "paymentSendTo" value, then I see the issue. If the inner subquery has a deterministic sort, the issue goes away.
I would imagine that some people will look at my simplified example and recommend that I simply remove the subquery. If my query were this simple in real life, that would be the way to go.
In reality, the entire query is more complicated, is hitting a LOT of data, and modifying it is possible, but costly. Removing the subquery is even more costly.
Has anyone seen this sort of behavior before? I would expect a non-deterministic subquery to simply produce inconsistent results and I would never expect a left join like this to actually filter records out when the left joined table is not used at all in the where clause.
Here is the query plan, as provided by EXPLAIN...
id
select_type
table
partitions
type
possible_keys
key
key_len
ref
rows
filtered
Extra
1
PRIMARY
parcels
NULL
range
PRIMARY,Loan_ID,state_county,ParcelsCounty,county_state,Location,CountyLoan
county_state
106
NULL
590
1
Using index condition; Using where
1
PRIMARY
tr
NULL
eq_ref
parcel_year,ParcelsTax_Record,Year
parcel_year
8
infoexchange.parcels.Parcel_ID,const
1
100
Using index
1
PRIMARY
Loans
NULL
eq_ref
PRIMARY,Bank_ID,Bank,DateSub,loan_number
PRIMARY
4
infoexchange.parcels.Loan_ID
1
21.14
Using where
1
PRIMARY
Lender
NULL
eq_ref
PRIMARY
PRIMARY
8
infoexchange.Loans.bank_id
1
100
Using index
1
PRIMARY
trp
NULL
eq_ref
taxRecordPayment_key,IDX_trp_pymtId_trId
taxRecordPayment_key
8
infoexchange.tr.Tax_ID,func
1
100
Using where; Using index
2
DEPENDENT SUBQUERY
trpi
NULL
ref
taxRecordPayment_key,IDX_trp_pymtId_trId
taxRecordPayment_key
4
infoexchange.tr.Tax_ID
1
100
Using index; Using temporary; Using filesort
2
DEPENDENT SUBQUERY
p
NULL
eq_ref
PRIMARY
PRIMARY
4
infoexchange.trpi.paymentId
1
10
Using where
I have attempted to recreate this with a contrived data setup and an analogous query, but with my contrived data set, I cannot get the subquery behave non-deterministically even though it suffers from the same problem as my subquery above (there are multiple records that match the subquery and the order by is not unique for those records).
This seems to require a massive data set to start misbehaving. It happens on multiple distinct instances of MySQL 5.7, while a MySQL 5.6 instance does not demonstrate the problem at all. I am hoping someone can spot something in the above query plan to help me understand why the subquery is non-deterministic and - more importantly - why that causes records to get dropped from the result set.
I feel like this is either a data set issue (perhaps we need to do a table optimize or do some maintenance on our tables), or a bug in MySQL.

I have submitted a bug for this behavior.
https://bugs.mysql.com/bug.php?id=104824
You can recreate this behavior as follows...
CREATE TABLE tableA (
id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(10)
);
CREATE TABLE tableB (
id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
tableAId INTEGER NOT NULL,
name VARCHAR(10),
CONSTRAINT tableBFKtableAId FOREIGN KEY (tableAId) REFERENCES tableA (id)
);
INSERT INTO tableA (name)
VALUES ('he'),
('she'),
('it'),
('they');
INSERT INTO tableB (tableAId, name)
VALUES (1, 'hat'),
(2, 'shoes'),
(4, 'roof');
Run this query multiple times and the number of rows returned will vary:
SELECT COALESCE(b.id, -1) AS tableBId,
a.id AS tableAId
FROM tableA a
LEFT JOIN tableB b ON (b.tableAId = a.id AND 0.5 > RAND());

Related

Is it optimal to use multiple joins in update query?

My update query checks whether the column “Houses” is null in any of the rows in my source table by joining the id between the target & source table (check query one). The column Houses being null in this case indicates that the row has expired; thus, I need to expire the row id in my target table and set the expired date. The query works fine, but I was wondering if it can be improved; I'm new to SQL, so I don't know if using two joins is the best way to accomplish the result I want. My update query will later be used against millions of rows. No columns has been indexed yet.
Query:
(Query one)
Update t
set valid_date = GETDATE()
From Target T
JOIN SOURCE S ON S.ID = T.ID
LEFT JOIN SOURCE S2 ON S2.Houses = t.Houses
WHERE S2.Houses is null
Target:
ID
namn
middlename
Houses
date
1
demo
hello
2
null
2
demo2
test
4
null
3
demo3
test1
5
null
Source:
ID
namn
middlename
Houses
1
demo
hello
null
3
demo
world
null
Expected output after running update query :
ID
namn
middlename
Houses
date
1
demo
hello
2
2022-12-06
2
demo2
test
4
null
3
demo3
test1
5
2022-12-06
I would recommend exists:
update t
set valid_date = getdate()
from target t
where exists (select 1 from source s where s.id = t.id and s.houses is null)
Note that your original query does not exactly do what you want. It cannot distinguish source rows that do not exist from source rows that exist and whose houses column is null. In your example, it would update row 2, which is not what you seem to want. You would need an INNER JOIN instead of the LEFT JOIN.
With EXISTS, you want an index on source(id, houses) so the subquery can execute efficiently against target rows. This index is probably worthwhile for the the JOIN as well.
I don't see why you'd need to join on the column houses at all.
Find all rows in source that have value NULL in the column houses.
Then update all rows in target that have the IDs of the source rows.
I prefer to write these kind of complex updates using CTEs. It looks more readable to me.
WITH
CTE
AS
(
SELECT
Target.ID
,Target.Date
FROM
Source
INNER JOIN Target ON Target.ID = Source.ID
WHERE Source.Houses IS NULL
)
UPDATE CTE
SET Date = GETDATE();
To efficiently find rows in source that have value NULL in the column houses you should create an index, something like this:
CREATE INDEX IX_Houses ON Source
(
Houses
);
I assume that ID is a primary key with a clustered unique index, so ID would be included in the IX_Houses index implicitly.

Select rows from table where a certain value in a joined table does not exist

I have two tables, playgrounds and maintenance, which are linked with a foreign key. Whenever there is a maintenance on a playground, it will be saved in the table and connected to the respective playground.
Table A (playgrounds):
playground_number
Table B (maintenance):
playground_number (foreign key),
maintenance_type (3 different types),
date
What I now want is to retrieve all the playgrounds on which a certain type of maintenance has NOT been performed yet IN a certain year. For instance all playgrounds that do not have a maintenance_type = 1 in the year 2022 connected yet, although there could be multiple other maintenance_types because they are more frequent.
This is what I have tried (pseudo):
SELECT DISTINCT A.playground_number
FROM table A
JOIN table B ON A.playground_number = B.playground_number (FK)
WHERE NOT EXISTS (SELECT B.maintenance_type FROM table B
WHERE B.maintenance_type = 1 AND year(B.date) = 2022
However this will return nothing as soon as there is only one entry with maintenance_type 1 within the table.
I am struggling with this query for a while, so would appreciate some thoughts :) Many thanks.
You need to correlate the exists subquery to the outer B table. Also, you don't even need the join.
SELECT DISTINCT a.playground_number
FROM table_a a
WHERE NOT EXISTS (
SELECT 1
FROM table_b b
WHERE b.playground_number = a.playground_number AND
b.maintenance_type = 1 AND
YEAR(b.date) = 2022
);
Please consider this. I don't think you need JOIN.
SELECT DISTINCT A.playground_number
FROM table A
WHERE A.playground_number NOT IN (SELECT B.playground_number FROM table B
WHERE B.maintenance_type = 1 AND year(B.date) = 2022)
Please let me know if I understand it incorrectly.

SQL select from two tabels wrong result

I can not understand what I am doing wrong.
My array:
ps_product
id_product
active
1
1
2
1
and
ps_product_sync
id_product
status
1
0
2
1
and my SQL code
SELECT pr_product.id_product, pr_product.active
FROM pr_product, pr_product_sync
WHERE pr_product.active = pr_product_sync.status
I get a result like this:
id_product
status
2
1
2
1
2
1
...
..
24 rows
I try the same with inner but result is the same, I don't have duplicates in the arrays... I don't understand why I get one row 24 times
PS. all tables looks good before posting/saving
If you query two tables and include both in the FROM clause, you create a Cartesian product of these tables. In other words, if one table has 4 rows and the other 6 rows, the result is 24 rows.
It is better to create an INNER JOIN using the key of the first table and the foreign key of the second table.
Change your query accordingly
SELECT pr_product.id_product, pr_product.active
FROM pr_product
INNER JOIN pr_product_sync
ON pr_product.id_product = ps_product_sync.id_product
WHERE pr_product.active = pr_product_sync.status
Of course, you could also compare the Keys in the WHERE clause or eliminate duplicates using DISTINCT. IMHO the most understandable solution is an INNER JOIN.
I hope this solves your problem.
Missing the primary key join.
Add:
WHERE
pr_product.id_product=pr_product_sync.id_product
AND pr_product.active=pr_product_sync.status

Merge and order rows

I have a table in the following structure. I am writing a query to get all item_ids where key_name='topic' and key_string_value='investing', which is the simple part.
select item_id from table where key_name='topic' and key_string_value='investing'
But then for all the item_ids returned above, I want to order them by the values set for each item_id in key_name='importance' and key_name='product'.The table structure is making it very difficult as I am not an SQL expert. Any help would be appreciated.
item_id key_name key_string_value Key_float_value
1 topic investing null
1 importance null 500
1 product A null
1 product B null
2 topic Starting null
2 product B null
2 importance null 300
2 topic retail null
3 importance null 400
3 topic investing null
3 product C null
4 topic Starting null
4 topic investing null
4 importance null 400
4 product D null
#Schwern is on right - your structure should be normalized, and the names should be better too. All this makes me think: homework.
The answer to the homework question is a self join, and looks like this:
select t1.item_id , imp.key_float_value, prd.key_string_value
from [table] t1
LEFT OUTER JOIN [table] imp on imp.item_id = t1.item_id and imp.key_name='importance'
LEFT OUTER JOIN [table] prd on prd.item_id = t1.item_id and prd.key_name='product'
where t1.key_name='topic' and t1.key_string_value='investing'
ORDER BY imp.key_float_value, prd.key_string_value
The square brackets on `[table] are because the use of the table keyword as the table name requires the name to be delimited. Square brackets for TSQL. Others use double quotes (")
You have a very poorly design table that will be slow and difficult to work with. SQL is not a key/value store; it works on rows, columns and relationships. Rather than fight it, I would suggest redesigning it. Either use a NoSQL database which is easier to use and works more like normal programming data structures, or redesign it.
Here's the redesign I would suggest.
CREATE TABLE item (
id INTEGER PRIMARY KEY,
importance INTEGER DEFAULT 0
);
CREATE TABLE item_topics (
item_id INTEGER REFERENCES item(id),
topic TEXT NOT NULL
);
CREATE TABLE item_products (
item_id INTEGER REFERENCES item(id),
product TEXT NOT NULL
);
The item itself, and any scalar (ie. single value) attributes go into one table. Anything which can be a list (products and topics) needs its own table relating each item to its elements. If this seems clunky, that's because it is, but that's how SQL works.
To find all items whose topic is investing, you have to join on the item_topics table.
SELECT item.id
FROM item
JOIN item_topics ON item.id = item_topics.id
WHERE topic = 'investing'
Then to order them, add ORDER BY item.importance.

MySQL LEFT JOIN SELECT not selecting all the left side records?

I'm getting odd results from a MySQL SELECT query involving a LEFT JOIN, and I can't understand whether my understanding of LEFT JOIN is wrong or whether I'm seeing a genuinely odd behavior.
I have a two tables with a many-to-one relationship: For every record in table 1 there are 0 or more records in table 2. I want to select all the records in table 1 with a column that counts the number of related records in table 2. As I understand it, LEFT JOIN should always return all records on the LEFT side of the statement.
Here's a test database that exhibits the problem:
CREATE DATABASE Test;
USE Test;
CREATE TABLE Dates (
dateID INT UNSIGNED NOT NULL AUTO_INCREMENT,
date DATE NOT NULL,
UNIQUE KEY (dateID)
) TYPE=MyISAM;
CREATE TABLE Slots (
slotID INT UNSIGNED NOT NULL AUTO_INCREMENT,
dateID INT UNSIGNED NOT NULL,
UNIQUE KEY (slotID)
) TYPE=MyISAM;
INSERT INTO Dates (date) VALUES ('2008-10-12'),('2008-10-13'),('2008-10-14');
INSERT INTO Slots (dateID) VALUES (3);
The Dates table has three records, and the Slots 1 - and that record points to the third record in Dates.
If I do the following query..
SELECT d.date, count(s.slotID) FROM Dates AS d LEFT JOIN Slots AS s ON s.dateID=d.dateID GROUP BY s.dateID;
..I expect to see a table with 3 rows in - two with a count of 0, and one with a count of 1. But what I actually see is this:
+------------+-----------------+
| date | count(s.slotID) |
+------------+-----------------+
| 2008-10-12 | 0 |
| 2008-10-14 | 1 |
+------------+-----------------+
The first record with a zero count appears, but the later record with a zero count is ignored.
Am I doing something wrong, or do I just not understand what LEFT JOIN is supposed to do?
You need to GROUP BY d.dateID. In two of your cases, s.DateID is NULL (LEFT JOIN) and these are combined together.
I think you will also find that this is invalid (ANSI) SQL, because d.date is not part of a GROUP BY or the result of an aggregate operation, and should not be able to be SELECTed.
I think you mean to group by d.dateId.
Try removing the GROUP BY s.dateID
The dateid for 10-12 and 10-13 are groupd together by you. Since they are 2 null values the count is evaluated to 0
I don't know if this is valid in MySQL but you could probably void this mistake in the future by using the following syntax instead
SELECT date, count(slotID) as slotCount
FROM Dates LEFT OUTER JOIN Slots USING (dateID)
GROUP BY (date)
By using the USING clause you don't get two dateID's to keep track of.
replace GROUP BY s.dateID with d.dateID.