MySQL LEFT JOIN SELECT not selecting all the left side records? - sql

I'm getting odd results from a MySQL SELECT query involving a LEFT JOIN, and I can't understand whether my understanding of LEFT JOIN is wrong or whether I'm seeing a genuinely odd behavior.
I have a two tables with a many-to-one relationship: For every record in table 1 there are 0 or more records in table 2. I want to select all the records in table 1 with a column that counts the number of related records in table 2. As I understand it, LEFT JOIN should always return all records on the LEFT side of the statement.
Here's a test database that exhibits the problem:
CREATE DATABASE Test;
USE Test;
CREATE TABLE Dates (
dateID INT UNSIGNED NOT NULL AUTO_INCREMENT,
date DATE NOT NULL,
UNIQUE KEY (dateID)
) TYPE=MyISAM;
CREATE TABLE Slots (
slotID INT UNSIGNED NOT NULL AUTO_INCREMENT,
dateID INT UNSIGNED NOT NULL,
UNIQUE KEY (slotID)
) TYPE=MyISAM;
INSERT INTO Dates (date) VALUES ('2008-10-12'),('2008-10-13'),('2008-10-14');
INSERT INTO Slots (dateID) VALUES (3);
The Dates table has three records, and the Slots 1 - and that record points to the third record in Dates.
If I do the following query..
SELECT d.date, count(s.slotID) FROM Dates AS d LEFT JOIN Slots AS s ON s.dateID=d.dateID GROUP BY s.dateID;
..I expect to see a table with 3 rows in - two with a count of 0, and one with a count of 1. But what I actually see is this:
+------------+-----------------+
| date | count(s.slotID) |
+------------+-----------------+
| 2008-10-12 | 0 |
| 2008-10-14 | 1 |
+------------+-----------------+
The first record with a zero count appears, but the later record with a zero count is ignored.
Am I doing something wrong, or do I just not understand what LEFT JOIN is supposed to do?

You need to GROUP BY d.dateID. In two of your cases, s.DateID is NULL (LEFT JOIN) and these are combined together.
I think you will also find that this is invalid (ANSI) SQL, because d.date is not part of a GROUP BY or the result of an aggregate operation, and should not be able to be SELECTed.

I think you mean to group by d.dateId.

Try removing the GROUP BY s.dateID

The dateid for 10-12 and 10-13 are groupd together by you. Since they are 2 null values the count is evaluated to 0

I don't know if this is valid in MySQL but you could probably void this mistake in the future by using the following syntax instead
SELECT date, count(slotID) as slotCount
FROM Dates LEFT OUTER JOIN Slots USING (dateID)
GROUP BY (date)
By using the USING clause you don't get two dateID's to keep track of.

replace GROUP BY s.dateID with d.dateID.

Related

Left join with multiple matches on right table with conditions

I am trying to do a SQL Server query where in my left table (database 1 - Db1), I have some Data that need to be checked some times.
I can`t write in this table (Db1, read only) so I have created other database Db2 that when I check the information related to some record in Db1, I save the date of this check at Db2.
Then, I have a list on the left table that shows me all the records that have to be check. When I create the check record in db2, for some time, 3-4 month, the list does not have to show me this record again.
So the query must handle that, if there is not record in db2, if is null, the record must be showed in the list, then if I create one check, for a time (3-4 month) does not have to show this record, when the 3-4 months have passed, the record have to be again in the list to remember to do a new check, then if I create a new check in Db2 with new date, the record at left has to hide again for 3-4 month more and so on.
I have tried this, that works if there is only one record at Db2, but as soon as I create another check, the old record, the first one, does not allow to hide the left record on the list.
I hope some one can give me a clue.
Thank you in advance
SELECT uno.fecha_ini, uno.fecha_res, uno.fecha_fin, uno.n_contrato, uno.n_propied, tres.n_contrato
FROM [GI].[dbo].[alq03] uno
LEFT JOIN [GI_impuestos].[dbo].[checkar] tres ON uno.n_contrato = tres.n_contrato
WHERE tres.n_contrato IS NULL
/*checks*/
AND (tres.fecha_checkar IS NULL)
OR (tres.fecha_checkar NOT BETWEEN (GETDATE() - 120) AND (GETDATE()))
GROUP BY uno.fecha_ini, uno.fecha_res, uno.fecha_fin, uno.n_contrato, uno.n_propied, tres.n_contrato
ORDER BY uno.fecha_fin asc
Here is a simple example of what you are asking for. You will have to modify it to suit your needs.
create table readOnly(
id int primary key,
info varchar(100) not null);
insert into readOnly values
(1,'info to check'),
(2,'more info'),
(3,'not finished yet!');
create table chk (
rid int primary key,
checked date);
insert into chk values
(1,'2021-07-01'),
(2,'2021-12-01')
select
id,
coalesce(cast(checked as char),'never') last_check,
info info_to_check
from
readOnly
left join
chk on id = rid
where
datediff(m,checked,getdate())>4
or checked is null
id | last_check | info_to_check
-: | :----------- | :----------------
1 | 2021-07-01 | info to check
3 | never | not finished yet!
db<>fiddle here

Left Join is filtering rows out of my query in MySQL 5.7 without any left join columns in the where clause

I have a query that joins 4 tables. It returns 35 rows every time I run it. Here it is..
SELECT Lender.id AS LenderId,
Loans.Loan_ID AS LoanId,
Parcels.Parcel_ID AS ParcelId,
tr.Tax_ID AS TaxRecordId,
tr.Tax_Year AS TaxYear
FROM parcels
INNER JOIN Loans ON (Parcels.Loan_ID = Loans.Loan_ID AND Parcels.Escrow = 1)
INNER JOIN Lender ON (Lender.id = Loans.Bank_ID)
INNER JOIN Tax_Record tr ON (tr.Parcel_ID = Parcels.Parcel_ID AND tr.Tax_Year = :taxYear)
WHERE Loans.Active = 1
AND Loans.Date_Submitted IS NOT NULL
AND Parcels.Municipality = :municipality
AND Parcels.County = :county
AND Parcels.State LIKE :stateCode
If I left join a table (using a subquery in the on clause of the join), MySQL does some very unexpected things. Here's the modified query with the left join...
SELECT Lender.id AS LenderId,
Loans.Loan_ID AS LoanId,
Parcels.Parcel_ID AS ParcelId,
tr.Tax_ID AS TaxRecordId,
tr.Tax_Year AS TaxYear
FROM parcels
INNER JOIN Loans ON (Parcels.Loan_ID = Loans.Loan_ID AND Parcels.Escrow = 1)
INNER JOIN Lender ON (Lender.id = Loans.Bank_ID)
INNER JOIN Tax_Record tr ON (tr.Parcel_ID = Parcels.Parcel_ID AND tr.Tax_Year = :taxYear)
LEFT OUTER JOIN taxrecordpayment trp ON trp.taxRecordId = tr.Tax_ID AND trp.paymentId = (
SELECT p.id
FROM taxrecordpayment trpi
JOIN payments p ON p.id = trpi.paymentId
WHERE trpi.taxRecordId = tr.Tax_ID AND p.isFullYear = 0
ORDER BY p.dueDate, p.paymentSendTo
LIMIT 1
)
WHERE Loans.Active = 1
AND Loans.Date_Submitted IS NOT NULL
AND Parcels.Municipality = :municipality
AND Parcels.County = :county
AND Parcels.State LIKE :stateCode
I would like to note that the left join table does not appear in the where clause of the query at all, and I am not using the left join table in the select clause. In real life, I actually use the left join records in the select clause, but in my effort to get to the essential elements causing this problem, I have simplified the query and removed everything but the essential parts that cause trouble.
Here's what is happening...
Where I used to get 35 records, now I get a random number of records approaching 35. Sometimes, I get 33. Other times, I get 27, or 29, or 31, and so on. I would never expect a left join like this to filter out any records from my result set. A left join should only add additional columns to the result set, particularly when - as is the case here - the left join table is not part of the where clause.
I have determined that the problem really only happens if the subquery has a non-deterministic sort. In other words, if I have two taxrecordpayment records that match the subquery and both have the same due date and the same "paymentSendTo" value, then I see the issue. If the inner subquery has a deterministic sort, the issue goes away.
I would imagine that some people will look at my simplified example and recommend that I simply remove the subquery. If my query were this simple in real life, that would be the way to go.
In reality, the entire query is more complicated, is hitting a LOT of data, and modifying it is possible, but costly. Removing the subquery is even more costly.
Has anyone seen this sort of behavior before? I would expect a non-deterministic subquery to simply produce inconsistent results and I would never expect a left join like this to actually filter records out when the left joined table is not used at all in the where clause.
Here is the query plan, as provided by EXPLAIN...
id
select_type
table
partitions
type
possible_keys
key
key_len
ref
rows
filtered
Extra
1
PRIMARY
parcels
NULL
range
PRIMARY,Loan_ID,state_county,ParcelsCounty,county_state,Location,CountyLoan
county_state
106
NULL
590
1
Using index condition; Using where
1
PRIMARY
tr
NULL
eq_ref
parcel_year,ParcelsTax_Record,Year
parcel_year
8
infoexchange.parcels.Parcel_ID,const
1
100
Using index
1
PRIMARY
Loans
NULL
eq_ref
PRIMARY,Bank_ID,Bank,DateSub,loan_number
PRIMARY
4
infoexchange.parcels.Loan_ID
1
21.14
Using where
1
PRIMARY
Lender
NULL
eq_ref
PRIMARY
PRIMARY
8
infoexchange.Loans.bank_id
1
100
Using index
1
PRIMARY
trp
NULL
eq_ref
taxRecordPayment_key,IDX_trp_pymtId_trId
taxRecordPayment_key
8
infoexchange.tr.Tax_ID,func
1
100
Using where; Using index
2
DEPENDENT SUBQUERY
trpi
NULL
ref
taxRecordPayment_key,IDX_trp_pymtId_trId
taxRecordPayment_key
4
infoexchange.tr.Tax_ID
1
100
Using index; Using temporary; Using filesort
2
DEPENDENT SUBQUERY
p
NULL
eq_ref
PRIMARY
PRIMARY
4
infoexchange.trpi.paymentId
1
10
Using where
I have attempted to recreate this with a contrived data setup and an analogous query, but with my contrived data set, I cannot get the subquery behave non-deterministically even though it suffers from the same problem as my subquery above (there are multiple records that match the subquery and the order by is not unique for those records).
This seems to require a massive data set to start misbehaving. It happens on multiple distinct instances of MySQL 5.7, while a MySQL 5.6 instance does not demonstrate the problem at all. I am hoping someone can spot something in the above query plan to help me understand why the subquery is non-deterministic and - more importantly - why that causes records to get dropped from the result set.
I feel like this is either a data set issue (perhaps we need to do a table optimize or do some maintenance on our tables), or a bug in MySQL.
I have submitted a bug for this behavior.
https://bugs.mysql.com/bug.php?id=104824
You can recreate this behavior as follows...
CREATE TABLE tableA (
id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(10)
);
CREATE TABLE tableB (
id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
tableAId INTEGER NOT NULL,
name VARCHAR(10),
CONSTRAINT tableBFKtableAId FOREIGN KEY (tableAId) REFERENCES tableA (id)
);
INSERT INTO tableA (name)
VALUES ('he'),
('she'),
('it'),
('they');
INSERT INTO tableB (tableAId, name)
VALUES (1, 'hat'),
(2, 'shoes'),
(4, 'roof');
Run this query multiple times and the number of rows returned will vary:
SELECT COALESCE(b.id, -1) AS tableBId,
a.id AS tableAId
FROM tableA a
LEFT JOIN tableB b ON (b.tableAId = a.id AND 0.5 > RAND());

SQL: At least one value exists in another table

I am trying to create a table that has columns called user_id and top5_foods (binary column). I currently have two tables, one has all of the user_ids and the foods associated with those user_ids and one table that only contains the top5 foods according to a type of calculation to select the top5 foods.
The table that I am trying to create if to have the column of the user_id and if at least one of their favorite foods is in the top_5_food table, put the value of the top5_foods as 1 and if not, 0.
Something like the following:
user_id top5_foods
----------------------
34223 1
43225 0
34323 1
I have tried to use the CASE command but it just duplicated the user_ids and mark 1 or 0 whenever it finds a food that is in the top_5_foods table. But I don't want it to duplicate. Could you please help ?
Thank you very much
If I understand correctly, a left join and aggregation:
select uf.user_id,
(count(t.food_id) > 0) as top5_foods
from user_foods uf left join
top5_foods t
on uf.food_id = t.food_id
group by uf.user_id;

finding consecutive date pairs in SQL

I have a question here that looks a little like some of the ones that I found in search, but with solutions for slightly different problems and, importantly, ones that don't work in SQL 2000.
I have a very large table with a lot of redundant data that I am trying to reduce down to just the useful entries. It's a history table, and the way it works, if two entries are essentially duplicates and consecutive when sorted by date, the latter can be deleted. The data from the earlier entry will be used when historical data is requested from a date between the effective date of that entry and the next non-duplicate entry.
The data looks something like this:
id user_id effective_date important_value useless_value
1 1 1/3/2007 3 0
2 1 1/4/2007 3 1
3 1 1/6/2007 NULL 1
4 1 2/1/2007 3 0
5 2 1/5/2007 12 1
6 3 1/1/1899 7 0
With this sample set, we would consider two consecutive rows duplicates if the user_id and the important_value are the same. From this sample set, we would only delete row with id=2, preserving the information from 1-3-2007, showing that the important_value changed on 1-6-2007, and then showing the relevant change again on 2-1-2007.
My current approach is awkward and time-consuming, and I know there must be a better way. I wrote a script that uses a cursor to iterate through the user_id values (since that breaks the huge table up into manageable pieces), and creates a temp table of just the rows for that user. Then to get consecutive entries, it takes the temp table, joins it to itself on the condition that there are no other entries in the temp table with a date between the two dates. In the pseudocode below, UDF_SameOrNull is a function that returns 1 if the two values passed in are the same or if they are both NULL.
WHILE (##fetch_status <> -1)
BEGIN
SELECT * FROM History INTO #history WHERE user_id = #UserId
--return entries to delete
SELECT h2.id
INTO #delete_history_ids
FROM #history h1
JOIN #history h2 ON
h1.effective_date < h2.effective_date
AND dbo.UDF_SameOrNull(h1.important_value, h2.important_value)=1
WHERE NOT EXISTS (SELECT 1 FROM #history hx WHERE hx.effective_date > h1.effective_date and hx.effective_date < h2.effective_date)
DELETE h1
FROM History h1
JOIN #delete_history_ids dh ON
h1.id = dh.id
FETCH NEXT FROM UserCursor INTO #UserId
END
It also loops over the same set of duplicates until there are none, since taking out rows creates new consecutive pairs that are potentially dupes. I left that out for simplicity.
Unfortunately, I must use SQL Server 2000 for this task and I am pretty sure that it does not support ROW_NUMBER() for a more elegant way to find consecutive entries.
Thanks for reading. I apologize for any unnecessary backstory or errors in the pseudocode.
OK, I think I figured this one out, excellent question!
First, I made the assumption that the effective_date column will not be duplicated for a user_id. I think it can be modified to work if that is not the case - so let me know if we need to account for that.
The process basically takes the table of values and self-joins on equal user_id and important_value and prior effective_date. Then, we do 1 more self-join on user_id that effectively checks to see if the 2 joined records above are sequential by verifying that there is no effective_date record that occurs between those 2 records.
It's just a select statement for now - it should select all records that are to be deleted. So if you verify that it is returning the correct data, simply change the select * to delete tcheck.
Let me know if you have questions.
select
*
from
History tcheck
inner join History tprev
on tprev.[user_id] = tcheck.[user_id]
and tprev.important_value = tcheck.important_value
and tprev.effective_date < tcheck.effective_date
left join History checkbtwn
on tcheck.[user_id] = checkbtwn.[user_id]
and checkbtwn.effective_date < tcheck.effective_date
and checkbtwn.effective_date > tprev.effective_date
where
checkbtwn.[user_id] is null
OK guys, I did some thinking last night and I think I found the answer. I hope this helps someone else who has to match consecutive pairs in data and for some reason is also stuck in SQL Server 2000.
I was inspired by the other results that say to use ROW_NUMBER(), and I used a very similar approach, but with an identity column.
--create table with identity column
CREATE TABLE #history (
id int,
user_id int,
effective_date datetime,
important_value int,
useless_value int,
idx int IDENTITY(1,1)
)
--insert rows ordered by effective_date and now indexed in order
INSERT INTO #history
SELECT * FROM History
WHERE user_id = #user_id
ORDER BY effective_date
--get pairs where consecutive values match
SELECT *
FROM #history h1
JOIN #history h2 ON
h1.idx+1 = h2.idx
WHERE h1.important_value = h2.important_value
With this approach, I still have to iterate over the results until it returns nothing, but I can't think of any way around that and this approach is miles ahead of my last one.

Insert results of subquery into table with a constant

The outline of the tables in question are as follows:
I have a table, lets call it join, that has two columns, both foreign keys to other tables. Let's call the two columns userid and buildingid so join looks like
+--------------+
| join |
|--------------|
|userid |
|buildingid |
+--------------+
I basically need to insert a bunch of rows into this table. Each user will be assigned to multiple buildings by having multiple entries in this table. So user 13 might be assigned to buildings 1, 2, and 3 by the following
13 1
13 2
13 3
I'm trying to figure out how to do this in a query if the building numbers are constant, that is, I'm assigning a group of people to the same buildings. Basically, (this is wrong) I want to do
insert into join (userid, buildingid) values ((select userid from users), 1)
Does that make sense? I've also tried using
select 1
The error I'm running into is that the subquery returns more than one result. I also attempted to create a join, basically with a static select query that was also unsuccessful.
Any thoughts?
Thanks,
Chris
Almost! When you want to insert to values of a query, don't try to put them in the values clause. insert can take a select as an argument for the values!
insert into join (userid, buildingid)
select userid, 1 from users
Also, in the spirit of learning more, you can create a table that doesn't exist by using the following syntax:
select userid, 1 as buildingid
into join
from users
That only works if the table doesn't exist, though, but it's a quick and dirty way to create table copies!