Is it optimal to use multiple joins in update query? - sql

My update query checks whether the column “Houses” is null in any row of my source table, joining the target and source tables on id (see query one). A null Houses indicates that the row has expired, so I need to expire the matching row in my target table by setting its expiry date. The query works, but I was wondering whether it can be improved; I'm new to SQL, so I don't know whether using two joins is the best way to get the result I want. The update will later run against millions of rows. No columns have been indexed yet.
Query:
(Query one)
Update t
set valid_date = GETDATE()
From Target T
JOIN SOURCE S ON S.ID = T.ID
LEFT JOIN SOURCE S2 ON S2.Houses = t.Houses
WHERE S2.Houses is null
Target:
ID  namn   middlename  Houses  date
1   demo   hello       2       null
2   demo2  test        4       null
3   demo3  test1       5       null
Source:
ID  namn  middlename  Houses
1   demo  hello       null
3   demo  world       null
Expected output after running update query:
ID  namn   middlename  Houses  date
1   demo   hello       2       2022-12-06
2   demo2  test        4       null
3   demo3  test1       5       2022-12-06

I would recommend exists:
update t
set valid_date = getdate()
from target t
where exists (select 1 from source s where s.id = t.id and s.houses is null)
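Note that your original query does not do exactly what you want. The LEFT JOIN on Houses combined with WHERE S2.Houses IS NULL is an anti-join: it keeps target rows whose Houses value matches no source row at all, which is not the same as the source row with the same id having a null Houses. With your sample data the two happen to coincide, but a source row whose Houses is non-null and merely differs from the target's value would still cause an update. Testing s.houses IS NULL on the row matched by id, as above, avoids this.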
With EXISTS, you want an index on source(id, houses) so the subquery can execute efficiently for each target row. This index is probably worthwhile for the JOIN as well.
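As a rough check of the EXISTS approach, here is a minimal sketch that replays the question's sample data. It runs on SQLite through Python's sqlite3 module purely as a stand-in for SQL Server, so DATE('now') replaces GETDATE() and the FROM clause is dropped; the index name ix_source_id_houses is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target (id INTEGER PRIMARY KEY, namn TEXT, houses INTEGER, valid_date TEXT);
    CREATE TABLE source (id INTEGER PRIMARY KEY, namn TEXT, houses INTEGER);
    INSERT INTO target VALUES (1, 'demo', 2, NULL), (2, 'demo2', 4, NULL), (3, 'demo3', 5, NULL);
    INSERT INTO source VALUES (1, 'demo', NULL), (3, 'demo', NULL);
    -- Composite index (hypothetical name) matching the columns the subquery probes.
    CREATE INDEX ix_source_id_houses ON source (id, houses);
""")

# Expire every target row whose source row with the same id has houses IS NULL.
conn.execute("""
    UPDATE target
    SET valid_date = DATE('now')
    WHERE EXISTS (
        SELECT 1 FROM source s
        WHERE s.id = target.id AND s.houses IS NULL
    )
""")

expired_ids = [row[0] for row in conn.execute(
    "SELECT id FROM target WHERE valid_date IS NOT NULL ORDER BY id")]
print(expired_ids)  # ids 1 and 3 are expired; id 2 has no source row and stays untouched
```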

I don't see why you'd need to join on the column houses at all.
Find all rows in source that have value NULL in the column houses.
Then update all rows in target that have the IDs of the source rows.
I prefer to write this kind of complex update using CTEs. It looks more readable to me.
WITH CTE AS
(
    SELECT
        Target.ID
        ,Target.Date
    FROM
        Source
        INNER JOIN Target ON Target.ID = Source.ID
    WHERE Source.Houses IS NULL
)
UPDATE CTE
SET Date = GETDATE();
To efficiently find rows in source that have value NULL in the column houses you should create an index, something like this:
CREATE INDEX IX_Houses ON Source
(
Houses
);
I assume that ID is a primary key with a clustered unique index, so ID would be included in the IX_Houses index implicitly.

Related

Left Join is filtering rows out of my query in MySQL 5.7 without any left join columns in the where clause

I have a query that joins 4 tables. It returns 35 rows every time I run it. Here it is..
SELECT Lender.id AS LenderId,
Loans.Loan_ID AS LoanId,
Parcels.Parcel_ID AS ParcelId,
tr.Tax_ID AS TaxRecordId,
tr.Tax_Year AS TaxYear
FROM parcels
INNER JOIN Loans ON (Parcels.Loan_ID = Loans.Loan_ID AND Parcels.Escrow = 1)
INNER JOIN Lender ON (Lender.id = Loans.Bank_ID)
INNER JOIN Tax_Record tr ON (tr.Parcel_ID = Parcels.Parcel_ID AND tr.Tax_Year = :taxYear)
WHERE Loans.Active = 1
AND Loans.Date_Submitted IS NOT NULL
AND Parcels.Municipality = :municipality
AND Parcels.County = :county
AND Parcels.State LIKE :stateCode
If I left join a table (using a subquery in the on clause of the join), MySQL does some very unexpected things. Here's the modified query with the left join...
SELECT Lender.id AS LenderId,
Loans.Loan_ID AS LoanId,
Parcels.Parcel_ID AS ParcelId,
tr.Tax_ID AS TaxRecordId,
tr.Tax_Year AS TaxYear
FROM parcels
INNER JOIN Loans ON (Parcels.Loan_ID = Loans.Loan_ID AND Parcels.Escrow = 1)
INNER JOIN Lender ON (Lender.id = Loans.Bank_ID)
INNER JOIN Tax_Record tr ON (tr.Parcel_ID = Parcels.Parcel_ID AND tr.Tax_Year = :taxYear)
LEFT OUTER JOIN taxrecordpayment trp ON trp.taxRecordId = tr.Tax_ID AND trp.paymentId = (
SELECT p.id
FROM taxrecordpayment trpi
JOIN payments p ON p.id = trpi.paymentId
WHERE trpi.taxRecordId = tr.Tax_ID AND p.isFullYear = 0
ORDER BY p.dueDate, p.paymentSendTo
LIMIT 1
)
WHERE Loans.Active = 1
AND Loans.Date_Submitted IS NOT NULL
AND Parcels.Municipality = :municipality
AND Parcels.County = :county
AND Parcels.State LIKE :stateCode
I would like to note that the left join table does not appear in the where clause of the query at all, and I am not using the left join table in the select clause. In real life, I actually use the left join records in the select clause, but in my effort to get to the essential elements causing this problem, I have simplified the query and removed everything but the essential parts that cause trouble.
Here's what is happening...
Where I used to get 35 records, now I get a random number of records approaching 35. Sometimes, I get 33. Other times, I get 27, or 29, or 31, and so on. I would never expect a left join like this to filter out any records from my result set. A left join should only add additional columns to the result set, particularly when - as is the case here - the left join table is not part of the where clause.
I have determined that the problem really only happens if the subquery has a non-deterministic sort. In other words, if I have two taxrecordpayment records that match the subquery and both have the same due date and the same "paymentSendTo" value, then I see the issue. If the inner subquery has a deterministic sort, the issue goes away.
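To illustrate what a deterministic sort means here, a minimal sketch with two payments that tie on due date and paymentSendTo. It runs on SQLite through Python's sqlite3 module, not MySQL 5.7, so it does not reproduce the row-dropping behavior; the snake_case table and column names and the sample values are assumptions made for the example. Appending a unique column (the primary key) to the ORDER BY leaves only one possible answer for LIMIT 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payments (id INTEGER PRIMARY KEY, due_date TEXT, send_to TEXT, is_full_year INTEGER);
    CREATE TABLE tax_record_payment (tax_record_id INTEGER, payment_id INTEGER);
    -- Payments 10 and 11 tie on (due_date, send_to); values are invented.
    INSERT INTO payments VALUES (10, '2021-09-01', 'county', 0), (11, '2021-09-01', 'county', 0);
    INSERT INTO tax_record_payment VALUES (1, 11), (1, 10);
""")

# The trailing p.id breaks the tie, so the sort order is total and LIMIT 1 is stable.
first_payment_id = conn.execute("""
    SELECT p.id
    FROM tax_record_payment trpi
    JOIN payments p ON p.id = trpi.payment_id
    WHERE trpi.tax_record_id = ? AND p.is_full_year = 0
    ORDER BY p.due_date, p.send_to, p.id
    LIMIT 1
""", (1,)).fetchone()[0]
print(first_payment_id)
```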
I would imagine that some people will look at my simplified example and recommend that I simply remove the subquery. If my query were this simple in real life, that would be the way to go.
In reality, the entire query is more complicated, is hitting a LOT of data, and modifying it is possible, but costly. Removing the subquery is even more costly.
Has anyone seen this sort of behavior before? I would expect a non-deterministic subquery to simply produce inconsistent results and I would never expect a left join like this to actually filter records out when the left joined table is not used at all in the where clause.
Here is the query plan, as provided by EXPLAIN...
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | PRIMARY | parcels | NULL | range | PRIMARY,Loan_ID,state_county,ParcelsCounty,county_state,Location,CountyLoan | county_state | 106 | NULL | 590 | 1 | Using index condition; Using where
1 | PRIMARY | tr | NULL | eq_ref | parcel_year,ParcelsTax_Record,Year | parcel_year | 8 | infoexchange.parcels.Parcel_ID,const | 1 | 100 | Using index
1 | PRIMARY | Loans | NULL | eq_ref | PRIMARY,Bank_ID,Bank,DateSub,loan_number | PRIMARY | 4 | infoexchange.parcels.Loan_ID | 1 | 21.14 | Using where
1 | PRIMARY | Lender | NULL | eq_ref | PRIMARY | PRIMARY | 8 | infoexchange.Loans.bank_id | 1 | 100 | Using index
1 | PRIMARY | trp | NULL | eq_ref | taxRecordPayment_key,IDX_trp_pymtId_trId | taxRecordPayment_key | 8 | infoexchange.tr.Tax_ID,func | 1 | 100 | Using where; Using index
2 | DEPENDENT SUBQUERY | trpi | NULL | ref | taxRecordPayment_key,IDX_trp_pymtId_trId | taxRecordPayment_key | 4 | infoexchange.tr.Tax_ID | 1 | 100 | Using index; Using temporary; Using filesort
2 | DEPENDENT SUBQUERY | p | NULL | eq_ref | PRIMARY | PRIMARY | 4 | infoexchange.trpi.paymentId | 1 | 10 | Using where
I have attempted to recreate this with a contrived data setup and an analogous query, but with my contrived data set, I cannot get the subquery to behave non-deterministically even though it suffers from the same problem as my subquery above (there are multiple records that match the subquery and the order by is not unique for those records).
This seems to require a massive data set to start misbehaving. It happens on multiple distinct instances of MySQL 5.7, while a MySQL 5.6 instance does not demonstrate the problem at all. I am hoping someone can spot something in the above query plan to help me understand why the subquery is non-deterministic and - more importantly - why that causes records to get dropped from the result set.
I feel like this is either a data set issue (perhaps we need to do a table optimize or do some maintenance on our tables), or a bug in MySQL.
I have submitted a bug for this behavior.
https://bugs.mysql.com/bug.php?id=104824
You can recreate this behavior as follows...
CREATE TABLE tableA (
id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(10)
);
CREATE TABLE tableB (
id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
tableAId INTEGER NOT NULL,
name VARCHAR(10),
CONSTRAINT tableBFKtableAId FOREIGN KEY (tableAId) REFERENCES tableA (id)
);
INSERT INTO tableA (name)
VALUES ('he'),
('she'),
('it'),
('they');
INSERT INTO tableB (tableAId, name)
VALUES (1, 'hat'),
(2, 'shoes'),
(4, 'roof');
Run this query multiple times and the number of rows returned will vary:
SELECT COALESCE(b.id, -1) AS tableBId,
a.id AS tableAId
FROM tableA a
LEFT JOIN tableB b ON (b.tableAId = a.id AND 0.5 > RAND());

compare user supplied list against table, with nulls for non-matching rows

Given a list of user-supplied numbers, which I'll refer to as myList, I want to find out which ones have a match against table MasterList, and which ones are null (no match)
so, given db contents
MasterList
----------
ID Number
1 3333333
2 4444444
3 5555555
If myList is ['1111111','2222222','3333333','4444444']
I want the following output:
1111111, null
2222222, null
3333333, 1
4444444, 2
Ideas I've tried:
This, of course, yields only the ones that match.
select Number, ID
from MasterList
where Number in('1111111','2222222','3333333','4444444')
My next idea is no more helpful:
select temp.Number, master.Number
from MasterList master
left join MasterList temp
on master.id=temp.id
and temp.Number in('1111111','2222222','3333333','4444444')
If the list were itself a table temp, it would be trivial to get the desired output:
select temp.number, master.id
from temp -- (ie, the list, where temp.number is the given list)
left join master on master.number=temp.number
-- optional where
where temp.number in('1111111','2222222','3333333','4444444')
This idea never materialized:
select temp.number, master.id
from (select '1111111','2222222','3333333','4444444') temp
left join master on master.number on....
Can this be done without a temporary table?
If not, how do you make a temporary table in DB2? (IBM documentation is helpful if you already know how to do it...)
You want an outer join, here a left outer join (we want all the rows from the left and any rows on the right that match the join condition), as you rightly say in the question. Here I'm using a Common Table Expression (CTE) to basically create a temp table on the fly.
WITH inlist (num) as ( VALUES
    (1111111),
    (2222222),
    (3333333),
    (4444444) )
SELECT num, id
FROM inlist LEFT OUTER JOIN masterlist
    ON masterlist.number = inlist.num
ORDER BY num;
That yields:
NUM         ID
----------- -----------
    1111111           -
    2222222           -
    3333333           1
    4444444           2
4 record(s) selected.
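When the list comes from application code, the same inline-table idea can be built with bound parameters instead of literals. The following is a minimal sketch on SQLite through Python's sqlite3 module, used only as a stand-in for DB2; storing Number as text and the variable names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE masterlist (id INTEGER PRIMARY KEY, number TEXT);
    INSERT INTO masterlist VALUES (1, '3333333'), (2, '4444444'), (3, '5555555');
""")

my_list = ['1111111', '2222222', '3333333', '4444444']

# One "(?)" row per list entry; only placeholders are interpolated, never the values themselves.
values_clause = ", ".join("(?)" for _ in my_list)

rows = conn.execute(f"""
    WITH inlist (num) AS (VALUES {values_clause})
    SELECT inlist.num, masterlist.id
    FROM inlist
    LEFT JOIN masterlist ON masterlist.number = inlist.num
    ORDER BY inlist.num
""", my_list).fetchall()

for num, master_id in rows:
    print(num, master_id)  # master_id is None where the number has no match
```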
I'm not super-familiar with DB2 (haven't written SQL for that in at least 15 years, and not much back then), so I don't know how much you'll need to edit this to make it work, but I think this will do what you want (Edited SQL to use VALUES clause):
SELECT
    my.Number1,
    ml.ID
FROM
    (VALUES ('1111111'),('2222222'),('3333333'),('4444444'))
    AS my(Number1)
LEFT JOIN
    MasterList AS ml
ON
    my.Number1 = ml.Number;

SQL - include results you are looking for in a column and set all other values to null

I have two tables, one with orders and another with order comments. I want to join these two tables. They are joined on a column "EID" which exists in both tables. I want all orders. I also want to see all comments with only certain criteria AND all other comments should be set to null. How do I go about this?
Orders Table
Order_Number
1
2
3
4
Comments Table
Comments
Cancelled On
Ordered On
Cancelled On
Cancelled On
In this example I would like to see for my results:
Order_Number | Comments
1 | Cancelled On
2 | Null
3 | Cancelled On
4 | Cancelled On
Thanks!
This seems like a rather trivial left join.
select o.order_number, c.comments
from orders o
left join comments c
on o.eid = c.eid
and (here goes your criteria for comments)
Tested on Oracle, there might be subtle syntax differences for other DB engines.
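To show why the criterion belongs in the ON clause rather than the WHERE clause, here is a minimal sketch on SQLite through Python's sqlite3 module. The EID values (1 to 4, one comment per order) and the criterion comments = 'Cancelled On' are assumptions inferred from the expected output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (eid INTEGER PRIMARY KEY, order_number INTEGER);
    CREATE TABLE comments (eid INTEGER, comments TEXT);
    INSERT INTO orders VALUES (1, 1), (2, 2), (3, 3), (4, 4);
    INSERT INTO comments VALUES (1, 'Cancelled On'), (2, 'Ordered On'),
                                (3, 'Cancelled On'), (4, 'Cancelled On');
""")

# Criterion in ON: every order is kept; non-matching comments come back as NULL.
on_rows = conn.execute("""
    SELECT o.order_number, c.comments
    FROM orders o
    LEFT JOIN comments c ON c.eid = o.eid AND c.comments = 'Cancelled On'
    ORDER BY o.order_number
""").fetchall()

# Criterion in WHERE: the filter runs after the join and drops order 2 entirely.
where_rows = conn.execute("""
    SELECT o.order_number, c.comments
    FROM orders o
    LEFT JOIN comments c ON c.eid = o.eid
    WHERE c.comments = 'Cancelled On'
    ORDER BY o.order_number
""").fetchall()

print(on_rows)
print(where_rows)
```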
It depends on one condition:
Are you trying to SET the other comments to null? (replace the values in the table)
or
Are you trying to DISPLAY the other comments as null? (don't display them)
If you want to change the values in the table use
UPDATE `table` SET `column` = null WHERE condition;
otherwise use:
SELECT column FROM table LEFT JOIN othertable ON join_condition AND condition;

Oracle ORA-01427: single-row subquery returns more than one row, but actually no rows

I can already hear the groans looking at my title but please bear with me a moment. :)
I have two tables that have a few columns in common and are updated through different means. Given a specific identifier I want to update the first table with values from the second table if the first table is missing some information.
Table A looks something like:
Dept_ID Reviewer Reviewer_Team Reviewer_Code
ACM Null Null Null
EOT Null Null Null
QQQ Joe Joe's Group XYZ
ACM Null Null Null
ZZZ Null Null Null
Table B looks something like:
Dept_ID Reviewer Reviewer_Team Reviewer_Code
AAA Al Al's Group 123
BBB Bob Bob's Group 234
ZZZ Zoe Zoe's Group 567
If Reviewer_Code is Null in Table A we want to find Table A's Dept_ID in Table B, and update Table A's other fields to match Table B. Note that Table A might have multiple records with the same Dept_ID in which case we'd expect them to have the same values updated from Table B.
Sounds easy. Using the above tables as an example, ACM and EOT have no matches in Table B, so those records would not be updated at this step. Table A's ZZZ record, though, would get updated based on Table B's ZZZ record.
However there's a chance that there would be no matches in Table B. So pretend Table A doesn't have the ZZZ record, just the ACM and EOT that have Nulls.
I'm new to Oracle (coming from SQL Server) so maybe I'm testing this wrong, but what I have is a bunch of queries one after another in a .sql window of Oracle SQL Developer. This seems to work for me just fine normally. When it gets to this query though I get the dreaded "single-row subquery" error.
Here's the query I've tried a few different ways:
UPDATE VchrImpDetailCombined vchr
SET (Reviewer, Reviewer_Team, Reviewer_Code) =
(SELECT DISTINCT b.Reviewer, b.Reviewer_Team, b.Reviewer_Code
FROM GlobPMSDeptIdMapping b
WHERE b.Dept_Id = vchr.Dept_Id)
WHERE vchr.Reviewer_Code IS NULL
AND vchr.Business_L1 = 'CF'
AND vchr.Dept_ID IS NOT NULL;
or
UPDATE VchrImpDetailCombined vchr
SET (Reviewer, Reviewer_Team, Reviewer_Code) =
(SELECT DISTINCT b.Reviewer, b.Reviewer_Team, b.Reviewer_Code
FROM GlobPMSDeptIdMapping b
inner join VchrImpDetailCombined a
on b.Dept_Id = a.Dept_Id
WHERE b.Dept_Id = vchr.Dept_Id)
WHERE vchr.Reviewer_Code IS NULL
AND vchr.Business_L1 = 'CF'
AND vchr.Dept_ID IS NOT NULL;
I've tried a few other things as well such as doing "WHERE EXISTS SELECT blahblah", or "WHERE b.Dept_ID IS NOT NULL", etc.
Now, given my example data above, the subquery should have 0 records, keeping in mind there actually isn't a ZZZ record in Table A like my example, just the ACM and EOT. Table B simply doesn't have records with the matching Dept_ID in Table A. So my expectation would be for a 0 record update and happily moving along to the next query.
When I run these queries in a string of other queries I get the error. If I run the query all by its lonesome I simply get a "3 rows updated" which seems odd that anything is updating considering there should be no matches. But the 3 rows updated would seem to match the 3 ACM and EOT records even though Table B has nothing to update from given the criteria.
I must be missing something obvious, but I just can't seem to grasp it. There's a bajillion of these ORA-01427 questions so I was so sure I could find the answer already out there, but couldn't seem to find it.
Any ideas?
You need to instruct Oracle that it should perform the update only when there are data with which to do so (and I would have expected such to be needed for SQL Server as well, but I'm uncertain). This will overcome that obstacle, at the expense of performing an additional subquery:
UPDATE VchrImpDetailCombined vchr
SET (Reviewer, Reviewer_Team, Reviewer_Code) = (
SELECT b.Reviewer, b.Reviewer_Team, b.Reviewer_Code
FROM GlobPMSDeptIdMapping b
WHERE b.Dept_Id = vchr.Dept_Id
)
WHERE vchr.Reviewer_Code IS NULL
AND vchr.Business_L1 = 'CF'
AND vchr.Dept_ID IN (
SELECT Dept_Id
FROM GlobPMSDeptIdMapping
);
Per my comment on the question, I removed the DISTINCT from the (original) subquery, as it's either unneeded or ineffective.
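The "3 rows updated" result can be reproduced in miniature. This is a sketch on SQLite through Python's sqlite3 module, not Oracle; the table names table_a and table_b, the single-column SET, and the reviewer value 'Pat' are assumptions chosen to make the effect visible. Without the guard, every row that passes the outer WHERE is updated, and an empty subquery assigns NULL; with the guard, nothing is touched.

```python
import sqlite3

def run_update(guarded: bool):
    """Run the update on a fresh copy of the data; return (rows touched, reviewers afterwards)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table_a (dept_id TEXT, reviewer TEXT, reviewer_code TEXT);
        CREATE TABLE table_b (dept_id TEXT, reviewer TEXT, reviewer_code TEXT);
        -- 'Pat' is a made-up value so that an overwrite with NULL is observable.
        INSERT INTO table_a VALUES ('ACM', 'Pat', NULL), ('EOT', NULL, NULL), ('ACM', NULL, NULL);
        INSERT INTO table_b VALUES ('AAA', 'Al', '123'), ('BBB', 'Bob', '234'), ('ZZZ', 'Zoe', '567');
    """)
    sql = """
        UPDATE table_a
        SET reviewer = (SELECT b.reviewer FROM table_b b WHERE b.dept_id = table_a.dept_id)
        WHERE reviewer_code IS NULL
    """
    if guarded:
        # Only touch rows that actually have a counterpart in table_b.
        sql += " AND dept_id IN (SELECT dept_id FROM table_b)"
    touched = conn.execute(sql).rowcount
    reviewers = [r[0] for r in conn.execute("SELECT reviewer FROM table_a ORDER BY rowid")]
    conn.close()
    return touched, reviewers

unguarded_result = run_update(guarded=False)
guarded_result = run_update(guarded=True)
print(unguarded_result)  # three rows touched, 'Pat' replaced by NULL
print(guarded_result)    # no rows touched, 'Pat' preserved
```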

SQL Query - Ensure a row exists for each value in ()

Currently struggling to find a way to validate two tables efficiently (Table A has lots of rows).
I have two tables
Table A
ID
A
B
C
Table matched
ID Number
A 1
A 2
A 9
B 1
B 9
C 2
I am trying to write a SQL Server query that basically checks to make sure for every value in Table A there exists a row for a variable set of values ( 1, 2,9)
The example above is incorrect because it should have, for every record in A, a corresponding record in Table matched for each value (1,2,9). The end goal is:
Table matched
ID Number
A 1
A 2
A 9
B 1
B 2
B 9
C 1
C 2
C 9
I know it's confusing, but in general for every X in (some set) there should be a corresponding record in Table matched. I have obviously simplified things.
Please let me know if you all need clarification.
Use:
SELECT a.id
FROM TABLE_A a
JOIN TABLE_B b ON b.id = a.id
WHERE b.number IN (1, 2, 9)
GROUP BY a.id
HAVING COUNT(DISTINCT b.number) = 3
The DISTINCT in the COUNT prevents duplicates (i.e. A having two records in TABLE_B with the value "2") from being falsely counted as a correct record. It can be omitted if a unique or primary key constraint guarantees that an id cannot have the same number twice.
The HAVING COUNT(...) must equal the number of values provided in the IN clause.
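A minimal sketch of this query against the question's sample data, run on SQLite through Python's sqlite3 module as a stand-in for SQL Server. The table names table_a and matched are assumptions; the required values are bound as parameters so that the IN list and the HAVING count cannot drift apart.

```python
import sqlite3

required = [1, 2, 9]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id TEXT PRIMARY KEY);
    CREATE TABLE matched (id TEXT, number INTEGER);
    INSERT INTO table_a VALUES ('A'), ('B'), ('C');
    INSERT INTO matched VALUES ('A', 1), ('A', 2), ('A', 9), ('B', 1), ('B', 9), ('C', 2);
""")

placeholders = ", ".join("?" for _ in required)

# An id is complete only if it has all len(required) distinct values.
complete_ids = [row[0] for row in conn.execute(f"""
    SELECT a.id
    FROM table_a a
    JOIN matched m ON m.id = a.id
    WHERE m.number IN ({placeholders})
    GROUP BY a.id
    HAVING COUNT(DISTINCT m.number) = ?
""", (*required, len(required)))]

print(complete_ids)  # only A has 1, 2 and 9
```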
Create a temp table of values you want. You can do this dynamically if the values 1, 2 and 9 are in some table you can query from.
Then, SELECT * FROM tempTable WHERE Number NOT IN (SELECT Number FROM TableMatched)
I had this situation one time. My solution was as follows.
In addition to TableA and TableMatched, there was a table that defined the rows that should exist in TableMatched for each row in TableA. Let’s call it TableMatchedDomain.
The application then accessed TableMatched through a view that controlled the returned rows, like this:
create view TableMatchedView as
select a.ID,
       d.Number,
       m.OtherValues
from TableA a
cross join TableMatchedDomain d
left join TableMatched m on m.ID = a.ID and m.Number = d.Number
This way, the rows returned were always correct. If there were missing rows from TableMatched, then the Numbers were still returned but with OtherValues as null. If there were extra values in TableMatched, then they were not returned at all, as though they didn't exist. By changing the rows in TableMatchedDomain, this behavior could be controlled very easily. If a value were removed from TableMatchedDomain, then it would disappear from the view. If it were added back again in the future, then the corresponding OtherValues would appear again as they were before.
The reason I designed it this way was that I felt that establishing an invariant on the row configuration in TableMatched was too brittle and, even worse, introduced redundancy. So I removed the restriction from groups of rows (in TableMatched) and instead made the entire contents of another table (TableMatchedDomain) define the correct form of the data.
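The same cross join can also report which rows are missing. This is a minimal sketch on SQLite through Python's sqlite3 module as a stand-in for SQL Server, using the sample data from the question; the snake_case table names are assumptions. Every (id, number) pair that should exist is generated, and only the pairs with no row in the matched table are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id TEXT PRIMARY KEY);
    CREATE TABLE matched_domain (number INTEGER PRIMARY KEY);
    CREATE TABLE matched (id TEXT, number INTEGER);
    INSERT INTO table_a VALUES ('A'), ('B'), ('C');
    INSERT INTO matched_domain VALUES (1), (2), (9);
    INSERT INTO matched VALUES ('A', 1), ('A', 2), ('A', 9), ('B', 1), ('B', 9), ('C', 2);
""")

# CROSS JOIN builds the full set of expected pairs; the LEFT JOIN plus IS NULL keeps the gaps.
missing = conn.execute("""
    SELECT a.id, d.number
    FROM table_a a
    CROSS JOIN matched_domain d
    LEFT JOIN matched m ON m.id = a.id AND m.number = d.number
    WHERE m.id IS NULL
    ORDER BY a.id, d.number
""").fetchall()

print(missing)  # the pairs that would have to be inserted to reach the end goal
```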