I am interested in if and what the difference is between these 2 queries are:
This one has SELECT TOP 1
UPDATE tblTemp SET [SOH] = (Select top 1 (tblstock.[Stock On Hand (Base UOM)])
FROM tblstock
WHERE tblTEMP.[ID] = tblstock.[ID])
and this one does not
UPDATE tblTemp SET [SOH] = (Select tblstock.[Stock On Hand (Base UOM)])
FROM tblstock
WHERE tblTEMP.[ID] = tblstock.[ID]
The first one affects more rows.
So my question is: Are they doing the same thing?
First, the second is more appropriately written as:
UPDATE tblTemp
SET [SOH] = tblstock.[Stock On Hand (Base UOM)]
FROM tblstock
WHERE tblTEMP.[ID] = tblstock.[ID];
(There is no reason to have a subquery with no FROM clause; it is a wasted SELECT.)
Second, they do not do the same thing. The first will set SOH to NULL if there is no match. The second will not change the existing value.
This ignores what happens when there are multiple matches. SQL Server does not specify which row is used for the update. So, even the same query might produce different results on different runs on the same data.
In your first query, your sub-query includes your where clause. So it will be updating all in the tblTemp table.
The second query has the where clause supplied outside of the sub-query, so applies this where to the tblTemp table.
Related
I have two tables, on one there are all the races that the buses do
dbo.Courses_Bus
|ID|ID_Bus|ID_Line|DateHour_Start_Course|DateHour_End_Course|
On the other all payments made in these buses
dbo.Payments
|ID|ID_Bus|DateHour_Payment|
The goal is to add the notion of a Line in the payment table to get something like this
dbo.Payments
|ID|ID_Bus|DateHour_Payment|Line|
So I tried to do this :
/** I first added a Line column to the dbo.Payments table**/
UPDATE
Table_A
SET
Table_A.Line = Table_B.ID_Line
FROM
[dbo].[Payments] AS Table_A
INNER JOIN [dbo].[Courses_Bus] AS Table_B
ON Table_A.ID_Bus = Table_B.ID_Bus
AND Table_A.DateHour_Payment BETWEEN Table_B.DateHour_Start_Course AND Table_B.DateHour_End_Course
And this
UPDATE
Table_A
SET
Table_A.Line = Table_B.ID_Line
FROM
[dbo].[Payments] AS Table_A
INNER JOIN (
SELECT
P.*,
CP.ID_Line AS ID_Line
FROM
[dbo].[Payments] AS P
INNER JOIN [dbo].[Courses_Bus] CP ON CP.ID_Bus = P.ID_Bus
AND CP.DateHour_Start_Course <= P.Date
AND CP.DateHour_End_Course >= P.Date
) AS Table_B ON Table_A.ID_Bus = Table_B.ID_Bus
The main problem, apart from the fact that these requests do not seem to work properly, is that each table has several million lines that are increasing every day, and because of the datehour filter (mandatory since a single bus can be on several lines everyday) SSMS must compare each row of the second table to all rows of the other table.
So it takes an infinite amount of time, which will increase every day.
How can I make it work and optimise it ?
Assuming that this is the logic you want:
UPDATE p
SET p.Line = cb.ID_Line
FROM [dbo].[Payments] p JOIN
[dbo].[Courses_Bus] cb
ON p.ID_Bus = cb.ID_Bus AND
p.DateHour_Payment BETWEEN cb.DateHour_Start_Course AND cb.DateHour_End_Course;
To optimize this query, then you want an index on Courses_Bus(ID_Bus, DateHour_Start_Course, DateHour_End_Course).
There might be slightly more efficient ways to optimize the query, but your question doesn't have enough information -- is there always exactly one match, for instance?
Another big issue is that updating all the rows is quite expensive. You might find that it is better to do this in loops, one chunk at a time:
UPDATE TOP (10000) p
SET p.Line = cb.ID_Line
FROM [dbo].[Payments] p JOIN
[dbo].[Courses_Bus] cb
ON p.ID_Bus = cb.ID_Bus AND
p.DateHour_Payment BETWEEN cb.DateHour_Start_Course AND cb.DateHour_End_Course
WHERE p.Line IS NULL;
Once again, though, this structure depends on all the initial values being NULL and an exact match for all rows.
Thank you Gordon for your answer.
I have investigated and came with this query :
MERGE [dbo].[Payments] AS p
USING [dbo].[Courses_Bus] AS cb
ON p.ID_Bus= cb.ID_Bus AND
p.DateHour_Payment>= cb.DateHour_Start_Course AND
p.DateHour_Payment<= cb.DateHour_End_Course
WHEN MATCHED THEN
UPDATE SET p.Line = cb.ID_Ligne;
As it seems to be the most suitable in an MS-SQL environment.
It also came with the error :
The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.
I understood this to mean that it finds several lines with identical
[p.ID_Bus= cb.ID_Bus AND
p.DateHour_Payment >= cb.DateHour_Start_Course AND
p.DateHour_Payment <= cb.DateHour_End_Course]
Yes, this is a possible case, however the ID is different each time.
For example, if two blue cards are beeped at the same time, or if there is a loss of network and the equipment has been updated, thus putting the beeps at the same time. These are different lines that must be treated separately, and you can obtain for example:
|ID|ID_Bus|DateHour_Payments|Line|
----------------------------------
|56|204|2021-01-01 10:00:00|15|
----------------------------------
|82|204|2021-01-01 10:00:00|15|
How can I improve this query so that it takes into account different payment IDs?
I can't figure out how to do this with the help I find online. Maybe this method is not the right one in this context.
I'm a Postgresql newbie, so still struggling a little bit here. Please be gentle.
I'm left joining three tables, and would like to be able to use a case statement to introduce another column that brings across a desired value from one column based on another. I'm going to guess that my INNER JOIN and CASE statements are back to front, but I'm not sure how to rearrange them without breaking the intent.
Basically: Where model_best_fit == SUNNY, then I'd like a new column with name applied_f_model_hours_above4k to have the value from the column hoursabove4k_sunny
Code sample:
SELECT *
FROM px_fuel_weathercell
INNER JOIN f_descriptions ON px_f_weathercell.px_id = f_descriptions.fuel_id
INNER JOIN dailywx ON px_f_weathercell.fid_new_wx_cells = dailywx.location
CASE best_model_fit
WHEN 'SUNNY' then hoursabove4k_sunny
END applied_f_model_hours_above4k
WHERE best_model_fit = 'SUNNY' /* limiting my test case here, clause will be removed later */
LIMIT 1000;
The error is as follows:
ERROR: syntax error at or near "CASE"
LINE 5: CASE best_model_fit
^
SQL state: 42601
Character: 210
Thank you for any help you can offer.
Bonus points: CASE seems slow. 45 seconds to run this query. dailywx has 400,000 rows, px_f_weathercell has 6,000,000 rows. Is there a faster way to do this?
EDIT:
Made the following edit, no getting a column full of nulls when the column desired has numbers (including 0) in it. Both columns are of type double.
EDIT2: Updated a couple of table names to indicate where columns are coming from. Also updated to show left join. I've also used PGTune to set some recommended settings in order to address a situation where the process was disk bound. I've also set an index on px_f_weathercell.fid_new_wx_cells and px_f_weathercell.px_id. This has resulted in 100,000 records returning in approx 5-7 seconds. I'm still recieving null values from the CASE statement, however.
SELECT *,
CASE best_model_fit
WHEN 'SUNNY' then dailywx.hoursabove4k_sunny
END applied_f_model_hours_above4k
FROM px_fuel_weathercell
LEFT JOIN f_descriptions ON px_f_weathercell.px_id = f_descriptions.fuel_id
LEFT JOIN dailywx ON px_f_weathercell.fid_new_wx_cells = dailywx.location
WHERE fuel_descriptions.best_model_fit = 'SUNNY' /* limiting my test case here, clause will be removed later */
LIMIT 1000;
In a table, all rows have the same columns. You cannot have a column that exists only for some rows. Since a query result is essentially a table, that applies there as well.
So having NULL or 0 as a result for the rows where the information does not apply is your only choice.
The reason why the CASE expression is returning NULL is that you have no ELSE branch. If none of the WHEN conditions applies, the result is NULL.
The performance of the query is a different thing. You'd need to supply EXPLAIN (ANALYZE, BUFFERS) output to analyze that. But when joining big tables, it is often beneficial to set work_mem high enough.
I'm working in a query window in SSMS.
Using 3 tables:
WORK_ORDER wo
An order to fabricate a part
OPERATION op
An operation in the fabrication of the part (laser, grinding, plating, etc.)
PART pt
A unique record defining the part
My objective is to report on the status of an operation (say #3) (#total parts ordered, #completed parts), but additionally to include the number of parts that have completed the previous operation (#2) in the sequence and are ready for the process. My solution was to use the LAG function, which works perfectly when the nested select statement below is run independently, but I get an avg of 4X duplication in my results, and my Completed_QTY_PREV_OP column is not displayed. I am aware that's because it's not in the parent select statement, but I wanted to correct the join first. I'm guessing the two problems are related.
Footnote: The WHERE contains a filter that you can ignore. The parent select statement works perfectly without the joined subquery.
Here's my sql:
SELECT op.RESOURCE_ID, pt.USER_5 AS PRODUCT, wo.PART_ID, wo.TYPE, wo.BASE_ID,
wo.LOT_ID, wo.SPLIT_ID, wo.SUB_ID, op.SEQUENCE_NO, pt.DESCRIPTION,
wo.DESIRED_QTY, op.FULFILLED_QTY AS QTY_COMP, op.SERVICE_ID, op.DISPATCHED_QTY, wo.STATUS
FROM dbo.WORK_ORDER wo INNER JOIN
dbo.OPERATION op ON wo.TYPE = op.WORKORDER_TYPE
AND wo.BASE_ID = op.WORKORDER_BASE_ID
AND wo.LOT_ID = op.WORKORDER_LOT_ID
AND wo.SPLIT_ID = op.WORKORDER_SPLIT_ID
AND wo.SUB_ID = op.WORKORDER_SUB_ID INNER JOIN
dbo.PART pt ON wo.PART_ID = pt.ID
LEFT OUTER JOIN
--The nested select statement works by itself in a query window,
--but the JOIN throws an error.
(SELECT
pr.WORKORDER_TYPE, pr.WORKORDER_BASE_ID, pr.WORKORDER_LOT_ID,
pr.WORKORDER_SPLIT_ID, pr.WORKORDER_SUB_ID, pr.SEQUENCE_NO,
LAG (COMPLETED_QTY, 1) OVER (ORDER BY pr.WORKORDER_TYPE, pr.WORKORDER_BASE_ID,
pr.WORKORDER_LOT_ID, pr.WORKORDER_SPLIT_ID, pr.WORKORDER_SUB_ID, pr.SEQUENCE_NO) AS COMP_QTY_PREV_OP
FROM dbo.OPERATION AS pr) AS prev
--End of nested select
ON
op.WORKORDER_TYPE = prev.WORKORDER_TYPE AND
op.WORKORDER_BASE_ID = prev.WORKORDER_BASE_ID AND
op.WORKORDER_LOT_ID = prev.WORKORDER_LOT_ID AND
op.WORKORDER_SPLIT_ID = prev.WORKORDER_SPLIT_ID AND
op.WORKORDER_SUB_ID = prev.WORKORDER_SUB_ID
WHERE (NOT (op.SERVICE_ID IS NULL)) AND (wo.STATUS = N'R')
You haven't given enough information for a definitive answer, so instead I will give you an approach to debugging this.
You are getting unexpected rows as a result of a JOIN. This means that your JOIN condition is not matching the two sides of the JOIN on a one-to-one basis. There are multiple rows in the table being JOINed that meet the JOIN conditions.
To find these rows, temporarily change your SELECT list to SELECT *. Do this both in the outer SELECT, and in the derived table. Look through the columns being returned by the JOINed table, and find the values that you didn't expect to be returned.
Since the JOIN that causes the issue is the last one, they will be all the way to right of the result of a SELECT *.
Then add more conditions to the JOIN to eliminate the unwanted rows from the results.
I simplified the whole query by first creating a temp table filled by the previously nested SELECT, and then joining to it from the parent SELECT.
Works perfectly now. Thanks for looking.
PS: I apologize for the confusion about an error message. I noticed after I posted that I had an old comment in the code regarding an error. The error had been resolved before posting, but I neglected to remove the comment.
I'm testing this UPDATE statement to update all 4%, 8%, and 9% parts in our database. I'm trying to get the QTY_MULTIPLE value to match the CASES per layer value.
So, the REGEXP_LIKE, will eventually read:
> Regexp_like ( sp.part_no, '^4|^8|^9' )
It doesn't right now because I'm testing three specific parts. I want to make sure the rest of the statement works the way that it's supposed to.
Here's what I'm testing with:
UPDATE SALES_PART_TAB sp
SET sp.qty_multiple = ( SELECT cases_per_layer
FROM HH_INV_PART_CHARS
WHERE sp.part_no = HH_INV_PART_CHARS.part_no AND
sp.contract = HH_INV_PART_CHARS.contract )
WHERE Regexp_like ( sp.part_no, '^410-0017|^816-0210|^921-0003' ) AND
EXISTS
( SELECT sp.contract,
sp.part_no,
sp.qty_multiple,
HH_INV_PART_CHARS.cases_per_layer
FROM SALES_PART sp
inner join HH_INV_PART_CHARS
ON sp.part_no = HH_INV_PART_CHARS.part_no AND
sp.contract = HH_INV_PART_CHARS.contract
WHERE sp.qty_multiple != HH_INV_PART_CHARS.cases_per_layer );
When I run this statement, it updates 16 rows.
However, I'm expecting it to update 15 rows. I reached this conclusion by running the following SELECT statement:
SELECT sp.contract,
sp.catalog_no,
sp.qty_multiple,
HH_INV_PART_CHARS.cases_per_layer
FROM SALES_PART sp
inner join HH_INV_PART_CHARS
ON sp.part_no = HH_INV_PART_CHARS.part_no AND
sp.contract = HH_INV_PART_CHARS.contract
WHERE sp.qty_multiple != HH_INV_PART_CHARS.cases_per_layer AND
Regexp_like ( sp.part_no, '^410-0017|^816-0210|^921-0003' )
I think the problem I'm having is the UPDATE statement is updating all rows where the part_no and contract from the sales_part table exist on HH_INV_PART_CHARS. It's not limiting the update to part where the qty_multiple isn't equal to the cases_per_layer (which is what I want).
I'm a little stumped right now. I've been trying to work on both the subqueries but I'm not having any luck identifying where the problem is.
The Regexp_like ( sp.part_no,...) in the update query refers to SALES_PART_TAB.spart_no, while in the second query it refers to SALES_PART.spart_no.
One of the causes of the fog is that you redefine the alias sp in the same query, and so the exists subquery does not relate in any way to the record that is being updated. This means that if you would throw away the exists condition, you would still update 16 records. It seems very unlikely that this is what you want.
Use a different alias, so you can distinguish which table you want to refer to.
I have an update query that has an inner join. I expect this query to return two columns because of the join, but it seems that the QUERY is taking only the first row and using that to update the data while ignoring the rest.
Here is my update command
UPDATE [mamd]
SET [Brand_EL] = IIF(CHARINDEX('ELECT', UPPER([mml].[Brand_Desc])) > 0, 'YES', [Brand_EL])
FROM [mamd] [m]
INNER JOIN [ior] [ir] ON [ir].[CLIENT_CUSTOMER_ID] = [m].[CustomerId] COLLATE Latin1_General_CI_AS
INNER JOIN [maslist] [mml] ON [mml].[Model] = [ir].[MODEL] COLLATE Latin1_General_CI_AS
If I do a select like this
SELECT [ir].[CLIENT_CUSTOMER_ID], IIF(CHARINDEX('ELECT', UPPER([mml].[Brand_Desc])) > 0, 'YES', [Brand_EL])
FROM [mamd] [m]
INNER JOIN [ior] [ir] ON [ir].[CLIENT_CUSTOMER_ID] = [m].[CustomerId] COLLATE Latin1_General_CI_AS
INNER JOIN [maslist] [mml] ON [mml].[Model] = [ir].[MODEL] COLLATE Latin1_General_CI_AS
I get the following data returned
CLIENT_CUSTOMER_ID | Brand_EL
-------------------+----------
980872 | NO
980872 | YES
The reason I think it's only taking one record is because
The value NEVER changes to "YES"
When I run the update command it says only 1 row updated even though it should have gone through two
One thing that might be contributing to the problem is that [mamd] does NOT contain multiple records for that same user; It's a unique field. Since it's a unique field and therefore only has one row, does that mean that it will run that join only once? If that's the case, is there a better way I can do this without nested selects to generate the results?
UPDATE
Hey Everyone,
Just as an update, I took Gordons Advice and use aggregation. In this example that I have, I only cared if the value was "YES' because I only need to know if the customer bought a specific product. So what I ended up doing was grouping by the Customer ID and using the MAX function. If the customer bought a product, "YES" would bubble up to the top. If he didn't it would stay as NO or NULL. In that event, it wouldn't matter.
The behavior is correct and documented, although not in a very clear way:
Use caution when specifying the FROM clause to provide the criteria
for the update operation. The results of an UPDATE statement are
undefined if the statement includes a FROM clause that is not
specified in such a way that only one value is available for each
column occurrence that is updated, that is if the UPDATE statement is
not deterministic. For example, in the UPDATE statement in the
following script, both rows in Table1 meet the qualifications of the
FROM clause in the UPDATE statement; but it is undefined which row
from Table1 is used to update the row in Table2.
What this is trying to say is that a row is only updated once by the update. Which value gets used is indeterminate. So, if you need to decide how you want to handle the multiple matches.