I have photos table onto which I'm trying to insert a new column from another table by filtering data with a WHERE clause. Here is the query I'm trying to execute:
ALTER TABLE photos ADD COLUMN new_count BIGINT;
INSERT INTO photos (new_count)
SELECT c.cnt
FROM c
WHERE
c.created = photos.created;
But in the result I am getting an error:
Error : ERROR: invalid reference to FROM-clause entry for table "photos"
LINE 5: c.created = photos.created
How can avoid the error and copy the column?
I'm using PostgreSQL 9.4.
Join in the referenced table c with a FROM clause.
UPDATE photos p
SET new_count = c.cnt
FROM c
WHERE c.created = p.created
AND c.cnt IS NOT NULL; -- optional, depends on additional circumstances
Details in the manual.
Correlated subqueries (like demonstrated by #Andomar) are generally slower, and they also have a side effect that's typically undesirable:
If no matching row is found in c, the UPDATE proceeds regardless and new_count is set to NULL. This happens to be not wrong in this particular case (since the column is NULL by definition of the case). It's still pointless extra cost. Often, the effect is actively harmful, though.
The added condition AND c.cnt IS NOT NULL further avoids empty updates if the new value is no different from the old. Useless if c.cnt cannot be NULL. It's the simplified form of the generic check: AND c.cnt IS DISTINCT FROM p.new_count. Details (last paragraph):
How do I (or can I) SELECT DISTINCT on multiple columns?
Use insert to add rows to a table, and update to change existing rows. Here's an example of how to update a column in photos based on a link to the c table:
update photos p
set new_count =
(
select cnt
from c
where c.created = p.created
)
Related
A follow up question to SQL Server Merge: update only changed data, tracking changes?
we have been struggling to get an effective merge statement working, and are now thinking about only using updates, we have a very simple problem: Update Target from Source where values are different and record the changes, both tables are the same layout.
So, the two questions we have are: is it possible to combine this very simple update into a single statement?
UPDATE tbladsgroups
SET tbladsgroups.Description = s.Description,
tbladsgroups.action='Updated'
FROM tbladsgroups t
INNER JOIN tbladsgroups_staging s
ON t.SID = s.SID
Where s.Description <> t.Description
UPDATE tbladsgroups
SET tbladsgroups.DisplayName = s.DisplayName,
tbladsgroups.action='Updated'
FROM tbladsgroups t
INNER JOIN tbladsgroups_staging s
ON t.SID = s.SID
Where s.DisplayName <> t.DisplayName
....for each column.
Second question.
Can we record into a separate table/variable which record has been updated?
Merge would be perfect, however we cannot see which record is updated as the data returned from OUTPUT shows all rows, as the target is always updated.
edit complete merge:
M
ERGE tblADSGroups AS TARGET
USING tblADSGroups_STAGING AS SOURCE
ON (TARGET.[SID] = SOURCE.[SID])
WHEN MATCHED
THEN UPDATE SET
TARGET.[Description]=CASE
WHEN source.[Description] != target.[Description] THEN(source.[Description]
)
ELSE target.[Description] END,
TARGET.[displayname] = CASE
WHEN source.[displayname] != target.[displayname] THEN source.[displayname]
ELSE target.[displayname] END
...other columns cut for brevity
WHEN NOT MATCHED BY TARGET
THEN
INSERT (
[SID],[SamAccountName],[DisplayName],[Description],[DistinguishedName],[GroupCategory],[GroupScope],[Created],[Members],[MemberOf],[SYNCtimestamp],[Action]
)
VALUES (
source.[SID],[SamAccountName],[DisplayName],[Description],[DistinguishedName],[GroupCategory],[GroupScope],[Created],[Members],[MemberOf],[SYNCtimestamp],[Action]
)
WHEN NOT MATCHED BY SOURCE
THEN
UPDATE SET ACTION='Deleted'
You can use a single UPDATE with an OUTPUT clause, and use an INTERSECT or EXCEPT subquery in the join clause to check whether any columns have changed.
For example
UPDATE t
SET Description = s.Description,
DisplayName = s.DisplayName,
action = 'Updated'
OUTPUT inserted.ID, inserted.Description, inserted.DisplayName
INTO #tbl (ID, Description, DisplayName)
FROM tbladsgroups t
INNER JOIN tbladsgroups_staging s
ON t.SID = s.SID
AND NOT EXISTS (
SELECT s.Description, s.DisplayName
INTERSECT
SELECT t.Description, t.DisplayName
);
You can do a similar thing with MERGE, if you also want to INSERT
MERGE tbladsgroups t
USING tbladsgroups_staging s
ON t.SID = s.SID
WHEN MATCHED AND NOT EXISTS ( -- do NOT place this condition in the ON
SELECT s.Description, s.DisplayName
INTERSECT
SELECT t.Description, t.DisplayName
)
THEN UPDATE SET
Description = s.Description,
DisplayName = s.DisplayName,
action = 'Updated'
WHEN NOT MATCHED
THEN INSERT (ID, Description, DisplayName)
VALUES (s.ID, s.Description, s.DisplayName)
OUTPUT inserted.ID, inserted.Description, inserted.DisplayName
INTO #tbl (ID, Description, DisplayName)
;
We have similar needs when dealing with values in our Data Warehouse dimensions. Merge works fine, but can be inefficient for large tables. Your method would work, but also seems fairly inefficient in that you would have individual updates for every column. One way to shorten things would be to compare multiple columns in one statement (which obviously makes things more complex). You also do not seem to take NULL values into consideration.
What we ended up using is essentially the technique described on this page: https://sqlsunday.com/2016/07/14/comparing-nullable-columns/
Using INTERSECT allows you to easily (and quickly) compare differences between our staging and our dimension table, without having to explicitly write a comparison for each individual column.
To answer your second question, the technique above would not enable you to catch which column changed. However, you can compare the old row vs the new row (we "close" the earlier version of the row by setting a "ValidTo" date, and then add the new row with a "ValidFrom" date equal to today's date.
Our code ends up looking like the following:
INSERT all rows from the stage table that do not have a matching key value in the new table (new rows)
Compare stage vs dimension using the INTERSECT and store all matches in a table variable
Using the table variable, "close" all matching rows in the Dimension
Using the table variable, INSERT the new rows
If there's a full load taking place, we can also check for Keys that only exist in the dimension but not in the stage table. This would indicate those rows were deleted in the source system, and we mark them as "IsDeleted" in the dimension.
I think you may be overthinking the complexity, but yes. Your underlying update is a compare between the ads group and staging tables based on the matching ID in each query. Since you are already checking the join on ID and comparing for different description OR display name, just update both fields. Why?
groups description groups display staging description staging display
SomeValue Show Me SOME other Value Show Me
Try This Attempt Try This Working on it
Both Diff Changes Both Are Diff Change Me
So the ultimate value you want is to pull both description and display FROM the staging back to the ads groups table.
In the above sample, I have three samples which if based on matching ID present entries that would need to be changed. If the value is the same in one column, but not the other and you update both columns, the net effect is the one bad column that get updated. The first would ultimately remain the same. If both are different, both get updated anyhow.
UPDATE tbladsgroups
SET tbladsgroups.Description = s.Description,
tbladsgroups.DisplayName = s.DisplayName,
tbladsgroups.action='Updated'
FROM tbladsgroups t
INNER JOIN tbladsgroups_staging s
ON t.SID = s.SID
Where s.Description <> t.Description
OR s.DisplayName <> t.DisplayName
Now, all this resolution being said, you have redundant data and that is the whole point of a lookup table. The staging appears to always have the correct display name and description. Your tblAdsGroups should probably remove those two columns and always get them from staging to begin with... Something like..
select
t.*,
s.Description,
s.DisplayName
from
tblAdsGroups t
JOIN tblAdsGroups_Staging s
on t.sid = s.sid
Then you always have the correct description and display name and dont have to keep synching updates between them.
Can SQLite distinguish between a column from some aliased table, e.g. table1.column and a column that is aliased with the same name, i.e. column, in the SELECT statement?
This is relevant because I need to refer to the column that I construct in the SELECT statement later on in a HAVING clause, but must not confuse it with the column in aliased table. To my knowledge, I cannot alias the table to be constructed in my SELECT statement (without reverting to some nasty work-around like SELECT * FROM (SELECT ...) AS alias) to ensure both are distinguishable.
Here's a stripped down version of the code I am concerned with:
SELECT
a.entity,
b.DATE,
TOTAL(a.dollar_amount*b.ret_usd)/TOTAL(a.dollar_amount) AS ret_usd
FROM holdings a
LEFT JOIN returns b
ON a.stock = b.stock AND
a.DATE = b.DATE
GROUP BY
a.entity,
b.DATE
HAVING
ret_usd NOT NULL
Essentially, I want to get rid of groups for which I cannot find any returns and thus would show up with NULL values. I am not using an INNER JOIN because in my production code I merge multiple types of returns - for some of which I may have no data. I only want to drop those groups for which I have no returns for any of the return types.
To my understanding, the SQLite documentation does not address this issue.
LEFT JOIN all the return tables, then add a WHERE something like
COALESCE(b.ret_used, c.ret_used, d.ret_used....) is not NULL
You might need a similar strategy to determine which ret_used in the TOTAL. FYI, TOTAL never returns NULL.
I have seen many answers that will update a table 1 when rows exist in table 2, but not one that works using a LEFT JOIN in selecting the rows, (for better performance). I have a solution to the update, but it will perform badly as it uses NOT IN.
So this SQL will update the tables as required, but looks to be very costly when run against large tables making it difficult to use.
update header
set status='Z'
where status='A'
and header.id not in (
select headerid
from detail
where detail.id between 0 and 9999999
);
Now I have a well performing query using a LEFT JOIN which returns the correct ids, but I have not been able to insert it into an update statement to give the same results.
The select statement is
select header.id
from header
left join detail on detail.headerid = header.id
where detail.headerid is null
and header.status='A'
So if I use this in the update statement as in:
update header
set status = 'Z'
where header.id = (
select header.id
from header
left join detail on detail.headerid = header.id
where detail.headerid is null and header.status='A'
)
Then I fail with:
ORA-01427: single-row subquery returns more than one row
I am expecting multiple header.id to be returned and want to update all these rows.
So I am still searching for a solution which will update the returned rows, using a well performing SQL select to return rows in table header, that do not have related rows in the detail table.
Any help would be appreciated, otherwise I will be left with the badly performing update.
Since you are expecting multiple header ID & the sub query returns multiple ID as you expected you should use IN
Try this
Update
header
Set status = 'Z'
Where
header.id IN (select
header.id
From
header
Left join
detail
On
detail.headerid = header.id
Where
detail.headerid is null
And
header.status='A')
I wouldn't put the condition on the outer table in the subquery. I am more comfortable writing this logic as:
update header h
set status = 'Z'
where not exists (select 1
from detail d
where d.headerid = h.id
) and
h.status = 'A';
If performance is an issue, indexes on detail(headerid) and header(status, id) and help.
typical, the next place I looked, I found an answer...
update header set status='Z' where not exists (select detail.headerid from detail where detail.headerid = header.id) and status = 'A'
Oh well, at least its here if anyone else wants to find it.
As the error states your subquery is returning more than one rows and you are using a = sign in your update query. = sign is not allowed if your query returns more than one records use either IN, NOT IN , EXISTS, NOT EXISTS as per your requirement
In an update statement for a temp table, how does SQL Server decide which value to use when there are multiple values returned, for example:
UPDATE A
SET A.dte_start_date = table1.dte_start_date
FROM #temp_table A
INNER JOIN table1 ON A.id = table1.id
In this situation the problem is more than one dte_start_date is returned for each id value in the temp table. There is there's no index or unique value in the tables I'm working on so I need to know how SQL Server will choose between the different values.
It is non-deterministic. See the following example for a better understanding. Though it is not exactly the same scenario explained here, it is pretty similar
When the single value is to be retrieved from the database also use the SET statement with a query to set the value. For example:
SET #v_user_user_id = (SELECT u.user_id FROM users u WHERE u.login = #v_login);
Reason: Unlike Oracle, SQL Server does not raise an error if more than one row is returned from a SELECT query that is used to populate variables. The above query will throw an exception whereas the following will not throw an exception and the variable will contain a random value from the queried table(s).
SELECT #v_user_user_id = u.user_id FROM users u WHERE u.login = #v_login;
It is non-deterministic which value is used if you have a one two many relationship.
In MS-SQL-Sever (>=2005) i would use a CTE since it's a readable way to specify what i want using ROW_NUMBER. Another advantage of a CTE is that you can change it easily to do a select instead of an update(or delete) to see what will happen.
Assuming that you want the latest record(acc.to dte_start_date) for every id:
WITH CTE AS
(
SELECT a.*, rn = ROW_NUMBER() OVER (PARTITION BY a.id
ORDER BY a.dte_start_date DESC)
FROM #temp_table A
INNER JOIN table1 ON A.id = table1.id
)
UPDATE A
SET A.dte_start_date = table1.dte_start_date
FROM #temp_table A INNER JOIN CTE ON A.ID = CTE.ID
WHERE CTE.RN = 1
I need to update a field called FamName in a table called Episode with randomly generated Germanic names from a different table called Surnames which has a single column called Surname.
To do this I have first added an ID field and NONCLUSTERED INDEX to my Surnames table
ALTER TABLE Surnames
ADD ID INT NOT NULL IDENTITY(1, 1);
GO
CREATE UNIQUE NONCLUSTERED INDEX idx ON Surnames(ID);
GO
I then attempt to update my Episode table via
UPDATE E
SET E.FamName = S.Surnames
FROM Episode AS E
INNER LOOP JOIN Surnames AS S
ON S.ID = (1 + ABS(CRYPT_GEN_RANDOM(8) % (SELECT COUNT(*) FROM Surnames)));
GO
where I am attempting to force the query to 'loop' using the LOOP join hint. Of course if I don't force the optimizer to loop (using LOOP) I will get the same German name for all rows. However, this query is strangely returning zero rows affected.
Why is this returning zero affected rows and how can this be amended to work?
Note, I could use a WHILE loop to perform this update, but I want a succinct way of doing this and to find out what I am doing wrong in this particular case.
You cannot (reliably) affect query results with join hints. They are performance hints, not semantic hints. You try to rely on undefined behavior.
Moving the random number computation out of the join condition into one of the join sources prevents the expression to be treated as a constant:
UPDATE E
SET E.FamName = S.Surnames
FROM (
SELECT *, (1 + ABS(CRYPT_GEN_RANDOM(8) % (SELECT COUNT(*) FROM Surnames))) AS SurnameID
FROM Episode AS E
) E
INNER LOOP JOIN Surnames AS S ON S.ID = E.SurnameID
The derived table E adds the computed SurnameID as a new column.
You don't need join hints any longer. I just tested that this works in my specific test case although I'm not whether this is guaranteed to work.