Using correlated subquery in SQL Server update statement gives unexpected result

I'm introducing a primary key column to a table that doesn't have one yet. After I have added a normal field Id (int) with a default value of 0 I tried using the following update statement to create unique values for each record:
update t1
set t1.id = (select count(*) from mytable t2 where t2.id <> t1.id)
from mytable t1
I would expect the subquery to be executed for each row because I'm referencing t1. Each time the subquery is executed, the count should be one less, but it doesn't work.
The result is that Id is still 0 in every record. I have used this before on other DBMS with success. I'm using SQL Server 2008 here.
How do I generate unique values for each record and update the Id field?

Trying to explain why it doesn't work as you expect:
I would expect the subquery to be executed for each row because I'm referencing t1.
It is executed and it can affect all rows. But an UPDATE statement is one statement, and it is executed as one statement that affects a whole table (or a part of it if you have a WHERE clause).
Each time the subquery is executed, the count should be one less, but it doesn't work.
You are expecting the UPDATE to be executed with one evaluation of the subquery per row. But it is one statement: it is first evaluated - for all affected rows - and only then are the rows changed (updated). (A DBMS may do it otherwise, but the result must be as if it had been done this way.)
The result is that Id is still 0 in every record.
That's the correct and expected behaviour of this statement when all rows have the same value 0 before execution: for every row, the COUNT(*) of rows with a different id is 0.
I have used this before on other DBMS with success.
My "wild" guess is that you have used it in MySQL. (Correction/Update: my guess was wrong, this syntax for Update is not valid for MySQL, apparently the query was working "correctly" in Firebird). The UPDATE does not work in the standard way in that DBMS. It works - as you have learned - row by row, not with the full table.
I'm using SQL Server 2008 here.
This DBMS handles UPDATE correctly. You can write a different UPDATE statement that would have the wanted results or, even better, use an autogenerated IDENTITY column, as others have advised.
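For example, a minimal sketch of such a rewritten UPDATE for SQL Server (my own illustration, not code from the original answer): ROW_NUMBER() computes all the sequence numbers before any row is changed, so every row receives a distinct value.
-- Number every row in one pass; ORDER BY (SELECT NULL) means any order is fine.
WITH numbered AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
    FROM mytable
)
UPDATE numbered
SET id = rn;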

The SQL is updating every row with the number of records whose ID differs from the current row's ID. As all the rows' IDs equal 0, there are no rows that are not equal to 0, so the count is 0 and every row is simply set to 0 again.
Try looking at this answer here:
Adding an identity to an existing column
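In short, the linked approach is to let SQL Server number the rows for you with an IDENTITY column instead of computing the values yourself; a hedged sketch, with the table and column names assumed:
-- Add an auto-numbered column; SQL Server fills it in for all existing rows.
ALTER TABLE mytable ADD NewId int IDENTITY(1,1) NOT NULL;
-- Drop the old all-zero column, then put the new one in its place.
ALTER TABLE mytable DROP COLUMN Id;
EXEC sp_rename 'mytable.NewId', 'Id', 'COLUMN';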

Related

Counting results in SQLite, given query with functions

As you may (or may not) already know, SQLite does not provide information about the total number of results from a query. One has to wrap the query in SELECT count(*) FROM (original query); in order to get the row count.
This worked perfectly fine for me, until one of the users created a custom SQL function (you can define your own functions in SQLite) that does an INSERT into another, unrelated table. Then he executes the query:
SELECT customFunction() FROM primaryTable WHERE primaryKeyColumnId = 1;
The query always returns exactly one row, that is certain. It turns out that customFunction() was called twice (and inserted 2 rows into that other table), because my application ran his query as usual and then ran count(*) on that query as a follow-up.
How to approach this problem? How to execute only the original query and still have a row count from SQLite?
I'm using the SQLite (3.13.0) C API.
You either have to remove such function calls from the query, or you cannot get the row count without actually stepping through all the result rows.
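A sketch of the first option: when building the follow-up count query, keep only the FROM/WHERE part and drop the side-effecting select list, so customFunction() never runs a second time (this assumes the row count does not depend on the selected expressions):
-- Count without re-evaluating customFunction():
SELECT count(*) FROM primaryTable WHERE primaryKeyColumnId = 1;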

Can SQL return different results for two runs of the same query using ORDER BY?

I have the following table:
CREATE TABLE dbo.TestSort
(
Id int NOT NULL IDENTITY (1, 1),
Value int NOT NULL
)
The Value column could (and is expected to) contain duplicates.
Let's also assume there are already 1000 rows in the table.
I am trying to prove a point about unstable sorting.
Given this query that returns a 'page' of 10 results from the first 1000 inserted results:
SELECT TOP 10 * FROM TestSort WHERE Id <= 1000 ORDER BY Value
My intuition tells me that two runs of this query could return different rows if the Value column contains repeated values.
I'm basing this on the facts that:
the sort is not stable
if new rows are inserted in the table between the two runs of the query, it could possibly trigger a re-balancing of B-trees (the Value column may be indexed or not)
EDIT: For completeness: I assume rows never change once inserted, and are never deleted.
In contrast, a query with stable sort (ordering also by Id) should always return the same results, since IDs are unique:
SELECT TOP 10 * FROM TestSort WHERE Id <= 1000 ORDER BY Value, Id
The question is: Is my intuition correct? If yes, can you provide an actual example of operations that would produce different results (at least "on your machine")? You could modify the query, add indexes on the Value column, etc. If not, then why?
I don't care about the exact query, but about the principle.
I am using MS SQL Server (2014), but am equally satisfied with answers for any SQL database.
Your intuition is correct. In SQL, the sort for order by is not stable. So, if you have ties, they can be returned in any order. And, the order can change from one run to another.
The documentation sort of explains this:
Using OFFSET and FETCH as a paging solution requires running the query one time for each "page" of data returned to the client application. For example, to return the results of a query in 10-row increments, you must execute the query one time to return rows 1 to 10 and then run the query again to return rows 11 to 20 and so on. Each query is independent and not related to each other in any way. This means that, unlike using a cursor in which the query is executed once and state is maintained on the server, the client application is responsible for tracking state. To achieve stable results between query requests using OFFSET and FETCH, the following conditions must be met:
The underlying data that is used by the query must not change. That is, either the rows touched by the query are not updated or all requests for pages from the query are executed in a single transaction using either snapshot or serializable transaction isolation. For more information about these transaction isolation levels, see SET TRANSACTION ISOLATION LEVEL (Transact-SQL).
The ORDER BY clause contains a column or combination of columns that are guaranteed to be unique.
Although this specifically refers to offset/fetch, it clearly applies to running the query multiple times without those clauses.
If you have ties in the ordering, the ORDER BY is not stable.
LiveDemo:
CREATE TABLE #TestSort
(
Id INT NOT NULL IDENTITY (1, 1) PRIMARY KEY,
Value INT NOT NULL
) ;
DECLARE @c INT = 0;
WHILE @c < 100000
BEGIN
INSERT INTO #TestSort(Value)
VALUES (2);
SET @c += 1;
END
Example:
SELECT TOP 10 *
FROM #TestSort
ORDER BY Value
OPTION (MAXDOP 4);
DBCC DROPCLEANBUFFERS; -- run to clear cache
SELECT TOP 10 *
FROM #TestSort
ORDER BY Value
OPTION (MAXDOP 4);
The point is that I force the query optimizer to use a parallel plan, so there is no guarantee that it will read the data sequentially the way a clustered index scan probably would when no parallelism is involved.
You cannot be sure how the query optimizer will read the data unless you explicitly force a specific order using ORDER BY Value, Id.
For more info read No Seatbelt - Expecting Order without ORDER BY.
I think this post will answer your question:
Is SQL order by clause guaranteed to be stable (by Standards)
The result is the same every time only in a single-threaded environment. Once multi-threading is used, you can't guarantee it.

Behaviour of SQL update using one-to-many join

Imagine I have two tables, t1 and t2. Table t1 has two fields: a, which contains unique values, and another field called value. Table t2 has a field called b that does not contain unique values, and a field also called value.
Now, if I use the following update query (this is using MS Access btw):
UPDATE t1
INNER JOIN t2 ON t1.a=t2.b
SET t1.value=t2.value
If I have the following data
t1                    t2
 a  | value            b  | value
------------          ------------
'm' | 0.0             'm' | 1.1
                      'm' | 0.2
and run the query, what value ends up in t1.value? I ran some tests but couldn't find consistent behaviour, so I'm guessing it might just be undefined. Or is this kind of update query something that just shouldn't be done? There is a long, boring story about why I've had to do it this way, but it's irrelevant to the technical nature of my enquiry.
This is known as a non-deterministic query. It means exactly what you have found: you can run the query multiple times with no changes to the query or underlying data and get different results.
In practice, the value will be updated with the last record encountered, so in your case it will be updated twice, and the first update will be overwritten by the last. What you have absolutely no control over is the order in which the SQL engine accesses the records: it will access them in whatever order it deems fit. This could be simply a clustered index scan from the beginning, or it could use other indexes and access the clustered index in a different order. You have no way of knowing. It is quite likely that running the update multiple times would yield the same result, because with no changes to the data the SQL optimiser will use the same query plan, but again there is no guarantee, so you should not rely on a non-deterministic query to produce deterministic results.
EDIT
To update the value in T1 to the maximum corresponding value in T2, you can use DMax (note that because a holds text values, the criteria string needs quote delimiters):
UPDATE T1
SET Value = DMax("Value", "T2", "b='" & T1.a & "'");
When you execute the query as you’ve indicated, the “value” that ends up in “t1” for the row ‘m’ will be, effectively, random, due to the fact that “t2” has multiple rows for the identity value ‘m’.
Unless you specifically specify that you want the maximum (Max function), minimum (Min function) or some other aggregate of the collection of rows with the identity ‘m’, the database has no ability to make a defined choice, and as such returns whatever value it first comes across, hence the inconsistent behaviour.
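For engines that do allow aggregates in an UPDATE (Access itself generally does not, which is why the DMax answer above uses a domain function), the same idea looks like this hedged sketch in portable SQL:
-- Name the aggregate you want (here MAX) instead of letting the engine pick;
-- the EXISTS clause keeps unmatched rows from being set to NULL.
UPDATE t1
SET value = (SELECT MAX(t2.value) FROM t2 WHERE t2.b = t1.a)
WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.a);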
Hope this helps.

SQL error ORA-01427

I am trying to update one of the columns in my table by collecting the values from another table in the data store, using this query:
UPDATE tablename PT
SET DID = (select distinct(did) from datastore.get_dept_alias
where upper(ltrim(rtrim(deptalias))) = upper(ltrim(rtrim(PT."Dept Descr")))
AND cid = PT.CID)
Note: Both the column names in the table are the same as entered
I get an ORA-01427 error. Any idea about the issue?
I have been trying to understand the other posts about this ORA error.
As you can see here
SQL Error: ORA-01427: single-row subquery returns more than one row
This means that your sub-query
select distinct(did) from datastore.get_dept_alias
where upper(ltrim(rtrim(deptalias))) = upper(ltrim(rtrim(PT."Dept Descr")))
AND cid = PT.CID
is returning more than one row.
So, are you sure that distinct(did) is unique? It looks like it's not. I don't recommend using where rownum = 1, because you don't know which of the values will be used for the update, unless you use ORDER BY.
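For example, a hedged sketch that makes the pick deterministic with an aggregate instead of rownum:
-- MIN collapses the duplicates to a single value, so the subquery can no
-- longer raise ORA-01427 (assumes the smallest did is an acceptable choice):
UPDATE tablename PT
SET DID = (select MIN(did) from datastore.get_dept_alias
where upper(ltrim(rtrim(deptalias))) = upper(ltrim(rtrim(PT."Dept Descr")))
AND cid = PT.CID)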
You're getting this error because your select statement can return more than one result. You cannot update a single cell with a query that can potentially return more than one result.
A common approach in many SQL dialects is to use something like TOP 1 to assure the engine that you will return only one result. Note that you have to do this even if you know the query will only return one result: just because you know it doesn't mean that the engine knows it. The engine also has to protect you from future possibilities, not just the data as it stands right now.
Update:
I noticed you updated your question to Oracle. So in that case you could limit the subquery to a single result using the where rownum = 1 clause. As the other answer pointed out, you'd have to add further logic to ensure that the one row coming back is the right one. If you don't know which one is the right one, solve that first.
The thought also occurs to me that you might be misunderstanding what DISTINCT does. It ensures that the returned results are unique - but there can still be multiple unique results.

Multiple rows returned by a subquery used as an expression

I have the following query, which runs perfectly well on both Oracle and SQL Server 2008; however, it doesn't seem to run on PostgreSQL. The query is intended to return a count of records that match the given criteria. Can someone explain the reason for this and also offer a way to modify the query so that it produces the expected result?
Query:
select count(*)
from tma_notices
where TNOT_NOTICE_TYPE ='0400'
and TNOT_NOTICE_STATUS = 'OK'
and tnot_notice_id >=
(
select NOTICE_NUM_AT_MIDNIGHT
from RWOL_COUNTER_QUERY_TYPE
where QUERY_TYPE = 'START_NOTICES_TODAY'
and USER_NAME = 'PUBLIC'
)
UPDATE: This error was caused by unforeseen duplicate records in the PostgreSQL database. Where the duplicates came from needs to be investigated.
It's pretty clear that the subquery can return a set of rows, and the condition tnot_notice_id >= isn't valid when compared with a set of rows rather than a single value.
Are you sure that a unique record exists that satisfies your where conditions?
If you want to avoid that behaviour, I suggest using tnot_notice_id >= ALL ( subquery ).
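Applied to the query above, that is a one-word change (a sketch; it counts notices at or above the largest value the subquery returns):
select count(*)
from tma_notices
where TNOT_NOTICE_TYPE ='0400'
and TNOT_NOTICE_STATUS = 'OK'
and tnot_notice_id >= ALL  -- true only if >= every value the subquery returns
(
select NOTICE_NUM_AT_MIDNIGHT
from RWOL_COUNTER_QUERY_TYPE
where QUERY_TYPE = 'START_NOTICES_TODAY'
and USER_NAME = 'PUBLIC'
);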