Update a table using the NEWID() function - sql

CREATE TABLE Products(Id INT, Name CHAR(100), DefaultImageId INT NULL);
INSERT INTO Products (Id, Name, DefaultImageId) VALUES(1, 'A', NULL);
INSERT INTO Products (Id, Name, DefaultImageId) VALUES(2, 'A', NULL);
INSERT INTO Products (Id, Name, DefaultImageId) VALUES(3, 'A', NULL);
INSERT INTO Products (Id, Name, DefaultImageId) VALUES(4, 'A', NULL);
INSERT INTO Products (Id, Name, DefaultImageId) VALUES(5, 'A', NULL);
INSERT INTO Products (Id, Name, DefaultImageId) VALUES(1, 'B', NULL);
INSERT INTO Products (Id, Name, DefaultImageId) VALUES(2, 'B', NULL);
INSERT INTO Products (Id, Name, DefaultImageId) VALUES(3, 'B', NULL);
In general, I would update a table randomly with a script like the following.
update a
set DefaultImageId=1
from Products as a
where name = 'A'
and id in (
select top 2 id
from Products as b
where a.name = b.name
order by newid()
)
However, I have run into an issue: the statement sometimes updates more or fewer than 2 rows. To debug it, I executed the SELECT below many times.
The result does not always contain exactly two records. If I remove the order by newid(), the number of rows is correct, so the problem seems to be related to newid(). How can I solve this problem? Thanks
select *
from Products as a
where name = 'A'
and id in (
select top 2 id
from Products as b
where a.name = b.name
order by newid()
)

You can try this:
UPDATE U SET DefaultImageId = 1
FROM(
SELECT TOP 2 * FROM dbo.Products WHERE Name = 'A' ORDER BY NEWID()
) AS U
It would be even better to filter out rows that already have DefaultImageId = 1 in the inner query. Note that in this case the inner query might return fewer than 2 rows. Because DefaultImageId is nullable, the filter must also accept NULL; a bare DefaultImageId <> 1 never matches a NULL row.
UPDATE U SET DefaultImageId = 1
FROM(
SELECT TOP 2 * FROM dbo.Products WHERE Name = 'A' AND (DefaultImageId IS NULL OR DefaultImageId <> 1) ORDER BY NEWID()
) AS U

I think you're just looking for a simple JOIN as
UPDATE A
SET DefaultImageId = 1
FROM Products A
JOIN
(
SELECT TOP 2 Id, Name
FROM Products B
WHERE Name = 'A' -- restrict the random pick to the rows being updated
ORDER BY NEWID()
) B ON A.Id = B.Id AND A.Name = B.Name;

SQL SERVER BUG:
I think you just hit the following SQL Server bug, which has been under review since 2018.
https://feedback.azure.com/forums/908035-sql-server/suggestions/33272404-cte-containing-top-and-newid-producing-incorrect
WORK AROUND:
The only thing that seems to work, based on my testing, is eliminating the outside reference from the subquery; then things work as expected. You can eliminate the outside reference by doing any of the following, as suggested by others here.
Option 1: Keep the query as is, but hard-code the filter criterion inside the subquery instead of using an outside reference:
SELECT ...
FROM t1
WHERE
t1.id IN (SELECT TOP 2 id FROM t1 WHERE name = 'A' ORDER BY newid()) -- subquery with NO outside reference
AND t1.name = 'A'
....
Option 2: Use a JOIN with proper ON matching criteria instead of an outside reference inside the subquery:
SELECT ...
FROM t1
INNER JOIN (SELECT TOP 2 id, name FROM t1 WHERE name = 'A' ORDER BY newid()) t2 ON t1.id = t2.id AND t1.name = t2.name -- subquery with NO outside reference
WHERE
t1.name = 'A'
....
DETAILS:
Based on my testing, the SQL engine returns an unpredictable result when all of the following conditions hold:
An outside table is referenced inside the subquery (or through a CTE) as a filtering criterion.
NEWID() is used in the ORDER BY of a SELECT TOP statement. (It does NOT happen with other non-deterministic functions such as RAND(); it seems to be specific to NEWID().)
The SELECT TOP statement uses a number smaller than the number of records that exist in the table.
This seems to throw off the SQL engine. The result changes on every run: the number of records returned varies between zero and the maximum number that would have been returned if SELECT TOP was not used.
The query plan does not make sense at all. See the screenshot here. TOP N returns more rows than it is supposed to, and the id IN () predicate returns a random number of records (zero in this example). How could that be possible? I have no clue. Also note that the id IN () predicate is applied before the left table scan, which shouldn't happen. I think the unreliable result has something to do with that. We'll have to wait until the SQL Server team fixes the bug to be able to write the queries the way you wrote them.
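The idea behind the workaround (make the random pick with no reference to an outer query, then update by key) can be sketched outside SQL Server as well. The snippet below is only an illustration under stated assumptions: it uses Python's sqlite3 module as a stand-in database, so random() and LIMIT replace NEWID() and TOP, and it says nothing about how SQL Server plans the original query.

```python
import sqlite3

# SQLite stand-in for the Products table from the question (assumption: same columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (Id INTEGER, Name TEXT, DefaultImageId INTEGER)")
rows = [(i, "A", None) for i in range(1, 6)] + [(i, "B", None) for i in range(1, 4)]
conn.executemany("INSERT INTO Products VALUES (?, ?, ?)", rows)

# Step 1: materialise the random pick on its own, with no outer reference.
picked = conn.execute(
    "SELECT Id, Name FROM Products WHERE Name = ? ORDER BY random() LIMIT 2",
    ("A",),
).fetchall()

# Step 2: update exactly the rows whose (Id, Name) keys were picked.
conn.executemany(
    "UPDATE Products SET DefaultImageId = 1 WHERE Id = ? AND Name = ?", picked
)
conn.commit()

updated = conn.execute(
    "SELECT Id, Name FROM Products WHERE DefaultImageId = 1 ORDER BY Id"
).fetchall()
print(len(updated))  # 2
```

Because the pick is stored in a plain list before the update runs, the number of updated rows cannot drift between evaluations.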

Because you are using a correlated subquery, where the subquery refers to the outer query's name, you are getting a different result set on each run.
I would suggest specifying the name filter twice to avoid this problem.
select *
from Products as a
where name='A'
and id in
(select top 2 id
from Products as b
where b.name = 'A'
order by newid())
Similarly, change the update query:
update a
set DefaultImageId=1
from Products as a
where name = 'A'
and id in (
select top 2 id
from Products as b
where b.name = 'A'
order by newid()
)

Something like this
This query selects 2 rows at random.
with top_2_cte(id, [name]) as (
select top 2 id, [name]
from Products
where name='A'
order by newid())
select a.*
from Products a
join
top_2_cte ttc on a.Id=ttc.id
and a.[Name]=ttc.[name];
Update statement
;with top_2_cte(id, [name]) as (
select top 2 id, [name]
from Products
where name='A'
order by newid())
update a
set DefaultImageId=1
from Products a
join
top_2_cte ttc on a.Id=ttc.id
and a.[Name]=ttc.[name];

Related

How To Query IDs that have different Patient Names

First, I understand that I should have a primary key on the Patient ID value. A project was performed for ID conversions that did not go very well, so now I need to find all Patient IDs that have different Patient Names. There are 4 different DBs/tables that contain the info. For now I have selected them into a temp DB, because I actually need all PIDs to be distinct across those DBs. Our application has tools to keep that synchronized, but due to some bad SQL work, I need to synchronize all the data again.
PID NAME
1234 Johnson
1234 Johnson
4567 Jones
4567 Alexander
I am trying to write a query that returns PID 4567 together with the NAME values Jones and Alexander.
You need to use a self-join. Please try this:
create table #temp
(id int, name varchar(30))
insert into #temp values (1,'johnson')
insert into #temp values (1,'johnson')
insert into #temp values (2,'james')
insert into #temp values (2,'Alex')
SELECT * FROM #temp WHERE id IN (
SELECT a.id FROM #temp a
JOIN #temp b on b.id = a.id AND b.name <> a.name
)
SELECT Min(PID), Name FROM [Table]
GROUP BY Name
HAVING Count(PID) = 1
SELECT PID,NAME
FROM TABLE
GROUP BY PID,NAME
HAVING COUNT(*) =1
I think this will do it
select p.pid, max(name), min(name), count(*) as cnt
from p
group by pid
having max(name) <> min(name)
or
select p1.pid, p1.name, p2.name
from p p1
join p p2
on p1.pid = p2.pid
and p1.name < p2.name
order by p1.pid, p1.name, p2.name
There are many ways to do this, some more optimized than others depending on which RDBMS you are using. Typically, though, this is a 2-step operation.
1) Find all of the PIDs that have more than 1 Name associated with it
2) Relate back to get the rest of the data you are seeking.
CREATE TABLE #T (
PID INT
,Name VARCHAR(25)
)
INSERT INTO #T (PID,Name) VALUES (1234,'Johnson'),(1234,'Johnson'),(4567,'Jones'),(4567,'Alexander')
SELECT
t2.*
FROM
(
SELECT
PID
FROM
#T t1
GROUP BY
PID
HAVING COUNT(DISTINCT Name) > 1
) dupes
INNER JOIN #T t2
ON dupes.PID = t2.PID
It is important when using a method such as the join or IN above to count DISTINCT names, because simply counting * or name also flags a PID whose rows all repeat the same name (such as 1234 above), not just PIDs with different names.
If you only want the duplicates and not all of the combinations, ROW_NUMBER() or something similar can help you get to the answer a little more efficiently. You can also look for the existence of a non-identical record, like so:
SELECT DISTINCT t1.PID, t1.Name
FROM
#T t1
WHERE
EXISTS (SELECT 1 FROM #T t2 WHERE t1.PID = t2.PID AND t1.Name <> t2.Name)
This way could perform faster for you depending on data sets etc. I would tend to stay away from solutions that use IN for cases like these.
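To make the difference between the counting variants concrete, here is a small sketch using Python's sqlite3 module as a stand-in database (an assumption; the thread itself targets SQL Server, and the table name t is made up for the example). It runs the COUNT(*), COUNT(DISTINCT name) and EXISTS forms against the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (pid INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO t (pid, name) VALUES (?, ?)",
    [(1234, "Johnson"), (1234, "Johnson"), (4567, "Jones"), (4567, "Alexander")],
)

# COUNT(*) > 1 also flags PID 1234, whose two rows carry the same name.
by_row_count = [r[0] for r in conn.execute(
    "SELECT pid FROM t GROUP BY pid HAVING COUNT(*) > 1 ORDER BY pid")]

# COUNT(DISTINCT name) > 1 flags only PIDs with genuinely different names.
by_distinct_names = [r[0] for r in conn.execute(
    "SELECT pid FROM t GROUP BY pid HAVING COUNT(DISTINCT name) > 1 ORDER BY pid")]

# EXISTS form: keep a row if another row has the same PID but a different name.
conflicts = conn.execute(
    """
    SELECT DISTINCT t1.pid, t1.name
    FROM t AS t1
    WHERE EXISTS (SELECT 1 FROM t AS t2
                  WHERE t2.pid = t1.pid AND t2.name <> t1.name)
    ORDER BY t1.pid, t1.name
    """
).fetchall()

print(by_row_count)       # [1234, 4567]
print(by_distinct_names)  # [4567]
print(conflicts)          # [(4567, 'Alexander'), (4567, 'Jones')]
```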

SQL update statement with different table in from clause

By accident I noticed that the following query is actually valid:
UPDATE bikes
SET price = NULL
FROM inserted
WHERE inserted.owner_id = 123456
This is part of a trigger where someone forgot to join the original table to the inserted table. The result is that when the trigger is executed, all prices are set to NULL.
The correct SQL statement is this:
UPDATE bikes
SET price = NULL
FROM inserted
INNER JOIN bikes ON bikes.id=inserted.id
WHERE inserted.owner_id = 123456
How/why is this first statement valid?
Why wouldn't it be valid? SQL Server doesn't know what you're trying to do. It thinks you want to update all of the rows whenever some condition holds on another table. See the last update below.
SETUP
declare @table table
(
id int,
name varchar(10)
)
declare @itable table
(
id int,
name varchar(10)
)
insert into @table (id, name)
select 1,'abc' union
select 2,'def' union
select 3,'ghi' union
select 4,'jkl' union
select 5,'mno' union
select 6,'pqr'
insert into @itable (id, name)
select 1,'abc' union
select 2,'def' union
select 3,'ghi' union
select 4,'jkl' union
select 5,'mno' union
select 6,'pqr'
All names on @table will change to zzz
update @table
set name = 'zzz'
from @itable i
where i.id = 1
select * from @itable
select * from @table
All names where id = 1 on @table become yyy
update t
set name = 'yyy'
from @itable i
inner join @table t on i.id = t.id
where i.id = 1
select * from @itable
select * from @table
This will NOT update anything
update @table
set name = 'aaa'
from @itable i
where i.id = 133
select * from @itable
select * from @table
The first statement does not work as expected because it is missing the INNER JOIN between the bikes and inserted tables. Without it, SQL Server updates all rows, because every row qualifies for the update whenever inserted contains a row with owner_id = 123456.
You can reproduce this outside of the trigger in T-SQL like this:
update bikes set price =null
from SomeOtherTable
where SomeOtherTable.SomeColumn = 'some_value_that_exists'
This is a syntactically valid statement in SQL Server. If the intention is to update the bikes table based on the existence of a row in some unrelated table that can't be joined because the 2 tables aren't related, then this is how you would do it. But that is not your requirement, which is why it updates all records instead of only those that match bikes.id. In programming terms this is called a logic bug.
The inner join makes it more restrictive and forces it to update only those rows that match the join condition between the 2 tables (the bikes.id=inserted.id comparison).
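The "existence check" reading of the un-joined statement can be sketched as follows. This is an illustration only, using Python's sqlite3 module as a stand-in: the un-joined T-SQL form is rewritten here as an explicit EXISTS, and the joined form as an IN on the key, which is my assumption of an equivalent and not the trigger's actual code. The bikes and inserted tables and their rows are made up for the example.

```python
import sqlite3

def fresh_db():
    """Build a small stand-in for the bikes table and the trigger's inserted table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bikes (id INTEGER, owner_id INTEGER, price REAL)")
    conn.execute("CREATE TABLE inserted (id INTEGER, owner_id INTEGER)")
    conn.executemany(
        "INSERT INTO bikes VALUES (?, ?, ?)",
        [(1, 123456, 100.0), (2, 999, 200.0), (3, 999, 300.0)],
    )
    conn.execute("INSERT INTO inserted VALUES (1, 123456)")
    return conn

def prices(conn):
    return [r[0] for r in conn.execute("SELECT price FROM bikes ORDER BY id")]

# No link between bikes and inserted: the condition is true or false for the
# whole statement, so every bikes row is updated once any inserted row matches.
conn = fresh_db()
conn.execute(
    "UPDATE bikes SET price = NULL "
    "WHERE EXISTS (SELECT 1 FROM inserted WHERE inserted.owner_id = 123456)"
)
unjoined = prices(conn)

# Linking on the key restricts the update to the matching bikes rows.
conn = fresh_db()
conn.execute(
    "UPDATE bikes SET price = NULL "
    "WHERE id IN (SELECT id FROM inserted WHERE owner_id = 123456)"
)
joined = prices(conn)

print(unjoined)  # [None, None, None]
print(joined)    # [None, 200.0, 300.0]
```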
In simple terms, the FROM clause is optional. Consider the query below:
update table
set id=10
This has one table right after the UPDATE keyword, so SQL Server just updates it.
Now consider the query below:
update table1
set id=40
from table2
What do you think SQL Server does? It updates all the rows, the same as the first query (provided table2 contains at least one row),
unless you refer to the target table in the FROM clause and join it, like below:
update t1
set t1.id=40
from
table1 t1
join
table2 t2
on t1.id=t2.id
Below is the FROM clause explanation from the UPDATE syntax documentation, stripped down to show only the relevant parts:
If the object being updated is the same as the object in the FROM clause and there is only one reference to the object in the FROM clause, an object alias may or may not be specified. If the object being updated appears more than one time in the FROM clause, one, and only one, reference to the object must not specify a table alias. All other references to the object in the FROM clause must include an object alias
As long as the above rules are satisfied (as in your case), SQL Server will happily update the table named immediately after UPDATE.

Find Modified/New/Deleted Records Between Two Tables

I want to find new, modified and deleted records in one table (tableA) by comparing it to another table (tableB). Both tables have the same schema and a unique ID field.
In my situation, tableA is originally the same as tableB but it has been edited by some external organisation and once they have done their edits, they send the table back via ZIP file, and we re-populate (truncate and insert) that data to tableA. So I want to find out what records have changed in tableA. I am using SQL Server 2012.
I can get new and modified records with the "except" keyword:
select * from tableA
except
select * from tableB
(Let's call the above results ResultsA)
I can also get deleted and modified records:
select * from tableB
except
select * from tableA
(Let's call the above results ResultsB)
The problem is, both ResultsA and ResultsB have the same records that have been modified/edited. So the modified/edited records are doubled up. I can use inner join or intersect on ResultsA and ResultsB to get just the modified records (call this results ResultsC). But then I will need to use join/except again between ResultsA and ResultsC to get just the new records, and join/except again between ResultsB and ResultsC to get just the deleted records... I tried this and this but they are not working for me.
Obviously this is not good. Are there any elegant and simpler ways to find out the records that have been deleted, modified or added in tableA compared to tableB?
How about:
-- DELETED
SELECT B.*, 'DELETED' AS 'CHANGE_TYPE'
FROM TableB B
LEFT JOIN TableA A ON B.PK_ID = A.PK_ID
WHERE A.PK_ID IS NULL
UNION
-- NEW
SELECT A.*, 'NEW' AS 'CHANGE_TYPE'
FROM TableA A
LEFT JOIN TableB B ON B.PK_ID = A.PK_ID
WHERE B.PK_ID IS NULL
UNION
-- MODIFIED
SELECT B.*, 'MODIFIED' AS 'CHANGE_TYPE'
FROM (
SELECT * FROM TableA
EXCEPT
SELECT * FROM TableB
) S1
INNER JOIN TableB B ON S1.PK_ID = B.PK_ID;
Not exactly elegant, but it works.
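The same three-way classification can be tried quickly on toy data. The sketch below uses Python's sqlite3 module as a stand-in for SQL Server, with hypothetical tables table_a (the edited copy) and table_b (the original) that have an id key and a single number column; the null-safe IS NOT comparison is SQLite syntax and is an assumption of this example, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (id INTEGER PRIMARY KEY, number INTEGER)")  # edited copy
conn.execute("CREATE TABLE table_b (id INTEGER PRIMARY KEY, number INTEGER)")  # original
conn.executemany("INSERT INTO table_a VALUES (?, ?)", [(1, 11), (2, 20), (4, 40), (5, 50)])
conn.executemany("INSERT INTO table_b VALUES (?, ?)", [(1, 10), (2, 20), (3, 30), (4, 40)])

changes = conn.execute(
    """
    -- in the original but missing from the edited copy
    SELECT b.id AS id, 'DELETED' AS change_type
    FROM table_b AS b LEFT JOIN table_a AS a ON a.id = b.id
    WHERE a.id IS NULL
    UNION ALL
    -- in the edited copy but not in the original
    SELECT a.id, 'NEW'
    FROM table_a AS a LEFT JOIN table_b AS b ON b.id = a.id
    WHERE b.id IS NULL
    UNION ALL
    -- present in both, but the payload column differs
    SELECT a.id, 'MODIFIED'
    FROM table_a AS a JOIN table_b AS b ON b.id = a.id
    WHERE a.number IS NOT b.number
    ORDER BY id
    """
).fetchall()

print(changes)  # [(1, 'MODIFIED'), (3, 'DELETED'), (5, 'NEW')]
```

With more payload columns, the MODIFIED branch needs one comparison per column, which is where the EXCEPT-based form in the answer is more convenient.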
Based on what I understood, I came up with the following solution.
DECLARE @tableA TABLE (ID INT, Number INT)
DECLARE @tableB TABLE (ID INT, Number INT)
INSERT INTO @tableA VALUES
(1,10),
(2,20),
(3,30),
(4,40)
INSERT INTO @tableB VALUES
(1,11),
(2,20),
(4,40),
(5,50)
SELECT *,'Modified or deleted' as 'Status' FROM
(
select * from @tableA
except
select * from @tableB
)a WHERE ID NOT IN
(
select ID from @tableB
except
select ID from @tableA
)
UNION
SELECT *,'New' as 'Status' FROM
(
select * from @tableB
except
select * from @tableA
)b WHERE ID NOT IN
(
SELECT ID FROM
(
select * from @tableA
except
select * from @tableB
)a WHERE ID NOT IN
(
select ID from @tableB
except
select ID from @tableA
)
)
You can use the OUTPUT clause:
Returns information from, or expressions based on, each row affected by an INSERT, UPDATE, or DELETE statement. These results can be returned to the processing application for use in such things as confirmation messages, archiving, and other such application requirements. Alternatively, results can be inserted into a table or table variable.
See the following; sorry, I don't have practical code for you. But note that the SQL OUTPUT clause can be used to return any value from the ‘inserted’ and ‘deleted’ tables (new value and old value) when doing an insert or update. Follow this for more info.
declare @DBOrderItem table
(
OrderItemGuid UniqueIdentifier default newid(),
Name VarChar(100)
);
declare @PayloadOrderItem table
(
OrderItemGuid UniqueIdentifier default newid(),
Name VarChar(100)
);
insert into @DBOrderItem (Name) values ('Phone');
insert into @DBOrderItem (Name) values ('Laptop');
insert into @PayloadOrderItem
select top 1 * from @DBOrderItem;
insert into @PayloadOrderItem (Name) values ('Tablet');
select doi.OrderItemGuid,
doi.Name,
case when poi.OrderItemGuid is null then 'Delete' else 'Update' end ActionType
from @DBOrderItem doi
left join @PayloadOrderItem poi on doi.OrderItemGuid = poi.OrderItemGuid
union
select poi.OrderItemGuid,
poi.Name,
'Add' ActionType
from @PayloadOrderItem poi
left join @DBOrderItem doi on doi.OrderItemGuid = poi.OrderItemGuid
where doi.OrderItemGuid is null;
Another solution that works quite efficiently is a FULL OUTER JOIN filtered with WHERE NOT EXISTS over an INTERSECT of the two rows. It's very compact.
SELECT
IsNull(tableB.ID,tableA.ID) as 'ID',
IsNull(tableB.Number,tableA.Number) as 'Number',
'Action' = CASE
WHEN tableB.ID IS NULL THEN 'Deleted'
WHEN tableA.ID IS NULL THEN 'Created'
ELSE 'Updated'
END
FROM tableA
FULL OUTER JOIN tableB
ON tableB.ID = tableA.ID
WHERE
NOT EXISTS (SELECT tableB.* INTERSECT SELECT tableA.*)
This keeps the table scans down to a minimum, and provides detection of new, deleted and changed records.
I put all three from here into a fiddle, and it's surprising how differently they all compile.
http://sqlfiddle.com/#!6/b1a5a/5
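This one also works without a primary key and is a bit more elegant (in my opinion!). Note that it is written in Oracle syntax; on SQL Server, use EXCEPT instead of MINUS and drop FROM DUAL.
WITH A AS (SELECT 1,2,3 FROM DUAL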
UNION ALL
SELECT 1,3,2 FROM DUAL
UNION ALL
SELECT 1,3,1 FROM DUAL),
B AS (SELECT 1,3,2 FROM DUAL
UNION ALL
SELECT 1,2,3 FROM DUAL
UNION ALL
SELECT 1,3,5 FROM DUAL
)
,
C AS
(SELECT * FROM A
MINUS
SELECT * FROM B
),
D AS( SELECT * FROM b
MINUS
SELECT * FROM A)
SELECT C.* ,'Deleted' FROM c
UNION ALL
SELECT D.* ,'Added' FROM D

Oracle SQL - Comparing Rows

I have a problem I'm working on with Oracle SQL that goes something like this.
TABLE
PurchaseID  CustID  Location
1           1       A
2           1       A
3           2       A
4           2       B
5           2       A
6           3       B
7           3       B
I'm interested in querying the Table to return all instances where the same customer makes a purchase in different locations. So, for the table above, I would want:
OUTPUT
PurchaseID  CustID  Location
3           2       A
4           2       B
5           2       A
Any ideas on how to accomplish this? I haven't been able to think of how to do it, and most of my ideas seem like they would be pretty clunky. The database I'm using has 1MM+ records, so I don't want it to run too slowly.
Any help would be appreciated. Thanks!
SELECT *
FROM YourTable T
WHERE CustId IN (SELECT CustId
FROM YourTable
GROUP BY CustId
HAVING MIN(Location) <> MAX(Location))
You should be able to use something similar to the following:
select purchaseid, custid, location
from yourtable
where custid in (select custid
from yourtable
group by custid
having count(distinct location) >1);
See SQL Fiddle with Demo.
The subquery in the WHERE clause returns every custid that has more than one distinct location.
In English:
Select a row if another row exists with the same customer and a different location.
In SQL:
SELECT *
FROM atable t
WHERE EXISTS (
SELECT *
FROM atable
WHERE CustID = t.CustID
AND Location <> t.Location
);
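As a quick check of the EXISTS formulation against the sample data in the question, here is a sketch using Python's sqlite3 module as a stand-in for Oracle (an assumption; the table name purchases and the snake_case column names are made up for the example). It also runs the COUNT(DISTINCT ...) variant so the two can be compared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (purchase_id INTEGER, cust_id INTEGER, location TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [(1, 1, "A"), (2, 1, "A"), (3, 2, "A"), (4, 2, "B"),
     (5, 2, "A"), (6, 3, "B"), (7, 3, "B")],
)

# Keep a row if another row exists for the same customer at a different location.
via_exists = conn.execute(
    """
    SELECT purchase_id, cust_id, location
    FROM purchases AS t
    WHERE EXISTS (SELECT 1 FROM purchases AS o
                  WHERE o.cust_id = t.cust_id AND o.location <> t.location)
    ORDER BY purchase_id
    """
).fetchall()

# Same question asked per customer: more than one distinct location.
via_having = conn.execute(
    """
    SELECT purchase_id, cust_id, location
    FROM purchases
    WHERE cust_id IN (SELECT cust_id FROM purchases
                      GROUP BY cust_id
                      HAVING COUNT(DISTINCT location) > 1)
    ORDER BY purchase_id
    """
).fetchall()

print(via_exists)  # [(3, 2, 'A'), (4, 2, 'B'), (5, 2, 'A')]
```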
Here's one approach using a sub-query
SELECT T1.PurchaseID
,T1.CustID
,T1.Location
FROM YourTable T1
INNER JOIN
(SELECT T2.CustID
,COUNT (DISTINCT T2.Location ) AS LocationCount
FROM YourTable T2
GROUP BY
T2.CustID
HAVING COUNT (DISTINCT T2.Location )>1
) SQ
ON SQ.CustID = T1.CustID
This should only require one full table scan.
create table test (PurchaseID number, CustID number, Location varchar2(1));
insert into test values (1,1,'A');
insert into test values (2,1,'A');
insert into test values (3,2,'A');
insert into test values (4,2,'B');
insert into test values (5,2,'A');
insert into test values (6,3,'B');
insert into test values (7,3,'B');
with repeatCustDiffLocations as (
select PurchaseID, custid, location, dense_rank () over (partition by custid order by location) r
from test)
select b.*
from repeatCustDiffLocations a, repeatCustDiffLocations b
where a.r > 1
and a.custid = b.custid;
This made the most sense to me, as I was trying to return the rows with the same values throughout the table, specifically for two columns, as shown in this stackoverflow answer here.
The answer to your problem in this format is:
SELECT DISTINCT a.*
FROM TEST a
INNER JOIN TEST b
ON a.CUSTOMERID = b.CUSTOMERID AND
a.LOCATION <> b.LOCATION;
However, the solution to a problem such as mine, with two columns having matching values in multiple rows, would be the following (in this instance it yields no results, because all PurchaseIDs are unique):
SELECT DISTINCT a.*
FROM TEST a
INNER JOIN TEST b
ON a.CUSTOMERID = b.CUSTOMERID AND
a.PURCHASEID = b.PURCHASEID AND
a.LOCATION <> b.LOCATION;
Although this doesn't return the results the question asks for, it shows that the query logic works:
SELECT DISTINCT a.*
FROM TEST a
INNER JOIN TEST b
ON a.CUSTOMERID = b.CUSTOMERID AND
a.PURCHASEID <> b.PURCHASEID AND
a.LOCATION = b.LOCATION;
If anyone wants to try in Oracle here is the table and values to insert:
CREATE TABLE TEST (
PurchaseID integer,
CustomerID integer,
Location varchar(1));
INSERT ALL
INTO TEST VALUES (1, 1, 'A')
INTO TEST VALUES (2, 1, 'A')
INTO TEST VALUES (3, 2, 'A')
INTO TEST VALUES (4, 2, 'B')
INTO TEST VALUES (5, 2, 'A')
INTO TEST VALUES (6, 3, 'B')
INTO TEST VALUES (7, 3, 'B')
SELECT * FROM DUAL;

SQL OUTER JOIN with NEWID to generate random data for each row

I want to generate some test data so for each row in a table I want to insert 10 random rows in another, see below:
INSERT INTO CarFeatures
(carID, featureID)
SELECT C.ID, F.ID
FROM dbo.Cars AS C
OUTER APPLY (
SELECT TOP 10 ID
FROM dbo.Features
ORDER BY NEWID()
) AS F
Only trouble is this returns the same values for each row. How do I order them randomly?
What I usually do is create a temp table and define the PK as a GUID with a default value of newid(). You'll need a CREATE TABLE statement for this, not SELECT INTO. Then I insert my records into it, order by the Id field, and select the top ten.
The problem is that the function you call appears to be evaluated only once, not once per row of Cars. How about something like this:
SELECT ID, NEWID() AS guid
INTO #temp
FROM dbo.Features
INSERT INTO CarFeatures (carID, featureID)
SELECT C.ID, F.ID
FROM dbo.Cars AS C
OUTER APPLY (
SELECT TOP 10 *
FROM #temp
ORDER BY 2
) AS F
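If the goal is only to generate test data, another option is to draw the sample per car on the client side, which sidesteps the question of how often the database evaluates the random function. The sketch below is an assumption-laden illustration and not a T-SQL fix: it uses Python's sqlite3 module as a stand-in database, made-up contents for Cars and Features, and random.sample with a fixed seed.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cars (ID INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE Features (ID INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE CarFeatures (carID INTEGER, featureID INTEGER)")
conn.executemany("INSERT INTO Cars (ID) VALUES (?)", [(i,) for i in range(1, 6)])
conn.executemany("INSERT INTO Features (ID) VALUES (?)", [(i,) for i in range(1, 31)])

rng = random.Random(0)  # fixed seed so the generated test data is reproducible
car_ids = [r[0] for r in conn.execute("SELECT ID FROM Cars ORDER BY ID")]
feature_ids = [r[0] for r in conn.execute("SELECT ID FROM Features ORDER BY ID")]

# Draw a fresh sample of 10 distinct features for every car.
pairs = [(car, feat) for car in car_ids for feat in rng.sample(feature_ids, 10)]
conn.executemany("INSERT INTO CarFeatures (carID, featureID) VALUES (?, ?)", pairs)
conn.commit()

# Count distinct features per car; every car should map to 10.
per_car = dict(conn.execute(
    "SELECT carID, COUNT(DISTINCT featureID) FROM CarFeatures GROUP BY carID"))
print(per_car)
```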