I have a table with 50 columns, let's say a, b, c, d, etc.
I want to update b from a, and c from the new value of b, so:
b = b + a, c = c + b, d = d + c, ... etc.
I don't want to write an update like
Update [table] set b = b + a, c = c + b + a, ... etc.
because for each column update I would have to write a huge calculated formula, which gets bigger and more complex with every column.
The table I want to update has about 50 million rows, so I think creating multiple update statements would be suicide performance-wise.
How should I handle this?
Do you think using variables is a good idea?
Thank you and sorry for my English.
Note: Sorry for not being clear about my question.
There are 50 new columns in an existing table.
The first new column, a, is calculated using a join with other tables.
The second new column, b, is calculated by adding the new value of a to a value computed from a join with other tables (the same tables as in the first update).
The same logic applies for calculating the values of all 50 columns.
UPDATED:
Thanks for updating your question. Definitely, use SUBQUERIES in your UPDATE statement to do most of the work. You can utilize indexes in your queries through SARGs in your ON/WHERE predicates.
Since each column's value depends on the previous column as well as on the other tables, something like the following will help:
CREATE TABLE #TABLE1 (ColA INT, ID INT IDENTITY(1,1) )
INSERT INTO #TABLE1 (ColA ) VALUES (1)
CREATE TABLE #TABLE2 (ColB INT, ID INT IDENTITY(1,1) )
INSERT INTO #TABLE2 (ColB) VALUES (4)
CREATE TABLE #TABLE3 (ColC INT, ID INT IDENTITY(1,1) )
INSERT INTO #TABLE3 (ColC) VALUES (10)
SELECT ColC + B.ColB AS ColC, ColC, B.ColB, B.ColA, B.ORIGINAL, B.ID
FROM #TABLE3 A
RIGHT OUTER JOIN (SELECT ColB + B.ColA AS ColB, ColA, ORIGINAL, B.ID
FROM #TABLE2 A
RIGHT OUTER JOIN (SELECT ColA + ColB AS ColA, ColB AS ORIGINAL, A.ID
FROM #TABLE1 A
INNER JOIN #TABLE2 B ON A.ID = B.ID) B ON A.ID = B.ID ) B ON A.ID = B.ID
Note that for simplicity I just assumed the tables share the same ID column, as I kept the inner subquery's ID in the select list. SQL evaluates from right to left (inner queries first), so keep that in mind.
It might be a long series of joins, but at least most of the repetitive logic will be handled inside the subqueries themselves. OUTER JOINs make sense, since you want to keep the value of the inner query and combine that newly computed value with the next outer subquery.
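To apply this to the asker's problem, the chained subquery can then feed an UPDATE ... FROM directly. A minimal sketch, where BigTable stands in for the 50-million-row target (that name and its ID key are assumptions, not from the post):
UPDATE T
SET T.b = X.ColB,
    T.c = X.ColC
FROM BigTable T
INNER JOIN (SELECT ColC + B.ColB AS ColC, B.ColB, B.ID
            FROM #TABLE3 A
            RIGHT OUTER JOIN (SELECT ColB + B.ColA AS ColB, B.ID
                              FROM #TABLE2 A
                              INNER JOIN #TABLE1 B ON A.ID = B.ID) B ON A.ID = B.ID) X ON T.ID = X.ID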
You can try updating from a derived SELECT like this. One caveat: within a single statement, every expression sees the original column values (all-at-once semantics), so the "new value of b" has to be expanded inline; each column ends up being a running total of the original values.
UPDATE T
SET b = T.newB
   ,c = T.newC
   ,d = T.newD
   -- ... and so on for the remaining columns
FROM (
    SELECT b, c, d
         ,newB = b + a
         ,newC = c + b + a      -- "new b" expanded inline
         ,newD = d + c + b + a  -- running total of the original values
         -- ...
    FROM yourtable
) AS T
Related
I have a table A with around 50,000 records and a table B with around 50,000 records as well.
sample data:
A B
1 1
2 2
3 null
4 null
I want to find records 3, 4 which are present in Table A but not in Table B.
I am using
select id from A where id NOT IN(select id from B)
I have also tried NOT EXISTS, but as there are a lot of records it still takes a lot of time.
select id from A where NOT EXISTS (select id from B where B.id = A.id)
A Left Outer Join can't be used to find the missing records, as the id is not present in Table B.
Is there any way to make the query run faster in Sybase itself? Or is shifting the database to MongoDB the solution?
I'm not sure why you don't want to use a LEFT JOIN; when I tried it, it returned your expected result.
Sample execution with the given data:
DECLARE @TableA TABLE (Id INT);
DECLARE @TableB TABLE (Id INT);
INSERT INTO @TableA (Id) VALUES (1), (2), (3), (4);
INSERT INTO @TableB (Id) VALUES (1), (2), (NULL), (NULL);
SELECT T1.Id
FROM @TableA T1
LEFT JOIN @TableB T2 ON T2.Id = T1.Id
WHERE T2.Id IS NULL
Result
3
4
From a performance perspective, try to avoid inverse predicates like NOT IN and NOT EXISTS where possible, because to evaluate them the DBMS has to run through all the available records and discard the matches.
LEFT JOIN / IS NULL and NOT EXISTS are semantically equivalent, while NOT IN is not. These methods differ in how they handle NULL values in the right-hand table.
Therefore, you should go for the LEFT JOIN to improve your SQL performance.
select A.id from A LEFT JOIN B
on A.id = B.id
where B.id is null
order by A.id;
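To see why NOT IN is not equivalent: comparing against a list that contains a NULL makes the NOT IN predicate UNKNOWN for every candidate row. With the sample data above, where B holds two NULLs, that means:
-- returns 0 rows, because B.id contains NULLs:
select id from A where id NOT IN (select id from B)
-- NOT EXISTS and LEFT JOIN / IS NULL still correctly return 3 and 4.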
If I have a sql statement like this:
select *
from tableA a
inner join tableB b on dbo.fn_something(a.ColX) = b.ColY
if you assume there are 5 rows in tableA with the same value for ColX, will dbo.fn_something() be called with that value 5 times or just once?
Clearly this is a trivial example, but I'm interested for the purposes of thinking about performance in a more complex scenario.
UPDATE
Thanks @DStanley. Following on from your answer, I investigated further. Using SQL Profiler with the SP:StmtStarting event on the SQL below illustrates what happens, i.e. as you said: the function will be called once for each row in the join.
This has an extra join compared to the original question.
create table tableA
( id int )
create table tableB
( id_a int not null
, id_c int not null
)
create table tableC
( id int )
go
create function dbo.fn_something( @id int )
returns int
as
begin
return @id
end
go
-- add test data
-- 5 rows:
insert into tableA (id) values (1), (2), (3), (4), (5)
-- 5 rows:
insert into tableC (id) values (101), (102), (103), (104), (105)
-- 25 rows:
insert into tableB (id_a, id_c) select a.id, c.id from tableA a, tableC c
go
-- here dbo.fn_something() is called 25 times:
select *
from tableA a
inner join tableB b on a.id = b.id_a
inner join tableC c on c.id = dbo.fn_something(b.id_c)
-- here dbo.fn_something() is called just 5 times,
-- as the 'b.id_c < 102' happens to be applied first.
-- That's likely to depend on whether SQL thinks it's
-- faster to evaluate the '<' or the function.
select *
from tableA a
inner join tableB b on a.id = b.id_a
inner join tableC c on c.id = dbo.fn_something(b.id_c) and b.id_c < 102
go
drop table tableA ;
drop table tableB;
drop table tableC;
drop function dbo.fn_something;
go
It will be called for each row in a. I do not know of any optimization that would call the function just for unique inputs. If performance is an issue, you could create a temp table with the distinct input values and use those results in your join, but I would only do that if it were an issue - don't assume it's a problem and clutter your query unnecessarily.
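A sketch of that distinct-input workaround, reusing the names from the question (ColX/ColY as in the original join; #mapped is a made-up name):
SELECT DISTINCT a.ColX, dbo.fn_something(a.ColX) AS MappedColX
INTO #mapped
FROM tableA a

SELECT a.*, b.*
FROM tableA a
INNER JOIN #mapped m ON m.ColX = a.ColX
INNER JOIN tableB b ON b.ColY = m.MappedColX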
If you declare your function as schema bound, it can be run once for each unique input. This requires that the function be deterministic, always producing the same output for a given input.
CREATE FUNCTION dbo.fn_something (@id INT)
RETURNS INT
WITH SCHEMABINDING
AS
BEGIN
RETURN @id
END
GO
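If it helps while testing, you can ask SQL Server whether it classifies the function as deterministic (a quick check, not a guarantee of single evaluation):
SELECT OBJECTPROPERTY(OBJECT_ID('dbo.fn_something'), 'IsDeterministic') AS IsDeterministic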
I have a scenario with 3 tables (Table1, Table2, Table3).
Table1 contains data where each MEMBNO is unique.
I would like to JOIN to Table2 and Table3 to display results, but only have one row for each result.
I tried
SELECT A.MEMBNO,A.FIELD1,B.FIELD1,B.FIELD2,C.FIELD1
FROM Table1 A
INNER join Table2 B ON A.MEMBNO = B.MEMBNO
INNER join Table3 C ON A.MEMBNO = C.MEMBNO
but I get multiple results. If the MEMBNO is in Table2 twice and Table3 four times, I get 8 rows returned.
Is my JOIN correct, or is the only way to control this through the WHERE clause after the JOIN, controlling what is returned from Table2 and Table3? (i.e. does SQL "dumb"-join all the data and expect the WHERE clause to be the filter?)
Many thanks
What you are fighting with is the different relationships between the data. Table1 is the primary key table, which has your one row per MEMBNO. Table2/3 have more than one row for each MEMBNO. What you therefore need to think about is what data you actually want to see before you attempt the joins. The difference in cardinality is what causes your row duplication when the joins happen. If you want the data in Table2/3 to be squished into a single row, have a think about how that might look, i.e. do you want to sum the numbers from the different rows into a total? Do you want to take the maximum date? etc.
Best thing to do is give some data examples from each table and give an example result. More than happy to have a go if you add that info.
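For example, if the answer were "take the maximum of each field", a sketch of the pre-aggregation (the MAX choices are placeholders; swap in whatever aggregate fits your data):
SELECT A.MEMBNO, A.FIELD1, B.MaxField1, B.MaxField2, C.MaxField1
FROM Table1 A
INNER JOIN (SELECT MEMBNO, MAX(FIELD1) AS MaxField1, MAX(FIELD2) AS MaxField2
            FROM Table2 GROUP BY MEMBNO) B ON A.MEMBNO = B.MEMBNO
INNER JOIN (SELECT MEMBNO, MAX(FIELD1) AS MaxField1
            FROM Table3 GROUP BY MEMBNO) C ON A.MEMBNO = C.MEMBNO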
Since you are only concerned about MEMBNO, what if you use a DISTINCT MEMBNO from both Table2 and Table3?
Check the below example:
create table #t1
(
F1 int,
F2 int
)
Insert into #t1 values(1, 111)
Create table #t2
(
F1 int,
F2 int
)
Insert into #t2 values(1, 111)
Insert into #t2 values(1, 222)
Create table #t3
(
F1 int,
F2 int
)
Insert into #t3 values(1, 333)
Insert into #t3 values(1, 444)
SELECT a.*
FROM #t1 a left join (Select distinct f1 from #t2) b on a.F1 = b.f1
left join (Select distinct f1 from #t3) c on a.F1 = c.f1
where #t1, #t2, #t3 are Table1, Table2, Table3 respectively,
and F1 is your MEMBNO in all the tables.
You get multiple results because of using an inner join.
You should use a left or right join.
Let's say I have the following statement, and the inner join results in 3 rows where a.Id = b.Id, but each of the 3 rows has a different b.Value. Since only one row from tableA is being updated, which of the 3 values is used in the update?
UPDATE a
SET a.Value = b.Value
FROM tableA AS a
INNER JOIN tableB as b
ON a.Id = b.Id
I don't think there are rules for this case and you cannot depend on a particular outcome.
If you're after a specific row, say the latest one, you can use apply, like:
UPDATE a
SET a.Value = b.Value
FROM tableA AS a
CROSS APPLY
(
select top 1 *
from tableB as b
where b.id = a.id
order by
DateColumn desc
) as b
Usually what you end up with in this scenario is the first row that appears in the order of the physical index on the table. In actual practice, you should treat this as non-deterministic and include something that narrows your result to one row.
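If you want to be explicit about which row wins without APPLY, a ROW_NUMBER variant of the same idea works too (DateColumn is borrowed from Andomar's example above):
UPDATE a
SET a.Value = b.Value
FROM tableA AS a
INNER JOIN
(
    SELECT Id, Value,
           ROW_NUMBER() OVER (PARTITION BY Id ORDER BY DateColumn DESC) AS rn
    FROM tableB
) AS b
    ON b.Id = a.Id
   AND b.rn = 1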
Here is what I came up with using SQL Server 2008
--drop table #b
--drop table #a
select 1 as id, 2 as value
into #a
select 1 as id, 5 as value
into #b
insert into #b
select 1, 3
insert into #b
select 1, 6
select * from #a
select * from #b
UPDATE #a
SET #a.Value = #b.Value
FROM #a
INNER JOIN #b
ON #a.Id = #b.Id
It appears that it uses the top value of a basic select each time (row 1 of select * from #b). So, it possibly depends on indexing. However, I would not rely on the implementation set by SQL, as that has the possibility of changing. Instead, I would suggest using the solution presented by Andomar to make sure you know what value you are going to choose.
In short, do not trust the default implementation, create your own. But, this was an interesting academic question :)
The best option in my case for updating multiple records is to use a MERGE query (supported from SQL Server 2008 onwards); with this query you have complete control over what you are updating.
You can also use the OUTPUT clause to do further processing.
Example: without OUTPUT clause (update only)
;WITH cteB AS
( SELECT Id, Col1, Col2, Col3
FROM B WHERE Id > 10 ---- Select Multiple records
)
MERGE A
USING cteB
ON(A.Id = cteB.Id) -- Update condition
WHEN MATCHED THEN UPDATE
SET
A.Col1 = cteB.Col1, -- Note: the update condition, i.e. A.Id = cteB.Id, can't appear here again.
A.Col2 = cteB.Col2,
A.Col3 = cteB.Col3;
Example: with OUTPUT clause
CREATE TABLE #TempOutPutTable
(
PkId INT NOT NULL,
Col1 VARCHAR(50),
Col2 VARCHAR(50)
)
;WITH cteB AS
( SELECT Id, Col1, Col2, Col3
FROM B WHERE Id > 10
)
MERGE A
USING cteB
ON(A.Id = cteB.Id)
WHEN MATCHED THEN UPDATE
SET
A.Col1 = cteB.Col1,
A.Col2 = cteB.Col2,
A.Col3 = cteB.Col3
OUTPUT
INSERTED.Id, cteB.Col1, INSERTED.Col2 INTO #TempOutPutTable;
--Do what ever you want with the data in temporary table
SELECT * FROM #TempOutPutTable; -- you can check here which records are updated.
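Worth noting: unlike UPDATE ... FROM, MERGE refuses the ambiguity in the original question. If more than one source row matches the same target row, it fails with an error ("The MERGE statement attempted to UPDATE or DELETE the same row more than once"), so you must narrow the source to one row per key first, e.g.:
;WITH cteB AS
( SELECT Id, Col1, Col2, Col3,
         ROW_NUMBER() OVER (PARTITION BY Id ORDER BY Col1 DESC) AS rn -- choose the tie-breaker that decides which row wins
  FROM B WHERE Id > 10
)
MERGE A
USING (SELECT Id, Col1, Col2, Col3 FROM cteB WHERE rn = 1) AS src
ON(A.Id = src.Id)
WHEN MATCHED THEN UPDATE
SET
A.Col1 = src.Col1,
A.Col2 = src.Col2,
A.Col3 = src.Col3;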
Yes, I came up with a similar experiment to Justin Pihony:
IF OBJECT_ID('tempdb..#test') IS NOT NULL DROP TABLE #test ;
SELECT
1 AS Name, 0 AS value
INTO #test
IF OBJECT_ID('tempdb..#compare') IS NOT NULL DROP TABLE #compare ;
SELECT 1 AS name, 1 AS value
INTO #compare
INSERT INTO #compare
SELECT 1 AS name, 0 AS value;
SELECT * FROM #test
SELECT * FROM #compare
UPDATE t
SET t.value = c.value
FROM #test t
INNER JOIN #compare c
ON t.Name = c.name
It takes the topmost row from the comparison (right-side) table. You can reverse the #compare.value values to 0 and 1 and you'll get the reverse. I agree with the posters above... it's very strange that this operation does not throw an error message, as it is completely hidden that it IGNORES secondary values.
Let's say I have a database that looks like this:
tblA:
ID  Name  Sequence  tblBID
1   a     5         14
2   b     3         15
3   c     3         16
4   d     3         17
tblB:
ID  Group
14  1
15  1
16  2
17  3
I would like to sequence A so that the sequences go 1...n for each group of B.
So in this case, the sequences going down should be 1,2,1,1.
The ordering needs to be consistent with the current ordering, but there are no guarantees as to the current ordering.
I am not exactly a sql master and I am sure there is a fairly easy way to do this, but I really don't know the right route to take. Any hints?
If you are using SQL Server 2005 or higher, you can use a ranking function:
Select tblA.Id, tblA.Name
, Row_Number() Over ( Partition By tblB.[Group] Order By tblA.Id ) As Sequence
, tblA.tblBID
From tblA
Join tblB
On tblB.Id = tblA.tblBID
Row_Number ranking function.
Here's another solution that would work in SQL Server 2000 and prior.
Select A.Id, A.Name
, (Select Count(*)
From tblB As B1
Where B1.[Group] = B.[Group]
And B1.Id < B.ID) + 1 As Sequence
, A.tblBID
From tblA As A
Join tblB As B
On B.Id = A.tblBID
EDIT
Also want to make it clear that I want to actually update tblA to reflect the proper sequences.
In SQL Server, you can use their proprietary From clause in an Update statement like so:
Update tblA
Set Sequence = (
Select Count(*)
From tblB As B1
Where B1.[Group] = B.[Group]
And B1.Id < B.ID
) + 1
From tblA As A
Join tblB As B
On B.Id = A.tblBID
The Hoyle ANSI solution might be something like:
Update tblA
Set Sequence = (
Select (Select Count(*)
From tblB As B1
Where B1.[Group] = B.[Group]
And B1.Id < B.ID) + 1
From tblA As A
Join tblB As B
On B.Id = A.tblBID
Where A.Id = tblA.Id
)
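On SQL Server 2005 or later you can also update through a CTE over Row_Number, which avoids the correlated count entirely. A sketch, using the same partitioning and ordering as the queries above:
;With Ranked As
(
    Select A.Sequence
        , Row_Number() Over ( Partition By B.[Group] Order By A.Id ) As NewSeq
    From tblA As A
    Join tblB As B
        On B.Id = A.tblBID
)
Update Ranked
Set Sequence = NewSeq;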
EDIT
Can we do that [the inner group] comparison based on A.Sequence instead of B.ID?
Select A1.*
, (Select Count(*)
From tblB As B2
Join tblA As A2
On A2.tblBID = B2.Id
Where B2.[Group] = B1.[Group]
And A2.Sequence < A1.Sequence) + 1
From tblA As A1
Join tblB As B1
On B1.Id = A1.tblBID
Because it's SQL 2000, we can't use a windowing function. That's okay.
Thomas's queries are good and will work. However, they will get worse and worse as the number of rows increases, with different characteristics depending on how wide (the number of groups) and how deep (the number of items per group) the data is. This is because those queries use a partial cross-join, perhaps we could call it a "pyramidal cross-join", where the crossing part is limited to right-side values less than left-side values rather than the left side crossing to all right values.
What to do?
I think you will be surprised to find that the following long and painful-looking script will outperform the pyramidal join at a certain size of data (which may not be all that big), and eventually, with really large data sets, must be considered a screaming performer:
CREATE TABLE #tblA (
ID int identity(1,1) NOT NULL,
Name varchar(1) NOT NULL,
Sequence int NOT NULL,
tblBID int NOT NULL,
PRIMARY KEY CLUSTERED (ID)
)
INSERT #tblA VALUES ('a', 5, 14)
INSERT #tblA VALUES ('b', 3, 15)
INSERT #tblA VALUES ('c', 3, 16)
INSERT #tblA VALUES ('d', 3, 17)
CREATE TABLE #tblB (
ID int NOT NULL PRIMARY KEY CLUSTERED,
GroupID int NOT NULL
)
INSERT #tblB VALUES (14, 1)
INSERT #tblB VALUES (15, 1)
INSERT #tblB VALUES (16, 2)
INSERT #tblB VALUES (17, 3)
CREATE TABLE #seq (
seq int identity(1,1) NOT NULL,
ID int NOT NULL,
GroupID int NOT NULL,
PRIMARY KEY CLUSTERED (ID)
)
INSERT #seq
SELECT
A.ID,
B.GroupID
FROM
#tblA A
INNER JOIN #tblB B ON A.tblBID = b.ID
ORDER BY B.GroupID, A.Sequence
UPDATE A
SET A.Sequence = S.seq - X.MinSeq + 1
FROM
#tblA A
INNER JOIN #seq S ON A.ID = S.ID
INNER JOIN (
SELECT GroupID, MinSeq = Min(seq)
FROM #seq
GROUP BY GroupID
) X ON S.GroupID = X.GroupID
SELECT * FROM #tblA
DROP TABLE #seq
DROP TABLE #tblB
DROP TABLE #tblA
If I understood you correctly, then ORDER BY B.GroupID, A.Sequence is correct. If not, you can switch A.Sequence to B.ID.
Also, my index on the temp table should be experimented with. For a certain quantity of rows, and also the width and depth characteristics of those rows, clustering on one of the other two columns in the #seq table could be helpful.
Last, there is a different data organization possible: leaving GroupID out of the #seq table and joining it back in. I suspect it would be worse, but am not 100% sure.
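For completeness, a rough sketch of that variant, assuming #seq then holds only (seq, ID) and GroupID is re-derived through #tblB:
UPDATE A
SET A.Sequence = S.seq - X.MinSeq + 1
FROM
#tblA A
INNER JOIN #seq S ON A.ID = S.ID
INNER JOIN #tblB B ON A.tblBID = B.ID
INNER JOIN (
    SELECT B2.GroupID, MinSeq = MIN(S2.seq)
    FROM #seq S2
    INNER JOIN #tblA A2 ON S2.ID = A2.ID
    INNER JOIN #tblB B2 ON A2.tblBID = B2.ID
    GROUP BY B2.GroupID
) X ON X.GroupID = B.GroupID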
Something like:
SELECT a.id, a.name, row_number() over (partition by b.[group] order by a.id)
FROM tblA a
JOIN tblB b ON a.tblBID = b.ID;