How to update a table based on row index? - SQL

I made a copy of an existing table like this:
select * into table_copy from table
Since then I've made some schema changes to table (added/removed columns, changed the order of columns, etc.). Now I need to run an update statement to populate a new column I added, like this:
update t
set t.SomeNewColumn = copy.SomeOldColumn
from t
However, how do I get the second table in here based on row index instead of some column value matching up?
Note: Both tables still have equal number of rows in their original positions.

You cannot join the tables without a key that defines each row uniquely; the position of the data in the table has no bearing on the situation.
If your tables do not have a primary key, you need to define one.
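For example, a surrogate key could be added like this (a minimal sketch assuming SQL Server; the column name id is purely illustrative, and note that IDENTITY values are assigned in no guaranteed order, so this gives you something to join on going forward rather than a way to recover the original row pairing):
-- hypothetical: add a surrogate key column to each table
ALTER TABLE [table] ADD id INT IDENTITY(1,1) NOT NULL;
ALTER TABLE table_copy ADD id INT IDENTITY(1,1) NOT NULL;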

If you have an ID on it, you can do this:
update t
set t.SomeNewColumn = copy.SomeOldColumn
from table t
inner join table_copy copy
  on t.id = copy.id
If you have no way to uniquely identify the row and are relying on the order of the rows, you're out of luck, as row order is not reliable in any version of SQL Server (nor most other RDBMSes).

You could use this to update them by matching IDs:
UPDATE t
SET t.SomeNewColumn = copy.SomeOldColumn
FROM original_table t
INNER JOIN other_table copy
  ON t.id = copy.id
Or, if you don't have the IDs, you might be able to pull something out by using the ROW_NUMBER function to enumerate the records, but that's a long shot (I haven't checked whether it's possible).

If you're updating, you'll need a primary key to join on. Usually, in that case, the other answers will suffice. If for some reason you still need to update the table with a result set in a certain order, you can do this:
UPDATE t
SET t.SomeNewColumn = copy.SomeOldColumn
FROM table t
JOIN (SELECT ROW_NUMBER() OVER (ORDER BY id) AS row, id FROM table) t2
  ON t2.id = t.id
JOIN (SELECT ROW_NUMBER() OVER (ORDER BY id) AS row, SomeOldColumn FROM table_copy) copy
  ON copy.row = t2.row
You get the new table and its row numbers in the order you want, join the old table and its row numbers in the order you want, and join back to the new table so the query has something to directly update.

MERGE vs. UPDATE

I was trying to look for it online but couldn't find anything that would settle my doubts.
I want to figure out which one is better to use, when, and why.
I know MERGE is usually used for an upsert, but there are some cases where a normal UPDATE with a subquery has to select twice from the table (once more in the WHERE clause).
E.g.:
MERGE INTO TableA s
USING (SELECT sd.dwh_key, sd.serial_number FROM TableA#to_devstg sd) t
ON (s.dwh_key = t.dwh_key)
WHEN MATCHED THEN UPDATE
  SET s.serial_number = t.serial_number
  WHERE s.serial_number <> t.serial_number
In my case, I have to update a table with about 200 million records in one environment, based on the same table from another environment where a change has happened in the serial_number field. As you can see, it selects only once from this huge table.
On the other hand, I can use an UPDATE statement like this:
UPDATE TableA s
SET s.serial_number = (SELECT t.serial_number
                       FROM TableA#to_Other t
                       WHERE t.dwh_serial_key = s.dwh_serial_key)
WHERE EXISTS (SELECT 1
              FROM TableA#To_Other t
              WHERE t.dwh_serial_key = s.dwh_serial_key
              AND t.serial_number <> s.serial_number)
As you can see, this selects from the huge table twice now. So, my question is: which is better, and why? In which cases will one be better than the other?
Thanks in advance.
I would first try to load all the necessary data from the remote DB into a temporary table and then work with that temporary table.
create global temporary table tmp_stage (
  dwh_key       <your_dwh_key_type#to_devstg>,
  serial_number <your_serial_number_type#to_devstg>
) on commit preserve rows;

insert into tmp_stage
select sd.dwh_key, sd.serial_number
from TableA#to_devstg sd;
/* index (PK on dwh_key) your temporary table if necessary ... */
update (select
          src.dwh_key       src_key,
          tgt.dwh_key       tgt_key,
          src.serial_number src_serial_number,
          tgt.serial_number tgt_serial_number
        from tmp_stage src
        join TableA tgt
          on src.dwh_key = tgt.dwh_key)
set tgt_serial_number = src_serial_number;
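Once the data is staged locally, the same change could also be expressed as a MERGE against the temporary table (a sketch reusing the column names above; whether it outperforms the updatable-join-view form depends on your optimizer and indexes):
MERGE INTO TableA tgt
USING tmp_stage src
ON (tgt.dwh_key = src.dwh_key)
WHEN MATCHED THEN UPDATE
  SET tgt.serial_number = src.serial_number
  WHERE tgt.serial_number <> src.serial_number;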

Create new table by merging two existing tables based on matching field

I am attempting to create a new table using columns from two existing tables and it's not behaving the way I expected.
Table A has 91255063 records and table B has 2372294 records. Both tables have a common field named link_id. Link_id is not unique in either table and will not always exist in table B.
The end result I am looking for is a new table with 91255063 records, essentially all of Table A with any additional data from table B for the records with matching link_id's. I had thought outer join would accomplish this as follows:
use database1
SELECT a.*
,b.[AdditionalData1]
,b.[AdditionalData2]
,b.[AdditionalData3]
into dbo.COMBINEDTABLE
FROM Table1 a
left outer join Table2 b
ON a.LINK_ID = b.LINK_ID
This seems to work when looking at the resulting data; however, the newly created table COMBINEDTABLE now has 98011015 rows. Am I not using the correct join method here?
Most likely you have duplicate LINK_IDs on the right, so for quite a few rows from Table1 there are multiple matching rows in Table2. You could try using DISTINCT in your SELECT, or specify that you want only the record with the smallest or highest identifier column value (if you have one).
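For example, if Table2 had some identifier column to break ties (called id here purely as an assumption), you could keep one row per LINK_ID with ROW_NUMBER before joining (a sketch assuming SQL Server):
SELECT a.*
      ,b.[AdditionalData1]
      ,b.[AdditionalData2]
      ,b.[AdditionalData3]
INTO dbo.COMBINEDTABLE
FROM Table1 a
LEFT OUTER JOIN (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY LINK_ID ORDER BY id) AS rn
    FROM Table2
) b
  ON a.LINK_ID = b.LINK_ID AND b.rn = 1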

How to auto increment a value in one table when inserted a row in another table

I currently have two tables:
Table 1 has a unique ID and a count.
Table 2 has some data columns and one column where the value of the unique ID of Table 1 is inside.
When I insert a row of data in Table 2, the count for the row with the referenced unique ID in Table 1 should be incremented.
Hope I made myself clear. I am very new to PostgreSQL and SQL in general, so I would appreciate any help on how to do that. =)
You could achieve that with triggers.
Be sure to cover all kinds of write access appropriately if you do: INSERT, UPDATE, DELETE.
Also be aware that TRUNCATE on Table 2 or manual edits in Table 1 could break data integrity.
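A minimal sketch of the trigger approach (assuming names matching the view below: tbl1 with a count column ct keyed by tbl1_id, and tbl2 with a referencing tbl1_id column; this only covers INSERT and DELETE):
CREATE OR REPLACE FUNCTION trg_tbl2_count()
  RETURNS trigger
  LANGUAGE plpgsql AS
$$
BEGIN
   IF TG_OP = 'INSERT' THEN
      UPDATE tbl1 SET ct = ct + 1 WHERE tbl1_id = NEW.tbl1_id;
      RETURN NEW;
   ELSIF TG_OP = 'DELETE' THEN
      UPDATE tbl1 SET ct = ct - 1 WHERE tbl1_id = OLD.tbl1_id;
      RETURN OLD;
   END IF;
   RETURN NULL;
END
$$;

CREATE TRIGGER tbl2_count
AFTER INSERT OR DELETE ON tbl2
FOR EACH ROW EXECUTE PROCEDURE trg_tbl2_count();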
I suggest you consider a VIEW instead to return aggregated results that are automatically up to date. Like:
CREATE VIEW tbl1_plus_ct AS
SELECT t1.*, t2.ct
FROM   tbl1 t1
LEFT   JOIN (
   SELECT tbl1_id, count(*) AS ct
   FROM   tbl2
   GROUP  BY 1
   ) t2 USING (tbl1_id);
If you use a LEFT JOIN, all rows of tbl1 are included, even if there is no reference in tbl2. With a regular JOIN, those rows would be omitted from the VIEW.
For all or much of the table, it is fastest to aggregate tbl2 first in a subquery and then join to tbl1, as demonstrated above.
Instead of creating a view, you could also just use the query directly, and if you only fetch a single row, or only a few, this alternative form would perform better:
SELECT t1.*, count(t2.tbl1_id) AS ct
FROM tbl1 t1
LEFT JOIN tbl2 t2 USING (tbl1_id)
WHERE t1.tbl1_id = 123 -- for example
GROUP BY t1.tbl1_id -- being the primary key of tbl1!

Adding new fields to an existing table, inserting data into proper position, then joining

Scenario One
I have two new fields that I want to add to a table called existingTable. After I add these fields, I can update SOME but NOT ALL records with data for those fields. There will be blank entries, and I am fine with this.
Problem One
I want to make sure that the CORRECT records are updated. The primary key for the existing table and the incoming data table is Email.
Proposed Solution One
I think an UPDATE query like this is the solution:
UPDATE existingTable
SET existingTable.newField1 = incomingDataTable.newField1, existingTable.newField2 = incomingDataTable.newField2
WHERE existingTable.Email = incomingDataTable.Email
What do you think?
Scenario Two
After the table is updated with the new fields & data in the proper records, I want to join this table with two other ones. I want ALL entries, even if some fields are blank, to be in this join. I don't want ANY records excluded.
By the way, each record in these tables has a 1-to-1 relationship with its partner in the other tables. There SHOULD NOT BE ANY duplicate records. In the past, I've seen Access use an INNER JOIN, which excludes records that do not have values for newField1 and newField2. This is not what I want.
Problem
I'm inexperienced at joining tables. The different joins are a bit confusing to me.
Proposed Solution
Does the join I use necessarily matter since the three to-be-joined tables should have a one-to-one relationship?
SELECT * FROM existingTable
FULL JOIN tableToJoinWith1, tableToJoinWith2
On existingTable.Email = tableToJoinWith1.Email, tableToJoinWith1.Email = tableToJoiNWith2.Email
Clarifying your Scenario 2. I'm assuming you mean you want all the rows from existingTable even if there is no match on the Email field with either of the other tables. In this case, a LEFT JOIN is what you want:
SELECT * FROM existingTable
LEFT JOIN tableToJoinWith1 ON existingTable.email = tableToJoinWith1.email
LEFT JOIN tableToJoinWith2 ON existingTable.email = tableToJoinWith2.email
For scenario 1, the problem is that you haven't given it any sort of SELECT for incomingDataTable. In standard SQL, to my knowledge, there's no nice way to do this that supports multiple columns, so it depends on what database you're using. Some will let you do this:
UPDATE existingTable
SET newField1 = incomingDataTable.newField1, newField2 = incomingDataTable.newField2
FROM incomingDataTable
WHERE existingTable.Email = incomingDataTable.Email
But some won't. Others will allow this:
UPDATE (Select * FROM existingTable JOIN incomingDataTable
ON existingTable.Email = incomingDataTable.Email)
SET existingTable.newField1 = incomingDataTable.newField1,
existingTable.newField2 = incomingDataTable.newField2
If it were only a single column, you could do this which is totally standard:
UPDATE existingTable SET newField1 = (SELECT newField1 FROM incomingDataTable
WHERE existingTable.Email = incomingDataTable.Email)

Determining best method to traverse a table and update another table

I am using Delphi 7, BDE, and Interbase (testing), Oracle (Production).
I have two tables (Master, Responses)
I need to step through the Responses table, use its Master_Id field to look up the matching record in the Master table (id), and update a date field in the Master table with a date field from the Responses table.
Can this be done in SQL, or do I actually have to create two TTables or TQueries and step through each record?
Example:
// open two tables (Table1 = Responses, Table2 = Master)
with Table1 do
begin
  First;
  while not EOF do
  begin
    // get the master_id field and locate the matching id in Table2
    if Table2.Locate('id', FieldByName('master_id').Value, []) then
    begin
      // edit the matching record in Table2
      Table2.Edit;
      Table2.FieldByName('date').Value := FieldByName('date').Value;
      Table2.Post;
    end;
    Next;
  end;
end;
thanks
One slight modification to Chris's query: throw in a WHERE clause to select only the records that need the update. Otherwise it will set the rest of the dates to NULL.
UPDATE Master m
SET m.date = (SELECT r.date FROM Responses r WHERE r.master_id = m.id)
WHERE m.id IN (SELECT master_id FROM Responses)
Updated to use aliases to avoid confusion about which column comes from which table.
This is not a ready-made, copy-pasteable query, as UPDATE syntax differs from database to database.
You may need to consult your database's SQL reference for the JOIN-in-UPDATE statement syntax.
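For example, in SQL Server's T-SQL dialect the same update could be written with a FROM/JOIN instead of a correlated subquery (a sketch only, using the same column names as above):
UPDATE m
SET m.date = r.date
FROM Master m
INNER JOIN Responses r
  ON r.master_id = m.id
-- note: if a master row has several responses, which one wins here is not defined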
When there are multiple responses to the same master entry:
UPDATE Master m
SET m.date = (SELECT MAX(r.date) FROM Responses r WHERE r.master_id = m.id)
WHERE m.id IN (SELECT master_id FROM Responses)
I used MAX(); you can use whatever suits your business.
Again, invest some time in understanding SQL; it's hardly a few days' effort. Get the PL/SQL Complete Reference if you are into Oracle.
Try this SQL (changing names to fit your situation):
UPDATE Master m
SET date = (SELECT date FROM Responses WHERE master_id = m.id)