Update a single column with join lookups - SQL

I have a table adjustments with the columns adjustable_id | adjustable_type | order_id.
order_id is the target column to fill with values; the value should come from another table, line_items, which has an order_id column.
adjustable_id (int) and adjustable_type (varchar) reference that table.
table: adjustments

 id  | adjustable_id | adjustable_type | order_id
-----+---------------+-----------------+---------
 100 | 1             | line_item       | NULL
 101 | 2             | line_item       | NULL

table: line_items

 id | order_id | other | columns
----+----------+-------+--------
 1  | 10       | bla   | bla
 2  | 20       | bla   | bla
In the case above I guess I need an UPDATE with a join to set adjustments.order_id to 10 in the first row, 20 in the second row, and so on for the other rows, using Postgres 9.3+.
In case the lookup fails, I need to delete the invalid adjustments rows, i.e. those that have no corresponding line_items row.

There are two ways to do this. The first one uses a correlated sub-query:
update adjustments a
set order_id = (select l.order_id
                from line_items l
                where l.id = a.adjustable_id)
where a.adjustable_type = 'line_item';
This is standard ANSI SQL, as the SQL standard does not define a join syntax for the UPDATE statement.
The second way uses a join, which is a Postgres extension to the SQL standard (other DBMSs also support this, but with different syntax and semantics).
update adjustments a
set order_id = l.order_id
from line_items l
where l.id = a.adjustable_id
and a.adjustable_type = 'line_item';
The join is probably the faster one. Note that both versions will only work if the join between line_items and adjustments returns exactly one line_items row per adjustments row. If more than one row matches, the first version fails with an error ("more than one row returned by a subquery used as an expression"), while the second silently uses an arbitrary matching row.
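The question also asks to remove adjustments rows whose lookup fails. A sketch for that part (assuming the same tables and the 'line_item' type used above):

-- Delete adjustments of type 'line_item' that have no matching line_items row.
delete from adjustments a
where a.adjustable_type = 'line_item'
  and not exists (select 1
                  from line_items l
                  where l.id = a.adjustable_id);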
The reason why Arockia's query was "eating your RAM" is that the query creates a cross join between the update target table1 and the second instance of table1 in the FROM clause, which is then joined against table2.
The Postgres manual contains a warning about that:
Note that the target table must not appear in the from_list, unless you intend a self-join

update a set A.name=B.name from table1 A join table2 B on
A.id=B.id
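A version of that statement following the manual's advice, i.e. without repeating the target table in the FROM list, could look like this (a sketch reusing the table1/table2/name/id names from the quoted query):

update table1 a
set name = b.name        -- the target column must not be table-qualified in Postgres
from table2 b
where a.id = b.id;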

Related

In Oracle SQL is there a way to join on a value twice?

Let's say I have two tables with the following columns:
cars
motorcycle_id | fuel_id | secondary_fuel_id ...
fuel_types
fuel_id | fuel_label | ...
In this case both fuel_id and secondary_fuel_id refer to the fuel_types table.
Is it possible to include both labels in an inner join? I want to join on fuel_id, but I also want the fuel label a second time as a new column. So the result would look something like:
motorcycle_id | fuel_id | fuel_label | secondary_fuel_id | secondary_fuel_label | ...
In this case I would have created the secondary_fuel_label column.
Is this possible to do in SQL with joins? Is there another way to accomplish this?
You would just join twice:
select c.*, f1.fuel_label, f2.fuel_label as secondary_fuel_label
from cars c
left join fuel_types f1 on c.fuel_id = f1.fuel_id
left join fuel_types f2 on c.secondary_fuel_id = f2.fuel_id;
The key point here is to use table aliases, so you can distinguish between the two table references to fuel_types.
Note that this uses left join to be sure that rows are returned even if one of the ids is missing.
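Since a missing id simply yields a NULL label here, you could also substitute a placeholder for display; a small sketch (the coalesce and the 'none' literal are additions, not part of the original answer):

select c.*,
       coalesce(f1.fuel_label, 'none') as fuel_label,
       coalesce(f2.fuel_label, 'none') as secondary_fuel_label
from cars c
left join fuel_types f1 on c.fuel_id = f1.fuel_id
left join fuel_types f2 on c.secondary_fuel_id = f2.fuel_id;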

Self-Join with Natural Join

What is the difference between
select * from degreeprogram NATURAL JOIN degreeprogram ;
and
select * from degreeprogram d1 NATURAL JOIN degreeprogram d2;
in Oracle?
I expected them to return the same result set; however, they do not. The second query does what I expect: it joins the two relations on their identically named attributes and therefore returns exactly the tuples stored in degreeprogram. The first query, however, confuses me: each tuple occurs several times in the result set. What join condition is used here?
Thank you
NATURAL JOIN means join the two tables based on all columns having the same name in both tables.
I imagine that for each column in your table, Oracle is internally writing a condition like:
degreeprogram.column1 = degreeprogram.column1
(which you would not be able to write yourself due to ORA-00918 column ambiguously defined error)
And then, I imagine, Oracle is optimizing that away to just
degreeprogram.column1 is not null
So, you're not exactly getting a CROSS JOIN of your table with itself -- only a CROSS JOIN of those rows having no null columns.
UPDATE: Since this was the selected answer, I will just add from Thorsten Kettner's answer that this behavior is probably a bug on Oracle's part. In 18c, Oracle behaves properly and returns an ORA-00918 error when you try to NATURAL JOIN a table to itself.
The difference between the two statements is that the second explicitly defines a self-join on the table, whereas with the first statement the optimizer has to figure out what you really want. On my database, the first statement performs a Cartesian merge join and is not optimized at all, while the second statement has a better explain plan, using a single full table access with index scanning.
I'd call this a bug. This query:
select * from degreeprogram d1 NATURAL JOIN degreeprogram d2;
translates to
select col1, col2, ... -- all columns
from degreeprogram d1
join degreeprogram d2 using (col1, col2, ...)
and gives you all rows from the table where all columns are not null (because using(col) never matches nulls).
This query, however:
select * from degreeprogram NATURAL JOIN degreeprogram;
is invalid according to standard SQL, because every table must have a unique name or alias in a query. Oracle lets this pass, but doing so it should do something still to keep the table instances apart (e.g. create internally an alias for them). It obviously doesn't and multiplies the result with the number of rows in the table. A bug.
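The point that using(col) never matches NULLs is easy to verify with a one-column table; a quick sketch (this test table is an addition, not part of the original answer):

create table t (x integer);
insert into t values (1);
insert into t values (null);

-- The NULL row never joins to anything, so only the x = 1 row survives:
select count(*) from t t1 join t t2 using (x);    -- returns 1
select count(*) from t t1 natural join t t2;      -- also returns 1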
A so-called natural join instructs the database to
Find all column names common to both tables (in this case, degreeprogram and degreeprogram, which of course have the same columns.)
Generate a join condition for each pair of matching column names, in the form table1.column1 = table2.column1 (in this case, there will be one for every column in degreeprogram.)
Therefore a query like this
select count(*) from demo natural join demo;
will be transformed into
select count(*) from demo, demo where demo.x = demo.x;
I checked this by creating a table with one column and two rows:
create table demo (x integer);
insert into demo values (1);
insert into demo values (2);
commit;
and then tracing the session:
SQL> alter session set tracefile_identifier='demo_trace';
Session altered.
SQL> alter session set events 'trace [SQL_Compiler.*]';
Session altered.
SQL> select /* nj test */ count(*) from demo natural join demo;
COUNT(*)
----------
4
1 row selected.
SQL> alter session set events 'trace [SQL_Compiler.*] off';
Session altered.
Then in twelve_ora_6196_demo_trace.trc I found this line:
Final query after transformations:******* UNPARSED QUERY IS *******
SELECT COUNT(*) "COUNT(*)" FROM "WILLIAM"."DEMO" "DEMO","WILLIAM"."DEMO" "DEMO" WHERE "DEMO"."X"="DEMO"."X"
and a few lines later:
try to generate single-table filter predicates from ORs for query block SEL$58A6D7F6 (#0)
finally: "DEMO"."X" IS NOT NULL
(This is merely an optimisation on top of the generated query above, as column X is nullable but the join allows the optimiser to infer that only non-null values are required. It doesn't replace the joins.)
Hence the execution plan:
---------------------------------------------------------------------------
| Id | Operation              | Name | Rows | Bytes | Cost | Time     |
---------------------------------------------------------------------------
|  0 | SELECT STATEMENT       |      |      |       |    7 |          |
|  1 |  SORT AGGREGATE        |      |    1 |    13 |      |          |
|  2 |   MERGE JOIN CARTESIAN |      |    4 |    52 |    7 | 00:00:01 |
|  3 |    TABLE ACCESS FULL   | DEMO |    2 |    26 |    3 | 00:00:01 |
|  4 |    BUFFER SORT         |      |    2 |       |    4 | 00:00:01 |
|  5 |     TABLE ACCESS FULL  | DEMO |    2 |       |    2 | 00:00:01 |
---------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
  1 - SEL$58A6D7F6
  3 - SEL$58A6D7F6 / DEMO_0001#SEL$1
  5 - SEL$58A6D7F6 / DEMO_0002#SEL$1

Predicate Information:
----------------------
  3 - filter("DEMO"."X" IS NOT NULL)
Alternatively, let's see what dbms_utility.expand_sql_text does with it. I'm not quite sure what to make of this given the trace file above, but it shows a similar expansion taking place:
SQL> var result varchar2(1000)
SQL> exec dbms_utility.expand_sql_text('select count(*) from demo natural join demo', :result)
PL/SQL procedure successfully completed.
RESULT
----------------------------------------------------------------------------------------------------------------------------------
SELECT COUNT(*) "COUNT(*)" FROM (SELECT "A2"."X" "X" FROM "WILLIAM"."DEMO" "A3","WILLIAM"."DEMO" "A2" WHERE "A2"."X"="A2"."X") "A1"
Lesson: NATURAL JOIN is evil. Everybody knows this.

Oracle SQL - How to do massive updates more efficient and faster?

I'm trying to update 500,000 rows at once. I have a table with products like this:
+------------+----------------+--------------+-------+
| PRODUCT_ID | SUB_PRODUCT_ID | DESCRIPTION | CLASS |
+------------+----------------+--------------+-------+
| A001 | ACC1 | coffeemaker | A |
| A002 | ACC1 | toaster | A |
| A003 | ACC2 | coffee table | A |
| A004 | ACC5 | couch | A |
+------------+----------------+--------------+-------+
I have sets of individual statements, for example:
update products set class = 'A' where product_id = 'A001';
update products set class = 'B' where product_id = 'A005';
update products set class = 'Z' where product_id = 'A150';
I'm building a script that puts one update statement below the other and issues a commit every 1,000 rows.
It works fine (slow, but fine), but I want to do it better if that is possible in any way.
Is there a better way to do this, more efficient and faster?
One approach would be to create a temporary table holding your update information:
new_product_class:
product_id   class
==========   =====
A001         A
A005         B
A150         Z
product_id should be an indexed primary key on this new table. Then you can do an UPDATE or a MERGE on the old table joined to this temporary table:
UPDATE (SELECT p.product_id,
               p.class AS old_class,
               n.class AS new_class
        FROM product p
        JOIN new_product_class n ON (p.product_id = n.product_id))
SET old_class = new_class;
or
MERGE INTO product p
USING new_product_class n
   ON (p.product_id = n.product_id)
WHEN MATCHED THEN
   UPDATE SET p.class = n.class;
Merge should be fast. Depending on your environment, other things you could look into: creating a new table from the old one with NOLOGGING followed by renaming (take a backup before and after), or bulk updates.
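For reference, a sketch of how that staging table could be created and loaded (Oracle syntax; the VARCHAR2 sizes are assumptions, adjust them to the real column definitions):

-- Staging table holding the target class per product.
create table new_product_class (
    product_id varchar2(10) primary key,   -- indexed primary key, as recommended above
    class      varchar2(1)
);

insert into new_product_class values ('A001', 'A');
insert into new_product_class values ('A005', 'B');
insert into new_product_class values ('A150', 'Z');
commit;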
Unless you have an index, each of your update statements scans the entire table. Even if you do have an index, there is a cost associated with the compilation and execution of each statement.
If you have a lot of conditions, and those conditions can vary, then I think Glenn's solution is clearly the way to go. This does everything in a single transaction, and there is no reason to run batches of 1,000 rows -- just do everything all at once.
If the number of conditions is relatively finite (as in your example), and they don't change very often, then you can also do this as a simple case:
update products
    set class = case product_id
                    when 'A001' then 'A'
                    when 'A005' then 'B'
                    when 'A150' then 'Z'
                end
    where product_id in ('A001', 'A005', 'A150');
If it's possible your class field is already set to the correct value, then there is also great value in adding a condition to make sure you are not updating something to the same value. For example if this:
update products set class = 'A' where product_id = 'A001';
Updates 5,000 records, 4,000 of which are already set to 'A', then this would be significantly more efficient:
update products
set class = 'A'
where
product_id = 'A001' and
(class is null or class != 'A')
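Both ideas can be combined; here is a sketch (not from either answer) that feeds the literal mappings through a MERGE and skips rows that already have the right class:

merge into products p
using (select 'A001' as product_id, 'A' as class from dual union all
       select 'A005', 'B' from dual union all
       select 'A150', 'Z' from dual) n
on (p.product_id = n.product_id)
when matched then
    update set p.class = n.class
    where p.class is null or p.class != n.class;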

How to get sum of values per id and update existing records in other table

I have two tables like:
   ID    | TRAFFIC
---------+---------
 fd56756 |    4398
 645effa |  567899
 894fac6 |  611900
 894fac6 |  567899

and

  USER  |   ID    | TRAFFIC
--------+---------+---------
 andrew | fd56756 |       0
 peter  | 645effa |       0
 john   | 894fac6 |       0
I need to get SUM("TRAFFIC") per ID from the first table and write it into the TRAFFIC column of the second table where the first table's ID equals the second table's ID. IDs in the first table are not unique and can be duplicated.
How can I do this?
Table names from your later comment. Chances are, you are reporting table and column names incorrectly.
UPDATE users u
SET "TRAFFIC" = sub.sum_traffic
FROM (
SELECT "ID", sum("TRAFFIC") AS sum_traffic
FROM stats.traffic
GROUP BY 1
) sub
WHERE u."ID" = sub."ID";
Aside: It's unwise to use mixed-case identifiers in Postgres. Use legal, lower-case identifiers, which do not need to be double-quoted, to make your life easier. Start with the manual's chapter on identifiers and key words.
Something like this?
UPDATE users t2
SET    traffic = t1.sum_traffic
FROM  (SELECT id, sum(traffic) AS sum_traffic
       FROM   stats.traffic
       GROUP  BY id) t1
WHERE  t1.id = t2.id;

'Implicit' JOIN based on schema's foreign keys?

Hello all :) I'm wondering if there is a way to tell the database to look at the schema and infer the JOIN predicate:
+--------------+ +---------------+
| prices | | products |
+--------------+ +---------------+
| price_id (PK)| |-1| product_id(PK)|
| prod_id |*-| | weight |
| shop | +---------------+
| unit_price |
| qty |
+--------------+
Is there a way (preferably in Oracle 10g) to go from:
SELECT * FROM prices JOIN products ON prices.prod_id = products.product_id
to:
SELECT * FROM prices IMPLICIT JOIN products
The closest you can get to not writing the actual join condition is a natural join.
select * from t1 natural join t2
Oracle will look for columns with identical names and join by them (which is not the case here, since your key columns are named prod_id and product_id). See the documentation on the SELECT statement:
A natural join is based on all columns in the two tables that have the same name. It selects rows from the two tables that have equal values in the relevant columns. If two columns with the same name do not have compatible data types, then an error is raised
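In the asker's schema the key columns are named differently, so a natural join would find no common columns at all and would quietly degenerate into a cross join; a sketch of the pitfall:

-- With no identically named columns, this returns every price combined with
-- every product (a Cartesian product), not the intended lookup:
select * from prices natural join products;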
This is very poor practice and I strongly recommend not using it in any environment.
You shouldn't do that. Some DB systems allow it, but what happens when you modify the foreign keys (e.g. add new ones)? You should always state what to join on to avoid problems. Most DB systems won't even allow such an implicit join, though (a good thing!).