How to improve an Update query in Oracle - sql

I'm trying to update two columns in an archaic Oracle database, but the query simply doesn't finish and nothing is updated. Any ideas to improve the query or something else that can be done? I don't have DBA skills/knowledge and unsure if indexing would help, so would appreciate comments in that area, too.
PERSON table: This table has 200 million distinct person_id's. There are no duplicates. The person_id is numeric and am trying to update the favorite_color and color_confidence columns, which are varchar2 and values currently NULLed out.
person table
person_id favorite_color color_confidence many_other_columns
222
333
444
TEMP_COLOR_CONFIDENCE table: I'm trying to get the favorite_color and color_confidence from this table and update to the PERSON table. This table has 150 million distinct person's, again nothing duplicated.
temp_color_confidence
person_id favorite_color color_confidence
222 R H
333 Y L
444 G M
This is my update query, which I realize only updates those found in both tables. Eventually I'll need to update the remaining 50 million with "U" -- unknown. Solving that in one shot would be ideal too, but currently just concerned that I'm not able to get this query to complete.
UPDATE person p
SET (favorite_color, color_confidence) =
(SELECT t.favorite_color, t.color_confidence
FROM temp_color_confidence t
WHERE p.person_id = t.person_id)
WHERE EXISTS (
SELECT 1
FROM temp_color_confidence t
WHERE p.person_id = t.person_id );
Here's where my ignorance shines... would indexing on person_id help, considering they are all distinct anyway? Would indexing on favorite_color help? There are less than 10 colors and only 3 confidence values.

For every person, it has to find the corresponding row in temp_color_confidence. The way to do that with the least I/O is to scan each table once and crunch them together in a single hash join, ideally all in memory. Indexes are unlikely to help with that, unless maybe temp_color_confidence is very wide and verbose and has an index on (person_id, favorite_color, color_confidence) which the optimiser can treat as a skinny table.
Using merge might be more efficient as it can avoid the second scan of temp_color_confidence:
merge into person p
using temp_color_confidence t
on (p.person_id = t.person_id)
when matched then update
set p.favorite_color = t.favorite_color, p.color_confidence = t.color_confidence;
If you are going to update every row in the table, though, you might consider instead creating a new table containing all the values you need:
create table person2
( person_id, favorite_color, color_confidence )
pctfree 0 compress
as
select p.person_id, nvl(t.favorite_color,'U'), nvl(t.color_confidence,0)
from person p
left join temp_color_confidence t
on t.person_id = p.person_id;

Related

Postgres - How to find id's that are not used in different multiple tables (inactive id's) - badly written query

I have table towns which is main table. This table contains so many rows and it became so 'dirty' (someone inserted 5 milions rows) that I would like to get rid of unused towns.
There are 3 referent table that are using my town_id as reference to towns.
And I know there are many towns that are not used in this tables, and only if town_id is not found in neither of these 3 tables I am considering it as inactive and I would like to remove that town (because it's not used).
as you can see towns is used in this 2 different tables:
employees
offices
and for table * vendors there is vendor_id in table towns since one vendor can have multiple towns.
so if vendor_id in towns is null and town_id is not found in any of these 2 tables it is safe to remove it :)
I created a query which might work but it is taking tooooo much time to execute, and it looks something like this:
select count(*)
from towns
where vendor_id is null
and id not in (select town_id from banks)
and id not in (select town_id from employees)
So basically I said, if vendor_is is null it means this town is definately not related to vendors and in the same time if same town is not in banks and employees, than it will be safe to remove it.. but query took too long, and never executed successfully...since towns has 5 milions rows and that is reason why it is so dirty..
In face I'm not able to execute given query since server terminated abnormally..
Here is full error message:
ERROR: server closed the connection unexpectedly This probably means
the server terminated abnormally before or while processing the
request.
Any kind of help would be awesome
Thanks!
You can join the tables using LEFT JOIN so that to identify the town_id for which there is no row in tables banks and employee in the WHERE clause :
WITH list AS
( SELECT t.town_id
FROM towns AS t
LEFT JOIN tbl.banks AS b ON b.town_id = t.town_id
LEFT JOIN tbl.employees AS e ON e.town_id = t.town_id
WHERE t.vendor_id IS NULL
AND b.town_id IS NULL
AND e.town_id IS NULL
LIMIT 1000
)
DELETE FROM tbl.towns AS t
USING list AS l
WHERE t.town_id = l.town_id ;
Before launching the DELETE, you can check the indexes on your tables.
Adding an index as follow can be usefull :
CREATE INDEX town_id_nulls ON towns (town_id NULLS FIRST) ;
Last but not least you can add a LIMIT clause in the cte so that to limit the number of rows you detele when you execute the DELETE and avoid the unexpected termination. As a consequence, you will have to relaunch the DELETE several times until there is no more row to delete.
You can try an JOIN on big tables it would be faster then two IN
you could also try UNION ALL and live with the duplicates, as it is faster as UNION
Finally you can use a combined Index on id and vendor_id, to speed up the query
CREATE TABLe towns (id int , vendor_id int)
CREATE TABLE
CREATE tABLE banks (town_id int)
CREATE TABLE
CREATE tABLE employees (town_id int)
CREATE TABLE
select count(*)
from towns t1 JOIN (select town_id from banks UNION select town_id from employees) t2 on t1.id <> t2.town_id
where vendor_id is null
count
0
SELECT 1
fiddle
The trick is to first make a list of all the town_id's you want to keep and then start removing those that are not there.
By looking in 2 tables you're making life harder for the server so let's just create 1 single list first.
-- build empty temp-table
CREATE TEMPORARY TABLE TEMP_must_keep
AS
SELECT town_id
FROM tbl.towns
WHERE 1 = 2;
-- get id's from first table
INSERT TEMP_must_keep (town_id)
SELECT DISTINCT town_id
FROM tbl.banks;
-- add index to speed up the EXCEPT below
CREATE UNIQUE INDEX idx_uq_must_keep_town_id ON TEMP_must_keep (town_id);
-- add new ones from second table
INSERT TEMP_must_keep (town_id)
SELECT town_id
FROM tbl.employees
EXCEPT -- auto-distincts
SELECT town_id
FROM TEMP_must_keep;
-- rebuild index simply to ensure little fragmentation
REINDEX TABLE TEMP_must_keep;
-- optional, but might help: create a temporary index on the towns table to speed up the delete
CREATE INDEX idx_towns_town_id_where_vendor_null ON tbl.towns (town_id) WHERE vendor IS NULL;
-- Now do actual delete
-- You can do a `SELECT COUNT(*)` rather than a `DELETE` first if you feel like it, both will probably take some time depending on your hardware.
DELETE
FROM tbl.towns as del
WHERE vendor_id is null
AND NOT EXISTS ( SELECT *
FROM TEMP_must_keep mk
WHERE mk.town_id = del.town_id);
-- cleanup
DROP INDEX tbl.idx_towns_town_id_where_vendor_null;
DROP TABLE TEMP_must_keep;
The idx_towns_town_id_where_vendor_null is optional and I'm not sure if it will actaully lower the total time but IMHO it will help out with the DELETE operation if only because the index should give the Query Optimizer a better view on what volumes to expect.

Oracle sql - identify and update primary and secondary rows in same table

I have a scenario where i need to identify a combination of records which are duplicates and use them to mark and identify which one is primary and which is secondary and then use an additional column to update the keys. This will help me update another child table which has referential integrity. Here's an example
Table Member
ID Name Birthdt MID
1 SK 09/1988 312
2 SK 09/1988 999
3 SL 06/1990 315
4 GK 08/1990 316
5 GK 08/1990 999
So from the above table my logic to identify duplicate is -- I do a group by on Name and Birthdate and when MID is 999 i consider that as a duplicate
so I created another temp table to capture dups.
Table member_dups
ID NAME BIRTHDT MID M_flg M_Key
1 SK 09/1988 P 1
2 SK 09/1988 S 1
4 GK 08/1990 P 4
5 GK 08/1990 S 4
For my table member_dups im able to load the duplicate records and update the flag . however I'm finding it difficult to get the right M_KEY for records marked as secondary. If i can achieve that then I can take that record and update the other table accordingly.
Thanks for any help offered.
If I understand your logic right the records that had MID = 999 is the secondary in member_dups.
If so you should be able to use an simple update with a join:
update member_dups
set m_key = m.id
from member_dups md
inner join Member m on m.name = mb.name and m.birthdt = mb.birthdt
where mb.m_flg = 's' and m.mid = 999
This example uses MSSQL syntax and thus isn't valid Oracle syntax, but you should get the idea, and hopefully you know Oracle better than I do. I'll try to work out the correct Oracle syntax and update the answer soon.
EDIT: I think this is the working Oracle syntax, but I haven't been able to test it:
MERGE INTO member_dups
USING
(
SELECT id,
name,
birthdt
FROM Member
where m.mid = 999
) m ON (m.name = mb.name and m.birthdt = mb.birthdt and mb.m_flg = 's')
WHEN MATCHED THEN UPDATE
SET member_dups.m_key = m.id

Overwrite data in one table with data from another if two keys match

EDIT: I'm using the PROC SQL functionality in SAS.
I'm trying to overwrite data in a primary table with data in a secondary table if two IDs match. Basically, there is a process modifying certain values associated with various IDs, and after that process is done I want to update the values associated with those IDs in the primary table. For a very simplified example:
Primary table:
PROD_ID PRICE IN_STOCK
1 5.25 17
2 10.24 200 [...additional fields...]
3 6.42 140
...
Secondary table:
PROD_ID PRICE IN_STOCK
2 11.50 175
3 6.42 130
And I'm trying to get the new Primary table to look like this:
PROD_ID PRICE IN_STOCK
1 5.25 17
2 11.50 175 [...additional fields...]
3 6.42 130
...
So it overwrites certain columns in the primary table if the keys match.
In non-working SQL code, what I'm trying to do is something like this:
INSERT INTO PRIMARY_TABLE (PRICE, IN_STOCK)
SELECT PRICE, IN_STOCK
FROM SECONDARY_TABLE
WHERE SECONDARY_TABLE.PROD_ID = PRIMARY_TABLE.PROD_ID
Is this possible to do in one statement like this, or will I have to figure out some workaround using temporary tables (which is something I'm trying to avoid)?
EDIT: None of the current answers seem to be working, although it's probably my fault - I'm using PROC SQL in SAS and didn't specify, so is it possible some of the functionality is missing? For example, the "FROM" keyword doesn't turn blue when using UPDATE, and throws errors when trying to run it, but the UPDATE and SET seem fine...
Do you really want to insert new data? Or update existing rows? If updating, join the tables:
UPDATE PT
SET
PT.PRICE = ST.PRICE,
PT.IN_STOCK = ST.IN_STOCK
FROM
PRIMARY_TABLE PT JOIN SECONDARY_TABLE ST ON PT.PROD_ID = ST.PROD_ID
One answer in SAS PROC SQL is simply to do it as a left join and use COALESCE, which picks the first nonmissing value:
data class;
set sashelp.class;
run;
data class_updates;
input name $ height weight;
datalines;
Alfred 70 150
Alice 59 92
Henry 65 115
Judy 66 95
;;;;
run;
proc sql;
create table class as select C.name, coalesce(U.height,C.height) as height, coalesce(U.weight,C.weight) as weight
from class C
left join class_updates U
on C.name=U.name;
quit;
In this case though the SAS solution outside of SQL is superior in terms of simplicity of coding.
data class;
update class class_updates(in=u);
by name;
run;
This does require both tables to be sorted. There are a host of different ways of doing this (hash table, format lookup, etc.) if you have performance needs.
Try this:
INSERT INTO PRIMARY_TABLE (PRICE, IN_STOCK) VALUES
(SELECT PRICE, IN_STOCK
FROM SECONDARY_TABLE
JOIN PRIMARY_TABLE ON SECONDARY_TABLE.PROD_ID = PRIMARY_TABLE.PROD_ID)
The only reason you would have to use an INSERT statement is if there are IDs present in the secondary table and not present in the primary table. If this is not the case then use a regular UPDATE statement. If it is the case then use the following:
INSERT INTO PRIMARY_TABLE (ID, PRICE, IN_STOCK)
SELECT ID, PRICE, IN_STOCK
FROM SECONDARY_TABLE s
ON DUPLICATE KEY UPDATE PRICE = s.PRICE, IN_STOCK = s.IN_STOCK

in sql how to return single row of data from more than one row in the same table

I have a single table of activities, some labelled 'Assessment' (type_id of 50) and some 'Counselling' (type_id of 9) with dates of the activities. I need to compare these dates to find how long people wait for counselling after assessment. The table contains rows for many people, and that is the primary key of 'id'. My problem is how to produce a result row with both the assessment details and the counselling details for the same person, so that I can compare the dates. I've tried joining the table to itself, and tried nested subqueries, I just can't fathom it. I'm using Access 2010 btw.
Please forgive my stupidity, but here's an example of joining the table to itself that doesn't work, producing nothing (not surprising):
Table looks like:
ID TYPE_ID ACTIVITY_DATE_TIME
----------------------------------
1 9 20130411
1 v 50 v 20130511
2 9 20130511
3 9 20130511
In the above the last two rows have only had assessment so I want to ignore them, and just work on the situation where there's both assessment and counselling 'type-id'
SELECT
civicrm_activity.id, civicrm_activity.type_id,
civicrm_activity.activity_date_time,
civicrm_activity_1.type_id,
civicrm_activity_1.activity_date_time
FROM
civicrm_activity INNER JOIN civicrm_activity AS civicrm_activity_1
ON civicrm_activity.id = civicrm_activity_1.id
WHERE
civicrm_activity.type_id=9
AND civicrm_activity_1.type_id=50;
I'm actually wondering whether this is in fact not possible to do with SQL? I hope it is possible? Thank you for your patience!
Sounds to me like you only want to get the ID numbers where you have a TYPE_ID entry of both 9 and 50.
SELECT DISTINCT id FROM civicrm_activity WHERE type_id = '9' AND id IN (SELECT id FROM civicrm_activity WHERE type_id = '50');
This will give you a list of id's that has entries with both type_id 9 and 50. With that list you can now go and get the specifics.
Use this SQL for the time of type_id 9
SELECT activity_date_time FROM civicrm_activity WHERE id = 'id_from_last_sql' AND type_id = '9'
Use this SQL for the time of type_id 50
SELECT activity_date_time FROM civicrm_activity WHERE id = 'id_from_last_sql' AND type_id = '50'
Your query looks OK to me, too. The one problem might be that you use only one table alias. I don't know, but perhaps Access treats the table name "specially" such that, in effect, the WHERE clause says
WHERE
civicrm_activity.type_id=9
AND civicrm_activity.type_id=50;
That would certainly explain zero rows returned!
To fix that, use an alias for each table. I suggest shorter ones,
SELECT A.id, A.type_id, A.activity_date_time,
B.type_id, B.activity_date_time
FROM civicrm_activity as A
JOIN civicrm_activity as B
ON A.id = B.id
WHERE A.type_id=9
AND B.type_id=50;

SQL Server 2005 -Join based on criteria in table column

Using SQL Server 2005, what is the most efficient way to join the two tables in the following scenario ?
The number of records in each table could be fairly large about 200000 say.
The only way I can currently think of doing this is with the use of cursors and some dynamic SQL for each item which will clearly be very inefficient.
I have two tables - a PERSON table and a SEARCHITEMS table. The SEARCHITEMS table contains a column with some simple criteria which is to be used when matching records with the PERSON table. The criteria can reference any column in the PERSON table.
For example given the following tables :
PERSON table
PERSONID FIRSTNAME LASTNAME GENDER AGE ... VARIOUS OTHER COLUMNS
1 Fred Bloggs M 16
....
200000 Steve Smith M 18
SEARCHITEMS table
ITEMID DESCRIPTION SEARCHCRITERIA
1 Males GENDER = 'M'
2 Aged 16 AGE=16
3 Some Statistic {OTHERCOLUMN >= SOMEVALUE AND OTHERCOLUMN < SOMEVALUE}
....
200000 Males Aged 16 GENDER = 'M' AND AGE = 16
RESULTS table should contain something like this :
ITEMID DESCRIPTION PERSONID LASTNAME
1 Males 1 Bloggs
1 Males 200000 Smith
2 Aged 16 1 Bloggs
....
200000 Males Aged 16 1 Bloggs
It would be nice to be able to just do something like
INSERT INTO RESULTSTABLE
SELECT *
FROM PERSON P
LEFT JOIN SEARCHITEMS SI ON (APPLY SI.SEARCHCRITERIA TO P)
But I can't see a way of making this work. Any help or ideas appreciated.
Seeing that the SEARCHITEMS table is non-relational by nature, it seems that the cursor and dynamic SQL solution is the only workable one. Of course this will be quite slow and I would "pre-calculate" the results to make it somewhat bearable.
To do this create the following table:
CREATE TABLE MATCHEDITEMS(
ITEMID int NOT NULL
CONSTRAINT fkMatchedSearchItem
FOREIGN KEY
REFERENCES SEARCHITEMS(ITEMID),
PERSONID int
CONSTRAINT fkMatchedPerson
FOREIGN KEY
REFERENCES PERSON(PERSONID)
CONSTRAINT pkMatchedItems
PRIMARY KEY (ITEMID, PERSONID)
)
The table will contain a lot of data, but considering it only stores 2 int columns the footprint on disk will be small.
To update this table you create the following triggers:
a trigger on the SEARCHITEMS table which will populate the MATCHEDITEMS table whenever a rule is changed or added.
a trigger on the PERSON table which will run the rules on the updated or added PERSON records.
Results can then simply be presented by joining the 3 tables.
SELECT m.ITEMID, m.DESCRIPTION, m.PERSONID, p.LASTNAME
FROM MATCHEDITEMS m
JOIN PERSON p
ON m.PERSONID = p.PERSONID
JOIN SEARCHITEMS s
ON m.ITEMID = s.ITEMID
You could build your TSQL dynamically, and then execute it with sp_executesql.