What is the best way to run N independent column updates in PostgreSQL? What is the best way to do it in the SQL spec?

What is the best way to run N independent column updates in PostgreSQL? What is the best way to do it in the SQL spec? - sql

I'm looking for a more efficient way to run many columns updates on the same table like this:
UPDATE TABLE table
SET col = regexp_replace( col, 'foo', 'bar' )
WHERE regexp_match( col, 'foo' );
Such that foo, and bar, will be a combination of 40 different regex-replaces. I doubt even 25% of the dataset needs to be updated at all, but what I'm wanting to know is it is possible to cleanly achieve the following in SQL.
A single pass update
A single match of the regex, triggers a single replace
Not running all possible regexp_replaces if only one matches
Not updating all columns if only one needs the update
Not updating a row if no column has changed
I'm also curious, I know in MySQL (bear with me)
UPDATE foo SET bar = 'baz'
Has an implicit WHERE bar != 'baz' clause
However, in PostgreSQL I know this doesn't exist: I think I could at least answer one of my questions if I knew how to skip a single row's update if the target columns weren't updated.
Something like
UPDATE TABLE table
SET col = *temp_var* = regexp_replace( col, 'foo', 'bar' )
WHERE col != *temp_var*

Do it in code. Open up a cursor, then: grab a row, run it through the 40 regular expressions, and if it changed, save it back. Repeat until the cursor doesn't give you any more rows.
Whether you do it that way or come up with the magical SQL expression, it's still going to be a row scan of the entire table, but the code will be much simpler.
Experimental Results
In response to criticism, I ran an experiment. I inserted 10,000 lines from a documentation file into a table with a serial primary key and a varchar column. Then I tested two ways to do the update. Method 1:
in a transaction:
opened up a cursor (select for update)
while reading 100 rows from the cursor returns any rows:
for each row:
for each regular expression:
do the gsub on the text column
update the row
This takes 1.16 seconds with a locally connected database.
Then the "big replace," a single mega-regex update:
update foo set t =
regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(t,
E'\bcommit\b', E'COMMIT'),
E'\b9acf10762b5f3d3b1b33ea07792a936a25e45010\b',
E'9ACF10762B5F3D3B1B33EA07792A936A25E45010'),
E'\bAuthor:\b', E'AUTHOR:'),
E'\bCarl\b', E'CARL'), E'\bWorth\b',
E'WORTH'), E'\b\b',
E''), E'\bDate:\b',
E'DATE:'), E'\bMon\b', E'MON'),
E'\bOct\b', E'OCT'), E'\b26\b',
E'26'), E'\b04:53:13\b', E'04:53:13'),
E'\b2009\b', E'2009'), E'\b-0700\b',
E'-0700'), E'\bUpdate\b', E'UPDATE'),
E'\bversion\b', E'VERSION'),
E'\bto\b', E'TO'), E'\b2.9.1\b',
E'2.9.1'), E'\bcommit\b', E'COMMIT'),
E'\b61c89e56f361fa860f18985137d6bf53f48c16ac\b',
E'61C89E56F361FA860F18985137D6BF53F48C16AC'),
E'\bAuthor:\b', E'AUTHOR:'),
E'\bCarl\b', E'CARL'), E'\bWorth\b',
E'WORTH'), E'\b\b',
E''), E'\bDate:\b',
E'DATE:'), E'\bMon\b', E'MON'),
E'\bOct\b', E'OCT'), E'\b26\b',
E'26'), E'\b04:51:58\b', E'04:51:58'),
E'\b2009\b', E'2009'), E'\b-0700\b',
E'-0700'), E'\bNEWS:\b', E'NEWS:'),
E'\bAdd\b', E'ADD'), E'\bnotes\b',
E'NOTES'), E'\bfor\b', E'FOR'),
E'\bthe\b', E'THE'), E'\b2.9.1\b',
E'2.9.1'), E'\brelease.\b',
E'RELEASE.'), E'\bThanks\b',
E'THANKS'), E'\bto\b', E'TO'),
E'\beveryone\b', E'EVERYONE'),
E'\bfor\b', E'FOR')
The mega-regex update takes 0.94 seconds to update.
At 0.94 seconds compared to 1.16, it's true that the mega-regex update is faster, running in 81% of the time of doing it in code. It is not, however a lot faster. And ye Gods, look at that update statement. Do you want to write that, or try to figure out what went wrong when Postgres complains that you dropped a parenthesis somewhere?
Code
The code used was:
def stupid_regex_replace
sql = Select.new
sql.select('id')
sql.select('t')
sql.for_update
sql.from(TABLE_NAME)
Cursor.new('foo', sql, {}, #db) do |cursor|
until (rows = cursor.fetch(100)).empty?
for row in rows
for regex, replacement in regexes
row['t'] = row['t'].gsub(regex, replacement)
end
end
sql = Update.new(TABLE_NAME, #db)
sql.set('t', row['t'])
sql.where(['id = %s', row['id']])
sql.exec
end
end
end
I generated the regular expressions dynamically by taking words from the file; for each word "foo", its regular expression was "\bfoo\b" and its replacement string was "FOO" (the word uppercased). I used words from the file to make sure that replacements did happen. I made the test program spit out the regex's so you can see them. Each pair is a regex and the corresponding replacement string:
[[/\bcommit\b/, "COMMIT"],
[/\b9acf10762b5f3d3b1b33ea07792a936a25e45010\b/,
"9ACF10762B5F3D3B1B33EA07792A936A25E45010"],
[/\bAuthor:\b/, "AUTHOR:"],
[/\bCarl\b/, "CARL"],
[/\bWorth\b/, "WORTH"],
[/\b<cworth#cworth.org>\b/, "<CWORTH#CWORTH.ORG>"],
[/\bDate:\b/, "DATE:"],
[/\bMon\b/, "MON"],
[/\bOct\b/, "OCT"],
[/\b26\b/, "26"],
[/\b04:53:13\b/, "04:53:13"],
[/\b2009\b/, "2009"],
[/\b-0700\b/, "-0700"],
[/\bUpdate\b/, "UPDATE"],
[/\bversion\b/, "VERSION"],
[/\bto\b/, "TO"],
[/\b2.9.1\b/, "2.9.1"],
[/\bcommit\b/, "COMMIT"],
[/\b61c89e56f361fa860f18985137d6bf53f48c16ac\b/,
"61C89E56F361FA860F18985137D6BF53F48C16AC"],
[/\bAuthor:\b/, "AUTHOR:"],
[/\bCarl\b/, "CARL"],
[/\bWorth\b/, "WORTH"],
[/\b<cworth#cworth.org>\b/, "<CWORTH#CWORTH.ORG>"],
[/\bDate:\b/, "DATE:"],
[/\bMon\b/, "MON"],
[/\bOct\b/, "OCT"],
[/\b26\b/, "26"],
[/\b04:51:58\b/, "04:51:58"],
[/\b2009\b/, "2009"],
[/\b-0700\b/, "-0700"],
[/\bNEWS:\b/, "NEWS:"],
[/\bAdd\b/, "ADD"],
[/\bnotes\b/, "NOTES"],
[/\bfor\b/, "FOR"],
[/\bthe\b/, "THE"],
[/\b2.9.1\b/, "2.9.1"],
[/\brelease.\b/, "RELEASE."],
[/\bThanks\b/, "THANKS"],
[/\bto\b/, "TO"],
[/\beveryone\b/, "EVERYONE"],
[/\bfor\b/, "FOR"]]
If this were a hand-generated list of regex's, and not automatically generated, my question is still appropriate: Which would you rather have to create or maintain?

For the skip update, look at suppress_redundant_updates - see http://www.postgresql.org/docs/8.4/static/functions-trigger.html.
This is not necessarily a win - but it might well be in your case.
Or perhaps you can just add that implicit check as an explicit one?

Related

Snowflake ignores statement in where clause where I'm comparing timestamps

so I'm building a SCD type 2 in snowflake, but it ignores the where clause in which is comparision between "to_timestamp" and "expiry_date". Expiry_date is a variable that is set to '9999-08-17 07:31:29.901000000' (as infinity) and To_timestamp is a column in table. I want to query only the rows that have to_timestamp set to infinity (they are still active) but snowflake seems to ignore this part of where clause. Below is some of the code (it should update the rows that are expired - that means change their "to_timestamp" to current time. and it does but it does to rows with timestamps of all kind - it ignores last line)
SET EXPIRY_DATE_NTZ = '9999-08-17 07:31:29.901000000';
SET CURRENT_DATE_NTZ = TO_TIMESTAMP_NTZ(CURRENT_TIMESTAMP());
UPDATE CUSTOMER_TARGET CT
SET CT.TO_TIMESTAMP = $CURRENT_DATE_NTZ
FROM POC.SNOWFLAKE_POC.CUSTOMER_STAGE CS
WHERE CT.C_CUSTOMER_ID = CS.C_CUSTOMER_ID
AND (CT.C_FIRST_NAME <> CS.C_FIRST_NAME OR CT.C_LAST_NAME <> CS.C_LAST_NAME OR CT.C_BIRTH_YEAR
<> CS.C_BIRTH_YEAR OR CT.C_BIRTH_COUNTRY <> CS.C_BIRTH_COUNTRY OR CT.C_LAST_REVIEW_DATE<>CS.C_LAST_REVIEW_DATE)
AND CT.TO_TIMESTAMP = $EXPIRY_DATE_NTZ;
I have two of these update statements (one for updates and one for deletes) and a merge statement for inserts. And it ignores the comparision in every single one, updating the rows that have "to_timestamp" set to something like "2021-08-24 07:11:53.510000000". I've tried every combination possible (between ... and ..., >= ... <=, <=, >=, comparing in "case" statement of update,...) - nothing. What could be the cause/solution?

As we do not know the structure of CUSTOMER_TARGET I would suggest to explicitly set the data type of EXPIRY_DATE_NTZ variable to match the column data type:
SET EXPIRY_DATE_NTZ = '9999-08-17 07:31:29.901000000';
SELECT $EXPIRY_DATE_NTZ;
DESCRIBE RESULT LAST_QUERY_ID();
to:
-- TIMESTAMP_NTZ as an example
SET EXPIRY_DATE_NTZ = '9999-08-17 07:31:29.901000000'::TIMESTAMP_NTZ;
SELECT $EXPIRY_DATE_NTZ;
DESCRIBE RESULT LAST_QUERY_ID();
By doing that way there are no "implicit conversions" involved in the process.
Another advice is usage of IS DISTINCT FROM instead of <>. IS DISTINCT FROM is NULL safe, which is important if columns are defined as nullable.
UPDATE CUSTOMER_TARGET CT
SET CT.TO_TIMESTAMP = $CURRENT_DATE_NTZ
FROM POC.SNOWFLAKE_POC.CUSTOMER_STAGE CS
WHERE CT.C_CUSTOMER_ID = CS.C_CUSTOMER_ID
AND (CT.C_FIRST_NAME IS DISTINCT FROM CS.C_FIRST_NAME
OR CT.C_LAST_NAME IS DISTINCT FROM CS.C_LAST_NAME
OR CT.C_BIRTH_YEAR IS DISTINCT FROM CS.C_BIRTH_YEAR
OR CT.C_BIRTH_COUNTRY IS DISTINCT FROM CS.C_BIRTH_COUNTRY
OR CT.C_LAST_REVIEW_DATE IS DISTINCT FROM CS.C_LAST_REVIEW_DATE)
AND CT.TO_TIMESTAMP = $EXPIRY_DATE_NTZ;

Your SQL does not have any issues with the filters (ORs are surrounded by the brackets etc). I assume that you have checked the execution profile, and did not see your filter (CT.TO_TIMESTAMP = '9999-08-17 07:31:29.901000000'). In this case, all rows in the target table should have this value in the column TO_TIMESTAMP.
I highly recommend you check the data first. If you are running multiple UPDATE/MERGE commands, you may miss that the data has already updated with this value.

Ordering by sub string of a field with OPNQRYF

I have a requirement where I need to change the order in which records are printed in a Report. I need to order the records by a substring of a field of the records.
There is an OPNQRYF as below before the call to the report RPG is made:
OVRDBF FILE(MOHDL35) SHARE(*YES)
BLDQRYSLT QRYSLT(&QRYSLT) +
SELECT((CHARDT *GE &FRDATE F2) +
(CHARDT *LE &TODATE F2) +
(HDPLVL *EQ 'FS' F2) +
(HDMPLT *EQ &PLANT F2))
OPNQRYF FILE((*LIBL/MOHDL35)) +
QRYSLT(&QRYSLT) +
KEYFLD(*FILE) +
MAPFLD((ZONEDT HDAEDT *ZONED 8 0) +
(CHARDT ZONEDT *CHAR 8))
One way I see how to do this is to do a RUNSQL to create a temp table in qtemp with the MOHDL35 records in the required order. The substr SQL function would help to achieve this much easier. This should have the same structure as that of MOHDL35 (FIELD NAMES, RECORD FORMAT)
Then replace the use of this file in the RPG program with the newly created table name. I havent tried this yet, but would this work? does it sound like a good idea? Are there any better suggestions?

You can do that with OPNQRYF by using the MAPFLD parameter like this:
OPNQRYF FILE((JCVCMP))
KEYFLD((*MAPFLD/PART))
MAPFLD((PART '%SST(VCOMN 2 5)'))
The fields in JCVCOMN are now sorted like this:
VENNO VCMTP VCMSQ VCOMN
----- ----- ----- -------------------------
1,351 ICL 3 Let's see what wow
1,351 ICL 1 This is a test
1,351 NDA 2 another comment
1,351 NDA 1 more records
Notice that the records are sorted by the substring of VCOMN starting with the second character.
So here is your OPNQRYF with multiple key fields specified
OPNQRYF FILE((*LIBL/MOHDL35))
QRYSLT(&QRYSLT)
KEYFLD((*MAPFLD/CHARDT) (*MAPFLD/HDPROD))
MAPFLD((ZONEDT HDAEDT *ZONED 8 0) (CHARDT ZONEDT *CHAR 8)
(HDPROD '%SST(HDPROD 1 2) *CAT %SST(HDPROD 10 12)
*CAT %SST(HDPROD 13 16)'))
Some notes: I am guessing that HDAEDT is a PACKED number. If so, you don't need to map it to a ZONED number just to get it to a character value. If you need the ZONED value, that is ok (but PACKED should work just as well). Otherwise, you can just use:
MAPFLD((CHARDT HDAEDT *CHAR 8))
Also in your OVRDBF, you need to make sure you choose the correct Override Scope OVRSCOPE. The IBM default is OVRSCOPE(*ACTGRPDFN). OPNQRYF also has a scope OPNSCOPE. You need to make sure that the OVRSCOPE, the OPNSCOPE, and the program using the table all use the same activation group. There are a lot of different combinations. If you can't make it work, you can always change them all to *JOB, and that will work. But there is nothing intrinsic about OPNQRYF that prevents it from working from a CLP.

I would try creating a view with all the table fields plus the substring'd column, and then use OPNQRYF with that instead of the table, specifying the substring'd column as the KEYFLD. That would probably be simpler (& potentially quicker) than copying the whole lot into QTEMP every time.

Check existence of given text in a table

I have a course code name COMP2221.
I also have a function finder(int) that can find all codes matching a certain pattern.
Like:
select * from finder(20004)
will give:
comp2211
comp2311
comp2411
comp2221
which match the pattern comp2###.
My question is how to express "whether comp2221 is in finder(20004)" in a neat way?

How to express "whether comp2221 is in finder(20004)" in a neat way?
Use an EXISTS expression and put the test into the WHERE clause:
SELECT EXISTS (SELECT FROM finder(20004) AS t(code) WHERE code = 'comp2221');
Returns a single TRUE or FALSE. Never NULL and never more than one row - even if your table function finder() returns duplicates.
Or fork the function finder() to integrate the test directly. Probably faster.

Foxpro String Variable combination in Forloop

As in title, there is an error in my first code in FOR loop: Command contains unrecognized phrase. I am thinking if the method string+variable is wrong.
ALTER TABLE table1 ADD COLUMN prod_n c(10)
ALTER TABLE table1 ADD COLUMN prm1 n(19,2)
ALTER TABLE table1 ADD COLUMN rbon1 n(19,2)
ALTER TABLE table1 ADD COLUMN total1 n(19,2)
There are prm2... until total5, in which the numbers represent the month.
FOR i=1 TO 5
REPLACE ALL prm+i WITH amount FOR LEFT(ALLTRIM(a),1)="P" AND
batch_mth = i
REPLACE ALL rbon+i WITH amount FOR LEFT(ALLTRIM(a),1)="R"
AND batch_mth = i
REPLACE ALL total+i WITH sum((prm+i)+(rbon+i)) FOR batch_mth = i
NEXT
ENDFOR
Thanks for the help.

There are a number of things wrong with the code you posted above. Cetin has mentioned a number of them, so I apologize if I duplicate some of them.
PROBLEM 1 - in your ALTER TABLE commands I do not see where you create fields prm2, prm3, prm4, prm5, rbon2, rbon3, etc.
And yet your FOR LOOP would be trying to write to those fields as the FOR LOOP expression i increases from 1 to 5 - if the other parts of your code was correct.
PROBLEM 2 - You cannot concatenate a String to an Integer so as to create a Field Name like you attempt to do with prm+i or rbon+1
Cetin's code suggestions would work (again as long as you had the #2, #3, etc. fields defined). However in Foxpro and Visual Foxpro you can generally do a task in a variety of ways.
Personally, for readability I'd approach your FOR LOOP like this:
FOR i=1 TO 5
* --- Keep in mind that unless fields #2, #3, #4, & #5 are defined ---
* --- The following will Fail ---
cFld1 = "prm" + STR(i,1) && define the 1st field
cFld2 = "rbon" + STR(i,1) && define the 2nd field
cFld3 = "total" + STR(i,1) && define the 3rd field
REPLACE ALL &cFld1 WITH amount ;
FOR LEFT(ALLTRIM(a),1)="P" AND batch_mth = i
REPLACE ALL &cFld2 WITH amount ;
FOR LEFT(ALLTRIM(a),1)="R" AND batch_mth = i
REPLACE ALL &cFld3 WITH sum((prm+i)+(rbon+i)) ;
FOR batch_mth = i
NEXT
NOTE - it might be good if you would learn to use VFP's Debug tools so that you can examine your code execution line-by-line in the VFP Development mode. And you can also use it to examine the variable values.
Breakpoints are good, but you have to already have the TRACE WINDOW open for the Break to work.
SET STEP ON is the Debug command that I generally use so that program execution will stop and AUTOMATICALLY open the TRACE WINDOW for looking at code execution and/or variable values.

Do you mean you have fields named prm1, prm2, prm3 ... prm12 that represent the months and you want to update them in a loop? If so, you need to understand that a "fieldName" is a "name" and thus you need to use a "name expression" to use it as a variable. That is:
prm+i
would NOT work but:
( 'pro'+ ltrim(str(m.i)) )
would.
For example here is your code revised:
For i=1 To 5
Replace All ('prm'+Ltrim(Str(m.i))) With amount For Left(Alltrim(a),1)="P" And batch_mth = m.i
Replace All ('rbon'+Ltrim(Str(m.i))) With amount For Left(Alltrim(a),1)="R" And batch_mth = m.i
* ????????? REPLACE ALL ('total'+Ltrim(Str(m.i))) WITH sum((prm+i)+(rbon+i)) FOR batch_mth = i
Endfor
However, I must admit, your code doesn't make sense to me. Maybe it would be better if you explained what you are trying to do and give some simple data with the result you expect (as code - you can use FAQ 50 on foxite to create code for data).

Oracle: updating column IF ONLY it is not in condition of Merge statement

I'm having trouble with this Oracle Merge query.
merge into NEW_DOCTORS NP
USING OLD_DOCTORS MD
on (MD.SSN=NP.SSN OR MD.NPI=NP.NPI OR MD.PIN=NP.PIN)
WHEN MATCHED THEN
UPDATE SET
NP.othervalues=MD.othervalues
NP.PIN/SSN/NPI=MD.PIN/SSN/NPI (**update two of these three that did not return TRUE in the above ON condition)
WHEN NOT MATCHED THEN
insert(NP.SSN,NP.NPI,NP.PIN,NP.other_values)
values(MD.SSN,MD.NPI,MD.PIN,MD.other_values)
Is this possible? Thanks!
EDIT: I'm asking because I read somewhere that fields that are in the ON condition can't be updated. However I'm not sure if the author was talking only in context of the field that evaluates true or for all fields. I'm hoping there's someone who might have an idea about this, as well as any workarounds.

You can use CASE WHEN ... :
UPDATE SET
NP.othervalues=MD.othervalues
,NP.PIN = CASE WHEN (MD.SSN=NP.SSN OR MD.NPI=NP.NPI) THEN MD.PIN ELSE NP.PIN END
,--add here the 2 others
This means, when you have MD.SSN=NP.SSN OR MD.NPI=NP.NPI true, then update NP.PIN with MD.PIN else let the same value NP.PIN
Or you can do 3 differents MERGE, so it will be more readable.
EDIT
Thinking about it, if you update only when this is not the same value (cause it will join only on same value) you can directly update them with the MD table value.
Do that :
NP.othervalues=MD.othervalues
,NP.PIN=MD.PIN
,NP.SSN=MD.SSN
,NP.NPI=MD.NPI
If they match, the update will keep same value, if not it will update the value.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

What is the best way to run N independent column updates in PostgreSQL? What is the best way to do it in the SQL spec? - sql

For the skip update, look at suppress_redundant_updates - see http://www.postgresql.org/docs/8.4/static/functions-trigger.html. This is not necessarily a win - but it might well be in your case. Or perhaps you can just add that implicit check as an explicit one?

Related

Snowflake ignores statement in where clause where I'm comparing timestamps

Ordering by sub string of a field with OPNQRYF

Check existence of given text in a table

Foxpro String Variable combination in Forloop

Oracle: updating column IF ONLY it is not in condition of Merge statement

Categories

Resources