Regex comparison in Oracle between 2 varchar columns (from different tables) - sql

I am trying to find a way to capture relevant errors from oracle alertlog. I have one table (ORA_BLACKLIST) with column values as below (these are the values which I want to ignore from
V$DIAG_ALERT_EXT)
Below are sample data in ORA_BLACKLIST table. This table can grow based on additional error to ignore from alertlog.
ORA-07445%[kkqctdrvJPPD
ORA-07445%[kxsPurgeCursor
ORA-01013%
ORA-27037%
ORA-01110
ORA-2154
V$DIAG_ALERT_EXT contains a MESSAGE_TEXT column which contains sample text like below.
ORA-01013: user requested cancel of current operation
ORA-07445: exception encountered: core dump [kxtogboh()+22] [SIGSEGV] [ADDR:0x87] [PC:0x12292A56]
ORA-07445: exception encountered: core dump [java_util_HashMap__get()] [SIGSEGV]
ORA-00600: internal error code arguments: [qercoRopRowsets:anumrows]
I want to write a query something like below to ignore the black listed errors and only capture relevant info like below.
select
dae.instance_id,
dae.container_name,
err_count,
dae.message_level
from
ORA_BLACKLIST ob,
V$DIAG_ALERT_EXT dae
where
group by .....;
Can someone suggest a way or sample code to achieve it?
I should have provided the exact contents of blacklist table. It currently contains some regex (perl) and I want to convert it to oracle like regex and compare with v$diag_alert_ext message_text column. Below are sample perl regex in my blacklist table.
ORA-0(,|$| )
ORA-48913
ORA-00060
ORA-609(,|$| )
ORA-65011
ORA-65020 ORA-31(,|$| )
ORA-7452 ORA-959(,|$| )
ORA-3136(,|)|$| )
ORA-07445.[kkqctdrvJPPD
ORA-07445.[kxsPurgeCursor –

Your blacklist table looks like like patterns, not regular expressions.
You can write a query like this:
select dae.* -- or whatever columns you want
from V$DIAG_ALERT_EXT dae
where not exists (select 1
from ORA_BLACKLIST ob
where dae.message_text like ob.<column name>
);
This will not have particularly good performance if the tables are large.

Related

PgSQL insert into select after cast

I have a table with many columns stored as text. These columns have information which i am planning to cast and copy the data into a fresh table with the same column names but with correct data types ( float8 )
I tested the SELECT with CAST operator "::" and it works fine. All the columns are being converted as you can see in the picture
However , when I uncomment the INSERT statement ( above it ) to start writing in the target table, it throws an error. The target table has identical column names and only float8 column types.
The expression from the source table is indeed of type text but i am using the cast operator so why does it not work like before when only running the SELECT statement?
My query below:
INSERT INTO "PM"."new_VM_gcell_evolution_hourly_BSC"
(select
"CR3120:Channel_Assignment_Failures_All_Channels_Busy_or_Channel"::float8,
"DL_Mean_Quality"::float8,
"A312Ca:Failed_Assignments_during_MTC_on_the_A_Interface_Includi"::float8,
"nsp_Urban_TA_4number"::float8,
"A3129C:Failed_Assignments_First_Assignment,_Assignment_Timed_Ou"::float8,
"R3120C:Channel_Assignment_Failures_All_Channels_Busy_or_Channel"::float8,
"TSs_Interf_B5"::float8,
"R3120D:Channel_Assignment_Failures_All_Channels_Busy_or_Channel"::float8,
"HO_Out_Internal_Succ_Rate"::float8,
"CH_Req_Protocol_Undefinednbnumber"::float8,
"UL_Drop_Congnumber"::float8,
"Timing_Adv"::float8,
"DL_Qual"::float8,
"A3100C:Assignment_Requests_TCHH_Only"::float8,
"MS_to_BTS_max_distance"::float8,
"TSs_Interf_B2"::float8,
"UL_Qual"::float8,
"UL_Drop_EDGE_Rate"::float8,
"Avg_BTS_Power_Level_AMR"::float8,
"DL_Drop_N3105number"::float8,
"nsp_Urban_TA_2_R"::float8,
"CH_Req_CallReestabnbnumber"::float8,
"TCH_Drops_cause_Timing_Advance"::float8,
"TCH_Drops_per_Erlang"::float8,
"UL_Drop_Flushnumber"::float8,
"A312F:Number_of_Assignment_Failures_No_Abis_Resource_Available"::float8,
"UL_Drop_GPRS_Rate"::float8,
"nsp_Urban_TA_3number"::float8,
"Call_Drop_rate"::float8,
"DL_Fail_Assingnumber"::float8,
"SDCCH_Radio_Failures"::float8,
"HO_Cmd_UL_Qual"::float8,
"UL/DL_RxQualnumber"::float8,
"UL/DL_RxLevnumber"::float8,
"UL_Fail_EDGE_Rate"::float8,
"Call_Drop_Total"::float8,
"HO_Out_Succ_Rate"::float8,
"Date"::text,
"A3127E:Failed_Assignments_during_Call_Reestablishment_on_the_Um"::float8,
"UL_Drop_N3103number"::float8,
"Configured_TCH"::float8,
"SDCCH_Drop_Call_Rate_only_LU"::float8,
"A312M:Failed_Assignments_Reconnection_to_Old_Channels,_No_Chann"::float8,
"SDCCH_Blocking_Rate"::float8,
"TCH_Drops_cause_DL_FER"::float8,
"TCH_Fail_rate"::float8,
"DL_Drop_Preemnumber"::float8,
"FR_Traffic_totalErl"::float8,
"DL_Congnumber"::float8,
"TCH_Assign_Unsuccnumber"::float8,
"nsp_Urban_TA_0_R"::float8,
"DL_Drop_GPRS_Rate"::float8,
"HR_TraficErl"::float8,
"SDCCH_Cong_Rate"::float8,
"Abis_and_Ater_Interface_Anlysistimes"::float8,
"A3100B:Assignment_Requests_TCHF_Only"::float8,
"AS4300D:Mean_Uplink_Level_during_Radio_Link_Failure_SDCCHdB"::float8,
"nsp_Urban_TA_89number"::float8,
"TCH_Drops_cause_DownQual"::float8,
"SDCCH_Drop_RxLev"::float8,
"Max_UL_Pwr_Duration"::float8,
"nsp_Channel_Req_LUnumber"::float8,
"TCH_HR_Initialy_Config"::float8,
"HO_Inc_Cong_Rate"::float8,
"DL_RxLev_avgdBm"::float8,
"TCH_Assign_Requestsnumber"::float8,
"A312K:Failed_Assignments_First_Assignment,_No_Channel_Available"::float8,
"Call_Drop_TCH_Quality_Rate"::float8,
"AMR_HR_TraficErl"::float8,
"nsp_Urban_TA_67_R"::float8,
"UL_Fail_MSnumber"::float8,
"Time"::text,
"TCH_Drops_cause_UpLevel"::float8,
"AMR_FR_TraficErl"::float8,
"TCH_FR_Initialy_Config"::float8,
"UL_Drop_N3101number"::float8,
"DL_FERnumber"::float8,
"SDCCH_Drop_Call_Rate_wo_LU"::float8,
"DL_Mean_StrengthdB"::float8,
"Call_Drop_Abis"::float8,
"SDCCH_Initialy_Confignumber"::float8,
"Call_Drop_TCH_RxLev_Rate"::float8,
"TA_Meannumber"::float8,
"nsp_Urban_TA_67number"::float8,
"AS3240NA:Average_MS_Power_Level_of_NonAMR_Call"::float8,
"Load"::float8,
"A3129Q:Failed_Assignments_Reconnection_to_Old_Channels,_Timer_E"::float8,
"CH_Request_11_bit"::float8,
"AS4340D:Mean_TA_during_Radio_Link_Failure_SDCCH"::float8,
"S4210A:Uplink_Interference_Indication_Messages_SDCCH"::float8,
"UL_FERnumber"::float8,
"A3129P:Failed_Assignments_Reconnection_to_Old_Channels,_Timer_E"::float8,
"K3003A:Successful_SDCCH_Seizures_Call_Type"::float8,
"CH_Req_LAUnbnumber"::float8,
"SDCCH_Drop_Rate"::float8,
"GBSC"::text,
"CS_CSSR_with_SDCCH_blocks_wo_LU"::float8,
"A312Aa:Failed_Assignments_during_MOC_on_the_A_Interface_Includi"::float8,
"Avg_BTS_Power_Level"::float8,
"R3120A:Channel_Assignment_Failures_All_Channels_Busy_or_Channel"::float8,
"A312S:Failed_Assignments_Signaling_Channel"::float8,
"SD_Fail_MoC"::float8,
"UL_RxQual_avg"::float8,
"Call_Drop_HO"::float8,
"AS4330D:Mean_Downlink_Quality_during_Radio_Link_Failure_SDCCH"::float8,
"CH_Req_Emerg_Callsnbnumber"::float8,
"UL_RxQualnumber"::float8,
"UL_Fail_OtherCausenumber"::float8,
"A312Da:Failed_Assignments_during_Emergency_Call_on_the_A_Interf"::float8,
"UL_Drop_Preemnumber"::float8,
"TCH_Drops_cause_UpDown_FER"::float8,
"AS4320D:Mean_Downlink_Level_during_Radio_Link_Failure_SDCCHdB"::float8,
"A3129O:Failed_Assignments_First_Assignment,_Directed_Retry_Time"::float8,
"Max_DL_Pwr_Duration"::float8,
"UL_Drop_Suspendnumber"::float8,
"TCH_Assign_Congestionnumber"::float8,
"Transmission_Resource_Analysistimes"::float8,
"HO_Out_RxQual_rate"::float8,
"nsp_Urban_TA_1013number"::float8,
"A3129I:Failed_Assignments_Invalid_State"::float8,
"DL_Fail_OtherCausenumber"::float8,
"CR3129:Channel_Assignment_Failures_All_Channels_Busy_or_Channel"::float8,
"HO_Inc_Successnumber"::float8,
"A3129B:Failed_Assignments_First_Assignment,_Terrestrial_Resourc"::float8,
"Trafic_totalErl"::float8,
"nsp_urban_TA_1463all_R"::float8,
"HO_Out_RxQualnumber"::float8,
"DL_RxQualnumber"::float8,
"HO_Inc_Succ_Rate"::float8,
"SDCCH_Fail_Rate"::float8,
"SDCCH_Non_Radio_Drops"::float8,
"nsp_Urban_TA_0number"::float8,
"A3129E:Failed_Assignments_CIC_Unavailable"::float8,
"nsp_CH_Request_PSnumber"::float8,
"TCH_Assign_Unsucc_rate"::float8,
"Call_Drop_Radio"::float8,
"EFR_TraficErl"::float8,
"A312L:Failed_Assignments_Reconnection_to_Old_Channels,_No_Chann"::float8,
"TCH_Drops_cause_DownLevel"::float8,
"nsp_Urban_TA_5number"::float8,
"Abnormal_Terminals_Analysistimes"::float8,
"HO_Inc_Congnumber"::float8,
"CSSR_Rate"::float8,
"M3020C:Call_Drops_on_SDCCHQuality"::float8,
"RH333:Handover_Drop_Rate_of_TCH"::float8,
"UL_Drop_EDGE_Abisnumber"::float8,
"A3129J:Failed_Assignments_Invalid_Message"::float8,
"TCH_Availability"::float8,
"UL_Level"::float8,
"nsp_Urban_TA_4_R"::float8,
"UL_Drop_OtherCausenumber"::float8,
"HR_Traffic_Rate"::float8,
"DL_Level"::float8,
"CH_Request_8_bit"::float8,
"UL_Fail_MS_Assingnumber"::float8,
"DL_Fail_BSC_Commandnumber"::float8,
"A3129R:Failed_Assignments_Reconnection_to_Old_Channels,_Reconne"::float8,
"UL_Fail_BSC_Commandnumber"::float8,
"DL_Drop_Suspendnumber"::float8,
"nsp_urban_TA_1463allnumber"::float8,
"Direct_Retry"::float8,
"nsp_Urban_TA_89_R"::float8,
"CH_Req_PSnbnumber"::float8,
"A3129N:Failed_Assignments_Reconnection_to_Old_Channels,_Terrest"::float8,
"A312A:Failed_Assignments_First_Assignment,_No_Channel_Available"::float8,
"nsp_Urban_TA_2number"::float8,
"DL_RxQual_avg"::float8,
"UL_Mean_StrengthdB"::float8,
"DL_Drop_EDGE_Rate"::float8,
"A3129T:Failed_Assignments_No_Ater_Resource_Available"::float8,
"TSs_Interf_B1"::float8,
"A3129D:Failed_Assignments_Reconnection_to_Old_Channels,_Reconne"::float8,
"nsp_Channel_Req_MOCnumber"::float8,
"UL_Fail_GPRS_Rate"::float8,
"S4210B:Downlink_Interference_Indication_Messages_SDCCH"::float8,
"R3120E:Channel_Assignment_Failures_All_Channels_Busy_or_Channel"::float8,
"HO_Inc_Failnumber"::float8,
"DL_Drop_Flushnumber"::float8,
"S3655:Number_of_configured_TRXs_in_a_cell"::float8,
"TCH_Drops_cause_UpDown_Qual"::float8,
"CR3005:Number_of_Initially_Configured_Channels_Static_PDTCH_Sup"::float8,
"UL/DL_FERnumber"::float8,
"SDCCH_Blocking_"::float8,
"HO_Out_InterRAT_Succ_Rate"::text,
"CM30E:Call_Drops_on_SDCCH_Location_Updating"::float8,
"AS4310D:Mean_Uplink_Quality_during_Radio_Link_Failure_SDCCH"::float8,
"UL_Congnumber"::float8,
"TAnumber"::float8,
"Configured_SDCCH"::float8,
"Othernumber"::float8,
"TCH_Assign_Fail_Radionumber"::float8,
"HO_Out_Internal_Req_Nb"::float8,
"SDCCH_Cong_Nbnumber"::float8,
"UL_Mean_Quality"::float8,
"HO_Out_External_Req_Nb"::float8,
"nsp_Urban_TA_1013_R"::float8,
"SDCCH_Drops_wo_LU"::float8,
"Call_Drop_TCH_Qualitynumber"::float8,
"A3100A:Assignment_Requests_Signaling_Channel_TCH"::float8,
"Better_Cell"::float8,
"Call_Drop_TCH_RxLevnumber"::float8,
"nsp_Channel_Req_MTCnumber"::float8,
"DL_Fail_EDGE_Rate"::float8,
"ZTR104B:Call_Drop_Rate_on_SDCCH_Call_Type"::float8,
"Call_Drop_no_MR"::float8,
"DL_Drop_Congnumber"::float8,
"HO_Inc_InterRAT_reqnumber"::float8,
"TCH_Assign_Cong_rate"::float8,
"HO_Inc_Fail_Rate"::float8,
"TCH_Assign_Fail_sp_ver_unav"::float8,
"HO_Inc_InterRAT_unsucc"::float8,
"A3170A:Number_of_Completed_TCH_Assignments_CSFB_MOC"::float8,
"HO_Inc_Unsucc_Rate"::float8,
"TSs_Interf_B4"::float8,
"A3100K:Assignment_Requests_Signaling_Channel_SDCCH"::float8,
"TCH_Drops_cause_UpDown_Level"::float8,
"SDCCH_Fail_Nbnumber"::float8,
"TCH_Non_Radio_Drops"::float8,
"DL_Fail_MSnumber"::float8,
"HO_Cmd_DL_Qual"::float8,
"DL_Drop_OtherCausenumber"::float8,
"SMS_on_SDCCH"::float8,
"FERnumber"::float8,
"HO_Inc_Reqnumber"::float8,
"nsp_Urban_TA_3_R"::float8,
"M3020D:Call_Drops_on_SDCCHOther"::float8,
"HO_Out_External_Succ_Rate"::float8,
"M3020A:Call_Drops_on_SDCCHTA"::float8,
"TCH_Assign_Fail_Radio_rate"::float8,
"CR3001:Number_of_Initially_Configured_Channels_Static_PDCH"::float8,
"HR_Traffic_totalErl"::float8,
"CH_Req_MTCnbnumber"::float8,
"SDCCH_Drop_RxQual"::float8,
"UL_RxLev_avgdBm"::float8,
"CH_Req_MOCnbnumber"::float8,
"SDCCH_Dropnumber"::float8,
"A3169A:Failed_Assignments_Um_Cause"::float8,
"A3129H:Failed_Assignments_Clear_Commands_Sent_By_MSC"::float8,
"TCH_Radio_Drops"::float8,
"HO_Out_InterRAT_Req_Nb"::float8,
"HO_Inc_InterRAT_succnumber"::float8,
"DL_Drop_Abisnumber"::float8,
"TCH_Drops_cause_UL_FER"::float8,
"UL_RxLevelnumber"::float8,
"nsp_CH_Request_CSnumber"::float8,
"A3170B:Number_of_Completed_TCH_Assignments_CSFB_MTC"::float8,
"A3129F:Failed_Assignments_CIC_Allocated"::float8,
"DL_Fail_GPRS_Rate"::float8,
"M3020B:Call_Drops_on_SDCCHReceived_Level"::float8,
"nsp_Urban_TA_1number"::float8,
"SDCCH_Availability"::float8,
"Call_Drop_forced_HO"::float8,
"SDCCH_Drop_Others"::float8,
"Call_Drop_Equip."::float8,
"CH_Req_PS"::float8,
"SDCCH_Drop_Call_Rate"::float8,
"TCH_Drops_cause_UpQual"::float8,
"CH_Req_LMU_Reservednbnumber"::float8,
"DL_RxLevnumber"::float8,
"nsp_Urban_TA_1_R"::float8,
"A312Ea:Failed_Assignments_during_Call_Reestablishment_on_the_A_"::float8,
"TSs_Interf_B3"::float8,
"SDCCH_Drop_TimingAdvance"::float8,
"Avg_BTS_Power_Level_NAMR"::float8,
"TCH_Drops_cause_Other"::float8,
"FR_TraficErl"::float8,
"Avg_MS_Power_Level"::float8,
"HO_Outgoing_Requestsnumber"::float8,
"A3129G:Failed_Assignments_A_Interface_Failure"::float8,
"CH_Req_LAU"::float8,
"nsp_Urban_TA_5_R"::float8,
"HO_Inc_Unsuccessnumber"::float8,
"AS3240A:Average_MS_Power_Level_of_AMR_Call"::float8
from "PM"."VM_gcell_evolution_hourly_BSC_recovered")
It says "column is of type double but expression is of type text" so what's really happening is that it's trying to insert one of the expressions where you cast to ::text into a column that is of type double.
If the problem was converting from text to double, you'd get a different message. Besides, the SELECT worked fine, which means it didn't encounter any text data that it couldn't convert to double.
Since you don't specify the target table columns in your INSERT, and you have so many of them, most likely you missed a column or got the order wrong.
Honestly if the table has 270 columns and they have the same name in the source and destination tables, you should really generate the query using something like python from a list of columns, that will be faster than proofreading the 270 lines...

Parsing JSON file in Snowflake Database

Database :SNOWFLAKE
My table contains JSON data for example:
{
"bucket":"IN_Apps",
"bySeqno":56,
"cas":1527639206906626048,
"content":"eyJoaWdoQmluIjoiNTQ4NTA4MDkiLCJkb2N1bWVudFR5cGUiOiJJSU5ETyIsImNhcmRUeXBlIyayI6Ik1BU1RFUkNBUkQifQ==",
"event":"mutation",
"expiration":0,
"flags":33554432,
"key":"iin54850809",
"lockTime":0,
"partition":948,
"revSeqno":1,
"vBucketUuid":137987627737694
}
when i tried to parse it.
select
parse_json:bucket::string as bucket ,
parse_json:bySeqno::string as bySeqno ,
parse_json:cas::INT as cas ,
parse_json:content::string as content ,
parse_json:event::string as event
,parse_json:expiration::string as expiration
,parse_json:flags::string as flags
,parse_json:key::string as key
,parse_json:lockTime::string as lockTime
,parse_json:partition::string as partition
,parse_json:revSeqno::string as revSeqno
,parse_json:vBucketUuid::string as vBucketUuid
from STG_YS_APPS v
but it is throwing error like.
SQL compilation error: error line 2 at position 0 invalid identifier > >'PARSE_JSON'
may someone please help me.
Answer with a known schema
Update: Since you provided schema, which shows a VAR column of VARIANT type, here's what you need, couldn't be simpler:
select
var:bucket::string as bucket,
var:bySeqno::string as bySeqno,
var:cas::int as cas
...
from STG_YS_APPS v
Below the answer before the schema was known
I'll assume you have a VARCHAR (or similar) column in your table that is called json, and stores the values you presented. You didn't provide the schema, so please adjust the column name as necessary.
You're not using PARSE_JSON as a function in your SQL. You should write something like
select
parse_json(json):bucket::string as bucket,
parse_json(json):bySeqno::string as bySeqno,
parse_json(json):cas::int as cas
...
from STG_YS_APPS v

U-sql error: Expected one of: AS EXCEPT FROM GROUP HAVING INTERSECT OPTION ORDER OUTER UNION UNION WHERE ';' ')' ','

I have a following table:
EstimatedCurrentRevenue -- Revenue column value of yesterday
EstimatedPreviousRevenue --- Revenue column value of current day
crmId
OwnerId
PercentageChange.
I am querying two snapshots of the similarly structured data in Azure data lake and trying to query the percentage change in Revenue.
Following is my query i am trying to join on OpportunityId to get the difference between the revenue values:
#opportunityRevenueData = SELECT (((opty.EstimatedCurrentRevenue - optyPrevious.EstimatedPreviousRevenue)*100)/opty.EstimatedCurrentRevenue) AS PercentageRevenueChange, optyPrevious.EstimatedPreviousRevenue,
opty.EstimatedCurrentRevenue, opty.crmId, opty.OwnerId From #opportunityCurrentData AS opty JOIN #opportunityPreviousData AS optyPrevious on opty.OpportunityId == optyPrevious.OpportunityId;
But i get the following error:
E_CSC_USER_SYNTAXERROR: syntax error. Expected one of: AS EXCEPT FROM
GROUP HAVING INTERSECT OPTION ORDER OUTER UNION UNION WHERE ';' ')'
','
at token 'From', line 40
near the ###:
This expression is having the problem i know but not sure how to fix it.
(((opty.EstimatedCurrentRevenue - optyPrevious.EstimatedPreviousRevenue)*100)/opty.EstimatedCurrentRevenue)
Please help, i am completely new to U-sql
U-SQL is case-sensitive (as per here) with all SQL reserved words in UPPER CASE. So you should capitalise the FROM and ON keywords in your statement, like this:
#opportunityRevenueData =
SELECT (((opty.EstimatedCurrentRevenue - optyPrevious.EstimatedPreviousRevenue) * 100) / opty.EstimatedCurrentRevenue) AS PercentageRevenueChange,
optyPrevious.EstimatedPreviousRevenue,
opty.EstimatedCurrentRevenue,
opty.crmId,
opty.OwnerId
FROM #opportunityCurrentData AS opty
JOIN
#opportunityPreviousData AS optyPrevious
ON opty.OpportunityId == optyPrevious.OpportunityId;
Also, if you are completely new to U-SQL, you should consider working through some tutorials to establish the basics of the language, including case-sensitivity. Start at http://usql.io/.
This same crazy sounding error message can occur for (almost?) any USQL syntax error. The answer above was clearly correct for the provided code.
However since many folks will probably get to this page from a search for 'AS EXCEPT FROM GROUP HAVING INTERSECT OPTION ORDER OUTER UNION UNION WHERE', I'd say the best advice to handle these is look closely at the snippet of your code that the error message has marked with '###'.
For example I got to this page upon getting a syntax error for a long query and it turned out I didn't have a casing issue, but just a malformed query with parens around the wrong thing. Once I looked more closely at where in the snippet the ### symbol was, the error became clear.

looking for db2 text function or method I can do a text contain rather than like

I'm looking for a db2 function that does a text contain search. At present I am running the following query against the data below....
SELECT distinct
s.search_id,
s.search_heading,
s.search_url
FROM repman.search s, repman.search_tags st
WHERE s.search_id = st.search_id
AND ( UPPER(s.search_heading) LIKE (cast('%REPORT%' AS VARGRAPHIC(32)))
OR (UPPER(st.search_tag) LIKE cast('%REPORT%' AS VARGRAPHIC(32)))
)
ORDER BY s.search_heading;
Which returns...
But if I change the search text to %REPORTS% rather than %REPORT% (which I need to do) the like search does not work and I get zero results.
I read a link that used a function named CONTAINS like below but when trying to use the function I get an error.
SELECT distinct
s.search_id,
s.search_heading,
s.search_url
FROM repman.search s, repman.search_tags st
WHERE s.search_id = st.search_id
AND CONTAINS(s.search_heading, 'REPORTS') = 1
Has anynoe got any suggestions? I'm on db2 version DB2/LINUXPPC 9.1.6.
Thanks
In order to look for a pattern in a string, you can use Regular Expressions. They are built-in DB2 with xQuery since DB2 v9. There are also other ways to do that. I wrote an article in my blog (in Spanish that you can translate) about Regular Expressions in DB2.
xmlcast(xmlquery('fn:matches(\$TEXT,''^[A-Za-z 0-9]*$'')')

What is the best way to run N independent column updates in PostgreSQL? What is the best way to do it in the SQL spec?

I'm looking for a more efficient way to run many columns updates on the same table like this:
UPDATE TABLE table
SET col = regexp_replace( col, 'foo', 'bar' )
WHERE regexp_match( col, 'foo' );
Such that foo, and bar, will be a combination of 40 different regex-replaces. I doubt even 25% of the dataset needs to be updated at all, but what I'm wanting to know is it is possible to cleanly achieve the following in SQL.
A single pass update
A single match of the regex, triggers a single replace
Not running all possible regexp_replaces if only one matches
Not updating all columns if only one needs the update
Not updating a row if no column has changed
I'm also curious, I know in MySQL (bear with me)
UPDATE foo SET bar = 'baz'
Has an implicit WHERE bar != 'baz' clause
However, in PostgreSQL I know this doesn't exist: I think I could at least answer one of my questions if I knew how to skip a single row's update if the target columns weren't updated.
Something like
UPDATE TABLE table
SET col = *temp_var* = regexp_replace( col, 'foo', 'bar' )
WHERE col != *temp_var*
Do it in code. Open up a cursor, then: grab a row, run it through the 40 regular expressions, and if it changed, save it back. Repeat until the cursor doesn't give you any more rows.
Whether you do it that way or come up with the magical SQL expression, it's still going to be a row scan of the entire table, but the code will be much simpler.
Experimental Results
In response to criticism, I ran an experiment. I inserted 10,000 lines from a documentation file into a table with a serial primary key and a varchar column. Then I tested two ways to do the update. Method 1:
in a transaction:
opened up a cursor (select for update)
while reading 100 rows from the cursor returns any rows:
for each row:
for each regular expression:
do the gsub on the text column
update the row
This takes 1.16 seconds with a locally connected database.
Then the "big replace," a single mega-regex update:
update foo set t =
regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(t,
E'\bcommit\b', E'COMMIT'),
E'\b9acf10762b5f3d3b1b33ea07792a936a25e45010\b',
E'9ACF10762B5F3D3B1B33EA07792A936A25E45010'),
E'\bAuthor:\b', E'AUTHOR:'),
E'\bCarl\b', E'CARL'), E'\bWorth\b',
E'WORTH'), E'\b\b',
E''), E'\bDate:\b',
E'DATE:'), E'\bMon\b', E'MON'),
E'\bOct\b', E'OCT'), E'\b26\b',
E'26'), E'\b04:53:13\b', E'04:53:13'),
E'\b2009\b', E'2009'), E'\b-0700\b',
E'-0700'), E'\bUpdate\b', E'UPDATE'),
E'\bversion\b', E'VERSION'),
E'\bto\b', E'TO'), E'\b2.9.1\b',
E'2.9.1'), E'\bcommit\b', E'COMMIT'),
E'\b61c89e56f361fa860f18985137d6bf53f48c16ac\b',
E'61C89E56F361FA860F18985137D6BF53F48C16AC'),
E'\bAuthor:\b', E'AUTHOR:'),
E'\bCarl\b', E'CARL'), E'\bWorth\b',
E'WORTH'), E'\b\b',
E''), E'\bDate:\b',
E'DATE:'), E'\bMon\b', E'MON'),
E'\bOct\b', E'OCT'), E'\b26\b',
E'26'), E'\b04:51:58\b', E'04:51:58'),
E'\b2009\b', E'2009'), E'\b-0700\b',
E'-0700'), E'\bNEWS:\b', E'NEWS:'),
E'\bAdd\b', E'ADD'), E'\bnotes\b',
E'NOTES'), E'\bfor\b', E'FOR'),
E'\bthe\b', E'THE'), E'\b2.9.1\b',
E'2.9.1'), E'\brelease.\b',
E'RELEASE.'), E'\bThanks\b',
E'THANKS'), E'\bto\b', E'TO'),
E'\beveryone\b', E'EVERYONE'),
E'\bfor\b', E'FOR')
The mega-regex update takes 0.94 seconds to update.
At 0.94 seconds compared to 1.16, it's true that the mega-regex update is faster, running in 81% of the time of doing it in code. It is not, however a lot faster. And ye Gods, look at that update statement. Do you want to write that, or try to figure out what went wrong when Postgres complains that you dropped a parenthesis somewhere?
Code
The code used was:
def stupid_regex_replace
sql = Select.new
sql.select('id')
sql.select('t')
sql.for_update
sql.from(TABLE_NAME)
Cursor.new('foo', sql, {}, #db) do |cursor|
until (rows = cursor.fetch(100)).empty?
for row in rows
for regex, replacement in regexes
row['t'] = row['t'].gsub(regex, replacement)
end
end
sql = Update.new(TABLE_NAME, #db)
sql.set('t', row['t'])
sql.where(['id = %s', row['id']])
sql.exec
end
end
end
I generated the regular expressions dynamically by taking words from the file; for each word "foo", its regular expression was "\bfoo\b" and its replacement string was "FOO" (the word uppercased). I used words from the file to make sure that replacements did happen. I made the test program spit out the regex's so you can see them. Each pair is a regex and the corresponding replacement string:
[[/\bcommit\b/, "COMMIT"],
[/\b9acf10762b5f3d3b1b33ea07792a936a25e45010\b/,
"9ACF10762B5F3D3B1B33EA07792A936A25E45010"],
[/\bAuthor:\b/, "AUTHOR:"],
[/\bCarl\b/, "CARL"],
[/\bWorth\b/, "WORTH"],
[/\b<cworth#cworth.org>\b/, "<CWORTH#CWORTH.ORG>"],
[/\bDate:\b/, "DATE:"],
[/\bMon\b/, "MON"],
[/\bOct\b/, "OCT"],
[/\b26\b/, "26"],
[/\b04:53:13\b/, "04:53:13"],
[/\b2009\b/, "2009"],
[/\b-0700\b/, "-0700"],
[/\bUpdate\b/, "UPDATE"],
[/\bversion\b/, "VERSION"],
[/\bto\b/, "TO"],
[/\b2.9.1\b/, "2.9.1"],
[/\bcommit\b/, "COMMIT"],
[/\b61c89e56f361fa860f18985137d6bf53f48c16ac\b/,
"61C89E56F361FA860F18985137D6BF53F48C16AC"],
[/\bAuthor:\b/, "AUTHOR:"],
[/\bCarl\b/, "CARL"],
[/\bWorth\b/, "WORTH"],
[/\b<cworth#cworth.org>\b/, "<CWORTH#CWORTH.ORG>"],
[/\bDate:\b/, "DATE:"],
[/\bMon\b/, "MON"],
[/\bOct\b/, "OCT"],
[/\b26\b/, "26"],
[/\b04:51:58\b/, "04:51:58"],
[/\b2009\b/, "2009"],
[/\b-0700\b/, "-0700"],
[/\bNEWS:\b/, "NEWS:"],
[/\bAdd\b/, "ADD"],
[/\bnotes\b/, "NOTES"],
[/\bfor\b/, "FOR"],
[/\bthe\b/, "THE"],
[/\b2.9.1\b/, "2.9.1"],
[/\brelease.\b/, "RELEASE."],
[/\bThanks\b/, "THANKS"],
[/\bto\b/, "TO"],
[/\beveryone\b/, "EVERYONE"],
[/\bfor\b/, "FOR"]]
If this were a hand-generated list of regex's, and not automatically generated, my question is still appropriate: Which would you rather have to create or maintain?
For the skip update, look at suppress_redundant_updates - see http://www.postgresql.org/docs/8.4/static/functions-trigger.html.
This is not necessarily a win - but it might well be in your case.
Or perhaps you can just add that implicit check as an explicit one?