RegEx for multiline search and replace in SQL query code - sql

There are many good documents on the Internet about "search and replace using regular expressions". Only a few of them show how to do this in a multiline context. Even fewer show how to build a regex that handles several varying items within such a block.
I have tried both installable RegEx tools within editors (EditPad Pro, RJ TextED, EmEditor, Notepad++, Sublime Text 3, Visual Studio Professional 2019, the latest JetBrains PHPstorm version, and others) and online RegEx services (regular expressions 101, RegExr) the entire day, read the answers on StackOverflow which corresponded to my title criteria, and additionally tried to make the most of various online tutorials.
You may call me stupid, but I have not been able to work out whether the following concept is feasible at all.
The part of the SQL query I want to change is the following one:
AND op.OP1OPVerfahren > 0
AND p.Testzwecke = 0
AND NOT EXISTS (SELECT DISTINCT 1 FROM ods39.dat_optherapie op2 WHERE op2.patID = p.ID AND op2.revision > op.revision)
UNION ALL
Legend:
op.OP1OPVerfahren is the database field for the first surgery performed; up to 10 surgical procedures can be documented (OP1OPVerfahren to OP10OPVerfahren)
p.Testzwecke is a JOIN to the patient's personal data such as first name, last name, etc.
ods39.dat_optherapie is the table dat_optherapie from database ods39 - the system consists of 50 MySQL databases of the exact same structure
p.ID is merely the patient's ID
op.revision is an auto-incrementing counter of how many record sets have been saved for the same surgical procedure (sometimes revisions, in the sense of more precise entries, are required)
The part of the query shown above also comes with a problem of scale: within the query, this segment appears 780 times in the following variations:
AND **op.OP1OPVerfahren** _up_to_ **op.OP10OPVerfahren** > 0
AND p.Testzwecke = 0
AND NOT EXISTS (SELECT DISTINCT 1 FROM **ods01.dat_optherapie** _up_to_ **ods39.dat_optherapie** op2 WHERE op2.patID = p.ID AND op2.revision > op.revision)
UNION ALL
To make clear what I want to achieve, here is the expression I want to replace the above with:
AND **op.OP1OPVerfahren** _up_to_ **op.OP10OPVerfahren** > 0
AND p.Testzwecke = 0
AND NOT EXISTS (SELECT DISTINCT 1 FROM **ods01.dat_optherapie** _up_to_ **ods39.dat_optherapie** op2 WHERE op2.patID = p.ID AND op2.revision > op.revision)
GROUP BY **OP1OPVerfahren** _up_to_ **OP10OPVerfahren**
UNION ALL
The op.OP_x_OPVerfahren (x = 1 to 10) in the very first line and the OP_x_OPVerfahren (x = 1 to 10) in the GROUP BY statement are numerically tied to each other, i.e. when the replacement moves on from op.OP1OPVerfahren across the 39 databases to op.OP2OPVerfahren across the same 39 databases, and so on, the number in the GROUP BY must change accordingly.
This replacement must be carried out for all 39 databases. The entire SQL query is about 20,000 lines of code, which is why I do not want to spend hours replacing manually, especially as there are more such SQL query structures in other files that need replacing in a similar fashion.
To give you an example:
The code ...
AND op.OP1OPVerfahren > 0
AND p.Testzwecke = 0
AND NOT EXISTS (SELECT DISTINCT 1 FROM ods39.dat_optherapie op2 WHERE op2.patID = p.ID AND op2.revision > op.revision)
UNION ALL
... needs to be extended with a GROUP BY OP1OPVerfahren before the UNION ALL for each of the 39 databases ods01 to ods39. Then the same with op.OP2OPVerfahren and OP2OPVerfahren for the same 39 databases, and so on until (op.)OP10OPVerfahren is finally reached (= 780 replacements).
The number in the newly inserted GROUP BY OP_x_... must be the same as the number in op.OP_x_....
I have experimented with tons of different regex statements (such as \d\d, (\d)(\d), \d{2}, and many others, adapted to the individual needs of the editors mentioned above), but I was not able to find out how to make one "number detection" (op.OP_x_OPVerfahren and OP_x_OPVerfahren) depend on the "number detection" for the databases (ods_x_.dat_optherapie).
I would greatly appreciate a bit of help from your most valuable experience and expertise, and I would also be very thankful for receiving further recommendations for other than the mentioned editors with a good (and maybe even testable) regex handling.

We can make this work using a regex replace like this:
(AND\ +op\.(OP\d0?OPVerfahren)\ *>\ *0\s+AND\ +p\.Testzwecke\ *=\ *0\s+AND\ +NOT\ +EXISTS\ *\(SELECT\ +DISTINCT\ +1\ +FROM\ +ods[0123][0-9]\.dat_optherapie\ +op2\ +WHERE\ +op2\.patID\ *=\ *p\.ID\ +AND\ +op2\.revision\ *>\ *op\.revision\))(\s+UNION\s+ALL)
It sticks rather closely to the original string and mostly just introduces variable-length quantifiers for whitespace characters. Where \ * appears, an optional space may occur; where the space is mandatory, \ + is used. Elsewhere the whitespace shorthand \s is used to allow not only spaces but also newlines and the like. To make it work, enable the s|singleline flag (or add (?s) in front of the pattern).
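As a rough illustration of how this pattern could be applied, here is a minimal Python sketch. The answer only gives the search pattern, so the replacement string, the variable names and the sample text below are assumptions for illustration. Group 1 is the whole three-line block, group 2 the OP_x_OPVerfahren name, and group 3 the trailing UNION ALL.

```python
import re

# Search pattern from the answer, unchanged, split over several lines only
# for readability. re.DOTALL corresponds to the s|singleline flag.
PATTERN = re.compile(
    r"(AND\ +op\.(OP\d0?OPVerfahren)\ *>\ *0\s+"
    r"AND\ +p\.Testzwecke\ *=\ *0\s+"
    r"AND\ +NOT\ +EXISTS\ *\(SELECT\ +DISTINCT\ +1\ +FROM\ +"
    r"ods[0123][0-9]\.dat_optherapie\ +op2\ +WHERE\ +"
    r"op2\.patID\ *=\ *p\.ID\ +AND\ +op2\.revision\ *>\ *op\.revision\))"
    r"(\s+UNION\s+ALL)",
    re.DOTALL,
)

# Assumed replacement: keep the block, add the GROUP BY line built from
# group 2, then put the UNION ALL back.
REPLACEMENT = r"\1\nGROUP BY \2\3"

sample = (
    "AND op.OP10OPVerfahren > 0\n"
    "AND p.Testzwecke = 0\n"
    "AND NOT EXISTS (SELECT DISTINCT 1 FROM ods39.dat_optherapie op2 "
    "WHERE op2.patID = p.ID AND op2.revision > op.revision)\n"
    "UNION ALL\n"
)

result = PATTERN.sub(REPLACEMENT, sample)
print(result)
```

Because the OP number is captured inside group 2, the inserted GROUP BY automatically carries the same number as the first line of each block, whichever of the 39 databases the block refers to.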

I believe something like the following regex find/replace expressions will do what you are asking:
Find:
AND op.OP(\d{1,2})(OPVerfahren.*?\))
Replace with:
AND op.OP$1$2 \n GROUP BY OP$1OPVerfahren
Note that it needs the "global" and "dot matches newline" options set for the regex.
To briefly explain, this has 2 capturing groups: one for the digit(s) between op.OP and OPVerfahren, and a second that captures everything after that up to the closing bracket of the "(SELECT DISTINCT ... )" part. These are then used as $1 and $2 in the replacement section of the regex.
Test example here. I believe this should work in Notepad++.
(By the way, I think your "GROUP BY OP1Verfahren" should be "GROUP BY OP1OPVerfahren" right? i.e. 2 lots of "OP"s!)
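For anyone who would rather script this than run it in an editor, the same find/replace can be sketched in Python. This is only an illustration under a few assumptions: $1 and $2 become \g<1> and \g<2>, the "dot matches newline" option becomes re.DOTALL, the spaces around \n in the replacement are dropped, and the function name and sample text are made up.

```python
import re

# Find expression from the answer; re.DOTALL is "dot matches newline".
FIND = re.compile(r"AND op.OP(\d{1,2})(OPVerfahren.*?\))", re.DOTALL)

# \g<1> is the OP number, \g<2> everything up to the closing bracket.
REPLACE = r"AND op.OP\g<1>\g<2>\nGROUP BY OP\g<1>OPVerfahren"


def add_group_by(sql: str) -> str:
    """Insert a matching GROUP BY line after every NOT EXISTS block."""
    # re.sub replaces all matches, which corresponds to the "global" option.
    return FIND.sub(REPLACE, sql)


def make_block(op_number: int, db_number: int) -> str:
    """Build one sample segment in the shape shown in the question."""
    return (
        f"AND op.OP{op_number}OPVerfahren > 0\n"
        "AND p.Testzwecke = 0\n"
        f"AND NOT EXISTS (SELECT DISTINCT 1 FROM ods{db_number:02d}.dat_optherapie op2 "
        "WHERE op2.patID = p.ID AND op2.revision > op.revision)\n"
        "UNION ALL\n"
    )


sample = make_block(1, 1) + make_block(10, 39)
converted = add_group_by(sample)
print(converted)
```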

Related

Starting from a column type, how to find supported aggregations in Postgres?

I'm trying to figure out from a column type, which aggregates the data type supports. There's a lot of variety amongst types, just a sample below (some of these support more aggregates, of course):
uuid count()
text count(), min(), max()
integer count(), min(), max(), avg(), sum()
I've been thrashing around in the system catalogs and views, but haven't found what I'm after. (See "thrashing around.") I've poked at pg_type, pg_aggregate, pg_operator, and a few more.
Is there a straightforward way to start from a column type and gather all supported aggregates?
For background, I'm writing a client-side cross-tab code generator, and the UX is better when the tool automatically prevents you from selecting an aggregation that's not supported. I've hacked in some hard-coded rules for now, but would like to improve the system.
We're on Postgres 11.4.
A plain list of available aggregate functions can be based on pg_proc like this:
SELECT oid::regprocedure::text AS agg_func_plus_args
FROM pg_proc
WHERE prokind = 'a'
ORDER BY 1;
Or with separate function name and arguments:
SELECT proname AS agg_func, pg_get_function_identity_arguments(oid) AS args
FROM pg_proc
WHERE prokind = 'a'
ORDER BY 1, 2;
pg_proc.prokind replaces proisagg in Postgres 11. In Postgres 10 or older use:
...
WHERE proisagg
...
Related:
How to drop all of my functions in PostgreSQL?
How to get function parameter lists (so I can drop a function)
To get a list of available functions for every data type (your question), start with:
SELECT type_id::regtype::text, array_agg(proname) AS agg_functions
FROM (
SELECT proname, unnest(proargtypes::regtype[])::text AS type_id
FROM pg_proc
WHERE prokind = 'a' -- Postgres 11 or later; use "WHERE proisagg" in Postgres 10 or older
ORDER BY 2, 1
) sub
GROUP BY type_id;
Just a start. Some of the arguments are just "direct" (non-aggregated) arguments. That's also why some functions are listed multiple times: due to those additional non-aggregate arguments (example: string_agg). And there are special cases for "ordered-set" and "hypothetical-set" aggregates. See the columns aggkind and aggnumdirectargs of the additional system catalog pg_aggregate. (You may want to exclude the exotic special cases for starters ...)
And many types have an implicit cast to one of the types listed by the query. Prominent example: string_agg() works with varchar, too, but it's only listed for text above. You can extend the query with information from pg_cast to get the full picture.
Plus, some aggregates work for pseudo types "any", anyarray etc. You'll want to factor those in for every applicable data type.
The complication of multiple aliases for the same data type names can be eliminated easily, though: cast to regtype to get canonical names. Or use pg_typeof() which returns standard names. Related:
Type conversion. What do I do with a PostgreSQL OID value in libpq in C?
PostgreSQL syntax error in parameterized query on "date $1"
How do I translate PostgreSQL OID using python
Man, that is just stunning. Thank you. The heat death of the universe will arrive before I could have figured that out. I had to tweak one line for PG 11 compatibility... says the guy who did not say what version he was on. I've reworked the query to get close to what I'm after and included a bit of output for the archives.
with aggregates as (
SELECT pro.proname aggregate_name,
CASE
WHEN array_agg(typ.typname ORDER BY proarg.position) = '{NULL}'::name[] THEN
'{}'::name[]
ELSE
array_agg(typ.typname ORDER BY proarg.position)
END aggregate_types
FROM pg_proc pro
CROSS JOIN LATERAL unnest(pro.proargtypes) WITH ORDINALITY proarg (oid,
position)
LEFT JOIN pg_type typ
ON typ.oid = proarg.oid
WHERE pro.prokind = 'a' -- I needed this for PG 11, I didn't say what version I was using.
GROUP BY pro.oid,
pro.proname
ORDER BY pro.proname),
-- The *super helpful* code above is _way_ past my skill level with Postgres. So, thrashing around a bit to get close to what I'm after.
-- First up, a CTE to sort everything by aggregation and then combine the types.
aggregate_summary as (
select aggregate_name,
array_agg(aggregate_types) as types_array
from aggregates
group by 1
order by 1)
-- Finally, the previous CTE is used to get the details and a count of the types.
select aggregate_name,
cardinality(types_array) as types_count, -- Couldn't get array_length to work here. ¯\_(ツ)_/¯
types_array
from aggregate_summary
limit 5;
And a bit of output:
aggregate_name types_count types_array
array_agg 2 {{anynonarray},{anyarray}}
avg 7 {{int8},{int4},{int2},{numeric},{float4},{float8},{interval}}
bit_and 4 {{int2},{int4},{int8},{bit}}
bit_or 4 {{int2},{int4},{int8},{bit}}
bool_and 1 {{bool}}
Still on my wish list are
Figuring out how to exclude arrays (we aren't using array fields now, and only have a few places where we ever might; at that point, I don't expect we'll try to support pivots on arrays in the cross-tab tool).
Getting all of the aliases for the various types. It seems like (?) int8, etc. can come through from pg_attribute in multiple ways. For example, timestamptz can come back as "timestamp with time zone".
These results are going to be consumed by client-side code and processed, so I don't need to get Postgres to figure everything out in one query, just enough for me to get the job done.
In any case, thanks very, very much.
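Since the results are consumed by client-side code anyway, the last two steps (folding type aliases and turning "aggregate with its types" into "type with its aggregates") could be done there. The following Python sketch is only an illustration: the function and variable names are hypothetical, the rows are typed in by hand from the sample output above, and the alias table covers just a handful of common internal names.

```python
from collections import defaultdict

# A few internal names (pg_type.typname) and their SQL-standard spellings.
TYPE_ALIASES = {
    "int2": "smallint",
    "int4": "integer",
    "int8": "bigint",
    "float4": "real",
    "float8": "double precision",
    "bool": "boolean",
    "timestamptz": "timestamp with time zone",
}

# Hand-copied from the sample output: aggregate name and the argument type
# of each of its signatures.
SAMPLE_ROWS = {
    "array_agg": ["anynonarray", "anyarray"],
    "avg": ["int8", "int4", "int2", "numeric", "float4", "float8", "interval"],
    "bit_and": ["int2", "int4", "int8", "bit"],
    "bit_or": ["int2", "int4", "int8", "bit"],
    "bool_and": ["bool"],
}


def aggregates_by_type(rows):
    """Invert {aggregate: [types]} into {canonical type: sorted aggregates}."""
    by_type = defaultdict(set)
    for aggregate_name, type_names in rows.items():
        for type_name in type_names:
            canonical = TYPE_ALIASES.get(type_name, type_name)
            by_type[canonical].add(aggregate_name)
    return {type_name: sorted(names) for type_name, names in by_type.items()}


supported = aggregates_by_type(SAMPLE_ROWS)
print(supported["bigint"])
```

Pseudo-types such as anyarray and anynonarray are passed through unchanged here; a real tool would still have to decide which concrete column types they apply to.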
There's the pg_proc catalog table, which lists all functions. The column proisagg marks aggregation functions and the column proargtypes holds an array of the OIDs of the argument types.
So for example to get a list of all aggregation functions with the names of their arguments' type you could use:
SELECT pro.proname aggregationfunctionname,
CASE
WHEN array_agg(typ.typname ORDER BY proarg.position) = '{NULL}'::name[] THEN
'{}'::name[]
ELSE
array_agg(typ.typname ORDER BY proarg.position)
END aggregationfunctionargumenttypes
FROM pg_proc pro
CROSS JOIN LATERAL unnest(pro.proargtypes) WITH ORDINALITY proarg (oid,
position)
LEFT JOIN pg_type typ
ON typ.oid = proarg.oid
WHERE pro.proisagg
GROUP BY pro.oid,
pro.proname
ORDER BY pro.proname;
Of course you may need to extend that, e.g. by joining and respecting the schemas (pg_namespace) and checking for compatible types in pg_type (have a look at the typcategory column for that), etc.
Edit:
I overlooked that proisagg was removed in version 11 (I'm still mostly on 9.6), as the other answers mentioned. So for the sake of completeness: as of version 11, replace WHERE pro.proisagg with WHERE pro.prokind = 'a'.
I've been playing around with the suggestions a bit, and want to post one adaptation based on one of Erwin's scripts:
select type_id::regtype::text as type_name,
array_agg(proname) as aggregate_names
from (
select proname,
unnest(proargtypes::regtype[])::text AS type_id
from pg_proc
where prokind = 'a'
order by 2, 1
) subquery
where type_id in ('"any"', 'bigint', 'boolean','citext','date','double precision','integer','interval','numeric','smallint',
'text','time with time zone','time without time zone','timestamp with time zone','timestamp without time zone')
group by type_id;
That brings back details on the types specified in the where clause. Not only is this useful for my current work, it's useful to my understanding generally. I've run into cases where I've had to recast something, like an integer to a double, to get it to work with an aggregate. So far, this has been pretty much trial and error. If you run the query above (or one like it), it's easier to see from the output where you need recasting between similar seeming types.

SQL Query for Search Page

I am working on a small project for an online databases course and I was wondering if you could help me out with a problem I am having.
I have a web page that searches a movie database and retrieves specific columns using a movie initial input field, a number input field, and a code field. These will all be converted to strings and used as user input for the query.
Below is what I tried before:
select A.CD, A.INIT, A.NBR, A.STN, A.ST, A.CRET_ID, A.CMNT, A.DT
from MOVIE_ONE A
where A.INIT = :init
AND A.CD = :cd
AND A.NBR = :num
The way the page must search is in three different cases:
(initial and number)
(code)
(initial and number and code)
The cases have to be independent, so if certain fields are empty but the remaining ones fulfill one of the cases, the search goes through. It also must be done in one query. I am stuck on how to implement the cases.
The parameters in the query are taken from the Java parameters of the method in an SQLJ file.
If you could possibly provide some help on how I can go about this problem, I'd greatly appreciate it!
Consider wrapping the equality expressions in NVL (synonymous with COALESCE) so that if a parameter input is blank, the corresponding column is checked against itself. Also, be sure to kick the a-b-c table aliasing habit.
SELECT m.CD, m.INIT, m.NBR, m.STN, m.ST, m.CRET_ID, m.CMNT, m.DT
FROM MOVIE_ONE m
WHERE m.INIT = NVL(:init, m.INIT)
AND m.CD = NVL(:cd, m.CD)
AND m.NBR = COALESCE(:num, m.NBR)
To demonstrate, consider the DB2 fiddles below, where each case can be checked by adjusting the values in the parameter CTEs, all running on the exact same data.
Case 1
WITH
i(init) AS (VALUES('db2')),
c(cd) AS (VALUES(NULL)),
n(num) AS (VALUES(53)),
cte AS
...
Case 2
WITH
i(init) AS (VALUES(NULL)),
c(cd) AS (VALUES(2018)),
n(num) AS (VALUES(NULL)),
cte AS
...
Case 3
WITH
i(init) AS (VALUES('db2')),
c(cd) AS (VALUES(2018)),
n(num) AS (VALUES(53)),
cte AS
...
However, do be aware that the fiddle runs a different SQL statement due to the nature of its data (i.e., doubles and dates). But the query reflects the same concept, with NVL matching expressions on both sides.
SELECT *
FROM cte, i, c, n
WHERE cte.mytype = NVL(i.init, cte.mytype)
AND YEAR(CAST(cte.mydate AS date)) = NVL(c.cd, YEAR(CAST(cte.mydate AS date)))
AND ROUND(cte.mynum, 0) = NVL(n.num, ROUND(cte.mynum, 0));
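The three search cases can also be tried locally. The sketch below is a stand-in, not the DB2 setup from the fiddles: it assumes an in-memory SQLite database, invented sample rows, a cut-down MOVIE_ONE table with only three columns, and COALESCE in place of NVL. A parameter passed as None is bound as NULL, so the corresponding column is compared with itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MOVIE_ONE (CD TEXT, INIT TEXT, NBR INTEGER)")
conn.executemany(
    "INSERT INTO MOVIE_ONE VALUES (?, ?, ?)",
    [("A1", "db2", 53), ("B2", "db2", 54), ("C3", "ora", 53)],  # made-up rows
)

QUERY = """
SELECT m.CD, m.INIT, m.NBR
FROM MOVIE_ONE m
WHERE m.INIT = COALESCE(:init, m.INIT)
  AND m.CD   = COALESCE(:cd, m.CD)
  AND m.NBR  = COALESCE(:num, m.NBR)
ORDER BY m.CD
"""


def search(init=None, cd=None, num=None):
    """Run the single query; omitted criteria are bound as NULL."""
    return conn.execute(QUERY, {"init": init, "cd": cd, "num": num}).fetchall()


case_1 = search(init="db2", num=53)           # initial and number
case_2 = search(cd="B2")                      # code only
case_3 = search(init="db2", cd="A1", num=53)  # all three
print(case_1, case_2, case_3)
```

Two things to keep in mind with this pattern: empty form fields have to be turned into NULL (not empty strings) before binding, and a row whose column is itself NULL is never returned, because NULL = NULL does not evaluate to true.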

sql query to show a range and account for missing numbers

I have a SQL query
SELECT
Group_Id, MIN(Rec_Number) as RecStart, MAX(Rec_Number) AS RecEnd
FROM
Rec
WHERE
Group_Id != ''
GROUP BY
Group_Id
ORDER BY
Group_Id
This produces the following kind of results.
92-2274 9222740001 9222740004
92-2275 9222750001 9222750026
etc...
However if record 3 is missing (in the first row for instance) the query obviously doesn't account for it. What I am trying to do is the following
92-2274 9222740001 9222740002
92-2274 9222740004 9222740018
92-2275 9222750001 9222750016
92-2275 9222750018 9222750026
etc...
So essentially, each time the script sees a record missing inside the group, it starts a new line while staying inside the group, before moving on to the next group. The Group_Id is of course the first 6 digits of the Rec_Number.
I would also like to do this as well
92-2274 0001 0002
92-2274 0003 0004
Or even trim it and remove the leading 0's as well, if possible. I know about using Right(Rec_Number, 4); however, as this is a float, the automatic conversion to string seems to be messing something up, because I get +009 in many columns, so I assume I need to cast first or something. I could do this particular step in Excel after the fact, I guess, but I'm sure SQL could do it if the guy writing the query was a DBA and not a bumbling server admin (that's me!).
So is there a way of doing that in SQL? I must warn you that the standard approaches using a CTE or functions such as ROW_NUMBER don't work, as this is SQL Server 2000. Yes, it is that old!
Hence my struggle to find posts on Stack Overflow that apply. Many of them start with the WITH keyword, which means I can't use any of those to begin with!
I think I need an IF ... ELSE kind of block, but I am not sure what method I can use to get the query to create a new row each time it hits a missing consecutive number in the group range.
The final output will show me the ranges of records in each group while highlighting the missing ones via a new line each time.
For the second part, this should work :
RIGHT ( CAST ( MIN (Rec_Number) as Decimal(10)), 4)
It will only keep the last 4 characters of your number.
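For the first part (starting a new line whenever a number is missing), the logic itself is small once the rows are outside the database. The Python sketch below is not a SQL Server 2000 solution; it assumes the raw (Group_Id, Rec_Number) rows are exported first, and the function names and sample numbers are made up for illustration.

```python
from itertools import groupby


def consecutive_ranges(numbers):
    """Split numbers into (start, end) runs that contain no gaps."""
    ordered = sorted(set(numbers))
    ranges = []
    # Inside a gap-free run, value minus position stays constant.
    for _, run in groupby(enumerate(ordered), key=lambda pair: pair[1] - pair[0]):
        values = [value for _, value in run]
        ranges.append((values[0], values[-1]))
    return ranges


def group_ranges(rows):
    """rows: iterable of (group_id, rec_number) -> list of (group_id, start, end)."""
    by_group = {}
    for group_id, rec_number in rows:
        # Rec_Number is a float column in the question, so convert it first.
        by_group.setdefault(group_id, []).append(int(rec_number))
    result = []
    for group_id in sorted(by_group):
        for start, end in consecutive_ranges(by_group[group_id]):
            result.append((group_id, start, end))
    return result


rows = [
    ("92-2274", 9222740001.0), ("92-2274", 9222740002.0),
    ("92-2274", 9222740004.0), ("92-2274", 9222740005.0),
    ("92-2275", 9222750001.0),
]
report = group_ranges(rows)
for line in report:
    print(*line)
```

Taking start % 10000 and end % 10000 gives the short form without leading zeros (for example 1 and 2 instead of 0001 and 0002).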

SQL trouble with JOIN on INSTR()

I've read lots of examples of JOIN syntax, but none of them work for my problem.
I have a table of quotes and a table of sales enquiries. One quote may result in multiple enquiries, or it may not result in any enquiries. I want to create a list of all quotes and include details of any resulting enquiry. I'm expecting multiple result rows for a quote that resulted in many enquiries. I'm expecting a single row with empty or null enquiries fields, for those quotes that didn't give rise to any enquiries.
The connecting data is in the quotes table, where there's a field called 'activity' that contains the id of any or all enquiries that resulted. It's updated each time a new enquiry comes in.
So I tried this:
SELECT q.*, e.id, e.price
FROM quotes as q
LEFT JOIN enquiries as e
ON INSTR(q.activity, e.id) >'0'
WHERE q.date > '2013-07-01'
But every row in my results includes enquiries data. I can't seem to get it to include the quotes that never resulted in anything. I thought LEFT JOIN was supposed to give me all of the quotes regardless of enquiries, and include the enquiries data where there was a match. But every example I've seen is just joining on a.id = b.id, so I suspect that my INSTR() match criteria might be messing things up.
As previous commentators have suggested the issue will be down to the join with Instr. The return value from INSTR of many RDBMSs is an integer value. When you therefore test the value of INSTR against '0' you won't get a match. Also, if Instr doesn't find a match you may get something else returned like MS Access where Null is a possible return value. This is obviously all speculation and we really need to see an example of your data and the issue to confirm if this is the actual problem. In the absence of any more info this is the best you are going to get:
Without knowing which DB you are using I've included a few links for INSTR:
MySql,
Oracle,
MS Access (returns variant Long),
SQL Server - No Instr Function - CharIndex
I think your problem might be somewhere else, because this seems to work fine for me. I assumed the list of enquiries was just a comma separated string. See http://sqlfiddle.com/#!4/71ce1/1
Get rid of the single quotes around the 0, although that doesn't make any difference. Also, you shouldn't be relying on the default date format, but using TO_DATE. You don't say what DBMS you're using, but I tried both Oracle and MySQL.
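To see the expected LEFT JOIN behaviour in isolation, here is a small Python sketch using an in-memory SQLite database as a stand-in (the question does not name the DBMS). The table contents are invented, and the activity field is assumed to be a comma-separated list of enquiry ids, as in the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE quotes (id INTEGER, activity TEXT, date TEXT);
CREATE TABLE enquiries (id INTEGER, price REAL);
INSERT INTO quotes VALUES (1, '10,11', '2013-08-01');  -- two enquiries
INSERT INTO quotes VALUES (2, '', '2013-08-02');       -- no enquiries
INSERT INTO quotes VALUES (3, '12', '2013-06-01');     -- too old for the WHERE
INSERT INTO enquiries VALUES (10, 100.0), (11, 150.0), (12, 90.0);
""")

rows = conn.execute("""
SELECT q.id, e.id, e.price
FROM quotes AS q
LEFT JOIN enquiries AS e
  ON INSTR(q.activity, e.id) > 0
WHERE q.date > '2013-07-01'
ORDER BY q.id, e.id
""").fetchall()

for row in rows:
    print(row)
```

Quote 2 comes back once with NULL enquiry columns, which is what LEFT JOIN is supposed to do. One caveat of substring matching: an enquiry with id 1 would also match the activity '10,11', because INSTR('10,11', '1') is greater than 0.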

search criteria difference between Like vs Contains() in oracle

I created a table with two columns. I inserted two rows.
id name
1 narsi reddy
2 narei sia
One column is simply of NUMBER type and the other one is of CLOB type. So I decided to use an index on it. I queried it using CONTAINS.
query:
select * from emp where contains(name,'%a%e%')>0
2 narei sia
I expected both rows to come back, but only row 2 did. If I run the same pattern with LIKE, I get what I wanted.
query:
select * from emp where name like '%a%e%'
ID NAME
1 (CLOB) narsi reddy
2 (CLOB) narei sia
2 rows selected
Finally I understood that LIKE searches the whole document or paragraph, while CONTAINS looks at individual words.
So how can I get the required output?
LIKE and CONTAINS are fundamentally different methods for searching.
LIKE is a very simple string pattern matcher - it recognises two wildcards (%) and (_) which match zero-or-more, or exactly-one, character respectively. In your case, %a%e% matches two records in your table - it looks for zero or more characters followed by a, followed by zero or more characters followed by e, followed by zero or more characters. It is also very simplistic in its return value: it either returns "matched" or "not matched" - no shades of grey.
CONTAINS is a powerful search tool that uses a context index, which builds a kind of word tree which can be searched using the CONTAINS search syntax. It can be used to search for a single word, a combination of words, and has a rich syntax of its own, such as boolean operators (AND, NEAR, ACCUM). It is also more powerful in that instead of returning a simple "matched" or "not matched", it returns a "score", which can be used to rank results in order of relevance; e.g. CONTAINS(col, 'dog NEAR cat') will return a higher score for a document where those two words are both found close together.
I believe that your CONTAINS query is matching 'narei sia' because the pattern '%a%e%' matches the word 'narei'. It does not match against 'narsi reddy' because neither word, taken individually, matches the pattern.
I assume you want to use CONTAINS instead of LIKE for performance reasons. I am not by any means an expert on CONTAINS query expressions, but I don't see a simple way to do the exact search you want, since you are looking for letters that can be in the same word or different words, but must occur in a given order. I think it may be best to do a combination of the two techniques:
WHERE CONTAINS(name,'%a% AND %e%') > 0
AND name LIKE '%a%e%'
I think this would allow the text index to be used to find candidate matches (anything which has at least one word containing 'a' and at least one word containing 'e'). These would then be filtered by the LIKE condition, enforcing the requirement that 'a' precede 'e' in the string.
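The difference between whole-string and per-word matching can be mimicked in a few lines of Python. This is a deliberately crude model, not Oracle Text: it assumes words are simply split on whitespace and ignores tokenization rules, case handling, stop words and scoring. The function names are made up.

```python
import re


def like_match(text, letters):
    """Emulate LIKE '%a%e%': the letters occur in this order anywhere in the string."""
    pattern = ".*".join(re.escape(ch) for ch in letters)
    return re.search(pattern, text, re.DOTALL) is not None


def word_match(text, letters):
    """Rough stand-in for CONTAINS(name, '%a%e%') > 0: one single word must match."""
    return any(like_match(word, letters) for word in text.split())


def combined_match(text, letters):
    """Stand-in for CONTAINS(name, '%a% AND %e%') > 0 AND name LIKE '%a%e%'."""
    words = text.split()
    every_letter_in_some_word = all(
        any(ch in word for word in words) for ch in letters
    )
    return every_letter_in_some_word and like_match(text, letters)


names = ["narsi reddy", "narei sia"]
per_word = [name for name in names if word_match(name, "ae")]
combined = [name for name in names if combined_match(name, "ae")]
print(per_word)
print(combined)
```

In this model the per-word search only finds 'narei sia', because no single word of 'narsi reddy' has an 'a' followed by an 'e', while the combined condition finds both rows.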