Does anyone know what this regexp_replace does? - sql

I received this snippet from someone who no longer works at my company, and I can't figure out what this regex does.
His objective was to scan through SQL query strings and rearrange the table info:
regexp_replace(query, ' ("*?)(analytics_internal|arr|bizops_analytics|cloud_api|cloud_backend_raw|data_science|eloqua|fb_ads|google_ads|information_schema|intricately|legacy_sfdc|marketing_analytics|ns__analytics_postprocessing|ns__global_write|product_analytics|raw_bing_ads|raw_cloud_api|raw_compass|raw_coveo|raw_eloqua|raw_g_search_console|raw_gainsight|raw_google_ads|raw_intercom|raw_linkedin_ads|raw_mightysignal|raw_realm|raw_sfdc|remodel_cloud|remodel_test|sales_analytics|sales_ops|sampledb|segment|sfdc|ts_analytics|university_platform_analytics|upstream_gainsight|usage|xform_cloud|xform_etl|xform_finance|xform_marketing|xform_reference|xform_sales|xform_tables)("*?)\.("*?)(.+?)("*?)(\s|$)', ' awsdatacatalog.$1$2$3.$4$5$6$7') as queryString

The regex you've provided matches any of the listed names in the second capture group, when that name is followed by a dot and another name, and rewrites that part of the query so that it is prefixed with awsdatacatalog and a dot. This is most likely an attempt to make the queries run against a new database, specifically one named awsdatacatalog. For example, consider the following query:
SELECT * FROM "analytics_internal".foo.table
Your regexp_replace should produce a new query that looks like this:
SELECT * FROM awsdatacatalog."analytics_internal".foo.table
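The rewrite can be tried outside the database. Below is a minimal Python sketch of the same substitution, assuming a shortened list of names (only three of the originals are kept for readability) and a hypothetical helper called add_catalog. Python writes back-references as \1 rather than $1, and its regex engine may differ from your SQL engine in edge cases, so treat this as an illustration rather than a faithful emulation.

```python
import re

# Shortened, illustrative subset of the names from the original pattern.
SCHEMAS = ["analytics_internal", "sfdc", "usage"]

# Same shape as the original: space, optional quote, schema name, optional
# quote, dot, optional quote, table name (lazy), optional quote, then
# whitespace or end of string.
PATTERN = re.compile(
    r' ("*?)(' + "|".join(map(re.escape, SCHEMAS)) + r')("*?)\.("*?)(.+?)("*?)(\s|$)'
)

def add_catalog(query: str) -> str:
    """Prefix every matched schema reference with 'awsdatacatalog.'."""
    return PATTERN.sub(r' awsdatacatalog.\1\2\3.\4\5\6\7', query)

print(add_catalog('SELECT * FROM "analytics_internal".foo.table'))
print(add_catalog('select a from sfdc.account where x = 1'))
```

Note that the pattern only fires when the name is preceded by a space, so a reference at the very start of the string or right after a parenthesis is left untouched.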

Related

Regex pattern to identify column names in an SQL WHERE clause

I am looking for resources on how to build a regex pattern to match column names from an SQL WHERE clause.
I have the SQL SELECT statement:
SELECT * FROM test WHERE area = 'testarea' AND description = 'testdescription' AND ...
And I'm trying to extract the terms area and description, and any others following ANDs. I understand this can get more complicated as the WHERE clause gets more complicated, but for now, I'm assuming that it conforms to the structure in the example.
When I try to search the web for examples on how to do this, I only see examples of how to include regex in the WHERE clause, but not actually match against it.
Can someone help me get started here? I'm at a loss.
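As a starting point, here is a minimal Python sketch that assumes the WHERE clause has exactly the shape shown above: simple column = value comparisons joined by AND. The name where_columns is hypothetical. It will give wrong results for anything more complex, for example a string literal that itself contains " AND x =", so a real SQL parser is the safer route once the queries stop being this regular.

```python
import re

# A column name is taken to be the word that follows WHERE or AND and
# precedes an equals sign.
COLUMN_RE = re.compile(r"(?:\bWHERE\b|\bAND\b)\s+(\w+)\s*=", re.IGNORECASE)

def where_columns(sql: str) -> list:
    """Return the column names compared with '=' in a simple WHERE clause."""
    return COLUMN_RE.findall(sql)

query = "SELECT * FROM test WHERE area = 'testarea' AND description = 'testdescription'"
print(where_columns(query))
```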

How can I find the second value in a comma separated list (array) in PostgreSQL?

I am trying to get the second value in a field to populate a column in a query.
Example:
I was able to accomplish this using MS Access and the SQL statement looks like this:
SELECT stone_schedules.mach_equip_id,
stone_schedules.timeline,
IIf(InStr(1,Mid([timeline],InStr(1,[timeline],",")+1),",")=0,0,Mid(Mid([timeline],InStr(1,[timeline],",")+1),1,InStr(1,Mid([timeline],InStr(1,[timeline],",")+1),",")-1)) AS NextSchedEntry_id
FROM stone_schedules
WHERE (((stone_schedules.active)<>"0"));
The problem with running this in Access rather than on the server is that it is too slow; the server can run it much faster. This is what I have so far on the server side, in pgAdmin:
SELECT
schedules.mach_equip_id,
schedules.timeline,
--select function where I need help
FROM
stone.schedules
WHERE
schedules.active = true;
Thanks
Well, I was able to find an answer that is simpler than the MS Access solution and, my guess is, probably less expensive.
When values are comma-separated like that and enclosed in curly braces, the column is an array.
I was able to select the 2nd value of the array with the following SQL statement:
SELECT
schedules.mach_equip_id,
schedules.timeline,
schedules.timeline[2] -- using brackets made it possible to select the 2nd value
FROM
stone.schedules
WHERE
schedules.active = true;
That was easier than I thought it would be.

DB2 complex like

I have to write a select statement that matches the following pattern:
[A-Z][0-9][0-9][0-9][0-9][A-Z][0-9][0-9][0-9][0-9][0-9]
The only thing I'm sure of is that the first A-Z WILL be there. All the rest is optional and the optional part is the problem. I don't really know how I could do that.
Some example data:
B/0765/E 3
B/0765/E3
B/0764/A /02
B/0749/K
B/0768/
B/0784//02
B/0807/
My guess is that I had best remove all the whitespace and the / characters from the data and then execute the select statement. But I'm having trouble writing the LIKE pattern itself. Can anyone help me out?
The underlying reason for this is that I'm migrating a database. In the old database the values are all in one field, but in the new one they are split into several fields. First I have to write a "control script" to find out which records in the old database are not correct.
Even the following isn't working:
where someColumn LIKE '[a-zA-Z]%';
You can use a regular expression via XQuery to define this pattern. There are many questions on Stack Overflow about patterns in DB2, and they have been solved with regular expressions:
DB2: find field value where first character is a lower case letter
Emulate REGEXP like behaviour in SQL
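To make the pattern itself concrete, here is a minimal Python sketch of the check described in the question: strip whitespace and slashes, then test what is left against a regular expression. It assumes that "optional" means each part after the first letter may be shorter or absent (up to 4 digits, an optional letter, up to 5 digits); that reading is a guess based on the sample data. The names normalize and is_valid are hypothetical, and the regex would still have to be carried over to whatever regex facility your DB2 version offers.

```python
import re

# One mandatory upper-case letter, then up to 4 digits, an optional
# upper-case letter, and up to 5 digits.
CODE_RE = re.compile(r"[A-Z][0-9]{0,4}[A-Z]?[0-9]{0,5}")

def normalize(value: str) -> str:
    """Remove all whitespace and '/' characters."""
    return re.sub(r"[\s/]", "", value)

def is_valid(value: str) -> bool:
    """True if the normalized value matches the whole pattern."""
    return CODE_RE.fullmatch(normalize(value)) is not None

samples = ["B/0765/E 3", "B/0765/E3", "B/0764/A /02", "B/0749/K",
           "B/0768/", "B/0784//02", "B/0807/"]
for s in samples:
    print(repr(s), normalize(s), is_valid(s))
```

Records for which is_valid returns False would be the ones to flag in the control script.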

SQL wildcard issue

I have a database which can be modified by our users through an interface. For one field (companyID) they should have the ability to place an asterisk in the string as a wildcard character.
For example, they can put in G378* to stand for any companyID starting with G378.
Now on my client program I'm providing a "full" companyID as a parameter:
SELECT * FROM table WHERE companyID = '" + myCompanyID + "'
But I have to check for the wildcard. Is there anything I can add to my query to check for this? I'm not sure how to explain it, but it's kind of backwards from what I'm used to. Can I modify the value I provide (the full companyID) so that it matches the wildcard value from within the query itself?
I hope this made sense.
Thanks!
EDIT: The user is not using SELECT. The user is only using INSERT or UPDATE and THEY are the ones placing the * in the field. My program is using SELECT and I only have the full companyID (no asterisk).
This is a classic SQL Injection target! You should be glad that you found it now.
Back to your problem, when users enter '*', replace it with '%', and use LIKE instead of = in your query.
For example, when end-users enter "US*123", run this query:
SELECT * FROM table WHERE companyID LIKE @companyIdTemplate
Set the @companyIdTemplate parameter to "US%123", and run the query.
I used .NET's @ in the example, but query parameters are denoted in ways specific to your hosting language. For example, they become ? in Java. Check any DB programming tutorial on the use of parameterized queries to find out how it's done in your system.
EDIT: If you would like to perform an insert based on a wildcard that specifies records in another table, you can do an insert-from-select, like this:
INSERT INTO CompanyNotes (CompanyId, Note)
SELECT c.companyId, @NoteText
FROM Company c
WHERE c.companyId LIKE 'G378%'
This will insert a record with the value of the @NoteText parameter into the CompanyNotes table for each company with an ID matching "G378%".
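In T-SQL I would use REPLACE and LIKE. Because the stored companyid is the value that holds the asterisk, apply REPLACE to the column and compare your full ID against it, i.e.:
select * from table where mycompanyid like replace(companyid,'*','%');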
This is somewhat implementation dependent, and you did not mention which type of SQL you are dealing with. In MS SQL Server, however, the wildcards include % (for any number of characters) and _ (for a single character). Wildcards are only evaluated as wildcards when used with LIKE, not with an = comparison. But you can pass in a parameter that includes a wildcard and have it evaluated as a wildcard, as long as you are using LIKE.
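Given the EDIT in the question (the asterisk lives in the stored value, and the program holds the full ID), the comparison has to run "backwards": the full ID goes on the left of LIKE and the stored pattern on the right. Here is a minimal sketch of that idea using Python's built-in sqlite3 module and a parameterized query. The table name companies, the sample rows, and the helper find_matches are all made up for illustration, and SQLite's LIKE is case-insensitive for ASCII by default, which may differ from your database.

```python
import sqlite3

def find_matches(conn, full_company_id):
    """Return stored companyID patterns that the full ID satisfies."""
    sql = (
        "SELECT companyID FROM companies "
        # Turn the user's '*' into the SQL wildcard '%', then match the
        # full ID (bound as a parameter) against that stored pattern.
        "WHERE ? LIKE REPLACE(companyID, '*', '%') "
        "ORDER BY companyID"
    )
    return [row[0] for row in conn.execute(sql, (full_company_id,))]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (companyID TEXT)")
conn.executemany(
    "INSERT INTO companies VALUES (?)",
    [("G378*",), ("H100",), ("US*123",)],
)

print(find_matches(conn, "G3781234"))
print(find_matches(conn, "US999123"))
```

One caveat: any % or _ that users store in the field will also act as a wildcard unless it is escaped.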

SQL Contains - only match at start

For some reason I cannot find the answer on Google! With the SQL CONTAINS function, how can I tell it to match only at the beginning of a string? That is, I am looking for the full-text equivalent of
LIKE 'some_term%'.
I know I can use like, but since I already have the full-text index set up, AND the table is expected to have thousands of rows, I would prefer to use Contains.
Thanks!
You want something like this:
Rather than specify multiple terms, you can use a 'prefix term' if the
terms begin with the same characters. To use a prefix term, specify
the beginning characters, then add an asterisk (*) wildcard to the end
of the term. Enclose the prefix term in double quotes. The following
statement returns the same results as the previous one.
-- Search for all terms that begin with 'storm'
SELECT StormID, StormHead, StormBody FROM StormyWeather
WHERE CONTAINS(StormHead, '"storm*"')
http://www.simple-talk.com/sql/learn-sql-server/full-text-indexing-workbench/
You can use CONTAINS with a LIKE subquery for matching only a start:
SELECT *
FROM (
SELECT *
FROM myTable WHERE CONTAINS(edition, '"Alice in wonderland"')
) AS S1
WHERE S1.edition LIKE 'Alice in wonderland%'
This way, the slow LIKE query will be run against a smaller set.
The only solution I can think of is to actually prepend a unique word to the beginning of every field in the table.
e.g. Update every row so that 'xfirstword ' appears at the start of the text (e.g. Field1). Then you can search for CONTAINS(Field1, 'NEAR ((xfirstword, "TERM*"),0)')
Pretty crappy solution, especially as we know that the full text index stores the actual position of each word in the text (see this link for details: http://msdn.microsoft.com/en-us/library/ms142551.aspx)
I am facing a similar issue. This is what I have implemented as a workaround.
I created another table and copied into it only the rows matching LIKE 'some_term%'.
I then set up the full-text search on this new table.
Please let me know if you have tried a better approach.