SQL exclusion regexp does not work, why? - sql

I have some sentences in db. I want to select the ones that don't have urls in them.
So what I do is
select ID, SENTENCE
from SENTENCE_TABLE
where regexp_like(SENTENCE, '[^http]');
However after the query is executed the sentences that appear in the results pane still have urls. I tried a lot of other combinations without any success.
Can somebody explain or give a good link where it is explained how regexps actually work in SQL.
How can I filter(exclude) actual words in db with SQL query?

You're over-complicating this. Just use a standard LIKE.
select ID, SENTENCE
from SENTENCE_TABLE
where SENTENCE not like '%http%';
regexp_like(SENTENCE, '[^http]') will match everything but h, t and p separately. I like the PSOUG page on regular expressions in Oracle but I would also recommend reading the documentation.
To respond to your comment you can use REGEXP_LIKE, there's just no point.
select ID, SENTENCE
from SENTENCE_TABLE
where not regexp_like(SENTENCE, 'http');
This looks for the string http rather than the letters individually.

[^http] would match any character except h or t or t or p..So this would match any string that doesn't contain h or t or t or p anywhere in the string
It should be where not regexp_like(SENTENCE, '^http');..this would match anything that doesn`t start with http

Related

Does anyone know what this regexp_replace do?

I received this snippet from someone who doesn't work for my current company anymore, and I can't figure out what this regex do.
The objective for him was to scan through sql query strings and rearrange table info
regexp_replace(query, ' ("*?)(analytics_internal|arr|bizops_analytics|cloud_api|cloud_backend_raw|data_science|eloqua|fb_ads|google_ads|information_schema|intricately|legacy_sfdc|marketing_analytics|ns__analytics_postprocessing|ns__global_write|product_analytics|raw_bing_ads|raw_cloud_api|raw_compass|raw_coveo|raw_eloqua|raw_g_search_console|raw_gainsight|raw_google_ads|raw_intercom|raw_linkedin_ads|raw_mightysignal|raw_realm|raw_sfdc|remodel_cloud|remodel_test|sales_analytics|sales_ops|sampledb|segment|sfdc|ts_analytics|university_platform_analytics|upstream_gainsight|usage|xform_cloud|xform_etl|xform_finance|xform_marketing|xform_reference|xform_sales|xform_tables)("*?)\.("*?)(.+?)("*?)(\s|$)', ' awsdatacatalog.$1$2$3.$4$5$6$7') as queryString
The regex you've provided attempts to match any of the many provided strings in the second capture group and modify that part of the query to be prefixed with awsdatacatalog.. This is most likely an attempt to modify queries to occur on a new database, in particular a database named awsdatacatalog. For example, the consider the following query:
SELECT * FROM "analytics_internal".foo.table
Your regex_replace should produce a new query that looks like
SELECT * FROM awsdatacatalog."analytics_internal".foo.table

Regex pattern to identify column names in an SQL WHERE clause

I am looking for resources on how to build a regex pattern to match column names from an SQL WHERE clause.
I have the SQL SELECT statement:
SELECT * FROM test WHERE area = 'testarea' AND description = 'testdescription' AND ...
And I'm trying to extract the terms area and description, and any others following ANDs. I understand this can get more complicated as the WHERE clause gets more complicated, but for now, I'm assuming that it conforms to the structure in the example.
When I try to search the web for examples on how to do this, I only see examples of how to include regex in the WHERE clause, but not actually match against it.
Can someone help me get started here? I'm at a loss.

DB2 complex like

I have to write a select statement following the following pattern:
[A-Z][0-9][0-9][0-9][0-9][A-Z][0-9][0-9][0-9][0-9][0-9]
The only thing I'm sure of is that the first A-Z WILL be there. All the rest is optional and the optional part is the problem. I don't really know how I could do that.
Some example data:
B/0765/E 3
B/0765/E3
B/0764/A /02
B/0749/K
B/0768/
B/0784//02
B/0807/
My guess is that I best remove al the white spaces and the / in the data and then execute the select statement. But I'm having some problems writing the like pattern actually.. Anyone that could help me out?
The underlying reason for this is that I'm migrating a database. In the old database the values are just in 1 field but in the new one they are splitted into several fields but I first have to write a "control script" to know what records in the old database are not correct.
Even the following isn't working:
where someColumn LIKE '[a-zA-Z]%';
You can use Regular Expression via xQuery to define this pattern. There are many question in StackOverFlow that talk about patterns in DB2, and they have been solved with Regular Expressions.
DB2: find field value where first character is a lower case letter
Emulate REGEXP like behaviour in SQL

Issue using wildcards in index with multiple start points

With multiple start points, if i provide the full name (value) of the key 'Name' the cypher query works.
This Works:
start n=node:na('NAME:("JERI, MICHAEL M", "ANDREW, TONNA", "JILLSO, DAVID")')
return n.NAME
Say, if i wish to use wildcards on Name key, something like this:
start n=node:na('NAME:("JERI*", "ANDREW*", "JILLSO*")')
return n.NAME
This doesn't work. It gives me zero rows.
It would be great if someone could help me with the correct way to achieve this.
I think this may be due to the double quotes, making Lucene query parser 3.6.2 (used in Neo4j 1.9) parse the terms as phrases instead of single terms. And wildcards are only supported for single terms, not phrases.

SQL Contains - only match at start

For some reason I cannot find the answer on Google! But with the SQL contains function how can I tell it to start at the beginning of a string, I.e I am looking for the full-text equivalent to
LIKE 'some_term%'.
I know I can use like, but since I already have the full-text index set up, AND the table is expected to have thousands of rows, I would prefer to use Contains.
Thanks!
You want something like this:
Rather than specify multiple terms, you can use a 'prefix term' if the
terms begin with the same characters. To use a prefix term, specify
the beginning characters, then add an asterisk (*) wildcard to the end
of the term. Enclose the prefix term in double quotes. The following
statement returns the same results as the previous one.
-- Search for all terms that begin with 'storm'
SELECT StormID, StormHead, StormBody FROM StormyWeather
WHERE CONTAINS(StormHead, '"storm*"')
http://www.simple-talk.com/sql/learn-sql-server/full-text-indexing-workbench/
You can use CONTAINS with a LIKE subquery for matching only a start:
SELECT *
FROM (
SELECT *
FROM myTable WHERE CONTAINS('"Alice in wonderland"')
) AS S1
WHERE S1.edition LIKE 'Alice in wonderland%'
This way, the slow LIKE query will be run against a smaller set
The only solution I can think of it to actually prepend a unique word to the beginning of every field in the table.
e.g. Update every row so that 'xfirstword ' appears at the start of the text (e.g. Field1). Then you can search for CONTAINS(Field1, 'NEAR ((xfirstword, "TERM*"),0)')
Pretty crappy solution, especially as we know that the full text index stores the actual position of each word in the text (see this link for details: http://msdn.microsoft.com/en-us/library/ms142551.aspx)
I am facing the similar issue. This is what I have implemented as a work around.
I have made another table and pulled only the rows like 'some_term%'.
Now, on this new table I have implemented the FullText search.
Please do inform me if you tried some other better approach