Isolate SQL field using regex - sql

I'm trying to isolate a specific field in a SQL dump file so I can edit it but I'm not having any luck.
The regex I'm using is:
^(?:(?:'[^\r\n']*'|[^,\r\n]*),){6}('[^\r\n']*'|[^,\r\n]*)
Which is supposed to grab the seventh field and place it inside reference 1.
The trouble is that this is stumbling when ever it finds a comma inside a text field and counts the partial match as the allowable matches.
Eg. (1, 'Title', 1, 3, '2006-09-29', 'Commas, the bane of my regex', 'This is the target', 2, 4) matches " the bane of my regex'" instead of "'This is the target'".

It might be easier to load the SQL into a temp database and then do a SELECT to get the data in that field.

Do you have control over the dump file, or are they historic or outside of your control?
If you can choose a better delimeter, comma really is a terrible choice.

[^,\r\n]*, matches
'Commas,
I suggest [^,\r\n']*, instead.

I think you will have more luck if you make the regex more specific. I havent tested this but I believe this should work.
Also as Paul suggests you might try a different delimiter to make this easier.
Enjoy!
\d{1,4}(,){1}('){1}[a-zA-Z0-9,]+('){1}\d{1,4}(,){1}\d{1,4}(,){1}('){1}[0-9-]+('){1}(,){1}('){1}[a-zA-Z0-9,]+('){1}(,){1}('){1}[a-zA-Z0-9,]+('){1}(,){1}\d{1,4}(,){1}\d{1,4}(\r\n){1}

Doh!
My fields weren't just split with a comma. They were split with a comma followed by a space.
Correct RegEx is
^(?:(?:'[^\r\n']*'|[^,\r\n]*), ){6}('[^\r\n']*'|[^,\r\n]*)
Now it works.
Sorry to waste you time with this one. It was Beta's response that got me thinking as it was the second alternation in play for all fields. The extra space forced it to use this option rather than the option enclosed within quotes.

Related

Splitting variable content in SQL

I have a variable in a stored procedure that contains a string of characters like
[Tag]MESSAGE[/Tag]
I need a way to get the MESSAGE part from within the tags.
Any help would be much appreciated
Note: I have tested it on Oracle RDBMS
A more reliable approach is to use REGEXP_REPLACE.
REGEXP_REPLACE(value, pattern)
Example
SELECT REGEXP_REPLACE(
'<Tag>Message</Tag>',
'\s*</?\w+((\s+\w+(\s*=\s*(".*?"|''.*?''|[^''">\s]+))?)+\s*|\s*)/?>\s*') FROM DUAL;
Just replace "<" with "[" if your tags are different
What you need is this:
SELECT SUBSTRING(ColumnName,CHARINDEX('html_tag',ColumnName)+LEN('html_tag'),CHARINDEX('html_close_tag',ColumnName)-LEN('html_close_tag')) FROM TableName
You'll require to change the html_tag and html_close_tag with your own HTML tag that you want to get rid of.
If the column contains only single tag, simple call of substring function should be enough. Otherwise there will always be some point where regular expression does not suffice since you fall into trap (see this legendary StackOverflow answer).

Searching for multuple valeus using REGEX

I have a big sporadic sql scripts and need to find and replace a few values in it. I am trying to pass my values in REGEX to Notepad++ but I can't seem to make it work. To be more specific, I have around 50 script, each with 5000 lines, and I need to look for a list of values, e.g. "[dbo].[livesales]" "[dbo].[CreditCards]" in all my scripts separately. I undertand that I need either run this separately against each script or merger them all into one file, but I need the proper REGEX command for it. I need to include square bracket and dots as well. I end up building this but it doesn't work for me:
^(?=.*\b[dbo].[LiveSales]\b)(?=.*\b[dbo].[CreditCards]\b).+$
enter image description here
thanks in advance,
I wouldn't bother using word boundaries, as square brackets in SQL Server are pretty ubiquitous for database object names (e.g. database and column names). I suggest the following pattern:
\[dbo\]\.\[(?:LiveSales|CreditCards)\]
Demo
The major changes I have made include not using word boundaries, escaping the [ and ] brackets (since square bracket is a regex metacharacter with a special meaning), and also not try to match the entire input. Presumably you want to find all such occurrences, and so don't bother trying to scope your pattern with ^ and $.

SQL Remove Substring From Query Results

I have a query that is returning data from a database. In a single field there is a rather long text comment with a segment, which is clearly defined with marking tags like !markerstart! and !markerend!. I would like to have a query return with the string segment between the two markers removed (and the markers removed too).
I would normally do this client-side after I get the data back, however, the problem is that the query is an INSERT query that gets it's data from a SELECT statement. I don't want the text segment to be stored in the archival/reporting table (working with an OLTP application here), so I need to find a way to get the SELECT statement to return exactly what is to be inserted, which, in this case, means getting the SELECT statement to strip out the unwanted phrase instead of doing it in post-processing client-side.
My only thought is to use some convoluted combination of SUBSTRING, CHARINDEX, and CONCAT, but I'm hoping there is a better way, but, based on this, I don't see how. Anyone have ideas?
Sample:
This is a long string of text in some field in a database that has a segment that needs to be removed. !markerstart! This is the segment that is to be removed. It's length is unknown and variable. !markerend! The part of this field that appears after the marker should remain.
Result:
This is a long string of text in some field in a database that has a segment that needs to be removed. The part of this field that appears after the marker should remain.
SOLUTION USING STUFF:
I really don't like how verbose this is, but I can put it in a function if I really need to. It isn't ideal, but it is easier and faster than a CLR routine.
SELECT STUFF(CAST(Description AS varchar(MAX)), CHARINDEX('!markerstart!', Description), CHARINDEX('!markerend!', Description) + 11 - CHARINDEX('!markerstart!', Description), '') AS Description
FROM MyTable
You may want to consider implementing a CLR user-defined function that returns the parsed data.
The following link demonstrates how to use a CLR UDF RegEx function for pattern matching and data extraction.
http://msdn.microsoft.com/en-us/magazine/cc163473.aspx
Regards,
You can use Stuff function or Replace function and replace your unwanted symbols with ''.
STUFF('EXP',START_POS,'NUMBER_OF_CHARS','REPLACE_EXP')

Replace all occurrences of a substring in a database text field

I have a database that has around 10k records and some of them contain HTML characters which I would like to replace.
For example I can find all occurrences:
SELECT * FROM TABLE
WHERE TEXTFIELD LIKE '%&#47%'
the original string example:
this is the cool mega string that contains &#47
how to replace all &#47 with / ?
The end result should be:
this is the cool mega string that contains /
If you want to replace a specific string with another string or transformation of that string, you could use the "replace" function in postgresql. For instance, to replace all occurances of "cat" with "dog" in the column "myfield", you would do:
UPDATE tablename
SET myfield = replace(myfield,"cat", "dog")
You could add a WHERE clause or any other logic as you see fit.
Alternatively, if you are trying to convert HTML entities, ASCII characters, or between various encoding schemes, postgre has functions for that as well. Postgresql String Functions.
The answer given by #davesnitty will work, but you need to think very carefully about whether the text pattern you're replacing could appear embedded in a longer pattern you don't want to modify. Otherwise you'll find someone's nooking a fire, and that's just weird.
If possible, use a suitable dedicated tool for what you're un-escaping. Got URLEncoded text? use a url decoder. Got XML entities? Process them though an XSLT stylesheet in text mode output. etc. These are usually safer for your data than hacking it with find-and-replace, in that find and replace often has unfortunate side effects if not applied very carefully, as noted above.
It's possible you may want to use a regular expression. They are not a universal solution to all problems but are really handy for some jobs.
If you want to unconditionally replace all instances of "&#47" with "/", you don't need a regexp.
If you want to replace "&#47" but not "&#471", you might need a regexp, because you can do things like match only whole words, match various patterns, specify min/max runs of digits, etc.
In the PostgreSQL string functions and operators documentation you'll find the regexp_replace function, which will let you apply a regexp during an UPDATE statement.
To be able to say much more I'd need to know what your real data is and what you're really trying to do.
If you don't have postgres, you can export all database to a sql file, replace your string with a text editor and delete your db on your host, and re-import your new db
PS: be careful

Removing extraneous characters in column using T-SQL

I am attempting to remove extraneous characters from data in a primary key column..the data in this column serves as a control number, and the extra characters are preventing a Web application from effectively interacting with the data.
As an example, one row may look like this:
ocm03204415 820302
I want to remove everything after the space...so the characters '820302'. I could manually do it, but, there are around 2,000 records that have these extra values in the column. It would be great if I could remove them programmatically. I can't do a simple Replace because the characters have no pattern...I couldn't define a rule to discover them...the only thing uniform is the space...although, now that I look at the data set, they do all start with 8.
Is there a way I could remove these characters programmatically? I am familiar with PL/SQL in the Oracle environment, and was wondering if Transactional SQL would offer some possibilities in the MS-SQL environment?
Thanks so much.
You may want to look into the CHARINDEX function to find the space. Then you can use SUBSTRING to grab everything up to the space in a single UPDATE statement.
Try this:
UPDATE YourTable
SET YourColumn = LEFT(YourColumn,CHARINDEX(' ',YourColumn)-1)
WHERE CHARINDEX(' ',YourColumn) > 1