SQL find-and-replace regular-expression capturing-group limit?

I need to convert data from a spreadsheet into insert statements in SQL. I've worked out most of the regular expressions for using the find and replace tool in SSMS, but I'm running into an issue when trying to reference the 9th parenthesized item in my final replace.
Here is the original record:
Blue Doe 12/21/1967 1126 Queens Highway Torrance CA 90802 N 1/1/2012
And this is what I need (for now):
select 'Blue','Doe','19671221','1126 Queens Highway','Torrance','CA','90802','N','20120101'
Due to limitations on the number of parenthesized items allowed, I have to run through the replace three times. This may work its way into a stored procedure if I can first make it work as a POC.
This is the first matching expression:
^{:w:b:w:b}{:z}/{:z}/{:z:b[0-9A-Za-z:b]+:b:w:b[A-Z]+:b:z:b:w:b}{:z}/{:z}/{:z}
And the replace: \10\2/0\3/\40\5/0\6/\7
This adds zeros to the months and days so that they have at least two characters.
The next match reformats the dates into the format required in the query (no comments about not using a date field, please; this is a client requirement for the database).
Matching expression:
^{:w:b:w:b}[0-9]*{[0-9]^2}/[0-9]*{[0-9]^2}/{:z}{:b[0-9A-Za-z:b]+:b:w:b[A-Z]+:b:z:b:w:b}[0-9]*{[0-9]^2}/[0-9]*{[0-9]^2}/{:z}
And the replace: \1\4\(2,2)\(2,3)\5\8\(2,6)\(2,7)
Finally, the last match inserts the results into the select statement that will feed an insert statement.
Matching expression:
^{:w}:b{:w}:b{:z}:b{[0-9A-Za-z:b]+}:b{:w}:b{[A-Z]+}:b{:z}:b{:w}:b{:z}
And the replace: select '\1','\2','\3','\4','\5','\6','\7','\8','\9'
It all works except the last replacement. For some reason the \9 is NOT getting the data from the match. If I just replace the whole replace expression with \9 I get a blank space. If I use \8, I get N. If I eliminate the 8th parenthesized item, thus making my 9th item eighth, it returns what I want, 20120101.
So my question is, does SSMS / SQL allow for 9 tagged expressions when using find / replace and regular expressions? Or am I missing something here? I know there are other ways to do this. I'm just trying to get it done quickly as a POC before we move this into a sproc or application.
Thanks for any assistance.
-Peter

None of your matching expressions work with the record you provided in my MS SQL Server Management Studio 2008r2.
From your description it sounds like there is an issue with the Tagged Expression 9 since the desired result is returned when using Tagged Expression 8, but not 9. You may want to ask Microsoft or report it as a bug.
A quicker solution would be to move the text you are performing the Find/Replace on in SSMS to a spreadsheet and use cell formulas to parse the data into insert commands. If you have MS Excel, the CONCATENATE, FIND, and MID functions will probably be useful. It also helps to split the values into their own columns so you can format the dates, then use one CONCATENATE to build your insert.
Please let me know if you need an example.
Update: I tried your example in MS SQL Server Management Studio 2008r2, Visual Studio 2005, and Visual Studio 2010 and got the same result you get: \9 returns an empty string. Checking around, I found that others are also having this issue (see the community content from Henrique Evaristo) and that the whole system has been replaced in the new editors.
So in answer to your question, SSMS does not support 9 tagged expressions due to a bug.
If you are unable to use the spreadsheet idea, you could try splitting the action into two passes: set the first 8 values and emit the whole original match (\0) in the last position, then swing back and replace that leftover record with its final value. For example:
^{:w}:b{:w}:b{:z}:b{[0-9A-Za-z:b]+}:b{:w}:b{[A-Z]+}:b{:z}:b{:w}:b:z
select '\1','\2','\3','\4','\5','\6','\7','\8','\0'
:w:b:w:b:z:b[0-9A-Za-z:b]+:b:w:b[A-Z]+:b:z:b:w:b{:z}
\1
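For comparison, here is a minimal sketch of the same conversion done in one pass outside SSMS, using Python's re module, which has no trouble with nine or more groups. The field layout (one-word names and city, two-letter state, five-digit zip, one-character flag) is an assumption based on the single sample record in the question, and the names RECORD, quote, and to_select are made up for illustration.

```python
import re

# One pattern for the whole record. The field layout is an assumption taken
# from the single sample row in the question.
RECORD = re.compile(
    r"^(\w+)\s+(\w+)\s+"                   # first name, last name
    r"(\d{1,2})/(\d{1,2})/(\d{4})\s+"      # first date as M/D/YYYY
    r"(.+)\s+([A-Za-z]+)\s+([A-Z]{2})\s+"  # address, one-word city, state
    r"(\d{5})\s+(\w)\s+"                   # zip code, one-character flag
    r"(\d{1,2})/(\d{1,2})/(\d{4})$"        # second date as M/D/YYYY
)


def quote(value):
    """Wrap a value in single quotes, doubling any embedded quote."""
    return "'" + value.replace("'", "''") + "'"


def to_select(line):
    """Turn one spreadsheet row into the select statement from the question."""
    match = RECORD.match(line.strip())
    if match is None:
        raise ValueError("unrecognized record: " + line)
    (first, last, m1, d1, y1, address, city, state,
     zip_code, flag, m2, d2, y2) = match.groups()
    date1 = f"{y1}{int(m1):02d}{int(d1):02d}"  # zero-padded YYYYMMDD
    date2 = f"{y2}{int(m2):02d}{int(d2):02d}"
    fields = [first, last, date1, address, city, state, zip_code, flag, date2]
    return "select " + ",".join(quote(field) for field in fields)


print(to_select("Blue Doe 12/21/1967 1126 Queens Highway Torrance CA 90802 N 1/1/2012"))
```

Like the original expressions, this sketch would misparse a multi-word city, so real data would probably need a delimiter-based export instead.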

Related

Excel data type issues

I am using MS query to pull data from sql server and all is good.
The problem starts when the data arrives from the server: I am stuck with the General data type for everything, and there is no way to change the data type in Excel.
The main issue is numbers: the database data type is decimal, yet I cannot do any calculations on them in Excel. Any help would be appreciated.
I am using excel to execute a stored procedure on server
This pulls the data into the following table
Even though the price column in SQL Server is typed as decimal, it becomes a General data type once it gets to Excel.
Changing it to Number, Currency, etc. does not change anything.
No errors appear either. The data simply comes down, and no matter what changes I apply in Excel, it is all treated as text.
You can do these things.
Select Column
Click Data-> Text to Columns
Follow the wizard
Set the format
Use this official support ticket from Microsoft
The problem in this case was of my own making.
But I suppose it could easily happen to others who are just starting out with SQL and Excel.
Here is what happened, as I established after a few days of going in circles.
As there were loads of trailing spaces in the data coming down from the server, I decided to tidy things up.
Without considering the implications, I stuck an RTRIM() on everything.
This caused Excel to treat everything as strings, because RTRIM is a built-in string function and returns a string even when it is given a decimal.
What made things worse is that with Power Query I was able to transform the data to the desired formats.
Unfortunately MS Query does not seem to be quite as clever as Power Query, hence the issues.
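The effect is easy to reproduce in miniature. The sketch below uses Python with an in-memory SQLite database as a stand-in, since the SQL Server setup from the question is not available here; the items table and the rtrim_types helper are hypothetical. It only shows that a trim function hands back text even when its input column is numeric, which is the same kind of silent conversion described above.

```python
import sqlite3


def rtrim_types():
    """Return the storage types of a numeric column before and after rtrim()."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (price REAL)")  # hypothetical table
    conn.execute("INSERT INTO items VALUES (12.5)")
    row = conn.execute(
        "SELECT typeof(price), typeof(rtrim(price)) FROM items"
    ).fetchone()
    conn.close()
    return row


# The first type is numeric, the second is text.
print(rtrim_types())
```

Applying the trim only to character columns, and leaving numeric columns alone, avoids the conversion in the first place.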

DB2 SQL z/OS - variable equivalent of a hex constant

I'm trying to extract data (using SPUFI) from a DB2 table to a file, with one of the output fields converting a decimal field to the same format as a COBOL comp field.
So e.g. today's date (20141007) would be ..ëõ
The SQL HEX function converts 20141007 to 013353CF, and doing a SELECT of x'013353CF' gives me the desired result, but obviously that's a constant; I'm trying to find an equivalent function that works on a column value.
Basically an inverse of the HEX function.
I've come across a couple of suggestions using user defined functions. Problem is, we've only recently upgraded to DB2 10 and new function mode isn't enabled yet, which means I don't have access to any control functions in a UDF.
I suspect I'm out of luck, but wondering if anyone has any suggestions.
I appreciate this is completely the wrong tool for the job, and that it would be easier to just write a COBOL program to do it, but various constraints are preventing that. I'm limited to just SQL functions (and possibly JCL).
I thought I had a solution using a recursive UDF to get around the lack of control functions, but that's not allowed either.
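To make the target concrete, here is a small Python sketch of the byte-level relationship being asked for. It is not a DB2 solution, only an illustration of what an "inverse of HEX" would have to produce; the 4-byte big-endian layout is assumed from the eight hex digits quoted above, and the helper names are made up.

```python
def to_comp_bytes(value, size=4):
    """Pack a non-negative integer as big-endian binary (assumed COMP layout)."""
    return value.to_bytes(size, "big")


def from_hex(hex_string):
    """Inverse of HEX: turn a string of hex digits back into raw bytes."""
    return bytes.fromhex(hex_string)


raw = to_comp_bytes(20141007)
print(raw.hex().upper())                    # hex digits of the packed value
print(from_hex("013353CF") == raw)          # both routes give the same bytes
print(int.from_bytes(raw, "big"))           # round trip back to the integer
```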

SQL SUBSTR error: What is the correct syntax?

Brief Description of the app:
I have written a Delphi application that allows a user to run a query over either a MySQL database, or a DB2 database. The application uses a TADOQuery component to run the query.
The application uses a simple interface to build the query string, allowing users with no knowledge of SQL to build queries. At no point does the user see any SQL - everything is in plain English so that even non-technical users can understand what they are doing.
The application examines the parameters that the user entered using the query building interface and builds the SQL statement in the background, submitting it without the user actually seeing the SQL itself.
Problem:
Some of the queries use substrings to retrieve data from within certain fields. When I use the SUBSTR statement, I'm not adding spaces after the commas within the SUBSTR statement. For example, SUBSTR(field,1,10).
This is fine most of the time, but when the locale on the PC is set to a different locale from English (e.g. Dutch, changed via the Regional Settings applet in the Windows Control Panel), the SUBSTR statement in this form fails when running over a DB2 database (it seems fine over MySQL).
In order to get the SUBSTR to execute properly in that particular locale, I need to add spaces after the commas. For example, SUBSTR(field, 1, 10).
Searching for the correct syntax for the SUBSTR statement shows examples both with and without spaces after the commas, although obviously I've found problems when I've not included the spaces, so I'd be inclined to go with the version with spaces. However, what I want to know is whether or not this is the definitive syntax, whether or not I'll get any problems using SUBSTR in this way, and as a bonus, why I get the error when I don't use spaces after the commas in the first place.
Either form is correct, with or without spaces. Spaces are optional and ignored by the parser; you can even have 10 spaces after a comma and 3 before it if you like (just arbitrary numbers).
The reason SUBSTR(field,1,10) doesn't work in some locales is the 1,10 part. In many European countries the decimal sign is a comma, not a period, so 1,10 can be read as a single decimal number. By putting a space after the comma and making it SUBSTR(field, 1, 10), the 1, 10 is very clearly split into two parameters, so there is no longer any confusion.
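Since the application generates the SQL itself, the spacing can be enforced in one place. A minimal sketch, written in Python rather than the Delphi of the original application, with substr_expr as a hypothetical helper name:

```python
def substr_expr(field, start, length):
    """Build a SUBSTR call with a space after each comma.

    The numeric arguments are passed through int() so that no
    locale-formatted number (such as "1,5") can end up in the SQL text.
    """
    args = [field, str(int(start)), str(int(length))]
    return "SUBSTR(" + ", ".join(args) + ")"


print(substr_expr("field", 1, 10))
```

The field name is inserted as-is here, so it should come from the application's own list of known columns rather than from free user input.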

SQL Query and Unicode Issue

I have a really weird issue with SQL queries on Unicode data. Here's what I've got:
SQL Server Express 2008 R2 AS
Table containing Chinese characters/words/phrases (100,000 rows)
When I run the following, I get the correct row + 36 other rows returned... when it should only be the one row:
SELECT TOP 1000 [ID]
,[MyChineseColumn]
,UNICODE([MyChineseColumn])
FROM [dbo].[MyTableName]
WHERE [MyChineseColumn]= N'㐅'
As you'd expect, the row with 㐅 is returned, but also the following: 〇, 宁, 㮸 and a bunch of others...
Anyone have any ideas what is going on here? This has really got me confused and I am not sure how to solve this one (tried "Googling" already)...
Thanks
Please check the column is using an appropriate Chinese collation as that will determine the semantics used in this type of comparison.
You may also want to try a binary collation; these characters seem to be treated as identical (possibly because case and/or accents are ignored, depending on the collation in use).

Excel to SQL direct import error

After working for a considerable time on cracking some sales data, I came across an error that really started to bug me and ate up my working time. After so much effort, I was fed up and close to giving up on the un-importable records.
The scenario:
Bulk sales data arrives in txt/csv format and needs to be imported into a SQL database, then matched against Address History information held across a combination of tables by comparing strings directly, field to field.
If the codes match, I need to run a script to update a few tables with data. If they do not match, I need to insert a whole bunch of data into different tables to create the ID that is required for the final sales import.
Most of the data was matching, except for a few records that were giving trouble. I just needed to import those into the history tables. That is when the problems started: even though I updated them, I couldn't match them.
After many frustrating hours, I asked my girlfriend to check whether there was any error in the string I was working with.
The string is "Bramhall Stockport" to be matched to "Bramhall   Stockport". For the SQL script, these two strings do not match.
I bet that if you copy and paste them into your table they would match, because here they are in plain text format.
Then Ana figured out the error (she is not a computer geek; she has a Master's in Architecture) simply by copying and pasting into Microsoft Word 2007.
Screenshot: http://www.contentbcc.com/Anushka/sql_xls.png
Do you see the difference? First is in the txt/csv file and second on the SQL table.
In the first one, you have three regular spaces (ASCII 0x20). In the second one, you have a regular space followed by a non-breaking space (Unicode U+00A0). In Excel you can do a search and replace with ALT+0160 as the search text and a space character as the replacement to fix it.
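If the file is cleaned up before import rather than in Excel, the same fix can be scripted. The following is a minimal Python sketch; the two sample strings are made up to mimic the situation (the exact spacing in the original data is not known), and the helper names are hypothetical.

```python
NBSP = "\u00a0"  # non-breaking space, the character typed with ALT+0160


def find_non_ascii(text):
    """List (position, hex code) for every character outside plain ASCII."""
    return [(i, hex(ord(ch))) for i, ch in enumerate(text) if ord(ch) > 127]


def normalize_spaces(text):
    """Replace every non-breaking space with a regular space."""
    return text.replace(NBSP, " ")


# Hypothetical sample values: they look identical on screen but differ.
from_file = "Bramhall  Stockport"
from_table = "Bramhall " + NBSP + "Stockport"

print(from_file == from_table)                    # the raw strings differ
print(find_non_ascii(from_table))                 # shows where the odd character sits
print(normalize_spaces(from_table) == from_file)  # equal after cleanup
```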