SSIS 2008 - how to do regular expression search in SSIS Derived Column tool - sql

How do I do a regular expression in an SSIS Derived Column Tool
i.e.
I have string in the format XXXNNNN and I want to filter our those strings not in this format using an SSIS Derived Column Tool.
i.e
ABC1234 is ok
ABCDEFG is not.

The Derived Column transformation doesn't support regular expressions, so you'll have to look at some other options:
Use a Script Task and write the regex using the standard .NET regex features
Use a third-party component
If you always have 7 characters, you could use the SUBSTRING and CODEPOINT functions to check that each one is in the range you expect (see the function reference). But that's probably awkward to read and maintain, and may not be practical at all depending on what your data looks like.

Related

How to extract digits from field using regex

I am using Firebird 2.5 and I have a field (called identifier) with mixed letters, numbers and special characters. I would like to use regex to extract only the numbers in a new column. I have tried something like below, but it is not working.
Any idea how I can achieve this using regex without using stored procedures or execute block
SELECT ORDER_ID,
ORDER_DATE,
SUBSTRING(IDENTIFIER FROM 1 TO 10) SIMILAR TO '^[0-9]{10}$' --- DESIRED EXTRACTION COLUMN
FROM ORDERS
Example of data
IDENTIFIER DESIRED OUTPUT
ANDRE 02869567995 02869567995
02869567995 MARIA 02869567995
028.695.67.995 02869567995
028695679-95 02869567995
You cannot do this in Firebird 2.5, at least not without help from a UDF, or a (selectable) stored procedure. I'm not aware of third-party UDFs providing regular expressions, so you might have to write this yourself.
In Firebird 3.0, you could also use a UDR or stored function to achieve this. Unfortunately, using the regular expression functionality available in Firebird alone will not be enough to solve this.
NOTE: The rest of the answer is based on the assumption to extract digits if the first 10 characters of string are digits. With the updated question, this assumption is no longer valid.
That said, if your need is exactly as shown in your question, that is only extract the first 10 characters from a string if they are all digits, then you could use:
case
when IDENTIFIER similar to '[[:DIGIT:]]{10}%'
then substring(IDENTIFIER from 1 for 10)
end
(as an aside, the positional SUBSTRING syntax is from <start> for <length>, not from <start> to <end>)
In Firebird 3.0 and higher, you can use SUBSTRING(... SIMILAR ...) with a SQL regular expression pattern. Assuming you want to extract 10 digits from the start of a string, you can do:
substring(IDENTIFIER similar '#"[[:DIGIT:]]{10}#"%' escape '#')
The #" delimits the pattern to extract (where # is a custom escape character as specified in the ESCAPE clause). The remainder of the pattern must match the rest of the string, hence the use of % here (in other cases, you may need to specify a pattern before the first #" as well.
See this dbfiddle for an example.
It is not possible in any version of Firebird.

Replace all occurrences of a substring in a database text field

I have a database that has around 10k records and some of them contain HTML characters which I would like to replace.
For example I can find all occurrences:
SELECT * FROM TABLE
WHERE TEXTFIELD LIKE '%&#47%'
the original string example:
this is the cool mega string that contains &#47
how to replace all &#47 with / ?
The end result should be:
this is the cool mega string that contains /
If you want to replace a specific string with another string or transformation of that string, you could use the "replace" function in postgresql. For instance, to replace all occurances of "cat" with "dog" in the column "myfield", you would do:
UPDATE tablename
SET myfield = replace(myfield,"cat", "dog")
You could add a WHERE clause or any other logic as you see fit.
Alternatively, if you are trying to convert HTML entities, ASCII characters, or between various encoding schemes, postgre has functions for that as well. Postgresql String Functions.
The answer given by #davesnitty will work, but you need to think very carefully about whether the text pattern you're replacing could appear embedded in a longer pattern you don't want to modify. Otherwise you'll find someone's nooking a fire, and that's just weird.
If possible, use a suitable dedicated tool for what you're un-escaping. Got URLEncoded text? use a url decoder. Got XML entities? Process them though an XSLT stylesheet in text mode output. etc. These are usually safer for your data than hacking it with find-and-replace, in that find and replace often has unfortunate side effects if not applied very carefully, as noted above.
It's possible you may want to use a regular expression. They are not a universal solution to all problems but are really handy for some jobs.
If you want to unconditionally replace all instances of "&#47" with "/", you don't need a regexp.
If you want to replace "&#47" but not "&#471", you might need a regexp, because you can do things like match only whole words, match various patterns, specify min/max runs of digits, etc.
In the PostgreSQL string functions and operators documentation you'll find the regexp_replace function, which will let you apply a regexp during an UPDATE statement.
To be able to say much more I'd need to know what your real data is and what you're really trying to do.
If you don't have postgres, you can export all database to a sql file, replace your string with a text editor and delete your db on your host, and re-import your new db
PS: be careful

Return sql rows where field contains ONLY non-alphanumeric characters

I need to find out how many rows in a particular field in my sql server table, contain ONLY non-alphanumeric characters.
I'm thinking it's a regular expression that I need along the lines of [^a-zA-Z0-9] but Im not sure of the exact syntax I need to return the rows if there are no valid alphanumeric chars in there.
SQL Server doesn't have regular expressions. It uses the LIKE pattern matching syntax which isn't the same.
As it happens, you are close. Just need leading+trailing wildcards and move the NOT
WHERE whatever NOT LIKE '%[a-z0-9]%'
If you have short strings you should be able to create a few LIKE patterns ('[^a-zA-Z0-9]', '[^a-zA-Z0-9][^a-zA-Z0-9]', ...) to match strings of different length. Otherwise you should use CLR user defined function and a proper regular expression - Regular Expressions Make Pattern Matching And Data Extraction Easier.
This will not work correctly, e.g. abcÑxyz will pass thru this as it has a,b,c... you need to work with Collate or check each byte.

SQL Server 2005 seperate stored procedure CSV value into multiple columns?

I'm a SQL Server 2005 newb and I have to do something that should be easy but I'm having difficulties.
For the time being my stored procedure is storing csv in one column in my db. I need to break apart the csv into multiple columns. The number of values in a the CSV parameter is static, there is always gonna be 8 values and they are always going to be in the same order. Here is an example of the data:
req,3NxWZ7RYQVsC4chw3BMeIlywYqjxdF5IUX8GMUqgJlJTztcXQS,192.168.208.78,getUserInfo,AssociateService,03303925,null,M042872,
What is the best function to use to separate this parameter in a stored proc so I can then insert it into separate columns? I was looking at substring but that seems to be positional and not regex aware??
Thanks,
Chris
SQL Server doesn't have the native functionality; you have to build a function (generally referred to as "split"). This thread provides a number of TSQL options, looking for the same answer you're after--what performs best. Where they lack is in testing large amounts of comma delimited data, but then your requirement is only for eight values per column anyway...
SQL Server 2005+ supports SQLCLR, where functionality can be handed off to .NET code. The .NET code has to be deployed to the SQL Server instance as an assembly, and TSQL functions/procedures need to be created to expose the functionality within the assembly.
SUBSTRING is positional, but you can use the CHARINDEX function to find the position of each delimiter, and grab the SUBSTRINGs between each comma.
It's a clunky solution, to be sure, but the impact is minimized given a small static number of fields that always appear in order...
SQL Server isn't very strong in string handling - especially if you cannot use SQL-CLR to help.
You might be better off doing this beforehand, in your client application, using something like FileHelpers, a C# library which very easily imports a CSV and breaks it apart into an object for each row.
Once you have that, you can easily loop through the array of objects you have and insert those individual object properties into a relational table.

How can I get SSIS Lookup transformation to ignore alphabetical case?

Hopefully this is easy to explain, but I have a lookup transformation in one of my SSIS packages. I am using it to lookup the id for an emplouyee record in a dimension table. However my problem is that some of the source data has employee names in all capitals (ex: CHERRERA) and the comparison data im using is all lower case (ex: cherrera).
The lookup is failing for the records that are not 100% case similar (ex: cherrera vs cherrera works fine - cherrera vs CHERRERA fails). Is there a way to make the lookup transformation ignore case on a string/varchar data type?
There isn't a way I believe to make the transformation be case-insensitive, however you could modify the SQL statement for your transformation to ensure that the source data matches the case of your comparison data by using the LOWER() string function.
Set the CacheType property of the lookup transformation to Partial or None.
The lookup comparisons will now be done by SQL Server and not by the SSIS lookup component, and will no longer be case sensitive.
You have to change the source and as well as look up data, both should be in same case type.
Based on this Microsoft Article:
The lookups performed by the Lookup transformation are case sensitive. To avoid lookup failures that are caused by case differences in data, first use the Character Map transformation to convert the data to uppercase or lowercase. Then, include the UPPER or LOWER functions in the SQL statement that generates the reference table
To read more about Character Map transformation, follow this link"
Character Map Transformation