REVERSE function in Netezza not working, how to extract file name from path without it? - sql

My company has production and testing databases in SQL Server and a data warehouse in IBM Netezza. I wrote a query in SQL Server and now need to covert it for use in the data warehouse, however I am running into a problem.
A crucial part of the query is extracting a file name from a path, and in SQL Server I use this:
RIGHT( BitmapID, CHARINDEX( '\', REVERSE( BitmapID ) + '\' ) - 1 )
This turns "G:\grps\every\Permanent Marketing Signage\SPC\BRD\BLAD\BCAG_BLAD_001.png" to "BCAG_BLAD_001.png" and it works perfectly. I tried to convert this to Netezza syntax like so:
SUBSTRING(bit_map_ID, LENGTH(bit_map_ID) - ( STRPOS( REVERSE( bit_map_ID ), '\' ) + 2 ) )
However, when I run this, I get an error:
ERROR [42S02] ERROR: Function 'REVERSE(VARCHAR)' does not exist
Unable to identify a function that satisfies the given argument types
You may need to add explicit typecasts
When I replace REVERSE( bit_map_ID ) with a reversed string example like "gnp.100_DALB_GACB\DALB\DRB\CPS\egangiS gnitekraM tnenamreP\yreve\sprg:G" this also works perfectly, so it's the REVERSE function that's the problem. Even though Aginity Workbench highlights the REVERSE function as if it exists, it doesn't seem to work at all - or if there is a way to make it work, I can't figure it out. I've already tried using CAST as suggested by the error message but it makes no difference.
Is there a way to reverse a string in Netezza? Or failing that, is there any other way of accomplishing what I want to do without reversing the string?

I was able to figure out how to do this in Netezza without using a REVERSE function like so:
SUBSTRING( bit_map_ID, INSTR( bit_map_ID, '\', -1 ) + 1 )
The key is to use the INSTR function and specify the third argument as -1 so that it will look for the first instance starting from the end of the string instead of the beginning of the string. No reversing needed.
While this works for my needs, I would definitely be open for alternative answers for the question I posed!

To my knowledge, the REVERSE function does not exist on netezza, and that is indeed what the error message above says, so I can confirm that the solution you provided is the way to go.
Alternative solutions would be to use a regular expression function or a string split.
To my knowledge MSsql server has none of those 3 solutions available, and the real issue for you is probably that the SQL standard does not include a list of functions needed to be compliant, so each database has its own take on which functions to include and what their interface is (negative arguments to instr in not universally accepted)

Related

How to dynamically format BigQuery `dataset.schema.table name` with backticks

I need to work through how to take stored procedure functions from
region-us.INFORMATION_SCHEMA.ROUTINES
and modify the backticks that default to coming through around the project and place them around the dataset.schema.table()
The reason is more for uniform results across our system than a technical error need.
currently when I run this query
SELECT
replace(ddl, 'CREATE PROC', 'CREATE OR REPLACE PROC'),
FROM region-us.INFORMATION_SCHEMA.ROUTINES
where lower(routine_type) = 'procedure'
It will return the below:
`project-data-sandbox`.schema.MySP()
`project-data-sandbox`.schema.YourSP(MySP)
`project-data-sandbox`.inv.partnumber(orderid)
`project-data-sandbox`.inv_part.part_number(part_id)
I have tried the below query
SELECT
REGEXP_REPLACE(ddl, r"project-data-sandbox`.", "project-data-sandbox.") AS replaced_word
, REGEXP_REPLACE(ddl, r'`([a-zA-Z]+(-[a-zA-Z]+)+)`\.[a-zA-Z]+\.[a-zA-Z]+\(\)','Apples') tester
FROM region-us.INFORMATION_SCHEMA.ROUTINES
where lower(routine_type) = 'procedure'
I get part of what I want. However, the problem is our stored procedures can be named any sort of names and they could require objects to be passed to them.
I added the tester column to see if I could replace the project string with another word (or regex) but it isn't even replacing it with apples yet.
which I would want turned into this:
`project-data-sandbox.schema.MySP`()
`project-data-sandbox.schema.YourSP`(MySP)
`project-data-sandbox.inv.partnumber`(orderid)
`project-data-sandbox.inv_part.part_number`(part_id)
I'm working through Regexp_replace but I'm having difficulty figuring out how to get the backtick between the parenthesis and the last letter.
Thanks for any help!

How to remove several symbols from a string in BiqQuery

I have strings which contain numbers like that:
a20cdac0_19221bdc12022bab3fe05a43df4a7dbe
I need to get only symbols after underscore symbol:
19221bdc12022bab3fe05a43df4a7dbe
Unfortunately, the amount of those symbols is always different, so I can't use just RIGHT function.
I know that probably REGEXP might help, but I can't understand how to use that exactly. Will be very grateful for the help.
Below is for BigQuery Standard SQL (using regexp)
regexp_extract(value, r'_(.*)') regexp_approach
if to apply to sample value from your question
regexp_extract('a20cdac0_19221bdc12022bab3fe05a43df4a7dbe', r'_(.*)') regexp_approach
result is
Yet, another regexp option is to use regexp_replace as in below example
regexp_replace(value, r'^.*?_', '')
Note: using split in this case is also an option unless you have more than one _ in which case you will get part between first and second _
split(value, '_')[safe_offset(1)]
Also, as you can see you need to use safe to prevent error in cases when _ is absent
You can use the split function like this
select split('a20cdac0_19221bdc12022bab3fe05a43df4a7dbe','_')[ORDINAL(2)];
https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#split

Dynamic, Nested Replace

I'm using SQL Server 2008 and need to strip out quite a bit of data within a string. Because of the nature and variability of the string, I think I'm needing to use multiple, nested REPLACE commands. The problem is each REPLACE needs to build on the previous one. Here is a sample of what I'm looking at:
<Paragraph><Replacement Id="40B"><Le><Run Foreground="#FFFF0000">Treatment by </Run></Le><Op isFreeText="True"><Run Foreground="#FFFF0000">test</Run></Op><Tr><Run Foreground="#FFFF0000">. </Run></Tr></Replacement></Paragraph>
Essentially, I need it to return just the text outside of the <> brackets so for this example it would be:
Treatment by test.
Also, I wanted to mention that the strings inside the <> brackets can vary quite a bit for each row both by content and length, but it isn't relevant for what I'm needing other than making it more complex for replacing.
Here is what I've tried:
REPLACE(note,substring(note,patindex('<%>',note),CHARINDEX('>',note) - CHARINDEX('<',note) + 1),'')
And it returns:
<Replacement Id="40B"><Le><Run Foreground="#FFFF0000">Treatment by </Run></Le><Op isFreeText="True"><Run Foreground="#FFFF0000">test</Run></Op><Tr><Run Foreground="#FFFF0000">. </Run></Tr></Replacement></Paragraph>
Somehow I need to keep going with replacing each of the <> brackets but don't know how to proceed. Any help or guidance would be greatly appreciated!!!
Depending on how you have that string holding the HTML fragment available you could try to use something like:
SELECT convert(xml, '<Paragraph><Replacement Id="40B"><Le><Run Foreground="#FFFF0000">Treatment by </Run></Le><Op isFreeText="True"><Run Foreground="#FFFF0000">test</Run></Op><Tr><Run Foreground="#FFFF0000">. </Run></Tr></Replacement></Paragraph>').value('/', 'varchar(255)') as stripped
You convert it to XML and then use the built in xml parser function "value".

Matching multiple variations of input to one sql row

I would like to know after much searching how I would match different variations of input to one sql row using standard TSQL. Here is the scenario:
I have in my sql row the following text: I love
I then have the following 3 inputs all of which should return a match to this row:
"I want to tell you we all love StackOverflow"
"I'm totally in love with StackOverflow"
"I really love StackOverflow"
As you can see I have bolded the reason for the match to try and make it clearer to you why they match. The I in I'm is deliberately matched too so it would be good if we could include that in matches.
I thought about splitting the input string which I done using the following TSQL:
-- Create a space delimited string for testing
declare #str varchar(max)
select #str = 'I want to tell you we all love StackOverflow'
-- XML tag the string by replacing spaces with </x><x> tags
declare #xml xml
select #xml = cast('<x><![CDATA['+ replace(#str,' ',']]></x><x><![CDATA[') + ']]></x>' as xml)
-- Finally select values from nodes <x> and trim at the same time
select ltrim(rtrim(mynode.value('.[1]', 'nvarchar(12)'))) as Code
from (select #xml doc) xx
cross apply doc.nodes('/x') (mynode)
This gets me all the words as separate rows but then I could not work out how to do the query for matching these.
Therefore any help from this point or any alternate ways of matching as required would be more than greatly appreciated!
UPDATE:
#freefaller pointed me to the RegEx route and creating a function I have been able to get a bit further forward, therefore +1 #freefaller, however I now need to know how I can get it to look at all my table rows rather than the hard-coded input of 'I love' I now have the following select statements:
SELECT * FROM dbo.FindWordsInContext('i love','I want to tell you we all love StackOverflow',30)
SELECT * FROM dbo.FindWordsInContext('i love','I''m totally in love with StackOverflow',30)
SELECT * FROM dbo.FindWordsInContext('i love','I really love StackOverflow',30)
The above returns me the number of times matched and the context of the string matched, therefore the first select above returns:
Hits Context
1 ...I want to tell you we all love StackOv...
So based on the fact we now have the above can anyone tell me how to make this function look at all of the rows for matches and then return the row/rows that have a match?
One option would be to use Regular Expressions via SQLCLR objects as explained here.
I have never myself created SQLCLR objects, so cannot comment on the ease of this method. I am however, a great fan of Regular Expressions and would recommend their use for most text search / manipulation
Edit: In response to the comment, I have no experience of SQLCLR, but assuming you get that working, something like the following simple untested TSQL might work...
SELECT *
FROM mytable
WHERE dbo.RegexMatch(#search, REPLACE(myfield, ' ', '.*?')) = 1
I have managed to come up with an answer to my own question so thought I thought I would post here in case anyone else has similar requirements in the future. Basically it relies upon the SQL-CLR regular expression functionality and runs with minimal impact to performance.
Firstly enable SQL-CLR on your server if not already available (you need to be sysadmin):
--Enables CLR Integration
exec sp_configure 'clr enabled', 1
GO
RECONFIGURE
GO
Then you will need to create the assembly in SQL (Don't forget to change your path from D:\SqlRegEx.dll and use SAFE permission set as this is the most restrictive and safest set of permissions but won't go into detail here.) :
CREATE ASSEMBLY [SqlRegEx] FROM 'D:\SqlRegEx.dll' WITH PERMISSION_SET = SAFE
Now create the actual function you will call:
CREATE FUNCTION [dbo].[RegexMatch]
(#Input NVARCHAR(MAX), #Pattern NVARCHAR(MAX), #IgnoreCase BIT)
RETURNS BIT
AS EXTERNAL NAME SqlRegEx.[SqlClrTools.SqlRegEx].RegExMatch
Finally and to complete and answer my own question we can then run the following TSQL:
SELECT *
FROM your_table
WHERE dbo.RegexMatch(#search, REPLACE(your_field, ' ', '.*?'), 1) = 1
SELECT *
FROM your_table
WHERE dbo.RegexMatch(#search, REPLACE(REVERSE(your_field), ' ', '.*?'), 1) = 1
I hope this will help someone in what should be a simple search option in the future.

T-SQL: checking for email format

I have this scenario where I need data integrity in the physical database. For example, I have a variable of #email_address VARCHAR(200) and I want to check if the value of #email_address is of email format. Anyone has any idea how to check format in T-SQL?
Many thanks!
I tested the following query with many different wrong and valid email addresses. It should do the job.
IF (
CHARINDEX(' ',LTRIM(RTRIM(#email_address))) = 0
AND LEFT(LTRIM(#email_address),1) <> '#'
AND RIGHT(RTRIM(#email_address),1) <> '.'
AND CHARINDEX('.',#email_address ,CHARINDEX('#',#email_address)) - CHARINDEX('#',#email_address ) > 1
AND LEN(LTRIM(RTRIM(#email_address ))) - LEN(REPLACE(LTRIM(RTRIM(#email_address)),'#','')) = 1
AND CHARINDEX('.',REVERSE(LTRIM(RTRIM(#email_address)))) >= 3
AND (CHARINDEX('.#',#email_address ) = 0 AND CHARINDEX('..',#email_address ) = 0)
)
print 'valid email address'
ELSE
print 'not valid'
It checks these conditions:
No embedded spaces
'#' can't be the first character of an email address
'.' can't be the last character of an email address
There must be a '.' somewhere after '#'
the '#' sign is allowed
Domain name should end with at least 2 character extension
can't have patterns like '.#' and '..'
AFAIK there is no good way to do this.
The email format standard is so complex parsers have been known to run to thousands of lines of code, but even if you were to use a simpler form which would fail some obscure but valid addresses you'd have to do it without regular expressions which are not natively supported by T-SQL (again, I'm not 100% on that), leaving you with a simple fallback of somethign like:
LIKE '%_#_%_.__%'
..or similar.
My feeling is generally that you shouln't be doing this at the last possible moment though (as you insert into a DB) you should be doing it at the first opportunity and/or a common gateway (the controller which actually makes the SQL insert request), where incidentally you would have the advantage of regex, and possibly even a library which does the "real" validation for you.
If you use SQL 2005 or 2008 you might want to look at writing CLR stored proceudues and use the .NET regex engine like this. If you're using SQL 2000 or earlier you can use the VBScript scripting engine's regular expression like ths. You could also use an extended stored procedure like this
There is no easy way to do it in T-SQL, I am afraid. To validate all the varieties of email address allowed byRFC 2822 you will need to use a regular expression.
More info here.
You will need to define your scope, if you want to simplify it.