How to search an image/varbinary field for records that start with a binary pattern


I am trying to find all images that do not start with the magic number ff d8 ff e0 (the signature for jpg). According to MSDN, I should be able to use patindex on my data. However,
SELECT TOP 1000 [cpclid]
FROM [cp]
where patindex('FFD8FFE0%', cpphoto) = 0 -- cpphoto is a column type of image
gives me the error
Msg 8116, Level 16, State 1, Line 1
Argument data type image is invalid for argument 2 of patindex function.
What would be the correct way to find records that do not match the magic number of ff d8 ff e0?
UPDATE:
Here is a link to test any suggestions you have.
Ross's solution worked in the end, with some tweaking of the query.
SELECT [cpclid]
FROM [cp]
where convert(varchar(max), cast(cpphoto as varbinary(max))) not like convert(varchar(max), 0xFFD8FFE0 ) + '%'
I found an even better solution; see my answer.

I found a much simpler solution that runs a lot faster.
SELECT [cpclid]
FROM [cp]
where cast(cpphoto as varbinary(4)) <> 0xFFD8FFE0
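The comparison above only needs the first four bytes of each blob. Here is a minimal sketch of the same check, using Python's built-in sqlite3 module as a stand-in for SQL Server. This is an assumption for illustration: SQLite has no image type, so substr() on a BLOB plays the role of cast(... as varbinary(4)), and the two sample rows are made up.

```python
import sqlite3

JPEG_MAGIC = bytes.fromhex("FFD8FFE0")  # signature from the question

# Hypothetical sample data: one JPEG-like blob and one PNG-like blob.
rows = [
    (1, JPEG_MAGIC + b"\x00\x10JFIF"),
    (2, bytes.fromhex("89504E47") + b"\r\n\x1a\n"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cp (cpclid INTEGER PRIMARY KEY, cpphoto BLOB)")
conn.executemany("INSERT INTO cp (cpclid, cpphoto) VALUES (?, ?)", rows)

# substr() on a BLOB counts bytes, so only the 4-byte prefix is compared.
non_jpeg_ids = [
    row[0]
    for row in conn.execute(
        "SELECT cpclid FROM cp WHERE substr(cpphoto, 1, 4) <> ? ORDER BY cpclid",
        (JPEG_MAGIC,),
    )
]
print(non_jpeg_ids)
```

Comparing a fixed-length binary prefix avoids converting the whole blob to a string, which is probably why the cast to varbinary(4) ran faster than the LIKE version.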

Why are you still using the IMAGE data type? It has been deprecated in favor of VARBINARY(MAX)... if you convert your column to VARBINARY(MAX) I think you'll find it a lot easier to work with.
EDIT
In SQL Server 2008 you can use a much easier convert:
SELECT CONVERT(VARCHAR(MAX), CONVERT(VARBINARY(MAX), cpphoto), 2) FROM cp;
In fact this worked just fine on your StackExchange query (I suspect the back end is not using SQL Server 2005).
But I'm glad my answer was so useless to you. Noted to self.

Use cpphoto not like 'FFD8FFE0%' in your where clause.
Cast cpphoto to varchar(max) if it is not a string already.

Related

REVERSE function in Netezza not working, how to extract file name from path without it?

My company has production and testing databases in SQL Server and a data warehouse in IBM Netezza. I wrote a query in SQL Server and now need to convert it for use in the data warehouse; however, I am running into a problem.
A crucial part of the query is extracting a file name from a path, and in SQL Server I use this:
RIGHT( BitmapID, CHARINDEX( '\', REVERSE( BitmapID ) + '\' ) - 1 )
This turns "G:\grps\every\Permanent Marketing Signage\SPC\BRD\BLAD\BCAG_BLAD_001.png" to "BCAG_BLAD_001.png" and it works perfectly. I tried to convert this to Netezza syntax like so:
SUBSTRING(bit_map_ID, LENGTH(bit_map_ID) - ( STRPOS( REVERSE( bit_map_ID ), '\' ) + 2 ) )
However, when I run this, I get an error:
ERROR [42S02] ERROR: Function 'REVERSE(VARCHAR)' does not exist
Unable to identify a function that satisfies the given argument types
You may need to add explicit typecasts
When I replace REVERSE( bit_map_ID ) with a reversed string example like "gnp.100_DALB_GACB\DALB\DRB\CPS\egangiS gnitekraM tnenamreP\yreve\sprg:G" this also works perfectly, so it's the REVERSE function that's the problem. Even though Aginity Workbench highlights the REVERSE function as if it exists, it doesn't seem to work at all - or if there is a way to make it work, I can't figure it out. I've already tried using CAST as suggested by the error message but it makes no difference.
Is there a way to reverse a string in Netezza? Or failing that, is there any other way of accomplishing what I want to do without reversing the string?

I was able to figure out how to do this in Netezza without using a REVERSE function like so:
SUBSTRING( bit_map_ID, INSTR( bit_map_ID, '\', -1 ) + 1 )
The key is to use the INSTR function and specify the third argument as -1 so that it will look for the first instance starting from the end of the string instead of the beginning of the string. No reversing needed.
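The same "search from the end" idea can be sketched in Python, where str.rfind plays the role of INSTR with -1 as the third argument. The helper name file_name is made up for illustration; the path is the example from the question.

```python
def file_name(path: str, sep: str = "\\") -> str:
    """Return everything after the last separator in path."""
    # rfind returns -1 when sep is absent, so +1 gives 0 and the whole string is kept.
    return path[path.rfind(sep) + 1:]


example = r"G:\grps\every\Permanent Marketing Signage\SPC\BRD\BLAD\BCAG_BLAD_001.png"
print(file_name(example))
```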
While this works for my needs, I would definitely be open for alternative answers for the question I posed!

To my knowledge, the REVERSE function does not exist on Netezza, and that is indeed what the error message above says, so I can confirm that the solution you provided is the way to go.
Alternative solutions would be to use a regular expression function or a string split.
To my knowledge, MS SQL Server has none of those three solutions available. The real issue for you is probably that the SQL standard does not include a list of functions needed to be compliant, so each database has its own take on which functions to include and what their interface is (negative arguments to INSTR are not universally accepted).

SQL Server : extracting the Middle Characters without CHARINDEX

To start, I have seen the CHARINDEX results on here but none of them seem to be working for my case. The reasons are either a, CHARINDEX can't help me, or b, I am not understanding how CHARINDEX works. That being said, I would like to ask my question here in hopes that I can get some clarification on both how to solve my issue and CHARINDEX if that so happens to be the way this question is answered.
The variable that I am trying to extract from has varying length. However, two things are always constant.
There is always a '/' as the 16th character in the string
The last character in the string is always '0' OR '1'
What I am trying to do is extract the name from between '/' and '0' or '1'. In short, I want to chop off the first 16 characters and the last character of every string. So far, this is what I have:
SELECT
SUBSTRING([string_name], 17, LEN([string_name]) - 1) AS 'username'
FROM
[table_name]
The results I get still contain the 0 OR 1 at the end. What I need to do is somehow remove that 0 from the string. It is important to note that the number of characters between '/' and '0' are always different.
Current results:
gordon0
grant0
greg0
guy1
hanying0
Desired results:
gordon
grant
greg
guy
hanying
Any advice here would be wonderful.
Please let me know if you need any additional information from me. If possible, would like to maintain using either SUBSTRING, LEFT or RIGHT.
Thanks

Adjusting the length would seem to address your problem:
SELECT SUBSTRING([string_name], 17, LEN([string_name])-17) AS username
FROM [table_name]
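The reason this works is that the third argument of SUBSTRING is a length, not an end position. A string with a 16-character prefix, an n-character name, and one trailing digit has LEN = 16 + n + 1, so n = LEN - 17. Here is a small Python sketch of that arithmetic. The helper tsql_substring only mimics SUBSTRING for simple cases, and the 16-character prefix is a made-up placeholder, since the question does not show the real one.

```python
def tsql_substring(s: str, start: int, length: int) -> str:
    """Mimic SUBSTRING(s, start, length) for start >= 1 and length >= 0."""
    return s[start - 1:start - 1 + length]


prefix = "corp.example.co/"  # hypothetical 16-character prefix ending in '/'
value = prefix + "gordon" + "0"

original = tsql_substring(value, 17, len(value) - 1)   # length used in the question
adjusted = tsql_substring(value, 17, len(value) - 17)  # length used in the answer
print(original, adjusted)
```

With the question's length, the requested span runs past the end of the string, so everything from position 17 onward comes back, including the trailing digit.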

Dynamic, Nested Replace

I'm using SQL Server 2008 and need to strip out quite a bit of data within a string. Because of the nature and variability of the string, I think I'm needing to use multiple, nested REPLACE commands. The problem is each REPLACE needs to build on the previous one. Here is a sample of what I'm looking at:
<Paragraph><Replacement Id="40B"><Le><Run Foreground="#FFFF0000">Treatment by </Run></Le><Op isFreeText="True"><Run Foreground="#FFFF0000">test</Run></Op><Tr><Run Foreground="#FFFF0000">. </Run></Tr></Replacement></Paragraph>
Essentially, I need it to return just the text outside of the <> brackets so for this example it would be:
Treatment by test.
Also, I wanted to mention that the strings inside the <> brackets can vary quite a bit for each row both by content and length, but it isn't relevant for what I'm needing other than making it more complex for replacing.
Here is what I've tried:
REPLACE(note,substring(note,patindex('<%>',note),CHARINDEX('>',note) - CHARINDEX('<',note) + 1),'')
And it returns:
<Replacement Id="40B"><Le><Run Foreground="#FFFF0000">Treatment by </Run></Le><Op isFreeText="True"><Run Foreground="#FFFF0000">test</Run></Op><Tr><Run Foreground="#FFFF0000">. </Run></Tr></Replacement></Paragraph>
Somehow I need to keep going with replacing each of the <> brackets but don't know how to proceed. Any help or guidance would be greatly appreciated!!!

Depending on how you have that string holding the HTML fragment available you could try to use something like:
SELECT convert(xml, '<Paragraph><Replacement Id="40B"><Le><Run Foreground="#FFFF0000">Treatment by </Run></Le><Op isFreeText="True"><Run Foreground="#FFFF0000">test</Run></Op><Tr><Run Foreground="#FFFF0000">. </Run></Tr></Replacement></Paragraph>').value('/', 'varchar(255)') as stripped
You convert it to XML and then use the built in xml parser function "value".
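To illustrate what the parser approach is doing, here is a rough Python equivalent that uses the standard library's xml.etree.ElementTree instead of SQL Server's xml type. It assumes, as the SQL version does, that the stored string is well-formed XML. The variable names are made up for this sketch.

```python
import xml.etree.ElementTree as ET

note = (
    '<Paragraph><Replacement Id="40B"><Le><Run Foreground="#FFFF0000">Treatment by </Run></Le>'
    '<Op isFreeText="True"><Run Foreground="#FFFF0000">test</Run></Op>'
    '<Tr><Run Foreground="#FFFF0000">. </Run></Tr></Replacement></Paragraph>'
)

# itertext() walks the tree in document order and yields only the text nodes,
# so joining them drops every tag without any nested REPLACE calls.
stripped = "".join(ET.fromstring(note).itertext())
print(repr(stripped))
```

Letting a parser handle the tags sidesteps the problem in the question, where each REPLACE had to build on the previous one.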

Matching multiple variations of input to one sql row

I would like to know after much searching how I would match different variations of input to one sql row using standard TSQL. Here is the scenario:
I have in my sql row the following text: I love
I then have the following 3 inputs all of which should return a match to this row:
"I want to tell you we all love StackOverflow"
"I'm totally in love with StackOverflow"
"I really love StackOverflow"
As you can see I have bolded the reason for the match to try and make it clearer to you why they match. The I in I'm is deliberately matched too so it would be good if we could include that in matches.
I thought about splitting the input string, which I did using the following TSQL:
-- Create a space delimited string for testing
declare @str varchar(max)
select @str = 'I want to tell you we all love StackOverflow'
-- XML tag the string by replacing spaces with </x><x> tags
declare @xml xml
select @xml = cast('<x><![CDATA['+ replace(@str,' ',']]></x><x><![CDATA[') + ']]></x>' as xml)
-- Finally select values from nodes <x> and trim at the same time
select ltrim(rtrim(mynode.value('.[1]', 'nvarchar(12)'))) as Code
from (select @xml doc) xx
cross apply doc.nodes('/x') (mynode)
This gets me all the words as separate rows but then I could not work out how to do the query for matching these.
Therefore any help from this point or any alternate ways of matching as required would be more than greatly appreciated!
UPDATE:
@freefaller pointed me to the RegEx route, and by creating a function I have been able to get a bit further forward, so +1 @freefaller. However, I now need to know how I can get it to look at all my table rows rather than the hard-coded input of 'I love'. I now have the following select statements:
SELECT * FROM dbo.FindWordsInContext('i love','I want to tell you we all love StackOverflow',30)
SELECT * FROM dbo.FindWordsInContext('i love','I''m totally in love with StackOverflow',30)
SELECT * FROM dbo.FindWordsInContext('i love','I really love StackOverflow',30)
The above returns me the number of times matched and the context of the string matched, therefore the first select above returns:
Hits Context
1 ...I want to tell you we all love StackOv...
So based on the fact we now have the above can anyone tell me how to make this function look at all of the rows for matches and then return the row/rows that have a match?

One option would be to use Regular Expressions via SQLCLR objects as explained here.
I have never created SQLCLR objects myself, so I cannot comment on the ease of this method. I am, however, a great fan of Regular Expressions and would recommend their use for most text search / manipulation.
Edit: In response to the comment, I have no experience of SQLCLR, but assuming you get that working, something like the following simple untested TSQL might work...
SELECT *
FROM mytable
WHERE dbo.RegexMatch(@search, REPLACE(myfield, ' ', '.*?')) = 1

I have managed to come up with an answer to my own question, so I thought I would post it here in case anyone else has similar requirements in the future. Basically it relies upon the SQL-CLR regular expression functionality and runs with minimal impact on performance.
Firstly enable SQL-CLR on your server if not already available (you need to be sysadmin):
--Enables CLR Integration
exec sp_configure 'clr enabled', 1
GO
RECONFIGURE
GO
Then you will need to create the assembly in SQL. (Don't forget to change the path from D:\SqlRegEx.dll. Use the SAFE permission set, as this is the most restrictive and safest set of permissions; I won't go into detail here.)
CREATE ASSEMBLY [SqlRegEx] FROM 'D:\SqlRegEx.dll' WITH PERMISSION_SET = SAFE
Now create the actual function you will call:
CREATE FUNCTION [dbo].[RegexMatch]
(@Input NVARCHAR(MAX), @Pattern NVARCHAR(MAX), @IgnoreCase BIT)
RETURNS BIT
AS EXTERNAL NAME SqlRegEx.[SqlClrTools.SqlRegEx].RegExMatch
Finally, to answer my own question, we can then run the following TSQL:
SELECT *
FROM your_table
WHERE dbo.RegexMatch(@search, REPLACE(your_field, ' ', '.*?'), 1) = 1
SELECT *
FROM your_table
WHERE dbo.RegexMatch(@search, REPLACE(REVERSE(your_field), ' ', '.*?'), 1) = 1
I hope this will help someone in what should be a simple search option in the future.
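The pattern-building trick in the first query can be tried outside the database. Below is a small sketch using Python's re module as a stand-in for the SQL-CLR function. regex_match is a hypothetical helper that mirrors the (input, pattern, ignore case) argument order used above, and the stored text and inputs are the ones from the question.

```python
import re


def regex_match(text: str, pattern: str, ignore_case: bool = True) -> bool:
    """Return True when pattern is found anywhere in text."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(pattern, text, flags) is not None


stored = "I love"  # the row text from the question
# Same idea as REPLACE(your_field, ' ', '.*?'): allow anything between the words.
# Note: regex metacharacters in the stored text are not escaped in this sketch.
pattern = stored.replace(" ", ".*?")

inputs = [
    "I want to tell you we all love StackOverflow",
    "I'm totally in love with StackOverflow",
    "I really love StackOverflow",
]
results = [regex_match(text, pattern) for text in inputs]
print(pattern, results)
```

Because the stored row becomes the pattern and the user input becomes the text being searched, one query can test every row against a single input.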

inserting number into oracle sql - using jython

I have this insert command where I am trying to insert a number taken from a loop:
i=0
for line in column:
myStmt.executeQuery("INSERT INTO REVERSE_COL
( TABLE_NAME,COL_NAME,POS) values
(,'test','"+column[i]+"','"+i+"'")
i=i+1
POS is a NUMBER datatype.
But it works if I hard-code it as 1:
i=0
for line in column:
myStmt.executeQuery("INSERT INTO REVERSE_COL
( TABLE_NAME,COL_NAME,POS) values
(,'test','"+column[i]+"',1")
I have tried plain i, +i+, and other methods, but it's not working. Any suggestions on how to solve this?
Thanks, everyone.

I have no jython experience, but I will still try to offer my personal approach and advice. Take from it what you will.
The first thing that I would look into, and perhaps this is something someone else knows offhand, is the way that a number is concatenated to the string. I'm speaking from a C++ background here, but a number i may well be converted to the ASCII character representing that value, and not necessarily the character that you intend.
For example, if i is 9, it may be placing a TAB into the string and not the number 9, which would be an ASCII value 57.
Again, I'm not telling you this IS the answer...but it's the first thing that pops into my mind. Good luck!
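For what it's worth, Python (and therefore Jython) does not convert a number silently when it is joined to a string with +; it raises a TypeError, so the counter needs str(i) or, better, a bound parameter. Here is a minimal sketch of the parameterized form. It uses Python's built-in sqlite3 module as a stand-in for the Oracle connection, which is an assumption for illustration; under Jython with JDBC, the analogous tool would be a java.sql.PreparedStatement. The column names in the list are made up.

```python
import sqlite3

column = ["ID", "NAME", "CREATED"]  # hypothetical column names

# Mixing str and int with + is an error in Python; nothing is converted silently.
try:
    "POS=" + 1
    concat_error = None
except TypeError as exc:
    concat_error = exc

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE REVERSE_COL (TABLE_NAME TEXT, COL_NAME TEXT, POS INTEGER)")

# enumerate supplies the counter; placeholders let the driver handle types and quoting.
for i, name in enumerate(column):
    conn.execute(
        "INSERT INTO REVERSE_COL (TABLE_NAME, COL_NAME, POS) VALUES (?, ?, ?)",
        ("test", name, i),
    )

inserted = conn.execute(
    "SELECT COL_NAME, POS FROM REVERSE_COL ORDER BY POS"
).fetchall()
print(inserted)
```

Binding the values also removes the need to build quotes by hand, which is where the stray comma and missing parenthesis in the original statement came from.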
