Substring depending on Delimiter in the string - sql

I am writing a procedure in order to handle a file name in SSIS.
Overview:
I am capturing the file name during a Text file load process in SSIS. I have written a procedure in order to split this file name into different components and return the values in form of Variables which I would be using further down the SSIS package.
Problem
The file name has the format @FileName = "FILE_DATE_REF_DATETIME". All I need is to split it into "FILE" and "DATE". I am able to achieve this by using
SUBSTRING(@Filename,0,CHARINDEX('_',@FileName))
and
Substring(@FileName,CHARINDEX('_',@FileName)+1,CHARINDEX('_',SUBSTRING(@Filename,CHARINDEX('_',@Filename)+1,Len(@Filename)))-1)
But the major problem is that when FILE contains an additional '_', this goes completely wrong. Can anyone please suggest a way to split the above file name format into FILE and DATE?
EDIT
Samples of FileNames:
asdfkg_20140710_ets20140710_0525_theds
asdjjf_they_20140710_ets20140710_0525_theds
oiuth_theyb_wgb_20140710_ets20140710_0526_theds
I need to extract anything before the 20140710 and also 20140710.

You can do it using PATINDEX instead of CHARINDEX
select SUBSTRING(@Filename,0,PATINDEX('%[_][0-9]%',@FileName))

If the last part of the filename is more reliable (no unexpected underscores), you could REVERSE it and use CHARINDEX to find the fourth underscore (and reverse the substrings again afterwards).
Otherwise, if you can trust the date format, you can use PATINDEX with a horrible expression like
'%[_][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][_]%'
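As a rough sketch of the PATINDEX idea, the query below applies the eight-digit pattern to the three sample file names from the question. The table variable @names and the column aliases FilePart and DatePart are made up for illustration, and the sketch assumes the date is always the first run of exactly eight digits enclosed by underscores.

```sql
-- Hypothetical holder for the sample names from the question
declare @names table (FileName varchar(100));
insert into @names values
    ('asdfkg_20140710_ets20140710_0525_theds'),
    ('asdjjf_they_20140710_ets20140710_0525_theds'),
    ('oiuth_theyb_wgb_20140710_ets20140710_0526_theds');

select n.FileName,
       -- everything before the underscore that precedes the date;
       -- nullif turns "no match" (0) into NULL instead of a negative length
       left(n.FileName, nullif(p.pos, 0) - 1)           as FilePart,
       -- the eight digits that follow that underscore
       substring(n.FileName, nullif(p.pos, 0) + 1, 8)   as DatePart
from @names as n
cross apply (
    select patindex('%[_][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][_]%', n.FileName) as pos
) as p;
```

For the samples this should give asdfkg, asdjjf_they and oiuth_theyb_wgb as FilePart, each with 20140710 as DatePart. Names without a matching date come back as NULL rather than raising an error.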

Not the prettiest but it will work. LOL
DECLARE @FILE VARCHAR(55) = 'FILE_DATE_REF_DATETIME'
DECLARE @FILEFUN AS VARCHAR(55) = LEFT(@FILE,CHARINDEX('_',@FILE))
DECLARE @FILENAMEOUTPUT AS TABLE(Name Varchar(55))
WHILE LEN(@FILE) > 0
BEGIN
    INSERT INTO @FILENAMEOUTPUT
    SELECT REPLACE(@FILEFUN,'_','')
    SET @FILE = REPLACE(@FILE,@FILEFUN,'')
    SET @FILEFUN = iif(CHARINDEX('_',@FILE)=0,@FILE,LEFT(@FILE,CHARINDEX('_',@FILE)))
END
SELECT * FROM @FILENAMEOUTPUT

Related

How to replace a backslash with a double backslash without replacing existing double backslashes in SQL

I have been provided a path from an external source I have no control over and need to store the file path in my SQL Server database.
The file path will appear similar to the below;
C:\\Users\\Temp\filepath\test\document.txt
I need to store these with all double backslashes as such
C:\\Users\\Temp\\filepath\\test\\document.txt
What is the correct way to replace \ with \\ without turning the string into this
C:\\\\Users\\\\Temp\\filepath\\test\\document.txt
with a REPLACE call?
Here is a simple technique Gordon Linoff demonstrated some time ago. (can't recall the original post)
It will handle any number of repeating characters. In this case \
In short, it expands, eliminates and finally normalizes.
Example
Declare @S varchar(150) = 'C:\\Users\\Temp\filepath\test\document.txt'
Select replace(replace(replace(@S,'\','†‡'),'‡†',''),'†‡','\\')
Results
C:\\Users\\Temp\\filepath\\test\\document.txt
You can convert double to single and then single to double.
DECLARE @Path VARCHAR(100) = 'C:\\Users\\Temp\filepath\test\document.txt'
SELECT Replace(Replace(@Path, '\\', '\'), '\', '\\')
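The two approaches agree on the sample path but differ once a run of three or more backslashes appears. The sketch below uses a made-up path containing a triple backslash, and swaps in < and > as placeholder characters on the assumption that they cannot occur in a Windows path.

```sql
-- Hypothetical input: a run of three backslashes, then a single one
declare @p varchar(100) = 'C:\\\Temp\file.txt';

-- Expand, eliminate, normalize: any run of backslashes collapses to exactly two
select replace(replace(replace(@p, '\', '<>'), '><', ''), '<>', '\\') as Collapsed;
-- C:\\Temp\\file.txt

-- Double-to-single, then single-to-double: a run of three becomes four
select replace(replace(@p, '\\', '\'), '\', '\\') as Halved;
-- C:\\\\Temp\\file.txt
```

Which result is preferable depends on whether odd-length runs can occur in the incoming paths and what they are supposed to mean.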

SQL regex FIND & Replace everything before a word, phrase, or path

I'm using sql server mgmt studio. I'm trying to do an UPDATE query along with a REPLACE using a regex to strip off internal pathing. It doesn't seem to be working right. Is there some other way I need to be invoking regex in SQL?
UPDATE dbo.Table
SET Path = REPLACE(Path , '.+?(?=Data)', '')
I wanted to basically go from
\\somepath\anotherpath\Data\file.txt to Data\File.txt
There is going to be variations on the paths so I'm trying to use regex to remove all characters before the word Data\
My regular expression is ".+?(?=Data)", which seems to be working fine in Textpad, but not in SQL.
REPLACE in SQL Server matches its search string literally; it does not interpret regular expressions. This can be done using the substring and charindex functions.
UPDATE dbo.Table
SET Path = 'Data\' + SUBSTRING(path,CHARINDEX('\Data\',path)+len('\Data\'),len(path))
WHERE CHARINDEX('\Data\',path) > 0
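You could use reverse and charindex to do this:
UPDATE dbo.Table
SET path =
    case when path like '%\%\%'
         -- position of the second backslash in the reversed path, minus one,
         -- is the number of characters after the second-last backslash
         then right(path, charindex('\', reverse(path),
                                    charindex('\', reverse(path)) + 1
                                   ) - 1
                   )
         else path
    end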
This will find the second-last backslash, and take the characters that follow it. The case when is there to deal with paths that contain fewer than two backslashes.
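To try both ideas without touching a table, here is a small sketch against a single variable. The variable @path and the column aliases are illustrative only, and the sample value is the one from the question.

```sql
declare @path varchar(200) = '\\somepath\anotherpath\Data\file.txt';

-- Keep everything from the 'Data\' folder onward (path is left alone if '\Data\' is absent)
select case when charindex('\Data\', @path) > 0
            then substring(@path, charindex('\Data\', @path) + 1, len(@path))
            else @path
       end as FromDataFolder;

-- Keep everything after the second-last backslash
select case when @path like '%\%\%'
            then right(@path, charindex('\', reverse(@path),
                                        charindex('\', reverse(@path)) + 1) - 1)
            else @path
       end as LastTwoParts;
```

For this sample both queries return Data\file.txt. They diverge when the file sits deeper than one level below Data, where only the first keeps the Data\ prefix.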

SQL Server : Derive Multiple Rows from a text string contained in a database

I have a database that contains logging information. When a user uploads multiple files they show up as a text string in a record. I need to update another table with the names of the files that were uploaded.
In the below example, File1.txt and File2.txt are the file names:
PK Description
----------------------------------
1 Path: [Path]:\folder\sub Upload Method: Html5 Browser: IE 10.0 IP: 1.1.1.1 Files: Name: file1.txt Size: 313 KB Status: Completed Name: file2.txt Size: 444 KB Status: Completed Total Size: 758 KB Elapsed Time: 2 seconds Transfer Rate: 286 KB/second
I need to obtain and insert the file name in a new table ignoring the superfluous information so that it would appear like so:
PK Filename
-----------------------------------
1 file1.txt
2 file2.txt
Because different paths may be uploaded to, there is not a set number of characters that will be present before the first file. And although my example shows 2 files there could be more, so I need to continue parsing file names from the text, be there 1 or 10 or 50 of them. The file names are also not uniform, but all of them are preceded by "Name:".
My recommended broad approach
This is a pretty typical use-case for a user-defined table-valued function.
You essentially want to create a function that takes each value of your log Description as the main input parameter - probably also taking additional parameters to govern what the start and end of each interesting substring should be. (In your case, interesting substrings start after Name: and end just before Size:.)
The function extracts each interesting value and adds it to an accumulator table variable, which is then returned as the result of the function.
You can use such a function neatly over presumably-many rows of logging information, using the cross apply or outer apply operators, something like so:
select L.Description
,R.Filename
from dbo.uploadlogs as L
cross apply dbo.my_tv_function(L.Description,'%Name: %','% Size:%') as R;
This assumes the my_tv_function returns a column called Filename containing the split out filenames. (That's up to how you write the function.)
You could hard-code the patterns you want to search for into the function, but then it'd be less useful/transferrable to different styles of logging information.
For every Description, this will produce n rows in the result set corresponding to n files uploaded in that Description log.
Having got that it should be easy to add a new unique key column using row_number().
How to create such a user-defined function?
In a general sense, you're going to want to leverage two standard SQL functions:
Patindex: finds out where a particular pattern in a bigger string first starts.
Substring: slices a, well, a substring from a bigger string.
Combining these functions (or patindex's closely related charindex) is a very common way to get hold of a consistent bit of a string, when you don't know where exactly it'll start (or how long it'll go on for).
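As a minimal sketch of that combination, the snippet below pulls a single file name out of a shortened version of the log text. The variable names are illustrative, and the offset 6 is simply the length of the literal 'Name: '.

```sql
declare @s nvarchar(200) = N'Files: Name: file1.txt Size: 313 KB Status: Completed';

-- First character after 'Name: '
declare @start int = patindex(N'%Name: %', @s) + 6;
-- Position of the space that begins ' Size:'
declare @end int = patindex(N'% Size:%', @s);

select substring(@s, @start, @end - @start) as Filename;  -- file1.txt
```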
But this only gets me the first occurrence of the text I want!
This is where to bring in a while loop. Looping in SQL is both often-maligned, and often-misused. However, it's a useful language construct and situations like this, within functions, are exactly where looping is both appropriate and effective. To ensure the loop ends, you need to make the long string (the log Description) shorter on each time around, by cutting off the bit you've already found a filename in, and leaving everything beyond it.
There are other possible approaches without a while loop: in a general sense, this problem of "doing the same thing multiple times along a big string" can be solved recursively or iteratively, and a while loop is the iterative approach. In SQL, I prefer this approach.
Putting it all together
I'm not sure if you wanted a complete code solution or just guidance. If you want to figure the actual code out yourself, stop reading about now... :)
Here's a SQL function definition that will do what I described above:
create function dbo.fn_SplitSearch (
    @searchString nvarchar(max)
    ,@startPattern nvarchar(255)
    ,@endPattern nvarchar(255)
)
returns @fileList table (Filename nvarchar(255) not null)
begin
    /***
    This table-valued function will return all instances of text
    starting with @startPattern, and going up to the last character before
    @endPattern starts. This might include leading/trailing spaces depending
    on what you define as the patterns.
    ***/
    declare @foundValue nvarchar(255) =''
    declare @startLoc int =0
    declare @endLoc int =0
    while patindex(@startPattern,@searchString)<>0
    begin
        set @startLoc = patindex(@startPattern,@searchString)
        set @endLoc = patindex(@endPattern,@searchString)
        set @foundValue = substring(@searchString,@startLoc,@endLoc-@startLoc)
        insert into @fileList values (@foundValue)
        -- Next time round, only look in the remainder of the search string beyond the end of the first endPattern
        set @searchString = substring(@searchString,@endLoc+len(@endPattern),len(@searchString))
    end
    return
end;
This will actually output results like this:
Filename
---------
Name: file1.txt
Name: file2.txt
including the startPattern text in the output. For me this is a little more generic and it should be easy to trim off the Name: bit outside the function if you want. You could alternatively modify the function to only return the file1.txt part.
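Assuming the dbo.fn_SplitSearch function above has been created, a call that trims the prefix outside the function might look like the sketch below. The @log value is a shortened copy of the sample Description, and the PK numbering is illustrative only, since the order of rows coming out of the function is not guaranteed.

```sql
declare @log nvarchar(max) =
    N'Files: Name: file1.txt Size: 313 KB Status: Completed '
  + N'Name: file2.txt Size: 444 KB Status: Completed Total Size: 758 KB';

select row_number() over (order by (select null)) as PK  -- arbitrary numbering
      ,stuff(R.Filename, 1, 6, N'') as Filename          -- drop the 6 characters of 'Name: '
from dbo.fn_SplitSearch(@log, N'%Name: %', N'% Size:%') as R;
```

This should return file1.txt and file2.txt. A log entry that contains the start pattern without a following end pattern would need extra handling, which this sketch does not attempt.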
I would add a regex CLR assembly to my database and then use a regex match to extract the file names.

Pass varchar param to sql file via Batch file

I have a batch file which runs multiple sql files and passes parameters to them.
The input parameters are stored in a txt file which is read by batch file.
I am trying to pass varchar values to the IN clause of a SQL statement.
For example, my input file has this entry:
var1="'tommy','jim'"
In Batch file
<code to read file , assuming %1 has the var1 value>
set param=%~1
sqlplus %DBCONN% @%programFolder%\test.sql %param%
test.sql (name is varchar2)
select * from table where name in (&1);
This gives an error saying "invalid number", as it tries to run
select * from table where name in (tommy);
If I echo the parameter right before the SQL statement, it displays 'tommy','jim',
but in SQL it is removing jim and the single quotes ...
Please help!
Now I edited the entry in the input file to
var1="'''tommy''','''jim'''"
and it runs as select * from table where name in ('tommy');
But it truncates the second value.
Any clue how to include the comma?
Finally found a way
input file -
var1=('tommy','jim')
Remove the parentheses from the sql file
select * from table where name in &1;
This works. I have no idea why passing a comma was such an issue; there should have been some way to pass the comma!
If anyone finds out, please let me know.
Thanks
The tilde (~) strips quotes from the variable.
Try using var1='tommy','jim' in your input file and set param=%1 in your batch script.

Unable to replace Char(63) by SQL query

I am having some rows in table with some unusual character. When I use ascii() or unicode() for that character, it returns 63. But when I try this -
update MyTable
set MyColumn = replace(MyColumn,char(63),'')
it does not replace. The unusual character still exists after the replace function. Char(63) incidentally looks like a question mark.
For example my string is 'ddd#dd ddd' where # is my unusual character, and
select unicode('#')
returns 63. But this code
declare @str nvarchar(10) = 'ddd#dd ddd'
declare @char nchar(1) = char(unicode('#'))
set @str = replace(@str,@char,'')
is working!
Any ideas how to resolve this?
Additional information:
select ascii('�') returns 63, and so does select ascii('?'). Finally select char(63) returns ? and not the diamond-question-mark.
When this character is pasted into Excel or a text editor, it looks like a space, but in an SQL Server Query window (and, apparently, here on StackOverflow as well), it looks like a diamond containing a question mark.
Not only does char(63) look like a '?', it is actually a '?'.
(As a simple test, make sure num lock is on, hold down the alt key and type '63' on the number pad. You can have all sorts of fun this way: try alt-205, then alt-206 and alt-205 again: ═╬═)
It's possible, however, that the '?' you are seeing isn't a char(63), and is instead indicative of a character that SQL Server doesn't know how to display.
What do you get when you run:
select ascii(substring('[yourstring]',[pos],1));
--or
select unicode(substring('[yourstring]',[pos],1));
Where [yourstring] is your string and [pos] is the position of your char in the string
EDIT
From your comment it seems like it is a question mark. Have you tried:
replace(MyColumn,'?','')
EDIT2
Out of interest, what does the following do for you:
replace(replace(MyColumn,char(146),''),char(63),'')
char(63) is a question mark. It sounds like these "unusual" characters are displayed as a question mark, but are not actually characters with char code 63.
If this is the case, then removing occurrences of char(63) (aka '?') will of course have no effect on these "unusual" characters.
I believe you actually didn't have issues with literally CHAR(63), because that should be just a normal character and you should be able to properly work with it.
What I think happened is that, by mistake, a Unicode character (for example, a Cyrillic "А") was inserted into the table, and either your:
columns setup,
the SQL code,
or the passed in parameters
were not prepared for that.
In this case, the character might be displayed to you as ?, and the ASCII() function would actually return 63 for it, but you should really use UNICODE() to find its real code.
Let me give a specific example that I have hit multiple times: that Cyrillic "А", which looks identical to the Latin one but has a Unicode code point of 1040.
If you apply the non-Unicode ASCII function to that character, you get code 63, which is not its real code: 63 is the '?' that SQL Server substitutes when the character cannot be represented in the varchar code page.
Actually, run this to make the differences in my example obvious:
SELECT NCHAR(65) AS Latin_A, NCHAR(1040) Cyrilic_A, ASCII(NCHAR(1040)) Latin_A_Code, UNICODE(NCHAR(1040)) Cyrilic_A_Code;
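The same effect explains why the REPLACE with char(63) does nothing. The sketch below uses a made-up value with a Cyrillic "А" in the fourth position and assumes the database's default collation uses a code page without Cyrillic letters (for example code page 1252).

```sql
-- Hypothetical sample value: Cyrillic "А" (code point 1040) embedded in Latin text
declare @s nvarchar(20) = N'ddd' + nchar(1040) + N'dd';

select ascii(substring(@s, 4, 1))   as AsciiCode     -- 63 under the code page assumption above
      ,unicode(substring(@s, 4, 1)) as UnicodeCode;  -- 1040

select replace(@s, char(63), '')     as AfterChar63  -- unchanged: the value contains no real '?'
      ,replace(@s, nchar(1040), N'') as AfterNchar;  -- Cyrillic letter removed
```

So the code reported by UNICODE(), not the 63 reported by ASCII(), is the one to pass to NCHAR() inside REPLACE.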
The seemingly empty character that shows up as '?' in the substring, and for which ASCII() returns 63, is a zero-width space. It can get appended when you copy data from a UI and insert it into the database.
To remove it, you can use the query below:
set MyColumn = replace(MyColumn,NCHAR(8203),'')
It's an older question, but I've run into this problem as well. I found the solution somewhere else on internet, but I thought it would be good to share it here as well. Have a good day.
Replace(YourString, nchar(65533) COLLATE Latin1_General_BIN2, '')
This should work as well:
UPDATE TABLE
SET [FieldName] = SUBSTRING([FieldName], 2, LEN([FieldName]))
WHERE ASCII([FieldName]) = 63