Removing part of a String using value from another column - sql

I have a field that contains file paths to attachments. Each filename contains the attachment's "AttachmentID", which gets appended automatically, but in some cases this ID is duplicated, which causes problems when my front-end tries to find the attachment. I want to remove the duplicate ID.
I'm thinking the best way to do this is with REPLACE, but I don't know how to tell SQL to find the AttachmentID within the Path.
Here's what I've written to find the records:
SELECT Path
FROM [Attachments].[dbo].[Attachments]
WHERE [Path] LIKE CONCAT ('%','-',[AttachmentID],'-','%')
For example: \\SERVERNAME\X\FILEPATH\ATTACHMENT\01928-01928-Filename.JPG
I want it to read: \\SERVERNAME\X\FILEPATH\ATTACHMENT\01928-Filename.JPG
That number I'm removing is also stored independently in another column called AttachmentID.

I think I may have answered my own question, sorry!
UPDATE [Attachments].[dbo].[Attachments]
SET Path = REPLACE(Path, CONCAT('-',[AttachmentID]), '')
WHERE [Path] LIKE CONCAT ('%','-',[AttachmentID],'-','%')
Since the additional ID is always prefixed by a hyphen, this seems to have worked.
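For anyone who wants to check the string logic outside the database first, here is a minimal Python sketch of the same clean-up. It assumes, based on the single example above, that the doubled ID always sits at the very start of the file name; the function name remove_duplicate_id is made up for illustration. Unlike the REPLACE call, which removes every '-<ID>' occurrence in the path, this only touches the doubled prefix.

```python
def remove_duplicate_id(path: str, attachment_id: str) -> str:
    """Drop one copy of a doubled '<id>-<id>-' prefix in the file name."""
    # Split off the file name; the paths in the question use backslashes.
    folder, separator, filename = path.rpartition("\\")
    doubled = f"{attachment_id}-{attachment_id}-"
    if filename.startswith(doubled):
        # Keep a single '<id>-' and leave the rest of the name untouched.
        filename = filename[len(attachment_id) + 1:]
    return folder + separator + filename


path = r"\\SERVERNAME\X\FILEPATH\ATTACHMENT\01928-01928-Filename.JPG"
print(remove_duplicate_id(path, "01928"))
```

Paths without the doubled prefix come back unchanged, so the function is safe to run over every row.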

Related

Remove sub-string from data in sql table column

I have a table that has a bunch of url's within a certain column. We no longer want a certain url within the table and instead of manually updating each data record I was curious if there is a way to remove just a certain type of url through an update query?
For instance, a data record with the following url's exists:
Presentation (PowerPoint File)<br> Presentation (Webcast)
and I want to remove the smil url so the data only shows:
Presentation (PowerPoint File)<br>
I want to remove the entire "smil" url from this string (from ), and every other smil url from the other records (the other records are similar with a different smil file name). Some of the records could have more than two urls, BUT the "smil" url is always the last one.
Preserving some of the comment history so future readers understand the decision points before implementing the solution
Does it always follow the pattern of text<br>text
there are a few times where there are two urls and they exclude the <br> and then there are a few times where it is just the smil url within the data.
You haven't clearly defined what a "smil" url is. Is it one with smil in it anywhere? With the file suffix being .smil? With /smil/ in the path? Some combination of these?
The problem you're going to have is that to solve this properly, you'll need some insight into the HTML fragments. That's usually a .NET thing; the string matching in T-SQL is likely to be insufficient for your needs. You could try taking multiple passes at it. If it follows the text<br>text pattern, you could use left(myCol, charindex('<br>', myCol)) where myCol like '%smil%' and keep taking passes at it until you've found all the patterns.
@billinkc: I see where you are going. I was wondering whether it would be possible to remove everything from the start of <a href="xxx since those "smil" links all start with that character string.
And there'd never be the case of streaming<br>foo? If so, then yeah, search for the <a href="http: using charindex/patindex (can never remember which) and then slice it out with left/substring.
@billinkc: yup, that will always be the case. The "streaming" url is ALWAYS last. OK, this was easier than I thought, just needed some outside eyes. Thank you.
Given that we know nothing useful exists after the smil url and that the url will always be an external one, we can safely use a left/substring approach like
DECLARE @Source table
(
    SourceUrl varchar(200)
);
INSERT INTO @Source
    (SourceUrl)
VALUES
    ('Presentation (PowerPoint File)<br> Presentation (Webcast)');
-- INSPECT THIS, IF APPROPRIATE THEN
SELECT
    S.SourceUrl AS Before
,   CHARINDEX('<a href="http://', S.SourceUrl) AS WhereFound
,   LEFT(S.SourceUrl, CHARINDEX('<a href="http://', S.SourceUrl) - 1) AS After
FROM
    @Source AS S
WHERE
    S.SourceUrl LIKE '%smil%'
    -- Skip rows without the anchor: LEFT fails when CHARINDEX returns 0
    AND CHARINDEX('<a href="http://', S.SourceUrl) > 0;
-- Only run this if you like the results of the above
UPDATE
    S
SET
    SourceUrl = LEFT(S.SourceUrl, CHARINDEX('<a href="http://', S.SourceUrl) - 1)
FROM
    @Source AS S
WHERE
    S.SourceUrl LIKE '%smil%'
    AND CHARINDEX('<a href="http://', S.SourceUrl) > 0;
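The slicing logic can also be tried out on a few values before touching the table. The Python sketch below mirrors the LEFT/CHARINDEX idea and adds an explicit guard for rows where the marker is missing. The function name strip_trailing_link and the sample string (including the example.com address) are hypothetical and only there for illustration.

```python
MARKER = '<a href="http://'


def strip_trailing_link(source_url: str, marker: str = MARKER) -> str:
    """Keep only the text that comes before the first external link."""
    # str.find returns -1 when the marker is absent (CHARINDEX returns 0).
    position = source_url.find(marker)
    if position == -1:
        return source_url
    return source_url[:position]


# Hypothetical value, not taken from the real table.
sample = (
    'Presentation (PowerPoint File)<br> '
    '<a href="http://example.com/talk.smil">Presentation (Webcast)</a>'
)
print(repr(strip_trailing_link(sample)))
```

Note that this cuts at the first external link it finds, so it relies on the same assumption as the SQL: nothing worth keeping comes after that point.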

Trim string up to certain character in sql (oracle)

Looking for some help here if anyone can offer some. I am working with an Oracle database and I would like to trim a string up to a certain character, '/'. These fields are URL paths of varying length, so I need to find the very last '/' in the URL and remove everything up to and including it. Additionally, some of these URLs end with a session ID that is preceded by a semicolon, so I also want to remove everything from the semicolon onward. So essentially I want to remove content from the beginning of the URL and, where applicable, from the end. Examples of these URLs (strings) are as follows:
Current URLS
/ingaccess/jsp/mslogon.jsp
/ingaccess/help/helpie_term_cash_surrender_value.html
/eportal/logout.do;jsessionid=xr8co1kyebrve47xsjwmzw--.p704
/eportal/logout.do;jsessionid=gdh_e_e1m1hna0z9ednklg--.p705
/ingaccess/help/helpie_term_northern_unit_value.html
/ingaccess/help/helpie_scheduled_rebalance.html
/eportal/home.action;jsessionid=9vhfbkhunkvtcm5g1dtgsa--.p704
/ingaccess/help/helpie_catch_up_50.html
/ingaccess/piechartmaker
/ingaccess/help/helpie_term_fund_balance.html
Desired URLS
mslogon.jsp
helpie_term_cash_surrender_value.html
logout.do
logout.do
helpie_term_northern_unit_value.html
helpie_scheduled_rebalance.html
home.action
helpie_catch_up_50.html
piechartmaker
helpie_term_fund_balance.html
Anyone know of an easy fix here? I've tried working with SUBSTR and REPLACE a bit but can't seem to get them to work.
Thanks a bunch in advance,
Ryan
Try this
SELECT CASE
         WHEN INSTR(surname, ';') > 0
         THEN SUBSTR(surname, 1, INSTR(surname, ';', 1, 1) - 1)
         ELSE surname
       END
FROM (SELECT SUBSTR(column, INSTR(column, '/', -1) + 1) AS surname
      FROM tableName)
Tested on Apex
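To make the two steps easier to follow, here is a small Python sketch of the same logic, checked against a few of the URLs from the question. The function name page_name is made up for illustration; the sketch assumes, like the query, that the session ID itself never contains a '/'.

```python
def page_name(url_path: str) -> str:
    """Return the last path segment without any ';jsessionid=...' suffix."""
    # Step 1: everything after the last '/',
    # like SUBSTR(col, INSTR(col, '/', -1) + 1).
    last_segment = url_path.rsplit("/", 1)[-1]
    # Step 2: everything before the first ';', if there is one.
    return last_segment.split(";", 1)[0]


for url in (
    "/ingaccess/jsp/mslogon.jsp",
    "/eportal/logout.do;jsessionid=xr8co1kyebrve47xsjwmzw--.p704",
    "/ingaccess/piechartmaker",
):
    print(page_name(url))
```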

Tortoise SVN property substitution - fails for more than one property "group"

I'm using TortoiseSVN 1.6.12 and seeing some very strange behaviour on property substitution. I have some svn:keywords properties (configured via TSVN) like this:
Author, LastChangedBy, Date, DateLastChanged
which I've applied recursively across every file in the codeset
I then did a simple test on a text file like this
Some text
$Author$
$LastChangedBy$
$Date$
$LastChangedDate$
When I commit my changes, the Author and LastChangedBy properties are substituted but not the Date or LastChangedDate ones. I did some experimenting with combinations, and it appears that either the author properties are set or the date ones (but never both). So it must be doing some validation based on property groups. (In TSVN, you can't simply create another svn:keywords entry; you're stuck with one.)
Has anyone ever encountered this and/or is there a workaround?
The problem is simply that SVN only replaces keywords which are known to SVN.
You are using the following list of keywords set:
Author, LastChangedBy, Date, DateLastChanged
but you have placeholders set in your text file:
$Author$
$LastChangedBy$
$Date$
$LastChangedDate$
the known keywords are the following:
URL, HeadURL
Author, LastChangedBy
Date, LastChangedDate
Rev, Revision
LastChangedRevision
Id
Header
The svn:keywords property must exactly list the keywords you would like replaced with values, and be aware that keywords are case sensitive. You have defined a keyword "DateLastChanged", which simply does not exist and will of course not be replaced by SVN because it is unknown to SVN. I assume you have a typo in your svn:keywords contents. Maybe you can copy and paste the output of
svn pl . -v filename
on the command line for that file. One point I missed before: have you separated the keywords with a space?
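A quick way to spot this kind of mismatch is to compare the property value against the list of known keywords above. The Python sketch below is only an illustration: the function name unknown_keywords is made up, and it assumes the value is split on whitespace only, so a comma stuck to a name makes that entry count as unknown.

```python
# Keyword names as listed in this answer; they are case sensitive.
KNOWN_KEYWORDS = {
    "URL", "HeadURL",
    "Author", "LastChangedBy",
    "Date", "LastChangedDate",
    "Rev", "Revision", "LastChangedRevision",
    "Id", "Header",
}


def unknown_keywords(property_value: str) -> list:
    """Return the entries of an svn:keywords value that are not known keywords."""
    # Assumption: entries are separated by whitespace only.
    return [word for word in property_value.split() if word not in KNOWN_KEYWORDS]


print(unknown_keywords("Author LastChangedBy Date DateLastChanged"))
print(unknown_keywords("Author LastChangedBy Date LastChangedDate"))
```

An empty list for the second call means every entry would be recognised.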

What is the best way to retrieve an ID from a file name?

Scenario:
Our customer has provided us with files whose names contain an ID number that we need for indexing purposes.
.\root\dir1\a123.txt (ID is 123)
.\root\dir2\abc345.csv (ID is 345)
.\root\dir3\235.xls (ID is 235)
We know what format to expect based on the file's location and extension. Our customer would like to be able to add
.\root\dir4\foo556.bar (ID is 556)
meaning we cannot write a custom method for each entry under root.
My Solution:
The solution we are thinking of is to store the formats of the file names in an XML file
<root>
  <entry>
    ...
    <format>abc###</format>
    ...
  </entry>
</root>
When the customer wants to add a new entry under root, they'll have to give a directory, a file extension and a format. Then on our end we implement a getID() method that uses the format specified in the XML to retrieve the IDs from the file name.
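One way such a getID() could work is to translate the stored format into a regular expression. The Python sketch below is only an illustration under stated assumptions: each '#' is taken to mean exactly one digit, everything else in the format is matched literally, and the names format_to_regex and get_id are made up here.

```python
import re


def format_to_regex(name_format: str) -> re.Pattern:
    """Turn a format such as 'abc###' into a pattern that captures the ID digits."""
    if "#" not in name_format:
        raise ValueError("format must contain at least one '#'")
    parts = []
    for run in re.finditer(r"#+|[^#]+", name_format):
        text = run.group()
        if text.startswith("#"):
            # Assumption: one digit per '#'; the digits form the captured ID.
            parts.append(r"(\d{%d})" % len(text))
        else:
            # Literal text is escaped so characters like '.' stay literal.
            parts.append(re.escape(text))
    return re.compile("".join(parts))


def get_id(file_name: str, name_format: str):
    """Return the ID from file_name, or None if the name does not fit the format."""
    stem = file_name.rsplit(".", 1)[0]  # drop the extension
    match = format_to_regex(name_format).fullmatch(stem)
    return match.group(1) if match else None


print(get_id("abc345.csv", "abc###"))
print(get_id("foo556.bar", "foo###"))
```

Names that do not fit the format return None, which gives a natural place to report files the customer has put in the wrong directory.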
Question:
Has anyone else dealt with a similar situation? If so is there a better solution than the one I have provided?
Assuming the file name will always be on the form <letters><digits>.<extension>, I would use a simple regular expression to match the relevant part of the name. E.g. .*\\[a-z]*\([0-9]*\)\..* (may vary depending on the RE engine in question).
If you want a generic solution which would automatically identify all files that match, you could use file globs in the shell if they are available and work for your particular case:
something like:
ls root/*/* | sed -nE 's/^(.*[^0-9])?([0-9]+)(\.[A-Za-z]{3,})$/"\1\2\3" \2/p' | xargs -n2 runMyProgramHere
If you need to do it programmatically, directory listings are fairly easy in most languages: list everything in /root, list everything in each of those directories, filter for files whose names end in digits followed by a dot and an extension, and there's your list.
In pseudo-code:
for (directory in file.getDirectoryList("/root")) {
    for (name in file.getDirectoryList("/root/" + directory)) {
        if (name contains a sequence of numbers followed by a dot ending with an extension) {
            extract id
            store filename and id
        }
    }
}
You can probably do this with regexes if you really want, but I tend to avoid regexes in programs unless I have a really good reason to use them. They are often poorly understood and prone to breaking without good error reporting.
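The pseudo-code can be sketched in Python without any regexes, using plain string checks instead. This is a minimal sketch, assuming the ID is the run of digits directly before the extension and that files sit exactly one level below the root; the names trailing_id and collect_ids are made up for illustration.

```python
import os


def trailing_id(file_name):
    """Return the digits right before the extension, or None if there are none."""
    stem, extension = os.path.splitext(file_name)
    if not extension:
        return None
    digits = ""
    # Walk backwards from the end of the stem while we keep seeing digits.
    for character in reversed(stem):
        if character not in "0123456789":
            break
        digits = character + digits
    return digits or None


def collect_ids(root):
    """Map each file path one level below root to the ID found in its name."""
    found = {}
    for directory in sorted(os.listdir(root)):
        directory_path = os.path.join(root, directory)
        if not os.path.isdir(directory_path):
            continue
        for name in sorted(os.listdir(directory_path)):
            file_id = trailing_id(name)
            if file_id is not None:
                found[os.path.join(directory_path, name)] = file_id
    return found


for name in ("a123.txt", "abc345.csv", "235.xls", "foo556.bar"):
    print(name, trailing_id(name))
```

Calling collect_ids on the root directory then gives the file name and ID pairs that the pseudo-code stores.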

MySQL: How can I remove trailing HTML from a field in the database?

I want to remove some rogue HTML from a DB field that is supposed to contain a simple filename. Example of ok field:
myfile.pdf
Example of not ok field:
myfile2.pdf<input type="hidden" id="gwProxy" />...
Does anyone know a query I can run that can remove the HTML part but leave the filename? i.e. remove everything from the first < character onwards.
Let's assume the field is called myattachment and is defined as a varchar(250) and the table is called mytable in a MySQL database.
Background info (not necessary to read):
The field in our database is supposed to contain filenames; however, due to an issue (documented here), some of the fields now contain a filename and some rogue HTML. We have fixed the root issue and now need to fix the corrupt fields. In the past I have replaced text using this kind of query:
UPDATE mytable SET myattachment = replace(myattachment, 'JPG', 'jpg') WHERE myattachment LIKE '%JPG';
This query seems to work ok, can anyone see any issues with it?
UPDATE mytable
SET myattachment = SUBSTRING_INDEX(myattachment, '<', 1)
WHERE `myattachment` LIKE '%<%';
For docs on SUBSTRING_INDEX see the mysql manual page.
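If it helps to see what the query does to a single value, the Python sketch below applies the same rule: keep everything before the first '<'. The function name strip_trailing_html is made up for illustration, and the sample values are the two from the question.

```python
def strip_trailing_html(value: str) -> str:
    """Keep the text before the first '<', like SUBSTRING_INDEX(value, '<', 1)."""
    return value.split("<", 1)[0]


print(strip_trailing_html('myfile2.pdf<input type="hidden" id="gwProxy" />...'))
print(strip_trailing_html("myfile.pdf"))
```

Values without a '<' come back unchanged, which matches the WHERE clause restricting the update to rows that contain one.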