The Problem
I have a field that stores file keys, such as:
dev/application/document_type_name/document 12345-67890_123.pdf
I need to select the key without the number on the end so the value looks like:
dev/application/document_type_name/document 12345-67890_.pdf
Potential Strategy
It's been a while since I've done T-SQL but coming from the .NET side I think the general strategy would be:
Get the last index of an underscore character
Get the last index of the period character.
Replace the value between those two characters with a blank.
In C#, I think it would be something like:
var test = "dev/application/document_type_name/document 12345-67890_123.pdf";
var indexOfUnderscore = test.LastIndexOf("_");
var indexOfPeriod = test.LastIndexOf(".");
// Remove by position: Replace("123", "") would also strip the "123" inside "12345".
// Remove takes (startIndex, count), so the count is the gap between the two indexes.
var output = test.Remove(indexOfUnderscore + 1, indexOfPeriod - indexOfUnderscore - 1);
Notes:
Every key will have that format.
The length of the value between the underscore and the period may vary (could be 1, 12345, etc.).
The end result should keep the underscore and the period.
Try
select left(@s,len(@s)-charindex('_',REVERSE(@s)))+'_.pdf'
where @s is your string.
Or if the file extension can change
select left(@s,len(@s)-charindex('_',REVERSE(@s))+1)
+ right(@s,charindex('.',REVERSE(@s)))
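To check the index arithmetic outside the database, here is a minimal Python sketch of the same idea: keep everything up to and including the last underscore, then append everything from the last period onward. The function name strip_trailing_number is made up for illustration, and returning the key unchanged when the format does not match is an assumption of this sketch, not something the answer specifies.

```python
def strip_trailing_number(key: str) -> str:
    """Drop whatever sits between the last '_' and the last '.' (hypothetical helper)."""
    underscore = key.rfind("_")
    period = key.rfind(".")
    # Assumption: if the key does not follow the expected format, leave it alone.
    if underscore == -1 or period < underscore:
        return key
    # Keep the text through '_', then the extension starting at '.'.
    return key[: underscore + 1] + key[period:]


print(strip_trailing_number("dev/application/document_type_name/document 12345-67890_123.pdf"))
```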
Related
I am running queries in a large IBM DB2 database table (let's call it T) and have found that the cells for column Identifier tend to be padded not just on the margins, but in between as well, as in: ' ID1 ID2 '. I do not have rights to update this DB, nor would I, given a number of factors. However, I want a way to ignore the whitespace AT LEAST on the left and right, even if I need to simply add a couple of spaces in between. The following queries work, but are slow, upwards of 20 seconds slow....
SELECT * FROM T WHERE Identifier LIKE '%ID1%ID2%';
SELECT * FROM T WHERE TRIM(Identifier) LIKE 'ID1%ID2';
SELECT * FROM T WHERE TRIM(Identifier) = 'ID1 ID2';
SELECT * FROM T WHERE LTRIM(RTRIM(Identifier)) = 'ID1 ID2';
SELECT * FROM T WHERE LTRIM(Identifier) LIKE 'ID1 ID2%';
SELECT * FROM T WHERE LTRIM(Identifier) LIKE 'ID1%ID2%';
SELECT * FROM T WHERE RTRIM(Identifier) LIKE '%ID1 ID2';
SELECT * FROM T WHERE RTRIM(Identifier) LIKE '%ID1%ID2';
Trying to query something like "Select * FROM T WHERE REPLACE(Identifier, ' ', '')..." of course just freezes up Access until I Ctrl+Break to end the operation. Is there a better, more efficient way to ignore the whitespace?
================================
UPDATE:
As @Paul Vernon describes below, "Trailing spaces are ignored in Db2 for comparison purpose, so you only need to consider the leading and embedded spaces."
This led me to generate combinations of spaces before 'ID1' and 'ID2' and select the records using the IN clause. The number of combinations means that the query is slower than if I knew the exact match. This is how it looks in my Java code with Jdbc (edited to make it more generic to the key issue):
private static final int MAX_LENGTH = 30;
public List<Parts> queryMyTable(String ID1, String ID2) {
String query="SELECT * FROM MYTABLE WHERE ID IN (:ids)";
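// Bind the generated list to the :ids named parameter (requires java.util.Collections).
final Map<String, List<String>> parameters = Collections.singletonMap("ids", getIDCombinations(ID1, ID2));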
return namedJdbcTemplate.query(query,parameters,new PartsMapper());
}
public static List<String> getIDCombinations(String ID1, String ID2) {
List<String> combinations = new ArrayList<>();
final int literalLength = ID1.length() + ID2.length();
final int maxWhitespace = MAX_LENGTH - literalLength;
combinations.add(ID1+ID2);
for(int x = 1; x <= maxWhitespace; x++){
String xSpace = String.format("%1$"+x+"s", "");
String idZeroSpaceBeforeBase = String.format("%s%s%s",ID1,xSpace,ID2);
String idZeroSpaceAfterBase = String.format("%s%s%s",xSpace,ID1,ID2);
combinations.add(idZeroSpaceBeforeBase);
combinations.add(idZeroSpaceAfterBase);
for(int y = 1; (x+y) <= maxWhitespace; y++){
String ySpace = String.format("%1$"+y+"s", "");
String id = String.format("%s%s%s%s",xSpace,ID1,ySpace,ID2);
combinations.add(id);
}
}
return combinations;
}
Trailing spaces are ignored in Db2 for comparison purpose, so you only need to consider the leading and embedded spaces.
Assuming there is an index on the Identifier, your only option (if you can't change the data, or add a functional index or index a generated column), is probably something like this
SELECT * FROM T
WHERE
Identifier = 'ID1 ID2'
OR Identifier = ' ID1 ID2'
OR Identifier = '  ID1 ID2'
OR Identifier = 'ID1  ID2'
OR Identifier = ' ID1  ID2'
OR Identifier = '  ID1  ID2'
which the Db2 optimizer might implement as 6 index lookups, which would be faster than a full index or table scan.
You could also try this
SELECT * FROM T
WHERE
Identifier LIKE 'ID1 %ID2'
OR Identifier LIKE ' ID1 %ID2'
OR Identifier LIKE '  ID1 %ID2'
which the Db2 optimizer might implement as 3 index range scans.
In both examples, add more lines to cover the maximum number of leading spaces you have in your data if needed. In the first example, add more lines for the embedded spaces too if needed.
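If the list of padded variants has to be generated rather than typed by hand, a sketch along these lines could build it. This is only an illustration in Python: the function name padded_variants and the two limits are hypothetical, and the right limits depend on how much padding actually occurs in the data.

```python
def padded_variants(id1: str, id2: str, max_leading: int, max_embedded: int) -> list:
    """Build every 'leading spaces + id1 + embedded spaces + id2' combination.

    Trailing spaces are not generated because, per the answer above,
    Db2 ignores them in equality comparisons.
    """
    variants = []
    for leading in range(max_leading + 1):           # 0..max_leading spaces in front
        for embedded in range(1, max_embedded + 1):  # at least one space between the IDs
            variants.append(" " * leading + id1 + " " * embedded + id2)
    return variants


for value in padded_variants("ID1", "ID2", max_leading=2, max_embedded=2):
    print(repr(value))
```

With both limits set to 2 this yields the six literals used in the first query; each one can then be bound as a separate parameter.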
Create an index on the expression REGEXP_REPLACE(TRIM(Identifier), '\s{2,}', ' '); the following query should then make Db2 use this index:
SELECT *
FROM T
WHERE REGEXP_REPLACE(TRIM(Identifier), '\s{2,}', ' ') = 'ID1 ID2'
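To see what that expression does to a value, here is a rough Python equivalent. It is a sketch only: normalize_identifier is a made-up name, and it assumes the padding consists of ordinary blanks, which is what TRIM strips by default.

```python
import re


def normalize_identifier(value: str) -> str:
    """Mimic REGEXP_REPLACE(TRIM(value), '\\s{2,}', ' ') for illustration."""
    trimmed = value.strip(" ")                 # TRIM: drop leading and trailing blanks
    return re.sub(r"\s{2,}", " ", trimmed)     # collapse runs of 2+ whitespace to one space


print(repr(normalize_identifier("  ID1    ID2  ")))
```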
If you need to search excluding leading and trailing spaces, then no traditional indexes can help you with that, at least as you show the case. To make the query fast, the options I can see are:
Full Text Search
You can use a "full text search" solution. DB2 does include this functionality, but I don't remember if it's included by default in the license or is sold separately. In any case, it requires a bit of indexing or periodic re-indexing of the data to make sure the search is up to date. It's worth the effort if you really need it. You'll need to change your app, since the mechanics are different.
Index on extra, clean column
Another solution is to index the column without the leading or trailing spaces. But you'll need to create an extra column; on a massive table this operation can take some time. The good news is that once it is created, there's no more delay. For example:
alter table t add column trimmed_id varchar(100)
generated always as (trim(identifier));
Note: You may need to disable/enable integrity checks on the table before and after this clause. DB2 is picky about this. Read the manual to make sure it works. The creation of this column will take some time.
Then, you need to index it:
create index ix1 on t (trimmed_id);
The creation of the index will also take some time, but it should be faster than the step above.
Now it's ready. You can query your table using the new column instead of the original one (which is still there), but this time you can forget about leading and trailing spaces. For example:
SELECT * FROM T WHERE trimmed_id LIKE 'ID1%ID2';
The only wildcard now shows up in the middle. This query will be much faster than reading the whole table. In fact, the longer the string ID1 is, the faster the query will be, since the selectivity will be better.
Now, if ID2 is longer than ID1 then you can reverse the index to make it fast.
I am trying to extract the last file name from a field in SQL where the separator is /, and there is sometimes also one after the last file name. (I am using this to create a new field in a BI web intelligence document.)
Filename1/filename2/filename3/filename4/ result required Filename4
File1/file2/file3/file4/file5/file6 result required file6
I have tried various combinations but without success. As you can see the file names are not of a standard length and the number of folders is variable.
Any help on this would really be appreciated.
Thank you
Lyn
Depending on your answer to my comment: do you have an input string that ends with "/" or not? I have put both types of test strings in this query, using SQL 2008 as the DBMS. Just comment out either Set @tstString to run each condition and you will see the two result possibilities.
Declare @tmpFirstMark int
Declare @tmpLastMark int
Declare @tmpUseMark int
Declare @tstString varchar(100)
Set @tstString = 'Filename1/filename2/filename2/filename4/'
Set @tstString = 'File1/file2/file3/file4/file5/file6'
-- Calculate 1st Occurrence of "/"
Set @tmpFirstMark = PATINDEX('%/%',@tstString)
-- Calculate last Occurrence of "/"
Set @tmpLastMark = (LEN(@tstString) - PATINDEX('%/%',REVERSE(@tstString)) + 1)
-- Calculate 2nd to last Occurrence of "/"
Set @tmpUseMark = @tmpLastMark - PATINDEX('%/%', REVERSE(SUBSTRING(@tstString, 1, @tmpLastMark-1)))
Select
@tstString
,@tmpFirstMark
,@tmpLastMark
,@tmpUseMark
,SUBSTRING(@tstString, @tmpLastMark + 1, LEN(@tstString)) as 'resultSTR'
,SUBSTRING(@tstString, @tmpUseMark + 1, @tmpLastMark-@tmpUseMark-1) as 'otherResult'
I would use a regular expression to retrieve the desired output :
([^/]+)/?$
This will match as many non-/ characters as possible (at least 1) before the end of the string, that may be followed by an optional /.
You will want to use the first group of the match to retrieve the filename of a directory without its trailing /.
You haven't specified your RDBMS and I'm not so comfortable with using regexps in SQL, so I hope you'll be able to piece that together in your SQL dialect.
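As a quick way to try the pattern before porting it to SQL, here is a small Python sketch. The helper name last_segment is made up for illustration, and returning None when nothing matches is a choice of this sketch.

```python
import re
from typing import Optional

# One or more non-slash characters, an optional trailing slash, then end of string.
LAST_SEGMENT = re.compile(r"([^/]+)/?$")


def last_segment(path: str) -> Optional[str]:
    """Return the first capture group of the pattern, or None if nothing matches."""
    match = LAST_SEGMENT.search(path)
    return match.group(1) if match else None


print(last_segment("Filename1/filename2/filename3/filename4/"))
print(last_segment("File1/file2/file3/file4/file5/file6"))
```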
I've been working on this for days and can't seem to work it out. Basically, I need to return the digits from a field that appear before a forward slash, e.g. if the field was 1234/TEXT I want to return 1234. I can't just use Left(fieldname, 4), as the digits vary in length, e.g. 12345/TEXT, so it needs to be anything left of the forward slash. Now, in the world of MS Access, it is something like this, and it works:
Left(TABLE!FIELD,InStr(1,TABLE!FIELD,"/")-1)
However, how do I convert this to be used in an IBM\DB2 system? The DB2 SQL seems somewhat different to 'normal' SQL.
Thanks!
Rather than INSTR, maybe LOCATE
LOCATE(char, string)
char is the search term
string is the string being searched
You can achieve this by combining LOCATE with SUBSTR;
Locate information
Substring information
Cheat sheet (for this example);
SUBSTRING('FIELD','START POSITION', 'LENGTH')
LOCATE('SEARCH STRING', 'SOURCE STRING')
SUBSTRING lets you retrieve specific characters from a string, i.e.;
AFIELD = 'Hello'
SUBSTRING(AFIELD,4,2)
Result = 'lo' (position 4 and 5 of Hello)
LOCATE returns the position of the first character of the search string it finds as a number, i.e.;
AFIELD = 'Hello'
LOCATE('ello', AFIELD)
Result = 2 (it starts at position 2)
So you can combine these to do what you want, example;
XTABLE has 1 column called ACOL with the following values in it;
123467/ABCD
1321/ABDD
1123467/ABCD
To just retrieve the numbers;
SELECT SUBSTRING(ACOL,1, LOCATE('/',ACOL)-1)
FROM XRDK/XTABLE
Result;
123467
1321
1123467
What are we doing?
SUBSTRING(
ACOL,
1,
LOCATE('/',ACOL)-1
)
SUBSTRING(
Field ACOL,
Starting at position 1,
Length; using LOCATE, set this to the position where a '/' is found and subtract 1 from the resulting position (without the -1 you'd have the / on the end)
)
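The same position arithmetic can be tried out in Python. This is a sketch only: before_slash is a hypothetical name, and returning the value unchanged when there is no slash is an assumption made here (in SQL, LOCATE would return 0 and the computed length would be -1, so that case needs separate handling).

```python
def before_slash(value: str) -> str:
    """Mirror SUBSTRING(value, 1, LOCATE('/', value) - 1) using 1-based positions."""
    position = value.find("/") + 1      # 1-based like LOCATE; 0 means "not found"
    if position == 0:
        return value                    # assumption: no slash, keep the whole value
    return value[: position - 1]        # everything before the slash


for sample in ["123467/ABCD", "1321/ABDD", "1123467/ABCD"]:
    print(before_slash(sample))
```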
Try this
SELECT SUBSTRING(CAST (ROUND(COLUMN,2) AS DECIMAL(6,2)), 0, locate('/',CAST (ROUND(COLUMN,2) AS DECIMAL(6,2))))
FROM TABLE
I have a table called documents. One of its fields is called location, which holds the file path for the document. I need to change it from D:\........ to H:\.....
How can I do this using UPDATE in SQL, given that the file paths vary in length and there are lots of records?
You can use string helper function to achieve the same. Something like below
UPDATE documents SET location = 'H:' + Mid(location, 3, Len(location) - 2)
WHERE Left(location, 1) = 'D'
Here, Len() function returns the length of the string literal
Left() function returns 1 character from the left of the string literal
Mid() function give you substring from a string (starting at any position)
See MS Access: Functions for more information on the same.
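The prefix swap is easy to sanity-check outside Access. Here is a minimal Python sketch, assuming every path starts with a drive letter followed by a colon; the function name move_to_drive and its default arguments are made up for illustration.

```python
def move_to_drive(location: str, old: str = "D", new: str = "H") -> str:
    """Replace the leading drive letter, keeping the rest of the path untouched."""
    # Only rewrite paths that really start with the old drive, e.g. "D:".
    if location[:2].upper() == (old + ":").upper():
        return new + ":" + location[2:]   # skip the first two characters ("D:")
    return location


print(move_to_drive(r"D:\docs\reports\a.txt"))
```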
If I have a number (such as 88) and I want to perform a LIKE query in Rails on a primary ID column to return all records that contain that number at the end of the ID (i.e. 88, 288, etc.), how would I do that? Here's the code to generate the result, which works fine in SQLite:
@item = Item.where("id like ?", "88").all
In PostgreSQL, I'm running into this error:
PG::Error: ERROR: operator does not exist: integer ~~ unknown
How do I do this? I've tried converting the number to a string, but that doesn't seem to work either.
Based on Erwin's Answer:
This is a very old question, but in case someone needs it, there is one very simple answer, using ::text cast:
Item.where("(id::text LIKE ?)", "%#{numeric_variable}").all
This way, you find the IDs that end with the number.
Use % wildcard to the left only if you want the number to be at the end of the string.
Use % wildcard to the right also, if you want the number to be anywhere in the string.
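The difference between the two wildcard placements can be illustrated in Python by treating the id as text, which is what the ::text cast does. This is a sketch only; the function names are hypothetical.

```python
def id_ends_with(record_id: int, number: int) -> bool:
    """Same idea as id::text LIKE '%88': the number must be at the end."""
    return str(record_id).endswith(str(number))


def id_contains(record_id: int, number: int) -> bool:
    """Same idea as id::text LIKE '%88%': the number may appear anywhere."""
    return str(number) in str(record_id)


print(id_ends_with(288, 88), id_ends_with(881, 88), id_contains(881, 88))
```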
Simple case
LIKE is for string/text types. Since your primary key is an integer, you should use a mathematical operation instead.
Use modulo to get the remainder of the id value, when divided by 100.
Item.where("id % 100 = 88")
This will return Item records whose id column ends with 88
1288
1488
1238872388
862388
etc...
Match against arbitrary set of final two digits
If you are going to do this dynamically (e.g. match against an arbitrary set of two digits, but you know it will always be two digits), you could do something like:
Item.where(["id % 100 = ?", last_two_digits])
Match against any set or number of final digits
If you wanted to match an arbitrary number of digits, so long as they were always the final digits (as opposed to digits appearing elsewhere in the id field), you could add a custom method on your model. Something like:
class Item < ActiveRecord::Base
  ...
  # Class method, so it can be called as Item.find_by_final_digits(...)
  def self.find_by_final_digits(num_digits, digit_pattern)
    # Where `num_digits` is the number of final digits to match
    # and `digit_pattern` is the set of final digits you're looking for
    Item.where(["id % ? = ?", 10**num_digits, digit_pattern])
  end
  ...
end
Using this method, you could find id values ending in 88, with:
Item.find_by_final_digits(2, 88)
Match against a range of final digits, of any length
Let's say you wanted to find all id values that end with digits between 09 and 12, for whatever reason. Maybe they represent some special range of codes you're looking up. To do this you could do another custom method to use Postgres' BETWEEN to find on a range.
def self.find_by_final_digit_range(num_digits, start_of_range, end_of_range)
  Item.where(["id % ? BETWEEN ? AND ?", 10**num_digits, start_of_range, end_of_range])
end
...and could be called using:
Item.find_by_final_digit_range(2, 9, 12)
...of course, this is all just a little crazy, and probably overkill.
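For anyone who wants to check the modulo arithmetic before putting it in a query, here is a small Python sketch of both checks. The function names are made up, and the sketch assumes non-negative integer ids.

```python
def ends_with_digits(record_id: int, num_digits: int, digit_pattern: int) -> bool:
    """True when the last `num_digits` digits of the id equal `digit_pattern`."""
    return record_id % (10 ** num_digits) == digit_pattern


def final_digits_in_range(record_id: int, num_digits: int, start: int, end: int) -> bool:
    """True when the last `num_digits` digits fall between start and end, inclusive."""
    return start <= record_id % (10 ** num_digits) <= end


print(ends_with_digits(1288, 2, 88))
print(final_digits_in_range(4509, 2, 9, 12))
```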
The LIKE operator is for string types only.
Use the modulo operator % for what you are trying to do:
@item = Item.where("(id % 100) = ?", "88").all
I doubt it "works" in SQLite, even though it coerces the numeric types to strings. Without leading % the pattern just won't work.
-> sqlfiddle demo
Cast to text and use LIKE as you intended for arbitrary length:
@item = Item.where("id::text LIKE ('%'::text || ?)", "12345").all
Or, mathematically:
@item = Item.where("id % (10 ^ length(?))::int = ?", "12345", 12345).all
The LIKE operator does not work with numeric types, and id is numeric, so convert it to a string first, for example with concat:
SELECT * FROM TABLE_NAME WHERE concat("id") LIKE '%ID%'