Extract only number/numeric - sql

I have a URL that is received by the application. The URL can take any format, but it always has a number at the end.
For example -
/Page/Modern-Day-Hotel-59460
/Page/Future-Fun-Days-At-Beach-223345
/Page/Hello-Page-123
/Page/This-Page/Second-Page/Main-Product-44231
As you can see, there is a numeric value at the end of each URL. I'm trying to extract only the numeric portion of the URL. Is there any way to achieve this? Since the URL length will change and the number size also varies, I'm looking for something generic.
I'm using an Amazon Redshift database.

If the number is always preceded by a hyphen (as in your examples), relatively simple string manipulation is sufficient:
select right(col, position('-' in reverse(col)) - 1)
Note that this returns whatever follows the last hyphen. It does not check that this substring is a number.
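To see why the expression works, the same steps can be traced in a few lines of Python. This is only an illustrative sketch, not Redshift code: tail_after_last_hyphen is a hypothetical helper that mirrors reverse, position and right, and returning an empty string when there is no hyphen is an assumption made for the sketch.

```python
def tail_after_last_hyphen(url: str) -> str:
    """Mirror of right(col, position('-' in reverse(col)) - 1)."""
    reversed_url = url[::-1]
    # SQL position() is 1-based and gives 0 when nothing is found;
    # str.find() is 0-based and gives -1, so adding 1 lines them up.
    pos = reversed_url.find("-") + 1
    length = pos - 1
    # Keep the last `length` characters (assumed empty when length <= 0).
    return url[-length:] if length > 0 else ""


for url in ("/Page/Modern-Day-Hotel-59460", "/Page/Hello-Page-123"):
    tail = tail_after_last_hyphen(url)
    # The SQL expression does not validate the tail; isdigit() adds that check.
    print(url, tail, tail.isdigit())
```

The first hyphen in the reversed string is the last hyphen in the original, so its position minus one is exactly the length of the trailing segment.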


Is format ####0.000000 different to 0.000000?

I am working on some legacy code at the moment and have come across the following:
FooString = String.Format("{0:####0.000000}", FooDouble)
My question is, is the format string here, ####0.000000 any different from simply 0.000000?
I'm trying to generalize the return type of the function that sets FooDouble, and I want to make sure I don't break existing functionality, hence trying to work out what the # characters add here.
I've run a couple tests in a toy program and couldn't see how the result was any different but maybe there's something I'm missing?
From MSDN
The "#" custom format specifier serves as a digit-placeholder symbol.
If the value that is being formatted has a digit in the position where
the "#" symbol appears in the format string, that digit is copied to
the result string. Otherwise, nothing is stored in that position in
the result string.
Note that this specifier never displays a zero that
is not a significant digit, even if zero is the only digit in the
string. It will display zero only if it is a significant digit in the
number that is being displayed.
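Because both formats have a single 0 immediately before the decimal separator (0.000000), and the leading # placeholders never emit anything on their own, both formats should return the same result.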

Filtering rows in Pentaho

I have a dataset with columns containing numbers. However, some of the rows in that column have missing data. Instead of numbers, a dash (-) is placed in the cell.
What I want is to separate out the rows with a dash and output them to a separate Excel file. The rows without a dash should be output to a CSV file.
I tried the "filter rows" but it gives me an error:
Unexpected conversion error while converting value [constant String] to a Number
constant String : couldn't convert String to number
constant String : couldn't convert String to number : non-numeric character found at position 1 for value [-]
My condition is if
Column1 CONTAINS - (String)
You can try to convert the field to a number in the Select Values step and handle the error: if a value cannot be converted to a number, that means it is the dash (-).
You can convert missing value indicators (like a dash or any other string) to null in Text-File-Input - see field option "Null if". That way you still can use the metadata detection feature and will not trip over a dash arriving in a Number field.
With CSV-File-Input you should stick to the String datatype until a Null-If step has cleansed the values, so you can change the datatype to Number in a Select-Values step.
If you must preserve the dash character, don't use metadata detection (as it suggests datatype Number) or use more rows to sample (so a field with a dash is encountered) or just revert the datatype to String again before saving and running the transformation.
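The order of operations described here (keep the field as a string, turn the missing-value marker into null, and only then convert to a number) can be sketched outside Pentaho. The Python below is an illustration only; null_if and split_rows are hypothetical names, not Pentaho steps or APIs, and the row layout is assumed.

```python
from typing import Optional


def null_if(value: str, marker: str = "-") -> Optional[str]:
    """Return None when the raw string is exactly the missing-value marker."""
    return None if value.strip() == marker else value


def split_rows(rows):
    """Split rows into (with_number, missing); convert only cleaned values."""
    with_number, missing = [], []
    for row in rows:
        cleaned = null_if(row["Column1"])
        if cleaned is None:
            missing.append(row)  # would go to the Excel output
        else:
            # Conversion happens after the marker is gone, so it cannot fail on "-".
            with_number.append({**row, "Column1": float(cleaned)})
    return with_number, missing


good, bad = split_rows([{"Column1": "12.5"}, {"Column1": "-"}, {"Column1": "-3"}])
```

Because the comparison is against the whole string, a negative number such as -3 is not mistaken for the marker.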
My solution relies on a 'Replace in String' step first. I replaced the dash with a numeric value that can easily be distinguished from the rest of the numbers (I used 9999) and carried on with the rest of my process.
In Filter Rows, I no longer had problems with the data type because both my variables and my condition contained numbers, so nothing had to be converted.
After Filter Rows, I added a 'Null-if' step to remove the arbitrary 9999 that I had used just to have something to replace the dash.
After that, the separation worked just as I had hoped.
Thanks to @marabu for the Null-if idea.

How to search within a URL field in Solr? (like *wildcard*)

In Solr I have a field dedicated to URLs. The URL field can be anywhere up to 2000 in length. However, I only ever need to search the first 200 characters.
Example URL:
https://www.google.co.uk/search/2014/here/?q=help+me&oq=stackoverflow&aqs=c
I've experimented over the last 2 weeks with Grams and various combinations of Tokenizers to no avail. I always seem to fall short. I would provide examples but they are all standard so no point cluttering this with non-working types.
The main problem seems to be with how Solr deals with punctuation. It treats non-A-z/0-9 characters as separators. How do I disable this for a field?
For example I can search: 'google' and get the correct result, but when I search 'google.co' nothing comes back. Same problem with most of the non-A-z/0-9 characters, it seems to treat them as a separator.
Everything needs to be *wildcard*searchable from 4char strings up to 200 char strings.
So the following search terms would return the above result. '&aqs','ow&aqs=','ps://www.goo','q=help+','2014/he'... etc
How would you define a field type for the URL wildcard use case?
You can use a string field for your URL and use a filter that cuts it off at 200 characters. A regular expression can also be used to keep only the first 200 characters for that field.
A string field matches on the exact token, i.e. the whole stored value.

Using SQL - how do I match an exact number of characters?

My task is to validate existing data in an MSSQL database. I've got some SQL experience, but not enough, apparently. We have a zip code field that must be either 5 or 9 digits (US zip). What we are finding in the zip field are embedded spaces and other oddities that will be prevented in the future. I've searched enough to find the references for LIKE that leave me with this "novice approach":
ZIP NOT LIKE '[0-9][0-9][0-9][0-9][0-9]'
AND ZIP NOT LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
Is this really what I must code? Is there nothing similar to...?
ZIP NOT LIKE '[\d]{5}' AND ZIP NOT LIKE '[\d]{9}'
I will loathe validating longer fields! I suppose, ultimately, both code sequences will be equally efficient (or should be).
Thanks for your help
Unfortunately, LIKE is not regex-compatible so nothing of the sort \d. Although, combining a length function with a numeric function may provide an acceptable result:
WHERE ISNUMERIC(ZIP) <> 1 OR LEN(ZIP) NOT IN(5,9)
I would, however, not recommend it, because ISNUMERIC will return 1 for a +, a - or a valid currency symbol. The minus sign in particular may be prevalent in the data set, so I'd still favor your "novice" approach.
Another approach is to use:
ZIP LIKE '%[^0-9]%' OR LEN(ZIP) NOT IN(5,9)
which will find any row where the zip contains a character that is not 0-9 (i.e. only 0-9 is allowed) or where the length is not 5 or 9.
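The rule being expressed (a ZIP is invalid if it contains any character outside 0-9, or if its length is not 5 or 9) can be checked against sample values outside the database. The Python sketch below is only a sanity check of that logic, not T-SQL; the function names and sample values are made up for illustration.

```python
import re

NON_DIGIT = re.compile(r"[^0-9]")                 # same class as '%[^0-9]%'
FIVE_OR_NINE = re.compile(r"[0-9]{5}|[0-9]{9}")   # the two explicit patterns


def is_invalid_by_rule(zip_code: str) -> bool:
    """Invalid if any non-digit is present or the length is not 5 or 9."""
    return bool(NON_DIGIT.search(zip_code)) or len(zip_code) not in (5, 9)


def is_invalid_by_patterns(zip_code: str) -> bool:
    """Invalid if it matches neither the 5-digit nor the 9-digit pattern."""
    return FIVE_OR_NINE.fullmatch(zip_code) is None


samples = ["12345", "123456789", "1234", "12 45", "-1234", "12345-6789", ""]
for z in samples:
    # Both formulations should agree on every sample.
    print(repr(z), is_invalid_by_rule(z), is_invalid_by_patterns(z))
```

On these samples the "non-digit or wrong length" rule flags the same values as the two spelled-out [0-9] patterns, including the leading minus sign that ISNUMERIC would accept.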
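There are a few ways you could achieve that.
You can replace each [0-9] with _ (one underscore per character; note that _ matches any single character, not only a digit), like
ZIP NOT LIKE '_____'
Or use LEN(), so it's like
LEN(ZIP) NOT IN(5,9)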
You are looking for LEN() (the SQL Server name for what other databases call LENGTH())
select * from table WHERE len(ZIP)=5;
select * from table WHERE len(ZIP)=9;
To test for non-numeric values you can use ISNUMERIC():
WHERE ISNUMERIC(ZIP) <> 1

Change Url using Regex

I have a URL, for example:
http://i.myhost.com/myimage.jpg
I want to change this url to
http://i.myhost.com/myimageD.jpg.
(Add D after the image name and before the dot.)
I.e. I want to add some text after the image name and before the dot using a regex.
What is the best way to do this using a regex?
Try using ^(.*)\.([a-zA-Z]{3,5}) and replacing with \1D.\2 (the dot is matched outside both groups, so the replacement has to put it back). I'm assuming the extension is 3-5 alphabetic characters, but you can modify it to suit. E.g. if it's just jpg images then you can put jpg in place of [a-zA-Z]{3,5}.
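A quick way to try this kind of substitution is Python's re module. The sketch below is a variant under a couple of assumptions: the pattern is anchored at the end with $, and the dot is written back explicitly because it is matched outside the two capture groups. add_suffix is a hypothetical helper name.

```python
import re

# Group 1: everything up to the last dot; group 2: a 3-5 letter extension.
PATTERN = re.compile(r"^(.*)\.([a-zA-Z]{3,5})$")


def add_suffix(url: str, suffix: str = "D") -> str:
    """Insert `suffix` between the file name and its extension."""
    # A function replacement avoids any escaping issues in the suffix text.
    return PATTERN.sub(lambda m: f"{m.group(1)}{suffix}.{m.group(2)}", url)


print(add_suffix("http://i.myhost.com/myimage.jpg"))
```

A URL that does not end in a dot followed by 3-5 letters is returned unchanged. A bare host name such as http://i.myhost.com does end that way, so this sketch would alter it; guard against that separately if it can occur.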
This sounds like a homework question, given that the solution must use a regex. On that assumption, here is an outline to get you going.
If all you have is a URL then @mathematical.coffee's solution will suit. However, if you have a chunk of text containing one or more URLs and you have to locate and change just those, then you'll need something a little more involved.
Look at the structure of a URL: {protocol}{address}{item}; where
{protocol} is "http://", "ftp://" etc.;
{address} is a name, e.g. "www.google.com", or a number, e.g. "74.125.237.116" - there will always be at least one dot in the address; and
{item} is "/name" where name is quite flexible - there will be zero or more items, you can think of them as directories and a file but this isn't strictly true. Also the sequence of items can end in a "/" (including when there are zero of them).
To make a regex which matches a URL start by matching each part. In the case of the items you'll want to match the last in the sequence separately - you'll have zero or more "directories" and one "file", the latter must be of the form "name.extension".
Once you have regexes for each part you just concatenate them to produce a regex for the whole. To form the replacement pattern you can surround parts of your regex with parentheses and refer to those parts using \number in the replacement string - see @mathematical.coffee's solution for an example.
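As a rough illustration of that outline, here is a Python sketch that builds one regex from per-part regexes and uses it to change URLs embedded in text. The part patterns are deliberately simplified assumptions (only http, https and ftp; host names made of letters, digits and hyphens; file names with a single dot), and all names are hypothetical.

```python
import re

PROTOCOL = r"(?:https?|ftp)://"
ADDRESS = r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"   # at least one dot
DIRECTORIES = r"(?:/[^/\s]+)*"                    # zero or more "directories"
FILE_NAME = r"/[^/\s.]+"                          # the final "file" name
EXTENSION = r"\.[A-Za-z]{3,5}"

# Concatenate the parts; group 1 is everything before the extension,
# group 2 is the extension. The lookahead stops matches mid-word.
URL_WITH_FILE = re.compile(
    "(" + PROTOCOL + ADDRESS + DIRECTORIES + FILE_NAME + ")"
    + "(" + EXTENSION + ")"
    + r"(?![A-Za-z0-9/])"
)


def add_suffix_to_urls(text: str, suffix: str = "D") -> str:
    """Insert `suffix` before the extension of every URL found in `text`."""
    return URL_WITH_FILE.sub(lambda m: m.group(1) + suffix + m.group(2), text)


text = "See http://i.myhost.com/myimage.jpg and ftp://files.example.org/a/b/photo.png."
print(add_suffix_to_urls(text))
```

Keeping each part in its own named string makes it easy to tighten one piece (for example, restricting EXTENSION to jpg) without touching the rest.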
The best way to learn regexes is to use an editor which supports them and just experiment. The exact syntax may not be the same as NSRegularExpression, but they are mostly pretty similar for the basic stuff and you can translate from one to another easily.