Ambiguous route matches on slugified URLs - asp.net-core

I'm getting an ambiguous match for the URI Toyota-Corolla-vehicles/2. I've isolated the problem to these two routes:
[HttpGet("{make}-vehicles/{makeId:int}")]
[HttpGet("{make}-{query}-vehicles/{makeId:int}")]
It looks pretty unambiguous to me. Shouldn't the uri match the route with two dashes in it?
For more context:
I'm using readable URLs like Toyota-vehicles-in-2005, so I can't use forward slashes for separation.
[HttpGet("{make}-vehicles-in-{year}/{makeId:int}")]
The docs state:
Complex segments (for example, [Route("/dog{token}cat")]), are
processed by matching up literals from right to left in a non-greedy
way. See the source code for a description. For more information, see
this issue.
https://github.com/aspnet/Routing/blob/9cea167cfac36cf034dbb780e3f783114ef94780/src/Microsoft.AspNetCore.Routing/Patterns/RoutePatternMatcher.cs#L296
https://github.com/aspnet/AspNetCore.Docs/issues/8197

I would suggest using these routes:
[HttpGet("{make}/vehicles/{makeId:int}")]
[HttpGet("{make}/{query}/vehicles/{makeId:int}")]
There is no ambiguity between Toyota/vehicles/2 and Toyota/Corolla/vehicles/2 in that case. With your current routes the ambiguity arises because {make} and {query} are unconstrained strings that may themselves contain dashes, so Toyota-Corolla-vehicles matches both {make}-vehicles and {make}-{query}-vehicles. It can be parsed either way:
{make}-vehicles, where {make} is the whole string Toyota-Corolla;
{make}-{query}-vehicles, where {make} is Toyota and {query} is Corolla.
So the problem is the - character. If you don't want to change your URLs, you can keep only [HttpGet("{make}-vehicles/{makeId:int}")] and then split Toyota-Corolla into its parts inside the action with the string.Split method.
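To see the ambiguity concretely, here is a minimal Python sketch. The two regular expressions are only rough stand-ins for the first segment of the route templates (they are not how ASP.NET Core matches routes), and split_make_query is a hypothetical helper illustrating the single-route-plus-split idea. Splitting on the first dash assumes the make itself never contains a dash.

```python
import re

# Rough regex stand-ins for the two route templates (assumption: these only
# approximate the templates; ASP.NET Core uses its own matcher).
ONE_DASH = re.compile(r"^(?P<make>.+)-vehicles$")
TWO_DASH = re.compile(r"^(?P<make>.+)-(?P<query>.+)-vehicles$")

segment = "Toyota-Corolla-vehicles"

# Both patterns accept the same segment, which is the ambiguity in a nutshell.
one = ONE_DASH.match(segment)
two = TWO_DASH.match(segment)
print(one.group("make"))                      # make swallows the whole prefix
print(two.group("make"), two.group("query"))  # prefix split into two values


def split_make_query(make_value):
    """Hypothetical helper: split the single {make} value on its first dash."""
    make, _, query = make_value.partition("-")
    return make, (query or None)


print(split_make_query("Toyota-Corolla"))
print(split_make_query("Toyota"))
```

With only one route registered, the action would receive Toyota-Corolla as {make} and do the equivalent of split_make_query itself.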

Related

URL-parameters input seems inconsistent

I have reviewed multiple tutorials on URL parameters, which all suggest two approaches:
Parameters can either follow forward slashes (/), or be specified by a parameter name followed by a parameter value. So either:
1) http://numbersapi.com/42
or
2) http://numbersapi.com/random?min=10&max=20
For the second one, I provide the parameter name and then the parameter value after the ?. I can also provide multiple parameters by separating them with an ampersand.
Now I have seen the request below, which works fine but does not fit the rules above:
http://numbersapi.com/42?json
I understand that the request sets 42 as a parameter, but why is the ? followed by just a value and not by a parameter name? Also, the ? seems to be used like an ampersand?
From Wikipedia:
Every HTTP URL conforms to the syntax of a generic URI. The URI generic syntax consists of a hierarchical sequence of five components:
URI = scheme:[//authority]path[?query][#fragment]
where the authority component divides into three subcomponents:
authority = [userinfo@]host[:port]
As you can see, the ? ends the path part of the URL and starts the query part.
The query part is usually a &-separated string of name=value pairs, but it doesn't have to be, so json is a valid value for the query part.
Or, as the Wikipedia article puts it:
An optional query component preceded by a question mark (?), containing a query string of non-hierarchical data. Its syntax is not well defined, but by convention is most often a sequence of attribute–value pairs separated by a delimiter.
It is also fairly common for request processors to treat a name=value pair that is missing the = sign as if it were name=.
E.g. if you're writing Servlet code and call servletRequest.getParameter("json"), it would return an empty string ("") for that last URL in the question.
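The same behaviour can be reproduced with Python's standard library, as a small illustration (the question itself is not about Python). split_url is a hypothetical helper name; urlsplit and parse_qs are from urllib.parse, and keep_blank_values=True is what makes a bare name such as json show up with an empty value instead of being dropped.

```python
from urllib.parse import urlsplit, parse_qs


def split_url(url):
    """Return (path, raw query string, parsed parameters) for a URL."""
    parts = urlsplit(url)
    # keep_blank_values=True keeps names that have no "=value" part.
    params = parse_qs(parts.query, keep_blank_values=True)
    return parts.path, parts.query, params


for url in ("http://numbersapi.com/random?min=10&max=20",
            "http://numbersapi.com/42?json"):
    print(split_url(url))
```

In the second URL the path is /42 and the whole query string is just json, which the parser reports as a parameter named json with an empty value.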

IP Address/Hostname match regex

I need to match an IP address/hostname with a regular expression.
For example, 20.20.20.20
should match with 20.20.20.20
should match with [http://20.20.20.20/abcd]
should not match with 20.20.20.200
should not match with [http://20.20.20.200/abcd]
should not match with [http://120.20.20.20/abcd]
should match with AB_20.20.20.20
should match with 20.20.20.20_AB
At present I am using a regular expression like this: "(.*[^(\w)]|^)20.20.20.20([^(\w)].*|$)"
But it does not work for the last two cases, because "\w" is equivalent to [a-zA-Z0-9_], and I also want the underscore "_" to count as a separator. I tried different combinations without success. Please help me with this regular expression.
(.*[_]|[^(\w)]|^)10.10.10.10([_]|[^(\w)].*|$)
I spent some more time on this. This regular expression seems to work.
I don't know which language you're using, but with Perl-like regular expressions you could use the following, shorter expression:
(?:\b|\D)20\.20\.20\.20(?:\b|\D)
This effectively says:
Match word boundary (\b, here: the start of the word) or a non-digit (\D).
Match IP address.
Match word boundary (\b, here: the end of the word) or a non-digit (\D).
Note 1: ?: makes the group (\b|\D) non-capturing, i.e. it does not store what it matched for later use as a backreference. You probably don't need the word boundaries/non-digits to be stored. If you actually need them stored, just remove the two ?:s.
Note 2: This might be nit-picking, but you need to escape the dots in the IP address part of the regular expression, otherwise you'd also match any other character at those positions. Using 20.20.20.20 instead of 20\.20\.20\.20, you might for example match a line carrying a timestamp when you're searching through a log file...
2012-07-18 20:20:20,20 INFO Application startup successful, IP=20.20.20.200
...even though you're looking for IP addresses and that particular one (20.20.20.200) explicitly shouldn't match, according to your question. Admittedly though, this example is quite an edge case.
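As a quick check, the expression can be run against the cases from the question. This is a sketch in Python, whose re module handles \b and \D the same way for these inputs; the names ESCAPED, UNESCAPED, CASES and LOG_LINE are just labels chosen for this example.

```python
import re

ESCAPED = re.compile(r"(?:\b|\D)20\.20\.20\.20(?:\b|\D)")
UNESCAPED = re.compile(r"(?:\b|\D)20.20.20.20(?:\b|\D)")  # dots match anything

# Inputs from the question, mapped to whether they should match.
CASES = {
    "20.20.20.20": True,
    "http://20.20.20.20/abcd": True,
    "20.20.20.200": False,
    "http://20.20.20.200/abcd": False,
    "http://120.20.20.20/abcd": False,
    "AB_20.20.20.20": True,
    "20.20.20.20_AB": True,
}

LOG_LINE = ("2012-07-18 20:20:20,20 INFO Application startup successful, "
            "IP=20.20.20.200")


def matches(pattern, text):
    """True if the pattern is found anywhere in the text."""
    return pattern.search(text) is not None


for text, expected in CASES.items():
    print(f"{text!r}: expected={expected}, got={matches(ESCAPED, text)}")

# The unescaped variant is fooled by the timestamp; the escaped one is not.
print(matches(UNESCAPED, LOG_LINE), matches(ESCAPED, LOG_LINE))
```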

Is it possible to ignore characters in a string when matching with a regular expression

I'd like to create a regular expression such that when I compare a string against an array of strings, matches are returned with the regex ignoring certain characters.
Here's one example. Consider the following array of names:
{
"Andy O'Brien",
"Bob O'Brian",
"Jim OBrien",
"Larry Oberlin"
}
If a user enters "ob", I'd like the app to apply a regex predicate to the array, and all of the names in the above array would match (i.e. the ' is ignored).
I know I can run the match twice, first against each name and second against each name with the ignored chars stripped from the string. I'd rather this be done by a single regex so I don't need two passes.
Is this possible? This is for an iOS app and I'm using NSPredicate.
EDIT: clarification on use
From the initial answers I realized I wasn't clear. The example above is a specific one. I need a general solution where the array of names is a large array with diverse names and the string I am matching against is entered by the user. So I can't hard code the regex like [o]'?[b].
Also, I know how to do case-insensitive searches so don't need the answer to focus on that. Just need a solution to ignore the chars I don't want to match against.
Since you have discarded all the answers showing the ways it can be done, you are left with the answer:
NO, this cannot be done. Regex does not have an option to 'ignore' characters. Your only options are to modify the regex to match them, or to do a pass on your source text to get rid of the characters you want to ignore and then match against that. (Of course, then you may have the problem of correlating your 'cleaned' text with the actual source text.)
If I understand correctly, you want a way to match the characters "ob" 1) regardless of capitalization, and 2) regardless of whether there is an apostrophe in between them. That should be easy enough.
1) Use a case-insensitivity modifier, or use a regexp that specifies that the capital and lowercase version of the letter are both acceptable: [Oo][Bb]
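2) Use the ? modifier to indicate that a character may be present either one or zero times. o'?b will match both "o'b" and "ob". If you want to include other characters that may or may not be present, you can group them with the apostrophe in a character class. For example, o['~-]?b will match "ob", "o'b", "o-b", and "o~b". (Put the - last in the class; written as ['-~] it would be read as a range from ' to ~ and would match almost any printable character.)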
So the complete answer would be [Oo]'?[Bb].
Update: The OP asked for a solution that would cause the given character to be ignored in an arbitrary search string. You can do this by inserting '? after every character of the search string. For example, if you were given the search string oleary, you'd transform it into o'?l'?e'?a'?r'?y'?. Foolproof, though probably not optimal for performance. Note that this would match "o'leary" but also "o'lea'r'y'" if that's a concern.
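The transformation described in the update can be sketched as follows. This is Python rather than NSPredicate, purely to illustrate the string manipulation; ignoring_pattern is a hypothetical helper, and it uses a character class such as [']? instead of a bare '? so that more than one ignored character can be supplied. Each character of the user's input is escaped so that regex metacharacters in the search string are matched literally.

```python
import re


def ignoring_pattern(search, ignored="'"):
    """Build a regex for `search` that tolerates one ignored character
    after each character of the search string."""
    optional = "[" + re.escape(ignored) + "]?"
    return "".join(re.escape(ch) + optional for ch in search)


names = ["Andy O'Brien", "Bob O'Brian", "Jim OBrien", "Larry Oberlin"]

pattern = ignoring_pattern("ob")
matching = [n for n in names if re.search(pattern, n, re.IGNORECASE)]

print(pattern)
print(matching)
```

The generated pattern string could then be handed to whatever regex engine the app uses, provided its syntax for character classes and ? is the same.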
In this particular case, just throw the set of characters into the middle of the regex as optional. This works specifically because you have only two characters in your match string, otherwise the regex might get a bit verbose. For example, match case-insensitive against:
o[']*b
You can add more characters to the character class in the middle to ignore them too. Note that the * allows any number of the ignored characters (so O'''Brien will match); to allow at most one, change it to ?:
o[']?b
You can make particular characters optional with a question mark, which means that it will match whether they're there or not, e.g:
/o\'?b/
This would match all of the above. Add .+ to either side to match all other characters, and a space to denote the start of the surname:
/.+? o\'?b.+/
And use the case-insensitivity modifier to make it match regardless of capitalisation.

Change Url using Regex

I have url, for example:
http://i.myhost.com/myimage.jpg
I want to change this url to
http://i.myhost.com/myimageD.jpg.
(Add D after image name and before point)
I.e. I want to add some characters after the image name and before the dot using a regex.
What is the best way to do it using a regex?
Try using ^(.*)\.([a-zA-Z]{3,5})$ and replacing with \1D.\2 (the pattern consumes the dot, so the replacement has to put it back). I'm assuming the extension is 3-5 letters, but you can modify it to suit. E.g. if it's just jpg images then you can put jpg in place of the [a-zA-Z]{3,5}.
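A minimal sketch of this substitution in Python (the original question targets NSRegularExpression, so treat this only as an illustration of the pattern). add_suffix is a hypothetical helper; the pattern is anchored with $ here, and the replacement re-inserts the dot that the pattern consumes between the two groups.

```python
import re

# Group 1: everything up to the last dot; group 2: a 3-5 letter extension.
EXTENSION_SPLIT = re.compile(r"^(.*)\.([a-zA-Z]{3,5})$")


def add_suffix(url, suffix="D"):
    """Insert `suffix` between the file name and its extension."""
    return EXTENSION_SPLIT.sub(
        lambda m: m.group(1) + suffix + "." + m.group(2), url)


print(add_suffix("http://i.myhost.com/myimage.jpg"))
print(add_suffix("http://i.myhost.com/"))  # no extension, left unchanged
```

Note that a bare host such as http://i.myhost.com (no path) would have its .com treated as an extension, so this only suits input that is known to end in a file name.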
This sounds like a homework question, given that the solution must use a regex. On that assumption, here is an outline to get you going.
If all you have is a URL then @mathematical.coffee's solution will suit. However, if you have a chunk of text containing one or more URLs and you have to locate and change just those, then you'll need something a little more involved.
Look at the structure of a URL: {protocol}{address}{item}; where
{protocol} is "http://", "ftp://" etc.;
{address} is a name, e.g. "www.google.com", or a number, e.g. "74.125.237.116" - there will always be at least one dot in the address; and
{item} is "/name" where name is quite flexible - there will be zero or more items, you can think of them as directories and a file but this isn't strictly true. Also the sequence of items can end in a "/" (including when there are zero of them).
To make a regex which matches a URL start by matching each part. In the case of the items you'll want to match the last in the sequence separately - you'll have zero or more "directories" and one "file", the latter must be of the form "name.extension".
Once you have regexes for each part you just concatenate them to produce a regex for the whole. To form the replacement pattern you can surround parts of your regex with parentheses and refer to those parts using \number in the replacement string - see @mathematical.coffee's solution for an example.
The best way to learn regexes is to use an editor which supports them and just experiment. The exact syntax may not be the same as NSRegularExpression, but they are mostly pretty similar for the basic stuff and you can translate from one to another easily.
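One possible way to fill in the outline above, sketched in Python. Every sub-pattern here is an assumption chosen for illustration (for instance, names are limited to letters, digits, underscores and dashes, which real URLs are not), and add_suffix_in_text is a hypothetical helper. The point is only to show the parts being written separately and then concatenated.

```python
import re

# Each part of the URL gets its own (deliberately simplistic) pattern.
PROTOCOL = r"(?:https?|ftp)://"
ADDRESS = r"[\w-]+(?:\.[\w-]+)+"   # name or number with at least one dot
DIRECTORIES = r"(?:/[\w-]+)*"      # zero or more "directory" items
FILE_NAME = r"/[\w-]+"             # the last item, before its extension
EXTENSION = r"\.[A-Za-z]{3,5}\b"

# Group 1: everything up to the file name; group 2: the extension.
URL_WITH_FILE = re.compile(
    "(" + PROTOCOL + ADDRESS + DIRECTORIES + FILE_NAME + ")(" + EXTENSION + ")"
)


def add_suffix_in_text(text, suffix="D"):
    """Insert `suffix` before the extension of every file URL in `text`."""
    return URL_WITH_FILE.sub(
        lambda m: m.group(1) + suffix + m.group(2), text)


sample = ("See http://i.myhost.com/myimage.jpg and "
          "http://www.example.com/a/b/pic.png here.")
print(add_suffix_in_text(sample))
print(add_suffix_in_text("Visit http://www.google.com/ now"))  # unchanged
```

Because FILE_NAME requires a slash followed by a name, a bare host such as http://www.google.com/ is left alone rather than having .com mistaken for a file extension.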

Find a url for a file in html using a regular expression

I've set myself a somewhat ambitious first task in learning regular expressions (and one which relates to a problem I'm trying to solve). I need to find any instance of a url that ends in .m4v, in a big html string.
My first attempt was this for jpg files
http.*jpg
This seems correct at first glance, but it returns stuff like this:
http://domain.com/page.html" title="Misc"><img src="http://domain.com/image.jpg
Which does match the expression in theory. So really, I need to put something in http.*m4v that says 'only the closest instance between http and m4v'. Any ideas?
As you've noticed, an expression such as the following is greedy:
http:.*\.jpg
That means it reads as much input as possible while satisfying the expression.
It's the "*" operator that makes it greedy. There's a well-defined regex technique for making it non-greedy: use the "?" modifier after the "*".
http:.*?\.jpg
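Now it will match as little as possible while still satisfying the expression (i.e. it will stop searching at the first occurrence of ".jpg"). Note that a non-greedy quantifier only shortens the right-hand end of the match; the match still begins at the first "http:" in the input.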
Of course, if you have a .jpg in the middle of a URL, like:
http://mydomain.com/some.jpg-folder/foo.jpg
It will not match the full URL.
You'll want to define the end of the URL as something that can't be considered part of the URL, such as a space, a new line, or (if the URL is nested inside parentheses) a closing parenthesis. However, this can't be solved with just one little regex if the URL is embedded in written language, since URLs there are often ambiguous.
Take for example:
At this page, http://mysite.com/puppy.html, there's a cute little puppy dog.
The comma could technically be a part of a URL. You have to deal with a lot of ambiguities like this when looking for URLs in written text, and it's hard not to have bugs due to the ambiguities.
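The difference between the greedy and non-greedy forms, and the effect of restricting which characters may appear inside a URL, can be seen in a short Python sketch. The HTML snippet is made up for this example, and the excluded set (whitespace, double quote, angle brackets) is an assumption that suits HTML attributes, not a general definition of where a URL ends.

```python
import re

html = ('<a href="http://domain.com/page.html" title="Misc">'
        '<img src="http://domain.com/image.jpg"></a>'
        '<img src="http://domain.com/other.jpg">')

# Greedy: runs from the first "http:" to the LAST ".jpg".
greedy = re.findall(r'http:.*\.jpg', html)

# Non-greedy: stops at the first ".jpg", but still starts at the first
# "http:", so the page.html link is dragged into the first match.
lazy = re.findall(r'http:.*?\.jpg', html)

# Bounded: the URL body may not contain whitespace, quotes or angle brackets,
# so a match cannot run across attribute boundaries.
bounded = re.findall(r'http:[^\s"<>]*\.jpg', html)

print(len(greedy), greedy)
print(len(lazy), lazy)
print(bounded)
```

Only the bounded variant returns the two image URLs on their own; for an .m4v search the extension in the pattern would be changed accordingly.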
EDIT | Here's an example of a regex in PHP that is a quick and dirty solution, being greedy only where needed and trying to deal with the English language:
<?php
$str = "Checkout http://www.foo.com/test?items=bat,ball, for info about bats and balls";
preg_match('/https?:\/\/([a-zA-Z0-9][a-zA-Z0-9-]*)(\.[a-zA-Z0-9-]+)*((\/[^\s]*)(?=[\s\.,;!\?]))\b/i', $str, $matches);
var_dump($matches);
It outputs:
array(5) {
[0]=>
string(38) "http://www.foo.com/test?items=bat,ball"
[1]=>
string(3) "www"
[2]=>
string(4) ".com"
[3]=>
string(20) "/test?items=bat,ball"
[4]=>
string(20) "/test?items=bat,ball"
}
The explanation is in the comments.
Perl, Ruby, PHP and JavaScript should all work with this:
/(http:\/\/(?:(?:(?!http:\/\/).))+\.jpg)/
The URLs will be stored in the matched groups. Tested this out against "http://a.com/b.jpg-folder/c.jpg http://mydomain.com/some.jpg-folder/foo.jpg" and it worked correctly without being too greedy.
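For reference, the same idea ported to Python as a sketch. The lookahead is written as (?!http://) with no backslash before the h, since \h is a character-class escape in some regex flavours, and the outer capturing group is dropped so that findall returns whole matches. TEMPERED is just a name chosen for this example.

```python
import re

# Each "." is only allowed where a new "http://" does NOT start, so a match
# can never run from one URL into the next.
TEMPERED = re.compile(r"http://(?:(?!http://).)+\.jpg")

sample = ("http://a.com/b.jpg-folder/c.jpg "
          "http://mydomain.com/some.jpg-folder/foo.jpg")

found = TEMPERED.findall(sample)
print(found)
```

The quantifier is still greedy within a single URL, which is why a ".jpg" in the middle of a path does not cut the match short.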