Excluding prefixes and suffixes from Faker("name") - factory-boy

I've noticed that Faker("name") sometimes returns prefixes (e.g., Mr.) and suffixes (e.g., MD). Is there a good way to exclude prefixes and suffixes from the generated names?
Thanks

You could very well just concatenate faker.first_name() and faker.last_name().

How to convert string to URL with arq/tarql?

I have a TSV file that I'm converting with tarql.
Column prop has strings like dc:source, skos:broader etc. How can I convert these to the corresponding URLs? Assume I have all needed prefixes defined in the tarql query.
I can do this statically, e.g. uri(concat(str(dc:),"source")), but how can I do it dynamically? The problem can be narrowed to this: given a prefix dc:, how do I expand it to the appropriate URL?
I looked at the ARQ functions but didn't find anything appropriate. If there's no other solution, I can use a VALUES table that repeats the prefixes and namespaces, but what an ugly solution...
The tarql:expandPrefixedName(?qname) function (completely coincidentally committed just today) does exactly what you need: It expands a prefixed name to a full IRI, using any prefixes declared in the query.
The tarql namespace is declared implicitly in every Tarql query.
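To make the expansion concrete, here is a minimal Python sketch of the same idea. It is not tarql code and not how tarql implements the function; the PREFIXES table and the expand_prefixed_name helper are hypothetical stand-ins for the prefixes a query would declare.

```python
# Hypothetical prefix table; in tarql these would come from the
# PREFIX declarations of the query itself.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

def expand_prefixed_name(qname, prefixes=PREFIXES):
    """Expand 'prefix:local' to a full IRI; return None for unknown prefixes."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        return None
    return prefixes[prefix] + local

print(expand_prefixed_name("dc:source"))     # http://purl.org/dc/elements/1.1/source
print(expand_prefixed_name("skos:broader"))  # http://www.w3.org/2004/02/skos/core#broader
```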

What is the name of this naming convention? "this-is-an-identifier"

Parts of a composed name are separated by "-" and all the letters are lowercase. I know that thisIsAIdentifier is camel case, but I don't know the name of this one ;)
This-is-a-hyphenated-identifier

Is it possible to ignore characters in a string when matching with a regular expression

I'd like to create a regular expression such that when I compare a string against an array of strings, matches are returned with the regex ignoring certain characters.
Here's one example. Consider the following array of names:
{
"Andy O'Brien",
"Bob O'Brian",
"Jim OBrien",
"Larry Oberlin"
}
If a user enters "ob", I'd like the app to apply a regex predicate to the array and all of the names in the above array would match (e.g. the ' is ignored).
I know I can run the match twice, first against each name and second against each name with the ignored chars stripped from the string. I'd rather this be done by a single regex so I don't need two passes.
Is this possible? This is for an iOS app and I'm using NSPredicate.
EDIT: clarification on use
From the initial answers I realized I wasn't clear. The example above is a specific one. I need a general solution where the array of names is a large array with diverse names and the string I am matching against is entered by the user. So I can't hard code the regex like [o]'?[b].
Also, I know how to do case-insensitive searches so don't need the answer to focus on that. Just need a solution to ignore the chars I don't want to match against.
Since you have discarded all the answers showing the ways it can be done, you are left with the answer:
NO, this cannot be done. Regex does not have an option to 'ignore' characters. Your only options are to modify the regex to match them, or to do a pass on your source text to get rid of the characters you want to ignore and then match against that. (Of course, then you may have the problem of correlating your 'cleaned' text with the actual source text.)
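For illustration, the "clean the text, then match" option can be sketched in Python as below. This is only a sketch under assumptions: it is Python rather than NSPredicate, and IGNORED and matches_ignoring are hypothetical names. Returning the original strings rather than the cleaned ones sidesteps the correlation problem for simple filtering.

```python
NAMES = ["Andy O'Brien", "Bob O'Brian", "Jim OBrien", "Larry Oberlin"]
IGNORED = "'"  # characters to ignore (an assumption for this example)

def matches_ignoring(names, query, ignored=IGNORED):
    """Return the original names whose cleaned form contains the cleaned query."""
    table = str.maketrans("", "", ignored)  # deletes every ignored character
    needle = query.translate(table).lower()
    return [name for name in names if needle in name.translate(table).lower()]

print(matches_ignoring(NAMES, "ob"))     # all four names
print(matches_ignoring(NAMES, "obrie"))  # ["Andy O'Brien", 'Jim OBrien']
```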
If I understand correctly, you want a way to match the characters "ob" 1) regardless of capitalization, and 2) regardless of whether there is an apostrophe in between them. That should be easy enough.
1) Use a case-insensitivity modifier, or use a regexp that specifies that the capital and lowercase version of the letter are both acceptable: [Oo][Bb]
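2) Use the ? modifier to indicate that a character may be present either one or zero times. o'?b will match both "o'b" and "ob". If you want to include other characters that may or may not be present, you can group them with the apostrophe in a character class. For example, o['~-]?b will match "ob", "o'b", "o-b", and "o~b". (Put the hyphen last in the class; written as ['-~] it would be read as a range from ' to ~ and match almost any printable character.)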
So the complete answer would be [Oo]'?[Bb].
Update: The OP asked for a solution that would cause the given character to be ignored in an arbitrary search string. You can do this by inserting '? after every character of the search string. For example, if you were given the search string oleary, you'd transform it into o'?l'?e'?a'?r'?y'?. Foolproof, though probably not optimal for performance. Note that this would match "o'leary" but also "o'lea'r'y'" if that's a concern.
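The transformation described in the update can be sketched in Python as follows. The helper name ignoring_pattern is hypothetical, and the sketch uses Python's re module rather than NSPredicate, so the exact regex dialect is an assumption. Each character of the search string is escaped before the optional class is appended, so user input containing regex metacharacters stays literal.

```python
import re

def ignoring_pattern(query, ignored="'"):
    """Build a case-insensitive regex that allows an ignored character after each query character."""
    skip = "[" + re.escape(ignored) + "]?"  # e.g. [']? for the apostrophe
    body = "".join(re.escape(ch) + skip for ch in query)
    return re.compile(body, re.IGNORECASE)

names = ["Andy O'Brien", "Bob O'Brian", "Jim OBrien", "Larry Oberlin"]
pattern = ignoring_pattern("ob")
print([n for n in names if pattern.search(n)])  # all four names
print(bool(ignoring_pattern("oleary").search("Pat O'Leary")))  # True
```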
In this particular case, just throw the set of characters into the middle of the regex as optional. This works specifically because you have only two characters in your match string, otherwise the regex might get a bit verbose. For example, match case-insensitive against:
o[']*b
You can add more characters to that character class in the middle to ignore them. Note that the * matches any number of the ignored characters (so O'''Brien will match) - for a single instance, change to ?:
o[']?b
You can make particular characters optional with a question mark, which means that it will match whether they're there or not, e.g:
/o\'?b/
This would match all of the above. Add .+ to either side to match all the other characters, and a space to denote the start of the surname:
/.+? o\'?b.+/
And use the case-insensitivity modifier to make it match regardless of capitalisation.

Change Url using Regex

I have a URL, for example:
http://i.myhost.com/myimage.jpg
I want to change this url to
http://i.myhost.com/myimageD.jpg.
(Add D after the image name and before the dot)
i.e., I want to add some text after the image name and before the dot using a regex.
What is the best way to do it using a regex?
Try using ^(.*)\.([a-zA-Z]{3,5}) and replacing with \1D.\2 (the dot is matched outside both groups, so the replacement has to put it back). I'm assuming the extension is 3-5 letters, but you can modify it to suit. E.g. if it's just jpg images then you can put jpg in place of [a-zA-Z]{3,5}.
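A minimal Python sketch of this substitution, for a string that holds a single URL. The function name is hypothetical, and the trailing $ anchor is an addition not in the pattern above; it makes the extension match only at the end of the string. The replacement keeps the dot between the name and the extension.

```python
import re

def add_suffix_before_extension(url, suffix="D"):
    """Insert suffix between the file name and its extension."""
    pattern = r"^(.*)\.([a-zA-Z]{3,5})$"
    # A function replacement avoids any backslash handling in the suffix.
    return re.sub(pattern, lambda m: m.group(1) + suffix + "." + m.group(2), url)

print(add_suffix_before_extension("http://i.myhost.com/myimage.jpg"))
# http://i.myhost.com/myimageD.jpg
```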
This sounds like a homework question, given that the solution must use a regex. On that assumption, here is an outline to get you going.
If all you have is a URL then @mathematical.coffee's solution will suit. However, if you have a chunk of text containing one or more URLs and you have to locate and change just those, then you'll need something a little more involved.
Look at the structure of a URL: {protocol}{address}{item}; where
{protocol} is "http://", "ftp://" etc.;
{address} is a name, e.g. "www.google.com", or a number, e.g. "74.125.237.116" - there will always be at least one dot in the address; and
{item} is "/name" where name is quite flexible - there will be zero or more items, you can think of them as directories and a file but this isn't strictly true. Also the sequence of items can end in a "/" (including when there are zero of them).
To make a regex which matches a URL start by matching each part. In the case of the items you'll want to match the last in the sequence separately - you'll have zero or more "directories" and one "file", the latter must be of the form "name.extension".
Once you have regexes for each part you just concatenate them to produce a regex for the whole. To form the replacement pattern you can surround parts of your regex with parentheses and refer to those parts using \number in the replacement string - see @mathematical.coffee's solution for an example.
The best way to learn regexes is to use an editor which supports them and just experiment. The exact syntax may not be the same as NSRegularExpression but they are mostly pretty similar for the basic stuff and you can translate from one to another easily.
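The outline above could be sketched in Python roughly as follows. The part patterns are deliberately simplified assumptions (only http, https, and ftp; no ports, query strings, or fragments), and all names are hypothetical; NSRegularExpression syntax may differ in details.

```python
import re

# One regex per part of the URL, following the outline above.
PROTOCOL = r"(?:https?|ftp)://"
ADDRESS = r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"  # at least one dot
DIRECTORIES = r"(?:/[^/\s]+)*"                   # zero or more "directories"
FILE_NAME = r"/[^/\s]+"                          # the last item, without extension
EXTENSION = r"[A-Za-z]{3,5}"

# Group 1 is everything up to the file name, group 2 is the extension.
URL_WITH_FILE = re.compile(
    "(" + PROTOCOL + ADDRESS + DIRECTORIES + FILE_NAME + r")\.(" + EXTENSION + r")\b"
)

def add_suffix_to_urls(text, suffix="D"):
    """Insert suffix before the extension of every URL found in text."""
    return URL_WITH_FILE.sub(lambda m: m.group(1) + suffix + "." + m.group(2), text)

text = "See http://i.myhost.com/myimage.jpg and ftp://example.org/a/b/photo.png."
print(add_suffix_to_urls(text))
# See http://i.myhost.com/myimageD.jpg and ftp://example.org/a/b/photoD.png.
```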

How can I check for a certain suffix in my string?

I have a list of strings, and I want to check every string in it. Sometimes a string has the suffix _anim(X), where X is an integer. If a string has that kind of suffix, I need to find all the other strings that have the same "base" (the base being the part without the suffix), group them, and send them to my function.
So, given the next list:
Man_anim(1)
Woman
Man_anim(3)
Man_anim(2)
My code would discover that the base Man has the special suffix, and would then generate a new list grouping all Man objects, ordered by the value inside the parentheses. The code is supposed to return
Man_anim(1)
Man_anim(2)
Man_anim(3)
and send that list to my function for further processing.
My problem is, how can I check for the existence of such a suffix, and afterwards, check the value inside the parentheses?
If you know that the suffix is going to be _anim(X) every time (obviously, with X varying) then you can use a regular expression:
Regex.IsMatch(value, @"_anim\(\d+\)$")
If the suffix isn't at least moderately consistent, then you'll have to look into data structures, like Suffix Trees, which you can use to determine common structures in strings.
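The regular-expression approach also covers the grouping and ordering the question asks for, if the base and the number are captured. A minimal sketch in Python rather than C#, with hypothetical names (ANIM_SUFFIX, group_animations), assuming the suffix always has the form _anim(X):

```python
import re
from collections import defaultdict

# Captures the base name and the integer inside the parentheses.
ANIM_SUFFIX = re.compile(r"^(?P<base>.+)_anim\((?P<index>\d+)\)$")

def group_animations(names):
    """Group names by base and order each group by the number in the suffix."""
    groups = defaultdict(list)
    for name in names:
        match = ANIM_SUFFIX.match(name)
        if match:  # names without the suffix are skipped
            groups[match.group("base")].append((int(match.group("index")), name))
    return {base: [name for _, name in sorted(items)] for base, items in groups.items()}

print(group_animations(["Man_anim(1)", "Woman", "Man_anim(3)", "Man_anim(2)"]))
# {'Man': ['Man_anim(1)', 'Man_anim(2)', 'Man_anim(3)']}
```

Sorting on the integer value rather than the string keeps Man_anim(10) after Man_anim(2).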