I'm trying to work out how to check, in Snowflake SQL, whether a product code begins with three letters.
Suggestions?
I tried LEFT(P0.PRODUCTCODE,3) NOT LIKE '[a-zA-Z]%', but it didn't work.
Thanks folks
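LIKE only understands the % and _ wildcards, not character classes such as [a-zA-Z], which is why your attempt didn't work. You can use REGEXP_LIKE instead; it returns a boolean indicating whether your string matched the pattern you're interested in.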
In your case, something like REGEXP_LIKE(string_field_here, '[a-zA-Z]{3}.*')
Breaking down the regular expression pattern:
[a-zA-Z]: Only match letter characters, both upper and lowercase
{3}: Require three of those letters
.*: Allow any number of any characters after those three letters
Note: in many cases, you would need to specifically indicate the beginning/ending of the string in the pattern, but Snowflake's implementation handles that for you. From the docs:
The function implicitly anchors a pattern at both ends (i.e. ''
automatically becomes '^$', and 'ABC' automatically becomes '^ABC$').
To match any string starting with ABC, the pattern would be 'ABC.*'.
You can try running these examples:
SELECT REGEXP_LIKE('abc', '[a-zA-Z]{3}.*') AS _abc,
REGEXP_LIKE('123', '[a-zA-Z]{3}.*') AS _123,
REGEXP_LIKE('abc123', '[a-zA-Z]{3}.*') AS _abc123,
REGEXP_LIKE('123abc', '[a-zA-Z]{3}.*') AS _123abc
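For readers who want to try the pattern outside Snowflake, here is a minimal Python sketch of the same check. It assumes that re.fullmatch is a fair stand-in for the implicit anchoring described in the quoted docs; the function name starts_with_three_letters is made up for illustration.

```python
import re

# Pattern from the answer above: three letters, then any characters.
PATTERN = r"[a-zA-Z]{3}.*"

def starts_with_three_letters(code: str) -> bool:
    # re.fullmatch requires the whole string to match, which mimics the
    # implicit ^...$ anchoring that the Snowflake docs describe.
    return re.fullmatch(PATTERN, code) is not None

# Same four samples as the SELECT above.
for sample in ("abc", "123", "abc123", "123abc"):
    print(sample, starts_with_three_letters(sample))
```

Only 'abc' and 'abc123' should come back as matches, mirroring the _abc and _abc123 columns of the query.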
I am working on some legacy code at the moment and have come across the following:
FooString = String.Format("{0:####0.000000}", FooDouble)
My question is: is the format string here, ####0.000000, any different from simply 0.000000?
I'm trying to generalize the return type of the function that sets FooDouble, and I want to make sure I don't break existing functionality, so I'm trying to work out what the # characters add here.
I've run a couple tests in a toy program and couldn't see how the result was any different but maybe there's something I'm missing?
From MSDN
The "#" custom format specifier serves as a digit-placeholder symbol.
If the value that is being formatted has a digit in the position where
the "#" symbol appears in the format string, that digit is copied to
the result string. Otherwise, nothing is stored in that position in
the result string.
Note that this specifier never displays a zero that
is not a significant digit, even if zero is the only digit in the
string. It will display zero only if it is a significant digit in the
number that is being displayed.
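Because your format has a 0 placeholder immediately before the decimal separator, at least one integer digit is always written, and the # placeholders in front of it never add anything on their own. Both formats should therefore return the same result.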
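To make the reasoning concrete, here is a small Python sketch that models only the integer part of such a format string. It is a simplified model of the behaviour quoted from MSDN, not the .NET implementation; the function format_integer_part is hypothetical and handles only non-negative integers and patterns made of '#' and '0'.

```python
def format_integer_part(value: int, pattern: str) -> str:
    """Model the integer part of a custom numeric format ('#' and '0' only)."""
    if value < 0 or set(pattern) - {"#", "0"}:
        raise ValueError("this sketch only handles non-negative values and #/0 patterns")
    # Digits are always written from the leftmost '0' placeholder onwards;
    # a '#' to the left of it never forces a digit to appear.
    min_digits = len(pattern) - pattern.index("0") if "0" in pattern else 0
    # A lone zero is not a significant digit, so start from an empty string.
    digits = str(value) if value != 0 else ""
    return digits.zfill(min_digits)

print(repr(format_integer_part(0, "####0")))   # same as pattern "0"
print(repr(format_integer_part(0, "0")))
print(repr(format_integer_part(0, "####")))    # differs: no '0' placeholder
```

Under this model the two patterns from the question behave identically, and a difference only shows up when the pattern contains no 0 placeholder at all.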
Could you provide a regex that matches Twitter usernames?
Extra bonus if a Python example is provided.
(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z]+[A-Za-z0-9-_]+)
I've used this as it disregards emails.
Here is a sample tweet:
@Hello how are @you doing @my_friend, email @000 me @ whats.up@example.com @shahmirj
Matches:
@Hello
@you
@my_friend
@shahmirj
It will also work for hashtags; I use the same expression with the @ changed to #.
If you're talking about the @username thing they use on Twitter, then you can use this:
import re
twitter_username_re = re.compile(r'@([A-Za-z0-9_]+)')
To make every instance an HTML link, you could do something like this:
my_html_str = twitter_username_re.sub(lambda m: '<a href="https://twitter.com/%s">%s</a>' % (m.group(1), m.group(0)), my_tweet)
The regex I use, which has been tested in multiple contexts:
/(^|[^@\w])@(\w{1,15})\b/
This is the cleanest way I've found to test and replace Twitter usernames in strings.
#!/usr/bin/python
import re
text = "@RayFranco is answering to @jjconti, this is a real '@username83' but this is an@email.com, and this is a @probablyfaketwitterusername"
ftext = re.sub(r'(^|[^@\w])@(\w{1,15})\b', r'\1\2', text)
print(ftext)
This returns, as expected:
RayFranco is answering to jjconti, this is a real 'username83' but this is an@email.com, and this is a @probablyfaketwitterusername
Based on Twitter specs :
Your username cannot be longer than 15 characters. Your real name can be longer (20 characters), but usernames are kept shorter for the sake of ease.
A username can only contain alphanumeric characters (letters A-Z, numbers 0-9) with the exception of underscores, as noted above. Check to make sure your desired username doesn't contain any symbols, dashes, or spaces.
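The two quoted rules (at most 15 characters; only letters, digits and underscores) can be checked directly. The following is a minimal sketch that validates a bare username without the leading @ sign; the names USERNAME_RE and is_valid_username are made up for illustration, and the sketch deliberately uses an explicit character class rather than \w, since \w in Python 3 also accepts non-ASCII letters.

```python
import re

# 1 to 15 characters, each an ASCII letter, a digit, or an underscore.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{1,15}")

def is_valid_username(name: str) -> bool:
    # fullmatch makes sure nothing before or after the username is ignored.
    return USERNAME_RE.fullmatch(name) is not None

for candidate in ("jack", "my_name_2", "my-name", "a" * 16):
    print(candidate, is_valid_username(candidate))
```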
Twitter recently open-sourced implementations, in various languages including Java, Ruby (gem) and JavaScript, of the code they use for finding usernames, hashtags, lists and URLs.
It is heavily based on regular expressions.
The only characters accepted in the form are A-Z, 0-9, and underscore. Usernames are not case-sensitive, though, so you could use r'(?i)@[a-z0-9_]+' to match everything correctly and also discern between users.
This is a method I have used in a project. It takes the text attribute of a tweet object and returns the text with both the hashtags and user_mentions linked to their appropriate pages on Twitter, complying with the most recent Twitter display guidelines.
def link_tweet(tweet):
    """
    This method takes the text attribute from a tweet object and returns it with
    user_mentions and hashtags linked
    """
    tweet = re.sub(r'(\A|\s)@(\w+)', r'\1@\2', str(tweet))
    return re.sub(r'(\A|\s)#(\w+)', r'\1#\2', str(tweet))
To use this method, pass in my_tweet[x].text as the parameter. Hope this is helpful.
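The replacement strings in the snippet above no longer show any link markup, so here is a self-contained sketch of what such a linking function could look like. The two URL patterns (twitter.com/NAME for mentions and twitter.com/hashtag/NAME for hashtags) are assumptions made for illustration, not necessarily the ones the original author used, and mentions are assumed to start with @.

```python
import re

def link_tweet(text: str) -> str:
    """Wrap @mentions and #hashtags in HTML links (URL patterns are assumed)."""

    def mention(m: re.Match) -> str:
        # group(1) is the leading whitespace (or empty at start of text).
        return f'{m.group(1)}<a href="https://twitter.com/{m.group(2)}">@{m.group(2)}</a>'

    def hashtag(m: re.Match) -> str:
        return f'{m.group(1)}<a href="https://twitter.com/hashtag/{m.group(2)}">#{m.group(2)}</a>'

    # Only link tokens at the start of the text or after whitespace,
    # so something like an e-mail address is left alone.
    text = re.sub(r"(\A|\s)@(\w+)", mention, text)
    return re.sub(r"(\A|\s)#(\w+)", hashtag, text)

print(link_tweet("hi @bob #py mail a@b.com"))
```

Passing functions rather than replacement strings to re.sub keeps the markup readable and avoids having to escape backslashes in the replacement.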
Shorter, /@([\w]+)/ works fine.
This regex seems to match Twitter usernames:
^@[A-Za-z0-9_]{1,15}$
Max 15 characters, allows underscores directly after the @ (which Twitter does), and allows usernames made up entirely of underscores (which, after a quick search, I found that Twitter apparently also does). Excludes email addresses.
I have used the existing answers and modified them for my use case (the username must be longer than 4 characters).
^[A-Za-z0-9_]{5,15}$
Rules:
Your username must be longer than 4 characters.
Your username cannot be longer than 15 characters.
Your username can only contain letters, numbers and '_'.
Source: https://help.twitter.com/en/managing-your-account/twitter-username-rules
In case you need to match all of the handle, @handle and twitter.com/handle formats, this is a variation:
import re
match = re.search(r'^(?:.*twitter\.com/|@?)(\w{1,15})(?:$|/.*$)', text)
handle = match.group(1) if match else None  # None when text is not in a supported format
Explanation, examples and working regex here:
https://regex101.com/r/7KbhqA/3
Matched
myhandle
@myhandle
@my_handle_2
twitter.com/myhandle
https://twitter.com/myhandle
https://twitter.com/myhandle/randomstuff
Not matched
mysuperhandleistoolong
@mysuperhandleistoolong
https://twitter.com/mysuperhandleistoolong
You can use the following regex: ^@[A-Za-z0-9_]{1,15}$
In Python:
import re
pattern = re.compile(r'^@[A-Za-z0-9_]{1,15}$')
pattern.match('@Your_handle')
This checks whether the whole string matches the regex.
In a 'practical' setting, you could use it as follows:
pattern = re.compile(r'^@[A-Za-z0-9_]{1,15}$')
if pattern.match('@Your_handle'):
    print('Match')
else:
    print('No Match')
I have this string:
201057&channelTitle=null_JS
I want to be able to cut out the '201057' and make it a new variable. But I don't always know how long the digits will be, so can I somehow use the '&' as a reference?
Something like myDigits.substring(0, position of &)?
Thanks
Sure, you can split the string along the &.
String s = "201057&channelTitle=null_JS";
String[] parts = s.split("&");
String newVar = parts[0];
The expected result here is
parts[0] = "201057";
parts[1] = "channelTitle=null_JS";
In production code you should, of course, check the length of the parts array, in case no "&" was present.
Several programming languages also support the useful inverse operation, join:
String s2 = String.join("&", parts); // should have the same value as s
String.join is part of the Java standard library from Java 8 onwards; for older versions, Apache Commons Lang, for example, features an equivalent.
Always read the API first. There is an indexOf method in String that returns the first index of the character or String you give it.
You can use myDigits.substring(0, myDigits.indexOf('&'));
However, if you want to get all of the arguments in the query separately, then you should use mvw's answer.
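For comparison, the same "cut at the first &" idea can be sketched in Python. This is only an illustration of the technique, not a translation of either Java answer; the helper name leading_part is made up, and returning None when no '&' is present is a choice made for this sketch.

```python
from typing import Optional

def leading_part(s: str) -> Optional[str]:
    # partition splits at the first '&' and reports whether it was found.
    head, sep, _rest = s.partition("&")
    return head if sep else None

print(leading_part("201057&channelTitle=null_JS"))
print(leading_part("no-ampersand-here"))
```

Checking the separator explicitly plays the same role as checking the length of the parts array, or the return value of indexOf, in the Java versions.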
I need to store an alphanumeric string in an integer column on one of my models.
I have tried:
@result.each do |i|
  hex_id = []
  i["id"].split(//).each { |c| hex_id.push(c.hex) }
  hex_id = hex_id.join
  ...
  Model.create(:origin_id => hex_id)
  ...
end
When I run this in the console using puts hex_id in place of the create line, it returns the correct values, however the above code results in the origin_id being set to "2147483647" for every instance. An example string input is "t6gnk3pp86gg4sboh5oin5vr40" so that doesn't make any sense to me.
Can anyone tell me what is going wrong here or suggest a better way to store a string like the aforementioned example as a unique integer?
Thanks.
Answering by request from the OP.
hex_id.join does concatenate here: each element of hex_id is the Integer returned by String#hex (0 for any character that is not a hex digit), and join converts each element to its decimal string and concatenates them, which is why puts shows the value you expect. The result is a String of digits, however, and once it is converted for the integer column it is far larger than 2147483647, the maximum value of a 32-bit signed integer. What seems to happen is that the stored value is capped at that maximum.
In other words, the desired result 060003008600401100500050040 is too large to be stored as an integer. A better approach would be to keep it as a string, or to use a different algorithm for producing a number from the original string. Note also that every non-hex character maps to 0, so different strings can produce the same digits and the result is not guaranteed to be unique. Perhaps aggregating the hex values with an arithmetic operation would do better than join?
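The arithmetic behind this can be checked with a short Python sketch. It assumes that a single character passed to Ruby's String#hex yields its hex value, or 0 when it is not a hex digit; the helper ruby_like_hex is a made-up stand-in for that behaviour, and the 32-bit limit is taken from the value reported in the question.

```python
import string

INT32_MAX = 2**31 - 1  # 2147483647, the value seen in the question

def ruby_like_hex(ch: str) -> int:
    # Assumed stand-in for String#hex on one character:
    # hex digits map to their value, everything else to 0.
    return int(ch, 16) if ch in string.hexdigits else 0

source = "t6gnk3pp86gg4sboh5oin5vr40"
joined = "".join(str(ruby_like_hex(c)) for c in source)

print(joined)                    # the digit string that puts would show
print(int(joined) > INT32_MAX)   # far too large for a 32-bit integer column

# Reading the whole string as a base-36 number gives a unique integer,
# but it is also much too large for a 32-bit column.
print(int(source, 36) > INT32_MAX)
```

Either way the number does not fit, which supports keeping the identifier in a string column.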