What manipulations can be done to user emails to prevent duplicates - authentication

I am working on email-based authentication that checks the database for existing users by their email and decides whether to create a new account or use the existing one.
The issue I came across is that users sometimes use different capitalisation in their emails, append things like +1 in the middle, and so on.
To combat some of this I am now (1) stripping whitespace from the emails and (2) always lowercasing them.
I would like to take this further, but am not sure what else I am allowed to do without breaking some emails, i.e.
(3) Can I remove everything after the + and before the @ sign?
(4) Can I remove other symbols like . from the emails?

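In practice, email addresses are treated as case-insensitive (A and a are treated the same) by virtually all providers, so changing all upper case to lower case is fine. Strictly speaking, only the domain is guaranteed to be case-insensitive; how the local part is interpreted is up to the receiving server. Digits (0-9) are also valid in emails.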
However, you should not remove any of the following characters from an email address:
!#$%&'*+-/=?^_`{|}~.
Control characters, white space and other specials are invalid.
If you discover characters that are neither letters, digits, nor one of the 20 characters listed above, the email address is invalid. How those are handled is undefined in the standard.
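As a rough illustration of the rule above, here is a minimal Python sketch that reports which characters in the local part fall outside letters, digits and the 20 listed specials. The function name `invalid_local_part_chars` is made up for this example, and quoted local parts (which the standards also allow) are deliberately ignored.

```python
import string

# The 20 special characters listed above, plus letters and digits.
ALLOWED_SPECIALS = "!#$%&'*+-/=?^_`{|}~."
ALLOWED = set(string.ascii_letters + string.digits + ALLOWED_SPECIALS)


def invalid_local_part_chars(address):
    """Return the set of characters in the local part that are not allowed.

    Hypothetical helper: it only looks at the part before the last '@'
    and does not handle quoted local parts.
    """
    if "@" not in address:
        raise ValueError("address has no '@' sign")
    local_part = address.rpartition("@")[0]
    return {ch for ch in local_part if ch not in ALLOWED}


print(invalid_local_part_chars("jack+finance@email.com"))
print(invalid_local_part_chars("jack smith@email.com"))
```

An empty set means every character is acceptable; anything else is a reason to reject the address rather than to silently strip characters from it.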
Why removing the + is an issue:
It is used by some mail providers to separate (file) inbound email into folders for a user. So jack+finance@email.com would go to a finance folder in Jack's email. Other mail providers would consider it part of the email address. So jack+bauer@email.com can be a different account than jack+sparrow@email.com.
So removing the + (along with the characters after it) could conflate different email accounts into one, or produce an address that does not exist.

Can I remove everything after the + and before the @ sign? Can I remove other symbols like . from the emails?
Sure, you can - but should you?
If you don't care about standards and want to block valid email addresses, then block any characters you like.
RFC 822 - Standard for ARPA Internet Text Messages and RFC 2822 - Internet Message Format clearly specify the valid characters for email addresses.
+ is no different to x, ! or $
The local-part (before the @) can contain:
uppercase and lowercase Latin letters (A-Z, a-z)
numeric values (0-9)
special characters, such as # ! % $ ' &
+ * - \ = ? ^ _ . { | } ~ `
...and you can block x, ! or $ or indeed any of them - but again - should you?
See: https://mozilla.fandom.com/wiki/User:Me_at_work/plushaters

No. Any manipulation along these lines is speculative at best, and harmful at worst. Some providers regard some characters as insignificant (so, for example, Gmail will famously ignore any dots in the localpart) but there is no safe generalization.
The only sane and safe way to validate an email address remains to send a message to it, and discard the address if the recipient does not respond e.g. by clicking a link in the message or replying to it within a reasonable time frame (say, 48 hours). And if you don't have any previous relationship with the owner of this mailbox, don't; then you're a spammer.
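A minimal Python sketch of that confirmation flow, assuming an in-memory store and a single-use token; the names `start_verification` and `confirm` are invented for this example, and actually sending the message is left out.

```python
import secrets
import time

TOKEN_TTL_SECONDS = 48 * 60 * 60  # the "48 hours" suggested above
_pending = {}  # token -> (email, expiry timestamp); a real app would use a database


def start_verification(email, now=None):
    """Create a single-use token to embed in the confirmation link."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)
    _pending[token] = (email, now + TOKEN_TTL_SECONDS)
    return token


def confirm(token, now=None):
    """Return the confirmed email, or None if the token is unknown or expired."""
    now = time.time() if now is None else now
    entry = _pending.pop(token, None)  # pop makes the token single-use
    if entry is None:
        return None
    email, expires_at = entry
    return email if now <= expires_at else None
```

The address is only stored as a verified account once `confirm` returns it; anything that never gets confirmed can be discarded.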

You can treat Gmail separately. (This is what some banks do today.)
If the address is a Gmail address, apply your items (3) and (4): remove the plus part and ignore the dots before the '@' sign. It is a good idea to warn the user at registration before removing anything.
For other email providers, since it is impossible to keep track of how each one behaves, it is better to accept both the dot and the plus as they are.
Considering that Gmail addresses are the ones most frequently used for subscriptions, this should cover most cases.
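A sketch of that Gmail-only approach in Python. The function name `canonical_email` and the domain list are assumptions for this example; treating googlemail.com like gmail.com is also an assumption, and every other provider is left untouched apart from trimming and lowercasing.

```python
# Domains treated as Gmail in this sketch (assumption, adjust as needed).
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}


def canonical_email(address):
    """Return a key for duplicate detection; keep the original for sending mail."""
    address = address.strip().lower()
    local_part, at, domain = address.rpartition("@")
    if not at or not local_part or not domain:
        raise ValueError("not a usable email address")
    if domain in GMAIL_DOMAINS:
        # Items (3) and (4): drop the +suffix and the dots, Gmail only.
        stripped = local_part.split("+", 1)[0].replace(".", "")
        if not stripped:
            raise ValueError("nothing left of the local part")
        local_part = stripped
    return f"{local_part}@{domain}"


print(canonical_email("  Jack.Bauer+News@Gmail.com "))
print(canonical_email("jack.bauer+news@example.com"))
```

Storing the canonical form in a separate column used only for the duplicate check, and keeping the address exactly as the user typed it for delivery, avoids changing where mail is actually sent.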

Related

How to make a Password Validator in Scratch

So I am trying to make a password validator in Scratch where it asks the user to input an answer, puts the answer through some criteria, and then outputs whether it is a valid password or not. The criteria are:
Has at least 8 characters,
Has at least one uppercase letter,
Has at least one lowercase letter,
Has at least one number,
Has at least one special character,
Must contain fewer than 18 characters.
I tried to make a list first with all the different characters and check if the password contained them, but it doesn't actually work. I looked all over the internet for help on this but no one seems to have done it. The Scratch Wiki does have some stuff about case sensitivity but I haven't really been able to implement it. I really need help and I have been trying for a while now. Thanks.
If you just check if the password contains the list, it will only work if it has every single character of the list in order. If you want to make sure it contains each check, you're probably going to have to make a system that checks each letter for every check, which is a little complex.
Check if <lowercase letter/whatever check> contains(letter(text reading #) of (password))
If it passes this check, continue to the next check and set text reading # to 1. Otherwise, change text reading # by 1.
I assume you'll know how to code this properly; I have only partly phrased it the way a normal human would.
This will repeat until it either reaches the end of the password or passes the check. It will then do this again, but for a different check. It's hard to explain in text, and this is my first answer, but I hope it helps.
You have to use the "contains", "length of" and ">" operators, from the end of the class. Combine "contains", "or" and "and".
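The same character-by-character idea is easier to see outside Scratch. The following is a Python sketch of the six criteria from the question, not a Scratch script; the function name `password_problems` is made up, and "special character" is assumed to mean ASCII punctuation.

```python
import string


def password_problems(password):
    """Return a list of the criteria the password fails (empty means valid)."""
    problems = []
    if len(password) < 8:
        problems.append("fewer than 8 characters")
    if len(password) >= 18:
        problems.append("18 or more characters")
    # Each check scans the password one character at a time.
    if not any(ch.isupper() for ch in password):
        problems.append("no uppercase letter")
    if not any(ch.islower() for ch in password):
        problems.append("no lowercase letter")
    if not any(ch.isdigit() for ch in password):
        problems.append("no number")
    if not any(ch in string.punctuation for ch in password):
        problems.append("no special character")
    return problems


print(password_problems("Abcdef1!"))
print(password_problems("abcdef1!"))
```

In Scratch, each `any(...)` line corresponds to one loop over the letters of the password that stops as soon as a matching letter is found.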

What is the format of a Helium API "B58" address?

The Helium API includes several requests that specify an address in "B58" format (examples).
What is the "B58" format, and what API will return a B58 address given a Helium node name?
What is the "B58" format
It is sort of like base64 encoding, but some easily confused characters have been removed from the alphabet. From wikipedia:
Similar to Base64, but modified to avoid both non-alphanumeric
characters (+ and /) and letters that might look ambiguous when
printed (0 – zero, I – capital i, O – capital o and l – lower-case L).
Base58 is used to represent bitcoin addresses.[2] Some messaging and
social media systems break lines on non-alphanumeric strings. This is
avoided by not using URI reserved characters such as +. For segwit it
was replaced by Bech32, see below.
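To make the description concrete, here is a small Python sketch of plain Base58 encoding and decoding using the Bitcoin alphabet described in the quote. It only shows the alphabet-level conversion; any version bytes or checksums that a particular address format wraps around the payload are not handled here.

```python
# Bitcoin-style Base58 alphabet: no 0, O, I or l.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Encode bytes as Base58; leading zero bytes become leading '1's."""
    n = int.from_bytes(data, "big")
    digits = []
    while n > 0:
        n, rem = divmod(n, 58)
        digits.append(ALPHABET[rem])
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(digits))


def b58decode(text: str) -> bytes:
    """Decode a Base58 string back to bytes (raises ValueError on bad characters)."""
    n = 0
    for ch in text:
        n = n * 58 + ALPHABET.index(ch)
    leading_ones = len(text) - len(text.lstrip("1"))
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return b"\x00" * leading_ones + body


encoded = b58encode(b"hello world")
print(encoded, b58decode(encoded))
```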
and what API will return a B58 address given a Helium node name?
You want: https://api.helium.io/v1/hotspots/name/:name
From here: https://docs.helium.com/api/blockchain/hotspots/#hotspots-for-name
I think it was added sort of recently, so it likely didn't exist when you asked this question.

What does a SQL injection query such as the one below mean, and what will it return?

' AND ascii(substring((SELECT concat(login,0x3a,password) from users limit 0,1),1,1))>96#
I am working on the bee-box machine and practicing blind SQL injection. I know what ascii and substring do, and I also know what concat does, but why are we concatenating the login and password together when we only need a single ASCII character? And why do we have 1,1 at the end?
It will attempt to get a 'login' and 'password' from the first row of a table called 'users', concat them with a colon in the middle (dave:passw), then take the first character of that, and test whether that first character is a lowercase letter or one of the characters { | } ~
No idea why that information would be useful to an attacker, but sometimes a whole battery of tests adds up to information even though each test in isolation is irrelevant.
You can check the full ASCII code here:
https://www.ascii-code.com/
As you can see on that page, ASCII code 96 is
`
The characters after it (ASCII codes greater than 96) are the lowercase English letters, plus a few special symbols, so the test checks whether the string begins with one of those characters.
SUBSTRING(long_string,1,1) takes the first character of your concatenated login string (start at position 1, length 1), so the query is checking whether the login account name starts with a lowercase English letter.
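To see why a single yes/no comparison is still informative, here is a purely local Python simulation with no database involved: a stand-in function answers "is the character code greater than N?" and a binary search narrows the character down in seven questions. The names `recover_char` and `make_oracle`, and the sample string, are invented for this illustration.

```python
def recover_char(is_greater_than):
    """Find a 7-bit ASCII character using only 'code > threshold' answers."""
    low, high = 0, 127
    while low < high:
        mid = (low + high) // 2
        if is_greater_than(mid):
            low = mid + 1   # the code is somewhere above mid
        else:
            high = mid      # the code is mid or below
    return chr(low)


# Local stand-in for the true/false answer the comparison would give.
secret = "dave:passw"


def make_oracle(position):
    return lambda threshold: ord(secret[position]) > threshold


recovered = "".join(recover_char(make_oracle(i)) for i in range(len(secret)))
print(recovered)
```

The ">96" in the query corresponds to one such threshold, and the "1,1" corresponds to `position`: changing the first number moves on to the next character of the concatenated string.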

Regular expression to find usernames in NSString Objective C [duplicate]

Could you provide a regex that match Twitter usernames?
Extra bonus if a Python example is provided.
(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z]+[A-Za-z0-9-_]+)
I've used this as it disregards emails.
Here is a sample tweet:
@Hello how are @you doing @my_friend, email @000 me @ whats.up@example.com @shahmirj
Matches:
@Hello
@you
@my_friend
@shahmirj
It will also work for hashtags, I use the same expression with the @ changed to #.
If you're talking about the @username thing they use on twitter, then you can use this:
import re
twitter_username_re = re.compile(r'@([A-Za-z0-9_]+)')
To make every instance an HTML link, you could do something like this:
my_html_str = twitter_username_re.sub(lambda m: '<a href="http://twitter.com/%s">%s</a>' % (m.group(1), m.group(0)), my_tweet)
The regex I use, and that has been tested in multiple contexts:
/(^|[^@\w])@(\w{1,15})\b/
This is the cleanest way I've found to test and replace Twitter usernames in strings.
#!/usr/bin/python
import re
text = "@RayFranco is answering to @jjconti, this is a real '@username83' but this is an@email.com, and this is a @probablyfaketwitterusername"
ftext = re.sub(r'(^|[^@\w])@(\w{1,15})\b', '\\1\\2', text)
print(ftext)
This will return, as expected:
RayFranco is answering to jjconti, this is a real 'username83' but this is an@email.com, and this is a @probablyfaketwitterusername
Based on Twitter specs :
Your username cannot be longer than 15 characters. Your real name can be longer (20 characters), but usernames are kept shorter for the sake of ease.
A username can only contain alphanumeric characters (letters A-Z, numbers 0-9) with the exception of underscores, as noted above. Check to make sure your desired username doesn't contain any symbols, dashes, or spaces.
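Putting the two quoted rules together (at most 15 characters; letters, digits and underscores only), a short Python sketch for pulling mentions out of free text might look like this. The name `MENTION_RE` is made up, and using `re.ASCII` to restrict `\w` to A-Z, a-z, 0-9 and underscore is this example's reading of the rules.

```python
import re

# '@' must not follow a word character or another '@' (skips emails),
# and the handle must end at a word boundary within 15 characters.
MENTION_RE = re.compile(r'(?<![@\w])@(\w{1,15})\b', re.ASCII)

text = ("@RayFranco is answering to @jjconti, this is a real '@username83' "
        "but this is an@email.com, and this is a @probablyfaketwitterusername")

print(MENTION_RE.findall(text))
# ['RayFranco', 'jjconti', 'username83']
```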
Twitter recently open-sourced implementations, in various languages including Java, Ruby (gem) and JavaScript, of the code they use for finding usernames, hashtags, lists and URLs.
It is very regular expression oriented.
The only characters accepted in the form are A-Z, 0-9, and underscore. Usernames are not case-sensitive, though, so you could use r'(?i)@[a-z0-9_]+' to match everything correctly and also discern between users.
This is a method I have used in a project that takes the text attribute of a tweet object and returns the text with both the hashtags and user_mentions linked to their appropriate pages on twitter, complying with the most recent twitter display guidelines
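import re

def link_tweet(tweet):
    """
    This method takes the text attribute from a tweet object and returns it with
    user_mentions and hashtags linked
    """
    # Link @mentions to the user's profile page (URL format assumed).
    tweet = re.sub(r'(\A|\s)@(\w+)', r'\1<a href="https://twitter.com/\2">@\2</a>', str(tweet))
    # Link #hashtags to the hashtag page (URL format assumed).
    return re.sub(r'(\A|\s)#(\w+)', r'\1<a href="https://twitter.com/hashtag/\2">#\2</a>', str(tweet))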
Once you call this method you can pass in the param my_tweet[x].text. Hope this is helpful.
Shorter, /@([\w]+)/ works fine.
This regex seems to solve Twitter usernames:
^@[A-Za-z0-9_]{1,15}$
Max 15 characters, allows underscores directly after the @ (which Twitter does), and allows all underscores (which, after a quick search, I found that Twitter apparently also does). Excludes email addresses.
I have used the existing answers and modified them for my use case. (The username must be longer than 4 characters.)
^[A-Za-z0-9_]{5,15}$
Rules:
Your username must be longer than 4 characters.
Your username cannot be longer than 15 characters.
Your username can only contain letters, numbers and '_'.
Source: https://help.twitter.com/en/managing-your-account/twitter-username-rules
In case you need to match all of the handle, @handle and twitter.com/handle formats, this is a variation:
import re
match = re.search(r'^(?:.*twitter\.com/|@?)(\w{1,15})(?:$|/.*$)', text)
handle = match.group(1)
Explanation, examples and working regex here:
https://regex101.com/r/7KbhqA/3
Matched
myhandle
@myhandle
@my_handle_2
twitter.com/myhandle
https://twitter.com/myhandle
https://twitter.com/myhandle/randomstuff
Not matched
mysuperhandleistoolong
@mysuperhandleistoolong
https://twitter.com/mysuperhandleistoolong
You can use the following regex: ^@[A-Za-z0-9_]{1,15}$
In python:
import re
pattern = re.compile('^@[A-Za-z0-9_]{1,15}$')
pattern.match('@Your_handle')
This will check if the string exactly matches the regex.
In a 'practical' setting, you could use it as follows:
pattern = re.compile('^@[A-Za-z0-9_]{1,15}$')
if pattern.match('@Your_handle'):
    print('Match')
else:
    print('No Match')

"Error validating Name:Invalid string." When Adding Customer

I receive this error when adding a QBO Customer with a name that has greater/less than characters ('>', '<'). I've looked through the documentation and can't find a list of unacceptable characters. How do I know what is and is not acceptable? I just need to know what to look for to sanitize our local data before uploading.
The API is XML-based, right? That means you need to either escape the characters into "&gt;" and "&lt;" respectively, probably with a standard library available in your environment; or filter the 5 characters listed here.
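If the escaping route is the right one, Python's standard library can do it. This is a sketch using `xml.sax.saxutils.escape`; the helper name `sanitize_name` is made up, and whether the QBO API accepts escaped names is an assumption taken from the answer above, not something verified here.

```python
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; the extra mapping covers the
# two quote characters so that all five XML special characters are escaped.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def sanitize_name(name):
    """Escape the five XML special characters in a customer name."""
    return escape(name, QUOTE_ENTITIES)


print(sanitize_name("Smith <& Sons>"))
```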