Redis - Check if an email is already in use - redis

In my application I store users as user:n where n is a unique ID.
When a new user is created I increment a global variable such as user_count and use that ID as user:n.
But, I have an issue where I need to ensure an email is not already in use. I've done some reading around and the only way I can see how to do this is to:
1) Loop through the users. But, I am not keen on this solution as it could cause slower performance right?
2) Create a lookup that contains a list of email addresses used.
Both solutions seem a bit strange to me as I come from an SQL background.
Are these the only options available? I also have to do the same check for usernames too.

You could use Sets:
On registration: sadd taken_emails "john#example.com"
And testing with: sismember taken_emails "bob#exmaple.com"
Note that you have a possible race-condition where two users try to use the same email at the same time, both test and get "free" and then both register with it. You could use a lock to make sure they don't both get it, or make the registration operation atomic with either WATCH/MULTI/EXEC or with a lua script.

Related

Is hash always preferable if I'm always getting multiple fields in redis?

Let's say I am checking information about some of my users every second. I need to take an action on some of those users that may take more than a second. Something like this:
#pseudocode
users = DB.query("SELECT * FROM users WHERE state=5");
users.forEach(user => {
if (user.needToDoThing()) {
user.doThatThing();
}
});
I want to make sure I won't accidentally run doThatThing on a user who has it already running. I am thinking of solving it by setting cache keys based on the user ID as things are processed
#pseudocode
runningUsers = redis.getMeThoseUsers();
users = DB.query("SELECT * FROM users WHERE state=5 AND id NOT IN (runningUsers)");
redis.setThoseUsers(users);
users.forEach(user => {
if (user.needToDoThing()) {
user.doThatThing();
}
redis.unsetThatUser(user);
});
I am unsure if I should...
Use one hash with a field per user
Use multiple keys with mset and hget
Is there a performance or business reason I'd want one over the other? I am assuming I should use a hash so I can use hgetall to know who is running on that hash vs doing a scan on something like runningusers:*. Does that seem right?
Generally speaking, option 1 (Use one hash with a field per user) is probably the best method in most cases because you want to access all fields for the users at once. It can be achieved by using HGETALL.
But when go you for 2nd option (use multiple keys with mset and mget) you want to query every single time in redis to get the user details.By using MGET you can access all key values but you want to know the key name for each users. It will be suitable when you are accessing few fields in an object.Disadvantages: possibly slower when you need to access all/most of the fields for the users.
NOTE: By using 1st option you can't set TTL for single users because in redis there is no support for TTL for child keys in hash structure,the only way you should set for entire hash.But by using 2nd option, you can set TTL for every single users.

Get multiple sets

I've currently got a dataset which is something like:
channel1 = user1,user2,user3
channel2 = user4,user5,user6
(note- these are not actual names, the text is not a predictable sequence)
I would like to have the most optimized capability for the following:
1) Add user to a channel
2) Remove user from a channel
3) Get list of all users in several selected channels, maintaining knowledge of which channel they are in (in case it matters- this can also be simply checking whether a channel has any users or not without getting an actual list of them)
4) Detect if a specific user is in a channel (willing to forego this feature if necessary)
I'm a bit hungup on the fact that there are only two ways I can see of getting multiple keys at once:
A) Using regular keys and a mget key1, key2, key3
In this solution, each value would be a JSON string which can then be manipulated and queried clientside to add/remove/determine values. This itself has a couple problems- firstly that it's possible another client will change the data while it's being processed (i.e. this solution is not atomic) and it's not easy to detect right away if a channel contains a specific user even though it is easy to detect if a channel has any users (this is low priority, as stated above)
B) Using sets and sunion
I would really like to use sets for this solution somehow, the above solution just seems wrong... but I cannot see how to query multiple sets at once and maintain info about which set each member is from or if any of the sets in the union have 0 members (sunion only gives me a final set of all the combined members)
Any solutions which can implement the above points 1-4 in optimal time and atomic operations?
EDIT: One idea which might work in my specific case is to store the channel name as part of the username and then use sets. Still, it would be great to get a more generic answer
Short answer: use sets + pipelining + MULTI/EXEC, or sets + Lua.
1) Add user to a channel
SADD command
2) Remove user from a channel
SREM command
3) Get list of all users in several selected channels
There are several ways to do it.
If you don't need strict atomicity, you just have to pipeline several SMEMBERS commands to retrieve all the sets in one roundtrip. If you are just interested whether channels have users or not, you can replace SMEMBERS by SCARD.
If you need strict atomicity, you can pipeline a MULTI/EXEC block containing SMEMBERS or SCARD commands. The output of the EXEC command will contain all the results. This is the solution I would recommend.
An alternative (atomic) way is to call a server-side Lua script using the EVAL command. Lua script executions are always atomic. The script could take a number of channel as input parameters, and build a multi-layer bulk reply to return the output.
4) Detect if a specific user is in a channel
SISMEMBER command - pipeline them if you need to check for several users.

REDIS - How can I query keys and get their values in one query?

Using keys I can query the keys as you can see below:
redis> set popo "pepe"
OK
redis> set coco "kansas"
OK
redis> set cool "rock"
OK
redis> set cool2 "punk"
OK
redis> keys *co*
1) "cool2"
2) "coco"
3) "cool"
redis> keys *ol*
1) "cool2"
2) "cool"
Is there any way to get the values instead of the keys? Something like: mget (keys *ol*)
NOTICE: As others have mentioned, along with myself in the comments on the original question, in production environments KEYS should be avoided. If you're just running queries on your own box and hacking something together, go for it. Otherwise, question if REDIS makes sense for your particular application, and if you really need to do this - if so, impose limits and avoid large blocking calls, such as KEYS. (For help with this, see 2015 Edit, below.)
My laptop isn't readily available right now to test this, but from what I can tell there isn't any native commands that would allow you to use a pattern in that way. If you want to do it all within redis, you might have to use EVAL to chain the commands:
eval "return redis.call('MGET', unpack(redis.call('KEYS', KEYS[1])))" 1 "*co*"
(Replacing the *co* at the end with whatever pattern you're searching for.)
http://redis.io/commands/eval
Note: This runs the string as a Lua script - I haven't dove much into it, so I don't know if it sanitizes the input in any way. Before you use it (especially if you intend to with any user input) test injecting further redis.call functions in and see if it evaluates those too. If it does, then be careful about it.
Edit: Actually, this should be safe because neither redis nor it's lua evaluation allows escaping the containing string: http://redis.io/topics/security
2015 Edit: Since my original post, REDIS has released 2.8, which includes the SCAN command, which is a better fit for this type of functionality. It will not work for this exact question, which requests a one-liner command, but it's better for all reasonable constraints / environments.
Details about SCAN can be read at http://redis.io/commands/scan .
To use this, essentially you iterate over your data set using something like scan ${cursor} MATCH ${query} COUNT ${maxPageSize} (e.g. scan 0 MATCH *co* COUNT 500). Here, cursor should always be initialized as 0.
This returns two things: first is a new cursor value that you can use to get the next set of elements, and second is a collection of elements matching your query. You just keep updating cursor, calling this query until cursor is 0 again (meaning you've iterated over everything), and push the found elements into a collection.
I know SCAN sounds like a lot more work, but I implore you, please use a solution like this instead of KEYS for anything important.

What is a simple way to tell if a row of data has been changed?

If I have a row of data like:
1, 2, 3
I can create a checksum value that is the sum of all of the columns, 1 + 2 + 3 = 6. We can store this value with the row in the 4th column:
1, 2, 3, 6
I can then write a program to check to see if any of the values in the columns changed accidentally if the sum of the columns don't match the checksum value.
Now, I'd like to take this a step further. Let's say I have a table of values that anyone has read/write access to where the last column of data is the sum of the previous columns as described earlier.
1, 2, 3, 6
Let's say someone wants to be sneaky and change the value in the third column
1, 2, 9, 6
The checksum is easy to reproduce so the sneaky individual can just change the checksum value to 1 + 2 + 9 = 12 so that this row appears not to be tampered with.
1, 2, 9, 12
Now my question is, how can I make a more sophisticated checksum value so that a sneaky individual can't make this type of change without making the checksum no longer valid? Perhaps I could create a blackbox exe that given the first three values of the row can give a checksum that is a little more sophisticated like:
a^2 + b^2 + c^2
But while this logic is unknown to a sneaky user, he/she could still input the values into the black box exe and get a valid checksum back.
Any ideas on how I can make sure all rows in a table are untampered with? The method I'm trying to avoid is saving a copy of the table every time it is modified legitimately using the program I am creating. This is possible, but seems like a very unelegant solution. There has to be a better way, right?
Using basic math your checksum is invalid:
a^2 +b^2 +c^2
a=0,b=0,c=2 = checksum 4
a=2,b=0,c=0 = checksum 4
If you want a set of "read-only" data to the users, consider using materialized views. A materialized view will compute the calculation a head of time i.e. your valid data and serve that to the users, while your program can do modifications in the background.
Further this is the reason why privileges exist, if you only supply accounts that cannot modify the database for instance read-only access, this mitigates the issue of someone tampering with data. Also you cannot fully prevent a malicious user from tampering with data only make them jump through several hoops in hopes they get bored / blocked temporarily.
There is no silver bullet for security, what you can do is use a defense in depth mindset that would consist of the following features:
Extensive Logging,
Demarcation of responsibilities,
Job rotation,
Patch management,
Auditing of logs (goes together with logging, but someone actually has to read them),
Implement a HIPS system (host intrusion prevention system),
Deny outside connections to the database
The list can go on quite extensively.
You seem to be asking, "how can I give a program a different set of security permissions to the user running it?" The way to do this is to make sure the program is running in a different security context to the user. Ways of doing this vary by platform.
If you have multiple machines, then running a client server architecture can help. You expose a controlled API through the server, and it has the security credentials for the database. Then your user can't make arbitrary requests.
If you're the administrator of the client machine, and the user isn't then you may be able to have separate processes doing something similar. E.g. a daemon in unix. I think DCOM in windows lets you do something like this.
Another approach is to expose your API through stored procedures, and only grant access to these, rather than direct access to the table.
Having controlled access to a limited API may not be enough. Consider, for example, a table that stores High Scores in a game. It doesn't matter that it can only be accessed through a ClaimHighScore API, if the user can enter arbitrary values. The solution for this in games is usually complicated. The only approach I've heard of that works is to define the API in terms of a seed value that gave the initial game state, and then a set of inputs with timestamps. The server then has to essentially simulate the game to verify the score.
Users should not have unconstrained write access to tables. Better would be to create sprocs for common CRUD operations. This would let you control which fields they can modify, and if you insist you could update a CRC() checksum or other validation.
This would be a big project, so it may not be practical right now - but it's how things should be done.
Although your question is based on malicious entries to a database the use of the MOD11 can find inaccurate or misplaced values.
The following MySQL statement and SQLfiddle illustrate this
SELECT id, col1, col2, col3, col4, checknum,
9 - MOD(((col1*5)+(col2*4)+(col3*3)+(col4*2) ),9)
AS Test FROM `modtest` HAVING checknum =Test

Experimental/private branch for OID numbers in LDAP Schemas?

Attributes or object classes in LDAP schemas are identified through a unique number called OID. Moreover OIDs are also used in the SNMP protocol. Everyone can apply for an enterprise number by the IANA and then define his own subnumbers. But the processing of the application can last up to 30 days.
Does anyone know if there is a "test" branch of OID numbers that could be used for experimental purposes while waiting for an official enterprise number?
Apparently the OID branch 2.25 can be used with UUIDs without registration.
The detailled explanation can be found here:
http://www.oid-info.com/get/2.25 and there is also a link to an UUID generator.
=> I think it's good solution for unregistered OIDs. Simply generate one such OID with the UUID-Generator. You will get something like 2.25.178307330326388478625988293987992454427 and can then simply make your own subnumbers by adding .1, .2, ... at the end.
There is also the possibility to register such an 2.25 OID, but a human intervention is still needed and uniqueness isn't totally garanteed as it is still possible (although unlikely) that someone else uses the same OID as unregistered OID. For registered OIDs I would still prefer the registration of a private entreprise number by the IANA.
Here is also a list of how to get an OID assigned: http://www.oid-info.com/faq.htm#10. But the main answers are already listed here.
No. However, if there is nothing published from your work no one will know.
Some LDAP server companies will sub OID numbers if you wanted to try something. But you could just makeup anything.
The currently assigned numbers only start with 0, 1, or 2. If you started with 4 or something, any savey person would know you were faking it.
We put some info together on OIDs here:
http://ldapwiki.willeke.com/wiki/HowToGetYourOwnLDAPOID
-jim
I don't know where you're based. In the UK, each company gets it's own OID branch to play with as it will http://www.oid-info.com/get/1.2.826.0
(Not sure if there are similar setups in other countries
You could try following for internal prototyping (check "Object Identifiers (OIDs)" paragraph).