Redis vs Memcached for serving JSON on Heroku - ruby-on-rails-3

I have a single page app (Rails + Backbone.js + Postgres on Heroku), and as some of my queries are starting to slow down for users with lots of data (there are multiple queries per object), I want to start caching the JSON I'm sending the client.
I'm already using Redis with Resque, so I'm not sure if I should be using the same redis instance for both Resque and general data caching. Is that a reason to go with Memcached?
I guess I'm looking for general input from those with experience with either so I can quickly decide on one of the two and start caching stuff (sorry if a clear-cut answer cannot be given).
Thanks for any help.

Both will cache strings just fine, although I think using Redis for a simple cache is overkill. I'd go with Memcached.
Blog post from Salvatore on caching with Redis.
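To make the "cache the JSON" part concrete, here is a minimal Rails-style sketch of caching the serialized payload rather than the model objects. The `Item` model, `current_user` scope, and the 10-minute TTL are placeholders, and it assumes `Rails.cache` is backed by Memcached via Dalli (a Redis cache store would work the same way here):

```ruby
# Gemfile: gem "dalli"            # Memcached client; a Redis store works similarly
# config/environments/production.rb:
#   config.cache_store = :mem_cache_store

class ItemsController < ApplicationController
  def index
    # Cache the serialized JSON itself, so a cache hit skips both the
    # queries and the serialization work. "items" / current_user are
    # placeholders for whatever the app actually serves.
    cache_key = ["items-json", current_user.id, current_user.items.maximum(:updated_at)]

    json = Rails.cache.fetch(cache_key, expires_in: 10.minutes) do
      current_user.items.to_json(include: :comments)
    end

    render json: json
  end
end
```

Keying on the newest `updated_at` means a change to any item naturally produces a new key, so stale entries simply age out instead of needing explicit invalidation.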

Related

Redis performance in localhost

I am trying to compare Redis performance against MySQL on my local Windows machine. I am a student and we are learning various things at school. I have around 1,048,580 records in my local MySQL database and I am performing various REST operations on them. I have also implemented Redis to store the values, using Spring Boot's @Cacheable and Lettuce. It all works fine, but I don't know how to measure the performance to show that Redis is performing better than MySQL. I think the difference would be easier to see at the scale of a very large company; can I simulate that locally? Also, how do I benchmark Redis performance locally for my academic project?
I have tried sending multiple requests in a loop to try to measure performance, but I don't see much of a difference on localhost with my records. I have tried to understand the various Redis CLI monitoring commands, but I don't see much latency either.
Well, it depends on how you are actually testing Redis vs. MySQL. Keep in mind that MySQL uses caches internally, and if you use Hibernate, it also does a level of caching. If you make the same GET request several times, there will not be any major difference between the Redis and MySQL results.
You should compare your results by doing several different operations, such as inserting/deleting/getting thousands of different values, then the same operations on identical values, and so on.
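To make that concrete, here is a rough sketch of the kind of mixed benchmark described above. It uses Ruby with the redis and mysql2 gems purely for brevity (the same structure translates directly to a Spring Boot test), and it assumes a local `kv` table and made-up connection details:

```ruby
require "benchmark"
require "redis"    # gem install redis
require "mysql2"   # gem install mysql2

redis = Redis.new
mysql = Mysql2::Client.new(host: "localhost", username: "root",
                           password: "secret", database: "testdb") # assumed credentials

# assumes: CREATE TABLE kv (k VARCHAR(64) PRIMARY KEY, v VARCHAR(255));
mysql.query("DELETE FROM kv")   # start clean so re-runs don't hit duplicate keys

n = 10_000
Benchmark.bm(12) do |x|
  # Write N distinct values, then read them back, so neither store can
  # answer everything from a single hot cache entry.
  x.report("redis set")  { n.times { |i| redis.set("user:#{i}", "payload-#{i}") } }
  x.report("redis get")  { n.times { |i| redis.get("user:#{i}") } }
  x.report("mysql write") do
    n.times { |i| mysql.query("INSERT INTO kv (k, v) VALUES ('user:#{i}', 'payload-#{i}')") }
  end
  x.report("mysql read") do
    n.times { |i| mysql.query("SELECT v FROM kv WHERE k = 'user:#{i}'") }
  end
end
```

Varying the mix (mostly reads, mostly writes, repeated identical reads) is what exposes the difference, since repeated identical reads largely measure MySQL's own caches.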

Optimize response time from API, with SQL Server involved

I have a project with a requirement that response time should be under 0.5 s under a load of 3,000 concurrent users.
I have a few APIs that do some aggregation in SQL Server.
When we test with 3,000 CCU, the average response time is about 15 seconds, and we also get 500 errors because SQL Server can't handle that many requests (requests to SQL Server are interrupted by timeouts).
Our current instance is an r4.2xlarge (8 CPUs and 61 GB of memory).
All code is asynchronous without blocking operations.
We run our app behind a load balancer with 10 instances, so 300 CCU per instance in this case; utilization on the instances is about 30%. The bottleneck currently is SQL Server.
I see a few solutions: set up a bigger SQL Server, a cluster, or sharding; I'm not really sure, I'm not strong in that area.
Or use a cache for the requests. We have mostly read-only data, which we can cache after aggregation.
Update:
I need a solution that caches the SQL responses themselves, so I can work with them later via LINQ.
But it seems there is no ready-made solution for that.
I found a good attempt at this called CacheManager, but a few problems exist with it:
It works with Redis only in sync mode, meaning it uses sync commands instead of async.
There is no implementation of a concurrency lock, which we need in our case because we have 10 instances; we need a solution that works as a distributed cache.
There are a few bugs that use the Redis multiplexer incorrectly, so you will constantly have connection problems.
Please advise how to overcome this issue and how you solved it. I'm sure there are people who have already solved it somehow.
I enabled Query Store on SQL Server and monitored all the missing indexes. EF Core generates some requests in a completely different way than expected; after creating the missing indexes, performance became much better. But I still had a problem handling the required CCU.
I explored the existing solutions that extend EF Core with caching. Most of them were written synchronously, which just can't take advantage of async, and I didn't find any distributed cache that implements a distributed lock. Finally I created this library, which extends EF Core with a distributed cache in Redis. The cache lets us scale a lot more, and now everything just flies ;) I'll leave it here for anyone who has a performance issue like mine: https://github.com/grinay/Microsoft.EntityFrameworkCore.DistributedCache
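The library above is .NET-specific, but the underlying "cache plus distributed lock" pattern is small enough to sketch. Here is a hedged Ruby illustration built on Redis's atomic SET NX PX; `run_expensive_sql_aggregation`, the key names, and the TTLs are all placeholders, and a production version would release the lock with a Lua script so the get-and-delete is atomic:

```ruby
require "json"
require "redis"
require "securerandom"

REDIS = Redis.new

# Fetch a cached aggregation result. If it is missing, only one of the
# app instances recomputes it; the others wait briefly and re-check.
def cached_aggregation(cache_key, ttl: 60, lock_ttl_ms: 5_000)
  loop do
    if (hit = REDIS.get(cache_key))
      return JSON.parse(hit)
    end

    lock_key   = "lock:#{cache_key}"
    lock_token = SecureRandom.uuid

    # SET NX PX is an atomic "take the lock only if nobody holds it,
    # and auto-expire it so a crashed instance can't block forever".
    if REDIS.set(lock_key, lock_token, nx: true, px: lock_ttl_ms)
      begin
        result = run_expensive_sql_aggregation   # placeholder for the real query
        REDIS.set(cache_key, result.to_json, ex: ttl)
        return result
      ensure
        # Release only if we still own the lock (ideally done in a Lua script).
        REDIS.del(lock_key) if REDIS.get(lock_key) == lock_token
      end
    end

    sleep 0.1   # another instance is computing; poll the cache again shortly
  end
end
```

The lock prevents the "cache stampede" where all 10 instances hammer SQL Server at once when a popular key expires.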

Should all data be stored in Redis?

I am building a news site. Currently, I use MySQL as main data store and Redis to maintain list of articles for a user home page feed. When users click on an article on home page, I connect to MySQL to get the main content of the articles, comments, and related stuff.
Is it best practice to store all article data in Redis? I mean, instead of connecting to MySQL to get the whole content of an article, I would store the main content of articles in Redis so that performance can be improved.
This is opinion-based, so here's my opinion. Redis is primed to be used as a cache. You need to decide what to cache, and if caching is actually necessary. This depends on the scale of your app. If the articles change a lot and you do not have a huge user/visitor base, I do not think Redis is necessary at all. Remember you cannot search for stuff there. You can't go SELECT articles WHERE author='foo' in Redis.
If, on the other hand, you are seeing a massive increase in DB load due to too many users, you could pre-render the HTML for all the articles and put that into Redis. That would save the DB and the web server some load. But only if you already know which articles you want to display.
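For illustration, a minimal Ruby sketch of that pre-rendering idea, assuming a hypothetical `render_article_html` helper that does the MySQL work and an arbitrary one-hour TTL:

```ruby
require "redis"

# Cache the rendered article page so repeat views skip MySQL entirely.
def article_html(redis, article_id)
  key = "article:html:#{article_id}"
  cached = redis.get(key)
  return cached if cached

  html = render_article_html(article_id)   # placeholder: pulls content + comments from MySQL
  redis.set(key, html, ex: 3600)           # expire after an hour as a safety net
  html
end

# When an article or its comments change, drop the key so the next
# request re-renders it.
def expire_article(redis, article_id)
  redis.del("article:html:#{article_id}")
end

redis = Redis.new
puts article_html(redis, 42)
```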
That depends on the role redis is supposed to take in your case.
If it serves as a cache, you could try to store more data in redis, where possible. As long as the development overhead is small and the process doesn't introduce new sources of errors.
In case you want Redis to be a primary source for your data, which it doesn't sound like in your case, you could also decide to move everything away from MySQL. With a small amount of rarely changing data, it might be worth a shot. But remember to back up the database and sync Redis to disk after changes.

Best way to manage redis data

Just getting started with redis, and I'm having a hard time managing the redis data.
Are there any tools that help visualize my application's Redis data?
Try Keylord, a cross-platform GUI application for managing key-value databases like Redis, LevelDB, etc.:
supports Redis and LevelDB key-value databases
SSH tunnels for Redis connections
displays keys in flat and hierarchical views
can load millions of keys in the background (uses the SCAN command; see the sketch after this list)
can create/read/update/delete keys of different types
clear and predictable UI
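If you only need a quick look at the keyspace from code rather than a GUI, the same SCAN these tools rely on is easy to drive directly. A small sketch with the redis-rb gem, using an assumed `user:*` namespace:

```ruby
require "redis"

redis = Redis.new

# SCAN walks the keyspace incrementally, unlike KEYS, which blocks the
# server while it builds the whole list. The "user:*" pattern is just
# an example namespace.
redis.scan_each(match: "user:*", count: 1000) do |key|
  puts "#{key} (#{redis.type(key)}, ttl=#{redis.ttl(key)})"
end
```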
There is Redis Admin UI on GitHub; it is .NET based.
I have not tried it myself, but the screenshots and the live demo look promising.
There is also phpRedisAdmin from ErikDubbelboer, which works according to the poster of this very similar question: phpMyAdmin equivalent to MySQL for Redis?
At my company, when developing in Redis and dealing with a large number of keys, we create and maintain a custom management page while developing. The reason for this is that it allows us to create the best 'custom' representation of the data.
I think AnotherRedisDesktopManager is a useful tool for managing Redis: faster, better, and more stable. What's more, it won't crash when loading a large number of keys.
I am working on a tool like that (phpMyRedis), but there are no current working tools like that I know of.
You can try FastoRedis, a cross-platform Redis GUI client based on redis-cli.

Index replication and Load balancing

I am using the Lucene API in my web portal, which is going to have thousands of concurrent users.
Our web server will call the Lucene API, which will be sitting on an app server. We plan to use 2 app servers for load balancing.
Given this, what should our strategy be for replicating the Lucene indexes onto the 2nd app server? Any tips, please?
You could use Solr, which contains built-in replication. This is possibly the best and easiest solution, since it would probably take quite a lot of work to implement your own replication scheme.
That said, I'm about to do exactly that myself for a project I'm working on. The difference is that since we're using PHP for the frontend, we've implemented Lucene in a socket server that accepts queries and returns a list of DB primary keys. My plan is to push changes to the server and store them in a queue, where I'll first store them in the memory index and then flush the memory index to disk when the load is low enough.
Still, it's a complex thing to do and I'm set on doing quite a lot of work before we have a stable final solution that's reliable enough.
From experience, Lucene should have no problem scaling to thousands of users. That said, if you're only using your second app server for load balancing and not for failover, you should be fine hosting Lucene on only one of those servers and accessing it from the second server via NFS (in a Unix environment) or a shared directory (in a Windows environment).
Again, this depends on your specific situation. If you're talking about having millions (5 or more) of documents in your index and needing your Lucene index to be able to fail over, you may want to look into Solr or Katta.
We are working on a similar implementation to what you are describing as a proof of concept. What we see as an end-product for us consists of three separate servers to accomplish this.
There is a "publication" server, that is responsible for generating the indices that will be used. There is a service implementation that handles the workflows used to build these indices, as well as being able to signal completion (a custom management API exposed via WCF web services).
There are two "site-facing" Lucene.NET servers. Access to the API is provided via WCF Services to the site. They sit behind a physical load balancer and will periodically "ping" the publication server to see if there is a more current set of indicies than what is currently running. If it is, it requests a lock from the publication server and updates the local indices by initiating a transfer to a local "incoming" folder. Once there, it is just a matter of suspending the searcher while the index is attached. It then releases its lock and the other server is available to do the same.
Like I said, we are only approaching the proof of concept stage with this, as a replacement for our current solution, which is a load balanced Endeca cluster. The size of the indices and the amount of time it will take to actually complete the tasks required are the larger questions that have yet to be proved out.
Just some random things that we are considering:
The downtime of a given server could be reduced if two local folders are used on each machine receiving data to achieve a "round-robin" approach.
We are looking to see if the load balancer allows programmatic access to have a node remove and add itself from the cluster. This would lessen the chance that a user experiences a hang if he/she accesses during an update.
We are looking at "request forwarding" in the event that cluster manipulation is not possible.
We looked at solr, too. While a lot of it just works out of the box, we have some bench time to explore this path as a learning exercise - learning things like Lucene.NET, improving our WF and WCF skills, and implementing ASP.NET MVC for a management front-end. Worst case scenario, we go with something like solr, but have gained experience in some skills we are looking to improve on.
I create the indices on the publishing back-end machines, on the filesystem, and replicate them over to the marketing (serving) machines.
That way every single load- and fail-balanced node has its own index, without network latency.
The only drawback is that you shouldn't try to recreate the index inside the replicated folder, as you'll have the lock file lying around on every node, blocking the IndexReader until your reindexing has finished.
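The "build elsewhere, then swap" idea is independent of Lucene itself. Here is a hedged Ruby sketch of publishing a finished index directory on a node and flipping a `current` symlink, so readers never open a half-copied index or a stale write.lock (paths and naming are purely illustrative):

```ruby
require "fileutils"

# Publish a freshly built index to a serving node without ever rebuilding
# inside the live directory (which would leave a write.lock behind and
# block readers).
def publish_index(build_dir, base_dir)
  version     = Time.now.strftime("%Y%m%d%H%M%S")
  release_dir = File.join(base_dir, "index-#{version}")
  current     = File.join(base_dir, "current")

  FileUtils.cp_r(build_dir, release_dir)                 # copy the finished index
  FileUtils.rm_f(File.join(release_dir, "write.lock"))   # never ship a stale lock file

  # Point a temporary symlink at the new release, then rename it over
  # "current" so the switch is atomic for readers.
  FileUtils.ln_sf(release_dir, "#{current}.tmp")
  File.rename("#{current}.tmp", current)

  # The searcher only ever opens base_dir/current, so it just needs to
  # be reopened after the symlink flips; old releases can be pruned later.
end
```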