Is it possible to migrate Memcached data to Redis?

I have searched Google and SO and surprisingly haven't been able to find a very good answer to this. Is there any sort of method or best practice for migrating data from a Memcached cluster into a Redis cluster? Are there any tools or tricks to help with this process?
If it's not possible, is there any recommendation for how to handle this kind of scenario? I assume that pointing to a new cache that hasn't been pre-warmed will cause some application slowness; I'm just wondering if there are any workarounds for this, assuming a migration of the Memcached data isn't possible.
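One workaround, if the application side can supply the list of keys (Memcached itself has no reliable way to enumerate them), is a small copy script that pre-warms Redis before the cutover. A minimal Python sketch, assuming plain string values and a hypothetical key list:

    # Minimal pre-warming sketch, not a supported migration tool.
    # Memcached cannot reliably list its keys, so known_keys below is a
    # hypothetical list you would have to supply from your application.
    import memcache   # pip install python-memcached
    import redis      # pip install redis

    mc = memcache.Client(['127.0.0.1:11211'])
    r = redis.Redis(host='localhost', port=6379)

    known_keys = ['user:1', 'user:2', 'session:abc']  # hypothetical

    for key in known_keys:
        value = mc.get(key)
        if value is not None:      # skip keys that have already expired
            r.set(key, value)      # add ex=<seconds> if you need to keep a TTL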

Related

Could this be used for Redis?

I have been advised to take a look at Redis for a Node.js app I am building. I think it looks really neat! I have gone through the Try Redis tutorial but am confused on one point: how do you model something like this in a key-based architecture?
When someone visits my webpage, I would like to store the date, OS, browser, and browser version to display as stats on the page.
I could store it all as completely separate keys, but that would make it impossible to answer something like: how many people visited my site yesterday on Windows running Chrome 28?
How would you model something like this in Redis? Should I use SQL instead? Thanks for all of the help.
I don't think Redis is the best tool for what you want to do. Redis is great, but it is not much more than a key/value store. Although you can search for patterns in keys and build indexes to do fancy things (especially date ranges, in your case), it would lead to a messy database, both to build and to use.
I have used Node.js + Redis before, and for pub/sub functionality it's nice. But in your case you will be better off with SQL. Or check out MongoDB; it's really powerful and easy to use with Mongoose.
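That said, if you do stay with Redis, the "fancy indexing" mentioned above usually means pre-aggregating a counter for every combination you intend to query, since there are no ad hoc joins. A rough Python sketch (all key names are invented for illustration):

    # Composite-key counter approach: one INCR per dimension combination
    # you care about, so reads are O(1) GETs. Assumes values are
    # normalized (lowercase) before being used in keys.
    import datetime
    import redis

    r = redis.Redis()

    def record_visit(os_name, browser, version):
        day = datetime.date.today().isoformat()        # e.g. '2013-08-01'
        r.incr('visits:%s' % day)                      # total per day
        r.incr('visits:%s:%s' % (day, os_name))        # per day + OS
        r.incr('visits:%s:%s:%s:%s' % (day, os_name, browser, version))

    # "How many people visited yesterday on Windows running Chrome 28?"
    yesterday = (datetime.date.today() - datetime.timedelta(days=1)).isoformat()
    count = int(r.get('visits:%s:windows:chrome:28' % yesterday) or 0)

The drawback is exactly the messiness described above: every question you want to answer has to be anticipated at write time.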

Easiest API or methodology to learn for creating web applications that run MapReduce on Hadoop?

I have Hadoop 1.0.4 running on Ubuntu 11.04, configured with Eclipse. I want to make a web application to run Hadoop jobs; or maybe Cassandra, HBase, and Hive might be a way. But I don't have much time to learn all of these thoroughly, and I want to do it as quickly as possible. Any advice on which one might prove the easiest to get started with?
I don't know if this question really qualifies to be here on SO in its current form. This is the reason I did not write this initially. But a lot of SO experts are out there to decide this (they can do it much better than me) :)
Having said that, I would like to share a few things with you based on my personal experience, so that you proceed down the correct path. First of all, Hadoop jobs (MapReduce) and Hive are actually not a good fit for web-service-style use cases; they are most suitable for offline batch processing. HBase/Cassandra can be used, though, if you have real-time needs (like web services).
Coming back to your actual question: before diving into Hadoop, Hive, HBase, etc., I would suggest you get a hold on web services first (if you are new to web services as well). The reason is that a web service has a much wider scope of applicability than tools like Hadoop, Hive, and HBase. Those tools are specific to particular use cases and cannot be used everywhere, whereas web services are used almost everywhere and with any number of different things, like RDBMSs, NoSQL datastores, etc. So if you know web service concepts, you definitely have that extra edge. To begin with, you can visit these links:
The Web Services Tutorial by W3Schools (nice and easy; it serves the quick-start purpose).
For a detailed tutorial, you can visit the Oracle web services tutorial.
This link from IBM developerWorks has references to some really good web services learning material.
You might find this one really helpful to start with (it shows how to create web services using Eclipse).
And you can obviously Google for web service tutorials anytime.
One last thing: although it's not mandatory to be a pro at things like Hadoop, Hive, and HBase, having a decent understanding of the concepts will help you develop your solution in a much better manner. It'll allow you to think accurately, in the correct direction.
HTH.

NHibernate Search Clustered Lucene Index

We are using NHibernate Search in an application which is going to be clustered.
I have been reading up on the approaches for maintaining separate collections, in particular the master/slave configuration, and I was wondering how to go about implementing it using MSMQ, if indeed there is an implementation for this at this time. The JMS implementation (as described in NHibernate Search in Action) seems a little daunting to me, especially as we are using a .NET environment.
Alternatively, I'm open to suggestions with regard to instantiating local RAMDirectories for the Lucene collections. I know that Lucene can build a RAMDir from an FSDir, and I know how to initialise an NHibernate app with a blank RAMDir, but I'm getting a little lost when it comes to initialising an app with a RAMDir from an existing (network-shared) FSDir.
Or indeed any other approaches.
Cheers,
Steve
I have actually come across this very problem recently, primarily because we shared an index across several web apps for real-time updates to the indexes. However, we suffered from index corruption and couldn't really figure out why, and it wouldn't work in a clustered environment either.
My approach was this: use a service that indexes new entities on a schedule at very frequent intervals, as well as reindexing everything at longer intervals. I also run an optimize automatically, since NHibernate Search doesn't seem to support auto-optimize yet.
At application start, I index everything into a RAMDirectoryProvider.
The choice you make depends highly on the data that you want to index, how sensitive you are to delays in that data, and how frequently it changes. In my case it was to allow text searches across product data for a website, so any delay was fine with me.
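The shape of that service, sketched in Python purely to show the scheduling idea (the three indexing functions are hypothetical stand-ins for the real .NET/NHibernate Search calls):

    # Scheduling sketch only; the indexing functions are hypothetical
    # stand-ins for the real (.NET) NHibernate Search work.
    import time

    INCREMENTAL_EVERY = 30           # seconds between incremental passes
    FULL_REINDEX_EVERY = 6 * 3600    # seconds between full rebuilds

    def index_new_entities():
        pass  # stand-in: index entities changed since the last pass

    def reindex_everything():
        pass  # stand-in: rebuild the whole index from the database

    def optimize_index():
        pass  # stand-in: Lucene optimize (no auto-optimize in NHSearch yet)

    def run_index_service():
        last_full = 0.0
        while True:
            if time.time() - last_full > FULL_REINDEX_EVERY:
                reindex_everything()
                optimize_index()
                last_full = time.time()
            else:
                index_new_entities()
            time.sleep(INCREMENTAL_EVERY)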
I did some brief research on master/slave providers; however, I felt that NHibernate Search is rather immature compared to the original Java implementation.
For me, an optimal solution would be a master/master provider that cross-applies all updates to the index on all nodes. I haven't researched how much work it would be to write a DirectoryProvider myself; that would be an option, but a lot of effort as well.

NHibernate with Sql Azure and Sharding

Does anyone have any good sources of information on using NHibernate with SQL Azure, given the implications of sharding (because of the 10 GB cap)? I know there are posts on the internet that reference a sharding project for NHibernate, but they are from the third quarter of '09, and I haven't found anything much more recent on Google.
Relatedly, does anyone have information about manually implementing sharding if the sharding project isn't viable to use yet? Would it be as simple as creating a session factory for each shard and keeping a collection of factories? Reproducing the ISession calls through each factory seems problematic; I suppose it could be achieved by passing operations as Funcs that get invoked on the ISession from each factory, but that seems more like the wrong path to be going down.
I wrote a proof of concept about a month ago using NHibernate on SQL Azure with sharding. As you've pointed out, there are aspects that just do not feel right about it. Until the NH support has evolved, you may have to try a few things to find out what works best for you. I can tell you a general flow of how it worked for us.
We implemented a simple sharding strategy factory that provides strategies that decide which shard to place you in based on our needs. Your needs may vary here. The key is creating strategies that process, merge and order your query results. From there, session creation and usage is all the same as any other session usage, which is highly desirable.
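Reduced to a sketch (in Python only for brevity; the real code was C#/NHibernate, and every name here is invented), the strategy boils down to a stable hash for routing writes and a fan-out-plus-merge for queries:

    # Illustration only: names are invented and the real implementation
    # was C#/NHibernate. One session factory per shard; the strategy
    # routes single-entity work by key and merges fanned-out queries.
    import hashlib

    SHARDS = ['Server=shard0;Database=app0;',
              'Server=shard1;Database=app1;']   # hypothetical connection strings

    def shard_index(key):
        # Stable hash of the shard key -> index into the factory list.
        return int(hashlib.md5(str(key).encode()).hexdigest(), 16) % len(SHARDS)

    class ShardingStrategy:
        def __init__(self, factories):
            self.factories = factories          # one factory-like object per shard

        def session_for(self, key):
            return self.factories[shard_index(key)].open_session()

        def query_all(self, run_query):
            # Fan the query out to every shard, then merge and order results.
            results = []
            for factory in self.factories:
                with factory.open_session() as session:
                    results.extend(run_query(session))
            return sorted(results)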
EDIT: I know this post by Ayende is a few months old, but it's exactly how we implemented it, and it works. The rumor is that better support in NHibernate will be coming.

Amazon SimpleDB

Has anyone considered using something along the lines of the Amazon SimpleDB data store as their backend database?
SQL Server hosting (at least in the UK) is expensive, so could something like this, along with cloud file storage (S3), be used for building apps that can grow with demand?
Great in theory, but would anyone consider using it? In fact, is anyone actually using it now for real production software? I would love to read your comments.
This is a good analysis of Amazon services from Dare.
S3 handles what I've typically heard described as "blob storage". A typical Web application has media files and other resources (images, CSS stylesheets, scripts, video files, etc.) that are simply accessed by name/path. However, a lot of these resources also have metadata (e.g. a video file on YouTube has metadata about its rating, who uploaded it, number of views, etc.) which needs to be stored as well. This need for queryable, schematized storage is where SimpleDB comes in. EC2 provides a virtual server that can be used for computation, complete with a local file system instance which isn't persistent if the virtual server goes down for any reason. With SimpleDB and S3 you have the building blocks to build a large class of "Web 2.0"-style applications once you throw in the computational capabilities provided by EC2.
However, neither S3 nor SimpleDB provides a solution for a developer who simply wants the typical LAMP or WISC developer experience of building a database-driven Web application, or for applications that may have custom storage needs that don't fit neatly into the buckets of blob storage or schematized storage. Without access to a persistent filesystem, developers on Amazon's cloud computing platform have had to come up with sophisticated solutions involving backing data up manually from EC2 to S3 to get the desired experience.
I just finished writing a library to make porting an app to SimpleDB in Perl easy, Net::Amazon::SimpleDB::Simple, because I found the Amazon client libraries painful. The library isn't on CPAN yet, but it is at http://rjurneyopen.s3.amazonaws.com/SimpleDB/Simple.pm The idea was to make it trivial to stuff hashes into and out of SimpleDB.
I just ported an app to use it. Overall I am impressed with SimpleDB... even inefficient queries take only 2-3 seconds to return. SimpleDB doesn't seem to care about the size of your table, owing to its Erlang/parallel nature. Table scans are easy for it.
The pain comes from the fact that you can't count, sum, or group by. If you plan on doing any of those things... then SimpleDB probably isn't for you. At the moment, in terms of functionality it sits somewhere between memcached and MySQL. You can SELECT ... ORDER BY ... LIMIT, which is nice. It's also nice that you don't have to scale it yourself, and that it doesn't care how much you stuff into it. But more advanced operations like analytics are painful at best; you'll have to do your own calculations server side. It's also a big plus that on any computer I can use the simpledb CLI http://code.google.com/p/amazon-simpledb-cli/ to query my data.
There are some confusing gotchas. For instance, attributes can have more than one value, and you have to explicitly set 'replace' when storing items. Also, storing undef or a null string results in a library error, instead of deleting that attribute name/value pair or setting it to a null/empty string.
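The same gotchas are visible from any client; for example, a small sketch using Python's boto library (the domain and attribute names are made up):

    # Sketch of the gotchas above using boto (pip install boto).
    # Domain and attribute names are made up for illustration.
    import boto

    sdb = boto.connect_sdb()                  # reads AWS keys from env/config
    domain = sdb.create_domain('products')    # no-op if it already exists

    # Attributes are always strings and may hold several values at once.
    domain.put_attributes(
        'item123',
        {'color': ['red', 'blue'],            # multi-valued attribute
         'price': '00019.99'},                # zero-pad: comparisons are lexicographic
        replace=True)                         # without this, values accumulate

    rows = sdb.select(domain, "select * from `products` where color = 'red'")
    for item in rows:
        print(item.name, dict(item))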
Learning to think in a largely un-normalized way is a little strange too, which is why I would second the suggestion above that it is best for new applications. Porting from a SQL app to SimpleDB would be painful because your application logic would have to change; the way you do things is a bit different. The Amazon docs are pretty good at explaining this.
All of this can be hidden in a library that sits atop SimpleDB, so you will want to pick a good library for your use of SimpleDB... you probably don't want to deal with it directly. There is some work on the PHP side to make things easy, and there is my library. There is a Rails ActiveResource, but it doesn't seem to do much for you.
All in all, it's still early in the game, but compared to other APIs (Twitter comes to mind), I have to say that the SimpleDB REST API is pretty simple (especially considering that it is XML) and polite to work with. I would recommend it... depending on the requirements of your application and the economics of your use of it. If you're looking to rapidly scale a service that doesn't put a great load on the DB and don't want to bother with a scalable MySQL/memcached combo... then SimpleDB can offer a 'simple' solution for you.
I expect that its features will continue to grow and it will be a good choice for more and more applications that do more complex and interesting things. But right now it is targeted at and appropriate for your typical Web 2.0 service.
We are using SimpleDB almost exclusively for our new projects. The zero-maintenance, high-availability, no-install aspects are just too good. And for you Ruby developers, check out SimpleRecord, an ActiveRecord-like interface for SimpleDB which makes it super easy to use.
But do you really need SQL Server? Can't you live with PostgreSQL or MySQL? Both have proven to be ok for most tasks.
Now if you need SQL Server features then you're out of luck.
Another option is to rent a server. How expensive is expensive?
(I've used Amazon S3 to store images for an application, it's ok and works fine, at least for that)
I haven't used SimpleDB, but I have been using a combination of S3, EC2, and MySQL for our application.
If you are willing to use SimpleDB, then you might as well consider using MySQL (which is very scalable, and not that expensive).
On the S3 and EC2 side, it is great in practice as well.
SimpleDB works great for many applications... if your project will require a lot of analytic reporting, joins, etc., you may want to consider MySQL or a hybrid model.
If you go with SimpleDB, we've developed Radquery.com for our internal use and opened it up to the public.