hadoop.mapred vs hadoop.mapreduce? - apache

Why are there two separate MapReduce packages in Apache's Hadoop package tree:
org.apache.hadoop.mapred
http://javasourcecode.org/html/open-source/hadoop/hadoop-1.0.3/org/apache/hadoop/mapred/
org.apache.hadoop.mapreduce
http://javasourcecode.org/html/open-source/hadoop/hadoop-1.0.3/org/apache/hadoop/mapreduce/
Why are they separated out? Is there documentation that clarifies this?

They are separate because the two packages represent two different APIs. org.apache.hadoop.mapred is the older API and org.apache.hadoop.mapreduce is the newer one. The new API was introduced to let programmers write MapReduce jobs in a more convenient and flexible way. For example, in the new API Mapper and Reducer are classes rather than interfaces, and a single Context object replaces the old OutputCollector and Reporter. You might find this presentation useful, which talks about the differences in detail.
Hope this answers your question.

Related

How to generate understanding of PLSQL package, procedure or function?

We can generate an ER diagram in SQL Developer, which helps us understand tables better.
Similarly, is it possible to generate some kind of document that gives an overview of what a package, procedure, or function is doing?
I'm asking because my project has very long packages, around 10,000 lines, and reading them takes a lot of time. Being able to generate some kind of document to aid understanding would be very helpful.
In my experience, there is no tool that will generate documentation from PL/SQL code just by reading the code, without any comments.
However, the following tools may be of some help.
Pldoc
Pldoc is an open-source utility for generating HTML documentation of code written in Oracle PL/SQL.
http://pldoc.sourceforge.net/maven-site/
However, you will have to provide comments in your packages and functions in PLdoc style to ensure that documentation gets created.
Toad's Code Xpert
http://www.toadworld.com/products/toad-for-oracle/w/toad_for_oracle_wiki/11088.code-complexity-analysis-using-toad
This tool will perform an automated review on your code and provide a report. It will also provide a CRUD matrix which you might find useful.
PLSQL Doc Plugin
https://www.allroundautomations.com/plsplsqldoc.html
Similar to PLdoc.
Natural Docs
http://www.naturaldocs.org/
Open-source documentation generator for multiple programming languages.
There is no silver bullet - you cannot automagically create documentation for code.
Worse - the "auto-doc" tools typically look at comments, but there's no guarantee the comments match the code.
However, "working with legacy code" is a common problem. You might want to read this answer, and the book it refers to.
I'm happy with this (new) tool: https://github.com/teotiger/pldocu
It's limited, and I miss some automatic export formats (e.g. HTML), but you can try to add those yourself. For me it's okay.

'Queueing' tutorials and documentation?

I'm looking for articles and references that give an overview of 'queueing' (I'm probably not even using the right term here). I'm hoping for an introductory styled guide through a world of Redis, RabbitMQ, Celery, Kombu, and whatever other components exist that I haven't read about yet, and how they fit together.
My problem is that I need to queue up background tasks issued by my Django website, and every blog and article I read recommends a different solution.
Lots of options are available to you, and your choice will likely come down to personal preference and which dependencies you feel comfortable installing.
I'll give a vote for Redis. I evaluated RabbitMQ, ActiveMQ, HornetQ, and Redis and found Redis to offer the best mix of ease of installation, simplicity, and performance.
It's technically not a message queue, but the push/pop primitives for the list types provide atomic queue-like operations, so it can effectively be used as a queue. It has worked well for us.
One Python-specific project on top of Redis you might look at:
http://richardhenry.github.com/hotqueue/tutorial.html
Very simple. But again, all the other options, like Celery, are viable too.
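To make the list push/pop idea concrete, here is a minimal sketch of a FIFO task queue built on LPUSH and BRPOP. The queue name, the task fields, and the InMemoryClient stand-in are made up for illustration; with a real server you would pass a redis-py client (redis.Redis()) instead, which has lpush and brpop methods with these shapes. Treat it as a starting point, not a production worker.

```python
import json


def enqueue(client, queue_name, task):
    """Producer side: push a JSON-encoded task onto the head of the list."""
    client.lpush(queue_name, json.dumps(task))


def dequeue(client, queue_name, timeout=5):
    """Consumer side: wait for a task, or give up when the timeout expires.

    BRPOP pops from the tail, so combined with LPUSH the list behaves FIFO.
    Returns the decoded task, or None if nothing arrived in time.
    """
    item = client.brpop(queue_name, timeout=timeout)
    if item is None:
        return None
    _key, payload = item  # brpop returns a (list name, value) pair
    return json.loads(payload)


class InMemoryClient:
    """Hypothetical stand-in with the same two methods, so the sketch runs
    without a Redis server. It is not part of any library."""

    def __init__(self):
        self._lists = {}

    def lpush(self, name, *values):
        items = self._lists.setdefault(name, [])
        for value in values:
            items.insert(0, value)
        return len(items)

    def brpop(self, name, timeout=0):
        items = self._lists.get(name)
        if not items:
            return None  # a real server would block here until the timeout
        return (name, items.pop())


if __name__ == "__main__":
    # With a real server, swap in: client = redis.Redis()  (redis-py package)
    client = InMemoryClient()
    enqueue(client, "tasks", {"action": "send_email", "user_id": 42})
    enqueue(client, "tasks", {"action": "resize_image", "image_id": 7})
    print(dequeue(client, "tasks"))
    print(dequeue(client, "tasks"))
```

In a Django setup the view would call enqueue and a separate long-running worker process would loop on dequeue. Libraries like HotQueue or Celery wrap this same pattern and add retries, scheduling, and result handling.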
RabbitMQ has a good introduction here: http://www.rabbitmq.com/getstarted.html There are even examples in Python.
HornetQ has very good documentation, and it's simple to install.
You can find the documentation at www.hornetq.org, and several examples ship with the distribution.

Disqus vs custom made vs some other option for adding comments in Rails 3?

I need a commenting system for a site I am working on. It needs to be able to be polymorphic so it can support different types of models that need comments. What would be the smarter thing to do (less code, complexity, and effort required): use Disqus, create my own system, or use some other commenting system (if you know any others please tell me)? Any advice on this topic would be much appreciated. Thanks!
Disclaimer up-front: I haven't used Disqus but they seem to be the best combination of power and effort for your needs, which are relatively straightforward. Because they are a dedicated and competent firm (and used by millions of sites) you are likely to have more features, support, and easy upgrades than if you were to build on your own.
I might have suggested a custom-built solution if your requirements were unusual in any way, but they seem to be typical for a modern commercial site. Good luck!

Where do I begin learning Lucene.NET Solr Hadoop and MapReduce?

I'm a .NET developer and I need to learn Lucene so we can run a very large-scale search service that removes entries the end user doesn't have access to (i.e., a user can search for all documents with clearance level 3 or higher, but not clearance level 2 or 1).
Where do I start learning, which products should I consider? To be honest, I'm a little overwhelmed, but I'm determined to figure it all out... eventually.
If you want a book that covers all the basics of Lucene, consider "Lucene in Action". Even though the code samples are Java, you can easily port them to .NET. Of course, there also are tonnes of resources on the web, such as SO and the Lucene mailing lists which should help you along.
For the project you describe, you should look at Solr, since it abstracts away many of the scalability issues and integrates easily into your .NET app via SolrNet. To restrict access by level, your index documents should contain a field called "Level" (say), and behind the scenes you append a "Level:Level-1" clause to the user's query, using a boolean query construct.
At this stage, my recommendation would be to stay away from Hadoop (Apache's MapReduce implementation) for your project and stick with Solr. If you are keen to learn about it, however, it too has a very useful book: you guessed it, "Hadoop in Action" (also from Manning Publications).
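As a rough illustration of the level restriction, the sketch below builds a Solr select URL in which the user's text goes into q and the clearance check goes into a separate fq (filter query) parameter. Several things here are assumptions rather than part of the advice above: that "Level" is indexed as a numeric field, that a range such as Level:[3 TO *] expresses the rule you want, and the host and core name in the example. Using fq instead of appending a clause to q is a swapped-in variant of the boolean-query approach. It keeps the restriction out of the text the user controls.

```python
from urllib.parse import urlencode


def build_select_url(base_url, user_query, min_level):
    """Build a Solr /select URL that restricts results by clearance level.

    The user's text goes in `q`; the access restriction goes in `fq`,
    so it cannot be changed from the search box.
    """
    params = {
        "q": user_query,
        # Range query: documents whose Level is min_level or higher.
        # Assumes "Level" is a numeric field in the index schema.
        "fq": f"Level:[{int(min_level)} TO *]",
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host and core name, for illustration only.
    print(build_select_url("http://localhost:8983/solr/documents", "budget report", 3))
```

The same parameters can be set from .NET through SolrNet. The important part is that the filter is added on the server side, based on the authenticated user, and never taken from the request.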
You seem to be confused about what exactly each project (Lucene/Solr/Hadoop/etc) does. So the first thing to do would be understanding the purpose of each project. Read the docs and blogs about them. If possible, buy and read books about them.
For example, MapReduce and Hadoop have nothing to do with your security requirements. Hadoop is a platform for distributed, scalable computing. But Solr is scalable on its own. You might want to use Hadoop to distribute a crawler though (e.g. Nutch).

Pl/SQL Package Inline Documentation

I am attempting to more fully document our database packages as an API. What we would like is something like JavaDocs for PL/SQL. I have seen a couple tools out there (pldoc, plsqldoc) but would like to know from people who use them how they compare.
I have used PlDoc and find it really good. I haven't used any other tools, so I can't compare. I found PlDoc did the basics well. I wanted some more advanced features, so I built our own tool that adds extensions to PlDoc for more tags. Also, I don't just generate documentation with it; I also generate our package headers using some PlDoc tags (e.g. #private).
I recommend you try PlDoc, then tweak whatever doesn't meet your needs. It doesn't take long to set up, so it's not a huge time investment to try it.
I've been using NaturalDocs for a few years now and have found it easy to install and use.
It's pretty much like JavaDocs and supports multiple languages although I've only used it with PL/SQL.
Very configurable although I've not found it necessary to fiddle with that.