First time MapReduce: I need to combine a distinct and count, please help - sql

I have a collection and need to get a distinct count from the data set in MongoDB
db['2011-05-29'].distinct("plugins.HTTPServer.string");
returns all the distinct names for the key
How would I go about getting a count for every occurrence of a particular string?
Example:
Apache 29172
IIS 3932
I've looked at some MapReduce examples but can't seem to get it to work right. As my counts add up to more than the actual items in the collection.
db['2011-04-13-1pm-scan'].distinct("plugins.HTTPServer.string").length;
returns the number of distinct items in that key.
I however want the Key Value and Count for each, as above.

Your question is 100% exactly what the wordcount demo application does.
It's part of the standard set of examples shipped with Hadoop and it's also explained here in great detail in these pages
http://wiki.apache.org/hadoop/WordCount
http://developer.yahoo.com/hadoop/tutorial/module4.html#wordcount
HTH

Related

How to get the data of the newly accessed record by a query on PostgreSQL using it's internal variables and functions?

Let's say I have the following 'items' table in my PostgreSQL database:
id
item
value
1
a
10
2
b
20
3
c
30
For some reason I can't control I need to run the following query:
select max(value) from items;
which will return 30 as the result.
At this point, I know that I can find the record that contains that value using simple select statements, etc. That's not the actual problem.
My real questions are:
Does PostgreSQL know (behind the scenes) what's is the ID of that
record, although the query shows only the max value of the column
'value'?
If yes, can I have access to that information and,
therefore, get the ID and other data from the found record?
I'm not allowed to create indexes and sequences, or change way the max value is retrieved. That's a given. I need to work from that point onward and find a solution (which I have, actually, from regular query work).
I'm just guessing that the database knows in which record that information (30) is and that I could have access to it.
I've been searching for an answer for a couple of hours but wasn't able to find anything.
What am I missing? Any ideas?
Note: postgres (PostgreSQL) 12.5 (Ubuntu 12.5-0ubuntu0.20.10.1)
You can simply extract the whole record that contains max(value) w/o bothering about Postgres internals like this:
select id, item, "value"
from items
order by "value" desc
limit 1;
I do not think that using undocumented "behind the scenes" ways is a good idea at all. The planner is smart enough to do exactly what you need w/o extra work.

Filter on Count Aggregation

I have been looking for the solution on the internet from quite a while and I'm still not sure that if it is possible on Kibana or not.
Suppose I apply filter on term and it gives me count of the respective terms but I want the results to show only those terms where count equals a specific value.
Being more specific,
I want to find out the number of tills which are the most busy (most number of transactions). Currently when I apply a filter on term and count it shows me the all the tills with their respective transaction count. What I want is that to show only those tills where the count is equal to let's say 10.
In other words a similar functionality like HAVING clause in relational dbms.
I have found a lot of work arounds of the same usecase but I'm looking for a solution.
I hope I understand what you're asking. I think You can search the field in question with proper parameters. For example, for the field 'field_name' with more than 10 hits, try following Lucene query:
field_name:(*) AND count:[10 TO *]
For the exact result of field_name with count=10, query:
field_name:(*) AND count:[10]
Let me know if this was what you were looking for!

Count the number of values in a range that are not defined in a list

I have a table that contains details of all of our companies mobile phones. Next to this table, I need some basic stats like how many handsets of each OS there are.
Using COUNTIF I can work it all out, apart from Other. Is there a way of calculating the number of values that do not equal anything in a list of values?
This works for just 'not Android' -
=COUNTIF(Mobiles[OS], "<>Android")
But this does not work when trying to exclude all the major players -
=COUNTIF(Mobiles[OS], AND("<>Android", "<>BlackBerry", "<>iOS", "<>Windows"))
Does anybody know how I can achieve this? Thanks.
This works, it's just not very clever
=COUNTIFS(Mobile[OS],"<>Android",Mobile[OS],"<>Blackberry", Mobile[OS],"<>iOS",Mobile[OS],"<>Windows",)
Don’t count Other, instead count All and subtract Specific (as derived from COUNTIF).

Alfresco: Lucene query by ID returns 2 rows

I'm using Alfresco 3.4d and imported some nodes as well as created a few with NodeService. Today I noticed that a Lucene query by ID does sometimes return two rows instead of just one. Not all nodes show this kind of behavior.
For example, when I execute the following Lucene query in the Alfresco Node Browser, I get the result shown below: ID:"workspace://SpacesStore/96c0cc27-cb8c-49cf-977d-a966e5c5e9ca"
How is it even possible that a query by ID can return more than one row? I tried rebuilding the Lucene index, but it didn't help. When I delete the node, the query returns 0 rows. What can I do to remove those "ghost" nodes from the query result?
I also ran across this problem and asked the Alfresco support for advice. They told me that it is perfectly normal to have duplicate entries in the lucene ID field and that this is related to whether there is an ANCESTOR present or not. They recommended using the sys:node-uuid field when doing a lucene search for the node's ID, e.g.:
#sys\:node-uuid:f13a21dd-b020-4c70-aa21-1a0e5c89d42b
I've seen this problem since Alfresco 3.2r, but maybe it is even older! I used the Lucene index Viewer "Luke" (http://www.getopt.org/luke/) to check the index directly and I saw that the corrupt index entry contains almost no information. As workaround we combined our search to some basic information like node type or aspect. I will ask a colleague if he has more information about this.
I don't know directly how this is possible but in your 'code' where you retrieve the nodes you could always do: if node.isDocument or node.isContainer to get true result or type is cm:content or cm:folder.
You could also try to re-index, but I doubt that will be of any help

Counting occurence of each distinct element in a table

I am writing a log viewer app in ASP.NET / C#. There is a report window, where it will be possible to check some information about the whole database. One kind of information there I want to display on the screen is the number of times each generator (an entity in my domain, not Firebirds sequence) appears in the table. How do I do that using COUNT ?
Do I have to :
Gather the key for each different generator
Run one query for each generator key using count
Display it somehow
Is there any way that I can do it without having to do two queries to the database? The database size can be HUGE, and having to query it "X" times where "X" is the number of generators would just suck.
I am using a Firebird database, is there any way to fetch this information from any metadata schema or there is no such thing available?
Basically, what I want is to count each occurrence of each generator in the table. Result would be something like : GENERATOR A:10 times,GENERATOR B:7 Times,Generator C:0 Times and so on.
If I understand your question correctly, it is a simple matter of using the GROUP BY clause, e.g.:
select
key,
count(*)
from generators
group by key;
Something like the query below should be sufficient (depending on your exact structure and requirements)
SELECT KEY, COUNT(*)
FROM YOUR_TABLE
GROUP BY KEY
I solved my problem using this simple Query:
SELECT GENERATOR_,count(*)
FROM EVENTSGENERAL GROUP BY GENERATOR_;
Thanks for those who helped me.
It took me 8 hours to come back and post the answer,because of the StackOverflow limitation to answer my own questions based in my reputation.