Running queries in MongoDB for data extraction

I am trying to extract data from a MongoDB database by running queries in Robomongo, in order to get the data of one specific user, but the query I wrote does not work. Can anybody help me with the mistake(s) I've made? Here is the query:
db.getCollection('restaurants').find({
    "user" : "auth0|5845e0a768022a5112b0d51c",
    "provider" : "fitbit",
    "data" : {
        "activities-steps-intraday" : {
            "datasetType" : "minute",
            "datasetInterval" : 1
        }
    }
})
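One likely culprit: if the stored documents contain any other fields inside "data" or "activities-steps-intraday", the query above returns nothing, because an embedded-document query only matches when the subdocument is exactly identical. A sketch of the same query using dot notation, which matches the individual nested fields regardless of what else the subdocument contains (field names taken from the query above):

db.getCollection('restaurants').find({
    "user" : "auth0|5845e0a768022a5112b0d51c",
    "provider" : "fitbit",
    "data.activities-steps-intraday.datasetType" : "minute",
    "data.activities-steps-intraday.datasetInterval" : 1
})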

Related

Is it correct to do 1-to-1 mapping in Update API request param

There is a need for me to do a bulk update of user details.
Let the user object have the following fields:
User First Name
User ID
User Last Name
User Email ID
User Country
An admin can upload updated user data through a CSV file, and values with mismatching data need to be updated. The most probable request format for this bulk update request will be like this (Method 1):
"data" : {
"userArray" : [
{
"id" : 2343565432,
"f_name" : "David",
"email" : "david#testmail.com"
},
{
"id" : 2344354351,
"country" : "United States",
}
.
.
.
]
}
Method 2: I would send the details grouped by field, each group containing a list of user ids and the corresponding field values:
"data" : {
"userArray" : [
{
"ids" : [23234323432, 4543543543, 45654543543],
"country" : ["United States", "Israel", "Mexico"]
},
{
"ids" : [2323432334543, 567676565],
"email" : ["groove#drivein.com", "zara#foobar.com"]
},
.
.
.
]
}
In Method 1, I need to query the database for every user update, so the number of queries grows with the number of users edited. In contrast, with Method 2 I query the database only once per param (I add the array to the query and fetch, in a single query, all rows whose user id is present in the array), and I can then update each row with its respective details.
Most update APIs I found on the internet take params in the format of Method 1, which gives good readability. But what would be the advantage of going with Method 1 rather than Method 2? (With Method 2 I save some query time when the number of users is large, which can improve performance.)
I almost always see it being Method 1 style.
With that said, I don't understand why your DB performance should depend on the way the input data is structured; that's just the way information gets into your code.
You can have the client send the data as Method 1 and then shim it to Method 2 on the backend if that helps you structure the DB queries better.
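A minimal sketch of that shim, assuming a plain JavaScript backend (the function name and payload shape are illustrative, taken from the Method 1 example above):

function groupUpdatesByField(userArray) {
  // field name -> { ids: [...], values: [...] }, i.e. Method 2 style batches
  const batches = {};
  for (const user of userArray) {
    for (const [field, value] of Object.entries(user)) {
      if (field === 'id') continue; // 'id' is the lookup key, not an updated field
      if (!batches[field]) batches[field] = { ids: [], values: [] };
      batches[field].ids.push(user.id);
      batches[field].values.push(value);
    }
  }
  return batches;
}

// The Method 1 payload above becomes:
// {
//   f_name:  { ids: [2343565432], values: ["David"] },
//   email:   { ids: [2343565432], values: ["david@testmail.com"] },
//   country: { ids: [2344354351], values: ["United States"] }
// }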

Load data from ES and store as Avro in HDFS using Pig

I have some data in Elasticsearch that I need to send to HDFS. I'm trying to use Pig (this is the first time I'm using it), but I'm having trouble defining a correct schema for my data.
First of all, I tried loading the data as JSON using the option 'es.output.json=true' with org.elasticsearch.hadoop.pig.EsStorage. I can load/dump the data correctly, and also save it as JSON to HDFS using STORE A INTO 'hdfs://path/to/store';. Later, by defining an external table in Hive, I can query this data. This is the full example that is working fine (I removed all SSL attributes from the code):
REGISTER /path/to/commons-httpclient-3.1.jar;
REGISTER /path/to/elasticsearch-hadoop-5.3.0.jar;
A = LOAD 'my-index/log' USING org.elasticsearch.hadoop.pig.EsStorage(
    'es.nodes=https://addr1:port,https://addr2:port2,https://addr3:port3',
    'es.query=?q=*',
    'es.output.json=true');
STORE A INTO 'hdfs://path/to/store';
How can I store my data as Avro on HDFS? I suppose I need to use AvroStorage, but should I also define a schema when loading the data, or is the JSON enough? I tried to define a schema with the LOAD...USING...AS command and set es.mapping.date.rich=false instead of es.output.json=true (my data is quite complex, with maps of maps and things like that), but it doesn't work. I'm not sure whether the problem is in the syntax or in the approach itself. It would be nice to have a hint on the correct direction to follow.
UPDATE
This is an example of what I tried with es.mapping.date.rich=false. My problem is that if a field is null, all subsequent fields end up in the wrong order.
A = LOAD 'my-index/log' USING org.elasticsearch.hadoop.pig.EsStorage(
    'es.nodes=https://addr1:port,https://addr2:port2,https://addr3:port3',
    'es.query=?q=*',
    'es.mapping.date.rich=false')
    AS (
        field1:chararray,
        field2:chararray,
        field3:map[chararray,fieldMap:map[],chararray],
        field4:chararray,
        field5:map[]
    );
B = FOREACH A GENERATE field1, field2;
STORE B INTO 'hdfs://path/to/store' USING AvroStorage('
{
    "type" : "foo1",
    "name" : "foo2",
    "namespace" : "foo3",
    "fields" : [ {
        "name" : "field1",
        "type" : ["null","string"],
        "default" : null
    }, {
        "name" : "field2",
        "type" : ["null","string"],
        "default" : null
    } ]
}
');
For future readers: I decided to use Spark instead, as it is much faster than Pig. To save Avro files on HDFS, I'm using the Databricks spark-avro library.

Lucene query where two fields will be compared

I have an Elasticsearch cluster. All documents in the cluster have the same index and type. Each document has two numeric fields: field1 and field2.
I want to display all documents in Grafana, where value of field1 > value of field2.
Is there a query like:
document_type:test AND field1 > field2 ?
As far as I'm aware there is no way to perform that sort of query in plain Elasticsearch (Lucene). It supports range queries, but not comparisons between different fields of the same document.
You can, however, do this with a (Groovy) script query, like this:
{
    "query" : {
        "term" : {
            "document_type" : "test"
        }
    },
    "filter" : {
        "script" : {
            "script" : "doc['field1'].value > doc['field2'].value"
        }
    }
}
See also the documentation on what is available from the Elasticsearch scripting module.
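Note that this syntax changed in later Elasticsearch releases: the top-level filter is gone and Painless replaced Groovy as the default scripting language. A sketch of what the equivalent query looks like on recent versions (roughly 6.x and later; adjust to your version):

{
    "query" : {
        "bool" : {
            "filter" : [
                { "term" : { "document_type" : "test" } },
                {
                    "script" : {
                        "script" : {
                            "source" : "doc['field1'].value > doc['field2'].value",
                            "lang" : "painless"
                        }
                    }
                }
            ]
        }
    }
}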

Count in mongoDB

I have something like this for every user in mongoDB:
{
    "id" : 1234,
    "name" : "Mr. Someone",
    "userdata" : {
        "living" : {
            "city" : "Somecity",
            "address" : "Main Street 10.",
            "zip" : "1023"
        },
        "interest" : "Cars"
    }
}
I'm trying to find a way to count how many subscribers live in Somecity.
My best guess was the following:
db.users.count({userdata:{living:{city:"Somecity"}}})
But the result was 0.
How can I properly count "rows" by a given value in MongoDB?
I've been using MongoDB's documentation (for example: http://docs.mongodb.org/manual/reference/sql-comparison/) but could not resolve my problem yet.
I'm using MongoDB through the shell.
I think I have found the solution to my problem:
db.users.count({"userdata.living.city":"Somecity"})
This "dotting" method allowed me to search for only one value in the array, while the method I tried first wanted to find an exact match.
Further reading: http://docs.mongodb.org/manual/reference/operator/query/elemMatch/
Quote:
Since the $elemMatch only specifies a single condition, the $elemMatch expression is not necessary, and instead you can use the following query:
db.survey.find( { "results.product": "xyz" } )
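For comparison, a quick sketch against the sample document above (assuming it is the only matching user in the collection): the exact-subdocument form fails because "userdata" also contains "interest" and "living" contains more than just "city", while dot notation matches the single nested field on its own.

// Exact subdocument match: returns 0 for the sample document
db.users.count({userdata: {living: {city: "Somecity"}}})

// Dot notation: matches the nested field regardless of sibling fields, returns 1
db.users.count({"userdata.living.city": "Somecity"})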

Simple full text search in ElasticSearch

I'm trying to understand how ElasticSearch Query DSL works.
It would be a lot of help if anyone could give me an example of how to perform a search like the following MySQL query:
SELECT * FROM products
WHERE shop_id = 1
AND MATCH(title, description) AGAINST ('test' IN BOOLEAN MODE)
Assuming that you indexed some documents containing at least the shop_id, title and description fields, something like the following example:
{
    "shop_id" : "here goes your shop_id",
    "title" : "here goes your title",
    "description" : "here goes your description"
}
You can execute a multi match query against multiple fields, and give them a different weight (usually title is more important). You can also combine the query with a term filter on shop_id:
{
    "query" : {
        "multi_match" : {
            "query" : "here goes your query",
            "fields" : [ "title^2", "description" ]
        }
    },
    "filter" : {
        "term" : { "shop_id" : "here goes your shop id" }
    }
}
You need to submit the query using the search API. Filters are used to reduce the set of documents the query is executed against; they are faster than queries since they don't involve scoring and are cached. In my example I applied a top-level filter, which may or may not be a good fit for you depending on what else you want to do next. If you want to compute a facet, for instance, the top-level filter would be ignored in the facet. Another way to add a filter, one that is taken into account while computing facets as well, is the filtered query, sketched below.
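A sketch of the same search expressed as a filtered query (legacy syntax from the same Elasticsearch era as the rest of this answer; the shop_id value is a placeholder as above):

{
    "query" : {
        "filtered" : {
            "query" : {
                "multi_match" : {
                    "query" : "here goes your query",
                    "fields" : [ "title^2", "description" ]
                }
            },
            "filter" : {
                "term" : { "shop_id" : "here goes your shop id" }
            }
        }
    }
}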