Performing a WHERE - IN query in CouchDB

I would like to query for a list of particular documents with one call to CouchDB.
With SQL I would do something like
SELECT *
FROM database.table
WHERE database.table.id
IN (2,4,56);
What is a recipe for doing this in CouchDB by either _id or another field?

Use the view's keys query parameter to fetch the rows whose keys are in a specified set. First define a view that emits the field you want to match on:
function(doc){
emit(doc.table.id, null);
}
And then
GET /db/_design/ddoc_name/_view/by_table_id?keys=[2,4,56]
To retrieve the document content at the same time, add the include_docs=true query parameter to your request.
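As a minimal sketch of the request above, the following Python builds the view URL with the keys list JSON-encoded and URL-escaped. The helper name view_keys_url is hypothetical, and the host, database, design document and view names are taken from the examples in this thread; the code only builds the URL and does not contact a server.

```python
import json
from urllib.parse import urlencode


def view_keys_url(base, db, ddoc, view, keys, include_docs=False):
    """Build a view URL asking for the rows whose key is in `keys`."""
    # CouchDB expects the keys parameter to be a JSON array.
    params = {"keys": json.dumps(keys, separators=(",", ":"))}
    if include_docs:
        params["include_docs"] = "true"
    # urlencode percent-escapes the brackets and commas of the JSON array.
    return f"{base}/{db}/_design/{ddoc}/_view/{view}?{urlencode(params)}"


url = view_keys_url(
    "http://127.0.0.1:5984", "db", "ddoc_name", "by_table_id",
    [2, 4, 56], include_docs=True,
)
print(url)
```

Encoding the list with json.dumps rather than formatting it by hand keeps string keys correctly quoted if the view emits strings instead of numbers.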
UPD: You may instead want to retrieve the documents that these IDs (2, 4, 56) refer to. By default, each view row is associated with the document that emitted it. To change this behaviour, you can use the linked documents trick and emit a value containing the _id of the document to fetch:
function(doc){
emit(doc.table.id, {'_id': doc.table.id});
}
And now the request
GET /db/_design/ddoc_name/_view/by_table_id?keys=[2,4,56]&include_docs=true
will return rows whose id field points to the document that emitted the key 2, 4 or 56, and whose doc field contains the content of the referenced document.

In CouchDB, the Bulk Document API is used for this:
curl -H 'Content-Type: application/json' -d '{"keys":["2","4","56"]}' -X POST 'http://127.0.0.1:5984/foo/_all_docs?include_docs=true'
http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API
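The same bulk fetch can be sketched in Python with only the standard library. The helper name all_docs_request is hypothetical; the host and the foo database come from the curl example above. The request is only constructed here, so nothing is sent until it is passed to urllib.request.urlopen.

```python
import json
from urllib.request import Request


def all_docs_request(base, db, ids):
    """Build a POST to _all_docs that fetches the documents with the given _ids."""
    body = json.dumps({"keys": ids}).encode("utf-8")
    return Request(
        f"{base}/{db}/_all_docs?include_docs=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = all_docs_request("http://127.0.0.1:5984", "foo", ["2", "4", "56"])
# To actually send it (requires a running CouchDB):
#   with urllib.request.urlopen(req) as resp: result = json.load(resp)
```

Document _ids are strings in CouchDB, which is why the keys are passed as "2", "4" and "56" rather than as numbers.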

Related

Solr: store original file offset or record number with token

I have a workflow where there is a layer of pre-processing in order to extract fields - this is later handed to another process to be ingested into Solr. The original files comprise documents with records, think tabular data.
Some of these columns are indexed in Solr in order to get the relevant documentID for that value of the field. I.e. you query like
q=indexedField:indexedValue1
fl= documentId
and have a response like:
... response: {documentID1, documentID3}
assuming indexedValue1 is present in field indexedField in documents documentID1, documentID3.
Each record then has a value in one of the fields we want to index. The pre-processing concatenates these values into one (long) text field, with each value as a token, so you can search by them later. Indexed fields, when handed to Morphlines, look like this:
...
value1 value2 ... valueN
...
Some fields are extracted and then regrouped into a single field, so that when you search by a value, you can tell which document it is in.
(fairly simple until here)
However, how could I also store in Solr, along with each token that I want to search by, its offset (or record number) in the original file? Extracting this information is not the problem (that is a separate problem, and we can solve it).
That is, you would query as above, but get, for each document ID, the original record numbers or file offsets where the matching records are located, something like:
... response:{ {documentID1, [1234, 5678]}, { documentID3, [] } }
Is this possible at all? In that case, what's the correct Solr data structure to efficiently model it?

It sounds like what you are looking for is payloads. This functionality is present in Solr, but it often requires custom code to benefit from it fully.
The challenge, however, is that you seem to want to return the payloads associated with the tokens that matched during the search. That is more complicated: search focuses on returning documents, and extracting what matched within a specific document is a separate challenge, usually solved by highlighters.
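To make the payload idea a little more concrete: Solr's DelimitedPayloadTokenFilterFactory splits each token at a delimiter character (| by default) and stores what follows as that token's payload. Under that assumption, the pre-processing step could emit value|recordNumber pairs instead of bare values. The function name and sample values below are hypothetical, and the sketch covers only the text preparation, not the schema configuration or the retrieval of payloads for matched tokens.

```python
def to_payload_field(values_with_records, delimiter="|"):
    """Join (value, record_number) pairs into one whitespace-separated text field.

    Each token becomes "value|record_number" so that a delimited-payload
    token filter can store the record number as the token's payload.
    """
    tokens = []
    for value, record_number in values_with_records:
        # A value containing the delimiter or whitespace would be split wrongly.
        if delimiter in value or any(ch.isspace() for ch in value):
            raise ValueError(f"value not usable as a single token: {value!r}")
        tokens.append(f"{value}{delimiter}{record_number}")
    return " ".join(tokens)


field_text = to_payload_field([("value1", 1234), ("value2", 5678)])
print(field_text)  # value1|1234 value2|5678
```

The field type would also need the filter's encoder attribute (for example integer) to match how the payload is written; that part of the setup is assumed rather than shown.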

Show results with solr

I'm using Solr: http://lucene.apache.org/solr/
I followed a tutorial to index my collection and ran some simple queries through the graphical interface, available at http://localhost:8983/solr/demo/browse.
Now I would like to run some queries from the command line, so I use this:
curl http://localhost:8983/solr/demo/query -d '
q=*:*'
But this only returns the sorted list of documents with their entire contents, without the score of each one.
So how can I show only the title and the score of each document?

Use the fl (field list) query parameter, and include score in it to get the score, e.g. fl=title,score, assuming the title is stored in a field named title.
curl http://localhost:8983/solr/demo/query -d 'q=*:*&fl=title,score'
For more information: Common Query Parameters
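If the query is built from a script rather than typed by hand, the parameters can be assembled as in this small sketch. The helper name solr_query_body is hypothetical; the field names title and score are the ones used above. It only produces the form-encoded body that would be sent to the /query handler.

```python
from urllib.parse import urlencode


def solr_query_body(query, fields):
    """Form-encode a Solr query that returns only the listed fields."""
    # fl takes a comma-separated field list; "score" is the pseudo-field
    # that asks Solr to include each document's relevance score.
    return urlencode({"q": query, "fl": ",".join(fields)})


body = solr_query_body("*:*", ["title", "score"])
print(body)
```

Using urlencode escapes characters such as * and : so the body is safe to send regardless of what the query string contains.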

RESTful API - URI Structure Advice

I have REST API URL structure similar to:
/api/contacts      GET     Returns an array of contacts
/api/contacts/:id  GET     Returns the contact with id of :id
/api/contacts      POST    Adds a new contact and returns it with an id added
/api/contacts/:id  PUT     Updates the contact with id of :id
/api/contacts/:id  PATCH   Partially updates the contact with id of :id
/api/contacts/:id  DELETE  Deletes the contact with id of :id
My question is about:
/api/contacts/:id GET
Suppose that, in addition to fetching the contact by ID, I also want to fetch it by a unique alias.
What should the URI structure be if I want to be able to fetch a contact by either ID or alias?

If your aliases are not numeric, I would suggest using the same URI structure and working out on your end whether the value is an ID or an alias, just as Facebook does with username and user_id: facebook.com/user_id or facebook.com/username.
Another approach would be to have the client use GET /contacts with some extra query parameters as filters to search for the contact first, and then look up the ID from that response.
The last option, I think, would be a structure like GET /contacts/alias/:alias. But this would imply that alias is a subresource of contacts.

The path and query parts of IRIs are up to you. The path is for hierarchical data, like api/version/module/collection/item/property; the query is for non-hierarchical data, like ?display-fields="id,name,etc..." or ?search="brown teddy bear"&offset=125&count=25.
What you have to keep in mind is that you are working with resources, not operations. So IRIs are resource identifiers, as in DELETE /something, not operation identifiers, as in POST /something/delete. You don't have to follow any particular structure for IRIs; for example, you could simply use POST /dashuif328rgfiwa. The server would understand it, but it would be much harder to write a router for IRIs of this kind, which is why we use nice IRIs.
What is important is that a single IRI always belongs to only one resource. So you cannot read cat properties with GET /cats/123 and write dog properties with PUT /cats/123. What people often don't realize is that a single resource can have multiple IRIs; for example, /cats/123, /cats/name:kitty, /users/123/cats/kitty and /cats/123?fields="id,name" can all belong to the same resource. And if you want to give an IRI to a thing (the living cat, not the document that describes it), you can use /cats/123#thing or /users/123#kitty. You usually do that in RDF documents.
What should be URI structure be if I want to be able to fetch contact
by either ID or Alias?
It can be /api/contacts/name:{name}, for example /api/contacts/name:John, since it is clearly hierarchical. Or you can check whether the param in /api/contacts/{param} is numeric or a string.
You can use the query too, but I don't recommend it. For example, the IRI /api/contacts?name="John" can have two separate meanings: you want to list every contact named John, or you want one exact contact. So you have to establish some conventions for this kind of request in the router of your server-side application.
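A router-side check for the /api/contacts/{param} segment might look roughly like the sketch below. The function name parse_contact_ref and the decision to reject anything that is neither numeric nor prefixed with name: are assumptions made for illustration, not part of any framework.

```python
def parse_contact_ref(segment):
    """Classify the last path segment of /api/contacts/{segment}.

    Returns ("name", <str>) for "name:John" style segments and
    ("id", <int>) for purely numeric ones; anything else is rejected.
    """
    prefix = "name:"
    if segment.startswith(prefix):
        name = segment[len(prefix):]
        if not name:
            raise ValueError("empty name in contact reference")
        return ("name", name)
    # isascii() guards against non-ASCII digit characters that int() rejects.
    if segment.isascii() and segment.isdigit():
        return ("id", int(segment))
    raise ValueError(f"unrecognised contact reference: {segment!r}")


print(parse_contact_ref("123"))        # ('id', 123)
print(parse_contact_ref("name:John"))  # ('name', 'John')
```

Because the name: prefix is explicit, a contact whose name happens to be numeric (name:42) is still unambiguous, which the plain numeric-or-string check cannot guarantee.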

I would consider adding a "search" resource when you are trying to resolve a resource with the alias:
GET /api/contacts/:id
and
GET /api/contacts?alias=:alias
or
GET /api/contacts/search?q=:alias

First of all, the 'ID' in the URL doesn't have to be a numerical ID generated by your database. You could use any piece of data (including the alias) in the URL, as long as it's unique. Of course, if you are using numerical IDs everywhere, it is more consistent to do the same in your contacts API. But you could choose to use the aliases instead of numeric IDs (as long as they are always unique).
Another approach would be, as Stromgren suggested, to allow both numeric IDs and aliases in the URL:
/api/contacts/123
/api/contacts/foobar
But this can obviously cause problems if aliases can be numeric, because then you wouldn't have any way to differentiate between an ID and a (numeric) alias.
Last but not least, you can implement a way of filtering the complete collection, as shlomi33 already suggested. I wouldn't introduce a search resource, as that isn't really RESTful, so I'd go for the other solution instead:
/api/contacts?alias=foobar
Which should return all contacts with foobar as alias. Since the alias should be unique, this will return 1 or 0 results.
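The filtering approach can be illustrated with a small in-memory sketch. The CONTACTS data and the list_contacts function are hypothetical stand-ins for whatever storage and handler the API really uses; the point is only that the collection endpoint always returns a list, which holds one or zero items when a unique alias is given.

```python
# Hypothetical sample data standing in for the real contact store.
CONTACTS = [
    {"id": 123, "alias": "foobar", "name": "Foo Bar"},
    {"id": 124, "alias": "bazqux", "name": "Baz Qux"},
]


def list_contacts(contacts, alias=None):
    """Handle GET /api/contacts, optionally filtered by ?alias=..."""
    if alias is None:
        return list(contacts)
    # Aliases are assumed unique, so this yields at most one contact.
    return [c for c in contacts if c["alias"] == alias]


print(list_contacts(CONTACTS, alias="foobar"))
print(list_contacts(CONTACTS, alias="nobody"))  # []
```

Returning an empty list rather than a 404 keeps the endpoint's meaning consistent: it is a filtered view of the collection, not a lookup of a single resource.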

How could I query embed object field name in mongodb / pymongo?

My users collection used to look like:
{'_id':'xxx', 'hobbies':['Dance','Ski']}
Now I have added "likes" to each hobby, as:
{'_id':'xxx', 'hobbies':{'Dance':5,'Ski':8}}
I want to find the users who share at least one hobby with a given user. My old query was:
db.usr.find({'_id':{'$ne':usr['_id']}, 'hobbies':{'$in':usr['hobbies']} })
Now my query is like:
db.usr.find({'_id':{'$ne':usr['_id']},
'hobbies':{'$in':list(usr['hobbies'].keys())} })
I checked the MongoDB documentation and found nothing that can refer to the field names inside 'hobbies' (the Python dictionary keys). To MongoDB, the new 'hobbies' is an embedded object, whose field names are usually expected to be fixed.
Do I HAVE TO maintain two arrays (in MongoDB) or lists (in Python)? Isn't there a simpler solution?
{'_id':'xxx', 'hobbies':['Dance','Ski'], 'likes':[5,8]}

Unfortunately, MongoDB does not support querying or filtering by field name.
Your options are:
do the filtering on client side, after querying in full
keep hobbies names in an array, like you used to, in order to be able to filter on server side
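One possible workaround when the hobby names to look for are already known on the client, as they are in the question: build an $or of $exists tests, one per embedded field path. The helper name shared_hobby_filter is hypothetical, and the sketch assumes hobby names contain no dots and do not start with $, since those characters are special in field paths.

```python
def shared_hobby_filter(usr):
    """Build a find() filter for other users sharing at least one hobby with usr.

    Assumes usr["hobbies"] is a dict such as {"Dance": 5, "Ski": 8}.
    """
    names = list(usr["hobbies"])
    if not names:
        # MongoDB rejects an empty $or array, so fail early instead.
        raise ValueError("user has no hobbies to match on")
    return {
        "_id": {"$ne": usr["_id"]},
        # One clause per hobby: does the embedded field hobbies.<name> exist?
        "$or": [{f"hobbies.{name}": {"$exists": True}} for name in names],
    }


usr = {"_id": "xxx", "hobbies": {"Dance": 5, "Ski": 8}}
query = shared_hobby_filter(usr)
# With pymongo this would be passed as: db.usr.find(query)
print(query)
```

This keeps the single embedded object, but such a query cannot use one index across all hobby names, so keeping the names in an array, as suggested above, may still be the better fit for large collections.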

RavenDb - What's faster?

I need to do a query in RavenDb and perform a get on a document by Id and a secondary parameter.
To be more precise, I'd like to load a document by document Id and by ApiKey. If the ApiKey of the given document does not match I want a null back.
My question is, is it faster to do a Query with Id and ApiKey comparison, or is it faster to do a Load by Id and throw away the document in code if the ApiKey does not match. My documents are probably 20k in size.

Do a load by id, then compare.
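The load-then-compare pattern can be sketched independently of the client library. Here load is a hypothetical stand-in for the session's load-by-id call, assumed to return None for a missing document, and the ApiKey field name follows the question.

```python
def load_if_key_matches(load, doc_id, api_key):
    """Load a document by id and return it only if its ApiKey matches."""
    doc = load(doc_id)
    if doc is None or doc.get("ApiKey") != api_key:
        return None
    return doc


# Hypothetical in-memory store used in place of a real document session.
store = {"docs/1": {"ApiKey": "secret", "Body": "..."}}

print(load_if_key_matches(store.get, "docs/1", "secret"))  # the document
print(load_if_key_matches(store.get, "docs/1", "wrong"))   # None
```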
