How can I query an embedded object's field names in MongoDB / pymongo?

My users collection used to look like:
{'_id':'xxx', 'hobbies':['Dance','Ski']}
Now I have added "likes", so it looks like:
{'_id':'xxx', 'hobbies':{'Dance':5,'Ski':8}}
I want to find users who have at least one hobby in common. My old query was:
db.usr.find({'_id':{'$ne':usr['_id']}, 'hobbies':{'$in':usr['hobbies']} })
Now my query is like:
db.usr.find({'_id':{'$ne':usr['_id']},
'hobbies':{'$in':list(usr['hobbies'].keys())} })
I checked the MongoDB documentation but found nothing that refers to the field names under 'hobbies' (the keys of the Python dictionary). For MongoDB, the new 'hobbies' is an embedded object, and its field names are usually expected to be fixed.
Do I HAVE TO maintain two arrays (in MongoDB) or lists (in Python)? Isn't there a simpler solution?
{'_id':'xxx', 'hobbies':['Dance','Ski'], 'likes':[5,8]}

Unfortunately, MongoDB does not support querying/filtering on field names.
Your options are:
- do the filtering on the client side, after querying in full
- keep the hobby names in an array, like you used to, so that you can filter on the server side
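The two options above can be sketched as follows. This is only an illustration under assumptions: `hobby_names` is a hypothetical extra field that mirrors the keys of `hobbies`, and the client-side variant works on plain dictionaries that have already been fetched.

```python
def shared_hobby_query(usr):
    """Server-side option: build a filter against an array of hobby names.

    `hobby_names` is a hypothetical field kept in sync with the keys
    of the embedded `hobbies` object.
    """
    return {
        '_id': {'$ne': usr['_id']},
        'hobby_names': {'$in': list(usr['hobbies'].keys())},
    }


def filter_shared_hobbies(candidates, usr):
    """Client-side option: keep users sharing at least one hobby key."""
    mine = set(usr['hobbies'])
    return [
        other for other in candidates
        if other['_id'] != usr['_id'] and mine & set(other['hobbies'])
    ]


# Made-up sample documents, for illustration only.
usr = {'_id': 'a', 'hobbies': {'Dance': 5, 'Ski': 8}}
others = [
    usr,
    {'_id': 'b', 'hobbies': {'Ski': 3}},
    {'_id': 'c', 'hobbies': {'Chess': 9}},
]
matches = filter_shared_hobbies(others, usr)
query = shared_hobby_query(usr)
```

With pymongo, the dictionary returned by `shared_hobby_query` would be passed to `db.usr.find(...)` just like the original query; the client-side variant trades extra data transfer for an unchanged schema.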

Related

RESTful API - URI Structure Advice

I have REST API URL structure similar to:
/api/contacts GET Returns an array of contacts
/api/contacts/:id GET Returns the contact with id of :id
/api/contacts POST Adds a new contact and return it with an id added
/api/contacts/:id PUT Updates the contact with id of :id
/api/contacts/:id PATCH Partially updates the contact with id of :id
/api/contacts/:id DELETE Deletes the contact with id of :id
My question is about:
/api/contacts/:id GET
Suppose that in addition to fetching the contact by ID, I also want to fetch it by an unique alias.
What should the URI structure be if I want to be able to fetch a contact by either ID or alias?
If your aliases are not numeric, I would suggest using the same URI structure and figuring out on your end whether the value is an ID or an alias, just like Facebook does with username and user_id: facebook.com/user_id or facebook.com/username.
Another approach would be to have the client use GET /contacts with some extra GET parameters as filters to first search for a contact, and then look up the ID from that response.
The last option, I think, would be to use a structure like GET /contacts/alias/:alias. But this would somewhat imply that alias is a subresource of contacts.
The path and query part of IRIs are up to you. The path is for hierarchical data, like api/version/module/collection/item/property, the query is for non-hierarchical data, like ?display-fields="id,name,etc..." or ?search="brown teddy bear"&offset=125&count=25, etc...
What you have to keep in mind is that you are working with resources, not operations. So IRIs are resource identifiers, as in DELETE /something, and not operation identifiers, as in POST /something/delete. You don't have to follow any particular structure in your IRIs; for example, you could simply use POST /dashuif328rgfiwa. The server would understand it, but it would be much harder to write a router for this kind of IRI, which is why we use nice IRIs.
What is important is that a single IRI always belongs to a single resource. So you cannot read cat properties with GET /cats/123 and write dog properties with PUT /cats/123. What people usually don't understand is that a single resource can have multiple IRIs; for example, /cats/123, /cats/name:kitty, /users/123/cats/kitty, cats/123?fields="id,name", etc. can all belong to the same resource. Or if you want to give an IRI to a thing (the living cat, not the document which describes it), then you can use /cats/123#thing or /users/123#kitty, etc. You usually do that in RDF documents.
"What should be URI structure be if I want to be able to fetch contact by either ID or Alias?"
It can be /api/contacts/name:{name}, for example /api/contacts/name:John, since it is clearly hierarchical. Or you can check whether the param is numeric or a string in /api/contacts/{param}.
You can use the query too, but I don't recommend that. For example, the following IRI can have two separate meanings: /api/contacts?name="John". Either you want to list every contact named John, or you want one exact contact. So you have to establish some conventions for this kind of request in the router of your server-side application.
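As a rough sketch of the path-based variants mentioned above, the router could classify the last path segment before looking anything up. The function name and the returned tuple shape are assumptions made for this example, not part of any framework.

```python
import re


def parse_contact_ref(param):
    """Classify the last segment of /api/contacts/{param}.

    Returns ('name', value) for a 'name:' prefix, ('id', int) for an
    all-digit segment, and ('alias', value) for anything else.
    """
    if param.startswith('name:'):
        return ('name', param[len('name:'):])
    if re.fullmatch(r'[0-9]+', param):
        return ('id', int(param))
    return ('alias', param)
```

This only works as long as aliases are never purely numeric; otherwise an explicit prefix such as `name:` is the safer convention.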
I would consider adding a "search" resource when you are trying to resolve a resource with the alias:
GET /api/contacts/:id
and
GET /api/contacts?alias=:alias
or
GET /api/contacts/search?q=:alias
First of all, the 'ID' in the URL doesn't have to be a numerical ID generated by your database. You could use any piece of data (including the alias) in the URL, as long as it's unique. Of course, if you are using numerical IDs everywhere, it is more consistent to do the same in your contacts API. But you could choose to use the aliases instead of numeric IDs (as long as they are always unique).
Another approach would be, as Stromgren suggested, to allow both numeric IDs and aliases in the URL:
/api/contacts/123
/api/contacts/foobar
But this can obviously cause problems if aliases can be numeric, because then you wouldn't have any way to differentiate between an ID and a (numeric) alias.
Last but not least, you can implement a way of filtering the complete collection, as shlomi33 already suggested. I wouldn't introduce a search resource, as that isn't really RESTful, so I'd go for the other solution instead:
/api/contacts?alias=foobar
This should return all contacts with foobar as alias. Since the alias should be unique, this will return 1 or 0 results.
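A minimal sketch of such a collection filter, assuming the contacts live in an in-memory list and the query string has already been parsed into a dictionary (both are assumptions for illustration; the sample contacts are made up):

```python
# Made-up sample data standing in for the real contact store.
CONTACTS = [
    {'id': 1, 'alias': 'foobar', 'name': 'Foo Bar'},
    {'id': 2, 'alias': 'jdoe', 'name': 'John Doe'},
]


def list_contacts(query_params, contacts=CONTACTS):
    """Handle GET /api/contacts with optional filters such as ?alias=foobar.

    Always returns a list, so a unique alias yields 0 or 1 items.
    """
    results = contacts
    for field, wanted in query_params.items():
        results = [
            c for c in results
            if field in c and str(c[field]) == wanted
        ]
    return results
```

Calling `list_contacts({'alias': 'foobar'})` returns a one-element list, and an unknown alias returns an empty list rather than an error, which keeps the endpoint a plain filtered view of the collection.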

Mongoid query in Rails: Can I find only those records which have embedded child objects?

I would like to write a query in a Rails model using mongoid, and I'd like it to return only those records which have embedded child objects (in this case, client work links).
I only want to find clients which have embedded client work links.
This is what I'd like, though obviously it doesn't work because of the "where" parameters.
def self.latest_client_press
  Work.where("!self.work_links.empty?").desc(:updated_at).limit(4)
end
While it is possible in MongoDB to query on an array's size (the $size operator), this feature is rather limited.
What people do instead (and what is recommended on that page) is store the array length along with the array itself. This way you can index this field and query documents very efficiently.
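The stored-length idea can be sketched with raw MongoDB update and filter documents, written here as Python dictionaries rather than Mongoid syntax. The counter field `work_links_count` is a hypothetical name, and `apply_push` is only an in-memory stand-in for what the server would do.

```python
def push_link_update(link):
    """Update document that appends a link and bumps the stored length."""
    return {
        '$push': {'work_links': link},
        '$inc': {'work_links_count': 1},
    }


# Filter for works that have at least one embedded link.
HAS_LINKS_FILTER = {'work_links_count': {'$gt': 0}}


def apply_push(doc, update):
    """Tiny in-memory stand-in for the server applying $push and $inc."""
    for field, value in update['$push'].items():
        doc.setdefault(field, []).append(value)
    for field, amount in update['$inc'].items():
        doc[field] = doc.get(field, 0) + amount
    return doc


work = apply_push({'_id': 1}, push_link_update({'url': 'http://example.com'}))
```

Because the push and the increment travel in the same update document, the counter cannot drift from the array as long as every write goes through this path.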

Hibernate Search: how to query for embedded entities

I would like to use Hibernate Search to implement a sophisticated autosuggestion feature across multiple input fields on a web page.
Each input field is for its own entity, let's say Country and City. There is a many-to-one relationship between both entities
(countries contain cities).
The autosuggestion should work such that when typing e.g. a country name prefix and the city field is already filled,
you get only suggestions for countries that have such a city (and vice versa).
The server side autosuggestion service should return list of projections
(entityId, entityName) which are rendered into the input field (dropdown, whatever).
According to the schema and after having read the manual I tried the following index schema:
SearchMapping mapping = new SearchMapping();
mapping.analyzerDef(...
    .entity(City.class).indexed().indexName("MyIndex")
        .property("cityId", ElementType.FIELD)
            .documentId()
            .name("id")
        .property("name", ElementType.FIELD)
            .documentId()
            .name("id")
        .property("country", ElementType.METHOD)
            .indexEmbedded()
    .entity(Country.class).indexed()
        .property("id", ElementType.FIELD)
            .documentId()
            .name("id")
        .property("name", ElementType.METHOD)
            .field()
            .name("name")
This mapping defines City to be the main entity, right?
I have indexed all cities and am able to query for them (also by combining both fields). However, I only get matches when querying for cities.
i.e. when querying like
fullTextSession.getSearchFactory().buildQueryBuilder().forEntity(City.class).get();
This is not useful for the country field because when I type in "Spain", I get a single row for each city of Spain. (Spain, Spain, Spain, Spain, ... ;-))
The question is: How is it possible to search for country entities? Changing the index structure? The indexing procedure? Or how to query?
The only way I found was to set up a facet for country and use the different possible facets as autosuggestions. However, this is also not perfect, since it is not possible to sort facets alphabetically.
Of course, in this example, I could switch both entities in the mapping, but suppose scenarios with more complex entity graphs.
UPDATE: adding queries requested in comment
For building queries, I employ the QueryBuilder. The following produces a result set like in the Spain example:
fullTextSession.getSearchFactory().buildQueryBuilder().forEntity(City.class).get();
with query:
country.name:Spain
If I try to use a query builder for countries
fullTextSession.getSearchFactory().buildQueryBuilder().forEntity(Country.class).get();
and query:
name:Spain
I get no results.
You are not showing your actual query. You don't have to use the query DSL; you can also write native Lucene queries. In both cases (DSL or native Lucene) you can combine queries via boolean logic. Embedded entities follow the JavaBean notation: in a city query, for example, the country name is reached as country.name. Again, without your actual query it is hard to give any more specific feedback.
Last but not least, facets can also be sorted alphabetically. Check the FacetSortOrder enum: alongside COUNT_DESC it has FIELD_VALUE, which orders facets by field value.

Lucene complex structure search

Basically I do have pretty simple database that I'd like to index with Lucene.
Domains are:
// Person domain
class Person {
    Set<Pair> keys;
}
// Pair domain
class Pair {
    KeyItem keyItem;
    String value;
}
// KeyItem domain, name is unique field within the DB (!!)
class KeyItem {
    String name;
}
I have tens of millions of profiles and hundreds of millions of Pairs; however, since most KeyItem "name" values are duplicates, there are only a few dozen KeyItem instances.
I came up with that structure to save on KeyItem instances.
Basically any Profile with any fields could be saved into that structure.
Let's say we have a profile with these properties:
- name: Andrew Morton
- education: University of New South Wales
- country: Australia
- occupation: Linux programmer
To store it, we'll have a single Profile instance, 4 KeyItem instances (name, education, country and occupation), and 4 Pair instances with the values "Andrew Morton", "University of New South Wales", "Australia" and "Linux programmer".
All other profiles will reference (all or some of) the same KeyItem instances: name, education, country and occupation.
My question is, how to index all of that so I can search for Profile for some particular values of KeyItem::name and Pair::value. Ideally I'd like that kind of query to work:
name:Andrew* AND occupation:Linux*
Should I create a custom Indexer and Searcher? Or could I use the standard ones and just map KeyItem and Pair to Lucene components somehow?
I believe you can use standard Lucene methodology.
I would:
- Translate every profile to a Lucene Document.
- Translate every Pair to a Field in this Document. All Fields need to be indexed, but not necessarily stored.
- Add a stored Field with a profile id to the Document.
- Search using name:value pairs similarly to your example.
If you choose bare Lucene, you will need a custom Indexer and Searcher, but they are not hard to build.
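The mapping described above (one document per profile, one field per Pair) can be illustrated in plain Python. This is not Lucene API: the dictionary layout and the `matches` helper are assumptions for the sketch, and unlike Lucene's analyzed terms the wildcard here is matched case-sensitively against the whole stored value.

```python
from fnmatch import fnmatchcase


def profile_to_document(profile_id, pairs):
    """Flatten (KeyItem name, Pair value) tuples into one document.

    The profile id is kept separately, like a stored id field; every
    other key becomes a searchable field holding a list of values.
    """
    fields = {}
    for key, value in pairs:
        fields.setdefault(key, []).append(value)
    return {'id': profile_id, 'fields': fields}


def matches(doc, **wildcards):
    """True if every named field has a value matching its pattern (AND)."""
    return all(
        any(fnmatchcase(v, pattern) for v in doc['fields'].get(field, []))
        for field, pattern in wildcards.items()
    )


doc = profile_to_document(1, [
    ('name', 'Andrew Morton'),
    ('education', 'University of New South Wales'),
    ('country', 'Australia'),
    ('occupation', 'Linux programmer'),
])
hit = matches(doc, name='Andrew*', occupation='Linux*')
```

The call at the end mirrors the query name:Andrew* AND occupation:Linux* from the question; since only a few dozen distinct KeyItem names exist, the number of distinct field names stays small.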
It may be easier for you to use Solr, where you need less programming. However, I do not know if Solr allows an open-ended schema like the one I described - I believe you have to predefine all field names, so this may prevent you from using Solr.
Lucene returns the list of hit documents essentially based on the occurrence of the keyword(s), regardless of the type of query. The fundamental segment reader checks for the presence of keywords in the entire index database rather than in the specific field of interest.
I suggest introducing a custom searcher that performs the following:
1. Read the short-listed documents using the document id. (I guess the collect() method may be overridden to pass the document id from search() of the IndexSearcher class.)
2. Get the field value and check for the presence of the expected keywords.
3. Subject the document to scoring only if it meets your custom criteria.
Note: the default standard searcher can be modified rather than writing a custom searcher from scratch.

nhibernate and DDD suggestion

I am fairly new to nHibernate and DDD, so please bear with me.
I have a requirement to create a new report from my SQL table. The report is read-only and will be bound to a GridView control in an ASP.NET application.
The report contains the following fields Style, Color, Size, LAQty, MTLQty, Status.
I have the entities for Style, Color and Size, which I use in other ASP.NET pages. I use them via repositories. I am not sure if I should use the same entities for my report or not. If I use them, where am I supposed to map the Qty and Status fields?
If I should not use the same entities, should I create a new class for the report?
As said I am new to this and just trying to learn and code properly.
Thank you
For reports it's usually easier to use plain values or special DTOs. Of course you can query for the entity that references all the information, but to put it into a list (e.g. using data binding) it's handier to have a single class that contains all the values as plain properties.
To get more specific solutions than the few below, you need to tell us a little about your domain model. What does the class model look like?
Generally, you have at least three options to get "plain" values from the database using NHibernate.
Write HQL that returns an array of values
For instance:
select e1.Style, e1.Color, e1.Size, e2.LAQty, e2.MTLQty
from Entity1 e1 inner join e1.Entity2 e2
where (some condition)
The result will be a list of object[]. Every item in the list is a row, and every item in the object[] is a column. This is quite like SQL, but on a higher level (you describe the query on the entity level) and it is database independent.
Or you create a DTO (data transfer object) only to hold one row of the result:
select new ReportDto(e1.Style, e1.Color, e1.Size, e2.LAQty, e2.MTLQty)
from Entity1 e1 inner join e1.Entity2 e2
where (some condition)
ReportDto needs to implement a constructor that takes all these arguments. The result is a list of ReportDto.
Or you use the Criteria API (recommended)
session.CreateCriteria(typeof(Entity1), "e1")
    // "Entity2" is the assumed name of the association property on Entity1
    .CreateCriteria("Entity2", "e2")
    .Add( /* some condition */ )
    .SetProjection(Projections.ProjectionList()
        .Add(Projections.Property("e1.Style"), "Style")
        .Add(Projections.Property("e1.Color"), "Color")
        .Add(Projections.Property("e1.Size"), "Size")
        .Add(Projections.Property("e2.LAQty"), "LAQty")
        .Add(Projections.Property("e2.MTLQty"), "MTLQty"))
    .SetResultTransformer(Transformers.AliasToBean(typeof(ReportDto)))
    .List<ReportDto>();
The ReportDto needs to have a property with the name of each alias ("Style", "Color", etc.). The output is a list of ReportDto.
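The difference between getting raw rows and getting one DTO per row can be shown independently of NHibernate. The sketch below is plain Python with made-up sample rows; the snake_case field names are an assumption that simply mirrors the report columns.

```python
from dataclasses import dataclass


@dataclass
class ReportDto:
    """One flat report row; field order matches the select list."""
    style: str
    color: str
    size: str
    la_qty: int
    mtl_qty: int


def rows_to_dtos(rows):
    """Turn positional rows (like object[] results) into named DTOs."""
    return [ReportDto(*row) for row in rows]


# Made-up rows standing in for the query result.
rows = [
    ('Polo', 'Red', 'M', 10, 4),
    ('Polo', 'Blue', 'L', 0, 7),
]
report = rows_to_dtos(rows)
```

A grid bound to `report` can address columns by name (`la_qty`, `mtl_qty`) instead of by position, which is the main convenience a DTO offers over raw rows.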
I'm not schooled in DDD exactly, but I've always modeled my nouns as types and I'm surprised the report itself is an entity. DDD or not, I wouldn't do that, rather I'd have my reports reflect the results of a query, in which quantity is presumably count(*) or sum(lineItem.quantity) and status is also calculated (perhaps, in the page).
You haven't described your domain, but there is a clue in those column headings that you may be doing a pivot over the data to create LAQty and MTLQty, which you'll find hard to do in NHibernate, as it's designed for OLTP and does not even do UNION, last I checked. That said, there is nothing wrong with abusing HQL (Hibernate Query Language) for lightweight reporting, as long as you understand you are abusing it.
I see Stefan has done a grand job of describing the syntax for that, so I'll stop there :-)