Best design approach to query documents for 'labels'

Best design approach to query documents for 'labels' - ravendb

I am storing documents - and each document has a collection of 'labels' - like this. Labels are user defined, and could be any plain text.
{
"FeedOwner": "4ca44f7d-b3e0-4831-b0c7-59fd9e5bd30d",
"MessageBody": "blablabla",
"Labels": [
{
"IsUser": false,
"Text": "Mine"
},
{
"IsUser": false,
"Text": "Incomplete"
}
],
"CreationDate": "2012-04-30T15:35:20.8588704"
}
I need to allow the user to query for any combination of labels, i.e.
"Mine" OR "Incomplete"
"Incomplete" only
or
"Mine" AND NOT "Incomplete"
This results in Raven queries like this:
Query: (FeedOwner:25eb541c\-b04a\-4f08\-b468\-65714f259ac2) AND (Labels,
Text:Mine) AND (Labels,Text:Incomplete)
I realise that Raven will generate a 'dynamic index' for queries it has not seen before. I can see with this, this could result in a lot of indexes.
What would be the best approach to achieving this functionality with Raven?
[EDIT]
This is my Linq, but I get an error from Raven "All is not supported"
var result = from candidateAnnouncement in session.Query<FeedAnnouncement>()
where listOfRequiredLabels.All(
requiredLabel => candidateAnnouncement.Labels.Any(
candidateLabel => candidateLabel.Text == requiredLabel))
select candidateAnnouncement;
[EDIT]
I had a similar question, and the answer for that resolved both questions: Raven query returns 0 results for collection contains

Please notice that in case of FeedOwner being a unique property of your documents the query doesn't make a lot of sense at all. In that case, you should do it on the client using standard linq to objects.
Now, given that FeedOwner is not something unique, your query is basically correct. However, depending on what you actually want to return, you may need to create a static index instead:
If you're using the dynamically generated indexes, then you will always get the documents as the return value and you can't get the particular labels which matched the query. If this is ok for you, then just go with that approach and let the query optimizer do its job (only if you have really a lot of documents build the index upfront).
In the other case, where you want to use the actual labels as the query result, you have to build a simple map index upfront which covers the fields you want to query upon, in your sample this would be FeedOwner and Text of every label. You will have to use FieldStorage.Yes on the fields you want to return from a query, so enable that on the Text property of your labels. However, there's no need to do so with the FeedOwner property, because it is part of the actual document which raven will give you as part of any query results. Please refer to ravens documentation to see how you can build a static index and use field storage.

Related

How can I query data from FaunaDb if only some collections have a specific property which I need to filter out

I'm really new to FaunaDb, and I currently have a collection of Users and an Index from that collection: (users_waitlist) that has fewer fields.
When a new User is created, the "waitlist_meta" property is an empty array initially, and when that User gets updated to join the waitlist, a new field is added to the User's waitlist_meta array.
Now, I'm trying to get only the collections that contain the added item to the waitlist_meta array (which by the way, is a ref to another index (products)). In a other words: if the array contains items, then return the collection/index
How can I achieve this? By running this query:
Paginate(Match(Index('users_waitlist')))
Obviously, I'm still getting all collections with the empty array (waitlist_meta: [])
Thanks in advance

you need to add terms to your index, which are explained briefly here.
the way I find it useful to conceptualise this is that when you add terms to an index, it's partitioned into separate buckets so that later when you match that index with a specific term, the results from that particular bucket are returned.
it's a slightly more complicated case here because you need to transform your actual field (the actual value of waitlist_meta) into something else (is waitlist_meta defined or not?) - in fauna this is called a binding. you need something along the lines of:
CreateIndex({
"name": "users_by_is_on_waitlist",
"source": [{
"collection": Collection("users"),
"fields": {
"isOnWaitlist": Query(Lambda("doc", ContainsPath(["data", "waitlist_meta"], Var("doc"))))
}
}],
"terms": [{
"binding": "isOnWaitlist"
}]
})
what this binding does is run a Lambda for each document in the collection to compute a property based on the document's fields, in our case here it's isOnWaitlist, which is defined by whether or not the document contains the field waitlist_meta. we then add this binding as a term to the index, meaning we can later query the index with:
Paginate(Match("users_by_is_on_waitlist", true))
where true here is the single term for our index (it could be an array if our index had multiple terms). this query should now return all the users that have been added to the waitlist!

Error trying to reorder items within another list in Keystone 6

I'm using KeystoneJS v6. I'm trying to enable functionality which allow me to reorder the placement of images when used in another list. Currently i'm setting up the image list below, however I'm unable to set the defaultIsOrderable to true due to the error pasted.
KeystoneJS list:
Image: list({
fields: {
title: text({
validation: { isRequired: true },
isIndexed: 'unique',
isFilterable: true,
isOrderable: true,
}),
images: cloudinaryImage({
cloudinary: {
cloudName: process.env.CLOUDINARY_CLOUD_NAME,
apiKey: process.env.CLOUDINARY_API_KEY,
apiSecret: process.env.CLOUDINARY_API_SECRET,
folder: process.env.CLOUDINARY_API_FOLDER,
},
}),
},
defaultIsOrderable: true
}),
Error message:
The expected type comes from property 'defaultIsOrderable' which is declared here on type 'ListConfig<BaseListTypeInfo, BaseFields<BaseListTypeInfo>>'
Peeking at the definition of the field shows
defaultIsOrderable?: false | ((args: FilterOrderArgs<ListTypeInfo>) => MaybePromise<boolean>);

Looking at the schema API docs, the defaultIsOrderable lets you set:
[...] the default value to use for isOrderable for fields on this list.
You're trying to set this to true but, according to the relevant section of the field docs, the isOrderable field option already defaults to true.
I believe this is why the defaultIsOrderable type doesn't allow you to supply the true literal – doing so would be redundant.
So that explains the specific error your getting but I think you also may have misunderstood the purpose of the orderBy option.
The OrderBy Option
The field docs mention the two effects the field OrderBy option has:
If true (default), the GraphQL API and Admin UI will support ordering by this field.
Take, for example, your Image list above.
As the title field is "orderable", it is included in the list's orderBy GraphQL type (ImageOrderByInput).
When querying the list, you can order the results by the values in this field, like this:
query {
images (orderBy: [{ title: desc }]) {
id
title
images { publicUrl }
}
}
The GraphQL API docs have some details on this.
You can also use the field to order items when listing them in the Admin UI, either by clicking the column heading or selecting the field from the "sort" dropdown:
Note though, these features order items at runtime, by the values stored in orderable fields.
They don't allow an admin to "re-order" items in the Admin UI (unless you did so by changing the image titles in this case).
Specifying an Order
If you want to set the order of items within a list you'd need to store separate values in, for example, a displayOrder field like this:
Image: list({
fields: {
title: text({
validation: { isRequired: true },
isIndexed: 'unique',
isFilterable: true,
}),
displayOrder: integer(),
// ...
},
}),
Unfortunately Keystone doesn't yet give you a great way to manage this the Admin UI (ie. you can't "drag and drop" in the list view or anything like that). You need to edit each item individually to set the displayOrder values.
Ordering Within a Relationship
I notice your question says you're trying to "reorder the placement of images when used in another list" (emphasis mine).
In this case you're talking about relationships, which changes the problem somewhat. Some approaches are..
If the relationship is one-to-many, you can use the displayOrder: integer() solution shown above but the UX is worse again. You're still setting the order values against each item but not in the context of the relationship. However, querying based on these order values and setting them via the GraphQL API should be fairly straight forward.
If the relationship is many-to-many, it's similar but you can't store the "displayOrder" value in the Image list as any one image may be linked to multiple other items. You need to store the order info "with" the relationship itself. It's not trivial but my recent answer on storing additional values on a many-to-many relationship may point you in the right direction.
A third option is to not use the relationship field at all but to link items using the inline relationships functionality of the document field. This is a bit different to work with - easier to manage from the Admin UI but less powerful in GraphQL as you can't traverse the relationship as easily. However it does give you a way to manage a small, ordered set of related items in a many-to-many relationship.
You can save an ordered set of ids to a json field. This is similar to using a document field but a more manual.
Hopefully that clears up what's possible with the current "orderBy" functionality and relationship options. Which of these solutions is most appropriate depends heavily on the specifics of your project and use case.
Note too, there are plans to extend Keystone's functionality for sorting and reordering lists from both the DX and UX perspectives.
See "Sortable lists" on the Keystone roadmap.

FaunaDB: Query for all documents not referenced by another collection

I'm working on an app where users learn about different patterns of grammar in a language. There are three collections; users and patterns are interrelated by progress, which looks like this:
Create(Collection("progress"), {
data: {
userRef: Ref(Collection("users"), userId),
patternRef: Ref(Collection("patterns"), patternId),
initiallyLearnedAt: Now(),
lastReviewedAt: Now(),
srsLevel: 1
}
})
I've learned how to do some basic Fauna queries, but now I have a somewhat more complex relational one. I want to write an FQL query (and the required indexes) to retrieve all patterns for which a given user doesn't have progress. That is, everything they haven't learned yet. How would I compose such a query?

One clarifying assumption - a progress document is created when a user starts on a particular pattern and means the user has some progress. For example, if there are ten patterns and a user has started two, there will be two documents for that user in progress.
If that assumption is valid, your question is "how can we find the other eight?"
The basic approach is:
Get all available patterns.
Get the patterns a user has worked on.
Select the difference between the two sets.
1. Get all available patterns.
This one is trivial with the built-in Documents function in FQL:
Documents(Collection("patterns"))
2. Get the patterns a user has worked on.
To get all the patterns a user has worked on, you'll want to create an index over the progress collection, as you've figured out. Your terms are what you want to search on, in this case userRef. Your values are the results you want back, in this case patternRef.
This looks like the following:
CreateIndex({
name: "patterns_by_user",
source: Collection("progress"),
terms: [
{ field: ["data", "userRef"] }
],
values: [
{ field: ["data", "patternRef"] }
],
unique: true
})
Then, to get the set of all the patterns a user has some progress against:
Match(
"patterns_by_user",
Ref(Collections("users"), userId)
)
3. Select the difference between the two sets
The FQL function Difference has the following signature:
Difference( source, diff, ... )
This means you'll want the largest set first, in this case all of the documents from the patterns collection.
If you reverse the arguments you'll get an empty set, because there are no documents in the set of patterns the user has worked on that are not also in the set of all patterns.
From the docs, the return value of Difference is:
When source is a Set Reference, a Set Reference of the items in source that are missing from diff.
This means you'll need to Paginate over the difference to get the references themselves.
Paginate(
Difference(
Documents(Collection("patterns")),
Match(
"patterns_by_user",
Ref(Collection("users"), userId)
)
)
)
From there, you can do what you need to do with the references. As an example, to retrieve all of the data for each returned pattern:
Map(
Paginate(
Difference(
Documents(Collection("patterns")),
Match(
"patterns_by_user",
Ref(Collection("users"), userId)
)
)
),
Lambda("patternRef", Get(Var("patternRef")))
)
Consolidated solution
Create the index patterns_by_user as in step two
Query the difference as in step three

Creating Mandatory User Filters with multiple element IDs

Mandatory User Filters
I am working on a tool to allow customers to apply Mandatory User Filters. When attributes are loaded like "Year" or "Age", each can have hundreds of elements with the subsequent ids. In the POST request to create a filter (documented here: https://developer.gooddata.com/article/lets-get-started-with-mandatory-user-filters), looks like this:
{
"userFilter": {
"content": {
"expression": "[/gdc/md/{project-id}/obj/{object-id}]=[/gdc/md/{project-id}/obj/{object-id}/elements?id={element-id}]"
},
"meta": {
"category": "userFilter",
"title": "My User Filter Name"
}
}
}
In the "expression" property, it notes how one ID could be set. What I want is to have multiple ids associated with the object-id set with the post. For example, if I user wanted to add a filter to all of the elements in "Year" (there are 150) in the demo project, it seems odd to make 150 post requests.
Is there a better way?
UPDATE
Tomas thank you for your help.
I am not having trouble assigning multiple userfilters to a user. I can easily apply a singular filter to a user with the method outlined in the documentation. However, this overwrites the userfilter field. What is the syntax for this?
Here is my demo POST data:
{ "userFilters":
{ "items": [
{ "user": "/gdc/account/profile/decd0b2e3077cf9c47f8cfbc32f6460e",
"userFilters":["/gdc/md/a1nc4jfa14wey1bnfs1vh9dljaf8ejuq/obj/808728","/gdc/md/a1nc4jfa14wey1bnfs1vh9dljaf8ejuq/obj/808729","/gdc/md/a1nc4jfa14wey1bnfs1vh9dljaf8ejuq/obj/808728"]
}
]
}
}
This receives a BAD REQUEST.

I'm not sure what you mean by "have multiple ids associated with the object-id" exactly, but I'll try to tell you all I know about it. :-)
If you indeed made multiple POST requests, created multiple userFilters and set them all for one user, the user wouldn't see anything at all. That's because the system combines separate userFilters using logical AND, and a Year cannot be 2013 and 2014 at the same time. So for the rest of my answer, I'll assume that you want OR instead.
There are several ways to do this. As you may have guessed by now, you can use AND/OR explicitly, using an expression like this:
[/…/obj/{object-id}]=[/…/obj/{object-id}/elements?id={element-id}] OR [/…/obj/{object-id}]=[/…/obj/{object-id}/elements?id={element-id}]
This can often be further simplified to:
[/…/obj/{object-id}] IN ( [/…/obj/{object-id}/elements?id={element-id}], [/…/obj/{object-id}/elements?id={element-id}], … )
If the attribute is a date (year, month, …) attribute, you could, in theory, also specify ranges using BETWEEN instead of listing all elements:
[/…/obj/{object-id}] BETWEEN [/…/obj/{object-id}/elements?id={element-id}] AND [/…/obj/{object-id}/elements?id={element-id}]
It seems, though, that this only works in metrics MAQL and is not allowed in the implementation of user filters. I have no idea why.
Also, for your own attribute like Age, you can't do that since user-defined numeric attributes aren't supported. You could, in theory, add a fact that holds the numeric value, and construct a BETWEEN filter based on that fact. It seems that this is not allowed in the implementation of user filters either. :-(
Hope this helps.

not_indexed field is stored in index

I'm trying to optimize my elasticsearch scheme.
I have a field which is a URL - I do not want to be able to query or filter it, just retreive it.
My understanding is that a field that is defined as "index":"no" is not indexed, but is still stored in the index.
(see slide 5 in http://www.slideshare.net/nitin_stephens/lucene-basics)
This should match to Lucene UnIndexed, right?
This confuses me, is there a way to store some fields, without them taking more storage than simply their content, and without encumbering the index for the other fields?
What am I missing?

I'm new to posting on stack exchange but believe I can help a bit!
There are a few considerations here:
Analyzing
As you don't want to do extra work you should set "index": "no". This will mean the field will not be run through any tokenizers and filters.
Furthermore it will not be searchable when directing a query at the specific field: (no hits)
"query": {
"term": {
"url": "http://www.domain.com/exact/url/that/was/sent/to/elasticsearch"
}
}
*here "url" is the field name.
However the field will still be searchable in the _all field: (might have a hit)
"query": {
"term": {
"_all": "http://www.domain.com/exact/url/that/was/sent/to/elasticsearch"
}
}
_all field
By default every field gets put in the _all field. Set "include_in_all": "false" to stop that. This might not be an issue with you as you are unlikely to search against the _all field with a URL by mistake.
I was working with a schema where countries were stored as 2 letter codes, e.g.: "NO" means Norway, and it is possible someone might do a search against the all field with "NO", so I make sure to set "include_in_all": "false".
Note: Any query where you don't specify a field explicitly will be executed against the _all field.
Storing
By default, elasticsearch will store your entire document (unanalyzed, as you sent it) and this will be returned to you in a hit's _source field. If you turn this off (if your elasticsearch db is getting huge perhaps?) then you need to explicitly set "store": "yes" to store fields individually. (One thing to notice is that store takes yes or no and not true or false - it tripped me up!)
Note: if you do this you will need to request the fields you want returned to you explicitly. e.g.:
curl -XGET http://path/index_name/type_name/id?fields=url,another_field
finally...
I would leave elasticsearch to store your whole document (as the default) and use the following mapping.
"type_name": {
"properties": {
"url": {
"type": "string",
"index": "no",
"include_in_all": "false"
},
// other fields' mappings
}
}
Source: elasticsearch documentation

There are two ways to input data into the index. Indexing and Storing. Indexing a piece of data means that it is tokenized, and placed in the inverted index, and can be searched. Storing data means it is not tokenized, or analyzed or anything, and is not added to the inverted index. It is stored in an entirely separate area, in it's full text form. It can not be searched against, but can be retrieved, in it's original form, by it's document ID.
The typical Lucene query process is to query against indexed data, and get the back Document IDs of matching documents, then to use those document IDs to retrieve the stored data for those documents, and display it to the user.
Data which is indexed, but not stored is searchable, but can not be retrieved in it's original form.
Data which is stored, but not indexed can be retrieved once you have found a hit, but is not searchable.
Data which is indexed and stored can be searched or retrieved.
Data which is neither can not be added to the index at all.
This is covered a bit in the Lucene FAQ.

You are looking for the 'index' => 'not_analyzed' mapping option.
Also, if you use the _source, you do not have to specify the store => false option.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas