How to create elasticsearch index alias that excludes specific fields - indexing

I'm using Elasticsearch's index aliases to create restricted views on a more-complete index to support a legacy search application. This works well. But I'd also like to exclude certain sensitive fields from the returned result (they contain email addresses, and we want to preclude harvesting.)
Here's what I have:
PUT full-index/_alias/restricted-index-alias
{
"_source": {
"exclude": [ "field_with_email" ]
},
"filter": {
"term": { "indexflag": "noindex" }
}
}
This works for queries (I don't see field_with_email), and the filter term works (I get a restricted index) but I still see the field_with_email in query results from the index alias.
Is this supposed to work?
(I don't want to exclude from _source in the mapping, as I'm also using partial updates; these are easier if the entire document is available in _source.)

No, it is not supposed to work, and the documentation doesn't suggest that it should work.

Related

How to specify 'includes' for nested properties in a document in RavenDB

I am trying to use RavenDB's REST API to make some calls to my database and I'm wondering if there's a way to use the 'includes' feature to return documents that are nested in a document.
For example, I have an object, Order, that looks similar to this:
"Order": {
"Lines": [
{
"Product": "products/11-A"
},
{
"Product": "products/42-A"
},
{
"Product": "products/72-A"
}
],
"OrderedAt": "1996-07-04T00:00:00.0000000",
"Company": "companies/85-A"
}
Company maps to another document and is simple enough to include in the query.
{ "Query": "from Orders include Company" }
My problem is Product that is nested in Lines, which is an array of order lines. Since I didn't find anything in the documentation about it I tried things like include Product or include Lines.Product, but those didn't work.
Is this kind of thing possible with the REST API? If so, how would I go about doing it?
The syntax to Query for Related Documents from the client can be found in this demo:
https://demo.ravendb.net/demos/csharp/related-documents/query-related-documents
The matching RQL to be used in the Query when using REST API is:
from 'Orders' include 'Lines[].Product'

Select * Except particular properties in Cosmos DB with SQL API

Consider the following, I have a document that looks something like this:
"id": 2
"properties": {
"desired": {
"Property1": 10,
"Property2": 1,
"Property3": 1,
"$metadata": {
...
},
"$version": 53
}
},
I want to get everything from the document EXCEPT $metadata and $version The obvious solution would be to:
SELECT c["Property1"], c["Property2"] .... FROM c where c["id"] = "2"
However, my document may expand dynamically, hence why the above is suboptimal. I therefore figured that it may be better to exclude just $metadata and $version. I looked at different "interesting" solutions here on stackoverflow, amongst which one suggests to create a temporary table.
Unfortunately, the query needs to be very efficient, because I want to reduce the amount of RUs used. Also I really want to avoid handling the exclusion in the code.
Therefore, how do I exclude particular "columns" from my document, without writing an excessively long query, which may include creating temporary tables.
Cosmos DB does not support "Project Away". You will need to specify properties to project or use * and return all of them.

Creating Mandatory User Filters with multiple element IDs

Mandatory User Filters
I am working on a tool to allow customers to apply Mandatory User Filters. When attributes are loaded like "Year" or "Age", each can have hundreds of elements with the subsequent ids. In the POST request to create a filter (documented here: https://developer.gooddata.com/article/lets-get-started-with-mandatory-user-filters), looks like this:
{
"userFilter": {
"content": {
"expression": "[/gdc/md/{project-id}/obj/{object-id}]=[/gdc/md/{project-id}/obj/{object-id}/elements?id={element-id}]"
},
"meta": {
"category": "userFilter",
"title": "My User Filter Name"
}
}
}
In the "expression" property, it notes how one ID could be set. What I want is to have multiple ids associated with the object-id set with the post. For example, if I user wanted to add a filter to all of the elements in "Year" (there are 150) in the demo project, it seems odd to make 150 post requests.
Is there a better way?
UPDATE
Tomas thank you for your help.
I am not having trouble assigning multiple userfilters to a user. I can easily apply a singular filter to a user with the method outlined in the documentation. However, this overwrites the userfilter field. What is the syntax for this?
Here is my demo POST data:
{ "userFilters":
{ "items": [
{ "user": "/gdc/account/profile/decd0b2e3077cf9c47f8cfbc32f6460e",
"userFilters":["/gdc/md/a1nc4jfa14wey1bnfs1vh9dljaf8ejuq/obj/808728","/gdc/md/a1nc4jfa14wey1bnfs1vh9dljaf8ejuq/obj/808729","/gdc/md/a1nc4jfa14wey1bnfs1vh9dljaf8ejuq/obj/808728"]
}
]
}
}
This receives a BAD REQUEST.
I'm not sure what you mean by "have multiple ids associated with the object-id" exactly, but I'll try to tell you all I know about it. :-)
If you indeed made multiple POST requests, created multiple userFilters and set them all for one user, the user wouldn't see anything at all. That's because the system combines separate userFilters using logical AND, and a Year cannot be 2013 and 2014 at the same time. So for the rest of my answer, I'll assume that you want OR instead.
There are several ways to do this. As you may have guessed by now, you can use AND/OR explicitly, using an expression like this:
[/…/obj/{object-id}]=[/…/obj/{object-id}/elements?id={element-id}] OR [/…/obj/{object-id}]=[/…/obj/{object-id}/elements?id={element-id}]
This can often be further simplified to:
[/…/obj/{object-id}] IN ( [/…/obj/{object-id}/elements?id={element-id}], [/…/obj/{object-id}/elements?id={element-id}], … )
If the attribute is a date (year, month, …) attribute, you could, in theory, also specify ranges using BETWEEN instead of listing all elements:
[/…/obj/{object-id}] BETWEEN [/…/obj/{object-id}/elements?id={element-id}] AND [/…/obj/{object-id}/elements?id={element-id}]
It seems, though, that this only works in metrics MAQL and is not allowed in the implementation of user filters. I have no idea why.
Also, for your own attribute like Age, you can't do that since user-defined numeric attributes aren't supported. You could, in theory, add a fact that holds the numeric value, and construct a BETWEEN filter based on that fact. It seems that this is not allowed in the implementation of user filters either. :-(
Hope this helps.

not_indexed field is stored in index

I'm trying to optimize my elasticsearch scheme.
I have a field which is a URL - I do not want to be able to query or filter it, just retreive it.
My understanding is that a field that is defined as "index":"no" is not indexed, but is still stored in the index.
(see slide 5 in http://www.slideshare.net/nitin_stephens/lucene-basics)
This should match to Lucene UnIndexed, right?
This confuses me, is there a way to store some fields, without them taking more storage than simply their content, and without encumbering the index for the other fields?
What am I missing?
I'm new to posting on stack exchange but believe I can help a bit!
There are a few considerations here:
Analyzing
As you don't want to do extra work you should set "index": "no". This will mean the field will not be run through any tokenizers and filters.
Furthermore it will not be searchable when directing a query at the specific field: (no hits)
"query": {
"term": {
"url": "http://www.domain.com/exact/url/that/was/sent/to/elasticsearch"
}
}
*here "url" is the field name.
However the field will still be searchable in the _all field: (might have a hit)
"query": {
"term": {
"_all": "http://www.domain.com/exact/url/that/was/sent/to/elasticsearch"
}
}
_all field
By default every field gets put in the _all field. Set "include_in_all": "false" to stop that. This might not be an issue with you as you are unlikely to search against the _all field with a URL by mistake.
I was working with a schema where countries were stored as 2 letter codes, e.g.: "NO" means Norway, and it is possible someone might do a search against the all field with "NO", so I make sure to set "include_in_all": "false".
Note: Any query where you don't specify a field explicitly will be executed against the _all field.
Storing
By default, elasticsearch will store your entire document (unanalyzed, as you sent it) and this will be returned to you in a hit's _source field. If you turn this off (if your elasticsearch db is getting huge perhaps?) then you need to explicitly set "store": "yes" to store fields individually. (One thing to notice is that store takes yes or no and not true or false - it tripped me up!)
Note: if you do this you will need to request the fields you want returned to you explicitly. e.g.:
curl -XGET http://path/index_name/type_name/id?fields=url,another_field
finally...
I would leave elasticsearch to store your whole document (as the default) and use the following mapping.
"type_name": {
"properties": {
"url": {
"type": "string",
"index": "no",
"include_in_all": "false"
},
// other fields' mappings
}
}
Source: elasticsearch documentation
There are two ways to input data into the index. Indexing and Storing. Indexing a piece of data means that it is tokenized, and placed in the inverted index, and can be searched. Storing data means it is not tokenized, or analyzed or anything, and is not added to the inverted index. It is stored in an entirely separate area, in it's full text form. It can not be searched against, but can be retrieved, in it's original form, by it's document ID.
The typical Lucene query process is to query against indexed data, and get the back Document IDs of matching documents, then to use those document IDs to retrieve the stored data for those documents, and display it to the user.
Data which is indexed, but not stored is searchable, but can not be retrieved in it's original form.
Data which is stored, but not indexed can be retrieved once you have found a hit, but is not searchable.
Data which is indexed and stored can be searched or retrieved.
Data which is neither can not be added to the index at all.
This is covered a bit in the Lucene FAQ.
You are looking for the 'index' => 'not_analyzed' mapping option.
Also, if you use the _source, you do not have to specify the store => false option.

ElasticSearch mapping for nested enumerable objects (i18n)

I'm at a loss as to how to map a document for search with the following structure:
{
"_id": "007ff234cb2248",
"ids": {
"source1": "123",
"source2": "456",
"source3": "789"
}
"names": [
{"en":"Example"},
{"fr":"exemple"},
{"es":"ejemplo"},
{"de":"Beispiel"}
],
"children" : [
{
"ids": {
"source1": "CXXIII",
"source2": "CDLVI",
"source3": "DCCLXXXIX",
}
names: [
{"en":"Example Child"},
{"fr":"exemple enfant"},
{"es":"Ejemplo niño"},
{"de":"Beispiel Kindes"}
]
}
],
"relatives": {
// Typically no "ids" at this level.
"relation": 'uncle',
"children": [
{
"ids": {
"source1": "0x7B",
"source2": "0x1C8",
"source3": "0x315"
},
"names": [
{"en":"Example Cousin"},
{"fr":"exemple cousine"},
{"es":"Ejemplo primo"},
{"de":"Beispiel Cousin"}
]
}
]
}
}
The child object may appear in the children section directly, or further nested in my document as uncle.children (cousins, in this case). The IDs field is common to levels one (the root), level two (the children and the uncle), and to level three (the cousins), the naming structure is also common to levels one and three.
My use-case is to be able to search for IDs (nested objects) by prefix, and by the whole ID. And also to be able to search for child names, following an (as yet undefined) set of analyzer rules.
I haven't been able to find a way to map these in any useful way. I don't believe I'll have much success using the same technique for ids and names, as there's an extra level of mapping between names and the document root.
I'm not even certain that it is even mappable. I believe at least in principle that the ids should be mappable as terms, and perhaps that if I index the names as terms in some way, too.
I'm simply at a loss, and the documentation doesn't seem to cover anything like this level of complex mapping.
I have limited (read: no) control of the document as it's coming from the CouchDB river, and the upstream application already relies on this format, so I can't really change it.
I'm looking for being able to search by the following pseudo conditions, all of which should match:
ID: "123"
ID by source (I don't know how best to mark this up in pseudo language)
ID prefix: "CDL"
Name: "Example", "Example Child"
Localized name (I don't even know how best to pseudo-mark this up!
The specifics of tokenising and analysis I can figure out for myself, when I at least know how to map
Objects when both the key and the value of the object properties are important
Enumerable objects when the key and value are important.
If the mapping from an ID to its children is 1-to-many, then you could store the children's names in a child field, as a field can have multiple values. Each document would then have an ID field, possibly a relation field, and zero or more child fields.