How to match following queries in Azure Search - lucene

My index and its fields in Azure Search use the default analyzer.
The field name has the following values:
Demo 001
Demo Site 001
001 Demo Site
I am trying to match these values. A sample query:
$count=true&queryType=full&searchFields=name&searchMode=any&$select=name,id&$skip=0&$top=10&search=name:/"Demo(.*)/
This returns all of the results.
What should I change in the query, or in the analyzer, so that searching for Demo S returns only Demo Site 001?
How should I modify the query to search for 001, or for 001 followed by a space?
Finally, is there a way to tell the search that I only want values that start with 001?
Is it possible to satisfy all three conditions with a single setup?

There are two possible ways to achieve this.
A. Custom analyzer with a mapping character filter
1. In the index phase, use a custom analyzer with a character filter that maps whitespace to an underscore or to the empty string.
For example, if you map whitespace to the empty string, your data is stored as:
Demo Site 001 ---> DemoSite001
001 Demo Site ---> 001DemoSite
"charFilters": [
  {
    "name": "map_dash",
    "@odata.type": "#Microsoft.Azure.Search.MappingCharFilter",
    "mappings": [" =>"]
  }
]
2. In the query phase:
Step 1. Parse the query and replace whitespace with the same character that was used in the index phase.
The search query "Demo S" therefore becomes "DemoS".
Step 2. Run a wildcard search with the new query string:
search=DemoS*
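The two query-phase steps can be sketched in Python as follows. This is a minimal sketch rather than anything from the Azure SDK: build_prefix_query is a hypothetical helper, it assumes the index-time filter maps a single space to the empty string, and it assumes that backslash-escaping the Lucene special characters is enough for the input you expect.

```python
import re

# Characters with special meaning in the full Lucene query syntax.
# Assumption: escaping them with a backslash is sufficient for this sketch.
_LUCENE_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def build_prefix_query(user_input: str, replacement: str = "") -> str:
    """Return a wildcard search string for a field indexed without spaces."""
    # Step 1: apply the same whitespace mapping that was used at index time.
    collapsed = user_input.replace(" ", replacement)
    # Escape query-syntax characters so they are matched literally.
    escaped = _LUCENE_SPECIAL.sub(r"\\\1", collapsed)
    # Step 2: append the wildcard to turn the term into a prefix search.
    return escaped + "*"


print(build_prefix_query("Demo S"))  # DemoS*
print(build_prefix_query("001 "))    # 001*
```

Note that with the empty-string mapping a trailing space disappears, so "001 " and "001" produce the same query. Mapping to an underscore instead (replacement="_") would keep them distinct.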
B. Custom analyzer with an EdgeNGram token filter
Use a custom analyzer with an EdgeNGram token filter to index your documents.
For example:
"tokenFilters": [
  {
    "name": "edgeNGramFilter",
    "@odata.type": "#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
    "minGram": 2,
    "maxGram": 20
  }
],
"analyzers": [
  {
    "name": "prefixAnalyzer",
    "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
    "tokenizer": "keyword",
    "tokenFilters": [ "lowercase", "edgeNGramFilter" ]
  }
]
With either approach:
"Demo S" returns only Demo Site 001
"001 " returns only 001 Demo Site
More details:
How Search works
Custom Analyzers
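To see why approach B matches only one value per query, the analyzer chain can be emulated in plain Python. This is an illustrative sketch, not Azure Search code: edge_ngrams and matches are hypothetical helpers, the grams are taken from the front of the token, and the query is assumed to be lowercased but not n-grammed at search time (that is, a separate search analyzer is assumed).

```python
def edge_ngrams(value: str, min_gram: int = 2, max_gram: int = 20) -> set:
    """Emulate keyword tokenizer -> lowercase -> front edge n-grams."""
    # The keyword tokenizer keeps the whole field value as a single token.
    token = value.lower()
    longest = min(max_gram, len(token))
    return {token[:n] for n in range(min_gram, longest + 1)}


def matches(query: str, value: str) -> bool:
    # Assumption: the query is lowercased only, then compared to the indexed grams.
    return query.lower() in edge_ngrams(value)


names = ["Demo 001", "Demo Site 001", "001 Demo Site"]
print([n for n in names if matches("Demo S", n)])  # ['Demo Site 001']
print([n for n in names if matches("001 ", n)])    # ['001 Demo Site']
```

Because the whole value is one token, every indexed gram is a prefix of the full name, which is what restricts "001 " to values that start with 001.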

Related

CosmosDB: Is it a good practice to use ORDER BY on the property that is also used in range filter?

When I ran the query below in the Cosmos DB Data Explorer in the Azure portal, it consumed several hundred RUs according to Query Stats.
select * from c where c.name = "john" and c._ts > 0
But after I added ORDER BY c._ts to the query above, it consumed only about 20 RUs.
According to a similar question, this behavior is expected.
(But I don't really understand why the range filter alone is not enough to avoid reading unnecessary index entries.)
So is it good practice to use ORDER BY on properties that are also used in a range filter?
There is no guarantee that an ORDER BY query will use a range index, although it normally does.
The best way to get a good index hit consistently, and therefore lower RU consumption, is to use a composite index like the one below. Adjust the other properties as needed; note that the _ts path is included.
This information can be found in the documentation here
{
  "automatic": true,
  "indexingMode": "Consistent",
  "includedPaths": [
    {
      "path": "/*"
    }
  ],
  "excludedPaths": [],
  "compositeIndexes": [
    [
      {
        "path": "/foodGroup",
        "order": "ascending"
      },
      {
        "path": "/_ts",
        "order": "ascending"
      }
    ]
  ]
}
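If the policy is generated in code, a small helper can keep the composite paths in one place. The sketch below is plain Python that produces the same JSON shape as the policy above. composite_index_policy is a hypothetical helper, and using /name in place of /foodGroup is an assumption based on the query in the question.

```python
import json


def composite_index_policy(*paths: str, order: str = "ascending") -> dict:
    """Build an indexing policy with one composite index over the given paths."""
    return {
        "automatic": True,
        "indexingMode": "Consistent",
        "includedPaths": [{"path": "/*"}],
        "excludedPaths": [],
        # One composite index; the order of paths matters.
        "compositeIndexes": [
            [{"path": p, "order": order} for p in paths]
        ],
    }


# Hypothetical adaptation: equality filter on name, range filter and sort on _ts.
policy = composite_index_policy("/name", "/_ts")
print(json.dumps(policy, indent=2))
```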

Using Aggregate to rename a field in MongoDB

I am working on a query to rename the field 'expertise' to 'skill'. expertise is an array that holds more than one value, so I want to slice it down to one.
Example document: { "_id" : "E08", "name" : "Damien Collins", "expertise" : [ "Python", "Java" ] }
I want to show the name, show expertise as "skill", and show just one item of expertise.
Current query:
db.employees.aggregate([{expertise:{$exists:true}},{$project:{_id:1,"Skill":{expertise{$slice:1}},name:1}}])
It worked before I added the rename to Skill.
Use $arrayElemAt to return a single element of the array:
{$project:{_id:1,"Skill":{$arrayElemAt:["$expertise",0]},name:1}}
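Put together with a filter stage, the whole pipeline might look like the sketch below, written as the list of dicts that pymongo's aggregate() accepts. Wrapping the $exists condition in a $match stage is an assumption about what the original query intended. emulate_project is a hypothetical pure-Python stand-in that shows the expected shape of the result for a non-empty array, and it does not talk to MongoDB.

```python
# Aggregation pipeline as it could be passed to collection.aggregate().
pipeline = [
    # Keep only documents that have an expertise field.
    {"$match": {"expertise": {"$exists": True}}},
    # Rename by projecting the first array element under the new name "Skill".
    {"$project": {"_id": 1, "name": 1, "Skill": {"$arrayElemAt": ["$expertise", 0]}}},
]


def emulate_project(doc: dict) -> dict:
    """Pure-Python stand-in for the $project stage (assumes a non-empty array)."""
    return {"_id": doc["_id"], "name": doc["name"], "Skill": doc["expertise"][0]}


sample = {"_id": "E08", "name": "Damien Collins", "expertise": ["Python", "Java"]}
print(emulate_project(sample))
# {'_id': 'E08', 'name': 'Damien Collins', 'Skill': 'Python'}
```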

Is it possible to add the description or other custom field to query result log?

I have the following scheduled query in combination with a TLS plugin logger.
"vssadmin.exe": {
  "query": "select * from file WHERE directory = 'C:\\Windows\\Prefetch\\' and filename like '%vssadmin%';",
  "interval": 600,
  "description": "Vssadmin Execute, usually used to execute activity on Volume Shadow copy",
  "platform": "windows"
},
I'd like to add the description field to the result log of this specific query so that I can map my queries to a framework. Unfortunately, the documentation doesn't mention such an option. Is it possible to add the description or another custom field to the logged output?
Like this?
Tag your #osquery queries/logs with MITRE ATT&CK IDs like so:
SELECT username,shell, 'T1136' AS attckID FROM users;

Why does Rails .select alias change attributes to lowercase?

In our controller, we are trying to show a video series, which should return JSON similar to this:
{
  id: 1,
  name: "Series Name",
  videos: [
    {
      id: 2,
      name: "Video Name",
      isInPlaylist: true,
      isFavorite: false
    }
  ]
}
We add the isInPlaylist and isFavorite attributes from another table, where we store data when a user has acted on a video (rated it, favorited it, etc.).
videos = series.videos
  .where('videos.is_live = true')
  .joins("some join to user_videos join_table")
  .select(
    'videos.*,
    coalesce(user_videos.rating, 0.0) as user_rating,
    coalesce(user_videos.enqueue, \'false\') as isInPlaylist,
    coalesce(user_videos.favorite, \'false\') as isFavorite'
  )
Note that in our select statement those attributes are explicitly aliased in camel case. However, when we execute this query, the attributes are returned in lower case instead:
{
isinplaylist: true,
isfavorite: false
}
This is not Rails behavior, but rather SQL behavior. Aliases are folded to lower case unless explicitly quoted. For example, here is the output of a simple query in psql (the PostgreSQL CLI program).
=# select created_at as theTimeNow from users limit 5;
thetimenow
----------------------------
2013-03-05 18:45:11.127092
2013-09-07 16:43:01.349823
2013-03-05 18:53:35.888306
2013-09-07 16:53:06.553129
2013-10-29 00:38:56.909418
(5 rows)
=# select created_at as "theTimeNow" from users limit 5;
theTimeNow
----------------------------
2013-03-05 18:45:11.127092
2013-09-07 16:43:01.349823
2013-03-05 18:53:35.888306
2013-09-07 16:53:06.553129
2013-10-29 00:38:56.909418
(5 rows)
Notice the column name in each output.
Wrapping the alias in double quotes preserves its case.
.select('foo as Bar') # => { bar: true }
.select('foo as "Bar"') # => { Bar: true }
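The folding rule itself is simple enough to emulate. The sketch below is a hypothetical Python helper, fold_identifier, that mimics how PostgreSQL reports an alias. It ignores edge cases such as Unicode escapes and identifier length limits.

```python
def fold_identifier(identifier: str) -> str:
    """Return the column name PostgreSQL would report for an alias."""
    if len(identifier) >= 2 and identifier[0] == '"' and identifier[-1] == '"':
        # Quoted identifier: case is preserved; a doubled quote is an escaped quote.
        return identifier[1:-1].replace('""', '"')
    # Unquoted identifier: folded to lower case.
    return identifier.lower()


print(fold_identifier('isInPlaylist'))    # isinplaylist
print(fold_identifier('"isInPlaylist"'))  # isInPlaylist
```

This mirrors the two .select calls above: only the quoted alias keeps its camel case.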
The change to lower case is not an issue with the Rails .select() method; it is enforced by the database, in our case PostgreSQL, and is called "folding". It's worth noting that while PostgreSQL folds unquoted identifiers to lower case, the SQL standard specifies folding to upper case, which is what Oracle, for example, does.
I would argue, however, that it should still be mentioned in the Rails API docs
¯\_(ツ)_/¯
I like your answer. The behaviour you see is the Rails default. An alternative, and a more classic 'Rails way', would be to use a JSON serialization library like Jbuilder. It gives you a lot more control over your API, and your problem would be easy to fix with it using:
json.key_format! camelize: :lower
json.first_name 'David'
# => { "firstName": "David" }
To use something like this, you would alias the columns in snake case, for example is_in_playlist.
Here's a good place to start learning Jbuilder:
http://railscasts.com/episodes/320-jbuilder
A good tutorial on more JSON serializers:
http://railscasts.com/episodes/409-active-model-serializers

How to boost search based on index type in elasticsearch or lucene?

I have three food-type indices: "Italian", "Spanish", and "American".
When the user searches for "Cheese", documents from "Italian" come up at the top. Is it possible to boost the results to give preference to, say, "Spanish"? (I should still get results for Italian, but based on some numeric boost value for the "Spanish" index, the ordering of the returned documents should favor the "Spanish" index.) Is this possible in a user-input Lucene and/or ES query? If so, how?
Add a term query with a boost for either the _type field or the _index (or both).
Use a script_score as part of the function score query:
function_score: {
  script_score: {
    script: "doc['_type'].value == '<your _type>' ? _score * <boost_factor> : _score"
  }
}
If you query several indices at once, you can specify an indices boost at the top level of the object passed to the Search API:
curl -XGET localhost:9200/italian,spanish,american/_search -d '
{
  "query": {"term": {"food_type": "cheese"}},
  "indices_boost": {
    "italian": 1.4,
    "spanish": 1.3,
    "american": 1.1
  }
}'
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-index-boost.html#search-request-index-boost
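To favor "Spanish" as the question asks, the Spanish index needs the largest boost value. The sketch below builds such a request in plain Python without sending it. search_request is a hypothetical helper and the boost values are made up for illustration. It uses the object form of indices_boost shown above; newer Elasticsearch versions expect a list of single-key objects instead, so check the documentation for your version.

```python
import json


def search_request(indices: list, field: str, term: str, boosts: dict) -> tuple:
    """Return (url, body) for a multi-index term search with per-index boosts."""
    # The comma-separated index list goes in the URL path.
    url = "localhost:9200/" + ",".join(indices) + "/_search"
    body = {
        "query": {"term": {field: term}},
        # Object form of indices_boost, as in the curl example above.
        "indices_boost": dict(boosts),
    }
    return url, body


url, body = search_request(
    ["italian", "spanish", "american"],
    "food_type",
    "cheese",
    {"italian": 1.1, "spanish": 1.4, "american": 1.0},  # illustrative values
)
print(url)
print(json.dumps(body, indent=2))
```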
For query-time boosting, most queries (e.g. query string) have a boost attribute you can set. Alternatively, you can wrap queries in a custom boost factor. I would usually prefer the former.