How to query a phrase with stopwords in ElasticSearch - lucene

I am indexing some text with stopwords enabled, and I would like to search it using a "match phrase" query without slop, but it looks like stopwords are still taken into account for term positions.
Building index:
PUT /fr_articles
{
"settings": {
"analysis": {
"analyzer": {
"stop": {
"type": "standard",
"stopwords" : ["the"]
}
}
}
},
"mappings": {
"test": {
"properties": {
"title": {
"type": "string",
"analyzer": "stop"
}
}
}
}
}
Add a document:
POST /fr_articles/test/1
{
"title" : "Tom the king of Toulon!"
}
Search:
POST /fr_articles/_search
{
"fields": [
"title"
],
"explain": true,
"query": {
"match": {
"title": {
"query": "tom king",
"type" : "phrase"
}
}
}
}
Nothing is found ;-(
Is there a way to fix this? Maybe with multiple span queries, but I want the terms to stay near each other.
Thank you.

The position increments cause this issue, yes. While the stop word may be gone and not searchable, it still doesn't shove the two words up next to each other, so the query "tom the king" finds neither "tom king" nor "such that tom will not be their king".
Often, when you remove something in analysis with a filter, it's not quite as if it was never there. The intent of StopFilter, in particular, is to remove search hits resulting from uninteresting terms. It is not to change the structure of the document or a sentence.
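To make the position gap concrete, here is a toy sketch in Python. It is not Lucene code: `analyze` and `phrase_matches` are hypothetical helpers that only imitate the behaviour described above, where a removed stopword still leaves a hole in the position sequence.

```python
import re
from itertools import product


def analyze(text, stopwords):
    """Toy analyzer: lowercase, split on non-letters, drop stopwords but keep their position gap."""
    tokens = []
    for position, word in enumerate(re.findall(r"[a-z]+", text.lower())):
        if word not in stopwords:
            tokens.append((word, position))
    return tokens


def phrase_matches(doc_tokens, query_tokens, slop=0):
    """Simplified phrase check: relative positions may drift by at most `slop`."""
    positions = {}
    for word, pos in doc_tokens:
        positions.setdefault(word, []).append(pos)
    candidates = []
    for word, query_pos in query_tokens:
        if word not in positions:
            return False
        # Offset of each occurrence relative to where the query expects the term.
        candidates.append([pos - query_pos for pos in positions[word]])
    return any(max(combo) - min(combo) <= slop for combo in product(*candidates))


stopwords = {"the"}
doc = analyze("Tom the king of Toulon!", stopwords)
print(doc)
print(phrase_matches(doc, analyze("tom king", stopwords)))          # exact phrase, no slop
print(phrase_matches(doc, analyze("tom king", stopwords), slop=1))  # one position of slack
```

In this toy model "king" stays at position 2 even though "the" is gone, so "tom king" only matches once one position of slack is allowed.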
You used to be able to disable position increments on StopFilter, but that option was removed as of Lucene 4.4.
Okay, forget that CharFilter tomfoolery. Ugly hack, don't do that.
To query without using position increments, you need to configure that in your query parser, not in the analysis. This can be done in elasticsearch, with a Query String Query, with enable_position_increments set to false.
Something like:
{
"query_string" : {
"default_field" : "title",
"query" : "\"tom king\"",
"enable_position_increments" : false
}
}
As a point of interest, a similar solution exists in raw Lucene: call QueryParser.setEnablePositionIncrements.

There was an option, enable_position_increments: false, that you could set, e.g. in a stop filter, but it has been deprecated since Lucene 4.4.
This is the related Lucene issue: https://issues.apache.org/jira/browse/LUCENE-4065
In other words, the best way to go at the moment is probably to use the slop option until the Lucene issue is fixed.
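A minimal sketch of the slop approach, assuming a `match_phrase` query with a `slop` parameter; the helper name `phrase_query_with_slop` is made up for illustration, and the exact query syntax may differ between Elasticsearch versions.

```python
import json


def phrase_query_with_slop(field, phrase, slop=1):
    """Build a search body whose phrase query tolerates `slop` positions of drift."""
    return {
        "query": {
            "match_phrase": {
                field: {"query": phrase, "slop": slop}
            }
        }
    }


body = phrase_query_with_slop("title", "tom king")
print(json.dumps(body, indent=2))
```

The trade-off is that a slop of 1 also accepts documents where some other, non-stopword term sits between the two words, so it is looser than a true "ignore stopwords" phrase match.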

Related

How to update the pop-up note via API

I'm trying to update the pop-up note via the API. I can easily update the top box (aka the Note), but I don't see how to go about updating the pop-up section. What's odd to me is that the Note doesn't even appear in the WSE, but when I send the update it does work.
When I retrieve the record, it also doesn't appear to send the data that I have in the pop-up section, and I'm not even clear how I can add it to the WSE.
I've tried just adding it to the JSON update with a couple different names like this (tried popupnote, notepopup), and that still goes through, but only updates the top box:
"note": {
"value": "Travis Update Test!"
},
"notepopup": {
"value": "Travis Pop update Test!"
},
Anyone know if this is possible?
The answer from Acumatica Support is below. In short, you need to add a custom field in the items section for each of the two notes, and it works perfectly. When loading the items, if you plan to serialize into this class, append ?$custom=Item.NoteText,Item.NotePopupText to the end of your URL:
{
"id": "2a113b2c-d87f-e411-beca-00b56d0561c2",
"custom": {
"Item": {
"NoteText": {
"type": "CustomStringField",
"value": "Regular note 2"
},
"NotePopupText": {
"type": "CustomStringField",
"value": "Popup note 2"
}
}
}
}
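As a rough sketch of how a client might assemble these two pieces, the Python below builds the `$custom` query string and the update body shown above. The helper names and the example URL are hypothetical; only the field names and the JSON shape come from the support answer.

```python
import json

CUSTOM_FIELDS = ["Item.NoteText", "Item.NotePopupText"]


def with_custom_fields(entity_url, fields=CUSTOM_FIELDS):
    """Append the $custom parameter so the note fields are included when loading."""
    separator = "&" if "?" in entity_url else "?"
    return f"{entity_url}{separator}$custom={','.join(fields)}"


def note_update_payload(record_id, note, popup_note):
    """Build the update body that sets both the regular note and the pop-up note."""
    return {
        "id": record_id,
        "custom": {
            "Item": {
                "NoteText": {"type": "CustomStringField", "value": note},
                "NotePopupText": {"type": "CustomStringField", "value": popup_note},
            }
        },
    }


# Hypothetical endpoint URL, used only to show the resulting query string.
print(with_custom_fields("https://example.invalid/entity/items"))
print(json.dumps(note_update_payload(
    "2a113b2c-d87f-e411-beca-00b56d0561c2", "Regular note 2", "Popup note 2"), indent=2))
```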

i18next: Custom json format with comments for translation bureau

We are using a custom json format for our i18n resources that contain comments for the translation bureau, so they understand better the context of the strings to translate:
Example en.json:
{
"headerbar": {
"search": {
"placeholder": {
"value": "Enter your search here...",
"comment": "This string will be shown in the search input if empty. Truncated after 100 characters."
}
}
},
"welcome": {
"heading": {
"value": "Welcome, {{name}}!",
"comment": "This string should not be longer than 50 characters."
}
}
}
How can I configure i18next (or react-i18next) such that the translation is always retrieved from the value property? Without having to use {returnObjects} in every t().
t('headerbar.search.placeholder') // === 'Enter your search here...'
t('welcome.heading', {name: 'Bob'}) // === 'Welcome, Bob!'
I also have this requirement, but it appears i18next does not have the capability to define comments or descriptions, because 1) the API doesn't have a way to define those, and 2) the most popular extractor, i18next-parser, doesn't support generating files with comments included.
Alternatively, you could consider Format.JS which has this capability:
https://formatjs.io/docs/getting-started/message-declaration

REST API - One to zero-or-one association

I've been trying to find a good way to work with a one to zero-or-one association.
I have two resources. The first resource has a one to zero-or-one association with the second resource.
(In the example below I will use Page and Line. You can assume that a Page can only have "one" or "zero" Lines.)
At first I thought of retrieving the data using this approach:
/api/pages/:id/
When the Page has one Line
{
"id": 1,
"name": "Test",
"line": {
"id": 10,
"name": "aaa"
}
}
When the Page has no Line
{
"id": 1,
"name": "Test"
}
That way, when a developer gets a list of pages, they don't need to make more requests to the API to get the Line of each Page.
But if the page doesn't have a Line, is the best approach simply to omit "line" and explain that in the documentation? Or should I add a boolean named "has_line"?
An alternative solution would be
{
"id": 1,
"name": "Test",
"lines": [{
"id": 10,
"name": "aaa"
}]
}
and
{
"id": 1,
"name": "Test",
"lines": []
}
But if you are certain there won't be more lines later, then I would stick with your approach. You need to document it no matter which approach you choose. No need to have a hasLine boolean.
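From the client's side, either shape can be read without an extra flag. A minimal Python sketch, assuming the response has already been parsed into a dict; the function names are illustrative only.

```python
def line_from_optional_key(page):
    """First shape: "line" is simply absent when the Page has no Line."""
    return page.get("line")


def line_from_list(page):
    """Alternative shape: "lines" is always present and holds zero or one item."""
    lines = page.get("lines", [])
    return lines[0] if lines else None


with_line = {"id": 1, "name": "Test", "line": {"id": 10, "name": "aaa"}}
without_line = {"id": 1, "name": "Test"}

print(line_from_optional_key(with_line))
print(line_from_optional_key(without_line))
print(line_from_list({"id": 1, "name": "Test", "lines": []}))
```

In both cases the absence of the Line is already visible in the payload, which is why a separate boolean adds nothing.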

Apache Solr 7.4 : Nesting with "_childDocuments_" not working, the document is still flat

My use case is this : I have a Parent -> Children -> GrandChildren hierarchy.
I would like to ingest documents as nested and would like to do BlockJoin queries to retrieve all grandchildren of a particular parent, all children of particular parent etc.
I have defined the appropriate fields in the schema (using curl) and copy fields and field-types as required by my application. I have also defined "text" as a copy field for everything as I have to support random searches.
I have defined the document to ingest as follows :
{
"id": "3443",
"path": "1.employee",
"employeeId": 3443,
"employeeName": "Tom",
"employeeCounty": "Maricopa",
"_childDocuments_": [{
"id": "3443.54545454",
"path": "2.employee.assets",
"assetId": 54545454,
"assetName": "Lenovo",
"assetType": "Laptop",
"_childDocuments_": [{
"id": "3443.54545454.5764646",
"path": "3.employee.assets.assetType",
"processorId": 5764646,
"processorType": "Intel core i7"
}]
}]
}
Now when I query using the Admin UI, I get the following flattened object, and block join queries don't work either:
{
"responseHeader":{
"status":0,
"QTime":0,
"params":{
"q":"*:*",
"_":"1533252181415"}},
"response":{"numFound":1,"start":0,"docs":[
{
"id":"3443",
"employeeId":3443,
"text":["3443",
"Tom",
"Maricopa"],
"employeeName":"Tom",
"employeeCounty":"Maricopa",
"_childDocuments_.id":[3443.54545454,
3443.643534544],
"_childDocuments_.path":["2.employee.assets],
"_childDocuments_.assetId":[54545454,
643534544],
"_childDocuments_.assetName":["Lenovo"],
What am I missing? How can I make Solr process the nested documents like they are supposed to be rather than flattening them out?
Any help is appreciated.
Found the solution: I was posting to the wrong URL.
I was using http://localhost:8983/solr/my-core/update/json/docs
Instead I should just use http://localhost:8983/solr/my-core/update
because I am already formatting the document in Solr's own format, so Solr doesn't need to do any special processing to index it.
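A small sketch of posting to the plain /update handler from Python, using only the standard library. The host and core name are the ones from this post; `build_update_request` is a hypothetical helper, the `commit=true` parameter is added here for convenience, and the document is wrapped in a list, which is one of the shapes /update accepts. The request is only built, not sent.

```python
import json
import urllib.request


def build_update_request(base_url, core, docs, commit=True):
    """Build a POST to /update carrying Solr-format documents (nested children included)."""
    url = f"{base_url}/solr/{core}/update"
    if commit:
        url += "?commit=true"
    data = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )


employee = {
    "id": "3443",
    "path": "1.employee",
    "employeeName": "Tom",
    "_childDocuments_": [
        {"id": "3443.54545454", "path": "2.employee.assets", "assetName": "Lenovo"}
    ],
}

request = build_update_request("http://localhost:8983", "my-core", [employee])
print(request.get_method(), request.full_url)
# To actually send it against a running Solr instance:
# urllib.request.urlopen(request)
```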

Does the JIRA REST API require submitting a transition ID when transitioning an issue?

If I POST an issue transition like this:
{
"fields" : {
"resolution" : {
"name" : "Fixed"
}
}
}
...I get this error:
{
"errorMessages" : ["Missing 'transition' identifier"],
"errors" : {}
}
This seems to imply that I need to include a transition ID along with my list of changed fields. https://stackoverflow.com/a/14642966/565869 seems to say the same. Fine.
However, transition IDs appear to be global. It's not enough to look up the highest transition ID for this issue and increment it; such an ID is probably in use elsewhere. At some expense, I could get the highest transaction ID used anywhere in the system; this might be 68,000 at this moment. But if I were then to use transaction ID 68,001 there's a real chance that a GUI user would attempt a transition of their own and use this ID before I could.
I could use transaction IDs in the range of 1,000,001 and up, but if the JIRA web GUI uses the highest previously used transaction ID when generating new IDs I'll just collide in this range instead of the 68,000 range. I could use 69,000 and trust that there won't be a thousand transitions in the length of time it takes to get the highest transaction ID.
These both seem terribly clumsy, however. Is there no way to post a transition and let JIRA generate its own unique ID? I don't need to retrieve the generated IDs, I just want to update issues' statuses and resolutions.
You're getting mixed up a bit, so let's see if I can explain it better for you.
To transition a JIRA issue, you use the transition ID to identify which transition to apply to the issue. You aren't specifying an ID for a transaction, or an ID to record that the transition occurred; JIRA takes care of that for you.
The easiest way to understand it is to see it.
So first you can look at what transitions are available to an Issue by doing a GET to the API Call:
/rest/api/2/issue/${issueIdOrKey}/transitions
Example:
/rest/api/2/issue/ABC-123/transitions
Which will show something like this:
{
"expand": "transitions",
"transitions": [
{
"id": "161",
"name": "Resolve",
"to": {
"description": "A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.",
"iconUrl": "https://localhost:8080/images/icons/statuses/resolved.png",
"id": "5",
"name": "Resolved",
"self": "https://localhost:8080/rest/api/2/status/5"
}
}
]
}
So you can see that only one transition is available for issue ABC-123, and it has an ID of 161.
If you were to browse to that JIRA issue through the GUI, you would see only one transition available, and it would match the API call. In fact, if you inspected the element, you should see that it is an a tag whose href contains something like action=161.
So should you want to transition this issue, you'll need to do a POST to the following URL:
/rest/api/2/issue/ABC-123/transitions
With JSON like this:
{
"update": {
"comment": [
{
"add": {
"body": "Bug has been fixed."
}
}
]
},
"fields": {
"assignee": {
"name": "bob"
},
"resolution": {
"name": "Fixed"
}
},
"transition": {
"id": "161"
}
}
Which uses the transition ID found from the call that shows all transitions. I also update the resolution and assignee and add comments at the same time.
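The two steps can be sketched in Python as follows, assuming the GET response has already been parsed into a dict. `find_transition_id` and `build_transition_payload` are hypothetical helper names; the JSON shapes are the ones shown above.

```python
import json


def find_transition_id(transitions_response, name):
    """Look up the ID of an available transition by its display name."""
    for transition in transitions_response.get("transitions", []):
        if transition["name"].lower() == name.lower():
            return transition["id"]
    return None


def build_transition_payload(transition_id, resolution=None, comment=None):
    """Build the POST body: the transition to apply, plus optional field updates."""
    payload = {"transition": {"id": transition_id}}
    if resolution is not None:
        payload["fields"] = {"resolution": {"name": resolution}}
    if comment is not None:
        payload["update"] = {"comment": [{"add": {"body": comment}}]}
    return payload


available = {
    "expand": "transitions",
    "transitions": [{"id": "161", "name": "Resolve", "to": {"id": "5", "name": "Resolved"}}],
}

transition_id = find_transition_id(available, "Resolve")
payload = build_transition_payload(transition_id, resolution="Fixed", comment="Bug has been fixed.")
print(json.dumps(payload, indent=2))
```

Looking the ID up per issue, rather than hard-coding it, matters because the set of available transitions depends on the issue's current status.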
Does that make a bit more sense?