Updating existing documents in Elasticsearch

Is it possible to add more fields to existing documents in Elasticsearch?
For instance, I indexed the following document:
{
  "user": "xyz",
  "message": "for increase in fields"
}
Now I want to add one more field to it, i.e. date:
{
  "user": "xyz",
  "message": "for increase in fields",
  "date": "2013-06-12"
}
How can this be done?

For Elasticsearch, check the update API:
The update API also supports passing a partial document (since 0.20), which will be merged into the existing document (simple recursive merge, inner merging of objects, replacing core "keys/values" and arrays).
Solr 4.0 also supports partial updates; see the Solr documentation on atomic updates.
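For illustration, a minimal Solr atomic-update sketch (the collection name collection1 and the document ID are assumptions; the "set" operator replaces or adds the field):
curl -XPOST 'http://localhost:8983/solr/collection1/update?commit=true' -H 'Content-Type: application/json' -d '
[
  { "id": "1", "date": { "set": "2013-06-12T00:00:00Z" } }
]'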

This can be done with a partial update (assuming the document has an ID of 1):
curl -XPOST 'http://localhost:9200/myindex/mytype/1/_update' -d '
{
  "doc" : {
    "date" : "2013-06-12"
  }
}'
Then query the document:
curl -XGET 'http://localhost:9200/myindex/mytype/_search?q=user:xyz'
You should see something like:
"_id":"1",
"_source:
{
{
"user":"xyz",
"message":"for increase in fields",
"date":"2013-06-12"
}
}

Related

Elasticsearch upsert without document ID

I query my index to find a document. If I find one, I know the _id value to use for the update; otherwise I don't have an _id value.
Using the upsert below, I can update when I have the _id. If I don't have the _id, how can I have Elasticsearch generate one and insert a new document?
Purpose: I don't want to have two functions, one to create a new doc and another to update it...
curl -XPOST 'localhost:9200/test/type1/{value_of_id}/_update' -d '{
  "doc" : {
    "name" : "new_name"
  },
  "doc_as_upsert" : true
}'
Something like "update by query"?
See here:
https://github.com/elastic/elasticsearch/issues/2230
for the original issue/proposal, some experimental work toward an implementation, a discussion of the pros and cons of including it, and a link to the plug-in that was developed to support the behavior.
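If the document does not exist yet and you just need Elasticsearch to generate an _id for you, one option (a sketch, reusing the index, type, and field from the question) is to POST to the type endpoint without an ID; the response contains the generated _id, which you can reuse for later _update calls:
curl -XPOST 'localhost:9200/test/type1' -d '{
  "name" : "new_name"
}'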

How to do automated index creation in Elasticsearch?

How do I do automated index creation in Elasticsearch, just like WordPress? See: http://gibrown.wordpress.com/2014/02/06/scaling-elasticsearch-part-2-indexing/
"In our case we create one index for every 10 million blogs, with 25 shards per index."
Any pointers? Thanks!
You do it in whatever your favorite scripting language is. First run a query to count the documents in the current index; if the count is beyond a certain amount, create a new index, either via an Elasticsearch client API or with curl.
Here's the query to find the number of docs:
curl -XGET 'http://localhost:9200/youroldindex/_count'
Here's the index creation curl:
curl -XPUT 'http://localhost:9200/yournewindex/' -d '{
  "settings" : {
    "number_of_shards" : 25,
    "number_of_replicas" : 2
  }
}'
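A minimal bash sketch of that logic (the threshold value and the jq dependency are assumptions for illustration):
#!/bin/bash
# Count documents in the current index and roll over to a new index if over threshold.
COUNT=$(curl -s -XGET 'http://localhost:9200/youroldindex/_count' | jq '.count')
THRESHOLD=10000000
if [ "$COUNT" -gt "$THRESHOLD" ]; then
  curl -XPUT 'http://localhost:9200/yournewindex/' -d '{
    "settings" : {
      "number_of_shards" : 25,
      "number_of_replicas" : 2
    }
  }'
fi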
You will also probably want to create aliases so that your code can always point to a single index alias, and then switch the alias as you change your hot index:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-aliases.html
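For example, a sketch of switching an alias atomically with the aliases API (the alias name hotindex is a placeholder):
curl -XPOST 'http://localhost:9200/_aliases' -d '
{
  "actions" : [
    { "remove" : { "index" : "youroldindex", "alias" : "hotindex" } },
    { "add" : { "index" : "yournewindex", "alias" : "hotindex" } }
  ]
}'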
You will probably want to predefine your mappings too:
curl -XPUT 'http://localhost:9200/yournewindex/yournewtype/_mapping' -d '
{
  "yournewtype" : {
    "properties" : {
      "message" : { "type" : "string", "store" : true }
    }
  }
}'
Elasticsearch has fairly complete documentation, a few good places to look:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-create-index.html

GitHub API (v3): Order tags by creation date

I ran into a problem / question while using the GitHub API.
I need a list of all tags created after a given tag. The only way to do this is to compare the tags by date. However, the results from the API aren't ordered by date:
Result from the API (rails repository example): [screenshot]
Results from the web interface: [screenshot]
What I expected is a list ordered by date. However, as you can see in the screenshots, the API returns v4.0.0rc1 and v4.0.0rc2 before v4.0.0, even though v4.0.0 was released after the release candidates. There isn't even a creation/commit date to order by on the server side.
The Releases API isn't a solution either: it only returns releases created via GitHub, not releases created by tags.
Is there any way to order the tags by date?
Thanks in advance!
Ruben
The Repositories API currently returns tags in the order they would be returned by the git tag command, which means they are sorted alphabetically.
The problem with sorting tags chronologically in Git is that there are two types of tags, lightweight and annotated, and for the lightweight type Git doesn't store the creation date.
The Releases/Tags UI currently sorts tags chronologically by the date of the commit to which the tag points. This again isn't the date on which the tag itself was created, but it does establish a chronological order of things.
Adding this alternative sorting option to the API is on our feature request list.
With the GraphQL API v4, we can now order tags by commit date using field: TAG_COMMIT_DATE inside orderBy. The following performs an ascending sort of tags by commit date:
{
  repository(owner: "rails", name: "rails") {
    refs(refPrefix: "refs/tags/", last: 100, orderBy: {field: TAG_COMMIT_DATE, direction: ASC}) {
      edges {
        node {
          name
          target {
            oid
            ... on Tag {
              message
              commitUrl
              tagger {
                name
                email
                date
              }
            }
          }
        }
      }
    }
  }
}
Test it in the explorer
Here, the tagger field inside target will only be filled for annotated tags and will be empty for lightweight tags.
Since the date property in tagger gives the creation date of the tag (for annotated tags only), it is easy to filter by creation date on the client side (without having to retrieve all the tags one by one).
Note that the available options for orderBy.field at this time are TAG_COMMIT_DATE and ALPHABETICAL (there is no TAG_CREATION_DATE).
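If you prefer curl over the explorer, a sketch of the same idea against the GraphQL endpoint (YOUR_TOKEN is a placeholder and the query is abbreviated to tag names only):
curl -H "Authorization: bearer YOUR_TOKEN" -X POST https://api.github.com/graphql -d '
{ "query": "{ repository(owner: \"rails\", name: \"rails\") { refs(refPrefix: \"refs/tags/\", last: 5, orderBy: {field: TAG_COMMIT_DATE, direction: ASC}) { nodes { name } } } }" }'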
Edit: This is now possible using the GitHub GraphQL API.
As a workaround, there is a node module that basically fetches the commit details of each tag:
github-api-tags-full
> npm install github-api-tags-full github moment
var GitHubApi = require('github'),
    moment = require('moment'),
    githubTags = require('github-api-tags-full');

var github = new GitHubApi({
  version: '3.0.0'
});

// Compare two ISO date strings chronologically.
var githubCompareDates = function(dateStrA, dateStrB) {
  return moment(dateStrA).diff(dateStrB);
};

// Sort tags by the author date of the commit each tag points to.
var byAuthorDateAsc = function(tagA, tagB) {
  return githubCompareDates(
    tagA.commit.author.date,
    tagB.commit.author.date
  );
};

githubTags({ user: 'golang', repo: 'go' }, github)
  .then(function(tags) {
    var tagsSorted = tags.sort(byAuthorDateAsc).reverse(); // descending
    console.log(tagsSorted); // the array of tags sorted by their creation date
  });
You can use the Git References API.
It can also return all the tags matching a certain prefix.
In your case, you probably want something like:
https://api.github.com/repos/rails/rails/git/matching-refs/tags/v
Or in the case of a monorepo:
https://api.github.com/repos/grafana/loki/git/matching-refs/tags/helm-loki-
Downsides:
sorting: the results are sorted in increasing semver order, so you get the oldest first
you don't get much info about each tag, and you might have to parse the version out of the ref name/path
Upsides:
you get all the refs/tags that match (i.e. no pagination, until GitHub decides to remove/optimise this :) )
you can use it to filter tags in a monorepo (which most probably tags release components with prefixed tags)
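A quick sketch of extracting the tag names from that response with curl and jq (the jq dependency is an assumption for illustration):
curl -s 'https://api.github.com/repos/rails/rails/git/matching-refs/tags/v' | jq -r '.[].ref'
Each result looks like refs/tags/v4.0.0, so you still have to strip the refs/tags/ prefix to get the tag name.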

Delete a field and its contents in all the records and recreate it with new mapping

I have a field, field10, which got created by accident when I updated a particular record in my index. I want to remove this field and all its contents from the index, then recreate it with the mapping below:
"mytype":{
"properties":{
"field10":{
"type":"string",
"index":"not_analyzed",
"include_in_all":"false",
"null_value":"null"
}
}
}
When I try to create this mapping using the Put Mapping API, I get an error: {"error":"MergeMappingException[Merge failed with failures {[mapper [field10] has different index values, mapper [field10] has different index_analyzer, mapper [field10] has different search_analyzer]}]","status":400}.
How do I change the mapping of this field? I don't want to reindex millions of records just for this small accident.
Thanks
AFAIK, you can't remove a single field and recreate it.
You also can't just modify a mapping and have everything reindexed automagically. Imagine that you don't store _source: how could Elasticsearch know what your data looked like before it was indexed?
But you can probably modify your mapping using a multi-field, with field10.field10 using the old mapping and field10.new using the new analyzer (a sketch follows below).
If you don't reindex, only new documents will have content in field10.new.
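A sketch of what that multi-field mapping could look like, using the multi_field syntax of that era (the index name myindex is an assumption; the sub-field settings come from the question):
curl -XPUT 'http://localhost:9200/myindex/mytype/_mapping' -d '
{
  "mytype" : {
    "properties" : {
      "field10" : {
        "type" : "multi_field",
        "fields" : {
          "field10" : { "type" : "string" },
          "new" : {
            "type" : "string",
            "index" : "not_analyzed",
            "include_in_all" : false,
            "null_value" : "null"
          }
        }
      }
    }
  }
}'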
If you want to handle old documents, you have to either:
send all your docs again (which will update everything), aka reindex (you can use the scan & scroll API to fetch your old documents), or
try to update your docs with the Update API.
You can probably run a query like this for each document:
curl -XPOST 'localhost:9200/crunchbase/person/1/_update' -d '{
  "script" : "ctx._source.field10 = ctx._source.field10"
}'
But, as you can see, you have to run it document by document, and I think it will take more time than reindexing everything with the Bulk API.
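For completeness, a sketch of the scan & scroll half of a reindex, using the syntax of that era (the index name and SCROLL_ID are placeholders):
curl -XGET 'localhost:9200/myindex/_search?search_type=scan&scroll=1m&size=50' -d '
{
  "query" : { "match_all" : {} }
}'
# Page through the results with the _scroll_id returned by the previous call,
# then re-send each batch to the new index with the Bulk API:
curl -XGET 'localhost:9200/_search/scroll?scroll=1m' -d 'SCROLL_ID'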
Does it help?

Updating deeply nested documents in RavenDB

I have the following document structure and I need to insert values into the nested documents.
{
  "Level-1": {
    "Level-2": {
      "Level-3": {
        "aaa": "bbb",
        "Level-4": {
        }
      }
    }
  }
}
How can I get the keys at any level? There is a function for getting keys:
var workingDoc = session.Load<RavenJObject>("xyz/b");
workingDoc.Keys will give me all the keys for this document, but how can I get the keys of the second level when I provide the key of a nested document? For example, how do I get all the keys of "Level-1"? Is there any way? And how can I check whether a key belongs to a nested document? Please help. Thanks in advance.
Rajdeep, you can't partially load a document. You can certainly have multiple levels of nested objects within one single document, and depending on your data model this is probably a good idea; however, you will always need to load the document as a whole if you want to modify it.