How to implement our own UID in Lucene?

I want to create an index with, let's say, the following fields:
UID
title
owner
content
Of these, I don't want UID to be searchable (it is like metadata).
I want UID to behave like a docID, so that I can use it when I delete or update a document.
Is this possible? How do I do it?

You could mark it as non-searchable by adding it with Store.YES and Index.NO, but that won't let you update or remove documents by it easily. To replace a document (using IndexWriter.UpdateDocument(Term, Document) where term = new Term("UID", "...")), the field has to be indexed, so you need to use either Index.ANALYZED with a KeywordAnalyzer, or Index.NOT_ANALYZED. However, this makes it searchable. If the field is single-valued, which a primary key usually is, you can also read it back through the FieldCache.
Summary:
Store.NO (the value can still be retrieved through the FieldCache or a TermsEnum)
Index.NOT_ANALYZED (the complete value is indexed as a single term, including any whitespace)
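To see why the field must be indexed for updates to work, here is a toy model in Python of what UpdateDocument(term, doc) does: delete every document containing the term, then add the new document. This is not Lucene's API; `ToyIndex` and its methods are hypothetical. Only fields listed as indexed get terms, and each value becomes a single term, roughly like Index.NOT_ANALYZED.

```python
class ToyIndex:
    """Toy model (NOT Lucene) of update-by-term semantics."""

    def __init__(self, indexed_fields):
        self.indexed_fields = set(indexed_fields)  # fields that produce terms
        self.docs = {}       # internal doc number -> stored fields
        self.postings = {}   # (field, value) term -> set of doc numbers
        self._next = 0

    def add_document(self, doc):
        doc_no = self._next
        self._next += 1
        self.docs[doc_no] = dict(doc)
        for field, value in doc.items():
            if field in self.indexed_fields:
                # Whole value becomes one term, like NOT_ANALYZED.
                self.postings.setdefault((field, value), set()).add(doc_no)
        return doc_no

    def delete_documents(self, field, value):
        # Only documents reachable through the term can be found.
        for doc_no in self.postings.get((field, value), set()).copy():
            doc = self.docs.pop(doc_no)
            for f, v in doc.items():
                if f in self.indexed_fields:
                    self.postings[(f, v)].discard(doc_no)

    def update_document(self, field, value, doc):
        # Delete-then-add, as an update by term does.
        self.delete_documents(field, value)
        return self.add_document(doc)
```

With UID indexed, an update replaces the old document; with UID only stored, the delete finds nothing and the "update" silently adds a duplicate.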

How to tag a key in Redis so that I can later remove all keys that match this tag?

Currently we save data like this:
$redisClient->set($uniquePageID, $data);
and read it back like this:
$redisClient->get($uniquePageID);
But now we need to remove the data based on a userID, so we need something like this:
$redisClient->set($uniquePageID, $data)->tag($userID);
so that we can remove only the keys related to that userID, for example:
$redisClient->tagDel($userID);
Can Redis do something like that?
Thanks
There's no built-in way to do that. Instead, you need to tag these pages yourself:
When setting a page-data pair, also put the page id into a SET belonging to the corresponding user.
When you want to remove all pages of a given user, scan that user's SET to get the page ids, and delete those pages.
When scanning the SET, you can use either the SMEMBERS or the SSCAN command, depending on the size of the SET. If it's a big SET, prefer SSCAN to avoid blocking Redis for a long time.
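A minimal sketch of that approach with the redis-py client (Python rather than the question's PHP). The `user:<id>:pages` key name is a hypothetical convention, and `InMemoryRedis` is only a demo stand-in so the sketch runs without a server; in real use you would pass a `redis.Redis(...)` instance instead.

```python
def tag_key(user_id):
    # Hypothetical naming convention for the per-user tag SET.
    return f"user:{user_id}:pages"


def set_page(client, page_id, data, user_id):
    """SET the page, and record its id in the user's tag SET."""
    client.set(page_id, data)
    client.sadd(tag_key(user_id), page_id)


def delete_user_pages(client, user_id, batch_size=100):
    """Delete every page tagged with user_id, then the tag SET itself."""
    batch = []
    # SSCAN walks the SET incrementally instead of returning it all at once.
    for page_id in client.sscan_iter(tag_key(user_id), count=batch_size):
        batch.append(page_id)
        if len(batch) >= batch_size:
            client.delete(*batch)
            batch.clear()
    if batch:
        client.delete(*batch)
    client.delete(tag_key(user_id))


class InMemoryRedis:
    """Demo stand-in implementing only the calls used above."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)

    def sadd(self, key, *members):
        self.data.setdefault(key, set()).update(members)

    def sscan_iter(self, key, match=None, count=None):
        yield from list(self.data.get(key, set()))

    def delete(self, *keys):
        for key in keys:
            self.data.pop(key, None)
```

SSCAN may return the same member more than once; deleting an already deleted key is harmless, so the loop does not need to deduplicate.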
I used HSET and HDEL to store and delete items like this:
$this->client = new Predis\Client(array...);
$this->client->hset($key, $tag, $value);
$this->client->hdel($key, $tags);
If you want to delete the whole key, regardless of tag or value, use DEL; it works with any data type, including hashes created with HSET:
$this->client->del($key);

SharePoint change column id for REST requests

I recently started experimenting with the REST API for SharePoint 2013 Foundation and I am trying to return all entries in a list. My GET request returns the data I am looking for, but the IDs used to identify the columns in the list are not helpful for identifying what the information is (see images below). The column IDs between 'Title' and 'ID', in the second image, are a jumble of characters.
[Image: SharePoint list view]
[Image: response data]
Is there any way to configure the list to use the column names as IDs? Also, is there some significance to the characters currently used as IDs?
You will need to make a second request for the list's columns; that listing includes each column's InternalName and its Title, which is the name you are trying to reference.
You can use this REST call:
_api/web/lists/GetByTitle('Project Details')/fields
or you can use CSOM:
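using System;
using Microsoft.SharePoint.Client;

var url = "http://server/site"; // placeholder: your site URL
using (ClientContext context = new ClientContext(url))
{
    List list = context.Web.Lists.GetByTitle("Project Details");
    // Load the list's field collection in the same round trip
    context.Load(list, l => l.Fields);
    context.ExecuteQuery();
    foreach (Field field in list.Fields)
    {
        Console.WriteLine(field.Title);
        Console.WriteLine(field.InternalName);
    }
}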
SharePoint generates the InternalName automatically, and it is read-only (at least through REST). It is easier to fetch the field data and map each InternalName to its Title than to try to change the values.
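Correlating the two could look like the Python sketch below. It assumes the JSON from the `_api/web/lists/GetByTitle('Project Details')/fields` call has already been fetched and parsed. The `d`/`results` wrapper corresponds to `Accept: application/json;odata=verbose`, and the `value` wrapper is what the lighter OData formats return where your SharePoint version supports them. The function names and the sample internal name are made up for illustration.

```python
def field_entries(payload):
    """Extract the list of field objects from a parsed /fields response."""
    # odata=verbose wraps collections as {"d": {"results": [...]}};
    # the lighter formats use {"value": [...]}.
    if "d" in payload:
        return payload["d"]["results"]
    return payload["value"]


def title_by_internal_name(fields):
    """Map each field's InternalName to its display Title."""
    return {f["InternalName"]: f["Title"] for f in fields}


def relabel_item(item, titles):
    """Copy an item, renaming keys to column Titles where known."""
    return {titles.get(key, key): value for key, value in item.items()}
```

Keys that are not in the mapping (for example metadata properties) are passed through unchanged.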
The column you are referring to, between Title and Id, is the ID of the content type associated with the item. It is not a column ID.
The SharePoint REST API is OData-compliant, so you can use the $select parameter to query for just the fields you need.
http://server/site/_api/web/lists('guid')/items?$select=Column1,Column2
Be aware, though, that lookup fields must also be expanded; otherwise you get only the ID of the lookup item. The projected lookup field goes in $select and the lookup column itself goes in $expand:
http://server/site/_api/web/lists('guid')/items?$select=LookupColumn/Title&$expand=LookupColumn
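A small Python sketch of composing such a query string, following SharePoint's convention of projecting a lookup as `Column/Field` in `$select` and naming the column in `$expand`. `items_query` is a hypothetical helper; it supports one projected field per lookup column and does no URL encoding.

```python
def items_query(select, lookups=None):
    """Build the query string for .../items.

    select:  plain column internal names, e.g. ["Column1", "Column2"]
    lookups: {lookup column: projected field}, e.g. {"LookupColumn": "Title"}
    """
    lookups = lookups or {}
    # Lookup columns are selected as Column/Field ...
    parts = list(select) + [f"{col}/{sub}" for col, sub in lookups.items()]
    query = "$select=" + ",".join(parts)
    # ... and must also be listed in $expand.
    if lookups:
        query += "&$expand=" + ",".join(lookups)
    return query
```

Append the result after `items?` in the list URL.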

Delete a field and its contents in all the records and recreate it with a new mapping

I have a field field10 which got created by accident when I updated a particular record in my index. I want to remove this field and all its contents from my index, and recreate it with the mapping below:
"mytype": {
    "properties": {
        "field10": {
            "type": "string",
            "index": "not_analyzed",
            "include_in_all": false,
            "null_value": "null"
        }
    }
}
When I try to create this mapping using the Put Mapping API, I get an error: {"error":"MergeMappingException[Merge failed with failures {[mapper [field10] has different index values, mapper [field10] has different index_analyzer, mapper [field10] has different search_analyzer]}]","status":400}.
How do I change the mapping of this field? I don't want to reindex millions of records just for this small accident.
Thanks
AFAIK, you can't remove a single field and recreate it.
Nor can you simply modify a mapping and have everything reindexed automatically. Imagine that you don't store _source: how could Elasticsearch know what your data looked like before it was indexed?
You can, however, probably change your mapping to a multi-field: keep field10.field10 with the old mapping and add field10.new with the new mapping.
If you don't reindex, only new documents will have content in field10.new.
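In the pre-1.0 syntax that matches the error message above, that suggestion could be expressed as the Put Mapping body built below in Python. This is a sketch under assumptions: it assumes field10 was dynamically mapped as a default analyzed string, and the `field10` sub-field has to repeat whatever the existing mapping really is, otherwise the merge will fail again. Newer 1.x versions replace the `multi_field` type with a `fields` block on the `string` field itself, so check the syntax for your version.

```python
import json


def field10_multi_field_mapping():
    """Put Mapping body turning field10 into a multi_field (pre-1.0 syntax)."""
    return {
        "mytype": {
            "properties": {
                "field10": {
                    "type": "multi_field",
                    "fields": {
                        # Default sub-field, still queried as field10.
                        # ASSUMPTION: mirrors the existing dynamic mapping.
                        "field10": {"type": "string", "index": "analyzed"},
                        # New sub-field, queried as field10.new.
                        "new": {
                            "type": "string",
                            "index": "not_analyzed",
                            # Remaining settings copied from the question.
                            "include_in_all": False,
                            "null_value": "null",
                        },
                    },
                }
            }
        }
    }


# JSON body to send with the Put Mapping request.
body = json.dumps(field10_multi_field_mapping(), indent=2)
```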
If you want old documents to be covered too, you have to either:
Send all your docs again, which updates everything, i.e. reindex (you can use the scan & scroll API to fetch your old documents), or
Try to update your docs with the Update API.
For the second option, you could try a request like:
curl -XPOST localhost:9200/crunchbase/person/1/_update -d '{
    "script" : "ctx._source.field10 = ctx._source.field10"
}'
As you can see, though, you have to run it one document at a time, and I think that will take longer than reindexing everything with the Bulk API.
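For the reindex route, each document fetched with scan & scroll is sent back in the Bulk API's newline-delimited format: an action line, then the document source on the next line, with a trailing newline at the end of the body. A minimal Python sketch of building that body; fetching the documents and POSTing the result to `localhost:9200/_bulk` are left out, and `bulk_index_body` is a hypothetical helper name.

```python
import json


def bulk_index_body(index, doc_type, docs):
    """Build a _bulk request body that re-indexes docs.

    docs: iterable of (doc_id, source_dict), e.g. collected via scan & scroll.
    """
    lines = []
    for doc_id, source in docs:
        # Action/metadata line, then the full source on its own line.
        lines.append(json.dumps(
            {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # Every line, including the last, must end with a newline.
    return "".join(line + "\n" for line in lines)
```

Sending the documents in batches of a few thousand per request keeps each bulk body to a manageable size.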
Does it help?

Best design approach to query documents for 'labels'

I am storing documents, and each document has a collection of 'labels', like this. Labels are user-defined and can be any plain text.
{
    "FeedOwner": "4ca44f7d-b3e0-4831-b0c7-59fd9e5bd30d",
    "MessageBody": "blablabla",
    "Labels": [
        {
            "IsUser": false,
            "Text": "Mine"
        },
        {
            "IsUser": false,
            "Text": "Incomplete"
        }
    ],
    "CreationDate": "2012-04-30T15:35:20.8588704"
}
I need to let the user query for any combination of labels, for example:
"Mine" OR "Incomplete"
"Incomplete" only
"Mine" AND NOT "Incomplete"
This results in Raven queries like this:
Query: (FeedOwner:25eb541c\-b04a\-4f08\-b468\-65714f259ac2) AND (Labels,Text:Mine) AND (Labels,Text:Incomplete)
I realise that Raven will generate a 'dynamic index' for queries it has not seen before, and I can see that this could result in a lot of indexes.
What would be the best approach to achieving this functionality with Raven?
[EDIT]
This is my LINQ query, but Raven gives me the error "All is not supported":
var result = from candidateAnnouncement in session.Query<FeedAnnouncement>()
             where listOfRequiredLabels.All(
                 requiredLabel => candidateAnnouncement.Labels.Any(
                     candidateLabel => candidateLabel.Text == requiredLabel))
             select candidateAnnouncement;
[EDIT]
I had a similar question, and the answer to it resolved both: "Raven query returns 0 results for collection contains".
Note that if FeedOwner is a unique property of your documents, the query doesn't make much sense at all. In that case, you should do the filtering on the client using standard LINQ to Objects.
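If the filtering does end up on the client, the three kinds of label query from the question reduce to set operations on each document's label texts. The sketch below uses Python in place of LINQ to Objects and works on documents shaped like the JSON above; the function names and the `all_of`/`any_of`/`none_of` parameters are hypothetical.

```python
def label_texts(doc):
    """Set of label texts on one document."""
    return {label["Text"] for label in doc.get("Labels", [])}


def matches(doc, all_of=(), any_of=(), none_of=()):
    """True if doc has every label in all_of, at least one label in any_of
    (when any_of is non-empty), and none of the labels in none_of."""
    texts = label_texts(doc)
    return (
        set(all_of) <= texts
        and (not any_of or bool(texts & set(any_of)))
        and not (texts & set(none_of))
    )
```

For example, "Mine" AND NOT "Incomplete" becomes `matches(doc, all_of=["Mine"], none_of=["Incomplete"])`.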
Now, assuming FeedOwner is not unique, your query is basically correct. However, depending on what you actually want returned, you may need to create a static index instead:
If you're using dynamically generated indexes, you will always get whole documents back, and you can't get the particular labels that matched the query. If that is fine for you, go with that approach and let the query optimizer do its job (build the index up front only if you really have a lot of documents).
In the other case, where you want the actual labels as the query result, you have to build a simple map index up front that covers the fields you want to query on; in your sample, that is FeedOwner and the Text of every label. You will have to use FieldStorage.Yes on the fields you want to return from a query, so enable it on the Text property of your labels. There's no need to do so for the FeedOwner property, because it is part of the actual document, which Raven returns as part of any query result. Please refer to Raven's documentation to see how to build a static index and use field storage.

How to represent trees and their content in MySQL?

I need to be able to store something like this:
where the green is a template type, the gray is a field container type, and the white is a field. (That is, a field has a label and some text, both MEDIUMTEXT for simplicity; a field container can store other field containers or text; and a template is a top-level field.)
Now, let's say I want to allow users to create any number of these structures, though it is unlikely to be more than, say, 10. And, of course, we need to be able to link data to it.
The point of all this is to store in a database an associative array that, for the structure above, looks like this in pseudocode:
my_template = {
    page_info => { description => 'hello!!!' },
    id => 0,
    content => { background_url => '12121.jpg', text => ...
}
Having an easy way to add a field to all data using the template when the template changes (say, we add a keywords field to page_info) would be a big plus.
I can't figure this out at all, thanks a lot!
There are several different ways to store hierarchical data structures (trees) in MySQL. You can, for example, choose:
Adjacency list
Nested sets
Path enumeration
Closure table
See Bill Karwin's presentation for more details on the pros and cons of each.
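As an illustration of the first option, here is a minimal adjacency-list sketch: every template, container and field is a row pointing at its parent, and a recursive CTE loads one template's subtree, which is then folded back into a nested associative array like the one in the question. It uses Python's built-in sqlite3 so it runs anywhere; MySQL 8.0 and later also support `WITH RECURSIVE`. The table layout, column names and `kind` values are assumptions, not something from the question.

```python
import sqlite3

# Hypothetical schema: kind is 'template', 'container' or 'field';
# only fields carry a value.
SCHEMA = """
CREATE TABLE node (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES node(id),
    kind      TEXT NOT NULL,
    label     TEXT NOT NULL,
    value     TEXT
);
"""

# Start at one node and repeatedly join in the children of rows found so far.
SUBTREE = """
WITH RECURSIVE subtree(id, parent_id, kind, label, value) AS (
    SELECT id, parent_id, kind, label, value FROM node WHERE id = ?
    UNION ALL
    SELECT n.id, n.parent_id, n.kind, n.label, n.value
    FROM node AS n JOIN subtree AS s ON n.parent_id = s.id
)
SELECT id, parent_id, kind, label, value FROM subtree;
"""


def add_node(conn, parent_id, kind, label, value=None):
    """Insert a node under parent_id and return its id."""
    cur = conn.execute(
        "INSERT INTO node (parent_id, kind, label, value) VALUES (?, ?, ?, ?)",
        (parent_id, kind, label, value),
    )
    return cur.lastrowid


def load_tree(conn, root_id):
    """Rebuild the nested dict rooted at root_id from its subtree rows."""
    rows = conn.execute(SUBTREE, (root_id,)).fetchall()
    children = {}  # parent id -> list of child rows
    for row in rows:
        children.setdefault(row[1], []).append(row)

    def build(node_id):
        out = {}
        for child_id, _parent, kind, label, value in children.get(node_id, []):
            # Fields are leaves; containers recurse into their own children.
            out[label] = value if kind == "field" else build(child_id)
        return out

    return build(root_id)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    t = add_node(conn, None, "template", "my_template")
    p = add_node(conn, t, "container", "page_info")
    add_node(conn, p, "field", "description", "hello!!!")
    print(load_tree(conn, t))
```

With this layout, adding a keywords field under every page_info container means inserting one row per container rather than changing the schema.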