Dealing with a change to the type of an existing field? - indexing

If I had a simple object indexed in ES
{ "name" : "Mark", "age" : 30}
and then another object was added to the same index
{ "name" : "Bill", "age" : "forty"}
The mapping would fail to update, and the new object would not be indexed. According to the Elasticsearch docs:
"once a field has been added, its type can not change. For example, if we added age and its value is a number, then it can’t be treated as a string."
Is there any way around this to allow these similar people objects to exist (and be searchable) under the same index?

I'm afraid you can't do that: once you've declared the type of a field, you can't change it without re-indexing your whole data set (and, of course, refactoring your code).
This is true for Apache Solr as well.
One option would be to introduce a new field (age_in_string) and populate it with both kinds of values: "30" and "forty". You can then search on that.
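A minimal Python sketch of that workaround (the field name age_in_string comes from the answer above; the helper name is hypothetical): every document gets a string copy of its age, and the numeric age field is kept only when the value really is a number, so the existing integer mapping never sees a conflicting type.

```python
def normalize_person(doc):
    """Return a copy of doc with age_in_string always populated.

    The numeric `age` field is kept only when the value is actually a
    number, so the integer mapping in the index never breaks.
    """
    out = dict(doc)
    age = doc.get("age")
    out["age_in_string"] = str(age)          # searchable for every doc
    if not isinstance(age, (int, float)):
        out.pop("age", None)                 # avoid the mapping conflict
    return out

mark = normalize_person({"name": "Mark", "age": 30})
bill = normalize_person({"name": "Bill", "age": "forty"})
# mark -> {"name": "Mark", "age": 30, "age_in_string": "30"}
# bill -> {"name": "Bill", "age_in_string": "forty"}
```

Both documents can then be indexed side by side, and queries that need to match "forty" and "30" alike go against age_in_string.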

Are extensible records useless in Elm 0.19?

Extensible records were one of Elm's most amazing features, but since v0.16 adding and removing fields is no longer possible. And this puts me in an awkward position.
Consider an example. I want to give a name to a random thing t, and extensible records provide me a perfect tool for this:
type alias Named t = { t | name: String }
"Okay," says the compiler. Now I need a constructor, i.e. a function that equips a thing with a specified name:
equip : String -> t -> Named t
equip name thing = { thing | name = name } -- Oops! Type mismatch
Compilation fails, because the { thing | name = ... } syntax assumes thing to be a record that already has a name field, but the type system can't assure this. In fact, with Named t I've tried to express the opposite: t should be a record type without its own name field, and the function adds this field to the record. Either way, field addition is necessary to implement the equip function.
So it seems impossible to write equip in a polymorphic manner, but that's probably not such a big deal. After all, any time I'm going to give a name to some concrete thing I can do it by hand. Much worse, the inverse function extract : Named t -> t (which erases the name of a named thing) requires a field-removal mechanism, and thus is not implementable either:
extract : Named t -> t
extract thing = thing -- Error: No implicit upcast
This would be an extremely important function, because I have tons of routines that accept old-fashioned unnamed things, and I need a way to use them with named things. Of course, a massive refactoring of those functions is not a viable solution.
At last, after this long introduction, let me state my questions:
Does modern Elm provide some substitute for the old, deprecated field addition/removal syntax?
If not, is there some built-in function like equip and extract above? For every custom extensible record type I would like to have a polymorphic analyzer (a function that extracts its base part) and a polymorphic constructor (a function that combines the base part with the additions and produces the record).
Negative answers to both (1) and (2) would force me to implement Named t in a more traditional way:
type Named t = Named String t
In that case, I can't see the purpose of extensible records. Is there a positive use case, a scenario in which extensible records play a critical role?
The type { t | name : String } means a record that has a name field. It does not extend the t type but, rather, extends the compiler's knowledge about t itself.
So in fact the type of equip is String -> { t | name : String } -> { t | name : String }.
What is more, as you noticed, Elm no longer supports adding fields to records, so even if the type system allowed what you want, you still could not do it. The { thing | name = name } syntax only supports updating records of type { t | name : String }.
Similarly, there is no support for deleting fields from records.
If you really need types to which you can add fields, or from which you can remove them, you can use Dict. The other options are either writing the transformers manually, or creating and using a code generator (this was the recommended solution for JSON decoding boilerplate for a while).
And regarding extensible records: Elm does not really support the "extensible" part much any more – the only remaining part is the { t | name : u } -> u projection, so perhaps they should be called just scoped records. The Elm docs themselves acknowledge that the extensibility is not very useful at the moment.
You could just wrap the t type together with a name, but it wouldn't make a big difference compared to the approach with a custom type:
type alias Named t = { val: t, name: String }
equip : String -> t -> Named t
equip name thing = { val = thing, name = name }
extract : Named t -> t
extract thing = thing.val
Is there a positive use case, a scenario in which extensible records play critical role?
Yes, they are useful when your application Model grows too large and you face the question of how to scale your application. Extensible records let you slice up the model in arbitrary ways, without committing to particular slices long term. If you sliced it up by splitting it into several smaller nested records, you would be committed to that particular arrangement – which tends to lead to nested TEA and the 'out message' pattern, usually a bad design choice.
Instead, use extensible records to describe slices of the model, and group the functions that operate over a particular slice into their own module. If you later need to work across different areas of the model, you can create a new extensible record for that.
It's described by Richard Feldman in his Scaling Elm Apps talk:
https://www.youtube.com/watch?v=DoA4Txr4GUs&ab_channel=ElmEurope
I agree that extensible records can seem a bit useless in Elm, but it is a very good thing that they are there to solve this scaling issue in the best way.

How to add text to the _content field in a Solr index for a Sitecore implementation?

This is for a Sitecore 7.5 / Solr 4.7 implementation. I would like to be able to modify the text that is stored in the _content field in Solr. I believe Sitecore somehow aggregates all of the content fields for an item into the _content field in the index. (I think that is correct.) At index time I would like to be able to write my own code that could potentially modify the text that is stored in the _content field in Solr. Is this possible? Any ideas how I would go about this?
_content is a computed field, which means the value is resolved at the point that the item is crawled. You'll see the computed field is defined in your config:
<field fieldName="_content" returnType="string" type="Sitecore.ContentSearch.ComputedFields.MediaItemContentExtractor,Sitecore.ContentSearch">
  <mediaIndexing ref="contentSearch/indexConfigurations/defaultSolrIndexConfiguration/mediaIndexing"/>
</field>
I recommend decompiling the class specified in the type attribute to see what it does. Then you can create your own computed field class (or inherit from that one), and replace the type attribute.
Computed fields are really quite simple to work with. They implement IComputedIndexField which requires a ComputeFieldValue method. The method accepts an argument of type IIndexable (in most cases the concrete class is an Item) and is called every time an item is crawled.
So in the ComputeFieldValue method you could cast the IIndexable to an Item, then return a concatenated string of all the field values you want to include from that item.
See here for more on computed fields:
http://www.sitecore.net/learn/blogs/technical-blogs/john-west-sitecore-blog/posts/2013/03/sitecore-7-computed-index-fields.aspx
From what I understand, you can add another (separate) _content field with your own IComputedIndexField implementation. The resulting values from all fields added with the same name are aggregated.
See also: https://kamsar.net/index.php/2014/05/indexing-subcontent/ and https://andrewwburns.com/2015/09/03/appending-to-the-_content-field-in-sitecore-search-7-2-and-7-5/

Why does collection+json use anonymous objects instead of key value pairs

I'm trying to find a data schema that can be used for different scenarios, and a promising format I have found so far is the collection+json format (http://amundsen.com/media-types/collection/).
So far it has a lot of the functionality I need and is very flexible; however, I don't get why it uses anonymous objects (example: {"name" : "full-name", "value" : "J. Doe", "prompt" : "Full Name"}) instead of simple key-value pairs (example: "full-name": "J. Doe").
I see how you can transfer more information, like the prompt, but the parsing is much slower and it is harder to create a client, since it has to access the fields by searching through an array. When binding the data to a specific view, it has to be known which fields exist, so the anonymous objects have to be converted back into a key-value map.
So is there a real advantage to using these anonymous objects instead of a key-value map?
I think the main reason is that a consumer client does not need to know the format of the data in advance.
As collection+json is designed now, you know that in the data object you will find the data simply by parsing through it: 'name' is always the identifying name for the field, 'value' is the value, and so on. Your client can be agnostic about how many fields there are or what they are named:
{
  "name" : "full-name",
  "value" : "J. Doe",
  "prompt" : "Full Name"
},
{
  "name" : "age",
  "value" : "42",
  "prompt" : "Age"
}
if you had instead
{
  "full-name" : "J. Doe",
  "age" : "42"
}
the client needs previous knowledge of your representation, so it has to expect and understand 'full-name', 'age', and all the other application-specific fields.
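The generic-client argument above can be shown with a minimal Python sketch (the helper name data_to_dict is mine, not part of any library): a client folds the name/value array into a local map without knowing the field names in advance.

```python
def data_to_dict(data_array):
    """Fold a collection+json `data` array into a plain key-value map."""
    return {item["name"]: item["value"] for item in data_array}

cj_data = [
    {"name": "full-name", "value": "J. Doe", "prompt": "Full Name"},
    {"name": "age", "value": "42", "prompt": "Age"},
]

assert data_to_dict(cj_data) == {"full-name": "J. Doe", "age": "42"}
```

The same loop works for any collection+json response, which is exactly the point: the representation's structure, not its field names, is what the client is coupled to.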
I wrote this question and then forgot about it, but found the answer I was looking for here:
https://groups.google.com/forum/#!searchin/collectionjson/key/collectionjson/_xaXs2Q7N_0/GQkg2mvPjqMJ
From Mike Amundsen, the creator of collection+JSON:
I understand your desire to add object serialization patterns to CJ.
However, one of the primary goals of my CJ design is to not support
object serialization. I know that serialization is an often-requested
feature for Web messages. It's a great way to optimize both the code
and the programmer experience. But it's just not what I am aiming for
in this particular design.
I think the extension Kevin ref'd is a good option. Not sure if anyone
is really using it, tho. If you'd like to design one (based on your
"body" idea), that's cool. If you'd like to start w/ a gist instead of
doing the whole "pull" thing, that's cool. Post it and let's talk
about it.
On another note, I've written a very basic client-side parser for CJ
(it's in the basic folder of the repo) to show how to hide the
"costly" parts of navigating CJ's state model cleanly. I actually
have a work item to create a client-side lib that can convert the
state representation into an arbitrary local object model, but haven't
had time to do the work yet. Maybe this is something you'd like help
me with?
At a deeper (and possibly more boring) level, this state-model
approach is part of a bigger pattern I have in mind to treat messages
as "type-less" containers and to allow clients and servers to utilize
whatever local object models they prefer - without the need for
cross-web agreement on that object model. This is an "opinionated
design model" that I am working toward.
Finally, as you rightly point out at the top of the thread, HAL and
Siren are much better able to support object serialization style
messages. I think that's very cool and I encourage folks (including my
clients) to use these other formats when object serialization is the
preferred pattern and to use CJ when state transfer is the preferred
pattern.

NHibernate: why field.camelcase?

Can someone tell me why in NHibernate mapping we can set access="field.camelcase", since we have access="field" and access="property"?
EDIT: my question is "why can we do this", not "what does it mean". I think this can be a source of errors for developers.
I guess you're wondering what use field.camelcase has when we can do the same with just field? That's true, but plain field would give the (NHibernate) properties unintuitive names when, e.g., writing queries or referencing the property from other mappings.
Let's say you have something you want to map using the field, eg
private string _name;
public string Name { get { return _name; } }
You can certainly map the field using "field", but then you would have to write "_name" when, e.g., writing HQL queries:
select a from Foo a where a._name = ...
If you instead use field.camelcase, the same query would look like:
select a from Foo a where a.Name...
EDIT
I now see that you wrote "field.camelcase" while my answer is about "field.camelcase-underscore". The principles are the same and I guess you get the point ;)
The portion after the '.' is the so-called naming strategy, which you specify when the name you write in the hbm differs from the backing field. In the case of field.camelcase you are allowed to write CustomerName in the hbm, and NHibernate will look for a field named customerName in the class. The reason for this is that NHibernate does not force you to adopt a particular naming convention to be compliant; NH will work with almost any naming convention.
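The name translation these two strategies perform can be sketched in a few lines of Python (hypothetical helper functions, not NHibernate's actual code): field.camelcase lowercases the first letter, and field.camelcase-underscore additionally prepends an underscore.

```python
def camelcase(property_name):
    """field.camelcase: CustomerName -> customerName."""
    return property_name[0].lower() + property_name[1:]

def camelcase_underscore(property_name):
    """field.camelcase-underscore: Name -> _name."""
    return "_" + camelcase(property_name)

assert camelcase("CustomerName") == "customerName"
assert camelcase_underscore("Name") == "_name"
```

So in the mapping you keep writing the public property name (Name), and the strategy derives which private field (_name) to read and write.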
There are cases where the properties are not suitable for NH to set values.
They may
have no setter at all
call validation on the data that is set, which is not used when loading from the database
do some other stuff that is only used when the value is changed by the business logic (eg. set other properties)
convert the value in some way, which would cause NH to perform unnecessary updates.
Then you don't want NH to call the property setter. Instead of mapping the field, you still map the property, but tell NH to use the field when reading/writing the value. Roger has a good explanation of why mapping the property is a good thing.

What is a "field"? "Field" vs "Field Value"

In a passport there is a field, First Name, and that field has the value John.
I assert that it is correct to describe the relationship as follows:
Field First Name:
Has a name (First Name).
Has a set of valid values (e.g. defined by the regex [A-Za-z. ]{1,30})
Has a description (the name that stands first in the person's full name)
And Passport is a set of pairs (field : field value), such that:
passport has a field "First Name"
passport has a value for field "First Name"
The point here is that it is incorrect to say:
"First Name value is John";
The correct way (conceptually/academically) is to say:
"passport has a value 'John' for field 'First Name'".
In practical terms it means (pseudo C#):
struct Passport {
    Map<Field, object> fieldValues;
}

struct Field {
    string Name;
    string Description;
    bool IsValidValue(object value);
}
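The pseudo-C# above can be turned into a runnable Python sketch (the field name, description, and regex come from the question; the class and method names are illustrative):

```python
import re

class Field:
    """A field definition: name, description, and a validity predicate."""
    def __init__(self, name, description, pattern):
        self.name = name
        self.description = description
        self._regex = re.compile(pattern)

    def is_valid_value(self, value):
        return isinstance(value, str) and self._regex.fullmatch(value) is not None

class Passport:
    """A set of (field : field value) pairs, as the question proposes."""
    def __init__(self):
        self._values = {}

    def set_value(self, field, value):
        if not field.is_valid_value(value):
            raise ValueError(f"invalid value for field {field.name!r}")
        self._values[field] = value

    def value_of(self, field):
        return self._values[field]

first_name = Field("First Name",
                   "name that stands first in the person's full name",
                   r"[A-Za-z. ]{1,30}")
passport = Passport()
passport.set_value(first_name, "John")
# "passport has a value 'John' for field 'First Name'"
```

Note how the phrasing from the question maps directly onto the code: the Field owns the validity rule, while the Passport owns the value.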
Q: Does this make sense? Any thoughts?
This is pretty subjective and entirely context sensitive, and seems like a silly thing to nitpick over.
Correct or not, if I'm discussing a "passport" with a co-worker, I'd throw something at them if they corrected me every time I said "firstName is 'john'" and told me to say "passport's firstName field is 'john'" instead. They'd just come across as annoying.
Well... not really in C#; see Scott Bellware's answer to my question about C# not being object oriented (kinda).
In C# passport is a class so it makes perfect sense to say
"The Passport has a field FirstName"
For a particular instance, "FirstName value is John".
Here the first clause describes the class and the second one the object. In a more OO language like Ruby, I think saying "passport has a value 'John' for field 'First Name'" would be equivalent; you're just describing two objects – the Passport prototype and the instance of it – in the same sentence.
I'm getting pretty confused by it myself, though. The question is oddly phrased, since there would doubtless be much more to a passport than just its fields, for example a long-standing and persisted identity.
If you are going to model such a thing, then you might take a look at the reflection API of Java or C#. It is pretty similar to what you described. A class has a set of fields; a field has a name, a type, and other attributes, but not a value. An object is an instance of a class, and you can ask an object for the value of a specified field. Different objects of the same class have values for the same fields, so you may say they share fields. So if you are trying to model class-based OOP, you are probably right.
However, this is not the only way to do OOP. There is prototype-based OOP, which looks different: there are no classes, only objects, so objects contain fields with values, and there is not much difference whether you say that the object contains the field or that the field has a value.
So the answer to "Does this make sense?" is, I think, "yes", because a similar thing exists in reflection and is widely used. Whether it is right or wrong depends on your needs.
UPD: regarding "value = Passport[Field]" vs "value = Passport.Field.Value"
I'd introduce one more passport to make it clear:
firstNameField = PassportClass.Fields["FirstName"]
myName = myPassport[firstNameField]
yourName = yourPassport[firstNameField]
This assumes that both passports have the same fields, which makes sense. Having different passports with different fields may also make sense, just a different one.
No. At least in OOP, it's the field's responsibility to retain the value. Although the object is responsible for ensuring that the value is consistent with the other fields or the object's constraints, the actual "containing" of the value is the field's job.
Using your example:
Field First Name:
Has a name (First Name).
Has a type (int, string, object)
Has a description (optional)
Has a value
And Passport is a set of fields:
May define constraints on such a field as defined by the model, ensuring the value and the object's state as a whole are valid