SON Manipulator in mongo queries - pymongo

I've written a SON manipulator to encode some custom objects into Mongo, following http://api.mongodb.org/python/current/examples/custom_type.html. This works brilliantly for encoding and decoding the objects on their way into and out of the collection but, as far as I can tell, there's no way to use it in a query. If I try to query with an object that can be encoded by the manipulator, I get the same "cannot encode object" error you get when inserting the object directly without the manipulator. However, I can query perfectly well if I manually encode the object into the query. Is there a neat way to use the SON manipulator in queries without resorting to manually encoding the object?
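
For what it's worth, here is a minimal sketch of the manual-encoding workaround described above, assuming a custom type Custom and a manipulator CustomSONManipulator written as in the linked tutorial (both names are hypothetical). Since find() does not run SON manipulators, the query document is transformed by hand using the same hook pymongo calls on insert:

from pymongo import MongoClient

client = MongoClient()
db = client.test_database
manipulator = CustomSONManipulator()  # hypothetical, as in the tutorial
db.add_son_manipulator(manipulator)

def find_encoded(collection, query):
    # transform_incoming() is the hook pymongo calls on insert; reusing it
    # here encodes any custom objects embedded in the query document.
    encoded = manipulator.transform_incoming(query, collection)
    return collection.find(encoded)

results = find_encoded(db.things, {"custom_field": Custom(42)})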

Accord.net Codification can't handle non-strings

I am trying to use the Accord.net library to build test methods for several of the machine learning algorithms that the library supports.
One of the issues I have run into is that when I try to codify my string data, the Codification class does not seem capable of dealing with any DataTable columns that are not strings, despite the documentation saying otherwise.
Codification codebook = new Codification(fulldata, AllAttributeNames);
I call that line, where fulldata is a DataTable. I have tried including columns of both Int32 and Double types, and the Codification class has thrown an error saying it is unable to convert them to type String:
"System.InvalidCastException: 'Unable to cast object of type 'System.Double' to type 'System.String'.'"
EDIT: It turns out this error occurs because the Codification system can only handle alternate data types if it is encoding the entire table. I suppose I can see the logic there, although I would prefer a better error message, or that the method were a little smarter.
I now have another issue that has cropped up related to this. After changing my code to this:
Codification codebook = new Codification(fulldata);
I then train my algorithm with learning.Learn(inputs, outputs) and want to use the newly trained model. The next step is to take a bunch of test data, make sure it matches the codebook's encoding, and send it through the algorithm. Unfortunately, it blows up when I try to use
int[][] testinput = codebook.Transform(testData, inputColumnNameArray);
claiming it could not find a mapping to transform. It does this in reference to an integer column that the codebook had, correctly, not mapped to new values. So now it seems the Transform method cannot handle non-string columns either, and I have not found an overload of it that can, even though the documentation indicates it should be able to handle this.
Does anyone know how to get around this issue without manually building the entire int[][] testinput array one value at a time?
Turns out I was able to answer my own question eventually.
As near as I can tell, there are two ways of using the Codification class. The constructor that takes a list of column names, as well as the Transform methods, both lack intelligence in dealing with non-string data types; perhaps these methods are going away in the future.
The constructor that just takes a DataTable by itself, as well as the Apply method, are both capable of handling data types other than strings. Once I switched to using these two methods, my errors went away.
Codification codebook = new Codification(fulldata);
int[][] testinput = codebook.Apply(testData, inputColumnNameArray);
The confusion for me lay in the example code seemingly using these two approaches at random, applying the Apply method only when processing the training data and the Transform method when encoding test data.
I am not sure why they chose to do this in the documentation examples, but it definitely took me a long time to figure out what was going on enough to stop hitting this particular issue.

Best way to pass array of objects to Redis Lua script

Question
What is a best practice to pass an array of objects into Lua script? Is there any better way than converting objects to JSON and parse them with cjson within a script?
More context
I have a streaming application which keeps its state in Redis. Every second we get 5-100 events, and all operations are done within a single transaction in order to boost performance, like the following:
RedisCommands<String, String> cmd = getOrCreateRedisClient();
cmd.multi();
for (Event event : listOfEvents) {
    cmd.sadd("users", event.getUserId());
    cmd.sadd("actions", event.getActionId());
    cmd.incrbyfloat("users:" + event.getUserId(), event.getImpact());
}
cmd.exec();
Now I have to move this logic into Lua scripts. I suppose it will also be faster to pass an array of events to the Lua script instead of making up to 100 script invocations (one per event). Am I right? And what is the best way to pass a list of events to a Lua script?
It depends...
If your logic won't change in the future, i.e. you'll only ever use the user id, action id, and impact of an event, you can just pass those three fields of each event to Lua:
redis-cli --eval your-script.lua , userid1 actionid1 impact1 userid2 actionid2 impact2 userid3 actionid3 impact3
In this case, you don't need to convert an event object to a JSON string, and the Lua script doesn't need to parse JSON, so it should be faster.
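For illustration, a minimal sketch of the script side under that calling convention (the key names reuse the ones from the question; with redis-cli --eval, everything after the comma arrives in ARGV):

for i = 1, #ARGV, 3 do
    -- ARGV arrives flattened as userid, actionid, impact, repeated per event
    local userId, actionId, impact = ARGV[i], ARGV[i + 1], ARGV[i + 2]
    redis.call('SADD', 'users', userId)
    redis.call('SADD', 'actions', actionId)
    redis.call('INCRBYFLOAT', 'users:' .. userId, impact)
end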
However, if your logic might change in the future, i.e. you might need to use other members of an event, you'd better convert each event object to a JSON string and pass an array of JSON strings to the Lua script:
redis-cli --eval your-script.lua , {json1} {json2} {json3}
That way those changes will be transparent to your application code, and you will only need to change the Lua script.
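And a minimal sketch of the JSON variant, assuming each ARGV entry is one event encoded as an object with userId, actionId, and impact fields (the field names are assumptions, not from the question); Redis embeds cjson, so no extra dependency is needed:

for i = 1, #ARGV do
    local event = cjson.decode(ARGV[i])
    redis.call('SADD', 'users', event.userId)
    redis.call('SADD', 'actions', event.actionId)
    redis.call('INCRBYFLOAT', 'users:' .. event.userId, event.impact)
end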

Performance difference - Jackson ObjectMapper.writeValue(writer, val) vs ObjectMapper.writeValueAsString(val)

Is there any significant performance difference between the following two?
String json = mapper.writeValueAsString(searchResult);
response.getWriter().write(json);
vs
mapper.writeValue(response.getWriter(), searchResult);
writeValueAsString JavaDoc says:
Method that can be used to serialize any Java value as a String.
Functionally equivalent to calling writeValue(Writer,Object) with
StringWriter and constructing String, but more efficient.
So, in case you want to write JSON to a String, it is much better to use this method than writeValue. Both of these methods use _configAndWriteValue internally.
In your case, however, it is better to write the JSON directly to response.getWriter() than to generate a String object and then write that String to response.getWriter().
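A minimal sketch of the direct approach in a servlet-style handler (the content-type and encoding setup are assumptions, not part of the original question):

response.setContentType("application/json");
response.setCharacterEncoding("UTF-8");
// Stream the JSON straight into the response; no intermediate String copy.
mapper.writeValue(response.getWriter(), searchResult);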

Lucene: build a query by tokenizing a string and passing the terms to a BooleanQuery

I need to extract single terms from a string to build a query using BooleanQuery.
I'm using the QueryParser.parse() method for this; here is my code snippet:
booleanQuery.add(
    new QueryParser(
        org.apache.lucene.util.Version.LUCENE_40,
        "tags",
        new WhitespaceAnalyzer(org.apache.lucene.util.Version.LUCENE_40)
    ).parse("tag1 tag2 tag3"),
    BooleanClause.Occur.SHOULD);
I'm wondering, however, if this is the correct way to pass single terms to the BooleanQuery.
The QueryParser.parse method returns a SrndQuery object, which I pass directly to the booleanQuery.add() method.
I'm not sure this is correct. Should I instead extract single terms from the SrndQuery, or something like that, and invoke booleanQuery.add() several times?
Update: printed query
*.*:*.* title:Flickrmeetup_01 description:Michael description:R. description:Ross tags:rochester tags:ny tags:usa tags:flickrmeetup tags:king76 tags:eos350d tags:canon50mmf14 tags:mikros tags:canon tags:ef tags:50mm tags:f14 tags:usm tags:canonef50mmf14 tags:canonef50mmf14usm
I believe you should extract the tokens, wrap each one in a Term, create a TermQuery for it, and add that TermQuery to the BooleanQuery. SrndQuery is abstract anyway, so I guess your current code creates an instance of some subclass, which is probably not what you mean to do. You may want to create your own custom QueryParser for this.
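A minimal sketch of that approach against the Lucene 4.0 API (the field name and input string reuse the ones from the question; IOException handling is omitted):

import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.util.Version;

BooleanQuery booleanQuery = new BooleanQuery();
WhitespaceAnalyzer analyzer = new WhitespaceAnalyzer(Version.LUCENE_40);
TokenStream stream = analyzer.tokenStream("tags", new StringReader("tag1 tag2 tag3"));
CharTermAttribute termAtt = stream.addAttribute(CharTermAttribute.class);
stream.reset();
while (stream.incrementToken()) {
    // One SHOULD clause per token: documents containing any tag will match.
    booleanQuery.add(new TermQuery(new Term("tags", termAtt.toString())),
                     BooleanClause.Occur.SHOULD);
}
stream.end();
stream.close();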

return a computed field in a serialized django object?

I'm writing an API using Django, and I'm running into some issues around returning data that isn't stored in the database directly, or that is in some cases organized differently from the database schema.
In particular, given a particular data request, I want to add a field of computed data to my model before I serialize and return it. However, if I just add the field to the model instance, the built-in serializer (I'm using json) ignores it, presumably because it gets the list of fields from the model definition.
I could write my own serializer, but what a pain. Or I guess I could run model_to_dict and then serialize the dict instead of the model. Does anyone have any better ideas?
Here's what the code vaguely looks like right now:
squidlets = Squidlet.objects.filter(stuff)
for i in range(len(squidlets)):
    squidlets[i].newfield = do_some_computation(squidlets[i])
return HttpResponse(json_serializer.serialize(squidlets, ensure_ascii=False),
                    'text/json')
But newfield ain't in the returned JSON.
I think you should serialize using simplejson, and it doesn't have to be a queryset. To escape it as JSON, you can also use mark_safe:
from django.utils.safestring import mark_safe
from django.utils import simplejson
simplejson.dumps(mark_safe(your_data_structure))
I went with the dict solution, which turned out to be fairly clean.
Here's what the code looks like:
from django.forms.models import model_to_dict

squiddicts = []
squidlets = Squidlet.objects.filter(stuff)
for i in range(len(squidlets)):
    squiddict = model_to_dict(squidlets[i])
    squiddict["newfield"] = do_some_computation(squidlets[i])
    squiddicts.append(squiddict)
return HttpResponse(simplejson.dumps(squiddicts, ensure_ascii=False),
                    'text/json')
This is maybe slightly more verbose than necessary but I think it's clearer this way.
This does still feel somewhat unsatisfying, but seems to work just fine.