Kotlin - MutableList that does not allow duplicate objects

Is there an idiomatic expression or extension for a MutableList or List that does not allow duplicate objects, or, if possible, updates the existing object in the list when a duplicate is added, just like a HashMap does? I am not sure that using .map {} is the most efficient way of doing it.
Note: I still need correct index-based access.

The Set interface allows only one of each unique item, where "unique" is defined by the class's equals function.
The Set interface does not guarantee a fixed element order as part of its contract, but all the Kotlin standard library functions that create sets return implementations that do maintain the order in which elements were added.
You can iterate a Set with index using forEachIndexed or for((index, item) in set.withIndex()).
There's no indexed access operator for sets, but you could create an extension that enables it. Note that since ordered sets are backed by a linked structure rather than an array, accessing an element by index is inefficient: the set has to be iterated up to that index to retrieve the value.
operator fun <T> Set<T>.get(index: Int) = asSequence().drop(index).first()
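For example, a quick usage sketch of that extension (the set contents here are made up for illustration):
val colors = setOf("red", "green", "blue")   // setOf keeps insertion order
println(colors[1])                           // "green", via the get extension above
for ((index, item) in colors.withIndex()) {
    println("$index: $item")
}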
If you have a MutableSet, you cannot insert elements at a specific index. You would have to clear it and reinsert everything in the order you want.
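If you also need the HashMap-like behaviour from the question (update the existing element when a duplicate is added) while keeping stable positions, one option is to back the collection with a LinkedHashMap, which keeps the position of the first insertion when an existing key is overwritten. This is only a minimal sketch, not part of the original answer; the User type and the keyOf selector are hypothetical:
data class User(val id: Int, val name: String)

class UniqueList<K, V>(private val keyOf: (V) -> K) {
    private val backing = LinkedHashMap<K, V>()

    // Add the item, or overwrite the existing entry with the same key;
    // the entry keeps the position of its original insertion.
    fun put(item: V) { backing[keyOf(item)] = item }

    // Indexed access is O(n), as with any linked, order-preserving structure.
    operator fun get(index: Int): V = backing.values.asSequence().drop(index).first()

    fun toList(): List<V> = backing.values.toList()
    val size: Int get() = backing.size
}

fun main() {
    val users = UniqueList<Int, User> { it.id }
    users.put(User(1, "Alice"))
    users.put(User(2, "Bob"))
    users.put(User(1, "Alicia"))   // replaces the duplicate in place; its index stays 0
    println(users.toList())        // [User(id=1, name=Alicia), User(id=2, name=Bob)]
    println(users[1])              // User(id=2, name=Bob)
}
LinkedHashMap only reorders entries when it is constructed with accessOrder = true, so overwriting a key with the default constructor leaves the element's index unchanged.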

Related

Elasticsearch and Spark, how to write one field to docs

I am using Elasticsearch and Spark. I create an index completely in an RDD and write it using rdd.saveToEs("index", mappings).
Later I want to update the index by overwriting a single field in each doc. But the RDD writing code seems to need the entire doc; writing a map with one property will erase the rest of the doc. For example, the docs may have a title, body, and popularity, with popularity being a Double.
When I recreate the popularity number I now have:
RDD[(String, Map[String, Double])]
where a single element will have the following value:
("doc1", Map(("popularity" -> 1.5d)))
If I write this RDD of only popularity fields, I believe it will erase the other fields. What I need is some way to do an upsert-type operation that overwrites or adds the "popularity" field but leaves the rest of the doc unchanged.
I've read about the include and exclude mappings but am not sure whether they apply to this case. When I'm writing the popularity I don't know the rest of the doc structure, only the single field.

How to prevent a field from being analyzed in Lucene

I want some fields, like URLs, to be indexed and stored but not analyzed. The Field class had a constructor to do this:
Field(String name, String value, Field.Store store, Field.Index index)
But this constructor has been deprecated since Lucene 4, and it is suggested to use StringField or TextField objects instead. They don't have any constructors that let you specify whether the field should be analyzed. So can it be done?
The correct way to index and store an un-analyzed field, as a single token, is to use StringField. It is designed to handle atomic strings, like id numbers, URLs, etc. You can specify whether it is stored, similarly to Lucene 3.x.
Such as:
new StringField("myUrl, "http://stackoverflow.com/questions/19042587/how-to-prevent-a-field-from-not-analyzing-in-lucene", Field.Store.YES)
Hello, you are totally right with what you are saying. With the new fields provided by Lucene you cannot achieve what you want.
You can either continue using the Field constructor as you described or implement your own field by implementing the IndexableField interface. There you can decide for yourself what behavior you want your Field to have.

Updating SQL from object with groovy

When you read a result set in Groovy it comes back as a collection of maps.
It seems like you should be able to update values inside those maps and write them back out, but I can't find anything built into Groovy to allow me to do so.
I'm considering writing a routine that lets me write a modified map by iterating over the fields of one of the result objects, taking each key/value pair and using them to create the appropriate update statement, but it could be annoying, so I was wondering if anyone else had done this or if it's available already in Groovy.
It seems like just a few lines of code, so I'd rather not bring in Hibernate for this. I'm just thinking of a little "update" method that would allow:
def rows=sql.rows(query)
rows[0].name="newName"
update(sql, rows[0])
to update the first guy's name in the database. Anyone seen/created such a monster, or is something like this already built into Groovy Sql and I'm just missing it?
(I suppose you may have to point out to the update method which field is the key field, but that's doable...)
Using the rows method will actually read out all of the values into a List of GroovyRowResult so it's not really possible to update the data without creating an update method like the one you mention.
It's not really possible to do that in the generic case because your query can contain joins or a column reference that is an aggregate, etc.
If you're selecting from a single table, however, you can use the Sql.eachRow method and set the ResultSet to be updatable; then you can use the underlying ResultSet interface to update rows as you iterate through:
import java.sql.ResultSet

sql.resultSetConcurrency = ResultSet.CONCUR_UPDATABLE
sql.resultSetType = ResultSet.TYPE_FORWARD_ONLY
sql.eachRow(query) { row ->
    row.updateString('name', 'newName')
    row.updateRow()
}
Depending on the database/driver you use, you may not be able to create an updatable ResultSet.

Django: join, keep one of two columns, then order by that

I have two Django models, roughly summarized by:
class Thing(Model):
    date_created = DateTimeField()

class ThingDateOverride(Model):
    thing = ForeignKey(Thing)
    category = ForeignKey(Category)
    override = DateTimeField()
What I want to do is produce a list of Things for a given Category, sorted by the appropriate ThingDateOverride override field, or by the Thing's date_created if no such override exists.
In other words:
For each Thing in the QuerySet, keep either Thing.date_created or the override if an appropriate ThingDateOverride exists for that Thing/Category pair.
Order the Thing set by the resulting timestamp.
I can pull this off in SQL, but I'd rather avoid writing possibly engine-specific code. I'm currently implementing the logic around the ORM (in pure python), but I would like the database to handle this.
How can I do it?
Is it intentional that category is stored on ThingDateOverride? If so, then a Thing object has no category without an override object.
I assumed that a ThingDateOverride object exists for every Thing object (so a category is assigned to every Thing), and that the override field can be NULL, in which case date_created will be used to sort things. This code should then sort by override if it exists, or by date_created if it doesn't:
Thing.objects.filter(thingdateoverride__category=category).extra(select={'d': 'if(override, override, date_created)'}).order_by('d')
The idea is to use extra to select the override field if it exists, or date_created if it doesn't, as an additional column, and then sort by that column.
Note: this code works only because the override and date_created fields have different names, so they can be distinguished. Otherwise MySQL would return an error, something like "field name is ambiguous", and you would need to add table names.
If you want Thing objects, you should start from that model and use the reverse relationship:
Thing.objects.filter(thingdateoverride__category=category).order_by("thingdateoverride__override", "date_created")

Django: how to filter for rows whose fields are contained in passed value?

MyModel.objects.filter(field__icontains=value) returns all the rows whose field contains value. How do I do the opposite? Namely, construct a queryset that returns all the rows whose field is contained in value?
Preferably without using custom SQL (i.e. only using the ORM), or at least without using backend-dependent SQL.
field__icontains and similar lookups are coded right into the ORM. The other direction simply doesn't exist.
You could use the where param of extra, described in the QuerySet reference.
In this case, you would use something like:
MyModel.objects.extra(where=["%s LIKE CONCAT('%%',field,'%%')"], params=[value])
Of course, do keep in mind that there is no standard method of string concatenation across DBMSs. So as far as I know, there is no way to satisfy your requirement of avoiding backend-dependent SQL.
If you're okay with working with a list of dictionaries rather than a queryset, you could always do this instead:
qs = MyModel.objects.all().values()
matches = [r for r in qs if value in r[field]]
although this is of course not ideal for huge data sets.