apache storm: how to transform (make a new) tuple

apache storm: how to transform (make a new) tuple - kotlin

This is probably not the best practice in storm but we are working with a library that requires us to transform the values of a tuple in our bolt to filter out only objects of a certain class. I know how to do the filter, however, I'm not sure how I can put together a new tuple with the transformed (i.e., filtered) tuple:
override fun execute(input: Tuple?) {
val filteredValues = input.values.filterIsInstance(MyClass::class.java)
// ? how do I make a new tuple with filteredValues
val newTuple = ...
if (doExecute(newTuple)) {
this.collector.ack(input)
} else {
this.collector.fail(input)
}
}
where doExecute is out of our control and it only takes a tuple.
EDIT
I guess it's possible if I created a copy of the input tuple and just set the values of the copy to the filteredValues. However, I wonder if there's a more idiomatic way.

I would just split your bolt in two. Have a bolt that does the filtering, and then have a second bolt that receives from the filtering bolt and calls doExecute.

Related

How to get last Key or Value in Kotlin Map collection

How can I get the last key or value in a Kotlin Map collection? It seems like it cannot be done by using an index value.

There's a couple ways it can be done. While you can't elegantly print a map directly, you may print it's entry set.
The first way, and the way that I DO NOT recommend, is by calling the .last() function on the entry set. This can be accomplished with testMap.entries.last(). The reason I don't recommend this method is because in real data this method is non-deterministic -- meaning there's no way to guarantee the characteristics of the value returned.
While I don't recommend this method, I don't know your application and this may be sufficient.
I DO recommend using the .sortedBy() function on your entry set, and then calling .last() on it. This allows you to make some sort of assumption about the results returned, something that is typically necessary, otherwise why do you want the last?
See this example comparing the two methods and then comparing the method against the order you would get if you iterate with the .forEach function:
fun main(args: Array<String>) {
val testMap = mutableMapOf<Long, String>()
testMap[1] = "Hello"
testMap[5] = "World"
testMap[3] = "Foobar"
println(testMap.entries.last())
println(testMap.entries.sortedBy { it.key }.last())
println("\norder via loop:")
testMap
.entries
.forEach {
println("\t$it")
}
}
Take a look at the output:
3=Foobar
5=World
order via loop:
1=Hello
5=World
3=Foobar
Here we see that the value returned from .last(), is the last value that was inserted into the map - the same happens with .forEach. This is okay, but usually we want our map to have some sort of order. In this example, i've called for the entry set to be sorted by the key value, so that our call to .last() on the entry set returns the key/value pair with the largest key.

Kotlin. What is the best way to replace element in immutable list?

What is the best way to update specific item in immutable list. For example I have list of Item. And I have several ways to update list:
1.
fun List<Item>.getList(newItem: Item): List<Item> {
val items = this.toMutableList()
val index = items.indexOf(newItem)
if (index != -1) {
items[index ] = newItem
}
return items
}
fun List<Item>.getList(newItem: Card): List<Item> {
return this.map { item ->
if (item.id == newItem.id) newItem else item
}
}
The second option looks more concise and I like it more. However, in the second option, we will go through each element in the list, which is bad for me, because the list can contain many elements.
Please, is there a better way to fulfill my requirement?

You have a few options - you're already doing the "make a mutable copy and update it" approach, and the "make a copy by mapping each item and changing what you need" one.
Another typical approach is to kinda go half-and-half, copying the parts you need, and inserting the bits you want to change. You could do this by, for example, slicing the list around the element you want to change, and building your final list from those parts:
fun List<Item>.update(item: Item): List<Item> {
val itemIndex = indexOf(item)
return if (itemIndex == -1) this.toList()
else slice(0 until itemIndex) + item + slice(itemIndex+1 until size)
}
This way you get to take advantage of any efficiency from the underlying list copy methods, versus map which has to "transform" each item even if it ends up passing through the original.
But as always, it's best to benchmark to see how well these approaches actually perform! Here's a playground example - definitely not the best place to do benchmarking, but it can be instructive as a general ballpark if you run things a few times:
Mapping all elements: 2500 ms
Slicing: 1491 ms
Copy and update index: 611 ms
Broadly speaking, mapping takes 60-100% more time than the slice-and-combine approach. And slicing takes 2-3x longer than just a straight mutable copy and update.
Considering what you actually need to do here (get a copy of the list and change (up to) one thing) the last approach seems like the best fit! The others have their benefits depending on how you want to manipulate the list to produce the end result, but since you're barely doing anything here, they just add unnecessary overhead. And of course it depends on your use-case - the slicing approach for example uses more intermediate lists than the mapping approach, and that might be a concern in addition to raw speed.
If the verbosity in your first example bothers you, you could always write it like:
fun List<Item>.getList(newItem: Item): List<Item> =
this.toMutableList().apply {
val index = indexOf(newItem)
if (index != -1) set(index, newItem)
}

The second one looks ever so slightly better for performance, but they are both O(n), so it's not a big difference, and hardly worth worrying about. I would go for the second one because it's easier to read.
The first one iterates the list up to 2 times, but the second iteration breaks early once it finds the item. (The first iteration is to copy the list, but it is possibly optimized by the JVM to do a fast array copy under the hood.)
The second one iterates the list a single time, but it does have to do the ID comparison for each item in the list.
Side note: "immutable" is not really the right term for a List. They are called "read-only" Lists because the interface does not guarantee immutability. For example:
private val mutableList = mutableListOf<Int>()
val readOnlyList: List<Int> get() = mutableList
To an outside class, this List is read-only, but not immutable. Its contents might be getting changed internally in the class that owns the list. That would be kind of a fragile design, but it's possible. There are situations where you might want to use a MutableList for performance reasons and pass it to other functions that only expect a read-only List. As long as you don't mutate it while it is in use by that other class, it would be OK.

Another thing you could try is, as apparently each item has an id field that you are using to identify the item, to create a map from it, perform all your replacements on that map, and convert it back into a list. This is only useful if you can batch all the replacements you need to do, though. It will probably also change the order of the items in the list.
fun List<Item>.getList(newItem: Item) =
associateBy(Item::id)
.also { map ->
map[newItem.id] = newItem
}
.values
And then there’s also the possibility to convert your list into a Sequence: this way it will be lazily evaluated; every replacement you add with .map will create a new Sequence that refers to the old one plus your new mapping, and none of it will be evaluated until you run an operation that actually has to read the whole thing, like toList().

Another solution: if the list is truly immutable and not only read-only; or if its contents could change and you would like to see these changes in the resulting list, then you can also wrap the original list into another one. This is fairly easy to do in Kotlin:
fun main() {
val list = listOf(
Item(1, "1-orig"),
Item(2, "2-orig"),
Item(3, "3-orig"),
)
val list2 = list.getList(Item(2, "2-new"))
println(list2)
}
fun List<Item>.getList(newItem: Item): List<Item> {
val found = indexOfFirst { it.id == newItem.id }
if (found == -1) return this
return object : AbstractList<Item>() {
override val size = this#getList.size
override fun get(index: Int) = if (index == found) newItem else this#getList[index]
}
}
data class Item(val id: Int, val name: String)
This is very good for the performance if you don't plan to repeatedly modify resulting lists with further changes. It is O(1) to replace an item and it almost doesn't use any additional memory. However, if you plan to invoke getList() repeatedly on a resulting list, each time creating a new one, that would create a chain of lists, slowing down access to the data and preventing garbage collector to clean up replaced items (if you don't use the original list anymore). You can partially optimize this by detecting you invoke getItem() on your specific implementation, but even better, you can use already existing libraries that does this.
This pattern is called a persistent data structure and it is provided by the library kotlinx.collections.immutable. You can use it like this:
fun main() {
val list = persistentListOf(
Item(1, "1-orig"),
Item(2, "2-orig"),
Item(3, "3-orig"),
)
val list2 = list.set(1, Item(2, "2-new"))
println(list2)
}
By the way, it seems strange to keep a list of items where we identify them by their ids. Did you consider using a map instead?

How to create JSON object strategy according to a schema with rust proptest?

I'd like to create a JSON strategy using rust proptest library. However, I do not want to create an arbitrary JSON. I'd like to create it according to a schema (more specifically, OpenAPI schema). This means that keys of the JSON are known and I do not want to create them using any strategy, but I'd like to create the values using the strategy (pretty-much recursively).
I already implemented the strategy for primitive types, but I do not how to create a JSON object strategy.
I would like the strategy to have the type BoxedStratedy<serde_json::Value> or be able to map the strategy to this type because the JSON objects can contain other objects, and thus I need to be able to compose the strategies.
I found a HashMapStrategy strategy, however, it can be only created by a hash_map function that takes two strategies - one for generating keys and one for values. I thought that I could use Just strategy for the keys, but it did not lead anywhere. Maybe prop_filter_map could be used.
Here is the code. There are tests too. One is passing because it tests only primitive type and the other is failing since I did not find a way to implement generate_json_object function.
I tried this but the types do not match. Instead of a strategy of map from string to JSON value, it is a strategy of a map from string to BoxedStrategy.
fn generate_json_object(object: &ObjectType) -> BoxedStrategy<serde_json::Value> {
let mut json_object = serde_json::Map::with_capacity(object.properties.len());
for (name, schema) in &object.properties {
let schema_kind = &schema.to_item_ref().schema_kind;
json_object.insert(name.clone(), schema_kind_to_json(schema_kind));
}
Just(serde_json::Value::Object(json_object)).boxed()
}

One can create a vector of strategies, which implements a Strategy trait and can be boxed. So to create a serde_json::Value::Object, we create a vector of tuples. The first element will be a Just of key and the second element will be a boxed strategy of value. The boxed strategy of value can be created by schema_kind_to_json function. After we have a vector of tuples which implement a Strategy, we can use .prop_map to transform it to a serde_json::Value::Object.
fn generate_json_object(object: &ObjectType) -> BoxedStrategy<serde_json::Value> {
let mut vec = Vec::with_capacity(object.properties.len());
for (name, schema) in &object.properties {
let schema_kind = &schema.to_item_ref().schema_kind;
vec.push((Just(name.clone()), schema_kind_to_json(schema_kind)));
}
vec.prop_map(|vec| serde_json::Value::Object(serde_json::Map::from_iter(vec)))
.boxed()
}

How to create a new list of Strings from a list of Longs in Kotlin? (inline if possible)

I have a list of Longs in Kotlin and I want to make them strings for UI purposes with maybe some prefix or altered in some way. For example, adding "$" in the front or the word "dollars" at the end.
I know I can simply iterate over them all like:
val myNewStrings = ArrayList<String>()
longValues.forEach { myNewStrings.add("$it dollars") }
I guess I'm just getting nitpicky, but I feel like there is a way to inline this or change the original long list without creating a new string list?
EDIT/UPDATE: Sorry for the initial confusion of my terms. I meant writing the code in one line and not inlining a function. I knew it was possible, but couldn't remember kotlin's map function feature at the time of writing. Thank you all for the useful information though. I learned a lot, thanks.

You are looking for a map, a map takes a lambda, and creates a list based on the result of the lambda
val myNewStrings = longValues.map { "$it dollars" }
map is an extension that has 2 generic types, the first is for knowing what type is iterating and the second what type is going to return. The lambda we pass as argument is actually transform: (T) -> R so you can see it has to be a function that receives a T which is the source type and then returns an R which is the lambda result. Lambdas doesn't need to specify return because the last line is the return by default.

You can use the map-function on List. It creates a new list where every element has been applied a function.
Like this:
val myNewStrings = longValues.map { "$it dollars" }

In Kotlin inline is a keyword that refers to the compiler substituting a function call with the contents of the function directly. I don't think that's what you're asking about here. Maybe you meant you want to write the code on one line.
You might want to read over the Collections documentation, specifically the Mapping section.
The mapping transformation creates a collection from the results of a
function on the elements of another collection. The basic mapping
function is
map().
It applies the given lambda function to each subsequent element and
returns the list of the lambda results. The order of results is the
same as the original order of elements.
val numbers = setOf(1, 2, 3)
println(numbers.map { it * 3 })
For your example, this would look as the others said:
val myNewStrings = longValues.map { "$it dollars" }
I feel like there is a way to inline this or change the original long list without creating a new string list?
No. You have Longs, and you want Strings. The only way is to create new Strings. You could avoid creating a new List by changing the type of the original list from List<Long> to List<Any> and editing it in place, but that would be overkill and make the code overly complex, harder to follow, and more error-prone.

Like people have said, unless there's a performance issue here (like a billion strings where you're only using a handful) just creating the list you want is probably the way to go. You have a few options though!
Sequences are lazily evaluated, when there's a long chain of operations they complete the chain on each item in turn, instead of creating an intermediate full list for every operation in the chain. So that can mean less memory use, and more efficiency if you only need certain items, or you want to stop early. They have overhead though so you need to be sure it's worth it, and for your use-case (turning a list into another list) there are no intermediate lists to avoid, and I'm guessing you're using the whole thing. Probably better to just make the String list, once, and then use it?
Your other option is to make a function that takes a Long and makes a String - whatever function you're passing to map, basically, except use it when you need it. If you have a very large number of Longs and you really don't want every possible String version in memory, just generate them whenever you display them. You could make it an extension function or property if you like, so you can just go
fun Long.display() = "$this dollars"
val Long.dollaridoos: String get() = "$this.dollars"
print(number.display())
print(number.dollaridoos)
or make a wrapper object holding your list and giving access to a stringified version of the values. Whatever's your jam
Also the map approach is more efficient than creating an ArrayList and adding to it, because it can allocate a list with the correct capacity from the get-go - arbitrarily adding to an unsized list will keep growing it when it gets too big, then it has to copy to another (larger) array... until that one fills up, then it happens again...

How to choose between asIterable() vs asSequence()? [duplicate]

Both of these interfaces define only one method
public operator fun iterator(): Iterator<T>
Documentation says Sequence is meant to be lazy. But isn't Iterable lazy too (unless backed by a Collection)?

The key difference lies in the semantics and the implementation of the stdlib extension functions for Iterable<T> and Sequence<T>.
For Sequence<T>, the extension functions perform lazily where possible, similarly to Java Streams intermediate operations. For example, Sequence<T>.map { ... } returns another Sequence<R> and does not actually process the items until a terminal operation like toList or fold is called.
Consider this code:
val seq = sequenceOf(1, 2)
val seqMapped: Sequence<Int> = seq.map { print("$it "); it * it } // intermediate
print("before sum ")
val sum = seqMapped.sum() // terminal
It prints:
before sum 1 2
Sequence<T> is intended for lazy usage and efficient pipelining when you want to reduce the work done in terminal operations as much as possible, same to Java Streams. However, laziness introduces some overhead, which is undesirable for common simple transformations of smaller collections and makes them less performant.
In general, there is no good way to determine when it is needed, so in Kotlin stdlib laziness is made explicit and extracted to the Sequence<T> interface to avoid using it on all the Iterables by default.
For Iterable<T>, on contrary, the extension functions with intermediate operation semantics work eagerly, process the items right away and return another Iterable. For example, Iterable<T>.map { ... } returns a List<R> with the mapping results in it.
The equivalent code for Iterable:
val lst = listOf(1, 2)
val lstMapped: List<Int> = lst.map { print("$it "); it * it }
print("before sum ")
val sum = lstMapped.sum()
This prints out:
1 2 before sum
As said above, Iterable<T> is non-lazy by default, and this solution shows itself well: in most cases it has good locality of reference thus taking advantage of CPU cache, prediction, prefetching etc. so that even multiple copying of a collection still works good enough and performs better in simple cases with small collections.
If you need more control over the evaluation pipeline, there is an explicit conversion to a lazy sequence with Iterable<T>.asSequence() function.

Completing hotkey's answer:
It is important to notice how Sequence and Iterable iterates throughout your elements:
Sequence example:
list.asSequence().filter { field ->
Log.d("Filter", "filter")
field.value > 0
}.map {
Log.d("Map", "Map")
}.forEach {
Log.d("Each", "Each")
}
Log result:
filter - Map - Each; filter - Map - Each
Iterable example:
list.filter { field ->
Log.d("Filter", "filter")
field.value > 0
}.map {
Log.d("Map", "Map")
}.forEach {
Log.d("Each", "Each")
}
filter - filter - Map - Map - Each - Each

Iterable is mapped to the java.lang.Iterable interface on the
JVM, and is implemented by commonly used collections, like List or
Set. The collection extension functions on these are evaluated
eagerly, which means they all immediately process all elements in
their input and return a new collection containing the result.
Here’s a simple example of using the collection functions to get the
names of the first five people in a list whose age is at least 21:
val people: List<Person> = getPeople()
val allowedEntrance = people
.filter { it.age >= 21 }
.map { it.name }
.take(5)
Target platform: JVMRunning on kotlin v. 1.3.61 First, the age check
is done for every single Person in the list, with the result put in a
brand new list. Then, the mapping to their names is done for every
Person who remained after the filter operator, ending up in yet
another new list (this is now a List<String>). Finally, there’s one
last new list created to contain the first five elements of the
previous list.
In contrast, Sequence is a new concept in Kotlin to represent a lazily
evaluated collection of values. The same collection extensions are
available for the Sequence interface, but these immediately return
Sequence instances that represent a processed state of the date, but
without actually processing any elements. To start processing, the
Sequence has to be terminated with a terminal operator, these are
basically a request to the Sequence to materialize the data it
represents in some concrete form. Examples include toList, toSet,
and sum, to mention just a few. When these are called, only the
minimum required number of elements will be processed to produce the
demanded result.
Transforming an existing collection to a Sequence is pretty
straightfoward, you just need to use the asSequence extension. As
mentioned above, you also need to add a terminal operator, otherwise
the Sequence will never do any processing (again, lazy!).
val people: List<Person> = getPeople()
val allowedEntrance = people.asSequence()
.filter { it.age >= 21 }
.map { it.name }
.take(5)
.toList()
Target platform: JVMRunning on kotlin v. 1.3.61 In this case, the
Person instances in the Sequence are each checked for their age, if
they pass, they have their name extracted, and then added to the
result list. This is repeated for each person in the original list
until there are five people found. At this point, the toList function
returns a list, and the rest of the people in the Sequence are not
processed.
There’s also something extra a Sequence is capable of: it can contain
an infinite number of items. With this in perspective, it makes sense
that operators work the way they do - an operator on an infinite
sequence could never return if it did its work eagerly.
As an example, here’s a sequence that will generate as many powers of
2 as required by its terminal operator (ignoring the fact that this
would quickly overflow):
generateSequence(1) { n -> n * 2 }
.take(20)
.forEach(::println)
You can find more here.

Iterable is good enough for most use cases, the way iteration is performed on them it works very well with caches because of the spatial locality. But the issue with them is that whole collection must pass through first intermediate operation before it moves to second and so on.
In sequence each item passes through the full pipeline before the next is handled.
This property can be determental to the performance of your code especially when iterating over large data set. so, if your terminal operation is very likely to terminate early then sequence should be preferred choice because you save by not performing unnecessary operations. for example
sequence.filter { getFilterPredicate() }
.map { getTransformation() }
.first { getSelector() }
In above case if first item satisfies the filter predicate and after map transformation meets the selection criteria then filter, map and first are invoked only once.
In case of iterable whole collection must first be filtered then mapped and then first selection starts

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

apache storm: how to transform (make a new) tuple - kotlin

I would just split your bolt in two. Have a bolt that does the filtering, and then have a second bolt that receives from the filtering bolt and calls doExecute.

Related

How to get last Key or Value in Kotlin Map collection

Kotlin. What is the best way to replace element in immutable list?

How to create JSON object strategy according to a schema with rust proptest?

How to create a new list of Strings from a list of Longs in Kotlin? (inline if possible)

How to choose between asIterable() vs asSequence()? [duplicate]

Categories

Resources