IgniteSqlRDD has only one partition

I briefly looked through the code of IgniteRDD and came across this class:
class IgniteSqlRDD[R: ClassTag, T, K, V](
ic: IgniteContext,
cacheName: String,
cacheCfg: CacheConfiguration[K, V],
qry: Query[T],
conv: (T) ⇒ R,
keepBinary: Boolean
) extends IgniteAbstractRDD[R, K, V](ic, cacheName, cacheCfg, keepBinary) {
override def compute(split: Partition, context: TaskContext): Iterator[R] = {
new IgniteQueryIterator[T, R](ensureCache().query(qry).iterator(), conv)
}
override protected def getPartitions: Array[Partition] = {
Array(new IgnitePartition(0))
}
}
I noticed that the number of partitions is hard-coded to one. With a parallelism of one, this will significantly reduce performance. Why was it designed this way? Thanks!

IgniteSqlRDD is an internal implementation used only for result sets that are fully fetched to the driver, so this RDD is not distributed. Thus there is only one partition.
IgniteRDD, on the other hand, represents an Ignite cache, which is distributed.

Related

Kotlin: Is there a tool that allows me to control parallelism when executing suspend functions?

I'm trying to execute a certain suspend function multiple times, in such a way that no more than N invocations are ever running at the same time.
For those acquainted with Akka and Scala streaming libraries, something like mapAsync.
I wrote my own implementation using one input channel (as in Kotlin channels) and N output channels, but it seems cumbersome and not very efficient.
The code I'm currently using is somewhat like this:
val inChannel = Channel<T>()
val outChannels = (0 until n).map{
Channel<T>()
}
launch{
var i = 0
for(t in inChannel){
outChannels[i].offer(t)
i = ((i+1)%n)
}
}
outChannels.forEach{outChannel ->
launch{
for(t in outChannel){
fn(t)
}
}
}
Of course it has error management and everything, but still...
Edit: I did the following test, and it failed.
test("Parallelism is correctly capped") {
val scope = CoroutineScope(Dispatchers.Default.limitedParallelism(3))
var num = 0
(1..100).map {
scope.launch {
num ++
println("started $it")
delay(Long.MAX_VALUE)
}
}
delay(500)
assertEquals(3,num)
}

You can use the limitedParallelism function on a dispatcher (experimental in kotlinx.coroutines v1.6.0) and use the returned dispatcher to call your asynchronous functions. The function returns a view over the original dispatcher that limits the parallelism to the limit you provide. You can use it like this:
val limit = 2 // Or some other number
val dispatcher = Dispatchers.Default
val limitedDispatcher = dispatcher.limitedParallelism(limit)
for (n in 0..100) {
scope.launch(limitedDispatcher) {
executeTask(n)
}
}

Your question, as asked, calls for @marstran's answer. If what you want is that no more than N coroutines are being actively executed at any given time (in parallel), then limitedParallelism is the way to go:
val maxThreads: Int = TODO("some max number of threads")
val limitedDispatcher = Dispatchers.Default.limitedParallelism(maxThreads)
elements.forEach { elt ->
scope.launch(limitedDispatcher) {
doSomething(elt)
}
}
Now, if what you want is to limit concurrency as well, so that at most N coroutines are in progress at once (potentially interleaving), regardless of threads, you could use a Semaphore instead:
val maxConcurrency: Int = TODO("some max number of concurrent coroutines")
val semaphore = Semaphore(maxConcurrency)
elements.forEach { elt ->
scope.async {
semaphore.withPermit {
doSomething(elt)
}
}
}
You can also combine both approaches.
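
The difference also explains why the test in the question counts 100 instead of 3: delay() suspends the coroutine and frees its thread, so a limited dispatcher lets the next coroutine start. A minimal sketch of a mapAsync-like helper built on a Semaphore follows. It assumes kotlinx.coroutines is on the classpath, and the name mapConcurrently is hypothetical, not part of any library.

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.sync.Semaphore
import kotlinx.coroutines.sync.withPermit
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical helper: applies `transform` to every element with at most
// `maxConcurrency` invocations in progress at once; results keep input order.
suspend fun <T, R> Iterable<T>.mapConcurrently(
    maxConcurrency: Int,
    transform: suspend (T) -> R
): List<R> = coroutineScope {
    val semaphore = Semaphore(maxConcurrency)
    map { item ->
        // Every coroutine is created right away, but it waits for a permit
        // before it calls transform.
        async { semaphore.withPermit { transform(item) } }
    }.awaitAll()
}

fun main() = runBlocking {
    val inFlight = AtomicInteger(0)
    val peak = AtomicInteger(0)
    val results = (1..20).mapConcurrently(maxConcurrency = 3) { n ->
        val now = inFlight.incrementAndGet()
        peak.updateAndGet { maxOf(it, now) }
        delay(50) // suspends, but the permit stays held
        inFlight.decrementAndGet()
        n * 2
    }
    check(peak.get() <= 3) // never more than 3 in progress
    println(results.size)
}
```

Because the permit is held across suspension points, the cap applies to coroutines in progress, not to threads in use.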

Other answers already explained that it depends whether you need to limit parallelism or concurrency. If you need to limit concurrency, then you can do this similarly to your original solution, but with only a single channel:
val channel = Channel<T>()
repeat(n) {
launch {
for(t in channel){
fn(t)
}
}
}
Also note that offer() in your example does not guarantee that the task will ever be executed. If the next consumer in the round robin is still busy with the previous task, the new task is silently dropped.
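
To make the dropped-task behaviour concrete, here is a small sketch assuming kotlinx.coroutines 1.5 or later, where trySend is the replacement for the deprecated offer(). The helper name processAll is hypothetical. It shows a non-suspending send failing on a rendezvous channel with no waiting receiver, and a single shared channel where send() suspends until a worker is free.

```kotlin
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Hypothetical helper: `workers` coroutines pull from one shared channel,
// so at most `workers` calls to fn are in progress at once.
suspend fun <T> processAll(items: Iterable<T>, workers: Int, fn: suspend (T) -> Unit) {
    coroutineScope {
        val channel = Channel<T>() // rendezvous channel, no buffer
        repeat(workers) {
            launch { for (t in channel) fn(t) }
        }
        for (item in items) channel.send(item) // suspends until some worker is free
        channel.close() // lets the workers' for-loops finish
    }
}

fun main() = runBlocking {
    val rendezvous = Channel<Int>()
    // No receiver is waiting, so the non-suspending trySend fails and the
    // element is never delivered.
    println(rendezvous.trySend(1).isSuccess)

    val seen = mutableListOf<Int>()
    processAll(1..10, workers = 3) { seen += it }
    println(seen.sorted() == (1..10).toList())
}
```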

What is the most efficient way to generate random numbers from a union of disjoint ranges in Kotlin?

I would like to generate random numbers from a union of ranges in Kotlin. I know I can do something like
((1..10) + (50..100)).random()
but unfortunately this creates an intermediate list, which can be rather expensive when the ranges are large.
I know I could write a custom function to randomly select a range with a weight based on its width, followed by randomly choosing an element from that range, but I am wondering if there is a cleaner way to achieve this with Kotlin built-ins.

Suppose your ranges are non-overlapping and sorted; if not, you can preprocess them by merging and sorting.
It then comes down to choosing an algorithm:
O(1) time and O(N) space, where N is the total number of values: expand the ranges into a collection of numbers and pick one at random. To keep it compact, use an array or list as the container.
O(M) time and O(1) space, where M is the number of ranges: find the position with a linear scan over the ranges.
O(M + log(M)) time and O(M) space, where M is the number of ranges: find the position with a binary search. You can separate the preparation (O(M)) from the generation (O(log(M))) if you generate multiple numbers from the same set of ranges.
For the last algorithm, imagine a sorted list of all available numbers; this list can be partitioned into your ranges. There is no need to actually create the list: you just calculate the start position of each range relative to it. When you have a position within the list and want to know which range it falls in, do a binary search.
fun random(ranges: Array<IntRange>): Int {
// preparation
val positions = ranges.map {
it.last - it.first + 1
}.runningFold(0) { sum, item -> sum + item }
// generation
val randomPos = Random.nextInt(positions[ranges.size])
val found = positions.binarySearch(randomPos)
// binarySearch returns -(insertionPoint + 1) when the value is not found
val range = if (found < 0) -(found + 1) - 1 else found
return ranges[range].first + randomPos - positions[range]
}
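
The split between preparation and generation mentioned above can be sketched as a small class that computes the offsets once and reuses them for every draw. The class name RangeUnionSampler is hypothetical. The sketch assumes the ranges are non-empty, disjoint and sorted, and that the total number of values fits in an Int.

```kotlin
import kotlin.random.Random

// Hypothetical helper; assumes non-empty, disjoint, sorted ranges whose
// total size does not overflow Int.
class RangeUnionSampler(private val ranges: List<IntRange>) {
    // Preparation, O(M): offsets[i] is the number of values before ranges[i];
    // the last entry is the total number of values.
    private val offsets: List<Int> =
        ranges.runningFold(0) { sum, r -> sum + (r.last - r.first + 1) }

    val size: Int get() = offsets.last()

    // Generation step, O(log M): maps a position in [0, size) to its value.
    fun valueAt(position: Int): Int {
        require(position in 0 until size) { "position out of bounds: $position" }
        val found = offsets.binarySearch(position)
        // An exact hit is the start of a range; otherwise step back from the
        // insertion point to the range that contains the position.
        val index = if (found >= 0) found else -(found + 1) - 1
        return ranges[index].first + (position - offsets[index])
    }

    fun next(random: Random = Random.Default): Int = valueAt(random.nextInt(size))
}

fun main() {
    val sampler = RangeUnionSampler(listOf(1..10, 50..100))
    println(sampler.size)        // 61
    println(sampler.valueAt(20)) // 60
    println(sampler.next())
}
```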

Short solution
We can do it like this:
fun main() {
println(random(1..10, 50..100))
}
fun random(vararg ranges: IntRange): Int {
var index = Random.nextInt(ranges.sumOf { it.last - it.first } + ranges.size)
ranges.forEach {
val size = it.last - it.first + 1
if (index < size) {
return it.first + index
}
index -= size
}
throw IllegalStateException()
}
It uses the same approach you described, but it asks for a random integer only once instead of twice.
Long solution
As I said in the comment, I often miss utilities in the Java/Kotlin stdlib for creating collection views. If IntRange had something like asList() and we had a way to concatenate lists by creating a view, this would be really trivial, built from existing logic blocks. Views would do the trick for us: they would automatically calculate the size and translate the random number to the proper value.
I implemented a POC, maybe you will find it useful:
fun main() {
val list = listOf(1..10, 50..100).mergeAsView()
println(list.size) // 61
println(list[20]) // 60
println(list.random())
}
@JvmName("mergeIntRangesAsView")
fun Iterable<IntRange>.mergeAsView(): List<Int> = map { it.asList() }.mergeAsView()
@JvmName("mergeListsAsView")
fun <T> Iterable<List<T>>.mergeAsView(): List<T> = object : AbstractList<T>() {
override val size = this@mergeAsView.sumOf { it.size }
override fun get(index: Int): T {
if (index < 0 || index >= size) {
throw IndexOutOfBoundsException(index)
}
var remaining = index
this@mergeAsView.forEach { curr ->
if (remaining < curr.size) {
return curr[remaining]
}
remaining -= curr.size
}
throw IllegalStateException()
}
}
fun IntRange.asList(): List<Int> = object : AbstractList<Int>() {
override val size = endInclusive - start + 1
override fun get(index: Int): Int {
if (index < 0 || index >= size) {
throw IndexOutOfBoundsException(index)
}
return start + index
}
}
This code does almost exactly the same thing as the short solution above, only indirectly.
Once again: this is just a POC. This implementation of asList() and mergeAsView() is not at all production-ready. We should implement more methods, such as iterator(), contains() and indexOf(), because right now they are much slower than they could be. But it should already work efficiently for your specific case. You should probably test it at least a little. Also, mergeAsView() assumes the provided lists have a fixed size, which may not be true.
It would probably be good to implement asList() for IntProgression and for other primitive types as well. You may also prefer a vararg version of mergeAsView() over an extension function.
As a final note: I guess there are libraries that do this already, probably some related to immutable collections. But if you are looking for a relatively lightweight solution, this should work for you.

Generating unique random values with SecureRandom

I'm currently implementing a secret sharing scheme (Shamir's).
In order to generate the secret shares, I need to generate some random numbers within a range. For this purpose, I have this very simple code:
val sharesPRG = SecureRandom()
fun generateShares(k :Int): List<Pair<BigDecimal,BigDecimal>> {
val xs = IntArray(k){ i -> sharesPRG.nextInt(5)}
return xs
}
I have left out the part that actually creates the shares as coordinates, just to make it reproducible, and picked an arbitrarily small bound of 5.
My problem is that I of course need these shares to be unique; it doesn't make sense to have shares that are the same.
So would it be possible for SecureRandom.nextInt to not return a value that it has already returned?
Of course I could add some logic that checks for duplicates, but I thought there should be something more elegant.

If your k is not too large, you can keep adding random values to a set until it reaches size k (k must not exceed the bound passed to nextInt, here 50, or the loop never terminates):
fun generateMaterial(k: Int): Set<Int> = mutableSetOf<Int>().also {
while (it.size < k) {
it.add(sharesPRG.nextInt(50))
}
}
You can then use the set as the source material for your list of pairs (k needs to be even):
fun main() {
val pairList = generateMaterial(10).windowed(2, 2).map { it[0] to it[1] }
println(pairList)
}
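
A slightly more defensive variant of the same idea is sketched below. The function name distinctRandomInts and its parameters are hypothetical. It takes the bound as a parameter and rejects requests that could never be satisfied, so the loop cannot spin forever.

```kotlin
import java.security.SecureRandom

// Hypothetical helper: draws k distinct values from [0, bound).
fun distinctRandomInts(k: Int, bound: Int, random: SecureRandom = SecureRandom()): Set<Int> {
    // With k > bound there are not enough candidates and the loop below
    // would never finish, so fail fast instead.
    require(k in 0..bound) { "cannot draw $k distinct values from $bound candidates" }
    val result = LinkedHashSet<Int>()
    while (result.size < k) {
        result.add(random.nextInt(bound)) // duplicates are ignored by the set
    }
    return result
}

fun main() {
    val xs = distinctRandomInts(k = 4, bound = 50)
    // k is even here, so every chunk has exactly two elements.
    val pairs = xs.chunked(2) { (a, b) -> a to b }
    println(pairs)
}
```

When k is close to the bound, the loop needs many retries; in that case shuffling the full candidate list and taking the first k elements may be the better choice.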

Kotlin: maxBy{} with optimum-value

Let's say I have the following code in Kotlin:
val min = listOf("hello", "", "teeeeeest").minBy { it.length }
What I understand from the implementation of minBy is that it tracks minValue in a variable and iterates through the whole collection and updates it once it finds an even smaller element.
In the case of Strings though, we know that no element can have a value smaller than 0, therefore the empty String "" is optimal and the iteration can be stopped.
Is there a way I can tell minBy (or maxBy) the optimal value so it can stop once that is reached? If not, how can I implement this most easily?

There's no function in the stdlib that can do this, but you can implement it as an extension function yourself.
By using the non-local return feature of inline lambda functions in Kotlin, you can implement it like this:
fun <T, E : Comparable<E>> Iterable<T>.minBy(theoreticalMinimum: E, keySelector: (T) -> E): T? =
minBy {
val key = keySelector(it)
if (key <= theoreticalMinimum) return it // Non-local return.
else key
}
Now you can use it like this, and it will never visit "teeeeeest":
val min = listOf("hello", "", "teeeeeest").minBy(theoreticalMinimum = 0) { it.length }
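
For comparison, the same early exit can be written as an explicit loop, which avoids overloading the stdlib name. This is only a sketch; the name minByOrStopAt is hypothetical.

```kotlin
// Hypothetical helper: like minBy, but returns as soon as an element's key
// reaches the known optimum. Returns null for an empty collection.
inline fun <T, K : Comparable<K>> Iterable<T>.minByOrStopAt(optimum: K, selector: (T) -> K): T? {
    var best: T? = null
    var bestKey: K? = null
    for (element in this) {
        val key = selector(element)
        if (key <= optimum) return element // nothing can beat the optimum
        val currentBest = bestKey
        if (currentBest == null || key < currentBest) {
            best = element
            bestKey = key
        }
    }
    return best
}

fun main() {
    val visited = mutableListOf<String>()
    val min = listOf("hello", "", "teeeeeest").minByOrStopAt(0) {
        visited += it
        it.length
    }
    println(min == "")  // true
    println(visited)    // [hello, ]
}
```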

Return double index of collection's element while iterating

In Kotlin documentation I found the following example:
for ((index, value) in array.withIndex()) {
println("the element at $index is $value")
}
Is it possible (and how) to do something similar with a 2D matrix:
for ((i, j, value) in matrix2D.withIndex()) {
// but iterate over a double index: i - row, j - column
if (otherMatrix2D[i, j] > value) doSomething()
}
How can I support this functionality in a Kotlin class?

While the solutions proposed by miensol and hotkey are correct, they are the least efficient way to iterate over a matrix. For instance, hotkey's solution makes M * N allocations of Cell<T>, plus M allocations of List<Cell<T>> and IntRange, plus one allocation of List<List<Cell<T>>> and IntRange. Moreover, lists resize when new cells are added, so even more allocations happen. That's too many allocations for just iterating over a matrix.
Iteration using an inline function
I would recommend implementing an extension function modelled on Array<T>.forEachIndexed, which is just as easy to use and very efficient. This solution doesn't do any allocations at all and is as efficient as writing nested for loops.
inline fun <T> Matrix<T>.forEachIndexed(callback: (Int, Int, T) -> Unit) {
for (i in 0..cols - 1) {
for (j in 0..rows - 1) {
callback(i, j, this[i, j])
}
}
}
You can call this function in the following way:
matrix.forEachIndexed { i, j, value ->
if (otherMatrix[i, j] > value) doSomething()
}
Iteration using a destructuring declaration
If you want to use a traditional for-loop with a destructuring declaration for some reason, there is a much more efficient but hacky solution. It uses a sequence instead of allocating multiple lists and creates only a single instance of Cell, but the Cell itself is mutable.
data class Cell<T>(var i: Int, var j: Int, var value: T)
fun <T> Matrix<T>.withIndex(): Sequence<Cell<T>> {
val cell = Cell(0, 0, this[0, 0])
return generateSequence(cell) { cell ->
cell.j += 1
if (cell.j >= rows) {
cell.j = 0
cell.i += 1
if (cell.i >= cols) {
return@generateSequence null
}
}
cell.value = this[cell.i, cell.j]
cell
}
}
And you can use this function to iterate a matrix in a for-loop:
for ((i, j, item) in matrix.withIndex()) {
if (otherMatrix[i, j] > item) doSomething()
}
This solution is slightly less efficient than the first one and less robust because of the mutable Cell, so I would really recommend using the first one.
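
Neither answer shows the Matrix type itself, so here is a self-contained sketch for trying both styles. Everything in it is an assumption: the minimal Matrix class, its rows and cols properties, and the convention from the question that i is the row index and j the column index. The withIndex() variant here uses the sequence builder with an immutable Cell, which allocates one Cell per element, unlike the mutable single-Cell version above.

```kotlin
// Hypothetical minimal matrix with row-major storage.
class Matrix<T>(val rows: Int, val cols: Int, init: (Int, Int) -> T) {
    private val data: List<T> = List(rows * cols) { init(it / cols, it % cols) }
    operator fun get(i: Int, j: Int): T = data[i * cols + j]
}

// Allocation-free iteration: the lambda is inlined into the nested loops.
inline fun <T> Matrix<T>.forEachIndexed(action: (i: Int, j: Int, value: T) -> Unit) {
    for (i in 0 until rows) {
        for (j in 0 until cols) {
            action(i, j, this[i, j])
        }
    }
}

// A data class gets component1..3, which enables destructuring in for-loops.
data class Cell<T>(val i: Int, val j: Int, val value: T)

// Lazy iteration: cells are produced one at a time, no intermediate lists.
fun <T> Matrix<T>.withIndex(): Sequence<Cell<T>> = sequence {
    for (i in 0 until rows) {
        for (j in 0 until cols) {
            yield(Cell(i, j, this@withIndex[i, j]))
        }
    }
}

fun main() {
    val matrix = Matrix(2, 3) { i, j -> i * 3 + j }
    val other = Matrix(2, 3) { i, j -> if (i == j) 100 else 0 }

    matrix.forEachIndexed { i, j, value ->
        if (other[i, j] > value) println("forEachIndexed: other is larger at ($i, $j)")
    }
    for ((i, j, value) in matrix.withIndex()) {
        if (other[i, j] > value) println("withIndex: other is larger at ($i, $j)")
    }
}
```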

These two language features are used for implementing the behaviour that you want:
For-loops can be used with any class that has a method that provides an iterator.
for (item in myItems) { ... }
This code will compile if myItems has function iterator() returning something with functions hasNext(): Boolean and next().
Usually it is an Iterable<SomeType> implementation (some collection), but you can add iterator() method to an existing class as an extension, and you will be able to use that class in for-loops as well.
For destructuring declaration, the item type should have componentN() functions.
val (x, y, z) = item
Here the compiler expects item to have component1(), component2() and component3() functions. You can also use data classes, they have these functions generated.
Destructuring in for-loop works in a similar way: the type that the iterator's next() returns must have componentN() functions.
Example implementation (not claiming the best performance, see below):
Class with destructuring support:
class Cell<T>(val i: Int, val j: Int, val item: T) {
operator fun component1() = i
operator fun component2() = j
operator fun component3() = item
}
Or using data class:
data class Cell<T>(val i: Int, val j: Int, val item: T)
Function that returns List<Cell<T>> (written as an extension, but can also be a member function):
fun <T> Matrix<T>.withIndex() =
(0 .. height - 1).flatMap { i ->
(0 .. width - 1).map { j ->
Cell(i, j, this[i, j])
}
}
The usage:
for ((i, j, item) in matrix2d.withIndex()) { ... }
UPD: The solution offered by Michael actually performs better (run this test; the difference is about 2x to 3x), so it's more suitable for performance-critical code.

The following method:
data class Matrix2DValue<T>(val x: Int, val y: Int, val value: T)
fun withIndex(): Iterable<Matrix2DValue<T>> {
//build the list of values
}
would allow you to write the for loop as:
for ((x, y, value) in matrix2d.withIndex()) {
println("value: $value, x: $x, y: $y")
}
Bear in mind, though, that the values of (x, y, value) are determined by the order in which the data class properties are declared, not by the variable names used in the for loop. You can find more information about destructuring in the Kotlin documentation.
