Remove duplicate entry from list with a condition in kotlin - kotlin

So, I am implementing a POM parser. I've got to the point where I can get all dependencies, but some artifacts may be duplicate with different versions. Like gradle, I want it to just fetch the greater version among those. But in kotlin, I can only see examples of removing duplicates using distinctBy function which does not accept a condition. I wanna filter it by only removing the duplicate entry which has the lowest version and keeping the greater one. How can I do so?
The entries in my list are as follows:
data class Artifact(val groupId: String, val artifactId: String, val version: String)
edit: For example, let the list be like
Artifact("com.squareup.okio", "okio-jvm", "3.2.0")
Artifact("org.jetbrains.kotlin", "kotlin-stdlib-jdk8", "1.6.20")
Artifact("org.jetbrains.kotlin", "kotlin-stdlib-common", "1.5.20")
Artifact("org.jetbrains.kotlin", "kotlin-stdlib-common", "1.6.21")
Artifact("org.jetbrains", "annotations", "13.0")
And the output is expected to be like
Artifact("com.squareup.okio", "okio-jvm", "3.2.0")
Artifact("org.jetbrains.kotlin", "kotlin-stdlib-jdk8", "1.6.20")
Artifact("org.jetbrains.kotlin", "kotlin-stdlib-common", "1.6.21")
Artifact("org.jetbrains", "annotations", "13.0")
i.e., to remove the version that is the lowest among the duplicates.

The main part of this is fairly straightforward: you can use groupBy to group the artifacts by group ID and artifact ID, and then maxBy* to select the latest version from each group.
A more tricky part is comparing version strings… But luckily, the Java stdlib⁑ has a class for this!
So you can simply:
import java.lang.module.ModuleDescriptor.Version
data class Artifact(val groupId: String, val artifactId: String, val version: String)
fun main() {
// Some simple sample data:
val deps = listOf(Artifact("gp", "art", "9.2"), Artifact("gp", "art", "9.10"), Artifact("gp", "art", "10.0"), Artifact("gp2", "art2", "1.0"))
// Find the latest version of each dependency:
val latestDeps = deps.groupBy{ it.groupId to it.artifactId }.values
.map{ it.maxBy{ Version.parse(it.version) }}
// And display the result:
println(latestDeps)
}
(That creates a list. Since the items will be unique, but have no inherent ordering, it would make sense to call toSet() on the result… but I leave that to you!)
(* maxBy() was deprecated in Kotlin 1.4 in favour of maxByOrNull(), and came back in Kotlin 1.7 as a non-nullable function that throws on an empty collection. Either way, I'm using it here for simplicity, as the groups will never be empty.)
(⁑ If you're not running on Kotlin/JVM, then you'll have to write your own version-number-comparing function. I've written one; it's a bit fiddly, but not too hard. — I think you have to split the string on dots and then compare the individual numbers numerically, pairwise, though it's complicated if you have to handle non-numeric parts such as trailing a or -SNAPSHOT or patch numbers… If you need to do that, it would probably be best as a separate question.)
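For the simple case described in that footnote, here is a minimal sketch of such a comparison. compareVersions and latestOf are hypothetical names of my own, and the code assumes every component is a plain non-negative integer: qualifiers such as -SNAPSHOT or a trailing letter would make toInt() throw.

```kotlin
data class Artifact(val groupId: String, val artifactId: String, val version: String)

// Hypothetical helper, not part of any library: compares dotted version
// strings numerically, component by component. Missing components count as 0,
// so "1.6" and "1.6.0" compare as equal.
// Assumption: every component is a plain non-negative integer.
fun compareVersions(a: String, b: String): Int {
    val left = a.split('.').map { it.toInt() }
    val right = b.split('.').map { it.toInt() }
    for (i in 0 until maxOf(left.size, right.size)) {
        val diff = left.getOrElse(i) { 0 }.compareTo(right.getOrElse(i) { 0 })
        if (diff != 0) return diff
    }
    return 0
}

// Keeps the highest version in each (groupId, artifactId) group.
// On a tie, the artifact that appeared first in the input is kept.
fun latestOf(deps: List<Artifact>): List<Artifact> =
    deps.groupBy { it.groupId to it.artifactId }
        .values
        .map { group ->
            group.reduce { best, next ->
                if (compareVersions(next.version, best.version) > 0) next else best
            }
        }

fun main() {
    val deps = listOf(
        Artifact("org.jetbrains.kotlin", "kotlin-stdlib-common", "1.5.20"),
        Artifact("org.jetbrains.kotlin", "kotlin-stdlib-common", "1.6.21"),
        Artifact("org.jetbrains", "annotations", "13.0")
    )
    println(latestOf(deps))
}
```

Comparing numerically rather than as text is the important part: as plain strings, "9.10" would sort before "9.2".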

Related

How to chain filter expressions together

I have data in the following format
ArrayList<Map.Entry<String,ByteString>>
[
{"a":[a-bytestring]},
{"b":[b-bytestring]},
{"a:model":[amodel-bytestring]},
{"b:model":[bmodel-bytestring]},
]
I am looking for a clean way to transform this data into the format (List<Map.Entry<ByteString,ByteString>>) where the key is the value of a and value is the value of a:model.
Desired output
List<Map.Entry<ByteString,ByteString>>
[
{[a-bytestring]:[amodel-bytestring]},
{[b-bytestring]:[bmodel-bytestring]}
]
I assume this will involve the use of filters or other map operations, but I am not familiar enough with Kotlin yet to know how.
It's not possible to give an exact, tested answer without access to the ByteString class — but I don't think that's needed for an outline, as we don't need to manipulate byte strings, just pass them around. So here I'm going to substitute Int; it should be clear and avoid any dependencies, but still work in the same way.
I'm also going to use a more obvious input structure, which is simply a map:
val input = mapOf("a" to 1,
"b" to 2,
"a:model" to 11,
"b:model" to 12)
As I understand it, what we want is to link each key without :model with the corresponding one with :model, and return a map of their corresponding values.
That can be done like this:
val output = input.filterKeys{ !it.endsWith(":model") }
.map{ it.value to input["${it.key}:model"] }.toMap()
println(output) // Prints {1=11, 2=12}
The first line filters out all the entries whose keys end with :model, leaving only those without. Then the second creates a map from their values to the input values for the corresponding :model keys. (Unfortunately, there's no good general way to create one map directly from another; here map() creates a list of pairs, and then toMap() creates a map from that.)
I think if you replace Int with ByteString (or indeed any other type!), it should do what you ask.
The only thing to be aware of is that the output is a Map<Int, Int?> — i.e. the values are nullable. That's because there's no guarantee that each input key has a corresponding :model key; if it doesn't, the result will have a null value. If you want to omit those, you could call filterValues{ it != null } on the result.
However, if there's an ‘orphan’ :model key in the input, it will be ignored.
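If you'd rather start from a list of entries (as in the question) and drop unmatched keys in one pass, something like the following sketch might do. pairWithModels is a hypothetical name, and the value type is left generic so that ByteString, Int or anything else can be plugged in.

```kotlin
// Hypothetical helper: links each plain key to its ":model" partner and
// returns a map from the plain key's value to the model key's value.
// Plain keys without a partner are skipped, and so are orphan ":model" keys.
fun <V : Any> pairWithModels(entries: List<Map.Entry<String, V>>): Map<V, V> {
    // Index the entries by key so the ":model" partner can be looked up directly.
    val byKey = entries.associate { it.key to it.value }
    return byKey
        .filterKeys { !it.endsWith(":model") }
        .mapNotNull { (key, value) ->
            // A null here (no partner found) makes mapNotNull drop the entry.
            byKey["$key:model"]?.let { model -> value to model }
        }
        .toMap()
}

fun main() {
    // Int stands in for ByteString, as in the answer above.
    val entries = mapOf("a" to 1, "b" to 2, "a:model" to 11, "b:model" to 12).entries.toList()
    println(pairWithModels(entries)) // Prints {1=11, 2=12}
}
```

Because the lookup returns null for a missing partner, the result type is a plain Map<V, V> with no nullable values to filter out afterwards.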

Finding best delimiter by size of resulting array after split Kotlin

I am trying to obtain the best delimiter for my CSV file, I've seen answers that find the biggest size of the header row. Now instead of doing the standard method that would look something like this:
val supportedDelimiters: Array<Char> = arrayOf(',', ';', '|', '\t')
fun determineDelimiter(headerRow: String): Char {
var headerLength = 0
var chosenDelimiter = ' '
supportedDelimiters.forEach {
if (headerRow.split(it).size > headerLength) {
headerLength = headerRow.split(it).size
chosenDelimiter = it
}
}
return chosenDelimiter
}
I've been trying to do it with some in-built Kotlin collections methods like filter or maxOf, but to no avail (the code below does not work).
fun determineDelimiter(headerRow: String): Char {
return supportedDelimiters.filter({a,b -> headerRow.split(a).size < headerRow.split(b)})
}
Is there any way I could do it without forEach?
Edit: The header row could look something like this:
val headerRow = "I;am;delimited;with;'semi,colon'"
I put the '' over an entry that could contain other potential delimiter
You're mostly there, but this seems simpler than you think!
Here's one answer:
fun determineDelimiter(headerRow: String)
= supportedDelimiters.maxByOrNull{ headerRow.split(it).size } ?: ' '
maxByOrNull() does all the hard work: you just tell it the number of headers that a delimiter would give, and it searches through each delimiter to find which one gives the largest number.
It returns null if the list is empty, so the method above returns a space character, like your standard method. (In this case we know that the list isn't empty, so you could replace the ?: ' ' with !! if you wanted that impossible case to give an error, or you could drop it entirely if you wanted it to give a null which would be handled elsewhere.)
As mentioned in a comment, there's no foolproof way to guess the CSV delimiter in general, and so you should be prepared for it to pick the wrong delimiter occasionally. For example, if the intended delimiter was a semicolon but several headers included commas, it could wrongly pick the comma. Without knowing any more about the data, there's no way around that.
With the code as it stands, there could be multiple delimiters which give the same number of headers; it would simply pick the first. You might want to give an error in that case, and require that there's a unique best delimiter. That would give you a little more confidence that you've picked the right one — though there's still no guarantee. (That's not so easy to code, though…)
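For what it's worth, here is one rough way the uniqueness check could look. determineUniqueDelimiter is a hypothetical name, and returning null on a tie (rather than throwing) is my own choice, so that the caller decides how to react.

```kotlin
val supportedDelimiters = listOf(',', ';', '|', '\t')

// Hypothetical variant: returns the delimiter only when it is the single best
// candidate, and null when two or more delimiters tie for the most columns
// (which includes the case where no delimiter appears at all).
fun determineUniqueDelimiter(headerRow: String): Char? {
    // Number of columns each candidate delimiter would produce.
    val columnCounts = supportedDelimiters.associateWith { headerRow.split(it).size }
    val best = columnCounts.values.maxOrNull() ?: return null
    // singleOrNull() yields null unless exactly one delimiter reaches the maximum.
    return columnCounts.filterValues { it == best }.keys.singleOrNull()
}

fun main() {
    println(determineUniqueDelimiter("a;b;c,d")) // prints ;
    println(determineUniqueDelimiter("a;b,c"))   // prints null (tie between ',' and ';')
}
```

To turn the tie into an error instead, the call site could use determineUniqueDelimiter(row) ?: error("Ambiguous delimiter").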
Just like gidds said in the comment above, I would advise against choosing the delimiter based on how many times each delimiter appears. You would get the wrong answer for a header row like this:
Type of shoe, regardless of colour, even if black;Size of shoe, regardless of shape
In the above header row, the delimiter is obviously ; but your method would erroneously pick ,.
Another problem is that a header column may itself contain a delimiter, if it is enclosed in quotes. Your method doesn't take any notice of possible quoted columns. For this reason, I would recommend that you give up trying to parse CSV files yourself, and instead use one of the many available Open Source CSV parsers.
Nevertheless, if you still want to know how to pick the delimiter based on its frequency, there are a few optimizations to readability that you can make.
First, note that Kotlin strings are iterable; therefore you don't have to use a List of Char. Use a String instead.
Secondly, all you're doing is counting the number of times a character appears in the string, so there's no need to break the string up into pieces just to do that. Instead, count the number of characters directly.
Third, instead of finding the maximum value by hand, take advantage of what the standard library already offers you.
const val supportedDelimiters = ",;|\t"
fun determineDelimiter(headerRow: String): Char =
supportedDelimiters.maxBy { delimiter -> headerRow.count { it == delimiter } }
fun main() {
val headerRow = "one,two,three;four,five|six|seven"
val chosenDelimiter = determineDelimiter(headerRow)
println(chosenDelimiter) // prints ',' as expected
}

min max functions for a list of string is returning on what basis

I'm exploring Kotlin and came across a list of strings like the one below.
I tried the min and max functions on it.
I initially thought this would give a compile-time error, but it didn't.
When I printed the min, I got a555585887996669 as output, which is the longest word in the list.
val list = listOf<String>("a555585887996669","abtfcr6cr","abcde","abcd")
println(list.min()) //a555585887996669
I need to know on what basis it returns this value,
and why min and max are supported for a list of strings.
The min() and max() extension functions operate on anything that can be compared. That includes numeric types, but also anything that implements the Comparable interface, which is the standard way for objects to implement a natural ordering.
In this case, String implements Comparable; it uses lexicographic order (which is roughly the order of words in a dictionary), comparing characters pairwise until it finds a difference, or until one String ends.  So for example "a" < "abc" < "b".
Collection ordering in Kotlin is explained here.
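A small sketch using the list from the question may make the ordering concrete. It uses the ...OrNull variants only because they are available under the same names from Kotlin 1.4 onwards; words is just an illustrative name.

```kotlin
val words = listOf("a555585887996669", "abtfcr6cr", "abcde", "abcd")

fun main() {
    // All four strings start with 'a', so the second character decides:
    // the digit '5' sorts before the letter 'b'.
    println(words.minOrNull())               // a555585887996669
    println(words.maxOrNull())               // abtfcr6cr
    // A string sorts before any longer string that it is a prefix of.
    println(words.sorted())                  // [a555585887996669, abcd, abcde, abtfcr6cr]
    // Length plays no part unless you ask for it explicitly.
    println(words.minByOrNull { it.length }) // abcd
    println("a" < "abc" && "abc" < "b")      // true
}
```

So the longest string coming out as the minimum is a coincidence of its second character, not a consequence of its length.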
Have a look at the documentation:
https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/min.html
Since String implements Comparable, min() returns the smallest value in lexicographic (character-by-character) order.
To pick the shortest string instead, compare by length:
list.minBy { it.length }

Creating 4 digit number with no repeating elements in Kotlin

Thanks to @RedBassett for this resource (Kotlin problem solving): https://kotlinlang.org/docs/tutorials/koans.html
I'm aware this question exists here:
Creating a 4 digit Random Number using java with no repetition in digits
but I'm new to Kotlin and would like to explore the direct Kotlin features.
So as the title suggests, I'm trying to find a Kotlin-specific way to neatly generate a 4-digit number without repeating digits (after that, it's easy to adapt it for length x).
This is my current working solution and would like to make it more Kotlin. Would be very grateful for some input.
fun createFourDigitNumber(): Int {
var fourDigitNumber = ""
val rangeList = {(0..9).random()}
while(fourDigitNumber.length < 4)
{
val num = rangeList().toString()
if (!fourDigitNumber.contains(num)) fourDigitNumber +=num
}
return fourDigitNumber.toInt()
}
So the range you define (0..9) is actually already a sequence of numbers. Instead of iterating and repeatedly generating a new random, you can just use a subset of that sequence. In fact, this is the accepted answer's solution to the question you linked. Here are some pointers if you want to implement it yourself to get the practice:
The first for loop in that solution is unnecessary in Kotlin because of the range. 0..9 does the same thing, you're on the right track there.
In Kotlin you can call .shuffled() directly on the range without needing to call Collections.shuffle() with an argument like they do.
You can avoid another loop if you create a string from the whole range and then return a substring.
If you want to look at my solution (with input from others in the comments), it is in a spoiler here:
fun getUniqueNumber(length: Int) = (0..9).shuffled().take(length).joinToString("")
(Note that this doesn't gracefully handle a length above 10, but that's up to you to figure out how to implement. It is up to you to use subList() and then toString(), or toString() and then substring(), the output should be the same.)
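If you get stuck on the length check, one possible shape is sketched below. uniqueDigitString is a hypothetical name, and the optional Random parameter is only there so the function can be seeded for repeatable results.

```kotlin
import kotlin.random.Random

// Hypothetical variant that rejects impossible lengths up front: there are
// only ten distinct digits, so anything above 10 cannot be satisfied.
// It returns a String so that a leading zero is preserved; converting the
// result with toInt() would silently drop it.
fun uniqueDigitString(length: Int, random: Random = Random.Default): String {
    require(length in 1..10) { "length must be between 1 and 10, but was $length" }
    return (0..9).shuffled(random).take(length).joinToString("")
}

fun main() {
    println(uniqueDigitString(4))
    println(uniqueDigitString(4, Random(42))) // same seed, same digits on every run
}
```

If the result must be a true four-digit number, the leading-zero case still needs handling, for example by reshuffling whenever the first digit is 0.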

Large list literals in Kotlin stalling/crashing compiler

I'm using val globalList = listOf("a1" to "b1", "a2" to "b2") to create a large list of Pairs of strings.
All is fine until you try to put more than 1000 Pairs into a List. The compiler either takes > 5 minutes or just crashes (Both in IntelliJ and Android Studio).
Same happens if you use simple lists of Strings instead of Pairs.
Is there a better way / best practice to include large lists in your source code without resorting to a database?
You can replace a listOf(...) expression with a list created using a constructor or a factory function and adding the items to it:
val globalList: List<Pair<String, String>> = mutableListOf<Pair<String, String>>().apply {
add("a1" to "b1")
add("a2" to "b2")
// ...
}
This is definitely a simpler construct for the compiler to analyze.
If you need something quick and dirty instead of data files, one workaround is to use a large string, then split and map it into a list. Here's an example mapping into a list of Ints.
val onCommaWhitespace = "[\\s,]+".toRegex() // in this example split on commas w/ whitespace
val bigListOfNumbers: List<Int> = """
0, 1, 2, 3, 4,
:
:
:
8187, 8188, 8189, 8190, 8191
""".trimIndent()
.split(onCommaWhitespace)
.map { it.toInt() }
Of course for splitting into a list of Strings, you'd have to choose an appropriate delimiter and regex that don't interfere with the actual data set.
There's no good way to do what you want; for something that size, reading the values from a data file (or calculating them, if that were possible) is a far better solution all round — more maintainable, much faster to compile and run, easier to read and edit, less likely to cause trouble with build tools and frameworks…
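As a rough sketch of the data-file approach on Kotlin/JVM: the file format here (one pair per line, the two halves separated by a tab) and the names parsePairs and loadPairs are assumptions of mine, so adjust them to suit your data.

```kotlin
import java.io.File

// Hypothetical format: one pair per line, the two halves separated by a tab.
// Blank lines are skipped. A line without a tab makes the destructuring
// below throw, which at least fails loudly on malformed data.
fun parsePairs(lines: Sequence<String>): List<Pair<String, String>> =
    lines.filter { it.isNotBlank() }
        .map { line ->
            // limit = 2 keeps any further tabs as part of the second value.
            val (first, second) = line.split('\t', limit = 2)
            first to second
        }
        .toList()

// Reads the whole file lazily, line by line, and closes it afterwards.
fun loadPairs(file: File): List<Pair<String, String>> =
    file.useLines { parsePairs(it) }

fun main() {
    // Demo only: write a small temporary file, then load it back.
    val demo = File.createTempFile("pairs", ".tsv")
    demo.writeText("a1\tb1\na2\tb2\n")
    println(loadPairs(demo)) // [(a1, b1), (a2, b2)]
    demo.delete()
}
```

The list then costs the compiler nothing, however many entries the file holds.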
If you let the compiler finish, its output will tell you the problem.  (‘Always read the error messages’ should be one of the cardinal rules of development!)
I tried hotkey's version using apply(), and it eventually gave this error:
…
Caused by: org.jetbrains.org.objectweb.asm.MethodTooLargeException: Method too large: TestKt.main ()V
…
There's the problem: MethodTooLargeException.  The JVM allows only 65535 bytes of bytecode within a single method; see this answer.  That's the limit you're coming up against here: once you have too many entries, its code would exceed that limit, and so it can't be compiled.
If you were a real masochist, you could probably work around this to an extent by splitting the initialisation across many methods, keeping each one's code just under the limit.  But please don't!  For the sake of your colleagues, for the sake of your compiler, and for the sake of your own mental health…