I have a problem with two lists which contain duplicates
a = [1,1,2,3,4,4]
b = [1,2,3,4]
I would like to be able to extract the differences between the two lists ie.
c = [1,4]
but if I do c = a-b I get c =[]
It should be trivial but I can't find out :(
I also tried iterating over the larger list and removing items from it when I find them in the smaller list, but I can't modify a list while iterating over it, so that does not work either
has anyone got an idea ?
thanks
You see an empty c as a result because subtracting a value, e.g. 1, removes all elements that are equal to 1.
groovy:000> [1,1,1,1,1,2] - 1
===> [2]
What you need instead is to remove each occurrence of a specific value separately. For that, you can use Groovy's Collection.removeElement(n), which removes a single element that matches the value. You can do it in a regular for-loop, or you can use another Groovy collection method, e.g. inject, to reduce a copy of a by removing each occurrence separately.
def c = b.inject([*a]) { acc, val -> acc.removeElement(val); acc }
assert c == [1,4]
Keep in mind that the inject method receives a copy of the a list (the expression [*a] creates a new list from the elements of a). Otherwise, acc.removeElement() would modify the existing a list. The inject method is the equivalent of the popular reduce or fold operation. Each iteration in this example can be visualized as:
-- inject starts --
acc = [1,1,2,3,4,4]; val = 1; acc.removeElement(1) -> return [1,2,3,4,4]
acc = [1,2,3,4,4]; val = 2; acc.removeElement(2) -> return [1,3,4,4]
acc = [1,3,4,4]; val = 3; acc.removeElement(3) -> return [1,4,4]
acc = [1,4,4]; val = 4; acc.removeElement(4) -> return [1,4]
-- inject ends --
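For readers who think in Python (which also appears further down this page), the same idea can be sketched as below: work on a copy of a and remove one occurrence for each element of b. This is only an illustrative analogue, not Groovy code, and the function name subtract_once is made up for this example.

```python
def subtract_once(a, b):
    """Return a copy of a with one occurrence removed for each element of b."""
    result = list(a)  # copy, so the caller's list is left untouched
    for value in b:
        if value in result:
            result.remove(value)  # list.remove drops only the first match
    return result


a = [1, 1, 2, 3, 4, 4]
b = [1, 2, 3, 4]
c = subtract_once(a, b)
print(c)  # [1, 4]
```

Like the inject version, this scans the list once per removed element, which is fine for small lists.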
PS: Kudos to almighty tim_yates who recommended improvements to that answer. Thanks, Tim!
The most readable version that comes to my mind is:
a = [1,1,2,3,4,4]
b = [1,2,3,4]
c = a.clone()
b.each {c.removeElement(it)}
If you use this frequently, you could add a method to the List metaClass:
List.metaClass.removeElements = { values -> values.each { delegate.removeElement(it) } }
a = [1,1,2,3,4,4]
b = [1,2,3,4]
c = a.clone()
c.removeElements(b)
Is there an easier approach to convert an Intellij IDEA environment variable into a list of Tuples?
My environment variable for Intellij is
GROCERY_LIST=[("egg", "dairy"),("chicken", "meat"),("apple", "fruit")]
The environment variable is read in the Kotlin file as a String.
val g_list = System.getenv("GROCERY_LIST")
Ideally I'd like to iterate over g_list, first element being ("egg", "dairy") and so on.
And then ("egg", "dairy") is a tuple/pair
I have tried to split g_list by comma that's NOT inside quotes i.e
val splitted_list = g_list.split(",(?=(?:[^\\\"]*\\\"[^\\\"]*\\\")*[^\\\"]*\$)".toRegex()).toTypedArray()
This gives me the first element as [("egg", the second element as "dairy")] and so on.
Also created a data class and tried to map the string into data class using jacksonObjectMapper following this link:
val mapper = jacksonObjectMapper()
val g_list = System.getenv("GROCERY_LIST")
val myList: List<Shopping> = mapper.readValue(g_list)
data class Shopping(val a: String, val b: String)
You can create a regular expression to match all strings in your environment variable.
Regex::findAll()
Then loop through the strings while creating a list of Shopping objects.
// Raw data set.
val groceryList: String = "[(\"egg\", \"dairy\"),(\"chicken\", \"meat\"),(\"apple\", \"fruit\")]"
// Build regular expression.
val regex = Regex("\"([\\s\\S]+?)\"")
val matchResult = regex.findAll(groceryList)
val iterator = matchResult.iterator()
// Create a List of `Shopping` objects.
var first = ""
var second = ""
val shoppingList = mutableListOf<Shopping>()
var i = 0
while (iterator.hasNext()) {
    val value = iterator.next().value
    if (i % 2 == 0) {
        first = value
    } else {
        second = value
        shoppingList.add(Shopping(first, second))
        first = ""
        second = ""
    }
    i++
}
// Print Shopping List.
for (s in shoppingList) {
println(s)
}
// Output.
/*
Shopping(a="egg", b="dairy")
Shopping(a="chicken", b="meat")
Shopping(a="apple", b="fruit")
*/
data class Shopping(val a: String, val b: String)
It is never a good idea to use regex to match parentheses.
I would suggest a step-by-step approach:
You could first match the name and the value by
(\w+)=(.*)
There you get the name in group 1 and the value in group 2 without caring about any subsequent = characters that might appear in the value.
If you then want to split the value, I would get rid of start and end parenthesis first by matching by
(?<=\[\().*(?=\)\])
(or simply cut off the first two and last two characters of the string, if it is guaranteed to start with [( and end with )])
Then get the single list entries from splitting by
\),\(
(note that the split operation also takes a regex, so the parentheses have to be escaped)
And for each list entry you could split that simply by
,\s*
or, if you want the quote character to be removed, use a match with
\"(.*)\",\s*\"(.*)\"
where group 1 contains the first element of the pair (e.g. egg) and group 2 the second (e.g. dairy)
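The steps above can be sketched in Python as follows. This is a minimal sketch, not Kotlin, and it assumes the string holds only the value (as System.getenv returns it), so the name=value step is skipped. It also assumes the exact [("a", "b"),("c", "d")] layout with no "),(" inside any quoted string. The function name parse_pairs is hypothetical.

```python
import re


def parse_pairs(raw):
    """Parse '[("a", "b"),("c", "d")]' into [('a', 'b'), ('c', 'd')]."""
    # Step 1: cut off the leading '[(' and the trailing ')]'.
    if not (raw.startswith("[(") and raw.endswith(")]")):
        raise ValueError("unexpected format: " + raw)
    inner = raw[2:-2]
    # Step 2: split into single entries on '),(' (escaped, because it is a regex).
    entries = re.split(r"\),\(", inner)
    # Step 3: pull the two quoted strings out of each entry.
    pairs = []
    for entry in entries:
        match = re.fullmatch(r'"(.*)",\s*"(.*)"', entry)
        if match is None:
            raise ValueError("unexpected entry: " + entry)
        pairs.append((match.group(1), match.group(2)))
    return pairs


grocery = '[("egg", "dairy"),("chicken", "meat"),("apple", "fruit")]'
pairs = parse_pairs(grocery)
print(pairs)  # [('egg', 'dairy'), ('chicken', 'meat'), ('apple', 'fruit')]
```

Raising on an unexpected shape is a design choice for this sketch; silently skipping malformed entries would hide mistakes in the environment variable.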
I want to group a list of items by value, but only when equal items are consecutive; a value that reappears later should start a new group:
input:
val values = listOf("Apple", "Apple", "Grape", "Grape", "Apple", "Cherry", "Cherry", "Grape")
output: {"Apple"=2, "Grape"=2, "Apple"=1, "Cherry"=2, "Grape"=1}
There's no built-in option for this in Kotlin, so it has to be custom, and there are many different ways to do it.
Because you need to keep track of the previous element, to compare the current one against, you need to have some sort of state. To achieve this you could use zipWithNext or windowed to group elements. Or use fold and accumulate the values into a list - removing and adding the last element depending on whether there's a break in the sequence.
To try and keep things a bit clearer (even if it breaks the norms a bit) I recommend using vars and a single loop. I used the buildList { } DSL, which creates a clear scope for the operation.
val result: List<Pair<String, Int>> = buildList {
    var previousElement: String? = null
    var currentCount = 0
    // iterate over each incoming value
    values.forEach { currentElement: String ->
        // if we have a break in the sequence...
        if (currentElement != previousElement) {
            // then add the finished run (if there is one) to our output
            previousElement?.let { add(it to currentCount) }
            // and reset the count for the new run
            currentCount = 0
        }
        // count the current element as part of the current run
        currentCount++
        // end this iteration - update 'previous'
        previousElement = currentElement
    }
    // add the final run, which no later element closes
    previousElement?.let { add(it to currentCount) }
}
Note that result will match the order of your initial list.
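As a cross-check of the expected output, here is the same run counting sketched in Python with itertools.groupby, which groups only adjacent equal elements. This is a comparison in a different language, not a Kotlin API, and the function name count_runs is made up for the example.

```python
from itertools import groupby


def count_runs(values):
    """Return (value, run_length) pairs for each run of consecutive equal values."""
    return [(key, sum(1 for _ in group)) for key, group in groupby(values)]


values = ["Apple", "Apple", "Grape", "Grape", "Apple", "Cherry", "Cherry", "Grape"]
runs = count_runs(values)
print(runs)
# [('Apple', 2), ('Grape', 2), ('Apple', 1), ('Cherry', 2), ('Grape', 1)]
```

A list of pairs is used rather than a map because the same key ("Apple", "Grape") appears more than once in the result.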
You could use a MultiValueMap, which can have duplicate keys. Since there is no native implementation, you would have to implement it yourself or use an open-source library.
Here is a reference.
Map implementation with duplicate keys
For comparison purposes, here's a short but inefficient solution written in the functional style using fold():
fun <E> List<E>.mergeConsecutive(): List<Pair<E, Int>>
= fold(listOf()) { acc, e ->
if (acc.isNotEmpty() && acc.last().first == e) {
val currentTotal = acc.last().second
acc.dropLast(1) + (e to currentTotal + 1)
} else
acc + (e to 1)
}
The accumulator builds up the list of pairs, incrementing its last entry when we get a duplicate, or appending a new entry when there's a different item. (You could make it slightly shorter by replacing the currentTotal with a call to let(), but that would be even harder to read.)
It uses immutable Lists and Pairs, and so has to create a load of temporary ones as it goes — which makes this pretty inefficient (𝒪(𝑛²)), and I wouldn't recommend it for production code. But hopefully it's instructive.
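To make the accumulator behaviour concrete, here is a rough Python analogue using functools.reduce with immutable tuples. It is only a sketch of the same fold idea, with the same quadratic copying cost; the names merge_consecutive and step are chosen for this example.

```python
from functools import reduce


def merge_consecutive(values):
    """Fold values into (element, count) pairs, merging consecutive duplicates."""
    def step(acc, e):
        # Same element as the last pair: replace that pair with an incremented one.
        if acc and acc[-1][0] == e:
            return acc[:-1] + ((e, acc[-1][1] + 1),)
        # Different element (or empty accumulator): append a new pair.
        return acc + ((e, 1),)

    return list(reduce(step, values, ()))


merged = merge_consecutive(["Apple", "Apple", "Grape", "Apple"])
print(merged)  # [('Apple', 2), ('Grape', 1), ('Apple', 1)]
```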
Here is an example of the list. I want this to be dynamic, since more values may be added later.
val list = arrayListOf("A", "B", "C", "A", "A", "B") //Maybe they will be more
I want the output like:-
val result = list[i] + " size: " + list[i].size
So the output will display every String with the size.
A size: 3
B size: 2
C size: 1
If I add more values, the result should grow accordingly.
You can use groupBy in this way:
val result = list.groupBy { it }.map { it.key to it.value.size }.toMap()
Jeoffrey's way is actually better, since he uses .mapValues() directly instead of an extra call to .toMap(). I'm just leaving this answer here since I believe that the other info I put is relevant.
This will give a Map<String, Int>, where the Int is the count of the occurrences.
This result will not change when you change the original list. That is not how the language works. If you want something like that, you'd need quite a bit of work, like overwriting the add function from your collection to refresh the result map.
Also, I see no reason for you to use an ArrayList, especially since you are expecting to increase the size of that collection, I'd stick with MutableList if I were you.
I think the terminology you're looking for is "frequency" here: the number of times an element appears in a list.
You can usually count elements in a list using the count method like this:
val numberOfAs = list.count { it == "A" }
This approach is pretty inefficient if you need to count all elements though, in which case you can create a map of frequencies the following way:
val freqs = list.groupBy { it }.mapValues { (_, g) -> g.size }
freqs here will be a Map where each key is a unique element from the original list, and the value is the corresponding frequency of that element in the list.
This works by first grouping elements that are equal to each other via groupBy, which returns a Map<String, List<String>> where each key is a unique element from the original list, and each value is the group of all elements in the list that were equal to the key.
Then mapValues will transform that map so that the values are the sizes of the groups instead of the groups themselves.
An improved approach, as suggested by @broot, is to make use of Kotlin's Grouping class, which has a built-in eachCount method:
val freqs = list.groupingBy { it }.eachCount()
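For comparison, the same frequency map can be sketched in Python with collections.Counter, which plays a role similar to groupingBy { it }.eachCount(). This is an analogue in another language, shown only to illustrate the expected result; the variable names are chosen for the example.

```python
from collections import Counter

items = ["A", "B", "C", "A", "A", "B"]

# Counter maps each distinct element to the number of times it appears.
freqs = Counter(items)

for element, count in freqs.items():
    print(f"{element} size: {count}")
# A size: 3
# B size: 2
# C size: 1
```

As with the Kotlin versions, the counts are a snapshot: adding to items afterwards does not update freqs.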
I am currently programming/simulating a small plant in CODESYS.
I have several outputs (that correspond to engines) that I need to test several times, so I want to create a condition that incorporates this test so I don't need to write out the entire condition each time.
For instance, I have the condition that verifies if
A=TRUE AND B=TRUE AND C=TRUE AND D=TRUE
Can I create a condition like "verify engine" to use each time ?
Thank you
There are many ways to do this (if I understood you correctly).
Here are two ways for example:
1. Create a variable that has the condition result and use the variable. You have to assign the variable at beginning, and then you can use the variable instead of that long code.
VAR
EnginesOK : BOOL;
END_VAR
//Check engines
EnginesOK := (A = TRUE AND B = TRUE AND C = TRUE AND D = TRUE);
//.. Later ..
IF EnginesOK THEN
//Do something
END_IF
2. Create a function, for example F_VerifyEngines that contains checks and returns the state as BOOL. Note: In this example A,B,C and D need to be global variables. You could also pass them as parameters for the function.
FUNCTION F_VerifyEngines : BOOL
VAR_INPUT
//Add A,B,C,D here if needed
END_VAR
VAR
END_VAR
//Return the result
F_VerifyEngines := (A = TRUE AND B = TRUE AND C = TRUE AND D = TRUE);
Then you can use the function in code:
IF F_VerifyEngines() THEN
//Do something
END_IF
The 2nd way is probably the one you were thinking of.
By the way, there is no need to write A = TRUE AND B = TRUE AND C = TRUE AND D = TRUE. In my opinion, it's clearer to read when you use A AND B AND C AND D instead.
Let's say I have a very wide data source:
big_thing = LOAD 'some_path' using MySpecialLoader;
Now I want to generate some smaller thing composed of a subset of big_thing's columns.
smaller_thing = FOREACH big_thing GENERATE
$21,$22,$23 ...... $257;
Is there a way to achieve this without having to write out all the columns?
I'm assuming yes but my searches aren't coming up with much, I think I'm just using the wrong terminology.
EDIT:
So it looks like my question is being very misunderstood. Since I'm a Python person I'll give a python analogy.
Say I have an array l1 which is made up of arrays. So it looks like a grid, right? Now I want the array l2 to be a subset of l1 such that l2 contains a bunch of columns from l1. I would do something like this:
l2 = [[l[a],l[b],l[c],l[d]] for l in l1]
# a,b,c,d are just some constants.
In pig this is equivalent to something like:
smaller_thing = FOREACH big_thing GENERATE
$1,$22,$3,$21;
But I have a heck of a lot of columns. And the columns I'm interested in are all sequential and there are a lot of those. Then in python I would do this:
l2 = [l[x:y] for l in l1]
#again, x and y are constants, eg x=20, y=180000000. See, lots of stuff I dont want to type out
My question is what is the pig equivalent to this?
smaller_thing = FOREACH big_thing GENERATE ?????
And what about stuff like this:
Python:
l2 = [l[x:y]+l[a:b]+[l[b],l[c],l[d]] for l in l1]
Pig:
smaller_thing = FOREACH big_thing GENERATE ?????
Yes, you can simply load the dataset without column names.
But loading the data with column names will help you identify the columns in future scripts.
A UDF can help you perform your query.
For example,
REGISTER UDF\path;
a = load 'data' as (a1);
b = foreach a generate UDF.Func(a1,2,4);
UDF:
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class col_gen extends EvalFunc<String>
{
@Override
public String exec(Tuple tuple) throws IOException {
String data = tuple.get(0).toString();
int x = (int)tuple.get(1);
int y = (int)tuple.get(2);
String[] data3 = data.split(",");
String data2 = data3[x]+",";
x = x+1;
while(x <= y)
{
data2 += data3[x]+",";
x++;
}
data2 = data2.substring(0, data2.length()-1);
return data2;
}
}
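To show what the UDF above computes, here is the same logic sketched in Python, in the style of the question's own analogy: split the line on commas and rejoin fields x through y inclusive. The function name select_columns is hypothetical. Note one difference: Python slicing silently truncates out-of-range indices, while the Java version would throw.

```python
def select_columns(line, x, y):
    """Return fields x..y (inclusive, zero-based) of a comma-separated line."""
    fields = line.split(",")
    # y + 1 because Python slices exclude the end index, while the UDF includes y.
    return ",".join(fields[x:y + 1])


row = "a,b,c,d,e,f"
print(select_columns(row, 2, 4))  # c,d,e
```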
The answer can be found in this post: http://blog.cloudera.com/blog/2012/08/process-a-million-songs-with-apache-pig/
Distanced = FOREACH Different GENERATE artistLat..songPreview, etc;
The .. says use everything from artistLat to songPreview.
The same thing can be done with positional notation. eg $1..$6