How does sorting for arrays of arrays differ from multidimensional arrays in awk? - awk

I have a problem where I need to list a set of items, which have components, which in turn have properties, in awk.
I have tried to approach the problem in two ways.
1) Define an array list[item-number,component-number][properties].
2) Define an array list[item-number][component-number][properties].
This was in many ways interesting, as I noticed that (2) maintains the order of insertion, while (1) does not. I know arrays are associative in awk and it could very well be a coincidence that this happened. However, as the order of insertion is important in my case (and also, I want to learn more about awk), I would like to know if this is what is happening and why.
Any ideas?
BR
Patrik

Neither approach retains any information on the order of insertion; if it seems like either does, that is just coincidence. If the order of insertion is important to you, then you need to write some code to track that order, e.g.:
key = foo FS bar
if ( !(key in list) ) {
    keys[++numKeys] = key
}
list[key] = whatever
would give you an array keys[] of the indices in the order they were inserted and an array list[] that maps each key to its value, so you can later do:
for (keyNr=1; keyNr<=numKeys; keyNr++) {
    key = keys[keyNr]
    print list[key]
}
or similar to print the contents of list[] in the order they were inserted.

How to chain filter expressions together

I have data in the following format
ArrayList<Map.Entry<String,ByteString>>
[
{"a":[a-bytestring]},
{"b":[b-bytestring]},
{"a:model":[amodel-bytestring]},
{"b:model":[bmodel-bytestring]},
]
I am looking for a clean way to transform this data into the format (List<Map.Entry<ByteString,ByteString>>) where the key is the value of a and value is the value of a:model.
Desired output
List<Map.Entry<ByteString,ByteString>>
[
{[a-bytestring]:[amodel-bytestring]},
{[b-bytestring]:[bmodel-bytestring]}
]
I assume this will involve the use of filters or other map operations, but I am not familiar enough with Kotlin yet to know how.
It's not possible to give an exact, tested answer without access to the ByteString class — but I don't think that's needed for an outline, as we don't need to manipulate byte strings, just pass them around. So here I'm going to substitute Int; it should be clear and avoid any dependencies, but still work in the same way.
I'm also going to use a more obvious input structure, which is simply a map:
val input = mapOf("a" to 1,
                  "b" to 2,
                  "a:model" to 11,
                  "b:model" to 12)
As I understand it, what we want is to link each key without :model with the corresponding one with :model, and return a map of their corresponding values.
That can be done like this:
val output = input.filterKeys { !it.endsWith(":model") }
    .map { it.value to input["${it.key}:model"] }.toMap()
println(output) // Prints {1=11, 2=12}
The first line filters out all the entries whose keys end with :model, leaving only those without. Then the second creates a map from their values to the input values for the corresponding :model keys. (Unfortunately, there's no good general way to create one map directly from another; here map() creates a list of pairs, and then toMap() creates a map from that.)
I think if you replace Int with ByteString (or indeed any other type!), it should do what you ask.
The only thing to be aware of is that the output is a Map<Int, Int?> — i.e. the values are nullable. That's because there's no guarantee that each input key has a corresponding :model key; if it doesn't, the result will have a null value. If you want to omit those, you could call filterValues{ it != null } on the result.
However, if there's an ‘orphan’ :model key in the input, it will be ignored.
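If the nullable values are unwanted, one option is to drop the unmatched entries while building the pairs, so that the result type is non-nullable from the start. The following is only a sketch: pairWithModels is a hypothetical helper name, and Int again stands in for ByteString.

```kotlin
// Hypothetical helper: maps the value of each plain key to the value of
// its ":model" counterpart, skipping keys that have no counterpart.
fun <V : Any> pairWithModels(input: Map<String, V>): Map<V, V> =
    input.filterKeys { !it.endsWith(":model") }
        .mapNotNull { (key, value) ->
            // input[...] is null when there is no matching ":model" key,
            // in which case mapNotNull drops the entry.
            input["$key:model"]?.let { model -> value to model }
        }
        .toMap()

fun main() {
    val input = mapOf("a" to 1, "b" to 2, "a:model" to 11, "b:model" to 12, "c" to 3)
    println(pairWithModels(input)) // Prints {1=11, 2=12}
}
```

Compared with calling filterValues afterwards, this gives a Map<V, V> rather than a Map<V, V?>, so callers do not have to deal with nulls.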

Kotlin error "Index Out Of Bounds Exception"

I'm a newbie to Kotlin, and new to programming also, so please be gentle :)
Let's say I have a string (it has already been processed to NOT contain any duplicated characters). I want to compare all characters in that string to the alphabet, which is declared as a mutable list of characters, and delete from the alphabet any character which shows up in the string. My code is as below:
var alphabet = mutableListOf('a','b','c','d','e','f','g','h','i','j','k','l','m',
    'n','o','p','q','r','s','t','u','v','w','x','y','z')
var key = "keyword"
println(key)
for (i in key.indices) {
    for (j in alphabet.indices) {
        if (key[i] == alphabet[j])
            alphabet.removeAt(j) // 1. this line has the error
        // print(alphabet[j]) // 2. but this line runs fine
    }
}
In the above code, I get an error at the "alphabet.removeAt(j)" command, so I tried another command to print out the characters instead of deleting them, and it runs fine. I read some articles and I know this error is related to an invalid index, but I used the "indices" key and I think it's pretty safe. Please help.
It is safe to iterate using alphabet.indices, but it is not safe to iterate over a collection while modifying it. Note that indices returned indices for a full alphabet, but then you removed some items from it, making it shorter, so indices are no longer valid.
You don't need to iterate over a collection to find an item to remove. You can just do:
alphabet.remove(key[i])
But honestly, you don't need to do any of this. Your problem is really a subtraction of two sets, and you can solve it much more easily:
('a'..'z').toSet() - "keyword".toSet()
You could simplify that whole loop to just:
alphabet.retainAll { it !in key }
or
alphabet.retainAll { !key.contains(it) }
or if you want the filtered list to be a new list rather than doing it in-place:
val filtered = alphabet.filter { it !in key }
but I used the "indices" key and I think it's pretty safe
Well, the indices collection is only evaluated once when a loop is entered, not at the start of each iteration. Even if you change the size of alphabet in the inner loop, the inner loop will still loop the same number of times, because it doesn't evaluate alphabet.indices again. It would only do that again on the next iteration of the outer loop, but your code would throw an exception before that point.
Other than decreasing j whenever you remove an item, you can also solve this with:
key.forEach(alphabet::remove)
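If the index-based loop is to be kept, one further workaround builds on the explanation above that the indices are evaluated only once: walk them from the end towards the start. A removal then only shifts elements that have already been visited, so every remaining index stays valid. This is a sketch rather than something taken from the answers above, and removeKeyChars is a hypothetical name.

```kotlin
// Hypothetical helper: removes every character of `key` from `alphabet` in place.
fun removeKeyChars(alphabet: MutableList<Char>, key: String) {
    // The reversed index range is computed once, before any removal.
    // Removing at index j only shifts elements at positions above j,
    // which the loop has already passed.
    for (j in alphabet.indices.reversed()) {
        if (alphabet[j] in key) alphabet.removeAt(j)
    }
}

fun main() {
    val alphabet = ('a'..'z').toMutableList()
    removeKeyChars(alphabet, "keyword")
    println(alphabet.joinToString("")) // Prints abcfghijlmnpqstuvxz
}
```

For everyday code the one-liners above are still simpler; the reverse loop is mainly useful for seeing why the forward loop fails.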

get each number in String and Compare in TCL/tk

I have this string output:
1 4 2 1 4
I want to get each character in the string so that I can compare them.
I want to do this to find out whether the list is already sorted.
It's not exactly clear to me what you are trying to achieve. Going by "to know whether the list is sorted", and assuming a list of integers, you can use tcl::mathop::< or tcl::mathop::<=, depending on whether you want to allow duplicate values:
if {[tcl::mathop::<= {*}$list]} {
    puts "List is sorted"
} else {
    puts "List is mixed up"
}
This will also work for ASCII comparison of strings. For more complex comparisons, like using dictionary rules or case insensitive, it's probably easiest to combine that with lsort along with the -indices option:
tcl::mathop::< {*}[lsort -indices -dictionary $list]
The -indices option returns the original index of each list element in sorted order. By checking if those indices are in incremental order, you know if the original list was already sorted.
Of course, if the point of the exercise was to avoid unnecessary sorting, then this is no use. But then again, sorting an already sorted list is fast (lsort is a built-in command that uses a merge sort), so just sorting will probably be faster than first checking for a sorted list via a scripted loop.
To get each character in the string, do split $the_string "" (yes, on the empty string). That gives you a list of all the characters in the string; you can use foreach to iterate over them. Remember, you can iterate over two (or more) lists at once:
foreach c1 [split $the_string ""] c2 $target_comparison_list {
    if {$c1 ne $c2} {
        puts "The first not equal character is “$c1” when “$c2” was expected"
        break
    }
}
Note that it's rarely useful to continue comparison after a difference is found as the most common differences are (relative to the target string) insertions and deletions; almost everything after either of those will differ.

Referencing nested arrays in awk

I'm creating a bunch of mappings that can be indexed into using 3 keys such as below:
mappings["foo"]["bar"]["blah"][1]=0
split( "10,13,19,49", mappings["foo"]["bar"]["blah"] )
I can then index into the nested array using for example
mappings[product][format][version][i]
But this is a bit long-winded when I need to refer to the same nested array several times, so in other languages I'd create a reference to the inner array:
map=mappings[product][format][version]
map[i]
However, I can't seem to get this to work in awk (gawk 4.1.3).
I can only find one link via Google, which suggests this was impossible in previous versions of awk, and that a loop setting the keys and values one by one is the only solution. Is this still the case, or does anyone have a suggestion for a better solution?
https://developer.apple.com/library/archive/documentation/OpenSource/Conceptual/ShellScripting/Howawk-ward/Howawk-ward.html
EDIT
In response to comments, here is a bit more background on what I'm trying to do. If there is a better approach, I'm all for using it!
I have a set of CSV files that I'm feeding into awk. The idea is to calculate a checksum based on specific columns after applying filtering to the rows.
The columns to checksum on, and the filtering to apply, are derived from runtime parameters sent into the script.
The runtime parameters are a triple of (product,format,version), hence my use of a 3-level nested associative array.
Another approach would be to use the triple as a single key, rather than nesting, but gawk doesn't seem to natively support this, so I'd end up concatenating the values as a string. This felt a bit less structured to me, but if I'm wrong, I'm happy to change my mind on this approach.
Anyway, it is these parameters that are used to index into the array structure to retrieve the column numbers, etc.
You can then build up a tree-like structure. For example, the below shows 2 formats for product foo on version blah, and so on:
mappings["product-foo"]["format-bar"]["version-blah"][1]=0
split( "10,13,19,49", mappings["product-foo"]["format-bar"]["version-blah"] )
mappings["product-foo"]["format-moo"]["version-blah"][1]=0
split( "55,23,14,6", mappings["product-foo"]["format-moo"]["version-blah"] )
The magic happens like this; you can see how long-winded the mappings indexing becomes without a reference:
(FNR>1 && (format!="some-format" ||
    (version=="some-version" && $1=="some-filter") ||
    (version=="some-other-version" && $8=="some-other-filter"))) {
    # Loop over each supplied field summing an absolute tally for each
    for (i=1; i <= length(mappings[product][format][version]); i++) {
        sumarr[i] += ( $mappings[product][format][version][i] < 0 ? -$mappings[product][format][version][i]:$mappings[product][format][version][i] )
    }
}
The comment from @ed-morton simplifies this as originally requested, but I'm interested if there is a simpler approach.
The right answer is from @ed-morton above (thanks!).
Ed, if you write it out as an answer I'll accept it; otherwise I'll accept this quote in a few days for good housekeeping.
Right, there is no array copy functionality in awk and there are no pointers/references, so you can't create a pointer to an array. You can of course create a function: function map(i) { return mappings[product][format][version][i] }

objective c group array

I have an array like this:
{
    toNumber = +79995840405;
    type = 9;
}
{
    toNumber = +79995840405;
    type = 65;
}
{
    toNumber = +79995840405;
    type = 9;
}
{
    toNumber = +79995840405;
    type = 65;
}
How can I group items by toNumber & type? thanks
You have provided little detail, which makes it hard for people to help you; and haven't shown what you have tried yourself and explained where you got stuck, which is the SO approach - people here will help you, not do the work for you.
The above is why you are getting close votes.
That said let's see if we can point you in the right direction, but understand this is based on guesswork about what you have and your problem.
So it sounds like you have an array (NSArray) of dictionaries (NSDictionary) and wish to produce a dictionary of arrays. A straightforward iteration can be used for that:
1. Create an empty result dictionary (NSMutableDictionary).
2. Iterate over your array looking at each element (foreach).
3. Using the type value of your element as the key value of your result dictionary:
3.1. If there is no entry in your result dictionary for the key, create a new array (NSMutableArray), add the element's toNumber value to it, and add the array to your result dictionary.
3.2. Otherwise simply add the toNumber value to the existing array at the key entry of your result dictionary.
That's it, each bullet is a line or two of code.
If you get stuck, ask a new question, providing details, showing your code, and explaining what your problem is. Someone will undoubtedly help you from there.
HTH