I'm trying to translate some simple data structures I use in C++ over to Rust, starting with an interval tree, but I don't understand how to modify my underlying data structure (here an std::collections::BTreeSet) during iteration - essentially so I can merge overlapping entries as they appear.
If I use the standard idiom for iterating over a collection, I get the following message about it being immutable "cannot borrow self.storage as mutable because it is also borrowed as immutable", and there doesn't appear to be an option to get a mutable iterator that I can see ... what am I missing?
C++ code:
inline void Insert(const Interval& interval)
{
auto it = storage.insert(interval);
// check to see if we overlap the previous element,
// if we do, start our merge loop from there
if (it != begin()) {
const_iterator prev = std::prev(it);
if (prev->Overlaps(*it)) it = prev;
}
while (it != end()) {
const_iterator nx = std::next(it);
if (nx != end() && it->Overlaps(*nx)) {
const Interval u = it->Union(*nx);
it = storage.erase(it);
it = storage.erase(it);
it = storage.insert(it, u);
} else
break;
}
}
Rust code:
/// Add a new interval into the tree
pub fn insert(&mut self, other: Interval) -> () {
self.storage.insert(other);
for int in self.storage.iter() {
if other <= *int {
break
} else if other.overlaps(int) {
self.storage.remove(&other);
self.storage.remove(int);
self.storage.insert(other.union(int).unwrap());
}
}
}
You cannot mutate a BTreeSet while you're iterating on it – that would invalidate the iterator. Unfortunately, unlike C++, Rust doesn't have insert or remove methods that return updated iterators (and if it did, they would have to be methods on the iterator itself).
BTreeSet doesn't offer a mutable iterator, because the only additional operation you could do is obtain a mutable reference to the elements in the set. However, doing this could potentially screw up the set's ordering, so it's not available.
The most straightforward solution is to build a list of operations to perform during the iteration, then perform them once the iteration is complete. However, for this algorithm, this won't quite work, since you might need to merge an interval that is the result of a previous merge. So, once you've found a pair of intervals to merge, you need to keep track of the relevant values, break out of the iteration, perform the merge, then restart the iteration. BTreeSet provides a range method that lets you iterate over a subset of the set's values, so you might want to use that instead of iter, which always iterates over all the values. However, range is unstable as of Rust 1.8, so you'll need a nightly compiler to be able to use it.
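The "find a pair, break out, merge, restart" approach described above can be sketched as follows. This is a minimal illustration, not the asker's actual code: the `Interval` type, its `overlaps` and `union` methods, and the `insert_merged` function are all assumptions made for the example, and it scans with `iter` rather than `range` to keep things simple.

```rust
use std::collections::BTreeSet;

/// Hypothetical closed interval type, ordered by `start` and then `end`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Interval {
    start: i64,
    end: i64,
}

impl Interval {
    fn overlaps(&self, other: &Interval) -> bool {
        self.start <= other.end && other.start <= self.end
    }

    fn union(&self, other: &Interval) -> Interval {
        Interval {
            start: self.start.min(other.start),
            end: self.end.max(other.end),
        }
    }
}

/// Insert `new`, merging it with every stored interval it overlaps.
fn insert_merged(storage: &mut BTreeSet<Interval>, new: Interval) {
    let mut merged = new;
    loop {
        // The immutable borrow taken by `iter()` ends when `find` returns,
        // because `copied()` hands back an owned `Interval`.
        let hit = storage.iter().copied().find(|iv| iv.overlaps(&merged));
        match hit {
            Some(iv) => {
                // Mutate outside the iteration, then restart the search,
                // since the grown interval may now overlap other entries.
                storage.remove(&iv);
                merged = merged.union(&iv);
            }
            None => break,
        }
    }
    storage.insert(merged);
}

fn main() {
    let mut storage = BTreeSet::new();
    for (start, end) in [(1, 3), (8, 10), (2, 6), (15, 18), (5, 9)] {
        insert_merged(&mut storage, Interval { start, end });
    }
    println!("{:?}", storage);
}
```

With the sample input, the set ends up holding two intervals, 1 to 10 and 15 to 18. Each restart rescans from the beginning, so a range-based scan starting near the new interval would be the natural refinement when the set is large.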
In short: I want to delete the last element of a doubly linked list, and I wrote this function:
fun deleteLast(){
if(isEmpty()){println("list is empty")}
if (head!=null && head?.nextNode == null){
head = null
}else {
tail = tail?.previousNode
tail?.nextNode = null
}
}
I went straight to the tail without traversing the list, made its previous node the new tail, and set the new tail's next reference (which pointed to the old tail) to null. The result is what I wanted.
After writing this I searched Google to check whether it is right, and found this:
void pop_back() {
if(this.head != null) {
//1. if head is not null and next of head
// is null, release the head
if(this.head.next == null) {
this.head = null;
} else {
//2. Else, traverse to the second last
// element of the list
Node temp = new Node();
temp = this.head;
while(temp.next.next != null)
temp = temp.next;
//3. Change the next of the second
// last node to null and delete the
// last node
Node lastNode = temp.next;
temp.next = null;
lastNode = null;
}
}
}
Is my code right? I'd appreciate it if someone more experienced could review it. Thanks in advance!
The question only contains a snippet, not a minimal, reproducible example, so for the sake of argument (based on existing code and the comment) I'm going to assume that the full code looks something like this:
class Node<T>(var previousNode: Node<T>?, var nextNode: Node<T>?, var value: T)
class DoublyLinkedList<T>(var head: Node<T>?, var tail: Node<T>?) {
fun isEmpty() = head == null
// …other methods…
fun deleteLast(){
if(isEmpty()){println("list is empty")}
if (head!=null && head?.nextNode == null){
head = null
}else {
tail = tail?.previousNode
tail?.nextNode = null
}
}
}
Given this, it looks like your code is about right. There are a few points that could be addressed*, e.g.:
Inside the second if I think it needs to set tail to null as well, for safety.
Inside the second and third branches, you might want to set to null the previousNode and nextNode references in the node you're removing, for the benefit of any other code that still has a reference to it. (That might also give an extra hint to the garbage collector, though that shouldn't be needed.)
Instead of calling isEmpty(), it'd be simpler and more consistent to check whether head is null.
I'd add an else before the second if. As it is, the code works for an empty list, but that may be just luck; it'd be much more robust if neither of the other branches could be called on an empty list. (In some cases, this could let the compiler smart-cast head to non-nullable; but that doesn't happen here because it's mutable.)
In production code, you wouldn't print to stdout like that. If it's allowed to call deleteLast() on an empty list, then there's no need to print anything; if not (which would be more usual), you'd throw an exception. (And the method would have a doc comment explaining that.)
The formatting could be improved.
(* In my experience, any code can be improved — even code you've worked on many times before!)
However, you can't compare your code directly to the Google-sourced code, because it looks like that is for a singly-linked list.
A singly-linked list is a much simpler structure. All it has is a reference to the head node; that has a reference to the next node (and, presumably to the data stored in the node); and so on. So the only way to reach the end of the list is to traverse all the nodes, as the Google-sourced code does. (This also means there's often no point in having a separate class to represent the list as a whole; a reference to the first node is all you need. Many operations can be written concisely with recursive methods. You can also make it immutable, which has many advantages; that's the primary data structure in some languages.)
Your list, though, has a direct reference to the last node, which of course avoids the need to traverse the entire list to find it. As your code demonstrates, this means that some operations are much more efficient in doubly-linked lists. There are corresponding disadvantages, though: each node takes more memory; there are more references to update when changing the list; it's easier for the list to get into an inconsistent state; and it's less suitable for an immutable list.
Let's say I want to connect to two package repositories, make a query for a package name, combine the result from the repos and process it (filter, unique, prioritize,...), What is a good way to do that?
What I thought about is creating an Array of two Cro::HTTP::Client objects (each with a base-uri specific to its repo), and when I need to make an HTTP request I call @a>>.get, then process the results from the repos together.
I have attached a snippet of what I'm trying to do, but I would like to know whether there is a better way to do it, or whether the approach mentioned in the following link is suitable for this use case: https://perl6advent.wordpress.com/2013/12/08/day-08-array-based-objects/
use Cro::HTTP::Client;
class Repo {
has $.name;
has Cro::HTTP::Client $!client;
has Cro::Uri $.uri;
has Bool $.disable = False;
submethod TWEAK () {
$!client = Cro::HTTP::Client.new(base-uri => $!uri, :json);
}
method get (:$package) {
my $path = <x86_64?>;
my $resp = await $!client.get($path ~ $package);
my $json = await $resp.body;
return $json;
}
}
class AllRepos {
has Repo @.repo;
method get (:$package) {
# check if some repos are disabled
my @candidate = @!repo>>.get(:$package).unique(:with(&[eqv])).flat;
# do further processing of the data, then return it
return @candidate;
}
}
my $repo1 = Repo.new: name => 'repo1', uri => Cro::Uri.new(:uri<http://localhost:80>);
my $repo2 = Repo.new: name => 'repo2', uri => Cro::Uri.new(:uri<http://localhost:77>);
my @repo = $repo1, $repo2;
my $repos = AllRepos.new: :@repo;
#my @packages = $repos.get: package => 'rakudo';
Let's say I want to connect to two package repositories, make a query for a package name, combine the result from the repos and process it (filter, unique, prioritize,...), What is a good way to do that?
The code you showed looks like one good way in principle but not, currently, in practice.
The hyperoperators such as >>:
Distribute an operation (in your case, connect and make a query) ...
... to the leaves of one or two input composite data structures (in your case the elements of one array @!repo) ...
... with logically parallel semantics (by using a hyperoperator you are declaring that you are taking responsibility for thinking that the parallel invocations of the operation will not interfere with each other, which sounds reasonable for connecting and querying) ...
... and then return a resulting composite data structure with the same shape as the original structure if the hyperoperator is a unary operator (which applies in your case, because you applied >>, which is a unary operator that takes a single argument on its left, so the result of the >>.get is just a new array, just like the input @!repo) or whose shape is the hyper'd combination of the shapes of the pair of structures if the hyperoperator is a binary operator, such as >>op<< ...
... which can then be further processed (in your case it is, with .unique, which will produce a resulting Seq) ...
... whose elements you then assign back into another array (@candidate).
So your choice is a decent fit in principle, but the commitment to parallelism is only semantic and right now the Rakudo compiler never takes advantage of it, so it will actually run your code sequentially, which presumably isn't a good fit in practice.
Instead I suggest you consider:
Using map to distribute an operation over multiple elements (in a shallow manner; map doesn't recursively descend into a deep structure like the hyperoperators, deepmap etc., but that's OK for your use case) ...
... in combination with the hyper or race method, which parallelizes the method call that follows it (hyper keeps the results in their original order; race does not).
So you might write:
my @candidate =
@!repo.hyper.map(*.get: :$package).unique(:with(&[eqv])).flat;
Alternatively, check out task 94 in Using Perl 6.
if the approach mentioned in the following link is suitable for this use case! https://perl6advent.wordpress.com/2013/12/08/day-08-array-based-objects/
I don't think so. That's about constructing a general purpose container that's like an array but with some differences to the built in Array that are worth baking into a new type.
I can just about imagine such things that are vaguely related to your use case -- eg an array type that automatically hyper distributes method calls invoked on it, if they're defined on Any or Mu (rather than Array or List), i.e. does what I described above but with the code @!repo.get... instead of hyper @!repo.map: *.get .... But would it be worth it (assuming it would work -- I haven't thought about it beyond inventing the idea for this answer)? I doubt it.
More generally...
It seems like what you are looking for is cookbook like material. Perhaps a question posted at the reddit sub /r/perl6 is in order?
Using Kotlin I can repeat an action in at least two ways:
val times = 5
// First option
for (i in 0 until times) {
print("Action $i")
}
// Second option
repeat(times) {
print("Action $it")
}
I'd like to know the purpose of repeat.
Should the traditional for loop be replaced with repeat function if possible?
Or are there special cases for this function?
Are there any advantages in repeat function?
EDIT
I've done some research on this question. Since Kotlin is an open source project, I could download the sources and check the git history.
I found that
1) The repeat function is a replacement for the times extension function.
public inline fun Int.times(body : () -> Unit)
2) KT-7074: the times function has been deprecated. But why?
It's just a matter of convenience (it shortens the code). There are even more ways, for example using an IntRange and forEach:
(0..4).forEach {
println(it)
}
0 1 2 3 4
They all serve the same purpose, so the choice is yours.
You don't need to worry about performance either, since repeat and forEach are inline functions, which means the lambda code is copied over to the call site at compile time.
Next lines are all just my opinion:
There are no special cases when you should or shouldn't use the repeat function.
It has more concise syntax.
In places where you don't need to manipulate the loop counter or need to repeat only some simple action I would use that function.
It's all up to you to decide when and how to use it.
From Standard.kt:
/**
* Executes the given function [action] specified number of [times].
*
* A zero-based index of current iteration is passed as a parameter to [action].
*
* @sample samples.misc.ControlFlow.repeat
*/
@kotlin.internal.InlineOnly
public inline fun repeat(times: Int, action: (Int) -> Unit) {
contract { callsInPlace(action) }
for (index in 0 until times) {
action(index)
}
}
As you can see repeat(times) is actually for (index in 0 until times).
There is also a zero-based loop counter and it is: it.
Should the traditional for loop be replaced with repeat function if possible?
I can't find any reason for that.
Or are there special cases for this function?
None I can think of.
Are there any advantages in repeat function?
None I can think of, or maybe just one:
for educational purposes, I suppose it's easier to teach that repeat(n) { } performs n iterations of the block of statements inside the curly brackets.
It is quite clear to me that iterating over a vector shouldn't let the loop body mutate the vector arbitrarily. This prevents iterator invalidation, which is prone to bugs.
However, not all kinds of mutation lead to iterator invalidation. See the following example:
let mut my_vec: Vec<Vec<i32>> = vec![vec![1,2], vec![3,4], vec![5,6]];
for inner in my_vec.iter_mut() { // <- or .iter()
// ...
my_vec[some_index].push(inner[0]); // <-- ERROR
}
Such a mutation does not invalidate the iterator of my_vec, however it is disallowed. It could invalidate any references to the specific elements in my_vec[some_index] but we do not use any such references anyway.
I know that these questions are common, and I'm not asking for an explanation. I am looking for a way to refactor this so that I can get rid of this loop. In my actual code I have a huge loop body and I can't modularize it unless I express this bit nicely.
What I have thought of so far:
Wrapping the vector with Rc<RefCell<...>>. I think this would still fail at runtime, since the RefCell would be borrowed by the iterator and then will fail when the loop body tries to borrow it.
Using a temporary vector to accumulate the future pushes, and push them after the loop ends. This is okay, but needs more allocations than pushing them on the fly.
Unsafe code, and messing with pointers.
Anything listed in the Iterator documentation does not help. I checked out itertools and it looks like it wouldn't help either.
Using a while loop and indexing instead of using an iterator making use of a reference to the outer vector. This is okay, but does not let me use iterators and adapters. I just want to get rid of this outer loop and use my_vec.foreach(...).
Are there any idioms or libraries which would let me do this nicely? Unsafe functions would be okay as long as they don't expose pointers to me.
You can wrap each of the inner vectors in a RefCell.
use std::cell::RefCell;
fn main() {
let my_vec : Vec<RefCell<Vec<i32>>> = vec![
RefCell::new(vec![1,2]),
RefCell::new(vec![3,4]),
RefCell::new(vec![5,6])];
for inner in my_vec.iter() {
// ...
let value = inner.borrow()[0];
my_vec[some_index].borrow_mut().push(value);
}
}
Note that the value binding here is important if you need to be able to push to the vector that inner refers to. value happens to be a type that doesn't contain references (it's i32), so it doesn't keep the first borrow active (it ends by the end of the statement). Then, the next statement may borrow the same vector or another vector mutably and it'll work.
If we wrote my_vec[some_index].borrow_mut().push(inner.borrow()[0]); instead, then both borrows would be active until the end of the statement. If both my_vec[some_index] and inner refer to the same RefCell<Vec<i32>>, this will panic with RefCell<T> already mutably borrowed.
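To make the difference concrete, here is a small runnable sketch of the pattern. The `push_firsts` function and its `target` parameter are hypothetical stand-ins for the loop body and `some_index` from the question. The tail of `main` uses `try_borrow_mut`, which reports a conflicting borrow as an `Err` instead of panicking, to show when a borrow is still active.

```rust
use std::cell::RefCell;

/// Push the first element of every inner vector onto the vector at `target`.
/// `target` is a hypothetical stand-in for the question's `some_index`.
fn push_firsts(my_vec: &[RefCell<Vec<i32>>], target: usize) {
    for inner in my_vec.iter() {
        // The shared borrow ends with this statement: `value` is a plain i32.
        let value = inner.borrow()[0];
        // Fine even when `inner` is `my_vec[target]`: no borrow is still active.
        my_vec[target].borrow_mut().push(value);
    }
}

fn main() {
    let my_vec = vec![
        RefCell::new(vec![1, 2]),
        RefCell::new(vec![3, 4]),
        RefCell::new(vec![5, 6]),
    ];
    push_firsts(&my_vec, 1);
    println!("{:?}", my_vec);

    // While a shared borrow is alive, a mutable borrow is refused.
    let guard = my_vec[0].borrow();
    assert!(my_vec[0].try_borrow_mut().is_err());
    drop(guard);
    // Once the guard is dropped, the mutable borrow succeeds.
    assert!(my_vec[0].try_borrow_mut().is_ok());
}
```

Note that the outer `Vec` is only ever borrowed immutably here, which is why iterator adaptors remain usable with this approach.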
Without changing the type of my_vec, you could simply use access by indexing and split_at_mut:
for index in 0..my_vec.len() {
let (first, second) = my_vec.split_at_mut(index);
first[some_index].push(second[0][0]); // second[0] is the current inner vector; assumes some_index < index
}
Note: beware, the indices in second are off by index.
This is safe, relatively easy, and very flexible. It does not, however, work with iterator adaptors.
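A fuller sketch of the indexing approach has to decide which half of the split the destination falls into. The version below is one way to do that, under the assumption that the loop body pushes the first element of the current inner vector; `push_firsts` and `target` are hypothetical names standing in for the loop and `some_index`.

```rust
/// Push the first element of every inner vector onto the vector at `target`
/// (a hypothetical stand-in for the question's `some_index`).
fn push_firsts(my_vec: &mut [Vec<i32>], target: usize) {
    for index in 0..my_vec.len() {
        if index == target {
            // Source and destination are the same vector: copy first, then push.
            let value = my_vec[index][0];
            my_vec[index].push(value);
        } else if target < index {
            // `target` lies in `first`; the current element is `second[0]`.
            let (first, second) = my_vec.split_at_mut(index);
            first[target].push(second[0][0]);
        } else {
            // `target` lies in `second`, where indices are shifted by `index + 1`.
            let (first, second) = my_vec.split_at_mut(index + 1);
            second[target - index - 1].push(first[index][0]);
        }
    }
}

fn main() {
    let mut my_vec = vec![vec![1, 2], vec![3, 4], vec![5, 6]];
    push_firsts(&mut my_vec, 1);
    println!("{:?}", my_vec);
}
```

The three branches are the price of keeping the element type unchanged: the borrow checker only accepts two simultaneous mutable views when they come from disjoint halves of the slice.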
Consider following code:
enum size = 16;
double[size] arr1 = [...];
double[size] arr2 = [...];
process = (double x) { return (x + 1); };
arr2[] = map!(process)(arr1[]); // here
I have trouble converting results of map back to my plain array. Problem applies not only to map, but also to take, repeat and all those fine tools from std.algorithm and std.range that operate on ranges.
On this assignment, I get Error: cannot implicitly convert expression (map(arr1[])) of type Result to double[]. How can I evaluate range to array without using
uint i = 0;
foreach (x; map!(process)(arr1[])) {
arr2[i] = x;
i++;
}
?
Additionally, can someone please explain, why I must call map!(process)(arr1[]) instead of map!(process)(arr1) with static arrays? Shouldn't static arrays be compatible with dynamic for means of iteration, or I don't get something?
Also, it seems that straightforward enumeration syntax foreach (index, item; sequence) does not work for ranges - are there workarounds? I guess the reason is the same as why ranges cannot be assigned to array slices.
Functions such as map and filter return ranges, not arrays, so simply assigning to an array isn't going to work any more than assigning a string to wstring is going to work. They're different types. And for many range-based functions (including map and filter), the ranges they return are actually lazy in order to avoid unnecessary computation, which makes them that much less compatible with an array. The solution is to use std.array.array, which takes a range and creates a dynamic array from it. So, you could do
auto arr = array(map!process(origArray));
However, I would advise not converting a range into an array before you actually need to, since it can result in unnecessary computations, and it means allocating a new array. If you actually need an array, then by all means, use std.array.array to convert the range, but operating on the range can often be more efficient if you don't need an actual array. However, if you want to convert the result to a static array as opposed to a dynamic one, you're probably better off just assigning each element in a loop (and maybe skipping map altogether), since using std.array.array will then have allocated a dynamic array that you won't be using once you've assigned to the static array. It's a waste of memory.
Also, be aware that using static arrays with range-based functions can be risky in that they must slice the static array to get a dynamic array for the range-based functions to process, and if that dynamic array escapes the scope that the static array was declared in, then you're leaking references to data which no longer exists. For example,
auto func()
{
int[5] arr;
return map!process(arr[]);
}
would be very bad. However, as long as you're done using the slice and nothing refers to it anymore (including any ranges that might have been created) before you exit the scope with the static array in it, you should be fine. It is something to be careful of though.
As for the question about having to slice static arrays, you really should ask that as a separate question, but two existing questions that relate to it are this one and this one. What it pretty much comes down to is that IFTI (Implicit Function Template Instantiation) instantiates using the exact type that it's given, and a static array is neither a dynamic array nor a range, so any templated function which specifically requires a dynamic array or a range will fail to compile with a static array. The compiler will implicitly slice static arrays to convert them to dynamic arrays for functions which explicitly take dynamic arrays, but those sort of implicit conversions don't happen with template instantiation, so you must explicitly slice static arrays to pass them to range-based functions.
As for the question about using foreach with indices and ranges, again, you shouldn't be asking multiple questions in the same question. Please post separate questions for each question that you have. What it comes down to though is that
foreach(elem; range)
{
//stuff
}
gets lowered to something close to
for(; !range.empty; range.popFront())
{
auto elem = range.front;
//stuff
}
And that doesn't involve indices at all. It could be changed to create an index variable for you, but it doesn't always make sense for ranges to have their index iterating by one like that on every iteration (much as it usually would be fine), and so that hasn't been done. It's simple enough to add your own counter though.
{
size_t i;
foreach(elem; range)
{
//stuff
++i;
}
}
opApply does support using indices with foreach, but it isn't a range, and doesn't work with range-based functions.