What is the Rust equivalent of Rx's `startWith`? - iterator

Is there any predefined function to start iteration with a custom element using Rust's iterators?

iteration with a custom element
If you have a single element, use iter::once.
If you have multiple elements, use iter::repeat coupled with Iterator::take.
to start iteration with
Use Iterator::chain.
Put together:
use std::iter;
fn main() {
let some_iterator = 1..10;
let start_with = iter::repeat(42).take(5);
let together = start_with.chain(some_iterator);
for i in together {
println!("{}", i);
}
}

Related

Kotlin Map and Compute Functions

I am trying to re-compute values for all entries in a mutable map without changing the keys.
Here is the code (with comments showing what I am expecting)
fun main(args: Array<String>) {
var exampleMap: MutableMap<Int, String> = mutableMapOf(1 to "One", 2 to "Two")
// Before it's {1-"One", 2-"Two"}
with(exampleMap) {
compute(k){
_,v -> k.toString + "__" + v
}
}
// Expecting {1->"1__One", 2->"2__Two"}
}
I don't want to use mapValues because it creates a copy of the original map. I think compute is what I need, but I am struggling to write this function. Or, rather I don't know how to write this. I know the equivalent in Java (as I am a Java person) and it would be:
map.entrySet().stream().forEach(e-> map.compute(e.getKey(),(k,v)->(k+"__"+v)));
How can I achieve that in Kotlin with compute ?
Regards,
The best way to do this is by iterating over the map's entries. This ensures you avoid any potential concurrent modification issues that might arise due to modifying the map while iterating over it.
map.entries.forEach { entry ->
entry.setValue("${entry.key}__${entry.value}")
}
Use onEach function:
var exampleMap: MutableMap<Int, String> = mutableMapOf(1 to "One", 2 to "Two")
println(exampleMap)
exampleMap.onEach {
exampleMap[it.key] = "${it.key}__${it.value}"
}
println(exampleMap)
Output:
{1=One, 2=Two}
{1=1__One, 2=2__Two}
onEach performs the given action on each element and returns the list/array/map itself afterwards.

How to remove element from iterator if condition depends on property of the object the iterator is based on?

Let me elaborate:
I need to be able to iterate over a list of objects. Each of the objects has a property which is a list, and I have to check if that list contains any elements that are not in another list.
When I tried to do it by using nested for loops, it kept giving me concurrent modification exceptions, so I tried to use an iterator, but now I'm stuck, since if I make an iterator based on the list of objects, I can't access the individual object's properties to then iterate over.
Here's some example code of what I was trying to accomplish:
for (preference in preferencesWithRestaurant) {
for (restaurantID in preference.restaurantIDs) {
// One method I tried using
preferencesWithRestaurant.removeIf{ !listOfIds.contains(restaurantID) }
/* alternate method I tried using
if (!listOfIds.contains(restaurantID)) {
preferencesWithRestaurant.remove(preference)
}
*/
}
}
If you can replace the value of preferencesWithRestaurant or store the result in another variable then you can filter it:
preferencesWithRestaurant = preferencesWithRestaurant.filter { preference ->
preference.restaurantIDs.all { it in listOfIds }
}
Depending on the exact type of preferencesWithRestaurant you may need to convert it to the proper type, e.g. invoke toMutableList() at the end.
If you prefer to modify preferencesWithRestaurant in-place, then you can use retainAll() (thanks #Tenfour04):
preferencesWithRestaurant.retainAll { preference ->
preference.restaurantIDs.all { it in listOfIds }
}
Alternatively, you can keep your original approach, but use a mutable iterator to remove an item while iterating:
val iter = preferencesWithRestaurant.listIterator()
for (preference in iter) {
for (restaurantID in preference.restaurantIDs) {
if (!listOfIds.contains(restaurantID)) {
iter.remove()
break
}
}
}

What is the Rust equivalent of a C local static variable? [duplicate]

What is the best way to create and use a struct with only one instantiation in the system? Yes, this is necessary, it is the OpenGL subsystem, and making multiple copies of this and passing it around everywhere would add confusion, rather than relieve it.
The singleton needs to be as efficient as possible. It doesn't seem possible to store an arbitrary object on the static area, as it contains a Vec with a destructor. The second option is to store an (unsafe) pointer on the static area, pointing to a heap allocated singleton. What is the most convenient and safest way to do this, while keeping syntax terse?
Non-answer answer
Avoid global state in general. Instead, construct the object somewhere early (perhaps in main), then pass mutable references to that object into the places that need it. This will usually make your code easier to reason about and doesn't require as much bending over backwards.
Look hard at yourself in the mirror before deciding that you want global mutable variables. There are rare cases where it's useful, so that's why it's worth knowing how to do.
Still want to make one...?
Tips
In the 3 following solutions:
If you remove the Mutex then you have a global singleton without any mutability.
You can also use a RwLock instead of a Mutex to allow multiple concurrent readers.
Using lazy-static
The lazy-static crate can take away some of the drudgery of manually creating a singleton. Here is a global mutable vector:
use lazy_static::lazy_static; // 1.4.0
use std::sync::Mutex;
lazy_static! {
static ref ARRAY: Mutex<Vec<u8>> = Mutex::new(vec![]);
}
fn do_a_call() {
ARRAY.lock().unwrap().push(1);
}
fn main() {
do_a_call();
do_a_call();
do_a_call();
println!("called {}", ARRAY.lock().unwrap().len());
}
Using once_cell
The once_cell crate can take away some of the drudgery of manually creating a singleton. Here is a global mutable vector:
use once_cell::sync::Lazy; // 1.3.1
use std::sync::Mutex;
static ARRAY: Lazy<Mutex<Vec<u8>>> = Lazy::new(|| Mutex::new(vec![]));
fn do_a_call() {
ARRAY.lock().unwrap().push(1);
}
fn main() {
do_a_call();
do_a_call();
do_a_call();
println!("called {}", ARRAY.lock().unwrap().len());
}
Using std::sync::LazyLock
The standard library is in the process of adding once_cell's functionality, currently called LazyLock:
#![feature(once_cell)] // 1.67.0-nightly
use std::sync::{LazyLock, Mutex};
static ARRAY: LazyLock<Mutex<Vec<u8>>> = LazyLock::new(|| Mutex::new(vec![]));
fn do_a_call() {
ARRAY.lock().unwrap().push(1);
}
fn main() {
do_a_call();
do_a_call();
do_a_call();
println!("called {}", ARRAY.lock().unwrap().len());
}
A special case: atomics
If you only need to track an integer value, you can directly use an atomic:
use std::sync::atomic::{AtomicUsize, Ordering};
static CALL_COUNT: AtomicUsize = AtomicUsize::new(0);
fn do_a_call() {
CALL_COUNT.fetch_add(1, Ordering::SeqCst);
}
fn main() {
do_a_call();
do_a_call();
do_a_call();
println!("called {}", CALL_COUNT.load(Ordering::SeqCst));
}
Manual, dependency-free implementation
There are several existing implementation of statics, such as the Rust 1.0 implementation of stdin. This is the same idea adapted to modern Rust, such as the use of MaybeUninit to avoid allocations and unnecessary indirection. You should also look at the modern implementation of io::Lazy. I've commented inline with what each line does.
use std::sync::{Mutex, Once};
use std::time::Duration;
use std::{mem::MaybeUninit, thread};
struct SingletonReader {
// Since we will be used in many threads, we need to protect
// concurrent access
inner: Mutex<u8>,
}
fn singleton() -> &'static SingletonReader {
// Create an uninitialized static
static mut SINGLETON: MaybeUninit<SingletonReader> = MaybeUninit::uninit();
static ONCE: Once = Once::new();
unsafe {
ONCE.call_once(|| {
// Make it
let singleton = SingletonReader {
inner: Mutex::new(0),
};
// Store it to the static var, i.e. initialize it
SINGLETON.write(singleton);
});
// Now we give out a shared reference to the data, which is safe to use
// concurrently.
SINGLETON.assume_init_ref()
}
}
fn main() {
// Let's use the singleton in a few threads
let threads: Vec<_> = (0..10)
.map(|i| {
thread::spawn(move || {
thread::sleep(Duration::from_millis(i * 10));
let s = singleton();
let mut data = s.inner.lock().unwrap();
*data = i as u8;
})
})
.collect();
// And let's check the singleton every so often
for _ in 0u8..20 {
thread::sleep(Duration::from_millis(5));
let s = singleton();
let data = s.inner.lock().unwrap();
println!("It is: {}", *data);
}
for thread in threads.into_iter() {
thread.join().unwrap();
}
}
This prints out:
It is: 0
It is: 1
It is: 1
It is: 2
It is: 2
It is: 3
It is: 3
It is: 4
It is: 4
It is: 5
It is: 5
It is: 6
It is: 6
It is: 7
It is: 7
It is: 8
It is: 8
It is: 9
It is: 9
It is: 9
This code compiles with Rust 1.55.0.
All of this work is what lazy-static or once_cell do for you.
The meaning of "global"
Please note that you can still use normal Rust scoping and module-level privacy to control access to a static or lazy_static variable. This means that you can declare it in a module or even inside of a function and it won't be accessible outside of that module / function. This is good for controlling access:
use lazy_static::lazy_static; // 1.2.0
fn only_here() {
lazy_static! {
static ref NAME: String = String::from("hello, world!");
}
println!("{}", &*NAME);
}
fn not_here() {
println!("{}", &*NAME);
}
error[E0425]: cannot find value `NAME` in this scope
--> src/lib.rs:12:22
|
12 | println!("{}", &*NAME);
| ^^^^ not found in this scope
However, the variable is still global in that there's one instance of it that exists across the entire program.
Starting with Rust 1.63, it can be easier to work with global mutable singletons, although it's still preferable to avoid global variables in most cases.
Now that Mutex::new is const, you can use global static Mutex locks without needing lazy initialization:
use std::sync::Mutex;
static GLOBAL_DATA: Mutex<Vec<i32>> = Mutex::new(Vec::new());
fn main() {
GLOBAL_DATA.lock().unwrap().push(42);
println!("{:?}", GLOBAL_DATA.lock().unwrap());
}
Note that this also depends on the fact that Vec::new is const. If you need to use non-const functions to set up your singleton, you could wrap your data in an Option, and initially set it to None. This lets you use data structures like Hashset which currently cannot be used in a const context:
use std::sync::Mutex;
use std::collections::HashSet;
static GLOBAL_DATA: Mutex<Option<HashSet<i32>>> = Mutex::new(None);
fn main() {
*GLOBAL_DATA.lock().unwrap() = Some(HashSet::from([42]));
println!("V2: {:?}", GLOBAL_DATA.lock().unwrap());
}
Alternatively, you could use an RwLock, instead of a Mutex, since RwLock::new is also const as of Rust 1.63. This would make it possible to read the data from multiple threads simultaneously.
If you need to initialize with non-const functions and you'd prefer not to use an Option, you could use a crate like once_cell or lazy-static for lazy initialization as explained in Shepmaster's answer.
From What Not To Do In Rust
To recap: instead of using interior mutability where an object changes
its internal state, consider using a pattern where you promote new
state to be current and current consumers of the old state will
continue to hold on to it by putting an Arc into an RwLock.
use std::sync::{Arc, RwLock};
#[derive(Default)]
struct Config {
pub debug_mode: bool,
}
impl Config {
pub fn current() -> Arc<Config> {
CURRENT_CONFIG.with(|c| c.read().unwrap().clone())
}
pub fn make_current(self) {
CURRENT_CONFIG.with(|c| *c.write().unwrap() = Arc::new(self))
}
}
thread_local! {
static CURRENT_CONFIG: RwLock<Arc<Config>> = RwLock::new(Default::default());
}
fn main() {
Config { debug_mode: true }.make_current();
if Config::current().debug_mode {
// do something
}
}
Use SpinLock for global access.
#[derive(Default)]
struct ThreadRegistry {
pub enabled_for_new_threads: bool,
threads: Option<HashMap<u32, *const Tls>>,
}
impl ThreadRegistry {
fn threads(&mut self) -> &mut HashMap<u32, *const Tls> {
self.threads.get_or_insert_with(HashMap::new)
}
}
static THREAD_REGISTRY: SpinLock<ThreadRegistry> = SpinLock::new(Default::default());
fn func_1() {
let thread_registry = THREAD_REGISTRY.lock(); // Immutable access
if thread_registry.enabled_for_new_threads {
}
}
fn func_2() {
let mut thread_registry = THREAD_REGISTRY.lock(); // Mutable access
thread_registry.threads().insert(
// ...
);
}
If you want mutable state(NOT Singleton), see What Not to Do in Rust for more descriptions.
Hope it's helpful.
If you are on nightly, you can use LazyLock.
It more or less does what the crates once_cell and lazy_sync do. Those two crates are very common, so there's a good chance they might already by in your Cargo.lock dependency tree. But if you prefer to be a bit more "adventurous" and go with LazyLock, be prepered that it (as everything in nightly) might be a subject to change before it gets to stable.
(Note: Up until recently std::sync::LazyLock used to be named std::lazy::SyncLazy but was recently renamed.)
A bit late to the party, but here's how I worked around this issue (rust 1.66-nightly):
#![feature(const_size_of_val)]
#![feature(const_ptr_write)]
static mut GLOBAL_LAZY_MUT: StructThatIsNotSyncNorSend = unsafe {
// Copied from MaybeUninit::zeroed() with minor modifications, see below
let mut u = MaybeUninit::uninit();
let bytes = mem::size_of_val(&u);
write_bytes(u.as_ptr() as *const u8 as *mut u8, 0xA5, bytes); //Trick the compiler check that verifies pointers and references are not null.
u.assume_init()
};
(...)
fn main() {
unsafe {
let mut v = StructThatIsNotSyncNorSend::new();
mem::swap(&mut GLOBAL_LAZY_MUT, &mut v);
mem::forget(v);
}
}
Beware that this code is unbelievably unsafe, and can easily end up being UB if not handled correctly.
You now have a !Send !Sync value as a global static, without the protection of a Mutex. If you access it from multiple threads, even if just for reading, it's UB. If you don't initialize it the way shown, it's UB, because it calls Drop on an actually unitialized value.
You just convinced the rust compiler that something that is UB is not UB. You just convinced that putting a !Sync and !Send in a global static is fine.
If unsure, don't use this snippet.
My limited solution is to define a struct instead of a global mutable one. To use that struct, external code needs to call init() but we disallow calling init() more than once by using an AtomicBoolean (for multithreading usage).
static INITIATED: AtomicBool = AtomicBool::new(false);
struct Singleton {
...
}
impl Singleton {
pub fn init() -> Self {
if INITIATED.load(Ordering::Relaxed) {
panic!("Cannot initiate more than once")
} else {
INITIATED.store(true, Ordering::Relaxed);
Singleton {
...
}
}
}
}
fn main() {
let singleton = Singleton::init();
// panic here
// let another_one = Singleton::init();
...
}

Iterating over a slice's values instead of references in Rust?

When looping over a slice of structs, the value I get is a reference (which is fine), however in some cases it's annoying to have to write var as (*var) in many places.
Is there a better way to avoid re-declaring the variable?
fn my_fn(slice: &[MyStruct]) {
for var in slice {
let var = *var; // <-- how to avoid this?
// Without the line above, errors in comments occur:
other_fn(var); // <-- expected struct `MyStruct`, found reference
if var != var.other {
// ^^ trait `&MyStruct: std::cmp::PartialEq<MyStruct>>` not satisfied
foo();
}
}
}
See: actual error output (more cryptic).
You can remove the reference by destructuring in the pattern:
// |
// v
for &var in slice {
other_fn(var);
}
However, this only works for Copy-types! If you have a type that doesn't implement Copy but does implement Clone, you could use the cloned() iterator adapter; see Chris Emerson's answer for more information.
In some cases you can iterate directly on values if you can consume the iterable, e.g. using Vec::into_iter().
With slices, you can use cloned or copied on the iterator:
fn main() {
let v = vec![1, 2, 3];
let slice = &v[..];
for u in slice.iter().cloned() {
let u: usize = u; // prove it's really usize, not &usize
println!("{}", u);
}
}
This relies on the item implementing Clone or Copy, but if it doesn't you probably do want references after all.

Does Rust optimize for loops over calculated ranges?

As an exercise I'm trying to micro-optimize code in Rust 1.3.0. I have a loop of a loop over an array. Something like this:
loop {
for i in 0..arr.len() {
// something happens here
}
}
Since arrays are fixed size in Rust, will the compiler optimize the code by evaluating arr.len() just once and reusing the value, or will the expression be evaluated with each pass of the top-level loop? The question can be expanded to more calculation-heavy functions without side-effects, other than arr.len().
In other words, would the above code be equivalent to this:
let arr_len = arr.len();
loop {
for i in 0..arr_len {
// something happens here
}
}
The .. is a range operator, which forms a Range<Idx> object (or a derivative: RangeFrom, RangeFull or RangeTo). Those objects only contain indexes (the Idx type), so you can rest assured that .len() is only evaluated once.
In general, it is a good idea to inspect the LLVM IR. If you have a synthetic example, you can use the playground easily enough. For example:
// A black-box prevents optimization, and its calls are easy to spot.
extern {
fn doit(i: i32) -> ();
}
fn main() {
let arr = [1, 2, 3, 4, 5];
for i in 0..arr.len() {
unsafe { doit(arr[i]); }
}
}
Yields the following function:
; Function Attrs: uwtable
define internal void #_ZN4main20hd87dea49c835fe43laaE() unnamed_addr #1 {
entry-block:
tail call void #doit(i32 1)
tail call void #doit(i32 2)
tail call void #doit(i32 3)
tail call void #doit(i32 4)
tail call void #doit(i32 5)
ret void
}
In this case, with a fixed length, there is no loop at all: it has been unrolled.
At least in a quick check with using arr.len() nested inside another loop, no code seems to be generated for the "call" to arr.len() at all. In the generated code, the size of the array is simply hard-coded into the output.
In other words, I would not expect your second snippet to execute any faster than the first one.