Generate a sorted map in Rust - iterator

I am looking for a way to generate a sorted map, having the keys sorted by multiple values.
Something like this:
let mut f = File::open("storing_data_like_a_pro.csv")?;
let mut buffer = String::new();
f.read_to_string(&mut buffer)?;
let sorted_cache_map: BTreeMap<String, FancyParsedData> = buffer.lines()
.map(|line| FancyParsedData::new(line)
.map(|data| (data.key, data))
// .sortBy(|key, value| value.foo, ASCENDING)
// .thenSortBy(|key, value| value.bar, DESCENDING)...
.collect();
When I know the key I would get a snappy response. If I would search like one or multiple values, I want them to be ordered (useful for pagination).

Related

How do I partition on saving parquet files with polars using Rust?

So I was able to save some struct to Parquet file using the following code:
pub fn store(data: Vec<Data>) -> () {
let json = serde_json::to_string(&data).unwrap();
let cursor = Cursor::new(json);
let mut df: DataFrame = JsonReader::new(cursor)
.finish()
.unwrap();
let mut buffer = File::create("foo.txt").unwrap();
let pw = ParquetWriter::new(buffer);
pw.finish(&mut df);
println!("{:?}", df)
}
I was wondering if there is a way to partition based on columns? This feature is supported in pandas, you can find the documentation here where it is possible to pass partition_cols when calling the to_parquet function
Is this feature supported in polars?

How to access row elements in a polars LazyFrame/DataFrame

I struggle accessing the row-elements of a Frame.
One idea I have is to filter the dataframe down to a row, convert it to a vec or something similar and access the elements this way ?!
In Panadas I used to just use ".at / .loc / .iloc / etc."; with Polars in Rust I have no clue.
Any suggestions on what the proper way to do this is ?
Thanks to #isaactfa ... he got me onto the right track. I ended up getting the row not with "get_row" but rather with "get" ... this is probably due to my little RUST understanding (my 2nd week).
Here is a working code sample:
use polars::export::arrow::temporal_conversions::date32_to_date;
use polars::prelude::*;
fn main() -> Result<()> {
let days = df!(
"date_string" => &["1900-01-01", "1900-01-02", "1900-01-03", "1900-01-04", "1900-01-05",
"1900-01-06", "1900-01-07", "1900-01-09", "1900-01-10"])?;
let options = StrpTimeOptions {
date_dtype: DataType::Date, // the result column-datatype
fmt: Some("%Y-%m-%d".into()), // the source format of the date-string
strict: false,
exact: true,
};
// convert date_string into dtype(date) and put into new column "date_type"
// we convert the days DataFrame to a LazyFrame ...
// because in my real-world example I am getting a LazyFrame
let mut new_days_lf = days.lazy().with_column(
col("date_string")
.alias("date_type")
.str()
.strptime(options),
);
// Getting the weekday as a number:
// This is what I wanted to do ... but I get a string result .. need u32
// let o = GetOutput::from_type(DataType::Date);
// new_days_lf = new_days_lf.with_column(
// col("date_type")
// .alias("weekday_number")
// .map(|x| Ok(x.strftime("%w").unwrap()), o.clone()),
// );
// This is the convoluted workaround for getting the weekday as a number
let o = GetOutput::from_type(DataType::Date);
new_days_lf = new_days_lf.with_column(col("date_type").alias("weekday_number").map(
|x| {
Ok(x.date()
.unwrap()
.clone()
.into_iter()
.map(|opt_name: Option<i32>| {
opt_name.map(|datum: i32| {
// println!("{:?}", datum);
date32_to_date(datum)
.format("%w")
.to_string()
.parse::<u32>()
.unwrap()
})
})
.collect::<UInt32Chunked>()
.into_series())
},
o,
));
new_days_lf = new_days_lf.with_column(
col("weekday_number")
.shift_and_fill(-1, 9999)
.alias("next_weekday_number"),
);
// now we convert the LazyFrame into a normal DataFrame for further processing:
let mut new_days_df = new_days_lf.collect()?;
// convert the column to a series
// to get a column by name we need to collect the LazyFrame into a normal DataFrame
let col1 = new_days_df.column("weekday_number")?;
// convert the column to a series
let col2 = new_days_df.column("next_weekday_number")?;
// now I can use series-arithmetics
let diff = col2 - col1;
// create a bool column based on "element == 2"
// add bool column to DataFrame
new_days_df.replace_or_add("weekday diff eq(2)", diff.equal(2)?.into_series());
// could not figure out how to filter the eager frame ...
let result = new_days_df
.lazy()
.filter(col("weekday diff eq(2)").eq(true))
.collect()
.unwrap();
// could not figure out how to access ROW elements
// thus I used "get" instead af of "get_row"
// getting the date where diff is == 2 (true)
let filtered_row = result.get(0).unwrap();
// within the filtered_row get element with an index
let date = filtered_row.get(0).unwrap();
println!("\n{:?}", date);
Ok(())
}

Combining Two List in Kotlin with Index

There is a data class as fruits.
data class Fruits(
val code: String, //Unique
val name: String
)
The base list indexed items with boolean variable is as below.
val indexList: MutableList<Boolean> = MutableList(baseFruitList.size) { false }
Now the Favourite Indexed list is as below
val favList: MutableList<Boolean> = MutableList(favFruitList.size) { true}
I want a combined full list which basically has the fav item indicated as true.
Ex:
baseFruitList = {[FT1,apple],[FT2,grapes],[FT3,banana],[FT4,mango],[FT5,pears]}
favList = {[FT2,grapes],[FT4,mango]}
The final index list should have
finalIndexed = {false,true,false,true,false}
How can we achieve in Kotlin, without iterating through each element.
You can do
val finalIndexed = baseFruitList.map { it in favList }
assuming, like #Tenfour04 is asking, that name is guaranteed to be a specific value (including matching case) for a specific code (since that combination is how a data class matches another, e.g. for checking if it's in another list)
If you can't guarantee that, this is safer:
val finalIndexed = baseFruitList.map { fruit ->
favList.any { fav.code == fruit.code }
}
but here you have to iterate over all the favs (at least until you find a match) looking to see if one has the code.
But really, if code is the unique identifier here, why not just store those in your favList?
favList = listOf("FT2", "FT4") // or a Set would be more efficient, and more correct!
val finalIndexed = baseFruitList.map { it.code in favList }
I don't know what you mean about "without iterating through each element" - if you mean without an explicit indexed for loop, then you can use these simple functions like I have here. But there's always some amount of iteration involved. Sets are always an option to help you minimise that

How to get size of specfic value inside array Kotlin

here is example of the list. I want to make dynamic where maybe the the value will become more.
val list = arrayListOf("A", "B", "C", "A", "A", "B") //Maybe they will be more
I want the output like:-
val result = list[i] + " size: " + list[i].size
So the output will display every String with the size.
A size: 3
B size: 2
C size: 1
If I add more value, so the result will increase also.
You can use groupBy in this way:
val result = list.groupBy { it }.map { it.key to it.value.size }.toMap()
Jeoffrey's way is better actually, since he is using .mapValues() directly, instead of an extra call to .toMap(). I'm just leaving this answer her since
I believe that the other info I put is relevant.
This will give a Map<String, Int>, where the Int is the count of the occurences.
This result will not change when you change the original list. That is not how the language works. If you want something like that, you'd need quite a bit of work, like overwriting the add function from your collection to refresh the result map.
Also, I see no reason for you to use an ArrayList, especially since you are expecting to increase the size of that collection, I'd stick with MutableList if I were you.
I think the terminology you're looking for is "frequency" here: the number of times an element appears in a list.
You can usually count elements in a list using the count method like this:
val numberOfAs = list.count { it == "A" }
This approach is pretty inefficient if you need to count all elements though, in which case you can create a map of frequencies the following way:
val freqs = list.groupBy { it }.mapValues { (_, g) -> g.size }
freqs here will be a Map where each key is a unique element from the original list, and the value is the corresponding frequency of that element in the list.
This works by first grouping elements that are equal to each other via groupBy, which returns a Map<String, List<String>> where each key is a unique element from the original list, and each value is the group of all elements in the list that were equal to the key.
Then mapValues will transform that map so that the values are the sizes of the groups instead of the groups themselves.
An improved approach, as suggested by #broot is to make use of Kotlin's Grouping class which has a built-in eachCount method:
val freqs = list.groupingBy { it }.eachCount()

difference between two lists that include duplicates

I have a problem with two lists which contain duplicates
a = [1,1,2,3,4,4]
b = [1,2,3,4]
I would like to be able to extract the differences between the two lists ie.
c = [1,4]
but if I do c = a-b I get c =[]
It should be trivial but I can't find out :(
I tried also to parse the biggest list and remove items from it when I find them in the smallest list but I can't update lists on the fly, it does not work either
has anyone got an idea ?
thanks
You see an empty c as a result, because removing e.g. 1 removes all elements that are equal 1.
groovy:000> [1,1,1,1,1,2] - 1
===> [2]
What you need instead is to remove each occurrence of specific value separately. For that, you can use Groovy's Collection.removeElement(n) that removes a single element that matches the value. You can do it in a regular for-loop manner, or you can use another Groovy's collection method, e.g. inject to reduce a copy of a by removing each occurrence separately.
def c = b.inject([*a]) { acc, val -> acc.removeElement(val); acc }
assert c == [1,4]
Keep in mind, that inject method receives a copy of the a list (expression [*a] creates a new list from the a list elements.) Otherwise, acc.removeElement() would modify an existing a list. The inject method is an equivalent of a popular reduce or fold operation. Each iteration from this example could be visualized as:
--inject starts--
acc = [1,1,2,3,4,4]; val = 1; acc.removeElement(1) -> return [1,2,3,4,4]
acc = [1,2,3,4,4]; val = 2; acc.removeElement(2) -> return [1,3,4,4]
acc = [1,3,4,4]; val = 3; acc.removeElement(3) -> return [1,4,4]
acc = [1,4,4]; val = 4; acc.removeElement(4) -> return [1,4]
-- inject ends -->
PS: Kudos to almighty tim_yates who recommended improvements to that answer. Thanks, Tim!
the most readable that comes to my mind is:
a = [1,1,2,3,4,4]
b = [1,2,3,4]
c = a.clone()
b.each {c.removeElement(it)}
if you use this frequently you could add a method to the List metaClass:
List.metaClass.removeElements = { values -> values.each { delegate.removeElement(it) } }
a = [1,1,2,3,4,4]
b = [1,2,3,4]
c = a.clone()
c.removeElements(b)