How to access row elements in a polars LazyFrame/DataFrame

I'm struggling to access the row elements of a Frame.
One idea I have is to filter the DataFrame down to a single row, convert it to a Vec or something similar, and access the elements that way.
In Pandas I used to just use ".at / .loc / .iloc / etc."; with Polars in Rust I have no clue.
Any suggestions on what the proper way to do this is?

Thanks to @isaactfa ... he got me onto the right track. I ended up getting the row not with "get_row" but rather with "get" ... this is probably due to my limited Rust understanding (it's my 2nd week).
Here is a working code sample:
use polars::export::arrow::temporal_conversions::date32_to_date;
use polars::prelude::*;

fn main() -> Result<()> {
    let days = df!(
        "date_string" => &["1900-01-01", "1900-01-02", "1900-01-03", "1900-01-04", "1900-01-05",
                           "1900-01-06", "1900-01-07", "1900-01-09", "1900-01-10"])?;
    let options = StrpTimeOptions {
        date_dtype: DataType::Date,   // the resulting column datatype
        fmt: Some("%Y-%m-%d".into()), // the source format of the date string
        strict: false,
        exact: true,
    };
    // Convert date_string into dtype(date) and put it into a new column "date_type".
    // We convert the days DataFrame to a LazyFrame,
    // because in my real-world example I am getting a LazyFrame.
    let mut new_days_lf = days.lazy().with_column(
        col("date_string")
            .alias("date_type")
            .str()
            .strptime(options),
    );
    // Getting the weekday as a number:
    // This is what I wanted to do, but I get a string result and I need a u32.
    // let o = GetOutput::from_type(DataType::Date);
    // new_days_lf = new_days_lf.with_column(
    //     col("date_type")
    //         .alias("weekday_number")
    //         .map(|x| Ok(x.strftime("%w").unwrap()), o.clone()),
    // );
    // This is the convoluted workaround for getting the weekday as a number:
    let o = GetOutput::from_type(DataType::Date);
    new_days_lf = new_days_lf.with_column(col("date_type").alias("weekday_number").map(
        |x| {
            Ok(x.date()
                .unwrap()
                .clone()
                .into_iter()
                .map(|opt_name: Option<i32>| {
                    opt_name.map(|datum: i32| {
                        // println!("{:?}", datum);
                        date32_to_date(datum)
                            .format("%w")
                            .to_string()
                            .parse::<u32>()
                            .unwrap()
                    })
                })
                .collect::<UInt32Chunked>()
                .into_series())
        },
        o,
    ));
    new_days_lf = new_days_lf.with_column(
        col("weekday_number")
            .shift_and_fill(-1, 9999)
            .alias("next_weekday_number"),
    );
    // Now we convert the LazyFrame into a normal DataFrame for further processing;
    // to get a column by name we need an eager DataFrame:
    let mut new_days_df = new_days_lf.collect()?;
    let col1 = new_days_df.column("weekday_number")?;
    let col2 = new_days_df.column("next_weekday_number")?;
    // Now I can use Series arithmetic:
    let diff = col2 - col1;
    // Create a bool column based on "element == 2" and add it to the DataFrame:
    new_days_df.replace_or_add("weekday diff eq(2)", diff.equal(2)?.into_series());
    // Could not figure out how to filter the eager frame,
    // so I went through a LazyFrame again:
    let result = new_days_df
        .lazy()
        .filter(col("weekday diff eq(2)").eq(true))
        .collect()
        .unwrap();
    // Could not figure out how to access ROW elements,
    // thus I used "get" instead of "get_row".
    // Getting the date where diff == 2 (true):
    let filtered_row = result.get(0).unwrap();
    // Within filtered_row, get an element by index:
    let date = filtered_row.get(0).unwrap();
    println!("\n{:?}", date);
    Ok(())
}

Related

Filter data from arrays

What I need is to sort data I get from an API into different arrays, but add a '0' value where one type has no value while the other type(s) do. Is this possible with Array.filter, since it's faster than a bunch of for and if loops?
So let's say I get following data from SQL to the API:
Day        Type  Amount
-----------------------
12.1.2022    1      11
12.1.2022    2       4
13.1.2022    1       5
14.1.2022    2       9
16.1.2022    2      30
If I run this code:
this.data = result.Data;
let date = [];
const data = { 'dataType1': [], 'dataType2': [], 'dataType3': [], 'dataType4': [] };
/* only writing the example for 2 types since for 4 it would be too long, but I'd like
   an answer that works for any number of types, or for 4 types */
this.data.forEach(x => {
    var lastAddress = date[date.length - 1];
    if (x.type == 1) { data.dataType1.push(x.Amount); }
    if (x.type == 2) { data.dataType2.push(x.Amount); }
    lastAddress != x.Day ? date.push(x.Day) : '';
});
The array I get for type 1 is [11,5],
and for type 2 I get [4,9,30].
And for the dates I get all the unique dates.
But the data I would like is [11,5,0,0] and [4,0,9,30].
The size of each array also has to match the size of the Day array at the end,
which would be the unique dates, in this case:
[12.1.2022, 13.1.2022, 14.1.2022, 16.1.2022]
I have already tried to solve this with some for, if and while loops, but it gets way too messy, so I'm looking for an alternative.
Also, I have 4 types, but for reference I only wrote the sample for 2.
You can first get the unique types, then loop over the data to create an array of objects of the form {date: string, values: number[]}:
//create a function:
matrix(data: any[]) {
    // collect the distinct types, in order of first appearance
    const uniqTypes = data.reduce((a, b) => a.indexOf(b.type) >= 0 ? a : [...a, b.type], []);
    const result = [];
    data.forEach((x: any) => {
        let index = result.findIndex(r => r.date == x.date);
        if (index < 0) {
            // first time we see this date: add a zero-filled row for it
            result.push({ date: x.date, values: [...uniqTypes].fill(0) });
            index = result.length - 1;
        }
        result[index].values[uniqTypes.indexOf(x.type)] = x.amount;
    });
    return result;
}
//and use it like:
result = this.matrix(this.data);
NOTE: You can create uniqTypes outside the function as a variable and pass it as an argument to the function.
stackblitz
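To make the shape concrete, here is a usage sketch with the sample data from the question (assumptions: the rows carry lower-case date/type/amount properties, as the answer's code expects, and matrix is called as a standalone function):
const rows = [
    { date: '12.1.2022', type: 1, amount: 11 },
    { date: '12.1.2022', type: 2, amount: 4 },
    { date: '13.1.2022', type: 1, amount: 5 },
    { date: '14.1.2022', type: 2, amount: 9 },
    { date: '16.1.2022', type: 2, amount: 30 }
];
const m = matrix(rows);
// m: [ { date: '12.1.2022', values: [11, 4] },
//      { date: '13.1.2022', values: [5, 0] },
//      { date: '14.1.2022', values: [0, 9] },
//      { date: '16.1.2022', values: [0, 30] } ]
const type1 = m.map(r => r.values[0]); // [11, 5, 0, 0]
const type2 = m.map(r => r.values[1]); // [4, 0, 9, 30]
const dates = m.map(r => r.date);      // the unique dates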
const type1 = [];
const type2 = [];
data.forEach(item => {
    if (item.type == 1) {
        type1.push(item.amount);
        type2.push(0);
    } else {
        type1.push(0);
        type2.push(item.amount);
    }
});
console.log(type1);
console.log(type2);
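If you need exactly the per-type arrays from the question, sized to the unique-dates array, here is a hedged sketch that generalises the grouping idea to any number of types (assuming rows shaped like the sample table, with Day/Type/Amount properties):
const rows = [
    { Day: '12.1.2022', Type: 1, Amount: 11 },
    { Day: '12.1.2022', Type: 2, Amount: 4 },
    { Day: '13.1.2022', Type: 1, Amount: 5 },
    { Day: '14.1.2022', Type: 2, Amount: 9 },
    { Day: '16.1.2022', Type: 2, Amount: 30 }
];
const days = [...new Set(rows.map(r => r.Day))]; // the unique dates, in order
const byType = {};
rows.forEach(r => {
    // one zero-filled array per type, one slot per unique date
    if (!byType[r.Type]) byType[r.Type] = new Array(days.length).fill(0);
    byType[r.Type][days.indexOf(r.Day)] = r.Amount;
});
// byType[1] -> [11, 5, 0, 0]
// byType[2] -> [4, 0, 9, 30]
// days      -> ['12.1.2022', '13.1.2022', '14.1.2022', '16.1.2022']
This works for any number of types, since a per-type array is created on first sight of each type.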

Kotlin nested for loops to asSequence

I'm trying to convert my nested for loop to asSequence in Kotlin. My goal is to find, for each object in my array, the object with the same key in another object array, and update my object's value from it.
nested for loop:
val myFields = getMyFields()
val otherFields = getOtherFields()
for (myField in myFields) { // loop through my fields
    for (otherField in otherFields) { // find the matching field
        if (myField.key == otherField.key) { // if the keys match, update the value
            val updatedMyField = myField.copy(value = otherField.value)
            myFields[myFields.indexOf(myField)] = updatedMyField // update my field's value
            break
        }
    }
}
What I've tried:
val updatedMyFields = getMyFields().asSequence()
    .map { myField ->
        getOtherFields().asSequence()
            .map { otherField ->
                if (myField.key == otherField.key) {
                    return@map otherField.value
                } else {
                    return@map ""
                }
            }
            .filter { it?.isNotEmpty() == true }
            .first()?.map { myField.copy(value = it.toString()) }
    }
    .toList()
but this does not compile as it will return List<List<MyField>>.
I'm just looking for something much cleaner for this.
As comments suggest, this would probably be much more efficient with a Map.
(More precisely, a map solution would take time proportional to the sum of the list lengths, while the nested for loop takes time proportional to their product — which gets bigger much faster.)
Here's one way of doing that:
val otherFields = getOtherFields().associate { it.key to it.value }
val myFields = getMyFields().map {
    val otherValue = otherFields[it.key]
    if (otherValue != null) it.copy(value = otherValue) else it
}
The first line creates a Map from the ‘other fields’ keys to their values.  The rest then uses it to create a new list from ‘my fields’, substituting the values from the ‘other fields’ where present.
I've had to make assumptions about the types &c, since the code in the question is incomplete, but this should do the same.  Obviously, you can change how it merges the values by amending the it.copy().
There are likely to be even simpler and more efficient ways, depending on the surrounding code.  If you expanded it into a Minimal, Complete, and Verifiable Example — in particular, one that illustrates how you already use a Map, as per your comment — we might be able to suggest something better.
Why do you want to use asSequence()? You can go for something like this:
val myFields = getMyFields()
val otherFields = getOtherFields()
myFields.forEach { firstField ->
    otherFields.forEach { secondField ->
        if (firstField.key == secondField.key) {
            myFields[myFields.indexOf(firstField)] = firstField.copy(value = secondField.value)
        }
    }
}
This will do the same job as your nested for loop, and it's easier to read, understand, and therefore maintain than your nested asSequence().

making a linegraph that shows population decay with dc.js and crossfilter

I am creating a dashboard in dc.js. One of the visualizations is a survival curve showing the percentage of survival on the y-axis and the time in weeks on the x-axis.
Each record in the dataset contains a death-after column called recidiefNa. It holds the number of weeks after which death occurred, or -99 for survival.
See the sketches for an example dataset and the desired chart form:
I created this code to create the dimensions and groups and draw the desired chart:
var recDim = cf1.dimension(dc.pluck('recidiefNa')); // sets the dimension
var recGroup = recDim.group().reduceCount();
var resDim = cf1.dimension(dc.pluck('residuNa'));
var resGroup = resDim.group().reduceCount();
var scChart = dc.compositeChart("#scStepChart");
scChart
    .width(600)
    .height(400)
    .x(d3.scale.linear().domain([0, 52]))
    .y(d3.scale.linear().domain([0, 100]))
    .clipPadding(10)
    .brushOn(false)
    .xAxisLabel("tijd in weken")
    .yAxisLabel("percentage vrij van residu/recidief")
    .compose([
        dc.lineChart(scChart)
            .dimension(recDim)
            .group(recGroup)
            .interpolate("step-after")
            .renderDataPoints(true)
            .renderTitle(true)
            .keyAccessor(function(d) { return d.key; })
            .valueAccessor(function(d) { return d.value / cf1.groupAll().reduceCount().value() * 100; }),
        dc.lineChart(scChart)
            .dimension(resDim)
            .group(resGroup)
            .interpolate("step-after")
            .renderDataPoints(true)
            .colors(['orange'])
            .renderTitle(true)
            .keyAccessor(function(d) { return d.key; })
            .valueAccessor(function(d) { return d.value / cf1.groupAll().reduceCount().value() * 100; })
    ])
    .xAxis().ticks(4);
scChart.render();
This gives the following result:
As you can see, my first problem is that I need the line to extend to the y-axis, with x = 0 weeks and y = 100% as the first data point.
So that's question number one: is there a way to get that line to look more like my sketch (starting on the y-axis at 100%)?
My second and bigger problem is that it is showing the inverse of the percentage I need (e.g. 38 instead of 62). This is because of the way the data is structured (which is something I'd rather not change).
First I tried changing the value accessor to 100 minus the calculated number, which is obviously the normal way to solve this issue. However, my result was this:
As you can see, the survival curve now has a positive incline, which is never possible in a survival curve. This is my second question. Any ideas how to fix this?
Ah, it wasn't clear from the particular example that each data point should be based on the last, but your comment makes that clear. It sounds like what you are looking for is a kind of cumulative sum - in your case, a cumulative subtraction.
There is an entry in the FAQ for this.
Adapting that code to your use case:
function accumulate_subtract_from_100_group(source_group) {
    return {
        all: function() {
            var cumulate = 100;
            return source_group.all().map(function(d) {
                cumulate -= d.value;
                return { key: d.key, value: cumulate };
            });
        }
    };
}
Use it like this:
var decayRecGroup = accumulate_subtract_from_100_group(recGroup)
// ...
dc.lineChart(scChart)
    // ...
    .group(decayRecGroup)
and similarly for the resGroup
While we're at it, we can concatenate the data to the initial point, to answer your first question:
function accumulate_subtract_from_100_and_prepend_start_point_group(source_group) {
    return {
        all: function() {
            var cumulate = 100;
            return [{ key: 0, value: cumulate }]
                .concat(source_group.all().map(function(d) {
                    cumulate -= d.value;
                    return { key: d.key, value: cumulate };
                }));
        }
    };
}
(ridiculous function name for exposition only!)
EDIT: here is @Erik's final adapted answer with the percentage conversion built in, and a couple of performance improvements:
function fakeGrouper(source_group) {
    var groupAll = cf1.groupAll().reduceCount();
    return {
        all: function() {
            var cumulate = 100;
            var total = groupAll.value();
            return [{ key: 0, value: cumulate }]
                .concat(source_group.all().map(function(d) {
                    if (d.key > 0) {
                        cumulate -= (d.value / total * 100).toFixed(0);
                    }
                    return { key: d.key, value: cumulate };
                }));
        }
    };
}
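A minimal usage sketch, assuming the chart setup from the question: swap the raw groups for the wrapped ones, and, since the percentage conversion is now built into the group, the valueAccessor can simply return d.value:
var decayRecGroup = fakeGrouper(recGroup);
var decayResGroup = fakeGrouper(resGroup);
// in the composed charts:
dc.lineChart(scChart)
    .dimension(recDim)
    .group(decayRecGroup)
    .interpolate("step-after")
    .renderDataPoints(true)
    .renderTitle(true)
    .keyAccessor(function(d) { return d.key; })
    .valueAccessor(function(d) { return d.value; })
// ... and likewise for resDim / decayResGroup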

Aligning time series in bacon.js

I would like to use Bacon to combine time series with irregular timestamps into a single EventStream. Each event would contain the last value of each time series at the given time.
Here is an example:
var ts1 = new Bacon.fromArray([
    [1, 2], // each array is an event; here the timestamp is 1 and the value is 2
    [2, 3],
    [5, 9]
])
var ts2 = new Bacon.fromArray([
    [4, 2],
    [9, 3],
    [12, 9]
])
What I would like to have is something like this:
var ts12 = [
    [1, 2, undefined], // at time 1, only ts1 was defined
    [2, 3, undefined],
    [4, 3, 2], // at time 4, we take the last value of ts1 (3) and ts2 (2)
    [5, 9, 2],
    [9, 9, 3],
    [12, 9, 9]
]
I tried to implement it using Bacon.update but I didn't go very far. How would you approach the problem?
I assume that your time series values arrive in the order of the time parameter. Hence the change to Bacon.later in my code.
var ts1 = new Bacon.mergeAll([
    Bacon.later(100, [1, 2]),
    Bacon.later(200, [2, 3]),
    Bacon.later(500, [5, 9])
])
var ts2 = new Bacon.mergeAll([
    Bacon.later(400, [4, 2]),
    Bacon.later(900, [9, 3]),
    Bacon.later(1200, [12, 9])
])
var ts12 = ts1.toProperty(null).combine(ts2.toProperty(null), function(v1, v2) {
    if (v1 && v2) {
        return [Math.max(v1[0], v2[0]), v1[1], v2[1]]
    } else if (v1) {
        return [v1[0], v1[1], undefined]
    } else if (v2) {
        return [v2[0], undefined, v2[1]]
    }
}).changes()
ts12.log()
You can play around with the solution in this JSfiddle.
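For reference, tracing the combination by hand (each incoming event fires the combine function with the other side's latest value), the log should reproduce exactly the ts12 rows from the question:
// t=100:  [1, 2, undefined]   (ts2 property is still null)
// t=200:  [2, 3, undefined]
// t=400:  [4, 3, 2]           (last ts1 value is [2, 3])
// t=500:  [5, 9, 2]
// t=900:  [9, 9, 3]
// t=1200: [12, 9, 9]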

Adding x axis labels when using dojox.charting.DataSeries

I'm creating a Dojo line chart from a dojo.data.ItemFileReadStore using a dojox.charting.DataSeries. I'm using the third parameter (value) of the DataSeries constructor to specify a method which will generate the points on the chart, e.g.
function formatLineGraphItem(store, item)
{
    var o = {
        x: graphIndex++,
        y: store.getValue(item, "fileSize"),
    };
    return o;
}
The graphIndex is an integer which is incremented for every fileSize value. This gives me a line chart with the fileSize shown against a numeric count. This works fine.
What I'd like is to be able to specify the x-axis label to use instead of the value of graphIndex, i.e. the underlying data will still be 1, 2, 3, 4, but the label will show text (in this case the time at which the file size was captured).
I can do this by passing an array of labels into the x axis when I call chart.addAxis(), but this requires me to know the values before I iterate through the data, e.g.
var dataSeriesConfig = {query: {id: "*"}};
var xAxisLabels = [{text:"2011-11-20",value:1},{text:"2011-11-21",value:2},{text:"2011-11-22",value:3}];
var chart1 = new dojox.charting.Chart("chart1");
chart1.addPlot("default", {type: "Lines", tension: "4"});
chart1.addAxis("x", {labels: xAxisLabels});
chart1.addAxis("y", {vertical: true});
chart1.addSeries("Values", new dojox.charting.DataSeries(dataStore, dataSeriesConfig, formatLineGraphItem));
chart1.render();
The xAxisLabels array can be created by pre-parsing the data series, but it's not a very nice workaround.
Does anyone have any ideas how the formatLineGraphItem method could be extended to provide the x-axis labels? Or does anyone have any documentation on what values the object o can contain?
Thanks in advance!
This will take a Unix timestamp, multiply the value by 1000 (so that it is in milliseconds for JavaScript), and then pass the value to dojo date to format it.
You shouldn't have any problems editing this to the format you need.
The examples you provided have dates like "1", "2", "3", which is clearly wrong; those aren't dates. So this is the best you can do unless you edit your question.
chart1.addAxis("x", {
    labelFunc: function(n) {
        if (isNaN(dojo.number.parse(n)) || dojo.number.parse(n) % 1 != 0) {
            return " ";
        }
        else {
            // I am assuming that your timestamp needs to be multiplied by 1000.
            var date = new Date(dojo.number.parse(n) * 1000);
            return dojo.date.locale.format(date, {
                selector: "date",
                datePattern: "dd MMMM",
                locale: "en"
            });
        }
    },
    maxLabelSize: 100
});
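For this labelFunc to have something meaningful to format, the x values emitted by formatLineGraphItem would need to be the capture timestamps rather than a running index. A minimal sketch, assuming the store items carry a Unix-timestamp field (here hypothetically named "captureTime"; substitute your real field name):
function formatLineGraphItem(store, item)
{
    var o = {
        // use the capture time (a Unix timestamp) as the x value,
        // so the axis labelFunc above can turn it into a date label
        x: store.getValue(item, "captureTime"),
        y: store.getValue(item, "fileSize"),
    };
    return o;
}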