We are building an LSTM network with the C# API for CNTK, but given the current state of the CNTK documentation we find it very difficult to settle on the proper shape/dimensions of the inputs.
We have a time series with one value (a single number) at each time t, and we want to use a sequence of the previous 744 values of the series to make a prediction with the LSTM. Furthermore, if we want a minibatch of 25 sequences, what should the shape of the CNTK.InputVariable look like:
[0] 744
[1] 1
[2] 25
or
[0] 1
[1] 744
[2] 25
… and then, if we have two values at each time t instead of one, what will the CNTK.InputVariable shape look like?
If you use recurrent networks (LSTM, GRU), then you need to know what static and dynamic axes are.
The static axes describe the shape of the data at a single time step (in the first case it is a vector of rank 1 and size 1: new[] { 1 }).
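The dynamic axes describe the sequence (of variable length, 744 in your case) and the minibatch (25 sequences), so neither number is part of the shape. An input variable's default dynamic axes are the sequence axis plus the batch axis, which is what a sequence input needs, so leave dynamicAxes at its default for the features; dynamicAxes: new[] { Axis.DefaultBatchAxis() } (batch axis only, no sequence axis) is the setting for the labels, which have one value per sequence.
var inputDimension = 1; // 2 for two values per time step, etc.
var inputShape = new[] { inputDimension };
// dynamicAxes omitted: defaults to the sequence axis + the batch axis
var input = Variable.InputVariable(inputShape, DataType.Double, "input");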
And be sure to prepare the minibatches correctly (example of creating one minibatch):
using System.Collections.Generic;
using System.Linq;
using CNTK;
var device = DeviceDescriptor.CPUDevice;
var inputDimension = 1;
var outputDimension = 1;
var minibatchSize = 25;
var oneMinibatchFeaturesData = new List<List<double[]>>(minibatchSize)
{
new List<double[]> //first sequence
{
new double[] { 23 },//t=1. Array.Length = inputDimension
new double[] { 25 },//t=2
//...
new double[] { 65 },//t=744
},
new List<double[]> //second sequence
{
new double[] { 76 }, //t=1
new double[] { 236 },//t=2
//...
new double[] { 87 }, //t=744
},
//...
new List<double[]> //twenty fifth sequence
{
new double[] { 9 }, //t=1
new double[] { 2 },//t=2
//...
new double[] { 90 }, //t=744
},
};
var oneMinibatchLabelsData = new List<double[]>(minibatchSize)
{
new double[] { 1 },//label of first sequence. Array.Length = outputDimension
new double[] { 5 },//label of second sequence
//...
new double[] { 3 }//label of twenty fifth sequence
};
var features = Value.CreateBatchOfSequences(new[] { inputDimension }, oneMinibatchFeaturesData.Select(sequence => sequence.SelectMany(value => value)), device);
var labels = Value.CreateBatch(new[] { outputDimension }, oneMinibatchLabelsData.SelectMany(value => value), device);
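The shape question comes down to how the data is laid out: each sequence is passed as one flat array, time step after time step, and only the per-step size (inputDimension) is declared as the shape. A minimal sketch of that layout in JavaScript (the idea is language-independent), with two values per step; flattenSequence and sequenceLength are hypothetical helpers that mirror the SelectMany call above, not CNTK functions:

```javascript
// Hypothetical helper: concatenates the per-time-step arrays of one sequence,
// like sequence.SelectMany(value => value) in the C# code above.
function flattenSequence(steps, inputDimension) {
  const flat = [];
  for (const step of steps) {
    // Every time step must carry exactly inputDimension values (the static shape).
    if (step.length !== inputDimension) {
      throw new Error(`expected ${inputDimension} values per step, got ${step.length}`);
    }
    flat.push(...step);
  }
  return flat;
}

// The sequence length is not part of the shape: it is implied by the data.
function sequenceLength(flat, inputDimension) {
  if (flat.length % inputDimension !== 0) {
    throw new Error("flattened length is not a multiple of inputDimension");
  }
  return flat.length / inputDimension;
}

// Two values per time step (inputDimension = 2), two sequences of different lengths.
const inputDimension = 2;
const minibatch = [
  [[23, 0.5], [25, 0.7], [65, 0.1]], // first sequence, 3 time steps
  [[76, 0.2], [236, 0.9]], // second sequence, 2 time steps
];
const flattened = minibatch.map((seq) => flattenSequence(seq, inputDimension));
console.log(flattened.map((f) => sequenceLength(f, inputDimension))); // lengths 3 and 2
```

With inputDimension = 2, a 744-step sequence becomes 1488 numbers while the declared shape stays [2]; the 25 in the minibatch is simply the number of such arrays.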
The length of the sequence can be arbitrary. One minibatch may contain sequences of different lengths.
LSTMs are hard to train on sequences this long. If your sequences are always 744 steps long, you should probably use a simple feed-forward network (FNN) with input dimension 744 instead.
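Whether the windows go to an FNN (input dimension 744) or to the LSTM as sequences of 744 steps, the training samples come from sliding a 744-value window over the series. A minimal sketch in JavaScript, assuming the target is the value immediately after each window (next-step prediction); makeWindows and toMinibatches are hypothetical helpers, not CNTK APIs:

```javascript
// Hypothetical helper: slide a window of `lookback` consecutive values over the series;
// the value right after each window is its target (assumed next-step prediction).
function makeWindows(series, lookback) {
  const samples = [];
  for (let end = lookback; end < series.length; end++) {
    samples.push({ window: series.slice(end - lookback, end), target: series[end] });
  }
  return samples;
}

// Group samples into minibatches of at most batchSize samples each.
function toMinibatches(samples, batchSize) {
  const batches = [];
  for (let i = 0; i < samples.length; i += batchSize) {
    batches.push(samples.slice(i, i + batchSize));
  }
  return batches;
}

// Numbers from the question: 744 previous values, 25 samples per minibatch.
const series = Array.from({ length: 1000 }, (_, t) => Math.sin(t / 24));
const batches = toMinibatches(makeWindows(series, 744), 25);
console.log(batches.length, batches[0].length, batches[0][0].window.length); // 11 25 744
```

A series of 1000 values yields 256 windows, so the last minibatch holds only 6 samples; whether to drop or keep such a partial batch is a training choice.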
I recommend an in-depth study of the CNTK readers!
@Stanislav Grigorev is correct.
The input dimensions depend entirely on your dataset. For example, the ATIS data in the example here looks like this:
The code example can be found here.
The data is read in by a reader configured with these stream configurations:
IList<StreamConfiguration> streamConfigurations = new StreamConfiguration[]
{
new StreamConfiguration(featuresName, inputDim, true, "S0"),
// new StreamConfiguration(featuresName, inputDim, true, "S1"), // Not used in the old example.
new StreamConfiguration(labelsName, numOutputClasses, false, "S2")
};
and read with TextFormatMinibatchSource:
var minibatchSource = MinibatchSource.TextFormatMinibatchSource(
Path.Combine(DataFolder, "Train.ctf"),
streamConfigurations,
MinibatchSource.InfinitelyRepeat,
true);
var featureStreamInfo = minibatchSource.StreamInfo(featuresName);
var labelStreamInfo = minibatchSource.StreamInfo(labelsName);
Then this line, inside the while loop,
var minibatchData = minibatchSource.GetNextMinibatch(minibatchSize, device);
reads each minibatch. This is obvious to anyone reading the code, but I have included it to illustrate how the data is read in.
The dataset parameters are set in the code example:
const int inputDim = 2000;
const int numOutputClasses = 5;
It is important that these numbers match your data!
I have started a website, http://www.cntking.com/, to try to get the ball rolling on C# and CNTK; I think C# is a very underrated language for machine learning.
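To feed your own series through TextFormatMinibatchSource, it first has to be written in CNTK Text Format (CTF): one line per time step, each starting with a sequence id, with lines sharing an id forming one sequence. A minimal sketch, assuming dense features and one label per sequence written on the sequence's first line; toCtfLines is a hypothetical helper, and the S0/S2 aliases only need to match the aliases in your StreamConfiguration list. The ATIS features are sparse (isSparse is true, entries written as index:value), whereas a dense series like yours would be declared with isSparse set to false and dimension 1 (or 2 for two values per step).

```javascript
// Hypothetical helper: writes sequences in CNTK Text Format, one line per time step.
// Features are written densely; the label is written once, on the first line of each sequence.
function toCtfLines(sequences, labels, featureAlias = "S0", labelAlias = "S2") {
  const lines = [];
  sequences.forEach((steps, seqId) => {
    steps.forEach((step, t) => {
      let line = `${seqId} |${featureAlias} ${step.join(" ")}`;
      if (t === 0) {
        line += ` |${labelAlias} ${labels[seqId].join(" ")}`;
      }
      lines.push(line);
    });
  });
  return lines;
}

console.log(toCtfLines([[[23], [25]], [[76]]], [[1], [5]]).join("\n"));
// 0 |S0 23 |S2 1
// 0 |S0 25
// 1 |S0 76 |S2 5
```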
I am struggling to access the row elements of a DataFrame.
One idea I have is to filter the DataFrame down to a single row, convert it to a Vec or something similar, and access the elements that way.
In Pandas I used to just use ".at / .loc / .iloc / etc."; with Polars in Rust I have no clue.
Any suggestions on what the proper way to do this is?
Thanks to @isaactfa, who got me onto the right track. I ended up getting the row not with "get_row" but with "get"; this is probably due to my limited Rust understanding (it is my second week).
Here is a working code sample:
use polars::export::arrow::temporal_conversions::date32_to_date;
use polars::prelude::*;
fn main() -> Result<()> {
let days = df!(
"date_string" => &["1900-01-01", "1900-01-02", "1900-01-03", "1900-01-04", "1900-01-05",
"1900-01-06", "1900-01-07", "1900-01-09", "1900-01-10"])?;
let options = StrpTimeOptions {
date_dtype: DataType::Date, // the result column-datatype
fmt: Some("%Y-%m-%d".into()), // the source format of the date-string
strict: false,
exact: true,
};
// convert date_string into dtype(date) and put into new column "date_type"
// we convert the days DataFrame to a LazyFrame ...
// because in my real-world example I am getting a LazyFrame
let mut new_days_lf = days.lazy().with_column(
col("date_string")
.alias("date_type")
.str()
.strptime(options),
);
// Getting the weekday as a number:
// This is what I wanted to do, but strftime gives a string result and I need a u32
// let o = GetOutput::from_type(DataType::Date);
// new_days_lf = new_days_lf.with_column(
// col("date_type")
// .alias("weekday_number")
// .map(|x| Ok(x.strftime("%w").unwrap()), o.clone()),
// );
// This is the convoluted workaround for getting the weekday as a number
let o = GetOutput::from_type(DataType::UInt32); // the map below produces u32 weekday numbers
new_days_lf = new_days_lf.with_column(col("date_type").alias("weekday_number").map(
|x| {
Ok(x.date()
.unwrap()
.clone()
.into_iter()
.map(|opt_name: Option<i32>| {
opt_name.map(|datum: i32| {
// println!("{:?}", datum);
date32_to_date(datum)
.format("%w")
.to_string()
.parse::<u32>()
.unwrap()
})
})
.collect::<UInt32Chunked>()
.into_series())
},
o,
));
new_days_lf = new_days_lf.with_column(
col("weekday_number")
.shift_and_fill(-1, 9999)
.alias("next_weekday_number"),
);
// now we convert the LazyFrame into a normal DataFrame for further processing:
let mut new_days_df = new_days_lf.collect()?;
// get the columns by name (each is a &Series); this needs the eager DataFrame collected above
let col1 = new_days_df.column("weekday_number")?;
let col2 = new_days_df.column("next_weekday_number")?;
// now I can use Series arithmetic
let diff = col2 - col1;
// create a bool column based on "element == 2"
// add bool column to DataFrame
new_days_df.replace_or_add("weekday diff eq(2)", diff.equal(2)?.into_series());
// filter via the lazy API (the eager DataFrame::filter takes a &BooleanChunked mask instead)
let result = new_days_df
.lazy()
.filter(col("weekday diff eq(2)").eq(true))
.collect()?;
// DataFrame::get(idx) returns the row at idx as a Vec<AnyValue> (I used it instead of get_row)
// get the first row where diff == 2 (true)
let filtered_row = result.get(0).unwrap();
// within the filtered_row get element with an index
let date = filtered_row.get(0).unwrap();
println!("\n{:?}", date);
Ok(())
}
I am creating a dashboard in DC.js. One of the visualizations is a survival curve showing the percentage surviving on the y-axis and the time in weeks on the x-axis.
Each record in the dataset contains a "death after" column called recidiefNa, which holds the number of weeks after which death occurred, or -99 for survival.
See sketches for example dataset and desired chart form:
I wrote this code to create the dimensions and groups and draw the chart:
var recDim = cf1.dimension(dc.pluck('recidiefNa'));//sets dimension
var recGroup = recDim.group().reduceCount();
var resDim = cf1.dimension(dc.pluck('residuNa'));
var resGroup = resDim.group().reduceCount();
var scChart = dc.compositeChart("#scStepChart");
scChart
.width(600)
.height(400)
.x(d3.scale.linear().domain([0,52]))
.y(d3.scale.linear().domain([0,100]))
.clipPadding(10)
.brushOn(false)
.xAxisLabel("tijd in weken") // "time in weeks"
.yAxisLabel("percentage vrij van residu/recidief") // "percentage free of residual/recurrence"
.compose([
dc.lineChart(scChart)
.dimension(recDim)
.group(recGroup)
.interpolate("step-after")
.renderDataPoints(true)
.renderTitle(true)
.keyAccessor(function(d){return d.key;})
.valueAccessor(function(d){return (d.value/cf1.groupAll().reduceCount().value()*100);}),
dc.lineChart(scChart)
.dimension(resDim)
.group(resGroup)
.interpolate("step-after")
.renderDataPoints(true)
.colors(['orange'])
.renderTitle(true)
.keyAccessor(function(d){return d.key;})
.valueAccessor(function(d){return (d.value/cf1.groupAll().reduceCount().value()*100 );})
])
.xAxis().ticks(4);
scChart.render();
This gives the following result:
As you can see, my first problem is that I need the line to extend to the y-axis, with x = 0 weeks and y = 100% as the first data point.
So that's question number one: is there a way to make the line look more like my sketch (starting on the y-axis at 100%)?
My second and bigger problem is that it shows the inverse of the percentage I need (e.g. 38 instead of 62). This is because of the way the data is structured (which is something I would rather not change).
First I tried changing the valueAccessor to return 100 minus the calculated number, which is the obvious way to solve this. However, my result was this:
As you can see, the curve now rises, which is never possible for a survival curve. That is my second question: any ideas how to fix this?
Ah, it wasn't clear from the particular example that each data point should be based on the last, but your comment makes that clear. It sounds like what you are looking for is a kind of cumulative sum - in your case, a cumulative subtraction.
There is an entry in the FAQ for this.
Adapting that code to your use case:
function accumulate_subtract_from_100_group(source_group) {
return {
all:function () {
var cumulate = 100;
return source_group.all().map(function(d) {
cumulate -= d.value;
return {key:d.key, value:cumulate};
});
}
};
}
Use it like this:
var decayRecGroup = accumulate_subtract_from_100_group(recGroup)
// ...
dc.lineChart(scChart)
// ...
.group(decayRecGroup)
and similarly for the resGroup
While we're at it, we can prepend the initial point to the data, to answer your first question:
function accumulate_subtract_from_100_and_prepend_start_point_group(source_group) {
return {
all:function () {
var cumulate = 100;
return [{key: 0, value: cumulate}]
.concat(source_group.all().map(function(d) {
cumulate -= d.value;
return {key:d.key, value:cumulate};
}));
}
};
}
(ridiculous function name for exposition only!)
EDIT: here is @Erik's final adapted answer, with the percentage conversion built in and a couple of performance improvements:
function fakeGrouper(source_group) {
var groupAll = cf1.groupAll().reduceCount();
return {
all:function () {
var cumulate = 100;
var total = groupAll.value();
return [{key: 0, value: cumulate}]
.concat(source_group.all().map(function(d) {
if(d.key > 0) {
cumulate -= (d.value/total*100).toFixed(0);
}
return {key:d.key, value:cumulate};
}));
}
};
}
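To check what such a wrapper returns without a crossfilter instance, it can be fed a stub group. A minimal sketch; decayPercentGroup and stubGroup are hypothetical names, and the record count is passed in rather than read from cf1.groupAll(). Unlike the version above, it keeps a running count of remaining records and converts that to a percentage at each step (so rounding does not accumulate), and it omits the -99 survivor bucket instead of emitting it:

```javascript
// Hypothetical variant of the fake group above: cumulative decay from 100%.
function decayPercentGroup(sourceGroup, total) {
  return {
    all: function () {
      let remaining = total;
      // Start the curve on the y-axis at 100%.
      const points = [{ key: 0, value: 100 }];
      sourceGroup.all().forEach(function (d) {
        // Skip the survivors bucket (key -99) and anything not after week 0.
        if (d.key > 0) {
          remaining -= d.value;
          points.push({ key: d.key, value: Math.round((remaining / total) * 100) });
        }
      });
      return points;
    },
  };
}

// Stub standing in for recDim.group().reduceCount(); crossfilter returns groups sorted by key.
const stubGroup = {
  all: () => [
    { key: -99, value: 62 }, // survivors
    { key: 3, value: 10 },
    { key: 10, value: 20 },
    { key: 30, value: 8 },
  ],
};
console.log(decayPercentGroup(stubGroup, 100).all());
```

Since the values are already percentages, the chart's valueAccessor could stay at its default. Note that here the total is fixed when the wrapper is created, so it will not follow crossfilter filters; reading it inside all(), as the version above does, avoids that.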
In my app I use the ios-charts library (the Swift counterpart of MPAndroidChart).
All I need is to display a line chart with dates and values.
Right now I use this function to display the chart:
func setChart(dataPoints: [String], values: [Double]) {
var dataEntries: [ChartDataEntry] = []
for i in 0..<dataPoints.count {
let dataEntry = ChartDataEntry(value: values[i], xIndex: i)
dataEntries.append(dataEntry)
}
let lineChartDataSet = LineChartDataSet(yVals: dataEntries, label: "Items count")
let lineChartData = LineChartData(xVals: dataPoints, dataSet: lineChartDataSet)
dateChartView.data = lineChartData
}
And this is my data:
xItems = ["27.05", "03.06", "17.07", "19.09", "20.09"] //String
let unitsSold = [25.0, 30.0, 45.0, 60.0, 20.0] //Double
But as you can see, xItems are dates in "dd.mm" format. Because they are strings, they are spaced evenly. I want the spacing to reflect the real dates; for example, 19.09 and 20.09 should be very close together. I know I should map each day to some number to accomplish this, but I don't know what to do next: how can I adjust the spacing of the x labels?
UPDATE
After a bit of research I found that many developers had asked for this feature, but nothing had happened. For my case I found a very interesting Swift alternative to this library, PNChart. It is easy to use and it solves my problem.
The easiest solution is to loop through your data and add a ChartDataEntry with a value of 0 and a corresponding label for each missing date.
In response to the question in the comments here is a screenshot from one of my applications where I am filling in date gaps with 0 values:
In my case I wanted the 0 values rather than an averaged line from data point to data point as it clearly indicates there is no data on the days skipped (8/11 for instance).
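The gap filling can be sketched independently of the chart library. A minimal JavaScript sketch, assuming the "dd.mm" labels are sorted and all fall within one year (2001 is an arbitrary placeholder year, since the labels carry none); fillDateGaps is a hypothetical helper. Each day gets its own index in the output, so the position of a label is proportional to its real date, and the arrays can then be turned into x labels and ChartDataEntry objects:

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Parse a "dd.mm" label into a UTC timestamp in an assumed fixed year.
function parseDdMm(label, year) {
  const [dd, mm] = label.split(".").map(Number);
  return Date.UTC(year, mm - 1, dd);
}

// Format a UTC timestamp back into "dd.mm".
function formatDdMm(ms) {
  const d = new Date(ms);
  const dd = String(d.getUTCDate()).padStart(2, "0");
  const mm = String(d.getUTCMonth() + 1).padStart(2, "0");
  return `${dd}.${mm}`;
}

// One entry per calendar day between the first and last label; missing days get 0.
function fillDateGaps(labels, values, year = 2001) {
  const byDay = new Map();
  labels.forEach((label, i) => byDay.set(parseDdMm(label, year), values[i]));
  const start = parseDdMm(labels[0], year);
  const end = parseDdMm(labels[labels.length - 1], year);
  const outLabels = [];
  const outValues = [];
  for (let t = start; t <= end; t += DAY_MS) {
    outLabels.push(formatDdMm(t));
    outValues.push(byDay.has(t) ? byDay.get(t) : 0);
  }
  return { labels: outLabels, values: outValues };
}

const filled = fillDateGaps(["17.07", "19.09", "20.09"], [45, 60, 20]);
console.log(filled.labels.length); // 66 days from 17.07 to 20.09 inclusive
```

To skip the missing days instead of plotting 0, keep the dense label list but only create entries for the days that have a value.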
From @Philipp Jahoda's comments, it sounds like you could skip the 0-value entries and just index the data you have to the correct labels.
I modified the MPAndroidChart example program to skip a few data points and this is the result:
As @Philipp Jahoda mentioned in the comments, the chart handles a missing Entry by simply connecting to the next data point. In the code below I generate x values (labels) for the entire data set but skip the y values (data points) for indices 11 to 29, which is what you want. The only thing remaining would be to handle the x labels, since it sounds like you don't want 15, 20, and 25 from my example to show up.
ArrayList<String> xVals = new ArrayList<String>();
for (int i = 0; i < count; i++) {
xVals.add((i) + "");
}
ArrayList<Entry> yVals = new ArrayList<Entry>();
for (int i = 0; i < count; i++) {
if (i > 10 && i < 30) {
continue;
}
float mult = (range + 1);
float val = (float) (Math.random() * mult) + 3; // random demo value
yVals.add(new Entry(val, i));
}
What I did was supply every date as an x value, even when there is no y value for it, and simply not add a data entry for that xIndex. The chart then draws no y value at that xIndex, which achieves what you want. This is the easiest approach, since you just write a for loop and continue whenever there is no y value.
I don't suggest using 0 or NaN: in a line chart the 0 values will be connected into the line, and NaN causes problems. You might want to break the line instead, but ios-charts does not support that yet (I have also requested this feature), so you would need to write your own code to break the line, or live with either connecting the 0 values or connecting to the next valid data point.
The downside is a possible performance drop because there are many xIndex values, but I tried about 1000 and it was acceptable. I asked for such a feature a long time ago, but it is taking a long time to be worked out.
Here's a function I wrote based on Wingzero's answer (I pass NaN for the entries in the values array that are empty):
func populateLineChartView(lineChartView: LineChartView, labels: [String], values: [Float]) {
var dataEntries: [ChartDataEntry] = []
for i in 0..<labels.count {
if !values[i].isNaN {
let dataEntry = ChartDataEntry(value: Double(values[i]), xIndex: i)
dataEntries.append(dataEntry)
}
}
let lineChartDataSet = LineChartDataSet(yVals: dataEntries, label: "Label")
let lineChartData = LineChartData(xVals: labels, dataSet: lineChartDataSet)
lineChartView.data = lineChartData
}
What worked for me was splitting the LineDataSet into two LineDataSets: the first holds the y values up to the gap, and the second holds those after it.
// create two LineDataSets: set1 holds the values before the gap, set2 the values after it
set1 = new LineDataSet(yVals1, "DataSet 1");
set2 = new LineDataSet(yVals2, "DataSet 1");
//load datasets into datasets array
ArrayList<ILineDataSet> dataSets = new ArrayList<ILineDataSet>();
dataSets.add(set1);
dataSets.add(set2);
//create a data object with the datasets
LineData data = new LineData(xVals, dataSets);
// set data
mChart.setData(data);
I am doing something very simple: my goal is to move one skeleton based on the position of another skeleton, and for this I use the HipCenter position.
(This algorithm could be wrong; this question is about an exception occurring in the foreach loop.)
Here is my actual code:
public static Skeleton MoveTo(this Skeleton skOrigin, Skeleton skDestiny)
{
Skeleton skReturn = skOrigin; // just making a copy
// find the factor to move, based on the HipCenter.
float whatToMultiplyX = skOrigin.Joints[JointType.HipCenter].Position.X / skDestiny.Joints[JointType.HipCenter].Position.X;
float whatToMultiplyY = skOrigin.Joints[JointType.HipCenter].Position.Y / skDestiny.Joints[JointType.HipCenter].Position.Y;
float whatToMultiplyZ = skOrigin.Joints[JointType.HipCenter].Position.Z / skDestiny.Joints[JointType.HipCenter].Position.Z;
SkeletonPoint movedPosition = new SkeletonPoint();
Joint movedJoint = new Joint();
foreach (JointType item in Enum.GetValues(typeof(JointType)))
{
// Updating the position
movedPosition.X = skOrigin.Joints[item].Position.X * whatToMultiplyX;
movedPosition.Y = skOrigin.Joints[item].Position.Y * whatToMultiplyY;
movedPosition.Z = skOrigin.Joints[item].Position.Z * whatToMultiplyZ;
// Setting the updated position to the skeleton that will be returned.
movedJoint.Position = movedPosition;
skReturn.Joints[item] = movedJoint;
}
return skReturn;
}
Stepping through with F10, everything works fine until the second pass through the foreach loop.
On the second pass through the foreach I get an exception on this line:
skReturn.Joints[item] = movedJoint;
The exception says:
JointType index value must match Joint.JointType
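But the value of item is actually Spine.
What's wrong?
Solved. The problem is that movedJoint is created with new Joint(), so its JointType keeps the default value, HipCenter. That matches item on the first pass through the loop, but on the second pass item is Spine and the types no longer match. The fix is to start from the joint already in the skeleton, which carries the matching JointType, and change only its Position: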
// skToBeMoved, skToBeChanged and someNumber are placeholders for your own skeletons and offset
// Iterate over the 20 joints
foreach (JointType item in Enum.GetValues(typeof(JointType)))
{
// Start from the existing joint: it already has the JointType that matches item
Joint newJoint = skToBeMoved.Joints[item];
// applying the new values to the joint
SkeletonPoint pos = new SkeletonPoint()
{
X = (float)(newJoint.Position.X + someNumber),
Y = (float)(newJoint.Position.Y + someNumber),
Z = (float)(newJoint.Position.Z + someNumber)
};
newJoint.Position = pos;
skToBeChanged.Joints[item] = newJoint;
}
This will work.
I'm creating a Dojo line chart from a dojo.data.ItemFileReadStore using a dojox.charting.DataSeries. I'm using the third parameter (value) of the constructor of DataSeries to specify a method which will generate the points on the chart. e.g.
function formatLineGraphItem(store,item)
{
var o = {
x: graphIndex++,
y: store.getValue(item, "fileSize") // no trailing comma: older IE versions reject it
};
return o;
}
The graphIndex is an integer which is incremented for every fileSize value. This gives me a line chart with the fileSize shown against a numeric count. This works fine.
What I'd like is to be able to specify the x-axis label to use instead of the value of graphIndex, i.e. the underlying data will still be 1, 2, 3, 4, but the label will show text (in this case the time at which the file size was captured).
I can do this by passing an array of labels to the x axis when I call chart.addAxis(), but this requires me to know the values before I iterate through the data. For example:
var dataSeriesConfig = {query: {id: "*"}};
var xAxisLabels = [{text:"2011-11-20",value:1},{text:"2011-11-21",value:2},{text:"2011-11-22",value:3}];
var chart1 = new dojox.charting.Chart("chart1");
chart1.addPlot("default", {type: "Lines", tension: "4"});
chart1.addAxis("x", {labels: xAxisLabels});
chart1.addAxis("y", {vertical: true});
chart1.addSeries("Values", new dojox.charting.DataSeries(dataStore, dataSeriesConfig, formatLineGraphItem));
chart1.render();
The xAxisLabels array can be created by preparsing the data series, but it's not a very nice workaround.
Does anyone have any ideas how the formatLineGraphItem method could be extended to provide the x-axis labels? Or does anyone have any documentation on what values the object o can contain?
Thanks in advance!
This will take a Unix timestamp, multiply the value by 1000 (JavaScript dates use milliseconds) and then pass the value to dojo.date.locale.format to format it.
You shouldn't have any problems editing this to the format you need.
The example values you provided are "1", "2", "3", which aren't dates, so this is the best I can do unless you edit your question.
chart1.addAxis("x", {
labelFunc: function (n) {
var value = dojo.number.parse(n);
// only label whole-number ticks; return a blank label for everything else
if (isNaN(value) || value % 1 != 0) {
return " ";
}
// I am assuming that your timestamp is in seconds and needs to be multiplied by 1000.
var date = new Date(value * 1000);
return dojo.date.locale.format(date, {
selector: "date",
datePattern: "dd MMMM",
locale: "en"
});
},
maxLabelSize: 100
});
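The label logic itself can be checked without Dojo. A minimal sketch of what the labelFunc above does, assuming the tick text is a Unix timestamp in seconds, possibly with "," as the thousands separator; formatTickLabel is a hypothetical name, and it formats in UTC with hard-coded English month names instead of calling dojo.date.locale.format (which uses local time):

```javascript
const MONTHS = [
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December",
];

// Hypothetical Dojo-free version of the labelFunc: blank label for non-integer ticks,
// otherwise "dd MMMM" for a Unix timestamp in seconds, formatted in UTC.
function formatTickLabel(text) {
  // Assumes "," is the thousands separator in the tick text (en-US style).
  const cleaned = String(text).replace(/,/g, "").trim();
  const value = Number(cleaned);
  if (cleaned === "" || !Number.isInteger(value)) {
    return " ";
  }
  const date = new Date(value * 1000); // seconds -> milliseconds
  const day = String(date.getUTCDate()).padStart(2, "0");
  return `${day} ${MONTHS[date.getUTCMonth()]}`;
}

console.log(formatTickLabel("1321747200")); // 20 November
```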