Below is example data: probabilities from predict_proba. I want to split this data frame into deciles with an equal number of rows in each decile. I used pd.qcut, but because of repeated values at the bin boundaries, the number of rows per decile comes out unequal.
I used the method below to get equal splits. It worked, but with this approach I can't get the bins (the probability range of each decile).
test_df["TOP_DECILE"] = pd.qcut(test_df["VALIDATION_PROB_1"].rank(method='first'), 10, retbins = False, labels = [ 10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
For each decile we also need to see the probability range. This is the final output we need:
Is there a clean way to achieve this?
This is how I implemented finally:
test_df["TOP_DECILE"] = pd.qcut(test_df["VALIDATION_PROB_1"].rank(method='first'), 10, retbins = False, labels = [100, 90, 80, 70, 60, 50, 40, 30, 20, 10])
test_df = test_df.merge(test_df.groupby('TOP_DECILE')["VALIDATION_PROB_1"].agg(['min', 'max']), right_index=True, left_on='TOP_DECILE')
test_df["PROBILITY_RANGE"] = "[" + (test_df["min"]).astype(str) + " - " + test_df["max"].astype(str) + "]"
But there should be a cleaner approach.
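One possibly cleaner variant is sketched below. It keeps the rank-based pd.qcut from above and replaces the merge with groupby(...).transform, which broadcasts each decile's min and max back onto the original rows. The sample probabilities are made up for illustration (with deliberate ties); only the column names come from the code above.

```python
import pandas as pd

# Hypothetical sample data with many repeated values (ties), 100 rows.
probs = [(i * 7 % 20) / 20 for i in range(100)]
test_df = pd.DataFrame({"VALIDATION_PROB_1": probs})

# Ranking with method='first' breaks ties by order of appearance,
# so every decile receives exactly the same number of rows.
ranks = test_df["VALIDATION_PROB_1"].rank(method="first")
test_df["TOP_DECILE"] = pd.qcut(
    ranks, 10, labels=[100, 90, 80, 70, 60, 50, 40, 30, 20, 10]
)

# transform() returns a value per original row, so no merge is needed.
grouped = test_df.groupby("TOP_DECILE", observed=True)["VALIDATION_PROB_1"]
low = grouped.transform("min")
high = grouped.transform("max")
test_df["PROBILITY_RANGE"] = "[" + low.astype(str) + " - " + high.astype(str) + "]"

print(test_df.drop_duplicates("TOP_DECILE")[["TOP_DECILE", "PROBILITY_RANGE"]])
```

Because ties are split across neighbouring deciles, the ranges of adjacent deciles can share an endpoint; that is a consequence of forcing equal counts, not of this particular implementation.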
I'm attempting to iterate over a list of lists that includes a date range of 1978-2020, but with only built-in Python modules. For instance, my nested list looks something like:
listing =[['0010', 'green', '1978', 'light'], ['0020', 'blue', '1978', 'dark'], ... ['2510', 'red', '2020', 'light']]
As I am iterating through, I am trying to make an aggregated count of colors and shades for that year, and then append that year's totals into a new list such as:
# ['year', 'blues', 'greens', 'light', 'dark']
annual_totals = [['1978', 12, 34, 8, 16], ['1979', 14, 40, 13, 9], ... , ['2020', 48, 98, 14, 10]]
So my failed code looks something like this:
annual_totals = []
for i in range(1978, 2021):
    for line in listing:
        while i == line[2]:  # if year in list same as year in iterated range, count tally for year
            blue = 0
            green = 0
            light = 0
            dark = 0
            if line[1] == 'blue':
                blue += 1
            if line[1] == 'green':
                blue += 1
            if line[3] == 'light':
                light += 1
            if line[3] == 'dark':
                dark += 1
            tally = [i, 'blue', 'green', 'light', 'dark']
            annual_totals.append(tally)
Of course, I never get out of the While loop to get a new year for iterable i.
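A minimal sketch of one way to do this with built-in modules only, assuming each row has the shape [id, color, year, shade] and that the year is stored as a string, as in the example list. The function name tally_by_year and the sample rows are made up for illustration. Instead of a while loop, it makes a single pass over the rows and keeps one collections.Counter per year.

```python
from collections import Counter

# Hypothetical sample rows in the same shape as `listing`.
listing = [
    ['0010', 'green', '1978', 'light'],
    ['0020', 'blue', '1978', 'dark'],
    ['0030', 'blue', '1978', 'light'],
    ['2510', 'red', '2020', 'light'],
]

def tally_by_year(rows, first_year=1978, last_year=2020):
    # One Counter per year; keys are strings because the rows store years as strings.
    counts = {str(year): Counter() for year in range(first_year, last_year + 1)}
    for _id, color, year, shade in rows:
        if year in counts:
            counts[year][color] += 1
            counts[year][shade] += 1
    # A Counter returns 0 for missing keys, so empty years yield zeros.
    return [
        [year, c['blue'], c['green'], c['light'], c['dark']]
        for year, c in counts.items()
    ]

annual_totals = tally_by_year(listing)
print(annual_totals[0])
```

Sharing one Counter for colors and shades works here only because the two sets of labels do not overlap; with overlapping labels, two separate counters per year would be safer.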
I'm developing a Python script that receives angular measurements from a motor with a low-resolution encoder attached to it. The data I get from the motor has a very low resolution (about 5 degrees between measurements). This is an example of the sensor output while it is rotating at a constant speed (in degrees):
sensor output = ([5, 5, 5, 5, 5, 10, 10, 10, 10 ,10, 15, 15, 20, 20, 20, 20, 25, 25, 30, 30, 30, 30, 30, 35, 35....])
As you can see, some of these measurements are repeating themselves.
From these measurements, I would like to interpolate in order to get the measurements in between the 1D data-points. For instance, if I at time k receive the angular measurement theta=5 and in the next instance at t=k+1 also receive a measurement of theta=5, I would like to compute an estimate that would be something like theta = 5+(1/5).
I have also been considering some sort of predictive filtering (e.g. Kalman filtering), but I'm not sure which filter to apply, or whether that is even applicable in this case. The estimated output should be linear, since the motor is rotating with a constant angular velocity.
I have tried using numpy.linspace to achieve this, but cannot get it to work the way I want:
# Interpolate for every 'theta_div' values in angle received through
# modbus
for k in range(np.size(rx)):
    y = T.readSensorData()  # take measurement (call read sensor function)
    fp = np.linspace(y, y+1, num=theta_div)
    for n in range(theta_div):
        if k % 6 == 0:
            if not y == fp[n]:
                z = fp[n]
            else:
                z = y
    print(z)
So for the sensor readings: ([5, 5, 5, 5, 5, 10, 10, 10, 10 ,10, 15, 15, 20, 20, 20, 20, 25, 25, 30, 30, 30, 30, 30, 35, 35....]) # each element at time=k0...kn
I would like the output to be something similar to:
theta = ([5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17.5, 20...])
So in short, I need some sort of prediction and then update the value with the actual reading from the sensor, similar to the procedure in a Kalman filter.
Why not just do a linear fit?
import numpy as np
import matplotlib.pyplot as plt

messurements = np.array([5, 5, 5, 5, 5, 10, 10, 10, 10 ,10, 15, 15, 20, 20, 20, 20, 25, 25, 30, 30, 30, 30, 30, 35, 35])
time_array = np.arange(messurements.shape[0])
# Least-squares straight-line fit; returns [slope, intercept]
fitparms = np.polyfit(time_array, messurements, 1)

def line(x, a, b):
    return a*x + b

better_time_array = np.linspace(0, np.max(time_array))
plt.plot(time_array, messurements)
plt.plot(better_time_array, line(better_time_array, fitparms[0], fitparms[1]))
plt.show()
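If a single straight line is too coarse, a piecewise alternative is to interpolate only between the samples where the encoder value changes. The sketch below does that with np.interp, under the assumption that the whole recording is available up front. It is an offline reconstruction, not a real-time predictor: each estimate uses the time of the next step, which a live filter would not know yet.

```python
import numpy as np

readings = np.array([5, 5, 5, 5, 5, 10, 10, 10, 10, 10, 15, 15, 20, 20, 20, 20,
                     25, 25, 30, 30, 30, 30, 30, 35, 35], dtype=float)
t = np.arange(readings.size)

# Mark the first sample of every encoder step (the value differs from the previous one).
is_step = np.concatenate(([True], np.diff(readings) != 0))
step_t = t[is_step]           # sample indices at which a new value first appears
step_val = readings[is_step]  # the encoder value at those indices

# Linear interpolation between consecutive steps; after the last step
# np.interp simply holds the final value.
theta = np.interp(t, step_t, step_val)
print(theta[:13])
```

For this sample the first values come out as 5, 6, 7, ..., 15, 17.5, 20, which matches the output requested in the question.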
I have a dataframe like this:
ID Series
1102 [('taxi instructions', 13, 30, 'NP'), ('consistent basis', 31, 47, 'NP'), ('the atc taxi clearance', 89, 111, 'NP')]
1500 [('forgot data pages info', 0, 22, 'NP')]
649 [('hud', 0, 3, 'NP'), ('correctly fotr approach', 12, 35, 'NP')]
I am trying to parse the text in the column named Series into separate columns named Series1, Series2, etc., up to the highest number of texts parsed. This is what I tried:
df_parsed = df['Series'].str[1:-1].str.split(', ', expand = True)
The result should look something like this:
ID Series Series1 Series2 Series3
1102 [('taxi instructions', 13, 30, 'NP'), ('consistent basis', 31, 47, 'NP'), ('the atc taxi clearance', 89, 111, 'NP')] taxi instructions consistent basis the atc taxi clearance
1500 [('forgot data pages info', 0, 22, 'NP')] forgot data pages info
649 [('hud', 0, 3, 'NP'), ('correctly fotr approach', 12, 35, 'NP')] hud correctly fotr approach
The format of your final result is not easy to understand, but maybe you can follow the concept to create your new columns:
def process(ls):
    # Join the first element (the text) of every tuple in the list
    return ' '.join([x[0] for x in ls])

df['Series_new'] = df['Series'].apply(process)
And if you want to create N new columns (N = max_len(Series_list)), I think you can calculate N first. Then, follow the concept above and fill in NaN properly to create N new columns.
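The N-column idea can be sketched as follows, assuming the Series column holds real Python lists of tuples. If it holds their string representation instead, each cell would first need parsing, for example with ast.literal_eval. Building a DataFrame from lists of unequal length pads the shorter rows with missing values, so N does not have to be computed by hand.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1102, 1500, 649],
    "Series": [
        [('taxi instructions', 13, 30, 'NP'), ('consistent basis', 31, 47, 'NP'),
         ('the atc taxi clearance', 89, 111, 'NP')],
        [('forgot data pages info', 0, 22, 'NP')],
        [('hud', 0, 3, 'NP'), ('correctly fotr approach', 12, 35, 'NP')],
    ],
})

# Keep only the text (first element) of every tuple.
texts = df["Series"].apply(lambda tuples: [t[0] for t in tuples])

# One column per position; shorter rows are padded with missing values.
wide = pd.DataFrame(texts.tolist(), index=df.index)
wide.columns = [f"Series{i + 1}" for i in range(wide.shape[1])]

result = df.join(wide)
print(result.drop(columns="Series"))
```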
How are the values in the "points" assignment interpreted? All I want is a "plain jane" line whose length and width I can manipulate. Any help will be greatly appreciated.
var line = new Kinetic.Line({
x: 100,
y: 50,
points: [73, 70, 340, 23, 450, 60, 500, 20],
stroke: 'red',
tension: 1
});
The points array is a series of x,y coordinates:
// [73, 70, 340, 23, 450, 60, 500, 20],
{x:73,y:70},
{x:340,y:23},
{x:450,y:60},
{x:500,y:20}
This is your "plain jane" line:
// draw a black line from 25,25 to 100,50 and width of 5
var line = new Kinetic.Line({
points:[25,25, 100,50],
y:100,
stroke: 'black',
strokeWidth: 5
});