I am simply trying to cycle through a list of (10) names using an incrementing counter by taking the modulus of the counter with respect to the length of the list. However, the code seems to skip a number here and there. I have tried both modf() and modff() and different type castings, but no luck.
Here is an example of the code:
defaultNameList = [NSArray arrayWithObjects:@"RacerX", @"Speed", @"Sprittle", @"Chim-Chim", @"Pops", @"Dale", @"Junior", @"Chip", @"Fred", @"Barney", nil];
float intpart;
int pickName = (int)(modff(entryCount/10.0,&intpart) * 10.0);
NSLog(@"%ld %f %f %f %d %@", entryCount, entryCount/10.0, modff(entryCount/10.0, &intpart), modff(entryCount/10.0, &intpart) * 10.0, pickName, [defaultNameList objectAtIndex:pickName]);
The console gives:
0 0.000000 0.000000 0.000000 0 RacerX
1 0.100000 0.100000 1.000000 1 Speed
2 0.200000 0.200000 2.000000 2 Sprittle
3 0.300000 0.300000 3.000000 3 Chim-Chim
4 0.400000 0.400000 4.000000 4 Pops
5 0.500000 0.500000 5.000000 5 Dale
6 0.600000 0.600000 6.000000 6 Junior
7 0.700000 0.700000 7.000000 6 Junior
8 0.800000 0.800000 8.000000 8 Fred
9 0.900000 0.900000 9.000000 8 Fred
10 1.000000 0.000000 0.000000 0 RacerX
As far as I can tell it should not skip pickName = 7 or 9, but it does.
Casting to (int) truncates the value. That is, if the result cannot be represented exactly in the floating-point format used on your architecture and comes out slightly below the exact value, the cast rounds it toward zero. To solve this problem, round the number instead of truncating:
int pickName = (int)(modff(entryCount / 10.0, &intpart) * 10.0 + 0.5);
(This assumes that the number is not negative.)
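For what it's worth, the effect is easy to reproduce outside Objective-C. A small Python sketch, using numpy's float32 to stand in for the single-precision value modff() returns (the multiply by 10.0 then happens in double precision, as in your expression):

import numpy as np

for entry_count in range(11):
    # fractional part, computed in single precision like modff()
    frac = np.float32(entry_count / 10.0) % np.float32(1.0)
    scaled = float(frac) * 10.0  # promoted to double, as in the original code
    print(entry_count, scaled, int(scaled))

For entry_count = 7 this prints 6.99999988079071, which int() truncates to 6; the same thing happens at 9.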
However, since you're working with integers here, and floating-point operations are expensive, you should consider using the modulo operator instead (which operates on integers):
int pickName = entryCount % 10;
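For comparison, the same cycling pattern sketched in Python; with integer arithmetic the index always lands exactly:

names = ["RacerX", "Speed", "Sprittle", "Chim-Chim", "Pops",
         "Dale", "Junior", "Chip", "Fred", "Barney"]

for entry_count in range(12):
    print(entry_count, names[entry_count % len(names)])  # wraps cleanly after 9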
Following is the code I am using to generate a list and write it to a text file:
import numpy as np

c = 0
a = []
for i in range(1, 16, 1):
    b = i / 10
    c += 1
    a.append([c, b])

np.savetxt('test.txt', a, delimiter=" ", fmt="%s")
When the list a is printed, the values taken by c are integers. However, when a is written to a file, c becomes a float. Is it possible to write both integers and floats to a text file using numpy.savetxt?
You can specify the format of each value. In your case, where np.array(a) produces a 2D array with 2 columns:
np.savetxt('your_file.txt', a, delimiter=' ', fmt='%d %f')
where fmt='%d %f' corresponds to an integer followed by a float.
The .txt file now contains:
1 0.100000
2 0.200000
3 0.300000
4 0.400000
5 0.500000
6 0.600000
7 0.700000
8 0.800000
9 0.900000
10 1.000000
11 1.100000
12 1.200000
13 1.300000
14 1.400000
15 1.500000
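As a side note, fmt also accepts a sequence of per-column format strings, which numpy joins with the delimiter, so this should be equivalent:

np.savetxt('your_file.txt', a, delimiter=' ', fmt=['%d', '%f'])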
I have two dataframes extracted from two attached files.
I want to compute the JaroWinkler similarity for the tokens inside the files. I am using the code below.
from similarity.jarowinkler import JaroWinkler
jarowinkler = JaroWinkler()
df_gt['jarowinkler_sim'] = [jarowinkler.similarity(x.lower(), y.lower()) for x, y in zip(df_ex['abstract_ex'], df_gt['abstract_gt'])]
I am facing two problems:
1. The order of the tokens is not being handled.
When the positions of the tokens 'can' and 'interesting' are swapped, the similarity is computed against the wrong tokens!!
Unnamed: 0 abstract_gt jarowinkler_sim
0 0 Bipartite 1.000000
1 1 fluctuations 0.914141
2 2 can 0.474747 <--|
3 3 provide 1.000000 |-- Position swapped in one file
4 4 interesting 0.474747 <--|
5 5 information 1.000000
6 6 about 1.000000
7 7 entanglement 1.000000
8 8 properties 1.000000
9 9 and 1.000000
10 10 correlations 1.000000
2. The dataframes might not always be the same size.
When one of the dataframes contains fewer elements, my solution gives an error:
ValueError: Length of values (10) does not match length of index (11)
How can I solve these two problems and compute the similarity accurately?
Thanks !!
TSV FILES
1. df_ex
abstract_ex
0 Bipartite
1 fluctuations
2 interesting
3 provide
4 can
5 information
6 about
7 entanglement
8 properties
9 and
10 correlations
2. df_gt
abstract_gt
0 Bipartite
1 fluctuations
2 interesting
3 provide
4 can
5 information
6 about
7 entanglement
8 properties
9 and
10 correlations
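One possible direction (a sketch, not a drop-in answer, and assuming a best-match score is what "accurately" means here): score each df_gt token against every df_ex token and keep the maximum. That makes the result independent of token order and of the two frames having different lengths:

from similarity.jarowinkler import JaroWinkler

jarowinkler = JaroWinkler()
# for every ground-truth token, keep its best similarity over all extracted tokens
df_gt['jarowinkler_sim'] = [
    max(jarowinkler.similarity(gt.lower(), ex.lower()) for ex in df_ex['abstract_ex'])
    for gt in df_gt['abstract_gt']
]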
I have the following dataframe.
precision recall F1 cutoff
cutoff
0 0.690148 1.000000 0.814610 0
1 0.727498 1.000000 0.839943 1
2 0.769298 0.916667 0.834051 2
3 0.813232 0.916667 0.859741 3
4 0.838062 0.833333 0.833659 4
5 0.881454 0.833333 0.854946 5
6 0.925455 0.750000 0.827202 6
7 0.961111 0.666667 0.786459 7
8 0.971786 0.500000 0.659684 8
9 0.970000 0.166667 0.284000 9
10 0.955000 0.083333 0.152857 10
I want to plot the cutoff column on the x-axis, and the precision, recall, and F1 values as separate lines (in different colors) on the same plot. How can I do that?
When I try to plot the dataframe, the cutoff column is included in the plot as well.
Thanks
Remove the column before plotting:
df.drop('cutoff', axis=1).plot()
But maybe the problem is how the index is created; it may help to change:
df = df.set_index(df['cutoff'])
df.drop('cutoff', axis=1).plot()
to:
df = df.set_index('cutoff')
df.plot()
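A self-contained sketch of that second approach, assuming df holds the frame from the question and matplotlib is available:

import matplotlib.pyplot as plt

ax = df.set_index('cutoff').plot()  # cutoff becomes the x-axis; one colored line per remaining column
ax.set_xlabel('cutoff')
ax.set_ylabel('score')
plt.show()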
Suppose, I have this kind of data:
date pollution dew temp press wnd_dir wnd_spd snow rain
2010-01-02 00:00:00 129.0 -16 -4.0 1020.0 SE 1.79 0 0
2010-01-02 01:00:00 148.0 -15 -4.0 1020.0 SE 2.68 0 0
2010-01-02 02:00:00 159.0 -11 -5.0 1021.0 SE 3.57 0 0
2010-01-02 03:00:00 181.0 -7 -5.0 1022.0 SE 5.36 1 0
2010-01-02 04:00:00 138.0 -7 -5.0 1022.0 SE 6.25 2 0
I want to apply a neural network for time-series prediction of pollution.
It should be noted that the other variables (dew, temp, press, wnd_dir, wnd_spd, snow, rain) are independent variables with respect to pollution.
If I implement an LSTM as shown here, the LSTM learns all the variables together and the model can predict all of them.
But it is not necessary to predict all the independent variables; the only requirement is pollution, the dependent variable.
Is there any way to implement an LSTM, or another better-suited architecture, that uses the independent variables as inputs but learns and predicts only the dependent variable, giving a much better prediction of pollution?
It seems like the example is already predicting only pollution. If you look at the reframed data:
var1(t-1) var2(t-1) var3(t-1) var4(t-1) var5(t-1) var6(t-1) \
1 0.129779 0.352941 0.245902 0.527273 0.666667 0.002290
2 0.148893 0.367647 0.245902 0.527273 0.666667 0.003811
3 0.159960 0.426471 0.229508 0.545454 0.666667 0.005332
4 0.182093 0.485294 0.229508 0.563637 0.666667 0.008391
5 0.138833 0.485294 0.229508 0.563637 0.666667 0.009912
var7(t-1) var8(t-1) var1(t)
1 0.000000 0.0 0.148893
2 0.000000 0.0 0.159960
3 0.000000 0.0 0.182093
4 0.037037 0.0 0.138833
5 0.074074 0.0 0.109658
var1 appears to be pollution. As you can see, you have the values from the previous step (t-1) for all variables, but only pollution (var1(t)) for the current step t.
This last column is what the example feeds in as y, as you can see in these lines:
# split into input and outputs
train_X, train_y = train[:, :-1], train[:, -1]
test_X, test_y = test[:, :-1], test[:, -1]
So the network should already be predicting only pollution.
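For reference, a minimal sketch of that pattern with Keras (shapes assume the reframed, scaled array above, with var1(t) as the last column; this mirrors the linked example rather than adding anything new):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# all (t-1) features as input, pollution at time t as the only target
train_X, train_y = train[:, :-1], train[:, -1]
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))  # [samples, timesteps, features]

model = Sequential()
model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dense(1))  # one output unit: the network predicts pollution only
model.compile(loss='mae', optimizer='adam')
model.fit(train_X, train_y, epochs=50, batch_size=72, verbose=2)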
I am interested in sorting a grouped dataframe by the number of entries in each group. As far as I can see, I can either sort by the group labels or not at all. Say I have 10 entries that belong to three groups: group A has 6 members, group B has 3, and group C has 1. Now when I do, e.g., grouped.describe(), I would like the output ordered so that the group with the most entries comes first.
I would take the statistics from describe() (recent pandas already lays them out as columns; older versions needed an extra unstack()), then you can simply use sort_values(), so:
from io import StringIO
import pandas as pd

incsv = StringIO("""Group,Value
B,1
B,2
B,3
C,8
A,5
A,10
A,15
A,25
A,35
A,40""")
df = pd.read_csv(incsv)
dfstats = df.groupby('Group').describe()
Value
count mean std min 25% 50% 75% max
Group
A 6 21.666667 14.023789 5 11.25 20 32.5 40
B 3 2.000000 1.000000 1 1.50 2 2.5 3
C 1 8.000000 NaN 8 8.00 8 8.0 8
dfstats.xs('Value', axis=1).sort_values('count', ascending=True)
count mean std min 25% 50% 75% max
Group
C 1 8.000000 NaN 8 8.00 8 8.0 8
B 3 2.000000 1.000000 1 1.50 2 2.5 3
A 6 21.666667 14.023789 5 11.25 20 32.5 40
I reversed the sort just for illustration, because it was already in that order by default, but you can sort any way you want, of course.
Bonus for anyone who can sort by count without dropping or stacking the 'Value' level. :)
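For the bonus, one option that seems to work in recent pandas is passing the full column tuple to sort_values, which leaves the 'Value' level in place:

# sort by the ('Value', 'count') column directly; no xs() or drop needed
dfstats.sort_values(('Value', 'count'), ascending=False)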