Why would a line of code work when evaluated, but not when run? (DataFrame)

I have passed a DataFrame from an IPython notebook to a function inside a standard .py file.
In the function, I'm using df['column_name'].values to extract all the values from that column.
While debugging (in PyCharm), running this line in the 'evaluate expression' tool works fine. However, when I run the same line normally (outside the tool window), I get an error:
"TypeError: list indices must be integers or slices, not str"
When looking at my DataFrame in the workspace variables section, it is shown correctly as a DataFrame.
The DataFrame I am passing has 3 columns ('x', 'y' and 'likelihood'), each containing integers/floats. The line in question, which unpacks the values inside one of these columns, is the first line of the function.
Can anyone explain how this can happen? What difference can make a line of code work in one context and raise an error in the other?
Also, what might cause this specific TypeError, and how can I solve this bug?
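For reference, here is a minimal snippet that raises the same TypeError. Indexing a plain Python list with a string produces exactly this message, which hints that bp may actually arrive at runtime as a list (for example, a DataFrame accidentally wrapped in a list) rather than as a DataFrame:

import pandas as pd

df = pd.DataFrame({'x': [1.0], 'y': [2.0], 'likelihood': [1.0]})
bp = [df]        # a list holding the DataFrame, not the DataFrame itself
bp['x'].values   # TypeError: list indices must be integers or slices, not str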
The DataFrame, as printed by df.tail(10):
coords           x           y  likelihood
105570  297.497525  332.355521         1.0
105571  297.463208  332.353797         1.0
105572  297.439774  332.383908         1.0
105573  297.457581  332.458205         1.0
105574  297.487260  332.402202         1.0
105575  297.519772  332.451551         1.0
105576  297.495998  332.431064         1.0
105577  297.516722  332.113481         1.0
105578  297.542539  332.080923         1.0
105579  297.528317  332.046282         1.0
Full function:

import math
import numpy as np
import pandas as pd

def filter_jumps(bp, thresh=10):
    """
    The function watches for jumps that cannot happen, and interpolates
    the location of the point accordingly.
    """
    x = bp['x'].values
    y = bp['y'].values
    for i in range(1, len(x)):
        if np.abs(x[i]+y[i]-x[i-1]-y[i-1]) > thresh:
            start = i-1; end = None; idx = i+1
            while not end:
                if np.abs(x[idx]+y[idx]-x[idx-1]-y[idx-1]) > thresh:
                    end = idx
                else:
                    idx += 1
            rang = end - start
            x[start:end] = np.linspace(x[start], x[end], rang)
            y[start:end] = np.linspace(y[start], y[end], rang)
    bp['x'] = x
    bp['y'] = y
    return bp

Related

PyCharm ignores the command "integ()" from numpy.polynomial's Polynomial / on Jupyter it works

When using PyCharm, the integ() function is ignored:
from numpy.polynomial import Polynomial as P
p = P([1, 2, 3])
p.integ()
print(p)
outcome: 1.0 + 2.0 x**1 + 3.0 x**2 (no errors)
On Jupyter it gives me the correct result: x ↦ 0.0 + 1.0·x + 1.0·x² + 1.0·x³
But I really prefer writing code in PyCharm - can anyone tell me why this happens or how I can change it?
First, note that p.integ() doesn't change p. It returns a new polynomial object. When you execute print(p) after this expression, you are printing the original p that was created earlier.
In an interactive shell, with a line such as p.integ() that contains an expression (with no assignment), the shell (i.e. Jupyter) prints the value of the expression in the terminal. This is a feature of Jupyter, not of the Python interpreter. When such an expression is encountered in a program, the Python interpreter evaluates the expression, but does not print it. If you want to print the integral of p, you can do something like
q = p.integ()
print(q)
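Putting it together, a minimal corrected version of the question's script (the printed format may vary slightly across numpy versions):

from numpy.polynomial import Polynomial as P

p = P([1, 2, 3])
q = p.integ()  # integ() returns a new Polynomial; p itself is unchanged
print(q)       # 0.0 + 1.0 x**1 + 1.0 x**2 + 1.0 x**3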

Numpy - AttributeError: 'Zero' object has no attribute 'exp'

I'm having trouble with a discrepancy: something breaks at runtime, but the exact same data and operations work fine in the Python console.
# f_err - currently has value 1.11819388872025
# l_scales - currently a numpy array [1.17840183376334 1.13456764589809]
sq_euc_dists = self.se_term(x1, x2, l_scales) # this is fine. It calls cdists on x1/l_scales, x2/l_scales vectors
return (f_err**2) * np.exp(-0.5 * sq_euc_dists) # <-- errors on this line
The error that I get is
AttributeError: 'Zero' object has no attribute 'exp'
However, calling those exact same lines, with the same f_err, l_scales, and x1, x2 in the console right after it errors out, somehow does not produce errors.
I was not able to find a post referring to the 'Zero' object error specifically, and the non-'Zero' ones I found didn't seem to apply to my case here.
EDIT: It was a bit lacking in info, so here's an actual (extracted) runnable example with sample data taken straight out of a failed run. When run in isolation it works fine - I can't reproduce the error except at runtime.
Note that the sqeucl_dist function below is quite bad and I should be using scipy's cdist instead. However, because I'm using sympy's symbols for matrix elementwise gradients with over 15 partial derivatives in my real data, cdist is not an option, as it doesn't deal with arbitrary objects.
import numpy as np

def se_term(x1, x2, l):
    return sqeucl_dist(x1/l, x2/l)

def sqeucl_dist(x, xs):
    return np.sum([(i-j)**2 for i in x for j in xs], axis=1).reshape(x.shape[0], xs.shape[0])

x = np.array([[-0.29932052, 0.40997373], [0.40203481, 2.19895326], [-0.37679417, -1.11028267], [-2.53012051, 1.09819485], [0.59390005, 0.9735], [0.78276777, -1.18787904], [-0.9300892, 1.18802775], [0.44852545, -1.57954101], [1.33285028, -0.58594779], [0.7401607, 2.69842268], [-2.04258086, 0.43581565], [0.17353396, -1.34430191], [0.97214259, -1.29342284], [-0.11103534, -0.15112815], [0.41541759, -1.51803154], [-0.59852383, 0.78442389], [2.01323359, -0.85283772], [-0.14074266, -0.63457529], [-0.49504797, -1.06690869], [-0.18028754, -0.70835799], [-1.3794126, 0.20592016], [-0.49685373, -1.46109525], [-1.41276934, -0.66472598], [-1.44173868, 0.42678815], [0.64623684, 1.19927771], [-0.5945761, -0.10417961]])
f_err = 1.11466725760716
l = [1.18388412685279, 1.02290811104357]

# This runs fine, but fails with the exact same calls and data during runtime
result = (f_err**2) * np.exp(-0.5 * se_term(x, x, l))
Any help greatly appreciated!
Here is how to reproduce the error you are seeing:
import sympy
import numpy
zero = sympy.sympify('0')
numpy.exp(zero)
You will see the same exception you are seeing.
You can fix this (inefficiently) by changing your code to the following to make things floating point:

def sqeucl_dist(x, xs):
    return np.sum([np.vectorize(float)(i-j)**2 for i in x for j in xs],
                  axis=1).reshape(x.shape[0], xs.shape[0])
It would be better to fix your gradient function using lambdify.
Here's an example of how lambdify can be used on partial derivatives:

import sympy
from sympy.abc import x, y, z

expression = x**2 + sympy.sin(y) + z
derivatives = [expression.diff(var, 1) for var in [x, y, z]]
derivatives is now [2*x, cos(y), 1], a list of Sympy expressions. To create a function which will evaluate this numerically at a particular set of values, we use lambdify as follows (passing 'numpy' as an argument like that means to use numpy.cos rather than sympy.cos):
derivative_calc = sympy.lambdify((x, y, z), derivatives, 'numpy')
Now derivative_calc(1, 2, 3) will return [2, -0.41614683654714241, 1]. These are ints and numpy.float64s.
A side note: np.exp(M) will calculate the element-wise exponent of each of the elements of M. If you are trying to do a matrix exponential, you need scipy.linalg.expm.
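A minimal sketch of the difference (assuming SciPy is available; the matrix exponential lives in scipy.linalg rather than numpy):

import numpy as np
from scipy.linalg import expm

M = np.array([[0.0, 1.0],
              [0.0, 0.0]])
print(np.exp(M))  # element-wise: [[1., 2.71828...], [1., 1.]]
print(expm(M))    # matrix exponential: [[1., 1.], [0., 1.]], since M @ M == 0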

Pandas: Location of a row with error

I am pretty new to Pandas and trying to find out where my code breaks. Say, I am doing a type conversion:
df['x']=df['x'].astype('int')
...and I get an error: "ValueError: invalid literal for long() with base 10: '1.0692e+06'"
In general, if I have 1000 entries in the DataFrame, how can I find out which entry caused the break? Is there anything in ipdb to output the current location (i.e. where the code broke)? Basically, I am trying to pinpoint which value cannot be converted to int.
The error you are seeing might be due to the value(s) in the x column being strings:
In [15]: df = pd.DataFrame({'x':['1.0692e+06']})
In [16]: df['x'].astype('int')
ValueError: invalid literal for long() with base 10: '1.0692e+06'
Ideally, the problem can be avoided by making sure the values stored in the
DataFrame are already ints not strings when the DataFrame is built.
How to do that depends of course on how you are building the DataFrame.
After the fact, the DataFrame could be fixed using applymap:
import ast
df = df.applymap(ast.literal_eval).astype('int')
but calling ast.literal_eval on each value in the DataFrame could be slow, which is why fixing the problem from the beginning is the best alternative.
Usually you could drop into a debugger when an exception is raised to inspect the problematic value.
However, in this case the exception is happening inside the call to astype, which is a thin wrapper around C-compiled code. The C-compiled code is doing the looping through the values in df['x'], so the Python debugger is not helpful here -- it won't allow you to introspect on what value the exception is being raised from within the C-compiled code.
There are many important parts of Pandas and NumPy written in C, C++, Cython or Fortran, and the Python debugger will not take you inside those non-Python pieces of code where the fast loops are handled.
So instead I would revert to a low-brow solution: iterate through the values in a Python loop and use try...except to catch the first error:
df = pd.DataFrame({'x':['1.0692e+06']})
for i, item in enumerate(df['x']):
    try:
        int(item)
    except ValueError:
        print('ERROR at index {}: {!r}'.format(i, item))
yields
ERROR at index 0: '1.0692e+06'
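If you would rather flag every offending entry at once instead of stopping at the first, here is a rough vectorized sketch (assuming pandas >= 1.1 for Series.str.fullmatch; the regex is a simplification that treats anything other than a plain integer literal as unconvertible):

# select the values that int() would reject
bad = df['x'][~df['x'].astype(str).str.fullmatch(r'[+-]?\d+')]
print(bad)  # 0    1.0692e+06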
I hit the same problem, and as I have a big input file (3 million rows), enumerating all rows would take a long time. Therefore I wrote a binary search to locate the offending row.
import pandas as pd
import sys

def binarySearch(df, l, r, func):
    while l <= r:
        mid = l + (r - l) // 2
        result = func(df, mid, mid+1)
        if result:
            # Check if we hit the exception at mid
            return mid, result
        result = func(df, l, mid)
        if result is None:
            # If no exception in the left half, ignore the left half
            l = mid + 1
        else:
            r = mid - 1
    # If we reach here, then the element was not present
    # (return a pair so the caller's unpacking still works)
    return -1, None

def check(df, start, end):
    result = None
    try:
        # In my case, I want to find out which row causes the failure
        df.iloc[start:end].uid.astype(int)
    except Exception as e:
        result = str(e)
    return result

df = pd.read_csv(sys.argv[1])
index, result = binarySearch(df, 0, len(df), check)
print("index: {}".format(index))
print(result)
To report all rows which fail to map due to any exception:

df.apply(my_function, axis=1)  # throws various exceptions at unknown rows

# print the exception, index, and row content
# (note: iterating the DataFrame directly yields column names, not rows)
for i, row in df.iterrows():
    try:
        my_function(row)
    except Exception as e:
        print('Error at index {}: {!r}'.format(i, row))
        print(e)

How to set the nodes color with a given graph

I am using networkx and matplotlib.
Now I want to set the color of the nodes, and I read the graph from a text file:
G=nx.read_edgelist("Edge.txt")
nx.draw(G)
plt.show()
Here is an example Edge file:
0 1
0 2
3 4
Here is what I did, and it failed:
import networkx as nx
import matplotlib.pyplot as plt
G = nx.read_edgelist("Edge.txt")
pos = nx.spring_layout(G)
nx.draw_networkx_nodes(G,pos,node_list=[0,1,2],node_color='B')
nx.draw_networkx_nodes(G,pos,node_list=[3,4],node_color='R')
plt.show()
The result is a lot of blue nodes without edges.
So if I want to set NodeListA=[0,1,2] to blue and NodeListB=[3,4] to red, how can I do that?
Draw the nodes explicitly by calling the top-level function draw_networkx_nodes,
and pass in your node list as the value for the parameter nodelist, and a value for the parameter node_color, like so:
nx.draw_networkx_nodes(G, pos, nodelist=NodeListA, node_color="#5072A7")
The argument pos is just a Python dictionary whose keys are the nodes of the graph and whose values are (x, y) positions; an easy way to supply pos is to pass your graph object to spring_layout, which will return the dictionary.
pos = nx.spring_layout(G)
Alternatively, you can pass in the dictionary directly, e.g.,

pos = {
    0: (2, 2),
    1: (3, 5),
    2: (1, 2),
    3: (5, 5),
    4: (7, 4)
}
The likely cause of the failure in the OP's code is the call to read_edgelist; in particular, the file passed in is probably incorrectly formatted.
Here's how to check this, and also how to fix it:
G = nx.path_graph(5)
df = "/path/to/my/graphinit.edgelist"
nx.write_edgelist(G, df) # save a properly formatted edgelist file
G = nx.read_edgelist(df) # read that file back in
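For completeness, here is a minimal sketch of the whole flow, under two assumptions worth flagging: read_edgelist reads node labels as strings by default (so the node lists below use '0'..'4' rather than ints; alternatively pass nodetype=int to read_edgelist), and the correct keyword is nodelist - in older networkx versions a misspelled node_list was silently ignored, which is consistent with every node coming out in the default color. Edges also need their own draw call:

import networkx as nx
import matplotlib.pyplot as plt

G = nx.read_edgelist("Edge.txt")  # node labels are read as strings by default
pos = nx.spring_layout(G)
nx.draw_networkx_nodes(G, pos, nodelist=['0', '1', '2'], node_color='b')
nx.draw_networkx_nodes(G, pos, nodelist=['3', '4'], node_color='r')
nx.draw_networkx_edges(G, pos)    # draw_networkx_nodes does not draw edges
plt.show()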

Complementary Filter Code Not functioning

I've been scratching my head too long.
The data is coming from a 3D accelerometer and a 3D gyro. I am using a complementary filter to control drift.
I have it working in Excel but can't seem to get this Python code to do the same thing:
r1_angle_cfx = np.zeros(len(r1_angle_ax))
r1_angle_cfx[0] = r1_angle_ax[0]
for i in xrange(len(r1_angle_ax)-1):
    j = i + 1
    # complementary filter
    r1_angle_cfx[j] = 0.98*(r1_angle_cfx[i] + r1_alpha_x[j]*fs) + (0.02 * r1_angle_ax[j])
In Excel (correct) I get: [chart omitted]
In Python (incorrect) I get: [chart omitted]
What is going wrong? And is there a better way to do this in Python?
Thanks,
Scott
EDIT: Link to data files -
sample data
1. The csv file contains the accelerometer and gyro data that is entered into the filter formula, as well as the values that were calculated in Excel.
2. The excel file contains all raw data (steps not mentioned above, but I have triple-checked that they are equivalent up to the point of being entered into the filter formula).
EDIT 2: update - Turns out my code works; it was sloppy debugging. fs should be fs = 0.01. In my code I had fs = 1/100, which evaluates to 0 in the script (Python 2 integer division).
Your Python code looks pretty reasonable. Without example data, I can't do much more than say that.
But I can guess. I looked up "complementary filters" and found a link explaining them:
https://sites.google.com/site/myimuestimationexperience/filters/complementary-filter
This link gives an example equation that is very similar to yours:
angle = (1-alpha)*(angle + gyro * dt) + (alpha)*(acc)
You have fs where this has dt, and dt is computed as 1/sampling_frequency. If fs is the sampling frequency, maybe you should try inverting it?
EDIT: Okay, now that you posted the data, I played around with this. Here is my program that gets a correct result.
Your code looks basically correct, so I think you must have made a mistake in your code that collected the values. I'm not quite sure because your variable names confuse me.
I used a namedtuple and for the names, I used the column headers from the CSV file (with spaces and periods removed to make a valid Python identifier).
import collections as coll
import csv
import matplotlib.pyplot as plt
import numpy as np
import sys

fs = 100.0
dt = 1.0/fs
alpha = 0.02

Sample = coll.namedtuple("Sample",
    "accZ accY accX rotZ rotY rotX r acc_angZ acc_angY acc_angX cfZ cfY cfX")

def samples_from_file(fname):
    with open(fname) as f:
        next(f)  # discard header row
        csv_reader = csv.reader(f, dialect='excel')
        for i, row in enumerate(csv_reader, 1):
            try:
                values = [float(x) for x in row]
                yield Sample(*values)
            except Exception:
                lst = list(row)
                print("Bad line %d: len %d '%s'" % (i, len(lst), str(lst)))

samples = list(samples_from_file("data.csv"))

cfx = np.zeros(len(samples))

# Excel formula: =R12
cfx[0] = samples[0].acc_angX

# Excel formula: =0.98*(U12+N13*0.01)+0.02*R13
# Excel: U is cfX, N is rotX, R is acc_angX
for i, s in enumerate(samples[1:], 1):
    cfx[i] = (1.0 - alpha) * (cfx[i-1] + s.rotX*dt) + (alpha * s.acc_angX)

check_line = [s.cfX - cf for s, cf in zip(samples, cfx)]

plt.figure(1)
plt.plot(check_line)
plt.plot(cfx)
plt.show()
check_line is the difference between the saved cfX value from the CSV file, and the new computed cfx value. As you can see in the plot, this is a straight line at 0, so my calculation is agreeing quite well with yours.
So I guess the mapping of names is:
your_name       my_name
________________________
r1_angle_cfx    cfx
r1_alpha_x      rotX
r1_angle_ax     acc_angX