For the following code:
%%time
steps = df.select("step").distinct().collect()
for step in steps:
    _df = df.where(f"step = {step[0]}")
    # coalesce(1) writes each filtered dataframe to a single file
    _df.coalesce(1).write.mode("append").option("header", "true").csv("paysim1")
I am getting the following error:
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_4892/519786224.py in <module>
2 _df = df.where(f"step = {step[0]}")
3 # by adding coalesce(1) we save the dataframe to one file
----> 4 _df.coalesce(1).write.mode("append").option("header", "true").csv("paysim1")
How can I resolve this?
I am sorting data within a dataframe: I take the data from one column, bin it into intervals of 2 units, and then want a list of the intervals created. This is the code:
df = gpd.read_file(file)
final = df[df['i_h100']>=0]
final['H_bins'] = pd.cut(x=final['i_h100'], bins=np.arange(0, 120+2, 2))
HBins = list(np.unique(final['H_bins']))
With most of my dataframes this works totally fine but occasionally I get the following error:
Traceback (most recent call last):
File "/var/folders/bg/5n2pqdj53xv1lm099dt8z2lw0000gn/T/ipykernel_2222/1548505304.py", line 1, in <cell line: 1>
HBins = list(np.unique(final['H_bins']))
File "<__array_function__ internals>", line 180, in unique
File "/Users/heatherkay/miniconda3/envs/gpd/lib/python3.9/site-packages/numpy/lib/arraysetops.py", line 272, in unique
ret = _unique1d(ar, return_index, return_inverse, return_counts)
File "/Users/heatherkay/miniconda3/envs/gpd/lib/python3.9/site-packages/numpy/lib/arraysetops.py", line 333, in _unique1d
ar.sort()
TypeError: '<' not supported between instances of 'float' and 'pandas._libs.interval.Interval'
I don't understand why, or how to resolve this.
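From the traceback, the likely cause is that pd.cut maps values outside the 0–120 bin range to NaN, so the column mixes float NaN with Interval objects, and np.unique's internal sort cannot compare the two types. A minimal sketch (toy data, not the asker's file) reproducing the error, with a fix via dropna():

```python
import numpy as np
import pandas as pd

# 150.0 is above the top bin edge (120), so pd.cut maps it to NaN
# and the column mixes float NaN with Interval objects.
s = pd.Series([1.0, 3.5, 150.0])
bins = pd.cut(s, bins=np.arange(0, 120 + 2, 2))

# np.unique sorts the values; comparing float NaN with an Interval
# raises the TypeError from the question.
try:
    np.unique(bins)
except TypeError as err:
    print("np.unique failed:", err)

# Dropping the missing entries first avoids the mixed-type sort.
intervals = list(bins.dropna().unique())
print(intervals)
```

So the dataframes that "occasionally" fail are presumably the ones containing i_h100 values outside the bin range.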
I have the referenced vaex dataframe
The column "Amount_INR" is calculated from the other three attributes using the function:
from forex_python.converter import CurrencyRates

def convert_curr(x, y, z):
    c = CurrencyRates()
    return c.convert(x, 'INR', y, z)

data_df_usd['Amount_INR'] = data_df_usd.apply(convert_curr, arguments=[data_df_usd.CURRENCY_CODE, data_df_usd.TOTAL_AMOUNT, data_df_usd.SUBSCRIPTION_START_DATE_DATE])
I'm trying to perform a groupby operation using the below code:
data_df_usd.groupby('CONTENTID', agg={'Revenue':vaex.agg.sum('Amount_INR')})
The code throws the below error:
RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/vaex/scopes.py", line 113, in evaluate
result = self[expression]
File "/usr/local/lib/python3.7/dist-packages/vaex/scopes.py", line 198, in __getitem__
raise KeyError("Unknown variables or column: %r" % (variable,))
KeyError: "Unknown variables or column: 'lambda_function(CURRENCY_CODE, TOTAL_AMOUNT, SUBSCRIPTION_START_DATE_DATE)'"
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/forex_python/converter.py", line 103, in convert
converted_amount = rate * amount
TypeError: can't multiply sequence by non-int of type 'float'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/usr/local/lib/python3.7/dist-packages/vaex/expression.py", line 1616, in _apply
scalar_result = self.f(*[fix_type(k[i]) for k in args], **{key: value[i] for key, value in kwargs.items()})
File "<ipython-input-7-8cc933ccf57d>", line 3, in convert_curr
return c.convert(x, 'INR', y, z)
File "/usr/local/lib/python3.7/dist-packages/forex_python/converter.py", line 107, in convert
"convert requires amount parameter is of type Decimal when force_decimal=True")
forex_python.converter.DecimalFloatMismatchError: convert requires amount parameter is of type Decimal when force_decimal=True
"""
The above exception was the direct cause of the following exception:
DecimalFloatMismatchError Traceback (most recent call last)
<ipython-input-13-cc7b243be138> in <module>
----> 1 data_df_usd.groupby('CONTENTID', agg={'Revenue':vaex.agg.sum('Amount_INR')})
According to the error screenshot, this does not look like it is related to the groupby. Something is happening with the convert_curr function.
You get the error:
TypeError: can't multiply sequence by non-int of type 'float'
See if you can evaluate Amount_INR in the first place.
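The last frame of the traceback is explicit: a CurrencyRates built with force_decimal=True only accepts Decimal amounts, and the earlier TypeError suggests a sequence reached the amount slot. A minimal stdlib sketch of guarding the value before it is handed to convert (prepare_amount is a hypothetical helper, not part of forex_python):

```python
from decimal import Decimal

# forex_python's convert() raises DecimalFloatMismatchError when the
# CurrencyRates object was built with force_decimal=True but receives
# a float amount; a sequence in the amount slot gives the earlier
# "can't multiply sequence by non-int" TypeError.
def prepare_amount(amount):
    if isinstance(amount, (str, list, tuple)):
        raise TypeError(f"expected a scalar amount, got {type(amount).__name__}")
    # going through str() preserves the printed value exactly
    return Decimal(str(amount))

print(repr(prepare_amount(199.99)))  # Decimal('199.99')
```

If the per-row call works on a scalar but fails inside apply, check the order of the arguments list as well — the amount must land in convert's amount parameter, not the date or currency slot.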
I made a Python script which takes pdbqt files as input and returns a txt file. Since not all the lines have the same number of columns, it is not able to read the files. How can I ignore those lines?
sample pdbqt and txt files
The code:
from __future__ import division
import glob
import numpy as np

def function(filename):
    data = np.genfromtxt(filename, dtype=float, usecols=(6, 7, 8), skip_footer=1)

all_filenames = glob.glob('*.pdbqt')
print(all_filenames)
for filename in all_filenames:
    function(filename)
The error I am getting:
Traceback (most recent call last):
File "cen7.py", line 45, in <module>
function(filename)
File "cen7.py", line 7, in function
data = np.genfromtxt(filename, dtype = float , usecols = (6, 7, 8), skip_footer=1)
File "/home/../.local/lib/python3.8/site-packages/numpy/lib/npyio.py", line 2261, in genfromtxt
raise ValueError(errmsg)
ValueError: Some errors were detected !
Line #3037 (got 4 columns instead of 3)
Line #6066 (got 4 columns instead of 3)
Line #9103 (got 4 columns instead of 3)
Line #12140 (got 4 columns instead of 3)
Line #15177 (got 4 columns instead of 3)
Let's make a sample csv:
In [75]: txt = """1,2,3,4
...: 5,6,7,8,9
...: """.splitlines()
This error is to be expected: the 2nd line has more columns than the previous one:
In [76]: np.genfromtxt(txt, delimiter=',')
Traceback (most recent call last):
Input In [76] in <cell line: 1>
np.genfromtxt(txt, delimiter=',')
File /usr/local/lib/python3.8/dist-packages/numpy/lib/npyio.py:2261 in genfromtxt
raise ValueError(errmsg)
ValueError: Some errors were detected !
Line #2 (got 5 columns instead of 4)
I can avoid that with usecols. It isn't bothered by the extra columns in line 2:
In [77]: np.genfromtxt(txt, delimiter=',',usecols=(1,2,3))
Out[77]:
array([[2., 3., 4.],
[6., 7., 8.]])
But if the line is too short for the usecols, I get an error:
In [78]: np.genfromtxt(txt, delimiter=',',usecols=(2,3,4))
Traceback (most recent call last):
Input In [78] in <cell line: 1>
np.genfromtxt(txt, delimiter=',',usecols=(2,3,4))
File /usr/local/lib/python3.8/dist-packages/numpy/lib/npyio.py:2261 in genfromtxt
raise ValueError(errmsg)
ValueError: Some errors were detected !
Line #1 (got 4 columns instead of 3)
The wording of the error isn't quite right, but it is clear which line is the problem.
That should give you something to look for when scanning the problem lines in your csv.
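Since the asker just wants to ignore the malformed lines, genfromtxt's invalid_raise=False does exactly that: it downgrades the column-count errors to warnings and drops the offending lines. A sketch on the same sample (the warning filter is only there to keep the output clean):

```python
import warnings
import numpy as np

txt = """1,2,3,4
5,6,7,8,9
""".splitlines()

# invalid_raise=False turns the "got N columns instead of M" errors
# into warnings and skips the offending lines.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    data = np.genfromtxt(txt, delimiter=',', usecols=(2, 3, 4),
                         invalid_raise=False)

print(data)  # only the line long enough for usecols survives
```

The same keyword applies unchanged to the whitespace-delimited pdbqt files in the question.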
I am attempting to get the mean of the columns in a df but keep getting this error using groupby
grouped_df = df.groupby('location')['number of orders'].mean()
print(grouped_df)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-5-8cc491c4c100> in <module>
----> 1 grouped_df = df.groupby('location')['number of orders'].mean()
2 print(grouped_df)
NameError: name 'df' is not defined
If I understand your comment correctly, your DataFrame is called df_dig. Accordingly:
grouped_df = df_dig.groupby('location')['number of orders'].mean()
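For completeness, a minimal sketch with invented toy data (only the column names come from the question) showing the groupby-mean once the DataFrame name matches:

```python
import pandas as pd

# Toy data; the real df_dig comes from the asker's notebook.
df_dig = pd.DataFrame({
    'location': ['A', 'A', 'B'],
    'number of orders': [10, 20, 30],
})

grouped_df = df_dig.groupby('location')['number of orders'].mean()
print(grouped_df)
# location
# A    15.0
# B    30.0
# Name: number of orders, dtype: float64
```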
Here's the code:
a = tf.constant([1,2,3,4])
b = tf.constant([4])
c = tf.split(a, tf.squeeze(b))
then it fails:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/jeff/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1203, in split
num = size_splits_shape.dims[0]
IndexError: list index out of range
But why?
The docs state,
If num_or_size_splits is a tensor, size_splits, then splits value into len(size_splits) pieces. The shape of the i-th piece has the same size as the value except along dimension axis where the size is size_splits[i].
Note that size_splits needs to be sliceable.
However, when you squeeze(b), because it has only one element in your example, it returns a scalar that has no dimension. A scalar cannot be sliced:
b_ = tf.squeeze(b)
b_[0] # error
Hence your error.
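The same behaviour is easy to see with NumPy, whose squeeze tf.squeeze mirrors: a one-element array squeezes down to a 0-d scalar, and a 0-d value has no axis to index or slice:

```python
import numpy as np

b = np.array([4])
b_ = np.squeeze(b)  # 0-d array, shape ()
print(b_.shape)     # ()

try:
    b_[0]           # a 0-d value has no axis to index
except IndexError as err:
    print("IndexError:", err)
```

In the original snippet, passing b to tf.split without the squeeze (so size_splits keeps shape (1,)) should avoid the scalar.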