pandas 1.3.3 to_feather giving ArrowMemoryError

I have a dataset of size around 270MB and I use the following to write to feather file:
This gives me an error:
File "C:\apps\Python\lib\site-packages\pandas\util\", line 207, in wrapper
return func(*args, **kwargs)
File "C:\apps\Python\lib\site-packages\pandas\core\", line 2519, in to_feather
to_feather(self, path, **kwargs)
File "C:\apps\Python\lib\site-packages\pandas\io\", line 87, in to_feather
feather.write_feather(df, handles.handle, **kwargs)
File "C:\apps\Python\lib\site-packages\pyarrow\", line 152, in write_feather
table = Table.from_pandas(df, preserve_index=False)
File "pyarrow\table.pxi", line 1553, in pyarrow.lib.Table.from_pandas
File "C:\apps\Python\lib\site-packages\pyarrow\", line 607, in dataframe_to_arrays
arrays[i] = maybe_fut.result()
File "C:\apps\Python\lib\concurrent\futures\", line 438, in result
return self.__get_result()
File "C:\apps\Python\lib\concurrent\futures\", line 390, in __get_result
raise self._exception
File "C:\apps\Python\lib\concurrent\futures\", line 52, in run
result = self.fn(*self.args, **self.kwargs)
File "C:\apps\Python\lib\site-packages\pyarrow\", line 575, in convert_column
result = pa.array(col, type=type_, from_pandas=True, safe=safe)
File "pyarrow\array.pxi", line 302, in pyarrow.lib.array
File "pyarrow\array.pxi", line 83, in pyarrow.lib._ndarray_to_array
File "pyarrow\error.pxi", line 114, in pyarrow.lib.check_status
pyarrow.lib.ArrowMemoryError: realloc of size 3221225472 failed
Note: this works fine when run from PyCharm; the feather file is written without issues.
But when the Python program is called from a Windows batch file like:
call python ""
and the batch file is scheduled as a task in Task Scheduler, it fails with the memory error above.
PyArrow version is 5.0.0 if that helps.
Any ideas please?
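Since the same script behaves differently in PyCharm and under Task Scheduler, one thing worth ruling out is that the two launch different Python interpreters, for example a 32-bit build on the scheduled path. This is an assumption, not something the traceback confirms. A minimal sketch that reports which interpreter is actually running; the function name describe_interpreter is hypothetical:

```python
import platform
import struct
import sys


def describe_interpreter() -> dict:
    """Collect facts that distinguish one Python installation from another."""
    return {
        "executable": sys.executable,              # path of the running interpreter
        "version": platform.python_version(),      # e.g. "3.9.7"
        "pointer_bits": struct.calcsize("P") * 8,  # 32 or 64, depending on the build
    }


if __name__ == "__main__":
    # Run this once from PyCharm and once from the scheduled batch file,
    # then compare the two outputs.
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If the two runs report different executables or pointer sizes, the batch file is not using the environment that PyCharm uses.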


pandas read_csv with storage_options working locally but not in Dataflow

I am trying to import data from an API into BigQuery (GBQ) and want to use Dataflow.
Due to reasons unknown and unimaginable to me, the API merely returns a URL of a ".csv.gz", which I then need to download and process before pushing the data into GBQ.
Furthermore, the API has authentication with a bearer token, so I was looking for a method to handle download and parsing of the data as well as the auth and found:
pd.read_csv('', storage_options={'Authorization': 'Bearer '+ bearer_token}, compression='gzip', header=0, sep=',', quotechar='"')
which works fantastically when using it in my Beam pipeline locally.
However, as soon as I upload the pipeline to dataflow and run it there, I get the error message
ValueError: storage_options passed with file object or non-fsspec file path
Full trace:
File "apache_beam/runners/", line 1223, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/", line 572, in apache_beam.runners.common.SimpleInvoker.invoke_process
File ".\", line 144, in process
File "/usr/local/lib/python3.8/site-packages/pandas/io/", line 610, in read_csv
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python3.8/site-packages/pandas/io/", line 462, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/usr/local/lib/python3.8/site-packages/pandas/io/", line 819, in __init__
self._engine = self._make_engine(self.engine)
File "/usr/local/lib/python3.8/site-packages/pandas/io/", line 1050, in _make_engine
return mapping[engine](self.f, **self.options) # type: ignore[call-arg]
File "/usr/local/lib/python3.8/site-packages/pandas/io/", line 1867, in __init__
self._open_handles(src, kwds)
File "/usr/local/lib/python3.8/site-packages/pandas/io/", line 1362, in _open_handles
self.handles = get_handle(
File "/usr/local/lib/python3.8/site-packages/pandas/io/", line 558, in get_handle
ioargs = _get_filepath_or_buffer(
File "/usr/local/lib/python3.8/site-packages/pandas/io/", line 286, in _get_filepath_or_buffer
raise ValueError(
ValueError: storage_options passed with file object or non-fsspec file path
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File line 651, in do_work
work_executor.execute()
File line 179, in execute
op.start()
File "dataflow_worker/", line 63, in
File "dataflow_worker/", line 64, in
File "dataflow_worker/", line 79, in
File "dataflow_worker/", line 80, in
File "dataflow_worker/", line 84, in
File "apache_beam/runners/worker/", line 353, in apache_beam.runners.worker.operations.Operation.output
File "apache_beam/runners/worker/", line 215, in
File "dataflow_worker/", line 261, in
File "dataflow_worker/", line 268, in
File "apache_beam/runners/worker/", line 353, in apache_beam.runners.worker.operations.Operation.output
File "apache_beam/runners/worker/", line 215, in
File "apache_beam/runners/worker/", line 712, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/worker/", line 713, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/", line 1225, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/", line 1290, in apache_beam.runners.common.DoFnRunner._reraise_augmented
File "apache_beam/runners/", line 1223, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/", line 752, in apache_beam.runners.common.PerWindowInvoker.invoke_process
File "apache_beam/runners/", line 875, in
File "apache_beam/runners/", line 1386, in apache_beam.runners.common._OutputProcessor.process_outputs
File "apache_beam/runners/worker/", line 215, in
File "apache_beam/runners/worker/", line 712, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/worker/", line 713, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/", line 1225, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/", line 1306, in apache_beam.runners.common.DoFnRunner._reraise_augmented
File "apache_beam/runners/", line 1223, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/", line 572, in apache_beam.runners.common.SimpleInvoker.invoke_process
File ".\", line 144, in process
File "/usr/local/lib/python3.8/site-packages/pandas/io/", line 610, in read_csv
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python3.8/site-packages/pandas/io/", line 462, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/usr/local/lib/python3.8/site-packages/pandas/io/", line 819, in __init__
self._engine = self._make_engine(self.engine)
File "/usr/local/lib/python3.8/site-packages/pandas/io/", line 1050, in _make_engine
return mapping[engine](self.f, **self.options) # type: ignore[call-arg]
File "/usr/local/lib/python3.8/site-packages/pandas/io/", line 1867, in __init__
self._open_handles(src, kwds)
File "/usr/local/lib/python3.8/site-packages/pandas/io/", line 1362, in _open_handles
self.handles = get_handle(
File "/usr/local/lib/python3.8/site-packages/pandas/io/", line 558, in get_handle
ioargs = _get_filepath_or_buffer(
File "/usr/local/lib/python3.8/site-packages/pandas/io/", line 286, in _get_filepath_or_buffer
raise ValueError(
ValueError: storage_options passed with file object or non-fsspec file path [while running 'Fetch actual report data']
Does anyone know why that works locally but not in the cloud? I assume it might have to do with the filesystem and temporary files - but then the error message does not make a lot of sense...
According to the pandas docs, the storage_options parameter is forwarded to urllib as request headers for HTTP(S) URLs, and to fsspec only for paths such as s3:// and gcs:// (see here).
It turned out to be just a version issue. The pandas version included in the Dataflow images did not yet interpret the storage_options argument as authorization (HTTP header) information. When I passed a local wheel of the newest available pandas version with the --extra_package parameter, the issue resolved itself.
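For environments where upgrading pandas on the workers is not an option, one possible workaround (a sketch under assumptions, not the fix described above) is to perform the authenticated download with urllib yourself and hand the decompressed bytes to read_csv. The helper names build_request, decompress_csv and fetch_report are hypothetical, and the URL is whatever the API returns:

```python
import gzip
import io
import urllib.request


def build_request(url: str, bearer_token: str) -> urllib.request.Request:
    """Build a GET request that carries the bearer token as an HTTP header."""
    return urllib.request.Request(
        url, headers={"Authorization": "Bearer " + bearer_token}
    )


def decompress_csv(raw: bytes) -> io.BytesIO:
    """Gunzip the downloaded payload into an in-memory file object."""
    return io.BytesIO(gzip.decompress(raw))


def fetch_report(url: str, bearer_token: str):
    """Download a .csv.gz report and parse it into a DataFrame."""
    import pandas as pd  # imported lazily; only this function needs pandas

    with urllib.request.urlopen(build_request(url, bearer_token)) as response:
        raw = response.read()
    # The data is already decompressed, so no compression argument is passed.
    return pd.read_csv(decompress_csv(raw), header=0, sep=",", quotechar='"')
```

Because read_csv receives a plain file object and no storage_options, this path does not depend on how a given pandas version treats that argument.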

RASA init error : tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [0] [Op:Assert] name: EagerVariableNameReuse

I am new to Rasa. I installed Rasa 2.4.1 on my Windows 10 machine with Python 3.7.6 without any errors, but when I initialise a Rasa project I get the following error. I tried multiple Rasa 2.x versions and multiple TensorFlow installations, but no luck. Any help resolving this issue is appreciated.
File "D:\NLP\rasa_env\Scripts\rasa.exe\", line 7, in <module>
File "d:\nlp\rasa_env\lib\site-packages\rasa\", line 116, in main
File "d:\nlp\rasa_env\lib\site-packages\rasa\cli\", line 234, in run
init_project(args, path)
File "d:\nlp\rasa_env\lib\site-packages\rasa\cli\", line 129, in init_project
print_train_or_instructions(args, path)
File "d:\nlp\rasa_env\lib\site-packages\rasa\cli\", line 68, in print_train_or_instructions
training_result = rasa.train(domain, config, training_files, output)
File "d:\nlp\rasa_env\lib\site-packages\rasa\", line 109, in train
File "d:\nlp\rasa_env\lib\site-packages\rasa\utils\", line 308, in run_in_loop
result = loop.run_until_complete(f)
File "c:\users\kni9kor\anaconda3\lib\asyncio\", line 583, in run_until_complete
return future.result()
File "d:\nlp\rasa_env\lib\site-packages\rasa\", line 174, in train_async
File "d:\nlp\rasa_env\lib\site-packages\rasa\", line 353, in _train_async_internal
File "d:\nlp\rasa_env\lib\site-packages\rasa\", line 396, in _do_training
File "d:\nlp\rasa_env\lib\site-packages\rasa\", line 818, in _train_nlu_with_validated_data
File "d:\nlp\rasa_env\lib\site-packages\rasa\nlu\", line 116, in train
interpreter = trainer.train(training_data, **kwargs)
File "d:\nlp\rasa_env\lib\site-packages\rasa\nlu\", line 209, in train
updates = component.train(working_data, self.config, **context)
File "d:\nlp\rasa_env\lib\site-packages\rasa\nlu\classifiers\", line 810, in train
self.model = self._instantiate_model_class(model_data)
File "d:\nlp\rasa_env\lib\site-packages\rasa\nlu\classifiers\", line 1132, in _instantiate_model_class
File "d:\nlp\rasa_env\lib\site-packages\rasa\nlu\classifiers\", line 1146, in __init__
super().__init__("DIET", config, data_signature, label_data)
File "d:\nlp\rasa_env\lib\site-packages\rasa\utils\tensorflow\", line 705, in __init__
File "d:\nlp\rasa_env\lib\site-packages\rasa\utils\tensorflow\", line 91, in __init__
File "d:\nlp\rasa_env\lib\site-packages\tensorflow\python\training\tracking\", line 457, in _method_wrapper
result = method(self, *args, **kwargs)
File "d:\nlp\rasa_env\lib\site-packages\tensorflow\python\keras\engine\", line 308, in __init__
File "d:\nlp\rasa_env\lib\site-packages\tensorflow\python\training\tracking\", line 457, in _method_wrapper
result = method(self, *args, **kwargs)
File "d:\nlp\rasa_env\lib\site-packages\tensorflow\python\keras\engine\", line 317, in _init_batch_counters
self._train_counter = variables.Variable(0, dtype='int64', aggregation=agg)
File "d:\nlp\rasa_env\lib\site-packages\tensorflow\python\ops\", line 262, in __call__
return cls._variable_v2_call(*args, **kwargs)
File "d:\nlp\rasa_env\lib\site-packages\tensorflow\python\ops\", line 256, in _variable_v2_call
File "d:\nlp\rasa_env\lib\site-packages\tensorflow\python\ops\", line 237, in <lambda>
previous_getter = lambda **kws: default_variable_creator_v2(None, **kws)
File "d:\nlp\rasa_env\lib\site-packages\tensorflow\python\ops\", line 2646, in default_variable_creator_v2
File "d:\nlp\rasa_env\lib\site-packages\tensorflow\python\ops\", line 264, in __call__
return super(VariableMetaclass, cls).__call__(*args, **kwargs)
File "d:\nlp\rasa_env\lib\site-packages\tensorflow\python\ops\", line 1518, in __init__
File "d:\nlp\rasa_env\lib\site-packages\tensorflow\python\ops\", line 1666, in _init_from_args
File "d:\nlp\rasa_env\lib\site-packages\tensorflow\python\ops\", line 243, in eager_safe_variable_handle
shape, dtype, shared_name, name, graph_mode, initial_value)
File "d:\nlp\rasa_env\lib\site-packages\tensorflow\python\ops\", line 175, in _variable_handle_from_shape_and_dtype
math_ops.logical_not(exists), [exists], name="EagerVariableNameReuse")
File "d:\nlp\rasa_env\lib\site-packages\tensorflow\python\ops\", line 49, in _assert
_ops.raise_from_not_ok_status(e, name)
File "d:\nlp\rasa_env\lib\site-packages\tensorflow\python\framework\", line 6843, in raise_from_not_ok_status
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [0] [Op:Assert] name: EagerVariableNameReuse
Possible solutions:
1. Kill any concurrent Python programs (such as Jupyter notebooks) that are trying to access TensorFlow at the same time.
2. Setting the environment variable TF_FORCE_GPU_ALLOW_GROWTH to true seems to make this issue disappear:
import os
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = "true"
I have also attached the following similar issues for reference, which might help you out: link1, link2, link3

Error while executing the evaluation script on the TF Object Detection API

I'm trying to evaluate my model using this command:
python --logtostderr --pipeline_config_path=training/faster_rcnn_inception_v2_pets.config --checkpoint_dir=inference_graph --eval_dir=eval
and I'm getting this error:
Traceback (most recent call last):
File "", line 142, in
File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\tensorflow_core\python\platform\", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\absl\", line 299, in run
_run_main(main, args)
File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\absl\", line 250, in _run_main
sys.exit(main(argv))
File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\tensorflow_core\python\util\", line 324, in new_func
return func(*args, **kwargs)
File "", line 138, in main
graph_hook_fn=graph_rewriter_fn)
File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\legacy\", line 274, in evaluate
evaluator_list = get_evaluators(eval_config, categories)
File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\legacy\", line 166, in get_evaluators
EVAL_METRICS_CLASS_DICTeval_metric_fn_key)
File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\utils\", line 470, in __init__
use_weighted_mean_ap=False)
File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\utils\", line 194, in __init__
self._build_metric_names()
File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\utils\", line 213, in _build_metric_names
category_name = unicode(category_name, 'utf-8')
NameError: name 'unicode' is not defined
Hi there!
Python 3 renamed the unicode type to str; the old str type has been replaced by bytes.
Given this, the error makes sense: parts of the TF Object Detection API were written for Python 2.x and still call unicode(), which no longer exists.
See here for more explanation on how to upgrade the code to be compatible with Python 3.
I hope this helps!
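As a rough illustration of the kind of change involved: simply swapping unicode for str is not always enough, because str(value, 'utf-8') raises a TypeError in Python 3 when the value is already a str. A minimal sketch of a helper that handles both cases; the name to_text is hypothetical and not part of the Object Detection API:

```python
def to_text(value, encoding="utf-8"):
    """Return value as a Python 3 str, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)


# Python 2 original:  category_name = unicode(category_name, 'utf-8')
# Python 3 variant using the helper above:
category_name = to_text(b"dog")
```

Upgrading to a release of the library that supports Python 3 avoids patching installed files by hand.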

Apache BEAM pipeline fails when writing TF Records - AttributeError: 'str' object has no attribute 'iteritems'

The issue started appearing over the weekend. For some reason, it appears to be a Dataflow issue.
Previously, I was able to execute the script and write TF records just fine. However, now, I am unable to initialize the computation graph to process the data.
The traceback is:
Traceback (most recent call last):
File "", line 1492, in <module>
File "", line 402, in __init__
File "", line 514, in run
transform_fn_io.WriteTransformFn(path=self.JOB_DIR + '/transform/'))
File "/anaconda3/envs/ml27/lib/python2.7/site-packages/apache_beam/", line 426, in __exit__
File "/anaconda3/envs/ml27/lib/python2.7/site-packages/apache_beam/runners/dataflow/", line 1238, in wait_until_finish
(self.state, getattr(self._runner, 'last_error_msg', None)), self)
apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/", line 649, in do_work
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/", line 176, in execute
File "apache_beam/runners/worker/", line 531, in apache_beam.runners.worker.operations.DoOperation.start
def start(self):
File "apache_beam/runners/worker/", line 532, in apache_beam.runners.worker.operations.DoOperation.start
with self.scoped_start_state:
File "apache_beam/runners/worker/", line 533, in apache_beam.runners.worker.operations.DoOperation.start
super(DoOperation, self).start()
File "apache_beam/runners/worker/", line 202, in apache_beam.runners.worker.operations.Operation.start
def start(self):
File "apache_beam/runners/worker/", line 206, in apache_beam.runners.worker.operations.Operation.start
File "apache_beam/runners/worker/", line 480, in apache_beam.runners.worker.operations.DoOperation.setup
with self.scoped_start_state:
File "apache_beam/runners/worker/", line 485, in apache_beam.runners.worker.operations.DoOperation.setup
File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/", line 247, in loads
return dill.loads(s)
File "/usr/local/lib/python2.7/dist-packages/dill/", line 317, in loads
return load(file, ignore)
File "/usr/local/lib/python2.7/dist-packages/dill/", line 305, in load
obj = pik.load()
File "/usr/lib/python2.7/", line 864, in load
File "/usr/lib/python2.7/", line 1232, in load_build
for k, v in state.iteritems():
AttributeError: 'str' object has no attribute 'iteritems'
I am using tensorflow==1.13.1 and tensorflow-transform==0.9.0 and apache_beam==2.7.0
with beam.Pipeline(options=self.pipe_opt) as p:
    with beam_impl.Context(temp_dir=self.google_cloud_options.temp_location):
        # rest of the script
        _ = (
            | 'WriteTransformFn' >>
            transform_fn_io.WriteTransformFn(path=self.JOB_DIR + '/transform/'))
I was experiencing the same error.
It seems to be triggered by a mismatch between the tensorflow-transform version on your local (or master) machine and the version on the workers (the latter is specified in the file).
In my case I was running tensorflow-transform==0.13 on my local machine whereas the workers were running 0.8.
Downgrading the local version to 0.8 fixed the issue.
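To catch this kind of mismatch before submitting a job, the locally installed versions can be compared against the versions pinned for the workers. A minimal sketch, assuming Python 3.8+ (for importlib.metadata, so it does not apply as-is to the Python 2.7 setup in the question); the function name find_mismatches and the pin values are hypothetical:

```python
from importlib import metadata


def find_mismatches(pins, get_version=metadata.version):
    """Return {package: (pinned, installed)} for every package that differs.

    pins maps package names to the exact version expected on the workers.
    installed is None when the package is not installed locally.
    """
    mismatches = {}
    for package, pinned in pins.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[package] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    # Hypothetical pin; use whatever version the workers actually install.
    worker_pins = {"tensorflow-transform": "0.8.0"}
    for name, (pinned, installed) in find_mismatches(worker_pins).items():
        print(f"{name}: workers pin {pinned}, local has {installed}")
```

The get_version parameter exists only so the lookup can be replaced in tests; by default the check reads the real local environment.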

AttributeError: 'S3File' object has no attribute 'getvalue', while running to_csv

I'm running the to_csv command as follows to write an output file to an S3 bucket with ServerSideEncryption enabled:
{'ServerSideEncryption': 'AES256'}})
I'm getting the following attribute error:
File "/usr/lib/python2.7/site-packages/dask/dataframe/", line 1091, in to_csv
return to_csv(self, filename, **kwargs)
File "/usr/lib/python2.7/site-packages/dask/dataframe/io/", line 577, in to_csv
delayed(values).compute(get=get, scheduler=scheduler)
File "/usr/lib/python2.7/site-packages/dask/", line 156, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/usr/lib/python2.7/site-packages/dask/", line 400, in compute
results = schedule(dsk, keys, **kwargs)
File "/usr/lib/python2.7/site-packages/distributed/", line 2159, in get
File "/usr/lib/python2.7/site-packages/distributed/", line 1562, in gather
File "/usr/lib/python2.7/site-packages/distributed/", line 652, in sync
return sync(self.loop, func, *args, **kwargs)
File "/usr/lib/python2.7/site-packages/distributed/", line 275, in sync
File "/usr/lib/python2.7/site-packages/distributed/", line 260, in f
result[0] = yield make_coro()
File "/usr/lib64/python2.7/site-packages/tornado/", line 1099, in run
value = future.result()
File "/usr/lib64/python2.7/site-packages/tornado/", line 260, in result
File "/usr/lib64/python2.7/site-packages/tornado/", line 1107, in run
yielded = self.gen.throw(*exc_info)
File "/usr/lib/python2.7/site-packages/distributed/", line 1439, in _gather
File "/usr/lib/python2.7/site-packages/dask/dataframe/io/", line 439, in _to_csv_chunk
df.to_csv(f, **kwargs)
File "/usr/lib64/python2.7/site-packages/pandas/core/", line 1745, in to_csv
File "/usr/lib64/python2.7/site-packages/pandas/io/formats/", line 161, in save
buf = f.getvalue()
File "/usr/lib/python2.7/site-packages/dask/bytes/", line 136, in __getattr__
return getattr(self.file, key)
AttributeError: 'S3File' object has no attribute 'getvalue'
I searched for this error, but couldn't find a relevant solution.
Do you have any idea?