Issue with tables and HDF5 python package - pandas

Here is my code:
import os
import pandas as pd

def load_hdf(filename):
    """
    Load the first key of an HDF file
    """
    hdf = pd.HDFStore(filename, mode='r')
    keys = hdf.keys()
    if not keys:
        hdf.close()
        return pd.DataFrame()
    data_df = hdf.get(keys[0])
    hdf.close()
    return data_df
And when I do:
load_hdf(os.path.join(PATH, 'crm.hd5'))
I have this error:
HDFStore requires PyTables, "No module named 'tables'" problem importing
When I try:
pip install tables
I have the error:
Using Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 26 2018, 23:26:24)
* USE_PKGCONFIG: False
.. ERROR:: Could not find a local HDF5 installation.
You may need to explicitly state where your local HDF5 headers and library can be found by setting the ``HDF5_DIR`` environment variable or by using the ``--hdf5`` command-line option.
...
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/2s/sn3gzfwd6_37v0ggqd0n8qy00000gn/T/pip-install-1mx6wjd3/tables/
I already have PyTables and HDF5 in my Anaconda installation. I have Python 3.7.

I also had PyTables installed and could not find a solution. What worked for me was installing the release candidate of h5py, 2.8.0rc1 (as seen here). It seems that the HDF5 version that pandas installs is not fully compatible.
So try:
pip install h5py==2.8.0rc1
Hope it helps.
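If you would rather build tables against the HDF5 that ships with Anaconda (the route the pip error itself suggests), pointing the build at your conda prefix usually works. A sketch, assuming a default Anaconda install location of $HOME/anaconda3:
HDF5_DIR=$HOME/anaconda3 pip install tables
Alternatively, since you are already on Anaconda, conda install pytables pulls a prebuilt PyTables with a matching HDF5 and avoids the local build entirely.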

Related

`conda search PKG --info` shows different dependencies than what conda wants to install?

I'm building a new conda environment using python=3.9 for the
osx-arm64 architecture.
conda create -n py39 python=3.9 numpy
conda list
...
numpy 1.21.1 py39h1a24bff_2
...
python 3.9.7 hc70090a_1
So far so good: numpy=1.21.1 is the one I want. Now I want to add
scipy, and the first one seems to fit the bill:
conda search scipy --info
scipy 1.7.1 py39h2f0f56f_2
--------------------------
file name : scipy-1.7.1-py39h2f0f56f_2.conda
name : scipy
version : 1.7.1
build : py39h2f0f56f_2
build number: 2
size : 14.8 MB
license : BSD 3-Clause
subdir : osx-arm64
url : https://repo.anaconda.com/pkgs/main/osx-arm64/scipy-1.7.1-py39h2f0f56f_2.conda
md5 : edbd5a5399e973d1d0325147b7118f79
timestamp : 2021-08-25 16:12:39 UTC
dependencies:
- blas * openblas
- libcxx >=12.0.0
- libgfortran 5.*
- libgfortran5 >=11.1.0
- libopenblas >=0.3.17,<1.0a0
- numpy >=1.19.5,<2.0a0
- python >=3.9,<3.10.0a0
In particular, python >=3.9 and numpy >=1.19 seem just right.
But when I try the install:
conda install scipy
...
The following packages will be DOWNGRADED:
numpy 1.21.1-py39h1a24bff_2 --> 1.19.5-py39habd9f23_3
(I have bumped into various constraints with numpy=1.19 (numba,
pandas, ...) and am trying to avoid it.)
Why isn't the scipy package happy with the numpy=1.21 version I
have?!
The only possible clue is that conda reports a different python
version (3.8.11) than the v3.9 I specified for this environment:
conda info
active environment : py39
active env location : .../miniconda3/envs/py39
shell level : 1
user config file : .../.condarc
populated config files : .../.condarc
conda version : 4.11.0
conda-build version : not installed
python version : 3.8.11.final.0 <-------------------
virtual packages : __osx=12.1=0
...
but all the environment's pointers seem to be set correctly:
(py39) % which python
.../miniconda3/envs/py39/bin/python
(py39) % python
Python 3.9.7 (default, Sep 16 2021, 23:53:23)
[Clang 12.0.0 ] :: Anaconda, Inc. on darwin
Thanks, any hints as to what's broken will be greatly appreciated!
I now have things working, but I'm afraid I can't point to a satisfying "answer." Others (e.g. @merv) seem not to be having the same problems, and I can't identify the difference.
The one thing I did find that seemed to create issues in my install was what appears to be some mislabeling of the pandas package: pandas v1.3.5 breaks a numpy==1.19.5 requirement, which is the only way I've been able to push the install through. I posted a pandas issue comment.
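For anyone hitting the same downgrade, a minimal workaround sketch (whether it solves cleanly depends on your channels): pin the versions you want explicitly, so conda either keeps them or fails loudly instead of silently downgrading.
conda install --dry-run scipy        # preview what the solver would change
conda install scipy "numpy>=1.21"    # pin numpy so the solver cannot downgrade it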

ModuleNotFoundError: No module named 'pandas._libs.interval' | Installed pandas from git in Docker container

This is not a duplicate of existing questions because:
I'm contributing to the pandas repository itself.
I've installed pandas using the git repo and not pip.
I've used a Docker container as suggested by pandas to create the development environment.
The pandas installation is successful and no files are missing. I've manually verified that pandas._libs.interval is present.
When I tried to import from pandas, I'd get this error:
ImportError while loading conftest '/workspaces/pandas/pandas/conftest.py'.
../../../__init__.py:22: in <module>
from pandas.compat import is_numpy_dev as _is_numpy_dev
../../../compat/__init__.py:15: in <module>
from pandas.compat.numpy import (
../../../compat/numpy/__init__.py:7: in <module>
from pandas.util.version import Version
../../../util/__init__.py:1: in <module>
from pandas.util._decorators import ( # noqa
../../../util/_decorators.py:14: in <module>
from pandas._libs.properties import cache_readonly # noqa
../../../_libs/__init__.py:13: in <module>
from pandas._libs.interval import Interval
E ModuleNotFoundError: No module named 'pandas._libs.interval'
The solution is to rebuild the C extensions:
1. python setup.py clean (optional; use if step 2 alone doesn't work)
2. python setup.py build_ext -j 4
Credits: @MarcoGorelli from the pandas community on Gitter.
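To confirm the rebuild actually produced an importable extension, a quick check from the repo root (a minimal sketch; Interval(0, 1) is just an arbitrary instance):
python -c "from pandas._libs.interval import Interval; print(Interval(0, 1))"
If this prints an interval instead of raising ModuleNotFoundError, the C extensions are in place.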
More on why this solution works:
I suspect that while Docker was building the remote container, there were some issues due to an unreliable internet connection.
As all modules were indeed present, one of the only possibilities was that they couldn't be accessed by Python. The most plausible reason is an issue with the C compilation step, something related to Cython (interval is a .pyx file).
Also see: https://pandas.pydata.org/docs/development/contributing_environment.html#creating-a-python-environment

Version mismatch: this is the 'cffi' package | RAPIDS on Colab

# Install RAPIDS
!git clone https://github.com/rapidsai/rapidsai-csp-utils.git
!bash rapidsai-csp-utils/colab/rapids-colab.sh stable

import sys, os, shutil

sys.path.append('/usr/local/lib/python3.7/site-packages/')
os.environ['NUMBAPRO_NVVM'] = '/usr/local/cuda/nvvm/lib64/libnvvm.so'
os.environ['NUMBAPRO_LIBDEVICE'] = '/usr/local/cuda/nvvm/libdevice/'
os.environ["CONDA_PREFIX"] = "/usr/local"

# copy the RAPIDS shared libraries where the loader can find them
for so in ['cudf', 'rmm', 'nccl', 'cuml', 'cugraph', 'xgboost', 'cuspatial']:
    fn = 'lib' + so + '.so'
    source_fn = '/usr/local/lib/' + fn
    dest_fn = '/usr/lib/' + fn
    if os.path.exists(source_fn):
        print(f'Copying {source_fn} to {dest_fn}')
        shutil.copyfile(source_fn, dest_fn)

# fix for BlazingSQL import issue
# ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by /usr/local/lib/python3.7/site-packages/../../libblazingsql-engine.so)
if not os.path.exists('/usr/lib64'):
    os.makedirs('/usr/lib64')
for so_file in os.listdir('/usr/local/lib'):
    if 'libstdc' in so_file:
        shutil.copyfile('/usr/local/lib/' + so_file, '/usr/lib64/' + so_file)
        shutil.copyfile('/usr/local/lib/' + so_file, '/usr/lib/x86_64-linux-gnu/' + so_file)
I'm successfully able to install RAPIDS using the above script, but I simply can't get rid of the following error:
Exception: Version mismatch: this is the 'cffi' package version 1.14.5, located in '/usr/local/lib/python3.7/dist-packages/cffi/api.py'. When we import the top-level '_cffi_backend' extension module, we get version 1.14.3, located in '/usr/local/lib/python3.7/site-packages/_cffi_backend.cpython-37m-x86_64-linux-gnu.so'. The two versions should be equal; check your installation.
I have tried everything from here and from other answers: upgraded, downgraded, uninstalled, installed, but nothing works. Any help would be greatly appreciated.
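The error itself says there are two copies in play: the pure-Python cffi package (1.14.5) in /usr/local/lib/python3.7/dist-packages, and an older compiled _cffi_backend (1.14.3) in site-packages, which the sys.path.append in the script puts on the import path. A minimal diagnostic sketch to see which files Python actually loads:
import cffi, _cffi_backend

print(cffi.__version__, cffi.__file__)                    # the pure-Python package
print(_cffi_backend.__version__, _cffi_backend.__file__)  # the compiled extension
If the two versions differ, removing or reinstalling the stale copy (whichever path loses) is the usual fix.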

Tensorflow with R and Anaconda - error "Could not import PIL.Image. The use of `load_img` requires PIL"

There are some answers to this question in a Python environment, but the solutions did not work for my RStudio environment. Here is my code:
library(keras)
library(tensorflow)
use_condaenv("tf")

train_dir <- "C:/training_images/"
train_datagen <- image_data_generator(rescale = 1/255)
validation_datagen <- image_data_generator(rescale = 1/255)

train_generator <- flow_images_from_directory(
  train_dir,
  train_datagen,
  target_size = c(150, 150),
  batch_size = 20,
  class_mode = "binary"
)
batch <- generator_next(train_generator)
The code works until the last "batch" line where it explodes like this:
Error in py_iter_next(it, completed) :
ImportError: Could not import PIL.Image. The use of `load_img` requires PIL.
Detailed traceback:
File "C:\Users\mory3\ANACON~1\envs\tf\lib\site-packages\keras_preprocessing\image\iterator.py", line 104, in __next__
return self.next(*args, **kwargs)
File "C:\Users\mory3\ANACON~1\envs\tf\lib\site-packages\keras_preprocessing\image\iterator.py", line 116, in next
return self._get_batches_of_transformed_samples(index_array)
File "C:\Users\mory3\ANACON~1\envs\tf\lib\site-packages\keras_preprocessing\image\iterator.py", line 230, in _get_batches_of_transformed_samples
interpolation=self.interpolation)
File "C:\Users\mory3\ANACON~1\envs\tf\lib\site-packages\keras_preprocessing\image\utils.py", line 108, in load_img
raise ImportError('Could not import PIL.Image. '
R version 3.6.1
Conda version 4.7
Python version 3.7
I had this same problem.
After a few hours of looking, I came up with a solution that worked for me.
I used this code to solve the PIL problem. I tried the Anaconda prompt first, but this code worked from R for me (note that py_install takes the environment name as a string via envname):
reticulate::py_install("pillow", envname = "tf")
I came up with this error next...
loaded runtime CuDNN library: 7.4.2 but source was compiled with: 7.6.0.
Make sure you have the correct cuDNN version installed. For me it was CUDA 10 with cuDNN 7.6.0. The output of the error will tell you which one to use.
Make sure you have cleaned any extra path variables that are in your environmental variables from installing previous versions.
I'm using Windows 10
gpu = GeForce GTX 1060 with Max-Q Design
R - 3.6.1
tensorflow = 1.13
python = 3.7
anaconda = Anaconda3-2019.03-Windows-x86_64.exe
I ended up uninstalling Anaconda altogether, which made troubleshooting the remaining errors in the Python connection to R much simpler.
I had the same problem with the "Deep Learning with R" CNN example on Win7. I solved it like this:
1. Added the Anaconda3 paths to PATH. In my case it was Windows, so the paths looked like this: C:\Anaconda3\Scripts;C:\Anaconda3\Library\bin (by default there were no conda paths in $PATH).
2. Installed pillow (it contains PIL) into Python with pip install pillow (a quick import check follows this list).
3. Configured r-reticulate.
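A quick sanity check that step 2 worked, run with the conda environment's own interpreter (a minimal sketch; the env name "tf" is from the question, e.g. run after conda activate tf):
# run with the environment's python; any version printing means PIL is importable
import PIL.Image
print(PIL.__version__)
If this raises ImportError, reticulate is most likely pointed at a different Python than the one pillow was installed into.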
This answer, Could not import PIL.Image even if Pillow already installed?, helped me. I had pillow already, but the conda environment wasn't configured properly, so pillow wasn't visible.
Also install Nvidia CUDA if you don't have it - you need it for tensorflow's GPU support too.

Google bigquery - Error message 'DataFrame' object has no attribute 'to_parquet' whereas pyarrow and fastparquet are installed

I'm trying to use the Google BigQuery function load_table_from_dataframe, but I get an error message stating that the DataFrame object has no attribute to_parquet.
I have installed both pyarrow and fastparquet but still get the same error message:
import pandas as pd
from google.cloud import bigquery

client = bigquery.Client()  # assumes credentials and project are already configured

df_test = pd.DataFrame({'Test_Name': ['Charlotte', 'Alexis'], 'Test_Age': [31, 12]})
table_id = 'TEST_DF.TEST_TABLE'
job_config = bigquery.LoadJobConfig()
job = client.load_table_from_dataframe(df_test, table_id, job_config=job_config)
job.result()
I'm using Python 3.6.3 and pyarrow version 0.14.0
Any idea on what is causing the issue?
Solved with:
$ pip install --upgrade pandas
This works because DataFrame.to_parquet was only added in pandas 0.21.0; on an older pandas the attribute simply does not exist, regardless of whether pyarrow or fastparquet is installed.
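A quick check that the pandas you are running is new enough:
import pandas as pd

print(pd.__version__)                       # needs >= 0.21.0
print(hasattr(pd.DataFrame, 'to_parquet'))  # True once the upgrade took effect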