pd.read_excel crashes (hangs) on certain files - pandas

First of all, I've never had pandas crash (freeze, loop infinitely) on me before. Second, it's not the files; they were reading fine before.
Doing a bit of research, I stumbled upon an issue where the cause was traced back to pd._libs.cp36.
I also found another, similar issue.
I looked in my own pandas _libs folder and found various files like algos.cp38-win....
A couple of things came to mind. First, that I upgraded to Python 3.8.
(The environment is called work38, by the way.)
But trying in a different environment didn't work either.
The only other thing is that I installed fbprophet. To install fbprophet I installed pystan.
To install pystan I had to run this command as per their docs
conda install libpython m2w64-toolchain -c msys2
There are many guides to installing pystan that encourage you to install packages in a particular order (first pystan, then numpy, cython, pandas, etc.). I don't know if there's a reason for this.
In any case, my theory is that the command above messed up my whole Anaconda installation with some C compilers, and now pandas is broken in all environments, even if I pip uninstall and then pip install --no-cache-dir pandas.
Two questions:
First, do you know what's happening here? Could you explain it to me?
And second, any idea how I can repair this? Or must I uninstall Anaconda and reinstall everything (and then, of course, pip install -r requirements.txt)?
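One approach worth considering for the second question (the environment name below is just an example): rather than uninstalling Anaconda entirely, it is often enough to create a fresh conda environment and reinstall only what you need.
conda create -n fresh38 python=3.8
conda activate fresh38
pip install --no-cache-dir -r requirements.txt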
Edit: Maybe the C compiler stuff is unrelated. I just let read_excel run for a painful amount of time and it returned a dataframe with 65000 rows and 250 columns.
I see that when I convert the xlsx to csv (with a CLI script) the csv contains a bunch of empty rows and columns.
TL;DR: I have an xlsx file with about 250 rows and ~20 columns of actual data, but apparently the empty cells aren't really empty?
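A workaround sketch for the phantom empty cells (the file name here is a placeholder): read the sheet, then drop rows and columns that are entirely empty, or restrict the read to the range you actually need.
import pandas as pd

# Read the whole sheet, then discard rows/columns where every cell is NaN.
df = pd.read_excel("data.xlsx")
df = df.dropna(axis=0, how="all").dropna(axis=1, how="all")

# Or limit what gets read in the first place; the column range and row count
# here are examples based on the ~20 columns / ~250 rows described above.
df_small = pd.read_excel("data.xlsx", usecols="A:T", nrows=250)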

Related

Pandas Import Error when converting Python script to .exe with Pyinstaller

I am currently trying to convert my Python script that automates an Excel task for my boss to an executable so that I can send it to him and he can use it without having Python installed. One of the libraries I used in the script is Pandas, and when I run the .exe after successfully building it, I get "failed to execute script 'main' due to unhandled exception: No module named pandas."
This is what I used to build the .exe:
pyinstaller --onefile -w main.py
Obviously I have Pandas installed on my machine and the script works perfectly fine when I run it normally, so I don't really know why I am getting this error, especially since I thought the whole point of converting the script to an executable was to get rid of the need for other packages to be installed.
I have read other articles on here about a similar error relating to numpy, but none of the other solutions have helped me. I have tried doing --hidden-import pandas while building the executable and downgrading pandas to an older version, both to no success.
Any suggestions would help!
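One guess worth trying (not a confirmed fix): pass the hidden imports for pandas and its main dependency explicitly, and build once without -w so the console stays open and shows the full traceback, for example:
pyinstaller --onefile --hidden-import pandas --hidden-import numpy main.py
If that build finds pandas, -w can be added back for the final windowed executable.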

Running python commands in a terminal in Google Colab

I've uploaded a directory of code to Google Colab. I need to run Python command lines in a terminal that I'm unable to open.
I tried each and every solution suggested in How can I run shell (terminal) in Google Colab? but to no avail.
Update 2022-02-02:
Can you try executing your Python scripts with the exclamation mark (!) directly from a Colab cell?
I believe I encountered an identical issue back in my Colab days: the process would hang or freeze after some time, especially when dealing with GBs of data. So ultimately I just ran everything with the exclamation mark directly from a Colab cell to resolve the issue.
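For example (the path and script name below are placeholders), a cell like this runs the script without needing a terminal at all:
!python /content/my_project/main.py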
Have you tried this with the following syntax?
!pip install google-colab-shell
from google_colab_shell import getshell
getshell()
getshell(height=400)
I understand that you have tried the solution from the other post, but just in case: you may have missed the ! or ignored the warning message, and that missing exclamation mark could be what was keeping the shell from spawning.

SSL error while using pip install to install tensorflow-gpu

I am trying to install tensorflow-gpu by running pip install tensorflow-gpu on Windows, inside an Anaconda environment, but I am getting the following error:
Could not install packages due to an EnvironmentError: [SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac (_ssl.c:1977)
I also tried doing it in a command prompt with administrator access, but it still didn't work.
C:\WINDOWS\system32>pip install tensorflow-gpu
Collecting tensorflow-gpu
Downloading https://files.pythonhosted.org/packages/2f/84/b6dfafe3282101f7d3a9410652ab4e6dc73f981fd63a40be0b47ff3bac3a/tensorflow_gpu-1.9.0-cp35-cp35m-win_amd64.whl (103.3MB)
19% |###### | 19.9MB 2.6MB/s eta 0:00:32
Could not install packages due to an EnvironmentError: [SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac (_ssl.c:1977)
All other network-related activities (such as browsing the web) work properly on the computer. I also have the correct CUDA and cuDNN installed, along with the latest NVIDIA drivers.
I fixed this problem by purging my Anaconda installation and reinstalling it, and then installing tensorflow-gpu. However, I am still curious as to why this error even happened in the first place.
It's related to the network connection. I faced the same error on my Ubuntu system and solved it by changing my network (WiFi) connection.
Use the following syntax when installing packages:
pip install --user packagename
I had the same problem installing tensorflow with PyCharm. As Dimitri Bolt described, I started CMD as administrator and used the syntax described by Devendra Kanade. I got the error again, but each time I reran the command, the download progressed further. After three or four attempts, the download was successful and I was able to import tensorflow in PyCharm.
Note the "Downloading" line (19%, not finished yet) in the pip message just before the error. For pip, that almost certainly means this is a download error.
The quick fix is to increase the number of retries via the pip option:
--retries <retries> Maximum number of retries each connection should attempt (default 5 times).
NB: I've never tried this.
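As a concrete example (untested, as noted above), the command would look something like:
pip install --retries 10 tensorflow-gpu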
The general fix is to download the file normally (I tried this to install PyQt5):
a) Find the name of the problematic file in your pip error message, enclose it in quotes, then google the whole thing (including quotes). In your case, that will be "tensorflow_gpu-1.9.0-cp35-cp35m-win_amd64.whl".
b) Choose a reliable site and download that file. Windows 10 may give you an erroneous message about some .part file, but just ignore it and download again.
c) Open your browser's download list. In Firefox, Ctrl-J opens that list (the Library). If the download fails, click to retry it from the list (not from the site). You may need to retry up to 10 times before the download succeeds.
d) Suppose that "tensorflow_gpu-1.9.0-cp35-cp35m-win_amd64.whl" is now in the local folder c:\Users\uuuu\Downloads\.
Create in this folder an ANSI text file named "example-requirements.txt" which has 3 lines (2 of them empty), as below:
tensorflow_gpu-1.9.0-cp35-cp35m-win_amd64.whl
e) Now in a normal command prompt, issue 3 commands as below:
c:
cd c:\Users\uuuu\Downloads\
pip install --requirement example-requirements.txt --no-cache-dir
NB: you can copy and paste these; very easy in Windows 10.
f) If successful, you are done!
g) If pip again has a problem downloading another file, repeat a), b), c), d) and edit the old "example-requirements.txt" to contain 4 lines (2 of them empty), as below:
tensorflow_gpu-1.9.0-cp35-cp35m-win_amd64.whl
another-file-name-with-extension
h) Repeat e), f), g) ...
NB: documentation for the pip install options can be found at https://pip.pypa.io/en/stable/cli/pip_install/#
(search for "Example Requirements File" on the page).
I stumbled upon the same error while installing via conda; updating conda solved the problem. (Incidentally, the new version downloaded the packages sequentially, whereas the older one attempted a parallel download.)
Reinstall the library again!
I have faced this problem with several libraries, such as tensorflow-gpu and matplotlib.
I have no idea why, but if I find something I will share it.

Highcharts-convert missing labels

I have the same code, running the same highcharts-convert.js and phantomjs on two servers. One produces perfect chart images; the other is missing all labels. Does anyone know why, or where to start looking?
This is most likely missing font packages on the failing host. Highcharts-convert uses the fonts available to it, but will silently skip labels if there aren't any available. I had this happen and running
sudo yum install dejavu-fonts-common dejavu-sans-fonts dejavu-serif-fonts libXfont xorg-x11-fonts-Type1
fixed it. I don't yet know what subset of those packages would have been sufficient, but I suspect "libXfont xorg-x11-fonts-Type1" would do it.
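As an untested way to narrow it down, comparing the fonts visible on each host (assuming fontconfig is installed) should show what the failing server lacks; for example, run this on both servers and diff the outputs:
fc-list | sort > /tmp/fonts.txt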

How can I install matplotlib for my AWS Elastic Beanstalk application?

I'm having a hell of a time deploying matplotlib on AWS Elastic Beanstalk. I gather that my issue comes from some dependencies and the way that EB deploys packages installed with PIP, and have attempted to follow the instructions here on SO for resolving the issue.
I first tried incrementally deploying, as suggested in the linked answer, by adding pieces of the matplotlib package stack to my requirements.txt file in stages. But this takes forever (for each stage) and is prone to failure and timing out (which seems to leave build directories behind that stall subsequent package installations).
So the simple solution mentioned off-handedly at the end of the answer appeals to me: just eb ssh, activate the virtualenv with
source /opt/python/run/venv/bin/activate
and pip install packages manually. But I can't get this to work either. First, I'm often confronted with left-behind build directories (as mentioned above):
pip can't proceed with requirement 'xxxx' due to a pre-existing build directory.
location: /opt/python/run/venv/build/xxxx
This is likely due to a previous installation that failed.
pip is being responsible and not assuming it can delete this.
Please delete it and try again.
But even after removing these, I consistently get
Exception:
Traceback (most recent call last):
File "/opt/python/run/venv/lib/python2.7/site-packages/pip/basecommand.py", line 122, in main
status = self.run(options, args)
File "/opt/python/run/venv/lib/python2.7/site-packages/pip/commands/install.py", line 278, in run
requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, bundle=self.bundle)
File "/opt/python/run/venv/lib/python2.7/site-packages/pip/req.py", line 1197, in prepare_files
do_download,
File "/opt/python/run/venv/lib/python2.7/site-packages/pip/req.py", line 1375, in unpack_url
self.session,
File "/opt/python/run/venv/lib/python2.7/site-packages/pip/download.py", line 582, in unpack_http_url
unpack_file(temp_location, location, content_type, link)
File "/opt/python/run/venv/lib/python2.7/site-packages/pip/util.py", line 625, in unpack_file
untar_file(filename, location)
File "/opt/python/run/venv/lib/python2.7/site-packages/pip/util.py", line 533, in untar_file
os.makedirs(location)
File "/opt/python/run/venv/lib64/python2.7/os.py", line 157, in makedirs
mkdir(name, mode)
OSError: [Errno 13] Permission denied: '/opt/python/run/venv/build/xxxx'
in response to pip install xxxx (and sudo pip fails with sudo: pip: command not found).
What can I do to get this working on AWS-EB? In particular, what do I need to do to get the simple SSH+PIP approach working; or is there some other better — simpler! — approach I should try.
FWIW, I have a .ebextensions/software.config with
packages:
  yum:
    gcc-c++: []
    gcc-gfortran: []
    python-devel: []
    atlas-sse3-devel: []
    lapack-devel: []
    libpng-devel: []
    freetype-devel: []
    zlib-devel: []
and a requirements.txt that ends with
pytz==2014.10
pyparsing==2.0.3
python-dateutil==2.4.0
nose==1.3.4
six>=1.8.0
mock==1.0.1
numpy==1.9.1
matplotlib==1.4.2
After about 4 hours, I've gotten as far as numpy (as reported by pip list in the EB virtualenv).
And (in case it matters) the user who is SSHing is part of a group with the policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticbeanstalk:*",
        "ec2:*",
        "elasticloadbalancing:*",
        "autoscaling:*",
        "cloudwatch:*",
        "s3:*",
        "sns:*",
        "cloudformation:*",
        "rds:*",
        "sqs:*",
        "iam:PassRole"
      ],
      "Resource": "*"
    }
  ]
}
I have used many approaches to build and deploy numpy/scipy/matplotlib, on Windows as well as Linux systems. I have used system-provided package managers (aptitude, rpm), 3rd-party package managers (pypm), Python package managers (easy_install, pip), source releases, used different build environments/tools (GCC, but also Intel MKL, OpenMP). While doing so, I have run into many many quite annoying situations, but have also learned a lot about the pros and cons of each approach.
I have no experience with Elastic Beanstalk (EB), but I have experience with EC2. I see that you can SSH into an instance and poke around. So, what I suggest further below is based on
above-stated experiences and on
the more or less obvious boundary conditions regarding Beanstalk and on
your application scenario, described in another question here on SO and on
the fact that you just want to get things running, quickly
My suggestion: start off with not building these things yourself. Do not use pip. If possible, try to use the package manager of the Linux distribution in place and let it handle the installation of everything required for you, with a single command (e.g. sudo apt-get install python-matplotlib).
Disadvantages:
- possibly old package versions, depending on the Linux distro in use
- non-optimized builds (e.g. not built against Intel MKL, not leveraging OpenMP features, or not using special instruction sets)
Advantages:
- it downloads quickly, because packages are most likely cached near your machine
- it installs quickly (these packages are pre-built, no compilation involved)
- it just works
So, I hope you can just use aptitude or rpm or whatever on these machines and inherit the great work that the distribution package maintainers do for you, behind the scenes.
Once you are confident in your application and identified some bottleneck or issue, you might have reason to use a newer version of numpy/matplotlib/... or you might have reason to have a faster version of these, by creating an optimized build.
Edit: EB-related details of outlined approach
In the meantime, we have learned that EB by default runs Amazon Linux, which is based on Red Hat Enterprise Linux. Likewise, it uses yum as its package manager, and packages are in RPM format.
Amazon provides documentation about available packages. In Amazon Linux 2014.09, these packages are available: http://aws.amazon.com/de/amazon-linux-ami/2014.09-packages/
In this list we find
numpy-1.7.2
python-matplotlib-0.99.1.2
This version of matplotlib is very old, according to the changelog it is from September 2009: "2009-09-21 Tagged for release 0.99.1".
I did not anticipate it to be so old, but still, it might be sufficient for your needs. So we proceed with our plan (but I'd understand if that's a blocker).
Now, we have learned that the system Python and EB Python are isolated from each other. That does not mean that EB Python cannot access the system Python's site packages; we just need to tell it where to look. A simple and clean method is to set up a proper directory structure with the packages that should be accessible to EB Python, and to communicate this directory to EB Python via sys.path.
Clearly, we need to customize the bootstrapping phase of EB containers. The available tools are documented here: http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/customize-containers-ec2.html
Obviously, we want to make use of the packages approach, and tell EB to install the numpy and python-matplotlib packages via yum. So the corresponding config file section should contain:
packages:
  yum:
    numpy: []
    python-matplotlib: []
Explicitly mentioning numpy might not be necessary; it is likely a dependency of python-matplotlib.
Also, we need to make use of the commands section:
You can use the commands key to execute commands on the EC2 instance.
The commands are processed in alphabetical order by name, and they run
before the application and web server are set up and the application
version file is extracted.
The following three commands create the above-mentioned directory and set up symbolic links to the numpy/matplotlib installation paths (these paths hopefully exist by the time these commands are executed):
commands:
  00-create-dir:
    command: "mkdir -p /opt/py26-selected-site-packages"
  01-link-numpy:
    command: "ln -s /usr/lib64/python2.6/site-packages/numpy /opt/py26-selected-site-packages/numpy"
  02-link-mpl:
    command: "ln -s /usr/lib64/python2.6/site-packages/matplotlib /opt/py26-selected-site-packages/matplotlib"
Two uncertainties: first, the AWS docs do not clarify whether packages are processed before commands are executed. You have to try; if it does not work, use container_commands. Secondly, it is just an educated guess that /usr/lib64/python2.6/site-packages/matplotlib is available after installing python-matplotlib. It should be installed to this location, but it may end up somewhere else; this needs to be tested. Numpy should end up where specified, as inferred from this article.
[UPDATE FROM SEB]
AWS documentation says "The cfn-init helper script processes these configuration sections in the following order: packages, groups, users, sources, files, commands, and then services."
http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-init.html
So, your approach is safe
[/UPDATE]
The crucial step, as pointed out in the comments to this answer, is to tell your Python app where to look for packages. Directly modifying sys.path before attempting the import is a reliable way to take control of this. The following code adds our special directory to the list of directories in which Python looks for packages, and then attempts to import matplotlib:
import sys
sys.path.append("/opt/py26-selected-site-packages")
from matplotlib import pyplot
The order in sys.path defines priorities, so in case there is any other matplotlib or numpy package available in one of the other directories, it might be a better idea to
sys.path.insert(0, "/opt/py26-selected-site-packages")
However, this should not be necessary if our whole approach was well thought-through.
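A small sanity check (just a sketch, to be run on the instance or at application startup) to confirm which installation actually gets picked up:
import sys
sys.path.insert(0, "/opt/py26-selected-site-packages")

import matplotlib
# Print the version and file path so you can see whether the yum-installed
# copy is the one being imported.
print(matplotlib.__version__)
print(matplotlib.__file__)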
To add to Jan-Philip's answer:
AWS Elastic Beanstalk uses the Amazon Linux distribution (except for .NET environments). Amazon Linux uses the yum package manager, and matplotlib is available in Amazon's software repository.
[ec2-user@ip-1-1-1-174 ~]$ yum list | grep matplot
python-matplotlib.x86_64 0.99.1.2-1.6.amzn1 amzn-main
If this version is the one you need for your application, I would try to simply modify your .ebextensions/software.config file and to add the package to the yum section of it:
packages:
  yum:
    python-matplotlib: []
    python-devel: []
    atlas-sse3-devel: []
    lapack-devel: []
    libpng-devel: []
    freetype-devel: []
    zlib-devel: []
A last note about AWS Elastic Beanstalk and SSH.
While Amazon gives you the possibility to SSH to your Elastic Beanstalk instances, you should use this possibility only for debugging purposes, to understand why your app failed or is not installing as suggested.
Other than that, your deployment must be 100% automatic. When Elastic Beanstalk (Auto Scaling, to be precise) scales out your infrastructure (adds more instances) or scales it in (terminates instances) depending on your application workload, all your manual configuration will be lost.
Best practice is not to install SSH keys on your production environment; it further reduces the attack surface.
I might be a bit late to this question, but since AWS and a lot of cloud service providers are moving to Docker, and taking into consideration that you haven't specified the platform, I have a quick solution to your question:
Use the generic Docker platform.
I created some images with Python, Numpy, Scipy and Matplotlib preinstalled, so you can directly pull and start using them with one line of code.
Python 2.7 (this one also has the versions of numpy and matplotlib that you specified):
sudo docker pull chuseuiti/pynuscimat2.7
Python 3.4:
sudo docker pull chuseuiti/pynusci
However, you can create your own image or modify existing images.
In case you want to automate your instances, you can pass a Dockerfile to AWS with the definition of your image.
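A minimal sketch of such a Dockerfile, building on the Python 2.7 image mentioned above (app.py is a placeholder for your entry point):
# Base image with Python 2.7, numpy, scipy and matplotlib preinstalled
FROM chuseuiti/pynuscimat2.7

# Copy your application code into the image
COPY app.py /app/app.py
WORKDIR /app

# Run the application when the container starts
CMD ["python", "app.py"]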
Tip, in case you don't know about Docker:
You need to log in before being able to pull:
sudo docker login
After pulling the image, you can generate and work in a container created from an image with the next code:
sudo docker run -i -t chuseuiti/pynuscimat2.7 bash
PS: At least on the free tier, AWS always complains about running out of time with scipy and matplotlib; they take too long to install, which is why I use this option.