pandas installed, but ModuleNotFoundError when running a Python script

I am on Windows 11 and have tried Anaconda, PyCharm, Python IDLE, and PowerShell.
I installed pandas from both the PyCharm terminal and PowerShell using:
pip3 install pandas
In PowerShell, the import works when I enter:
python
import pandas
In the PyCharm terminal, when I tried to install pandas again, it showed "Requirement already satisfied: pandas in d:\anaconda3\lib\site-packages (1.4.4)..."
However, I still get "ModuleNotFoundError: No module named 'pandas'" when I run a.py, which contains just one statement:
import pandas
What's wrong?

Related

Issue with 'pandas on spark' used with conda: "No module named 'pyspark.pandas'" even though both pyspark and pandas are installed

I have installed both Spark 3.1.3 and Anaconda 4.12.0 on Ubuntu 20.04.
I have set PYSPARK_PYTHON to be the python bin of a conda environment called my_env
export PYSPARK_PYTHON=~/anaconda3/envs/my_env/bin/python
I installed several packages on conda environment my_env using pip. Here is a portion of the output of pip freeze command:
numpy==1.22.3
pandas==1.4.1
py4j==0.10.9.3
pyarrow==7.0.0
N.B.: the pyspark package is not installed in the conda environment my_env. I would like to be able to launch a pyspark shell on different conda environments without having to reinstall pyspark in every environment (I would like to only modify PYSPARK_PYTHON). This would also avoid having different versions of Spark in different conda environments (which is sometimes desirable but not always).
When I launch a pyspark shell using pyspark command, I can indeed import pandas and numpy which confirms that PYSPARK_PYTHON is properly set (my_env is the only conda env with pandas and numpy installed, moreover pandas and numpy are not installed on any other python installation even outside conda, and finally if I change PYSPARK_PYTHON I am no longer able to import pandas or numpy).
Inside the pyspark shell, the following code works fine (creating and showing a toy Spark dataframe):
sc.parallelize([(1,2),(2,4),(3,5)]).toDF(["a", "b"]).show()
However, if I try to convert the above dataframe into a pandas on spark dataframe it does not work. The command
sc.parallelize([(1,2),(2,4),(3,5)]).toDF(["t", "a"]).to_pandas_on_spark()
returns:
AttributeError: 'DataFrame' object has no attribute 'to_pandas_on_spark'
I tried to first import pandas (which works fine) and then pyspark.pandas before running the above command but when I run
import pyspark.pandas as ps
I obtain the following error:
ModuleNotFoundError: No module named 'pyspark.pandas'
Any idea why this happens ?
Thanks in advance
It seems that you need Apache Spark 3.2 or later, not 3.1.3: the pyspark.pandas module was added in Spark 3.2. Upgrade to 3.2 or later and you will have the desired API.
pip install pyspark  # needs Spark 3.2 or later
import pyspark.pandas as ps
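To check the version before attempting the import, a small helper like the one below might be useful. It is only a sketch: supports_pandas_on_spark is a hypothetical name, and it assumes the version string starts with "major.minor", as in "3.1.3".

```python
def supports_pandas_on_spark(version):
    """Return True if a Spark version string such as '3.1.3' is 3.2 or newer.

    Hypothetical helper: assumes the string begins with 'major.minor'.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (3, 2)


if __name__ == "__main__":
    # In a pyspark shell you could pass sc.version instead of a literal.
    print(supports_pandas_on_spark("3.1.3"))
```

In the pyspark shell, supports_pandas_on_spark(sc.version) should tell you whether import pyspark.pandas can be expected to work.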

How to make PyCharm recognise numpy

PyCharm does not see my numpy. This is what I get when I run pip install numpy:
Requirement already satisfied: numpy in /Users/ruslanpilipyuk/opt/anaconda3/lib/python3.7/site-packages (1.18.1)
However, when I write import numpy, I get:
ModuleNotFoundError: No module named 'numpy'
I'm using a Mac.
Whenever you import a module, Python searches for it in a specific list of directories. To see which directories it searches, run the following at the Python prompt:
>>> import sys
>>> print(sys.path)
If the directory containing NumPy is not in that list, you can add it to the search path for the current session at the Python prompt. Note that the path must not include the version suffix printed by pip, and that module names are case-sensitive:
>>> import sys
>>> sys.path.append("/Users/ruslanpilipyuk/opt/anaconda3/lib/python3.7/site-packages")
>>> import numpy
Hope this solves your error.
Reference: https://www.edureka.co/community/66413/import-numpy-as-np-importerror-no-module-named-numpy
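Before touching sys.path, it may help to confirm which interpreter PyCharm is actually running and whether that interpreter can locate the module at all. A minimal sketch using only the standard library (locate_module is a hypothetical helper name, not part of any package):

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file the current interpreter would import `name` from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # not importable from this interpreter's search path
    return spec.origin


if __name__ == "__main__":
    # Compare this path with the one pip reported in "Requirement already satisfied".
    print("interpreter:", sys.executable)
    print("numpy found at:", locate_module("numpy"))
```

If the interpreter printed here is not the Anaconda one that pip installed into, pointing the PyCharm project at that interpreter is probably cleaner than appending to sys.path.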

How to install numpy on vscode on mac?

It's saying:
"ModuleNotFoundError: No module named 'numpy'"
and then when I do "pip install numpy" it says:
"Requirement already satisfied: numpy in ./Library/Python/2.7/lib/python/site-packages (1.16.6)"
You are probably using Python 3.x while pip is configured for Python 2.x.
Try running pip3 install numpy
Basically, you are installing the package for Python 2 while running the script with Python 3.
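One way to avoid guessing which pip belongs to which Python is to invoke pip through the interpreter that runs your script. The sketch below only builds and prints the command; pip_install_command is a hypothetical helper name.

```python
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter running this script."""
    # "python -m pip" installs into the same Python that executes the code.
    return [sys.executable, "-m", "pip", "install", package]


if __name__ == "__main__":
    # Printed rather than executed; pass the list to subprocess.check_call to install.
    print(" ".join(pip_install_command("numpy")))
```

Running the printed command from a terminal should install numpy for exactly the Python that VS Code is using, assuming the script was run with that same interpreter.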

ModuleNotFoundError for pandas_datareader: Jupyter Notebook using different packages from conda environment

I am using Anaconda windows v5.3.
I am getting the error:
ModuleNotFoundError: No module named 'pandas_datareader'
When I tried to print out the packages used by Jupyter Notebook, I realized that pandas_datareader is not in, and a different version of pandas (0.23.0) is used:
import pkg_resources
# List every distribution visible to the interpreter behind this notebook
for i in pkg_resources.working_set:
    print(i)
Output
...
pandocfilters 1.4.2
pandas 0.23.0
packaging 17.1
openpyxl 2.5.3
...
This differs from the library installed in the pyfinance environment:
>conda list
# Name Version Build
pandas 0.20.3 py36_0
pandas-datareader 0.4.0 py36_0
Hence, pandas_datareader seems to work in the Python shell in the command prompt, but not in Jupyter Notebook. Is there any way to sync the Jupyter Notebook environment with the conda environment?
I realized that to sync Jupyter Notebook with the environment, you just have to run this in that environment:
conda install jupyter
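To confirm that the notebook and the command prompt really use the same environment, you could run something like the following in both places and compare the output. This is a sketch that assumes Python 3.8 or newer (for importlib.metadata); installed_version is a hypothetical helper name.

```python
import sys
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # sys.prefix is the root of the environment this interpreter belongs to.
    print("interpreter:", sys.executable)
    print("environment:", sys.prefix)
    for name in ("pandas", "pandas-datareader"):
        print(name, installed_version(name))
```

If the two runs print different environment paths, the notebook kernel is not running from the pyfinance environment.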

ImportError: Install xlrd >= 0.9.0 for Excel support when using pd.read_excel to read .xlsx file: never happened before

Something strange is going on. Just today when trying to read in a dataframe from an xlsx file:
import pandas as pd
df = pd.read_excel('vlnew.xlsx',sheet_name='Sheet1')
I am getting the following error:
ImportError: Install xlrd >= 0.9.0 for Excel support
I am fully aware that plain and simple the instructions are to install xlrd, but I should not have to install xlrd when I was never getting this error before, and also, xlrd only applies to the old .xls file format. I am using .xlsx.
I can't understand why today all of a sudden this error is popping up. This is very strange indeed, at least to me.
Update:
When I execute this script in the Spyder IDE, I do not get the xlrd import error, but just today I ran this script in the Conda command prompt and only then does it report the xlrd error. Why are there inconsistencies between the Conda command prompt and Spyder IDE?
Try running the following command in the terminal:
pip install xlrd
Then import xlrd alongside pandas:
import xlrd
import pandas as pd
I was getting the error "ImportError: Install xlrd >= 1.0.0 for Excel support" in PyCharm for the code below:
import pandas as pd
df2 = pd.read_excel("data.xlsx")
print(df2.head(3))
print(df2.tail(3))
Solution: pip install xlrd
This resolved the error.
There is also no need to use "import xlrd" in the program.
(2021.01.18)
NOTICE: the current version of "xlrd" reads only ".xls" files.
To read ".xlsx" files, install the openpyxl package.
Just do it in your Python environment (mine is "repl.it") by writing
import xlrd
or
import openpyxl
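Since which package is needed depends on the file format, the choice can be made explicit through the engine parameter of pd.read_excel. A minimal sketch, assuming only the two formats discussed here; excel_engine_for is a hypothetical helper name:

```python
from pathlib import Path


def excel_engine_for(filename):
    """Pick a pandas read_excel engine name from the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".xls":
        return "xlrd"  # legacy binary format
    if suffix in {".xlsx", ".xlsm"}:
        return "openpyxl"  # Office Open XML formats
    raise ValueError(f"unsupported Excel extension: {suffix!r}")


if __name__ == "__main__":
    print(excel_engine_for("vlnew.xlsx"))
```

You would then call pd.read_excel(path, engine=excel_engine_for(path)), with the matching package installed in the environment that runs the script.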
As you and others have correctly mentioned, xlrd needs to be installed; read_excel requires the xlrd package.
One possible reason for the difference between Spyder and the Conda prompt is that they use different conda environments, one of which contains the xlrd package while the other does not. This commonly happens when working with several virtual environments; it has happened to me many times.
You should try
pip install --upgrade xlrd
Just type
pip install xlrd
and use it like this
import xlrd
import pandas as pd
data=pd.read_excel('titanic3.xls')