Not able to extract a column by name using a pandas DataFrame

KeyError: 'Name'
>>> df=pd.read_csv(text_file)
>>> print(df)
Name Age
0 Ritesh 32
1 Priyanka 29
>>> print(df['Name'].where(df['Name'] == 'Ritesh'))
Traceback (most recent call last):
File "/Users/reyansh/venv/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Name'
During handling of the above exception, another exception occurred:

read_csv didn't split your file into two columns because the file uses a space as the separator, while the default separator is a comma. Specify the space explicitly with the sep keyword:
df = pd.read_csv(text_file, sep=" ")
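To see the difference without touching a file, here is a minimal sketch that parses the same two-row sample from an in-memory string. The `raw` string and the `io.StringIO` buffer are stand-ins for the asker's `text_file` and are assumptions, not part of the original post.

```python
import io

import pandas as pd

# Stand-in for the asker's space-separated file (assumed contents).
raw = "Name Age\nRitesh 32\nPriyanka 29\n"

# With the default comma separator, each line is read as a single field,
# so the header becomes one column label.
df_default = pd.read_csv(io.StringIO(raw))
print(list(df_default.columns))

# Passing sep=" " splits each line into the two expected columns.
df = pd.read_csv(io.StringIO(raw), sep=" ")
print(df[df["Name"] == "Ritesh"])
```

If the real file separates columns with a varying number of spaces, sep=r"\s+" is a common alternative.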

Related

How to avoid row access by label error in DataFrame?

I have trouble accessing rows in a DataFrame. My code and the results are as follows. What's the problem? Please help me.
df = pd.read_excel('./eeg_samples/chanlocs67.xlsx', usecols=[0,3,4,5], index_col='labels')
df.index.names = [None]
print(df.head())
print(df.loc['Fp1'])
The result is as follows.
X Y Z
'Fp1' 83.9171 29.4367 -6.990
'Fz' 58.5120 -0.3122 66.462
'F3' 53.1112 50.2438 42.192
'F7' 42.4743 70.2629 -11.420
'FT9' 14.5673 84.0759 -50.429
Traceback (most recent call last):
File "C:\ProgramData\mne-python\1.2.1_0\lib\site-packages\pandas\core\indexes\base.py", line 3803, in get_loc
return self._engine.get_loc(casted_key)
File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Fp1'
Your index values include the surrounding quote characters, so you need to either include the quotes in the label, as shown here, or fix the data in the Excel file:
print(df.loc["'Fp1'"])
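If editing the Excel file is not convenient, another option is to strip the quotes from the index after loading. This is only a sketch: the small frame below is an assumed reconstruction of the first two rows printed in the question, not the real file.

```python
import pandas as pd

# Assumed reconstruction of the first rows shown above, with the quote
# characters stored as part of each index label.
df = pd.DataFrame(
    {"X": [83.9171, 58.5120], "Y": [29.4367, -0.3122], "Z": [-6.990, 66.462]},
    index=["'Fp1'", "'Fz'"],
)

# Remove leading/trailing single quotes from every index label.
df.index = df.index.str.strip("'")

# The plain label now works with .loc.
row = df.loc["Fp1"]
print(row)
```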

MissingDataError: exog contains inf or nans

Input:
import statsmodels.api as sm
import pandas as pd
# reading data from the csv
data = pd.read_csv('/Users/justkiddings/Desktop/Python/TM/TM.csv')
# defining the variables
x = data['FSP'].tolist()
y = data['RSP'].tolist()
# adding the constant term
x = sm.add_constant(x)
# performing the regression
# and fitting the model
result = sm.OLS(y, x).fit()
# printing the summary table
print(result.summary())
Output:
runfile('/Users/justkiddings/Desktop/Python/Code/untitled28.py', wdir='/Users/justkiddings/Desktop/Python/Code')
Traceback (most recent call last):
File "/Users/justkiddings/opt/anaconda3/lib/python3.9/site-packages/spyder_kernels/py3compat.py", line 356, in compat_exec
exec(code, globals, locals)
File "/Users/justkiddings/Desktop/Python/Code/untitled28.py", line 24, in <module>
result = sm.OLS(y, x).fit()
File "/Users/justkiddings/opt/anaconda3/lib/python3.9/site-packages/statsmodels/regression/linear_model.py", line 890, in __init__
super(OLS, self).__init__(endog, exog, missing=missing,
File "/Users/justkiddings/opt/anaconda3/lib/python3.9/site-packages/statsmodels/regression/linear_model.py", line 717, in __init__
super(WLS, self).__init__(endog, exog, missing=missing,
File "/Users/justkiddings/opt/anaconda3/lib/python3.9/site-packages/statsmodels/regression/linear_model.py", line 191, in __init__
super(RegressionModel, self).__init__(endog, exog, **kwargs)
File "/Users/justkiddings/opt/anaconda3/lib/python3.9/site-packages/statsmodels/base/model.py", line 267, in __init__
super().__init__(endog, exog, **kwargs)
File "/Users/justkiddings/opt/anaconda3/lib/python3.9/site-packages/statsmodels/base/model.py", line 92, in __init__
self.data = self._handle_data(endog, exog, missing, hasconst,
File "/Users/justkiddings/opt/anaconda3/lib/python3.9/site-packages/statsmodels/base/model.py", line 132, in _handle_data
data = handle_data(endog, exog, missing, hasconst, **kwargs)
File "/Users/justkiddings/opt/anaconda3/lib/python3.9/site-packages/statsmodels/base/data.py", line 673, in handle_data
return klass(endog, exog=exog, missing=missing, hasconst=hasconst,
File "/Users/justkiddings/opt/anaconda3/lib/python3.9/site-packages/statsmodels/base/data.py", line 86, in __init__
self._handle_constant(hasconst)
File "/Users/justkiddings/opt/anaconda3/lib/python3.9/site-packages/statsmodels/base/data.py", line 132, in _handle_constant
raise MissingDataError('exog contains inf or nans')
MissingDataError: exog contains inf or nans
Some of the Data:
DATE,HOUR,STATION,CO,FSP,NO2,NOX,O3,RSP,SO2
1/1/2022,1,TUEN MUN,75,38,39,40,83,59,2
1/1/2022,2,TUEN MUN,72,35,29,30,90,61,2
1/1/2022,3,TUEN MUN,74,38,28,30,91,66,2
1/1/2022,4,TUEN MUN,76,39,31,32,79,61,2
1/1/2022,5,TUEN MUN,72,38,25,26,83,65,2
1/1/2022,6,TUEN MUN,74,37,24,25,86,60,2
I have removed the N.A. values from my dataset, and they have been converted into blanks (e.g. 3/1/2022,12,TUEN MUN,85,,53,70,59,,5). Why is there a MissingDataError? How can I fix it? Thanks.
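One likely explanation, offered as a sketch rather than a confirmed diagnosis: read_csv turns blank fields into NaN, so the lists built from 'FSP' and 'RSP' still contain missing values. The snippet below uses a few rows copied from the post in an in-memory string (the `raw` and `clean` names are mine) and drops the incomplete rows before building x and y.

```python
import io

import pandas as pd

# A few rows from the post, including the example row with blank FSP and RSP.
raw = (
    "DATE,HOUR,STATION,CO,FSP,NO2,NOX,O3,RSP,SO2\n"
    "1/1/2022,1,TUEN MUN,75,38,39,40,83,59,2\n"
    "1/1/2022,2,TUEN MUN,72,35,29,30,90,61,2\n"
    "3/1/2022,12,TUEN MUN,85,,53,70,59,,5\n"
)
data = pd.read_csv(io.StringIO(raw))

# Blank fields are parsed as NaN; count them per regression column.
print(data[["FSP", "RSP"]].isna().sum())

# Keep only rows where both regression variables are present.
clean = data.dropna(subset=["FSP", "RSP"])
x = clean["FSP"].tolist()
y = clean["RSP"].tolist()
```

The cleaned x and y can then go through sm.add_constant and sm.OLS exactly as in the original script.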

Access dataframe with multi-level index on pandas v1.0.3

I'm trying to access a row of the dataframe dtSortedTable by
dtSortedTable.loc[decisionCountSorted.index[0]]
dtSortedTable is
X0 X1 X2
(D1, G2) A B C
(D2, G1) A A A
(D2, G0) A A C
decisionCountSorted indexes look like:
Index([('D1', 'G2'), ('D2', 'G1'), ('D2', 'G0')], dtype='object')
The indexes of decisionCountSorted are exactly the same as dtSortedTable. The indexes are multilevel with 2 levels. Why am I getting the below error? I need to run some tests on decisionCountSorted and extract the corresponding rows from dtSortedTable. Any help would be hugely appreciated!
Traceback (most recent call last):
File "/usr/local/Caskroom/miniconda/base/envs/logicsim/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'D1'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/tkazi/Documents/Code/logicsim/logic.py", line 183, in <module>
dtFixed = dtConsensus(dtSortedTable,quorumCount)
File "/Users/tkazi/Documents/Code/logicsim/logic.py", line 119, in dtConsensus
print(dtSortedTable.loc[decisionCountSorted.index[0]])
File "/usr/local/Caskroom/miniconda/base/envs/logicsim/lib/python3.8/site-packages/pandas/core/indexing.py", line 1762, in __getitem__
return self._getitem_tuple(key)
File "/usr/local/Caskroom/miniconda/base/envs/logicsim/lib/python3.8/site-packages/pandas/core/indexing.py", line 1272, in _getitem_tuple
return self._getitem_lowerdim(tup)
File "/usr/local/Caskroom/miniconda/base/envs/logicsim/lib/python3.8/site-packages/pandas/core/indexing.py", line 1389, in _getitem_lowerdim
section = self._getitem_axis(key, axis=i)
File "/usr/local/Caskroom/miniconda/base/envs/logicsim/lib/python3.8/site-packages/pandas/core/indexing.py", line 1965, in _getitem_axis
return self._get_label(key, axis=axis)
File "/usr/local/Caskroom/miniconda/base/envs/logicsim/lib/python3.8/site-packages/pandas/core/indexing.py", line 625, in _get_label
return self.obj._xs(label, axis=axis)
File "/usr/local/Caskroom/miniconda/base/envs/logicsim/lib/python3.8/site-packages/pandas/core/generic.py", line 3537, in xs
loc = self.index.get_loc(key)
File "/usr/local/Caskroom/miniconda/base/envs/logicsim/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2648, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'D1'
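A possible reading of this traceback, stated as an assumption: the index shown is a flat object Index holding tuples, not a real MultiIndex, so .loc treats the tuple ('D1', 'G2') as a (row, column) pair and looks up 'D1' on its own. The sketch below rebuilds the small table from the post and converts the index with pd.MultiIndex.from_tuples before selecting the row.

```python
import pandas as pd

keys = [("D1", "G2"), ("D2", "G1"), ("D2", "G0")]

# Rebuild the table from the post with a flat object Index of tuples
# (tupleize_cols=False prevents pandas from creating a MultiIndex here).
dtSortedTable = pd.DataFrame(
    {"X0": ["A", "A", "A"], "X1": ["B", "A", "A"], "X2": ["C", "A", "C"]},
    index=pd.Index(keys, tupleize_cols=False),
)

# Convert the tuples into a real two-level MultiIndex.
dtSortedTable.index = pd.MultiIndex.from_tuples(list(dtSortedTable.index))

# The trailing ":" marks the tuple as a row key rather than (row, column).
row = dtSortedTable.loc[keys[0], :]
print(row)
```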

Python __getitem__ and _getitem_column(key) error when loading data from Excel

I have a dataframe, with 20 different sheets. It ran normally for the first 16 sheets, but on the 17th sheet it raised an error. Here is my code:
A=A.sort_values(by=['timing','id'])
The error was:
Traceback (most recent call last):
File "<ipython-input-24-11bf4f35bb1b>", line 1, in <module>
SessionNumber(5)
File "filepath", line 160
DepthBuyA=DepthBuyA.sort_values(by=['timing','id'])
File "C:\Anaconda\lib\site-packages\pandas\core\frame.py", line 4411, in sort_values
stacklevel=stacklevel)
File "C:\Anaconda\lib\site-packages\pandas\core\generic.py", line 1382, in _get_label_or_level_values
raise KeyError(key)
KeyError: 'id'
So I thought there must be some problem with the column 'id' on that particular sheet, because the other sheets also had 'id' and none of them raised an error like that. So I tried:
print(A['id'])
And it successfully printed the column 'id' for sheet 17, however, right after printing it, it raised this error:
File "C:\Anaconda\lib\site-packages\pandas\core\frame.py", line 2688, in __getitem__
return self._getitem_column(key)
File "C:\Anaconda\lib\site-packages\pandas\core\frame.py", line 2695, in _getitem_column
return self._get_item_cache(key)
File "C:\Anaconda\lib\site-packages\pandas\core\generic.py", line 2489, in _get_item_cache
values = self._data.get(item)
File "C:\Anaconda\lib\site-packages\pandas\core\internals.py", line 4115, in get
loc = self.items.get_loc(item)
File "C:\Anaconda\lib\site-packages\pandas\core\indexes\base.py", line 3080, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\_libs\index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'id'
So after that I tried the code by putting it directly into the console, and now there is no error.
A=A.sort_values(by=['timing','id'])
So what is the problem, and what can I do to fix it?
Thank you!
I used the column index instead of the column name, and it works fine now.
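For anyone hitting the same thing, sorting by position can be sketched as below. The frame is hypothetical: it assumes the header cell on the failing sheet was really "id " with a trailing space, which is only one possible cause and is not confirmed by the post.

```python
import pandas as pd

# Hypothetical sheet whose header cell is "id " (trailing space), one
# possible reason a lookup by the name "id" fails.
A = pd.DataFrame({"timing": [2, 1, 1], "id ": [5, 9, 3]})

# Printing the raw labels makes hidden whitespace visible.
print([repr(c) for c in A.columns])

# Sort by column position (0 and 1 here) instead of by name.
sort_cols = [A.columns[0], A.columns[1]]
A_sorted = A.sort_values(by=sort_cols)

# Alternatively, normalize the labels so the original by-name call works.
A.columns = A.columns.str.strip()
A_by_name = A.sort_values(by=["timing", "id"])
```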

pandas.read_csv gives FileNotFoundError inside a loop

pandas.read_csv is working properly when used as a single statement. But it is giving FileNotFoundError when it is being used inside a loop even though the file exists.
for filename in os.listdir("./Datasets/pollution"):
    print(filename) # To check which file is under processing
    df = pd.read_csv(filename, sep=",").head(1)
The lines above give the following error.
pollutionData184866.csv <----- The name of the file is printed properly.
Traceback (most recent call last):
File "/home/parnab/PycharmProjects/FinalYearProject/locationExtractor.py", line 13, in <module>
df = pd.read_csv(i, sep=",").head(1)
File "/usr/lib/python3.6/site-packages/pandas/io/parsers.py", line 646, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/lib/python3.6/site-packages/pandas/io/parsers.py", line 389, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/usr/lib/python3.6/site-packages/pandas/io/parsers.py", line 730, in __init__
self._make_engine(self.engine)
File "/usr/lib/python3.6/site-packages/pandas/io/parsers.py", line 923, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/usr/lib/python3.6/site-packages/pandas/io/parsers.py", line 1390, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "pandas/parser.pyx", line 373, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4184)
File "pandas/parser.pyx", line 667, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:8449)
FileNotFoundError: File b'pollutionData184866.csv' does not exist
But when I am doing
filename = 'pollutionData184866.csv'
df = pd.read_csv(filename, sep=',')
It is working fine.
What am I doing wrong?
os.listdir("./Datasets/pollution") returns only the file names, without the directory path. Since the CSV files are in "./Datasets/pollution" and NOT in the current directory ".", pd.read_csv(filename) looks for them in the wrong place. Changing the loop to iterate over glob.glob('./Datasets/pollution/*.csv') should work, because glob.glob() returns the matching files/directories with the given path included.
Demo:
In [19]: os.listdir('d:/temp/.data/629509')
Out[19]:
['AAON_data.csv',
'AAON_data.png',
'AAPL_data.csv',
'AAPL_data.png',
'AAP_data.csv',
'AAP_data.png']
In [20]: glob.glob('d:/temp/.data/629509/*.csv')
Out[20]:
['d:/temp/.data/629509\\AAON_data.csv',
'd:/temp/.data/629509\\AAPL_data.csv',
'd:/temp/.data/629509\\AAP_data.csv']
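The same point as a self-contained sketch that can be run anywhere. It builds a temporary folder with made-up file names (nothing here comes from the asker's data) and shows that joining the folder onto each os.listdir() name gives the same paths as glob.glob().

```python
import glob
import os
import tempfile

# Build a throwaway directory with two CSV files and one PNG (names are made up).
with tempfile.TemporaryDirectory() as folder:
    for name in ("a.csv", "b.csv", "c.png"):
        with open(os.path.join(folder, name), "w") as handle:
            handle.write("x,y\n1,2\n")

    # os.listdir() gives bare names, so each one must be joined to the folder.
    joined = sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.endswith(".csv")
    )

    # glob.glob() returns paths that already include the folder.
    globbed = sorted(glob.glob(os.path.join(folder, "*.csv")))

    # Check, while the folder still exists, that the joined paths are real files.
    all_exist = all(os.path.isfile(path) for path in joined)

print(joined == globbed, all_exist)
```

Either list can be looped over and passed straight to pd.read_csv, since every entry is a path that resolves from the current working directory.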