I have a csv file with dates.
import pandas as pd
spam=pd.read_csv('DATA.csv', parse_dates=[0], usecols=[0], header=None)
spam.shape
is (n,1)
How can I access an element the way I do in NumPy? (E.g. with an array where A.shape => (n,1), calling A[5,1] would give me the element on the 5th row in the 1st column.)
NumPy arrays are zero-indexed, so you'll actually need A[4,0] to get the element on the 5th row of the 1st column.
That said, here is how you'd get the same behaviour with a DataFrame.
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(np.random.randn(2,2)) # create a 2 by 2 DataFrame object
>>> df.iloc[1,1]
-1.206712609725652
>>> df
          0         1
0 -0.281467  1.124922
1  0.580617 -1.206713
.iloc selects by integer position only, while .loc selects by label. (Older answers use .ix, which accepted both, but it was deprecated in pandas 0.20 and removed in 1.0, so prefer .iloc or .loc.)
I have a .csv file and I want to convert it to a NumPy array of dtype('float64').
My code:
import pandas as pd
import numpy as np
from pandas import read_csv
df = read_csv('input.csv')
df = df['data']
df.to_numpy()  # produces a NumPy array of object dtype, but I want dtype('float64')
Thanks.
Data sample
0 -3.288733e-08, 1.648743e-09, 2.202711e-08, 2.7...
1 2.345769e-07, 2.054583e-07, 1.610073e-07, 1.14...
2 -1.386798e-07, -8.212822e-08, -4.192486e-08, -...
3 -4.234607e-08, 2.526512e-10, 2.222485e-08, 3.3...
4 3.899913e-08, 5.349818e-08, 5.65899e-08, 5.424...
...
If each row contains the same number of floats, you can use Series.str.split, cast to float, and then convert the DataFrame to a NumPy array:
df=read_csv('input.csv')
arr = df['data'].str.split(', ', expand=True).astype(float).to_numpy()
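A minimal sketch of that approach, with a tiny DataFrame standing in for input.csv (the 'data' column name and the ', ' separator come from the question; the values are made up):

```python
import pandas as pd
import numpy as np

# Hypothetical stand-in for input.csv: each 'data' cell is one
# comma-separated string of floats, as in the sample above.
df = pd.DataFrame({"data": ["-3.2e-08, 1.6e-09, 2.2e-08",
                            "2.3e-07, 2.0e-07, 1.6e-07"]})

# Split each string into columns, cast everything to float at once.
arr = df["data"].str.split(", ", expand=True).astype(float).to_numpy()
print(arr.dtype)   # float64
print(arr.shape)   # (2, 3)
```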
How would I access the individual elements of the DataFrame below?
More specifically, how would I retrieve/extract the string "CN112396173" at index 2 of the DataFrame?
Thanks
A more accurate description of your problem would be: "Getting all first words of a string column in pandas".
You can use data["PN"].str.split(expand=True)[0]. See the docs.
>>> import pandas as pd
>>> df = pd.DataFrame({"column": ["asdf abc cdf"]})
>>> series = df["column"].str.split(expand=True)[0]
>>> series
0 asdf
Name: 0, dtype: object
>>> series.to_list()
["asdf"]
dtype: object is actually normal (in pandas, strings are 'objects').
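If you only need the single string at row 2 rather than the whole column, plain label indexing plus an ordinary split works too. A sketch with made-up data (the "PN" column name and the target value come from the question):

```python
import pandas as pd

# Hypothetical data; the question's DataFrame had a "PN" column
# whose row at index 2 began with "CN112396173".
df = pd.DataFrame({"PN": ["US1234567 foo", "EP7654321 bar", "CN112396173 baz"]})

# .loc picks the one cell; str.split()[0] takes its first word.
first_word = df.loc[2, "PN"].split()[0]
print(first_word)  # CN112396173
```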
I have a simple Pandas data frame with two columns, 'Angle' and 'rff'. I want to get an interpolated 'rff' value based on entering an Angle that falls between two Angle values (i.e. between two index values) in the data frame. For example, I'd like to enter 3.4 for the Angle and then get an interpolated 'rff'. What would be the best way to accomplish that?
import pandas as pd
data = [[1.0,45.0], [2,56], [3,58], [4,62],[5,70]] #Sample data
s= pd.DataFrame(data, columns = ['Angle', 'rff'])
print(s)
s = s.set_index('Angle') #Set 'Angle' as index
print(s)
result = s.at[3.0, "rff"]
print(result)
You may use numpy:
import numpy as np
np.interp(3.4, s.index, s.rff)
#59.6
You could use numpy for this:
import numpy as np
import pandas as pd
data = [[1.0,45.0], [2,56], [3,58], [4,62],[5,70]] #Sample data
s= pd.DataFrame(data, columns = ['Angle', 'rff'])
print(s)
print(np.interp(3.4, s.Angle, s.rff))
59.6
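A pandas-only alternative is also possible: insert the target angle into the index, interpolate by index value, and read the result back. A sketch, assuming the same sample frame:

```python
import pandas as pd

data = [[1.0, 45.0], [2, 56], [3, 58], [4, 62], [5, 70]]
s = pd.DataFrame(data, columns=["Angle", "rff"]).set_index("Angle")

# Add 3.4 to the index (it appears with a NaN rff), then fill the
# gap by linear interpolation against the index values themselves.
rff_interp = s["rff"].reindex(s.index.union([3.4])).interpolate(method="index")
print(rff_interp.at[3.4])  # ~59.6
```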
I have a file that is not easy to search, so I downloaded it in CSV format. It contains 4 columns and 116424 rows.
I'm not able to plot three of its columns, namely Year, Age and Ratio, as a heat map.
The link for the csv file is: https://gist.github.com/JustGlowing/1f3d7ff0bba7f79651b00f754dc85bf1
import numpy as np
import pandas as pd
from pandas import DataFrame
from numpy.random import randn
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('new_file.csv')
print(df.info())
print(df.shape)
couple_columns = df[['Year','Age','Ratio']]
print(couple_columns.head())
Error
C:\Users\Pranav\AppData\Local\Programs\Python\Python36-32\python.exe C:/Users/Pranav/PycharmProjects/takenmind/Data_Visualization/a1.py
RangeIndex: 116424 entries, 0 to 116423
Data columns (total 4 columns):
AREA     116424 non-null object
YEAR     116424 non-null int64
AGE      116424 non-null object
RATIO    116424 non-null object
dtypes: int64(1), object(3)
memory usage: 2.2+ MB
None
(116424, 4)
Traceback (most recent call last):
  File "C:/Users/Pranav/PycharmProjects/takenmind/Data_Visualization/a1.py", line 12, in <module>
    couple_columns = df[['Year','Age','Ratio']]
  File "C:\Users\Pranav\AppData\Roaming\Python\Python36\site-packages\pandas\core\frame.py", line 2682, in __getitem__
    return self._getitem_array(key)
  File "C:\Users\Pranav\AppData\Roaming\Python\Python36\site-packages\pandas\core\frame.py", line 2726, in _getitem_array
    indexer = self.loc._convert_to_indexer(key, axis=1)
  File "C:\Users\Pranav\AppData\Roaming\Python\Python36\site-packages\pandas\core\indexing.py", line 1327, in _convert_to_indexer
    .format(mask=objarr[mask]))
KeyError: "['Year' 'Age' 'Ratio'] not in index"
It seems that your columns are uppercase from the info output: YEAR 116424 non-null int64. You should be able to get e.g. the year column with df[['YEAR']].
If you would rather use lowercase, you can use
df = pd.read_csv('new_file.csv').rename(columns=str.lower)
The csv has some text in the top 8 lines before your actual data begins. You can skip those by using the skiprows argument:
df = pd.read_csv('f2m_ratios.csv', skiprows=8)
Let's say you want to plot a heatmap only for one Area:
df = df[df['Area'] == 'Afghanistan']
Before you plot a heatmap, you need the data in a certain format (a pivot table):
df = df.pivot(index='Year', columns='Age', values='Ratio')  # keyword arguments required in pandas >= 2.0
Now your dataframe is ready for a heatmap
sns.heatmap(df)
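The pivot step can be sketched with a tiny synthetic frame (the column names follow the answer above; the values are made up):

```python
import pandas as pd

# Hypothetical long-format data: one Ratio per (Year, Age) pair.
df = pd.DataFrame({
    "Year":  [1950, 1950, 1951, 1951],
    "Age":   [0, 1, 0, 1],
    "Ratio": [1.05, 1.04, 1.06, 1.03],
})

# Wide format: Year down the rows, Age across the columns --
# exactly the matrix shape sns.heatmap() expects.
wide = df.pivot(index="Year", columns="Age", values="Ratio")
print(wide.shape)  # (2, 2)
```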
Lots of information exists on how to read a csv into a pandas DataFrame, but what I have is a PyTables table, and I want a pandas DataFrame.
I've found how to store my pandas DataFrame as a PyTables table... now I want to read it back. At that point it has:
"kind = v._v_attrs.pandas_type"
I could write it out as csv and re-read it, but that seems silly. It is what I am doing for now.
How should I be reading pytable objects into pandas?
import tables as pt
import pandas as pd
import numpy as np
# the content is junk but we don't care
grades = np.empty(10, dtype=[('name', 'S20'), ('grade', 'u2')])
# write to a PyTables table (open_file/create_table replaced the
# camelCase openFile/createTable in PyTables 3.x)
handle = pt.open_file('/tmp/test_pandas.h5', 'w')
handle.create_table('/', 'grades', grades)
print(handle.root.grades[:].dtype)  # it is a structured array
# load back as a DataFrame and check types
df = pd.DataFrame.from_records(handle.root.grades[:])
print(df.dtypes)
handle.close()
Beware that the strings come back as objects (byte strings), because pandas stores strings in object columns; depending on the pandas version, the u2 (unsigned 2-byte integer) may also be upcast to a wider integer type.
The docs now include an excellent section on using the HDF5 store and there are some more advanced strategies discussed in the cookbook.
It's now relatively straightforward (the examples below assume `from pandas import DataFrame, HDFStore`):
In [1]: store = HDFStore('store.h5')
In [2]: print(store)
<class 'pandas.io.pytables.HDFStore'>
File path: store.h5
Empty
In [3]: df = DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
In [4]: store['df'] = df
In [5]: store
<class 'pandas.io.pytables.HDFStore'>
File path: store.h5
/df frame (shape->[2,2])
And to retrieve from HDF5/pytables:
In [6]: store['df'] # store.get('df') is equivalent
Out[6]:
A B
0 1 2
1 3 4
You can also query within a table.
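For where-queries, the frame has to be stored in 'table' format with the relevant columns declared as data columns; a minimal sketch (the file name is arbitrary):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 3, 5], "B": [2, 4, 6]})

# format='table' (not the default 'fixed') plus data_columns=True is
# what makes columns queryable with a where clause.
with pd.HDFStore("store_query.h5", mode="w") as store:
    store.put("df", df, format="table", data_columns=True)
    subset = store.select("df", where="A > 2")

print(subset["A"].tolist())  # [3, 5]
```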