Reading CSV file and manipulating the components (a newbie question) - pandas

I have just started learning Python. I am trying to read a .csv file (https://www.dropbox.com/s/fp1g32uv2cljd1n/adcpDat.csv?dl=0)
in Python.
I can read the file, but when I try to select one of its elements I get a Traceback (most recent call last) error.
import os
import csv
import pandas as pd
import numpy as np
os.chdir("/Users/K1/Documents/Work/UGA/Cruise/GC600-MP/Data/ADCP/")
print("Current Working Directory ", os.getcwd())
adcpDat = pd.read_csv("adcpDat.csv")
print(adcpDat.shape)
output is
Current Working Directory /Users/K1/Documents/Work/UGA/Cruise/GC600-MP/Data/ADCP
(805945, 1)
but when I run for example,
adcpDat[3]
it just returns an error.
How can I pick the components?

You have to specify the column name first, then the row label:
adcpDat['colname'][3]
For your csv file that would be:
adcpDat['tADCP'][3]
This is because the first line of the csv file is read as the header, so it supplies the column name, which is tADCP. adcpDat[3] fails because pandas looks for a column labelled 3, and there is none.
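A minimal sketch of the lookup, using a tiny in-memory stand-in for the file. The values are made up; only the column name tADCP comes from the question. It also shows position-based selection with iloc, which may be closer to what "picking the components" means here.

```python
import io
import pandas as pd

# Tiny stand-in for adcpDat.csv: a header line followed by one value per row.
# The numbers are invented for illustration only.
csv_text = "tADCP\n10.5\n11.0\n11.5\n12.0\n12.5\n"
adcp = pd.read_csv(io.StringIO(csv_text))

print(adcp.shape)          # (rows, columns)
print(list(adcp.columns))  # column labels taken from the header line

# adcp[3] would raise KeyError because 3 is not a column label.
fourth_by_label = adcp["tADCP"][3]    # column first, then row label
fourth_by_position = adcp.iloc[3, 0]  # row position 3, column position 0
first_rows = adcp["tADCP"].iloc[:3]   # first three rows of the column
```

Printing adcpDat.columns on the real file is a quick way to confirm the exact column name before indexing.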

Related

Can't import my csv file in Jupyter notebook

I have put my CSV file in the same folder the Jupyter notebook runs from, but I still can't import it.
You need to read it into a DataFrame first:
df = pd.read_csv('name.csv') # (the file name of your csv)
df

import pandas as pd NameError: name 'null' not defined on jupyter notebook

Hello, I'm currently taking a data analyst bootcamp course on Udemy and I'm using Jupyter notebook with Python version 3.9. I'm learning how to use the pandas library. I installed it on my computer and even upgraded it to version 1.1.4. When I run
import pandas as pd
and execute the cell I get this error message
NameError Traceback (most recent call last)
<ipython-input-1-7dd3504c366f> in <module>
----> 1 import pandas as pd
~\pandas.py in <module>
25 {
26 "cell_type": "code",
---> 27 "execution_count": null,
28 "metadata": {},
29 "outputs": [],
NameError: name 'null' is not defined
I tried restarting the kernel and also restart and clear output but it's still giving me this error.
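You may have a local file named pandas.py. The traceback shows ~\pandas.py, and its contents ("cell_type", "execution_count": null) are notebook JSON rather than Python, which is why null is undefined. Delete or rename the local pandas.py and rerun the cell. That should resolve the error.
The import statement is importing your local file instead of the pandas library.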
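To confirm which file an import would pick up, a small check along these lines may help. module_origin is a hypothetical helper name, not part of pandas or the original answer; it relies only on the standard library's importlib.util.find_spec.

```python
import importlib.util

def module_origin(name):
    """Return the file path Python would load for `name`, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin

# If this prints a path ending in pandas.py inside your own folder,
# a local file is shadowing the installed library.
print(module_origin("pandas"))
```

A path inside site-packages suggests the installed library is being found; a path in the notebook's own directory points to the shadowing file.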

Pandas - xls to xlsx converter

I want Python to take ANY .xls file from a given location and save it as .xlsx with the original file name. How can I do that, so that any time I paste a file into the location it gets converted to .xlsx with the original file name?
import pandas as pd
import os
for filename in os.listdir('./'):
    if filename.endswith('.xls'):
        df = pd.read_excel(filename)
        df.to_excel(??)
Your code seems to be perfectly fine. In case you are only missing the correct way to write it with the given name, here you go.
import pandas as pd
import os
for filename in os.listdir('./'):
    if filename.endswith('.xls'):
        df = pd.read_excel(filename)
        df.to_excel(f"{os.path.splitext(filename)[0]}.xlsx")
A possible extension to convert any file that gets pasted inside the folder can be implemented with an infinite loop, for instance:
import pandas as pd
import os
import time
while True:
    files = os.listdir('./')
    for filename in files:
        out_name = f"{os.path.splitext(filename)[0]}.xlsx"
        if filename.endswith('.xls') and out_name not in files:
            df = pd.read_excel(filename)
            df.to_excel(out_name)
    time.sleep(10)

Pandas groupby. Grouping covid19 cases by continents

File
https://www.dropbox.com/sh/cx9kasx83qmsi33/AABfOzVgzBuQe2ORU_t65J4Ta?dl=0
What I have done.
Here is the code I used
import pandas as pd
I read the csv file into a DataFrame named covid. Then
covid.groupby('continent').TotalCases()
generates
KeyError: 'continent'
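A KeyError from groupby means no column carries exactly that label, so one plausible cause is a difference in capitalization or stray whitespace in the header. The sketch below assumes that cause and uses invented data; the label " Continent " is hypothetical and the real file's column names may differ. It also selects the column with brackets and aggregates with sum(), since TotalCases() is not a method.

```python
import pandas as pd

# Made-up numbers; the padded, capitalized label is an assumption.
covid = pd.DataFrame({
    " Continent ": ["Europe", "Asia", "Europe", "Asia"],
    "TotalCases": [100, 200, 50, 25],
})

print(list(covid.columns))  # inspect the exact labels first

# Normalize labels: strip surrounding whitespace and lowercase them.
covid.columns = covid.columns.str.strip().str.lower()

# Select the column with brackets, then aggregate per group.
totals = covid.groupby("continent")["totalcases"].sum()
print(totals)
```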

Python and netCDF script do not operate anymore

I am using a Python 2.6 script that I have been using for quite a while, and I now get an error that shouldn't be there. The script is run from the directory where the netCDF file is located. Here is the code:
from numpy import *
import numpy as numpy
from netCDF4 import Dataset
import datetime as DT
from time import strftime
import os
floc ='/media/USB-HDD/NCEP_NCAP data/data_2010/' #location of directory that the file resides
fname ='cfsr_Scotland_2010' # name of the netCDF file
in_ext = '.nc' # file extension of the netCDF file
basetime = DT.datetime(2010,01,01,0,0,0) # Initial time (start) for the netCDF
ncfile = Dataset(floc+fname+in_ext,'r') # netCDF assigned name
time = ncfile.variables['time']
lon = ncfile.variables['lon']
lat = ncfile.variables['lat']
uwind = ncfile.variables['10u']
vwind = ncfile.variables['10v']
ht = ncfile.variables['height']
I get the error on the ncfile assignment, which is odd because I checked the way it's written:
Traceback (most recent call last):
File "CFSR2WIND.py", line 24, in <module>
ncfile = Dataset(floc+fname+in_ext,'r') # netCDF assigned name
File "netCDF4.pyx", line 1317, in netCDF4.Dataset.__init__ (netCDF4.c:14608)
RuntimeError: No such file or directory
Does anybody know what caused this and how it can be solved?
thank you
george
Try using the netcdf module from scipy instead:
from scipy.io.netcdf import netcdf_file as Dataset
A couple of other suggestions:
Importing numpy. You're importing it twice, and importing every name with * is risky because those names can shadow built-ins and your own variables. By convention, most people abbreviate numpy as np and load it with import numpy as np. Then you can call numpy functions as np.mean(), for example.
Concatenating the path, filename, and file extension. It's OK to use string concatenation with the + sign, but another way to do this is the str.join method. The full filename would then be something like filename = ''.join([floc, fname, in_ext]).
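Since the error is "No such file or directory", it may also be worth checking the assembled path before opening it. This is a sketch only: it swaps in os.path.join in place of the ''.join suggested above, and build_path is a hypothetical helper name. The directory and file names are the ones from the question.

```python
import os

def build_path(directory, name, extension):
    """Join a directory, base name, and extension into one path."""
    return os.path.join(directory, name + extension)

floc = '/media/USB-HDD/NCEP_NCAP data/data_2010/'
fname = 'cfsr_Scotland_2010'
in_ext = '.nc'

path = build_path(floc, fname, in_ext)

# Report a missing file before handing the path to Dataset().
if not os.path.isfile(path):
    print("File not found: %s" % path)
    print("Current working directory: %s" % os.getcwd())
```

If the message appears, the drive may not be mounted at that location or the directory name may have changed, which would explain a script that used to work.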