Pandas groupby. Grouping covid19 cases by continents - pandas

File
https://www.dropbox.com/sh/cx9kasx83qmsi33/AABfOzVgzBuQe2ORU_t65J4Ta?dl=0
What I have done.
Here is the code I used
import pandas as pd
Read the csv file and set the file as 'covid'
covid.groupby('continent').TotalCases()
It generates:
KeyError: 'continent'
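A KeyError from groupby usually means that no column has exactly that name; capitalization and stray spaces both matter. The file itself is not reproduced here, so the sketch below uses a small made-up frame: the column names `Continent` and `TotalCases` and all figures are assumptions for illustration only. It also assumes a sum per continent is what is wanted, since `TotalCases` is a column to select rather than a method to call.

```python
import pandas as pd

# Hypothetical stand-in for the real file; names and numbers are invented.
covid = pd.DataFrame({
    "Continent": ["Asia", "Asia", "Europe"],
    "TotalCases": [100, 50, 70],
})

# Inspect the actual column names before grouping.
print(covid.columns.tolist())

# Normalize the names so lookups do not depend on case or padding.
covid.columns = covid.columns.str.strip().str.lower()

# Select the column with [...] and then aggregate it.
totals = covid.groupby("continent")["totalcases"].sum()
print(totals)
```

Printing `covid.columns` on the real data should show which spelling the grouping key needs.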

Related

Generate graphs for all the columns in a excel file with Pandas in Google Colab

I'm trying to generate graphs for all the columns of an Excel file, but I'm getting an error. My goal is to save these graphs as png files and then download them. Some context: I read the file and, for each column, use a for loop to call .value_counts() and create a graph. Once the graph is generated, I want to save it in a png file named after the index. My code is this:
import pandas as pd
from google.colab import files
from matplotlib import pyplot as plt
df = pd.read_excel('columns.xlsx')
for i in df.columns:
    print(i)
    #i.value_counts().plot(kind="bar", figsize=(15,7), color="#61d199")
    df[i].value_counts().plot(kind="bar", figsize=(15,7), color="#61d199")
    #plt.savefig('viz_movies.png')
Error in this code line:
df[i].value_counts().plot(kind="bar", figsize=(15,7), color="#61d199")
Error:
index 0 is out of bounds for axis 0 with size 0
I also want to set the file names inside the for loop, something like this:
for index, number in enumerate(numbers):
    plt.savefig('index.png') #use the index as a name
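A minimal sketch of the loop described above, with the index used as the file name. It assumes (this is only a guess) that the "size 0" error comes from a column whose value_counts() is empty, for example a column with no non-null values, so such columns are skipped. The sample frame, the temporary output directory and the Agg backend are assumptions made so the snippet runs anywhere.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs headless
from matplotlib import pyplot as plt
import pandas as pd

# Hypothetical data standing in for columns.xlsx.
df = pd.DataFrame({"genre": ["a", "b", "a"], "empty": [None, None, None]})

out_dir = tempfile.mkdtemp()
saved = []
for index, column in enumerate(df.columns):
    counts = df[column].value_counts()
    if counts.empty:
        # Nothing to draw for this column, so skip it.
        continue
    fig, ax = plt.subplots(figsize=(15, 7))
    counts.plot(kind="bar", ax=ax, color="#61d199")
    path = os.path.join(out_dir, f"{index}.png")  # index becomes the file name
    fig.savefig(path)
    plt.close(fig)  # one fresh figure per column
    saved.append(path)

print(saved)
```

In Colab, each path in `saved` could then be handed to `files.download`.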

pandas read_csv creating incorrect columns

I am using pandas to read a csv file
import pandas as pd
data = pd.read_csv('file_name.csv', "ISO-8859-1")
Op:
col1,col2,col3
"sample1","sample2","sample3"
but the DataFrame ends up with just 1 column instead of the 5 it is supposed to have. I checked the csv and it's fine.
Any suggestions on why this could be happening will be useful.
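One thing that may be worth checking, as a guess based only on the call shown: the second positional parameter of read_csv has historically been the separator (`sep`), not the encoding, so "ISO-8859-1" could be ending up as the delimiter. A minimal sketch with the sample rows held in memory (the in-memory buffer is a stand-in for file_name.csv):

```python
import io

import pandas as pd

# The two sample lines from the question, encoded as the file would be.
raw = 'col1,col2,col3\n"sample1","sample2","sample3"\n'.encode("ISO-8859-1")

# Encoding string used as the separator: no line contains it,
# so every line stays a single field.
wrong = pd.read_csv(io.BytesIO(raw), sep="ISO-8859-1", engine="python")

# Encoding passed by keyword: the default comma separator still applies.
right = pd.read_csv(io.BytesIO(raw), encoding="ISO-8859-1")

print(wrong.shape, right.shape)
```

If this is the cause, passing `encoding="ISO-8859-1"` by keyword in the original call should restore the expected columns.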

How to convert PDF to excel using tabula-py into dataframe of several tables?

I have a PDF file that contains several tables. For example:
Table from PDF File
I learned that I have to use tabula-py, which relies on Java (note: I'm working in a Jupyter Notebook).
So I wrote this:
import pandas as pd
import numpy as np
import tabula
from tabula import read_pdf
pdf_path = r"..\PDFs\pobreza2.pdf" # file path (raw string keeps the backslashes literal)
df=tabula.read_pdf(pdf_path, pages="all", stream=True, guess=False, multiple_tables=True) #PDF have many pages with several tables
And I get this:
Output of the code
The result looks like a list, not a DataFrame.
So, how can I get this table into a DataFrame? The tables contain string and float values.
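With multiple_tables=True, read_pdf returns a list with one DataFrame per detected table, so each element is already a DataFrame. The sketch below uses a hand-built list as a stand-in for that return value (the column names and numbers are made up) and shows picking one table or stacking them. Stacking assumes the tables share the same columns, which may not hold for the real PDF.

```python
import pandas as pd

# Stand-in for the list that read_pdf returns; real tables would come from the PDF.
tables = [
    pd.DataFrame({"region": ["A", "B"], "rate": [1.5, 2.5]}),
    pd.DataFrame({"region": ["C"], "rate": [3.5]}),
]

# A single table is just a list element.
first = tables[0]

# Tables with matching columns can be stacked into one DataFrame.
combined = pd.concat(tables, ignore_index=True)

print(type(first).__name__, combined.shape)
```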

Reading CSV file and manipulating the components (a newbie question)

So, I have just started learning Python. I am trying to read a .csv file (https://www.dropbox.com/s/fp1g32uv2cljd1n/adcpDat.csv?dl=0)
in Python.
I can read the file in, but when I try to select one of the components I get an error (Traceback (most recent call last)).
import os
import csv
import pandas as pd
import numpy as np
os.chdir("/Users/K1/Documents/Work/UGA/Cruise/GC600-MP/Data/ADCP/")
print("Current Working Directory ", os.getcwd())
adcpDat = pd.read_csv("adcpDat.csv")
print(adcpDat.shape)
output is
Current Working Directory /Users/K1/Documents/Work/UGA/Cruise/GC600-MP/Data/ADCP
(805945, 1)
but when I run for example,
adcpDat[3]
it just returns an error.
How can I pick the components?
You first have to specify the column name, then the row number:
adcpDat['column_name'][3]
In the case of your csv file it would be:
adcpDat['tADCP'][3]
This is because the first line of the csv file specifies the column name, which is tADCP.
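A small sketch of the same idea. Plain `adcpDat[3]` fails because square brackets on a DataFrame look up column labels, and there is no column labeled 3. The values below are made up; only the column name tADCP comes from the file.

```python
import io

import pandas as pd

# Made-up values under the real header name.
csv_text = "tADCP\n10.5\n11.0\n11.5\n12.0\n"
adcpDat = pd.read_csv(io.StringIO(csv_text))

# List the available column names first.
print(adcpDat.columns.tolist())

by_label = adcpDat["tADCP"][3]    # column, then row label
by_loc = adcpDat.loc[3, "tADCP"]  # row label and column name in one call
by_position = adcpDat.iloc[3, 0]  # purely positional: fourth row, first column

print(by_label, by_loc, by_position)
```

`.loc` and `.iloc` avoid the chained lookup and are often the safer choice when assigning values.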

Beckhoff TwinCat Scope CSV Format into pandas dataframe

After recording data in Beckhoff TwinCAT Scope, one can export this data to a CSV file. Said CSV file, however, has a rather complicated format. Can anyone suggest the most effective way to import such a file into a pandas DataFrame so I can perform analysis?
An example of the format can be found here:
https://infosys.beckhoff.com/english.php?content=../content/1033/tcscope2/html/TwinCATScopeView2_Tutorial_SaveExport.htm&id=
No need to write a custom parser. Using the example data scope_data.csv:
Name,fasd,,,,
File,C;\,,,,
Start,dfsd,,,,
,,,,,
,,,,,
Name,Peak,Name,PULS1,Name,SINUS_FAST
Net id,123.123.123,Net id,123.123.124,Net Id,123.123.125
Port,801,Port,801,Port,801
,,,,,
0,0.6113936598,0,0.07994111349,0,0.08425652468
0,0.524852539,0,0.2051963401,0,0.4391185847
0,0.4993723482,0,0.2917317117,0,0.4583736263
0,0.5976553194,0,0.8675482865,0,0.8435987898
0,0.06087224998,0,0.7933980583,0,0.5614294705
0,0.1967968423,0,0.3923966599,0,0.1951608414
0,0.9723649064,0,0.5187276782,0,0.7646786192
You can import as follows:
import pandas as pd
scope_data = pd.read_csv(
    "scope_data.csv",
    skiprows=[*range(5), *range(6, 9)],
    usecols=[*range(1, 6, 2)]
)
Then you get
>>> scope_data.head()
       Peak     PULS1  SINUS_FAST
0  0.611394  0.079941    0.084257
1  0.524853  0.205196    0.439119
2  0.499372  0.291732    0.458374
3  0.597655  0.867548    0.843599
4  0.060872  0.793398    0.561429
I don't have the original scope csv, but a little adjustment of skiprows and usecols should give you the desired result.
To read the bulk of the file (ignoring the header material) use the skiprows keyword argument to read_csv:
import pandas as pd
df = pd.read_csv('data.csv', skiprows=18)
For the header material, I think you'd have to write a custom parser.
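A rough sketch of what such a custom parser could look like. It assumes the per-channel header is laid out as key/value column pairs, as in the example rows shown earlier (Name, Net id, Port). A real export may differ, and `parse_channel_header` is a made-up helper name.

```python
import csv
import io

# Header rows copied from the example data; a real file may contain more.
sample = """Name,Peak,Name,PULS1,Name,SINUS_FAST
Net id,123.123.123,Net id,123.123.124,Net Id,123.123.125
Port,801,Port,801,Port,801
"""

def parse_channel_header(lines):
    """Collect key/value pairs stored in column pairs, one dict per channel."""
    channels = []
    for row in csv.reader(lines):
        # Even columns hold keys, odd columns hold the matching values.
        pairs = list(zip(row[0::2], row[1::2]))
        while len(channels) < len(pairs):
            channels.append({})
        for channel, (key, value) in zip(channels, pairs):
            if key:  # ignore empty cells
                channel[key.strip().lower()] = value
    return channels

channels = parse_channel_header(io.StringIO(sample))
print(channels)
```

Keys are lower-cased because the sample mixes "Net id" and "Net Id".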