Convert IEX Finance API data to a pandas DataFrame

I want to pull data from the IEX Finance API and put it into a pandas DataFrame, but I don't know the correct code. Can someone help?
URL for the API call:
https://api.iextrading.com/1.0/stock/aapl/chart/1d?chartInterval=5
I tried the code below, but it doesn't work:
import pandas as pd
api_call = 'https://api.iextrading.com/1.0/stock/aapl/chart/1d?chartInterval=5'
price = pd.read_csv(api_call)

The data is in JSON format, not CSV. To load it into a DataFrame, call the read_json function:
import pandas as pd
df = pd.read_json("https://api.iextrading.com/1.0/stock/aapl/chart/1d?chartInterval=5")
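If you want more control, for example to check the HTTP status or inspect the payload first, you can fetch the JSON yourself and build the DataFrame from the list of records. A minimal sketch, assuming the endpoint returns a JSON array of bar objects; note that the free iextrading.com endpoints have since been retired in favor of IEX Cloud, so this URL may no longer respond:
import pandas as pd
import requests

url = "https://api.iextrading.com/1.0/stock/aapl/chart/1d?chartInterval=5"
resp = requests.get(url)
resp.raise_for_status()          # fail loudly on HTTP errors
df = pd.DataFrame(resp.json())   # one row per 5-minute bar record
print(df.head())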

Related

Is there a way to speed up the conversion of spark dataframe to pandas dataframe?

I tried to convert a Spark DataFrame to pandas in a Databricks notebook with PySpark. It takes forever to run. Is there a better way to do this? There are more than 600,000 rows.
df_PD = sparkDF.toPandas()
Can you try the pandas API on Spark (Spark 3.2+) instead? Rather than collecting everything to the driver with toPandas(), convert to a pandas-on-Spark DataFrame, which keeps the data distributed but supports pandas syntax:
df_PD = sparkDF.pandas_api()
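If you genuinely need a local pandas DataFrame on the driver, enabling Arrow-based conversion usually makes toPandas() much faster. A sketch, assuming pyarrow is installed on the cluster (the config key below is the Spark 3.x name; sparkDF is the Spark DataFrame from the question):
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Arrow transfers columns in bulk instead of pickling row by row
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
df_PD = sparkDF.toPandas()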

Google cloud blob: XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1

I want to import several XML files from a bucket on GCS and then parse them into a pandas DataFrame. I found the pandas.read_xml function to do this, which is great. Unfortunately, I keep getting the error:
XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1
I checked the xml files and they look fine.
This is the code:
from google.cloud import storage
import pandas as pd
#importing the data
client = storage.Client()
bucket = client.get_bucket('bucketname')
df = pd.DataFrame()
#parsing the data into pandas df
for blob in bucket.list_blobs():
    print(blob)
    split = str(blob.name).split("/")
    country = split[0]
    data = pd.read_xml(blob.open(mode='rt', encoding='iso-8859-1', errors='ignore'), compression='gzip')
    df["country"] = country
    print(country)
    df.append(data)
When I print out the blob, it gives me:
<Blob: textkernel, DE/daily/2020/2020-12-19/jobs.0.xml.gz, 1612169959288959>
Maybe it has something to do with the pandas function trying to read the filename and not the content? Does someone have an idea about why this could be happening?
Thank you!
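One likely cause, going by the blob name ending in .xml.gz: opening the blob with mode='rt' decodes the raw gzip bytes as text, so read_xml never sees a < at position one. Below is a sketch of a possible fix, assuming the blobs really are gzip-compressed, Latin-1-encoded XML; the compression argument is dropped because the stream is decompressed explicitly. It also collects the parsed frames and concatenates once at the end, since DataFrame.append returns a new frame rather than modifying df in place:
import gzip
import pandas as pd
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('bucketname')

frames = []
for blob in bucket.list_blobs():
    country = str(blob.name).split("/")[0]
    with blob.open(mode="rb") as raw:  # binary, not text
        with gzip.open(raw, mode="rt", encoding="iso-8859-1") as xml_file:
            data = pd.read_xml(xml_file)
    data["country"] = country
    frames.append(data)

df = pd.concat(frames, ignore_index=True)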

reading csv file into python pandas

I want to read a CSV file into a pandas DataFrame, but I get an error when executing the code below:
filepath = "https://drive.google.com/file/d/1bUTjF-iM4WW7g_Iii62Zx56XNTkF2-I1/view"
df = pd.read_csv(filepath)
df.head(5)
To retrieve data from Google Drive, you first need to extract the file id from the sharing URL:
import pandas as pd
url='https://drive.google.com/file/d/0B6GhBwm5vaB2ekdlZW5WZnppb28/view?usp=sharing'
file_id=url.split('/')[-2]
dwn_url='https://drive.google.com/uc?id=' + file_id
df = pd.read_csv(dwn_url)
print(df.head())
Try the following code snippet to read the CSV from Google Drive into a pandas DataFrame:
import pandas as pd
url = "https://drive.google.com/uc?id=1bUTjF-iM4WW7g_Iii62Zx56XNTkF2-I1"
df = pd.read_csv(url)
df.head(5)
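Both answers do the same thing: swap the /file/d/<id>/view sharing link for the uc?id=<id> direct-download endpoint. A small hypothetical helper makes that reusable (a sketch, assuming the standard sharing-link shape):
import pandas as pd

def drive_view_to_direct(url):
    """Turn a Drive '/file/d/<id>/view' link into a direct-download URL."""
    file_id = url.split("/")[-2]  # path segment just before 'view'
    return "https://drive.google.com/uc?id=" + file_id

df = pd.read_csv(drive_view_to_direct(
    "https://drive.google.com/file/d/1bUTjF-iM4WW7g_Iii62Zx56XNTkF2-I1/view"))
print(df.head())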

create a dask dataframe from a dictionary

I have a dictionary like this:
d = {'Caps': 'cap_list', 'Term': 'unique_tokens', 'LocalFreq': 'local_freq_list','CorpusFreq': 'corpus_freq_list'}
I want to create a Dask DataFrame from it. How do I do it? Normally, in pandas, it can easily be loaded into a DataFrame by:
df = pd.DataFrame({'Caps': cap_list, 'Term': unique_tokens,
                   'LocalFreq': local_freq_list, 'CorpusFreq': corpus_freq_list})
Should I first load into a bag and then convert from bag to ddf?
If your data fits in memory, then I encourage you to use pandas instead of a Dask DataFrame.
If for some reason you still want to use a Dask DataFrame, then I would convert things to a pandas DataFrame first and then use the dask.dataframe.from_pandas function:
import dask.dataframe as dd
import pandas as pd
df = pd.DataFrame(...)
ddf = dd.from_pandas(df, npartitions=20)
But there are many cases where this will be slower than just using Pandas well.
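Put together with the dictionary from the question, the whole round trip might look like this (a sketch with hypothetical toy lists standing in for the real ones, which aren't shown in the question):
import dask.dataframe as dd
import pandas as pd

# hypothetical stand-ins for the lists referenced in the question
cap_list = ['A', 'B', 'C']
unique_tokens = ['alpha', 'beta', 'gamma']
local_freq_list = [3, 1, 2]
corpus_freq_list = [30, 10, 20]

df = pd.DataFrame({'Caps': cap_list, 'Term': unique_tokens,
                   'LocalFreq': local_freq_list, 'CorpusFreq': corpus_freq_list})
ddf = dd.from_pandas(df, npartitions=2)  # tiny data, so few partitions
print(ddf.head())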

Beckhoff TwinCat Scope CSV Format into pandas dataframe

After recording data in Beckhoff TwinCAT Scope, one can export this data to a CSV file. Said CSV file, however, has a rather complicated format. Can anyone suggest the most effective way to import such a file into a pandas DataFrame so I can perform analysis?
An example of the format can be found here:
https://infosys.beckhoff.com/english.php?content=../content/1033/tcscope2/html/TwinCATScopeView2_Tutorial_SaveExport.htm&id=
No need to write a custom parser. Using the example data scope_data.csv:
Name,fasd,,,,
File,C;\,,,,
Start,dfsd,,,,
,,,,,
,,,,,
Name,Peak,Name,PULS1,Name,SINUS_FAST
Net id,123.123.123,Net id,123.123.124,Net Id,123.123.125
Port,801,Port,801,Port,801
,,,,,
0,0.6113936598,0,0.07994111349,0,0.08425652468
0,0.524852539,0,0.2051963401,0,0.4391185847
0,0.4993723482,0,0.2917317117,0,0.4583736263
0,0.5976553194,0,0.8675482865,0,0.8435987898
0,0.06087224998,0,0.7933980583,0,0.5614294705
0,0.1967968423,0,0.3923966599,0,0.1951608414
0,0.9723649064,0,0.5187276782,0,0.7646786192
You can import as follows:
import pandas as pd
scope_data = pd.read_csv(
    "scope_data.csv",
    skiprows=[*range(5), *range(6, 9)],
    usecols=[*range(1, 6, 2)],
)
Then you get:
>>> scope_data.head()
       Peak     PULS1  SINUS_FAST
0  0.611394  0.079941    0.084257
1  0.524853  0.205196    0.439119
2  0.499372  0.291732    0.458374
3  0.597655  0.867548    0.843599
4  0.060872  0.793398    0.561429
I don't have the original scope CSV, but a little adjustment of skiprows and usecols should give you the desired result.
To read the bulk of the file (ignoring the header material), use the skiprows keyword argument to read_csv:
import pandas as pd
df = pd.read_csv('data.csv', skiprows=18)
For the header material, I think you'd have to write a custom parser.
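If the channel metadata from the header is also needed, a hand-rolled parse is short. A sketch against the scope_data.csv sample above, where row 5 holds the Name pairs, row 6 the Net ids, and row 7 the Ports (the row offsets will differ for a real TwinCAT export):
import csv

with open("scope_data.csv", newline="") as f:
    rows = list(csv.reader(f))

# channel values sit in the odd-numbered columns of the header rows
channels = []
for i in range(1, len(rows[5]), 2):
    channels.append({
        "name": rows[5][i],
        "net_id": rows[6][i],
        "port": rows[7][i],
    })
print(channels)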