Pandas read_csv failing on gzipped file with OSError: Not a gzipped file (b'NU')

I used the code shown below to load a csv.gz file, but I got this error:
OSError: Not a gzipped file (b'NU')
How can I solve it?
Code:
import pandas as pd
data = pd.read_csv('climat.202010.csv.gz', compression='gzip')
print(data)
Or:
import gzip
import pandas as pd
filename = 'climat.202010.csv.gz'
with gzip.open(filename, 'rb') as f:
    data = pd.read_csv(f)

Try
import gzip
with gzip.open(filename, 'rb') as fio:
    df = pd.read_csv(fio)

This works for me:
import gzip
import pandas as pd
with gzip.open(r'C:\Users\MyUser\OneDrive - Company\Data\Wiser\Files\WiserWeeklyReport.csv.gz') as f:
    wiser_report = pd.read_csv(f)
wiser_report.head()
If you're still getting an error, it may be the file itself or the file name. Have you tried removing the extra period in the file name?
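The error message is the clue here: real gzip files start with the magic bytes b'\x1f\x8b', and pandas is reporting that yours starts with b'NU', so whatever the .gz extension says, the file is not actually gzip-compressed. A quick diagnostic sketch (the file name is the one from the question):
with open('climat.202010.csv.gz', 'rb') as f:
    magic = f.read(2)
if magic == b'\x1f\x8b':
    print('looks like a real gzip file')
else:
    # e.g. b'NU': the file is not gzip-compressed despite the .gz suffix
    print(f'not gzip, starts with {magic!r}')
If the check fails, the file was probably saved uncompressed or renamed from another format; try pd.read_csv('climat.202010.csv.gz', compression=None), or fix the file at its source.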

Related

How to create multiple pandas profiling reports for multiple csv files in a directory? The report name should match the file name

I tried this,
import glob
import os
import pandas as pd
import pandas_profiling
from pandas_profiling import ProfileReport
files = glob.glob("D:\home_health_services_current_data\*.csv")
df = pd.DataFrame()
for f in files:
csv = pd.read_csv(f)
df = df.append(csv)
profile = ProfileReport(df, title="Profiling Report", explorative=True)
profile.to_file("D:\proj_report\profilerep\prof_report.html")

Importing a gzipped csv into pandas

I have a URL: https://api.kite.trade/instruments
And this is my code to fetch data from the URL and write it to Excel:
import pandas as pd
url = "https://api.kite.trade/instruments"
df = pd.read_json(url)
df.to_excel("file.xlsx")
print("Program executed successfully")
but when I run this program, I get an error like this:
AttributeError: partially initialized module 'pandas' has no attribute 'read_json' (most likely due to a circular import)
It's not JSON, it's CSV, so you need to use read_csv. (As an aside, an AttributeError about a "partially initialized module" usually means a local file is shadowing the library, e.g. a script named pandas.py in your working directory; rename it if that's the case.) Can you please try this?
import pandas as pd
url = "https://api.kite.trade/instruments"
df = pd.read_csv(url)
df.to_excel("file.xlsx",index=False)
print("Program excuted successfully")
I added an example of how I converted the text to a csv dump on your local drive.
import requests

url = "https://api.kite.trade/instruments"
filename = 'test.csv'
response = requests.get(url)
with open(filename, 'w') as f:  # context manager closes the file for us
    f.write(response.text)

To read a csv in Google Colaboratory

from google.colab import files
uploaded = files.upload()
import io
def ls(ruta = uploaded):
    return [arch.name for arch in io.StringIO(ruta) if arch.is_file()]
divisas = ls()
I have this error:
TypeError: initial_value must be str or None, not dict
from google.colab import files
uploaded = files.upload()
Import the google.colab library for the file upload, then upload the file and pass the file name into the pandas read_csv function:
import io
import pandas as pd
df2 = pd.read_csv(io.BytesIO(uploaded['heart.csv']))
df2.head()
I need to include the file names in the divisas list.
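files.upload() returns a dict that maps each uploaded file name to its contents as bytes, which is why feeding it to io.StringIO raises TypeError: initial_value must be str or None, not dict. The file names are simply the dict keys, so a minimal sketch:
from google.colab import files

uploaded = files.upload()
divisas = list(uploaded.keys())  # names of the uploaded files
print(divisas)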

Reading a csv file from a buffer raises EmptyDataError?

I need to read a string of csv content with pandas, but pandas raises an error and I don't know what happened. Can anyone help me?
import pandas as pd
import io
s = ',测试项,信息,结果\r\n0,软件测试机型805,软件测试机型805,PASS\r\n1,软件当前版本1,软件当前版本1,FAIL\r\n2,软件测试机型805,软件测试机型805,PASS\r\n3,软件当前版本1,软件当前版本1,FAIL\r\n4,软件测试机型805,软件测试机型805,PASS\r\n5,软件当前版本1,软件当前版本1,FAIL\r\n'
buf = io.StringIO()
buf.write(s)
df = pd.read_csv(buf)
I get this error: EmptyDataError: No columns to parse from file
Here you go, buddy. After buf.write(s), the stream position sits at the end of the buffer, so read_csv finds nothing to parse; rewind with buf.seek(0) before reading.
import pandas as pd
import io
s = ',测试项,信息,结果\r\n0,软件测试机型805,软件测试机型805,PASS\r\n1,软件当前版本1,软件当前版本1,FAIL\r\n2,软件测试机型805,软件测试机型805,PASS\r\n3,软件当前版本1,软件当前版本1,FAIL\r\n4,软件测试机型805,软件测试机型805,PASS\r\n5,软件当前版本1,软件当前版本1,FAIL\r\n'
buf = io.StringIO()
buf.write(s)
buf.seek(0)  # rewind to the start so read_csv sees the data
df = pd.read_csv(buf)
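Alternatively, pass the string straight to the StringIO constructor; the stream position then starts at zero, so no seek is needed:
df = pd.read_csv(io.StringIO(s))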

Loading .txt file from Google Cloud Storage into a Pandas DF

I'm trying to load a .txt file from a GCS bucket into a pandas df via pd.read_csv. When I run this code on my local machine (sourcing the .txt file from a local directory), it works perfectly. However, when I try to run the code in a cloud function, accessing the same .txt file from a GCS bucket, I get TypeError: cannot use a string pattern on a bytes-like object.
The only thing that's different is that I'm accessing the .txt file via the GCS bucket, so it's a bucket object (Blob) instead of a normal file. Would I need to download the blob as a string or as a file-like object first before doing pd.read_csv? Code is below:
def stage1_cogs_vfc(data, context):
    from google.cloud import storage
    import pandas as pd
    import dask.dataframe as dd
    import io
    import numpy as np
    start_bucket = 'my_bucket'
    storage_client = storage.Client()
    source_bucket = storage_client.bucket(start_bucket)
    df = pd.DataFrame()
    file_path = 'gs://my_bucket/SCE_Var_Fact_Costs.txt'
    df = pd.read_csv(file_path, skiprows=12, encoding='utf-8', error_bad_lines=False, warn_bad_lines=False, header=None, sep='\s+|\^+', engine='python')
Traceback (most recent call last):
  File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 383, in run_background_function
    _function_handler.invoke_user_function(event_object)
  File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 217, in invoke_user_function
    return call_user_function(request_or_event)
  File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 214, in call_user_function
    event_context.Context(**request_or_event.context))
  File "/user_code/main.py", line 20, in stage1_cogs_vfc
    df = pd.read_csv(file_path,skiprows=12, encoding ='utf-8', error_bad_lines= False, warn_bad_lines= False , header = None ,sep = '\s+|\^+',engine='python')
  File "/env/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 702, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/env/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 429, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/env/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "/env/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 1132, in _make_engine
    self._engine = klass(self.f, **self.options)
  File "/env/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 2238, in __init__
    self.unnamed_cols) = self._infer_columns()
  File "/env/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 2614, in _infer_columns
    line = self._buffered_line()
  File "/env/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 2689, in _buffered_line
    return self._next_line()
  File "/env/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 2791, in _next_line
    next(self.data)
  File "/env/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 2379, in _read
    yield pat.split(line.strip())
TypeError: cannot use a string pattern on a bytes-like object
I found a similar situation here.
I also noticed that on the line:
source_bucket = storage_client.bucket(source_bucket)
you are using "source_bucket" for both: your variable name and parameter. I would suggest to change one of those.
However, I think you'd like to see this doc for any further question related to the API itself: Storage Client - Google Cloud Storage API
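To answer the original question directly: yes, one option is to download the blob first and hand pandas a file-like object. A minimal sketch, assuming the bucket and object names from the question and a recent google-cloud-storage client (download_as_bytes; older releases call it download_as_string):
import io
import pandas as pd
from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.bucket('my_bucket')
blob = bucket.blob('SCE_Var_Fact_Costs.txt')

# pull the object into memory and wrap it so pandas sees a
# file-like object instead of a gs:// path
data = blob.download_as_bytes()
df = pd.read_csv(io.BytesIO(data), skiprows=12, encoding='utf-8',
                 header=None, sep=r'\s+|\^+', engine='python')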
Building on points from #K_immer, here is my updated code, which reads into a Dask df...
def stage1_cogs_vfc(data, context):
    from google.cloud import storage
    import pandas as pd
    import dask.dataframe as dd
    import io
    import numpy as np
    import datetime as dt
    start_bucket = 'my_bucket'
    destination_path = 'gs://my_bucket/ddf-*_cogs_vfc.csv'
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(start_bucket)
    blob = bucket.get_blob('SCE_Var_Fact_Costs.txt')
    df0 = pd.DataFrame()
    file_path = 'gs://my_bucket/SCE_Var_Fact_Costs.txt'
    # dask reads gs:// paths directly (requires the gcsfs package)
    df0 = dd.read_csv(file_path, skiprows=12, dtype=object, encoding='utf-8', error_bad_lines=False, warn_bad_lines=False, header=None, sep=r'\s+|\^+', engine='python')
    df7 = df0.compute()  # converts dask df to pandas df
    # then do your heavy ETL stuff here using pandas...