Read json files from tar.gz folders and convert to pandas dataframe [duplicate]

I have never worked with JSON files, and my problem is that I have several tar.gz archives containing different JSON files. From each archive I need to read only the files whose names look like AA*.json, append them, and convert the result to a pandas DataFrame. I tried it this way:
import os, re
import pandas as pd
import tarfile
import json
from pandas.io.json import json_normalize

cd = "my_path"
dfList = []
for root, dirs, files in os.walk(cd):
    with tarfile.open("dirs", "r:*") as tar:
        for fname in files:
            if re.match("AA_*.json$", fname):
                data = json.load(fname)
                frame = pd.DataFrame.from_dict(json_normilized(data),
                                               orient='columns')
                dfList.append(frame)
df = pd.concat(dfList)
This fails with the error:
FileNotFoundError: [Errno 2] No such file or directory: 'dirs'

For reading a single JSON file into a DataFrame, the simple form is:
import pandas as pd

data = pd.read_json('filepath/filename')
data
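That does not handle the archives, though. In the original loop, tarfile.open("dirs", ...) passes the literal string "dirs" (hence the FileNotFoundError), json.load is given a file name instead of a file object, and json_normilized is a typo. A minimal sketch of the tar.gz case, assuming the archives live under my_path and the wanted members are named like AA*.json (adjust the pattern to your real names; pd.json_normalize needs pandas >= 1.0):

import os
import re
import json
import tarfile

import pandas as pd

cd = "my_path"
dfList = []
for root, dirs, files in os.walk(cd):
    for fname in files:
        if fname.endswith(".tar.gz"):
            # open each archive by its full path, not a literal string
            with tarfile.open(os.path.join(root, fname), "r:*") as tar:
                for member in tar.getmembers():
                    base = os.path.basename(member.name)
                    if member.isfile() and re.match(r"AA.*\.json$", base):
                        # extractfile returns a file object that json.load can read
                        with tar.extractfile(member) as fh:
                            data = json.load(fh)
                        dfList.append(pd.json_normalize(data))
df = pd.concat(dfList, ignore_index=True)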

Related

Import multiple files in pandas

I am trying to import multiple files into pandas. I have created 3 files in the folder, ['File1.xlsx', 'File2.xlsx', 'File3.xlsx'], as returned by files = os.listdir(cwd):
import os
import pandas as pd

cwd = os.path.abspath(r'C:\Users\abc\OneDrive\Import Multiple files')
files = os.listdir(cwd)
df = pd.DataFrame()
for file in files:
    if file.endswith('.xlsx'):
        df = df.append(pd.read_excel(file), ignore_index=True)
df.head()
# df.to_excel('total_sales.xlsx')
print(files)
Upon running the code, I get the following error (even though the file does exist in the folder):
FileNotFoundError: [Errno 2] No such file or directory: 'File1.xlsx'
Ideally, I want code where I define the file names in a list and then read the files in a loop using the path plus that list.
I think the following should work. os.listdir returns bare file names, so join each one with the directory before reading; also, ignore_index belongs to concat, not read_excel:
import os
import pandas as pd

cwd = os.path.abspath(r'C:\Users\abc\OneDrive\Import Multiple files')
paths = [os.path.join(cwd, path) for path in os.listdir(cwd) if path.endswith('.xlsx')]
df = pd.concat((pd.read_excel(path) for path in paths), ignore_index=True)
df.head()
The idea is to build a list of full paths, then read each file and concatenate them all into a single DataFrame.

How to import Excel xlsx files into pandas [duplicate]

I need to import a .xlsx Excel file into pandas. It is now unsupported by the default xlrd engine and gives the error:
XLRDError: Excel xlsx file; not supported
I need an alternative for:
import pandas as pd
df = pd.read_excel("Challenger track/Data Sets/extract-text-1.xlsx", index_col=0)
df.head()
This is the workaround for the default xlrd engine no longer supporting xlsx files. Install openpyxl and specify it as the engine when reading an xlsx file, as below:
xlfile = pd.ExcelFile('sample.xlsx', engine='openpyxl')
df = xlfile.parse('sheet_name')
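Alternatively, the engine can be passed straight to read_excel (assuming pandas 1.x with openpyxl installed), keeping the question's original call:

import pandas as pd

# engine='openpyxl' makes pandas skip xlrd for .xlsx files
df = pd.read_excel("Challenger track/Data Sets/extract-text-1.xlsx", index_col=0, engine='openpyxl')
df.head()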

Pandas - xls to xlsx converter

I want Python to take any .xls file from a given location and save it as .xlsx with the original file name. How can I do that so that any time I paste a file into the location, it gets converted to .xlsx with the original name?
import pandas as pd
import os

for filename in os.listdir('./'):
    if filename.endswith('.xls'):
        df = pd.read_excel(filename)
        df.to_excel(??)
Your code seems to be perfectly fine. In case you are only missing the correct way to write the file under the given name, here you go:
import pandas as pd
import os

for filename in os.listdir('./'):
    if filename.endswith('.xls'):
        df = pd.read_excel(filename)
        df.to_excel(f"{os.path.splitext(filename)[0]}.xlsx")
A possible extension to convert any file that gets pasted inside the folder can be implemented with an infinite loop, for instance:
import pandas as pd
import os
import time

while True:
    files = os.listdir('./')
    for filename in files:
        out_name = f"{os.path.splitext(filename)[0]}.xlsx"
        if filename.endswith('.xls') and out_name not in files:
            df = pd.read_excel(filename)
            df.to_excel(out_name)
    time.sleep(10)
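If polling every 10 seconds feels wasteful, the conversion can also be event-driven. This sketch uses the third-party watchdog package, which is my own suggestion rather than part of the original answer:

import os
import time

import pandas as pd
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class XlsToXlsxHandler(FileSystemEventHandler):
    # watchdog calls this whenever a file appears in the watched folder
    def on_created(self, event):
        if not event.is_directory and event.src_path.endswith('.xls'):
            df = pd.read_excel(event.src_path)
            df.to_excel(f"{os.path.splitext(event.src_path)[0]}.xlsx")

observer = Observer()
observer.schedule(XlsToXlsxHandler(), path='./', recursive=False)
observer.start()
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()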

How to convert the outcome from np.mean to csv?

So I wrote a script to get the average grey value of each image in a folder. When I execute print(np.mean(img)) I get all the values in the terminal, but I don't know how to get the values into a CSV file.
import glob
import cv2
import numpy as np
import csv
import pandas as pd

files = glob.glob("/media/rene/Windows8_OS/PROMON/Recorded Sequences/6gParticles/650rpm/*.png")
for file in files:
    img = cv2.imread(file)
    finalArray = np.mean(img)
    print(finalArray)
So far it works, but I need to have the values in a CSV file. I tried csv.writer and pandas but did not manage to get a file containing the grey-scale values.
Is this what you're looking for?
files = glob.glob("/media/rene/Windows8_OS/PROMON/Recorded Sequences/6gParticles/650rpm/*.png")
mean_lst = []
for file in files:
    img = cv2.imread(file)
    mean_lst.append(np.mean(img))
pd.DataFrame({"mean": mean_lst}).to_csv("path/to/file.csv", index=False)
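If you also want to record which image each mean came from, a small extension (the column names are my own choice):

import os

# pair each image's file name with its mean grey value
rows = [(os.path.basename(f), np.mean(cv2.imread(f))) for f in files]
pd.DataFrame(rows, columns=["file", "mean"]).to_csv("path/to/file.csv", index=False)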

Generating a NetCDF from a text file

Using Python, can I open a text file, read it into an array, and then save it as a NetCDF file?
The following script I wrote was not successful:
import os
import pandas as pd
import numpy as np
import PIL.Image as im

path = 'C:\path\to\data'
grb = [[]]
for fn in os.listdir(path):
    file = os.path.join(path, fn)
    if os.path.isfile(file):
        df = pd.read_table(file, skiprows=6)
        grb.append(df)
df2 = pd.np.array(grb)
#imarray = im.fromarray(df2) ##cannot handle this data type
#imarray.save('Save_Array_as_TIFF.tif')
I once used xray/xarray (they renamed themselves) to get a NetCDF file into an ASCII dataframe. I just googled, and apparently they have a to_netcdf function: import xarray and it lets you move between pandas DataFrames and xarray objects. Note that to_netcdf lives on the xarray object, not on the DataFrame itself, so convert first and give this a try:
df.to_xarray().to_netcdf(file_path)
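Putting it together, a sketch of the whole pipeline, assuming every text file parses with read_table(..., skiprows=6) and all files share the same columns (the raw-string path fix and the output name output.nc are mine):

import os
import pandas as pd

path = r'C:\path\to\data'  # raw string, so \t is not read as a tab
frames = []
for fn in os.listdir(path):
    file = os.path.join(path, fn)
    if os.path.isfile(file):
        frames.append(pd.read_table(file, skiprows=6))
df = pd.concat(frames, ignore_index=True)
# to_netcdf needs a NetCDF backend such as netCDF4 or scipy installed
df.to_xarray().to_netcdf(os.path.join(path, 'output.nc'))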