How to convert a NetCDF file into CSV?

I am trying to load .nc4 (NetCDF-4) files into Power BI, but Power BI has no option for loading the nc4 format. Is there any way to convert nc4 files instead?
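One common route is to flatten the NetCDF variables into a table and write that table as CSV, which Power BI can load through its Text/CSV connector. Below is a minimal sketch of that route using xarray. The question names no library, so xarray is an assumption, as are the helper names `netcdf_to_csv` and `dataset_to_csv`. Opening .nc4 files also requires a NetCDF-4 backend such as `netCDF4` or `h5netcdf`.

```python
import xarray as xr


def dataset_to_csv(ds: xr.Dataset, csv_path: str) -> None:
    """Flatten an xarray Dataset into one table and write it as CSV."""
    # to_dataframe() puts the dimensions (e.g. time, lat, lon) into a MultiIndex;
    # reset_index() turns them into ordinary columns so they appear in the CSV
    df = ds.to_dataframe().reset_index()
    df.to_csv(csv_path, index=False)


def netcdf_to_csv(nc_path: str, csv_path: str) -> None:
    """Convert a NetCDF (.nc / .nc4) file into a single CSV file."""
    # a NetCDF-4 backend (netCDF4 or h5netcdf) must be installed for .nc4 files
    with xr.open_dataset(nc_path) as ds:
        dataset_to_csv(ds, csv_path)
```

For example, `netcdf_to_csv("data.nc4", "data.csv")` (hypothetical file names) produces one row per combination of dimension values. Variables with different dimensions are broadcast against each other, so for large files it may be worth selecting only the variables you need first, e.g. `dataset_to_csv(ds[["temp"]], ...)`.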

Related

Trouble reading Blob Storage File into Azure ML Notebook

I have an Excel file uploaded to my ML workspace.
I can access the file as an Azure FileDataset object. However, I don't know how to get it into a pandas DataFrame, since a 'FileDataset' object has no attribute 'to_dataframe'.
Azure ML notebooks seem to make a point of avoiding pandas for some reason.
Does anyone know how to get blob files into pandas dataframes from within Azure ML notebooks?
To explore and manipulate the dataset, first download it from the blob to a local file, which can then be loaded into a pandas DataFrame.
Here are the steps:
Download the data from Azure Blob Storage with the following Python sample, which uses the Blob service client (BlobServiceClient). Replace the placeholder values in the code with your own:
import time
from azure.storage.blob import BlobServiceClient
import pandas as pd
# replace these placeholders with your own values
STORAGEACCOUNTURL = "<storage_account_url>"
STORAGEACCOUNTKEY = "<storage_account_key>"
LOCALFILENAME = "<local_file_name>"
CONTAINERNAME = "<container_name>"
BLOBNAME = "<blob_name>"
# download from blob
t1 = time.time()
blob_service_client_instance = BlobServiceClient(account_url=STORAGEACCOUNTURL,
                                                 credential=STORAGEACCOUNTKEY)
blob_client_instance = blob_service_client_instance.get_blob_client(CONTAINERNAME, BLOBNAME,
                                                                    snapshot=None)
# stream the blob's contents straight into the local file
with open(LOCALFILENAME, "wb") as my_blob:
    blob_data = blob_client_instance.download_blob()
    blob_data.readinto(my_blob)
t2 = time.time()
print(f"It takes {t2 - t1} seconds to download {BLOBNAME}")
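Read the downloaded file into a pandas DataFrame:
# LOCALFILENAME is the path of the downloaded file
# for an Excel file like the one in the question, use pd.read_excel(LOCALFILENAME) instead
dataframe_blobdata = pd.read_csv(LOCALFILENAME)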
For more details you can follow this link
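If you would rather not write a local file at all, the blob's bytes can be read into memory and handed to pandas through an in-memory buffer. This is a sketch under assumptions: the helper names are hypothetical, choosing the reader by file extension is just one possible convention, and `pd.read_excel` needs an Excel engine such as `openpyxl` installed. The Azure import sits inside the download helper so the parsing part can be used on its own.

```python
import io

import pandas as pd


def blob_bytes_to_dataframe(data: bytes, blob_name: str) -> pd.DataFrame:
    """Parse downloaded blob bytes as Excel or CSV, based on the blob name."""
    buffer = io.BytesIO(data)  # file-like wrapper, so no temporary file is needed
    if blob_name.lower().endswith((".xlsx", ".xls")):
        return pd.read_excel(buffer)  # requires an Excel engine such as openpyxl
    return pd.read_csv(buffer)


def download_blob_to_dataframe(account_url: str, credential: str,
                               container: str, blob_name: str) -> pd.DataFrame:
    """Download one blob into memory and return it as a DataFrame."""
    from azure.storage.blob import BlobServiceClient  # azure-storage-blob package

    service = BlobServiceClient(account_url=account_url, credential=credential)
    blob_client = service.get_blob_client(container, blob_name)
    data = blob_client.download_blob().readall()  # whole blob as bytes
    return blob_bytes_to_dataframe(data, blob_name)
```

This keeps the whole file in memory, which is fine for a typical spreadsheet but not for very large blobs, where the download-to-disk approach above is safer.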

How to load a dataset from filenames in tensorflow, with input png and npy files

I have to load 2 images and 1 array from filenames with this format:
/nuove/corridoio_22092021/left/220921_141.png
/nuove/corridoio_22092021/right/220921_141.png
/nuove/corridoio_22092021/M/220921_141.npy
They are on one line, separated by spaces.
I want to split each line into its file paths and load the images with tf.io.decode_png,
but how do I load the npy file in TensorFlow?
Maybe I can use
dataset = tf.data.TextLineDataset([filenames_file])
but how should I proceed?
Use np.load to load the array and then use the array as required.
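Inside a `tf.data` pipeline the file path is a symbolic tensor, so `np.load` cannot be called on it directly. One way to bridge that gap is to wrap `np.load` in `tf.numpy_function`. The sketch below assumes each line of the list file holds exactly three whitespace-separated paths (left PNG, right PNG, .npy), that the images are RGB, and that the arrays can be cast to float32; the function names are hypothetical.

```python
import numpy as np
import tensorflow as tf


def _load_npy(path):
    # inside tf.numpy_function the path arrives as a NumPy bytes object
    return np.load(path.decode("utf-8")).astype(np.float32)


def parse_line(line):
    # split "left.png right.png array.npy" on whitespace
    parts = tf.strings.split(line)
    left = tf.io.decode_png(tf.io.read_file(parts[0]), channels=3)
    right = tf.io.decode_png(tf.io.read_file(parts[1]), channels=3)
    # np.load needs a real path, so run it as a NumPy function
    array = tf.numpy_function(_load_npy, [parts[2]], tf.float32)
    return left, right, array


def make_dataset(filenames_file):
    lines = tf.data.TextLineDataset([filenames_file])
    return lines.map(parse_line, num_parallel_calls=tf.data.AUTOTUNE)
```

The output of `tf.numpy_function` has an unknown static shape; if the model needs it, you can restore it in `parse_line` with `tf.ensure_shape(array, ...)` using the real array shape.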

How can I open a large parquet file with Keras?

I've tried looking for this and haven't had any meaningful results.
I have a Keras model with multiple inputs, and my data was getting too large for my pandas approach, so I preprocessed it and saved it as a parquet file. I'm not sure how to open it with Keras.
I looked up tf.data but I still cannot figure out how to read a parquet file that I can pass to my model.
Does anyone know how to open parquet files? I can't figure out how to do this in TensorFlow and can't find anything related to it in Keras.
You can probably keep your pandas approach, but you would have to break your data down into chunks.
If you have already broken it down to create your parquet file, you should be able to use the same method to open only a subset of your data in pandas at a time.
If you need to extract the data from your parquet file in pieces, here's a link on how to read data into a pandas DataFrame in chunks:
How to read a CSV file subset by subset with Pandas?
Once you have a chunk of data, you can call model.fit on that chunk, then move on to the next chunk and call model.fit again.
You can look into TensorFlow I/O, a collection of file systems and file formats that TensorFlow's built-in support does not cover. It provides functions such as tfio.IODataset.from_parquet and tfio.IOTensor.from_parquet for working with the parquet file format.
!pip install tensorflow_io -U -q
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_io as tfio
# build a small example DataFrame and save it as parquet
df = pd.DataFrame({"data": tf.random.normal([20], 0, 1, tf.float32).numpy(),
                   "label": np.random.randint(2, size=(20))})
df.to_parquet("df.parquet")
pd.read_parquet('/content/df.parquet')[:2]
#        data  label
# 0  0.721347      1
# 1 -1.215225      1
ds = tfio.IODataset.from_parquet('/content/df.parquet')
ds
FYI, you might also consider using the Feather format rather than parquet. As far as I know, parquet files can be heavy to load and can slow down your training pipeline, whereas Feather is comparatively fast (very fast) to read.

How to use python pydub to convert mp3 data(bytes) to wav data(bytes) without storing data to file?

seg=AudioSegment(data=mp3_data)
seg.set_frame_rate(16000)
seg.set_channels(1)
# no function named set_format
# seg.set_format("wav")
return seg.raw_data
Update:
I figured it out: a BytesIO buffer can be used in place of a file, like this:
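from io import BytesIO
from pydub import AudioSegment

def mp3_to_wav_bytes(mp3_data, vosk_sample_rate=16000):
    # decode the MP3 bytes from memory (pydub needs ffmpeg or libav for MP3)
    seg = AudioSegment.from_mp3(BytesIO(mp3_data))
    # set_frame_rate and set_channels return new segments, so reassign
    seg = seg.set_frame_rate(vosk_sample_rate)
    seg = seg.set_channels(1)
    # export into an in-memory buffer instead of a file
    wavIO = BytesIO()
    seg.export(wavIO, format="wav")
    return wavIO.getvalue()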
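To confirm that the returned bytes really are a WAV file with the intended sample rate and channel count, they can be inspected in memory with the standard-library `wave` module. The helper name below is hypothetical, and this is only a sanity check, not part of the conversion itself.

```python
import io
import wave


def describe_wav_bytes(wav_bytes: bytes) -> tuple[int, int, int]:
    """Return (frame_rate, channels, sample_width_in_bytes) of in-memory WAV data."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth()
```

For the conversion above, `describe_wav_bytes(mp3_to_wav_bytes(mp3_data))[:2]` should be `(16000, 1)` when the default sample rate is used.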

Remove header from .MAT files loaded by loadmat() from scipy.io library

I am new to both Python and TensorFlow.
I am trying to build an input pipeline for a generative adversarial network whose input is complex-valued data stored in .mat format, which I loaded with loadmat() from the scipy.io library. Now I am trying to prepare the data for the network, and I tried from_tensor_slices(), but the data cannot be converted into a tensor because of the headers in it. I looked up how to remove headers from files in Python and found some techniques for .csv files, but nothing for .mat files. How can I remove the header from .mat files? Also, I think the loadmat() function returns a list of dictionaries. How can I extract the data from the file in that case? Thank you.
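`scipy.io.loadmat` actually returns a single dict mapping variable names to NumPy arrays. The "header" is the extra metadata keys it adds (`__header__`, `__version__`, `__globals__`), so there is nothing to strip from the file itself; you only need to ignore those keys and pick out your variable. The sketch below does that. Splitting the complex values into a trailing [real, imag] axis is an assumption about what the network expects (TensorFlow can also hold complex tensors directly), and the helper names are hypothetical.

```python
import numpy as np
import scipy.io
import tensorflow as tf


def load_mat_arrays(path):
    """Return only the data variables of a .mat file as a dict of arrays."""
    contents = scipy.io.loadmat(path)  # a dict, not a list
    # MATLAB variable names start with a letter, so the '__' keys are loadmat metadata
    return {name: value for name, value in contents.items() if not name.startswith("__")}


def complex_to_dataset(array):
    """Stack real and imaginary parts on a new last axis and slice along axis 0."""
    stacked = np.stack([array.real, array.imag], axis=-1).astype(np.float32)
    return tf.data.Dataset.from_tensor_slices(stacked)
```

With a hypothetical variable named `x` in the file, usage would be `ds = complex_to_dataset(load_mat_arrays("data.mat")["x"])`; each dataset element is then one row of `x` with shape `(columns, 2)`.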