genfromtxt() Save results to separate file - numpy

I am new to Python and I have a question regarding genfromtxt(). I have the following code:
import numpy as np
Myfile = "C:\\Users\\suntzu\\Desktop\\winequality-red.csv"
ds = np.genfromtxt(Myfile,names=True, delimiter=',')
I am trying to redirect this output to a new file. I searched on Google for some time and I can't seem to figure out how to do this.

See if this helps.
To save as CSV using NumPy, try this:
np.savetxt("save.csv",ds, delimiter=",")
To save in NumPy's binary .npy format, try this:
np.save("save.npy",ds)
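Note that with names=True, genfromtxt returns a structured array, so np.savetxt benefits from an explicit format and a header line to keep the column names. A minimal sketch with made-up stand-in data (not the asker's wine-quality file):

```python
import numpy as np
from io import StringIO

# Stand-in for the wine-quality CSV: two named columns, hypothetical values.
csv_text = StringIO("alcohol,quality\n9.4,5\n9.8,5\n10.2,6\n")
ds = np.genfromtxt(csv_text, names=True, delimiter=',')

# ds is a structured array; write a header and use one format for all fields
# so the saved CSV round-trips cleanly.
np.savetxt("save.csv", ds, delimiter=",", fmt="%g",
           header=",".join(ds.dtype.names), comments="")

# .npy preserves the structured dtype exactly.
np.save("save.npy", ds)
```

Passing comments="" stops savetxt from prefixing the header with "# ", so the file can be read back with names=True directly.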

Related

Pandas - xls to xlsx converter

I want Python to take any .xls file from a given location and save it as .xlsx with the original file name. How can I do that, so that any time I paste a file into the location it gets converted to .xlsx with the original file name?
import pandas as pd
import os
for filename in os.listdir('./'):
    if filename.endswith('.xls'):
        df = pd.read_excel(filename)
        df.to_excel(??)
Your code seems to be perfectly fine. In case you are only missing the correct way to write the file out under the given name, here you go:
import pandas as pd
import os
for filename in os.listdir('./'):
    if filename.endswith('.xls'):
        df = pd.read_excel(filename)
        df.to_excel(f"{os.path.splitext(filename)[0]}.xlsx")
A possible extension that converts any file pasted into the folder can be implemented with an infinite loop, for instance:
import pandas as pd
import os
import time
while True:
    files = os.listdir('./')
    for filename in files:
        out_name = f"{os.path.splitext(filename)[0]}.xlsx"
        if filename.endswith('.xls') and out_name not in files:
            df = pd.read_excel(filename)
            df.to_excel(out_name)
    time.sleep(10)
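The output name in both snippets comes from os.path.splitext, which splits off only the final extension. A quick self-contained sketch (xlsx_name is a hypothetical helper, not part of the answer above):

```python
import os

def xlsx_name(filename):
    """Replace only the final extension with .xlsx, keeping the base name."""
    return f"{os.path.splitext(filename)[0]}.xlsx"

print(xlsx_name("report.xls"))       # report.xlsx
print(xlsx_name("2020.sales.xls"))   # 2020.sales.xlsx
```

Because only the last suffix is replaced, dots earlier in the file name are safe.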

How to load csv file into SparkSession

I am learning PySpark from an online source. I googled around and found how to read a csv file into a Spark DataFrame with the following code:
import pandas as pd
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
spark_df = spark.read.csv('my_file.csv', header=True)
pandas_df = spark_df.toPandas()
However, the online site I am learning from somehow loads the csv file into the SparkSession without telling the audience how. That is, when I typed (in the online site's browser)
print(spark.catalog.listTables())
the following output was returned:
[Table(name='my_file', database=None, description=None, tableType='TEMPORARY', isTemporary=True)]
When I tried to print the catalog the same way locally, I got an empty list back.
Is there any way to put the csv file into the SparkSession catalog? I have tried to google this, but most of what I found is how to load a csv into a Spark DataFrame, as I showed above.
Thanks very much.
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('my_app').getOrCreate()  # any app name works here
df = spark.read.csv('invoice.csv',inferSchema=True,header=True)
It turns out the online site covers this much further along than it should. The missing step is registering the DataFrame as a temporary view:
sdf = spark.read.csv('my_file.csv', header=True)
pdf = sdf.toPandas()
spark_temp = spark.createDataFrame(pdf)
spark_temp.createOrReplaceTempView('my_file')
print(spark.catalog.listTables())
[Table(name='my_file', database=None, description=None, tableType='TEMPORARY', isTemporary=True)]
One question remains, though: I cannot pass pd.read_csv('my_file.csv') in directly; it resulted in a merge error of some sort.
This also works, without the round trip through pandas:
df = my_spark.read.csv("my_file.csv",inferSchema=True,header=True)
df.createOrReplaceTempView('my_file')
print(my_spark.catalog.listTables())

How to convert the outcome from np.mean to csv?

So I wrote a script to get the average grey value of each image in a folder. When I execute print(np.mean(img)) I get all the values in the terminal, but I don't know how to get the values into a csv file.
import glob
import cv2
import numpy as np
import csv
import pandas as pd
files = glob.glob("/media/rene/Windows8_OS/PROMON/Recorded Sequences/6gParticles/650rpm/*.png")
for file in files:
    img = cv2.imread(file)
    finalArray = np.mean(img)
    print(finalArray)
So far it works, but I need to have the values in a csv file. I tried csv.writer and pandas but did not manage to get a file containing the grey-scale values.
Is this what you're looking for?
files = glob.glob("/media/rene/Windows8_OS/PROMON/Recorded Sequences/6gParticles/650rpm/*.png")
mean_lst = []
for file in files:
    img = cv2.imread(file)
    mean_lst.append(np.mean(img))
pd.DataFrame({"mean": mean_lst}).to_csv("path/to/file.csv", index=False)
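If pandas feels heavy for this, the standard-library csv module gives the same result. A sketch with hypothetical per-image means standing in for the np.mean(img) values (the actual images aren't available here):

```python
import csv

# Hypothetical stand-ins for the per-image grey values.
filenames = ["img_000.png", "img_001.png", "img_002.png"]
mean_lst = [101.7, 98.3, 110.0]

# Keep the file name next to each mean so rows stay identifiable.
with open("means.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["file", "mean"])
    writer.writerows(zip(filenames, mean_lst))
```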

Generating a NetCDF from a text file

Using Python, can I open a text file, read it into an array, and then save it as a NetCDF file?
The following script I wrote was not successful:
import os
import pandas as pd
import numpy as np
import PIL.Image as im
path = r'C:\path\to\data'
grb = []
for fn in os.listdir(path):
    file = os.path.join(path, fn)
    if os.path.isfile(file):
        df = pd.read_table(file, skiprows=6)
        grb.append(df)
df2 = np.array(grb)
#imarray = im.fromarray(df2) ##cannot handle this data type
#imarray.save('Save_Array_as_TIFF.tif')
I once used xray or xarray (they renamed themselves) to get a NetCDF file into an ASCII dataframe. I just googled, and apparently xarray has a to_netcdf function; import xarray and it lets you treat dataframes much like pandas does.
So give this a try (with current xarray you convert the pandas DataFrame first):
df.to_xarray().to_netcdf(file_path)
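A pandas DataFrame itself has no to_netcdf method; the route is DataFrame → xarray.Dataset → NetCDF. A minimal sketch with made-up data, assuming xarray and scipy are installed (the scipy engine writes NetCDF3 and avoids a netCDF4 dependency):

```python
import pandas as pd
import xarray as xr

# A tiny stand-in table (hypothetical values, not the asker's files).
df = pd.DataFrame({"temp": [280.1, 281.4, 279.9]},
                  index=pd.Index([0, 1, 2], name="time"))

# pandas hands the table to xarray, and xarray writes the NetCDF file.
ds = df.to_xarray()                       # Dataset with dimension "time"
ds.to_netcdf("data.nc", engine="scipy")   # NetCDF3 via the scipy engine

# Round-trip check.
back = xr.open_dataset("data.nc", engine="scipy")
print(float(back["temp"][1]))
```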

Accessing carray of pointcloud using pytables

I am having a hard time understanding how to access the data in a carray.
http://carray.pytables.org/docs/manual/index.html
I have a carray that I can view in a group structure using ViTables, but how to open it and retrieve the data is beyond me.
The data are a point cloud, three levels down in the hierarchy, that I want to make a scatter plot of and extract as a .obj file.
I then have to loop through (many) clouds and do the same thing.
Is there anyone that can give me a simple example of how to do this?
This was my attempt:
import carray as ca
fileName = 'hdf5_example_db.h5'
a = ca.open(rootdir=fileName)
print(a)
I managed to solve my issue. I wasn't treating the carray differently from the rest of the hierarchy; I needed to first open the whole file, then refer to the data I needed. I ended up not having to use carray at all and just stuck to h5py:
from __future__ import print_function
import h5py
import numpy as np
# read the hdf5 format file
fileName = 'hdf5_example_db.h5'
f = h5py.File(fileName, 'r')
# full path of carry type data (which is in ply format)
dataspace = '/objects/object_000/object_model'
# view the data
print(f[dataspace])
# print to ply file
with open('object_000.ply', 'w') as fo:
    for line in f[dataspace]:
        fo.write(line + '\n')
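For anyone reproducing this without the original database, here is a self-contained sketch that builds a small file with the same nested layout and reads the point cloud back (the path and shapes are hypothetical):

```python
import h5py
import numpy as np

# Build an example file mirroring the asker's hierarchy.
points = np.random.rand(10, 3)  # a tiny stand-in point cloud (x, y, z)
with h5py.File("hdf5_example_db.h5", "w") as f:
    f.create_dataset("/objects/object_000/object_model", data=points)

# Index the File object with the full path, then slice with [...] to
# pull the dataset's contents into a NumPy array.
with h5py.File("hdf5_example_db.h5", "r") as f:
    cloud = f["/objects/object_000/object_model"][...]

print(cloud.shape)  # (10, 3)
```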