Reading CSV and importing column data as a numpy array - pandas

I have many csv files, each containing two columns: 'Energy' and 'Counts'. My goal is to import the data and keep it as two separate numpy arrays, X and Y, where X holds all the Energy values and Y holds all the Counts values. The problem is that my csv files have a blank row after each data row, which seems to be causing a lot of trouble. How can I eliminate those lines and save the data as arrays?
Energy Counts
-0.4767 0
-0.4717 0
-0.4667 0
-0.4617 0
-0.4567 0
-0.4517 0
import pandas as pd
import glob
import numpy as np
import os
import matplotlib.pyplot as plt
file_path = "path"  # directory containing the csv files
read_files = glob.glob(os.path.join(file_path, "*.csv"))  # get all files
X = []  # create empty list
Y = []  # create empty list
for f in read_files:
    df = pd.read_csv(f, header=0)
    X.append(df['Energy'])   # store X data
    Y.append(df['Counts'])   # store Y data
X = np.array(X)
Y = np.array(Y)
print(X.shape)
print(Y.shape)
plt.plot(X[50], Y[50])
plt.show()
Ideally, if all the data were saved correctly I should get my plot, but since the data is not saving correctly, I am not getting any plot.

Set the skip_blank_lines parameter to True (it is already the default) and the blank lines won't be read into the dataframe:
df = pd.read_csv(files, header=[0], skip_blank_lines=True)
So your whole program should be something like this (each file has the same column headers in the first line and the columns are separated by spaces):
...
df = pd.DataFrame()
for file in read_files:
    df = pd.concat([df, pd.read_csv(file, sep=r'\s+', skip_blank_lines=True)])
df.plot(x='Energy', y='Counts')
plt.show()
# save both columns in one file
df.to_csv('myXYFile.csv', index=False)
# or two files with one column each
df.Energy.to_csv('myXFile.csv', index=False)
df.Counts.to_csv('myYFile.csv', index=False)
TEST PROGRAM
import pandas as pd
import io
input1="""Energy Counts
-0.4767 0
-0.4717 0
-0.4667 0
-0.4617 0
-0.4567 0
-0.4517 0
"""
input2="""Energy Counts
-0.4767 0
-0.4717 0
"""
df = pd.DataFrame()
for text in (input1, input2):
    df = pd.concat([df, pd.read_csv(io.StringIO(text), sep=r'\s+', skip_blank_lines=True)])
print(df)
TEST OUTPUT:
Energy Counts
0 -0.4767 0
1 -0.4717 0
2 -0.4667 0
3 -0.4617 0
4 -0.4567 0
5 -0.4517 0
0 -0.4767 0
1 -0.4717 0

Related

assigning csv file to a variable name

I have a .csv file and I use pandas to read it.
import pandas as pd
from pandas import read_csv
data=read_csv('input.csv')
print(data)
0 1 2 3 4 5
0 -3.288733e-08 2.905263e-08 2.297046e-08 2.052534e-08 3.767194e-08 4.822049e-08
1 2.345769e-07 9.462636e-08 4.331173e-08 3.137627e-08 4.680112e-08 6.067109e-08
2 -1.386798e-07 1.637338e-08 4.077676e-08 3.339685e-08 5.020153e-08 5.871679e-08
3 -4.234607e-08 3.555008e-08 2.563824e-08 2.320405e-08 4.008257e-08 3.901410e-08
4 3.899913e-08 5.368551e-08 3.713510e-08 2.367323e-08 3.172775e-08 4.799337e-08
My aim is to assign the file to a column name so that I can access the data later. For example, by doing something like
new_data= df['filename']
filename
0 -3.288733e-08,2.905263e-08,2.297046e-08,2.052534e-08,3.767194e-08,4.822049e-08
1 2.345769e-07,9.462636e-08,4.331173e-08,3.137627e-08,4.680112e-08, 6.067109e-08
2 -1.386798e-07,1.637338e-08,4.077676e-08,3.339685e-08,5.020153e-08,5.871679e-08
3 -4.234607e-08,3.555008e-08,2.563824e-08,2.320405e-08,4.008257e-08,3.901410e-08
4 3.899913e-08,5.368551e-08,3.713510e-08,2.367323e-08,3.172775e-08,4.799337e-08
I don't really like it (and I still don't completely get the point), but you could just read your data in as one column (by using a 'wrong' separator) and rename the column.
import pandas as pd
filename = 'input.csv'
df = pd.read_csv(filename, sep=';')
df.columns = [filename]
If you then wish, you could add other files by doing the same thing (with a different name for df at first) and then concatenating that with df.
A more useful approach IMHO would be to add the dataframe to a dictionary (a list would also work).
import pandas as pd
filename = 'input.csv'
df = pd.read_csv(filename)
data_dict = {filename: df}
# ... Add multiple files to data_dict by repeating steps above in a loop
You can then access your data later by calling data_dict[filename] or data_dict['input.csv'].
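A minimal sketch of that loop; the file names and contents here are made-up stand-ins (io.StringIO plays the role of the real files) just to show the dictionary pattern:

```python
import io
import pandas as pd

# stand-ins for real files; in practice these would be paths from glob
files = {
    'input.csv': io.StringIO("a,b\n1,2\n3,4\n"),
    'input2.csv': io.StringIO("a,b\n5,6\n"),
}

data_dict = {}
for name, handle in files.items():
    # each file becomes one entry, keyed by its name
    data_dict[name] = pd.read_csv(handle)

print(data_dict['input.csv'].shape)  # (2, 2)
```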

Extracting column from Array in python

I am a beginner in Python and I am stuck with data which is an array of 32763 numbers, separated by commas. Please find the data here: data
I want to convert this into two columns, the first from (0:16382) and the second from (2:32763). In the end I want to plot column 1 as the x axis and column 2 as the y axis. I tried the following code but I am not able to extract the columns:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = np.genfromtxt('oscilloscope.txt',delimiter=',')
df = pd.DataFrame(data.flatten())
print(df)
and then I want to write the data in some file let us say data1 in the format as shown in attached pic
It is hard to answer without seeing the format of your data, but you can try
data = np.genfromtxt('oscilloscope.txt', delimiter=',')
print(data.shape)  # check that we got something useful
# this should split the data into x, y at position 16381
x = data[:16381]
y = data[16381:]
# now you can create a dataframe and write it to a file
df = pd.DataFrame({'x': x, 'y': y})
df.to_csv('data1.csv', index=False)
Try this:
# input: dataframe df and its chunk_size; output: a list of chunks.
# you can pass whatever chunk size you want.
def split_dataframe(df, chunk_size=16382):
    chunks = []
    num_chunks = (len(df) + chunk_size - 1) // chunk_size  # ceiling division
    for i in range(num_chunks):
        chunks.append(df[i * chunk_size:(i + 1) * chunk_size])
    return chunks
or use np.array_split.
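For instance, a quick sketch of the np.array_split route on a toy array (the real data would be the 32763-number array instead):

```python
import numpy as np

data = np.arange(10)  # toy stand-in for the oscilloscope data
x, y = np.array_split(data, 2)  # split into two nearly equal halves
print(x)  # [0 1 2 3 4]
print(y)  # [5 6 7 8 9]
```

Unlike np.split, np.array_split does not require the array length to divide evenly, which matters here since 32763 is odd.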

Increment or reset counter based on an existing value of a data frame column in Pandas

I have a dataframe imported from a csv file along the lines of the below:
Value Counter
1. 5 0
2. 15 1
3. 15 2
4. 15 3
5. 10 0
6. 15 1
7. 15 1
I want to increment the value of Counter only if Value == 15, otherwise reset it to 0. I tried cumsum but I'm stuck on how to reset it back to zero on a non-match.
Here is my code
import pandas as pd
import csv
import numpy as np
dfs = []
df = pd.read_csv("H:/test/test.csv")
df["Counted"] = (df["Value"] == 15).cumsum()
dfs.append(df)
big_frame = pd.concat(dfs, sort=True, ignore_index=False)
big_frame.to_csv('H:/test/List.csv' , index=False)
Thanks for your help
Here's my approach:
s = df.Value.ne(15)
df['Counter'] = (~s).groupby(s.cumsum()).cumsum()
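To unpack the trick: s marks the rows that should reset the counter, s.cumsum() gives each reset-delimited run its own group id, and the inner cumsum counts within each run. A small self-contained sketch using the Value column from the question's sample:

```python
import pandas as pd

df = pd.DataFrame({'Value': [5, 15, 15, 15, 10, 15, 15]})

s = df.Value.ne(15)       # True where the counter must reset
group_ids = s.cumsum()    # a new group id starts at every reset row
df['Counter'] = (~s).groupby(group_ids).cumsum()

print(df['Counter'].tolist())  # [0, 1, 2, 3, 0, 1, 2]
```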

How to shift the column headers in pandas

I have .txt files I'm reading in with pandas and the header line starts with '~A'. I need to ignore the '~A' and have the next header correspond to the data in the first column. Thanks!
You can do this:
import pandas as pd
data = pd.read_csv("./test.txt", names=[ 'A', 'B' ], skiprows=1)
print(data)
and the output for input:
~A, A, B
1, 2
3, 4
is:
c:\Temp\python>python test.py
A B
0 1 2
1 3 4
You have to name the columns yourself, but given that your file seems to be malformed, I guess it is not that bad.
If your header lines are not the same in all files, you can just read them in Python:
import pandas as pd

# read first line
with open("./test.txt") as myfile:
    headRow = next(myfile)
# read column names, dropping the leading '~A'
columns = [x.strip() for x in headRow.split(',')]
# process with pandas
data = pd.read_csv("./test.txt", names=columns[1:], skiprows=1)
print(data)

pandas calling an element from imported data

I have a csv file with dates.
import pandas as pd
spam=pd.read_csv('DATA.csv', parse_dates=[0], usecols=[0], header=None)
spam.shape
is (n,1)
How can I call an element as I do in Numpy (e.g. for an array A with A.shape => (n,1), calling A[5,1] gives the element on the 5th row in the 1st column)?
Numpy arrays index at zero, so you'll actually need A[4,0] to get the element on the 5th row of the 1st column.
But this is how you'd get the same as Numpy arrays:
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(np.random.randn(2, 2)) # create a 2 by 2 DataFrame object
>>> df.iloc[1, 1]
-1.206712609725652
>>> df
          0         1
0 -0.281467  1.124922
1  0.580617 -1.206713
iloc indexes by integer position only. The older ix indexer accepted both integers and labels, but it was deprecated and has been removed in pandas 1.0, so use loc for label-based access instead.
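A short self-contained sketch contrasting the two indexers (the data and labels here are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'x': [10, 20, 30]}, index=['a', 'b', 'c'])

print(df.iloc[1, 0])     # positional: row 1, column 0 -> 20
print(df.loc['b', 'x'])  # label-based: same cell -> 20
```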