Beckhoff TwinCAT Scope CSV format into pandas DataFrame

After recording data in Beckhoff TwinCAT Scope, one can export this data to a CSV file. Said CSV file, however, has a rather complicated format. Can anyone suggest the most effective way to import such a file into a pandas DataFrame so I can perform analysis?
An example of the format can be found here:
https://infosys.beckhoff.com/english.php?content=../content/1033/tcscope2/html/TwinCATScopeView2_Tutorial_SaveExport.htm&id=

No need to write a custom parser. Using the example data scope_data.csv:
Name,fasd,,,,
File,C;\,,,,
Start,dfsd,,,,
,,,,,
,,,,,
Name,Peak,Name,PULS1,Name,SINUS_FAST
Net id,123.123.123,Net id,123.123.124,Net Id,123.123.125
Port,801,Port,801,Port,801
,,,,,
0,0.6113936598,0,0.07994111349,0,0.08425652468
0,0.524852539,0,0.2051963401,0,0.4391185847
0,0.4993723482,0,0.2917317117,0,0.4583736263
0,0.5976553194,0,0.8675482865,0,0.8435987898
0,0.06087224998,0,0.7933980583,0,0.5614294705
0,0.1967968423,0,0.3923966599,0,0.1951608414
0,0.9723649064,0,0.5187276782,0,0.7646786192
You can import as follows:
import pandas as pd
scope_data = pd.read_csv(
    "scope_data.csv",
    skiprows=[*range(5), *range(6, 9)],  # skip the file metadata (rows 0-4) and the per-channel metadata (rows 6-8), keeping row 5 as the header
    usecols=[*range(1, 6, 2)],           # keep only the value columns (1, 3, 5)
)
Then you get
>>> scope_data.head()
Peak PULS1 SINUS_FAST
0 0.611394 0.079941 0.084257
1 0.524853 0.205196 0.439119
2 0.499372 0.291732 0.458374
3 0.597655 0.867548 0.843599
4 0.060872 0.793398 0.561429
I don't have the original scope CSV, but a little adjustment of skiprows and usecols should give you the desired result.

To read the bulk of the file (ignoring the header material) use the skiprows keyword argument to read_csv:
import pandas as pd
df = pd.read_csv('data.csv', skiprows=18)
For the header material, I think you'd have to write a custom parser.
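If you do need the header metadata as well, a small hand-rolled parser is not much work. The sketch below is an assumption based on the sample layout above (a file-level key/value block, blank rows, then repeated key,value pairs per channel, each block ending at an empty row); the function name read_scope_header is my own:
import csv

def read_scope_header(path):
    """Parse the metadata blocks of a TwinCAT Scope CSV export.

    A sketch assuming the sample layout above; real exports may differ.
    """
    file_meta, channel_meta = {}, {}
    with open(path, newline="") as f:
        reader = csv.reader(f)
        # File-level block ("Name,...", "File,...", "Start,...") up to a blank row.
        for row in reader:
            if not any(row):
                break
            file_meta[row[0]] = row[1]
        # Skip any further blank rows until the channel block starts.
        for row in reader:
            if any(row):
                break
        # 'row' now holds "Name,Peak,Name,PULS1,...": pair up keys and values
        # until the next blank row (note the sample mixes "Net id" and "Net Id").
        while any(row):
            for key, value in zip(row[0::2], row[1::2]):
                channel_meta.setdefault(key, []).append(value)
            row = next(reader)
    return file_meta, channel_meta
With the sample file this should give channel_meta["Name"] == ["Peak", "PULS1", "SINUS_FAST"], which you could also pass to read_csv as column names.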

Related

pandas read_csv creating incorrect columns

I am using pandas to read a csv file
import pandas as pd
data = pd.read_csv('file_name.csv', "ISO-8859-1")
The CSV looks like:
col1,col2,col3
"sample1","sample2","sample3"
but the DataFrame ends up with just one column instead of the expected ones. I checked the CSV and it looks fine.
Any suggestions on why this could be happening would be appreciated.
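For what it's worth, the second positional argument of pd.read_csv is sep, not the encoding, so "ISO-8859-1" here is being treated as the delimiter; since that string never occurs in the file, every line ends up as a single column. A sketch of the likely fix, passing the encoding by keyword:
import pandas as pd

# Pass the encoding by keyword; as a positional argument it would be taken as the separator.
data = pd.read_csv('file_name.csv', encoding="ISO-8859-1")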

Adding a column with a calculation to multiple CSVs

I'm SUPER green to Python and am having some issues trying to automate some calculations.
I know that this works to add a new column called "Returns" that divides the current "value" by the previous one:
import pandas as pd
import numpy as np
import csv
a = pd.read_csv("/Data/a_data.csv", index_col = "time")
a ["Returns"] = (a["value"]/a["value"].shift(1) -1)*100
However, I have a lot of these CSVs, and I need this calculation to happen before merging them all together. So I was hoping to write something that loops through all of the CSVs, does the calculation, and adds the column, but this was clearly incorrect, as I get a SyntaxError:
import pandas as pd
import numpy as np
import csv
a = pd.read_csv("/Data/a_data.csv", index_col = "time")
b = pd.read_csv("/Data/b_data.csv", index_col = "time")
c = pd.read_csv("/Data/c_data.csv", index_col = "time")
my_lists = ['a', 'b', 'c']
for my_list in my_lists:
    {my_list}["Returns"] = ({my_list}["close"]/{my_list}["close"].shift(1) -1)*100
    print(f"Calculating: {my_list.upper()}")
I'm sure there is an easy way to do this that I just haven't reached in my Python education yet, so any guidance would be greatly appreciated!
Assuming "close" and "time" are fields defined in each of your csv files, you could define a function that reads each file, do the shift and returns a dataframe:
def your_func(my_file):  # this function takes a file name as an argument
    my_df = pd.read_csv(my_file, index_col="time")  # read its content into a DataFrame,
    my_df["Returns"] = (my_df["close"] / my_df["close"].shift(1) - 1) * 100  # make the calculation,
    return my_df  # and return the result
Then, in the main code, you collect all the CSV files in a folder with the glob package and, using the function above, build a DataFrame for each file with the calculation done:
import glob

path = r'/Data'  # path to the directory containing the csv files
filenames = glob.glob(path + "/*.csv")  # grab all csv file names in that directory

for filename in filenames:  # loop over the csv files found
    df = your_func(filename)  # read the file, do the calculation, and get the DataFrame back
    print(df)
Above, each DataFrame is printed to show the results; I am not sure what you intend to do with upper (I don't think it is a function on a DataFrame).
Finally, this gives you independent DataFrames with the calculation done, prior to any merging or final transformation.
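Since the goal is to merge the files afterwards, one convenient variation (my suggestion, reusing path and your_func from above) is to collect the processed frames in a dict keyed by file name instead of printing them:
import glob
import os

# Collect each processed DataFrame under its file name for the later merge step.
frames = {}
for filename in glob.glob(path + "/*.csv"):
    key = os.path.splitext(os.path.basename(filename))[0]  # e.g. "a_data"
    frames[key] = your_func(filename)

# frames["a_data"], frames["b_data"], ... are now ready to be concatenated or joined.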
1. Do the a, b, c data frames have the same dimensions?
2. You don't need to import the csv module; pandas handles CSV parsing on its own.
3. If you want to union the data frames, you can use:
my_lists = [a, b, c]
and concatenate them like this:
result = pd.concat(my_lists)
Lastly, your calculation should be :
result["Returns"]=(result.loc[:, "close"].div(result.loc[:, "close"].shift()).fillna(0).replace([np.inf, -np.inf], 0))
The label-based selection (.loc) is used to access the column values. When dividing, the results can be NaN (not a number) or infinite, so the fillna and replace calls are there to clean up the NaN and Inf values.
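Note that the original formula also subtracted 1 and multiplied by 100 to get a percentage return; to keep that behaviour you can apply the same NaN/Inf cleanup around it (a sketch):
import numpy as np

# Percentage return as in the question, with the division cleanup from above.
result["Returns"] = (((result["close"].div(result["close"].shift()) - 1) * 100)
                     .fillna(0)
                     .replace([np.inf, -np.inf], 0))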

To_datetime in pandas module

This is a follow-up to a question I asked earlier: File size increased after imported Pandas
I have the following code:
pd.to_datetime(xl_file.index, format='%Y-%m-%d')
To make this code work I have to use import pandas as pd. Is there a way I can get it to work without importing the entire pandas package? I need to do this because the import makes the size of the .exe file increase dramatically.
Just import the to_datetime function, rather than the entire package.
from pandas import to_datetime
val = to_datetime("2020-01-01", format='%Y-%m-%d')
print(val)
Output:
2020-01-01 00:00:00
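One caveat: from pandas import to_datetime still imports the whole pandas package behind the scenes, so it may not shrink the .exe much. If the dates always follow one fixed format, the standard library can parse them without pandas at all (a sketch, assuming plain strings rather than a DataFrame index):
from datetime import datetime

# Parse a fixed-format date string using only the standard library.
val = datetime.strptime("2020-01-01", "%Y-%m-%d")
print(val)  # 2020-01-01 00:00:00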

Reading BED files into pandas dataframe (windows)

For a bioinformatics project, I would like to read a .BED file into a pandas dataframe and have no clue how I can do it and what tools/programs are required. Nothing I found on the internet was really applicable to me, as I am working on windows10 with Python 3.7 (Anaconda distribution).
Any help would be appreciated.
According to https://software.broadinstitute.org/software/igv/BED:
A BED file (.bed) is a tab-delimited text file that defines a feature
track.
According to http://genome.ucsc.edu/FAQ/FAQformat#format1 it contains up to 12 fields (columns), plus possible comment lines starting with the word 'track'. The following is a minimal program to read such a BED file into a pandas DataFrame:
import pandas as pd
df = pd.read_csv('so58178958.bed', sep='\t', comment='t', header=None)
header = ['chrom', 'chromStart', 'chromEnd', 'name', 'score', 'strand', 'thickStart', 'thickEnd', 'itemRgb', 'blockCount', 'blockSizes', 'blockStarts']
df.columns = header[:len(df.columns)]
This is just a very simple code snippet treating all lines starting with a 't' as comments. This should work as all 'chrom' field entries should start with either a 'c', an 's' or a digit.
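BED files may also start with 'browser' lines (another UCSC convention), which comment='t' would not skip. A slightly more defensive variation (my own, reusing the header list from the snippet above) filters header lines by content before parsing:
import io
import pandas as pd

# Drop 'track' and 'browser' header lines explicitly, then parse the rest.
with open('so58178958.bed') as f:
    data_lines = [ln for ln in f if not ln.startswith(('track', 'browser'))]

df = pd.read_csv(io.StringIO(''.join(data_lines)), sep='\t', header=None)
df.columns = header[:len(df.columns)]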
If you use pyranges instead, the resulting DataFrame gets proper column names and appropriate data types:
import pyranges as pr
df = pr.read_bed("your.bed", as_df=True)
It also has readers for untidy bioinformatics formats such as GTF and GFF3.

Pandas- Groupby Plot is not working for object

I am new to pandas and am doing some analysis on a CSV file. I have read the CSV successfully and can display all of its details. Two of the columns I need to plot, event_type and event_description, are of object type. I have done a groupby on those two columns and can get the first entry of each group, but I am not sure how to plot these object-type columns in pandas. Below is a sample of my groupby code using event_type and event_description. If I can plot Application and Network counts for event_type, that would be a great help.
import pandas as pd
data = pd.read_csv('/Users/temp/Downloads/sample.csv')
data.head()
grouped_df = data.groupby([ "event_type", "event_description"])
grouped_df.first()
As commented, more info is needed, but IIUC, try:
df['event_type'].value_counts(sort=True).plot(kind='barh')
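If you also want the breakdown of event_description within each event_type, one sketch building on the groupby you already have (an extrapolation, since the actual data isn't shown):
import matplotlib.pyplot as plt

# Count rows per (event_type, event_description) pair and plot the counts.
counts = data.groupby(["event_type", "event_description"]).size().unstack(fill_value=0)
counts.plot(kind="bar", stacked=True)
plt.tight_layout()
plt.show()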