Importing Data with read_csv into DF [duplicate] - pandas

This question already has answers here:
How to read a file with a semi colon separator in pandas
(2 answers)
Closed 1 year ago.
I have tried to import a CSV file via pandas, but df.head() shows the data in the wrong rows.
import numpy as np
import pandas as pd
df = pd.read_csv(r"C:\Users\micha\OneDrive\Dokumenty\ML\winequality-red.csv")
df.head()
Can you help me?

It seems your data is semicolon-separated rather than comma-separated. Try passing the sep parameter:
df = pd.read_csv(r"C:\Users\micha\OneDrive\Dokumenty\ML\winequality-red.csv", sep=';')
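To see the effect of the separator without the original file, here is a minimal sketch. The in-memory text and its column names are hypothetical stand-ins for the semicolon-delimited CSV; only read_csv and its sep parameter come from the answer above.

```python
import io
import pandas as pd

# Hypothetical stand-in for a semicolon-delimited file
raw = "fixed acidity;volatile acidity;quality\n7.4;0.7;5\n7.8;0.88;5\n"

# Default sep=',' finds no commas, so each line becomes a single column
wrong = pd.read_csv(io.StringIO(raw))

# sep=';' splits each line into its real fields
right = pd.read_csv(io.StringIO(raw), sep=";")

print(wrong.shape)  # one column
print(right.shape)  # three columns
```

If the shape shows a single column after reading, the delimiter is usually the first thing to check.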

Related

How to use pandas.concat instead of append [duplicate]

This question already has answers here:
How to replace pandas append with concat?
(3 answers)
Closed 4 months ago.
I have to import my Excel files and combine them into one file.
I used the code below and it worked, but I got the message "The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead".
I tried to use concat but it doesn't work. Please help.
import numpy as np
import pandas as pd
import glob
all_data = pd.DataFrame()
for f in glob.glob(r'path\*.xlsx'):
    df = pd.read_excel(f)
    all_data = all_data.append(df,ignore_index=True)
Use a list comprehension with pd.concat instead of a loop with DataFrame.append:
all_data = pd.concat([pd.read_excel(f) for f in glob.glob(r'path\*.xlsx')],
                     ignore_index=True)
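The same pattern can be sketched without any Excel files. The two small frames below are hypothetical stand-ins for the results of pd.read_excel(f); the point is to collect frames in a list and call pd.concat once, rather than growing a DataFrame inside the loop.

```python
import pandas as pd

# Hypothetical frames standing in for pd.read_excel(f) results
sources = [
    pd.DataFrame({"a": [1, 2], "b": [3, 4]}),
    pd.DataFrame({"a": [5], "b": [6]}),
]

frames = []
for df in sources:
    # Collect each frame; nothing is copied yet
    frames.append(df)

# One concatenation at the end; ignore_index renumbers rows 0..n-1
all_data = pd.concat(frames, ignore_index=True)
print(all_data)
```

Here frames.append is the ordinary Python list method, not the deprecated DataFrame.append.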

pandas read_csv creating incorrect columns

I am using pandas to read a csv file
import pandas as pd
data = pd.read_csv('file_name.csv', "ISO-8859-1")
Op:
col1,col2,col3
"sample1","sample2","sample3"
but the DataFrame ends up with just 1 column instead of the 5 it is supposed to have. I checked the CSV and it's fine.
Any suggestions on why this could be happening will be useful.
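One likely cause, offered as an assumption since the original file is not available: the second positional parameter of read_csv is sep, not encoding, so "ISO-8859-1" in the call above would be treated as the delimiter (recent pandas versions reject extra positional arguments altogether). A minimal sketch that passes the encoding by keyword, using in-memory bytes as a hypothetical stand-in for file_name.csv:

```python
import io
import pandas as pd

# Hypothetical stand-in for the bytes of file_name.csv
raw = 'col1,col2,col3\n"sample1","sample2","sample3"\n'.encode("ISO-8859-1")

# Pass the encoding by keyword so it is not mistaken for the separator
data = pd.read_csv(io.BytesIO(raw), encoding="ISO-8859-1")

print(data.columns.tolist())
print(data.shape)
```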

How do I convert this scikit-learn section to pandas dataframe? [duplicate]

This question already has answers here:
How to convert a Scikit-learn dataset to a Pandas dataset
(27 answers)
Closed 2 years ago.
I am trying to convert this Python code section to pandas dataframe:
iris = datasets.load_iris()
x = iris.data
y = iris.target
I would like to import the Iris data on my local machine instead of loading it from the scikit-learn library. Your kind suggestions would be highly appreciated.
Use from_records and concat to create a DataFrame, then rename the columns:
df = pd.concat([pd.DataFrame.from_records(x),pd.DataFrame(y)],axis=1)
df.columns = ['col1','col2','col3','col4','target']
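An alternative sketch, not the answerer's method: build the frame directly from the feature array and attach the target as a column. The small arrays below are hypothetical stand-ins for iris.data and iris.target so the example runs without scikit-learn; the column names follow the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for iris.data (n x 4) and iris.target (n,)
x = np.array([[5.1, 3.5, 1.4, 0.2],
              [7.0, 3.2, 4.7, 1.4]])
y = np.array([0, 1])

# Name the feature columns up front, then add the target column
df = pd.DataFrame(x, columns=["col1", "col2", "col3", "col4"])
df["target"] = y

print(df)
```

From here, df.to_csv could write the data to a local file that read_csv loads later.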

Jupyter Notebook Truncates Python Output [duplicate]

This question already has answers here:
How can I display full (non-truncated) dataframe information in HTML when converting from Pandas dataframe to HTML?
(10 answers)
Closed 3 years ago.
import pandas as pd
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', -1)
data = pd.read_csv(...)
data.columns
Given the code above I am expecting to see a complete list of the 668 columns in this data set. Instead the output is truncated like this:
Index(['VIN_SIGNI_PATTRN_MASK', 'NCI_MAK_ABBR_CD', 'MDL_YR', 'VEH_TYP_CD',
'VEH_TYP_DESC', 'MAK_NM', 'MDL_DESC', 'TRIM_DESC', 'OPT1_TRIM_DESC',
'OPT2_TRIM_DESC',
...
'EPA_SMART_WAY_DESC', 'MA_COLL_SYMB', 'MA_COMP_SYMB', 'MA_BASE_SYMB',
'MA_VSR_SYMB', 'MA_PERFORMANCE_IND', 'MA_ROLL_IND', 'PROACTIVE_IND',
'MAK_CD', 'MDL_CD'],
dtype='object', length=668)
Why can't I see all 668 columns?
Because those options change how pandas pretty-prints a DataFrame; they do not affect how the Index returned by data.columns is printed.
For example, display.max_rows and display.max_columns set the maximum number of rows and columns displayed when a frame is pretty-printed. Truncated lines are replaced by an ellipsis.
https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html#frequently-used-options
Instead, just call list(data.columns).
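A minimal sketch of the list() approach, using a hypothetical empty frame with 668 generated column names in place of the real data set. It also shows the display.max_seq_items option, which is not mentioned in the answer above but, as far as I know, is the pandas setting that limits how many items of a long sequence such as an Index are printed.

```python
import pandas as pd

# Hypothetical frame with 668 columns named c0 .. c667
df = pd.DataFrame(columns=[f"c{i}" for i in range(668)])

# A plain Python list is printed in full, with no ellipsis
all_names = list(df.columns)
print(len(all_names))

# Alternatively, lift the sequence limit while printing the Index itself
with pd.option_context("display.max_seq_items", None):
    full_repr = repr(df.columns)
print(full_repr)
```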
Your solution works for me... (can scroll to last column)
import pandas as pd
import numpy as np
print(pd.__version__)
pd.set_option('display.max_columns', None)
df = pd.DataFrame(np.random.rand(10, 668))
df

Beckhoff TwinCat Scope CSV Format into pandas dataframe

After recording data in Beckhoff TwinCAT Scope, one can export this data to a CSV file. Said CSV file, however, has a rather complicated format. Can anyone suggest the most effective way to import such a file into a pandas DataFrame so I can perform analysis?
An example of the format can be found here:
https://infosys.beckhoff.com/english.php?content=../content/1033/tcscope2/html/TwinCATScopeView2_Tutorial_SaveExport.htm&id=
No need to write a custom parser. Using the example data scope_data.csv:
Name,fasd,,,,
File,C;\,,,,
Start,dfsd,,,,
,,,,,
,,,,,
Name,Peak,Name,PULS1,Name,SINUS_FAST
Net id,123.123.123,Net id,123.123.124,Net Id,123.123.125
Port,801,Port,801,Port,801
,,,,,
0,0.6113936598,0,0.07994111349,0,0.08425652468
0,0.524852539,0,0.2051963401,0,0.4391185847
0,0.4993723482,0,0.2917317117,0,0.4583736263
0,0.5976553194,0,0.8675482865,0,0.8435987898
0,0.06087224998,0,0.7933980583,0,0.5614294705
0,0.1967968423,0,0.3923966599,0,0.1951608414
0,0.9723649064,0,0.5187276782,0,0.7646786192
You can import as follows:
import pandas as pd
scope_data = pd.read_csv(
    "scope_data.csv",
    skiprows=[*range(5), *range(6, 9)],
    usecols=[*range(1, 6, 2)]
)
Then you get
>>> scope_data.head()
       Peak     PULS1  SINUS_FAST
0  0.611394  0.079941    0.084257
1  0.524853  0.205196    0.439119
2  0.499372  0.291732    0.458374
3  0.597655  0.867548    0.843599
4  0.060872  0.793398    0.561429
I don't have the original scope CSV, but a little adjustment of skiprows and usecols should give you the desired result.
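The call above can be tried without a file on disk. This sketch feeds the first lines of the example data through io.StringIO (an assumption made here for self-containment) with the same skiprows and usecols values: rows 0-4 and 6-8 are skipped so that row 5 becomes the header, and the odd-numbered columns hold the signal values.

```python
import io
import pandas as pd

# First lines of the example scope_data.csv shown above
raw = r"""Name,fasd,,,,
File,C;\,,,,
Start,dfsd,,,,
,,,,,
,,,,,
Name,Peak,Name,PULS1,Name,SINUS_FAST
Net id,123.123.123,Net id,123.123.124,Net Id,123.123.125
Port,801,Port,801,Port,801
,,,,,
0,0.6113936598,0,0.07994111349,0,0.08425652468
0,0.524852539,0,0.2051963401,0,0.4391185847
"""

scope_data = pd.read_csv(
    io.StringIO(raw),
    skiprows=[*range(5), *range(6, 9)],  # keep row 5 as the header
    usecols=[*range(1, 6, 2)],           # columns 1, 3, 5 hold the values
)
print(scope_data)
```

For a real export, the row and column positions would need to be adjusted to match the file's actual header block.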
To read the bulk of the file (ignoring the header material) use the skiprows keyword argument to read_csv:
import pandas as pd
df = pd.read_csv('data.csv', skiprows=18)
For the header material, I think you'd have to write a custom parser.