How do I convert this scikit-learn section to pandas dataframe? [duplicate] - pandas

This question already has answers here:
How to convert a Scikit-learn dataset to a Pandas dataset
(27 answers)
Closed 2 years ago.
I am trying to convert this Python code section to pandas dataframe:
from sklearn import datasets
iris = datasets.load_iris()
x = iris.data
y = iris.target
I would like to import the Iris data on my local machine instead of loading it from the Scikit-learn library. Your kind suggestions would be highly appreciated.

Use from_records and concat to create a DataFrame, then rename the columns:
df = pd.concat([pd.DataFrame.from_records(x),pd.DataFrame(y)],axis=1)
df.columns = ['col1','col2','col3','col4','target']
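An alternative sketch, in case it helps: build the frame directly from the feature array, attach the target as a column, and write it to a CSV file that can be read back later without Scikit-learn. The three sample rows and the column names below are stand-ins chosen for illustration (they are assumptions, not part of the question); with the real dataset, x and y would be iris.data and iris.target.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Stand-ins for iris.data and iris.target (assumed sample values).
x = np.array([[5.1, 3.5, 1.4, 0.2],
              [7.0, 3.2, 4.7, 1.4],
              [6.3, 3.3, 6.0, 2.5]])
y = np.array([0, 1, 2])

# Hypothetical column names used only for this sketch.
feature_names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# Build the DataFrame from the features, then add the target column.
df = pd.DataFrame(x, columns=feature_names)
df["target"] = y

# Save locally and load it back with plain pandas.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "iris_sample.csv")
    df.to_csv(path, index=False)
    loaded = pd.read_csv(path)

print(loaded)
```

Once the CSV exists on disk, later sessions only need pd.read_csv on that path.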

Related

How to use pandas.concat instead of append [duplicate]

This question already has answers here:
How to replace pandas append with concat?
(3 answers)
Closed 4 months ago.
I have to import my Excel files and combine them into one file.
I used the code below and it worked, but I got the warning "The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead".
I tried to use concat but it doesn't work. Please help.
import numpy as np
import pandas as pd
import glob
all_data = pd.DataFrame()
for f in glob.glob(r'path\*.xlsx'):
    df = pd.read_excel(f)
    all_data = all_data.append(df,ignore_index=True)
Use a list comprehension instead of a loop with DataFrame.append:
all_data = pd.concat([pd.read_excel(f) for f in glob.glob(r'path\*.xlsx')],
                     ignore_index=True)
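If you prefer to keep the explicit loop, a minimal sketch of the same idea is to collect the frames in a list and call pd.concat once at the end. The example below writes two small CSV files into a temporary folder so it runs without an Excel engine; the file names and values are made up for illustration, and with real workbooks you would swap pd.read_csv for pd.read_excel and the pattern for '*.xlsx'.

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as folder:
    # Create two sample input files (assumed data, CSV instead of Excel).
    pd.DataFrame({"id": [1, 2], "value": [10, 20]}).to_csv(
        os.path.join(folder, "a.csv"), index=False)
    pd.DataFrame({"id": [3], "value": [30]}).to_csv(
        os.path.join(folder, "b.csv"), index=False)

    # Collect one DataFrame per file instead of appending inside the loop.
    frames = []
    for f in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        frames.append(pd.read_csv(f))

# Concatenate once; ignore_index=True renumbers the rows 0..n-1.
all_data = pd.concat(frames, ignore_index=True)
print(all_data)
```

Concatenating once also avoids copying the growing frame on every iteration.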

Importing Data with read_csv into DF [duplicate]

This question already has answers here:
How to read a file with a semi colon separator in pandas
(2 answers)
Closed 1 year ago.
I have tried to import a CSV file via pandas, but df.head() shows the data in the wrong rows.
import numpy as np
import pandas as pd
df = pd.read_csv(r"C:\Users\micha\OneDrive\Dokumenty\ML\winequality-red.csv")
df.head()
Can you help me?
It seems your data is not comma-separated but semicolon-separated. Try adding the separator parameter:
df = pd.read_csv(r"C:\Users\micha\OneDrive\Dokumenty\ML\winequality-red.csv", sep=';')
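To see the effect without the original file, here is a small sketch that parses an in-memory string. The three columns and two rows are invented sample data, not the contents of winequality-red.csv.

```python
import io

import pandas as pd

# Made-up semicolon-separated sample (assumed data for illustration).
raw = (
    "fixed acidity;volatile acidity;quality\n"
    "7.4;0.7;5\n"
    "7.8;0.88;5\n"
)

# Default separator is a comma, so each line ends up in a single column.
wrong = pd.read_csv(io.StringIO(raw))

# Passing sep=';' splits the fields as intended.
right = pd.read_csv(io.StringIO(raw), sep=";")

print(wrong.shape)
print(right.shape)
print(right.head())
```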

How to view full data when using a DataFrame in pandas in Jupyter Notebook? [duplicate]

This question already has answers here:
pandas pd.options.display.max_rows not working as expected
(2 answers)
Pandas: Setting no. of max rows
(10 answers)
Closed 1 year ago.
Rather than getting the truncated (...) view when opening a DataFrame, how can I see all the values in a Jupyter notebook?
You can set "display.max_rows" option in pandas: pd.set_option('display.max_rows', None) or pd.set_option('display.max_rows', LARGE_NUMBER)
To set it temporarily use a context manager:
with pd.option_context('display.max_rows', None):
    print(df)
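A short sketch of how the context manager behaves, assuming a made-up 100-row frame: the option is changed only inside the with block and is restored afterwards.

```python
import pandas as pd

# Sample frame with more rows than are shown by default (assumed data).
df = pd.DataFrame({"value": range(100)})

default_max_rows = pd.get_option("display.max_rows")

with pd.option_context("display.max_rows", None):
    inside = pd.get_option("display.max_rows")  # None means no row limit
    full_text = repr(df)                        # every row is rendered

# Outside the block the previous setting is back in place.
after = pd.get_option("display.max_rows")

print(inside, after)
print(len(full_text.splitlines()))
```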

Divide Pandas Dataframe by Series - why does it work? [duplicate]

This question already has answers here:
How do I operate on a DataFrame with a Series for every column?
(3 answers)
What does the term "broadcasting" mean in Pandas documentation?
(2 answers)
Closed 2 years ago.
I came across something while looking at how to normalize data on datacamp.com.
One of the exercises said that to normalize a dataframe df one should do this:
normalized_df = df / df.mean()
My question is: why does this work? Why does Pandas know here to divide columnwise?
Thanks
Edit: The post "Dataframe divide series on pandas" does not answer the question I am asking. It merely states what to do. I want to know why it works.
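A small sketch that may help with the "why": df.mean() returns a Series whose index holds the column names, and arithmetic between a DataFrame and a Series aligns the Series index with the DataFrame's columns, then applies the operation to every row. The frame below is invented for illustration.

```python
import pandas as pd

# Made-up sample data.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})

# A Series indexed by column name: a -> 2.0, b -> 20.0
col_means = df.mean()

# The operator matches Series labels to column labels.
normalized_df = df / col_means

# The same operation written explicitly.
explicit = df.div(col_means, axis="columns")

# Matching is by label, not position: a reordered Series gives the same values.
reordered = df / col_means[["b", "a"]]

print(normalized_df)
```

So the division is column-wise because the labels of df.mean() are the column names, not because pandas guesses the direction.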

Jupyter Notebook Truncates Python Output [duplicate]

This question already has answers here:
How can I display full (non-truncated) dataframe information in HTML when converting from Pandas dataframe to HTML?
(10 answers)
Closed 3 years ago.
import pandas as pd
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', -1)
data = pd.read_csv(...)
data.columns
Given the code above I am expecting to see a complete list of the 668 columns in this data set. Instead the output is truncated like this:
Index(['VIN_SIGNI_PATTRN_MASK', 'NCI_MAK_ABBR_CD', 'MDL_YR', 'VEH_TYP_CD',
'VEH_TYP_DESC', 'MAK_NM', 'MDL_DESC', 'TRIM_DESC', 'OPT1_TRIM_DESC',
'OPT2_TRIM_DESC',
...
'EPA_SMART_WAY_DESC', 'MA_COLL_SYMB', 'MA_COMP_SYMB', 'MA_BASE_SYMB',
'MA_VSR_SYMB', 'MA_PERFORMANCE_IND', 'MA_ROLL_IND', 'PROACTIVE_IND',
'MAK_CD', 'MDL_CD'],
dtype='object', length=668)
Why can't I see all 668 columns ?
Because those options change how pandas pretty-prints a DataFrame, not how the repr of an Index such as data.columns is truncated.
For example, display.max_rows and display.max_columns set the maximum number of rows and columns displayed when a frame is pretty-printed. Truncated lines are replaced by an ellipsis.
https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html#frequently-used-options
Instead of this, just do list(data.columns)
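A minimal sketch with a made-up empty frame of 668 columns (the column names are placeholders). It shows the list() approach and, as a further option, the pandas setting display.max_seq_items, which limits how many items of a long sequence such as an Index are printed.

```python
import pandas as pd

# Placeholder column names standing in for the real 668 columns.
df = pd.DataFrame(columns=[f"col_{i}" for i in range(668)])

# The Index repr is shortened: only the first and last few names appear.
short_repr = repr(df.columns)

# A plain Python list is not shortened.
all_names = list(df.columns)

# Lifting the sequence limit makes the Index repr show every name.
with pd.option_context("display.max_seq_items", None):
    full_repr = repr(df.columns)

print(short_repr)
print(len(all_names))
```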
Your solution works for me... (can scroll to last column)
import pandas as pd
import numpy as np
print(pd.__version__)
pd.set_option('display.max_columns', None)
df = pd.DataFrame(np.random.rand(10, 668))
df