Why am I getting a "name 'pd' is not defined" error? - pandas

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model
dataset = pd.read_csv('homeprice.csv')
print(dataset)
Output
NameError Traceback (most recent call last)
----> 1 dataset = pd.read_csv('homeprice.csv')
2 print(dataset)
NameError: name 'pd' is not defined

You mention that you are using a Jupyter notebook, so you may have two code cells.
The first contains the imports:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model
The second contains the functionality:
dataset = pd.read_csv('homeprice.csv')
print(dataset)
In Jupyter notebooks you can run each cell separately. If that is what you are doing, remember to run the first cell before you execute the second one for the first time. This makes sure that pd and the other imported names exist in the kernel's namespace when the second cell runs. The same applies after a kernel restart: all names are cleared, so the import cell has to be run again.
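The effect of running cells out of order can be reproduced outside a notebook. The sketch below is only an illustration, not how Jupyter is implemented: it treats a plain dict as the kernel's shared namespace and uses exec to "run" two cells. The names run_cell, import_cell and work_cell are made up for the example, and json stands in for pandas so that nothing needs to be installed.

```python
# Simulate a notebook kernel: every cell runs in one shared namespace.
kernel_namespace = {}

import_cell = "import json as js"               # stands in for "import pandas as pd"
work_cell = "result = js.dumps({'rooms': 3})"   # stands in for "pd.read_csv(...)"


def run_cell(source, namespace):
    """Run one 'cell' and report a NameError instead of raising it."""
    try:
        exec(source, namespace)
        return "ok"
    except NameError as err:
        return f"NameError: {err}"


# Running the work cell first fails, because the alias does not exist yet.
first_try = run_cell(work_cell, kernel_namespace)

# After the import cell has run, the same work cell succeeds.
run_cell(import_cell, kernel_namespace)
second_try = run_cell(work_cell, kernel_namespace)

print(first_try)
print(second_try)
```

The first attempt reports the same kind of NameError as in the question, and the second one succeeds without any change to the work cell itself.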

!pip install tensorflow-gpu==2.2.0.0rc2
import tensorflow as tf
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
Or just use the latest TensorFlow version that is already available in Colab.

Related

create line chart in pyspark jupyter notebook on emr

I am running PySpark on AWS EMR, in a Jupyter notebook hosted on JupyterHub on the cluster. I have read data into a Spark DataFrame named clusters_df, and I'm now trying to create a simple line chart with k on the x axis and score on the y axis. I tried converting it to a pandas DataFrame, since I don't think Spark has built-in data visualization. When I try to display the chart in the notebook, I get the messages below. I've also tried matplotlib. Both code examples are below, with the messages they return. Can anyone suggest how to create a line chart in a Jupyter notebook running PySpark on EMR?
libraries imported:
import pyspark
##### running on emr
## function to create all tables
from pyspark.sql.types import *
from pyspark.context import SparkContext
from pyspark.sql import Window
from pyspark.sql import SQLContext
from pyspark import SparkConf
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
from pyspark.sql.functions import first
import pyspark.sql.functions as func
from pyspark.sql.functions import lit,StringType,coalesce,lag,trim, upper, substring
from pyspark.sql.functions import UserDefinedFunction
from pyspark.sql.functions import round, explode,row_number,udf, length, min, when, format_number
from pyspark.sql.functions import hour, year, month, dayofmonth, date_add, to_date,datediff,dayofyear, weekofyear, date_format, unix_timestamp
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS
from pyspark.ml.feature import MinMaxScaler, PCA
from pyspark.ml.feature import VectorAssembler
from pyspark.ml import Pipeline
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType
from pyspark.ml.feature import StandardScaler
import traceback
import sys
import time
import math
import datetime
import numpy as np
import pandas as pd
Update: To clarify, the two code examples below are two separate attempts to create a line chart in a Jupyter notebook running Spark on EMR. Both fail to produce a chart.
The pandas example just returns the text shown. The matplotlib example returns the error shown, because Spark no longer seems to be recognized once the magic command that sets up matplotlib is run in the cell.
importing dataframe:
clusters_df=sqlContext.read.parquet("path")
code:
clusters_df.toPandas().plot.line(x="k",y="score");
output:
<AxesSubplot:xlabel='k'>
code:
%matplotlib inline
import matplotlib.pyplot as plt
pnds_df=clusters_df.toPandas()
plt.plot(pnds_df['k'],pnds_df['score'])
plt.show()
output:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-33-5e7649bc56fb> in <module>
3 import matplotlib.pyplot as plt
4
----> 5 pnds_df=clusters_df.toPandas()
6
7 plt.plot(pnds_df['k'],pnds_df['score'])
NameError: name 'clusters_df' is not defined

can not plot a graph using matplotlib showing error

Exception has occurred: ImportError
dlopen(/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/PIL/_imaging.cpython-39-darwin.so, 0x0002): symbol not found in flat namespace '_xcb_connect'
File "/Users/showrov/Desktop/Machine learning/Preprosessing/import_dataset.py", line 2, in <module>
import matplotlib.pyplot as plt
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import sys
print(sys.version)
data=pd.read_csv('Data_customer.csv')
print(data)
plt.plot(data[:2],data[:2])
data[:2] returns the first 2 rows, not columns. To plot, you need to select columns.
Refer to a column by name, like data['columnName'], or select it by position with the iloc indexer.
For example, data.iloc[:, 1:2] selects the 2nd column.
For more information about indexing operations, see the pandas documentation on indexing and selecting data.
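The difference between slicing rows and selecting columns can be sketched as follows. The DataFrame is a made-up stand-in for Data_customer.csv, whose real columns are not shown in the question, so the column names here are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for Data_customer.csv; the column names are made up.
data = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "age": [25, 32, 47, 51],
    "spend": [120.0, 80.5, 210.0, 95.25],
})

first_rows = data[:2]             # slices ROWS: a 2-row DataFrame with all columns
age_by_name = data["age"]         # one column, selected by name, as a Series
age_by_pos = data.iloc[:, 1]      # the same column, selected by position
age_as_frame = data.iloc[:, 1:2]  # still the 2nd column, kept as a one-column DataFrame

print(first_rows.shape)
print(age_by_pos.tolist())

# With matplotlib imported as plt, two columns could then be plotted with:
#   plt.plot(data["age"], data["spend"])
```

Passing two column selections to plt.plot gives one x series and one y series, which is what the original plt.plot(data[:2], data[:2]) call was missing.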

Need to run cell twice for the changed code to show output

I've run into an issue where I need to run the same cell twice after making a change. I've included a gif and the code.
In the gif I first change the seaborn style to darkgrid and run it, this should show the output as changed to the specified style on the first run, but I need to run it twice in order for the output to change.
Here is the code:
%matplotlib inline
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0,14,100)
for i in range(1,5):
    plt.plot(x,np.sin(x+i*0.5)*(7-i))
sns.set_style("white", {'axes.axisbelow': False})
plt.show()
I have tried separating the import lines to a previous cell but still the problem persists
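Set your style before you plot anything. The style only applies to axes created after sns.set_style is called, so in your code the new style does not show up until the next run. Move the sns.set_style line before the for loop and it should work.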
%matplotlib inline
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
sns.set_style("darkgrid", {'axes.axisbelow': False})
x = np.linspace(0,14,100)
for i in range(1,5):
    plt.plot(x,np.sin(x+i*0.5)*(7-i))
plt.show()

Getting an error : "OptionError: 'You can only set the value of existing options'" in python

I am getting an error:
"OptionError: 'You can only set the value of existing options'"
after using the below code, please can someone help?
### Data Analysis
import numpy as np
import pandas as pd
### Data Visualization
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objs as go
import plotly.express as px
from plotly.subplots import make_subplots
from scipy.interpolate import make_interp_spline, BSpline
%matplotlib inline
import warnings
warnings.simplefilter(action='ignore', category=Warning)
pd.set_option('display.max_columns', None)
pd.options.plotting.backend = "plotly"
The error "You can only set the value of existing options" is caused by the last line of your script,
pd.options.plotting.backend = "plotly"
where you are trying to change the pandas plotting backend.
This only works with pandas version >= 0.25 and plotly version >= 4.8, so please update both packages.
If you are working in a Jupyter notebook, restart your kernel after updating. To upgrade the packages:
pip install -U pandas
pip install -U plotly
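Before upgrading, it can help to check which versions the running kernel actually sees. The snippet below is a minimal sketch using only the standard library. The helper names release_tuple and check_backend_requirements are made up for this example, and the minimum versions are the ones quoted in the answer above.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions quoted in the answer above.
MINIMUMS = {"pandas": "0.25", "plotly": "4.8"}


def release_tuple(version_string):
    """Turn '1.5.3' or '2.0.0rc1' into a tuple of its leading integers."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if not match:
        return ()
    return tuple(int(part) for part in match.group().split("."))


def check_backend_requirements(minimums=MINIMUMS):
    """Report, per package, the installed version and whether it is new enough."""
    report = {}
    for package, minimum in minimums.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            report[package] = "not installed"
            continue
        new_enough = release_tuple(installed) >= release_tuple(minimum)
        status = "ok" if new_enough else f"too old, need >= {minimum}"
        report[package] = f"{installed} ({status})"
    return report


print(check_backend_requirements())
```

If either package is reported as too old or not installed, upgrade it and restart the kernel before setting pd.options.plotting.backend again.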

Successfully installed SciPy, but "from scipy.misc import imread" gives ImportError: cannot import name 'imread'

I have successfully installed scipy, numpy, and pillow, however I get error as below
ImportError: cannot import name 'imread'
Are you following the same steps?
import scipy.misc
img = scipy.misc.imread('my_image_path')
# To verify image is read properly.
import matplotlib.pyplot as plt
print(img.shape)
plt.imshow(img)
plt.show()
imread and imsave were deprecated in scipy.misc and are no longer available in newer SciPy releases, which is why the import fails.
Use imageio.imread instead, after import imageio.
For saving, use imageio.imsave or imageio.imwrite.
For resizing, use skimage.transform.resize instead, after import skimage.