I need to download the results of a for loop on Google Colab to a csv file, but I haven't been able to do it.
This is my for loop:
for num in range(1, 101):
    if (num % 2 == 0 and num % 6 != 0) or (num % 3 == 0 and num % 6 != 0):
        list = print(num)
The Notebook is called AHW1.ipynb
I tried:
from google.colab import files
files.download("AHW1.csv")
What can I do to download the results of this for loop as a csv file?
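One possible approach, sketched under a few assumptions: print() returns None, so `list = print(num)` never stores the numbers, and files.download() can only send a file that already exists on the Colab machine. The sketch below collects the matching numbers in a list, writes them to AHW1.csv with the standard csv module, and only then asks Colab to download the file. The helper names collect_results and write_csv and the "num" column header are made up for illustration.

```python
import csv


def collect_results(start=1, stop=101):
    """Return the numbers the loop in the question would print."""
    results = []
    for num in range(start, stop):
        if (num % 2 == 0 and num % 6 != 0) or (num % 3 == 0 and num % 6 != 0):
            results.append(num)
    return results


def write_csv(values, path="AHW1.csv"):
    """Write one value per row under a header row (hypothetical column name)."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["num"])
        writer.writerows([value] for value in values)
    return path


results = collect_results()
csv_path = write_csv(results)

try:
    # Only available inside Google Colab.
    from google.colab import files
    files.download(csv_path)
except ImportError:
    print(f"Not running in Colab; wrote {csv_path} locally instead")
```

The file name passed to files.download() has to match the file that was actually written; the notebook name (AHW1.ipynb) plays no role here.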
# data analysis libraries
import numpy as np
import pandas as pd
# visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
# ignore warnings
import warnings
warnings.filterwarnings('ignore')
I am running PySpark on AWS EMR, in a Jupyter notebook served by JupyterHub on the cluster. I have read data into a Spark DataFrame named clusters_df, and I'm now trying to create a simple line chart with k on the x axis and score on the y axis. I tried converting it to a pandas DataFrame, since I don't think Spark has built-in data visualization. When I try to display the chart in the notebook, I get the messages below. I've also tried matplotlib. Both code examples are below, with the messages they return. Can anyone suggest how to create a line chart in a Jupyter notebook running PySpark on EMR?
libraries imported:
import pyspark
##### running on emr
## function to create all tables
from pyspark.sql.types import *
from pyspark.context import SparkContext
from pyspark.sql import Window
from pyspark.sql import SQLContext
from pyspark import SparkConf
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
from pyspark.sql.functions import first
import pyspark.sql.functions as func
from pyspark.sql.functions import lit,StringType,coalesce,lag,trim, upper, substring
from pyspark.sql.functions import UserDefinedFunction
from pyspark.sql.functions import round, explode,row_number,udf, length, min, when, format_number
from pyspark.sql.functions import hour, year, month, dayofmonth, date_add, to_date,datediff,dayofyear, weekofyear, date_format, unix_timestamp
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS
from pyspark.ml.feature import MinMaxScaler, PCA
from pyspark.ml.feature import VectorAssembler
from pyspark.ml import Pipeline
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType
from pyspark.ml.feature import StandardScaler
import traceback
import sys
import time
import math
import datetime
import numpy as np
import pandas as pd
Update: To clarify, the two code examples below are two attempts to create a line chart in a Jupyter notebook running Spark on EMR, and both fail to produce a chart.
The pandas example just returns the text shown. The matplotlib example returns the error shown, because the notebook no longer seems to recognize the Spark variables once the magic command that enables matplotlib is run in the cell.
importing dataframe:
clusters_df=sqlContext.read.parquet("path")
code:
clusters_df.toPandas().plot.line(x="k",y="score");
output:
<AxesSubplot:xlabel='k'>
code:
%matplotlib inline
import matplotlib.pyplot as plt
pnds_df=clusters_df.toPandas()
plt.plot(pnds_df['k'],pnds_df['score'])
plt.show()
output:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-33-5e7649bc56fb> in <module>
3 import matplotlib.pyplot as plt
4
----> 5 pnds_df=clusters_df.toPandas()
6
7 plt.plot(pnds_df['k'],pnds_df['score'])
NameError: name 'clusters_df' is not defined
I am getting an error:
"OptionError: 'You can only set the value of existing options'"
after running the code below. Can someone please help?
### Data Analysis
import numpy as np
import pandas as pd
### Data Visualization
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objs as go
import plotly.express as px
from plotly.subplots import make_subplots
from scipy.interpolate import make_interp_spline, BSpline
%matplotlib inline
import warnings
warnings.simplefilter(action='ignore', category=Warning)
pd.set_option('display.max_columns', None)
pd.options.plotting.backend = "plotly"
The OptionError "You can only set the value of existing options" is raised by the last line of your script,
pd.options.plotting.backend = "plotly"
where you try to change the pandas plotting backend.
Please update your pandas and plotly packages: this setting only works with pandas version >= 0.25 and plotly version >= 4.8.
So update both packages and, if you are working in a Jupyter notebook, restart your kernel. To upgrade the packages:
pip install -U pandas
pip install -U plotly
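Before upgrading, it may help to check which versions the kernel actually sees. The following is a minimal sketch using only the standard library; the helper names version_tuple, meets_minimum, and check_package are hypothetical, and the version parsing is deliberately simple (it ignores suffixes such as "rc1").

```python
from importlib import metadata


def version_tuple(version):
    """Turn '1.5.3' into (1, 5, 3), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings after padding them to equal length."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_package(name, minimum):
    """Report whether an installed package satisfies the minimum version."""
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return f"{name}: not installed"
    if meets_minimum(installed, minimum):
        return f"{name} {installed}: ok"
    return f"{name} {installed}: older than {minimum}, upgrade needed"


# Minimum versions taken from the answer above.
for package, minimum in (("pandas", "0.25"), ("plotly", "4.8")):
    print(check_package(package, minimum))
```

If either line reports an older version after the upgrade, the kernel is probably still running with the previously loaded packages and needs a restart.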
I'm trying to import a file into Colab. I've tried various approaches from https://buomsoo-kim.github.io/colab/2018/04/15/Colab-Importing-CSV-and-JSON-files-in-Google-Colab.md/
#import packages
import pandas as pd
pd.plotting.register_matplotlib_converters()
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import io
print("Setup Complete")
from google.colab import files
uploaded = files.upload()
# Read the file into a variable power_data
#power_data = pd.read("DE_power prices historical.csv")
data = pd.read_csv('DE_power prices historical.csv', error_bad_lines=False)
I keep getting an error (shown in a screenshot that is not reproduced here).
Try this method instead; it is a bit easier:
Upload the .csv files to your Google Drive.
Run the following code in a Colab cell:
from google.colab import drive
drive.mount('/content/drive')
Follow the link shown in the output cell and authorize your Google account.
Then import the file using pandas, for example:
power_data = pd.read_csv('/content/drive/My Drive/*filename.csv*')
Mount Google Drive in Google Colab:
from google.colab import drive
drive.mount('/content/drive')
Copy the file path and assign it to the url variable:
import pandas as pd
url = 'paste the copied path of your CSV file here'
df=pd.read_csv(url)
df.head()
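For the files.upload() route used in the question, the returned value is a dictionary keyed by file name, with the uploaded bytes as values. The sketch below shows one way to parse such bytes directly instead of relying on a path. It uses the standard csv module so that it runs outside Colab; the `uploaded` dictionary, its column names, and its values are made-up stand-ins, and rows_from_upload is a hypothetical helper.

```python
import csv
import io


def rows_from_upload(uploaded, encoding="utf-8"):
    """Parse the first uploaded file's bytes as CSV; return (file name, rows)."""
    name, content = next(iter(uploaded.items()))
    reader = csv.DictReader(io.StringIO(content.decode(encoding)))
    return name, list(reader)


# Stand-in for the dict that files.upload() returns in Colab.
# The columns and values are invented purely for illustration.
uploaded = {
    "DE_power prices historical.csv": b"date,price\n2020-01-01,35.2\n2020-01-02,33.9\n",
}

name, rows = rows_from_upload(uploaded)
print(name)
print(rows[0])
```

In the notebook itself, the same bytes could be handed to pandas with pd.read_csv(io.BytesIO(uploaded[name])), which avoids any mismatch between the file name in the code and the name of the file that was uploaded.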
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model
dataset = pd.read_csv('homeprice.csv')
print(dataset)
Output
NameError Traceback (most recent call
last) in
----> 1 dataset = pd.read_csv('homeprice.csv')
2 print(dataset)
NameError: name 'pd' is not defined
You mention that you are using a Jupyter notebook, so you may have two code cells:
The first contains the imports:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model
And the second is with the functionality:
dataset = pd.read_csv('homeprice.csv')
print(dataset)
In Jupyter notebooks you can run each cell separately. If that is what you are doing, remember to run the first cell before you execute the second one for the first time. This makes sure the imports are available in the kernel when your second cell runs.
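The effect of cell order can be imitated outside a notebook. In this rough sketch, a plain dictionary stands in for the kernel's shared namespace and two strings stand in for the cells; the standard math module replaces pandas so that the example runs anywhere. The names first_cell, second_cell, and kernel_namespace are illustrative only.

```python
# A dict standing in for the namespace that all notebook cells share.
kernel_namespace = {}

first_cell = "import math"
second_cell = "result = math.sqrt(16)"

# Running the second cell first fails, just like the NameError above.
try:
    exec(second_cell, kernel_namespace)
except NameError as err:
    error_message = str(err)
    print("Second cell alone:", error_message)

# Running the import cell first makes the name available to later cells.
exec(first_cell, kernel_namespace)
exec(second_cell, kernel_namespace)
print("After running the import cell:", kernel_namespace["result"])
```

The same applies after a kernel restart: the namespace is emptied, so the import cell has to be run again.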
!pip install tensorflow-gpu==2.2.0.0rc2
import tensorflow as tf
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
Or just use the latest TensorFlow version that ships with Colab.
Using Python can I open a text file, read it into an array, then save the file as a NetCDF?
The following script I wrote was not successful.
import os
import pandas as pd
import numpy as np
import PIL.Image as im
# Raw string, so that backslashes such as \t are not read as escape sequences
path = r'C:\path\to\data'
grb = []
for fn in os.listdir(path):
    file = os.path.join(path, fn)
    if os.path.isfile(file):
        df = pd.read_table(file, skiprows=6)
        grb.append(df)
# pd.np was removed in pandas 2.0; call NumPy directly
df2 = np.array(grb)
#imarray = im.fromarray(df2) ##cannot handle this data type
#imarray.save('Save_Array_as_TIFF.tif')
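A side note on Windows path literals such as the one in the script: in an ordinary Python string, a backslash followed by certain letters is an escape sequence (for example, \t is a TAB), so a raw string or pathlib is usually safer. The short sketch below illustrates the difference; the variable names are illustrative only.

```python
from pathlib import PureWindowsPath

plain = "C:\to"   # "\t" is read as a single TAB character
raw = r"C:\to"    # raw string: the backslash stays a backslash

print(len(plain), len(raw))
print("\t" in plain, "\t" in raw)

# PureWindowsPath parses Windows-style paths on any operating system.
data_dir = PureWindowsPath(r"C:\path\to\data")
print(data_dir.parts)
```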
I once used xray, since renamed xarray, to turn a NetCDF file into an ASCII dataframe. I just googled, and apparently it has a to_netcdf function.
Import xarray; it lets you work with labeled data much as you would in pandas. Note that to_netcdf is a method of xarray objects, so a pandas DataFrame has to be converted first.
So give this a try:
df.to_xarray().to_netcdf(file_path)