I have imported
import numpy as np
and I have used
xy = np.loadtxt('./Desktop/wine.csv', delimiter=',', dtype=np.float32, skiprows=1)
but Python 3 is not able to read the file and I really do not know why. Can anyone help me, please?
Can you try again by specifying the full file path?
/home/username/Desktop/wine.cvs
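If it still fails, here is a minimal sketch of loading with an absolute path, assuming the file really is a CSV at that location (note that the path above ends in .cvs while the code loads wine.csv, which is worth double-checking):
import numpy as np

# Hypothetical absolute path; adjust it to wherever the file actually lives.
xy = np.loadtxt('/home/username/Desktop/wine.csv',
                delimiter=',', dtype=np.float32, skiprows=1)
print(xy.shape)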
I am getting an error when trying to read a VCF file using the scikit-allel library inside a Docker image running Ubuntu 18.04. It shows:
raise RuntimeError('VCF file is missing mandatory header line ("#CHROM...")')
RuntimeError: VCF file is missing mandatory header line ("#CHROM...")
But the VCF file is well-formatted.
Here is the code I used:
import pandas as pd
import os
import numpy as np
import allel
import tkinter as tk
from tkinter import filedialog
import matplotlib.pyplot as plt
from scipy.stats import norm
GenomeVariantsInput = allel.read_vcf('quartet_variants_annotated.vcf',
                                     samples=['ISDBM322015'],
                                     fields=['variants/CHROM', 'variants/ID',
                                             'variants/REF', 'variants/ALT',
                                             'calldata/GT'])
Installed versions:
Python 3.6.9
Numpy 1.19.5
pandas 1.1.5
scikit-allel 1.3.5
You need to add a header line like this at the top of the file:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
but it is not the same for every file; you have to build a header like the one above that matches your file. (I suggest trying this header first and customizing it if you still get an error.)
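For example, here is a minimal sketch that inserts such a header just before the first data line; the sample column ISDBM322015 is taken from the question, and the output filename is only illustrative:
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tISDBM322015\n"

with open('quartet_variants_annotated.vcf') as src, \
     open('quartet_variants_annotated_fixed.vcf', 'w') as dst:
    inserted = False
    for line in src:
        # Write the header right before the first non-"#" (data) line.
        if not inserted and not line.startswith('#'):
            dst.write(header)
            inserted = True
        dst.write(line)
After that, pointing allel.read_vcf at the fixed file should get past the header check.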
I am trying to write my own optimizer for pytorch and am looking at the source code https://pytorch.org/docs/stable/_modules/torch/optim/sgd.html#SGD
to get started. When I try to run the code for the SGD, I get an error on the line
from .optimizer import Optimizer, required. I've searched everywhere but I'm not sure where to obtain the .optimizer package. Any help here is greatly appreciated.
Here from .optimizer import ... is a relative import: it loads optimizer.py from the same directory as the current .py file, i.e. from inside the torch/optim package. If you copy sgd.py outside that package, the relative import no longer resolves; import from torch.optim.optimizer instead.
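For example, here is a minimal sketch of a custom optimizer that uses the absolute import instead; the class name MySGD and the plain gradient-step update are just illustrative:
import torch
from torch.optim.optimizer import Optimizer  # absolute import works outside torch/optim

class MySGD(Optimizer):
    def __init__(self, params, lr=0.01):
        defaults = dict(lr=lr)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        # Plain gradient descent step: p <- p - lr * grad
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is not None:
                    p.add_(p.grad, alpha=-group['lr'])
The required sentinel used in the original sgd.py should be importable the same way, from torch.optim.optimizer.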
I don't know why, but my file located at "FileStore/tables/train.csv" is not readable using pandas on the Databricks platform. I tried:
pd.read_csv("/dbfs/FileStore/tables/train.csv")
and got
FileNotFoundError: [Errno 2] File /dbfs/FileStore/tables/train.csv does not exist: '/dbfs/FileStore/tables/train.csv'
Sorry for the late reply.
You can read the Microsoft docs HERE.
They will help you copy the file uploaded to "FileStore" to a "Tmp" location.
After that, code like the following should work:
import pandas as pd
pd.read_csv(path)
and
import pyspark.pandas as ps
ps.read_csv(path)
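For example, here is a minimal sketch of the copy-then-read approach, assuming the file was uploaded to FileStore and the code runs in a Databricks notebook (where dbutils is available); the /tmp filename is only illustrative:
# Copy from DBFS FileStore to the driver's local /tmp so pandas can read it.
dbutils.fs.cp("dbfs:/FileStore/tables/train.csv", "file:/tmp/train.csv")

import pandas as pd
df = pd.read_csv("/tmp/train.csv")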
I'm writing Python code on Databricks to process some data and output graphs. I want to be able to save these graphs as a picture file (.png or something, the format doesn't really matter) to DBFS.
Code:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'fruits':['apple','banana'], 'count': [1,2]})
plt.close()
df.set_index('fruits',inplace = True)
df.plot.bar()
# plt.show()
Things that I tried:
plt.savefig("/FileStore/my-file.png")
[Errno 2] No such file or directory: '/FileStore/my-file.png'
fig = plt.gcf()
dbutils.fs.put("/dbfs/FileStore/my-file.png", fig)
TypeError: has the wrong type - (,) is expected.
After some research, I think dbutils.fs.put only works if you want to save text files.
Running the above code with plt.show() uncommented will get you a bar graph; I want to be able to save that bar graph as an image to DBFS. Any help is appreciated, thanks in advance!
An easier way, using just matplotlib.pyplot: fix the DBFS path (write through the /dbfs mount):
Example
import matplotlib.pyplot as plt
plt.scatter(x=[1,2,3], y=[2,4,3])
plt.savefig('/dbfs/FileStore/figure.png')
You can do this by saving the figure to memory and then using the Python local file APIs to write to the Databricks filesystem (DBFS).
Example:
import matplotlib.pyplot as plt
from io import BytesIO
# Create a plt or fig, then:
buf = BytesIO()
plt.savefig(buf, format='png')
path = '/dbfs/databricks/path/to/file.png'
# Make sure to open the file in bytes mode
with open(path, 'wb') as f:
    # You could also call buf.seek(0) and then write buf.read()
    f.write(buf.getvalue())
I am attempting to run the following in Jupyter:
import pandas as pd
import matplotlib.pyplot as plt # plotting
import numpy as np # dense matrices
from scipy.sparse import csr_matrix # sparse matrices
%matplotlib inline
However, when loading the dataset with
wiki = pd.read_csv('people_wiki.csv')
# add id column
wiki['id'] = range(0, len(wiki))
wiki.head(10)
the following error persists
NameError Traceback (most recent call last)
<ipython-input-1-56330c326580> in <module>()
----> 1 wiki = pd.read_csv('people_wiki.csv')
2 # add id column
3 wiki['id'] = range(0, len(wiki))
4 wiki.head(10)
NameError: name 'pd' is not defined
Any suggestions appreciated
Select Restart & Clear Output and run the cells again from the beginning.
I had the same issue, and as Ivan suggested in the comment, this resolved it.
If you came here from a duplicate, notice also that your code needs to contain
import pandas as pd
in the first place. If you are using a notebook like Jupyter and it's already there, or if you just added it, you probably need to re-evaluate the cell, as suggested in the currently top-voted answer by martin-martin.
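For example, here is a minimal sketch of how the first cell could look, using the filename from the question:
import pandas as pd  # must run before any cell that references pd

wiki = pd.read_csv('people_wiki.csv')
wiki['id'] = range(0, len(wiki))  # add id column
wiki.head(10)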
Your Python version needs to be 3.6 or above; I think you have been using Python 2.7. Please select your Python environment version from the top right.
Be sure to load / import Pandas first
When stepping through the Anaconda Navigator demo, I found that pressing "play" on the first line before inputting the second line resolved the issue.