I get an error when I try to read a VCF file with the scikit-allel library inside a Docker image running Ubuntu 18.04. It shows:
raise RuntimeError('VCF file is missing mandatory header line ("#CHROM...")')
RuntimeError: VCF file is missing mandatory header line ("#CHROM...")
But the VCF file is well-formatted.
Here is the code I used:
import pandas as pd
import os
import numpy as np
import allel
import tkinter as tk
from tkinter import filedialog
import matplotlib.pyplot as plt
from scipy.stats import norm
GenomeVariantsInput = allel.read_vcf('quartet_variants_annotated.vcf', samples=['ISDBM322015'],fields=[ 'variants/CHROM', 'variants/ID', 'variants/REF',
'variants/ALT','calldata/GT'])
Installed versions:
Python 3.6.9
Numpy 1.19.5
pandas 1.1.5
scikit-allel 1.3.5
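You need to add a column header line like this one after the ## meta-information lines and before the first data record:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
The columns must be separated by tabs, not spaces. The header is not the same for every file: the names after FORMAT are the sample names, so you have to write them to match your own file (for example, the sample ISDBM322015 that you pass to read_vcf has to appear there). I suggest trying this header first and customizing it if you still get an error.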
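Before editing the file, it can help to confirm whether the "#CHROM" line is really missing or just malformed. Here is a minimal sketch using only the standard library; the function name find_chrom_header is made up for this example and is not part of scikit-allel:

```python
import gzip


def find_chrom_header(path):
    """Return (line_number, column_names) for the '#CHROM' line, or None if absent."""
    # VCF files are often gzip-compressed; pick the opener from the extension.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore blank lines
            if line.startswith("#CHROM"):
                # Header columns are expected to be tab-separated.
                return number, line.rstrip("\r\n").split("\t")
            if not line.startswith("#"):
                # Reached the first data record without seeing the header line.
                return None
    return None
```

If the function returns a single-element column list, the header line is there but is probably separated by spaces instead of tabs, which would likely cause parsing problems as well.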
I want to open the .xlsx file through read_excel().
However, an error message is printed even though the openpyxl and pandas packages are installed.
The pandas version is 0.24.2 and the openpyxl version is 3.0.10.
The error message is - ValueError: Unknown engine: openpyxl
import pandas as pd
import math
retail_df = pd.read_excel('./Online_Retail.xlsx',engine='openpyxl')
print(retail_df.head())
In pandas 0.24.2, read_excel() does not accept engine='openpyxl': the only reader engine in that version is xlrd, and openpyxl was added as a reader engine in pandas 0.25. Either drop the engine argument so that the default (xlrd, which must be installed) is used, or upgrade pandas.
The updated code for reading the Excel file is:
import pandas as pd
import math
retail_df = pd.read_excel('./Online_Retail.xlsx')
print(retail_df.head())
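If the same script has to run on both old and new pandas installations, one option is to choose the engine argument from the version string. This is only a sketch: excel_engine_for is a hypothetical helper, and it assumes that openpyxl is accepted as a reader engine from pandas 0.25 onward and that the version string starts with a numeric major.minor.

```python
def excel_engine_for(pandas_version):
    """Return the engine to pass to read_excel, or None to use pandas' default."""
    # Compare only the numeric major/minor parts, e.g. "0.24.2" -> (0, 24).
    major, minor = (int(part) for part in pandas_version.split(".")[:2])
    # Assumption: engine="openpyxl" is valid for reading from pandas 0.25 on.
    return "openpyxl" if (major, minor) >= (0, 25) else None
```

It would then be used as pd.read_excel('./Online_Retail.xlsx', engine=excel_engine_for(pd.__version__)).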
I saw that there is a way to read XML files directly with pandas, so I installed and used this package. However, I keep getting errors.
https://pypi.org/project/pandas-read-xml/
import pandas as pd
import pandas_read_xml as pdx
from pandas.io.json import json_normalize
The error is raised by the last line:
ImportError: cannot import name 'json_normalize'
I am using a Python 3 kernel. Can anyone tell me what is wrong?
import plotly.express as px
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
I imported these libraries and used %matplotlib inline. The code runs fine, but the terminal in VS Code shows the error:
SyntaxError: invalid syntax
I found an answer on how to get the effect of %matplotlib inline in VS Code:
# Suppress matplotlib user warnings
# Necessary for newer version of matplotlib
import warnings
warnings.filterwarnings("ignore", category = UserWarning, module = "matplotlib")
#
# Display inline matplotlib plots with IPython
from IPython import get_ipython
get_ipython().run_line_magic('matplotlib', 'inline')
%matplotlib inline is not valid Python syntax. It is a magic command that only has a meaning when the code runs in IPython/Jupyter.
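If the same file is sometimes run as a plain script and sometimes in a notebook, the magic can be applied only when an IPython shell is actually active. A minimal sketch, where enable_inline_plots is a name made up for this example:

```python
def enable_inline_plots():
    """Apply the 'matplotlib inline' magic if running under IPython; return True if applied."""
    try:
        from IPython import get_ipython
    except ImportError:
        return False  # IPython is not installed, so this is a plain Python run
    shell = get_ipython()
    if shell is None:
        return False  # IPython is installed but no interactive shell is active
    shell.run_line_magic("matplotlib", "inline")
    return True
```

Run from a terminal, this returns False instead of raising an error, and the rest of the script continues normally.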
Whenever I try to run the following code I get an error message
import matplotlib
import pandas as pd
import _pickle as pickle
import numpy as np
print(matplotlib.version)
AttributeError: module 'matplotlib' has no attribute 'version'
I get the same error if I try this: matplotlib.style.use('bmh')
I'm using PyCharm
The errors say that those attributes do not exist on the matplotlib module. This is not an installation problem.
To get the version:
print(matplotlib.__version__)
Regarding the style:
matplotlib.style is a submodule, and a plain import matplotlib does not necessarily load it. The usual route is through pyplot, which exposes it as plt.style:
import matplotlib.pyplot as plt
plt.style.use("bmh")
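As a side note, the installed version of a package can also be looked up from its metadata without importing it, which sidesteps the question of which attribute holds the version. A small sketch, assuming Python 3.8 or newer (for importlib.metadata); installed_version is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None
```

For example, installed_version("matplotlib") gives the version string if matplotlib is installed in the active interpreter, and None otherwise, which is also a quick way to check that PyCharm is using the environment you expect.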
I'm trying to import a file into Colab. I've tried various approaches from https://buomsoo-kim.github.io/colab/2018/04/15/Colab-Importing-CSV-and-JSON-files-in-Google-Colab.md/
#import packages
import pandas as pd
pd.plotting.register_matplotlib_converters()
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import io
print("Setup Complete")
from google.colab import files
uploaded = files.upload()
# Read the file into a variable power_data
#power_data = pd.read("DE_power prices historical.csv")
data = pd.read_csv('DE_power prices historical.csv', error_bad_lines=False)
I keep getting an error.
Try this method; it is a bit easier:
Upload the .csv files to your Google Drive.
Run the following code in a Colab cell:
from google.colab import drive
drive.mount('/content/drive')
Follow the link that the output cell gives you and authorize your Google account.
Import the file with pandas, replacing filename.csv with the name of your file:
power_data = pd.read_csv('/content/drive/My Drive/filename.csv')
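When read_csv fails after mounting, the cause is often just a path that does not match what is on the Drive. As a rough sketch, the path can be checked before it is handed to pandas. The helper resolve_drive_file is made up for this example, and the default drive_root is an assumption: depending on the Colab session, the folder may appear as "MyDrive" instead of "My Drive".

```python
from pathlib import Path


def resolve_drive_file(filename, drive_root="/content/drive/My Drive"):
    """Return the full path of a file on the mounted Drive, or raise a descriptive error."""
    path = Path(drive_root) / filename
    if not path.is_file():
        # Fail early with the exact path that was tried.
        raise FileNotFoundError(f"No such file on the mounted Drive: {path}")
    return path
```

The result can be passed straight to pandas, e.g. pd.read_csv(resolve_drive_file('filename.csv')).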
Mount Google Drive in Google Colab:
from google.colab import drive
drive.mount('/content/drive')
Copy the path of the file and assign it to the url variable:
import pandas as pd
url = 'paste the copied path to your CSV file here'
df=pd.read_csv(url)
df.head()