VCF file is missing mandatory header line ("#CHROM...") - pandas

I am getting an error when I am going to read a VCF file using scikit-allel library inside a docker image and os ubuntu 18.04. It shows that
raise RuntimeError('VCF file is missing mandatory header line ("#CHROM...")')
RuntimeError: VCF file is missing mandatory header line ("#CHROM...")
But in the VCF file is well-formatted.
Here is my code of how I applied :
import pandas as pd
import os
import numpy as np
import allel
import tkinter as tk
from tkinter import filedialog
import matplotlib.pyplot as plt
from scipy.stats import norm
GenomeVariantsInput = allel.read_vcf('quartet_variants_annotated.vcf', samples=['ISDBM322015'],fields=[ 'variants/CHROM', 'variants/ID', 'variants/REF',
'variants/ALT','calldata/GT'])
version what Installed :
Python 3.6.9
Numpy 1.19.5
pandas 1.1.5
scikit-allel 1.3.5

You need to add a line like this in the first:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
but it's not static for all of the files, you have to make a Header like above for your file. (I suggest try this header first and if it's got error then customize it)

Related

Cannot read .xlsx file with read_excel()

I want to open the .xlsx file through read_excel().
However, an error message is printed even though the openpyxl and pandas packages are installed.
The pandas version is 0.24.2 and the openpyxl version is 3.0.10.
The error message is - ValueError: Unknown engine: openpyxl
import pandas as pd
import math
retail_df = pd.read_excel('./Online_Retail.xlsx',engine='openpyxl')
print(retail_df.head())
In Pandas 0.24.2 the default engine is openpyxl and for that, you don't need to set it up manually during loading the excel file inside the read_excel() function.
So now your updated working code for reading excel files is :
import pandas as pd
import math
retail_df = pd.read_excel('./Online_Retail.xlsx')
print(retail_df.head())
Testing result from my side with this code.

pandas-read-xml has error on 'json-normalize'

I saw there is a way to directly read XML files using pandas. I followed and used this package. However, I keep getting errors.
https://pypi.org/project/pandas-read-xml/
import pandas as pd
import pandas_read_xml as pdx
from pandas.io.json import json_normalize
The error was generated by last line and the error is
ImportError: cannot import name 'json_normalize'
I am using kernel python 3, can anyone tell me what was wrong with it?

Code is running but vs code terminal shows syntax error

import plotly.express as px
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
I imported these libraries and used %matplotlib inline . The code runs fine but the terminal in vs code shows the error :
Syntax Error: invalid syntax
I found an answer to use %matplotlibinline in VS Code
# Suppress matplotlib user warnings
# Necessary for newer version of matplotlib
import warnings
warnings.filterwarnings("ignore", category = UserWarning, module = "matplotlib")
#
# Display inline matplotlib plots with IPython
from IPython import get_ipython
get_ipython().run_line_magic('matplotlib', 'inline')
%matplotlib inline is not a valid python command. This has a specific meaning when used in ipython/jupyter.

Can't call matplotlib attributes v3.2.2

Whenever I try to run the following code I get an error message
import matplotlib
import pandas as pd
import _pickle as pickle
import numpy as np
print(matplotlib.version)
AttributeError: module 'matplotlib' has no attribute 'version'
I get the same error if i try this: matplotlib.style.use('bmh')
I'm using PyCharm
The errors are stating those attributes do not exist. It is not a problem of the installation.
To get the version:
print(matplotlib.__version__)
Regarding the style:
You do not "apply" that in matplotlib. You may want to use the module pyplot:
import matplotlib.pyplot as plt
plt.style.use("bmh")

Reading csv in colab errors

I'm trying to import a file to c-lab. I've tried various versions https://buomsoo-kim.github.io/colab/2018/04/15/Colab-Importing-CSV-and-JSON-files-in-Google-Colab.md/
#import packages
import pandas as pd
pd.plotting.register_matplotlib_converters()
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import io
print("Setup Complete")
from google.colab import files
uploaded = files.upload()
# Read the file into a variable power_data
#power_data = pd.read("DE_power prices historical.csv")
data = pd.read_csv('DE_power prices historical.csv', error_bad_lines=False)
Keep getting error:
enter image description here
Try using this method it works a bit easier:
Upload .csv files to your Google Drive
Run the following code in your Colab cell:
from google.colab import drive
drive.mount('/content/drive')
Follow the link the output cell gives you and verify your Gmail account
Import using Pandas like:
power_data = pd.read_csv('/content/drive/My Drive/*filename.csv*')
Mount google drive in google-colab
from google.colab import drive
drive.mount('/content/drive')
copy file path add into URL variable
import pandas as pd
url = 'add copy path your csv file'
df=pd.read_csv(url)
df.head()