I am trying to write some info from a Tk form that I built to a new CSV file, but unfortunately I am getting some errors.
The code:
def sub_func():
    with open('Players.csv','w') as df:
        df = pd.DataFrame
        i=len(df.index)
        data=[]
        data.append(entry_box1.get())
        data.append(entry_box2.get())
        data.append(entry_box3.get())
        data.append(entry_box4.get())
        if i==4:
            df.loc[i,:]=data
The Errors:
Exception in Tkinter callback
Traceback (most recent call last):
File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\tkinter\_init.py", line 1883, in __call_
return self.func(*args)
File "C:/Users/user/PycharmProjects/test11/Final Project/registration form.py", line 23, in sub_func
i=len(df.index)
TypeError: object of type 'pandas._libs.properties.AxisProperty' has no len()
You have two problems in your code:
The file is opened as df, but df is overwritten on the next line.
df = pd.DataFrame does not create a dataframe; without the parentheses () it just assigns the DataFrame class itself. As a result, df.index is the class's AxisProperty descriptor, and the error comes when you try to take the length of that.
Solution:
df = pd.read_csv('./Players.csv') # Make sure that the file is in the same directory
i = len(df.index)
Not sure what you are trying to achieve, but you might need to work on the rest of the code, too.
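If the goal is just to append the four entry values as a new row in Players.csv, here is a rough sketch of how sub_func could look (assuming pandas is imported as pd and the entry boxes are the Tk widgets from the question; the column names are hypothetical):

import os
import pandas as pd

def sub_func():
    # Collect the four Tk entry values
    data = [entry_box1.get(), entry_box2.get(),
            entry_box3.get(), entry_box4.get()]
    columns = ['col1', 'col2', 'col3', 'col4']  # hypothetical column names
    # Load the existing file if it is there, otherwise start a new dataframe
    if os.path.exists('Players.csv'):
        df = pd.read_csv('Players.csv')
    else:
        df = pd.DataFrame(columns=columns)
    df.loc[len(df.index)] = data           # append the new row
    df.to_csv('Players.csv', index=False)  # write everything back out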
I'm working on a script that first takes all the .csv's and converts them to .xlsx's in a separate folder. I'm getting the first file to output exactly how I want in the 'Script files' folder, but then it throws a Traceback error before it does the second one.
Script code below, Traceback error below that. Some path data removed for privacy:
import pandas as pd
import matplotlib.pyplot as plt
import os
# Assign current directory and list files there
f_path = os.path.dirname(__file__)
rd_path = f_path+'\\Raw Data'
sc_path = f_path+'\\Script files'
# Create /Script files folder
if os.path.isdir(sc_path) == False:
    os.mkdir(sc_path)
    print("\nCreating new Script files path here...",sc_path)
else:
    print("\nScript files directory exists!")
# List files in Raw Data directory
print("\nRaw Data files in the directory:\n",rd_path,"\n")
for filename in os.listdir(rd_path):
    f = os.path.join(rd_path,filename)
    if os.path.isfile(f):
        print(filename)
print("\n\n\n")
# Copy and edit data files to /Script files folder
for filename in os.listdir(rd_path):
    src = os.path.join(rd_path,filename)
    if os.path.isfile(src):
        name = os.path.splitext(filename)[0]
        read_file = pd.read_csv(src)
        result = sc_path+"\\"+name+'.xlsx'
        read_file.to_excel(result)
        print(src,"\nconverted and written to: \n",result,"\n\n")
Traceback (most recent call last):
File "C:\Users\_________________\Graph.py", line 32, in <module>
read_file = pd.read_csv(src)
File "C:\Users\_____________\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "C:\Users\______________\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 680, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\_____________\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 581, in _read
return parser.read(nrows)
File "C:\Users\_____________\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 1250, in read
index, columns, col_dict = self._engine.read(nrows)
File "C:\Users\_____________\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 225, in read
chunks = self._reader.read_low_memory(nrows)
File "pandas\_libs\parsers.pyx", line 805, in pandas._libs.parsers.TextReader.read_low_memory
File "pandas\_libs\parsers.pyx", line 861, in pandas._libs.parsers.TextReader._read_rows
File "pandas\_libs\parsers.pyx", line 847, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas\_libs\parsers.pyx", line 1960, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 2 fields in line 47, saw 8
Have you tried converting the second file in the folder to xlsx on its own? I'm not sure, but it seems like there's a problem when pandas reads that csv.
So I found out this issue was due to the formatting of my 2nd .csv file. Opening it up, I had a number of cells above the data I wanted. After deleting these extra rows so that the relevant data started in row 1, the code ran correctly. Looks like I'll have to add code to detect these extra rows and delete them prior to attempting to convert to .xlsx.
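One way to handle files like that automatically (a rough sketch, not part of the original script; the expected field count is an assumption you would set for your own data) is to find the first line with the expected number of fields and skip everything above it:

import csv
import pandas as pd

def read_csv_skipping_preamble(path, expected_fields=8):
    # Find the first line that actually has the expected number of fields
    with open(path, newline='') as fh:
        for lineno, row in enumerate(csv.reader(fh)):
            if len(row) == expected_fields:
                break
        else:
            raise ValueError("no line with %d fields found in %s" % (expected_fields, path))
    # Re-read the file, skipping the junk rows above the real header
    return pd.read_csv(path, skiprows=lineno)

# read_file = read_csv_skipping_preamble(src)  # drop-in replacement for pd.read_csv(src)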
My file contains 9976 values, but when I use the commands
fid = open('commemi2d0.1.node','r')
a = np.fromfile(fid, dtype=np.float64, sep=" ")
it shows the following warning:
DeprecationWarning: string or file could not be read to its end due to unmatched data; this will raise a ValueError in the future.
Also, a.shape shows only 9975 values.
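One workaround to try (a sketch, assuming commemi2d0.1.node is a plain text file in which every whitespace-separated token is numeric) is to split the whole file yourself instead of relying on np.fromfile's parsing:

import numpy as np

# Read every whitespace-separated token into one flat float array;
# this avoids np.fromfile's strict behaviour with trailing/unmatched data.
with open('commemi2d0.1.node', 'r') as fid:
    a = np.array(fid.read().split(), dtype=np.float64)

print(a.shape)  # should now report the full number of values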
I am trying to save my web scraping results to a csv file using openpyxl.
If I print my results with the following loop, I am able to see all the data necessary:
i = 1
g = len(titles)
while i < g:
    print(titles[i].text)
    print('\n')
    print(sections[i].text)
    print('\n')
    # print(dates[i].text)
    # print('\n')
    i += 1
but if I try to save the results to the file with the following command:
for r in range(1,5):
    for c in range(1,4):
        sheet.cell(row=r,column=c).titles[i].text
workbook.save(path1)
I get the following error message:
AttributeError Traceback (most recent call last)
<ipython-input-487-201b34800c46> in <module>
1 for r in range(1,5):
2 for c in range(1,4):
----> 3 sheet.cell(row=r,column=c).titles[i].text
4
5 workbook.save(path1)
AttributeError: 'Cell' object has no attribute 'titles'
As I am a total beginner, it would be great to get an explanation of what is wrong... Thanks!
If you look at the error, you'll see it says 'Cell' object has no attribute 'titles'. That means you're looking for titles in the wrong place; in your printing example, titles is just a list. This sort of seems like a makeshift solution, so I'd recommend using a DataFrame with something like pandas, as it's easy and intuitive to work with in Python and has simple exporting. Also, it's a little hard to troubleshoot because this isn't a minimal working example, so I can't see the types of the variables or the data in them.
Or is it simply that this line:
sheet.cell(row=r,column=c).titles[i].text
should be an assignment to the cell's value?
sheet.cell(row=r,column=c).value = titles[i].text
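Following the pandas suggestion above, a minimal sketch of the export (assuming titles and sections are the element lists from the question; 'results.xlsx' is just a hypothetical output path) could be:

import pandas as pd

# One row per scraped item, built from the element lists
df = pd.DataFrame({
    'title':   [t.text for t in titles],
    'section': [s.text for s in sections],
})

df.to_excel('results.xlsx', index=False)  # or df.to_csv('results.csv', index=False)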
I'm still pretty new to Snakemake, and I've been having trouble with a rule I'm trying to write.
I've been trying to combine using snakemake.remote.NCBI with accessing a pandas dataframe and using a wildcard, but I can't seem to make it work.
I have a tsv file called genomes.tsv with several columns, where each row is one species. One column, "id", has the GenBank id for the species's genome. Another, "species", has a short string unique to each species. In my Snakefile, genomes.tsv is imported as genomes with only the id and species columns; then species is set as the index of genomes and dropped as a column.
I want to use the values in "species" as values for the wildcard {species} in my workflow, and I want my rule to use snakemake.remote.NCBI to download each species's genome sequence in fasta format and then output it to a file "{species}_gen.fa"
from snakemake.remote.NCBI import RemoteProvider as NCBIRemoteProvider
import pandas as pd

configfile: "config.yaml"

NCBI = NCBIRemoteProvider(email=config["email"]) # email required by NCBI to prevent abuse

genomes = pd.read_table(config["genomes"], usecols=["species","id"]).set_index("species")
SPECIES = genomes.index.values.tolist()

rule all:
    input: expand("{species}_gen.fasta",species=SPECIES)

rule download_and_count:
    input:
        lambda wildcards: NCBI.remote(str(genomes[str(wildcards.species)]) + ".fasta", db="nuccore")
    output:
        "{species}_gen.fasta"
    shell:
        "{input} > {output}"
Currently, trying to run my code results in a KeyError, but it says that the key is a value from species, so it should be able to get the corresponding GenBank id from genomes.
EDIT: here is the error
InputFunctionException in line 18 of /home/sjenkins/work/olflo/Snakefile:
KeyError: 'cappil'
Wildcards:
species=cappil
cappil is a valid value for {species}, and it should be usable as an index, I think. Here are the first few rows of genomes, for reference:
species   id        accession        name                                     assembly
cappil    8252558   GCA_004027915.1  Capromys_pilorides_(Desmarest's_hutia)   CapPil_v1_BIUU
cavape    1067048   GCA_000688575.1  Cavia_aperea_(Brazilian_guinea_pig)      CavAp1.0
cavpor    175118    GCA_000151735.1  Cavia_porcellus_(domestic_guinea_pig)    Cavpor3.0
Update:
I tried changing the input line to:
lambda wildcards: NCBI.remote(str(genomes[genomes['species'] == wildcards.species].iloc[0]['id']) + ".fasta", db="nuccore")
but that gives me the error message:
Traceback (most recent call last):
File "/home/sjenkins/miniconda3/envs/olflo/lib/python3.7/site-packages/snakemake/init.py", line 547, in snakemake
export_cwl=export_cwl)
File "/home/sjenkins/miniconda3/envs/olflo/lib/python3.7/site-packages/snakemake/workflow.py", line 421, in execute
dag.init()
File "/home/sjenkins/miniconda3/envs/olflo/lib/python3.7/site-packages/snakemake/dag.py", line 122, in init
job = self.update([job], progress=progress)
File "/home/sjenkins/miniconda3/envs/olflo/lib/python3.7/site-packages/snakemake/dag.py", line 603, in update
progress=progress)
File "/home/sjenkins/miniconda3/envs/olflo/lib/python3.7/site-packages/snakemake/dag.py", line 666, in update_
progress=progress)
File "/home/sjenkins/miniconda3/envs/olflo/lib/python3.7/site-packages/snakemake/dag.py", line 603, in update
progress=progress)
File "/home/sjenkins/miniconda3/envs/olflo/lib/python3.7/site-packages/snakemake/dag.py", line 655, in update_
missing_input = job.missing_input
File "/home/sjenkins/miniconda3/envs/olflo/lib/python3.7/site-packages/snakemake/jobs.py", line 398, in missing_input
for f in self.input
File "/home/sjenkins/miniconda3/envs/olflo/lib/python3.7/site-packages/snakemake/jobs.py", line 399, in
if not f.exists and not f in self.subworkflow_input)
File "/home/sjenkins/miniconda3/envs/olflo/lib/python3.7/site-packages/snakemake/io.py", line 208, in exists
return self.exists_remote
File "/home/sjenkins/miniconda3/envs/olflo/lib/python3.7/site-packages/snakemake/io.py", line 119, in wrapper
v = func(self, *args, **kwargs)
File "/home/sjenkins/miniconda3/envs/olflo/lib/python3.7/site-packages/snakemake/io.py", line 258, in exists_remote
return self.remote_object.exists()
File "/home/sjenkins/miniconda3/envs/olflo/lib/python3.7/site-packages/snakemake/remote/NCBI.py", line 72, in exists
likely_request_options = self._ncbi.guess_db_options_for_extension(self.file_ext, db=self.db, rettype=self.rettype, retmode=self.retmode)
File "/home/sjenkins/miniconda3/envs/olflo/lib/python3.7/site-packages/snakemake/remote/NCBI.py", line 110, in file_ext
accession, version, file_ext = self._ncbi.parse_accession_str(self.local_file())
File "/home/sjenkins/miniconda3/envs/olflo/lib/python3.7/site-packages/snakemake/remote/NCBI.py", line 366, in parse_accession_str
assert file_ext, "file_ext must be defined: {}.{}.. Possible values include: {}".format(accession,version,", ".join(list(self.valid_extensions)))
AssertionError: file_ext must be defined: ... Possible values include: est, ssexemplar, gb.xml, docset, fasta.xml, fasta, fasta_cds_na, abstract, txt, gp, medline, chr, flt, homologene, alignmentscores, gbwithparts, seqid, fasta_cds_aa, gpc, uilist, uilist.xml, rsr, xml, gb, gene_table, gss, ft, gp.xml, acc, asn1, gbc
I think you should have:
genomes = pd.read_table(config["genomes"], usecols=["species","id"])
SPECIES = list(genomes['species'])
and then access the ID of a given species with:
lambda wildcards: str(genomes[genomes['species'] == wildcards.species].iloc[0]['id'])
Ok, so it turns out that the reason I was getting AssertionError: file_ext must be defined is that NCBIRemoteProvider can't recognize the file extension if the file name it is given doesn't contain a valid GenBank accession number. I was giving it file names built from GenBank ids, so it returned that error.
Also, it seems like whole-genome sequences don't have a single accession number that returns all the sequences; instead there's an accession number for the WGS report and then accession numbers for each scaffold. I decided to download the genomes I need manually instead of trying to download all the scaffolds and then combining them.
I've been given some large sqlite tables that I need to read into dask dataframes. The tables have columns with datetimes (ISO formatted strings) stored as sqlite NUMERIC data type. I am able to read in this kind of data using Pandas' read_sql_table. But, the same call from dask gives an error. Can someone suggest a good workaround? (I do not know of an easy way to change the sqlite data type of these columns from NUMERIC to TEXT.) I am pasting a minimal example below.
import sqlalchemy
import pandas as pd
import dask.dataframe as ddf
connString = "sqlite:///c:\\temp\\test.db"
engine = sqlalchemy.create_engine(connString)
conn = engine.connect()
conn.execute("create table testtable (uid integer Primary Key, datetime NUM)")
conn.execute("insert into testtable values (1, '2017-08-03 01:11:31')")
print(conn.execute('PRAGMA table_info(testtable)').fetchall())
conn.close()
pandasDF = pd.read_sql_table('testtable', connString, index_col='uid', parse_dates={'datetime':'%Y-%m-%d %H:%M:%S'})
pandasDF.head()
daskDF = ddf.read_sql_table('testtable', connString, index_col='uid', parse_dates={'datetime':'%Y-%m-%d %H:%M:%S'})
Here is the traceback:
Warning (from warnings module):
File "C:\Program Files\Python36\lib\site-packages\sqlalchemy\sql\sqltypes.py", line 596
'storage.' % (dialect.name, dialect.driver))
SAWarning: Dialect sqlite+pysqlite does *not* support Decimal objects natively, and SQLAlchemy must convert from floating point - rounding errors and other issues may occur. Please consider storing Decimal numbers as strings or integers on this platform for lossless storage.
Traceback (most recent call last):
File "<pyshell#2>", line 1, in <module>
daskDF = ddf.read_sql_table('testtable', connString, index_col='uid', parse_dates={'datetime':'%Y-%m-%d %H:%M:%S'})
File "C:\Program Files\Python36\lib\site-packages\dask\dataframe\io\sql.py", line 98, in read_sql_table
head = pd.read_sql(q, engine, **kwargs)
File "C:\Program Files\Python36\lib\site-packages\pandas\io\sql.py", line 416, in read_sql
chunksize=chunksize)
File "C:\Program Files\Python36\lib\site-packages\pandas\io\sql.py", line 1104, in read_query
parse_dates=parse_dates)
File "C:\Program Files\Python36\lib\site-packages\pandas\io\sql.py", line 157, in _wrap_result
coerce_float=coerce_float)
File "C:\Program Files\Python36\lib\site-packages\pandas\core\frame.py", line 1142, in from_records
coerce_float=coerce_float)
File "C:\Program Files\Python36\lib\site-packages\pandas\core\frame.py", line 6304, in _to_arrays
data = lmap(tuple, data)
File "C:\Program Files\Python36\lib\site-packages\pandas\compat\__init__.py", line 129, in lmap
return list(map(*args, **kwargs))
TypeError: must be real number, not str
EDIT: The comments by @mdurant make me wonder now if this is a bug in sqlalchemy. The following code gives the same error message as pandas does:
import sqlalchemy as sa
from sqlalchemy import text
m = sa.MetaData()
table = sa.Table('testtable', m, autoload=True, autoload_with=engine)
resultList = conn.execute(sa.sql.select(table.columns).select_from(table)).fetchall()
print(resultList)
resultList2 = conn.execute(sa.sql.select(columns=[text('uid'),text('datetime')], from_obj = text('testtable'))).fetchall()
print(resultList2)
Traceback (most recent call last):
File "<ipython-input-20-188c84a35d95>", line 1, in <module>
print(resultList)
File "c:\program files\python36\lib\site-packages\sqlalchemy\engine\result.py", line 156, in __repr__
return repr(sql_util._repr_row(self))
File "c:\program files\python36\lib\site-packages\sqlalchemy\sql\util.py", line 329, in __repr__
", ".join(trunc(value) for value in self.row),
TypeError: must be real number, not str
Puzzling.
Here is some further information, which hopefully can lead to an answer.
The query being executed at the line in question is
pd.read_sql(sql.select(table.columns).select_from(table),
            engine, index_col='uid')
which fails as you show (the limit is not relevant here).
However, the text version of the same query
sql.select(table.columns).select_from(table).compile().string
-> 'SELECT testtable.uid, testtable.datetime \nFROM testtable'
pd.read_sql('SELECT testtable.uid, testtable.datetime \nFROM testtable',
            engine, index_col='uid')  # works fine
The following workaround, using a cast in the query, does work (but isn't pretty):
import sqlalchemy as sa
engine = sa.create_engine(connString)
m = sa.MetaData()
table = sa.Table('testtable', m, autoload=True, autoload_with=engine)
uid, dt = list(table.columns)
q = sa.select([dt.cast(sa.types.String)]).select_from(table)
daskDF = ddf.read_sql_table(q, connString, index_col=uid.label('uid'))
-edit-
Simpler form of this that also appears to work (see comment)
daskDF = ddf.read_sql_table('testtable', connString, index_col='uid',
                            columns=['uid', sa.sql.column('datetime').cast(sa.types.String).label('datetime')])
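Since the cast makes dask read the datetime column back as strings, a possible follow-up step (a sketch; the format string is assumed from the sample row shown in the question) is to parse it into real timestamps on the dask side:

import dask.dataframe as ddf  # same alias as used in the question

# Convert the string column produced by the CAST back into datetimes
daskDF['datetime'] = ddf.to_datetime(daskDF['datetime'], format='%Y-%m-%d %H:%M:%S')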