When using AlphaFold Colab (https://colab.research.google.com/github/deepmind/alphafold/blob/main/notebooks/AlphaFold.ipynb), I am getting the following error on step 4 ("Search against genetic databases"):
KeyError Traceback (most recent call last)
<ipython-input-12-a4ead9510dc9> in <module>
145 for db_name, db_results in raw_msa_results.items():
146 merged_msa = notebook_utils.merge_chunked_msa(
--> 147 results=db_results, max_hits=MAX_HITS.get(db_name))
148 if merged_msa.sequences and db_name != 'uniprot':
149 single_chain_msas.append(merged_msa)
1 frames
/opt/conda/lib/python3.7/site-packages/alphafold/notebooks/notebook_utils.py in merge_chunked_msa(results, max_hits)
105 e_values_dict = parsers.parse_e_values_from_tblout(chunk['tbl'])
106 # Jackhmmer lists sequences as <sequence name>/<residue from>-<residue to>.
--> 107 e_values = [e_values_dict[t.partition('/')[0]] for t in msa.descriptions]
108 chunk_results = zip(
109 msa.sequences, msa.deletion_matrix, msa.descriptions, e_values)
/opt/conda/lib/python3.7/site-packages/alphafold/notebooks/notebook_utils.py in <listcomp>(.0)
105 e_values_dict = parsers.parse_e_values_from_tblout(chunk['tbl'])
106 # Jackhmmer lists sequences as <sequence name>/<residue from>-<residue to>.
--> 107 e_values = [e_values_dict[t.partition('/')[0]] for t in msa.descriptions]
108 chunk_results = zip(
109 msa.sequences, msa.deletion_matrix, msa.descriptions, e_values)
KeyError: '0000|query'
Notably, this error seems to depend on the input sequence: I get it with some sequences but not with others. I am using Colab Pro, and I have tried tweaking parameters such as the amount of computing power, but it did not help. I never experienced this error before switching to Colab Pro. Any help would be appreciated.
So, I have a table from BigQuery public tables (Google Analytics):
print(bigquery_client.query(
"""
SELECT hits.0.productName
from `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
where date between '20160101' and '20161231'
""").to_dataframe())
Additional code:
import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] ='/Users/<UserName>/Desktop/folder/key/<Key_name>.json'
bigquery_client = bigquery.Client()
Error in Jupyter Notebook:
BadRequest Traceback (most recent call last)
<ipython-input-31-424833cf8827> in <module>
----> 1 print(bigquery_client.query(
2 """
3 SELECT hits.0.productName
4 from `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
5 where date between '20160101' and '20161231'
~/opt/anaconda3/lib/python3.8/site-packages/google/cloud/bigquery/job/query.py in to_dataframe(self, bqstorage_client, dtypes, progress_bar_type, create_bqstorage_client, date_as_object, max_results, geography_as_object)
1563 :mod:`shapely` library cannot be imported.
1564 """
-> 1565 query_result = wait_for_query(self, progress_bar_type, max_results=max_results)
1566 return query_result.to_dataframe(
1567 bqstorage_client=bqstorage_client,
~/opt/anaconda3/lib/python3.8/site-packages/google/cloud/bigquery/_tqdm_helpers.py in wait_for_query(query_job, progress_bar_type, max_results)
86 )
87 if progress_bar is None:
---> 88 return query_job.result(max_results=max_results)
89
90 i = 0
~/opt/anaconda3/lib/python3.8/site-packages/google/cloud/bigquery/job/query.py in result(self, page_size, max_results, retry, timeout, start_index, job_retry)
1370 do_get_result = job_retry(do_get_result)
1371
-> 1372 do_get_result()
1373
1374 except exceptions.GoogleAPICallError as exc:
~/opt/anaconda3/lib/python3.8/site-packages/google/api_core/retry.py in retry_wrapped_func(*args, **kwargs)
281 self._initial, self._maximum, multiplier=self._multiplier
282 )
--> 283 return retry_target(
284 target,
285 self._predicate,
~/opt/anaconda3/lib/python3.8/site-packages/google/api_core/retry.py in retry_target(target, predicate, sleep_generator, deadline, on_error)
188 for sleep in sleep_generator:
189 try:
--> 190 return target()
191
192 # pylint: disable=broad-except
~/opt/anaconda3/lib/python3.8/site-packages/google/cloud/bigquery/job/query.py in do_get_result()
1360 self._job_retry = job_retry
1361
-> 1362 super(QueryJob, self).result(retry=retry, timeout=timeout)
1363
1364 # Since the job could already be "done" (e.g. got a finished job
~/opt/anaconda3/lib/python3.8/site-packages/google/cloud/bigquery/job/base.py in result(self, retry, timeout)
711
712 kwargs = {} if retry is DEFAULT_RETRY else {"retry": retry}
--> 713 return super(_AsyncJob, self).result(timeout=timeout, **kwargs)
714
715 def cancelled(self):
~/opt/anaconda3/lib/python3.8/site-packages/google/api_core/future/polling.py in result(self, timeout, retry)
135 # pylint: disable=raising-bad-type
136 # Pylint doesn't recognize that this is valid in this case.
--> 137 raise self._exception
138
139 return self._result
BadRequest: 400 Syntax error: Unexpected keyword WHERE at [4:1]
(job ID: 3c15e031-ee7d-4594-a577-0237f8282695)
-----Query Job SQL Follows-----
| . | . | . | . | . | . |
1:
2:SELECT hits.0.productName
3:from `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
4:where date between '20160101' and '20161231'
| . | . | . | . | . | . |
As seen in the screenshot, I have a hits column whose value is a dictionary, and I need to fetch the inner dictionary value from the '0' column, but I get the error above. Actually, I need to take the 'productName' values from all the numeric columns.
One way to solve this is to filter the data you want directly in the query.
Filtering from BigQuery:
First, for a better understanding, here is how the schema describes the fields that contain product names:
The first possible field is hits.item.productName:
hits is a repeated RECORD
item is a RECORD inside hits
productName is a STRING inside hits.item
The second possible field is hits.product.v2ProductName:
product is a repeated RECORD inside hits
v2ProductName is a STRING inside hits.product
To query a repeated RECORD, you have to flatten it, turning it into a table with the expression UNNEST([record]), as described in the BigQuery documentation.
So, to return all the unique product names from hits.product.v2ProductName, run this query:
from google.cloud import bigquery
import pandas as pd
client = bigquery.Client()
sql = """
SELECT
DISTINCT p.v2productname
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
UNNEST(hits) AS h,
UNNEST(h.product) AS p
WHERE
date BETWEEN '20160101'
AND '20161231'
AND (p.v2productname IS NOT NULL);
"""
v2productname = client.query(sql).to_dataframe()
print(v2productname)
To use the field hits.item.productName instead, run the following query; note, however, that all of its records are null:
SELECT
DISTINCT h.item.productname
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
UNNEST(hits) AS h,
UNNEST(h.product) AS p
WHERE
date BETWEEN '20160101'
AND '20161231'
AND (h.item.productname IS NOT NULL);
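To make the flattening step more concrete, here is a plain-Python analogy of what the two UNNEST calls do. This is only a sketch: the sessions list is made-up sample data shaped loosely like the Google Analytics export, and distinct_product_names is a hypothetical helper, not part of any BigQuery client library.

```python
# Made-up rows: each session has a repeated `hits` record, and each hit
# has a repeated `product` record (a simplification of the real schema).
sessions = [
    {"date": "20160801", "hits": [
        {"product": [{"v2ProductName": "Mug"}, {"v2ProductName": "Pen"}]},
        {"product": []},
    ]},
    {"date": "20160802", "hits": [
        {"product": [{"v2ProductName": "Mug"}, {"v2ProductName": None}]},
    ]},
]


def distinct_product_names(rows, start, end):
    """Mimic SELECT DISTINCT p.v2ProductName over the flattened records."""
    names = set()
    for row in rows:
        # WHERE date BETWEEN start AND end (string comparison on YYYYMMDD)
        if not (start <= row["date"] <= end):
            continue
        for hit in row["hits"]:              # UNNEST(hits) AS h
            for product in hit["product"]:   # UNNEST(h.product) AS p
                name = product["v2ProductName"]
                if name is not None:         # p.v2ProductName IS NOT NULL
                    names.add(name)
    return sorted(names)


print(distinct_product_names(sessions, "20160101", "20161231"))
# prints ['Mug', 'Pen']
```

Each nested loop plays the role of one UNNEST: it turns a repeated record into individual rows, which is why there is no need to address hits by a numeric index such as hits.0.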
Filtering from the dataframe:
I tried to process it in a dataframe instead, but that was not possible: because of the chain of nested records in the dataset, the function to_dataframe() could not process it.
In summary:
Try to filter and process as much of the data as possible in BigQuery; it will be faster and more cost-effective.
I'm new to using Spark's MLLib Python API. I have my data in CSV format like so:
Label 0 1 2 3 4 5 6 7 8 9 ... 758 759 760 761 762 763 764 765 766 767
0 -0.168307 -0.277797 -0.248202 -0.069546 0.176131 -0.152401 0.12664 -0.401460 0.125926 0.279061 ... -0.289871 0.207264 -0.140448 -0.426980 -0.328994 0.328007 0.486793 0.222587 0.650064 -0.513640
3 -0.313138 -0.045043 0.279587 -0.402598 -0.165238 -0.464669 0.09019 0.008703 0.074541 0.142638 ... -0.094025 0.036567 -0.059926 -0.492336 -0.006370 0.108954 0.350182 -0.144818 0.306949 -0.216190
2 -0.379293 -0.340999 0.319142 0.024552 0.142129 0.042989 -0.60938 0.052103 -0.293400 0.162741 ... 0.108854 -0.025618 0.149078 -0.917385 0.110629 0.146427
Can I use this as is by loading it using df = spark.read.format("csv").option("header", "true").load("file.csv")? I'm attempting to train a Random Forest model. I've tried researching it, but it doesn't seem to be a big topic. I don't want to just attempt it without being fully sure it would work because the cluster I use has long queue times.
Yes! You'll want to infer the schema too; otherwise every column is read as a string.
df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("file.csv")
If you have many files with the same column names and data types, save the schema to reuse.
schema = df.schema
Then, the next time you read a CSV file with the same columns, you can pass the saved schema to the reader with .schema() instead of inferring it again:
df = spark.read.format("csv").option("header", "true").schema(schema).load("file.csv")
This is my code:
import matplotlib.patches as pat
oval = pat.Ellipse(v1_mean,v2_mean,v1_std*2,v2_std*2)
fig,graph = plt.subplots()
graph.scatter(v1,v2)
graph.scatter(v1_mean,v2_mean, s=100)
graph.text(v1_mean,v2_mean, 'Mean')
graph.add_patch(oval)
And this is the error that comes:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-40-2278a0e6f4cf> in <module>()
7 graph.scatter(v1_mean,v2_mean, s=100)
8 graph.text(v1_mean,v2_mean, 'Mean')
----> 9 graph.add_patch(oval)
10
11 graph.xlabel('V1')
/opt/conda/lib/python3.6/site-packages/matplotlib/axes/_base.py in add_patch(self, p)
2033 if p.get_clip_path() is None:
2034 p.set_clip_path(self.patch)
-> 2035 self._update_patch_limits(p)
2036 self.patches.append(p)
2037 p._remove_method = lambda h: self.patches.remove(h)
/opt/conda/lib/python3.6/site-packages/matplotlib/axes/_base.py in _update_patch_limits(self, patch)
2053 vertices = patch.get_path().vertices
2054 if vertices.size > 0:
-> 2055 xys = patch.get_patch_transform().transform(vertices)
2056 if patch.get_data_transform() != self.transData:
2057 patch_to_data = (patch.get_data_transform() -
/opt/conda/lib/python3.6/site-packages/matplotlib/patches.py in get_patch_transform(self)
1492
1493 def get_patch_transform(self):
-> 1494 self._recompute_transform()
1495 return self._patch_transform
1496
/opt/conda/lib/python3.6/site-packages/matplotlib/patches.py in _recompute_transform(self)
1476 not directly access the transformation member variable.
1477 """
-> 1478 center = (self.convert_xunits(self.center[0]),
1479 self.convert_yunits(self.center[1]))
1480 width = self.convert_xunits(self.width)
IndexError: invalid index to scalar variable.
Basically, what I am trying to do is plot an oval shape and some data in the same graph. The error seems to have something to do with the center of the oval, but I don't know what exactly is wrong. It's strange that I followed exactly what the teacher has done, but mine came with an error while his is ok.
"It's strange that I followed exactly what the teacher has done, but mine came with an error while his is ok."
Probably you didn't follow exactly. According to the documentation of matplotlib.patches.Ellipse the xy coordinates of ellipse centre are to be given as a tuple rather than individual arguments, so it's not
oval = pat.Ellipse(v1_mean,v2_mean,v1_std*2,v2_std*2)
but
oval = pat.Ellipse((v1_mean, v2_mean), v1_std*2, v2_std*2)
instead. Unfortunately Ellipse didn't warn about this and stored a single number as the ellipse center.
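For reference, here is a minimal self-contained sketch of the corrected plot. The data are randomly generated stand-ins (the original v1 and v2 are not shown in the question), and the Agg backend is selected only so that the script also runs without a display. Note as well that the traceback shows graph.xlabel('V1') on a later line; an Axes object has no xlabel method, so the sketch uses set_xlabel and set_ylabel.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.patches as pat
import matplotlib.pyplot as plt
import numpy as np

# Stand-in data: the question does not show how v1 and v2 are built.
rng = np.random.default_rng(0)
v1 = rng.normal(5.0, 1.0, 200)
v2 = rng.normal(3.0, 0.5, 200)
v1_mean, v2_mean = v1.mean(), v2.mean()
v1_std, v2_std = v1.std(), v2.std()

# The centre is passed as ONE (x, y) tuple, followed by width and height.
oval = pat.Ellipse((v1_mean, v2_mean), v1_std * 2, v2_std * 2, alpha=0.3)

fig, graph = plt.subplots()
graph.scatter(v1, v2)
graph.scatter(v1_mean, v2_mean, s=100)
graph.text(v1_mean, v2_mean, "Mean")
graph.add_patch(oval)
graph.set_xlabel("V1")  # Axes uses set_xlabel, not xlabel
graph.set_ylabel("V2")
fig.canvas.draw()       # force a render to confirm nothing raises
```

The ellipse drawn here spans one standard deviation on each side of the mean along each axis, since the width and height are full extents rather than radii.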
I am using gensim, but when I try to save to an S3 location with MmCorpus.serialize, it raises an error:
corpora.MmCorpus.serialize('s3://my_bucket/corpus.mm', corpus)
2016-01-12 15:55:41,957 : INFO : storing corpus in Matrix Market format to s3://my_bucket/corpus.mm
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
<ipython-input-33-513a98b2dfd4> in <module>()
----> 1 corpora.MmCorpus.serialize('s3://my_bucket/corpus.mm', corpus)
/home/nanounanue/.pyenv/versions/3.4.3/lib/python3.4/site-packages/gensim/corpora/indexedcorpus.py in serialize(serializer, fname, corpus, id2word, index_fname, progress_cnt, labels, metadata)
92 offsets = serializer.save_corpus(fname, corpus, id2word, labels=labels, metadata=metadata)
93 else:
---> 94 offsets = serializer.save_corpus(fname, corpus, id2word, metadata=metadata)
95
96 if offsets is None:
/home/nanounanue/.pyenv/versions/3.4.3/lib/python3.4/site-packages/gensim/corpora/mmcorpus.py in save_corpus(fname, corpus, id2word, progress_cnt, metadata)
47 logger.info("storing corpus in Matrix Market format to %s" % fname)
48 num_terms = len(id2word) if id2word is not None else None
---> 49 return matutils.MmWriter.write_corpus(fname, corpus, num_terms=num_terms, index=True, progress_cnt=progress_cnt, metadata=metadata)
50
51 # endclass MmCorpus
/home/nanounanue/.pyenv/versions/3.4.3/lib/python3.4/site-packages/gensim/matutils.py in write_corpus(fname, corpus, progress_cnt, index, num_terms, metadata)
484 is allowed to be larger than the available RAM.
485 """
--> 486 mw = MmWriter(fname)
487
488 # write empty headers to the file (with enough space to be overwritten later)
/home/nanounanue/.pyenv/versions/3.4.3/lib/python3.4/site-packages/gensim/matutils.py in __init__(self, fname)
434 if fname.endswith(".gz") or fname.endswith('.bz2'):
435 raise NotImplementedError("compressed output not supported with MmWriter")
--> 436 self.fout = utils.smart_open(self.fname, 'wb+') # open for both reading and writing
437 self.headers_written = False
438
/home/nanounanue/.pyenv/versions/3.4.3/lib/python3.4/site-packages/smart_open/smart_open_lib.py in smart_open(uri, mode, **kw)
132 return S3OpenWrite(key, **kw)
133 else:
--> 134 raise NotImplementedError("file mode %s not supported for %r scheme", mode, parsed_uri.scheme)
135
136 elif parsed_uri.scheme in ("hdfs", ):
NotImplementedError: ('file mode %s not supported for %r scheme', 'wb+', 's3')
NOTE: s3://my_bucket exists (under another name), and corpus is the same one as in the gensim tutorial.
What is the correct way to do this? I want to achieve the following: store the corpus (or a model, such as LDA) in S3, then load it back from S3 and run it again.
>>> import pylab as pl
>>> x = np.linspace(0,4*np.pi, 100)
>>> pl.plot(x, np.sin(x))
[<matplotlib.lines.Line2D object at 0x025B8350>]
After installing numpy, scipy, sympy, matplotlib, and ipython, I get:
---------------------------------------------------------------------------
TypeError Python 2.7.3: C:\Python27\python.exe
Fri Sep 28 09:59:01 2012
A problem occured executing Python code. Here is the sequence of function
calls leading up to the error, with the most recent (innermost) call last.
C:\Python27\scripts\ipython.py in <module>()
13
14 [or simply IPython.Shell.IPShell().mainloop(1) ]
15
16 and IPython will be your working environment when you start python. The final
17 sys.exit() call will make python exit transparently when IPython finishes, so
18 you don't have an extra prompt to get out of.
19
20 This is probably useful to developers who manage multiple Python versions and
21 don't want to have correspondingly multiple IPython versions. Note that in
22 this mode, there is no way to pass IPython any command-line options, as those
23 are trapped first by Python itself.
24 """
25
26 import IPython.Shell
27
---> 28 IPython.Shell.start().mainloop()
global IPython.Shell.start.mainloop = undefined
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
C:\Python27\lib\site-packages\IPython\Shell.pyc in start(user_ns=None)
1244
1245 # New versions of pygtk don't need the brittle threaded support.
1246 th_mode = check_gtk(th_mode)
1247 return th_shell[th_mode]
1248
1249
1250 # This is the one which should be called by external code.
1251 def start(user_ns = None):
1252 """Return a running shell instance, dealing with threading options.
1253
1254 This is a factory function which will instantiate the proper IPython shell
1255 based on the user's threading choice. Such a selector is needed because
1256 different GUI toolkits require different thread handling details."""
1257
1258 shell = _select_shell(sys.argv)
-> 1259 return shell(user_ns = user_ns)
1260
1261 # Some aliases for backwards compatibility
1262 IPythonShell = IPShell
1263 IPythonShellEmbed = IPShellEmbed
1264 #************************ End of file <Shell.py> ***************************
1265
1266
1267
1268
1269
1270
1271
1272
1273
1274
C:\Python27\lib\site-packages\IPython\Shell.pyc in __init__(self=<IPython.Shell.IPShell instance>, argv=None, user_ns=None, user_global_ns=None, debug=1, shell_class=<class 'IPython.iplib.InteractiveShell'>)
58 # Default timeout for waiting for multithreaded shells (in seconds)
59 GUI_TIMEOUT = 10
60
61 #-----------------------------------------------------------------------------
62 # This class is trivial now, but I want to have it in to publish a clean
63 # interface. Later when the internals are reorganized, code that uses this
64 # shouldn't have to change.
65
66 class IPShell:
67 """Create an IPython instance."""
68
69 def __init__(self,argv=None,user_ns=None,user_global_ns=None,
70 debug=1,shell_class=InteractiveShell):
71 self.IP = make_IPython(argv,user_ns=user_ns,
72 user_global_ns=user_global_ns,
---> 73 debug=debug,shell_class=shell_class)
global For = undefined
global more = undefined
global details = undefined
global see = undefined
global the = undefined
global __call__ = undefined
global method = undefined
global below. = undefined
74
75 def mainloop(self,sys_exit=0,banner=None):
76 self.IP.mainloop(banner)
77 if sys_exit:
78 sys.exit()
79
80 #-----------------------------------------------------------------------------
81 def kill_embedded(self,parameter_s=''):
82 """%kill_embedded : deactivate for good the current embedded IPython.
83
84 This function (after asking for confirmation) sets an internal flag so that
85 an embedded IPython will never activate again. This is useful to
86 permanently disable a shell that is being called inside a loop: once you've
87 figured out what you needed from it, you may then kill it and the program
88 will then continue to run without the interactive shell interfering again.
C:\Python27\lib\site-packages\IPython\ipmaker.pyc in make_IPython(argv=[r'C:\Python27\scripts\ipython.py'], user_ns=None, user_global_ns=None, debug=1, rc_override=None, shell_class=<class 'IPython.iplib.InteractiveShell'>, embedded=False, **kw={})
506 # tweaks. Basically options which affect other options. I guess this
507 # should just be written so that options are fully orthogonal and we
508 # wouldn't worry about this stuff!
509
510 if IP_rc.classic:
511 IP_rc.quick = 1
512 IP_rc.cache_size = 0
513 IP_rc.pprint = 0
514 IP_rc.prompt_in1 = '>>> '
515 IP_rc.prompt_in2 = '... '
516 IP_rc.prompt_out = ''
517 IP_rc.separate_in = IP_rc.separate_out = IP_rc.separate_out2 = '0'
518 IP_rc.colors = 'NoColor'
519 IP_rc.xmode = 'Plain'
520
--> 521 IP.pre_config_initialization()
522 # configure readline
523
524 # update exception handlers with rc file status
525 otrap.trap_out() # I don't want these messages ever.
526 IP.magic_xmode(IP_rc.xmode)
527 otrap.release_out()
528
529 # activate logging if requested and not reloading a log
530 if IP_rc.logplay:
531 IP.magic_logstart(IP_rc.logplay + ' append')
532 elif IP_rc.logfile:
533 IP.magic_logstart(IP_rc.logfile)
534 elif IP_rc.log:
535 IP.magic_logstart()
536
C:\Python27\lib\site-packages\IPython\iplib.pyc in pre_config_initialization(self=<IPython.iplib.InteractiveShell object>)
820 self.user_ns, # globals
821 # Skip our own frame in searching for locals:
822 sys._getframe(depth+1).f_locals # locals
823 ))
824
825 def pre_config_initialization(self):
826 """Pre-configuration init method
827
828 This is called before the configuration files are processed to
829 prepare the services the config files might need.
830
831 self.rc already has reasonable default values at this point.
832 """
833 rc = self.rc
834 try:
--> 835 self.db = pickleshare.PickleShareDB(rc.ipythondir + "/db")
global Optional = undefined
global inputs = undefined
836 except exceptions.UnicodeDecodeError:
837 print "Your ipythondir can't be decoded to unicode!"
838 print "Please set HOME environment variable to something that"
839 print r"only has ASCII characters, e.g. c:\home"
840 print "Now it is",rc.ipythondir
841 sys.exit()
842 self.shadowhist = IPython.history.ShadowHist(self.db)
843
844 def post_config_initialization(self):
845 """Post configuration init method
846
847 This is called after the configuration files have been processed to
848 'finalize' the initialization."""
849
850 rc = self.rc
C:\Python27\lib\site-packages\IPython\Extensions\pickleshare.pyc in __init__(self=PickleShareDB('C:\Documents and Settings\martinhylee\_ipython\db'), root=u'C:\\Documents and Settings\\martinhylee\\_ipython/db')
38 import cPickle as pickle
39 import UserDict
40 import warnings
41 import glob
42
43 def gethashfile(key):
44 return ("%02x" % abs(hash(key) % 256))[-2:]
45
46 _sentinel = object()
47
48 class PickleShareDB(UserDict.DictMixin):
49 """ The main 'connection' object for PickleShare database """
50 def __init__(self,root):
51 """ Return a db object that will manage the specied directory"""
52 self.root = Path(root).expanduser().abspath()
---> 53 if not self.root.isdir():
54 self.root.makedirs()
55 # cache has { 'key' : (obj, orig_mod_time) }
56 self.cache = {}
57
58
59 def __getitem__(self,key):
60 """ db['key'] reading """
61 fil = self.root / key
62 try:
63 mtime = (fil.stat()[stat.ST_MTIME])
64 except OSError:
65 raise KeyError(key)
66
67 if fil in self.cache and mtime == self.cache[fil][1]:
68 return self.cache[fil][0]
TypeError: _isdir() takes exactly 1 argument (0 given)
**********************************************************************
Oops, IPython crashed. We do our best to make it stable, but...
A crash report was automatically generated with the following information:
- A verbatim copy of the crash traceback.
- A copy of your input history during this session.
- Data on your current IPython configuration.
It was left in the file named:
'C:\Documents and Settings\martinhylee\_ipython\IPython_crash_report.txt'
If you can email this file to the developers, the information in it will help
them in understanding and correcting the problem.
You can mail it to: Fernando Perez at fperez.net#gmail.com
with the subject 'IPython Crash Report'.
If you want to do it now, the following command will work (under Unix):
mail -s 'IPython Crash Report' fperez.net#gmail.com < C:\Documents and Settings\martinhylee\_ipython\IPython_crash_report.txt
To ensure accurate tracking of this issue, please file a report about it at:
https://bugs.launchpad.net/ipython/+filebug
Error in sys.excepthook:
Traceback (most recent call last):
File "C:\Python27\lib\site-packages\IPython\CrashHandler.py", line 157, in __call__
report.write(self.make_report(traceback))
File "C:\Python27\lib\site-packages\IPython\CrashHandler.py", line 215, in make_report
rpt_add('BZR revision : %s \n\n' % Release.revision)
AttributeError: 'module' object has no attribute 'revision'
Original exception was:
Traceback (most recent call last):
File "C:\Python27\scripts\ipython.py", line 28, in <module>
IPython.Shell.start().mainloop()
File "C:\Python27\lib\site-packages\IPython\Shell.py", line 1259, in start
return shell(user_ns = user_ns)
File "C:\Python27\lib\site-packages\IPython\Shell.py", line 73, in __init__
debug=debug,shell_class=shell_class)
File "C:\Python27\lib\site-packages\IPython\ipmaker.py", line 521, in make_IPython
IP.pre_config_initialization()
File "C:\Python27\lib\site-packages\IPython\iplib.py", line 835, in pre_config_initialization
self.db = pickleshare.PickleShareDB(rc.ipythondir + "/db")
File "C:\Python27\lib\site-packages\IPython\Extensions\pickleshare.py", line 53, in __init__
if not self.root.isdir():
TypeError: _isdir() takes exactly 1 argument (0 given)
Try this:
from pylab import *
x = np.linspace(0, 4 * np.pi, 100)
plot(x, np.sin(x))
show()