"KeyError: in converted code" when packing data with numeric features in Tensorflow Tutorial - tensorflow

I'm using TF 2.0 on Google Colab. I copied most of the code from the TensorFlow "Load CSV Data" tutorial and changed some config variables to point at my training and eval/test CSV files. When I ran it, I got this error (only the last frame is shown; the full output is here):
In
NUMERIC_FEATURES = ['microtime', 'dist']

packed_train_data = raw_train_data.map(
    PackNumericFeatures(NUMERIC_FEATURES))

packed_test_data = raw_test_data.map(
    PackNumericFeatures(NUMERIC_FEATURES))
Out
/tensorflow-2.0.0/python3.6/tensorflow_core/python/autograph/impl/api.py in wrapper(*args, **kwargs)
235 except Exception as e: # pylint:disable=broad-except
236 if hasattr(e, 'ag_error_metadata'):
--> 237 raise e.ag_error_metadata.to_exception(e)
238 else:
239 raise
KeyError: in converted code:
<ipython-input-19-85ea56f80c91>:6 __call__ *
numeric_features = [features.pop(name) for name in self.names]
/tensorflow-2.0.0/python3.6/tensorflow_core/python/autograph/impl/api.py:396 converted_call
return py_builtins.overload_of(f)(*args)
KeyError: 'dist'

The "in converted code" is used when autograph wraps errors that likely occur in user code. In this case, the following like is relevant:
<ipython-input-19-85ea56f80c91>:6 __call__ *
numeric_features = [features.pop(name) for name in self.names]
The error message is missing some critical information and we should fix that, but it indicates that the call to features.pop(name) raises a KeyError, so that key ('dist') is most likely missing from features.
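For reference, the tutorial's PackNumericFeatures looks roughly like the sketch below; the KeyError means one of the names in NUMERIC_FEATURES (here 'dist') does not exactly match a column that the dataset actually produced, so printing the dataset's keys before mapping is a quick way to check. raw_train_data is the dataset from the question; everything else is only illustrative.

import tensorflow as tf

class PackNumericFeatures(object):
    def __init__(self, names):
        self.names = names

    def __call__(self, features, labels):
        # Raises KeyError if any name is not a key of the features dict,
        # i.e. not a column of the loaded CSV.
        numeric_features = [features.pop(name) for name in self.names]
        numeric_features = [tf.cast(feat, tf.float32) for feat in numeric_features]
        features['numeric'] = tf.stack(numeric_features, axis=-1)
        return features, labels

# Inspect the column names the dataset actually has before mapping:
for batch, label in raw_train_data.take(1):
    print(list(batch.keys()))  # 'dist' must appear here, spelled exactly the same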

Related

When trying to create an html report, the program throws an error

When executing the code below:
from pandas_profiling import ProfileReport

profile = ProfileReport(df, title="Data Profile Report")
profile.to_file("data_profile_report.html")
Here is the exception thrown:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
c:\Projections 2022-08-16\Projections.py in <cell line: 4>()
102 # %%
103 #Creating EDA of data
104 profile = ProfileReport(df_cdap,title="CDAP Data Profile Report")
----> 105 profile.to_file("cdap_data_profile_report.html")
File c:\Users\fengq\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas_profiling\profile_report.py:257, in ProfileReport.to_file(self, output_file, silent)
254 self.config.html.assets_prefix = str(output_file.stem) + "_assets"
255 create_html_assets(self.config, output_file)
--> 257 data = self.to_html()
259 if output_file.suffix != ".html":
260 suffix = output_file.suffix
File c:\Users\fengq\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas_profiling\profile_report.py:368, in ProfileReport.to_html(self)
360 def to_html(self) -> str:
361 """Generate and return complete template as lengthy string
362 for using with frameworks.
363
(...)
366
367 """
--> 368 return self.html
...
--> 810 fig = manager.canvas.figure
811 if fig_label:
812 fig.set_label(fig_label)
AttributeError: 'NoneType' object has no attribute 'canvas'
I've tried reinstalling Python and the dependencies for pandas-profiling, but nothing has worked so far. I've also tried downgrading to Python 3.9 and using an older version of matplotlib; neither changed the error.
I notice that the error seems to come from manager.canvas.figure, but I'm not sure how to proceed from there. Any help is greatly appreciated!
The problem was resolved when I set matplotlib to inline, following comments I found on another forum. I'm still really interested to learn what causes this! Please feel free to answer and suggest other solutions; I would love to try them!
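For anyone hitting the same thing, the fix mentioned above amounts to selecting a backend whose figure manager actually provides a canvas before building the report. The snippet below is only a sketch of the two common variants (notebook magic vs. an explicit non-interactive backend); the Agg alternative is an assumption, not something from this thread.

# Inside a Jupyter/Colab notebook: select the inline backend before creating the report.
%matplotlib inline

# In a plain script, an explicit non-interactive backend may avoid the missing
# figure manager ("'NoneType' object has no attribute 'canvas'") seen above.
import matplotlib
matplotlib.use("Agg")

from pandas_profiling import ProfileReport
profile = ProfileReport(df, title="Data Profile Report")
profile.to_file("data_profile_report.html")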

Amazon Sagemaker ModelError when serving model

I have uploaded a transformer RoBERTa model to an S3 bucket. I am now trying to run inference against the model using PyTorch with the SageMaker Python SDK. I specified the model artifact s3://snet101/sent.tar.gz, which is a compressed file of the model (pytorch_model.bin) and all its dependencies. Here is the code:
model = PyTorchModel(model_data=model_artifact,
                     name=name_from_base('roberta-model'),
                     role=role,
                     entry_point='torchserve-predictor2.py',
                     source_dir='source_dir',
                     framework_version='1.4.0',
                     py_version='py3',
                     predictor_cls=SentimentAnalysis)

predictor = model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')

test_data = {"text": "How many cows are in the farm ?"}
prediction = predictor.predict(test_data)
I get the following error on the predict method of the predictor object:
ModelError Traceback (most recent call last)
<ipython-input-6-bc621eb2e056> in <module>
----> 1 prediction = predictor.predict(test_data)
~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/predictor.py in predict(self, data, initial_args, target_model, target_variant)
123
124 request_args = self._create_request_args(data, initial_args, target_model, target_variant)
--> 125 response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
126 return self._handle_response(response)
127
~/anaconda3/envs/python3/lib/python3.6/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
355 "%s() only accepts keyword arguments." % py_operation_name)
356 # The "self" in this scope is referring to the BaseClient.
--> 357 return self._make_api_call(operation_name, kwargs)
358
359 _api_call.__name__ = str(py_operation_name)
~/anaconda3/envs/python3/lib/python3.6/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
674 error_code = parsed_response.get("Error", {}).get("Code")
675 error_class = self.exceptions.from_code(error_code)
--> 676 raise error_class(parsed_response, operation_name)
677 else:
678 return parsed_response
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from model with message "Your invocation timed out while waiting for a response from container model. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again.". See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/roberta-model-2020-12-16-09-42-37-479 in account 165258297056 for more information.
I checked the server logs and found this error:
java.lang.IllegalArgumentException: reasonPhrase contains one of the following prohibited characters: \r\n: Can't load config for '/.sagemaker/mms/models/model'. Make sure that:
'/.sagemaker/mms/models/model' is a correct model identifier listed on 'https://huggingface.co/models'
or '/.sagemaker/mms/models/model' is the correct path to a directory containing a config.json file
How can I fix this?
I have the same problem; it seems like the endpoint is trying to load the pretrained model from the path '/.sagemaker/mms/models/model' and fails.
Maybe this path is not correct, or perhaps it has no access to the S3 bucket, so it cannot store the model in the given path.
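The log message comes from transformers' from_pretrained: the container extracts sent.tar.gz into a local model directory and then tries to load a config from it, so the archive has to contain config.json (and tokenizer files) next to pytorch_model.bin. The question doesn't show torchserve-predictor2.py, so the model_fn below is only a sketch of what an entry point that loads from the extracted directory typically looks like, not the actual script.

# Hypothetical model_fn for the entry-point script (torchserve-predictor2.py):
# SageMaker extracts sent.tar.gz into model_dir, so load from that directory
# rather than from a Hugging Face hub model id.
from transformers import RobertaForSequenceClassification, RobertaTokenizer

def model_fn(model_dir):
    # Fails with "Can't load config for ..." if config.json is missing from the archive.
    tokenizer = RobertaTokenizer.from_pretrained(model_dir)
    model = RobertaForSequenceClassification.from_pretrained(model_dir)
    model.eval()
    return model, tokenizer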

What is OSError: [Errno 95] Operation not supported for pandas to_csv on colab?

My input is:
test=pd.read_csv("/gdrive/My Drive/data-kaggle/sample_submission.csv")
test.head()
It ran as expected.
But, for
test.to_csv('submitV1.csv', header=False)
The full error message that I got was:
OSError                                   Traceback (most recent call last)
<ipython-input-5-fde243a009c0> in <module>()
      9 from google.colab import files
     10 print(test)'''
---> 11 test.to_csv('submitV1.csv', header=False)
     12 files.download('/gdrive/My Drive/data-kaggle/submission/submitV1.csv')

2 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, line_terminator, chunksize, tupleize_cols, date_format, doublequote, escapechar, decimal)
   3018                                  doublequote=doublequote,
   3019                                  escapechar=escapechar, decimal=decimal)
-> 3020             formatter.save()
   3021
   3022         if path_or_buf is None:

/usr/local/lib/python3.6/dist-packages/pandas/io/formats/csvs.py in save(self)
    155             f, handles = _get_handle(self.path_or_buf, self.mode,
    156                                      encoding=self.encoding,
--> 157                                      compression=self.compression)
    158             close = True
    159

/usr/local/lib/python3.6/dist-packages/pandas/io/common.py in _get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text)
    422     elif encoding:
    423         # Python 3 and encoding
--> 424         f = open(path_or_buf, mode, encoding=encoding, newline="")
    425     elif is_text:
    426         # Python 3 and no explicit encoding

OSError: [Errno 95] Operation not supported: 'submitV1.csv'
Additional Information about the error:
Before running this command, if I run
df=pd.DataFrame()
df.to_csv("file.csv")
files.download("file.csv")
That runs properly, but the same code produces the "operation not supported" error if I run it after trying to convert the test data frame to a CSV file.
I am also getting the message "A Google Drive timeout has occurred (most recently at 13:02:43)" just before running the command.
You are currently in a directory in which you don't have write permissions.
Check your current directory with pwd. It is probably /gdrive or some directory inside it, which is why you are unable to save there.
Now change the current working directory to some other directory where you have permission to write; cd ~ will work fine. It will change the directory to /root.
Now you can use:
test.to_csv('submitV1.csv', header=False)
It will save 'submitV1.csv' to /root.
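As a rough sketch of the same fix inside a Colab cell (paths are illustrative, and test is the DataFrame from the question):

import os
from google.colab import files

print(os.getcwd())                    # check where you are; it may be a read-only location
os.chdir('/root')                     # same effect as `cd ~` in a notebook cell
test.to_csv('submitV1.csv', header=False)
files.download('/root/submitV1.csv')  # fetch the file to your local machine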

pandas read_html error: unexpected keyword argument 'max_rows'

I'm trying to write a scraper for option prices from Yahoo Finance. The code below works and even gives the correct output. The problem is that right before the output, I get the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~/anaconda3/lib/python3.7/site-packages/IPython/core/formatters.py in __call__(self, obj)
343 method = get_real_method(obj, self.print_method)
344 if method is not None:
--> 345 return method()
346 return None
347 else:
~/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in _repr_html_(self)
694 See Also
695 --------
--> 696 to_html : Convert DataFrame to HTML.
697
698 Examples
~/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in to_html(self, buf, columns, col_space, header, index, na_rep, formatters, float_format, sparsify, index_names, justify, bold_rows, classes, escape, max_rows, max_cols, show_dimensions, notebook, decimal, border, table_id)
2035 Dictionary mapping columns containing datetime types to stata
2036 internal format to use when writing the dates. Options are 'tc',
-> 2037 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer
2038 or a name. Datetime columns that do not have a conversion type
2039 specified will be converted to 'tc'. Raises NotImplementedError if
~/anaconda3/lib/python3.7/site-packages/pandas/io/formats/format.py in to_html(self, classes, notebook, border)
751 need_leadsp = dict(zip(fmt_columns, map(is_numeric_dtype, dtypes)))
752
--> 753 def space_format(x, y):
754 if (y not in self.formatters and
755 need_leadsp[x] and not restrict_formatting):
TypeError: __init__() got an unexpected keyword argument 'max_rows'
I tried researching the cause of the error in various Stack Overflow questions, as well as in the GitHub repo of the pandas library. The closest thing I found was in the "What's new" section of pandas 0.24.0: "max_rows and max_cols parameters removed from HTMLFormatter since truncation is handled by DataFrameFormatter (GH23818)".
My code is as follows:
import lxml.html
import lxml.etree
import pandas as pd
import requests
from time import sleep

ticker = 'AAPL'
url = "http://finance.yahoo.com/quote/%s/options?p=%s" % (ticker, ticker)
response = requests.get(url, verify=False)
print("Parsing %s" % (url))
sleep(15)
parser = lxml.html.fromstring(response.text)
tables = parser.xpath('//table')
print(len(tables))
puts = lxml.etree.tostring(tables[1], method='html')
df = pd.read_html(puts, flavor='bs4')[0]
df.tail()
df.tail() correctly shows the last 5 rows of the table, but I can't seem to get rid of the error. Also, every time I use the dataframe I get a correct result, but the error is shown again.
Thanks in advance for helping with my error.
For future reference:
The error was caused by the Anaconda install of the packages.
Installing the packages with pip makes the error go away.
BR
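A quick sanity check (not from the original answer, just a suggestion): the traceback starts in IPython's rich display calling DataFrame._repr_html_, so confirming that pandas and IPython come from the same installation, and from the locations you expect, can reveal the conda/pip mix-up described above.

import pandas as pd
import IPython

# A mismatch here (e.g. one package from conda, one from pip, or stale versions)
# is consistent with the "unexpected keyword argument 'max_rows'" failure above.
print(pd.__version__, pd.__file__)
print(IPython.__version__, IPython.__file__)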

Accessing S3 from Dask.bag

As the title suggests, I'm trying to use a dask.bag to read a single file from S3 on an EC2 instance:
from distributed import Executor, progress
from dask import delayed
import dask
import dask.bag as db
data = db.read_text('s3://pycuda-euler-data/Ba10k.sim1.fq')
I get a very long error:
---------------------------------------------------------------------------
ClientError Traceback (most recent call last)
/home/ubuntu/anaconda3/lib/python3.5/site-packages/s3fs/core.py in info(self, path, refresh)
322 bucket, key = split_path(path)
--> 323 out = self.s3.head_object(Bucket=bucket, Key=key)
324 out = {'ETag': out['ETag'], 'Key': '/'.join([bucket, key]),
/home/ubuntu/anaconda3/lib/python3.5/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
277 # The "self" in this scope is referring to the BaseClient.
--> 278 return self._make_api_call(operation_name, kwargs)
279
/home/ubuntu/anaconda3/lib/python3.5/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
571 if http.status_code >= 300:
--> 572 raise ClientError(parsed_response, operation_name)
573 else:
ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
During handling of the above exception, another exception occurred:
FileNotFoundError Traceback (most recent call last)
<ipython-input-43-0ad435c69ecc> in <module>()
4 #data = db.read_text('/Users/zen/Code/git/sra_data.fastq')
5 #data = db.read_text('/Users/zen/Code/git/pycuda-euler/data/Ba10k.sim1.fq')
----> 6 data = db.read_text('s3://pycuda-euler-data/Ba10k.sim1.fq', blocksize=900000)
/home/ubuntu/anaconda3/lib/python3.5/site-packages/dask/bag/text.py in read_text(urlpath, blocksize, compression, encoding, errors, linedelimiter, collection, storage_options)
89 _, blocks = read_bytes(urlpath, delimiter=linedelimiter.encode(),
90 blocksize=blocksize, sample=False, compression=compression,
---> 91 **(storage_options or {}))
92 if isinstance(blocks[0], (tuple, list)):
93 blocks = list(concat(blocks))
/home/ubuntu/anaconda3/lib/python3.5/site-packages/dask/bytes/core.py in read_bytes(urlpath, delimiter, not_zero, blocksize, sample, compression, **kwargs)
210 return read_bytes(storage_options.pop('path'), delimiter=delimiter,
211 not_zero=not_zero, blocksize=blocksize, sample=sample,
--> 212 compression=compression, **storage_options)
213
214
/home/ubuntu/anaconda3/lib/python3.5/site-packages/dask/bytes/s3.py in read_bytes(path, s3, delimiter, not_zero, blocksize, sample, compression, **kwargs)
91 offsets = [0]
92 else:
---> 93 size = getsize(s3_path, compression, s3)
94 offsets = list(range(0, size, blocksize))
95 if not_zero:
/home/ubuntu/anaconda3/lib/python3.5/site-packages/dask/bytes/s3.py in getsize(path, compression, s3)
185 def getsize(path, compression, s3):
186 if compression is None:
--> 187 return s3.info(path)['Size']
188 else:
189 with s3.open(path, 'rb') as f:
/home/ubuntu/anaconda3/lib/python3.5/site-packages/s3fs/core.py in info(self, path, refresh)
327 return out
328 except (ClientError, ParamValidationError):
--> 329 raise FileNotFoundError(path)
330
331 def _walk(self, path, refresh=False):
FileNotFoundError: pycuda-euler-data/Ba10k.sim1.fq
As far as I can tell, this is exactly what the docs say to do; unfortunately, many examples I see online use the older from_s3() method, which no longer exists.
However, I am able to access the file using s3fs:
sample, partitions = s3.read_bytes('pycuda-euler-data/Ba10k.sim1.fq', s3=s3files, delimiter=b'\n')
sample
b'#gi|30260195|ref|NC_003997.3|_5093_5330_1:0:0_1:0:0_0/1\nGATAACTCGATTTAAACCAGATCCAGAAAATTTTCA\n+\n222222222222222222222222222222222222\n#gi|30260195|ref|NC_003997.3|_7142_7326_1:1:0_0:0:0_1/1\nCTATTCCGCCGCATCAACTTGGTGAAGTAATGGATG\n+\n222222222222222222222222222222222222\n#gi|30260195|ref|NC_003997.3|_5524_5757_3:0:0_2:0:0_2/1\nGTAATTTAACTGGTGAGGACGTGCGTGATGGTTTAT\n+\n222222222222222222222222222222222222\n#gi|30260195|ref|NC_003997.3|_2706_2926_1:0:0_3:0:0_3/1\nAGTAAAACAGATATTTTTGTAAATAGAAAAGAATTT\n+\n222222222222222222222222222222222222\n#gi|30260195|ref|NC_003997.3|_500_735_3:1:0_0:0:0_4/1\nATACTCTGTGGTAAATGATTAGAATCATCTTGTGCT\n+\n222222222222222222222222222222222222\n#gi|30260195|ref|NC_003997.3|_2449_2653_3:0:0_1:0:0_5/1\nCTTGAATTGCTACAGATAGTCATAGGTTAGCCCTTC\n+\n222222222222222222222222222222222222\n#gi|30260195|ref|NC_003997.3|_3252_3460_0:0:0_0:0:0_6/1\nCGATGTAATTGATACAGGTGGCGCTGTAAAATGGTT\n+\n222222222222222222222222222222222222\n#gi|30260195|ref|NC_003997.3|_1860_2095_0:0:0_1:0:0_7/1\nATAAAAGATTCAATCGAAATATCAGCATCGTTTCCT\n+\n222222222222222222222222222222222222\n#gi|30260195|ref|NC_003997.3|_870_1092_1:0:0_0:0:0_8/1\nTTGGAAAAACCCATTTAATGCATGCAATTGGCCTTT\n ... etc.
What am I doing wrong?
EDIT:
Following a suggestion, I went back and checked permissions. On the bucket I granted Everyone the List permission, and on the file I granted Everyone Open/Download. I still get the same error.
Dask uses the s3fs library to manage data on S3, and the s3fs project uses Amazon's boto3. You can provide credentials in two ways:
Use a .boto file
You can put a .boto file in your home directory.
Use the storage_options= keyword
You can add a storage_options= keyword to your db.read_text call to include credential information by hand. This option is a dictionary whose values will be added to the s3fs.S3FileSystem constructor.
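A minimal sketch of the storage_options= route, assuming you pass your own credentials by hand (the key and secret strings below are placeholders for your AWS credentials):

import dask.bag as db

data = db.read_text(
    's3://pycuda-euler-data/Ba10k.sim1.fq',
    storage_options={'key': 'YOUR_AWS_ACCESS_KEY_ID',
                     'secret': 'YOUR_AWS_SECRET_ACCESS_KEY'})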