OSError: [Errno -128] NetCDF: Attempt to use feature that was not turned on when netCDF was built.: b'/content/test.hdf' - google-colaboratory

I downloaded MODIS data from its HTTP server and was trying to load it with xarray on Google Colab. I added the .netrc file, and the file is not corrupted, since running gdalinfo on it gives no error.
import requests

URL = 'https://e4ftl01.cr.usgs.gov/MOLT/MOD09GA.061/2019.02.24/MOD09GA.A2019055.h09v07.061.2020288120208.hdf'
result = requests.get(URL)
filename = 'test.hdf'
with open(filename, 'wb') as f:
    f.write(result.content)
When I run
xr.open_dataset('test.hdf', engine='netcdf4')
I get this error:
KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('/content/test1.hdf',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False))]
During handling of the above exception, another exception occurred:
OSError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/xarray/backends/file_manager.py in _acquire_with_cache_info(self, needs_lock)
203 kwargs = kwargs.copy()
204 kwargs["mode"] = self._mode
--> 205 file = self._opener(*self._args, **kwargs)
206 if self._mode == "w":
207 # ensure file doesn't get overriden when opened again
src/netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.__init__()
src/netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()
OSError: [Errno -128] NetCDF: Attempt to use feature that was not turned on when netCDF was built.: b'/content/test.hdf'
What is this error about, and how can I resolve it?
netcdf4 version 1.5.8
xarray version 0.18.1
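The message in the last line hints at a likely cause: the netCDF-C library behind the netcdf4 engine was built without support for the format being opened, and MODIS .hdf files are HDF4/HDF-EOS, which prebuilt netCDF packages typically do not enable. Since gdalinfo can read the file, one possible workaround is to open it through GDAL instead of netCDF. A minimal sketch, assuming rioxarray is installed and its GDAL build has HDF4 support (which the successful gdalinfo run suggests):
# !pip install rioxarray   # assumed; pulls in rasterio/GDAL
import rioxarray

# MODIS HDF-EOS files expose their layers as GDAL subdatasets, so this call
# may return a list of Datasets rather than a single Dataset.
ds = rioxarray.open_rasterio('test.hdf')
print(ds)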

Related

Cannot download dataset files from Kaggle to Google Colab

import pandas as pd

def load_kaggle_csv(dataset_name, file_name):
    dataset_url = f"https://www.kaggle.com/{dataset_name}/download"
    !kaggle datasets download -d $dataset_url -f $file_name
    return pd.read_csv(file_name)

dataset_name = "datasets/diegosilvadefrana/2023-data-scientists-jobs-descriptions"
file_name = "Jobs.csv"
df = load_kaggle_csv(dataset_name, file_name)
Error:
Invalid dataset specification https://www.kaggle.com/datasets/diegosilvadefrana/2023-data-scientists-jobs-descriptions/download
FileNotFoundError Traceback (most recent call last)
in
9 file_name = "Jobs.csv"
10
---> 11 df = load_kaggle_csv(dataset_name, file_name)
8 frames
/usr/local/lib/python3.8/dist-packages/pandas/io/common.py in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
700 if ioargs.encoding and "b" not in ioargs.mode:
701 # Encoding
--> 702 handle = open(
703 handle,
704 ioargs.mode,
FileNotFoundError: [Errno 2] No such file or directory: 'Jobs.csv'
Here's a workaround:
! kaggle datasets download diegosilvadefrana/2023-data-scientists-jobs-descriptions
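A reusable helper along the same lines, sketched under the assumption that the kaggle CLI is configured with an API token; the key change is passing the owner/dataset slug (not the full kaggle.com URL that triggered "Invalid dataset specification") and letting --unzip extract the CSV before pandas reads it:
import pandas as pd

def load_kaggle_csv(dataset_slug, file_name):
    # Pass the owner/dataset slug, not the full https://www.kaggle.com/... URL
    !kaggle datasets download -d $dataset_slug --unzip
    return pd.read_csv(file_name)

df = load_kaggle_csv("diegosilvadefrana/2023-data-scientists-jobs-descriptions", "Jobs.csv")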

Error building model "ner_ontonotes_bert_mult" in DeepPavlov

I'm trying to build the model "ner_ontonotes_bert_mult" in Google Colab using the example from the documentation:
from deeppavlov import build_model, configs
ner_model = build_model(configs.ner.ner_ontonotes_bert_mult, download=True)
but I get this error:
TypeError: __init__() got an unexpected keyword argument 'num_tags'
P.S. If I try to load another model (e.g. "ner_rus_bert"), the error does not appear.
Full error (maybe the error is related to /packages/deeppavlov/models/torch_bert/crf.py):
2022-12-17 13:08:23.235 INFO in 'deeppavlov.download'['download'] at line 138: Skipped http://files.deeppavlov.ai/v1/ner/ner_ontonotes_bert_mult_torch_crf.tar.gz download because of matching hashes
Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForTokenClassification: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-multilingual-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
2022-12-17 13:08:30.1 ERROR in 'deeppavlov.core.common.params'['params'] at line 108: Exception in <class 'deeppavlov.models.torch_bert.torch_transformers_sequence_tagger.TorchTransformersSequenceTagger'>
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/deeppavlov/core/common/params.py", line 102, in from_params
component = obj(**dict(config_params, **kwargs))
File "/usr/local/lib/python3.8/dist-packages/deeppavlov/models/torch_bert/torch_transformers_sequence_tagger.py", line 182, in __init__
super().__init__(optimizer=optimizer,
File "/usr/local/lib/python3.8/dist-packages/deeppavlov/core/models/torch_model.py", line 98, in __init__
self.load()
File "/usr/local/lib/python3.8/dist-packages/deeppavlov/models/torch_bert/torch_transformers_sequence_tagger.py", line 295, in load
self.crf = CRF(self.n_classes).to(self.device)
File "/usr/local/lib/python3.8/dist-packages/deeppavlov/models/torch_bert/crf.py", line 13, in __init__
super().__init__(num_tags=num_tags, batch_first=batch_first)
TypeError: __init__() got an unexpected keyword argument 'num_tags'
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-75-5e156706e7e4> in <module>
1 # from deeppavlov import configs, build_model
2
----> 3 ner_model = build_model(configs.ner.ner_ontonotes_bert_mult, download=True)
5 frames
/usr/local/lib/python3.8/dist-packages/deeppavlov/core/commands/infer.py in build_model(config, mode, load_trained, install, download)
53 .format(component_config.get('class_name', component_config.get('ref', 'UNKNOWN'))))
54
---> 55 component = from_params(component_config, mode=mode)
56
57 if 'id' in component_config:
/usr/local/lib/python3.8/dist-packages/deeppavlov/core/common/params.py in from_params(params, mode, **kwargs)
100 kwargs['mode'] = mode
101
--> 102 component = obj(**dict(config_params, **kwargs))
103 try:
104 _refs[config_params['id']] = component
/usr/local/lib/python3.8/dist-packages/deeppavlov/models/torch_bert/torch_transformers_sequence_tagger.py in __init__(self, n_tags, pretrained_bert, bert_config_file, attention_probs_keep_prob, hidden_keep_prob, optimizer, optimizer_parameters, learning_rate_drop_patience, learning_rate_drop_div, load_before_drop, clip_norm, min_learning_rate, use_crf, **kwargs)
180 self.use_crf = use_crf
181
--> 182 super().__init__(optimizer=optimizer,
183 optimizer_parameters=optimizer_parameters,
184 learning_rate_drop_patience=learning_rate_drop_patience,
/usr/local/lib/python3.8/dist-packages/deeppavlov/core/models/torch_model.py in __init__(self, device, optimizer, optimizer_parameters, lr_scheduler, lr_scheduler_parameters, learning_rate_drop_patience, learning_rate_drop_div, load_before_drop, min_learning_rate, *args, **kwargs)
96 self.opt = deepcopy(kwargs)
97
---> 98 self.load()
99 # we need to switch to eval mode here because by default it's in `train` mode.
100 # But in case of `interact/build_model` usage, we need to have model in eval mode.
/usr/local/lib/python3.8/dist-packages/deeppavlov/models/torch_bert/torch_transformers_sequence_tagger.py in load(self, fname)
293 self.model.to(self.device)
294 if self.use_crf:
--> 295 self.crf = CRF(self.n_classes).to(self.device)
296
297 self.optimizer = getattr(torch.optim, self.optimizer_name)(
/usr/local/lib/python3.8/dist-packages/deeppavlov/models/torch_bert/crf.py in __init__(self, num_tags, batch_first)
11
12 def __init__(self, num_tags: int, batch_first: bool = False) -> None:
---> 13 super().__init__(num_tags=num_tags, batch_first=batch_first)
14 nn.init.zeros_(self.transitions)
15 nn.init.zeros_(self.start_transitions)
TypeError: __init__() got an unexpected keyword argument 'num_tags'
Make sure that you are using the latest version of DeepPavlov by running:
!pip install deeppavlov
Then import all the required packages:
from deeppavlov import configs, build_model
Install the model's requirements and download the pretrained model:
ner_model = build_model(configs.ner.ner_ontonotes_bert_mult, download=True, install=True)
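Put together, a minimal Colab cell might look like the sketch below; --upgrade is added here (it is not in the steps above) so that an already-installed older version gets replaced, and the final call is just an illustrative usage example:
!pip install --upgrade deeppavlov

from deeppavlov import configs, build_model

# download=True fetches the pretrained weights,
# install=True installs the model's own requirements
ner_model = build_model(configs.ner.ner_ontonotes_bert_mult, download=True, install=True)

# illustrative call: returns the tokens and their predicted tags
print(ner_model(["Bob Ross lived in Florida"]))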
You can get more information about this model and many others in our recent Medium article.

s3fs timeout issue on an AWS Lambda function within a VPN

s3fs seems to fail from time to time when reading from an S3 bucket using an AWS Lambda function within a VPN. I am using s3fs==0.4.0 and pandas==1.0.1.
import s3fs
import pandas as pd

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    s3_file = event['Records'][0]['s3']['object']['key']
    s3fs.S3FileSystem.connect_timeout = 1800
    s3fs.S3FileSystem.read_timeout = 1800
    with s3fs.S3FileSystem(anon=False).open(f"s3://{bucket}/{s3_file}", 'rb') as f:
        data = pd.read_json(f)
The stacktrace is the following:
Traceback (most recent call last):
File "/var/task/urllib3/connection.py", line 157, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw
File "/var/task/urllib3/util/connection.py", line 84, in create_connection
raise err
File "/var/task/urllib3/util/connection.py", line 74, in create_connection
sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/var/task/botocore/httpsession.py", line 263, in send
chunked=self._chunked(request.headers),
File "/var/task/urllib3/connectionpool.py", line 720, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "/var/task/urllib3/util/retry.py", line 376, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/var/task/urllib3/packages/six.py", line 735, in reraise
raise value
File "/var/task/urllib3/connectionpool.py", line 672, in urlopen
chunked=chunked,
File "/var/task/urllib3/connectionpool.py", line 376, in _make_request
self._validate_conn(conn)
File "/var/task/urllib3/connectionpool.py", line 994, in _validate_conn
conn.connect()
File "/var/task/urllib3/connection.py", line 300, in connect
conn = self._new_conn()
File "/var/task/urllib3/connection.py", line 169, in _new_conn
self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <botocore.awsrequest.AWSHTTPSConnection object at 0x7f4d578e3ed0>: Failed to establish a new connection: [Errno 110] Connection timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/var/task/botocore/endpoint.py", line 200, in _do_get_response
http_response = self._send(request)
File "/var/task/botocore/endpoint.py", line 244, in _send
return self.http_session.send(request)
File "/var/task/botocore/httpsession.py", line 283, in send
raise EndpointConnectionError(endpoint_url=request.url, error=e)
botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "https://my_bucket.s3.eu-west-1.amazonaws.com/?list-type=2&prefix=my_folder%2Fsomething%2F&delimiter=%2F&encoding-type=url"
Has anyone faced this same issue? Why would it fail only sometimes? Is there an s3fs configuration that could help with this specific issue?
Actually, there was no problem at all with s3fs. It turned out we were using a Lambda function with two subnets within the VPC; one worked normally, but the other wasn't allowed to access S3 resources, so whenever a Lambda instance was spawned in the second subnet it couldn't connect at all.
Fixing this issue was as easy as removing the second subnet.
You could also use boto3, which is supported by AWS, to get the JSON from S3.
import json
import boto3

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']
    s3_resource = boto3.resource('s3')
    file_object = s3_resource.Object(bucket, key)
    json_content = json.loads(file_object.get()['Body'].read())

Restoring official Tensorflow Resnet-50 Checkpoint gives 'ExperimentalFunctionBufferingResource' Error

When trying to reload the ResNet-50 checkpoint from the official TensorFlow models here:
http://download.tensorflow.org/models/official/20181001_resnet/checkpoints/resnet_imagenet_v1_fp32_20181001.tar.gz
...using this code:
import os
import tensorflow as tf

print(tf.__version__)
saver = tf.train.import_meta_graph(os.path.join(
    'resnet_imagenet_v1_fp32_20181001',
    'model.ckpt-225207.meta'))
I get this error:
1.13.1
Traceback (most recent call last):
File "chehckpoint_to_savedmodel.py", line 11, in <module>
'model.ckpt-225207.meta'))
File "/Users/*user*/Library/Python/3.7/lib/python/site-packages/tensorflow/python/training/saver.py", line 1435, in import_meta_graph
meta_graph_or_file, clear_devices, import_scope, **kwargs)[0]
File "/Users/*user*/Library/Python/3.7/lib/python/site-packages/tensorflow/python/training/saver.py", line 1457, in _import_meta_graph_with_return_elements
**kwargs))
File "/Users/*user*/Library/Python/3.7/lib/python/site-packages/tensorflow/python/framework/meta_graph.py", line 806, in import_scoped_meta_graph_with_return_elements
return_elements=return_elements)
File "/Users/*user*/Library/Python/3.7/lib/python/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/Users/*user*/Library/Python/3.7/lib/python/site-packages/tensorflow/python/framework/importer.py", line 399, in import_graph_def
_RemoveDefaultAttrs(op_dict, producer_op_list, graph_def)
File "/Users/*user*/Library/Python/3.7/lib/python/site-packages/tensorflow/python/framework/importer.py", line 159, in _RemoveDefaultAttrs
op_def = op_dict[node.op]
KeyError: 'ExperimentalFunctionBufferingResource'
Funny that googling "KeyError: 'ExperimentalFunctionBufferingResource'" returns zero hits. That's a first.
Ideas?
Not sure how else to reload this model. I also tried this:
path = os.path.join(
    'resnet_imagenet_v1_fp32_20181001',
    'model.ckpt-225207')
checkpoint = tf.train.Checkpoint()
status = checkpoint.restore(path)
print(status)
status.assert_consumed()
But it fails the assertion with no other information.
Thanks in advance.
P
This seems to be an issue with TF >= 1.13. Try downgrading to 1.12; it should work.
Issues to track: #29751
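A minimal sketch of that suggestion, assuming an environment where TensorFlow 1.12 can still be installed (e.g. Python 3.6/3.7):
# pip install "tensorflow==1.12.0"
import os
import tensorflow as tf

print(tf.__version__)  # expect 1.12.0

# 1.12 should still know the ExperimentalFunctionBufferingResource op that the
# 1.13 importer cannot find, so importing the meta graph should succeed.
saver = tf.train.import_meta_graph(os.path.join(
    'resnet_imagenet_v1_fp32_20181001',
    'model.ckpt-225207.meta'))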

Error downloading PDF files

I have the following (simplified) code:
import os
import scrapy

class TestSpider(scrapy.Spider):
    name = 'test_spider'
    start_urls = ['http://www.pdf995.com/samples/pdf.pdf', ]

    def parse(self, response):
        save_path = 'test'
        file_name = 'test.pdf'
        self.save_page(response, save_path, file_name)

    def save_page(self, response, save_dir, file_name):
        os.makedirs(save_dir, exist_ok=True)
        with open(os.path.join(save_dir, file_name), 'wb') as afile:
            afile.write(response.body)
When I run it, I get this error:
[scrapy.core.scraper] ERROR: Error downloading <GET http://www.pdf995.com/samples/pdf.pdf>
Traceback (most recent call last):
File "C:\Python36\lib\site-packages\twisted\internet\defer.py", line 1301, in _inlineCallbacks
result = g.send(result)
File "C:\Python36\lib\site-packages\scrapy\core\downloader\middleware.py", line 43, in process_request
defer.returnValue((yield download_func(request=request,spider=spider)))
File "C:\Python36\lib\site-packages\twisted\internet\defer.py", line 1278, in returnValue
raise _DefGen_Return(val)
twisted.internet.defer._DefGen_Return: <200 http://www.pdf995.com/samples/pdf.pdf>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Python36\lib\site-packages\twisted\internet\defer.py", line 1301, in _inlineCallbacks
result = g.send(result)
File "C:\Python36\lib\site-packages\scrapy\core\downloader\middleware.py", line 53, in process_response
spider=spider)
File "C:\Python36\lib\site-packages\scrapy_beautifulsoup\middleware.py", line 16, in process_response
return response.replace(body=str(BeautifulSoup(response.body, self.parser)))
File "C:\Python36\lib\site-packages\scrapy\http\response\__init__.py", line 79, in replace
return cls(*args, **kwargs)
File "C:\Python36\lib\site-packages\scrapy\http\response\__init__.py", line 20, in __init__
self._set_body(body)
File "C:\Python36\lib\site-packages\scrapy\http\response\__init__.py", line 55, in _set_body
"Response body must be bytes. "
TypeError: Response body must be bytes. If you want to pass unicode body use TextResponse or HtmlResponse.
Do I need to introduce a middleware or something else to handle this? This looks like it should be valid, at least judging by other examples.
Note: at the moment I'm not using a pipeline because in my real spider I have a lot of checks on whether the related item has been scraped, whether this PDF belongs to the item, and whether a PDF with a custom name was already downloaded. And as mentioned, many samples do what I'm doing here, so I thought it would be simpler and would work.
The issue is caused by your own scrapy_beautifulsoup\middleware.py, which tries to replace every response body with a string via response.replace(body=str(BeautifulSoup(response.body, self.parser))).
You need to correct that middleware, and that should fix the issue.
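One way to correct it, sketched under the assumption that the middleware looks roughly like the traceback implies (the class name and constructor here are hypothetical; self.parser is taken from the traceback). The key point is to leave non-HTML responses alone and to hand Response.replace() a bytes body:
# scrapy_beautifulsoup/middleware.py (hypothetical sketch)
from bs4 import BeautifulSoup
from scrapy.http import HtmlResponse

class BeautifulSoupMiddleware:
    def __init__(self, parser='html.parser'):
        self.parser = parser

    def process_response(self, request, response, spider):
        # Binary responses (PDFs, images, ...) must pass through untouched,
        # otherwise Response.replace() is handed a str and raises
        # "Response body must be bytes."
        if not isinstance(response, HtmlResponse):
            return response
        soup = BeautifulSoup(response.body, self.parser)
        return response.replace(body=str(soup).encode(response.encoding))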