S3ResponseError: 403 Forbidden. An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist - amazon-s3

import boto
from boto.s3.key import Key

try:
    conn = boto.connect_s3(access_key, secret_access_key)
    bucket = conn.get_bucket(bucket_name, validate=False)
    k1 = Key(bucket)
    k1.key = 'Date_Table.csv'
    # k = bucket.get_key('Date_Table.csv')
    k1.make_public()
    k1.get_contents_to_filename(tar)
except Exception as e:
    print(e)
I am getting this error:
S3ResponseError: 403 Forbidden
Code: AccessDenied, Message: Access Denied, RequestId: D9ED8BFF6D6A993E, HostId: aw0KmxskATNBTDUEo3SZdwrNVolAnrt9/pkO/EGlq6X9Gxf36fQiBAWQA7dBSjBNZknMxWDG9GI=
I have tried every possibility and still get the same error. Please guide me on how to solve this issue.
I also tried another way, as below, and get this error:
An error occurred (NoSuchKey) when calling the GetObject operation:
The specified key does not exist.
session = boto3.session.Session(aws_access_key_id=access_key, aws_secret_access_key=secret_access_key,region_name='us-west-2')
print ("session:"+str(session)+"\n")
client = session.client('s3', endpoint_url=s3_url)
print ("client:"+str(client)+"\n")
stuff = client.get_object(Bucket=bucket_name, Key='Date_Table.csv')
print ("stuff:"+str(stuff)+"\n")
stuff.download_file(local_filename)

Always use boto3; boto is deprecated.
As long as you have set up AWS CLI credentials, you don't need to pass hard-coded credentials. Read the boto3 credential setup documentation thoroughly.
There is no reason to initiate a boto3.session.Session unless you are using a different region or user profile.
Take your time and study the difference between the service client (boto3.client) and service resource (boto3.resource) interfaces.
The low-level boto3.client is easier to use for experiments. Use the high-level boto3.resource if you need to pass around arbitrary objects.
Here is simple code for boto3.client("s3").download_file:
import boto3
# initiate the proper AWS services client, i.e. S3
s3 = boto3.client("s3")
s3.download_file('your_bucket_name', 'Date_Table.csv', '/your/local/path/and/filename')
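If you want the resource interface mentioned above instead, a minimal sketch of the same download with boto3.resource would be (the bucket name and local path are placeholders):
import boto3

# Same download via the high-level resource interface
s3 = boto3.resource("s3")
s3.Bucket("your_bucket_name").download_file("Date_Table.csv", "/your/local/path/and/filename")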

Timeout error when writing large amounts of data to big query

I am getting the following error when trying to write large amounts of data to BigQuery using the client.insert_rows_json() method:
google.api_core.exceptions.RetryError: Deadline of 600.0s exceeded while calling target function, last exception: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))
I have tried modifying the timeout parameter in the following way:
client.insert_rows_json(*args, timeout=1000000)
but I still get the same timeout error where the deadline is still at 600.0s.
Is there some way to establish the client with:
credentials = service_account.Credentials.from_service_account_info(service_account_json)
client = bigquery.Client(credentials=credentials, project=credentials.project_id)
and specify how long before timeout should occur?
Please try this solution
from google.cloud import bigquery
from google.oauth2 import service_account
# Load the service account credentials
service_account_json = 'path/to/service_account.json'
credentials = service_account.Credentials.from_service_account_file(service_account_json)
# Create the client with the desired timeout value
client = bigquery.Client(
    credentials=credentials,
    project=credentials.project_id,
    default_query_job_config=bigquery.QueryJobConfig(
        timeout=1800,  # Set the timeout to 1800 seconds (30 minutes)
    ),
)
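If the RetryError still reports a 600.0s deadline, that limit most likely comes from the retry policy attached to insert_rows_json rather than the per-request timeout. A minimal sketch of raising the retry deadline on the call itself (the table id and rows below are placeholders, not from the question):
from google.cloud import bigquery
from google.cloud.bigquery.retry import DEFAULT_RETRY

client = bigquery.Client()

# Placeholder table id and rows, for illustration only
rows = [{"some_column": "some_value"}]

# Raise the retry deadline (the source of the 600.0s limit) along with the
# per-request timeout when streaming the rows
errors = client.insert_rows_json(
    "my_project.my_dataset.my_table",
    rows,
    retry=DEFAULT_RETRY.with_deadline(1800),  # allow retries for up to 30 minutes
    timeout=300,  # per-request timeout in seconds
)
print(errors)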

Load from GCS to GBQ causes an internal BigQuery error

My application creates thousands of "load jobs" daily to load data from Google Cloud Storage URIs to BigQuery and only a few cases causing the error:
"Finished with errors. Detail: An internal error occurred and the request could not be completed. This is usually caused by a transient issue. Retrying the job with back-off as described in the BigQuery SLA should solve the problem: https://cloud.google.com/bigquery/sla. If the error continues to occur please contact support at https://cloud.google.com/support. Error: 7916072"
The application is written on Python and uses libraries:
google-cloud-storage==1.42.0
google-cloud-bigquery==2.24.1
google-api-python-client==2.37.0
Load job is done by calling
load_job = self._client.load_table_from_uri(
    source_uris=source_uri,
    destination=destination,
    job_config=job_config,
)
this method has a default param:
retry: retries.Retry = DEFAULT_RETRY,
so the job should automatically retry on such errors.
Id of specific job that finished with error:
"load_job_id": "6005ab89-9edf-4767-aaf1-6383af5e04b6"
"load_job_location": "US"
after getting the error the application recreates the job, but it doesn't help.
Subsequent failed job ids:
5f43a466-14aa-48cc-a103-0cfb4e0188a2
43dc3943-4caa-4352-aa40-190a2f97d48d
43084fcd-9642-4516-8718-29b844e226b1
f25ba358-7b9d-455b-b5e5-9a498ab204f7
...
As mentioned in the error message, wait according to the back-off requirements described in the BigQuery Service Level Agreement, then try the operation again.
If the error continues to occur and you have a support plan, please create a new GCP support case. Otherwise, you can open a new issue on the issue tracker describing your problem. You can also try to reduce the frequency of this error by using Reservations.
For more information about the error messages, you can refer to this document.
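Note also that the retry argument only retries the request that creates the job; a job that later finishes with an internal error is not resubmitted by it. A minimal client-side back-off sketch, assuming source_uri, destination and job_config are defined as in the question (the exact exception class raised by result() can vary):
import time

from google.api_core.exceptions import GoogleAPICallError
from google.cloud import bigquery

client = bigquery.Client()

def load_with_backoff(source_uri, destination, job_config, max_attempts=5):
    # Resubmit the load job with exponential back-off when it fails
    for attempt in range(max_attempts):
        load_job = client.load_table_from_uri(
            source_uris=source_uri,
            destination=destination,
            job_config=job_config,
        )
        try:
            load_job.result()  # wait for the job; raises if it finished with errors
            return load_job
        except GoogleAPICallError as exc:
            print("Attempt %d failed: %s" % (attempt + 1, exc))
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError("Load job still failing after back-off retries")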

Getting error while connecting ADLS to Notebook in AML

I am getting the error below while connecting to a dataset created and registered in an AML notebook, which is based on ADLS. When I connect this dataset in the designer, I am able to visualize it. Below is the code that I am using. Please let me know the solution if anyone has faced the same error.
Example 1: Import dataset into notebook
from azureml.core import Workspace, Dataset
subscription_id = 'abcd'
resource_group = 'RGB'
workspace_name = 'DSG'
workspace = Workspace(subscription_id, resource_group, workspace_name)
dataset = Dataset.get_by_name(workspace, name='abc')
dataset.to_pandas_dataframe()
Error 1
ExecutionError: Could not execute the specified transform.
(Error in getting metadata for path /local/top.txt.
Operation: GETFILESTATUS failed with Unknown Error: The operation has timed out..
Last encountered exception thrown after 5 tries.
[The operation has timed out.,The operation has timed out.,The operation has timed out.,The operation has timed out.,The operation has timed out.]
[ServerRequestId:])|session_id=2d67
Example 2: Import data from a datastore into the notebook
from azureml.core import Workspace, Datastore, Dataset
datastore_name = 'abc'
workspace = Workspace.from_config()
datastore = Datastore.get(workspace, datastore_name)
datastore_paths = [(datastore, '/local/top.txt')]
df_ds = Dataset.Tabular.from_delimited_files(
    path=datastore_paths, validate=True,
    include_path=False, infer_column_types=True,
    set_column_types=None, separator='\t',
    header=True, partition_format=None
)
df = df_ds.to_pandas_dataframe()
Error 2
Cannot load any data from the specified path. Make sure the path is accessible.
Try removing the initial slash from your path 'local/top.txt'
datastore_paths = [(datastore, 'local/top.txt')]
For your dataset abc, can you visualize/preview the data on ml.azure.com?
It might be because your data permissions are not set up correctly in ADLS. You need to give the service principal permission to the file/folder you are accessing.
https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-access-control
(Screenshot: data access settings on a file in ADLS)
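If permissions are the problem, the datastore also has to be registered with a service principal that has been granted access to the files. A rough sketch, assuming ADLS Gen1 and an existing service principal; the store name, tenant id, client id and secret are placeholders, and parameter names may differ across azureml-core versions:
from azureml.core import Workspace, Datastore

workspace = Workspace.from_config()

# Register the ADLS Gen1 store with service-principal credentials
# (all values below are placeholders)
datastore = Datastore.register_azure_data_lake(
    workspace=workspace,
    datastore_name='abc',
    store_name='myadlsaccount',
    tenant_id='<tenant-id>',
    client_id='<service-principal-app-id>',
    client_secret='<service-principal-secret>',
)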

Error on redshift UNLOAD command

I am trying to UNLOAD a Redshift table to an S3 bucket, but I am getting errors that I can't resolve.
When using 's3://mybucket/' as the destination (which is the documented way to specify the destination), I have an error saying S3ServiceException:The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint..
After some research I have tried to change the destination to include the full bucket url, without success.
All these destinations:
's3://mybucket.s3.amazonaws.com/',
's3://mybucket.s3.amazonaws.com/myprefix',
's3://mybucket.s3.eu-west-2.amazonaws.com/',
's3://mybucket.s3.eu-west-2.amazonaws.com/myprefix'
return this error S3ServiceException:The authorization header is malformed; the region 'eu-west-2' is wrong; expecting 'us-east-1', which is also the error returned when I use a bucket name that doesn't exist.
My Redshift cluster and my s3 buckets all exist in the same region, eu-west-2.
What am I doing wrong?
[appendix]
Full command:
UNLOAD ('select * from mytable')
to 's3://mybucket.s3.amazonaws.com/'
iam_role 'arn:aws:iam::0123456789:role/aws-service-role/redshift.amazonaws.com/AWSServiceRoleForRedshift'
Full errors:
ERROR: S3ServiceException:The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.,Status 301,Error PermanentRedirect,Rid 6ADF2C929FD2BE08,ExtRid vjcTnD02Na/rRtLvWsk5r6p0H0xncMJf6KBK
DETAIL:
-----------------------------------------------
error: S3ServiceException:The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.,Status 301,Error PermanentRedirect,Rid 6ADF2C929FD2BE08,ExtRid vjcTnD02Na/rRtLvWsk5r6p0H0xncMJf6KBK
code: 8001
context: Listing bucket=mybucket prefix=
query: 0
location: s3_unloader.cpp:226
process: padbmaster [pid=30717]
-----------------------------------------------
ERROR: S3ServiceException:The authorization header is malformed; the region 'eu-west-2' is wrong; expecting 'us-east-1',Status 400,Error AuthorizationHeaderMalformed,Rid 559E4184FA02B03F,ExtRid H9oRcFwzStw43ynA+rinTOmynhWfQJlRz0QIcXcm5K7fOmJSRcOcHuVlUlhGebJK5iH2L
DETAIL:
-----------------------------------------------
error: S3ServiceException:The authorization header is malformed; the region 'eu-west-2' is wrong; expecting 'us-east-1',Status 400,Error AuthorizationHeaderMalformed,Rid 559E4184FA02B03F,ExtRid H9oRcFwzStw43ynA+rinTOmynhWfQJlRz0QIcXcm5K7fOmJSRcOcHuVlUlhGebJK5iH2L
code: 8001
context: Listing bucket=mybucket.s3.amazonaws.com prefix=
query: 0
location: s3_unloader.cpp:226
process: padbmaster [pid=30717]
-----------------------------------------------
(Screenshots: bucket zone and cluster zone settings)

HTTPSConnectionPool Max retries exceeded

I've got a Django app in production running on nginx/uWSGI. We recently started using SSL for all our connections. Since moving to SSL, I often get the following message:
HTTPSConnectionPool(host='foobar.com', port=443):
Max retries exceeded with url: /foo/bar
Essentially, the browser communicates with the Django server code, which then uses the requests library to call an API. It's the connection to the API that generates the error. I've also moved all our requests into one session (a requests session, that is), but this hasn't helped.
I've bumped up the number of uWSGI listeners since I thought that could be the problem, but our load isn't that high. Also, we never had this problem before SSL. Does anyone have advice on how to solve this problem?
Edit
Here is a code snippet of how I call the API. I've posted it (mostly) verbatim. Note that it's not this code that actually fails, but the requests library that throws an exception when calling self.session.post.
def save_answer(self):
    logger.info("Saving answer to question")
    url = "%s1.0/exam/learneranswer/" % self.api_url
    response = {'success': False}
    data = {'questionorder': self.request.POST.get('questionorder'),
            'paper': self.request.POST.get('paper')}
    data['answer'] = ",".join(self.request.POST.getlist('answer'))
    r = self.session.post(url, data=simplejson.dumps(data))
    if r.status_code == 201:
        logger.info("Answer saved successfully")
        response['success'] = True
    elif r.status_code == 400:
        if r.text == "Paper expired":
            logger.warning("Timer has expired")
            response['message'] = 'Your time has run out'
        if r.text == "Question locked":
            response['message'] = \
                'This question is locked and cannot be answered anymore'
        else:
            logger.error("Unknown error")
            self.log_error(r, "Unknown Error while saving answer")
    else:
        logger.error("Internal error")
        self.log_error(r, "Internal error in api while saving answer")
    return simplejson.dumps(response)
I've found that this error happens when some item in one of my views throws an exception. For example, when using the 'requests' library in Django to post data to another URL:
r = requests.post(url, data=json.dumps(payload), headers=headers, timeout=5)
The downrange server was having connection issues, which threw an exception that bubbled up and gave me the error you had above. I replaced it with this:
try:
    r = requests.post(url, data=json.dumps(payload), headers=headers, timeout=5)
except requests.exceptions.ConnectionError as e:
    r = "No response"
And that fixed it (of course, I'd suggest adding in more error handling, but the above is the relevant subset).
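If the failures are transient, another option is to let the session retry with back-off before the exception surfaces. A minimal sketch using requests' HTTPAdapter with urllib3's Retry (the retry counts, back-off factor and endpoint are arbitrary placeholders):
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()

# Retry transient connection failures and selected 5xx responses with back-off
retries = Retry(total=3, backoff_factor=0.5, status_forcelist=[502, 503, 504])
session.mount('https://', HTTPAdapter(max_retries=retries))

# Placeholder endpoint for illustration; reuse this session for the API calls
r = session.post('https://foobar.com/foo/bar', data='{}', timeout=5)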
You could disable certificate validation like this:
requests.get('https://google.com', verify=False)
but it is better to specify your CA bundle instead, e.g. verify='/path/to/ca-bundle.pem'.
This error can also occur when the Python script tries to connect to the remote service (in my case an IBM service) before your Wi-Fi or Ethernet connection is established. Add a try/except to handle it, or, if running as a service, start the service only after the network is up.