I have some logs in CloudWatch, and new logs arrive every day. I want to keep today's and yesterday's logs in CloudWatch itself, but logs that are more than 2 days old have to be moved to S3.
I have tried using the code below to export CloudWatch Logs to S3:
import boto3
import collections

region = 'us-east-1'

def lambda_handler(event, context):
    s3 = boto3.client('s3')
    response = s3.create_export_task(
        taskName='export_task',
        logGroupName='/aws/lambda/test2',
        logStreamNamePrefix='2016/11/29/',
        fromTime=1437584472382,
        to=1437584472402,
        destination='prudhvi1234',
        destinationPrefix='AWS'
    )
    print(response)
When I run this, I get the following error:
'S3' object has no attribute 'create_export_task': AttributeError
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 10, in lambda_handler
response = s3.create_export_task(
AttributeError: 'S3' object has no attribute 'create_export_task'
What might the mistake be?
create_export_task is a CloudWatch Logs operation, not an S3 one, hence the error. Create a CloudWatch Logs client instead:

client = boto3.client('logs')

Then call:
response = client.create_export_task(
    taskName='export_task',
    logGroupName='/aws/lambda/test2',
    logStreamNamePrefix='2016/11/29/',
    fromTime=1437584472382,
    to=1437584472402,
    destination='prudhvi1234',
    destinationPrefix='AWS'
)
Check out this link for more information:
http://boto3.readthedocs.io/en/latest/reference/services/logs.html#CloudWatchLogs.Client.create_export_task
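Since the goal is to export everything older than two days, fromTime and to also need to be computed at run time as epoch milliseconds rather than hard-coded. A minimal sketch, reusing the log group and bucket names from the question; the retention math here is only illustrative:

import time
import boto3

logs = boto3.client('logs', region_name='us-east-1')

def lambda_handler(event, context):
    # create_export_task expects epoch timestamps in milliseconds
    now_ms = int(time.time() * 1000)
    two_days_ms = 2 * 24 * 60 * 60 * 1000

    response = logs.create_export_task(
        taskName='export_task',
        logGroupName='/aws/lambda/test2',
        fromTime=0,                    # from the beginning of the log group
        to=now_ms - two_days_ms,       # ...up to two days ago
        destination='prudhvi1234',     # S3 bucket name from the question
        destinationPrefix='AWS'
    )
    print(response)

Note also that CloudWatch Logs allows only one active export task per account at a time, and the destination bucket needs a policy that permits the CloudWatch Logs service to write to it.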
I have a problem with concurrent creation of CloudFront invalidations from AWS Lambda for the same object.
I have set up a Lambda handler that is triggered by specific S3 object creations and removals, in order to invalidate the cached versions on my CloudFront distribution. This is the function code, written in Python. The code does not detect whether an invalidation is already in progress:
from __future__ import print_function
import time

import boto3
from botocore.config import Config

config = Config(
    retries={
        'max_attempts': 6,
        'mode': 'standard'
    }
)

cloudfront = boto3.client('cloudfront', config=config)

def lambda_handler(event, context):
    for items in event["Records"]:
        path = "/" + items["s3"]["object"]["key"]
        print(path)
        invalidation = cloudfront.create_invalidation(
            DistributionId='xxxxx',
            InvalidationBatch={
                'Paths': {
                    'Quantity': 1,
                    'Items': [path]
                },
                'CallerReference': str(time.time())
            }
        )
I wonder how I would tell the function to only trigger when there is no invalidation with a status of InProgress for that same object?
The function will always trigger. There is no way to tell it to not trigger based on something happening in CloudFront.
However, you could add some logic in the function to only send an invalidation request to CloudFront if one isn't already running for that path. To do this, you would list the current invalidations and then get the details of each one to see whether it includes the same path, as in the sketch below.
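A rough sketch of that check, assuming the cloudfront client defined above and that inspecting the first page of list_invalidations results is enough (the API is paginated):

def invalidation_in_progress(distribution_id, path):
    """Return True if an InProgress invalidation already covers this path."""
    # Only the first page of results is inspected here; use the
    # list_invalidations paginator if you expect many invalidations.
    listing = cloudfront.list_invalidations(DistributionId=distribution_id)
    for summary in listing.get('InvalidationList', {}).get('Items', []):
        if summary['Status'] != 'InProgress':
            continue
        detail = cloudfront.get_invalidation(
            DistributionId=distribution_id,
            Id=summary['Id']
        )
        paths = detail['Invalidation']['InvalidationBatch']['Paths']['Items']
        if path in paths:
            return True
    return False

In the handler you would then call create_invalidation only when invalidation_in_progress('xxxxx', path) is False.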
Here is the code I use to create an S3 client and generate a presigned URL, which is fairly standard and has been running on the server for quite a while. I pulled the code out and ran it locally in a Jupyter notebook:
import os
import boto3

def get_s3_client():
    return get_s3(create_session=False)

def get_s3(create_session=False):
    session = boto3.session.Session() if create_session else boto3
    S3_ENDPOINT = os.environ.get('AWS_S3_ENDPOINT')
    if S3_ENDPOINT:
        AWS_ACCESS_KEY_ID = os.environ['AWS_ACCESS_KEY_ID']
        AWS_SECRET_ACCESS_KEY = os.environ['AWS_SECRET_ACCESS_KEY']
        AWS_DEFAULT_REGION = os.environ["AWS_DEFAULT_REGION"]
        s3 = session.client('s3',
                            endpoint_url=S3_ENDPOINT,
                            aws_access_key_id=AWS_ACCESS_KEY_ID,
                            aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
                            region_name=AWS_DEFAULT_REGION)
    else:
        s3 = session.client('s3', region_name='us-east-2')
    return s3

s3 = get_s3_client()
BUCKET = '[my-bucket-name]'
OBJECT_KEY = '[my-object-name]'

signed_url = s3.generate_presigned_url(
    'get_object',
    ExpiresIn=3600,
    Params={
        "Bucket": BUCKET,
        "Key": OBJECT_KEY,
    }
)
print(signed_url)
When I tried to download the file using the URL in the browser, I got an error message saying "The specified key does not exist." I noticed in the error message that my object key becomes "[my-bucket-name]/[my-object-name]" rather than just "[my-object-name]".
Then I used the same bucket/key combination to generate a presigned URL with the AWS CLI, and that works as expected. It turns out that the boto3 s3 client method somehow inserted "[my-bucket-name]/" in front of "[my-object-name]" compared to the AWS CLI. Here are the results:
From s3.generate_presigned_url()
https://[my-bucket-name].s3.us-east-2.amazonaws.com/[my-bucket-name]/[my-object-name]?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAV17K253JHUDLKKHB%2F20210520%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20210520T175014Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=5cdcc38e5933e92b5xed07b58e421e5418c16942cb9ac6ac6429ac65c9f87d64
From aws cli s3 presign
https://[my-bucket-name].s3.us-east-2.amazonaws.com/[my-object-name]?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAYA7K15LJHUDAVKHB%2F20210520%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20210520T155926Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=58208f91985bf3ce72ccf884ba804af30151d158d6ba410dd8fe9d2457369894
I've been working on this and searching for a solution for a day and a half, and I couldn't find out what was wrong with my implementation. I guess I may have missed some basic but important setting when creating an S3 client with boto3, or something else. Thanks for the help!
OK, mystery solved: I shouldn't pass the endpoint_url=S3_ENDPOINT parameter when creating the S3 client; boto3 will figure the endpoint out on its own. After I removed it, everything works as expected.
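In other words, for buckets on the standard AWS endpoints the client can be created without endpoint_url, and boto3 derives the correct virtual-hosted-style URL from the region. A minimal sketch of the working path, with placeholder bucket and key names:

import boto3

# No endpoint_url here: passing the regional S3 endpoint explicitly caused the
# bucket name to be folded into the object key of the presigned URL.
s3 = boto3.client('s3', region_name='us-east-2')

signed_url = s3.generate_presigned_url(
    'get_object',
    ExpiresIn=3600,
    Params={'Bucket': 'my-bucket-name', 'Key': 'my-object-name'},
)
print(signed_url)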
I am trying to get S3 bucket tags using "get_bucket_tagging".
Code:
response = client.get_bucket_tagging(Bucket='bucket_name')
print(response['TagSet'])
I get the expected output as long as the bucket has tags, but I get the following error when there are 0 tags.
An error occurred (NoSuchTagSet) when calling the GetBucketTagging
operation: The TagSet does not exist
Is there any other method to check that?
From this document:
NoSuchTagSetError - There is no tag set associated with the bucket.
So when there is no tag set associated with the bucket, this error/exception is expected. You need to handle it.
import boto3
from botocore.exceptions import ClientError

client = boto3.client('s3')

try:
    response = client.get_bucket_tagging(Bucket='bucket_name')
    print(response['TagSet'])
except ClientError as e:
    # A bucket without tags raises NoSuchTagSet; treat it as an empty tag set
    if e.response['Error']['Code'] == 'NoSuchTagSet':
        print([])
    else:
        raise
I can fetch data from native BigQuery tables using a service account.
However, I encounter an error when attempting to select from a Google Sheets-based table in BigQuery using the same service account.
from google.cloud import bigquery
client = bigquery.Client.from_service_account_json(
json_credentials_path='creds.json',
project='xxx',
)
# this works fine
print('test basic query: select 1')
job = client.run_sync_query('select 1')
job.run()
print('results:', list(job.fetch_data()))
print('-'*50)
# this breaks
print('attempting to fetch from sheets-based BQ table')
job2 = client.run_sync_query('select * from testing.asdf')
job2.run()
The output:
⚡ ~/Desktop ⚡ python3 bq_test.py
test basic query: select 1
results: [(1,)]
--------------------------------------------------
attempting to fetch from sheets-based BQ table
Traceback (most recent call last):
File "bq_test.py", line 16, in <module>
job2.run()
File "/usr/local/lib/python3.6/site-packages/google/cloud/bigquery/query.py", line 381, in run
method='POST', path=path, data=self._build_resource())
File "/usr/local/lib/python3.6/site-packages/google/cloud/_http.py", line 293, in api_request
raise exceptions.from_http_response(response)
google.cloud.exceptions.Forbidden: 403 POST https://www.googleapis.com/bigquery/v2/projects/warby-parker-1348/queries: Access Denied: BigQuery BigQuery: No OAuth token with Google Drive scope was found.
I've attempted to use oauth2client.service_account.ServiceAccountCredentials to explicitly define scopes, including a scope for Drive, but I get the following error when I do so:
ValueError: This library only supports credentials from google-auth-library-python. See https://google-cloud-python.readthedocs.io/en/latest/core/auth.html for help on authentication with this library.
My understanding is that auth is handled via IAM now, but I don't see any roles to apply to this service account that have anything to do with drive.
How can I select from a sheets-backed table using the BigQuery python client?
I've run into the same issue and figured out how to solve it.
Exploring the google.cloud.bigquery.Client class, there is a class-level tuple SCOPE that is not updated by any argument or by the Credentials object, so its default value persists into the classes that use it.
To solve this, you can simply add the extra scope URL to the google.cloud.bigquery.Client.SCOPE tuple.
In the following code I add the Google Drive scope to it:
from google.cloud import bigquery

# Add any scopes needed to this tuple (note the trailing comma,
# which makes it a one-element tuple rather than a plain string).
scopes = (
    'https://www.googleapis.com/auth/drive',
)
bigquery.Client.SCOPE += scopes

client = bigquery.Client.from_service_account_json(
    json_credentials_path='/path/to/your/credentials.json',
    project='your_project_name',
)
With the code above you'll be able to query data from Sheets-based tables in BigQuery.
Hope it helps!
I think you're right that you need to pass the scope for Google Drive when authenticating. The scopes are passed here: https://github.com/GoogleCloudPlatform/google-cloud-python/blob/master/core/google/cloud/client.py#L126, and it seems the BigQuery client lacks them: https://github.com/GoogleCloudPlatform/google-cloud-python/blob/master/bigquery/google/cloud/bigquery/client.py#L117. I suggest raising this on GitHub. As a workaround, you can try to override the client credentials to include the Drive scope, but you'll need to use google.auth credentials from GoogleCloudPlatform/google-auth-library-python instead of oauth2client, as the error message suggests.
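A sketch of that workaround using google-auth, assuming a reasonably recent version of google-cloud-bigquery that accepts a credentials argument, and reusing the creds.json file and masked project name from the question:

from google.oauth2 import service_account
from google.cloud import bigquery

# Build credentials that carry both the BigQuery and the Drive scope,
# so queries against Sheets-backed tables are authorized.
credentials = service_account.Credentials.from_service_account_file(
    'creds.json',
    scopes=[
        'https://www.googleapis.com/auth/bigquery',
        'https://www.googleapis.com/auth/drive',
    ],
)

client = bigquery.Client(credentials=credentials, project='xxx')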
I have been struggling for the last couple of hours but seem to be blind here. I am trying to establish a link between Scrapy and Amazon S3, but I keep getting an error saying the bucket does not exist (it does, I checked a dozen times).
The error message:
2016-11-01 22:58:08 [scrapy] ERROR: Error storing csv feed (30 items) in: s3://onvista.s3-website.eu-central-1.amazonaws.com/feeds/vista/2016-11-01T21-57-21.csv
in combination with
botocore.exceptions.ClientError: An error occurred (NoSuchBucket) when calling the PutObject operation: The specified bucket does not exist
My settings.py:
ITEM_PIPELINES = {
    'onvista.pipelines.OnvistaPipeline': 300,
    # 'scrapy.pipelines.files.S3FilesStore': 600
}
AWS_ACCESS_KEY_ID = 'key'
AWS_SECRET_ACCESS_KEY = 'secret'
FEED_URI = 's3://onvista.s3-website.eu-central-1.amazonaws.com/feeds/%(name)s/%(time)s.csv'
FEED_FORMAT = 'csv'
Has anyone a working setting for me to have a glimpse?
Instead of referring to an Amazon S3 bucket via its hosted website URL, refer to it by name.
The scrapy Feed Exports documentation gives an example of:
s3://mybucket/scraping/feeds/%(name)s/%(time)s.json
In your case, that would make it:
s3://onvista/feeds/%(name)s/%(time)s.json
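Applied to the settings above (keeping the CSV feed from the question), the relevant part of settings.py would look roughly like this; the bare bucket name 'onvista' is inferred from the website URL in the error message:

# settings.py -- point the feed export at the bucket by name,
# not at its static website endpoint
AWS_ACCESS_KEY_ID = 'key'
AWS_SECRET_ACCESS_KEY = 'secret'
FEED_URI = 's3://onvista/feeds/%(name)s/%(time)s.csv'
FEED_FORMAT = 'csv'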