QuickSight dataset suddenly can't refresh from S3 anymore - amazon-s3

I have a QuickSight dataset that has been working fine for months, pulling data from S3 via a manifest file, but since yesterday every refresh has been failing with the error below: FAILURE_TO_PROCESS_JSON_FILE - Error details: S3 Iterator: S3 problem reading data
I've double-checked the manifest file format and the S3 bucket permissions for QuickSight, and everything seems fine; nothing has changed on our end for this to suddenly stop working out of the blue.
Manifest file:
{
    "fileLocations": [
        {
            "URIPrefixes": [
                "https://s3.amazonaws.com/solar-dash-live/"
            ]
        }
    ],
    "globalUploadSettings": {
        "format": "JSON"
    }
}
The error I get in the email alert is different and says "Amazon QuickSight couldn't parse a manifest file as valid JSON." However, I verified that the above JSON is formatted correctly.
Also, if I create a new dataset with the same manifest file, it shows the data in the preview tool; it is only the refresh that fails. So the manifest is clearly formatted correctly if QuickSight can initially pull data from S3 and only fails later.
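For what it's worth, a quick way to rule out the two usual suspects (invalid JSON and missing S3 access) is to re-run the same checks QuickSight performs, from a machine whose credentials are comparable to the QuickSight service role. This is only a sanity-check sketch; the bucket name comes from the manifest above, and the local manifest path is a hypothetical placeholder.

import json

import boto3

# Hypothetical local copy of the manifest shown above.
with open("manifest.json") as f:
    manifest = json.load(f)  # raises json.JSONDecodeError if the file is not valid JSON
print("Manifest parsed OK:", manifest["fileLocations"][0]["URIPrefixes"])

# List a few objects under the prefix QuickSight will iterate over.
# Assumes the local credentials have at least the access granted to QuickSight.
s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket="solar-dash-live", MaxKeys=10)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])

If the listing works but the refresh still fails, one possibility worth checking is an individual object under the prefix (for example, a newly added file that is not valid JSON or not readable), rather than the manifest itself.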

Related

AWS OpenSearch: restore a manual snapshot to a different cluster

Is there a way to copy an AWS OpenSearch manual snapshot to a different S3 bucket and restore it to a different cluster?
I have a Terraform deployment that installs an AWS OpenSearch cluster with manual snapshots to S3 configured. These work, and I can restore to the same cluster the snapshot was taken from. However, I can't find a way to restore to a different cluster. Do you have to share the original S3 bucket, or is it possible to copy the files to a new S3 bucket (which would be ideal!)?
When I copy the snapshots from the source bucket to the destination bucket and then run a restore, I get this error:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "security_exception",
        "reason" : "no permissions for [] and User [name=admin, backend_roles=[], requestedTenant=]"
      }
    ],
    "type" : "security_exception",
    "reason" : "no permissions for [] and User [name=admin, backend_roles=[], requestedTenant=]"
  },
  "status" : 403
}
This is based on the following steps:
1. Run the query below on the source cluster and find the latest snapshot:
GET /_snapshot/snapshot_repo/_all
2. Copy the snapshot files from the source cluster's S3 bucket to the destination cluster's S3 bucket.
3. Run a restore of the snapshot on the destination cluster:
POST _snapshot/snapshot_repo/snapshot-2022-11-15-11-01-59/_restore
Note: when I list the snapshots on the destination, I do not get the list of copied snapshots (which I was hoping I would!).
Any ideas for a solution?
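Not part of the original question, but one thing that would explain the empty snapshot list: copied snapshot files only become visible after the destination domain has a snapshot repository registered that points at the bucket the files were copied into. On Amazon OpenSearch Service that registration has to be a SigV4-signed request made by an identity that is allowed to pass the snapshot role (and, with fine-grained access control enabled, mapped to a role that can manage snapshots). A minimal sketch of that registration, with every name below a hypothetical placeholder:

import boto3
import requests
from requests_aws4auth import AWS4Auth

# Hypothetical placeholders for your own resources.
region = "eu-west-1"
host = "https://your-destination-domain.eu-west-1.es.amazonaws.com"
snapshot_bucket = "destination-snapshot-bucket"   # the bucket you copied the files into
snapshot_role_arn = "arn:aws:iam::123456789012:role/opensearch-snapshot-role"

credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key,
                   region, "es", session_token=credentials.token)

# Register the copied bucket as a snapshot repository on the destination domain.
payload = {
    "type": "s3",
    "settings": {
        "bucket": snapshot_bucket,
        "region": region,
        "role_arn": snapshot_role_arn,
    },
}
resp = requests.put(host + "/_snapshot/snapshot_repo", auth=awsauth, json=payload)
print(resp.status_code, resp.text)

Once the repository is registered, GET /_snapshot/snapshot_repo/_all on the destination should list the copied snapshots, and the 403 security_exception usually points at the restoring user not being mapped to a role with snapshot permissions in the destination domain's security plugin.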

Multiple GTM containers in Vue throw an error in production

I am able to add multiple IDs to the id array, and it loads well and properly adds the script locally.
However, when the code makes it to production, it throws invalid-ID errors because the IDs are getting concatenated (probably due to the minified files, I'm not sure). How can I fix this?
Pictures of the local and deployed versions are attached:
Google Tag Manager on production
Google Tag Manager on my local machine
Here is my config code using vue-gtm:
Vue.use(VueGtm, {
  id: ['GTM-M4X2575', 'GTM-T98TL4V'],
  enabled: true,
  debug: true
});
Am I missing anything?

Python Boto3 Lambda Upload Temp File

I'm working on a Lambda that compresses image files in an S3 bucket. I'm able to download the image in the Lambda and compress it into a new file. When I try to upload the new file to the same S3 bucket, I keep getting hit with the following error:
module initialization error: expected string or bytes-like object
Here's the code to upload:
s3 = boto3.client('s3')
s3.upload_file(filename,my_bucket,basename)
Here are the logs from one of the test uploads:
Getting ready to download Giggidy.png
This is what we're calling our temp file: /tmp/tmp6i7fvb6z.png
Let's compress /tmp/tmp6i7fvb6z.png
Compressed /tmp/tmp6i7fvb6z.png to /tmp/tmpmq23jj5c.png
Getting ready to upload /tmp/tmpmq23jj5c.png
File to Upload, filename: /tmp/tmpmq23jj5c.png
Mime Type: image/png
Name in Bucket, basename: tmpmq23jj5c.png
START RequestId: e9062ca9-ed2c-11e9-99ee-e3a40680ga9d Version: $LATEST
module initialization error: expected string or bytes-like object
END RequestId: e9062ca9-ed2c-11e9-99ee-e3a40680ga9d
How can I upload a file within the context of a Lambda?
UPDATE: I've uploaded my code to a gist for review: https://gist.github.com/kjenney/068531ffe01e14bb7a2351dc55592551
I also moved the boto3 client connection up in my script, thinking that might be hosing the upload, but I still get the same error in the same order. 'process' is my handler function.
Your problem is this line:
client.upload_file(filename,my_bucket,basename)
From the documentation, the format is:
client.upload_file('/tmp/hello.txt', 'mybucket', 'hello.txt')
Note that the bucket name is a string. That's why the error says expected string.
However, your code sets my_bucket as:
my_bucket = s3.Bucket(bucket)
You should use the name of the bucket rather than the bucket object.
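For illustration, a minimal corrected sketch based on the snippets above; the bucket name is a hypothetical placeholder, and 'process' is the handler name mentioned in the question:

import boto3

s3 = boto3.client('s3')

def process(event, context):
    # ... download the source image and compress it into a temp file ...
    filename = '/tmp/tmpmq23jj5c.png'   # local path of the compressed file
    basename = 'tmpmq23jj5c.png'        # object key to write in the bucket
    bucket_name = 'my-image-bucket'     # hypothetical: pass the bucket *name* (a string), not a Bucket object
    s3.upload_file(filename, bucket_name, basename)

Also note that the "module initialization error" prefix suggests the failing call runs at import time, outside the handler, so it is worth keeping the upload (and anything that depends on the event) inside process().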

Configuring a Google Cloud bucket as the Airflow log folder

We just started using Apache Airflow in our project for our data pipelines. While exploring its features, we came across the option of configuring a remote folder as the log destination in Airflow. To do that we:
Created a Google Cloud Storage bucket.
Created a new GS connection from the Airflow UI.
I am not able to understand all the fields. I just created a sample GS bucket under my project from the Google console and gave that project ID to this connection, leaving the key file path and scopes blank.
Then I edited the airflow.cfg file as follows:
remote_base_log_folder = gs://my_test_bucket/
remote_log_conn_id = test_gs
After these changes I restarted the web server and the scheduler, but my DAGs are still not writing logs to the GS bucket. I can see the logs being created in base_log_folder, but nothing appears in my bucket.
Is there any extra configuration needed on my side to get this working?
Note: I'm using Airflow 1.8. (I faced the same issue with Amazon S3 as well.)
Updated on 20/09/2017: I tried the GS method (screenshot attached), but I am still not getting logs in the bucket.
Thanks,
Anoop R
I advise you to use a DAG to connect Airflow to GCP instead of the UI.
First, create a service account on GCP and download the JSON key.
Then execute this DAG (you can modify the scopes of your access):
import json

from airflow import DAG, settings
from airflow.models import Connection
from airflow.operators.python_operator import PythonOperator
from datetime import datetime


def add_gcp_connection(ds, **kwargs):
    """Add an Airflow connection for GCP."""
    new_conn = Connection(
        conn_id='gcp_connection_id',
        conn_type='google_cloud_platform',
    )
    scopes = [
        "https://www.googleapis.com/auth/pubsub",
        "https://www.googleapis.com/auth/datastore",
        "https://www.googleapis.com/auth/bigquery",
        "https://www.googleapis.com/auth/devstorage.read_write",
        "https://www.googleapis.com/auth/logging.write",
        "https://www.googleapis.com/auth/cloud-platform",
    ]
    conn_extra = {
        "extra__google_cloud_platform__scope": ",".join(scopes),
        "extra__google_cloud_platform__project": "<name_of_your_project>",
        "extra__google_cloud_platform__key_path": "<path_to_your_json_key>",
    }
    conn_extra_json = json.dumps(conn_extra)
    new_conn.set_extra(conn_extra_json)

    session = settings.Session()
    if not session.query(Connection).filter(
            Connection.conn_id == new_conn.conn_id).first():
        session.add(new_conn)
        session.commit()
    else:
        msg = '\n\tA connection with `conn_id`={conn_id} already exists\n'
        msg = msg.format(conn_id=new_conn.conn_id)
        print(msg)


dag = DAG('add_gcp_connection', start_date=datetime(2016, 1, 1), schedule_interval='@once')

# Task to add a connection
AddGCPCreds = PythonOperator(
    dag=dag,
    task_id='add_gcp_connection_python',
    python_callable=add_gcp_connection,
    provide_context=True)
Thanks to Yu Ishikawa for this code.
Yes, you need to provide additional information for both the S3 and the GCP connections.
S3
Configuration is passed via the extra field as JSON. You can provide only a profile:
{"profile": "xxx"}
or credentials:
{"profile": "xxx", "aws_access_key_id": "xxx", "aws_secret_access_key": "xxx"}
or a path to a config file:
{"profile": "xxx", "s3_config_file": "xxx", "s3_config_format": "xxx"}
In the case of the first option, boto will try to detect your credentials.
Source code - airflow/hooks/S3_hook.py:107
GCP
You can either provide key_path and scope (see Service account credentials) or credentials will be extracted from your environment in this order:
Environment variable GOOGLE_APPLICATION_CREDENTIALS pointing to a file with stored credentials information.
Stored "well known" file associated with gcloud command line tool.
Google App Engine (production and testing)
Google Compute Engine production environment.
Source code - airflow/contrib/hooks/gcp_api_base_hook.py:68
The reason the logs are not being written to your bucket could be related to the service account rather than to the Airflow config itself. Make sure it has access to the bucket in question; I had the same problems in the past.
Try adding more generous permissions to the service account, e.g. even project-wide Editor, and then narrowing them down. You could also try using the GCS client with that key and see if you can write to the bucket.
For me personally this scope works fine for writing logs: "https://www.googleapis.com/auth/cloud-platform"
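As a concrete way to do that last check, here is a minimal sketch that writes a test object to the log bucket with the google-cloud-storage client, using the downloaded JSON key (the key path is a placeholder; the bucket name is the one from the question's airflow.cfg):

from google.cloud import storage

# Placeholder key path; bucket name taken from remote_base_log_folder above.
client = storage.Client.from_service_account_json('/path/to/your_key.json')
bucket = client.bucket('my_test_bucket')

blob = bucket.blob('airflow-log-write-test.txt')
blob.upload_from_string('test write using the Airflow service account key')
print('Wrote gs://my_test_bucket/airflow-log-write-test.txt')

If this fails with a permission error, the problem is with the service account or bucket IAM rather than with the Airflow configuration.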

ETL Pull failing, error message giving mixed messages

When following the instructions on http://developer.gooddata.com/article/loading-data-via-api, I always get an HTTP 400 error:
400: Neither expected file "upload_info.json" nor archive "upload.zip" found (is accessible) in ""
When I HTTP GET the same path that I did for the HTTP PUT, the file downloads just fine.
Any pointers to what I'm probably doing wrong?
GoodData is going through a migration from AWS to Rackspace.
Try changing, in all GET/POST/PUT requests:
secure.gooddata.com to na1.secure.gooddata.com
secure-di.gooddata.com to na1-di.gooddata.com
You can check which datacenter the project is located in via the /gdc/projects/{projectId} resource - the "project.content.cluster" field.
For example:
https://secure.gooddata.com/gdc/projects/myProjectId:
{
  "project" : {
    "content" : {
      "cluster" : "na1",
      ....
For AWS this field has an empty value; "na1" means Rackspace.
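A small sketch of that check in Python, assuming you already have an authenticated requests session (whatever authentication you use for the PUT upload); the project ID is a placeholder:

import requests

# Hypothetical: `session` already carries your GoodData authentication
# (e.g. the cookies you obtained when logging in for the upload).
session = requests.Session()

project_id = 'myProjectId'  # replace with your project ID
resp = session.get('https://secure.gooddata.com/gdc/projects/%s' % project_id)
resp.raise_for_status()

cluster = resp.json()['project']['content'].get('cluster', '')
# Empty value -> AWS; 'na1' -> Rackspace (see above).
print('cluster: %r' % cluster)

If the cluster comes back as 'na1', switch the upload to the na1 hostnames listed above.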