Run Cypher file present in S3 using apoc.cypher.runFile - amazon-s3

I am trying to run call apoc.cypher.runFile(""), but it fails with: Failed to invoke procedure apoc.cypher.runFile: Caused by: java.io.IOException: Server returned HTTP response code: 400 for URL: (my presigned url for the s3 file)
I want to know whether it is possible to run Cypher scripts stored in an S3 bucket by passing a presigned URL to the apoc.cypher.runFile stored procedure.
Please help!!
TIA.

According to the documentation below, you need to add the following setting to your config and restart the server.
We can enable reading files from anywhere on the file system by setting the following property in apoc.conf:
apoc.import.file.use_neo4j_config=false
Ref: https://neo4j.com/labs/apoc/4.4/overview/apoc.cypher/apoc.cypher.runFile/

Related

AWS S3 Connection in druid

I have set up a clustered Druid with the configuration as mentioned in the Druid documentation
https://druid.apache.org/docs/latest/tutorials/cluster.html
I am using AWS S3 for deep storage. The following is a snippet of my common configuration file:
druid.extensions.loadList=["druid-datasketches", "mysql-metadata-storage", "druid-s3-extensions", "druid-orc-extensions", "druid-lookups-cached-global"]
# For S3:
druid.storage.type=s3
druid.storage.bucket=bucket-name
druid.storage.baseKey=druid/segments
#druid.storage.disableAcl=true
druid.storage.sse.type=s3
#druid.s3.accessKey=...
#druid.s3.secretKey=...
# For S3:
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=bucket-name
druid.indexer.logs.s3Prefix=druid/stage/indexing-logs
When running any ingestion task, I get an Access Denied error:
java.io.IOException: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: ; S3 Extended Request ID: ), S3 Extended Request ID:
at org.apache.druid.storage.s3.S3DataSegmentPusher.push(S3DataSegmentPusher.java:103) ~[?:?]
at org.apache.druid.segment.realtime.appenderator.AppenderatorImpl.lambda$mergeAndPush$4(AppenderatorImpl.java:791) ~[druid-server-0.19.0.jar:0.19.0]
at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:87) ~[druid-core-0.19.0.jar:0.19.0]
at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:115) ~[druid-core-0.19.0.jar:0.19.0]
at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:105) ~[druid-core-0.19.0.jar:0.19.0]
I am using S3 for two purposes:
1. To read data from S3 and ingest it. This connection works fine and data is being read from the S3 location.
2. For deep storage. This is where I am getting the error.
I am using the profile information authentication method to provide the S3 credentials, so I have already configured the AWS CLI with the appropriate credentials. Also, the S3 data is encrypted with AES256, so I have added druid.storage.sse.type=s3 to the config file.
Can someone help me out here? I am not able to debug the issue.
You asked how to approach debugging this. Normally I would:
1. SSH onto the EC2 instance and run aws sts get-caller-identity. This tells you which principal your requests are sent from. Then confirm that this principal has the S3 access you expect.
2. Confirm that I can write to the bucket in your configuration:
druid.storage.type=s3
druid.storage.bucket=<bucket-name>
druid.storage.baseKey=druid/segments
3. Try one of the other auth methods, such as exporting the keys into the environment (mentioned in the third option), since that is a simple test. Then run step 1 again to confirm that the principal reflects those keys, and try running your code again.
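Steps 1 and 2 can also be scripted. The following is a minimal sketch, assuming boto3 is available on the instance; the helper name check_s3_write and the probe key _druid_write_test are hypothetical, and the AES256 setting is an assumption based on the druid.storage.sse.type=s3 line in the question. It reports which principal is making the call and whether that principal can write under the deep storage prefix.

```python
def check_s3_write(s3_client, sts_client, bucket, base_key):
    """Return (principal_arn, can_write, error_message) for a test write.

    s3_client and sts_client are expected to behave like boto3 clients
    created with boto3.client("s3") and boto3.client("sts").
    """
    # Step 1: find out which principal the requests are sent from.
    arn = sts_client.get_caller_identity()["Arn"]

    # Step 2: try a small write under the deep storage prefix.
    # The key name is a hypothetical probe object, not something Druid creates.
    key = base_key.rstrip("/") + "/_druid_write_test"
    try:
        s3_client.put_object(
            Bucket=bucket,
            Key=key,
            Body=b"write test",
            ServerSideEncryption="AES256",  # assumption: SSE-S3, as in the question
        )
    except Exception as exc:  # boto3 raises botocore.exceptions.ClientError here
        return arn, False, str(exc)

    # Clean up the probe object if the write succeeded.
    s3_client.delete_object(Bucket=bucket, Key=key)
    return arn, True, ""


# Example usage on the instance (bucket name is a placeholder):
#   import boto3
#   print(check_s3_write(boto3.client("s3"), boto3.client("sts"),
#                        "bucket-name", "druid/segments"))
```

If the write fails with Access Denied while the ARN is the one you expect, the problem is more likely in the bucket policy or IAM permissions than in the Druid configuration.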

dms s3 source endpoint connection fails

I am getting the connection error below when trying to validate the S3 source endpoint in DMS.
Test Endpoint failed: Application-Status: 1020912, Application-Message: Failed to connect to database.
I followed all the steps listed in the links below, but I may still be missing something:
https://aws.amazon.com/premiumsupport/knowledge-center/dms-connection-test-fail-s3/
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.S3.html
The role associated with the endpoint does have access to the endpoint's S3 bucket, and DMS is listed as a trusted entity.
I got this same error when trying to use S3 as a target.
The one thing not mentioned in the documentation, and which turned out to be the root cause for my error, is that the DMS Replication Instance and the Bucket need to be in the same region.
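The region check can be sketched in a few lines. This is only an illustration, assuming you obtain the bucket's LocationConstraint from the S3 GetBucketLocation call and the replication instance ARN from the DMS DescribeReplicationInstances call; the helper names are hypothetical.

```python
def bucket_region(location_constraint):
    """Map the LocationConstraint returned by GetBucketLocation to a region name."""
    # S3 reports no constraint for buckets in us-east-1.
    if not location_constraint:
        return "us-east-1"
    # "EU" is a legacy value that stands for eu-west-1.
    if location_constraint == "EU":
        return "eu-west-1"
    return location_constraint


def region_from_arn(arn):
    """Extract the region from an ARN of the form arn:partition:service:region:account:resource."""
    return arn.split(":")[3]


def same_region(replication_instance_arn, location_constraint):
    """True if the replication instance and the bucket are in the same region."""
    return region_from_arn(replication_instance_arn) == bucket_region(location_constraint)


# Example usage with boto3 (bucket name is a placeholder):
#   import boto3
#   constraint = boto3.client("s3").get_bucket_location(Bucket="bucket-name")["LocationConstraint"]
#   instance = boto3.client("dms").describe_replication_instances()["ReplicationInstances"][0]
#   print(same_region(instance["ReplicationInstanceArn"], constraint))
```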

PySpark Writing DataFrame Partitions to S3

I've been trying to partition and write a spark dataframe to S3 and I get an error.
df.write.partitionBy("year","month").mode("append")\
.parquet('s3a://bucket_name/test_folder/')
Error message is:
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception:
Status Code: 403, AWS Service: Amazon S3, AWS Request ID: xxxxxx,
AWS Error Code: SignatureDoesNotMatch,
AWS Error Message: The request signature we calculated does not match the signature you provided. Check your key and signing method.
However, when I simply write without partitioning it does work.
df.write.mode("append").parquet('s3a://bucket_name/test_folder/')
What could be causing this problem?
I resolved this problem by upgrading from aws-java-sdk:1.7.4 to aws-java-sdk:1.11.199 and hadoop-aws:2.7.7 to hadoop-aws:3.0.0 in my spark-submit.
I set this in my python file using:
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.amazonaws:aws-java-sdk:1.11.199,org.apache.hadoop:hadoop-aws:3.0.0 pyspark-shell'
But you can also provide them as arguments to spark-submit directly.
I had to rebuild Spark providing my own version of Hadoop 3.0.0 to avoid dependency conflicts.
You can read some of my speculation as to the root cause here: https://stackoverflow.com/a/51917228/10239681

Amazon S3 File Read Timeout. Trying to download a file using JAVA

I am new to Amazon S3. I get the following error when trying to access a file from Amazon S3 using a simple Java method.
2016-08-23 09:46:48 INFO request:450 - Received successful response:200, AWS Request ID: F5EA01DB74D0D0F5
Caught an AmazonClientException, which means the client encountered an
internal error while trying to communicate with S3, such as not being
able to access the network.
Error Message: Unable to store object contents to disk: Read timed out
The exact same lines of code worked yesterday. I was able to download 100% of a 5 GB file in 12 minutes. Today I'm in a better-connected environment, but only 2% or 3% of the file is downloaded before the program fails.
The code I'm using to download:
s3Client.getObject(new GetObjectRequest("mybucket", file.getKey()), localFile);
You need to set the connection timeout and the socket timeout in your client configuration.
Here is an excerpt from a reference article on client configuration:
Several HTTP transport options can be configured through the com.amazonaws.ClientConfiguration object. Default values will suffice for the majority of users, but users who want more control can configure:
Socket timeout
Connection timeout
Maximum retry attempts for retry-able errors
Maximum open HTTP connections
Here is an example of how to do it:
Downloading files >3Gb from S3 fails with "SocketTimeoutException: Read timed out"

multiple file upload to bigquery

I am trying to upload multiple files simultaneously to Google BigQuery using the command-line tool. I got the following error:
BigQuery error in load operation: Could not connect with BigQuery server.
Http response status: 503
Http response content:
Service Unavailable
Is there any way to work around this problem?
How do I upload multiple files simultaneously to Google BigQuery using the command-line tool?
Multiple file upload should work (and we use it every day). If you're getting a 503, that indicates something is wrong with the service. One thing you might want to make sure of is that if you're using a * in your command line that you have it quoted so that the shell doesn't expand it automatically before it gets passed to bq.
If you're getting a 503 error, can you retry the command with the flag --apilog=- (this needs to be one of the first params), which will dump the interaction with the server to stdout? The problem may be obvious from that log, but if it isn't, can you update your question with the relevant portions of the log? If you're not comfortable posting that information on a public forum, can you e-mail it to me at tigani at google dot com?
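To illustrate the quoting and flag placement described above, here is a small sketch that builds the bq command line. The table name and source pattern are hypothetical placeholders, and the helper build_bq_load_command is not part of the bq tool; it only shows where --apilog=- goes and how the * ends up quoted when the command is rendered for a shell.

```python
import shlex


def build_bq_load_command(table, source, api_log=False):
    """Build an argument list for a bq load call (table and source are placeholders)."""
    args = ["bq"]
    if api_log:
        # Global flag: placed before the subcommand, as noted in the answer.
        args.append("--apilog=-")
    args += ["load", table, source]
    return args


cmd = build_bq_load_command("mydataset.mytable", "gs://my-bucket/data/*.csv", api_log=True)

# shlex.join quotes the argument containing "*" so the shell will not expand it.
print(shlex.join(cmd))
# prints: bq --apilog=- load mydataset.mytable 'gs://my-bucket/data/*.csv'
```

Passing the list form to subprocess.run(cmd) avoids the shell entirely, so the wildcard reaches bq unchanged.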