How do I connect to Neptune using Java - amazon-neptune

I have the following code based on the docs...
@Controller
@RequestMapping("neptune")
public class NeptuneEndpoint {

    @GetMapping("")
    @ResponseBody
    public String test() {
        Cluster.Builder builder = Cluster.build();
        builder.addContactPoint("...endpoint...");
        builder.port(8182);
        Cluster cluster = builder.create();
        GraphTraversalSource g = EmptyGraph.instance()
                .traversal()
                .withRemote(DriverRemoteConnection.using(cluster));
        GraphTraversal t = g.V().limit(2).valueMap();
        t.forEachRemaining(e -> System.out.println(e));
        cluster.close();
        return "Neptune Up";
    }
}
But when I try to run I get ...
java.util.concurrent.TimeoutException: Timed out while waiting for an available host - check the client configuration and connectivity to the server if this message persists
Also, how would I add the secret key from my AWS IAM account?

Neptune doesn't allow you to connect to the DB instance from your local machine. You can only connect to Neptune from inside the same VPC, e.g. from an EC2 instance (see the AWS documentation).
Try making a runnable jar of this code and run it on an EC2 instance in that VPC; the code should work fine. If you're trying to debug something from your local system, use SSH tunneling (for example with PuTTY) to the EC2 instance, which then forwards the traffic to the Neptune cluster.
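A hypothetical tunnel could look like the following; the key file, user, and host names are placeholders, not values from the original post.
# Forward local port 8182 to the Neptune endpoint via an EC2 box in the same VPC
ssh -i my-key.pem -N -L 8182:your-neptune-cluster-endpoint:8182 ec2-user@your-ec2-public-dns
Your local Gremlin client would then point at localhost:8182; if SSL or IAM auth is enabled you may still have to deal with host-name verification.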

Have you created an instance with IAM auth enabled?
If yes, you will have to sign your request using SigV4. More information (and examples) on how to connect using SigV4 is available at https://docs.aws.amazon.com/neptune/latest/userguide/iam-auth-connecting-gremlin-java.html
The examples given in the documentation above also contain information on how to use your IAM credentials to connect to a Neptune cluster.
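For orientation, here is a rough sketch of what the cluster setup from the question looks like with SigV4 signing enabled. The channelizer class and its package are taken from that documentation as an assumption, so double-check them against the guide for your driver version:
import org.apache.tinkerpop.gremlin.driver.Cluster;
// Assumed from the linked docs; provided by the amazon-neptune-sigv4-signer libraries.
import com.amazon.neptune.gremlin.driver.sigv4.SigV4WebSocketChannelizer;

Cluster cluster = Cluster.build()
        .addContactPoint("...endpoint...")
        .port(8182)
        .enableSsl(true)                               // IAM auth requires SSL
        .channelizer(SigV4WebSocketChannelizer.class)  // signs each request with SigV4
        .create();
The IAM credentials themselves are picked up from the default AWS credentials provider chain (environment variables, profile, or instance role), and the signer also needs to know the region; see the linked page for the exact setting.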

I just had the same issue and the root cause was a dependency version conflict with Netty, which is unfortunately a very pervasive dependency. Gremlin 3.3.2 uses io.netty/netty-all version 4.0.56.Final. Your project might pull in another Netty jar, such as io.netty/netty or io.netty/netty-handler, either of which can cause this issue, so you will need to exclude them from other dependencies in your POM or use managed dependencies to set a project-level Netty version.
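For example, a minimal sketch of pinning a single project-wide Netty version in the POM; the version shown is the one mentioned above, so adjust it to whatever your Gremlin driver actually needs:
<!-- Force one Netty version for the whole project so transitive Netty jars agree. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>io.netty</groupId>
      <artifactId>netty-all</artifactId>
      <version>4.0.56.Final</version>
    </dependency>
  </dependencies>
</dependencyManagement>
Alternatively, add an exclusions block for the conflicting io.netty artifacts on the offending dependencies.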

Another option is to use an AWS SigV4 signing proxy that acts as a bridge between Neptune and your local development environment. One of these proxies is https://github.com/monken/aws4-proxy
npm install --global aws4-proxy
# have your credentials exported as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
aws4-proxy --service neptune-db --endpoint cluster-die4eenu.cluster-eede5pho.eu-west-1.neptune.amazonaws.com --region eu-west-1
wscat -c ws://localhost:3000/gremlin

Refer to this.
Note: You need to be in the same VPC to access the Neptune cluster.

Related

.NET Core 3.x setting development AWS credentials

I have EC2 instances (via Elastic Beanstalk) running my ASP.Net Core 3.1 web app without a problem. AWS credentials are included in the key pair configured with the instance.
I want to now store my Data Protection keys in a S3 bucket that I created for them, so I can share the keys among all of the EC2 instances. However, when I add this service in my Startup.ConfigureServices, I get a runtime error locally:
services.AddDefaultAWSOptions(Configuration.GetAWSOptions("AWS"));
services.AddAWSService<IAmazonS3>();
services.AddDataProtection()
    .SetApplicationName("Crums")
    .PersistKeysToAWSSystemsManager("/CrumsWeb/DataProtection");
My app runs fine locally if I comment out the .PersistKeysToAWSSystemsManager("/CrumsWeb/DataProtection"); line above. When I uncomment the line, the error occurs. So it has something to do with that, but I can't seem to figure it out.
I was going to use PersistKeysToAwsS3 by hotchkj, but it was deprecated when AWS came out with PersistKeysToAWSSystemsManager.
The runtime error AmazonClientException: No RegionEndpoint or ServiceURL configured happens in CreateHostBuilder in my Program.cs.
I've spent many hours on this trying just to get Visual Studio 2019 to run my app locally, using suggestions from these sites:
https://aws.amazon.com/blogs/developer/configuring-aws-sdk-with-net-core/
https://docs.aws.amazon.com/sdk-for-net/v3/developer-guide/net-dg-config-netcore.html
ASP NET Core AWS No RegionEndpoint or ServiceURL configured when deployed to Heroku
No RegionEndpoint or ServiceURL configured
https://github.com/secretorange/aws-aspnetcore-environment-startup
https://www.youtube.com/watch?v=C4AyfV3Z3xs&ab_channel=AmazonWebServices
My appsettings.Development.json (and I also tried it in appsettings.json) contains:
"AWS": {
"Profile": "default",
"Region": "us-east-1",
"ProfilesLocation": "C:\\Users\\username\\.aws\\credentials"
}
And the credentials file contains:
[default]
aws_access_key_id = MY_ACCESS_KEY
aws_secret_access_key = MY_SECRET_KEY
region = us-east-1
toolkit_artifact_guid=GUID
I ended up abandoning PersistKeysToAWSSystemsManager for storing my Data Protection keys because I don't want to set up yet another AWS service just to store keys in their Systems Manager. I am already paying for an S3 account, so I chose to use the deprecated NuGet package AspNetCore.DataProtection.Aws.S3.
I use server-side encryption on the bucket I created for the keys. This is the code in Startup.cs:
services.AddDataProtection()
    .SetApplicationName("AppName")
    .PersistKeysToAwsS3(new AmazonS3Client(RegionEndpoint.USEast1), new S3XmlRepositoryConfig("S3BucketName")
    {
        KeyPrefix = "DataProtectionKeys/", // Folder in the S3 bucket for keys
    });
Notice the RegionEndpoint parameter in the PersistKeysToAwsS3, which resolved the No RegionEndpoint or ServiceURL Configured error.
I added the AmazonS3FullAccess policy to the IAM role that's running in all my instances.
This gives the instance the permissions to access the S3 bucket. My local development computer also seems to be able to access the S3 bucket, although I don't know where it's getting credentials from. I tried several iterations of appsettings.json and credentials file changes to locally set region and credentials, but nothing worked. Maybe it's using credentials I entered when I set up the AWS Toolkit in Visual Studio.

How to programmatically set up Airflow 1.10 logging with localstack s3 endpoint?

In an attempt to set up airflow logging to localstack s3 buckets, for local and kubernetes dev environments, I am following the airflow documentation for logging to s3. To give a little context, localstack is a local AWS cloud stack with AWS services, including s3, running locally.
I added the following environment variables to my airflow containers, similar to this other stack overflow post, in an attempt to log to my local s3 buckets. This is what I added to docker-compose.yaml for all airflow containers:
- AIRFLOW__CORE__REMOTE_LOGGING=True
- AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER=s3://local-airflow-logs
- AIRFLOW__CORE__REMOTE_LOG_CONN_ID=MyS3Conn
- AIRFLOW__CORE__ENCRYPT_S3_LOGS=False
I've also added my localstack s3 creds to airflow.cfg
[MyS3Conn]
aws_access_key_id = foo
aws_secret_access_key = bar
aws_default_region = us-east-1
host = http://localstack:4572 # s3 port. not sure if this is right place for it
Additionally, I've installed apache-airflow[hooks], and apache-airflow[s3], though it's not clear which one is really needed based on the documentation.
I've followed the steps in a previous stack overflow post in an attempt to verify whether the S3Hook can write to my localstack s3 instance:
from airflow.hooks import S3Hook
s3 = S3Hook(aws_conn_id='MyS3Conn')
s3.load_string('test','test',bucket_name='local-airflow-logs')
But I get botocore.exceptions.NoCredentialsError: Unable to locate credentials.
After adding the credentials in the airflow console under /admin/connection/edit, a new exception is returned: botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the PutObject operation: The AWS Access Key Id you provided does not exist in our records. Other people have encountered this same issue, and it may have been related to networking.
Regardless, a programmatic setup is needed, not a manual one.
I was able to access the bucket using a standalone Python script (entering AWS credentials explicitly with boto), but it needs to work as part of airflow.
Is there a proper way to set up host / port / credentials for S3Hook by adding MyS3Conn to airflow.cfg?
Based on the airflow s3 hooks source code, it seems a custom s3 URL may not yet be supported by airflow. However, based on the airflow aws_hook source code (parent) it seems it should be possible to set the endpoint_url including port, and it should be read from airflow.cfg.
I am able to inspect and write to my s3 bucket in localstack using boto alone. Also, curl http://localstack:4572/local-mochi-airflow-logs returns the contents of the bucket from the airflow container. And aws --endpoint-url=http://localhost:4572 s3 ls returns Could not connect to the endpoint URL: "http://localhost:4572/".
What other steps might be needed to log to localstack s3 buckets from airflow running in docker, with automated setup and is this even supported yet?
I think you're supposed to use localhost not localstack for the endpoint, e.g. host = http://localhost:4572.
In Airflow 1.10 you can override the endpoint on a per-connection basis but unfortunately it only supports one endpoint at a time so you'd be changing it for all AWS hooks using the connection. To override it, edit the relevant connection and in the "Extra" field put:
{"host": "http://localhost:4572"}
I believe this will fix it?
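If you need that override to be programmatic rather than done in the UI, a rough sketch along these lines should work in Airflow 1.10; it assumes the MyS3Conn connection already exists in the metadata database, and the conn_id simply mirrors the one from the question:
# Hypothetical sketch: set the "Extra" field of an existing connection in code.
import json

from airflow import settings
from airflow.models import Connection

session = settings.Session()
conn = session.query(Connection).filter(Connection.conn_id == "MyS3Conn").one()
conn.extra = json.dumps({"host": "http://localhost:4572"})  # endpoint override read by the AWS hook
session.add(conn)
session.commit()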
I managed to make this work by referring to this guide. Basically, you need to create a connection using the Connection class and pass the credentials that you need; in my case I needed AWS_SESSION_TOKEN, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and REGION_NAME to make this work. Use this function as a python_callable in a PythonOperator, which should be the first task of the DAG.
import os
import json

from airflow.models.connection import Connection
from airflow.exceptions import AirflowFailException


def _create_connection(**context):
    """
    Sets the connection information about the environment using the Connection
    class instead of doing it manually in the Airflow UI
    """
    AWS_ACCESS_KEY_ID = os.getenv("AWS_ACCESS_KEY_ID")
    AWS_SECRET_ACCESS_KEY = os.getenv("AWS_SECRET_ACCESS_KEY")
    AWS_SESSION_TOKEN = os.getenv("AWS_SESSION_TOKEN")
    REGION_NAME = os.getenv("REGION_NAME")
    credentials = [
        AWS_SESSION_TOKEN,
        AWS_ACCESS_KEY_ID,
        AWS_SECRET_ACCESS_KEY,
        REGION_NAME,
    ]
    if not credentials or any(not credential for credential in credentials):
        raise AirflowFailException("Environment variables were not passed")

    extras = json.dumps(
        dict(
            aws_session_token=AWS_SESSION_TOKEN,
            aws_access_key_id=AWS_ACCESS_KEY_ID,
            aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
            region_name=REGION_NAME,
        ),
    )
    try:
        Connection(
            conn_id="s3_con",
            conn_type="S3",
            extra=extras,
        )
    except Exception as e:
        raise AirflowFailException(
            f"Error creating connection to Airflow :{e!r}",
        )

ERROR: The overall deployment failed because too many individual instances failed deployment

I'm trying to deploy using CircleCI -> S3 -> CodeDeploy -> EC2.
I was able to upload the deploy image onto S3 from CircleCI, but I am unable to deploy from S3 to the EC2 instance. Here's the error:
The overall deployment failed because too many individual instances
failed deployment, too few healthy instances are available for
deployment, or some instances in your deployment group are
experiencing problems. (Error code: HEALTH_CONSTRAINTS)
The error was provided by CodeDeploy. I can't figure out why or how.
I'd appreciate it if you could give some advice.
If you are running on Ubuntu there might be plenty of reasons; here is a checklist you can verify.
Check that the CodeDeploy agent is installed on your EC2 instance. Refer to this document to install the agent:
https://docs.aws.amazon.com/codedeploy/latest/userguide/codedeploy-agent-operations-install-ubuntu.html
$ sudo service codedeploy-agent status
If you are running Ubuntu release 20.x and you get this error:
./install:22:in `block in method_missing': undefined method `path' for #<IO:> (NoMethodError)
try running the install file via this script
sudo ./install auto > /tmp/logfile
Check that your EC2 instance has a CodeDeploy role -> create a CodeDeploy service role and assign it to the instance: https://docs.aws.amazon.com/codedeploy/latest/userguide/getting-started-create-service-role.html.
If you assign the EC2 role after the instance has started, restart the server.
Check your appspec.yml file placement as per the top answer, and try to avoid any long timeouts in it.
Log into your instance and check the error log:
$ tail -f /var/log/aws/codedeploy-agent/codedeploy-agent.log
You should be able to figure out what caused the individual instances to fail by digging into the deployment instance details:
http://docs.aws.amazon.com/codedeploy/latest/userguide/how-to-view-instance-details.html
These should contain more detailed information about why your application was unable to be deployed.
This error is commonly due to problems in the configuration of the appspec.yml or appspec.json file (it depends on the format you are using).
"If you have any Hooks, I recommend that you remove them, check whether the deployment works, and then add the Hooks back one by one so you can identify which one causes the error."
The appspec.yml file should be located at the root of your project:
│-- appspec.yml
│-- index.html
└-- scripts
│-- install_dependencies
│-- start_server
└-- stop_server
In the scripts folder you place the scripts that you want executed for each hook (a hypothetical example of such a script is sketched after the appspec.yml below).
Here is an example of the appspec.yml file
version: 0.0
os: linux
files:
  - source: /index.html
    destination: /var/www/html/
hooks:
  BeforeInstall:
    - location: scripts/install_dependencies
      timeout: 300
      runas: root
    - location: scripts/start_server
      timeout: 300
      runas: root
  ApplicationStop:
    - location: scripts/stop_server
      timeout: 300
      runas: root
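And, purely as an illustration, a hypothetical scripts/stop_server could be as simple as the following; the httpd service name is only an example, not something from the original question:
#!/bin/bash
# Hypothetical stop hook: stop the web server only if it is currently running.
if pgrep httpd > /dev/null; then
  service httpd stop
fi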
I hope I can help you 😃👻🕺🏾
Make sure the CodeDeploy Host Agent Service is running in your target EC2 instance.
The error you are facing is a generic error message thrown on any lifecycle event failure, which could be BeforeBlockTraffic, BlockTraffic, ApplicationStop, etc.
The first step in this case would be to check whether the CodeDeploy agent is running, especially if the first event (BeforeBlockTraffic) failed.
The event failure message shown in the deployment details will tell you the exact error behind it.
From the failed deployments, I can see all lifecycle events were skipped. Instance i-0bcc36e73851297f2 is currently in Stopped state, but I can see the IAM instance profile is missing. Your Amazon EC2 instances need permission to access the Amazon S3 buckets or GitHub repositories where the applications that will be deployed by AWS CodeDeploy are stored. To launch Amazon EC2 instances that are compatible with AWS CodeDeploy, you must create an additional IAM role, an instance profile. [1]
For such failures, you can always begin with a general troubleshooting checklist for a failed deployment [2] and then look for troubleshooting guides on deployment issues and instance issues [3].
[1] http://docs.aws.amazon.com/codedeploy/latest/userguide/how-to-create-iam-instance-profile.html
[2] http://docs.aws.amazon.com/codedeploy/latest/userguide/troubleshooting-general.html
[3] http://docs.aws.amazon.com/codedeploy/latest/userguide/troubleshooting.html
Check the status of the Code Deploy Agent. In my case, the agent wasn't up.
Please check the role given to the EC2 machine (where the agent is running). It should have S3 access as well. This resolved my issue.
"The CodeDeploy agent did not find an AppSpec file within the unpacked revision directory at revision-relative path 'appspec.yml'"
If you see this, place your appspec.yml file in the root folder of your revision so the agent can find it, along with the before/after scripts it references.

How to do kerberos authentication on a flink standalone installation?

I have a standalone Flink installation on top of which I want to run a streaming job that is writing data into a HDFS installation. The HDFS installation is part of a Cloudera deployment and requires Kerberos authentication in order to read and write the HDFS. Since I found no documentation on how to make Flink connect with a Kerberos-protected HDFS I had to make some educated guesses about the procedure. Here is what I did so far:
I created a keytab file for my user.
In my Flink job, I added the following code:
UserGroupInformation.loginUserFromKeytab("myusername", "/path/to/keytab");
Finally, I am using a TextOutputFormat to write data to HDFS.
When I run the job, I'm getting the following error:
org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1730)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1668)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1593)
at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:397)
at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:393)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:393)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:337)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786)
at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:405)
For some odd reason, Flink seems to try SIMPLE authentication, even though I called loginUserFromKeytab. I found another similar issue on Stackoverflow (Error with Kerberos authentication when executing Flink example code on YARN cluster (Cloudera)) which had an answer explaining that:
Standalone Flink currently only supports accessing Kerberos secured HDFS if the user is authenticated on all worker nodes.
That may mean that I have to do some authentication at the OS level e.g. with kinit. Since my knowledge of Kerberos is very limited I have no idea how I would do it. Also I would like to understand how the program running after kinit actually knows which Kerberos ticket to pick from the local cache when there is no configuration whatsoever regarding this.
I'm not a Flink user, but based on what I've seen with Spark & friends, my guess is that "authenticated on all worker nodes" means that each worker process has:
a core-site.xml config available on the local fs with hadoop.security.authentication set to kerberos (among other things);
the local dir containing core-site.xml added to the CLASSPATH so that it is found automatically by the Hadoop Configuration object (it will revert silently to default hard-coded values otherwise);
and one of: implicit authentication via kinit and the default cache (a TGT set globally for the Linux account, which impacts all processes), or implicit authentication via kinit and a "private" cache set through the KRB5CCNAME env variable (Hadoop supports only the "FILE:" type), or explicit authentication via UserGroupInformation.loginUserFromKeytab() and a keytab available on the local fs.
That UGI "login" method is incredibly verbose, so if it was indeed called before Flink tries to initiate the HDFS client from the Configuration, you will notice. On the other hand, if you don't see the verbose stuff, then your attempt to create a private Kerberos TGT is bypassed by Flink, and you have to find a way to bypass Flink :-/
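For what it's worth, here is a minimal sketch of the explicit route, assuming core-site.xml is on the classpath (or that you set the security property yourself) and a keytab on the local fs; the principal and path are placeholders:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

// Force Kerberos auth and log in from a keytab before any HDFS client is created.
Configuration conf = new Configuration();               // picks up core-site.xml from the CLASSPATH, if present
conf.set("hadoop.security.authentication", "kerberos");  // otherwise it silently falls back to "simple"
UserGroupInformation.setConfiguration(conf);
UserGroupInformation.loginUserFromKeytab("myusername@MY.REALM", "/path/to/keytab");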
You can also configure your standalone cluster to handle authentication for you without additional code in your jobs.
Export HADOOP_CONF_DIR and point it to the directory where core-site.xml and hdfs-site.xml are located.
Add to flink-conf.yaml:
security.kerberos.login.use-ticket-cache: false
security.kerberos.login.keytab: <path to keytab>
security.kerberos.login.principal: <principal>
env.java.opts: -Djava.security.krb5.conf=<path to krb5 conf>
Add the pre-bundled Hadoop to the lib directory of your cluster: https://flink.apache.org/downloads.html
The only dependencies you should need in your jobs are:
compile "org.apache.flink:flink-java:$flinkVersion"
compile "org.apache.flink:flink-clients_2.11:$flinkVersion"
compile "org.apache.hadoop:hadoop-hdfs:$hadoopVersion"
compile "org.apache.hadoop:hadoop-client:$hadoopVersion"
In order to access a secured HDFS or HBase installation from a standalone Flink installation, you have to do the following:
Log into the server running the JobManager, authenticate against Kerberos using kinit and start the JobManager (without logging out or switching the user in between).
Log into each server running a TaskManager, authenticate again using kinit and start the TaskManager (again, with the same user).
Log into the server from where you want to start your streaming job (often, it's the same machine running the JobManager), log into Kerberos (with kinit) and start your job with /bin/flink run.
In my understanding, kinit logs in the current user and creates a file somewhere in /tmp with some login data. The mostly static class UserGroupInformation looks up that file with the login data when it's loaded for the first time. If the current user is authenticated with Kerberos, that information is used to authenticate against HDFS.
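If you want to verify at runtime which identity and authentication method actually ended up being used, a quick diagnostic like this (not part of the fix itself) can help:
import org.apache.hadoop.security.UserGroupInformation;

// Print the effective Hadoop identity and how it was authenticated (getCurrentUser throws IOException).
UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
System.out.println("user=" + ugi.getUserName()
        + " auth=" + ugi.getAuthenticationMethod()
        + " hasKerberosCredentials=" + ugi.hasKerberosCredentials());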

How to use Zeppelin to access aws spark-ec2 cluster and s3 buckets

I have an aws ec2 cluster set up by the spark-ec2 script.
I would like to configure Zeppelin so that I can write scala code locally on Zeppelin and run it on the cluster (via master). Furthermore I would like to be able to access my s3 buckets.
I followed this guide and this other one; however, I cannot seem to run Scala code from Zeppelin on my cluster.
I installed Zeppelin locally with
mvn install -DskipTests -Dspark.version=1.4.1 -Dhadoop.version=2.7.1
My security groups were set to both AmazonEC2FullAccess and AmazonS3FullAccess.
I edited the spark interpreter master property in the Zeppelin web app from local[*] to spark://.us-west-2.compute.amazonaws.com:7077.
When I test out
sc
in the interpreter, I receive this error:
java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at org.apache.thrift.transport.TSocket.open(TSocket.java:182) at
When I try to edit "conf/zeppelin-site.xml" to change my port to 8082, it makes no difference.
NOTE: I eventually would also want to access my s3 buckets with something like:
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "xxx")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey","xxx")
val file = "s3n://<<bucket>>/<<file>>"
val data = sc.textFile(file)
data.first
if any benevolent users have any advice (that wasn't already posted on StackOverflow) please let me know!
Most likely your IP address is blocked from connecting to your spark cluster. You can try launching the spark-shell pointed at that endpoint (or even just telnetting to it). To fix it, you can log into your AWS account and change the firewall (security group) settings. It's also possible that it isn't pointed at the correct host (I'm assuming you removed the specific hostname from spark://.us-west-2.compute.amazonaws.com:7077, but if not, there should be a hostname before the .us-west-2 part). You can try ssh'ing to that machine and running netstat --tcp -l -n to see if it's listening (or even just ps aux | grep java to see if Spark is running).