How to apply changes to Zeppelin env and site in Amazon EMR? - amazon-emr

I am trying to change Zeppelin environment values on my EMR cluster after launch, but the changes do not take effect. The settings I want to add are shown below, taken from https://zeppelin.apache.org/docs/0.9.0/setup/storage/storage.html
zeppelin-env.sh: (from /etc/zeppelin/conf/ and /etc/zeppelin/conf.dist/)
export ZEPPELIN_NOTEBOOK_S3_CANNED_ACL=BucketOwnerFullControl
Or zeppelin-site.xml: (from /etc/zeppelin/conf/ and /etc/zeppelin/conf.dist/)
<property>
<name>zeppelin.notebook.s3.cannedAcl</name>
<value>BucketOwnerFullControl</value>
<description>Saves notebooks in S3 with the given Canned Access Control List.</description>
</property>
As shown above, I tried overwriting both files in conf and conf.dist and then ran sudo systemctl stop zeppelin followed by sudo systemctl start zeppelin. But whenever I reopen Zeppelin and look at the Configuration page, I can see that the changes did not take effect (I also tested the behavior). I am not sure what is going on.
How do I add environment or site values on a running EMR cluster? I'd prefer doing it in a paragraph, but that doesn't seem possible. I am using EMR 5.30.1, if that adds any context.
Update: Zeppelin 0.8.2 does not support the cannedAcl setting. Support started with Zeppelin 0.9, which first ships with EMR 5.33.0. But it still does not work.
I can use the AWS API to upload a file to the other account's bucket, and it works fine when I add the ACL there, but not in Zeppelin, even when the change is reflected on the Configuration page. This applies to both Zeppelin 0.9 and 0.10. What is going on? Why does this not work? I'm doing a simple df.write.parquet
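For context, settings like these are often supplied through EMR configuration classifications rather than by editing files on the node. The sketch below only builds such a configuration in Python and prints it as JSON. It rests on assumptions that are not confirmed anywhere in this thread: that the `zeppelin-env` classification with a nested `export` block carries the Zeppelin variable, and that a Spark df.write.parquet to S3 goes through EMRFS, where the relevant `emrfs-site` property would be `fs.s3.canned.acl`, rather than through Zeppelin's notebook storage. Whether this resolves the problem described above is untested.

```python
import json

# ACL value taken from the question above.
ACL = "BucketOwnerFullControl"


def build_configurations(acl=ACL):
    """Return an EMR-style configuration list (a sketch, not a verified fix)."""
    return [
        {
            # Assumption: zeppelin-env with a nested "export" block sets
            # environment variables that end up in zeppelin-env.sh.
            "Classification": "zeppelin-env",
            "Properties": {},
            "Configurations": [
                {
                    "Classification": "export",
                    "Properties": {"ZEPPELIN_NOTEBOOK_S3_CANNED_ACL": acl},
                }
            ],
        },
        {
            # Assumption: Spark writes to S3 go through EMRFS, which has its
            # own canned ACL property, separate from Zeppelin notebook storage.
            "Classification": "emrfs-site",
            "Properties": {"fs.s3.canned.acl": acl},
        },
    ]


if __name__ == "__main__":
    print(json.dumps(build_configurations(), indent=2))
```

The printed JSON could be saved to a file and passed at launch, for example with aws emr create-cluster --configurations file://configurations.json (the file name is hypothetical).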

Related

Tag based policies in Apache Ranger not working

I am new to Apache Ranger and the BigData field in general. I am working on an on-prem big data pipeline. I have configured resource based policies in Apache Ranger (ver 2.2.0) using ranger hive plugin (Hive ver 2.3.8) and they seem to be working fine. But I am having problems with tag based policies and would like someone to tell me where I am going wrong. I have configured a tag based policy in Ranger by doing the following -
1. Create a tag in Apache Atlas (e.g. TAG_C1) on a Hive column (column C1). For this, first install Apache Atlas and the Atlas Hook for Hive, then create the tag in Atlas. This seems to be working fine.
2. Install Atlas plugin in Apache Ranger.
3. Install RangerTagSync (but did not install Kafka).
4. Atlas Tag (TAG_C1) is being seen in Apache Ranger when I create Tag based masking policy in ranger.
5. But masking is not visible in hive which I access via beeline.
Is Kafka important for Tag based policies in Apache Ranger? What am I doing wrong in these steps?
Kafka is important for TagSync and for Atlas too. Kafka is what notifies Ranger TagSync about tag assignments and changes in Apache Atlas.

AWS EMR - how to copy files to all the nodes?

Is there a way to copy a file to all the nodes in an EMR cluster through the EMR command line? I am working with Presto and have created my own custom plugin. The problem is that I have to install this plugin on all the nodes, and I don't want to log in to each node and copy it.
You can add it as a bootstrap script to let this happen during the launch of the cluster.
@Sanket9394 Thanks for the edit!
If you are able to bring up a new EMR cluster, you should consider using an EMR bootstrap script.
But in case you want to do it on an existing EMR cluster (bootstrap actions are only available at launch time),
you can do this with the help of AWS Systems Manager (SSM) and the EMR client in boto3.
Something like (python):
import boto3
emr_client = boto3.client('emr')
ssm_client = boto3.client('ssm')
You can get the list of core instances using emr_client.list_instances,
and finally send a command to each of these instances using ssm_client.send_command.
Ref: see the last detailed example, "Installing Libraries on Core Nodes of a Running Cluster", at https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-jupyterhub-install-kernels-libs.html#emr-jupyterhub-install-libs
Note: If you are going with SSM, you need a suitable SSM IAM policy attached to the IAM role of your master node.
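The two calls mentioned above could be combined roughly as follows. This is a minimal sketch, not the code from the linked AWS example. It assumes the cluster uses instance groups (so that `InstanceGroupTypes=["CORE"]` applies), that the nodes are managed by SSM, and that the stock `AWS-RunShellScript` document is available. The function names, the batch size and the `aws s3 cp` command are illustrative choices.

```python
import shlex


def core_instance_ids(emr_client, cluster_id):
    """Collect EC2 instance IDs of running core nodes, following pagination."""
    ids, extra = [], {}
    while True:
        page = emr_client.list_instances(
            ClusterId=cluster_id,
            InstanceGroupTypes=["CORE"],
            InstanceStates=["RUNNING"],
            **extra,
        )
        ids.extend(inst["Ec2InstanceId"] for inst in page.get("Instances", []))
        marker = page.get("Marker")
        if not marker:
            return ids
        extra = {"Marker": marker}


def run_on_instances(ssm_client, instance_ids, commands, batch_size=50):
    """Send shell commands through SSM in batches; return the command IDs."""
    command_ids = []
    for start in range(0, len(instance_ids), batch_size):
        resp = ssm_client.send_command(
            InstanceIds=instance_ids[start:start + batch_size],
            DocumentName="AWS-RunShellScript",
            Parameters={"commands": commands},
        )
        command_ids.append(resp["Command"]["CommandId"])
    return command_ids


def copy_to_core_nodes(cluster_id, s3_uri, dest_dir):
    """Hypothetical entry point: pull one S3 object onto every core node."""
    import boto3  # imported here so the helpers above work with stub clients

    emr_client = boto3.client("emr")
    ssm_client = boto3.client("ssm")
    command = "sudo aws s3 cp {} {}".format(shlex.quote(s3_uri), shlex.quote(dest_dir))
    ids = core_instance_ids(emr_client, cluster_id)
    return run_on_instances(ssm_client, ids, [command])
```

A call might look like copy_to_core_nodes("j-XXXXXXXXXXXXX", "s3://my-bucket/my-plugin.jar", "/tmp/"), where the cluster ID, bucket and destination are placeholders. The returned command IDs can be used to check the status of each invocation in SSM.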

How to change the Apache Zeppelin UI appearance and make edits to elements

I'm currently running Apache Zeppelin 0.7.2 on an AWS EMR machine. Is there any way to replace the zeppelin logo and words at the top with any other text and images?
I tried to use the Inspect Elements feature in Chrome on the Zeppelin Webpage and tracked down the image location, which is being loaded from:
/var/run/zeppelin/webapps/webapp/assets/images/zepLogoW.png
I tried to replace the above image file with the target image and made changes to the navbar.html to change the zeppelin word at the top left navigation bar. However, even after making these changes when I restart the Zeppelin service using :
sudo stop zeppelin
sudo start zeppelin
The changes don't reflect in the browser even after refreshing.
Is there any way to make such changes show up in the browser and persist as well?
Thanks in advance!
Zeppelin uses Jetty which explodes a .war file to produce the web root directory.
Whenever the server is started, the war is exploded and the web root is overwritten, so changes made to that directory won't survive a service restart.
You have two options. You can edit the zeppelin-web code and compile the module to create your own war file, then replace the original with it (on EMR it is located at /usr/lib/zeppelin/zeppelin-web-0.x.x.war). Or you can replace the logo file and edit the exploded files directly; they should be served fine (you will probably need to clear the browser cache), but the changes will disappear as soon as the Zeppelin service restarts.
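Since a .war file is a zip archive, another possibility is to patch the archive itself so the change is still there the next time the war is exploded. This is a different technique from recompiling zeppelin-web, and the sketch below has not been tried against a real Zeppelin install. The function name is hypothetical, and the entry name `assets/images/zepLogoW.png` is an assumption inferred from the exploded path in the question. Back up the original war before trying anything like this.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def replace_in_war(war_path, entry_name, new_bytes):
    """Rewrite a war (zip) archive with one entry's content replaced."""
    war_path = Path(war_path)
    # zipfile cannot overwrite an entry in place, so copy every entry
    # into a temporary archive next to the original and then swap them.
    with tempfile.NamedTemporaryFile(
        dir=war_path.parent, suffix=".war", delete=False
    ) as tmp:
        tmp_path = Path(tmp.name)
    replaced = False
    with zipfile.ZipFile(war_path) as src, zipfile.ZipFile(
        tmp_path, "w", zipfile.ZIP_DEFLATED
    ) as dst:
        for info in src.infolist():
            if info.filename == entry_name:
                dst.writestr(info, new_bytes)
                replaced = True
            else:
                dst.writestr(info, src.read(info.filename))
    if not replaced:
        tmp_path.unlink()
        raise KeyError("entry not found in archive: " + entry_name)
    # Keep the original permission bits; file ownership may still change.
    shutil.copymode(str(war_path), str(tmp_path))
    shutil.move(str(tmp_path), str(war_path))
```

It could be called as replace_in_war("/usr/lib/zeppelin/zeppelin-web-0.x.x.war", "assets/images/zepLogoW.png", Path("new_logo.png").read_bytes()), with the real version number in place of 0.x.x and new_logo.png standing in for your own image, followed by a restart of the Zeppelin service.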

Spinnaker Support for App ELB in AWS

I am facing two issues with a new Spinnaker installation.
I cannot see my Application Load Balancers listed in the dropdown of the load balancers tab while creating a pipeline. We currently use only Application Load Balancers in our setup. I tried editing the pipeline's JSON with the config below, but it didn't work. I verified this by checking the ASG created in my AWS account for any associated ELB or target group, but I couldn't see any.
"targetGroups": [
"TG-APP-ELB-NAME-DEV"
],
Hence, I would like to confirm how I can get Application Load Balancer support in my Spinnaker installation and how to use it.
I have also found an AMI search issue. My current setup is described below.
One managing account - prod where my spinnaker ec2 is running & my prod application instances are running
Two managed accounts - dev & test where my application test instances are running.
When I create a new AMI in my dev AWS account and try to search for it from Spinnaker, the search fails with an error saying the AMI could not be found. I then shared the AMI from dev to prod, after which Spinnaker was able to find it, but it failed with an UnAuthorized error.
Please help me clarify:
1. Whether sharing is required for every new AMI from dev to prod, or whether our spinnakerManaged role would take care of permissions.
2. How to fix this problem and create the AMI successfully.
Regarding #1, have you created the App Load Balancer through the Spinnaker UI or directly through AWS?
If it is the former, then make sure it follows the naming convention expected by Spinnaker (I believe the balancer name should start with the app name).

Apache Hama on Amazon Elastic MapReduce

I am trying to run Apache Hama on Amazon Elastic MapReduce using the https://github.com/awslabs/emr-bootstrap-actions/tree/master/hama script. However, when trying it out with one master node and two slave nodes, peer.getNumPeers() in the BSP code reports only 1 peer. I suspect that Hama is running in local mode.
Moreover, looking at configurations at https://hama.apache.org/getting_started_with_hama.html, my understanding is that the list of all the servers should go in hama-site.xml file for property hama.zookeeper.quorum and also in groomservers file. However, I wonder whether these are being configured properly in the install script. Would really appreciate if anyone could point out whether it's a limitation in the script or whether I am doing something wrong.
@Madhura
Hama doesn't always need the groomservers file to run in fully distributed mode.
The groomservers file is only needed when starting the Hama cluster with start-bspd.sh. The Hama emr-bootstrap-action instead starts a groom server on each slave node using the hama-daemon.sh file. The code executed in the install script is as follows.
$ /bin/hama-daemon.sh --config ${HAMA_HOME}/conf start groom
I think you should check the EMR logs to see whether they contain any errors.