/home/hadoop/bin/hadoop missing in ami 4.x - amazon-emr

I am trying to migrate a legacy MapReduce pipeline from AMI 3.x to AMI 4.x. It currently has bash scripts as part of the bootstrapping, and one of them calls hadoop fs -get s3n://somefile ~/otherfile. This fails in my current migration attempt to AMI 4.x. Adding ls /home/hadoop/bin to the script shows that the directory /home/hadoop/bin does not exist, so of course the binary /home/hadoop/bin/hadoop does not exist either. Is there something I need to configure to ensure the hadoop binary exists? I can't find anything obvious in the documentation.

The file system layout changed considerably between 3.x and 4.x. The differences between 3.x and 4.x and instructions for migrating can be found here: http://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-4.1.0/emr-release-differences.html
The short answer for solving your issue, though, is that you should use "aws s3 cp" instead of "hadoop fs -get" in bootstrap actions, since on 4.x+ Hadoop is not installed until after the bootstrap actions have run.
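A minimal sketch of that suggestion, assuming the AWS CLI is available on the node while bootstrap actions run. The function name fetch_from_s3, the DRY_RUN switch, and the bucket and key in the example call are hypothetical. The AWS CLI expects s3:// URIs, so a leading s3n:// scheme is rewritten first.

```shell
#!/bin/sh
# Sketch of a bootstrap-action helper; names and paths are placeholders.
set -eu

# Copy one S3 object to a local path with the AWS CLI instead of `hadoop fs -get`.
# With DRY_RUN=1 the command is printed instead of executed.
fetch_from_s3() {
  src="$1"
  dest="$2"
  # The AWS CLI only understands s3://, so rewrite a leading s3n:// scheme.
  case "$src" in
    s3n://*) src="s3://${src#s3n://}" ;;
  esac
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "aws s3 cp $src $dest"
  else
    aws s3 cp "$src" "$dest"
  fi
}

# Example call (hypothetical bucket and key):
# fetch_from_s3 s3n://my-bucket/somefile "$HOME/otherfile"
```

Running the script once with DRY_RUN=1 is a cheap way to check which command a bootstrap action would issue before launching a cluster.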

Related

GlassFish 4.1.2 updatetool/pkg tools fail - missing pkg-bootstrap

Summary: The pkg-bootstrap.jar and related files are missing from the latest GlassFish 4.1.2 and this prevents the updatetool from running. What is the proper way to install and run updatetool on Windows 10?
Detail: I was working with the Java EE 7 tutorial and downloaded the Java EE 7 SDK Update 3 (not Web Profile) which is based on GlassFish Open Source Edition 4.1.2. I ran into a problem running the updatetool on Windows 10. When run, it gives the option to install itself but the installation fails. It looks like the update tool uses the pkg tool, and that uses a pkg-bootstrap to install itself the first time. However, this is no longer included in GlassFish 4.1.2. When the updatetool is run, it produces the following errors:
C:\glassfish4\bin>updatetool
The software needed for this command (updatetool) is not installed.
If you choose to install Update Tool, your system will be automatically
configured to periodically check for software updates. If you would like
to configure the tool to not check for updates, you can override the
default behavior via the tool's Preferences facility.
When this tool interacts with package repositories, some system information
such as your system's IP address and operating system type and version
is sent to the repository server. For more information please see:
http://wikis.oracle.com/display/updatecenter/UsageMetricsUC2
Once installation is complete you may re-run this command.
Would you like to install Update Tool now (y/n): y
C:\glassfish4>"C:\Program Files\Java\jdk1.8.0_121\bin\java" -Dimage.path="C:\glassfish4\bin\\.." -jar "C:\glassfish4\bin\\..\pkg/lib/pkg-client.jar" refresh
Error: Unable to access jarfile C:\glassfish4\bin\\..\pkg/lib/pkg-client.jar
C:\glassfish4>"C:\Program Files\Java\jdk1.8.0_121\bin\java" -Dimage.path="C:\glassfish4\bin\\.." -jar "C:\glassfish4\bin\\..\pkg/lib/pkg-bootstrap.jar" "C:\Users\[userid]\AppData\Local\Temp\pkg-bootstrap21687.props"
Error: Unable to access jarfile C:\glassfish4\bin\\..\pkg/lib/pkg-bootstrap.jar
C:\glassfish4\bin\pkg does not exist in either the latest Java EE 7 SDK Update 3 or the latest GlassFish 4.1.2. Some research on the nightly builds shows that the directory trees glassfish4/.org.opensolaris,pkg and glassfish4/pkg were removed between builds glassfish-4.1.2-b03-02_25_2017 and glassfish-4.1.2-b03-03_07_2017. I can't find anything that explains why they were removed or an alternative way to install the updatetool. My workaround was to copy the two trees from glassfish-4.1.2-b03-02_25_2017 into c:\glassfish4 (from the Java EE 7 SDK Update 3), and that seems to work. But I figure that if this was removed, there was a good reason for it, and I shouldn't be hacking it.
If there was a separate installation step for the package tool, I missed it. What is the proper way to get the updatetool to run on GlassFish 4.1.2?
I have jdk1.8.0_121 and jre1.8.0_121.
Thanks for your help.
I had the same problem as DevDevDev.
I went to the link in his post:
http://download.oracle.com/glassfish/4.1.2/nightly/index.html
Downloaded the archive:
glassfish-4.1.2-b03-02_25_2017
http://download.oracle.com/glassfish/4.1.2/nightly/glassfish-4.1.2-b03-02_25_2017.zip
Extracted the missing folders into my glassfish directory:
/glassfish4/pkg
/glassfish4/.org.opensolaris,pkg
Like DevDevDev, I have questions about why it was removed, but it works for me... for now. Hope it helps someone else. Thank you, DevDevDev, I would not have solved this without your post!
I was working with Java SE. Then I needed to work with JAX-WS, so I went into the same website as you.
Basically, it says that you have to:
Download the package (a compressed file with a folder called glassfish4)
Unzip the downloaded file (does not specify where)
Voilà
It did not work for me, so I kept searching and I found this: https://forums.netbeans.org/post-91328.html
You just need to download this update from the NetBeans Plugin Manager:
"Java EE Base"
Good luck!
I had the same problem too. It seems that GlassFish 4.1 does not integrate the Update Tool, so, as the Oracle documentation suggests, it is better to install SDK 6 (GlassFish 3). Here is Java EE 6 SDK Update 3; note that the version provided here comes with JDK 7. If you have already installed a JDK on your Windows 10 machine, you may ignore it.
When you finish downloading the .exe file, do not install the SDK by double-clicking the .exe file. Instead, run the command below:
java_ee_sdk-6u3-jdk7-windows-x64.exe -j [JRE-Home]
Note that the command is the name of your .exe file, and it takes the JRE home as a command-line argument. My command was:
java_ee_sdk-6u3-jdk7-windows-x64.exe -j D:\JDK\jre
It seems that unzipping the file using Windows Explorer's zip support doesn't work properly. If you instead do as described in the README and run:
jar xvf glassfish-4.1.zip
The archive is extracted properly and all the needed pkg files are there.
What files do you need? I had the same problem when I was looking for the tutorial files. Finally I found them here: ..../glassfish4/docs/javaee-tutorial/

How to create a bootstrap action for Impala on EMR

The latest EMR bootstrap action for Impala that I can find is this one, which is from 2015 and installs Impala 2.2.0.
Is there an easy way to update this to 2.7 or 2.8? Spinning up an Ubuntu 14.04 box to do a build is one option, but I'm unclear how to ultimately install it on an EMR cluster.
At this moment, there is no documented script to easily update Impala to work on EMR. It's a sysadmin task, and yes, you might look at the install script and tweak it to include your own build. Also, make sure you have all the dependencies.
Once you install it, you will need to ensure that the relevant configuration files are placed somewhere in the CLASSPATH established by set-classpath.sh, so that Impala can use EMR's HDFS, HBase, Hive metastore, or S3.

Travis config for deploying a static site without any build actions

I'd like to use Travis to push a static HTML/JavaScript website to an Amazon S3 bucket on each commit to master. Is there any way to configure my .travis.yml so it doesn't try to run any sort of build process? Just a deploy?
It seems like this is mainly controlled by the language setting which defaults to Ruby, so Ruby is being (unnecessarily) installed on each build.
I don't know how the Ruby box works (I use the Java box for my work). That being said, I think the Travis CI boxes have their base language already installed, so you aren't really installing Ruby unnecessarily each time.
If you want, there is supposedly an undocumented option, language: generic.
This way you can just run the required bash commands to deploy your code to Amazon S3.
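As a rough sketch of such bash commands, the script below syncs a directory to a bucket with the AWS CLI, and only for pushes to master. It assumes the AWS CLI is installed on the build image and that credentials are supplied through the environment. The function name deploy_site, the DRY_RUN switch, and the directory and bucket in the example call are hypothetical; TRAVIS_BRANCH and TRAVIS_PULL_REQUEST are environment variables that Travis sets for each build.

```shell
#!/bin/sh
# Sketch of a deploy step for a static site; bucket and directory are placeholders.
set -eu

# Sync a local directory to an S3 bucket, but only for pushes to master.
# With DRY_RUN=1 the command is printed instead of executed.
deploy_site() {
  site_dir="$1"
  bucket="$2"
  # Skip branch builds other than master, and skip pull request builds.
  if [ "${TRAVIS_BRANCH:-}" != "master" ] || [ "${TRAVIS_PULL_REQUEST:-false}" != "false" ]; then
    echo "skipping deploy"
    return 0
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "aws s3 sync $site_dir s3://$bucket --delete"
  else
    # --delete removes objects from the bucket that no longer exist locally.
    aws s3 sync "$site_dir" "s3://$bucket" --delete
  fi
}

# Example call (hypothetical directory and bucket):
# deploy_site ./public my-static-site-bucket
```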

Pig versions and UDF

I am using Pig version 0.12, but for creating UDFs I am using the jar file of Pig 0.9.
I simply downloaded the jar file for Pig 0.9 and added it to my Eclipse classpath.
All the UDFs that I created using the Pig 0.9 API work fine.
But I would like to know the impact of that.
Is there any problem that I will face in the future?
The issue you will face is API inconsistencies as time goes by. Some of the core APIs are relatively stable. Heck, most are. But the longer you use an old Pig API, the higher the chance that you'll hit an issue when running in the cluster.
Something else to think about is whether you are overriding the Pig version in the cluster. For example, say you have an uber-jar with the Pig scripts in it. If that JAR contains Pig 0.9, you'll actually use that version rather than 0.12. By not migrating, you might be pulling in the wrong version of Pig.
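One way to check for the uber-jar situation described above is to look for Pig's own classes in the jar listing. This is only a sketch: the function name bundles_pig_classes and the jar name in the example are hypothetical, and it uses the presence of org.apache.pig.Main (Pig's entry-point class) as a sign that Pig itself is bundled.

```shell
#!/bin/sh
# Reads jar entry names on stdin (one per line) and succeeds if the
# listing contains Pig's entry-point class, i.e. Pig itself is bundled.
bundles_pig_classes() {
  grep -qx 'org/apache/pig/Main\.class'
}

# Example (my-udfs.jar is a placeholder name; `jar tf` lists the entries):
# if jar tf my-udfs.jar | bundles_pig_classes; then
#   echo "this jar bundles its own copy of Pig"
# fi
```

If the check succeeds, excluding Pig from the packaged jar (for example by marking the dependency as provided in the build) lets the cluster's own Pig version be used.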

How to find hadoop-examples.jar in version 2.4?

I just set up the hadoop environment on my Mac and wanted to try to test whether it was installed properly.
And
namenode -format
works fine, however, nearly all the online tutorials used "hadoop-examples.jar" which was in libexec.
My Hadoop is the newest release, 2.4, and there's no such jar in libexec or any other folder. Did they remove it, or is it no longer used for testing the environment?
In Hadoop 2.4.0 there are different examples available for each application.
For MapReduce-related jar files,
cd hadoop/mapreduce
For HDFS,
cd hadoop/hdfs
"hadoop-examples.jar" is not available in the latest version; instead, they added separate jar files for the different applications.
You can use it from Hadoop.
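In Hadoop 2.x binary distributions the MapReduce examples ship as hadoop-mapreduce-examples-<version>.jar under share/hadoop/mapreduce in the install directory. The sketch below locates that jar. The function name find_examples_jar is hypothetical, and the example call assumes HADOOP_HOME points at the install directory (for a package-manager install this may be the libexec folder).

```shell
#!/bin/sh
# Print the path of the MapReduce examples jar under a Hadoop 2.x install directory.
find_examples_jar() {
  hadoop_home="$1"
  # -maxdepth 1 keeps the search out of subdirectories of mapreduce/.
  find "$hadoop_home/share/hadoop/mapreduce" -maxdepth 1 \
    -name 'hadoop-mapreduce-examples-*.jar' | head -n 1
}

# Example: run the bundled "pi" job with 2 maps and 5 samples per map
# hadoop jar "$(find_examples_jar "$HADOOP_HOME")" pi 2 5
```

Running one of the bundled jobs such as pi is a quick way to confirm that the installation can submit and complete a MapReduce job.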