MimeUtility decode code doesn't work on the cloud - apache-spark-sql

I have a Hive UDF for cleaning text. The text is encoded as "quoted-printable", and this is the code I adapted from an online example:
InputStream is = new ByteArrayInputStream(text.getBytes());
try {
InputStream isAfterDecode = MimeUtility.decode(is, "quoted-printable");
text = new BufferedReader(
new InputStreamReader(isAfterDecode, StandardCharsets.UTF_8))
.lines()
.collect(Collectors.joining(System.lineSeparator()));
} catch (MessagingException e) {
throw new RuntimeException(e);
}
When I test it in IntelliJ IDEA, it works fine. However, when I package it as a Hive function JAR, upload it to the cloud (Dataproc) and call it from Spark SQL, it doesn't work. Do I need to do anything else?
Could it be caused by the environment?
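The question doesn't include the error reported on Dataproc, so the cause is unknown. Two parts of the snippet do depend on the environment: text.getBytes() uses the platform's default charset, and the decoded lines are re-joined with System.lineSeparator(). A dependency that is present in the IDE but not on the cluster's classpath (here, javax.mail) is another common suspect; that is an assumption, not a diagnosis. Under that assumption, one way to take javax.mail out of the picture is a small dependency-free decoder. The class name QuotedPrintable is hypothetical; the sketch handles =XX escapes and soft line breaks, assumes the input is plain-ASCII quoted-printable text whose decoded bytes are UTF-8, and skips some RFC 2045 details such as stripping trailing whitespace.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public final class QuotedPrintable {
    private QuotedPrintable() {}

    /** Decodes quoted-printable text (assumed plain ASCII) whose decoded bytes are UTF-8. */
    public static String decode(String input) {
        byte[] in = input.getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
        for (int i = 0; i < in.length; i++) {
            byte b = in[i];
            if (b != '=') {
                out.write(b); // ordinary character, copied as-is
                continue;
            }
            // Soft line break: "=" followed by LF or CRLF is dropped entirely.
            if (i + 1 < in.length && in[i + 1] == '\n') {
                i += 1;
                continue;
            }
            if (i + 2 < in.length && in[i + 1] == '\r' && in[i + 2] == '\n') {
                i += 2;
                continue;
            }
            // Escaped byte: "=" followed by two hex digits.
            if (i + 2 < in.length) {
                int high = Character.digit(in[i + 1], 16);
                int low = Character.digit(in[i + 2], 16);
                if (high >= 0 && low >= 0) {
                    out.write((high << 4) | low);
                    i += 2;
                    continue;
                }
            }
            out.write(b); // malformed or truncated escape: keep the "=" literally
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

In the UDF this would replace the whole try block with text = QuotedPrintable.decode(text);. Unlike the original, it keeps the input's own line breaks instead of re-joining lines with the platform separator.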

Related

Can you debug Java code in a Bamboo Java Spec @BambooSpec main method?

I'm using Bamboo with Bamboo Java Specs (the pipeline as Java code) in a Bitbucket-hosted project.
I'm trying to use a JSON file as a configuration file to specify which stages I want to run in my pipeline.
So I've created a configuration.json file and added the following code to my @BambooSpec-annotated PlanSpec class:
private static Map<?, ?> getConfiguration(String configurationFile) throws Exception {
    System.out.println("Does this work at all?");
    Map<?, ?> map = null;
    // try-with-resources closes the reader even when parsing throws
    try (Reader reader = Files.newBufferedReader(Paths.get(configurationFile))) {
        // create Gson instance
        Gson gson = new Gson();
        // convert JSON file to map
        map = gson.fromJson(reader, Map.class);
        // print map entries
        for (Map.Entry<?, ?> entry : map.entrySet()) {
            System.out.println(entry.getKey() + "=" + entry.getValue());
        }
    } catch (Exception ex) {
        ex.printStackTrace();
        throw ex;
    }
    return map;
}
But Bamboo only shows logs for what runs as part of the plan spec; the System.out.println output is not visible.
Is there a way to debug my code at runtime?
Edit: In the meantime I found out that I can just run my code locally in the IDE. It complains about not finding a .credentials file, but that doesn't matter: I can at least test the code that runs before the plan is published.

AmazonS3: Getting warning: S3AbortableInputStream:Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection

Here's the warning that I am getting:
S3AbortableInputStream:Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
I tried using try-with-resources, but the S3ObjectInputStream doesn't seem to be closed properly that way.
try (S3Object s3object = s3Client.getObject(new GetObjectRequest(bucket, key));
S3ObjectInputStream s3ObjectInputStream = s3object.getObjectContent();
BufferedReader reader = new BufferedReader(new InputStreamReader(s3ObjectInputStream, StandardCharsets.UTF_8));
){
//some code here blah blah blah
}
I also tried the code below, closing everything explicitly, but that doesn't work either:
S3Object s3object = s3Client.getObject(new GetObjectRequest(bucket, key));
S3ObjectInputStream s3ObjectInputStream = s3object.getObjectContent();
try (BufferedReader reader = new BufferedReader(new InputStreamReader(s3ObjectInputStream, StandardCharsets.UTF_8));
){
//some code here blah blah
s3ObjectInputStream.close();
s3object.close();
}
Any help would be appreciated.
PS: I am only reading two lines of the file from S3 and the file has more data.
I got the answer through another channel. Sharing it here:
The warning indicates that you called close() without reading the whole file. This is problematic because S3 is still trying to send the data and you're leaving the connection in a sad state.
There are two options here:
1) Read the rest of the data from the input stream so the connection can be reused.
2) Call s3ObjectInputStream.abort() to close the connection without reading the data. The connection won't be reused, so you take some performance hit on the next request, which has to re-create the connection. This may be worth it if reading the rest of the file would take a long time.
Following option #1 of Chirag Sejpal's answer, I used the statement below to drain the S3AbortableInputStream so the connection can be reused:
com.amazonaws.util.IOUtils.drainInputStream(s3ObjectInputStream);
I ran into the same problem, and the following class helped me:
@Data
@AllArgsConstructor
public class S3ObjectClosable implements Closeable {
    private final S3Object s3Object;

    @Override
    public void close() throws IOException {
        // abort() drops the connection without reading the remaining bytes (option 2)
        s3Object.getObjectContent().abort();
        s3Object.close();
    }
}
Now you can use it without the warning:
try (final var s3ObjectClosable = new S3ObjectClosable(s3Client.getObject(bucket, key))) {
//same code
}
To add an example to Chirag Sejpal's answer (elaborating on option #1), the following can be used to read the rest of the data from the input stream before closing it:
S3Object s3object = s3Client.getObject(new GetObjectRequest(bucket, key));
try (S3ObjectInputStream s3ObjectInputStream = s3object.getObjectContent()) {
try {
// Read from stream as necessary
} catch (Exception e) {
// Handle exceptions as necessary
} finally {
while (s3ObjectInputStream != null && s3ObjectInputStream.read() != -1) {
// Read the rest of the stream
}
}
// The stream will be closed automatically by the try-with-resources statement
}
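Reading one byte per read() call, as in the loop above, works but can be slow when much of the object is left. A minimal sketch of the same idea (option #1) with a buffered drain, packaged as a hypothetical FilterInputStream wrapper named DrainingInputStream so that try-with-resources does the draining on close. It assumes the SDK's end-of-stream check is satisfied once the stream has been read to the end, as the IOUtils.drainInputStream answer suggests; for a large remainder, abort() (option #2) may still be the better choice.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class DrainingInputStream extends FilterInputStream {
    private boolean closed;

    public DrainingInputStream(InputStream in) {
        super(in);
    }

    /** Reads and discards any unread bytes, then closes the wrapped stream. */
    @Override
    public void close() throws IOException {
        if (closed) {
            return; // make repeated close() calls harmless
        }
        closed = true;
        try {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // discard the remaining bytes
            }
        } finally {
            super.close();
        }
    }
}
```

A possible usage: try (InputStream in = new DrainingInputStream(s3object.getObjectContent())) { ... }, reading only the two lines you need inside the block.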
I ran into the same error.
As others have pointed out, the /tmp space in Lambda is limited to 512 MB by default.
And if the Lambda execution environment is reused for a new invocation, /tmp may already be half-full.
So when I read the S3 objects and wrote all the files to the /tmp directory,
I ran out of disk space partway through.
The Lambda exited with an error before all bytes from the S3ObjectInputStream were read.
So there are two things to keep in mind:
1) If the first execution causes the problem, be frugal with your /tmp space; you only have 512 MB.
2) If a later execution causes the problem, attack the root cause.
It's not possible to delete the /tmp folder itself,
so delete all the files inside /tmp after each execution finishes.
In Java, here is what I did, which resolved the problem:
public String handleRequest(Map<String, String> keyValuePairs, Context lambdaContext) {
try {
// All work here
} catch (Exception e) {
logger.error("Error {}", e.toString());
return "Error";
} finally {
deleteAllFilesInTmpDir();
}
}
private void deleteAllFilesInTmpDir() {
Path path = java.nio.file.Paths.get("/tmp"); // Lambda's writable scratch directory
try {
if (Files.exists(path)) {
deleteDir(path.toFile());
logger.info("Successfully cleaned up the tmp directory");
}
} catch (Exception ex) {
logger.error("Unable to clean up the tmp directory");
}
}
public void deleteDir(File dir) {
File[] files = dir.listFiles();
if (files != null) {
for (final File file: files) {
deleteDir(file);
}
}
dir.delete();
}
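The deleteDir() helper above also ends by calling delete() on /tmp itself, which (as noted) cannot be removed, so that last call just returns false. A minimal alternative sketch with java.nio.file that deletes only the contents of a directory; the class name TmpCleaner is hypothetical. Files.walk does not follow symbolic links by default, so a link inside /tmp is removed without touching its target.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public final class TmpCleaner {
    private TmpCleaner() {}

    /** Deletes everything below dir but leaves dir itself in place. */
    public static void deleteContents(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse path order visits children before their parent directories.
            paths.sorted(Comparator.reverseOrder())
                 .filter(p -> !p.equals(dir))
                 .forEach(p -> {
                     try {
                         Files.deleteIfExists(p);
                     } catch (IOException e) {
                         throw new UncheckedIOException(e);
                     }
                 });
        }
    }
}
```

In the handler, the finally block could then call TmpCleaner.deleteContents(java.nio.file.Paths.get("/tmp")).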
This is my solution (I'm using Spring Boot 2.4.3).
Create an Amazon S3 client:
AmazonS3 amazonS3Client = AmazonS3ClientBuilder
.standard()
.withRegion("your-region")
.withCredentials(
new AWSStaticCredentialsProvider(
new BasicAWSCredentials("your-access-key", "your-secret-access-key")))
.build();
Create a TransferManager client:
TransferManager transferManagerClient = TransferManagerBuilder.standard()
.withS3Client(amazonS3Client)
.build();
Create a temporary file at {java.io.tmpdir}/{your-s3-key} (for example /tmp/{your-s3-key}) to hold the downloaded object.
File file = new File(System.getProperty("java.io.tmpdir"), "your-s3-key");
file.getParentFile().mkdirs(); // Create the parent directory first (needed if the key contains "/")
try {
file.createNewFile(); // Create the temporary file
} catch (IOException e) {
e.printStackTrace();
}
Then download the file from S3 using the TransferManager client:
// Note that in this line the s3 file downloaded has been transferred in to the temporary file that we created
Download download = transferManagerClient.download(
new GetObjectRequest("your-s3-bucket-name", "your-s3-key"), file);
// This line blocks the thread until the download is finished
download.waitForCompletion();
Now that the S3 object has been transferred into the temporary file, we can get an InputStream for that file:
InputStream input = new DataInputStream(new FileInputStream(file));
Once you have finished reading from the stream and closed it, the temporary file is no longer needed, so delete it:
input.close();
file.delete();
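Rather than deleting the file in a separate step, java.nio can open it with StandardOpenOption.DELETE_ON_CLOSE, so the temporary file is removed (best effort, per the Javadoc) when the stream is closed and the delete can't be forgotten or run too early. A minimal sketch; the class and method names are hypothetical and this is not part of the original answer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public final class TempFileStreams {
    private TempFileStreams() {}

    /** Opens path for reading; the file is deleted when the returned stream is closed. */
    public static InputStream openAndDeleteOnClose(Path path) throws IOException {
        return Files.newInputStream(path, StandardOpenOption.DELETE_ON_CLOSE);
    }
}
```

With the downloaded file, this would be try (InputStream input = TempFileStreams.openAndDeleteOnClose(file.toPath())) { ... }.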

Jreport JasperRunManager.runReportToPdfStream null pointer exception

I'm trying to generate a PDF report using Jasper in my Java web application, but I'm getting a NullPointerException and I can't find its cause.
Here is my code:
private void caricaReport() {
    try {
        File outDir = new File(outputDir);
        outDir.mkdirs();
        // try-with-resources closes both streams, so the PDF is flushed even if filling fails
        try (InputStream is = getClass().getResourceAsStream("reports/miooperearte.jasper");
             OutputStream os = new FileOutputStream(new File(outDir, "testReportNadia.pdf"))) {
            Map<String, Object> parameterMap = new HashMap<>();
            parameterMap.put("immagini_base_dir", "/Applications/MAMP/htdocs/Dboperearte/app/webroot/images/");
            Collection<?> data = leggiOpere();
            JRBeanCollectionDataSource dataSource = new JRBeanCollectionDataSource(data, false);
            JasperRunManager.runReportToPdfStream(is, os, parameterMap, dataSource);
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
The variables "is", "os", "parameterMap" and "dataSource" are all non-null. The exception doesn't show what is null; it only says NullPointerException.
Any idea that could help me find or solve the problem?
Thanks
I would guess that parameterMap is missing an entry for a parameter that the report expects; make sure you aren't missing any values there.
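One quick way to act on that guess is to compare the parameter names the report declares with the keys in parameterMap. A minimal sketch: the expected names are assumed to be copied by hand from the <parameter name="..."> elements in the report's .jrxml, and the class name ReportParams and the parameter "titolo" in the test are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

public final class ReportParams {
    private ReportParams() {}

    /** Returns the expected parameter names that have no non-null value in the map. */
    public static List<String> missing(Collection<String> expected, Map<String, ?> provided) {
        List<String> result = new ArrayList<>();
        for (String name : expected) {
            if (provided.get(name) == null) {
                result.add(name);
            }
        }
        return result;
    }
}
```

Printing ReportParams.missing(...) just before runReportToPdfStream shows which declared parameters would reach the report as null.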

Best way to extract .zip and put in .jar

I have been trying to find the best way to do this. I have thought of extracting the contents of the .jar, moving the files into the extracted directory, and then packing it back up as a .jar. I'm not sure whether this is the best solution or how to do it. I have looked at DotNetZip and SharpZipLib but don't know which one to use.
If anyone can point me to code showing how to do this, it would be appreciated.
For DotNetZip, you can find very simple VB.NET examples of both creating a zip archive and extracting a zip archive into a directory here. Just remember to save the compressed file with the .jar extension.
For SharpZipLib there are somewhat more comprehensive examples of archive creation and extraction here.
If none of these libraries manage to extract the full JAR archive, you could also consider accessing a more full-fledged compression software such as 7-zip, either starting it as a separate process using Process.Start or using its COM interface to access the relevant methods in the 7za.dll. More information on COM usage can be found here.
I think you are working with Minecraft 1.3.1, right? If you are, the zip contains a file called aux.class, and unfortunately that is a reserved file name on Windows. I've been trying to automate the modding process by manipulating the jar file myself, and have had little success. The only option I have yet to explore is to extract the contents of the jar file to a temporary location while watching for that exception. When it occurs, rename the file to a temporary name, extract it and move on. Then, when recreating the zip file, give the entry its original name in the archive. From my own experience, SharpZipLib doesn't do what you need nicely, or at least I couldn't figure out how. I suggest using Ionic Zip (DotNetZip) instead and trying the rename route on the offending files. I also posted a question about this; you can see how far I got at "Extract zip entries to another Zip file".
Edit: I tested DotNetZip more (available from http://dotnetzip.codeplex.com/), and here's what you need. I expect it to work with any zip file that contains reserved file names. I know it's in C#, but hey, I can't do all the work for you :P
// Requires: using System.IO; using Ionic.Zip;
public static void CopyToZip(string inArchive, string outArchive, string tempPath)
{
    using (ZipFile inZip = new ZipFile(inArchive))
    using (ZipFile outZip = new ZipFile(outArchive))
    {
        // Every entry passes through one neutral temp name, so reserved names
        // such as "aux.class" are never created on the Windows file system.
        string tempName = Path.Combine(tempPath, "tmp.tmp");
        foreach (ZipEntry entry in inZip)
        {
            if (entry.IsDirectory)
            {
                continue;
            }
            // Extract the entry's bytes into the temp file.
            using (Stream inStream = entry.OpenReader())
            using (FileStream stream = new FileStream(tempName, FileMode.Create, FileAccess.Write))
            {
                inStream.CopyTo(stream);
            }
            // Re-add it under its original name. Save() reads the temp file now,
            // before the next iteration overwrites it.
            using (FileStream tempStream = new FileStream(tempName, FileMode.Open, FileAccess.Read))
            {
                outZip.AddEntry(entry.FileName, tempStream);
                outZip.Save();
            }
        }
    }
}
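JAR files are ordinary zip archives, so entries can also be copied from one archive to another stream-to-stream, without extracting anything to disk; a reserved name like aux.class then never reaches the Windows file system at all. For comparison, a minimal sketch of that approach in Java with java.util.zip; the class name ZipCopy is hypothetical. It copies only entry names and contents (not timestamps or extra fields), and it closes both streams it is given.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public final class ZipCopy {
    private ZipCopy() {}

    /** Copies every file entry of the source zip into a new archive, entirely in memory. */
    public static void copyEntries(InputStream source, OutputStream target) throws IOException {
        try (ZipInputStream in = new ZipInputStream(source);
             ZipOutputStream out = new ZipOutputStream(target)) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories are implied by the entry paths
                }
                // Keep the original name, so "aux.class" is never written to disk.
                out.putNextEntry(new ZipEntry(entry.getName()));
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                out.closeEntry();
            }
        }
    }
}
```

To merge extra files into the jar, they can be written with further putNextEntry calls on the same ZipOutputStream before it is closed.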

YUI-Compressor: result file is empty

I am using the YUI Compressor library to minify CSS and JavaScript files. I directly use the classes CssCompressor and JavaScriptCompressor.
Unfortunately, some of the resulting files are empty, without any warnings or exceptions.
I have already tried the following versions:
yuicompressor-2.4.2.jar
yuicompressor-2.4.6.jar
yuicompressor-2.4.7pre.jar
The code I use is:
public static void compress(File file) {
try {
long start = System.currentTimeMillis();
File targetFile = new File("results", file.getName() + ".min");
Writer writer = new FileWriter(targetFile);
if (file.getName().endsWith(".css")) {
CssCompressor cssCompressor = new CssCompressor(new FileReader(file));
cssCompressor.compress(writer, -1);
} else if (file.getName().endsWith(".js")) {
JavaScriptCompressor jsCompressor = new JavaScriptCompressor(new FileReader(file), new MyErrorReporter());
jsCompressor.compress(writer, -1, true, false, false, true);
}
long end = System.currentTimeMillis();
System.out.println("\t compressed " + file.getName() + " within " + (end - start) + " milliseconds");
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
Examples of files that don't work (they are empty afterwards):
http://code.google.com/p/open-cooliris/source/browse/trunk/fancy/jquery.fancybox.css?r=2
http://nodejs.org/sh_main.js
I know YUI Compressor has some bugs involving media rules, but could they be related to the empty results?
I had the same problem.
In my case, it stemmed from my JavaScript code not being valid ECMAScript (we used a variable named double, which the ECMAScript rules don't allow).
I didn't have the courage to check whether your JS is valid, but compressing different parts of your JS file separately can quickly lead you to the problem if there is one.
Well, after some debugging I figured out a solution.
The problem was not YUI Compressor itself but the FileWriter passed to the method.
Flushing and closing the FileWriter solves the problem of empty result files.
Since I only need the minified String for further processing, I now use a StringWriter, which I also flush and close.
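A minimal sketch of that fix with try-with-resources: close() flushes the buffered writer, so the file is complete when the block ends. CompressStep is a hypothetical stand-in for calls such as cssCompressor.compress(writer, -1) or jsCompressor.compress(writer, -1, true, false, false, true), and the sketch writes UTF-8 explicitly instead of relying on FileWriter's platform-default encoding.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public final class MinifyOutput {
    private MinifyOutput() {}

    /** Hypothetical stand-in for one compressor call that writes its result to a Writer. */
    @FunctionalInterface
    public interface CompressStep {
        void compressTo(Writer out) throws IOException;
    }

    /** Writes the result to a file; closing the writer flushes its buffer to disk. */
    public static void toFile(CompressStep step, Path target) throws IOException {
        try (Writer writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            step.compressTo(writer);
        }
    }

    /** Collects the result in memory; StringWriter has no buffer to flush. */
    public static String toStringResult(CompressStep step) throws IOException {
        StringWriter writer = new StringWriter();
        step.compressTo(writer);
        return writer.toString();
    }
}
```

With the real library, MinifyOutput.toFile(w -> cssCompressor.compress(w, -1), targetFile.toPath()) would replace the unclosed FileWriter in compress(). StringWriter keeps everything in memory and its close() has no effect, so toString() already returns the full result.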