mapreduce job on yarn exited with exitCode: -1000 because of resource changed on src filesystem - azure-storage

Application application_1552978163044_0016 failed 5 times due to AM Container for appattempt_1552978163044_0016_000005 exited with exitCode: -1000
Diagnostics:
java.io.IOException: Resource
abfs://xxx@xxx.dfs.core.windows.net/hdp/apps/2.6.5.3006-29/mapreduce/mapreduce.tar.gz
changed on src filesystem (expected 1552949440000, was 1552978240000
Failing this attempt. Failing the application.

Judging from the exception alone, it seems to be caused by Azure Storage not preserving the original timestamp of the copied file. I found a workaround that recommends changing the source code of yarn-common to disable the timestamp check during the file copy, so the exception is not thrown and the MR job can keep running.
Here is the source code (around line 255) in the latest version of yarn-common that checks the timestamp of the copied file and throws the exception.
/**
 * Localize files.
 * @param destination destination directory
 * @throws IOException cannot read or write file
 * @throws YarnException subcommand returned an error
 */
private void verifyAndCopy(Path destination)
    throws IOException, YarnException {
  final Path sCopy;
  try {
    sCopy = resource.getResource().toPath();
  } catch (URISyntaxException e) {
    throw new IOException("Invalid resource", e);
  }
  FileSystem sourceFs = sCopy.getFileSystem(conf);
  FileStatus sStat = sourceFs.getFileStatus(sCopy);
  if (sStat.getModificationTime() != resource.getTimestamp()) {
    throw new IOException("Resource " + sCopy +
        " changed on src filesystem (expected " + resource.getTimestamp() +
        ", was " + sStat.getModificationTime());
  }
  if (resource.getVisibility() == LocalResourceVisibility.PUBLIC) {
    if (!isPublic(sourceFs, sCopy, sStat, statCache)) {
      throw new IOException("Resource " + sCopy +
          " is not publicly accessible and as such cannot be part of the" +
          " public cache.");
    }
  }
  downloadAndUnpack(sCopy, destination);
}
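For reference, the workaround mentioned above boils down to relaxing exactly this check in a local build of yarn-common, for example logging a warning instead of throwing. The snippet below is only a sketch of such a patch (it assumes the surrounding class already has a LOG logger field) and it deliberately weakens YARN's resource-consistency guarantee, so treat it as a stopgap rather than a proper fix:
  // Sketch of the local patch: warn instead of failing when the source
  // timestamp differs (disables YARN's "changed on src filesystem" check).
  if (sStat.getModificationTime() != resource.getTimestamp()) {
    LOG.warn("Resource " + sCopy +
        " changed on src filesystem (expected " + resource.getTimestamp() +
        ", was " + sStat.getModificationTime() + "); ignoring and continuing");
  }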

Related

Read file buffer treats archives (zip, jar, etc) as directory and throws IOException

I have a requirement to break large files into 2MB slices for transport across a bridge. The following block of code works for all file types except archives (zip, jar, etc), for which it throws an IOException, the message of which is "Is a directory".
try (FileChannel fileChannel = FileChannel.open(fullpath);
     WriteStream writeStream = channel.connect().orElseThrow()) {
    byte[] bytes = origFileName.getBytes();
    ByteBuffer buffer = ByteBuffer.allocate(maxsize - 10);
    buffer.putChar('M');
    buffer.put(bytes);
    buffer.flip();
    writeStream.write(buffer);
    while (fileChannel.position() < fileChannel.size()) {
        buffer = ByteBuffer.allocate(maxsize - 10);
        buffer.putChar('D');
        fileChannel.read(buffer);
        buffer.flip();
        writeStream.write(buffer);
    }
} catch (IOException e) {
    LOG.error("IO exception " + e.getMessage() + ", " + origFileName, e);
}
The error in the log is:
IO exception Is a dir, file.zip
The error is thrown by the statement 'fileChannel.read(buffer)'. Any help you can give me in determining how to fix this is greatly appreciated.
It turns out an upstream process is converting the zip file into a directory. Please don't waste any more time on this question. I'm sorry I didn't better research the situation before posting.
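For anyone who hits the same symptom, a cheap guard before opening the channel makes this failure mode obvious immediately. This is only a sketch; it assumes fullpath is the java.nio.file.Path used in the snippet above:
// Sketch: fail fast with a clear message when the path is not a regular file,
// e.g. when an upstream process has replaced the archive with a directory.
if (!java.nio.file.Files.isRegularFile(fullpath)) {
    throw new IOException(fullpath + " is not a regular file (is it a directory?)");
}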

Bad JSON: expected object value. Dropbox API file upload error

I am trying to upload a file to Dropbox using the Dropbox Java SDK.
I created an app in Dropbox and gave it write permission; the app status is Development.
The code works up to the "write mode done" logger and fails after that. fileStream is of type InputStream.
DbxClientV2 client = createClient();
String path = "/" + fileName.toString();
try {
    System.out.println("STARTING...............");
    UploadBuilder ub = client.files().uploadBuilder(path);
    System.out.println("ub done");
    ub.withMode(WriteMode.ADD);
    System.out.println("write mode done");
    FileMetadata metadata = ub.uploadAndFinish(fileStream);
    System.out.println("File uploaded");
    System.out.println(metadata.toStringMultiline());
    System.out.println("FINISHED...............");
} catch (DbxException | IOException e) { // uploadAndFinish also throws the checked DbxException
    e.printStackTrace();
}
The exception I get is:
com.dropbox.core.BadResponseException: Bad JSON: expected object value.
at [Source: ; line: 1, column: 1]
at com.dropbox.core.DbxRequestUtil.unexpectedStatus(DbxRequestUtil.java:347)
at com.dropbox.core.DbxRequestUtil.unexpectedStatus(DbxRequestUtil.java:324)
at com.dropbox.core.DbxUploader.finish(DbxUploader.java:268)
at com.dropbox.core.DbxUploader.uploadAndFinish(DbxUploader.java:126)
.
.
.
Caused by: com.fasterxml.jackson.core.JsonParseException: expected object value.
at [Source: ; line: 1, column: 1]
at com.dropbox.core.stone.StoneSerializer.expectStartObject(StoneSerializer.java:91)
at com.dropbox.core.ApiErrorResponse$Serializer.deserialize(ApiErrorResponse.java:53)

PLC4X: Exception during scraping of job

I'm currently developing a project that reads data from 19 Siemens S1500 PLCs and 1 Modicon. I have used the scraper tool following this tutorial:
PLC4x scraper tutorial
but after the scraper has been running for a short time I get the following exception:
I have changed the scheduled time between 1 and 100, and I always get the same exception when the scraper reaches the same number of received messages.
I have tested whether using PlcDriverManager instead of PooledPlcDriverManager could be a solution, but the same problem persists.
In my pom.xml I use the following dependency:
<dependency>
    <groupId>org.apache.plc4x</groupId>
    <artifactId>plc4j-scraper</artifactId>
    <version>0.7.0</version>
</dependency>
I have tried to change the version to an older one like 0.6.0 or 0.5.0 but the problem still persists.
If I use the Modicon (Modbus TCP) I also get this exception after a short time.
Does anyone know why this error is happening? Thanks in advance.
Edit: With scraper version 0.8.0-SNAPSHOT I still have this problem.
Edit 2: This is my code. I think the problem may be that my scraper is opening a lot of connections, and when it reaches 65526 messages it fails. But since all the processing happens inside the lambda function and I'm using a PooledPlcDriverManager, I think the scraper is using only one connection, so I don't know where the mistake is.
try {
    // Create a new PooledPlcDriverManager
    PlcDriverManager S7_plcDriverManager = new PooledPlcDriverManager();
    // Trigger Collector
    TriggerCollector S7_triggerCollector = new TriggerCollectorImpl(S7_plcDriverManager);
    // Messages counter
    AtomicInteger messagesCounter = new AtomicInteger();
    // Configure the scraper, by binding a Scraper Configuration, a ResultHandler and a TriggerCollector together
    TriggeredScraperImpl S7_scraper = new TriggeredScraperImpl(S7_scraperConfig, (jobName, sourceName, results) -> {
        LinkedList<Object> S7_results = new LinkedList<>();
        messagesCounter.getAndIncrement();
        S7_results.add(jobName);
        S7_results.add(sourceName);
        S7_results.add(results);
        logger.info("Array: " + String.valueOf(S7_results));
        logger.info("MESSAGE number: " + messagesCounter);
        // Producer topics routing
        String topic = "s7" + S7_results.get(1).toString().substring(S7_results.get(1).toString().indexOf("S7_SourcePLC") + 9, S7_results.get(1).toString().length());
        String key = parseKey_S7("s7");
        String value = parseValue_S7(S7_results.getLast().toString(), S7_results.get(1).toString());
        logger.info("------- PARSED VALUE -------------------------------- " + value);
        // Create my own Kafka Producer record
        ProducerRecord<String, String> record = new ProducerRecord<String, String>(topic, key, value);
        // Send Data to Kafka - asynchronous
        producer.send(record, new Callback() {
            public void onCompletion(RecordMetadata recordMetadata, Exception e) {
                // executes every time a record is successfully sent or an exception is thrown
                if (e == null) {
                    // the record was successfully sent
                    logger.info("Received new metadata. \n" +
                            "Topic:" + recordMetadata.topic() + "\n" +
                            "Partition: " + recordMetadata.partition() + "\n" +
                            "Offset: " + recordMetadata.offset() + "\n" +
                            "Timestamp: " + recordMetadata.timestamp());
                } else {
                    logger.error("Error while producing", e);
                }
            }
        });
    }, S7_triggerCollector);
    S7_scraper.start();
    S7_triggerCollector.start();
} catch (ScraperException e) {
    logger.error("Error starting the scraper (S7_scrapper)", e);
}
So in the end it was indeed the PLC that was simply hanging up the connection randomly. However, the NiFi integration should have handled this situation more gracefully. I implemented a fix for this particular error ... could you please give version 0.8.0-SNAPSHOT a try (or use 0.8.0 if we happen to have released it already)?
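In case it is useful, consuming the 0.8.0-SNAPSHOT build typically means bumping the version and adding the Apache snapshots repository to the pom. This is only a sketch (the repository URL shown is the one commonly used for Apache snapshot artifacts; adjust if your setup differs):
<repositories>
    <repository>
        <id>apache-snapshots</id>
        <url>https://repository.apache.org/content/repositories/snapshots/</url>
        <snapshots>
            <enabled>true</enabled>
        </snapshots>
    </repository>
</repositories>

<dependency>
    <groupId>org.apache.plc4x</groupId>
    <artifactId>plc4j-scraper</artifactId>
    <version>0.8.0-SNAPSHOT</version>
</dependency>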

Handling Streaming TarArchiveEntry to S3 Bucket from a .tar.gz file

I am using AWS Lambda to decompress and traverse tar.gz files, then uploading the extracted files back to S3 while retaining the original directory structure.
I am running into an issue streaming a TarArchiveEntry to an S3 bucket via a PutObjectRequest. While the first entry is successfully streamed, on the next getNextTarEntry() call on the TarArchiveInputStream a NullPointerException is thrown because the underlying gzip inflater is null; it had an appropriate value prior to the s3Client.putObject(new PutObjectRequest(...)) call.
I have not been able to find documentation on how / why the gzip input stream's inflater attribute is being set to null after being partially sent to S3.
EDIT: Further investigation has revealed that the AWS call appears to be closing the input stream after completing the upload of the specified content length... I have not been able to find out how to prevent this behavior.
Below is essentially what my code looks like. Thanks in advance for your help, comments, and suggestions.
public String handleRequest(S3Event s3Event, Context context) {
    TarArchiveInputStream tarInput = null;
    try {
        S3Event.S3EventNotificationRecord s3EventRecord = s3Event.getRecords().get(0);
        String bucketName = s3EventRecord.getS3().getBucket().getName();
        // Object key may have spaces or unicode non-ASCII characters.
        String srcKeyInput = s3EventRecord.getS3().getObject().getKey();
        System.out.println("Received valid request from bucket: " + bucketName + " with srckey: " + srcKeyInput);
        String bucketFolder = srcKeyInput.substring(0, srcKeyInput.lastIndexOf('/') + 1);
        System.out.println("File parent directory: " + bucketFolder);
        final AmazonS3 s3Client = AmazonS3ClientBuilder.defaultClient();
        tarInput = new TarArchiveInputStream(new GzipCompressorInputStream(getObjectContent(s3Client, bucketName, srcKeyInput)));
        TarArchiveEntry currentEntry = tarInput.getNextTarEntry();
        while (currentEntry != null) {
            String fileName = currentEntry.getName();
            System.out.println("For path = " + fileName);
            // checking if looking at a file (vs a directory)
            if (currentEntry.isFile()) {
                System.out.println("Copying " + fileName + " to " + bucketFolder + fileName + " in bucket " + bucketName);
                ObjectMetadata metadata = new ObjectMetadata();
                metadata.setContentLength(currentEntry.getSize());
                s3Client.putObject(new PutObjectRequest(bucketName, bucketFolder + fileName, tarInput, metadata)); // contents are properly and successfully sent to s3
                System.out.println("Done!");
            }
            currentEntry = tarInput.getNextTarEntry(); // NPE here because the underlying gz inflater is null
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        IOUtils.closeQuietly(tarInput);
    }
    return "Complete"; // placeholder return value for the Lambda handler
}
That's true, AWS closes an InputStream provided to PutObjectRequest, and I don't know of a way to instruct AWS not to do so.
However, you can wrap the TarArchiveInputStream with a CloseShieldInputStream from Commons IO, like this:
InputStream shieldedInput = new CloseShieldInputStream(tarInput);
s3Client.putObject(new PutObjectRequest(bucketName, bucketFolder + fileName, shieldedInput, metadata));
When AWS closes the provided CloseShieldInputStream, the underlying TarArchiveInputStream will remain open.
PS. I don't know what ByteArrayInputStream(tarInput.getCurrentEntry()) does but it looks very strange. I ignored it for the purpose of this answer.
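Putting the answer together with the loop from the question, the upload step would look roughly like this (a sketch only; CloseShieldInputStream lives in Commons IO's org.apache.commons.io.input package):
import org.apache.commons.io.input.CloseShieldInputStream;
...
if (currentEntry.isFile()) {
    ObjectMetadata metadata = new ObjectMetadata();
    metadata.setContentLength(currentEntry.getSize());
    // Shield the tar stream so the S3 client's internal close() cannot close it
    InputStream shieldedInput = new CloseShieldInputStream(tarInput);
    s3Client.putObject(new PutObjectRequest(bucketName, bucketFolder + fileName, shieldedInput, metadata));
}
// Safe now: tarInput is still open, so the next entry can be read
currentEntry = tarInput.getNextTarEntry();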

Why NullPointerException when decompressing with Apache Commons Compress?

(screenshot: the NullPointerException stack trace)
I generate tar.gz files and send them to others to decompress, but their program hits the error above (their program was not written by me), and only one file triggers that error.
But when using the command 'tar -xzvf ***' on my computer and on their computer, no problem occurs...
So I want to know what is wrong in my program below:
public static void archive(ArrayList<File> files, File destFile) throws Exception {
    TarArchiveOutputStream taos = new TarArchiveOutputStream(new FileOutputStream(destFile));
    taos.setLongFileMode(TarArchiveOutputStream.LONGFILE_POSIX);
    for (File file : files) {
        //LOG.info("file Name: " + file.getName());
        archiveFile(file, taos, "");
    }
    // Write the end-of-archive record and release the file handle
    taos.finish();
    taos.close();
}

private static void archiveFile(File file, TarArchiveOutputStream taos, String dir) throws Exception {
    TarArchiveEntry entry = new TarArchiveEntry(dir + file.getName());
    entry.setSize(file.length());
    taos.putArchiveEntry(entry);
    BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file));
    int count;
    byte[] data = new byte[BUFFER];
    while ((count = bis.read(data, 0, BUFFER)) != -1) {
        taos.write(data, 0, count);
    }
    bis.close();
    taos.closeArchiveEntry();
}
The stack trace looks like a bug in Apache Commons Compress https://issues.apache.org/jira/browse/COMPRESS-223 that has been fixed with version 1.7 (released almost three years ago).
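Following that answer, the fix on the decompressing side would be to upgrade Commons Compress to 1.7 or newer. A sketch of the Maven dependency (1.7 is simply the first release the answer mentions as containing the fix; any later version should also work):
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-compress</artifactId>
    <version>1.7</version>
</dependency>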