Unloading data from Amazon Redshift to Amazon S3 - SQL

I am trying to use the following code to unload data into an S3 bucket. It works, but after unloading it throws an error.
Properties props = new Properties();
props.setProperty("user", MasterUsername);
props.setProperty("password", MasterUserPassword);
conn = DriverManager.getConnection(dbURL, props);
stmt = conn.createStatement();

String sql = "unload ('select * from part where p_partkey in (select p_partkey from part limit 10)') to"
        + " 's3://redshiftdump.****' "
        + " DELIMITER AS ',' "
        + " ADDQUOTES "
        + " NULL AS '' "
        + " credentials 'aws_access_key_id=****;aws_secret_access_key=***' "
        + " parallel off"
        + ";";

boolean i = stmt.execute(sql);
stmt.close();
conn.close();
The unload works and creates a file in the bucket, but then it gives me this error:
java.sql.SQLException: dataengine.impl.DSISimpleRowCountResult cannot be cast to com.amazon.dsi.dataengine.interfaces.IResultSet
    at com.amazon.redshift.core.jdbc42.PGJDBC42Statement.createResultSet(Unknown Source)
    at com.amazon.jdbc.common.SStatement.executeQuery(Unknown Source)
What is this error and how do I avoid it? Also, is there any way to dump the table in CSV format? Right now it dumps the file in FILE format.

You say the UNLOAD works but you receive this error. That suggests to me that you are connecting successfully, but that there is a problem in the way your code interacts with the JDBC driver when the query completes.
We provide an example that may be helpful in our documentation, on the page "Connect to Your Cluster Programmatically".
Regarding the output file format: you will get whatever is specified in your UNLOAD SQL, but the filename will have a suffix (for example "000" or "6411_part_00") to indicate which part of the UNLOAD it is.
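For illustration, here is a minimal sketch (not the documentation example) of issuing UNLOAD without asking the driver for a ResultSet; the connection URL, credentials, and bucket below are placeholders:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Properties;

public class UnloadSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("user", "masterUsername");          // placeholder
        props.setProperty("password", "masterUserPassword");  // placeholder

        // UNLOAD is a command, not a row-returning query, so run it with
        // executeUpdate()/execute() rather than executeQuery().
        String unloadSql =
            "unload ('select * from part where p_partkey in (select p_partkey from part limit 10)') "
          + "to 's3://your-bucket/part_' "                                     // placeholder bucket/prefix
          + "credentials 'aws_access_key_id=...;aws_secret_access_key=...' "   // placeholders
          + "delimiter as ',' addquotes null as '' parallel off;";

        try (Connection conn = DriverManager.getConnection("jdbc:redshift://host:5439/db", props); // placeholder URL
             Statement stmt = conn.createStatement()) {
            stmt.executeUpdate(unloadSql);
        }
    }
}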

Use executeUpdate.
def runQuery(sql: String) = {
  Class.forName("com.amazon.redshift.jdbc.Driver")
  val connection = DriverManager.getConnection(url, username, password)
  var statement: Statement = null
  try {
    statement = connection.createStatement()
    statement.setQueryTimeout(redshiftTimeoutInSeconds)
    val result = statement.executeUpdate(sql)
    logger.info(s"statement response code : ${result}")
  } catch {
    case e: Exception => {
      logger.error(s"statement.isCloseOnCompletion :${e.getMessage} ::: ${e.printStackTrace()}")
      throw new IngestionException(e.getMessage)
    }
  } finally {
    if (statement != null) statement.close()
    connection.close()
  }
}

Related

PLC4X: Exception during scraping of Job

I'm developing a project that reads data from 19 Siemens S1500 PLCs and 1 Modicon. I have used the scraper tool, following this tutorial:
PLC4x scraper tutorial
but after the scraper has been running for a short time I get the following exception:
I have changed the scheduled time between 1 and 100, and I always get the same exception when the scraper reaches the same number of received messages.
I have tested whether using PlcDriverManager instead of PooledPlcDriverManager could be a solution, but the same problem persists.
In my pom.xml I use the following dependency:
<dependency>
<groupId>org.apache.plc4x</groupId>
<artifactId>plc4j-scraper</artifactId>
<version>0.7.0</version>
</dependency>
I have tried changing the version to an older one like 0.6.0 or 0.5.0, but the problem still persists.
If I use the Modicon (Modbus TCP) I also get this exception after a short time.
Does anyone know why this error is happening? Thanks in advance.
Edit: With scraper version 0.8.0-SNAPSHOT I continue having this problem.
Edit2: This is my code. I think the problem may be that my scraper is opening a lot of connections and fails when it reaches 65526 messages. But since all the processing is happening inside the lambda function and I'm using a PooledPlcDriverManager, I think the scraper is using only one connection, so I don't know where the mistake is.
try {
    // Create a new PooledPlcDriverManager
    PlcDriverManager S7_plcDriverManager = new PooledPlcDriverManager();
    // Trigger Collector
    TriggerCollector S7_triggerCollector = new TriggerCollectorImpl(S7_plcDriverManager);
    // Messages counter
    AtomicInteger messagesCounter = new AtomicInteger();
    // Configure the scraper, by binding a Scraper Configuration, a ResultHandler and a TriggerCollector together
    TriggeredScraperImpl S7_scraper = new TriggeredScraperImpl(S7_scraperConfig, (jobName, sourceName, results) -> {
        LinkedList<Object> S7_results = new LinkedList<>();
        messagesCounter.getAndIncrement();
        S7_results.add(jobName);
        S7_results.add(sourceName);
        S7_results.add(results);
        logger.info("Array: " + String.valueOf(S7_results));
        logger.info("MESSAGE number: " + messagesCounter);
        // Producer topics routing
        String topic = "s7" + S7_results.get(1).toString().substring(S7_results.get(1).toString().indexOf("S7_SourcePLC") + 9, S7_results.get(1).toString().length());
        String key = parseKey_S7("s7");
        String value = parseValue_S7(S7_results.getLast().toString(), S7_results.get(1).toString());
        logger.info("------- PARSED VALUE -------------------------------- " + value);
        // Create my own Kafka Producer record
        ProducerRecord<String, String> record = new ProducerRecord<String, String>(topic, key, value);
        // Send Data to Kafka - asynchronous
        producer.send(record, new Callback() {
            public void onCompletion(RecordMetadata recordMetadata, Exception e) {
                // executes every time a record is successfully sent or an exception is thrown
                if (e == null) {
                    // the record was successfully sent
                    logger.info("Received new metadata. \n" +
                            "Topic:" + recordMetadata.topic() + "\n" +
                            "Partition: " + recordMetadata.partition() + "\n" +
                            "Offset: " + recordMetadata.offset() + "\n" +
                            "Timestamp: " + recordMetadata.timestamp());
                } else {
                    logger.error("Error while producing", e);
                }
            }
        });
    }, S7_triggerCollector);

    S7_scraper.start();
    S7_triggerCollector.start();
} catch (ScraperException e) {
    logger.error("Error starting the scraper (S7_scrapper)", e);
}
So in the end it was indeed the PLC that was simply hanging up the connection randomly. However, the NiFi integration should have handled this situation more gracefully. I implemented a fix for this particular error ... could you please give version 0.8.0-SNAPSHOT a try (or use 0.8.0 if we happen to have released it already)?
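For reference, picking up that fix just means bumping the dependency from the question to the version named above (a SNAPSHOT version is normally only resolvable if the Apache snapshots repository is configured in your pom or settings.xml):
<dependency>
    <groupId>org.apache.plc4x</groupId>
    <artifactId>plc4j-scraper</artifactId>
    <version>0.8.0-SNAPSHOT</version>
</dependency>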

MapReduce job on YARN exited with exitCode: -1000 because of resource changed on src filesystem

Application application_1552978163044_0016 failed 5 times due to AM Container for appattempt_1552978163044_0016_000005 exited with exitCode: -1000
Diagnostics:
java.io.IOException: Resource abfs://xxx#xxx.dfs.core.windows.net/hdp/apps/2.6.5.3006-29/mapreduce/mapreduce.tar.gz changed on src filesystem (expected 1552949440000, was 1552978240000
Failing this attempt. Failing the application.
Just based on the exception information, it seems to be caused by Azure Storage not keeping the original timestamp of the copied file. I found a workaround that recommends changing the source code of yarn-common to disable the timestamp check when copying a file, so that the exception is not thrown and the MR job can continue to work.
Here is the source code in the latest version of yarn-common which checks the timestamp of the copied file and throws the exception.
/**
 * Localize files.
 * @param destination destination directory
 * @throws IOException cannot read or write file
 * @throws YarnException subcommand returned an error
 */
private void verifyAndCopy(Path destination)
    throws IOException, YarnException {
  final Path sCopy;
  try {
    sCopy = resource.getResource().toPath();
  } catch (URISyntaxException e) {
    throw new IOException("Invalid resource", e);
  }
  FileSystem sourceFs = sCopy.getFileSystem(conf);
  FileStatus sStat = sourceFs.getFileStatus(sCopy);
  if (sStat.getModificationTime() != resource.getTimestamp()) {
    throw new IOException("Resource " + sCopy +
        " changed on src filesystem (expected " + resource.getTimestamp() +
        ", was " + sStat.getModificationTime());
  }
  if (resource.getVisibility() == LocalResourceVisibility.PUBLIC) {
    if (!isPublic(sourceFs, sCopy, sStat, statCache)) {
      throw new IOException("Resource " + sCopy +
          " is not publicly accessible and as such cannot be part of the" +
          " public cache.");
    }
  }
  downloadAndUnpack(sCopy, destination);
}
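For completeness, the workaround described above amounts to a local patch of this method, roughly like the sketch below. It relaxes a consistency check, so rebuild yarn-common with it only if you accept that risk; LOG stands for the class's existing logger.
if (sStat.getModificationTime() != resource.getTimestamp()) {
  // Patched: log a warning instead of failing localization when the
  // modification time does not match the expected timestamp.
  LOG.warn("Resource " + sCopy +
      " changed on src filesystem (expected " + resource.getTimestamp() +
      ", was " + sStat.getModificationTime() + "); ignoring the mismatch");
}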

Handling Streaming TarArchiveEntry to S3 Bucket from a .tar.gz file

I am using AWS Lambda to decompress and traverse tar.gz files, then uploading the individual files back to S3, retaining the original directory structure.
I am running into an issue streaming a TarArchiveEntry to an S3 bucket via a PutObjectRequest. While the first entry is successfully streamed, upon trying to getNextTarEntry() on the TarArchiveInputStream a null pointer exception is thrown because the underlying GunzipCompress inflater is null; it had an appropriate value prior to the s3.putObject(new PutObjectRequest(...)) call.
I have not been able to find documentation on how / why the gz input stream's inflater attribute is being set to null after the stream is partially sent to S3.
EDIT: Further investigation has revealed that the AWS call appears to be closing the input stream after completing the upload of the specified content length... I have not been able to find how to prevent this behavior.
Below is essentially what my code looks like. Thanks in advance for your help, comments, and suggestions.
public String handleRequest(S3Event s3Event, Context context) {
    TarArchiveInputStream tarInput = null;
    try {
        S3Event.S3EventNotificationRecord s3EventRecord = s3Event.getRecords().get(0);
        String bucketName = s3EventRecord.getS3().getBucket().getName();
        // Object key may have spaces or unicode non-ASCII characters.
        String srcKeyInput = s3EventRecord.getS3().getObject().getKey();
        System.out.println("Received valid request from bucket: " + bucketName + " with srckey: " + srcKeyInput);
        String bucketFolder = srcKeyInput.substring(0, srcKeyInput.lastIndexOf('/') + 1);
        System.out.println("File parent directory: " + bucketFolder);
        final AmazonS3 s3Client = AmazonS3ClientBuilder.defaultClient();
        tarInput = new TarArchiveInputStream(new GzipCompressorInputStream(getObjectContent(s3Client, bucketName, srcKeyInput)));
        TarArchiveEntry currentEntry = tarInput.getNextTarEntry();
        while (currentEntry != null) {
            String fileName = currentEntry.getName();
            System.out.println("For path = " + fileName);
            // checking if looking at a file (vs a directory)
            if (currentEntry.isFile()) {
                System.out.println("Copying " + fileName + " to " + bucketFolder + fileName + " in bucket " + bucketName);
                ObjectMetadata metadata = new ObjectMetadata();
                metadata.setContentLength(currentEntry.getSize());
                s3Client.putObject(new PutObjectRequest(bucketName, bucketFolder + fileName, tarInput, metadata)); // contents are properly and successfully sent to s3
                System.out.println("Done!");
            }
            currentEntry = tarInput.getNextTarEntry(); // NPE here because the underlying gz inflater is null
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        IOUtils.closeQuietly(tarInput);
    }
    return null;
}
That's true, AWS closes an InputStream provided to PutObjectRequest, and I don't know of a way to instruct AWS not to do so.
However, you can wrap the TarArchiveInputStream with a CloseShieldInputStream from Commons IO, like this:
InputStream shieldedInput = new CloseShieldInputStream(tarInput);
s3Client.putObject(new PutObjectRequest(bucketName, bucketFolder + fileName, shieldedInput, metadata));
When AWS closes the provided CloseShieldInputStream, the underlying TarArchiveInputStream will remain open.
PS. I don't know what ByteArrayInputStream(tarInput.getCurrentEntry()) does but it looks very strange. I ignored it for the purpose of this answer.
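Putting the shield into the loop from the question would look roughly like this sketch (CloseShieldInputStream comes from the commons-io artifact; import org.apache.commons.io.input.CloseShieldInputStream):
TarArchiveEntry currentEntry = tarInput.getNextTarEntry();
while (currentEntry != null) {
    if (currentEntry.isFile()) {
        ObjectMetadata metadata = new ObjectMetadata();
        metadata.setContentLength(currentEntry.getSize());
        // The shield absorbs the close() that putObject performs on the stream,
        // so tarInput stays open and usable for the next entry.
        InputStream shieldedInput = new CloseShieldInputStream(tarInput);
        s3Client.putObject(new PutObjectRequest(bucketName, bucketFolder + currentEntry.getName(), shieldedInput, metadata));
    }
    currentEntry = tarInput.getNextTarEntry();
}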

Is it possible to run commands in Katalon?

Katalon is popular in automation testing. I have already used it in our project and it works amazingly.
Now, what I want to achieve is to create a test case that opens a terminal (on a Mac) and types in some commands to run, for example:
cd /documents/pem/key.pem
connect to -my server via SSH#method
sudo su
yum install php7
yum install mysql
You are not alone, and with custom keywords you can achieve what you want. Here is an example showing a test of a command line app. You could do the same thing to call any command line script you wish. Think of a runCmd keyword, or a runCmdWithOutput to grab the output and run various asserts on it.
@Keyword
def pdfMetadata(String input) {
    KeywordUtil.logInfo("input: ${input}")
    def csaHome = System.getenv("CSA_HOME")
    def cmd = "cmd /c ${csaHome}/bin/csa -pdfmetadata -in \"${projectPath}${input}\"";
    runCmd(cmd)
}

def runCmd(String cmd) {
    KeywordUtil.logInfo("cmd: ${cmd}")
    def proc = cmd.execute();
    def outputStream = new StringBuffer();
    def errStream = new StringBuffer()
    proc.waitForProcessOutput(outputStream, errStream);
    println(outputStream.toString());
    println(errStream.toString())
    if (proc.exitValue() != 0) {
        KeywordUtil.markFailed("Out:" + outputStream.toString() + ", Err: " + errStream.toString())
    }
}
You can then use this in a test case:
CustomKeywords.'CSA.pdfMetadata'('/src/pdf/empty.pdf')
Here is another custom keyword! It takes the batch file name and path; if you don't give it a path, it searches for the file in the project root directory. It exports the batch file's output to a batch_reports folder in your project folder, which you need to create in advance.
@Keyword
def runPostmanBatch(String batchName, String batchPath) {
    // source: https://www.mkyong.com/java/how-to-execute-shell-command-from-java/
    String firstParameter = "cmd /c " + batchName;
    String secondParameter = batchPath;
    if (batchPath == "") {
        secondParameter = RunConfiguration.getProjectDir();
    }
    try {
        KeywordUtil.logInfo("Executing " + firstParameter + " at " + secondParameter)
        Process process = Runtime.getRuntime().exec(
            firstParameter, null, new File(secondParameter));
        StringBuilder output = new StringBuilder();
        BufferedReader reader = new BufferedReader(
            new InputStreamReader(process.getInputStream()));
        String line;
        while ((line = reader.readLine()) != null) {
            output.append(line + "\n");
        }
        int exitVal = process.waitFor();
        Date atnow = new Date()
        String now = atnow.format('yy-MM-dd HH-mm-ss')
        String report_path = RunConfiguration.getProjectDir() + "/postman_reports/" + RunConfiguration.getExecutionSourceName() + "_" + now + ".txt"
        BufferedWriter writer = new BufferedWriter(new FileWriter(report_path));
        writer.write(output.toString());
        writer.close();
        KeywordUtil.logInfo("postman report at: " + report_path)
        if (exitVal == 0) {
            println("Success!");
            println(output);
            KeywordUtil.markPassed("Ran successfully")
        } else {
            KeywordUtil.markFailed("Something went wrong")
            println(exitVal);
        }
    } catch (IOException e) {
        e.printStackTrace();
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}
I've done some research and did not find any resources or anyone else looking for the same thing I am. I think the answer is officially no: it is not possible.
It is possible to run Katalon Studio from the command line.
There's a short tutorial here.
And it will be possible to override Profile Variables via command line execution mode from v5.10 (currently in beta).
An example given on the Katalon forum is:
Simply pass the parameters on the command line using -g_XXX=XXX.
Below is an example of overriding a URL variable:
-g_URL=http://demoaut.katalon.com

Detailed Hudson test reports

Is there any way to force Hudson to give me more detailed test results - e.g. I'm comparing two strings and I want to know where they differ.
Is there any way to do this?
Thank you for help.
You should not expect Hudson to give you that detail on its own; it just shows the test messages generated by JUnit.
You can include the expected string and the actual string in the failure when the equality assertion between those two strings fails.
For example,
protected void compareFiles(File newFile, String referenceLocation, boolean lineNumberMatters) {
    BufferedReader reader = null;
    BufferedReader referenceReader = null;
    List<String> expectedLines = new ArrayList<String>();
    try {
        referenceReader = new BufferedReader(new InputStreamReader(FileLocator.openStream(Activator.getDefault().getBundle(), new Path("data/regression/" + referenceLocation), false))); //$NON-NLS-1$
        expectedLines = getLinesFromReader(referenceReader);
    } catch (Exception e) {
        assertFalse("Exception occured during reading reference data: " + e, true); //$NON-NLS-1$
    }
    List<String> foundLines = new ArrayList<String>();
    try {
        reader = new BufferedReader(new FileReader(newFile));
        foundLines = getLinesFromReader(reader);
    } catch (Exception e) {
        assertFalse("Exception occured during reading file: " + e, true); //$NON-NLS-1$
    }
    boolean throwException = expectedLines.size() != foundLines.size();
    if (throwException) {
        StringBuffer buffer = new StringBuffer("\n" + newFile.toString()); //$NON-NLS-1$
        for (String line : foundLines)
            buffer.append(line + "\n"); //$NON-NLS-1$
        assertEquals("The number of lines in the reference(" + referenceLocation + ") and new output(" + newFile.getAbsolutePath() + ") did not match!" + buffer, expectedLines.size(), foundLines.size()); //$NON-NLS-1$ //$NON-NLS-2$ //$NON-NLS-3$
    }
    if (!lineNumberMatters) {
        Collections.sort(expectedLines);
        Collections.sort(foundLines);
    }
    // Either the line matches character by character or it matches regex-wise, in that order
    for (int i = 0; i < expectedLines.size(); i++)
        assertTrue("Found errors in file (" + newFile + ")! " + foundLines.get(i) + " vs. " + expectedLines.get(i), foundLines.get(i).equals(expectedLines.get(i)) || foundLines.get(i).matches(expectedLines.get(i))); //$NON-NLS-1$ //$NON-NLS-2$ //$NON-NLS-3$
}
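On the simpler end, for a single pair of strings a plain assertEquals is often enough: JUnit records the expected and actual values in the failure, and Hudson's JUnit report shows that message for the failed test. A minimal sketch, where the two helpers are hypothetical placeholders:
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class OutputComparisonTest {
    @Test
    public void outputMatchesReference() {
        String expected = loadReferenceText();   // hypothetical helper
        String actual = loadGeneratedText();     // hypothetical helper
        // On failure JUnit reports "expected:<...> but was:<...>", which shows up
        // in the Hudson test result details for this test.
        assertEquals("Generated output differs from reference", expected, actual);
    }

    private String loadReferenceText() { return "expected contents"; }  // placeholder
    private String loadGeneratedText() { return "actual contents"; }    // placeholder
}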
Hudson supports JUnit directly. On your job configuration page, near the end, should be an option to "Publish JUnit test results report".
I'm not too familiar with JUnit itself, but I believe it produces (or can be made to produce) its results in an XML file. You just need to put the path to the XML file (relative to the workspace) in the text box.
Once you do that, and create a build, you'll have a detailed report on your project page. You should then be able to click your way through the results for each test.