Any way to validate pig script before running in hdfs cluster? - apache-pig

I'm new to Pig scripting. The problem I'm facing is that I have no way to validate whether my script is syntactically correct. I have to upload it to the HDFS cluster and run it there just to realize I missed a ';' at the end of a line. Big waste of time. I use IntelliJ IDEA with the Pig script plugin, but while it helps to highlight Pig statements it does not validate them. Apache Pig does not seem to have a standalone compiler; you can only run a script, and I can't run it locally because the data is not available from my laptop. So I wonder if there is any Pig script syntax validator I can run to check that my script is syntactically correct before uploading it to the server.
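One thing that might help, as a minimal sketch (assuming Pig is installed on the laptop and the script is named myscript.pig, a placeholder): the pig launcher has a syntax-check flag that parses a script without executing it and without touching any data.
pig -x local -check myscript.pig
-x local keeps everything off the cluster, and -check only reports parse errors such as a missing ';', so no input data is needed.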

Related

Liquibase - Test changeset before executing

I have a Jenkins pipeline that executes Liquibase scripts. However, the pipeline often fails because there are errors in the script.
I would like to test my script locally before running the pipeline, to detect errors (syntax problems, a column that doesn't exist, etc.) without creating an entry in the databasechangelog table.
One option is to run updateSQL, which will display the SQL that liquibase update WOULD run. You can take that SQL and run it in any SQL query IDE of your choice to test the syntax.
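A minimal sketch of that approach, assuming the connection settings live in liquibase.properties and the changelog is called changelog.xml (both placeholder names):
liquibase --changeLogFile=changelog.xml updateSQL > preview.sql
Nothing is applied to the database and no entry is written to DATABASECHANGELOG; preview.sql just contains the SQL that an update would run, ready to be checked in a SQL IDE.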

Execute a shell command outside of a sandbox while in a sandbox

I'm using Singularity to run Python in an environment deprived of Python. I'm also running a MySQL instance as explained by Iowa State University (run an instance of mysql, and close it when done).
For clarity, I'm using a bash script to start mysql, then do what I have to do (a Python script), and close mysql, and it works fine. But Python's only way to stop when an error occurs is sys.exit([value]), and this not only stops the Python script but also the bash script that ran it. This makes it impossible for me to handle the errors and close the mysql instance if the Python script exits.
My question is: is there a way for me to execute 'singularity instance stop mysql' while inside the Python sandbox? Something to tell Singularity "hey, this command must be run on the host!"?
I keep searching but can't find anything.
I only tried to execute it with subprocess like any other command, but it returned an error message because that instance does not exist inside the Python sandbox. I don't even have singularity in the sandbox.
For any clarifications, just ask me, I'm trying to be clear but I'm pretty sure it's not very clear.
Thanks a lot!
Generally speaking, it would be a big security issue if a process could be initiated from inside a container (docker or singularity) but run in the host OS's namespace.
If the bash script is exiting on the python failure, it sounds like you're using set -e or #!/bin/bash -e. This causes the script to abort if any command returns non-zero. It's commonly recommended for safer processing, but can cause problems like this at times. To bypass that for the python step you can modify your script:
# start mysql, do some stuff
set +e # disable abort on non-zero return
python my_script.py
set -e # re-enable abort on non-zero
# shut down mysql, do other stuff
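If you also want the wrapper to report the Python failure after cleanup, a hedged variant of the same idea (the script name is just an example) is to remember the exit code and re-raise it at the end:
# start mysql, do some stuff
set +e                # stop aborting on non-zero returns
python my_script.py
rc=$?                 # remember how the python step ended
set -e                # re-enable abort on non-zero
# shut down mysql, do other cleanup
exit "$rc"            # propagate python's exit status to the caller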

Gitlab-CI: AWS S3 deploy is failing

I am trying to create a deployment pipeline for Gitlab-CI on a React project. The build works fine, and I use artifacts to store the dist folder from my yarn build command; that part works as well.
The issue is with my deployment command: aws s3 sync dist/'bucket-name'.
Expected: "Done in x seconds"
Actual:
error Command failed with exit code 2. info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command. Running after_script 00:01 Uploading artifacts for failed job 00:01 ERROR: Job failed: exit code 1
The files seem to have been uploaded correctly to the S3 bucket; however, I do not know why the deployment job reports an error.
When I run aws s3 sync dist/'bucket-name' locally, everything works correctly.
Check out AWS CLI Return Codes
2 -- The meaning of this return code depends on the command being run.
The primary meaning is that the command entered on the command line failed to be parsed. Parsing failures can be caused by, but are not limited to, missing any required subcommands or arguments or using any unknown commands or arguments. Note that this return code meaning is applicable to all CLI commands.
The other meaning is only applicable to s3 commands. It can mean at least one or more files marked for transfer were skipped during the transfer process. However, all other files marked for transfer were successfully transferred. Files that are skipped during the transfer process include: files that do not exist, files that are character special devices, block special device, FIFO's, or sockets, and files that the user cannot read from.
The second paragraph might explain what's happening.
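If skipped files are acceptable for this project, a hedged sketch of a deploy step that tolerates exit code 2 but still fails on anything else (the bucket name and paths are placeholders) could be:
rc=0
aws s3 sync dist/ s3://my-bucket || rc=$?    # don't let the job die before we inspect the code
if [ "$rc" -ne 0 ] && [ "$rc" -ne 2 ]; then
  exit "$rc"                                 # real failure: fail the job
fi
echo "sync finished with exit code $rc (2 means some files were skipped)"
That said, fixing the filenames that cause the skips, as described below, is the cleaner solution.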
There is no built-in yarn build command; yarn build just runs the build script from your package.json. See https://classic.yarnpkg.com/en/docs/cli/run
As Anton mentioned, the second paragraph of his answer was the problem. The solution was removing special characters from a couple of SVGs. I suspect uploading the dist folder as an artifact (zip) might have changed some of the file names altogether, which confused S3. Removing ® and + from the filenames resolved the issue.

Scheduling a pentaho job in SQL server agent

I have built out a simple FTP job in Pentaho that places a file in a local directory. I need to be able to call this job from a SQL Server Agent job, which I can then schedule and use. But when I set the agent job up, it runs through the steps successfully yet does not produce anything to show that it was in fact successful.
I am pretty confident the Pentaho job itself is fine because it can be run through the UI, command line, and .bat file. Everything works as expected except when I try to make this SQL Server Agent job and I have no idea why!
Here is the only step in the job. When I use this, I'm prompted with no errors but nothing actually happens. If I try to enclose it in quotes I get an error.
Any help would be appreciated
Figured it out!
Apparently, only the first line of the command was executing, so it was navigating to a different directory but not executing any commands. I remedied this by putting everything on one line, joining the two commands with &&.
Command line used: cd c:\pentaho\data-integration && kitchen.bat /file:C:\pentaho\Jobs\BW\FTP_BW_TRN.kjb /level:Basic

Hadoop put command doing nothing!

I am running Cloudera's distribution of Hadoop and everything is working perfectly. The HDFS contains a large number of .seq files. I need to merge the contents of all the .seq files into one large .seq file. However, the getmerge command did nothing for me. I then used cat and piped the data of some .seq files into a local file. When I want to "put" this file into HDFS, it does nothing. No error message shows up, and no file is created.
I am able to "touchz" files in HDFS, and user permissions are not a problem here. The put command simply does not work. What am I doing wrong?
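For reference, the steps described above amount to roughly the following (paths are examples only):
hadoop dfs -getmerge /user/me/seqs ./merged.seq     # did nothing
hadoop dfs -put ./merged.seq /user/me/merged.seq    # no error, but no file appears
hadoop dfs -ls /user/me                             # merged.seq is not listed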
Write a job that merges all the sequence files into a single one. It's just the standard mapper and reducer with only one reduce task.
if the "hadoop" commands fails silently you should have a look at it.
Just type: 'which hadoop', this will give you the location of the "hadoop" executable. It is a shell script, just edit it and add logging to see what's going on.
If the hadoop bash script fails at the beginning it is no surprise that the hadoop dfs -put command does not work.
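A quick way to get that visibility without editing the wrapper is to run it through bash with tracing turned on, for example (paths are placeholders):
bash -x "$(which hadoop)" dfs -put ./merged.seq /user/me/merged.seq
Every command the hadoop script executes is then echoed to the terminal, which usually shows where it bails out.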