Executing a Pig script file with multiple stores in an embedded Pig program - apache-pig

I want to execute a Pig script file from an embedded Pig program. The script is shown below:
----testPig.pig-----
A = load '/user/biadmin/student' using PigStorage() as (name:chararray);
B = foreach A generate name;
store B into '/user/biadmin/myoutput001';
For this I have written the code shown below:
> PigServer pigServer = new PigServer(ExecType.MAPREDUCE);
> pigServer.registerScript("testPig.pig");
but it is not working. I have checked the script in grunt shell mode, where it works fine.
So I made changes like this:
---testPig.pig -----
A = load '/user/biadmin/student' using PigStorage() as (name:chararray);
B = foreach A generate name;
--store B into '/user/biadmin/myoutput001';
The embedded Pig code for this is:
> PigServer pigServer = new PigServer(ExecType.MAPREDUCE,prt);
> pigServer.registerScript(path);
> pigServer.store("B","/user/biadmin/myoutput20");
Now the modified code works fine. So my questions are:
Why was I not able to execute a Pig script that contains a store command?
How can I execute a Pig script file that contains a store command?

Your PigServer code is not working because when you call .registerScript(), PigServer sets the interactive mode flag on GruntParser to false by default. From the PigServer source code:
public void registerScript(InputStream in, Map<String,String> params, List<String> paramsFiles) throws IOException {
    try {
        String substituted = doParamSubstitution(in, params, paramsFiles);
        GruntParser grunt = new GruntParser(new StringReader(substituted));
        /********************************************/
        grunt.setInteractive(false);
        /********************************************/
        grunt.setParams(this);
        grunt.parseStopOnError(true);
    } catch (org.apache.pig.tools.pigscript.parser.ParseException e) {
        log.error(e.getLocalizedMessage());
        throw new IOException(e.getCause());
    }
}
Quoting from the GruntParser source code:
In interactive mode, executes the plan right away whenever a STORE command is encountered.
This means that when interactive mode is not active, STORE commands are ignored (that is, they won't run automatically) until a subsequent PigServer.openIterator or PigServer.store call (that is, until you explicitly make a call that requires the STORE line).
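One way to make the STORE in the script actually run, then, is to put PigServer into batch mode before registering the script and execute the batch explicitly. A minimal sketch, assuming the testPig.pig path from the question:

import java.util.List;
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;
import org.apache.pig.backend.executionengine.ExecJob;

public class RunStoreScript {
    public static void main(String[] args) throws Exception {
        PigServer pigServer = new PigServer(ExecType.MAPREDUCE);
        pigServer.setBatchOn();                   // queue STORE statements instead of dropping them
        pigServer.registerScript("testPig.pig");  // the script keeps its own store line
        List<ExecJob> jobs = pigServer.executeBatch(); // runs every queued STORE
        for (ExecJob job : jobs) {
            System.out.println("Job status: " + job.getStatus());
        }
    }
}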
As for your second question, you might want to have a look at the PigRunner class.
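A minimal sketch of that approach (PigRunner takes the same arguments as the pig command line, so the script's STORE statements run as they would in grunt; the null second argument means no progress listener):

import org.apache.pig.PigRunner;
import org.apache.pig.tools.pigstats.PigStats;

public class RunWithPigRunner {
    public static void main(String[] args) {
        // Same arguments you would pass to the pig executable.
        String[] pigArgs = { "-x", "mapreduce", "testPig.pig" };
        PigStats stats = PigRunner.run(pigArgs, null);
        System.out.println("Success: " + stats.isSuccessful());
    }
}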

Related

LINQPad Dump using Util.Run

I have a LINQ script that runs in a loop for a long time, over many records, and dumps the records it processed at the end of each iteration. So I see a dump of records as the script is running.
foreach (string record in lotsOfRecords) {
    // do stuff ...
    record.Dump();
}
When I run this from another script using Util.Compile and Util.Run, I don't see the output in the results window while the application is running. But I can tell the script is running by seeing the database changes it causes.
using (var query = Util.Compile(myQueryPath))
{
    query.Run(QueryResultFormat.HtmlFragment).AsString().Dump(); // no results
}
I've tried Text, HTML, and HTML fragment, and I don't see any output while the script is running. I'm running LINQPad 5.43 (Pro).

Plugin that runs tests based on file of user

I am developing a plugin for IntelliJ for teaching purposes: students write some code, the teacher writes tests, and the students can run those tests to see whether they are doing everything correctly. It would be great if I could get the file the user is writing in as a Java class, so that I can run the functions of that class from within another function and test it as if I had written it myself.
What I have as of now:
In the Main Toolbar I have a button where the students should be able to run the tests. I have a class that extends AnAction, but I have no idea what I should write in it:
@Override
public void actionPerformed(AnActionEvent e) {
}
I have been going through the IntelliJ documentation for some time now, but I am not getting any further. I hope the experienced developers here can maybe give me a hint or two.
Thanks a lot in advance :)
If I understand correctly, the students would be programming within a project in IntelliJ?
Then you can get the path to the project that they are working on using the AnActionEvent event.
Project project = event.getProject();
String projectBasePath = project.getBasePath();
You could use this to send the entire src folder to your machine and do whatever you need to do there.
But it also sounds like you would want the students to run the test functions on their side via the plugin. In that case, one option that I know of is to again use project.getBasePath(), or have them select a file using a GUI, and then use ProcessBuilder to compile, run, and test their Java classes, as shown below. You can run any Windows/shell command this way and pipe the output into the IDE or your own tool window.
// Assumes the usual imports: java.io.BufferedReader, java.io.File,
// java.io.IOException, java.io.InputStreamReader
@Override
public void actionPerformed(AnActionEvent event) {
    Project project = event.getProject();
    String projectBasePath = project.getBasePath();
    ProcessBuilder pb = new ProcessBuilder();
    pb.directory(new File(projectBasePath)); // directory() takes a File, not a String
    // "/c" runs the command and exits; "/k" would keep cmd open,
    // so waitFor() would never return
    pb.command("cmd", "/c", "javac src\\*.java");
    pb.redirectErrorStream(true);
    try {
        Process process = pb.start();
        BufferedReader reader = new BufferedReader(new InputStreamReader(process.getInputStream()));
        String line;
        while ((line = reader.readLine()) != null) {
            System.out.println(line);
        }
        int exitCode = process.waitFor();
        System.out.println("\nExited with error code : " + exitCode);
    } catch (IOException | InterruptedException e) {
        e.printStackTrace();
    }
    // ... anything else you need to do
}
Let me know if this makes sense - maybe I can help you out more if you give me more specific questions.

Query Jenkins for last few builds

How can I query Jenkins via its API to find the last few builds it executed? I don't know the names of the build jobs. I just want Jenkins to return the last n builds it executed, or the builds executed between two timestamps.
In order to query build results via the API, you have to know the job name in Jenkins. You append the suffix /api/json to your Jenkins job URL to get the JSON data as a string.
For example: if your Jenkins server has a job named A_SLAVE_JOB, then you have to do an HTTP GET in your Java REST client at this endpoint: http://<YourJenkinsURL>:<PortNumber>/job/A_SLAVE_JOB/api/json
This returns a string with all the build history URLs (with numbers), the last successful build, and the last failed build status.
You can traverse the builds of a given job using a for loop. All you need is a JSON parser to extract values from the keys in the JSON string; you can use the org.json library in Java to do the parsing. A pseudocode sample goes like this:
import org.json.*;

class MyJenkinsJobParser {
    public static void main(String... args) {
        JSONObject obj = new JSONObject("YOUR_API_RESPONSE_STRING");
        String status = obj.getJSONObject("build").getString("status");
        JSONArray arr = obj.getJSONArray("builds");
        for (int i = 0; i < arr.length(); i++) {
            String url = arr.getJSONObject(i).getString("url");
            // just pseudocode......
        }
    }
}
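For completeness, here is a minimal sketch of the HTTP GET itself using plain java.net; the host, port, and job name are placeholders, and the tree parameter trims the response to the build list:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class JenkinsApiClient {
    public static void main(String[] args) throws Exception {
        // tree=builds[number,url,result] limits the response to the fields parsed above
        URL url = new URL("http://jenkins.example.com:8080/job/A_SLAVE_JOB/api/json"
                + "?tree=builds[number,url,result]");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        StringBuilder json = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                json.append(line);
            }
        }
        System.out.println(json); // feed this string to the parser above
    }
}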
Just to summarize the way you have to request the info: in the URL below, {0,5} is the range selecting the latest 5 builds.
curl -g "${SERVER}/job/${JOB}/api/json?pretty=true&tree=builds[number,url,result]{0,5}" \
--user $USER:$TOKEN

How to return Mongodb js return value to shell script

I'm getting started with MongoDB scripting. My requirement is to query MongoDB for a process status and, based on it, kick-start another process from a shell script. I've written the following JS to query and return the value from MongoDB:
var statusValue=db.Collections.find({"Name":"UV"},{Status:1,_id:0}).sort({Sequence:-1}).limit(1).map( function(u) { return u.Status; } );
print (statusValue);
I call this JS from a shell script. Is there a way to return the value of 'statusValue' to the calling shell?
Use "--eval" option to get return value. For example:
return=`mongo localhost/test --quiet --eval 'db.version()'`
Replace "db.version()" with your own expression.
Refer to the official documentation.

How do I dynamically trigger downstream builds in jenkins?

We want to dynamically trigger integration tests in different downstream builds in jenkins. We have a parametrized integration test project that takes a test name as a parameter. We dynamically determine our test names from the git repo.
We have a parent project that uses jenkins-cli to start a build of the integration project for each test found in the source code. The parent project and integration project are related via matching fingerprints.
The problem with this approach is that aggregating test results doesn't work. I think the problem is that the "downstream" integration tests are started via jenkins-cli, so Jenkins doesn't realize they are downstream.
I've looked at many jenkins plugins to try to get this working. The Join and Parameterized Trigger plugins don't help because they expect a static list of projects to build. The parameter factories available for Parameterized Trigger won't work either because there's no factory to create an arbitrary list of parameters. The Log Trigger plugin won't work.
The Groovy Postbuild Plugin looks like it should work, but I couldn't figure out how to trigger a build from it.
def job = hudson.model.Hudson.instance.getJob("job")
def params = new StringParameterValue('PARAMTEST', "somestring")
def paramsAction = new ParametersAction(params)
def cause = new hudson.model.Cause.UpstreamCause(currentBuild)
def causeAction = new hudson.model.CauseAction(cause)
hudson.model.Hudson.instance.queue.schedule(job, 0, causeAction, paramsAction)
This is what finally worked for me.
NOTE: The Pipeline Plugin should render this question moot, but I haven't had a chance to update our infrastructure.
To start a downstream job without parameters:
job = manager.hudson.getItem(name)
cause = new hudson.model.Cause.UpstreamCause(manager.build)
causeAction = new hudson.model.CauseAction(cause)
manager.hudson.queue.schedule(job, 0, causeAction)
To start a downstream job with parameters, you have to add a ParametersAction. Suppose Job1 has parameters A and C which default to "B" and "D" respectively. I.e.:
A == "B"
C == "D"
Suppose Job2 has the same A and C parameters, but also takes parameter E, which defaults to "F". The following post-build script in Job1 will copy its A and C parameters and set parameter E to the concatenation of A's and C's values:
params = []
val = ''
manager.build.properties.actions.each {
    if (it instanceof hudson.model.ParametersAction) {
        it.parameters.each {
            value = it.createVariableResolver(manager.build).resolve(it.name)
            params += it
            val += value
        }
    }
}
params += new hudson.model.StringParameterValue('E', val)
paramsAction = new hudson.model.ParametersAction(params)
jobName = 'Job2'
job = manager.hudson.getItem(jobName)
cause = new hudson.model.Cause.UpstreamCause(manager.build)
causeAction = new hudson.model.CauseAction(cause)
def waitingItem = manager.hudson.queue.schedule(job, 0, causeAction, paramsAction)
def childFuture = waitingItem.getFuture()
def childBuild = childFuture.get()
hudson.plugins.parameterizedtrigger.BuildInfoExporterAction.addBuildInfoExporterAction(
    manager.build, jobName, childBuild.number, childBuild.result
)
You have to add $JENKINS_HOME/plugins/parameterized-trigger/WEB-INF/classes to the Groovy Postbuild plugin's Additional groovy classpath.
Execute this Groovy script:
import hudson.model.*
import jenkins.model.*

def build = Thread.currentThread().executable
def jobPattern = "PUTHEREYOURJOBNAME"
def matchedJobs = Jenkins.instance.items.findAll { job ->
    job.name =~ /$jobPattern/
}
matchedJobs.each { job ->
    println "Scheduling job name is: ${job.name}"
    job.scheduleBuild(1, new Cause.UpstreamCause(build), new ParametersAction([
        new StringParameterValue("PROPERTY1", "PROPERTY1VALUE"),
        new StringParameterValue("PROPERTY2", "PROPERTY2VALUE")
    ]))
}
If you don't need to pass properties from one build to the other, just take the ParametersAction out.
The build you schedule will have the same "cause" as your initial build. That's a nice way to pass along the "Changes". If you don't need this, just do not use new Cause.UpstreamCause(build) in the function call.
Since you are already starting the downstream jobs dynamically, how about you wait until they are done and copy the test result files (I would archive them in the downstream jobs and then just download the build artifacts) to the parent workspace? You might need to aggregate the files manually, depending on whether the test plugin can work with several test result pages. In the post-build step of the parent job, configure the appropriate test plugin.
Using the Groovy Postbuild Plugin, maybe something like this will work (I haven't tried it):
def job = hudson.getItem(jobname)
hudson.queue.schedule(job)
I am actually surprised that the aggregated results are not picked up if you fingerprint both jobs (e.g. with the BUILD_TAG variable of the parent job). In my understanding, Jenkins simply looks at md5sums to relate jobs (Aggregate downstream test results), and triggering via the CLI should not affect aggregating results. Somehow there is something additional going on to maintain the upstream/downstream relation that I am not aware of...
This worked for me using "Execute system groovy script":
import hudson.model.*
def currentBuild = Thread.currentThread().executable
def job = hudson.model.Hudson.instance.getJob("jobname")
def params = new StringParameterValue('paramname', "somestring")
def paramsAction = new ParametersAction(params)
def cause = new hudson.model.Cause.UpstreamCause(currentBuild)
def causeAction = new hudson.model.CauseAction(cause)
hudson.model.Hudson.instance.queue.schedule(job, 0, causeAction, paramsAction)