PigServer or PigRunner? Which is better? - apache-pig

I have written an embedded Pig program using the PigServer class, but I have come to know that we can also execute queries using the PigRunner class.
Can anyone tell me which one is better? Please explain the reason as well.

PigRunner essentially presents the same interface as the command-line program "pig", with the advantage that it can be called without going through the system shell and that it returns a PigStats object. It is therefore convenient for running complete, user-supplied scripts.
PigServer, however, allows on-the-fly creation and registration of queries, and then programmatic iteration over the results. It therefore provides a much more flexible and complete interface to Pig.
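For a rough feel for the difference, here is a minimal Java sketch of both approaches (the script name, input file, and alias are made up, and details vary by Pig version):

import java.util.Iterator;
import org.apache.pig.ExecType;
import org.apache.pig.PigRunner;
import org.apache.pig.PigServer;
import org.apache.pig.data.Tuple;
import org.apache.pig.tools.pigstats.PigStats;

public class PigEmbedExample {
    public static void main(String[] args) throws Exception {
        // PigRunner: behaves like the "pig" command line, runs a whole script,
        // and hands back a PigStats object ("myscript.pig" is a made-up path).
        PigStats stats = PigRunner.run(new String[] { "-x", "local", "myscript.pig" }, null);
        System.out.println("script succeeded: " + stats.isSuccessful());

        // PigServer: register queries one by one and iterate over the results
        // programmatically ("input.txt" and alias A are likewise made up).
        PigServer pig = new PigServer(ExecType.LOCAL);
        pig.registerQuery("A = LOAD 'input.txt' AS (line:chararray);");
        Iterator<Tuple> it = pig.openIterator("A");
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}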

Related

How to quickly analyse the impact of a program change?

Lately I have needed to do an impact analysis on changing the definition of a DB column in a widely used table (like PRODUCT, USER, etc.). I find it a very time-consuming, boring, and difficult task. I would like to ask if there is any known methodology for doing so.
The question also applies to changes to an application, file system, search engine, etc. At first, I thought this kind of functional relationship should be documented up front or somehow kept track of, but then I realized that everything can change, so it would be impossible to do so.
I don't even know what tags to apply to this question; please help.
Sorry for my poor English.
Sure. One can, at least technically, know what code touches the DB column (reads or writes it) by computing program slices.
Methodology:
Find all SQL code elements in your sources.
Determine which ones touch the column in question. (Careful: a SELECT * may touch your column, so you need to know the schema.)
Determine which variables read or write that column.
Follow those variables wherever they go, and determine the code and variables they affect; follow all of those variables too. (This amounts to computing a forward slice.)
Likewise, find the sources of the variables used to fill the column; follow them back to their code and sources, and follow those variables too. (This amounts to computing a backward slice.)
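As a concrete (and entirely hypothetical) illustration, suppose the column in question is PRODUCT.PRICE and it is read by the Java fragment below; a forward slice from that read would pull in price, total, and every caller of quote():

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class SliceExample {
    // Hypothetical code: the PRODUCT table, PRICE column, and this method are made up.
    static double quote(Connection con, int productId, int qty) throws SQLException {
        try (PreparedStatement ps =
                con.prepareStatement("SELECT PRICE FROM PRODUCT WHERE ID = ?")) {
            ps.setInt(1, productId);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                double price = rs.getDouble("PRICE"); // the column enters the program here
                double total = price * qty;           // forward slice: price affects total...
                return total;                         // ...and every caller of quote()
            }
        }
    }
}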
All the elements of the slice are potentially affecting or affected by a change. There may be conditions in the slice-selected code that are clearly outside the conditions expected by your new use case, and you can eliminate that code from consideration. Everything else in the slices you may have to inspect or modify to make your change.
Now, your change may affect some other code (e.g., a new place that uses the DB column, or code that combines the value from the DB column with some other value). You'll want to inspect upstream and downstream slices of the code you change, too.
You can apply this process for any change you might make to the code base, not just DB columns.
Manually, this is not easy to do in a big code base, and it certainly isn't quick. There is some automation for doing it for C and C++ code, but not much for other languages.
You can get a rough approximation by running test cases that involve your desired variable or action and inspecting the test coverage. (Your approximation gets better if you also run test cases you are sure do NOT cover your desired variable or action, and eliminate all the code they cover.)
Ultimately this task cannot be fully automated or reduced to an algorithm; otherwise there would be a tool to preview refactored changes. The better the code was written in the first place, the easier the task.
Let me explain how to reach the answer: isolation is the key. Mapping everything to object properties can help you automate your review.
I can give you an example. If you can manage to map your specific case to the pattern below, it will save your life.
The OR/M change pattern
Like Hibernate or Entity Framework...
A change to a database column can simply be previewed by analysing what code uses the corresponding object property. Since all DB columns are mapped to object properties, and assuming no code uses raw SQL, you are good to go with your estimations.
This is a very simple pattern for change management.
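A minimal sketch of the pattern with a JPA-style mapping (the entity, table, column, and property names are all made up):

import java.math.BigDecimal;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table(name = "PRODUCT")
public class Product {
    @Id
    @Column(name = "ID")
    private long id;

    // The DB column UNIT_PRICE is only ever touched through this property, so the
    // IDE's "find usages" or a rename refactoring on unitPrice shows the impact
    // of changing the column.
    @Column(name = "UNIT_PRICE")
    private BigDecimal unitPrice;

    public BigDecimal getUnitPrice() { return unitPrice; }
    public void setUnitPrice(BigDecimal unitPrice) { this.unitPrice = unitPrice; }
}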
In order to reduce a file system, network, or data file issue to the above pattern, you need other software patterns in place. I mean, if you can reduce a complex scenario to a change in your objects' properties, you can leverage your IDE to detect the changes for you, including code that needs a slight modification to compile or needs to be rewritten entirely.
If you want to manage a change in a remote service, wrap that service in an interface when you initially write your software; then you will only have to modify its implementation (see the sketch after this list).
If you want to manage a possible change in a data file format (e.g. a field length change in a positional format, or column reordering), write a service that maps that file to an object (e.g. using the BeanIO parser).
If you want to manage a possible change in file system paths, design your application to use more runtime variables.
If you want to manage a possible change in cryptography algorithms, wrap them in services (e.g. HashService, CryptoService, SignService).
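For instance, the remote-service case might look like this (the interface, class, and method names are hypothetical):

import java.math.BigDecimal;

// The rest of the application codes against this interface only.
interface ExchangeRateService {
    BigDecimal rateFor(String fromCurrency, String toCurrency);
}

// If the remote provider or its protocol changes, only this class changes;
// callers of ExchangeRateService are untouched.
class HttpExchangeRateService implements ExchangeRateService {
    @Override
    public BigDecimal rateFor(String fromCurrency, String toCurrency) {
        // ... call the remote endpoint and parse its response ...
        return BigDecimal.ONE; // placeholder so the sketch compiles
    }
}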
If you do the above, your manual requirements review will be easier, because although the overall task is manual, it can be aided with automated tools. You can try changing the name of a class property and see its side effects through the compiler.
Worst case
Obviously, if you need to change the name, type, and length of a specific column in a database, in software with plain SQL hardcoded and scattered in multiple places around the code, where (worse) many tables have similar column names, and with no project documentation (I did write "worst case", right?) for a total of 10,000+ classes, you have no option but to explore your project manually, using find tools but not relying on them.
And if you don't have a test plan, which is the document from which you can hope to originate a software test suite, it will be time to make one.
Just adding my 2 cents. I'm assuming you're working in a production environment, so there must be some form of unit tests, integration tests, and system tests already written.
If yes, then a good way to validate your changes is to run all these tests again and create any new tests which might be necessary.
And to state the obvious, do not integrate your code changes into the main production code base without running these tests.
Also keep in mind that changes which worked fine in a test environment may not work in a production environment.
Have some form of source code management system like Subversion, GitHub, CVS, etc. This enables you to roll back your changes.

Accessing RPG / .PGM command language (CL) syntax within qshell

This question pertains to the i5/OS (mainframe, as400, etc.) and its programs that are written in CL. How does one access the parameters within a given .PGM extension file (RPGLE) outside the application itself and without admin privs?
I have tried all available and related functions within the as400 app itself. None of them work. At least not with my user privs.
Overall, this mainframe app utilizes hundreds of libraries within libraries. Add thousands of files to that and you have a daunting task of accessing relevant data outside of the limited menu functions / programs(.PGM file extensions) for which you are granted access.
Frankly, the programs just suck. I really like using an SSH connection and qshell to execute SQL statements to bring up tables within the libraries. I can customize what tables are accessed and how the information is fetched.
This particular program acts as a basic query that accesses data from several tables, presents it in a new table, and allows for real-time filtering based on a limited set of criteria.
I have tried the CAT command as below:
$ cat someprogram (and several variations with and without extensions, I forget)
But these just give me error codes.
I have also tried executing SQL statements at the db2 prompt, but I just get SQLSTATE return codes (the program isn't a .FILE object, so yes, it makes sense that it wouldn't work the same way as selecting from tables does).
I'd prefer accessing the data within qshell. Perhaps I am doing something wrong? Any help is greatly appreciated. Thank you
Qshell is an alternative operating environment to what I'll call native for lack of a better name. Qshell is similar to AIX but not 100% identical. Qshell does not map the output of native programs to stdout, so you can't access the output of a native program via pipes.
If a native program writes to a display, the only reasonable way to redirect that output is to screen scrape it, which can be done by writing your own tn5250 emulator, intercepting the screen output and putting it where you want it to go, including stdout. This is not particularly simple if you're not comfortable with the 5250 communications protocol.
You have two vaguely practical options. 1) Write many SQL stored procedures which, coupled together, replicate the business logic embedded within the programs you want to execute. 2) Ask the midrange developers to refactor out the business logic into a service program (they should understand this) and write a stored procedure for you to use. This way, both the midrange programs and external consumers (you) can reuse the same business logic. And when (not if) that logic changes, it'll change in one place, and all consumers will see the new rules.
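If option 2 ever happens, an external consumer could then call the stored procedure over JDBC instead of screen scraping. A minimal sketch using the JTOpen (jt400) driver, where the system name, credentials, library, procedure name, and parameter are all hypothetical:

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class CallIbmIProc {
    public static void main(String[] args) throws Exception {
        // JTOpen JDBC driver for IBM i (jt400.jar on the classpath)
        Class.forName("com.ibm.as400.access.AS400JDBCDriver");
        try (Connection con = DriverManager.getConnection(
                "jdbc:as400://MYSYSTEM", "MYUSER", "MYPASS");
             // MYLIB.GET_ORDER_SUMMARY stands in for whatever procedure
             // the midrange developers expose.
             CallableStatement cs = con.prepareCall("CALL MYLIB.GET_ORDER_SUMMARY(?)")) {
            cs.setInt(1, 12345);
            if (cs.execute()) {
                try (ResultSet rs = cs.getResultSet()) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1));
                    }
                }
            }
        }
    }
}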

How to save job logs into datasets using REXX

I just got a task to check hundreds of jobs in SDSF.ST and save these job logs into specified datasets. I think I should automate this task using REXX, but I'm actually not familiar with the language. Having searched Google, I still can't find a workable solution.
Does anyone have any experience with this?
Take a look at the Rexx interface to SDSF. As this is a language with which you are unfamiliar, you will likely find use for the programming reference and user's guide. The Rexx Language Association has some links that may be helpful in getting up to speed, but Rexx was designed to be easy to learn and use so you shouldn't have too much trouble.
You could also use the XDC line command if that's quicker than writing a program.
Another possibility stems from the observation that SDSF is really scanning/summarizing JES2 datasets. So, is the requirement to store sysout datasets created by batch jobs? If so, then it is possible to code the batch job so that its sysout is stored.
If the requirement is to store the job log itself, then get a JES2 manual and read up on how JES manages job logs (it has been a while, so I have forgotten the details). After learning how JES2 manages its job logs, there might be an obvious solution for saving them.
The above might have some advantages. For example, what if a new job whose output needs to be saved is added to the system? Do you update your REXX code or a dataset of job names, or is the JCL for the job itself coded in a way that saves the required sysout/job log?
This makes a difference. If someone has to update a list every time a new production job is added, then this is time-consuming and error-prone. If the JCL has to be coded in a specific way, then it only has to be documented, and it is easy for managers to say "code production jobs in such-and-so fashion".

MiniProfiler Ruby: Getting a better breakdown for non-SQL calls

I'm trying to profile some of our Rails controllers using Mini Profiler, but I think I'm trying to use it for something it isn't built for. I've got it profiling SQL queries just fine, but I need to break down the non-SQL code, because we're seeing a lot of chug in some pages, but the SQL doesn't seem to be the problem.
Here's a screenshot of what I'm talking about: http://cl.ly/image/2J3i1C1c072O
You can see that the top level (Executing action: show) takes 9136ms to complete, but the queries executed are only a fraction of that total time. I suppose what I'm asking is if there's a way to display more "detailed" information about the code being executed, or if I need to find a different tool to use. New Relic isn't an option, unfortunately.
Thanks for any help.
You can insert custom steps in the areas you think are responsible.
# in your initializer
Rack::MiniProfiler.profile_method SomeClass, "method"
# or
Rack::MiniProfiler.step "some step" do
  # your code
end
Additionally, you can run ruby-prof to figure out what is going on everywhere and then strategically instrument.

DRY for JMeter tests

Is there a way to modularize JMeter tests?
I have recorded several use cases for our application. Each of them is in a separate thread group in the same test plan. To control the workflow I wrote some primitives (e.g. postprocessor elements) that are used in many of these thread groups.
Is there a way not to copy these elements into each thread group but to use some kind of referencing within the same test plan? What would also be helpful is a way to reference elements from a different file.
Does anybody have any solutions or workarounds? I guess I am not the only one trying to follow the DRY principle...
I think this post from Atlassian describes what you're after, using Module Controllers. I've not tried it myself yet, but I have it on my list of things to do :)
http://blogs.atlassian.com/developer/2008/10/performance_testing_with_jmete.html
Jared
You can't do this with JMeter; the UI doesn't support it. The Workbench would be a perfect place to store those common elements, but it's not saved in the JMX file.
However, you can parameterize just about anything so you can achieve similar effects. For example, we use the same regex post processor in several thread groups. Even though we can't share the processor, the whole expression is a parameter defined in the test plan, which is shared. We only need to change one place when the regex changes.
They are talking about saving the Workbench in a future version of JMeter. Once that's done, it would be trivial to add some UI to refer to elements in the Workbench.
Module controllers are useful for executing the same samples in different thread groups.
It is possible to use the same assertions in multiple thread groups very easily.
At your Test Plan level, create a set of User Defined Variables with names like "Expected_Result_x". Then, in your Response Assertion, simply reference the variable name ${Expected_Result_x}. You would still need to add the assertion manually to every page you want a particular assertion on, but now you only have to change it in one place if the assertion changes.