Need help on Pig - apache-pig

I am executing a Pig script that reads files from a directory, performs some operations and stores the result in an output directory. In the output directory I get one or more "part" files, one _SUCCESS file and one _logs directory. My questions are:
1. Is there any way to control the names of the files generated in the output directory when the STORE command executes? To be specific, I don't want the names to be "part-.......". I want Pig to generate files according to a file name pattern I specify.
2. Is there any way to suppress the _SUCCESS file and the _logs directory? Basically, I don't want _SUCCESS and _logs to be generated in the output directory.
Regards
Biswajit

See this post.
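To remove _SUCCESS, use SET mapreduce.fileoutputcommitter.marksuccessfuljobs false;. I'm not 100% sure how to remove _logs, but you could try SET pig.streaming.log.persist false;. On Hadoop 1.x, _logs holds the job history files, which can be switched off with SET hadoop.job.history.user.location none;.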
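Neither setting renames the part files. If post-processing the output is acceptable, here is a minimal Python sketch, assuming the output directory has already been copied to the local filesystem (for example with hadoop fs -get); it does not talk to HDFS. The function name tidy_output and the naming pattern report_{index:03d}.txt are hypothetical.

```python
import shutil
from pathlib import Path


def tidy_output(out_dir: str, pattern: str = "report_{index:03d}.txt") -> list[str]:
    """Rename part-* files with a caller-chosen pattern and drop _SUCCESS and _logs.

    Assumes out_dir is a LOCAL copy of the Pig output directory (not HDFS).
    The default pattern is a hypothetical example.
    """
    root = Path(out_dir)
    (root / "_SUCCESS").unlink(missing_ok=True)         # marker file written by the output committer
    shutil.rmtree(root / "_logs", ignore_errors=True)  # job history directory, if present
    parts = sorted(p for p in root.iterdir() if p.is_file() and p.name.startswith("part-"))
    new_names = []
    for index, part in enumerate(parts):
        target = root / pattern.format(index=index)
        part.rename(target)
        new_names.append(target.name)
    return new_names
```

Sorting the part files first keeps the new numbering in the same order as the original part-…-00000, part-…-00001 sequence.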

Related

How to access two different routines in two files in Trace32 CMM scripts

I have two files in two different folder locations in Trace32. I execute cd.do file_name subroutine_name in Trace32. TRACE32 takes the location of the first script executed as the folder from which all following commands are executed. How can I execute routines from two different folders?
There is a pretty good guide here on how to script in Trace32.
http://www2.lauterbach.com/pdf/practice_user.pdf
I do not understand why you need to have them in two different folders; shouldn't it be solved by just having them in the same folder?
Well, maybe you should simply use DO <myscript.cmm> instead of CD.DO <myscript.cmm>.
DO <myscript.cmm> executes the script at the given location but keeps the current working path.
CD.DO <myscript.cmm> changes the working path to the location of the given script and then executes the script.
However, I would recommend writing your scripts so that it doesn't matter whether they are called with CD.DO or just DO. You can achieve that with either absolute paths or paths relative to the script locations. (I prefer the second.)
So imagine the following file structure:
C:\t32\myscripts\start.cmm
C:\t32\myscripts\folder1\routines.cmm
C:\t32\myscripts\folder2\loadapp.cmm
C:\t32\myscripts\folder2\application.elf
You can handle this structure with absolute paths like this:
start.cmm:
DO "C:/t32/myscripts/folder1/routines.cmm" subroutine_A
DO "C:/t32/myscripts/folder2/loadapp.cmm"
folder2/loadapp.cmm:
Data.LOAD.Elf "C:/t32/myscripts/folder2/application.elf"
DO "C:/t32/myscripts/folder1/routines.cmm" subroutine_B
With relative paths, you can use the prefix "~~~~" to access other files relative to the location of the currently executed PRACTICE script. "~~~~" is replaced with the directory of the currently executed script (just like "~" stands for your home directory). There is also a function OS.PPD() that returns the directory of the currently executed PRACTICE script.
So the situation above looks like this with relative paths:
start.cmm:
DO "~~~~/folder1/routines.cmm" subroutine_A
DO "~~~~/folder2/loadapp.cmm"
folder2/loadapp.cmm:
Data.LOAD.Elf "~~~~/application.elf"
DO "~~~~/../folder1/routines.cmm" subroutine_B
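To make the "~~~~" rule concrete, here is a small Python model (not PRACTICE code) of how such a path resolves: the prefix is swapped for the directory of the script that is currently running, and ".." is then collapsed. The function expand_ppd is hypothetical and only mirrors the behaviour described above; TRACE32 performs the real resolution itself.

```python
import ntpath

PPD_PREFIX = "~~~~"


def expand_ppd(arg: str, current_script: str) -> str:
    """Replace a leading '~~~~' with the running script's directory, then normalize."""
    if arg.startswith(PPD_PREFIX):
        arg = ntpath.dirname(current_script) + arg[len(PPD_PREFIX):]
    return ntpath.normpath(arg)


loadapp = "C:/t32/myscripts/folder2/loadapp.cmm"
print(expand_ppd("~~~~/application.elf", loadapp))
# C:\t32\myscripts\folder2\application.elf
print(expand_ppd("~~~~/../folder1/routines.cmm", loadapp))
# C:\t32\myscripts\folder1\routines.cmm
```

Because the result depends only on the location of the calling script, it is the same whether that script was started with DO or CD.DO.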

BigQuery loading batch folders error

I'm trying to load the files of a group of folders in one go. When I set
sourceURI = 'gs://ybbi/bi_landing_zone/files_to_load/app/reports/app_network_analytics_report/201409011*'
All the folders that I want to load start with 20140911, but I get the error:
ERROR: Invalid path: gs://ybbi/bi_landing_zone/files_to_load/apn/reports/appnexus_network_analytics_report/20140901191111_3bab8ec0_092a_43de_a157_db35d1555ea0/
20140901191111_3bab8ec0_092a_43de_a157_db35d1555ea0 is one of these folders (I don't know why it prints the full name of this specific folder).
For some other folder trees it works, but for this specific folder tree it returns the same error.
I know that Cloud Storage doesn't have real folders and that the "folder" is part of the object name, but you understand what I mean.
Is it a bug?
Without more information, it looks like you have an object called gs://ybbi/bi_landing_zone/files_to_load/apn/reports/appnexus_network_analytics_report/20140901191111_3bab8ec0_092a_43de_a157_db35d1555ea0/ that is not a CSV/JSON file. Some tools create these dummy objects in order to simulate directories. BigQuery requires every object that matches the input glob path to be an importable file.
One solution would be to change the glob path to include a narrower set of files. You can pass multiple paths if that makes things easier. For example, you could pass
gs://ybbi/bi_landing_zone/files_to_load/apn/reports/appnexus_network_analytics_report/20140901191111_3bab8ec0_092a_43de_a157_db35d1555ea0/*
and
gs://ybbi/bi_landing_zone/files_to_load/apn/reports/appnexus_network_analytics_report/20140901191111_some_other_path/*
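A rough local model of why the load fails: treating "*" as matching any sequence of characters (including "/"), the glob picks up the zero-byte placeholder object ending in "/" along with the real files. The Python sketch below is an assumption-based illustration, not BigQuery's own matcher, and the bucket and object names are hypothetical. It also derives one narrower "folder/*" URI per folder that contains real files, as suggested above.

```python
from fnmatch import fnmatchcase


def matching_objects(uri_glob: str, object_names: list[str]) -> list[str]:
    """Return the object names a wildcard URI would select ('*' also matches '/')."""
    return [name for name in object_names if fnmatchcase(name, uri_glob)]


def narrower_uris(object_names: list[str]) -> list[str]:
    """Build one 'folder/*' URI per folder holding real files, skipping '/' placeholders."""
    folders = {name.rsplit("/", 1)[0] for name in object_names if not name.endswith("/")}
    return sorted(folder + "/*" for folder in folders)


# Hypothetical bucket listing; the first entry is a zero-byte "folder" placeholder.
objects = [
    "gs://example-bucket/report/20140901191111_aaa/",
    "gs://example-bucket/report/20140901191111_aaa/data_000.csv",
    "gs://example-bucket/report/20140901191222_bbb/data_000.csv",
]

matched = matching_objects("gs://example-bucket/report/201409011*", objects)
print(len(matched))  # 3: the placeholder is matched too
print(narrower_uris(matched))
```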

Load a script from a file with a different extension?

Is it possible to load a module from a file with an extension other than .lua?
require("grid.txt") results in:
module 'grid.txt' not found:
no field package.preload['grid.txt']
no file './grid/txt.lua'
no file '/usr/local/share/lua/5.1/grid/txt.lua'
no file '/usr/local/share/lua/5.1/grid/txt/init.lua'
no file '/usr/local/lib/lua/5.1/grid/txt.lua'
no file '/usr/local/lib/lua/5.1/grid/txt/init.lua'
no file './grid/txt.so'
no file '/usr/local/lib/lua/5.1/grid/txt.so'
no file '/usr/local/lib/lua/5.1/loadall.so'
no file './grid.so'
no file '/usr/local/lib/lua/5.1/grid.so'
no file '/usr/local/lib/lua/5.1/loadall.so'
I suspect that it's somehow possible to load the script into package.preload['grid.txt'] (whatever that is) before calling require?
It depends on what you mean by load.
If you want to execute the code in a file named grid.txt in the current directory, then just do dofile"grid.txt". If grid.txt is in a different directory, give a path to it.
If you want to use the path search that require performs, then add a template for .txt in package.path, with the correct path and then do require"grid". Note the absence of suffix: require loads modules identified by names, not by paths.
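The error message follows from how the path search works: require replaces each "." in the module name with a directory separator and then substitutes the result for every "?" in each package.path template. The Python sketch below models that substitution (it is not Lua's implementation, and the shortened LUA_PATH is an assumption taken from the error output). It shows why "grid.txt" is searched as grid/txt.lua, and why adding a ./?.txt template lets require"grid" find ./grid.txt.

```python
def search_candidates(name: str, path: str) -> list[str]:
    """Model of the Lua path search: '.' in the name becomes '/', then fills each '?'."""
    converted = name.replace(".", "/")
    return [template.replace("?", converted) for template in path.split(";") if template]


LUA_PATH = "./?.lua;/usr/local/share/lua/5.1/?.lua;/usr/local/share/lua/5.1/?/init.lua"

for candidate in search_candidates("grid.txt", LUA_PATH):
    print(candidate)
# ./grid/txt.lua
# /usr/local/share/lua/5.1/grid/txt.lua
# /usr/local/share/lua/5.1/grid/txt/init.lua

print(search_candidates("grid", "./?.txt;" + LUA_PATH)[0])  # ./grid.txt
```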
If you want require("grid.txt") to work, should someone try that, then yes: you'll need to manually loadfile and run the script, and put whatever it returns (or whatever require is documented to return when the module doesn't return anything) into package.loaded["grid.txt"].
Alternatively, you could write your own loader just for entries like this, set it into package.preload["grid.txt"], and have it find and load/run the file. More generically, you could write a loader function, insert it into package.loaders (package.searchers in Lua 5.2 and later), and let it do its job whenever a "*.txt" module comes its way.

RAR a folder without persisting the full path

1) I have a folder called CCBuilds containing a couple of files at this path: E:\Testing\Builds\CCBuilds.
2) I have written C# code (Process.Start) to Rar this folder and save it in E:\Testing\Builds\CCBuilds.rar using the following command
"C:\program files\winrar\rar.exe a E:\Testing\Builds\CCBuilds.rar E:\Testing\Builds\CCBuilds"
3) The problem is that, although the rar file gets created properly, when I unrar it to a CCBuilds2 folder (either through code with the rar.exe x command or with Extract in the context menu), the extracted folder contains the full path, i.e. extracting E:\Testing\Builds\CCBuilds.rar ->
E:\Testing\Builds\CCBuilds2\Testing\Builds\CCBuilds\<<my files>>
Whereas I want it to be something like this: E:\Testing\Builds\CCBuilds2\CCBuilds\<<my files>>
How can I avoid persisting the full path when adding to the rar / extracting from it? Any help is appreciated.
Use the -ep1 switch.
More info:
-ep = Files are added to an archive without including the path information. Could result in multiple files with the same name existing in the archive.
-ep1 = Do not store the path entered at the command line in the archive. Exclude the base folder from names.
-ep2 = Expand paths to full. Store full file paths (except drive letter and leading backslash) when archiving.
(source: http://www.qa.downappz.com/questions/winrar-command-line-to-add-files-with-relative-path-only.html)
Just in case this helps: I am currently working on an MS Access Database project (customer relations management for a small company), and one of the tasks there is to zip docx-files to be sent to customers, with a certain password encryption used.
In the VBA procedure that triggers the zip-packaging of the docx-files, I call WinRAR as follows:
c:\Programme\WinRAR\winrar.exe a -afzip -ep -pThisIsThePassword "OutputFullName" "InputFullName"
-afzip says: create a zip file (as opposed to a rar file)
-ep says: do not include the paths of the source file, i.e. put the file directly into the zip folder
-pThisIsThePassword says: encrypt the archive with the password ThisIsThePassword
A full list of such switches is available in the WinRAR Help, section "Command line".
x extracts it as E:\Testing\Builds\CCBuilds2\Testing\Builds\CCBuilds\ because you're using the full path when declaring the source. Either use -ep1, or set the working directory to E:\Testing\Builds and pass the relative source path CCBuilds.
Use of -ep1 is needed but it's a bit tricky.
If you use:
Winrar.exe a output.rar inputpath
Winrar.exe a E:\Testing\Builds\CCBuilds.rar E:\Testing\Builds\CCBuilds
it will include the input path declared:
E:\Testing\Builds\CCBuilds -> E:\Testing\Builds\CCBuilds.rar:
Testing\Builds\CCBuilds\file1
Testing\Builds\CCBuilds\file2
Testing\Builds\CCBuilds\folder1\file3
...
which will end up unpacked as you've mentioned:
E:\Testing\Builds\CCBuilds2\Testing\Builds\CCBuilds\
There are two ways of using -ep1.
If you want the simple path:
E:\Testing\Builds\CCBuilds\
to be extracted as:
E:\Testing\Builds\CCBuilds2\CCBuilds\file1
E:\Testing\Builds\CCBuilds2\CCBuilds\file2
E:\Testing\Builds\CCBuilds2\CCBuilds\folder1\file3
...
use
Winrar.exe a -ep1 E:\Testing\Builds\CCBuilds.rar E:\Testing\Builds\CCBuilds
the files inside the archive will look like:
CCBuilds\file1
CCBuilds\file2
CCBuilds\folder1\file3
...
Or you could use -ep1 to add just the files and folder structure without the base folder, by using recursion (-r) and pointing the source path at the contents of the base folder:
Winrar.exe a -ep1 -r E:\Testing\Builds\CCBuilds.rar E:\Testing\Builds\CCBuilds\*
The files:
E:\Testing\Builds\CCBuilds\file1
E:\Testing\Builds\CCBuilds\file2
E:\Testing\Builds\CCBuilds\folder1\file3
...
inside the archive will look like:
file1
file2
folder1\file3
...
when extracted will look like:
E:\Testing\Builds\CCBuilds2\file1
E:\Testing\Builds\CCBuilds2\file2
E:\Testing\Builds\CCBuilds2\folder1\file3
...
Either way, these are the two ways -ep1 can be used to exclude the base path, with or without the folder containing the files (the base folder).
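The question used C# Process.Start, but the fix is only a matter of switches. Here is a minimal Python sketch of the same rar invocation through subprocess; the rar.exe location is an assumption (adjust it for your machine), and build_rar_args and create_archive are hypothetical helper names. Passing the arguments as a list avoids manual quoting of the space in "Program Files".

```python
import subprocess
from pathlib import PureWindowsPath

# Assumption: default WinRAR install location; adjust for your machine.
RAR_EXE = r"C:\Program Files\WinRAR\rar.exe"


def build_rar_args(archive: str, source_dir: str, keep_base_folder: bool = True) -> list[str]:
    """Build a 'rar a' command line that uses -ep1 to drop the leading path."""
    if keep_base_folder:
        # Stored names look like CCBuilds\file1, CCBuilds\folder1\file3, ...
        return [RAR_EXE, "a", "-ep1", archive, source_dir]
    # Recurse into the folder's contents; stored names look like file1, folder1\file3, ...
    contents = str(PureWindowsPath(source_dir, "*"))
    return [RAR_EXE, "a", "-ep1", "-r", archive, contents]


def create_archive(archive: str, source_dir: str, keep_base_folder: bool = True) -> None:
    """Run rar.exe; raises CalledProcessError if rar reports a failure."""
    subprocess.run(build_rar_args(archive, source_dir, keep_base_folder), check=True)


# Example (Windows only):
# create_archive(r"E:\Testing\Builds\CCBuilds.rar", r"E:\Testing\Builds\CCBuilds")
```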

How to find all "txt" files at a particular location in the system?

I have a robot that finds a file with a given name at a particular location in the system, but now I want to find all the text files at that location. I have tried to use "*.txt", but it didn't work. Is there a way to do that?
file.exists ♥environment⟦USERPROFILE⟧\Documents\t.txt errormessage ‴Sorry, I could not find a file‴
dialog ‴File exists‴
You can use the directory command. Its pattern argument lets you filter for files with a particular extension.
directory path ♥environment⟦USERPROFILE⟧\Desktop pattern *.txt result ♥files
dialog ♥files⟦count⟧
The above code should let you know how many files of the given extension exist in the given directory.
You could then take values from the returned list and use them with the file.exists command.
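Outside the robot tool, the same directory-plus-pattern idea looks like this in Python. This is a sketch under assumptions: find_text_files is a hypothetical helper, it only looks directly inside the folder (no recursion), and the Desktop location is built from USERPROFILE, falling back to the home directory when that variable is missing.

```python
import os
from pathlib import Path


def find_text_files(folder: str) -> list[Path]:
    """Return the *.txt files directly inside folder (not recursive), sorted by path."""
    return sorted(p for p in Path(folder).glob("*.txt") if p.is_file())


if __name__ == "__main__":
    desktop = Path(os.environ.get("USERPROFILE", str(Path.home()))) / "Desktop"
    files = find_text_files(str(desktop))
    print(len(files))  # how many .txt files were found
    for f in files:
        print(f.name)
```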