Can I use MultiStorage and CSVExcelStorage at the same time (or close) in Pig? - apache-pig

I'm looking for a way to store a relation into split folders in CSV format.
I'm launching Pig from a shell.
I looked on Stack Overflow but didn't find anything covering this case.
I'm using Piggybank 0.14 together with the Java source of the latest MultiStorage, so that I can use the multi-field selection.
If I use CSVExcelStorage to store the relation, I can cut the output file in the shell, but I think doing so would lose the CSV format.
If I use MultiStorage to store the relation, I'm not able to format the output file as CSV.
So, is it possible to apply CSVExcelStorage from one relation to another?
Do you have any other suggestions?
Thanks.

In the end, I used a shell script to simulate MultiStorage with some FILTER statements and CSVExcelStorage.
sklt="file.pig.skeleton"
pig="file.pig"
cp ${sklt} ${pig}
for waza in $anOtherVar
do
echo "R2 = R1 FILTER JEANNO IN ('${waza}')" >> ${pig}
echo -e "STORE R2 INTO '$myPath/${waza}' USING org.apache.pig.piggybank.storage.CSVExcelStorage(';');\n" >> ${pig}
done
pig -f ${pig} -p table=$anOtherVar -p myPath=/past/a/box/
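For illustration, if $anOtherVar held the two values foo and bar (hypothetical values; R1 and JEANNO come from the skeleton), the tail of the generated file.pig would look like:
R2 = FILTER R1 BY JEANNO IN ('foo');
STORE R2 INTO '$myPath/foo' USING org.apache.pig.piggybank.storage.CSVExcelStorage(';');
R2 = FILTER R1 BY JEANNO IN ('bar');
STORE R2 INTO '$myPath/bar' USING org.apache.pig.piggybank.storage.CSVExcelStorage(';');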
Hopefully this solution can help some other Pig addict...

Related

How to insert a line into a file in AIX, preferably using sed?

I want to insert a line "new line" into a file "Textfile.txt" at line number 3 in AIX.
Before insertion, Textfile.txt looks like:
one
two
four
After insertion, Textfile.txt looks like:
one
two
new line
four
I have already done it on Linux; however, on AIX the Linux solution does not work.
Surprisingly, I couldn't find a simple solution for this problem anywhere.
I am using this command on Linux and it works:
echo "target_node = ${arr[0]}"
echo "target_file = ${arr[1]}"
echo "target_line = ${arr[2]}"
echo "target_text = ${arr[3]}"
escape "$(ssh -f ${arr[0]} "sed -i "${arr[2]}i$(escape ${arr[3]})" ${arr[1]}; exit")"
To sum up the previous bits of information written as comments:
The -i option doesn't exist in AIX's sed, so use a temporary file; the command syntax is also stricter than on Linux.
sed '2a\
Insert this after the 2nd line' "$target_file" >"$target_file.tmp"
mv -- "$target_file.tmp" "$target_file"
Hi, thanks for the help.
I created the script in such a way that it copies the file to Linux, applies the changes, and moves it back to AIX.

Using documentParser function in Teradata Aster

I'm working with Teradata's Aster and am trying to parse a PDF (or HTML) file so that it is inserted into a table in the Beehive database in Aster. The entire PDF should correspond to a single row of data in the table.
This is to be done using one of Aster's SQL-MR functions called documentParser. It produces a text file (.rtf) containing a single row, produced by parsing all the chapters from the PDF file, which is then loaded into the table in Beehive.
I have been given this script that shows the use of documentParser and the other steps involved in this parsing process:
# SHELL INSTRUCTIONS

# Transform the file to base64 (change file names to your relevant file)
# (if your base64 wraps lines, GNU base64 -w 0 keeps the content on one row)
base64 pp.pdf > pp.b64

# Prepare a fresh load file
rm -f my_load_file.txt

# Get the content of the file
var=$(cat pp.b64)

# Write a CSV row: "pp.b64","<base64 content>"
echo "\"pp.b64\",\"$var\"" >> my_load_file.txt

# Create the staging table
act -U db_superuser -w db_superuser -d beehive -c "drop table if exists public.cf_load_file;"
act -U db_superuser -w db_superuser -d beehive -c "create dimension table public.cf_load_file(file_name varchar, content varchar);"

# Load into the staging table
ncluster_loader -U db_superuser -w db_superuser -d beehive --csv --verbose public.cf_load_file my_load_file.txt

# Use documentParser to load the clean text (you will need to create the target table beforehand)
act -U db_superuser -w db_superuser -d beehive -c "INSERT INTO got_data.cf_got_text_data (file_name, content) SELECT * FROM documentParser (ON public.cf_load_file documentCol ('content') mode ('text'));"

# Done
However, I am stuck on the last step of the script, because it looks like there is no function called documentParser in the list of functions that are available in Aster. This is the error I get:
ERROR: function "documentparser" does not exist
I tried to search for this function several times with the command \dF, but did not get any match.
I've attached a picture (SQL-MR Document Parser) that presents the gist of what I'm trying to do.
I would appreciate any help if anyone has any experience with this.
What happened is that someone told you about this function documentParser but never gave you the function archive file (documentParser.zip) to install in Aster. The function does exist, but it's not part of the official Aster Analytics Foundation (AAF). Please contact the person who gave you this info for help.
documentParser belongs to the so-called field functions that are developed and used by the Aster field team only. Not that you can't use it, but don't expect support to help you - only whoever gave you access to it.
If you don't have any contacts, then the next course of action I'd suggest is to go to the Aster Community Network and ask a question about it there.

Save all my SQL work in a file?

I am working with MySQL in a Linux terminal.
I want a way or a command to save all the queries I write, and their outputs, in a file.
Writing every query and redirecting it to a file by hand is tedious and impractical.
If there is any bash script or command for this, it would be helpful.
Yes, the tee command can be used for this purpose.
When logging into MySQL, you can set up the redirection like this:
mysql -u username -pPassword | tee -a outputfilename
Your whole session will be stored in the file.
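If you also want error messages captured in the same file, a variant of the same idea (a sketch; assumes a Bourne-style shell such as bash):
# 2>&1 merges stderr into stdout so error messages land in the log too
mysql -u username -pPassword 2>&1 | tee -a outputfilename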
This is a bit advanced, but I've just started playing with org-babel, and it's pretty great for SQL.
Set up org-babel in your init.el:
(org-babel-do-load-languages 'org-babel-load-languages
                             '((sql . t)))
(setq org-confirm-babel-evaluate nil
      org-src-fontify-natively t
      org-src-tab-acts-natively t)
And create an org-mode buffer. You can just run M-x org-mode in *scratch* if you want.
Then write your SQL:
#+BEGIN_SRC sql :engine "mysql" :dbhost "db.example.com" :dbuser "jqhacker" :dbpassword "passw0rd" :database "the_db"
show tables;
select * from the_table limit 10;
#+END_SRC
Evaluate it by putting the cursor in the SQL block and typing C-c C-c. The results show up in the buffer. You can write as many source blocks as you like and evaluate them in any order.
There's a lot more to org-babel: http://orgmode.org/worg/org-contrib/babel/languages/ob-doc-sql.html
I just found that there is a MySQL client command to save each query and its output in a file:
mysql> tee filename;
Example:
mysql> tee tmp/output.out;
..logging to file 'tmp/output.out'
Now every query and its output will be saved in the output.out file.
Note: remember to write the file name without quotes.

Accurev: How can I get a list of all users from a stream?

I need to get a list of all users who've contributed to a stream. I think I can just dump the entire history of the stream then parse it for the users like this (see hist for details):
accurev hist -s <stream> -a -fv
but this seems very crude, especially since I'm not interested in the history itself. Is there a more elegant way of doing this?
This works nicely:
accurev hist -p <depot> -s <stream> -a -fv | sed -n 's/.*user: \(.*\)/\1/p' | sort | uniq
You need to run the accurev hist command to obtain this information.
You can add the "-k promote" option to restrict the output to show only promote operations.
Also, you can use the -fx option to format the output as XML and create a script to generate a simple list of users.
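Putting the two suggestions together, a sketch of such a script (the user="..." attribute on the transaction elements is an assumption about the -fx XML layout; verify against your AccuRev version):
# Promote-only history as XML, then pull out the distinct user names
accurev hist -p <depot> -s <stream> -a -k promote -fx \
  | grep -o 'user="[^"]*"' | cut -d'"' -f2 | sort -u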

Executing batches of commands using redis cli

I have a long text file of redis commands that I need to execute using the redis command line interface:
e.g.
DEL 9012012
DEL 1212
DEL 12214314
etc.
I can't seem to figure out a way to enter the commands faster than one at a time. There are several hundred thousand lines, so I don't want to just pile them all into one DEL command; they also don't need to all run at once.
The following works for me with Redis 2.4.7 on Mac:
./redis-cli < temp.redisCmds
Does that satisfy your requirements? Or are you looking to see if there's a way to programmatically do it faster?
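If the command file itself still has to be produced, a small sketch for generating it from a plain list of keys (keys.txt is a hypothetical file with one key name per line):
# Turn each key into a DEL command, then replay the file through redis-cli
awk '{print "DEL " $1}' keys.txt > temp.redisCmds
./redis-cli < temp.redisCmds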
If you don't want to make a file, use echo with -e so the \n escapes are interpreted:
echo -e "DEL 9012012\nDEL 1212" | redis-cli
The redis-cli --pipe can be used for mass-insertion. It is available since 2.6-RC4 and in Redis 2.4.14.
For example:
cat data.txt | redis-cli --pipe
More info in: http://redis.io/topics/mass-insert
I know this is an old thread, but I'm adding this since it seems to have been missed among the other answers, and it works well for me.
Using a heredoc works well here if you don't want to use echo, explicitly add \n, or create a new file:
redis-cli <<EOF
select 15
get a
EOF