Load xlsx file into Pig - apache-pig

Is there any way to load .xlsx files into Pig? I need to perform an operation in Pig using an Excel file (.xlsx) as input, but I couldn't find any built-in function for this purpose.
Any help to achieve this would be appreciated.
Thanks,

Try this:
First convert the .xlsx file to CSV, then do the following:
REGISTER Location\to\piggybank.jar
Data = load 'Location\to\csv\file' using org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'NO_MULTILINE', 'NOCHANGE', 'SKIP_INPUT_HEADER') as (col1,col2,..);
CSVExcelStorage worked for me. Hope it works for you too.
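If you would rather script the xlsx-to-CSV step than save the file from Excel by hand, here is a minimal sketch using only the Python standard library (an .xlsx file is a ZIP archive of XML parts). The function names are hypothetical, and it assumes the sheet you want is stored as xl/worksheets/sheet1.xml, which is usual for the first sheet but not guaranteed. Values are copied as stored, so dates appear as serial numbers, formula cells give their cached result, and rows with no cells are not emitted. A library such as openpyxl handles these cases more completely.

```python
import csv
import re
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the SpreadsheetML parts inside an .xlsx package.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def _column_index(cell_ref):
    """Convert a cell reference such as 'C7' to a zero-based column index."""
    index = 0
    for ch in re.match(r"[A-Z]+", cell_ref).group(0):
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def _string_item(elem):
    """Join the text of a plain (<t>) or rich-text (<r><t>) string item."""
    parts = elem.findall("m:t", NS) + elem.findall("m:r/m:t", NS)
    return "".join(t.text or "" for t in parts)


def xlsx_sheet_to_rows(path, sheet_part="xl/worksheets/sheet1.xml"):
    """Return one worksheet of an .xlsx file as a list of rows of strings."""
    with zipfile.ZipFile(path) as zf:
        shared = []
        if "xl/sharedStrings.xml" in zf.namelist():
            sst = ET.fromstring(zf.read("xl/sharedStrings.xml"))
            shared = [_string_item(si) for si in sst.findall("m:si", NS)]
        sheet = ET.fromstring(zf.read(sheet_part))

    rows = []
    for row in sheet.findall("m:sheetData/m:row", NS):
        values, next_col = {}, 0
        for cell in row.findall("m:c", NS):
            ref = cell.get("r")  # the cell reference is optional in the format
            col = _column_index(ref) if ref else next_col
            next_col = col + 1
            kind = cell.get("t")
            if kind == "inlineStr":
                inline = cell.find("m:is", NS)
                text = _string_item(inline) if inline is not None else ""
            else:
                v = cell.find("m:v", NS)
                text = (v.text or "") if v is not None else ""
                if kind == "s":  # value is an index into the shared-string table
                    text = shared[int(text)]
            values[col] = text
        width = max(values) + 1 if values else 0
        rows.append([values.get(i, "") for i in range(width)])
    return rows


def xlsx_to_csv(xlsx_path, csv_path):
    """Write the worksheet as a comma-separated file with Unix line endings."""
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        csv.writer(out, lineterminator="\n").writerows(xlsx_sheet_to_rows(xlsx_path))
```

For example, xlsx_to_csv("input.xlsx", "input.csv") (hypothetical file names) produces a file that the CSVExcelStorage load statement above can read; fields containing commas are quoted by the csv writer.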

No, but if your Excel workbook has only one sheet, you can save it as CSV and load that with CSVExcelStorage.
For how to use it, see the Pig example at the link below:
http://pig.apache.org/docs/r0.9.1/api/org/apache/pig/piggybank/storage/CSVExcelStorage.html

Related

Can we use a CSV file to get the data in Cucumber?

Can we use a CSV file to get test data in Cucumber?
I tried using
@RunWith(SerenityParameterizedRunner.class)
@UseTestDataFrom(value="testdata/status-levels.csv")
but it asks for @Test, which we should not use in Cucumber.
Can anyone tell me how to read data from a CSV file in Cucumber?
You can use the Transformer class: write a custom transformer and then use it in your step definitions.

Change output file format to *.csv using dymosim.exe instead of *.mat

I am trying to find out whether it is possible to change the model output format to .csv instead of the default .mat file when simulating a model with dymosim.exe.
In Dymola itself I can do this with the function convertMATtoCSV from the DataFiles library, like this:
DataFiles.convertMATtoCSV("output.mat", {"t"}, "output.csv");
Is there a way to do this conversion using dymosim.exe?
Kindly advise.
Thanks.
Note: "dymosim.exe -h" lists some options for .csv, but I am not sure how to use them.
No, it is currently not possible to have the dymosim.exe generated by Dymola write the result as a CSV file. The CSV options of dymosim.exe are only for running multiple simulations.
You can:
Generate a text (.txt) result instead, if that is easier for you to handle. (Set Simulation Setup > Output > Textual data format; this is stored as the last element of the settings in dsin.txt.)
Perform the conversion using dymola\bin\alist.exe
Have the model write a CSV file as well
Set this up as a post-processing command in Dymola 2017 FD01.

Hive output to xlsx

I am not able to open an .xlsx file. Is this the correct way to output the result to an .xlsx file?
hive -f hiveScript.hql > output.xlsx
hive -S -f hiveScript.hql > output.xls
This will work
There is no easy way to create an Excel (.xlsx) file directly from Hive. You could write your query's output to a file with the older Excel extension (.xls), as in the answers above, and it would open in Excel (with an initial warning in recent versions of Office), but in essence it is just a text file with an .xls extension. If you open this file with any text editor, you will see the contents of the query output.
Take any .xlsx file on your system and open it with a text editor to see what you get: it will be all junk characters, because .xlsx is not a plain-text format (it is a ZIP archive of XML files).
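To see this difference without a text editor, here is a small Python check (the function name is hypothetical) that tests whether a file is a ZIP package containing [Content_Types].xml and the conventional workbook part xl/workbook.xml. Text redirected from hive -f into output.xlsx fails it, which is why Excel refuses to open that file.

```python
import zipfile


def is_real_xlsx(path):
    """Return True if path looks like an Office Open XML workbook.

    A genuine .xlsx is a ZIP package; plain query output that was merely
    redirected into a file named *.xlsx is not.
    """
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        names = set(zf.namelist())
    # xl/workbook.xml is the conventional (not mandatory) workbook location.
    return "[Content_Types].xml" in names and "xl/workbook.xml" in names
```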
Having said that, many programming languages let you read a text file and create an .xlsx from it. Since no details were provided or requested on this, I will not go into them; however, you could use pandas in Python to create Excel files.
Output a CSV or TSV file, then convert it with Python; I used the pandas library.
I am away from my setup right now, so I cannot test this, but you can give it a try from your shell:
hive -f hiveScript.hql >> output.xls

Using Pentaho Kettle, how can I convert a csv using commas to a csv with pipe delimiters?

I have a CSV input file that uses commas, and I need to change the delimiter to a pipe. Which step should I use in Pentaho Kettle? Any suggestions are welcome.
Thanks!
Don't use a big gun to shoot a small target: you can use sed or awk. If you want to integrate this with Kettle, use a step that runs a shell script and call sed (for example) from within that script.
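sed is enough when no field contains a comma. If the input has quoted fields such as "Smith, John", a plain substitution like s/,/|/g also replaces the comma inside the quotes. Here is a minimal quote-aware sketch in Python (the function and file names are hypothetical) that could be run from the same shell-script step:

```python
import csv


def change_delimiter(src_path, dst_path, old=",", new="|"):
    """Rewrite a delimited file with a different separator.

    The csv module parses quoted fields, so a comma inside quotes stays
    part of the value instead of becoming a pipe.
    """
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst, delimiter=new, lineterminator="\n")
        for row in csv.reader(src, delimiter=old):
            writer.writerow(row)
```

For example, change_delimiter("input.csv", "output.csv") turns the line 1,"Smith, John" into 1|Smith, John. A field that itself contains a pipe is quoted by the csv writer, so check that whatever reads the output expects that.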
If your goal is to output a pipe-separated CSV file from data within a transform and you're already running Kettle, just use a Text File output step.
If the goal is to do something unusual with CSV data within the transform itself, you might look into the Concat Fields step.
If the goal is simply to take a CSV file and write out another CSV with different separators, use the solution @martinnovoty suggests.
You can achieve this easily:
After the step that loads your CSV into a variable "foo", add a JavaScript step with this code:
var newFoo = replace(foo,",", "|");
Now the CSV content is in the newFoo variable, with pipes instead of commas.

How to convert MetaStock format to CSV?

http://en.wikipedia.org/wiki/MetaStock
Does anybody know how to convert the MetaStock data format to ASCII/CSV?
Any sample code (c++/c#) would be of great help.
The command-line tool "atem" is a portable, open-source MetaStock parser implementation in C/C++. It's very fast.
http://freshmeat.net/projects/atem
Metalib is a non-free .NET API for reading and writing Metastock files:
http://www.trading-tools.com/metalib.htm
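If you write your own converter, the part that usually trips people up is the number format: the classic MetaStock data files store values as 4-byte Microsoft Binary Format (MBF) floats rather than IEEE-754. Here is a minimal Python sketch of that conversion, assuming the classic format; the file header, record layout, and date encoding are not covered and should be checked against the actual files.

```python
import struct


def mbf_to_float(raw):
    """Convert a 4-byte Microsoft Binary Format (MBF) single to a float.

    Byte layout (little-endian b0..b3): b3 is the biased exponent, which is
    2 larger than the IEEE-754 single exponent for the same value; the top
    bit of b2 is the sign; the remaining 23 bits are the mantissa.
    """
    b0, b1, b2, b3 = raw
    if b3 <= 2:
        # Exponent 0 means zero in MBF; exponents 1 and 2 would map to IEEE
        # denormals (around 1e-38) and are treated as zero in this sketch.
        return 0.0
    sign = b2 & 0x80
    exponent = b3 - 2
    ieee = bytes([b0, b1, (b2 & 0x7F) | ((exponent & 0x01) << 7), sign | (exponent >> 1)])
    return struct.unpack("<f", ieee)[0]
```

Each converted value can then be written out with the csv module; for example, the MBF bytes 00 00 00 81 convert to 1.0.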