Hive output to xlsx

I am not able to open an .xlsx file. Is this the correct way to output the result to an .xlsx file?
hive -f hiveScript.hql > output.xlsx

hive -S -f hiveScript.hql > output.xls
This will work

There is no easy way to create an Excel (.xlsx) file directly from hive. You could output your query's content to an older version of Excel (.xls) using the answers given above, and it would open in Excel properly (with an initial warning in recent versions of Office), but in essence it is just a text file with an .xls extension. If you open this file with any text editor, you will see the contents of the query output.
Take any .xlsx file on your system and open it with a text editor and see what you get. It will be all junk characters since that is not a simple text file.
Having said that, there are many programming languages that allow you to read a text file and create an .xlsx file. Since no information is provided/requested on this, I will not go into details. However, you may use pandas in Python to create Excel files.

Output a csv or tsv file, then use Python (the pandas library) to do the conversion.
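For example, a minimal sketch of that conversion with pandas (the file names, the tab delimiter and the missing header row are assumptions about what the hive run produced; writing .xlsx also needs the openpyxl package installed):

import pandas as pd

# The hive CLI writes tab-separated columns; it only prints a header row
# if hive.cli.print.header was enabled, hence header=None here.
df = pd.read_csv("output.tsv", sep="\t", header=None)

# Write the result as a real .xlsx workbook (requires openpyxl).
df.to_excel("output.xlsx", index=False, header=False)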

I am away from my setup right now so I really cannot test this. But you can give this a try in your shell:
hive -f hiveScript.hql >> output.xls

Related

Trouble with UTF-8 with Julia and JupyterLab

I'm reading the csv file at https://github.com/VinitaSilaparasetty/julia-beginners/blob/master/data/nba/nba19-20.csv
I get a DataFrame and I save it as XLSX. When I try to open it in JupyterLab I get an error saying the file is not UTF-8 encoded, and therefore the file is not read.
This is my code:
using HTTP, XLSX, CSV, DataFrames
df = CSV.read(HTTP.get("https://raw.githubusercontent.com/VinitaSilaparasetty/julia-beginners/master/data/nba/nba19-20.csv").body)
# first(df,5) # first shows the top five rows ok
XLSX.writetable("data/nba/nba19-20.XLSX", collect(eachcol(df)), names(df), overwrite = true)
The file is saved in my data folder. When I try to open it with JupyterLab, I get a pop-up saying the file is not UTF-8 encoded, and the file is not opened.
When I try to open the file in Ubuntu (with LibreOffice) I do not see anything suspicious.
As I'm new to Julia I'm struggling to understand where the problem lies or how to fix it.
I tried to see if I could encode the dataframe in UTF-8 (after saving the file to disk) with
data = DataFrame(CSV.File(open(read,"data/nba/nba19-20.csv", enc"utf-8")))
But I did not see any change. Any suggestion is welcome.
Do you have the jupyterlab-spreadsheet plugin installed? JupyterLab by default doesn't support opening xlsx files (it isn't mentioned in the file formats list here for example).
See also this similar question involving Python pandas (which says pretty much the same thing).
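If you just want to look at the data inside JupyterLab without installing the plugin, one workaround is to read the workbook back in a notebook cell instead of opening the file directly. A sketch, assuming a Python kernel with pandas and openpyxl available (the path is the one from the question):

import pandas as pd

# Read the workbook that XLSX.writetable produced; .xlsx files need the
# openpyxl engine.
df = pd.read_excel("data/nba/nba19-20.XLSX", engine="openpyxl")
df.head()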

Incorrect csv file from impala output

[Screenshot: snapshot of the "csv" file]
So I have a table created by Impala saved as a csv file using the following command:
impala-shell -B -o output.csv --output_delimiter=',' -q "select * from foo"
This is supposed to return a csv file, but it didn't.
Any help will be appreciated!
The output file you have is close to a CSV file, except that it is using pipes instead of commas. To import this into Excel, just use the wizard and select pipe (|) as the delimiter.
I'm not sure what went wrong with your Impala export, but you should be able to work with what you already have.
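If you would rather repair the file than configure the import wizard each time, a small sketch with pandas (assuming the exported file really is pipe-delimited and has no header row, as in the screenshot):

import pandas as pd

# Read the pipe-delimited export and rewrite it with commas.
df = pd.read_csv("output.csv", sep="|", header=None)
df.to_csv("output_fixed.csv", index=False, header=False)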

In Kettle, using Text File Input to read a csv file from a tar.gz file didn't work. Where might it be wrong?

I have a csv file that is tarred and gzipped, so I have test.tar.gz.
I would like to read the csv file through Text File Input.
I tried the path tar:gz:file://C:/test/test.tar.gz!/test.tar! with a wildcard like ".*\.csv".
But it sometimes fails to read.
It throws this exception:
org.apache.commons.vfs.FileNotFolderException:
Could not list the contents of
"tar:gz:file:///C:/test/test.tar.gz!/test.tar!/"
because it is not a folder.
I use Windows 8.1 and PDI 5.2.
Where might it be wrong?
For reading a csv file from a compressed archive, the "Text File Input" step in Pentaho Kettle only supports the first file inside the archive (either a Zip or GZip file). Check the compression section of the Pentaho Wiki.
Now for your issue, try removing the wildcard entry, since only the first file inside the zip/gzip file will be read (as explained above).
I have placed sample code covering reading both zip and gzip files. Check it here.
Hope it helps :)
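If changing the Kettle step does not help, another option is to unpack the archive before the transformation runs. A sketch with Python's tarfile module (the path and the .csv suffix come from the question; the extraction folder is an assumption):

import tarfile

# Extract only the .csv members of test.tar.gz so a plain Text File Input
# step can read them afterwards.
with tarfile.open("C:/test/test.tar.gz", "r:gz") as archive:
    for member in archive.getmembers():
        if member.name.endswith(".csv"):
            archive.extract(member, path="C:/test/extracted")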

How to use csvSeparator with jpexport (jprofiler)?

I am using JProfiler to run some tests on the memory usage of my application. I would like to include them in my build process, so all the steps should work on the command line.
One step exports a csv file from a jps file with a command like:
~/jprofiler7/bin/jpexport q1.jps "TelemetryHeap" -format=csv q1_telemetry_heap.csv
On my local machine (Windows), it is working. On my server (Linux), the csv file is not well formatted:
"Time [s]","Committed size","Free size","Used size"
0.0,30,784,000,19,558,000,11,226,000
1.0,30,976,000,18,376,000,12,600,000
2.0,30,976,000,16,186,000,14,790,000
3.0,30,976,000,16,018,000,14,958,000
4.01,30,976,000,14,576,000,16,400,000
There is no way to distinguish the commas that separate csv fields from the commas used as thousands separators inside the numbers.
According to the documentation, I need to change the value of -Djprofiler.csvSeparator in the file bin/export.vmoptions.
But that failed. I also tried to change this value in jpexport.vmoptions and in jprofiler.vmoptions.
What should I do?
Thanks for your help
This bug was fixed in JProfiler 8.0.2.
Adding
-Djprofiler.csvSeparator=;
on a new line in bin/jpexport.vmoptions should work in JProfiler 7, though.
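Whatever consumes the export then has to be told about the new separator. As an illustration, a sketch of reading such a file with pandas (the file name is the one from the command above), treating ';' as the field separator and ',' as the thousands separator inside the numbers:

import pandas as pd

# ';' separates the fields; ',' is only the thousands grouping character.
df = pd.read_csv("q1_telemetry_heap.csv", sep=";", thousands=",")
print(df.dtypes)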

How to split SQL in Mac OS X?

Is there any app for Mac to split sql files, or even a script?
I have a large file which I have to upload to a hosting provider that doesn't support files over 8 MB.
*I don't have SSH access
You can use this: http://www.ozerov.de/bigdump/
Or use this command to split the sql file:
split -l 5000 ./path/to/mysqldump.sql ./mysqldump/dbpart-
The split command takes a file and breaks it into multiple files. The -l 5000 part tells it to split the file every five thousand lines. The next bit is the path to your file, and the next part is the path you want to save the output to. Files will be saved as whatever filename you specify (e.g. “dbpart-”) with an alphabetical letter combination appended.
Now you should be able to import your files one at a time through phpMyAdmin without issue.
More info http://www.webmaster-source.com/2011/09/26/how-to-import-a-very-large-sql-dump-with-phpmyadmin/
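Note that a purely line-based split can cut a multi-line statement in half. If that is a problem, here is a rough sketch of splitting only after lines that end a statement with ';' (the file names are assumptions; the 8 MB limit comes from the question, and this does not handle ';' inside string data):

# Leave headroom under the 8 MB hosting limit, since a part is only closed
# after the next line that ends a statement with ';'.
MAX_BYTES = 7 * 1024 * 1024

part, size = 1, 0
out = open("dbpart-1.sql", "w", encoding="utf-8")
with open("mysqldump.sql", encoding="utf-8") as dump:
    for line in dump:
        out.write(line)
        size += len(line.encode("utf-8"))
        if size >= MAX_BYTES and line.rstrip().endswith(";"):
            out.close()
            part += 1
            size = 0
            out = open(f"dbpart-{part}.sql", "w", encoding="utf-8")
out.close()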
This tool should do the trick: MySQLDumpSplitter
It's free and open source.
Unlike the accepted answer to this question, this app will always keep extended inserts intact so the precise form of your query doesn't matter; the resulting files will always have valid SQL syntax.
Full disclosure: I am a shareholder of the company that hosts this program.
The UploadDir feature in phpMyAdmin could help you, if you have FTP access and can modify your phpMyAdmin's configuration (or are allowed to install your own instance of phpMyAdmin).
http://docs.phpmyadmin.net/en/latest/config.html?highlight=uploaddir#cfg_UploadDir
You can split into working SQL statements with:
csplit -s -f db-part db.sql "/^# Dump of table/" "{99}"
This makes up to 99 files named 'db-part[n]' from db.sql.
You can use "CREATE TABLE" or "INSERT INTO" instead of "# Dump of ..."
Also: Avoid installing any programs or uploading your data into any online service. You don't know what will be done with your information!