PyHive: how to get data into local storage? - hive

I have an existing Bash script that queries Hive with this sort of prefix:
insert overwrite local directory 'tue1' row format delimited fields terminated by '|' stored as textfile ...
This works fine via the Hive command line, but with PyHive ... I connect OK to the same localhost where the script was running, yet the 'local directory' output seems to be missing.
Is there some configuration I need to set up? Any guidance whatsoever will be much appreciated!
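One way to avoid depending on where the server writes LOCAL DIRECTORY output is to fetch the rows over the PyHive connection and write the file on the client yourself. When a statement runs through HiveServer2, LOCAL most likely refers to the server process's filesystem and working directory rather than the client's, which may explain the missing 'tue1' output; that is an assumption, not something confirmed above. A minimal sketch, assuming PyHive's DB-API cursor (`execute`/`fetchmany`); the helper name `export_query` and the output path are hypothetical, and the '|' delimiter mirrors the query above.

```python
import csv


def export_query(cursor, query, path, delimiter="|", batch_size=10000):
    """Run `query` on a DB-API 2.0 cursor and write the rows to a local file.

    Hypothetical helper. `cursor` can be any DB-API cursor, e.g. one from
    pyhive.hive.Connection(host="localhost", port=10000).cursor().
    Note: csv quotes a field that contains the delimiter, which Hive's own
    delimited text output would not do.
    """
    cursor.execute(query)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=delimiter, lineterminator="\n")
        # Fetch in batches so a large result is not held in memory at once.
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:
                break
            writer.writerows(rows)


# Usage with PyHive (not run here; host and port are assumptions):
#   from pyhive import hive
#   cur = hive.Connection(host="localhost", port=10000).cursor()
#   export_query(cur, "select ...", "tue1.txt")
```

Because only the DB-API interface is used, the same helper works with other drivers, which also makes it easy to try out against a local database first.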

Related

Exporting Hive Table Data into .csv

This question may have been asked before, and I am relatively new to Hadoop and Hive. As a test, I'm trying to export some content to see whether I am doing things correctly. The code is below.
Use MY_DATABASE_NAME;
INSERT OVERWRITE LOCAL DIRECTORY '/random/directory/test'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY "\n"
SELECT date_ts,script_tx,sequence_id FROM dir_test WHERE date_ts BETWEEN '2018-01-01' and '2018-01-02';
That is what I have so far, but it generates multiple files, and I want to combine them into a single .csv or .xls file to work on. What do I do next to accomplish this?
Thanks in advance.
You can achieve this in any of the following ways:
Force a single reducer in the query, e.g. with ORDER BY <col_name>, so that only one output file is written.
Store the output to HDFS and then use the command hdfs dfs -getmerge [-nl] <src> <localdest>
Using beeline: beeline --outputformat=csv2 -f query_file.sql > <file_name>.csv
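If the query has already written several part files to the local directory (Hive names them 000000_0, 000001_0, and so on), they can also be concatenated on the client, much as hdfs dfs -getmerge does for HDFS. A minimal sketch, assuming the files were written with FIELDS TERMINATED BY ',' as in the query above, so every line is already a CSV row; `merge_part_files` is a hypothetical name, and skipping hidden files (such as Hadoop's .crc checksum files) is an assumption about what else may sit in that directory.

```python
import os
import shutil


def merge_part_files(src_dir, dest_path):
    """Concatenate Hive part files from src_dir into one file at dest_path.

    dest_path should be outside src_dir so the result is not merged into itself.
    Returns the names of the files that were merged, in order.
    """
    names = sorted(
        name
        for name in os.listdir(src_dir)
        # Skip hidden files such as .000000_0.crc, and any subdirectories.
        if not name.startswith(".") and os.path.isfile(os.path.join(src_dir, name))
    )
    with open(dest_path, "wb") as out:
        for name in names:
            with open(os.path.join(src_dir, name), "rb") as part:
                # Byte-for-byte copy; each part already ends with "\n".
                shutil.copyfileobj(part, out)
    return names
```

For example, merge_part_files('/random/directory/test', 'result.csv') would produce one file that spreadsheet tools can open as CSV.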

Import/ Copy csv file into PostgreSQL | File not on local Server

I am trying to import/copy my CSV file into PostgreSQL, but I am encountering the errors below. I don't have import/write permissions for the file. Will stdin help, and how? The Postgres docs provide no examples. I was then asked to do a bulk insert, but since there are too many columns with mixed data types, I am not sure how to proceed with that.
Command to copy the csv file:
COPY sales.sales_tickets
FROM 'C:/Users/Nandini/Downloads/AIG_Sales_Tickets.csv'
DELIMITER ',' CSV;
ERROR: must be superuser to COPY to or from a file
Hint: Anyone can COPY to stdout or from stdin. psql's \copy command also works for anyone.
1 statement failed.
Doing the bulk insert with commands like this is too time-consuming:
insert into sales.sales_tickets values (1,'2',3,'4','5',6,7,8,'9',10,'11');
Please suggest. Thank you.
From the PostgreSQL documentation on COPY:
COPY naming a file or command is only allowed to database superusers, since it allows reading or writing any file that the server has privileges to access.
and
Files named in a COPY command are read or written directly by the server, not by the client application. Therefore, they must reside on or be accessible to the database server machine, not the client. They must be accessible to and readable or writable by the PostgreSQL user (the user ID the server runs as), not the client. Similarly, the command specified with PROGRAM is executed directly by the server, not by the client application, must be executable by the PostgreSQL user.
You're trying to use the COPY command in a way that violates two of these requirements:
You're trying to execute the COPY command as a non-superuser.
You're trying to read a file on your client machine, and have it copied to the server.
This won't work. If you need to perform such a COPY, you need to:
Copy the CSV file to the server, into a directory that can be read by the (system) user running the PostgreSQL server process.
Execute the COPY command from a superuser account.
Alternative
If you can't do some of these, you can always use a tool such as pgAdmin 4 and use its Import/Export functionality.
See also How to import CSV file data into a PostgreSQL table?
You are an ideal case for psql's \copy meta-command rather than the SQL COPY command. Note that \copy must be written on a single line:
\copy sales.sales_tickets FROM 'C:/Users/Nandini/Downloads/AIG_Sales_Tickets.csv' DELIMITER ',' CSV
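The hint in the error message ("Anyone can COPY to stdout or from stdin") is what \copy builds on: the client opens the file and streams it to the server over the connection, so no superuser rights are needed. The same can be done from Python. A minimal sketch, assuming psycopg2, whose cursor.copy_expert(sql, file) feeds a file object to a COPY ... FROM STDIN statement; the helper name `copy_csv_from_client` is hypothetical, and the simple identifier quoting assumes the schema and table names contain no dots themselves.

```python
def copy_csv_from_client(cursor, table, csv_path):
    """Stream a client-side CSV file into `table` via COPY ... FROM STDIN.

    `cursor` is assumed to be a psycopg2 cursor, which provides copy_expert().
    The server never opens the file itself; it only reads what the client sends.
    Returns the COPY statement that was executed.
    """
    # Quote each part of a schema-qualified name, e.g. sales.sales_tickets.
    quoted = ".".join('"' + part.replace('"', '""') + '"' for part in table.split("."))
    sql = f"COPY {quoted} FROM STDIN WITH (FORMAT csv)"
    with open(csv_path, newline="") as f:
        cursor.copy_expert(sql, f)
    return sql


# Usage (connection parameters are placeholders; `with conn` commits on success):
#   import psycopg2
#   conn = psycopg2.connect(dbname="mydb", user="myuser")
#   with conn, conn.cursor() as cur:
#       copy_csv_from_client(cur, "sales.sales_tickets",
#                            "C:/Users/Nandini/Downloads/AIG_Sales_Tickets.csv")
```

FORMAT csv uses a comma as the default delimiter, matching DELIMITER ',' CSV in the original command.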

Absolute path is treated as a relative path [duplicate]

I am running this statement in a Django app:
from django.db import connections
c = connections['default'].cursor()
query = "copy (select * from analysis.\"{0}\") to STDOUT DELIMITER ',' CSV HEADER;".format(view_name)
with open(csvFile, 'w') as f:
    # The with block closes the file, so no explicit f.close() is needed.
    c.copy_expert(query, f)
It does not create the correct CSV file: some of the values appear to be in the wrong columns. I am trying to test the SQL statement by running it in PostgreSQL:
copy (select * from analysis."S03_2005_activity_140807_153431_with_geom") to 'C:/djangoProjects/web_output/csvfiles/S03_2005_activity_140807_153431_with_geom.csv' DELIMITER ',' CSV HEADER;
It gives me: "ERROR: relative path not allowed for COPY to file". I have looked into the issue, and it typically seems to be one of two things: 1. confusing '\' and '/' (my slashes should be correct), or 2. the server being on a different computer. I thought this might be my issue, as the database is located on an external computer, but I have the connection set up in my PostgreSQL. It also runs from Django, so I'm not sure why it isn't working from pgAdmin.
If you want to store data / get data from your local machine and communicate with a Postgres server on a different, remote machine, you cannot simply use COPY.
Try the meta-command \copy in psql. It's a wrapper for the SQL COPY command and uses local files.
Your filename would work as-is on a Windows machine, but Postgres interprets it as a filename local to the server, which is probably running a Unix derivative. There, an absolute filename has to start with '/'.
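Python's pathlib can apply either platform's path rules on any machine, which makes this distinction easy to check: the path from the question is absolute under Windows rules but relative under POSIX rules, which is how a Unix-like server sees it.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The output path from the question.
path = "C:/djangoProjects/web_output/csvfiles/S03_2005_activity_140807_153431_with_geom.csv"

# Under Windows rules, a drive letter plus a root makes the path absolute.
print(PureWindowsPath(path).is_absolute())  # True

# Under POSIX rules, only paths starting with "/" are absolute,
# so "C:/..." is a relative path on a Unix-like server.
print(PurePosixPath(path).is_absolute())  # False
```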

MySQL mysqldump command error (bug in MySQL 5.5)

I am exporting a table from my server DB. It has a few thousand rows, and phpMyAdmin is unable to handle it, so I switched to the command-line option.
But I run into this error after executing the mysqldump command. The error is:
Couldn't execute 'SET OPTION SQL_QUOTE_SHOW_CREATE=1': You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'OPTION SQL_QUOTE_SHOW_CREATE=1' at line 1 (1064)
After some searching, I found that this is a bug in MySQL 5.5, which does not support the SET OPTION command.
I am running an EC2 instance with CentOS on it. My MySQL version is 5.5.31 (from my phpinfo).
I would like to know if there is a fix for this, as it won't be possible to upgrade the entire database just for this error.
Alternatively, if there is any other way to do an export or dump, please suggest it.
An alternative to mysqldump is the SELECT ... INTO form of SELECT, which allows results to be written to a file (http://dev.mysql.com/doc/refman/5.5/en/select-into.html).
Some example syntax from the above help page is:
SELECT a,b,a+b INTO OUTFILE '/tmp/result.txt'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM test_table;
Data can then be loaded back in using LOAD DATA INFILE (http://dev.mysql.com/doc/refman/5.5/en/load-data.html).
Again the page gives an example:
LOAD DATA INFILE '/tmp/test.txt' INTO TABLE test
FIELDS TERMINATED BY ',' LINES STARTING BY 'xxx';
It also includes a complete worked example of the pair:
When you use SELECT ... INTO OUTFILE in tandem with LOAD DATA INFILE to write data from a database into a file and then read the file back into the database later, the field- and line-handling options for both statements must match. Otherwise, LOAD DATA INFILE will not interpret the contents of the file properly. Suppose that you use SELECT ... INTO OUTFILE to write a file with fields delimited by commas:
SELECT * INTO OUTFILE 'data.txt' FIELDS TERMINATED BY ','
FROM table2;
To read the comma-delimited file back in, the correct statement would be:
LOAD DATA INFILE 'data.txt' INTO TABLE table2 FIELDS TERMINATED BY ',';
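The requirement that writer and reader options match is easy to see in miniature. As an analogy only, using Python's csv module rather than MySQL (whose OPTIONALLY ENCLOSED BY and ESCAPED BY rules differ in detail), reading a file back with the same options recovers the rows, while a mismatched delimiter silently produces wrong ones:

```python
import csv
import io

rows = [["1", "a,b"], ["2", "c"]]

# Write comma-delimited, quoting a field only when it needs it
# (loosely like FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"').
buf = io.StringIO()
csv.writer(buf, delimiter=",", quotechar='"', lineterminator="\n").writerows(rows)
text = buf.getvalue()

# Reading back with the same options recovers the original rows.
same = list(csv.reader(io.StringIO(text), delimiter=",", quotechar='"'))

# A mismatched delimiter does not: each whole line comes back as one field.
mismatched = list(csv.reader(io.StringIO(text), delimiter="\t", quotechar='"'))

print(same)
print(mismatched)
```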
Not tested, but something like this:
cat yourdumpfile.sql | grep -v "SET OPTION SQL_QUOTE_SHOW_CREATE" | mysql -u user -p -h host databasename
This inserts the dump into your database but removes the lines containing "SET OPTION SQL_QUOTE_SHOW_CREATE". The -v flag inverts the match, so grep prints only the lines that do not match.
I couldn't find the English manual entry for SQL_QUOTE_SHOW_CREATE to link it here, but you don't need this option at all when your table and database names don't include special characters (that is, when they don't need to be quoted).
UPDATE:
mysqldump -u user -p -h host database | grep -v "SET OPTION SQL_QUOTE_SHOW_CREATE" > yourdumpfile.sql
Then, when you load the dump into the database, you don't have to do anything special:
mysql -u user -p -h host database < yourdumpfile.sql
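The grep -v step simply drops every line that contains the given text (the pattern has no regex metacharacters, so a plain substring test is equivalent). Where grep is not available, for example on a Windows client, a minimal Python equivalent might look like this; `drop_matching_lines` is a hypothetical name, and it works on any pair of text streams.

```python
def drop_matching_lines(src, dst, needle="SET OPTION SQL_QUOTE_SHOW_CREATE"):
    """Copy lines from src to dst, skipping any line that contains `needle`.

    Equivalent in spirit to: grep -v "SET OPTION SQL_QUOTE_SHOW_CREATE"
    """
    for line in src:
        if needle not in line:
            dst.write(line)


# Usage (file names are placeholders):
#   with open("yourdumpfile.sql") as src, open("cleaned.sql", "w") as dst:
#       drop_matching_lines(src, dst)
```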
I used a quick-and-dirty hack for this:
Download MySQL 5.6 (from https://downloads.mariadb.com/archive/signature/p/mysql/f/mysql-5.6.13-linux-glibc2.5-x86_64.tar.gz/v/5.6.13).
Untar it and use the newly downloaded mysqldump.

How can I copy data from a Hive table into the local system?

I have created a table "sample" in Hive and loaded a CSV file "sample.txt" into it.
Now I need the data from "sample" in my local /opt/zxy/sample.txt.
How can I do that?
Hortonworks' Sandbox lets you do it through its HCatalog menu. Otherwise, the syntax is
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/c' SELECT a.* FROM b
as per the Hive Language Manual.
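When the query has no ROW FORMAT clause, as in the statement above, Hive writes fields separated by its default delimiter, the \x01 (Ctrl-A) character, which most tools do not recognize. A minimal sketch for turning such output into one CSV file; `hive_dir_to_csv` is a hypothetical name, and it assumes the part files use that default separator and that no field contains an embedded newline.

```python
import csv
import os


def hive_dir_to_csv(src_dir, dest_path, field_sep="\x01"):
    """Convert Hive text output files in src_dir into a single CSV file.

    Hypothetical helper. Assumes part files such as 000000_0, 000001_0, ...
    that use Hive's default ^A field separator. NULLs, which Hive writes
    as \\N, are copied through unchanged.
    """
    names = sorted(
        n for n in os.listdir(src_dir)
        # Skip hidden files such as .crc checksums, and any subdirectories.
        if not n.startswith(".") and os.path.isfile(os.path.join(src_dir, n))
    )
    with open(dest_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)  # quotes fields that contain commas
        for name in names:
            with open(os.path.join(src_dir, name), encoding="utf-8") as part:
                for line in part:
                    writer.writerow(line.rstrip("\n").split(field_sep))
```

For the example above, hive_dir_to_csv('/tmp/c', '/opt/zxy/sample.csv') would give a file with a .csv extension that the OS can associate with a spreadsheet application.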
Since your intention is just to copy the entire file from HDFS to your local FS, I would not suggest doing it through a Hive query, for the following reasons:
It'll start a MapReduce job, which will take more time than a normal copy.
It'll create file(s) with different names(000000_0, 000001_0 and so on), which will require you to rename the file manually afterwards.
You might have trouble opening these files, since they have no extension: your OS won't be able to choose an application to open them on its own, so you will either have to rename the files or manually select an application to open them.
To avoid these problems, you could use the HDFS get command:
bin/hadoop fs -get /user/hive/warehouse/sample/sample.txt /opt/zxy/sample.txt
Simple and easy. But if you need to copy only selected data, then you have to use a Hive query.
HTH
I usually run my query directly through Hive on the command line for this kind of thing, and pipe it into the local file like so:
hive -e 'select * from sample' > /opt/zxy/sample.txt
Hope that helps.
Readers who are accessing Hive from Windows can check out this script on GitHub.
It's a Python+paramiko script that extracts Hive data to local Windows OS file-system.