I have created a table "sample" in Hive and loaded a CSV file "sample.txt" into it.
Now I need to get that data from "sample" into my local /opt/zxy/sample.txt.
How can I do that?
Hortonworks' Sandbox lets you do it through its HCatalog menu. Otherwise, the syntax is
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/c' SELECT a.* FROM b
as per the Hive language manual.
Since your intention is just to copy the entire file from HDFS to your local FS, I would not suggest doing it through a Hive query, for the following reasons:
It will start a MapReduce job, which takes more time than a plain copy.
It will create file(s) with different names (000000_0, 000001_0 and so on), which you will have to rename manually afterwards.
You might have trouble opening these files because they have no extension, so your OS cannot choose an application for them on its own. In that case you either have to rename the file or manually select an application to open it.
To avoid these problems you could use the HDFS get command:
bin/hadoop fs -get /user/hive/warehouse/sample/sample.txt /opt/zxy/sample.txt
Simple and easy. But if you need to copy only selected data, then you have to use a Hive query.
HTH
I usually run my query directly through Hive on the command line for this kind of thing, and pipe it into the local file like so:
hive -e 'select * from sample' > /opt/zxy/sample.txt
Hope that helps.
Readers who are accessing Hive from Windows can check out this script on GitHub.
It's a Python + paramiko script that extracts Hive data to the local Windows file system.
I am in a strict corporate environment and don't have access to Postgres' psql. Therefore I can't do what's shown e.g. in the SO question "Convert SQLITE SQL dump file to POSTGRESQL". However, I can generate the SQLite dump file (.sql). The resulting dump.sql file is 1.3 GB.
What would be the best way to import this data into Postgres? I also have DBeaver and can connect to both databases simultaneously, but unfortunately I can't do an INSERT from a SELECT.
I think the term for that is 'absurd', not 'strict'.
DBeaver has an 'execute script' feature. But who knows, maybe it will be blocked.
EnterpriseDB offers binary downloads. If you unzip those to a local drive you might be able to execute psql from the bin subdirectory.
If you can install psycopg2 or pg8000 for Python, you should be able to connect to the database and then loop over the dump file, sending each line to the database with cur.execute(line). It might take some fiddling if the dump file has any multi-line commands, but the example you linked to doesn't show any of those.
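The loop described above could look roughly like the sketch below. It is not a definitive loader: the function name load_dump and the list of skipped prefixes are assumptions made for illustration, and conn stands for any DB-API connection (for example one returned by psycopg2.connect). Lines are buffered until one ends with a semicolon, which is a naive way to cope with multi-line commands.

```python
# Statements that SQLite's .dump emits but that are either SQLite-only (PRAGMA)
# or better left to the connection (transaction control). Assumed list; extend as needed.
SKIP_PREFIXES = ("PRAGMA", "BEGIN TRANSACTION", "COMMIT")


def load_dump(conn, dump_path):
    """Send the statements of a SQL dump file to a DB-API connection.

    Lines are collected until one ends with ';'. This is a naive split: it
    breaks if a string literal spans lines and one of them ends with ';'.
    Returns the number of statements executed.
    """
    cur = conn.cursor()
    buffer = []
    executed = 0
    with open(dump_path, encoding="utf-8") as dump:
        for line in dump:
            stripped = line.strip()
            if not stripped or stripped.startswith("--"):
                continue  # blank line or SQL comment
            if not buffer and stripped.upper().startswith(SKIP_PREFIXES):
                continue  # statement we do not want to forward
            buffer.append(line)
            if stripped.endswith(";"):
                cur.execute("".join(buffer))
                buffer.clear()
                executed += 1
    if buffer:
        # Trailing statement without a terminating semicolon.
        cur.execute("".join(buffer))
        executed += 1
    conn.commit()
    return executed


# Hypothetical usage, assuming psycopg2 is installed and the DSN is adjusted:
#   import psycopg2
#   conn = psycopg2.connect("dbname=mydb user=me")
#   load_dump(conn, "dump.sql")
```

Everything runs in one transaction and is committed at the end, so a failure partway through leaves the target database unchanged. For a 1.3 GB dump, committing in batches may be preferable.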
I created a backup cmd file with this code:
EXPDP system/system EXCLUDE=statistics DIRECTORY=bkp_dir DUMPFILE=FULLDB.DMP LOGFILE=FULLDB.log FULL=Y
It works well, but when I run the backup again, it finds that the file already exists and terminates the process. It will not run unless I delete or rename the previous file. I want to add something to the dump file and log file names that makes them differ from day to day, such as the system date or a copy number.
The option REUSE_DUMPFILES specifies whether to overwrite a preexisting dump file.
Normally, Data Pump Export will return an error if you specify a dump
file name that already exists. The REUSE_DUMPFILES parameter allows
you to override that behavior and reuse a dump file name.
If you want a separate dump file name for each day, you can build the name with the date command in a Unix/Linux shell.
DUMPFILE=FULLDB_$(date '+%Y-%m-%d').DMP
Similar techniques are available on Windows, which you may explore if you're running expdp in a Windows environment.
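One platform-independent way to get the same effect is to build the command line in a small script. The sketch below is only an illustration: the function name expdp_args and the prefix parameter are made up here, while the expdp parameters are copied from the question.

```python
import datetime


def expdp_args(day=None, prefix="FULLDB"):
    """Return the expdp argument list with the date embedded in both file names."""
    day = day or datetime.date.today()
    stamp = day.strftime("%Y-%m-%d")
    return [
        "expdp",
        "system/system",
        "EXCLUDE=statistics",
        "DIRECTORY=bkp_dir",
        f"DUMPFILE={prefix}_{stamp}.DMP",
        f"LOGFILE={prefix}_{stamp}.log",
        "FULL=Y",
    ]


if __name__ == "__main__":
    # Only prints the command; pass the list to subprocess.run(...) to execute it.
    print(" ".join(expdp_args()))
```

Because the date is part of both names, each day's run writes new files and never collides with the previous day's dump.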
I am currently working on a project and I want to know how to save an SQLite database in Rails as a CSV file. When a button is clicked, I want the current database on the system to be downloaded. Can anybody help me? Thanks!
Your problem isn't really specific to Rails. Instead, you're mostly dealing with an administrative issue. You should write a script to export your database as CSV, something like this:
#!/bin/bash
./bin/sqlite3 ./my_app/db/my_database.db <<!
.headers on
.mode csv
.output my_output_file.csv
select * from my_table;
!
This script exports a single table. If you have additional tables, you'll want to add them to your script.
The only Rails related issue is the matter of calling that script. Save the script within your application structure; I'd suggest my_app/assets or some similar location.
Now you can run that script using system(command) where command is the absolute path for your script, within a set of double-quotes.
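If the sqlite3 command-line binary is not available on the server, the same single-table export can be sketched with Python's standard sqlite3 and csv modules. This swaps in a different tool from the shell script above; the function name export_table_to_csv is hypothetical.

```python
import csv
import sqlite3


def export_table_to_csv(db_path, table, csv_path):
    """Write every row of `table` to `csv_path`, preceded by a header row."""
    conn = sqlite3.connect(db_path)
    try:
        # Table names cannot be bound as query parameters, so quote the identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        cur = conn.execute(f"SELECT * FROM {quoted}")
        with open(csv_path, "w", newline="", encoding="utf-8") as out:
            writer = csv.writer(out)
            # cursor.description holds one tuple per column; item 0 is the name.
            writer.writerow([col[0] for col in cur.description])
            writer.writerows(cur)
    finally:
        conn.close()
```

As with the shell version, this exports one table per call, so additional tables need additional calls.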
I am trying to import/copy my CSV file into PostgreSQL, but I am encountering the errors below. I don't have import/write permissions for the file. Will stdin help, and how? The Postgres docs provide no examples. I was then asked to do a bulk insert, but since there are too many columns with mixed data types, I am not sure how to proceed with that.
Command to copy the csv file:
COPY sales.sales_tickets
FROM 'C:/Users/Nandini/Downloads/AIG_Sales_Tickets.csv'
DELIMITER ',' CSV;
ERROR: must be superuser to COPY to or from a file
Hint: Anyone can COPY to stdout or from stdin. psql's \copy command also works for anyone.
1 statement failed.
The bulk insert command is too time-consuming:
insert into sales.sales_ticket values (1,'2',3,'4','5',6,7,8,'9','10','11');
Please suggest. Thank you.
From the PostgreSQL documentation on COPY:
COPY naming a file or command is only allowed to database superusers, since it allows reading or writing any file that the server has privileges to access.
and
Files named in a COPY command are read or written directly by the server, not by the client application. Therefore, they must reside on or be accessible to the database server machine, not the client. They must be accessible to and readable or writable by the PostgreSQL user (the user ID the server runs as), not the client. Similarly, the command specified with PROGRAM is executed directly by the server, not by the client application, must be executable by the PostgreSQL user. COPY naming a file or command is only allowed to database superusers, since it allows reading or writing any file that the server has privileges to access.
You're trying to use the COPY command in a way that violates two of these requirements:
You're trying to execute the COPY command from a non-superuser account.
You're trying to read a file on your client machine, and have it copied to the server.
This won't work. If you need to perform such a COPY, you need to:
Copy the CSV file to the server; to a directory that can be read by the (system) user executing the PostgreSQL server process.
Execute the COPY command from a superuser account.
Alternative
If you can't do either of these, you can always use a tool such as pgAdmin 4 and its Import/Export functionality.
See also How to import CSV file data into a PostgreSQL table?
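Yours is an ideal case for psql's \copy rather than COPY. \copy reads the file on the client and sends it to the server over the connection, so it does not need superuser rights. It is a psql meta-command, so the whole command must be written on one line:
\copy sales.sales_tickets FROM 'C:/Users/Nandini/Downloads/AIG_Sales_Tickets.csv' DELIMITER ',' CSV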
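The hint in the error message ("Anyone can COPY to stdout or from stdin") can also be used from a script when psql is not at hand. Below is a minimal sketch, assuming psycopg2 is installed: conn is expected to be a psycopg2 connection, and the function name copy_csv_from_client is made up for illustration. It relies on psycopg2's cursor.copy_expert, which streams a client-side file object to the server.

```python
def copy_csv_from_client(conn, table, csv_path):
    """Stream a client-side CSV file into `table` using COPY ... FROM STDIN.

    `conn` is assumed to be a psycopg2 connection. `table` is interpolated
    into the SQL as-is, so it must come from trusted input.
    """
    sql = f"COPY {table} FROM STDIN WITH (FORMAT csv, DELIMITER ',')"
    with open(csv_path, "r", encoding="utf-8", newline="") as src:
        with conn.cursor() as cur:
            # The file is read on the client and sent over the connection,
            # so neither superuser rights nor server-side file access is needed.
            cur.copy_expert(sql, src)
    conn.commit()


# Hypothetical usage (connection string is a placeholder):
#   import psycopg2
#   conn = psycopg2.connect("dbname=mydb user=me")
#   copy_csv_from_client(conn, "sales.sales_tickets",
#                        "C:/Users/Nandini/Downloads/AIG_Sales_Tickets.csv")
```

If the CSV file has a header row, adding HEADER to the option list makes COPY skip it.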
I do the following from a hive table myTable.
INSERT OVERWRITE LOCAL DIRECTORY '/myDir/out' SELECT concat_ws('',NAME,PRODUCT,PRC,field1,field2,field3,field4,field5) FROM myTable;
So, this command generates 2 files 000000_0 and 000001_0 inside the folder out/.
But, I need the contents as a single file. What should I do?
There are multiple files in the directory because every reducer writes one file. If you really need the contents as a single file, run your MapReduce job with only 1 reducer, which will write a single file.
However, depending on your data size, running a single reducer might not be a good approach.
Edit: Instead of forcing Hive to run 1 reduce task and output a single file, it would be better to use hadoop fs operations to merge the outputs into a single file.
For example
hadoop fs -text /myDir/out/* | hadoop fs -put - /myDir/out.txt
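Because INSERT OVERWRITE LOCAL DIRECTORY writes its part files to the local file system, they can also be concatenated locally after the query finishes. A rough sketch follows; the function name merge_part_files is hypothetical, and skipping names that start with "." or "_" is an assumption meant to exclude checksum and marker files.

```python
import os
import shutil


def merge_part_files(out_dir, merged_path):
    """Concatenate Hive's part files (000000_0, 000001_0, ...) in name order.

    `merged_path` should live outside `out_dir` so that a previous merge
    result is not picked up as a part file. Returns the merged file names.
    """
    names = sorted(
        name
        for name in os.listdir(out_dir)
        if not name.startswith((".", "_"))
        and os.path.isfile(os.path.join(out_dir, name))
    )
    with open(merged_path, "wb") as merged:
        for name in names:
            with open(os.path.join(out_dir, name), "rb") as part:
                # Copy in chunks so large part files are not loaded into memory.
                shutil.copyfileobj(part, merged)
    return names
```

This keeps the query itself parallel and moves the single-file requirement into a cheap post-processing step.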
A bit late to the game, but I found that adding LIMIT large_number, where large_number is bigger than the number of rows returned by your query, does the trick. It forces Hive to use at least one reducer. For example:
set mapred.reduce.tasks=1; INSERT OVERWRITE LOCAL DIRECTORY '/myDir/out' SELECT * FROM table_name LIMIT 1000000000
Worked flawlessly.
CLUSTER BY will do the job.