I am new to using SQL, so please bear with me.
I need to import several hundred csv files into PostgreSQL. My web search has only turned up how to import many csv files into one table. However, most of the csv files have different column types (all have one-line headers). Is it possible to somehow run a loop and have each csv imported into a table with the same name as the csv? Creating each table manually and specifying columns is not an option. I know that COPY will not work, as the table needs to already be specified.
Perhaps this is not feasible in PostgreSQL? I would like to accomplish this in pgAdmin III or the psql console, but I am open to other ideas (using something like R to change the csv into a format more easily entered into PostgreSQL?).
I am using PostgreSQL on a Windows 7 computer. It was requested that I use PostgreSQL, thus the focus of the question.
The desired result is a database full of tables that I will then join with a spreadsheet that includes specific site data. Thanks!
Use pgfutter.
The general syntax looks like this:
pgfutter csv <file.csv>
In order to run this on all csv files in a directory from Windows Command Prompt, navigate to the desired directory and enter:
for %f in (*.csv) do pgfutter csv %f
Note that the directory containing the downloaded program must be added to the PATH environment variable.
EDIT:
Here is the command line code for Linux users
Run it as
pgfutter *.csv
Or if that won't do
find -iname '*.csv' -exec pgfutter csv {} \;
In the terminal, use nano to create a script that loops through the csv files under a directory and loads them into the Postgres DB:
nano run_pgfutter.sh
The content of run_pgfutter.sh:
#!/bin/bash
# Import every csv file under /mypath into Postgres, one table per file.
for i in /mypath/*.csv
do
    ./pgfutter csv "${i}"
done
Then make the file executable:
chmod u+x run_pgfutter.sh
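Then run it and spot-check the result. A minimal sketch, assuming psql can reach the target database with your default credentials (mydb is a placeholder database name):
./run_pgfutter.sh
# List tables across all schemas, since pgfutter may create its tables in a schema other than public.
psql -d mydb -c '\dt *.*'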
This question may have been asked before, and I am relatively new to Hadoop and Hive. So I'm trying to export content, as a test, to see if I am doing things correctly. The code is below.
USE MY_DATABASE_NAME;
INSERT OVERWRITE LOCAL DIRECTORY '/random/directory/test'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY "\n"
SELECT date_ts,script_tx,sequence_id FROM dir_test WHERE date_ts BETWEEN '2018-01-01' and '2018-01-02';
That is what I have so far, but it generates multiple files, and I want to combine them into a single .csv or .xls file to work on. My question is: what do I do next to accomplish this?
Thanks in advance.
You can achieve this in the following ways:
Force the query to use a single reducer, e.g. by adding ORDER BY <col_name>
Store the result to HDFS and then merge it locally with hdfs dfs -getmerge [-nl] <src> <localdst> (see the sketch after this list)
Use beeline: beeline --outputformat=csv2 -f query_file.sql > <file_name>.csv
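For example, the HDFS-plus-getmerge route might look like the sketch below. The HDFS path /tmp/dir_test_export and the local output file are placeholder names; the query mirrors the one in the question:
# Write the query result to an HDFS directory as comma-delimited text.
hive -e "
USE MY_DATABASE_NAME;
INSERT OVERWRITE DIRECTORY '/tmp/dir_test_export'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT date_ts, script_tx, sequence_id FROM dir_test
WHERE date_ts BETWEEN '2018-01-01' AND '2018-01-02';
"
# Merge every part file under that HDFS directory into a single local CSV.
hdfs dfs -getmerge /tmp/dir_test_export /random/directory/test/combined.csv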
I am currently working on a project and I want to know how to save an SQLite database in Rails as a csv file. I want it so that when you click a button, the current database on the system downloads as a csv. Can anybody help me? Thanks!
Your problem isn't really specific to Rails. Instead, you're mostly dealing with an administrative issue. You should write a script to export your database as csv, something like this:
#!/bin/bash
# Export one table from the SQLite database to a csv file, including a header row.
./bin/sqlite3 ./my_app/db/my_database.db <<!
.headers on
.mode csv
.output my_output_file.csv
select * from my_table;
!
This script exports a single table. If you have additional tables, you'll want to add them to your script, for example by looping over the table names as sketched below.
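A minimal sketch of that extension, with hypothetical table names users and posts (substitute your own):
#!/bin/bash
# Export each listed table to its own csv file, header row included.
for table in users posts; do
  ./bin/sqlite3 ./my_app/db/my_database.db <<EOF
.headers on
.mode csv
.output ${table}.csv
select * from ${table};
EOF
done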
The only Rails related issue is the matter of calling that script. Save the script within your application structure; I'd suggest my_app/assets or some similar location.
Now you can run that script using system(command) where command is the absolute path for your script, within a set of double-quotes.
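Before wiring it into the app, it's worth running the script once by hand to confirm the output. A quick check, assuming the script was saved as my_app/assets/export_csv.sh (a hypothetical name):
# Make the script executable, run it, and peek at the first few rows of the export.
chmod u+x my_app/assets/export_csv.sh
my_app/assets/export_csv.sh
head -n 5 my_output_file.csv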
Is there a way in SQL to import ALL the csv files contained in a directory into my Postgres table?
Thank you for your help.
This isn't PostgreSQL specific at all. You need to use a loop to invoke psql once per file, generate a master file that contains \include commands for each input file, or concatenate all the input files and then run them.
All of these tasks are done with the operating system command prompt tools, they're not to do with PostgreSQL.
To concatenate the files and run the result:
type *.sql | psql
To loop over them using cmd.exe's amazingly, incredibly ugly for loop syntax (untested, see How to loop through files matching wildcard in batch file and Iterate all files in a directory using a 'for' loop for details):
FOR %%f IN (*.sql) DO (
    echo %%f
    psql -f "%%f"
)
I'm sure the 3rd way is possible too. This might work (the equivalent works in a real shell on a unix box) but is untested as I don't have Windows conveniently to hand:
FOR %%f IN (*.sql) DO (
    echo \include %%f
) | psql
As far as I know none of these guarantee that the files are run in any particular order. You'd need to look up the documentation on the Windows shell.
Personally - I recommend doing this sort of thing in Powershell, or in Perl or some other less awful scripting language than the cmd.exe batch language.
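For what it's worth, the equivalent loop in a unix-style shell (bash, or Git Bash on Windows) is straightforward. A minimal sketch, with mydb as a placeholder database name:
#!/bin/bash
# Run every .sql file in the current directory against the database, in shell glob order.
set -euo pipefail
for f in *.sql; do
    echo "Running ${f}"
    psql -d mydb -f "${f}"
done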
Either way, all the files must have the system default 1-byte text encoding or must contain an explicit SET client_encoding= ... statement.
I have created a table in Hive called "sample" and loaded a csv file "sample.txt" into it.
Now I need to get that data from "sample" into my local /opt/zxy/sample.txt.
How can I do that?
Hortonworks' Sandbox lets you do it through its HCatalog menu. Otherwise, the syntax is
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/c' SELECT a.* FROM b a
as per the Hive language manual.
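As a quick sketch, the same statement can be run non-interactively from the shell and the output directory inspected afterwards; '/tmp/c' and the b/a names mirror the snippet above, so substitute your own table:
# Run the export from the command line, then look at what Hive wrote.
hive -e "INSERT OVERWRITE LOCAL DIRECTORY '/tmp/c' SELECT a.* FROM b a"
ls -l /tmp/c   # expect one or more part files such as 000000_0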
Since your intention is just to copy the entire file from HDFS to your local FS, I would not suggest doing it through a Hive query, for the following reasons:
It'll start a MapReduce job, which will take more time than a normal copy.
It'll create files with different names (000000_0, 000001_0, and so on), which will require you to rename them manually afterwards.
You might face problems opening these files, as they have no extension. Your OS would be unable to choose an application to open them on its own. In such a case you either have to rename the file or manually select an application to open it.
To avoid these problems you could use the HDFS get command:
bin/hadoop fs -get /user/hive/warehouse/sample/sample.txt /opt/zxy/sample.txt
Simple and easy. But if you need to copy only some selected data, then you have to use a Hive query.
HTH
I usually run my query directly through Hive on the command line for this kind of thing, and pipe it into the local file like so:
hive -e 'select * from sample' > /opt/zxy/sample.txt
Hope that helps.
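One caveat: the hive CLI separates columns with tab characters, not commas. If a comma-separated file is specifically needed, a common (if crude) workaround is to translate the tabs on the way out; this is only safe when the data itself contains no tabs or commas:
# Export the table and convert Hive's tab delimiters to commas.
hive -e 'select * from sample' | tr '\t' ',' > /opt/zxy/sample.csv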
Readers who are accessing Hive from Windows OS can check out this script on Github.
It's a Python+paramiko script that extracts Hive data to local Windows OS file-system.
I have a table with more than 3,000,000 rows. I have tried to export the data from it manually and with the SQL Server Management Studio Export Data functionality to Excel, but I have run into several problems:
when creating a .txt file manually by copying and pasting the data (in several passes, because copying all rows at once from SQL Server Management Studio throws an out-of-memory error), I am not able to open it with any text editor or copy the rows;
the Export Data to Excel option does not work, because Excel does not support that many rows.
Finally, with the Export data functionality I have created a .sql file, but it is 1.5 GB, and I am not able to open it in SQL Server Management Studio again.
Is there a way to import it with the Import Data functionality, or some other, cleverer way to make a backup of my table's data and then import it again if I need it?
Thanks in advance.
I am not quite sure that I understand your requirements (I don't know whether you need to export your data to Excel or just want to make some kind of backup).
To export data from single tables, you could use the Bulk Copy Program (bcp), which lets you export data from a table to a file and import it back again. You can also use a custom query to export the data.
Note that this does not generate an Excel file, but its own format. You can use it to move data from one database to another (both must be MS SQL).
Examples:
Create a format file:
bcp [TABLE_TO_EXPORT] format nul -n -f "[FORMAT_FILE]" -S [SERVER] -E -T -a 65535
Export all Data from a table:
bcp [TABLE_TO_EXPORT] out "[EXPORT_FILE]" -f "[FORMAT_FILE]" -S [SERVER] -E -T -a 65535
Import the previously exported data:
bcp [TABLE_TO_EXPORT] in "[EXPORT_FILE]" -f "[FORMAT_FILE]" -S [SERVER] -E -T -a 65535
I redirect the output from the export/import operations to a logfile (by appending "> mylogfile.log" at the end of the commands) - this helps if you are exporting a lot of data.
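For example, the export and import steps with logging added might look like this (the bracketed names are placeholders, as above):
bcp [TABLE_TO_EXPORT] out "[EXPORT_FILE]" -f "[FORMAT_FILE]" -S [SERVER] -E -T -a 65535 > export.log 2>&1
bcp [TABLE_TO_EXPORT] in "[EXPORT_FILE]" -f "[FORMAT_FILE]" -S [SERVER] -E -T -a 65535 > import.log 2>&1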
Here is a way of doing it without bcp:
EXPORT THE SCHEMA AND DATA IN A FILE
Use the SSMS wizard:
Database >> Tasks >> Generate Scripts… >> choose the table >> choose to script both schema and data
Save the SQL file (it can be huge)
Transfer the SQL file to the other server
SPLIT THE DATA IN SEVERAL FILES
Use a program like textfilesplitter to split the file into smaller files of 10,000 lines each, so no single file is too big (see the sketch after this list)
Put all the files in the same folder, with nothing else
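If you don't have a dedicated splitter to hand, the standard split utility does the same job. A minimal sketch, assuming a unix-like shell is available (for example Git Bash on Windows) and big_export.sql is a placeholder name for the scripted file:
# Split the large script into 10,000-line chunks named chunk_aa.sql, chunk_ab.sql, ...
split -l 10000 --additional-suffix=.sql big_export.sql chunk_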
IMPORT THE DATA IN THE SECOND SERVER
Create a .bat file in the same folder, named execFiles.bat
You may need to check the table schema and disable the identity in the first file; you can add it back after the import is finished.
This will execute all the files in the folder against the server and database; the -f 65001 option tells sqlcmd to use the UTF-8 code page so that accented characters are handled correctly:
for %%G in (*.sql) do sqlcmd /S ServerName /d DatabaseName -E -i"%%G" -f 65001
pause