Import text data into a Greenplum database - sql

I have a text file with some data and want to import it into a Greenplum database. After some online research, I found that it's better to use the COPY command if your data size is small, so I decided to use it.
Here is the scenario:
I have placed my text file at the location /bin/bash /data. I can access this file using the terminal, but when I run the following COPY SQL script against the Greenplum database it says:
could not open file "/bin/bash /data/data.txt" for reading: No such file or directory
Below is my SQL script:
COPY userdata(customerid,time,trans,quantity) from '/bin/bash /data/data.txt' WITH DELIMITER ',';
From the Greenplum database documentation I found the following line:
The COPY source file must be accessible to the master host. Specify the COPY source file name relative to the master host location.
But I do not know how to make it accessible to the master host and relative to the master host location.

The path to your file doesn't make any sense.
/bin/bash /data/data.txt is certainly not a valid path.
If your data.txt file is located in the /data folder, with content
in the following format:
12345,5:32AM,air,2
67890,6:42PM,rail,4
You could use the command below:
COPY userdata(customerid,time,trans,quantity) FROM '/data/data.txt' WITH DELIMITER AS ',';
Also, the SQL user should have permission to read data.txt in the /data folder.
Perhaps do an ls -l and check whether the SQL user can read data.txt.
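For example, a quick check could look like this (a sketch; which OS user needs read access depends on how your Greenplum server runs, commonly gpadmin):
$ ls -l /data/data.txt        # who owns the file, and who may read it?
$ chmod o+r /data/data.txt    # grant world-read if the server user cannot read it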

Related

COPY FROM a .csv file to a remote PostgreSQL database (running on a Linux server)

I'm trying to import data from a .csv file into a PostgreSQL database hosted on a Linux server, using the following command:
COPY areas_brasil FROM 'C:/Temp/RELATORIO_DTB_BRASIL_MUNICIPIO.csv' with delimiter '|' null 'NULL';
But I'm receiving the following error:
ERROR: could not open file "C:/Temp/RELATORIO_DTB_BRASIL_MUNICIPIO.csv" for reading: No such file or directory
HINT: COPY FROM instructs the PostgreSQL server process to read a file. You may want a client-side facility such as psql's \copy.
The .csv file is on a client computer (running Windows 10), from which I have administrator access to the database hosted on the server (running Linux - Debian).
Thanks for helping me!
Welcome to SO.
COPY .. FROM 'path' assumes that the file is located on the server. If you wish to execute COPY without having the file on the database server, you can either use \copy or feed psql's STDIN from your client console, e.g. on Unix systems (you will have to find the equivalent of cat and | for Windows):
$ cat file.csv | psql yourdb -c "COPY areas_brasil FROM STDIN DELIMITER '|';"
Using \copy inside psql, it can be done like this:
\COPY areas_brasil FROM '/home/jones/file.csv' DELIMITER '|';
See this answer for more details.
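On Windows, the built-in type command plays the role of cat, so a rough equivalent of the pipe above (a sketch reusing the table and delimiter from the question) would be:
type file.csv | psql yourdb -c "COPY areas_brasil FROM STDIN DELIMITER '|';"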

PostgreSQL Query To Create A Directory

Files are being written to a directory using the COPY query:
COPY (SELECT * FROM animals) TO '/var/lib/postgresql/data/backups/2020-01-01/animals.sql' WITH CSV DELIMITER ',';
However, if the directory 2020-01-01 does not exist, we get the error:
could not open file "/var/lib/postgresql/data/backups/2020-01-01/animals.sql" for writing: No such file or directory
The PostgreSQL server is running inside a Docker container with the volume mapping /mnt/backups:/var/lib/postgresql/data/backups.
The COPY query is being sent from a Node.js app outside of the Docker container.
The mapped host directory /mnt/backups was created by Docker Compose and is owned by root, so the Node.js app sending the COPY query is unable to create the missing directories due to insufficient permissions.
The backup file is meant to be transferred out of the Docker container to the Docker host.
Question: Is it possible to use an SQL query to ask PostgreSQL 11.2 to create a directory if it does not exist? If not, how would you recommend the directory creation be done?
Using Node.js 12.14.1 on an Ubuntu 18.04 host. Using PostgreSQL 11.2 inside the container, Docker 19.03.5.
An easy way to solve this is to create the file directly on the client machine. Using STDOUT with COPY, you can redirect the query output to the client's standard output and save it in a file. For instance, using psql on the client machine:
$ psql -U your_user -d your_db -c "COPY (SELECT * FROM animals) TO STDOUT WITH CSV DELIMITER ','" > file.csv
Creating the output directory in case it does not exist:
$ mkdir -p /mnt/backups/2020-01/ && psql -U your_user -d your_db -c "COPY (SELECT * FROM animals) TO STDOUT WITH CSV DELIMITER ','" > /mnt/backups/2020-01/file.csv
On a side note: try to avoid exporting files onto the database server. Although it is possible, I consider it a bad practice. Doing so, you will either write files into the postgres system directories or give the postgres user permission to write somewhere else, and that is something you shouldn't be comfortable with. Export data directly to the client, either using COPY as I mentioned or following the advice from @Schwern. Good luck!
Postgres has its own backup and restore utilities, which are likely to be a better choice than rolling your own.
When used with one of the archive file formats and combined with pg_restore, pg_dump provides a flexible archival and transfer mechanism. pg_dump can be used to backup an entire database, then pg_restore can be used to examine the archive and/or select which parts of the database are to be restored. The most flexible output file formats are the “custom” format (-Fc) and the “directory” format (-Fd). They allow for selection and reordering of all archived items, support parallel restoration, and are compressed by default. The “directory” format is the only format that supports parallel dumps.
A simple backup rotation script might look like this:
#!/bin/sh
table='animals'
url='postgres://username@host:port/database_name'   # connection URL: user@host
date=`date -Idate`                                  # e.g. 2020-01-01
file="/path/to/your/backups/$date/$table.sql"
mkdir -p `dirname $file`                            # create the dated directory if missing
pg_dump $url -w -Fc --table=$table -f $file         # custom-format dump of one table
To avoid hard-coding the database password, -w means pg_dump will not prompt for a password and will instead look for a password file. Or you can use any of Postgres's many authentication options.
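For completeness, restoring such a custom-format archive could look like this (a sketch; the database name and backup path are illustrative):
$ pg_restore -d database_name --table=animals /path/to/your/backups/2020-01-01/animals.sql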

COPY from a valid CSV file, but Postgres fails to find it

I tried to copy CSV data into a table with:
#+begin_src sql :engine postgresql :dbuser postgres :dbpassword 1618 :database analysis
COPY us_counties_2010
FROM 'data/us_counties_2010.csv'
WITH (FORMAT CSV, HEADER);
#+end_src
It reports the error:
psql:/tmp/babel-x3dXSm/sql-in-zo3MDm:3: ERROR: could not open file "data/us_counties_2010.csv" for reading: No such file or directory
HINT: COPY FROM instructs the PostgreSQL server process to read a file. You may want a client-side facility such as psql's \copy.
The error "data/us_counties_2010.csv" for reading: No such file or directory does not exits, make no sense.
Because, it does exsit
#+BEGIN_SRC shell
ls -l 'data/us_counties_2010.csv' | sed "s/$USER/me/g"
#+END_SRC
#+RESULTS:
: -rw-rw-r-- 1 me me 1170359 Dec 7 10:22 data/us_counties_2010.csv
What's the problem? Have the Postgres developers invented yet another arcane path rule to thwart users?
Where does the file exist? You are using a relative path.
When you use "COPY", what you get is:
The path will be interpreted relative to the working directory of the server process (normally the cluster's data directory), not the client's working directory.
Using \copy rather than COPY will get you not only the client's permissions, but also the client's working directory when searching for the file.
File permissions? I see that "me" has permissions. What user does PostgreSQL run as?
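A minimal sketch of the client-side alternative, run from the directory that contains data/ (the relative path is the one from the question):
\copy us_counties_2010 FROM 'data/us_counties_2010.csv' WITH (FORMAT CSV, HEADER)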

'gunzip' is not recognized as an internal or external command, operable program or batch file. System command 'gunzip' failed

I am trying to analyse my raw GNSS data on the GNSS Analyser app from here https://github.com/google/gps-measurement-tools. The installation guide includes the following step:
4.2 gunzip installation
The automatic ftp code inside GnssAnalysis will download ephemeris zip files, and attempt to
unzip them using gunzip.
Download gzip.exe from here http://ftp.gnu.org/gnu/gzip/gzip-1.9.zip
Extract the files from the zip file, rename gzip.exe to gunzip.exe
Move gunzip.exe to somewhere in your Windows path (type path in the Windows
Command Prompt to see what your path is, typically you will find a directory
C:\Windows\system32 and you can put gunzip.exe there.)
However, the downloaded zip archive contains no gzip.exe (it holds the gzip source files, such as gzip.c and gzip.h), so I tried renaming gzip.c and gzip.h instead. That did not work, and I got this error when attempting to process my own raw data.
I have just tried this and succeeded in importing a DB from a backup file:
gzip -d < C:\Users\my-user\Downloads\my-db-backup.sql.gz | mysql -u root -p MY_DB_NAME
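Back to the original question: gunzip is simply equivalent to gzip -d, so once a gzip build is on your Windows PATH you can decompress a downloaded ephemeris file with something like (the file name is illustrative):
gzip -d brdc0070.20n.gz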

Does scp allow inline file renaming in destination?

For instance, I have tried this (notice the source is remote):
scp root@$node:/sourcepath/sourcefile.log /destinationpath/destinationfile.log
The other option is to rename the file afterwards, but it would be more convenient to do it on the fly while the data is downloaded via scp, hence my question. Thanks.
Maybe without scp:
ssh yourserver "cat >tmpfile && mv tmpfile datafile" <datafile
This command copies the local file datafile to the remote server under the name tmpfile.
Only after a successful copy does it rename tmpfile to the proper name datafile on the remote host.
If the copy fails, the remote host is left with only the temporary file.
Thus, you are protected from ending up with an incomplete datafile.
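A download-direction sketch of the same safe-rename idea, using the paths from the question:
ssh root@$node "cat /sourcepath/sourcefile.log" > /destinationpath/destinationfile.log.tmp && mv /destinationpath/destinationfile.log.tmp /destinationpath/destinationfile.log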