Importing a text file using pgAdmin - sql

I have just downloaded pgAdmin 1.14.3 in an effort to import, query, and manage large text files. These text files are either quote-comma-quote delimited or tab delimited (they arrive as quote-comma-quote, and I edited many of them for use with other software). Version 1.16 adds an import function, but it has not been released yet, so I am wondering how to import data into a newly created table using pgAdmin.
The text files range from 12 MB to 2 GB, so I'm looking for a comprehensive solution that does not involve importing row by row. I tried this with phpPgAdmin but ran into the file size limits set in the php.ini file (separate post), and I am trying this as a possible workaround. I'm a little new to SQL, so I'm not really sure of all the commands at my fingertips. Any help is appreciated - thanks!

You can issue a COPY statement, like this:
COPY table_name (column_name)
FROM 'd:\test.sql';
Query returned successfully: 6 rows affected, 31 ms execution time.
See the documentation here:
http://www.postgresql.org/docs/9.1/static/sql-copy.html
Note that I did not test this in pgAdmin with large files, but using psql I have never seen a case where the file was too big for COPY.
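For quote-comma-quote or tab-delimited files, COPY's CSV mode with an explicit delimiter is usually the relevant variant. The following is a minimal sketch that only builds the statement text; the helper name build_copy_statement, the table my_table, its columns and the file path are all hypothetical. Keep in mind that COPY with a file name reads the file on the database server, so the path must be visible to the PostgreSQL server process.

```python
def build_copy_statement(table, columns, path, delimiter=",", header=True):
    """Return a COPY ... FROM statement for a delimited text file.

    Hypothetical helper for illustration only: it does no identifier
    escaping, so use it with trusted table and column names.
    """
    column_list = ", ".join(columns)
    options = ["FORMAT csv", "HEADER true" if header else "HEADER false"]
    if delimiter == "\t":
        # E'\t' is PostgreSQL's escape-string syntax for a tab character.
        options.append("DELIMITER E'\\t'")
    else:
        options.append("DELIMITER '%s'" % delimiter)
    # Single quotes inside the path must be doubled in an SQL string literal.
    quoted_path = path.replace("'", "''")
    return "COPY %s (%s)\nFROM '%s'\nWITH (%s);" % (
        table, column_list, quoted_path, ", ".join(options))


# Example with made-up names; paste the printed statement into the query tool.
print(build_copy_statement("my_table", ["id", "name"], "d:/data/big_file.txt"))
```

In CSV mode the default quote character is the double quote, which is what a quote-comma-quote file needs; for the tab-delimited copies, pass delimiter="\t".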

Related

Import MySQL dump into R (without requiring MySQL server)

Packages like RMySQL and sqldf allow one to interface with local or remote database servers. I'm creating a portable project which involves importing sql data in cases (or on devices) which do not always have access to a running server, but which do always have access to the latest .sql dump of the database.
The goal seems simple enough: import an .sql dump into R without the involvement of a MySQL server. More specifically, I'd like to create a list of lists in which the elements correspond to any databases defined in the .sql dump (there may be multiple), and those elements in turn consist of the tables in those databases.
To make this reproducible, let's take the sample sportsdb SQL file here — if you unzip it it's called sportsdb_sample_mysql_20080303.sql.
One would think sqldf might be able to do it:
read.csv.sql('sportsdb_sample_mysql_20080303.sql', sql="SELECT * FROM addresses")
Error in sqliteSendQuery(con, statement, bind.data) :
error in statement: no such table: addresses
This happens even though there certainly is a table addresses in the dump. This post on the sqldf list mentions the same error, but no solution.
Then there is an sql.reader function in the package ProjectTemplate, which looks promising. Poking around, the source for the function can be found here, and it assumes a running database server and relies on RMySQL — not what I need.
So... we seem to be running out of options. Any help from the hivemind appreciated!
(To reiterate, I am not looking for a solution that relies on access to an SQL server; that's easy with dbReadTable from the RMySQL package. I would very much like to bypass the server and get the data straight from the .sql dump file.)
Depending on what you want to extract from the table, here is how you can play around with the data:
numLines <- R.utils::countLines("sportsdb_sample_mysql_20080303.sql")
# [1] 81266
linesInDB <- readLines("sportsdb_sample_mysql_20080303.sql",n=60)
Then you can use some regexes to get the table names (after CREATE TABLE), the column names (between the first pair of brackets) and the VALUES (lines after CREATE TABLE, between the second pair of brackets).
Reference:
Reverse engineering a mysqldump output with MySQL Workbench gives "statement starting from pointed line contains non UTF8 characters" error
EDIT: In response to the OP's answer: if I interpret the Python script correctly, it also reads the dump line by line, filters for INSERT INTO lines, parses them as CSV, then writes to file. This is very similar to my original suggestion. My version in R is below. If the file is too large to read at once, it would be better to read it in chunks using some other R package.
options(stringsAsFactors=F)
library(utils)
library(stringi)
library(plyr)
mysqldumpfile <- "sportsdb_sample_mysql_20080303.sql"
allLines <- readLines(mysqldumpfile)
insertLines <- allLines[which(stri_detect_fixed(allLines, "INSERT INTO"))]
allwords <- data.frame(stri_extract_all_words(insertLines, " "))
d_ply(allwords, .(X3), function(x) {
  # x <- split(allwords, allwords$X3)[["baseball_offensive_stats"]]
  print(x[1,3])
  # find where the header/data columns start and end
  valuesCol <- which(x[1,]=="VALUES")
  lastCols <- which(apply(x, 2, function(y) all(is.na(y))))
  datLastCol <- head(c(lastCols, ncol(x)+1), 1) - 1
  # format and prepare for write to file
  df <- data.frame(x[,(valuesCol+1):datLastCol])
  df <- setNames(df, x[1,4:(valuesCol-1)])
  # type convert each column before writing to file, otherwise it's all strings
  # (lapply keeps per-column types; apply would coerce everything to one matrix type)
  df[] <- lapply(df, type.convert, as.is=TRUE)
  # write to file
  write.csv(df, paste0(x[1,3],".csv"), row.names=F)
})
I don't think you will find a way to import a sql dump (which contains multiple tables with references) and then perform arbitrary sql queries on them within R. This would basically require the R package to run a complete database server (compatible with the one creating the dump) within R.
I would suggest exporting the tables/select statements you need as CSV from your database (see here). If you can only work from the dump and don't want to set up a server for the conversion, you could use some simple regular expressions to turn the INSERT statements in your dump into a bunch of CSV files, one per table, using a tool of your choosing like sed or awk (or even R, as suggested by the other answer, though that might be rather slow for this file size).
I'll reluctantly answer my own question, using the input from +bnord and +chinsoon12 (who both contributed pieces of the puzzle).
Short answer: there is no out-of-the-box solution. As +bnord notes, it would be preferable to fix it server-side (e.g., by exporting to CSV format with mysqldump). However, as my question indicated, I'm looking for a solution that lets me work with the SQL dump directly, bypassing the server.
So if we have to work with the dump, how? The hardcore, manual way is to use regular expressions to convert INSERT statements to CSV, either (1) outside R using sed and awk on the .sql text file (+bnord), or (2) inside R with grep and gsub on strings loaded with readLines (+chinsoon12).
Some good soul wrote a python script that can convert sql dumps to CSV. This requires yet another piece of (potentially non-trivial to install/maintain) software, so it's not the answer I was hoping for, but it does look like a good model in case anyone wants to reinvent the wheel in R.
For now I'll stick with my modus operandi of (on Windows) running MySQL Community Server and using WorkBench to import the dump, then talk to the local server from R. A very indirect method that is a pain in the ass because of the inscrutable access rights system of MySQL (especially annoying since it's all just there in an ASCII text file), but the only way for now, it seems. Thanks all for your input!
(If a better, more complete answer comes along I'll gladly accept that, turning this into a comment if possible.)
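For anyone who wants to see the shape of the INSERT-to-rows approach described above, here is a minimal sketch in Python. It is not the script linked in the answer. The function names are made up for illustration, and it assumes each INSERT statement sits on a single line, strings are single-quoted with backslash escapes (the usual mysqldump output), and all values can stay as strings.

```python
import re

# Matches: INSERT INTO `table` [(`col1`, `col2`)] VALUES (...),(...);
INSERT_RE = re.compile(
    r"^INSERT INTO `?(\w+)`?\s*(?:\(([^)]*)\))?\s*VALUES\s*(.*);\s*$")
ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "0": "\0"}


def parse_values(text):
    """Split "(1,'a'),(2,NULL)" into rows; unquoted NULL becomes None."""
    rows, row, field = [], None, []
    in_quote = was_quoted = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quote:
            if ch == "\\" and i + 1 < len(text):
                # Backslash escape: translate \n, \t, ...; keep others literally.
                field.append(ESCAPES.get(text[i + 1], text[i + 1]))
                i += 1
            elif ch == "'":
                in_quote = False
            else:
                field.append(ch)
        elif ch == "'":
            in_quote = was_quoted = True
        elif ch == "(" and row is None:
            row = []  # start of a new tuple
        elif ch in ",)" and row is not None:
            raw = "".join(field)
            if not was_quoted:
                raw = raw.strip()
                raw = None if raw == "NULL" else raw
            row.append(raw)
            field, was_quoted = [], False
            if ch == ")":
                rows.append(row)
                row = None
        elif row is not None:
            field.append(ch)
        i += 1
    return rows


def read_dump_tables(lines):
    """Group the rows of all INSERT statements by table name."""
    tables = {}
    for line in lines:
        match = INSERT_RE.match(line.strip())
        if not match:
            continue  # CREATE TABLE, comments, SET statements, ...
        name, columns, values = match.groups()
        table = tables.setdefault(name, {"columns": None, "rows": []})
        if columns:
            table["columns"] = [c.strip(" `") for c in columns.split(",")]
        table["rows"].extend(parse_values(values))
    return tables


sample = [
    "CREATE TABLE `addresses` (`id` int, `city` varchar(50));",
    "INSERT INTO `addresses` (`id`, `city`) VALUES (1,'Oslo'),(2,'O\\'Hare, IL'),(3,NULL);",
]
tables = read_dump_tables(sample)
print(tables["addresses"]["columns"])
print(tables["addresses"]["rows"])
```

Each table's rows could then be written out with the csv module and read into R with read.csv. Passing an open file object as lines avoids loading the whole dump into memory.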

What is the best way to import data using insert statements into a table in MS SQL?

I have exported a table from another db into an .sql file as insert statements.
The exported file has around 350k lines in it.
When I try to simply run them, I get a "not enough memory" error before the execution even starts.
How can I import this file easily?
Thanks in advance,
Orkun
You have to manually split the SQL file into smaller pieces. Use Notepad++ or some other editor capable of handling huge files.
Also, since you wrote that you have ONE table, you could try a utility or editor that can automatically split the file into pieces of a predefined size.
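If no suitable splitting utility is at hand, the split is easy to script. A minimal sketch, assuming every INSERT statement occupies exactly one line (so no statement is cut in half) and that the file is UTF-8 encoded; the function name and the output naming scheme are made up for illustration:

```python
import os


def split_sql_file(path, lines_per_chunk=10000, out_dir=None):
    """Write consecutive chunks of `path` to numbered files; return their paths."""
    out_dir = out_dir or os.path.dirname(os.path.abspath(path))
    base, ext = os.path.splitext(os.path.basename(path))
    chunk_paths = []
    out = None
    # Assumption: UTF-8 input. Exports from some tools are UTF-16 instead,
    # in which case the encoding argument has to be changed.
    with open(path, "r", encoding="utf-8") as src:
        for line_number, line in enumerate(src):
            if line_number % lines_per_chunk == 0:
                # Start a new chunk file every `lines_per_chunk` lines.
                if out:
                    out.close()
                chunk_name = "%s_part%03d%s" % (base, len(chunk_paths) + 1, ext)
                chunk_path = os.path.join(out_dir, chunk_name)
                out = open(chunk_path, "w", encoding="utf-8")
                chunk_paths.append(chunk_path)
            out.write(line)
    if out:
        out.close()
    return chunk_paths
```

Calling split_sql_file("export.sql", lines_per_chunk=10000) on a 350k-line file would produce 35 files named export_part001.sql and so on, which can then be run one after another.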
Use the SQLCMD utility (see the Microsoft documentation). You just need to give it a few parameters, one of which is the input file path (the -i option), so there is no need to go through the pain of splitting and other jugglery.

Generate DDL SQL create table statement after scanning CSV file

Are there any command line tools (Linux, Mac, and/or Windows) that I could use to scan a delimited file and output a DDL create table statement with the data type determined for me?
Did some googling, but couldn't find anything. Was wondering if others might know, thanks!
DDL-generator can do this. It can generate DDL for YAML, JSON, CSV, Pickle and HTML input (although I don't know how the last one works). I just tried it on some data exported from Salesforce and it worked pretty well. Note that you need to use it with Python 3; I could not get it to work with Python 2.7.
You can also try https://github.com/mshanu/idli. It takes a CSV file as input and generates a CREATE TABLE statement with appropriate types. It can generate DDL for MySQL, Oracle and Postgres. I am actively working on this and am happy to receive feedback for future improvement.
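To give a feel for what such tools do under the hood, here is a minimal sketch of type inference over a CSV file. It is not how DDL-generator or idli are implemented. The function names and the three-way INTEGER / DOUBLE PRECISION / VARCHAR choice are assumptions made for illustration, and it reads the whole input into memory.

```python
import csv
import io


def infer_type(values):
    """Pick the narrowest SQL type that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "TEXT"
    for sql_type, cast in (("INTEGER", int), ("DOUBLE PRECISION", float)):
        try:
            for v in non_empty:
                cast(v)
            return sql_type
        except ValueError:
            continue  # at least one value does not parse; try the next type
    return "VARCHAR(%d)" % max(len(v) for v in non_empty)


def create_table_ddl(table, csv_text, delimiter=","):
    """Build a CREATE TABLE statement from CSV text with a header row."""
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    columns = []
    for index, name in enumerate(header):
        column_values = [row[index] for row in data if index < len(row)]
        columns.append("    %s %s" % (name, infer_type(column_values)))
    return "CREATE TABLE %s (\n%s\n);" % (table, ",\n".join(columns))


print(create_table_ddl("fruit", "id,name,price\n1,apple,0.5\n2,kiwi,\n"))
```

A real tool would also need to handle dates, quoting of column names, and sampling for very large files, which this sketch leaves out.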

Import part of MySQL dump (not all of it)

I'm going to do some stress tests and right now I have a really really huge MySQL dump file in hand that could be used as the benchmark.
There's only one table inside the dump.
What's awkward is that my server doesn't have enough disk space to hold this table, so I would like to import just some random part of the dump, not all of it.
Is it possible? If yes, what does the command line look like?
I have created a shell script for this. If you are on a Unix-based system, use
https://github.com/JoyceBabu/MySQL-Dump-Table-Extractor
Invoke the script using ./extract_table.sh sqlfile.sql
To extract a single table type the table name
To extract all tables from table1 to table2 type table1 table2
To view a list of all available tables type LIST
MySQL dump files are simply text files full of SQL statements. Write a simple program to read the dump file and write random parts of it to a new dump file.
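Such a program can be very short. A minimal sketch, assuming each INSERT statement is on its own line; the function name and the fraction parameter are made up for illustration. Everything that is not an INSERT (CREATE TABLE, SET statements, comments) is passed through so the result is still a loadable dump.

```python
import random


def sample_dump(lines, fraction, seed=None):
    """Yield all non-INSERT lines plus a random share of the INSERT lines."""
    rng = random.Random(seed)
    for line in lines:
        if line.startswith("INSERT INTO"):
            # Keep each INSERT statement with probability `fraction`.
            if rng.random() < fraction:
                yield line
        else:
            yield line


# Usage with hypothetical file names, keeping roughly 10% of the statements:
#
#   with open("huge_dump.sql") as src, open("small_dump.sql", "w") as dst:
#       dst.writelines(sample_dump(src, 0.10))
```

Note that with mysqldump's default extended inserts a single statement carries many rows, so the sampling is per statement, not per row. If the dump can be regenerated, the --skip-extended-insert option writes one row per INSERT, which makes the sample finer grained.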
Couldn't you just manually split the file? These are just flat text files, so open it up in your favorite text editor and delete half of the file (or however much you want).

Importing an .RPT (6 gigs) file into SQL Server 2005

I'm trying to import two separate .RPT files into SQL Server; one is small, one is large. With both, the import has trouble determining where the columns are separated.
My solution for this was to import the file into Access, define the columns and then save it as a txt file.
This worked perfectly.
The problem, however, is that the larger file is 6 GB and MS Access won't allow me to open it. When I change the extension to .txt and import it into SQL Server, everything ends up in one column (despite there being 10) and there is no way to accurately separate the data.
Please help!
As Tony stated, Access has a hard 2GB limit on database size.
You don't say what kind of file the .RPT file is. If it is a text file, then you could break it into smaller chunks by reading it line by line and appending it into temporary files. Then import/export these smaller files one at a time.
Keep in mind the 2GB limit is on the Access database, so your temporary text files will need to be somewhat smaller because the import will likely introduce some additional overhead. Also, you may need to compact/repair the database in between import/export cycles to reclaim space in the database; simply deleting the records is not enough.
If the file has column delimiters or fixed column widths, you can try the following in SQL Server Management Studio:
Right click on a database, select "Tasks" and then "Import data...". This will take you through a wizard where you can define the source columns and map them to an existing or new table.
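If the wizard cannot work out the column boundaries, a fixed-width file can also be converted to a delimited one with a short script before importing. The sketch below assumes a layout that the question does not confirm: a header line, then a ruler line of dashes whose runs mark the column widths, then the data rows, then a blank line before any footer. The function names are made up for illustration.

```python
import re


def column_spans(ruler_line):
    """Return (start, end) offsets for each run of dashes in the ruler line."""
    return [m.span() for m in re.finditer(r"-+", ruler_line)]


def split_fixed_width(line, spans):
    """Cut one line at the given offsets and trim the padding."""
    return [line[start:end].strip() for start, end in spans]


def rpt_to_rows(lines):
    """Yield lists of fields from lines laid out as header, ruler, data."""
    lines = iter(lines)
    header = next(lines).rstrip("\r\n")
    spans = column_spans(next(lines))
    yield split_fixed_width(header, spans)
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            break  # assumed: a blank line ends the data (e.g. before a footer)
        yield split_fixed_width(line, spans)


sample = [
    "id  name      city",
    "--- --------- ------",
    "1   Alice     Oslo",
    "22  Bob       Bergen",
    "",
    "(2 rows affected)",
]
for row in rpt_to_rows(sample):
    print("\t".join(row))
```

Because rpt_to_rows reads one line at a time, it can be fed an open file object and the rows written straight to a tab-delimited file with the csv module, so the 6 GB file never has to fit in memory.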