I am loading a CSV file using COPY:
COPY cts FROM 'C:\...\cts.csv' using DELIMITERS',';
However, this error comes out:
ERROR: invalid input syntax for type double precision: ""
CONTEXT: COPY testdata, line 7, column latitude: ""
How can I fix it?
Looks like your CSV isn't quite formatted correctly. "" (an empty string) isn't a number, and numbers don't need to be quoted in CSV.
I find it's usually easier in PostgreSQL to create a staging import table with all text columns and import CSVs into that first. Then do a cleanup query to move the CSV data into the real table.
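For example, a rough sketch of that approach using psycopg2 (the staging table and column names here are placeholders; adapt them to your schema):
import psycopg2

conn = psycopg2.connect("dbname=mydb")  # placeholder connection string
cur = conn.cursor()

# 1. Staging table: every column is text, so nothing can fail to parse on import.
cur.execute("CREATE TABLE cts_staging (name text, latitude text, longitude text)")

# 2. Load the raw CSV into the staging table.
with open(r"C:\path\to\cts.csv") as f:  # placeholder path to your CSV
    cur.copy_expert("COPY cts_staging FROM STDIN WITH (FORMAT csv)", f)

# 3. Cleanup query: turn empty strings into NULL before casting, then move the
#    rows into the real table.
cur.execute("""
    INSERT INTO cts (name, latitude, longitude)
    SELECT name,
           NULLIF(latitude, '')::double precision,
           NULLIF(longitude, '')::double precision
    FROM cts_staging
""")
conn.commit()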
I have a JSON file which is 48 MB (a collection of tweets I data mined). I need to convert the JSON file to CSV so I can import it into a SQL database and cleanse it.
I've tried every JSON-to-CSV converter, but they all come back with the same result of "file exceeds limits" / the file is too large. Is there a good method of converting such a massive JSON file to CSV within a short period of time?
Thank you!
A 48 MB JSON file is pretty small. You should be able to load the data into memory using something like this:
import json
with open('data.json') as data_file:
    data = json.load(data_file)
Depending on how you wrote the JSON file, data may be a list containing many dictionaries. Try running:
type(data)
If the type is a list, then iterate over each element and inspect it. For instance:
for row in data:
    print(type(row))
    # print(row.keys())
If row is a dict instance, then inspect the keys and, within the loop, start building up what each row of the CSV should contain. Then you can either use pandas, the csv module, or just open a file and write line by line with commas yourself.
So maybe something like:
import json
with open('data.json') as data_file:
    data = json.load(data_file)

with open('some_file.txt', 'w') as f:
    for row in data:
        user = row['username']
        text = row['tweet_text']
        created = row['timestamp']
        joined = ",".join([user, text, created])
        f.write(joined + "\n")  # write a newline after each row
You may still run into issues with Unicode characters, commas within your data, etc., but this is a general guide.
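If you do hit those, the csv module handles the quoting for you and a UTF-8 encoding takes care of the characters; a sketch along the same lines (the key names are still the ones assumed above):
import csv
import json

with open('data.json', encoding='utf-8') as data_file:
    data = json.load(data_file)

# csv.writer quotes any field that contains commas, quotes or newlines.
with open('tweets.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['username', 'tweet_text', 'timestamp'])  # header row
    for row in data:
        writer.writerow([row['username'], row['tweet_text'], row['timestamp']])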
I have a CSV file containing numbers like "1.456e+07", and I am using the copy_expert function to load the file into the database, but I am getting this error:
psycopg2.DataError: invalid input syntax for integer: "1.5637e+07"
I notice that I can insert "100" as an integer, but when I do "1.5637e+07" with quotes, it doesn't work.
I am using pandas DataFrame's to_csv to generate the CSV files. I'm not sure how to get rid of the quotes for integers like "1.5637e+07" only (I have a string column), or whether there is another solution.
I found the solution.
Normally, pandas doesn't put quotes around numbers. However, I had set the float_format parameter, which causes this. I set
quoting=csv.QUOTE_MINIMAL
in the function call and the quotes go away.
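For reference, the call ends up looking roughly like this (the data frame and float_format value are just stand-ins):
import csv
import pandas as pd

# Stand-in data frame for illustration.
df = pd.DataFrame({"value": [1.5637e+07], "label": ["some, text"]})

# QUOTE_MINIMAL quotes only the fields that need it (here the string containing
# a comma), so the numeric column is written without quotes.
df.to_csv("out.csv", index=False, float_format="%.0f", quoting=csv.QUOTE_MINIMAL)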
The culprit line is as follows. It should be composed of 14 columns, with one of the columns, starting with "Hi I'm Niger...", spanning multiple lines with line feeds.
17935,9a7105ee-30c8-4a6d-9374-10875b7d6288.jpg,"""top""=>""0"", ""left""=>""0"", ""width""=>""180"", ""height""=>""180""",,"",2015-07-26 19:33:57.292058,2015-07-26 20:25:30.068887,fe43876f-1b2c-464a-aa20-bf335ed3ff62,c68c8c70-bc2b-11e4-90a1-22000b21105f,{},2e790350-15fb-0133-2cb8-22000ba51078,"Hi I'm Nigerian so wish to study in sweden.
so I'm Undergraduate student I want study Engineering.
Thanks.","",{}
When loading this CSV data into BigQuery via the command bq load --replace --source_format=CSV -F"," ..., the following errors come up. Could anyone give me a solution for this BigQuery load command?
- File: 0 / Line:17192 / Field:12: Missing close double quote (")
character: field starts with: <Hi I'm N>
- File: 0 / Line:17193: Too few columns: expected 14 column(s) but
got 1 column(s). For additional help: http://goo.gl/RWuPQ
- File: 0 / Line:17194: Too few columns: expected 14 column(s) but
got 3 column(s). For additional help: http://goo.gl/RWuPQ
If you are loading CSV data with embedded newlines, you need to specify allowQuotedNewlines (the bq command-line tool exposes this as the --allow_quoted_newlines flag).
https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load.allowQuotedNewlines
The BigQuery default is to assume that CSV data does not contain newlines. This allows for a much higher parsing throughput when dealing with large data files since the input files can be split at arbitrary newlines. If your data contains newlines within strings, each file needs to be parsed linearly by a single machine.
Make sure you set job_config.allow_quoted_newlines = True on the load job configuration before loading data into BigQuery:
job_config = bigquery.LoadJobConfig()
job_config.allow_quoted_newlines = True
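For context, a complete load call with the Python client might look roughly like this (the bucket and table names are placeholders):
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.CSV
job_config.skip_leading_rows = 1          # if the file has a header row
job_config.allow_quoted_newlines = True   # fields may contain newlines

load_job = client.load_table_from_uri(
    "gs://my-bucket/data.csv",            # placeholder source URI
    "my_dataset.my_table",                # placeholder destination table
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish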
If you're trying to load a CSV file into a table from the BigQuery Google console, make sure you select the Advanced option -> Quoted new lines.
I have a CSV like this:
something,other,5,"The problem, is "right" here"
To import it, I write:
COPY table_name FROM '/tmp/data.csv' CSV HEADER;
Now, I think the problem is the double quotes inside the double quotes, am I right?
So, how can I fix it?
Thanks.
If you ever have doubt, you can just put the data directly into a table and then export the contents using the same copy command (only do "copy to"). Or you can put it in the spreadsheet of your choice and export to CSV. Either way, you'll see how the quotes need to be escaped: your row should look like this:
something,other,5,"The problem is ""right"" here"
I currently want to import my data from a flat file into the database.
The flat file is a .txt file; in that txt file I save a list of URLs. Example:
http://www.mimi.com/Hotels-g303188-Rurrenabaque-Hotels.html
I'm using the SQL Server Import and Export Wizard to do it, but at execution time I get an error saying:
Error 0xc02020a1:
Data Flow Task 1: Data conversion failed. The data conversion for column
"Column 0" returned status value 4 and status text "Text was truncated or one
or more characters had no match in the target code page.".
Can anyone help?
You get this error because the text is too long for the column you've chosen to put it in.
Text was truncated or
You might want to check the size of the database column vis-a-vis your input data. Is the longest URL longer than the column width? (See the sketch below for a quick way to check.)
one or more characters had no match in the target code page.".
Check if your input file has any special characters. An easy way to check this would be to save your file in ANSI (Notepad > Save As > Encoding = ANSI). Note - you'd still have to select the right code page so that the import interprets your input text correctly.
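If you'd rather check both things programmatically, here's a small sketch that reports the longest line (for sizing the column) and flags any non-ASCII characters (the file name is a placeholder):
# Scan the flat file for the longest URL and for non-ASCII bytes.
longest = 0
with open("urls.txt", "rb") as f:
    for lineno, raw in enumerate(f, start=1):
        line = raw.rstrip(b"\r\n")
        longest = max(longest, len(line))
        try:
            line.decode("ascii")
        except UnicodeDecodeError as exc:
            print(f"Line {lineno}: non-ASCII byte at position {exc.start}")

print("Longest line:", longest, "bytes")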
Here's a very nice link that has some background on what code pages are - http://www.joelonsoftware.com/articles/Unicode.html
Note that you can also change the target column data type (to text stream, for example) in the Datasource -> Advanced section.