How to load CSV data which is Control+A separated into BigQuery - google-bigquery

I'm trying to load a CSV file which is Control+A separated into BigQuery. What option should I pass for the -F parameter of the bq load command? All the options I have tried result in an error while loading.

I would guess that Control+A is used in some legacy formats that the OP wants to load into BigQuery. On the other hand, Control+A can be chosen when it is hard to pick any of the usually used delimiters.
My recommendation would be to load your CSV file without any delimiter, so the whole row is loaded as a single field.
Assume your rows, loaded into TempTable, look like below, with just one column called FullRow:
'value1^Avalue2^Avalue3'
where ^A is the "invisible" Control+A character.
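A sketch of how that one-column load could be done with the bq tool (the dataset name and bucket path here are placeholders, and it assumes your data contains no tab characters, so a tab can serve as a delimiter that never actually matches):
bq load --source_format=CSV --field_delimiter='\t' yourDataset.TempTable gs://your-bucket/your-file.csv FullRow:STRING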
So, after you have loaded your file into BigQuery, you can parse it into separate columns and write the result to a final table with something like below:
SELECT
REGEXP_EXTRACT(FullRow, r'(?:\w*\x01){0}(\w*)') AS col1,
REGEXP_EXTRACT(FullRow, r'(?:\w*\x01){1}(\w*)') AS col2,
REGEXP_EXTRACT(FullRow, r'(?:\w*\x01){2}(\w*)') AS col3
FROM TempTable
The above is confirmed to work, as I have used this approach multiple times. It works in both Legacy and Standard SQL.
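In Standard SQL you could also split on the \x01 character directly instead of using regular expressions; a sketch under the same TempTable/FullRow assumptions:
#standardSQL
SELECT
SPLIT(FullRow, '\x01')[SAFE_OFFSET(0)] AS col1,
SPLIT(FullRow, '\x01')[SAFE_OFFSET(1)] AS col2,
SPLIT(FullRow, '\x01')[SAFE_OFFSET(2)] AS col3
FROM TempTable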

Related

Issues loading CSV into BigQuery table

I'm trying to create a BigQuery table using a pretty simple CSV file I have stored in GCS.
I keep getting the same error over and over again:
Could not parse '1/1/2008' as datetime for field XXX
I've checked that the CSV file isn't corrupted, and I've managed to upload everything into one column, so the file is readable by BigQuery.
I've added the word NULL to any empty fields, thinking consecutive delimiters might be causing the problem, but I am still facing the same issue.
I know data, I understand data and CSV files.
BigQuery cannot cast '1/1/2008' as DATETIME; it would rather expect something like '2008-01-01'.
So, you can either modify your CSV file or just use STRING for that XXX field and then translate it into DATETIME in your queries - like below
#standardSQL
SELECT PARSE_DATETIME('%d/%m/%Y', '1/1/2008')
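Applied to a table where the XXX field was loaded as STRING, that could look like below (the table name is a placeholder; SAFE.PARSE_DATETIME returns NULL instead of failing on rows that don't match the format):
#standardSQL
SELECT SAFE.PARSE_DATETIME('%d/%m/%Y', XXX) AS XXX_datetime
FROM `yourDataset.yourTable`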

BigQuery load - NULL is treating as string instead of empty

My requirement is to pull data from different sources (Facebook, YouTube, DoubleClick Search, etc.) and load it into BigQuery. When I try to pull the data, some of the sources return "NULL" when the column is empty.
I tried to load the same data into BigQuery, and BigQuery treats it as the string "NULL" instead of NULL (empty).
Right now I am replacing NULL with "" (an empty string) before loading into BigQuery. Instead of doing this, is there any way to load the file directly without any manipulation (replacing)?
Thanks,
What is the file format of the source file, e.g. CSV, newline-delimited JSON, Avro, etc.?
The reason is that in a CSV load an empty string is treated as a null, while the text NULL is treated as a string value. So, if you don't want to manipulate the data before loading, you should save the files in NLD (newline-delimited) JSON format.
As you mentioned that you are pulling data from social media platforms, I assume you are using their REST APIs, and as a result it should be possible for you to save that data as NLD JSON instead of CSV.
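As a sketch of what that could look like (field names, file path and table name are placeholders), each record is one JSON object per line, with real nulls:
{"campaign": "summer_sale", "clicks": 120, "cost": null}
{"campaign": "retargeting", "clicks": 85, "cost": 10.5}
and a load command such as:
bq load --source_format=NEWLINE_DELIMITED_JSON yourDataset.yourTable gs://your-bucket/data.json campaign:STRING,clicks:INTEGER,cost:FLOAT
The JSON null values arrive in BigQuery as proper NULLs, so no replacing is needed before the load.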
Answer to your question "is there a way we can load this from the web console?":
Yes. Go to your BigQuery project console https://bigquery.cloud.google.com/ and create a table in a dataset; there you can specify the source file and the table schema details.
From the comment section (for the convenience of other viewers):
Is there any option in bq commands for this?
Try this:
bq load --source_format=CSV --skip_leading_rows=1 --null_marker="NULL" yourProject:yourDataset.yourTable ~/path/to/file/x.csv Col1:string,Col2:string,Col3:integer,Col4:string
You may consider running a command similar to:
bq load --field_delimiter="\t" --null_marker="\N" --quote="" \
PROJECT:DATASET.tableName gs://bucket/data.csv.gz table_schema.json
More details can be gathered from the replies to the "Best Practice to migrate data from MySQL to BigQuery" question.

How can I move data from spreadsheet to a database through SQL

I want to move the data from a spreadsheet into a database. The program I am using is called SQLWorkbenchJ. I am kind of lost and don't really know where to start. Are there any tips or pointers that might send me in the right direction?
SQL Workbench/J provides the WbImport command to load a text file into a DB table. So if you save your spreadsheet file in CSV (comma-separated values) format, you can then load it into a table using this command.
Here is an example that loads the text file CLASSIFICATION_CODE.csv, which uses , as the field delimiter and ^ as the quoting character, into the CLASSIFICATION_CODE DB table.
WbImport -type=text
-file='C:\dev\CLASSIFICATION_CODE.csv'
-delimiter=,
-table=CLASSIFICATION_CODE
-quoteChar=^
-badfile='C:\dev\rejected'
-continueOnError=true
-multiLine=true
-emptyStringIsNull=false;
You might not need all the parameters of the example. Refer to the documentation to find the ones you need.
If the data you have in your spreadsheet is heterogeneous (e.g. your spreadsheet has two sheets), then split it into two files in order to store them in separate DB tables.

"UNLOAD" data tables from AWS Redshift and make them readable as CSV

I am currently trying to move several data tables in my current AWS instance's Redshift database to a new database in a different AWS instance (for background, my company has acquired a new one and we need to consolidate to one instance of AWS).
I am using the UNLOAD command below on a table, and I plan on making that table a CSV, then uploading that file to the destination AWS account's S3 and using the COPY command to finish moving the table.
unload ('select * from table1')
to 's3://destination_folder'
CREDENTIALS 'aws_access_key_id=XXXXXXXXXXXXX;aws_secret_access_key=XXXXXXXXX'
ADDQUOTES
DELIMITER AS ','
PARALLEL OFF;
My issue is that when I change the file type to .csv and open the file, I get inconsistencies with the data. There are areas where many rows are skipped, and on some rows, when the expected columns end, I get additional columns with the value "f" for unknown reasons. Any help on how I could achieve this transfer would be greatly appreciated.
EDIT 1: It looks like fields with quotes are having the quotes removed. Additionally, fields with commas are being split apart at the commas. I've identified some fields with quotes and commas, and they are throwing everything off. Would the ADDQUOTES clause I have apply to the entire field regardless of whether there are quotes and commas within the field?
By default the unloaded file will have a txt extension and quoted values. Try opening it with Excel and then saving it as a CSV file.
Refer to https://help.xero.com/Q_ConvertTXT
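Also, since the plan is to finish the move with COPY: because the UNLOAD above uses ADDQUOTES, the matching COPY on the destination cluster would normally need REMOVEQUOTES so quoted fields (including ones containing commas) round-trip cleanly. A rough sketch, with the table name, bucket path and credentials as placeholders:
copy table1
from 's3://destination_folder'
CREDENTIALS 'aws_access_key_id=XXXXXXXXXXXXX;aws_secret_access_key=XXXXXXXXX'
REMOVEQUOTES
DELIMITER AS ',';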

Importing Excel table into database

I have the following table in xlsx format which I would like to import into my MySQL database:
The table is pretty complicated and I only want the records after '1)HEADING'.
I have been looking at PHP libraries to import into SQL, but they only seem to be for simple Excel files.
You have two ways to do that:
First method:
1) Export it into some text format. The easiest will probably be a tab-delimited version, but CSV can work as well.
2) Use MySQL's LOAD DATA capability. See http://dev.mysql.com/doc/refman/5.1/en/load-data.html
3) Look halfway down that page, as it gives a good example for tab-separated data (a fuller sketch also follows after this list):
FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\'
4) Check your data. Sometimes quoting or escaping has problems, and you need to adjust your source or your import command, or it may just be easier to post-process via SQL.
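A fuller sketch of such a LOAD DATA statement, assuming a tab-delimited export file and a target table named my_table (both the file path and the table name are placeholders):
LOAD DATA LOCAL INFILE '/path/to/export.txt'
INTO TABLE my_table
FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;
The IGNORE 1 LINES clause is only needed if the export still contains a header row.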
Second method:
There's a simple online tool that can do this called sqlizer.io.
You upload an XLSX file to it, enter a sheet name and cell range, and it will generate a CREATE TABLE statement and a bunch of INSERT statements to import all your data into a MySQL database.