Issues loading CSV into BigQuery table - google-bigquery

I'm trying to create a BigQuery table from a fairly simple CSV file that I have stored in GCS.
I keep getting the same error over and over again:
Could not parse '1/1/2008' as datetime for field XXX
I've checked that the csv file isn't corrupted, and I've managed to upload everything into one column so the file is readable by BigQuery.
I've added the word NULL to any empty fields, thinking consecutive delimiters might be causing the problem, but I still get the same error.
I know data, I understand data and CSV files.

BigQuery cannot parse '1/1/2008' as DATETIME; it expects a year-first value, something like '2008-01-01'.
So you can either modify your CSV file, or load the XXX field as STRING and then convert it to DATETIME in your queries, as below:
#standardSQL
SELECT PARSE_DATETIME('%d/%m/%Y', '1/1/2008')
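If you would rather modify the CSV file before loading, the rewrite could be sketched in Python roughly as follows. This is only an illustration: the function name and the column index are hypothetical, the values are assumed to be day/month/year to match the '%d/%m/%Y' format above (use '%m/%d/%Y' if your data is month-first), and the output uses the YYYY-MM-DD HH:MM:SS layout that BigQuery documents for DATETIME values.

```python
import csv
import io
from datetime import datetime


def rewrite_dates(csv_text, column, source_format="%d/%m/%Y"):
    """Return csv_text with one column rewritten as YYYY-MM-DD HH:MM:SS."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(csv_text)):
        try:
            parsed = datetime.strptime(row[column], source_format)
            row[column] = parsed.strftime("%Y-%m-%d %H:%M:%S")
        except (ValueError, IndexError):
            # Header rows, blanks, and unparseable values are left untouched.
            pass
        writer.writerow(row)
    return out.getvalue()
```

Leaving unparseable values untouched means a header row passes through unchanged, but it also means bad dates are not reported; raise instead if you prefer a hard failure.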

Related

Importing CSV file but getting timestamp error

I'm trying to import CSV files into BigQuery, and for every hourly report I attempt to upload I get this error:
Error while reading data, error message: Could not parse 4/12/2016 12:00:00 AM as TIMESTAMP for field SleepDay (position 1) starting at location 65 with message Invalid time zone: AM
I understand that the parser is trying to read AM as a time zone, which causes the error, but I'm not sure how best to work around it. All of the hourly entries have AM or PM after the date-time, and there are thousands of entries.
I'm using schema autodetect and I believe that's where the issue comes from, but I'm not sure what to put in the "Edit as text" schema option to fix it.
To successfully parse an imported string as a timestamp in BigQuery, the string must be in ISO 8601 format:
YYYY-MM-DDThh:mm:ss.sss
If your source data is not available in this format, try the approach below.
1. Import the CSV into a temporary table by providing an explicit schema in which the timestamp fields are strings.
2. Select the data from the temporary table, apply the BigQuery PARSE_TIMESTAMP function as shown below, and write the result to the permanent table.
INSERT INTO `example_project.example_dataset.permanent_table`
SELECT
PARSE_TIMESTAMP('%m/%d/%Y %I:%M:%S %p', time_stamp) AS time_stamp,
value
FROM `example_project.example_dataset.temporary_table`;
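As an alternative to the temporary table, the same conversion can be done on the file before upload. The sketch below is a hypothetical helper, not part of any BigQuery tooling; it assumes month-first values like the one in the error message and uses %I, the 12-hour clock directive that pairs with %p (AM/PM).

```python
from datetime import datetime


def to_bigquery_timestamp(value):
    """Convert e.g. '4/12/2016 12:00:00 AM' to '2016-04-12 00:00:00'."""
    # %I reads the hour on a 12-hour clock; %p then applies AM or PM.
    parsed = datetime.strptime(value, "%m/%d/%Y %I:%M:%S %p")
    return parsed.strftime("%Y-%m-%d %H:%M:%S")
```

Note that 12:00:00 AM maps to hour 00, which is why a 24-hour directive is the wrong match for AM/PM data.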

BigQuery load - NULL is treating as string instead of empty

My requirement is to pull data from different sources (Facebook, YouTube, DoubleClick Search, etc.) and load it into BigQuery. When I pull the data, some of the sources return "NULL" when a column is empty.
I tried to load the same data into BigQuery, and BigQuery treats it as a string instead of NULL (empty).
Right now I am replacing NULL with "" (empty string) before loading into BigQuery. Is there any way to load the file directly, without this manipulation (replacing)?
Thanks,
What is the file format of the source file, e.g. CSV, newline-delimited JSON, Avro, etc.?
The reason is that a CSV load treats an empty string as null, while NULL is just a string value. So, if you don't want to manipulate the data before loading, you should save the files in newline-delimited (NLD) JSON format.
Since you mentioned that you are pulling data from social media platforms, I assume you are using their REST APIs, so it should be possible for you to save that data as NLD JSON instead of CSV.
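If the data has already been written as CSV, converting it to NLD JSON could look roughly like the sketch below. The function name and the null_marker parameter are hypothetical, and every non-null value stays a string, so numeric columns would still need a cast or a typed schema.

```python
import csv
import io
import json


def csv_to_ndjson(csv_text, null_marker="NULL"):
    """Convert CSV text with a header row to newline-delimited JSON."""
    lines = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        # Replace the literal marker with a real JSON null.
        cleaned = {
            key: (None if value == null_marker else value)
            for key, value in record.items()
        }
        lines.append(json.dumps(cleaned))
    return ("\n".join(lines) + "\n") if lines else ""
```

Each output line is one JSON object, which is the layout that newline-delimited JSON loads expect.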
Answer to your question "Is there a way we can load this from the web console?":
Yes. Go to your BigQuery project console at https://bigquery.cloud.google.com/ and create a table in a dataset, where you can specify the source file and the table schema details.
From Comment section (for the convenience of other viewers):
Is there any option in bq commands for this?
Try this:
bq load --source_format=CSV --skip_leading_rows=1 --null_marker="NULL" yourProject:yourDataset.yourTable ~/path/to/file/x.csv Col1:string,Col2:string,Col3:integer,Col4:string
You may consider running a command similar to:
bq load --field_delimiter="\t" --null_marker="\N" --quote="" \
PROJECT:DATASET.tableName gs://bucket/data.csv.gz table_schema.json
More details can be gathered from the replies to the "Best Practice to migrate data from MySQL to BigQuery" question.

How to load csv data which is control+A separated into bigquery

I'm trying to load a CSV file which is Control+A separated into BigQuery. Which option should I pass for the -F parameter of the bq load command? All the options I have tried result in an error while loading.
I would guess that Control+A is used in some legacy format that the OP wants to load into BigQuery. On the other hand, Control+A may be chosen when it is hard to pick any of the commonly used delimiters.
My recommendation would be to load your CSV file without any delimiter, so that each whole row is loaded as one field.
Assume your rows loaded into TempTable look like below, with just one column called FullRow:
'value1^Avalue2^Avalue3'
where ^A is the "invisible" Control+A character (\x01).
After you have loaded your file into BigQuery, you can parse it into separate columns and write it to the final table with something like below:
SELECT
REGEXP_EXTRACT(FullRow, r'(?:\w*\x01){0}(\w*)') AS col1,
REGEXP_EXTRACT(FullRow, r'(?:\w*\x01){1}(\w*)') AS col2,
REGEXP_EXTRACT(FullRow, r'(?:\w*\x01){2}(\w*)') AS col3
FROM TempTable
The above is confirmed to work, as I have used this approach multiple times. It works in both Legacy and Standard SQL.
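A different route, not the approach described above, is to rewrite the file with an ordinary delimiter before loading. The following is a minimal Python sketch under that assumption; the function name is hypothetical, and it assumes fields never contain the \x01 character themselves.

```python
import csv
import io


def ctrl_a_to_csv(text):
    """Rewrite Control+A (\\x01) separated lines as comma-separated CSV."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # io.StringIO splits on "\n" only, so other control characters survive.
    for line in io.StringIO(text):
        writer.writerow(line.rstrip("\r\n").split("\x01"))
    return out.getvalue()
```

Because csv.writer quotes any field that contains a comma, values are not limited to word characters the way the \w* pattern is.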

Pentaho | Issue with CSV file to Table output

I am working in Pentaho Spoon. I have a requirement to load CSV file data into one table.
I have used , as the delimiter in the CSV file. I can see correct data in the preview of the CSV file input step, but when I try to insert the data in the Table output step, I get a data truncation error.
This is because I have the following kind of value in one of my columns:
"2,ABC Squere".
As you can see, there is a "," in the column value, so it is truncated and an error is thrown. How do I solve this problem?
I want to load data with this kind of value into the table.
Here is one way of doing it
test.csv
--------
colA,colB,colC
ABC,"2,ABC Squere",test
The key is to use " as the enclosure and , as the delimiter.
You can also change the delimiter, say to a pipe (|), while keeping the data as quoted text like "1,Name"; this will still be treated as one column.
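To see what the enclosure setting does, independent of Pentaho, here is a small Python sketch that parses the same test.csv content with the standard csv module. It only illustrates the delimiter/enclosure idea and says nothing about how Pentaho implements it; parse_rows is a hypothetical helper name.

```python
import csv
import io

# Same content as test.csv above.
sample = 'colA,colB,colC\nABC,"2,ABC Squere",test\n'


def parse_rows(text, delimiter=",", enclosure='"'):
    """Parse delimited text, keeping enclosed delimiters inside one field."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=enclosure)
    return list(reader)
```

With the enclosure set, the second data field comes back as the single value 2,ABC Squere rather than being split at the comma.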

How to write SQL Query that matches data from a .csv file to a table in MySQL?

Is it possible for me to write an SQL query from within PhpMyAdmin that will search for matching records from a .csv file and match them to a table in MySQL?
Basically I want to do a WHERE IN query, but I want the WHERE IN to check records in a .csv file on my local machine, not a column in the database.
Can I do this?
I'd load the .csv content into a new table, do the comparison/merge and drop the table again.
Loading .csv files into mysql tables is easy:
LOAD DATA INFILE 'path/to/industries.csv'
INTO TABLE `industries`
FIELDS TERMINATED BY ';'
IGNORE 1 LINES (`nogaCode`, `title`);
There are a lot more things you can tell the LOAD command, like what char wraps the entries, etc.
I would do the following:
1. Create a temporary or MEMORY table on the server
2. Copy the CSV file to the server
3. Use the LOAD DATA INFILE command
4. Run your comparison
There is no way to have the CSV file on the client and the table on the server and be able to compare the contents of both using only SQL.
Short answer: no, you can't.
Long answer: you'll need to build the query locally, maybe with a script (Python/PHP), or just upload the CSV into a table and do a JOIN query (or just WHERE x IN (SELECT y FROM mytmpTABLE ...)).
For anyone new asking, there is a new tool that I used: Write SQL on CSV file
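The "build a query locally with a script" option could be sketched in Python as below. All names here are hypothetical, and the %s placeholders assume a MySQL driver that uses that parameter style (PyMySQL and mysqlclient do); the table and column names are interpolated directly, so they must come from your own code, never from user input.

```python
import csv
import io


def build_in_query(csv_text, csv_column, table, db_column):
    """Return (sql, params) for a parameterized WHERE ... IN query."""
    values = [row[csv_column] for row in csv.DictReader(io.StringIO(csv_text))]
    if not values:
        raise ValueError("CSV contains no data rows")
    # One placeholder per value; the driver handles quoting of the values.
    placeholders = ", ".join(["%s"] * len(values))
    sql = f"SELECT * FROM `{table}` WHERE `{db_column}` IN ({placeholders})"
    return sql, values
```

The returned pair would typically be passed to a cursor as cursor.execute(sql, params). For large CSV files, loading into a temporary table and joining is likely the better fit, since very long IN lists get unwieldy.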