Importing CSV to BigQuery parsing error - google-bigquery

I have a CSV with 225 rows and I see that BigQuery expects the schema to be in the form
column1_name:data_type,
I have removed all spaces, but BigQuery still rejects my schema: it returns "Parsing Error" and echoes back the first field name.
My pasted schema looks like this (partial):
transaction_status:STRING(6),dollarsobligated:NUMERIC(10,2),baseandexercisedoptionsvalue:NUMERIC(10,2),baseandalloptionsvalue:NUMERIC(12,2),maj_agency_cat:STRING(35),mod_agency:STRING(37),maj_fund_agency_cat:STRING(35),contractingofficeagencyid:STRING(37),contractingofficeid:STRING(51),

Try removing the dimensioning; it isn't needed. Declaring STRING is optional, as it's the default type. Instead of NUMERIC, use FLOAT.
So
transaction_status:STRING(6),dollarsobligated:NUMERIC(10,2),baseandexercisedoptionsvalue:NUMERIC(10,2),baseandalloptionsvalue:NUMERIC(12,2),maj_agency_cat:STRING(35),mod_agency:STRING(37),maj_fund_agency_cat:STRING(35),contractingofficeagencyid:STRING(37),contractingofficeid:STRING(51),
should be
transaction_status,dollarsobligated:float,baseandexercisedoptionsvalue:float,baseandalloptionsvalue:float,maj_agency_cat,mod_agency,maj_fund_agency_cat,contractingofficeagencyid,contractingofficeid
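If you're loading with the bq command line, that simplified schema can be passed inline as the last argument. A rough sketch only; the dataset, table, and file names are made up, and --skip_leading_rows=1 applies only if the file still has a header row:
bq load --source_format=CSV --skip_leading_rows=1 \
  mydataset.contracts ./contracts.csv \
  transaction_status,dollarsobligated:float,baseandexercisedoptionsvalue:float,baseandalloptionsvalue:float,maj_agency_cat,mod_agency,maj_fund_agency_cat,contractingofficeagencyid,contractingofficeid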

Related

Getting "unexpected error" message when trying to create table on BigQuery

I get an "unexpected error" message every time I try to create a table on BigQuery. I've tried inputting and omitting the schema and made sure that the file was compatible with csv format.
I was having the same issue.
I did a couple of things to clean up the CSV file, and I was then able to upload it successfully:
Remove commas from any numerical data
If using auto-detect for the schema, remove any spaces between headings
Remove the header data completely and define the schema yourself (a sketch follows this list)
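One way to do that last step from the command line (purely a sketch; the dataset, table, schema, and file names here are invented):
# strip the header row, then load with an explicit schema file
tail -n +2 data.csv > data_noheader.csv
bq load --source_format=CSV --schema=./schema.json \
  mydataset.mytable ./data_noheader.csv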
I had the same problem; I wrote the data types in capitals and my "unexpected error" went away.
For example, use STRING not string, INTEGER not integer, and so on.

What does this error mean: Required column value for column index: 8 is missing in row starting at position: 0

I'm attempting to upload a CSV file (which is the output of a bcp command) to BigQuery using the bq load CLI command. I have already supplied a custom schema file (I was having major issues with autodetect).
One resource suggested this could be a datatype mismatch. However, the table from the SQL DB lists the column as a decimal, so in my schema file I have listed it as FLOAT since decimal is not a supported data type.
I couldn't find any documentation for what the error means and what I can do to resolve it.
What does this error mean? It means, in this context, that a value is REQUIRED for the given column index and none was found. (By the way, columns are usually 0-indexed, so a fault at column index 8 most likely refers to column number 9.)
This can be caused by a myriad of different issues, of which I experienced two.
Incorrectly categorizing NULL columns as NOT NULL. After exporting the schema from SSMS as JSON, I needed to clean it up for BQ, and in doing so I mapped IS_NULLABLE:NO to MODE:NULLABLE and IS_NULLABLE:YES to MODE:REQUIRED. These mappings should have been the other way around, and the error appeared because there were NULL values in columns where BQ expected a REQUIRED value.
Using the wrong delimiter. The file I was outputting was not only comma-delimited; it also had tab characters inside some cell values. I was only able to spot this by importing the data with the Get Data tool in Excel, after which I could see the stray tabs inside the cells.
After re-exporting with a pipe (|) delimiter, I was finally able to load the file into BigQuery without any errors.
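To illustrate the first cause, here is roughly what the corrected mapping looks like in a BigQuery JSON schema file (the column names and types below are invented for the example; only the mode values matter):
cat > schema.json <<'EOF'
[
  {"name": "order_id",    "type": "INTEGER", "mode": "REQUIRED"},
  {"name": "order_total", "type": "FLOAT",   "mode": "NULLABLE"}
]
EOF
# i.e. IS_NULLABLE = NO  -> "mode": "REQUIRED"
#      IS_NULLABLE = YES -> "mode": "NULLABLE"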

BigQuery loading JSON file: How to ignore a field or rename it?

I have a NEWLINE_DELIMITED_JSON file on my computer and I would like to load it into a BigQuery table.
I have 3 keys in each line. One of them is a timestamp: I would like to drop it and not get a "timestamp" column in my BigQuery table.
One of them has a wrong name: the name of the key in the JSON file is "special_id" but I would like to load it in a column named "main_id".
I can't find a way to do that while specifying the schema of the table created during the load. Is there a way to do this?
Thank you
For that level of flexibility:
Don't import as JSON
Import as CSV (define the null character, or any character that never appears in the data, as the separator)
Each line then has only one column: the full JSON string
Parse inside BigQuery with maximum flexibility (JSON parsing functions and even JavaScript UDFs); a sketch follows
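A rough sketch of that approach from bash (the file, dataset, and table names, and the name of the third key, are all made up for the example):
# load each JSON line into a single STRING column named "raw",
# using a delimiter and quote character that should never occur in the data
bq load --source_format=CSV --field_delimiter=$'\x01' --quote='' \
  mydataset.raw_lines ./data.json raw:STRING

# parse inside BigQuery: rename special_id to main_id, and simply
# never select the timestamp key
bq query --use_legacy_sql=false '
SELECT
  JSON_EXTRACT_SCALAR(raw, "$.special_id") AS main_id,
  JSON_EXTRACT_SCALAR(raw, "$.other_key")  AS other_key
FROM mydataset.raw_lines'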

SQL SERVER, importing from CSV file. Data conversion error

I am trying to import data from a CSV file into a table.
I am presented by this error:
"Error 0xc02020a1: Data Flow Task 1: Data conversion failed. The data
conversion for column "dstSupport" returned status value 2 and status
text "The value could not be converted because of a potential loss of
data."."
However, I do not even need to convert this column, as you see in the image below:
The column is of type bit, and I used DT_BOOL.
This ended up being a parsing error. One of my string values contained commas, and the pieces of that string after each comma were being shifted into the wrong columns, including my bit column. I fixed it by changing the delimiter from a comma to a pipe.
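To make the failure mode concrete (the rows below are invented): with a comma delimiter, an unquoted comma inside a value shifts every following field one column to the right, so a text fragment lands where the bit column is expected; a pipe delimiter avoids the collision.
id,name,dstSupport
1,Acme, Inc.,1        <- " Inc." is read as the third field instead of the bit value
id|name|dstSupport
1|Acme, Inc.|1        <- the embedded comma is now harmless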

How to prevent double quotes being escaped when importing data from a text file into a hive table

I have an info field whose data type is a map, and if I select this field in the Hive console, the result looks something like this:
{"a":"value1","b":"value2"}
How do I represent this data in a text file so that, when I import it into the Hive table, it is represented properly? I mean, should my text file have something like this?
a:value1,b:value2
Are you trying to load JSON documents into Hive? There are SerDes available to load and query JSON data in Hive.
In your case, "a" and "b" would become the column names (the header)