Purpose of JSON schema file while loading data into BigQuery from a CSV file - google-bigquery

Can someone please help me by explaining the purpose of providing a JSON schema file while loading a file into a BigQuery table using the bq command? What are the advantages?
Does this file help to maintain data integrity by avoiding any column swap?
Regards,
Sreekanth

Specifying a JSON schema, instead of relying on auto-detect, ensures that you get the expected types for each column being loaded. If you have data that looks like this, for example:
1,'foo',true
2,'bar',false
3,'baz',true
Schema auto-detection would infer that the type of the first column is an INTEGER (a.k.a. INT64). Maybe you plan to load more data in the future, though, that looks like this:
3.14,'foo',true
1.59,'bar',false
-2.001,'baz',true
In that case, you probably want the first column to have type FLOAT (a.k.a. FLOAT64) instead. If you provide a schema when you load the first file, you can specify a type of FLOAT for that column explicitly.
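As a minimal sketch, assuming a hypothetical dataset mydataset, table mytable, and input file data.csv (the column names are made up), the schema file could look like this:

[
  {"name": "measurement", "type": "FLOAT", "mode": "NULLABLE"},
  {"name": "label", "type": "STRING", "mode": "NULLABLE"},
  {"name": "active", "type": "BOOLEAN", "mode": "NULLABLE"}
]

You then pass the schema file as the last argument to bq load instead of using --autodetect:

bq load --source_format=CSV mydataset.mytable ./data.csv ./schema.json

With an explicit schema, the first column is always loaded as FLOAT, even if a particular file happens to contain only integer-looking values.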

Related

BigQuery load job of a JSON file - forcing a field to be String in the schema auto-detect

If in the beginning the JSON contains
"label": "foo"
and later it is
"label": "123"
BigQuery returns
Invalid schema update. Field label has changed type from STRING to INTEGER
although it is "123" and not 123.
The file is being loaded with
autodetect: true
Is there a way to force BigQuery to treat any field as a string when it applies its auto-detect, or is the only way to use CSV instead?
Auto-detection makes a best effort to recognize the data type by scanning up to 100 rows of data as a representative sample. There is no way to give a hint about which type a field should be. You may consider specifying the schema manually for your use case.
UPDATE:
I have tested loading a file containing only {"label" : "123"} and it is recognized as INTEGER. The auto-detection therefore recognizes "123" as INTEGER regardless of the quotes. For your case, you may consider exporting the schema from the existing table as explained in the documentation:
Note: You can view the schema of an existing table in JSON format by entering the following command: bq show --format=prettyjson [DATASET].[TABLE].
and use that schema for subsequent loads, as sketched below.
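As a minimal sketch, assuming a hypothetical dataset mydataset and table mytable, and that jq is available to extract the fields array from the bq show output:

bq show --format=prettyjson mydataset.mytable | jq '.schema.fields' > schema.json
bq load --source_format=NEWLINE_DELIMITED_JSON mydataset.mytable ./new_data.json ./schema.json

Because the schema file pins label to STRING, later files where the value happens to look numeric no longer trigger a schema change.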

What happens if I send integers to a BigQuery field "string"?

One of the columns I send (in my code) to BigQuery contains integers. I added the columns to BigQuery, but I was too fast and created them with type string.
Will they be automatically converted? Or will the data be totally corrupted (i.e., I cannot trust the resulting string at all)?
Data shouldn't be automatically converted, as this would defeat the purpose of having a table schema.
What I've seen people do is save a whole JSON line as a string and then process that string inside BigQuery. Other than that, if you try to save values that don't correspond to the field's schema definition, you should see an error being thrown.
If you need to change a table schema's definition, you can check this tutorial on updating a table schema.
Actually, BigQuery automatically converted the integers I sent into strings, so my table populates OK.
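If you do end up with numeric data stored in a STRING column, you can still cast it back at query time. A minimal sketch in standard SQL, assuming a hypothetical table mydataset.mytable with a STRING column quantity:

SELECT SAFE_CAST(quantity AS INT64) AS quantity_int
FROM mydataset.mytable;

SAFE_CAST returns NULL instead of raising an error for values that do not parse as integers.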

Import PostgreSQL dump into SQL Server - data type errors

I have some data which was dumped from a PostgreSQL database (allegedly, using pg_dump) and needs to be imported into SQL Server.
While the data types are OK, I am running into an issue where there seems to be a placeholder for a NULL: I see a backslash followed by an uppercase N (\N) in many fields, including a column with a Boolean data type and one with an integer data type.
Some of these are supposed to be of the Boolean data type, and having two characters in there is most certainly not going to fly.
Here's what I tried so far:
Import via dirty read - keeping whatever data types SSIS decided each field had; to no avail. There were error messages about truncation on all of the Boolean fields.
Creating a table for the data based on the correct data types, though this was more fun... I needed to do the same as in the dirty read, as the source would otherwise not load properly. There was also a need to transform the data into the correct data type for insertion into the destination; yet I am getting truncation issues, when there most certainly shouldn't be any.
Here is a sample expression in my derived column transformation editor:
(DT_BOOL)REPLACE(observation,"\\N","")
The data type should be Boolean.
Any suggestion would be really helpful!
Thanks!
Since I was unable to circumvent the SSIS rules in order to get my data into my tables without an error, I took the quick-and-dirty approach.
The solution which worked for me was to have the source read each column as if it were a string, and to give all fields in the destination table the data type VARCHAR. This destination table is used as a staging table; once the data is in SQL Server, I can manipulate it as needed (see the sketch below).
Thank you @cha for your input.
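As a minimal sketch of that manipulation step, assuming a hypothetical staging table dbo.staging with VARCHAR columns observation and quantity, and a typed destination table dbo.final:

-- Treat the PostgreSQL NULL placeholder \N as NULL, then cast.
-- TRY_CAST returns NULL instead of raising an error on bad values.
INSERT INTO dbo.final (observation, quantity)
SELECT
    TRY_CAST(NULLIF(observation, '\N') AS BIT),
    TRY_CAST(NULLIF(quantity, '\N') AS INT)
FROM dbo.staging;

Note that pg_dump writes booleans as t/f, which TRY_CAST will not accept as BIT, so a CASE expression mapping 't' to 1 and 'f' to 0 may be needed before the cast.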

Apache NiFi: InferAvroSchema infers signed values as string

I'm setting up a pipeline in NiFi where I get JSON records which I then use to make a request to an API. The response I get has both numeric and textual data. I then have to write this data to Hive. I use InferAvroSchema to infer the schema. Some numeric values are signed values like -2.46 or -0.1. While inferring the type, the processor considers them as string instead of double, float, or decimal.
I know we can hard-code our Avro schema in the processors, but I thought making it more dynamic by utilizing InferAvroSchema would be even better. Is there any other way we can overcome/resolve this?
InferAvroSchema is good for guessing an initial schema, but once you need something more specific it is better to remove InferAvroSchema and provide the exact schema you need.
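As a minimal sketch of such an explicit schema, here is a hand-written Avro schema (the record and field names are made up) that pins the numeric field to double:

{
  "type": "record",
  "name": "ApiResponse",
  "fields": [
    {"name": "reading", "type": ["null", "double"], "default": null},
    {"name": "label", "type": ["null", "string"], "default": null}
  ]
}

Using a union with null keeps each field optional; signed values like -2.46 then parse as double instead of falling back to string.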

Tables using LIKE may only reference flat structures

I want to have a table parameter in an RFC function module of type CGPL_TEXT1, which uses the domain TEXT40, a CHAR 40.
I tried to create it:
IT_PARTS_DESCRIPTION LIKE CGPL_TEXT1
But I keep getting this error
tables using like may only reference flat structures
I am also not able to use TYPE. If I do so, I get this error:
Flat types may only be referenced using LIKE for table parameters
Don't go there. For RFC-enabled function modules, always use a structure as the line type for your table. The RFC protocol itself also supports unstructured tables, but many adapters don't. So you should:
declare a data dictionary structure Z_MY_PARTS_DATA with a single field DESCRIPTION TYPE CGPL_TEXT1
declare a data dictionary table type Z_MY_PARTS_TABLE using this structure
use that table type in your function module, as sketched below.
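As a minimal sketch, assuming the hypothetical DDIC objects above (Z_MY_PARTS_DATA and Z_MY_PARTS_TABLE) have been created in SE11, the function module could then look like this:

FUNCTION z_get_parts.
*"----------------------------------------------------------------------
*"*"Local Interface:
*"  TABLES
*"      IT_PARTS_DESCRIPTION STRUCTURE Z_MY_PARTS_DATA
*"----------------------------------------------------------------------
  " Each row is a flat structure, which any RFC adapter can handle.
  APPEND VALUE #( description = 'Example part' ) TO it_parts_description.
ENDFUNCTION.

Because the table parameter now references the flat structure Z_MY_PARTS_DATA rather than the elementary type CGPL_TEXT1, the "tables using like may only reference flat structures" error disappears.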
Look inside the dictionary for a table type which has only one column representing your text.
If you cannot find one, just go the proper way and define a Z structure and a Z table type based on that structure. This is the proper way, and I also prefer to use it (even sometimes when I would not really need it), because the structures and table types can be documented.