Replace null values with 0 in BigQuery - google-bigquery

I have NULL values and scientific-notation values (e) in an array column in my table.
I need to replace all of these values with 0.
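One possible approach in BigQuery, sketched below: unnest the array, turn anything that is NULL or fails to parse as a number into 0, and rebuild the array. The table and column names (mytable, arr) are hypothetical, and the sketch assumes the array elements are strings:
select array(
  select ifnull(safe_cast(v as float64), 0)
  from unnest(arr) as v with offset
  order by offset  -- keep the original element order
) as arr_cleaned
from mytable;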

Related

Big Query Array data type related issue if the array is NULL

Please find below the table nullarraytest. The create statement:
create table nullarraytest (name array<string>, city string);
The values in the table:
insert into nullarraytest values([],"Mumbai");
insert into nullarraytest values(["abc","def"],"Pune");
insert into nullarraytest values(null,"Surat");
Issue/doubt:
The below query returns no data:
select city from nullarraytest where name is NULL;
It should return 2 rows "Mumbai" and "Surat".
The below query works properly as expected:
select city from nullarraytest where array_length(name)=0;
This returns 2 rows "Mumbai" and "Surat".
Why doesn't the filter "name is null" work?
As @Jaytiger mentioned in the comments,
As stated in the GCP public documentation, BigQuery translates a NULL ARRAY into an empty ARRAY in the query result, although inside the query, NULL and empty ARRAYs are two distinct values. An empty array is not null. For nullable data types, NULL is a valid value; currently, all existing data types are nullable, but conditions apply for ARRAYs.
You can also go through this Stack Overflow post.
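Since a stored NULL array reads back as an empty one, the IS NULL filter alone can't be relied on for arrays read from storage. A defensive filter that catches both cases, sketched against the nullarraytest table above:
select city
from nullarraytest
where name is null or array_length(name) = 0;
This returns "Mumbai" and "Surat" regardless of whether the NULL array was normalized to an empty one on write.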

Spark SQL: insert a string column into an array of struct type column

I am trying to insert a STRING column into an ARRAY of STRUCT column, but I am getting errors. Could you point me in the right direction for doing the INSERT?
In a Databricks notebook, I have a raw table (raw_lms.rawTable) where all the columns are string type. This needs to be inserted into a transform table (tl_lms.transformedTable) where the columns are arrays of struct type.
CREATE TABLE raw_lms.rawTable
( PrimaryOwners STRING
,Owners STRING
)
USING DELTA LOCATION 'xxxx/rawTable'
CREATE TABLE tl_lms.transformedTable
( PrimaryOwners array<struct<Id:STRING>>
,Owners array<struct<Id:STRING>>
)
USING DELTA LOCATION 'xxxx/transformedTable'
The raw table has the below values populated, e.g.:
INSERT INTO TABLE raw_lms.rawTable
VALUES
("[{'Id': '1393fe1b-bba2-4343-dff0-08d9dea59a03'}, {'Id': 'cf2e6549-5d07-458c-9d30-08d9dd5885cf'}]",
"[]"
)
When I try to insert into the transform table, I get the below error:
INSERT INTO tl_lms.transformedTable
SELECT PrimaryOwners,
Owners
FROM raw_lms.rawTable
Error in SQL statement: AnalysisException: cannot resolve
'spark_catalog.raw_lms.rawTable.PrimaryOwners' due to data type
mismatch: cannot cast string to array<struct<Id:string>>;
I do not want to explode the data. I simply need to insert rows one-for-one from rawTable into transformedTable, even though the column data types differ.
Thanks for your time and help.
As the error message states, you can't insert a string as an array. You need to use the array and named_struct functions.
Either change the raw table's columns to the correct types (arrays of structs rather than strings) and insert typed values:
INSERT INTO TABLE raw_lms.rawTable
VALUES
(array(named_struct('Id', '1393fe1b-bba2-4343-dff0-08d9dea59a03'), named_struct('Id', 'cf2e6549-5d07-458c-9d30-08d9dd5885cf')),
null
);
Or, if you want to keep the columns as strings in the raw table, use from_json to parse the strings into the correct type before inserting:
INSERT INTO tl_lms.transformedTable
SELECT from_json(PrimaryOwners, 'array<struct<Id:STRING>>'),
from_json(Owners, 'array<struct<Id:STRING>>')
FROM raw_lms.rawTable
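One caveat worth noting: from_json returns NULL for strings it cannot parse, so a quick audit query before the insert can flag malformed rows (a sketch):
SELECT PrimaryOwners
FROM raw_lms.rawTable
WHERE PrimaryOwners IS NOT NULL
  AND from_json(PrimaryOwners, 'array<struct<Id:STRING>>') IS NULL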

Postgres: How to find rows in which a specific column has 'empty' values?

I need to find rows which have empty values in a column after a CSV import.
The column is an INTEGER column, hence the
...
where col = ''
doesn't work for me.
You can check for empty values using
where col is null
If you want to select null as 0 (or any other default value), use coalesce:
select coalesce(max(col), 0)
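For a per-row version rather than an aggregate, or to replace the NULLs permanently, a sketch (my_table is a placeholder name):
select coalesce(col, 0) as col from my_table;
update my_table set col = 0 where col is null;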

NULL data converted into '\N' for numeric columns in Hive?

I created a Hive table which has numeric columns (such as double) as well as string columns. My file contains some NULL values for both the numeric and the string columns. When I load the file into this table, the NULL values for the numeric columns are replaced by '\N' in the file. I know this is the Hive property that handles null values for numeric columns, but I want to prevent it. Is there any way to change NULL into something else instead of '\N'?
By default, NULL values are written in the data files as \N, and \N in the data files is interpreted as NULL when querying the data.
This can be overridden by using TBLPROPERTIES('serialization.null.format'=...)
E.g.
TBLPROPERTIES('serialization.null.format'='') means the following:
An empty field in the data files will be treated as NULL when you query the table
When inserting rows into the table, NULL values will be written to the data files as empty fields
This property can be declared as part of the table creation
create table mytable (...)
tblproperties('serialization.null.format'='')
;
and can be changed later on
alter table mytable set tblproperties('serialization.null.format'='')
;
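The property value doesn't have to be an empty string. For example, to have NULLs written out as the literal text NULL instead (a sketch):
alter table mytable set tblproperties('serialization.null.format'='NULL')
;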

Postgres Data type conversion

I have this dataset in SQL format. However, the DATE values need to be converted into a different format, because I get the following error:
CREATE TABLE
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
ERROR: date/time field value out of range: "28-10-96"
LINE 58: ...040','2','10','','P13-00206','','','','','1-3-95','28-10-96'...
^
HINT: Perhaps you need a different "datestyle" setting.
I've definitely read the documentation on date format
http://www.postgresql.org/docs/current/static/datatype-datetime.html
But my question is: how do I convert all of the dates into a proper format without going through all 500 or so data rows and making sure each one is correct before inserting into the DB? The backend is handled by Rails, but I figured going through SQL to clean it up would be best here.
I have a CREATE TABLE statement above this dataset, and mind you, the data set was given to me via a DBF converter/external source.
Here's part of my dataset
INSERT INTO winery_attributes
(ID,NAME,STATUS,BLDSZ_ORIG,BLDSZ_CURR,HAS_CAVE,CAVESIZE,PROD_ORIG,PROD_CURR,TOUR_TASTG,VISIT_DAY,VISIT_WEEK,VISIT_YR,VISIT_MKTG,VISIT_NMEV,VISIT_ALL,EMPLYEENUM,PARKINGNUM,WDO,LAST_UP,IN_CITYBDY,IN_AIASP,NOTES,SMLWNRYEXM,APPRV_DATE,ESTAB_DATE,TOTAL_SIZE,SUBJ_TO_75,GPY_AT_75,AVA,SUP_DIST)
VALUES
(1,'ACACIA WINERY','PROD','8000','34436','','0','50000','250000','APPT','75','525','27375','3612','63','30987','22','97','x','001_02169-MOD_AcaciaWinery','','','','','1-11-79','1-9-82','34436','x','125000','Los Carneros','1');
INSERT INTO winery_attributes
(ID,NAME,STATUS,BLDSZ_ORIG,BLDSZ_CURR,HAS_CAVE,CAVESIZE,PROD_ORIG,PROD_CURR,TOUR_TASTG,VISIT_DAY,VISIT_WEEK,VISIT_YR,VISIT_MKTG,VISIT_NMEV,VISIT_ALL,EMPLYEENUM,PARKINGNUM,WDO,LAST_UP,IN_CITYBDY,IN_AIASP,NOTES,SMLWNRYEXM,APPRV_DATE,ESTAB_DATE,TOTAL_SIZE,SUBJ_TO_75,GPY_AT_75,AVA,SUP_DIST)
VALUES
('2','AETNA SPRING CELLARS','PROD','2500','2500','','0','2000','20000','TST APPT','0','3','156','0','0','156','1','10','x','','','','','x','1-4-86','1-6-86','2500','','0','Napa Valley','3');
INSERT INTO winery_attributes
(ID,NAME,STATUS,BLDSZ_ORIG,BLDSZ_CURR,HAS_CAVE,CAVESIZE,PROD_ORIG,PROD_CURR,TOUR_TASTG,VISIT_DAY,VISIT_WEEK,VISIT_YR,VISIT_MKTG,VISIT_NMEV,VISIT_ALL,EMPLYEENUM,PARKINGNUM,WDO,LAST_UP,IN_CITYBDY,IN_AIASP,NOTES,SMLWNRYEXM,APPRV_DATE,ESTAB_DATE,TOTAL_SIZE,SUBJ_TO_75,GPY_AT_75,AVA,SUP_DIST)
VALUES
('3','ALTA VINEYARD CELLAR','PROD','480','480','','0','5000','5000','NO','0','4','208','0','0','208','4','6','x','003_U-387879','','','','','2-5-79','1-9-80','480','','0','Diamond Mountain District','3');
INSERT INTO winery_attributes
(ID,NAME,STATUS,BLDSZ_ORIG,BLDSZ_CURR,HAS_CAVE,CAVESIZE,PROD_ORIG,PROD_CURR,TOUR_TASTG,VISIT_DAY,VISIT_WEEK,VISIT_YR,VISIT_MKTG,VISIT_NMEV,VISIT_ALL,EMPLYEENUM,PARKINGNUM,WDO,LAST_UP,IN_CITYBDY,IN_AIASP,NOTES,SMLWNRYEXM,APPRV_DATE,ESTAB_DATE,TOTAL_SIZE,SUBJ_TO_75,GPY_AT_75,AVA,SUP_DIST)
VALUES
('4','BLACK STALLION','PROD','43600','43600','','0','100000','100000','PUB','50','350','18200','0','0','18200','2','45','x','P13-00391','','','','','1-5-80','1-9-85','43600','','0','Oak Knoll District of Napa Valley','3');
INSERT INTO winery_attributes
(ID,NAME,STATUS,BLDSZ_ORIG,BLDSZ_CURR,HAS_CAVE,CAVESIZE,PROD_ORIG,PROD_CURR,TOUR_TASTG,VISIT_DAY,VISIT_WEEK,VISIT_YR,VISIT_MKTG,VISIT_NMEV,VISIT_ALL,EMPLYEENUM,PARKINGNUM,WDO,LAST_UP,IN_CITYBDY,IN_AIASP,NOTES,SMLWNRYEXM,APPRV_DATE,ESTAB_DATE,TOTAL_SIZE,SUBJ_TO_75,GPY_AT_75,AVA,SUP_DIST)
VALUES
('5','ALTAMURA WINERY','PROD','11800','11800','x','3115','50000','50000','APPT','0','20','1040','0','0','1040','2','10','','P13-00206','','','','','1-3-95','28-10-96','14915','x','50000','Napa Valley','4');
The dates in your data set are in the form of strings. Since they are not in the default datestyle (which is YYYY-MM-DD), you should explicitly convert them to a date as follows:
to_date('1-5-80', 'DD-MM-YY')
If you store the data in a timestamp instead, use
to_timestamp('1-5-80', 'DD-MM-YY')
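Alternatively, as the HINT in the error message suggests, you can change the datestyle setting for your session so that Postgres reads ambiguous dates as day-month-year; this assumes every date in the dump really is day-first:
SET datestyle = 'ISO, DMY';
SELECT '28-10-96'::date;  -- now parses as 1996-10-28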
If you are given the data set in the form of the INSERT statements that you show, then first load all the data as simple strings into varchar columns, then add date columns and do an UPDATE (and similarly for integer and boolean columns):
UPDATE my_table
SET estab = to_date(ESTAB_DATE, 'DD-MM-YY'), -- column estab of type date
apprv = to_date(APPRV_DATE, 'DD-MM-YY'), -- etc
...
When the update is done, you can ALTER TABLE to drop the text columns that held the dates (and integers, booleans).
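A sketch of that full cycle for one of the date columns, assuming ESTAB_DATE was first loaded as text into the winery_attributes table from the question:
ALTER TABLE winery_attributes ADD COLUMN estab date;
UPDATE winery_attributes SET estab = to_date(ESTAB_DATE, 'DD-MM-YY');
ALTER TABLE winery_attributes DROP COLUMN ESTAB_DATE;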