CSV upload into BigQuery partitioned table using a string field as partition - google-bigquery

I need to upload a CSV file into a BigQuery table. My question is: if "datestamp_column" is a STRING, how can I use it as the partition field?
An example value from "datestamp_column": 2022-11-25T12:56:48.926500Z
from google.cloud import bigquery

def upload_to_bq():
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        schema=client.schema_from_json("schemaa.json"),
        skip_leading_rows=1,  # skip the CSV header row
        time_partitioning=bigquery.TimePartitioning(
            type_=bigquery.TimePartitioningType.DAY,
            field="datestamp_column",  # partition column; must be TIMESTAMP, DATE or DATETIME
            expiration_ms=7776000000,  # 90 days.
        ),
    )
This fails with an error saying that datestamp_column is STRING and should be TIMESTAMP, DATE or DATETIME.

To partition on the datestamp_column field, the column must be of type TIMESTAMP, DATE or DATETIME.
The format of your sample value corresponds to a timestamp: datestamp_column: 2022-11-25T12:56:48.926500Z
In your BigQuery schema, change the type of the datestamp_column field from STRING to TIMESTAMP.
The ingestion should then work, because BigQuery loads a value in the format 2022-11-25T12:56:48.926500Z into a TIMESTAMP column.
Input: STRING value 2022-11-25T12:56:48.926500Z
Result: TIMESTAMP column in BigQuery
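Since upload_to_bq reads its schema from schemaa.json, the change amounts to retyping one entry in that file. The sketch below assumes the file uses BigQuery's JSON schema format (a list of objects with "name", "type" and "mode"); the helper name and the second column "value" are hypothetical. It also checks locally that the sample value parses as a UTC timestamp.

```python
import json
from datetime import datetime

def make_partition_field_timestamp(schema: list, field: str) -> list:
    """Return a copy of a BigQuery JSON schema with `field` retyped as TIMESTAMP."""
    updated = []
    for column in schema:
        column = dict(column)  # shallow copy so the caller's schema is not mutated
        if column["name"] == field:
            column["type"] = "TIMESTAMP"
        updated.append(column)
    return updated

# Hypothetical schema contents; the real schemaa.json may list more columns.
schema = [
    {"name": "datestamp_column", "type": "STRING", "mode": "NULLABLE"},
    {"name": "value", "type": "STRING", "mode": "NULLABLE"},
]
print(json.dumps(make_partition_field_timestamp(schema, "datestamp_column"), indent=2))

# Local sanity check: the sample value is a valid UTC timestamp (%z accepts "Z").
print(datetime.strptime("2022-11-25T12:56:48.926500Z", "%Y-%m-%dT%H:%M:%S.%f%z"))
```

Writing the returned list back to schemaa.json with json.dump would let the unchanged upload_to_bq pick up the TIMESTAMP type.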

Related

Date and time string to Datetime Timestamp Type in Spark scala dataframe

I have a date field and a time field in a Spark DataFrame. Both are String columns.
I want to combine them and later store the result in a Hive column of type Timestamp.
For example, the date is 2018-09-10 and the time is 140554 (HHMMSS) or sometimes 95200 (HMMSS).
When I use concat(col("datefield"), lpad(col("timefield"), 6, "0")), I get the String result
2018-09-10140554, but converting it to the Timestamp type afterwards does not work properly.
I tried the steps below but did not get the correct result. I just want to combine the date and time string columns into a single timestamp column.
to_timestamp with format YYYY-MM-ddHHMMSS
from_unixtimestamp(unix_timestamp)
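The padding-then-parsing idea can be sketched in plain Python (not Spark) to show what the pattern must look like. One likely culprit in the attempt above: in Java-style datetime patterns, which Spark uses, uppercase MM means month while lowercase mm means minutes, lowercase ss means seconds, and yyyy is the calendar year, so the Spark pattern would presumably need to be yyyy-MM-ddHHmmss rather than YYYY-MM-ddHHMMSS. The function name below is hypothetical.

```python
from datetime import datetime

def combine_date_time(date_str: str, time_str: str) -> datetime:
    """Combine 'YYYY-MM-DD' with an HHMMSS or HMMSS time string."""
    # Left-pad to 6 digits so '95200' becomes '095200' (same idea as lpad(..., 6, "0")).
    padded = time_str.zfill(6)
    # After padding, %H, %M and %S each consume exactly two digits.
    return datetime.strptime(date_str + padded, "%Y-%m-%d%H%M%S")

print(combine_date_time("2018-09-10", "140554"))  # 2018-09-10 14:05:54
print(combine_date_time("2018-09-10", "95200"))   # 2018-09-10 09:52:00
```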

Google Analytics to Big Query

The "date" field exported from GA into BQ is a "yyyymmdd" string, which I cannot convert to a DATE.
Is there any way to make BQ recognize it as a DATE?
According to the documentation, the date field is exported as String from your GA data.
However, you can change that after exporting your data to BigQuery: either overwrite your current table or create a new one with the date column converted. To do this, use the built-in PARSE_DATE() function, which parses a STRING into a DATE according to a format string describing the input. Below is the Standard SQL syntax in BigQuery:
SELECT PARSE_DATE("%Y%m%d", date) as date FROM `project.dataset.table`
The date will be output as YYYY-MM-DD. If you want a different display format, you can use the built-in FORMAT_DATE() function with one of the supported format elements (note that FORMAT_DATE() returns a STRING, not a DATE).
If you want to replace the whole table so that the date column has the DATE type, you could use the following syntax:
CREATE OR REPLACE TABLE `project.dataset.table` AS
( SELECT * REPLACE(PARSE_DATE("%Y%m%d",date) as date) FROM `project.dataset.table`)
The resulting table will have all the same columns, but the date field will be of type DATE.
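To check locally how a "%Y%m%d" format string maps GA date strings onto dates, here is a small Python analogue of PARSE_DATE and FORMAT_DATE. It is only an illustration: the sample value "20190131" and the function name are hypothetical, and Python's strptime/strftime share the % elements used here with BigQuery but are not the same implementation.

```python
from datetime import date, datetime

def parse_ga_date(value: str) -> date:
    """Local analogue of BigQuery PARSE_DATE("%Y%m%d", value)."""
    return datetime.strptime(value, "%Y%m%d").date()

d = parse_ga_date("20190131")
print(d.isoformat())           # 2019-01-31, the same shape BigQuery displays for DATE
print(d.strftime("%d/%m/%Y"))  # 31/01/2019, comparable to FORMAT_DATE("%d/%m/%Y", d)
```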

How to upload datetime string into "TIME" field bigquery

I'm trying to upload a batch of data into BigQuery, and the column that fails is of type TIME.
I tried to insert a datetime into that field with the following values:
"2020-03-23T00:00:00"
"2020-03-23 00:00:00"
"2020-03-23 00:00:00 UTC"
But with all three options, the BigQuery job returns the following error:
{'reason': 'invalidQuery', 'location': 'query', 'message': 'Invalid time string "2020-03-23T00:00:00" Field: time; Value: 2020-03-23T00:00:00'}
How can I upload a datetime string into TIME column into BigQuery?
I'm using Apache Airflow to upload the data from a DAG.
None of those strings is a TIME; all of them are DATETIME or TIMESTAMP values.
To solve this problem without changing the underlying source data, make sure to load this data into a DATETIME or TIMESTAMP column (instead of TIME).
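If the column really must stay TIME, an alternative (an assumption, not part of the answer above) is to keep only the time-of-day part before loading. The sketch below discards the date, assumes any " UTC" suffix can simply be dropped, and produces BigQuery's HH:MM:SS[.ffffff] TIME text form; the function name is hypothetical.

```python
from datetime import datetime

def to_bigquery_time(value: str) -> str:
    """Return only the time-of-day part of a datetime string, as 'HH:MM:SS[.ffffff]'."""
    # Assumption: a trailing ' UTC' carries no information we need, so strip it.
    if value.endswith(" UTC"):
        value = value[: -len(" UTC")]
    # fromisoformat accepts either 'T' or a space between the date and the time.
    return datetime.fromisoformat(value).time().isoformat()

for raw in ["2020-03-23T00:00:00", "2020-03-23 00:00:00", "2020-03-23 00:00:00 UTC"]:
    print(raw, "->", to_bigquery_time(raw))
```

Loading into a DATETIME or TIMESTAMP column, as suggested above, keeps the date and avoids this preprocessing altogether.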

Hive table date column value conversion

In a Hive table, the values of one column look like 01/12/17, but I need them in the format 12-2017 (month-year). How can I convert them?
Convert the string to a unix_timestamp and output the required format using from_unixtime.
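select from_unixtime(unix_timestamp(col_name,'dd/MM/yy'),'MM-yyyy')
(The input pattern is day first, dd/MM/yy, because 01/12/17 must become 12-2017; with MM/dd/yy it would become 01-2017.)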
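The same parse-then-format idea, sketched in Python for a quick local check of the patterns. The day-first reading of 01/12/17 is inferred from the desired output 12-2017, and the function name is hypothetical.

```python
from datetime import datetime

def to_month_year(value: str) -> str:
    """Convert 'dd/MM/yy' (e.g. '01/12/17') to 'MM-yyyy' (e.g. '12-2017')."""
    # %y maps two-digit years 00-68 to 2000-2068, so '17' becomes 2017.
    return datetime.strptime(value, "%d/%m/%y").strftime("%m-%Y")

print(to_month_year("01/12/17"))  # 12-2017
```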

db2 timestamp format not loading

I am trying to bulk-load data with timestamps into DB2 UDB version 9 and it is not working out.
My target table has been created thusly
CREATE TABLE SHORT
(
ACCOUNT_NUMBER INTEGER
, DATA_UPDATE_DATE TIMESTAMP DEFAULT current timestamp
, SCHEDULE_DATE TIMESTAMP DEFAULT current timestamp
...
)
This is the code for my bulk load
load from /tmp/short.csv of del modified by COLDEL, savecount 500 messages /users/chris/DATA/short.msg INSERT INTO SHORT NONRECOVERABLE DATA BUFFER 500
No rows are loaded. I get an error
SQL0180N The syntax of the string representation of a datetime value is incorrect. SQLSTATE=22007
I tried forcing the timestamp format when I create my table thusly:
CREATE TABLE SHORT
(
ACCOUNT_NUMBER INTEGER
, DATA_UPDATE_DATE TIMESTAMP DEFAULT 'YYYY-MM-DD HH24:MI:SS'
, SCHEDULE_DATE TIMESTAMP DEFAULT 'YYYY-MM-DD HH24:MI:SS'
...
)
but to no avail: SQL0574N DEFAULT value or IDENTITY attribute value is not valid for column. The timestamp format in my source data file looks like this:
2002/06/18 17:11:02.000
How can I format my timestamp columns?
Use the TIMESTAMPFORMAT file type modifier for the LOAD utility to specify the format present in your data files.
load from /tmp/short.csv
of del
modified by COLDEL, TIMESTAMPFORMAT="YYYY/MM/DD HH:MM:SS.UUUUUU"
savecount 500
messages /users/chris/DATA/short.msg
INSERT INTO SHORT
NONRECOVERABLE
It almost looks like ISO to me.
If you can edit the data file and safely change the slashes to hyphens (and perhaps clip the decimal fraction), and DB2 accepts ISO, you're home.
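The preprocessing route from the last suggestion could look roughly like this in Python. It assumes the timestamp sits in a known column of short.csv (the index below is hypothetical), rewrites 2002/06/18 17:11:02.000 as 2002-06-18 17:11:02 (slashes to hyphens, fraction clipped), and leaves open whether your DB2 version accepts that form. The TIMESTAMPFORMAT modifier avoids the need to preprocess at all.

```python
import csv
import io
from datetime import datetime

TIMESTAMP_COLUMN = 1  # hypothetical: index of the timestamp field in short.csv

def to_iso_like(value: str) -> str:
    """'2002/06/18 17:11:02.000' -> '2002-06-18 17:11:02' (fraction clipped)."""
    parsed = datetime.strptime(value, "%Y/%m/%d %H:%M:%S.%f")
    return parsed.strftime("%Y-%m-%d %H:%M:%S")

def rewrite_csv(src, dst, column=TIMESTAMP_COLUMN):
    """Copy CSV rows from src to dst, rewriting one timestamp column."""
    writer = csv.writer(dst, lineterminator="\n")
    for row in csv.reader(src):
        row[column] = to_iso_like(row[column])
        writer.writerow(row)

# Example with an in-memory file; real code would open the source and target files.
src = io.StringIO("42,2002/06/18 17:11:02.000\n")
dst = io.StringIO()
rewrite_csv(src, dst)
print(dst.getvalue(), end="")
```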