I'm trying to upload a bunch of data into BigQuery, and the column that fails is of type "TIME".
I tried to insert a datetime into that field with the following values:
"2020-03-23T00:00:00"
"2020-03-23 00:00:00"
"2020-03-23 00:00:00 UTC"
But with all three options, the BigQuery job returns the following answer:
{'reason': 'invalidQuery', 'location': 'query', 'message': 'Invalid time string "2020-03-23T00:00:00" Field: time; Value: 2020-03-23T00:00:00'}
How can I upload a datetime string into a TIME column in BigQuery?
I'm using Apache Airflow to upload the data from a DAG.
None of those strings is a TIME value; all of them are DATETIME or TIMESTAMP values.
To solve this without changing the underlying source data, load the data into a DATETIME or TIMESTAMP column (instead of TIME).
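A minimal sketch of what that looks like with the google-cloud-bigquery client; the destination table, source URI, and source format here are only examples, so adapt them to whatever your Airflow operator actually loads:

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    schema=[
        # DATETIME (or TIMESTAMP) accepts strings like "2020-03-23T00:00:00";
        # a TIME column only accepts time-of-day values such as "00:00:00".
        bigquery.SchemaField("time", "DATETIME"),
    ],
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,  # example format
)
client.load_table_from_uri(
    "gs://your-bucket/your-file.json",        # hypothetical source URI
    "your_project.your_dataset.your_table",   # hypothetical destination table
    job_config=job_config,
).result()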
I need to upload a CSV file into a BigQuery table. My question is: if "datestamp_column" is a STRING, how can I use it as the partition field?
An example value from "datestamp_column": 2022-11-25T12:56:48.926500Z
from google.cloud import bigquery

def upload_to_bq():
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        schema=client.schema_from_json("schemaa.json"),
        skip_leading_rows=1,
        time_partitioning=bigquery.TimePartitioning(
            type_=bigquery.TimePartitioningType.DAY,
            field="datestamp_column",
            expiration_ms=7776000000,  # 90 days.
        ),
    )
This is failing because it complains that datestamp_column is a STRING and should be TIMESTAMP, DATE, or DATETIME.
To be able to partition on the datestamp_column field, you have to use TIMESTAMP, DATE, or DATETIME.
The format you indicated for this field corresponds to a timestamp: datestamp_column: 2022-11-25T12:56:48.926500Z
In your BigQuery schema, you have to change the column type from STRING to TIMESTAMP for the datestamp_column field.
Then your ingestion should work correctly, because a value in the format 2022-11-25T12:56:48.926500Z can be ingested as a TIMESTAMP in BigQuery.
Input: a STRING with value 2022-11-25T12:56:48.926500Z
Result: the column in BigQuery is a TIMESTAMP
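If you prefer to declare the schema in code rather than in schemaa.json, a sketch of the same idea could look like this (same field and partitioning settings as your snippet; only the column type changes):

from google.cloud import bigquery

job_config = bigquery.LoadJobConfig(
    schema=[
        # TIMESTAMP instead of STRING, so the field is valid for time partitioning.
        bigquery.SchemaField("datestamp_column", "TIMESTAMP"),
        # ... your other columns ...
    ],
    skip_leading_rows=1,
    time_partitioning=bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="datestamp_column",
        expiration_ms=7776000000,  # 90 days.
    ),
)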
I have a date field and a time field in a Spark data frame. Both are String type columns.
I want to combine the date and time and later store the result in a Hive column of Timestamp type.
e.g. a date like 2018-09-10 and a time like 140554 (HHmmss), or sometimes 95200 (HmmSS).
When I use concat(col("datefield"), lpad(col("timefield"), 6, "0")), I get the result as a String,
2018-09-10140554, but when I later convert it to Timestamp type it does not work properly.
I tried the steps below, but did not get the correct result. I just want to combine the date and time string columns and make the result a timestamp type.
to_timestamp with format YYYY-MM-ddHHMMSS
from_unixtime(unix_timestamp)
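For what it's worth, here is a PySpark sketch of the combination, assuming the columns are named datefield and timefield as above; note that minutes and seconds use lowercase mm/ss in the pattern, not MM/SS:

from pyspark.sql import functions as F

# Zero-pad the time (95200 -> 095200), concatenate it with the date,
# then parse with a pattern that matches the concatenated string exactly.
df = df.withColumn(
    "event_ts",  # hypothetical name for the combined timestamp column
    F.to_timestamp(
        F.concat(F.col("datefield"), F.lpad(F.col("timefield"), 6, "0")),
        "yyyy-MM-ddHHmmss",
    ),
)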
I'm working with Data Factory this time, which is why I'm asking a lot of questions about it.
My new problem: my SOURCE (a CSV file) contains a column DeleveryDate full of dates in dd/MM/YYYY format, and in my SQL table I specified DeleveryDate as DateTime. But when I map between source and sink, the source data preview shows duplicate columns, like in the picture below, while in the sink data preview the column is always NULL, and the same in my table: NULL.
Thanks
You said the column DeleveryDate is full of dates in dd/MM/YYYY format, so can you tell me why the column DeleveryDate has values like '3' and '1' in your screenshot? The strings '3' and '1' are not date strings in dd/MM/YYYY format.
If you want to do some data conversion in Data Factory, I still suggest you learn more about Data Flow.
For now, we cannot convert the date format from dd/MM/YYYY to the datetime format yyyy-MM-dd HH:mm:ss.SSS directly; we must do some intermediate conversion.
As shown below, I have a CSV file containing a column of dd/MM/YYYY date strings, and I am again using DerivedColumn this time:
Add DerivedColumn:
First, use the expression below to substring and reorder dd/MM/YYYY into YYYY-MM-dd:
substring(Column_2, 7, 4)+'-'+substring(Column_2, 4, 2)+'-'+substring(Column_2, 1,2)
Then use toTimestamp() to convert it:
toTimestamp(substring(Column_2, 7, 4)+'-'+substring(Column_2, 4, 2)+'-'+substring(Column_2, 1,2), 'yyyy-MM-dd')
Sink settings and preview:
My Sink table column tt data type is datetime:
Execute the pipeline:
Check the data in sink table:
Hope this helps.
Please try this; it's a trick that was a blocker for me:
Go to sink
Mapping
Click on output format
Select the date or time format you prefer for storing the data in the sink.
I have a table with a column timestamp of type TIMESTAMP in BigQuery. When I display it on my console, I can see timestamps as follows: 2015-10-19 21:25:35 UTC
I then download my table using the BigQuery API, and when I display the result of the query, I notice that this timestamp has been converted into some kind of very big number like 1.445289935E9.
Therefore, in order to load this table as a pandas.DataFrame, I have to convert it back to a pandas-compatible timestamp. How can I do that? In other words, what numpy or pandas dtype should I use in my pandas.read_csv to load my BigQuery timestamp as a numpy/pandas timestamp?
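One possible approach, assuming the exported value is Unix epoch seconds (which is what 1.445289935E9 looks like) and the column is called timestamp; the file name here is hypothetical:

import pandas as pd

# Read the exported CSV, then convert the epoch-seconds column to a
# timezone-aware pandas datetime (dtype datetime64[ns, UTC]).
df = pd.read_csv("bigquery_export.csv")  # hypothetical file name
df["timestamp"] = pd.to_datetime(df["timestamp"], unit="s", utc=True)
# 1.445289935E9 -> 2015-10-19 21:25:35+00:00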
I have a JSON dataset in HDFS that contains a timestamp and a count. The raw data looks like this:
{"timestamp": "2015-03-01T00:00:00+00:00", "metric": 23}
{"timestamp": "2015-03-01T00:00:01+00:00", "metric": 17}
...
The format of the timestamp almost matches the Hive-friendly 'yyyy-mm-dd hh:mm:ss' format but has a couple of differences: there's a 'T' between the date and time. There's also a timezone offset. For example, a timestamp might be 2015-03-01T00:00:00+00:00 instead of 2015-03-01 00:00:00.
I'm able to create a table, providing that I treat the timestamp column as a string:
add jar hdfs:///apps/hive/jars/hive-json-serde-0.2.jar;
CREATE EXTERNAL TABLE `log`(
`timestamp` string,
`metric` bigint)
ROW FORMAT SERDE "org.apache.hadoop.hive.contrib.serde2.JsonSerde" WITH SERDEPROPERTIES ("timestamp"="$.timestamp", "metric"="$.metric")
LOCATION 'hdfs://path/to/my/data';
This isn't ideal since, by treating it like a string, we lose the ability to use timestamp functions (e.g. DATE_DIFF, DATE_ADD, etc...) without casting from within the query. A possible workaround would be to CTAS and CAST the timestamp using a regular expression, but this entails copying the data into its new format. This seems inefficient and not in the spirit of 'schema-on-read'.
Is there a way to create a schema for this data without processing the data twice (i.e. once to load, once to convert the timestamp to a true timestamp)?
You will need to decide whether to:
do CTAS as you described
push the conversion work/logic into the consumers/clients of the table
For the second option, this means including the string-to-timestamp conversion in the SQL statements executed against your external table.
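As a purely illustrative sketch of that second option, this is what a Python consumer of the table might do; PyHive is just one possible client and the host is hypothetical, and the only essential part is the inline conversion in the query, which assumes the offset is always +00:00 so it can simply be dropped:

from pyhive import hive  # one possible Hive client, used here only for illustration

conn = hive.connect(host="your-hiveserver2-host", port=10000)  # hypothetical host
cursor = conn.cursor()
# substr(..., 1, 19) drops the '+00:00' offset (assumed to always be UTC),
# regexp_replace swaps the 'T' separator for a space, and the CAST then
# yields a real Hive TIMESTAMP that DATE_DIFF, DATE_ADD, etc. can work with.
cursor.execute("""
    SELECT
      CAST(regexp_replace(substr(`timestamp`, 1, 19), 'T', ' ') AS timestamp) AS ts,
      metric
    FROM log
""")
rows = cursor.fetchall()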