Importing CSV file but getting timestamp error - sql

I'm trying to import CSV files into BigQuery, and on any of the hourly reports I attempt to upload it gives this error:
Error while reading data, error message: Could not parse 4/12/2016 12:00:00 AM as TIMESTAMP for field SleepDay (position 1) starting at location 65 with message Invalid time zone: AM
I get that the parser is treating AM as a time zone and that's causing the error, but I'm not sure how best to work around it. All of the hourly entries have AM or PM after the date-time, and that will be thousands of entries.
I'm using schema autodetect, and I believe that's where the issue is coming from, but I'm not sure what to put in the "Edit as text" schema option to fix it.

To successfully parse an imported string as a TIMESTAMP in BigQuery, the string must be in the ISO 8601 format:
YYYY-MM-DDThh:mm:ss.sss
If your source data is not available in this format, then try the approach below.
1. Import the CSV into a temporary table, providing an explicit schema in which the timestamp fields are typed as STRING (a sketch of this step follows the query below).
2. Select the data from the temporary table, convert it with the BigQuery PARSE_TIMESTAMP function as shown below, and write it to the permanent table.
INSERT INTO `example_project.example_dataset.permanent_table`
SELECT
  PARSE_TIMESTAMP('%m/%d/%Y %I:%M:%S %p', time_stamp) AS time_stamp,
  value
FROM `example_project.example_dataset.temporary_table`;
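For step 1, one option is BigQuery's LOAD DATA statement with an explicit schema that keeps the timestamp column as a STRING. This is only a sketch: it assumes the file has been staged in Cloud Storage, and the bucket path and the columns other than SleepDay are hypothetical. The SELECT at the end simply sanity-checks the format string against the value from the error message.

-- Step 1: load the raw CSV with an explicit schema; SleepDay stays a STRING for now.
LOAD DATA INTO example_dataset.temporary_table (Id STRING, SleepDay STRING, TotalMinutesAsleep INT64)
FROM FILES (
  format = 'CSV',
  uris = ['gs://example-bucket/sleep_day.csv'],  -- hypothetical staging location
  skip_leading_rows = 1
);

-- %I (12-hour clock) together with %p handles the trailing AM/PM marker.
SELECT PARSE_TIMESTAMP('%m/%d/%Y %I:%M:%S %p', '4/12/2016 12:00:00 AM');
-- should return 2016-04-12 00:00:00 UTC (midnight, not noon)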

Related

Uploading csv file to Big Query Timestamp Parse error

I am building a Node.js script to upload CSV files to BigQuery, but I am running into a parse error:
Error while reading data, error message: Could not parse 'XXXX-X-XX XX:XX:XX' as TIMESTAMP for field date_time_received_utc (position 70) starting at location 1279 with message 'Could not parse 'XXXX-X-XX XX:XX:XX' as a timestamp. Required format is YYYY-MM-DD HH:MM[:SS[.SSSSSS]] or YYYY/MM/DD HH:MM[:SS[.SSSSSS]]'
I believe that I have it in the correct format but I am not sure what the issue is. Has anyone run into this issue?
So far I have tried removing the seconds from the timestamp but I am still having the issue.
I've run into a similar issue using Python, where it was necessary to append the time zone to the timestamp value; adding the time zone solved my problem.
You can check here for the data formats in the Node.js BigQuery API. From what I could read, it could be something with the data type; maybe it should be DATETIME.
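If it helps to see that distinction in isolation, here is an illustrative query with made-up literals (the value from your file would take their place):

-- A bare 'YYYY-MM-DD HH:MM:SS' string matches DATETIME, which carries no time zone;
-- a TIMESTAMP denotes an absolute point in time, and appending an explicit offset
-- makes that intent unambiguous.
SELECT
  CAST('2016-04-12 13:45:00' AS DATETIME)        AS as_datetime,
  CAST('2016-04-12 13:45:00+00:00' AS TIMESTAMP) AS as_timestamp;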

How to solve error gdk-05030 when importing csv data in SQL Developer

I have a problem importing a CSV file using SQL Developer. I created a table to import 'date' data by using the code below:
CREATE TABLE DEPARTMENT (
DATECOLUMN DATE
);
I imported the CSV by right clicking and so on.
Interestingly, the CSV 'date' data has a format of 'YYYY-MM-DD', but when loaded in SQL Developer (in the preview, before importing), it took the format of 'MM/DD/YYYY'.
I believe this is the problem, because when trying to import, it returned error "GDK-05030", meaning the data contains more characters than the format expects.
To me, it looks as follows:
the date format in the CSV file is yyyy-mm-dd (as you said)
when you queried the table after the insert, you found out that the values look different - mm/dd/yyyy
that is because the current session settings use that date format. If you want to change it, run
alter session set nls_date_format = 'yyyy-mm-dd';
and then select rows from the table.
Alternatively, go to SQL Developer's Preferences and set the desired date format in its [Database - NLS] section.
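For example (an illustrative sketch against the DEPARTMENT table above; the literal date is made up), explicit format masks on insert and select sidestep the NLS setting entirely, since a DATE value itself carries no display format:

-- Store with an explicit mask, then render it back with another explicit mask.
INSERT INTO DEPARTMENT (DATECOLUMN) VALUES (TO_DATE('2016-04-12', 'YYYY-MM-DD'));

SELECT TO_CHAR(DATECOLUMN, 'YYYY-MM-DD') AS DATECOLUMN FROM DEPARTMENT;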

Google BigQuery: Importing DATETIME fields using Avro format

I have a script that downloads data from an Oracle database and uploads it to Google BigQuery. This is done by writing to an Avro file, which is then uploaded directly using BigQuery's Python framework. The BigQuery tables I'm uploading the data to have predefined schemas, some of which contain DATETIME fields.
As BigQuery now supports Avro logical types, importing timestamp data is no longer a problem. However, I'm still not able to import DATETIME fields. I tried using string, but then I got the following error:
Field CHANGED has incompatible types. Configured schema: datetime; Avro file: string.
I also tried to convert the field data to timestamps on export, but that produced an internal error in BigQuery:
An internal error occurred and the request could not be completed. Error: 3144498
Is it even possible to import datetime fields using Avro?
In Avro, logical data types must include the logicalType attribute; it is possible that this attribute is not included in your schema definition.
Here are a couple of examples like the following one. For a date the underlying type is int (the timestamp logical types use long), and the logicalType annotation is nested inside the field's type:
{
  "name": "DateField",
  "type": {"type": "int", "logicalType": "date"}
}
Once the logical data type is set, try again. The documentation does indicate it should work:
Avro logical type date --> converted BigQuery data type DATE
In case you get an error, it would be helpful to check the schema of your Avro file; you can use this command to obtain its details:
java -jar avro-tools-1.9.2.jar getschema my-avro-file.avro
UPDATE
For cases where DATE alone doesn't work, consider that TIMESTAMP can store the date and time as a count of micro/nanoseconds from the Unix epoch, 1 January 1970 00:00:00.000000 UTC (UTC appears to be the default for Avro). Additionally, the values stored in an Avro file (of type DATE or TIMESTAMP) are independent of any particular time zone; in this sense they are very similar to the BigQuery TIMESTAMP data type.

In PostgreSQL, what data type do you pass to a CREATE TABLE call when dealing with timestamp values?

When creating a table, how do you deal with a timestamp in a CSV file that has the following syntax: MM/DD/YY HH:MI? Here's an example: 1/1/16 19:00
I have tried the following script in PostgreSQL:
create table timetable (
    time timestamp
);
copy timetable from '<path>' delimiter ',' CSV;
But, I receive an error message saying:
ERROR: invalid input syntax for type timestamp: "visit_datetime"
Where: COPY air_reserve, line 16, column visit_datetime: "visit_datetime"
One solution I have considered is first creating the column as char, then running a separate query that converts it to the appropriate timestamp data type with a function call like to_timestamp(time, 'MM/DD/YY HH:MI'). But I'm looking for a solution that would load the data with the correct data type in a single step.
You may find a datestyle that enables you to load the data you have, but sooner or later someone will deliver to you something that doesn't fit.
The solution you have considered is probably the best.
We use this as a standard pattern for loading data warehouses. We take today's data and load it into a staging table, using varchar columns for any data that will not load directly into its target data type. We then run whatever scripts we need to get the data into a good state, raising warnings for anything that is broken in a way we haven't seen before. Then we add the cleaned version of today's data to the table containing cleaned data for all previous days.
We don't mind if this takes several steps; we put them all in a script and run it as an automated job.
I'm working on documenting the techniques we use. You can see the beginnings of this at http://www.thedatastudio.net.
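A minimal sketch of that staging pattern for the MM/DD/YY HH:MI data above (table and column names are illustrative; HH24 is used because 19:00 is on a 24-hour clock, and HEADER skips a header row such as the 'visit_datetime' line shown in the error):

-- Stage the raw file into text columns so COPY never rejects a row.
CREATE TABLE timetable_staging (
    time_raw varchar
);

COPY timetable_staging FROM '<path>' WITH (FORMAT csv, HEADER true);

-- Convert while moving the cleaned data into the real table.
INSERT INTO timetable (time)
SELECT to_timestamp(time_raw, 'MM/DD/YY HH24:MI')
FROM timetable_staging;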

Getting error while retrieving columns on HIVE "TIMESTAMP" column

In Hive, I am trying to create a table on a log file. I have data in the following format.
1000000000012311 1373346000 21.4 XX
1000000020017331 1358488800 16.9 YY
The second field is a Unix timestamp. I am writing the following Hive query:
CREATE EXTERNAL TABLE log(user STRING, tdate TIMESTAMP, spend DOUBLE, state STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
LINES TERMINATED BY '\n'
LOCATION '/user/XXX/YYY/ZZZ';
The table is created, but when I try to get the data from the table with select * from log limit 10;
I get the following error.
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating tdate
I have checked the Hive manual and also googled it, but didn't find a solution.
For epoch values, you can define the column as BIGINT and then use the built-in UDF from_unixtime() to convert it to a string representing the date, something like select from_unixtime(tdate) from log; a sketch follows below.
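A sketch of that approach, reusing the original table layout with only tdate's type changed:

CREATE EXTERNAL TABLE log (user STRING, tdate BIGINT, spend DOUBLE, state STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
LINES TERMINATED BY '\n'
LOCATION '/user/XXX/YYY/ZZZ';

-- from_unixtime() renders the epoch seconds as a 'yyyy-MM-dd HH:mm:ss' string;
-- wrap it in CAST(... AS TIMESTAMP) if a true TIMESTAMP is needed downstream.
SELECT user, from_unixtime(tdate) AS tdate, spend, state
FROM log
LIMIT 10;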
A similar post at this link: How to create an external Hive table with a column typed Timestamp
Hive supports the TIMESTAMP data type, but when used over JDBC it could not accept Timestamp as a data type. This was a problem in earlier versions; from Hive version 0.8.0 it is fixed. You can check out the JIRA ticket raised for it:
https://issues.apache.org/jira/browse/HIVE-2957