How can I change a date field from String to Date or DateTime? - google-bigquery

I am using Google BigQuery and I have a field named 'AsOfDate', which is set to the STRING data type. I have a bunch of data in this field, which I really want as DATETIME or just DATE; either is fine. I Googled for a solution and thought this would be pretty easy to do, but I can't seem to get the data type updated. I don't want to run a simple SELECT statement; I want to permanently change the schema. Has anyone run into this and figured out how to do it? If so, please share your insights. Thanks!

To quote directly from the official documentation: 'Changing a column's data type is not supported by the BigQuery web UI, the command-line tool, or the API.'
https://cloud.google.com/bigquery/docs/manually-changing-schemas#changing_a_columns_data_type
There are two ways to manually change a column's data type:
Using a SQL query — Choose this option if you are more concerned about simplicity and ease of use, and you are less concerned about costs.
Recreating the table — Choose this option if you are more concerned about costs, and you are less concerned about simplicity and ease of use.
You could use either of the approaches above along with the PARSE_DATE() function to transform your string into a date field.
https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#parse_date
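For example, here is a minimal sketch of the SQL-query approach (the dataset/table names and the '%Y-%m-%d' format are assumptions, so adjust them to your data):

-- Rewrites the table in place, permanently changing AsOfDate to DATE.
CREATE OR REPLACE TABLE mydataset.mytable AS
SELECT * EXCEPT (AsOfDate),
       PARSE_DATE('%Y-%m-%d', AsOfDate) AS AsOfDate
  FROM mydataset.mytable;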

Related

How to handle potential data loss when performing comparisons across data types in different groups

Background:
Our group is going through a Cloudera upgrade to 6.1.1 and I have been tasked with determining how to handle the loss of the implicit data type conversion across data types. See link below for the relevant Release Note details.
https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_611_incompatible_changes.html#hive_union_all_returns_incorrect_data
Not only does this issue affect UNION ALL queries, but there is also a function that performs comparisons on columns of different data types (e.g., STRING to BIGINT).
The group has decided that we do not want to change the underlying table metadata. The solution is therefore to accept potential data loss and use the CAST() function to convert the data. In the case of UNION ALL, we cast to the destination table's metadata. But for comparisons, I am trying to determine the simplest and easiest way to perform them without getting erroneous results.
Question:
Can I simply cast everything to either STRING or VARCHAR() when performing the comparison? Are there any potential problems that might create incorrect results?
Update:
If there are problems with this approach, is there a correct solution to handle this?
Note: this is my first engagement working with Hadoop/HIVE and I have learned that everything I know in RDBMS land does not always apply.
It is possible that you will have problems. For instance, if comparing a string to an int, then:
'1.00' = 1 --> true, because the values are compared as numbers
But as strings:
'1.00' = '1' --> false, because the values are compared as strings
You can get similar issues with dates, I think.
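If you need the comparison to behave numerically, one hedged approach (the table and column names here are made up) is to cast both sides to a numeric type, so unparseable strings become NULL instead of silently matching:

-- Hive: both sides cast to DECIMAL, so '1.00' and 1 compare as equal numbers;
-- strings that fail to parse become NULL and drop out of the comparison.
SELECT s.id
  FROM string_side s
  JOIN bigint_side b
    ON CAST(s.code AS DECIMAL(20,2)) = CAST(b.code AS DECIMAL(20,2));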

Storing an ISO 8601 string in ActiveRecord as string or datetime?

I'm trying to write a schema for an ActiveRecord object.
I've decided to use the ISO 8601 format throughout my application, including for external API requests.
Should the column be a string or datetime?
Is there any performance impact or distinction between the two?
Storing the date in the database as a date or datetime means you can use date functions, like comparing dates, right in the database. It also gives you the freedom to present the date in whichever format you choose, making it easy to adapt if the formatting requirements change in the future, without having to touch the database.
Storing the date as a string removes all of these advantages. You can no longer use the database's date functions, and if you decide to use another format (maybe in a newer version of the API, or for mobile apps, etc.), you will need to parse the string back into a date/datetime object, which is not very appealing.
As a general good practice: the way you store data should be agnostic to the way you present it, when possible.
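As a quick illustration (a minimal sketch; the table and column names are hypothetical, and PostgreSQL syntax is assumed):

-- Range comparisons work directly on a datetime column:
SELECT *
  FROM events
 WHERE created_at >= NOW() - INTERVAL '7 days';
-- With a string column you would have to parse it on every query, e.g.:
-- WHERE CAST(created_at AS timestamp) >= NOW() - INTERVAL '7 days'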

Import PostgreSQL dump into SQL Server - data type errors

I have some data which was dumped from a PostgreSQL database (allegedly, using pg_dump) which needs to get imported into SQL Server.
While the data types are OK, I am running into an issue where there seems to be a placeholder for NULL: I see a backslash followed by an uppercase N (\N) in many fields. Viewed from within Excel, the left column of the data has a Boolean data type, and the right one an integer.
Some of these are supposed to be of the Boolean datatype, and having two characters in there is most certainly not going to fly.
Here's what I tried so far:
Import via dirty read, keeping whatever data types SSIS decided each field had; to no avail: there were truncation error messages on all of the Boolean fields.
Creating a table for the data based on the correct data types. This was more fun: I needed to do the same as in the dirty read, as the source would otherwise not load properly, and I also needed to transform the data into the correct data type for insertion into the destination. Yet I am still getting truncation issues when there most certainly shouldn't be any.
Here is a sample expression in my derived column transformation editor:
(DT_BOOL)REPLACE(observation,"\\N","")
The data type should be Boolean.
Any suggestion would be really helpful!
Thanks!
Since I was unable to circumvent the SSIS rules in order to get my data into my tables without an error, I took the quick-and-dirty approach.
The solution which worked for me was to have the source read each column as if it were a string, and to give every field in the destination table the data type VARCHAR. This destination table is used as a staging table; once the data is in SQL Server, I can manipulate it as needed.
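From there, converting out of the staging table might look like this (a hedged sketch: the table and column names are made up, and it assumes pg_dump wrote '\N' for NULL and t/f for booleans):

-- Turn the '\N' placeholders into real NULLs, then cast to the target types.
INSERT INTO dbo.target (is_active, qty)
SELECT CASE NULLIF(is_active, '\N') WHEN 't' THEN 1 WHEN 'f' THEN 0 END,
       CAST(NULLIF(qty, '\N') AS INT)
  FROM dbo.staging;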
Thank you @cha for your input.

How to change mileage representation forms in SQL

I would like to change the manner in which the mileage is represented in the database. For example, right now the mileage is represented as 080+0.348; this would mean that this particular feature is at mileage point 80.348 along the roadway corridor. I would like to have the data represented in the database in the latter form, 80.348 and so on. This would save me from having to export the dataset to excel for the conversion. Is this even possible? The name of the column is NRLG_MILEPOINT.
Much appreciated.
One thing you could try is to pick the string value apart into its component pieces and then recombine them as a number. If your data is in a table called TEST you might do something like the following:
select miles, fraction,
       nvl(to_number(miles), 0) + nvl(to_number(fraction), 0) as milepoint
  from (select regexp_substr(nrlg_milepoint, '[0-9]*') as miles,
               regexp_substr(nrlg_milepoint, '[+-][0-9.]*') as fraction
          from test);
Share and enjoy.
Using the answer provided above, I was able to expand it to get exactly the answer I needed. Thanks a ton to everyone who helped! Here is the query I ended up with:
select distinct nrlg_dept_route, corridor_code_rb, nrlg_county, next_county,
       nvl(to_number(miles), 0) + nvl(to_number(fraction), 0) as milepoint
  from (select regexp_substr(nrlg_milepoint, '[0-9]*') as miles,
               nrlg_milepoint as nrlg_mile_point,
               nrlg_dept_route as nrlg_dept_route,
               nrlg_county as nrlg_county,
               next_county as next_county,
               corridor_code_rb as corridor_code_rb,
               corridor_code as corridor_code,
               regexp_substr(nrlg_milepoint, '[+-][0-9.]*') as fraction
          from corridor_county_intersect, south_van_data_view)
 where nrlg_dept_route = corridor_code
 order by 1, 5;
There are a variety of ways to do this. Which one is right depends on your situation, how the data needs to be stored, and how it is being interacted with. Some of the options:
1. Changing the datatype. This would potentially require you to change how the data is being stored currently; the conversion would have to be done by whatever is writing the data to the schema currently.
2. Creating another column that stores the data in the correct format. If changing the datatype of NRLG_MILEPOINT would break an existing means of storing the data, or you have a requirement to store it in that format, you could add another column, say NRLG_MILEAGE_DISPLAY, with a NUMBER datatype perhaps, and store the converted value there. You could make a trigger that updates/inserts NRLG_MILEAGE_DISPLAY appropriately, based on the data in NRLG_MILEPOINT.
3. Converting the value in your query. If you just want the data displayed differently in your SELECT statement, you can convert it in the SQL statement itself; specifically how depends on the current datatype of NRLG_MILEPOINT.
Assuming that VARCHAR2 is the type, based on the comments, a crude example of option 3 is sketched below. Your usage may vary depending on the actual datatype of NRLG_MILEPOINT, but regardless of its datatype there is a means of converting how it is displayed in your query. You could take this further and create a view if you needed to: as an inline view or as a stored view, you can then use the converted value for doing your join later.
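For instance (a crude, hypothetical sketch: the view name is made up, and NRLG_MILEPOINT is assumed to hold strings like '080+0.348'):

-- Reuses the regexp split from the answer above to expose a numeric
-- milepoint without touching the stored column.
create or replace view v_milepoints as
select t.*,
       nvl(to_number(regexp_substr(nrlg_milepoint, '[0-9]*')), 0)
     + nvl(to_number(regexp_substr(nrlg_milepoint, '[+-][0-9.]*')), 0)
       as nrlg_milepoint_num
  from corridor_county_intersect t;

Querying the view then yields 80.348 for '080+0.348', with no change to the stored data.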

Why do SQLiteStudio (and others) not display a datetime in human-readable format by default?

Today I had to use a SQLite database for the first time and I really wondered about the display of a DATETIME column like 1411111200. Of course, internally it has to be stored as some integer value to be able to do math with it. But who wants to see that in a grid output, which is clearly for human eyes?
I even tried two programs, SQLiteStudio and SQLite Manager, and neither even has an option to change this (at least I couldn't find it).
Of course with my knowledge about SQL it didn't take long to find out what the values mean - this query displays it like I expected:
select datetime(timestamp, 'unixepoch', 'localtime'), * from MyTable
But that's very uncomfortable when working with a GUI tool. So why? Just because? Unix nerds? Or did I just get the wrong impression because I accidentally picked the only two tools that are bad?
(I also appreciate comments on which tools to use or where I can find the hidden settings.)
Probably because SQLite doesn't have a first-class date type — how would a GUI tool know which columns are supposed to contain dates?
The question implies that a column of datatype DATETIME can only hold valid datetimes. But that's not true in SQLite: you can put in any number or string value, and it will be stored and displayed as it is.
To find out what the most "natural" way for a timestamp in SQLite would be, I created a table like this:
CREATE TABLE test ( timestamp DATETIME DEFAULT ( CURRENT_TIMESTAMP ) );
The result is a display in human-readable format (2014-09-22 10:56:07)! But in fact it is saved as a string, and I cannot imagine any serious software developer liking that. Any comments?
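You can confirm what actually got stored; a quick sketch in the sqlite3 shell:

INSERT INTO test DEFAULT VALUES;
SELECT timestamp, typeof(timestamp) FROM test;
-- 2014-09-22 10:56:07|text

The typeof() result of 'text' confirms that the DATETIME column is holding a plain string.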
The original database from the question has datetimes as unixepoch values not because of its table definition, but because the inserted data was like that. And that was probably the best available option.
So the answer is: those tools cannot display the datetime in human-readable format because they cannot know how it was encoded. It could be the number of seconds since 1970 or anything else, and it could even differ from row to row. What a mess.
From Wikipedia:
A common criticism is that SQLite's type system lacks the data integrity mechanism provided by statically typed columns in other products. [...] However, it can be implemented with constraints like CHECK(typeof(x)='integer').
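A minimal sketch of that CHECK idea, assuming unixepoch seconds are what you want to store (the table and column names are made up):

CREATE TABLE events (
  ts INTEGER NOT NULL CHECK (typeof(ts) = 'integer')
);
INSERT INTO events VALUES (CAST(strftime('%s', 'now') AS INTEGER));   -- OK
-- INSERT INTO events VALUES ('not a date');                          -- rejected by the CHECK
SELECT datetime(ts, 'unixepoch', 'localtime') FROM events;            -- human-readable again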
From the authors:
[...] most other SQL database engines are statically typed and so some people feel that the use of manifest typing is a bug in SQLite. But the authors of SQLite feel very strongly that this is a feature. The use of manifest typing in SQLite is a deliberate design decision which has proven in practice to make SQLite more reliable and easier to use, especially when used in combination with dynamically typed programming languages such as Tcl and Python.