Validate dates before conversion (ISDATE() equivalent) - SQL

DB2 version is 9.7.0.7
I have a flat file and need to validate it fully prior to inserting into a production table. For analysis, I've parsed it into a table where all columns are VARCHAR.
One of the tasks is to validate dates. I need to be able to locate the specific invalid dates, to report on the scope (frequency) and solution (reason).
I use ISDATE() in Sybase and SQL Server, which returns 1 for a valid date and 0 for an invalid date. In Teradata, I left join to the SYS_CALENDAR table in the system catalog. It's been about 15 years since I was last in a DB2 environment, but I believe analogs to either do not exist.
In this DB2 environment my role is limited to QA, meaning I cannot create stored procedures or UDFs.
This thread is clever and makes me think there may be some Common Table Expression logic that could be employed in a query:
ISDATE equivalent of DB2
That one falls short as a solution, however, because it only considers format - the presence of an invalid (but properly formatted) date like '2016-04-31' or '2016-02-30' will raise an error and the query will return no rows.
I need to return all rows, identifying if each is valid or invalid (or just return the invalid rows for investigation, even) - so doing a CAST or CONVERT, or inserting into a formatted table in a test environment won't work.
Is there an analog to ISDATE(), SYS_CALENDAR, or another solution that gets to the same end product of a row-wise presentation of dates that can't be cast to DATE, prior to performing that conversion/insert?

You can do it with the PureXML extension as follows:
SELECT
  XMLCAST(XMLQUERY('string($D) castable as xs:date' PASSING mycolumn AS D) AS INT)
FROM
  mytable
which will return 1 or 0.
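For the row-wise report the question asks for, the same expression can drive a filter. A sketch, using the answer's mytable/mycolumn placeholders: xs:date enforces both the YYYY-MM-DD format and the calendar, so a string like '2016-04-31' is flagged as invalid rather than raising an error.
SELECT mycolumn
FROM mytable
-- keep only rows whose value cannot be cast to a DATE
WHERE XMLCAST(XMLQUERY('string($D) castable as xs:date' PASSING mycolumn AS D) AS INT) = 0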

Related

SQLite - TZ format to Date time only using SQL queries

I have a SQLite database with a simple table in the format of:
ID,DateText,Name
1,2020-09-01T18:57:17Z,John
2,2022-12-01T04:00:09Z,Laurel
...
The DateText column is declared as TEXT in the "create table" statement. Using only SQL, I need to:
1. Create a new column with the DateText data.
2. Obtain the "oldest" date.
3. Obtain the "newest" date.
Note that I need to resolve this with a SQL query. I cannot read the data into a programming language, parse it, and update the table; everything must happen in SQL. For example, SQL Server DateTime with timezone covers the opposite conversion, but it relies on node.js, which I cannot use.
You can get the oldest and newest using min() and max():
SELECT ID, min(DateText), Name FROM YourTable; -- Oldest
SELECT ID, max(DateText), Name FROM YourTable; -- Newest
The nice thing about ISO-8601 date and time format strings is that they sort lexicographically without having to do anything special with them.
These queries would give an error on most SQL database engines because they mix non-grouped columns with an aggregate function, but SQLite explicitly returns the row that goes with the minimum or maximum column value. And of course if you don't want the other columns, just leave them out of the SELECT.
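For the new-column part of the question, SQLite's datetime() function understands the ISO-8601 'Z' suffix, so a sketch along these lines may work (DateTimeClean is a hypothetical column name):
ALTER TABLE YourTable ADD COLUMN DateTimeClean TEXT;
-- '2020-09-01T18:57:17Z' becomes '2020-09-01 18:57:17'
UPDATE YourTable SET DateTimeClean = datetime(DateText);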

SSIS: Creating a variable expression but it is throwing error "DT_WSTR" and "DT_DATE" are incompatible

I'm creating an incremental load that will pull data from Oracle to SQL Server. The incremental load will be based on a MODIFIED_DATE column.
I have created a result set variable that stores the MAX MODIFIED_DATE from the destination table, so the engine will only check the rows whose MODIFIED_DATE is greater than the variable and perform a lookup to see if the row needs to be added, updated or deleted.
So I have my MAX MODIFIED_DATE result set, and I have also created another variable to hold the SOURCE QUERY, which has a WHERE clause checking whether the MODIFIED_DATE column is greater than the MAX MODIFIED_DATE variable.
Example:
Select column_name,column_name
From table
Where modified_date > '" + @[User::LastModifiedDate] + "'"
It throws this error:
The data types "DT_WSTR" and "DT_DATE" are incompatible for binary operator "+". The operand types could not be implicitly cast into compatible types for the operation. To perform this operation, one or both operands need to be explicitly cast with a cast operator.
Now, I have done a ton of searching, but I can't seem to find a way to do this. The only solution I found online is to add a (DT_WSTR, 25) cast in front of the variable, which is the only way I can get the variable expression to evaluate.
Example:
(DT_WSTR, 25) @[User::LastModifiedDate] + "'"
When I run it, Oracle tells me it is NOT A VALID MONTH (ORA-01843).
The MODIFIED_DATE column in the DESTINATION table is in SQL Server, and its data type is DATETIME, which reads like this:
2008-06-10 22:22:25.000
YYYY-MM-DD HH:MM:SS.mmm
The MODIFIED_DATE column in the SOURCE table in Oracle reads like:
6/10/2008 10:22:25 PM
MM/DD/YYYY HH:MM:SS AM/PM
How would I be able to resolve this? Also, what do you think is the best way to perform an incremental load based on the MODIFIED_DATE column? Is my way one of the more efficient ways, or is there another route I can take?
You need to make your SSIS component call the following statement:
Select column_name,column_name
From table
Where modified_date > to_date('" + (DT_WSTR, 25) @[User::LastModifiedDate] + "', 'whateverformat')"
The problem is that you are tangling up strings versus dates. Your LastModifiedDate must be a string for the expression builder to work, but I suspect Oracle expects modified_date to be a date, so just use the to_date function with a format mask matching the string.
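Putting it together, the source-query variable's expression could look like the sketch below. The format mask and the cast length are assumptions: evaluate the expression to see the exact string the (DT_WSTR, 25) cast produces for your variable, and make the to_date mask match it.
"Select column_name, column_name From table Where modified_date > to_date('" + (DT_WSTR, 25) @[User::LastModifiedDate] + "', 'YYYY-MM-DD HH24:MI:SS')"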

When is the type of a column in a SQL query result determined?

When performing a select query against a database, the returned result will have columns of a certain type.
If you perform a simple query like
select name as FirstName
from database
then the type of the resulting FirstName column will be that of database.name.
If you perform a query like
select age*income
from database
then the resulting data type will be that of the return value from the age*income expression.
What happens when you use something like
select try_convert(float, mycolumn)
from database
where database.mycolumn has type of nvarchar. I assume that the resulting column has type of float which is decided by the return type of the first call to try_convert.
But consider this example
select coalesce(try_convert(float, mycolumn), mycolumn)
from database
which should give a column with the values of mycolumn unchanged if try_convert fails, but mycolumn as a float when/if that is possible.
Is this determination made as the first row is handled?
Or will the type always be determined by the function called independently of the data in the rows?
Is it possible to conditionally perform a conversion?
I would like to convert to float in the case where this is possible for all rows and leave unchanged in case it fails for any row.
Update 1
It seems that the answer to the first part of the question is that the column type is determined by the expression at compile time, which means that you cannot have a dynamic column type that depends on the data.
I see two workarounds for this:
Option 1
For each column, count the number of non-null rows of try_convert(float, mycolumn); if this number is 0, do not perform the conversion (see the sketch after these options). This will of course read the rows many times and might be inefficient.
Option 2
Simply repeat all columns, once without conversion and once with conversion, and then use the interesting one.
One could also perform another select statement where only columns with non-null values are included.
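A minimal sketch of the Option 1 check, with mytable/mycolumn as placeholder names. COUNT over an expression counts only non-NULL results, so it directly yields the number of convertible rows:
SELECT COUNT(TRY_CONVERT(float, mycolumn)) AS convertible_rows
FROM mytable;
-- 0 convertible rows: leave the column as nvarchar; otherwise cast it.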
Background
I have a dynamically generated pivot table with many (~200) columns, some of which have string values and others numbers.
I would like to cast all columns as float where this is possible and leave the other columns unchanged (or cast as nvarchar).
The data
The data is mostly NULL values, with some columns having text strings and other columns having numbers. There are no columns with "mixed" content.
The types are determined at compile time, not at execution. try_convert(float, ...) knows exactly the type at parse/compile time, because float here is a keyword, not a value. As for expressions like COALESCE(foo, bar), the type is similarly determined at compile time, following the rules of data type precedence already linked.
When you build your dynamic pivot you'll have to know the result type, using the same inference rules the SQL parser/compiler uses. I understand some rules are counter-intuitive; when in doubt, test it out.
For the detail oriented: some expression types can be determined at parse time, e.g. N'foo'. But most have to be resolved at compile time, when the names of tables and columns are bound to actual objects in the database, because only then is the type discovered.
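A minimal repro of that precedence rule (the derived table and its values below are hypothetical, purely for illustration): float outranks nvarchar in data type precedence, so the COALESCE from the question is typed float, and the nvarchar fallback must itself convert.
SELECT COALESCE(TRY_CONVERT(float, v), v) AS result
FROM (VALUES (N'1.5'), (N'hello')) AS t(v);
-- Fails at run time: the expression's compile-time type is float,
-- so the fallback N'hello' must also be converted to float.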

SQL Server DBLink to Oracle returns numbers as strings

I have an Oracle database containing my data and an SQL Server database getting the data from Oracle through DBLink.
Problem is: all numbers from the Oracle tables arrive at the SQL Server as nvarchar. As a result, when I try to filter the query in SQL Server with some_number_field = 0 I get:
"Conversion failed when converting the nvarchar value '3.141' to data type int."
This also happens if I try to select "some_number_field * 1" or similar expressions.
Any idea?
Today I ran into the same kind of problem. It seems that Oracle fields with datatype NUMBER are shown as nvarchar when querying through a linked server. However, NUMBER(x,y) fields are not.
E.g., colB is the NUMBER field from an Oracle view (or table).
Try this:
SELECT colA, CAST(colB AS DECIMAL(23,2)) colB
FROM OPENQUERY(LINKED_SERVER_NAME, 'select * from myView')
Note: the DECIMAL(xx,y) values depend of course on your data. Also remember: if your NUMBER column holds a repeating fraction (e.g. 33.33333333 etc.), you need to apply round() on the Oracle side, otherwise the CAST will throw an error.
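To fold that round() advice into the same query, the rounding can happen inside the pass-through query. A sketch with the answer's placeholder names:
SELECT colA, CAST(colB AS DECIMAL(23,2)) colB
FROM OPENQUERY(LINKED_SERVER_NAME, 'select colA, round(colB, 2) colB from myView')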

How to prevent CAST errors on SSIS?

The question
Is it possible to ask SSIS to cast a value and return NULL in case the cast is not allowed, instead of throwing an error?
My environment
I'm using Visual Studio 2005 and Sql Server 2005 on Windows Server 2003.
The general context
Just in case you're curious, here is my use case. I have to store data coming from somewhere in a generic table (key/value structure with history) which contains values that can be strings, numbers or dates. The structure is something like this:
table Values {
    Id int,
    Date datetime, -- for history
    Key nvarchar(50) not null,
    Value nvarchar(50),
    DateValue datetime,
    NumberValue numeric(19,9)
}
I want to put the raw value in the Value column and try to put the same value
in the DateValue column when I'm able to cast it to a datetime
in the NumberValue column when I'm able to cast it to a number
Those two typed columns would make all sorts of aggregation and manipulation much easier and faster later.
That's it, now you know why I'm asking this strange question.
Thanks in advance for your help.
You could also try a Derived Column component: either test the value of the potential date/number field, or simply cast it and redirect any rows that error, treating those as the NULL values for these two fields.
(1) If you simply cast the field every time with a statement like this in the Derived Column component: (DT_DATE)[MYPOTENTIALDATE] - you can redirect the rows that fail this cast and manipulate the data from there.
OR
(2) You can do something like this in the Derived Column component: ISNULL([MYPOTENTIALDATE]) ? (DT_DATE)"2099-01-01" : (DT_DATE)[MYPOTENTIALDATE]. I generally send through 2099-01-01 when a date is NULL rather than messing with NULL (works better with Cubes, etc.).
Of course (2) won't work if the [MYPOTENTIALDATE] field comes through as anything other than a DATETIME or NULL, e.g., sometimes it is a word like "hello".
Those are the options I would explore, good luck!
In dealing with this same sort of thing, I found the error handling in SSIS was not specific enough. My approach has been to create an errors table, query the source table where the data is stored as varchar, and log errors to the errors table with something like the statement below. I have one of these statements for each column, because it was important for me to know which column failed. Then, after I log all errors, I do an INSERT where I select those records in SomeInfo that do not have any errors. In your case you could do more advanced things based on the ColumnName in the errors table to insert default values.
INSERT INTO SomeInfoErrors
([SomeInfoId]
,[ColumnName]
,[Message]
,FailedValue)
SELECT
SomeInfoId,
'PeriodStartDate',
'PeriodStartDate must be in the format MM/DD/YYYY',
PeriodStartDate
FROM
SomeInfo
WHERE
ISDATE(PeriodStartDate) = 0 AND [PeriodStartDate] IS NOT NULL;
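The follow-up insert of the clean rows mentioned above could look something like this sketch (SomeInfoClean is a hypothetical name for the typed destination table, and the column list is abbreviated):
INSERT INTO SomeInfoClean (SomeInfoId, PeriodStartDate)
SELECT s.SomeInfoId, CONVERT(datetime, s.PeriodStartDate)
FROM SomeInfo s
-- keep only rows that logged no error in any column
WHERE NOT EXISTS (SELECT 1 FROM SomeInfoErrors e WHERE e.SomeInfoId = s.SomeInfoId);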
Try using a conditional split: have the records where the data is a date go along one path, and the others go along a different path where they are updated to NULL before being inserted.