Why does Postgres convert an array of two integers into a timestamp?

I'm working with JSON data that may contain formatted timestamps. I was converting them to proper timestamps using a cast when I came across something odd: a conversion that makes no sense:
select cast('[112,180]'::json#>>'{}' as timestamp with time zone);
Produces a result:
timestamptz
------------------------------
0112-06-28 00:00:00+00:53:28
The first number is interpreted as the year, and the second number as the day of the year, but...
I played around a bit and discovered that the first integer needs to be >= 100, and the second integer needs to be from 100 to 366. Any other values, or other array lengths, will fail.
I'm curious as to why this pattern is parsed as a timestamp.
I'd also be happy to know if there is a way to disable this behaviour.

It is parsed as a timestamp because that is what you explicitly told it to do.
It is not an array of two integers, it is the text string consisting of the sequence of characters [112,180], because that is what #>> yields.
It is parsed following the rules documented here (although it doesn't define what a token is, so that is a bit vague), specifically rule 3d followed by 3b.
Redefining date parsing sounds like a giant mess. I would think it better to write a #>> variant that throws an ERROR when the json_typeof of the value from #> is not string (if that is what you want; you said only what you wanted not to happen, not what you wanted to happen instead).
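A minimal sketch of such a wrapper (the function name and error message are my own inventions; adjust as needed):
create function json_extract_text_strict(j json, path text[]) returns text
language plpgsql as $$
begin
    -- refuse to extract anything that is not a JSON string
    if json_typeof(j #> path) is distinct from 'string' then
        raise exception 'expected a JSON string at %, got %', path, json_typeof(j #> path);
    end if;
    return j #>> path;
end $$;
-- raises an error instead of silently yielding the text '[112,180]':
select cast(json_extract_text_strict('[112,180]'::json, '{}') as timestamp with time zone);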

Related

Conversion failing for date formatting

I have an nvarchar(100) column which has a value ' 8/11/2022'.
I receive an error when trying to convert it to date...
select convert(date,[date],103)
from [Source].[TableName] s_p
--Msg 241, Level 16, State 1, Line 96
--Conversion failed when converting date and/or time from character string.
I have tried a number of different approaches, but I can't find one that gives me '08/11/2022':
select Date = REPLACE(LEFT([Date],10),' ','0')
from [Source].[TableName] s_p
--Outcome 8/11/2022
select REPLACE([DATE],' 8/','08/')
from [Source].[TableName] s_p
--Outcome 8/11/2022
select convert(nvarchar,[date],103)
from [Source].[TableName] s_p
--Outcome 8/11/2022
The strange thing is, when I copy and paste from the results grid and then do a replace, it works fine...
select REPLACE(' 8/11/2022',' 8/','08/')
--Outcome 08/11/2022
Please help me get to '08/11/2022', i.e. make any single digit have a leading 0.
Thanks, Will
Different languages and cultures have their own formatting preferences around date values. Some places like M/dd/yyyy. Some places like dd/MM/yyyy. Or perhaps d-M-YYYY (different separators and conventions around leading zeros). The point is it's not okay to go into a place and impose our own preferences and norms on that culture.
The SQL language is no different. It really is its own language, and as such has its own expectations around date handling. If you violate these expectations, you should not be surprised when there are misunderstandings as a result.
The first expectation is for date and datetime values to be stored in datetime columns. It's hard to overstate how much of a difference this can make for performance and correctness.
But let's assume that's not an option, and you have no choice but to use a string column like varchar or nvarchar. In that situation, there is still an expectation around how date values should be formatted.
Any database will do better if you use a format which stores the date parts in order of descending significance. For example, ISO-8601: yyyy-MM-ddTHH:mm:ss[.fff]. This is important to allow greater than/less than comparisons to work, it can greatly help with indexes and performance, and it makes cast/convert operations to datetime values MUCH more likely to succeed and be accurate.
For SQL Server specifically, there are three acceptable formats:
yyyy-MM-ddTHH:mm:ss[.fff],
yyyyMMdd HH:mm:ss[.fff], and
yyyyMMdd.
Anything else WILL have date values that don't parse as expected. Any string manipulation done to call the CONVERT() method should focus on reaching one of these formats.
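For example, this (a hypothetical literal) parses the same way regardless of the session's language or DATEFORMAT settings:
select convert(datetime, '2022-11-08T00:00:00');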
With that in mind, and assuming 8/11/2022 means November 8 and not August 11 (given the 103 convert format), you need something like this:
convert(datetime,
    right(ltrim([date]), charindex('/', reverse(ltrim([date]))) - 1)  -- year
    + right('0' + replace(substring(ltrim([date]), charindex('/', ltrim([date])) + 1, 2), '/', ''), 2)  -- month
    + right('0' + left(ltrim([date]), charindex('/', ltrim([date])) - 1), 2)  -- day (ltrim guards against the leading space)
)
And you can see it work here:
https://dbfiddle.uk/lM8sVySh
Yes, that's a lot of code. It's also gonna be more than a little slow. And again, the reason why it's so slow and complicated is you jumped in with your own cultural expectations and failed to respect the language of the platform you're using.
Finally, I need to question the premise. As the fiddle above shows, SQL Server is perfectly happy to convert this specific value without error. This tells me you probably have more rows, and any error is in fact coming from a different row.
With that in mind, one thing to remember is a WHERE clause condition will not necessarily run or filter a table before a CONVERT() operation in the SELECT clause. That is, if you have many different kinds of value in this table, you cannot guarantee your CONVERT() expression will only run on the date values, no matter what kind of WHERE clause you have. Databases do not guarantee order of operations in this way.
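One way to hunt for the offending rows is TRY_CONVERT (SQL Server 2012 and later), which yields NULL instead of raising an error:
select [date]
from [Source].[TableName]
where try_convert(datetime, [date], 103) is null;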
The problem could also be some invisible Unicode whitespace.
Another possibility is date formats. Most cultures that prefer a leading day, instead of month or year, tend to also strongly prefer to see the leading 0 in the first place. That the zero is missing here makes me wonder if you might have a number of dates in the column that were formatted by, say, Americans. So then you try to parse a column with values both like 02/13/2022 and 13/02/2022. Obviously those dates can't both use the same format, since there is no 13th month.
In that case, best of luck to you, because you no longer have any way to know for certain whether 2/3/2022 means March 2nd or February 3rd... and trying to guess (by say, assuming your own common format) is just exacerbating the same mistake that got you into this mess in the first place.
It's worth noting all three of these possibilities would have been avoided had you used datetime columns from the beginning.
You'll want to use LPAD to add a leading 0 to the string, then CAST() the string as a date if you want to change it to the date data type.

IBM DB2 CAST AS VARCHAR versus Python Pandas to_datetime Function

I have the line
CAST(SURGERY.DTM AS VARCHAR(30)) AS appt_dt
in a SQL file hitting an IBM DB2 database. For various reasons, I have to convert to VARCHAR, so leaving out the CAST is not an option. The problem is that this casting is choosing a very poor format. The result comes out like this: 2020-06-09-13.15.00.000000. We have the four-digit year with century, month, day of the month. So far, so good. But then there is the really bad decimal-separated 24-hour hour, minute, and then seconds with microseconds. My goal is to read these dates quickly into a pandas dataframe in Python, and I can't get pandas to parse this kind of date, presumably because it grabs the 13.15 for the hour, 00.000000 for the minute, and then has nothing left over for the seconds. It errors out. My attempt at a parser was like this:
def parser_ibm_db(date_str: str) -> pd.tslib.Timestamp:
    return pd.to_datetime(date_str, format='%Y-%m-%d-%H.%M.%S')
but it doesn't work. Neither does the infer_datetime_format option, or passing no format at all.
So here is my question: is there a way either to control the formatting of the CAST function better, or is there a way to read the result into pandas? I'd be perfectly happy with either approach.
One idea I had with the second approach was to limit the %H and %M options somehow to look at only 2 characters, but I don't know how to do that and the documentation doesn't tell me how.
A brute force method would be to read the csv data in, search for these kinds of strings, and replace the first two periods with colons. The date parser would have no trouble with that. But that would involve an extra processing step that I'd rather avoid.
Thanks for your time!
Change your format string:
dt_string = '2020-06-09-13.15.00.000000'
pd.to_datetime(dt_string, format='%Y-%m-%d-%H.%M.%S.%f')
Correctly converts the string:
Timestamp('2020-06-09 13:15:00')
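If you would rather fix this on the DB2 side instead, VARCHAR_FORMAT lets you pick the output format explicitly (a sketch; the exact format elements supported depend on your DB2 version):
VARCHAR_FORMAT(SURGERY.DTM, 'YYYY-MM-DD HH24:MI:SS') AS appt_dt
That produces strings like 2020-06-09 13:15:00, which pandas parses without a custom format string.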

Sql function to turn character field into number field

I'm importing data from one system to another. The former keys off an alphanumeric field whereas the latter requires a numeric integer field. I'd like to find or write a function that I can feed the alphanumeric value to and have it return a number that would be unique to the value passed in.
My first thought was to do a hash, but of course the result of any built-in hash is going to contain letters, plus it's technically possible (however unlikely) that a hash may not be unique.
My first question is whether there is anything built into SQL that I'm overlooking, and short of that I'd like to hear suggestions on the easiest way to implement such a function.
Here is a function which will probably convert from base 10 (integer) to base 36 (alphanumeric) and back again:
https://www.simple-talk.com/sql/t-sql-programming/numeral-systems-and-numbers-conversion-in-sql/
You might find the resultant number is too big to be held in an integer though.
You could concatenate the ascii values of each character of your string and cast the result as a bigint.
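A minimal sketch of that idea in T-SQL (the dialect, the table/column names, and the three-character key length are all assumptions; note the result overflows bigint after roughly nine characters):
select cast(concat(ascii(substring(keycol, 1, 1)),
                   ascii(substring(keycol, 2, 1)),
                   ascii(substring(keycol, 3, 1))) as bigint)  -- 'AB1' -> 656649
from SourceTable;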
If the original data is known to be integers you can use cast:
SELECT CAST(varcharcol AS INT) FROM Table

Date handling in sqlite3 is confusing

I am from a non-database background. I have created a table with one of the fields of data type TEXT:
dateF TEXT
INSERTION:
I have inserted three records with the values:
'842-2-4'
'842-2-5'
'842-2-6'
SELECTION:
Trying to get the records based on date now:
... where dateF between '842-2-4' and '842-2-10'
It fails.
Whereas,
... where dateF between '842-2-4' and '842-2-8'
retrieves all the 3 records.
What am I doing wrong? Why does the first statement fail?
Kindly suggest.
Because you are comparing strings not dates. The computer has no idea these are dates. You have to either store as date and do a date comparison or implement your own logic to analyze strings.
Simply put, it is looking at the 1 in 10 as being less than your values rather than 10 being more. It's string comparison, not date.
Although sqlite doesn't support date types, it does have functions for dealing with them. See here:
http://www.sqlite.org/lang_datefunc.html
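For example, date arithmetic works once the values are stored in the 'YYYY-MM-DD' format (a sketch):
select date('0842-02-04', '+6 days');  -- '0842-02-10'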
When comparing strings, the values are compared left to right...
As one string is shorter than the other, you are kind of comparing this...
'842-2-4'
'842-2-1'
Well, nothing is >= '842-2-4' AND <= '842-2-1'.
Because '842-2-1' comes before '842-2-4'.
And, so, '842-2-10' comes before '842-2-4' too.
Just as 'Squiggled' comes before 'Squiggly'
And as 'xxx-y-az' comes before 'xxx-y-z'
To compare as you desire, make sure all your dates are padded with 0's.
BETWEEN '0842-02-04' AND '0842-02-10'
But that will only work after you have padded out the values in your table too.
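A sketch of that padding (sqlite syntax; assumes the table is named t and every value has exactly two dashes):
update t
set dateF = printf('%04d-%02d-%02d',
    cast(substr(dateF, 1, instr(dateF, '-') - 1) as integer),                -- year
    cast(substr(substr(dateF, instr(dateF, '-') + 1), 1,
         instr(substr(dateF, instr(dateF, '-') + 1), '-') - 1) as integer),  -- month
    cast(substr(substr(dateF, instr(dateF, '-') + 1),
         instr(substr(dateF, instr(dateF, '-') + 1), '-') + 1) as integer)); -- day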
EDIT:
Also, note that this assumes that your format is YYYY-MM-DD. As a string is compared left to right, you must have the highest magnitude values to the left.
(This means that you can't use YYYY-DD-MM and still have native string comparisons behave as you would want them.)

Can Long Integer store letters?

I ran the 'Analyze Performance' feature in Access and it had an "idea" to improve performance; Access said I should convert items that are alphanumeric mixes that look like 12BB1-DF740§ from the text data type into long integer (the specific name from the idea). Whether Access is right that this would improve performance is secondary to whether long integer can store letters at all.
[§ About the data - the hyphen in the data provided to me is always present at that location; the letters are always A-F]
From what I can tell, w3schools is indicating that Long will only store numbers
Long - Allows whole numbers between -2,147,483,648 and 2,147,483,647
Am I conflating data types? (Further, when I pull up the design view, it only offers number as a data type; there is no long or long integer)
Can Long Integer store letters?
If my column is already populated, and I convert the data type, will I lose data?
You could store those values by splitting them into 2 Long Integer columns. Then when you need the original text form, concatenate their Hex() values with a dash between.
? Hex(76721) & "-" & Hex(915264)
12BB1-DF740
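Going the other way is a matter of splitting on the dash and parsing each half with VBA's &H hex prefix (a sketch in the Immediate window):
? CLng("&H" & "12BB1")
 76721
? CLng("&H" & "DF740")
 915264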
However I don't see why that would be worth doing. Occasionally a performance analyzer suggestion just doesn't make sense to me; this is such a case.
I've never run into this but it looks like it thinks your strings are hexadecimal numbers.
If you never have letters other than A-F then you could store them as longs and then convert back using the Hex() function, but that seems mighty kludgy and something I'd avoid unless you're really desperate to eke out some performance.
If it is in fact hexadecimal data, and it always has the same format so that the dash could just be added at the same place, then it would be possible to store the data numeric, and convert it into the hexadecimal notation when needed.
Ten hexadecimal digits represent 40 bits of data, so the Long type described at the w3schools page wouldn't do, as it's only 32 bits. You would need a data type that is 64 bits wide, like a double or bigint. (The latter might not be available in Access.)
However, that would only be any real gain if you actually do any processing of the data in the numeric form. Otherwise you would only save a few bytes per record, and you would need extra processing to convert to and from the numeric format.
If your table is already populated, you would have to read out the values, convert them, and store them back in the numeric form.