IBM DB2 CAST AS VARCHAR versus Python Pandas to_datetime Function

I have the line
CAST(SURGERY.DTM AS VARCHAR(30)) AS appt_dt
in a SQL file hitting an IBM DB2 database. For various reasons, I have to convert to VARCHAR, so leaving out the CAST is not an option. The problem is that this cast chooses a very poor format. The result comes out like this: 2020-06-09-13.15.00.000000. We get the four-digit year with century, the month, and the day of the month; so far, so good. But then comes the really awkward decimal-separated 24-hour hour, minutes, and seconds with microseconds.

My goal is to read these dates quickly into a pandas dataframe in Python, and I can't get pandas to parse this kind of date, presumably because it grabs 13.15 for the hour, 00.000000 for the minute, and then has nothing left over for the seconds. It errors out. My attempt at a parser was like this:
def parser_ibm_db(date_str: str) -> pd.Timestamp:
    return pd.to_datetime(date_str, format='%Y-%m-%d-%H.%M.%S')
but it doesn't work. Neither does passing infer_datetime_format=True, nor leaving the format out entirely.
So here is my question: is there a way either to control the formatting of the CAST function better, or is there a way to read the result into pandas? I'd be perfectly happy with either approach.
One idea I had with the second approach was to limit the %H and %M options somehow to look at only 2 characters, but I don't know how to do that and the documentation doesn't tell me how.
A brute force method would be to read the csv data in, search for these kinds of strings, and replace the first two periods with colons. The date parser would have no trouble with that. But that would involve an extra processing step that I'd rather avoid.
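In Python terms, that cleanup would be something like this minimal sketch (the Series literal is a stand-in for the real CSV column):
import pandas as pd
raw = pd.Series(['2020-06-09-13.15.00.000000'])  # stand-in for the CSV column
# Replace only the first two periods (the hour and minute separators);
# the date hyphens and the seconds/microseconds period are left alone.
cleaned = raw.str.replace('.', ':', n=2, regex=False)
appt_dt = pd.to_datetime(cleaned, format='%Y-%m-%d-%H:%M:%S.%f')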
Thanks for your time!

Change your format string:
dt_string = '2020-06-09-13.15.00.000000'
pd.to_datetime(dt_string, format='%Y-%m-%d-%H.%M.%S.%f')
Correctly converts the string:
Timestamp('2020-06-09 13:15:00')
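If you are reading these values from a CSV, the same format string can be applied to the whole column after the read; a small sketch, with inline sample data standing in for your real file:
import pandas as pd
from io import StringIO
csv_data = StringIO('appt_dt\n2020-06-09-13.15.00.000000\n')  # stand-in for the real file
df = pd.read_csv(csv_data)
df['appt_dt'] = pd.to_datetime(df['appt_dt'], format='%Y-%m-%d-%H.%M.%S.%f')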

Related

Why does Postgres convert an array of two integers into a timestamp?

I'm working with JSON data that may contain formatted timestamps. I was converting them to proper timestamps using a cast when I came across something odd: a conversion that makes no sense:
select cast('[112,180]'::json#>>'{}' as timestamp with time zone);
Produces a result:
timestamptz
------------------------------
0112-06-28 00:00:00+00:53:28
The first number is interpreted as the year, and the second number the day of the year, but...
I played around a bit and discovered that the first integer needs to be >= 100 and the second integer needs to be from 100 to 366. Any other values, or any other array length, will fail.
I'm curious why this pattern is parsed as a timestamp.
I'd also be happy to know if there is a way to disable this behaviour.
It is parsed as a timestamp because that is what you explicitly told it to do.
It is not an array of two integers, it is the text string consisting of the sequence of characters [112,180], because that is what #>> yields.
It is parsed following the rules documented here (although it doesn't define what a token is, so that is a bit vague), specifically rule 3d followed by 3b.
Redefining date parsing sounds like a giant mess. I would think it would be better to make a #>> variant that throws an error (if that is what you want; you didn't say what you wanted to happen, only what you wanted not to happen) when the json_typeof from #> is not string.

Pandas: A better/faster way to convert a column of strings to dates and back again?

I need a column of strings, 'yyyymmdd', converted to date so I can do an operation (subtract one day) and then return the format to string. The combined method works in the line below, but seems to take considerably longer than two separate functions would. Ideas?
df['yyyymmdd-0'] = (pd.to_datetime(df.yyyymmdd, format='%Y%m%d') + pd.DateOffset(days=0)).dt.strftime('%Y%m%d')
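For what it's worth, here is the split-into-steps version as a minimal sketch; the sample frame is made up, and the one-day offset follows the prose rather than the days=0 placeholder in the line above:
import pandas as pd
df = pd.DataFrame({'yyyymmdd': ['20200609', '20200610']})  # made-up sample data
# Parse once; cache=True de-duplicates repeated strings before parsing.
dates = pd.to_datetime(df['yyyymmdd'], format='%Y%m%d', cache=True)
# A fixed-length shift as a Timedelta stays vectorized.
shifted = dates - pd.Timedelta(days=1)
# Back to the original string format.
df['yyyymmdd-1'] = shifted.dt.strftime('%Y%m%d')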

convert pandas time column to datetime column

I have read-only access to data where the time is stored as a time object (the date is irrelevant). I need to subtract a few seconds from each row. The simplest way I know is to use a timedelta, but first I need to convert the time column to a datetime column. There should be a straightforward way to do that; apparently there is not.
OK, finally I found a solution that was obscured by many weird workarounds. It required the format argument: pd.to_datetime(measures.time, format='%H:%M:%S'), where the format must match exactly how the time is formatted.
A one-line solution, in case anybody comes along with a similar question:
time = (pd.to_datetime(measures.time, format='%H:%M:%S') - delta).dt.time
where measures is the DataFrame and time is the column name.
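Put together as a runnable sketch (the sample frame and the five-second delta are assumptions):
import pandas as pd
measures = pd.DataFrame({'time': ['13:15:00', '08:30:45']})  # stand-in for the read-only data
delta = pd.Timedelta(seconds=5)  # the amount to subtract (assumed)
# Parse to full datetimes (the date part is a dummy), shift, then
# reduce back to plain time objects.
measures['time'] = (pd.to_datetime(measures['time'], format='%H:%M:%S') - delta).dt.time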

flexible date parsing

I have a lot of different date formats that a single field of mine can contain, and I'm trying to parse it, but the parser sometimes doesn't understand the format at all and returns 1900-01-01.
Or sometimes it inverts month, day, and year: 2023-12-11 instead of 2012-11-23.
The field is contained in a total of 1500-2500 Excel files, which are produced by some kind of scanner. Dates and times show up in different forms.
I've seen different formats such as these so far:
yyyy-mm-dd or mm/dd/yy and some others (that I can't list because I don't want to spend the day opening random Excel files hoping to find a different format ^^')
So... I've tried parsing it by hand (substrings of the different fields), but it still has bugs, so:
Is there any date parsing tool for VB that works often?
I imagine there is a library or something already written that can parse dates from almost any format, and if I could avoid recoding it I'd be quite happy :)
No, of course there is nothing that can parse dates in any (unknown) format. How should it know what to do with 9/10/11? That can be anything.
So you can use DateTime.TryParse or DateTime.TryParseExact (you can even pass a string[] of multiple allowed formats) and pass the correct CultureInfo.

change postgres date format

Is there a way to change the default format of a date in Postgres?
Normally when I query a Postgres database, dates come out as yyyy-mm-dd hh:mm:ss+tz, like 2011-02-21 11:30:00-05.
But with one particular program the dates come out as yyyy-mm-dd hh:mm:ss.s, that is, there is no time zone and it shows tenths of a second.
Apparently something is changing the default date format, but I don't know what or where. I don't think it's a server-side configuration parameter, because I can access the same database with a different program and I get the format with the timezone.
I care because it appears to be ignoring my "set timezone" calls in addition to changing the format. All times come out EST.
Additional info:
If I write "select somedate from sometable" I get the "no timezone" format. But if I write "select to_char(somedate::timestamptz, 'yyyy-mm-dd hh24:mi:ss-tz')" then timezones work as I would expect.
This really sounds to me like something is setting all timestamps to implicitly be "to_char(date::timestamp, 'yyyy-mm-dd hh24:mi:ss.m')". But I can't find anything in the documentation about how I would do this if I wanted to, nor can I find anything in the code that appears to do this. Though as I don't know what to look for, that doesn't prove much.
Never mind :'(
I found my problem. I was thinking that I was looking directly at the string coming back from the database. But I was overlooking that it was reading it as a Timestamp and then converting the Timestamp to a string. This was buried inside a function called "getString", which is what threw me off. I was thinking it was ResultSet.getString, but it was really our own function with the same name. Oops. What idiot wrote that function?! Oh, it was me ...
Thanks to all who tried to help. I'll give you each an upvote for your trouble.
I believe the table columns are specified differently. Try these variants:
timestamp
timestamp(0)  (no milliseconds)
timestamptz  (with time zone)
timestamptz(0)  (with time zone, no milliseconds)
With which client are you running the select statements? Formatting the output is the application's responsibility, so without knowing which application you use to display the data, it's hard to tell.
Assuming you are using psql, you can change the date format using the SET command:
http://www.postgresql.org/docs/current/static/sql-set.html
That is essentially a way to change the configuration parameters. The ones responsible for formatting data are documented here:
http://www.postgresql.org/docs/current/static/runtime-config-client.html#RUNTIME-CONFIG-CLIENT-FORMAT
Daniel tells me to post my findings as an answer and accept it to close the question. Okay.
I found that the date format I was seeing that did not include a time zone was not what was coming directly from Postgres; there were a couple of function calls that I was missing which converted the incoming date to a java.sql.Timestamp, and then from the java.sql.Timestamp to a String. It was in this conversion from the Timestamp to the String that the time zone was defaulting to EST.
In my own humble defense, my mistake was not as dumb as it may sound. :-0 We had the execution of the query in a subclass that read the results into a List, which we do to allow modification of the query results before output. (In this case we are adding a couple of columns that are derived from the stored columns.) Then we have a set of functions that resemble the JDBC functions to pull the data out of the List, so a calling program can easily switch from processing a query directly to processing the List. When I was wrestling with the date format problem, it just didn't register on me that I wasn't looking at "real JDBC" but at "simulated JDBC" calls.