Import Date column in a particular format - pentaho

I have an Excel file with a date column that mixes data types.
For example, a few values have the Date data type while others are stored as strings.
I tried to import the data and change the column metadata type to String, but that changes the date values completely.
I have attached a few screenshots of the data. I am very new to Pentaho, so can anybody help me understand how to tackle this problem?
I tried changing the metadata type and using the str2date function in a JavaScript step, but neither helped: the imported data still differs from the data in the file.

When importing from an Excel sheet that contains some invalid dates, you can import the column as String and then use a Select Values step.
On the Meta-data tab, specify the date field with the correct format (dd/MM/yyyy) and set Date format Lenient? to Y. This should change 29/02/2017 to 01/03/2017, which is a decent option.
Also, don't use Excel to inspect the results, because it might be screwing up the conversion on re-import. Look at the preview data in Spoon or export to csv and look with a text editor to see if the format is correct first.
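To make the effect of the lenient setting concrete, here is a minimal Python sketch of the rollover described above. It is not Pentaho code: the function name `parse_lenient_ddmmyyyy` is hypothetical, and it only imitates the day overflow (29/02/2017 becoming 01/03/2017), not every lenient rule.

```python
from datetime import date, timedelta

def parse_lenient_ddmmyyyy(text):
    """Parse dd/MM/yyyy, rolling an out-of-range day into the following month."""
    day, month, year = (int(part) for part in text.strip().split("/"))
    # Start at the first day of the month and add the remaining days,
    # so an impossible day such as 29/02/2017 rolls over instead of failing.
    return date(year, month, 1) + timedelta(days=day - 1)

# A valid date is returned unchanged; an invalid day rolls forward.
print(parse_lenient_ddmmyyyy("15/08/2017").strftime("%d/%m/%Y"))
print(parse_lenient_ddmmyyyy("29/02/2017").strftime("%d/%m/%Y"))
```

An out-of-range month (for example 10/13/2017) still raises an error in this sketch.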

After changing the data type and importing the date values as strings, storing the dates in SQL as strings and formatting them on retrieval solved the issue.

Related

Determine column date format in csv file while reading to data frame in Pandas

I need to auto-detect all column data types in any CSV file that comes in. The CSV file structure and column names are known ahead of time.
One of the challenges I am facing is determining the original date format of a column, if it is a date string at all.
E.g. if I have
ITEM,SALEDATE,SALEPRICE
IT01,05-JAN-20,100.00
IT02,14-FEB-21,79.00
For ITEM I need to return string
For SALEDATE I need to return 'DD-MON-YY' or %d-%b-%y
For SALEPRICE I need to return float.
The main challenge is SALEDATE.
I have tried a few libraries such as dateutil.parser. Although they convert the string to a datetime, they do not return the format of the original date string.
Please let me know how to achieve this.
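One possible approach, sketched here with the standard library only, is to try a list of candidate formats and keep the first one that parses every value in the column. The candidate list and the function names `detect_date_format` and `infer_column_type` are assumptions for illustration; `%b` matching of JAN/FEB also assumes an English locale.

```python
from datetime import datetime

# Hypothetical candidate list; extend it with the formats you expect to see.
CANDIDATE_FORMATS = ["%d-%b-%y", "%d-%b-%Y", "%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y"]

def detect_date_format(values, candidates=CANDIDATE_FORMATS):
    """Return the first candidate format that parses every value, else None."""
    for fmt in candidates:
        try:
            for value in values:
                datetime.strptime(value, fmt)
        except ValueError:
            continue  # at least one value did not match; try the next format
        return fmt
    return None

def infer_column_type(values):
    """Classify a column as 'float', a date format string, or 'string'."""
    try:
        for value in values:
            float(value)
        return "float"
    except ValueError:
        pass
    return detect_date_format(values) or "string"

print(infer_column_type(["IT01", "IT02"]))
print(infer_column_type(["05-JAN-20", "14-FEB-21"]))
print(infer_column_type(["100.00", "79.00"]))
```

Formats such as dd/mm and mm/dd remain ambiguous when every day value is 12 or lower, so the order of the candidate list decides in that case.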

Importing excel file with swapped date formats python

I have an Excel file where the data starts in the format MM/DD/YYYY hh:mm:ss and, after a few entries, switches to DD/MM/YYYY hh:mm:ss. This switch happens a couple of times in the data set. Is there a way to import this into Python? Converting to datetime by specifying a format doesn't work because the formats differ.
The dataframe looks like this:
If the datetimes are actual datetimes in Excel as well, re-format them first, and then read into Python. Select the whole column, open the format window (CTRL + 1 on Windows), go to the "Number" panel, choose "Date," and then choose an appropriate format for your data.
With the dates all in the same format, now read the file into Python using pandas or another library.

Pentaho date format issue

My input Excel sheet has fields with two different types of date values, in formats such as YYYY/MM/DD.
When I added the Excel sheet to Pentaho, the detected fields showed the String data type for the date columns, as you can see below.
After this, I tried to load the data into Postgres, but it failed with the error attached below.
Updated
I tried the suggested timestamp format yyyy/MM/dd HH:mm:ss and it works fine for me, but the format yyyy/MM/dd hh.00.00 is not available in the Format column.
You have a column named Date in the field definition tab. Choose Date [data type] from the drop-down box in the rows for date [column name] and timestamp [column name].
Try to get the data in the Excel Input step. If that does not work, try writing yyyy/MM/dd in the Format column for the date field and yyyy/MM/dd hh.00.00 for the timestamp field. Note that the format is most probably unnecessary for the Excel Input step.
Only once you can get all the rows in the Excel Input step with a preview limit of 0, and not before, try to put the data into the Postgres database.
Normally it should work. If not, use a Select Values step, which has a Meta-data tab, to change (among other things) the format of the dates. Choose a format accepted by Postgres. Again, this change of format is most probably unnecessary.
To explain what happens under the hood, remember that each field in PDI has a type. You defined time and timestamp as String. That is not an issue by itself, until you try to put those strings into the database, which does not accept such date formats. The best way is to use the Date type (which does NOT have a format until you read or write it).
Select or enter the corresponding format in the [Format] column on the [Fields] tab.
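The idea that a date has no format until it is read or written can be illustrated outside PDI. The Python sketch below parses the string formats from the question into real date values and writes them back as ISO 8601 text, which Postgres accepts. The function names and sample values are hypothetical; the Python codes %Y/%m/%d %H:%M:%S correspond to the pattern yyyy/MM/dd HH:mm:ss.

```python
from datetime import datetime

def to_iso_date(text):
    """Parse a yyyy/MM/dd string and return an ISO 8601 date string."""
    return datetime.strptime(text, "%Y/%m/%d").date().isoformat()

def to_iso_timestamp(text):
    """Parse a yyyy/MM/dd HH:mm:ss string and return 'YYYY-MM-DD HH:MM:SS'."""
    return datetime.strptime(text, "%Y/%m/%d %H:%M:%S").isoformat(sep=" ")

# Hypothetical sample values in the formats mentioned in the question.
print(to_iso_date("2019/03/07"))
print(to_iso_timestamp("2019/03/07 14:05:00"))
```

The input format matters only while parsing and the output format only while writing; the value in between is a typed date.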

Excel date conversion error

Thanks for checking.
I exported a dump from JDE (JD Edwards Enterprise Resource Planning application) to an Excel file. I intend to import the Excel file into MS SQL Server so that each department can run a report from my application, which I will then export as an Excel file.
But the issue I am having is that the dates in my imported Excel file are not all in the same format: some are strings while others are dates, although they all look like valid dates. The same applies to the field containing the total amount for each transaction. So when I try to import into MS SQL, all the values that are not in date format are replaced with NULL.
Does anyone have an idea how I can make all the values in the date column have the same data type? I have tried the following suggestions but haven't gotten a result yet.
=DATEVALUE(10/03/2014)
`10/03/2014
Any suggestion would be appreciated. Even if it is a macro.
I solved a similar problem that may help someone. Excel interpreted some dates imported from CSV as text (mm/dd/yy) and others as dates (mm/dd/yyyy), but applied my local settings (dd/mm/yyyy) to the latter. The following fixed the mix of text and dates:
=IFERROR(
DATE(
MID(B1,FIND("/", B1,FIND("/", B1)+1)+1,LEN(B1)-(FIND("/", B1,FIND("/", B1)+1)))+2000,
LEFT(B1,FIND("/", B1)-1),
MID(B1,FIND("/", B1,FIND("/", B1))+1,FIND("/", B1,FIND("/", B1)+1)-FIND("/", B1)-1)),
DATE(YEAR(B1),DAY(B1),MONTH(B1)))
The first "DATE" uses the "/" character in the text to extract the date.
NOTE: formula assumes yy to be 20yy.
The last "DATE" reverses the "DAY" and "MONTH" of the incorrectly interpreted dates.
The "IFERROR" falls back to the last "DATE" when "FIND" cannot locate "/" because the cell holds a true Excel date, which is stored as a serial number (e.g. 456789).
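For readers who do the cleanup in a script rather than in the worksheet, the same two-branch logic can be sketched in Python. This is an illustration only: `normalize_cell` is a hypothetical name, text cells are assumed to be mm/dd/yy with yy meaning 20yy, and date cells are assumed to have day and month swapped, as in the formula above.

```python
from datetime import date

def normalize_cell(value):
    """Return a correct date from either mm/dd/yy text or a day/month-swapped date."""
    if isinstance(value, str):
        # Text branch: split on "/" like the first DATE in the formula.
        month, day, year = (int(part) for part in value.strip().split("/"))
        return date(2000 + year, month, day)  # assumes yy means 20yy
    # Date branch: the cell was read as dd/mm, so swap day and month back.
    return date(value.year, value.day, value.month)

print(normalize_cell("03/25/14"))          # text that was never converted
print(normalize_cell(date(2014, 10, 3)))   # date read with day and month swapped
```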
Could it be that your Windows / Excel regional settings for the date format are different from those of the JDE dump?
Check the "dates" that are recognized by changing the format to display the month as text to verify that it is actually interpreted correctly.

Importing data from excel and inserting into sql tables

I am importing data from Excel and inserting it into a SQL table. One of the fields in my Excel file holds a time value, which I want to import as a string. When I do that, I get a strange value in my SQL table: the Excel value is 07:00, but after inserting it as a string the value looks like this: 0,29166666667.
The reason for importing it as a string is that the same field must also be able to hold a day marker, like this: D2 10:30. Values of this kind are inserted correctly.
Can anyone help?
Excel stores dates and times as serial numbers: the integer part counts days and the fractional part (0 to 0.99999999) is the time of day.
0.29166666667 corresponds to 00.01.1900 07:00:00, which seems to be correct in your case.
So you would have to reformat or convert this value before using it as a direct string input.
In VBA you could use Format(myValue,"DD-MM-YYYY hh:mm:ss").
The equivalent worksheet function would be TEXT(A1,"DD-MM-YYYY hh:mm:ss").
The format code depends on your regional settings. You might want to try something like Format(myTime, "Long Time") if you prefer to use Excel-defined time formats.
Because you did not post any code, I am not sure how you import your Excel data. But I would say the fastest way to get better results is to set up a new column that uses the TEXT formula with a reference to the original time column, and to use this new formatted column as the input for your SQL database.
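If the conversion has to happen in the import code instead of in the worksheet, the fraction-of-a-day value can be turned back into a time string with plain arithmetic. The following Python sketch assumes the value arrives either as a number or as text with a comma or dot as decimal separator; `excel_fraction_to_time` is a hypothetical helper, and text that is not numeric (such as D2 10:30) is passed through unchanged.

```python
def excel_fraction_to_time(value):
    """Convert an Excel time fraction (e.g. 0.29166666667) to 'HH:MM:SS'."""
    if isinstance(value, str):
        try:
            number = float(value.replace(",", "."))
        except ValueError:
            return value  # not a number, e.g. "D2 10:30": keep the text as is
    else:
        number = float(value)
    # The fractional part is the time of day; one day has 86400 seconds.
    total_seconds = round((number % 1) * 86400) % 86400
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

print(excel_fraction_to_time("0,29166666667"))
print(excel_fraction_to_time("D2 10:30"))
```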