I have an Excel file where the data starts in the format MM/DD/YYYY hh:mm:ss, then after a few entries switches to DD/MM/YYYY hh:mm:ss. This switch happens a couple of times in the data set. Is there a way to import this into Python? Converting to datetime by specifying a format doesn't work because there are different formats.
The dataframe looks like this:
If the datetimes are actual datetimes in Excel as well, re-format them first, and then read into Python. Select the whole column, open the format window (CTRL + 1 on Windows), go to the "Number" panel, choose "Date," and then choose an appropriate format for your data.
With the dates all in the same format, now read the file into Python using pandas or another library.
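If the column instead arrives in Python as text, a rough sketch like the following can at least separate the rows whose order is certain from the ones that are not. The function names `classify_date_order` and `parse_with_order` are made up for illustration, and the sketch assumes every value looks like `NN/NN/YYYY hh:mm:ss`. A value such as 06/01/2014 cannot be resolved from the string alone; since the formats reportedly switch in blocks, such rows might be assigned the order of their nearest unambiguous neighbours.

```python
from datetime import datetime


def classify_date_order(text):
    """Return 'MDY', 'DMY', or 'ambiguous' for a string like '21/12/2015 05:00:00'."""
    date_part = text.split()[0]
    first, second, _year = (int(p) for p in date_part.split("/"))
    if first > 12 and second > 12:
        raise ValueError(f"neither field can be a month: {text!r}")
    if first > 12:
        return "DMY"  # first field can only be a day
    if second > 12:
        return "MDY"  # second field can only be a day
    return "ambiguous"  # both fields are valid months


def parse_with_order(text, order):
    """Parse the string once the field order ('MDY' or 'DMY') has been decided."""
    fmt = "%m/%d/%Y %H:%M:%S" if order == "MDY" else "%d/%m/%Y %H:%M:%S"
    return datetime.strptime(text, fmt)
```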
Related
I've been tasked with exporting record metadata from a Documentum repo using DQMan.
However, I frequently encounter issues with date formats. DQMan displays them in "dd/mm/yyyy hh:ss" format, but when using 'Export to Excel' the dates seem to be output in "mm/dd/yyyy hh:ss", while my Excel is set up for "dd/mm/yyyy", so the dates are interpreted either incorrectly or as text.
Edit: I have tried to establish the precise series of events, and I reckon the issue must be how DQMan is exporting into Excel. It must be telling Excel to expect a certain type of format but sending it something else.
| SYSTEM | OPERATION |
|---|---|
| DATABASE | Stores 'DateValue' for Record Date Attributes; displays as local user date format. |
| DQMAN | Queries repo with DQL criteria and reports 'DateValue'; reports in the default format for the Windows user (UK format for us!). |
| DQMAN | Runs 'Export to Excel' function. |
| DQMAN | Sends delimited text data to Excel (unsure how this part works or is formatted). |
| EXCEL | Excel is being told that this is a US DateValue, but raw text is sent to Excel in UK format as it appears in DQMan, as per Windows settings. (?) |
| EXCEL | Testing value to determine datatype… "Is this string a DateValue?" |
| EXCEL | DateValue of 2015-12-21: Tests "21/12/2015 05:00:00"… NOT A US DATE |
| EXCEL | DateValue of 2014-08-08: Tests "08/08/2014 05:00:00"… IS A US DATE |
| EXCEL | DateValue of 2014-01-06: Tests "06/01/2014 05:00:00"… IS A US DATE |
| EXCEL | If the value is evaluated as NOT A US DATE, the value is added to Excel as plain text. |
| EXCEL | If the value is evaluated as IS A US DATE, the US format text string is added to Excel as a UK DateValue; this gives the wrong value. |
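The suspected sequence above can be reproduced in a few lines of Python. This is only a model of the behaviour I am guessing at, not how DQMan or Excel actually work internally; `import_as_us` is a hypothetical name.

```python
from datetime import datetime

US_FORMAT = "%m/%d/%Y %H:%M:%S"


def import_as_us(text):
    """Mimic the suspected import: accept the string if it is a valid US date,
    otherwise keep it as plain text."""
    try:
        return datetime.strptime(text, US_FORMAT)
    except ValueError:
        return text  # "NOT A US DATE": stays a string


# The three UK-formatted samples from the table.
for sample in ["21/12/2015 05:00:00", "08/08/2014 05:00:00", "06/01/2014 05:00:00"]:
    print(sample, "->", repr(import_as_us(sample)))
```

The first sample stays text, the second happens to come out right because day and month are equal, and the third silently becomes 1 June 2014 instead of 6 January 2014.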
I'm not sure if dqMan uses dfc.properties, but if it does, you can use the dfc.date_format parameter to adjust the date to something that suits you better (ref)
However, that won't help you with exported date format. Maybe exporting to CSV as middle step?
I have an Excel file which has a date column with mixed data types.
For example, a few values have the Date data type while others are strings.
I tried to import the data and change the column metadata type to String, but that changes the date values completely.
I have attached a few screenshots of the data. I am very new to Pentaho, so can anybody help me understand how to tackle this problem?
I tried changing the metadata type and using the str2date function in a JS step, but neither helped, because the imported data differs from the data in the file.
When importing from an Excel sheet with some invalid dates, you can import as string format, then use a Select Values step.
Specify the date field on the Meta-data tab with the correct format (dd/MM/yyyy) and Date format Lenient? set to Y. This should change the 29/02/2017 to 01/03/2017, which is a decent option.
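For anyone wondering what lenient parsing does to a date like 29/02/2017, here is a small Python sketch of the rollover. It only imitates the day overflow described above (it is not Pentaho's implementation), and `lenient_date` is a made-up name.

```python
from datetime import date, timedelta


def lenient_date(text):
    """Parse 'dd/MM/yyyy', letting an out-of-range day roll into the next month.

    Only day overflow is handled here; an out-of-range month still raises ValueError.
    """
    day, month, year = (int(part) for part in text.split("/"))
    # Start from the first of the month and count forward day - 1 days.
    return date(year, month, 1) + timedelta(days=day - 1)


print(lenient_date("29/02/2017"))  # 2017 is not a leap year, so this rolls over
```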
Also, don't use Excel to inspect the results, because it might be screwing up the conversion on re-import. Look at the preview data in Spoon or export to csv and look with a text editor to see if the format is correct first.
Changing the data type and importing the date values as strings, then using SQL to store the dates as strings and formatting them on retrieval, solved the issue.
My input Excel sheet has a field with two different types of values, in the format YYYY/MM/DD.
When I added the Excel sheet to Pentaho, the date columns came through with the String data type, as you can see below.
After this, I tried to load the data into Postgres, but it failed with the error attached below.
Updated
I tried the suggested timestamp format yyyy/MM/dd HH:mm:ss and it works fine for me, but the format yyyy/MM/dd hh.00.00 is not present in the Format column.
You have a column named Date in the field definition tab. Choose Date [data type] from the Dropdown box in the row date [column name] and timestamp [column name].
Try to get the data in the Excel Input step. If it does not work, try to write yyyy/MM/dd in the Format column for the date field and yyyy/MM/dd hh.00.00 for timestamp field. Note that the format is most probably unnecessary for the Excel Input.
Once you can get all the rows in the Excel Input with a preview limit of 0, and not before, try to put the data into the Postgres database.
Normally it should work. If not, use a Select Values step, which has a Meta-data tab, to change (among other things) the format of the dates. Choose a format accepted by Postgres. Again, this change of format is most probably unnecessary.
To explain what happens under the hood, remember that each field in PDI has a type. You defined time and timestamp as String. That is not an issue by itself, until you try to put those strings into the database, which does not accept such date formats. The best way is to use the Date type (which DOES NOT have a format until you want to read or write it).
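The same idea outside PDI, as a rough Python sketch: the string is parsed into a typed value once, and a format only comes back into play when writing. The function name and the sample value are invented for illustration; the output is the ISO-style `YYYY-MM-DD HH:MM:SS` form, which Postgres accepts for timestamp columns.

```python
from datetime import datetime


def to_postgres_timestamp(text, source_format="%Y/%m/%d %H:%M:%S"):
    """Parse a string in the spreadsheet's format and render it in ISO style."""
    parsed = datetime.strptime(text, source_format)  # typed value, no format attached
    return parsed.isoformat(sep=" ")  # format chosen only at write time


print(to_postgres_timestamp("2019/03/05 14:00:00"))
```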
Select or put the corresponding format in [Format] column on [Fields] Tab.
Thanks for checking.
I exported a dump from JDE (JD Edwards Enterprise Resource Planning application) to an Excel file. I intend to import the Excel file into MS SQL Server so that each department can run a report from my application, which I will then export as an Excel file.
The issue is that the dates in my imported Excel file are not all in the same format. Some are strings while others are dates, although they all look like valid dates. The same applies to the field containing the total amount for each transaction. So when I try to import into MS SQL, all the values that are not in date format are replaced with NULL.
Does anyone have an idea of how I can make all the values in the date column have the same data type? I have tried the following suggestions but haven't gotten a result yet.
=DATEVALUE(10/03/2014)
`10/03/2014
Any suggestion would be appreciated. Even if it is a macro.
I solved a similar problem that may help someone. Excel interpreted some of the dates imported from CSV as text (mm/dd/yy) and others as dates (mm/dd/yyyy), but applied my local settings (dd/mm/yyyy) to the latter. The following fixed the mix of text and dates:
=IFERROR(
DATE(
MID(B1,FIND("/", B1,FIND("/", B1)+1)+1,LEN(B1)-(FIND("/", B1,FIND("/", B1)+1)))+2000,
LEFT(B1,FIND("/", B1)-1),
MID(B1,FIND("/", B1,FIND("/", B1))+1,FIND("/", B1,FIND("/", B1)+1)-FIND("/", B1)-1)),
DATE(YEAR(B1),DAY(B1),MONTH(B1)))
The first "DATE" uses the "/" character in the text to extract the date.
NOTE: formula assumes yy to be 20yy.
The last "DATE" reverses the "DAY" and "MONTH" of the incorrectly interpreted dates.
The "IFERROR" defaults to the last date when "FIND" does not locate "/" in an Excel date (456789).
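For comparison, the same two-branch logic might look like this in Python once the column has been read in. This is a sketch under the same assumptions as the formula (text values are mm/dd/yy with yy meaning 20yy, and real dates have day and month swapped); `normalize_cell` is a hypothetical name.

```python
from datetime import date


def normalize_cell(value):
    """Return the intended date for a cell that is either text or a swapped date."""
    if isinstance(value, str):
        # Text branch: split 'mm/dd/yy' on the slashes, assume 20yy.
        month, day, year = (int(part) for part in value.split("/"))
        return date(2000 + year, month, day)
    # Date branch: Excel read mm/dd as dd/mm, so swap day and month back.
    return date(value.year, value.day, value.month)
```

A text cell such as "12/25/14" becomes 25 December 2014, and a cell Excel stored as 10 March 2014 is turned back into 3 October 2014.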
Could it be that your Windows / Excel regional settings for the date format are different from those of the JDE dump?
Check the "dates" that are recognized by changing the format to display the month as text to verify that it is actually interpreted correctly.
I have an input spreadsheet that needs to get sorted by date. The current format of the date is in the UK format (dd/mm/yyyy) but I need it in yyyy-mm-dd (actually I don't, I just need to sort it and that format is the most foolproof way of sorting). This all needs to be done in VBA as it's part of a bigger project that allows a bunch of data collation at once. The other problem is that the input sheet can be quite large (150,000+ rows). So, while I could parse through each row of data and change it around to the way I need, this would be horrifically slow and is NOT an option.
Currently I'm using this bit of code to format the date to yyyy-mm-dd:
inputGADRSheet.Columns(7).NumberFormat = "yyyy-mm-dd"
But, Excel outsmarts me and assumes that the date format of the column is originally in the US format (mm/dd/yyyy) which messes everything up and half of the values in the column don't meet that requirement (days above the 12th) so they don't get formatted at all. Is there any way to tell Excel what format the current data is in? That way it won't just assume that it's in the US date format...
Is the solution to change my Excel region to the UK? I assume this could be done using VBA, but it seems risky...
If your data is already in an Excel column, you can't reinterpret the values: Excel date values are (internally) numbers, with 1 representing 1900-01-01. After the data has been (mis-)interpreted by Excel there's no way back.
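To make the "dates are just numbers" point concrete, here is a small Python sketch that converts a serial number from Excel's default 1900 date system into a calendar date. The function name is made up; the special case for 60 exists because Excel treats 1900 as a leap year and counts a 29 February 1900 that never existed.

```python
from datetime import date, timedelta


def excel_serial_to_date(serial):
    """Convert a whole-number Excel serial (1900 date system) to a date."""
    if serial < 1:
        raise ValueError("serial numbers start at 1 (1900-01-01)")
    if serial == 60:
        raise ValueError("serial 60 is Excel's non-existent 1900-02-29")
    if serial < 60:
        return date(1899, 12, 31) + timedelta(days=serial)
    # From 61 onwards the phantom leap day shifts everything by one.
    return date(1899, 12, 30) + timedelta(days=serial)


print(excel_serial_to_date(42359))
```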
The question is: where do you get the input data sheet from? If the dates are entered correctly, reformatting is possible without any problem and does not affect sorting (which depends only on the numeric value of the date). If your data comes from a text file (probably .csv-kind), be sure to read it as text and use Excel worksheet functions or VBA to interpret the values.
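The "read as text, then interpret explicitly" idea is shown here in Python rather than VBA, purely as an illustration of the principle: the day/month order is stated outright instead of being left to regional settings. `sort_uk_dates` is a hypothetical helper and assumes every value is a dd/mm/yyyy string.

```python
from datetime import datetime


def sort_uk_dates(values):
    """Parse dd/mm/yyyy strings explicitly and return them sorted as yyyy-mm-dd."""
    parsed = [datetime.strptime(value, "%d/%m/%Y").date() for value in values]
    return [d.isoformat() for d in sorted(parsed)]


print(sort_uk_dates(["21/12/2015", "06/01/2014", "08/08/2014"]))
```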