I have a CSV file in which one column is a date in dd/mm/yyyy format.
I read it using z=pd.read_csv('property_scrape.csv')
My raw data is:
After I read it in, some of the values are kept in the format I downloaded (dd/mm/yyyy), while somewhere in the middle, the dates are converted to yyyy-mm-dd:
27 01/10/2019
28 01/10/2019
29 01/10/2019
...
21092 2020-08-22
21093 2020-08-22
21094 2020-08-22
Name: Date, Length: 21122, dtype: object
Does anyone know why this happens?
Also, is there a way to ensure that this date column is always read the correct/constant way?
The problem is that pandas samples the first row and assumes MM/DD/YYYY instead of DD/MM/YYYY; there is no reliable way for it to know which one is meant. Later, when it finds 22, which is not a valid month, it falls back to object (string).
You can add the flag infer_datetime_format=False so that the column is read as strings, and then parse it yourself. You can also pass a lambda to read_csv; I'm not sure whether there is an easier way to just pass a format string. See Data Science Stack Exchange article 34357.
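A minimal sketch of the "read as strings, then parse with an explicit format" approach. The three-row sample stands in for property_scrape.csv and is made up. Because the printed output in the question suggests the column may really contain two formats, the sketch parses dd/mm/yyyy first and falls back to ISO yyyy-mm-dd for the rows that fail; both formats are assumptions about the data. In pandas 2.0 and later, read_csv also accepts a date_format argument, and infer_datetime_format and date_parser are deprecated.

```python
import io

import pandas as pd

# Hypothetical sample standing in for property_scrape.csv: mostly dd/mm/yyyy,
# plus a row that is already in ISO yyyy-mm-dd form.
raw = io.StringIO(
    "Date,Price\n"
    "01/10/2019,100\n"
    "22/08/2020,200\n"
    "2020-08-22,300\n"
)

# Read the date column as plain text so nothing is guessed at load time.
z = pd.read_csv(raw, dtype={"Date": str})

# Parse the dd/mm/yyyy rows explicitly; rows in any other format become NaT.
dmy = pd.to_datetime(z["Date"], format="%d/%m/%Y", errors="coerce")
# Fill the remaining rows, assumed to be ISO yyyy-mm-dd.
iso = pd.to_datetime(z["Date"], format="%Y-%m-%d", errors="coerce")
z["Date"] = dmy.fillna(iso)

print(z["Date"].dt.strftime("%Y-%m-%d").tolist())
```

Passing the format explicitly means 01/10/2019 is always read as 1 October, regardless of what pandas would have guessed.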
I'm currently doing this to generate a dataframe:
dataframe = pd.read_sql("select date_trunc('minute', log_time) as date, .....
my output is a time that looks like this:
"date":"2020-06-01 00:08:00.000"
What I want to do is have a time output that looks like this in the JSON file that it is outputted to:
"date":"2020-06-08T23:01:00.000Z"
I found documentation that shows how to remove the T and Z, but I'm not sure how to add them. Do I have to do this after the dataframe is made, or is there something in my date_trunc( command that should put it in this format?
Based on our conversation in the comments, I edited your question to add "in the JSON file that it is outputted to" to the line "What I want to do is have a time output that looks like this...". In the end, the only thing that matters is that the raw value in your JSON file is accurate; don't worry about what it looks like in your Jupyter Notebook. I think this is a common mistake, and one I have made in the past as well.
I would suggest not worrying about the datetime format in pandas. Just go with pandas default date/time until the very end.
THEN, as a final step, just before exporting to a JSON, change the format of the field to:
df['TIME'] = pd.to_datetime(df['TIME']).dt.strftime('%Y-%m-%dT%H:%M:%S.%f').str[:-3] + 'Z'
This produces the format 2020-06-08T23:01:00.000Z.
Note: .str[:-3] is required because, according to the documentation, strftime has no directive for milliseconds (3 decimals), only %f for microseconds (6 decimals). So you need to truncate the last 3 digits to get millisecond precision.
That specific format is not directly supported with T and Z, so I did a little bit of string manipulation.
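A small self-contained run of that final step. The two timestamps are made up, and the column name 'TIME' follows the answer rather than the date alias in the question's SQL. Appending a literal 'Z' only labels the time as UTC; it does not convert anything, so this assumes the database values are already in UTC.

```python
import pandas as pd

# Hypothetical frame standing in for the read_sql result.
df = pd.DataFrame({"TIME": ["2020-06-08 23:01:00.000", "2020-06-01 00:08:00.000"]})

# %f yields microseconds (6 digits); dropping the last 3 leaves milliseconds.
# 'Z' is a plain string, so it is appended to every row.
df["TIME"] = (
    pd.to_datetime(df["TIME"]).dt.strftime("%Y-%m-%dT%H:%M:%S.%f").str[:-3] + "Z"
)

print(df.to_json(orient="records"))
```

The column now holds strings, so to_json writes them verbatim instead of applying its own date formatting.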
I have a csv file, the one from https://www.kaggle.com/jolasa/waves-measuring-buoys-data-mooloolaba/downloads/waves-measuring-buoys-data-mooloolaba.zip/1. The first entries look like this:
The first column has dates which I'm trying to read with this command:
matrix = dlmread ('waves-measuring-buoys-data/WavesMooloolabaJan2017toJun2019.csv',',',1,0);
(If referring to file on Kaggle, note that I slightly modified the directory and file names for ease of reading)
Then when I check a date by printing matrix(2,1), I get 1 instead of 01/01/2017 00:00.
How do I get the correct format?
dlmread (and csvread, which is a wrapper around it) only handles numeric input, so the date field cannot come back as a date.
Use csv2cell from the io package instead to obtain your data as strings, then perform any necessary string operations and conversions.
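The same idea (read everything as text first, then convert each column explicitly), sketched in Python with the standard csv module rather than Octave's csv2cell. The two-row sample and its column names are made up, and the dd/mm/yyyy HH:MM format is an assumption based on the 01/01/2017 00:00 value in the question.

```python
import csv
import io
from datetime import datetime

# Hypothetical excerpt; the real file's header and column names may differ.
sample = io.StringIO(
    "Date/Time,Hs,Hmax\n"
    "01/01/2017 00:00,0.875,1.39\n"
    "01/01/2017 00:30,0.763,1.15\n"
)

reader = csv.reader(sample)
header = next(reader)  # skip the header row (the row offset of 1 in dlmread)
rows = list(reader)    # every field is still a string at this point

# Convert explicitly: dates assumed to be dd/mm/yyyy HH:MM, the rest numeric.
stamps = [datetime.strptime(r[0], "%d/%m/%Y %H:%M") for r in rows]
values = [[float(x) for x in r[1:]] for r in rows]

print(stamps[0], values[0])
```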
I have an Excel file created from a comma-delimited text file, which originally came from a .sql file with an SQL INSERT query.
In one of the columns I have: "Cast(0x123456AB...) As TIME"
Obviously this is NOT the jsondate format... so no help from that question...
I removed "Cast(" and ") As TIME" by replacing them with empty strings.
So now I have the time values in hexadecimal.
How do I convert them into Excel Time or Datetime?
OK, playing around with it showed me that it's exactly the same as the jQuery date answer. You take the numeric portion starting with 0x.
Take the 10 digits AFTER the 0x, e.g. in A2: =MID(A1, 3, 10)
Convert them from hexadecimal to decimal, e.g. in A3: =HEX2DEC(A2)
Divide by 86400 (the number of seconds in a day), e.g. in A4: =A3/86400
Add the result to the date 1/1/1970, e.g. in A5: =A4 + DATE(1970, 1, 1)
Or in short:
=(hex2dec(mid(a1,numstart,10))/86400) + date(1970,1,1)
Replace numstart with the 1-based index of the first hex digit,
e.g. 3 for a value like 0x12345678AB, so that MID returns 12345678AB.
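A Python mirror of the one-cell formula, useful for checking individual values outside Excel. Like the formula, it assumes the hex digits are seconds since 1970-01-01; the function name hex_to_datetime and the sample value 0x5E0BE100 are made up for illustration.

```python
from datetime import datetime, timedelta


def hex_to_datetime(cell: str, numstart: int = 3) -> datetime:
    """Hypothetical helper mirroring =(HEX2DEC(MID(A1,numstart,10))/86400) + DATE(1970,1,1)."""
    # MID is 1-based: take up to 10 characters starting at numstart.
    digits = cell[numstart - 1 : numstart - 1 + 10]
    # HEX2DEC: interpret the digits as base 16 (Python accepts either case).
    seconds = int(digits, 16)
    # Adding the seconds to 1970-01-01 is the same as /86400 + DATE(1970,1,1).
    return datetime(1970, 1, 1) + timedelta(seconds=seconds)


print(hex_to_datetime("0x5E0BE100"))      # numstart 3 skips the "0x"
print(hex_to_datetime(" 0x5E0BE100", 4))  # a leading space shifts numstart to 4
```

The second call shows why a stray leading space matters: with numstart left at 3, the slice would start at "x" and the conversion would fail.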
This is similar to the question "Convert JSON Date /Date(1388624400000)/ to Date in Excel",
except that:
a. The answer to that question was wrong and wouldn't work. (I edited it.)
b. Here the .sql file was produced by a stored procedure that queried the database, while that question used data returned by a jQuery AJAX call, which looked different. It turns out they're the same number in a different format.
As an added remark, I had a space at the beginning of my hex number, which I didn't notice until I applied MID to it.
Note: for AJAX-returned dates formatted like /date:0x12345678ab/, set numstart to 9 (the position right after the "0x"). If HEX2DEC fails, try converting the hex string to uppercase before calling HEX2DEC. To debug, put each formula in a separate cell so you can see what works and what doesn't.
I have written Fortran 90 code that filters the text output of another program and converts it to CSV. The file contains a table with columns of various types (character, real, integer). One column generally contains decimal values (probabilities). BUT, in some rows where the value should be the decimal "1.000", it is actually the integer "1".
I use "F5.3" specifier to read this column and I have the same format statement for every row of the table. So, when the code finds "1", it reads ".001", because it does not find a decimal point.
What ways could I use to correctly (and generally) read integers among other decimals?
Could I specify "unformatted" input only for a number of "spaces"?
For input, the floating-point edit descriptor Fw.d is normally used with d = 0 (d cannot be omitted). If the input field contains a decimal point, d is ignored; a nonzero d only takes effect when there is no decimal point, which is useful in the rare case where floating-point data is stored as scaled integers, or when you do some unit conversion from integer values.
You could try list-directed input: use a * instead of a format specifier. This applies to the entire read, not to selected items. Or you could read each line into a string and test its contents to decide how to read it. If the substring has a decimal point: read (string(M:N), '(F5.3)') value. If it doesn't, use a different format, e.g. perhaps F5.0.
P.S. "Unformatted" means reading binary data without conversion: it is a direct copy of the data from the file to the data item. "List-directed" is the Fortran term for reading and converting data without a format specification.
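To see why "1" comes back as 0.001 under F5.3 but as 1.0 under F5.0, here is a rough Python emulation of how an Fw.d input field is interpreted. It is a simplification, not a Fortran implementation: exponents are not handled, and blanks are simply ignored, as under the default BLANK='NULL' mode. The function name read_f_field is hypothetical.

```python
def read_f_field(field: str, d: int) -> float:
    """Rough emulation of Fortran Fw.d input for a plain number (no exponent)."""
    # Default blank mode: blanks in the field are ignored.
    text = field.replace(" ", "")
    if not text:
        return 0.0  # an all-blank field reads as zero
    if "." in text:
        return float(text)  # an explicit decimal point overrides d
    return int(text) / 10**d  # no decimal point: implied d decimal places


print(read_f_field("    1", 3))  # the F5.3 surprise
print(read_f_field("1.000", 3))
print(read_f_field("    1", 0))  # F5.0 handles both forms
print(read_f_field("0.250", 0))
```

This is why F5.0 is the safer choice for a column that mixes "1" and "0.250": every field with a decimal point is unaffected by d.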
Well, here's something new to me: F90 allows a mix of comma and space delimiters in a simple list-directed read:
read(unit,*)v1,v2,v3,v4
with input
1.222 2 , 3.14 , 4
yields
1.222000 2.000000 3.140000 4.000000
I have banged my head against this for too long now!
I have a string: 2012-09-27T18:00:00.000-04:00
I have a format: [dateFormatter setDateFormat:@"yyyy-MM-dd'T'HH:mm:ss'Z'"];
I get a null result converting the string.
Can someone help with the correct date format?
According to the documentation for NSDateFormatter:
The format string uses the format patterns from the Unicode Technical Standard #35. The version of the standard varies with release of the operating system:
Formatters in OS X v10.8 and iOS 6.0 use [version tr35-25].
Following that link:
s (count 1..2, example 12): Second. Use one or two for zero padding.
In other words, s for seconds means only the integral part.
But right underneath that, there's:
S (count 1..n, example 3456): Fractional Second - truncates (like other time fields) to the count of letters. (The example shows display using pattern SSSS for seconds value 12.34567.)
Then, the reason you aren't matching the timezone is that you're quoting the Z, so it matches a literal Z rather than a timezone format.
To match the ISO8601 timezone format you're seeing, the same documentation says you want ZZZZZ.
So, it looks like, at least for 10.8/iOS 6, you want:
[dateFormatter setDateFormat:@"yyyy-MM-dd'T'HH:mm:ss'.'SSSZZZZZ"];
For earlier versions, you've got the links to the docs; you should be able to figure it out now.
Testing this (in Python, to save a few lines of code):
>>> import Cocoa
>>> df = Cocoa.NSDateFormatter.alloc().init()
>>> df.setDateFormat_("yyyy-MM-dd'T'HH:mm:ss'.'SSSZZZZZ")
>>> print(df.dateFromString_('2012-09-27T18:00:00.000-04:00'))
2012-09-27 22:00:00 +0000
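As a cross-check without Cocoa, the same string can be parsed with Python's own datetime module. This is a different library, not NSDateFormatter: the mapping of SSS to %f and ZZZZZ to %z is an approximate analogy, and %z only accepts the -04:00 colon form from Python 3.7 on. The second attempt quotes Z as a literal, mirroring the original mistake.

```python
from datetime import datetime, timezone

stamp = "2012-09-27T18:00:00.000-04:00"

# Rough strptime analogue of yyyy-MM-dd'T'HH:mm:ss'.'SSSZZZZZ.
parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f%z")
print(parsed.astimezone(timezone.utc))

# A literal Z (like the quoted 'Z' in the question) cannot match ".000-04:00".
try:
    datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
except ValueError:
    print("literal Z pattern does not match")
```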