pandas exporting to TZ time format in .json file - pandas

I'm currently doing this to generate a dataframe:
dataframe = pd.read_sql("select date_trunc('minute', log_time) as date, .....
my output is a time that looks like this:
"date":"2020-06-01 00:08:00.000"
What I want to do is have a time output that looks like this in the json file that it is outputted to:
"date":"2020-06-08T23:01:00.000Z"
I found documents that show you how to remove it, but not how to add it. Do I have to do this after the dataframe is made, or is there something in my date_trunc() command that should put it in this format?

Based on our conversation in the comments section, I have edited your question and added "in the JSON file that it is outputted to" to the line "What I want to do is have a time output that looks like this in the JSON file that it is outputted to:". In the end, the only thing that matters is that the raw value is accurate in your JSON file. Don't worry about what it looks like in your Jupyter Notebook. I think this is a common mistake that people make, and one that I have made in the past as well.
I would suggest not worrying about the datetime format in pandas. Just go with pandas default date/time until the very end.
THEN, as a final step, just before exporting to a JSON, change the format of the field to:
df['TIME'] = pd.to_datetime(df['TIME']).dt.strftime('%Y-%m-%dT%H:%M:%S.%f').str[:-3] + 'Z'
That will change it to a format of 2020-06-08T23:01:00.000Z.
Note that .str[:-3] is required because, according to the documentation, strftime only supports microseconds (6 digits) and not milliseconds (3 digits). As such, you need to truncate the last 3 digits to get millisecond precision.
That specific format, with the T and Z, is not directly supported by strftime, so I did a little bit of string manipulation.
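Putting the pieces together, here is a minimal runnable sketch (the TIME column name and sample values are placeholders, not the asker's actual data):

```python
import pandas as pd

# Placeholder data standing in for the asker's dataframe
df = pd.DataFrame({"TIME": ["2020-06-08 23:01:00", "2020-06-01 00:08:00"]})

# Format as ISO 8601, truncate microseconds (6 digits) to
# milliseconds (3 digits), then append the literal "Z"
df["TIME"] = (
    pd.to_datetime(df["TIME"])
      .dt.strftime("%Y-%m-%dT%H:%M:%S.%f")
      .str[:-3]
    + "Z"
)

print(df["TIME"].iloc[0])  # 2020-06-08T23:01:00.000Z
```

Because the column is now plain strings, exporting with df.to_json(orient="records") should write the values exactly as formatted.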

Related

Pandas incorrectly reading csv file

I was trying to read one csv file using:
df= pd.read_csv('Diff_Report.csv',on_bad_lines='skip',encoding='cp1252',index_col=None)
[Input example screenshot]
But the code outputs as in the following screenshot. Why is it happening like this?
[Output screenshot]
@Ammu07
Try using pd.read_excel()
Solution 1:
It looks like you are only displaying the first 5 rows of the dataframe.
Solution 2:
Check whether the file is in another encoding rather than UTF-8.
Solution 3:
CSV files are separated by commas, but maybe your data itself contains a comma, which should be cleaned up.
Solution 4:
Check whether your data is exactly like the input, or whether it is really separated by commas.
Tip: Try setting index_col to "id".
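One more thing worth ruling out: if the file isn't actually comma-delimited (a guess, since the screenshots aren't visible here), pandas can sniff the separator when you pass sep=None with the python engine. A small self-contained sketch, with inline data standing in for Diff_Report.csv:

```python
import io

import pandas as pd

# Inline sample standing in for Diff_Report.csv; here the delimiter
# is a semicolon rather than a comma
raw = "id;name;value\n1;alpha;10\n2;beta;20\n"

# sep=None with engine="python" makes pandas sniff the delimiter
df = pd.read_csv(io.StringIO(raw), sep=None, engine="python")

print(df.columns.tolist())  # ['id', 'name', 'value']
print(len(df))              # 2
```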

Octave dlmread won't read date format

I have a csv file, the one from https://www.kaggle.com/jolasa/waves-measuring-buoys-data-mooloolaba/downloads/waves-measuring-buoys-data-mooloolaba.zip/1. The first entries look like this (screenshot omitted):
The first column has dates which I'm trying to read with this command:
matrix = dlmread ('waves-measuring-buoys-data/WavesMooloolabaJan2017toJun2019.csv',',',1,0);
(If referring to file on Kaggle, note that I slightly modified the directory and file names for ease of reading)
Then when I check a date by printing matrix(2,1), I get 1 instead of 01/01/2017 00:00.
How do I get the correct format?
dlmread (like csvread) only handles numeric input.
Use csv2cell from the io package instead to obtain your data as strings, and then perform any necessary string operations and conversions accordingly.

Converting a massive JSON file to CSV

I have a JSON file which is 48MB (collection of tweets I data mined). I need to convert the JSON file to CSV so I can import it into a SQL database and cleanse it.
I've tried every JSON-to-CSV converter, but they all come back with the same result: "file exceeds limits" / the file is too large. Is there a good method of converting such a massive JSON file to CSV within a short period of time?
Thank you!
A 48 MB JSON file is pretty small. You should be able to load the data into memory using something like this:
import json

with open('data.json') as data_file:
    data = json.load(data_file)
Depending on how you wrote the JSON file, data may be a list which contains many dictionaries. Try running:
type(data)
If the type is a list, then iterate over each element and inspect it. For instance:
for row in data:
    print(type(row))
    # print(row.keys())
If row is a dict instance, inspect its keys, and within the loop start building up what each row of the CSV should contain. Then you can use pandas, the csv module, or just open a file and write line by line with commas yourself.
So maybe something like:
import json

with open('data.json') as data_file:
    data = json.load(data_file)

with open('some_file.txt', 'w') as f:
    for row in data:
        user = row['username']
        text = row['tweet_text']
        created = row['timestamp']
        joined = ",".join([user, text, created])
        f.write(joined + "\n")
You may still run into issues with unicode characters, commas within your data, etc., but this is a general guide.
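For the commas-within-data caveat specifically, the csv module quotes such fields automatically. A sketch with made-up tweet records using the same keys as above:

```python
import csv
import io

# Made-up records with the same keys as the loop above; the first
# tweet_text contains a comma, which csv.writer quotes automatically
data = [
    {"username": "alice", "tweet_text": "hello, world", "timestamp": "2020-01-01"},
    {"username": "bob", "tweet_text": "hi", "timestamp": "2020-01-02"},
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["username", "tweet_text", "timestamp"])  # header row
for row in data:
    writer.writerow([row["username"], row["tweet_text"], row["timestamp"]])

print(buf.getvalue().splitlines()[1])  # alice,"hello, world",2020-01-01
```

When writing to a real file, open it with newline='' and encoding='utf-8' to cover the unicode issue as well.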

Date in Excel from SQL

I have an Excel file created from a comma-delimited text file, originally from a .sql file with an SQL INSERT query.
In one of the columns I have: Cast(0x123456AB...) As TIME
Obviously this is NOT the JSON date format, so no help from that question.
I replaced "Cast(" and ") As TIME" with empty strings.
So now I have the time values in hexadecimal.
How do I convert them into Excel Time or Datetime?
OK, playing around with it showed me that it's exactly the same as the jQuery date answer. You take the numeric portion starting with 0x.
Take the 10 digits AFTER the 0x, e.g. in A2: =MID(A1, 3, 10)
Convert it from hexadecimal to decimal, e.g. in A3: =HEX2DEC(A2)
Divide by 86400, e.g. in A4: =A3/86400
And add the result to the 1/1/1970 date, e.g. in A5: =A4 + DATE(1970, 1, 1)
Or in short:
=HEX2DEC(MID(A1, numstart, 10))/86400 + DATE(1970, 1, 1)
Replace numstart with the 1-based index where the number starts,
e.g. 3 if you have a 12- or 13-character value like 0x12345678AB, which gives you 12345678AB.
This is similar to the Convert JSON Date /Date(1388624400000)/ to Date in Excel
Except that:
a. The question was answered incorrectly and wouldn't have worked. (I edited it.)
b. The .sql file was retrieved in a stored procedure from the database via SQL, while in that question they were using jQuery-returned AJAX data, which seemed to differ. It turns out they're the same number in a different format.
As an added remark, I had a space at the beginning of my hex number. Until I did the MID on it, I didn't see that.
Note: when using AJAX-returned formatted dates like /date:0x12345678ab/, set numstart to 8. If HEX2DEC fails, try converting the hex string to uppercase before calling HEX2DEC. To debug, put each formula in a separate cell so you can see what works and what doesn't.
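To sanity-check the spreadsheet arithmetic outside Excel, here is the same HEX2DEC / divide-by-86400 / epoch logic in Python. The hex value below is made up for illustration and is assumed, as above, to hold seconds since the Unix epoch:

```python
from datetime import datetime, timedelta, timezone

# Made-up hex value, as it might look after stripping "Cast(" / ") As TIME";
# assumed to be seconds since the Unix epoch (1970-01-01)
hex_str = "0x5EDE4A7C"

seconds = int(hex_str, 16)  # the HEX2DEC step
days = seconds / 86400      # the /86400 step (Excel stores dates as day counts)
dt = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(seconds=seconds)

print(dt.isoformat())  # 2020-06-08T14:26:04+00:00
```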

Convert a string into a time interval (?)

I am loading in strings that represent time frames with the format DD days, HH:MM:SS, and I want to create a graph in which these time frames are the expression. That means that I need to convert them into some number-like format. Interval() doesn't seem to be working. Any ideas? Should I reformat the string prior to loading to something other than DD days, HH:MM:SS so that it is more easily usable?
There are lots of ways you could go about solving this. Your idea of reformatting prior to loading sounds like a good option, since QlikView doesn't support regular expressions.
I think the best way to solve your problem is to write a simple Python script that replaces the spaces, comma, and letters with a colon. Then, you'll have a much more workable string in DD:HH:MM:SS format. Of course, this assumes you have a flat file that you can easily tweak:
import re

myreg = re.compile(r'\s(day|days),\s')  # we'll look for " day, " and " days, "

newfilestr = ""
with open('myfile.txt', 'r') as myfile:
    for line in myfile:
        newfilestr += re.sub(myreg, ':', line)

with open('fixedtimeformat.txt', 'w') as outputf:
    outputf.write(newfilestr)
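As a quick sanity check (the sample line is made up, not from the asker's file), the substitution produces DD:HH:MM:SS, which then converts easily into a plain number such as total seconds:

```python
import re

myreg = re.compile(r'\s(day|days),\s')

line = "3 days, 04:05:06"
reformatted = re.sub(myreg, ':', line)  # "3:04:05:06"

# Split on the colons and fold the parts into total seconds
d, h, m, s = (int(part) for part in reformatted.split(':'))
total_seconds = d * 86400 + h * 3600 + m * 60 + s

print(reformatted)    # 3:04:05:06
print(total_seconds)  # 273906
```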