Is there a way of using Pandas or Matplotlib to plot Pandas Time Series density? - pandas

I am having a hard time plotting the density of a Pandas time series.
I have a data frame with perfectly organised timestamps, like below:
It's a web log, and I want to show the density of the timestamps, which indicates how many visitors there were in a certain period of time.
My solution at the moment is to extract the year, month, week and day of each timestamp and group by them, like below:
But I don't think that is an efficient way of dealing with time. I also couldn't find any good info on this; most of what I found is about plotting calculated values against a date.
So, does anybody have any suggestions on how to plot a Pandas time series?
Much appreciated!

The best way to compute the values you want to plot is to use Series.resample; for example, to aggregate the count of dates daily, use this:
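import pandas as pd
# dates holds the log's timestamps (datetime values); each one counts as a single visit
ser = pd.Series(1, index=dates)
ser.resample('D').sum()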
The documentation there has more details depending on exactly how you want to resample & aggregate the data.
If you want to plot the result, you can use Pandas' built-in plotting capabilities; for example:
ser.resample('D').sum().plot()
More info on plotting is here.
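To make the resample approach concrete, here is a minimal self-contained sketch. The timestamps are synthetic (1000 random hits over three days, generated only for illustration), so the names rng, start, offsets and daily are assumptions and not part of the original log.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the web log: 1000 hits at random seconds
# within a three-day window (illustrative data only).
rng = np.random.default_rng(0)
start = pd.Timestamp("2016-01-01")
offsets = rng.integers(0, 3 * 24 * 3600, size=1000)
dates = pd.DatetimeIndex(start + pd.to_timedelta(offsets, unit="s"))

# One row per hit, each worth 1; sort so the index is monotonic.
ser = pd.Series(1, index=dates).sort_index()

# Summing the ones per calendar day gives the number of hits per day.
daily = ser.resample("D").sum()
print(daily)
```

Calling daily.plot() on the result should then draw the visit counts over time; a different frequency string passed to resample changes the bin width.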

Related

How to plot timeseries with many NaNs?

Originally I had a dataframe containing power consumption of some devices like this:
and I wanted to plot power consumption vs time for different devices, with one plot for each of 6 possible dates. After grouping by date I got plots like this one (for each group = date):
Then I tried to create a similar plot, but with the date and device roles switched, so that it is grouped by device and colored by date. In order to do that I prepared this dataframe:
It is similar to the previous one, but has many NaN values due to differing measurement times. I thought it wouldn't be a problem, but after grouping by device, the subplots look like this one (ex is just the name of the sub-dataframe extracted from the loop going through groups = devices):
This is the ex dataframe (the mean lag between observations is around 20 seconds)
Question: What should I do to make the plots grouped by device look like the ones grouped by date? (I'd like to use the ex dataframe but handle the NaNs somehow.)
I found a solution in an answer to a similar question: ex.interpolate(method='linear').plot(). This line fills the gaps between data points by interpolating before plotting. This is the result:
Another thing that can help is using .plot(marker='o', ms=3), which won't fill the gaps between points, but will at least make the points visible (previously some points, mainly the peaks in energy consumption, were too small at the scale of the whole plot). This is the result:
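A small sketch of what the interpolation step does, using a made-up stand-in for ex: two date columns whose measurements fall on alternating timestamps. The column names and values are hypothetical, chosen only to reproduce the interleaved-NaN pattern.

```python
import numpy as np
import pandas as pd

# Six timestamps 20 seconds apart (illustrative values only).
idx = pd.date_range("2020-01-01", periods=6, freq=pd.Timedelta(seconds=20))

# Hypothetical stand-in for `ex`: one column per date, with NaN wherever
# that date has no measurement at the given time.
ex = pd.DataFrame(
    {
        "2020-01-01": [1.0, np.nan, 3.0, np.nan, 5.0, np.nan],
        "2020-01-02": [np.nan, 2.0, np.nan, 4.0, np.nan, 6.0],
    },
    index=idx,
)

# Linear interpolation fills NaNs that lie between two valid points.
# NaNs before a column's first valid value are left as NaN.
filled = ex.interpolate(method="linear")
print(filled)
```

With the gaps filled, filled.plot() can draw a continuous line per column, whereas plotting ex directly leaves isolated points that a line plot does not connect.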

pandas -- label a DatetimeIndex by interval center?

Is there a way to label a pandas DatetimeIndex by its center value, e.g. the middle of the month for a monthly interval? For me, this would be much more intuitive than the default labels at the beginning or end of the interval.
Unfortunately, my understanding from the pandas documentation is that the only default label values are 'left' and 'right'.
This question does provide a way to calculate the middle of an interval:
Pandas computer hourly average and set at middle of interval
However, I am wondering if there might be more direct approach available.
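For reference, the manual approach can be sketched as follows: resample with left labels, then move each label to the midpoint between its bin start and the next bin start. This is only an illustration on made-up daily data, assuming monthly bins; it is not a built-in pandas labelling option.

```python
import pandas as pd

# Made-up daily series covering January through March 2021.
idx = pd.date_range("2021-01-01", periods=90, freq="D")
ser = pd.Series(range(90), index=idx, dtype="float64")

# Monthly means, labelled at the start of each month.
monthly = ser.resample("MS").mean()

# Shift each label to the middle of its month:
# midpoint = start + (start of next month - start) / 2
starts = monthly.index
ends = starts + pd.offsets.MonthBegin(1)
monthly.index = starts + (ends - starts) / 2
print(monthly)
```

Because months differ in length, the midpoints land on different days and times (for example, noon on 16 January for a 31-day month).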

How to extract values from a dataframe in Julia

I would like to plot the energy per spin < E >/N against the temperature T.
However, I am not sure how to "extract" the values from the table below and plot them.
Just do:
using Plots
plot(data.T, data.Emean)
This is the simplest way to get a column from a data frame.
You might also want to check out this notebook: https://github.com/bkamins/Julia-DataFrames-Tutorial/blob/master/02_basicinfo.ipynb in the section "Most elementary get and set operations".

Bin and sum over time for precipitation data in CSV form

I have a large CSV file with this formatting: month, year, lat, long, rainfall
How do I bin and sum over time by year? Also, a different question: how do I separate the data out into three bins of rainfall in different basins?
It would be better if you could post an example of the data frame you have, but something like this might work:
import pandas as pd
# read into data frame
df = pd.read_csv('your_csv_path')
# groupby year and get sum
df.groupby('year')['rainfall'].sum().reset_index(name='rainfall_sum')
For grouping into basins, I assume you would need to make a scatter plot first; you might also need a clustering algorithm. Take a look at the various algorithms in sklearn: https://scikit-learn.org/stable/modules/clustering.html
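Here is a runnable version of the yearly sum on a tiny inline CSV. The rows are invented for illustration, and the header names are an assumption based on the column order given in the question.

```python
import io
import pandas as pd

# Invented sample rows; the header follows the column order from the question.
csv_text = """month,year,lat,long,rainfall
1,2000,10.0,20.0,5.0
2,2000,10.0,20.0,7.5
1,2001,10.0,20.0,3.0
2,2001,11.0,21.0,4.0
"""

df = pd.read_csv(io.StringIO(csv_text))

# Total rainfall per year, returned as a regular two-column data frame.
yearly = df.groupby("year")["rainfall"].sum().reset_index(name="rainfall_sum")
print(yearly)
```

If the real file has no header row, the column names would have to be passed to read_csv through its names parameter.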

Setting the axis custom limits matplotlib dataframe

Across a list of dataframes (dflist), each showing some sensor readings in a 24 hour window, I am setting the y axis limits for these readings in matplotlib.
axes[3].set_ylim(dflist[day]['AS_%s_WE_%d(mv)' %(gas,sensor)].min(),dflist[day]['AS_%s_WE_%d(mv)' %(gas,sensor)].max())
So for each df in my list, a graph is produced. Unfortunately the first 10 minutes of readings throw off the scale dramatically, and I can't interpret the readings.
Now, for each df, instead of setting the minimum sensor reading as the ymin, could I tell the df to ignore the first 10 minutes (which is the first 10 readings, as I have one reading per minute) and take the min of the rest of the data?
You can use a boolean mask in pandas that filters out undesired values.
You didn't provide the structure of your dataframe, so I'm just writing something that gives you the right idea:
df = dflist[day]
df[df['minute'] > 10]['AS_%s_WE_%d(mv)' % (gas, sensor)].min()
Essentially you are selecting the rows of dflist[day] with a boolean mask produced by the conditional expression (here assuming the dataframe has a 'minute' column).
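Since the question states that the first 10 minutes are exactly the first 10 rows, a positional slice is another option that does not need a 'minute' column. This is a sketch on invented readings; the column name below is a hypothetical instance of the 'AS_%s_WE_%d(mv)' pattern.

```python
import pandas as pd

# Hypothetical column name following the 'AS_%s_WE_%d(mv)' pattern.
col = "AS_NO2_WE_1(mv)"

# Invented readings: ten warm-up values that distort the scale,
# followed by five settled values.
warm_up = [900, 850, 800, 700, 600, 500, 400, 350, 320, 310]
settled_values = [300, 305, 298, 302, 301]
df = pd.DataFrame({col: warm_up + settled_values})

# Skip the first 10 rows by position, then take the limits from the rest.
settled = df[col].iloc[10:]
ymin, ymax = settled.min(), settled.max()
print(ymin, ymax)
```

The resulting pair could then be passed to axes[3].set_ylim(ymin, ymax) in place of the full-column min and max.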