Matplotlib output line chart looks "box like" (for lack of a better word) for monthly data sampled over a 30 year period - pandas

I am making a very simple chart with Matplotlib and Python. It shows 30 years' worth of monthly sampled data (PMI - US Purchasing Managers' Index). In total there are around 400 monthly observations.
Sample monthly data:
Date        PMI
1/03/2022   57.1
1/02/2022   58.6
1/01/2022   57.6
1/12/2021   58.8
1/11/2021   60.6
1/10/2021   60.8
1/09/2021   60.5
I produced a very simple line chart with Matplotlib. The dataframe is named pmi. Date is not set as the index, but the dates have been converted to pandas datetime.
plt.plot(pmi.Date, pmi.PMI, c='mediumblue', lw=0.8)
Output:
Why does the output look so box-like? It seems to me that it doesn't capture all the data available in the dataframe. I'm sure it does, though, so is this a formatting issue? How do I smooth the line to remove the sharp, edge-like breaks?
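Without the image it is hard to say what causes the blocky look. If the aim is simply a smoother line, one option is to plot a centred rolling mean instead of the raw values. The sketch below is only an illustration under assumptions: it rebuilds a tiny pmi frame from the sample rows above, assumes the dates are day-first, and uses a hypothetical 3-month window and a made-up column name PMI_smooth. A rolling mean changes the plotted values, so it trades accuracy for smoothness.

```python
import pandas as pd

# Sample rows from the question; dates are assumed to be day-first (1 March 2022, ...).
pmi = pd.DataFrame({
    "Date": ["1/03/2022", "1/02/2022", "1/01/2022", "1/12/2021",
             "1/11/2021", "1/10/2021", "1/09/2021"],
    "PMI": [57.1, 58.6, 57.6, 58.8, 60.6, 60.8, 60.5],
})
pmi["Date"] = pd.to_datetime(pmi["Date"], format="%d/%m/%Y")

# Sort oldest to newest so the rolling window follows calendar order.
pmi = pmi.sort_values("Date").reset_index(drop=True)

# Centred 3-month rolling mean (window size is an assumption).
# The first and last rows have no full window and become NaN.
pmi["PMI_smooth"] = pmi["PMI"].rolling(window=3, center=True).mean()

# To plot the smoothed series (requires matplotlib):
# plt.plot(pmi.Date, pmi.PMI_smooth, c='mediumblue', lw=0.8)
```

A wider window gives a smoother line but hides more of the month-to-month movement.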

Related

Style specific rows in multiindex dataframe

I have a pandas dataframe that looks like:
Year                          2019                   2020
Decision                   Applied Admitted     % Applied Admitted     %
Class         Residency
Freshmen      Resident      1143.0    918.0  80.3  1094.0   1003.0  91.7
              Non-Resident  1371.0   1048.0  76.4  1223.0   1090.0  89.1
              Total         2514.0   1966.0  78.2  2317.0   2093.0  90.3
Transfer      Resident       404.0    358.0  88.6   406.0    354.0  87.2
              Non-Resident   371.0    313.0  84.4   356.0    288.0  80.9
              Total          775.0    671.0  86.6   762.0    642.0  84.3
Grad/Postbacc Total          418.0    311.0  74.4   374.0    282.0  75.4
Grand         Total         3707.0   2948.0  79.5  3453.0   3017.0  87.4
note: Full MWE is in this question.
I'm trying to italicize the total rows (here that's rows 3,6,7,8) and bold the grand total row (row 8) in a way that doesn't rely on actual row numbers.
I can do that with:
df_totals.style.apply(lambda x:["font-style: italic;"]*len(x),subset=((slice(None),"Total"),))\
.applymap_index(lambda x:"font-style: italic;" if x in ("Grand","Total") else "")
That just seems super unpythonic, ugly, and unmaintainable to me, especially the call to applymap_index. Is there a more fluent way of doing this?
The first part can be simplified with Styler.set_properties. The second part is fine in my opinion; the only small change is to return None instead of an empty string in Styler.applymap_index:
df_totals.style.set_properties(**{'font-style': 'italic'}, subset=((slice(None),"Total"),))\
.applymap_index(lambda x:"font-style: italic;" if x in ("Grand","Total") else None)

how to show numeric data in seaborn

I am analyzing COVID-19 data in Seaborn for a specific state, Maharashtra. I passed x='Dates', y="Deaths", data = maha and color = g, like this:
But when I run it, the dates in the output come out messed up, like this:
How do I show the dates in a date format such as 2020-05-03?
Please suggest how I can achieve this format.

Pandas variable using lagged values

I am trying to write Pandas code to calculate accounting depreciations. The math is very simple:
depreciation(t)=depr_rate * [cumsum(investments(t)) - cumsum(depreciation(t-1))]
For example (with a depreciation rate of 30%):
Year 1: Investments 100 and depreciations 30
Year 2: Investments 100 and depreciations 51
Year 3: Investments 0 and depreciations 35.7
I am looking for a way to do this in Pandas.
My fix so far is to convert the column to a list and then iterate over the list as follows:
inv_cum=list(df_inv.investment_cumsum)
depr_list=[]
for num in range(len(inv_cum)):
    depr_list.append(depr_rate*(inv_cum[num]-sum(depr_list[:max(0,num)])))
df_inv['depreciations']=depr_list
The fix works fine but I would prefer to use Pandas functionality. All help will be appreciated.
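The recursion above can also be sketched with a running total, which avoids re-summing depr_list on every step. This is still a plain loop rather than a vectorised pandas call, and the df_inv layout, the investment column name and the depr_rate value below are illustrative assumptions taken from the 30% example.

```python
import pandas as pd

depr_rate = 0.3  # 30% rate from the worked example

# Yearly investments from the worked example (assumed layout of df_inv).
df_inv = pd.DataFrame({"investment": [100.0, 100.0, 0.0]})
df_inv["investment_cumsum"] = df_inv["investment"].cumsum()

depr_list = []
depr_total = 0.0  # cumulative depreciation up to the previous year
for inv_cum in df_inv["investment_cumsum"]:
    # depreciation(t) = rate * (cumulative investments - earlier depreciation)
    depr = depr_rate * (inv_cum - depr_total)
    depr_list.append(depr)
    depr_total += depr

df_inv["depreciations"] = depr_list
```

With the example inputs this gives depreciations of roughly 30, 51 and 35.7, matching the figures in the question.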

Unable to slice year from date column using negative indexing with pandas

I have a simple data set, where we have a Dates column from which I want to extract the year.
I am using the negative index to get the year
d0['Year'] = d0['Dates'].apply(lambda x: x[-1:-5])
This normally works, however, not on this. A blank column is created.
I sampled the column for some of the data and saw no odd characters present.
I have tried the following variations
d0['Year'] = d0['Dates'].apply(lambda x: str(x)[-1:-5]) # column is created and it is blank.
d0['Year'] = d0.Dates.str.extract('\d{4}') # gives an error "ValueError: pattern contains no capture groups"
d0['Year'] = d0['Dates'].apply(lambda x: str(x).replace('[^a-zA-Z0-9_-]','a')[-1:-5]) # same - gives a blank column
I'm really not sure what other options I have or where the problem lies.
What could the issue be?
Below is a sample dump of the data I have
Outbreak,Dates,Region,Tornadoes,Fatalities,Notes
2000 Southwest Georgia tornado outbreak,"February 13–14, 2000",Georgia,17,18,"Produced a series of strong and deadly tornadoes that struck areas in and around Camilla, Meigs, and Omega, Georgia. Weaker tornadoes impacted other states."
2000 Fort Worth tornado,"March 28, 2000",U.S. South,10,2,"Small outbreak produced an F3 that hit downtown Fort Worth, Texas, severely damaging skyscrapers and killing two. Another F3 caused major damage in Arlington and Grand Prairie."
2000 Easter Sunday tornado outbreak,"April 23, 2000","Oklahoma, Texas, Louisiana, Arkansas",33,0,
"2000 Brady, Nebraska tornado","May 17, 2000",Nebraska,1,0,"Highly photographed F3 passed near Brady, Nebraska."
2000 Granite Falls tornado,"July 25, 2000","Granite Falls, Minnesota",1,1,"F4 struck Granite Falls, causing major damage and killing one person."
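The slice x[-1:-5] is empty because, with the default step of 1, the start (-1) comes after the stop (-5). To extract the year from the "Dates" column as an object (string) type, take the last four characters instead:
d0['Year'] = d0['Dates'].apply(lambda x: x[-4:])
If you want the year as an int, convert the column after the step above:
d0['Year'] = pd.to_numeric(d0['Year'])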
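As an alternative sketch, the regex attempt from the question can be made to work by adding a capture group, which is what the ValueError asks for. The small d0 frame below is rebuilt from a few rows of the sample dump, and anchoring the pattern with $ assumes the year is always the last four characters of the string.

```python
import pandas as pd

# A few "Dates" values copied from the sample dump in the question.
d0 = pd.DataFrame({
    "Dates": ["February 13–14, 2000", "March 28, 2000", "April 23, 2000"],
})

# str.extract needs at least one capture group, hence the parentheses.
# expand=False returns a Series instead of a one-column DataFrame.
d0["Year"] = d0["Dates"].str.extract(r"(\d{4})$", expand=False)

# Convert from strings to integers if a numeric year is needed.
d0["Year"] = pd.to_numeric(d0["Year"])
```

Rows that do not end in four digits would come back as NaN rather than raising an error, which can make odd values easier to spot.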

Strange behaviour of timeslice in Sumo Logic

I have this query in sumo:
_sourceCategory=my_product
| timeslice 1h
| count by _timeslice
In my aggregates list the rows are 1 hour apart:
but in my graph I see 15-minute segments, like this:
So my question is: where do these 15-minute segments come from?
Every search result includes a histogram that shows the number of results over time -- this is what your screenshot shows.
The actual results of your query will be shown below that, in the Messages/Aggregates area. Choose the Bar Chart visualisation to see the search results with your hour timeslice.
https://help.sumologic.com/Search/Search-Query-Language/Search-Operators/timeslice