How to avoid initial data changing when plotting additional data in plot - matplotlib

I want to plot two data series in one plot, but when I plot both, one of the series changes: matplotlib draws lines between the wrong data points.
firsty_values and secondy_values are sorted lists of timestamps spanning one 24-hour interval.
firstx_values and secondx_values are values in the range 18-21.
The first plot shows the two series together, while the last plot shows one of the series alone.
import matplotlib.pyplot as plt

# firsty_values and secondy_values look like this:
#['2019-05-04 00:00:03',
# '2019-05-04 00:02:03',
# ...
# '2019-05-04 23:56:03',
# '2019-05-04 23:58:02']
# firstx_values and secondx_values look like this:
#[18.32,18.34 ..... 19.32,19.31]
plt.plot(firsty_values,firstx_values,'b')
plt.plot(secondy_values, secondx_values, 'g')
plt.ylabel('Temperature [C]')
plt.xlabel('Time')
plt.legend(['SA1_563_04_RT601A', 'SA1_563_04_RT601B'])
plt.xticks([100,604,1053]) #length more than 1053
plt.show()
#plt.plot(firsty_values,firstx_values,'b')
plt.plot(secondy_values, secondx_values, 'g')
plt.ylabel('Temperature [C]')
plt.xlabel('Time')
plt.legend(['SA1_563_04_RT601A', 'SA1_563_04_RT601B'])
plt.xticks([100,604,1053]) #length less than 1053
plt.show()
Output with both data series: [plot]
Output with one data series: [plot]
The first plot draws lines between data points that do not lie next to each other. The problem seems to be that some of the data points from the second series are placed out of order, after the points from the first series. This is reflected by the xticks showing three labels when plotting both series and two labels when plotting one series.
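A likely cause, assuming the timestamps are passed to plt.plot as plain strings: matplotlib then treats them as categories rather than times. The first plot call assigns positions 0 to n-1 to its strings, and every string from the second series that was not already seen is appended after those positions, so its points land out of order at the right-hand end. Converting the strings to datetime objects gives both series one shared, ordered time axis. The sketch below assumes the 'YYYY-MM-DD HH:MM:SS' format shown in the question; the function names are hypothetical.

```python
from datetime import datetime

import matplotlib.pyplot as plt


def parse_timestamps(stamps):
    """Turn 'YYYY-MM-DD HH:MM:SS' strings into datetime objects."""
    return [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps]


def plot_two_series(first_times, first_temps, second_times, second_temps):
    """Plot both temperature series on one shared datetime x axis."""
    fig, ax = plt.subplots()
    # Datetimes are placed by value, so adding the second series
    # does not push any of its points behind the first series.
    ax.plot(parse_timestamps(first_times), first_temps, "b", label="SA1_563_04_RT601A")
    ax.plot(parse_timestamps(second_times), second_temps, "g", label="SA1_563_04_RT601B")
    ax.set_xlabel("Time")
    ax.set_ylabel("Temperature [C]")
    ax.legend()
    fig.autofmt_xdate()  # rotate the date labels so they do not overlap
    return fig, ax
```

With the question's variables this would be called as plot_two_series(firsty_values, firstx_values, secondy_values, secondx_values), followed by plt.show(). If pandas is already in use, pd.to_datetime(firsty_values) is an equivalent way to convert. On a datetime axis, tick positions such as [100, 604, 1053] no longer mean list indices, so ticks would need to be given as dates or left to matplotlib's date locator.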

Related

Bar plot from two different datasets with different data range

I have the following datasets:
df1 = {'lower':[3.99,4.99,5.99,1700], 'percentile':[1,2,5,10,50,100]}
df2 = {'lower':[2.99,4.50,5,1850], 'percentile':[2,4,7,15,55,100]}
The data:
The percentile refers to the percentage of the data that corresponds to a particular price; e.g., 3.99 would represent 1% of the data, while all values under 5.99 would represent 5% of the data.
Both datasets have a length of 100, since they represent percentiles, but the percentile values and prices differ between the two datasets.
What I have done so far:
What I need help with:
As you can see in the third graph, I can plot the two datasets overlaid, which is what I need, but I have not managed to change the legend or the odd x tick values on that graph. The x axis does not show the percentile, or any other metric I might want to use on it.
Any help?
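One way to get the percentile on the x axis and a readable legend, sketched under assumptions: draw each dataset with ax.bar, using the percentile values themselves as bar positions, and pass label= so that ax.legend() can name each series. The lists in the question have mismatched lengths (four prices against six percentiles), so the sketch pads the prices with made-up placeholder values; the function name is hypothetical.

```python
import matplotlib.pyplot as plt


def overlay_percentile_bars(datasets, width=2.0):
    """Overlay one semi-transparent bar series per dataset.

    datasets maps a legend label to a dict with equal-length
    'percentile' and 'lower' (price) lists.
    """
    fig, ax = plt.subplots()
    for label, data in datasets.items():
        # Bars sit at their percentile, so the x axis reads in percentiles.
        ax.bar(data["percentile"], data["lower"], width=width, alpha=0.5, label=label)
    ax.set_xlabel("Percentile")
    ax.set_ylabel("Price")
    ax.legend()  # uses the label= given to each ax.bar call
    return fig, ax


# Placeholder prices: 12.0, 150.0, 14.0 and 160.0 are made up to match the percentile lists.
df1 = {"lower": [3.99, 4.99, 5.99, 12.0, 150.0, 1700], "percentile": [1, 2, 5, 10, 50, 100]}
df2 = {"lower": [2.99, 4.50, 5.00, 14.0, 160.0, 1850], "percentile": [2, 4, 7, 15, 55, 100]}
fig, ax = overlay_percentile_bars({"df1": df1, "df2": df2})
```

Because the prices span roughly 3 to 1850, ax.set_yscale("log") may make the small bars easier to see.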

How to adjust Python linear regression y axis

I have been having problems with the price column every time I try to plot graphs with it. All my graphs have this problem: the price axis shows decimals, and I want it to show the actual values instead.
Example of the linear graph:
This is the dataframe containing the information of the dataset:
train is the name of the dataframe.
columns holds the selected column names:
import matplotlib.pyplot as plt

columns = ['Id', 'year', 'distance_travelled(kms)', 'brand_rank', 'car_age']
for i in columns:
    plt.scatter(train[i], y, label='Actual')
    plt.xlabel(i)
    plt.ylabel('price')
    plt.show()
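If the "decimals" are matplotlib's automatic scaling (small tick values with a multiplier such as 1e6 printed at the top of the axis), one possible fix is an explicit tick formatter on the y axis. The sketch below uses matplotlib.ticker.StrMethodFormatter to print whole prices with thousands separators; it assumes train and y are the question's DataFrame and price values, and the helper name is hypothetical.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import StrMethodFormatter


def scatter_actual_prices(x, prices, xlabel):
    """Scatter prices against one feature, labelling the y axis with full prices."""
    fig, ax = plt.subplots()
    ax.scatter(x, prices, label="Actual")
    ax.set_xlabel(xlabel)
    ax.set_ylabel("price")
    # Format every y tick as a whole number with thousands separators
    # (e.g. 1,250,000) instead of a scaled value plus a multiplier.
    ax.yaxis.set_major_formatter(StrMethodFormatter("{x:,.0f}"))
    ax.legend()
    return fig, ax


# Usage with the question's loop (train and y come from the question):
# for col in ['Id', 'year', 'distance_travelled(kms)', 'brand_rank', 'car_age']:
#     scatter_actual_prices(train[col], y, col)
#     plt.show()
```

A lighter alternative, which keeps the default formatter, is ax.ticklabel_format(axis="y", style="plain", useOffset=False).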

How to visualize 'suicides_no' w.r.t 'gdp_per_capita ($)' for a given country over the years, in the following data frame

The DataFrame can be viewed here: Global Suicide Dataset
I have made a pivot table with country and year as indices using the following code:
import numpy as np
import pandas as pd

df1 = pd.pivot_table(df, index=['country', 'year'],
                     values=['suicides_no', 'gdp_per_capita ($)', 'population', 'suicides/100k pop'],
                     aggfunc={'suicides_no': np.sum,
                              'gdp_per_capita ($)': np.mean,
                              'population': np.mean,
                              'suicides/100k pop': np.mean})
Output:
For my project, I want to visualize how suicides_no varies with gdp_per_capita for a country over the years, but I am unable to plot it. Can somebody please help me out?
First, let's convert the index levels to columns using df1.reset_index(inplace=True).
Now you can draw this as a scatter plot whose main features are year (preferably on the x-axis) and suicides_no (on the y-axis). gdp_per_capita can then be shown through the size or color of the dots.
In this case you have two options:
Draw a separate plot for each country (GDP is shown as hue):
sns.catplot(x='year', y='suicides_no', row='country', hue='gdp_per_capita ($)', data=df1)
Draw everything in a single plot: a scatter plot with GDP as dot size and country as color (hue):
sns.scatterplot(x='year', y='suicides_no', hue='country', size='gdp_per_capita ($)', data=df1)
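For a single country, another option (not part of the answer above) is to filter the reset pivot table to that country and plot suicides_no and gdp_per_capita ($) against year on two y axes that share the year axis, via ax.twinx(). This is a sketch assuming df1 has already been through reset_index() as described; the helper name and the small sample frame (with made-up numbers and placeholder country names) are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_country_trend(df1, country):
    """Plot suicides_no and gdp_per_capita ($) by year for one country.

    Assumes df1 has 'country', 'year', 'suicides_no' and
    'gdp_per_capita ($)' columns (the pivot table after reset_index()).
    """
    data = df1[df1["country"] == country].sort_values("year")
    fig, ax = plt.subplots()
    ax.plot(data["year"], data["suicides_no"], color="tab:red", marker="o")
    ax.set_xlabel("year")
    ax.set_ylabel("suicides_no", color="tab:red")
    # Second y axis on the right, sharing the same year axis.
    ax2 = ax.twinx()
    ax2.plot(data["year"], data["gdp_per_capita ($)"], color="tab:blue", marker="s")
    ax2.set_ylabel("gdp_per_capita ($)", color="tab:blue")
    ax.set_title(country)
    return fig, ax, ax2


# Hypothetical miniature version of the reset pivot table, for illustration only.
sample = pd.DataFrame({
    "country": ["Country A", "Country A", "Country A", "Country B"],
    "year": [1989, 1987, 1988, 1987],
    "suicides_no": [68, 73, 63, 1800],
    "gdp_per_capita ($)": [1018, 796, 769, 9000],
})
fig, ax, ax2 = plot_country_trend(sample, "Country A")
```

Two separate y scales are used because suicide counts and GDP per capita are in unrelated units; a shared axis would flatten one of the two curves.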

How to plot a stacked bar using the groupby data from the dataframe in python?

I am reading a huge CSV file using the pandas module:
filename = pd.read_csv(filepath)
and converting it to a DataFrame:
df = pd.DataFrame(filename, index=None)
From the CSV file, I am concerned with three columns: country, year, and value.
I have grouped by country name, summed the values, and plotted the result as a bar graph with the following code:
df.groupby('country').value.sum().plot(kind='bar')
where the x axis is country and the y axis is value.
Now I want to turn this into a stacked bar graph, using the third column, year, so that each year is shown as a differently colored segment. I am looking for an easy way.
Note that the year column contains years from 2000 to 2019.
Thanks.
From what I understand, you should try something like:
df.groupby(['country', 'year']).value.sum().unstack().plot(kind='bar', stacked=True)
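A self-contained version of that answer, with the column written as lowercase year to match the question, is sketched below on a small hypothetical frame. unstack() moves the years into columns (one column per year), which is the shape DataFrame.plot(kind="bar", stacked=True) needs to stack one colored segment per year. fill_value=0 is an added assumption so that country/year combinations with no rows stack as zero rather than NaN; the function name is hypothetical.

```python
import pandas as pd


def stacked_bars_by_year(df):
    """Stacked bar per country, one colored segment per year."""
    # Rows: countries, columns: years, cells: summed value.
    totals = df.groupby(["country", "year"])["value"].sum().unstack(fill_value=0)
    ax = totals.plot(kind="bar", stacked=True)
    ax.set_xlabel("country")
    ax.set_ylabel("value")
    ax.legend(title="year")
    return totals, ax


# Hypothetical miniature data standing in for the CSV.
df = pd.DataFrame({
    "country": ["A", "A", "B", "B", "A"],
    "year": [2000, 2001, 2000, 2001, 2000],
    "value": [1, 2, 3, 4, 5],
})
totals, ax = stacked_bars_by_year(df)
```

With 20 years (2000 to 2019) the default color cycle repeats after ten colors, so passing a colormap, e.g. totals.plot(kind="bar", stacked=True, colormap="tab20"), keeps each year distinct. Call plt.show() afterwards to display the figure.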

How to force a time series to plot on the whole year, when data is only a few months

I have multiple time series, each with a different start and end time. When I plot them using pandas and matplotlib, I get nice graphs beginning at t0 and ending at tx for each individual series. I know that I cannot plot series of different lengths in one plot, but I would like to at least view them with the months lining up.
For example, say I have two series: series 1 begins in April and ends in September, and series 2 begins in February and ends in December.
How do I visualize them so that each series is plotted on a yearly graph (January to December), even though the data does not span those dates? I want to see them one above the other, lined up by month.
So far I have this, with xlim=('Jan', 'Dec'), but I just get blank plots:
for dfl in dfl_list[0:2]:
    dfl.plot(x='DateTime', y=['VWCmax', 'VWCmin'],
             ax=p1, fontsize=15, xlim=('Jan', 'Dec'))
p1.set_title('Time vs VWC', fontsize=15)
p1.set_ylabel('VWC (%) ' + '{}'.format(imei), fontsize=15)
p1.set_xlabel('Time Stamp', fontsize=15)
I've also tried xticks instead of xlim, but I also get blank plots.
The problem was that I thought the xlim arguments could be the strings 'Jan' and 'Dec'. That returned blank graphs because pyplot could not fit a date axis to string limits. The solution is to pass datetime arguments to xlim:
from datetime import datetime

for dfl in dfl_list[0:2]:
    dfl.plot(x='DateTime', y=['VWCmax', 'VWCmin'],
             ax=p1, fontsize=15, xlim=(datetime(2017, 1, 1), datetime(2017, 12, 31)))
p1.set_title('Time vs VWC', fontsize=15)
p1.set_ylabel('VWC (%) ' + '{}'.format(imei), fontsize=15)
p1.set_xlabel('Time Stamp', fontsize=15)
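The same idea can be sketched with plain matplotlib and one subplot per series, stacked vertically: share the x axis, set the limits once with datetime objects, and label the axis by month with the locator and formatter from matplotlib.dates so the months line up. The sketch assumes all series fall in the same calendar year (series from different years would first need their dates shifted onto one common year); the function names and the generated daily data are hypothetical.

```python
from datetime import datetime, timedelta

import matplotlib.dates as mdates
import matplotlib.pyplot as plt


def daily_range(start, end):
    """Every day from start to end inclusive (hypothetical helper for demo data)."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def plot_on_full_year(series, year):
    """One subplot per (label, times, values) tuple, all spanning Jan 1 to Dec 31."""
    fig, axes = plt.subplots(len(series), 1, sharex=True, squeeze=False)
    axes = axes[:, 0]
    for ax, (label, times, values) in zip(axes, series):
        ax.plot(times, values, label=label)
        ax.legend(loc="upper right")
    # xlim takes values in the axis' own units (datetimes), not strings like 'Jan'.
    # sharex=True propagates these limits to every subplot.
    axes[0].set_xlim(datetime(year, 1, 1), datetime(year, 12, 31))
    axes[-1].xaxis.set_major_locator(mdates.MonthLocator())
    axes[-1].xaxis.set_major_formatter(mdates.DateFormatter("%b"))
    return fig, axes


# Hypothetical data: series 1 runs April-September, series 2 February-December.
t1 = daily_range(datetime(2017, 4, 1), datetime(2017, 9, 30))
t2 = daily_range(datetime(2017, 2, 1), datetime(2017, 12, 31))
fig, axes = plot_on_full_year(
    [("series 1", t1, list(range(len(t1)))), ("series 2", t2, list(range(len(t2))))],
    2017,
)
```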