I am trying to draw a scatter plot from a dataset with three columns: the first holds dates, the second holds the x-axis data, and the third holds the y-axis data for those dates.
Is there a way to colour the data points from light to dark, going from the earliest date in the dataset to today?
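One possible approach, sketched here under assumptions: convert each date to a number and pass those numbers to matplotlib's scatter as the `c` argument together with a sequential colormap, so earlier dates come out lighter and later dates darker. The data below is made up for illustration; the variable names and the choice of the "Blues" colormap are assumptions, not part of the original dataset.

```python
from datetime import date, timedelta
import matplotlib.pyplot as plt

# Hypothetical data: one (date, x, y) triple per day
start = date(2023, 1, 1)
dates = [start + timedelta(days=i) for i in range(30)]
xs = [i * 0.5 for i in range(30)]
ys = [(i % 7) + i * 0.1 for i in range(30)]

# Map each date to a number so it can drive the colormap
color_values = [d.toordinal() for d in dates]

fig, ax = plt.subplots()
# "Blues" runs from light (small values) to dark (large values);
# a grey edge keeps the lightest points visible on a white background
points = ax.scatter(xs, ys, c=color_values, cmap="Blues", edgecolors="grey")
fig.colorbar(points, ax=ax, label="date (ordinal day number)")
# plt.show() would display the figure in an interactive session
```

With a pandas dataframe the same idea should work by converting the date column to numbers first; any sequential colormap can replace "Blues".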
I have been having problems with the price column every time I try to plot graphs of it. All my graphs have this problem, and I want the plot to show the actual values instead of decimals.
Example of a linear graph:
This is the dataframe containing the dataset:
train is the name of the dataframe.
columns contains the selected column names:
columns = ['Id', 'year', 'distance_travelled(kms)', 'brand_rank', 'car_age']
for i in columns:
    plt.scatter(train[i], y, label='Actual')
    plt.xlabel(i)
    plt.ylabel('price')
    plt.show()
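If the "decimals" are matplotlib's scaled tick labels (for example 0.2, 0.4, ... with a small "1e7" at the top of the axis), one option is to set an explicit tick formatter on the price axis. This is only a guess at the cause, since the screenshot is not available. The sketch below uses made-up numbers in place of `train` and `y`.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import StrMethodFormatter

# Hypothetical values standing in for one feature and for the price column
car_age = [1, 3, 5, 8, 10]
price = [15_000_000, 12_000_000, 9_000_000, 6_000_000, 4_500_000]

fig, ax = plt.subplots()
ax.scatter(car_age, price, label="Actual")
ax.set_xlabel("car_age")
ax.set_ylabel("price")

# Print full numbers with thousands separators instead of a scaled axis
price_formatter = StrMethodFormatter("{x:,.0f}")
ax.yaxis.set_major_formatter(price_formatter)
# plt.show() would display the figure in an interactive session
```

If the prices were instead rescaled during preprocessing, the formatter will not help, and the original column has to be plotted.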
I am trying to make a plotly line chart that shows team member progression with the following Excel data:
For the life of me I cannot figure out how to set the team member names as the color for the lines, the x-axis as the months, and the y-axis as the numeric values. The closest I've gotten is a blank graph, and I've tried about 400 combinations of parameters for
px.line(
    df,
    color="Team_Member",
    x = "the row of months... something with iloc maybe?",
    y = df.columns,
    title="Average Estimated Daily Working Hours by Team Member"
)
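A common way to get one line per team member is to reshape the table from wide to long before plotting, so that months and values each live in a single column. The sketch below assumes a layout of one row per team member and one column per month; the real sheet is not shown, so the names "Month" and "Hours" and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet: one row per team member,
# one column per month (the real layout is not shown in the question)
df = pd.DataFrame({
    "Team_Member": ["Ana", "Ben"],
    "Jan": [6.5, 7.0],
    "Feb": [7.0, 6.0],
    "Mar": [7.5, 6.5],
})

# Wide -> long: one row per (team member, month) pair
long_df = df.melt(id_vars="Team_Member", var_name="Month", value_name="Hours")
print(long_df)
```

The long frame can then be passed to plotly as px.line(long_df, x="Month", y="Hours", color="Team_Member", title=...). If the months are stored in rows rather than columns, transposing the frame first should give the layout assumed here.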
I am reading a huge CSV file using the pandas module.
filename = pd.read_csv(filepath)
Then I convert it to a DataFrame:
df = pd.DataFrame(filename, index=None)
From the CSV file, I am concerned with three columns named country, year, and value.
I have grouped by the country names, summed the values, and plotted the result as a bar graph with the following code.
df.groupby('country').value.sum().plot(kind='bar')
Here the x-axis is country and the y-axis is value.
Now I want to turn this bar graph into a stacked bar, using the third column, year, so that a different colored segment represents each year. I am looking for an easy way to do this.
Note that the year column contains years from 2000 to 2019.
Thanks.
From what I understand, you should try something like:
df.groupby(['country', 'year']).value.sum().unstack().plot(kind='bar', stacked=True)
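To see why the unstack step matters, here is a small sketch with made-up data (the countries and values are hypothetical). Grouping by both columns gives one sum per (country, year) pair; unstack then moves the years into columns, and pandas draws one stacked segment per column.

```python
import pandas as pd

# Hypothetical data with the three columns named in the question
df = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B"],
    "year": [2000, 2000, 2001, 2000, 2001],
    "value": [1, 2, 3, 4, 5],
})

# Rows: countries, columns: years, cells: summed values
table = df.groupby(["country", "year"])["value"].sum().unstack()
print(table)

# One bar per country, one colored segment per year
ax = table.plot(kind="bar", stacked=True)
ax.set_ylabel("value")
```

Combinations that never occur in the data become NaN after unstack; passing fill_value=0 to unstack is one way to handle that.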
I want to plot two data series in one plot, but when I plot both of them, one of the series changes: Matplotlib draws lines between the wrong data points.
firsty_values and secondy_values are sorted lists of timestamps spanning one 24-hour interval.
firstx_values and secondx_values are values in the range 18-21.
The first plot shows the two series together, while the last plot shows one of the series alone.
#firsty_values and secondy_values look like this:
#['2019-05-04 00:00:03',
# '2019-05-04 00:02:03',
# ...
# '2019-05-04 23:56:03',
# '2019-05-04 23:58:02']
#firstx_values and secondx_values look like this:
#[18.32,18.34 ..... 19.32,19.31]
plt.plot(firsty_values,firstx_values,'b')
plt.plot(secondy_values, secondx_values, 'g')
plt.ylabel('Temperature [C]')
plt.xlabel('Time')
plt.legend(['SA1_563_04_RT601A', 'SA1_563_04_RT601B'])
plt.xticks([100,604,1053]) #length more than 1053
plt.show()
#plt.plot(firsty_values,firstx_values,'b')
plt.plot(secondy_values, secondx_values, 'g')
plt.ylabel('Temperature [C]')
plt.xlabel('Time')
plt.legend(['SA1_563_04_RT601A', 'SA1_563_04_RT601B'])
plt.xticks([100,604,1053]) #length less than 1053
plt.show()
Output with both data series:
Output with one data series:
The first plot draws lines between data points that do not lie next to each other. The problem seems to be that some of the data points from the second series are placed out of order, after the points from the first series. This is reflected by the "xticks" showing three labels when plotting both series and two labels when plotting one series.
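A likely cause, offered as a hedged guess: when the x values are strings, matplotlib treats them as categories and places them in order of first appearance, so any timestamp in the second series that did not occur in the first is appended at the right-hand end. Converting the strings to datetime objects before plotting gives a real time axis. The sketch below uses short made-up samples in the same string format; the variable names are hypothetical.

```python
from datetime import datetime
import matplotlib.pyplot as plt

# Hypothetical short samples in the same string format as the question
first_times = ["2019-05-04 00:00:03", "2019-05-04 00:02:03", "2019-05-04 00:04:03"]
first_temps = [18.32, 18.34, 18.35]
second_times = ["2019-05-04 00:01:03", "2019-05-04 00:03:03", "2019-05-04 00:05:03"]
second_temps = [19.10, 19.12, 19.11]

def to_datetimes(stamps):
    """Parse 'YYYY-MM-DD HH:MM:SS' strings into datetime objects."""
    return [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps]

first_dt = to_datetimes(first_times)
second_dt = to_datetimes(second_times)

fig, ax = plt.subplots()
ax.plot(first_dt, first_temps, "b", label="SA1_563_04_RT601A")
ax.plot(second_dt, second_temps, "g", label="SA1_563_04_RT601B")
ax.set_xlabel("Time")
ax.set_ylabel("Temperature [C]")
ax.legend()
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
# plt.show() would display the figure in an interactive session
```

With a datetime axis, integer tick positions such as plt.xticks([100,604,1053]) no longer refer to list indices, so the ticks would need to be given as datetime values or left to matplotlib.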
I am having an issue where the logarithmic function is behaving differently depending on the type of graph I use with the same data. When I generate the equation by hand, it returns the scatterplot linear trendline, but the slope function and linear graph produce a different trendline.
Linear vs Scatter
The equation for the scatter plot logarithmic line is:
y = -0.079ln(x) + 0.424
The equation for the linear plot trendline is:
y = -0.052ln(x) + 0.3138
I can generate the linear plot trendline slope using this equation:
=SLOPE(B2:B64,LN(A2:A64)) = -0.052
But using the general slope equation, I get the scatter plot trendline (using SQL):
SELECT SUM(multipliedresiduals) / SUM(xresidsquared)
FROM (
    SELECT *
        ,log(x.x) - l.avgx xresiduals
        ,x.y - l.avgy yresiduals
        ,power(log(x.x) - l.avgx, 2) xresidsquared
        ,((log(x.x) - l.avgx) * (x.y - l.avgy)) multipliedresiduals
    FROM ##logtest x
    CROSS JOIN (
        SELECT avg(log(x)) avgx
            ,avg(y) avgy
        FROM ##logtest l
    ) l
) z
-- result: -0.0789746757495071 (scatter plot slope)
What's going on? I'm mainly interested in replicating the linear plot trendline equation in SQL.
Here is the data:
https://docs.google.com/spreadsheets/d/1sOlyXaHnUcCuD9J28cKHnrhhcr2hvYSU1iCNWXcTqEA/edit?usp=sharing
Here is the Excel File:
https://www.dropbox.com/s/6hpd4bzvmbxe5pu/ScatterLinearTest.xlsx?dl=0
Line and Scatter graphs in Excel are quite different with regard to the x-axis. In a scatter graph, the x-axis represents actual values. In a line graph, the x-axis entries are only labels. If you compute a slope from a line graph, the x-axis takes the values 1, 2, 3, 4, ... no matter what the labels show (e.g. even if they show 7..69). With a scatter graph, the x-axis takes the value of the label.
In your case, the difference between the two slopes is explained by the line graph's x-axis values starting at 1 (even though the first point is labelled 7), while the scatter graph's x-axis values start at 7, the actual value.
So the real slope for the data you present, with "X" starting at a value of 7, is the slope you get from the scatter graph, which is the same one you are getting in your SQL.
For the SQL equation to match the linear plot trendline equation, you would need to replace the actual x-axis values with the series [1..n].
I don't have SQL, but the results of these two SLOPE formulas should clarify what I am writing:
Scatter plot: =SLOPE(B2:B64,LN(ROW(INDIRECT("7:69")))) returns -0.078974676
Line Plot: =SLOPE(B2:B64,LN(ROW(INDIRECT("1:63")))) returns -0.051735504
The first is the Scatter plot, the second is the Line plot
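The same effect can be reproduced outside Excel. The sketch below is an illustration with synthetic data, not the spreadsheet's values: y is generated from an exact log curve over x = 7..69 (the coefficients -0.08 and 0.42 are made up), and the least-squares slope is computed once against ln of the real x values and once against ln of the positions 1..n.

```python
import math

def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Synthetic stand-in for the spreadsheet: x runs 7..69 and y follows an
# exact log curve (the real y values are not reproduced here)
actual_x = list(range(7, 70))
ys = [-0.08 * math.log(x) + 0.42 for x in actual_x]

# Scatter chart: the trendline uses the real x values
scatter_slope = slope([math.log(x) for x in actual_x], ys)

# Line chart: the trendline uses the position 1..n of each point instead
positions = list(range(1, len(ys) + 1))
line_slope = slope([math.log(p) for p in positions], ys)

print(f"scatter-style slope: {scatter_slope:.6f}")
print(f"line-style slope:    {line_slope:.6f}")
```

The scatter-style slope recovers the coefficient used to generate the data, while the line-style slope is smaller in magnitude. In SQL, the line-chart version would presumably be obtained by taking the log of a row number (ordered by x) in place of log(x).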