Need help plotting 1 to n y-series with matplotlib

Users of my script should be able to plot graphs for 1 to n accounts (e.g. 1930, 1940, etc.), showing the sum for each account for every year.
The graph I want to plot should look like this (in this example, 2 accounts, 1930 and 1940, with the sum for each account for every year):
The input for the graph looks like this (the user should be able to choose as many accounts as they want, 1 to n):
How many accounts to print graphs for? 2
Account 1 :
1930
Account 2 :
1940
The system stores the accounts in an array (accounts = [1930,1940]) and looks up the sum for every account for every year. The years and the sums for the accounts are placed in a matrix ([[2008, 1, 12], [2009, 7, 30], [2010, 13, 48], [2011, 19, 66], [2012, 25, 84], [2013, 31, 102]]).
When this is done I want to plot 1 to n graphs (in this case 2 graphs), but I can't figure out how to plot with 1 to n accounts.
For the moment I just use this code to plot the graph, and it's completely static:
import matplotlib.pyplot as plt
# build the x series (the first column of the matrix holds the years)
x_years = []
for i in range(nrOfYearsInXML):
    x_years.append(matrix[i][0])
plt.xticks(x_years, [str(year) for year in x_years])
# build the y series; how to solve this if the user chooses 1 - n accounts?
konto1_summa = [1, 7, 13, 19, 25, 31]
konto2_summa = [12, 30, 48, 66, 84, 102]
plt.plot(x_years, konto1_summa, marker='o', label='1930')
plt.plot(x_years, konto2_summa, marker='o', label='1940')
plt.xlabel('Year')
plt.ylabel('Summa')
plt.title('Sum for account per year')
plt.legend()
plt.show()
I have tried for loops and so on, but I have not been able to make it work with 1 to n accounts and a unique label for each account.
My scenario is that the user chooses 1 to n accounts and specifies them (e.g. 1930, 1940, 1950, ...). The accounts are stored in an array. The system calculates the sum for each account for every year and places this data in the matrix. The system then reads from the accounts array and the matrix and plots 1 to n graphs, each with its account label.
A shorter version of the problem:
For example, say I have the x values (the years 2008-2013) and the y values (the sum for each account for every year) in a matrix, and the accounts (which should also be used as labels) in an array, like this:
accounts = [1930,1940]
matrix = [[2008, 1, 12], [2009, 7, 30], [2010, 13, 48], [2011, 19, 66], [2012, 25, 84], [2013, 31, 102]]
Or I can explain x and y like this:
x      y1 (1930, graph 1)   y2 (1940, graph 2)
2008   1                    12
2009   7                    30
2010   13                   48
etc.   etc.                 etc.
The problem for me is that the user can choose one to many accounts (accounts [1..n]), which results in one to many account graphs.
Any idea how to solve it? :)
BR/M

I don't quite understand what you are asking, but I think this is what you want:
import matplotlib.pyplot as plt
# set up axes
fig, ax = plt.subplots(1, 1)
ax.set_xlabel('xlab')
ax.set_ylabel('ylab')
# loop and plot one line per series
for j in range(n):
    x, y = get_data(j)  # whatever function you use to get your data
    lab = get_label(j)
    ax.plot(x, y, label=lab)
ax.legend()
plt.show()
More concretely, assuming you have the matrix structure you posted above:
# first, use numpy, you have it installed anyway if matplotlib is working
# and it will make your life much nicer
import numpy as np
data = np.array(data_list_of_lists)
x = data[:, 0]  # first column: the years
for j in range(n):
    y = data[:, j + 1]  # column j+1: sums for account j
    ax.plot(x, y, label=accounts[j])
A better way to do this is to store your data in a dict:
data_dict = {}
data_dict[1940] = (x_data_1940, y_data_1940)
data_dict[1930] = (x_data_1930, y_data_1930)
# ...
for k in accounts:
    x, y = data_dict[k]
    ax.plot(x, y, label=k)
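Putting the loop together with the accounts array and matrix from the question, a minimal sketch could look like this. The function name plot_accounts is made up for illustration, and it assumes each matrix row is [year, sum for account 1, ..., sum for account n] in the same order as the accounts list.

```python
import matplotlib.pyplot as plt


def plot_accounts(accounts, matrix):
    """Plot one line per account (hypothetical helper, not from the question).

    Assumes each row of `matrix` is [year, sum_1, ..., sum_n], with the sums
    in the same order as `accounts`.
    """
    # Transpose rows into columns: column 0 is the years, the rest are sums.
    columns = list(zip(*matrix))
    years = columns[0]

    fig, ax = plt.subplots()
    # zip pairs each account with its column, so any number of accounts works.
    for account, sums in zip(accounts, columns[1:]):
        ax.plot(years, sums, marker='o', label=str(account))

    ax.set_xticks(years)
    ax.set_xlabel('Year')
    ax.set_ylabel('Sum')
    ax.set_title('Sum for account per year')
    ax.legend()
    return fig, ax


accounts = [1930, 1940]
matrix = [[2008, 1, 12], [2009, 7, 30], [2010, 13, 48],
          [2011, 19, 66], [2012, 25, 84], [2013, 31, 102]]
fig, ax = plot_accounts(accounts, matrix)
```

Call plt.show() afterwards to display the figure. Adding a third account only requires a third entry in accounts and a fourth column in each matrix row.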

Related

TF broadcast along first axis

Say I have 2 tensors, one with shape (10, 1) and another with shape (10, 12, 1). I want to multiply them, broadcasting along the first axis rather than the last one, which is the default:
tf.zeros([10,1]) * tf.ones([10,12,1])
However, this does not work. Is there a way to do it without transposing with perm?

You cannot change the broadcasting rules, but you can avoid relying on them by expanding the tensor yourself. When the ranks differ, shapes are aligned starting from the last axis.
So instead of permuting the axes, you can also repeat along a new axis:
import tensorflow as tf
import einops as ops
a = tf.zeros([10, 1])
b = tf.ones([10, 12, 1])
c = ops.repeat(a, 'x z -> x y z', y=b.shape[1]) * b
c.shape
>>> TensorShape([10, 12, 1])

For the above example, you need to do tf.zeros([10,1])[...,None] * tf.ones([10,12,1]) to satisfy broadcasting rules: https://numpy.org/doc/stable/user/basics.broadcasting.html#general-broadcasting-rules
If you want to do this for arbitrary shapes, you can multiply by the transposed tensor, so that the last dimensions of both operands match and the broadcasting rules are satisfied, and then transpose again to get back the required output:
tf.transpose(a*tf.transpose(b))
Example,
a = tf.ones([10,])
b = tf.ones([10,11,12,13,1])
tf.transpose(b)
#[1, 13, 12, 11, 10]
(a*tf.transpose(b))
#[1, 13, 12, 11, 10]
tf.transpose(a*tf.transpose(b)) #Note a is [10,] not [10,1], otherwise you need to add transpose to a as well.
#[10, 11, 12, 13, 1]
Another approach is to expand the axes:
a = tf.ones([10])[(...,) + (len(b.shape) - 1) * (tf.newaxis,)]
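The shape alignment described in these answers can be checked without TensorFlow. The following is a small sketch in NumPy (swapped in here because the answer links to NumPy's broadcasting rules, which TensorFlow follows); the variable names are illustrative.

```python
import numpy as np

a = np.zeros((10, 1))
b = np.ones((10, 12, 1))

# Shapes are aligned from the right: (10, 1) is treated as (1, 10, 1),
# so the middle axis compares 10 against 12 and broadcasting fails.
try:
    a * b
    failed = False
except ValueError:
    failed = True

# Inserting a new middle axis gives (10, 1, 1), which broadcasts to (10, 12, 1).
c = a[:, np.newaxis, :] * b

# General case: reshape a 1-D vector so it lines up with the FIRST axis of b.
v = np.ones(10)
w = np.ones((10, 11, 12, 13, 1))
v_expanded = v.reshape((-1,) + (1,) * (w.ndim - 1))  # shape (10, 1, 1, 1, 1)
out = v_expanded * w
```

The reshape variant avoids both transposes and only needs the rank of the larger operand.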

How to do an advanced multiplication with panda dataframe

I have a dataframe1 of 1802 rows and 29 columns (in code as df) - each row is a person and each column is a number representing their answer to 29 different questions.
I have another dataframe2 of 29 different coefficients (in code as seg_1).
Each column needs to be multiplied by the corresponding coefficient and this needs to be repeated for each participant.
For example - 1802 iterations of q1 * coeff1, 1802 iterations of q2 * coeff2 etc
So I should end up with 1802 * 29 = 52,258
but the answer doesn't seem to be this length and also the answers aren't what I expect - I think the loop is multiplying q1-29 by coeff1, then repeating this for coeff2 but that's not what I need.
questions = range(0, 28)
co = range(0, 28)
segment_1 = []
for a in questions:
    for b in co:
        answer = df.iloc[:,a] * seg_1[b]
        segment_1.append([answer])

Proper encoding of the coefficients as a pandas frame makes this a one-liner:
df_person['Answer'] = (df_person * df_coeffs.values).sum(1)
This avoids slow for-loops. In addition, you don't need to hard-code the number of rows in the table (1802), so the code works unchanged even if your data grows.
For a minimal working example, see:
import pandas as pd
# answer frame
df_person = pd.DataFrame({'Question_1': [10, 20, 15], 'Question_2' : [4, 4, 2], 'Question_3' : [2, -2, 1]})
# coefficient frame: every row repeats the coefficients in seg_1
seg_1 = [2, 4, -1]
N = len(df_person)
df_coeffs = pd.DataFrame({'C_1': [seg_1[0]] * N, 'C_2' : [seg_1[1]] * N, 'C_3' : [seg_1[2]] * N})
# elementwise multiplication & row-wise summation
df_person['Answer'] = (df_person * df_coeffs.values).sum(1)
This gives the coefficient table df_coeffs and the answer table df_person (output tables not shown).
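If the per-question products themselves are needed (the 1802 x 29 values the question asks for), a possible sketch is to let pandas broadcast the coefficient list across rows, without building a repeated coefficient frame. The small frame and the names products and scores are illustrative, and the sketch assumes the coefficients are listed in the same order as the columns.

```python
import pandas as pd

# Small stand-in for the 1802 x 29 answer frame.
df = pd.DataFrame({'Question_1': [10, 20, 15],
                   'Question_2': [4, 4, 2],
                   'Question_3': [2, -2, 1]})
seg_1 = [2, 4, -1]  # one coefficient per question column, in column order

# Multiply every column by its own coefficient; the list is broadcast over rows.
products = df.mul(seg_1, axis='columns')

# Optional: row-wise sum, matching the 'Answer' column computed above.
scores = products.sum(axis=1)
```

Here products keeps the shape of df, so with the real data it would hold one value per person per question.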

Numpy Random Choice with Non-regular Array Size

I'm making an array of sums of random choices from a negative binomial distribution (nbd), with each sum being of non-regular length. Right now I implement it as follows:
import numpy as np
from numpy.random import default_rng
rng = default_rng()
nbd = rng.negative_binomial(1, 0.5, int(1e6))
gmc = [12, 35, 4, 67, 2]
n_pp = np.empty(len(gmc))
for i in range(len(gmc)):
    n_pp[i] = np.sum(rng.choice(nbd, gmc[i]))
This works, but when I perform it over my actual data it's very slow (gmc is of dimension 1e6), and I would like to vary this for multiple values of n and p in the nbd (in this example they're set to 1 and 0.5, respectively).
I'd like to work out a pythonic way to do this which eliminates the loop, but I'm not sure it's possible. I want to keep default_rng for the better random generation than the older way of doing it (np.random.choice), if possible.

The distribution of the sum of m samples from the negative binomial distribution with parameters (n, p) is the negative binomial distribution with parameters (m*n, p). So instead of summing random selections from a large, precomputed sample of negative_binomial(1, 0.5), you can generate your result directly with negative_binomial(gmc, 0.5):
In [68]: gmc = [12, 35, 4, 67, 2]
In [69]: npp = rng.negative_binomial(gmc, 0.5)
In [70]: npp
Out[70]: array([ 9, 34, 1, 72, 7])
(The negative_binomial method will broadcast its inputs, so we can pass gmc as an argument to generate all the samples with one call.)
More generally, if you want to vary the n that is used to generate nbd, you would multiply that n by the corresponding element in gmc and pass the product to rng.negative_binomial.
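The general case can be sketched as follows, assuming n and p are the parameters of the underlying distribution; the seed and the example values are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

n, p = 1, 0.5                        # parameters of the underlying distribution
gmc = np.array([12, 35, 4, 67, 2])   # number of draws summed per element

# The sum of m draws from NB(n, p) is distributed as NB(m * n, p), so each
# element is sampled directly; negative_binomial broadcasts over the array.
n_pp = rng.negative_binomial(n * gmc, p)
```

Note that this draws from the exact distribution of the sum, rather than resampling from a finite precomputed sample as the original loop does.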

How to plot outliers with regard to unique ids

I have item_code column in my data and another column, sales, which represents sales quantity for the particular item.
The data can contain a particular item id many times. Other columns tell these entries apart.
I want to plot only the outlier sales for each item (because data has thousands of different item ids, plotting every entry can be difficult).
Since I'm very new to this, what is the right way and tool to do this?

You can use pandas. You need to choose a method for detecting outliers, but here is an example:
If you want to get outliers across all sales (not per group), you can use apply with a function (here a lambda) to select the outlier rows.
import numpy as np
import pandas as pd
%matplotlib inline
df = pd.DataFrame({'item_id': [1, 1, 2, 1, 2, 1, 2],
                   'sales': [0, 2, 30, 3, 30, 30, 55]})
df[df.apply(lambda x: np.abs(x.sales - df.sales.mean()) / df.sales.std() > 1, 1)
   ].set_index('item_id').plot(style='.', color='red')
In this example we generate sample data and find the points whose sales differ from the mean by more than one standard deviation (you can try another method). Then we plot them, with sales on the y-axis and item id on the x-axis. This method detects the points 0 and 55. If you want to search for outliers within groups, you can group the data first:
df.groupby('item_id').apply(lambda data: data.loc[
    data.apply(lambda x: np.abs(x.sales - data.sales.mean()) / data.sales.std() > 1, 1)
]).set_index('item_id').plot(style='.', color='red')
In this example we get the points 30 and 55, because 0 isn't an outlier within the group where item_id = 1, but 30 is.
Is this what you want to do? I hope it helps you get started.
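With thousands of item ids, the row-wise apply can become slow. One possible alternative, sketched here under the same one-standard-deviation rule, computes the per-group mean and standard deviation with groupby and transform; the names z and outliers are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'item_id': [1, 1, 2, 1, 2, 1, 2],
                   'sales': [0, 2, 30, 3, 30, 30, 55]})

# transform returns a value for every row, aligned with df, holding the
# statistic of that row's own item_id group.
grouped = df.groupby('item_id')['sales']
z = (df['sales'] - grouped.transform('mean')) / grouped.transform('std')

# Keep rows more than one (sample) standard deviation from their group mean.
outliers = df[z.abs() > 1]
```

The result can then be plotted, for example with outliers.plot.scatter(x='item_id', y='sales', color='red'). Groups with a single row get a NaN standard deviation and are never flagged.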

Python plot_surface set Y limits to center at 0

I'm using cylindrical coordinates, created a meshgrid and tried to plot it using plot_surface. My full dataset doesn't get drawn properly because the scaling on the Y-axis is not correct.
I'm trying to plot the magnetic field values versus Z and P(rho) values. Z can be both negative and positive. P(rho) can only be positive.
The problem is that p (Y-axis) always runs from 0 to 10 (giving 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10), whereas I'd like it to run 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 so that it's centered at 0.
This is my example code. I have left out the calculations for the magnetic field, since they are not relevant to this issue.
fig = plt.figure()
ax = Axes3D(fig)
# Define the plane over which fields are computed.
# N must be odd to include the point (0,0).
M = 26 # No. of points along the rho axis.
N = 51 # No. of points along the z axis.
p1 = np.linspace(0, a, M)
p = np.concatenate([p1[::-1][:-1], p1]) # Make it symmetric.
z = np.linspace(-d, d, N)
p, z = np.meshgrid(p, z) # Create grid of (p,z).
# CALCULATE magnetic field bt. So assume this is done here...
bt = np.sqrt(np.power(bz, 2) + np.power(bp, 2))
ax.plot_surface(z, p, bt, cmap=plt.cm.YlGnBu_r)
plt.show()
So when I add the line
p = np.concatenate([p1[::-1][:-1], p1])
the dimensions no longer match and plot_surface rightfully complains. I expected this line to center the 0 in the middle of the array.
Thanks.
