How to randomly assign variables

In Stata, I want to create a new variable whose values are drawn with probabilities from a known distribution.
Say that the distribution pdf looks like:
Blue - .2
Red - .3
Green - .5
I can use code like the following to get the exact distribution as above. First, is there a quicker way to accomplish this?
gen Color = ""
replace Color = "Blue" if _n <= _N*.2
replace Color = "Red" if _n > _N*.2 & _n <= _N*.5
replace Color = "Green" if Color==""
To simulate random draws, I think I can do:
gen rand = runiform()
sort rand
gen Color = ""
replace Color = "Blue" if rand <= .2
replace Color = "Red" if rand > .2 & rand <= .5
replace Color = "Green" if Color==""
Is this technique best practice?

When producing the data, you could use the more efficient in qualifier instead of if. To be honest, though, I believe the data set would have to be very big for the time difference to be perceivable. You can experiment to check that.
The second issue, on random draws, is already addressed by a series of posts authored by Bill Gould (StataCorp's president). Some code follows, with inline comments. You can run the whole thing and check the results.
clear
set more off
*----- first question -----
/* create data with certain distribution */
set obs 100
set seed 23956
gen obs = _n
gen rand = runiform()
sort rand
gen Color = ""
/*
// original
replace Color = "Blue" if _n <= _N*.2
replace Color = "Red" if _n > _N*.2 & _n <= _N*.5
replace Color = "Green" if Color==""
*/
// using -in-
replace Color = "Blue" in 1/`=floor(_N*.2)'
replace Color = "Red" in `=floor(_N*.2) + 1'/`=floor(_N*.5)'
replace Color = "Green" in `=floor(_N*.5) + 1'/L
/*
// using -cond()-
gen Color = cond(_n <= _N*.2, "Blue", cond(_n > _N*.2 & _n <= _N*.5, "Red", "Green"))
*/
drop rand
sort obs
tempfile allobs
save "`allobs'"
tab Color
*----- second question -----
/* draw without replacement a random sample of 20
observations from a dataset of N observations */
set seed 89365
sort obs // for reproducibility
generate double u = runiform()
sort u
keep in 1/20
tab obs Color
/* If N>1,000, generate two random variables u1 and u2
in place of u, and substitute sort u1 u2 for sort u */
/* draw with replacement a random sample of 20
observations from a dataset of N observations */
clear
set seed 08236
drop _all
set obs 20
generate long obsno = floor(100*runiform()+1)
sort obsno
tempfile obstodraw
save "`obstodraw'"
use "`allobs'", clear
generate long obsno = _n
merge 1:m obsno using "`obstodraw'", keep(match) nogen
tab obs Color
These and other details can be found in the four-part series on random-number
generators, by Bill Gould: http://blog.stata.com/2012/10/24/using-statas-random-number-generators-part-4-details/
See also help sample!
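For readers who want to see the thresholding logic from the question outside Stata, here is a minimal Python sketch. It is only an illustration, not part of the Stata solution: the helper name draw_color and the sample size of 100,000 are assumptions made for the example. It maps a uniform draw to a color through cumulative cutoffs, in the same way the replace ... if rand <= ... lines do.

```python
import random
from bisect import bisect_left
from collections import Counter
from itertools import accumulate

colors = ["Blue", "Red", "Green"]
probs = [0.2, 0.3, 0.5]
cutoffs = list(accumulate(probs))  # cumulative probabilities: 0.2, 0.5, 1.0

def draw_color(u):
    """Map a uniform draw u in [0, 1) to a color (hypothetical helper).

    bisect_left mirrors the `rand <= cutoff` comparisons in the Stata code.
    """
    idx = bisect_left(cutoffs, u)
    return colors[min(idx, len(colors) - 1)]

random.seed(23956)  # fixed seed so the run is reproducible
draws = [draw_color(random.random()) for _ in range(100_000)]
shares = {color: count / len(draws) for color, count in Counter(draws).items()}
print(shares)
```

With random draws the shares only approximate .2/.3/.5, which is the difference between the second approach in the question and the first one, where the proportions are exact by construction.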

Related

How to calculate the number of scatterplot data points in a particular 'region' of the graph

As my question says, I'm trying to find a way to calculate the number of scatterplot data points (pink dots) in a particular 'region' of the graph, or on either side of the black lines/boundaries. I'm open to any ideas, as I don't even know where to start. Thank you!!
The code:
################################
############ GES ##############
################################
p = fits.open('GES_DR17.fits')
pfeh = p[1].data['Fe_H']
pmgfe = p[1].data['Mg_Fe']
pmnfe = p[1].data['Mn_Fe']
palfe = p[1].data['Al_Fe']
# Calculate [Mg/Mn]
pmgmn = pmgfe - pmnfe
ax1a.scatter(palfe, pmgmn, c='thistle', marker='.',alpha=0.8,s=500,edgecolors='black',lw=0.3, vmin=-2.5, vmax=0.65)
ax1a.plot([-1,-0.07],[0.25,0.25], c='black')
ax1a.plot([-0.07,1.0],[0.25,0.25], '--', c='black')
x = np.arange(-0.15,0.4,0.01)
ax1a.plot(x, 4.25*x+0.8875, c='black')
Let's call the two axes x and y. Any line in this plot can be written as
a*x + b*y + c = 0
for some values of a, b, c. If we plug a point with coordinates (x, y) into the left-hand side of the equation above, we get a positive value for all points on one side of the line and a negative value for the points on the other side. So if you have multiple regions delimited by lines, you can just check the signs. With this you can create a boolean mask for each region and count the number of Trues by using np.sum.
# assign the coordinates to the variables x and y as numpy arrays
x = ...
y = ...
line1 = a1*x + b1*y + c1
line2 = a2*x + b2*y + c2
mask = (line1 > 0) & (line2 < 0) # just an example, signs might vary
count = np.sum(mask)
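As a concrete sketch, the two boundaries drawn in the question (y = 0.25 and y = 4.25*x + 0.8875) can be written in this form. The function name upper_left_mask and the random test points are assumptions made for illustration; in practice x and y would be palfe and pmgmn. The sketch also treats the sloped boundary as an infinite line, although the plot only draws it for x between -0.15 and 0.4.

```python
import numpy as np

def upper_left_mask(x, y):
    """Boolean mask of points above y = 0.25 and above/left of y = 4.25*x + 0.8875."""
    # Horizontal boundary written as 0*x + 1*y - 0.25 = 0
    horizontal = 0.0 * x + 1.0 * y - 0.25
    # Sloped boundary written as 4.25*x - 1*y + 0.8875 = 0
    sloped = 4.25 * x - 1.0 * y + 0.8875
    # horizontal > 0 means y > 0.25; sloped < 0 means y > 4.25*x + 0.8875
    return (horizontal > 0) & (sloped < 0)

# Synthetic stand-ins for palfe and pmgmn (not real data)
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=1000)
y = rng.uniform(-0.5, 1.0, size=1000)

mask = upper_left_mask(x, y)
count = int(np.sum(mask))
print(count, "of", x.size, "points fall in the upper-left region")
```

The other regions follow by flipping the comparison signs. Comparisons involving NaN evaluate to False, so points with missing abundances are not counted in any region.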

Pinescript conditional line end/delete

I’m trying to create lines that auto plot at certain intervals using the line.new() function. So for example every new monthly open there will be lines plotted 5% and 10% above and beneath price. They’re then extended to the right indefinitely.
I then want to have the lines end once high/low has breached the line. I can’t seem to figure out how to do this using the line.delete() function, although I doubt this is the correct path to take anyway, since it deletes the entire line rather than just ending it at the breach point.
Because lines are extended indefinitely until price has breached them, there may be instances in which lines are never touched and are only removed once the 500 max line limit is reached. So I haven’t figured out a way to use array references for lines to find a solution, and the 40 plot limit for Pine plots isn’t really a sufficient number of lines.
If this isn’t possible, then just deleting the entire line upon breach is a backup option, but I haven’t figured out how to do this either!
Any help is much appreciated, thanks in advance!
You can use additional arrays to track the price values and their crossed state more easily. Each element of the arrays corresponds to the values associated with the same line. We add, remove or modify them based on whether a particular line's price value has been crossed or the line limit has been exceeded.
//@version=5
indicator("Monthly Interval Lines", overlay = true, max_lines_count = 500)
var float[] interval_prices = array.new_float()
var line[] interval_lines = array.new_line()
var bool[] intervals_crossed = array.new_bool()
new_month = timeframe.change("M")
// On each new month, add four lines at +5%, +10%, -5% and -10% of the open
if new_month
    array.unshift(interval_lines, line.new(x1 = bar_index, y1 = open * 1.05, x2 = bar_index + 1, y2 = open * 1.05, extend = extend.right, color = color.green))
    array.unshift(interval_prices, open * 1.05)
    array.unshift(intervals_crossed, false)
    array.unshift(interval_lines, line.new(x1 = bar_index, y1 = open * 1.10, x2 = bar_index + 1, y2 = open * 1.10, extend = extend.right, color = color.green))
    array.unshift(interval_prices, open * 1.10)
    array.unshift(intervals_crossed, false)
    array.unshift(interval_lines, line.new(x1 = bar_index, y1 = open * 0.95, x2 = bar_index + 1, y2 = open * 0.95, extend = extend.right, color = color.red))
    array.unshift(interval_prices, open * 0.95)
    array.unshift(intervals_crossed, false)
    array.unshift(interval_lines, line.new(x1 = bar_index, y1 = open * 0.90, x2 = bar_index + 1, y2 = open * 0.90, extend = extend.right, color = color.red))
    array.unshift(interval_prices, open * 0.90)
    array.unshift(intervals_crossed, false)
size = array.size(intervals_crossed)
if size > 0
    // Drop the oldest entries once the line limit is exceeded
    if size > 500
        for i = size - 1 to 500
            line.delete(array.pop(interval_lines))
            array.pop(interval_prices) // keep the price array in step with the other two
            array.pop(intervals_crossed)
        size := array.size(intervals_crossed)
    // End each line at the first bar that crosses or gaps over its price
    for i = 0 to size - 1
        price_val = array.get(interval_prices, i)
        already_crossed = array.get(intervals_crossed, i)
        crossed_price_val = low < price_val and high > price_val
        gapped_price_val = (close[1] < price_val and open > price_val) or (close[1] > price_val and open < price_val)
        if not already_crossed and (crossed_price_val or gapped_price_val)
            array.set(intervals_crossed, i, true)
            interval_line = array.get(interval_lines, i)
            line.set_extend(interval_line, extend.none)
            line.set_x2(interval_line, bar_index)

Can't get dimensions of arrays equal to plot with MatPlotLib

I am trying to create a plot of arrays where one is calculated based on my x-axis calculated in a for loop. I've gone through my code multiple times and tested in between what exactly the lengths are for my arrays, but I can't seem to think of a solution that makes them equal length.
This is the code I have started with:
import numpy as np
import matplotlib.pyplot as plt
a = 1 ;b = 2 ;c = 3; d = 1; e = 2
t0 = 0
t_end = 10
dt = 0.05
t = np.arange(t0, t_end, dt)
n = len(t)
fout = 1
M = 1
Ca = np.zeros(n)
# Cb and Cc were not allocated in the original post; assumed to mirror Ca
Cb = np.zeros(n)
Cc = np.zeros(n)
Ca[0] = a; Cb[0] = b
Cc[0] = 0;
k1 = 1
def rA(Ca, Cb, Cc, t):
    return -k1 * Ca**a * Cb**b * dt
while e > 1e-3:
    t = np.arange(t0, t_end, dt)
    n = len(t)
    for i in range(1,n-1):
        Ca[i+1] = Ca[i] + rA(Ca[i], Cb[i], Cc[i], t[i])
    e = abs((M-Ca[n-1])/M)
    M = Ca[n-1]
    dt = dt/2
plt.plot(t, Ca)
plt.grid()
plt.show()
Afterwards, I try to calculate a second function for different y-values. Within the for loop I added:
Cb[i+1] = Cb[i] + rB(Ca[i], Cb[i], Cc[i], t[i])
I also defined rB in a similar manner to rA. The error I received at this point is:
IndexError: index 200 is out of bounds for axis 0 with size 200
I feel like it has to do with the way I'm initializing the arrays for my Ca. In MATLAB, which I'm more familiar with, the initialization looks like this:
Ca = zeros(1,n)
I have recreated the code I have written here in MATLAB and I do get a plot, so I'm wondering where I am going wrong here.
I thought my best course of action was to fix n to an integer by changing it inside the while loop, but after changing n = len(t) to n = 100 I received the following error message:
ValueError: x and y must have same first dimension, but have shapes (200,) and (400,)
My previous question turned out to be something trivial that I kept missing, and I feel like this is the same. But I have spent over an hour looking and trying fixes without success.
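Judging from the code shown, one likely cause is that Ca is allocated once with the initial n, while t is rebuilt inside the while loop after dt has been halved. On the second pass n has doubled to 400, but Ca still has 200 elements, which would explain both the IndexError and the shape mismatch in plt.plot. The loop also starts at i = 1, so Ca[1] is never computed from Ca[0]. Below is a minimal sketch that re-allocates the arrays for every step size. The function name integrate, the 1:1 consumption of B, and the cap of 15 refinements are assumptions made for illustration, not taken from the question.

```python
import numpy as np

k1 = 1.0
a_order, b_order = 1, 2      # reaction orders, from a = 1 and b = 2 in the question
Ca0, Cb0 = 1.0, 2.0          # initial concentrations, from Ca[0] = a and Cb[0] = b
t0, t_end = 0.0, 10.0

def integrate(dt):
    """Explicit Euler with step dt; every array is sized to match t."""
    t = np.arange(t0, t_end, dt)
    n = len(t)
    Ca = np.zeros(n)
    Cb = np.zeros(n)
    Ca[0], Cb[0] = Ca0, Cb0
    for i in range(n - 1):   # start at 0 so index 1 is computed from index 0
        rate = k1 * Ca[i]**a_order * Cb[i]**b_order
        Ca[i + 1] = Ca[i] - rate * dt
        Cb[i + 1] = Cb[i] - rate * dt   # assumed: B is consumed 1:1 with A
    return t, Ca, Cb

dt = 0.05
previous_final = None
for _ in range(15):          # cap on refinements so the loop always terminates
    t, Ca, Cb = integrate(dt)
    if previous_final is not None:
        rel_change = abs((previous_final - Ca[-1]) / previous_final)
        if rel_change < 1e-3:
            break
    previous_final = Ca[-1]
    dt /= 2

print(len(t), len(Ca), dt)
```

Because t, Ca and Cb always come from the same call, plt.plot(t, Ca) receives arrays of equal length. In MATLAB, assigning past the end of an array grows it silently, which may be why the same logic produced a plot there.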

SSRS Color Gradient

I've been able to figure out how to make certain values the certain colors I would like. However, I'd really like to be able to create a color gradient so that it's more of a gradual change between each value.
0 = white
from white to green between 1 and 15,
gradient from green to yellow between 16 and 25,
and gradient from yellow to red between 26 and 35,
anything above 35 is red.
This is the code I have in the background fill expression:
=SWITCH(
(Sum(Fields!Total_Transaction_Count.Value) / CountDistinct(Fields!TransUserNumber.Value)) = 0, "White",
((Sum(Fields!Total_Transaction_Count.Value) / CountDistinct(Fields!TransUserNumber.Value)) >= 1 and
(Sum(Fields!Total_Transaction_Count.Value) / CountDistinct(Fields!TransUserNumber.Value)) <= 15), "Green",
((Sum(Fields!Total_Transaction_Count.Value) / CountDistinct(Fields!TransUserNumber.Value)) >= 16 and
(Sum(Fields!Total_Transaction_Count.Value) / CountDistinct(Fields!TransUserNumber.Value)) <= 25), "Yellow",
((Sum(Fields!Total_Transaction_Count.Value) / CountDistinct(Fields!TransUserNumber.Value)) >= 26 and
(Sum(Fields!Total_Transaction_Count.Value) / CountDistinct(Fields!TransUserNumber.Value)) <= 35), "Orange",
(Sum(Fields!Total_Transaction_Count.Value) / CountDistinct(Fields!TransUserNumber.Value)) > 35, "Red")
This is the matrix I have so far
Take a look at this answer I wrote a while back: "Applying Custom Gradient to Chart Based on Count of Value". It's for a chart, but the principle should be the same.
The basic idea is to calculate the colour in SQL and then use that to set the color properties in SSRS.
Keeping it all in SSRS
If you want to keep this within the report you could write a function to do the calculation.
For a very simple red gradient, it might look something like this:
Public Function CalcRGB (minVal as double, maxVal as double, actualVal as double) as String
    Dim RedValue as integer
    Dim GreenValue as integer
    Dim BlueValue as integer
    ' Scale to 0-255 so the result always fits in two hex digits
    RedValue = ((actualVal - minVal) / (maxVal - minVal)) * 255
    GreenValue = 0
    BlueValue = 0
    Dim result as string
    result = "#" & RedValue.ToString("X2") & GreenValue.ToString("X2") & BlueValue.ToString("X2")
    Return result
End Function
In this function I have set green and blue to 0 but these could be calculated too based on requirements.
To use this function as a background colour, set the Background Color property to something like
=Code.CalcRGB(
MIN(Fields!myColumn.Value),
MAX(Fields!myColumn.Value),
Fields!myColumn.Value
)
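For the multi-stop gradient described in the question (white, then green, yellow and red), the same calculation can be done piecewise. The sketch below is in Python purely to show the arithmetic; it is not SSRS code and would need to be ported to SQL or to a VB function like the one above. The stop positions (0, 15, 25, 35) are read from the question, while the RGB value used for green and the names STOPS, to_hex and gradient_hex are assumptions.

```python
# (value, (R, G, B)) stops; colors between stops are linearly interpolated
STOPS = [
    (0, (255, 255, 255)),   # white
    (15, (0, 128, 0)),      # green (assumed RGB for the named color "Green")
    (25, (255, 255, 0)),    # yellow
    (35, (255, 0, 0)),      # red
]

def to_hex(rgb):
    """Format an (R, G, B) tuple as a #RRGGBB string."""
    return "#{:02X}{:02X}{:02X}".format(*rgb)

def gradient_hex(value):
    """Return the interpolated #RRGGBB color for value, clamped to the end stops."""
    if value <= STOPS[0][0]:
        return to_hex(STOPS[0][1])
    for (lo, c_lo), (hi, c_hi) in zip(STOPS, STOPS[1:]):
        if value <= hi:
            frac = (value - lo) / (hi - lo)
            rgb = tuple(round(a + (b - a) * frac) for a, b in zip(c_lo, c_hi))
            return to_hex(rgb)
    return to_hex(STOPS[-1][1])   # anything above the last stop stays red

for v in (0, 8, 15, 20, 25, 30, 35, 50):
    print(v, gradient_hex(v))
```

Each channel is interpolated separately and rounded to an integer in the 0-255 range, so every result is a valid six-digit hex color that the background color property can accept.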

Gnuplot for loop with continuous variables

I have a plot with many objects and labels, so I wanted to simplify the script using loops. But I don't know how to address the variables. I define the variables as follows:
V1 = 10
V2 = 20
V3 = 23
...
LABEL1 = a
LABEL2 = b
...
The loop should look something like this:
set for [i=1:15] label i at V(i),y_label LABEL(i)
This notation leads to errors when running the script. Is it possible at all to define such a loop in gnuplot? If so, how can I do it?
Thanks for your help!
You can define a function which formats the label-definition as string and use a do loop to evaluate the strings:
y_label = 0
V1 = 10
V2 = 20
V3 = 23
LABEL1 = "a"
LABEL2 = "b"
LABEL3 = "c"
do for [i=1:3] {
    eval(sprintf('set label %d at V%d,y_label LABEL%d', i, i, i))
}
Alternatively, you can use two strings of whitespace-separated words for the iteration:
V = "10 20 23"
LABEL = "a b c"
set for [i=1:words(V)] label i at word(V, i),y_label word(LABEL, i)
Note that gnuplot 5.0 also has some limited support for using quote marks to hold several words together as one item:
V = "10 20 25"
LABEL = "'first label' 'second label' 'third one'"
set for [i=1:words(V)] label i at word(V, i),y_label word(LABEL, i)