HAC: Children in a dendrogram - hierarchical-clustering

I ran the HAC on a dataset containing 10 samples as follows:
X_train
>>array([[ 0.97699105, 0.22532681],
[-0.73247801, 0.60953553],
[-0.99434933, 0.03124842],
[-0.82325963, 0.57988328],
[ 0.50084964, -0.26616097],
[ 1.94969804, 0.42602413],
[ 1.0254459 , -0.54057545],
[-0.57115945, 0.8495053 ],
[ 1.39201222, -0.34835877],
[ 0.02372729, 0.52339387]])
Here is the result I get by applying HAC using the scipy library:
linkage(X_train, method='single')
>>array([[ 1. , 3. , 0.09550162, 2. ],
[ 7. , 10. , 0.2891525 , 3. ],
[ 6. , 8. , 0.41390592, 2. ],
[ 2. , 11. , 0.57469287, 4. ],
[ 4. , 12. , 0.59203425, 3. ],
[ 9. , 13. , 0.67840909, 5. ],
[ 0. , 14. , 0.6843032 , 4. ],
[15. , 16. , 0.92251969, 9. ],
[ 5. , 17. , 0.95429679, 10. ]])
Here is the resulting dendrogram
dendrogram(linkage(X_train, method='single'), labels=np.arange(X_train.shape[0]))
In the output matrix of the linkage(X_train, method='single'), the first two columns represent the children in our hierarchy.
How are these children indices determined?
For example:
The first merge joins the singleton clusters {1} and {3}, so the children are [1, 3].
The second merge joins the previously formed cluster {1, 3} with the singleton cluster {7}, and the children are [7, 10]. Where does the value 10 come from?

According to the docs, at the i-th iteration, clusters with indices Z[i, 0] and Z[i, 1] are combined to form cluster n+i, where n is the number of input samples and Z is the linkage matrix. https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html.
Thus 10 is just 10 + 0, where 10 is the total number of points (n) and 0 is the row of Z in which that cluster was formed.
In other words, all cluster indices i>=n actually refer to the cluster formed in Z[i - n].
If that's still unclear you can read the detailed description here https://joernhees.de/blog/2015/08/26/scipy-hierarchical-clustering-and-dendrogram-tutorial/#Perform-the-Hierarchical-Clustering
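To make the indexing rule concrete, here is a minimal Python sketch that expands each cluster index in the linkage matrix above into the original points it contains. The helper name members is made up for illustration and is not part of scipy; Z is copied by hand from the output in the question.

```python
# Linkage matrix copied from the question: [child_a, child_b, distance, size].
Z = [
    [1, 3, 0.09550162, 2],
    [7, 10, 0.2891525, 3],
    [6, 8, 0.41390592, 2],
    [2, 11, 0.57469287, 4],
    [4, 12, 0.59203425, 3],
    [9, 13, 0.67840909, 5],
    [0, 14, 0.6843032, 4],
    [15, 16, 0.92251969, 9],
    [5, 17, 0.95429679, 10],
]
n = len(Z) + 1  # n samples always produce n - 1 merges, so n = 10 here


def members(cluster_id):
    """Return the original sample indices contained in a cluster (hypothetical helper)."""
    if cluster_id < n:
        # Indices below n are the original samples themselves.
        return [cluster_id]
    # Indices >= n refer to the cluster formed in row Z[cluster_id - n].
    left, right = Z[cluster_id - n][:2]
    return sorted(members(int(left)) + members(int(right)))


for i in range(len(Z)):
    print(n + i, members(n + i))
```

The first two lines printed are cluster 10 with members [1, 3] and cluster 11 with members [1, 3, 7], which matches the second row [7, 10]: singleton 7 merged with the cluster created in row 0.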

Related

Value of local variables in a function seems not be released post function calling in Red/Rebol language

I wrote a function named find-all that recursively finds all indexes of a given item in a series.
The first call to find-all gives the right output. From the second call onward, however, the outputs are all appended together.
find-all: function [series found][
result: []
either any [empty? series none? s-found: find series found]
[result]
[append result index? s-found
find-all next s-found found]
]
;; test:
probe find-all "abcbd" "b" ;; output [2 4] as expected
probe find-all [1 2 3 2 1] 2 ;; output [2 4 2 4]
Since variables inside a function created with function are local, why is the value of result still there in later function calls, so that the second call of find-all does not start from []?
And what is the correct recursive way to write this function?
The answer is evident if you inspect find-all after making these two calls:
>> ?? find-all
find-all: func [series found /local result s-found][
result: [2 4 2 4]
either any [empty? series none? s-found: find series found]
[result]
[append result index? s-found
find-all next s-found found
]
]
result is an indirect value, and its data buffer is stored on a heap. The data gets preserved between the calls and accumulated, because you do not re-create it with copy — result being local to function's context is unrelated to that.
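For readers who know Python better than Red, a loosely analogous pitfall is the mutable default argument: the list is created once and then shared by every call, much as the literal block in the function body is here. This is an analogy only, not a description of Red's internals, and the function names below are made up for illustration.

```python
def find_all_shared(seq, item, result=[]):
    # The default list is built once, when the function is defined,
    # so every call that omits `result` appends to the same object.
    for i, x in enumerate(seq, start=1):  # 1-based positions, as in the Red example
        if x == item:
            result.append(i)
    return result


def find_all_fresh(seq, item):
    result = []  # a new list is created on every call
    for i, x in enumerate(seq, start=1):
        if x == item:
            result.append(i)
    return result


print(find_all_shared("abcbd", "b"))        # [2, 4]
print(find_all_shared([1, 2, 3, 2, 1], 2))  # [2, 4, 2, 4]
print(find_all_fresh("abcbd", "b"))         # [2, 4]
print(find_all_fresh([1, 2, 3, 2, 1], 2))   # [2, 4]
```

Creating the container inside the call plays roughly the same role as copy [] does in Red.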
Thanks to #9214's help, especially the explanation of indirect values, I came up with this solution:
find-all: function [series found][
either any [empty? series none? s-found: find series found]
[[]]
[append
reduce [index? s-found]
find-all next s-found found
]
]
;; test:
probe find-all "abcbd" "b" ;; output [2 4] as expected
probe find-all [1 2 3 2 1] 2 ;; output [2 4] as expected

Why there are negative values coming up when I create a random array - "np.random.rand"

Why are there negative numbers when I create the array below?
a12 = np.random.randn(3, 5)
a12
Output:
array([[-1.43586215, 1.16316375, 0.01023306, -0.98150865, 0.46210347],
[ 0.1990597 , -0.60021688, 0.06980208, -0.3853136 , 0.11351735],
[ 0.66213067, 1.58601682, -1.2378155 , 2.13303337, -1.9520878 ]])
np.random.randn() draws samples from the standard normal distribution, N(0, 1). Passing in dimensions returns an array of that shape, so np.random.randn(3, 5) returns an array of shape (3, 5) whose elements are all drawn from the standard normal distribution. Negative values are therefore expected, and in fact any real number can occur. If you want values in [0, 1), use np.random.rand, which samples from a uniform distribution instead.
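A short sketch of the difference between the two similarly named functions, assuming NumPy is installed. The seed and the sample size of 10,000 are arbitrary choices made only for illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so that reruns give the same numbers

uniform = np.random.rand(3, 5)   # uniform on [0, 1): never negative
normal = np.random.randn(3, 5)   # standard normal N(0, 1): can be negative

print(uniform.min() >= 0)        # True: rand never returns a negative value

# With a larger sample, roughly half of the standard normal draws are negative.
big = np.random.randn(10_000)
print((big < 0).mean())
```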

Move and Resize Subplots in Octave

I'd like to move and resize four subplots in Octave so that they are bigger with less white space between them. The minimal code below only moves and resizes the first subplot (221) whilst leaving the remaining three untouched.
sp_hand1 = subplot(221);plot(sinewave(20,20)) ;
set( sp_hand1 , 'OuterPosition' , [ -0.11 , 0.4 , 0.6 , 0.65 ] ) ;
sp_hand2 = subplot(222);plot(sinewave(20,20)) ;
set( sp_hand2 , 'OuterPosition' , [ -0.11 , 0.4 , 0.6 , 0.65 ] ) ;
sp_hand3 = subplot(223);plot(sinewave(20,20)) ;
set( sp_hand3 , 'OuterPosition' , [ -0.11 , 0 , 0.6 , 0.65 ] ) ;
sp_hand4 = subplot(224);plot(sinewave(20,20)) ;
set( sp_hand4 , 'OuterPosition' , [ -0.11 , 0 , 0.6 , 0.65 ] ) ;
How can I resize them all to be the same size and moved appropriately?
When I faced a similar issue during my thesis, I found that the solution that worked best for me was to use axes directly rather than subplots, and specify position. Some manual adjustment may be unavoidable in the beginning, but it's typically pretty straightforward, and can be automated easily for predictable graph placements, especially if the figure size is pre-specified too.
E.g.
h1 = axes('position', [0.04, 0.54, 0.45, 0.45]); plot( sinewave( 20, 20 ) );
h2 = axes('position', [0.54, 0.54, 0.45, 0.45]); plot( sinewave( 20, 20 ) );
h3 = axes('position', [0.04, 0.04, 0.45, 0.45]); plot( sinewave( 20, 20 ) );
h4 = axes('position', [0.54, 0.04, 0.45, 0.45]); plot( sinewave( 20, 20 ) );
In theory subplots and independent axes should behave more or less the same; the big difference being that in case of overlap, subplot deletes the overlapped plot, whereas axes overlaps happily. This would include 'invisible' overlaps.
I am not 100% sure if there is a way to obtain the same result using 'outerposition', but for me outerposition tends to behave a bit oddly, and I've always managed to get the desired results with 'position' directly, so I've never had a need for it.
I have also found that often plotting more things or changing other aspects of the plot resets some axes properties, so such size adjustments are best done as the last step for each axes object.

How do I create a randomly distributed boolean variable for a <breed> that will change in the model?

I am writing a model with two breeds:
sexworkers and officers
where sexworkers have a boolean variable that is assigned randomly during setup and then changes during go according to the behavior of, and interaction with, officers.
I use sexworkers-own [ trust? ]
in the preamble, but I am not sure how to distribute the true/false values of the variable randomly across the sexworkers population. I would really appreciate any input!
Thank you so much!
If I understand your question correctly, you just want each sexworker to randomly choose between true and false for the trust? variable on setup. If that's right, then one-of may do the trick for you. For an example, run this simple setup:
breed [ sexworkers sexworker ]
sexworkers-own [ trust? ]
to setup
ca
create-sexworkers 1000 [
set trust? one-of [ true false ]
]
print word "% Trusting: " ( ( count sexworkers with [ trust? ] ) /
count sexworkers * 100 )
reset-ticks
end
If you're looking for some kind of uneven distribution you can do simple ones using the random or random-float primitives. For example, if I want 25% of the sexworkers to start with trust? = true, I can do something like:
to setup-2
ca
create-sexworkers 1000 [
ifelse random-float 1 < 0.25 [
set trust? true
] [
set trust? false
]
]
print word "% Trusting: " ( ( count sexworkers with [ trust? ] ) /
count sexworkers * 100 )
reset-ticks
end
For specific distributions, have a look at the various random reporters
For weighted randomness, have a look at the rnd extension

Set a dynamic stride for tf.nn.conv2d layer

I want to pass a layer of size, say, 9 x 1 through a kernel of size, say, 2 x 1.
Now what I want to do is convolve the following values together ->
1 and 2, 2 and 3, 4 and 5, 5 and 6, 7 and 8, 8 and 9
and then, of course, pad it.
As this example shows, I am trying to make the stride along the width dimension follow the pattern ->
1, 2, 1, 2, 1, 2, ...
and after every '1' I want to pad it so that the final size doesn't change.
Put simply, I want to slice the main matrix into smaller matrices along a dimension, pass each of them separately through conv2d layers, pad them, and then concatenate them again along the same dimension, but I want to do all this without actually cutting it up. I hope it is clear what I am asking. Is it possible?
Edit: Sorry, I should have mentioned this: I am using the TensorFlow library and I am talking about the tf.nn.conv2d function.