Suppose I have the following three factors:
Factor A: 6 possible values
Factor B: 6 possible values
Factor C: 4 possible values

How can I construct an orthogonal array for these factors? Which method should I follow to create the orthogonal array table?
I already tried Minitab, but it did not work out the way I wanted.

Related

How to sample rows from a table with a specific probability?

I'm using BigQuery at my new position, and I'm totally new to SQL/BigQuery.
I'm testing a machine learning model and monitoring an A/B test with an unequal split, e.g., 3 vs. 10. To compare the A/B results, e.g., the number of page views, I want to make the group sizes equal first so that I can compare them easily. For example, say we have a table with 13 records (3 from A and 10 from B), and each row contains an id field. What I want to do is extract only 3 samples out of the 10 in B, to match the sample count of A.
I'm trying to use the FARM_FINGERPRINT function to map fields to integers. Then I'm taking ABS and calculating MOD to map the integers into a specific range, e.g., [0, 10). Eventually, I would like to keep 3 out of every 10 items using the following line:
MOD(ABS(FARM_FINGERPRINT(field)), 10) < 3
However, I found that even when I run A/B with exactly the same ML model and only a different A/B ratio, the results differ between A and B (they should be the same, since A and B are running the same ML model with just a different ratio). This made me suspect that the above implementation may introduce biased sampling. I also read this post and confirmed that FARM_FINGERPRINT might not produce a uniformly distributed result.
*There's a critical reason why I cannot simply scale B by 3/10; it is confidential and I cannot disclose it here.
Is there a better way to accomplish equally distributed sampling?
Thank you in advance. (I'm sorry if the question is vague; I'm hiding the confidential parts.)
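One common alternative to hash-mod filtering is to downsample the larger group to the size of the smaller one with a seeded random draw. Here is a minimal sketch in Python with toy data (the `id` and `group` fields are stand-ins for the real, confidential schema):

```python
import random

# Toy data: 3 rows in group A, 10 in group B, mirroring the 3-vs-10 example.
rows = ([{"id": i, "group": "A"} for i in range(3)]
        + [{"id": i, "group": "B"} for i in range(3, 13)])

a_rows = [r for r in rows if r["group"] == "A"]
b_rows = [r for r in rows if r["group"] == "B"]

# Downsample B to the size of A. A seeded RNG makes the draw reproducible,
# and, unlike hashing a possibly skewed field, it is unbiased by construction.
rng = random.Random(42)
b_sample = rng.sample(b_rows, k=len(a_rows))

balanced = a_rows + b_sample  # 3 rows of A plus 3 randomly chosen rows of B
```

In BigQuery itself, the analogous pattern is to rank rows per group with `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY RAND())` and keep only rows whose rank is at most the size of the smaller group; note that `RAND()` is not reproducible across runs, so materialize the sample if you need to reuse it.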

Extra Row in Dataframe

I am attempting to create a data frame from two vectors.
Each vector contains SIX elements, which were the products of previous steps. Here they are:
CLsummary<-c(MaxCL, MinCL, MeanCL, MedianCL, RangeCL, SDCL)
PRsummary<-c(MaxPR, MinPR, MeanPR, MedianPR, RangePR, SDPR)
But when I create the data frame, like below, I get SEVEN rows of data:
FHsummary<-data.frame(CLsummary, PRsummary)
Specifically, the fifth row seems to be a duplicate of the MinCL and MinPR data.
What am I doing wrong?
Thank you!
When I replicate your code with the objects in the vectors replaced by strings, the final data frame has 6 rows, as expected. You should take a look at the objects that make up your vectors. What type are they? How were they made? My guess is that one of the objects in each vector is itself a vector with two elements.

pandas dataframe for matrix of values

I have 3 things:
A time series: a 1D array of a certain length.
A matrix of stellar flux values with the same column length as the time series (each star in the field was observed according to the time array), but ~3000 rows deep, as there are ~3000 observed stars in this field.
An array of ~3000 star IDs to go with the ~3000 time-series flux recordings mentioned above.
I'm trying to turn all of this into a pandas.DataFrame so I can extract time-series features using the 'tsfresh' module. Link here.
Does anyone have an idea of how to do this? It should read somewhat like a table with a row of IDs as headers, a column of time values, and ~3000 columns of flux values for the stars.
I've seen examples of this being done on the page I linked, i.e. with multiple 'value' columns (in this case they would be flux values), but no indication of how to construct them.
This data frame will then be used for machine learning if that makes any difference.
Many thanks for any help that can be offered!
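tsfresh generally wants time series in a "long" format: one row per (id, time) pair, with a single value column, rather than one column per star. Here is a minimal sketch with toy sizes (4 stars, 5 time stamps) standing in for the real ~3000-star data:

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the real data (hypothetical sizes).
n_stars, n_times = 4, 5
time = np.linspace(0.0, 1.0, n_times)    # the 1D time-series array
flux = np.random.rand(n_stars, n_times)  # one row of flux values per star
star_ids = np.array([f"star_{i}" for i in range(n_stars)])

# Long format: one row per (star, time) pair with a single value column.
df = pd.DataFrame({
    "id": np.repeat(star_ids, n_times),  # each id repeated once per time stamp
    "time": np.tile(time, n_stars),      # the time axis repeated once per star
    "flux": flux.ravel(),                # row-major flatten matches the repeats
})
```

With this layout you can call tsfresh's `extract_features(df, column_id="id", column_sort="time")`; the column names themselves are arbitrary as long as you pass them to the corresponding parameters.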

How is insertion for a Singly Linked List and Doubly Linked List constant time?

Thinking about it, I assumed the time complexity of insertion and search should be the same for any data structure, because to insert you first have to search for the location where you want to insert, and then do the insertion.
According to here: http://bigocheatsheet.com/, for a linked list, search is linear time but insertion is constant time. I understand how searching is linear (start from the front, then keep going through the nodes on the linked list one after another until you find what you are searching for), but how is insertion constant time?
Suppose I have this linked list:
1 -> 5 -> 8 -> 10 -> 8
and I want to insert the number 2 after the number 8, wouldn't I have to first search for the number 8 (search is linear time) and then take an extra two steps to insert it (so insertion is still linear time)?
# insert y after x in Python
def insert_after(x, y):
    search_for(x)  # find the node x after which to insert
    y.next = x.next
    x.next = y
Edit: Even for a doubly linked list, shouldn't it still have to search for the node first (which is linear time), and then insert?
So if you already have a reference to the node after which you want to insert, then it is O(1). Otherwise, it is search_time + O(1). It is a bit misleading, but on Wikipedia there is a chart that explains it a bit better:
Contrast this with a dynamic array, where inserting at the beginning is Θ(n).
Just for emphasis: The website you reference is referring to the actual act of inserting given we already know where we want to insert.
Time to insert = time to set a constant number of pointers = O(1) = constant time.
The time to insert the data is not the same as the time to find a particular location and then insert the data there. The quoted time is the time to insert only.
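To make the distinction concrete, here is a minimal sketch (not the asker's exact code) that separates the linear-time search from the constant-time pointer rewiring:

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

def find(head, value):
    """O(n): must walk the list node by node."""
    while head is not None and head.value != value:
        head = head.next
    return head

def insert_after(node, new_node):
    """O(1): rewires two pointers, no traversal at all."""
    new_node.next = node.next
    node.next = new_node

# Build 1 -> 5 -> 8 -> 10
head = Node(1)
head.next = Node(5)
head.next.next = Node(8)
head.next.next.next = Node(10)

node8 = find(head, 8)         # linear-time search...
insert_after(node8, Node(2))  # ...then constant-time insertion
# List is now 1 -> 5 -> 8 -> 2 -> 10
```

The complexity tables quote only the cost of `insert_after`, which touches two pointers regardless of the list's length; any search needed to obtain `node8` is accounted for separately.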

How to use a look up table in MATLAB

I need to perform an exponential operation of two parameters (one is t; the other comes from the arrays) on a set of 2D arrays (a 3D matrix, if you like).
f(t,x) = exp(t-x)
And then I need to add up the results over every value in the 3rd dimension. Because performing the entire operation with bsxfun takes too much time, I was thinking of using a look-up table.
I can create the table as a matrix LUT (2-dimensional because of the two parameters) and retrieve values using LUT(par1,par2). But accessing the 3rd dimension with a loop is expensive too.
My question is: is there a way to implement such a mechanism (a look-up table) with predefined values, and then access them by indexing with the matrix elements, without loops? Or, how can I create a look-up table that MATLAB handles automatically to speed up the exponential operation?
EDIT:
I actually used similar methods to create the LUT. Now my problem is how to access it in an efficient way.
Let's say I have a 2-dimensional array M whose values I want to pass to the function f(t,M(i,j)) for a fixed value of t. I can use a loop to go through all the entries (i,j) of M, but I want a faster way of doing it, because I have a set of M's and need to apply this procedure to all of them.
My function is a little more complex than the example I gave:
pr = mean(exp(-bsxfun(@rdivide,bsxfun(@minus,color_vals,double(I)).^2,m)./2),3);
That is my actual function; as you can see, it is more complex than the example I presented. But the idea is the same: it averages, over the third dimension of the set of M's, the exponential of the difference of two arrays.
Hope that helps.
I agree that the question is not very clear, and that showing some code would help. I'll try anyway.
In order to have a LUT make sense at all, the set of values attained by t-x has to be limited, for example to integers.
Assuming that the exponent can be any integer from -1000 to 1000, you could create a LUT like this:
LUT = exp(-1000:1000);
Then you create your indices (assuming t is a 1D array with three elements, and x is a 2D array):
indexArray = bsxfun(@minus,reshape(t,[1,1,3]), x) + 1001; %# -1000 turns into 1
Finally, you create your result
output = LUT(indexArray);
%# sum along third dimension (i.e. sum over all `t`)
output = sum(output,3);
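For readers more comfortable with Python, the same broadcast-then-index trick can be sketched in NumPy, under the same assumption that the exponents are integers (a smaller range is used here so the table itself does not overflow):

```python
import numpy as np

# LUT over integer exponents -50..50; index 0 corresponds to exp(-50).
lut = np.exp(np.arange(-50, 51))

t = np.array([1, 2, 3])                                    # 1D array of t values
x = np.random.default_rng(0).integers(0, 40, size=(4, 4))  # 2D integer x values

# Broadcast t - x into a third dimension, then shift so -50 maps to index 0.
idx = (t.reshape(1, 1, -1) - x[:, :, None]) + 50
output = lut[idx].sum(axis=2)  # sum over all t, as in the MATLAB version

# Sanity check against computing the exponentials directly.
direct = np.exp(t.reshape(1, 1, -1) - x[:, :, None]).sum(axis=2)
assert np.allclose(output, direct)
```

As in the MATLAB answer, the speedup comes from replacing `exp` evaluations with a single fancy-indexing operation into a precomputed table; it only works when the set of possible exponents is small and discrete.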
I am not sure I understand your question, but I think this is the answer.
x = 0:3
y = 0:2
z = 0:6
[X,Y,Z] = meshgrid(x,y,z)
LUT = (X+Y).^Z