Sample size calculation for PKPD modelling

I am trying to find code for a sample size calculation for a PKPD modelling analysis. I found an article that seems to cover what I need, but it does not include the code: 10.1007/s10928-005-0078-3
Can anyone help me, please?
So far I have only found a reference. I expect to be able to calculate a sample size for a PKPD modelling study.

Related

Providing own imputed data into Mice

I would like to provide my own imputed data to Mice. Is it possible to do this and just run the fit routine [with(data = ..)]? I searched but was unable to find anything on this. Thanks.

How to do sampling in a SQL query to get a dataframe with pandas

Note my question is a bit different here:
I am working with pandas on a dataset that has a lot of data (10M+):
q = "SELECT COUNT(*) as total FROM `<public table>`"
df = pd.read_gbq(q, project_id=project, dialect='standard')
I know I can sample with the pandas sample function and its frac option, like
df_sample = df.sample(frac=0.01)
However, I do not want to load the original df at full size first. I wonder what the best practice is for generating a dataframe from data that has already been sampled.
I've read some SQL posts where the sample was taken from a slice of the table; that is not acceptable in my case. The sampled data needs to be as evenly distributed as possible.
Can anyone shed more light on this?
Thank you very much.
UPDATE:
Below is a table showing what the data looks like:
Reputation is the field I am working on. You can see that the majority of records have a very small reputation.
I don't want to work with a dataframe containing all the records. I want the sampled data to look like the un-sampled data, for example with a similar histogram; that's what I meant by "evenly".
I hope this clarifies a bit.
A simple random sample can be performed using the following syntax:
select * from mydata where rand()>0.9
This gives each row in the table a 10% chance of being selected. It doesn't guarantee a certain sample size, and it doesn't guarantee that every bin is represented (that would require a stratified sample). Here's a fiddle of this approach:
http://sqlfiddle.com/#!9/21d1ee/2
On average, a random sample has the same distribution as the underlying data, so it meets your requirement. However, if you want to 'force' the sample to be more representative, or force it to be a certain size, we need to look at something a little more advanced.
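To make the difference between these options concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the real database. The table name mydata comes from the query above; the id and reputation columns, the demo data, and the helper names are assumptions made up for illustration. SQLite has no rand(), so one is registered by hand, and rand() < fraction is used as the equivalent of rand() > 0.9 for a 10% sample.

```python
import random
import sqlite3


def make_demo_db(n_rows=10_000, seed=0):
    """Build an in-memory table `mydata` with a skewed `reputation` column."""
    conn = sqlite3.connect(":memory:")
    rng = random.Random(seed)
    # SQLite has no rand(), so register one that returns a float in [0, 1).
    conn.create_function("rand", 0, rng.random)
    conn.execute("CREATE TABLE mydata (id INTEGER PRIMARY KEY, reputation INTEGER)")
    # Hypothetical data: 90% of rows have reputation 1, the rest have 1000.
    rows = [(i, 1 if i % 10 else 1000) for i in range(n_rows)]
    conn.executemany("INSERT INTO mydata VALUES (?, ?)", rows)
    return conn


def bernoulli_sample(conn, fraction):
    """Keep each row independently with probability `fraction`; size varies."""
    return conn.execute(
        "SELECT id, reputation FROM mydata WHERE rand() < ?", (fraction,)
    ).fetchall()


def fixed_size_sample(conn, n):
    """Return exactly `n` rows (if available), chosen uniformly at random."""
    return conn.execute(
        "SELECT id, reputation FROM mydata ORDER BY rand() LIMIT ?", (n,)
    ).fetchall()


def stratified_sample(conn, fraction):
    """Sample the same fraction from every distinct reputation value."""
    sample = []
    strata = conn.execute(
        "SELECT reputation, COUNT(*) FROM mydata GROUP BY reputation"
    ).fetchall()
    for reputation, count in strata:
        n = max(1, round(count * fraction))  # at least one row per stratum
        sample += conn.execute(
            "SELECT id, reputation FROM mydata WHERE reputation = ? "
            "ORDER BY rand() LIMIT ?",
            (reputation, n),
        ).fetchall()
    return sample
```

With real data, the strata would more likely be reputation bins than exact values, and ORDER BY rand() sorts the whole table, so it can be slow on 10M+ rows. The WHERE-clause idea should carry over to the query passed to pd.read_gbq, since only the sampled rows ever reach pandas, but that part is untested here.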

Lucene tfidf does not have square of idf?

In the Lucene 8.2.0 source code, in TFIDFSimilarity.TFIDFScorer#score(float freq, long norm),
I cannot find the square of idf, but according to the documentation of the Lucene Practical Scoring Function,
the square of idf should be involved when the score is calculated.
This mismatch really confuses me. Does the documentation not match the source code, or have I just misunderstood the source code? Could someone explain it, please? Thanks in advance.

Pairwise cosine similarity

I'm a little confused after reading this paper: Pairwise Document Similarity in Large Collections with MapReduce
http://www.umiacs.umd.edu/~jimmylin/publications/Elsayed_etal_ACL2008_short.pdf
In this paper, the authors do not seem to consider words that appear in only one document, but according to the definition of cosine similarity, we need to consider this situation, right?
The material I used is here: https://www.dropbox.com/s/nctb66hh84ab32c/postings-Reuters-data
The Java code I used is here: https://www.dropbox.com/s/aklviixup4uulmu/CosineSimilarity.java
And the results I generated are here: https://www.dropbox.com/s/ea6ov7l7yut7yfj/part-00000
In the results, I see a lot of 1's and even numbers bigger than 1. I think that's weird. Could someone help me find the reason? Thanks.
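For reference, here is a minimal Python sketch of the standard cosine similarity definition (Python rather than the Java used in the question; the function name and the toy documents are made up for illustration). A term that appears in only one document contributes nothing to the dot product, so it can be skipped there, but it still counts toward that document's norm. With both norms in the denominator the result can never exceed 1, so values above 1 would suggest the norms are missing or incomplete. That is only a guess at the cause, since the linked code is not inspected here.

```python
import math


def cosine_similarity(doc_a, doc_b):
    """Cosine similarity of two documents given as {term: weight} dicts."""
    # Only terms present in BOTH documents contribute to the dot product.
    shared_terms = set(doc_a) & set(doc_b)
    dot = sum(doc_a[t] * doc_b[t] for t in shared_terms)
    # The norms use ALL terms, including those unique to one document.
    norm_a = math.sqrt(sum(w * w for w in doc_a.values()))
    norm_b = math.sqrt(sum(w * w for w in doc_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for empty documents
    return dot / (norm_a * norm_b)


# Toy documents: "price" and "market" each occur in only one of them.
doc_1 = {"oil": 1.0, "price": 1.0}
doc_2 = {"oil": 1.0, "market": 1.0}
similarity = cosine_similarity(doc_1, doc_2)
```

Here the shared term gives a dot product of 1 and each norm is sqrt(2), so the similarity is about 0.5; dropping the unique terms from the norms would wrongly push it up to 1.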

Fourier Transformation

I've been doing a lot of research on this topic and I'm finally getting somewhere. Below are two complex numbers from the Java code I'm using:
-9771.0 - j2125.0
-16184.09634718744 - j53968.71008512241
I know the amplitude/magnitude can be computed as sqrt(a^2 + b^2), and this is as far as I've gotten. I've read about sample rate, but I need a better explanation of it and would like to be pointed in the right direction to learn more. I've plotted the power spectrum graph, but I need to do this on paper so that I know how to obtain the frequency.
Applying a Fourier transformation to two values is pretty meaningless. You apply it to a series of values (a signal); only then does frequency start to make sense. You can't speak about frequency in a series of two values.
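To show how frequency falls out of a whole series rather than a single value, here is a rough Python sketch (not the Java code from the question). It assumes the usual DFT convention in which bin k of an N-sample signal recorded at sample rate fs corresponds to k * fs / N Hz. The test signal, the sample rate and the function names are made up for illustration, and which bins the two numbers above came from is unknown.

```python
import cmath
import math


def magnitude(re, im):
    """Amplitude of one complex DFT value: sqrt(re^2 + im^2)."""
    return math.hypot(re, im)


def naive_dft(signal):
    """Direct O(N^2) DFT; fine for short illustrative signals."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(signal))
        for k in range(n)
    ]


def peak_frequency(signal, sample_rate):
    """Frequency in Hz of the strongest bin in the first half of the spectrum."""
    n = len(signal)
    spectrum = naive_dft(signal)
    # For real input, bins above n/2 mirror the lower ones, so ignore them.
    k = max(range(n // 2 + 1), key=lambda b: abs(spectrum[b]))
    return k * sample_rate / n


# Hypothetical test signal: 100 samples of a 50 Hz sine sampled at 1000 Hz.
SAMPLE_RATE = 1000.0
tone = [math.sin(2 * math.pi * 50 * i / SAMPLE_RATE) for i in range(100)]

first_magnitude = magnitude(-9771.0, -2125.0)
tone_frequency = peak_frequency(tone, SAMPLE_RATE)
```

For the first number above the magnitude comes out at roughly 9999.4, and the strongest bin of the test signal is k = 5, which maps back to 5 * 1000 / 100 = 50 Hz. The same k * fs / N step is what you would do on paper once you know N and the sample rate.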