Tabulate Command Stata - frequency

I don't know if Stata can do this, but I use the tabulate command a lot to find frequencies. For instance, I have a success variable which takes on the values 0 or 1, and I would like to know the success rate for a certain group of observations, i.e. tab success if group==1. I was wondering if I can do the inverse of this operation. That is, I would like to find the values of "group" for which the frequency is greater than or equal to 15%, for example.
Is there a command that does this?
Thanks
As an example
sysuse auto
gen success=mpg<29
Now I want to find the value of price such that the frequency of the success variable is greater than 75% for example.

According to @Nick:
ssc install groups
sysuse auto
count
74
* return list   // optional: shows the stored results
local nobs = r(N)   // r(N) holds the total number of observations
groups rep78, sel(f > (0.15*`nobs'))   // groups whose frequency is more than 15% of the observations
+---------------------------------+
| rep78 Freq. Percent % <= |
|---------------------------------|
| 3 30 43.48 57.97 |
| 4 18 26.09 84.06 |
+---------------------------------+
groups rep78, sel(f > (0.10*`nobs'))   // more than 10%
+----------------------------------+
| rep78 Freq. Percent % <= |
|----------------------------------|
| 2 8 11.59 14.49 |
| 3 30 43.48 57.97 |
| 4 18 26.09 84.06 |
| 5 11 15.94 100.00 |
+----------------------------------+

I'm not sure if I fully understand your question/situation, but I believe this might be useful. You can egen a variable that is equal to the mean of success, by group, and then see which observations have the value for mean(success) that you're looking for.
egen avgsuccess = mean(success), by(group)
tab group if avgsuccess >= 0.15
list group if avgsuccess >= 0.15
Does that accomplish what you want?
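For readers who want to see the same idea outside Stata, here is a minimal Python sketch of "mean of success by group, then keep the groups at or above a threshold". The function name groups_meeting_rate and the sample rows are hypothetical, made up for illustration only.

```python
from collections import defaultdict

def groups_meeting_rate(rows, threshold):
    """Return {group: success_rate} for groups whose mean success >= threshold."""
    totals = defaultdict(lambda: [0, 0])  # group -> [successes, observations]
    for group, success in rows:
        totals[group][0] += success
        totals[group][1] += 1
    rates = {g: s / n for g, (s, n) in totals.items()}
    return {g: r for g, r in rates.items() if r >= threshold}

# Hypothetical data: (group, success) pairs, success is 0 or 1
rows = [(1, 1), (1, 0), (1, 0), (1, 0), (2, 1), (2, 1), (3, 0), (3, 0)]
selected = groups_meeting_rate(rows, 0.15)
print(selected)
```

This mirrors the egen/tab steps: compute the group mean first, then filter on it.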


how to apply multiple addition in Splunk

Hi, I have the data below from the following query:
(index=abc OR index=def) |rex field=index "(?<Local_Market>[^cita]\w.*?)_" | chart count by blocked , Local_Market
blocked dub rat mil
0 10 20 21
1 02 03 09
2 9 2 1
Now I want the data as below:
total blocked (sum of 0 and sum of 2) | dub | rat | mil | total found (sum of 1)
(10+20+21+9+2+1)=63 | 10 | 20 | 21 | (02+03+09)=14
The question could be better formatted, but I think what you want is the addcoltotals command.
This run-anywhere example is ugly, but I believe it produces the desired results.
| makeresults
| eval _raw="blocked dub rat mil
0 10 20 21
1 02 03 09
2 9 2 1"
| multikv forceheader=1
| fields - _time _raw linecount
```Skip the above - it just creates test data```
```Compute the total_bolocked field for blocked=0 and blocked=2```
| eval total_bolocked=if(blocked!=1,dub+mil+rat,0)
```Compute the total_found field for blocked=1```
| eval total_found=if(blocked=1, dub+mil+rat,0)
```Add up the total_bolocked fields. This will include blocked=1, but we'll fix that below```
| eventstats sum(total_bolocked) as total_bolocked
```Set total_bolocked=0 if blocked is 1```
| eval total_bolocked=if(blocked=1,0, total_bolocked)
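To make the arithmetic behind these evals easier to check, here is a small Python sketch (not SPL) that computes the two totals from the sample table in the question. The row_total helper and the variable names are assumptions for illustration.

```python
# Sample table from the question: one row per value of "blocked"
rows = [
    {"blocked": 0, "dub": 10, "rat": 20, "mil": 21},
    {"blocked": 1, "dub": 2, "rat": 3, "mil": 9},
    {"blocked": 2, "dub": 9, "rat": 2, "mil": 1},
]

def row_total(row):
    """Sum the three market columns of one row."""
    return row["dub"] + row["rat"] + row["mil"]

# Rows with blocked=0 or blocked=2 count as "blocked"; blocked=1 counts as "found"
total_blocked = sum(row_total(r) for r in rows if r["blocked"] != 1)
total_found = sum(row_total(r) for r in rows if r["blocked"] == 1)

print(total_blocked, total_found)
```

The values match the totals the question asks for: (10+20+21+9+2+1) and (02+03+09).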

CodeChef C_HOLIC2 Solution Find the smallest N whose factorial produces P Trailing Zeroes

For CodeChef problem C_HOLIC2, I tried iterating over the candidates 5, 10, 15, 20, 25, ... and, for each number, counting the trailing zeros of its factorial with the usual efficient technique, but I got TLE (time limit exceeded).
What is the fastest way to solve this using a formula-based method?
Here is the Problem Link
As we know for counting the number of trailing zeros in factorial of a number, the trick used is:
The number of multiples of 5 that are less than or equal to 500 is 500÷5=100
Then, the number of multiples of 25 is 500÷25=20
Then, the number of multiples of 125 is 500÷125=4
The next power of 5 is 625, which is greater than 500.
Therefore, the number of trailing zeros of 500! is 100+20+4=124
For detailed explanation check this page
Thus, this count can be represented as:
$count(N) = \lfloor N/5 \rfloor + \lfloor N/5^2 \rfloor + \lfloor N/5^3 \rfloor + \dots$
Using this trick, given a number N you can determine the number of trailing zeros (count) in its factorial.
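The counting trick can be sketched in a few lines of Python. The function name trailing_zeros is an assumption chosen for illustration; the logic is just the sum of the integer quotients of N by successive powers of 5.

```python
def trailing_zeros(n):
    """Number of trailing zeros of n!, i.e. the exponent of 5 in n!."""
    count = 0
    power = 5
    while power <= n:
        count += n // power  # multiples of 5, then 25, then 125, ...
        power *= 5
    return count

print(trailing_zeros(500))
```

For N = 500 this adds 100 + 20 + 4, matching the worked example.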
Now suppose we are given the number of trailing zeros, count, and are asked for the smallest number N whose factorial has count trailing zeros.
Here the question is how can we split count into this representation?
This is a problem because in the following examples, as we can see it becomes difficult.
The count jumps even though the no is increasing by the same amount.
As you can see from the following table, count jumps at values of N that are multiples of higher powers of 5, e.g. 25, 50, ..., 125, ...
+-------+-----+
| count | N |
+-------+-----+
| 1 | 5 |
+-------+-----+
| 2 | 10 |
+-------+-----+
| 3 | 15 |
+-------+-----+
| 4 | 20 |
+-------+-----+
| 6 | 25 |
+-------+-----+
| 7 | 30 |
+-------+-----+
| 8 | 35 |
+-------+-----+
| 9 | 40 |
+-------+-----+
| 10 | 45 |
+-------+-----+
| 12 | 50 |
+-------+-----+
| ... | ... |
+-------+-----+
| 28 | 120 |
+-------+-----+
| 31 | 125 |
+-------+-----+
| 32 | 130 |
+-------+-----+
| ... | ... |
+-------+-----+
Any brute-force program for this task shows that these jumps occur at regular intervals: at count = 6, 12, 18, 24 for the numbers whose factorials gain a factor of 25 (interval = 6 = 1×5+1).
From count = 31 (N = 125) onward, factorials will also have a factor of 125. The jumps corresponding to 25 will still occur with the same frequency, i.e. at 31, 37, 43, ...
Now the next jump corresponding to 125 will be at 31+31 which is at 62. Thus jumps corresponding to 125 will occur at 31, 62, 93, 124.(Interval =31=6×5+1)
Now the jump corresponding to 625 will occur at 31×5+1=155+1=156
Thus you can see there exists a pattern. We need to find the formula for this pattern to proceed.
The series formed is 1, 6, 31, 156, ...
which is $1$, $1+5$, $1+5+5^2$, $1+5+5^2+5^3$, ...
Thus, the nth term is the sum of the first n terms of a G.P. with a = 1, r = 5:
$t_n = (5^n - 1)/4$
Thus, the count can be something like 31+31+6+1+1, etc.
We need to find the $t_n$ which is less than or equal to count but closest to it, i.e.
$t_n = (5^n - 1)/4 \le count$, which gives $n = \lfloor \log_5(4 \cdot count + 1) \rfloor$
Say the number is count=35; using this we identify that $t_n = 31$ is closest. For count=63 the formula again gives $t_n = 31$ as the closest, but note that here 31 can be subtracted twice from count=63. We keep finding this n and subtracting $t_n$ from count until count becomes 0.
The algorithm used is:
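count = int(input())  # required number of trailing zeros (read_no() in the original)
N = 0
while count != 0:
    # largest n with (5**n - 1) // 4 <= count, i.e. n = floor(log(4*count + 1, 5));
    # found with integers to avoid floating-point rounding in log()
    n = 1
    while (5 ** (n + 1) - 1) // 4 <= count:
        n += 1
    baseSum = (5 ** n - 1) // 4
    baseOffset = (5 ** n) * (count // baseSum)  # integer division
    count = count % baseSum
    N += baseOffset
print(N)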
Here, 5**n is $5^n$.
Let's try working this out for an example:
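Say count = 70,
Iteration 1: n = floor(log(281, 5)) = 3, baseSum = 31, baseOffset = 125 × (70 / 31) = 125 × 2 = 250, count = 70 % 31 = 8, N = 250
Iteration 2: n = floor(log(33, 5)) = 2, baseSum = 6, baseOffset = 25 × (8 / 6) = 25 × 1 = 25, count = 8 % 6 = 2, N = 275
Iteration 3: n = floor(log(9, 5)) = 1, baseSum = 1, baseOffset = 5 × (2 / 1) = 10, count = 2 % 1 = 0, N = 285
Take another example. Say count=124 which is the one discussed at the beginning of this page:
Iteration 1: n = floor(log(497, 5)) = 3, baseSum = 31, baseOffset = 125 × (124 / 31) = 125 × 4 = 500, count = 124 % 31 = 0, N = 500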
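One way to sanity-check results like these is a binary search over N. This is a different technique from the formula above, and it relies only on the fact that the trailing-zero count never decreases as N grows. The names trailing_zeros and smallest_n are assumptions for this sketch, which returns None when no factorial has exactly the requested count.

```python
def trailing_zeros(n):
    """Number of trailing zeros of n! (sum of n // 5**k)."""
    count, power = 0, 5
    while power <= n:
        count += n // power
        power *= 5
    return count

def smallest_n(count):
    """Smallest N with exactly `count` trailing zeros in N!, or None."""
    lo, hi = 0, 5 * count  # trailing_zeros(5 * count) >= count, so hi is a safe bound
    while lo < hi:
        mid = (lo + hi) // 2
        if trailing_zeros(mid) >= count:
            hi = mid
        else:
            lo = mid + 1
    return lo if trailing_zeros(lo) == count else None

print(smallest_n(70), smallest_n(124))
```

Counts that are skipped by a jump (for example 5, since 24! has 4 zeros and 25! has 6) have no exact answer, which this check makes visible.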
PS: All the images are completely owned by me. I had to use images because StackOverflow doesn't allow MathJax to be embedded. #StackOverflowShouldAllowMathJax

Microsoft Report Builder - Row totals percent of column total

I'd really appreciate some help with Report Builder. As seen below, I have a report that shows the number of items. In my SQL query I have used a CASE statement to tag some of the items with a y or a n.
What I want to do is add a calculated cell that sums the values of all items tagged with y, divides that sum by the total, and multiplies by 100, to get the rows tagged y as a percentage of the total amount.
The answer I'm looking for is:
Apple | Y | 100
Pear | Y | 200
Orange| N | 500
Total | 800
Percent of Ys = 37.5% ((100+200)/800*100)
I'm new to report builder so please let me know if this doesn't make sense.
Many thanks.
You could add two more columns to your query, using similar logic as your CASE statement for the Y/N column. The first column is populated with the value only when the condition for "Y" is true, otherwise it is zero. The second column is populated with the value only when the condition for "N" is true, otherwise it is zero. This would give you a result set similar to this:
      |   | All | Y   | N
Apple | Y | 100 | 100 | 0
Pear | Y | 200 | 200 | 0
Orange| N | 500 | 0 | 500
Total |   | 800 | 300 | 500
Then your calculation is something like this:
Percent of Ys = (Sum(Y) / Sum(All)) * 100
i.e.
Percent of Ys = (300 / 800) * 100 = 37.5%
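As a quick check of the arithmetic (in Python rather than a Report Builder expression), the sketch below computes the same percentage from the sample rows. The variable names are assumptions for illustration.

```python
# Sample rows from the question: (item, tag, value)
rows = [("Apple", "Y", 100), ("Pear", "Y", 200), ("Orange", "N", 500)]

sum_all = sum(value for _, _, value in rows)
sum_y = sum(value for _, tag, value in rows if tag == "Y")  # value only when tagged Y
percent_y = sum_y / sum_all * 100

print(percent_y)
```

The conditional sum plays the role of the extra "Y" column: each row contributes its value when tagged Y and zero otherwise.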

Ransack search- select rows whose sum adds up to a given value

I'm using Ransack search with Ruby on Rails and trying to output a random set of between 1 and 6 rows whose time values add up to a value specified by the search.
For example, search for rows whose time values add up to 40. In this case ids 12 and 14 will be returned. Any valid combination of between 1 and 6 rows can be output at random.
If a combination of 3 rows meets the criteria, then 3 rows should be output, and likewise for 1, 2, 4, 5 or 6 rows. If no single row or combination can be found, the output should be nil.
id | title | time
----+-------------------------+-----------
26 | example | 10
27 | example | 26
14 | example | 20
28 | example | 50
12 | example | 20
20 | example | 6
Note - I'm not sure if Ransack is the best tool for this type of query
Thanks in advance
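The selection rule described above is a small subset-sum search. Here is a rough sketch of it in Python (not Ruby or Ransack), assuming the rows have already been loaded into memory. The function name find_rows and the max_rows parameter are hypothetical, and trying every combination is only reasonable for small tables.

```python
from itertools import combinations
import random

def find_rows(rows, target, max_rows=6):
    """Return a random combination of 1..max_rows rows whose times sum to target, else None."""
    matches = [
        combo
        for size in range(1, max_rows + 1)
        for combo in combinations(rows, size)
        if sum(r["time"] for r in combo) == target
    ]
    return list(random.choice(matches)) if matches else None

# Sample table from the question
rows = [
    {"id": 26, "time": 10}, {"id": 27, "time": 26}, {"id": 14, "time": 20},
    {"id": 28, "time": 50}, {"id": 12, "time": 20}, {"id": 20, "time": 6},
]
result = find_rows(rows, 40)
print(sorted(r["id"] for r in result))
```

With this sample data the only combination that adds up to 40 is ids 12 and 14, so the random choice has a single candidate.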

Date Join Query with Calculated Fields

I'm creating an Access 2010 database to replace an old Paradox one. I'm just now getting to queries, and there is no hiding that I am new to SQL.
What I am trying to do is set up a query to be used by a graph. The graph's Y axis is to be a simple percentage passed, and the X axis is a certain day. The graph will be created on form load and subsequent new records entered with a date range of "Between Date() And Date()-30" (30 days, rolling).
The database I'm working with can have multiple inspections per day with multiple passes and multiple fails. Each inspection is a separate record.
For instance, on 11/26/2012 there were 7 inspections done; 5 passed and 2 failed, a 71% ((5/7)*100%) acceptance. The "11/26/2012" and "71%" represent a data point on the graph. On 11/27/2012 there were 8 inspections done; 4 passed and 4 failed, a 50% acceptance. Etc.
Here is an example of a query with fields "Date" and "Disposition" of date range "11/26/2012 - 11/27/2012:"
SELECT Inspection.Date, Inspection.Disposition
FROM Inspection
WHERE (((Inspection.Date) Between #11/26/2012# And #11/27/2012#) AND ((Inspection.Disposition)="PASS" Or (Inspection.Disposition)="FAIL"));
Date | Disposition
11/26/2012 | PASS
11/26/2012 | FAIL
11/26/2012 | FAIL
11/26/2012 | PASS
11/26/2012 | PASS
11/26/2012 | PASS
11/26/2012 | PASS
11/27/2012 | PASS
11/27/2012 | PASS
11/27/2012 | FAIL
11/27/2012 | PASS
11/27/2012 | FAIL
11/27/2012 | PASS
11/27/2012 | FAIL
11/27/2012 | FAIL
*NOTE - The date field is of type "Date," and the Disposition field is of type "Text." There are days where no inspections are done, and these days are not to show up on the graph. The inspection disposition can also be listed as "NA," which refers to another type of inspection not to be graphed.
Here is the layout I want to create in another query (again, for brevity, only 2 days in range):
Date | # Insp | # Passed | # Failed | % Acceptance
11/26/2012 | 7 | 5 | 2 | 71
11/27/2012 | 8 | 4 | 4 | 50
What I think needs to be done is some type of join on the record dates themselves and "calculated fields" in the rest of the query results. The problem is that I haven't found out how to "flatten" the records by date AND maintain a count of the number of inspections and the number passed/failed all in one query. Do I need multiple layered queries for this? I prefer not to store any of the queries as tables as the only use of these numbers is in graphical form.
I was thinking of making new columns in the database to get around the "Disposition" field being Textual by assigning a PASS "1" and a FAIL "0," but this seems like a cop-out. There has to be a way to make this work in SQL, just I haven't found applicable examples.
Thanks for your help! Any input or suggestions are appreciated! Example databases with forms, queries, and graphs are also helpful!
You could group by Date, and then use aggregates like sum and count to calculate statistics for that group:
select Date
, count(*) as [# Insp]
, sum(iif(Disposition = 'PASS',1,0)) as [# Passed]
, sum(iif(Disposition = 'FAIL',1,0)) as [# Failed]
, 100.0 * sum(iif(Disposition = 'PASS',1,0)) / count(*) as [% Acceptance]
from YourTable
where Disposition in ('PASS', 'FAIL')
group by
Date
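To show what the grouped query computes, here is a minimal Python sketch of the same aggregation over the sample records from the question. The "NA" record is a hypothetical addition to illustrate the filter, and the dictionary keys are names chosen for this example.

```python
# Sample records from the question as (date, disposition) pairs
records = (
    [("11/26/2012", d) for d in "PASS FAIL FAIL PASS PASS PASS PASS".split()]
    + [("11/27/2012", d) for d in "PASS PASS FAIL PASS FAIL PASS FAIL FAIL".split()]
    + [("11/27/2012", "NA")]  # hypothetical record; excluded like the WHERE clause does
)

summary = {}
for date, disposition in records:
    if disposition not in ("PASS", "FAIL"):
        continue  # same role as: where Disposition in ('PASS', 'FAIL')
    stats = summary.setdefault(date, {"insp": 0, "passed": 0, "failed": 0})
    stats["insp"] += 1
    stats["passed" if disposition == "PASS" else "failed"] += 1

for date, stats in summary.items():
    stats["acceptance"] = 100.0 * stats["passed"] / stats["insp"]
    print(date, stats)
```

Each date ends up as one row with its inspection count, pass and fail counts, and acceptance percentage, which is the "flattened" layout the question asks for.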