Ignore missing values when creating dummy variable


How can I create a dummy variable in Stata that takes the value of 1 when the variable pax is above 100 and 0 otherwise?
Missing values should be labelled as 0.
My code is the following:
generate type = 0
replace type = 1 if pax > 100
The problem is that Stata labels all missing values as 1 instead of keeping them as 0.

This occurs because Stata treats numeric missing values as larger than any non-missing number. The condition pax > 100 is therefore also true when pax is missing, so type is set to 1 for those observations.
You can avoid this by explicitly excluding missing values from the condition:
generate type = 0
replace type = 1 if pax > 100 & pax != .

Consider the toy example below:
clear
input pax
20
30
40
100
110
130
150
.
.
.
end
The following syntax is in fact sufficient:
generate type1 = pax > 100 & pax < .
Alternatively, one can use the missing() function:
generate type2 = pax > 100 & !missing(pax)
Note the ! before the function: it negates missing(), so the condition holds only for non-missing values of pax.
In both cases, the results are the same:
list
     +---------------------+
     | pax   type1   type2 |
     |---------------------|
  1. |  20       0       0 |
  2. |  30       0       0 |
  3. |  40       0       0 |
  4. | 100       0       0 |
  5. | 110       1       1 |
     |---------------------|
  6. | 130       1       1 |
  7. | 150       1       1 |
  8. |   .       0       0 |
  9. |   .       0       0 |
 10. |   .       0       0 |
     +---------------------+
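
The comparison behaviour described above can be mimicked outside Stata. The Python sketch below is only an analogy: it assumes a missing value can be modelled as positive infinity (Stata stores specific large codes, not infinity), and the names MISSING, naive_dummy and guarded_dummy are hypothetical.

```python
import math

# Assumption: stand-in for Stata's numeric missing value, which compares
# as larger than every non-missing number. Stata does not store infinity.
MISSING = math.inf

# Same values as the toy example above.
pax = [20, 30, 40, 100, 110, 130, 150, MISSING, MISSING, MISSING]

def naive_dummy(x):
    # Mirrors `replace type = 1 if pax > 100`: a missing value passes the test.
    return int(x > 100)

def guarded_dummy(x):
    # Mirrors `pax > 100 & pax < .`: missing values are excluded explicitly.
    return int(x > 100 and x < MISSING)

naive = [naive_dummy(x) for x in pax]
guarded = [guarded_dummy(x) for x in pax]
print(naive)
print(guarded)
```

Only the guarded version leaves the last three (missing) observations at 0.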

Related

how to apply multiple addition in Splunk

Hi, I have the data below from the following query:
(index=abc OR index=def) |rex field=index "(?<Local_Market>[^cita]\w.*?)_" | chart count by blocked , Local_Market
blocked  dub  rat  mil
0        10   20   21
1        02   03   09
2        9    2    1
Now I want the data as below:
total blocked (sum of rows 0 and 2)  dub  rat  mil  total found (sum of row 1)
(10+20+21+9+2+1)=63                  10   20   21   (02+03+09)=14

The question could be better formatted, but I think what you want is the addcoltotals command.

This run-anywhere example is ugly, but I believe it produces the desired results.
| makeresults
| eval _raw="blocked dub rat mil
0 10 20 21
1 02 03 09
2 9 2 1"
| multikv forceheader=1
| fields - _time _raw linecount
```Skip the above - it just creates test data```
```Compute the total_bolocked field for blocked=0 and blocked=2```
| eval total_bolocked=if(blocked!=1,dub+mil+rat,0)
```Compute the total_found field for blocked=1```
| eval total_found=if(blocked=1, dub+mil+rat,0)
```Add up the total_bolocked fields. This will include blocked=1, but we'll fix that below```
| eventstats sum(total_bolocked) as total_bolocked
```Set total_bolocked=0 if blocked is 1```
| eval total_bolocked=if(blocked=1,0, total_bolocked)
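
As a cross-check of the arithmetic only, here is a small Python sketch (not Splunk) that recomputes both totals from the table in the question. The dictionary layout and the variable names are assumptions made for illustration.

```python
# Rows of the chart from the question: blocked value -> count per Local_Market.
rows = {
    0: {"dub": 10, "rat": 20, "mil": 21},
    1: {"dub": 2, "rat": 3, "mil": 9},
    2: {"dub": 9, "rat": 2, "mil": 1},
}

# Sum every market count for the rows where blocked is 0 or 2.
total_blocked = sum(sum(counts.values()) for blocked, counts in rows.items() if blocked != 1)

# Sum every market count for the single blocked=1 row.
total_found = sum(rows[1].values())

print(total_blocked, total_found)
```

The two values match the 63 and 14 the question asks for.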

ABL Progress 4gl : For Each with Count in Output-Stream

Progress-Procedure-Editor:
DEFINE STREAM myStream.
OUTPUT STREAM myStream TO 'C:\Temp\BelegAusgangSchnittstelle.txt'.
FOR EACH E_BelegAusgang
WHERE E_BelegAusgang.Firma = '000'
AND E_BelegAusgang.Schnittstelle = '$Standard'
NO-LOCK:
PUT STREAM myStream UNFORMATTED
STRING(E_BelegAusgang.Firma)
'|'
STRING(E_BelegAusgang.BelegNummer)
'|'
STRING(E_BelegAusgang.Schnittstelle)
'|'
SKIP
.
END.
I get this (extraction):
Firma | BelegNr | Schnittstelle
000 | 3 | $Standard
000 | 3 | $Standard
000 | 3 | $Standard
000 | 3 | $Standard
000 | 3 | $Standard
000 | 8 | $Standard
000 | 8 | $Standard
What I need is a COUNT per BelegNr. At the moment I import the data from the TXT file into SQL Server.
On the server my query is:
SELECT [BelegNr]
,COUNT(*) AS [Anzahl]
FROM [TestDB].[dbo].[Beleg_Ausgang]
GROUP BY [BelegNr]
ORDER BY [Anzahl]
With that query I got (extraction):
BelegNr Anzahl
3 | 5
8 | 2
Is there a way to put the COUNT directly into the Progress-Code? I mean, I want my result directly from the Progress-Procedure-Editor.

In ABL you use BREAK BY instead of GROUP BY. One limitation is that BREAK BY both groups and sorts the records.
You could for instance have another "FOR EACH" for this:
DEFINE VARIABLE iCount AS INTEGER NO-UNDO.
FOR EACH E_BelegAusgang NO-LOCK
WHERE E_BelegAusgang.Firma = '000'
AND E_BelegAusgang.Schnittstelle = '$Standard'
BREAK BY E_BelegAusgang.BelegNummer:
iCount = iCount + 1.
IF LAST-OF(E_BelegAusgang.BelegNummer) THEN DO:
DISPLAY E_BelegAusgang.BelegNummer iCount.
iCount = 0.
END.
END.
You could also incorporate that code in the export but note: that will change the order of the file rows. Maybe that's a problem for you, maybe not!
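
For readers who do not know ABL, the same count-per-group pattern can be sketched in Python with itertools.groupby. This is an analogy rather than a translation: the list beleg_nummern is hypothetical sample data, deliberately unsorted to show why sorting is needed.

```python
from itertools import groupby

# Hypothetical sample of BelegNummer values in file order (not sorted).
beleg_nummern = [3, 8, 3, 3, 8, 3, 3]

# groupby only merges adjacent equal keys, so the values are sorted first.
# This is the same "groups and sorts" trade-off as BREAK BY.
counts = [(nr, sum(1 for _ in group)) for nr, group in groupby(sorted(beleg_nummern))]

for nr, anzahl in counts:
    print(nr, anzahl)
```

As with BREAK BY, the counts come out in sorted key order, not in the original file order.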

Create counter variable

I am using Stata 13.
I want to create a variable that equals 0 when none of a bunch of other variables equals 0; this variable is 1 when one variable of a bunch of other variables equals 1; it is 2 when two variables of a bunch of other variables are 1; it is 3 when three variables of a bunch of other variables are 1; and so on.
Any suggestions?

Your conditions are not mutually exclusive. Two criteria need to be separated:
1. A variable that is 0 when none of a bunch of other variables equals 0.
2. A variable that is 1 when one variable of a bunch of other variables equals 1; 2 when two variables are 1; 3 when three variables are 1; etc.
Condition #2 is just counting 1s, as here:
clear
input x1 x2 x3
0 0 1
0 1 1
1 1 1
end
egen count1 = anycount(x1 x2 x3), value(1)
list
     +-----------------------+
     | x1   x2   x3   count1 |
     |-----------------------|
  1. |  0    0    1        1 |
  2. |  0    1    1        2 |
  3. |  1    1    1        3 |
     +-----------------------+
Condition #1 could be done this way for a modest number of variables:
gen none0 = inlist(0, x1, x2, x3)
list
     +-------------------------------+
     | x1   x2   x3   count1   none0 |
     |-------------------------------|
  1. |  0    0    1        1       1 |
  2. |  0    1    1        2       1 |
  3. |  1    1    1        3       0 |
     +-------------------------------+
The rowtotal() method of counting 1s in your comment only works for values that are only ever 1, 0 or missing, which may be true of your data but it is not a stated condition.
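
The difference between counting 1s and summing a row can be illustrated with a short Python sketch. It is not Stata code; the row called extra is a hypothetical observation with a value other than 0 or 1, added only to show where the two approaches diverge.

```python
# The three observations from the example above.
rows = [(0, 0, 1), (0, 1, 1), (1, 1, 1)]

# Like egen ... anycount(...), value(1): how many entries equal 1.
count1 = [sum(1 for v in row if v == 1) for row in rows]

# Like inlist(0, x1, x2, x3): 1 if any entry equals 0, otherwise 0.
none0 = [int(0 in row) for row in rows]

# Hypothetical extra row containing a 2: the row sum no longer counts 1s.
extra = (2, 1, 0)
extra_count = sum(1 for v in extra if v == 1)
extra_sum = sum(extra)

print(count1, none0, extra_count, extra_sum)
```

For the hypothetical row the count of 1s is 1 while the row sum is 3, which is why the sum is only a safe shortcut when every value is 0, 1 or missing.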

Tabulate Command Stata

I don't know if Stata can do this, but I use the tabulate command a lot to find frequencies. For instance, I have a success variable that takes the values 0 or 1, and I would like to know the success rate for a certain group of observations, i.e. tab success if group==1. I was wondering whether I can do the inverse of this operation: find the values of "group" for which the frequency is greater than or equal to 15%, for example.
Is there a command that does this?
Thanks
As an example
sysuse auto
gen success=mpg<29
Now I want to find the value of price such that the frequency of the success variable is greater than 75% for example.

According to @Nick:
ssc install groups
sysuse auto
count
74
* return list   (optional: shows the stored results)
* r(N) holds the total number of observations
local nobs = r(N)
* groups whose frequency exceeds 15% of all observations
groups rep78, sel(f > (0.15*`nobs'))
  +---------------------------------+
  | rep78   Freq.   Percent    % <= |
  |---------------------------------|
  |     3      30     43.48   57.97 |
  |     4      18     26.09   84.06 |
  +---------------------------------+
* groups whose frequency exceeds 10% of all observations
groups rep78, sel(f > (0.10*`nobs'))
  +----------------------------------+
  | rep78   Freq.   Percent     % <= |
  |----------------------------------|
  |     2       8     11.59    14.49 |
  |     3      30     43.48    57.97 |
  |     4      18     26.09    84.06 |
  |     5      11     15.94   100.00 |
  +----------------------------------+

I'm not sure if I fully understand your question/situation, but I believe this might be useful. You can egen a variable that is equal to the mean of success, by group, and then see which observations have the value for mean(success) that you're looking for.
egen avgsuccess = mean(success), by(group)
tab group if avgsuccess >= 0.15
list group if avgsuccess >= 0.15
Does that accomplish what you want?
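
The idea behind the egen approach (compute the mean of success within each group, then keep the groups whose mean clears a threshold) can be sketched in Python as follows. The observations in data are hypothetical, and the 0.15 threshold is taken from the question.

```python
from collections import defaultdict

# Hypothetical (group, success) observations.
data = [(1, 1), (1, 0), (1, 0), (2, 1), (2, 1), (3, 0), (3, 0), (3, 0)]

# Accumulate [sum of success, number of observations] per group.
totals = defaultdict(lambda: [0, 0])
for group, success in data:
    totals[group][0] += success
    totals[group][1] += 1

# Mean success rate per group, analogous to egen ... mean(success), by(group).
avg_success = {g: s / n for g, (s, n) in totals.items()}

# Groups whose success rate is at least 15%.
selected = sorted(g for g, mean in avg_success.items() if mean >= 0.15)
print(selected)
```

With this sample, groups 1 and 2 are selected and group 3 (no successes) is not.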

Postgres Aggregator which checks for occurrences

Does there exist a Postgres Aggregator such that, when used on the following table:
id | value
----+-----------
1 | 1
2 | 1
3 | 2
4 | 2
5 | 3
6 | 3
7 | 3
8 | 4
9 | 4
10 | 5
in a query such as:
select agg_function(4,value) from mytable where id>5
will return
agg_function
--------------
t
(a boolean true result) because a row or rows with value=4 were selected?
In other words, one argument specifies the value you are looking for, the other argument takes the column specifier, and it returns true if the column value was equal to the specified value for one or more rows?
I have successfully created an aggregate to do just that, but I'm wondering if I have just reinvented the wheel...

select sum(case when value = 4 then 1 else 0 end) > 0
from mytable
where id > 5
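
The query can be tried without a Postgres server. The sketch below runs it against an in-memory SQLite database through Python's sqlite3 module; this is an assumption made for convenience, so the result comes back as the integer 1 or 0 instead of a Postgres boolean, and the value to look for is passed as a parameter.

```python
import sqlite3

# In-memory SQLite database standing in for Postgres (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (id integer, value integer)")

# Same ten rows as the table in the question.
values = [1, 1, 2, 2, 3, 3, 3, 4, 4, 5]
conn.executemany("insert into mytable values (?, ?)", list(enumerate(values, start=1)))

# The query from the answer, with the searched value as a parameter.
query = "select sum(case when value = ? then 1 else 0 end) > 0 from mytable where id > 5"

found_4 = bool(conn.execute(query, (4,)).fetchone()[0])
found_1 = bool(conn.execute(query, (1,)).fetchone()[0])
print(found_4, found_1)
```

Rows with id > 5 contain the value 4 but not the value 1, so the first result is true and the second is false. If the where clause selects no rows at all, sum() yields NULL and the comparison is NULL rather than false.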
