How to exclude rows with multiple conditions and "not in" in SAS? - sql

I have a question about a query. I thought the solution would be simple, but I realized that it is not.
I have table A, and the COD field is the primary key.
COD CATEGORY PRODUCT IND SOURCE
1 Two black Y ANEXO8
2 Two black Y ANEXO8
3 Two black N ANEXO8
4 Two red Y ANEXO8
5 Two red Y ANEXO8
6 Two red N ANEXO8
7 Two yellow Y ANEXO8
8 Two yellow N ANEXO8
9 Two green N ANEXO8
10 Two green N ANEXO8
11 Two pink Y ANEXO8
12 Two pink Y ANEXO8
13 Two pink N ANEXO8
14 Two gray N SAS
15 Two gray N SAS
16 Two gray N SAS
What I am trying to do is first filter on the rows whose SOURCE field is "ANEXO8", then exclude all rows whose PRODUCT field is equal to "black", and finally exclude all rows whose PRODUCT field is equal to "red", but only if the IND field is equal to "Y".
The resulting table would be equal to:
COD CATEGORY PRODUCT IND SOURCE
6 Two red N ANEXO8
7 Two yellow Y ANEXO8
8 Two yellow N ANEXO8
9 Two green N ANEXO8
10 Two green N ANEXO8
11 Two pink Y ANEXO8
12 Two pink Y ANEXO8
13 Two pink N ANEXO8
14 Two gray N SAS
15 Two gray N SAS
16 Two gray N SAS
I have tried to perform a single query:
proc sql;
create table test as
select * from A
where SOURCE = "ANEXO8"
and PRODUCT not in ("black")
and (PRODUCT not in ("red") and IND ne "Y"));
run;
But I don't get the result I want. Do you know what I could do, or where I am going wrong?

Try this
proc sql;
create table test as
select * from A
where SOURCE = "ANEXO8"
and PRODUCT not in ("black")
and not (PRODUCT in ("red") and IND = "Y");
quit;
You have to be careful with how the parentheses influence the negation.

Your logic is close but the last and should be or:
create table test as
select * from A
where SOURCE = 'ANEXO8' and
PRODUCT <> 'black' and
(PRODUCT <> 'red' or IND <> 'Y');
This is simply a logic error. Do note other differences:
NOT IN seems overkill when "not equals" is sufficient.
The SQL Standard string delimiter is a single quote not a double quote.
The SQL Standard not-equals operator is <>.
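To see why the original condition fails, here is a small Python sketch (not SAS) that applies the three predicates to the sample rows. The tuple layout and the function names are made up for illustration, and the CATEGORY column is left out because it is always "Two". Note that the expected table in the question also lists COD 14 to 16 (SOURCE "SAS"), which any condition containing SOURCE = "ANEXO8" drops; the sketch keeps that filter, as both answers do.

```python
# Sample rows from the question as (COD, PRODUCT, IND, SOURCE).
ROWS = [
    (1, "black", "Y", "ANEXO8"), (2, "black", "Y", "ANEXO8"),
    (3, "black", "N", "ANEXO8"), (4, "red", "Y", "ANEXO8"),
    (5, "red", "Y", "ANEXO8"), (6, "red", "N", "ANEXO8"),
    (7, "yellow", "Y", "ANEXO8"), (8, "yellow", "N", "ANEXO8"),
    (9, "green", "N", "ANEXO8"), (10, "green", "N", "ANEXO8"),
    (11, "pink", "Y", "ANEXO8"), (12, "pink", "Y", "ANEXO8"),
    (13, "pink", "N", "ANEXO8"), (14, "gray", "N", "SAS"),
    (15, "gray", "N", "SAS"), (16, "gray", "N", "SAS"),
]

def attempt(product, ind, source):
    # The question's condition: drops every "red" row AND every IND = "Y" row.
    return source == "ANEXO8" and product != "black" and (product != "red" and ind != "Y")

def negated(product, ind, source):
    # First answer: negate the whole (red AND Y) combination.
    return source == "ANEXO8" and product != "black" and not (product == "red" and ind == "Y")

def de_morgan(product, ind, source):
    # Second answer: the same test rewritten with De Morgan's law.
    return source == "ANEXO8" and product != "black" and (product != "red" or ind != "Y")

def kept(predicate):
    return [cod for cod, product, ind, source in ROWS if predicate(product, ind, source)]

print(kept(attempt))    # [8, 9, 10, 13]
print(kept(negated))    # [6, 7, 8, 9, 10, 11, 12, 13]
print(kept(de_morgan))  # [6, 7, 8, 9, 10, 11, 12, 13]
```

The two answers select the same rows, which is what De Morgan's law predicts: not (a and b) is the same as (not a) or (not b).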

Related

Multilevel Indexing with Groupby

Being new to Python, I'm struggling to apply other questions about the groupby function to my data. A sample of the data frame:
ID Condition Race Gender Income
1 1 White Male 1
2 2 Black Female 2
3 3 Black Male 5
4 4 White Female 3
...
I am trying to use the groupby function to gain a count of how many black/whites, male/females, and income (12 levels) there are in each of the four conditions. Each of the columns, including income, are strings (i.e., categorical).
I'd like to get something such as
Condition Race Gender Income Count
1 White Male 1 19
1 White Female 1 17
1 Black Male 1 22
1 Black Female 1 24
1 White Male 2 12
1 White Female 2 15
1 Black Male 2 17
1 Black Female 2 19
...
Everything I've tried has come back very wrong, so I don't think I'm anywhere near right, but I've been using variations of
Data.groupby(['Condition','Gender','Race','Income'])['ID'].count()
When I run the above line I just get a 2 column matrix with an indecipherable index (e.g., f2df9ecc...) and the second column is labeled ID with what appear to be count numbers. Any help is appreciated.
If you inspect the resulting object, you will see that the grouping columns are inside the index, so just reset the index:
df = Data.groupby(['Condition','Gender','Race','Income'])['ID'].count().reset_index()
That was mainly to demonstrate the point. Since you know the output you want, you can instead pass the argument as_index=False, as follows:
df = Data.groupby(['Condition','Gender','Race','Income'],as_index=False)['ID'].count()
Also, since you want the last column to be named 'count':
df = df.rename(columns={'ID':'count'})

Excel VBA - Group Data by Column A, Get the Range Value from C - Copy results to New Sheet

I've been trying to search for an example of this grouping and tested few code snippets but haven't been able to adapt it to what I need as I'm just getting to know Excel vba.
What I'm trying to do is to group by column A then get the range of the values used in that category which are in column C and get the results in a new worksheet.
Main Sheet.
A B C D
3 Baseball 4 Blue
2 Football 1 Red
2 Football 3 Red
3 Baseball 4 Blue
1 Soccer 2 Green
3 Baseball 4 Blue
1 Soccer 3 Green
1 Soccer 5 Green
2 Football 2 Red
Expected Results:
New Sheet.
A B C D
1 Soccer 2-5 Green
2 Football 1-3 Red
3 Baseball 4 Blue
If you need column C to be a range of values, e.g. 2-5, then it is text in Excel. A pivot table can only return aggregates such as Min, Max, Sum, or Average, not a range of values.
You will need to use VBA to solve the problem.
First, copy columns A, B, and D somewhere, then use Remove Duplicates to find the unique combinations.
E.g. (assuming you have some new records in the future):
A B C D
3 Baseball 4 Blue
2 Football 1 Red
2 Football 3 Red
3 Baseball 4 Blue
1 Soccer 2 Green
3 Baseball 4 Blue
1 Soccer 3 Green
1 Soccer 5 Green
2 Football 2 Red
4 Tennis 3 Yellow
Then you should have something like below:
A B D
1 Soccer Green
2 Football Red
3 Baseball Blue
4 Tennis Yellow
Then use a loop to find the range of values for each unique combination (here we have 4 unique records).
*** This assumes that you know how to use a loop to find the range for each combination.
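The grouping logic itself is language independent. As a rough illustration only, here it is in Python rather than VBA, using the main-sheet data from the question; the variable names are made up, and a real solution would read from and write to worksheets instead of lists.

```python
from collections import defaultdict

# Rows of the main sheet as (A, B, C, D).
rows = [
    (3, "Baseball", 4, "Blue"),
    (2, "Football", 1, "Red"),
    (2, "Football", 3, "Red"),
    (3, "Baseball", 4, "Blue"),
    (1, "Soccer", 2, "Green"),
    (3, "Baseball", 4, "Blue"),
    (1, "Soccer", 3, "Green"),
    (1, "Soccer", 5, "Green"),
    (2, "Football", 2, "Red"),
]

# Collect the column C values for each unique (A, B, D) combination.
values = defaultdict(list)
for a, b, c, d in rows:
    values[(a, b, d)].append(c)

# Turn each group into "min-max", or a single number when min equals max.
summary = []
for (a, b, d), cs in sorted(values.items()):
    lo, hi = min(cs), max(cs)
    c_range = str(lo) if lo == hi else f"{lo}-{hi}"
    summary.append((a, b, c_range, d))

for line in summary:
    print(*line)
# 1 Soccer 2-5 Green
# 2 Football 1-3 Red
# 3 Baseball 4 Blue
```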
I've actually figured this out:
For Each key In fCatId.Keys
    'Debug.Print fCatId(key), key
    With wshcore
        llastrow = .Range("A" & .Rows.Count).End(xlUp).Row
        .Range("A1:N" & llastrow).AutoFilter
        ' Filter column A down to the current category
        .Range("A1:N" & llastrow).AutoFilter Field:=1, Criteria1:=fCatId(key)
        ' SUBTOTAL skips filtered-out rows: 5 = MIN, 4 = MAX
        lwmin = WorksheetFunction.Subtotal(5, .Range("H:H"))
        lwmax = WorksheetFunction.Subtotal(4, .Range("H:H"))
    End With
Next key
I'm getting column A: fCatId, column B: key, lwmin: the lowest value, and lwmax: the highest.

Excel VBA function to solve impossible round-robin tournament roster with venue constraint

I am really having difficulty generating a round-robin tournament roster with the following conditions:
10 Teams (Teams 1 - 10)
5 Fields (Field A - E)
9 Rounds (Round 1 - 9)
Each team must play every other team exactly once.
Only two teams can play on a field at any one time. (i.e. all 5 fields always in use)
No team is allowed to play on any particular field more than twice. <- This is the problem!
I have been trying on and off for many years to solve this problem on paper without success. So once and for all, I would like to generate a function in Excel VBA to test every combination to prove it is impossible.
I started creating a very messy piece of code that generates an array using nested if/while loops, but I can already see it's just not going to work.
Is there anyone out there with a juicy piece of code that can solve this?
Edit: Thanks to Brian Camire's method below, I've been able to include further desirable constraints and still get a solution:
No team plays the same field twice in a row
A team should play on all the fields once before repeating
The solution is below. I should have asked years ago! Thanks again Brian - you are a genius!
Round 1 2 3 4 5 6 7 8 9
Field A 5v10 1v9 2v4 6v8 3v7 4v10 3v9 7v8 1v2
Field B 1v7 8v10 3v6 2v9 4v5 6v7 1v8 9v10 3v5
Field C 2v6 3v4 1v10 5v7 8v9 1v3 2v5 4v6 7v10
Field D 4v9 2v7 5v8 3v10 1v6 2v8 4v7 1v5 6v9
Field E 3v8 5v6 7v9 1v4 2v10 5v9 6v10 2v3 4v8
I think I've found at least one solution to the problem:
Round Field Team 1 Team 2
1 A 3 10
1 B 7 8
1 C 1 9
1 D 2 4
1 E 5 6
2 A 8 10
2 B 1 5
2 C 2 6
2 D 3 7
2 E 4 9
3 A 1 4
3 B 2 3
3 C 8 9
3 D 5 7
3 E 6 10
4 A 6 7
4 B 4 10
4 C 2 8
4 D 5 9
4 E 1 3
5 A 2 9
5 B 3 8
5 C 4 7
5 D 1 6
5 E 5 10
6 A 3 9
6 B 4 5
6 C 7 10
6 D 6 8
6 E 1 2
7 A 5 8
7 B 6 9
7 C 1 10
7 D 3 4
7 E 2 7
8 A 4 6
8 B 2 10
8 C 3 5
8 D 1 8
8 E 7 9
9 A 2 5
9 B 1 7
9 C 3 6
9 D 9 10
9 E 4 8
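The schedule above can be checked against the three constraints from the question with a short script. This is an independent sanity check written in Python, not part of the OpenSolver workflow; the SCHEDULE strings are copied from the table (one line per round, matches in field order A to E), and the function names are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# One string per round; matches are listed in field order A, B, C, D, E.
SCHEDULE = [
    "3v10 7v8 1v9 2v4 5v6",
    "8v10 1v5 2v6 3v7 4v9",
    "1v4 2v3 8v9 5v7 6v10",
    "6v7 4v10 2v8 5v9 1v3",
    "2v9 3v8 4v7 1v6 5v10",
    "3v9 4v5 7v10 6v8 1v2",
    "5v8 6v9 1v10 3v4 2v7",
    "4v6 2v10 3v5 1v8 7v9",
    "2v5 1v7 3v6 9v10 4v8",
]

def parse(schedule):
    """Return a list of (round, field, team1, team2) tuples."""
    matches = []
    for rnd, line in enumerate(schedule, start=1):
        for field, game in zip("ABCDE", line.split()):
            t1, t2 = (int(t) for t in game.split("v"))
            matches.append((rnd, field, t1, t2))
    return matches

def check(matches, teams=range(1, 11), max_per_field=2):
    """Evaluate each constraint and return a dict of name -> bool."""
    pair_counts = Counter(frozenset((t1, t2)) for _, _, t1, t2 in matches)
    round_counts = Counter()
    field_counts = Counter()
    for rnd, field, t1, t2 in matches:
        for team in (t1, t2):
            round_counts[(rnd, team)] += 1
            field_counts[(field, team)] += 1
    rounds = {rnd for rnd, _, _, _ in matches}
    return {
        "every pair meets exactly once": all(
            pair_counts[frozenset(p)] == 1 for p in combinations(teams, 2)
        ),
        "each team plays once per round": all(
            round_counts[(r, t)] == 1 for r in rounds for t in teams
        ),
        "no team on a field more than twice": max(field_counts.values()) <= max_per_field,
    }

matches = parse(SCHEDULE)
results = check(matches)
for name, ok in results.items():
    print(f"{name}: {ok}")
```

All three checks print True for this schedule.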
I found it using the OpenSolver add-in for Excel (as the problem was too large for the built-in Solver feature). The steps were something like this:
1. Set up a table with 2025 rows representing the possible matches -- that is, possible combinations of round, field, and pair of teams (with columns like the table above), plus one extra column that will be a binary (0 or 1) decision variable indicating if the match is to be selected.
2. Set up formulas to use the decision variables to calculate: a) the number of matches at each field in each round, b) the number of matches between each pair of teams, c) the number of matches played by each team in each round, and d) the number of matches played by each team at each field.
3. Set up a formula to use the decision variables to calculate the total number of matches.
4. Use OpenSolver to solve a model whose objective is to maximize the result of the formula from Step 3 by changing the decision variables from Step 1, subject to the constraints that the decision variables must be binary, the results of the formulas from Steps 2.a) through c) must equal 1, and the results of the formulas from Step 2.d) must be less than or equal to 2.
The details are as follows...
For Step 1, I set up my table so that columns A, B, C, and D represented the Round, Field, Team 1, and Team 2, respectively, and column E represented the decision variable. Row 1 contained the column headings, and rows 2 through 2026 each represented one possible match.
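The figure of 2025 rows follows from 9 rounds x 5 fields x 45 unordered pairs of teams. A quick way to generate that list of candidate matches outside Excel is sketched below in Python; the variable names are illustrative, and the row order is an assumption, since the order used in the original workbook is not stated.

```python
from itertools import combinations, product

rounds = range(1, 10)   # Rounds 1-9
fields = "ABCDE"        # Fields A-E
teams = range(1, 11)    # Teams 1-10

# Every unordered pair of distinct teams, with Team 1 < Team 2.
pairs = list(combinations(teams, 2))

# One candidate row per (round, field, pair): the columns A-D of Step 1.
candidates = [(r, f, t1, t2) for r, f, (t1, t2) in product(rounds, fields, pairs)]

print(len(pairs))       # 45
print(len(candidates))  # 2025
print(candidates[0])    # (1, 'A', 1, 2)
```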
For Step 2.a), I set up a vertical list of rounds 1 through 9 in cells I2 through I10, a horizontal list of fields A through E in cells J1 through N1, and a series of formulas to calculate the number of matches in each field in each round in cells J2 through N10 by starting with =SUMIFS($E$2:$E$2026,$A$2:$A$2026,$I2,$B$2:$B$2026,J$1) in cell J2 and then copying and pasting.
For Step 2.b), I set up a vertical list of teams 1 through 9 in cells I13 through I21, a horizontal list of opposing teams 2 through 10 in cells J12 through R12, and a series of formulas to calculate the number of matches between each pair of teams in the "upper right triangular half" of cells J13 through R21 (including the diagonal) by starting with =SUMIFS($E$2:$E$2026,$C$2:$C$2026,$I13,$D$2:$D$2026,J$12) in cell J13 and then copying and pasting.
For Step 2.c), I set up a vertical list of teams 1 through 10 in cells I24 through I33, a horizontal list of rounds 1 through 9 in cells J23 through R23, and a series of formulas to calculate the number of matches played by each team in each round in cells J24 through R33 by starting with =SUMIFS($E$2:$E$2026,$C$2:$C$2026,$I24,$A$2:$A$2026,J$23)+SUMIFS($E$2:$E$2026,$D$2:$D$2026,$I24,$A$2:$A$2026,J$23) in cell J24 and then copying and pasting.
For Step 2.d), I set up a vertical list of teams 1 through 10 in cells I36 through I45, a horizontal list of fields A through E in cells J35 through N35, and a series of formulas to calculate the number of matches played by each team at each field in cells J36 through N45 by starting with =SUMIFS($E$2:$E$2026,$C$2:$C$2026,$I36,$B$2:$B$2026,J$35)+SUMIFS($E$2:$E$2026,$D$2:$D$2026,$I36,$B$2:$B$2026,J$35) in cell J36 and then copying and pasting.
For Step 3, I set up a formula to calculate the total number of matches in cell G2 as =SUM($E$2:$E$2026).
For Step 4, in the OpenSolver Model dialog (available from Data, OpenSolver, Model) I set the Objective Cell to $G$2, the Variable Cells to $E$2:$E$2026, and added constraints as described above and detailed below (sorry that the constraints are not listed in the order that I described them):
Note that, for the constraints described in Step 2.b), I needed to add the constraints separately for each row, since OpenSolver raised an error message if the constraints included the blank cells in the "lower left triangular half".
After setting up the model, OpenSolver highlighted the objective, variable, and constraint cells as shown below:
I then solved the problem using OpenSolver (via Data, OpenSolver, Solve). The selected matches are the ones with a 1 in column E. You might get a different solution than I did, as there might be many feasible ones.
come on ... that's an easy one for manual solution ;-)
T1 T2 VE
1 2 A
1 3 A
1 4 B
1 5 B
1 6 C
1 7 C
1 8 D
1 9 D
1 10 E
2 3 A
2 4 B
2 5 B
2 6 C
2 7 C
2 8 D
2 9 D
2 10 E
3 4 C
3 5 C
3 6 D
3 7 D
3 8 E
3 9 E
3 10 B
4 5 C
4 6 D
4 7 D
4 8 E
4 9 E
4 10 A
5 6 E
5 7 E
5 8 A
5 9 A
5 10 D
6 7 E
6 8 A
6 9 A
6 10 B
7 8 B
7 9 B
7 10 A
8 9 B
8 10 C
9 10 C
As far as I have checked, no team plays more than twice at the same venue. Please double check.
Dividing it into rounds should be an easy one.
Edit: this time with only 5 venues :-)
Edit 2: now also with allocated rounds :-)
Edit 3: deleted the round allocation again because it was wrong.

Qlikview - Scatter chart dot colors dimension setup not working

I have some data that I want to display in scatter chart. I have the following two dimensions:
Dimension1: This is each record in the table - say unique id for each row. So the number of dots should be equal to number of records.
Dimension2: This is a combination of 2 columns, tp and vc. The color of each dot is based on these 2 columns.
tp vc
1 a 1
2 b 2
3 c 1
So there will be dots of 3 colors based on the above tp and vc combinations. Then there are 3 expressions representing X and Y and Size of dot. I am not sure how to configure the dimensions to achieve the goal.
Thanks
You will need a calculated dimension, which is the concatenation expression defined as =tp & vc in your case.
This will be your single dimension. Your x, y, and size expressions then make up the remaining requirements for this chart.
This will give you three colors, one for each unique combination, and they will be labeled a1, b2, and c1.
id | tp | vc | x | y | size
1 | a | 1 | 3 | 5 | 7
2 | b | 2 | 1 | 2 | 10
3 | c | 1 | 9 | 5 | 5

count the instances of value in sub-query, update table

I am trying to count the number of times a value (mytype) appears within a distinct id value, and update my table with this count (idsubtotal) for each row. The table I have:
id | mytype | idsubtotal
-----+--------+-----------
44 red
101 red
101 red
101 blue
101 yellow
494 red
494 blue
494 blue
494 yellow
494 yellow
I need to calculate/update the idsubtotal column, so it is like:
id | mytype | idsubtotal
-----+--------+-----------
44 red 1
101 red 2
101 red 2
101 blue 1
101 yellow 1
494 red 1
494 blue 2
494 blue 2
494 yellow 2
494 yellow 2
When I try this below, it is counting how many times the mytype value appears in the entire table, but I need to know how many times it appears within that sub-group of id values (e.g. How many times does "red" appear within id 101 rows, answer = 2).
SELECT id, mytype,
COUNT(*) OVER (PARTITION BY mytype) idsubtotal
FROM table_name
I know storing this subtotal in the table itself (versus calculating it live when needed) constitutes a bad data model for the table, but I need to do it this way in my case.
Also, my question is similar to this question but slightly different, and nothing I've tried to tweak using my very primitive understanding of SQL from the previous responses or other posts have worked. TIA for any ideas.
UPDATE table_name a
SET idsubtotal=( SELECT COUNT(1)
FROM table_name b
WHERE a.id=b.id
AND a.mytype=b.mytype
)
To count how many times each mytype appears within each id, rather than in the entire table, group by both columns:
SELECT id, mytype, COUNT(*)
FROM table_name
GROUP BY id, mytype
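The correlated-subquery UPDATE can be tried end to end with the sample data. The sketch below runs it from Python against an in-memory SQLite database, chosen only because it ships with Python; the question does not say which database is in use, so the exact syntax there may differ (here the outer table is referenced by name rather than through an alias).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER, mytype TEXT, idsubtotal INTEGER)")

# Sample rows from the question; idsubtotal starts out empty (NULL).
data = [
    (44, "red"),
    (101, "red"), (101, "red"), (101, "blue"), (101, "yellow"),
    (494, "red"), (494, "blue"), (494, "blue"), (494, "yellow"), (494, "yellow"),
]
conn.executemany("INSERT INTO table_name (id, mytype) VALUES (?, ?)", data)

# For each row, count the rows that share both its id and its mytype.
conn.execute("""
    UPDATE table_name
    SET idsubtotal = (
        SELECT COUNT(*)
        FROM table_name AS b
        WHERE b.id = table_name.id
          AND b.mytype = table_name.mytype
    )
""")

result = conn.execute(
    "SELECT id, mytype, idsubtotal FROM table_name ORDER BY rowid"
).fetchall()
for row in result:
    print(*row)
```

The printed idsubtotal values match the desired table in the question: 1, 2, 2, 1, 1, 1, 2, 2, 2, 2.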