SPSS Chi-square test on raw data - testing

I want to run a chi square test on my raw data but SPSS is not reading the raw data as I want it to.
The raw data summarizes several test conditions, locations, groups of participants and results in one file. Example data:
# country gender Question 1 Question 2
1 chile f yes no
2 chile f no no
3 chile m no yes
4 chile m yes no
5 chile f yes no
6 chile f yes yes
7 chile m no yes
8 france f yes no
9 france f no no
10 france f no no
11 france m yes yes
12 france m yes no
13 france f yes no
14 france f no yes
15 china m yes yes
16 china f no yes
17 china f no yes
18 china f yes yes
19 china m yes yes
20 china m yes no
.....
I want to run a chi-square test comparing French men and Chilean women on their responses to questions 1 through 14.
Problem 1: A filter will not make SPSS ignore the remaining rows such as Chinese participants.
Problem 2: Furthermore if I just try and compare male and female participants SPSS produces the following error message after calculating:
No statistics are computed because (Question) is a constant.
I cannot accept that I am forced to summarize the data manually into a frequency table, so how do I make SPSS automatically do this?
EDIT: Allow me to rephrase. (Protip: don't ask online when you are frustrated by the problem)
Problem 1 again: I could just move the rows I need to a different file, but if I need to do so many times it becomes increasingly cumbersome. Is there a way to select a combination of filters (for example) for the two columns in the contingency table?

I was using filters ("Select Cases...") wrong!
Solution: First I create a filter for one of the two groups I want to compare, using the menu Data > Select Cases... (pseudocode): IF French AND Male.
SPSS stores this filter as a new column in the Data View (I label it 'FilterA' in the Variable View), with the value 1 for French men and 0 for everyone else.
The women from Chile are now 0, which is what I want for the comparison (this is important), but so are all the Chinese participants, the French women and so on.
So I define a second filter: IF (Chile AND Female) OR (French AND Male)
Now all the other rows are filtered out, as shown by a diagonal line through the row number on the far left in SPSS. When a statistical test is run, those rows are not included in the test.
Now I can finally perform the chi-square test: I choose Analyze > Descriptive Statistics > Crosstabs.
For the column I choose "FilterA", which I created. For the row I choose QuestionX or whatever data I want tested.
Fin.
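For readers who want to check the logic outside SPSS, here is a minimal Python sketch of the same two steps: keep only French men and Chilean women, then cross-tabulate their Question 1 answers and compute the Pearson chi-square statistic by hand. It uses only the 20 example rows shown in the question; the names group_of, table and chi_square are made up for this illustration and are not SPSS terms.

```python
from collections import Counter

# The 20 example rows from the question: (country, gender, answer to Question 1).
rows = [
    ("chile", "f", "yes"), ("chile", "f", "no"), ("chile", "m", "no"),
    ("chile", "m", "yes"), ("chile", "f", "yes"), ("chile", "f", "yes"),
    ("chile", "m", "no"), ("france", "f", "yes"), ("france", "f", "no"),
    ("france", "f", "no"), ("france", "m", "yes"), ("france", "m", "yes"),
    ("france", "f", "yes"), ("france", "f", "no"), ("china", "m", "yes"),
    ("china", "f", "no"), ("china", "f", "no"), ("china", "f", "yes"),
    ("china", "m", "yes"), ("china", "m", "yes"),
]


def group_of(country, gender):
    """Return the comparison group of a row, or None if the row is filtered out."""
    if country == "france" and gender == "m":
        return "french_men"      # FilterA = 1
    if country == "chile" and gender == "f":
        return "chilean_women"   # FilterA = 0, but kept by the second filter
    return None                  # everyone else is excluded


# Contingency table: counts of (group, answer) over the selected rows only.
table = Counter()
for country, gender, answer in rows:
    group = group_of(country, gender)
    if group is not None:
        table[(group, answer)] += 1

groups = ["french_men", "chilean_women"]
answers = ["yes", "no"]
total = sum(table.values())
row_totals = {g: sum(table[(g, a)] for a in answers) for g in groups}
col_totals = {a: sum(table[(g, a)] for g in groups) for a in answers}

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells.
chi_square = 0.0
for g in groups:
    for a in answers:
        expected = row_totals[g] * col_totals[a] / total
        chi_square += (table[(g, a)] - expected) ** 2 / expected

print(dict(table))
print(round(chi_square, 4))
```

With only six selected rows the expected counts are far too small for a meaningful test, so this only illustrates how the filter and the contingency table fit together.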

Related

SQL where NOT EQUAL TO isn't working as intended - SQL Server 2016

I have the following query that is working differently when using != (or <>) compared to =.
The example I'm trying this with works as follows.
SELECT COUNT(*) FROM table1
returns 100.
SELECT COUNT(*)
FROM table1
WHERE year = '2017/2018' AND SOURCE = 'CSV'
returns 30.
Why then would the below not return a count of 70?
SELECT COUNT(*)
FROM table1
WHERE year != '2017/2018' AND SOURCE != 'CSV'
It appears to treat it like an OR instead of AND when updating to != and I'm struggling to figure out why. I have got around the issue by changing my query to use NOT EXISTS with a subselect, but the initial question remains.
Because there are rows where year is 2017/2018 but source is not CSV, or source is CSV but year is not 2017/2018. Those rows fail both the = query and the != query, so the two counts do not add up to 100. See De Morgan's laws.
Perhaps what you want is:
WHERE NOT (year = '2017/2018' AND SOURCE = 'CSV')
Personally I'm not normally a big fan of the NOT () wrapper, but in this case it best describes the semantics you're after. You could also say:
WHERE year <> '2017/2018' OR SOURCE <> 'CSV'
-- important difference: OR, not AND
Example db<>fiddle
Think about it another way. The parking lot has 10 cars, and we want to find out how many cars are red BMWs (let's call it x), and how many cars are NOT red BMWs (which should be 10 - x):
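Color       | Make   | Red? | BMW? | Red AND BMW? | Not Red? | Not BMW? | Not Red AND Not BMW? | Not Red OR Not BMW? |
------------|--------|------|------|--------------|----------|----------|----------------------|---------------------|
Red         | BMW    | Yes  | Yes  | Yes          | No       | No       | No                   | No                  |
Red         | BMW    | Yes  | Yes  | Yes          | No       | No       | No                   | No                  |
Red         | BMW    | Yes  | Yes  | Yes          | No       | No       | No                   | No                  |
Silver      | BMW    | No   | Yes  | No           | Yes      | No       | No                   | Yes                 |
Red         | Honda  | Yes  | No   | No           | No       | Yes      | No                   | Yes                 |
Purple      | Toyota | No   | No   | No           | Yes      | Yes      | Yes                  | Yes                 |
Green       | Lexus  | No   | No   | No           | Yes      | Yes      | Yes                  | Yes                 |
Green       | Toyota | No   | No   | No           | Yes      | Yes      | Yes                  | Yes                 |
Black       | Ford   | No   | No   | No           | Yes      | Yes      | Yes                  | Yes                 |
Orange      | Audi   | No   | No   | No           | Yes      | Yes      | Yes                  | Yes                 |
Sum of Yes: |        | 4    | 4    | 3            | 6        | 6        | 5                    | 7                   |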
If you count up all the cars that are both red and BMW, you get 3. If you want the inverse of that (the 7 remaining cars), you can't count up all the cars that are both NOT red and NOT BMW. This would skip the silver BMW and the red Honda, giving 5, not 7. You need to count all the cars that are either not red OR not BMW. Or that are NOT both red AND BMW.
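The parking-lot counts can be reproduced with a short script. This is only a sketch: it loads the ten cars into an in-memory SQLite database through Python's sqlite3 module (not SQL Server 2016, which the question uses), and the table name cars and the helper count are invented for the illustration.

```python
import sqlite3

# The ten cars from the parking-lot example.
cars = [
    ("Red", "BMW"), ("Red", "BMW"), ("Red", "BMW"),
    ("Silver", "BMW"), ("Red", "Honda"), ("Purple", "Toyota"),
    ("Green", "Lexus"), ("Green", "Toyota"), ("Black", "Ford"),
    ("Orange", "Audi"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (color TEXT, make TEXT)")
conn.executemany("INSERT INTO cars VALUES (?, ?)", cars)


def count(where):
    """Count the cars that satisfy the given WHERE clause."""
    return conn.execute(f"SELECT COUNT(*) FROM cars WHERE {where}").fetchone()[0]


red_bmw = count("color = 'Red' AND make = 'BMW'")
not_red_and_not_bmw = count("color <> 'Red' AND make <> 'BMW'")
not_red_or_not_bmw = count("color <> 'Red' OR make <> 'BMW'")
not_both = count("NOT (color = 'Red' AND make = 'BMW')")

print(red_bmw, not_red_and_not_bmw, not_red_or_not_bmw, not_both)  # 3 5 7 7
```

Only the OR form and the NOT (...) form return the 7 cars that are not red BMWs; the AND form drops the silver BMW and the red Honda.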

How to merge two Spreadsheets into one in google spreadsheets

There are two spreadsheets, namely Main and Secondary. Main has 3 sheets in it and Secondary has 1 sheet. For simplicity I'll show a modified version of both below.
Main spreadsheet
Name Qrating Brating Crating
A 8 5 4
B 5 2 1
C 6 7 9
Secondary spreadsheet includes personal details of same people present in Main spreadsheet as seen below.
Name Age Rank qualified
A 23 2 yes
B 45 22 yes
C 44 23 yes
I have looked up ways to merge and combine, but they all seem to combine data vertically. What I want is for all the data in Secondary to be added to the Main file according to the name. I can't find a solution online where data is filled in horizontally after matching a common column between two files. If anyone can give a solution or point me in a direction, that would be very helpful. Thank you.
end result should be like this when merged
Name Qrating Brating Crating
A 8 5 4
B 5 2 1
C 6 7 9
Name Age Rank qualified
A 23 2 yes
B 45 22 yes
C 44 23 yes
Name Age Rank qualified Qrating Brating Crating
A 23 2 yes 8 5 4
B 45 22 yes 5 2 1
C 44 23 yes 6 7 9
This is very easy to do with a built-in feature in Microsoft Excel. I was expecting the same in Google Sheets, but it seems add-ons are needed, as I didn't find any tool in Sheets itself. There are many add-ons to achieve this; I am using the Merge Sheets add-on.
You can merge multiple sheets from different spreadsheets like this
=query({
IMPORTRANGE("ID_1"; "SheetX!A2:E");
IMPORTRANGE("ID_2"; "SheetX!A2:E");
IMPORTRANGE("ID_3"; "SheetX!A2:E")
};"SELECT * WHERE Col1 IS NOT NULL")
and then adapt the query as you need. I'll wait for an example from your side.
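Note that the QUERY formula above stacks ranges vertically, whereas the question asks for a horizontal merge keyed on Name. As a language-neutral illustration of that keyed merge, here is a small Python sketch using the sample rows from the question. The variable names are invented for the example; inside Google Sheets the analogous approach would presumably be a lookup function such as VLOOKUP applied to an imported range, which is not shown here.

```python
# Sample rows from the Main spreadsheet.
main = [
    {"Name": "A", "Qrating": 8, "Brating": 5, "Crating": 4},
    {"Name": "B", "Qrating": 5, "Brating": 2, "Crating": 1},
    {"Name": "C", "Qrating": 6, "Brating": 7, "Crating": 9},
]

# Sample rows from the Secondary spreadsheet.
secondary = [
    {"Name": "A", "Age": 23, "Rank": 2, "qualified": "yes"},
    {"Name": "B", "Age": 45, "Rank": 22, "qualified": "yes"},
    {"Name": "C", "Age": 44, "Rank": 23, "qualified": "yes"},
]

# Index the ratings by the shared key so each lookup is a dictionary access.
ratings_by_name = {row["Name"]: row for row in main}

merged = []
for person in secondary:
    # People without a match in Main keep only their Secondary columns.
    ratings = ratings_by_name.get(person["Name"], {})
    extra = {key: value for key, value in ratings.items() if key != "Name"}
    merged.append({**person, **extra})

for row in merged:
    print(row)
```

Each merged row carries the columns in the order shown in the desired result: Name, Age, Rank, qualified, Qrating, Brating, Crating.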

Multilevel Indexing with Groupby

Being new to python I'm struggling to apply other questions about the groupby function to my data. A sample of the data frame :
ID Condition Race Gender Income
1 1 White Male 1
2 2 Black Female 2
3 3 Black Male 5
4 4 White Female 3
...
I am trying to use the groupby function to gain a count of how many black/whites, male/females, and income (12 levels) there are in each of the four conditions. Each of the columns, including income, are strings (i.e., categorical).
I'd like to get something such as
Condition Race Gender Income Count
1 White Male 1 19
1 White Female 1 17
1 Black Male 1 22
1 Black Female 1 24
1 White Male 2 12
1 White Female 2 15
1 Black Male 2 17
1 Black Female 2 19
...
Everything I've tried has come back very wrong, so I don't think I'm anywhere near right, but I've been using variations of
Data.groupby(['Condition','Gender','Race','Income'])['ID'].count()
When I run the above line I just get a 2 column matrix with an indecipherable index (e.g., f2df9ecc...) and the second column is labeled ID with what appear to be count numbers. Any help is appreciated.
If you inspect the result, you will see that the grouping columns end up inside the index, so just reset the index:
df = Data.groupby(['Condition','Gender','Race','Income'])['ID'].count().reset_index()
That was mainly to demonstrate; since you know what you want, you can specify the 'as_index' argument as follows:
df = Data.groupby(['Condition','Gender','Race','Income'],as_index=False)['ID'].count()
Also, since you want the last column to be called 'count':
df = df.rename(columns={'ID':'count'})
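As a possible alternative to counting the ID column and renaming it, size() counts the rows in each group directly, and reset_index(name=...) names the resulting column in one step. The sketch below runs on a tiny made-up frame (the IDs and values are invented, not taken from the question) with every column stored as a string, as described.

```python
import pandas as pd

# Tiny made-up sample; every column is a string, as in the question.
Data = pd.DataFrame({
    "ID": ["a1", "b2", "c3", "d4", "e5"],
    "Condition": ["1", "1", "1", "2", "2"],
    "Race": ["White", "White", "Black", "Black", "Black"],
    "Gender": ["Male", "Male", "Female", "Male", "Male"],
    "Income": ["1", "1", "2", "5", "5"],
})

# size() counts rows per group; reset_index(name=...) turns the grouped
# index back into ordinary columns and names the count column.
counts = (
    Data.groupby(["Condition", "Race", "Gender", "Income"])
    .size()
    .reset_index(name="Count")
)

print(counts)
```

Only combinations that actually occur in the data appear in the result; combinations with no rows are not listed with a zero.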

Grid of counts by row and column in SQL

I'm really struggling to create a simple grid which I can provide to non-technical team members so they can easily look up the number of people in our SQL customer database based on any two demographic characteristics.
For example, we frequently get questions like "how many people in the database are Male aged 36-45" or "how many people aged 18-25 on our database are emailable". With a lookup grid they will be able to answer these for themselves.
The actual data has lots of variables so the real grid is likely to be large but a simplified example of the source data table has 1 row per customer (~3m rows) with the following columns:
party_id -- Customer ID #
, gender -- 'Male'/'Female'/'Unknown gender'
, age -- '18-25'/'26-35'/'36-45'/'46-55'/'56-65'/'66-80'/'Unknown age'
, emailability -- 'Emailable'/'Not Emailable'
The results grid will have the same rows and columns with a count of customers in each cell (i.e. the count of people who satisfy both the row and the column criteria). In this example the rows and columns would be:
Male
Female
Unknown gender
18-25
26-35
36-45
46-55
56-65
66-80
Unknown age
Emailable
Not emailable
To look up the number of people who satisfy any two criteria, you would just need to find the intersection on the grid (in the first question above, this is the intersection of row:"Male" and column:"18-25" or row:"18-25" and column: "Male").
This didn't sound like it should be a difficult problem but I'm completely stumped. I thought it would be solved by Pivots but I couldn't figure out a way to cross-tabulate more than 2 variables. SQL is likely not the right tool for this job either but there's not many other tools available to me right now so if possible I'd like to find a SQL solution but let me know if you know of better options.
I was surprised not to find an existing solution here but I might not be using the right search terms so apologies if this has already been answered.
Thanks!
--EDIT--
As requested here's some sample data:
party_id |gender |age |Emailability |
---------|-------|--------|--------------|
1 |Male |18-25 |Not Emailable |
2 |Female |Unknown |Emailable |
3 |Unknown|36-45 |Emailable |
4 |Male |36-45 |Not Emailable |
5 |Male |56-65 |Emailable |
6 |Female |26-35 |Emailable |
7 |Male |18-25 |Emailable |
8 |Unknown|18-25 |Not Emailable |
9 |Male |66-80 |Emailable |
10 |Female |26-35 |Emailable |
Based on this example data the first few rows of the grid filled in would look like this (I've had to drop the emailability columns because of the page width but they would be included too):
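               | Male | Female | Unknown gender | 18-25 | 26-35 | 36-45 | 46-55 | 56-65 | 66-80 |
---------------|------|--------|----------------|-------|-------|-------|-------|-------|-------|
Male           | 5    | 0      | 0              | 2     | 0     | 1     | 0     | 1     | 1     |
Female         | 0    | 3      | 0              | 0     | 2     | 0     | 0     | 0     | 0     |
Unknown gender | 0    | 0      | 2              | 1     | 0     | 1     | 0     | 0     | 0     |
18-25          |      |        |                |       |       |       |       |       |       |
26-35          |      |        |                |       |       |       |       |       |       |
36-45          |      |        |                |       |       |       |       |       |       |
46-55          |      |        |                |       |       |       |       |       |       |
56-65          |      |        |                |       |       |       |       |       |       |
66-80          |      |        |                |       |       |       |       |       |       |
Unknown age    |      |        |                |       |       |       |       |       |       |
Emailable      |      |        |                |       |       |       |       |       |       |
Not emailable  |      |        |                |       |       |       |       |       |       |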
To clarify as I think it caused some confusion, I'm not trying to create a view for other database users - I'm hoping to create an output that I can export to excel for non-technical people to use as a handout/cheat sheet. It may be that this grid can't be set up as planned using SQL alone but due to the size of the dataset and limited numbers of tools I am hoping that it could be done in SQL. Thanks again for any suggestions.
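To make the target output concrete, here is a rough Python sketch that builds the full grid from the ten sample rows. It assumes the 'Unknown' values in the sample stand for the 'Unknown gender' and 'Unknown age' labels from the column description, and the names customers, labels and grid are invented for the illustration. Every customer contributes one count to each pair of their own three labels, so a diagonal cell holds the plain count for that label.

```python
from itertools import product

# Sample rows from the question: (party_id, gender, age, emailability).
# Assumption: 'Unknown' in the sample maps to the distinct labels
# 'Unknown gender' and 'Unknown age' from the column description.
customers = [
    (1, "Male", "18-25", "Not Emailable"),
    (2, "Female", "Unknown age", "Emailable"),
    (3, "Unknown gender", "36-45", "Emailable"),
    (4, "Male", "36-45", "Not Emailable"),
    (5, "Male", "56-65", "Emailable"),
    (6, "Female", "26-35", "Emailable"),
    (7, "Male", "18-25", "Emailable"),
    (8, "Unknown gender", "18-25", "Not Emailable"),
    (9, "Male", "66-80", "Emailable"),
    (10, "Female", "26-35", "Emailable"),
]

# The same labels serve as both the rows and the columns of the grid.
labels = [
    "Male", "Female", "Unknown gender",
    "18-25", "26-35", "36-45", "46-55", "56-65", "66-80", "Unknown age",
    "Emailable", "Not Emailable",
]

grid = {row: {col: 0 for col in labels} for row in labels}
for _party_id, gender, age, emailability in customers:
    traits = (gender, age, emailability)
    # Count this customer once for every (row, column) pair of their traits.
    for row, col in product(traits, repeat=2):
        grid[row][col] += 1

for row in labels:
    print(row.ljust(15), [grid[row][col] for col in labels])
```

In SQL the same idea would probably amount to unpivoting the table into (party_id, label) rows and joining that result to itself on party_id before grouping by the two labels, though that variant is not shown or tested here.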

pl sql. 0 rows merged

I am new to merge so this question may appear too dumb.
I'm running
MERGE INTO regions r
USING regions_temp t
ON (r.id_region = t.id_region)
WHEN NOT MATCHED THEN INSERT (r.id_region, r.region_name)
VALUES (t.id_region, t.region_name);
Before I run the command the tables have following records (id_region, region_name):
Regions:
1 1 Kiev
2 2 Asia
3 3 Australia
4 4 South America
5 5 Africa
Regions_temp:
1 1
2 2 Asia
3 3 Australia
4 4 South America
The command runs successfully but on the bottom of the window it says 0 rows merged and no records are merged.
Little help?
MERGE INTO regions r
USING regions_temp t
ON (r.id_region = t.id_region)
WHEN MATCHED THEN UPDATE SET r.region_name = t.region_name;
is what I actually needed.
According to the merge_insert_clause documentation:
The merge_insert_clause specifies values to insert into the column of the target table if the condition of the ON clause is false.
Since every id_region in Regions_temp already matches a record in Regions, the ON condition is never false, so there are no rows to insert and 0 rows are merged.
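The matching logic can be traced by hand with a small Python sketch that models both tables as dictionaries keyed by id_region. This only simulates which rows each clause would touch and is not Oracle code; it also assumes that the blank region_name for id 1 in Regions_temp is NULL, modelled here as None.

```python
# Target and source tables as {id_region: region_name}.
regions = {1: "Kiev", 2: "Asia", 3: "Australia", 4: "South America", 5: "Africa"}
# Assumption: the blank name for id 1 in Regions_temp is NULL (None here).
regions_temp = {1: None, 2: "Asia", 3: "Australia", 4: "South America"}

# WHEN NOT MATCHED THEN INSERT: source ids that have no match in the target.
not_matched = [rid for rid in regions_temp if rid not in regions]

# WHEN MATCHED THEN UPDATE: source ids that do have a match in the target.
matched = [rid for rid in regions_temp if rid in regions]

# Apply the update branch to a copy of the target table.
updated = dict(regions)
for rid in matched:
    updated[rid] = regions_temp[rid]

print(len(not_matched), "rows would be inserted")
print(len(matched), "rows would be updated")
print(updated)
```

Under that assumption the insert branch finds nothing to do, while the update branch touches four rows and overwrites the name for id 1 with the empty value from Regions_temp.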