SQL works in Athena engine v1 but not v2

I have a SQL query embedded in a system that has worked successfully until now in Athena with engine version 1. However, it fails in engine version 2 and I haven't been able to work out why.
Here is a generalised version of the SQL. Among men, it counts the number of people in three groups: adults only, NY residents only, and the overlap of the two (NY adults).
In version 1 this works, but in v2 I get the error "column z.id_field cannot be resolved":
WITH BASE AS (SELECT person_id, age, state
FROM people
WHERE gender = 'male'
)
,group_a as (
SELECT distinct (person_id) as id_field
FROM BASE
WHERE age > 17
),
group_b as (
SELECT distinct (person_id) as id_field
FROM BASE
WHERE state = 'NY'
)
SELECT CASE WHEN z.id_field is null then 'group_b_only' WHEN r.id_field is null then 'group_a_only' ELSE 'Overlap' END as group
, COUNT (coalesce (z.id_field, r.id_field)) as count
FROM group_a AS z FULL OUTER JOIN group_b as r USING (id_field)
GROUP BY 1;

As a note, in any database this would be simpler as an aggregation and probably faster too:
SELECT grp, COUNT(*)
FROM (SELECT person_id,
(CASE WHEN MAX(age) > 17 AND MAX(state) = 'NY' THEN 'Both'
WHEN MAX(age) > 17 THEN 'Age Only'
ELSE 'State Only'
END) as grp
FROM people
WHERE gender = 'male' AND
(age > 17 OR state = 'NY')
GROUP BY person_id
) x
GROUP BY grp;
The above handles a person_id that is repeated in people. If each person_id appears only once, this can be simplified to:
SELECT (CASE WHEN age > 17 AND state = 'NY' THEN 'Both'
WHEN age > 17 THEN 'Age Only'
ELSE 'State Only'
END) as grp, COUNT(*)
FROM people
WHERE gender = 'male' AND
(age > 17 OR state = 'NY')
GROUP BY 1;
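As a quick check of the two aggregation queries, here is a minimal sketch that runs both through Python's built-in sqlite3 module. SQLite only stands in for the real engine, so this checks the query logic rather than Athena behavior, and the people rows are hypothetical sample data. The simplified query groups by the ordinal 1 instead of the select alias.

```python
import sqlite3

# Hypothetical sample rows: (person_id, gender, age, state).
SAMPLE = [
    (1, "male", 30, "NY"),    # adult in NY -> Both
    (2, "male", 20, "CA"),    # adult only  -> Age Only
    (3, "male", 15, "NY"),    # NY only     -> State Only
    (4, "female", 40, "NY"),  # removed by the gender filter
    (5, "male", 10, "TX"),    # in neither group
]

# Aggregates per person first, so a repeated person_id is counted once.
PER_PERSON = """
SELECT grp, COUNT(*)
FROM (SELECT person_id,
             CASE WHEN MAX(age) > 17 AND MAX(state) = 'NY' THEN 'Both'
                  WHEN MAX(age) > 17 THEN 'Age Only'
                  ELSE 'State Only'
             END AS grp
      FROM people
      WHERE gender = 'male' AND (age > 17 OR state = 'NY')
      GROUP BY person_id) x
GROUP BY grp
"""

# Only equivalent when person_id is unique in people.
SIMPLE = """
SELECT CASE WHEN age > 17 AND state = 'NY' THEN 'Both'
            WHEN age > 17 THEN 'Age Only'
            ELSE 'State Only'
       END AS grp,
       COUNT(*)
FROM people
WHERE gender = 'male' AND (age > 17 OR state = 'NY')
GROUP BY 1
"""


def group_counts(sql, rows=SAMPLE):
    """Load rows into an in-memory people table and return {group: count}."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE people (person_id INTEGER, gender TEXT, age INTEGER, state TEXT)"
        )
        conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?)", rows)
        return dict(conn.execute(sql).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    print(group_counts(PER_PERSON))
    print(group_counts(SIMPLE))
```

Adding a second row for person 1 shows why the simplification needs a unique person_id: the per-person query still reports one "Both" person, while the simplified query counts two.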


Using SQL to detect a sex change

I have a dataset like this:
id   year  sex
111  2010  M
111  2011  M
111  2012  F
112  2010  F
112  2011  F
113  2010  M
113  2015  M
In this dataset, ID 111 had a sex change (by sex change I mean a switch from M to F, or from F to M).
Using PostgreSQL, I am trying to find out:
A: How many ids stay male (and which ids)
B: How many ids stay female (and which ids)
C: How many ids go from male to female (and which ids)
D: How many ids go from female to male (and which ids)
This is what I tried:
-- problem A
SELECT COUNT(DISTINCT ID) FROM mytable WHERE ID NOT IN (SELECT ID FROM mytable WHERE SEX = 'M');
SELECT DISTINCT ID FROM mytable WHERE ID NOT IN (SELECT ID FROM mytable WHERE SEX = 'M');
-- problem B
SELECT COUNT(DISTINCT ID) FROM mytable WHERE ID NOT IN (SELECT ID FROM mytable WHERE SEX = 'F');
SELECT DISTINCT ID FROM mytable WHERE ID NOT IN (SELECT ID FROM mytable WHERE SEX = 'F');
-- all sex changes
SELECT COUNT(DISTINCT ID) FROM mytable WHERE ID IN (SELECT ID FROM mytable WHERE SEX = 'M') AND ID IN (SELECT ID FROM mytable WHERE SEX = 'F');
SELECT DISTINCT ID FROM mytable WHERE ID IN (SELECT ID FROM mytable WHERE SEX = 'M') AND ID IN (SELECT ID FROM mytable WHERE SEX = 'F');
Is this correct? Or is a window function such as LAG needed?
You can try this to precompute some per-id metrics:
SELECT *
,MAX(CASE WHEN sex = 'M' THEN 1 ELSE 0 END) OVER (PARTITION BY ID) AS has_M
,MAX(CASE WHEN sex = 'F' THEN 1 ELSE 0 END) OVER (PARTITION BY ID) AS has_F
,DENSE_RANK() OVER (PARTITION BY id ORDER BY id, year) AS initial_sex
FROM mytable;
and then answer your questions (the has_F = 0 and has_M = 0 checks keep ids that changed sex out of the two "stay" counts):
SELECT SUM(CASE WHEN initial_sex = 1 AND SEX = 'M' AND has_F = 0 THEN 1 ELSE 0 END) AS stay_m_count
,string_agg(CASE WHEN initial_sex = 1 AND SEX = 'M' AND has_F = 0 THEN CAST(id AS VARCHAR(12)) END, ', ') AS stay_m_ids
,SUM(CASE WHEN initial_sex = 1 AND SEX = 'F' AND has_M = 0 THEN 1 ELSE 0 END) AS stay_f_count
,string_agg(CASE WHEN initial_sex = 1 AND SEX = 'F' AND has_M = 0 THEN CAST(id AS VARCHAR(12)) END, ', ') AS stay_f_ids
,SUM(CASE WHEN (initial_sex = 1 AND SEX = 'F' AND has_M = 1) OR (initial_sex = 1 AND SEX = 'M' AND has_F = 1) THEN 1 ELSE 0 END) AS changed_count
,string_agg(CASE WHEN (initial_sex = 1 AND SEX = 'F' AND has_M = 1) OR (initial_sex = 1 AND SEX = 'M' AND has_F = 1) THEN CAST(id AS VARCHAR(12)) END, ', ') AS changed_ids
FROM
(
SELECT *
,MAX(CASE WHEN sex = 'M' THEN 1 ELSE 0 END) OVER (PARTITION BY ID) AS has_M
,MAX(CASE WHEN sex = 'F' THEN 1 ELSE 0 END) OVER (PARTITION BY ID) AS has_F
,DENSE_RANK() OVER (PARTITION BY id ORDER BY id, year) AS initial_sex
FROM mytable
) DS;
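A minimal sketch of this window-flag approach on the sample data from the question, run in SQLite through Python's sqlite3 module. It requires has_F = 0 / has_M = 0 in the two "stay" conditions so that an id that changed is not also counted as staying, and it swaps PostgreSQL's string_agg for SQLite's group_concat. The mytable name is an assumption, and window functions need SQLite 3.25 or later.

```python
import sqlite3

# Sample data from the question: (id, year, sex).
ROWS = [
    (111, 2010, "M"), (111, 2011, "M"), (111, 2012, "F"),
    (112, 2010, "F"), (112, 2011, "F"),
    (113, 2010, "M"), (113, 2015, "M"),
]

QUERY = """
SELECT
  SUM(CASE WHEN initial_sex = 1 AND sex = 'M' AND has_F = 0 THEN 1 ELSE 0 END),
  group_concat(CASE WHEN initial_sex = 1 AND sex = 'M' AND has_F = 0 THEN id END, ', '),
  SUM(CASE WHEN initial_sex = 1 AND sex = 'F' AND has_M = 0 THEN 1 ELSE 0 END),
  group_concat(CASE WHEN initial_sex = 1 AND sex = 'F' AND has_M = 0 THEN id END, ', '),
  SUM(CASE WHEN initial_sex = 1 AND has_M = 1 AND has_F = 1 THEN 1 ELSE 0 END),
  group_concat(CASE WHEN initial_sex = 1 AND has_M = 1 AND has_F = 1 THEN id END, ', ')
FROM (
  SELECT *,
         -- 1 if the id has at least one row with that sex, else 0
         MAX(CASE WHEN sex = 'M' THEN 1 ELSE 0 END) OVER (PARTITION BY id) AS has_M,
         MAX(CASE WHEN sex = 'F' THEN 1 ELSE 0 END) OVER (PARTITION BY id) AS has_F,
         -- rank 1 marks the earliest year, so each id is counted once
         DENSE_RANK() OVER (PARTITION BY id ORDER BY year) AS initial_sex
  FROM mytable
) ds
"""


def summarize(rows=ROWS):
    """Return (stay_m, stay_m_ids, stay_f, stay_f_ids, changed, changed_ids)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE mytable (id INTEGER, year INTEGER, sex TEXT)")
        conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchone()
    finally:
        conn.close()


if __name__ == "__main__":
    print(summarize())
```

group_concat returns text, so the id lists come back as strings such as '113'.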
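Assuming column SEX only ever holds 'F' or 'M', note that your queries for problems A and B are swapped: excluding every id that has an 'M' row leaves the ids that stay female. Problem A (ids that stay male) has to exclude every id that has an 'F' row:
-- problem A
SELECT COUNT(DISTINCT ID) FROM mytable WHERE ID NOT IN (SELECT ID FROM mytable WHERE SEX = 'F');
SELECT DISTINCT ID FROM mytable WHERE ID NOT IN (SELECT ID FROM mytable WHERE SEX = 'F');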
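To see which ids each NOT IN filter keeps, here is a sketch on the sample data from the question, using SQLite through Python's sqlite3 (the mytable name is an assumption). Excluding ids with an 'F' row leaves the ids that stay male, excluding ids with an 'M' row leaves the ids that stay female, and requiring both leaves the ids that changed.

```python
import sqlite3

# Sample data from the question: (id, year, sex).
ROWS = [
    (111, 2010, "M"), (111, 2011, "M"), (111, 2012, "F"),
    (112, 2010, "F"), (112, 2011, "F"),
    (113, 2010, "M"), (113, 2015, "M"),
]

# Ids that never have a row with the given sex.
NEVER = ("SELECT DISTINCT id FROM mytable "
         "WHERE id NOT IN (SELECT id FROM mytable WHERE sex = ?) ORDER BY id")

# Ids with at least one 'M' row and at least one 'F' row.
BOTH = ("SELECT DISTINCT id FROM mytable "
        "WHERE id IN (SELECT id FROM mytable WHERE sex = 'M') "
        "AND id IN (SELECT id FROM mytable WHERE sex = 'F') ORDER BY id")


def query_ids(sql, params=(), rows=ROWS):
    """Load rows into an in-memory mytable and return the ids selected by sql."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE mytable (id INTEGER, year INTEGER, sex TEXT)")
        conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
        return [row[0] for row in conn.execute(sql, params)]
    finally:
        conn.close()


if __name__ == "__main__":
    print("stay male:", query_ids(NEVER, ("F",)))
    print("stay female:", query_ids(NEVER, ("M",)))
    print("changed:", query_ids(BOTH))
```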
Assuming the change happens only once, you could use the first_value() window function:
SELECT DISTINCT -- 5
id,
CASE
WHEN first_sex = last_sex THEN 'Stay ' || sex -- 3
ELSE 'Change from ' || first_sex || ' To ' || last_sex -- 4
END sex_status
FROM (
SELECT
id,
sex,
first_value(sex) OVER (PARTITION BY id ORDER BY year) as first_sex, -- 1
first_value(sex) OVER (PARTITION BY id ORDER BY year DESC) as last_sex -- 2
FROM mytable
) s
1. Fetch the first sex value per id over the years.
2. Fetch the last sex value per id over the years. (Notice the reversed order: it takes the first value from the "bottom".)
3. Compare first and last; if they are the same, return "Stay" and the sex.
4. Otherwise return "Change" with both sexes. (Of course, you can do whatever you want here; returning status identifiers instead of plain text would probably make sense at this point.)
5. The DISTINCT clause reduces the records to one per id.
Afterwards you can compute whatever statistics you want, for example counting the ids per status with GROUP BY sex_status:
SELECT
sex_status,
COUNT(*)
FROM (
-- query from above
) s
GROUP BY sex_status
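Here is a sketch of both steps (classify each id, then count ids per status) on the sample data from the question, run in SQLite through Python's sqlite3. The mytable name is an assumption, and first_value() needs SQLite 3.25 or later.

```python
import sqlite3

# Sample data from the question: (id, year, sex).
ROWS = [
    (111, 2010, "M"), (111, 2011, "M"), (111, 2012, "F"),
    (112, 2010, "F"), (112, 2011, "F"),
    (113, 2010, "M"), (113, 2015, "M"),
]

# Step 1: one row per id with its status.
CLASSIFY = """
SELECT DISTINCT
  id,
  CASE
    WHEN first_sex = last_sex THEN 'Stay ' || sex
    ELSE 'Change from ' || first_sex || ' To ' || last_sex
  END AS sex_status
FROM (
  SELECT id, sex,
         first_value(sex) OVER (PARTITION BY id ORDER BY year) AS first_sex,
         first_value(sex) OVER (PARTITION BY id ORDER BY year DESC) AS last_sex
  FROM mytable
) s
"""

# Step 2: number of ids per status.
COUNT_BY_STATUS = "SELECT sex_status, COUNT(*) FROM (" + CLASSIFY + ") s GROUP BY sex_status"


def run(sql, rows=ROWS):
    """Load rows into an in-memory mytable and return all result rows of sql."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE mytable (id INTEGER, year INTEGER, sex TEXT)")
        conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(sorted(run(CLASSIFY)))
    print(sorted(run(COUNT_BY_STATUS)))
```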

Compare the count of data in same column in same table and display the larger value

I want to count the same field for different values, for example:
user{user_id, gender}
Gender can obviously be male or female :)
I want to get the count of all the males and of all the females, i.e.
COUNT(male) COUNT(female)
4 16
but I'm confused because both come from the same gender column. Thanks.
Also, I want the result to display only the higher count, like:
COUNT(female)
16
Try the following, using a CASE expression:
select
sum(case when gender = 'male' then 1 else 0 end) as total_male,
sum(case when gender = 'female' then 1 else 0 end) as total_female
from user
If you are using MySQL, you can sum the comparison directly, since MySQL evaluates a boolean expression to 1 or 0:
select
sum(gender = 'male') as total_male,
sum(gender = 'female') as total_female
from user
If you are using PostgreSQL, use the FILTER clause:
select
count(1) filter (where gender = 'male') as total_male,
count(1) filter (where gender = 'female') as total_female
from user
You can get your final result (only the larger of the two counts) with the following query:
select
case
when total_male < total_female then total_female
else total_male
end as total_count
from
(
select
count(1) filter (where gender = 'male') as total_male,
count(1) filter (where gender = 'female') as total_female
from users
) t
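A sketch of the portable SUM(CASE ...) version combined with the "keep only the larger count" step, run in SQLite through Python's sqlite3. The users table is filled with hypothetical rows matching the 4 / 16 example from the question.

```python
import sqlite3

QUERY = """
SELECT total_male,
       total_female,
       CASE WHEN total_male < total_female THEN total_female ELSE total_male END AS total_count
FROM (
  SELECT SUM(CASE WHEN gender = 'male' THEN 1 ELSE 0 END) AS total_male,
         SUM(CASE WHEN gender = 'female' THEN 1 ELSE 0 END) AS total_female
  FROM users
) t
"""


def gender_counts(n_male, n_female):
    """Build a hypothetical users table; return (total_male, total_female, larger)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, gender TEXT)")
        genders = ["male"] * n_male + ["female"] * n_female
        conn.executemany("INSERT INTO users (gender) VALUES (?)", [(g,) for g in genders])
        return conn.execute(QUERY).fetchone()
    finally:
        conn.close()


if __name__ == "__main__":
    print(gender_counts(4, 16))
```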

SQL Aggregate Functions

I have a table which lists a number of cases and their assigned primary and secondary technicians. What I am trying to accomplish is to count the number of cases each technician has worked as a primary and as a secondary tech. It should look something like this...
Technician Primary Secondary
John 4 3
Stacy 3 1
Michael 5 3
The table that I am pulling that data from looks like this:
CaseID, PrimaryTech, SecondaryTech, DOS
In the past I have used something like this, but now my superiors are asking for the number of secondary cases as well...
SELECT PrimaryTech, COUNT(CaseID) as Total
FROM your_table
GROUP BY PrimaryTech
I've done a bit of searching, but can't seem to find the answer to my problem.
Select Tech,
sum(case when IsPrimary = 1 then 1 else 0 end) as PrimaryCount,
sum(case when IsPrimary = 0 then 1 else 0 end) as SecondaryCount
from
(
SELECT SecondaryTech as Tech, 0 as IsPrimary
FROM your_table
union all
SELECT PrimaryTech as Tech, 1 as IsPrimary
FROM your_table
) x
GROUP BY Tech
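A minimal sketch of the UNION ALL approach in SQLite via Python's sqlite3. The your_table rows (names and dates) are hypothetical; stacking each case once per role lets a single GROUP BY produce both counts.

```python
import sqlite3

# Hypothetical cases: (CaseID, PrimaryTech, SecondaryTech, DOS).
CASES = [
    (1, "John", "Stacy", "2024-01-02"),
    (2, "John", "Michael", "2024-01-03"),
    (3, "Stacy", "John", "2024-01-04"),
    (4, "Michael", "John", "2024-01-05"),
    (5, "Michael", "Stacy", "2024-01-06"),
]

QUERY = """
SELECT Tech,
       SUM(CASE WHEN IsPrimary = 1 THEN 1 ELSE 0 END) AS PrimaryCount,
       SUM(CASE WHEN IsPrimary = 0 THEN 1 ELSE 0 END) AS SecondaryCount
FROM (
  SELECT SecondaryTech AS Tech, 0 AS IsPrimary FROM your_table
  UNION ALL
  SELECT PrimaryTech AS Tech, 1 AS IsPrimary FROM your_table
) x
GROUP BY Tech
"""


def tech_counts(rows=CASES):
    """Return {technician: (primary_count, secondary_count)}."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE your_table "
            "(CaseID INTEGER, PrimaryTech TEXT, SecondaryTech TEXT, DOS TEXT)"
        )
        conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?)", rows)
        return {tech: (p, s) for tech, p, s in conn.execute(QUERY)}
    finally:
        conn.close()


if __name__ == "__main__":
    print(tech_counts())
```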
You can combine two grouped subqueries with a FULL JOIN (this answer uses SQL Server syntax):
SELECT Technician = COALESCE(pri.Technician, sec.Technician)
, PrimaryTech
, SecondaryTech
FROM
(SELECT Technician = PrimaryTech
, PrimaryTech = COUNT(*)
FROM Cases
WHERE PrimaryTech IS NOT NULL
GROUP BY PrimaryTech) pri
FULL JOIN
(SELECT Technician = SecondaryTech
, SecondaryTech = COUNT(*)
FROM Cases
WHERE SecondaryTech IS NOT NULL
GROUP BY SecondaryTech) sec
ON pri.Technician = sec.Technician
ORDER By Technician;
SELECT COALESCE(A.NAME, B.NAME) AS NAME, CASE WHEN A.CASES IS NOT NULL THEN A.CASES ELSE 0 END AS PRIMARY_CASES,
CASE WHEN B.CASES IS NOT NULL THEN B.CASES ELSE 0 END AS SECONDARY_CASES
FROM
(
SELECT COUNT(*) AS CASES, PRIMARYTECH AS NAME FROM YOUR_TABLE
GROUP BY PRIMARYTECH
) AS A
FULL OUTER JOIN
(
SELECT COUNT(*) AS CASES, SECONDARYTECH AS NAME FROM YOUR_TABLE
GROUP BY SECONDARYTECH
) AS B
ON A.NAME = B.NAME

Calculate percentage for each value of a column in SQL

I want to rewrite this SQL query so that it shows a row with 0 for an age range when there are no matches, and so that it calculates the percentages for each value of Member instead of only for '0' as it does now. Can anyone help me achieve this?
SELECT COUNT(Name) * 100 /
(select COUNT(*) from 'cities'
WHERE city= 'Hoeselt' AND Member = '0' ) AS 'perc',
CASE
WHEN age <= 30 THEN '18-30'
WHEN age <= 50 THEN '31-50'
ELSE '50+'
END AS age, COUNT(*) AS n
FROM 'cities'
WHERE city= 'Hoeselt' AND elected='yes' AND Member= '0'
GROUP BY CASE
WHEN age <= 30 THEN '18-30'
WHEN age <= 50 THEN '31-50'
ELSE '50+'
END
It's hard to be certain that this will work for you without the DDL. SQL Fiddle is a great tool for helping people give you the best solution:
http://sqlfiddle.com/#!6
;WITH AgeCat AS
(
SELECT MinAge = 18
,MaxAge = 30
,Descr = '18-30' UNION ALL
SELECT 31, 49, '31-49' UNION ALL
SELECT 50, 200, '50+'
)
SELECT DISTINCT
A.Descr
,Perc = COUNT(*) OVER (PARTITION BY A.Descr) * 100.0 / COUNT(*) OVER ()
FROM AgeCat A
JOIN Cities C ON C.Age BETWEEN A.MinAge AND A.MaxAge
WHERE city = 'Hoeselt'
AND elected = 'yes'
AND Member = '0'
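A sketch of the AgeCat idea in SQLite via Python's sqlite3, computing each age group's share as group count * 100.0 / total count (multiplying by 100.0 rather than 100 avoids integer division). The T-SQL alias = expression syntax is rewritten with AS for SQLite, and the Cities rows are hypothetical.

```python
import sqlite3

# Hypothetical rows: (Name, city, Age, elected, Member).
CITIES = [
    ("a", "Hoeselt", 25, "yes", "0"),
    ("b", "Hoeselt", 28, "yes", "0"),
    ("c", "Hoeselt", 45, "yes", "0"),
    ("d", "Hoeselt", 60, "yes", "0"),
    ("e", "Hoeselt", 70, "no", "0"),   # not elected: filtered out
    ("f", "Genk", 22, "yes", "0"),     # other city: filtered out
]

QUERY = """
WITH AgeCat AS (
  SELECT 18 AS MinAge, 30 AS MaxAge, '18-30' AS Descr UNION ALL
  SELECT 31, 49, '31-49' UNION ALL
  SELECT 50, 200, '50+'
)
SELECT DISTINCT
  A.Descr,
  -- rows in this age group * 100.0 / all matching rows
  COUNT(*) OVER (PARTITION BY A.Descr) * 100.0 / COUNT(*) OVER () AS Perc
FROM AgeCat A
JOIN Cities C ON C.Age BETWEEN A.MinAge AND A.MaxAge
WHERE C.city = 'Hoeselt' AND C.elected = 'yes' AND C.Member = '0'
"""


def age_percentages(rows=CITIES):
    """Return {age group: percentage of matching rows}."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Cities (Name TEXT, city TEXT, Age INTEGER, elected TEXT, Member TEXT)"
        )
        conn.executemany("INSERT INTO Cities VALUES (?, ?, ?, ?, ?)", rows)
        return dict(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    print(age_percentages())
```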
My approach is to use a CTE to define the age group, then select all the age groups as a "driver" table and left join the cities information onto it. That way each age group appears even when it has no matches. The filters go in the ON clause, because in a WHERE clause they would discard the unmatched driver rows:
with c as (
select c.*,
(CASE WHEN age <= 30 THEN '18-30'
WHEN age <= 50 THEN '31-50'
ELSE '50+'
END) as agegrp
from cities c
)
select COUNT(c.Name) * 100 / (select COUNT(*) from cities WHERE city= 'Hoeselt' AND Member = '0') as perc,
driver.agegrp,
COUNT(c.Name) as n
from (select distinct agegrp from c) as driver left outer join
c
on driver.agegrp = c.agegrp
and c.city = 'Hoeselt' and c.elected = 'yes' and c.Member = '0'
group by driver.agegrp
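A sketch of the driver-table approach in SQLite via Python's sqlite3, with the cities table aliased as c inside the CTE, the filters in the ON clause, and COUNT(c.Name) so that an age group without matches reports 0. The rows are hypothetical: the only 31-50 person is in another city, so that group should come back with n = 0. Note that the driver only lists age groups present somewhere in cities.

```python
import sqlite3

# Hypothetical rows: (Name, city, age, elected, Member).
CITIES = [
    ("a", "Hoeselt", 25, "yes", "0"),
    ("b", "Hoeselt", 28, "yes", "0"),
    ("d", "Hoeselt", 60, "yes", "0"),
    ("e", "Hoeselt", 70, "no", "0"),
    ("f", "Genk", 40, "yes", "0"),
]

QUERY = """
WITH c AS (
  SELECT c.*,
         CASE WHEN age <= 30 THEN '18-30'
              WHEN age <= 50 THEN '31-50'
              ELSE '50+'
         END AS agegrp
  FROM cities c
)
SELECT driver.agegrp,
       COUNT(c.Name) * 100 /
         (SELECT COUNT(*) FROM cities WHERE city = 'Hoeselt' AND Member = '0') AS perc,
       COUNT(c.Name) AS n
FROM (SELECT DISTINCT agegrp FROM c) AS driver
LEFT OUTER JOIN c
  ON driver.agegrp = c.agegrp
  -- filters live in ON so that age groups without matches survive the join
  AND c.city = 'Hoeselt' AND c.elected = 'yes' AND c.Member = '0'
GROUP BY driver.agegrp
"""


def age_groups(rows=CITIES):
    """Return {age group: (integer percentage, count)}."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE cities (Name TEXT, city TEXT, age INTEGER, elected TEXT, Member TEXT)"
        )
        conn.executemany("INSERT INTO cities VALUES (?, ?, ?, ?, ?)", rows)
        return {grp: (perc, n) for grp, perc, n in conn.execute(QUERY)}
    finally:
        conn.close()


if __name__ == "__main__":
    print(age_groups())
```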

SQL query to GROUP BY a condition on a column

If I want a query that gets the count of users grouped by age, I can get a separate count for each age with:
select count(*)
from tbl_user
group by age
How can I make a custom GROUP BY so that I get ages in ranges, for example:
group by ages as 0-18, 19-25, 26-...
Use a CASE expression in a subquery and group by that expression in the outer query:
select age_group, count(*)
from (
select case when age between 0 and 18 then '0-18'
when age between 19 and 25 then '19-25'
...
end as age_group
from tbl_user
) t
group by age_group
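A sketch of the subquery-plus-CASE grouping in SQLite via Python's sqlite3. The original elides the remaining buckets with "...", so the '26+' catch-all below is a hypothetical stand-in, as are the sample ages.

```python
import sqlite3

QUERY = """
SELECT age_group, COUNT(*)
FROM (
  SELECT CASE WHEN age BETWEEN 0 AND 18 THEN '0-18'
              WHEN age BETWEEN 19 AND 25 THEN '19-25'
              ELSE '26+'  -- hypothetical catch-all bucket
         END AS age_group
  FROM tbl_user
) t
GROUP BY age_group
"""


def age_group_counts(ages):
    """Load the given ages into tbl_user and return {age_group: count}."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE tbl_user (id INTEGER PRIMARY KEY, age INTEGER)")
        conn.executemany("INSERT INTO tbl_user (age) VALUES (?)", [(a,) for a in ages])
        return dict(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    print(age_group_counts([5, 18, 19, 25, 26, 40]))
```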
SUM with CASE WHEN works in MS SQL Server; which database are you using?
SELECT
SUM(CASE WHEN Age >= 0 AND Age <= 18 THEN 1 ELSE 0 END) AS [0-18],
SUM(CASE WHEN Age >= 19 AND Age <= 25 THEN 1 ELSE 0 END) AS [19-25]
FROM
YourTable
You could use a CASE statement:
SELECT Sum(CASE WHEN age BETWEEN 0 AND 18 THEN 1 ELSE 0 END) as [0-18],
Sum(CASE WHEN age BETWEEN 19 AND 25 THEN 1 ELSE 0 END) as [19-25],
Sum(CASE WHEN age BETWEEN 26 AND 34 THEN 1 ELSE 0 END) as [26-34]
FROM tbl_user
This will "flatten" the data into one row. To get one row per group, use this as the basis for a view and select from that.
Data belongs in a table, not in the code. The age categories are data, IMHO.
CREATE TABLE one
( val SERIAL NOT NULL PRIMARY KEY
, age INTEGER NOT NULL
);
INSERT INTO one (age) SELECT generate_series(0,31, 1);
CREATE TABLE age_category
( low INTEGER NOT NULL PRIMARY KEY
, high INTEGER NOT NULL
, description varchar
);
INSERT INTO age_category (low,high,description) VALUES
( 0,19, '0-18')
, ( 19,26, '19-25')
, ( 26,1111, '26-...')
;
SELECT ac.description, COUNT(*)
FROM one o
JOIN age_category ac ON o.age >= ac.low AND o.age < ac.high
GROUP BY ac.description
;
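The same lookup-table idea in SQLite via Python's sqlite3. Here INTEGER PRIMARY KEY stands in for SERIAL and a Python range(0, 32) replaces generate_series(0, 31, 1); the categories are half-open [low, high) ranges, which is why high is one past each label's upper age.

```python
import sqlite3

CATEGORIES = [(0, 19, "0-18"), (19, 26, "19-25"), (26, 1111, "26-...")]

QUERY = """
SELECT ac.description, COUNT(*)
FROM one o
JOIN age_category ac ON o.age >= ac.low AND o.age < ac.high
GROUP BY ac.description
"""


def counts_by_category(ages, categories=CATEGORIES):
    """Load ages and age categories, then return {description: count}."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE one (val INTEGER PRIMARY KEY, age INTEGER NOT NULL)")
        conn.execute(
            "CREATE TABLE age_category "
            "(low INTEGER NOT NULL PRIMARY KEY, high INTEGER NOT NULL, description TEXT)"
        )
        conn.executemany("INSERT INTO one (age) VALUES (?)", [(a,) for a in ages])
        conn.executemany("INSERT INTO age_category VALUES (?, ?, ?)", categories)
        return dict(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    print(counts_by_category(range(0, 32)))
```

Changing the ranges then only means updating rows in age_category; the query itself stays the same.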