SQL fetch records using group by with 3 conditions - sql

I'm trying to write a query that gives me the number of patient visits by age, gender and condition (Diabetes, Hypertension, etc.): the visit count per condition, grouped by gender, for patients who fall in the age range 45-54. I used an INNER JOIN to get only the rows that are present in both tables. I get the error:
age.Age is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Do you think I should use partition by age.age?
TABLE_A
+------------+------------+------------+
| Member_Key | VisitCount | date |
+------------+------------+------------+
| 4000 | 1 | 2014-05-07 |
| 4000 | 1 | 2014-05-09 |
| 4001 | 2 | 2014-05-08 |
+------------+------------+------------+
TABLE_B
+------------+--------------+
| Member_Key | Condition |
+------------+--------------+
| 4000 | Diabetes |
| 4000 | Diabetes |
| 4001 | Hypertension |
+------------+--------------+
TABLE_C
+------------+---------------+------------+
| Member_Key | Member_Gender | Member_DOB |
+------------+---------------+------------+
| 4000 | M | 1970-05-21 |
| 4001 | F | 1968-02-19 |
+------------+---------------+------------+
Query
SELECT c.conditions,
age.gender,
CASE
WHEN age.age BETWEEN 45 AND 54
THEN SUM(act.visitcount)
END AS age_45_54_years
FROM table_a act
INNER JOIN
(
SELECT DISTINCT
member_key,
conditions
FROM table_b
) c ON c.member_key = act.member_key
INNER JOIN
(
SELECT DISTINCT
member_key,
member_gender,
DATEPART(year, '2017-10-16')-DATEPART(year, member_dob) AS Age
FROM [table_c]
) AS age ON age.member_key = c.member_key
GROUP BY c.conditions,
age.member_gender;
Expected Output
+--------------+--------+-------------+
| Condition | Gender | TotalVisits |
+--------------+--------+-------------+
| Diabetes | M | 2 |
| Hypertension | F | 2 |
+--------------+--------+-------------+

You can simplify your query by filtering on age in the WHERE clause.
And, as Sean Lange said, use DATEADD and GETDATE() to calculate the age more accurately.
SELECT [Condition],
[Member_Gender] as [Gender],
SUM([VisitCount]) as [VisitCount]
FROM TableA A
JOIN (SELECT DISTINCT [Member_Key], [Condition]
FROM TableB) B
ON A.[Member_Key] = B.[Member_Key]
JOIN TableC C
ON A.[Member_Key] = C.[Member_Key]
WHERE [Member_DOB] >  DATEADD(year, -55, GETDATE())
AND [Member_DOB] <= DATEADD(year, -45, GETDATE())
GROUP BY [Condition], [Member_Gender]
EDIT: Changed the WHERE condition to fix the age precision and to allow index use.
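A minimal, runnable sketch of the WHERE-clause approach, using Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption: SQLite's date(ref, '-55 years') plays the role of DATEADD, and the question's fixed reference date 2017-10-16 replaces GETDATE() so the output matches the expected result). Ages 45 to 54 translate to a DOB strictly after ref minus 55 years and on or before ref minus 45 years; filtering on the raw DOB column keeps the predicate index-friendly.

```python
import sqlite3

# Stand-in for the SQL Server tables in the question (SQLite is an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (Member_Key INTEGER, VisitCount INTEGER, date TEXT);
CREATE TABLE table_b (Member_Key INTEGER, Condition TEXT);
CREATE TABLE table_c (Member_Key INTEGER, Member_Gender TEXT, Member_DOB TEXT);
INSERT INTO table_a VALUES (4000, 1, '2014-05-07'), (4000, 1, '2014-05-09'),
                           (4001, 2, '2014-05-08');
INSERT INTO table_b VALUES (4000, 'Diabetes'), (4000, 'Diabetes'),
                           (4001, 'Hypertension');
INSERT INTO table_c VALUES (4000, 'M', '1970-05-21'), (4001, 'F', '1968-02-19');
""")

rows = conn.execute("""
SELECT b.Condition, c.Member_Gender AS Gender, SUM(a.VisitCount) AS TotalVisits
FROM table_a a
JOIN (SELECT DISTINCT Member_Key, Condition FROM table_b) b  -- drop duplicate conditions
  ON a.Member_Key = b.Member_Key
JOIN table_c c ON a.Member_Key = c.Member_Key
-- Age 45-54 on the reference date, expressed on the DOB column itself
WHERE c.Member_DOB >  date(:ref, '-55 years')
  AND c.Member_DOB <= date(:ref, '-45 years')
GROUP BY b.Condition, c.Member_Gender
ORDER BY b.Condition
""", {"ref": "2017-10-16"}).fetchall()

print(rows)  # [('Diabetes', 'M', 2), ('Hypertension', 'F', 2)]
```

Because the age condition lives in WHERE, age never has to appear in the SELECT list or GROUP BY, which is what triggered the original error.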


eSQL multiple join but with conditions

I have 3 tables, as follows:
MERCHANDISE
+-----------+-----------+---------------+
| MERCH_NUM | MERCH_DIV | MERCH_SUB_DIV |
+-----------+-----------+---------------+
| 1 | car | awd |
| 1 | car | awd |
| 2 | bike | 1kcc |
| 3 | cycle | hybrid |
| 3 | cycle | city |
| 4 | moped | fixie |
+-----------+-----------+---------------+
PRIORITY
+----------+-----------+---------+---------+------------+------------+---------------+
| CUST_NUM | SALES_NUM | DOC_NUM | BALANCE | PRIORITY_1 | PRIORITY_2 | PRIORITY_CODE |
+----------+-----------+---------+---------+------------+------------+---------------+
| 90 | 1000 | 10 | 23 | 1 | 6 | NO |
| 91 | 1001 | 20 | 32 | 3 | 7 | PRI |
| 92 | 1002 | 30 | 11 | 2 | 8 | LATE |
| 93 | 1003 | 40 | 22 | 5 | 9 | 1MON |
+----------+-----------+---------+---------+------------+------------+---------------+
ORDER
+----------+-----------+---------+---------+-----------+-----------+
| CUST_NUM | SALES_NUM | DOC_NUM | COUNTRY | MERCH_NUM | MERCH_DIV |
+----------+-----------+---------+---------+-----------+-----------+
| 90 | 1000 | 10 | INDIA | 1 | car |
| 91 | 1001 | 20 | CHINA | 2 | bike |
| 92 | 1002 | 30 | USA | 3 | cycle |
| 93 | 1003 | 40 | UK | 4 | moped |
+----------+-----------+---------+---------+-----------+-----------+
I want to join the result of LEFT JOINing the last two tables with the first one, such that the MERCH_SUB_DIV 'awd' appears only once for each unique combination of MERCH_NUM and MERCH_DIV.
The code I came up with is below, but I'm not sure how to eliminate the duplicate row for 'awd' only:
select
ROW#, MERCH.MERCH_NUMBER, ORDPRI.MERCH_NUMBER, ORDPRI.CUST_NUM,
BALANCE, SALES_NUM, ITEM_NUM, RANK, PRIORITY_1
from (
select
ROW_NUMBER() OVER(
PARTITION BY ORD.DOC_NUM, ORD.ITEM_NUM
ORDER BY ORD.DOC_NUM, ORD.ITEM_NUM ASC
) AS Row#,
ORD.CUST_NUM, PRI.CUST_NUM, ORD.MERCH_NUM, ORD.MERCH_DIV, PRI.BALANCE,
pri.DOC_NUM, pri.SALES_NUM, pri.PRIORITY_1, pri.PRIORITY_2
from ORDER as ORD
left join PRIORITY as PRI on ORD.DOC_NUM = PRI.DOC_NUM
and ORD.SALES_NUMBER = PRI.SALES_NUM
where country_name in ('USA', 'INDIA')
) as ORDPRI
left join MERCHANDISE as MERCH on ORDPRI.DIV = MERCH.DIV
and ORDPRI.MERCH_NUM = MERCH.MERCH_NUM
You have to use the DISTINCT keyword to get unique values, but if your PRIORITY and ORDER tables contain different values for the same MERCH_NUM, the final result will still repeat that MERCH_NUM.
SELECT DISTINCT M.MERCH_NUMBER, O.MERCH_NUMBER, O.CUST_NUM, BALANCE, SALES_NUM,ITEM_NUM,RANK,PRIORITY_1
FROM priority_table P
LEFT JOIN order_table O ON P.CUST_NUM = O.CUST_NUM AND P.SALES_NUM=O.SALES_NUM AND P.DOC_NUM = O.DOC_NUM
LEFT JOIN merchandise_table M ON M.MERCH_NUM = O.MERCH_NUM
A workaround is to add a new ROW_NUMBER() in the outermost query, partitioned by MERCH_SUB_DIV plus all the columns in the final list, and then filter the final results on that new row number. The following pseudocode might help:
select
-- All expected columns in final result except the newRow#
ROW#, MERCH_NUM, CUST_NUM,
BALANCE, SALES_NUM, PRIORITY_1
from (
select
ROW#,
-- the new row number includes all column you want to show in final result
row_number() over ( PARTITION BY MERCH.MERCH_SUB_DIV ,
MERCH.MERCH_NUM, ORDPRI.MERCH_NUM, ORDPRI.CUST_NUM,
BALANCE, SALES_NUM, PRIORITY_1
order by (select 1 )) as newRow# ,
MERCH.MERCH_NUM, ORDPRI.CUST_NUM,
BALANCE, SALES_NUM, PRIORITY_1
from (
-- main query goes here
select
ROW_NUMBER() OVER(
PARTITION BY ORD.DOC_NUM --, ORD.ITEM_NUM
ORDER BY ORD.DOC_NUM ASC --, ORD.ITEM_NUM
) AS Row#,
ORD.CUST_NUM, ORD.MERCH_NUM, ORD.MERCH_DIV as DIV, PRI.BALANCE,
pri.DOC_NUM, pri.SALES_NUM, pri.PRIORITY_1, pri.PRIORITY_2
from #ORDER as ORD
left join #PRIORITY as PRI on ORD.DOC_NUM = PRI.DOC_NUM
and ORD.SALES_NUM = PRI.SALES_NUM
where ORD.COUNTRY in ('USA', 'INDIA')
) as ORDPRI
left join #MERCHANDISE as MERCH on ORDPRI.DIV = MERCH.MERCH_DIV
and ORDPRI.MERCH_NUM = MERCH.MERCH_NUM
) as T
-- final filter to get distinct values
where newRow# = 1
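A runnable sketch of the same ROW_NUMBER() dedup idea, in SQLite through Python's sqlite3 module (an assumption; SQLite 3.25 or later is needed for window functions). As a variant, it numbers duplicate MERCHANDISE rows before the join instead of in the outermost query, so 'awd' survives once while the two distinct 'cycle' sub-divisions are kept. The order table is named ORDERS here because ORDER is a reserved word; that name, and the trimmed PRIORITY columns, are hypothetical.

```python
import sqlite3

# Sample data from the question; ORDERS is a hypothetical name (ORDER is reserved),
# and PRIORITY keeps only the columns used below.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MERCHANDISE (MERCH_NUM INTEGER, MERCH_DIV TEXT, MERCH_SUB_DIV TEXT);
INSERT INTO MERCHANDISE VALUES (1, 'car', 'awd'), (1, 'car', 'awd'),
  (2, 'bike', '1kcc'), (3, 'cycle', 'hybrid'), (3, 'cycle', 'city'),
  (4, 'moped', 'fixie');
CREATE TABLE PRIORITY (CUST_NUM INTEGER, SALES_NUM INTEGER, DOC_NUM INTEGER,
                       BALANCE INTEGER, PRIORITY_1 INTEGER);
INSERT INTO PRIORITY VALUES (90, 1000, 10, 23, 1), (91, 1001, 20, 32, 3),
  (92, 1002, 30, 11, 2), (93, 1003, 40, 22, 5);
CREATE TABLE ORDERS (CUST_NUM INTEGER, SALES_NUM INTEGER, DOC_NUM INTEGER,
                     COUNTRY TEXT, MERCH_NUM INTEGER, MERCH_DIV TEXT);
INSERT INTO ORDERS VALUES (90, 1000, 10, 'INDIA', 1, 'car'),
  (91, 1001, 20, 'CHINA', 2, 'bike'), (92, 1002, 30, 'USA', 3, 'cycle'),
  (93, 1003, 40, 'UK', 4, 'moped');
""")

rows = conn.execute("""
SELECT o.CUST_NUM, o.MERCH_NUM, o.MERCH_DIV, m.MERCH_SUB_DIV, p.BALANCE
FROM ORDERS o
LEFT JOIN PRIORITY p ON o.DOC_NUM = p.DOC_NUM AND o.SALES_NUM = p.SALES_NUM
LEFT JOIN (
  -- Number identical merchandise rows; rn = 1 keeps one copy of each
  SELECT MERCH_NUM, MERCH_DIV, MERCH_SUB_DIV,
         ROW_NUMBER() OVER (PARTITION BY MERCH_NUM, MERCH_DIV, MERCH_SUB_DIV
                            ORDER BY MERCH_NUM) AS rn
  FROM MERCHANDISE
) m ON m.MERCH_NUM = o.MERCH_NUM AND m.MERCH_DIV = o.MERCH_DIV AND m.rn = 1
WHERE o.COUNTRY IN ('USA', 'INDIA')
ORDER BY o.CUST_NUM, m.MERCH_SUB_DIV
""").fetchall()

for row in rows:
    print(row)
```

Deduplicating before the join keeps the partition list short (only the MERCHANDISE key columns), whereas the outermost-query version has to repeat every output column in PARTITION BY.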

Measure population on several dates

I want to measure the population of our municipality (which consists of several places). I have two tables: the first is a calendar table with a row for the first day of every month.
The second table contains all the people who live, or have lived, in the municipality.
What I want is the population of each place on every first day of the month in my calendar table. I've put some raw data below (just a few records of the persons table, because it contains 100,000 records).
Calender table:
+----------+
| Date |
+----------+
| 1-1-2018 |
+----------+
| 1-2-2018 |
+----------+
| 1-3-2018 |
+----------+
| 1-4-2018 |
+----------+
Persons table
+-----+-----------+-----------+---------------+-------+
| BSN | Startdate | Enddate | Date of death | Place |
+-----+-----------+-----------+---------------+-------+
| 1 | 12-1-2000 | null | null | A |
+-----+-----------+-----------+---------------+-------+
| 2 | 10-5-2011 | null | 22-1-2018 | B |
+-----+-----------+-----------+---------------+-------+
| 3 | 16-12-2011| 10-2-2018 | null | B |
+-----+-----------+-----------+---------------+-------+
| 4 | 9-11-2012 | null | null | B |
+-----+-----------+-----------+---------------+-------+
| 5 | 8-9-2013 | null | 27-3-2018 | A |
+-----+-----------+-----------+---------------+-------+
| 6 | 7-10-2017 | 28-3-2018 | null | B |
+-----+-----------+-----------+---------------+-------+
My expected result:
+----------+-------+------------+
| Date | Place | Population |
+----------+-------+------------+
| 1-1-2018 | A | 2 |
+----------+-------+------------+
| 1-1-2018 | B | 4 |
+----------+-------+------------+
| 1-2-2018 | A | 2 |
+----------+-------+------------+
| 1-2-2018 | B | 3 |
+----------+-------+------------+
| 1-3-2018 | A | 2 |
+----------+-------+------------+
| 1-3-2018 | B | 2 |
+----------+-------+------------+
| 1-4-2018 | A | 1 |
+----------+-------+------------+
| 1-4-2018 | B | 1 |
+----------+-------+------------+
What I've done so far, which doesn't seem to work:
SELECT a.Place
,c.Date
,(SELECT COUNT(DISTINCT(b.BSN))
FROM Person as b
WHERE b.Startdate < c.Date
AND (b.Enddate > c.Date OR b.Enddate is null)
AND (b.Date of death > c.Date OR b.Date of death is null)
AND a.Place = b.Place) as Population
FROM Person as a
JOIN Calender as c
ON a.Startdate <= c.Date
AND a.Enddate >= c.Date
GROUP BY Place, Date
I hope someone can help me find the problem. Thanks in advance.
First cross join Calender and the places to get the date/place pairs. Then left join the persons on the place and the date. Finally group by date and place to get the count of people for that day and place.
SELECT [ca].[Date],
[pl].[Place],
count([pe].[Place]) [Population]
FROM [Calender] [ca]
CROSS JOIN (SELECT DISTINCT
[pe].[Place]
FROM [Persons] [pe]) [pl]
LEFT JOIN [Persons] [pe]
ON [pe].[Place] = [pl].[Place]
AND [pe].[Startdate] <= [ca].[Date]
AND (coalesce([pe].[Enddate],
[pe].[Date of death]) IS NULL
OR coalesce([pe].[Enddate],
[pe].[Date of death]) > [ca].[Date])
GROUP BY [ca].[Date],
[pl].[Place]
ORDER BY [ca].[Date],
[pl].[Place];
Some notes and assumptions:
If you have a table listing the places, use it instead of the subquery aliased [pl]. I just had no other option with the given tables.
I believe the Date of death also implies an Enddate on the same day. You might want to consider a trigger that automatically sets the Enddate to the Date of death if it isn't null. That would make things easier and probably more consistent.
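A runnable sketch of the cross join / left join / group by answer, using Python's sqlite3 module as a stand-in database (an assumption). The question's d-m-yyyy dates are rewritten as ISO yyyy-mm-dd strings here so that plain string comparison orders them correctly; the column "Date of death" is quoted because it contains spaces.

```python
import sqlite3

# Question's sample data, with dates converted to ISO format (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Calender ("Date" TEXT);
INSERT INTO Calender VALUES ('2018-01-01'), ('2018-02-01'), ('2018-03-01'), ('2018-04-01');
CREATE TABLE Persons (BSN INTEGER, Startdate TEXT, Enddate TEXT,
                      "Date of death" TEXT, Place TEXT);
INSERT INTO Persons VALUES
  (1, '2000-01-12', NULL, NULL, 'A'),
  (2, '2011-05-10', NULL, '2018-01-22', 'B'),
  (3, '2011-12-16', '2018-02-10', NULL, 'B'),
  (4, '2012-11-09', NULL, NULL, 'B'),
  (5, '2013-09-08', NULL, '2018-03-27', 'A'),
  (6, '2017-10-07', '2018-03-28', NULL, 'B');
""")

rows = conn.execute("""
SELECT ca."Date", pl.Place, COUNT(pe.Place) AS Population
FROM Calender ca
CROSS JOIN (SELECT DISTINCT Place FROM Persons) pl  -- every date/place pair
LEFT JOIN Persons pe                                -- people living there that day
  ON pe.Place = pl.Place
 AND pe.Startdate <= ca."Date"
 AND (COALESCE(pe.Enddate, pe."Date of death") IS NULL
      OR COALESCE(pe.Enddate, pe."Date of death") > ca."Date")
GROUP BY ca."Date", pl.Place
ORDER BY ca."Date", pl.Place
""").fetchall()

for row in rows:
    print(row)
```

COUNT(pe.Place) counts only matched persons, so a date/place pair with nobody living there would still appear, with a population of 0.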

SQL Query to Return SUMS and Count Ordered by Date

I have the two following tables:
Table 1
Table 2
What I want to do is to have a query that returns a SUM of PIT_VALORTOTAL, PIT_VOLUME and a count of PED_IDPESSOA per date. What I have so far is:
SELECT SUM(PIT_VALORTOTAL) AS VALORTOTAL, SUM(PIT_VOLUME) AS VOLUME, COUNT(DISTINCT PED_IDPESSOA) AS PESSOA FROM PEDIDOS_ITENS INNER JOIN PEDIDOS ON PIT_IDPEDIDO = PED_ID;
It returns the sums and the count correctly, but I have no clue how to get them separately per date. So what I have is this:
| VALORTOTAL    | VOLUME        | PESSOA |
| 49783.2000000 | 679780.360000 | 11     |
And what I want is something like:
| DATE | VALORTOTAL | VOLUME | PESSOA |
| 2017-09-03| 1012,00 | 1209 | 12 |
| 2017-09-03| 2012,00 | 1450 | 10 |
| 2017-09-03| 3016,00 | 2500 | 20 |
| 2017-09-03| 3016,00 | 3200 | 5 |
| 2017-09-03| 2016,00 | 4000 | 9 |
You just need group by:
SELECT PED_DATA, SUM(PIT_VALORTOTAL) AS VALORTOTAL, SUM(PIT_VOLUME) AS VOLUME,
COUNT(DISTINCT PED_IDPESSOA) AS PESSOA
FROM PEDIDOS_ITENS pi INNER JOIN
PEDIDOS p
ON PIT_IDPEDIDO = PED_ID
GROUP BY PED_DATA
ORDER BY PED_DATA
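A runnable sketch of the GROUP BY answer in SQLite through Python's sqlite3 module (an assumption). The original tables were shown as images, so only the column names below come from the question; the rows are hypothetical, chosen so that one customer orders on two different dates.

```python
import sqlite3

# Hypothetical rows: only the column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PEDIDOS (PED_ID INTEGER, PED_DATA TEXT, PED_IDPESSOA INTEGER);
CREATE TABLE PEDIDOS_ITENS (PIT_IDPEDIDO INTEGER, PIT_VALORTOTAL INTEGER,
                            PIT_VOLUME INTEGER);
INSERT INTO PEDIDOS VALUES (1, '2017-09-03', 100), (2, '2017-09-03', 101),
                           (3, '2017-09-04', 100);
INSERT INTO PEDIDOS_ITENS VALUES (1, 10, 5), (1, 20, 7), (2, 5, 1), (3, 8, 2);
""")

rows = conn.execute("""
SELECT PED_DATA, SUM(PIT_VALORTOTAL) AS VALORTOTAL, SUM(PIT_VOLUME) AS VOLUME,
       COUNT(DISTINCT PED_IDPESSOA) AS PESSOA  -- each customer once per date
FROM PEDIDOS_ITENS pi
INNER JOIN PEDIDOS p ON PIT_IDPEDIDO = PED_ID
GROUP BY PED_DATA
ORDER BY PED_DATA
""").fetchall()

print(rows)  # [('2017-09-03', 35, 13, 2), ('2017-09-04', 8, 2, 1)]
```

If PED_DATA also stores a time of day, group by its date part instead (in SQL Server, for example, CAST(PED_DATA AS date)); otherwise each distinct timestamp becomes its own group.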

How can I do SQL query count based on certain criteria including row order

I've come across certain logic that I need for my SQL query. Given that I have a table as such:
+----------+-------+------------+
| product | valid | Date |
+----------+-------+------------+
| 1 | null | 2016-05-10 |
| 1 | null | 2016-05-09 |
| 1 | yes | 2016-05-08 |
+----------+-------+------------+
This table is produced by a simple query:
SELECT * FROM products WHERE product = 1 ORDER BY date desc
Now I need to create a query that counts the number of nulls for a given product, in date order (newest first), until there is a 'yes' value. In the above example the count would be 2, as there are 2 nulls before a yes.
+----------+-------+------------+
| product | valid | Date |
+----------+-------+------------+
| 2 | null | 2016-05-10 |
| 2 | yes | 2016-05-09 |
| 2 | null | 2016-05-08 |
+----------+-------+------------+
Above would return 1 as there is 1 null until a yes.
+----------+-------+------------+
| product | valid | Date |
+----------+-------+------------+
| 3 | yes | 2016-05-10 |
| 3 | yes | 2016-05-09 |
| 3 | null | 2016-05-08 |
+----------+-------+------------+
Above would return 0.
You need a Correlated Subquery like this:
SELECT COUNT(*)
FROM products AS p1
WHERE product = 1
AND Date >
( -- maximum date with 'yes'
SELECT MAX(Date)
FROM products AS p2
WHERE p1.product = p2.product
AND Valid = 'yes'
)
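A runnable sketch of the correlated subquery against all three sample products, using Python's sqlite3 module as a stand-in database (an assumption) and a ? placeholder instead of the hard-coded product = 1. The helper name nulls_before_yes is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product INTEGER, valid TEXT, Date TEXT);
INSERT INTO products VALUES
  (1, NULL, '2016-05-10'), (1, NULL, '2016-05-09'), (1, 'yes', '2016-05-08'),
  (2, NULL, '2016-05-10'), (2, 'yes', '2016-05-09'), (2, NULL, '2016-05-08'),
  (3, 'yes', '2016-05-10'), (3, 'yes', '2016-05-09'), (3, NULL, '2016-05-08');
""")

QUERY = """
SELECT COUNT(*)
FROM products AS p1
WHERE product = ?
  AND Date > (  -- latest date with 'yes' for the same product
      SELECT MAX(Date) FROM products AS p2
      WHERE p1.product = p2.product AND valid = 'yes')
"""

def nulls_before_yes(product):
    """Hypothetical helper: rows newer than the product's latest 'yes'."""
    return conn.execute(QUERY, (product,)).fetchone()[0]

counts = {p: nulls_before_yes(p) for p in (1, 2, 3)}
print(counts)  # {1: 2, 2: 1, 3: 0}
```

If a product has no 'yes' row at all, MAX(Date) is NULL, the comparison is never true, and the query returns 0.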
This should do it:
select count(1) from products where product = 1 and valid is null and date > (select max(date) from products where product = 1 and valid = 'yes')
Not sure if your logic provided covers all the possible weird and wonderful extreme scenarios but the following piece of code would do what you are after:
select a.product,
count(IIF(a.valid is null and a.date >maxdate,a.date,null)) as total
from sometable a
inner join (
select product, max(date) as Maxdate
from sometable where valid='yes' group by product
) b
on a.product=b.product group by a.product
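The same answer as a runnable SQLite sketch via Python's sqlite3 module (an assumption). IIF is SQL Server syntax, so this version swaps in the equivalent standard CASE expression; COUNT ignores the NULLs the CASE produces, so only nulls newer than the product's latest 'yes' are counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sometable (product INTEGER, valid TEXT, date TEXT);
INSERT INTO sometable VALUES
  (1, NULL, '2016-05-10'), (1, NULL, '2016-05-09'), (1, 'yes', '2016-05-08'),
  (2, NULL, '2016-05-10'), (2, 'yes', '2016-05-09'), (2, NULL, '2016-05-08'),
  (3, 'yes', '2016-05-10'), (3, 'yes', '2016-05-09'), (3, NULL, '2016-05-08');
""")

rows = conn.execute("""
SELECT a.product,
       COUNT(CASE WHEN a.valid IS NULL AND a.date > b.maxdate
                  THEN a.date END) AS total  -- CASE used in place of IIF
FROM sometable a
INNER JOIN (SELECT product, MAX(date) AS maxdate
            FROM sometable WHERE valid = 'yes' GROUP BY product) b
  ON a.product = b.product
GROUP BY a.product
ORDER BY a.product
""").fetchall()

print(rows)  # [(1, 2), (2, 1), (3, 0)]
```

Note that the INNER JOIN drops products that never have a 'yes', so they do not appear in the result at all.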

SQL Aggregation logic - Not sure

I am implementing several separate pieces of logic between 2 tables by joining them in a view. I need a minimal number of views with all the logic implemented. I am stuck on the following issue and need your expertise.
I have arrived at the base logic of the view; it serves as the base for most of the other logic, and I have to stick to it:
SELECT Acct_no, max(txn_date),......
FROM ACCT_CRD ac
INNER JOIN TRNSCTN txn ON ( ac.crd_no = txn.crd_no)
GROUP BY ACCT_NO, TO_CHAR(TXN_DATE,'YYYYMM');
Table_name: ACCT_CRD (This table has the account and credit card numbers, with a UPI on the credit card number; a single account number can have multiple card numbers.)
Data:
Acct_no | Crd_no | biz_date | Status
--------+--------+------------+--------
acct1 | crd11 | 2015-10-01 | A
--------+--------+------------+--------
acct1 | crd12 | 2015-10-02 | A
--------+--------+------------+--------
acct1 | crd13 | 2015-10-03 | A
Table_name: TRNSCTN (This table has the transactions made with the credit cards; the data doesn't reflect any actual meaning.) Please note that this table has 5 years of dates; for this sample I have taken only 1 month:
Data:
Crd_no | Txn_date | Txn_code | crd_limit | crd_commit
--------+-------------+----------+-------------+------------
crd11 | 2015-10-02 | 10 | 10000 | 9000
--------+-------------+----------+-------------+------------
crd11 | 2015-10-02 | 10 | 10000 | 14000
--------+-------------+----------+-------------+------------
crd11 | 2015-10-02 | 20 | 10000 | 16000
--------+-------------+----------+-------------+------------
crd11 | 2015-10-03 | 20 | 10000 | 12000
--------+-------------+----------+-------------+------------
crd11 | 2015-10-05 | 20 | 10000 | 15000
--------+-------------+----------+-------------+------------
crd12 | 2015-10-03 | 10 | 20000 | 5000
--------+-------------+----------+-------------+------------
crd12 | 2015-10-03 | 20 | 20000 | 22000
--------+-------------+----------+-------------+------------
crd12 | 2015-10-04 | 30 | 20000 | 25000
--------+-------------+----------+-------------+------------
crd12 | 2015-10-04 | 30 | 20000 | 5000
--------+-------------+----------+-------------+------------
crd13 | 2015-10-04 | 30 | 25000 | 10000
Here, in the TRNSCTN table, for each card on each day: if CRD_COMMIT > CRD_LIMIT, count that day as 1, even if there are more records with the same crd_no and txn_date, whether crd_commit >= crd_limit or crd_commit < crd_limit.
The order of transactions within a given day is not important:
SELECT crd_no, txn_date,
MAX(case when crd_commit > crd_limit then 1 else 0 end) day_overlimit_cnt
FROM TRNSCTN group by crd_no, txn_date;
Essentially, the above data in the table transforms to
Crd_no | txn_date | day_overlimit_cnt
-------+-------------+-------------------
crd11 | 2015-10-02 | 1
-------+-------------+-------------------
crd11 | 2015-10-03 | 1
-------+-------------+-------------------
crd11 | 2015-10-05 | 1
-------+-------------+-------------------
crd12 | 2015-10-03 | 1
-------+-------------+-------------------
crd12 | 2015-10-04 | 1
-------+-------------+-------------------
crd13 | 2015-10-04 | 0
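A quick check of the day-level step, run in SQLite through Python's sqlite3 module (an assumption; the original appears to target a database with TO_CHAR, which SQLite lacks, but this step does not need it). Loading the sample TRNSCTN rows and running the MAX(CASE ...) query reproduces the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TRNSCTN (crd_no TEXT, txn_date TEXT, txn_code INTEGER,
                      crd_limit INTEGER, crd_commit INTEGER);
INSERT INTO TRNSCTN VALUES
  ('crd11', '2015-10-02', 10, 10000, 9000), ('crd11', '2015-10-02', 10, 10000, 14000),
  ('crd11', '2015-10-02', 20, 10000, 16000), ('crd11', '2015-10-03', 20, 10000, 12000),
  ('crd11', '2015-10-05', 20, 10000, 15000), ('crd12', '2015-10-03', 10, 20000, 5000),
  ('crd12', '2015-10-03', 20, 20000, 22000), ('crd12', '2015-10-04', 30, 20000, 25000),
  ('crd12', '2015-10-04', 30, 20000, 5000), ('crd13', '2015-10-04', 30, 25000, 10000);
""")

# One row per card and day: 1 if any transaction that day was over the limit.
rows = conn.execute("""
SELECT crd_no, txn_date,
       MAX(CASE WHEN crd_commit > crd_limit THEN 1 ELSE 0 END) AS day_overlimit_cnt
FROM TRNSCTN
GROUP BY crd_no, txn_date
ORDER BY crd_no, txn_date
""").fetchall()

for row in rows:
    print(row)
```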
Then, for each card in a given month, I need to find how many times it exceeded the limit (the sum of day_overlimit_cnt):
SELECT crd_no, to_char(txn_date,'yyyymm') as txn_mnth,
SUM(day_overlimit_cnt) sum_month_ovrlmt from (select crd_no, txn_date,
MAX(case when crd_commit > crd_limit then 1 else 0 end) day_overlimit_cnt
FROM TRNSCTN group by crd_no, txn_date) dt_check
GROUP BY crd_no, to_char(txn_date,'yyyymm');
The data from the above query will be
Crd_no | txn_mnth | sum_month_ovrlmt
-------+----------+-----------------
crd11 | 201510 | 3
-------+----------+-----------------
crd12 | 201510 | 2
-------+----------+-----------------
crd13 | 201510 | 0
And finally, find max(sum_month_ovrlmt) at the account level by joining the above result to ACCT_CRD:
SELECT acct_no, MAX(sum_month_ovrlmt) acct_mnth_ovrlmt
FROM ACCT_CRD ac
JOIN (SELECT crd_no, to_char(txn_date,'yyyymm') as txn_mnth,
SUM(day_overlimit_cnt) sum_month_ovrlmt FROM (SELECT crd_no, txn_date,
MAX(case when crd_commit > crd_limit then 1 else 0 end) day_overlimit_cnt
FROM TRNSCTN group by crd_no, txn_date) dt_check
GROUP BY crd_no, to_char(txn_date,'yyyymm')) dt on (ac.crd_no = dt.crd_no) GROUP BY acct_no;
Final output:
Acct_no | acct_mnth_ovrlmt
--------+-----------------
acct1 | 3
How can I embed the above logic into the following base query? That is, how can I derive acct_mnth_ovrlmt without affecting the data of the other columns in the SELECT list?
SELECT Acct_no, max(txn_date),......
FROM ACCT_CRD ac
INNER JOIN TRNSCTN txn ON ( ac.crd_no = txn.crd_no)
GROUP BY ACCT_NO, TO_CHAR(TXN_DATE,'YYYYMM');
Thanks in advance for your time. As a last resort, I will try embedding the above derived code, up to the card-level aggregation, into the base query.
Greetings Gordon Linoff,
Thank you for your post. I need the distinct conditional count you mentioned, but at card_number level. And since an account_number can have more than 1 card_number, I need to find the max(overlimit_cnt) at account level.
For example, if the distinct conditional counts are:
crd11 | 3
crd12 | 2
crd13 | 0
then, since all these card_numbers belong to acct1, I need the max(overlimit_cnt) of the above card_numbers, i.e.:
acct1 | 3
I guess I need another join to the same table, grouped by different columns:
SELECT Acct_no, max(txn_date),......,MAX(day_overlimit_cnt)
FROM ACCT_CRD ac
INNER JOIN TRNSCTN txn ON ( ac.crd_no = txn.crd_no)
INNER JOIN ( SELECT CRD_NO, TO_CHAR(TXN_DATE,'YYYYMM') AS TXN_DATE_Y,
COUNT(DISTINCT (CASE WHEN crd_commit > crd_limit then TXN_DATE end)) day_overlimit_cnt from TRNSCTN GROUP BY CRD_NO, TO_CHAR(TXN_DATE,'YYYYMM')) TRNSCTN_OVRLMT ON (txn.CRD_NO=TRNSCTN_OVRLMT.CRD_NO AND TO_CHAR(txn.TXN_DATE,'YYYYMM')=TRNSCTN_OVRLMT.TXN_DATE_Y) GROUP BY ACCT_NO, TO_CHAR(TXN_DATE,'YYYYMM');
Can I avoid the new join to TRNSCTN_OVRLMT and still derive the above value?
Your logic is a little hard to follow, but I think you just want conditional count distinct:
SELECT Acct_no, max(txn_date),
COUNT(DISTINCT (CASE WHEN crd_commit > crd_limit THEN txn_date END)) as DaysOverLimit
FROM ACCT_CRD ac INNER JOIN
TRNSCTN txn
ON ac.crd_no = txn.crd_no
GROUP BY ACCT_NO, TO_CHAR(TXN_DATE,'YYYYMM');
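One way to meet the follow-up requirement, sketched in SQLite through Python's sqlite3 module (an assumption; strftime('%Y%m', ...) stands in for TO_CHAR(..., 'YYYYMM')): apply the answer's conditional COUNT(DISTINCT ...) per card and month, then take the MAX per account and month in an outer query. This still uses a derived table, but it reads TRNSCTN only once instead of joining a second aggregate of it. Other columns of the base query would have to be computed in the inner query and re-aggregated outside (for example, MAX of MAX(txn_date)).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ACCT_CRD (acct_no TEXT, crd_no TEXT, biz_date TEXT, status TEXT);
INSERT INTO ACCT_CRD VALUES ('acct1', 'crd11', '2015-10-01', 'A'),
  ('acct1', 'crd12', '2015-10-02', 'A'), ('acct1', 'crd13', '2015-10-03', 'A');
CREATE TABLE TRNSCTN (crd_no TEXT, txn_date TEXT, txn_code INTEGER,
                      crd_limit INTEGER, crd_commit INTEGER);
INSERT INTO TRNSCTN VALUES
  ('crd11', '2015-10-02', 10, 10000, 9000), ('crd11', '2015-10-02', 10, 10000, 14000),
  ('crd11', '2015-10-02', 20, 10000, 16000), ('crd11', '2015-10-03', 20, 10000, 12000),
  ('crd11', '2015-10-05', 20, 10000, 15000), ('crd12', '2015-10-03', 10, 20000, 5000),
  ('crd12', '2015-10-03', 20, 20000, 22000), ('crd12', '2015-10-04', 30, 20000, 25000),
  ('crd12', '2015-10-04', 30, 20000, 5000), ('crd13', '2015-10-04', 30, 25000, 10000);
""")

# Inner query: distinct over-limit days per card and month.
# Outer query: the largest of those counts per account and month.
per_card_max = conn.execute("""
SELECT acct_no, txn_mnth, MAX(days_over) AS acct_mnth_ovrlmt
FROM (
  SELECT ac.acct_no, txn.crd_no,
         strftime('%Y%m', txn.txn_date) AS txn_mnth,
         COUNT(DISTINCT CASE WHEN txn.crd_commit > txn.crd_limit
                             THEN txn.txn_date END) AS days_over
  FROM ACCT_CRD ac JOIN TRNSCTN txn ON ac.crd_no = txn.crd_no
  GROUP BY ac.acct_no, txn.crd_no, strftime('%Y%m', txn.txn_date)
) per_card
GROUP BY acct_no, txn_mnth
""").fetchall()

# The answer's account-level version, for comparison.
account_level = conn.execute("""
SELECT ac.acct_no,
       COUNT(DISTINCT CASE WHEN crd_commit > crd_limit THEN txn_date END)
FROM ACCT_CRD ac JOIN TRNSCTN txn ON ac.crd_no = txn.crd_no
GROUP BY ac.acct_no, strftime('%Y%m', txn_date)
""").fetchall()

print(per_card_max)   # [('acct1', '201510', 3)]
print(account_level)  # [('acct1', 4)]
```

The account-level COUNT(DISTINCT ...) counts a day once if any card on the account was over its limit, which for this sample gives 4 days (2015-10-02 through 2015-10-05) rather than the per-card maximum of 3.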