Teradata SQL aggregate with conditions

I have the table below, with one month of data for some customers, and I need to aggregate it based on conditions on the values of each column.
The conditions:
If the customer has >= 2 rows with the "Very Unstable" Stability status, return 1, else 0.
For Value_1, if the customer has at least one record with the value 1 anywhere in the month, return 1, else 0.
|Cust_ID|Date |Stability |Value_1|
|-------+--------+--------------+-------|
|123 |3/1/2022|Unstable |1 |
|123 |3/2/2022|Very Unstable |0 |
|123 |3/3/2022|Stable |1 |
|123 |3/4/2022|Very Stable |NULL |
|123 |3/5/2022|Unstable |NULL |
|123 |3/6/2022|Very Unstable |0 |
|123 |3/7/2022|Unstable |0 |
|123 |3/8/2022|Very Unstable |0 |
|… |… |… |… |
|123 |3/31/2022|Very Unstable|0 |
The desired result table looks like this:
|Cust_ID|Stability|Value_1|
|-------+---------+-------|
|123 |1 |1 |

This seems to match your description:
SELECT Cust_ID
       -- If the customer's stability has >= 2 "Very Unstable" rows
       -- then return 1 else 0
     , CASE
         WHEN COUNT(CASE WHEN Stability = 'Very Unstable' THEN 1 END) >= 2
         THEN 1
         ELSE 0
       END AS STABILITY_COUNT
       -- In Value_1, if the customer has at least one record with the
       -- value 1 over the entire month then return 1 else 0
     , MAX(Value_1) AS Value_1_m
FROM ST_TABLE
GROUP BY 1
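Not Teradata, but the same conditional-aggregation pattern can be exercised end to end with Python's sqlite3 module. A sketch only: the sample rows are a made-up subset of the question's data, and the Date column is renamed Dt here as an arbitrary naming choice.

```python
import sqlite3

# In-memory stand-in for ST_TABLE with a subset of the question's rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ST_TABLE (Cust_ID INT, Dt TEXT, Stability TEXT, Value_1 INT)")
conn.executemany("INSERT INTO ST_TABLE VALUES (?, ?, ?, ?)", [
    (123, "2022-03-01", "Unstable",      1),
    (123, "2022-03-02", "Very Unstable", 0),
    (123, "2022-03-03", "Stable",        1),
    (123, "2022-03-04", "Very Stable",   None),
    (123, "2022-03-06", "Very Unstable", 0),
])

result = conn.execute("""
    SELECT Cust_ID,
           CASE WHEN COUNT(CASE WHEN Stability = 'Very Unstable' THEN 1 END) >= 2
                THEN 1 ELSE 0
           END AS STABILITY_COUNT,
           MAX(Value_1) AS Value_1_m   -- aggregate MAX ignores NULLs
    FROM ST_TABLE
    GROUP BY Cust_ID
""").fetchone()
print(result)  # (123, 1, 1)
```

The inner CASE yields NULL for non-matching rows, and COUNT only counts non-NULL values, which is what makes the conditional count work.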

Related

In PostgreSQL, conditionally count rows

Background
I'm a novice Postgres user running a local server on a Windows 10 machine. I've got a dataset g that looks like this:
+--+---------+----------------+
|id|treatment|outcome_category|
+--+---------+----------------+
|a |1 |cardiovascular |
|a |0 |cardiovascular |
|b |0 |metabolic |
|b |0 |sensory |
|c |1 |NULL |
|c |0 |cardiovascular |
|c |1 |sensory |
|d |1 |NULL |
|d |0 |cns |
+--+---------+----------------+
The Problem
I'd like to get a count of rows per outcome_category for those ids that are "ever treated" -- defined as "ids that have any row where treatment=1".
Here's the desired result:
+----------------+---------+
|outcome_category| count |
+----------------+---------+
|cardiovascular | 3 |
|sensory | 1 |
|cns | 1 |
+----------------+---------+
It would be fine if the result had to contain metabolic, like so:
+----------------+---------+
|outcome_category| count |
+----------------+---------+
|cardiovascular | 3 |
|metabolic | 0 |
|sensory | 1 |
|cns | 1 |
+----------------+---------+
Obviously I don't need the rows to be in any particular order, though descending would be nice.
What I've tried
Here's a query I've written:
select treatment, outcome_category, sum(outcome_ct)
from (select max(treatment) as treatment,
             outcome_category,
             count(outcome_category) as outcome_ct
      from g
      group by outcome_category) as sub
group by outcome_category, sub.treatment;
But it's a mishmash result:
+---------+----------------+---+
|treatment|outcome_category|sum|
+---------+----------------+---+
|1 |cardiovascular |3 |
|1 |sensory |2 |
|0 |metabolic |1 |
|1 |NULL |0 |
|0 |cns |1 |
+---------+----------------+---+
I'm trying to identify the "ever exposed" id's using that first line in the subquery: select max(treatment) as treatment. But I'm not quite getting at the rest of it.
EDIT
I realized that the toy dataset g I originally gave you above doesn't correspond to the idiosyncrasies of my real dataset. I've updated g to reflect that many id's who are "ever treated" won't have a non-null outcome_category next to a row with treatment=1.
Interesting little problem. You can do:
select outcome_category,
       count(x.id) as count
from g
left join (
    select distinct id from g where treatment = 1
) x on x.id = g.id
where outcome_category is not null
group by outcome_category
order by count desc
Result:
outcome_category count
----------------- -----
cardiovascular 3
sensory 1
cns 1
metabolic 0
See running example at db<>fiddle.
This would appear to be just a simple aggregation,
select outcome_category, Count(*) count
from g
where treatment = 1
group by outcome_category
order by Count(*) desc
Demo fiddle
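The left-join approach above can be sketched against SQLite via Python, with g recreated from the question (the table and data are exactly the question's; only the runner is an assumption):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE g (id TEXT, treatment INT, outcome_category TEXT)")
conn.executemany("INSERT INTO g VALUES (?, ?, ?)", [
    ("a", 1, "cardiovascular"), ("a", 0, "cardiovascular"),
    ("b", 0, "metabolic"),      ("b", 0, "sensory"),
    ("c", 1, None),             ("c", 0, "cardiovascular"),
    ("c", 1, "sensory"),        ("d", 1, None),
    ("d", 0, "cns"),
])

# count(x.id) counts only rows whose id matched the "ever treated" subquery,
# so never-treated categories still appear, with a count of 0.
rows = conn.execute("""
    SELECT g.outcome_category, COUNT(x.id) AS cnt
    FROM g
    LEFT JOIN (SELECT DISTINCT id FROM g WHERE treatment = 1) x
           ON x.id = g.id
    WHERE g.outcome_category IS NOT NULL
    GROUP BY g.outcome_category
    ORDER BY cnt DESC
""").fetchall()
print(rows)
```

The key detail is COUNT(x.id) rather than COUNT(*): a LEFT JOIN row with no match has x.id NULL, which COUNT skips.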

Conditional count of rows where at least one peer qualifies

Background
I'm a novice SQL user. Using PostgreSQL 13 on Windows 10 locally, I have a table t:
+--+---------+-------+
|id|treatment|outcome|
+--+---------+-------+
|a |1 |0 |
|a |1 |1 |
|b |0 |1 |
|c |1 |0 |
|c |0 |1 |
|c |1 |1 |
+--+---------+-------+
The Problem
I didn't explain myself well initially, so I've rewritten the goal.
Desired result:
+-----------------------+-----+
|ever treated |count|
+-----------------------+-----+
|0 |1 |
|1 |3 |
+-----------------------+-----+
First, identify id that have ever been treated. Being "ever treated" means having any row with treatment = 1.
Second, count rows with outcome = 1 for each of those two groups. From my original table, the ids who are "ever treated" have a total of 3 rows with outcome = 1, and the "never treated", so to speak, have 1 row with outcome = 1.
What I've tried
I can get much of the way there, I think, with something like this:
select treatment, count(outcome)
from t
group by treatment;
But that only gets me this result:
+---------+-----+
|treatment|count|
+---------+-----+
|0 |2 |
|1 |4 |
+---------+-----+
For the updated question:
SELECT ever_treated, sum(outcome_ct) AS count
FROM  (
   SELECT id
        , max(treatment) AS ever_treated
        , count(*) FILTER (WHERE outcome = 1) AS outcome_ct
   FROM   t
   GROUP  BY 1
   ) sub
GROUP  BY 1;
ever_treated | count
--------------+-------
0 | 1
1 | 3
db<>fiddle here
Read:
For those who got no treatment at all (all treatment = 0), we see 1 x outcome = 1.
For those who got any treatment (at least one treatment = 1), we see 3 x outcome = 1.
Would be simpler and faster with proper boolean values instead of integer.
(Answer to updated question)
Here is easy-to-follow subquery logic that works with integers:
select subq.ever_treated, sum(subq.count) as count
from (select id, max(treatment) as ever_treated, count(*) as count
      from t
      where outcome = 1  -- note: ever_treated is derived from outcome = 1 rows only
      group by id) as subq
group by subq.ever_treated;
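The FILTER-based answer above can be checked quickly in SQLite through Python (FILTER on aggregates needs SQLite 3.30+; the table and rows are the question's own):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id TEXT, treatment INT, outcome INT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("a", 1, 0), ("a", 1, 1), ("b", 0, 1),
    ("c", 1, 0), ("c", 0, 1), ("c", 1, 1),
])

# Per id: were they ever treated, and how many outcome = 1 rows do they have?
# Then sum those per-id counts within each ever_treated group.
rows = conn.execute("""
    SELECT ever_treated, SUM(outcome_ct) AS count
    FROM (SELECT id,
                 MAX(treatment) AS ever_treated,
                 COUNT(*) FILTER (WHERE outcome = 1) AS outcome_ct
          FROM t
          GROUP BY id) sub
    GROUP BY ever_treated
    ORDER BY ever_treated
""").fetchall()
print(rows)  # [(0, 1), (1, 3)]
```

Computing MAX(treatment) over all of an id's rows (not just its outcome = 1 rows) is what makes the "ever treated" classification robust.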

SQL add multiple calculated rows after using pivot

I have a table of data that looks like this:
|Agency|Rating |AVG|
------------------------
|Army |Exceptional|10 |
|Navy |Very Good |8.5|
And I need to pivot/count the number of each type of rating as well as calculate the totals and percentages of each rating category so it looks like this:
|Rating |Army|Navy|
--------------------------
|Exceptional |1 |0 |
|Very Good |0 |1 |
|Satisfactory |0 |0 |
|Marginal |0 |0 |
|Unsatisfactory|0 |0 |
|Total |1 |1 |
|% of Ex |100 |0 |
|% of VG |0 |100 |
|% of Sat |0 |0 |
|% of Mar |0 |0 |
|% of Uns |0 |0 |
Using the following Query:
select case when grouping([RatingWords]) = 1 then 'Total'
            else [RatingWords] end as Rating,
       sum([Army]) as Army,
       sum([Navy]) as Navy
from (select ratingwords, agency,
             case when Average between 8.6 and 10  then 1 else 0 end as Exceptional,
             case when Average between 7.1 and 8.5 then 1 else 0 end as VeryGood,
             case when Average between 3.1 and 7.0 then 1 else 0 end as Satisfactory,
             case when Average between 0.1 and 3.0 then 1 else 0 end as Marginal,
             case when Average = 0 then 1 else 0 end as Unsatisfactory
      from dbo.DOD_Average) as sourcetable
pivot
(
    count(Exceptional)
    for Agency in ([Navy], [Army])
) as PivotTable
group by grouping sets ((RatingWords), ())
I get the following table:
|Rating |Army|Navy|
--------------------------
|Exceptional |1 |0 |
|Very Good |0 |1 |
|Satisfactory |0 |0 |
|Marginal |0 |0 |
|Unsatisfactory|0 |0 |
|Total |1 |1 |
So my question is: how can I add another group below the Total row to calculate the percentages? Or, if this is not the best way to create this report, I'm open to reworking it.
Keep your above query's result in a table variable and UNION ALL the percentage queries. Something like this:
Declare @table table (Rating nvarchar(60), Army int, Navy int);
Insert into @table
select case when grouping([RatingWords]) = 1 then 'Total'
            else [RatingWords] end as Rating,
       sum([Army]) as Army,
       sum([Navy]) as Navy
from (select ratingwords, agency,
             case when Average between 8.6 and 10  then 1 else 0 end as Exceptional,
             case when Average between 7.1 and 8.5 then 1 else 0 end as VeryGood,
             case when Average between 3.1 and 7.0 then 1 else 0 end as Satisfactory,
             case when Average between 0.1 and 3.0 then 1 else 0 end as Marginal,
             case when Average = 0 then 1 else 0 end as Unsatisfactory
      from dbo.DOD_Average) as sourcetable
pivot
(
    count(Exceptional)
    for Agency in ([Navy], [Army])
) as PivotTable
group by grouping sets ((RatingWords), ());

Declare @army int, @navy int;
Select @army = Army, @navy = Navy from @table where Rating = 'Total';

Select * from @table
union all
Select '% of ' + Rating,
       Army * 100 / @army,  -- multiply before dividing to avoid integer truncation
       Navy * 100 / @navy
from @table
where Rating <> 'Total';
Using your existing query as a CTE, you can SELECT * from it, and follow that with a series of UNION ALL queries that get each of the percentage rows that you want.
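The UNION ALL idea (counts and a Total row first, then derived percentage rows) can be sketched outside T-SQL as well. Here a plain summary table stands in for the table variable, and values are multiplied by 100 before dividing so integer division does not truncate the percentages; the summary data matches the tables above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the table variable: the already-pivoted counts plus a Total row.
conn.execute("CREATE TABLE summary (Rating TEXT, Army INT, Navy INT)")
conn.executemany("INSERT INTO summary VALUES (?, ?, ?)", [
    ("Exceptional", 1, 0), ("Very Good", 0, 1), ("Satisfactory", 0, 0),
    ("Marginal", 0, 0), ("Unsatisfactory", 0, 0), ("Total", 1, 1),
])

# Append one percentage row per non-Total rating, dividing by the Total row.
rows = conn.execute("""
    SELECT * FROM summary
    UNION ALL
    SELECT '% of ' || Rating,
           Army * 100 / (SELECT Army FROM summary WHERE Rating = 'Total'),
           Navy * 100 / (SELECT Navy FROM summary WHERE Rating = 'Total')
    FROM summary
    WHERE Rating <> 'Total'
""").fetchall()
for r in rows:
    print(r)
```

The scalar subqueries replace the two scalar variables from the T-SQL version; either way, the totals are read once from the Total row rather than recomputed per rating.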

Sql Server Aggregation or Pivot Table Query

I'm trying to write a query that will tell me the number of customers who had a certain number of transactions each week. I don't know where to start, but I assume it involves an aggregate or pivot function. I'm working in SQL Server Management Studio.
Currently the data looks like this, where the first column is the customer id and each subsequent column is a week:
|Customer| 1 | 2| 3 |4 |
----------------------
|001 |1 | 0| 2 |2 |
|002 |0 | 2| 1 |0 |
|003 |0 | 4| 1 |1 |
|004 |1 | 0| 0 |1 |
I'd like to see a return like the following:
|Visits |1 | 2| 3 |4 |
----------------------
|0 |2 | 2| 1 |1 |
|1 |2 | 0| 2 |2 |
|2 |0 | 1| 1 |1 |
|4 |0 | 1| 0 |0 |
What I want is to get the count of customer transactions per week. E.g. during the 1st week 2 customers (i.e. 002 and 003) had 0 transactions, 2 customers (i.e. 001 and 004) had 1 transaction, whereas zero customers had more than 1 transaction
The query below will get you the result you want, but note that it has the column names hard coded. It's easy to add more week columns, but if the number of columns is unknown then you might want to look into a solution using dynamic SQL (which would require accessing the information schema to get the column names). It's not that hard to turn it into a fully dynamic version though.
select Visits
     , coalesce([1], 0) as Week1
     , coalesce([2], 0) as Week2
     , coalesce([3], 0) as Week3
     , coalesce([4], 0) as Week4
from (
    select *, count(*) c from (
        select '1' W, week1 Visits from t union all
        select '2' W, week2 Visits from t union all
        select '3' W, week3 Visits from t union all
        select '4' W, week4 Visits from t ) a
    group by W, Visits
) x pivot ( max(c) for W in ([1], [2], [3], [4]) ) as pvt;
In the query your table is called t and the output is:
Visits Week1 Week2 Week3 Week4
0 2 2 1 1
1 2 0 2 2
2 0 1 1 1
4 0 1 0 0
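The same unpivot-then-count idea can be sketched in SQLite through Python. SQLite has no PIVOT, so SUM(CASE ...) conditional aggregation spreads the weeks back into columns; the table t and its week1..week4 column names are assumptions matching the answer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Customer TEXT, week1 INT, week2 INT, week3 INT, week4 INT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", [
    ("001", 1, 0, 2, 2), ("002", 0, 2, 1, 0),
    ("003", 0, 4, 1, 1), ("004", 1, 0, 0, 1),
])

# Unpivot the week columns with UNION ALL, then count customers per
# (week, visits) pair and spread the weeks back out as columns.
rows = conn.execute("""
    SELECT Visits,
           SUM(CASE WHEN W = 1 THEN 1 ELSE 0 END) AS Week1,
           SUM(CASE WHEN W = 2 THEN 1 ELSE 0 END) AS Week2,
           SUM(CASE WHEN W = 3 THEN 1 ELSE 0 END) AS Week3,
           SUM(CASE WHEN W = 4 THEN 1 ELSE 0 END) AS Week4
    FROM (SELECT 1 AS W, week1 AS Visits FROM t UNION ALL
          SELECT 2, week2 FROM t UNION ALL
          SELECT 3, week3 FROM t UNION ALL
          SELECT 4, week4 FROM t)
    GROUP BY Visits
    ORDER BY Visits
""").fetchall()
print(rows)
```

One difference from the T-SQL PIVOT version: missing (week, visits) pairs simply sum to 0 here, so no coalesce is needed.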

selecting records matching condition A and at least X matching B

I have table of data as follows
|table_id|ref_table_id|is_used| date |url|
|--------+------------+-------+-------------------+---|
|1 |1 | | |abc|
|2 |1 | |2016-01-01 00:00:00|abc|
|3 |1 |0 | |abc|
|4 |1 |1 | |abc|
|5 |2 | | | |
|6 |2 | |2016-01-01 00:00:00|abc|
|7 |2 |1 | |abc|
|8 |2 |1 |2016-01-01 00:00:00|abc|
|9 |2 |1 |2016-01-01 00:00:00|abc|
|10 |3 | | | |
|11 |3 | |2016-01-01 00:00:00|abc|
|12 |3 |0 | | |
|13 |3 |0 | | |
|14 |3 |0 |2016-01-01 00:00:00| |
|15 |3 |1 |2016-01-01 00:00:00|abc|
...
|int |int |boolean|timestamp |varchar|
As you can see, there is no pattern to which of the columns is_used, date, and url are null or filled.
Now I want to get the distinct ref_table_id values satisfying these conditions:
there is at least 1 row that is not used and has empty date and url
there are fewer than X rows that are not used and have either date or url filled
The table has many rows (~7 million), and a grouped ref_table_id can range from 50 to 600k rows.
I tried the select below, which runs for more than 2 seconds.
select distinct on (ref_table_id)
       t1.ref_table_id,
       count(1) as my_count
from my_table t1
inner join (
    select distinct t2.ref_table_id
    from my_table t2
    where t2.is_used is not true  -- null or false
      and t2.url is null
      and t2.date is null
    group by t2.ref_table_id
) tjoin on t1.ref_table_id = tjoin.ref_table_id
where t1.is_used is not true
  and (t1.date is not null or t1.url is not null)
group by t1.ref_table_id
having count(1) < X
order by 1, 2;
Can I rewrite it using INTERSECT, VIEW or other db features so that it would be faster?
This sounds like aggregation with a having clause:
select ref_table_id
from my_table t
group by ref_table_id
having sum(case when is_used = 0 and date is null and url is null
                then 1 else 0 end) > 0
   and sum(case when is_used = 0 and (date is not null or url is not null)
                then 1 else 0 end) < X;
This checks explicitly for is_used to be 0 as the meaning of "not used". I'm not sure what the blanks represent, so the logic may need to be tweaked.
As a note, you can simplify the query by factoring the common condition on is_used out into the where clause:
select ref_table_id
from my_table t
where is_used = 0 -- or is_used is NULL ??
group by ref_table_id
having sum(case when date is null and url is null
                then 1 else 0 end) > 0
   and sum(case when (date is not null or url is not null)
                then 1 else 0 end) < X;
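The having-clause pattern can be sketched in SQLite through Python. Assumptions: a subset of the question's rows, `date` renamed to `dt` as a stylistic choice, a hypothetical threshold X = 3, the question's "fewer than X" as the comparison, and `IS NOT 1` to treat both NULL and 0 as "not used":

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE my_table
                (table_id INT, ref_table_id INT, is_used INT, dt TEXT, url TEXT)""")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?, ?)", [
    (1, 1, None, None, "abc"), (2, 1, None, "2016-01-01", "abc"),
    (3, 1, 0, None, "abc"),    (4, 1, 1, None, "abc"),
    (5, 2, None, None, None),  (6, 2, None, "2016-01-01", "abc"),
    (7, 2, 1, None, "abc"),
])

X = 3  # hypothetical threshold
# One pass over the table; each HAVING term is a conditional count.
rows = conn.execute("""
    SELECT ref_table_id
    FROM my_table
    GROUP BY ref_table_id
    HAVING SUM(CASE WHEN is_used IS NOT 1 AND dt IS NULL AND url IS NULL
                    THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN is_used IS NOT 1 AND (dt IS NOT NULL OR url IS NOT NULL)
                    THEN 1 ELSE 0 END) < ?
""", (X,)).fetchall()
print(rows)
```

In this subset, ref_table_id 1 has no fully empty unused row, so only ref_table_id 2 qualifies. The single-pass grouped scan is what makes this faster than the self-join attempt.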