SQL Server Aggregation or Pivot Table Query

I'm trying to write a query that will tell me the number of customers who had a certain number of transactions each week. I don't know where to start with the query, but I'd assume it involves an aggregate or pivot function. I'm working in SQL Server Management Studio.
Currently the data looks like this, where the first column is the customer id and each subsequent column is a week:
|Customer| 1 | 2| 3 |4 |
----------------------
|001 |1 | 0| 2 |2 |
|002 |0 | 2| 1 |0 |
|003 |0 | 4| 1 |1 |
|004 |1 | 0| 0 |1 |
I'd like to see a return like the following:
|Visits |1 | 2| 3 |4 |
----------------------
|0 |2 | 2| 1 |1 |
|1 |2 | 0| 2 |2 |
|2 |0 | 1| 1 |1 |
|4 |0 | 1| 0 |0 |
What I want is to get the count of customer transactions per week. E.g. during the 1st week 2 customers (i.e. 002 and 003) had 0 transactions, 2 customers (i.e. 001 and 004) had 1 transaction, whereas zero customers had more than 1 transaction

The query below will get you the result you want, but note that it has the column names hard-coded. It's easy to add more week columns, but if the number of columns is unknown you might want to look into a solution using dynamic SQL (which would require querying the information schema to get the column names). It's not that hard to turn it into a fully dynamic version, though.
select
    Visits
    , coalesce([1],0) as Week1
    , coalesce([2],0) as Week2
    , coalesce([3],0) as Week3
    , coalesce([4],0) as Week4
from (
    select *, count(*) c from (
        select '1' W, week1 Visits from t union all
        select '2' W, week2 Visits from t union all
        select '3' W, week3 Visits from t union all
        select '4' W, week4 Visits from t ) a
    group by W, Visits
) x pivot ( max (c) for W in ([1], [2], [3], [4]) ) as pvt;
The query assumes your table is called t and its week columns are named week1 to week4. The output is:
Visits Week1 Week2 Week3 Week4
0 2 2 1 1
1 2 0 2 2
2 0 1 1 1
4 0 1 0 0
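The unpivot, count, pivot sequence in the query above can also be checked outside the database. The following is only a sketch in Python using the sample rows from the question; the names `rows` and `visit_distribution` are made up for illustration and are not part of the original answer.

```python
from collections import Counter

# Sample data from the question: customer -> transactions in weeks 1..4.
rows = {
    "001": [1, 0, 2, 2],
    "002": [0, 2, 1, 0],
    "003": [0, 4, 1, 1],
    "004": [1, 0, 0, 1],
}

def visit_distribution(rows):
    """Return {visits: [customer count for week 1, week 2, ...]}."""
    n_weeks = len(next(iter(rows.values())))
    # Unpivot to (week, visits) pairs and count them, like the inner GROUP BY.
    counts = Counter(
        (week, visits)
        for per_week in rows.values()
        for week, visits in enumerate(per_week)
    )
    # Pivot: one output row per distinct visit count, zero where no pair exists.
    all_visits = sorted({visits for _, visits in counts})
    return {
        v: [counts.get((w, v), 0) for w in range(n_weeks)]
        for v in all_visits
    }

for visits, per_week in visit_distribution(rows).items():
    print(visits, per_week)
```

Because the weeks are read from the length of each row rather than from a fixed list, this version does not need the hard-coded column names that the static PIVOT requires.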

Related

In PostgreSQL, conditionally count rows

Background
I'm a novice Postgres user running a local server on a Windows 10 machine. I've got a dataset g that looks like this:
+--+---------+----------------+
|id|treatment|outcome_category|
+--+---------+----------------+
|a |1 |cardiovascular |
|a |0 |cardiovascular |
|b |0 |metabolic |
|b |0 |sensory |
|c |1 |NULL |
|c |0 |cardiovascular |
|c |1 |sensory |
|d |1 |NULL |
|d |0 |cns |
+--+---------+----------------+
The Problem
I'd like to get a count of outcome_category by outcome_category for those id who are "ever treated" -- defined as "id's who have any row where treatment=1".
Here's the desired result:
+----------------+---------+
|outcome_category| count |
+----------------+---------+
|cardiovascular | 3 |
|sensory | 1 |
|cns | 1 |
+----------------+---------+
It would be fine if the result had to contain metabolic, like so:
+----------------+---------+
|outcome_category| count |
+----------------+---------+
|cardiovascular | 3 |
|metabolic | 0 |
|sensory | 1 |
|cns | 1 |
+----------------+---------+
Obviously I don't need the rows to be in any particular order, though descending would be nice.
What I've tried
Here's a query I've written:
select treatment, outcome_category, sum(outcome_ct)
from (select max(treatment) as treatment,
outcome_category,
count(outcome_category) as outcome_ct
from g
group by outcome_category) as sub
group by outcome_category, sub.treatment;
But it's a mishmash result:
+---------+----------------+---+
|treatment|outcome_category|sum|
+---------+----------------+---+
|1 |cardiovascular |3 |
|1 |sensory |2 |
|0 |metabolic |1 |
|1 |NULL |0 |
|0 |cns |1 |
+---------+----------------+---+
I'm trying to identify the "ever exposed" id's using that first line in the subquery: select max(treatment) as treatment. But I'm not quite getting at the rest of it.
EDIT
I realized that the toy dataset g I originally gave you above doesn't correspond to the idiosyncrasies of my real dataset. I've updated g to reflect that many id's who are "ever treated" won't have a non-null outcome_category next to a row with treatment=1.
Interesting little problem. You can do:
select
outcome_category,
count(x.id) as count
from g
left join (
select distinct id from g where treatment = 1
) x on x.id = g.id
where outcome_category is not null
group by outcome_category
order by count desc
Result:
outcome_category count
----------------- -----
cardiovascular 3
sensory 1
cns 1
metabolic 0
See running example at db<>fiddle.
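To try the join-based query without a Postgres server, the sample rows can be loaded into an in-memory SQLite database. This is a sketch under the assumption that SQLite behaves like Postgres for this simple query; the alias `n` replaces `count` here only to avoid any ambiguity with the function name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE g (id TEXT, treatment INTEGER, outcome_category TEXT)")
conn.executemany(
    "INSERT INTO g VALUES (?, ?, ?)",
    [
        ("a", 1, "cardiovascular"), ("a", 0, "cardiovascular"),
        ("b", 0, "metabolic"), ("b", 0, "sensory"),
        ("c", 1, None), ("c", 0, "cardiovascular"), ("c", 1, "sensory"),
        ("d", 1, None), ("d", 0, "cns"),
    ],
)

query = """
SELECT g.outcome_category, COUNT(x.id) AS n  -- COUNT skips NULLs from unmatched ids
FROM g
LEFT JOIN (SELECT DISTINCT id FROM g WHERE treatment = 1) AS x
       ON x.id = g.id
WHERE g.outcome_category IS NOT NULL
GROUP BY g.outcome_category
ORDER BY n DESC
"""
counts = dict(conn.execute(query).fetchall())
print(counts)
```

The left join keeps rows for ids that were never treated, and `COUNT(x.id)` ignores them, which is why metabolic appears with a count of 0 instead of disappearing.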
This would appear to be just a simple aggregation,
select outcome_category, Count(*) count
from t
where treatment=1
group by outcome_category
order by Count(*) desc
Demo fiddle

Conditional count of rows where at least one peer qualifies

Background
I'm a novice SQL user. Using PostgreSQL 13 on Windows 10 locally, I have a table t:
+--+---------+-------+
|id|treatment|outcome|
+--+---------+-------+
|a |1 |0 |
|a |1 |1 |
|b |0 |1 |
|c |1 |0 |
|c |0 |1 |
|c |1 |1 |
+--+---------+-------+
The Problem
I didn't explain myself well initially, so I've rewritten the goal.
Desired result:
+-----------------------+-----+
|ever treated |count|
+-----------------------+-----+
|0 |1 |
|1 |3 |
+-----------------------+-----+
First, identify id that have ever been treated. Being "ever treated" means having any row with treatment = 1.
Second, count rows with outcome = 1 for each of those two groups. From my original table, the ids who are "ever treated" have a total of 3 rows with outcome = 1, and the "never treated", so to speak, have 1 row with outcome = 1.
What I've tried
I can get much of the way there, I think, with something like this:
select treatment, count(outcome)
from t
group by treatment;
But that only gets me this result:
+---------+-----+
|treatment|count|
+---------+-----+
|0 |2 |
|1 |4 |
+---------+-----+
For the updated question:
SELECT ever_treated, sum(outcome_ct) AS count
FROM (
SELECT id
, max(treatment) AS ever_treated
, count(*) FILTER (WHERE outcome = 1) AS outcome_ct
FROM t
GROUP BY 1
) sub
GROUP BY 1;
ever_treated | count
--------------+-------
0 | 1
1 | 3
db<>fiddle here
Read:
For those who got no treatment at all (all treatment = 0), we see 1 x outcome = 1.
For those who got any treatment (at least one treatment = 1), we see 3 x outcome = 1.
Would be simpler and faster with proper boolean values instead of integer.
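The two aggregation levels in the query above (first per id, then per ever_treated group) can be traced by hand. Below is a rough Python sketch over the sample table; the variable names are illustrative only.

```python
from collections import defaultdict

# Rows of t from the question: (id, treatment, outcome).
t = [
    ("a", 1, 0), ("a", 1, 1),
    ("b", 0, 1),
    ("c", 1, 0), ("c", 0, 1), ("c", 1, 1),
]

# Level 1: per id, max(treatment) and the number of rows with outcome = 1.
per_id = defaultdict(lambda: [0, 0])  # id -> [ever_treated, outcome_ct]
for id_, treatment, outcome in t:
    per_id[id_][0] = max(per_id[id_][0], treatment)
    per_id[id_][1] += 1 if outcome == 1 else 0

# Level 2: sum the per-id outcome counts within each ever_treated group.
totals = defaultdict(int)
for ever_treated, outcome_ct in per_id.values():
    totals[ever_treated] += outcome_ct

print(dict(totals))
```

Note that every row of an id contributes to `ever_treated`, including rows with outcome = 0; only the outcome counter is conditional.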
(Answer to updated question)
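Here is an easy-to-follow subquery that works with integer columns:
select subq.ever_treated, sum(subq.outcome_ct) as count
from (select id, max(treatment) as ever_treated,
             -- count outcome = 1 here rather than in a WHERE clause, so that
             -- max(treatment) still sees rows with outcome = 0
             sum(case when outcome = 1 then 1 else 0 end) as outcome_ct
      from t
      group by id) as subq
group by subq.ever_treated;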

How to subset the readmitted cases from an inpatients’ table to calculate the total length of stay of the readmitted cases in SQL Server 17?

I am working with an inpatients' data table that looks like the following:
ID   | AdmissionDate | DischDate  | LOS | Readmitted30days
-----+---------------+------------+-----+-----------------
001  | 2014-01-01    | 2014-01-12 | 11  | 1
101  | 2014-02-05    | 2014-02-12 | 7   | 1
001  | 2014-02-18    | 2018-02-27 | 9   | 1
001  | 2018-02-01    | 2018-02-13 | 12  | 0
212  | 2014-01-28    | 2014-02-12 | 15  | 1
212  | 2014-03-02    | 2014-03-15 | 13  | 0
212  | 2016-12-23    | 2016-12-29 | 4   | 0
1011 | 2017-06-10    | 2017-06-21 | 11  | 0
401  | 2018-01-01    | 2018-01-11 | 10  | 0
401  | 2018-10-01    | 2018-10-10 | 9   | 0
I want to create another table from the above in which the total length of stay (LOS) is summed up for those who have been readmitted within 30 days. The table I want to create looks like the following:
ID   | Total LOS
-----+----------
001  | 39
212  | 28
212  | 4
1011 | 11
401  | 10
401  | 9
I am using SQL Server Version 17.
Could anyone help me do this?
Thanks in advance
The Readmitted30days column seems irrelevant to the question and a complete red herring. What you seem to want is to aggregate rows which are within 30 days of each other.
This is a type of gaps-and-islands problem. There are a number of solutions, here is one:
We use LAG to check whether the previous DischDate is within 30 days of this AdmissionDate
Based on that we assign a grouping ID by doing a running count
Then simply group by ID and our grouping ID, and sum
The dates and LOS don't seem to match up, so the query returns both totals:
WITH StartPoints AS (
    SELECT *,
        -- 1 marks the first stay of a new group; NULL marks a stay that
        -- begins less than 30 days after the previous discharge
        IsStart = CASE WHEN
            DATEADD(day, -30, AdmissionDate) <
            LAG(DischDate) OVER (PARTITION BY ID ORDER BY DischDate)
            THEN NULL ELSE 1 END
    FROM YourTable
),
Groupings AS (
    SELECT *,
        GroupId = COUNT(IsStart) OVER (PARTITION BY ID ORDER BY DischDate ROWS UNBOUNDED PRECEDING)
    FROM StartPoints
)
SELECT
    ID,
    TotalBasedOnDates = SUM(DATEDIFF(day, AdmissionDate, DischDate)), -- do you need to add 1 within the sum?
    TotalBasedOnLOS = SUM(LOS)
FROM Groupings
GROUP BY ID, GroupId;
db<>fiddle
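The gaps-and-islands steps listed above (compare each admission with the previous discharge, start a new group when the gap is 30 days or more, then sum per group) can be sketched procedurally. This Python version is an illustration only: it uses a subset of the question's rows, and `total_los_per_episode` and `gap_days` are hypothetical names.

```python
from datetime import date, timedelta
from itertools import groupby

# A subset of the question's rows: (ID, AdmissionDate, DischDate, LOS).
stays = [
    ("212", date(2014, 1, 28), date(2014, 2, 12), 15),
    ("212", date(2014, 3, 2), date(2014, 3, 15), 13),
    ("212", date(2016, 12, 23), date(2016, 12, 29), 4),
    ("401", date(2018, 1, 1), date(2018, 1, 11), 10),
    ("401", date(2018, 10, 1), date(2018, 10, 10), 9),
]

def total_los_per_episode(stays, gap_days=30):
    """Sum LOS over chains of stays separated by fewer than gap_days."""
    totals = []  # list of (ID, total LOS), one entry per island
    ordered = sorted(stays, key=lambda s: (s[0], s[2]))
    for patient, group in groupby(ordered, key=lambda s: s[0]):
        prev_disch = None  # plays the role of LAG(DischDate)
        for _, admitted, discharged, los in group:
            continues = (
                prev_disch is not None
                and admitted - timedelta(days=gap_days) < prev_disch
            )
            if continues:
                totals[-1] = (patient, totals[-1][1] + los)
            else:
                totals.append((patient, los))  # a new island starts here
            prev_disch = discharged
    return totals

print(total_los_per_episode(stays))
```

For these rows the sketch yields 28 and 4 for patient 212 and 10 and 9 for patient 401, which agrees with the table the question asks for.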
If I understand correctly:
select Id, sum(LOS)
from tablename
where Readmitted30days = 1
group by Id
You want to use aggregation:
select id, sum(los)
from t
group by id
having max(Readmitted30days) = 1;
This filters after the aggregation so all los values are included in the sum.
EDIT:
I think I understand. Every occasion where Readmitted30days = 0, you want a row in the result set that combines that row with the following rows up to the next matching row.
If that interpretation is correct, you can construct groups using a cumulative sum and then aggregate:
select id, sum(los)
from (select t.*,
             sum(case when Readmitted30days = 0 then 1 else 0 end) over (partition by id order by admissiondate) as grp
      from t
     ) t
group by id, grp;

How to group query results based on date in one month

I have a table that looks like this:
ID |DATETIME  |T_NUMBER|SOLD|STORE_ID|
---+----------+--------+----+--------+
1  |2019-02-01|1111    |10  |STORE_1 |
2  |2019-02-01|1112    |5   |STORE_1 |
3  |2019-02-02|1113    |10  |STORE_1 |
4  |2019-02-02|1114    |7   |STORE_1 |
5  |2019-02-02|1115    |3   |STORE_1 |
6  |2019-02-03|1116    |4   |STORE_1 |
etc.
The result I want looks like this:
STORE  | 1 | 2 | 3 | 4 | 5 | ..... |28|
-------+---+---+---+---+---+-------+--+
STORE_1| 2 | 3 | 1 | 0 | 0 | ..... |0 |
STORE_2| X | X | X | X | X | ..... |X |
A little explanation: the numbers 1, 2, 3 ... 28 in the header are the dates in February.
The numbers 2, 3, 1, 0 ... 0 are the sum of transactions per date. I want the report to cover one month. STORE_2 stands for any other store whose data I may have in the future.
My T-SQL looks like this (absolutely wrong):
select SUM(T_NUMBER) as 'Total'
from store_logs
group by cast(time as date)
Thanks a lot
You can use the PIVOT operator.
By "sum of transactions per DATE", do you mean the COUNT of transactions per day? If so, change SUM (T_NUMBER) to COUNT (T_NUMBER).
SELECT *
FROM (
    -- T_NUMBER must be selected here so the PIVOT aggregate can see it
    SELECT [STORE_ID], [DAY] = DATEPART (DAY , [DATETIME]), [T_NUMBER]
    FROM store_logs
) AS D
PIVOT
(
    SUM (T_NUMBER)
    FOR [DAY] IN ([1], [2], [3], [4], [5], ... [31])
) AS P
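The expected output in the question (2, 3, 1 for the first three days) corresponds to counting transactions rather than summing T_NUMBER. As a sanity check, here is a small Python sketch that builds the same store-by-day grid from the sample rows; `daily_counts` and `days_in_month` are names invented for this illustration.

```python
from collections import Counter
from datetime import date

# Sample rows from the question: (DATETIME, T_NUMBER, STORE_ID).
store_logs = [
    (date(2019, 2, 1), 1111, "STORE_1"),
    (date(2019, 2, 1), 1112, "STORE_1"),
    (date(2019, 2, 2), 1113, "STORE_1"),
    (date(2019, 2, 2), 1114, "STORE_1"),
    (date(2019, 2, 2), 1115, "STORE_1"),
    (date(2019, 2, 3), 1116, "STORE_1"),
]

def daily_counts(logs, days_in_month=28):
    """Return {store: [transaction count for day 1, day 2, ...]}."""
    # Count rows per (store, day of month), like COUNT(T_NUMBER) in the pivot.
    counts = Counter((store, day.day) for day, _, store in logs)
    stores = sorted({store for store, _ in counts})
    # A Counter returns 0 for missing keys, so days without sales become 0.
    return {
        store: [counts[(store, d)] for d in range(1, days_in_month + 1)]
        for store in stores
    }

report = daily_counts(store_logs)
print(report["STORE_1"][:5])
```

A real report would also filter the rows to a single month first, since day-of-month alone would merge the 1st of February with the 1st of March.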

Hive conditional count by resetting counter?

I have two hive tables, customers and transaction.
customer table
---------------------------------
customer_id | account_threshold
---------------------------------
101 | 200
102 | 500
transaction table
-------------------------------------------
transaction_date | customer_id | amount
-------------------------------------------
07/01/2018 101 250
07/01/2018 102 450
07/02/2018 101 500
07/03/2018 102 100
07/04/2018 102 50
Result:
------------------------------
customer_id | breach_count
------------------------------
101 2
102 1
I have to count the number of instances the sum of amount in transaction table exceeds the account_threshold in customer table.
When a breach is detected I reset the counter to 0.
For customer 101, the first transaction is above the threshold, so the breach count is 1. There is another breach for 101 in the 3rd transaction, so the total breach count for 101 is 2.
For customer 102, the first transaction (450) is below the threshold. The next transaction for 102 is $100, which brings the running total to 550 and breaches the threshold of 500, so breach_count will be 1.
I have tried windowing but I am not able to get any clue how to proceed by joining two tables.
You can try writing a subquery that computes the cumulative amount per customer_id (ordered by amount), then OUTER JOIN it to the customer table and COUNT:
SELECT t.customer_id, COUNT(t.totle) breach_count
FROM customer c
LEFT JOIN
(
select t1.*,SUM(t1.amount) OVER(PARTITION BY t1.customer_id order by t1.amount) as totle
from transaction1 t1
) t on c.customer_id = t.customer_id
WHERE c.account_threshold < t.totle
GROUP BY t.customer_id
Here is a sqlfiddle on SQL Server; it is a different DBMS, but the window function syntax is the same.
[Results]:
| customer_id | breach_count |
|-------------|--------------|
| 101 | 2 |
| 102 | 1 |
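A plain cumulative SUM() OVER never returns to zero, so the query above matches the expected output for this sample but may differ on other data. The reset rule described in the question (the running total goes back to 0 after each breach) is easiest to verify procedurally. The following Python sketch assumes the dates are in MM/DD/YYYY format and that transactions are processed in date order; `breach_counts` is a hypothetical helper, not Hive code.

```python
from datetime import datetime

# Sample data from the question.
thresholds = {101: 200, 102: 500}
transactions = [  # (transaction_date, customer_id, amount)
    ("07/01/2018", 101, 250),
    ("07/01/2018", 102, 450),
    ("07/02/2018", 101, 500),
    ("07/03/2018", 102, 100),
    ("07/04/2018", 102, 50),
]

def breach_counts(thresholds, transactions):
    """Count breaches per customer, zeroing the running total after each one."""
    running = {cid: 0 for cid in thresholds}
    breaches = {cid: 0 for cid in thresholds}
    # Assumption: dates are MM/DD/YYYY; the sort is stable for same-day rows.
    ordered = sorted(
        transactions, key=lambda tx: datetime.strptime(tx[0], "%m/%d/%Y")
    )
    for _, cid, amount in ordered:
        running[cid] += amount
        if running[cid] > thresholds[cid]:
            breaches[cid] += 1
            running[cid] = 0  # reset the counter after a breach
    return breaches

print(breach_counts(thresholds, transactions))
```

Because each step depends on whether the previous step reset the total, this logic does not map onto a single window function; in Hive it would typically need a UDF or an iterative approach.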
To reset count/rank/sum whenever value changes
Input table :-
Time | value
12 |A
13 |A
14 |C
15 |C
16 |B
17 |B
18 |A
You just need to take the lag to know the previous value.
Step 1. Select *, lag(value) over (order by time) as lagval
Now compare the lag value with the actual value: if they differ (or there is no previous row), take 1, else 0 (call this column flag).
Step 2. Select *, case when lagval is null or lagval != value then 1 else 0 end as flag
Now take a running sum over flag. The sum is different for each group, where a new group starts whenever the value changes.
Step 3. Select *, sum(flag) over (order by time) as flag_sum
Now just take the row number within each group.
Step 4. Select *, row_number() over (partition by flag_sum order by time) as rownumber
Final result
Time | value | lagval | flag | flag_sum | rownumber
12   | A     | null   | 1    | 1        | 1
13   | A     | A      | 0    | 1        | 2
14   | C     | A      | 1    | 2        | 1
15   | C     | C      | 0    | 2        | 2
16   | B     | C      | 1    | 3        | 1
17   | B     | B      | 0    | 3        | 2
18   | A     | B      | 1    | 4        | 1
You can use sum / count in place of row_number, whatever you want to reset whenever the value changes.
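The four steps can be followed row by row in a short Python sketch. It is only an illustration of the lag / flag / running-sum idea on the input table above; the function name `number_within_runs` is made up.

```python
# Input from the answer above: (time, value) pairs, already ordered by time.
rows = [(12, "A"), (13, "A"), (14, "C"), (15, "C"), (16, "B"), (17, "B"), (18, "A")]

def number_within_runs(rows):
    """Return (time, value, lagval, flag, flag_sum, rownumber) per input row."""
    out = []
    lagval = None   # step 1: the previous row's value
    flag_sum = 0    # step 3: running sum of the flags
    rownumber = 0   # step 4: position inside the current run
    for i, (time, value) in enumerate(rows):
        flag = 1 if i == 0 or value != lagval else 0  # step 2
        flag_sum += flag
        rownumber = 1 if flag else rownumber + 1
        out.append((time, value, lagval, flag, flag_sum, rownumber))
        lagval = value
    return out

for row in number_within_runs(rows):
    print(row)
```

The last element of each tuple restarts at 1 whenever the value changes, including the final A at time 18, which forms its own group even though A appeared earlier.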