I have a table as below:
SourceCustomerId  BusinessDate  HasTaxBenifit  HasCollateral  HasLoan
BS:100037         2016-12-23    No             No             Yes
BS:100056         2018-01-13    No             Yes            No
BS:100037         2011-06-03    No             Yes            Yes
BS:100056         2019-10-14    Yes            No             No
BS:100022         2014-09-17    Yes            No             Yes
BS:100037         2013-07-18    Yes            Yes            No
BS:100056         2016-03-19    Yes            Yes            Yes
BS:100022         2015-04-20    Yes            No             No
BS:100022         2017-08-14    No             Yes            No
BS:100022         2012-11-23    No             Yes            No
And the output I am expecting is:
BinaryTaxBenefit  BinaryLoan  BinaryCollateral  diff_BinaryTaxBenefit  diff_BinaryLoan  diff_BinaryCollateral
0                 0           0                 NULL                   NULL             NULL
1                 0           1                 1                      0                1
1                 0           0                 0                      0                -1
0                 1           0                 -1                     1                0
0                 1           1                 NULL                   NULL             NULL
1                 1           0                 0                      -1               1
0                 0           1                 0                      1                0
1                 1           1                 NULL                   NULL             NULL
0                 1           0                 -1                     0                1
1                 0           0                 0                      0                0
To obtain this output, we need to follow three steps:
1. Partition the data by SourceCustomerId, then order it by SourceCustomerId and BusinessDate.
2. Create the new columns BinaryTaxBenefit, BinaryLoan and BinaryCollateral. Each Has* column gets a binary equivalent that is 1 when the column is 'Yes' and 0 when it is 'No'.
3. The last and most difficult part: subtract the previous row from the current row (binary columns only). The subtraction must stay within each customer's group.
So the first difference in each group is always NULL, and every later row holds the difference from the previous row of the same customer.
I am able to write separate SQL queries for Step1 and Step2 :
Step1: Partition the data by SourceCustomerId and order it by SourceCustomerId and BusinessDate:
SELECT
SourceCustomerId,
BusinessDate,
ROW_NUMBER() OVER(PARTITION BY SourceCustomerId ORDER BY SourceCustomerId, BusinessDate ASC) RowNumber,
HasTaxBenifit,
HasLoan,
HasCollateral
from personDetail pd
Step2: Create other columns BinaryTaxBenefit, BinaryLoan, BinaryCollateral:
select * ,
(case when HasTaxBenifit = 'Yes' then 1 else 0 end) as BinaryTaxBenefit,
(case when HasLoan = 'Yes' then 1 else 0 end) as BinaryLoan,
(case when HasCollateral = 'Yes' then 1 else 0 end) as BinaryCollateral
from personDetail pd
How do I combine Step1 and Step2 into a single SQL query?
Step3: The last and most difficult part, subtracting the rows (binary columns only):
This subtracts across all rows without respecting the groups, and I'm not sure how to fix it:
with v as (
select RowNumber, BinaryTaxBenefit, BinaryLoan, BinaryCollateral from personDetailTrial
)
select
RowNumber,BinaryTaxBenefit, BinaryLoan, BinaryCollateral,
BinaryTaxBenefit - lag(BinaryTaxBenefit, 1) over(order by RowNumber) as diff_BinaryTaxBenefit,
BinaryLoan - lag(BinaryLoan, 1) over(order by RowNumber) as diff_BinaryLoan,
BinaryCollateral - lag(BinaryCollateral, 1) over(order by RowNumber) as diff_BinaryCollateral
from v
If I understood correctly, you can use CROSS APPLY; see:
https://www.sqlshack.com/es/la-diferencia-entre-cross-apply-y-outer-apply-en-sql-server/
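The three steps above can also be folded into one statement. Below is a minimal sketch, assuming the personDetail table and the sample rows from the question; it runs the SQL through Python's built-in sqlite3 module as a stand-in for SQL Server (window functions need SQLite 3.25 or later). Instead of CROSS APPLY, it lists ROW_NUMBER() and the CASE columns in the same SELECT (Steps 1 and 2), then adds PARTITION BY SourceCustomerId to every LAG() window so the differences never cross customers (Step 3). The CTE name b is arbitrary.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE personDetail (SourceCustomerId TEXT, BusinessDate TEXT, "
    "HasTaxBenifit TEXT, HasCollateral TEXT, HasLoan TEXT)"
)
conn.executemany(
    "INSERT INTO personDetail VALUES (?, ?, ?, ?, ?)",
    [
        ("BS:100037", "2016-12-23", "No", "No", "Yes"),
        ("BS:100056", "2018-01-13", "No", "Yes", "No"),
        ("BS:100037", "2011-06-03", "No", "Yes", "Yes"),
        ("BS:100056", "2019-10-14", "Yes", "No", "No"),
        ("BS:100022", "2014-09-17", "Yes", "No", "Yes"),
        ("BS:100037", "2013-07-18", "Yes", "Yes", "No"),
        ("BS:100056", "2016-03-19", "Yes", "Yes", "Yes"),
        ("BS:100022", "2015-04-20", "Yes", "No", "No"),
        ("BS:100022", "2017-08-14", "No", "Yes", "No"),
        ("BS:100022", "2012-11-23", "No", "Yes", "No"),
    ],
)

QUERY = """
WITH b AS (
    -- Steps 1 and 2: ROW_NUMBER() and the CASE columns share one select list
    SELECT SourceCustomerId,
           BusinessDate,
           ROW_NUMBER() OVER (PARTITION BY SourceCustomerId
                              ORDER BY BusinessDate) AS RowNumber,
           CASE WHEN HasTaxBenifit = 'Yes' THEN 1 ELSE 0 END AS BinaryTaxBenefit,
           CASE WHEN HasLoan = 'Yes' THEN 1 ELSE 0 END AS BinaryLoan,
           CASE WHEN HasCollateral = 'Yes' THEN 1 ELSE 0 END AS BinaryCollateral
    FROM personDetail
)
-- Step 3: PARTITION BY restarts LAG for each customer, so each
-- customer's first row has no previous row and its differences are NULL
SELECT SourceCustomerId, BusinessDate, RowNumber,
       BinaryTaxBenefit, BinaryLoan, BinaryCollateral,
       BinaryTaxBenefit - LAG(BinaryTaxBenefit) OVER (
           PARTITION BY SourceCustomerId ORDER BY BusinessDate) AS diff_BinaryTaxBenefit,
       BinaryLoan - LAG(BinaryLoan) OVER (
           PARTITION BY SourceCustomerId ORDER BY BusinessDate) AS diff_BinaryLoan,
       BinaryCollateral - LAG(BinaryCollateral) OVER (
           PARTITION BY SourceCustomerId ORDER BY BusinessDate) AS diff_BinaryCollateral
FROM b
ORDER BY SourceCustomerId, BusinessDate
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

On these input rows, the first row of each customer gets NULL (None in Python) differences as required. Several non-NULL values differ from the expected output posted in the question, which appears not to match the input rows exactly (for example, the earliest BS:100022 row has HasCollateral = Yes).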
Related
I am working on a project in SQL Server with diagnosis codes. A patient can have up to 4 codes (but may have only 1), and a patient cannot repeat a code. However, codes can occur in any order. My goal is to count how many times each diagnosis code appears in total, as well as how often it appears in each position.
My data currently resembles the following:
PtKey  Order #  Order Date  Diagnosis1  Diagnosis2  Diagnosis3  Diagnosis4
345    1527     7/12/20     J44.9       R26.2       NULL        NULL
367    1679     7/12/20     R26.2       H27.2       G47.34      NULL
325    1700     7/12/20     G47.34      NULL        NULL        NULL
327    1710     7/12/20     I26.2       J44.9       G47.34      NULL
I think the best approach would be to create a dummy column that matches each diagnosis to its position, for example Diagnosis1 with A, Diagnosis2 with B, and so on.
My current plan is to roll up the diagnoses using UNPIVOT:
UNPIVOT ( Diag for ColumnALL IN (Diagnosis1, Diagnosis2, Diagnosis3, Diagnosis4)) as unpvt
However, this still doesn’t provide a way to count the diagnoses by position on a sales order.
I want it to look like this:
Diagnosis  Total Count  Diag1 Count  Diag2 Count  Diag3 Count  Diag4 Count
J44.9      2            1            1            0            0
R26.2      1            1            0            0            0
H27.2      1            0            1            0            0
I26.2      1            1            0            0            0
G47.34     3            1            0            2            0
You can unpivot using APPLY and then aggregate:
select v.diagnosis, count(*) as cnt,
sum(case when pos = 1 then 1 else 0 end) as pos_1,
sum(case when pos = 2 then 1 else 0 end) as pos_2,
sum(case when pos = 3 then 1 else 0 end) as pos_3,
sum(case when pos = 4 then 1 else 0 end) as pos_4
from data d cross apply
(values (diagnosis1, 1),
(diagnosis2, 2),
(diagnosis3, 3),
(diagnosis4, 4)
) v(diagnosis, pos)
where diagnosis is not null
group by v.diagnosis;
Another way is to use UNPIVOT to transform the columns into groupable entities:
SELECT Diagnosis, [Total Count] = COUNT(*),
[Diag1 Count] = SUM(CASE WHEN DiagGroup = N'Diagnosis1' THEN 1 ELSE 0 END),
[Diag2 Count] = SUM(CASE WHEN DiagGroup = N'Diagnosis2' THEN 1 ELSE 0 END),
[Diag3 Count] = SUM(CASE WHEN DiagGroup = N'Diagnosis3' THEN 1 ELSE 0 END),
[Diag4 Count] = SUM(CASE WHEN DiagGroup = N'Diagnosis4' THEN 1 ELSE 0 END)
FROM
(
SELECT * FROM #x UNPIVOT (Diagnosis FOR DiagGroup IN
([Diagnosis1],[Diagnosis2],[Diagnosis3],[Diagnosis4])) up
) AS x GROUP BY Diagnosis;
Example db<>fiddle
You can also manually unpivot via UNION before doing the conditional aggregation:
SELECT Diagnosis, COUNT(*) As [Total Count]
, SUM(CASE WHEN Position = 1 THEN 1 ELSE 0 END) As [Diag1 Count]
, SUM(CASE WHEN Position = 2 THEN 1 ELSE 0 END) As [Diag2 Count]
, SUM(CASE WHEN Position = 3 THEN 1 ELSE 0 END) As [Diag3 Count]
, SUM(CASE WHEN Position = 4 THEN 1 ELSE 0 END) As [Diag4 Count]
FROM
(
SELECT PtKey, Diagnosis1 As Diagnosis, 1 As Position
FROM [MyTable]
UNION ALL
SELECT PtKey, Diagnosis2 As Diagnosis, 2 As Position
FROM [MyTable]
WHERE Diagnosis2 IS NOT NULL
UNION ALL
SELECT PtKey, Diagnosis3 As Diagnosis, 3 As Position
FROM [MyTable]
WHERE Diagnosis3 IS NOT NULL
UNION ALL
SELECT PtKey, Diagnosis4 As Diagnosis, 4 As Position
FROM [MyTable]
WHERE Diagnosis4 IS NOT NULL
) d
GROUP BY Diagnosis
Borrowing Aaron's fiddle to avoid rebuilding the schema from scratch, we get this:
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=d1f7f525e175f0f066dd1749c49cc46d
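To check the UNION ALL approach end to end, here is a small sketch that loads the four sample rows into an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for SQL Server, and OrderNo and OrderDate are hypothetical, simplified names for "Order #" and "Order Date". It filters out NULL slots once in the outer query rather than in each branch. Note that in these sample rows R26.2 appears twice (positions 1 and 2), so its total comes out as 2 rather than the 1 shown in the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (PtKey INTEGER, OrderNo INTEGER, OrderDate TEXT, "
    "Diagnosis1 TEXT, Diagnosis2 TEXT, Diagnosis3 TEXT, Diagnosis4 TEXT)"
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (345, 1527, "7/12/20", "J44.9", "R26.2", None, None),
        (367, 1679, "7/12/20", "R26.2", "H27.2", "G47.34", None),
        (325, 1700, "7/12/20", "G47.34", None, None, None),
        (327, 1710, "7/12/20", "I26.2", "J44.9", "G47.34", None),
    ],
)

QUERY = """
SELECT Diagnosis,
       COUNT(*) AS "Total Count",
       SUM(CASE WHEN Position = 1 THEN 1 ELSE 0 END) AS "Diag1 Count",
       SUM(CASE WHEN Position = 2 THEN 1 ELSE 0 END) AS "Diag2 Count",
       SUM(CASE WHEN Position = 3 THEN 1 ELSE 0 END) AS "Diag3 Count",
       SUM(CASE WHEN Position = 4 THEN 1 ELSE 0 END) AS "Diag4 Count"
FROM (
    -- Manual unpivot: one row per (diagnosis, position);
    -- column names come from the first branch
    SELECT Diagnosis1 AS Diagnosis, 1 AS Position FROM MyTable
    UNION ALL SELECT Diagnosis2, 2 FROM MyTable
    UNION ALL SELECT Diagnosis3, 3 FROM MyTable
    UNION ALL SELECT Diagnosis4, 4 FROM MyTable
) AS d
WHERE Diagnosis IS NOT NULL  -- drop empty slots once, after unpivoting
GROUP BY Diagnosis
ORDER BY Diagnosis
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```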
I have a table like as shown below
subject_id  Desc Name  class  BC  FU  PA  VI
1           Fung       FU     0   1   0   0
1           Para       PA     0   0   1   0
1           Viru       VI     0   0   0   1
1           Para       PA     0   0   1   0
1           T5 Bacte   BC     1   0   0   0
1           T6 Bacte   BC     1   0   0   0
2           T5 Bacte   BC     1   0   0   0
2           Fung       FU     1   0   0   0
What I would like to do is create a new column "BC_FU" that is 1 if the subject has a 1 in the BC column and a 1 in the FU column. The check should not be limited to a single row; it should look across all records of the same subject.
The same logic applies to the BC_VI column (another new column).
For example, subject_id = 1 has BC = 1 in rows 5 and 6 and FU = 1 in row 1, so we know from the database that this subject has both BC and FU across their records.
This is what I tried, but it doesn't work. I am writing this in BigQuery, so help correcting it for BigQuery would be appreciated:
select *,
CASE WHEN (MAX(BC) == 1 AND MAX(FU) == 1) THEN 1
ELSE 0 END AS BC_FU,
CASE WHEN (MAX(BC) == 1 AND MAX(VI) == 1) THEN 1
ELSE 0 END AS BC_VI,
FROM TABLE T
GROUP BY SUBJECT_ID
So I would like to create output that looks like this:
subject_id  Desc Name  class  BC  FU  PA  VI  BC_FU  BC_VI
1           Fungi      FU     0   1   0   0   1      1
1           Para       PA     0   0   1   0   1      1
1           Virus      VI     0   0   0   1   1      1
1           Para       PA     0   0   1   0   1      1
1           T5 Bacte   BC     1   0   0   0   1      1
1           T6 Bacte   BC     1   0   0   0   1      1
2           T5 Bacte   BC     1   0   0   0   1      1
2           Virus      VI     0   1   0   1   1      1
You can do the following:
select t1.*
,max(BC) over(partition by subject_id)
*max(FU) over(partition by subject_id) as BC_FU
,max(BC) over(partition by subject_id)
*max(VI) over(partition by subject_id) as BC_VI
from your_table t1
select t1.*, tmp.BC_FU, tmp.BC_VI
from your_table t1
join
(
select subject_id,
CASE WHEN MAX(BC) + MAX(FU) = 2 THEN 1 ELSE 0 END AS BC_FU,
CASE WHEN MAX(BC) + MAX(VI) = 2 THEN 1 ELSE 0 END AS BC_VI
from your_table
group by subject_id
) tmp on t1.subject_id = tmp.subject_id
If I understand your requirement correctly, you should just be able to use analytic functions here:
SELECT *,
CASE WHEN MAX(BC) OVER (PARTITION BY subject_id) +
MAX(FU) OVER (PARTITION BY subject_id) = 2 THEN 1 ELSE 0 END AS BC_FU,
CASE WHEN MAX(BC) OVER (PARTITION BY subject_id) +
MAX(VI) OVER (PARTITION BY subject_id) = 2 THEN 1 ELSE 0 END AS BC_VI
FROM yourTable
ORDER BY subject_id;
This answer avoids the need for an unnecessary subquery.
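Before running the window-function version in BigQuery, its logic can be sanity-checked locally. This sketch executes the same CASE / MAX() OVER (PARTITION BY subject_id) query in SQLite through Python's sqlite3 module; SQLite is only a stand-in for BigQuery, and desc_name is a hypothetical rename of "Desc Name" because DESC is a reserved word. With the input rows exactly as listed, subject 2 has FU = 0 and VI = 0 on both rows, so both flags come out 0 for that subject.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (subject_id INTEGER, desc_name TEXT, class TEXT, "
    "BC INTEGER, FU INTEGER, PA INTEGER, VI INTEGER)"
)
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "Fung", "FU", 0, 1, 0, 0),
        (1, "Para", "PA", 0, 0, 1, 0),
        (1, "Viru", "VI", 0, 0, 0, 1),
        (1, "Para", "PA", 0, 0, 1, 0),
        (1, "T5 Bacte", "BC", 1, 0, 0, 0),
        (1, "T6 Bacte", "BC", 1, 0, 0, 0),
        (2, "T5 Bacte", "BC", 1, 0, 0, 0),
        (2, "Fung", "FU", 1, 0, 0, 0),
    ],
)

# MAX(...) OVER (PARTITION BY subject_id) is computed per subject but
# repeated on every row, so no GROUP BY (and no row collapsing) is needed.
QUERY = """
SELECT *,
       CASE WHEN MAX(BC) OVER (PARTITION BY subject_id)
               + MAX(FU) OVER (PARTITION BY subject_id) = 2
            THEN 1 ELSE 0 END AS BC_FU,
       CASE WHEN MAX(BC) OVER (PARTITION BY subject_id)
               + MAX(VI) OVER (PARTITION BY subject_id) = 2
            THEN 1 ELSE 0 END AS BC_VI
FROM yourTable
ORDER BY subject_id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```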
Having read your question, let me add my solution as well.
SELECT t1.*,
CASE WHEN (MAX(BC) OVER (PARTITION BY subject_id) +
MAX(FU) OVER (PARTITION BY subject_id) ) = 2
THEN 1
ELSE 0
END AS BC_FU,
CASE WHEN (MAX(BC) OVER(PARTITION BY subject_id) +
MAX(VI) OVER(PARTITION BY subject_id) ) = 2
THEN 1
ELSE 0
END AS BC_VI
FROM table as t1
ORDER BY subject_id
Here are some tips that might be useful:
A function followed by OVER () is called a window (analytic) function. Writing SELECT <aggregate function> OVER (PARTITION BY columnA) applies the aggregate per group without collapsing the rows into a single row. (Please ignore this if you already know it.)
As Tim already mentioned, removing an unnecessary subquery (a query nested inside another) improves readability.
Add ELSE 0 to every CASE expression so that it can never return NULL.
Here, I chose to add the maximum values of BC and FU and check whether the sum is 2, rather than checking the "intersection" (e.g. MAX(BC) = 1 AND MAX(FU) = 1) as in the original attempt.
That is because you may add a column such as BC_FU_VI in the future, and comparing the sum against the number of columns (2 here, 3 for BC_FU_VI) makes it clear that the CASE expression combines several flag columns into a single one.
Thank you.
I'm using the following SQL query against an Informix database:
select fromQ, toQ, count(callid) as cont_num, type
from some_table
group by fromQ, toQ, type
order by fromQ, toQ;
It produces the result:
fromq      toq         cont_num  type
--------------------------------------
           Sales       12        1
           English     1         1
           MB          59        1
           Reception   3         2
           Reception   53        1
           Service     1         1
MB         Sales       1         1
MB         English     1         1
This is OK, as expected. Please note there are 2 rows for toq=Reception.
Field WRTYPE can have values only from 1 to 3.
So the idea is to produce output like this:
fromq      toq         cont_num  type1  type2  type3
----------------------------------------------------
           Sales       12        12     0      0
           English     1         1      0      0
           MB          59        59     0      0
           Reception   56        53     3      0
           Service     1         1      0      0
MB         Sales       1         1      0      0
MB         English     1         1      0      0
Is there a simple way to do this?
Use conditional aggregation:
select fromQ, toQ, count(callid) as cont_num,
sum(case when type = 1 then 1 else 0 end) as type_1,
sum(case when type = 2 then 1 else 0 end) as type_2,
sum(case when type = 3 then 1 else 0 end) as type_3
from some_table
group by fromQ, toQ
order by fromQ, toQ;
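A small runnable sketch of the conditional-aggregation answer, using Python's sqlite3 module as a stand-in for Informix. The question only shows aggregated counts, so the call rows here are synthetic: they are generated so that toQ = Reception has 53 type-1 calls and 3 type-2 calls, matching the example, and an empty string stands in for the blank fromQ. The key change from the original query is that type is removed from GROUP BY and pivoted into three SUM(CASE ...) columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE some_table (callid INTEGER, fromQ TEXT, toQ TEXT, type INTEGER)"
)

# Synthetic (hypothetical) call rows; '' stands in for a blank fromQ.
calls = (
    [(i, "", "Reception", 1) for i in range(53)]
    + [(100 + i, "", "Reception", 2) for i in range(3)]
    + [(200 + i, "", "Sales", 1) for i in range(12)]
    + [(300, "MB", "Sales", 1)]
)
conn.executemany("INSERT INTO some_table VALUES (?, ?, ?, ?)", calls)

QUERY = """
SELECT fromQ, toQ, COUNT(callid) AS cont_num,
       SUM(CASE WHEN type = 1 THEN 1 ELSE 0 END) AS type1,
       SUM(CASE WHEN type = 2 THEN 1 ELSE 0 END) AS type2,
       SUM(CASE WHEN type = 3 THEN 1 ELSE 0 END) AS type3
FROM some_table
GROUP BY fromQ, toQ  -- type is no longer grouped; it is pivoted into columns
ORDER BY fromQ, toQ
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```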
I have table like this
ID Specified TIN
-----------------
1 0 tin1
2 0 tin1
3 1 tin1
4 0 tin2
5 0 tin3
6 1 tin3
7 1 tin3
I need to count rows grouped by TIN and Specified, but the result should have one row per TIN:
TIN ZEROSpecified NOTZEROSpecified
tin1 2 1
tin2 0 1
tin3 1 2
Important: the Specified column only ever contains 0 or 1.
SELECT TIN,
SUM(case when Specified=0 then 1 else 0 end) as ZeroSpecified,
SUM(case when Specified<>0 then 1 else 0 end) as NOTZEROSpecified
FROM table
GROUP BY TIN
Pretty simple:
SELECT
TIN
,SUM(CASE WHEN Specified = 0 THEN 1 ELSE 0 END) ZEROSpecified
,SUM(CASE WHEN Specified <> 0 THEN 1 ELSE 0 END) NotZEROSpecified
FROM TableName
GROUP BY TIN
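The same conditional-aggregation pattern can be run against the sample rows through Python's sqlite3 module (SQLite is a stand-in for whatever database the question uses, and TableName is the placeholder name from the answer). With these rows, tin2 has a single row with Specified = 0, so it comes out as 1 and 0 rather than the 0 and 1 shown in the expected result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName (ID INTEGER, Specified INTEGER, TIN TEXT)")
conn.executemany(
    "INSERT INTO TableName VALUES (?, ?, ?)",
    [
        (1, 0, "tin1"), (2, 0, "tin1"), (3, 1, "tin1"),
        (4, 0, "tin2"),
        (5, 0, "tin3"), (6, 1, "tin3"), (7, 1, "tin3"),
    ],
)

# One output row per TIN; each SUM counts the rows matching its CASE condition.
rows = conn.execute("""
SELECT TIN,
       SUM(CASE WHEN Specified = 0 THEN 1 ELSE 0 END) AS ZEROSpecified,
       SUM(CASE WHEN Specified <> 0 THEN 1 ELSE 0 END) AS NOTZEROSpecified
FROM TableName
GROUP BY TIN
ORDER BY TIN
""").fetchall()

for row in rows:
    print(row)
```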
I'm having difficulty writing a SELECT query.
Please find the table and its data below:
ID DLS MATCH_STATUS LAST_UPDATE_TIME BO CH FT
1 0 0 09-07-2013 00:00:00 IT TE NA
1 1 1 09-07-2013 00:01:01 IT TE NA
2 0 0 09-07-2013 10:00:00 IP TE NA
3 0 0 09-07-2013 11:00:00 IT YT NA
3 2 2 09-07-2013 11:01:00 IT YT NA
Here MATCH_STATUS means:
0 --> Initial record
1 --> Single match
2 --> Multi match
Every record has an initial entry with MATCH_STATUS 0; when the matching process finishes, a further entry with another status such as 1 or 2 is added.
I am trying to retrieve the total record, awaiting match, single match and multi match counts, grouped by BO, CH and FT.
Below is the expected output:
BO CH FT TOTAL_RECORD AWAITING_MATCH SINGLE_MATCH MULTI_MATCH
IT TE NA 1 0 1 0
IP TE NA 1 1 0 0
IT YT NA 1 0 0 2
I have tried the query below:
select BO,CH,FT,sum(case when match_status=0 then 1 else 0 end) as TOTAL_RECORD,
sum(case when match_status = 0 then 1 else 0 end) as AWAITING_MATCH,
sum(case when match_status = 1 then 1 else 0 end) as SINGLE_MATCH,
sum(case when match_status = 2 then 1 else 0 end) as MULTI_MATCH from
table1 where last_update_time >= current_timestamp-1
group by BO,CH,FT;
The problem with the above query is that AWAITING_MATCH comes out the same as TOTAL_RECORD, presumably because both count match_status = 0.
I also tried:
select BO,CH,FT,sum(case when match_status=0 then 1 else 0 end) as TOTAL_RECORD,
select (sum(case when t1.ms=0 then 1 else 0 end) from
(select max(match_status) as ms from table1 where last_update_time >= current_timestamp-1 group by id)t1) )awaiting_match,
sum(case when match_status = 1 then 1 else 0 end) as SINGLE_MATCH,
sum(case when match_status = 2 then 1 else 0 end) as MULTI_MATCH from
table1 where last_update_time >= current_timestamp-1
group by BO,CH,FT;
The problem with this approach is that AWAITING_MATCH gets the same value on every output row.
Please help me with a suitable query for the desired format.
Thanks a lot in advance.
It seems that you want the last match status. I am guessing that this is actually the maximum of the statuses. If so, the following solves the problem by first grouping on id and then doing the grouping to summarize:
select BO, CH, FT,
count(*) as TOTAL_RECORD,
sum(case when lastms = 0 then 1 else 0 end) as AWAITING_MATCH,
sum(case when lastms = 1 then 1 else 0 end) as SINGLE_MATCH,
sum(case when lastms = 2 then 1 else 0 end) as MULTI_MATCH
from (select id, bo, ch, ft, MAX(match_status) as lastms
from table1
where last_update_time >= current_timestamp-1
group by id, bo, ch, ft
) t
group by BO, CH, FT;
If you actually want the last update to provide the status for the id, then you can use row_number() to enumerate the rows for each id, order by update time descending, and choose the first one:
select BO, CH, FT,
count(*) as TOTAL_RECORD,
sum(case when lastms = 0 then 1 else 0 end) as AWAITING_MATCH,
sum(case when lastms = 1 then 1 else 0 end) as SINGLE_MATCH,
sum(case when lastms = 2 then 1 else 0 end) as MULTI_MATCH
from (select id, bo, ch, ft, match_status as lastms,
ROW_NUMBER() over (partition by id order by last_update_time desc) as seqnum
from table1
where last_update_time >= current_timestamp-1
) t
where seqnum = 1
group by BO, CH, FT;
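Both answers above can be compared on the sample data. This sketch loads the five rows into SQLite via Python's sqlite3 module and runs the same outer aggregation over two inner queries: one takes MAX(MATCH_STATUS) per ID, the other takes the status of the most recent row per ID using ROW_NUMBER(). Assumptions: SQLite stands in for the original database; the last_update_time >= current_timestamp-1 filter is left out because the sample timestamps are from 2013; and the timestamps stay as text, which only orders correctly here because each ID's rows share the same date. On this data both variants agree, and the IT/YT/NA group has one multi-matched ID, so MULTI_MATCH is 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (ID INTEGER, DLS INTEGER, MATCH_STATUS INTEGER, "
    "LAST_UPDATE_TIME TEXT, BO TEXT, CH TEXT, FT TEXT)"
)
# Sample rows from the question; timestamps kept as text in their original format.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 0, 0, "09-07-2013 00:00:00", "IT", "TE", "NA"),
        (1, 1, 1, "09-07-2013 00:01:01", "IT", "TE", "NA"),
        (2, 0, 0, "09-07-2013 10:00:00", "IP", "TE", "NA"),
        (3, 0, 0, "09-07-2013 11:00:00", "IT", "YT", "NA"),
        (3, 2, 2, "09-07-2013 11:01:00", "IT", "YT", "NA"),
    ],
)

# Outer aggregation shared by both variants: expects one row per ID carrying lastms.
SUMMARY = """
SELECT BO, CH, FT,
       COUNT(*) AS TOTAL_RECORD,
       SUM(CASE WHEN lastms = 0 THEN 1 ELSE 0 END) AS AWAITING_MATCH,
       SUM(CASE WHEN lastms = 1 THEN 1 ELSE 0 END) AS SINGLE_MATCH,
       SUM(CASE WHEN lastms = 2 THEN 1 ELSE 0 END) AS MULTI_MATCH
FROM ({inner}) AS t
GROUP BY BO, CH, FT
ORDER BY BO, CH, FT
"""

# Variant 1: final status = highest status seen for the ID.
BY_MAX = """
SELECT ID, BO, CH, FT, MAX(MATCH_STATUS) AS lastms
FROM table1
GROUP BY ID, BO, CH, FT
"""

# Variant 2: final status = status on the most recent row for the ID.
BY_LATEST = """
SELECT ID, BO, CH, FT, MATCH_STATUS AS lastms
FROM (SELECT ID, BO, CH, FT, MATCH_STATUS,
             ROW_NUMBER() OVER (PARTITION BY ID
                                ORDER BY LAST_UPDATE_TIME DESC) AS seqnum
      FROM table1) AS ranked
WHERE seqnum = 1
"""

by_max = conn.execute(SUMMARY.format(inner=BY_MAX)).fetchall()
by_latest = conn.execute(SUMMARY.format(inner=BY_LATEST)).fetchall()
print(by_max)
print(by_latest)
```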