How do I flatten a table for an entire population? - sql

I'm working to replace an update process that currently iterates over a very large table using a PL/SQL cursor, updating several columns with flattened data.
The query is structured such that the flattened results can only return a single row, by limiting to term and id. The term_eff column indicates when an activity should start appearing in results, but there is no current limitation for an end date. How can I return the flattened results of the test_activity table for all rows in the test_person table?
Test case tables:
create table test_person (id number,term varchar2(6));
create table test_activity(id number,term_eff varchar2(6),activity varchar2(10));
insert into test_person values(1,'201001');
insert into test_person values(1,'201101');
insert into test_person values(1,'201102');
insert into test_person values(2,'201001');
insert into test_person values(2,'201101');
insert into test_person values(2,'201102');
insert into test_activity values (1,'201001','Jump');
insert into test_activity values (1,'201001','Play');
insert into test_activity values (1,'201102','Run');
insert into test_activity values (2,'201001','Jump');
insert into test_activity values (2,'201101','Play');
insert into test_activity values (2,'201101','Run');
commit;
Here is the current query, which returns a single row. I would like a version of it that returns values for all rows in the test_person table.
select Max(CASE WHEN A.activity_rank = 1 THEN A.activity ELSE NULL END) AS activity1,
Max(CASE WHEN A.activity_rank = 2 THEN A.activity ELSE NULL END) AS activity2,
Max(CASE WHEN A.activity_rank = 3 THEN A.activity ELSE NULL END) AS activity3
from (SELECT id,
term_eff,
activity,
row_number() OVER (PARTITION BY ID ORDER BY term_eff desc) AS activity_rank
FROM test_activity
WHERE id = 1
AND term_eff <= '201001') A;
Edit: Expected results from the final query:
ID Term Activity1 Activity2 Activity3
1 201001 Jump Play
1 201101 Jump Play
1 201102 Jump Play Run
...

Basically, just remove the id = 1 condition from the where clause and add a group by. You've already done the hard part:
select id,
Max(CASE WHEN A.activity_rank = 1 THEN A.activity ELSE NULL END) AS activity1,
Max(CASE WHEN A.activity_rank = 2 THEN A.activity ELSE NULL END) AS activity2,
Max(CASE WHEN A.activity_rank = 3 THEN A.activity ELSE NULL END) AS activity3
from (SELECT id, term_eff, activity,
row_number() OVER (PARTITION BY ID ORDER BY term_eff desc) AS activity_rank
FROM test_activity
WHERE term_eff <= '201001'
) A
group by id;
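To get one flattened row per (id, term) as in the expected results, the term filter has to come from test_person rather than from a literal. The following is only a sketch under stated assumptions: it targets Oracle, it orders activities by term_eff ascending and then by activity name (an assumption chosen so the columns come out as in the expected results; the original query ordered by term_eff desc), and it uses a left join so that people with no qualifying activity still get a row.

```sql
-- Sketch: one flattened row per (id, term) in test_person.
-- Assumption: activity1..3 are filled oldest-first, ties broken by activity name.
SELECT id,
       term,
       MAX(CASE WHEN activity_rank = 1 THEN activity END) AS activity1,
       MAX(CASE WHEN activity_rank = 2 THEN activity END) AS activity2,
       MAX(CASE WHEN activity_rank = 3 THEN activity END) AS activity3
FROM (SELECT p.id,
             p.term,
             a.activity,
             -- rank restarts for every person/term combination
             ROW_NUMBER() OVER (PARTITION BY p.id, p.term
                                ORDER BY a.term_eff, a.activity) AS activity_rank
      FROM test_person p
      LEFT JOIN test_activity a
        ON a.id = p.id
       AND a.term_eff <= p.term)   -- only activities already in effect for that term
GROUP BY id, term
ORDER BY id, term;
```

Because the join condition compares term_eff with each person's own term, no per-row cursor loop is needed.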

Related

Conditional SQL logic

I have a simple table of voting frequencies of registered voters
create table public.campaign_202206 (
registrant_id INTEGER not null references votecal.voter_registration (registrant_id),
voting_frequency smallint
);
I want to insert values into this table with the count of elections that the voter has participated in among the past four elections:
insert into campaign_202206 (
select registrant_id, count(*)
from votecal.voter_participation_history
where election_date in ('2021-09-14', '2020-11-03', '2020-03-03', '2018-11-06')
group by registrant_id
);
However, if the count is 1, then I want to look at the participation from five elections ago on '2018-06-05' and if there is no participation in that election, I want to store the voting_frequency as 0 instead of 1.
insert into campaign_202206 (
select
registrant_id,
case
when count(*) = 1 then --- what goes here?
else count(*)
end as voting_frequency
from votecal.voter_participation_history
where election_date in ('2021-09-14', '2020-11-03', '2020-03-03', '2018-11-06')
group by registrant_id
);
What would go in this case-when-then to get the value for this special case?
Use a correlated subquery as follows:
insert into campaign_202206 (
select
registrant_id,
case when count(*) = 1 then
(
select count(*)
from votecal.voter_participation_history sqvph
where sqvph.election_date = '2018-06-05'
and sqvph.registrant_id = vph.registrant_id
)
else count(*)
end as voting_frequency
from votecal.voter_participation_history vph
where election_date in ('2021-09-14', '2020-11-03', '2020-03-03', '2018-11-06')
group by registrant_id
);
The tables in the query need aliases so that the subquery can be correlated with the outer query.
Use a nested case:
insert into campaign_202206 (
select
registrant_id,
case
when count(*) = 1 then
case
when (select count(*)
from voter_participation_history
where election_date in ('2018-06-05')
and registrant_id = v1.registrant_id) > 0
then 1
else 0
end
else count(*)
end as voting_frequency
from voter_participation_history v1
where election_date in ('2021-09-14', '2020-11-03', '2020-03-03', '2018-11-06')
group by v1.registrant_id
);
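The same rule can also be written without a subquery by reading all five elections in one pass and counting them separately. This is a sketch, not a drop-in replacement: it assumes PostgreSQL 9.4 or later (for the FILTER clause) and that election_date is a date column.

```sql
-- Sketch only: assumes PostgreSQL 9.4+ and a date-typed election_date.
insert into campaign_202206 (registrant_id, voting_frequency)
select registrant_id,
       case
         -- exactly one of the last four elections, and absent on 2018-06-05
         when count(*) filter (where election_date <> date '2018-06-05') = 1
          and count(*) filter (where election_date =  date '2018-06-05') = 0
         then 0
         else count(*) filter (where election_date <> date '2018-06-05')
       end as voting_frequency
from votecal.voter_participation_history
where election_date in ('2021-09-14', '2020-11-03', '2020-03-03',
                        '2018-11-06', '2018-06-05')
group by registrant_id
-- keep only voters who took part in at least one of the last four elections
having count(*) filter (where election_date <> date '2018-06-05') > 0;
```

The having clause keeps the result limited to the same registrants as the original four-election query.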

Updating a table using Case statements in SQL

I am trying to set each category column to 0, 1, or null, depending on whether a person's relativepersonid has a diagnosis in that category with a diagdate on or before the person's servicedate. Here are my tables:
DROP TABLE ICDCodes_w;
GO
CREATE TABLE ICDCodes_w
(
AnxietyDisorder VARCHAR(6),
DepressiveDisorder VARCHAR(6),
PTSD VARCHAR(6)
);
INSERT INTO ICDCodes_w
(
AnxietyDisorder,
DepressiveDisorder,
PTSD
)
VALUES
('293.84', '296.2', '309.81'),
('300', '296.21', 'F43.1'),
('305.42', 'F11.28', 'F31.76'),
('305.81', 'F43.8', 'F31.78'),
('F40.00', 'F43.10', '305.52');
GO
DROP TABLE DiagHX_w;
GO
CREATE TABLE DiagHX_w
(
ArchiveID VARCHAR(10),
RelativePersonID VARCHAR(10),
ICDCode VARCHAR(6),
DiagDate DATE
);
INSERT INTO DiagHX_w
(
ArchiveID,
RelativePersonID,
ICDCode,
DiagDate
)
VALUES
('1275741', '754241', '293.84', '1989-01-03'),
('2154872', '754241', '293.84', '1995-04-07'),
('4587215', '754241', '998.4', '1999-12-07'),
('4588775', '711121', 'F11.28', '2001-02-07'),
('3545455', '711121', NULL, NULL),
('9876352', '323668', '400.02', '1988-04-09'),
('3211514', '112101', 'F31.78', '2005-09-09'),
('3254548', '686967', 'F40.00', '1999-12-31'),
('4411144', '686967', '305.52', '2000-01-01'),
('6548785', '99999999','F40.00', '2000-02-03');
GO
DROP TABLE PatientFlags_w;
GO
CREATE TABLE PatientFlags_w
(
PersonID VARCHAR(10),
RelativePersonID VARCHAR(10),
AnxietyDisorder VARCHAR(2),
DepressiveDisorder VARCHAR(2),
PTSD VARCHAR(2),
);
INSERT INTO PatientFlags_w
(
PersonID,
RelativePersonID
)
VALUES
('99999999', '754241'),
('88888888', '754241'),
('77777777', '754241'),
('66666666', '711121'),
('55555555', '711121'),
('44444444', '323668'),
('33333333', '112101'),
('22222222', '686967'),
('11111111', '686967'),
('32151111', '887878'),
('78746954', '771125'),
('54621333', '333114'),
('55648888', '333114');
GO
DROP TABLE Person_w;
GO
CREATE TABLE Person_w
(
PersonID VARCHAR(10),
ServiceDate date
);
INSERT INTO Person_w
(
PersonID,
ServiceDate
)
VALUES
('99999999', '2000-12-31'),
('88888888', '2000-11-01'),
('69876541', '2000-09-04'),
('66666666', '2000-01-15'),
('55555555', '2000-07-22'),
('44444444', '2000-07-20'),
('65498711', '2000-11-17'),
('22222222', '2000-09-02'),
('11111111', '2000-02-04'),
('32151111', '2000-02-17'),
('78746954', '2000-03-29'),
('54621333', '2000-08-22'),
('55648888', '2000-10-20');
Here is my update statement:
UPDATE a
SET AnxietyDisorder = CASE
WHEN ICDCode IN
(
SELECT AnxietyDisorder FROM
Project..ICDCodes_w
) THEN
1
ELSE
0
END,
DepressiveDisorder = CASE
WHEN ICDCode IN
(
SELECT DepressiveDisorder FROM
Project..ICDCodes_w
) THEN
1
ELSE
0
END,
PTSD = CASE
WHEN ICDCode IN
(
SELECT PTSD FROM Project..ICDCodes_w
) THEN
1
ELSE
0
END
FROM PatientFlags_w a
JOIN DiagHX_w b
ON a.relativepersonid = b.RelativePersonID
JOIN Person_w p
ON a.personid = p.PersonID
WHERE diagdate <= p.servicedate;
This works on some values, but there are some that don't get updated. I know the issue is with my case statement and probably a join issue. What is a better way to write this? Here is an example query I used to check. The PTSD column should have a 1.
SELECT * FROM project..patientflags_w a
JOIN project..diaghx_w b
ON a.relativepersonid = b.RelativePersonID
JOIN project..person_w p
ON a.personid = p.personid
WHERE b.icdcode IN (SELECT PTSD FROM Project..ICDCodes_w)
AND b.diagdate <= p.servicedate
I did ask this question the other day, but my sample tables were all messed up, so I've verified that they work this time.
At first glance, the problem with your query is that each row of the target (PatientFlags_w) matches several DiagHX_w rows, so it is updated from more than one source row and only one of them, chosen arbitrarily, wins. In some cases you end up with the correct result, but it's just by luck.
It's hard to tell if you want one row per person in the flag table, or one row per flag.
Can you review these queries and let us know if they are close to your desired results:
-- If you want one row per Person:
select RelativePersonID,
[AnxietyDisorder] = max(case when c.AnxietyDisorder is not null then 1 else 0 end),
[DepressiveDisorder] = max(case when c.DepressiveDisorder is not null then 1 else 0 end),
[PTSD] = max(case when c.PTSD is not null then 1 else 0 end)
from DiagHX_w d
left
join ICDCodes_w c on d.ICDCode in (c.AnxietyDisorder, c.DepressiveDisorder, c.PTSD)
group
by RelativePersonID;
-- If you want one row per Flag:
select RelativePersonID,
d.ICDCode,
[AnxietyDisorder] = case when c.AnxietyDisorder is not null then 1 else 0 end,
[DepressiveDisorder] = case when c.DepressiveDisorder is not null then 1 else 0 end,
[PTSD] = case when c.PTSD is not null then 1 else 0 end
from DiagHX_w d
left
join ICDCodes_w c on d.ICDCode in (c.AnxietyDisorder, c.DepressiveDisorder, c.PTSD);
If the diagnoses in a row of ICDCodes_w are not related to each other (I assumed they were, since they are in the same table), you might want this instead:
select RelativePersonID,
[AnxietyDisorder] = max(case when c.AnxietyDisorder = d.ICDCode then 1 else 0 end),
[DepressiveDisorder] = max(case when c.DepressiveDisorder = d.ICDCode then 1 else 0 end),
[PTSD] = max(case when c.PTSD = d.ICDCode then 1 else 0 end)
from DiagHX_w d
left
join ICDCodes_w c on d.ICDCode in (c.AnxietyDisorder, c.DepressiveDisorder, c.PTSD)
group
by RelativePersonID;
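Once the per-person select looks right, it can feed the update so that each target row is matched by exactly one source row. The following is a sketch only, assuming SQL Server and the sample tables from the question; the aliases f, pf, p, d, c and x are illustrative. Rows in PatientFlags_w with no diagnosis on or before the service date are not touched and keep their NULLs.

```sql
-- Sketch: aggregate first, then update, so each target row gets exactly one source row.
UPDATE f
SET AnxietyDisorder    = x.AnxietyDisorder,
    DepressiveDisorder = x.DepressiveDisorder,
    PTSD               = x.PTSD
FROM PatientFlags_w AS f
JOIN (
    SELECT pf.PersonID,
           pf.RelativePersonID,
           MAX(CASE WHEN c.AnxietyDisorder    = d.ICDCode THEN 1 ELSE 0 END) AS AnxietyDisorder,
           MAX(CASE WHEN c.DepressiveDisorder = d.ICDCode THEN 1 ELSE 0 END) AS DepressiveDisorder,
           MAX(CASE WHEN c.PTSD               = d.ICDCode THEN 1 ELSE 0 END) AS PTSD
    FROM PatientFlags_w AS pf
    JOIN Person_w AS p
      ON p.PersonID = pf.PersonID
    JOIN DiagHX_w AS d
      ON d.RelativePersonID = pf.RelativePersonID
     AND d.DiagDate <= p.ServiceDate          -- only diagnoses up to the service date
    LEFT JOIN ICDCodes_w AS c
      ON d.ICDCode IN (c.AnxietyDisorder, c.DepressiveDisorder, c.PTSD)
    GROUP BY pf.PersonID, pf.RelativePersonID
) AS x
  ON x.PersonID = f.PersonID
 AND x.RelativePersonID = f.RelativePersonID;
```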

How to get the name and values of columns which have different values in the same table's consecutive rows

I have a table History with columns as follows:
HistoryId---User---Tag---Updateddate----DeptId
1 B12 abc 10-08-2017 D34
2 B24 abc 11-08-2017 D34
3 B24 def 12-08-2017 D34
I have a query
SELECT
*
FROM History
WHERE DeptId = 'D34'
ORDER BY Updateddate
The result of this query gives me the above 3 rows.
Now out of these rows I want the name and value of columns which have different values in consecutive rows.
Something like:
HistoryId-----Column-----Value
2 User B24
3 Tag def
Is there a way to do this?
If there are no null values for tag or user, then one way you could achieve this is by using a combination of LAG() and a CROSS APPLY to check whether any values are different.
SELECT H.HistoryID, C.Col, C.Val
FROM (
SELECT *, PrevUser = LAG([User]) OVER (ORDER BY HistoryID), PrevTag = LAG([Tag]) OVER (ORDER BY HistoryID)
FROM [History] AS H
) AS H
CROSS APPLY (
VALUES
('User', CASE WHEN PrevUser != [User] THEN [User] END),
('Tag', CASE WHEN PrevTag != Tag THEN Tag END)
) AS C(Col, Val)
WHERE C.Val IS NOT NULL;
If there are null values, it gets a bit more complicated, but the basic idea is the same: you just have to add rules to check for null values (and ignore the first row).
EDIT: If you needed to check for null values too, one way to do it would be like the following...
CREATE TABLE #History (HistoryID INT, [User] CHAR(3), [Tag] CHAR(3));
INSERT #History VALUES
(1,'B12','abc'),
(2,'B24','abc'),
(3,'B24','def'),
(4,NULL,'def'),
(5,'A24',NULL),
(6,NULL,NULL),
(7,'123','456');
SELECT H.HistoryID, C.Col, C.Val
FROM (
SELECT *, RN = ROW_NUMBER() OVER (ORDER BY HistoryID), PrevUser = LAG([User]) OVER (ORDER BY HistoryID), PrevTag = LAG([Tag]) OVER (ORDER BY HistoryID)
FROM #History AS H
) AS H
CROSS APPLY (
VALUES
('User', CASE WHEN RN != 1 AND (PrevUser != [User] OR (PrevUser IS NULL AND [User] IS NOT NULL) OR (PrevUser IS NOT NULL AND [User] IS NULL)) THEN [User] END, CASE WHEN RN != 1 AND (PrevUser != [User] OR (PrevUser IS NULL AND [User] IS NOT NULL) OR (PrevUser IS NOT NULL AND [User] IS NULL)) THEN 1 END),
('Tag', CASE WHEN RN != 1 AND (PrevTag != Tag OR (PrevTag IS NULL AND Tag IS NOT NULL) OR (PrevTag IS NOT NULL AND Tag IS NULL)) THEN Tag END, CASE WHEN RN != 1 AND (PrevTag != Tag OR (PrevTag IS NULL AND Tag IS NOT NULL) OR (PrevTag IS NOT NULL AND Tag IS NULL)) THEN 1 END)
) AS C(Col, Val, Chk)
WHERE C.Chk = 1;
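The repeated null checks can be shortened with the EXISTS (SELECT a EXCEPT SELECT b) idiom, because EXCEPT treats two NULLs as equal and a NULL as different from any value. This is a sketch of that alternative, assuming SQL Server 2012 or later and the #History test table above; it is not the only way to write a null-safe comparison.

```sql
-- Sketch: null-safe "value changed" test using EXISTS ... EXCEPT.
SELECT H.HistoryID, C.Col, C.Val
FROM (
    SELECT *,
           RN       = ROW_NUMBER() OVER (ORDER BY HistoryID),
           PrevUser = LAG([User])  OVER (ORDER BY HistoryID),
           PrevTag  = LAG([Tag])   OVER (ORDER BY HistoryID)
    FROM #History
) AS H
CROSS APPLY (
    -- each branch yields a row only when the current value differs from the previous one
    SELECT 'User', H.[User]
    WHERE EXISTS (SELECT H.PrevUser EXCEPT SELECT H.[User])
    UNION ALL
    SELECT 'Tag', H.[Tag]
    WHERE EXISTS (SELECT H.PrevTag EXCEPT SELECT H.[Tag])
) AS C(Col, Val)
WHERE H.RN <> 1;   -- the first row has nothing to compare against
```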
You can unpivot your results and then check them for change over consecutive rows like below
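select
HistoryId,
[column],
[values]
into #temp_up_values
from
(
-- UNPIVOT needs every source column to share one type, hence the casts
select HistoryId,
cast([User] as varchar(50)) as [User],
cast([Tag] as varchar(50)) as [Tag],
cast(Updateddate as varchar(50)) as [Updateddate],
cast(DeptId as varchar(50)) as [DeptId]
from History
where DeptId='D34'
)src
unpivot
(
[values]
for [column]
in ([User],[Tag],[Updateddate],[DeptId])
)up;
-- t1 is the later row, t2 the row before it
select t1.*
from
#temp_up_values t1
join
#temp_up_values t2
on t1.HistoryId=t2.HistoryId +1
and t1.[column]=t2.[column] and t1.[values]<>t2.[values]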

Calculation of occurrence of strings

I have a table with 3 columns: id, name and vote. It is populated with many records. I need to return the record with the best balance of votes. The vote types are 'yes' and 'no'.
Yes -> Plus 1
No -> Minus 1
This column vote is a string column. I am using SQL SERVER.
Example:
It must return Ann for me
Use conditional Aggregation to tally the votes as Kannan suggests in his answer
If you really only want 1 record then you can do it like so:
SELECT TOP 1
name
,SUM(CASE WHEN vote = 'yes' THEN 1 ELSE -1 END) AS VoteTotal
FROM
#Table
GROUP BY
name
ORDER BY
VoteTotal DESC
This will not allow for ties, but you can use the following method, which ranks the responses. Filter on RowNum to get only 1 result, or on RankNum to include ties.
;WITH cteVoteTotals AS (
SELECT
name
,SUM(CASE WHEN vote = 'yes' THEN 1 ELSE -1 END) AS VoteTotal
,ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY SUM(CASE WHEN vote = 'yes' THEN 1 ELSE -1 END) DESC) as RowNum
,DENSE_RANK() OVER (PARTITION BY 1 ORDER BY SUM(CASE WHEN vote = 'yes' THEN 1 ELSE -1 END) DESC) as RankNum
FROM
#Table
GROUP BY
name
)
SELECT name, VoteTotal
FROM
cteVoteTotals
WHERE
RowNum = 1
--RankNum = 1 --if you want with ties use this line instead
Here is the test data used. In the future, do NOT just post an image of your test data; spend the 2 minutes to make a temp table or a table variable so that the people you are asking for help do not have to!
CREATE TABLE #Table (id INT, name VARCHAR(25), vote VARCHAR(4));
INSERT INTO #Table (id, name, vote)
VALUES (1, 'John','no'),(2, 'John','no'),(3, 'John','yes')
,(4, 'Ann','no'),(5, 'Ann','yes'),(6, 'Ann','yes')
,(9, 'Marie','no'),(8, 'Marie','no'),(7, 'Marie','yes')
,(10, 'Matt','no'),(11, 'Matt','yes'),(12, 'Matt','yes')
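If ties should be returned without a ranking CTE, SQL Server's TOP ... WITH TIES is another option. A minimal sketch against the #Table test data above:

```sql
-- Sketch: return every name that shares the highest vote balance.
SELECT TOP (1) WITH TIES
       name,
       SUM(CASE WHEN vote = 'yes' THEN 1 ELSE -1 END) AS VoteTotal
FROM #Table
GROUP BY name
ORDER BY VoteTotal DESC;
```

With this test data both Ann and Matt have a balance of +1, so both rows come back.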
Use this code,
;with cte as (
select id, name, case when vote = 'yes' then 1 else -1 end as votenum from register
) select name, sum(votenum) from cte group by name
You can then take the maximum or minimum of these totals.
This one gives the 'yes' rate for each person:
SELECT Name, SUM(CASE WHEN Vote = 'Yes' THEN 1.0 ELSE 0 END)/COUNT(*) AS Rate
FROM My_Table
GROUP BY Name

How do I determine if a group of data exists in a table, given the data that should appear in the group's rows?

I am writing data to a table and allocating a "group-id" for each batch of data that is written. To illustrate, consider the following table.
GroupId Value
------- -----
1 a
1 b
1 c
2 a
2 b
3 a
3 b
3 c
3 d
In this example, there are three groups of data, each with similar but varying values.
How do I query this table to find a group that contains a given set of values? For instance, if I query for (a,b,c) the result should be group 1. Similarly, a query for (b,a) should result in group 2, and a query for (a, b, c, e) should result in the empty set.
I can write a stored procedure that performs the following steps:
select distinct GroupId from Groups -- and store locally
for each distinct GroupId: perform a set-difference (except) between the input and table values (for the group), and vice versa
return the GroupId if both set-difference operations produced empty sets
This seems a bit excessive, and I am hoping to leverage some other commands in SQL to simplify it. Is there a simpler way to perform a set comparison in this context, or to select the group ID that contains exactly the input values for the query?
This is a set-within-sets query. I like to solve it using group by and having:
select groupid
from GroupValues gv
group by groupid
having sum(case when value = 'a' then 1 else 0 end) > 0 and
sum(case when value = 'b' then 1 else 0 end) > 0 and
sum(case when value = 'c' then 1 else 0 end) > 0 and
sum(case when value not in ('a', 'b', 'c') then 1 else 0 end) = 0;
The first three conditions in the having clause check that each element exists. The last condition checks that there are no other values. This method is quite flexible and handles various inclusion and exclusion conditions on the values you are looking for.
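As an illustration of that flexibility, here is a sketch of a different rule on the same table: groups that contain both 'a' and 'b' but must not contain 'd', with any other values allowed. The rule itself is hypothetical and only there to show how the conditions combine.

```sql
-- Sketch: must include 'a' and 'b', must exclude 'd', anything else is allowed.
select groupid
from GroupValues gv
group by groupid
having sum(case when value = 'a' then 1 else 0 end) > 0 and
       sum(case when value = 'b' then 1 else 0 end) > 0 and
       sum(case when value = 'd' then 1 else 0 end) = 0;
```

Against the sample data in the question this keeps groups 1 and 2 and drops group 3, which contains 'd'.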
EDIT:
If you want to pass in a list, you can use:
with thelist as (
select 'a' as value union all
select 'b' union all
select 'c'
)
select groupid
from GroupValues gv left outer join
thelist
on gv.value = thelist.value
group by groupid
having count(distinct gv.value) = (select count(*) from thelist) and
count(distinct (case when gv.value = thelist.value then gv.value end)) = count(distinct gv.value);
Here the having clause counts the number of matching values and makes sure that this is the same size as the list.
EDIT:
The query failed to compile because a table alias was missing. It has been updated with the right table alias.
This is kind of ugly, but it works. On larger datasets I'm not sure what performance would look like, but the nested instances of #GroupValues key off GroupID in the main table so I think as long as you have a good index on GroupID it probably wouldn't be too horrible.
If Object_ID('tempdb..#GroupValues') Is Not Null Drop Table #GroupValues
Create Table #GroupValues (GroupID Int, Val Varchar(10));
Insert #GroupValues (GroupID, Val)
Values (1,'a'),(1,'b'),(1,'c'),(2,'a'),(2,'b'),(3,'a'),(3,'b'),(3,'c'),(3,'d');
If Object_ID('tempdb..#FindValues') Is Not Null Drop Table #FindValues
Create Table #FindValues (Val Varchar(10));
Insert #FindValues (Val)
Values ('a'),('b'),('c');
Select Distinct gv.GroupID
From (Select Distinct GroupID
From #GroupValues) gv
Where Not Exists (Select 1
From #FindValues fv2
Where Not Exists (Select 1
From #GroupValues gv2
Where gv.GroupID = gv2.GroupID
And fv2.Val = gv2.Val))
And Not Exists (Select 1
From #GroupValues gv3
Where gv3.GroupID = gv.GroupID
And Not Exists (Select 1
From #FindValues fv3
Where gv3.Val = fv3.Val))