I have a table of data with policy information:
Row  PolicyNumber  MemberID  PolicyName
1    1234          789       Main
2    1265          789       Travel
3    1523          541       Travel
4    6778          374       Main
5    5821          123       Main
6    8763          123       Travel
I want to count the number of distinct MemberIDs within each PolicyNumber. However, some members have more than one PolicyNumber because they have both a travel policy and a main policy.
So I want SQL to first check the count of PolicyNumbers per MemberID, and where that count is greater than one, bring back only the policy numbers where the PolicyName is not like '%travel%'.
But, if the member ONLY has one policy, I want SQL to include that policy even if it's travel.
So in my sample data:
Member 789 would be counted against policy 1234 but not 1265
Member 541 would be counted against policy 1523
Member 374 would be counted against policy 6778
Member 123 would be counted against policy 5821 but not 8763
Desired result would be a table of two columns, PolicyNumber and the distinct count of MemberId associated to that policy.
How do I write the syntax?
You can achieve this using multiple HAVING conditions:
SELECT Member_ID, COUNT(*)
FROM YOUR_TABLE
GROUP BY Member_ID
HAVING COUNT(PolicyName) = 1
   -- CASE with no ELSE returns NULL for travel rows, so COUNT only counts non-travel policies
   OR COUNT(CASE WHEN PolicyName NOT LIKE '%travel%' THEN 1 END) >= 1;
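If you also need the desired two-column result (PolicyNumber and its distinct member count), here is a minimal sketch of one way to do it, assuming the same YOUR_TABLE name and Member_ID/PolicyName column names as the answer above; the window count tells you how many policies each member holds, so a travel policy is only kept when it is the member's only policy:
SELECT PolicyNumber, COUNT(DISTINCT Member_ID) AS member_count
FROM (
    SELECT t.*,
           COUNT(*) OVER (PARTITION BY Member_ID) AS policies_per_member
    FROM YOUR_TABLE t
) x
WHERE policies_per_member = 1
   OR PolicyName NOT LIKE '%travel%'
GROUP BY PolicyNumber;
Against the sample data this returns one row each for 1234, 1523, 6778 and 5821, each with a member count of 1.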
Below is an example of an existing table (except for the activity_instance column). I'd like the activity_instance column to assign a numeric identifier each time a unique combination appears in the three adjacent columns for each individual (unique_id), i.e. when unique_id, activity and date match, that combination is assigned instance 1 for that person, and so on. The same combination could appear more than once later in the dataset.
The idea is to distinguish which events belong together and which do not. This instance identifier should be unique, also among different cases and activities.
unique_id  activity    date        activity_instance
1234       activity_a  2016-04-01  1
1234       activity_a  2016-04-01  1
1234       activity_b  2016-04-01  2
5678       activity_a  2019-09-01  1
5678       activity_a  2019-09-01  1
65431      activity_c  2019-09-01  1
1234       activity_a  2019-09-01  3
Using dense_rank():
select *
, dense_rank() over (partition by unique_id order by date,activity) as activity_instance
from tablename
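Run against the sample rows above, the partition on unique_id restarts the numbering for each person, and ordering by date and then activity produces 1, 1, 2, 3 for unique_id 1234, matching the expected activity_instance values.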
I have a table with user shopping data as shown below
I want an output similar to a running total, but instead I want the running count of unique categories that the user has shopped in, by date.
I know I have to make use of ROWS PRECEDING AND FOLLOWING in the count function, but I am not able to use count(distinct category) in a window function.
Dt category userId
4/10/2022 Grocery 123
4/11/2022 Grocery 123
4/12/2022 MISC 123
4/13/2022 SERVICES 123
4/14/2022 RETAIl 123
4/15/2022 TRANSP 123
4/20/2022 GROCERY 123
Desired output
Dt userID number of unique categories
4/10/2022 123 1
4/11/2022 123 1
4/12/2022 123 2
4/13/2022 123 3
4/14/2022 123 4
4/15/2022 123 5
4/20/2022 123 5
Consider the approach below:
select Dt, userId,
  ( select count(distinct category)
    from t.categories as category
  ) as number_of_unique_categories
from (
  select *, array_agg(lower(category)) over(partition by userId order by Dt) as categories
  from your_table
) t
If applied to the sample data in your question, the output matches the desired output shown above.
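If the array-based subquery isn't available in your SQL dialect, a hedged alternative sketch is a running sum over first occurrences: flag the first row of each (userId, category) pair with ROW_NUMBER, then take a cumulative SUM of those flags (column and table names follow your_table from the question):
select Dt, userId,
       sum(case when rn = 1 then 1 else 0 end)
         over (partition by userId order by Dt) as number_of_unique_categories
from (
  select *,
         row_number() over (partition by userId, lower(category) order by Dt) as rn
  from your_table
) t
For the sample data this also yields 1, 1, 2, 3, 4, 5, 5.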
I have the following dataset:
ID Medication Dose
1 Aspirin 4
1 Tylenol 7
1 Aspirin 2
1 Ibuprofen 1
2 Aspirin 6
2 Aspirin 2
2 Ibuprofen 6
2 Tylenol 4
3 Tylenol 3
3 Tylenol 7
3 Tylenol 2
I would like to write a query that identifies patients who have been administered a medication more than once. For example, ID 1 had Aspirin twice, ID 2 had Aspirin twice and ID 3 had Tylenol three times.
I could be wrong, but I think the easiest way to do this would be to concatenate the medications for each ID using code similar to the query below; I'm not quite sure what to do after that - is it possible to count whether a string appears twice within a cell?
SELECT DISTINCT ST2.[ID],
SUBSTRING(
(
SELECT ','+ST1.Medication AS [text()]
FROM ED_NOTES_MASTER ST1
WHERE ST1.[ID] = ST2.[ID]
Order BY [ID]
FOR XML PATH ('')
), 1, 200000) [Result]
FROM ED_NOTES_MASTER ST2
I would like the output to look like the following:
ID MEDICATION Aspirin2x Tylenol2x Ibuprofen2x
1 Aspirin, Tylenol , Aspirin YES NO NO
2 Ibuprofen, Aspirin, Aspirin YES NO NO
3 Tylenol, Tylenol ,Tylenol NO YES NO
For the first part of your question (identify patients that have had a particular medication more than once), you can do this using GROUP BY to group by the ID and medication, and then using COUNT to get how many times each medication was given to each patient. For example:
SELECT ID, Medication, COUNT(*) AS amount
FROM ED_NOTES_MASTER
GROUP BY ID, Medication
This will give you a list of all ID - Medication combinations that appear in the table and a count of how many times each combination appears. To limit these results to just those with a count of at least 2, add a condition on the aggregate using HAVING:
SELECT ID, Medication, COUNT(*) AS amount
FROM ED_NOTES_MASTER
GROUP BY ID, Medication
HAVING COUNT(*) >= 2
The problem now is formatting the results in the way you want. What you will get from the query above is a list of all patient - medication combinations that came up in the table more than once, like this:
ID | Medication | Count
------+---------------+-------
1 | Aspirin | 2
2 | Aspirin | 2
3 | Tylenol | 3
I'd suggest that you try to work with this format if possible, because as you have found, getting multiple values back as a comma-delimited list (as in your Medication column) means resorting to hacks like FOR XML PATH (although recent versions of SQL Server do provide proper group concatenation via STRING_AGG). If you really need the Aspirin2x etc. columns, take a look at the PIVOT operation in SQL Server, or at conditional aggregation as sketched below.
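For instance, here is a minimal sketch of the flag columns using conditional aggregation (assuming the ED_NOTES_MASTER table from your query; the comma-separated MEDICATION column would still need the FOR XML PATH / STRING_AGG trick):
SELECT ID,
       CASE WHEN SUM(CASE WHEN Medication = 'Aspirin'   THEN 1 ELSE 0 END) >= 2 THEN 'YES' ELSE 'NO' END AS Aspirin2x,
       CASE WHEN SUM(CASE WHEN Medication = 'Tylenol'   THEN 1 ELSE 0 END) >= 2 THEN 'YES' ELSE 'NO' END AS Tylenol2x,
       CASE WHEN SUM(CASE WHEN Medication = 'Ibuprofen' THEN 1 ELSE 0 END) >= 2 THEN 'YES' ELSE 'NO' END AS Ibuprofen2x
FROM ED_NOTES_MASTER
GROUP BY ID
For your sample data this marks Aspirin2x = YES for IDs 1 and 2 and Tylenol2x = YES for ID 3, matching the flag columns in your desired output.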
I have 3 tables:
Users
-----
UserID (varchar)
Active (bit)
Refunds_Upload
--------------
BorrowerNumber (varchar)
Refunds
-------
BorrowerNumber
UserID
I first select all of the UserID values where Active = 1.
I need to insert the records from Refunds_Upload to Refunds but I need to insert the same (or as close as possible) number of records for each Active UserID.
For example, if Refunds_Upload has 20 records and the Users table has 5 people where Active = 1, then I would need to insert 4 records per UserID into table Refunds.
End Result would be:
BorrowerNumber UserID
105 Fred
110 Fred
111 Fred
115 Fred
120 Billy
122 Billy
123 Billy
125 Billy
130 Lucius
131 Lucius
133 Lucius
135 Lucius
138 Lucy
139 Lucy
140 Lucy
141 Lucy
142 Grady
143 Grady
144 Grady
145 Grady
Of course, it won't always come to an even number of records per User so I need to account for that as well.
First run this and check it returns something like what you want to insert, before you uncomment the INSERT and actually carry it out:
--INSERT INTO Refunds (UserID, BorrowerNumber)
SELECT
  numbered_u.UserID,
  numbered_ru.BorrowerNumber
FROM
  (SELECT u.*, ROW_NUMBER() OVER (ORDER BY UserID) - 1 AS rown, SUM(CAST(Active AS INT)) OVER () AS count_users
   FROM Users u WHERE Active = 1) numbered_u
INNER JOIN
  (SELECT ru.*, ROW_NUMBER() OVER (ORDER BY BorrowerNumber) - 1 AS rown, COUNT(*) OVER () AS count_ru
   FROM Refunds_Upload ru) numbered_ru
ON
  ROUND(CAST(numbered_ru.rown AS FLOAT) / (count_ru / count_users), 0) = numbered_u.rown
The logic:
We number every interesting (Active = 1) row in Users and we also count them all. This should return all 5 users, numbered 0 to 4, with a count_users value of 5 on each row.
Then we join them to a similarly numbered list of Refunds_Upload rows (say 20). Those rows are numbered 0 to 19, for mathematical reasons that become apparent later, and counted as well.
We then join these two datasets together, but the condition matches a range of values rather than exact values. The logic is "refund_upload row number, divided by the count of rows there should be per user" (i.e. 0..19 / (20/5)) = user row number. Thus refund rows 0 to 3 associate with user 0, refund rows 4 to 7 associate with user 1, and so on.
It's a little hard to debug without full data - I feel it might need a few +1 / -1 tweaks here and there.
I originally used FLOOR but switched to ROUND, as I think this might distribute the rows better where there isn't a whole number of refunds per user, e.g. your 240/13 example. Hopefully some users will end up with 18 rows and some with 19.
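A hedged alternative sketch, if you are on SQL Server and want the engine to handle the uneven split for you, is NTILE, which divides a row set into N near-equal buckets (the @n variable below is just an illustration of computing the active-user count first):
DECLARE @n INT = (SELECT COUNT(*) FROM Users WHERE Active = 1);

INSERT INTO Refunds (BorrowerNumber, UserID)
SELECT ru.BorrowerNumber, u.UserID
FROM (SELECT BorrowerNumber,
             NTILE(@n) OVER (ORDER BY BorrowerNumber) AS bucket
      FROM Refunds_Upload) ru
JOIN (SELECT UserID,
             ROW_NUMBER() OVER (ORDER BY UserID) AS bucket
      FROM Users
      WHERE Active = 1) u
  ON u.bucket = ru.bucket;
When the counts don't divide evenly, NTILE simply makes the first few buckets one row larger, so no rounding tweaks are needed.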
I have the following tables.
Accounts(account_number*,balance)
Transactions(account_number*,transaction_number*,date,amount,type)
Date is the date that the transaction happened. Amount is the amount of the transaction
and it can have a positive or a negative value, depending on the type (Withdrawal -, Deposit +). I think the type is irrelevant here, as the amount is already signed appropriately.
I need to write a query which points out the account_number of the accounts that have at least once had negative balance.
Here's some sample data from the Transactions table, ordered by account_number and date.
account_number transaction_number date amount type
--------------------------------------------------------------------
1 2 02/03/2013 -20000 withdrawal
1 3 03/15/2013 300 deposit
1 1 01/01/2013 100 deposit
2 1 04/15/2013 235236 deposit
3 1 06/15/2013 500 deposit
4 1 03/01/2013 10 deposit
4 2 04/01/2013 80 deposit
5 1 11/11/2013 10000 deposit
5 2 12/11/2013 20000 deposit
5 3 12/13/2013 -10002 withdrawal
6 1 03/15/2013 102300 deposit
7 1 03/15/2013 100 deposit
8 1 08/08/2013 133990 deposit
9 1 05/09/2013 10000 deposit
9 2 06/01/2013 300 deposit
9 3 10/11/2013 23 deposit
Something like this with an analytic to keep a running balance for an account:
SELECT DISTINCT account_number
FROM ( SELECT account_number
,SUM(amount)
OVER (PARTITION BY account_number ORDER BY date) AS running_balance
FROM transactions
) x
WHERE running_balance < 0
Explanation:
It is using an analytic function: the PARTITION BY breaks the table into groups identified by the account number. Within each group, the data is ordered by date. The SUM function then walks through each row of the ordered group, by default summing everything from the beginning of the group up to the current row. This gives you a running balance. Just run the inner query on its own and take a look at the output, then read a bit about analytic queries. They are pretty cool.
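For the sample data above, only account_number 1 would be returned: its running balance goes 100, -19900, -19600, dipping below zero after the 02/03/2013 withdrawal, while every other account stays non-negative throughout.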