SQL terminology to combine a NOT EXISTS query with a latest value

I am a beginner with basic knowledge.
I have a single table, and I am trying to pull all UIDs that have not had a particular code in the table within the past year.
My table looks like this (but much larger, of course):
FACID DPID EID DID UID DT Code Units Charge ET Ord
1 1 6 2 1002 15-Mar-07 99204 1 180 09:36.7 1
1 1 7 5 10004 15-Mar-07 99213 1 68 02:36.9 1
1 1 24 55 25887 15-Mar-07 99213 1 68 43:55.3 1
1 1 25 2 355688 15-Mar-07 99213 1 68 53:20.2 1
1 1 26 5 555654 15-Mar-07 99213 1 68 42:22.6 1
1 1 27 44 135514 15-Mar-07 99213 1 68 00:36.8 1
1 1 28 2 3244522 15-Mar-07 99214 1 98 34:59.4 1
1 1 29 5 235445 15-Mar-07 99213 1 68 56:42.1 1
1 1 30 3 3214444 15-Mar-07 99213 1 68 54:56.5 1
1 1 33 1 221444 15-Mar-07 99204 1 180 37:44.5 1
I am attempting to use the following, but it is not respecting my time-frame limits:
select distinct UID from PtProcTbl
where DT<'20120101'
and NOT EXISTS (Select Distinct UID
where Code in ('99203','99204','99205','99213',
'99214','99215','99244','99245'))
I need to know how to make sure the UIDs I am pulling are the ones that don't have any row with a DT after the 1/1/2012 cut-off date carrying one of the codes in the NOT EXISTS list.
The query above returned UIDs that actually do have rows dated after 1/1/2012 with one of those codes.
I'm not sure what I am doing wrong, or if I am totally off base on this.
Thanks in advance.

Are you sure you need the NOT EXISTS? How about instead:
AND Code NOT IN ('99203','99204','99205','99213','99214','99215','99244','99245')
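If the requirement really is "no row with one of those codes on or after the cut-off date", then the NOT EXISTS subquery needs its own FROM clause and a correlation back to the outer UID. A minimal sketch (my reading of the question, assuming the PtProcTbl columns shown above and that DT is a date column):
SELECT DISTINCT p.UID
FROM PtProcTbl p
WHERE p.DT < '20120101'                  -- seen before the cut-off, as in the original query
  AND NOT EXISTS (
        SELECT 1
        FROM PtProcTbl x
        WHERE x.UID = p.UID              -- correlate to the outer UID
          AND x.DT >= '20120101'         -- on or after the cut-off date
          AND x.Code IN ('99203','99204','99205','99213',
                         '99214','99215','99244','99245')
      );
The correlation (x.UID = p.UID) is what ties the inner check to each candidate UID; without it, the subquery is evaluated independently of the outer row.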

Related

How to group merge columns based on one row identifier with pandas?

I have a dataset in which there are a lot of entries for a single location. I am trying to find a way to sum up all of those entries without affecting any of the other columns. So, just in case I'm not explaining it well enough, I want to take a dataset like this:
Locations Cyclists maleRunners femaleRunners maleCyclists femaleCyclists
Bedford 10 12 14 17 27
Bedford 11 40 34 9 1
Bedford 7 1 2 3 3
Leeds 1 1 2 0 0
Leeds 20 13 6 1 1
Bath 101 20 33 41 3
Bath 11 2 3 1 0
And turn it into something like this:
Locations Cyclists maleRunners femaleRunners maleCyclists femaleCyclists
Bedford 28 53 50 29 31
Leeds 21 14 8 1 1
Bath 112 22 36 42 3
Now, I have read that a groupby should work here, but from my understanding a groupby would reduce things to two columns at a time, and I don't particularly want to build hundreds of two-column frames and then merge them all. Surely there's a much simpler way to do this?
IIUC, groupby+sum will work for you:
df.groupby('Locations',as_index=False,sort=False).sum()
Output:
Locations Cyclists maleRunners femaleRunners maleCyclists femaleCyclists
0 Bedford 28 53 50 29 31
1 Leeds 21 14 8 1 1
2 Bath 112 22 36 42 3
A pivot table should also work for you:
import numpy as np
import pandas as pd
new_df = pd.pivot_table(df, values=['Cyclists', 'maleRunners', 'femaleRunners', 'maleCyclists', 'femaleCyclists'],
                        index='Locations', aggfunc=np.sum)

Properly 'Joining' two Cross Applies

I've got a query with three CROSS APPLYs that gather data from three different tables. The first CROSS APPLY assists the second and third: it finds the most recent ID of a certain refill for a 'cartridge' (the higher the ID, the more recent the refill).
The second and third CROSS APPLYs gather the sums of items that have been refilled and items that have been dispensed under the most recent refill.
If I run the query for CROSS APPLY 2 or 3 separately, the output looks something like:
ID Amount
1 100
2 1000
3 100
4 0
5 0
etc
Amount would be either the amount of dispensed or refilled items.
But I don't want to run these queries separately; I want the results next to each other.
So what I want is a table that looks like this:
ID Refill Dispense
1 100 1
2 1000 5
3 100 7
4 0 99
5 0 3
etc
My gut tells me to do
INNER JOIN crossapply2 ON crossapply3.ID = crossapply2.ID
But this doesn't work. I'm still new to SQL, so I don't exactly know what I can and can't join; what I do know is that you can use a CROSS APPLY as a kind of join. I think that might be what I need to do here, I just don't know how.
And that's not all: there's another complication. For certain refills nothing gets dispensed. In those cases the CROSS APPLY I wrote for dispenses doesn't return anything for that refill ID. By "nothing" I don't mean NULL; it simply skips that refill ID, but I'd like to see a 0 in those cases. Because those IDs are skipped I can't get COALESCE or ISNULL to work, and it also complicates joining the two applies, since an INNER JOIN would drop any line where there is no dispensed amount, even though there is a refilled amount I'd like to see.
Here is my code:
-- Dispensed SUM and Refilled SUM combined
SELECT [CartridgeRefill].[FK_CartridgeRegistration_Id]
      ,Refills.Refilled
      ,Dispenses.Dispensed
FROM [CartridgeRefill]
CROSS APPLY (
    SELECT MAX([CartridgeRefill].[Id]) AS RecentRefillID
    FROM [CartridgeRefill]
    GROUP BY [CartridgeRefill].[FK_CartridgeRegistration_Id]
) AS RecentRefill
CROSS APPLY (
    SELECT [CartridgeRefill].[FK_CartridgeRegistration_Id] AS RefilledID
          ,SUM([CartridgeRefillMedication].[Amount]) AS Refilled
    FROM [CartridgeRefillMedication]
    INNER JOIN [CartridgeRefill] ON [CartridgeRefillMedication].[FK_CartridgeRefill_Id] = [CartridgeRefill].[Id]
    WHERE [CartridgeRefillMedication].[FK_CartridgeRefill_Id] = RecentRefill.RecentRefillID
    GROUP BY [CartridgeRefill].[FK_CartridgeRegistration_Id]
) AS Refills
CROSS APPLY (
    SELECT [CartridgeRefill].[FK_CartridgeRegistration_Id] AS DispensedID
          ,SUM([CartridgeDispenseAttempt].[Amount]) AS Dispensed
    FROM [CartridgeDispenseAttempt]
    INNER JOIN [CartridgeRefill] ON [CartridgeDispenseAttempt].[FK_CartridgeRefill_Id] = [CartridgeRefill].[Id]
    WHERE [CartridgeDispenseAttempt].[FK_CartridgeRefill_Id] = RecentRefill.RecentRefillID
    GROUP BY [CartridgeRefill].[FK_CartridgeRegistration_Id]
) AS Dispenses
GO
The output of this code is as follows:
1 300 1
1 300 1
1 200 194
1 200 194
1 200 8
1 200 8
1 0 39
1 0 39
1 100 14
1 100 14
1 200 1
1 200 1
1 0 28
1 0 28
1 1000 102
1 1000 102
1 1000 557
1 1000 557
1 2000 92
1 2000 92
1 100 75
1 100 75
1 100 100
1 100 100
1 100 51
1 100 51
1 600 28
1 600 28
1 200 47
1 200 47
1 200 152
1 200 152
1 234 26
1 234 26
1 0 227
1 0 227
1 10 6
1 10 6
1 300 86
1 300 86
1 0 194
1 0 194
1 500 18
1 500 18
1 1000 51
1 1000 51
1 1000 56
1 1000 56
1 500 48
1 500 48
1 0 10
1 0 10
1 1500 111
1 1500 111
1 56 79
1 56 79
1 100 6
1 100 6
1 44 134
1 44 134
1 1000 488
1 1000 488
1 100 32
1 100 32
1 100 178
1 100 178
1 500 672
1 500 672
1 200 26
1 200 26
1 500 373
1 500 373
1 100 10
1 100 10
1 900 28
1 900 28
2 900 28
2 900 28
2 900 28
etc
It is mostly nonsense that I can't do much with; it goes on for about 20k lines and eventually cycles through all the IDs.
Any help is more than appreciated :)
It looks like you've overcomplicated it a bit. Try:
WITH cr AS (
    SELECT [FK_CartridgeRegistration_Id]
          ,MAX([CartridgeRefill].[Id]) RecentRefillID
    FROM [CartridgeRefill]
    GROUP BY [FK_CartridgeRegistration_Id]
)
SELECT cr.[FK_CartridgeRegistration_Id], Refills.Refilled, Dispenses.Dispensed
FROM cr
CROSS APPLY (
    SELECT SUM(crm.[Amount]) AS Refilled
    FROM [CartridgeRefillMedication] crm
    WHERE crm.[FK_CartridgeRefill_Id] = cr.RecentRefillID
) AS Refills
CROSS APPLY (
    SELECT SUM(cda.[Amount]) AS Dispensed
    FROM [CartridgeDispenseAttempt] cda
    WHERE cda.[FK_CartridgeRefill_Id] = cr.RecentRefillID
) AS Dispenses;
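On the asker's wish to see 0 rather than nothing when a refill has no dispenses: each applied query above is a scalar aggregate (SUM with no GROUP BY), so it always returns exactly one row, with NULL when there are no matching rows. A small variation of the query above (just a sketch on the same tables) wraps the results in ISNULL to turn that NULL into 0:
WITH cr AS (
    SELECT [FK_CartridgeRegistration_Id]
          ,MAX([Id]) AS RecentRefillID
    FROM [CartridgeRefill]
    GROUP BY [FK_CartridgeRegistration_Id]
)
SELECT cr.[FK_CartridgeRegistration_Id]
      ,ISNULL(Refills.Refilled, 0)    AS Refilled   -- 0 when nothing was refilled
      ,ISNULL(Dispenses.Dispensed, 0) AS Dispensed  -- 0 when nothing was dispensed
FROM cr
CROSS APPLY (
    SELECT SUM(crm.[Amount]) AS Refilled
    FROM [CartridgeRefillMedication] crm
    WHERE crm.[FK_CartridgeRefill_Id] = cr.RecentRefillID
) AS Refills
CROSS APPLY (
    SELECT SUM(cda.[Amount]) AS Dispensed
    FROM [CartridgeDispenseAttempt] cda
    WHERE cda.[FK_CartridgeRefill_Id] = cr.RecentRefillID
) AS Dispenses;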

Creating Pivots based on a rowcount in SQL

I have 2 tables T1 and T2.
T1 has the following columns:
Id, EntityId, TypeofValue, Year, value
EntityId can have 7 values:
1,2,3,4,5,100,101
Typeofvalue can have 2 values
1 indicates Actual
2 indicates Target
T2 has the following columns:
NoOfRecordsToDisplay
I need to fetch the number of records (if any exist) for Target corresponding to an Id. The record count (1 if any record is present, 0 if none) for Target needs to be grouped into two categories: the first for EntityId 1, and the second taking the maximum across EntityIds 2, 3, 4, 100 and 101 [not 5].
However, the catches are:
Sometimes a Target value might not be present for a year.
I need to consider only the last records for targets, on the basis of NoOfRecordsToDisplay (the number of records to display comes from T2) applied to the Actual records.
Example1:
NoOfRecordsToDisplay =3, ID =123
The data below should return:
CountGroup1: 1, CountGroup2: 1
because EntityId 1 has at least one Target value within the last 3 years (2015, 2014 and 2013 in this case),
and because EntityId 2 or 3 has at least one Target value for the years 2015 and 2014.
Id EntityId TypeofValue Year Value
123 1 1 2015 55
123 1 1 2014 56
123 1 1 2013 57
123 1 1 2012 58
123 1 2 2015 50
123 1 2 2014 50
123 1 2 2013 50
123 1 2 2012 50
123 2 1 2015 55
123 2 1 2014 56
123 3 1 2015 57
123 3 1 2014 58
123 2 2 2015 55
123 2 2 2014 56
123 3 2 2015 57
124 1 1 2015 55
124 1 1 2014 56
124 2 1 2013 57
124 2 1 2012 58
124 1 2 2015 50
124 1 2 2014 50
124 2 2 2013 50
124 2 2 2012 50
Another dataset
NoOfRecordsToDisplay =3, ID =123
The data below should return:
CountGroup1: 0, CountGroup2: 1
because EntityId 1 has no Target value within the last 3 years (it has a Target value, but only for 2012),
and because EntityId 2 has at least one Target value in those years (2015), while EntityId 3's only Target value is for 2010.
Id EntityId TypeofValue Year Value
123 1 1 2015 55
123 1 1 2014 56
123 1 1 2013 57
123 1 1 2012 58
123 1 2 2012 58
123 2 1 2015 55
123 2 1 2014 56
123 2 2 2015 55
123 2 2 2011 56
123 3 2 2010 57
Thank you so much for your help.
I have been trying to find a solution for a long time, and I am not sure whether PIVOT will help.
This question is different from the other one I posted, as here I am trying to create a group count based on entity groups.
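For what it's worth, the two flags in the examples can be computed without PIVOT. Below is a minimal sketch only, assuming SQL Server, the T1/T2 columns described above, and that "the last N years" means the N most recent years for which the Id has Actual (TypeofValue = 1) rows; against the two sample datasets it should give 1/1 and 0/1 respectively:
DECLARE @Id INT = 123;
DECLARE @N  INT = (SELECT TOP (1) NoOfRecordsToDisplay FROM T2);

WITH LastYears AS (
    SELECT DISTINCT TOP (@N) [Year]
    FROM T1
    WHERE Id = @Id AND TypeofValue = 1            -- Actual rows define the year window
    ORDER BY [Year] DESC
)
SELECT ISNULL(MAX(CASE WHEN t.EntityId = 1 THEN 1 ELSE 0 END), 0)                AS CountGroup1
      ,ISNULL(MAX(CASE WHEN t.EntityId IN (2,3,4,100,101) THEN 1 ELSE 0 END), 0) AS CountGroup2
FROM T1 t
JOIN LastYears y ON y.[Year] = t.[Year]
WHERE t.Id = @Id
  AND t.TypeofValue = 2;                          -- Target rows only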

Getting the average of product sales each day and calculating the number of days that have positive sales

I have this table TARGETSALE, which has the following columns:
SELECT DATE, WEEK, BRANCH, PROD, TARGETREACH
FROM TARGETSALE
WHERE BRANCH = 1
AND WEEK BETWEEN 52 AND 53;
DATE WEEK BRANCH PROD TARGETREACH
-------------------------------------------------------------------
01/09/2014 52 1 1 50
02/09/2014 52 1 1 -10
03/09/2014 52 1 1 50
04/09/2014 52 1 1 50
05/09/2014 52 1 1 40
06/09/2014 52 1 1 -10
07/09/2014 53 1 1 -5
08/09/2014 53 1 1 0
09/09/2014 53 1 1 10
10/09/2014 53 1 1 20
11/09/2014 53 1 1 30
12/09/2014 53 1 1 40
13/09/2014 53 1 1 0
01/09/2014 52 1 2 20
02/09/2014 52 1 2 0
03/09/2014 52 1 2 0
04/09/2014 52 1 2 10
05/09/2014 52 1 2 20
06/09/2014 52 1 2 10
07/09/2014 53 1 2 -10
08/09/2014 53 1 2 10
09/09/2014 53 1 2 -10
10/09/2014 53 1 2 20
11/09/2014 53 1 2 20
12/09/2014 53 1 2 40
13/09/2014 53 1 2 0
01/09/2014 52 1 3 30
02/09/2014 52 1 3 30
03/09/2014 52 1 3 5
04/09/2014 52 1 3 0
05/09/2014 52 1 3 10
06/09/2014 52 1 3 -10
07/09/2014 53 1 3 -10
08/09/2014 53 1 3 -10
09/09/2014 53 1 3 20
10/09/2014 53 1 3 10
11/09/2014 53 1 3 40
12/09/2014 53 1 3 10
13/09/2014 53 1 3 10
"targetsales" shows how much over the target the sales is, where negative means how far below the target the sales was. How can I do the following:
1. I need to get the average for all the product for each day. Something like this:
DATE BRANCH AVERAGE_SALES_OF_ALL_PRODUCT
01/09/2014 1 33.33
02/09/2014 1 -1.67
...and so on
2. Then I need another query that shows how many days within those two weeks had positive average sales. Something like this:
BRANCH 2WEEKS_SINCE DAYS_WITH_POSITIVE_AVERAGE_SALES
1 53 9
Above just an example not a real result.
Sorry, I hope this is not too confusing. Thank you so much.
In Oracle, the date type might still have a time component. If you do not know whether it is there, use trunc() to remove it:
select trunc(date), branch, avg(targetreach)
from targetsale
group by trunc(date), branch
order by 1, 2;
For the second query, you want to use case:
select branch, count(distinct case when targetreach > 0 then date end) as DaysWithPositiveSales
from targetsale
group by branch;
If you know there is one row per date per branch -- and the time component of the date is empty -- then the distinct is not necessary.
1)
SELECT TRUNC(DATE, 'DD'), BRANCH, AVG(TARGETREACH)
FROM TARGETSALE WHERE BRANCH = 1 AND WEEK BETWEEN 52 AND 53
GROUP BY TRUNC(DATE, 'DD'), BRANCH;
2)
SELECT BRANCH, SUM(DECODE(SIGN(TARGETREACH), 1, 1, 0))
FROM TARGETSALE WHERE BRANCH = 1 AND WEEK BETWEEN 52 AND 53
GROUP BY BRANCH;
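Note that part 2 of the question asks for days on which the average across all products is positive, which is not quite the same as counting rows with a positive TARGETREACH. A sketch of one way to get that (my own variation, not part of either answer, using the same table and filters):
SELECT BRANCH,
       COUNT(*) AS DAYS_WITH_POSITIVE_AVERAGE_SALES
FROM (
    -- daily average across products; if the column is literally named DATE
    -- it may need quoting, since DATE is a reserved word in Oracle
    SELECT TRUNC(DATE) AS SALE_DAY, BRANCH, AVG(TARGETREACH) AS AVG_REACH
    FROM TARGETSALE
    WHERE BRANCH = 1 AND WEEK BETWEEN 52 AND 53
    GROUP BY TRUNC(DATE), BRANCH
)
WHERE AVG_REACH > 0
GROUP BY BRANCH;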

Complex grouping - design / performance problem

WARNING : This is one BIG Question
I have a design problem that started simple, but in one step of growth has stumped me completely.
The simple version of reality has a nice flat fact table...
All names have been changed to protect the innocent
CREATE TABLE raw_data (
tier0_id INT, tier1_id INT, tier2_id INT, tier3_id INT,
metric0 INT, metric1 INT, metric2 INT, metric3 INT
)
The tier IDs relate to entities in a fixed-depth tree, such as a business hierarchy.
The metrics are just performance figures, such as the number of frogs captured or pigeons released.
In the reporting, the kindly user would make selections that mean something like the following:
tier0_id's 34 and 55 - shown separately
all of tier1_id's - grouped together
all of tier2_id's - grouped together
all of tier3_id's - shown separately
metrics 2 and 3
This gives me the following type of query:
SELECT
CASE WHEN #t0_grouping = 1 THEN NULL ELSE tier0_id END AS tier0_id,
CASE WHEN #t1_grouping = 1 THEN NULL ELSE tier1_id END AS tier1_id,
CASE WHEN #t2_grouping = 1 THEN NULL ELSE tier2_id END AS tier2_id,
CASE WHEN #t3_grouping = 1 THEN NULL ELSE tier3_id END AS tier3_id,
SUM(metric2) AS metric2, SUM(metric3) AS metric3
FROM
raw_data
INNER JOIN tier0_values ON tier0_values.id = raw_data.tier0_id OR tier0_values.id IS NULL
INNER JOIN tier1_values ON tier1_values.id = raw_data.tier1_id OR tier1_values.id IS NULL
INNER JOIN tier2_values ON tier2_values.id = raw_data.tier2_id OR tier2_values.id IS NULL
INNER JOIN tier3_values ON tier3_values.id = raw_data.tier3_id OR tier3_values.id IS NULL
GROUP BY
CASE WHEN #t0_grouping = 1 THEN NULL ELSE tier0_id END,
CASE WHEN #t1_grouping = 1 THEN NULL ELSE tier1_id END,
CASE WHEN #t2_grouping = 1 THEN NULL ELSE tier2_id END,
CASE WHEN #t3_grouping = 1 THEN NULL ELSE tier3_id END
It's a nice hybrid of dynamic SQL and parametrised queries. And yes, I know, but SQL-CE makes people do strange things. Besides, that can be tidied up as and when the following change gets incorporated...
From now on, we need to be able to include NULLs in the different tiers. This will mean "applies to ALL entities in that tier".
For example, with the following very simplified data:
Activity WorkingTime ActiveTime BusyTime
1 0m 10m 0m
2 0m 15m 0m
3 0m 20m 0m
NULL 60m 0m 45m
WorkingTime never applies to a single activity, so all the values go in with a NULL ID. ActiveTime, however, is about a specific activity, so it goes in with a legitimate ID. BusyTime also goes against a NULL activity, because it's the accumulation of all the ActiveTime.
If one were to report on this data, the NULL values -always- get included in every row, because the NULL -means- "applies to everything". The data would look like...
Activity WorkingTime ActiveTime BusyTime (BusyOnOtherActivities)
1 60m 10m 45m (45-10 = 35m)
2 60m 15m 45m (45-15 = 30m)
3 60m 20m 45m (45-20 = 25m)
1&2 60m 25m 45m (45-25 = 20m)
1&3 60m 30m 45m (45-30 = 15m)
2&3 60m 35m 45m (45-35 = 10m)
ALL 60m 45m 45m (45-45 = 0m)
Hopefully this example makes sense, because it's actually a multi-tiered hierarchy (as per the original example), and in every tier NULLs are allowed. So I'll try an example with 3 tiers...
t0_id | t1_id | t2_id | m1 | m2 | m3 | m4 | m5
1 3 10 | 0 10 0 0 0
1 4 10 | 0 15 0 0 0
1 5 10 | 0 20 0 0 0
1 NULL 10 | 60 0 45 0 0
2 3 10 | 0 5 0 0 0
2 5 10 | 0 10 0 0 0
2 6 10 | 0 15 0 0 0
2 NULL 10 | 50 0 30 0 0
1 3 11 | 0 7 0 0 0
1 4 11 | 0 8 0 0 0
1 5 11 | 0 9 0 0 0
1 NULL 11 | 30 0 24 0 0
2 3 11 | 0 8 0 0 0
2 5 11 | 0 10 0 0 0
2 6 11 | 0 12 0 0 0
2 NULL 11 | 40 0 30 0 0
NULL NULL 10 | 0 0 0 60 0
NULL NULL 11 | 0 0 0 60 0
NULL NULL NULL | 0 0 0 0 2
This would give many, many possible different output records in the reporting, but here are a few examples...
t0_id | t1_id | t2_id | m1 | m2 | m3 | m4 | m5
1 3 10 | 60 10 45 60 2
1 4 10 | 60 15 45 60 2
1 5 10 | 60 20 45 60 2
2 3 10 | 50 5 30 60 2
2 5 10 | 50 10 30 60 2
2 6 10 | 50 15 30 60 2
1 ALL 10 | 60 45 45 60 2
2 ALL 10 | 50 30 30 60 2
ALL 3 10 | 110 15 75 60 2
ALL 4 10 | 60 15 45 60 2
ALL 5 10 | 110 30 75 60 2
ALL 6 10 | 50 15 30 60 2
ALL 3 ALL | 180 30 129 120 2
ALL 4 ALL | 90 23 69 120 2
ALL 5 ALL | 180 49 129 120 2
ALL 6 ALL | 90 27 60 120 2
ALL ALL 10 | 110 129 129 60 2
ALL ALL 11 | 70 129 129 60 2
ALL ALL ALL | 180 129 129 120 2
1 3&4 ALL | 90 40 69 120 2
ALL 3&4 ALL | 180 53 129 120 2
As messy as this is to explain, it makes complete and logical sense in my head. I understand what is being asked, but for the life of me I cannot seem to write a query for this that doesn't take an excruciating amount of time to execute.
So, how would you write such a query, and/or refactor the schema?
I appreciate that people will ask for examples of what I've done so far, but I'm eager to hear other people's uncorrupted ideas and advice first ;)
The problem looks more like a normalization exercise. I would start by normalizing the table
to something like this (you may need some more identity fields depending on your usage):
CREATE TABLE raw_data (
rawData_ID INT,
Activity_id INT,
metric0 INT)
I'd create a tiering table that looks something like this (tierplan_id allows for multiple groupings; if a tier_id has no parent to roll up under, then tierparent_id is NULL, which allows for recursion in the query):
CREATE TABLE tiers (
tierplan_id INT,
tier_id INT,
tierparent_id INT)
Finally, I'd create a table that relates tiers and activities, something like:
CREATE TABLE ActivTiers (
Activplan_id INT, --id on the table
tierplan_id INT, --tells what tierplan the raw_data falls under
rawdata_id INT) --this allows the ActivityId to be payload instead of identifier.
Queries off of this ought to be "not too difficult."
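As a rough illustration of the recursion mentioned above (a sketch only, assuming SQL Server syntax, the tiers table as defined, and a hypothetical tierplan_id of 1), a recursive CTE can walk one tier plan from its roots down, carrying the root id so every descendant can be rolled up under it:
WITH tier_tree AS (
    -- anchor: the roots of the plan (no parent to roll up under)
    SELECT tier_id, tier_id AS root_id
    FROM tiers
    WHERE tierplan_id = 1 AND tierparent_id IS NULL
    UNION ALL
    -- recursive step: attach each child tier to its parent's root
    SELECT t.tier_id, tt.root_id
    FROM tiers t
    JOIN tier_tree tt ON t.tierparent_id = tt.tier_id
    WHERE t.tierplan_id = 1
)
SELECT root_id, tier_id
FROM tier_tree
ORDER BY root_id, tier_id;
Joining tier_tree back to raw_data (via ActivTiers) and summing the metrics per root_id would then give the grouped figures; exactly how ActivTiers points a raw_data row at a tier is left open in the schema above.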