How to select a foreign key after narrowing down via Group By and Having in a subquery - sql

I've got a unique problem. I'm querying a replicated database table cost_plan_breakdown, and the replication is known to have some duplicates due to issues with deleting records. I'm not the Admin so I'm trying to sidestep these duplicates as efficiently as possible. The table looks like this:
sys_id  sys_created_on       cost_plan                       breakdown_start_date
------------------------------------------------------------------------------
axr123  2020-10-01 09:31:15  Outlook KTLO - Lisa Lymon       10-01-2020
pqo100  2020-12-23 05:50:20  Outlook KTLO - Lisa Lymon       10-01-2020
cji985  2020-10-01 09:31:15  Outlook KTLO - Lisa Lymon       11-01-2020
twg795  2020-10-05 13:23:08  DataPyramid CTB - Dave Dods     10-01-2020
jqr820  2020-09-28 16:11:54  Revoluccion CTB - Marcus Vance  11-01-2020
vjo150  2021-01-13 11:10:09  Server KTLO - Tom Smith         10-01-2020
Cost Plans typically have between 1 and 12 breakdowns during their lifespan, but there should only be one breakdown per cost plan per month. Notice that the Outlook Cost Plan has two breakdowns within the same month (October) with differing sys_id and sys_created_on.
So by using a smaller subquery in the where clause, I'm trying to determine the following:
"Group the rows with identical month and year of breakdown_start_date, and identical cost_plan. Of the remaining rows, select the one with the MAX sys_created_on. Take the sys_id of that row and feed it to the parent query to only include these rows."
...rest of query above
WHERE cpb.breakdown_type = 'requirement'
AND cpb.sys_id IN
(SELECT cpb2.sys_id
FROM cost_plan_breakdown cpb2
GROUP BY cpb2.name,
YEAR(cpb2.start_date_time),
MONTH(cpb2.start_date_time)
HAVING MAX(cpb2.sys_created_on))
At this point, I'm running into the error
cpb2.sys_id is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I've previously semi-solved this by putting the MAX sys_created_on in the SELECT statement, and matching off that, but I realized that could pull in unwanted dupe records just because they match the sys_created_on of another.
I feel like the solution may be staring me in the face, but I'm stuck. Appreciate your help!

Use row_number to number the duplicate rows and then exclude them. Ordering the row number by sys_created_on desc ensures you get the latest of each per month.
declare @Test table (sys_id varchar(6), sys_created_on datetime2(0), cost_plan varchar(32), breakdown_start_date date);
insert into @Test (sys_id, sys_created_on, cost_plan, breakdown_start_date)
values
    ('axr123', '2020-10-01 09:31:15', 'Outlook KTLO - Lisa Lymon', '10-01-2020'),
    ('pqo100', '2020-12-23 05:50:20', 'Outlook KTLO - Lisa Lymon', '10-01-2020'),
    ('cji985', '2020-10-01 09:31:15', 'Outlook KTLO - Lisa Lymon', '11-01-2020'),
    ('twg795', '2020-10-05 13:23:08', 'DataPyramid CTB - Dave Dods', '10-01-2020'),
    ('jqr820', '2020-09-28 16:11:54', 'Revoluccion CTB - Marcus Vance', '11-01-2020'),
    ('vjo150', '2021-01-13 11:10:09', 'Server KTLO - Tom Smith', '10-01-2020');
-- rn = 1 marks the most recently created row per cost plan per month
with cte as (
    select *
        , row_number() over (partition by cost_plan, datepart(year,breakdown_start_date), datepart(month,breakdown_start_date) order by sys_created_on desc) rn
    from @Test
)
select *
from cte
where rn = 1;
As per your comments, this (the CTE) is just a neat way to write a sub-query/derived table, and it can still be written as follows:
select *
from (
    select *
        , row_number() over (partition by cost_plan, datepart(year,breakdown_start_date), datepart(month,breakdown_start_date) order by sys_created_on desc) rn
    from @Test
) cte
where rn = 1;
Note: If you provide DDL+DML as shown above you make it much easier for people to assist.
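To tie this back to the `AND cpb.sys_id IN (...)` filter in the question, the ranked rows can serve as the IN subquery. The following is only a sketch under assumptions: it runs against an in-memory SQLite database through Python's sqlite3 module rather than SQL Server, so strftime stands in for datepart, and the sample dates are rewritten as ISO text (reading '10-01-2020' as 1 October 2020).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE cost_plan_breakdown (
        sys_id TEXT, sys_created_on TEXT, cost_plan TEXT, breakdown_start_date TEXT
    )
""")
# Sample rows from the question; dates converted to ISO format (assumption).
rows = [
    ("axr123", "2020-10-01 09:31:15", "Outlook KTLO - Lisa Lymon", "2020-10-01"),
    ("pqo100", "2020-12-23 05:50:20", "Outlook KTLO - Lisa Lymon", "2020-10-01"),
    ("cji985", "2020-10-01 09:31:15", "Outlook KTLO - Lisa Lymon", "2020-11-01"),
    ("twg795", "2020-10-05 13:23:08", "DataPyramid CTB - Dave Dods", "2020-10-01"),
    ("jqr820", "2020-09-28 16:11:54", "Revoluccion CTB - Marcus Vance", "2020-11-01"),
    ("vjo150", "2021-01-13 11:10:09", "Server KTLO - Tom Smith", "2020-10-01"),
]
conn.executemany("INSERT INTO cost_plan_breakdown VALUES (?, ?, ?, ?)", rows)

# The parent query keeps only sys_ids ranked first within their
# (cost_plan, year-month) group, newest sys_created_on first.
query = """
    SELECT cpb.sys_id
    FROM cost_plan_breakdown cpb
    WHERE cpb.sys_id IN (
        SELECT sys_id
        FROM (
            SELECT sys_id,
                   ROW_NUMBER() OVER (
                       PARTITION BY cost_plan, strftime('%Y-%m', breakdown_start_date)
                       ORDER BY sys_created_on DESC
                   ) AS rn
            FROM cost_plan_breakdown
        ) AS ranked
        WHERE rn = 1
    )
    ORDER BY cpb.sys_id
"""
kept_ids = [r[0] for r in conn.execute(query)]
print(kept_ids)
```

Because the match is on sys_id rather than on sys_created_on, an unrelated row that happens to share a timestamp with a kept row is not pulled in.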

Related

How to get the set size, first and last record in a db2 ordered set with one call

I have a very big transaction table on DB2 v11, and I need to query a subset of it as efficiently as possible. All I need is the total count of the set (not known in advance; it's based on criteria, let's say 1 day), the ID of the first record, and the ID of the last record.
The old code was fetching the entire table, then just using the 1st record ID, and the last record ID, and size, and not making use of the rest. Now this code is timing out. It's a complex query of several joins.
Is there a way to fetch just the size of the set, the first record, and the last record in one select query?
I've read that reordering the list in order to fetch the 1st record(so fetch with Desc, then change to Asc) is not efficient.
sample table 1 TRANSACTION_RECORDS:
tdID TIMESTAMP name
-------------------------------
123 2020-03-31 john
234 2020-03-31 dan
456 2020-03-01 Eve
675 2020-04-01 joy
sample table 2 TRANSACTION_TYPE:
invoiceId tdID account
------------------------------
897 123 abc
898 123 def
877 234 mnc
899 456 opp
Sample query
select Min(tr.transaction_id), Max(tr.transaction_id)
from TRANSACTION_RECORDS TR
join TRANSACTION_TYPE TT
on TR.tdID=tt.tdID
WHERE Date(TR.TIMESTAMP) = '2020-03-31'
group by tr.tdID
order by TR.tdID ASC
This results in multiple rows (but it requires the group by):
123,123
234,234
456,456
What I want is:
123,456
As I mentioned in the comments, for this query you need neither GROUP BY nor ORDER BY. Just do:
select Min(tr.transaction_id), Max(tr.transaction_id)
from TRANSACTION_RECORDS TR
join TRANSACTION_TYPE TT
on TR.tdID=tt.tdID
WHERE Date(TR.TIMESTAMP) = '2020-03-31'
It should work as expected
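The question also asks for the size of the set, which can come from the same aggregate query. Below is a rough sketch run on the sample tables through Python's sqlite3 module (an assumption, since the original is DB2). It uses tdID as the ID column because that is the column shown in the sample tables, and COUNT(DISTINCT ...) because the join repeats a record once per matching TRANSACTION_TYPE row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TRANSACTION_RECORDS (tdID INTEGER, "TIMESTAMP" TEXT, name TEXT);
    CREATE TABLE TRANSACTION_TYPE (invoiceId INTEGER, tdID INTEGER, account TEXT);
    INSERT INTO TRANSACTION_RECORDS VALUES
        (123, '2020-03-31', 'john'), (234, '2020-03-31', 'dan'),
        (456, '2020-03-01', 'Eve'),  (675, '2020-04-01', 'joy');
    INSERT INTO TRANSACTION_TYPE VALUES
        (897, 123, 'abc'), (898, 123, 'def'), (877, 234, 'mnc'), (899, 456, 'opp');
""")

# No GROUP BY: the aggregates run over the whole filtered set and return one row.
first_id, last_id, set_size = conn.execute("""
    SELECT MIN(tr.tdID), MAX(tr.tdID), COUNT(DISTINCT tr.tdID)
    FROM TRANSACTION_RECORDS tr
    JOIN TRANSACTION_TYPE tt ON tr.tdID = tt.tdID
    WHERE date(tr."TIMESTAMP") = '2020-03-31'
""").fetchone()
print(first_id, last_id, set_size)
```

With the sample rows exactly as printed, record 456 is dated 2020-03-01 and so falls outside the filter; the query returns 123, 234 and a size of 2.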

How to select a record from SQL that does not have a certain value

I have a list of students who have several levels of English to complete. Is there a way to find those students who are doing English-1 and haven't moved to the next level?
ID STUDENTID NAME ENGLISHLEVEL STARTDATE ENDDATE
----------------------------------------------------------------
1 001 Eric English-1 2017-01-01 2018-01-01
2 002 Brian English-1 2017-01-01 2017-01-31
3 002 Brian English-2 2017-02-01 2017-03-01
4 003 David English-1 2017-05-01 2017-06-01
5 003 David English-2 2017-06-02 2017-07-03
I have a list similar to above for thousands of students and want to know how I can query the table to show me those students who did English-1 but never got started with English-2 or English-3
Advice would be appreciated.
You can join the table to itself with an outer join. Have the outer joined table check for English2/3, and then filter out any results where there is a match.
select
eng1.*
from
students_table eng1
left outer join students_table eng2
on (
eng1.STUDENTID=eng2.STUDENTID
and eng2.ENGLISHLEVEL in ('English-2', 'English-3')
)
where
eng2.STUDENTID is null -- Filter out rows where an eng2 row was found
You can use not exists :
select t.*
from table t
where not exists (select 1
from table t1
where t1.STUDENTID = t.STUDENTID and
t1.ENGLISHLEVEL in ('English-2', 'English-3')
);
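As a quick check, both the outer-join and the NOT EXISTS forms can be run against the sample rows. This is a sketch using Python's sqlite3 module with an in-memory table named students_table (the name used in the first answer); the extra `ENGLISHLEVEL = 'English-1'` filter is an addition here so that only English-1 rows are considered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students_table (
        ID INTEGER, STUDENTID TEXT, NAME TEXT,
        ENGLISHLEVEL TEXT, STARTDATE TEXT, ENDDATE TEXT
    );
    INSERT INTO students_table VALUES
        (1, '001', 'Eric',  'English-1', '2017-01-01', '2018-01-01'),
        (2, '002', 'Brian', 'English-1', '2017-01-01', '2017-01-31'),
        (3, '002', 'Brian', 'English-2', '2017-02-01', '2017-03-01'),
        (4, '003', 'David', 'English-1', '2017-05-01', '2017-06-01'),
        (5, '003', 'David', 'English-2', '2017-06-02', '2017-07-03');
""")

# Anti-join: keep English-1 rows with no English-2/3 partner row.
left_join_sql = """
    SELECT eng1.STUDENTID, eng1.NAME
    FROM students_table eng1
    LEFT OUTER JOIN students_table eng2
        ON eng1.STUDENTID = eng2.STUDENTID
       AND eng2.ENGLISHLEVEL IN ('English-2', 'English-3')
    WHERE eng1.ENGLISHLEVEL = 'English-1'
      AND eng2.STUDENTID IS NULL
"""

# Same condition expressed with a correlated NOT EXISTS.
not_exists_sql = """
    SELECT t.STUDENTID, t.NAME
    FROM students_table t
    WHERE t.ENGLISHLEVEL = 'English-1'
      AND NOT EXISTS (
          SELECT 1
          FROM students_table t1
          WHERE t1.STUDENTID = t.STUDENTID
            AND t1.ENGLISHLEVEL IN ('English-2', 'English-3')
      )
"""

left_join_result = conn.execute(left_join_sql).fetchall()
not_exists_result = conn.execute(not_exists_sql).fetchall()
print(left_join_result, not_exists_result)
```

On the sample data both queries return only Eric (student 001).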
Your English levels can be sorted alphabetically:
select studentid, name
from t
group by studentid, name
having max(englishlevel) = 'English-1'
You can just use IN in your WHERE clause to get those students who did English-1 (note that on its own this does not exclude students who later started English-2 or English-3):
SELECT * FROM tblStudents
WHERE ENGLISHLEVEL IN ('English-1')
What does "students who are doing English I" mean in data terms in the table in your post?
What does "and haven't moved to the next level" mean in data terms in the table?
Do you have a table that contains all possible values of ENGLISHLEVEL?
How are the STARTDATE and ENDDATE columns populated?
If the ENDDATE was only populated once the person has actually completed the training, you could use the NULL value there to look for people that have not completed a training. That might be the way to go here.
If you have a table that defines all possible values of ENGLISHLEVEL, then you can compare the count of distinct ENGLISHLEVEL values with the count of values in the ENGLISHLEVEL table, and that will tell you which students have completed all the courses. You can also do this comparison for each training that way.

finding duplicate rows with different IDs based on multiple columns

Please forgive me if my jargon is off. I'm still learning!
I just started using Teradata, and to be honest it has been a lot of fun. However, I have hit a roadblock that has stumped me for a while.
I successfully selected a table from a database that looks like this:
ID service date name
1 service1 1/5/15 john
2 service2 1/7/15 steve
3 service3 1/8/15 lola
4 service4 1/3/15 joan
5 service5 1/5/15 fred
6 service3 1/3/15 joan
7 service5 1/8/15 oscar
Now I want to search the database again to find any duplicate IDs (for example, to see if service1 with date 1/5/15 and name john exists on another row with a different ID).
At first, I did something like this:
SELECT ID, service, date, name
FROM table
WHERE table.service = ANY('service1', 'service2', 'service3', 'service4', 'service5', 'service3', 'service5')
AND table.date = ANY('1/5/15', '1/7/15', '1/8/15', '1/3/15', '1/5/15', '1/3/15', '1/8/15')
AND table.name = ANY('john', 'steve', 'lola', 'joan', 'fred', 'joan', 'oscar');
But this is giving me more rows than I wanted.
example:
ID service date name
92 service3 1/8/15 steve
is of no use to me since I am looking for IDs that have the same combination of service, date, and name as of any of the other IDs in the above table.
something like this would be favorable:
ID service date name
609 service3 1/8/15 lola
since it matches that of ID 3.
I was curious to see if it were possible to treat the three columns (service, date, name) as a vector and maybe select the rows that match it that way?
ex
......
WHERE (table.service, table.date, table.name) = ANY((service3,1/8/15,lola), (service1, 1/5/15, john), ...etc)
My Teradata is down right now, so I have yet to try the above example. Nevertheless, any thoughts/feedback are greatly appreciated!
The following query may be what you are trying to achieve. This selects IDs for which the combination of service, date, and name appears more than once.
SELECT t1.ID
FROM yourTable t1
INNER JOIN
(
SELECT service, date, name
FROM yourTable
GROUP BY service, date, name
HAVING COUNT(*) > 1
) t2
ON t1.service = t2.service AND
t1.date = t2.date AND
t1.name = t2.name
This is a simple task for a Windowed Aggregate:
SELECT *
FROM tab
QUALIFY
COUNT(*) OVER (PARTITION BY service, date, name) > 1
This counts the number of rows with the same combination of values (like Tim Biegeleisen's Derived Table) but unlike a Standard Aggregate it keeps all rows. The QUALIFY is a nice Teradata syntax extension to avoid a Derived Table.
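On systems without QUALIFY, the same windowed count can be filtered through a derived table. The following is a sketch only, run with Python's sqlite3 module (an assumption, not Teradata) against a table named tab; it holds the seven rows from the question plus the two extra rows the question mentions, IDs 92 and 609.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab (ID INTEGER, service TEXT, date TEXT, name TEXT);
    INSERT INTO tab VALUES
        (1,   'service1', '1/5/15', 'john'),
        (2,   'service2', '1/7/15', 'steve'),
        (3,   'service3', '1/8/15', 'lola'),
        (4,   'service4', '1/3/15', 'joan'),
        (5,   'service5', '1/5/15', 'fred'),
        (6,   'service3', '1/3/15', 'joan'),
        (7,   'service5', '1/8/15', 'oscar'),
        (92,  'service3', '1/8/15', 'steve'),
        (609, 'service3', '1/8/15', 'lola');
""")

# The window count is computed in the inner query and filtered in the outer
# one, which is the role QUALIFY plays in the Teradata version.
duplicate_ids = [row[0] for row in conn.execute("""
    SELECT ID
    FROM (
        SELECT t.*,
               COUNT(*) OVER (PARTITION BY service, date, name) AS cnt
        FROM tab t
    ) AS counted
    WHERE cnt > 1
    ORDER BY ID
""")]
print(duplicate_ids)
```

Only IDs 3 and 609 share a full (service, date, name) combination; ID 92 matches on service and date but not on name, so it is left out.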
Don't hardcode values in your query unless you absolutely have to. Instead, take the query you already wrote and join to that.
SELECT dupes.*
FROM (your query) yourquery
JOIN table dupes
ON yourquery.service = dupes.service
AND yourquery.date = dupes.date
AND yourquery.name = dupes.name

GROUP BY and aggregate function query

I am looking at making a simple leader board for a time trial. A member may perform many time trials, but I only want for their fastest result to be displayed. My table columns are as follows:
Members { ID (PK), Forename, Surname }
TimeTrials { ID (PK), MemberID, Date, Time, Distance }
An example dataset would be:
Forename | Surname | Date | Time | Distance
Bill Smith 01-01-11 1.14 100
Dave Jones 04-09-11 2.33 100
Bill Smith 02-03-11 1.1 100
My resulting answer from the example above would be:
Forename | Surname | Date | Time | Distance
Bill Smith 02-03-11 1.1 100
Dave Jones 04-09-11 2.33 100
I have this so far, but Access complains that I am not using Date as part of an aggregate function:
SELECT Members.Forename, Members.Surname, Min(TimeTrials.Time) AS MinOfTime, TimeTrials.Date
FROM Members
INNER JOIN TimeTrials ON Members.ID = TimeTrials.Member
GROUP BY Members.Forename, Members.Surname, TimeTrials.Distance
HAVING TimeTrials.Distance = 100
ORDER BY MIN(TimeTrials.Time);
If I remove the Date from the SELECT, the query works (without the date). I have tried using FIRST on TimeTrials.Date, but that returns the first date, which is normally incorrect.
Obviously putting the Date as part of the GROUP BY would not return the result set that I am after.
Make this task easier on yourself by starting with a smaller piece of the problem. First get the minimum Time from TimeTrials for each combination of MemberID and Distance.
SELECT
tt.MemberID,
tt.Distance,
Min(tt.Time) AS MinOfTime
FROM TimeTrials AS tt
GROUP BY
tt.MemberID,
tt.Distance;
Assuming that SQL is correct, use it in a subquery which you join back to TimeTrials again.
SELECT tt2.*
FROM
TimeTrials AS tt2
INNER JOIN
(
SELECT
tt.MemberID,
tt.Distance,
Min(tt.Time) AS MinOfTime
FROM TimeTrials AS tt
GROUP BY
tt.MemberID,
tt.Distance
) AS sub
ON
tt2.MemberID = sub.MemberID
AND tt2.Distance = sub.Distance
AND tt2.Time = sub.MinOfTime
WHERE tt2.Distance = 100
ORDER BY tt2.Time;
Finally, you can join that query to Members to get Forename and Surname. Your question shows you already know how to do that, so I'll leave it for you. :-)
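For completeness, here is one way that last join might look. It is a sketch run with Python's sqlite3 module rather than Access, and the member IDs 1 and 2 are hypothetical values chosen for illustration because the question does not show them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Members (ID INTEGER PRIMARY KEY, Forename TEXT, Surname TEXT);
    CREATE TABLE TimeTrials (
        ID INTEGER PRIMARY KEY, MemberID INTEGER,
        "Date" TEXT, "Time" REAL, Distance INTEGER
    );
    -- Member IDs are hypothetical; the question only lists names.
    INSERT INTO Members VALUES (1, 'Bill', 'Smith'), (2, 'Dave', 'Jones');
    INSERT INTO TimeTrials (MemberID, "Date", "Time", Distance) VALUES
        (1, '01-01-11', 1.14, 100),
        (2, '04-09-11', 2.33, 100),
        (1, '02-03-11', 1.1,  100);
""")

# The subquery finds each member's fastest time per distance; joining it
# back to TimeTrials recovers the Date of that row, and Members adds names.
leaderboard = conn.execute("""
    SELECT m.Forename, m.Surname, tt2."Date", tt2."Time", tt2.Distance
    FROM TimeTrials AS tt2
    INNER JOIN (
        SELECT tt.MemberID, tt.Distance, MIN(tt."Time") AS MinOfTime
        FROM TimeTrials AS tt
        GROUP BY tt.MemberID, tt.Distance
    ) AS sub
        ON tt2.MemberID = sub.MemberID
       AND tt2.Distance = sub.Distance
       AND tt2."Time" = sub.MinOfTime
    INNER JOIN Members AS m
        ON m.ID = tt2.MemberID
    WHERE tt2.Distance = 100
    ORDER BY tt2."Time"
""").fetchall()
print(leaderboard)
```

Access generally wants parentheses around the first join when a query chains more than one, so the FROM clause would need that adjustment there. If a member ties their own best time, both rows are returned.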

Finding the next occurrence of a value in a table

Sorry in advance if this has already been covered.
I am working on a database which isn't particularly well structured, but it is owned by a third party and cannot be changed.
I need some assistance with T-SQL in finding the next occurrence of a value within the table and returning records based on the result. Let me first explain the data. I have simplified this to make it easier to understand.
Polref Effective Date Transaction Type Suffix Value
ABCD1 01/06/2010 New Bus 1 175.00
ABCD1 01/06/2011 Ren 2 200.00
ABCD1 19/08/2011 Adjust 3 50.00
ABCD1 23/04/2012 Adjust 4 50.00
ABCD1 01/06/2012 Ren 5 275.00
So if I ran my query for 2011, the code would need to return, in this example, the rows with suffix 2, 3 and 4. What I have been trying to do is find the first suffix of a New Bus or Ren for the specified year, then find the next suffix of a New Bus or Ren for the same polref, and then use those two suffix values to limit my recordset. It isn't working!
I can't use MAX() because transactions for 2013 have already been added to the system, so I would get more records than I actually need.
The result I should be expecting for this example data would be:
ABCD1 300.00
Any help would be greatly appreciated.
To answer another question: if I select 2011 as my year to run the report, there should only be one New Bus or Ren transaction for 2011. If it's a New Bus transaction, the next main transaction will be a Ren; if it's a Ren, the next main transaction will also be a Ren. In my example above, if I run for 2011, it should find the Ren from 01/06/2011, so I want to return that Ren and the two Adjust records.
Sorry, I've not used this forum before so apologies if I was a little vague.
The table I am using has many polrefs so I need this code to calculate totals for all polrefs that fall within the date range. Some polrefs may only have one row, a New Bus, some will have many rows depending on how many adjustments have been made throughout the year of the policy
Partial answer:
This query:
declare @t table (PolRef char(5) not null, EffectiveDate date not null,TransactionType varchar(10) not null,Suffix int not null,Value decimal(10,2) not null)
insert into @t (Polref,EffectiveDate,TransactionType,Suffix,Value) values
    ('ABCD1','20100601','New Bus',1,175.00),
    ('ABCD1','20110601','Ren',2,200.00),
    ('ABCD1','20110819','Adjust',3,50.00),
    ('ABCD1','20120423','Adjust',4,50.00),
    ('ABCD1','20120601','Ren',5,275.00)
-- Number the New Bus/Ren rows per policy so each can be paired with the next one
;With StartTransactions as (
    select PolRef,Suffix,ROW_NUMBER() OVER (PARTITION BY PolRef ORDER BY Suffix) rn
    from @t where TransactionType in ('New Bus','Ren')
), Periods as (
    -- EndSuffix is the next New Bus/Ren suffix, or null for the latest period
    select st1.PolRef,st1.Suffix as StartSuffix,st2.Suffix as EndSuffix
    from
        StartTransactions st1
            left join
        StartTransactions st2
            on
                st1.PolRef = st2.PolRef and
                st1.rn = st2.rn - 1
)
select
    p.PolRef,t2.EffectiveDate,SUM(t.Value) as Total
from
    Periods p
        inner join
    @t t
        on
            p.PolRef = t.PolRef and
            p.StartSuffix <= t.Suffix and
            (p.EndSuffix > t.Suffix or
             p.EndSuffix is null)
        inner join
    @t t2
        on
            p.PolRef = t2.PolRef and
            t2.Suffix = p.StartSuffix
group by
    p.PolRef,t2.EffectiveDate
Groups each set of transactions based on each successive Ren or New Bus transaction:
PolRef EffectiveDate Total
------ ------------- ---------------------------------------
ABCD1 2010-06-01 175.00
ABCD1 2011-06-01 300.00
ABCD1 2012-06-01 275.00
From that, it should be trivial to e.g. select out only the ones you're interested in from a particular year. But your question is still vague on some specifics, so I'm not taking it any further at this point.
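One possible way to narrow the result to a single report year is to filter on the year of the period's starting transaction. The sketch below makes several assumptions: it runs in SQLite via Python's sqlite3 module instead of SQL Server, the table is named transactions (a hypothetical name), dates are stored as ISO text, and "the period belongs to the year of its New Bus or Ren row" is taken as the rule the question intends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        PolRef TEXT, EffectiveDate TEXT, TransactionType TEXT,
        Suffix INTEGER, Value REAL
    );
    INSERT INTO transactions VALUES
        ('ABCD1', '2010-06-01', 'New Bus', 1, 175.00),
        ('ABCD1', '2011-06-01', 'Ren',     2, 200.00),
        ('ABCD1', '2011-08-19', 'Adjust',  3, 50.00),
        ('ABCD1', '2012-04-23', 'Adjust',  4, 50.00),
        ('ABCD1', '2012-06-01', 'Ren',     5, 275.00);
""")

report_year = "2011"  # year of the New Bus/Ren row that opens the period

totals = conn.execute("""
    WITH StartTransactions AS (
        SELECT PolRef, Suffix,
               ROW_NUMBER() OVER (PARTITION BY PolRef ORDER BY Suffix) AS rn
        FROM transactions
        WHERE TransactionType IN ('New Bus', 'Ren')
    ), Periods AS (
        SELECT st1.PolRef, st1.Suffix AS StartSuffix, st2.Suffix AS EndSuffix
        FROM StartTransactions st1
        LEFT JOIN StartTransactions st2
            ON st1.PolRef = st2.PolRef AND st1.rn = st2.rn - 1
    )
    SELECT p.PolRef, t2.EffectiveDate, SUM(t.Value) AS Total
    FROM Periods p
    JOIN transactions t
        ON p.PolRef = t.PolRef
       AND p.StartSuffix <= t.Suffix
       AND (p.EndSuffix > t.Suffix OR p.EndSuffix IS NULL)
    JOIN transactions t2
        ON p.PolRef = t2.PolRef AND t2.Suffix = p.StartSuffix
    WHERE strftime('%Y', t2.EffectiveDate) = ?
    GROUP BY p.PolRef, t2.EffectiveDate
""", (report_year,)).fetchall()
print(totals)
```

For 2011 this keeps the period that starts at suffix 2 and sums suffixes 2, 3 and 4, giving the 300.00 the question expects for ABCD1.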