How to find the min, where TSQL groups by - sql

I have found the first transaction (min), but when I add the column 'Winners', I get a row for their first win and a row for their first loss. I need only the first row, including whether they won or lost. I have tried aggregating the winners column to no avail. I would prefer not to sub-query if possible. Thanks in advance for checking this out.
SELECT
MIN(dbo.ADT.Time) AS FirstShowWager,
dbo.AD.Account, dbo.AD.FirstName,
dbo.AD.LastName, dbo.ADW.Winners
FROM
dbo.BLAH
WHERE
(dbo.ADT.RunDate = CONVERT(DATETIME, '2014-04-12 00:00:00', 102)) AND (dbo.ADW.Pool = N'shw')
GROUP BY
dbo.AD.Account,
dbo.AD.FirstName,
dbo.AD.LastName,
dbo.AD.RunDate,
dbo.ADW.Winners
ORDER BY
dbo.AD.Account

select sorted.*
from
(
SELECT dbo.ADT.Time AS FirstShowWager,
dbo.AD.Account, dbo.AD.FirstName,
dbo.AD.LastName, dbo.ADW.Winners,
ROW_NUMBER ( ) OVER (partition by dbo.AD.Account,
dbo.AD.FirstName,
dbo.AD.LastName,
dbo.AD.RunDate
order by dbo.ADT.Time) as rowNum
FROM dbo.AD
WHERE dbo.ADT.RunDate = CONVERT(DATETIME, '2014-04-12 00:00:00', 102)
AND dbo.ADW.Pool = N'shw'
) as sorted
where rowNum = 1
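To see what the ROW_NUMBER() approach returns, here is a small self-contained sketch. The table variable @Wagers and its sample rows are invented for illustration; they stand in for the joined AD/ADT/ADW tables, whose real join conditions are not shown in the question. It also assumes that a NULL in Winners means the wager lost.

```sql
-- Hypothetical sample data standing in for the joined AD / ADT / ADW tables.
DECLARE @Wagers TABLE (
    Account   INT,
    WagerTime DATETIME,
    Winners   DECIMAL(10, 2) NULL   -- NULL is assumed to mean the wager lost
);

INSERT INTO @Wagers (Account, WagerTime, Winners)
VALUES (1001, '2014-04-12T13:05:00', NULL),
       (1001, '2014-04-12T14:30:00', 12.50),
       (1002, '2014-04-12T12:45:00', 8.00),
       (1002, '2014-04-12T15:10:00', NULL);

-- Number each account's wagers by time, then keep only the earliest one.
SELECT sorted.Account,
       sorted.WagerTime AS FirstShowWager,
       sorted.Winners
FROM (
    SELECT w.Account,
           w.WagerTime,
           w.Winners,
           ROW_NUMBER() OVER (PARTITION BY w.Account
                              ORDER BY w.WagerTime) AS rowNum
    FROM @Wagers AS w
) AS sorted
WHERE sorted.rowNum = 1
ORDER BY sorted.Account;
```

With this sample data, account 1001 comes back with its 13:05 wager (a loss) and account 1002 with its 12:45 wager (a win). Because Winners is not part of the PARTITION BY, it no longer splits an account into separate rows.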

It sounds like you don't care about the actual value in the Winners column. Grouping on Winners gives you multiple rows per account: one for NULL and one for each distinct non-NULL value. If you only need to know whether they won or lost, not the amount, you can do something like this:
SELECT
MIN(dbo.ADT.Time) AS FirstShowWager,
dbo.AD.Account, dbo.AD.FirstName,
dbo.AD.LastName, CASE WHEN dbo.ADW.Winners IS NULL THEN 0 ELSE 1 END AS Won
FROM
dbo.BLAH
WHERE
(dbo.ADT.RunDate = CONVERT(DATETIME, '2014-04-12 00:00:00', 102)) AND (dbo.ADW.Pool = N'shw')
GROUP BY
dbo.AD.Account,
dbo.AD.FirstName,
dbo.AD.LastName,
dbo.AD.RunDate,
CASE WHEN dbo.ADW.Winners IS NULL THEN 0 ELSE 1 END
ORDER BY
dbo.AD.Account

Use this CASE expression in place of the Winners column, in both the SELECT list and the GROUP BY:
CASE WHEN Winners IS NULL THEN 'Lose' ELSE 'Win' END

Usually this is done with a derived table that selects the record you want, which you then join back to the original table on all the GROUP BY fields.

You can find the MIN in an inner query and then join it to the ADW table on the ID to find out whether they are a winner.
SELECT b.*, ADW.Winners
FROM dbo.ADW ADW INNER JOIN (SELECT MIN(ADT.Time) AS FirstShowWager,
AD.Account, AD.FirstName,
AD.LastName, AD.ADID
FROM dbo.AD AD INNER JOIN dbo.ADT ADT ON AD.ADID = ADT.ADID
GROUP BY AD.Account, AD.FirstName, AD.LastName, AD.ADID) b
ON ADW.ADID = b.ADID
Assumptions: there are foreign keys (on ADID)
from the ADT table to the AD table, and
from the ADW table to the AD table.

Related

null result for average number of days by month

select a.clientid, a.CaseType, b.EnrollmentStartDate, a.EligibilityStartDate, datediff(day, a.EligibilityStartDate, b.EnrollmentStartDate) as date_diff
INTO ##temptable1
FROM dbo.Client a, dbo.ClientEnrollment b
WHERE a.ClientId = b.ClientId
AND a.CaseType = 99
ORDER BY a.ClientId
select avg (date_diff) from ##temptable1
So the above query gives me the overall average number of days it takes for a client to enroll in a program from their eligibility start date. I now want to break the results down by month:
select avg (date_diff) from ##temptable1
where EligibilityStartDate = '2019-03-01'
For some reason I'm getting NULL no matter what date I specify (even though the original query produces over 40k results). I've tried inserting EligibilityStartDate = '2019-03-01' into the table itself, but that did not work either.
Presumably, you want something like this:
SELECT YEAR(c.EligibilityStartDate) as yyyy,
MONTH(c.EligibilityStartDate) as mm,
AVG(DATEDIFF(DAY, c.EligibilityStartDate, ce.EnrollmentStartDate)) as date_diff
FROM dbo.Client c JOIN
dbo.ClientEnrollment ce
ON c.ClientId = ce.ClientId AND c.CaseType = 99
GROUP BY YEAR(c.EligibilityStartDate), MONTH(c.EligibilityStartDate)
ORDER BY YEAR(c.EligibilityStartDate), MONTH(c.EligibilityStartDate);
Notes:
Never use commas in the FROM clause.
Always use proper, explicit, standard JOIN syntax.
Use meaningful table aliases (i.e. abbreviations of table names) rather than meaningless ones.
You seem to want an aggregation query.
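As a rough illustration of the per-month aggregation, here is a self-contained sketch. The table variable @Enrollment and its three rows are made up; only the column names come from the question. One possible reason for the NULL in the question is that an equality test against a single date only matches rows whose EligibilityStartDate is exactly that value, and AVG over zero matching rows returns NULL. The second query shows a half-open date range as one way to isolate a month.

```sql
-- Hypothetical sample rows; column names follow the question.
DECLARE @Enrollment TABLE (
    ClientId             INT,
    EligibilityStartDate DATE,
    EnrollmentStartDate  DATE
);

INSERT INTO @Enrollment (ClientId, EligibilityStartDate, EnrollmentStartDate)
VALUES (1, '2019-03-01', '2019-03-11'),   -- 10 days
       (2, '2019-03-15', '2019-04-04'),   -- 20 days
       (3, '2019-04-02', '2019-04-09');   -- 7 days

-- Average days to enroll, one row per eligibility month.
SELECT YEAR(e.EligibilityStartDate)  AS yyyy,
       MONTH(e.EligibilityStartDate) AS mm,
       -- Multiplying by 1.0 avoids integer averaging, which would truncate.
       AVG(1.0 * DATEDIFF(DAY, e.EligibilityStartDate, e.EnrollmentStartDate)) AS avg_days
FROM @Enrollment AS e
GROUP BY YEAR(e.EligibilityStartDate), MONTH(e.EligibilityStartDate)
ORDER BY yyyy, mm;

-- A single month, selected with a date range instead of an equality test.
SELECT AVG(1.0 * DATEDIFF(DAY, e.EligibilityStartDate, e.EnrollmentStartDate)) AS avg_days_march
FROM @Enrollment AS e
WHERE e.EligibilityStartDate >= '2019-03-01'
  AND e.EligibilityStartDate <  '2019-04-01';
```

With these rows, March averages 15 days and April 7 days. Note that AVG over an integer expression returns an integer in SQL Server, which is why the sketch multiplies by 1.0.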

Select latest and 2nd latest date rows per user

I have the following query, which selects rows whose LAST_UPDATE_DATE falls within the last 7 days. It works great.
SELECT 'NEW ROW' AS 'ROW_TYPE', A.EMPLID, B.FIRST_NAME, B.LAST_NAME,
A.BANK_CD, A.ACCOUNT_NUM, ACCOUNT_TYPE, PRIORITY, A.LAST_UPDATE_DATE
FROM PS_DIRECT_DEPOSIT D
INNER JOIN PS_DIR_DEP_DISTRIB A ON A.EMPLID = D.EMPLID AND A.EFFDT = D.EFFDT
INNER JOIN PS_EMPLOYEES B ON B.EMPLID = A.EMPLID
WHERE
B.EMPL_STATUS NOT IN ('T','R','D')
AND ((A.DEPOSIT_TYPE = 'P' AND A.AMOUNT_PCT = 100)
OR A.PRIORITY = 999
OR A.DEPOSIT_TYPE = 'B')
AND A.EFFDT = (SELECT MAX(A1.EFFDT)
FROM PS_DIR_DEP_DISTRIB A1
WHERE A1.EMPLID = A.EMPLID
AND A1.EFFDT <= GETDATE())
AND D.EFF_STATUS = 'A'
AND D.EFFDT = (SELECT MAX(D1.EFFDT)
FROM PS_DIRECT_DEPOSIT D1
WHERE D1.EMPLID = D.EMPLID
AND D1.EFFDT <= GETDATE())
AND A.LAST_UPDATE_DATE >= GETDATE() - 7
What I would like to add is the previous (2nd MAX) row per EMPLID, so that I can output the 'old' row (the row that preceded the latest row meeting the above criteria) along with the new row that the query already outputs.
ROW_TYPE  EMPLID  FIRST_NAME  LAST_NAME  BANK_CD    ACCOUNT_NUM  ACCOUNT_TYPE  PRIORITY  LAST_UPDATE_DATE
NEW ROW   12345   JOHN        SMITH      123548999  45234879     C             999       2019-03-06 00:00:00.000
OLD ROW   12345   JOHN        SMITH      214080046  92178616     C             999       2018-10-24 00:00:00.000
NEW ROW   56399   CHARLES     MASTER     785816167  84314314     C             999       2019-03-07 00:00:00.000
OLD ROW   56399   CHARLES     MASTER     345761227  547352       C             999       2017-05-16 00:00:00.000
So the EMPLID would be ordered by NEW ROW, followed by OLD ROW as shown above. In this example the 'NEW ROW' is getting the record that is within the past 7 days, as indicated by the LAST_UPDATE_DATE.
I would like to get feedback on how to modify the query so I can also get the 'old' row (which is the max row that is less than the 'NEW' row retrieved above).
It was a slow day for crime in Gotham, so I gave this a whirl.
It's unlikely to work right out of the box, but it should get you started.
Your LAST_UPDATE_DATE column is on the table PS_DIR_DEP_DISTRIB, so we'll start there. First, you want to identify all of the records that were updated in the last 7 days because those are the only ones you're interested in. Throughout this, I'm assuming, and I'm probably wrong, that the natural key for the table consists of EMPLID, BANK_CD, and ACCOUNT_NUM. You'll want to sub in the actual natural key for those columns in a few places. That said, the date limiter looks something like this:
SELECT
EMPLID
,BANK_CD
,ACCOUNT_NUM
FROM
PS_DIR_DEP_DISTRIB AS limit
WHERE
limit.LAST_UPDATE_DATE >= DATEADD(DAY, -7, CAST(GETDATE() AS DATE))
AND
limit.LAST_UPDATE_DATE <= CAST(GETDATE() AS DATE)
Now we'll use that as a correlated sub-query in a WHERE EXISTS clause that we'll correlate back to the base table to limit ourselves to records with natural key values that were updated in the last week. I altered the SELECT list to just SELECT 1, which is typical verbiage for a correlated sub, since it stops looking for a match when it finds one (1), and doesn't actually return any values at all.
Additionally, since we're filtering this record set anyway, I moved all the other WHERE clause filters for this table into this (soon to be) sub-query.
Finally, in the SELECT portion, I added a DENSE_RANK to force an order on the records. We'll use the DENSE_RANK value later to filter down to only the first (N) records of interest.
So that leaves us with this:
SELECT
EMPLID
,BANK_CD
,ACCOUNT_NUM
--,ACCOUNT_TYPE --Might belong here. Can't tell without table alias in original SELECT
,PRIORITY
,EFFDT
,LAST_UPDATE_DATE
,DEPOSIT_TYPE
,AMOUNT_PCT
,DENSE_RANK() OVER (PARTITION BY --Add actual natural key columns here...
EMPLID
ORDER BY
LAST_UPDATE_DATE DESC
) AS RowNum
FROM
PS_DIR_DEP_DISTRIB AS sdist
WHERE
EXISTS
(
-- Get the set of records that were last updated in the last 7 days.
-- Correlate to the outer query so it only returns records related to this subset.
-- This uses a correlated subquery. A JOIN will work, too. Try both, pick the faster one.
-- Something like this, using the actual natural key columns in the WHERE
SELECT
1
FROM
PS_DIR_DEP_DISTRIB AS limit
WHERE
--The first two define the date range.
limit.LAST_UPDATE_DATE >= DATEADD(DAY, -7, CAST(GETDATE() AS DATE))
AND limit.LAST_UPDATE_DATE <= CAST(GETDATE() AS DATE)
AND
--And these are the correlations to the outer query.
limit.EMPLID = sdist.EMPLID
AND limit.BANK_CD = sdist.BANK_CD
AND limit.ACCOUNT_NUM = sdist.ACCOUNT_NUM
)
AND
(
(
sdist.DEPOSIT_TYPE = 'P'
AND sdist.AMOUNT_PCT = 100
)
OR sdist.PRIORITY = 999
OR sdist.DEPOSIT_TYPE = 'B'
)
Replace the original INNER JOIN to PS_DIR_DEP_DISTRIB with that query. In the SELECT list, the first hard-coded value now depends on the RowNum value, so it becomes a CASE expression. In the WHERE clause, the date filters are gone because the subquery now drives them, several other filters were folded into the subquery, and we add dist.RowNum <= 2 to bring back the top 2 records.
(I also replaced all the table aliases so I could keep track of what I was looking at.)
SELECT
CASE dist.RowNum
WHEN 1 THEN 'NEW ROW'
ELSE 'OLD ROW'
END AS ROW_TYPE
,dist.EMPLID
,emp.FIRST_NAME
,emp.LAST_NAME
,dist.BANK_CD
,dist.ACCOUNT_NUM
,ACCOUNT_TYPE
,dist.PRIORITY
,dist.LAST_UPDATE_DATE
FROM
PS_DIRECT_DEPOSIT AS dd
INNER JOIN
(
SELECT
EMPLID
,BANK_CD
,ACCOUNT_NUM
--,ACCOUNT_TYPE --Might belong here. Can't tell without table alias in original SELECT
,PRIORITY
,EFFDT
,LAST_UPDATE_DATE
,DEPOSIT_TYPE
,AMOUNT_PCT
,DENSE_RANK() OVER (PARTITION BY --Add actual natural key columns here...
EMPLID
ORDER BY
LAST_UPDATE_DATE DESC
) AS RowNum
FROM
PS_DIR_DEP_DISTRIB AS sdist
WHERE
EXISTS
(
-- Get the set of records that were last updated in the last 7 days.
-- Correlate to the outer query so it only returns records related to this subset.
-- This uses a correlated subquery. A JOIN will work, too. Try both, pick the faster one.
-- Something like this, using the actual natural key columns in the WHERE
SELECT
1
FROM
PS_DIR_DEP_DISTRIB AS limit
WHERE
--The first two define the date range.
limit.LAST_UPDATE_DATE >= DATEADD(DAY, -7, CAST(GETDATE() AS DATE))
AND limit.LAST_UPDATE_DATE <= CAST(GETDATE() AS DATE)
AND
--And these are the correlations to the outer query.
limit.EMPLID = sdist.EMPLID
AND limit.BANK_CD = sdist.BANK_CD
AND limit.ACCOUNT_NUM = sdist.ACCOUNT_NUM
)
AND
(
(
sdist.DEPOSIT_TYPE = 'P'
AND sdist.AMOUNT_PCT = 100
)
OR sdist.PRIORITY = 999
OR sdist.DEPOSIT_TYPE = 'B'
)
) AS dist
ON
dist.EMPLID = dd.EMPLID
AND dist.EFFDT = dd.EFFDT
INNER JOIN
PS_EMPLOYEES AS emp
ON
emp.EMPLID = dist.EMPLID
WHERE
dist.RowNum <= 2
AND
emp.EMPL_STATUS NOT IN ('T', 'R', 'D')
AND
dd.EFF_STATUS = 'A';
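The ranking step on its own can be tried with a small self-contained sketch. The table variable @Distrib is hypothetical and carries only three columns; the bank codes and dates are borrowed from the question's sample output, plus one invented older row for employee 12345. The 7-day filter is left out because the sample dates are fixed.

```sql
-- Hypothetical sample data; only the columns needed for the ranking are included.
DECLARE @Distrib TABLE (
    EMPLID           INT,
    BANK_CD          VARCHAR(20),
    LAST_UPDATE_DATE DATETIME
);

INSERT INTO @Distrib (EMPLID, BANK_CD, LAST_UPDATE_DATE)
VALUES (12345, '111111111', '20170110'),   -- invented older row, should be dropped
       (12345, '214080046', '20181024'),
       (12345, '123548999', '20190306'),
       (56399, '345761227', '20170516'),
       (56399, '785816167', '20190307');

SELECT CASE ranked.RowNum WHEN 1 THEN 'NEW ROW' ELSE 'OLD ROW' END AS ROW_TYPE,
       ranked.EMPLID,
       ranked.BANK_CD,
       ranked.LAST_UPDATE_DATE
FROM (
    SELECT d.EMPLID,
           d.BANK_CD,
           d.LAST_UPDATE_DATE,
           -- 1 = most recent update for the employee, 2 = the one before it.
           DENSE_RANK() OVER (PARTITION BY d.EMPLID
                              ORDER BY d.LAST_UPDATE_DATE DESC) AS RowNum
    FROM @Distrib AS d
) AS ranked
WHERE ranked.RowNum <= 2
ORDER BY ranked.EMPLID, ranked.RowNum;
```

Each employee comes back with a NEW ROW followed by an OLD ROW, and the 2017-01-10 row for 12345 is filtered out. DENSE_RANK gives tied dates the same rank, so two rows sharing a LAST_UPDATE_DATE would both be returned; ROW_NUMBER could be swapped in if exactly two rows per employee are required.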

SQL values disappear when using max dates

First time posting here. I have a query that I hope someone may be able to help with; I have tried to search for the answer, but with no joy.
When I use the SQL below to find a value (in this case eb.annualvalue), it returns multiple values because no end dates have been entered into the eb table, and there are too many employees without end dates for me to close them all down.
LEFT JOIN
(
SELECT
eb.empid, eb.bencode, eb.currencycode AS [currencycode], eb.notes AS [notes], eb.annualvalue
FROM
employeebenefit AS [eb]
WHERE
eb.bencode IN ('US 401K Plan')
AND (eb.enddate IS NULL OR eb.enddate >= '20180101')
)
AS eb26
ON eb26.empid = e.empid
However, when I use MAX startdate (code below), it returns the correct number of rows, but the eb.annualvalue figure disappears.
LEFT JOIN
(
SELECT
eb.empid, eb.bencode, eb.currencycode AS [currencycode], eb.notes AS [notes], eb.annualvalue
FROM
employeebenefit AS [eb]
WHERE
eb.bencode IN ('US 401K Plan')
AND (eb.enddate IS NULL OR eb.enddate >= '20180101')
AND (eb.startdate = (SELECT MAX(eb.startdate) FROM employeebenefit AS [eb]))
)
AS eb26
ON eb26.empid = e.empid
Any help would be greatly appreciated. Thanks Dan.
This sounds like a greatest-n-per-group problem: you want just one row per employee from a table with many rows per employee. I'm not 100% clear on how you want to select that one row, but I can give an example.
Ideally, you would use ROW_NUMBER(), but that is only available from SQL Server 2005 onward.
The two common alternatives are:
- Join on your data twice. Once to find the "highest date" per user, again to find the whole row.
- Use a correlated sub-query to work out an individual's best row (still really joining twice)
Simple-self-join:
LEFT JOIN
(
SELECT
empid,
MAX(startdate) AS max_startdate
FROM
employeebenefit
WHERE
bencode IN ('US 401K Plan')
AND (enddate IS NULL OR enddate >= '20180101')
GROUP BY
empid
)
latest_employeebenefit
ON latest_employeebenefit.empid = e.empid
LEFT JOIN
employeebenefit
ON employeebenefit.empid = latest_employeebenefit.empid
AND employeebenefit.startdate = latest_employeebenefit.max_startdate
AND employeebenefit.bencode IN ('US 401K Plan')
AND (employeebenefit.enddate IS NULL OR employeebenefit.enddate >= '20180101')
This has the "feature" that if two such records both match the max_startdate (a tie) then both will come through. Often that is impossible, often it's desirable, it depends on your data and your needs.
Correlated-sub-query for join:
LEFT JOIN
employeebenefit
ON employeebenefit.id =
(
SELECT TOP(1) lookup.id
FROM employeebenefit AS lookup
WHERE lookup.empid = e.empid -- the correlated bit
AND lookup.bencode IN ('US 401K Plan')
AND (lookup.enddate IS NULL OR lookup.enddate >= '20180101')
ORDER BY lookup.startdate DESC
)
This is slightly different in that it always returns just one row. If there can be a tie when only sorting by startdate it's generally best to add another column to the ORDER BY, even if it's just an id column, to ensure the results are deterministic.
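For completeness, the ROW_NUMBER() option mentioned at the start of this answer might look roughly like the sketch below. The table variables @employee and @employeebenefit and their rows are invented for illustration; the column names and filters follow the question.

```sql
-- Hypothetical sample data; column names follow the question.
DECLARE @employee TABLE (empid INT);
DECLARE @employeebenefit TABLE (
    empid       INT,
    bencode     VARCHAR(50),
    startdate   DATE,
    enddate     DATE NULL,
    annualvalue DECIMAL(12, 2)
);

INSERT INTO @employee (empid) VALUES (1), (2), (3);

INSERT INTO @employeebenefit (empid, bencode, startdate, enddate, annualvalue)
VALUES (1, 'US 401K Plan', '2016-01-01', NULL, 1000.00),
       (1, 'US 401K Plan', '2018-06-01', NULL, 1500.00),
       (2, 'US 401K Plan', '2017-03-01', NULL, 2000.00);

SELECT e.empid, eb26.annualvalue
FROM @employee AS e
LEFT JOIN (
    SELECT eb.empid,
           eb.annualvalue,
           -- Rank each employee's qualifying rows, newest start date first.
           ROW_NUMBER() OVER (PARTITION BY eb.empid
                              ORDER BY eb.startdate DESC) AS rn
    FROM @employeebenefit AS eb
    WHERE eb.bencode IN ('US 401K Plan')
      AND (eb.enddate IS NULL OR eb.enddate >= '20180101')
) AS eb26
    ON eb26.empid = e.empid
   AND eb26.rn = 1          -- keep only the latest row per employee
ORDER BY e.empid;
```

With this data, employee 1 gets 1500.00, employee 2 gets 2000.00, and employee 3 gets NULL. Filtering on rn = 1 inside the ON clause rather than the WHERE clause keeps employees with no benefit row in the result, which is the point of the LEFT JOIN.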
You can use the code below, if I understood your question:
OUTER APPLY
(
SELECT TOP 1
eb.empid, eb.bencode, eb.currencycode AS [currencycode], eb.notes AS [notes], eb.annualvalue
FROM
employeebenefit AS [eb]
WHERE
eb.empid = e.empid
AND eb.bencode IN ('US 401K Plan')
AND (eb.enddate IS NULL OR eb.enddate >= '20180101')
ORDER BY
eb.startdate DESC
)
AS eb26

Select records where the only exist 1 in a joined table

I have the following query:
SELECT
A.POSTCARD_ID, A.STAMP_ID, B.END_DT
FROM
PST_VS_STAMP A
JOIN
STAMP B ON A.POSTCARD_ID = B.POSTCARD_ID
WHERE
B.ACCOUNT LIKE 'AA%'
AND B.END_DT = '9999-12-31'
GROUP BY
A.POSTCARD_ID, A.STAMP_ID, B.END_DT
HAVING
COUNT(A.POSTCARD_ID) < 2
But I get the wrong results.
I want only the postcard IDs that have exactly 1 record (hence HAVING < 2) in the PST_VS_STAMP table. How can I query this?
Do the aggregation in the subquery, only on the table where you want one row. Because there is one row, you can use an aggregation function to pull out the value of any column (for one row min(col) is the column's value):
select s.postcard_id, vs.stamp_id, s.end_dt
from stamp s join
(select vs.postcard_id, min(stamp_id) as stamp_id
from pst_vs_stamp vs
group by vs.postcard_id
having count(*) = 1
) vs
on vs.POSTCARD_ID = s.POSTCARD_ID
where s.ACCOUNT like 'AA%' and s.END_DT = '9999-12-31';
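A quick way to check the idea is a self-contained sketch with made-up rows. The table variables below are hypothetical stand-ins for PST_VS_STAMP and STAMP, holding only the columns the query uses.

```sql
-- Hypothetical sample data; table and column names follow the question.
DECLARE @pst_vs_stamp TABLE (postcard_id INT, stamp_id INT);
DECLARE @stamp TABLE (postcard_id INT, account VARCHAR(20), end_dt DATE);

INSERT INTO @pst_vs_stamp (postcard_id, stamp_id)
VALUES (1, 10),            -- postcard 1: one stamp row, should be returned
       (2, 20), (2, 21);   -- postcard 2: two stamp rows, should be skipped

INSERT INTO @stamp (postcard_id, account, end_dt)
VALUES (1, 'AA100', '9999-12-31'),
       (2, 'AA200', '9999-12-31');

SELECT s.postcard_id, vs.stamp_id, s.end_dt
FROM @stamp AS s
JOIN (
    -- Keep only postcards with exactly one row; MIN() then returns that row's stamp_id.
    SELECT p.postcard_id, MIN(p.stamp_id) AS stamp_id
    FROM @pst_vs_stamp AS p
    GROUP BY p.postcard_id
    HAVING COUNT(*) = 1
) AS vs
    ON vs.postcard_id = s.postcard_id
WHERE s.account LIKE 'AA%'
  AND s.end_dt = '9999-12-31';
```

Only postcard 1 with stamp 10 is returned. Counting happens before the join to the stamp table, so extra rows on that side cannot inflate the count.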

Skip rows for specific time in SQL

Need some help.
I have two timestamp columns, and I want to get the max and min values with a third column showing the time difference. I am skipping any 12 a.m. times, so I used the syntax below. Any help on how to achieve the third column, the time difference, would be appreciated. It is in DB2.
SELECT EMPID,MIN(STARTDATETIME),MAX(ENDDATETIME)
FROM TABLE
WHERE DATE(STARTDATETIME)= '2012-05-15' AND HOUR(STARTDATETIME)<>0 AND HOUR(ENDDATETIME)<>0
GROUP BY EMPID
You can use the results from that in an inner select, and use those values to define the TimeDifference column. My knowledge of DB2 is very limited, so I'm making some assumptions, but this should give you an idea. I'll update the answer if something is drastically incorrect.
Select EmpId,
MinStartDate,
MaxEndDate,
MaxEndDate - MinStartDate As TimeDifference
From
(
Select EMPID,
MIN(STARTDATETIME) As MinStartDate,
MAX(ENDDATETIME) As MaxEndDate
From Table
Where DATE(STARTDATETIME) = '2012-05-15'
And HOUR(STARTDATETIME) <> 0
And HOUR(ENDDATETIME) <> 0
Group By EMPID
) A