SQL: Updating a column based on subquery results

I have a T-SQL table that contains the following columns: Date, StationCode, HDepth, and MaxDepth. MaxDepth is set to 0 by default on every row. I am trying to find the maximum HDepth for each Date and StationCode, and set MaxDepth to 1 on the rows where that maximum occurs. I have written a SELECT statement that finds where the maximums occur:
SELECT StationCode, [Date], MAX(HDepth) AS Maximum FROM dbo.[DepthTable] GROUP BY [Date], StationCode
How could I put this query into an Update statement to set the MaxDepth to 1 on the rows that are returned by this query?

You might try something like this:
UPDATE a
SET MaxDepth = 1
FROM dbo.[DepthTable] AS a
JOIN (
-- Your original query
SELECT StationCode, [Date], MAX(HDepth) AS Maximum
FROM dbo.[DepthTable]
GROUP BY [Date], StationCode
) AS b ON a.StationCode = b.StationCode
AND a.[Date] = b.[Date]
AND a.HDepth = b.Maximum -- Here we get only the max rows
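A minimal sketch of the same update, runnable with Python's built-in sqlite3 module standing in for SQL Server. The table is cut down to the four columns from the question, the rows are made up, and the function names are hypothetical. SQLite only gained UPDATE ... FROM in version 3.33, so the sketch writes the join as an equivalent correlated subquery; like the join, it flags every row that ties for the maximum.

```python
import sqlite3


def flag_max_depths(conn: sqlite3.Connection) -> None:
    # Correlated-subquery equivalent of UPDATE ... FROM ... JOIN:
    # a row is flagged when its HDepth equals the maximum of its (Date, StationCode) group.
    conn.execute(
        """
        UPDATE DepthTable
        SET MaxDepth = 1
        WHERE HDepth = (
            SELECT MAX(b.HDepth)
            FROM DepthTable AS b
            WHERE b.StationCode = DepthTable.StationCode
              AND b."Date" = DepthTable."Date"
        )
        """
    )


def flag_demo() -> list:
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample table; MaxDepth defaults to 0 as in the question.
    conn.execute(
        'CREATE TABLE DepthTable ("Date" TEXT, StationCode TEXT, HDepth REAL, '
        "MaxDepth INTEGER DEFAULT 0)"
    )
    conn.executemany(
        'INSERT INTO DepthTable ("Date", StationCode, HDepth) VALUES (?, ?, ?)',
        [
            ("2020-01-01", "S1", 1.5),
            ("2020-01-01", "S1", 3.0),
            ("2020-01-01", "S2", 2.0),
            ("2020-01-02", "S1", 0.5),
            ("2020-01-02", "S1", 0.5),  # a tie: both rows get flagged
        ],
    )
    flag_max_depths(conn)
    return conn.execute(
        'SELECT "Date", StationCode, HDepth, MaxDepth FROM DepthTable ORDER BY rowid'
    ).fetchall()
```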
However, if a column is derived entirely from other columns, you might consider putting this logic in a view instead (to avoid update anomalies). The SELECT for such a view might look like:
SELECT a.[Date], a.StationCode, a.HDepth,
CASE WHEN b.Maximum IS NULL THEN 0 ELSE 1 END AS MaxDepth
FROM dbo.[DepthTable] AS a
LEFT JOIN (
-- Your original query
SELECT StationCode, [Date], MAX(HDepth) AS Maximum
FROM dbo.[DepthTable]
GROUP BY [Date], StationCode
) AS b ON a.StationCode = b.StationCode
AND a.[Date] = b.[Date]
AND a.HDepth = b.Maximum -- Here we get only the max rows
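To see why a view avoids update anomalies, here is a hedged sketch using Python's sqlite3 as a stand-in for SQL Server. The view name DepthWithMax, the function name, and the sample rows are hypothetical. The stored MaxDepth column is dropped and the flag is recomputed on every read, so inserting a new, deeper reading moves the flag without any UPDATE.

```python
import sqlite3


def view_demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical table with no stored MaxDepth column.
    conn.execute('CREATE TABLE DepthTable ("Date" TEXT, StationCode TEXT, HDepth REAL)')
    conn.executemany(
        "INSERT INTO DepthTable VALUES (?, ?, ?)",
        [("2020-01-01", "S1", 1.5), ("2020-01-01", "S1", 3.0)],
    )
    # Hypothetical view name; the body mirrors the SELECT above.
    conn.execute(
        """
        CREATE VIEW DepthWithMax AS
        SELECT a."Date", a.StationCode, a.HDepth,
               CASE WHEN b.Maximum IS NULL THEN 0 ELSE 1 END AS MaxDepth
        FROM DepthTable AS a
        LEFT JOIN (
            SELECT StationCode, "Date", MAX(HDepth) AS Maximum
            FROM DepthTable
            GROUP BY "Date", StationCode
        ) AS b
          ON a.StationCode = b.StationCode
         AND a."Date" = b."Date"
         AND a.HDepth = b.Maximum
        """
    )
    read = "SELECT HDepth, MaxDepth FROM DepthWithMax ORDER BY HDepth"
    before = conn.execute(read).fetchall()
    # A new, deeper reading: the view recomputes the flag, no UPDATE needed.
    conn.execute("INSERT INTO DepthTable VALUES (?, ?, ?)", ("2020-01-01", "S1", 4.0))
    after = conn.execute(read).fetchall()
    return before, after
```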

Related

SQL - Get the sum of several groups of records

DESIRED RESULT
Get the SUM of [Hours], including only a single row for each [DevelopmentID]: the one with the highest [Revision].
e.g. SUM 1, 2, 3, 5, 6 (Result should be 22.00)
I'm stuck trying to get the appropriate grouping.
DECLARE @CompanyID INT = 1
SELECT
SUM([s].[Hours]) AS [Hours]
FROM
[dbo].[tblDev] [d] WITH (NOLOCK)
JOIN
[dbo].[tblSpec] [s] WITH (NOLOCK) ON [d].[DevID] = [s].[DevID]
WHERE
[s].[Revision] = (
SELECT MAX([s2].[Revision]) FROM [tblSpec] [s2]
)
GROUP BY
[s].[Hours]
Use ROW_NUMBER() to identify the latest revision for each DevID:
SELECT SUM([Hours])
FROM (
SELECT s.[Hours],
R = ROW_NUMBER() OVER (PARTITION BY d.DevID
ORDER BY s.Revision DESC) -- highest revision gets R = 1
FROM [dbo].[tblDev] d
JOIN [dbo].[tblSpec] s
ON d.[DevID] = s.[DevID]
) x
WHERE R = 1
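A small check of the ROW_NUMBER() approach, using Python's sqlite3 as a stand-in for SQL Server (window functions need SQLite 3.25 or later). The tables are reduced to the columns the query touches, the function name is hypothetical, and the rows are invented for illustration. Note the DESC in the ORDER BY: with ascending order, R = 1 would pick the lowest revision instead.

```python
import sqlite3

# Same shape as the T-SQL query; only the Hours column is carried out of the
# derived table so that DevID is not selected twice.
QUERY = """
SELECT SUM(Hours)
FROM (
    SELECT s.Hours,
           ROW_NUMBER() OVER (PARTITION BY s.DevID ORDER BY s.Revision DESC) AS R
    FROM tblDev AS d
    JOIN tblSpec AS s ON d.DevID = s.DevID
) AS x
WHERE R = 1
"""


def latest_revision_hours(spec_rows):
    """spec_rows: hypothetical (DevID, Revision, Hours) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tblDev (DevID INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE tblSpec (DevID INTEGER, Revision INTEGER, Hours REAL)")
    dev_ids = sorted({dev for dev, _, _ in spec_rows})
    conn.executemany("INSERT INTO tblDev VALUES (?)", [(d,) for d in dev_ids])
    conn.executemany("INSERT INTO tblSpec VALUES (?, ?, ?)", spec_rows)
    return conn.execute(QUERY).fetchone()[0]
```

With rows for DevID 1 (revisions 1 and 2), DevID 2 (revision 1) and DevID 3 (revisions 1 to 3), only the highest revision of each contributes to the sum.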
If you want one row per DevId, then that should be in the GROUP BY (and presumably in the SELECT as well):
SELECT s.DevId, SUM(s.Hours) as hours
FROM [dbo].[tblDev] d JOIN
[dbo].[tblSpec] s
ON [d].[DevID] = [s].[DevID]
WHERE s.Revision = (SELECT MAX(s2.Revision) FROM tblSpec s2 WHERE s2.DevID = s.DevID)
GROUP BY s.DevId;
Also, don't use WITH (NOLOCK) unless you really know what you are doing, and I'm guessing you do not. It permits dirty reads of uncommitted data; it is basically a license that says: "You can get me data even if it is not 100% accurate."
I would also dispense with all the square braces. They just make the query harder to write and to read.

Select latest and 2nd latest date rows per user

I have the following query, which selects rows whose LAST_UPDATE_DATE falls within the last 7 days. It works great:
SELECT 'NEW ROW' AS 'ROW_TYPE', A.EMPLID, B.FIRST_NAME, B.LAST_NAME,
A.BANK_CD, A.ACCOUNT_NUM, ACCOUNT_TYPE, PRIORITY, A.LAST_UPDATE_DATE
FROM PS_DIRECT_DEPOSIT D
INNER JOIN PS_DIR_DEP_DISTRIB A ON A.EMPLID = D.EMPLID AND A.EFFDT = D.EFFDT
INNER JOIN PS_EMPLOYEES B ON B.EMPLID = A.EMPLID
WHERE
B.EMPL_STATUS NOT IN ('T','R','D')
AND ((A.DEPOSIT_TYPE = 'P' AND A.AMOUNT_PCT = 100)
OR A.PRIORITY = 999
OR A.DEPOSIT_TYPE = 'B')
AND A.EFFDT = (SELECT MAX(A1.EFFDT)
FROM PS_DIR_DEP_DISTRIB A1
WHERE A1.EMPLID = A.EMPLID
AND A1.EFFDT <= GETDATE())
AND D.EFF_STATUS = 'A'
AND D.EFFDT = (SELECT MAX(D1.EFFDT)
FROM PS_DIRECT_DEPOSIT D1
WHERE D1.EMPLID = D.EMPLID
AND D1.EFFDT <= GETDATE())
AND A.LAST_UPDATE_DATE >= GETDATE() - 7
I would also like to return the previous (second-most-recent) row per EMPLID, so that I can output the 'old' row (the row that was the latest one meeting the criteria above before the last update) along with the new row the query already outputs. For example:
ROW_TYPE  EMPLID  FIRST_NAME  LAST_NAME  BANK_CD    ACCOUNT_NUM  ACCOUNT_TYPE  PRIORITY  LAST_UPDATE_DATE
NEW ROW   12345   JOHN        SMITH      123548999  45234879     C             999       2019-03-06 00:00:00.000
OLD ROW   12345   JOHN        SMITH      214080046  92178616     C             999       2018-10-24 00:00:00.000
NEW ROW   56399   CHARLES     MASTER     785816167  84314314     C             999       2019-03-07 00:00:00.000
OLD ROW   56399   CHARLES     MASTER     345761227  547352       C             999       2017-05-16 00:00:00.000
Within each EMPLID, the NEW ROW should come first, followed by the OLD ROW, as shown above. In this example, the NEW ROW is the record updated within the past 7 days, as indicated by LAST_UPDATE_DATE.
I would like feedback on how to modify the query so that it also returns the 'old' row (the latest row dated before the NEW row retrieved above).
It was a slow day for crime in Gotham, so I gave this a whirl. It might work, but it is unlikely to work right out of the box; it should at least get you started.
Your LAST_UPDATE_DATE column is on the table PS_DIR_DEP_DISTRIB, so we'll start there. First, you want to identify all of the records that were updated in the last 7 days because those are the only ones you're interested in. Throughout this, I'm assuming, and I'm probably wrong, that the natural key for the table consists of EMPLID, BANK_CD, and ACCOUNT_NUM. You'll want to sub in the actual natural key for those columns in a few places. That said, the date limiter looks something like this:
SELECT
EMPLID
,BANK_CD
,ACCOUNT_NUM
FROM
PS_DIR_DEP_DISTRIB AS limit
WHERE
limit.LAST_UPDATE_DATE >= DATEADD(DAY, -7, CAST(GETDATE() AS DATE))
AND
limit.LAST_UPDATE_DATE < DATEADD(DAY, 1, CAST(GETDATE() AS DATE))
Now we'll use that as a correlated subquery in a WHERE EXISTS clause, correlated back to the base table, to limit ourselves to records whose natural key values were updated in the last week. I changed the SELECT list to just SELECT 1, which is the usual idiom for a correlated EXISTS subquery: EXISTS stops looking as soon as it finds one matching row and ignores the SELECT list, so no values are actually returned.
Additionally, since we're filtering this record set anyway, I moved all the other WHERE clause filters for this table into this (soon to be) sub-query.
Finally, in the SELECT portion, I added a DENSE_RANK to rank the records by LAST_UPDATE_DATE within each EMPLID. We'll use the DENSE_RANK value later to keep only the first N records of interest.
So that leaves us with this:
SELECT
EMPLID
,BANK_CD
,ACCOUNT_NUM
--,ACCOUNT_TYPE --Might belong here. Can't tell without table alias in original SELECT
,PRIORITY
,EFFDT
,LAST_UPDATE_DATE
,DEPOSIT_TYPE
,AMOUNT_PCT
,DENSE_RANK() OVER (PARTITION BY --Add actual natural key columns here...
EMPLID
ORDER BY
LAST_UPDATE_DATE DESC
) AS RowNum
FROM
PS_DIR_DEP_DISTRIB AS sdist
WHERE
EXISTS
(
-- Get the set of records that were last updated in the last 7 days.
-- Correlate to the outer query so it only returns records related to this subset.
-- This uses a correlated subquery. A JOIN will work, too. Try both, pick the faster one.
-- Something like this, using the actual natural key columns in the WHERE
SELECT
1
FROM
PS_DIR_DEP_DISTRIB AS limit
WHERE
--The first two define the date range.
limit.LAST_UPDATE_DATE >= DATEADD(DAY, -7, CAST(GETDATE() AS DATE))
AND limit.LAST_UPDATE_DATE < DATEADD(DAY, 1, CAST(GETDATE() AS DATE))
AND
--And these are the correlations to the outer query.
limit.EMPLID = sdist.EMPLID
AND limit.BANK_CD = sdist.BANK_CD
AND limit.ACCOUNT_NUM = sdist.ACCOUNT_NUM
)
AND
( -- parenthesize the ORs so they cannot bypass the EXISTS filter
(
sdist.DEPOSIT_TYPE = 'P'
AND sdist.AMOUNT_PCT = 100
)
OR sdist.PRIORITY = 999
OR sdist.DEPOSIT_TYPE = 'B'
)
Replace the original INNER JOIN to PS_DIR_DEP_DISTRIB with that query. In the SELECT list, the first hard-coded value now depends on the RowNum value, so it becomes a CASE expression. In the WHERE clause, the date filters are gone because the subquery now drives them, several other filters were folded into the subquery, and we add dist.RowNum <= 2 to bring back the top two records per EMPLID.
(I also replaced all the table aliases so I could keep track of what I was looking at.)
SELECT
CASE dist.RowNum
WHEN 1 THEN 'NEW ROW'
ELSE 'OLD ROW'
END AS ROW_TYPE
,dist.EMPLID
,emp.FIRST_NAME
,emp.LAST_NAME
,dist.BANK_CD
,dist.ACCOUNT_NUM
,ACCOUNT_TYPE
,dist.PRIORITY
,dist.LAST_UPDATE_DATE
FROM
PS_DIRECT_DEPOSIT AS dd
INNER JOIN
(
SELECT
EMPLID
,BANK_CD
,ACCOUNT_NUM
--,ACCOUNT_TYPE --Might belong here. Can't tell without table alias in original SELECT
,PRIORITY
,EFFDT
,LAST_UPDATE_DATE
,DEPOSIT_TYPE
,AMOUNT_PCT
,DENSE_RANK() OVER (PARTITION BY --Add actual natural key columns here...
EMPLID
ORDER BY
LAST_UPDATE_DATE DESC
) AS RowNum
FROM
PS_DIR_DEP_DISTRIB AS sdist
WHERE
EXISTS
(
-- Get the set of records that were last updated in the last 7 days.
-- Correlate to the outer query so it only returns records related to this subset.
-- This uses a correlated subquery. A JOIN will work, too. Try both, pick the faster one.
-- Something like this, using the actual natural key columns in the WHERE
SELECT
1
FROM
PS_DIR_DEP_DISTRIB AS limit
WHERE
--The first two define the date range.
limit.LAST_UPDATE_DATE >= DATEADD(DAY, -7, CAST(GETDATE() AS DATE))
AND limit.LAST_UPDATE_DATE < DATEADD(DAY, 1, CAST(GETDATE() AS DATE))
AND
--And these are the correlations to the outer query.
limit.EMPLID = sdist.EMPLID
AND limit.BANK_CD = sdist.BANK_CD
AND limit.ACCOUNT_NUM = sdist.ACCOUNT_NUM
)
AND
( -- parenthesize the ORs so they cannot bypass the EXISTS filter
(
sdist.DEPOSIT_TYPE = 'P'
AND sdist.AMOUNT_PCT = 100
)
OR sdist.PRIORITY = 999
OR sdist.DEPOSIT_TYPE = 'B'
)
) AS dist
ON
dist.EMPLID = dd.EMPLID
AND dist.EFFDT = dd.EFFDT
INNER JOIN
PS_EMPLOYEES AS emp
ON
emp.EMPLID = dist.EMPLID
WHERE
dist.RowNum <= 2
AND
emp.EMPL_STATUS NOT IN ('T', 'R', 'D')
AND
dd.EFF_STATUS = 'A';
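The core of the answer (an EXISTS filter on recent updates, DENSE_RANK within each EMPLID, a CASE for the row label, and RowNum <= 2) can be sanity-checked in isolation. This is a hedged sketch using Python's sqlite3 as a stand-in for SQL Server, against a single hypothetical table named dist. The first four rows come from the question's sample output; the last two are invented. A fixed cutoff replaces GETDATE() so the result is deterministic. The correlation uses EMPLID only (an assumption), because in the sample output the OLD row has a different BANK_CD and ACCOUNT_NUM than the NEW row. An ORDER BY is added to get the NEW-then-OLD order the question asked for.

```python
import sqlite3

ROWS = [
    # (EMPLID, BANK_CD, ACCOUNT_NUM, LAST_UPDATE_DATE); first four from the question's sample.
    ("12345", "123548999", "45234879", "2019-03-06"),
    ("12345", "214080046", "92178616", "2018-10-24"),
    ("56399", "785816167", "84314314", "2019-03-07"),
    ("56399", "345761227", "547352", "2017-05-16"),
    # Hypothetical rows: an older third row, and an employee with no recent update.
    ("12345", "111111111", "11111111", "2015-01-01"),
    ("77777", "222222222", "22222222", "2018-06-01"),
]


def new_and_old_rows(rows, cutoff):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dist (EMPLID TEXT, BANK_CD TEXT, ACCOUNT_NUM TEXT, LAST_UPDATE_DATE TEXT)"
    )
    conn.executemany("INSERT INTO dist VALUES (?, ?, ?, ?)", rows)
    return conn.execute(
        """
        SELECT CASE RowNum WHEN 1 THEN 'NEW ROW' ELSE 'OLD ROW' END AS ROW_TYPE,
               EMPLID, BANK_CD, LAST_UPDATE_DATE
        FROM (
            SELECT s.*,
                   DENSE_RANK() OVER (PARTITION BY s.EMPLID
                                      ORDER BY s.LAST_UPDATE_DATE DESC) AS RowNum
            FROM dist AS s
            WHERE EXISTS (
                SELECT 1                      -- the select list is ignored by EXISTS
                FROM dist AS lim              -- LIMIT is a keyword in SQLite, hence "lim"
                WHERE lim.EMPLID = s.EMPLID   -- assumption: correlate on EMPLID only
                  AND lim.LAST_UPDATE_DATE >= ?
            )
        ) AS ranked
        WHERE RowNum <= 2
        ORDER BY EMPLID, RowNum
        """,
        (cutoff,),
    ).fetchall()
```

Employee 77777 drops out because nothing was updated after the cutoff, and the 2015 row for 12345 drops out because it ranks third.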

Select records where only one exists in a joined table

I have the following query:
SELECT
A.POSTCARD_ID, A.STAMP_ID, B.END_DT
FROM
PST_VS_STAMP A
JOIN
STAMP B ON A.POSTCARD_ID = B.POSTCARD_ID
WHERE
B.ACCOUNT LIKE 'AA%'
AND B.END_DT = '9999-12-31'
GROUP BY
A.POSTCARD_ID, A.STAMP_ID, B.END_DT
HAVING
COUNT(A.POSTCARD_ID) < 2
But I get the wrong results.
I want only the postcard IDs that have exactly one record in the PST_VS_STAMP table (HAVING < 2). How can I query this?
Do the aggregation in a subquery, only on the table where you want one row. Because each remaining group has exactly one row, you can use an aggregate function to pull out the value of any column (for a single row, MIN(col) is just that column's value):
select s.postcard_id, vs.stamp_id, s.end_dt
from stamp s join
(select vs.postcard_id, min(stamp_id) as stamp_id
from pst_vs_stamp vs
group by vs.postcard_id
having count(*) = 1
) vs
on vs.POSTCARD_ID = s.POSTCARD_ID
where s.ACCOUNT like 'AA%' and s.END_DT = '9999-12-31';
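A hedged check of this answer using Python's sqlite3 as a stand-in for SQL Server; the function name and rows are hypothetical. Grouping by STAMP_ID as well as POSTCARD_ID, as the question's query does, typically puts each link row in its own group, so HAVING cannot see how many stamps a postcard has. Grouping pst_vs_stamp by postcard_id alone fixes that.

```python
import sqlite3


def single_stamp_postcards(links, stamps):
    """links: (postcard_id, stamp_id); stamps: (postcard_id, account, end_dt). Hypothetical data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pst_vs_stamp (postcard_id INTEGER, stamp_id INTEGER)")
    conn.execute("CREATE TABLE stamp (postcard_id INTEGER, account TEXT, end_dt TEXT)")
    conn.executemany("INSERT INTO pst_vs_stamp VALUES (?, ?)", links)
    conn.executemany("INSERT INTO stamp VALUES (?, ?, ?)", stamps)
    return conn.execute(
        """
        SELECT s.postcard_id, vs.stamp_id, s.end_dt
        FROM stamp AS s
        JOIN (
            -- one row per postcard that has exactly one stamp link
            SELECT postcard_id, MIN(stamp_id) AS stamp_id
            FROM pst_vs_stamp
            GROUP BY postcard_id
            HAVING COUNT(*) = 1
        ) AS vs ON vs.postcard_id = s.postcard_id
        WHERE s.account LIKE 'AA%' AND s.end_dt = '9999-12-31'
        ORDER BY s.postcard_id
        """
    ).fetchall()
```

Here postcard 2 is excluded because it has two stamps, and postcard 3 because its account does not start with AA.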

How to return a COUNT(*) of 0 instead of NULL

I have this bit of code:
SELECT Project, Financial_Year, COUNT(*) AS HighRiskCount
INTO #HighRisk
FROM #TempRisk1
WHERE Risk_1 = 3
GROUP BY Project, Financial_Year
where it's not returning any rows when the count is zero. How do I make these rows appear with the HighRiskCount set as 0?
You can't select the values from the table when the row count is 0. Where would it get the values for the nonexistent rows?
To do this, you'll have to have another table that defines your list of valid Project and Financial_Year values. You'll then select from this table, perform a left join on your existing table, then do the grouping.
Something like this:
SELECT l.Project, l.Financial_Year, COUNT(t.Project) AS HighRiskCount
INTO #HighRisk
FROM MasterRiskList l
LEFT JOIN #TempRisk1 t
ON t.Project = l.Project
AND t.Financial_Year = l.Financial_Year
AND t.Risk_1 = 3 -- in the ON clause; a WHERE filter here would discard the zero-count rows
GROUP BY l.Project, l.Financial_Year
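A hedged sketch of the master-list approach, using Python's sqlite3 as a stand-in for SQL Server. SQLite does not accept the # prefix in an unquoted name, so the temp table is called TempRisk1 here; the function name and rows are hypothetical. The filter_in_on switch shows the pitfall: moving Risk_1 = 3 into WHERE turns the LEFT JOIN back into an inner join and the zero-count row disappears.

```python
import sqlite3


def high_risk_counts(master, risks, filter_in_on=True):
    """master: (Project, Financial_Year); risks: (Project, Financial_Year, Risk_1)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MasterRiskList (Project TEXT, Financial_Year TEXT)")
    conn.execute("CREATE TABLE TempRisk1 (Project TEXT, Financial_Year TEXT, Risk_1 INTEGER)")
    conn.executemany("INSERT INTO MasterRiskList VALUES (?, ?)", master)
    conn.executemany("INSERT INTO TempRisk1 VALUES (?, ?, ?)", risks)
    # The filter goes either into the ON clause (correct) or the WHERE clause (loses zeros).
    on_filter = "AND t.Risk_1 = 3" if filter_in_on else ""
    where_filter = "" if filter_in_on else "WHERE t.Risk_1 = 3"
    sql = f"""
        SELECT l.Project, l.Financial_Year, COUNT(t.Project) AS HighRiskCount
        FROM MasterRiskList AS l
        LEFT JOIN TempRisk1 AS t
          ON t.Project = l.Project
         AND t.Financial_Year = l.Financial_Year
         {on_filter}
        {where_filter}
        GROUP BY l.Project, l.Financial_Year
        ORDER BY l.Project, l.Financial_Year
    """
    return conn.execute(sql).fetchall()
```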
Wrap your SELECT Query in an ISNULL:
SELECT ISNULL((SELECT Project, Financial_Year, COUNT(*) AS hrc
INTO #HighRisk
FROM #TempRisk1
WHERE Risk_1 = 3
GROUP BY Project, Financial_Year),0) AS HighRiskCount
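If your SELECT returns a number, it will pass through. If it returns NULL, the 0 will pass through. (This only works when the subquery is scalar, meaning one column and at most one row; a multi-column subquery, or one with INTO, is not valid inside ISNULL.)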
This assumes that the Project / Financial_Year combinations you want to see already exist in #TempRisk1, just with Risk_1 values other than 3, and that those are the ones you intend to include:
SELECT Project, Financial_Year, SUM(CASE WHEN RISK_1 = 3 THEN 1 ELSE 0 END) AS HighRiskCount
INTO #HighRisk
FROM #TempRisk1
GROUP BY Project, Financial_Year
Notice I removed the WHERE clause.
By the way, your current query is not returning NULL; it is returning no rows.
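A quick check of the conditional-aggregation idea with Python's sqlite3 standing in for SQL Server (TempRisk1 replaces #TempRisk1, and the function name and rows are hypothetical). Every Project / Financial_Year present in the table gets a row, and the CASE inside SUM counts only the Risk_1 = 3 rows.

```python
import sqlite3


def conditional_high_risk(risks):
    """risks: hypothetical (Project, Financial_Year, Risk_1) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TempRisk1 (Project TEXT, Financial_Year TEXT, Risk_1 INTEGER)")
    conn.executemany("INSERT INTO TempRisk1 VALUES (?, ?, ?)", risks)
    return conn.execute(
        """
        SELECT Project, Financial_Year,
               SUM(CASE WHEN Risk_1 = 3 THEN 1 ELSE 0 END) AS HighRiskCount
        FROM TempRisk1              -- no WHERE clause, so no group is filtered away
        GROUP BY Project, Financial_Year
        ORDER BY Project, Financial_Year
        """
    ).fetchall()
```

This only produces a zero for combinations that appear in the table at all; combinations with no rows need the outer-join approach.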
Use:
SELECT x.Project, x.Financial_Year,
COUNT(y.Project) AS HighRiskCount
INTO #HighRisk
FROM (SELECT DISTINCT t.Project, t.Financial_Year
FROM #TempRisk1 t) x
LEFT JOIN #TempRisk1 y ON y.Project = x.Project
AND y.Financial_Year = x.Financial_Year
AND y.Risk_1 = 3
GROUP BY x.Project, x.Financial_Year
The only way to get zero counts is to use an OUTER join against a list of the distinct values you want to see zero counts for.
SQL generally has a problem returning the values that aren't in a table. To accomplish this (without a stored procedure, in any event), you'll need another table that contains the missing values.
Assuming you want one row per project / financial year combination, you'll need a table that contains each valid Project, Financial_Year combination:
SELECT PY.Project, PY.Financial_Year, COUNT(HR.Risk_1) AS HighRiskCount
INTO #HighRisk
FROM #TempRisk1 HR
RIGHT OUTER JOIN ProjectYears PY
ON HR.Project = PY.Project
AND HR.Financial_Year = PY.Financial_Year
AND HR.Risk_1 = 3
GROUP BY PY.Project, PY.Financial_Year
Note that we're taking advantage of the fact that COUNT() will only count non-NULL values to get a 0 COUNT result for those result set records that are made up only of data from the new ProjectYears table.
Alternatively, you might want only one zero-count record returned per project (or one per financial year). In that case, modify the solution above so that the joined table has only that one column.
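A hedged illustration of why the query counts HR.Risk_1 rather than using COUNT(*), using Python's sqlite3 as a stand-in for SQL Server. SQLite only gained RIGHT JOIN in version 3.39, so the sketch writes the same join as ProjectYears LEFT JOIN TempRisk1. ProjectYears, the function name, and the rows are hypothetical. For a year with no matching risk row, COUNT(*) still counts the single NULL-padded joined row, while COUNT(HR.Risk_1) skips the NULL and returns 0.

```python
import sqlite3


def count_comparison(project_years, risks):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ProjectYears (Project TEXT, Financial_Year TEXT)")
    conn.execute("CREATE TABLE TempRisk1 (Project TEXT, Financial_Year TEXT, Risk_1 INTEGER)")
    conn.executemany("INSERT INTO ProjectYears VALUES (?, ?)", project_years)
    conn.executemany("INSERT INTO TempRisk1 VALUES (?, ?, ?)", risks)
    return conn.execute(
        """
        SELECT PY.Project, PY.Financial_Year,
               COUNT(*)         AS count_star,  -- counts the NULL-padded row too
               COUNT(HR.Risk_1) AS count_col    -- ignores NULLs, so unmatched groups give 0
        FROM ProjectYears AS PY
        LEFT JOIN TempRisk1 AS HR
          ON HR.Project = PY.Project
         AND HR.Financial_Year = PY.Financial_Year
         AND HR.Risk_1 = 3
        GROUP BY PY.Project, PY.Financial_Year
        ORDER BY PY.Project, PY.Financial_Year
        """
    ).fetchall()
```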
A little longer, but what about this as a solution?
IF EXISTS (
SELECT *
FROM #TempRisk1
WHERE Risk_1 = 3
)
BEGIN
SELECT Project, Financial_Year, COUNT(*) AS HighRiskCount
INTO #HighRisk
FROM #TempRisk1
WHERE Risk_1 = 3
GROUP BY Project, Financial_Year
END
ELSE
BEGIN
INSERT INTO #HighRisk
SELECT 'Project', 'Financial_Year', 0
END
MSDN - ISNULL function
SELECT Project, Financial_Year, ISNULL(COUNT(*), 0) AS HighRiskCount
INTO #HighRisk
FROM #TempRisk1
WHERE Risk_1 = 3
GROUP BY Project, Financial_Year

Selecting minimum date in data set

I have a data set from which I am trying to select the first record with a station ID of 2.
InspectionNbr Station DateTimeStamp
825065 1 2010-11-16 04:38:49.000
825065 2 2010-11-16 12:38:31.000
825065 2 2010-12-06 01:35:14.000
825065 2 2011-01-24 08:11:04.000
In this case I want to select the second line of the results. How can I use SQL to get the minimum date where the station ID is 2?
Here is what I have so far.
I created a temporary table in SQL and set it up to be populated with the latest date. Then I try to update the temporary table with the following code:
UPDATE #report_out
set
DateTimeStamp = Min(si.CreatedDate)
from
#report_out as r
INNER JOIN
StationInspection as si
on si.ModifiedDate = r.DateTimeStamp
where
r.Station = 2
For some reason beyond me, it doesn't like DateTimeStamp = Min(si.CreatedDate).
I get the following error:
An aggregate may not appear in the set list of an UPDATE statement.
Any pointers?
As far as I can figure out, an aggregate can't be used in an update statement because the aggregate and the update affect two different row sets. Think about a normal SELECT with an aggregate:
SELECT MIN(CreatedDate)
FROM StationInspection
WHERE Station = 2
The aggregate works on all rows in the row set, and the WHERE clause determines which rows are in that row set.
In an update statement, the WHERE clause determines which rows will be changed:
UPDATE StationInspection
SET CreatedDate = @newDate
WHERE Station = 2
The update affects all rows in the row set (all rows that pass the filter specified by the WHERE clause).
So, in the case where you try to do both (I realize this is somewhat simplified from your code, but it makes the point):
UPDATE StationInspection
SET CreatedDate = MIN(CreatedDate)
WHERE Station = 2
You have two operations that require different row sets, but only one row-set selector (the WHERE clause).
SQL doesn't support two WHERE clauses in a single statement, so you'll need two statements:
DECLARE @newDate datetime
SELECT @newDate = MIN(CreatedDate)
FROM StationInspection
WHERE Station = 2
UPDATE StationInspection
SET CreatedDate = @newDate
WHERE Station = 2
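The two-statement pattern can be sketched with Python's sqlite3 standing in for SQL Server: a Python variable plays the role of @newDate, and each statement gets its own WHERE clause. The function names are hypothetical, and the rows reuse the question's sample data, with the timestamp stored in the CreatedDate column this answer uses.

```python
import sqlite3


def update_to_station_min(conn: sqlite3.Connection, station: int):
    # Statement 1: the aggregate runs over its own row set.
    (new_date,) = conn.execute(
        "SELECT MIN(CreatedDate) FROM StationInspection WHERE Station = ?", (station,)
    ).fetchone()
    # Statement 2: the UPDATE's WHERE clause picks the rows to change.
    conn.execute(
        "UPDATE StationInspection SET CreatedDate = ? WHERE Station = ?", (new_date, station)
    )
    return new_date


def two_statement_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE StationInspection (InspectionNbr INTEGER, Station INTEGER, CreatedDate TEXT)"
    )
    conn.executemany(
        "INSERT INTO StationInspection VALUES (?, ?, ?)",
        [
            (825065, 1, "2010-11-16 04:38:49.000"),
            (825065, 2, "2010-11-16 12:38:31.000"),
            (825065, 2, "2010-12-06 01:35:14.000"),
            (825065, 2, "2011-01-24 08:11:04.000"),
        ],
    )
    new_date = update_to_station_min(conn, 2)
    rows = conn.execute(
        "SELECT Station, CreatedDate FROM StationInspection ORDER BY rowid"
    ).fetchall()
    return new_date, rows
```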
If DateTimeStamp is a candidate key (at least when combined with Station), there is no need to create a temp table; just do:
SELECT a.*
FROM a
JOIN (SELECT Station, MIN(DateTimeStamp) AS m
FROM a
GROUP BY Station) AS b
ON a.Station = b.Station AND a.DateTimeStamp = b.m
Then you've got the row with the minimum DateTimeStamp for every Station. This is a fast query.
If DateTimeStamp is not a candidate key, the query becomes slow.
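A hedged check of the join-to-grouped-minimum pattern, using Python's sqlite3 as a stand-in for SQL Server. The table is named StationInspection instead of the placeholder a, the function name is hypothetical, and the rows are the question's sample data.

```python
import sqlite3

SAMPLE = [
    (825065, 1, "2010-11-16 04:38:49.000"),
    (825065, 2, "2010-11-16 12:38:31.000"),
    (825065, 2, "2010-12-06 01:35:14.000"),
    (825065, 2, "2011-01-24 08:11:04.000"),
]


def earliest_per_station(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE StationInspection (InspectionNbr INTEGER, Station INTEGER, DateTimeStamp TEXT)"
    )
    conn.executemany("INSERT INTO StationInspection VALUES (?, ?, ?)", rows)
    return conn.execute(
        """
        SELECT a.InspectionNbr, a.Station, a.DateTimeStamp
        FROM StationInspection AS a
        JOIN (
            -- one (Station, earliest timestamp) pair per station
            SELECT Station, MIN(DateTimeStamp) AS m
            FROM StationInspection
            GROUP BY Station
        ) AS b
          ON a.Station = b.Station AND a.DateTimeStamp = b.m
        ORDER BY a.Station
        """
    ).fetchall()
```

If two rows of the same station share the minimum timestamp, both are returned, which is why the answer makes the candidate-key assumption.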
If you just want the record with Station 2 that has the minimum date, try:
SELECT InspectionNbr, Station, DateTimeStamp
FROM StationInspection
WHERE Station = 2
AND DateTimeStamp = (
SELECT MIN(DateTimeStamp)
FROM StationInspection
WHERE Station = 2
)
This approach avoids grouping:
select T.InspectionNbr,
T.Station,
T.DateTimeStamp
from (
select *,
row_number() over(order by DateTimeStamp) as rn
from StationInspection
where Station = 2
) as T
where T.rn = 1
A shorter statement for some databases (notably MySQL) might be the following; SQL Server has no LIMIT and uses SELECT TOP 1 instead:
SELECT InspectionNbr, Station, DateTimeStamp
FROM StationInspection
WHERE Station = 2
ORDER BY DateTimeStamp ASC
LIMIT 1
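The three single-station answers above (scalar subquery, ROW_NUMBER(), and ORDER BY ... LIMIT 1) can be compared side by side with Python's sqlite3, which, like MySQL, supports LIMIT and also has window functions from version 3.25. This is a sketch using the question's sample rows; the dictionary keys and function name are hypothetical. On this data all three return the same row. With tied minimum timestamps, the scalar-subquery version would return every tied row, while the other two return just one.

```python
import sqlite3

SAMPLE = [
    (825065, 1, "2010-11-16 04:38:49.000"),
    (825065, 2, "2010-11-16 12:38:31.000"),
    (825065, 2, "2010-12-06 01:35:14.000"),
    (825065, 2, "2011-01-24 08:11:04.000"),
]

QUERIES = {
    "scalar_subquery": """
        SELECT InspectionNbr, Station, DateTimeStamp
        FROM StationInspection
        WHERE Station = 2
          AND DateTimeStamp = (SELECT MIN(DateTimeStamp)
                               FROM StationInspection WHERE Station = 2)
    """,
    "row_number": """
        SELECT T.InspectionNbr, T.Station, T.DateTimeStamp
        FROM (SELECT *, ROW_NUMBER() OVER (ORDER BY DateTimeStamp) AS rn
              FROM StationInspection WHERE Station = 2) AS T
        WHERE T.rn = 1
    """,
    "order_by_limit": """
        SELECT InspectionNbr, Station, DateTimeStamp
        FROM StationInspection
        WHERE Station = 2
        ORDER BY DateTimeStamp ASC
        LIMIT 1
    """,
}


def run_all(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE StationInspection (InspectionNbr INTEGER, Station INTEGER, DateTimeStamp TEXT)"
    )
    conn.executemany("INSERT INTO StationInspection VALUES (?, ?, ?)", rows)
    # Run each variant against the same data and collect its result rows.
    return {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}
```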