Convert MySQL query to MS SQL Server ... failing on aggregate requirements - sql

GOAL:
I need to retrieve the most recent message date (max), the number of rows in its attachment, and the vendor's name.
Also, we need to limit the results to messages sent this year (after 2014-01-01 00:00:00.000) which have an attachment with 50k rows or more.
TRIED:
See this sqlFiddle.
SELECT
v.name
,a.attachmentRows
,MAX(e.createdDate) recentDate
FROM emailMessage e
INNER JOIN vendor v
ON (e.vendorID = v.vendorID)
INNER JOIN emailAttachment a
ON (e.emailMessageID = a.emailMessageID)
WHERE e.createdDate > '2014-01-01 00:00:00.000'
AND a.attachmentRows >= 50000
GROUP BY e.vendorID
EXPECTATIONS:
| NAME | ATTACHMENTROWS | RECENTDATE |
|-------------|----------------|---------------------------------|
| "Company C" | 123880 | February, 22 2014 10:00:00+0000 |
PROBLEM:
While my SQL skills are rather primitive, I'm fairly comfortable with the MySQL flavor so I started my fiddling there. That query worked as expected.
When switching over to SQL Server, though, I run into this error for each of the selected fields:
Column 'blahBlah' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I understand what the error is telling me, but with three tables involved, I'm at a loss as to how to remedy it. (And of course, simply grouping by all the selected fields would not yield the desired results.)
PLEA:
Please help!

Please try this Fiddle:
SELECT
v.name
,a.attachmentRows
,e.createdDate recentDate
FROM emailMessage e
INNER JOIN vendor v
ON (e.vendorID = v.vendorID)
INNER JOIN emailAttachment a
ON (e.emailMessageID = a.emailMessageID)
INNER JOIN (SELECT MAX(emailMessageID) emailMessageID, vendorID from emailMessage group by vendorID) as maxi
on maxi.emailMessageID = e.emailMessageID
WHERE e.createdDate > '2014-01-01 00:00:00.000'
AND a.attachmentRows >= 50000
This assumes the emailMessageID increments with the createdDate. Using the date is problematic if two emails arrive at the exact same time stamp.
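If you would rather not rely on emailMessageID tracking createdDate, one option is to rank the rows per vendor and keep the first. Below is a minimal sketch of that idea, run through Python's sqlite3 only so it can be executed anywhere (it needs SQLite 3.25+ for window functions); the sample rows are made up, and the same ROW_NUMBER() pattern should carry over to SQL Server. Note that it ranks after filtering, so it returns the most recent qualifying message per vendor.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vendor (vendorID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE emailMessage (emailMessageID INTEGER PRIMARY KEY,
                           vendorID INTEGER, createdDate TEXT);
CREATE TABLE emailAttachment (emailMessageID INTEGER, attachmentRows INTEGER);
INSERT INTO vendor VALUES (1, 'Company A'), (3, 'Company C');
INSERT INTO emailMessage VALUES
  (10, 1, '2013-12-30 09:00:00'),
  (11, 3, '2014-01-15 08:00:00'),
  (12, 3, '2014-02-22 10:00:00');
INSERT INTO emailAttachment VALUES (10, 90000), (11, 60000), (12, 123880);
""")

# Rank qualifying messages per vendor, newest first; the message ID only
# breaks ties between identical timestamps.
LATEST_PER_VENDOR = """
SELECT name, attachmentRows, createdDate AS recentDate
FROM (
    SELECT v.name, a.attachmentRows, e.createdDate,
           ROW_NUMBER() OVER (
               PARTITION BY e.vendorID
               ORDER BY e.createdDate DESC, e.emailMessageID DESC
           ) AS rn
    FROM emailMessage e
    INNER JOIN vendor v ON e.vendorID = v.vendorID
    INNER JOIN emailAttachment a ON e.emailMessageID = a.emailMessageID
    WHERE e.createdDate > '2014-01-01 00:00:00.000'
      AND a.attachmentRows >= 50000
) ranked
WHERE rn = 1
"""

rows = conn.execute(LATEST_PER_VENDOR).fetchall()
print(rows)
```

With this sample data only Company C has a 2014 message with a large enough attachment, so a single row comes back.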

SELECT
v.name
,a.attachmentRows
,MAX(e.createdDate) recentDate
FROM emailMessage e
INNER JOIN vendor v
ON (e.vendorID = v.vendorID)
INNER JOIN emailAttachment a
ON (e.emailMessageID = a.emailMessageID)
WHERE e.createdDate > '2014-01-01 00:00:00.000'
AND a.attachmentRows >= 50000
GROUP BY v.name ,a.attachmentRows


Creating Daily In-Use table w/ Zeros When NULL

Hello Stack Community,
I am not sure if I titled this accurately, but I am attempting to create a table that tracks the daily in-use quantity by product code. Currently my code drops dates where a product isn't in-use whereas I need that to show as a 0.
My thought was that by using the date from the date table, my LEFT OUTER JOIN with the ISNULL on the field would produce a 0, but nay.
Here is my code, with a screenshot of what it outputs with the red square highlighting where it's missing date records that I need to show as 0 :
SELECT
DD.DATE,
DE.PRODUCT_CODE,
--OOC = OUT OF CONTEXT, EITHER ISN'T CHARGEABLE OR ISN'T CURRENTLY ACTIVE
ISNULL(SUM(LIDV.QTY - LIDV.QTYSUB),0),
OD.LOCATION,
OD.SOURCE
FROM Dim_Date AS DD
LEFT OUTER JOIN ORDERv_DatesDays AS OD ON DD.DATE BETWEEN OD.SHIP_DATE AND OD.adjRETURN_DATE
LEFT OUTER JOIN FACT_Orders_LIDs AS LIDV ON LIDV.SORDERID_DAX = OD.SORDERID_DAX
LEFT OUTER JOIN DIM_ECODES AS DE ON DE.PRODUCT_CODE = LIDV.eCODE
WHERE
--DD.DATE = '3/1/2017' AND
DD.DATE BETWEEN '1/1/2017' AND EOMONTH( DATEADD( MONTH , -1, CURRENT_TIMESTAMP ) ) AND
DE.PRODUCT_CODE = '07316-' AND
YEAR(DD.DATE) = 2017
GROUP BY
DD.DATE,
DE.PRODUCT_CODE,
OD.LOCATION,
OD.SOURCE
ORDER BY
DD.DATE
I also thought, since I'm no SQL expert, that perhaps I need to just create a table with each product code and date for a specified date range but I got tripped up trying to create that as well.
Thank you for any assistance, if I need to add more info just let me know what I'm missing.
This WHERE predicate is killing your left join:
DE.PRODUCT_CODE = '07316-' AND
If product_code 07316 was not "out on loan" (or whatever) between Feb 24 and April 6 then all those rows would have looked like:
DATE PRODUCT_CODE INUSE LOCATION
2017-02-25 NULL NULL NULL
2017-02-26 NULL NULL NULL
2017-02-27 NULL NULL NULL
2017-02-28 NULL NULL NULL
...
2017-04-05 NULL NULL NULL
But that NULL in product_code means that when the WHERE clause asks "is NULL equal to 07316- ?" the answer is not true (the comparison evaluates to unknown), so the row disappears from the result set.
Consider
LEFT OUTER JOIN DIM_ECODES AS DE
ON
DE.PRODUCT_CODE = LIDV.eCODE AND
DE.PRODUCT_CODE = '07316-'
You might also want to make some changes in the SELECT block too:
'07316-' as PRODUCT_CODE,
COALESCE(INUSE,0) AS INUSE
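To see the difference between filtering in WHERE and filtering in ON, here is a small sketch using Python's sqlite3 so it can be run as-is. The tables dim_date and in_use and their rows are simplified stand-ins I made up, not your real schema.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the date table and the usage data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (d TEXT PRIMARY KEY);
CREATE TABLE in_use (d TEXT, product_code TEXT, qty INTEGER);
INSERT INTO dim_date VALUES ('2017-02-24'), ('2017-02-25'), ('2017-02-26');
INSERT INTO in_use VALUES ('2017-02-24', '07316-', 5),
                          ('2017-02-25', '07999-', 2);
""")

# Product filter in WHERE: rows where the left join found nothing have a
# NULL product_code, fail the predicate, and are dropped.
where_rows = conn.execute("""
SELECT dd.d, COALESCE(SUM(u.qty), 0)
FROM dim_date dd
LEFT JOIN in_use u ON u.d = dd.d
WHERE u.product_code = '07316-'
GROUP BY dd.d
ORDER BY dd.d
""").fetchall()

# Product filter in ON: every date survives, unmatched dates sum to 0.
on_rows = conn.execute("""
SELECT dd.d, COALESCE(SUM(u.qty), 0)
FROM dim_date dd
LEFT JOIN in_use u ON u.d = dd.d AND u.product_code = '07316-'
GROUP BY dd.d
ORDER BY dd.d
""").fetchall()

print(where_rows)
print(on_rows)
```

The first query returns only the one date that has a matching row; the second returns all three dates, with 0 for the two dates where that product was not in use.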
It might make more sense to you to write it like this:
FROM
Dim_Date AS DD
LEFT OUTER JOIN
(
SELECT
OD.SHIP_DATE,
OD.adjRETURN_DATE,
LIDV.QTY,
LIDV.QTYSUB,
OD.LOCATION,
OD.SOURCE
FROM
ORDERv_DatesDays AS OD
INNER JOIN FACT_Orders_LIDs AS LIDV ON LIDV.SORDERID_DAX = OD.SORDERID_DAX
INNER JOIN DIM_ECODES AS DE ON DE.PRODUCT_CODE = LIDV.eCODE
WHERE
DE.PRODUCT_CODE = '07316-'
) x
ON DD.DATE BETWEEN x.SHIP_DATE AND x.adjRETURN_DATE
WHERE
This is "list of dates on the left" and "any relevant data, already joined together and where'd on the right"
It should also be noted that if you're doing this for multiple product codes, to avoid getting just a single date row when neither product 07316 nor 07317 is in use on the 28th Feb, you'd need to:
FROM
(
SELECT DISTINCT DD.DATE, DE.PRODUCT_CODE
FROM Dim_Date AS DD CROSS JOIN DIM_ECODES DE
WHERE ..date range clause..
) AS dp
This takes your list of dates, and crosses it with your list of prod codes, so you can be certain there are at least these two rows:
2017-02-28 07316-
2017-02-28 07317-
Then when you left join the products on date and product code, both those rows' data survive the left join, and become associated with nulls:
2017-02-28 07316- NULL NULL
2017-02-28 07317- NULL NULL
Without doing that CROSS, you'd have just one row (null in product code)
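Here is a minimal runnable sketch of the dates-crossed-with-products idea, again via Python's sqlite3. The tables dim_date, dim_ecodes and in_use and their contents are invented for illustration; only the shape of the query is the point.

```python
import sqlite3

# Hypothetical, simplified tables: two dates, two product codes, and a
# single usage row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (d TEXT PRIMARY KEY);
CREATE TABLE dim_ecodes (product_code TEXT PRIMARY KEY);
CREATE TABLE in_use (d TEXT, product_code TEXT, qty INTEGER);
INSERT INTO dim_date VALUES ('2017-02-27'), ('2017-02-28');
INSERT INTO dim_ecodes VALUES ('07316-'), ('07317-');
INSERT INTO in_use VALUES ('2017-02-27', '07316-', 4);
""")

# Build every (date, product) pair first, then left join the usage onto it
# so pairs without usage still appear, with a quantity of 0.
rows = conn.execute("""
SELECT dp.d, dp.product_code, COALESCE(SUM(u.qty), 0) AS inuse
FROM (
    SELECT dd.d, de.product_code
    FROM dim_date dd CROSS JOIN dim_ecodes de
) dp
LEFT JOIN in_use u
       ON u.d = dp.d AND u.product_code = dp.product_code
GROUP BY dp.d, dp.product_code
ORDER BY dp.d, dp.product_code
""").fetchall()

for row in rows:
    print(row)
```

Every date appears once per product code, and the three pairs with no usage row show 0 rather than going missing.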

Group By Dynamic Ranges in SQL (cockroachdb/postgres)

I have a query that looks like
select s.session_id, array_agg(sp.value::int8 order by sp.value::int8) as timestamps
from sessions s join session_properties sp on sp.session_id = s.session_id
where s.user_id = '6f129b1c-43a6-4871-86f6-1749bfe1a5af' and sp.key in ('SleepTime', 'WakeupTime') and value != 'None' and value::int8 > 0
group by s.session_id
The result would look like
f321c813-7927-47aa-88c3-b3250af34afa | {1588499070,1588504354}
f38a8841-c402-433d-939d-194eca993bb6 | {1588187599,1588212803}
2befefaf-3b31-46c9-8416-263fa7b9309d | {1589912247,1589935771}
3da64787-65cd-4305-b1ac-1393e2fb11a9 | {1589741569,1589768453}
537e69aa-c39d-484d-9108-2f2cd956d4ee | {1588100398,1588129026}
5a9470ff-f930-491f-a57d-8c089e535d53 | {1589140368,1589165092}
The first column is a unique id and the second column is from and to timestamps.
Now I have a third table which has some timeseries data
records
------------------------
timestamp | name | value
Is it possible to find avg(value) from records, grouped by session_id, over each session's from and to timestamps?
I could run a for loop in the application and do a union to get the desired result, but I was wondering whether this is possible in Postgres or CockroachDB.
I wouldn't aggregate the two values but use two joins to find them. That way you can be sure which value belongs to which property.
Once you have that, you can join that result to your records table.
with ranges as (
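select s.session_id, st.value::int8 as from_value, wt.value::int8 as to_value -- cast so the range compares numerically (assumes records.timestamp is an integer epoch)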
from sessions s
join session_properties st on st.session_id = s.session_id and st.key = 'SleepTime'
join session_properties wt on wt.session_id = s.session_id and wt.key = 'WakeupTime'
where s.user_id = '6f129b1c-43a6-4871-86f6-1749bfe1a5af'
and st.value != 'None' and st.value::int8 > 0
and wt.value != 'None' and wt.value::int8 > 0
)
select ra.session_id, avg(rc.value)
from records rc
join ranges ra
on rc.timestamp >= ra.from_value
and rc.timestamp < ra.to_value
group by ra.session_id;
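The range join at the end is the key step, so here is a small runnable sketch of just that part, using Python's sqlite3 with invented data. The ranges table stands in for the CTE, and I am assuming records.timestamp holds integer epoch values like the session properties do.

```python
import sqlite3

# Hypothetical data: "ranges" plays the role of the CTE result.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ranges (session_id TEXT, from_value INTEGER, to_value INTEGER);
CREATE TABLE records (timestamp INTEGER, name TEXT, value REAL);
INSERT INTO ranges VALUES ('s1', 100, 200), ('s2', 300, 400);
INSERT INTO records VALUES
  (100, 'hr', 60.0), (150, 'hr', 70.0), (200, 'hr', 99.0),
  (350, 'hr', 50.0), (500, 'hr', 10.0);
""")

# Each record is matched to the session whose half-open range
# [from_value, to_value) contains its timestamp, then averaged per session.
rows = conn.execute("""
SELECT ra.session_id, AVG(rc.value)
FROM records rc
JOIN ranges ra
  ON rc.timestamp >= ra.from_value
 AND rc.timestamp <  ra.to_value
GROUP BY ra.session_id
ORDER BY ra.session_id
""").fetchall()

print(rows)
```

The record at timestamp 200 is excluded from s1 because the upper bound is exclusive, and the record at 500 falls in no range at all.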

Select Latest or most recent date in SQL query

I am running a query in SQL on our EHR/EMR database. I am primarily looking at an assessment that is done by a nurse during each patient encounter/visit and looking to return an answer for the most recent assessment date along with some other info. I have the query created and all the data is coming over, however, it is returning all assessment dates and the answers instead of just the latest date and answer. I'll attach the full code below.
SELECT DISTINCT
MAX(PTA.ASSESSMENT_DATE) AS Max_Date,
SAQ.QUESTION_TEXT, SAA.ANSWER_TEXT, dbo.PT_BASIC.PATIENT_CODE,
dbo.PT_BASIC.NAME_FULL
FROM
dbo.PTC_ASSESSMENT_ANSWER AS PAA
INNER JOIN
dbo.PTC_ASSESSMENT AS PTA ON PTA.ASSESSMENT_ID = PAA.ASSESSMENT_ID
AND PTA.PATIENT_ID = PAA.PATIENT_ID
INNER JOIN
dbo.SYS_ASSESSMENT_POINTER AS SAP ON SAP.POINTER_ID = PAA.POINTER_ID
INNER JOIN
dbo.SYS_ASSESSMENT_QUESTION AS SAQ ON SAQ.QUESTION_ID = SAP.QUESTION_ID
INNER JOIN
dbo.SYS_ASSESSMENT_ANSWER AS SAA ON SAA.ANSWER_ID = SAP.ANSWER_ID
INNER JOIN
dbo.PT_BASIC ON PTA.PATIENT_ID = dbo.PT_BASIC.PATIENT_ID
WHERE
(PTA.ASSESSMENT_DATE BETWEEN CONVERT(DATETIME, '2017-09-05 00:00:00', 102)
AND CONVERT(DATETIME, '2017-10-12 00:00:00', 102))
GROUP BY
dbo.PT_BASIC.PATIENT_CODE, dbo.PT_BASIC.NAME_FULL, SAQ.QUESTION_TEXT,
SAA.ANSWER_TEXT
HAVING
(SAA.ANSWER_TEXT LIKE '%LEVEL % -%')
The current output would be something similar to this:
9/5/2017 PATIENT ABC Answer1
9/6/2017 PATIENT ABC Answer2
9/7/2017 PATIENT ABC Answer3
9/6/2017 PATIENT XYZ Answer4
What I am expecting is:
9/7/2017 PATIENT ABC Answer3
9/6/2017 PATIENT XYZ Answer4
If your version of SQL Server supports it, using ROW_NUMBER() OVER() is an efficient and simple method for arriving at the "latest" (or "earliest") rows from a single table. However, as we know so little about your data model, it isn't easy to guess how to reduce the rows to just the "latest answer", which probably requires a more complex subquery. You can still use ROW_NUMBER() OVER() on that subquery, though. I suspect that the nature of questions and answers is such that the table aliases SAP, SAQ, SAA may all need to be involved in this subquery.
Note that instead of directly joining PTA this is now a subquery and the join condition to the outer query requires that RN=1 which is the row with the "latest" date.
SELECT
MAX(PTA.ASSESSMENT_DATE) AS Max_Date
, SAQ.QUESTION_TEXT
, SAA.ANSWER_TEXT
, dbo.PT_BASIC.PATIENT_CODE
, dbo.PT_BASIC.NAME_FULL
FROM dbo.PTC_ASSESSMENT_ANSWER AS PAA
INNER JOIN (
SELECT
*
, ROW_NUMBER() OVER (PARTITION BY PATIENT_ID
ORDER BY ASSESSMENT_DATE DESC) AS RN
FROM dbo.PTC_ASSESSMENT
WHERE ASSESSMENT_DATE BETWEEN '20170905' AND '20171012'
) AS PTA ON PTA.ASSESSMENT_ID = PAA.ASSESSMENT_ID
AND PTA.PATIENT_ID = PAA.PATIENT_ID
AND PTA.RN = 1
INNER JOIN dbo.SYS_ASSESSMENT_POINTER AS SAP ON SAP.POINTER_ID = PAA.POINTER_ID
INNER JOIN dbo.SYS_ASSESSMENT_QUESTION AS SAQ ON SAQ.QUESTION_ID = SAP.QUESTION_ID
INNER JOIN dbo.SYS_ASSESSMENT_ANSWER AS SAA ON SAA.ANSWER_ID = SAP.ANSWER_ID
INNER JOIN dbo.PT_BASIC ON PTA.PATIENT_ID = dbo.PT_BASIC.PATIENT_ID
WHERE SAA.ANSWER_TEXT LIKE '%LEVEL % -%'
GROUP BY
dbo.PT_BASIC.PATIENT_CODE
, dbo.PT_BASIC.NAME_FULL
, SAQ.QUESTION_TEXT
, SAA.ANSWER_TEXT
select distinct is not required on this query (or any similar query using GROUP BY)
yyyymmdd is the safest date literal in SQL Server, you don't need the converts using style 102
your having clause should be moved to a where clause as it does not evaluate any aggregated value
CROSS APPLY lets you use a correlated subquery to retrieve the top n records, ordered by date descending, for each patient assessment. (After review, maybe you just need it per patient?)
Perhaps just change:
INNER JOIN
dbo.PTC_ASSESSMENT AS PTA ON PTA.ASSESSMENT_ID = PAA.ASSESSMENT_ID
AND PTA.PATIENT_ID = PAA.PATIENT_ID
TO:
CROSS APPLY (SELECT TOP 1 *
FROM dbo.PTC_ASSESSMENT PTA2
WHERE PTA2.ASSESSMENT_ID = PAA.ASSESSMENT_ID
/*AND PTA2.PATIENT_ID = PAA.PATIENT_ID*/
ORDER BY PTA2.Assessment_date desc) PTA
GIVING YOU: (I left the /*AND PTA2.PATIENT_ID = PAA.PATIENT_ID*/ comment in place, but I think you can omit it; it's not needed.)
SELECT MAX(PTA.ASSESSMENT_DATE) AS Max_Date
, SAQ.QUESTION_TEXT
, SAA.ANSWER_TEXT
, dbo.PT_BASIC.PATIENT_CODE
, dbo.PT_BASIC.NAME_FULL
FROM dbo.PTC_ASSESSMENT_ANSWER AS PAA
CROSS APPLY (SELECT TOP 1 *
FROM dbo.PTC_ASSESSMENT PTA2
WHERE PTA2.ASSESSMENT_ID = PAA.ASSESSMENT_ID
/*AND PTA2.PATIENT_ID = PAA.PATIENT_ID*/ --I think you can omit this.
ORDER BY PTA2.Assessment_date desc) PTA
INNER JOIN dbo.SYS_ASSESSMENT_POINTER AS SAP
ON SAP.POINTER_ID = PAA.POINTER_ID
INNER JOIN dbo.SYS_ASSESSMENT_QUESTION AS SAQ
ON SAQ.QUESTION_ID = SAP.QUESTION_ID
INNER JOIN dbo.SYS_ASSESSMENT_ANSWER AS SAA
ON SAA.ANSWER_ID = SAP.ANSWER_ID
INNER JOIN dbo.PT_BASIC
ON PTA.PATIENT_ID = dbo.PT_BASIC.PATIENT_ID
WHERE (PTA.ASSESSMENT_DATE BETWEEN CONVERT(DATETIME, '2017-09-05 00:00:00', 102) AND CONVERT(DATETIME, '2017-10-12 00:00:00', 102))
GROUP BY dbo.PT_BASIC.PATIENT_CODE
, dbo.PT_BASIC.NAME_FULL
, SAQ.QUESTION_TEXT
, SAA.ANSWER_TEXT
HAVING (SAA.ANSWER_TEXT LIKE '%LEVEL % -%')
It appears you're not concerned about patients without assessments, as all your joins are inner. Otherwise we could use OUTER APPLY to be sure to keep all answers regardless of whether an assessment has been provided.
Alternatively you could use a row_number() logic ( Tab Alleman's link has this covered) and a cte; but if cross apply is available might as well use it here.
Please include order by PTA.ASSESSMENT_DATE DESC to see the latest records at the top.

SQL - Joining a list of dates with observations within a date range

I have a sample query below that uses GETDATE to pull the most recent estimates. The table has TWO date columns, effectiveDate and toDate. The problem? I want to pull a weekly value so I can have a time series of estimates. If I run the query now, I will end up with all the estimates as of today, but I also want to know what they were last week, the week before, etc.
Should I create a new table containing the dates that I want and then join them against the results of the query? This is where I am stuck. Thank you.
select GETDATE() as observeDate
, (select C.companyName from ciqCompany C where C.companyId = EP.companyId) as companyName
, (select EPT.periodTypeName from ciqEstimatePeriodType EPT where EPT.periodTypeId = EP.periodTypeId) as periodTypeName
, EP.fiscalYear
, EB.brokerName as brokerName
, EA.firstName+' '+EA.lastName as AnalystName
, EDND.tradingitemid
, (select DI.dataItemName from ciqdataitem DI where DI.dataitemid = EDND.dataitemid) as dataItemName
, (select EAS.accountingStandardDescription from dbo.ciqEstimateAccountingStd EAS where EAS.accountingStandardId = EDND.accountingStandardId) as AccountingStandard
, (select Cu.ISOCode from ciqCurrency Cu where Cu.currencyid = EDND.currencyid) as ISOCode
, (select EST.estimateScaleName from ciqEstimateScaleType EST where EST.estimateScaleId = EDND.estimateScaleId) as estimateScaleName
,EDND.dataItemValue,EDND.effectiveDate,EDND.isExcluded
from ciqEstimatePeriod EP
--- link estimate period table to detailed numeric data table
----------------------------------------------------------
join ciqEstimateDetailNumericData EDND
on EDND.estimatePeriodId = EP.estimatePeriodId
and GETDATE() between EDND.effectiveDate and EDND.toDate
----------------------------------------------------------
left outer join ciqEstimateBroker EB
on EB.estimateBrokerId = EDND.estimateBrokerId --- left outer join must be used if you receive any of the anonymous estimates packages
left outer join ciqEstimateAnalyst EA
on EA.estimateAnalystId = EDND.estimateAnalystId --- left outer join must be used if you receive any of the anonymous estimates packages
where EP.companyId = 112350 -- IBM
and EP.periodTypeId = 1 -- annual
and EDND.dataItemId = 21634 --- EPS Normalized (Detailed)
and EP.fiscalYear = 2010
order by 4,5,6,10
This query is complicated enough as it is - I would hate to add to it. I would probably turn it into a view or a stored proc and query from it as needed. Then you could also have, instead of just GETDATE(), a date range as input.
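As a rough illustration of replacing GETDATE() with a list of observation dates, here is a sketch run through Python's sqlite3. The observe_dates and estimates tables are heavily simplified, made-up stand-ins for your real tables, and the far-future toDate marking a still-current estimate is an assumption on my part.

```python
import sqlite3

# Hypothetical, simplified tables: one row per weekly observation date and
# one row per estimate with the date range in which it was in effect.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE observe_dates (observeDate TEXT PRIMARY KEY);
CREATE TABLE estimates (analyst TEXT, dataItemValue REAL,
                        effectiveDate TEXT, toDate TEXT);
INSERT INTO observe_dates VALUES ('2010-01-04'), ('2010-01-11'), ('2010-01-18');
INSERT INTO estimates VALUES
  ('A', 9.50, '2010-01-01', '2010-01-09'),
  ('A', 9.80, '2010-01-10', '2079-06-06'),
  ('B', 9.10, '2010-01-12', '2079-06-06');
""")

# Same predicate as the original query, with each observation date taking
# the place of GETDATE().
rows = conn.execute("""
SELECT od.observeDate, e.analyst, e.dataItemValue
FROM observe_dates od
JOIN estimates e
  ON od.observeDate BETWEEN e.effectiveDate AND e.toDate
ORDER BY od.observeDate, e.analyst
""").fetchall()

for row in rows:
    print(row)
```

Each observation date picks up whichever estimates were in effect on that day, which gives the weekly time series without changing the rest of the query.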

SQL to select parent that contains child specific value

I am actually creating a crystal reports v12 (2008) report but can't find the method, using Crystal, to extract the following. I thought if someone might answer in SQL language, I could piece it together.
2 Tables: hbmast, ddmast
SELECT hbmast.custno, hbmast.id, ddmast.name, ddmast.status
WHERE hbmast.custno = ddmast.custno
GROUP BY hbmast.id
pseudo code::show all hbmast values that have ddmast.status = '2'
Sample output:
J0001, 111222, PAUL JONES, 1
111222, PAUL JONES, 2
111222, PAUL JONES, 1
K0001, 555333, PETER KING, 3
555333, PETER KING, 1
I would like to have Paul show on the report with all child records but Peter should not be returned on the report since he has no child records with '2' for ddmast.status field.
Thanks for the help
I think you're looking for this:
select hb.custno, hb.id, dd.name, dd.status from hbmast hb
join ddmast dd on hb.custno = dd.custno
where hb.custno in (
select custno from ddmast
where status = '2'
)
Let me know if this returns your expected result.
The way to achieve this in Crystal would be to have your hb and dd tables then a second alias of the dd table.
So you would filter your dd alias table where status = 2 then join to your hb table and back to your dd table (not the alias). The SQL would end up looking like:
select hb.custno, hb.id, dd.name, dd.status from hbmast hb
inner join ddmast dd on hb.custno = dd.custno
inner join ddmast dd2 on hb.custno = dd2.custno
where dd2.status = '2'
Andomar makes a valid point about duplicate records appearing if there is more than 1 record per group with a status of 2. If that is the case you can either group by primary key and show row information at group footer level OR use a sql expression with a subquery in your selection formula instead of the double join method.
SQL Expression: (select count(*) from ddmast where custno = "hbmast.custno" and status = '2')
Then record selection expert: {%sqlexpression} > 0
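To make the duplicate-row concern concrete, here is a small sketch comparing the IN subquery with the double join, run through Python's sqlite3. The sample rows are invented, with Paul deliberately given two child records with status '2'.

```python
import sqlite3

# Hypothetical sample data based on the question's tables; Paul has two
# child rows with status '2', Peter has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hbmast (custno TEXT, id INTEGER);
CREATE TABLE ddmast (custno TEXT, name TEXT, status TEXT);
INSERT INTO hbmast VALUES ('J0001', 111222), ('K0001', 555333);
INSERT INTO ddmast VALUES
  ('J0001', 'PAUL JONES', '1'),
  ('J0001', 'PAUL JONES', '2'),
  ('J0001', 'PAUL JONES', '2'),
  ('K0001', 'PETER KING', '3'),
  ('K0001', 'PETER KING', '1');
""")

# IN subquery: the parent qualifies once, each child row is returned once.
in_rows = conn.execute("""
SELECT hb.custno, hb.id, dd.name, dd.status
FROM hbmast hb
JOIN ddmast dd ON hb.custno = dd.custno
WHERE hb.custno IN (SELECT custno FROM ddmast WHERE status = '2')
""").fetchall()

# Double join: every child row is repeated once per matching status '2' row.
join_rows = conn.execute("""
SELECT hb.custno, hb.id, dd.name, dd.status
FROM hbmast hb
INNER JOIN ddmast dd ON hb.custno = dd.custno
INNER JOIN ddmast dd2 ON hb.custno = dd2.custno
WHERE dd2.status = '2'
""").fetchall()

print(len(in_rows), len(join_rows))
```

Both queries exclude Peter, but the double join returns each of Paul's three child rows twice because two of his records have status '2'.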
And a different way to get the same...
SELECT hb.custno, hb.id, dd.name, dd.status
FROM hbmast hb
INNER join ddmast dd
on hb.custno = dd.custno
INNER JOIN ddmast DD2
on DD2.custNo = HB.custNo
AND DD2.Status='2'