Group several tables on a datetime including same table - sql

I am trying to collate several tables into a single row display for reporting purposes - all on a date time value. I do not have an issue when joining disparate tables on say a CTE of datetime values. When I encounter a table with a different FK I get too many records for this to work. Example
Reporting Structure
DateTime EngineName EngineValue PartName PartValue Part2Name Part2Value
20160118 00:00 Engine1 100 Part1 100 Part2 200
Engine Table
DateTime EngineName EngineValue
20160118 00:00 Engine1 100
Part Table
DateTime Name(fK) Value
20160118 00:00 Part1 100
20160118 00:00 Part2 200
At this point I have tried to create a CTE of datetimes and join the logs to the CTE on the datetime. I can't get this work and I know I'm not the first to create reports like this.

If we assume Part1 and Part2 are actual names and static in number then you can do something like this:
SELECT E.DateTime, E.EngineName, E.EngineValue, P1.PartName, P1.PartValue, P2.Part2Name, P2.Part2Value
FROM Engine E
INNER JOIN Part P1
on E.DateTime = P1.DateTime
and P1.Name = 'Part1'
INNER JOIN Part P2
on E.DateTime = P1.DateTime
and P2.Name = 'Part2'
If they are more dynamic in nature then Dynamic SQL will be needed.
If there's always 2 you could use an analytic such as row_number() over (partition by datetime order by name) as RN and join on row 1 and 2 instead of part names. i.e. instead of P2.Name ='part2' use P2.RN = 2 but this limits it to always be only 2 parts, we ignore all others.
So the real question is:
Are the # of parts per engine needing to be reported always 2? always the same "part name" with different values? or dynamic in nature?

Related

how to turn a wide table into a long table

I have a wide table that looks like this:
Case REFERENCE
OUTCOME_EMP_SITUATION
MONTH1_EMP_SITUATION
MONTH1_REASON
MONTH3_EMP_SITUATION
MONTH3_REASON
MONTH6_EMP_SITUATION
MONTH6_REASON
12345
Employed
Employed
Outcome at 1 month
Employed
Outcome at 3 month
Employed
Outcome at 6 month
this is survey results that people completed after they finished employment program. They complete the survey 4 times, once immediately after finishing the program, and then after 1/3/6 month. the problem is, the results for immediately after program completion are in one table (Outcome table) and the 1/3/6 month checkpoint results are in another table (Checkpointinfo table) I would like to combine those tables to create a long table so that instead of having "Outcome" in 5 different columns, I would have it in one column and it would look like this:
Case Reference
Outcome_emp_situation
Month_Reason
12345
Employed
NULL
12345
Employed
Outcome at 1 month
12345
Employed
Outcome at 3 month
12345
Employed
Outcome at 6 month
I was wondering if anyone could please help me out to turn this wide query into a long table query.
Here is the query for the wide table:
Select
ch.CASEREFERENCE, oc.OUTCOME_DATE, oc.OUTCOME_REFERENCE_ID, oc.OUTCOME_EMP_SITUATION, oc.OUTCOME_EMPLOYMENT_TYPE, oc.OUTCOME_NUM_JOBS, oc.OUTCOME_NAICS_DESC, oc.OUTCOME_JOB_NATURE,
oc.OUTCOME_WORK_HOURS, oc.OUTCOME_WAGE, oc.OUTCOME_STUDENT_STATUS, oc.OUTCOME_GOT_SERVICE, oc.OUTCOME_RIGHT_SERVICE, oc.OUTCOME_RECOMMEND_PROGRAM,
ck1.REASONCODE AS REASONCODE1,
CASE WHEN ck1.REASONCODE = 'OT1' THEN "Outcome at 1 month" END MONTH1_REASON,
ck1.MONTH_START_DATE AS MONTH1_START_DATE, ck1.MONTH_END_DATE AS MONTH1_END_DATE, ck1.MONTH_OUTCOME_EMP_SITUATION AS MONTH1_OUTCOME_EMP_SITUATION,
ck1.MONTH_EMPLOYMENT_TYPE AS MONTH1_EMPLOYMENT_TYPE, ck1.MONTH_NUM_JOBS AS ,MONTH1_NUM_JOBS, ck1.MONTH_NAICS_DESC AS MONTH1_NAICS_DESC, ck1.MONTH_JOB_NATURE AS MONTH1_JOB_NATURE,
ck1.MONTH_WORK_HOURS AS MONTH1_WORK_HOURS, ck1.MONTH_WAGE AS MONTH1_WAGE, ck1.MONTH_STUDENT_STATUS AS MONTH1_STUDENT_STATUS, ck1.MONTH_GOT_SERVICE AS MONTH1_GOT_SERVICE,
ck1.MONTH_RIGHT_SERVICE AS MONTH1_RIGHT_SERVICE, ck1.MONTH_RECOMMEND_PROGRAM AS MONTH1_RECOMMEND_PROGRAM, ck1.MONTH_RESUBMIT_MILESTONE AS MONTH1_RESUBMIT_MILESTONE,
ck1.MONTH_MILESTONE_ACHIEVED AS MONTH1_MILESTONE_ACHIEVED, ck1.MONTH_APPROVED_DATE AS MONTH1_APPROVED_DATE,
ck3.REASONCODE AS REASONCODE3,
CASE WHEN ck3.REASONCODE = 'OT3' THEN "Outcome at 3 month" END MONTH3_REASON,
ck3.MONTH_START_DATE AS MONTH3_START_DATE, ck3.MONTH_END_DATE AS MONTH3_END_DATE, ck3.MONTH_OUTCOME_EMP_SITUATION AS MONTH3_OUTCOME_EMP_SITUATION,
ck3.MONTH_EMPLOYMENT_TYPE AS MONTH3_EMPLOYMENT_TYPE, ck3.MONTH_NUM_JOBS AS ,MONTH3_NUM_JOBS, ck3.MONTH_NAICS_DESC AS MONTH3_NAICS_DESC, ck3.MONTH_JOB_NATURE AS MONTH3_JOB_NATURE,
ck3.MONTH_WORK_HOURS AS MONTH3_WORK_HOURS, ck3.MONTH_WAGE AS MONTH3_WAGE, ck3.MONTH_STUDENT_STATUS AS MONTH3_STUDENT_STATUS, ck3.MONTH_GOT_SERVICE AS MONTH3_GOT_SERVICE,
ck3.MONTH_RIGHT_SERVICE AS MONTH3_RIGHT_SERVICE, ck3.MONTH_RECOMMEND_PROGRAM AS MONTH3_RECOMMEND_PROGRAM, ck3.MONTH_RESUBMIT_MILESTONE AS MONTH3_RESUBMIT_MILESTONE,
ck3.MONTH_MILESTONE_ACHIEVED AS MONTH3_MILESTONE_ACHIEVED, ck3.MONTH_APPROVED_DATE AS MONTH3_APPROVED_DATE,
ck6.REASONCODE AS REASONCODE6,
CASE WHEN ck6.REASONCODE = 'OT6' THEN "Outcome at 6 month" END MONTH6_REASON,
ck6.MONTH_START_DATE AS MONTH6_START_DATE, ck6.MONTH_END_DATE AS MONTH6_END_DATE, ck6.MONTH_OUTCOME_EMP_SITUATION AS MONTH6_OUTCOME_EMP_SITUATION,
ck6.MONTH_EMPLOYMENT_TYPE AS MONTH6_EMPLOYMENT_TYPE, ck6.MONTH_NUM_JOBS AS ,MONTH6_NUM_JOBS, ck6.MONTH_NAICS_DESC AS MONTH6_NAICS_DESC, ck6.MONTH_JOB_NATURE AS MONTH6_JOB_NATURE,
ck6.MONTH_WORK_HOURS AS MONTH6_WORK_HOURS, ck6.MONTH_WAGE AS MONTH6_WAGE, ck6.MONTH_STUDENT_STATUS AS MONTH6_STUDENT_STATUS, ck6.MONTH_GOT_SERVICE AS MONTH6_GOT_SERVICE,
ck6.MONTH_RIGHT_SERVICE AS MONTH6_RIGHT_SERVICE, ck6.MONTH_RECOMMEND_PROGRAM AS MONTH6_RECOMMEND_PROGRAM, ck6.MONTH_RESUBMIT_MILESTONE AS MONTH6_RESUBMIT_MILESTONE,
ck6.MONTH_MILESTONE_ACHIEVED AS MONTH6_MILESTONE_ACHIEVED, ck6.MONTH_APPROVED_DATE AS MONTH6_APPROVED_DATE
FROM PROGRAM as pg
LEFT JOIN CASEINFO as ch ON pg.CASEID = ch.CASEID
LEFT JOIN OUTCOME as oc ON pg.CASEID = oc.CASEID
LEFT JOIN ( SELECT cp.CASEID, cp.REASONCODE, cp.MONTH_OUTCOME_EMP_SITUATION, cpi.* FROM CHECKPOINT cp LEFT JOIN CHECKPOINTINFO cpi ON cp.CASEREVIEWID = cpi.CASEREVIEWID WHERE cpi.REASONCODE = 'OT1')ck1 ON pg.CASEID = ck1.CASEID
LEFT JOIN ( SELECT cp.CASEID, cp.REASONCODE, cp.MONTH_OUTCOME_EMP_SITUATION, cpi.* FROM CHECKPOINT cp LEFT JOIN CHECKPOINTINFO cpi ON cp.CASEREVIEWID = cpi.CASEREVIEWID WHERE cpi.REASONCODE = 'OT3')ck3 ON pg.CASEID = ck3.CASEID
LEFT JOIN ( SELECT cp.CASEID, cp.REASONCODE, cp.MONTH_OUTCOME_EMP_SITUATION, cpi.* FROM CHECKPOINT cp LEFT JOIN CHECKPOINTINFO cpi ON cp.CASEREVIEWID = cpi.CASEREVIEWID WHERE cpi.REASONCODE = 'OT6')ck6 ON pg.CASEID = ck6.CASEID
If someone could please help me turn this wide table into a long table, it would be much appreciated.
thank you
You need to do unpivot for outcome and reason columns. But first you need an extra column for overall reason. This is the query:
with a as (
select 12345 as case_reference,
'Employed' as OUTCOME_EMP_SITUATION,
'Employed' as MONTH1_EMP_SITUATION,
'Outcome at 1 month' as MONTH1_REASON,
'Employed' as MONTH3_EMP_SITUATION,
'Outcome at 3 month' as MONTH3_REASON,
'Employed' as MONTH6_EMP_SITUATION,
'Outcome at 6 month' as MONTH6_REASON
from dual
)
select
case_reference,
outcome_emp_situation,
month_reason
from (
select a.*,
cast(null as varchar2(1000)) as reason
from a
) a
unpivot(
(Outcome_emp_situation, Month_Reason)
for mon in (
(OUTCOME_EMP_SITUATION, reason) as 0,
(MONTH1_EMP_SITUATION, MONTH1_REASON) as 1,
(MONTH3_EMP_SITUATION, MONTH3_REASON) as 3,
(MONTH6_EMP_SITUATION, MONTH6_REASON) as 6
)
)
order by mon asc
CASE_REFERENCE | OUTCOME_EMP_SITUATION | MONTH_REASON
-------------: | :-------------------- | :-----------------
12345 | Employed | null
12345 | Employed | Outcome at 1 month
12345 | Employed | Outcome at 3 month
12345 | Employed | Outcome at 6 month
db<>fiddle here
UPD: The explanation below.
The tuple just after unpivot keyword is the result column names, column after for keyword identifies column group which produced that values. Tuples inside in define the columns' groups: for each group that columns' values will be passed to the corresponding (by position) columns of the result tuple and new row will be generated with the value of for column defined after as keyword.
So if you need more columns to be transferred to each row, you need to add new columns to the result tuple (after unpivot) and to each column group inside in. If for some reason you have not enough columns to pass for some groups, you can wrap your source query with outer select and add dummy (or constantly valued) columns for that groups.
Note:
Datatypes of each tuples should be the same (or convertible according to default datatype precedence). I.e. each tuple's member on the same position should have the same type, members at different positions may have different types.
You can reuse the same column in multiple groups and positions.

How do I stop my query from pulling duplicates?

Yes, I know this seems simple:
SELECT DISTINCT(...)
Except, it apparently isn't
Here is my actual Query:
SELECT
DeclinationReasons.Reason,
EmployeeInformation.ID,
EmployeeInformation.Employee,
EmployeeInformation.Active,
CompletedTrainings.DecShotDate,
CompletedTrainings.DecShotLocation,
CompletedTrainings.DecReason,
CompletedTrainings.DecExplanation,
IIf([DecShotLocation]="MCS","Yes","No") AS YesMCS,
IIf([DecReason]=1,1,0) AS YesAllergy,
IIf([DecReason]=2,1,0) AS YesImmune,
IIf([DecReason]=3,1,0) AS YesAdverse,
IIf([DecReason]=4,1,0) AS YesMedical,
IIf([DecReason]=5,1,0) AS YesSpiritual,
IIf([DecReason]=6,1,0) AS YesOther,
IIf([DecReason]=7,1,0) AS YesAlready
FROM
EmployeeInformation
INNER JOIN (CompletedTrainings
LEFT JOIN DeclinationReasons ON CompletedTrainings.DecReason = DeclinationReasons.ReasonID)
ON EmployeeInformation.ID = CompletedTrainings.Employee
GROUP BY
DeclinationReasons.Reason,
EmployeeInformation.ID,
EmployeeInformation.Employee,
EmployeeInformation.Active,
CompletedTrainings.DecShotDate,
CompletedTrainings.DecShotLocation,
CompletedTrainings.DecReason,
CompletedTrainings.DecExplanation,
IIf([DecShotLocation]="MCS","Yes","No"),
IIf([DecReason]=1,1,0),
IIf([DecReason]=2,1,0),
IIf([DecReason]=3,1,0),
IIf([DecReason]=4,1,0),
IIf([DecReason]=5,1,0),
IIf([DecReason]=6,1,0),
IIf([DecReason]=7,1,0)
HAVING
((((EmployeeInformation.Active) Like -1)
AND ((CompletedTrainings.DecShotDate + 365 >= DATE())
OR (CompletedTrainings.DecShotDate IS NULL))));
This is Joining a few tables (obviously) in order to get a number of records. The problem is that if someone is duplicated on the table with a NULL in one of the date fields, and a date in another field, it pulls both the NULL and the DATE, or pulls multiple NULLS it might pull multiple dates but those are not present right at the moment.
I need the Nulls, they are actual data in this particular case, but if someone has a date and a NULL I need to pull only the newest record, I thought I could add MAX(RecordID) from the table, but that didn't change the results of the query either.
That code:
SELECT
DeclinationReasons.Reason,
EmployeeInformation.ID,
EmployeeInformation.Employee,
EmployeeInformation.Active,
MAX(CompletedTrainings.RecordID),
CompletedTrainings.DecShotDate
...
And it returned the same issue, Duplicated EmployeeInformation.ID with different DecShotDate values.
Currently it returns:
ID
Active
DecShotDate
etc. x a bunch
1
-1
date date
whatever goes
2
-1
in these
2
-1
date date
columns
These are being used in a report, that is to determine the total number of employees who fit the criteria of the report. The NULLs in DecShotDate are needed as they show people who did not refuse to get a flu vaccine in the current year, while the dates are people who did refuse.
Now I have come up with one simple solution, I could add a column to the CompletedTrainings Table that contains a date or other value, and add that to the HAVING statement. This might be the right solution as this is a yearly training questionnaire that employees have to fill out. But I am asking for advice before doing this.
Am I right in thinking I need to add a column to filter by so that older data isn't being pulled, or should I be able to do this by pulling recordID, and did I just bork that part of the query up?
Edited to add raw table views:
EmployeeInformation Table:
ID
Last
First
empID
Active
Termdate
DoH
Title
PT/FT/PD
PI
1
Doe
Jane
982
-1
date
Sr
PD
X
2
Roe
John
278
0
date
date
Jr
PD
X
3
Moe
Larry
1232
-1
date
Sr
FT
X
4
Zoe
Debbie
1424
-1
date
Sr
PT
X
DeclinationReasons Table:
ReasonID
Reason
1
Allergy
2
Already got it
3
Illness
CompletedTrainings Table:
RecordID
Employee
Training
...
DecShotdate
DecShotLocation
DecShotReason
DecExp
1
1
4
date
location
2
text
2
1
4
3
2
4
4
3
4
date
location
3
text
5
3
4
date
location
1
text
6
4
4
After some serious soul searching, I decided to use another column and filter by that.
In the end my query looks like this:
SELECT *
FROM (
(
SELECT RecordID, DecShotDate, DecShotLocation, DecReason, DecExplanation, Employee,
IIf([DecShotLocation]="MCS","Yes","No") AS YesMCS, IIf([DecReason]=1,1,0) AS YesAllergy,
IIf([DecReason]=2,1,0) AS YesImmune, IIf([DecReason]=3,1,0) AS YesAdverse,
IIf([DecReason]=4,1,0) AS YesMedical, IIf([DecReason]=5,1,0) AS YesSpiritual,
IIf([DecReason]=6,1,0) AS YesOther, IIf([DecReason]=7,1,0) AS YesAlready
FROM CompletedTrainings WHERE (CompletedDate > DATE() - 365 ) AND (Training = 69)) AS T1
LEFT JOIN
(
SELECT ID, Active FROM EmployeeInformation) AS T2 ON T1.Employee = T2.ID)
LEFT JOIN
(
SELECT Reason, ReasonID FROM DeclinationReasons) AS T3 ON T1.DecReason = T3.ReasonID;
This may not have been the best solution, but it did exactly what I needed. Which is to get the information by latest entry into the database.
Previously I had tried to use MAX(), DISTINCT(), etc. but always had a problem of multiple records being retrieved. In this case, I intentionally SELECT the most recent records first, then join them to the results of the next query, and so on. Until I have all the required data for my report.
I write this in hopes someone else finds it useful. Or even better if someone tells me why this is wrong, so as to improve my own skills.

SQL Case with calculation on 2 columns

I have a value table and I need to write a case statement that touches 2 columns: Below is the example
Type State Min Max Value
A TX 2 15 100
A TX 16 30 200
A TX 31+ 500
Let say I have another table that has the following
Type State Weight Value
A TX 14 ?
So when I join the table , I need a case statement that looks at weight from table 2 , type and state - compare it to the table 1 , know that the weight falls between 2 and 15 from row 1 and update Value in table 2 with 100
Is this doable ?
Thanks
It returns 0 if there aren't rows in this range of values.
select Type, State, Weight,
(select coalesce(Value, 0)
from table_b
where table_b.Type = table_a.Type
and table_b.State = table_a.State
and table_a.Value between table_b.Min and table_b.Max) as Value
from table_a
For an Alteryx solution: (1) run both tables into a Join tool, joining on Type and State; (2) Send the output to a Filter tool where you force Weight to be between Min and Max; (3) Send that output to a Select tool, where you grab only the specific columns you want; (since the Join will give you all columns from all tables). Done.
Caveats: the data running from Join to Filter could be large, since you are joining every Type/State combination in the Lookup table to the other table. Depending on the size of your datasets, that might be cumbersome. Alteryx is very fast though, and at least we're limiting on State and Type, so if your datasets aren't too large, this simple solution will work fine.
With larger data, try to do it as part of your original select, utilizing one of the other solutions given here for your SQL query.
Considering that Min and Max columns in first table are of Integer type
You need to use INNER JOIN on ranges
SELECT *
FROM another_table a
JOIN first_table b
ON a.type = b.type
AND a.State = b.State
AND a.Weight BETWEEN b.min AND b.max

Efficiently identify all FK items with n>3 dates within any 8 week period from a SQL table?

I have a ~400,000 row table containing the dates at which a collection of ~30,000 people had appointments. Each row has the patient ID number and an appointment date. I want to efficiently select people who had at least 4 appointments in an 8 week span. Ideally, I would also flag the appointments that were within this 8 week span as I did so. I am working in a server environment that does not allow CLR aggregate functions. Is this possible to do in SQL server? If so, how?
What I've thought about:
If I could write my own aggregate function to do this via GROUP BY that would obviously be best - but I can't seem to find any way to do it with the built in aggregate functions.
I can add a column to my original table giving a date 8 weeks out from any given appointment, but can't come up with any way that doesn't involve a for loop to then ask the question row by row whether there are at least 3 other appointments within that window.
Finally, I've even though that perhaps I could just do GROUP BY but somehow create 100 new columns (as there are up to that many appointments for some patients) to create a table that contains every appointment indexed by patient, but even as a SQL newbie I'm pretty sure that as soon as I get to the point of imagining adding 100 new columns I'm going down the wrong road....
For clarity of discussion, here is some notation:
MyTable:
ApptID PatientID ApptDate (in smalldatetime)
--------------------------------------------------
Apt1 Pt1 Datetime1
Apt2 Pt1 Datetime2
Apt3 Pt2 Datetime3
... ... ...
Desired output (one option):
PatientID 4aptsIn8weeks? (Boolean) InitialApptDateForWin
Pt1 1 Datetime1
Pt2 0 NULL
Pt3 1 Datetime3
...
Desired output (another option):
ApptID PatientID ApptDate InAn8wkWindow? InitialApptDateForWin
Apt1 Pt1 Datetime1 1 Datetime1
Apt2 Pt1 Datetime2 1 Datetime1
Apt3 Pt2 Datetime3 0 NULL
... ... ...
But really, any output format that will in the end let me select patients and appointments that meet this criterion would be dandy....
Thanks for any ideas!
EDIT: Here's a slightly decompressed outline of my implementation of the selected answer below, just in case the details are helpful for anyone else (being new to SQL, it took me a couple stabs to get it working):
WITH MyTableAlias AS (
SELECT * FROM MyTable
)
SELECT MyTableAlias.PatientID, MyTable.Apptdate AS V1,
MyTableAlias.Apptdate AS V2
INTO temp1
FROM MyTable INNER JOIN MyTableAlias
ON (
MyTable.PatientID = MyTableAlia.PatientID
AND (DATEDIFF(Wk,MyTable.Apptdate,MyTableAlias.Apptdate) <=8 )
);
-- Since this gives for any given two visit dates 3 hits
-- (V1-V1, V1-V2, V2-V2), delete the ones where the second visit is being
-- selected as V1:
DELETE FROM temp1
WHERE V2<V1;
-- So far we have just selected pairs of visits within an 8 week
-- span of each other, including an entry for each visit being
-- within 8 weeks of itself, but for the rest only including the item
-- where the second visit is after the first. Now we want to look
-- for examples of first visits where there are at least 4 hits:
SELECT PatientID, V1, MAX(V2) AS lastvisitinspan, DATEDIFF(Wk,V1,MAX(V2))
AS nWeeksInSpan, COUNT(*) AS nWeeksInSpan
INTO MyOutputTable
FROM temp
GROUP BY PatientID, V1
HAVING COUNT(*)>3;
-- From here on it's just a matter of how I want to handle patients with two
-- separate V1 examples meeting criteria...
Rough outline of the query:
INNER JOIN the table ("table") with itself ("alias"), the ON clause would be:
table.patientid = alias.patientid
table.appointment_date < alias.appointment_date
datediff(table.appointment_date, alias.appointment_date) <= 8 week
Then GROUP BY table.patientid, table.appointment_date
Output table.patientid, table.appointment_date, MAX(alias.appointment_date), COUNT(*)
Add a HAVING COUNT(*) > n clause
There are some issues though:
With 400,000 rows the JOIN could produce a very large result set
It will count some date ranges twice. E.g. if there were 4 visits in 9 week period then it will return two rows (#1, #2, #3 and #2, #3, #4).

Sorting by date across two separate columns in a Full Outer Join

I have two columns of data I am lining up using a Full Outer Join but it includes two separate date columns which make it challenging to sort by.
Table 1 has sales rank data for a product.
Table 2 has actual sales data for the same product.
Each table may have entries for dates on which the other does not.
So envision after the full join, we end up with something like this simplified example:
ProdID L.Date P.Date Rank Units
101 null 2011-10-01 null 740
101 2011-10-02 2011-10-02 23 652
101 2011-10-03 null 32 null
Here is the query I am using to pull this data:
select L.ListID, L.ASIN, L.date, L.ranking, P.ASIN, P.POSdate, P.units from ListItem L
full outer join POSdata P on
L.ASIN = P.ASIN and
L.date = P.POSdate and
(L.ListID = 1 OR L.ASIN is null)
where (L.ASIN = 'xxxxxxxxxx' and L.ListID = 1) or
(P.ASIN = 'xxxxxxxxxx' and L.BookID is null)
order by POSdate, date
It's a bit more complex because products may appear on multiple lists so I have to account for that as well, but it returns the data I need. I am open to suggestions on improving it of course should someone have one.
The problem is, how can I sort this properly when both date columns are likely to have at least some NULLs in them. The way I am Ordering By now will not work when both columns have at one NULL.
Thanks.
ORDER BY ISNULL(P.POSdate,L.date) should do what you need I think?