Can you help me figure out how to pivot this table:
╔═══════════╦═════════════╦══════╦════════╦════════╗
║ Big Group ║ Small Group ║ Kids ║ Adults ║ Elders ║
╠═══════════╬═════════════╬══════╬════════╬════════╣
║ 1 ║ 1 ║ 10 ║ 20 ║ 5 ║
║ 1 ║ 2 ║ 15 ║ 10 ║ 10 ║
║ 2 ║ 1 ║ 20 ║ 0 ║ 15 ║
╚═══════════╩═════════════╩══════╩════════╩════════╝
Into something like this?
╔═══════════╦═════════════╦══════╦════════╦════════╦═════════════╦══════╦════════╦════════╗
║ Big Group ║ Small Group ║ Kids ║ Adults ║ Elders ║ Small Group ║ Kids ║ Adults ║ Elders ║
╠═══════════╬═════════════╬══════╬════════╬════════╬═════════════╬══════╬════════╬════════╣
║ 1 ║ 1 ║ 10 ║ 20 ║ 5 ║ 2 ║ 15 ║ 10 ║ 10 ║
║ 2 ║ 1 ║ 20 ║ 0 ║ 15 ║ ║ ║ ║ ║
╚═══════════╩═════════════╩══════╩════════╩════════╩═════════════╩══════╩════════╩════════╝
The number of small groups per big group is variable, and that's the part I can't figure out how to handle.
Can anyone help me?
Thanks in advance.
There is a way, but the overhead of using PIVOT is that you have to provide the list of all values to be pivoted.
Since each small group also needs to be pivoted, we create a virtual column combining big group and small group and use it in the pivot clause, as you can see below:
with table1 as
(select 1 bg, 1 sg, 10 kids, 20 adult from dual
 union all
 select 1, 2, 15, 10 from dual
 union all
 select 2, 1, 20, 0 from dual
)
select *
from
(
  select t1.*, t1.bg || '_' || t1.sg piv
  from table1 t1
)
pivot
(
  max(sg) sg, max(kids) kids, max(adult) adult
  for piv in ('1_1' as bg1_sg1
             ,'1_2' as bg1_sg2
             ,'2_1' as bg2_sg1)
)
order by bg
Demo
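If you want each big group's small groups to line up in the same output columns, as in the desired table above, a per-group sequence number could serve as the pivot key instead of the concatenated value. A sketch under that assumption (extend the IN list if a big group can have more than two small groups):
select *
from
(
  select t1.*,
         row_number() over (partition by bg order by sg) seq
  from table1 t1
)
pivot
(
  max(sg) sg, max(kids) kids, max(adult) adult
  for seq in (1 as g1, 2 as g2)
)
order by bg
With the sample data this yields one row per big group, with the first small group in the g1_* columns and the second (if any) in the g2_* columns.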
Context
I'm using Microsoft SQL Server 2016.
There is a database table "Raw_data" that contains the status of a machine together with its starting time. There are several machines, and each one writes its status to the database multiple times per minute.
To reduce the data volume, I'm trying to aggregate the data into one-minute chunks and save those for further analysis. Due to a capacity constraint, I want to execute this transition logic every few minutes (e.g. as a scheduled SQL Server Agent job), delete the raw data, and keep only the aggregated data.
To simplify the example, let's assume "Raw_data" looks something like this:
╔════╦════════════╦════════╦═════════════════════╗
║ id ║ fk_machine ║ status ║ created_at ║
╠════╬════════════╬════════╬═════════════════════╣
║ 1 ║ 2222 ║ 0 ║ 2020-08-19 22:15:00 ║
║ 2 ║ 2222 ║ 3 ║ 2020-08-19 22:15:30 ║
║ 3 ║ 2222 ║ 5 ║ 2020-08-19 23:07:00 ║
║ 4 ║ 2222 ║ 1 ║ 2020-08-20 00:20:00 ║
║ 5 ║ 2222 ║ 0 ║ 2020-08-20 00:45:00 ║
║ 6 ║ 2222 ║ 5 ║ 2020-08-20 02:20:00 ║
╚════╩════════════╩════════╩═════════════════════╝
Also, there are database tables "Dim_date" and "Dim_time" that look something like this:
╔══════════╦══════════════╗
║ datekey ║ date_iso8601 ║
╠══════════╬══════════════╣
║ 20200101 ║ 2020-01-01 ║
║ 20200102 ║ 2020-01-02 ║
║ ... ║ ... ║
║ 20351231 ║ 2035-12-31 ║
╚══════════╩══════════════╝
╔═════════╦══════════╦═════════════════╗
║ timekey ║ time_iso ║ min_lower_bound ║
╠═════════╬══════════╬═════════════════╣
║ 1 ║ 00:00:01 ║ 00:00:00 ║
║ 2 ║ 00:00:02 ║ 00:00:00 ║
║ ... ║ ... ║ ... ║
║ 80345 ║ 08:03:45 ║ 08:03:00 ║
║ ... ║ ... ║ ... ║
║ 134504 ║ 13:45:04 ║ 13:45:00 ║
║ 134505 ║ 13:45:05 ║ 13:45:00 ║
║ ... ║ ... ║ ... ║
║ 235959 ║ 23:59:59 ║ 23:59:00 ║
╚═════════╩══════════╩═════════════════╝
The result should look like this:
╔══════════════╦═════════════════╦════════════╦════════╦═══════════════╗
║ date_iso8601 ║ min_lower_bound ║ fk_machine ║ status ║ total_seconds ║
╠══════════════╬═════════════════╬════════════╬════════╬═══════════════╣
║ 2020-08-19 ║ 22:15:00 ║ 2222 ║ 0 ║ 30 ║
║ 2020-08-19 ║ 22:15:00 ║ 2222 ║ 3 ║ 30 ║
║ 2020-08-19 ║ 22:16:00 ║ 2222 ║ 3 ║ 60 ║
║ 2020-08-19 ║ 22:17:00 ║ 2222 ║ 3 ║ 60 ║
║ ... ║ ... ║ ... ║ ... ║ ... ║
║ 2020-08-19 ║ 23:06:00 ║ 2222 ║ 3 ║ 60 ║
║ 2020-08-19 ║ 23:07:00 ║ 2222 ║ 5 ║ 60 ║
║ 2020-08-19 ║ 23:08:00 ║ 2222 ║ 5 ║ 60 ║
║ ... ║ ... ║ ... ║ ... ║ ... ║
║ 2020-08-20 ║ 00:19:00 ║ 2222 ║ 5 ║ 60 ║
║ 2020-08-20 ║ 00:20:00 ║ 2222 ║ 1 ║ 60 ║
║ 2020-08-20 ║ 00:21:00 ║ 2222 ║ 1 ║ 60 ║
║ ... ║ ... ║ ... ║ ... ║ ... ║
║ 2020-08-20 ║ 00:44:00 ║ 2222 ║ 1 ║ 60 ║
║ 2020-08-20 ║ 00:45:00 ║ 2222 ║ 0 ║ 60 ║
╚══════════════╩═════════════════╩════════════╩════════╩═══════════════╝
Attempt
To calculate the duration of each status per minute, I used a CTE and LEAD to fetch the starting date and time of the next status from the table, then joined with the dimension tables and aggregated the result.
WITH CTE_MACHINE_STATES (START_DATEKEY,
                         START_TIMEKEY,
                         FK_MACHINE,
                         STATUS,
                         END_DATEKEY,
                         END_TIMEKEY)
AS (SELECT CAST(CONVERT(CHAR(8), CREATED_AT, 112) AS INT), -- ISO: yyyymmdd
           CONVERT(INT, REPLACE(CONVERT(CHAR(8), CREATED_AT, 108), ':', '')), -- hhmmss
           FK_MACHINE,
           STATUS,
           CAST(CONVERT(CHAR(8), LEAD(CREATED_AT, 1) OVER(PARTITION BY FK_MACHINE
                                                          ORDER BY CREATED_AT), 112) AS INT),
           CONVERT(INT, REPLACE(CONVERT(CHAR(8), LEAD(CREATED_AT, 1) OVER(PARTITION BY FK_MACHINE
                                                                          ORDER BY CREATED_AT), 108), ':', ''))
    FROM RAW_DATA)
SELECT DATE_ISO8601,
       MIN_LOWER_BOUND,
       FK_MACHINE,
       STATUS,
       SUM(1) AS TOTAL_SECONDS -- duration: one row per elapsed second
FROM CTE_MACHINE_STATES
CROSS JOIN DIM_DATE
CROSS JOIN DIM_TIME
WHERE TIMEKEY >= START_TIMEKEY AND
      TIMEKEY < END_TIMEKEY AND
      END_TIMEKEY IS NOT NULL AND -- last entry per machine and status
      DATEKEY BETWEEN START_DATEKEY AND END_DATEKEY
GROUP BY FK_MACHINE,
         STATUS,
         DATE_ISO8601,
         MIN_LOWER_BOUND
ORDER BY DATE_ISO8601,
         MIN_LOWER_BOUND;
The Problem
If a status lasts past midnight, it isn't aggregated correctly. For example, the status at id = 3 in "Raw_data" starts at 23:07 and ends at 00:20 the next day. Here timekey is greater than end_timekey, so the status gets excluded from the result by the filter TIMEKEY < END_TIMEKEY. I haven't come up with a way to change the join condition so that such long-lasting states are included while still getting the expected result.
PS: As I already wrote, status updates normally arrive every few seconds, so the problem only occurs in edge cases, e.g. when a machine gets turned off.
Solution
Unfortunately, I did not receive an answer on how to get the expected result using the date and time dimension tables. But dnoeth's approach using a recursive CTE works well, so I went with it:
WITH cte_outer AS (
SELECT fk_machine,
status,
created_at,
DATEADD(minute, DATEDIFF(minute, '2000', created_at), '2000') AS min_lower_bound, --truncates seconds from start time
LEAD(created_at) OVER(PARTITION BY fk_machine ORDER BY created_at) AS end_time
FROM raw_data
),
cte_recursive AS (
SELECT fk_machine,
status,
min_lower_bound,
end_time,
CASE
WHEN end_time > DATEADD(minute, 1, min_lower_bound)
THEN DATEDIFF(s, created_at, DATEADD(minute, 1, min_lower_bound))
ELSE DATEDIFF(s, created_at, end_time)
END AS total_seconds
FROM cte_outer
UNION ALL
SELECT fk_machine,
status,
DATEADD(minute, 1, min_lower_bound), -- next time segment (minute)
end_time,
CASE
WHEN end_time >= DATEADD(minute, 2, min_lower_bound)
THEN 60
ELSE DATEDIFF(s, DATEADD(minute, 1, min_lower_bound), end_time)
END
FROM cte_recursive
WHERE end_time > DATEADD(minute, 1, min_lower_bound)
)
SELECT min_lower_bound,
fk_machine,
status,
total_seconds
FROM cte_recursive
ORDER BY fk_machine,
min_lower_bound
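One caveat: SQL Server limits recursive CTEs to 100 recursion levels by default, so a single status lasting longer than 100 minutes would make this query fail with an error. Appending a query hint at the end of the statement lifts the cap:
ORDER BY fk_machine,
         min_lower_bound
OPTION (MAXRECURSION 0); -- 0 removes the recursion limit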
This is a use case for a recursive CTE, advancing the time segment by one minute per recursion:
with cte as
(
  select fk_machine
        ,status
        ,start_minute
        ,end_time
        ,case
            when end_time > dateadd(minute, 1, start_minute)
            then datediff(s, created_at, dateadd(minute, 1, start_minute))
            else datediff(s, created_at, end_time)
         end as seconds
  from
  (
    select fk_machine
          ,status
          ,created_at
          ,dateadd(minute, datediff(minute, 0, created_at), 0) as start_minute
          ,lead(created_at)
             over (partition by fk_machine
                   order by created_at) as end_time
    from tab
  ) as dt
  union all
  select fk_machine
        ,status
        ,dateadd(minute, 1, start_minute)
        ,end_time
        ,case
            when end_time >= dateadd(minute, 2, start_minute)
            then 60
            else datediff(s, dateadd(minute, 1, start_minute), end_time)
         end
  from cte
  where end_time > dateadd(minute, 1, start_minute)
)
select * from cte
order by 1, 3, 4;
See fiddle
For something like this, concatenating the keys into a single datetime isn't as costly as it might seem. Then you can call DATEDIFF() to check for positive, negative, or absolute values in the comparison. I've run something similar translating instantaneous data into minute aggregates across multiple decades, and DATEDIFF really makes the difference. However, this would do much better if you simply pulled the raw data and performed the calculations in a language with a good datetime library. SQL is always the answer until it isn't.
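For illustration, rebuilding a single datetime from the integer keys might look like this (a sketch, assuming datekey is yyyymmdd and timekey is hhmmss as in the dimension tables above; the source table name is hypothetical):
SELECT DATEADD(second,
               (timekey / 10000) * 3600 + (timekey / 100 % 100) * 60 + timekey % 100, -- hhmmss -> seconds since midnight
               CONVERT(datetime, CONVERT(char(8), datekey), 112)) AS event_datetime   -- yyyymmdd -> date
FROM some_keyed_table;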
What’s likely causing one of the problems here is the following statement:
WHERE TIMEKEY >= START_TIMEKEY AND
TIMEKEY < END_TIMEKEY AND
END_TIMEKEY IS NOT NULL AND
DATEKEY BETWEEN START_DATEKEY AND END_DATEKEY
If the date and time aren’t separated, you can say:
WHERE DateTimeKey >= START_DateTimeKey AND
      DateTimeKey < END_DateTimeKey AND
      END_DateTimeKey IS NOT NULL
If you are trying to aggregate by a time value, it would help to eliminate the timekey table, which may be another source of problems. It may be a good idea to replace it with a recursion and a period duration. You will also need to account for these conditions:
The end time of the event must always be after the start of the aggregate period:
DateDiff(second, Period_Start_Time, Event_End) > 0
The start time of the event must always be before the end of the aggregate period:
DateDiff(second, Period_Start_Time, Event_Start) <= @Period_Duration
There are several ways to distribute the event data across the periods, but datediff helps with linear distribution as well.
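As a concrete illustration of that clamping (a sketch with hypothetical events/periods tables; SQL Server 2016 lacks LEAST/GREATEST, hence the CASE expressions):
SELECT p.period_start,
       e.fk_machine,
       e.status,
       DATEDIFF(second,
                CASE WHEN e.event_start > p.period_start THEN e.event_start ELSE p.period_start END, -- later start
                CASE WHEN e.event_end   < p.period_end   THEN e.event_end   ELSE p.period_end   END) -- earlier end
           AS total_seconds
FROM events e
JOIN periods p
  ON e.event_start < p.period_end   -- keep only event/period pairs
 AND e.event_end   > p.period_start -- that actually overlap
Since the comparison is done on full datetimes, an event crossing midnight needs no special handling.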
There might be a better way to accomplish this, but here is what I have:
Environment - Plex ERP - SQL Query Editor
Back end - SQL Server 2012
Summary
Parts have a "unit" worth based on manufacturing complexity
Some days we ship parts. Other days we don't.
The part units are summed for each day they are scheduled to ship ('Rel_Units_Calc')
The plant gets credit for 5 units a day (when open) ('Unit_multiplier')
This daily credit is summed for each day ('Unit_Capacity')
In order to prevent capacity banked in a slow month from overloading the plant later, I need to stop the plant from getting the 5-unit credit once SUM(Unit_Capacity) would exceed SUM(Rel_Units_Calc).
A report will be created that uses a case statement to evaluate: if Rel_Units_Calc > Unit_Capacity then show red, else green.
Detailed Scope
I'm trying to create a sales report that will prevent the sales group from overloading (exceeding the capacity of) the plant. To simplify, let's say we have three parts (Parts A, B, and C). Part A is simple and worth 1 "unit". Part B is a little more complex and worth 2 "units". Part C is the most complex and worth 5 "units". The plant can process 5 units per day that it is open.
The report will show red when a day has been overloaded and green when it has not. Any days in red will need to have their sales orders moved out.
My approach was to take the units * order quantity to give me the 'Release_Units'. Then I am doing a sum(Release_Units) to show a tally for each day in a field called 'Release_Units_Calc'.
I have another field called 'Unit_Multiplier' that gives the 5 unit per day credit on eligible days (excludes weekends and holidays). Then I am doing a sum(Unit_Multiplier) to show a tally for each day in a field called 'Unit_Capacity'.
The colors red and green were going to be determined by a case statement comparing the two columns Release_Units_Calc and Unit_Capacity: when Unit_Capacity >= Release_Units_Calc then green, else red.
This works OK until you look at December, when we have a slowdown for these parts and start banking Unit_Capacity. The Unit_Capacity field continues to accrue the 5 units per day even after it has surpassed Release_Units_Calc. These parts are not produced in December, so 20 business days * 5 units per day gives us 100 units on Jan 1, which is not good. Essentially, this would let the sales group overwhelm the plant in January, as they would have 100 banked units to draw from.
I would like Unit_Capacity, which again is SUM(Unit_Multiplier), to never exceed Release_Units_Calc, which comes from SUM(Release_Units).
SQL below.
This temp table marks the days that should be included for the capacity:
SELECT DISTINCT
       FDPO.FULL_DATE,
       -- Case statement to create an include flag: exclude weekends unless a shipment is going out
       (CASE WHEN DATENAME(dw, FDPO.FULL_DATE) NOT IN ('Saturday', 'Sunday') THEN 1
             WHEN DATENAME(dw, FDPO.FULL_DATE) IN ('Saturday', 'Sunday')
                  AND FDPO.DUE_DATE IS NOT NULL THEN 1
             ELSE 0
        END) AS 'Include'
INTO #Capacity_Temp1
FROM #FDPO AS FDPO
This temp table uses the include flag to remove the dates that should not accrue capacity, and adds a capacity column:
SELECT
CT1.FULL_DATE,
#Unit_Multiplier AS 'Unit_multiplier'
INTO #Capacity_Temp2
FROM #Capacity_Temp1 AS ct1
WHERE ct1.INCLUDE= 1
The temp table below accumulates the unit multiplier as a running total across the days:
SELECT
DISTINCT CT2.FULL_DATE,
CT2.Unit_multiplier,
SUM(CT2.Unit_multiplier) OVER (Order By CT2.FULL_DATE) AS 'Unit_Capacity'
INTO #Unit_Capacity
FROM #Capacity_Temp2 AS CT2
The final display query:
SELECT RUC.FULL_DATE,
       RUC.Release_Units,
       RUC.Release_Units_Calc, -- running tally of the release units
       ISNULL(UC.Unit_multiplier, 0) AS 'Unit_multiplier', -- credit units given per day except when closed
       UC.Unit_Capacity -- running tally of the unit multiplier
FROM #RUC AS RUC
LEFT JOIN #Unit_Capacity AS UC
       ON UC.FULL_DATE = RUC.FULL_DATE
The output at present:
╔══════╦═══════════════╦════════════════╦═════════════════╦═══════════════╗
║ DATE ║ Release_Units ║ Rel_Units_Calc ║ Unit_multiplier ║ Unit_Capacity ║
╠══════╬═══════════════╬════════════════╬═════════════════╬═══════════════╣
║ 8/3 ║ 15 ║ 15 ║ 5 ║ 5 ║
║ 8/4 ║ NULL ║ 15 ║ 5 ║ 10 ║
║ 8/5 ║ 20 ║ 50 ║ 5 ║ 15 ║
║ 8/5 ║ 15 ║ 50 ║ 5 ║ 15 ║
║ 8/6 ║ NULL ║ 50 ║ 0 ║ NULL ║
║ 8/7 ║ NULL ║ 50 ║ 5 ║ 20 ║
║ 8/8 ║ NULL ║ 50 ║ 5 ║ 25 ║
║ 8/9 ║ NULL ║ 50 ║ 5 ║ 30 ║
║ 8/10 ║ NULL ║ 50 ║ 5 ║ 35 ║
║ 8/11 ║ NULL ║ 50 ║ 5 ║ 40 ║
║ 8/12 ║ 15 ║ 65 ║ 5 ║ 45 ║
║ 8/13 ║ NULL ║ 65 ║ 0 ║ NULL ║
║ 8/14 ║ NULL ║ 65 ║ 5 ║ 50 ║
║ 8/15 ║ NULL ║ 65 ║ 5 ║ 55 ║
║ 8/16 ║ 10 ║ 75 ║ 5 ║ 60 ║
║ 8/17 ║ NULL ║ 75 ║ 5 ║ 65 ║
║ 8/18 ║ NULL ║ 75 ║ 5 ║ 70 ║
║ 8/19 ║ NULL ║ 75 ║ 0 ║ NULL ║
║ 8/20 ║ NULL ║ 75 ║ 0 ║ NULL ║
║ 8/21 ║ NULL ║ 75 ║ 5 ║ 75 ║
║ 8/22 ║ NULL ║ 75 ║ 5 ║ 80 ║
║ 8/23 ║ NULL ║ 75 ║ 5 ║ 85 ║
║ 8/24 ║ NULL ║ 75 ║ 5 ║ 90 ║
║ 8/25 ║ NULL ║ 75 ║ 5 ║ 95 ║
║ 8/26 ║ 10 ║ 95 ║ 5 ║ 100 ║
║ 8/27 ║ 10 ║ 95 ║ 5 ║ 105 ║
╚══════╩═══════════════╩════════════════╩═════════════════╩═══════════════╝
The problem occurs on 8/22, where we start to exceed the Rel_Units_Calc field. This allows an order to be placed on 8/27 that will not trigger red, because Unit_Capacity will be greater than Rel_Units_Calc.
Sorry for the long post. I'm open to any suggestions if there is a better way to accomplish this.
Thanks in Advance,
Mike
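One way to express the capping rule described above (the running capacity never outruns the running released units) is a row-by-row recursive CTE, since a plain windowed SUM cannot "forget" banked days. A sketch only, assuming a hypothetical #Daily temp table with one row per date carrying Unit_multiplier and Rel_Units_Calc; it runs on SQL Server 2012:
WITH daily AS (
    SELECT FULL_DATE, Unit_multiplier, Rel_Units_Calc,
           ROW_NUMBER() OVER (ORDER BY FULL_DATE) AS rn
    FROM #Daily
),
capped AS (
    -- first day: the day's credit, capped at the released units
    SELECT rn, FULL_DATE, Rel_Units_Calc,
           CASE WHEN Unit_multiplier > Rel_Units_Calc THEN Rel_Units_Calc
                ELSE Unit_multiplier END AS Unit_Capacity
    FROM daily
    WHERE rn = 1
    UNION ALL
    -- each later day: previous capped total plus today's credit, capped again
    SELECT d.rn, d.FULL_DATE, d.Rel_Units_Calc,
           CASE WHEN c.Unit_Capacity + d.Unit_multiplier > d.Rel_Units_Calc
                THEN d.Rel_Units_Calc
                ELSE c.Unit_Capacity + d.Unit_multiplier END
    FROM capped AS c
    JOIN daily AS d ON d.rn = c.rn + 1
)
SELECT FULL_DATE, Rel_Units_Calc, Unit_Capacity
FROM capped
ORDER BY FULL_DATE
OPTION (MAXRECURSION 0); -- one recursion level per day, so lift the default cap of 100
With the sample output above, Unit_Capacity would hold at 75 from 8/22 until the new releases on 8/26, leaving 8/27 at 85 against 95 released units, which triggers red as intended.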
I have a table that looks like this:
╔═════╦═══════════╦══════╦═══════╗
║ ID ║ Attribute ║ Year ║ Month ║
╠═════╬═══════════╬══════╬═══════╣
║ 1 ║ 15.2 ║ 2014 ║ 11 ║
║ 1 ║ 13.1 ║ 2014 ║ 12 ║
║ 1 ║ 5.6 ║ 2015 ║ 1 ║
║ 2 ║ 7.9 ║ 2014 ║ 11 ║
║ 2 ║ 12.3 ║ 2014 ║ 12 ║
║ 2 ║ 45.6 ║ 2015 ║ 1 ║
║ 3 ║ 23.2 ║ 2014 ║ 11 ║
║ 3 ║ 45.7 ║ 2014 ║ 12 ║
║ ... ║ ... ║ ... ║ ... ║
╚═════╩═══════════╩══════╩═══════╝
What I would like to do is average the "Attribute" for each ID over the last year, starting from the current month and year. For example, I might want to find the average for ID = 2 from June 2015 (6/2015) back to June 2014 (6/2014). I am trying to implement this using only a query (no VBA).
I have already managed to average the current year's "Attribute", but that only covers the months elapsed this year, not the previous one. The real problem is that the year and month are separated into two fields; if they were a single date, this would be trivial.
I have also been able to get the data for the current and previous years with this:
SELECT Table.ID, Table.Year, Table.Month, Table.Attribute
FROM Table
WHERE
(((Table.ID)="Some ID Number")
AND ((Table.Year)=Year(Date())
Or (Table.Year)=Year(Date())-1));
But again, I am stuck with the months and values for each. What is the best course of action? Is there a way to combine the Year and Month fields in another query and do something with that? (Just throwing out ideas; I'm pretty lost.)
Maybe something like this will work:
SELECT id, AVG(Table.Attribute) AS AvgOfAttribute
FROM Table
WHERE ([Year] * 100 + [Month]) BETWEEN 201406 AND 201506
GROUP BY id;
Group only by id here; grouping by the year-month expression as well would produce one average per month rather than one per ID.
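To make the window track the current date instead of hard-coded bounds, the same numeric trick can be computed from Date() (a sketch in Access SQL; the brackets keep the field names from clashing with the built-in Year and Month functions):
SELECT id, AVG(Table.Attribute) AS AvgOfAttribute
FROM Table
WHERE ([Year] * 100 + [Month])
      BETWEEN (Year(Date()) - 1) * 100 + Month(Date())
          AND Year(Date()) * 100 + Month(Date())
GROUP BY id;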
Another approach would be to re-create a strongly typed date using DateSerial in a derived table, which you can then use to GROUP BY ID and apply the Average aggregate:
SELECT x.ID, Avg(x.Attribute) AS AvgOfAttribute
FROM (
    SELECT MyTable.ID, MyTable.Attribute, DateSerial([Year], [Month], 1) AS TheDate
    FROM MyTable
) AS x
WHERE x.TheDate >= #2014-06-01# AND x.TheDate < #2015-06-01#
GROUP BY x.ID;
Note that Access date literals use # delimiters rather than quotes.
Obviously, if you filter by a single ID then there is no need to apply the GROUP BY.
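The same derived table can also be filtered relative to today, so the trailing twelve months move with the current date (a sketch; DateSerial rolls Month(Date()) + 1 over into the next year by itself):
SELECT x.ID, Avg(x.Attribute) AS AvgOfAttribute
FROM (
    SELECT MyTable.ID, MyTable.Attribute, DateSerial([Year], [Month], 1) AS TheDate
    FROM MyTable
) AS x
WHERE x.TheDate >= DateSerial(Year(Date()) - 1, Month(Date()), 1)
  AND x.TheDate <  DateSerial(Year(Date()), Month(Date()) + 1, 1)
GROUP BY x.ID;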
I'm not sure the title paints quite the right picture, but I'll attempt to explain. I have a table with start and end dates and team member IDs (sort of like projects). I need to determine when they overlap, count the number of overlaps, and determine the order of overlap (sorted by the start date). My dummy data should clarify; it's the last of the three that I really want. Here is my current table:
╔═════════════╦════════════╦════════════╗
║ Team Member ║ Start Date ║ End Date ║
╠═════════════╬════════════╬════════════╣
║ 1 ║ 01/01/2015 ║ 04/01/2015 ║
║ 1 ║ 04/01/2015 ║ 06/01/2015 ║
║ 1 ║ 06/01/2015 ║ 07/01/2015 ║
║ 2 ║ 04/01/2015 ║ 06/01/2015 ║
║ 2 ║ 06/01/2015 ║ 10/01/2015 ║
║ 3 ║ 01/01/2015 ║ 09/01/2015 ║
║ 3 ║ 11/01/2015 ║ 13/01/2015 ║
╚═════════════╩════════════╩════════════╝
And here is what I want:
╔══════════════╦═════════════╦════════════╦════════════╗
║ OverlapOrder ║ Team Member ║ Start Date ║ End Date ║
╠══════════════╬═════════════╬════════════╬════════════╣
║ 0 ║ 1 ║ 01/01/2015 ║ 04/01/2015 ║
║ 1 ║ 1 ║ 04/01/2015 ║ 06/01/2015 ║
║ 0 ║ 1 ║ 06/01/2015 ║ 07/01/2015 ║
║ 0 ║ 2 ║ 04/01/2015 ║ 06/01/2015 ║
║ 1 ║ 2 ║ 06/01/2015 ║ 10/01/2015 ║
║ 0 ║ 3 ║ 01/01/2015 ║ 09/01/2015 ║
║ 0 ║ 3 ║ 11/01/2015 ║ 13/01/2015 ║
╚══════════════╩═════════════╩════════════╩════════════╝
So you can see that team members shouldn't affect each other's overlap order.
I'm using Access SQL at the moment, but shortly moving to SQL Server, so a solution in either is the goal!
P.S. You'll see that the 2nd and 3rd data rows have the same start date. The overlap order between these two is arbitrary; they can be either way round.
EDIT: Changed sample dataset so it covers a new highlighted possibility. The OverlapOrder column can go from 0 to however high depending on how many projects overlap.
Assuming you are able to migrate to SQL Server 2005 or above, you can try the solution below, which uses a CTE to do something like what you want:
;WITH cte AS
(SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY startdate, enddate) AS rn
 FROM tbl)
SELECT s.*,
       CASE WHEN DATEDIFF(dd, s.startdate, t.enddate) >= 0
            THEN s.rn - 1
            ELSE 0
       END AS OverlapOrder
FROM cte AS s
LEFT JOIN cte AS t ON s.id = t.id AND t.rn = s.rn - 1
You should take this with a pinch of salt, however, since this solution may well be engineered specifically for the sample data set. I have not tested it with different cases yet.
Demo
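A more general phrasing (a sketch, untested against the sample above) counts, for each row, how many earlier-starting rows of the same team member are still running when it starts. Whether a period ending exactly when the next one starts counts as an overlap is a tie-break to tune:
;WITH cte AS
(SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY startdate, enddate) AS rn
 FROM tbl)
SELECT s.id, s.startdate, s.enddate,
       (SELECT COUNT(*)
        FROM cte AS t
        WHERE t.id = s.id
          AND t.rn < s.rn              -- earlier-starting rows only
          AND t.enddate > s.startdate  -- still running when s starts; use >= to count touching periods
       ) AS OverlapOrder
FROM cte AS s;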