SSIS - Sorted Aggregating - sql

I have source data at the day granularity and I need to aggregate it to week granularity. Most fields are easy sum aggregations. But I have one field where I need to take Sunday's value (kinda like a "first" aggregation) and another field where I need to take Saturday's value.
The road I'm going down using SSIS is to Multicast my source data three times, doing a regular Aggregate for the easy fields, and then using lookup joins to a calendar table to match the other two to Saturday and Sunday respectively to grab those values.... then merge joining everything back together.
Is there a better way to do this?
example source data:
What the output should look like:

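Is there a better way to do this? Yes. Don't use a complicated SSIS solution for something that is a simple SQL statement:
SELECT
-- first day of the week (Sunday with the default DATEFIRST of 7)
DATEADD(day, 1 - DATEPART(dw, Day), Day) AS WeekStart,
SUM(Sales) AS Sales,
MAX(
CASE WHEN DATEPART(dw, Day) = 1 THEN BOP ELSE NULL END
) AS BOP,
MAX(
CASE WHEN DATEPART(dw, Day) = 7 THEN EOP ELSE NULL END
) AS EOP
FROM YourTable
GROUP BY DATEADD(day, 1 - DATEPART(dw, Day), Day)
You might need to tweak the 1 and 7 depending on your server's DATEFIRST setting (see @@DATEFIRST and SET DATEFIRST), but hopefully you get the idea.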
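To try the conditional-aggregation idea without a SQL Server instance, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. The table and column names (daily, day, sales, bop, eop) and the ISO date strings are hypothetical. SQLite's strftime('%w', ...) (0 = Sunday, 6 = Saturday) stands in for DATEPART(dw, ...), so it does not depend on DATEFIRST. Each week is keyed by the Sunday on or before each day.

```python
import sqlite3

# Hypothetical daily rows: (day, sales, bop, eop). 2017-01-01 is a Sunday.
DAILY = [
    ("2017-01-01", 10, 100, 90),  # Sunday
    ("2017-01-03", 20, 90, 80),   # Tuesday
    ("2017-01-07", 30, 80, 70),   # Saturday
    ("2017-01-08", 5, 70, 65),    # Sunday
    ("2017-01-14", 15, 65, 60),   # Saturday
]

WEEKLY_SQL = """
SELECT
    -- Sunday on or before each day; strftime('%w') is '0' (Sunday) .. '6' (Saturday)
    date(day, '-' || strftime('%w', day) || ' days') AS week_start,
    SUM(sales) AS sales,
    -- only Sunday's row yields a non-NULL value, so MAX picks it
    MAX(CASE WHEN strftime('%w', day) = '0' THEN bop END) AS bop,
    -- likewise, only Saturday's row contributes to EOP
    MAX(CASE WHEN strftime('%w', day) = '6' THEN eop END) AS eop
FROM daily
GROUP BY week_start
ORDER BY week_start
"""


def weekly(rows):
    """Aggregate daily rows to Sunday-starting weeks in an in-memory SQLite DB."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE daily (day TEXT, sales INTEGER, bop INTEGER, eop INTEGER)")
        con.executemany("INSERT INTO daily VALUES (?, ?, ?, ?)", rows)
        return con.execute(WEEKLY_SQL).fetchall()
    finally:
        con.close()


print(weekly(DAILY))
# [('2017-01-01', 60, 100, 70), ('2017-01-08', 20, 70, 60)]
```

A week with no Sunday (or no Saturday) row gets NULL for BOP (or EOP), which makes missing days visible instead of silently picking another day.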

You can use FIRST_VALUE and LAST_VALUE (SQL Server 2012 and later) for this, as below:
select top 1 with ties datepart(week, [day]) as [Week],
sum(sales) over(partition by datepart(week, [day])) as Sales,
FIRST_VALUE(BOP) over(partition by datepart(week, [day]) order by [day]) as BOP
, EOP = LAST_VALUE(EOP) over(partition by datepart(week, [day]) order by [day] RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING )
from #youraggregate
Order by Row_number() over(partition by datepart(week, [day]) order by [day])
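Here is a similar sketch of the FIRST_VALUE/LAST_VALUE approach, again in SQLite via Python (window functions need SQLite 3.25 or later). The names and rows are hypothetical. SQLite has no TOP ... WITH TIES, so a ROW_NUMBER() = 1 filter keeps one row per week instead. This takes the first and last day present in each week, which matches Sunday and Saturday only when those days exist in the data.

```python
import sqlite3

# Hypothetical daily rows: (day, sales, bop, eop). 2017-01-01 is a Sunday.
DAILY = [
    ("2017-01-01", 10, 100, 90),
    ("2017-01-03", 20, 90, 80),
    ("2017-01-07", 30, 80, 70),
    ("2017-01-08", 5, 70, 65),
    ("2017-01-14", 15, 65, 60),
]

WINDOWED_SQL = """
WITH d AS (
    -- tag each day with the Sunday that starts its week
    SELECT *, date(day, '-' || strftime('%w', day) || ' days') AS week_start
    FROM daily
)
SELECT week_start, week_sales, week_bop, week_eop
FROM (
    SELECT
        week_start,
        SUM(sales) OVER (PARTITION BY week_start) AS week_sales,
        FIRST_VALUE(bop) OVER (PARTITION BY week_start ORDER BY day) AS week_bop,
        -- widen the frame: with ORDER BY, the default frame ends at the current row
        LAST_VALUE(eop) OVER (
            PARTITION BY week_start ORDER BY day
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        ) AS week_eop,
        ROW_NUMBER() OVER (PARTITION BY week_start ORDER BY day) AS rn
    FROM d
)
WHERE rn = 1  -- one row per week, like TOP 1 WITH TIES ... ORDER BY ROW_NUMBER()
ORDER BY week_start
"""


def weekly_windowed(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE daily (day TEXT, sales INTEGER, bop INTEGER, eop INTEGER)")
        con.executemany("INSERT INTO daily VALUES (?, ?, ?, ?)", rows)
        return con.execute(WINDOWED_SQL).fetchall()
    finally:
        con.close()


print(weekly_windowed(DAILY))
```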

Use a Derived Column transformation to get the week first:
DATEPART("wk", Day)
After that, use an Aggregate transformation grouped on the Week column.

Related

Issue with the repeated records in SQL

My dataset looks like below:
I am trying to get Min start date & Max end date of an employee whenever there is a team change.
The problem is that when an employee moves back to a team they were on before, the dates for that repeated team don't come out correctly.
Any help would be appreciated.
Teradata has a nice SQL extension for normalizing overlapping date ranges. This assumes that you want to get extra rows when a month is missing, i.e. there's a gap:
SELECT
emp_id
,team
-- split the Period into separate columns again
,Begin(pd)
,last_day(add_months(End(pd),-1)) -- end of previous month
FROM
(
SELECT NORMALIZE -- normalize overlapping periods
emp_id
,team
-- NORMALIZE only works with periods, so create a Period based on current date plus one month
,PERIOD(month_end_date
,last_day(add_months(month_end_date, 1))
) AS pd
FROM vt
) AS dt;
If I understand correctly, this is a gaps-and-islands problem that can be solved using the difference of row numbers.
You can use:
select emp_id, team, min(month_end_date), max(month_end_date)
from (select t.*,
row_number() over (partition by emp_id order by month_end_date) as seqnum,
row_number() over (partition by emp_id, team order by month_end_date) as seqnum_2
from t
) t
group by emp_id, team, (seqnum - seqnum_2);
Note: This puts the dates on a single row, which seems more useful than your expected results.
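Here is a small sketch of the difference-of-row-numbers technique in SQLite (3.25+ for window functions), run from Python. The data is hypothetical: employee 1 leaves team A and later returns to it. The two stints on team A come out as separate rows, because the difference between the two row numbers changes when the team changes.

```python
import sqlite3

# Hypothetical monthly rows: (emp_id, team, month_end_date).
ASSIGNMENTS = [
    (1, "A", "2020-01-31"),
    (1, "A", "2020-02-29"),
    (1, "B", "2020-03-31"),
    (1, "B", "2020-04-30"),
    (1, "A", "2020-05-31"),  # employee 1 returns to team A
    (2, "C", "2020-01-31"),
    (2, "C", "2020-02-29"),
]

ISLANDS_SQL = """
SELECT emp_id, team,
       MIN(month_end_date) AS start_date,
       MAX(month_end_date) AS end_date
FROM (
    SELECT t.*,
           -- position of the row in the employee's whole history
           ROW_NUMBER() OVER (PARTITION BY emp_id ORDER BY month_end_date) AS seqnum,
           -- position of the row among the employee's rows for this team
           ROW_NUMBER() OVER (PARTITION BY emp_id, team ORDER BY month_end_date) AS seqnum_2
    FROM t
) AS x
-- the difference is constant within each unbroken run on one team
GROUP BY emp_id, team, seqnum - seqnum_2
ORDER BY emp_id, start_date
"""


def team_stints(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE t (emp_id INTEGER, team TEXT, month_end_date TEXT)")
        con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
        return con.execute(ISLANDS_SQL).fetchall()
    finally:
        con.close()


for stint in team_stints(ASSIGNMENTS):
    print(stint)
```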

On SQL what would be the best way to do formulas?

I have the following structure
Columns structure
The date columns contain hours as well. I need to calculate the time between the start and end dates for the same case number.
For example, if case 1 has 2 subcases, I need to calculate from the start date of the first one until the end date of the second one and put that in another column as "actual fixed time".
The thing is there could be 2 or 10 or 20 or more subcases.
Any thoughts?
You will want to use DATEDIFF in conjunction with GROUP BY:
SELECT DATEDIFF(hour, MIN(start_date), MAX(end_date)) as ActualFixedTime FROM YourTable GROUP BY Casenumber
This gives you the actual fixed time for each case. To display this result on each subcase row, join to it when selecting from your table:
SELECT YourTable.*, FixedTimes.ActualFixedTime
FROM YourTable
INNER JOIN (SELECT Casenumber,
DATEDIFF(hour, MIN(start_date), MAX(end_date)) AS ActualFixedTime
FROM YourTable
GROUP BY Casenumber)
AS FixedTimes
ON YourTable.Casenumber = FixedTimes.Casenumber
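Here is a sketch of the same per-case calculation in SQLite, run from Python, with hypothetical table and column names. SQLite has no DATEDIFF, so this computes elapsed hours from julianday() and rounds them. SQL Server's DATEDIFF(hour, ...) counts hour boundaries crossed, which gives the same numbers for the whole-hour timestamps used here.

```python
import sqlite3

# Hypothetical subcase rows: (casenumber, subcase, start_date, end_date)
SUBCASES = [
    (1, 1, "2020-01-01 08:00:00", "2020-01-01 12:00:00"),
    (1, 2, "2020-01-01 13:00:00", "2020-01-02 10:00:00"),
    (2, 1, "2020-01-03 09:00:00", "2020-01-03 11:00:00"),
]

FIXED_TIME_SQL = """
SELECT c.casenumber, c.subcase, f.actual_fixed_time
FROM cases AS c
JOIN (
    SELECT casenumber,
           -- hours from the earliest start to the latest end of the case, rounded
           CAST(ROUND((julianday(MAX(end_date)) - julianday(MIN(start_date))) * 24) AS INTEGER)
               AS actual_fixed_time
    FROM cases
    GROUP BY casenumber
) AS f ON c.casenumber = f.casenumber
ORDER BY c.casenumber, c.subcase
"""


def fixed_times(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE cases (casenumber INTEGER, subcase INTEGER, start_date TEXT, end_date TEXT)"
        )
        con.executemany("INSERT INTO cases VALUES (?, ?, ?, ?)", rows)
        return con.execute(FIXED_TIME_SQL).fetchall()
    finally:
        con.close()


print(fixed_times(SUBCASES))
# [(1, 1, 26), (1, 2, 26), (2, 1, 2)]
```

Any number of subcases per case works the same way, because MIN and MAX are taken over the whole group.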

SQL - Grouping a 3 Column List Issue

I have a list of values in a database. There are many redundancies and I want to get rid of them. As you can see in the list below, dates [10/1/2011 - 7/1/2011) have a value of 0. I can make that into one entry with a start date of 10/1/2011 and an end date of 6/1/2011 and a value of 0 and delete all the other rows. I can do that for all the other similar values as well.
Here is my problem. I did this by writing a query that groups these together and then takes the Min(start date) as the start date and the Max(end date) as the end date. Notice that I have two groups of 0 though. When I group this in the query, the start date is 10/1/2010 and the end date is 2/1/2013. This is a problem elsewhere in my code because whenever it looks for a value at 2/1/2012 it finds 0 but it should be finding .955186.
Any suggestions on how I can write a query to account for this problem?
This is a "gaps-and-islands" problem.
If I assume that the first column is sufficient for defining the groups, then you can use a difference of row_number()s:
select min(startdate), max(enddate), value
from (select t.*,
row_number() over (order by startdate) as seqnum,
row_number() over (partition by value order by startdate) as seqnum_v
from t
) t
group by (seqnum - seqnum_v), value;
It is a gaps-and-islands problem. You can use the following query (in SQL Server syntax, but it can easily be adapted to other databases).
select min(startdate) startDate, max(enddate) endDate, value
from
(
select *,
row_number() over (partition by value order by startDate) - (year(startDate) * 12) - month(startDate) grp
from data
) t
group by value, grp
order by startDate
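It uses just one row_number(), which may be cheaper than two since the DBMS doesn't have to process the table twice to generate the sequences. It works because, within a run of consecutive months with the same value, both the row number and the month index (year * 12 + month) go up by one per row, so their difference stays constant. This assumes the rows are consecutive months.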
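Here is a sketch of the single-row_number() variant in SQLite (3.25+), run from Python. The monthly rows are hypothetical, and value 0 occurs in two separate runs. For contrast, a plain GROUP BY value is included. It merges both runs of 0 into one range that wrongly covers the 0.5 month, which is the problem described in the question.

```python
import sqlite3

# Hypothetical monthly rows: (startdate, enddate, value); 0 occurs in two separate runs.
RATES = [
    ("2011-01-01", "2011-02-01", 0.0),
    ("2011-02-01", "2011-03-01", 0.0),
    ("2011-03-01", "2011-04-01", 0.5),
    ("2011-04-01", "2011-05-01", 0.0),
    ("2011-05-01", "2011-06-01", 0.0),
]

# Naive grouping: both runs of 0 collapse into one range.
NAIVE_SQL = """
SELECT MIN(startdate), MAX(enddate), value
FROM rates GROUP BY value ORDER BY MIN(startdate)
"""

ONE_ROW_NUMBER_SQL = """
SELECT MIN(startdate) AS start_date, MAX(enddate) AS end_date, value
FROM (
    SELECT *,
           -- row number within the value minus a month index (year * 12 + month);
           -- both rise by one per consecutive month, so the difference marks each run
           ROW_NUMBER() OVER (PARTITION BY value ORDER BY startdate)
           - (CAST(strftime('%Y', startdate) AS INTEGER) * 12
              + CAST(strftime('%m', startdate) AS INTEGER)) AS grp
    FROM rates
) AS t
GROUP BY value, grp
ORDER BY start_date
"""


def run(sql, rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE rates (startdate TEXT, enddate TEXT, value REAL)")
        con.executemany("INSERT INTO rates VALUES (?, ?, ?)", rows)
        return con.execute(sql).fetchall()
    finally:
        con.close()


print(run(NAIVE_SQL, RATES))
print(run(ONE_ROW_NUMBER_SQL, RATES))
```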

Can we do partition date wise?

I am new to the concept of partitioning. I know horizontal partitioning, but can we partition date-wise?
In my project, I want a new partition to be created whenever we enter a new year. Can anyone explain how to do this? I am working on ERP software that has data from past years, and I need year-wise partitions (for example, APR-2011 to MAR-2012 is one year).
I think what you are looking for is something like this:
DECLARE @referenceDate datetime = '20110401' -- start of the April-to-March year
SELECT
[sub].[Period] + 1 AS [PeriodNr],
YEAR(DATEADD(YEAR, [sub].[Period], @referenceDate)) AS [PeriodStartedIn],
COUNT(*) AS [NumberOfRecords],
SUM([sub].[Value]) AS [TotalValue]
FROM
(
SELECT
*,
FLOOR(DATEDIFF(MONTH, @referenceDate, [timestamp]) / 12.0) AS [Period]
FROM [erp]
WHERE [timestamp] >= @referenceDate
) AS [sub]
GROUP BY [sub].[Period]
ORDER BY [sub].[Period] ASC
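Here is a sketch of the same period arithmetic in SQLite, run from Python, using the April-to-March year from the question. The table, the column names and the sample rows are hypothetical. The month index year * 12 + month replaces DATEDIFF(MONTH, ...). Integer division by 12 plays the role of FLOOR(... / 12.0), because only rows on or after the reference date are kept. Like the query above, this groups rows into yearly periods at query time; it does not create physical table partitions.

```python
import sqlite3

# Hypothetical ERP rows: (ts, value). Years run April through March.
ERP_ROWS = [
    ("2011-03-31", 100),  # before the reference date, so excluded
    ("2011-04-15", 10),
    ("2012-03-31", 5),
    ("2012-04-01", 7),
    ("2013-02-10", 1),
]

FISCAL_SQL = """
SELECT period + 1 AS period_nr,
       :ref_year + period AS period_started_in,
       COUNT(*) AS number_of_records,
       SUM(value) AS total_value
FROM (
    SELECT value,
           -- whole months since the reference month, integer-divided into 12-month periods
           ((CAST(strftime('%Y', ts) AS INTEGER) * 12 + CAST(strftime('%m', ts) AS INTEGER))
            - (:ref_year * 12 + :ref_month)) / 12 AS period
    FROM erp
    WHERE ts >= :ref_date
) AS sub
GROUP BY period
ORDER BY period
"""


def fiscal_totals(rows, ref_year=2011, ref_month=4):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE erp (ts TEXT, value INTEGER)")
        con.executemany("INSERT INTO erp VALUES (?, ?)", rows)
        params = {
            "ref_year": ref_year,
            "ref_month": ref_month,
            "ref_date": f"{ref_year:04d}-{ref_month:02d}-01",
        }
        return con.execute(FISCAL_SQL, params).fetchall()
    finally:
        con.close()


print(fiscal_totals(ERP_ROWS))
# [(1, 2011, 2, 15), (2, 2012, 2, 8)]
```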
When you say partitioning, I'm curious whether you mean a windowed function or just grouping data in general. Here are both ways to aggregate data, with GROUP BY and with windowed partitioning, in a self-demonstrating example:
declare @Orders table( id int identity, dt date, counts int)
insert into @Orders values ('1-1-12', 2),('1-1-12', 3),('1-18-12', 1),('2-11-12', 5),('3-1-12', 2),('6-1-12', 8),('10-1-12', 2),('1-13-13', 8)
-- To do days I need to do a group by
select
dt as DayDate
, SUM(counts) as sums
from @Orders
group by dt
-- To do months I need to group differently
select
DATEADD(month, datediff(month, 0, dt), 0) as MonthDate
-- the expression above counts the months between date 0 (1/1/1900) and dt, then adds them back to date 0.
-- This always yields the first day of dt's month
, SUM(counts) as sums
from @Orders
group by DATEADD(month, datediff(month, 0, dt), 0)
-- well that is great but what if I want to group different ways all at once?
-- why windowed functions rock:
select
dt
, counts
, SUM(counts) over(partition by DATEADD(year, datediff(year, 0, dt), 0)) as YearPartitioning
, SUM(counts) over(partition by DATEADD(month, datediff(month, 0, dt), 0)) as MonthPartitioning
-- the expression above works for year, month, day, minute, etc. You just need to use the same datepart in both the DATEADD and DATEDIFF calls
, SUM(counts) over(partition by dt) as DayPartitioning
from @Orders
The important concept in grouping is the traditional GROUP BY clause: you MUST list every selected column that is NOT inside an aggregate function, because those columns are what the aggregation pivots on. So in my first select I just chose the date and then said sum(counts). It saw that 1-1-12 had two values, so it added them together; every other date had a single value, so each sum is just that value. In the second select I apply a trick to the date field that transforms it to the first day of the month. Now this is great, but I may want all of this at once.
Windowed functions do the grouping inline: they don't need a GROUP BY clause, because that is what the over() portion does. However, they may repeat values, since you are not limiting your dataset. If you look at the third column of the third select, 'YearPartitioning', it repeats the number 23 seven times. Why? Because you never told the statement to do any grouping outside the function, so it shows every row, and 23 appears on every row whose date falls in 2012. Just remember this when selecting from a windowed expression.
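Here is a sketch that reproduces this example in SQLite (3.25+), run from Python, so the repeated values are easy to check. The dates are rewritten as ISO strings, and strftime() stands in for the DATEADD/DATEDIFF truncation trick.

```python
import sqlite3

# The orders from the example above, with m-d-yy dates rewritten as yyyy-mm-dd.
ORDERS = [
    ("2012-01-01", 2), ("2012-01-01", 3), ("2012-01-18", 1), ("2012-02-11", 5),
    ("2012-03-01", 2), ("2012-06-01", 8), ("2012-10-01", 2), ("2013-01-13", 8),
]

# GROUP BY collapses rows: one output row per distinct date.
GROUPED_SQL = "SELECT dt, SUM(counts) FROM orders GROUP BY dt ORDER BY dt"

# Window functions keep every input row and repeat each partition's sum.
WINDOWED_SQL = """
SELECT dt, counts,
       SUM(counts) OVER (PARTITION BY strftime('%Y', dt)) AS year_partitioning,
       SUM(counts) OVER (PARTITION BY strftime('%Y-%m', dt)) AS month_partitioning,
       SUM(counts) OVER (PARTITION BY dt) AS day_partitioning
FROM orders
ORDER BY id
"""


def run(sql):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, dt TEXT, counts INTEGER)")
        con.executemany("INSERT INTO orders (dt, counts) VALUES (?, ?)", ORDERS)
        return con.execute(sql).fetchall()
    finally:
        con.close()


grouped = run(GROUPED_SQL)    # 7 rows: the two 2012-01-01 orders are summed
windowed = run(WINDOWED_SQL)  # 8 rows: 23 repeats on all seven 2012 rows
for row in windowed:
    print(row)
```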

Last day of the month with a twist in SQLPLUS

I would appreciate a little expert help please.
In an SQL SELECT statement, I am trying to get the last day with data per month for the last year.
Example, I am easily able to get the last day of each month and join that to my data table, but the problem is, if the last day of the month does not have data, then there is no returned data. What I need is for the SELECT to return the last day with data for the month.
This is probably easy to do, but to be honest, my brain fart is starting to hurt.
I've attached the select below that works for returning the data for only the last day of the month for the last 12 months.
Thanks in advance for your help!
SELECT fd.cust_id,fd.server_name,fd.instance_name,
TRUNC(fd.coll_date) AS coll_date,fd.column_name
FROM super_table fd,
(SELECT TRUNC(daterange,'MM')-1 first_of_month
FROM (
select TRUNC(sysdate-365,'MM') + level as DateRange
from dual
connect by level<=365)
GROUP BY TRUNC(daterange,'MM')) fom
WHERE fd.cust_id = :CUST_ID
AND fd.coll_date > SYSDATE-400
AND TRUNC(fd.coll_date) = fom.first_of_month
GROUP BY fd.cust_id,fd.server_name,fd.instance_name,
TRUNC(fd.coll_date),fd.column_name
ORDER BY fd.server_name,fd.instance_name,TRUNC(fd.coll_date)
You probably need to group your data so that each month's data is in the group, and then within the group select the maximum date present. The sub-query might be:
SELECT MAX(coll_date) AS last_day_of_month
FROM Super_Table AS fd
GROUP BY YEAR(coll_date) * 100 + MONTH(coll_date);
This presumes that the functions YEAR() and MONTH() exist to extract the year and month from a date as an integer value. Clearly, this doesn't constrain the range of dates; you can do that, too. If you don't have these functions in Oracle, you'll need some other manipulation to get the equivalent result.
Using information from Rhose (thanks):
SELECT MAX(coll_date) AS last_day_of_month
FROM Super_Table fd
GROUP BY TO_CHAR(coll_date, 'YYYYMM');
This achieves the same net result, putting all dates from the same calendar month into a group and then determining the maximum value present within that group.
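Here is a sketch of the group-by-month idea in SQLite, run from Python. The timestamps are hypothetical and chosen so that the calendar month-end has no data in January or February. strftime('%Y%m', ...) stands in for TO_CHAR(coll_date, 'YYYYMM').

```python
import sqlite3

# Hypothetical collection timestamps; nothing was collected on 2020-01-31 or 2020-02-29.
COLL_DATES = [
    "2020-01-15 06:00:00",
    "2020-01-29 23:10:00",
    "2020-02-03 12:00:00",
    "2020-02-27 08:30:00",
    "2020-03-31 01:00:00",
]

LAST_ACTIVE_SQL = """
SELECT MAX(coll_date) AS last_day_of_month
FROM super_table
-- put all rows of one calendar month into one group
GROUP BY strftime('%Y%m', coll_date)
ORDER BY last_day_of_month
"""


def last_active_days(dates):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE super_table (coll_date TEXT)")
        con.executemany("INSERT INTO super_table VALUES (?)", [(d,) for d in dates])
        return [row[0] for row in con.execute(LAST_ACTIVE_SQL)]
    finally:
        con.close()


print(last_active_days(COLL_DATES))
# ['2020-01-29 23:10:00', '2020-02-27 08:30:00', '2020-03-31 01:00:00']
```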
Here's another approach, if ANSI row_number() is supported:
with RevDayRanked(itemDate,rn) as (
select
cast(coll_date as date),
row_number() over (
partition by datediff(month,coll_date,'2000-01-01') -- rewrite datediff as needed for your platform
order by coll_date desc
)
from super_table
)
select itemDate
from RevDayRanked
where rn = 1;
Rows numbered 1 will be nondeterministically chosen among rows on the last active date of the month, so you don't need distinct. If you want information out of the table for all rows on these dates, use rank() over days instead of row_number() over coll_date values, so a value of 1 appears for any row on the last active date of the month, and select the additional columns you need:
with RevDayRanked(cust_id, server_name, coll_date, rk) as (
select
cust_id, server_name, coll_date,
rank() over (
partition by datediff(month,coll_date,'2000-01-01')
order by cast(coll_date as date) desc
)
from super_table
)
select cust_id, server_name, coll_date
from RevDayRanked
where rk = 1;
If row_number() and rank() aren't supported, another approach is this (for the second query above). Select all rows from your table for which there's no row in the table from a later day in the same month.
select
cust_id, server_name, coll_date
from super_table as ST1
where not exists (
select *
from super_table as ST2
where datediff(month,ST1.coll_date,ST2.coll_date) = 0
and cast(ST2.coll_date as date) > cast(ST1.coll_date as date)
)
If you have to do this kind of thing a lot, see if you can create an index over computed columns that hold cast(coll_date as date) and a month indicator like datediff(month,'2001-01-01',coll_date). That'll make more of the predicates sargable.
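Here is a sketch of the NOT EXISTS variant in SQLite, run from Python, with hypothetical rows. It keeps every row on the last active day of each month, so two rows come back for each month below. strftime() and date() stand in for datediff(month, ...) and cast(... as date).

```python
import sqlite3

# Hypothetical rows: (cust_id, server_name, coll_date)
ROWS = [
    ("c1", "srvA", "2020-01-28 09:00:00"),
    ("c1", "srvB", "2020-01-29 08:00:00"),
    ("c1", "srvA", "2020-01-29 17:00:00"),
    ("c1", "srvA", "2020-02-03 00:00:00"),
    ("c1", "srvA", "2020-02-10 12:00:00"),
    ("c1", "srvB", "2020-02-10 13:00:00"),
]

NOT_EXISTS_SQL = """
SELECT cust_id, server_name, coll_date
FROM super_table AS st1
WHERE NOT EXISTS (
    SELECT 1
    FROM super_table AS st2
    -- same calendar month as st1 ...
    WHERE strftime('%Y-%m', st2.coll_date) = strftime('%Y-%m', st1.coll_date)
      -- ... but on a later day
      AND date(st2.coll_date) > date(st1.coll_date)
)
ORDER BY coll_date
"""


def last_day_rows(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE super_table (cust_id TEXT, server_name TEXT, coll_date TEXT)")
        con.executemany("INSERT INTO super_table VALUES (?, ?, ?)", rows)
        return con.execute(NOT_EXISTS_SQL).fetchall()
    finally:
        con.close()


for row in last_day_rows(ROWS):
    print(row)
```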
Putting the above pieces together, would something like this work for you?
SELECT fd.cust_id,
fd.server_name,
fd.instance_name,
TRUNC(fd.coll_date) AS coll_date,
fd.column_name
FROM super_table fd
WHERE fd.cust_id = :CUST_ID
AND TRUNC(fd.coll_date) IN (
SELECT MAX(TRUNC(coll_date))
FROM super_table
WHERE coll_date > SYSDATE - 400
AND cust_id = :CUST_ID
GROUP BY TO_CHAR(coll_date,'YYYYMM')
)
GROUP BY fd.cust_id,fd.server_name,fd.instance_name,TRUNC(fd.coll_date),fd.column_name
ORDER BY fd.server_name,fd.instance_name,TRUNC(fd.coll_date)