Change data from aggregated to granular level

My data is in the form:
To reproduce:
DROP TABLE IF EXISTS SALARY;
CREATE TEMP TABLE salary
(
Employee varchar(100),
Salary1 numeric(38,12),
Salary2 numeric(38,12)
);
INSERT INTO salary (Employee, Salary1 ,Salary2)
VALUES ('A1',100,300),('A2',200,300),('A3',300,450),('A4',400,600);
I want to divide it evenly (each column holds 2 days of salary aggregated into one value) and cast it into daily-level data as below:
For example, for employee A2 the salaries for 3 and 4 May sum to 300 (150 + 150 in the second table).
Any help or leads appreciated.

A materialized calendar table with the desired dates will facilitate generating the dates needed for the query. Without one, a tally table or CTE (as in the below example) is an alternative method.
DECLARE
    @StartDate date = '2022-05-01'
  , @DaysPerSalary int = 2;
WITH
-- t10 holds 10 rows; cross joining it three times yields a 1000-row tally
 t10 AS (SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) t(n))
,tally AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS num FROM t10 AS a CROSS JOIN t10 AS b CROSS JOIN t10 AS c)
,salary1 AS (
    SELECT Employee, Salary1 / @DaysPerSalary AS Salary
    FROM SALARY
)
,salary2 AS (
    SELECT Employee, Salary2 / @DaysPerSalary AS Salary
    FROM SALARY
)
-- first period: days 1..@DaysPerSalary from the start date
SELECT DATEADD(day, tally.num-1, @StartDate), Employee, Salary
FROM tally
CROSS JOIN salary1
WHERE tally.num <= @DaysPerSalary
UNION ALL
-- second period: shifted by @DaysPerSalary days
SELECT DATEADD(day, tally.num-1 + @DaysPerSalary, @StartDate), Employee, Salary
FROM tally
CROSS JOIN salary2
WHERE tally.num <= @DaysPerSalary
ORDER BY Employee, Salary;
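The arithmetic behind the split can also be checked outside the database. This is only a minimal Python sketch, assuming the same start date (2022-05-01) and two days per salary column as the answer above; the function name expand_daily is made up for illustration.

```python
from datetime import date, timedelta
from decimal import Decimal

# Sample rows from the question: (Employee, Salary1, Salary2)
rows = [("A1", 100, 300), ("A2", 200, 300), ("A3", 300, 450), ("A4", 400, 600)]

def expand_daily(rows, start_date=date(2022, 5, 1), days_per_salary=2):
    """Spread each aggregated salary column evenly over its days."""
    out = []
    for employee, *salaries in rows:
        for col_index, total in enumerate(salaries):
            daily = Decimal(total) / days_per_salary
            for day in range(days_per_salary):
                # Column 0 covers the first days, column 1 the following ones
                offset = col_index * days_per_salary + day
                out.append((start_date + timedelta(days=offset), employee, daily))
    return out

daily_rows = expand_daily(rows)
```

For employee A2 the rows dated 3 and 4 May each carry 150, which adds back up to the original 300.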

Related

How to extrapolate dates in SQL Server to calculate the daily counts?

This is what the data looks like. It's a long table.
I need to calculate the number of people employed by day.
How do I write SQL Server logic to get this result? I tried to create a DATES table and then join, but this caused an error because the table is too big. Do I need recursive logic?

For future questions, don't post images of data. Instead, use a service like dbfiddle. With a better-prepared question you could have gotten a complete answer, but here is a sketch anyway:
-- extrema is the least and the greatest date in staff table
-- (LEAST/GREATEST are available in SQL Server 2022 and later)
with extrema(mn, mx) as (
select least(min(hired),min(retired)) as mn
, greatest(max(hired),max(retired)) as mx
from staff
), calendar (dt) as (
-- we construct a calendar with every date between extreme values
select mn from extrema
union all
select dateadd(day, 1, dt)
from calendar
where dt < (select mx from extrema)
)
-- finally we can count the number of employed people for each such date
select dt, count(1)
from calendar c
join staff s
on c.dt between s.hired and s.retired
group by dt
-- recursive CTEs stop after 100 levels by default; 0 lifts the limit
option (maxrecursion 0);
If you find yourself doing this kind of calculation often, it is a good idea to create a calendar table. You can add other attributes to it, such as whether a date falls on a weekday.
With a constraint as:
CHECK(hired <= retired)
the first part can be simplified to:
with extrema(mn, mx) as (
select min(hired) as mn
, max(retired) as mx
from staff
),
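To make the counting rule concrete, here is a minimal Python sketch of the same idea: build every date between the extreme values, then count the rows whose hired..retired range covers that date. The staff rows are hypothetical, and both dates are assumed to be inclusive, as with BETWEEN.

```python
from datetime import date, timedelta

# Hypothetical staff rows: (name, hired, retired), both dates inclusive
staff = [
    ("Ann", date(2020, 1, 1), date(2020, 1, 3)),
    ("Bob", date(2020, 1, 2), date(2020, 1, 5)),
]

def daily_headcount(staff):
    """Count employed people for every date between the extreme dates."""
    first = min(hired for _, hired, _ in staff)
    last = max(retired for _, _, retired in staff)
    counts = {}
    day = first
    while day <= last:
        counts[day] = sum(1 for _, hired, retired in staff if hired <= day <= retired)
        day += timedelta(days=1)
    return counts

headcount = daily_headcount(staff)
```

One difference from the query: the inner join drops dates on which nobody is employed, while this sketch would keep them with a count of 0.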

Assuming current employees have a NULL retirement date:
Declare @Date1 date = '2015-01-01'
Declare @Date2 date = getdate()
Select A.Date
      ,HeadCount = count(B.name)
From ( -- one row per day from @Date1 to @Date2
       Select Top (DateDiff(DAY,@Date1,@Date2)+1)
              Date=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),@Date1)
       From master..spt_values n1,master..spt_values n2
     ) A
Left Join YourTable B on A.Date >= B.Hired and A.Date <= coalesce(B.Retired,getdate())
Group BY A.Date

You need a calendar table for this. You start with the calendar, and LEFT JOIN everything else, using BETWEEN logic.
You can use a real table. Or you can generate it on the fly, like this:
WITH
L0 AS ( SELECT c = 1
FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1),(1),(1),(1)) AS D(c) ),
L1 AS ( SELECT c = 1 FROM L0 A, L0 B, L0 C, L0 D ),
Nums AS ( SELECT rownum = ROW_NUMBER() OVER(ORDER BY (SELECT 1))
FROM L1 ),
Dates AS (
SELECT TOP (DATEDIFF(day, '20141231', GETDATE()))
Date = DATEADD(day, rownum, '20141231')
FROM Nums
)
SELECT
d.Date,
NumEmployed = COUNT(*)
FROM Dates d
JOIN YourTable t ON d.Date BETWEEN t.Hired AND t.Retired
GROUP BY
d.Date;
If your dates have a time component, then you need to use >= AND < logic instead of BETWEEN.
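Why the time component matters can be shown with a small Python sketch. It compares a BETWEEN-style check against a half-open check on the day [day_start, day_start + 1 day). The half-open rule shown here is one reasonable reading of the ">= AND <" advice, not necessarily the exact predicate the answer had in mind, and the sample timestamps are invented.

```python
from datetime import datetime, timedelta

def between(day_start, hired, retired):
    """BETWEEN-style check: compares midnight of the day against both timestamps."""
    return hired <= day_start <= retired

def employed_on(day_start, hired, retired):
    """Half-open check: the employment span touches [day_start, day_start + 1 day)."""
    day_end = day_start + timedelta(days=1)
    return hired < day_end and retired >= day_start

# Hypothetical employee hired at 09:00 on 3 January 2020
hired = datetime(2020, 1, 3, 9, 0)
retired = datetime(2020, 6, 30, 17, 0)
hire_day = datetime(2020, 1, 3)

missed_by_between = not between(hire_day, hired, retired)
caught_by_half_open = employed_on(hire_day, hired, retired)
```

With BETWEEN, the hire day is missed because midnight is earlier than 09:00; the half-open comparison counts it.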

Try limiting the scope of your date table. In this example I have a table of dates named TallyStickDT.
SELECT dt, COUNT(name)
FROM (
SELECT dt
FROM tallystickdt
WHERE dt >= (SELECT MIN(hired) FROM #employees)
AND dt <= GETDATE()
) A
LEFT OUTER JOIN #employees E ON A.dt >= E.Hired AND A.dt <= e.retired
GROUP BY dt
ORDER BY dt

Find the max date to last one year transaction for each group

I need a SQL Server query that finds, for each id, its volume over the last 1 year of transactions, counted back from the latest date for that id.
For example, below is my data:
For each id I need the last 1 year of transactions counted back from that id's latest entry. As you can see from the snippet, for id 1 the latest date is 7/31/2020, so I need the entries within 1 year before that date for that id. The highlighted row is excluded because its date is more than 1 year before the latest date for that id.
Similarly, for id 3 all the dates fall within one year of the latest date for that id.
I tried the query below and I can get the latest date for each id, but I am not sure how to extract all the rows for each id from the latest date back to one year earlier. I would appreciate it if someone could help me.
I am using Microsoft SQL Server and need a query that runs there. The table name is emp and it has millions of ids.
Select *
From emp as t
inner join (
Select tm.id, max(tm.date_tran) as MaxDate
From emp tm
Group by tm.id
) tm on t.id = tm.id and t.date_tran = tm.MaxDate

To exclude transactions where the difference between date_tran and the maximum date_tran for each id is greater than 1 year, try something like this:
;with max_cte(id, max_date) as (
Select id, max(date_tran)
From emp tm
Group by id )
Select *
From emp e
join max_cte mc on e.id=mc.id
and datediff(d, e.date_tran, mc.max_date)<=365;
Update: per comments, added volume. Thanks GMB :)
;with max_cte(id, date_tran, volume, max_date) as (
Select *, dateadd(year, -1, max(date_tran) over(partition by id)) max_date
From #emp tm)
Select id, sum(volume) sum_volume
From max_cte mc
where mc.date_tran>max_date
group by id;

You can do this with window functions:
select id, sum(volume) total_volume
from (
select t.*, max(date_tran) over(partition by id) max_date_tran
from mytable t
) t
where date_tran > dateadd(year, -1, max_date_tran)
group by id
Alternatively, you can use a correlated subquery for filtering:
select id, sum(volume) total_volume
from mytable t
where t.date_tran > (
select dateadd(year, -1, max(t1.date_tran))
from mytable t1
where t1.id = t.id
)
group by id
The second query would take advantage of an index on (id, date_tran).
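The filtering rule itself (keep rows dated after "latest date for that id minus one year", then sum the volume) can be sketched in Python as below. The sample rows are hypothetical apart from the 7/31/2020 latest date for id 1 mentioned in the question, and one_year_before is an illustrative helper that mimics the year subtraction, falling back to 28 February when the latest date is 29 February.

```python
from collections import defaultdict
from datetime import date

def one_year_before(d):
    """Same calendar day one year earlier; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        return d.replace(year=d.year - 1, day=28)

def last_year_volume(rows):
    """Sum volume per id over rows dated after (latest date for that id - 1 year)."""
    latest = {}
    for id_, date_tran, _ in rows:
        if id_ not in latest or date_tran > latest[id_]:
            latest[id_] = date_tran
    totals = defaultdict(int)
    for id_, date_tran, volume in rows:
        if date_tran > one_year_before(latest[id_]):
            totals[id_] += volume
    return dict(totals)

# Hypothetical sample data: (id, date_tran, volume)
rows = [
    (1, date(2020, 7, 31), 10),
    (1, date(2019, 12, 31), 20),
    (1, date(2019, 6, 30), 40),   # more than a year before 2020-07-31, excluded
    (3, date(2020, 5, 31), 5),
    (3, date(2020, 1, 31), 7),
]
totals = last_year_volume(rows)
```

Note that the strict comparison excludes a row dated exactly one year before the latest date; a variant using >= (or a 365-day DATEDIFF) would treat that boundary row differently.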

This should do the trick for you:
SELECT
*
FROM
emp
JOIN
(
SELECT
MAX(date_tran) max_date_tran
, Id
FROM
emp
GROUP BY
id
) emp2
ON emp2.Id = emp.Id
AND DATEADD(YEAR, -1, emp2.max_date_tran) <= emp.date_tran;

Your code is good. Just add the DATEDIFF function to the join condition to keep only the transactions within one year of the latest one, like the following:
Select *
From emp as t
inner join ( Select id as id, max(date_tran) as maxdate
From emp tm
Group by id
) tm on t.id = tm.id and datediff(d, t.date_tran, tm.maxdate)<=365;

Oracle query to find employee who has taken maximum number of leaves in last 1 month

I have these tables with the following columns :
Employee24( EMPLOYEEID, FIRSTNAME, LASTNAME, GENDER );
Leave25( EMPLOYEEID,LEAVEID, LEAVETYPE, STARTDATE, ENDDATE, NOOFDAYS );
I want to write a query to find the employee who has taken the maximum number of leaves in the last month:
SELECT *
FROM EMPLOYEE24
WHERE EMPLOYEEID IN (SELECT EMPLOYEEID
FROM LEAVE25
WHERE STARTDATE < ADD_MONTHS(SYSDATE, -1));

If your DB version is 12c, you can use the row limiting clause for Top-N queries as below:
SELECT e.*, l.max_leaves
FROM (SELECT employeeid, count(1) as max_leaves
FROM LEAVE25
WHERE startdate >= add_months(sysdate, -1)
GROUP BY employeeid
) l JOIN
EMPLOYEE24 e
ON ( e.employeeid = l.employeeid )
ORDER BY l.max_leaves DESC
FETCH FIRST 1 ROWS WITH TIES; -- including the same highest leave owners
If the version is 11g, then use DENSE_RANK and COUNT with a nested query as below:
SELECT e.*, l.max_leaves
FROM (SELECT employeeid, count(1) as max_leaves,
dense_rank() over (order by count(1) desc) dr
FROM LEAVE25
WHERE startdate >= add_months(sysdate, -1)
GROUP BY employeeid
) l JOIN
EMPLOYEE24 e
ON ( e.employeeid = l.employeeid )
WHERE l.dr = 1;
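Both queries boil down to "count leaves per employee since a cutoff, then keep everyone tied for the highest count". A minimal Python sketch of that logic follows; the leave rows and the cutoff date are hypothetical, and the cutoff is passed in explicitly instead of being computed from the current date.

```python
from collections import Counter
from datetime import date

def top_leave_takers(leaves, cutoff):
    """Return (employee ids with the highest leave count since cutoff, that count)."""
    counts = Counter(emp_id for emp_id, start in leaves if start >= cutoff)
    if not counts:
        return [], 0
    best = max(counts.values())
    # Keep every employee tied for first place, like WITH TIES or DENSE_RANK = 1
    return sorted(e for e, c in counts.items() if c == best), best

# Hypothetical rows: (EMPLOYEEID, STARTDATE)
leaves = [
    (1, date(2024, 3, 2)), (1, date(2024, 3, 10)),
    (2, date(2024, 3, 5)), (2, date(2024, 3, 20)),
    (3, date(2024, 3, 7)),
    (3, date(2024, 1, 15)),  # before the cutoff, ignored
]
winners, max_leaves = top_leave_takers(leaves, cutoff=date(2024, 2, 25))
```

Employees 1 and 2 both come back here because they are tied, which is the behaviour the comment "including the same highest leave owners" describes.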

Query to apply rate from the interval of dates

Let's say I have two tables in my oracle database
Table A : stDate, endDate, salary
For example:
03/02/2010 28/02/2010 2000
05/03/2012 29/03/2012 2500
Table B : DateOfActivation, rate
For example:
01/01/2010 1.023
01/11/2011 1.063
01/01/2012 1.075
I would like a SQL query that displays the sum of the salaries in table A, with each salary multiplied by the rate from table B that applies given the activation date.
Here, for the first salary the correct rate is the first one (1.023), because the second rate has an activation date later than the stDate to endDate interval.
For the second salary, the third rate applies because its activation date precedes the date interval of the second salary.
So the sum is: 2000 * 1.023 + 2500 * 1.075 = 4733.5
I hope that is clear.
Thanks

Assuming the rate must be active before the beginning of the interval (i.e. DateOfActivation < stDate), you could do something like this:
SELECT SUM(salary*
(SELECT rate from TableB WHERE DateOfActivation=
(SELECT MAX(DateOfActivation) FROM TableB WHERE DateOfActivation < stDate)
)) FROM TableA;
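The lookup rule (latest rate whose activation date is strictly before stDate) can be verified against the numbers in the question with a short Python sketch. The data is taken from the question; the function name rate_for is illustrative, and Decimal is used only so the expected total compares exactly.

```python
from bisect import bisect_left
from datetime import date
from decimal import Decimal

# Table B from the question: (DateOfActivation, rate), sorted by date
rates = [
    (date(2010, 1, 1), Decimal("1.023")),
    (date(2011, 11, 1), Decimal("1.063")),
    (date(2012, 1, 1), Decimal("1.075")),
]
# Table A from the question: (stDate, endDate, salary)
salaries = [
    (date(2010, 2, 3), date(2010, 2, 28), Decimal(2000)),
    (date(2012, 3, 5), date(2012, 3, 29), Decimal(2500)),
]

def rate_for(st_date):
    """Latest rate whose activation date is strictly before st_date."""
    dates = [d for d, _ in rates]
    i = bisect_left(dates, st_date)  # number of activation dates < st_date
    if i == 0:
        raise ValueError("no rate active before %s" % st_date)
    return rates[i - 1][1]

total = sum(salary * rate_for(st) for st, _, salary in salaries)
```

This gives 4733.5, matching the expected sum. Unlike the query, which would silently skip a salary with no earlier rate (the subquery returns NULL), the sketch raises an error in that case.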

This problem becomes much easier if DateOfActivation is a true effective-dated table with rate_start_date and rate_end_date, such that no new row can be created whose start or end date lies within an existing rate_start_date to rate_end_date pair. The currently active row typically has a NULL rate_end_date. You would also likely want an EMP_ID on the salary table so the rows can be summed to finish the calculation. The following cases need to be considered:
Start Date is between rate_start and rate_end
End Date is between rate_start and rate_end
Rate_start and Rate_end are between start_date and end_date (sandwiched)
If you run the following snippet you will see we can artificially create our rate_end_dates as follows:
SELECT D.ACTIVEDATE, D.RATE, NVL(MIN(E.ACTIVEDATE)-1,SYSDATE) ENDDATE
FROM XX_DATEOFACTIVATION D, XX_DATEOFACTIVATION E
WHERE D.ACTIVEDATE<E.ACTIVEDATE(+)
GROUP BY D.ACTIVEDATE, D.RATE
ORDER BY D.ACTIVEDATE
Proposed code is as follows:
SELECT DISTINCT * FROM
(SELECT S.*, T.RATE, S.SALARY*T.RATE
FROM XX_SAL_HIST S,
(SELECT D.ACTIVEDATE, D.RATE, NVL(MIN(E.ACTIVEDATE)-1,SYSDATE) ENDDATE
FROM XX_DATEOFACTIVATION D, XX_DATEOFACTIVATION E
WHERE D.ACTIVEDATE<E.ACTIVEDATE(+)
GROUP BY D.ACTIVEDATE, D.RATE) T -- creating synthetic rate_end_date
WHERE S.STDATE BETWEEN T.ACTIVEDATE AND T.ENDDATE)
UNION
(SELECT S.*, T.RATE, S.SALARY*T.RATE
FROM XX_SAL_HIST S,
(SELECT D.ACTIVEDATE, D.RATE, NVL(MIN(E.ACTIVEDATE)-1,SYSDATE) ENDDATE
FROM XX_DATEOFACTIVATION D, XX_DATEOFACTIVATION E
WHERE D.ACTIVEDATE<E.ACTIVEDATE(+)
GROUP BY D.ACTIVEDATE, D.RATE) T -- creating synthetic rate_end_date
WHERE S.ENDDATE BETWEEN T.ACTIVEDATE AND T.ENDDATE)
UNION
(SELECT S.*, T.RATE, S.SALARY*T.RATE
FROM XX_SAL_HIST S,
(SELECT D.ACTIVEDATE, D.RATE, NVL(MIN(E.ACTIVEDATE)-1,SYSDATE) ENDDATE
FROM XX_DATEOFACTIVATION D, XX_DATEOFACTIVATION E
WHERE D.ACTIVEDATE<E.ACTIVEDATE(+)
GROUP BY D.ACTIVEDATE, D.RATE) T -- creating synthetic rate_end_date
WHERE T.ACTIVEDATE BETWEEN S.STDATE AND S.ENDDATE)

The first thing to do is to transform Table B (Table2 in the query) to have, for each row, the start and end date:
Select DateOfActivation AS startDate
, rate
, NVL(LEAD(DateOfActivation, 1) OVER (ORDER BY DateOfActivation)
, TO_DATE('9999/12/31', 'yyyy/mm/dd')) AS endDate
From Table2
Now we can join this table with Table A (Table1 in the query):
WITH Rates AS (
Select DateOfActivation AS startDate
, rate
, NVL(LEAD(DateOfActivation, 1) OVER (ORDER BY DateOfActivation)
, TO_DATE('9999/12/31', 'yyyy/mm/dd')) AS endDate
From Table2)
Select SUM(s.salary * r.rate)
From Rates r
INNER JOIN Table1 s ON s.stDate < r.endDate AND s.endDate > r.startDate
The JOIN condition matches every row in Table A whose interval at least partially overlaps the activation period of the rate. If the salary interval must instead lie entirely within the rate period (bounds included), you can alter it as in the following query:
WITH Rates AS (
Select DateOfActivation AS startDate
, rate
, NVL(LEAD(DateOfActivation, 1) OVER (ORDER BY DateOfActivation)
, TO_DATE('9999/12/31', 'yyyy/mm/dd')) AS endDate
From Table2)
Select SUM(s.salary * r.rate)
From Rates r
INNER JOIN Table1 s ON s.stDate >= r.startDate AND s.endDate <= r.endDate
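The difference between the two JOIN conditions is easiest to see on an interval that straddles a rate change. The Python sketch below restates both predicates; the rate periods come from the question's Table B (with the 9999-12-31 fallback end date used above), while the straddling salary interval is a hypothetical example.

```python
from datetime import date

def overlaps(s_start, s_end, r_start, r_end):
    """Salary interval shares at least part of the rate period (strict bounds)."""
    return s_start < r_end and s_end > r_start

def contained(s_start, s_end, r_start, r_end):
    """Salary interval lies entirely inside the rate period (bounds included)."""
    return s_start >= r_start and s_end <= r_end

# Rate periods derived from the question's activation dates
period_1063 = (date(2011, 11, 1), date(2012, 1, 1))
period_1075 = (date(2012, 1, 1), date(9999, 12, 31))

# Hypothetical salary interval that straddles the 2012-01-01 rate change
straddling = (date(2011, 12, 15), date(2012, 1, 15))
# Second salary row from the question
second_salary = (date(2012, 3, 5), date(2012, 3, 29))

overlap_hits = [overlaps(*straddling, *p) for p in (period_1063, period_1075)]
contained_hits = [contained(*straddling, *p) for p in (period_1063, period_1075)]
```

A straddling interval matches two rate periods under the overlap condition, so its salary would be summed twice, and it matches none under the containment condition, so it would be dropped. Neither effect shows up with the question's sample data, where each salary falls inside a single period.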

SQL Query to calculate median and group by

I have following table.
DECLARE @TBL_RESULT Table
(
ID varchar(10),
CreateDate DateTime,
PEOPLE_CODE_ID varchar(10),
CONVERSION_DATE DateTime,
CAMPUS varchar(20),
DAYS_TOOK int
);
This table has records of all the leads that were received and converted, from January 01, 2013 to date.
I initially needed to find the median time it took to convert leads that arrived in the last 10 weeks, grouped by campus. I was able to do that using the SQL query below:
WITH CTE_RESULT
AS ( SELECT *
FROM @TBL_RESULT
WHERE CreateDate > DATEADD(WEEK, -10, GETDATE())
)
SELECT Campus ,
AVG(DAYS_TOOK) AS MedianTime
FROM ( SELECT CAMPUS ,
Days_Took ,
ROW_NUMBER() OVER ( PARTITION BY Campus ORDER BY Days_Took ASC ) AS AgeRank ,
COUNT(*) OVER ( PARTITION BY CAMPUS ) AS CampusCount
FROM CTE_RESULT
) x
WHERE x.AgeRank IN ( x.CampusCount / 2 + 1, ( x.CampusCount + 1 ) / 2 )
GROUP BY x.Campus
I now need to plot this trend on a graph, i.e. compute the median for each previous 10-week bucket and plot it on a line chart where each line is one campus (grouped by campus).
Is a cursor my only option? I would find the leads of the first 10 weeks starting from Jan 01, run the above SQL query to get the median, push it to a temp table, then find the next 10 weeks, and so on.
Or is there anything better I can do?

Without trying to optimise your query, if you need to produce the same result across multiple 10-WEEK periods, you can expand your current (10 week ago to today) ranges to as many ranges as required, threading a PeriodEndDate throughout the query as shown below.
Query (SQL Fiddle, MS SQL Server 2012):
DECLARE @TBL_RESULT Table
(
ID varchar(10),
CreateDate DateTime,
PEOPLE_CODE_ID varchar(10),
CONVERSION_DATE DateTime,
CAMPUS varchar(20),
DAYS_TOOK int
);
-- fill the table with some dummy data from 2013-01-01
INSERT @TBL_RESULT (CreateDate, Campus, Days_Took)
SELECT DATEADD(D, A.Number, '20130101'), 'Campus' + Right(B.Number, 10),
ABS(CAST(NEWID() AS binary(6)) % 130) + 1
FROM master..spt_values A
JOIN master..spt_values B on B.type='P' and B.number < 50 -- 50 campuses
WHERE A.type='P'
AND DATEADD(D, A.Number, '20130101') <= GetDate();
-- This first CTE is used to create the required number of 10-week periods
WITH N(NUMBER) AS (
SELECT 0
union all
select number+1 from N
where Number <= DATEDIFF(WEEK, '20130101', GETDATE())
),
-- and from below here it's your query with the PeriodEndDate threaded through
CTE_RESULT AS (
SELECT DATEADD(WEEK, -Number, GETDATE()) PeriodEndDate,
T.*
FROM @TBL_RESULT T
CROSS JOIN N
-- you see the range built up dynamically here
WHERE CreateDate > DATEADD(WEEK, -Number-10, GETDATE())
AND CreateDate < DATEADD(WEEK, -Number, GETDATE()) +1
)
SELECT PeriodEndDate, Campus ,
AVG(DAYS_TOOK) AS MedianTime
FROM (
SELECT PeriodEndDate, CAMPUS ,
Days_Took ,
ROW_NUMBER() OVER ( PARTITION BY PeriodEndDate, Campus ORDER BY Days_Took ASC ) AS AgeRank ,
COUNT(*) OVER ( PARTITION BY PeriodEndDate, CAMPUS ) AS CampusCount
FROM CTE_RESULT
) x
WHERE x.AgeRank IN ( x.CampusCount / 2 + 1, ( x.CampusCount + 1 ) / 2 )
GROUP BY x.PeriodEndDate, x.Campus
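ORDER BY x.PeriodEndDate, x.Campus
-- the recursive CTE N exceeds the default limit of 100 levels once more than
-- about 100 weeks have passed since 2013-01-01; 0 lifts the limit
OPTION (MAXRECURSION 0);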

It seems that you solved the hard part of the problem.
To get what you want, you need to introduce a grouping variable. In this case, I measure the number of weeks in the past and divide by 10 (SQL Server does integer division so this produces an integer).
You then use this in the PARTITION BY and GROUP BY clauses:
WITH CTE_RESULT AS (
SELECT t.*,
DATEDIFF(week, CreateDate, GETDATE()) / 10 as groupnum
FROM @TBL_RESULT t
)
SELECT Campus, groupnum, MIN(CreateDate), MAX(CreateDate),
AVG(DAYS_TOOK) AS MedianTime
FROM (SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY groupnum, Campus ORDER BY Days_Took ASC ) AS AgeRank ,
COUNT(*) OVER (PARTITION BY groupnum, CAMPUS) AS CampusCount
FROM CTE_RESULT t
) x
WHERE x.AgeRank IN ( x.CampusCount / 2 + 1, ( x.CampusCount + 1 ) / 2 )
GROUP BY x.Campus, groupnum
I haven't tested this, so it might have a syntax error or two.
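The rank filter used in these queries, AgeRank IN (n / 2 + 1, (n + 1) / 2), picks the single middle row when n is odd and the two middle rows when n is even. A minimal Python sketch of that trick, with // standing in for SQL Server's integer division; the function name median_by_rank is illustrative:

```python
from statistics import median

def median_by_rank(values):
    """Median using the same rank filter as the SQL: ranks n/2+1 and (n+1)/2."""
    ordered = sorted(values)
    n = len(ordered)
    wanted = {n // 2 + 1, (n + 1) // 2}  # 1-based ranks, integer division
    middle = [v for rank, v in enumerate(ordered, start=1) if rank in wanted]
    return sum(middle) / len(middle)

odd_case = median_by_rank([5, 1, 3])       # one middle row (rank 2)
even_case = median_by_rank([1, 2, 3, 4])   # two middle rows (ranks 2 and 3)
```

One thing to watch in the SQL itself: DAYS_TOOK is an int, and AVG over an int column returns an int in SQL Server, so an even-sized group whose middle values are 2 and 3 would report 2 rather than 2.5 unless the column is converted first, for example with AVG(1.0 * DAYS_TOOK).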
