Calculate data in Pivot - sql

I have the following SQL table with 4 columns.
Table Name: tblTimeTransaction
Columns: EmployeeNumber, TransactionDate, CodeType, TimeShowninSeconds
CodeType takes the values REG, OT1, OT2 and OT3.
I want to show the data like this with a pivot, in 15-day increments from Jan 1, 2020 onwards:
Employee Number | Effective Date | REG | OT1 | OT2 | OT3
E12345 | Between 10-1 till 10-15 | 200 | 100 | 50 | 45
E15000 | Between 10-1 till 10-15 | 400 | 600 | 903 | 49
E12345 | Between 10-15 till 10-31 | 200 | 100 | 50 | 45
E15000 | Between 10-15 till 10-31 | 400 | 600 | 903 | 49
E12346 | Between 11-1 till 11-15 | 4200 | 100 | 50 | 45
E15660 | Between 11-1 till 11-15 | 1200 | 600 | 6903 | 49
My SQL Code so far:
SELECT
Employee Number,
[TransactionDate] as [Effective Date],
[REG],
[OT1],
[OT2],
[OT3]
FROM
( SELECT Employee Number, TransactionDate, CodeType, TimeInSeconds
FROM [tblTimetransaction]
) ps
PIVOT
( SUM (TimeInSeconds)
FOR CodeType IN ( [REG], [OT1], [OT2], [OT3])
) AS pvt
where TransactionDate between '2020-01-01' and '2020-12-31'

If I follow you correctly, you can truncate effective_date to either the 1st or the 15th of the month, depending on its day of the month, then use conditional aggregation to compute the total time_in_seconds for each code_type:
select employee_number,
datefromparts(year(effective_date), month(effective_date), case when day(effective_date) < 15 then 1 else 15 end) as dt,
sum(case when code_type = 'REG' then time_in_seconds else 0 end) as reg,
sum(case when code_type = 'OT1' then time_in_seconds else 0 end) as ot1,
sum(case when code_type = 'OT2' then time_in_seconds else 0 end) as ot2,
sum(case when code_type = 'OT3' then time_in_seconds else 0 end) as ot3
from tblTimetransaction
where effective_date >= '20200101' and effective_date < '20210101'
group by employee_number,
datefromparts(year(effective_date), month(effective_date), case when day(effective_date) < 15 then 1 else 15 end)
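The same half-month bucketing can be tried end to end with a small script. This is only a sketch: it uses Python's built-in sqlite3 module rather than SQL Server, so strftime stands in for datefromparts, and the column names (EmployeeNumber, TransactionDate, CodeType, TimeInSeconds) and sample rows are assumptions based on the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblTimeTransaction ("
    "EmployeeNumber TEXT, TransactionDate TEXT, CodeType TEXT, TimeInSeconds INTEGER)"
)
# Made-up sample rows, dates stored as ISO text.
conn.executemany(
    "INSERT INTO tblTimeTransaction VALUES (?, ?, ?, ?)",
    [
        ("E12345", "2020-10-03", "REG", 120),
        ("E12345", "2020-10-14", "REG", 80),
        ("E12345", "2020-10-14", "OT1", 100),
        ("E12345", "2020-10-15", "REG", 200),
        ("E12345", "2020-10-31", "OT2", 50),
        ("E15000", "2020-10-01", "OT3", 49),
    ],
)

# Bucket label: the 1st of the month for days 1-14, the 15th for day 15 onward.
bucket = (
    "strftime('%Y-%m-', TransactionDate) || "
    "CASE WHEN CAST(strftime('%d', TransactionDate) AS INTEGER) < 15 "
    "THEN '01' ELSE '15' END"
)


def total(code):
    """Build one conditional-aggregation column for a code type."""
    return f"SUM(CASE WHEN CodeType = '{code}' THEN TimeInSeconds ELSE 0 END) AS {code}"


query = f"""
    SELECT EmployeeNumber, {bucket} AS PeriodStart,
           {total('REG')}, {total('OT1')}, {total('OT2')}, {total('OT3')}
    FROM tblTimeTransaction
    WHERE TransactionDate >= '2020-01-01' AND TransactionDate < '2021-01-01'
    GROUP BY EmployeeNumber, {bucket}
    ORDER BY EmployeeNumber, PeriodStart
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Note that with the `< 15` test a transaction dated the 15th lands in the second bucket; change the comparison if the 15th should belong to the first half.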

Related

Oracle Pivot/Decode

Sample Table
EmployeeID | AssignmentID | WageCode | CompanyName | BillRate | BillTotal
1 | 1 | Regular | CompanyOne | 10 | 400
1 | 2 | Regular | CompanyTwo | 11 | 440
1 | 1 | Overtime | CompanyOne | 15 | 150
1 | 1 | Mileage | CompanyOne | 0 | 20
2 | 3 | Regular | CompanyThree| 20 | 800
2 | 3 | Regular | CompanyThree| 20 | 800
2 | 3 | Overtime | CompanyThree| 30 | 90
2 | 3 | Mileage | CompanyThree| 0 | 60
I want to only show rows with a WageCode of 'Regular', grouped by EmployeeID, WageCode, AssignmentID, CompanyName and BillRate, and pivot the other wage codes into columns.
The final result should look like this:
EmployeeID | AssignmentID | CompanyName | RegBillRate | RegBill | OTBillRate | OTBill | MileageBill
1 | 1 | CompanyOne | 10 | 400 | 15 | 150 | 20
1 | 2 | CompanyTwo | 11 | 440 | 0 | 0 | 0
2 | 3 | CompanyThree| 20 | 1600 | 30 | 90 | 60
What's a cleaner way to do this that's not a bunch of with statements like this:
with regular as
(select EmployeeID, AssignmentID, CompanyName, BillRate, sum(BillTotal) Total from SampleTable where WageCode = 'Regular' group by EmployeeID, AssignmentID, CompanyName, BillRate
),
overtime as
(select EmployeeID, AssignmentID, CompanyName, BillRate, sum(BillTotal) Total from SampleTable where WageCode = 'Overtime' group by EmployeeID, AssignmentID, CompanyName, BillRate
),
mileage as
(select EmployeeID, AssignmentID, CompanyName, BillRate, sum(BillTotal) Total from SampleTable where WageCode = 'Mileage' group by EmployeeID, AssignmentID, CompanyName, BillRate
)
select r.*, o.BillRate, o.Total, m.Total
from regular r
left outer join overtime o
on r.EmployeeID = o.EmployeeID and r.AssignmentID = o.AssignmentID and r.CompanyName = o.CompanyName and r.BillRate = o.BillRate
left outer join mileage m
on r.EmployeeID = m.EmployeeID and r.AssignmentID = m.AssignmentID and r.CompanyName = m.CompanyName and r.BillRate = m.BillRate
The query above is paraphrased and probably doesn't work.
What's a better way to do this with some combination of decode and pivot? Is a single pivot table possible?
PIVOT: The Oracle PIVOT clause, available since Oracle 11g, lets you write cross-tabulation queries: it aggregates your results and rotates rows into columns.
DECODE: The Oracle/PLSQL DECODE function has the functionality of an IF-THEN-ELSE statement.
For your use case, you can use pivot and decode in the following way:
SELECT
EmployeeID, AssignmentID, CompanyName,
decode(REG_BILLRATE, NULL, 0, REG_BILLRATE) AS REG_BILLRATE,
decode(REG_FILL, NULL, 0, REG_FILL) AS REG_FILL,
decode(OT_BILLRATE, NULL, 0, OT_BILLRATE) AS OT_BILLRATE,
decode(OT_FILL, NULL, 0, OT_FILL) AS OT_FILL,
decode(MILEAGE_FILL, NULL, 0, MILEAGE_FILL) AS MILEAGE_FILL
FROM nbitra.tmp
pivot
(
max(BillRate) AS BillRate, sum(BillTotal) AS Fill
for WageCode IN ('Regular' Reg , 'Overtime' OT , 'Mileage' Mileage )
);
Note: The code replaces nulls with 0.
I think you just want conditional aggregation:
select EmployeeID, AssignmentID, CompanyName,
-- max (not sum) for the rates, so repeated rows with the same rate are not added up
max(case when WageCode = 'Regular' then billrate end) as regular_billrate,
sum(case when WageCode = 'Regular' then BillTotal end) as regular_billtotal,
max(case when WageCode = 'Overtime' then billrate end) as ot_billrate,
sum(case when WageCode = 'Overtime' then BillTotal end) as ot_billtotal,
max(case when WageCode = 'Mileage' then billrate end) as mileage_billrate,
sum(case when WageCode = 'Mileage' then BillTotal end) as mileage_billtotal
from SampleTable st
group by EmployeeID, AssignmentID, CompanyName;
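As a quick check of the conditional-aggregation approach against the sample data from the question, here is a minimal sketch. It runs on Python's built-in sqlite3 instead of Oracle (an assumption made only so the example is self-contained), takes MAX for the rates so the duplicated Regular row of employee 2 is not double-counted, and wraps each column in COALESCE to get the zeros shown in the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SampleTable (EmployeeID INTEGER, AssignmentID INTEGER, "
    "WageCode TEXT, CompanyName TEXT, BillRate INTEGER, BillTotal INTEGER)"
)
# The eight rows of the sample table in the question.
conn.executemany(
    "INSERT INTO SampleTable VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "Regular", "CompanyOne", 10, 400),
        (1, 2, "Regular", "CompanyTwo", 11, 440),
        (1, 1, "Overtime", "CompanyOne", 15, 150),
        (1, 1, "Mileage", "CompanyOne", 0, 20),
        (2, 3, "Regular", "CompanyThree", 20, 800),
        (2, 3, "Regular", "CompanyThree", 20, 800),
        (2, 3, "Overtime", "CompanyThree", 30, 90),
        (2, 3, "Mileage", "CompanyThree", 0, 60),
    ],
)

query = """
    SELECT EmployeeID, AssignmentID, CompanyName,
           COALESCE(MAX(CASE WHEN WageCode = 'Regular'  THEN BillRate  END), 0) AS RegBillRate,
           COALESCE(SUM(CASE WHEN WageCode = 'Regular'  THEN BillTotal END), 0) AS RegBill,
           COALESCE(MAX(CASE WHEN WageCode = 'Overtime' THEN BillRate  END), 0) AS OTBillRate,
           COALESCE(SUM(CASE WHEN WageCode = 'Overtime' THEN BillTotal END), 0) AS OTBill,
           COALESCE(SUM(CASE WHEN WageCode = 'Mileage'  THEN BillTotal END), 0) AS MileageBill
    FROM SampleTable
    GROUP BY EmployeeID, AssignmentID, CompanyName
    ORDER BY EmployeeID, AssignmentID
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The three rows returned match the "final result" table in the question.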

Vertica dynamic pivot/transform

I have a table in vertica :
id Timestamp Mask1 Mask2
-------------------------------------------
1 11:30 50 100
1 11:35 52 101
2 12:00 53 102
3 09:00 50 100
3 22:10 52 105
. . . .
. . . .
Which I want to transform into :
id rows 09:00 11:30 11:35 12:00 22:10 .......
--------------------------------------------------------------
1 Mask1 Null 50 52 Null Null .......
Mask2 Null 100 101 Null Null .......
2 Mask1 Null Null Null 53 Null .......
Mask2 Null Null Null 102 Null .......
3 Mask1 50 Null Null Null 52 .......
Mask2 100 Null Null Null 105 .......
The dots (...) indicate that I have many records.
Timestamp covers a whole day in hours:minutes:seconds format, from 00:00:00 to 24:00:00 (I have used just hours:minutes in the question).
I have shown just two extra columns, Mask1 and Mask2, but I have about 200 Mask columns to work with.
I have shown 5 records, but in reality I have about a million records.
What I have tried so far:
Dumping each records based on id in a csv file.
Applying transpose in python pandas.
Joining the transposed tables.
A generic solution may be pivoting in Vertica (or a UDTF), but I am fairly new to this database.
I have been struggling with this logic for a couple of days. Can anyone please help me? Thanks a lot.
Below is the solution as I would code it for just the time values that you have in your data examples.
If you really want to display all 86400 values from '00:00:00' through '23:59:59', though, you won't be able to: Vertica's maximum number of columns is 1600.
You could, however, play with the Vertica function TIME_SLICE(timestamp::TIMESTAMP,1,'MINUTE')::TIME
(TIME_SLICE takes a timestamp as input and returns a timestamp, so you have to cast (::) back and forth) to reduce the number of columns to 1440 ...
In any case, I would start with SELECT DISTINCT timestamp FROM input ORDER BY 1; and then, for the final query, generate one line per timestamp found (hoping there won't be more than 1598 ...), like the ones below for the timestamps in your data:
, SUM(CASE timestamp WHEN '09:00' THEN val END) AS "09:00"
, SUM(CASE timestamp WHEN '11:30' THEN val END) AS "11:30"
, SUM(CASE timestamp WHEN '11:35' THEN val END) AS "11:35"
, SUM(CASE timestamp WHEN '12:00' THEN val END) AS "12:00"
, SUM(CASE timestamp WHEN '22:10' THEN val END) AS "22:10"
SQL in general has no variable number of output columns from any given query. If the number of final columns varies depending on the data, you will have to generate your final query from the data, and then run it.
Welcome to SQL and relational databases ..
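The "generate the query from the data, then run it" step could look roughly like the sketch below. It is not Vertica-specific: it uses Python's built-in sqlite3 as a stand-in database, and the table name `vertical`, the column names `mask_name`, `ts` and `val`, and the two quoting helpers are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vertical (id INTEGER, mask_name TEXT, ts TEXT, val INTEGER)")
# Already-verticalised sample data: one row per id, mask and timestamp.
conn.executemany(
    "INSERT INTO vertical VALUES (?, ?, ?, ?)",
    [
        (1, "Mask1", "11:30", 50), (1, "Mask1", "11:35", 52),
        (1, "Mask2", "11:30", 100), (1, "Mask2", "11:35", 101),
        (2, "Mask1", "12:00", 53), (2, "Mask2", "12:00", 102),
        (3, "Mask1", "09:00", 50), (3, "Mask1", "22:10", 52),
        (3, "Mask2", "09:00", 100), (3, "Mask2", "22:10", 105),
    ],
)


def quote_literal(text):
    """Quote a value as a SQL string literal."""
    return "'" + text.replace("'", "''") + "'"


def quote_ident(name):
    """Quote a value as a SQL identifier (used for the column alias)."""
    return '"' + name.replace('"', '""') + '"'


# Step 1: find out which timestamps exist in the data.
stamps = [r[0] for r in conn.execute("SELECT DISTINCT ts FROM vertical ORDER BY 1")]

# Step 2: generate one SUM(CASE ...) output column per timestamp found.
columns = ",\n".join(
    f"  SUM(CASE ts WHEN {quote_literal(s)} THEN val END) AS {quote_ident(s)}"
    for s in stamps
)
sql = (
    f"SELECT id, mask_name,\n{columns}\n"
    "FROM vertical GROUP BY id, mask_name ORDER BY id, mask_name"
)

# Step 3: run the generated query.
cursor = conn.execute(sql)
header = [d[0] for d in cursor.description]
result = cursor.fetchall()
print(header)
for row in result:
    print(row)
```

The same two-pass idea applies with any client language: the first query only discovers the column list, and the second query is plain text assembled from it.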
Here's the complete script for your data. I pivot vertically first, along the "Mask-n" column names, and then I re-pivot horizontally, along the timestamps.
\pset null Null
-- ^ this is a vsql command to display nulls with the "Null" string
WITH
-- your input, not in final query
input(id,Timestamp,Mask1,Mask2) AS (
SELECT 1 , TIME '11:30' , 50 , 100
UNION ALL SELECT 1 , TIME '11:35' , 52 , 101
UNION ALL SELECT 2 , TIME '12:00' , 53 , 102
UNION ALL SELECT 3 , TIME '09:00' , 50 , 100
UNION ALL SELECT 3 , TIME '22:10' , 52 , 105
)
,
-- real WITH clause starts here
-- need an index for your 200 masks
i(i) AS (
SELECT MICROSECOND(ts) FROM (
SELECT TIMESTAMPADD(MICROSECOND, 1,TIMESTAMP '2000-01-01') AS tm
UNION ALL SELECT TIMESTAMPADD(MICROSECOND,200,TIMESTAMP '2000-01-01') AS tm
)x
TIMESERIES ts AS '1 MICROSECOND' OVER(ORDER BY tm)
)
,
-- verticalised masks
vertical AS (
SELECT
id
, i
, CASE i
WHEN 1 THEN 'Mask001'
WHEN 2 THEN 'Mask002'
WHEN 200 THEN 'Mask200'
END AS rows
, timestamp
, CASE i
WHEN 1 THEN Mask1
WHEN 2 THEN Mask2
WHEN 200 THEN 0 -- no mask200 present
END AS val
FROM input CROSS JOIN i
WHERE i <=2 -- only 2 masks present currently
)
-- test the vertical CTE ...
-- SELECT * FROM vertical order by id,rows,timestamp;
-- out id | i | rows | timestamp | val
-- out ----+---+---------+-----------+-----
-- out 1 | 1 | Mask001 | 11:30:00 | 50
-- out 1 | 1 | Mask001 | 11:35:00 | 52
-- out 1 | 2 | Mask002 | 11:30:00 | 100
-- out 1 | 2 | Mask002 | 11:35:00 | 101
-- out 2 | 1 | Mask001 | 12:00:00 | 53
-- out 2 | 2 | Mask002 | 12:00:00 | 102
-- out 3 | 1 | Mask001 | 09:00:00 | 50
-- out 3 | 1 | Mask001 | 22:10:00 | 52
-- out 3 | 2 | Mask002 | 09:00:00 | 100
-- out 3 | 2 | Mask002 | 22:10:00 | 105
SELECT
id
, rows
, SUM(CASE timestamp WHEN '09:00' THEN val END) AS "09:00"
, SUM(CASE timestamp WHEN '11:30' THEN val END) AS "11:30"
, SUM(CASE timestamp WHEN '11:35' THEN val END) AS "11:35"
, SUM(CASE timestamp WHEN '12:00' THEN val END) AS "12:00"
, SUM(CASE timestamp WHEN '22:10' THEN val END) AS "22:10"
FROM vertical
GROUP BY
id
, rows
ORDER BY
id
, rows
;
-- out Null display is "Null".
-- out id | rows | 09:00 | 11:30 | 11:35 | 12:00 | 22:10
-- out ----+---------+-------+-------+-------+-------+-------
-- out 1 | Mask001 | Null | 50 | 52 | Null | Null
-- out 1 | Mask002 | Null | 100 | 101 | Null | Null
-- out 2 | Mask001 | Null | Null | Null | 53 | Null
-- out 2 | Mask002 | Null | Null | Null | 102 | Null
-- out 3 | Mask001 | 50 | Null | Null | Null | 52
-- out 3 | Mask002 | 100 | Null | Null | Null | 105
-- out (6 rows)
-- out
-- out Time: First fetch (6 rows): 28.143 ms. All rows formatted: 28.205 ms
You can use union all to unpivot the data and then conditional aggregation:
select id, which,
max(case when timestamp >= '09:00' and timestamp < '09:30' then mask end) as "09:00",
max(case when timestamp >= '09:30' and timestamp < '10:00' then mask end) as "09:30",
max(case when timestamp >= '10:00' and timestamp < '10:30' then mask end) as "10:00",
. . .
from ((select id, timestamp,
'Mask1' as which, Mask1 as mask
from t
) union all
(select id, timestamp, 'Mask2' as which, Mask2 as mask
from t
)
) t
group by t.id, t.which;
Note: This includes the id on each row. I strongly recommend doing that, but you could use:
select (case when which = 'Mask1' then id end) as id
If you really wanted to.
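For reference, here is a small runnable sketch of the UNION ALL unpivot followed by conditional aggregation over half-hour buckets. It uses Python's built-in sqlite3 rather than Vertica, stores the times as 'HH:MM' text (so plain string comparison orders them correctly), and only spells out four buckets; these are all simplifying assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, ts TEXT, Mask1 INTEGER, Mask2 INTEGER)")
# The five sample rows from the question.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "11:30", 50, 100),
        (1, "11:35", 52, 101),
        (2, "12:00", 53, 102),
        (3, "09:00", 50, 100),
        (3, "22:10", 52, 105),
    ],
)

query = """
    SELECT id, which,
           MAX(CASE WHEN ts >= '09:00' AND ts < '09:30' THEN mask END) AS "09:00",
           MAX(CASE WHEN ts >= '11:30' AND ts < '12:00' THEN mask END) AS "11:30",
           MAX(CASE WHEN ts >= '12:00' AND ts < '12:30' THEN mask END) AS "12:00",
           MAX(CASE WHEN ts >= '22:00' AND ts < '22:30' THEN mask END) AS "22:00"
    FROM (
        -- unpivot: one row per (id, ts, mask column)
        SELECT id, ts, 'Mask1' AS which, Mask1 AS mask FROM t
        UNION ALL
        SELECT id, ts, 'Mask2' AS which, Mask2 AS mask FROM t
    ) u
    GROUP BY id, which
    ORDER BY id, which
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Because 11:30 and 11:35 fall into the same half-hour bucket, MAX keeps only the larger of the two values for id 1; pick a different aggregate if that is not what you want.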

Count between two different dates from same column and table

I have a table with this information
ACCTCCODE | ACCTDESCRIPTION | ISSUEDATE
----------+-----------------+----------------
1031 | Blahdescription | 2018-03-11
1032 | Blahdescription | 2017-04-18
1033 | Blahdescription | 2018-04-15
1034 | Blahdescription | 2018-11-04
I want to count the rows whose dates fall within two different date ranges, with one count column per range. E.g.
ACCTCCODE | ACCTDESCRIPTION | FIRSTCOUNT | SECONDCOUNT
----------+-----------------+------------+------------
1031 | Blahdescription | 150 | 23
1032 | Blahdescription | 75 | 101
1033 | Blahdescription | 3 | 78
1034 | Blahdescription | 11 | 23
I've tried to create a query with a SELECT within a SELECT but am new to sql so having a bit of trouble making it work.
Here's what I've come up with which works for first count but doesn't work quite right for the second count.
SELECT DISTINCT
account AS ACCTCODE, Description AS ACCTDESCRIPTION,
COUNT(issueDate) AS FIRSTCOUNT,
(SELECT COUNT(issueDate)
FROM Table1
WHERE issueDate BETWEEN CONVERT(DATETIME, '2018-2-31')
AND CONVERT(DATETIME, '2018-04-03')
AND account <> '') AS SECONDCOUNT
FROM
Table1
WHERE
issueDate BETWEEN CONVERT(DATETIME, '2017-11-31')
AND CONVERT(DATETIME, '2018-02-01')
AND account <> ''
GROUP BY
account, Description
ORDER BY
account, Description ASC
Use SUM() with CASE, once for each date interval:
-- 20171130 rather than 20171131: November has only 30 days
SELECT account AS ACCTCODE, description AS ACCTDESCRIPTION,
SUM(CASE WHEN issuedate BETWEEN CONVERT(DATE, '20171130', 112) AND CONVERT(DATE, '20180201', 112)
THEN 1 ELSE 0 END) as FIRSTCOUNT,
SUM(CASE WHEN issuedate BETWEEN CONVERT(DATE, '20180228', 112) AND CONVERT(DATE, '20180403', 112)
THEN 1 ELSE 0 END) as SECONDCOUNT
FROM Table1
GROUP BY account, description
You can try it like below, using CASE WHEN once per date range:
select ACCTCCODE, ACCTDESCRIPTION,
sum(case when issueDate >= '2017-11-30' and issueDate <= '2018-02-01' then 1 else 0 end) as FIRSTCOUNT,
sum(case when issueDate >= '2018-02-28' and issueDate <= '2018-04-03' then 1 else 0 end) as SECONDCOUNT
FROM Table1
group by ACCTCCODE, ACCTDESCRIPTION
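To see one SUM(CASE ...) per date range in action, here is a minimal sketch using Python's built-in sqlite3 instead of SQL Server. The sample rows are invented, dates are stored as ISO text, and the two ranges use valid end-of-month dates (2017-11-30 and 2018-02-28) in place of the non-existent 2017-11-31 and 2018-2-31 from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (account TEXT, description TEXT, issueDate TEXT)")
# Invented sample rows.
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [
        ("1031", "A", "2017-12-05"),
        ("1031", "A", "2018-01-20"),
        ("1031", "A", "2018-03-11"),
        ("1032", "B", "2017-04-18"),
        ("1032", "B", "2018-04-03"),
        ("1033", "C", "2018-02-15"),
    ],
)

query = """
    SELECT account, description,
           SUM(CASE WHEN issueDate BETWEEN '2017-11-30' AND '2018-02-01'
                    THEN 1 ELSE 0 END) AS FIRSTCOUNT,
           SUM(CASE WHEN issueDate BETWEEN '2018-02-28' AND '2018-04-03'
                    THEN 1 ELSE 0 END) AS SECONDCOUNT
    FROM Table1
    WHERE account <> ''
    GROUP BY account, description
    ORDER BY account
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Keeping the date filters inside the CASE expressions, rather than in the WHERE clause, is what lets both counts be computed in one pass, and accounts with no matching rows still appear with zeros.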

Summing By Count

I'm trying to create a summation based on the count for a particular column. If you look at the last line of the SELECT below, you'll see that I tried implementing a CASE statement. However, it produces all NULL values. I believe I understand why (each row has a unique set of values), but I'm not sure how to fix my problem.
SELECT
TotalFilesProduced.ReviewDate,
TotalFilesProduced.FileReviewedByUserID,
TotalFilesProduced.FileSource,
TotalFilesProduced.FilesIndexed TotalIndexed,
TotalFilesProduced.FileNumberofPages TotalFileNumberofPages,
TotalFilesProduced.FilesProduced,
CASE WHEN COUNT(DISTINCT FileReviewedByUserID) > 1 THEN SUM(TotalFilesProduced.FilesIndexed) END
FROM
(SELECT
CAST(ibfp.FileReviewedDate AS DATE) ReviewDate,
ibfp.FileReviewedByUserID,
FileSource,
COUNT(*) FilesProduced,
COUNT(DISTINCT ibf.InboundFileID) FilesIndexed,
SUM(CASE WHEN ibfp.FromPage = ibfp.ToPage THEN 1
ELSE ibfp.ToPage-ibfp.FromPage + 1 END) [FileNumberofPages]
FROM
dbo.InboundFilePartitions ibfp
INNER JOIN dbo.InboundFiles ibf ON ibfp.InboundFileID = ibf.InboundFileID
WHERE
CAST(ibfp.FileReviewedDate AS DATE) >= '10/22/2014'
and CAST(ibfp.FileReviewedDate AS DATE) <= '10/22/2014'
and ibf.ProjectID in (110)
GROUP BY
CAST(ibfp.FileReviewedDate AS DATE),
ibfp.FileReviewedByUserID,
FileSource
) TotalFilesProduced
GROUP BY
TotalFilesProduced.ReviewDate,
TotalFilesProduced.FileReviewedByUserID,
TotalFilesProduced.FileSource,
TotalFilesProduced.FilesIndexed,
TotalFilesProduced.FileNumberofPages,
TotalFilesProduced.FilesProduced
Here is an example for further clarification. UserID 1036 producing a NULL is fine, since it appears only once, but for 804 I would like to sum the TotalIndexed column, so the NULL should read 139 (for both rows where 804 appears).
ReviewDate | FilereviewedByUserID | FileSource | TotalIndexed | TotalFileNumberofPages | FilesProduced | (No Column Name) /*My Sum*/
------------------------------------------------------------------------------------------------------------------------------------
2014-10-22 | 804 | 1 | 1 | 67 | 1 | NULL
------------------------------------------------------------------------------------------------------------------------------------
2014-10-22 | 1036 | 1 | 1 | 17 | 1 | NULL
------------------------------------------------------------------------------------------------------------------------------------
2014-10-22 | 804 | 2 | 138 | 3322 | 184 | NULL
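As stated in the comments, this will always be false:
CASE WHEN COUNT(DISTINCT FileReviewedByUserID) > 1
because FileReviewedByUserID is one of the grouping columns, so every group contains exactly one distinct value:
GROUP BY ibfp.FileReviewedByUserID
And you have some other strange stuff: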
CAST(ibfp.FileReviewedDate AS DATE) >= '10/22/2014'
and CAST(ibfp.FileReviewedDate AS DATE) <= '10/22/2014'
is the same as
CAST(ibfp.FileReviewedDate AS DATE) = '10/22/2014'
More strange stuff
SUM(CASE WHEN ibfp.FromPage = ibfp.ToPage THEN 1
ELSE ibfp.ToPage-ibfp.FromPage + 1 END) [FileNumberofPages]
is the same as
SUM(ibfp.ToPage-ibfp.FromPage + 1) [FileNumberofPages]
I'm not sure what you are trying to do, but a GROUP BY on top of a GROUP BY is not common.
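The first point can be verified on the three result rows from the question. The sketch below uses Python's built-in sqlite3 and a hypothetical table named `produced` holding those rows. The second query is not from this thread: it is one possible way to get the per-user total the asker describes, by joining each row to a per-user aggregate, and it assumes that is indeed the intended result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE produced (ReviewDate TEXT, UserID INTEGER, "
    "FileSource INTEGER, FilesIndexed INTEGER)"
)
# The three rows shown in the question's example output.
conn.executemany(
    "INSERT INTO produced VALUES (?, ?, ?, ?)",
    [
        ("2014-10-22", 804, 1, 1),
        ("2014-10-22", 1036, 1, 1),
        ("2014-10-22", 804, 2, 138),
    ],
)

# Grouping by UserID means COUNT(DISTINCT UserID) is 1 in every group,
# so a "> 1" test on it can never be true.
distinct_counts = [
    r[0]
    for r in conn.execute(
        "SELECT COUNT(DISTINCT UserID) FROM produced "
        "GROUP BY ReviewDate, UserID, FileSource"
    )
]

# Assumed intent: show the user's total only when the user has several rows.
per_user_total = conn.execute(
    """
    SELECT p.UserID, p.FileSource, p.FilesIndexed,
           CASE WHEN u.row_count > 1 THEN u.total END AS UserTotal
    FROM produced p
    JOIN (SELECT ReviewDate, UserID,
                 COUNT(*) AS row_count, SUM(FilesIndexed) AS total
          FROM produced
          GROUP BY ReviewDate, UserID) u
      ON u.ReviewDate = p.ReviewDate AND u.UserID = p.UserID
    ORDER BY p.UserID, p.FileSource
    """
).fetchall()

print(distinct_counts)
for row in per_user_total:
    print(row)
```

On SQL Server the same per-user figure could presumably also be obtained with windowed aggregates such as SUM(...) OVER (PARTITION BY ...), which avoids the self-join.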

Get Month columns from datetime column and count entries

I have the following table:
| ID | Name | DateA | TimeToWork | TimeWorked |
|:--:|:----:|:----------:|:----------:|:----------:|
| 1 |Frank | 2013-01-01 | 8 | 5 |
| 2 |Frank | 2013-01-02 | 8 | NULL |
| 3 |Frank | 2013-01-03 | 8 | 7 |
| 4 |Jules | 2013-01-01 | 4 | 9 |
| 5 |Jules | 2013-01-02 | 4 | NULL |
| 6 |Jules | 2013-01-03 | 4 | 3 |
The table is very long; every person has an entry for every day of the year. For each person I have the date he worked (DateA), the hours he has to work according to his contract (TimeToWork) and the hours he actually worked (TimeWorked). As you can see, on some days a person didn't work although he had to. This is when the person took a full day off against overtime.
What I try to accomplish is to get the following table out of the first one above.
| Name | January | February | March | ... | Sum |
|:----:|:----------:|:--------:|:-----:|:---:|:---:|
|Frank | 2 | 0 | 1 | ... | 12 |
|Jules | 5 | 1 | 3 | ... | 10 |
For each month I want to count all days on which a person took a FULL day off, and add them all up in the Sum column.
I tried something like Select (case when Datetime(month, DateA = 1 then count(case when timetowork - (case when timeworked then 0 end) = timetowork then 1 else 0 end) end) as 'January' but my T-SQL is just not that good and the code doesn't work at all. Also, written this way my select command would be about 40 lines long.
I would really appreciate it if anyone could help me or give me a link to a good source so I can read up on it.
If I understand the question right, then Gordon Linoff's answer is a good beginning, but it doesn't deal with the "full day off" requirement.
select Name,
sum(case when month(DateA) = 01 and TimeWorked is null then 1 else 0 end) as Jan,
sum(case when month(DateA) = 02 and TimeWorked is null then 1 else 0 end) as Feb,
...
sum(case when month(DateA) = 12 and TimeWorked is null then 1 else 0 end) as Dec,
sum(case when TimeWorked is null then 1 else 0 end) as Sum
from table T
where year(DateA) = 2013
group by name
Does this method solve the problem?
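A compact way to try the "count NULL TimeWorked per month" idea is sketched below. It runs on Python's built-in sqlite3 rather than SQL Server, so strftime replaces month() and year(); the table name `work_log`, the helper `month_count`, and the two extra rows (a February day off and a 2014 row that the year filter should exclude) are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE work_log (Name TEXT, DateA TEXT, TimeToWork INTEGER, TimeWorked INTEGER)"
)
conn.executemany(
    "INSERT INTO work_log VALUES (?, ?, ?, ?)",
    [
        ("Frank", "2013-01-01", 8, 5),
        ("Frank", "2013-01-02", 8, None),
        ("Frank", "2013-01-03", 8, 7),
        ("Jules", "2013-01-01", 4, 9),
        ("Jules", "2013-01-02", 4, None),
        ("Jules", "2013-01-03", 4, 3),
        ("Frank", "2013-02-10", 8, None),  # extra row: a day off in February
        ("Jules", "2014-01-05", 4, None),  # extra row: wrong year, filtered out
    ],
)


def month_count(number, label):
    """One conditional-aggregation column counting full days off in a month."""
    return (
        f"SUM(CASE WHEN strftime('%m', DateA) = '{number:02d}' "
        f"AND TimeWorked IS NULL THEN 1 ELSE 0 END) AS {label}"
    )


# Only two months are generated here; extend the list up to December as needed.
month_columns = ", ".join(month_count(n, label) for n, label in [(1, "Jan"), (2, "Feb")])
query = f"""
    SELECT Name, {month_columns},
           SUM(CASE WHEN TimeWorked IS NULL THEN 1 ELSE 0 END) AS DaysOffTotal
    FROM work_log
    WHERE strftime('%Y', DateA) = '2013'
    GROUP BY Name
    ORDER BY Name
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Generating the twelve near-identical columns from a loop also addresses the asker's concern about a 40-line SELECT.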
The correct syntax is conditional aggregation:
select name,
sum(case when month(datea) = 1 then timeworked else 0 end) as Jan,
sum(case when month(datea) = 2 then timeworked else 0 end) as Feb,
. . .
sum(case when month(datea) = 12 then timeworked else 0 end) as Dec,
sum(timeworked)
from table t
where year(datea) = 2013
group by name;
The CASE can be removed using bit logic
SELECT name
, January = SUM((1 - CAST(MONTH(DateA) - 1 as bit))
* (1 - CAST(COALESCE(TimeWorked, 0) as bit)))
, February = SUM((1 - CAST(MONTH(DateA) - 2 as bit))
* (1 - CAST(COALESCE(TimeWorked, 0) as bit)))
...
, December = SUM((1 - CAST(MONTH(DateA) - 12 as bit))
* (1 - CAST(COALESCE(TimeWorked, 0) as bit)))
, Total = SUM((1 - CAST(COALESCE(TimeWorked, 0) as bit)))
FROM table1
GROUP BY name;
To check if there is a dayoff the formula is:
(1 - CAST(COALESCE(TimeWorked, 0) as bit))
that is equivalent to TimeWorked IS NULL (a TimeWorked of 0 is counted as well): the CAST to BIT returns 1 for every value different from 0, and 1 - BIT inverts those values.
The month filter is:
(1 - CAST(MONTH(DateA) - %month% as bit))
Using the same idea as before, this formula returns 1 only for the given month (the cast gives 1 for every other month, and 1 - BIT inverts that result).
Multiplying the two formulas gives the days off for the given month only.
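The arithmetic behind the two indicators can be traced in a few lines of plain Python. This is only an illustration of the formulas above, not T-SQL: `as_bit` is a hypothetical helper that mimics what CAST(... AS bit) does to a number, and the sample rows are invented.

```python
def as_bit(value):
    """Mimic CAST(value AS bit): 0 stays 0, any other number becomes 1."""
    return 0 if value == 0 else 1


def day_off(time_worked):
    """1 - CAST(COALESCE(TimeWorked, 0) AS bit): 1 when nothing was worked."""
    return 1 - as_bit(0 if time_worked is None else time_worked)


def in_month(row_month, target_month):
    """1 - CAST(MONTH(DateA) - target AS bit): 1 only for the target month."""
    return 1 - as_bit(row_month - target_month)


# (month, TimeWorked) pairs; None plays the role of NULL.
rows = [(1, 5), (1, None), (1, 7), (2, None), (2, 0)]

# Multiplying the two indicators and summing counts days off per month.
january = sum(in_month(m, 1) * day_off(t) for m, t in rows)
february = sum(in_month(m, 2) * day_off(t) for m, t in rows)
print(january, february)
```

The February figure shows the side effect mentioned above: a TimeWorked of 0 is counted as a day off just like a NULL.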
You can get your required result by using pivot also. You can get more information about pivot here http://technet.microsoft.com/en-in/library/ms177410(v=sql.105).aspx
Also you can get your output using the following query. I did it for up to April only. You can extend it up to December.
Select [Name], [January], [February], [March], [April]
From
(
Select Name, MName, DaysOff from
(
select Name, DATENAME(MM, dateA) MName,
count(case isnull(timeworked,0) when 0 then 1 else null end) DaysOff
from tblPivot
Where Year(DateA) = 2013
group by Name, DATENAME(MM, dateA)
) A ) As B
pivot(Sum(DaysOff)
For MName in ([January], [February],[March],[April])
) As Pivottable;