PIVOT table returning all null data - sql

Consider this query:
SELECT *
FROM (
SELECT
userid,
FORMAT(datecreated, 'yyyy-MM') AS purchasemonth,
COALESCE(amount + tip, 0) AS amt
FROM invoice
) AS SourceTable
which produces one row per invoice (userid, purchasemonth, amt), with no NULL values.
And this pivot query in which I am trying to sum over each month:
SELECT
userid,
COALESCE([2016-08-01], 0) AS [2016-08-01],
COALESCE([2016-09-01], 0) AS [2016-09-01]
FROM (
SELECT
userid,
FORMAT(datecreated, 'yyyy-MM') AS purchasemonth,
COALESCE(amount + tip, 0) AS amt
FROM invoice
) AS SourceTable
PIVOT
(
SUM(amt)
FOR purchasemonth IN ([2016-08-01], [2016-09-01])
) AS PivotTable
which produces one row per user in which every month column is 0 (that is, NULL before the COALESCE).
There is no NULL data at all in the original query's output. The PIVOT query's output is nothing but null data (coalesced to 0). But I can't figure out why the PIVOT is not summing the data as I expected. I'm expecting there to be no NULL data in the PIVOT output either.
How can I fix the query to behave as expected?

purchasemonth in your derived table is a string without a day in it (for example '2016-08'). The values in your IN list, which become the column names, are dates with a day in them ('2016-08-01'). No row ever matches those values, so every pivoted column comes out NULL.
So the main fix is to change this line:
FOR purchasemonth IN ([2016-08-01], [2016-09-01])
to
FOR purchasemonth IN ([2016-08], [2016-09])
Once you change that, you also need to change the COALESCE() expressions, and you should get what you want:
SELECT
userid,
COALESCE([2016-08], 0) AS [2016-08-01],
COALESCE([2016-09], 0) AS [2016-09-01]
FROM (
SELECT
userid,
FORMAT(datecreated, 'yyyy-MM') AS purchasemonth,
COALESCE(amount + tip, 0) AS amt
FROM invoice
) AS SourceTable
PIVOT
(
SUM(amt)
FOR purchasemonth IN ([2016-08], [2016-09])
) AS PivotTable
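To see why unmatched IN values produce nothing but zeros, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no PIVOT operator, so the pivot is emulated with conditional aggregation (SUM(CASE ...)), which groups and sums the same way PIVOT does. The sample rows and the `pivot` helper are hypothetical.

```python
import sqlite3

# Hypothetical stand-in for the invoice table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (userid INTEGER, datecreated TEXT, amount REAL, tip REAL)")
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?, ?, ?)",
    [(1, "2016-08-03", 10.0, 1.0), (1, "2016-08-20", 5.0, 0.0), (1, "2016-09-02", 7.0, 2.0)],
)

def pivot(aug_key: str, sep_key: str):
    """Sum amount + tip per user for two pivot keys (conditional aggregation)."""
    sql = """
        SELECT userid,
               COALESCE(SUM(CASE WHEN purchasemonth = ? THEN amt END), 0) AS aug,
               COALESCE(SUM(CASE WHEN purchasemonth = ? THEN amt END), 0) AS sep
        FROM (SELECT userid,
                     strftime('%Y-%m', datecreated) AS purchasemonth,  -- like FORMAT(..., 'yyyy-MM')
                     COALESCE(amount + tip, 0) AS amt
              FROM invoice) AS SourceTable
        GROUP BY userid
    """
    return conn.execute(sql, (aug_key, sep_key)).fetchall()

mismatched = pivot("2016-08-01", "2016-09-01")  # never equal to '2016-08', so all NULL -> 0
matched = pivot("2016-08", "2016-09")           # keys agree, so the sums appear
print(mismatched)
print(matched)
```

The only difference between the two calls is whether the pivot values match the strings the derived table produces.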
If you want the pivot values to keep the -01 day, change the derived table so that purchasemonth is a date, or include a fixed day in the format.
So if you want to go this route, change this line:
FORMAT(datecreated, 'yyyy-MM') AS purchasemonth,
to
CAST(DATEADD(day, 1 - DAY(datecreated), datecreated) AS date) AS purchasemonth,
The CAST to date drops any time portion. A datetime such as 2016-08-15 13:45 therefore becomes 2016-08-01 and matches [2016-08-01].
You could also use this (note the literal 01; 'yyyy-MM-dd' would keep the actual day, such as 2016-08-15, and still not match):
FORMAT(datecreated, 'yyyy-MM-01') AS purchasemonth,
But FORMAT has a performance cost that you have no reason to introduce if you don't need to.
SELECT
userid,
COALESCE([2016-08-01], 0) AS [2016-08-01],
COALESCE([2016-09-01], 0) AS [2016-09-01]
FROM (
SELECT
userid,
CAST(DATEADD(day, 1 - DAY(datecreated), datecreated) AS date) AS purchasemonth,
COALESCE(amount + tip, 0) AS amt
FROM invoice
) AS SourceTable
PIVOT
(
SUM(amt)
FOR purchasemonth IN ([2016-08-01], [2016-09-01])
) AS PivotTable
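The pivot key must equal the IN-list value exactly, including the day and any time portion. The sketch below compares two ways of deriving the key from timestamps. SQLite's `date(x, 'start of month')` stands in for the T-SQL expression above, and the sample timestamps are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
rows = ["2016-08-03 14:25:00", "2016-08-20 09:00:00", "2016-09-02 23:59:59"]

def keys(expr: str):
    # Evaluate a SQLite expression against each sample timestamp.
    return [conn.execute(f"SELECT {expr}", (r,)).fetchone()[0] for r in rows]

full_day = keys("strftime('%Y-%m-%d', ?)")       # keeps the real day: never '2016-08-01'
month_start = keys("date(?, 'start of month')")  # first of the month, time dropped
print(full_day)
print(month_start)
```

Only the month-start keys line up with values such as [2016-08-01].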

Related

Invalid identifier in pivot table

Here is my query:
SELECT NVL(REVENUE,0), OUTNO, MONTH_NAME FROM
(
SELECT ROUND((RETURNDATE-STARTDATE)*DAILYRATE) AS REVENUE,
OUTNO,
EXTRACT(MONTH FROM RETURNDATE)AS MONTH_NAME
FROM RAGREEMENT LEFT JOIN VEHICLE ON
RAGREEMENT.LICENSENO=VEHICLE.LICENSENO
AND EXTRACT(YEAR FROM RETURNDATE)=EXTRACT(YEAR FROM SYSDATE)-1
)
PIVOT (
SUM(REVENUE)
FOR OUTNO IN (1,2,3,4,5,6,-1 AS TOTAL)
)
ORDER BY MONTH_NAME;
And here is the error:
SELECT NVL(REVENUE,0), OUTNO, MONTH_NAME FROM
*
ERROR at line 1:
ORA-00904: "OUTNO": invalid identifier
I don't understand why this happens when SELECT * works perfectly.
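PIVOT consumes the pivoted column and the aggregated column. In your query, OUTNO and REVENUE no longer exist after the pivot, which is why the outer SELECT raises ORA-00904 while SELECT * works. What comes out of a pivot is a whole new set of column names, one per value in the IN list, and each of those new columns needs its own NVL() or COALESCE(). This is because a completely new "matrix" is formed and many positions in it can be null. You cannot overcome this by using NVL() in the inner subquery.
Assuming you want months as columns, your query might look more like this: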
SELECT
OUTNO
, NVL(M1, 0) AS M1
, NVL(M2, 0) AS M2
, NVL(M3, 0) AS M3
, NVL(M4, 0) AS M4
, NVL(M5, 0) AS M5
, NVL(M6, 0) AS M6
, NVL(M7, 0) AS M7
, NVL(M8, 0) AS M8
, NVL(M9, 0) AS M9
, NVL(M10, 0) AS M10
, NVL(M11, 0) AS M11
, NVL(M12, 0) AS M12
FROM (
SELECT
ROUND((RETURNDATE - STARTDATE) * DAILYRATE) AS REVENUE
, OUTNO
, 'M' || EXTRACT(MONTH FROM RETURNDATE) AS MONTH_NAME
FROM RAGREEMENT
LEFT JOIN VEHICLE ON RAGREEMENT.LICENSENO = VEHICLE.LICENSENO
AND EXTRACT(YEAR FROM RETURNDATE) = EXTRACT(YEAR FROM SYSDATE) - 1
)
PIVOT(
SUM(REVENUE)
FOR MONTH_NAME IN ('M1' AS M1, 'M2' AS M2, 'M3' AS M3, 'M4' AS M4, 'M5' AS M5, 'M6' AS M6,
'M7' AS M7, 'M8' AS M8, 'M9' AS M9, 'M10' AS M10, 'M11' AS M11, 'M12' AS M12)
)
ORDER BY OUTNO;
This line produces the new columns:
FOR MONTH_NAME IN ('M1' AS M1, 'M2' AS M2, ..., 'M12' AS M12)
and it is each one of these you need to "fix" for nulls in the select clause. The AS aliases give the new columns plain names, so you can reference them as NVL(M1, 0). Writing NVL('M1', 0) would only return the string literal 'M1', because single quotes denote a string, not a column.
Putting OUTNO values into columns requires the same pattern, but you need to know the distinct set of values in that column in advance. This MIGHT be 1,2,3,4,5,6,-1, but I wasn't certain.
NB: I prefixed 'M' to the column headings because many systems object to column names that are numbers.
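Looking at the column list of a pivoted result makes this easier to see. The sketch below uses SQLite through Python's sqlite3 module. SQLite has no PIVOT, so the month pivot is written as conditional aggregation; the `revenue_src` table and its rows are hypothetical. The pivoted column (month_name) and the aggregated column (revenue) disappear, and each generated month column needs its own COALESCE (NVL in Oracle).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revenue_src (outno INTEGER, month_name TEXT, revenue REAL)")
conn.executemany("INSERT INTO revenue_src VALUES (?, ?, ?)",
                 [(1, "M1", 100.0), (1, "M2", 50.0), (2, "M1", 75.0)])

cur = conn.execute("""
    SELECT outno,
           COALESCE(SUM(CASE WHEN month_name = 'M1' THEN revenue END), 0) AS M1,
           COALESCE(SUM(CASE WHEN month_name = 'M2' THEN revenue END), 0) AS M2,
           COALESCE(SUM(CASE WHEN month_name = 'M3' THEN revenue END), 0) AS M3
    FROM revenue_src
    GROUP BY outno
    ORDER BY outno
""")
columns = [d[0] for d in cur.description]  # names of the output columns
result = cur.fetchall()
print(columns)
print(result)
```

Without the COALESCE calls, outno 2 would show NULL for M2 and M3, because no source row fills those cells.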

Group by in columns and rows, counts and percentages per day

I have a table with data like the following:
attr |time
----------------|--------------------------
abc |2018-08-06 10:17:25.282546
def |2018-08-06 10:17:25.325676
pqr |2018-08-05 10:17:25.366823
abc |2018-08-06 10:17:25.407941
def |2018-08-05 10:17:25.449249
I want to group by attr (one row per attr) and add columns showing the count per day and its percentage of that day's total, as shown below.
attr |day1_count| day1_%| day2_count| day2_%
----------------|----------|-------|-----------|-------
abc |2 |66.6% | 0 | 0.0%
def |1 |33.3% | 1 | 50.0%
pqr |0 |0.0% | 1 | 50.0%
I'm able to display one count using GROUP BY, but I can't work out how to separate the counts into multiple columns. I tried to generate the day1 percentage with:
SELECT attr, count(attr), count(attr) / sum(sub.day1_count) * 100 as percentage from (
SELECT attr, count(*) as day1_count FROM my_table WHERE DATEPART(week, time) = DATEPART(day, GETDate()) GROUP BY attr) as sub
GROUP BY attr;
But this doesn't give me the correct answer either: I get all zeroes for the percentage and 1 for the count. Any help is appreciated. I'm doing this in Redshift, which follows PostgreSQL syntax.
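The all-zero percentages are most likely caused by integer division. COUNT and SUM over integer counts return integers, and dividing two integers truncates (Redshift and PostgreSQL behave this way, and so does SQLite). The count of 1 likely comes from the subquery, which already collapsed the data to one row per attr. The sketch below reproduces the truncation in SQLite using the question's sample data, with timestamps shortened. Multiplying by 100.0 first forces decimal arithmetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (attr TEXT, time TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)", [
    ("abc", "2018-08-06 10:17:25"), ("def", "2018-08-06 10:17:25"),
    ("pqr", "2018-08-05 10:17:25"), ("abc", "2018-08-06 10:17:25"),
    ("def", "2018-08-05 10:17:25"),
])

# Total rows on 2018-08-06 (the question's "day 1").
day_total = "(SELECT COUNT(*) FROM my_table WHERE date(time) = '2018-08-06')"

integer_pct = conn.execute(
    f"SELECT attr, COUNT(*) / {day_total} * 100 FROM my_table "
    "WHERE date(time) = '2018-08-06' GROUP BY attr ORDER BY attr").fetchall()
decimal_pct = conn.execute(
    f"SELECT attr, ROUND(100.0 * COUNT(*) / {day_total}, 1) FROM my_table "
    "WHERE date(time) = '2018-08-06' GROUP BY attr ORDER BY attr").fetchall()
print(integer_pct)  # 2 / 3 and 1 / 3 truncate to 0
print(decimal_pct)
```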
Let's nail the logic before presenting:
with CTE1 as
(
select attr, DATEPART(day, time) as theday, count(*) as thecount
from MyTable
group by attr, DATEPART(day, time)
)
, CTE2 as
(
select theday, sum(thecount) as daytotal
from CTE1
group by theday
)
select t1.attr, t1.theday, t1.thecount, 100.0 * t1.thecount / t2.daytotal as percentofday -- 100.0 avoids integer division
from CTE1 t1
inner join CTE2 t2
on t1.theday = t2.theday
From here you can pivot the result into one column per day if you need to.
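The CTE logic above can be checked on the question's sample data with a small sketch. It runs in SQLite through Python's sqlite3 module. `date(time)` replaces DATEPART(day, time), which also keeps different months apart, and the table name MyTable follows the answer. The output is the long (unpivoted) form, with one row per attr per day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (attr TEXT, time TEXT)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?)", [
    ("abc", "2018-08-06 10:17:25"), ("def", "2018-08-06 10:17:25"),
    ("pqr", "2018-08-05 10:17:25"), ("abc", "2018-08-06 10:17:25"),
    ("def", "2018-08-05 10:17:25"),
])

rows = conn.execute("""
    WITH CTE1 AS (
        SELECT attr, date(time) AS theday, COUNT(*) AS thecount
        FROM MyTable
        GROUP BY attr, date(time)          -- one row per attr per day
    ), CTE2 AS (
        SELECT theday, SUM(thecount) AS daytotal
        FROM CTE1
        GROUP BY theday                    -- total rows per day
    )
    SELECT t1.attr, t1.theday, t1.thecount,
           ROUND(100.0 * t1.thecount / t2.daytotal, 1) AS percentofday
    FROM CTE1 t1
    INNER JOIN CTE2 t2 ON t1.theday = t2.theday
    ORDER BY t1.theday DESC, t1.attr
""").fetchall()
for row in rows:
    print(row)
```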
This enhances @JohnHC's query. If you need 7 days, you have to add those days to the CASE WHEN expressions.
with CTE1 as
(
select attr, time::date as theday, count(*) as thecount
from t group by attr,time::date
)
, CTE2 as
(
select theday, sum(thecount) as daytotal
from CTE1
group by theday
)
,
CTE3 as
(
select t1.attr, EXTRACT(DOW FROM t1.theday) as day_nmbr, t1.theday, t1.thecount, 100.0 * t1.thecount / t2.daytotal as percentofday
from CTE1 t1
inner join CTE2 t2
on t1.theday = t2.theday
)
select CTE3.attr,
max(case when day_nmbr=0 then CTE3.thecount end) as day1Cnt,
max(case when day_nmbr=0 then percentofday end) as day1,
max(case when day_nmbr=1 then CTE3.thecount end) as day2Cnt,
max( case when day_nmbr=1 then percentofday end) day2
from CTE3 group by CTE3.attr
http://sqlfiddle.com/#!17/54ace/20
If you only have 2 days:
http://sqlfiddle.com/#!17/3bdad/3 (days descending as in your example from left to right)
http://sqlfiddle.com/#!17/3bdad/5 (days ascending)
The main idea is already mentioned in the other answers. Instead of joining the CTEs to calculate the values, I use window functions, which I think is a bit shorter and more readable. The pivot is done the same way.
SELECT
attr,
COALESCE(max(count) FILTER (WHERE day_number = 0), 0) as day1_count, -- D
COALESCE(max(percent) FILTER (WHERE day_number = 0), 0) as day1_percent,
COALESCE(max(count) FILTER (WHERE day_number = 1), 0) as day2_count,
COALESCE(max(percent) FILTER (WHERE day_number = 1), 0) as day2_percent
/*
Add more days here
*/
FROM(
SELECT *, (count::float/count_per_day)::decimal(5, 2) as percent -- C
FROM (
SELECT DISTINCT
attr,
MAX(time::date) OVER () - time::date as day_number, -- B
count(*) OVER (partition by time::date, attr) as count, -- A
count(*) OVER (partition by time::date) as count_per_day
FROM test_table
)s
)s
GROUP BY attr
ORDER BY attr
A: counting the rows per day AND attr (count), and the rows per day (count_per_day)
B: for readability, I convert the date into a number: the difference between the maximum date in the table and the current row's date. The most recent day gets 0, and each earlier day gets its distance in days from it (0 up to n - 1 when there are no gaps between the days)
C: calculating each attr's share of the day (as a fraction) and rounding it
D: pivot by filtering on the day numbers. COALESCE turns the NULL values into 0. To add more days, repeat these columns for each additional day number.
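One consequence of B: the counter is a calendar offset from the latest date, not a dense rank, so a day with no rows leaves a gap in the numbering. The sketch below shows this in SQLite through Python's sqlite3 module. `julianday()` differences stand in for PostgreSQL date subtraction, and the sample rows (with 2018-08-04 missing) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (attr TEXT, time TEXT)")
conn.executemany("INSERT INTO test_table VALUES (?, ?)", [
    ("abc", "2018-08-06 10:17:25"),
    ("pqr", "2018-08-05 10:17:25"),
    ("def", "2018-08-03 10:17:25"),  # 2018-08-04 has no rows
])

rows = conn.execute("""
    SELECT DISTINCT date(time) AS d,
           -- MAX(date) - date, as in step B
           CAST(julianday((SELECT MAX(date(time)) FROM test_table))
                - julianday(date(time)) AS INTEGER) AS day_number
    FROM test_table
    ORDER BY d DESC
""").fetchall()
print(rows)
```

If the day columns should always be consecutive days present in the data, a DENSE_RANK over the date, as in the next answer, numbers them without gaps.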
Edit: Made the day counter more flexible for more days; new SQL Fiddle
Basically, I see this as conditional aggregation. But you need to get an enumerator for the date for the pivoting. So:
SELECT attr,
COUNT(*) FILTER (WHERE day_number = 1) as day1_count,
COALESCE(SUM(1 / day_total) FILTER (WHERE day_number = 1), 0) as day1_percent,
COUNT(*) FILTER (WHERE day_number = 2) as day2_count,
COALESCE(SUM(1 / day_total) FILTER (WHERE day_number = 2), 0) as day2_percent
FROM (SELECT attr,
DENSE_RANK() OVER (ORDER BY time::date DESC) as day_number,
-- rows on the same day; each row adds 1 / day_total to its attr's share of that day
1.0 * COUNT(*) OVER (PARTITION BY time::date) as day_total
FROM test_table
) s
GROUP BY attr
ORDER BY attr;
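The percentages in the question (abc 66.6% on day 1) are shares of each day's total, so the denominator has to be the day's row count rather than the attr's. The sketch below checks that logic on the question's data. It runs in SQLite through Python's sqlite3 module, with correlated subqueries standing in for DENSE_RANK and COUNT(*) OVER, and CASE standing in for FILTER. It is an illustration of the conditional aggregation, not the Redshift query itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (attr TEXT, time TEXT)")
conn.executemany("INSERT INTO test_table VALUES (?, ?)", [
    ("abc", "2018-08-06 10:17:25"), ("def", "2018-08-06 10:17:25"),
    ("pqr", "2018-08-05 10:17:25"), ("abc", "2018-08-06 10:17:25"),
    ("def", "2018-08-05 10:17:25"),
])

rows = conn.execute("""
    SELECT attr,
           SUM(CASE WHEN day_number = 1 THEN 1 ELSE 0 END) AS day1_count,
           ROUND(100.0 * SUM(CASE WHEN day_number = 1 THEN 1.0 / day_total ELSE 0 END), 1) AS day1_pct,
           SUM(CASE WHEN day_number = 2 THEN 1 ELSE 0 END) AS day2_count,
           ROUND(100.0 * SUM(CASE WHEN day_number = 2 THEN 1.0 / day_total ELSE 0 END), 1) AS day2_pct
    FROM (
        SELECT t.attr,
               -- dense rank of the day, newest first (stand-in for DENSE_RANK)
               (SELECT COUNT(DISTINCT date(u.time)) FROM test_table u
                 WHERE date(u.time) >= date(t.time)) AS day_number,
               -- rows on the same day (stand-in for COUNT(*) OVER (PARTITION BY day))
               (SELECT COUNT(*) FROM test_table u
                 WHERE date(u.time) = date(t.time)) AS day_total
        FROM test_table t
    ) s
    GROUP BY attr
    ORDER BY attr
""").fetchall()
for row in rows:
    print(row)
```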
Here is a SQL Fiddle.

oracle: pivot on dynamic dates

I have this query:
select pvt1.*
from
(
select
TO_CHAR(DateAppointment, 'yyyy-mm-dd') as currentDay,
count(*) myCounter
[...]
from (
select
[...]
from myTable
) a
group by [...]
order by DateAppointment
) source1
PIVOT
(
max(myCounter)
--FOR currentDay IN ('2012-08-20', '2012-08-21', '2012-08-27', '2012-09-03')
FOR currentDay IN (
SELECT LISTAGG(datevalue, ', ')
WITHIN GROUP (ORDER BY datevalue)
FROM DATESLIST
)
) pvt1;
The subquery below just gets the list of my dates from another table (DATESLIST), but when I run the first query, Oracle returns an error.
SELECT LISTAGG(datevalue, ', ')
WITHIN GROUP (ORDER BY datevalue)
FROM DATESLIST
But when I use the following code instead, I get the correct results:
FOR currentDay IN ('2012-08-20', '2012-08-21', '2012-08-27', '2012-09-03')
Any ideas?
Thanks in advance.
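For context, Oracle's PIVOT IN clause accepts only constant values; a subquery (or ANY) is allowed there only with PIVOT XML. LISTAGG also returns a single string rather than a list of values, which is likely why the first query fails. A common workaround is to build the statement text dynamically, for example in PL/SQL or in the client. The sketch below shows that idea in Python. The helper `build_pivot_sql`, the aliases d1, d2, ... and the name `source1` (standing in for the elided inline subquery) are all hypothetical.

```python
from typing import List

def build_pivot_sql(dates: List[str]) -> str:
    """Build an Oracle PIVOT statement whose IN list comes from data.

    Each value becomes a quoted string literal (embedded quotes doubled)
    with an alias, so the generated columns have plain identifier names.
    """
    items = []
    for i, d in enumerate(dates, start=1):
        literal = "'" + d.replace("'", "''") + "'"
        items.append(f"{literal} AS d{i}")
    in_list = ", ".join(items)
    return (
        "SELECT * FROM source1 PIVOT (MAX(myCounter) "
        f"FOR currentDay IN ({in_list}))"
    )

# In practice the list would be fetched first, e.g. SELECT datevalue FROM DATESLIST.
sql = build_pivot_sql(["2012-08-20", "2012-08-21"])
print(sql)
```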

Filling in missing dates DB2 SQL

My initial query looks like this:
select process_date, count(*) batchCount
from T1.log_comments
group by process_date
order by process_date asc;
I need to be able to do some quick analysis for weekends that are missing, but wanted to know if there was a quick way to fill in the missing dates not present in process_date.
I've seen the solution here but am curious if there's any magic hidden in db2 that could do this with only a minor modification to my original query.
Note: this was originally untested and framed from my experience with SQL Server/Oracle, but I think it gives you the idea:
*now amended and tested on DB2*
WITH MaxDateQry(MaxDate) AS
(
SELECT MAX(process_date) FROM T1.log_comments
),
MinDateQry(MinDate) AS
(
SELECT MIN(process_date) FROM T1.log_comments
),
DatesData(ProcessDate) AS
(
SELECT MinDate from MinDateQry
UNION ALL
SELECT (ProcessDate + 1 DAY) FROM DatesData WHERE ProcessDate < (SELECT MaxDate FROM MaxDateQry)
)
SELECT a.ProcessDate, b.batchCount
FROM DatesData a LEFT JOIN
(
SELECT process_date, COUNT(*) batchCount
FROM T1.log_comments
GROUP BY process_date
) b
ON a.ProcessDate = b.process_date
ORDER BY a.ProcessDate ASC;
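The same recursive date series can be tried in SQLite through Python's sqlite3 module. SQLite spells it WITH RECURSIVE and uses `date(x, '+1 day')` where DB2 uses `+ 1 DAY`. The min/max bounds go in one helper CTE, mirroring MinDateQry and MaxDateQry. The sample dates are hypothetical, covering a Friday to Monday span with nothing logged on the weekend. COALESCE is added here to show missing days as 0 instead of NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log_comments (process_date TEXT)")
conn.executemany("INSERT INTO log_comments VALUES (?)",
                 [("2024-01-05",), ("2024-01-05",), ("2024-01-08",)])

rows = conn.execute("""
    WITH RECURSIVE Bounds(MinDate, MaxDate) AS (
        SELECT MIN(process_date), MAX(process_date) FROM log_comments
    ),
    DatesData(ProcessDate) AS (
        SELECT MinDate FROM Bounds
        UNION ALL
        SELECT date(d.ProcessDate, '+1 day')      -- next calendar day
        FROM DatesData d, Bounds b
        WHERE d.ProcessDate < b.MaxDate           -- stop at the last logged date
    )
    SELECT a.ProcessDate, COALESCE(c.batchCount, 0) AS batchCount
    FROM DatesData a
    LEFT JOIN (SELECT process_date, COUNT(*) AS batchCount
               FROM log_comments
               GROUP BY process_date) c
      ON a.ProcessDate = c.process_date
    ORDER BY a.ProcessDate
""").fetchall()
for row in rows:
    print(row)
```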

SQL Grouping Issues

I'm attempting to write a query that will return any customer that has multiple work orders with these work orders falling on different days of the week. Every work order for each customer should be falling on the same day of the week so I want to know where this is not the case so I can fix it.
The name of the table is Core.WorkOrder, and it contains a column called CustomerId that specifies which customer each work order belongs to. There is a column called TimeWindowStart that can be used to see which day each work order falls on (I'm using DATENAME(weekday, TimeWindowStart) to do so).
Any ideas how to write this query? I'm stuck here.
Thanks!
Select ...
From Core.WorkOrder As W
Where Exists (
Select 1
From Core.WorkOrder As W1
Where W1.CustomerId = W.CustomerId
And DatePart( dw, W1.TimeWindowStart ) <> DatePart( dw, W.TimeWindowStart )
)
SELECT *
FROM (
SELECT *,
COUNT(dp) OVER (PARTITION BY CustomerID) AS cnt
FROM (
SELECT DISTINCT CustomerID, DATEPART(dw, TimeWindowStart) AS dp
FROM Core.WorkOrder
) q
) q
WHERE cnt >= 2
SELECT CustomerId,
MIN(DATENAME(weekday, TimeWindowStart)),
MAX(DATENAME(weekday, TimeWindowStart))
FROM Core.WorkOrder
GROUP BY CustomerId
HAVING MIN(DATENAME(weekday, TimeWindowStart)) != MAX(DATENAME(weekday, TimeWindowStart))
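The MIN <> MAX check above is one way of asking whether a customer has more than one distinct weekday. An equivalent formulation is HAVING COUNT(DISTINCT weekday) > 1, sketched below in SQLite through Python's sqlite3 module. `strftime('%w', ...)` stands in for DATENAME/DATEPART, the table WorkOrder stands in for Core.WorkOrder, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE WorkOrder (CustomerId INTEGER, TimeWindowStart TEXT)")
conn.executemany("INSERT INTO WorkOrder VALUES (?, ?)", [
    (1, "2024-01-01 08:00"), (1, "2024-01-08 09:00"),  # both Mondays
    (2, "2024-01-02 08:00"), (2, "2024-01-04 08:00"),  # Tuesday and Thursday
    (3, "2024-01-05 10:00"),                           # a single order
])

flagged = conn.execute("""
    SELECT CustomerId, COUNT(DISTINCT strftime('%w', TimeWindowStart)) AS weekdays
    FROM WorkOrder
    GROUP BY CustomerId
    HAVING COUNT(DISTINCT strftime('%w', TimeWindowStart)) > 1  -- more than one weekday
""").fetchall()
print(flagged)
```

Only customer 2 is reported. Customers whose orders all fall on one weekday, including those with a single order, are left out.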