Proper way to create a pivot table with crosstab - sql

How do I convert the following query into a pivot table using crosstab?
select (SUM(CASE WHEN added_customer=false
THEN 1
ELSE 0
END)) AS CUSTOMERS_NOT_ADDED, (SUM(CASE WHEN added_customer=true
THEN 1
ELSE 0
END)) AS CUSTOMERS_ADDED,
(select (SUM(CASE WHEN added_sales_order=false
THEN 1
ELSE 0
END))
FROM shipments_data
) AS SALES_ORDER_NOT_ADDED,
(select (SUM(CASE WHEN added_sales_order=true
THEN 1
ELSE 0
END))
FROM shipments_data
) AS SALES_ORDER_ADDED,
(select (SUM(CASE WHEN added_fulfillment=false
THEN 1
ELSE 0
END))
FROM shipments_data
) AS ITEM_FULFILLMENT_NOT_ADDED,
(select (SUM(CASE WHEN added_fulfillment=true
THEN 1
ELSE 0
END))
FROM shipments_data
) AS ITEM_FULFILLMENT_ADDED,
(select (SUM(CASE WHEN added_invoice=false
THEN 1
ELSE 0
END))
FROM shipments_data
) AS INVOICE_NOT_ADDED,
(select (SUM(CASE WHEN added_invoice=true
THEN 1
ELSE 0
END))
FROM shipments_data
) AS INVOICE_ADDED,
(select (SUM(CASE WHEN added_ra=false
THEN 1
ELSE 0
END))
FROM shipments_data
) AS RA_NOT_ADDED,
(select (SUM(CASE WHEN added_ra=true
THEN 1
ELSE 0
END))
FROM shipments_data
) AS RA_ADDED,
(select (SUM(CASE WHEN added_credit_memo=false
THEN 1
ELSE 0
END))
FROM shipments_data
) AS CREDIT_MEMO_NOT_ADDED,
(select (SUM(CASE WHEN added_credit_memo=true
THEN 1
ELSE 0
END))
FROM shipments_data
) AS CREDIT_MEMO_ADDED
FROM shipments_data;
This query gives me data in a standard row format however I would like to show this as a pivot table in the following format:
Added Not_Added
Customers 100 0
Sales Orders 50 50
Item Fulfillemnts 0 100
Invoices 0 100
...
I am using Heroku PostgreSQL, which is running v9.1.6
Also, I'm not sure if my above query can be optimized or if this is poor form. If it can be optimized/improved I would love to learn how.

The tablefunc module that supplies crosstab() is available for 9.1 (like for any other version this side of the millennium). Doesn't Heroku let you install additional modules? Have you tried:
CREATE EXTENSION tablefunc;
For examples how to use it, refer to the manual or this related question:
PostgreSQL Crosstab Query
OR try this search - there are a couple of good answers with examples on SO.
To get you started (like most of the way ..) use this largely simplified and re-organized query as base for the crosstab() call:
SELECT 'added'::text AS col
,SUM(CASE WHEN added_customer THEN 1 ELSE 0 END) AS customers
,SUM(CASE WHEN added_sales_order THEN 1 ELSE 0 END) AS sales_order
,SUM(CASE WHEN added_fulfillment THEN 1 ELSE 0 END) AS item_fulfillment
,SUM(CASE WHEN added_invoice THEN 1 ELSE 0 END) AS invoice
,SUM(CASE WHEN added_ra THEN 1 ELSE 0 END) AS ra
,SUM(CASE WHEN added_credit_memo THEN 1 ELSE 0 END) AS credit_memo
FROM shipments_data
UNION ALL
SELECT 'not_added' AS col
,SUM(CASE WHEN NOT added_customer THEN 1 ELSE 0 END) AS customers
,SUM(CASE WHEN NOT added_sales_order THEN 1 ELSE 0 END) AS sales_order
,SUM(CASE WHEN NOT added_fulfillment THEN 1 ELSE 0 END) AS item_fulfillment
,SUM(CASE WHEN NOT added_invoice THEN 1 ELSE 0 END) AS invoice
,SUM(CASE WHEN NOT added_ra THEN 1 ELSE 0 END) AS ra
,SUM(CASE WHEN NOT added_credit_memo THEN 1 ELSE 0 END) AS credit_memo
FROM shipments_data;
If your columns are defined NOT NULL, you can further simplify the CASE expressions.
If performance is crucial, you can get all aggregates in a single scan in a CTE and split values into two rows in the next step.
WITH x AS (
SELECT count(NULLIF(added_customer, FALSE)) AS customers
,sum(added_sales_order::int) AS sales_order
...
,count(NULLIF(added_customer, TRUE)) AS not_customers
,sum((NOT added_sales_order)::int) AS not_sales_order
...
FROM shipments_data
)
SELECT 'added'::text AS col, customers, sales_order, ... FROM x
UNION ALL
SELECT 'not_added', not_customers, not_sales_order, ... FROM x;
I also demonstrate two alternative ways to build your aggregates - both built on the assumption that all columns are boolean NOT NULL. Both alternatives are syntactically shorter, but not faster. In previous testes all three methods performed about the same.

Related

Invalid column name 'total_counted'

I'm querying the bin table to get the total active bins, total counted bins and calculate the percent of bins counted. Here's my query:
SELECT bin.location_id
,SUM(CASE WHEN bin.delete_flag = 'N' THEN 1 ELSE 0 END) AS total_active
,SUM(CASE WHEN bin.date_last_counted > 0 THEN 1 ELSE 0 END) AS total_counted
--,(total_counted / total_active) as pct_counted
From bin
Group by bin.location_id
Order by bin.location_id
When I try to use the code to create my pct_counted, it tells me "invalid column name" for both of the columns I'm using to calculate that value. Data looks like below.
location_id total_active total_counted
2 11502 484
6 2281 108
15 1772 253
Can anyone help?
You need to repeat the expressions (or use a subquery or CTE). I would recommend:
SELECT bin.location_id,
SUM(CASE WHEN bin.delete_flag = 'N' THEN 1 ELSE 0 END) AS total_active,
SUM(CASE WHEN bin.date_last_counted > 0 THEN 1 ELSE 0 END) AS total_counted
(SUM(CASE WHEN bin.date_last_counted > 0 THEN 1.0 ELSE 0 END) /
SUM(CASE WHEN bin.delete_flag = 'N' THEN 1 END)
) AS as pct_counted
FROM bin
GROUP BY bin.location_id
ORDER BY bin.location_id;
Note that I removed the ELSE clause for the second expression. This avoids divide-by-zero. The 1.0 also ensures decimal division even if your database does integer division.
I would format the sub-query this way:
Select
a.location_id
total_counted/total_active as pct_counted
from(select
bin.location_id
,SUM(CASE WHEN bin.delete_flag = 'N' THEN 1 ELSE 0 END) AS total_active
,SUM(CASE WHEN bin.date_last_counted > 0 THEN 1 ELSE 0 END) AS total_counted
From bin
Group by bin.location_id
Order by bin.location_id) a

I need to Count the Number of Columns that have value more than 0 in SQL

I want to create a calculated field at the end of the columns where it will count all the Columns having values greater than 0.
Below is a sample Data Set.
Account_number DAY_0 DAY_30 DAY_60 DAY_90 DAY_120
acc_001 99 10 0 0.2 0
You can use case expressions:
select t.*,
( (case when day_0 > 0 then 1 else 0 end) +
(case when day_30 > 0 then 1 else 0 end) +
(case when day_60 > 0 then 1 else 0 end) +
(case when day_90 > 0 then 1 else 0 end) +
(case when day_120 > 0 then 1 else 0 end)
) as num_gt_zero
from t;
That said, you probably constructed this from a group by query. You might be able to put this logic directly into that query. If that is the case, ask a new question, with sample data, desired results, and an appropriate database tag.

SAS/SQL sum amounts distinctly for each group by object?

I have the following code:
proc sql;
CREATE TABLE temp AS
(SELECT asofdt,
SUM(CASE WHEN trans_state ='cur_cur' THEN 1 ELSE 0 END) AS _cur_cur,
SUM(CASE WHEN trans_state ='cur_worse' THEN 1 ELSE 0 END) AS _cur_worse,
SUM(CASE WHEN trans_state ='cur_pre' THEN 1 ELSE 0 END) AS _cur_pre,
SUM(CASE WHEN trans_state ='30_better' THEN 1 ELSE 0 END) AS _30_better,
SUM(CASE WHEN trans_state ='30_30' THEN 1 ELSE 0 END) AS _30_30,
SUM(CASE WHEN trans_state ='60_90' THEN 1 ELSE 0 END) AS _60_90
FROM PERFORMANCE_TRANS_STATES_CLEAN
GROUP BY asofdt);
run;
The problem is it is adding the value from the previous group by asofdt onto the next one. So it is a cumulative sum as I go down the group bys. I would like the sum to be specific to each group by object. Any ideas on how?
Here's a picture of my output.
Your program seems fine to me. I reproduced it below with fewer observations and did not find that the total was cumulative.
data df;
input asofdt MMDDYY8. trans_state $;
datalines;
01/01/16 cur_cur
01/02/16 cur_pre
01/02/16 cur_pre
01/02/16 cur_cur
01/03/16 cur_pre
;
run;
proc sql;
CREATE TABLE temp AS
(SELECT asofdt,
SUM(CASE WHEN trans_state ='cur_cur' THEN 1 ELSE 0 END) AS _cur_cur,
SUM(CASE WHEN trans_state ='cur_worse' THEN 1 ELSE 0 END) AS _cur_worse,
SUM(CASE WHEN trans_state ='cur_pre' THEN 1 ELSE 0 END) AS _cur_pre,
SUM(CASE WHEN trans_state ='30_better' THEN 1 ELSE 0 END) AS _30_better,
SUM(CASE WHEN trans_state ='30_30' THEN 1 ELSE 0 END) AS _30_30,
SUM(CASE WHEN trans_state ='60_90' THEN 1 ELSE 0 END) AS _60_90
FROM df
GROUP BY asofdt);
quit;
You might want to check your data, as this query is fine. It is indeed running separately on each ASOFDT. You can check that trivially by comparing a single line with a WHERE (WHERE ASOFDT='01OCT2016'd or WHERE ASOFDT='10/01/2016' depending on the type of that variable).
proc sql;
CREATE TABLE temp AS
(SELECT stock,
SUM(CASE WHEN month(date)=01 THEN 1 ELSE 0 END) AS _jan,
SUM(CASE WHEN month(date)=02 THEN 1 ELSE 0 END) AS _feb,
SUM(CASE WHEN month(date)=03 THEN 1 ELSE 0 END) AS _mar,
SUM(CASE WHEN month(date)=04 THEN 1 ELSE 0 END) AS _apr
FROM sashelp.stocks
GROUP BY stock);
quit;
Nothing about that should be cumulative. Unless your data is cumulative, which it sort of makes sense it would be with "ASOFDT"?

Combinations of Products as single count

I need to count combinations of products within transactions differently to other products and I'm struggling with how to do this within a single select statement from SQL 2008. This would then become a data set to manipulate in Reporting Services
raw data looks like this
txn, prod, units
1, a, 2
1, c, 1
2, a, 1
2, b, 1
2, c, 1
3, a, 2
3, b, 1
4, a, 3
4, c, 2
So a+b should = one if in same trans number, however a or b should equal one if not paired. So a=1 and b=1 but a+b=1, a+b+a=2, a+b+a+b=2 given the example data here is my desired result with an explanation of why
txn 1 is 3 units -- 2a + c
txn 2 is 2 units -- (a+b) + c
txn 3 is 2 units -- (a+b) + a
txn 4 is 5 units -- 3a + 2c
My query is more complex than this and includes other aggregates so I would like to group by transaction which I can't do as I need to manipulate at a lower grain
Update Progress :
Possible solution, I've generated columns based on the products I'm measuring. This allows me to group on Txn as I am now aggregating that field. Unsure if there's a better way to do it as it does take a little while
CASE WHEN SUM(CASE WHEN Prod='a' then 1 else 0 end)-
SUM(CASE WHEN Prod='b' then 1 else 0 end)=0
THEN SUM(CASE WHEN Prod='a' then 1 else 0 end)
ELSE 0 END AS MixProd
, CASE WHEN SUM(CASE WHEN Prod='a' then 1 else 0 end)-
SUM(CASE WHEN Prod='b' then 1 else 0 end)!=0
THEN ABS(SUM(CASE WHEN Prod='a' then 1 else 0 end)-
SUM(CASE WHEN Prod='b' then 1 else 0 end))
ELSE 0 END AS NotMixProd
I will then need to sort out the current unit aggregate to remove the extras but this certainly gives me a start
Update Progress 2 :
This failed to handle 0 correctly where a or b was 0 it would still give a value for mix because a-b was not zero. I reverted to an earlier draft that I lost and expanded as per below
, CASE WHEN SUM(CASE WHEN Prod='a' then 1 else 0 end) = 0 THEN 0
WHEN SUM(CASE WHEN Prod='b' then 1 else 0 end) = 0 THEN 0
WHEN SUM(CASE WHEN Prod='a' then 1 else 0 end)-
SUM(CASE WHEN Prod='b' then 1 else 0 end)=0
THEN SUM(CASE WHEN Prod='a' then 1 else 0 end)
ELSE ABS(SUM(CASE WHEN Prod='a' then 1 else 0 end)-
SUM(CASE WHEN Prod='b' then 1 else 0 end))
END AS MixProd
, CASE WHEN SUM(CASE WHEN Prod='a' then 1 else 0 end)-
SUM(CASE WHEN Prod='b' then 1 else 0 end)!=0
THEN ABS(SUM(CASE WHEN Prod='a' then 1 else 0 end)-
SUM(CASE WHEN Prod='b' then 1 else 0 end))
ELSE 0 END AS NotMixProd
UPDATE: This should work in SQL Server 2008 (based on LAG solution from here).
Here is the demo: http://rextester.com/GNI23706
WITH CTE AS
(
select txn, prod, units,
row_number() over (partition by txn order by prod) rn,
(row_number() over (partition by txn order by prod))/2 rndiv2,
(row_number() over (partition by txn order by prod)+1)/2 rnplus1div2,
count(*) over (partition by txn) partitioncount
from test_data
)
select
txn,
sum(case when prev_prod = 'a' and prod = 'b' and prev_units >= units then 0
when prev_prod = 'a' and prod = 'b' and prev_units < units then units - prev_units
else units
end) units
from
(
select
txn,
prod,
units,
CASE WHEN rn%2=1
THEN MAX(CASE WHEN rn%2=0 THEN prod END) OVER (PARTITION BY txn,rndiv2)
ELSE MAX(CASE WHEN rn%2=1 THEN prod END) OVER (PARTITION BY txn,rnplus1div2)
END AS prev_prod,
CASE WHEN rn%2=1
THEN MAX(CASE WHEN rn%2=0 THEN units END) OVER (PARTITION BY txn,rndiv2)
ELSE MAX(CASE WHEN rn%2=1 THEN units END) OVER (PARTITION BY txn,rnplus1div2)
END AS prev_units
from cte
) temp
group by txn
For SQL Server 2012+, use LAG:
select
txn,
sum(
case when prev_prod = 'a' and prod = 'b' and prev_units >= units then 0
when prev_prod = 'a' and prod = 'b' and prev_units < units then units - prev_units
else units
end) units
from
(
select
txn,
prod,
units,
lag(prod) over (partition by txn order by prod) prev_prod,
lag(units) over (partition by txn order by prod) prev_units
from test_data
) temp
group by txn
I decided in the end that a temp table was the best way to go, because I couldn't group on a collation. So I eventually tweaked the code above as it was failing to pick up the spare items correctly
SUM(Units) AS OldUnits
SUM(Units) -
(CASE WHEN
SUM(CASE WHEN Prod='a' THEN 1 ELSE 0 END) = 0 THEN 0 WHEN
SUM(CASE WHEN Prod='b' THEN 1 ELSE 0 END) = 0 THEN 0 WHEN
SUM(CASE WHEN Prod='a' THEN 1 ELSE 0 END) -
SUM(CASE WHEN Prod='b' THEN 1 ELSE 0 END) = 0 THEN
SUM(CASE WHEN Prod='a' THEN 1 ELSE 0 END) WHEN
(SUM(CASE WHEN Prod='a' THEN 1 ELSE 0 END) -
SUM(CASE WHEN Prod='b' THEN 1 ELSE 0 END)) < 0 THEN
SUM(CASE WHEN Prod='a' THEN 1 ELSE 0 END) ELSE
SUM(CASE WHEN Prod='b' THEN 1 ELSE 0 END) END) AS NewUnits
This was stored in a temptable that I could then collate on Trans as the next step. Works fine for my purposes and helped me overcome a mild irrational fear I have of temptables

How to do addition and division of aliased columns in a query?

I am using SQL Server 2008.
I am trying to do some basic math in some basic queries. I need to add up wins, losses, total, and percentages. I usually ask for the raw numbers and then do the calculations once I return my query to page. I would like to give SQL Server the opportunity to work a little harder.
What I want to do is something like this:
SELECT SUM(case when vote = 1 then 1 else 0 end) as TotalWins,
SUM(case when vote = 0 then 1 else 0 end) as TotalLosses,
TotalWins + TotalLosses as TotalPlays,
TotalPlays / TotalWins as PctWins
Here's what I am doing now:
SELECT SUM(case when vote = 1 then 1 else 0 end) as TotalWins,
SUM(case when vote = 0 then 1 else 0 end) as TotalLosses,
SUM(case when vote = 1 then 1 else 0 end) + SUM(case when vote = 0 then 1 else 0 end) as Votes
What is the easiest, cleanest way to do simple math calculations like this in a query?
*EDIT: *
While I got some great answers, I didn't get what I was looking for.
The scores that I will be calculating are for a specific team, so, my results need to be like this:
TeamID Team Wins Losses Totals
1 A's 5 3 8
2 Bee's 7 9 16
3 Seas 1 3 4
SELECT T.TeamID,
T.Team,
V.TotalWins,
V.TotalLosses,
V.PctWins
FROM Teams T
JOIN
SELECT V.TeamID,
SUM(case when vote = 1 then 1 else 0 end) as V.TotWin,
SUM(case when vote = 0 then 1 else 0 end) as V.TotLoss
FROM Votes V
GROUP BY V.TeamID
I tried a bunch of things, but don't quite know what wrong. I am sure the JOIN part is where the problem is though. How do I bring these two resultsets together?
One way is to wrap your query in an external one:
SELECT TotalWins,
TotalLosses,
TotalWins + TotalLosses as TotalPlays,
TotalPlays / TotalWins as PctWins
FROM
( SELECT SUM(case when vote = 1 then 1 else 0 end) as TotalWins,
SUM(case when vote = 0 then 1 else 0 end) as TotalLosses
FROM ...
)
Another way (suggested by #Mike Christensen) is to use Common Table Expressions (CTE):
; WITH Calculation AS
( SELECT SUM(case when vote = 1 then 1 else 0 end) as TotalWins,
SUM(case when vote = 0 then 1 else 0 end) as TotalLosses
FROM ...
)
SELECT TotalWins,
TotalLosses,
TotalWins + TotalLosses as TotalPlays,
TotalPlays / TotalWins as PctWins
FROM
Calculation
Sidenote: No idea if this would mean any preformance difference in SQL-Server but you can also write these sums:
SUM(case when vote = 1 then 1 else 0 end)
as counts:
COUNT(case when vote = 1 then 1 end) --- the ELSE NULL is implied
try
select a, b, a+b as total
from (
select
case ... end as a,
case ... end as b
from realtable
) t
To answer your second question, this is the code you put forward with corrections to the syntax:
SELECT
T.TeamID,
T.Team,
V.TotalWins,
V.TotalLosses,
PctWins = V.TotalWins * 100 / CAST(V.TotalWins + V.TotalLosses AS float)
FROM Teams T
JOIN (
SELECT
TeamID,
SUM(case when vote = 1 then 1 else 0 end) as TotalWins,
SUM(case when vote = 0 then 1 else 0 end) as TotalLosses
FROM Votes
GROUP BY TeamID
) as V on T.TeamID = V.TeamID
Note the brackets around the inner select.
It might help you if you're doing this sort of thing more than once to create a view...
CREATE VIEW [Totals]
SELECT
SUM(case when T.vote = 1 then 1 else 0 end) as TotalWins,
SUM(case when T.vote = 0 then 1 else 0 end) as TotalLosses,
T.SomeGroupColumn
FROM SomeTable T
GROUP BY T.SomeGroupColumn