SQL: How to pivot this table?

I have a very hard time understanding how to pivot things.
I have this simple query:
select
year
,AVG(Quantity) Quantity
,AVG(Price) Price
,CAST(Datepart(wk,Date) as nvarchar) + '-' + RIGHT(CAST(year(Date) as NVARCHAR),2) Week
from Yearly
GROUP BY Year, CAST(Datepart(wk,Date) as nvarchar) + '-' + RIGHT(CAST(year(Date) as NVARCHAR),2)
This returns the following table:
+------+----------+---------------+------+
| year | Quantity | Price | Week |
+------+----------+---------------+------+
| 16 | 877814 | 68636081.39 | 6-20 |
| 17 | 436029 | 2635873.72 | 6-20 |
| 18 | 3793464 | 65971353.61 | 6-20 |
| 19 | 23552519 | 478741292.122 | 6-20 |
| 20 | 6973687 | 34658140.815 | 6-20 |
| Z01 | 7776508 | 54949609.221 | 6-20 |
+------+----------+---------------+------+
Right now I only have the one week, but as the days go by, I have a job that is going to build those 6 rows for 7-20, 8-20, 9-20, etc.
I want my table to look like this:
+------+--------+-------------+--------+------------+---------+-------------+----------+-------------+---------+-------------+---------+-------------+----------+-------------+
|      | 16     |             | 17     |            | 18      |             | 19       |             | 20      |             | Z01     |             | Total    |             |
+------+--------+-------------+--------+------------+---------+-------------+----------+-------------+---------+-------------+---------+-------------+----------+-------------+
| Week | Qty | Price | Qty | Price | Qty | Price | Qty | Price | Qty | Price | Qty | Price | Qty | Price |
| 6-20 | 877814 | 68636081.39 | 436029 | 2635873.72 | 3793464 | 65971353.61 | 23552519 | 478741292.1 | 6973687 | 34658140.82 | 7776508 | 54949609.22 | 43410021 | 705592350.9 |
| 7-20 | | | | | | | | | | | | | | |
| 8-20 | | | | | | | | | | | | | | |
+------+--------+-------------+--------+------------+---------+-------------+----------+-------------+---------+-------------+---------+-------------+----------+-------------+
Should I use PIVOT, or is there a better way to do this?

This is a variation on pwilcox's answer, but more concise:
select v.week,
avg(case when year = '16' then quantity end) as quantityYr16,  -- year holds values like 'Z01', so compare with string literals
avg(case when year = '16' then price end) as priceYr16,
avg(case when year = '17' then quantity end) as quantityYr17,
avg(case when year = '17' then price end) as priceYr17,
. . .
sum(quantity) as totalQuantity,
sum(price) as totalPrice
from yearly cross apply
(values (concat(datename(week, date), '-', datename(year, date)))
) v(week)
group by v.week
order by v.week;
Notes:
Never use varchar() or nvarchar() without a length. The default length varies by context (1 in a declaration, 30 in CAST or CONVERT) and may not be long enough.
datename() is convenient here because it returns strings rather than numbers, so no CAST is needed.
When using the date part functions, spell out the full names of the date parts (week, year). This makes the code easier to read.

For a multi-column pivot of the sort you want, you have to take advantage of the fact that aggregate functions ignore NULL values. So place CASE expressions inside your averages that return the quantity or price when the row belongs to a given year, and NULL otherwise.
select ap.week,
quantityYr16 = avg(case when year = '16' then quantity end),  -- year is a string column (it holds 'Z01'), so quote the literals
priceYr16 = avg(case when year = '16' then price end),
quantityYr17 = avg(case when year = '17' then quantity end),
priceYr17 = avg(case when year = '17' then price end),
...
from yearly
cross apply (select week =
cast(datepart(wk,date) as nvarchar) + '-' +
right(cast(year(date) as nvarchar),2)
) ap
group by ap.week
However, this layout is a reporting concern, and SQL doesn't handle it as well as reporting tools such as HTML, SSRS or Excel. I would do this step in whichever reporting tool you ultimately present the data with.
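To check the NULL-skipping behaviour described above without a SQL Server instance, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in. The first three rows are copied from the question; the table name weekly, the column aliases and the 7-20 row are hypothetical. It shows only the conditional-aggregation pattern, not a full T-SQL port.

```python
import sqlite3

# Rows shaped like the question's GROUP BY output: (year, week, quantity, price).
# year is TEXT because it holds 'Z01'. The 7-20 row is made up for illustration.
ROWS = [
    ("16", "6-20", 877814, 68636081.39),
    ("17", "6-20", 436029, 2635873.72),
    ("Z01", "6-20", 7776508, 54949609.221),
    ("16", "7-20", 900000, 70000000.0),
]


def pivot_by_week(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE weekly (year TEXT, week TEXT, quantity REAL, price REAL)")
    con.executemany("INSERT INTO weekly VALUES (?, ?, ?, ?)", rows)
    # Each CASE returns the value for one year and NULL for every other row;
    # AVG ignores NULLs, so each column averages just that year's rows.
    sql = """
        SELECT week,
               AVG(CASE WHEN year = '16'  THEN quantity END) AS quantity_16,
               AVG(CASE WHEN year = '16'  THEN price    END) AS price_16,
               AVG(CASE WHEN year = '17'  THEN quantity END) AS quantity_17,
               AVG(CASE WHEN year = 'Z01' THEN quantity END) AS quantity_z01,
               SUM(quantity) AS total_quantity
        FROM weekly
        GROUP BY week
        ORDER BY week
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in pivot_by_week(ROWS):
        print(row)
```

Weeks with no row for a given year simply come back as NULL (None) in that year's columns.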

Here is a PIVOT version. I assumed you do not need a dynamic pivot.
Example
Select *
From (
Select A.Week
,B.*
From (
-- YOUR ORIGINAL QUERY HERE (without the Order By) ---
) A
Cross Apply ( values (concat(year,'_Qty') ,[Quantity])
,(concat(year,'_Price'),[Price])
,(concat('Total','_Qty'),[Quantity])
,(concat('Total','_Price'),[Price])
) B(item,value)
) src
Pivot (sum(Value) for Item in ([16_Qty],[16_Price],[17_Qty],[17_Price],[18_Qty],[18_Price],[19_Qty],[19_Price],[20_Qty],[20_Price],[Z01_Qty],[Z01_Price],[Total_Qty],[Total_Price]) ) pvt
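The two stages of that query can be sketched in plain Python to make the data flow visible: the CROSS APPLY (VALUES ...) step turns each input row into four (item, value) rows, two of which feed the Total columns, and PIVOT then sums the values per week and item. This illustrates the shape of the query, not how SQL Server executes it; the function names are hypothetical.

```python
from collections import defaultdict

# Rows in the shape of the question's GROUP BY output: (year, week, quantity, price).
WEEKLY = [
    ("16", "6-20", 877814, 68636081.39),
    ("17", "6-20", 436029, 2635873.72),
    ("Z01", "6-20", 7776508, 54949609.221),
]


def unpivot(rows):
    """Like CROSS APPLY (VALUES ...): four (week, item, value) rows per input row."""
    for year, week, qty, price in rows:
        yield week, f"{year}_Qty", qty
        yield week, f"{year}_Price", price
        # Every input row also contributes to the Total columns.
        yield week, "Total_Qty", qty
        yield week, "Total_Price", price


def pivot(triples):
    """Like PIVOT (SUM(value) FOR item IN (...)): one dict of columns per week."""
    table = defaultdict(lambda: defaultdict(float))
    for week, item, value in triples:
        table[week][item] += value
    return {week: dict(columns) for week, columns in table.items()}


if __name__ == "__main__":
    for week, columns in pivot(unpivot(WEEKLY)).items():
        print(week, columns)
```

One difference from the SQL: PIVOT returns NULL for a listed item that has no rows, whereas the dictionary simply lacks that key.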


Duplicate records upon joining table

I am still very new to SQL and Tableau, but I am working toward a personal project of mine.
Table A shows the defect quantity per product category and the date each issue was raised:
+--------+-------------+--------------+-----------------+
| Issue# | Date_Raised | Category_ID# | Defect_Quantity |
+--------+-------------+--------------+-----------------+
| PCR12 | 11-Jan-2019 | Product#1 | 14 |
| PCR13 | 12-Jan-2019 | Product#1 | 54 |
| PCR14 | 5-Feb-2019 | Product#1 | 5 |
| PCR15 | 5-Feb-2019 | Product#2 | 7 |
| PCR16 | 20-Mar-2019 | Product#1 | 76 |
| PCR17 | 22-Mar-2019 | Product#2 | 5 |
| PCR18 | 25-Mar-2019 | Product#1 | 89 |
+--------+-------------+--------------+-----------------+
Table B shows the consumption quantity of each product by month:
+-------------+--------------+-------------------+
| Date_Raised | Category_ID# | Consumed_Quantity |
+-------------+--------------+-------------------+
| 5-Jan-2019 | Product#1 | 100 |
| 17-Jan-2019 | Product#1 | 200 |
| 5-Feb-2019 | Product#1 | 100 |
| 8-Feb-2019 | Product#2 | 50 |
| 10-Mar-2019 | Product#1 | 100 |
| 12-Mar-2019 | Product#2 | 50 |
+-------------+--------------+-------------------+
END RESULT
I would like to create a table/bar chart in Tableau that shows Defect_Quantity/Consumed_Quantity per month, per Category_ID#, something like this:
+----------+-----------+-----------+
| Month | Product#1 | Product#2 |
+----------+-----------+-----------+
| Jan-2019 | 23% | |
| Feb-2019 | 5% | 14% |
| Mar-2019 | 89% | 10% |
+----------+-----------+-----------+
WHAT I HAVE TRIED SO FAR
Unfortunately I have not really done anything yet; I am struggling to understand how to get rid of the duplicates when joining the tables on Category_ID#.
I appreciate any help I can get.
I can think of doing separate left joins for Product#1 and Product#2, aggregating tableB by month first so the defect rows are not duplicated:
select t1.month_raised
, sum(case when t1.Category_ID# = 'Product#1' then t1.Defect_Quantity else 0 end) / max(p1.product1) * 100 as product1_pct
, sum(case when t1.Category_ID# = 'Product#2' then t1.Defect_Quantity else 0 end) / max(p2.product2) * 100 as product2_pct
from (select to_char(to_date(Date_Raised,'dd-mon-yyyy'),'mon-yyyy') month_raised  -- 'd' would mean day of week
, Category_ID#, Defect_Quantity
from tableA) t1
left join
-- one row per month, so the join cannot multiply the defect rows
(select to_char(to_date(Date_Raised,'dd-mon-yyyy'),'mon-yyyy') month_raised
, sum(Consumed_Quantity) as product1
from tableB
where Category_ID# = 'Product#1'
group by to_char(to_date(Date_Raised,'dd-mon-yyyy'),'mon-yyyy')) p1
on p1.month_raised = t1.month_raised
left join
(select to_char(to_date(Date_Raised,'dd-mon-yyyy'),'mon-yyyy') month_raised
, sum(Consumed_Quantity) as product2
from tableB
where Category_ID# = 'Product#2'
group by to_char(to_date(Date_Raised,'dd-mon-yyyy'),'mon-yyyy')) p2
on p2.month_raised = t1.month_raised
group by t1.month_raised
By using ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) AS RN and keeping only the rows where RN = 1, you can remove duplicate rows. For your end result, extract the month from the date and use a pivot.
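A minimal sketch of the ROW_NUMBER() idea, run through Python's sqlite3 module (window functions need SQLite 3.25 or later). It joins two defect rows and two consumption rows from the question on category alone, which yields four rows, then keeps the first copy of each issue. Partitioning by issue and ordering by consumption date are assumptions, since the answer leaves both clauses open; note that this only removes repeated rows and does not compute the ratio.

```python
import sqlite3

# Two defect rows (Table A) and two consumption rows (Table B) from the
# question, with dates rewritten as ISO strings.
DEFECTS = [("PCR12", "2019-01-11", "Product#1", 14),
           ("PCR13", "2019-01-12", "Product#1", 54)]
CONSUMED = [("2019-01-05", "Product#1", 100),
            ("2019-01-17", "Product#1", 200)]

# Joining on category alone pairs every defect with every consumption row.
JOINED = """
    SELECT a.issue, a.defect_quantity, b.date_raised AS consumed_on
    FROM a JOIN b ON a.category_id = b.category_id
"""


def join_then_dedupe():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE a (issue TEXT, date_raised TEXT, category_id TEXT, defect_quantity INTEGER)")
    con.execute("CREATE TABLE b (date_raised TEXT, category_id TEXT, consumed_quantity INTEGER)")
    con.executemany("INSERT INTO a VALUES (?, ?, ?, ?)", DEFECTS)
    con.executemany("INSERT INTO b VALUES (?, ?, ?)", CONSUMED)
    joined = con.execute(JOINED).fetchall()
    # ROW_NUMBER() numbers the copies of each issue; keeping rn = 1 leaves one per issue.
    deduped = con.execute(f"""
        SELECT issue, defect_quantity
        FROM (SELECT j.*,
                     ROW_NUMBER() OVER (PARTITION BY issue ORDER BY consumed_on) AS rn
              FROM ({JOINED}) AS j) AS numbered
        WHERE rn = 1
        ORDER BY issue
    """).fetchall()
    con.close()
    return joined, deduped


if __name__ == "__main__":
    print(join_then_dedupe())
```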
I would do this as:
select to_char(date_raised, 'YYYY-MM'),
(sum(case when product = 'Product#1' then defect_quantity end) /
sum(case when product = 'Product#1' then consumed_quantity end)
) as product1,
(sum(case when product = 'Product#2' then defect_quantity end) /
sum(case when product = 'Product#2' then consumed_quantity end)
) as product2
from ((select date_raised, product, defect_quantity, 0 as consumed_quantity
from a
) union all
(select date_raised, product, 0 as defect_quantity, consumed_quantity
from b
)
) ab
group by to_char(date_raised, 'YYYY-MM')
order by min(date_raised);
(I changed the date format because I much prefer YYYY-MM, but that is irrelevant to the logic.)
Why do I prefer this method? It includes every month that has a row in either table, so I don't have to worry about months being inadvertently filtered out because one table has no consumption or defect rows for that month.
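Here is the same UNION ALL idea as a runnable sketch in Python's sqlite3 module, using the question's sample data (dates rewritten as ISO strings, and the category column named product as in the answer). SQLite uses strftime('%Y-%m', ...) instead of to_char, and it divides integers as integers, hence the 1.0 * factor that Oracle would not need. With this data, March for Product#1 works out to 165/100, i.e. 165% rather than the 89% shown in the question, and January for Product#2 is NULL because neither table has a row for it.

```python
import sqlite3

DEFECTS = [  # Table A: (date_raised, product, defect_quantity)
    ("2019-01-11", "Product#1", 14), ("2019-01-12", "Product#1", 54),
    ("2019-02-05", "Product#1", 5), ("2019-02-05", "Product#2", 7),
    ("2019-03-20", "Product#1", 76), ("2019-03-22", "Product#2", 5),
    ("2019-03-25", "Product#1", 89),
]
CONSUMED = [  # Table B: (date_raised, product, consumed_quantity)
    ("2019-01-05", "Product#1", 100), ("2019-01-17", "Product#1", 200),
    ("2019-02-05", "Product#1", 100), ("2019-02-08", "Product#2", 50),
    ("2019-03-10", "Product#1", 100), ("2019-03-12", "Product#2", 50),
]


def defect_ratio_by_month():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE a (date_raised TEXT, product TEXT, defect_quantity INTEGER)")
    con.execute("CREATE TABLE b (date_raised TEXT, product TEXT, consumed_quantity INTEGER)")
    con.executemany("INSERT INTO a VALUES (?, ?, ?)", DEFECTS)
    con.executemany("INSERT INTO b VALUES (?, ?, ?)", CONSUMED)
    # Stack both tables (no join, so nothing is duplicated), then aggregate per month.
    sql = """
        SELECT strftime('%Y-%m', date_raised) AS month,
               1.0 * SUM(CASE WHEN product = 'Product#1' THEN defect_quantity END)
                   / SUM(CASE WHEN product = 'Product#1' THEN consumed_quantity END) AS product1,
               1.0 * SUM(CASE WHEN product = 'Product#2' THEN defect_quantity END)
                   / SUM(CASE WHEN product = 'Product#2' THEN consumed_quantity END) AS product2
        FROM (SELECT date_raised, product, defect_quantity, 0 AS consumed_quantity FROM a
              UNION ALL
              SELECT date_raised, product, 0, consumed_quantity FROM b) AS ab
        GROUP BY strftime('%Y-%m', date_raised)
        ORDER BY MIN(date_raised)
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in defect_ratio_by_month():
        print(row)
```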

SQL: Get an aggregate (SUM) of a calculation of two fields (DATEDIFF) that has conditional logic (CASE WHEN)

I have a dataset that includes a bunch of stay data (at a hotel). Each row contains a start date and an end date, but no duration field. I need to get a sum of the durations.
Sample Data:
| Stay ID | Client ID | Start Date | End Date |
| 1 | 38 | 01/01/2018 | 01/31/2019 |
| 2 | 16 | 01/03/2019 | 01/07/2019 |
| 3 | 27 | 01/10/2019 | 01/12/2019 |
| 4 | 27 | 05/15/2019 | NULL |
| 5 | 38 | 05/17/2019 | NULL |
There are some added complications:
I am using Crystal Reports and this is a SQL Expression, which obeys slightly different rules. Basically, it returns a single scalar value. Here is some more info: http://www.cogniza.com/wordpress/2005/11/07/crystal-reports-using-sql-expression-fields/
Sometimes, the end date field is blank (they haven't booked out yet). If blank, I would like to replace it with the current timestamp.
I only want to count nights that have occurred in the past year. If the start date of a given stay is more than a year ago, I need to adjust it.
I need to get a sum by Client ID
I'm not actually any good at SQL so all I have is guesswork.
The proper syntax for a Crystal Reports SQL Expression is something like this:
(
SELECT (CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END)
)
And that's giving me the correct value for a single row, if I wanted to do this:
| Stay ID | Client ID | Start Date | End Date | Duration |
| 1 | 38 | 01/01/2018 | 01/31/2019 | 210 | // only days since June 4 2018 are counted
| 2 | 16 | 01/03/2019 | 01/07/2019 | 4 |
| 3 | 27 | 01/10/2019 | 01/12/2019 | 2 |
| 4 | 27 | 05/15/2019 | NULL | 21 |
| 5 | 38 | 05/17/2019 | NULL | 19 |
But I want to get the SUM of Duration per client, so I want this:
| Stay ID | Client ID | Start Date | End Date | Duration |
| 1 | 38 | 01/01/2018 | 01/31/2019 | 229 | // 210+19
| 2 | 16 | 01/03/2019 | 01/07/2019 | 4 |
| 3 | 27 | 01/10/2019 | 01/12/2019 | 23 | // 2+21
| 4 | 27 | 05/15/2019 | NULL | 23 |
| 5 | 38 | 05/17/2019 | NULL | 229 |
I've tried to just wrap a SUM() around my CASE but that doesn't work:
(
SELECT SUM(CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END)
)
It gives me an error that the StayDateEnd is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. But I don't even know what that means, so I'm not sure how to troubleshoot, or where to go from here. And then the next step is to get the SUM by Client ID.
Any help would be greatly appreciated!
Although the explanation and data set are almost impossible to match, I think this is an approximation to what you want.
-- a table variable is declared with @; #name is for CREATE TABLE temp tables
declare @your_data table (StayId int, ClientId int, StartDate date, EndDate date)
insert into @your_data values
(1,38,'2018-01-01','2019-01-31'),
(2,16,'2019-01-03','2019-01-07'),
(3,27,'2019-01-10','2019-01-12'),
(4,27,'2019-05-15',NULL),
(5,38,'2019-05-17',NULL)
;with data as (
select *,
datediff(day,
case
when datediff(day,StartDate,getdate())>365 then dateadd(year,-1,getdate())
else StartDate
end,
isnull(EndDate,getdate())
) days
from @your_data
)
select *,
sum(days) over (partition by ClientId) as total_days
from data
https://rextester.com/HCKOR53440
You need a subquery that sums the durations per Client_id (using GROUP BY) and a join between your table and that subquery, e.g.:
select your_table.Stay_id, your_table.Client_id, your_table.StayDateStart, your_table.StayDateEnd, t.sum_duration
from your_table
inner join (
select Client_id,
SUM(CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END) sum_duration
from your_table
group by Client_id
) t on t.Client_id = your_table.Client_id
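Both approaches can be tried in SQLite through Python. The sketch below follows the second one (a grouped subquery joined back to the table) because it needs no window functions. A fixed reference date replaces CURRENT_TIMESTAMP so the output is reproducible; 2019-06-05 is an assumption, chosen because it reproduces the 21 and 19 nights in the question. With it, stay 1 counts 240 nights (2018-06-05 to 2019-01-31), so client 38 totals 259 rather than the 229 shown in the question. As with the original CASE, a stay that ended more than a year before the reference date would come out negative.

```python
import sqlite3

STAYS = [  # (stay_id, client_id, start_date, end_date) from the question, ISO dates
    (1, 38, "2018-01-01", "2019-01-31"),
    (2, 16, "2019-01-03", "2019-01-07"),
    (3, 27, "2019-01-10", "2019-01-12"),
    (4, 27, "2019-05-15", None),
    (5, 38, "2019-05-17", None),
]

# Nights per stay: clip the start to one year before :now, default a missing end to :now.
NIGHTS = """
    CAST(julianday(COALESCE(end_date, :now))
         - julianday(MAX(start_date, date(:now, '-1 year'))) AS INTEGER)
"""


def stay_totals(now):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE stays (stay_id INTEGER, client_id INTEGER, start_date TEXT, end_date TEXT)")
    con.executemany("INSERT INTO stays VALUES (?, ?, ?, ?)", STAYS)
    # Sum per client in a subquery, then join back so every stay row carries its client's total.
    sql = f"""
        SELECT s.stay_id, s.client_id, t.sum_duration
        FROM stays AS s
        JOIN (SELECT client_id, SUM({NIGHTS}) AS sum_duration
              FROM stays
              GROUP BY client_id) AS t
          ON t.client_id = s.client_id
        ORDER BY s.stay_id
    """
    result = con.execute(sql, {"now": now}).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in stay_totals("2019-06-05"):
        print(row)
```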

Get range (min - max) of values concatenated in a single row

Given the following table:
+-----------------------------+
| id | type | price | item_id |
|-----------------------------|
| 1 | 1 | 20 | 22 |
|-----------------------------|
| 2 | 1 | 22 | 22 |
|-----------------------------|
| 3 | 2 | 19 | 22 |
|-----------------------------|
| 4 | 2 | 11 | 22 |
|-----------------------------|
| 5 | 1 | 08 | 22 |
|-----------------------------|
| 6 | 2 | 25 | 22 |
+-----------------------------+
I am trying to select the data into a single row, to create a view like this:
+-------------------------------------+
| type1_range | type2_range | item_id |
|-------------------------------------|
| 08 - 22 | 11 - 25 | 22 |
+-------------------------------------+
type1_range and type2_range are the minimum and maximum price for each type.
I can get the data in a couple of rows using:
SELECT type, MAX (price) , MIN (price)
FROM table
where item_id=22 GROUP BY type;
+----------------------------+
| type | max | min | item_id |
|----------------------------|
| 1 | 22 | 08 | 22 |
|----------------------------|
| 2 | 25 | 11 | 22 |
+----------------------------+
But I am trying to concat the rows like this:
+-------------------------------------+
| type1_range | type2_range | item_id |
|-------------------------------------|
| 08 - 22 | 11 - 25 | 22 |
+-------------------------------------+
What SQL would be required for this?
Something like this:
SELECT
CONCAT(
MIN(CASE WHEN type = 1 THEN price END),
' - ',
MAX(CASE WHEN type = 1 THEN price END)
) as type1range,
CONCAT(
MIN(CASE WHEN type = 2 THEN price END),
' - ',
MAX(CASE WHEN type = 2 THEN price END)
) as type2range,
item_id
FROM table
WHERE item_id = 22
GROUP BY item_id
You've tagged two different database systems (please avoid doing this), but I believe they both support CONCAT() for string concatenation.
If you want to omit item_id from the select list (you already know it's item 22), you can remove the GROUP BY. Alternatively, if you remove the WHERE and keep the GROUP BY, you'll get a row for each item_id.
To get more of an idea of how it works, remove the CONCAT and the MIN/MAX: you'll see that the CASE WHEN makes the price show up in the type 1 range column only if the type is 1; otherwise it's NULL. It's then trivial for MIN and MAX to work on just the type 1 or just the type 2 data in each column. It's actually a form of pivot query, if you want to read up on them more.
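The same conditional aggregation can be run in SQLite from Python. SQLite concatenates strings with the || operator (a CONCAT() function only appeared in recent releases), and the table name prices is hypothetical, since table is a reserved word. Because price is stored as an integer here, 08 comes back as 8; printf('%02d', ...) would restore the leading zero if needed.

```python
import sqlite3

# The question's rows: (id, type, price, item_id).
PRICES = [(1, 1, 20, 22), (2, 1, 22, 22), (3, 2, 19, 22),
          (4, 2, 11, 22), (5, 1, 8, 22), (6, 2, 25, 22)]


def price_ranges():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE prices (id INTEGER, type INTEGER, price INTEGER, item_id INTEGER)")
    con.executemany("INSERT INTO prices VALUES (?, ?, ?, ?)", PRICES)
    # CASE keeps the price only for the matching type, so MIN/MAX see one type per column.
    sql = """
        SELECT MIN(CASE WHEN type = 1 THEN price END) || ' - ' ||
               MAX(CASE WHEN type = 1 THEN price END) AS type1_range,
               MIN(CASE WHEN type = 2 THEN price END) || ' - ' ||
               MAX(CASE WHEN type = 2 THEN price END) AS type2_range,
               item_id
        FROM prices
        GROUP BY item_id
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(price_ranges())
```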
A straightforward approach is to compute type1_range and type2_range in two subqueries and left join them to the distinct item_ids, as shown below:
SELECT t.item_id,type1_range,type2_range
FROM (Select distinct item_id from table) t
LEFT join
(SELECT item_id,type, concat(MIN(price),' - ' ,MAX(price) ) as type1_range
FROM table
where type=1
GROUP BY item_id,type)type1 on type1.item_id=t.item_id
LEFT join
(SELECT item_id,type, concat(MIN(price),' - ' ,MAX(price) ) as type2_range
FROM table
where type=2
GROUP BY item_id,type)type2 on type2.item_id=t.item_id

SQL Server 2008 version of OVER(... Rows Unbounded Preceding)

I'm looking for help converting this query into something SQL Server 2008 can run, as I just can't work it out. I've tried CROSS APPLY and inner joins (not saying I did them right) to no avail. Any suggestions?
Essentially, it takes a table of stock and a table of orders and combines the two to show me what to pick once the stock is taken away (see my previous question for more details).
WITH ADVPICK
AS (SELECT 'A' AS PlaceA,
placeb,
CASE
WHEN picktime = '00:00' THEN '07:00'
ELSE ISNULL(picktime, '12:00')
END AS picktime,
Cast(product AS INT) AS product,
prd_description,
-qty AS Qty
FROM t_pick_orders
UNION ALL
SELECT 'A' AS PlaceA,
placeb,
'0',
Cast(code AS INT) AS product,
NULL,
stock
FROM t_pick_stock),
STOCK_POST_ORDER
AS (SELECT *,
Sum(qty)
OVER (
PARTITION BY placeb, product
ORDER BY picktime ROWS UNBOUNDED PRECEDING ) AS new_qty
FROM ADVPICK)
SELECT *,
CASE
WHEN new_qty > qty THEN new_qty
ELSE qty
END AS order_shortfall
FROM STOCK_POST_ORDER
WHERE new_qty < 0
ORDER BY placeb,
picktime,
product
The SUM() OVER (PARTITION BY ... ORDER BY ... ROWS UNBOUNDED PRECEDING) running total requires SQL Server 2012 or later, but I have two servers that run 2008, so I need it converted.
Expected Results:
+--------+--------+----------+---------+-----------+-------+---------+-----------------+
| PlaceA | PlaceB | Picktime | product | Prd_Descr | qty | new_qty | order_shortfall |
+--------+--------+----------+---------+-----------+-------+---------+-----------------+
| BW | AMES | 16:00 | 1356 | Product A | -1330 | -17 | -17 |
| BW | AMES | 16:00 | 17 | Product B | -48 | -42 | -42 |
| BW | AMES | 17:00 | 1356 | Product A | -840 | -857 | -840 |
| BW | AMES | 18:00 | 1356 | Product A | -770 | -1627 | -770 |
| BW | AMES | 18:00 | 17 | Product B | -528 | -570 | -528 |
| BW | AMES | 19:00 | 1356 | Product A | -700 | -2327 | -700 |
| BW | AMES | 20:00 | 1356 | Product A | -910 | -3237 | -910 |
| BW | AMES | 20:00 | 8009 | Product C | -192 | -52 | -52 |
| BW | AMES | 20:00 | 897 | Product D | -90 | -10 | -10 |
+--------+--------+----------+---------+-----------+-------+---------+-----------------+
One straight-forward way to do it is to use a correlated sub-query in CROSS APPLY.
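If your table is reasonably large, your next question will be how to make it fast. The correlated sub-query re-sums all earlier rows for every row, so its cost grows roughly with the square of the partition size. An index on (PlaceB, Product, PickTime) INCLUDE (Qty) should help, but if your table is really large, a cursor would be better.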
WITH
ADVPICK
AS
(
SELECT 'A' as PlaceA,PlaceB, case when PickTime = '00:00' then '07:00' else isnull(picktime,'12:00') end as picktime, cast(Product as int) as product, Prd_Description, -Qty AS Qty FROM t_pick_orders
UNION ALL
SELECT 'A' as PlaceA,PlaceB, '0', cast(Code as int) as product, NULL, Stock FROM t_pick_stock
)
,stock_post_order
AS
(
SELECT
*
FROM
ADVPICK AS Main
CROSS APPLY
(
SELECT SUM(Sub.Qty) AS new_qty
FROM ADVPICK AS Sub
WHERE
Sub.PlaceB = Main.PlaceB
AND Sub.Product = Main.Product
AND Sub.PickTime <= Main.PickTime
) AS A
)
SELECT
*,
CASE WHEN new_qty > qty THEN new_qty ELSE qty END AS order_shortfall
FROM
stock_post_order
WHERE
new_qty < 0
ORDER BY PlaceB, picktime, product;
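Also, if (PlaceB, Product, PickTime) is not unique, you'll get somewhat different results from the original query with SUM() OVER: here every tied row gets a total that includes all of its ties, whereas ROWS UNBOUNDED PRECEDING adds them one at a time. If you need exactly the same results, use an extra column (such as an ID) to break the ties.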
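A sketch of the correlated sub-query running total in SQLite via Python. SQLite has no CROSS APPLY, so the sub-query sits in the SELECT list, which computes the same scalar SUM. Only one place and two products are included, and the stock quantities (1313 and 6) are not from the question: they are inferred so that the output reproduces the first three rows of the expected results. Within each pick time, rows come back ordered by product number.

```python
import sqlite3

# Stock rows use pick time '0' so they sort before every order ('0' < '16:00');
# order quantities are negative, as in the ADVPICK CTE.
ADVPICK = [
    ("AMES", 1356, "0", 1313),
    ("AMES", 1356, "16:00", -1330),
    ("AMES", 1356, "17:00", -840),
    ("AMES", 17, "0", 6),
    ("AMES", 17, "16:00", -48),
]


def shortfalls(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE advpick (placeb TEXT, product INTEGER, picktime TEXT, qty INTEGER)")
    con.executemany("INSERT INTO advpick VALUES (?, ?, ?, ?)", rows)
    # For each row, sum qty over the same place/product up to and including its
    # pick time: a running total without SUM() OVER (... ORDER BY ...).
    sql = """
        SELECT placeb, product, picktime, qty, new_qty,
               CASE WHEN new_qty > qty THEN new_qty ELSE qty END AS order_shortfall
        FROM (SELECT cur.*,
                     (SELECT SUM(prev.qty)
                      FROM advpick AS prev
                      WHERE prev.placeb = cur.placeb
                        AND prev.product = cur.product
                        AND prev.picktime <= cur.picktime) AS new_qty
              FROM advpick AS cur) AS stock_post_order
        WHERE new_qty < 0
        ORDER BY placeb, picktime, product
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in shortfalls(ADVPICK):
        print(row)
```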

SQL query to select today and previous day's price

I have historic stock price data that looks like the below. I want to generate a new table that has one row for each ticker with the most recent day's price and its previous day's price. What would be the best way to do this? My database is Postgres.
+--------+-------+------------+
| ticker | price | date       |
+--------+-------+------------+
| AAPL   | 6     | 10-23-2015 |
| AAPL   | 5     | 10-22-2015 |
| AAPL   | 4     | 10-21-2015 |
| AXP    | 5     | 10-23-2015 |
| AXP    | 3     | 10-22-2015 |
| AXP    | 5     | 10-21-2015 |
+--------+-------+------------+
You can do something like this:
with ranking as (
select ticker, price, dt,
rank() over (partition by ticker order by dt desc) as rank
from stocks
)
select * from ranking where rank in (1,2);
Example: http://sqlfiddle.com/#!15/e45ea/3
Results for your example will look like this:
| ticker | price | dt | rank |
|--------|-------|---------------------------|------|
| AAPL | 6 | October, 23 2015 00:00:00 | 1 |
| AAPL | 5 | October, 22 2015 00:00:00 | 2 |
| AXP | 5 | October, 23 2015 00:00:00 | 1 |
| AXP | 3 | October, 22 2015 00:00:00 | 2 |
If your table is large and you have performance issues, add a WHERE clause that restricts the data to the last 30 days or so.
Your best bet is to use a window function together with an aggregated CASE expression, which pivots the data.
You can see more on window functions here: http://www.postgresql.org/docs/current/static/tutorial-window.html
Below is a rough version of where you may need to head to answer your question (sorry, I couldn't validate it because I don't have a Postgres database set up).
Select
ticker,
SUM(CASE WHEN rank = 1 THEN price ELSE 0 END) today,
SUM(CASE WHEN rank = 2 THEN price ELSE 0 END) yesterday
FROM (
SELECT
ticker,
price,
date,
rank() OVER (PARTITION BY ticker ORDER BY date DESC) as rank
FROM your_table) p
WHERE rank in (1,2)
GROUP BY ticker;
Edit - Updated the case statement with an 'else'
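A runnable sketch of the rank-then-pivot idea in SQLite through Python (window functions need SQLite 3.25 or later). Two changes from the answer are my own choices: ROW_NUMBER() replaces rank() so two rows with the same date cannot both be ranked 1, and MAX(CASE ... END) without an ELSE leaves previous_price NULL for a ticker with only one row instead of reporting 0. Dates are stored as ISO strings so that text ordering is chronological; the MM-DD-YYYY strings in the question would not sort correctly across years.

```python
import sqlite3

# The question's sample prices, with dates rewritten as ISO strings.
PRICES = [
    ("AAPL", 6, "2015-10-23"), ("AAPL", 5, "2015-10-22"), ("AAPL", 4, "2015-10-21"),
    ("AXP", 5, "2015-10-23"), ("AXP", 3, "2015-10-22"), ("AXP", 5, "2015-10-21"),
]


def latest_two(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE stocks (ticker TEXT, price INTEGER, dt TEXT)")
    con.executemany("INSERT INTO stocks VALUES (?, ?, ?)", rows)
    # Number each ticker's rows newest first, then pivot rows 1 and 2 into columns.
    sql = """
        SELECT ticker,
               MAX(CASE WHEN rn = 1 THEN price END) AS latest_price,
               MAX(CASE WHEN rn = 2 THEN price END) AS previous_price
        FROM (SELECT ticker, price,
                     ROW_NUMBER() OVER (PARTITION BY ticker ORDER BY dt DESC) AS rn
              FROM stocks) AS ranked
        WHERE rn <= 2
        GROUP BY ticker
        ORDER BY ticker
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(latest_two(PRICES))
```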