Get full drilldown of a sparse fact table - sql

I have a transaction fact table, with product, time and location as dimension tables. The fact table is sparse, so if no Pizzas were sold in January there is no record for Pizza in the fact table.
When I drill down by product, Pizza is missing from the aggregated results. But I want it included with zero values, such as units_sold = 0.
One solution is to join the product table to the fact table with a left outer join. Then I get the desired result.
But when I cut by another dimension such as location or time, those products are missing from the result again.
The outer join returns NULL for the other dimensions' foreign keys, so the WHERE clause removes those rows again.
How can I solve the problem? (I use ROLAP)
Using a join condition is a good idea, as some people answered. But I need a more general solution.
For example,
Table1
person birth year death year
a 1950 2006
b 1952 2008
c 1960 2007
d 1953 1990
I want a year-by-year count of the people who were born between 1950 and 1953 and died between 2006 and 2008.
Like
birth = 1950 death = 2006 count = 1
birth = 1951 death = 2006 count = 0
...
Can we handle this scenario by using join conditions and WHERE conditions appropriately?

You want a LEFT JOIN, and then another LEFT JOIN. The filter conditions then go in the ON clause rather than the WHERE clause. Something like this:
select . . .
from products p left join
fact f
on p.product_id = f.product_id left join
timedim td
on f.time_id = td.time_id and
td.month = 'January'
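For the more general case asked about, one option is to build the full grid of dimension values first with a CROSS JOIN and then LEFT JOIN the fact table onto that grid, so every combination appears even when no fact row exists. Here is a minimal sketch of that idea, run through Python's sqlite3 module so it can be tried without a server. The table layout, the units_sold column and the sample rows are assumptions made up for illustration, not the asker's real schema.

```python
import sqlite3

# Hypothetical star schema: two dimensions and a sparse fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE timedim  (time_id INTEGER PRIMARY KEY, month TEXT);
    CREATE TABLE fact     (product_id INTEGER, time_id INTEGER, units_sold INTEGER);
    INSERT INTO products VALUES (1, 'Pizza'), (2, 'Pasta');
    INSERT INTO timedim  VALUES (1, 'January'), (2, 'February');
    -- Sparse: there is no fact row for Pizza in January.
    INSERT INTO fact VALUES (2, 1, 5), (1, 2, 3), (2, 2, 4);
""")

rows = conn.execute("""
    SELECT p.name, td.month, COALESCE(SUM(f.units_sold), 0) AS units_sold
    FROM products p
    CROSS JOIN timedim td              -- every product/month combination
    LEFT JOIN fact f                   -- attach facts where they exist
           ON f.product_id = p.product_id
          AND f.time_id    = td.time_id
    WHERE td.month = 'January'         -- filters the dimension, not the fact
    GROUP BY p.name, td.month
    ORDER BY p.name
""").fetchall()

for row in rows:
    print(row)
```

Because the WHERE clause only touches a dimension column, it cannot discard the rows that have no matching fact. The same pattern should extend to the birth year / death year example by cross joining two lists of years, although a large grid can get expensive.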

Related

Join two tables with additional field in one table [duplicate]

I would like to join together two tables with additional columns.
First table is for number of products despatched by product
** Table 1 - Despatches **
Month ProductID No_despatched
Jan abc 10
Jan def 15
Jan xyz 12
The second table is for the number of products returned by product, but also an additional column by return reason
** Table 2 - Returns **
Month ProductID No_returned Return_reason
Jan abc 2 Too big
Jan abc 3 Too small
Jan xyz 1 Wrong colour
I would like to join the tables to show returns and despatched on the same row with the number of despatched being duplicated if there are multiple return reasons for the same product.
** Desired output **
Month ProductID No_despatched No_returned Return_reason
Jan abc 10 2 Too big
Jan abc 10 3 Too small
Jan xyz 12 1 Wrong colour
Hope this makes sense...
Thanks in advance!
This seems like a basic JOIN:
select r.month, r.productid, d.no_despatched, r.no_returned, r.return_reason
from returns r join
despatches d
on r.month = d.month and r.productid = d.productid;
The results don't seem particularly useful, because some products are missing (those with no returns). And the amounts are duplicated if there is more than one return record.
Just use a join:
select a.*, b.No_returned, b.Return_reason
from table1 a join table2 b on a.ProductID = b.ProductID
and a.month = b.month
If you get duplicate rows, you can use DISTINCT.
Changing the order of the clauses in your question produces the result.
"with additional columns":
SELECT Table1.Month, Table1.ProductID, Table1.No_despatched, Table2.No_returned, Table2.Return_reason
"join two tables":
FROM Table1 LEFT JOIN Table2
ON Table1.Month = Table2.Month AND Table1.ProductID = Table2.ProductID
We use a LEFT JOIN because, presumably, a product can be despatched without being returned, but nobody can return a product you didn't send out.
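To see what the LEFT JOIN returns for the sample data in the question, here is a small sketch using Python's sqlite3 module. The table names despatches and returns and the lower-case column names are assumptions chosen for the demo; the rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE despatches (month TEXT, productid TEXT, no_despatched INTEGER);
    CREATE TABLE returns    (month TEXT, productid TEXT, no_returned INTEGER,
                             return_reason TEXT);
    INSERT INTO despatches VALUES ('Jan', 'abc', 10), ('Jan', 'def', 15),
                                  ('Jan', 'xyz', 12);
    INSERT INTO returns VALUES ('Jan', 'abc', 2, 'Too big'),
                               ('Jan', 'abc', 3, 'Too small'),
                               ('Jan', 'xyz', 1, 'Wrong colour');
""")

rows = conn.execute("""
    SELECT d.month, d.productid, d.no_despatched, r.no_returned, r.return_reason
    FROM despatches d
    LEFT JOIN returns r
           ON r.month = d.month
          AND r.productid = d.productid
    ORDER BY d.productid, r.return_reason
""").fetchall()

for row in rows:
    print(row)
```

Product def has no returns, so it comes back with NULL (None in Python) in the two return columns. An inner join would drop that row, which matches the desired output shown in the question.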

Postgresql: Values of multiple rows in one row

I have the following database:
Car: {[CarID, HorsePower, Brand, HeadDesigner]}
DesignsCar:{[CarID, DesID]}
Designer:{[DesID, Name]}
You should note that while every Car has only 1 HeadDesigner, multiple people can design cars (as in work on them).
Say I have 10 cars in my database. For CarIDs 1 to 9 there is only one DesID per CarID in DesignsCar.
However, for carID 10 we have 3 people working on it (carID has 3 entries in DesignsCar because 3 people worked on it).
Say I do this:
select *
from car c
left outer join designscar ds on c.carid = ds.carid
left outer join designer d on ds.desid = d.desid
This gives me 12 rows, when I only want 10. The reason why this gives me 12 rows should be clear: for carID 10 we have 3 people working on it (carID has 3 entries in DesignsCar because 3 people worked on it).
I hope I've done a good job explaining this problem, so here comes my question:
How do I modify the query above so that I get 10 rows? For CarID 10 I'd like the 3 designers to be written in one column (comma separated, for example, but anything works as long as it's in one column).
Is that possible?
You need to aggregate the values. Here is one possibility:
select c.*,
array_agg(d.name) as designer_names
from car c left outer join
designscar ds
on c.carid = ds.carid left outer join
designer d
on ds.desid = d.desid
group by c.carid ; -- allowed assuming `carid` is the primary key
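If a comma-separated string is preferred over an array, PostgreSQL's string_agg(d.name, ', ') should do the same job as array_agg here. The sketch below shows the one-row-per-car idea using Python's sqlite3 module, with SQLite's group_concat standing in for the PostgreSQL aggregate so it runs without a server. The sample cars and designer names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE car        (carid INTEGER PRIMARY KEY, brand TEXT);
    CREATE TABLE designer   (desid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE designscar (carid INTEGER, desid INTEGER);
    INSERT INTO car VALUES (1, 'Audi'), (2, 'Fiat'), (3, 'Saab');
    INSERT INTO designer VALUES (1, 'Ann'), (2, 'Bea'), (3, 'Cy'), (4, 'Dee');
    -- Car 1 has one designer, car 2 has three, car 3 has none.
    INSERT INTO designscar VALUES (1, 1), (2, 2), (2, 3), (2, 4);
""")

rows = conn.execute("""
    SELECT c.carid, c.brand,
           group_concat(d.name, ', ') AS designer_names  -- SQLite stand-in
    FROM car c
    LEFT JOIN designscar ds ON ds.carid = c.carid
    LEFT JOIN designer d    ON d.desid  = ds.desid
    GROUP BY c.carid, c.brand
    ORDER BY c.carid
""").fetchall()

for row in rows:
    print(row)
```

Each car now appears exactly once. The order of names inside the concatenated string is not guaranteed here; in PostgreSQL an ORDER BY inside the aggregate call can pin it down.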

Apportioning data into new columns

Morning,
I am quite new to SQL Server 2008 so I was wondering if you could help me.
I currently have:
SELECT
c.code, d.date, d.date_previous,
CAST(d.date-date_previous as int) AS Days,
d.units, d.cost
FROM table1 AS d
INNER JOIN table2 AS p ON d.ID = p.ID
INNER JOIN table3 AS c ON p.c_id = c.ID
WHERE date_previous > '31/12/2012'
This is bringing back one row per invoice received after 31/12/2012. The aim is to get the following columns:
Code Jan data Feb data Mar data etc...
one unique code per line (so I'm assuming row partitioning is required)
Where a bill has a period of 3 months with, for example, 300 units, I'd like that separated out across 3 months (100 in each)
I'm aware I'd probably need to use a pivot function and some temp tables but I'm not that advanced yet.

Having problems fully understanding GROUP BY

I'm going over some practise questions for an exam that I have coming up and I'm having a problem fully understanding group by. I see GROUP BY as the following: group the result set by one or more columns.
I have the following database schema
My query
SELECT orders.customer_numb, sum(order_lines.cost_line), customers.customer_first_name, customers.customer_last_name
FROM orders
INNER JOIN customers ON customers.customer_numb = orders.customer_numb
INNER JOIN order_lines ON order_lines.order_numb = orders.order_numb
GROUP BY orders.customer_numb, order_lines.cost_line, customers.customer_first_name, customers.customer_last_name
ORDER BY order_lines.cost_line DESC
What I'm struggling to understand
Why can't I simply use just GROUP BY orders.cost_line and group the data by cost_line?
What I'm trying to achieve
I'd like to achieve the name of the customer who has spent the most money. I just don't fully understand how to achieve this. I understand how joins work, I just can't seem to get my head around why I can't simply GROUP BY customer_numb and cost_line (with sum() used to calculate the amount spent). I seem to always get "not a GROUP BY expression", if someone could explain what I'm doing wrong (not just give me the answer), that would be great - I'd really appreciate that, and of course any resources that you have for using GROUP by properly.
Sorry for the long essay and If I've missed anything I apologise. Any help would be greatly appreciated.
"I just can't seem to get my head around why I can't simply GROUP BY customer_numb and cost_line (with sum() used to calculate the amount spent)."
When you say group by customer_numb you know that customer_numb uniquely identifies a row in the customer table (assuming customer_numb is either a primary or alternate key), so that any given customers.customer_numb will have one and only one value for customers.customer_first_name and customers.customer_last_name. But at parse time Oracle does not know, or at least acts like it does not know that. And it says, in a bit of panic, "What do I do if a single customer_numb has more than one value for customer_first_name?"
Roughly the rule is, expressions in the select clause can use expressions in the group by clause and/or use aggregate functions. (As well as constants and system variables that don't depend on the base tables, etc.) And by "use" I mean be the expression or part of the expression. So once you group on first name and last name, customer_first_name || customer_last_name would be a valid expression also.
When you have a table like customers and are grouping by its primary key, or by a column with a unique key and a not null constraint, you can safely include its other columns in the group by clause. In this particular instance, group by customers.customer_numb, customers.customer_first_name, customers.customer_last_name.
Also note that the order by in the first query will fail, since order_lines.cost_line doesn't have a single value for the group. You can order on sum(order_lines.cost_line), or use a column alias in the select clause and order on that alias:
SELECT orders.customer_numb,
sum(order_lines.cost_line),
customers.customer_first_name,
customers.customer_last_name
FROM orders
INNER JOIN customers ON customers.customer_numb = orders.customer_numb
INNER JOIN order_lines ON order_lines.order_numb = orders.order_numb
GROUP BY orders.customer_numb,
customers.customer_first_name,
customers.customer_last_name
ORDER BY sum(order_lines.cost_line)
or
SELECT orders.customer_numb,
sum(order_lines.cost_line) as sum_cost_line,
. . .
ORDER BY sum_cost_line
Note: I've heard that some RDBMSes will imply additional expressions for the grouping without them being explicitly stated. Oracle is not one of those RDBMSes.
As for grouping by both customer_numb and cost_line, consider a DB with two customers, 1 and 2, each with two orders of one line:
Customer Number | Cost Line
1 | 20.00
1 | 20.00
2 | 35.00
2 | 30.00
select customer_number, cost_line, sum(cost_line)
FROM ...
group by customer_number, cost_line
order by sum(cost_line) desc
Customer Number | Cost Line | sum(cost_line)
1 | 20.00 | 40.00
2 | 35.00 | 35.00
2 | 30.00 | 30.00
The first row with highest sum(cost_line) is not the customer who spent the most.
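The four sample rows above can be checked with a short script. This is only a sketch using Python's sqlite3 module; the table name lines is made up, and the data is the two-customer example from this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lines (customer_number INTEGER, cost_line REAL);
    INSERT INTO lines VALUES (1, 20.0), (1, 20.0), (2, 35.0), (2, 30.0);
""")

# Grouping by customer AND cost_line: one group per distinct pair.
by_both = conn.execute("""
    SELECT customer_number, cost_line, SUM(cost_line)
    FROM lines
    GROUP BY customer_number, cost_line
    ORDER BY SUM(cost_line) DESC
""").fetchall()

# Grouping by customer only: one total per customer.
by_customer = conn.execute("""
    SELECT customer_number, SUM(cost_line)
    FROM lines
    GROUP BY customer_number
    ORDER BY SUM(cost_line) DESC
""").fetchall()

print(by_both)
print(by_customer)
```

With both columns in the GROUP BY, customer 1 tops the list with 40.00 only because its two identical lines collapse into one group. Grouped by customer alone, customer 2 leads with 65.00.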
"I understand how joins work, I just can't seem to get my head around why I can't simply GROUP BY customer_numb and cost_line (with sum() used to calculate the amount spent)."
This should give you the sum for every customer.
SELECT orders.customer_numb, sum(order_lines.cost_line)
FROM orders
INNER JOIN order_lines ON order_lines.order_numb = orders.order_numb
GROUP BY orders.customer_numb
Note that every column in the SELECT clause that's not an argument to an aggregate function is also a column in the GROUP BY clause.
Now you can join that with other tables to get more detail. Here's one way using a common table expression. (There are other ways to express what you want.)
with customer_sums as (
-- We give the columns useful aliases here.
SELECT orders.customer_numb as customer_numb,
sum(order_lines.cost_line) as total_orders
FROM orders
INNER JOIN order_lines ON order_lines.order_numb = orders.order_numb
GROUP BY orders.customer_numb
)
select c.customer_numb, c.customer_first_name, c.customer_last_name, cs.total_orders
from customers c
inner join customer_sums cs
on cs.customer_numb = c.customer_numb
order by cs.total_orders desc
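Here is a runnable sketch of the common table expression approach, using Python's sqlite3 module in place of Oracle. The schema is reduced to the columns the query needs, and the customers and amounts are invented sample data. The first row of the result is the customer who spent the most.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_numb INTEGER PRIMARY KEY,
                            customer_first_name TEXT, customer_last_name TEXT);
    CREATE TABLE orders      (order_numb INTEGER PRIMARY KEY, customer_numb INTEGER);
    CREATE TABLE order_lines (order_numb INTEGER, cost_line REAL);
    INSERT INTO customers VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO order_lines VALUES (10, 5.0), (10, 7.5), (11, 2.5), (12, 20.0);
""")

rows = conn.execute("""
    WITH customer_sums AS (
        -- One row per customer with the total of all their order lines.
        SELECT o.customer_numb AS customer_numb,
               SUM(ol.cost_line) AS total_orders
        FROM orders o
        INNER JOIN order_lines ol ON ol.order_numb = o.order_numb
        GROUP BY o.customer_numb
    )
    SELECT c.customer_numb, c.customer_first_name, c.customer_last_name,
           cs.total_orders
    FROM customers c
    INNER JOIN customer_sums cs ON cs.customer_numb = c.customer_numb
    ORDER BY cs.total_orders DESC
""").fetchall()

top_spender = rows[0]
print(top_spender)
```

Grouping happens inside the CTE on customer_numb alone, so the outer query can select the name columns without listing them in any GROUP BY.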
"Why can't I simply use just GROUP BY orders.cost_line and group the data by cost_line?"
Applying GROUP BY to order_lines.cost_line will give you one row for each distinct value in order_lines.cost_line. (The column orders.cost_line doesn't exist.) Here's what that data might look like.
OL.ORDER_NUMB OL.COST_LINE O.CUSTOMER_NUMB C.CUSTOMER_FIRST_NAME C.CUSTOMER_LAST_NAME
--
1 1.45 2014 Julio Savell
1 2.33 2014 Julio Savell
1 1.45 2014 Julio Savell
2 1.45 2014 Julio Savell
2 1.45 2014 Julio Savell
3 13.00 2014 Julio Savell
You can group by order_lines.cost_line, but it won't give you any useful information. This query
select order_lines.cost_line, max(orders.customer_numb) as customer_numb
from order_lines
inner join orders on orders.order_numb = order_lines.order_numb
group by order_lines.cost_line;
should return something like this.
OL.COST_LINE O.CUSTOMER_NUMB
--
1.45 2014
2.33 2014
13.00 2014
Not terribly useful.
If you're interested in the sum of the order line items, you need to decide what column or columns to group (summarize) by. If you group (summarize) by order number, you'll get three rows. If you group (summarize) by customer number, you'll get one row.

Sorting by date across two separate columns in a Full Outer Join

I have two columns of data I am lining up using a Full Outer Join but it includes two separate date columns which make it challenging to sort by.
Table 1 has sales rank data for a product.
Table 2 has actual sales data for the same product.
Each table may have entries for dates on which the other does not.
So envision after the full join, we end up with something like this simplified example:
ProdID L.Date P.Date Rank Units
101 null 2011-10-01 null 740
101 2011-10-02 2011-10-02 23 652
101 2011-10-03 null 32 null
Here is the query I am using to pull this data:
select L.ListID, L.ASIN, L.date, L.ranking, P.ASIN, P.POSdate, P.units from ListItem L
full outer join POSdata P on
L.ASIN = P.ASIN and
L.date = P.POSdate and
(L.ListID = 1 OR L.ASIN is null)
where (L.ASIN = 'xxxxxxxxxx' and L.ListID = 1) or
(P.ASIN = 'xxxxxxxxxx' and L.BookID is null)
order by POSdate, date
It's a bit more complex because products may appear on multiple lists so I have to account for that as well, but it returns the data I need. I am open to suggestions on improving it of course should someone have one.
The problem is, how can I sort this properly when both date columns are likely to have at least some NULLs in them? The way I am ordering now will not work when both columns have at least one NULL.
Thanks.
ORDER BY ISNULL(P.POSdate,L.date) should do what you need I think?
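As a quick check of the idea, here is a sketch using Python's sqlite3 module. It uses COALESCE, the standard SQL counterpart of the two-argument ISNULL in SQL Server, and it sorts a plain table holding the three already-joined rows from the question rather than repeating the full outer join. The table name joined and its column names are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for the result of the full outer join in the question.
    CREATE TABLE joined (prod_id INTEGER, list_date TEXT, pos_date TEXT,
                         ranking INTEGER, units INTEGER);
    INSERT INTO joined VALUES
        (101, '2011-10-03', NULL,         32,   NULL),
        (101, NULL,         '2011-10-01', NULL, 740),
        (101, '2011-10-02', '2011-10-02', 23,   652);
""")

rows = conn.execute("""
    SELECT prod_id, list_date, pos_date, ranking, units
    FROM joined
    ORDER BY COALESCE(pos_date, list_date)  -- whichever date is present
""").fetchall()

for row in rows:
    print(row)
```

Each row sorts on whichever of the two dates is not NULL, so the rows come out in calendar order regardless of which side of the join supplied the date.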