I have a challenge which I can't seem to resolve on my own and now need help!
I have a requirement to show parallel year date sales via SQL and by that I mean if today (20/08/2015) Customer A has purchased products worth 500, I want to know how much Customer A spent on the same day last year (so 20/08/2014).
Here's a SQL fiddle where I've built everything (I reckoned that would be easiest for you guys). I have 3 dimensions (DimProduct, DimDate and DimCustomer), a fact table (FactSales) and a view (VW_ParallelSales) which I've built on top. I have also left a query on the right hand side with what I'm trying to achieve. If you run the query you will see that for Antonio, the SaleAmount on 20140820 was 3500 and if you look at the very bottom of the table, you can see there's one more record for Antonio in the fact table on 20150820 for 6500. So essentially, what I want is to have that 3500 which was sold on 20140820 (which is the parallel year date of 20150820) under the column ParallelSales (which at the moment is showing as NULL).
It all works like a charm if I don't include the ProductKey in the view and have just the CustomerKey (see this fiddle). However, as soon as I add the Product Key, because there is no exact match of CustomerKey-ProductKey that has happened in the past, I'm getting NULLS for ParallelSales (or at least that's what I think the reason is).
What I want to be able to do is then use the view and join on both DimCustomer and DimProduct and run queries both ways, i.e.:
Query 1: How much did Customer A spend today vs today last year?
Query 2: How much of Product A did we sell today vs today last year?
At the moment, as is, I need to have 2 views for that - one that joins the two sub-queries in the view on CustomerKey and the other one - on ProductKey (and obviously the dates).
I know it's a lot to ask but I do need to get this to work and would appreciate your help immensely! Thanks :)
For customer sales in different years:
SQL Fiddle Demo
SELECT DimCustomer.CustomerName,
VW_Current.Saledate,
VW_Current.ParallelDate,
VW_Current.CurrentSales,
VW_Previous.CurrentSales as ParallelSale
FROM DimCustomer
INNER JOIN VW_ParallelSales VW_Current
ON DimCustomer.CustomerKey = VW_Current.CustomerKey
LEFT JOIN VW_ParallelSales VW_Previous
ON VW_Current.ParallelDate = VW_Previous.Saledate
AND DimCustomer.CustomerKey = VW_Previous.CustomerKey
ORDER BY 1, 2
For sales by ProductKey:
SQL Fiddle Demo
With sales as (
SELECT
DimProduct.ProductKey,
DimProduct.ProductName,
VW_ParallelSales.Saledate,
VW_ParallelSales.ParallelDate,
VW_ParallelSales.CurrentSales,
VW_ParallelSales.ParallelSales
FROM DimProduct INNER JOIN VW_ParallelSales ON DimProduct.ProductKey =
VW_ParallelSales.ProductKey
)
SELECT
s_recent.ProductName,
s_recent.Saledate ThisYear,
s_old.Saledate PreviousYear,
s_recent.CurrentSales CurrentSales,
s_old.CurrentSales ParallelSales
FROM
SALES s_recent
left outer join SALES s_old
on s_recent.saledate = s_old.saledate + 10000
and s_recent.ProductKey = s_old.ProductKey
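As a rough sketch of why the single view returns NULLs, and one way around it: aggregate the fact table to the grain you are asking about (customer or product) before doing the year-over-year self-join. The example below uses Python's sqlite3 module as a stand-in for the fiddle's database, and the three FactSales rows are made up for illustration (they are not the fiddle's data). The helper name parallel_sales is hypothetical.

```python
import sqlite3

# Hypothetical miniature of FactSales; the rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FactSales (CustomerKey INT, ProductKey INT, DateKey INT, SaleAmount INT);
    INSERT INTO FactSales VALUES
        (1, 10, 20140820, 3500),
        (1, 11, 20150820, 6500),
        (2, 10, 20150820, 400);
""")

def parallel_sales(key_column):
    """Aggregate to one grain first, then self-join on the date key minus one year."""
    # Only the two known grain columns are allowed into the SQL text.
    if key_column not in ("CustomerKey", "ProductKey"):
        raise ValueError(key_column)
    sql = f"""
        WITH agg AS (
            SELECT {key_column} AS k, DateKey, SUM(SaleAmount) AS Sales
            FROM FactSales
            GROUP BY {key_column}, DateKey
        )
        SELECT cur.k, cur.DateKey, cur.Sales, prev.Sales
        FROM agg AS cur
        LEFT JOIN agg AS prev
               ON prev.k = cur.k
              AND prev.DateKey = cur.DateKey - 10000  -- yyyymmdd key, one year back
        ORDER BY cur.k, cur.DateKey
    """
    return conn.execute(sql).fetchall()

by_customer = parallel_sales("CustomerKey")
by_product = parallel_sales("ProductKey")
print(by_customer)
print(by_product)
```

Customer 1 bought different products in the two years, so a join on CustomerKey and ProductKey together would find no match, while each single-grain query does. Note that the minus 10000 trick assumes integer yyyymmdd keys and finds nothing for 29 February.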
I am trying to find the number of sales per petrol company. However I also want to include petrol companies that have made no sales but I cannot figure out how to do it. The stations table includes all the stations however sales only includes stations which actually had any sales.
This is how I am finding the number of sales per petrol station, but this doesn't include companies with 0 sales:
select stations.company,count(sales.sale)
from stations
join sales on stations.id=sales.stationid
group by stations.company;
My idea is to create a union with another query which just finds the companies with 0 sales but I don't know how to get a column with a 0 value in it. I tried to add having count(sales.sale) = 0 but since the stations with no sales just don't appear in the sales table that doesn't work.
I have looked at similar stack overflow questions but they all seem to reference using a different type of join however I have tried using left/right outer/inner joins with no luck.
You haven't provided a minimal reproducible example so I can't be entirely certain, but this looks like a case for a LEFT JOIN. Try the query below:
SELECT stations.company, count(sales.sale) AS TotalSales
FROM stations
LEFT JOIN sales ON stations.id=sales.stationid
GROUP BY stations.company
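To see the effect on a small case, here is a sketch using Python's sqlite3 module with made-up stations and sales rows (the company names and values are assumptions, not the asker's data). COUNT(sales.sale) counts only non-NULL values, so a company with no matching sales rows gets 0 rather than disappearing.

```python
import sqlite3

# Illustrative data only: two companies, one of which has no sales.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stations (id INT, company TEXT);
    CREATE TABLE sales (sale INT, stationid INT);
    INSERT INTO stations VALUES (1, 'Acme'), (2, 'Acme'), (3, 'Borealis');
    INSERT INTO sales VALUES (100, 1), (101, 1), (102, 2);
""")

rows = conn.execute("""
    SELECT stations.company, COUNT(sales.sale) AS TotalSales
    FROM stations
    LEFT JOIN sales ON stations.id = sales.stationid
    GROUP BY stations.company
    ORDER BY stations.company
""").fetchall()

print(rows)  # [('Acme', 3), ('Borealis', 0)]
```

Using COUNT(*) instead would report 1 for Borealis, because the LEFT JOIN still produces one row for the unmatched station.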
Please propose an approach I should follow since I am obviously missing the point. I am new to SQL and still think in terms of MS Access. Here's an example of what I'm trying to do: Like I said, don't worry about the detail, I just want to know how I would do this in SQL.
I have the following tables:
Hrs_Worked (staff, date, hrs) (200 000+ records)
Units_Done (team, date, type) (few thousand records)
Rate_Per_Unit (date, team, RatePerUnit) (few thousand records)
Staff_Effort (staff, team, timestamp) (eventually 3 - 4 million records)
So I need to do the following:
1) Calculate what each team earned by multiplying their units with RatePerUnit and Grouping on Team and Date. I create a view TeamEarnPerDay:
Create View teamEarnPerDay AS
SELECT
Units_Done.Date
,Units_Done.TeamID
,Sum([Units_Done] * Rate_Per_Unit.[Rate]) AS Earn
FROM Units_Done INNER JOIN Rate_Per_Unit
ON (Units_Done.quality = Rate_Per_Unit.quality)
AND (Units_Done.type = Rate_Per_Unit.type)
AND (Units_Done.TeamID = Rate_Per_Unit.TeamID)
AND (Units_Done.Date = Rate_Per_Unit.Date)
GROUP BY
Units_Done.Date,
Units_Done.TeamID;
2) Count the TEAM's effort by Grouping Staff_Effort on Team and Date and counting records. This table has a few million records.
I have to cast the timestamp as a date....
CREATE View team_effort AS
SELECT
TeamID
,CAST([Timestamp] AS Date) AS TeamDate
,Count(Staff_EffortID) AS TeamEffort
FROM Staff_Effort
GROUP BY
TeamID
,CAST([Timestamp] AS Date);
3) Calculate the Team's Rate_of_pay: (1) Team_earnings / (2) Team_effort
I use the 2 views I created above. This view's performance drops but is still acceptable to me.
Create View team_rate_of_pay AS
SELECT
tepd.Date
,tepd.TeamID
,tepd.Earn
,te.TeamEffort
,tepd.Earn / te.TeamEffort AS teamRate
FROM teamEarnPerDay AS tepd
INNER JOIN team_effort AS te
ON (tepd.Date = te.TeamDate)
AND (tepd.TeamID = te.TeamID);
4) Group Staff_Effort on Date and Staff and count records to get each individual's effort (their share of the team effort).
I have to cast the Timestamp as a date....
Create View staff_effort AS
SELECT
TeamID
,StaffID
,CAST([Timestamp] AS Date) as StaffDate
,Count(Staff_EffortID) AS StaffEffort
FROM Staff_Effort
GROUP BY
TeamID
,StaffID
,CAST([Timestamp] AS Date);
5) Calculate Staff earnings by: (4) Staff_Effort x (3) team_rate_of_pay
Multiply the individual's effort by the team rate he worked at on the day.
This one is ridiculously slow. In fact, it's useless.
CREATE View staff_earnings AS
SELECT
staff_effort.StaffDate
,staff_effort.StaffID
,sum(staff_effort.StaffEffort) AS StaffEffort
,sum([StaffEffort]*[TeamRate]) AS StaffEarn
FROM staff_effort INNER JOIN team_rate_of_pay
ON (staff_effort.TeamID = team_rate_of_pay.TeamID)
AND (staff_effort.StaffDate = team_rate_of_pay.Date)
Group By
staff_effort.StaffDate,
staff_effort.StaffID;
So you see what I mean.... I need various results and subsequent queries are dependent on those results.
What I tried to do is to write a view for each of the above steps and then just use the view in the next step and so on. They work fine but view nr 3 runs slower than the rest, even though still acceptable. View nr 5 is just ridiculously slow.
I actually have another view after nr.5 which brings hours worked into play as well but that just takes forever to produce a few rows.
I want a single line for each staff member, showing what he earned each day calculated as set out above, with his hours worked each day.
I also tried to reduce the number of views by using sub-queries instead but that took even longer.
A little guidance / direction will be much appreciated.
Thanks in advance.
--EDIT--
Taking the query posted in the comments, with some formatting, added aliases and a little cleanup, it would look like this:
SELECT epd.CompanyID
,epd.DATE
,epd.TeamID
,epd.Earn
,tb.TeamBags
,epd.Earn / tb.TeamBags AS RateperBag
FROM teamEarnPerDay epd
INNER JOIN teamBags tb ON epd.DATE = tb.TeamDate
AND epd.TeamID = tb.TeamID;
I eventually did 2 things:
1) Managed to reduce the nr of nested views by using sub-queries. This did not improve performance by much but it seems simpler with fewer views.
2) The actual improvement was caused by using LEFT JOIN instead of INNER JOIN.
The final view ran for 50 minutes with the Inner Join without producing a single row yet.
With LEFT JOIN, it produced all the results in 20 seconds!
Hope this helps someone.
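For anyone who wants to try the five steps from the question as a single statement rather than a stack of views, here is a minimal sketch using common table expressions. It runs against Python's sqlite3 module with simplified, made-up tables (units_done, rate_per_unit and staff_effort with only the columns needed here), so the names and data are assumptions and not the original schema. Whether this performs better on a few million rows depends on the database and its indexes; this only shows the shape.

```python
import sqlite3

# Simplified, hypothetical versions of the question's tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE units_done (team INT, day TEXT, units INT);
    CREATE TABLE rate_per_unit (team INT, day TEXT, rate INT);
    CREATE TABLE staff_effort (staff TEXT, team INT, ts TEXT);
    INSERT INTO units_done VALUES (1, '2020-01-01', 10);
    INSERT INTO rate_per_unit VALUES (1, '2020-01-01', 3);
    INSERT INTO staff_effort VALUES
        ('A', 1, '2020-01-01 08:00:00'),
        ('A', 1, '2020-01-01 09:00:00'),
        ('B', 1, '2020-01-01 08:30:00');
""")

rows = conn.execute("""
    WITH team_earn AS (          -- step 1: what each team earned per day
        SELECT u.team, u.day, SUM(u.units * r.rate) AS earn
        FROM units_done u
        JOIN rate_per_unit r ON r.team = u.team AND r.day = u.day
        GROUP BY u.team, u.day
    ),
    staff_effort_day AS (        -- step 4: each person's effort per day
        SELECT team, staff, date(ts) AS day, COUNT(*) AS effort
        FROM staff_effort
        GROUP BY team, staff, date(ts)
    ),
    team_effort AS (             -- step 2, derived from the smaller result above
        SELECT team, day, SUM(effort) AS effort
        FROM staff_effort_day
        GROUP BY team, day
    )
    -- steps 3 and 5: team rate = earn / effort, times the person's effort
    SELECT s.staff, s.day, SUM(s.effort * e.earn * 1.0 / t.effort) AS staff_earn
    FROM staff_effort_day s
    JOIN team_effort t ON t.team = s.team AND t.day = s.day
    JOIN team_earn e ON e.team = s.team AND e.day = s.day
    GROUP BY s.staff, s.day
    ORDER BY s.staff
""").fetchall()

print(rows)
```

The one deliberate design choice is that the big effort table is grouped once (per team, staff and day) and the team totals are then summed from that smaller result, instead of grouping the raw table twice.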
Say I have a database with products and revenue. I know that for the product 'Apple', we have many kinds of apples and roughly 70% of sales are granny smith and 30% are golden delicious.
select
delivery_month_id,
sales_order_id,
product_id,
product_nm,
net_cost_distributed_amt
from dw.op_sales_order
where delivery_month_id >= 201601
What I have now is
I'm trying to get something like this
I'm assuming I need some case whens and sub queries but not entirely sure how to go about this.
You need a table (or similar row source, e.g., a WITH clause) with the product details. Call it, for example, DW.PRODUCT_DETAILS. It should have three columns: PRODUCT_DETAIL_ID, PRODUCT_NM, and ALLOCATION_PCT, where PRODUCT_NM matches the name that appears in your OP_SALES_ORDER table.
Then, you can left join this table into your query to get your desired results:
SELECT so.delivery_month_id,
so.sales_order_id,
so.product_id,
so.product_nm,
so.net_cost_distributed_amt,
so.net_cost_distributed_amt * NVL (pd.allocation_pct,1) rev_revised
FROM dw.op_sales_order so
LEFT JOIN dw.product_details pd ON pd.product_nm = so.product_nm
WHERE so.delivery_month_id >= 201601
With the left join, things like oranges and grapefruits, which do not have any details, will not need to be in the PRODUCT_DETAILS table.
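Here is a small runnable sketch of that idea using Python's sqlite3 module. The rows are invented for illustration, the schema prefix is dropped, and COALESCE stands in for Oracle's NVL because SQLite does not have NVL. Storing the variety name in PRODUCT_DETAIL_ID is also just an assumption to keep the example short.

```python
import sqlite3

# Illustrative data: one Apple order to split 70/30, one Orange order left as is.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE op_sales_order (sales_order_id INT, product_nm TEXT,
                                 net_cost_distributed_amt REAL);
    CREATE TABLE product_details (product_detail_id TEXT, product_nm TEXT,
                                  allocation_pct REAL);
    INSERT INTO op_sales_order VALUES (1, 'Apple', 100.0), (2, 'Orange', 50.0);
    INSERT INTO product_details VALUES
        ('granny smith', 'Apple', 0.7),
        ('golden delicious', 'Apple', 0.3);
""")

rows = conn.execute("""
    SELECT so.sales_order_id,
           COALESCE(pd.product_detail_id, so.product_nm) AS item,
           so.net_cost_distributed_amt * COALESCE(pd.allocation_pct, 1) AS rev_revised
    FROM op_sales_order so
    LEFT JOIN product_details pd ON pd.product_nm = so.product_nm
    ORDER BY so.sales_order_id, rev_revised DESC
""").fetchall()

for row in rows:
    print(row)
```

The Apple order comes back as two rows whose amounts add up to the original 100, and the Orange order comes back once, unchanged. The allocation percentages for a product need to sum to 1 for the totals to be preserved.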
I have seen some close answers and I have been trying to adapt them to Access 2013, but I can't seem to get it to work. I have two queries:
First query returns
original_staff_data
Month
Year
staff_uid
staff_abbrev
employee_name
staff_salary
It pulls this from tables staff, and salary_by_month and employee_name and number_of_days_at_spec_building (this records where they check in when they work)
transaction_data_by_staff.total
Month
Year
staff_uid
total_revenue
total_profit
This also pulls information from staff, but it sums over multiple dates in a transaction table, creating a cumulative value for each staff_uid, so I can't combine the two queries directly.
My problem is that I want to create a query that brings in results from both. However, not all staff members in Q1 will be in Q2 every day/week/month (vacations, etc.), and I ultimately want to produce this final result:
Final_Result
Month
Year
staff_uid
staff_abbrev
employee_name
staff_salary
total_revenue
total_profit
The SQL:
SELECT
original_staff_data.*
, transaction_data_by_staff.total_rev
, transaction_data_by_staff.total_profit
FROM transaction_data_by_staff
RIGHT JOIN original_staff_data
ON (
transaction_data_by_staff.year = original_staff_data.year
AND transaction_data_by_staff.month = original_staff_data.month
) WHERE transaction_data_by_staff.[staff_uid] = [original_staff_data].[staff_uid];
I would like it if there is no revenue or profit that month from that employee, it makes those values 0. I have tried join (specifically RIGHT join with Q1 as the RIGHT join) and it doesn't seem to work, I still only get the subset. There are originally in the original_staff_data query 750 entries so therefore there should be in the final query 750 entries, I am only getting 252, which is the total in transaction_data_by_staff. Any clue on how the ACCESS 2013 SQL should look?
Thanks
Jon
Move the staff_uid condition from the WHERE clause to the ON clause, like this:
SELECT original_staff_data.*, transaction_data_by_staff.total_rev, transaction_data_by_staff.total_profit
FROM transaction_data_by_staff RIGHT JOIN original_staff_data ON (transaction_data_by_staff.year = original_staff_data.year) AND (transaction_data_by_staff.month = original_staff_data.month)
AND (((transaction_data_by_staff.[staff_uid])=[original_staff_data].[staff_uid]));
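The reason this matters: a WHERE condition on the optional table is applied after the outer join, so the NULL-extended rows fail the comparison and are filtered out, which turns the outer join back into an inner join. A minimal sketch with Python's sqlite3 module and made-up rows follows. It is written as a LEFT JOIN from original_staff_data, which is equivalent to the RIGHT JOIN above with the tables swapped, and it uses COALESCE to show the zero-filling the question asks for (in Access, Nz() would be the usual choice for that).

```python
import sqlite3

# Illustrative data: two staff members, only one of whom has transactions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE original_staff_data (year INT, month INT, staff_uid INT, staff_salary INT);
    CREATE TABLE transaction_data_by_staff (year INT, month INT, staff_uid INT, total_rev INT);
    INSERT INTO original_staff_data VALUES (2015, 1, 1, 3000), (2015, 1, 2, 3200);
    INSERT INTO transaction_data_by_staff VALUES (2015, 1, 1, 900);
""")

# staff_uid compared in WHERE: the unmatched staff member is filtered out.
where_version = conn.execute("""
    SELECT s.staff_uid, t.total_rev
    FROM original_staff_data s
    LEFT JOIN transaction_data_by_staff t
           ON t.year = s.year AND t.month = s.month
    WHERE t.staff_uid = s.staff_uid
    ORDER BY s.staff_uid
""").fetchall()

# staff_uid compared in ON: every staff row survives, missing revenue becomes 0.
on_version = conn.execute("""
    SELECT s.staff_uid, COALESCE(t.total_rev, 0)
    FROM original_staff_data s
    LEFT JOIN transaction_data_by_staff t
           ON t.year = s.year AND t.month = s.month
          AND t.staff_uid = s.staff_uid
    ORDER BY s.staff_uid
""").fetchall()

print(where_version)  # [(1, 900)]
print(on_version)     # [(1, 900), (2, 0)]
```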
I'm writing a query that sums order values broken down by product groups. The problem is that when I add joins, the aggregated SUM gets greatly inflated. I assume it's because it's adding in duplicate rows. I'm kind of new to SQL, but I think I need to construct the query with sub-selects or nested joins?
All data returns as expected, and my joins pull out the needed data, but the SUM(inv.item_total) AS Value returned is much higher than it should be. SQL below:
SELECT so.Company_id, SUM(inv.item_total) AS Value, co.company_name,
agents.short_desc, stock_type.short_desc AS Type
FROM SORDER as so
JOIN company AS co ON co.company_id = so.company_id
JOIN invoice AS inv ON inv.Sorder_id = so.Sorder_id
JOIN sorder_item AS soitem ON soitem.sorder_id = so.Sorder_id
JOIN STOCK AS stock ON stock.stock_id = soitem.stock_id
JOIN stock_type AS stock_type ON stock_type.stype_id = stock.stype_id
JOIN AGENTS AS AGENTS ON agents.agent_id = co.agent_id
WHERE
co.last_ordered >'01-JAN-2012' and so.Sotype_id='1'
GROUP BY so.Company_id,co.company_name,agents.short_desc, stock_type.short_desc
Any guidance on how I should structure this query to pull out an "un-duplicated" SUM(inv.item_total) AS Value would be much appreciated.
To get an accurate sum, you want only the joins that are needed. So, this version should work:
SELECT so.Company_id, SUM(inv.item_total) AS Value, co.company_name
FROM SORDER so JOIN
company co
ON co.company_id = so.company_id JOIN
invoice inv
ON inv.Sorder_id = so.Sorder_id
group by so.Company_id, co.company_name
You can then add in one join at a time to see where the multiplication is taking place. I'm guessing it has to do with the agents.
It sounds like the joins are not accurate.
First suspect join
For example, would an agent be per company, or per invoice?
If it is per order, then should the join be something along the lines of
JOIN AGENTS AS AGENTS ON agents.agent_id = inv.agent_id
Second suspect join
Can one order have many items and many invoices at the same time? That can cause problems as well. Say an order has 3 items and 3 invoices were sent out. With your joins, each item will show up 3 times (once per invoice), giving a total of 9 rows where there should be only 3. You may need to eliminate the invoice table from the join.
Possible way to solve this on your own:
I would remove all the grouping and sums, filter by a single invoice, and see whether the query produces a unique set of rows for all the data.
Start with an invoice that has just one item and inspect your result set for accuracy. If that works, then add another invoice that has multiple and check the rows to see if you get your perfect dataset back. If not, then the columns that have repeating values (Company Name, Item Name, Agent Name, etc) are usually a good starting point for checking up on why the duplicates are showing up.
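The 3 items times 3 invoices case is easy to reproduce. The sketch below uses Python's sqlite3 module with a cut-down, hypothetical version of the tables (only the columns needed, invented values). It shows the inflated total and one common fix, which is to aggregate the invoices per order in a derived table before joining, so the join can no longer multiply the amounts. This is an alternative to dropping a table from the join, not necessarily what the original query needs.

```python
import sqlite3

# Hypothetical data: one order with 3 invoices (total 175) and 3 items.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sorder (sorder_id INT, company_id INT);
    CREATE TABLE invoice (invoice_id INT, sorder_id INT, item_total INT);
    CREATE TABLE sorder_item (item_id INT, sorder_id INT);
    INSERT INTO sorder VALUES (1, 7);
    INSERT INTO invoice VALUES (1, 1, 100), (2, 1, 50), (3, 1, 25);
    INSERT INTO sorder_item VALUES (1, 1), (2, 1), (3, 1);
""")

# Joining both child tables to the order yields 3 x 3 = 9 rows per order.
inflated = conn.execute("""
    SELECT so.company_id, COUNT(*), SUM(inv.item_total)
    FROM sorder so
    JOIN invoice inv ON inv.sorder_id = so.sorder_id
    JOIN sorder_item si ON si.sorder_id = so.sorder_id
    GROUP BY so.company_id
""").fetchall()

# Summing invoices per order first keeps one invoice row per order in the join.
corrected = conn.execute("""
    SELECT so.company_id, SUM(inv.order_total)
    FROM sorder so
    JOIN (SELECT sorder_id, SUM(item_total) AS order_total
          FROM invoice
          GROUP BY sorder_id) inv ON inv.sorder_id = so.sorder_id
    GROUP BY so.company_id
""").fetchall()

print(inflated)   # [(7, 9, 525)]
print(corrected)  # [(7, 175)]
```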