Data Mart - how to handle one to many relation? - sql

I have the following situation that I am not sure how to handle:
There are three tables: Invoice_Item, Service and ServiceLang. The Invoice_Item table has an FK_Service key (one to one). The Service table has an FK_Service_Lang key. The ServiceLang table has an FK_Service key, which makes it a many-to-many relation.
In other words, an Invoice_Item can have multiple ServiceLang records, which means that when I write a join query, Invoice_Item records get duplicated. What are the options for handling such situations?
I would like to have a ServiceLang dimension in the cube, but I am not sure how to handle the duplicates caused by the join.
EDIT
I've made an example:
The queries are as follows:
-- One lang for service A, two langs for service B
select * from ServiceLang
-- Two records: A and B
select * from Service
-- Total amount is 20
select * from InvoiceItem
-- Query to populate Fact table
-- Total amount is 30
select *
from InvoiceItem II
inner join Service S on II.FK_Service = S.PK_Service
inner join ServiceLang SL on S.PK_Service = SL.FK_Service
So, if there are two Service_Lang records related to one service, then there is a duplicate row, meaning that the total services amount would be 30 when it should be 20. My question is how to handle these situations.

From the description you are mistaken. Each Invoice_Item has one and only one Service, and each Service has one and only one Service_Lang. However, each Service_Lang has many Service records and each Service has many Invoice_Item records.
The relationships are:
Invoice_Item (n) <- (1) Service (n) <- (1) Service_Lang
Thus the JOIN would be
Select Invoice_Item.*, Service_Lang.WhateverColumnYouWant
From Invoice_Item
Inner Join Service On Service.Key = Invoice_Item.FK_Service_Key
Inner Join Service_Lang On Service_Lang.Key = Service.FK_Service_Lang_Key
Edit: So the Service table does not have an FK_Service_Lang key on it. In that case you can select only one of the possible language values associated with the service. You could select the Min, the Max, or some derivation based on your preferred language. Some examples:
Select InvoiceItem.*,
Case When Exists (Select 1 From ServiceLang
Where ServiceLang.FK_Service = InvoiceItem.FK_Service
And ServiceLang.Name = 'English')
Then 'English'
Else (Select Max(Name) From ServiceLang
Where ServiceLang.FK_Service = InvoiceItem.FK_Service)
End As ServiceLanguage,
(Select Max(Name) From ServiceLang
Where ServiceLang.FK_Service = InvoiceItem.FK_Service) As MaxLanguage,
(Select Min(Name) From ServiceLang
Where ServiceLang.FK_Service = InvoiceItem.FK_Service) As MinLanguage
From InvoiceItem
I've no idea how big your ServiceLang table is, but good practice would be to ensure there is an index on the FK_Service column, since every subquery above looks rows up by it.
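The effect described in this thread can be reproduced on a tiny data set. The sketch below uses Python's built-in sqlite3 module purely so that it can be run anywhere; the column names PK_Item, Amount and Code, the index name, and the sample rows are assumptions made for illustration, not the asker's real schema.

```python
import sqlite3

# Hypothetical miniature of the schema: service A has one language, B has two.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Service     (PK_Service INTEGER PRIMARY KEY, Code TEXT);
    CREATE TABLE ServiceLang (FK_Service INTEGER, Name TEXT);
    CREATE TABLE InvoiceItem (PK_Item INTEGER PRIMARY KEY,
                              FK_Service INTEGER, Amount INTEGER);
    CREATE INDEX IX_ServiceLang_Service ON ServiceLang (FK_Service);

    INSERT INTO Service     VALUES (1, 'A'), (2, 'B');
    INSERT INTO ServiceLang VALUES (1, 'English'), (2, 'English'), (2, 'German');
    INSERT INTO InvoiceItem VALUES (1, 1, 10), (2, 2, 10);
""")

# Plain join: the item for service B is repeated once per language row.
joined_total = conn.execute("""
    SELECT SUM(II.Amount)
    FROM InvoiceItem II
    JOIN Service S      ON II.FK_Service = S.PK_Service
    JOIN ServiceLang SL ON S.PK_Service  = SL.FK_Service
""").fetchone()[0]

# One language per item via a correlated subquery: rows are not multiplied.
rows = conn.execute("""
    SELECT II.PK_Item, II.Amount,
           (SELECT MIN(SL.Name) FROM ServiceLang SL
            WHERE SL.FK_Service = II.FK_Service) AS ServiceLanguage
    FROM InvoiceItem II
    ORDER BY II.PK_Item
""").fetchall()
single_total = sum(amount for _, amount, _ in rows)

print(joined_total, single_total)
```

With this sample data the joined total overstates the amount, while the one-language-per-item query keeps one row per invoice item.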

Related

Explain what means to join same table twice

I was preparing for an exam and I have this exercise that I don't understand.
I have a Clients table with a ClientID column,
and a Transactions table with two foreign keys referencing Clients: SenderID and RecieverID (both refer to ClientID).
I need to create a view that shows Transactions with the sender name and the receiver name. I did it, but I don't understand how or why it works.
Here is the code:
SELECT CS.Name [SenderName], CR.Name [RecieverName]
FROM Transactions T
INNER JOIN Clients CS
ON CS.ClientID = T.SenderID
INNER JOIN Clients CR
ON CR.ClientID = T.RecieverID
Each time you need a name (for the sender or the receiver), you need a relation based on the key between the Transactions table and the Clients table.
You need the name of the sender (first join with Clients) and the name of the receiver (second join with Clients).
To avoid confusion between the two uses of the same table, you need an alias that lets you join to the specific related rows. Here CS and CR are used as table name aliases.
This way it is as if you were working with two different tables (or with a logical duplicate of the same table).
SELECT CS.Name [SenderName], CR.Name [RecieverName]
FROM Transactions T
INNER JOIN Clients CS ON CS.ClientID = T.SenderID
INNER JOIN Clients CR ON CR.ClientID = T.RecieverID
You can think of the table content as a set of data, so you use the same set of data twice, extracting the rows that match your relation each time.
Each row in the table Transactions contains:
a SenderID which points to a row in the table Clients and
a RecieverID which points to another row in the table Clients.
So you must make one join of Transactions to Clients using SenderID to get the sender's name and another join to Clients using RecieverID to get the receiver's name.
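To see the two aliases at work, here is a minimal runnable sketch using Python's sqlite3 module. The TransactionID column and the client names are made up for illustration; the join itself mirrors the query from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Clients (ClientID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Transactions (TransactionID INTEGER PRIMARY KEY,
                               SenderID INTEGER, RecieverID INTEGER);
    INSERT INTO Clients VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO Transactions VALUES (10, 1, 2), (11, 3, 1);
""")

# CS resolves SenderID and CR resolves RecieverID; both read the same table.
pairs = conn.execute("""
    SELECT CS.Name AS SenderName, CR.Name AS RecieverName
    FROM Transactions T
    INNER JOIN Clients CS ON CS.ClientID = T.SenderID
    INNER JOIN Clients CR ON CR.ClientID = T.RecieverID
    ORDER BY T.TransactionID
""").fetchall()

print(pairs)
```

Each transaction row yields exactly one output row, because each alias matches a single client by primary key.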

exists(A) and not exists(negA) vs custom aggregation

Many times, I have to select the customers that have made {criteria set A} of transactions and not any OTHER type of transactions. Sample data:
create table customer (name nvarchar(max))
insert customer values
('George'),
('Jack'),
('Leopold'),
('Averel')
create table trn (id int,customer nvarchar(max),product char(1))
insert trn values
(1,'George','A'),
(2,'George','B'),
(3,'Jack','B'),
(4,'Leopold','A')
Let's say we want to find all customers who bought product 'A' and not anything else (in this case, B).
The most typical way to do this includes joining the transaction table with itself:
select * from customer c
where exists(select 1 from trn p where p.customer=c.name and product='A')
and not exists(select 1 from trn n where n.customer=c.name and product='B')
I was wondering if there is a better way to do this. Keep in mind that the transaction table should typically be huge.
What about this alternative:
select * from customer c
where exists
(
select 1
from trn p
where p.customer=c.name
group by p.customer
having max(case when product='B' then 2 when product='A' then 1 else 0 end)=1
)
Will the fact that the transaction table is used only once offset the aggregation calculation needed?
You need to test performance on your data. If you have an index on trn(customer, product), then the exists would generally have very reasonable performance.
This is particularly true when you are using the customers table.
How well does the aggregation version compare? First, the best aggregation would be:
select customer
from trn
where product in ('A', 'B')
group by customer
having min(product) = 'A' and max(product) = 'A';  -- only 'A' among the products checked
If you have an index on product -- and there are lots of products (or few customers that have "a" and "b"), then this can be faster than the not exists version.
In general, I advocate using the group by, even though its performance is not always best on a couple of products. Why?
The use of the having clause is quite flexible for handling all different "set-within-set" conditions.
Adding additional conditions doesn't have a large effect on performance.
If you are not using a customer table but instead doing something like (select distinct customer from trn), then the exists/not exists version is likely to be more expensive.
That said, I advocate using group by and having because it is more flexible. That means that under the right circumstances, other solutions should be used.
You could try the following statement. It may be faster than your statements under certain circumstances, since it first determines the customers with product A transactions and then checks, for those customers only, whether there are transactions for other products. Whether there is any real benefit depends on the data and indexes of your real tables, so you have to try.
WITH customerA AS (SELECT DISTINCT customer FROM trn WHERE product = 'A')
SELECT DISTINCT customer.*
FROM customerA JOIN customer ON customerA.customer = customer.name
WHERE not exists(select 1 from trn n where n.customer = customerA.customer and
product <> 'A')
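Before comparing performance, it is worth confirming that the candidate queries agree on the sample data. The sketch below runs three variants through Python's sqlite3 module: the exists/not exists form from the question, an aggregation that requires both MIN and MAX of the product to be 'A', and the CTE form. The index name is an assumption, and SQLite says nothing about how SQL Server would plan these queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (name TEXT);
    INSERT INTO customer VALUES ('George'), ('Jack'), ('Leopold'), ('Averel');
    CREATE TABLE trn (id INTEGER, customer TEXT, product TEXT);
    INSERT INTO trn VALUES (1, 'George', 'A'), (2, 'George', 'B'),
                           (3, 'Jack', 'B'), (4, 'Leopold', 'A');
    CREATE INDEX ix_trn_customer_product ON trn (customer, product);
""")

queries = {
    "exists": """
        SELECT c.name FROM customer c
        WHERE EXISTS (SELECT 1 FROM trn p
                      WHERE p.customer = c.name AND p.product = 'A')
          AND NOT EXISTS (SELECT 1 FROM trn n
                          WHERE n.customer = c.name AND n.product = 'B')
    """,
    "aggregate": """
        SELECT customer FROM trn
        WHERE product IN ('A', 'B')
        GROUP BY customer
        HAVING MIN(product) = 'A' AND MAX(product) = 'A'
    """,
    "cte": """
        WITH customerA AS (SELECT DISTINCT customer FROM trn WHERE product = 'A')
        SELECT DISTINCT customer.name
        FROM customerA JOIN customer ON customerA.customer = customer.name
        WHERE NOT EXISTS (SELECT 1 FROM trn n
                          WHERE n.customer = customerA.customer
                            AND n.product <> 'A')
    """,
}

# Run every variant and collect the customer names it returns.
results = {label: [row[0] for row in conn.execute(sql)]
           for label, sql in queries.items()}
print(results)
```

All three variants should name the same single customer here; timing them only makes sense on realistically sized tables with the real indexes.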

Get row that matches all multiple rows from another table

I have 2 tables:
One table contains customer ids and a service id that the customer subscribes to.
The second table contains the service id's and a service description for all types of services.
What I am trying to do is print out all the customer ids from the first table that have at least 5 matching unique services. Here is what I came up with, but it's super hacky:
select * from customers left join services where customer.serviceid = services.sid group by servicesid having count(servicesid) >= 5;
Is there a better way of doing this?
Assuming that you have a properly formed database, then any value of serviceid should be valid.
If you want matching customers, use group by:
select c.customerid
from customers c
group by c.customerid
having count(servicesid) >= 5;
If there could be duplicates in the customers table, then use count(distinct servicesid) >= 5.
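The difference between the plain count and the distinct count is easy to check on a small table. This is a sketch using Python's sqlite3 module; the column names customerid and serviceid, the sample rows, and the helper customers_with are assumptions, since the thread does not show the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customerid INTEGER, serviceid INTEGER);
    INSERT INTO customers VALUES
        (1, 1), (1, 2), (1, 3), (1, 4), (1, 5),   -- five distinct services
        (2, 1), (2, 1), (2, 2), (2, 3), (2, 4),   -- five rows, four distinct
        (3, 1), (3, 2);                           -- two services
""")

def customers_with(min_services, distinct):
    """Return customer ids having at least min_services (distinct) services."""
    count_expr = "COUNT(DISTINCT serviceid)" if distinct else "COUNT(serviceid)"
    sql = f"""
        SELECT customerid FROM customers
        GROUP BY customerid
        HAVING {count_expr} >= ?
        ORDER BY customerid
    """
    return [row[0] for row in conn.execute(sql, (min_services,))]

print(customers_with(5, distinct=False), customers_with(5, distinct=True))
```

Customer 2 passes the plain count only because one subscription row is repeated, which is the case the distinct count guards against.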

SQL Query Add to total sum if.. else dont add

How do I achieve this result? I need to calculate the total cost of a product when the product is made up of components. What is new for me: I should add $100 to the total cost if the customer chooses a service called Delivery.
This is what I have tried so far.
Select Sum(Component.Cost*ProductComponent.Quantity) as TotCost from ProductComponent Left Join Component on ProductComponent.ComponentId = Component.ComponentId
I guess this will get me the total cost of a product.
Now there is another table, Service, which has a many-to-many relationship with Order. I need to add another $100 to the total cost if 'Delivery' is used as a service.
I have attached an ER diagram of my database structure. I hope my question is clear.
The basic idea is that you'd have to put a case statement in there to add the $100 if there is a record in the Service table for a given order. The below query should get you most of the way there, but looking at the cardinality of your relationships you may need to group the results or use subqueries to chop it down to one row.
-- Aliases are used consistently; MAX() lets the CASE sit next to SUM() without a GROUP BY
SELECT SUM(c.Cost * pc.Quantity)
+ CASE WHEN MAX(sa.ServiceID) IS NOT NULL THEN 100 ELSE 0 END AS TotCost
FROM ProductComponent pc
LEFT JOIN Component c ON pc.ComponentId = c.ComponentId
JOIN OrderLine o ON o.ProductID = pc.ProductID
JOIN StaffAssignment sa ON sa.SaleID = o.SaleID
It looks like multiple StaffAssignments could give Service for one order so that's where you might want to use a subquery like SELECT Top 1 ServiceID FROM StaffAssignment WHERE SaleID = o.SaleID AND ServiceID IS NOT NULL
I don't have time to test this at the moment but hopefully it gives you some ideas to get this solved.
You can treat the service as another product and add it to the ones already there with a UNION:
SELECT TotCost = SUM(LineCost)
FROM (SELECT c.Cost * pc.Quantity as LineCost
FROM OrderLine ol
INNER JOIN ProductComponent pc ON ol.ProductID = pc.ProductID
LEFT JOIN Component c on pc.ComponentId = c.ComponentId
Where ol.SaleID = @ID
UNION ALL
SELECT 100
FROM StaffAssignment sa
INNER JOIN Service s ON sa.ServiceID = s.ServiceID
Where s.Name = 'Delivery'
And sa.SaleID = @ID) a
If the services have flat prices, adding a price field to the Service table would be useful, to avoid having magic constants in your code and queries.
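The UNION ALL idea can be tried out on a toy data set. The sketch below uses Python's sqlite3 module; the table and column names follow the queries in this thread, but the exact schema, the sample rows, and the helper total_cost are assumptions, because the ER diagram is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Component (ComponentId INTEGER PRIMARY KEY, Cost INTEGER);
    CREATE TABLE ProductComponent (ProductID INTEGER, ComponentId INTEGER,
                                   Quantity INTEGER);
    CREATE TABLE OrderLine (SaleID INTEGER, ProductID INTEGER);
    CREATE TABLE Service (ServiceID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE StaffAssignment (SaleID INTEGER, ServiceID INTEGER);

    INSERT INTO Component VALUES (1, 5), (2, 20);
    INSERT INTO ProductComponent VALUES (100, 1, 4), (100, 2, 1);
    INSERT INTO OrderLine VALUES (1, 100), (2, 100);
    INSERT INTO Service VALUES (1, 'Delivery'), (2, 'Installation');
    INSERT INTO StaffAssignment VALUES (1, 1), (2, 2);
""")

def total_cost(sale_id):
    """Component cost of a sale plus a flat 100 when Delivery is assigned."""
    return conn.execute("""
        SELECT SUM(LineCost)
        FROM (SELECT c.Cost * pc.Quantity AS LineCost
              FROM OrderLine ol
              JOIN ProductComponent pc ON ol.ProductID = pc.ProductID
              LEFT JOIN Component c ON pc.ComponentId = c.ComponentId
              WHERE ol.SaleID = :id
              UNION ALL
              SELECT 100
              FROM StaffAssignment sa
              JOIN Service s ON sa.ServiceID = s.ServiceID
              WHERE s.Name = 'Delivery' AND sa.SaleID = :id) a
    """, {"id": sale_id}).fetchone()[0]

print(total_cost(1), total_cost(2))
```

Sale 1 has Delivery assigned and sale 2 does not, so only the first total includes the extra 100.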

Some SQL Questions

I have been using SQL for years, but have mostly been using the query designer within SQL Studio (etc.) to put together my queries. I've recently found some time to actually "learn" what everything is doing and have set myself the following fairly simple tasks. Before I begin, I'd like to ask the SOF community their thoughts on the questions, possible answers and any tips they may have.
The questions are;
Find all records w/ a duplicate in a particular column (e.g. a linking id is in more than 1 record throughout table)
SUM price from a linked table within the same query (select within a select?)
Explain the difference between the 4 joins; LEFT, RIGHT, OUTER, INNER
Copy data from one table to another based on SELECT and WHERE criteria
Input welcomed & appreciated.
Chris
I recommend that you start by following some tutorials on this topic. Your questions are not uncommon questions for someone moving from a beginner to intermediate level in SQL. SQLZoo is an excellent resource for learning SQL so consider following that.
In response to your questions:
1) Find all records with a duplicate in a particular column
There are two steps here: find duplicate records and select those records. To find the duplicate records you should be doing something along the lines of:
select possible_duplicate_field, count(*)
from table
group by possible_duplicate_field
having count(*) > 1
What we're doing here is selecting everything from a table, then grouping it by the field we want to check for duplicates. The count function then gives me a count of the number of items within that group. The HAVING clause indicates that we want to filter AFTER the grouping to only show the groups which have more than one entry.
This is all fine in itself but it doesn't give you the actual records that have those values on them. If you knew the duplicate values then you'd write this:
select * from table where possible_duplicate_field = 'known_duplicate_value'
We can use the SELECT within a select to get a list of the matches:
select *
from table
where possible_duplicate_field in (
select possible_duplicate_field
from table
group by possible_duplicate_field
having count(*) > 1
)
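A small runnable illustration of the two steps, using Python's sqlite3 module. The table name items, the column link_id, and the sample rows are placeholders chosen for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, link_id TEXT);
    INSERT INTO items VALUES (1, 'x'), (2, 'y'), (3, 'x'), (4, 'z'), (5, 'y');
""")

# Step 1: which values occur more than once, and how often?
duplicate_values = conn.execute("""
    SELECT link_id, COUNT(*)
    FROM items
    GROUP BY link_id
    HAVING COUNT(*) > 1
    ORDER BY link_id
""").fetchall()

# Step 2: fetch the full records carrying one of those values.
duplicate_rows = conn.execute("""
    SELECT id, link_id
    FROM items
    WHERE link_id IN (SELECT link_id
                      FROM items
                      GROUP BY link_id
                      HAVING COUNT(*) > 1)
    ORDER BY id
""").fetchall()

print(duplicate_values, duplicate_rows)
```

The row with link_id 'z' is excluded from both results because its value appears only once.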
2) SUM price from a linked table within the same query
This is a simple JOIN between two tables with a SUM of the two:
select sum(tableA.X + tableB.Y)
from tableA
join tableB on tableA.keyA = tableB.keyB
What you're doing here is joining two tables together where those two tables are linked by a key field. In this case, this is an inner join (a plain JOIN), which operates as you would expect (i.e. get me every row from the left table that has a matching record in the right table).
3) Explain the difference between the 4 joins; LEFT, RIGHT, OUTER, INNER
Consider two tables A and B. The concept of "LEFT" and "RIGHT" in this case are slightly clearer if you read your SQL from left to right. So, when I say:
select x from A join B ...
The left table is "A" and the right table is "B". When you explicitly say LEFT or RIGHT in the SQL statement, you are declaring which of the two tables you are joining is the primary table, meaning the table whose rows are all kept even when no match exists. It does not dictate which table the engine scans first; that is up to the optimizer. Incidentally, LEFT and RIGHT only apply to outer joins (LEFT JOIN is shorthand for LEFT OUTER JOIN); if you omit them and write a plain JOIN, you get an INNER JOIN.
For INNER and OUTER you are declaring what to do when matches don't exist in one of the tables. INNER declares that you want only the rows for which there is a matching record in both tables. Hence, if the primary table contains keys "X", "Y" and "Z", and the secondary table contains keys "X" and "Z", then an INNER will only return "X" and "Z" records from the two tables.
When OUTER is used (as LEFT OUTER or RIGHT OUTER), we're saying: Give me everything from the primary table and anything that matches from the secondary table. Hence, in the previous example, we'd get "X", "Y" and "Z" records in the output record set. However, there would be NULLs in the fields which should have come from the secondary table for key value "Y" as it doesn't exist in the secondary table.
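The "X", "Y", "Z" example can be checked directly. This sketch uses Python's sqlite3 module; the column names k and val and the stored values are invented for illustration. NULL comes back as None in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (k TEXT PRIMARY KEY);
    CREATE TABLE B (k TEXT PRIMARY KEY, val TEXT);
    INSERT INTO A VALUES ('X'), ('Y'), ('Z');
    INSERT INTO B VALUES ('X', 'bx'), ('Z', 'bz');
""")

# INNER JOIN keeps only keys present in both tables.
inner_rows = conn.execute("""
    SELECT A.k, B.val FROM A INNER JOIN B ON A.k = B.k ORDER BY A.k
""").fetchall()

# LEFT (OUTER) JOIN keeps every row of A; missing B values become NULL.
left_rows = conn.execute("""
    SELECT A.k, B.val FROM A LEFT JOIN B ON A.k = B.k ORDER BY A.k
""").fetchall()

print(inner_rows, left_rows)
```

Key "Y" disappears from the inner join but survives the left join with an empty value from B.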
4) Copy data from one table to another based on SELECT and WHERE criteria
This is pretty trivial and I'm surprised you've never encountered it. It's a simple nested SELECT in an INSERT statement (this may not be supported by your database - if not, try the next option):
insert into new_table select * from old_table where x = y
This assumes the tables have the same structure. If you have different structures then you'll need to specify the columns:
insert into new_table (list, of, fields)
select list, of, fields from old_table where x = y
Let's say you have 2 tables named :
[OrderLine] with the columns [Id, OrderId, ProductId, Qty, Status]
[Product] with [Id, Name, Price]
1) all orders having more than 1 order line (it's technically the same as looking for duplicates on OrderId :) :
select OrderId, count(*)
from OrderLine
group by OrderId
having count(*) > 1
2) total price for all order lines of order 1000
select sum(p.Price * ol.Qty) as Price
from OrderLine ol
inner join Product p on ol.ProductId = p.Id
where ol.OrderId = 1000
3) difference between joins:
a inner join b => take all a that have a match in b. If b is not found, a will not be returned
a left join b => take all a, match them with b, include a even if b is not found
a right join b => b left join a
a full outer join b => (a left join b) union (a right join b)
4) copy order lines to a history table :
insert into OrderLinesHistory
(CopiedOn, OrderLineId, OrderId, ProductId, Qty)
select
getDate(), Id, OrderId, ProductId, Qty
from
OrderLine
where
status = 'Closed'
To answer #4 and to perhaps show at least some understanding of SQL and the fact this isn't HW, just me trying to learn best practise;
SET NOCOUNT ON;
DECLARE @rc int
if @what = 1
BEGIN
select id from color_mapper where product = @productid and color = @colorid;
select @rc = @@rowcount
if @rc = 0
BEGIN
exec doSavingSPROC @colorid, @productid;
END
END
END