Joining two tables on SQL Server - sql

I have two SQL Server tables: ORDR (orders) and RDR1 (order's items). I'm trying to create a report which shows:
DocEntry, CardName, DocDueDate: info about the order
pTot: total amount of items in the order
ItemCode: item's code (any of them, only one is needed)
Dscription: item's name
My last attempt was:
SELECT
dbo.ORDR.DocEntry, dbo.ORDR.CardName, dbo.ORDR.DocDueDate,
SUM(dbo.RDR1.Quantity) AS pTot,
dbo.RDR1.ItemCode,
dbo.RDR1.Dscription
FROM
dbo.ORDR
INNER JOIN
dbo.RDR1 ON dbo.ORDR.DocEntry = dbo.RDR1.DocEntry
GROUP BY
dbo.ORDR.DocEntry, dbo.ORDR.CardName, dbo.ORDR.DocDueDate,
dbo.RDR1.ItemCode, dbo.RDR1.Dscription
Items' code/name in one order are very similar so I need only the first RDR1's record associated to that order
I have 2 problems:
I'm getting one row for each RDR1 record
pTot is not summing the amount of items
Can you show me how to join these tables properly?

You could use ROW_NUMBER to get the first RDR1 item for each ORDR and SUM OVER to get the total amount of items.
SELECT
o.DocEntry,
o.CardName,
o.DocDueDate,
r.pTot,
r.ItemCode,
r.Dscription
FROM dbo.ORDR o
INNER JOIN (
SELECT *,
rn = ROW_NUMBER() OVER(PARTITION BY DocEntry ORDER BY ItemCode),
pTot = SUM(Quantity) OVER(PARTITION BY DocEntry)
FROM dbo.RDR1
) r
ON r.DocEntry = o.DocEntry
WHERE r.rn = 1
Additionally, you might want to use meaningful table aliases to improve readability.
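To check the ROW_NUMBER / SUM OVER approach on a small data set, here is a minimal sketch that runs the same idea in SQLite through Python's sqlite3 module. SQLite stands in for SQL Server here (an assumption; window functions need SQLite 3.25 or later), and the sample rows and the function names build_sample_db and first_item_with_total are made up for illustration.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory database with hypothetical ORDR / RDR1 rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ORDR (DocEntry INTEGER PRIMARY KEY, CardName TEXT, DocDueDate TEXT);
        CREATE TABLE RDR1 (DocEntry INTEGER, ItemCode TEXT, Dscription TEXT, Quantity INTEGER);
        INSERT INTO ORDR VALUES (1, 'Acme', '2024-01-10'), (2, 'Globex', '2024-01-12');
        INSERT INTO RDR1 VALUES
            (1, 'B200', 'Bolt M8', 5),
            (1, 'A100', 'Bolt M6', 3),
            (2, 'C300', 'Nut M6', 7);
    """)
    return conn


def first_item_with_total(conn):
    """Return one row per order: its first item (by ItemCode) and the order's total quantity."""
    sql = """
        SELECT o.DocEntry, o.CardName, r.pTot, r.ItemCode, r.Dscription
        FROM ORDR AS o
        JOIN (
            SELECT DocEntry, ItemCode, Dscription,
                   -- number the items inside each order
                   ROW_NUMBER() OVER (PARTITION BY DocEntry ORDER BY ItemCode) AS rn,
                   -- total quantity of the whole order, repeated on every item row
                   SUM(Quantity) OVER (PARTITION BY DocEntry) AS pTot
            FROM RDR1
        ) AS r ON r.DocEntry = o.DocEntry
        WHERE r.rn = 1
        ORDER BY o.DocEntry
    """
    return conn.execute(sql).fetchall()


for row in first_item_with_total(build_sample_db()):
    print(row)
```

Order 1 has two items, yet only one row comes back for it, carrying the quantity summed over both items.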

Here is my proposed solution.
SELECT
[rowno] = ROW_NUMBER() OVER(PARTITION BY O.DocEntry ORDER BY R.ItemCode),
O.DocEntry,
O.CardName,
O.DocDueDate,
-- add up the per-item sums so pTot is the total for the whole order
SUM(SUM(R.Quantity)) OVER(PARTITION BY O.DocEntry) AS pTot,
R.ItemCode,
R.Dscription
INTO #TEMP_ORDER
FROM dbo.ORDR O
INNER JOIN dbo.RDR1 R
ON O.DocEntry = R.DocEntry
GROUP BY O.DocEntry, O.CardName, O.DocDueDate, R.ItemCode, R.Dscription
SELECT
DocEntry,
CardName,
DocDueDate,
pTot,
ItemCode,
Dscription
FROM #TEMP_ORDER
WHERE rowno = 1
DROP TABLE #TEMP_ORDER

Related

Cross Selling Matrix - In Snowflake

I am trying to build a cross selling matrix with the following structure pivoted as seen below where X is the % of frequency in a basket with the other product:
I need to pivot this data in Excel or another tool afterwards, so I assume the query in Snowflake needs to output a tabular dataset ready for pivoting, and I am struggling with its logic.
This is what I have so far:
SELECT FCT.TRANSACTION_ID,
PRD.PRODUCT_TYPE,
COUNT(DISTINCT FCT.PRODUCT_ID),
COUNT(DISTINCT FCT1.PRODUCT_ID)
FROM TRANSACTION_ORDERS FCT
INNER JOIN DIM_PRODUCT PRD ON FCT.PRODUCT_ID = PRD.PRODUCT_ID
LEFT JOIN FACT_TRANSACTION_ORDERS FCT1 ON FCT.TRANSACTION_ID = FCT1.TRANSACTION_ID
AND FCT.PRODUCT_ID != FCT1.PRODUCT_ID
GROUP BY FCT.TRANSACTION_ID, FCT.PRODUCT_ID, FCT1.PRODUCT_ID
Is the joining even correct? Or should I be doing a cross join? Also, how to capture percent frequency of both products in the same basket?
Many thanks!
EDIT: I am trying to capture the frequency of different product types appearing in the same basket.
The values are the same for combinations in both directions. ProductType1 intersection with column ProductType2 is the same value as column Product Type1 row ProductType2.
When in a basket cross analysis they should vary. It is not the same per direction. In other words, baskets with ProductType1 may have ProductType2 X % of the time but baskets with ProductType2 should have ProductType1 with Y% of the time.
You want a self join. I would expect the products to be in the same order, but you seem to be using the same transaction. In any case, this is the structure of the query:
WITH TP AS (
SELECT T.*, P.PRODUCT_TYPE
FROM TRANSACTION_ORDERS T JOIN
DIM_PRODUCT P
ON T.PRODUCT_ID = P.PRODUCT_ID
)
SELECT TP.PRODUCT_TYPE, TP2.PRODUCT_TYPE,
COUNT(DISTINCT TP.TRANSACTION_ID) as NUM_ORDERS
FROM TP JOIN
TP TP2
ON TP2.TRANSACTION_ID = TP.TRANSACTION_ID
GROUP BY TP.PRODUCT_TYPE, TP2.PRODUCT_TYPE;
If this were per order, you would just change the ON clause in the outer query to use the order id.
Note that this uses COUNT(DISTINCT) rather than COUNT(*) because a transaction/order could have multiple products of the same type. Presumably, you want that counted only once.
EDIT:
If you want to divide by the number of transactions that have either product type (which makes sense to me), then I would approach this as:
WITH TP AS (
SELECT DISTINCT T.TRANSACTION_ID, P.PRODUCT_TYPE
FROM TRANSACTION_ORDERS T JOIN
DIM_PRODUCT P
ON T.PRODUCT_ID = P.PRODUCT_ID
)
SELECT TP.PRODUCT_TYPE, TP2.PRODUCT_TYPE,
COUNT(*) as NUM_ORDERS,
( MAX(CASE WHEN TP.PRODUCT_TYPE = TP2.PRODUCT_TYPE THEN COUNT(*) END) OVER (PARTITION BY TP.PRODUCT_TYPE) +
MAX(CASE WHEN TP.PRODUCT_TYPE = TP2.PRODUCT_TYPE THEN COUNT(*) END) OVER (PARTITION BY TP2.PRODUCT_TYPE) -
COUNT(*)
) as Num_Orders_Either,
( COUNT(*) * 1.0 /
( MAX(CASE WHEN TP.PRODUCT_TYPE = TP2.PRODUCT_TYPE THEN COUNT(*) END) OVER (PARTITION BY TP.PRODUCT_TYPE) +
MAX(CASE WHEN TP.PRODUCT_TYPE = TP2.PRODUCT_TYPE THEN COUNT(*) END) OVER (PARTITION BY TP2.PRODUCT_TYPE) -
COUNT(*)
)
) as ratio
FROM TP JOIN
TP TP2
ON TP2.TRANSACTION_ID = TP.TRANSACTION_ID
GROUP BY TP.PRODUCT_TYPE, TP2.PRODUCT_TYPE;
This calculates the number of orders containing either product type as the number containing the first type plus the number containing the second, minus the number containing both.
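The counting logic can be tried out on toy data. The sketch below is an assumption-laden illustration, not the Snowflake query itself: it uses SQLite through Python's sqlite3, splits the calculation into plain CTEs instead of window functions over aggregates, and leaves out the diagonal (a type paired with itself). The column share_of_a is my reading of the directional percentage described in the question's edit (baskets with type A that also contain type B); the sample rows and the names build_basket_db and cross_sell are hypothetical.

```python
import sqlite3


def build_basket_db():
    """Create hypothetical transactions: Shoes in 4 baskets, Socks in 2, both in 2."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE DIM_PRODUCT (PRODUCT_ID INTEGER PRIMARY KEY, PRODUCT_TYPE TEXT);
        CREATE TABLE TRANSACTION_ORDERS (TRANSACTION_ID INTEGER, PRODUCT_ID INTEGER);
        INSERT INTO DIM_PRODUCT VALUES (1, 'Shoes'), (2, 'Socks'), (3, 'Shoes');
        INSERT INTO TRANSACTION_ORDERS VALUES
            (1, 1), (1, 2),   -- Shoes + Socks
            (2, 1), (2, 3),   -- two Shoes products, counted once per basket
            (3, 3), (3, 2),   -- Shoes + Socks
            (4, 1);           -- Shoes only
    """)
    return conn


def cross_sell(conn):
    """Return (type_a, type_b, n_both, n_either, ratio_either, share_of_a) per ordered pair."""
    sql = """
        WITH tp AS (        -- one row per (transaction, product type)
            SELECT DISTINCT t.TRANSACTION_ID, p.PRODUCT_TYPE
            FROM TRANSACTION_ORDERS AS t
            JOIN DIM_PRODUCT AS p ON p.PRODUCT_ID = t.PRODUCT_ID
        ),
        per_type AS (       -- number of transactions containing each type
            SELECT PRODUCT_TYPE, COUNT(*) AS n FROM tp GROUP BY PRODUCT_TYPE
        ),
        pairs AS (          -- number of transactions containing both types
            SELECT a.PRODUCT_TYPE AS type_a, b.PRODUCT_TYPE AS type_b, COUNT(*) AS n_both
            FROM tp AS a
            JOIN tp AS b ON b.TRANSACTION_ID = a.TRANSACTION_ID
            WHERE a.PRODUCT_TYPE <> b.PRODUCT_TYPE
            GROUP BY a.PRODUCT_TYPE, b.PRODUCT_TYPE
        )
        SELECT pr.type_a, pr.type_b, pr.n_both,
               ta.n + tb.n - pr.n_both AS n_either,
               ROUND(1.0 * pr.n_both / (ta.n + tb.n - pr.n_both), 3) AS ratio_either,
               ROUND(1.0 * pr.n_both / ta.n, 3) AS share_of_a
        FROM pairs AS pr
        JOIN per_type AS ta ON ta.PRODUCT_TYPE = pr.type_a
        JOIN per_type AS tb ON tb.PRODUCT_TYPE = pr.type_b
        ORDER BY pr.type_a, pr.type_b
    """
    return conn.execute(sql).fetchall()


for row in cross_sell(build_basket_db()):
    print(row)
```

The either-based ratio is symmetric by construction, while share_of_a differs by direction, so which of the two goes into the matrix depends on which question you want it to answer. Pairs that never occur together do not appear at all and would show up as empty cells after pivoting.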

Returning IDs from two other tables, or NULL if no IDs are found, using a left join in SQL Server

I am wondering if someone could help me. I am trying to join two tables to a product table and return an ID if one exists. If there is no ID, I want NULL returned, but the row for that product should still be returned rather than dropped. My query below returns twice the number of records, and I cannot figure out why.
SELECT
T2.ProductID, FirstChild.SupplierID, SecondChild.AccountID
FROM
Products T2
LEFT OUTER JOIN
(
SELECT TOP(1) SupplierID, Reference,CompanyID, Row_Number() OVER (Partition By SupplierID Order By SupplierID) AS RowNo FROM Suppliers
)
FirstChild ON T2.SupplierReference = FirstChild.Reference AND RowNo = 1AND FirstChild.CompanyID =T2.CompanyID
LEFT OUTER JOIN
(
SELECT TOP(1) AccountID, SageKey,CompanyID, Row_Number() OVER (Partition By AccountID Order By AccountID) AS RowNo2 FROM Accounts
)
SecondChild ON T2.ProductAccountReference = SecondChild.Reference AND RowNo2 = 1 AND SecondChild.CompanyID =T2.CompanyID
Example of what I am trying to do
ProductID SupplierID AccountID
1 5 2
2 6 NULL
3 NULL NULL
OUTER APPLY and ditching the ROW_NUMBER seems like a better choice here:
SELECT
p.ProductId
,FirstChild.SupplierId
,SecondChild.AccountId
FROM
Products p
OUTER APPLY (SELECT TOP (1) s.SupplierId
FROM
Suppliers s
WHERE
p.SupplierReference = s.Reference
AND p.CompanyId = s.CompanyId
ORDER BY
s.SupplierId
) FirstChild
OUTER APPLY (SELECT TOP (1) a.AccountId
FROM
Accounts a
WHERE
p.ProductAccountReference = a.Reference
AND p.CompanyId = a.CompanyId
ORDER BY
a.AccountID
) SecondChild
The way your query is written above, there is no correlation between the derived tables and Products. TOP (1) without an ORDER BY returns whichever single Suppliers row SQL Server happens to choose, so the join only finds a match when that one row belongs to the current product. You need to correlate the subquery with the outer table and then select the top 1; adding an ORDER BY inside it is how you say which row number you want.
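The effect of correlating the lookup can be seen on a few rows. SQLite has no OUTER APPLY, so this sketch swaps in a correlated scalar subquery with ORDER BY ... LIMIT 1, which behaves the same way when only one column is needed: it yields NULL when nothing matches and never removes the product row. The sample data, the column layout of the three tables, and the names build_product_db and products_with_ids are assumptions for illustration.

```python
import sqlite3


def build_product_db():
    """Create hypothetical Products, Suppliers and Accounts rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Products (ProductID INTEGER, SupplierReference TEXT,
                               ProductAccountReference TEXT, CompanyID INTEGER);
        CREATE TABLE Suppliers (SupplierID INTEGER, Reference TEXT, CompanyID INTEGER);
        CREATE TABLE Accounts (AccountID INTEGER, Reference TEXT, CompanyID INTEGER);
        INSERT INTO Products VALUES
            (1, 'S-A', 'A-X', 10),
            (2, 'S-B', 'A-none', 10),
            (3, 'S-none', 'A-none', 10);
        INSERT INTO Suppliers VALUES (5, 'S-A', 10), (9, 'S-A', 10), (6, 'S-B', 10), (7, 'S-A', 20);
        INSERT INTO Accounts VALUES (2, 'A-X', 10), (4, 'A-X', 10);
    """)
    return conn


def products_with_ids(conn):
    """Return every product with its lowest matching SupplierID / AccountID, or None."""
    sql = """
        SELECT p.ProductID,
               (SELECT s.SupplierID
                  FROM Suppliers AS s
                 WHERE s.Reference = p.SupplierReference   -- correlation to the outer row
                   AND s.CompanyID = p.CompanyID
                 ORDER BY s.SupplierID
                 LIMIT 1) AS SupplierID,
               (SELECT a.AccountID
                  FROM Accounts AS a
                 WHERE a.Reference = p.ProductAccountReference
                   AND a.CompanyID = p.CompanyID
                 ORDER BY a.AccountID
                 LIMIT 1) AS AccountID
        FROM Products AS p
        ORDER BY p.ProductID
    """
    return conn.execute(sql).fetchall()


for row in products_with_ids(build_product_db()):
    print(row)
```

Product 1 matches two suppliers and two accounts but still produces a single row, which is the duplicate problem from the question going away.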
If it's just showing duplicate records, wouldn't an inelegant solution just be to add distinct in the select line?

Find MAX with JOIN where Field also shows up in another Table

I have 3 tables: Master, Paper and iCodes. For a certain set of Master.Ref's, I need to find Max(Paper.Date), where the Paper.Code is also in the iCodes table (i.e., Paper.Code is a type of iCode). Master is joined to Paper by the File field.
EDIT:
I only need the Max(Paper.Date) and its corresponding Code; I do not need all of the Codes.
I wrote the following but it is very slow. I have a few hundred ref #'s to look for. What is a better way to do this?
SELECT Master.Ref,
Paper.Code,
mp.MaxDate
FROM ( SELECT p.File ,
MAX(p.Date) AS MaxDate ,
FROM Paper AS p
LEFT JOIN Master AS m ON p.File = m.File
WHERE m.Ref IN ('ref1', 'ref2', 'ref3', 'ref4', 'ref5', 'ref6'... )
AND p.Code IN ( SELECT DISTINCT i.iCode
FROM iCodes AS i
)
GROUP BY p.File
) AS mp
LEFT JOIN Master ON mp.File = Master.File
LEFT JOIN Paper ON Master.File = Paper.File
AND mp.MaxDate = Paper.Date
WHERE Paper.Code IN ( SELECT DISTINCT iCodes.iCode
FROM iCodes
)
Does this do what you want?
SELECT m.Ref, p.Code, max(p.date)
FROM Master m LEFT JOIN
Paper p
ON m.File = p.File
WHERE p.Code IN (SELECT DISTINCT iCodes.iCode FROM iCodes) and
m.Ref IN ('ref1','ref2','ref3','ref4','ref5','ref6'...)
GROUP BY m.Ref, p.Code;
EDIT:
To get the code on the max date, then use window functions:
select ref, code, date
from (SELECT m.Ref, p.Code, p.date,
row_number() over (partition by m.Ref order by p.date desc) as seqnum
FROM Master m LEFT JOIN
Paper p
ON m.File = p.File
WHERE p.Code IN (SELECT DISTINCT iCodes.iCode FROM iCodes) and
m.Ref IN ('ref1','ref2','ref3','ref4','ref5','ref6'...)
) mp
where seqnum = 1;
The function row_number() assigns a sequential number starting at 1 to a group of rows. The groups are defined by the partition by clause, so in this case everything with the same m.Ref value would be in a single group. Within the group, rows are assigned the number based on the order by clause. So, the one with the biggest date gets the value of 1. That is the row you want.
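Here is a small runnable sketch of that numbering, using SQLite through Python's sqlite3 as a stand-in for the real database (an assumption). The rows are invented, the column names File and Date are quoted to keep them unambiguous, and latest_code_per_ref and build_paper_db are hypothetical names. An inner join is used because the WHERE filter on Paper.Code discards unmatched rows anyway.

```python
import sqlite3


def build_paper_db():
    """Create hypothetical Master, Paper and iCodes rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Master (Ref TEXT, "File" TEXT);
        CREATE TABLE Paper ("File" TEXT, Code TEXT, "Date" TEXT);
        CREATE TABLE iCodes (iCode TEXT);
        INSERT INTO Master VALUES ('ref1', 'F1'), ('ref2', 'F2'), ('ref3', 'F3');
        INSERT INTO Paper VALUES
            ('F1', 'C1', '2024-01-05'),
            ('F1', 'C2', '2024-03-01'),
            ('F1', 'ZZ', '2024-06-01'),   -- newest row, but ZZ is not an iCode
            ('F2', 'C1', '2024-02-02'),
            ('F3', 'C1', '2024-09-09');   -- ref3 is not in the requested list
        INSERT INTO iCodes VALUES ('C1'), ('C2');
    """)
    return conn


def latest_code_per_ref(conn):
    """Return (Ref, Code, Date) for the newest qualifying Paper row of each Ref."""
    sql = """
        SELECT Ref, Code, "Date"
        FROM (
            SELECT m.Ref, p.Code, p."Date",
                   -- 1 = newest qualifying row within each Ref
                   ROW_NUMBER() OVER (PARTITION BY m.Ref ORDER BY p."Date" DESC) AS seqnum
            FROM Master AS m
            JOIN Paper AS p ON p."File" = m."File"
            WHERE p.Code IN (SELECT iCode FROM iCodes)
              AND m.Ref IN ('ref1', 'ref2')
        ) AS mp
        WHERE seqnum = 1
        ORDER BY Ref
    """
    return conn.execute(sql).fetchall()


for row in latest_code_per_ref(build_paper_db()):
    print(row)
```

For ref1 the row dated 2024-06-01 is skipped because its code is filtered out before the rows are numbered, so the date and the code returned always belong to the same Paper row.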

Distinct on multi-columns in sql

I have this query in sql
select cartlines.id,cartlines.pageId,cartlines.quantity,cartlines.price
from orders
INNER JOIN
cartlines on(cartlines.orderId=orders.id)where userId=5
I want to get rows distinct by pageId, so that in the end no pageId appears more than once (no duplicates)
any Ideas
Thanks
Baaroz
Going by what you're expecting in the output and your comment that says "...if there rows in output that contain same pageid only one will be shown...," it sounds like you're trying to get the top record for each page ID. This can be achieved with ROW_NUMBER() and PARTITION BY:
SELECT *
FROM (
SELECT
ROW_NUMBER() OVER(PARTITION BY c.pageId ORDER BY c.pageID) rowNumber,
c.id,
c.pageId,
c.quantity,
c.price
FROM orders o
INNER JOIN cartlines c ON c.orderId = o.id
WHERE userId = 5
) a
WHERE a.rowNumber = 1
You can also use ROW_NUMBER() OVER(PARTITION BY ... along with TOP 1 WITH TIES, but it runs a little slower (despite being WAY cleaner):
SELECT TOP 1 WITH TIES c.id, c.pageId, c.quantity, c.price
FROM orders o
INNER JOIN cartlines c ON c.orderId = o.id
WHERE userId = 5
ORDER BY ROW_NUMBER() OVER(PARTITION BY c.pageId ORDER BY c.pageID)
If you wish to remove rows with all columns duplicated this is solved by simply adding a distinct in your query.
select distinct cartlines.id,cartlines.pageId,cartlines.quantity,cartlines.price
from orders
INNER JOIN
cartlines on(cartlines.orderId=orders.id)where userId=5
If however, this makes no difference, it means the other columns have different values, so the combinations of column values creates distinct (unique) rows.
As Michael Berkowski stated in comments:
DISTINCT - does operate over all columns in the SELECT list, so we
need to understand your special case better.
In the case that simply adding distinct does not cover you, you need to also remove the columns that are different from row to row, or use aggregate functions to get aggregate values per cartlines.
Example - total quantity per distinct pageId:
select cartlines.pageId, sum(cartlines.quantity) as totalQuantity
from orders
INNER JOIN
cartlines on(cartlines.orderId=orders.id)where userId=5
group by cartlines.pageId
If this is still not what you wish, you need to give us data and specify better what it is you want.
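The difference between DISTINCT over several columns and one row per pageId can be shown with a few rows. This is a minimal sketch in SQLite through Python's sqlite3; the sample data and the names build_cart_db, distinct_rows and grouped_rows are made up for illustration.

```python
import sqlite3


def build_cart_db():
    """Create hypothetical orders and cartlines rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, userId INTEGER);
        CREATE TABLE cartlines (id INTEGER PRIMARY KEY, orderId INTEGER,
                                pageId INTEGER, quantity INTEGER, price REAL);
        INSERT INTO orders VALUES (1, 5), (2, 5), (3, 7);
        INSERT INTO cartlines VALUES
            (1, 1, 100, 2, 9.5),
            (2, 2, 100, 3, 9.5),
            (3, 2, 200, 1, 4.0),
            (4, 3, 100, 9, 9.5);   -- belongs to another user
    """)
    return conn


def distinct_rows(conn):
    """DISTINCT compares the whole select list, so pageId 100 can still repeat."""
    return conn.execute("""
        SELECT DISTINCT c.pageId, c.quantity
        FROM orders AS o
        JOIN cartlines AS c ON c.orderId = o.id
        WHERE o.userId = 5
        ORDER BY c.pageId, c.quantity
    """).fetchall()


def grouped_rows(conn):
    """GROUP BY pageId gives exactly one row per pageId, with quantities summed."""
    return conn.execute("""
        SELECT c.pageId, SUM(c.quantity) AS totalQuantity
        FROM orders AS o
        JOIN cartlines AS c ON c.orderId = o.id
        WHERE o.userId = 5
        GROUP BY c.pageId
        ORDER BY c.pageId
    """).fetchall()


conn = build_cart_db()
print(distinct_rows(conn))
print(grouped_rows(conn))
```

With DISTINCT, pageId 100 appears twice because the two rows differ in quantity; with GROUP BY it appears once.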

SQL Selecting Distinct Count of items where 2 conditions are met

I am struggling to get a DISTINCT COUNT working with SQL DISTINCT SELECT
Not sure if I should even be using distinct here, but I have got it correct using a subquery, though it is very heavy processing wise.
This query does what I ultimately want results wise (without the weight)
SELECT DISTINCT
product_brandNAME,
product_classNAME,
(SELECT COUNT(productID) FROM products
WHERE products.product_classID = product_class.product_classID
AND products.product_brandID = product_brand.product_brandID) as COUNT
FROM products
JOIN product_brand
JOIN product_class
ON products.product_brandID = product_brand.product_brandID
AND products.product_classID = product_class.product_classID
GROUP BY productID
ORDER BY product_brandNAME
This gets close, and is much more efficient, but I can't get the count working, it only counts (obviously) the distinct count which is 1.
SELECT DISTINCT product_brandNAME, product_classNAME, COUNT(*) as COUNT
FROM products
JOIN product_brand
JOIN product_class
ON products.product_brandID = product_brand.product_brandID
AND products.product_classID = product_class.product_classID
GROUP BY productID
ORDER BY product_brandNAME
Any suggestions? I'm sure it's something small, but I have been researching the net for hours, to no avail, for an answer that matches on 2 conditions.
Thanks,
Have you tried the following query?
Edit
SELECT product_brandNAME
, product_classNAME
, COUNT(*)
FROM products
JOIN product_brand ON products.product_brandID = product_brand.product_brandID
JOIN product_class ON products.product_classID = product_class.product_classID
GROUP BY
product_brandNAME
, product_classNAME
When using GROUP BY you do not need to use a DISTINCT clause. Try the following:
SELECT product_brandNAME,
product_classNAME,
COUNT(*) as COUNT
FROM products JOIN product_brand ON products.product_brandID = product_brand.product_brandID
JOIN product_class ON products.product_classID = product_class.product_classID
GROUP BY product_brandNAME,
product_classNAME
ORDER BY product_brandNAME
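The choice of grouping columns decides what COUNT(*) counts. As a rough sketch (SQLite through Python's sqlite3, invented rows, hypothetical function names, and assuming productID is unique per product), the first query groups by brand and class name only, while the second also groups by productID, which leaves one product per group and therefore a count of 1 everywhere.

```python
import sqlite3


def build_catalog_db():
    """Create hypothetical brand, class and product rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product_brand (product_brandID INTEGER PRIMARY KEY, product_brandNAME TEXT);
        CREATE TABLE product_class (product_classID INTEGER PRIMARY KEY, product_classNAME TEXT);
        CREATE TABLE products (productID INTEGER PRIMARY KEY,
                               product_brandID INTEGER, product_classID INTEGER);
        INSERT INTO product_brand VALUES (1, 'Acme'), (2, 'Zenith');
        INSERT INTO product_class VALUES (1, 'Tools'), (2, 'Toys');
        INSERT INTO products VALUES (1, 1, 1), (2, 1, 1), (3, 1, 2), (4, 2, 2);
    """)
    return conn


def count_by_brand_and_class(conn):
    """One row per (brand, class) pair with the number of products in it."""
    return conn.execute("""
        SELECT b.product_brandNAME, c.product_classNAME, COUNT(*) AS product_count
        FROM products AS p
        JOIN product_brand AS b ON b.product_brandID = p.product_brandID
        JOIN product_class AS c ON c.product_classID = p.product_classID
        GROUP BY b.product_brandNAME, c.product_classNAME
        ORDER BY b.product_brandNAME, c.product_classNAME
    """).fetchall()


def count_by_product(conn):
    """Grouping by productID as well: every group is a single product."""
    return conn.execute("""
        SELECT p.productID, b.product_brandNAME, c.product_classNAME, COUNT(*) AS product_count
        FROM products AS p
        JOIN product_brand AS b ON b.product_brandID = p.product_brandID
        JOIN product_class AS c ON c.product_classID = p.product_classID
        GROUP BY p.productID, b.product_brandNAME, c.product_classNAME
        ORDER BY p.productID
    """).fetchall()


conn = build_catalog_db()
print(count_by_brand_and_class(conn))
print(count_by_product(conn))
```

Only the first query gives the per-pair totals the question asks for, and it needs neither DISTINCT nor a correlated subquery.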