Joining invoices and shipping records based on partial info

I hope this is just a case of me not knowing the terminology to search for, but I haven't found any hint of how to solve this yet.
I am trying to join two tables (invoices and shipping records) where some of the info is missing. In particular the account code and order number which I would usually use to join on.
Given that each order is fairly unique in the exact mix of products and quantities I am hoping it is possible to join the tables by comparing the composition of the orders.
For example given the data below it ought to be possible to identify that the shipping record for order_ref A1 is related to invoice_num 500 as it contains the same products in exactly the same quantities.
shipping_id | order_ref | product | quantity
-------------|-----------|---------|----------
100 | A1 | Apple | 1
101 | A1 | Banana | 1
102 | A1 | Carrot | 2
invoice_num | line_num | product | quantity
-------------|----------|---------|----------
500 | 1 | Apple | 1
500 | 2 | Banana | 1
500 | 3 | Carrot | 2
501 | 1 | Apple | 10
501 | 2 | Banana | 1
501 | 3 | Carrot | 2

You can create a key for each group, and join with this key.
In your sample, the key Apple_1_Banana_1_Carrot_2_ is generated both for order_ref = "A1" in shipping and for invoice_num = "500" in invoice.
-- Sample data from the question, held in table variables.
DECLARE @shipping TABLE (shipping_id INT, order_ref VARCHAR(10), product VARCHAR(10), quantity INT)
INSERT INTO @shipping VALUES
(100 , 'A1', 'Apple', 1),
(101 , 'A1', 'Banana', 1),
(102 , 'A1', 'Carrot', 2)
DECLARE @invoice TABLE (invoice_num INT, line_num INT, product VARCHAR(10), quantity INT)
INSERT INTO @invoice VALUES
(500, 1 ,'Apple', 1 ),
(500, 2 ,'Banana', 1 ),
(500, 3 ,'Carrot', 2 ),
(501, 1 ,'Apple', 10 ),
(501, 2 ,'Banana', 1 ),
(501, 3 ,'Carrot', 2 )
-- Build one key per order / invoice from its sorted product_quantity pairs, then join on that key.
SELECT * FROM (
    SELECT * FROM @shipping s
    CROSS APPLY (SELECT product + '_' + CAST(quantity AS varchar(10)) + '_'
                 FROM @shipping s2 WHERE s.order_ref = s2.order_ref
                 ORDER BY product, quantity FOR XML PATH('')) X(group_key)
) A
INNER JOIN (
    SELECT * FROM @invoice i
    CROSS APPLY (SELECT product + '_' + CAST(quantity AS varchar(10)) + '_'
                 FROM @invoice i2 WHERE i.invoice_num = i2.invoice_num
                 ORDER BY product, quantity FOR XML PATH('')) X(group_key)
) B ON A.group_key = B.group_key
   AND A.product = B.product
   AND A.quantity = B.quantity
Result:
shipping_id order_ref product quantity line_num invoice_num line_num product quantity
----------- ---------- ---------- ----------- -------------------- ----------- ----------- ---------- -----------
100 A1 Apple 1 1 500 1 Apple 1
101 A1 Banana 1 2 500 2 Banana 1
102 A1 Carrot 2 3 500 3 Carrot 2
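The key-per-group idea can also be sketched outside the database. The snippet below is a minimal illustration in Python, not the answerer's code: the function names `composition_keys` and `match_by_composition` are made up for this example, and it uses a sorted tuple of (product, quantity) pairs instead of a concatenated string.

```python
from collections import defaultdict

# Sample rows copied from the question: (group id, product, quantity).
shipping = [("A1", "Apple", 1), ("A1", "Banana", 1), ("A1", "Carrot", 2)]
invoices = [
    (500, "Apple", 1), (500, "Banana", 1), (500, "Carrot", 2),
    (501, "Apple", 10), (501, "Banana", 1), (501, "Carrot", 2),
]

def composition_keys(rows):
    """Map each group id to a sorted tuple of its (product, quantity) lines."""
    groups = defaultdict(list)
    for group_id, product, quantity in rows:
        groups[group_id].append((product, quantity))
    return {group_id: tuple(sorted(lines)) for group_id, lines in groups.items()}

def match_by_composition(shipping_rows, invoice_rows):
    """Return (order_ref, invoice_num) pairs whose compositions are identical."""
    invoices_by_key = defaultdict(list)
    for invoice_num, key in composition_keys(invoice_rows).items():
        invoices_by_key[key].append(invoice_num)
    return [
        (order_ref, invoice_num)
        for order_ref, key in composition_keys(shipping_rows).items()
        for invoice_num in invoices_by_key.get(key, [])
    ]

matches = match_by_composition(shipping, invoices)
print(matches)  # [('A1', 500)]
```

If two invoices happen to have the same composition, both are returned for the same order, so the result may still need a tie-breaker such as a date.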

Join on product and quantity instead:
select table_a.*, table_b.*
from table_a
join table_b on table_a.product = table_b.product
and table_a.quantity = table_b.quantity
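Note that a line-level join on product and quantity matches individual lines, not whole orders. The sketch below, which uses Python's built-in `sqlite3` as a stand-in for SQL Server, shows that invoice 501 also matches through its Banana and Carrot lines, and one possible way to keep only pairs where every line on both sides matched. It assumes no order contains the same (product, quantity) line twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shipping (shipping_id INT, order_ref TEXT, product TEXT, quantity INT);
CREATE TABLE invoice (invoice_num INT, line_num INT, product TEXT, quantity INT);
INSERT INTO shipping VALUES (100,'A1','Apple',1),(101,'A1','Banana',1),(102,'A1','Carrot',2);
INSERT INTO invoice VALUES (500,1,'Apple',1),(500,2,'Banana',1),(500,3,'Carrot',2),
                           (501,1,'Apple',10),(501,2,'Banana',1),(501,3,'Carrot',2);
""")

# Line-level join only: invoice 501 also appears via Banana and Carrot.
line_level = conn.execute("""
    SELECT DISTINCT s.order_ref, i.invoice_num
    FROM shipping s
    JOIN invoice i ON i.product = s.product AND i.quantity = s.quantity
    ORDER BY i.invoice_num
""").fetchall()

# Keep a pair only when the matched line count equals the line count on both sides.
whole_order = conn.execute("""
    SELECT s.order_ref, i.invoice_num
    FROM shipping s
    JOIN invoice i ON i.product = s.product AND i.quantity = s.quantity
    GROUP BY s.order_ref, i.invoice_num
    HAVING COUNT(*) = (SELECT COUNT(*) FROM shipping s2 WHERE s2.order_ref = s.order_ref)
       AND COUNT(*) = (SELECT COUNT(*) FROM invoice i2 WHERE i2.invoice_num = i.invoice_num)
""").fetchall()

print(line_level)   # [('A1', 500), ('A1', 501)]
print(whole_order)  # [('A1', 500)]
```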

I don't think there is a proper SQL way to join like this but you could do something like the following:
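-- STRING_AGG needs SQL Server 2017 or later. CONCAT avoids the varchar + int conversion error,
-- and WITHIN GROUP fixes the order inside each key (ORDER BY in a derived table is not allowed).
SELECT order_ref,
       STRING_AGG(CONCAT(product, ':', quantity), '_') WITHIN GROUP (ORDER BY product, quantity) AS product_list
FROM shipping_records
GROUP BY
order_ref
and then
SELECT invoice_num,
       STRING_AGG(CONCAT(product, ':', quantity), '_') WITHIN GROUP (ORDER BY product, quantity) AS product_list
FROM invoices
GROUP BY
invoice_num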
and then do your join on the product_list fields:
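SELECT * FROM
( SELECT order_ref, STRING_AGG(.... ) AS product_list FROM shipping_records GROUP BY order_ref ) AS a_products JOIN
( SELECT invoice_num, STRING_AGG(.... ) AS product_list FROM invoices GROUP BY invoice_num ) AS a_shipping_records
ON a_products.product_list = a_shipping_records.product_list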
I haven't tested this on SQL Server but it should work. I don't think it would be fast but you could work out some kind of functional index or views that could speed this up.

Related

Using join to include null values in same table

Following is my table structure.
AttributeMaster - This table is a master attribute table that will be available for each and every request.
AttrMasterId | AttrName
-------------|----------------
1 | Expense Items
2 | Business Reason
AttributeValue - When the user fills the data from grid, if a column is empty we don't store its value in the database.
For each request, there are multiple line items (TaskId). Every task must have all attributes from the attribute master. However, if the user doesn't fill in an attribute, we don't store it in the database.
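AttrValId | RequestId | TaskId | AttrMasterId | AttrValue | RecordStatus
----------|-----------|--------|--------------|-----------|-------------
1 | 200 | 1 | 1 | Furniture | A
2 | 200 | 2 | 1 | Infra | A
3 | 200 | 2 | 2 | Relocation | A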
In the above scenario, for request 200, for task Id - 1, I only have value for one attribute.
For task Id - 2, I have both attributes filled.
The query result should give me 4 rows, 2 for each task ID, with null placeholders in AttrValue column.
select * from AttributeMaster cam
left join AttributeValue cav on cam.AttrMasterId = cav.AttrMasterId
and cav.requestId = 36498 and cav.recordStatus = 'A'
right outer join (select distinct AttrMasterId from attrValue cav1 where cav1.requestId = 36498 ) ctI on cti.AttrMasterId = cav.AttrMasterId;
So far, I've tried different joins, tried to self join attribute value table as above, still no results to fill the empty rows.
Any help or pointers would be appreciated. Thanks.
Edit 1:
Expected Output is as follows:
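RequestId | TaskId | AttrMasterId | AttrValue | RecordStatus
----------|--------|--------------|-----------|-------------
200 | 1 | 1 | Furniture | A
200 | 1 | 2 | NULL | NULL
200 | 2 | 1 | Infra | A
200 | 2 | 2 | Relocation | A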

Working Fiddle for SQL Server
Since there really should be a Task table, I added that as a CTE term in the first solution. The second form just uses your existing tables directly, with the same result.
WITH Task (TaskId) AS (
SELECT DISTINCT TaskId FROM AttributeValue
)
, pairs (TaskId, AttrMasterId) AS (
SELECT Task.TaskId, AttributeMaster.AttrMasterId
FROM AttributeMaster CROSS JOIN Task
)
SELECT pairs.*
, AttributeMaster.*
, cav.*
FROM pairs
JOIN AttributeMaster
ON pairs.AttrMasterId = AttributeMaster.AttrMasterId
LEFT JOIN AttributeValue AS cav
ON pairs.AttrMasterId = cav.AttrMasterId AND pairs.TaskId = cav.TaskId
AND cav.requestId = 200 AND cav.recordStatus = 'A'
ORDER BY pairs.TaskId, pairs.AttrMasterId
;
+--------+--------------+--------------+-----------------+-----------+-----------+--------+--------------+------------+--------------+
| TaskId | AttrMasterId | AttrMasterId | AttrName | AttrValId | RequestId | TaskId | AttrMasterId | AttrValue | RecordStatus |
+--------+--------------+--------------+-----------------+-----------+-----------+--------+--------------+------------+--------------+
| 1 | 1 | 1 | Expense Items | 1 | 200 | 1 | 1 | Furniture | A |
| 1 | 2 | 2 | Business Reason | NULL | NULL | NULL | NULL | NULL | NULL |
| 2 | 1 | 1 | Expense Items | 2 | 200 | 2 | 1 | Infra | A |
| 2 | 2 | 2 | Business Reason | 3 | 200 | 2 | 2 | Relocation | A |
+--------+--------------+--------------+-----------------+-----------+-----------+--------+--------------+------------+--------------+
The second form is without the added Task CTE term...
WITH pairs AS (
SELECT DISTINCT AttributeValue.TaskId, AttributeMaster.AttrMasterId
FROM AttributeMaster CROSS JOIN AttributeValue
)
SELECT pairs.*
, AttributeMaster.*
, cav.*
FROM pairs
JOIN AttributeMaster
ON pairs.AttrMasterId = AttributeMaster.AttrMasterId
LEFT JOIN AttributeValue AS cav
ON pairs.AttrMasterId = cav.AttrMasterId AND pairs.TaskId = cav.TaskId
AND cav.requestId = 200 AND cav.recordStatus = 'A'
ORDER BY pairs.TaskId, pairs.AttrMasterId
;
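To check the cross-join-then-left-join idea against the question's expected output, here is a small sketch using Python's built-in `sqlite3` as a stand-in for SQL Server. It is not either answerer's exact query: it derives the tasks from AttributeValue for request 200 (an assumption, since no Task table exists) and selects only the five columns the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AttributeMaster (AttrMasterId INT, AttrName TEXT);
CREATE TABLE AttributeValue (AttrValId INT, RequestId INT, TaskId INT,
                             AttrMasterId INT, AttrValue TEXT, RecordStatus TEXT);
INSERT INTO AttributeMaster VALUES (1,'Expense Items'),(2,'Business Reason');
INSERT INTO AttributeValue VALUES (1,200,1,1,'Furniture','A'),
                                  (2,200,2,1,'Infra','A'),
                                  (3,200,2,2,'Relocation','A');
""")

# Every (task, attribute) pair first, then attach the stored value if there is one.
rows = conn.execute("""
    SELECT t.RequestId, t.TaskId, m.AttrMasterId, v.AttrValue, v.RecordStatus
    FROM (SELECT DISTINCT RequestId, TaskId
          FROM AttributeValue
          WHERE RequestId = 200) AS t
    CROSS JOIN AttributeMaster AS m
    LEFT JOIN AttributeValue AS v
           ON v.RequestId = t.RequestId
          AND v.TaskId = t.TaskId
          AND v.AttrMasterId = m.AttrMasterId
          AND v.RecordStatus = 'A'
    ORDER BY t.TaskId, m.AttrMasterId
""").fetchall()

for row in rows:
    print(row)
```

The filters on the value table sit in the ON clause rather than in WHERE; moving them to WHERE would discard the NULL placeholder rows again.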

Here is another solution that does not require a CTE.
This also uses the TaskID, like @jon-armstrong's answer.
declare @AttributeMaster table (MasterID int, Name varchar(50))
declare @AttributeValues table (ValueID int, RequestID int, TaskID int, MasterID int, Value varchar(50), Status varchar(1))
insert into @AttributeMaster (MasterID, Name)
values (1, 'Expense'), (2, 'Business')
insert into @AttributeValues (ValueID, RequestID, TaskID, MasterID, Value, Status)
values (1, 200, 1, 1, 'Furniture', 'A'),
(2, 200, 2, 1, 'Infra', 'A'),
(3, 200, 2, 2, 'Relocation', 'A')
-- t holds every (attribute, task) combination; the left join fills in the values that exist.
select t.RequestID, t.TaskID, t.MasterID, v.Value, v.Status, t.Name
from ( select distinct m.MasterID, v.TaskID, v.RequestID, m.Name
from @AttributeMaster m cross join @AttributeValues v
) t
left join @AttributeValues v on v.MasterID = t.MasterID and v.TaskID = t.TaskID
and v.RequestID = 200 and v.Status = 'A'
order by t.TaskID, t.MasterID
the result is
RequestID TaskID MasterID Value Status Name
200 1 1 Furniture A Expense
200 1 2 NULL NULL Business
200 2 1 Infra A Expense
200 2 2 Relocation A Business

Sql query to partition and sum the records grouping by their bill number and Product code

Below are two tables. Table A contains parent bill numbers such as 1, 4 and 8, whose ref is empty (NULL). Each parent bill is referenced by one or more child bills. For example, parent bill 1 is referenced by child bills 2, 3 and 6.
Table B has the same bill no column plus a prod code, which is either an actual service (ST values) or an associated service (SV values). SV costs are additional costs on top of an ST.
The same ST may occur under multiple bill numbers; only the bill number is unique.
For example, ST1 appears in bill numbers 1 and 8. The same SV may also reference the same or a different ST.
SV1, SV2 and SV3 reference ST1 on bill no. 1, and SV2 and SV4 reference ST2 on bill no. 4.
How can we get the expected output below?
Table A:
| bill no | ref |
+----------------------------------------+
| 1 | |
| 2 | 1 |
| 3 | 1 |
| 4 | |
| 5 | 4 |
| 6 | 1 |
| 7 | 4 |
| 8 | |
| 9 | 8 |
Table B:
| bill no | Prod code | cost |
+-----------------------------------------------------+
| 1 | ST1 | 10
| 2 | SV1 | 20
| 3 | SV2 | 30
| 4 | ST2 | 10
| 5 | SV2 | 20
| 6 | SV3 | 30
| 7 | SV4 | 40
| 8 | ST1 | 50
| 9 | SV1 | 10
Expected output:
| bill no | Prod code | ST_cost | SV1 | SV2 | SV3 |
+---------------------------------------------------------------------------------------------+
| 1 | ST1 | 10 | 20 | 30 | 30 |
| 4 | ST2 | 10 | 20 | 40 | |
| 8 | ST1 | 50 | 10 | | |

Here's a script that should get you there:
USE tempdb;
GO
DROP TABLE IF EXISTS dbo.TableA;
CREATE TABLE dbo.TableA
(
BillNumber int NOT NULL PRIMARY KEY,
Reference int NULL
);
GO
INSERT dbo.TableA (BillNumber, Reference)
SELECT *
FROM (VALUES (1,NULL),
(2,1),
(3,1),
(4,NULL),
(5,4),
(6,1),
(7,4),
(8,NULL),
(9,8)) AS a(BillNumber, Reference);
GO
DROP TABLE IF EXISTS dbo.TableB;
CREATE TABLE dbo.TableB
(
BillNumber int NOT NULL PRIMARY KEY,
ProductCode varchar(10) NOT NULL,
Cost int NOT NULL
);
GO
INSERT dbo.TableB (BillNumber, ProductCode, Cost)
SELECT BillNumber, ProductCode, Cost
FROM (VALUES (1, 'ST1', 10),
(2, 'SV1', 20),
(3, 'SV2', 30),
(4, 'ST2', 10),
(5, 'SV2', 20),
(6, 'SV3', 30),
(7, 'SV4', 40),
(8, 'ST1', 50),
(9, 'SV1', 10)) AS b(BillNumber, ProductCode, Cost);
GO
WITH ParentBills
AS
(
SELECT b.BillNumber, b.ProductCode, b.Cost AS STCost
FROM dbo.TableB AS b
INNER JOIN dbo.TableA AS a
ON b.BillNumber = a.BillNumber
WHERE a.Reference IS NULL
),
SubBills
AS
(
SELECT pb.BillNumber, pb.ProductCode, pb.STCost,
b.ProductCode AS ChildProduct, b.Cost AS ChildCost
FROM ParentBills AS pb
INNER JOIN dbo.TableA AS a
ON a.Reference = pb.BillNumber
INNER JOIN dbo.TableB AS b
ON b.BillNumber = a.BillNumber
)
SELECT sb.BillNumber, sb.ProductCode, sb.STCost,
MAX(CASE WHEN sb.ChildProduct = 'SV1' THEN sb.ChildCost END) AS [SV1],
MAX(CASE WHEN sb.ChildProduct = 'SV2' THEN sb.ChildCost END) AS [SV2],
MAX(CASE WHEN sb.ChildProduct = 'SV3' THEN sb.ChildCost END) AS [SV3]
FROM SubBills AS sb
GROUP BY sb.BillNumber, sb.ProductCode, sb.STCost
ORDER BY sb.BillNumber;

You could write a function that builds your query based on the SV codes present.
Then use "Execute Immediate" to run the query string and "PIPE ROW" to return the result (both are Oracle PL/SQL features).
Check this PIPE ROW example.

I don't understand where the "SV1" value comes from on the second row.
But your problem is basically conditional aggregation:
with ab as (
select a.*, b.productcode, b.cost,
coalesce(a.reference, a.billnumber) as parent_billnumber
from a join
b
on b.billnumber = a.billnumber
)
select parent_billnumber,
max(case when reference is null then productcode end) as st,
sum(case when reference is null then cost end) as st_cost,
sum(case when productcode = 'SV1' then cost end) as sv1,
sum(case when productcode = 'SV2' then cost end) as sv2,
sum(case when productcode = 'SV3' then cost end) as sv3
from ab
group by parent_billnumber
order by parent_billnumber;
Here is a db<>fiddle.
Note this works because you have only one level of child relationships. If there are more, then recursive CTEs are needed. I would recommend that you ask a new question if this is possible.
The CTE doesn't actually add much to the query, so you can also write:
select coalesce(a.reference, a.billnumber) as parent_billnumber ,
max(case when a.reference is null then productcode end) as st,
sum(case when a.reference is null then b.cost end) as st_cost,
sum(case when b.productcode = 'SV1' then b.cost end) as sv1,
sum(case when b.productcode = 'SV2' then b.cost end) as sv2,
sum(case when b.productcode = 'SV3' then b.cost end) as sv3
from a join
b
on b.billnumber = a.billnumber
group by coalesce(a.reference, a.billnumber)
order by parent_billnumber;
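For readers who want to run the conditional aggregation without a database server, the following sketch loads the question's data into Python's built-in `sqlite3` (a stand-in for SQL Server, with table names `a` and `b` as in the answer) and executes the second form of the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (billnumber INT PRIMARY KEY, reference INT);
CREATE TABLE b (billnumber INT PRIMARY KEY, productcode TEXT, cost INT);
INSERT INTO a VALUES (1,NULL),(2,1),(3,1),(4,NULL),(5,4),(6,1),(7,4),(8,NULL),(9,8);
INSERT INTO b VALUES (1,'ST1',10),(2,'SV1',20),(3,'SV2',30),(4,'ST2',10),(5,'SV2',20),
                     (6,'SV3',30),(7,'SV4',40),(8,'ST1',50),(9,'SV1',10);
""")

# A child bill groups under its parent; a parent bill groups under itself.
rows = conn.execute("""
    SELECT COALESCE(a.reference, a.billnumber) AS parent_billnumber,
           MAX(CASE WHEN a.reference IS NULL THEN b.productcode END) AS st,
           SUM(CASE WHEN a.reference IS NULL THEN b.cost END) AS st_cost,
           SUM(CASE WHEN b.productcode = 'SV1' THEN b.cost END) AS sv1,
           SUM(CASE WHEN b.productcode = 'SV2' THEN b.cost END) AS sv2,
           SUM(CASE WHEN b.productcode = 'SV3' THEN b.cost END) AS sv3
    FROM a
    JOIN b ON b.billnumber = a.billnumber
    GROUP BY COALESCE(a.reference, a.billnumber)
    ORDER BY parent_billnumber
""").fetchall()

for row in rows:
    print(row)
```

With this data, parent bill 4 gets sv2 = 20 and no sv1, and the SV4 cost on bill 7 is not shown at all because the query has no SV4 column.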

Subquery with every value in group-by

I'm having a problem selecting multiple values generated by a GROUP BY clause.
I've tried to set up a simplified example of what I have:
Table CUSTOMERS Table PRODUCTS Table ORDERS
ID | NAME ID | DESCR | PROMO ID_P | ID_C
---+--------- ---+-------+------- -----+-----
1 | Alice 1 | prod1 | gold 1 | 1
2 | Bob 2 | prod2 | gold 2 | 3
3 | Charlie 3 | prod3 | silver 1 | 2
4 | prod4 | silver 3 | 1
From this I'd like to combine every product and every customer for each promo into a single cell:
Results
PROMO | products | CUSTOMERS
-------+--------------+--------------------
gold | prod1, prod2 | Alice, Bob, Charlie
silver | prod3 | Alice
Something like:
SELECT PRODUCTS.PROMO
, CONCAT(PRODUCTS.DESCR)
, STUFF(
(SELECT ' / ' + CUSTOMERS.NAME
FROM CUSTOMERS
WHERE CUSTOMERS.ID = ORDERS.ID
FOR XML PATH (''))
, 1, 3, '')
FROM PRODUCTS
INNER JOIN ORDERS
ON PRODUCTS.ID = ORDERS.ID_P
WHERE PRODUCTS.ID < 3
GROUP BY PRODUCTS.PROMO
Can this be achieved in SQL?

You can use this.
DECLARE @CUSTOMERS Table(ID INT, NAME VARCHAR(20))
INSERT INTO @CUSTOMERS VALUES
( 1 ,'Alice'),
( 2 ,'Bob'),
( 3 ,'Charlie')
DECLARE @PRODUCTS TABLE (ID INT, DESCR VARCHAR(10), PROMO VARCHAR(10))
INSERT INTO @PRODUCTS VALUES
( 1 ,'prod1','gold'),
( 2 ,'prod2','gold'),
( 3 ,'prod3','silver'),
( 4 ,'prod4','silver')
DECLARE @ORDERS TABLE ( ID_P INT, ID_C INT)
INSERT INTO @ORDERS VALUES
( 1 ,1 ),
( 2 ,3 ),
( 1 ,2 ),
( 3 ,1 )
-- Join orders to products and customers once, then build both lists per promo.
;WITH CTE AS (
SELECT O.*, P.PROMO, P.DESCR, C.NAME FROM @ORDERS O
INNER JOIN @PRODUCTS P ON O.ID_P = P.ID
INNER JOIN @CUSTOMERS C ON O.ID_C = C.ID
)
SELECT DISTINCT T.PROMO,
STUFF(Product.Descrs,1,1,'') products,
STUFF(Customer.Names,1,1,'') CUSTOMERS
FROM CTE T
CROSS APPLY (SELECT DISTINCT ',' + T1.DESCR FROM CTE T1 WHERE T.PROMO = T1.PROMO FOR XML PATH('')) AS Product(Descrs)
CROSS APPLY (SELECT DISTINCT ',' + T1.NAME FROM CTE T1 WHERE T.PROMO = T1.PROMO FOR XML PATH('')) AS Customer(Names)
Result:
PROMO products CUSTOMERS
---------- -------------- -------------------------
gold prod1,prod2 Alice,Bob,Charlie
silver prod3 Alice
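The same grouping logic can be sketched in a few lines of Python, which may help when checking what the SQL should return. This is only an illustration: it uses the built-in `sqlite3` module for the join and builds the distinct, sorted lists in Python rather than with FOR XML PATH.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMERS (ID INT, NAME TEXT);
CREATE TABLE PRODUCTS (ID INT, DESCR TEXT, PROMO TEXT);
CREATE TABLE ORDERS (ID_P INT, ID_C INT);
INSERT INTO CUSTOMERS VALUES (1,'Alice'),(2,'Bob'),(3,'Charlie');
INSERT INTO PRODUCTS VALUES (1,'prod1','gold'),(2,'prod2','gold'),
                            (3,'prod3','silver'),(4,'prod4','silver');
INSERT INTO ORDERS VALUES (1,1),(2,3),(1,2),(3,1);
""")

# One row per order, carrying the promo, product and customer names.
pairs = conn.execute("""
    SELECT p.PROMO, p.DESCR, c.NAME
    FROM ORDERS o
    JOIN PRODUCTS p ON p.ID = o.ID_P
    JOIN CUSTOMERS c ON c.ID = o.ID_C
""").fetchall()

# Collect distinct products and customers per promo.
summary = {}
for promo, descr, name in pairs:
    products, customers = summary.setdefault(promo, (set(), set()))
    products.add(descr)
    customers.add(name)

result = [
    (promo, ", ".join(sorted(products)), ", ".join(sorted(customers)))
    for promo, (products, customers) in sorted(summary.items())
]
print(result)
```

prod4 never appears because nobody ordered it, which matches the expected result in the question.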

SQL Server Sum Columns From Two Tables With Condition

I have two tables:
Sales:
SaleID | DayNumber | Quantity | ProductID
1 | 1 | 10 | 1
2 | 1 | 150 | 4
3 | 1 | 70 | 6
4 | 2 | 30 | 2
5 | 2 | 40 | 3
6 | 2 | 45 | 5
7 | 2 | 15 | 8
and Products:
ProductID | Price
1 | 12
2 | 52
3 | 643
4 | 42
5 | 75
6 | 53
7 | 2
8 | 7
I want to get some results from these tables, but I have no idea how to do it. For example, I want to calculate the sum of the sold products for days 2, 3 and 4, and also the average earnings for each day.

I think you need this simple SUM with a JOIN query:
create table #sales
(id int not null identity(1,1),
DayNUmber int,
Quantity int,
ProductID int)
create table #Products(
id int not null identity(1,1),
Price money
)
insert #sales
values
(1,10,1),
(1,150,4),
(1,70,6),
(2,30,2),
(2,40,3),
(2,45,5),
(2,15,8);
insert #Products
values
(12),
(52),
(643),
(42),
(75),
(53),
(2),
(7);
select
sum(s.Quantity*p.Price) as [sum],
s.DayNumber from #sales s inner join #Products p
on s.ProductID=p.id where s.DayNumber in (2,3)
group by s.DayNUmber

Join on common ID.
Take sum of Quantity * Price.
Group by Day and filter as needed.
SELECT
SUM(Quantity*Price)
,DayNumber
FROM Sales s
INNER JOIN Products p ON p.ProductID = s.ProductID
WHERE DayNumber IN(2,3,4)
GROUP BY DayNumber

To make this work you need to JOIN the two tables in your query. You can then use SUM to calculate the sum of the sold products and earnings. To filter for days 2, 3, and 4 you can use IN in your WHERE clause.
SELECT S.DayNumber, SUM(S.Quantity) AS 'Products Sold', SUM(S.Quantity * P.Price) AS 'Total Earnings'
FROM Sales AS S JOIN Products AS P ON S.ProductID = P.ProductID
WHERE S.DayNumber IN (2, 3, 4)
GROUP BY S.DayNumber

You need AVG per day and Total Sale per day. Filter as per the days required
SELECT SUM(p.Price * s.Quantity) TotalsalePerday, AVG(p.Price * s.Quantity) AvgsalePerday ,s.DayNumber from SO_Products p
INNER JOIN SO_Sales s ON p.ProductID = s.ProductID
GROUP BY DayNumber
ORDER BY DayNumber
/------------------------
OUTPUT
------------------------/
TotalsalePerday AvgsalePerday DayNumber
--------------- ------------- -----------
10130 3376 1
30760 7690 2
(2 row(s) affected)
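The figures above can be reproduced with the question's data. The sketch below uses Python's built-in `sqlite3` as a stand-in for SQL Server and also shows one possible reading of "average earnings for each day", namely the average of the daily totals; that interpretation is an assumption, since the question does not define it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales (SaleID INT, DayNumber INT, Quantity INT, ProductID INT);
CREATE TABLE Products (ProductID INT, Price INT);
INSERT INTO Sales VALUES (1,1,10,1),(2,1,150,4),(3,1,70,6),
                         (4,2,30,2),(5,2,40,3),(6,2,45,5),(7,2,15,8);
INSERT INTO Products VALUES (1,12),(2,52),(3,643),(4,42),(5,75),(6,53),(7,2),(8,7);
""")

# Earnings per day, restricted to the requested days (only day 2 has sales).
totals = conn.execute("""
    SELECT s.DayNumber, SUM(s.Quantity * p.Price) AS total
    FROM Sales s
    JOIN Products p ON p.ProductID = s.ProductID
    WHERE s.DayNumber IN (2, 3, 4)
    GROUP BY s.DayNumber
    ORDER BY s.DayNumber
""").fetchall()

# Average of the daily totals across all days that have sales.
avg_per_day = conn.execute("""
    SELECT AVG(day_total)
    FROM (SELECT SUM(s.Quantity * p.Price) AS day_total
          FROM Sales s
          JOIN Products p ON p.ProductID = s.ProductID
          GROUP BY s.DayNumber) AS d
""").fetchone()[0]

print(totals)       # [(2, 30760)]
print(avg_per_day)  # 20445.0
```

Note that AVG(p.Price * s.Quantity) grouped by day, as in the answer above, gives the average per sale line within a day, which is a different number.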

What is the query to join 2 tables to calculate qty per stuff

Is it possible to create a query that joins 2 tables to calculate the total qty per stuff code? I am using SQL Server 2008.
I have table Purchase
id_Purchase| stuff_code| qty
------------------------------
1 | G-001 | 6000
2 | G-002 | 4000
3 | G-003 | 2000
4 | G-001 | 5000
and table Selling
id_selling| id_purchase| qty
------------------------------
1 | 1 | 2000
2 | 1 | 3000
3 | 2 | 1000
id_purchase is a foreign key to table Purchase.
What I want is a query that generates this:
stuff_code| qty
-----------------
G-001 | 6000
G-002 | 3000
G-003 | 2000
Note that the G-001 qty comes from 6000 + 5000 - 2000 - 3000.
This is my current query:
SELECT stuff_code, SUM(P.qty)-ISNULL(SUM(S.qty),0)
FROM Purchase P LEFT JOIN Selling S ON P.ID_Purchase = S.ID_Purchase
GROUP BY stuff_code
and the result is:
stuff_code| qty
-----------------
G-001 | 12000
G-002 | 3000
G-003 | 2000

Below is one method, using common table expressions for purchases and sales summaries by stuff_code.
CREATE TABLE dbo.Purchase(
id_Purchase int
CONSTRAINT PK_Purchase PRIMARY KEY
, stuff_code varchar(10)
, qty int
);
INSERT INTO dbo.Purchase VALUES
(1, 'G-001', 6000)
,(2, 'G-002', 4000)
,(3, 'G-003', 2000)
,(4, 'G-001', 5000);
CREATE TABLE dbo.Selling(
id_selling int
CONSTRAINT PK_Selling PRIMARY KEY
, id_purchase int
, qty int
);
INSERT INTO dbo.Selling VALUES
(1, 1, 2000)
, (2, 1, 3000)
, (3, 2, 1000);
WITH
purchase_summary AS (
SELECT stuff_code, SUM(qty) AS qty
FROM dbo.Purchase
GROUP BY stuff_code
)
,sales_summary AS (
SELECT p.stuff_code, SUM(s.qty) AS qty
FROM dbo.Selling AS s
JOIN dbo.Purchase AS p ON
p.id_purchase = s.id_purchase
GROUP BY p.stuff_code
)
SELECT
purchase_summary.stuff_code
, purchase_summary.qty - COALESCE(sales_summary.qty, 0) AS qty
FROM purchase_summary
LEFT JOIN sales_summary ON
sales_summary.stuff_code = purchase_summary.stuff_code
ORDER BY
stuff_code;

You have data sets at two different granularities. Get them at the same grain and you can just do math straight across.
SELECT p.ID_Purchase
, p.stuff_code
, p.StuffPurchased
, ISNULL(s.StuffSold,0) as StuffSold
, p.StuffPurchased - ISNULL(s.StuffSold,0) as StuffLeft
FROM (SELECT
ID_Purchase
, stuff_code
, SUM(Qty) StuffPurchased
FROM Purchase
GROUP BY ID_Purchase
, stuff_code) p
LEFT JOIN (
SELECT ID_Purchase
, SUM(Qty) StuffSold
FROM Selling
GROUP BY ID_Purchase) s
ON P.ID_Purchase = S.ID_Purchase
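The granularity point explains the 12000 in the question: purchase 1 joins to two Selling rows, so its 6000 is counted twice (6000 + 6000 + 5000 - 5000). The sketch below reproduces both the original query and a pre-aggregated variant, using Python's built-in `sqlite3` as a stand-in for SQL Server and COALESCE in place of ISNULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Purchase (id_purchase INT PRIMARY KEY, stuff_code TEXT, qty INT);
CREATE TABLE Selling (id_selling INT PRIMARY KEY, id_purchase INT, qty INT);
INSERT INTO Purchase VALUES (1,'G-001',6000),(2,'G-002',4000),(3,'G-003',2000),(4,'G-001',5000);
INSERT INTO Selling VALUES (1,1,2000),(2,1,3000),(3,2,1000);
""")

# The question's query: each Selling row repeats its Purchase row before summing.
naive = conn.execute("""
    SELECT p.stuff_code, SUM(p.qty) - COALESCE(SUM(s.qty), 0) AS qty
    FROM Purchase p
    LEFT JOIN Selling s ON p.id_purchase = s.id_purchase
    GROUP BY p.stuff_code
    ORDER BY p.stuff_code
""").fetchall()

# Aggregate Selling to one row per purchase first, so the join cannot duplicate rows.
fixed = conn.execute("""
    SELECT p.stuff_code, SUM(p.qty - COALESCE(s.sold, 0)) AS qty
    FROM Purchase p
    LEFT JOIN (SELECT id_purchase, SUM(qty) AS sold
               FROM Selling
               GROUP BY id_purchase) AS s
           ON s.id_purchase = p.id_purchase
    GROUP BY p.stuff_code
    ORDER BY p.stuff_code
""").fetchall()

print(naive)  # [('G-001', 12000), ('G-002', 3000), ('G-003', 2000)]
print(fixed)  # [('G-001', 6000), ('G-002', 3000), ('G-003', 2000)]
```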

Try something like:
-- stuff_code must be in the GROUP BY, and LEFT JOIN keeps purchases that have no sales yet.
SELECT table1.id_purchase, table1.stuff_code, table1.purchased - ISNULL(table2.sold, 0) AS qty_left
FROM
(SELECT id_purchase, stuff_code, SUM(qty) AS purchased
FROM Purchase
GROUP BY id_purchase, stuff_code) AS table1
LEFT JOIN
(SELECT id_purchase, SUM(qty) AS sold
FROM Selling
GROUP BY id_purchase) AS table2
ON table1.id_purchase = table2.id_purchase;
