Use table with multiple rows with the same value - sql

I'm trying to do an INNER JOIN and count values with information from two tables. The problem is that the product table has multiple rows with the same or a similar value, so my COUNT() comes out too high.
My two tables
Sales table
Date prod_id
2016-01-01 81
2016-01-01 82
2016-01-01 81
2016-01-01 80
2016-01-01 80
2016-01-02 80
2016-01-02 80
2016-01-02 81
2016-01-02 81
.... ....
Product table
prodid Name
80 Banana
81 Apple
82 Orange
83 Ice Cream
80 BANANAS
81 APPLE
82
83 Ice Cream
.... ....
When I do an INNER JOIN and count the number of occurrences of e.g. prod_id, I get an unreasonably high number. My guess is that this is because there is more than one row for prod_id 80, for example.
Do you have any idea for a solution? My first reaction was to redo the Product table, but many other systems depend on that table, so I can't change it in the foreseeable future.
My query so far:
SELECT
pt.Date AS "Date",
ft.Name AS "Product",
COUNT(ft.Name) Number
FROM SALES as pt
INNER JOIN PROD_TABLE AS ft ON pt.prod_id=ft.prodid
WHERE pt.Date BETWEEN '2016-01-01' AND '2016-01-30'
GROUP BY pt.Date, ft.Name
ORDER BY pt.Date DESC
Expected result:
Date Product Number
2016-01-01 Banana 2
2016-01-01 Apple 2
2016-01-01 Orange 1

First, you should fix the data. Having a product table with duplicates seems nonsensical. You shouldn't try to get around such issues by writing more complex queries.
That said, this is pretty easy to do in SQL Server. I think outer apply is appropriate:
select p.name, count(*)
from sales s outer apply
(select top 1 p.*
from product p
where p.name is not null and
p.prodid = s.prod_id -- note: the columns should have the same name
order by p.name -- makes the chosen spelling deterministic
) p
group by p.name;
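The over-count and one way around it can be reproduced on a reduced copy of the sample data. This is a minimal sketch, not the answerer's query: it assumes SQLite (through Python's sqlite3 module) as a stand-in for SQL Server, and because SQLite has no OUTER APPLY it swaps in a derived table that keeps one name per prodid. Using MAX(name) to pick the spelling is an arbitrary choice made for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (date TEXT, prod_id INTEGER);
CREATE TABLE prod_table (prodid INTEGER, name TEXT);
INSERT INTO sales VALUES
  ('2016-01-01', 81), ('2016-01-01', 82), ('2016-01-01', 81), ('2016-01-01', 80);
INSERT INTO prod_table VALUES
  (80, 'Banana'), (81, 'Apple'), (82, 'Orange'),
  (80, 'BANANAS'), (81, 'APPLE'), (82, NULL);
""")

# Plain join: every sale matches two product rows, so rows are doubled.
naive_rows = conn.execute("""
    SELECT COUNT(*)
    FROM sales s
    JOIN prod_table p ON p.prodid = s.prod_id
""").fetchone()[0]

# Deduplicated join: collapse the product table to one row per prodid first.
deduped = conn.execute("""
    SELECT s.date, p.name, COUNT(*) AS number
    FROM sales s
    JOIN (SELECT prodid, MAX(name) AS name   -- arbitrary but deterministic pick
          FROM prod_table
          WHERE name IS NOT NULL
          GROUP BY prodid) p ON p.prodid = s.prod_id
    GROUP BY s.date, p.name
    ORDER BY p.name
""").fetchall()

print(naive_rows)  # 8 joined rows for only 4 sales
print(deduped)
```

With SQLite's default binary collation MAX(name) happens to keep the mixed-case spellings; under a case-insensitive SQL Server collation the pick could differ, so treat the chosen name as unspecified.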

I guess this simple query solves your requirement:
select
date,
name,
count(name)
from product p inner join sales s
on s.prod_id=p.prodid
group by date,name

Related

How to add a query to a table in SQL?

I have 3 tables.
For simplicity I changed them to these sample tables.
table1: CorporateActionSummary
RATE Quantity ProductID
--------------------------
56 0 1487
30 0 1871
40 0 8750
table2: ProductMaster
RATEGROSS ISIN ProductID
--------------------------
60 JP0001 1487
33 JP0002 1871
45 JP0003 8750
table3: OpenPosition
Quantity ProductID
-------------------
5 1487
1 1487
5 1487
3 1871
2 1871
4 8750
2 8750
7 8750
3 8750
First I need to add ISIN from table2 to table1
table1: CorporateActionSummary
RATE Quantity ProductID ISIN
-------------------------------------
56 0 1487 JP0001
30 0 1871 JP0002
40 0 8750 JP0003
So, I used this code
SELECT [dbo].[CorporateActionSummary].*, [dbo].[ProductMaster].[ISIN]
FROM [dbo].[CorporateActionSummary] JOIN [dbo].[ProductMaster] ON CorporateActionSummary.ProductID = ProductMaster.ProductID
Now, as you can see, the Quantity is missing in Table1, so I have to add up all the quantities in Table3 for each ProductID and add the result to Table1 (as a new column, or by overwriting the Quantity column).
I think I can get the sum of each ProductID's Quantity with the following code, but how can I add it to Table1, which already has the ISIN column?
SELECT SUM(Quantity),ProductID
FROM [dbo].[OpenPositions]
I am super new to SQL, please explain in detail if it is possible, thank you
I am using Microsoft SQL Server Management Studio
You can sum the quantities in a subquery and then join it to your query, like so:
SELECT CA.*, PM.[ISIN], OO.Quantity AS TotalQuantity
FROM [dbo].[CorporateActionSummary] CA
JOIN [dbo].[ProductMaster] PM
ON CA.ProductID = PM.ProductID
JOIN (
-- one row per product with its summed quantity
SELECT ProductID, SUM(Quantity) Quantity
FROM [dbo].[OpenPositions]
GROUP BY ProductID
) OO
ON OO.ProductID = CA.ProductID
You are almost there. You just need to join the third table with the same logic you used for the ProductMaster table. However, since you need the total quantity, you have to group by every selected column that is not aggregated.
The query will be something like this :
SELECT
[dbo].[CorporateActionSummary].ProductID
, [dbo].[ProductMaster].[ISIN]
,sum([OpenPosition].Quantity) as quantity
FROM [dbo].[CorporateActionSummary]
JOIN [dbo].[ProductMaster]
ON CorporateActionSummary.ProductID = ProductMaster.ProductID
JOIN [dbo].[OpenPosition]
ON CorporateActionSummary.ProductID = OpenPosition.ProductID
group by
[dbo].[CorporateActionSummary].ProductID
, [dbo].[ProductMaster].[ISIN]
If you want to add more columns to your select, you need to group by those columns as well.
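The pre-aggregate-then-join approach can be checked against the sample tables from the question. A minimal sketch, assuming SQLite via Python's sqlite3 as a stand-in for SQL Server, the table name OpenPosition, and a hypothetical output column TotalQuantity:

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; data is the question's sample.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CorporateActionSummary (Rate INTEGER, Quantity INTEGER, ProductID INTEGER);
CREATE TABLE ProductMaster (RateGross INTEGER, ISIN TEXT, ProductID INTEGER);
CREATE TABLE OpenPosition (Quantity INTEGER, ProductID INTEGER);
INSERT INTO CorporateActionSummary VALUES (56, 0, 1487), (30, 0, 1871), (40, 0, 8750);
INSERT INTO ProductMaster VALUES (60, 'JP0001', 1487), (33, 'JP0002', 1871), (45, 'JP0003', 8750);
INSERT INTO OpenPosition VALUES
  (5, 1487), (1, 1487), (5, 1487),
  (3, 1871), (2, 1871),
  (4, 8750), (2, 8750), (7, 8750), (3, 8750);
""")

# Sum the positions per product first, then join that one-row-per-product
# result to the summary and master tables.
summary = conn.execute("""
    SELECT ca.Rate, ca.ProductID, pm.ISIN, op.TotalQuantity
    FROM CorporateActionSummary ca
    JOIN ProductMaster pm ON pm.ProductID = ca.ProductID
    JOIN (SELECT ProductID, SUM(Quantity) AS TotalQuantity
          FROM OpenPosition
          GROUP BY ProductID) op ON op.ProductID = ca.ProductID
    ORDER BY ca.ProductID
""").fetchall()

for row in summary:
    print(row)
```

Because the subquery returns exactly one row per ProductID, the outer query needs no GROUP BY and can select any column from the other two tables.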

Getting latest price of different products from control table

I have a control table where the price of each item number is tracked by date.
id ItemNo Price Date
---------------------------
1 a001 100 1/1/2003
2 a001 105 1/2/2003
3 a001 110 1/3/2003
4 b100 50 1/1/2003
5 b100 55 1/2/2003
6 b100 60 1/3/2003
7 c501 35 1/1/2003
8 c501 38 1/2/2003
9 c501 42 1/3/2003
10 a001 95 1/1/2004
This is the query I am running.
SELECT pr.*
FROM prices pr
INNER JOIN
(
SELECT ItemNo, max(date) max_date
FROM prices
GROUP BY ItemNo
) p ON pr.ItemNo = p.ItemNo AND
pr.date = p.max_date
order by ItemNo ASC
I am getting below values
id ItemNo Price Date
------------------------------
10 a001 95 2004-01-01
6 b100 60 2003-01-03
9 c501 42 2003-01-03
My question is: is my query right or wrong? I am getting the result I want, but I'm not sure the approach is correct.
Your query does what you want, and is a valid approach to solve your problem.
An alternative option would be to use a correlated subquery for filtering:
select p.*
from prices p
where p.date = (select max(p1.date) from prices p1 where p1.itemno = p.itemno)
The upside of this query is that it can take advantage of an index on (itemno, date).
You can also use window functions:
select *
from (
select p.*, rank() over(partition by itemno order by date desc) rn
from prices p
) p
where rn = 1
I would recommend benchmarking the three options against your real data to assess which one performs better.
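As a quick check, the correlated-subquery variant can be run against the sample rows. This is a sketch under assumptions: SQLite via Python's sqlite3 stands in for the real database, and the dates are stored as ISO-8601 text (matching the format shown in the question's output) so that MAX() compares them chronologically.

```python
import sqlite3

# Assumption: SQLite stand-in; dates stored as ISO text so MAX() orders them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE prices (id INTEGER PRIMARY KEY, itemno TEXT, price INTEGER, date TEXT);
INSERT INTO prices VALUES
  (1, 'a001', 100, '2003-01-01'), (2, 'a001', 105, '2003-01-02'),
  (3, 'a001', 110, '2003-01-03'), (4, 'b100', 50, '2003-01-01'),
  (5, 'b100', 55, '2003-01-02'),  (6, 'b100', 60, '2003-01-03'),
  (7, 'c501', 35, '2003-01-01'),  (8, 'c501', 38, '2003-01-02'),
  (9, 'c501', 42, '2003-01-03'),  (10, 'a001', 95, '2004-01-01');
""")

# Keep a row only if its date is the latest date for that item.
latest = conn.execute("""
    SELECT p.id, p.itemno, p.price, p.date
    FROM prices p
    WHERE p.date = (SELECT MAX(p1.date) FROM prices p1 WHERE p1.itemno = p.itemno)
    ORDER BY p.itemno
""").fetchall()

for row in latest:
    print(row)
```

If two rows for the same item shared the latest date, this query (like the join and rank() versions) would return both of them.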

How to perform group function over an outer join query?

I have the following categories:
category_id name
----- -----
50D34E5A-A935-490A-9492-153DE50A94A2 Luxuries
013E3D0F-E755-495B-8D1E-4A3D1340ACF8 Household
88C477EE-CF99-49B4-9E92-4C41B09A5715 Petrol
40099E3A-18F1-4710-A803-7107648518CC Other
E3B81693-07B5-4D69-A3EC-796CA4290B45 Rent
F0728052-0733-454B-B8EE-96AB6D6E40BE Insurance
6E06581A-1643-4DEC-90B7-9D57F770F313 Groceries
CFD1ED67-7059-4A33-8DD6-F2FFAB213970 Monthly Bill
And the following transactions (shortened and joined on category_id):
category_id amount
------ -----
Luxuries 14
Household 14
Petrol 14
Other 14
Rent 14
Insurance 14
There are no Groceries transactions. I would like to sum these amounts and display their count per category, with Groceries still included in the results and shown as zero. I have tried this:
SELECT SUM(ut.amount), COUNT(ut.amount), c.name
FROM User_Transaction ut
FULL OUTER JOIN Category c ON (ut.category_id = c.category_id)
GROUP BY c.name
total count name
--- --- ---
84 6 Household
84 6 Insurance
84 6 Luxuries
98 7 Monthly Bill
56 4 Other
182 13 Petrol
112 8 Rent
But Groceries has not been included. How can I include Groceries on the result set but just displaying as 0?
Use a LEFT JOIN starting with category (the table whose values you want to keep):
SELECT c.name, COALESCE(SUM(ut.amount), 0) as amount,
COUNT(ut.category_id) as num_transactions
FROM Category c LEFT JOIN
User_Transaction ut
ON ut.category_id = c.category_id
GROUP BY c.category_id, c.name;
That said, I think your query should do what you want, although it is misleading to use a full join in this context.
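The effect of starting the LEFT JOIN from the category table can be seen on a cut-down data set. A minimal sketch, assuming SQLite via Python's sqlite3 as a stand-in, with hypothetical integer ids replacing the GUIDs and only three of the categories:

```python
import sqlite3

# Assumption: SQLite stand-in; integer ids and the reduced data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Category (category_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE User_Transaction (category_id INTEGER, amount INTEGER);
INSERT INTO Category VALUES (1, 'Luxuries'), (2, 'Household'), (3, 'Groceries');
INSERT INTO User_Transaction VALUES (1, 14), (2, 14), (2, 14);
""")

# Category is the preserved side, so Groceries survives with NULLs from ut.
# COUNT(ut.category_id) ignores those NULLs; COALESCE turns the NULL sum into 0.
totals = conn.execute("""
    SELECT c.name,
           COALESCE(SUM(ut.amount), 0) AS amount,
           COUNT(ut.category_id) AS num_transactions
    FROM Category c
    LEFT JOIN User_Transaction ut ON ut.category_id = c.category_id
    GROUP BY c.category_id, c.name
    ORDER BY c.name
""").fetchall()

for row in totals:
    print(row)
```

Counting a column from the transaction side matters here: COUNT(*) would report 1 for Groceries, because the preserved category row itself is counted.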

How Do I Select All Parents and the Top Previous Child Record Based on Dates in SQL Server 2008

I'm using a vendor provided database running on SQL Server 2008. There are two tables that track tests. For every record in Table A there may be zero, one or multiple records in Table B. There can also be multiple tests in Table A for the same user. The relationship is TableA.UserID = TableB.UserID. Tests taken in Table B can occur before or after Table A.
I need to select all of the records in Table A and, if test(s) from Table B have been taken by the same user before the test in Table A, data from Table B but only from the last previous child record. Both tables are structured similarly:
TABLE A
TestID INTEGER PRIMARY KEY,
UserID INTEGER,
TestDate DATE,
Score INTEGER
TABLE B
TestID INTEGER PRIMARY KEY,
UserID INTEGER,
TestDate Date,
Score INTEGER
Sample Data
TABLE A
TestID UserID TestDate Score
1 100 2014-02-15 80
2 101 2014-02-20 100
3 102 2014-02-22 90
4 102 2014-03-10 70
TABLE B
TestID UserID TestDate Score
1000 100 2014-02-01 55
1007 100 2014-02-05 85
1012 100 2014-02-20 95
1034 102 2014-02-12 65
1205 102 2014-03-05 75
1986 101 2014-03-10 45
What I'd like returned would be:
UserID TestA_ID TestADate TestAScore TestB_ID TestBDate TestBScore
100 1 2014-02-15 80 1007 2014-02-05 85
101 2 2014-02-20 100 NULL NULL NULL
102 3 2014-02-22 90 1034 2014-02-12 65
102 4 2014-03-10 70 1205 2014-03-05 75
I know how to get all of the previous Table B rows joined to the Table A rows by using a LEFT OUTER JOIN and filtering by date in the WHERE clause, and I know how to get the top row from Table B, but I haven't been able to work out how to get the top child record that occurs before the date of the record in Table A. Any help would be appreciated. Thanks.
You can do this using OUTER APPLY in T-SQL.
For each record in TableA, we're looking for a record in TableB for the same user but with a test date prior to the test date in TableA and we're also ordering the test in TableB to ensure we're getting the most recent test from TableB (but still prior to the test date from TableA).
SELECT
A.[UserID],
A.[TestID] [TestA_ID],
A.[TestDate] [TestADate],
A.[Score] [TestAScore],
B.[TestB_ID],
B.[TestBDate],
B.[TestBScore]
FROM [TableA] A
OUTER APPLY
(
SELECT TOP 1
B1.[TestID] [TestB_ID],
B1.[TestDate] [TestBDate],
B1.[Score] [TestBScore]
FROM [TableB] B1
WHERE A.[UserID] = B1.[UserID]
AND A.[TestDate] > B1.[TestDate]
ORDER BY
B1.[TestDate] DESC
) B
Or another option might be to use the ROW_NUMBER() window function to find the record from TableB. I have a hunch this one wouldn't perform as well because it needs to hit TableA twice, but can't be sure without running tests.
SELECT
A.[UserID],
A.[TestID] [TestA_ID],
A.[TestDate] [TestADate],
A.[Score] [TestAScore],
B.[TestB_ID],
B.[TestBDate],
B.[TestBScore]
FROM [TableA] A
LEFT JOIN
(
SELECT
ROW_NUMBER() OVER (PARTITION BY A.[UserID], A.[TestID] ORDER BY B.[TestDate] DESC) [rn],
A.[UserID],
A.[TestID] [TestA_ID],
B.[TestID] [TestB_ID],
B.[TestDate] [TestBDate],
B.[Score] [TestBScore]
FROM [TableA] A
INNER JOIN [TableB] B
ON A.[UserID] = B.[UserID]
AND A.[TestDate] > B.[TestDate]
) B
ON A.[UserID] = B.[UserID]
AND A.[TestID] = B.[TestA_ID]
AND B.[rn] = 1
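The expected output from the question can be reproduced outside SQL Server as well. This sketch assumes SQLite via Python's sqlite3, which has neither OUTER APPLY nor TOP, so it swaps in a LEFT JOIN whose ON clause uses a correlated subquery with ORDER BY ... LIMIT 1 to pick the latest earlier TableB test. It is a stand-in for the technique, not a translation that SQL Server would accept.

```python
import sqlite3

# Assumption: SQLite stand-in; data is the question's sample, dates as ISO text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (TestID INTEGER PRIMARY KEY, UserID INTEGER, TestDate TEXT, Score INTEGER);
CREATE TABLE TableB (TestID INTEGER PRIMARY KEY, UserID INTEGER, TestDate TEXT, Score INTEGER);
INSERT INTO TableA VALUES
  (1, 100, '2014-02-15', 80), (2, 101, '2014-02-20', 100),
  (3, 102, '2014-02-22', 90), (4, 102, '2014-03-10', 70);
INSERT INTO TableB VALUES
  (1000, 100, '2014-02-01', 55), (1007, 100, '2014-02-05', 85),
  (1012, 100, '2014-02-20', 95), (1034, 102, '2014-02-12', 65),
  (1205, 102, '2014-03-05', 75), (1986, 101, '2014-03-10', 45);
""")

# For each TableA row, join only the single TableB row that is the most
# recent earlier test for the same user; no match leaves the B columns NULL.
latest_previous = conn.execute("""
    SELECT A.UserID, A.TestID, A.TestDate, A.Score,
           B.TestID, B.TestDate, B.Score
    FROM TableA A
    LEFT JOIN TableB B
      ON B.TestID = (SELECT B1.TestID
                     FROM TableB B1
                     WHERE B1.UserID = A.UserID
                       AND B1.TestDate < A.TestDate
                     ORDER BY B1.TestDate DESC
                     LIMIT 1)
    ORDER BY A.TestID
""").fetchall()

for row in latest_previous:
    print(row)
```

User 101 gets NULLs because the only TableB test for that user (1986) was taken after the TableA test.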

SQL multiple 1-to-many joins

I'm almost certain I've run into this before and am just having an extended senior moment. I am trying to pull work order data from three different tables across 2 databases on a SQL instance and combine it all into a report. I'm looking for the end result to contain the following columns:
WO | Production Recorded Qty | Inventory Qty | Variance
The Variance part is easy: I can just nest the select statement and subtract the two quantities in the outer statement. The problem I'm running into is that when I join the Production and Inventory tables in their corresponding databases, the sums of the columns I'm targeting come out far larger than they should be.
Sample Data:
Work Order, Short P/N, and Long P/N in Work Order Table:
dba.1
WO ShortPN LongPN
152 1234 Part1
Short P/N, Quantity on hand, location, and lot # in inventory table:
dba.2
ShortPN Qty Loc Lot
1234 31 Loc1 456
1234 0 Loc2 456
1234 0 Loc4 456
1234 19 Loc1 789
1234 25 Loc4 789
Work Order, Long P/N, and production count of the last 5min in Production table:
dbb.3
WO LongPN Count
152 Part3 6
152 Part3 8
152 Part3 9
152 Part3 4
152 Part3 6
152 Part3 7
With this example I've tried:
SELECT 1.WO AS WO
,SUM(2.Qty) AS "Qty On Hand"
,SUM(3.Count) AS "Produced Qty"
FROM dba.1
INNER JOIN dba.2 ON 1.ShortPN=2.ShortPN
INNER JOIN dbb.3 ON 1.WO = 3.WO
GROUP BY 1.WO
I've also tried selecting from 3, joining 1 to 3 on the WO, then 2 to 1 on ShortPN, and both yield SUM() numbers that are far higher than they should be (i.e. what should be 15,xxx turns into over 2,000,000). However, if I remove one of the data points from the report and select just the inventory or production qty, I get the correct end results. I swear that I've run into this before but for the life of me can't remember how it was solved. Sorry if it's a duplicate question; I couldn't find anything by searching.
Thanks in advance for the help, it's greatly appreciated.
You can do something like this
select
WO.WO, isnull(i.Qty, 0) as Qty, isnull(p.[Count], 0) as [Count]
from WorkOrder as WO
left outer join (select t.ShortPN, sum(t.Qty) as Qty from inventory as t group by t.ShortPN) as i on
i.ShortPN = WO.ShortPN
left outer join (select t.WO, sum(t.[Count]) as [Count] from Production as t group by t.WO) as p on
p.WO = WO.WO
If you have SQL Server 2005 or higher, you can write it like this
select
WO.WO, isnull(i.Qty, 0) as Qty, isnull(p.[Count], 0) as [Count]
from WorkOrder as WO
outer apply (select sum(t.Qty) as Qty from inventory as t where t.ShortPN = WO.ShortPN) as i
outer apply (select sum(t.[Count]) as [Count] from Production as t where t.WO = WO.WO) as p
This happens because when you join the WO and inventory tables you get
WO SHORTPN QTY
-------------------
152 1234 31
152 1234 0
152 1234 0
152 1234 19
152 1234 25
and you now have 5 rows with WO = 152. When you add the join with the Production table, each of those 5 rows is matched with the 6 rows with WO = 152 from Production, so you have 30 rows in total and each QTY from inventory is listed 6 times. When you sum this up, instead of 75 you get 75 * 6 = 450. Likewise, each Count is listed 5 times, so instead of 40 you get 40 * 5 = 200.
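The arithmetic above can be verified directly. A minimal sketch, assuming SQLite via Python's sqlite3 as a stand-in for SQL Server, with hypothetical table names (workorder, inventory, production) and prod_count in place of the Count column:

```python
import sqlite3

# Assumption: SQLite stand-in; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE workorder (wo INTEGER, shortpn INTEGER, longpn TEXT);
CREATE TABLE inventory (shortpn INTEGER, qty INTEGER, loc TEXT, lot INTEGER);
CREATE TABLE production (wo INTEGER, longpn TEXT, prod_count INTEGER);
INSERT INTO workorder VALUES (152, 1234, 'Part1');
INSERT INTO inventory VALUES
  (1234, 31, 'Loc1', 456), (1234, 0, 'Loc2', 456), (1234, 0, 'Loc4', 456),
  (1234, 19, 'Loc1', 789), (1234, 25, 'Loc4', 789);
INSERT INTO production VALUES
  (152, 'Part3', 6), (152, 'Part3', 8), (152, 'Part3', 9),
  (152, 'Part3', 4), (152, 'Part3', 6), (152, 'Part3', 7);
""")

# Joining both one-to-many tables at once yields 5 * 6 = 30 rows per work order.
naive = conn.execute("""
    SELECT w.wo, SUM(i.qty), SUM(p.prod_count)
    FROM workorder w
    JOIN inventory i ON i.shortpn = w.shortpn
    JOIN production p ON p.wo = w.wo
    GROUP BY w.wo
""").fetchone()

# Aggregating each child table before joining keeps one row per key.
fixed = conn.execute("""
    SELECT w.wo, COALESCE(i.qty, 0), COALESCE(p.prod_count, 0)
    FROM workorder w
    LEFT JOIN (SELECT shortpn, SUM(qty) AS qty
               FROM inventory GROUP BY shortpn) i ON i.shortpn = w.shortpn
    LEFT JOIN (SELECT wo, SUM(prod_count) AS prod_count
               FROM production GROUP BY wo) p ON p.wo = w.wo
""").fetchone()

print(naive)  # inflated sums
print(fixed)  # true totals
```

The naive query returns 450 and 200, the pre-aggregated one 75 and 40, matching the explanation.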