SQL JOIN + GROUP BY: select data from the row with MAX(Date)

I'm having trouble figuring out the solution to this SQL query.
Schema
Edit: Adding Item Table
Item Table
PK ItemID
lots of other columns
Linking Table
FK ItemID uniqueidentifier
FK TransactionID uniqueidentifier
Transaction Table
PK ID uniqueidentifier
EntryDateTime DateTime
(several other rows of int, varchar...)
Edit : I think I haven't made the relationships clear. Each ITEM (table not shown) can have multiple transactions. Multiple items can share the same transaction (hence the linking table).
Please see the bottom for my current query. I have left this struck through to show the progression of the question.
I want to do something like this query. The trick is I want the t.varchar and t.int columns to be whatever values are in the MAX(t.EntryDateTime) row. I don't even know if group by is the right way to do this query.
SELECT lt.ItemID, MAX(t.EntryDateTime), t.varchar, t.int
FROM LinkingTable lt
LEFT JOIN Transactions t ON lt.TransactionID = t.ID
GROUP BY lt.ItemID
The result of the query above is going to be joined into the following SQL query, so please try to give me the most performant solution. Assume Table1 will contain millions of records.
SELECT
(many columns)
FROM Table1
LEFT JOIN Table2 ON Table1.Table2ID = Table2.ID
LEFT JOIN Table3 ON ....
LEFT JOIN Table4 ON (Table2.ID = Table4.Table2ID and Table4.LocaleID = 127 and Table4.Type = 0)
LEFT JOIN **the query above** AS vTable1 ON vTable1.ItemID = Table1.ID
WHERE Table1.CheckID IN (SELECT ID FROM Checks WHERE ....)
Edit: This is the query I have that is working, but I'm not sure it's the most efficient. LinkingTable has ~200k records and it's taking 6 seconds to run.
SELECT DISTINCT lt.ItemID, t.EntryDateTime, t.varchar, t.int
FROM LinkingTable lt
LEFT JOIN Transactions t ON t.id = (SELECT Top 1 t2.id FROM LinkingTable lt2
LEFT JOIN Transactions t2 on lt2.TransactionID = t2.ID
where lt2.ItemID = lt.ItemID ORDER BY t2.PrintTime DESC)

Try this:
SELECT i.*, outerT.EntryDateTime, outerT.varchar, outerT.int
FROM Item i
LEFT JOIN
(SELECT ItemId AS outerItemId, EntryDateTime, varchar, int
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY lt.ItemId ORDER BY t.EntryDateTime DESC) AS RowNumber, lt.ItemId, t.EntryDateTime, t.varchar, t.int
FROM Transactions t INNER JOIN LinkingTable lt ON lt.TransactionId = t.ID) innerT
WHERE RowNumber = 1) outerT ON outerT.outerItemId = i.ItemID
I hope this solves your problem.
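To make the ROW_NUMBER() idea concrete, here is a small sketch that runs the same pattern against an in-memory SQLite database from Python. It is only an illustration under assumptions: the integer keys, the Note and Qty columns (standing in for the varchar and int columns) and the sample rows are all made up, and window functions need SQLite 3.25 or newer. The important detail is ORDER BY ... DESC, so that row number 1 is the latest transaction for each item.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of the schema from the question; Note and Qty
# stand in for the unnamed varchar and int columns.
conn.executescript("""
CREATE TABLE Item (ItemID INTEGER PRIMARY KEY);
CREATE TABLE Transactions (ID INTEGER PRIMARY KEY, EntryDateTime TEXT, Note TEXT, Qty INTEGER);
CREATE TABLE LinkingTable (ItemID INTEGER, TransactionID INTEGER);
INSERT INTO Item VALUES (1), (2), (3);
INSERT INTO Transactions VALUES
  (10, '2020-01-01 09:00:00', 'old', 1),
  (11, '2020-03-01 09:00:00', 'new', 2),
  (12, '2020-02-01 09:00:00', 'mid', 3);
-- Transaction 10 is shared by items 1 and 2; item 3 has no transactions.
INSERT INTO LinkingTable VALUES (1, 10), (1, 11), (2, 10), (2, 12);
""")

latest_rows = conn.execute("""
SELECT i.ItemID, r.EntryDateTime, r.Note, r.Qty
FROM Item i
LEFT JOIN (
    SELECT lt.ItemID, t.EntryDateTime, t.Note, t.Qty,
           ROW_NUMBER() OVER (PARTITION BY lt.ItemID
                              ORDER BY t.EntryDateTime DESC) AS rn
    FROM LinkingTable lt
    JOIN Transactions t ON t.ID = lt.TransactionID
) r ON r.ItemID = i.ItemID AND r.rn = 1
ORDER BY i.ItemID
""").fetchall()

for row in latest_rows:
    print(row)
```

Item 3 still appears, with NULLs, because of the LEFT JOIN from Item.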

Even with a million-plus records you will take some performance hit, but I would make sure there is an index on the transaction table on ( ItemID, Primary Key ). The reason for using the primary key rather than the date: if the key is auto-incremented and the row is date/time stamped at the time the transaction occurs, the two are in essence one and the same, and the last entry for an item will always have the latest date. That said, an indexed ID column should be faster than a date/time column. It also avoids having to look at both the most recent date and the transaction ID associated with that date. Here is how I would first attempt the query.
select
I.*,
T2.*
from
Item I
JOIN
( select T.ItemID, MAX( T.PrimaryKey ) as LastEntryPerItem
from Transactions T
group by T.ItemID ) MaxPerItem
ON I.ItemID = MaxPerItem.ItemID
JOIN Transactions T2
on MaxPerItem.LastEntryPerItem = T2.PrimaryKey
order by
whatever

select lt.ItemId, t.entrydatetime, t.varchar, t.int
from LinkingTable lt
inner join transactions t
on lt.transactionId = t.id
and t.entryDateTime = (select max(t2.EntryDateTime)
from LinkingTable lt2
inner join transactions t2 on lt2.transactionId = t2.id
where lt2.ItemId = lt.ItemId)
I had a similar question before ("SQL Join to get value belong with most recent date"). There is another solution there, by JNK, involving two joins, which may be faster; I've posted it below. You'll need to test to see which performs better.
select lt.ItemId, t.entrydatetime, t.varchar, t.int
from LinkingTable lt
inner join transactions t
on lt.ItemId= t.ItemId
Inner join (SELECT ItemId, MAX(entrydatetime) entrydatetime
FROM transactions t2
GROUP BY ItemId) SubQ
ON SubQ.ItemId= t.ItemId
AND SubQ.entrydatetime= t.entrydatetime
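In the schema from the question, ItemID lives only in the linking table, so the MAX-then-join-back approach has to go through LinkingTable on both sides. Here is a rough sketch of that variant, again on a throwaway SQLite database driven from Python. The integer keys, the Note and Qty columns and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of the question's schema.
conn.executescript("""
CREATE TABLE Transactions (ID INTEGER PRIMARY KEY, EntryDateTime TEXT, Note TEXT, Qty INTEGER);
CREATE TABLE LinkingTable (ItemID INTEGER, TransactionID INTEGER);
INSERT INTO Transactions VALUES
  (10, '2020-01-01 09:00:00', 'old', 1),
  (11, '2020-03-01 09:00:00', 'new', 2),
  (12, '2020-02-01 09:00:00', 'mid', 3);
INSERT INTO LinkingTable VALUES (1, 10), (1, 11), (2, 10), (2, 12);
""")

max_join_rows = conn.execute("""
SELECT lt.ItemID, t.EntryDateTime, t.Note, t.Qty
FROM LinkingTable lt
JOIN Transactions t ON t.ID = lt.TransactionID
JOIN (
    -- latest EntryDateTime per item, computed once
    SELECT lt2.ItemID, MAX(t2.EntryDateTime) AS MaxEntry
    FROM LinkingTable lt2
    JOIN Transactions t2 ON t2.ID = lt2.TransactionID
    GROUP BY lt2.ItemID
) m ON m.ItemID = lt.ItemID AND m.MaxEntry = t.EntryDateTime
ORDER BY lt.ItemID
""").fetchall()

for row in max_join_rows:
    print(row)
```

One thing to keep in mind with this form: if two transactions for the same item share the same maximum EntryDateTime, both rows come back, whereas the ROW_NUMBER() form picks exactly one.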

Why don't you create a view that has all your "many columns" and then run a query against that view?

Related

SQL Multiple INNER JOINS In One Select-Statement

I am using this code for an inventory management system in which I want to retrieve stock in hand from four tables. I have tried with two tables and got exactly the result I need. Please help me out.
Table Schema
Productmastertb
prod_id,
Product_name
salesdetailstb
sales_id,
Prod_id,
Prod_qty
estimatedetailstb
est_id,
Prod_id,
Prod_qty
Purchasedetailstb
est_id,
Prod_id,
Prod_qty
Query example (working):
SELECT
productmastertb.prod_id,
productmastertb.prod_name,
sum(estimatedetailstb.prod_qty) as Est_qty
FROM
productmastertb
INNER JOIN
estimatedetailstb ON productmastertb.prod_id = estimatedetailstb.prod_id
GROUP BY
productmastertb.prod_id, productmastertb.prod_name
Similarly, I have to retrieve the sums of salesdetailstb.prod_qty and Purchasedetailstb.prod_qty.
Thanks in advance
You want to summarize across different "dimensions", that is, tables. One good approach is to aggregate before doing the JOINs; another is to use correlated subqueries. Here is the latter approach:
SELECT pm.prod_id, pm.prod_name,
(SELECT SUM(ed.prod_qty)
FROM estimatedetailstb as ed
WHERE ed.prod_id = pm.prod_id
) as Est_qty,
(SELECT SUM(sd.prod_qty)
FROM salesdetailstb as sd
WHERE sd.prod_id = pm.prod_id
) as Sales_qty,
(SELECT SUM(pd.prod_qty)
FROM purchasedetailstb as pd
WHERE pd.prod_id = pm.prod_id
) as Purchase_qty
FROM productmastertb pm;
This will give you all products, even those missing from one or more of the other tables.
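The other approach mentioned, aggregating before the JOINs, might look roughly like the sketch below (Python with an in-memory SQLite database). The sample rows are invented, and only two of the detail tables are included to keep it short; the purchase table would be handled the same way. The first query shows why joining the detail tables directly and then summing goes wrong: each estimate row is repeated once per sales row and vice versa.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented sample data; table and column names follow the question.
conn.executescript("""
CREATE TABLE productmastertb (prod_id INTEGER PRIMARY KEY, prod_name TEXT);
CREATE TABLE estimatedetailstb (est_id INTEGER PRIMARY KEY, prod_id INTEGER, prod_qty INTEGER);
CREATE TABLE salesdetailstb (sales_id INTEGER PRIMARY KEY, prod_id INTEGER, prod_qty INTEGER);
INSERT INTO productmastertb VALUES (1, 'bolt'), (2, 'nut');
INSERT INTO estimatedetailstb VALUES (1, 1, 10), (2, 1, 5);
INSERT INTO salesdetailstb VALUES (1, 1, 3), (2, 1, 4), (3, 1, 1);
""")

# Joining both detail tables first multiplies the rows (2 x 3 for product 1),
# so both sums are inflated.
naive = conn.execute("""
SELECT pm.prod_id, SUM(ed.prod_qty), SUM(sd.prod_qty)
FROM productmastertb pm
LEFT JOIN estimatedetailstb ed ON ed.prod_id = pm.prod_id
LEFT JOIN salesdetailstb sd ON sd.prod_id = pm.prod_id
GROUP BY pm.prod_id
ORDER BY pm.prod_id
""").fetchall()

# Aggregating each detail table to one row per product before joining
# keeps the sums independent of each other.
pre_aggregated = conn.execute("""
SELECT pm.prod_id, pm.prod_name,
       COALESCE(e.qty, 0) AS Est_qty,
       COALESCE(s.qty, 0) AS Sales_qty
FROM productmastertb pm
LEFT JOIN (SELECT prod_id, SUM(prod_qty) AS qty
           FROM estimatedetailstb GROUP BY prod_id) e ON e.prod_id = pm.prod_id
LEFT JOIN (SELECT prod_id, SUM(prod_qty) AS qty
           FROM salesdetailstb GROUP BY prod_id) s ON s.prod_id = pm.prod_id
ORDER BY pm.prod_id
""").fetchall()

print(naive)           # inflated: [(1, 45, 16), (2, None, None)]
print(pre_aggregated)  # [(1, 'bolt', 15, 8), (2, 'nut', 0, 0)]
```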
You can add multiple joins.
SELECT t1.id, t4.name, count(t4.name)
FROM Table1 AS t1
INNER JOIN Table2 AS t2 -- the AS statement renames the table within
-- this query to t2. Columns from this table can be used
-- as t2.columnname. This needs to be done when you have
-- columns with the same name in different tables.
ON t1.id = t2.id
INNER JOIN Table3 as t3
ON t1.id = t3.id
INNER JOIN Table4 as t4
ON t3.name = t4.name
GROUP BY t1.id, t4.name

Count records only from left side of a LEFT JOIN

I'm building an Access query with a LEFT JOIN that, among other things, counts the number of unique sampleIDs present in the left table of the JOIN, and counts the aggregate number of specimens (bugs) present in the right table of the JOIN, both for a given group of samples (TripID). Here's the pertinent chunk of SQL code:
SELECT DISTINCT t1.TripID, COUNT(t1.SampleID) AS Samples, SUM(t2.C1 + t2.C2)
AS Bugs FROM tbl_Sample AS t1
LEFT JOIN tbl_Bugs AS t2 ON t1.SampleID = t2.SampleID
GROUP BY t1.TripID
The trouble I'm having is that COUNT(t1.SampleID) is not giving me my desired result. My desired result is the number of unique SampleIDs present in t1 for a given TripID (let's say 7). Instead, what I get seems to be the number of rows in t2 for which the SampleID is contained within the given TripID group (let's say 77). How can I change this SQL query to get the desired number (7, not 77)?
Just take the aggregate sum on t2 first, then join t1 to that result, like this:
SELECT t1.TripID, COUNT(t1.SampleID) AS Samples, SUM(t3.Bugs) as Bugs
FROM tbl_Sample AS t1
LEFT Join (
SELECT t2.SampleID, SUM(t2.C1 + t2.C2) as Bugs
FROM tbl_Bugs as t2
GROUP BY SampleID) AS t3 ON t1.SampleID = t3.SampleID
GROUP BY t1.TripID
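A quick way to see why this helps is to run both versions on a few made-up rows. The sketch below uses Python with an in-memory SQLite database rather than Access, and the data is invented: sample 1 has two bug rows, so in the original query it is counted twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented sample data; table and column names follow the question.
conn.executescript("""
CREATE TABLE tbl_Sample (SampleID INTEGER PRIMARY KEY, TripID INTEGER);
CREATE TABLE tbl_Bugs (SampleID INTEGER, C1 INTEGER, C2 INTEGER);
INSERT INTO tbl_Sample VALUES (1, 100), (2, 100), (3, 200);
INSERT INTO tbl_Bugs VALUES (1, 1, 2), (1, 3, 4), (2, 5, 0);
""")

# Original shape: COUNT sees one row per matching bug row, not per sample.
inflated = conn.execute("""
SELECT t1.TripID, COUNT(t1.SampleID) AS Samples, SUM(t2.C1 + t2.C2) AS Bugs
FROM tbl_Sample AS t1
LEFT JOIN tbl_Bugs AS t2 ON t1.SampleID = t2.SampleID
GROUP BY t1.TripID
ORDER BY t1.TripID
""").fetchall()

# Pre-aggregated shape: t3 has at most one row per sample, so the join
# cannot multiply the sample rows.
corrected = conn.execute("""
SELECT t1.TripID, COUNT(t1.SampleID) AS Samples, SUM(t3.Bugs) AS Bugs
FROM tbl_Sample AS t1
LEFT JOIN (
    SELECT t2.SampleID, SUM(t2.C1 + t2.C2) AS Bugs
    FROM tbl_Bugs AS t2
    GROUP BY t2.SampleID) AS t3 ON t1.SampleID = t3.SampleID
GROUP BY t1.TripID
ORDER BY t1.TripID
""").fetchall()

print(inflated)   # [(100, 3, 15), (200, 1, None)]
print(corrected)  # [(100, 2, 15), (200, 1, None)]
```

The bug totals are the same in both; only the sample count changes.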
This is a tricky query, because you have different hierarchies. Here is one method:
select s.tripid, count(*) as numsamples,
(select sum(b.c1 + b.c2)
from tbl_bugs b join
tbl_sample s2
on s2.sampleid = b.sampleid
where s2.tripid = s.tripid
) as numbugs
from tbl_sample s
group by s.tripid
You included a DISTINCT with a Group By. This is removing duplicates twice, which is unnecessarily complex. You can get rid of the DISTINCT.
I would have the count separate from what is going on in the group by.
SELECT dT.TripID
,(SELECT COUNT(S.SampleID)
FROM tbl_Sample AS S
WHERE S.TripID = dT.TripID
) AS [Samples]
,dT.Bugs
FROM (
SELECT t1.TripID
,SUM(t2.C1 + t2.C2) AS Bugs
FROM tbl_Sample AS t1
LEFT JOIN tbl_Bugs AS t2 ON t1.SampleID = t2.SampleID
GROUP BY t1.TripID
) AS dT

INNER JOIN with a large table

I am currently using Microsoft Access 2013 and am trying to join Table1 and Table2, but the problem is that Table2 is massive. Table1 is a list of part/vendor combinations with (part, vendor) as its PK. Table2 is a table I created with the top 2 most recent quotes for each part/vendor combination. All these quotes were pulled from a table with PK quote_id. I think my creation of Table2 might be the problem, because I cannot create Table2 with every part/vendor combination (I have to filter by vendor). This is the query I used for Table2.
SELECT a.part, a.vendor, a.quote_date
FROM quoteTable AS a
WHERE a.quote_date > DATEADD("yyyy", -3, DATE()) AND
a.quote_date IN
(SELECT TOP 2 quote_date
FROM quoteTable
WHERE quote_date > DATEADD("yyyy", -3, DATE()) AND
part=a.part AND vendor=a.vendor
ORDER BY quote_date DESC)
If anyone knows a better way to select the 2 most recent quotes from the table for each part/vendor combination, I would really appreciate it. As for the join, this works but takes too long.
SELECT *
FROM Table1 AS a INNER JOIN Table2 AS b ON a.id = b.id
I am wondering if there is a way I could use the id from Table1 to filter Table2. Something like this:
SELECT *
FROM Table1 AS a INNER JOIN
(SELECT * FROM Table2 WHERE id=a.id) AS b ON a.id = b.id
You definitely could use:
SELECT *
FROM Table1 AS a
INNER JOIN
(SELECT * FROM Table2 WHERE id=a.id) AS b
ON a.id = b.id
Or you could use :
With CTE
as
(
SELECT *
FROM Table2
WHERE id in (select id from table1)
)
SELECT *
FROM Table1 AS a
INNER JOIN
CTE
ON CTE.Id = a.Id
To address the performance issue, you could try creating an index on the id column of both tables, or limiting the columns you select in the result.
The real solution is that you need an index on TABLE2.id. I am assuming that it is not the primary key of that table, because it probably is the primary key of TABLE1, and why would you have two tables with the exact same, matching primary keys? It would not be a normalized layout. But then again, there are times when it makes sense to de-normalize.

Left Join with Distinct Clause

Below is my insert query.
INSERT INTO /*+ APPEND*/ TEMP_CUSTPARAM(CUSTNO, RATING)
SELECT DISTINCT Q.CUSTNO, NVL(((NVL(P.RATING,0) * '10.0')/100),0) AS RATING
FROM TB_ACCOUNTS Q LEFT JOIN TB_CUSTPARAM P
ON P.TEXT_PARAM IN (SELECT DISTINCT PRDCD FROM TB_ACCOUNTS)
AND P.TABLENAME='TB_ACCOUNTS' AND P.COLUMNNAME='PRDCD';
In the previous version of the query the join condition was P.TEXT_PARAM=Q.PRDCD, but the insert into TEMP_CUSTPARAM failed due to a violation of the unique constraint on CUSTNO.
The insert query is taking ages to complete. I would like to know how to use DISTINCT with a LEFT JOIN statement.
Thanks.
SELECT T1.Col1, T2.Col2 FROM Table1 T1
Left JOIN
(SELECT DISTINCT Id, Col2 FROM Table2
) T2 ON T2.Id = T1.Id
You are missing criteria to join TB_ACCOUNTS records with their related TB_ACCOUNTS/PRDCD TB_CUSTPARAM records and thus cross join them instead. I guess you want:
INSERT /*+ APPEND */ INTO TEMP_CUSTPARAM(CUSTNO, RATING)
SELECT DISTINCT
Q.CUSTNO,
NVL(P.RATING, 0) * 0.1 AS RATING
FROM TB_ACCOUNTS Q
LEFT JOIN TB_CUSTPARAM P ON P.TEXT_PARAM = Q.PRDCD
AND P.TABLENAME = 'TB_ACCOUNTS'
AND P.COLUMNNAME = 'PRDCD';
If the query is taking ages to complete, first check the execution plan. If you see a Cartesian join on two non-trivial tables, the query should probably be revisited.
Then ask yourself what you expect from the query.
Do you expect one record per CUSTNO? Or can a customer have more than one rating?
One rating per customer could make sense from a business point of view. To get a unique customer list with ratings:
1) first get a UNIQUE CUSTNO - note that this is generally not done with a DISTINCT clause; if there are several rows per customer, use a filter predicate, e.g. one that selects the most recent row.
2) then join to the rating table
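As a rough sketch of those two steps (Python with an in-memory SQLite database rather than Oracle): the OPENED column used to pick the most recent account row is purely hypothetical, since the question does not show which column orders a customer's accounts, and the sample rows are invented. COALESCE plays the role of NVL here. Because step 1 leaves exactly one row per CUSTNO, the insert cannot violate the unique constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- OPENED is a hypothetical column used only to decide which row is "most recent".
CREATE TABLE TB_ACCOUNTS (CUSTNO TEXT, PRDCD TEXT, OPENED TEXT);
CREATE TABLE TB_CUSTPARAM (TABLENAME TEXT, COLUMNNAME TEXT, TEXT_PARAM TEXT, RATING INTEGER);
CREATE TABLE TEMP_CUSTPARAM (CUSTNO TEXT PRIMARY KEY, RATING REAL);
INSERT INTO TB_ACCOUNTS VALUES
  ('C1', 'SAV',  '2019-01-01'),
  ('C1', 'LOAN', '2021-06-01'),
  ('C2', 'SAV',  '2020-01-01'),
  ('C3', 'XYZ',  '2020-05-05');
INSERT INTO TB_CUSTPARAM VALUES
  ('TB_ACCOUNTS', 'PRDCD', 'SAV',  50),
  ('TB_ACCOUNTS', 'PRDCD', 'LOAN', 80);
""")

conn.execute("""
INSERT INTO TEMP_CUSTPARAM (CUSTNO, RATING)
SELECT q.CUSTNO, COALESCE(p.RATING, 0) * 0.1
FROM (
    -- step 1: one row per customer, chosen by a filter on the row number
    SELECT CUSTNO, PRDCD,
           ROW_NUMBER() OVER (PARTITION BY CUSTNO ORDER BY OPENED DESC) AS rn
    FROM TB_ACCOUNTS
) q
-- step 2: join that single row to the rating table
LEFT JOIN TB_CUSTPARAM p
       ON p.TEXT_PARAM = q.PRDCD
      AND p.TABLENAME = 'TB_ACCOUNTS'
      AND p.COLUMNNAME = 'PRDCD'
WHERE q.rn = 1
""")

ratings = conn.execute(
    "SELECT CUSTNO, RATING FROM TEMP_CUSTPARAM ORDER BY CUSTNO"
).fetchall()
for row in ratings:
    print(row)
```

Customer C3 has a product with no rating row, so it still gets inserted, with a rating of 0.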

selecting records from main table and count of each row in another table

I have two tables in my database that are related by a foreign key.
I want to select all records from the main table and, for each one, the count of rows in the other table that have the same ID as the main-table row. I tried to write a SELECT query, but it does not work correctly.
This query returns all records from the main table plus the count of all records in the other table (not the count for each related row):
SELECT tblForumSubGroups_1.id, tblForumSubGroups_1.GroupID,
tblForumSubGroups_1.SubGroupTitle, tblForumSubGroups_1.SubGroupDesc,
(SELECT COUNT(dbo.tblForumPosts.id) AS Expr1
FROM dbo.tblForumSubGroups INNER JOIN dbo.tblForumPosts ON
dbo.tblForumSubGroups.id = dbo.tblForumPosts.SubGroupID) AS Expr1
FROM dbo.tblForumSubGroups AS tblForumSubGroups_1 INNER JOIN
dbo.tblForumPosts AS tblForumPosts_1 ON tblForumSubGroups_1.id
= tblForumPosts_1.SubGroupID
SELECT tblForumSubGroups_1.id, tblForumSubGroups_1.GroupID, tblForumSubGroups_1.SubGroupTitle, tblForumSubGroups_1.SubGroupDesc,
COUNT(tblForumPosts_1.id) AS Expr1
FROM dbo.tblForumSubGroups AS tblForumSubGroups_1
INNER JOIN dbo.tblForumPosts AS tblForumPosts_1 ON tblForumSubGroups_1.id = tblForumPosts_1.SubGroupID
GROUP BY tblForumSubGroups_1.id, tblForumSubGroups_1.GroupID, tblForumSubGroups_1.SubGroupTitle, tblForumSubGroups_1.SubGroupDesc
I would suggest cross apply as you can do a lot more things with it ...
SELECT t1.id,
t1.GroupID,
t1.SubGroupTitle,
t1.SubGroupDesc,
x.val
FROM dbo.tblForumSubGroups AS t1
cross apply (SELECT COUNT(*)
FROM dbo.tblForumPosts as t2
WHERE t1.id = t2.SubGroupID) x(val)
Do not mix sub-query and join logic. Use only one of them. I prefer sub-select.
SELECT tblForumSubGroups_1.id,
tblForumSubGroups_1.GroupID,
tblForumSubGroups_1.SubGroupTitle,
tblForumSubGroups_1.SubGroupDesc,
(SELECT COUNT(*)
FROM dbo.tblForumPosts
WHERE tblForumSubGroups_1.id = dbo.tblForumPosts.SubGroupID) AS Expr1
FROM dbo.tblForumSubGroups AS tblForumSubGroups_1
Just to supply another answer, though I believe the cross apply is likely the best option:
SELECT
A.id, A.GroupID, A.SubGroupTitle, A.SubGroupDesc,
B.IDCount AS Expr1
FROM dbo.tblForumSubGroups A
INNER JOIN (
Select SubGroupID, Count(ID) as IDCount
from dbo.tblForumPosts
Group By SubGroupID
) B On A.ID = B.SubGroupID
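Note that with an INNER JOIN to the grouped subquery, subgroups that have no posts at all drop out of the result. If those should appear with a count of 0, a LEFT JOIN plus COALESCE is one way to do it. Below is a small sketch in Python with an in-memory SQLite database; the sample rows are invented and the dbo schema prefix is omitted because SQLite has no such schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented sample data; table and column names follow the question.
conn.executescript("""
CREATE TABLE tblForumSubGroups (id INTEGER PRIMARY KEY, GroupID INTEGER,
                                SubGroupTitle TEXT, SubGroupDesc TEXT);
CREATE TABLE tblForumPosts (id INTEGER PRIMARY KEY, SubGroupID INTEGER);
INSERT INTO tblForumSubGroups VALUES
  (1, 1, 'General', 'anything'),
  (2, 1, 'Help',    'questions'),
  (3, 2, 'Empty',   'no posts yet');
INSERT INTO tblForumPosts VALUES (1, 1), (2, 1), (3, 2);
""")

post_counts = conn.execute("""
SELECT A.id, A.SubGroupTitle, COALESCE(B.IDCount, 0) AS PostCount
FROM tblForumSubGroups A
LEFT JOIN (
    SELECT SubGroupID, COUNT(id) AS IDCount
    FROM tblForumPosts
    GROUP BY SubGroupID
) B ON A.id = B.SubGroupID
ORDER BY A.id
""").fetchall()

print(post_counts)  # [(1, 'General', 2), (2, 'Help', 1), (3, 'Empty', 0)]
```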