Ranking before grouping problem in SQL Server 2005 - sql

Hi,
This should be easy but I don't understand enough about how grouping works.
Basically I have 2 tables "Categories" and "Items"
Categories
ID
CategoryName
Items
ID
CategoryID
ItemName
Photo
Score
All I want to do is get 1 row for each category which contains the Category ID, the Category Name and the photo that belongs to the highest scoring item.
So I have tried joining the categories to the items and grouping by the CategoryID. Trouble is that I want to order the items so that the highest scoring items are at the top before it does the groupings to make sure that the photo is from the current highest scoring item in that category. If I select MAX(I.score) I can get the highest score but I'm not sure how to get accompanying photo as MAX(photo) will obviously give me the photo with the highest file name alphabetically.
I hope I've explained that well.

You could try something like (Full example)
DECLARE @Categories TABLE(
ID INT,
CategoryName VARCHAR(50)
)
DECLARE @Items TABLE(
ID INT,
CategoryID INT,
ItemName VARCHAR(50),
Photo VARCHAR(50),
Score FLOAT
)
INSERT INTO @Categories (ID,CategoryName) SELECT 1, 'Cat1'
INSERT INTO @Categories (ID,CategoryName) SELECT 2, 'Cat2'
INSERT INTO @Items (ID,CategoryID,ItemName,Photo,Score) SELECT 1, 1, 'Item1', 'PItem1', 1
INSERT INTO @Items (ID,CategoryID,ItemName,Photo,Score) SELECT 2, 1, 'Item2', 'PItem2', 2
INSERT INTO @Items (ID,CategoryID,ItemName,Photo,Score) SELECT 3, 1, 'Item3', 'PItem3', 3
INSERT INTO @Items (ID,CategoryID,ItemName,Photo,Score) SELECT 4, 2, 'Item4', 'PItem4', 5
INSERT INTO @Items (ID,CategoryID,ItemName,Photo,Score) SELECT 5, 2, 'Item5', 'PItem5', 2
SELECT *
FROM (
SELECT c.ID,
c.CategoryName,
i.Photo,
i.Score,
ROW_NUMBER() OVER(PARTITION BY i.CategoryID ORDER BY i.Score DESC) RowID
FROM @Categories c INNER JOIN
@Items i ON c.ID = i.CategoryID
) CatItems
WHERE RowID = 1
Using ROW_NUMBER you can select the items you require.

You need to aggregate first and join back like this.
(If you change grouping, you need to change JOIN)
SELECT
...
FROM
(
select
max(Score) AS MaxScore,
CategoryID
FROM
Items
GROUP BY
CategoryID
) M
JOIN
Items I ON M.CategoryID = I.CategoryID AND M.MaxScore = I.Score
JOIN
Categories C ON I.CategoryID = C.ID

This is a pretty common problem, and one that SQL Server doesn't solve particularly well. Something like this should do the trick, though:
select
c.ID,
c.CategoryName,
item.*
from Categories c
join (
select
ID,
CategoryID,
ItemName,
Photo,
Score,
row_number() over (order by CategoryID, Score desc) -
rank() over (order by CategoryID) as rownum
from Items) item on item.CategoryID = c.ID and item.rownum = 0
While there is no explicit group by clause, this (for practical purposes) groups the Categories records and gives you a joined statement that allows you to view any property of the highest scoring item.

You can use row numbers to rank items per category:
select *
from (
select
row_number() over (partition by c.id order by i.score desc) rn
, *
from Categories c
join Items i on c.ID = i.CategoryID
) sub
where rn = 1
In SQL Server (2005 included), you can't reference a window function such as row_number() directly in a WHERE clause, so the ranking is wrapped in a subquery.

Exactly as you worded it:
"the Category ID, the Category Name and the photo that belongs to the highest scoring item." (Now, here I surmise you really meant "...highest scoring item in that category", no?)
Select CategoryID, c.CategoryName, Photo
From items i Join Categories c
On c.ID = i.CategoryId
Where Score = (Select Max(Score) From Items
Where CategoryID = i.CategoryId)
If you really meant the highest scoring item on the whole items table, then just omit the predicate in the subquery
Select CategoryID, c.Categoryname, Photo
From items i Join Categories c
On c.ID = i.CategoryId
Where Score = (Select Max(Score) From Items)
Both of these queries will return multiple rows per group if more than one item in that group ties for the highest score.
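To make the tie behaviour concrete, here is a minimal sketch that runs both styles of query against made-up data. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made only to keep the example self-contained; window functions need SQLite 3.25 or later). The sample rows and the choice of ID as a tie-breaker are illustrative and not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Categories (ID INTEGER PRIMARY KEY, CategoryName TEXT);
    CREATE TABLE Items (ID INTEGER PRIMARY KEY, CategoryID INTEGER,
                        ItemName TEXT, Photo TEXT, Score REAL);
    INSERT INTO Categories VALUES (1, 'Cat1'), (2, 'Cat2');
    -- Hypothetical data: items 2 and 3 tie for the top score in category 1.
    INSERT INTO Items VALUES
        (1, 1, 'Item1', 'PItem1', 1),
        (2, 1, 'Item2', 'PItem2', 3),
        (3, 1, 'Item3', 'PItem3', 3),
        (4, 2, 'Item4', 'PItem4', 5);
""")

# Correlated MAX: every item that reaches the top score is returned.
max_rows = conn.execute("""
    SELECT i.CategoryID, c.CategoryName, i.Photo
    FROM Items i JOIN Categories c ON c.ID = i.CategoryID
    WHERE i.Score = (SELECT MAX(m.Score) FROM Items m
                     WHERE m.CategoryID = i.CategoryID)
    ORDER BY i.CategoryID, i.ID
""").fetchall()

# ROW_NUMBER with an explicit tie-breaker: exactly one row per category.
top_rows = conn.execute("""
    SELECT CategoryID, CategoryName, Photo
    FROM (
        SELECT i.CategoryID, c.CategoryName, i.Photo,
               ROW_NUMBER() OVER (PARTITION BY i.CategoryID
                                  ORDER BY i.Score DESC, i.ID) AS rn
        FROM Items i JOIN Categories c ON c.ID = i.CategoryID
    ) ranked
    WHERE rn = 1
    ORDER BY CategoryID
""").fetchall()

print(max_rows)  # category 1 appears twice because of the tie
print(top_rows)  # one row per category
```

If ties should all be shown, the MAX form is the simpler one; if exactly one photo per category is required, a deterministic tie-breaker in the ORDER BY of ROW_NUMBER is one way to get it.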

Related

SQL Server: Query for products with matching tags

I have been pondering over this for the past few hours but I cannot find a solution.
I have products in one table, tags in another table and a product/tag link table.
Now I want to retrieve all products which have the same tags as a certain product.
Here are the tables (simplified):
PRODUCT:
id varchar(36) (primary key)
Name varchar(50)
TAG:
id varchar(36) (primary key)
Name varchar(50)
PRODUCTTAG:
id varchar(36) (primary key)
ProductID varchar(36)
TagID varchar(36)
I find quite a few answers here on Stackoverflow talking about returning full and partial matches. However I am looking for a query which only gives full matches.
Example:
Product A has tags 1, 2, 3
Product B has tags 1, 2
Product C has tags 1, 2, 3
Product D has tags 1, 2, 3, 4
If I query for product A, only product C should be found - as it is the only one having exactly the same tags.
Is this even possible?
Yes, yes, try this way:
with aa as (
select count(*) count
from [PRODUCTTAG]
where ProductID = '19A947C0-6A0F-4A6F-9675-48FBE30A877D'
), bb as
(
select ProductID, count(*) count
from [PRODUCTTAG]
group by ProductID
)
select distinct b.ProductID
from [dbo].[PRODUCTTAG] a join
[dbo].[PRODUCTTAG] b on a.TagID = b.TagID cross join
aa join
bb on aa.count = bb.count and b.ProductID = bb.ProductID
where a.ProductID = '19A947C0-6A0F-4A6F-9675-48FBE30A877D'
declare @PRODUCTTAG table(id int identity(1,1),ProductID int,TagID int)
insert into @PRODUCTTAG VALUES
(1,1),(1,2),(1,3)
,(2,1),(2,2)
,(3,1),(3,2),(3,3)
,(4,1),(4,2),(4,3),(4,4)
;With CTE as
(
select ProductID,count(*)smallCount
FROM @PRODUCTTAG
group by ProductID
)
,CTE1 as
(
select smallCount, count(smallCount)BigCount
from cte
group by smallCount
)
,CTE2 as
(
select * from cTE c
where exists(
select smallCount from cte1 c1
where BigCount>1 and c1.smallCount=c.smallCount
)
)
select * from cte2
--depending upon the output expected join this with @PRODUCTTAG,@Product,@Tag
--like this
--select * from @PRODUCTTAG PT
--where exists(
--select * from cte2 c2 where pt.productid=c2.productid
--)
Or tell us what the final output should look like.
This is a case where I find it simpler to combine all the tags into a single string and compare the strings. But, that is painful in SQL Server until 2016.
So, there is a set based solution:
with pt as (
select pt.*, count(*) over (partition by productid) as cnt
from producttag pt
)
select pt.productid
from pt join
pt pt2
on pt.cnt = pt2.cnt and
pt.productid <> pt2.productid and
pt.tagid = pt2.tagid
where pt2.productid = @x
group by pt.productid, pt.cnt
having count(*) = pt.cnt;
This matches every product to your given product based on the tags. The having clause then ensures that the number of matching tags is the same for the two products. Because the join only considers matching tags, all the tags are the same.
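As a quick check of that reasoning, the sketch below runs the same shape of query against the A/B/C/D example from the question. It uses Python's sqlite3 module in place of SQL Server (an assumption for the sake of a runnable example; SQLite 3.25 or later is needed for `COUNT(*) OVER`), a simplified two-column `ProductTag` table, and a hypothetical helper named `exact_matches`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ProductTag (ProductID TEXT, TagID INTEGER);
    INSERT INTO ProductTag VALUES
        ('A', 1), ('A', 2), ('A', 3),
        ('B', 1), ('B', 2),
        ('C', 1), ('C', 2), ('C', 3),
        ('D', 1), ('D', 2), ('D', 3), ('D', 4);
""")

QUERY = """
    WITH pt AS (
        SELECT ProductID, TagID,
               COUNT(*) OVER (PARTITION BY ProductID) AS cnt
        FROM ProductTag
    )
    SELECT pt.ProductID
    FROM pt
    JOIN pt AS pt2
      ON pt.cnt = pt2.cnt                 -- same number of tags
     AND pt.ProductID <> pt2.ProductID    -- skip the product itself
     AND pt.TagID = pt2.TagID             -- only matching tags survive
    WHERE pt2.ProductID = ?
    GROUP BY pt.ProductID, pt.cnt
    HAVING COUNT(*) = pt.cnt              -- every tag matched
    ORDER BY pt.ProductID
"""

def exact_matches(product_id):
    """Return products whose tag set equals that of product_id."""
    return [row[0] for row in conn.execute(QUERY, (product_id,))]

print(exact_matches("A"))
print(exact_matches("B"))
```

With this data only C shares A's exact tag set, and B has no exact match at all, since D is excluded by its tag count and A and C have one tag too many.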

SELECT TOP inside INNER JOIN

I created this simple database in SQL Server:
create database product_test
go
use product_test
go
create table product
(
id int identity primary key,
label varchar(255),
description text,
price money,
);
create table picture
(
id int identity primary key,
p_path text,
product int foreign key references product(id)
);
insert into product
values ('flip phone 100', 'back 2 the future stuff.', 950),
('flip phone 200', 's;g material', 1400)
insert into picture
values ('1.jpg', 1), ('2.jpg', 1), ('3.jpg', 2)
What I want is to select all products and only one picture for each product. Any help is greatly appreciated.
I'm a fan of outer apply for this purpose:
select p.*, pi.id, pi.p_path
from product p outer apply
(select top 1 pi.*
from picture pi
where pi.product = p.id
) pi;
You can include an order by to get one particular picture (say, the one with the lowest or highest id). Or, order by newid() to get a random one.
Have you tried using a correlated sub-query?
SELECT *, (SELECT TOP 1 p_path FROM picture WHERE product = p.id ORDER BY id) AS p_path
FROM product p
Hope this helps,
SELECT
*,
(
SELECT TOP 1 p2.p_path
FROM dbo.picture p2
WHERE p.id = p2.product
) AS picture
FROM dbo.product p
Or with join:
SELECT
*
FROM dbo.product p
INNER JOIN
(
SELECT p2.product, MIN(p2.p_path) AS p_path
FROM dbo.picture p2
GROUP BY p2.product
) AS pt
ON p.id = pt.product
But for the join version you need to change p_path to a varchar type, because MIN cannot be applied to a text column.
I would use a windowing function like this:
SELECT *
FROM product
JOIN (
SELECT id, product, p_path,
row_number() OVER (PARTITION BY product ORDER BY id ASC) as RN
FROM picture
) pic ON product.id = pic.product AND pic.RN = 1
As you can see here I am selecting the picture with the lowest id (ORDER BY id ASC) -- you can change this order by to your requirements.
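A small sketch of how the ORDER BY inside the window decides which picture is kept. It uses Python's sqlite3 module as a stand-in for SQL Server (an assumption; SQLite 3.25 or later is required for ROW_NUMBER), a trimmed-down version of the question's tables, and a hypothetical helper `one_picture_per_product`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE picture (id INTEGER PRIMARY KEY, p_path TEXT,
                          product INTEGER REFERENCES product(id));
    INSERT INTO product VALUES (1, 'flip phone 100'), (2, 'flip phone 200');
    INSERT INTO picture VALUES (1, '1.jpg', 1), (2, '2.jpg', 1), (3, '3.jpg', 2);
""")

def one_picture_per_product(direction):
    """Return one picture per product, ordered by picture id ASC or DESC."""
    # The direction is spliced into the SQL text, so only allow known values.
    if direction not in ("ASC", "DESC"):
        raise ValueError("direction must be 'ASC' or 'DESC'")
    sql = f"""
        SELECT p.id, p.label, pic.p_path
        FROM product p
        JOIN (
            SELECT product, p_path,
                   ROW_NUMBER() OVER (PARTITION BY product
                                      ORDER BY id {direction}) AS rn
            FROM picture
        ) pic ON pic.product = p.id AND pic.rn = 1
        ORDER BY p.id
    """
    return conn.execute(sql).fetchall()

lowest = one_picture_per_product("ASC")    # picture with the lowest id
highest = one_picture_per_product("DESC")  # picture with the highest id
print(lowest)
print(highest)
```

Product 1 has two pictures, so it is the only row that changes between the two orderings.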
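Just group by and take the min or max. Use a left join in case a product has no picture. The text columns have to be cast to varchar(max) before they can be grouped or aggregated.
select pr.ID, pr.label, cast(pr.description as varchar(max)) as description, pr.price
, min(cast(pic.p_path as varchar(max))) as p_path
from product pr
left join picture pic
on pic.product = pr.ID
group by pr.ID, pr.label, cast(pr.description as varchar(max)), pr.price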

Querying parent category

I have following query now:
SELECT DISTINCT c.Name as CategoryName, SUM(Amount) OVER(partition by c.Name order by c.Name) as Total
FROM Statements s
LEFT JOIN Categories c on c.Id = s.CategoryId
WHERE Date >= '2016-01-01'
ORDER BY c.Name
It works so far and gets every category with its total amount but what I need is the following.
The categories can have child categories. I want to get the sum over every root category and all its child categories for every row.
That means: category A has children B and C. I want to get the sum over all entries in Statements which have category A, B or C.
The category table consists of the following columns: Id, Name, ParentId.
The statements table consists of the following columns: Id, Date, Amount, CategoryId
Sample data of table category:
Id, Name, ParentId
1, A, null,
2, B, 1
3, C, 2
4, D, null
Sample data of table Statements
Id, Date, Amount, CategoryId
1, 2016-01-01, 100, 1
2, 2016-01-01, 200, 2
3, 2016-01-01, 800, 4
4, 2016-01-01, 300, 3
The output of the query should be as follows:
CategoryName, Total
A, 600
D, 800
Ideally I could pass the parentCategoryId into the where clause which is in my sample null but could also be 2 for instance.
Any hints appreciated :)
A common way is to use a recursive CTE to build the hierarchy..
;
WITH cte
AS (
SELECT [Id],
[Name],
[ParentId],
[Name] AS [Root]
FROM Categories
WHERE ParentId IS NULL
UNION ALL
SELECT c.[Id],
c.[Name],
c.[ParentId],
[Root]
FROM Categories c
JOIN cte ON c.ParentID = cte.Id
)
SELECT cte.[Root] AS [CategoryName],
SUM(Amount) AS [Total]
FROM cte
JOIN Statements s ON s.CategoryId = cte.Id
GROUP BY cte.[Root]
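The recursive roll-up can be checked against the question's sample data with a short sketch. It uses Python's sqlite3 module instead of SQL Server (an assumption made to keep it runnable anywhere), so two details differ: SQLite spells the CTE `WITH RECURSIVE`, whereas SQL Server uses plain `WITH`, and the parent filter relies on SQLite's null-safe `IS` operator so that one parameter can be either NULL or an id. The helper name `rollup` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Categories (Id INTEGER PRIMARY KEY, Name TEXT, ParentId INTEGER);
    CREATE TABLE Statements (Id INTEGER PRIMARY KEY, Date TEXT,
                             Amount INTEGER, CategoryId INTEGER);
    INSERT INTO Categories VALUES
        (1, 'A', NULL), (2, 'B', 1), (3, 'C', 2), (4, 'D', NULL);
    INSERT INTO Statements VALUES
        (1, '2016-01-01', 100, 1), (2, '2016-01-01', 200, 2),
        (3, '2016-01-01', 800, 4), (4, '2016-01-01', 300, 3);
""")

def rollup(parent_id):
    """Total Amount per category whose parent is parent_id, descendants included."""
    sql = """
        WITH RECURSIVE tree (Id, Root) AS (
            -- Anchor: the categories directly under the requested parent.
            SELECT Id, Name FROM Categories WHERE ParentId IS ?
            UNION ALL
            -- Recursive step: children inherit the Root name of their parent.
            SELECT c.Id, tree.Root
            FROM Categories c JOIN tree ON c.ParentId = tree.Id
        )
        SELECT tree.Root, SUM(s.Amount)
        FROM tree JOIN Statements s ON s.CategoryId = tree.Id
        WHERE s.Date >= '2016-01-01'
        GROUP BY tree.Root
        ORDER BY tree.Root
    """
    return conn.execute(sql, (parent_id,)).fetchall()

print(rollup(None))  # root categories, as in the expected output
print(rollup(2))     # categories directly under category 2
```

Passing `None` reproduces the expected A = 600 and D = 800; passing 2 treats C as the top of its own subtree.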
This is not an answer to the question (which as I write this lacks sufficient information about how the parent/child relationship is described in the data). But, the query is just not the right query for the OP's stated purpose. The right query is a simple aggregation:
SELECT c.Name as CategoryName, SUM(Amount) as Total
FROM Statements s LEFT JOIN
Categories c
on c.Id = s.CategoryId
WHERE Date >= '2016-01-01'
GROUP BY c.Name
ORDER BY c.Name;
I would be quite surprised if the query in the question actually produced useful results, given the duplicated rows for each category.

How to get all child of a given id in SQL Server query

I have two tables in SQL Server database:
category(
itemid,
parentid
)
ArticleAssignedCategories(
categid,
artid
)
categid is a foreign key of itemid
I want to get the count of artids for a given itemid and all of its children (children being the categories whose parentid is the given itemid).
For example, if the given itemid is 1 and the category table contains (3,1), (4,1), (5,3),
then 3, 4 and 5 are all children of 1.
Can anyone help me to write a good query?
Recursive queries can be done using CTE
with CTE(itemid, parentid)
as (
-- start with some category
select itemid, parentid
from category where itemid = <some_itemid>
union all
-- recursively add children
select c.itemid, c.parentid
from category c
join CTE on c.parentid = CTE.itemid
)
select count(*)
from ArticleAssignedCategories a
join CTE on CTE.itemid = a.categid
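A runnable sketch of the same idea, using the (3,1), (4,1), (5,3) rows from the question. It relies on Python's sqlite3 module rather than SQL Server (an assumption; SQLite needs the `RECURSIVE` keyword that SQL Server omits), and the article ids, the extra category 6 and the helper name `count_articles` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (itemid INTEGER PRIMARY KEY, parentid INTEGER);
    CREATE TABLE ArticleAssignedCategories (categid INTEGER, artid INTEGER);
    INSERT INTO category VALUES (1, NULL), (3, 1), (4, 1), (5, 3), (6, NULL);
    -- Hypothetical articles: one per category.
    INSERT INTO ArticleAssignedCategories VALUES
        (1, 100), (3, 101), (4, 102), (5, 103), (6, 104);
""")

def count_articles(itemid):
    """Count articles assigned to itemid or to any of its descendants."""
    sql = """
        WITH RECURSIVE subtree (itemid) AS (
            -- Start with the requested category itself.
            SELECT itemid FROM category WHERE itemid = ?
            UNION ALL
            -- Recursively add its children.
            SELECT c.itemid
            FROM category c JOIN subtree ON c.parentid = subtree.itemid
        )
        SELECT COUNT(*)
        FROM ArticleAssignedCategories a
        JOIN subtree ON subtree.itemid = a.categid
    """
    return conn.execute(sql, (itemid,)).fetchone()[0]

print(count_articles(1))  # categories 1, 3, 4 and 5
print(count_articles(3))  # categories 3 and 5
```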
Here is the query. I hope this may help you
select b.artid,count(b.artid) from category a
inner join ArticleAssignedCategories b on a.itemid = b.categid
group by b.artid

Help with generating a report from data in a parent-children model

I need help with a problem regarding data saved in a parent-children model table and a report I need to build upon it. I've already tried searching for topics about parent-children issues, but I couldn't find anything useful in my scenario.
What I have
A Microsoft SQL Server 2000 database server.
A categories table, which has four columns: category_id, category_name, father_id and visible; the categories have x root categories (where x is variable), and could be y level deep (where y is variable), if a category is a root level one it has father_id null otherwise it's filled with the id of the father category.
A sales table, which has z columns, one of which is category_id, a foreign key to categories.category_id; a sale must always have a category, and it could be linked anywhere in the aforementioned y level.
What I need
I've been asked for a report displaying only the root (first level) categories and the quantity of sales belonging to each of them or to their children, no matter how deep. For example, if one of the root categories is food, which has a child category named fruit, which in turn has a child category named apple, I need to count every item belonging to food, fruit or apple.
Couldn't you use the nested set data model?
I know of the nested set model, but I already have the table this way, and migrating it to the nested set model would be a pain (let alone I didn't even fully grasp how nested set works), not counting the changes needed in the application using the database. (If someone thinks this is still the least pain way, please explain why and how the current data could be migrated.)
Couldn't you use CTE (Common Table Expressions)?
No, it's Microsoft SQL Server 2000, and Common Table Expressions were introduced in SQL Server 2005.
Thanks in advance, Andrea.
SQL 2000 Based solution
DECLARE @Stack TABLE (
StackID INTEGER IDENTITY
, Category VARCHAR(20)
, RootID INTEGER
, ChildID INTEGER
, Visited BIT)
INSERT INTO @Stack
SELECT [Category] = c.category_name
, [RootID] = c.category_id
, [ChildID] = c.category_id
, 0
FROM Categories c
WHILE EXISTS (SELECT * FROM @Stack WHERE Visited = 0)
BEGIN
DECLARE @StackID INTEGER
SELECT @StackID = MAX(StackID) FROM @Stack
INSERT INTO @Stack
SELECT st.Category
, st.RootID
, c.category_id
, 0
FROM @Stack st
INNER JOIN Categories c ON c.father_id = st.ChildID
WHERE Visited = 0
UPDATE @Stack
SET Visited = 1
WHERE StackID <= @StackID
END
SELECT st.RootID
, st.Category
, COUNT(s.sales_id)
FROM @Stack st
INNER JOIN Sales s ON s.category_id = st.ChildID
GROUP BY st.RootID, st.Category
ORDER BY st.RootID
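For readers who want to try the loop-based idea without a SQL Server 2000 instance, here is a rough sketch of a level-by-level variant driven from Python's sqlite3 module. It is not the script above: it seeds only the root categories (as the question asks), tracks depth instead of a Visited flag, and uses a helper table named `closure`. The table contents and all names other than the question's column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (category_id INTEGER PRIMARY KEY,
                             category_name TEXT, father_id INTEGER);
    CREATE TABLE sales (sales_id INTEGER PRIMARY KEY, category_id INTEGER);
    CREATE TABLE closure (root_id INTEGER, child_id INTEGER, depth INTEGER);
    -- Hypothetical data: food > fruit > apple, plus a second root.
    INSERT INTO categories VALUES
        (1, 'food', NULL), (2, 'fruit', 1), (3, 'apple', 2), (4, 'tools', NULL);
    INSERT INTO sales VALUES (1, 1), (2, 2), (3, 3), (4, 3), (5, 4);
""")

# Depth 0: every root category counts as its own descendant.
conn.execute("""
    INSERT INTO closure
    SELECT category_id, category_id, 0 FROM categories WHERE father_id IS NULL
""")

depth = 0
while True:
    # Add the children of everything found at the current depth.
    conn.execute("""
        INSERT INTO closure
        SELECT cl.root_id, c.category_id, ?
        FROM closure cl JOIN categories c ON c.father_id = cl.child_id
        WHERE cl.depth = ?
    """, (depth + 1, depth))
    added = conn.execute(
        "SELECT COUNT(*) FROM closure WHERE depth = ?", (depth + 1,)
    ).fetchone()[0]
    if added == 0:
        break  # no deeper level exists
    depth += 1

# Count sales per root across all of its descendants.
report = conn.execute("""
    SELECT r.category_name, COUNT(s.sales_id)
    FROM closure cl
    JOIN categories r ON r.category_id = cl.root_id
    LEFT JOIN sales s ON s.category_id = cl.child_id
    GROUP BY r.category_id, r.category_name
    ORDER BY r.category_id
""").fetchall()
print(report)
```

The LEFT JOIN keeps root categories with no sales in the report with a count of zero, which an INNER JOIN would drop.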
SQL 2005 Based solution
A CTE should get you what you want
Select each category from Categories to be the root item
recursively add each child of every root item
INNER JOIN the results with your sales table. As every root is in the result of the CTE, a simple GROUP BY is sufficient to get a count for each item.
SQL Statement
;WITH QtyCTE AS (
SELECT [Category] = c.category_name
, [RootID] = c.category_id
, [ChildID] = c.category_id
FROM Categories c
UNION ALL
SELECT cte.Category
, cte.RootID
, c.category_id
FROM QtyCTE cte
INNER JOIN Categories c ON c.father_id = cte.ChildID
)
SELECT cte.RootID
, cte.Category
, COUNT(s.sales_id)
FROM QtyCTE cte
INNER JOIN Sales s ON s.category_id = cte.ChildID
GROUP BY cte.RootID, cte.Category
ORDER BY cte.RootID
Something like this?
CREATE TABLE #SingleLevelCategoryCounts
(
category_id INT,
[count] INT,
root_id INT
)
CREATE TABLE #ProcessedCategories
(
category_id INT,
root_id INT
)
CREATE TABLE #TopLevelCategoryCounts
(
category_id INT,
[count] INT
)
-- Level 0: the root categories. The LEFT JOIN keeps roots without direct sales.
INSERT INTO #SingleLevelCategoryCounts
SELECT
c.category_id, COUNT(s.category_id), c.category_id
FROM
Categories c
LEFT JOIN Sales s ON c.category_id = s.category_id
WHERE
c.father_id IS NULL
GROUP BY
c.category_id
WHILE EXISTS (SELECT * FROM #SingleLevelCategoryCounts)
BEGIN
IF NOT EXISTS(SELECT * FROM #TopLevelCategoryCounts)
BEGIN
INSERT INTO #TopLevelCategoryCounts
SELECT
root_id, [count]
FROM
#SingleLevelCategoryCounts
END
ELSE
BEGIN
-- Add this level's counts, summed per root, to the running totals.
UPDATE t
SET
t.[count] = t.[count] + lvl.level_count
FROM
#TopLevelCategoryCounts t
INNER JOIN
(
SELECT root_id, SUM([count]) AS level_count
FROM #SingleLevelCategoryCounts
GROUP BY root_id
) lvl ON t.category_id = lvl.root_id
END
INSERT INTO #ProcessedCategories
SELECT category_id, root_id FROM #SingleLevelCategoryCounts
DELETE #SingleLevelCategoryCounts
-- Next level: children of the categories processed so far.
INSERT INTO #SingleLevelCategoryCounts
SELECT
c.category_id, COUNT(s.category_id), pc.root_id
FROM
Categories c
INNER JOIN #ProcessedCategories pc ON c.father_id = pc.category_id
LEFT JOIN Sales s ON c.category_id = s.category_id
WHERE
c.category_id NOT IN
(
SELECT category_id FROM #ProcessedCategories
)
GROUP BY
c.category_id, pc.root_id
END
SELECT category_id, [count] FROM #TopLevelCategoryCounts