Calculating Top Ten Categories in SSIS - sql

I'm using an SSIS package to update my content on a daily basis. There are thousands of content items with different Moderation IDs, and I want to calculate the top ten categories FOR EACH Moderation ID. Before I realized that I should calculate it per ModerationId, I used this query to get the content to be updated:
SELECT TOP 10 ModerationId, Category, COUNT(ContentSeqNum) AS Total FROM Content
WHERE Category IS NOT NULL
GROUP BY ModerationId, Category ORDER BY ModerationId, Total DESC
That approach was faulty because the query returns only ten rows for the whole table, whereas each ModerationId should get its own top ten categories.
How can I change this query to calculate Top 10 Categories for each ModerationId?

Use a window function to calculate the top ten categories for each ModerationId. Try this:
SELECT moderationid,
category,
total
FROM (SELECT Row_number() OVER (partition BY moderationid
ORDER BY Count(contentseqnum) DESC) Rn,
moderationid,
category,
Count(contentseqnum) AS Total
FROM content
WHERE category IS NOT NULL
GROUP BY moderationid,
category) A
WHERE rn <= 10
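To check the idea without a SQL Server instance, here is a minimal sketch that runs the same pattern against an in-memory SQLite database (version 3.25 or later, for window functions). The sample rows and the helper name top_categories are made up for illustration, and the query aggregates in one CTE and ranks in a second one, which is an assumption of this sketch rather than the exact form of the answer above.

```python
import sqlite3

# Hypothetical sample data: (ModerationId, Category, ContentSeqNum).
rows = [
    (1, "news", 1), (1, "news", 2), (1, "news", 3),
    (1, "sport", 4), (1, "sport", 5),
    (1, "tech", 6),
    (2, "tech", 7), (2, "tech", 8),
    (2, "news", 9),
    (2, None, 10),  # dropped by the IS NOT NULL filter
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Content (ModerationId INT, Category TEXT, ContentSeqNum INT)"
)
conn.executemany("INSERT INTO Content VALUES (?, ?, ?)", rows)


def top_categories(conn, n):
    """Return the n most frequent categories for each ModerationId."""
    sql = """
        WITH counts AS (
            SELECT ModerationId, Category, COUNT(ContentSeqNum) AS Total
            FROM Content
            WHERE Category IS NOT NULL
            GROUP BY ModerationId, Category
        ),
        ranked AS (
            SELECT counts.*,
                   ROW_NUMBER() OVER (PARTITION BY ModerationId
                                      ORDER BY Total DESC, Category) AS rn
            FROM counts
        )
        SELECT ModerationId, Category, Total
        FROM ranked
        WHERE rn <= ?
        ORDER BY ModerationId, rn
    """
    return conn.execute(sql, (n,)).fetchall()


top2 = top_categories(conn, 2)
print(top2)
```

The numbering restarts at 1 for every ModerationId, so the filter on rn keeps up to n categories per ModerationId instead of n rows overall. Category is added as a tie-breaker only to make the output deterministic.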

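Use the Row_number() function over the per-category counts
select ModerationId, Category, Total from
(
select ModerationId, Category, count(ContentSeqNum) as Total,
row_number() over(partition by ModerationId order by count(ContentSeqNum) desc) as sno
from Content WHERE Category IS NOT NULL
group by ModerationId, Category
) as t
where sno<=10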
Find more methods at http://beyondrelational.com/modules/2/blogs/70/posts/10845/return-top-n-rows.aspx

Try this:
SELECT TOP(10) ModerationId, Category, COUNT(ContentSeqNum) OVER(PARTITION BY ModerationId ORDER BY ModerationId) AS Total
FROM Content
WHERE Category IS NOT NULL
ORDER BY Total DESC

Related

Snowflake SQL code to show only second record for items with duplicate ID

I'm trying to get my head around SQL and am using Snowflake as a testbed to do this. I have a table with products which have multiple reviews against them. I am trying to structure a query to only show products with 2 or more reviews and then only show the second review. As I say, this is merely me trying to better understand SQL so selecting the second review is a random ask. The table is made up of 4 columns. 1 is Product ID, 2 is Product Name, 3 is Review and 4 is Date Review was posted.
Thanks in advance for any help.
You use row_number() for this type of query:
select t.*
from (select t.*,
row_number() over (partition by product_id order by date_review asc) as seqnum
from t
) t
where seqnum = 2;
You can use a windowing function like ROW_NUMBER() to make numbered groupings, eg:
WITH Review_Sequence AS (
SELECT r.*,
ROW_NUMBER() OVER (PARTITION BY Product_ID ORDER BY Review_Date) Review_No
FROM Reviews r
)
SELECT * FROM Review_Sequence WHERE Review_No = 2
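A small sketch of the same query, run against SQLite from Python rather than Snowflake (the ROW_NUMBER() syntax used here is common to both). The table contents and product names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Reviews "
    "(Product_ID INT, Product_Name TEXT, Review TEXT, Review_Date TEXT)"
)
# Hypothetical reviews; ISO dates sort correctly as text.
conn.executemany("INSERT INTO Reviews VALUES (?, ?, ?, ?)", [
    (1, "Kettle", "Great", "2023-01-05"),
    (1, "Kettle", "Leaks", "2023-02-10"),
    (1, "Kettle", "Fine", "2023-03-01"),
    (2, "Toaster", "Okay", "2023-01-20"),   # only one review
    (3, "Blender", "Loud", "2023-04-02"),
    (3, "Blender", "Solid", "2023-04-15"),
])

second_reviews = conn.execute("""
    WITH Review_Sequence AS (
        SELECT r.*,
               ROW_NUMBER() OVER (PARTITION BY Product_ID
                                  ORDER BY Review_Date) AS Review_No
        FROM Reviews r
    )
    SELECT Product_ID, Product_Name, Review
    FROM Review_Sequence
    WHERE Review_No = 2
    ORDER BY Product_ID
""").fetchall()
print(second_reviews)
```

Products with a single review never get a row numbered 2, so the "2 or more reviews" condition needs no separate HAVING clause.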

SQL - show groups plus subgroups

I have a StockLines table. How can I extract groups and their group items from the server at once? The structure I want to build in code is:
Site,
Article,
List of StockLines
I do:
select top 20 Site, Article, SUM(Quantity)
from StockLines
group by Site, Article
That gives me the groups. Now, for each group, I want to return 10 sub-items. Should I run a new select for each group to get its child items?
For every (Site, Article) group, this returns the 10 rows with the lowest Id values (change the ordering as needed), along with the sum of Quantity over all the rows in the group.
select *
from(
select *, sum(Quantity) over(partition by Site, Article) s
-- change order by expression as needed
, row_number() over(partition by Site, Article order by Id) rn
from StockLines
) t
where rn <= 10;
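As a rough illustration of how the flat result can be folded back into the Site / Article / list-of-lines structure, here is a sketch using SQLite from Python. The sample rows, the GroupTotal alias, the limit of 2 lines per group, and the dictionary layout are all assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE StockLines (Id INT, Site TEXT, Article TEXT, Quantity INT)"
)
# Hypothetical stock lines.
conn.executemany("INSERT INTO StockLines VALUES (?, ?, ?, ?)", [
    (1, "A", "bolt", 5), (2, "A", "bolt", 7), (3, "A", "bolt", 1),
    (4, "A", "nut", 4),
    (5, "B", "bolt", 2), (6, "B", "bolt", 3),
])

rows = conn.execute("""
    SELECT Site, Article, GroupTotal, Id, Quantity
    FROM (SELECT s.*,
                 SUM(Quantity) OVER (PARTITION BY Site, Article) AS GroupTotal,
                 ROW_NUMBER() OVER (PARTITION BY Site, Article
                                    ORDER BY Id) AS rn
          FROM StockLines s) t
    WHERE rn <= 2          -- at most 2 lines per group in this sketch
    ORDER BY Site, Article, Id
""").fetchall()

# Fold the flat rows into {(site, article): {"total": ..., "lines": [...]}}.
groups = {}
for site, article, total, line_id, qty in rows:
    group = groups.setdefault((site, article), {"total": total, "lines": []})
    group["lines"].append((line_id, qty))

print(groups)
```

Note that the total for ("A", "bolt") covers all three rows of the group, even though only two lines are returned, because the window sum is computed before the rn filter is applied.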

I need the Top 10 results from table

I need to get the Top 10 results for each Region, Market and Name along with those with highest counts (Gaps). There are 4 Regions with 1 to N Markets. I can get the Top 10 but cannot figure out how to do this without using a Union for every Market. Any ideas on how do this?
SELECT DISTINCT TOP 10
Region, Market, Name, Gaps
FROM
TableName
ORDER BY
Region, Market, Gaps DESC
One approach would be to use a CTE (Common Table Expression) if you're on SQL Server 2005 and newer (you aren't specific enough in that regard).
With this CTE, you can partition your data by some criteria - i.e. your Region, Market, Name - and have SQL Server number all your rows starting at 1 for each of those "partitions", ordered by some criteria.
So try something like this:
;WITH RegionsMarkets AS
(
SELECT
Region, Market, Name, Gaps,
RN = ROW_NUMBER() OVER(PARTITION BY Region, Market, Name ORDER BY Gaps DESC)
FROM
dbo.TableName
)
SELECT
Region, Market, Name, Gaps
FROM
RegionsMarkets
WHERE
RN <= 10
Here, I am selecting only the first 10 entries for each "partition" (i.e. for each Region, Market, Name tuple), ordered by Gaps in descending order.
With this, you get the top 10 rows for each (Region, Market, Name) tuple - does that approach what you're looking for??
I think you want row_number():
select t.*
from (select t.*,
row_number() over (partition by region, market order by gaps desc) as seqnum
from tablename t
) t
where seqnum <= 10;
I am not sure if you want name in the partition by clause. If you have more than one name within a market, that may be what you are looking for. (Hint: Sample data and desired results can really help clarify a question.)
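To make the effect of the partition choice concrete, here is a small sketch on SQLite via Python. The sample rows, the helper top_gaps, and its column whitelist are hypothetical; only the ROW_NUMBER() pattern comes from the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableName (Region TEXT, Market TEXT, Name TEXT, Gaps INT)"
)
# Hypothetical rows: one market per region, several names per market.
conn.executemany("INSERT INTO TableName VALUES (?, ?, ?, ?)", [
    ("East", "NYC", "Ann", 9),
    ("East", "NYC", "Bob", 7),
    ("East", "NYC", "Cid", 5),
    ("West", "LA", "Dee", 4),
    ("West", "LA", "Eve", 8),
])

ALLOWED = ("Region", "Market", "Name")


def top_gaps(conn, partition_cols, n):
    """Top n rows by Gaps within each group of partition_cols."""
    # Column names cannot be bound as parameters, so whitelist them.
    if not partition_cols or any(c not in ALLOWED for c in partition_cols):
        raise ValueError("unexpected partition column")
    cols = ", ".join(partition_cols)
    sql = f"""
        SELECT Region, Market, Name, Gaps
        FROM (SELECT t.*,
                     ROW_NUMBER() OVER (PARTITION BY {cols}
                                        ORDER BY Gaps DESC) AS seqnum
              FROM TableName t) AS ranked
        WHERE seqnum <= ?
        ORDER BY Region, Market, Gaps DESC
    """
    return conn.execute(sql, (n,)).fetchall()


per_market = top_gaps(conn, ("Region", "Market"), 2)
per_name = top_gaps(conn, ("Region", "Market", "Name"), 2)
print(per_market)
print(per_name)
```

With Name in the partition and one row per name, every row is numbered 1 and nothing is filtered out; partitioning by Region and Market alone keeps the two largest Gaps per market.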

Selecting 5 Most Recent Records Of Each Group

The below statement retrieves the top 2 records within each group in SQL Server. It works correctly, however as you can see it doesn't scale at all. I mean that if I wanted to retrieve the top 5 or 10 records instead of just 2, you can see how this query statement would grow very quickly.
How can I convert this query into something that returns the same records, but that I can quickly change it to return the top 5 or 10 records within each group instead, rather than just 2? (i.e. I want to just tell it to return the top 5 within each group, rather than having 5 unions as the below format would require)
Thanks!
WITH tSub
as (SELECT CustomerID,
TransactionTypeID,
Max(EventDate) as EventDate,
Max(TransactionID) as TransactionID
FROM Transactions
WHERE ParentTransactionID is NULL
Group By CustomerID,
TransactionTypeID)
SELECT *
from tSub
UNION
SELECT t.CustomerID,
t.TransactionTypeID,
Max(t.EventDate) as EventDate,
Max(t.TransactionID) as TransactionID
FROM Transactions t
WHERE t.TransactionID NOT IN (SELECT tSub.TransactionID
FROM tSub)
and ParentTransactionID is NULL
Group By CustomerID,
TransactionTypeID
Use PARTITION BY to solve this type of problem:
select CustomerID, TransactionTypeID, EventDate, TransactionID from
(select t.*, ROW_NUMBER() over (PARTITION by CustomerID, TransactionTypeID order by EventDate desc)
as rownum from Transactions t where ParentTransactionID is NULL) ut where ut.rownum<=5
This partitions the result by the grouping columns, orders each partition by the EventDate column (most recent first), and then selects the entries with rownum<=5. Change the value 5 to get the top n most recent entries of each group.
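A minimal sketch of that approach with the group size passed in as a parameter, run on SQLite from Python. The sample transactions and the helper name latest_per_group are invented here, and TransactionID is used as a tie-breaker for equal dates, which is an assumption of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Transactions (TransactionID INT, CustomerID INT, "
    "TransactionTypeID INT, EventDate TEXT, ParentTransactionID INT)"
)
# Hypothetical transactions; row 4 is a child and must be excluded.
conn.executemany("INSERT INTO Transactions VALUES (?, ?, ?, ?, ?)", [
    (1, 10, 1, "2024-01-01", None),
    (2, 10, 1, "2024-01-03", None),
    (3, 10, 1, "2024-01-02", None),
    (4, 10, 1, "2024-01-04", 2),
    (5, 20, 1, "2024-02-01", None),
])


def latest_per_group(conn, n):
    """Return the n most recent top-level transactions per customer and type."""
    sql = """
        SELECT CustomerID, TransactionTypeID, EventDate, TransactionID
        FROM (SELECT t.*,
                     ROW_NUMBER() OVER (
                         PARTITION BY CustomerID, TransactionTypeID
                         ORDER BY EventDate DESC, TransactionID DESC
                     ) AS rownum
              FROM Transactions t
              WHERE ParentTransactionID IS NULL) ut
        WHERE rownum <= ?
        ORDER BY CustomerID, TransactionTypeID, rownum
    """
    return conn.execute(sql, (n,)).fetchall()


latest2 = latest_per_group(conn, 2)
print(latest2)
```

Going from the top 2 to the top 5 or 10 is then a matter of changing the argument, with no extra UNION branches.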

Write an Oracle query to get top 10 products for top 5000 stores [duplicate]

Possible Duplicate:
Get top 10 products for every category
I am looking for an Oracle query to get the top 5000 stores, and for each store the top 10 products, and for each of those products the top 5 sub-products. So in total I should get 5000*10*5 rows.
Can someone help me get this using Oracle's analytic functions?
My current query looks like
SELECT
store,
product,
sub_product,
count(*) as sales
FROM stores_data
GROUP BY store, product, sub_product;
Please assume the table is named stores_data, with columns store_id, product, sub_product.
You should use dense_rank to get the top N rows.
Something like
SELECT
storeid,
store,
productid,
product,
subproductid,
subproduct
FROM
(
SELECT
s.storeid,
s.store,
p.productid,
p.product,
sp.subproductid,
sp.subproduct,
dense_rank() over ( order by s.storeid) as storerank,
dense_rank() over ( partition by s.storeid
order by p.productid) as productrank,
dense_rank() over ( partition by s.storeid, p.productid
order by sp.subproductid) as subproductrank
FROM
stores s
INNER JOIN products p on p.storeid = s.storeid
INNER JOIN subproduct sp on sp.productid = p.productid
) t
WHERE
t.storerank <= 5000 and
t.productrank <= 10 and
t.subproductrank <= 5
Of course, I don't know your tables or the relations between them, nor the actual fields and conditions you want to check, so this is just a simple query that gets the top N records based on their id. Also, this query expects a product to belong to only one store, which might not be the case. At least it shows you how to use dense_rank to get a three-layered sorting/filtering.
I'll leave the other answer because that looks more like how such a table structure should be, I think.
But in your other thread you described having a table that looks like this:
create table store_data (
store varchar2(40),
product varchar2(40),
subproduct varchar2(40),
sales int);
That actually looks like data that is already aggregated and that you now want to analyze again. Your query could look like this. It first computes the summed sales per store and per product, so you can order stores and products by sales too (the sales in the table seem to be for the subproducts). After that, you can rank the stores and products by sales. I added a rank for the subproducts too. I used rank here, so there is a gap in the numbering when several records have the same sales. This way, when 8 records share rank 1 because they all have the same sales, the 9th record will have rank 9 instead of 2, so you will select only the 8 top stores (you wanted 5, but why skip the other 3 if they sold exactly the same?) and not 4 others too.
select
ts.*
from
(
select
ss.*,
rank() over (order by storesales desc) as storerank,
rank() over (partition by store order by productsales desc) as productrank,
rank() over (partition by store, product order by subproductsales desc) as subproductrank
from
(
select
sd.*,
sum(sales) over (partition by store) as STORESALES,
sum(sales) over (partition by store, product) as PRODUCTSALES,
sum(sales) over (partition by store, product, subproduct) as SUBPRODUCTSALES
from
store_data sd
) ss
) ts
where
ts.storerank <= 2 and
ts.productrank <= 3 and
ts.subproductrank <= 4
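Since the choice between rank, dense_rank and row_number only matters when there are ties, here is a small sketch that puts the three side by side. It runs on SQLite from Python rather than Oracle (the three functions exist in both), and the table store_totals with one pre-summed row per store is a hypothetical simplification of the data above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE store_totals (store TEXT, sales INT)")
# Hypothetical totals: three stores tie for first place.
conn.executemany("INSERT INTO store_totals VALUES (?, ?)", [
    ("A", 100), ("B", 100), ("C", 100), ("D", 80), ("E", 70),
])

ranked = conn.execute("""
    SELECT store,
           ROW_NUMBER() OVER (ORDER BY sales DESC, store) AS rn,
           RANK()       OVER (ORDER BY sales DESC)        AS rnk,
           DENSE_RANK() OVER (ORDER BY sales DESC)        AS dense_rnk
    FROM store_totals
    ORDER BY sales DESC, store
""").fetchall()

for row in ranked:
    print(row)

# Which stores survive a "top 2" filter under each numbering scheme.
top2_row_number = [store for store, rn, rnk, dense in ranked if rn <= 2]
top2_rank = [store for store, rn, rnk, dense in ranked if rnk <= 2]
top2_dense_rank = [store for store, rn, rnk, dense in ranked if dense <= 2]
```

With these rows, row_number cuts the tie arbitrarily after two stores, rank keeps all three tied stores and nothing else, and dense_rank keeps the tied stores plus the next distinct sales value.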