Using SQL Server: how to use select criteria based on sum

Given the below table and using SQL (SQL Server preferred), how can I select only the ProductIDs whose running total of orders stays at or under 200?
In other words, I'd like the IDs for 'Corn Flakes' and 'Wheeties' returned, since together they come to the 200-order sum and returning anything more would be over the limit.

Given that 108 + 92 = 200, I must assume that you want the product ids in order.
In that case, you can use a cumulative sum:
select t.*
from (select t.*,
             sum(orders) over (order by product_id) as running_orders
      from t
     ) t
where running_orders <= 200;

Not sure which is more appropriate for your level and version:
select * from T as t
where (
      select sum(Orders) from T as t2
      where t2.ProductID <= t.ProductID -- *
      ) <= 200;

with data as (
    select *,
           sum(Orders) over (order by ProductID) as cumm -- *
    from T
    )
select * from data where cumm <= 200;
Both of these essentially assume there will be no ties, or at least no tie that would straddle the 200-order cutoff.
If you discover that you intended to sort by number of orders rather than by product ID, change the column references in the lines marked with asterisks (a sketch of that variant follows).
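For instance, a minimal sketch of that variant, assuming you want to fill the 200-order budget starting from the smallest order counts (ProductID is added only as a tiebreaker to keep the running sum deterministic):

with data as (
    select *,
           sum(Orders) over (order by Orders, ProductID) as cumm
    from T
    )
select * from data where cumm <= 200;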

Related

Producing n rows per group

It is known that GROUP BY produces one row per group. I want to produce multiple rows per group. The particular use case is, for example, selecting the two cheapest offerings for each item.
It is trivial for two or three elements in the group:
select type, variety, price
from fruits
where price = (select min(price) from fruits as f where f.type = fruits.type)
   or price = (select min(price) from fruits as f where f.type = fruits.type
               and price > (select min(price) from fruits as f2 where f2.type = fruits.type));
(Select n rows per group in mysql)
But I am looking for a query that can show n rows per group, where n is arbitrarily large. In other words, a query that displays 5 rows per group should be convertible to a query that displays 7 rows per group by just replacing some constants in it.
I am not constrained to any DBMS, so I am interested in any solution that runs on any DBMS. It is fine if it uses some non-standard syntax.
For any database that supports analytic functions / window functions, this is relatively easy:
select *
from (select type,
             variety,
             price,
             rank() over ([partition by something]
                          order by price) rnk
      from fruits) rank_subquery
where rnk <= 3
If you omit the [partition by something], you'll get the top three overall rows. If you want the top three for each type, you'd partition by type in your rank() function.
Depending on how you want to handle ties, you may want to use dense_rank() or row_number() rather than rank(). If two rows tie for first, using rank, the next row would have a rnk of 3 while it would have a rnk of 2 with dense_rank. In both cases, both tied rows would have a rnk of 1. row_number would arbitrarily give one of the two tied rows a rnk of 1 and the other a rnk of 2.
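A quick illustrative sketch of those differences, assuming two rows tie for the lowest price within a type:

select type, variety, price,
       rank()       over (partition by type order by price) as rnk,
       dense_rank() over (partition by type order by price) as drnk,
       row_number() over (partition by type order by price) as rn
from fruits;
-- tied rows:          rnk = 1, 1   drnk = 1, 1   rn = 1, 2
-- next row after tie: rnk = 3      drnk = 2      rn = 3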
To save anyone looking some time: at the time of this writing, this won't work in MySQL because it does not support LIMIT inside an IN subquery; see https://dev.mysql.com/doc/refman/5.7/en/subquery-restrictions.html.
I've never been a fan of correlated subqueries, as most uses I saw for them could usually be written more simply, but I think this has changed my mind... a little. (This is for MySQL.)
SELECT `type`, `variety`, `price`
FROM `fruits` AS f2
WHERE `price` IN (
    SELECT DISTINCT `price`
    FROM `fruits` AS f1
    WHERE f1.type = f2.type
    ORDER BY `price` ASC
    LIMIT X
);
Where X is the "arbitrary" value you wanted.
If you know how you want to limit further in cases of duplicate prices, and the data permits such limiting ...
SELECT `type`, `variety`, `price`
FROM `fruits` AS f2
WHERE (`price`, `other_identifying_criteria`) IN (
    SELECT DISTINCT `price`, `other_identifying_criteria`
    FROM `fruits` AS f1
    WHERE f1.type = f2.type
    ORDER BY `price` ASC, `other_identifying_criteria` [ASC|DESC]
    LIMIT X
);
"greatest N per group problems" can easily be solved using window functions:
select type, variety, price
from (
    select type, variety, price,
           dense_rank() over (partition by type order by price) as rnk
    from fruits
    ) t
where rnk <= 5;
Window functions are only fully supported on SQL Server 2012 and above. Try this out:
SQL Server 2005 and Above Solution
DECLARE @yourTable TABLE (Category VARCHAR(50), SubCategory VARCHAR(50), price INT);
INSERT INTO @yourTable
VALUES ('Meat','Steak',1),
       ('Meat','Chicken Wings',3),
       ('Meat','Lamb Chops',5);

DECLARE @n INT;
SET @n = 2;

SELECT DISTINCT Category, CA.SubCategory, CA.price
FROM @yourTable A
CROSS APPLY
(
    SELECT TOP (@n) SubCategory, price
    FROM @yourTable B
    WHERE A.Category = B.Category
    ORDER BY price DESC
) CA
Results in the two highest-priced SubCategories per Category:
Category    SubCategory      price
----------  ---------------  -----
Meat        Chicken Wings    3
Meat        Lamb Chops       5

Top 10% of sum() Postgres

I'm looking to pull the top 10% of a summed value on a Postgres server.
So I'm summing a value with sum(transaction.value) and I'd like the top 10% by that summed value.
From what I gather in your comments, I assume you want to:
Sum transactions per customer to get a total per customer.
List the top 10 % of customers who actually have transactions and spent the most.
WITH cte AS (
   SELECT t.customer_id, sum(t.value) AS sum_value
   FROM   transaction t
   GROUP  BY 1
   )
SELECT *, rank() OVER (ORDER BY sum_value DESC) AS sails_rank
FROM   cte
ORDER  BY sum_value DESC
LIMIT  (SELECT count(*)/10 FROM cte)
Major points
Best to use a CTE here; it makes the count cheaper.
Aggregating over transaction automatically excludes customers without transactions. I am assuming relational integrity here (an FK constraint on customer_id).
Dividing bigint / int truncates the result (rounds down to the nearest integer). You may be interested in this related question: PostgreSQL equivalent for TOP n WITH TIES: LIMIT "with ties"? (a sketch follows below this list)
I added a sails_rank column which you didn't ask for, but it seems to fit your requirement.
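On the WITH TIES point, a minimal sketch, assuming PostgreSQL 13 or later (older versions need the rank-based workaround from the linked question):

WITH cte AS (
   SELECT t.customer_id, sum(t.value) AS sum_value
   FROM   transaction t
   GROUP  BY 1
   )
SELECT customer_id, sum_value
FROM   cte
ORDER  BY sum_value DESC
-- also keeps any customers tied with the last row inside the 10% cutoff
FETCH  FIRST (SELECT count(*)/10 FROM cte) ROWS WITH TIES;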
As you can see, I didn't even include the table customer in the query. Assuming you have a foreign key constraint on customer_id, that would be redundant (and slower). If you want additional columns from customer in the result, join customer to the result of above query:
WITH cte AS (
   SELECT t.customer_id, sum(t.value) AS sum_value
   FROM   transaction t
   GROUP  BY 1
   )
SELECT c.customer_id, c.name, sub.sum_value, sub.sails_rank
FROM  (
   SELECT *, rank() OVER (ORDER BY sum_value DESC) AS sails_rank
   FROM   cte
   ORDER  BY sum_value DESC
   LIMIT  (SELECT count(*)/10 FROM cte)
   ) sub
JOIN   customer c USING (customer_id);

SQL. Is there any efficient way to find second lowest value?

I have the following table:
ItemID  Price
1       10
2       20
3       12
4       10
5       11
I need to find the second lowest price. So far I have a query that works, but I am not sure it is the most efficient:
select min(price)
from table
where itemid not in
    (select itemid
     from table
     where price =
         (select min(price)
          from table));
What if I have to find the third or fourth minimum price? I am not even mentioning other attributes and conditions... Is there a more efficient way to do this?
PS: note that the minimum is not a unique value. For example, items 1 and 4 are both minimums, so simple ordering won't do.
SELECT MIN(price)
FROM table
WHERE price > (SELECT MIN(price)
               FROM table)
select price from table where price in (
    select distinct price
    from
        -- dense_rank (ordered by price) ranks tied prices equally, so the
        -- second-lowest distinct price always gets rank 2 despite duplicates
        (select t.price, dense_rank() over (order by t.price) as rownum from table t) as x
    where x.rownum = 2 -- or 3, 4, 5, etc.
)
Not sure if this would be the fastest, but it would make it easier to select the second, third, etc... Just change the TOP value.
UPDATED
SELECT MIN(price)
FROM table
WHERE price NOT IN (SELECT DISTINCT TOP 1 price FROM table ORDER BY price)
To find the second minimum salary of an employee, you can use the following:
select min(salary)
from table
where salary > (select min(salary) from table);
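Extending this pattern to the third minimum already requires nesting another level, which is why the ranking and OFFSET approaches elsewhere in this thread scale better; a sketch:

select min(salary)
from table
where salary > (select min(salary)
                from table
                where salary > (select min(salary) from table));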
This is a good answer:
SELECT MIN(price)
FROM table
WHERE price > (SELECT MIN(price)
               FROM table)
Make sure when you do this that there is only one row in the subquery (the part in parentheses at the end)!
For example, if you want to use GROUP BY you will have to correlate the subquery further:
SELECT MIN(price)
FROM table te1
WHERE price > (SELECT MIN(price)
               FROM table te2
               WHERE te1.brand = te2.brand)
GROUP BY brand
Because GROUP BY gives you multiple rows; without the extra correlation you would get the error:
SQL Error [21000]: ERROR: more than one row returned by a subquery used as an expression
I guess the simplest way is the OFFSET ... FETCH filter from standard SQL. DISTINCT is not necessary if you have no repeated values in your column.
select distinct price
from table
order by price
offset 1 row fetch first 1 row only;
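For the nth minimum generally, offset n-1 rows; a sketch for the third minimum:

select distinct price
from table
order by price
offset 2 rows fetch first 1 row only;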
No need to write complex subqueries. In Amazon Redshift, use LIMIT ... OFFSET instead, for example:
select distinct price
from table
order by price
limit 1
offset 1;
You can use either of the following:
select min(your_field)
from your_table
where your_field not in (select distinct top 1 your_field
                         from your_table
                         order by your_field asc)
OR
select top 1 ColumnName
from TableName
where ColumnName not in (select top 1 ColumnName
                         from TableName
                         order by ColumnName asc)
order by ColumnName asc
I think you can find the second minimum using LIMIT and ORDER BY:
select max(price) as minimum
from (select distinct price
      from tableName
      order by price asc
      limit 2) as t -- or 3, 4, 5, etc.
If you want to find the third or fourth minimum and so on, change the number in LIMIT; the max of the n lowest distinct prices is the nth minimum.
You can use ranking functions. It may seem a complex query, but results similar to the other answers can be achieved with it; DENSE_RANK (rather than RANK) keeps the ranking gapless when the minimum is duplicated, so Rnk = 2 is always the second-lowest price:
WITH Temp_table AS (
    SELECT ITEM_ID, PRICE,
           DENSE_RANK() OVER (ORDER BY PRICE) AS Rnk
    FROM YOUR_TABLE_NAME
)
SELECT ITEM_ID FROM Temp_table
WHERE Rnk = 2; -- or 3, 4, 5, etc.
Maybe you can check the min value first and then filter with a not-equal or greater-than operator. This eliminates the subquery but requires a two-step process:
-- step 1: find the minimum
select min(price) from table;

-- step 2: plug the value from step 1 into the filter
select min(price)
from table
where price <> 10; -- the min price you previously got

Join to replace sub-query

I am almost a novice in database queries.
However, I do understand why and how correlated subqueries are expensive and best avoided.
Given the following simple example, could someone help replace it with a join, to help me understand how the join scores better?
select book_key,
       store_key,
       quantity
from sales s
where quantity < (select max(quantity)
                  from sales
                  where book_key = s.book_key);
Apart from a join, what other options do we have to avoid the subquery?
In this case, it ought to be better to use a windowed function with a single pass over the table, like so:
with s as
    (select book_key,
            store_key,
            quantity,
            max(quantity) over (partition by book_key) mq
     from sales)
select book_key, store_key, quantity
from s
where quantity < s.mq
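For completeness, a minimal sketch of the join version the title asks about: aggregate once in a derived table, then join back. Under the same data this returns the same rows as the correlated subquery:

select s.book_key, s.store_key, s.quantity
from sales s
join (select book_key, max(quantity) as mq
      from sales
      group by book_key) m
  on m.book_key = s.book_key
where s.quantity < m.mq;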
Using a Common Table Expression (CTE) lets you execute a single primary SELECT statement and refer to its result set by name. The data can then be self-referenced and accessed multiple times without re-executing the initial SELECT, and without potentially expensive JOINs. This solution also uses ROW_NUMBER() and the OVER clause to number the matching BOOK_KEYs in descending order of quantity; you then include only the records whose quantity is less than the max quantity for each BOOK_KEY.
with CTE as
(
    select book_key,
           store_key,
           quantity,
           row_number() over (partition by book_key order by quantity desc) rn
    from sales
)
select book_key,
       store_key,
       quantity
from CTE
where rn > 1;
Working Demo: http://sqlfiddle.com/#!3/f0051/1
Apart from a join, what other options do we have to avoid the subquery?
You can use a variable, something like this (note that a scalar variable holds a single value, so this only matches the correlated subquery when you restrict the query to one book):
DECLARE @maxq INT;

SELECT @maxq = MAX(quantity)
FROM sales
WHERE book_key = @book_key; -- @book_key: hypothetical parameter for the one book

SELECT book_key, store_key, quantity
FROM sales
WHERE book_key = @book_key
  AND quantity < @maxq;

Group by every N records in T-SQL

I have some performance test results on the database, and what I want to do is to group every 1000 records (previously sorted in ascending order by date) and then aggregate results with AVG.
I'm actually looking for a standard SQL solution, however any T-SQL specific results are also appreciated.
The query looks like this:
SELECT TestId,Throughput FROM dbo.Results ORDER BY id
WITH T AS (
    SELECT RANK() OVER (ORDER BY ID) Rank,
           P.Field1, P.Field2, P.Value1, ...
    FROM P
)
SELECT (Rank - 1) / 1000 GroupID, AVG(...)
FROM T
GROUP BY ((Rank - 1) / 1000)
;
Something like that should get you started. If you can provide your actual schema I can update as appropriate.
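As a concrete instance of the pattern, a sketch using the question's own dbo.Results columns (assuming id is the sort key and Throughput is the value to average):

WITH T AS (
    SELECT RANK() OVER (ORDER BY id) AS Rank, Throughput
    FROM dbo.Results
)
SELECT (Rank - 1) / 1000 AS GroupID, AVG(Throughput) AS AvgThroughput
FROM T
GROUP BY (Rank - 1) / 1000;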
Give the answer to Yuck; I only posted as an answer so I could include a code block. I did a count test to see if it was grouping by 1000: without the (Rank - 1) adjustment the first set was 999, and with it this produced set sizes of 1,000. Great query, Yuck.
WITH T AS (
    SELECT RANK() OVER (ORDER BY sID) Rank, sID
    FROM docSVsys
)
SELECT (Rank - 1) / 1000 GroupID, count(sID)
FROM T
GROUP BY ((Rank - 1) / 1000)
ORDER BY GroupID
I +1'd @Yuck, because I think that is a good answer. But it's worth mentioning NTILE().
Reason being, if you have 10,010 records (for example), then you'll have 11 groupings -- the first 10 with 1000 in them, and the last with just 10.
If you're comparing averages between each group of 1000, then you should either discard the last group as it's not a representative group, or...you could make all the groups the same size.
NTILE() would make all groups the same size; the only caveat is that you'd need to know how many groups you wanted.
So if your table had 25,250 records, you'd use NTILE(25), and your groupings would be approximately 1000 in size -- they'd actually be 1010 in size; the benefit being, they'd all be the same size, which might make them more relevant to each other in terms of whatever comparison analysis you're doing.
You could get your group size simply by:
DECLARE @ntile int
SET @ntile = (SELECT count(1) FROM myTable) / 1000
And then modifying @Yuck's approach with the NTILE() substitution:
;WITH myCTE AS (
    SELECT NTILE(@ntile) OVER (ORDER BY id) myGroup,
           col1, col2, ...
    FROM dbo.myTable
)
SELECT myGroup, col1, col2...
FROM myCTE
GROUP BY myGroup, col1, col2...
;
The answer above does not actually assign a unique group ID to each 1000 records in BigQuery, where / is floating-point division; adding Floor() is needed. The following will return all records from your table, with a unique GroupID for each 1000 rows:
WITH T AS (
    SELECT RANK() OVER (ORDER BY your_field) Rank,
           your_field
    FROM your_table
    WHERE your_field = 'your_criteria'
)
SELECT Floor((Rank - 1) / 1000) GroupID, your_field
FROM T
And for my needs, I wanted my GroupID to be a random set of characters, so I changed the Floor(...) GroupID to:
TO_HEX(SHA256(CONCAT(CAST(Floor((Rank-1) / 10) AS STRING),'seed1'))) GroupID
Without the seed value, you and I would get the exact same output, because we're just doing a SHA256 on the numbers 1, 2, 3, etc. Adding the seed makes the output unique while still repeatable.
This is BigQuery syntax. T-SQL might be slightly different.
Lastly, if you want to leave off the last chunk that is not a full 1000, you can find it by doing:
WITH T AS (
    SELECT RANK() OVER (ORDER BY your_field) Rank,
           your_field
    FROM your_table
    WHERE your_field = 'your_criteria'
)
SELECT Floor((Rank - 1) / 1000) GroupID, your_field,
       COUNT(*) OVER (PARTITION BY TO_HEX(SHA256(CONCAT(CAST(Floor((Rank - 1) / 1000) AS STRING), 'seed1')))) AS CountInGroup
FROM T
ORDER BY CountInGroup
You can also use Row_Number() instead of RANK(); no Floor() is required.
declare @groupsize int = 50
;with ct1 as (
    select YourColumn, RowID = Row_Number() over (order by YourColumn)
    from YourTable
)
select YourColumn, RowID, GroupID = (RowID - 1) / @groupsize + 1
from ct1
I read more about NTILE after reading @user15481328's answer (resource: https://www.sqlservertutorial.net/sql-server-window-functions/sql-server-ntile-function/), and this solution allowed me to find the max date within each of the 25 groups of my data set:
with cte as (
    select date,
           NTILE(25) over (order by date) bucket_num
    from mybigdataset
)
select max(date), bucket_num
from cte
group by bucket_num
order by bucket_num