Select every possible combination below a certain sum? - sql

I have a table called Item which has an ID, a Name and a Price.
Is it possible using SQL select statements to get every possible, but distinct combination of items below a specific price?
For example, assume this table:
ID Name Price
-- ---- -----
1 A 1
2 B 2
3 C 3
4 D 4
5 E 5
With a limit of 10, for example, the query should return A, B, C, D once, but not A, B, D, C as well.
Is something like this even possible? Please excuse this probably stupid question; I've been learning SQL for a year now, but our teacher hasn't even explained what SQL means. My entire knowledge is from books, so I'm not sure whether this is a suitable question.

Well, since no inspiration was generated by Getting all possible combinations which obey certain condition with MS SQL, here's an adaptation of the code found there ;)
declare @data table (id int not null primary key, name varchar(40) not null, price money not null);
insert into @data (id, name, price) values (1, 'A', 1), (2, 'B', 2), (3, 'C', 3), (4, 'D', 4), (5, 'E', 5);
-- Replace @data with actual table name and delete the above
with
anchor as (
    select id, name, price
    from @data
),
cte as (
    -- start with every single item
    select
        id as max_id,
        price,
        '|' + cast(id as varchar(max)) + '|' as level
    from anchor
    union all
    -- extend each combination only with items that have a higher id,
    -- so every set of items is produced in exactly one order
    select
        a.id as max_id,
        c.price + a.price,
        c.level + '|' + cast(a.id as varchar(max)) + '|' as level
    from
        cte c
        inner join anchor a on a.id > c.max_id and c.level not like '%|' + cast(a.id as varchar(max)) + '|%'
)
select level
from cte
where price <= 10
;
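For readers who want to try the recursive idea without SQL Server, here is a minimal sketch that runs the same kind of recursive CTE through Python's built-in sqlite3 module. It is not the code above: it assumes an in-memory table named item (a hypothetical stand-in for Item), uses SQLite syntax rather than T-SQL, and assumes all prices are positive, which is what makes it safe to prune inside the recursion.

```python
import sqlite3

# Hypothetical in-memory copy of the Item table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?, ?)",
    [(1, "A", 1), (2, "B", 2), (3, "C", 3), (4, "D", 4), (5, "E", 5)],
)

QUERY = """
WITH RECURSIVE combo(max_id, total, names) AS (
    SELECT id, price, name FROM item WHERE price <= :limit
    UNION ALL
    -- Only extend with higher ids, so each set is built in one order only.
    SELECT i.id, c.total + i.price, c.names || ',' || i.name
    FROM combo AS c
    JOIN item AS i ON i.id > c.max_id
    -- Assumes positive prices: a set over the limit can never recover.
    WHERE c.total + i.price <= :limit
)
SELECT names, total FROM combo ORDER BY names
"""


def combinations_within(limit):
    """Return (names, total) for every item set whose total is <= limit."""
    return conn.execute(QUERY, {"limit": limit}).fetchall()


combos = combinations_within(10)
```

With the sample data and a limit of 10, combos holds 24 rows, and ('A,B,C,D', 10) appears exactly once because items are only ever appended in increasing id order.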

It's a good question, but as far as I know it's not possible with a set-based approach. It might be possible with some sort of CTE-based recursion, but if so, I can't think of how. It could be done with cursors, but not directly with "SQL select statements" (which I interpret to mean a set-based approach).
If you wanted to find all combinations of exactly two items whose sum was less than x, you could cross join the table to itself (eliminating rows where the ID was the same) and sum the two prices, excluding sums that were not less than x.
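The two-item case can be sketched as follows, using Python's sqlite3 module so it can be run anywhere. The table name item and the threshold x = 6 are assumptions for illustration. Note one deliberate change from the description: the join uses a.id < b.id instead of only excluding equal ids, so each pair appears once rather than twice.

```python
import sqlite3

# Hypothetical in-memory copy of the sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?, ?)",
    [(1, "A", 1), (2, "B", 2), (3, "C", 3), (4, "D", 4), (5, "E", 5)],
)

PAIRS = """
SELECT a.name, b.name, a.price + b.price AS total
FROM item AS a
JOIN item AS b ON a.id < b.id          -- self-join, each unordered pair once
WHERE a.price + b.price < :x           -- keep only sums below the threshold
ORDER BY a.id, b.id
"""

pairs = conn.execute(PAIRS, {"x": 6}).fetchall()
```

For the sample table this leaves four pairs: A+B, A+C, A+D and B+C.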

If you want combinations of 2 items, it is simple, just join the table with itself with no conditions (so you do the cartesian product) and then sum the price of the items in both tables (you can exclude the same item twice, if that is what you need).
It is not possible to generalize for N items in SQL by definition since it is not possible in relational algebra. You would need a query with N lines with N being the number of records, so you would need to know a max number of records in advance.
Also, the combinatorial of N elements (if that is what you need) is O(N!) so it grows really fast in both size and possibilities, so if you have too many elements and a price large enough you would soon face the limits of any possible computer.

Related

Combine values into string inside recursive CTE

I am using a recursive CTE to essentially navigate down a tree structure of id values. At each iteration of the recursion I would like to have a column that represents a string 'list' of all id values (nodes) that have been visited in the iteration steps so far. At first glance it seems like I would need a group by or aggregate function (like string_agg()) to accomplish this, but these are not allowed in the recursive part of a recursive CTE. My question seems to be similar to that found here recursive CTE to combine values, but I am hoping for a slightly more straightforward answer that will relate to the data I have added a sample of below. This first table is essentially the top of the tree. There could be one or multiple nodes at the top depending on how many records are in this first table. Here there is just 1 record:
name_id group_id
------- --------
100     15
where name_id you could think of as the name of the tree, which is essentially irrelevant to this example since there is only one name_id, and group_id is the top of the tree which will then branch out based on the following table:
group_id group_id_member
-------- ---------------
10       15
11       15
4        10
11       4
3        11
10       3
where the group_id_member is saying that, for example, the group_id from the first table, 15, is a member of the group_id 10 and also 11. So the tree would look like 15 at the top, with a branch down for 10 and another branch for 11. Then off of 10 comes a branch of 4, and off of 11 comes a branch of 3. Then off of 4 comes a branch of 11, and off of 3 comes a branch of 10. Thus, this is an infinite loop since those branches are already in the table.
Essentially, the goal of this recursive query is to return all the branch nodes but not return any of them more than once. This recursive cte will run forever unless we do a check, like checking whether or not the group_id value is already in rpath. To check that, I am wanting to create a column which is a list of all nodes that have gotten hit from current and previous iterations (rpath column), and append to that column at each iteration, making the result table look something like:
name_id group_id group_id_member iter rpath
------- -------- --------------- ---- ---------------------
100     15       null            1    /15/
100     10       15              2    /15/10/11/
100     11       15              2    /15/10/11/
100     4        10              3    /15/10/11/4/3/
100     3        11              3    /15/10/11/4/3/
100     10       3               4    /15/10/11/4/3/10/11/
100     11       4               4    /15/10/11/4/3/10/11/
Where I included the 4th iteration in the table, which shows that the group_id values 10 and 11 are now in the rpath two times since this loop has been fully completed. Ultimately I would want to terminate this recursion at the 3rd iteration since at the 4th, for both group_id values, they appear in the rpath list already.
So my primary question is, what is the best way to create this rpath string inside the recursive part of the cte? See below for code to create the temp tables, and my code which attempts to solve this but is returning an error since I am trying to use string_agg() in the recursive part of the cte which is not allowed. Thanks!
if object_id('tempdb..#t1') is not null drop table #t1
CREATE TABLE #t1 (group_id int, group_id_member int)
INSERT into #t1 VALUES
(10,15),
(11, 15),
(4, 10),
(11, 4),
(3, 11),
(10, 3);
if object_id('tempdb..#t2') is not null drop table #t2
CREATE TABLE #t2 (name_id int, group_id int)
INSERT into #t2 VALUES
(100, 15)
; with rec as (
select name_id,
group_id,
cast(null as int) as group_id_member,
1 as iter,
convert(varchar(128),concat('/', str_agg.agg, '/')) as rpath
from #t2
cross apply (select string_agg(group_id, '/') as agg from #t2) as str_agg -- this does work since it is not in the recursive part of the cte
union all
select rec.name_id,
t1.group_id,
t1.group_id_member,
iter + 1 as iter,
convert(varchar(128), concat(rec.rpath, str_agg1.agg1, '/')) as rpath -- trying to create the string that represents all of the nodes that have been visited
from #t1 t1
inner join rec
on t1.group_id_member = rec.group_id
cross apply (select string_agg(t1.group_id_member ,'/') as agg1 from #t1 t1 where t1.group_id_member = rec.group_id) str_agg1 -- cross apply to create the string_agg, if this was allowed in recursive cte...
where rec.rpath not like '%/' + convert(varchar(128), t1.group_id) + '/%'
)
select * from rec order by iter asc
At each iteration of the recursion I would like to have a column that represents a string 'list' of all id values (nodes) that have been visited in the iteration steps so far
As I understand your question, you don’t need aggregation here. The recursive query processes iteratively, so you can just accumulate the ids of the visited nodes along the way, using concat.
with rec as (
select
name_id,
group_id,
cast(null as int) as group_id_member,
1 as iter,
concat('/', convert(varchar(max), group_id)) as rpath
from #t2
union all
select
r.name_id,
t1.group_id,
t1.group_id_member,
iter + 1 as iter,
concat(r.rpath, '/', t1.group_id)
from #t1 t1
inner join rec r
on t1.group_id_member = r.group_id
where r.rpath not like concat('%/', t1.group_id, '/%')
)
select * from rec order by iter
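To see how accumulating the path behaves on the sample data, here is a rough sketch in Python's sqlite3 module. It is an adaptation, not the query above: it uses SQLite's || operator instead of concat, plain tables t1 and t2 instead of temp tables, and it keeps a trailing '/' on the path so the last node is also matched by the LIKE check. Each row carries the path of its own branch only, which is an assumption about what is wanted; it is not one global list of every node visited so far.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (group_id INTEGER, group_id_member INTEGER);
INSERT INTO t1 VALUES (10, 15), (11, 15), (4, 10), (11, 4), (3, 11), (10, 3);
CREATE TABLE t2 (name_id INTEGER, group_id INTEGER);
INSERT INTO t2 VALUES (100, 15);
""")

QUERY = """
WITH RECURSIVE rec(name_id, group_id, group_id_member, iter, rpath) AS (
    SELECT name_id, group_id, NULL, 1, '/' || group_id || '/'
    FROM t2
    UNION ALL
    -- Append the new node to the path inherited from the parent row.
    SELECT r.name_id, t1.group_id, t1.group_id_member, r.iter + 1,
           r.rpath || t1.group_id || '/'
    FROM t1
    JOIN rec AS r ON t1.group_id_member = r.group_id
    -- Stop a branch as soon as it would revisit a node already on its path.
    WHERE r.rpath NOT LIKE ('%/' || t1.group_id || '/%')
)
SELECT group_id, group_id_member, iter, rpath
FROM rec
ORDER BY iter, group_id
"""

rows = conn.execute(QUERY).fetchall()
```

On this data the recursion terminates by itself after the fifth iteration, because both branches (15, 10, 4, 11, 3 and 15, 11, 3, 10, 4) would next revisit a node that is already on their own path.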

Matching a set of child records between two similar table hierarchies

I have two similar table hierarchies:
Owner -> OwnerGroup -> Parent
and
Owner2 -> OwnerGroup2
I would like to determine if there is an exact match of Owners that exists in Owner2 based on a set of values. There are approximately a million rows in each Owner table. Some OwnerGroups contain up to 100 Owners.
So basically if there is an OwnerGroup that contains Owners "Smith, John" and "Smith, Jane", I want to know the ids of the OwnerGroup2s that are exact matches.
The first attempt at this was to generate a join per Owner (which required dynamic SQL to be generated in the application):
select og.id
from owner_group2 og
-- dynamic bit starts here
join owner2 o1 on
(og.id = o1.og_id) AND
(o1.given_names = 'JOHN' and o1.surname='SMITH')
-- dynamic bit ends here
join owner2 o2 on
(og.id = o2.og_id) AND
(o2.given_names = 'JANE' and o2.surname='SMITH');
This works fine for small numbers of owners, but when we have to deal with the scenario of 100 Owners in a group, the query plan contains 100 nested loops and takes almost a minute to run.
Another option I had was to use something around the intersect operator. E.g.
select * from (
select o1.surname, o1.given_names
from owner1 o1
join owner_group1 og1 on o1.og_id = og1.id
where
og1.parent_id = 1936233
)
intersect
select o2.surname, o2.given_names
from owner2 o2
join owner_group2 og2 on og2.id = o2.og_id;
I'm not sure how to suck out the owner2.id in this scenario either - and it was still running in the 4-5 second range.
I feel like I am missing something obvious - so please feel free to provide some better solutions!
You're on the right track with intersect, you just need to go a bit further. You need to join the results of it back to the owner_groups2 table to find the ids.
You can use the listagg function to convert the groups into comma-separated lists of the names (note - requires 11g). You can then take the intersection of these name lists to find the matches and join this back to the list in owner_groups2.
I've created a simplified example below, in it "Dave, Jill" is the group that is present in both tables.
create table grps (id integer, name varchar2(100));
create table grps2 (id integer, name varchar2(100));
insert into grps values (1, 'Dave');
insert into grps values(1, 'Jill');
insert into grps values (2, 'Barry');
insert into grps values(2, 'Jane');
insert into grps2 values(3, 'Dave');
insert into grps2 values(3, 'Jill');
insert into grps2 values(4, 'Barry');
with grp1 as (
SELECT id, listagg(name, ',') within group (order by name) n
FROM grps
group by id
), grp2 as (
SELECT id, listagg(name, ',') within group (order by name) n
FROM grps2
group by id
)
SELECT * FROM grp2
where n in (
-- find the duplicates
select n from grp1
intersect
select n from grp2
);
Note this will still require a full scan of owner_groups2; I can't think of a way you can avoid this. So your query is likely to remain slow.
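The idea behind the listagg approach (turn each group into one canonical, sorted value and compare those values) can be sketched outside the database as well. The following is a minimal illustration in plain Python, assuming the rows have already been fetched as (group id, name) pairs; the function name group_signatures is made up for this example.

```python
from collections import defaultdict


def group_signatures(rows):
    """Map each group id to a sorted tuple of its member names."""
    members = defaultdict(list)
    for group_id, name in rows:
        members[group_id].append(name)
    # Sorting plays the role of "within group (order by name)" in listagg.
    return {gid: tuple(sorted(names)) for gid, names in members.items()}


# Same sample data as the grps and grps2 tables above.
grps = [(1, "Dave"), (1, "Jill"), (2, "Barry"), (2, "Jane")]
grps2 = [(3, "Dave"), (3, "Jill"), (4, "Barry")]

wanted = set(group_signatures(grps).values())
matches = sorted(
    gid for gid, sig in group_signatures(grps2).items() if sig in wanted
)
```

Here matches contains only group 3, the "Dave, Jill" group. Group 4 is not reported because "Barry" alone is not the same set as "Barry, Jane".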

Ordering parent rows by date descending with child rows ordered independently beneath each

This is a contrived version of my table schema to illustrate my problem:
QuoteID, Details, DateCreated, ModelQuoteID
Where QuoteID is the primary key and ModelQuoteID is a nullable foreign key back onto this table to represent a quote which has been modelled off another quote (and may have subsequently had its Details column etc changed).
I need to return a list of quotes ordered by DateCreated descending with the exception of modelled quotes, which should sit beneath their parent quote, ordered by date descending within any other sibling quotes (quotes can only be modelled one level deep).
So for example if I have these 5 quote rows:
1, 'Fix the roof', '01/01/2012', null
2, 'Clean the drains', '02/02/2012', null
3, 'Fix the roof and door', '03/03/2012', 1
4, 'Fix the roof, door and window', '04/04/2012', 1
5, 'Mow the lawn', '05/05/2012', null
Then I need to get the results back in this order:
5 - Mow the lawn
2 - Clean the drains
1 - Fix the roof
4 - -> Fix the roof, door and window
3 - -> Fix the roof and door
I'm also passing in search criteria such as keywords for Details, and I'm returning modelled quotes even if they don't contain the search term but their parent quote does. I've got that part working using a common table expression to get the original quotes, unioned with a join for modelled ones.
That works nicely but currently I'm having to do the rearrangement of the modelled quotes into the correct order in code. That's not ideal because my next step is to implement paging in the SQL, and if the rows are not grouped properly at that time then I won't have the children present in the current page to do the re-ordering in code. Generally speaking they will be naturally grouped together anyway, but not always. You could create a model quote today for a quote from a month back.
I've spent quite some time on this, can any SQL gurus help? Much appreciated.
EDIT: Here is a contrived version of my SQL to fit my contrived example :-)
;with originals as (
    select
        q.*
    from
        Quote q
    where
        Details like @details
)
select
    *
from
(
    select
        o.*
    from
        originals o
    union
    select
        q2.*
    from
        Quote q2
    join
        originals o on q2.ModelQuoteID = o.QuoteID
)
as combined
order by
    combined.DateCreated desc
Watching the Olympics -- just skimmed your post -- looks like you want to control the sort at each level (root and one level in), and make sure the data is returned with the children directly beneath its parent (so you can page the data...). We do this all the time. You can add an order by to each inner query and create a sort column. I contrived a slightly different example that should be easy for you to apply to your circumstance. I sorted the root ascending and level one descending just to illustrate how you can control each part.
declare @tbl table (id int, parent int, name varchar(10))
insert into @tbl (id, parent, name)
values (1, null, 'def'), (2, 1, 'this'), (3, 1, 'is'), (4, 1, 'a'), (5, 1, 'test'),
(6, null, 'abc'), (7, 6, 'this'), (8, 6, 'is'), (9, 6, 'another'), (10, 6, 'test')
;with cte (id, parent, name, sort) as (
    -- root rows: a zero-padded position, ascending by name
    select id, parent, name, cast(right('0000' + cast(row_number() over (order by name) as varchar(4)), 4) as varchar(1024))
    from @tbl
    where parent is null
    union all
    -- child rows: the parent's sort key followed by their own position, descending by name
    select t.id, t.parent, t.name, cast(cte.sort + right('0000' + cast(row_number() over (order by t.name desc) as varchar(4)), 4) as varchar(1024))
    from @tbl t inner join cte on t.parent = cte.id
)
select * from cte
order by sort
This produces these results:
id parent name sort
---- -------- ------- ----------
6 NULL abc 0001
7 6 this 00010001
10 6 test 00010002
8 6 is 00010003
9 6 another 00010004
1 NULL def 0002
2 1 this 00020001
5 1 test 00020002
3 1 is 00020003
4 1 a 00020004
You can see that the root nodes are sorted ascending and the inner nodes are sorted descending.
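Applied to the quote example from the question, a simpler variant is possible because quotes are only modelled one level deep: sort every row by its parent's date first and then place the parent above its children. The sketch below is an alternative to the sort-column technique, not a translation of it. It runs in Python's sqlite3 module and assumes a hypothetical quote table with snake_case column names and ISO-formatted date strings (so that text comparison sorts chronologically).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quote (quote_id INTEGER PRIMARY KEY, details TEXT, "
    "date_created TEXT, model_quote_id INTEGER)"
)
conn.executemany(
    "INSERT INTO quote VALUES (?, ?, ?, ?)",
    [
        (1, "Fix the roof", "2012-01-01", None),
        (2, "Clean the drains", "2012-02-02", None),
        (3, "Fix the roof and door", "2012-03-03", 1),
        (4, "Fix the roof, door and window", "2012-04-04", 1),
        (5, "Mow the lawn", "2012-05-05", None),
    ],
)

ORDERED = """
SELECT q.quote_id, q.details
FROM quote AS q
LEFT JOIN quote AS p ON p.quote_id = q.model_quote_id
ORDER BY
    COALESCE(p.date_created, q.date_created) DESC,  -- whole family uses the parent's date
    COALESCE(p.quote_id, q.quote_id),               -- keeps families apart on equal dates
    q.model_quote_id IS NOT NULL,                   -- parent (0) before its children (1)
    q.date_created DESC                             -- children newest first
"""

ordered_ids = [row[0] for row in conn.execute(ORDERED)]
```

For the five sample rows this yields the order 5, 2, 1, 4, 3, which is the order asked for. Because the grouping is done entirely in the ORDER BY, paging can be applied on top of it.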

How to select only one full row per group in a "group by" query?

In SQL Server, I have a table where a column A stores some data. This data can contain duplicates (i.e. two or more rows will have the same value for the column A).
I can easily find the duplicates by doing:
select A, count(A) as CountDuplicates
from TableName
group by A having (count(A) > 1)
Now, I want to retrieve the values of other columns, let's say B and C. Of course, those B and C values can be different even for the rows sharing the same A value, but it doesn't matter for me. I just want any B value and any C one, the first, the last or the random one.
If I had a small table and one or two columns to retrieve, I would do something like:
select A, count(A) as CountDuplicates, (
select top 1 child.B from TableName as child where child.A = base.A
) as B
from TableName as base group by A having (count(A) > 1)
The problem is that I have many more columns to retrieve, and the table is quite big, so having several child selects will have a high performance cost.
So, is there a less ugly pure SQL solution to do this?
Not sure if my question is clear enough, so I give an example based on AdventureWorks database. Let's say I want to list available States, and for each State, get its code, a city (any city) and an address (any address). The easiest, and the most inefficient way to do it would be:
var q = from c in data.StateProvinces select new { c.StateProvinceCode, c.Addresses.First().City, c.Addresses.First().AddressLine1 };
in LINQ-to-SQL, which will do two selects for each of 181 States, so 363 selects. In my case, I am searching for a way to have a maximum of 182 selects.
The ROW_NUMBER function in a CTE is the way to do this. For example:
DECLARE @mytab TABLE (A INT, B INT, C INT)
INSERT INTO @mytab ( A, B, C ) VALUES (1, 1, 1)
INSERT INTO @mytab ( A, B, C ) VALUES (1, 1, 2)
INSERT INTO @mytab ( A, B, C ) VALUES (1, 2, 1)
INSERT INTO @mytab ( A, B, C ) VALUES (1, 3, 1)
INSERT INTO @mytab ( A, B, C ) VALUES (2, 2, 2)
INSERT INTO @mytab ( A, B, C ) VALUES (3, 3, 1)
INSERT INTO @mytab ( A, B, C ) VALUES (3, 3, 2)
INSERT INTO @mytab ( A, B, C ) VALUES (3, 3, 3)
;WITH numbered AS
(
    -- number the rows inside each A group
    SELECT *, rn=ROW_NUMBER() OVER (PARTITION BY A ORDER BY B, C)
    FROM @mytab AS m
)
SELECT *
FROM numbered
WHERE rn=1
As I mentioned in my comment to HLGEM and Philip Kelley, their simple use of an aggregate function does not necessarily return one "solid" record for each A group; instead, it may return column values from many separate rows, all stitched together as if they were a single record. For example, if this were a PERSON table, with the PersonID being the "A" column, and distinct contact records (say, Home and Work), you might wind up returning the person's home city, but their office ZIP code -- and that's clearly asking for trouble.
The use of the ROW_NUMBER, in conjunction with a CTE here, is a little difficult to get used to at first because the syntax is awkward. But it's becoming a pretty common pattern, so it's good to get to know it.
In my sample I've defined a CTE that tacks on an extra column rn (standing for "row number") to the table, numbered separately within each value of the A column. A SELECT on that result, filtering to only those having a row number of 1 (i.e., the first record found for that value of A), returns a "solid" record for each A group -- in my example above, you'd be certain to get either the Work or Home address, but not elements of both mixed together.
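As a quick way to check the pattern, here is a sketch that runs it against the same sample rows through Python's sqlite3 module. It assumes a SQLite build with window function support (3.25 or later) and a plain table named mytab. The COUNT(*) OVER column is an addition that is not in the query above; it is there only to carry the duplicate count from the original question along with the chosen row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytab (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO mytab VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 3, 1),
     (2, 2, 2), (3, 3, 1), (3, 3, 2), (3, 3, 3)],
)

FIRST_PER_GROUP = """
WITH numbered AS (
    SELECT a, b, c,
           ROW_NUMBER() OVER (PARTITION BY a ORDER BY b, c) AS rn,
           COUNT(*) OVER (PARTITION BY a) AS count_duplicates
    FROM mytab
)
SELECT a, b, c, count_duplicates
FROM numbered
WHERE rn = 1                 -- one whole row per value of a
  AND count_duplicates > 1   -- only values of a that are duplicated
ORDER BY a
"""

first_rows = conn.execute(FIRST_PER_GROUP).fetchall()
```

Each returned row comes from a single source row, so B and C always belong together. The group with A = 2 is dropped because it has no duplicates.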
It concerns me that you want any old value for fields b and c. If they are to be meaningless why are you returning them?
If it truly doesn't matter (and I honestly can't imagine a case where I would ever want this, but it's what you said) and the values for b and c don't even have to be from the same record, group by with the use of min or max is the way to go. It's more complicated if you want the values for a particular record for all fields.
select A, count(A) as CountDuplicates, min(B) as B , min(C) as C
from TableName as base
group by A
having (count(A) > 1)
You can do something like this if your table has id as its primary key:
select t.id, t.A, t.B, t.C
from TableName as t
inner join
(
select A, min(id) as id, count(A) as CountDuplicates
from TableName
group by A having (count(A) > 1)
) d on t.id = d.id

Determine the hierarchy of records in a SQL database

I've got a problem I was wondering if there's an elegant solution to. It is a real business problem and not a class assignment!
I have a table with thousands of records, some of which are groups related to each other.
The database is SQL Server 2005.
ID is the primary key. If the record replaced an earlier record, the ID of that record is in the REP_ID column.
ID REP_ID
E D
D B
C B
B A
A NULL
So in this example, A was the original row, B replaced A, C replaced B unsuccessfully, D replaced B successfully and finally E replaced D.
I'd like to be able to display all the records in this table in a grid.
Then, I'd like for the user to be able to right click any record in any
group, and for the system to locate all the related records and display them
in some sort of tree.
Now I can obviously brute force a solution to this but I'd like to ask the
community if they can see a more elegant answer.
It's a recursive CTE you need, something like (untested)
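;WITH myCTE AS
(
    -- anchor: the original records, which replaced nothing
    SELECT
        ID
    FROM
        myTable
    WHERE
        REP_ID IS NULL
    UNION ALL
    -- recursion: records that replaced a record already found
    SELECT
        T.ID
    FROM
        myTable T
    JOIN
        myCTE C ON T.REP_ID = C.ID
)
SELECT
    *
FROM
    myCTE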
However, there are two links into B: C->B and D->B.
So do you want C->B, D->B, or both?
Do you want a ranking?
etc?
Use a CTE to build your hierarchy. Something like
CREATE TABLE #test(ID CHAR(1), REP_ID CHAR(1) NULL)
INSERT INTO #test VALUES('E','D')
INSERT INTO #test VALUES('D','B')
INSERT INTO #test VALUES('C','B')
INSERT INTO #test VALUES('B','A')
INSERT INTO #test VALUES('A',NULL)
;WITH tree( ID,
REP_ID,
Depth
)
AS
(
SELECT
ID,
REP_ID,
1 AS [Depth]
FROM
#test
WHERE
REP_ID IS NULL
UNION ALL
SELECT
[test].ID,
[test].REP_ID,
tree.[Depth] + 1 AS [Depth]
FROM
#test [test]
INNER JOIN
tree
ON
[test].REP_ID = tree.ID
)
SELECT * FROM tree
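Both queries above start from the original record (REP_ID IS NULL). Since the question asks to start from whichever record the user right-clicks, one possible extension is to first climb from that record to its original and then descend again. The sketch below shows this with Python's sqlite3 module; it uses SQLite syntax rather than T-SQL, a plain table named test, a hypothetical :start parameter, and it assumes the replacement links contain no cycles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id TEXT PRIMARY KEY, rep_id TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [("E", "D"), ("D", "B"), ("C", "B"), ("B", "A"), ("A", None)],
)

RELATED = """
WITH RECURSIVE
up(id, rep_id) AS (              -- climb from the chosen record to the original
    SELECT id, rep_id FROM test WHERE id = :start
    UNION ALL
    SELECT t.id, t.rep_id FROM test AS t JOIN up ON t.id = up.rep_id
),
tree(id, rep_id, depth) AS (     -- then descend from that original
    SELECT id, rep_id, 1 FROM up WHERE rep_id IS NULL
    UNION ALL
    SELECT t.id, t.rep_id, tree.depth + 1
    FROM test AS t JOIN tree ON t.rep_id = tree.id
)
SELECT id, rep_id, depth FROM tree ORDER BY depth, id
"""


def related_records(start):
    """Return every record in the same replacement group as start."""
    return conn.execute(RELATED, {"start": start}).fetchall()


group_of_c = related_records("C")
```

Starting from C (or from any other record in the sample) returns all five rows with their depth: A at depth 1, B at 2, C and D at 3, and E at 4.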
You probably already considered it but have you looked into simply adding a column to store the "original_id"? That'd make your queries lightning fast compared to building a tree of who inherited from whom.
Barring that, just google for "SQL tree DFS".
Just make sure you have an optimization for your DFS as follows: if you know most records only have <=3 revisions, you can start with a 3-way join to find A, B and C right away.