How to select group of distinct rows from the table - sql

Let's assume a table has the following rows:
ID Name Value
1 Apple Red
1 Taste Sour
2 Apple Yellow
2 Taste Sweet
3 Apple Red
3 Taste Sour
4 Apple Green
4 Taste Tart
5 Apple Yellow
5 Taste Sweet
How can I select the IDs corresponding to distinct combinations of Apple and Taste? For example, ID=1 corresponds to a red sour apple, so ID=3 can be omitted from the query result. Similarly, ID=2 is a yellow sweet apple, so ID=5 can be excluded from the query result, and so on. A valid query result can be any of the following ID sets: (1,2,4), (1,4,5), (2,3,4), etc.

The query or the model could be improved with a better understanding of the problem.
But assuming the model is correct and the problem is as presented, this would be my quick approach:
SELECT MIN(a.ID) as ID
FROM Table a
INNER JOIN Table b ON a.ID = b.ID AND a.Name > b.Name
GROUP BY a.Value, b.Value
This query joins the table with itself on the ID. On its own, that join would give four lines per ID (e.g. Apple-Apple, Taste-Taste, Apple-Taste and Taste-Apple), so it is not enough to state that the two names are different (you would still have both Apple-Taste and Taste-Apple); you need to state that one is greater than the other. That way Taste always ends up on one side of the join and Apple on the other. That's why there is the a.Name > b.Name.
You then group by both values, so that each combination of Apple value and Taste value appears only once. This results in only three lines.
The SELECT may depend on the RDBMS (I used SQL Server syntax); here it picks the lowest ID in each group. Since you don't care which ID is returned, you could use MIN or MAX. MIN results in lines with 1, 2, 4. MAX would result in 3, 4, 5.
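To check the MIN and MAX results quoted above, here is a minimal sketch that runs the same self-join against an in-memory SQLite database from Python. The table name fruit and the helper distinct_ids are assumptions made for this illustration; the original answer used a placeholder table name and SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "fruit" is a hypothetical table name standing in for the answer's placeholder.
conn.execute("CREATE TABLE fruit (ID INTEGER, Name TEXT, Value TEXT)")
conn.executemany(
    "INSERT INTO fruit VALUES (?, ?, ?)",
    [
        (1, "Apple", "Red"), (1, "Taste", "Sour"),
        (2, "Apple", "Yellow"), (2, "Taste", "Sweet"),
        (3, "Apple", "Red"), (3, "Taste", "Sour"),
        (4, "Apple", "Green"), (4, "Taste", "Tart"),
        (5, "Apple", "Yellow"), (5, "Taste", "Sweet"),
    ],
)


def distinct_ids(aggregate):
    """Return one ID per (Taste, Apple) combination; aggregate is 'MIN' or 'MAX'."""
    sql = f"""
        SELECT {aggregate}(a.ID) AS ID
        FROM fruit a
        INNER JOIN fruit b ON a.ID = b.ID AND a.Name > b.Name
        GROUP BY a.Value, b.Value
    """
    return sorted(row[0] for row in conn.execute(sql))


min_ids = distinct_ids("MIN")
max_ids = distinct_ids("MAX")
print(min_ids)  # [1, 2, 4]
print(max_ids)  # [3, 4, 5]
```

Because 'Taste' sorts after 'Apple', alias a always holds the Taste row and alias b the Apple row, which is what makes the GROUP BY pair the two values consistently.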

Related

Getting ranges of arbitrary strings in SQL based on sequence dictated in a separate table

Consider the following dataset (may look weird but want to land my point that the strings are arbitrary):
Table A
TicketId StartAnimal EndAnimal
1 Monkey Bee
1 Lion Buffalo
Table B
Animal Sequence
Monkey 1
Zebra 2
Bee 3
Turtle 4
Lion 5
Buffalo 6
Is it possible to retrieve the animals that correspond to Ticket ID 1 based on the different "ranges" in each of its rows? For example, for Ticket ID 1 the following animals should be retrieved: Monkey, Zebra, Bee, Lion, Buffalo.
As you can see, the animal strings themselves have no inherent order, but the sequence can be leveraged for it. I just can't work out how to reference it for each row in a single query.
Edit
As an edge case, sometimes the EndAnimal might not even have a sequence to start with, in which case only the StartAnimal should be returned. As an example, assuming Bee is not in the sequence table, we should only get Monkey, Lion and Buffalo. Is that something SQL can handle?
Thanks!
There are numerous ways; one is to join the tables to find the corresponding start and end sequences and then select the rows that fall within each range:
with s as (
select bs.Sequence s1, IsNull(be.Sequence, bs.Sequence) s2, a.ticketId -- no end sequence: fall back to the start sequence
from a left join b bs on bs.animal = a.StartAnimal
left join b be on be.Animal = a.EndAnimal
)
select b.Animal
from b
join s on b.Sequence >=s1 and b.Sequence <= s2
where s.ticketId = 1
order by b.Sequence;
Example Fiddle
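The range lookup and the missing-EndAnimal edge case can be tried out with a small sketch like the one below, which uses Python's sqlite3 module rather than SQL Server. It assumes the tables are named a and b as in the answer, uses COALESCE in place of IsNull, and falls back to the start animal's own sequence when the end animal has no entry; that fallback is one reading of the edge case described in the question, not a confirmed requirement. The helper animals_for is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (TicketId INTEGER, StartAnimal TEXT, EndAnimal TEXT);
    CREATE TABLE b (Animal TEXT, Sequence INTEGER);
    INSERT INTO a VALUES (1, 'Monkey', 'Bee'), (1, 'Lion', 'Buffalo');
    INSERT INTO b VALUES ('Monkey', 1), ('Zebra', 2), ('Bee', 3),
                         ('Turtle', 4), ('Lion', 5), ('Buffalo', 6);
""")

QUERY = """
    WITH s AS (
        SELECT bs.Sequence AS s1,
               -- assumption: an unknown end animal collapses the range to its start
               COALESCE(be.Sequence, bs.Sequence) AS s2,
               a.TicketId
        FROM a
        LEFT JOIN b bs ON bs.Animal = a.StartAnimal
        LEFT JOIN b be ON be.Animal = a.EndAnimal
    )
    SELECT b.Animal
    FROM b
    JOIN s ON b.Sequence BETWEEN s.s1 AND s.s2
    WHERE s.TicketId = ?
    ORDER BY b.Sequence
"""


def animals_for(ticket_id):
    """Return the animals covered by every range of the given ticket."""
    return [row[0] for row in conn.execute(QUERY, (ticket_id,))]


full = animals_for(1)
print(full)  # ['Monkey', 'Zebra', 'Bee', 'Lion', 'Buffalo']

# Edge case from the question: Bee is missing from the sequence table.
conn.execute("DELETE FROM b WHERE Animal = 'Bee'")
without_bee = animals_for(1)
print(without_bee)  # ['Monkey', 'Lion', 'Buffalo']
```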

SQL get first record in each of a list of records

HELP! Kind of new to SQL. I've been working with simple statements for a few years but I need a little advanced help. I know it can be done and will save me time.
Here is my example to try to find results:
select top 1 apples, color from fruits
where apples in ('gala', 'fuji', 'granny')
and (inStock is not null and inStock <> '')
In the above query I would get the first color in 'gala' apples and that's it. What I want is the first color in 'gala', the first in 'fuji', the first in 'granny' and so on.
InStock isn't as important - it's just an additional filter in the search results.
What I want is a two column list. Left Column being apple types and right column being the first color result for each apple type.
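You can use the row_number() window function to number the colors of each apple type in a specific order, then keep the row numbered 1 in each group. Note that ordering by apples inside a partition by apples does not define which row comes first; replace it with a column that does, if you have one.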
with cte as
(
select apples, color, row_number() over (partition by apples order by apples) as rn from fruits
where apples in ('gala', 'fuji', 'granny')
and (inStock is not null and inStock <> '')
)
select apples, color from cte where rn=1
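As a rough illustration of the row_number() approach, the sketch below runs it in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). The id column and the sample rows are assumptions added so that "first" has a defined meaning; the original table has no such column that we know of.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- id is a hypothetical column that fixes the insertion order
    CREATE TABLE fruits (id INTEGER PRIMARY KEY, apples TEXT, color TEXT, inStock TEXT);
    INSERT INTO fruits (apples, color, inStock) VALUES
        ('gala', 'red', 'yes'),
        ('gala', 'pink', 'yes'),
        ('fuji', 'yellow', ''),
        ('fuji', 'red', 'yes'),
        ('granny', 'green', 'yes'),
        ('honeycrisp', 'red', 'yes');
""")

QUERY = """
    WITH cte AS (
        SELECT apples, color,
               ROW_NUMBER() OVER (PARTITION BY apples ORDER BY id) AS rn
        FROM fruits
        WHERE apples IN ('gala', 'fuji', 'granny')
          AND inStock IS NOT NULL AND inStock <> ''
    )
    SELECT apples, color FROM cte WHERE rn = 1 ORDER BY apples
"""

first_colors = conn.execute(QUERY).fetchall()
print(first_colors)  # [('fuji', 'red'), ('gala', 'red'), ('granny', 'green')]
```

The yellow fuji row is skipped because its inStock value is empty, so the filter is applied before the rows are numbered.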
I think one issue you might have here is the concept of "first". A color is a categorical variable and tables don't typically attach meaning to a "first" or "last" value with a few exceptions. If you're dead set on returning the first row for each fruit, one easy way to get the result utilizes union all.
SELECT top 1 apples, color from fruits where apples = 'gala'
UNION ALL
SELECT top 1 apples, color from fruits where apples = 'fuji'
UNION ALL
SELECT top 1 apples, color from fruits where apples = 'granny'

How do I SELECT minimum set of rows to cover all possible values of each columns in SQL?

I am running a SQL query to get data from a table in order to map all the different possible values of the categories represented by each column.
How do I run the SELECT query such that it returns the minimum number of rows just enough to include all possible values of all columns?
For example, if I have a table of 10 rows and 3 columns, each column containing 3 possible values:
TABLE sales
--------------------------------
brandID color size
--------------------------------
2 red big
3 blue big
2 blue big
2 red small
2 blue medium
3 green small
3 red big
1 green medium
2 red medium
2 blue big
Of course I could SELECT all rows from table without filter, but that would be an expensive query of 10 rows.
However, as you can see, if we filter the SELECT query to return only the rows below, it is possible to cover all the possible values of all columns:
1,2,3 for brandID
red,blue,green for color
big,small,medium for size
--------------------------------
brandID color size
--------------------------------
3 blue big
2 red small
1 green medium
How do I do that in SQL query?
This one does what you expect:
select b.brandid, c.color, s.size
from (
select brandid, row_number() over (order by brandid) as rn
from sales
group by brandid
) b
full join (
select color, row_number() over (order by color) as rn
from sales
group by color
) c on b.rn = c.rn
full join (
select size, row_number() over (order by size) as rn
from sales
group by size
) s on b.rn = s.rn;
Online example: https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=e72e7d1dfed43825025c5703b5d3671a
But this only works properly if you have the same number of (distinct) brands, colors and sizes. If you have e.g. 5 brands, 6 colors and 7 sizes, the result is rather "strange":
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=4417a4d97ecf7601364f09d65f6522fa
First, a query that returns ten rows is not "expensive".
Second, this is a very hard problem. It involves looking at combinations of rows to see whether the set contains every value of every column. I suspect that any algorithm will need to basically search through all possible combinations -- although there may be some efficiencies, such as automatically including all rows with a unique value in any column.
As a hard problem involving comparing zillions of sets, SQL is not really an appropriate language for addressing the issue.
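To make the "search through all possible combinations" point concrete, here is a brute-force sketch in Python rather than SQL. It tries subsets of increasing size and stops at the first one that contains every value of every column. The function names are hypothetical, and the approach is exponential in the number of rows, so it is only workable for tiny tables like the example.

```python
from itertools import combinations

rows = [
    (2, "red", "big"), (3, "blue", "big"), (2, "blue", "big"),
    (2, "red", "small"), (2, "blue", "medium"), (3, "green", "small"),
    (3, "red", "big"), (1, "green", "medium"), (2, "red", "medium"),
    (2, "blue", "big"),
]


def covers(subset, targets):
    """True when the subset contains every distinct value of every column."""
    return all(
        {row[i] for row in subset} == target
        for i, target in enumerate(targets)
    )


def minimal_cover(rows):
    """Return a smallest set of rows covering all column values (brute force)."""
    unique_rows = sorted(set(rows))  # duplicate rows never help
    targets = [set(column) for column in zip(*unique_rows)]
    for size in range(1, len(unique_rows) + 1):
        for subset in combinations(unique_rows, size):
            if covers(subset, targets):
                return list(subset)
    return []


cover = minimal_cover(rows)
print(len(cover))  # 3
```

With three distinct values per column no cover can have fewer than three rows, and the question itself shows a three-row cover, so three is the minimum for this data.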
This is a rather weird requirement... But you might try something along this:
DECLARE @sales TABLE(BrandID INT, color VARCHAR(10),size VARCHAR(10));
INSERT INTO @sales VALUES
(2,'red', 'big'),
(3,'blue', 'big'),
(2,'blue', 'big'),
(2,'red', 'small'),
(2,'blue', 'medium'),
(3,'green', 'small'),
(3,'red', 'big'),
(1,'green', 'medium'),
(2,'red', 'medium'),
(2,'blue', 'big');
WITH AllBrands AS (SELECT ROW_NUMBER() OVER(ORDER BY BrandID) AS RowInx, BrandID FROM @sales GROUP BY BrandID)
,AllColors AS (SELECT ROW_NUMBER() OVER(ORDER BY color) AS RowInx, color FROM @sales GROUP BY color)
,AllSizes AS (SELECT ROW_NUMBER() OVER(ORDER BY size) AS RowInx, size FROM @sales GROUP BY size)
SELECT COALESCE(b.RowInx,c.RowInx,s.RowInx) AS RowInx
,b.BrandID
,c.color
,s.size
FROM AllBrands b
FULL OUTER JOIN AllColors c ON COALESCE(b.RowInx,c.RowInx)=c.RowInx
FULL OUTER JOIN AllSizes s ON COALESCE(b.RowInx,c.RowInx,s.RowInx)=s.RowInx;
This solution is similar to @a_horse_with_no_name's, but avoids gaps in the result in case of unequal counts of values per column.
The idea in short:
We create a numbered set of all distinct values per column and join all sets on this number. As we don't know the count in advance I use COALESCE to pick the first value, which is not null.
This is not a good problem if you demand ONE AND ONLY ONE query and ONE AND ONLY ONE of each result set, and ONE AND ONLY ONE instance of each result. As Gordon Linoff accurately put it: that is not a problem for SQL. I get that maybe you have a MUCH larger table, but he's absolutely right.
But add another layer, and you can have exactly what you want, with all the efficiency you want, and a readable output. Use a cursor and some basic SELECT from dynamic SQL with a SELECT columns.name from sys.tables JOIN sys.columns ON tables.object_id = columns.object_id, if you absolutely have to do this with TSQL alone.
And if you're willing to build a basic application with any framework with a SQL driver, you can just SELECT DISTINCT FROM < and put the various results into arrays.
Alternatively: reword your question, with the understanding that the results of any SQL query are gonna be x rows by x columns. Not an array for each column.
I think your example confuses things by having exactly 3 values for each field, which makes the requested result seem like a reasonable thing to expect. But what happens when two more brands are added, or a new colour? Then what would you expect to be returned?
Really you are asking three questions, so I feel this should be done as three queries:
"What are the different brands?"
"What are the different colours?"
"What are the different sizes?"
If they need to be displayed in a neat table, stitch them together afterwards in your application layer. You could maybe do it in the SQL with something like what a_horse_with_no_name suggests, but really it's the wrong place.
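A minimal sketch of the three-queries idea, assuming the application layer is Python with the standard sqlite3 module (the question does not say which stack is in use). The helper distinct_values is hypothetical; zip_longest pads the shorter lists with None when the columns have different numbers of distinct values.

```python
import sqlite3
from itertools import zip_longest

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (brandID INTEGER, color TEXT, size TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (2, "red", "big"), (3, "blue", "big"), (2, "blue", "big"),
        (2, "red", "small"), (2, "blue", "medium"), (3, "green", "small"),
        (3, "red", "big"), (1, "green", "medium"), (2, "red", "medium"),
        (2, "blue", "big"),
    ],
)

COLUMNS = ("brandID", "color", "size")


def distinct_values(column):
    """One query per question: 'what are the different values of this column?'"""
    if column not in COLUMNS:  # only fixed, known column names are interpolated
        raise ValueError(column)
    sql = f"SELECT DISTINCT {column} FROM sales ORDER BY {column}"
    return [row[0] for row in conn.execute(sql)]


values = {column: distinct_values(column) for column in COLUMNS}

# Stitch the three lists side by side for display only.
table = list(zip_longest(*(values[column] for column in COLUMNS)))
print(table)  # [(1, 'blue', 'big'), (2, 'green', 'medium'), (3, 'red', 'small')]
```

The rows of the stitched table are not rows of sales; they are just the three value lists laid out next to each other.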

Without using conjunctions in conditions of selection operators

Let's say there is a table called ITEM and it contains 3 attributes (name, id, price):
name id price
Apple 1 3
Orange 1 3
Banana 2 4
Cherry 3 5
Mango 1 3
How should I write a query that uses a constants selection operator to select those items that have the same price and the same id? The first thing that comes to mind is to use a rename operator to rename id to id' and price to price', then union it with the ITEM table. But since I need to check two conditions (price=price' and id=id'), how can I select them without using the conjunction operator in relational algebra?
Thank you.
I'm not quite sure but for me, it would be something like this in relational calculus:
and then in SQL:
SELECT i.name
FROM ITEM i
WHERE EXISTS (
    -- another item with a different name but the same price and id
    SELECT 1
    FROM ITEM u
    WHERE u.name <> i.name
      AND u.price = i.price
      AND u.id = i.id
);
But still, I think your assumption is right, you can still do it by renaming. I do believe it is a bit longer than what I did above.
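One possible reading of the renaming idea, sketched in SQL run through Python's sqlite3 module: rename only name, join the two copies of ITEM on their shared attributes id and price, and then apply a single selection condition on the names. In SQL this is a JOIN ... USING (id, price), so the equality on both attributes comes from the join rather than from a conjunction in the selection. This is an interpretation of the question, not something the original answer spelled out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ITEM (name TEXT, id INTEGER, price INTEGER);
    INSERT INTO ITEM VALUES
        ('Apple', 1, 3), ('Orange', 1, 3), ('Banana', 2, 4),
        ('Cherry', 3, 5), ('Mango', 1, 3);
""")

QUERY = """
    SELECT DISTINCT i.name
    FROM ITEM i
    JOIN ITEM u USING (id, price)   -- equality on id and price comes from the join
    WHERE i.name <> u.name          -- single selection condition, no AND
    ORDER BY i.name
"""

names = [row[0] for row in conn.execute(QUERY)]
print(names)  # ['Apple', 'Mango', 'Orange']
```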

Working with sets of rows in (My)SQL and comparing values

I am trying to figure out the SQL for doing some relatively simple operations on sets of records in a table but I am stuck. Consider a table with multiple rows per item, all identified by a common key.
For example:
serial model color
XX1 A blue
XX2 A blue
XX3 A green
XX5 B red
XX6 B blue
XX1 B blue
What I would want to do, for example, is:
Assuming that all model A rows must have the same color, find the rows which don't (for example, XX3 is green).
Assuming that a given serial number can only point to a single model, find the rows where that is not the case (for example, XX1 points to both A and B).
These are all logically simple things to do. To abstract it, I want to know how to group things using a single key (or combination of keys) and then compare the values of those records.
Should I use a join on the same table? Should I use some sort of array or similar?
Thanks for your help
For 1:
SELECT model, color, COUNT(*) AS num FROM yourTable GROUP BY model, color;
This will give you a list of each model and each color for that model along with the count. So the output from your dataset would be:
model color num
A blue 2
A green 1
B red 1
B blue 2
From this output you can easily see what's incorrect and fix it using an UPDATE statement or do a blanket operation where you assign the most popular color to each model.
For 2:
SELECT serial, COUNT(*) AS num FROM yourTable GROUP BY serial HAVING num > 1
The output for this would be:
serial num
XX1 2
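Both checks can be tried against the sample data with a short sketch like this one, which uses Python's sqlite3 module instead of MySQL. The second query is a slight variation on the one above: it counts distinct models per serial rather than rows, on the assumption that an exact duplicate row should not be reported as a conflict.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (serial TEXT, model TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [
        ("XX1", "A", "blue"), ("XX2", "A", "blue"), ("XX3", "A", "green"),
        ("XX5", "B", "red"), ("XX6", "B", "blue"), ("XX1", "B", "blue"),
    ],
)

# Check 1: how often each color occurs per model.
color_counts = conn.execute("""
    SELECT model, color, COUNT(*) AS num
    FROM yourTable
    GROUP BY model, color
    ORDER BY model, color
""").fetchall()
print(color_counts)
# [('A', 'blue', 2), ('A', 'green', 1), ('B', 'blue', 2), ('B', 'red', 1)]

# Check 2 (variation): serials that point to more than one distinct model.
conflicting_serials = conn.execute("""
    SELECT serial, COUNT(DISTINCT model) AS num
    FROM yourTable
    GROUP BY serial
    HAVING COUNT(DISTINCT model) > 1
""").fetchall()
print(conflicting_serials)  # [('XX1', 2)]
```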
To address #1, I would use a subquery against the same table (close to the join on the same table that you mention).
For example,
select *
from mytable
where serial in (select min(serial) -- each group holds one row, so min() just returns its serial
from mytable
group by model, color
having count(*) = 1)
would find all the serial numbers whose model and color combination occurs only once. I did not test this, but I hope you see what it does. The inner select finds the model and color combinations that occur only once, then the outer select shows all the detail for those serials.
Of course, having said that, this is a poor table design. But I don't think that was your question. And I hope this was a made up example for a real situation. My concern would be that there is no reason to assume that the single occurrence is actually bad -- it could be that there are 10 records, all of which have a distinct color. This approach would tell you that all of them are wrong, and you would be unable to decide which was correct.