Select unique record based on column value priority - sql

This is a continuation of my previous question here.
In the following example:
id PRODUCT ID COLOUR
1 1001 GREEN
2 1002 GREEN
3 1002 RED
4 1003 RED
Given a product ID, I want to retrieve only one record - that with GREEN colour, if one exists, or the RED one otherwise. It sounds like I need to employ DISTINCT somehow, but I don't know how to supply the priority rule.
Pretty basic I'm sure, but my SQL skills are more than rusty..
Edit: Thank you everybody. One more question please: how can this be made to work with multiple records, i.e. if the WHERE clause returns more than one product? LIMIT 1 would limit across the entire set, whereas I want to limit within each product.
For example, if I had something like SELECT * FROM table WHERE productID LIKE "1%" ... how can I retrieve each unique product, but still respecting the colour priority (GREEN>RED)?

try this:
SELECT top 1 *
FROM <table>
WHERE ProductID = <id>
ORDER BY case when colour ='GREEN' then 1
when colour ='RED' then 2 end
If you want to order it based on another color, you can give it in the case statement
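The CASE-based ordering above can be tried out with a small script. This is a minimal sketch, assuming SQLite (via Python's sqlite3 module), where LIMIT 1 plays the role of TOP 1; the table name products and the helper pick_one are made up for illustration.

```python
import sqlite3

# In-memory table holding the sample data from the question.
# "products" and "pick_one" are illustrative names, not from the original post.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, product_id INTEGER, colour TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, 1001, "GREEN"), (2, 1002, "GREEN"), (3, 1002, "RED"), (4, 1003, "RED")],
)


def pick_one(product_id):
    # CASE maps each colour to an explicit rank; LIMIT 1 keeps the best-ranked row.
    return conn.execute(
        """
        SELECT id, product_id, colour
        FROM products
        WHERE product_id = ?
        ORDER BY CASE colour WHEN 'GREEN' THEN 1 WHEN 'RED' THEN 2 ELSE 3 END
        LIMIT 1
        """,
        (product_id,),
    ).fetchone()


print(pick_one(1002))
print(pick_one(1003))
```

Because the priority lives in the CASE expression, adding another colour only means adding another WHEN branch.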

SELECT *
FROM yourtable
WHERE ProductID = (your id)
ORDER BY colour
LIMIT 1
(GREEN sorts before RED alphabetically, so ORDER BY colour happens to give the required priority here. The LIMIT clause then returns only one record.)
For your subsequent edit, you can do this
select yourtable.*
from
yourtable
inner join
(select productid, min(colour) mincolour
from yourtable
where productid like '10%'
group by productid) v
on yourtable.productid=v.productid
and yourtable.colour=v.mincolour
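The min(colour) join works because GREEN happens to sort before RED. If the priority should not depend on alphabetical order, one possible alternative is a window function: number the rows within each product by an explicit CASE rank and keep row 1. The sketch below assumes SQLite 3.25 or newer (for window functions); the table name products is made up, and product_id is stored as text so that LIKE applies directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, product_id TEXT, colour TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, "1001", "GREEN"), (2, "1002", "GREEN"), (3, "1002", "RED"), (4, "1003", "RED")],
)

# ROW_NUMBER restarts at 1 for every product_id; the CASE expression decides
# which colour gets number 1 within each product.
best_per_product = conn.execute(
    """
    SELECT id, product_id, colour
    FROM (
        SELECT p.*,
               ROW_NUMBER() OVER (
                   PARTITION BY product_id
                   ORDER BY CASE colour WHEN 'GREEN' THEN 1 WHEN 'RED' THEN 2 ELSE 3 END
               ) AS rn
        FROM products AS p
        WHERE product_id LIKE '10%'
    )
    WHERE rn = 1
    ORDER BY product_id
    """
).fetchall()

print(best_per_product)
```

Each product appears exactly once, with the GREEN row preferred wherever one exists.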

Related

How do I SELECT minimum set of rows to cover all possible values of each columns in SQL?

I am running a SQL query to get data from a table to map all different possible values of all categories represented by each columns.
How do I run the SELECT query such that it returns the minimum number of rows just enough to include all possible values of all columns?
For example, if I have a table of 10 rows and 3 columns, each column containing 3 possible values:
TABLE sales
--------------------------------
brandID color size
--------------------------------
2 red big
3 blue big
2 blue big
2 red small
2 blue medium
3 green small
3 red big
1 green medium
2 red medium
2 blue big
Of course I could SELECT all rows from table without filter, but that would be an expensive query of 10 rows.
However, as you can see, if we filter the SELECT query to only return the following rows below, it is possible to cover all the possible values of all columns:
1,2,3 for brandID
red,blue,green for color
big,small,medium for size
--------------------------------
brandID color size
--------------------------------
3 blue big
2 red small
1 green medium
How do I do that in SQL query?
This one does what you expect:
select b.brandid, c.color, s.size
from (
select brandid, row_number() over (order by brandid) as rn
from sales
group by brandid
) b
full join (
select color, row_number() over (order by color) as rn
from sales
group by color
) c on b.rn = c.rn
full join (
select size, row_number() over (order by size) as rn
from sales
group by size
) s on b.rn = s.rn;
Online example: https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=e72e7d1dfed43825025c5703b5d3671a
But this only works properly if you have the same number of distinct brands, colors and sizes. If you have, e.g., 5 brands, 6 colors and 7 sizes, the result is rather "strange":
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=4417a4d97ecf7601364f09d65f6522fa
First, a query that returns ten rows is not "expensive".
Second, this is a very hard problem. It involves looking at all combinations of rows to see if the set has all combinations of columns. I suspect that any algorithm will need to basically search through all possible combinations -- although there may be some efficiencies, such as automatically including all rows with a unique value in any column.
As a hard problem involving comparing zillions of sets, SQL is not really an appropriate language for addressing the issue.
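To make the "search through all combinations" point concrete, here is a brute-force sketch outside SQL, in Python. It tries subsets of increasing size until one covers every distinct value of every column. The function names covers and smallest_cover are made up for illustration, and the approach is only practical for tiny tables, since the number of subsets grows exponentially.

```python
from itertools import combinations

# The ten sample rows from the question: (brandID, color, size).
sales = [
    (2, "red", "big"), (3, "blue", "big"), (2, "blue", "big"), (2, "red", "small"),
    (2, "blue", "medium"), (3, "green", "small"), (3, "red", "big"),
    (1, "green", "medium"), (2, "red", "medium"), (2, "blue", "big"),
]


def covers(subset, rows):
    # A subset covers the table if, column by column, it contains
    # every distinct value that the full table contains.
    return all(
        {r[i] for r in subset} == {r[i] for r in rows}
        for i in range(len(rows[0]))
    )


def smallest_cover(rows):
    # Try subsets of size 1, 2, 3, ... and return the first one that covers.
    unique_rows = sorted(set(rows))
    for k in range(1, len(unique_rows) + 1):
        for subset in combinations(unique_rows, k):
            if covers(subset, rows):
                return list(subset)
    return []


cover = smallest_cover(sales)
print(len(cover), cover)
```

For the sample data a cover of three rows exists, which is also the lower bound because each column has three distinct values.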
This is a rather weird requirement... But you might try something along this:
DECLARE @sales TABLE(BrandID INT, color VARCHAR(10),size VARCHAR(10));
INSERT INTO @sales VALUES
(2,'red', 'big'),
(3,'blue', 'big'),
(2,'blue', 'big'),
(2,'red', 'small'),
(2,'blue', 'medium'),
(3,'green', 'small'),
(3,'red', 'big'),
(1,'green', 'medium'),
(2,'red', 'medium'),
(2,'blue', 'big');
WITH AllBrands AS (SELECT ROW_NUMBER() OVER(ORDER BY BrandID) AS RowInx, BrandID FROM @sales GROUP BY BrandID)
,AllColors AS (SELECT ROW_NUMBER() OVER(ORDER BY color) AS RowInx, color FROM @sales GROUP BY color)
,AllSizes AS (SELECT ROW_NUMBER() OVER(ORDER BY size) AS RowInx, size FROM @sales GROUP BY size)
SELECT COALESCE(b.RowInx,c.RowInx,s.RowInx) AS RowInx
,b.BrandID
,c.color
,s.size
FROM AllBrands b
FULL OUTER JOIN AllColors c ON COALESCE(b.RowInx,c.RowInx)=c.RowInx
FULL OUTER JOIN AllSizes s ON COALESCE(b.RowInx,c.RowInx,s.RowInx)=s.RowInx;
This solution is similar to @a_horse_with_no_name's, but avoids gaps in the result in case of unequal counts of values per column.
The idea in short:
We create a numbered set of all distinct values per column and join all sets on this number. As we don't know the count in advance I use COALESCE to pick the first value, which is not null.
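The same idea (number the distinct values of each column, then line the lists up by that number) can be sketched outside SQL. The Python snippet below is only an illustration of the principle, not a translation of the query: zip_longest pads the shorter lists with None, which corresponds to the NULLs the outer joins produce. The extra fourth brand is a hypothetical row added to show unequal counts.

```python
from itertools import zip_longest

# Sample rows from the question plus one hypothetical row (brand 4)
# so that the columns have different numbers of distinct values.
sales = [
    (2, "red", "big"), (3, "blue", "big"), (2, "blue", "big"), (2, "red", "small"),
    (2, "blue", "medium"), (3, "green", "small"), (3, "red", "big"),
    (1, "green", "medium"), (2, "red", "medium"), (2, "blue", "big"),
    (4, "red", "big"),  # hypothetical extra brand
]

# One sorted list of distinct values per column; the list index is the "row number".
columns = list(zip(*sales))
distinct_per_column = [sorted(set(col)) for col in columns]

# Align the lists position by position, padding the shorter ones with None.
aligned = list(zip_longest(*distinct_per_column))

for row in aligned:
    print(row)
```

With four brands but only three colors and sizes, the last line shows brand 4 next to two None values.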
This is not a good problem if you demand ONE AND ONLY ONE query and ONE AND ONLY ONE of each result set, and ONE AND ONLY ONE instance of each result. As Gordon Linoff accurately put it, that is not a problem for SQL. I get that maybe you have a MUCH larger table, but he's absolutely right.
But add another layer, and you can have exactly what you want, with all the efficiency you want, and a readable output. Use a cursor and some basic SELECT from dynamic SQL with a SELECT columns.name from sys.tables JOIN sys.columns ON tables.object_id = columns.object_id, if you absolutely have to do this with TSQL alone.
And if you're willing to build a basic application with any framework with a SQL driver, you can just SELECT DISTINCT FROM < and put the various results into arrays.
Alternatively: reword your question, with the understanding that the results of any SQL query are gonna be x rows by x columns. Not an array for each column.
I think your example confuses things by having exactly 3 values for each field, which makes the requested result seem like a reasonable thing to expect. But what happens when two more brands are added, or a new colour? Then what would you expect to be returned?
Really you are asking three questions, so I feel this should be done as three queries:
"What are the different brands?"
"What are the different colours?"
"What are the different sizes?"
If they need to be displayed in a neat table, stitch them together afterwards in your application layer. You could maybe do it in the SQL with something like a_horse_with_no_name suggests, but really it's the wrong place.

Custom Sortby , Order by issue

I have data in table -> bp like below
1 Vendor
2 Customer
3 Transporter
I want select * from bp order by row value 2,1,3, like this the result should be:
2 Customer
1 Vendor
3 Transporter
As the ordering isn't alphabetic or numeric and appears somewhat arbitrary, use a CASE expression. However, this doesn't support growth: the code has to change every time a new value appears in col2. You'd be better off adding an order-by column to the base table containing these values, which lets a user specify the order and is more usable in the long term. Tying a user to a specific order seems odd, but this is the way to do it.
SELECT *
FROM bp
order by CASE when col2='Customer' then 1
when col2='Vendor' then 2
when col2='Transporter' then 3
else 4 end;
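Both options mentioned in this answer can be compared side by side. This is a minimal sketch, assuming SQLite via Python's sqlite3 module; col1 and sort_order are made-up column names (only col2 appears in the answer), and sort_order stands in for the suggested order-by column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# sort_order is a hypothetical extra column holding the desired position.
conn.execute("CREATE TABLE bp (col1 INTEGER, col2 TEXT, sort_order INTEGER)")
conn.executemany(
    "INSERT INTO bp VALUES (?, ?, ?)",
    [(1, "Vendor", 2), (2, "Customer", 1), (3, "Transporter", 3)],
)

# Option 1: hard-code the order in a CASE expression.
by_case = conn.execute(
    """
    SELECT col1, col2
    FROM bp
    ORDER BY CASE col2
                 WHEN 'Customer' THEN 1
                 WHEN 'Vendor' THEN 2
                 WHEN 'Transporter' THEN 3
                 ELSE 4
             END
    """
).fetchall()

# Option 2: store the order as data and sort on it.
by_column = conn.execute("SELECT col1, col2 FROM bp ORDER BY sort_order").fetchall()

print(by_case)
print(by_column)
```

Both queries return the rows in the same order; with the column, a new value only needs a new row rather than a code change.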
Try this:
Add one more column, SortBy, whose first digit gives the position for one sort order, the second digit for a second sort order, and the third digit for a third. It's a very simple way. If there are more records, you can arrange it in other ways.
1 Vendor 132
2 Customer 213
3 Transporter 321
Vendor --> select * from bp order by substring(SortBy,1,1)
Customer --> select * from bp order by substring(SortBy,2,1)
Transporter --> select * from bp order by substring(SortBy,3,1)
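A quick way to see what each digit position produces is to run the three queries against the sample values. This is a sketch, assuming SQLite via Python's sqlite3 module, where the function is spelled substr; the column names id and name and the helper names_sorted_by_digit are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SortBy is stored as text so that single digits can be cut out of it.
conn.execute("CREATE TABLE bp (id INTEGER, name TEXT, SortBy TEXT)")
conn.executemany(
    "INSERT INTO bp VALUES (?, ?, ?)",
    [(1, "Vendor", "132"), (2, "Customer", "213"), (3, "Transporter", "321")],
)


def names_sorted_by_digit(position):
    # substr(SortBy, position, 1) extracts the one digit used as the sort key.
    cursor = conn.execute(
        "SELECT name FROM bp ORDER BY substr(SortBy, ?, 1)", (position,)
    )
    return [name for (name,) in cursor]


for position in (1, 2, 3):
    print(position, names_sorted_by_digit(position))
```

With the values shown, the second digit yields Customer, Transporter, Vendor; to get the order asked for in the question (Customer, Vendor, Transporter), the second digits would have to be 2, 1 and 3 for Vendor, Customer and Transporter respectively.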

SQL query interleaving two different statuses

I have a table testtable having fields
Id Name Status
1 John active
2 adam active
3 cristy inactive
4 benjamin inactive
5 mathew active
6 thomas inactive
7 james active
I want a query that displays the result like this:
Id Name Status
1 John active
3 cristy inactive
2 adam active
4 benjamin inactive
5 mathew active
6 thomas inactive
7 james active
My question is how to return the records in alternating order of status: active, then inactive, then active, then inactive, and so on.
This query sorts on interleaved active/inactive state:
SELECT [id],
[name],
[status]
FROM (
(
SELECT
Row_number() OVER(ORDER BY id) AS RowNo,
0 AS sorter,
[id],
[name],
[status]
FROM testtable
WHERE [status] = 'active'
)
UNION ALL
(
SELECT
Row_number() OVER(ORDER BY id) AS RowNo,
1 AS sorter,
[id],
[name],
[status]
FROM testtable
WHERE [status] = 'inactive'
)
) innerUnion
ORDER BY ( RowNo * 2 + sorter )
This approach uses an inner UNION ALL of two SELECT statements, one returning the active rows and the other the inactive rows. Each generates a row number, which is later multiplied by two so that it is always even. The sorter column is just a bit field: adding it to the doubled row number yields an even number for active rows and an odd number for inactive rows, giving a unique sort key that interleaves the two sets.
The SQL Fiddle link is here, to allow testing and manipulation:
http://sqlfiddle.com/#!3/8a8a1/11/0
In the absence of a specified DB system, I've assumed that SQL Server 2008 (or newer) is being used. An alternate row numbering system would be necessary on other DBMSes.
Finally i got the answer
SET @rank=0;
SET @rank1=0;
SELECT @rank:=@rank+1 AS rank,id,name,status FROM `testtablejohn` where status='E'
UNION
SELECT @rank1:=@rank1+1 AS rank,id,name,status FROM `testtablejohn` where status='D'
order by rank
Since you didn't post any example of what you tried so far, I will limit my answer to the general approach as well.
One approach could be to generate a row number for active rows and a row number for inactive rows. Start your numbering for active at 1 and use only odd numbers (that means increase your counter by 2 every time) and do the same thing with 2 and even numbers for the inactive rows. Put those two counters in the same column.
You will end up with a single column to easily sort on in your ORDER BY clause.
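The odd/even numbering described above could look roughly like this. It is a sketch rather than a MySQL answer: it assumes SQLite 3.25 or newer (run through Python's sqlite3 module) and uses ROW_NUMBER with PARTITION BY instead of hand-written counters. The column alias sort_key is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testtable (id INTEGER, name TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO testtable VALUES (?, ?, ?)",
    [
        (1, "John", "active"), (2, "adam", "active"), (3, "cristy", "inactive"),
        (4, "benjamin", "inactive"), (5, "mathew", "active"),
        (6, "thomas", "inactive"), (7, "james", "active"),
    ],
)

# Rows are numbered separately per status. Active rows get the odd keys
# 1, 3, 5, ... and inactive rows the even keys 2, 4, 6, ...
interleaved = conn.execute(
    """
    SELECT id, name, status
    FROM (
        SELECT id, name, status,
               2 * ROW_NUMBER() OVER (PARTITION BY status ORDER BY id)
                 - CASE WHEN status = 'active' THEN 1 ELSE 0 END AS sort_key
        FROM testtable
    )
    ORDER BY sort_key
    """
).fetchall()

for row in interleaved:
    print(row)
```

When one status runs out of rows, the remaining rows of the other status simply follow in id order.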
Here are some links that might be useful for you:
MySQL - Get row number on select
http://www.mysqltutorial.org/mysql-case-statement/
Just give it a go with those. If you can't make it work, then show us what you tried so far. Post some example code in the question and we might be able to guide you!

sql statement to select previous rows to a search param

I'm after an SQL statement (if one exists), or a way of combining several SQL statements, to achieve the following.
I have a listbox and a search text box.
in the search box, user would enter a surname e.g. smith.
i then want to query the database for the search with something like this :
select * FROM customer where surname LIKE searchparam
This would give me all the results for customers with surname containing : SMITH . Simple, right?
What I need to do is limit the results returned. This statement could give me thousands of rows if the search param was just S.
What I want is the result limited to the first 20 matches AND the 10 rows prior to the first match.
For example, SMI search:
Sives
Skimmings
Skinner
Skipper
Slater
Sloan
Slow
Small
Smallwood
Smetain
Smith ----------- This is the first match of my query. But i want the previous 10 and following 20.
Smith
Smith
Smith
Smith
Smoday
Smyth
Snedden
Snell
Snow
Sohn
Solis
Solomon
Solway
Sommer
Sommers
Soper
Sorace
Spears
Spedding
Is there anyway to do this?
As few sql statements as possible.
Reason? I am creating an app for users with slow internet connections.
I am using POSTGRESQL v9
Thanks
Andrew
WITH ranked AS (
SELECT *, ROW_NUMBER() over (ORDER BY surname) AS rowNumber FROM customer
)
SELECT ranked.*
FROM ranked, (SELECT MIN(rowNumber) target FROM ranked WHERE surname LIKE searchparam) found
WHERE ranked.rowNumber BETWEEN found.target - 10 AND found.target + 20
ORDER BY ranked.rowNumber
SQL Fiddle here. Note that the fiddle uses the example data, and I modified the range to 3 entries before and 6 entries past.
I'm assuming that you're looking for a general algorithm ...
It sounds like you're looking for a combination of finding the matches "greater than or equal to smith", and "less than smith".
For the former you'd order by surname and limit the result to 20, and for the latter you'd order by surname descending and limit to 10.
The two result sets can then be added together as arrays and reordered.
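The two-query idea can also be combined into one statement with UNION ALL, so only one round trip is needed. The sketch below assumes SQLite via Python's sqlite3 module, uses a shortened copy of the sample surnames, and takes 3 rows before and 6 rows from the first match onward instead of 10 and 20 to keep the output small. The helper name window_around is made up. It also assumes the search prefix has the same letter case as the stored surnames, because the comparison is a plain string comparison.

```python
import sqlite3

surnames = [
    "Sives", "Skimmings", "Skinner", "Skipper", "Slater", "Sloan", "Slow",
    "Small", "Smallwood", "Smetain", "Smith", "Smith", "Smith", "Smith",
    "Smith", "Smoday", "Smyth", "Snedden", "Snell", "Snow",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (surname TEXT)")
conn.executemany("INSERT INTO customer VALUES (?)", [(s,) for s in surnames])


def window_around(prefix, before, after):
    # First branch: the closest rows sorting before the prefix.
    # Second branch: the first rows sorting at or after the prefix.
    cursor = conn.execute(
        """
        SELECT surname FROM (
            SELECT surname FROM customer
            WHERE surname < :prefix
            ORDER BY surname DESC
            LIMIT :before
        )
        UNION ALL
        SELECT surname FROM (
            SELECT surname FROM customer
            WHERE surname >= :prefix
            ORDER BY surname
            LIMIT :after
        )
        ORDER BY surname
        """,
        {"prefix": prefix, "before": before, "after": after},
    )
    return [surname for (surname,) in cursor]


print(window_around("Smi", 3, 6))
```

With an index on surname, both branches can stop after reading only the rows they return, which may matter on a large customer table.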
I think you need to use ROW_NUMBER() (see this link).
WITH cust1 AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY surname) as numRow FROM customer
)
SELECT c1.surname, c1.numRow, x.flag
FROM cust1 c1, (SELECT *,
case when numRow = (SELECT MIN(numRow) FROM cust1 WHERE surname='Smith') then 1 else 0 end as flag
FROM cust1) x
WHERE x.flag = 1 and c1.numRow BETWEEN x.numRow - 1 AND x.numRow + 1
ORDER BY c1.numRow
SQLFiddle here.
This works, but the flag turns out to be unnecessary; without it, the query reduces to something like the one PinnyM posted.
A variation on @PinnyM's solution:
WITH ranked AS (
SELECT
*,
ROW_NUMBER() over (ORDER BY surname) AS rowNumber
FROM customer
),
minrank AS (
SELECT
*,
MIN(CASE WHEN surname LIKE searchparam THEN rowNumber END) OVER () AS target
FROM ranked
)
SELECT
surname
FROM minrank
WHERE rowNumber BETWEEN target - 10 AND target + 20
;
Instead of two separate calls to the ranked CTE, one to get the first match's row number and the other to read the results from, another CTE is introduced to serve both purposes. Can't speak for PostgreSQL but in SQL Server this might result in a better execution plan for the query, although in either case the real efficiency would still need to be verified by proper testing.

Working with sets of rows in (My)SQL and comparing values

I am trying to figure out the SQL for doing some relatively simple operations on sets of records in a table but I am stuck. Consider a table with multiple rows per item, all identified by a common key.
For example:
serial model color
XX1 A blue
XX2 A blue
XX3 A green
XX5 B red
XX6 B blue
XX1 B blue
What I would for example want to do is:
Assuming that all model A rows must have the same color, find the rows which don't (for example, XX3 is green).
Assuming that a given serial number can only point to a single model, find the rows where that does not hold (for example, XX1 points to both A and B).
These are all logically simple things to do. To abstract it, I want to know how to group things by using a single key (or combination of keys) and then compare the values of those records.
Should I use a join on the same table? should i use some sort of array or similar?
thanks for your help
For 1:
SELECT model, color, COUNT(*) AS num FROM yourTable GROUP BY model, color;
This will give you a list of each model and each color for that model along with the count. So the output from your dataset would be:
model color num
A blue 2
A green 1
B red 1
B blue 2
From this output you can easily see what's incorrect and fix it using an UPDATE statement or do a blanket operation where you assign the most popular color to each model.
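If "the most popular color" is taken as the correct one for each model, the rows that deviate from it can be listed directly. This is one possible sketch, assuming SQLite 3.25 or newer via Python's sqlite3 module; the table name items is made up, and ties between equally common colors are broken alphabetically, which is an arbitrary choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (serial TEXT, model TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        ("XX1", "A", "blue"), ("XX2", "A", "blue"), ("XX3", "A", "green"),
        ("XX5", "B", "red"), ("XX6", "B", "blue"), ("XX1", "B", "blue"),
    ],
)

# counts: how often each color occurs per model.
# ranked: the most common color per model gets rn = 1.
# Final select: rows whose color differs from that most common color.
odd_ones_out = conn.execute(
    """
    WITH counts AS (
        SELECT model, color, COUNT(*) AS num
        FROM items
        GROUP BY model, color
    ),
    ranked AS (
        SELECT model, color,
               ROW_NUMBER() OVER (PARTITION BY model ORDER BY num DESC, color) AS rn
        FROM counts
    )
    SELECT i.serial, i.model, i.color
    FROM items AS i
    JOIN ranked AS r ON r.model = i.model AND r.rn = 1
    WHERE i.color <> r.color
    ORDER BY i.serial, i.model
    """
).fetchall()

print(odd_ones_out)
```

For the sample data this lists XX3 (green among blue A rows) and XX5 (red among blue B rows).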
For 2:
SELECT serial, COUNT(*) AS num FROM yourTable GROUP BY serial HAVING num > 1
The output for this would be:
serial num
XX1 2
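Note that COUNT(*) counts rows, so a serial that is simply listed twice with the same model would also be reported. If that matters, counting distinct models is a possible refinement. The sketch below assumes SQLite via Python's sqlite3 module; the table name items is made up, and the duplicate XX2 row is a hypothetical addition to show the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (serial TEXT, model TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        ("XX1", "A", "blue"), ("XX2", "A", "blue"), ("XX3", "A", "green"),
        ("XX5", "B", "red"), ("XX6", "B", "blue"), ("XX1", "B", "blue"),
        ("XX2", "A", "blue"),  # hypothetical duplicate: same serial, same model
    ],
)

# Counting rows flags every serial that appears more than once.
by_rows = conn.execute(
    "SELECT serial, COUNT(*) FROM items "
    "GROUP BY serial HAVING COUNT(*) > 1 ORDER BY serial"
).fetchall()

# Counting distinct models flags only serials that point to several models.
by_models = conn.execute(
    "SELECT serial, COUNT(DISTINCT model) FROM items "
    "GROUP BY serial HAVING COUNT(DISTINCT model) > 1 ORDER BY serial"
).fetchall()

print(by_rows)
print(by_models)
```

Only XX1 points to two different models, so it is the only serial left in the second result.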
To address #1, I would query the same table twice (a join on the same table, as you put it, written here as a subquery).
For example,
select *
from mytable
where (model, color) in (select model, color
from mytable
group by model, color
having count(*) = 1)
would find all the rows whose model/color combination occurs only once. I did not test this, but I hope you see what it does. The inner select finds the model/color combinations that occur only once, then the outer select shows all detail for those rows.
Of course, having said that, this is a poor table design. But I don't think that was your question. And I hope this was a made up example for a real situation. My concern would be that there is no reason to assume that the single occurrence is actually bad -- it could be that there are 10 records, all of which have a distinct color. This approach would tell you that all of them are wrong, and you would be unable to decide which was correct.