Pivot-table style price matrix with Postgres - sql

How would you implement a price matrix like you see on airfare search websites using Postgres? Essentially, I would like to show the minimum prices for items by category. The items could be anything, but let's assume we're dealing with products that have various attributes, like color or size.
Our products table might look something like this:
id name price color size
1 Test 1 99 blue medium
2 Test 2 89 red small
3 Test 3 109 blue large
4 Test 4 79 blue small
On one axis we'd show the colors, and size on the other. The values in the table would show the minimum price for that color/size combo, so the cell at the intersection of blue and small would be 79.
How do you implement this, which looks a lot like a pivot table?

with t(id,name,price,color,size) as (
values (1,'Test 1',99,'blue','medium')
, (2,'Test 2',89,'red','medium')
, (3,'Test 3',109,'blue','large')
, (4,'Test 4',79,'blue','medium')
, (5,'Test 5',99,'blue','medium')
, (6,'Test 6',69,'blue','small')
, (7,'Test 7',107,'blue','large')
)
select DISTINCT on (color,size) id, color, size, price
from t
ORDER BY color,size,price

Related

How do I SELECT minimum set of rows to cover all possible values of each columns in SQL?

I am running a SQL query to get data from a table to map all different possible values of all categories represented by each columns.
How do I run the SELECT query such that it returns the minimum number of rows just enough to include all possible values of all columns?
For example, if I have a table of 10 rows and 3 columns, each column containing 3 possible values:
TABLE sales
--------------------------------
brandID color size
--------------------------------
2 red big
3 blue big
2 blue big
2 red small
2 blue medium
3 green small
3 red big
1 green medium
2 red medium
2 blue big
Of course I could SELECT all rows from table without filter, but that would be an expensive query of 10 rows.
However, as you can see, if we filter the SELECT query to only return the following rows below, it is possible to cover all the possible values of all columns:
1,2,3 for brandID
red,blue,green for color
big,small,medium for size
--------------------------------
brandID color size
--------------------------------
3 blue big
2 red small
1 green medium
How do I do that in SQL query?
This one does what you expect:
select b.brandid, c.color, s.size
from (
select brandid, row_number() over (order by brandid) as rn
from sales
group by brandid
) b
full join (
select color, row_number() over (order by color) as rn
from sales
group by color
) c on b.rn = c.rn
full join (
select size, row_number() over (order by size) as rn
from sales
group by size
) s on b.rn = s.rn;
Online example: https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=e72e7d1dfed43825025c5703b5d3671a
But this only works properly, if you have the same number of (distinct) brands, colors and sizes. If you have e.g. 5 brands, 6 colors and 7 sizes the result is rather "strange":
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=4417a4d97ecf7601364f09d65f6522fa
First, a query that returns ten rows is not "expensive".
Second, this is a very hard problem. It involves looking at all combinations of rows to see if the set has all combinations of columns. I suspect that any algorithm will need to basically search through all possible combinations -- although there may be some efficiencies, such as automatically including all rows with a unique value in any column.
As a hard problem involving comparing zillions of sets, SQL is not really an appropriate language for addressing the issue.
This is a rather weird requirement... But you might try something along this:
DECLARE #sales TABLE(BrandID INT, color VARCHAR(10),size VARCHAR(10));
INSERT INTO #sales VALUES
(2,'red', 'big'),
(3,'blue', 'big'),
(2,'blue', 'big'),
(2,'red', 'small'),
(2,'blue', 'medium'),
(3,'green', 'small'),
(3,'red', 'big'),
(1,'green', 'medium'),
(2,'red', 'medium'),
(2,'blue', 'big');
WITH AllBrands AS (SELECT ROW_NUMBER() OVER(ORDER BY BrandID) AS RowInx, BrandID FROM #sales GROUP BY BrandID)
,AllColors AS (SELECT ROW_NUMBER() OVER(ORDER BY color) AS RowInx, color FROM #sales GROUP BY color)
,AllSizes AS (SELECT ROW_NUMBER() OVER(ORDER BY size) AS RowInx, size FROM #sales GROUP BY size)
SELECT COALESCE(b.RowInx,c.RowInx,s.RowInx) AS RowInx
,b.BrandID
,c.color
,s.size
FROM AllBrands b
FULL OUTER JOIN AllColors c ON COALESCE(b.RowInx,c.RowInx)=c.RowInx
FULL OUTER JOIN AllSizes s ON COALESCE(b.RowInx,c.RowInx,s.RowInx)=s.RowInx;
This solution is similar to #a_horse_with_no_name's, but avoids gaps in the result in case of unequal counts of values per column.
The idea in short:
We create a numbered set of all distinct values per column and join all sets on this number. As we don't know the count in advance I use COALESCE to pick the first value, which is not null.
This is not a good problem if you demand ONE AND ONLY ONE query and ONE AND ONLY ONE of each result set, and ONE AND ONLY ONE instance of each result. As Gordon Linoff accurately put: that is not a problem for SQL.I get that maybe you have a MUCH larger table, but he's absolutely right.
But add another layer, and you can have exactly what you want, with all the efficiency you want, and a readable output. Use a cursor and some basic SELECT from dynamic SQL with a SELECT columns.name from sys.tables JOIN sys.columns ON tables.object_id = columns.object_id, if you absolutely have to do this with TSQL alone.
And if you're willing to build a basic application with any framework with a SQL driver, you can just SELECT DISTINCT FROM < and put the various results into arrays.
Alternatively: reword your question, with the understanding that the results of any SQL query are gonna be x rows by x columns. Not an array for each column.
I think your example confuses things by having exactly 3 values for each field, which makes the requested result seem like a reasonable thing to expect. But what happens when two more brands are added, or a new colour? Then what would you expect to be returned?
Really you are asking three questions, so I feel this should be done as three queries:
"What are the different brands?"
"What are the different colours?"
"What are the different sizes?"
If they need to be displayed in a neat table, stitch them together afterwards in your application layer. You could maybe do it in the SQL with something like a_horse_with_no_name suggests, but really its the wrong place.

SQL - Select top item from a grouping of two columns

I have a list of numbers attached to two separate columns, and I want to just return the first "match" of the two columns to get that data. I got close with this answer, but it only works with one field. I need it to work with a combination of fields. About ten second before I was ready to post.
Here's an example table "Item":
Item Color Area
Boat Red 1
Boat Red 2
Boat Blue 4
Boat Blue 5
Car Red 3
Car Red 4
Car Blue 10
Car Blue 31
And the result set returned should be:
Item Color Area
Boat Red 1
Boat Blue 4
Car Red 3
Car Blue 10
A much simpler way to do this:
select Item,
Color,
min(Area) as Area
from Item
group by Item
Color
Just use the MIN function with a GROUP BY.
SELECT Item, Color, MIN(area) AS Area
FROM Item
GROUP BY Item, Color
Output:
Item Color Area
Boat Blue 4
Boat Red 1
Car Blue 10
Car Red 3
SQL Fiddle: http://sqlfiddle.com/#!9/46a154/1/0
SQL tables represent unordered sets. Hence, there is no "first" of anything without a column specifying the ordering.
For your example results, the simplest query is:
select item, color, min(area) as area
from item i
group by item, color;
About ten seconds before I was ready to post the question, I realized the answer:
WITH summary AS (
SELECT i.item + ':' + i.color,
a.area,
ROW_NUMBER() OVER(PARTITION BY i.item + ':' + i.color,
ORDER BY i.item + ':' + i.color DESC) AS rk
FROM Item i
group by (i.item + ':' + i.color, i.Area)
SELECT s.*
FROM summary s
WHERE s.rk = 1
It's as simple as combining the two composite key fields into one field and grouping based on that. This might be a bit hackish so if anyone wants to suggest a better option I'm all for it.

Sqlite query for getting index from sorted result

I need to make a sql query were in I have a table which consists below columns
id Name Color
1 Water red
5 Sun blue Light
7 Fire green
10 Wter red
21 Son blue Light
24 Fore green
So the requirement is I have a record say
5 Sun blue Light
Now I need to get the index of above record from the sorted result of the Name. Say below can be the select query.
SELECT * FROM MYTABLENAME WHERE COLOR LIKE “blue/%” ORDER BY Name ASC
Note:- I cannot load all the records in my memory and iterate as the records can be huge at times. So need to come up with a query that gives the exact indexof the record without loading the records.
Thanks In advance
If you haven't stored the sort result in a temporary table, the only way to do this is to count how many records would be sorted before this record:
SELECT COUNT(*)
FROM MYTABLENAME
WHERE COLOR LIKE “blue/%”
AND Name <= 'Sun'

Select unique record based on column value priority

This is a continuation of my previous question here.
In the following example:
id PRODUCT ID COLOUR
1 1001 GREEN
2 1002 GREEN
3 1002 RED
4 1003 RED
Given a product ID, I want to retrieve only one record - that with GREEN colour, if one exists, or the RED one otherwise. It sounds like I need to employ DISTINCT somehow, but I don't know how to supply the priority rule.
Pretty basic I'm sure, but my SQL skills are more than rusty..
Edit: Thank you everybody. One more question please: how can this be made to work with multiple records, ie. if the WHERE clause returns more than just one record? The LIMIT 1 would limit across the entire set, while what I'd want would be to limit just within each product.
For example, if I had something like SELECT * FROM table WHERE productID LIKE "1%" ... how can I retrieve each unique product, but still respecting the colour priority (GREEN>RED)?
try this:
SELECT top 1 *
FROM <table>
WHERE ProductID = <id>
ORDER BY case when colour ='GREEN' then 1
when colour ='RED' then 2 end
If you want to order it based on another color, you can give it in the case statement
SELECT *
FROM yourtable
WHERE ProductID = (your id)
ORDER BY colour
LIMIT 1
(Green will come before Red, you see. The LIMIT clause returns only one record)
For your subsequent edit, you can do this
select yourtable.*
from
yourtable
inner join
(select productid, min(colour) mincolour
from yourtable
where productid like '10%'
group by productid) v
on yourtable.productid=v.productid
and yourtable.colour=v.mincolour

Working with sets of rows in (My)SQL and comparing values

I am trying to figure out the SQL for doing some relatively simple operations on sets of records in a table but I am stuck. Consider a table with multiple rows per item, all identified by a common key.
For example:
serial model color
XX1 A blue
XX2 A blue
XX3 A green
XX5 B red
XX6 B blue
XX1 B blue
What I would for example want to do is:
Assuming that all model A rows must have the same color, find the rows which dont. (for example, XX3 is green).
Assuming that a given serial number can only point to a single type of model, find out the rows which that does not occur (for example XX1 points both to A and B)
These are all simple logically things to do. To abstract it, I want to know how to group things by using a single key (or combination of keys) and then compare the values of those records.
Should I use a join on the same table? should i use some sort of array or similar?
thanks for your help
For 1:
SELECT model, color, COUNT(*) AS num FROM yourTable GROUP BY model, color;
This will give you a list of each model and each color for that model along with the count. So the output from your dataset would be:
model color num
A blue 2
A green 1
B red 1
B blue 2
From this output you can easily see what's incorrect and fix it using an UPDATE statement or do a blanket operation where you assign the most popular color to each model.
For 2:
SELECT serial, COUNT(*) AS num FROM yourTable GROUP BY serial HAVING num > 1
The output for this would be:
serial num
XX1 2
To address #1, I would use a self-join (a join on the same table, as you put it).
For example,
select *
from mytable
where serial in (select serial
from mytable
group by model, color
having count(*) = 1)
would find all the serial numbers that only exist in one color. I did not test this, but I hope you see what it does. The inner select finds all the records that only occur once, then the outer select shows all detail for those serials.
Of course, having said that, this is a poor table design. But I don't think that was your question. And I hope this was a made up example for a real situation. My concern would be that there is no reason to assume that the single occurrence is actually bad -- it could be that there are 10 records, all of which have a distinct color. This approach would tell you that all of them are wrong, and you would be unable to decide which was correct.