Postgres: select n unique rows for ID

Using Postgres, I need to return a variable number of rows for each unique ID in a SQL statement.
Consider a table of the cars a user has owned over the years.
+----+----------+---------+-------+
| ID | make     | model   | type  |
+----+----------+---------+-------+
| 1  | toyota   | camry   | sedan |
| 1  | ford     | mustang | coupe |
| 1  | toyota   | celica  | coupe |
| 1  | bmw      | z4      | coupe |
| 1  | honda    | accord  | sedan |
| 2  | buick    | marque  | sedan |
| 2  | delorean | btf     | coupe |
| 2  | mini     | cooper  | coupe |
| 3  | ford     | f-150   | truck |
| 3  | ford     | mustang | coupe |
| 1  | ford     | taurus  | sedan |
+----+----------+---------+-------+
From this table I only want to return two rows for each user who has a coupe, and ignore the rest.
I'd also like to preserve the empty columns, so the second result for ID 3 would be empty because that user has only one car of type coupe. I am also working under restrictions, because this has to run on AWS Redshift, so I can't use many functions. It seems this would be easy with a TOP statement as in SQL Server, but with the Redshift restrictions and my lack of knowledge I'm not sure of the best way. The result should look something like this:
+----+----------+---------+-------+
| ID | make     | model   | type  |
+----+----------+---------+-------+
| 1  | ford     | mustang | coupe |
| 1  | toyota   | celica  | coupe |
| 2  | delorean | btf     | coupe |
| 2  | mini     | cooper  | coupe |
| 3  | ford     | mustang | coupe |
| 3  |          |         |       |
+----+----------+---------+-------+
Thanks a lot for your help.

As far as I know, Redshift supports window functions:
select id, make, model, type
from (
    select id, make, model, type,
           row_number() over (partition by id order by make) as rn
    from the_table
    where type = 'coupe'
) t
where rn <= 2
order by id, make;
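
The query above returns at most two coupes per id, but it does not produce the empty second row the question asks for when an id has only one coupe. One possible way to pad the result, sketched here with the sample data from the question, is to build two "slots" per id and left join the ranked rows onto them. The table name the_table comes from the answer; the CTE names ranked, slots and owners are illustrative assumptions, and the statements are meant to be plain enough for Redshift but have not been tuned for it.

```sql
-- Sample data from the question; the_table is the name assumed in the answer.
create temp table the_table (
    id    int,
    make  varchar(20),
    model varchar(20),
    type  varchar(10)
);

insert into the_table (id, make, model, type) values
    (1, 'toyota',   'camry',   'sedan'),
    (1, 'ford',     'mustang', 'coupe'),
    (1, 'toyota',   'celica',  'coupe'),
    (1, 'bmw',      'z4',      'coupe'),
    (1, 'honda',    'accord',  'sedan'),
    (2, 'buick',    'marque',  'sedan'),
    (2, 'delorean', 'btf',     'coupe'),
    (2, 'mini',     'cooper',  'coupe'),
    (3, 'ford',     'f-150',   'truck'),
    (3, 'ford',     'mustang', 'coupe'),
    (1, 'ford',     'taurus',  'sedan');

with ranked as (
    -- number the coupes of each id; the order by decides which two are kept
    select id, make, model, type,
           row_number() over (partition by id order by make) as rn
    from the_table
    where type = 'coupe'
),
slots as (
    -- one row per output position (hypothetical helper, n = 2 here)
    select 1 as slot
    union all
    select 2
),
owners as (
    -- every id that owns at least one coupe
    select distinct id from ranked
)
select o.id, r.make, r.model, r.type
from owners o
cross join slots s
left join ranked r
       on r.id = o.id
      and r.rn = s.slot
order by o.id, s.slot;
```

With this data, id 3 gets a second row whose make, model and type are NULL. Which two coupes appear for id 1 depends on the order by inside row_number(): ordering by make keeps bmw and ford rather than the two rows shown in the question.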

Related

Teradata SQL Assistant - How can I pivot or transpose large tables with many columns and many rows?

I am using Teradata SQL Assistant Version TD 16.10.06.01 ...
I have seen a lot of people transpose data for smallish tables, but I am working with thousands of clients and need to break the columns up into line item values to compare orders and highlight differences between orders. The problem is that it is all linked horizontally, and I need to transpose it to Id, Transaction id, Version and Line Item Value 1, Line Item Value 2..., then add another column comparing the values to see if they changed.
Example:
+----+------------+-----------+------------+----------------+--------+----------+----------+------+-------------+
| Id | First Name | Last Name | DOB | transaction id | Make | Location | Postcode | Year | Price |
+----+------------+-----------+------------+----------------+--------+----------+----------+------+-------------+
| 1 | John | Smith | 15/11/2001 | 1654654 | Audi | NSW | 2222 | 2019 | $ 10,000.00 |
| 2 | Mark | White | 11/02/2002 | 1661200 | BMW | WA | 8888 | 2016 | $ 8,999.00 |
| 3 | Bob | Grey | 10/05/2002 | 1667746 | Ford | QLD | 9999 | 2013 | $ 3,000.00 |
| 4 | Phil | Faux | 6/08/2002 | 1674292 | Holden | SA | 1111 | 2000 | $ 5,800.00 |
+----+------------+-----------+------------+----------------+--------+----------+----------+------+-------------+
I am hoping to change the data to:
+----+----------+----------+----------+----------------+----------+----------+----------------+---------+-----+
| id | trans_id | Vers_ord | Item Val | Ln_Itm_Dscrptn | Org_Val | Updt_Val | Amndd_Ord_chck | Lbl_Rnk | ... |
+----+----------+----------+----------+----------------+----------+----------+----------------+---------+-----+
| 1 | 1654654 | 2 | 11169 | Make | Audi BLK | Audi WHT | Yes | 1 | |
| 1 | 1654654 | 2 | 11189 | Location | NSW | WA | Yes | 2 | |
| 1 | 1654654 | 2 | 23689 | Postcode | 2222 | 6000 | Yes | 3 | |
+----+----------+----------+----------+----------------+----------+----------+----------------+---------+-----+
Recently, with smaller data, I created a table, added in the values, and then used a case statement (when value 1 then xyz) with a product join ... and the data warehouse admins didn't mention anything out of order. But I only had a 16-row by 200-column table to transpose (Sum, Avg, Count, Median (function) x 4 subsets of clients), which was significantly smaller than the tables I now need to compare.
I am worried that my prior method will slow the data warehouse down, and it will also take me a significant amount of time to type the SQL.
Is there a better way to transpose large tables?

Materialized view only for covering index

Given the following table car_year of car model's year:
| id | maker | model | year |
----------------------------------
| 6 | Audi | Allroad | 2001 |
| 12 | Audi | A8 | 2008 |
| 14 | Ford | Mustang | 1996 |
| 15 | Honda | Civic | 2000 |
| 19 | Honda | Insight | 2000 |
| 22 | Ford | F150 | 2009 |
| 24 | Honda | Accord | 2000 |
| 28 | Ford | F150 | 2007 |
| 34 | Audi | S8 | 2002 |
| 48 | Ford | Expedition | 2011 |
| 62 | Ford | Escort | 2004 |
| 81 | Ford | Explorer | 2007 |
| 84 | Ford | Escape | 2006 |
| 93 | Honda | Accord | 1995 |
I would like to have a covering index for the "earliest model of a maker".
My solution is to create a materialized view:
CREATE MATERIALIZED VIEW earliest AS
    SELECT DISTINCT ON (maker) maker, model
    FROM car_year
    ORDER BY maker, year;
And then a covering index over it:
CREATE INDEX earliest_index ON earliest(maker) INCLUDE (model);
It works! But the materialized view itself is useless (for my usage) because I will only use the covering index.
Am I missing a more elegant solution, or a (Postgre)SQL feature that I don't know about?

If I understand you correctly, you are asking for an “index organized table” (or, in this case, an index organized materialized view): a table that is really an index. Then you wouldn't have to spend the storage twice, once for the table that you never use and once for the index on which you want to perform index-only scans.
The answer is that such “index organized tables” don't exist in PostgreSQL.
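
Even though the heap copy cannot be avoided, it is possible to check whether lookups are actually served from the covering index. A minimal sketch, assuming the materialized view earliest and the index earliest_index from the question already exist (the INCLUDE clause needs PostgreSQL 11 or later); the filter value 'Ford' is just an example taken from the sample table.

```sql
-- Index-only scans rely on the visibility map, which VACUUM maintains.
-- Note: VACUUM cannot run inside a transaction block.
vacuum analyze earliest;

-- On a tiny relation the planner may still prefer a sequential scan;
-- disabling it for this session makes the index path easier to see.
set enable_seqscan = off;

explain
select maker, model
from earliest
where maker = 'Ford';

-- restore the default planner setting
reset enable_seqscan;
```

An Index Only Scan node naming earliest_index in the plan would indicate that the query is answered from the index. After each REFRESH MATERIALIZED VIEW the relation is rewritten, so the vacuum step would likely need repeating.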

joining wide tables (10s of unique cols)

I have multiple tables which need to be joined on several common attributes, so that the different attributes can be shown in a single table.
table1
+--------+---------+-------+
| make | model | r_yr |
+--------+---------+-------+
| toyota | corolla | 1999 |
| toyota | camry | 2002 |
| toyota | qualis | 2004 |
| toyota | rav4 | 2006 |
+--------+---------+-------+
table2
+--------+---------+--------+
| make | model | kms |
+--------+---------+--------+
| toyota | corolla | 25000 |
| toyota | camry | 50000 |
+--------+---------+--------+
table3
+--------+---------+---------+
| make | model | mileage |
+--------+---------+---------+
| toyota | corolla | 20 |
| toyota | qualis | 25 |
+--------+---------+---------+
table4
+--------+----------+-------+
| make | model | colr |
+--------+----------+-------+
| toyota | camry | blue |
| toyota | rav4 | green |
+--------+----------+-------+
I'm doing the following to join the results:
select a.make, a.model, a.r_yr, b.kms, c.mileage, d.colr
from table1 as a
left join table2 as b
    on b.make=a.make and b.model=a.model and b.r_yr=a.r_yr
left join table3 as c
    on c.make=a.make and c.model=a.model and c.r_yr=a.r_yr
left join table4 as d
    on d.make=a.make and d.model=a.model and d.r_yr=a.r_yr
This gives a table like the one below:
+--------+---------+-------+-------+----------+--------+
| make | model | r_yr | kms | mileage | colr |
+--------+---------+-------+-------+----------+--------+
| toyota | corolla | 1999 | 25000 | 20 | |
| toyota | camry | 2002 | 50000 | | blue |
| toyota | qualis | 2004 | | 25 | |
| toyota | rav4 | 2006 | | | green |
+--------+---------+-------+-------+----------+--------+
However, in the real data set I'm working with, there are 5 common columns per table and around 20-40 unique attributes per table, so I have to specify 20-40 column names in the query in the form of b.kms, ...., c.mileage, ......, d.colr, ..... Is there a workaround that avoids listing those unique columns, such as selecting everything except the common columns, or some other way?

You cannot do something like SELECT all except x, y, z ...
But you can simplify this query with a USING clause instead of JOIN ... ON: columns named in USING appear only once in the output of select *.
Demo: http://sqlfiddle.com/#!17/fa97a/6
select *
from table1 as a
left join table2 as b
    USING (make, model)
left join table3 as c
    USING (make, model)
left join table4 as d
    USING (make, model);
| make | model | r_yr | kms | mileage | colr |
|--------|---------|------|--------|---------|--------|
| toyota | camry | 2002 | 50000 | (null) | blue |
| toyota | corolla | 1999 | 25000 | 20 | (null) |
| toyota | qualis | 2004 | (null) | 25 | (null) |
| toyota | rav4 | 2006 | (null) | (null) | green |
Note: in the above example I am using only two common columns (make, model), because in your example r_yr is not a common column: it exists only in table1.
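
If an explicit column list is still needed (for example, to leave out a few non-key columns), one option is to generate it from the catalog instead of typing it. The following is only a sketch under stated assumptions: PostgreSQL, tables named table2, table3 and table4 in the public schema, and make and model as the common columns. The output alias select_list is an illustrative name.

```sql
-- Build a comma-separated list of every non-key column, qualified by
-- table name, in the order the columns are defined.
select string_agg(
           format('%I.%I', table_name, column_name),
           ', '
           order by table_name, ordinal_position
       ) as select_list
from information_schema.columns
where table_schema = 'public'                          -- assumed schema
  and table_name in ('table2', 'table3', 'table4')     -- assumed table names
  and column_name not in ('make', 'model');            -- the common columns
```

For the example tables this would give a list along the lines of table2.kms, table3.mileage, table4.colr, which can be pasted into a query that refers to the tables by name rather than by alias.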

How can I order a PostgreSQL table so that consecutive values do not match

How can I write a PostgreSQL query or view that returns a table whose first results do not have any matching consecutive values in a given column?
For example on this table:
+--------------------------+-------+----------------+--------+
| id | model | name | colour |
+--------------------------+-------+----------------+--------+
| 59663f188b06e815cae676f4 | ford | amazing car | red |
| 59663f1487b60515ca614999 | ford | great car | pink |
| 59663f230f519115cae6e089 | mini | broken car | blue |
| 59663f1ce28a5315ca07262e | vw | mystery mobile | blue |
+--------------------------+-------+----------------+--------+
If I choose model or colour the order could be:
+--------------------------+-------+----------------+--------+
| id | model | name | colour |
+--------------------------+-------+----------------+--------+
| 59663f188b06e815cae676f4 | ford | amazing car | red |
| 59663f230f519115cae6e089 | mini | broken car | blue |
| 59663f1487b60515ca614999 | ford | great car | pink |
| 59663f1ce28a5315ca07262e | vw | mystery mobile | blue |
+--------------------------+-------+----------------+--------+

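Try this. It will probably also work when the data has more than one duplicate value.
Select
    id
    ,model
    ,name
    ,colour
    ,row_number() over(partition by model) Rno  -- numbers the rows within each model
from your_table  -- placeholder name: "Table" is a reserved word
order by Rno;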

SQL Join with Group By

OK, so I'm trying to write a complex query (at least complex to me) and need some pro help. This is my database setup:
Table: MakeList
| MakeListId | Make |
| 1 | Acura |
| 2 | Chevy |
| 3 | Pontiac |
| 4 | Scion |
| 5 | Toyota |
Table: CustomerMake
| CustomerMakeId | CustomerId | _Descriptor |
| 1 | 123 | Acura |
| 2 | 124 | Chevy |
| 3 | 125 | Pontiac |
| 4 | 126 | Scion |
| 5 | 127 | Toyota |
| 6 | 128 | Acura |
| 7 | 129 | Chevy |
| 8 | 130 | Pontiac |
| 9 | 131 | Scion |
| 10 | 132 | Toyota |
Table: Customer
| CustomerId | StatusId |
| 123 | 1 |
| 124 | 1 |
| 125 | 1 |
| 126 | 2 |
| 127 | 1 |
| 128 | 1 |
| 129 | 2 |
| 130 | 1 |
| 131 | 1 |
| 132 | 1 |
What I am trying to end up with is this...
Desired Result Set:
| Make | CustomerId|
| Acura | 123 |
| Chevy | 124 |
| Pontiac | 125 |
| Scion | 131 |
| Toyota | 127 |
I want a list of unique Makes, each with one active (StatusId = 1) CustomerId to go with it. I'm assuming I'll have to do some GROUP BYs and JOINs, but I haven't been able to figure it out. Any help would be greatly appreciated. Let me know if I haven't given enough info for my question. Thanks!
UPDATE: The script doesn't have to be performant; it will be used one time for testing purposes.

Something like this:
select cm._Descriptor,
       min(cu.CustomerId) as CustomerId
from CustomerMake cm
join Customer cu on cu.CustomerId = cm.CustomerId and cu.StatusId = 1
group by cm._Descriptor;
I left out the MakeList table because it seems unnecessary: you are storing the full make name as _Descriptor in the CustomerMake table anyway. (So what is the MakeList table for? Why don't you store a foreign key to it in the CustomerMake table?)

You want to
(a) join the customer and customermake tables
(b) filter on customer.statusid
(c) group by customermake._descriptor
Depending on your RDBMS, you may need to explicitly apply a group function to customer.customerid to include it in the select list. Since you don't care which particular customerid is displayed, you could use MIN or MAX to just pick an essentially arbitrary value.
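
Steps (a) to (c) can be sketched end to end with the sample data from the question inlined, so the statement runs on its own. This assumes an engine that accepts VALUES lists in a derived table (PostgreSQL and SQL Server both do); the derived-table alias v is illustrative.

```sql
with CustomerMake as (
    select *
    from (values
        (1, 123, 'Acura'),   (2, 124, 'Chevy'),  (3, 125, 'Pontiac'),
        (4, 126, 'Scion'),   (5, 127, 'Toyota'), (6, 128, 'Acura'),
        (7, 129, 'Chevy'),   (8, 130, 'Pontiac'),(9, 131, 'Scion'),
        (10, 132, 'Toyota')
    ) as v (CustomerMakeId, CustomerId, _Descriptor)
),
Customer as (
    select *
    from (values
        (123, 1), (124, 1), (125, 1), (126, 2), (127, 1),
        (128, 1), (129, 2), (130, 1), (131, 1), (132, 1)
    ) as v (CustomerId, StatusId)
)
select cm._Descriptor     as Make,
       min(cu.CustomerId) as CustomerId   -- (c) the aggregate picks one id per make
from CustomerMake cm
join Customer cu
  on cu.CustomerId = cm.CustomerId        -- (a) join the two tables
where cu.StatusId = 1                     -- (b) keep active customers only
group by cm._Descriptor
order by cm._Descriptor;
```

With MIN, this returns Acura 123, Chevy 124, Pontiac 125, Scion 131 and Toyota 127, which matches the desired result set. Scion resolves to 131 because customer 126 is inactive.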
