joining wide tables (10s of unique cols) - sql

I have multiple tables which need to be joined on multiple common attributes, such that the different attributes can be shown in a single table.
table1
+--------+---------+-------+
| make | model | r_yr |
+--------+---------+-------+
| toyota | corolla | 1999 |
| toyota | camry | 2002 |
| toyota | qualis | 2004 |
| toyota | rav4 | 2006 |
+--------+---------+-------+
table2
+--------+---------+--------+
| make | model | kms |
+--------+---------+--------+
| toyota | corolla | 25000 |
| toyota | camry | 50000 |
+--------+---------+--------+
table3
+--------+---------+---------+
| make | model | mileage |
+--------+---------+---------+
| toyota | corolla | 20 |
| toyota | qualis | 25 |
+--------+---------+---------+
table4
+--------+----------+-------+
| make | model | colr |
+--------+----------+-------+
| toyota | camry | blue |
| toyota | rav4 | green |
+--------+----------+-------+
I'm doing the following to join the results:
select a.make, a.model,a.r_yr,b.kms,c.mileage,d.colr
from table1 as a
left join table2 as b
on b.make=a.make and b.model=a.model and b.r_yr=a.r_yr
left join table3 as c
on c.make=a.make and c.model=a.model and c.r_yr=a.r_yr
left join table4 as d
on d.make=a.make and d.model=a.model and d.r_yr=a.r_yr
This gives a table like the one below:
+--------+---------+-------+-------+----------+--------+
| make | model | r_yr | kms | mileage | colr |
+--------+---------+-------+-------+----------+--------+
| toyota | corolla | 1999 | 25000 | 20 | |
| toyota | camry | 2002 | 50000 | | blue |
| toyota | qualis | 2004 | | 25 | |
| toyota | rav4 | 2006 | | | green |
+--------+---------+-------+-------+----------+--------+
However, the issue I have is that in the real data set I'm working with there are 5 common columns per table and around 20-40 unique attributes per table, which requires specifying 20-40 column names in the query in the form of b.kms, ..., c.mileage, ..., d.colr, ... Is there a workaround that avoids listing those unique columns, such as selecting everything except the common columns, or some other way?

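You cannot do something like SELECT all except x, y, z ...
But you can simplify this query by using a USING clause instead of JOIN ... ON. With USING, select * returns each join column only once, so no explicit column list is needed.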
Demo: http://sqlfiddle.com/#!17/fa97a/6
select *
from table1 as a
left join table2 as b
USING (make, model)
left join table3 as c
USING (make, model)
left join table4 as d
USING (make, model)
| make | model | r_yr | kms | mileage | colr |
|--------|---------|------|--------|---------|--------|
| toyota | camry | 2002 | 50000 | (null) | blue |
| toyota | corolla | 1999 | 25000 | 20 | (null) |
| toyota | qualis | 2004 | (null) | 25 | (null) |
| toyota | rav4 | 2006 | (null) | (null) | green |
Note: In the above example I am using only two common columns (make, model), since in your example r_yr is not a common column: it exists only in table1.
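If an explicit column list is still wanted (for example to avoid select * in stored code), one option is to let the catalog generate it. The following is only a minimal sketch, assuming PostgreSQL (as in the demo above), that the tables live in the public schema, and that make and model are the only common columns; those names would need adjusting for the real data set.

```sql
-- Sketch: build the list of non-common columns from the catalog.
-- Assumptions (not from the question): schema 'public',
-- common columns are exactly make and model.
SELECT string_agg(
         format('%I.%I', table_name, column_name),   -- e.g. table2.kms
         ', ' ORDER BY table_name, ordinal_position  -- keep a stable order
       ) AS column_list
FROM information_schema.columns
WHERE table_schema = 'public'
  AND table_name IN ('table2', 'table3', 'table4')
  AND column_name NOT IN ('make', 'model');
```

The resulting string can be pasted after "select a.*," in the join query. It qualifies columns with the table names rather than the aliases b, c, d, so the joins would either have to drop those aliases or the format call would have to emit the alias instead.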

Related

Materialized view only for covering index

Given the following table car_year, which lists the year of each car model:
| id | maker | model | year |
----------------------------------
| 6 | Audi | Allroad | 2001 |
| 12 | Audi | A8 | 2008 |
| 14 | Ford | Mustang | 1996 |
| 15 | Honda | Civic | 2000 |
| 19 | Honda | Insight | 2000 |
| 22 | Ford | F150 | 2009 |
| 24 | Honda | Accord | 2000 |
| 28 | Ford | F150 | 2007 |
| 34 | Audi | S8 | 2002 |
| 48 | Ford | Expedition | 2011 |
| 62 | Ford | Escort | 2004 |
| 81 | Ford | Explorer | 2007 |
| 84 | Ford | Escape | 2006 |
| 93 | Honda | Accord | 1995 |
I would like to have a covering index for the "earliest model of a maker".
My solution is to create a materialized view:
CREATE MATERIALIZED VIEW earliest AS
SELECT DISTINCT ON(maker) maker, model
FROM car_year
ORDER BY maker, year
And then a covering index over it:
CREATE INDEX earliest_index ON earliest(maker) INCLUDE (model);
It works! But the materialized view is useless (for my usage) because I will only use the covering index.
Am I missing a more elegant solution, or a (Postgre)SQL feature that I don't know about?
If I get you right, you are asking for an “index organized table” (or index organized materialized view in this case), that is, a table that is really an index. Then you wouldn't have to waste twice the storage: once for the table that you never use, and once for the index on which you want to perform index-only scans.
The answer is that such “index organized tables” don't exist in PostgreSQL.

joining multiple tables on common attributes with disjoint rows

I have multiple tables which need to be joined on multiple common attributes, such that the different attributes can be shown in a single table.
table1
+--------+---------+-------+--------+
| make | model | year | kms |
+--------+---------+-------+--------+
| toyota | corolla | 1999 | 25000 |
| toyota | camry | 2002 | 50000 |
+--------+---------+-------+--------+
table2
+--------+---------+-------+---------+
| make | model | year | mileage |
+--------+---------+-------+---------+
| toyota | corolla | 1999 | 20 |
| toyota | qualis | 2004 | 25 |
+--------+---------+-------+---------+
table3
+--------+----------+-------+-------+
| make | model | year | color |
+--------+----------+-------+-------+
| toyota | camry | 2002 | blue |
| toyota | rav4 | 2006 | green |
+--------+----------+-------+-------+
I'm doing the following to join the results:
select
* from table1 as a
full join table2 as b
using (make, model, year)
full join table3 as c
using (make, model, year)
What I need is a table like below.
+--------+---------+-------+-------+----------+--------+
| make | model | year | kms | mileage | color |
+--------+---------+-------+-------+----------+--------+
| toyota | corolla | 1999 | 25000 | 20 | |
| toyota | camry | 2002 | 50000 | | blue |
| toyota | qualis | 2004 | | 25 | |
| toyota | rav4 | 2006 | | | green |
+--------+---------+-------+-------+----------+--------+
However, I get results with make, model and year duplicated, and with empty values in some rows.
How do I go about getting the required result? Note that, for the real data set I'm working with, there are 5 common attributes per table and around 20-40 different attributes per table.
Duplicates could be due to the full join you are using. Change those to left joins as given below:
SELECT
A.MAKE, A.MODEL, A.YEAR, T1.KMS, T2.MILEAGE, T3.COLOR
FROM
(SELECT MAKE, MODEL, YEAR FROM TABLE1 UNION
SELECT MAKE, MODEL, YEAR FROM TABLE2 UNION
SELECT MAKE, MODEL, YEAR FROM TABLE3) A
LEFT JOIN TABLE1 T1
ON T1.MAKE = A.MAKE AND T1.MODEL = A.MODEL AND T1.YEAR = A.YEAR
LEFT JOIN TABLE2 T2
ON T2.MAKE = A.MAKE AND T2.MODEL = A.MODEL AND T2.YEAR = A.YEAR
LEFT JOIN TABLE3 T3
ON T3.MAKE = A.MAKE AND T3.MODEL = A.MODEL AND T3.YEAR = A.YEAR;

Postgres: select n unique rows for ID

Using Postgres, I have a scenario where I need to return a variable number of rows for each unique id in a SQL statement.
Consider I have a table of the cars a user has owned over the years.
+----+----------+---------+-------+
| ID | make | model | type |
+----+----------+---------+-------+
| 1 | toyota | camry | sedan |
| 1 | ford | mustang | coupe |
| 1 | toyota | celica | coupe |
| 1 | bmw | z4 | coupe |
| 1 | honda | accord | sedan |
| 2 | buick | marque | sedan |
| 2 | delorean | btf | coupe |
| 2 | mini | cooper | coupe |
| 3 | ford | f-150 | truck |
| 3 | ford | mustang | coupe |
| 1 | ford | taurus | sedan |
+----+----------+---------+-------+
From this table I'd only want to return two rows for each user that has a coupe and ignore the rest.
I'd also like to preserve the empty columns, so the second result for ID 3 would be empty because that user has only one car of type coupe. I am also working with restrictions, as this has to run on AWS Redshift, so I can't use many functions. It seems this would be easy using a TOP statement like in SQL Server, but with the Redshift restrictions and my lack of knowledge I'm not sure of the best way. So something like:
+----+----------+---------+-------+
| ID | make | model | type |
+----+----------+---------+-------+
| 1 | ford | mustang | coupe |
| 1 | toyota | celica | coupe |
| 2 | delorean | btf | coupe |
| 2 | mini | cooper | coupe |
| 3 | ford | mustang | coupe |
| 3 | | | |
+----+----------+---------+-------+
Thanks a lot for your help.
As far as I know, Redshift supports window functions:
select id, make, model, type
from (
select id, make, model, type,
row_number() over (partition by id order by make) as rn
from the_table
where type = 'coupe'
) t
where rn <= 2
order by id, make;
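The query above returns at most two coupes per ID, but it does not produce the empty second row for ID 3. One way to pad the result, sketched here under assumptions, is to cross join the IDs with a two-row "slot" list and left join the ranked rows onto it. The CTE names ranked, ids and slots are made up for this illustration, the table name the_table is taken from the answer, and Redshift compatibility is assumed rather than tested.

```sql
-- Sample data copied from the question.
CREATE TABLE the_table (id int, make varchar(20), model varchar(20), type varchar(10));

INSERT INTO the_table (id, make, model, type) VALUES
  (1, 'toyota',   'camry',   'sedan'),
  (1, 'ford',     'mustang', 'coupe'),
  (1, 'toyota',   'celica',  'coupe'),
  (1, 'bmw',      'z4',      'coupe'),
  (1, 'honda',    'accord',  'sedan'),
  (2, 'buick',    'marque',  'sedan'),
  (2, 'delorean', 'btf',     'coupe'),
  (2, 'mini',     'cooper',  'coupe'),
  (3, 'ford',     'f-150',   'truck'),
  (3, 'ford',     'mustang', 'coupe'),
  (1, 'ford',     'taurus',  'sedan');

WITH ranked AS (
  -- Number the coupes of each id, as in the answer above.
  SELECT id, make, model, type,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY make) AS rn
  FROM the_table
  WHERE type = 'coupe'
),
ids AS (
  -- Every id that owns at least one coupe.
  SELECT DISTINCT id FROM ranked
),
slots AS (
  -- The two output positions wanted per id.
  SELECT 1 AS rn UNION ALL SELECT 2
)
SELECT i.id, r.make, r.model, r.type
FROM ids i
CROSS JOIN slots s
LEFT JOIN ranked r
  ON r.id = i.id AND r.rn = s.rn   -- unmatched slots stay NULL
ORDER BY i.id, s.rn;
```

With the sample data, ID 3 gets one real row followed by a row whose make, model and type are NULL. Because the rows are ranked by make, ID 1 yields the bmw and ford coupes rather than the two shown in the question; the table has no column that records insertion order, so a different ORDER BY inside the window would be needed to pick other rows.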

SQL Server 2008 displaying records horizontally

I have the tables and data created in SQL Fiddle (http://sqlfiddle.com/#!3/72398/3).
I have 3 tables with data
dbo.Contracts:
C_ID,
C_NAME,
C_VOLUME
dbo.Players:
P_ID,
P_NAME
dbo.ContractPlayers:
C_ID,
P_ID,
Share
I am able to do simple joins which display the data vertically, but I need the data horizontally, like this:
C_ID | C_NAME | C_Volume | P_NAME1 | SHARE1 | P_NAME2 | SHARE2 | P_NAME3 | SHARE3 |
1 | Agriculture | 40000 | Johndeer | 3000 | Statefarm | 4500 | Vortex | 3200 |
2 | Chemicals | 50000 | Johndeer | 1231 | Statefarm | 2345 | Vortex | 2311 |
3 | Autos | 35000 | Johndeer | 1212 | Statefarm | 1111 | Vortex | 4534 |
I am assuming this is possible. Please help. Here is the SQL Fiddle: http://sqlfiddle.com/#!3/72398/3

SQL Join with Group By

Ok, so I'm trying to write a complex query (at least complex to me) and need some pro help. This is my database setup:
Table: MakeList
| MakeListId | Make |
| 1 | Acura |
| 2 | Chevy |
| 3 | Pontiac |
| 4 | Scion |
| 5 | Toyota |
Table: CustomerMake
| CustomerMakeId | CustomerId | _Descriptor |
| 1 | 123 | Acura |
| 2 | 124 | Chevy |
| 3 | 125 | Pontiac |
| 4 | 126 | Scion |
| 5 | 127 | Toyota |
| 6 | 128 | Acura |
| 7 | 129 | Chevy |
| 8 | 130 | Pontiac |
| 9 | 131 | Scion |
| 10 | 132 | Toyota |
Table: Customer
| CustomerId | StatusId |
| 123 | 1 |
| 124 | 1 |
| 125 | 1 |
| 126 | 2 |
| 127 | 1 |
| 128 | 1 |
| 129 | 2 |
| 130 | 1 |
| 131 | 1 |
| 132 | 1 |
What I am trying to end up with is this...
Desired Result Set:
| Make | CustomerId|
| Acura | 123 |
| Chevy | 124 |
| Pontiac | 125 |
| Scion | 131 |
| Toyota | 127 |
I want a list of unique Makes, each with one active (StatusId = 1) CustomerId to go with it. I'm assuming I'll have to do some GROUP BYs and JOINs, but I haven't been able to figure it out. Any help would be greatly appreciated. Let me know if I haven't given enough info for my question. Thanks!
UPDATE: The script doesn't have to be performant; it will be used one time for testing purposes.
Something like this:
select cm._Descriptor,
min(cu.customerid)
from CustomerMake cm
join Customer cu on cu.CustomerId = cm.CustomerId and cu.StatusId = 1
group by cm._Descriptor
I left out the MakeList table as it seems unnecessary, because you are storing the full make name as _Descriptor in the CustomerMake table anyway (so the question is: what is the MakeList table for? Why don't you store a FK to it in the CustomerMake table?).
You want to
(a) join the customer and customermake tables
(b) filter on customer.statusid
(c) group by customermake._descriptor
Depending on your RDBMS, you may need to explicitly apply a group function to customer.customerid to include it in the select list. Since you don't care which particular customerid is displayed, you could use MIN or MAX to just pick an essentially arbitrary value.
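Steps (a) to (c) can be checked against the sample data from the question. This is a minimal sketch in plain SQL; the column types are assumptions, since the question does not state them, and MIN is used as the group function.

```sql
-- Sample data copied from the question; column types are assumed.
CREATE TABLE Customer (CustomerId int PRIMARY KEY, StatusId int);
CREATE TABLE CustomerMake (CustomerMakeId int PRIMARY KEY, CustomerId int, _Descriptor varchar(50));

INSERT INTO Customer (CustomerId, StatusId) VALUES
  (123, 1), (124, 1), (125, 1), (126, 2), (127, 1),
  (128, 1), (129, 2), (130, 1), (131, 1), (132, 1);

INSERT INTO CustomerMake (CustomerMakeId, CustomerId, _Descriptor) VALUES
  (1, 123, 'Acura'), (2, 124, 'Chevy'), (3, 125, 'Pontiac'),
  (4, 126, 'Scion'), (5, 127, 'Toyota'), (6, 128, 'Acura'),
  (7, 129, 'Chevy'), (8, 130, 'Pontiac'), (9, 131, 'Scion'),
  (10, 132, 'Toyota');

SELECT cm._Descriptor     AS Make,
       MIN(cu.CustomerId) AS CustomerId   -- any active customer would do
FROM CustomerMake cm
JOIN Customer cu
  ON cu.CustomerId = cm.CustomerId        -- (a) join the two tables
WHERE cu.StatusId = 1                     -- (b) keep active customers only
GROUP BY cm._Descriptor                   -- (c) one row per make
ORDER BY cm._Descriptor;
-- Returns: Acura 123, Chevy 124, Pontiac 125, Scion 131, Toyota 127
```

Scion comes out as 131 because customer 126 is inactive, which matches the desired result set in the question. Using MAX instead would return 128, 124, 130, 131 and 132.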