I have 2 tables in redshift:
table1
| ids |
|------:|
| 1 |
| 2 |
| 6 |
| 9 |
| 12 |
table2
| id | value |
|-----:|---------:|
| 1 | 0.134435 |
| 2 | 0.767417 |
| 3 | 0.779567 |
| 4 | 0.726051 |
| 5 | 0.405138 |
| 6 | 0.775206 |
| 7 | 0.699945 |
| 8 | 0.499433 |
| 10 | 0.457386 |
| 9 | 0.227511 |
| 10 | 0.369292 |
| 11 | 0.653735 |
| 12 | 0.537251 |
| 2 | 0.953539 |
| 13 | 0.377625 |
| 14 | 0.973905 |
| 4 | 0.104643 |
| 1 | 0.450627 |
And I basically want to get the rows in table2 where id is in table1 and I have 2 possibilities:
SELECT *
FROM table2
WHERE id IN (SELECT ids FROM table1)
or
SELECT t2.id, t2.value
FROM table2 t2
INNER JOIN table1 t1
ON t2.id = t1.ids
I want to know if there is any performance difference between them.
(I know I could just test in this example to find out but I would like to know if there is one which is always faster)
Edit: table1.ids is a unique column
The two queries do different things.
The JOIN can multiply the number of rows if ids is duplicated in table1.
The IN will never duplicate rows.
If ids can be duplicated, you should use the version that does what you want. If ids is guaranteed to be unique (as your edit says), then the two are functionally equivalent.
In my experience, JOIN is typically at least as fast as IN. Of course, you can test on your data, but that is a starting point.
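To make the difference in semantics concrete, here is a minimal sketch that runs both queries against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for Redshift here, and the duplicated ids value is a hypothetical added for illustration (the question states ids is unique). It says nothing about performance, only about the row counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ids INTEGER);
    CREATE TABLE table2 (id INTEGER, value REAL);
    -- hypothetical data: ids = 1 is duplicated on purpose
    INSERT INTO table1 VALUES (1), (1), (2);
    INSERT INTO table2 VALUES (1, 0.134435), (2, 0.767417), (3, 0.779567);
""")

# IN: each table2 row is returned at most once
in_rows = conn.execute(
    "SELECT id, value FROM table2 "
    "WHERE id IN (SELECT ids FROM table1) ORDER BY id"
).fetchall()

# JOIN: a table2 row is returned once per matching table1 row
join_rows = conn.execute(
    "SELECT t2.id, t2.value FROM table2 t2 "
    "INNER JOIN table1 t1 ON t2.id = t1.ids ORDER BY t2.id"
).fetchall()

in_ids = [r[0] for r in in_rows]
join_ids = [r[0] for r in join_rows]
print(in_ids, join_ids)
```

With a unique ids column the two id lists come out identical, which is the "functionally equivalent" case.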
I have 3 tables as shown:
Video
+----+--------+-----------+
| id | name | videoSize |
+----+--------+-----------+
| 1 | video1 | 1MB |
| 2 | video2 | 2MB |
| 3 | video3 | 3MB |
+----+--------+-----------+
Survey
+----+---------+-----------+
| id | name | questions |
+----+---------+-----------+
| 1 | survey1 | 1 |
| 2 | survey2 | 2 |
| 3 | survey3 | 3 |
+----+---------+-----------+
Sequence
+----+---------+-----------+----------+
| id | videoId | surveyId | sequence |
+----+---------+-----------+----------+
| 1 | null | 1 | 1 |
| 2 | 2 | null | 2 |
| 3 | null | 3 | 3 |
+----+---------+-----------+----------+
I would like to query Sequence and join on both of video and survey tables and merge common columns without specifying the column names (in this case name) like this:
Query Result:
+----+---------+-----------+----------+---------+-----------+-----------+
| id | videoId | surveyId | sequence | name | videoSize | questions |
+----+---------+-----------+----------+---------+-----------+-----------+
| 1 | null | 1 | 1 | survey1 | null | 1 |
| 2 | 2 | null | 2 | video2 | 2MB | null |
| 3 | null | 3 | 3 | survey3 | null | 3 |
+----+---------+-----------+----------+---------+-----------+-----------+
Is this possible?
BTW the below sql doesn't work as it doesn't merge on the name field:
SELECT * FROM "Sequence"
LEFT JOIN "Survey" ON "Survey"."id" = "Sequence"."surveyId"
LEFT JOIN "Video" ON "Video"."id" = "Sequence"."videoId"
This query will show what you want:
select
s.*,
coalesce(y.name, v.name) as name, -- picks the right column
v.videoSize,
y.questions
from sequence s
left join survey y on y.id = s.surveyId
left join video v on v.id = s.videoId
However, the SQL standard requires you to name the columns you want; the only exception is *, as in s.* above.
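As a quick check of the COALESCE approach, the sketch below loads the sample rows from the question into an in-memory SQLite database (an assumption; the question appears to target a different engine) and runs the same joins. The columns of s are listed explicitly only to make the expected tuples easy to read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE video (id INTEGER, name TEXT, videoSize TEXT);
    CREATE TABLE survey (id INTEGER, name TEXT, questions INTEGER);
    CREATE TABLE sequence (id INTEGER, videoId INTEGER,
                           surveyId INTEGER, sequence INTEGER);
    INSERT INTO video VALUES (1,'video1','1MB'),(2,'video2','2MB'),(3,'video3','3MB');
    INSERT INTO survey VALUES (1,'survey1',1),(2,'survey2',2),(3,'survey3',3);
    INSERT INTO sequence VALUES (1,NULL,1,1),(2,2,NULL,2),(3,NULL,3,3);
""")

rows = conn.execute("""
    SELECT s.id, s.videoId, s.surveyId, s.sequence,
           COALESCE(y.name, v.name) AS name,  -- first non-NULL name wins
           v.videoSize,
           y.questions
    FROM sequence s
    LEFT JOIN survey y ON y.id = s.surveyId
    LEFT JOIN video v ON v.id = s.videoId
    ORDER BY s.id
""").fetchall()

for row in rows:
    print(row)
```

If a row ever had both surveyId and videoId set, COALESCE would pick the survey name because it is listed first.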
I am trying to write a query where I can concatenate some rows into a single column based on the result of the case statement in DB2 v9.5
The contractId can be a variable number of rows as well.
Given I have the following table structure
Table1
+------------+------------+------+
| ContractId | Reference | Code |
+------------+------------+------+
| 12 | P123456789 | A |
| 12 | A987654321 | B |
| 12 | 9995559971 | C |
| 12 | 3215654778 | D |
| 13 | abcdef | A |
| 15 | asdfa | B |
| 37 | 282jd | B |
| 89 | asdf82 | C |
+------------+------------+------+
I would like to get the output of the result like so
+-------------+-----------------------+------------------------------------+
| ContractId | Reference with Code A | Other References |
+-------------+-----------------------+------------------------------------+
| 12 | P123456789 | A987654321, 9995559971, 3215654778 |
| 13 | abcdef | asdfa, 282jd, asdf82 |
+-------------+-----------------------+------------------------------------+
I've tried queries like
select t1.contractId,
max(case when t1.code = 'A' then t1.reference end) as "reference with code a",
max(case when t1.code in ('B','C','D') then t1.reference end) as "other references"
from table1 t1
group by t1.contractId
however, this is still giving me an output like
+-------------+-----------------------+------------------+
| ContractId | Reference with Code A | Other References |
+-------------+-----------------------+------------------+
| 12 | P123456789 | null |
| 12 | null | A987654321 |
| 12 | null | 9995559971 |
| 12 | null | 3215654778 |
+-------------+-----------------------+------------------+
I've also attempted using some of the XML Agg functions but can't seem to get it to format the way I want it to.
Amended Once
Amended Twice: The headers of the remaining 9 tables except for reports are always called "what".
I have about 10 tables with the following structure:
reports (165k rows)
+-----------+-----------+
| identifier| category |
+-----------+-----------+
| 1 | fixed |
| 2 | wontfix |
| 3 | fixed |
| 4 | invalid |
| 5 | later |
| 6 | wontfix |
| 7 | duplicate |
| 8 | later |
| 9 | wontfix |
+-----------+-----------+
status (300k rows, all identifiers from reports come up at least once)
+-----------+-----------+----------+
| identifier| time | what |
+-----------+-----------+----------+
| 1 | 12 | RESOLVED |
| 1 | 9 | NEW |
| 2 | 7 | ASSIGNED |
| 3 | 10 | RESOLVED |
| 5 | 4 | REOPEN |
| 7 | 9 | ASSIGNED |
| 4 | 9 | ASSIGNED |
| 7 | 11 | RESOLVED |
| 8 | 3 | NEW |
| 4 | 3 | NEW |
| 7 | 6 | NEW |
+-----------+-----------+----------+
priority (300k rows, all identifiers from reports come up at least once)
+-----------+-----------+----------+
| identifier| time | what |
+-----------+-----------+----------+
| 3 | 12 | LOW |
| 1 | 9 | LOW |
| 9 | 2 | HIGH |
| 8 | 7 | HIGH |
| 3 | 10 | HIGH |
| 5 | 4 | MEDIUM |
| 4 | 9 | MEDIUM |
| 4 | 3 | LOW |
| 7 | 9 | LOW |
| 7 | 11 | HIGH |
| 8 | 3 | LOW |
| 6 | 12 | MEDIUM |
| 7 | 6 | LOW |
| 6 | 9 | HIGH |
| 2 | 6 | HIGH |
| 2 | 1 | LOW |
+-----------+-----------+----------+
What I need is:
reportsfinal (165k rows)
+-----------+-----------+--------------+------------+
| identifier| category | what11 | what22 |
+-----------+-----------+--------------+------------+
| 1 | fixed | RESOLVED | LOW |
| 2 | wontfix | ASSIGNED | HIGH |
| 3 | fixed | RESOLVED | LOW |
| 4 | invalid | ASSIGNED | MEDIUM |
| 5 | later | REOPEN | MEDIUM |
| 6 | wontfix | | MEDIUM |
| 7 | duplicate | RESOLVED | HIGH |
| 8 | later | NEW | HIGH |
| 9 | wontfix | | HIGH |
+-----------+-----------+--------------+------------+
That is, reports (after query = reportsfinal) serves as the basis table and I have to add one or two columns from 9 other tables. The identifier is the key, but in some tables, the identifier comes up multiple times. In these cases I want to use the entry with the highest time only.
I tried several queries, but none of them worked. If possible, I want to run one query to get different columns from the 9 other tables with this approach.
What I tried based on the answer below:
select T.identifier,
T.category,
t.what AS what11,
t.what AS what22 from (
select R.identifier,
R.category,
COALESCE(S.what,'NA')what,
COALESCE(P.what,'NA')what,
ROW_NUMBER()OVER(partition by R.identifier,R.category ORDER by (select null))RN
from reports R
LEFT JOIN bugstatus S
ON S.identifier = R.identifier
LEFT JOIN priority P
ON P.identifier = s.identifier
GROUP BY R.identifier,R.category,S.what,P.what)T
Where T.RN = 1
ORDER BY T.identifier;
This gives the error:
Error: near "(": syntax error.
Basically you need correlated subqueries in the select list.
From the hip, something like:
Select a.Identifier
,a.Category
,(select what
from status where status.identifier = a.Identifier order by time desc limit 1) Process
,(select what
from priority where priority.identifier = a.Identifier order by time desc limit 1) prio
From Reports a
For each associated table, just use a join predicate based on a subquery to pick the row with the latest time.
The single-letter tokens r, s, and p are aliases for the tables reports, status and priority respectively.
Select r.Identifier, r.category,
coalesce(s.what, 'NA') status,
coalesce(p.what, 'NA') priority
From reports r
left join status s
on s.identifier = r.identifier
and s.time =
(Select max(time) from status
where identifier = r.identifier)
left join priority p
on p.identifier = r.identifier
and p.time =
(Select max(time) from priority
where identifier = r.identifier);
QUESTION: Why did you rename the columns from status and priority to what? You might as well name them something, or data, or information. At least the original names (status and prio) communicated something. The word what is meaningless.
NOTE: I reversed (undid) the edit for the aliases what11 and what22, as these names are meaningless.
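The max(time) join can be checked against the sample data from the question. This is a sketch only: it assumes SQLite (via Python's sqlite3) and the column name what from the amended question, and it uses 'NA' where the question shows an empty cell.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reports (identifier INTEGER, category TEXT);
    CREATE TABLE status (identifier INTEGER, time INTEGER, what TEXT);
    CREATE TABLE priority (identifier INTEGER, time INTEGER, what TEXT);
    INSERT INTO reports VALUES
        (1,'fixed'),(2,'wontfix'),(3,'fixed'),(4,'invalid'),(5,'later'),
        (6,'wontfix'),(7,'duplicate'),(8,'later'),(9,'wontfix');
    INSERT INTO status VALUES
        (1,12,'RESOLVED'),(1,9,'NEW'),(2,7,'ASSIGNED'),(3,10,'RESOLVED'),
        (5,4,'REOPEN'),(7,9,'ASSIGNED'),(4,9,'ASSIGNED'),(7,11,'RESOLVED'),
        (8,3,'NEW'),(4,3,'NEW'),(7,6,'NEW');
    INSERT INTO priority VALUES
        (3,12,'LOW'),(1,9,'LOW'),(9,2,'HIGH'),(8,7,'HIGH'),(3,10,'HIGH'),
        (5,4,'MEDIUM'),(4,9,'MEDIUM'),(4,3,'LOW'),(7,9,'LOW'),(7,11,'HIGH'),
        (8,3,'LOW'),(6,12,'MEDIUM'),(7,6,'LOW'),(6,9,'HIGH'),(2,6,'HIGH'),
        (2,1,'LOW');
""")

rows = conn.execute("""
    SELECT r.identifier, r.category,
           COALESCE(s.what, 'NA') AS status,
           COALESCE(p.what, 'NA') AS priority
    FROM reports r
    LEFT JOIN status s
           ON s.identifier = r.identifier
          AND s.time = (SELECT MAX(time) FROM status
                        WHERE identifier = r.identifier)
    LEFT JOIN priority p
           ON p.identifier = r.identifier
          AND p.time = (SELECT MAX(time) FROM priority
                        WHERE identifier = r.identifier)
    ORDER BY r.identifier
""").fetchall()

for row in rows:
    print(row)
```

One caveat: if two rows for the same identifier share the highest time, this join returns both, so the result can contain more than one row per report.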
Using ROW_NUMBER also works, as long as the rows are ranked by time so that row 1 holds the latest entries:
select T.identifier,
T.category,
T.what11,
T.what22 from (
select R.identifier,
R.category,
COALESCE(S.what,'NA') AS what11,
COALESCE(P.what,'NA') AS what22,
-- rank the joined rows so the latest status and priority come first
ROW_NUMBER() OVER (partition by R.identifier ORDER BY S.time DESC, P.time DESC) RN
from reports R left join status S
ON S.identifier = R.identifier
LEFT JOIN Priority P
ON P.identifier = R.identifier) T
Where T.RN = 1
ORDER BY T.identifier
I have a table with CostCenter_ID (int) and a second table with Process_ID (int).
I'd like to combine the results of both tables so that each cost center ID is assigned to all process IDs, like so:
|CostCenterID | ProcessID |
---------------------------
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
| 3 | 1 |
| 3 | 2 |
| 3 | 3 |
I've done it before but I'm drawing a blank. I've tried this:
SELECT CostCenter_ID,NULL FROM dbo.Cost_Centers
UNION ALL
SELECT NULL,Process_ID FROM dbo.Processes
which returns this:
|CostCenterID | ProcessID |
---------------------------
| 1 | NULL |
| NULL | 1 |
| NULL | 2 |
| NULL | 3 |
Try:
select a.CostCenter_ID, b.Process_ID
from dbo.Cost_Centers a
cross join dbo.Processes b
or:
select a.CostCenter_ID, b.Process_ID
from dbo.Cost_Centers a
,dbo.Processes b
NB: cross join is the better method as it makes it clearer to the reader what your intentions are.
More info (with pics) here: http://www.w3resource.com/sql/joins/cross-join.php
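For a quick sanity check, here is a small sketch of the cross join on three cost centers and three processes. It assumes an in-memory SQLite database via Python's sqlite3 instead of SQL Server, so the dbo. schema prefix is dropped; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Cost_Centers (CostCenter_ID INTEGER);
    CREATE TABLE Processes (Process_ID INTEGER);
    INSERT INTO Cost_Centers VALUES (1), (2), (3);
    INSERT INTO Processes VALUES (1), (2), (3);
""")

# every cost center paired with every process: 3 x 3 = 9 rows
rows = conn.execute("""
    SELECT c.CostCenter_ID, p.Process_ID
    FROM Cost_Centers c
    CROSS JOIN Processes p
    ORDER BY c.CostCenter_ID, p.Process_ID
""").fetchall()

for cost_center_id, process_id in rows:
    print(cost_center_id, process_id)
```

The row count is always the product of the two table sizes, so this grows quickly on large tables.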