Replicate rows based on one to many relationships between columns - sql

This has to be a solved problem, but I don't know the right terms to search for on Google, so I will explain the problem here.
I have the following dataset that has two different identifiers for users (say id1 and id2).
+------+-----+-------+
| id1 | id2 | value |
+------+-----+-------+
| 1 | 11 | blah1 |
| 1 | 12 | blah2 |
| 2 | 13 | blah3 |
| null | 14 | blah4 |
+------+-----+-------+
There is a one-to-many relationship between id1 and id2, so the users with id2 11 and 12 are actually the same user. I want to replicate the rows for such users so that each value is associated with every id2 of that user. The resulting dataset would then look like:
+------+-----+-------+
| id1 | id2 | value |
+------+-----+-------+
| 1 | 11 | blah1 |
| 1 | 12 | blah2 |
| 2 | 13 | blah3 |
| null | 14 | blah4 |
| 1 | 12 | blah1 |
| 1 | 11 | blah2 |
+------+-----+-------+
As you can see, the value blah1 is now associated with both id2 11 and id2 12, as is the value blah2.
There must be some kind of self-join that does this, but I don't know what it is called (SQL newbie). I would appreciate it if someone could point me in the right direction.

Well, you can self-join; it's totally permitted.
A join links rows based on a key connection (in the general case).
Notice that in this case you will also need a union, because you want more rows, not more columns:
SELECT t.*
FROM
table t
INNER JOIN table t2 ON t.id1 = t2.id1 AND t.id2 != t2.id2
UNION
SELECT t.*
FROM
table t
INNER JOIN table t2 ON t.id1 = t2.id1 AND t.id2 = t2.id2

You can generate the rows using join for this purpose:
select i.id1, i.id2, iv.value
from (select distinct id1, value from t) iv join
(select distinct id1, id2 from t) i
on iv.id1 = i.id1 ;
Actually, the second select distinct is probably not necessary (unless your original data has duplicates, which it would if you added these rows back into the table), but I think it makes the query clearer. This should also work:
select t.id1, t.id2, iv.value
from (select distinct id1, value from t) iv join
t
on iv.id1 = t.id1 ;
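A minimal sketch of the join above, run against the question's sample data. It assumes SQLite through Python's standard sqlite3 module, which is not something the original post specifies. It also shows one thing worth checking: an inner join on id1 never matches a NULL id1, so the blah4 row is dropped unless it is added back. The UNION ALL branch is my addition, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id1 INTEGER, id2 INTEGER, value TEXT);
    INSERT INTO t VALUES
        (1, 11, 'blah1'), (1, 12, 'blah2'), (2, 13, 'blah3'), (NULL, 14, 'blah4');
""")

# The join from the answer: every value seen for an id1 is paired
# with every id2 seen for that same id1.
joined = conn.execute("""
    SELECT i.id1, i.id2, iv.value
    FROM (SELECT DISTINCT id1, value FROM t) iv
    JOIN (SELECT DISTINCT id1, id2 FROM t) i ON iv.id1 = i.id1
    ORDER BY i.id2, iv.value
""").fetchall()

# Same join, plus the rows whose id1 is NULL (NULL = NULL is never true,
# so the join alone cannot return them).
with_nulls = conn.execute("""
    SELECT i.id1, i.id2, iv.value
    FROM (SELECT DISTINCT id1, value FROM t) iv
    JOIN (SELECT DISTINCT id1, id2 FROM t) i ON iv.id1 = i.id1
    UNION ALL
    SELECT id1, id2, value FROM t WHERE id1 IS NULL
    ORDER BY 2, 3
""").fetchall()

for row in with_nulls:
    print(row)
```

If rows with a NULL id1 should not appear in the result, the first query is enough.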

Related

Combining multiple tables with same ID

I am trying to combine several tables. The goal is to have them all on one table in the end, but sorted by ID. So if I have a matching ID between some of the tables, it unites into one row based on the ID.
There are more than two tables (I currently have three: table1, table2, table3 but I plan to add more in the future)
Some of the tables don't have the same columns or number of columns.
Some of the tables don't have the same name for the ID column, table1 has "ID" and table2 has it named "identity" and so on...
I would rather not list every column from every table in the query, because each table has a good number of columns and naming all of them would be tedious. I do, however, know the name of the ID column in each table.
So the column names for the ID are: "ID(table1), Identity(table2), CatalogNum(table3)"
Here's an example,
table1:
**ID** | Name | Price | Date | ....
000212 Rod 200 NULL etc
......
table2:
Descr | **Identitiy** | amount | ...
Silver rod 000212 3 NULL
......
table3:
Type | Price | Condition | **CatalogNum** | .....
Metal NULL 8 000212 etc
Wood 300 1 000313 etc
.....
So end result should look like:
**ID** | Name | Price | Date | Descr | amount | Type | Condition | .... | ... | .....
000212 Rod 200 NULL Silver rod 3 Metal 8 etc NULL etc
000313 NULL 300 NULL NULL NULL Wood 1 NULL NULL etc
Is this what you want?
Select b.*, c.*, d.* From
(select ID from table1 union select Identitiy from table2 union select CatalogNum from table3) a
left join table1 b on a.ID = b.ID
left join table2 c on a.ID = c.Identitiy
left join table3 d on a.ID = d.CatalogNum
Since there's no single source for all of the ID values that could occur in the various tables, you'll have to build one, and then use that to join to all of the other tables to get the columns you're interested in.
For the purposes here, I pulled back all of the columns in all of the tables, though.
WITH IdList AS (
SELECT
ID AS MasterId
FROM table1
UNION
SELECT
Identitiy
FROM table2
UNION
SELECT
CatalogNum
FROM table3
)
SELECT
i.MasterID,
t1.*,
t2.*,
t3.*
FROM IdList as i
LEFT JOIN table1 as t1 ON t1.ID = i.MasterID
LEFT JOIN table2 as t2 ON t2.Identitiy = i.MasterID
LEFT JOIN table3 as t3 ON t3.CatalogNum = i.MasterID;
Result:
+----------+--------+--------+--------+--------+------------+-----------+--------+-------+--------+-----------+------------+
| MasterID | ID | Name | Price | Date | Descr | Identitiy | amount | Type | Price | Condition | CatalogNum |
+----------+--------+--------+--------+--------+------------+-----------+--------+-------+--------+-----------+------------+
| 000212 | 000212 | Rod | 200 | (null) | Silver rod | 000212 | 3 | Metal | (null) | 8 | 000212 |
| 000313 | (null) | (null) | (null) | (null) | (null) | (null) | (null) | Wood | 300 | 1 | 000313 |
+----------+--------+--------+--------+--------+------------+-----------+--------+-------+--------+-----------+------------+
SQL Fiddle demo
EDIT: Your question notes that you want the data in one table, ", but sorted by ID." SQL tables don't work that way. They are, by definition, unordered sets. You impose order on them by using an ORDER BY clause in your SELECT queries. So there is no effort in my query above to create any kind of "order" at all.
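A minimal sketch of the ID-list approach on the question's sample rows, assuming SQLite through Python's standard sqlite3 module and trimmed-down tables (the Date column and the elided columns are left out). The COALESCE on Price is my assumption about how the two Price columns should collapse into the single Price column of the desired result; the answers above do not do this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID TEXT, Name TEXT, Price INTEGER);
    CREATE TABLE table2 (Descr TEXT, Identitiy TEXT, amount INTEGER);
    CREATE TABLE table3 (Type TEXT, Price INTEGER, "Condition" INTEGER, CatalogNum TEXT);
    INSERT INTO table1 VALUES ('000212', 'Rod', 200);
    INSERT INTO table2 VALUES ('Silver rod', '000212', 3);
    INSERT INTO table3 VALUES ('Metal', NULL, 8, '000212'), ('Wood', 300, 1, '000313');
""")

rows = conn.execute("""
    WITH IdList AS (
        -- UNION (not UNION ALL) removes duplicate IDs across the tables
        SELECT ID AS MasterID FROM table1
        UNION SELECT Identitiy FROM table2
        UNION SELECT CatalogNum FROM table3
    )
    SELECT i.MasterID, t1.Name,
           COALESCE(t1.Price, t3.Price) AS Price,  -- assumption: first non-NULL price wins
           t2.Descr, t2.amount, t3.Type, t3."Condition"
    FROM IdList AS i
    LEFT JOIN table1 AS t1 ON t1.ID = i.MasterID
    LEFT JOIN table2 AS t2 ON t2.Identitiy = i.MasterID
    LEFT JOIN table3 AS t3 ON t3.CatalogNum = i.MasterID
    ORDER BY i.MasterID
""").fetchall()

for row in rows:
    print(row)
```

Adding a fourth table means one more UNION branch in IdList and one more LEFT JOIN.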

Select unique ordered values of several columns in sql

I am using a table with a pair of geometries in each row. I would like each geometry to appear only once in my result. I sorted the pairs by distance. I succeeded in getting distinct geom1 or distinct geom2, but never both at the same time. The ids are linked to their related geometries.
| id1 | id2 | distance| | id1 | id2 | distance|
| 1 | 2 | 3 | | 1 | 2 | 3 |
| 2 | 1 | 4 | -> | 2 | 1 | 7 |
| 2 | 2 | 7 |
| 1 | 1 | 9 |
My table contains more than 2 million rows, so performance is an issue.
I thought of creating several temp tables where I group by id1 and then by id2, collect the missing values, and group again and again... But if anyone has a better idea, that would be amazing.
Thanks,
If I understand correctly, you are looking for distinct triplets of id1, id2 and distance:
SELECT DISTINCT id1, id2 , distance FROM <table name>;
or
SELECT id1, id2, distance FROM <table name> GROUP BY id1, id2, distance;
You seem to want:
select t1.*
from table t1
where id2 = (select max(t2.id2) from table t2 where t2.id1 = t1.id1);

behavior of filters in outer join

I understand that a filter in the JOIN clause behaves differently from a filter in the WHERE clause when using an outer join. Let's say I have these 2 tables.
table1
id | value
---+------
1 | 11
2 | 12
table2
id | value
---+------
1 | 101
Now if I query for
select a.id as id1, a.value as value1, b.value as value2
from table1 as a
left join table2 as b on a.id=b.id and a.value=11
The result is this, an extra row with value1=12
id1 | value1 | value2
----+--------+--------
1 | 11 | 101
2 | 12 | NULL
However, if I put the filter in the where clause, it gives me what I want. Why does it behave like this?
The second condition used on your left join example limits which rows will be considered for joining.
select t1.id as id1, t1.value as value1, t2.value as value2
from t1
left join t2 on t1.id=t2.id AND T1.VALUE=11
t1
id | value
---+------
1 | 11 ONLY join on this row because t1.value=11
2 | 12
t2
id | value
---+------
1 | 101 this has t1.id=t2.id, so it does get joined
which would produce this final result:
id1 | value1 | value2
----+--------+--------
1 | 11 | 101
2 | 12 | NULL
Moving the predicate T1.VALUE=11 to the where clause has a different series of events, as follows:
select t1.id as id1, t1.value as value1, t2.value as value2
from t1
left join t2 on t1.id=t2.id
WHERE T1.VALUE=11
t1
id | value
---+------
1 | 11 this row does meet t1.id=t2.id, so it gets joined
2 | 12 this row does NOT meet t1.id=t2.id, FAILS to join
t2
id | value
---+------
1 | 101 this row does meet t1.id=t2.id, so it gets joined
which would produce this INTERIM result:
id1 | value1 | value2
----+--------+--------
1 | 11 | 101
2 | 12 | NULL
NOW the where clause is considered
id1 | value1 | value2
----+--------+--------
1 | 11 | 101 T1.VALUE does equal 11 so this row will be returned
2 | 12 | NULL T1.VALUE does NOT = 11 so this row is NOT returned
Thus the final result is:
id1 | value1 | value2
----+--------+--------
1 | 11 | 101
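The difference can be reproduced with the question's two tables. This is a small sketch assuming SQLite through Python's standard sqlite3 module, with the filter written against the left table as in the question (a.value = 11). The same ON versus WHERE behaviour applies to left joins in other engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, value INTEGER);
    CREATE TABLE table2 (id INTEGER, value INTEGER);
    INSERT INTO table1 VALUES (1, 11), (2, 12);
    INSERT INTO table2 VALUES (1, 101);
""")

# Filter in ON: it only decides which rows find a match.
# Every row of the left table is still returned.
on_rows = conn.execute("""
    SELECT a.id, a.value, b.value
    FROM table1 AS a
    LEFT JOIN table2 AS b ON a.id = b.id AND a.value = 11
    ORDER BY a.id
""").fetchall()

# Filter in WHERE: applied after the join, so it removes rows.
where_rows = conn.execute("""
    SELECT a.id, a.value, b.value
    FROM table1 AS a
    LEFT JOIN table2 AS b ON a.id = b.id
    WHERE a.value = 11
    ORDER BY a.id
""").fetchall()

print(on_rows)
print(where_rows)
```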

PostgreSQL Table Overlap Count

I am using postgresql.
I have a table that looks like this
| id1 | id2 |
------------------------------------
| 1 | 6 |
| 1 | 12 |
| 2 | 6 |
| 3 | 1 |
| 3 | 2 |
| 2 | 2 |
I am trying to design a query that given for example: id1=1, it will return all id1's with their overlap in id2 in relation to the given id1. Do not include the given id1 in the results.
For example, if it were given id1=1, the result should be:
| id1 | num_occurences |
------------------------------------
| 2 | 1 |
| 3 | 0 |
An id1 of 2 would return 1 because id1=1 and id1=2 have only id2=6 in common. id1 of 3 returns 0 because there is no overlap in occurrences.
I think I might want to use an INNER JOIN but I am not sure.
Any suggestions?
Since you also want zero results, you could use a LEFT JOIN to check the condition:
SELECT a.id1, COUNT(b.id1) num_occurences
FROM mytable a
LEFT JOIN mytable b ON a.id2 = b.id2 AND b.id1 = [id]
WHERE a.id1 <> [id]
GROUP BY a.id1
...where in your case, [id]=1.
For each row in "a" with an id1 <> 1, it checks whether there is a row in "b" with id1 = 1 and the same id2. It then groups by a.id1 and counts the matches, so an id1 with no overlap still appears with a count of 0.
An SQLfiddle to test with.
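A short sketch of the LEFT JOIN query against the question's sample rows. It assumes SQLite through Python's standard sqlite3 module rather than PostgreSQL, and passes the given id1 as a bound parameter; the variable name given_id is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id1 INTEGER, id2 INTEGER);
    INSERT INTO mytable VALUES (1, 6), (1, 12), (2, 6), (3, 1), (3, 2), (2, 2);
""")

given_id = 1  # hypothetical name for the id1 being compared against

# COUNT(b.id1) counts only matched rows, because unmatched rows
# from the LEFT JOIN carry NULL in every "b" column.
overlap = conn.execute("""
    SELECT a.id1, COUNT(b.id1) AS num_occurences
    FROM mytable AS a
    LEFT JOIN mytable AS b ON a.id2 = b.id2 AND b.id1 = ?
    WHERE a.id1 <> ?
    GROUP BY a.id1
    ORDER BY a.id1
""", (given_id, given_id)).fetchall()

print(overlap)
```

Using COUNT(*) instead would count the unmatched rows too and report 1 where 0 is expected.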
SELECT id1, SUM( CASE
WHEN id1=id2 THEN 1
ELSE 0
END )
AS num_occurences
FROM table
GROUP by id1
Not a single JOIN was given that day.

Oracle SQL - Making a one to many join one to one based on logic

Sorry for the broad title, I had a hard time coming up with a brief way of describing what I am looking to do. I have two tables (examples below) that I want to join but under a certain condition.
The main table has a field called "DateVal", the second table has a field called "Day". After joining on field "JoinField" I only want to keep rows where the day value in "DateVal" is less than the value of "Day". However, if this criterion is met for multiple values of "Day" I only want it to keep the first instance.
In the second table below, for JoinField "A" there are three rows: the first should only be returned when the day of the month is between 1-10, the second only when the day of the month is between 11-20, and the last when it is between 21-31.
A left or inner join will bring back all values, the only way I can think of to get around this is to do a complete join and only return for min("Day"). Can anyone think of a more efficient way?
Thanks in advance.
Table 1
-------------------------------
| ID | JoinField | DateVal |
-------------------------------
| 1 | A | 01/01/2014 |
| 2 | A | 01/16/2014 |
| 3 | B | 05/20/2013 |
-------------------------------
Table 2
--------------------------------
| JoinField | Day | FieldToAdd |
--------------------------------
| A | 10 | A |
| A | 20 | AA |
| A | 31 | AAA |
| B | 15 | B |
| B | 31 | BB |
--------------------------------
Desired Results
--------------------------------------------
| ID | JoinField | DateVal | FieldToAdd |
--------------------------------------------
| 1 | A | 01/01/2014 | A |
| 2 | A | 01/16/2014 | AA |
| 3 | B | 05/20/2013 | BB |
--------------------------------------------
You can do this in a variety of ways. I think a correlated subquery is the easiest way to express it, but unfortunately, the following doesn't work in Oracle:
select t1.*,
(select t.FieldToAdd
from (select t2.*
from table2 t2
where t2.JoinField = t1.JoinField and
t2.day >= extract(day from t1.dateval)
order by t2.day
) t
where rownum = 1
)
from table1 t1;
You can instead do this with a join and window functions:
select *
from (select t1.*, t2.FieldToAdd,
row_number() over (partition by t1.id order by t2.day) as seqnum
from table1 t1 left outer join
table2 t2
on t2.JoinField = t1.JoinField and
t2.day >= extract(day from t1.dateval)
) t
where seqnum = 1;
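A minimal sketch of the window-function approach on the question's sample data. Several things here are my assumptions rather than the answer's: it runs on SQLite (3.25 or later, for window functions) through Python's standard sqlite3 module instead of Oracle, stores DateVal as ISO text, uses strftime('%d', ...) in place of EXTRACT, joins on JoinField, and keeps the smallest Day that is not less than the day of the month, which is the reading that reproduces the desired results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID INTEGER, JoinField TEXT, DateVal TEXT);
    CREATE TABLE table2 (JoinField TEXT, Day INTEGER, FieldToAdd TEXT);
    INSERT INTO table1 VALUES
        (1, 'A', '2014-01-01'), (2, 'A', '2014-01-16'), (3, 'B', '2013-05-20');
    INSERT INTO table2 VALUES
        ('A', 10, 'A'), ('A', 20, 'AA'), ('A', 31, 'AAA'), ('B', 15, 'B'), ('B', 31, 'BB');
""")

# Rank the candidate table2 rows per table1 row by Day ascending,
# then keep only the first candidate (seqnum = 1).
rows = conn.execute("""
    SELECT ID, JoinField, DateVal, FieldToAdd
    FROM (SELECT t1.ID, t1.JoinField, t1.DateVal, t2.FieldToAdd,
                 ROW_NUMBER() OVER (PARTITION BY t1.ID ORDER BY t2.Day) AS seqnum
          FROM table1 AS t1
          LEFT JOIN table2 AS t2
            ON t2.JoinField = t1.JoinField
           AND t2.Day >= CAST(strftime('%d', t1.DateVal) AS INTEGER)) AS ranked
    WHERE seqnum = 1
    ORDER BY ID
""").fetchall()

for row in rows:
    print(row)
```

Because the join is a LEFT JOIN, a table1 row with no qualifying Day would still be returned, with FieldToAdd set to NULL.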