SQL: self join using each row only once [duplicate]


This question already has an answer here:
Closed 11 years ago.
Possible Duplicate:
combinations (not permutations) from cross join in sql
I've currently got a table with the following records:
A1
A2
A3
B1
B2
C1
C2
Where the same letter denotes some criteria in common (e.g. a common value for the column 'letter'). I do a self join on the criteria as follows:
SELECT mytable.*, self.* FROM mytable INNER JOIN mytable AS self
ON (mytable.letter = self.letter and mytable.number != self.number);
This join gives something like the following:
A1 A2
A2 A1
A1 A3
A3 A1
A2 A3
A3 A2
B1 B2
B2 B1
C1 C2
C2 C1
However, I only want to include each pair once (a combination instead of a permutation).
How would I get the following:
A1 A2
A1 A3
A2 A3
B1 B2
C1 C2

Changing the JOIN condition slightly will achieve what you want.
Instead of:
ON (mytable.letter = self.letter and mytable.number != self.number)
use
ON (mytable.letter = self.letter and mytable.number < self.number)
This keeps only the pairs where self.number is greater than mytable.number, which in effect restricts the results to one ordering of each combination (A1 A2 but not A2 A1).
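To check the condition against the sample data, here is a minimal sketch using Python's built-in sqlite3 module. The column types and the in-memory database are assumptions made for illustration; the question does not say which database is in use.

```python
import sqlite3

# In-memory database with the sample rows from the question
# (letter/number column types are assumed for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (letter TEXT, number INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("A", 1), ("A", 2), ("A", 3), ("B", 1), ("B", 2), ("C", 1), ("C", 2)],
)

# Strict inequality keeps one ordering of each pair and drops self-pairs.
pairs = conn.execute(
    """
    SELECT mytable.letter || mytable.number, self.letter || self.number
    FROM mytable
    INNER JOIN mytable AS self
      ON mytable.letter = self.letter AND mytable.number < self.number
    ORDER BY mytable.letter, mytable.number, self.number
    """
).fetchall()

for left, right in pairs:
    print(left, right)
```

With the seven sample rows this should print the five pairs listed in the question.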

Related

Select entries that have non repeating values on a specific column (although other columns may have repeating or non repeating values) (SQL)

Let's say I have the following table:
A B C D
a1 b1 c1 d1
a1 b1 c1 d2
a2 b2 c3 d3
a2 b2 c4 d3
I want to filter and see all four columns for entries that have the same value in column A but different values in column C, so I get only this as a result:
A B C D
a2 b2 c3 d3
a2 b2 c4 d3
I don't really care whether the values in columns B and D are the same or different, although I would like to keep them in my result to do further analysis later.
Using DISTINCT would return all the rows, as each one differs from the others in some column, so that doesn't work for me.
I read some questions (like this one) and the answers recommended using the row_number() over(partition by...) clause, although the use they gave it doesn't quite fit my problem (I think), as it would also return the first row with a repeating value on column C.
Any ideas how this could be done?

You can use exists:
select t.*
from t
where exists (select 1
              from t t2
              where t2.a = t.a and t2.c <> t.c
             )
order by t.a;
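A quick way to try the EXISTS query on the sample table is sketched below with Python's built-in sqlite3 module. The TEXT column types are an assumption, and the extra `t.c` in the ORDER BY is only there to make the output order deterministic.

```python
import sqlite3

# Sample table from the question, loaded into an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b TEXT, c TEXT, d TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("a1", "b1", "c1", "d1"),
        ("a1", "b1", "c1", "d2"),
        ("a2", "b2", "c3", "d3"),
        ("a2", "b2", "c4", "d3"),
    ],
)

# Keep a row only if some other row shares its A value but has a different C.
exists_rows = conn.execute(
    """
    SELECT t.*
    FROM t
    WHERE EXISTS (SELECT 1
                  FROM t AS t2
                  WHERE t2.a = t.a AND t2.c <> t.c)
    ORDER BY t.a, t.c
    """
).fetchall()

for row in exists_rows:
    print(row)
```

Only the two a2 rows should come back, since both a1 rows carry the same C value.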

You could use a self join
select t1.*
from t t1
join t t2 on t1.a=t2.a and t1.c<>t2.c
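One thing to watch with the join version: a row is returned once per matching partner, so a group with three or more distinct C values produces repeated rows. The sketch below (Python's sqlite3, with a fifth row `a2 b2 c5 d3` that is not in the question and is added only to show the effect) compares the plain join with a SELECT DISTINCT variant. Note that DISTINCT would also collapse rows that are genuinely identical in the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b TEXT, c TEXT, d TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("a1", "b1", "c1", "d1"),
        ("a1", "b1", "c1", "d2"),
        ("a2", "b2", "c3", "d3"),
        ("a2", "b2", "c4", "d3"),
        ("a2", "b2", "c5", "d3"),  # hypothetical extra row, not in the question
    ],
)

join_sql = "FROM t AS t1 JOIN t AS t2 ON t1.a = t2.a AND t1.c <> t2.c"

# Each a2 row has two partners with a different C, so it appears twice.
plain_rows = conn.execute("SELECT t1.* " + join_sql).fetchall()

# DISTINCT removes the repeats introduced by the join.
distinct_rows = conn.execute(
    "SELECT DISTINCT t1.* " + join_sql + " ORDER BY t1.c"
).fetchall()

print(len(plain_rows), len(distinct_rows))
```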

Inconsistent results with jsonb_array_elements_text() twice in the SELECT list

Why does the behavior of the query below change when the number of elements in the array changes?
The following snippet expands two arrays on the same query and has two different behaviors:
When the two arrays have the same number of elements, one row per element is returned
When the two arrays have a different number of elements, it behaves like a CROSS JOIN
All of this executed in Postgres 9.5.2:
CREATE TABLE test(a text, b jsonb, c jsonb);
INSERT INTO test VALUES
('A', '["b1","b2"]', '["c1","c2"]'),
('B', '["b1","b2"]', '["c1","c2","c3"]');
SELECT a, jsonb_array_elements_text(b) b, jsonb_array_elements_text(c) c
FROM test;
Here is the result:
A b1 c1
A b2 c2
B b1 c1
B b2 c2
B b1 c3
B b2 c1
B b1 c2
B b2 c3
Here is what I would expect:
A b1 c1
A b1 c2
A b2 c1
A b2 c2
B b1 c1
B b2 c2
B b1 c3
B b2 c1
B b1 c2
B b2 c3

Combining multiple set-returning functions in the SELECT list is not in the SQL standard, where all set-returning elements go into the FROM list. You can do that in Postgres, but it used to exhibit surprising behavior before version 10, where it was finally sanitized.
All of this is not directly related to the datatype jsonb or the function jsonb_array_elements_text() - beyond it being a set-returning function.
If you want the Cartesian product, reliably and not depending on your version of Postgres, use CROSS JOIN LATERAL instead (requires at least Postgres 9.3):
SELECT t.a, jb.b, jc.c
FROM test t
, jsonb_array_elements_text(t.b) jb(b)
, jsonb_array_elements_text(t.c) jc(c)
ORDER BY t.a, ???; -- your desired order seems arbitrary beyond a
The comma in the FROM list (,) is basically short syntax for CROSS JOIN LATERAL here.
See:
What is the difference between LATERAL and a subquery in PostgreSQL?
Explanation for your actual question:
Why does the behavior of the query below change when the number of elements in the array changes?
What is the expected behaviour for multiple set-returning functions in SELECT clause?
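The row counts in the question fit the pre-10 rule as I understand it: the set-returning functions in the SELECT list are advanced in lockstep, each one restarting when it runs out, until all of them finish at the same time. That yields a number of rows equal to the least common multiple of the individual counts. The Python sketch below only models that rule next to a plain Cartesian product; the function names `pre10_srf_rows` and `lateral_rows` are made up for illustration and nothing here touches Postgres itself.

```python
from itertools import product
from math import gcd


def pre10_srf_rows(b, c):
    """Model of the pre-10 behavior: cycle both lists until they end together."""
    if not b or not c:
        return []  # simplification: empty inputs are not modeled in detail
    n = len(b) * len(c) // gcd(len(b), len(c))  # least common multiple
    return [(b[i % len(b)], c[i % len(c)]) for i in range(n)]


def lateral_rows(b, c):
    """Cartesian product, which is what the LATERAL form in FROM returns."""
    return list(product(b, c))


same_len = pre10_srf_rows(["b1", "b2"], ["c1", "c2"])
diff_len = pre10_srf_rows(["b1", "b2"], ["c1", "c2", "c3"])
print(same_len)  # 2 rows: lcm(2, 2) = 2
print(diff_len)  # 6 rows: lcm(2, 3) = 6, which happens to equal 2 * 3
print(lateral_rows(["b1", "b2"], ["c1", "c2"]))  # always 2 * 2 = 4 rows
```

Under this model the second case only looks like a CROSS JOIN because 2 and 3 share no common factor, so the least common multiple equals the product.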

SQL - Merge / Update & Delete from existing rows

Given the following table:
A B C D E
a1 b1 NULL NULL e1
NULL NULL c1 d1 NULL
a1 b1 c1 NULL NULL
I want to run a query that merges the existing rows (update and delete) whenever at least one of the columns has an equal value (NULL = NULL does not count), to result in the following table:
A B C D E
a1 b1 c1 d1 e1
Please note that there are no unique IDs in any of the columns as any could have NULL values.
Could you please assist?
Edited:
If any of the columns do not share the same value, I would want them to be a separate record, and this would not affect any other records.
For example, if row 3 was:
A B C D E
a1 b1 c1 NULL e2
then the desired output would be:
A B C D E
a1 b1 NULL NULL e1
a1 b1 c1 d1 e2
since row 1 already has e1 <> e2 (so it is left as a separate record), and rows 2 and 3 are merged since they share c1 as a common C value and do not have any columns with differing values (other than NULLs).
Another example:
If there was an additional row, row 4 (with the original table):
A B C D E
NULL NULL c1 d2 NULL
Then the desired output would be:
A B C D E
a1 b1 c1 d1 e1
a1 b1 c1 d2 e1

This isn't really a complete answer, but rather an answer that will help others understand the problem and hopefully help steer them in the right direction to provide the complete answer:
As I understand it, the fundamental problem with the scenario you describe is that we need something like TRANSACTION (but probably not TRANSACTION) which will somehow allow us to accomplish the following steps.
Step 1.
In the following table, take the first row and compare it to all the other rows.
A B C D E
a1 b1 NULL NULL e1
NULL NULL c1 d1 NULL
a1 b1 c1 NULL NULL
For the first row, we find a match with the third row in column [A] and then we update columns [B], [C], [D], and [E], then we delete the third row - giving us the following table:
A B C D E
a1 b1 c1 NULL e1
NULL NULL c1 d1 NULL
Step 2
We look at column [B] and find no matches.
Step 3
We look at column [C] and find a match so we update the current row (row 1). We are ignoring / not merging NULLs so we get the following table
A B C D E
a1 b1 c1 d1 e1
The problem with this is that MERGE operations do not work as described. By that I mean once a row is MERGED, we move onto the next row.
This is a fairly complicated process so I personally am not sure how to tackle it, just wanted to help explain the problem in a more "developer-friendly" way.
I would expect the answer to look something like this (I don't think you can actually use isnull in the ON clause):
MERGE Table2 AS target
USING Table1 as source
ON (isnull(target.[A], -1) = isnull(source.[A], -2) OR isnull(target.[B], -1) = isnull(source.[B], -2), etc.... )
WHEN MATCHED
THEN UPDATE
SET target.[A] = source.[A]
and so on...
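To make the row-by-row procedure from Steps 1 to 3 concrete, here is a rough sketch outside SQL, in Python. It is only one possible reading of the rules: two rows are merged when they share at least one equal non-NULL value and have no conflicting non-NULL values, and the merge is repeated until nothing changes. The helper names (`can_merge`, `merge`, `merge_rows`) are hypothetical, and this greedy version does not reproduce the "row 4" example from the question, where one row would have to be merged into two different results.

```python
def can_merge(r1, r2):
    """True if the rows share an equal non-NULL value and never conflict."""
    shared = False
    for x, y in zip(r1, r2):
        if x is None or y is None:
            continue  # NULLs neither match nor conflict
        if x != y:
            return False  # differing non-NULL values: keep rows separate
        shared = True
    return shared


def merge(r1, r2):
    """Fill the NULLs of r1 with the values from r2."""
    return tuple(x if x is not None else y for x, y in zip(r1, r2))


def merge_rows(rows):
    """Greedily merge compatible pairs until no pair can be merged."""
    rows = list(rows)
    changed = True
    while changed:
        changed = False
        for i in range(len(rows)):
            for j in range(i + 1, len(rows)):
                if can_merge(rows[i], rows[j]):
                    rows[i] = merge(rows[i], rows[j])
                    del rows[j]
                    changed = True
                    break
            if changed:
                break
    return rows


table = [
    ("a1", "b1", None, None, "e1"),
    (None, None, "c1", "d1", None),
    ("a1", "b1", "c1", None, None),
]
print(merge_rows(table))
```

The point of the loop is the one made above: after a merge, the updated row has to be compared against the remaining rows again, which a single MERGE pass does not do.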

How to reduce number of joins?

I have to left join two tables. From the right table I need some columns on a join condition of three columns, some columns on a join condition that shares two of those columns but differs in the third (again three columns), and some columns on a join condition of a single column (one of the columns that differs between the previous joins).
Let me explain by example
Table A has columns a1, a2, a3, a4, a5
Table B has columns b1, b2, b3, b4, b5, b6, b7
Now I need
a1,
a2,
a3,a4,
b1 when a2=b2, a3=b3, a4=b4,
b6 when a2=b5, a3=b3, a4=b4,
b7 when a2=b2
Now, how can I achieve this without joining the tables multiple times, or with as few joins as possible, using a CASE WHEN THEN structure or anything else? The queries are for Hive, but most SQL features are supported. Hive has different optimization techniques, but SQL guys are welcome.
Thanks in advance for your effort.

I'm pretty sure hive supports conditional aggregation. If I'm understanding your question correctly, you should be able to use that with a cross join:
select a1, a2, a3, a4,
max(case when a2 = b2 and a3 = b3 and a4 = b4 then b1 end) b1,
max(case when a2 = b5 and a3 = b3 and a4 = b4 then b6 end) b6,
max(case when a2 = b2 then b7 end) b7
from a cross join b
group by a1, a2, a3, a4
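Here is a small sketch of that query run through Python's sqlite3 module rather than Hive, so treat it as a check of the logic only. The sample rows are invented for illustration. Two caveats I would keep in mind: max() picks a single value when several B rows satisfy a condition, and the GROUP BY assumes that a1 to a4 identify a row of A.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (a1, a2, a3, a4, a5)")
conn.execute("CREATE TABLE b (b1, b2, b3, b4, b5, b6, b7)")
# Invented sample data: the first A row has matches, the second has none.
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?, ?, ?)",
    [(1, "x", "p", "q", "unused"), (2, "m", "n", "o", "unused")],
)
conn.executemany(
    "INSERT INTO b VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("B1v", "x", "p", "q", "zz", "no6", "B7v"),   # matches via b2, b3, b4
        ("no1", "yy", "p", "q", "x", "B6v", "no7"),   # matches via b5, b3, b4
    ],
)

# One pass over the cross join; each CASE picks its column when its
# own condition holds, and MAX collapses the pairs per A row.
agg_rows = conn.execute(
    """
    SELECT a1, a2, a3, a4,
           MAX(CASE WHEN a2 = b2 AND a3 = b3 AND a4 = b4 THEN b1 END) AS b1,
           MAX(CASE WHEN a2 = b5 AND a3 = b3 AND a4 = b4 THEN b6 END) AS b6,
           MAX(CASE WHEN a2 = b2 THEN b7 END) AS b7
    FROM a CROSS JOIN b
    GROUP BY a1, a2, a3, a4
    ORDER BY a1
    """
).fetchall()

for row in agg_rows:
    print(row)
```

The cross join touches every A/B pair, so on large tables this may well cost more than the joins it replaces.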

You want to do multiple joins:
select a.a1, a.a2, a.a3, a.a4, b1.b1, b2.b6, b3.b7
from a join
b b1
on a.a2 = b1.b2 and a.a3 = b1.b3 and a.a4 = b1.b4 join
b b2
on a.a2 = b2.b5 and a.a3 = b2.b3 and a.a4 = b2.b4 join
b b3
on a.a2 = b3.b2;
You may need left join if some conditions do not match.
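For comparison, a sketch of the three-join form with LEFT JOIN, again run through Python's sqlite3 module instead of Hive and on invented sample rows. The aliases x, y and z are my own choice to avoid confusing them with the column names. Unlike an aggregate, each join multiplies rows when more than one B row matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (a1, a2, a3, a4, a5)")
conn.execute("CREATE TABLE b (b1, b2, b3, b4, b5, b6, b7)")
# Invented sample data: the first A row has matches, the second has none.
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?, ?, ?)",
    [(1, "x", "p", "q", "unused"), (2, "m", "n", "o", "unused")],
)
conn.executemany(
    "INSERT INTO b VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("B1v", "x", "p", "q", "zz", "no6", "B7v"),
        ("no1", "yy", "p", "q", "x", "B6v", "no7"),
    ],
)

# One alias of b per join condition; LEFT JOIN keeps A rows without a match.
join_rows = conn.execute(
    """
    SELECT a.a1, a.a2, a.a3, a.a4, x.b1, y.b6, z.b7
    FROM a
    LEFT JOIN b AS x ON a.a2 = x.b2 AND a.a3 = x.b3 AND a.a4 = x.b4
    LEFT JOIN b AS y ON a.a2 = y.b5 AND a.a3 = y.b3 AND a.a4 = y.b4
    LEFT JOIN b AS z ON a.a2 = z.b2
    ORDER BY a.a1
    """
).fetchall()

for row in join_rows:
    print(row)
```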

Finding Permutations of columns in SQL

I have a reference data table having columns as codes and values.
For example, there are three code types: A, B and C.
The table is as below:
Code Value
---------------------
A1 a_one
A2 a_two
B1 b_one
B2 b_two
B3 b_three
C1 c_one
C2 c_two
C3 c_three
C4 c_four
---------------------
I have a requirement where the input will be code types and output should be all permutations between the input code types.
For example, if the input code types are A and C, the output of my SQL should be:
col_1 col_2
---------------------
A1 C1
A1 C2
A1 C3
A1 C4
A2 C1
A2 C2
A2 C3
A2 C4
---------------------
Similarly if the input code types is A, B, C, the output of the sql will have three columns with all the permutations between A, B, C viz. A1 B1 C1 to A2 B3 C4.
I have no idea how to start on this. So any hints will be useful.
Thanks for reading!

If I understand your question correctly, this is one of those rare cases where a CROSS JOIN is actually what you want. A CROSS JOIN will give you the Cartesian product of two sets, which means all possible combinations between the values in those sets.
Example:
Table A with column 1 contains values 'a' and 'b'
Table B with column 2 contains values 'c' and 'd'
The following CROSS JOIN query (note there is no 'join condition' specified, on purpose):
SELECT *
FROM A
CROSS JOIN B
will return the following result:
1 2
--------
a c
a d
b c
b d
I created an SQL Fiddle to show you a possible solution. You can tweak it a bit to see if this is what you need. (Note it's an Oracle fiddle, as there is no DB2 option.)
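Applied to the reference table from the question, the idea could look like the sketch below. It uses Python's sqlite3 module rather than DB2; the table name `ref_data` and the LIKE-prefix filter on the code type are assumptions, since the question does not show how the code type is stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ref_data (code TEXT, value TEXT)")  # assumed name
conn.executemany(
    "INSERT INTO ref_data VALUES (?, ?)",
    [
        ("A1", "a_one"), ("A2", "a_two"),
        ("B1", "b_one"), ("B2", "b_two"), ("B3", "b_three"),
        ("C1", "c_one"), ("C2", "c_two"), ("C3", "c_three"), ("C4", "c_four"),
    ],
)

# Cross join the table with itself and keep one code type on each side.
combos = conn.execute(
    """
    SELECT x.code AS col_1, y.code AS col_2
    FROM ref_data AS x
    CROSS JOIN ref_data AS y
    WHERE x.code LIKE ? AND y.code LIKE ?
    ORDER BY x.code, y.code
    """,
    ("A%", "C%"),
).fetchall()

for col_1, col_2 in combos:
    print(col_1, col_2)
```

For three input code types, a third alias of the same table with its own filter would be cross joined in the same way.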
