SQL - extracting all pairs only once - sql

Problem statement:
a table has N columns, K of which are used in a criterion that determines pairs of rows
such a criterion can simply be that columns c_1, c_2, ..., c_k are equal for the two different rows of a pair (the criterion itself is not relevant, only the fact that it must be used)
the requirement is to extract all potential pairs, but only once. This means that if a row has more than one potential partner row under the above criterion, then only one pair must be extracted for it
Simple example:
Input table:
A | B | C
x | y | z
w | y | z
u | y | z
u | v | z
v | v | z
Criterion: B and C columns must be the same for two rows to be part of a pair.
Output:
x | y | z
w | y | z
u | v | z
v | v | z
What hints do you have for solving the problem in pure SQL (or in the Oracle dialect, if specific features help)?

If you can use analytic (window) functions:
CREATE TABLE TT1 (A VARCHAR(4), B VARCHAR(4), C VARCHAR(4));
INSERT INTO TT1 VALUES ('x','y','z');
INSERT INTO TT1 VALUES ('w','y','z');
INSERT INTO TT1 VALUES ('u','y','z');
INSERT INTO TT1 VALUES ('u','v','z');
INSERT INTO TT1 VALUES ('v','v','z');
INSERT INTO TT1 VALUES ('k','w','z');
SELECT A.A, A.B, A.C
FROM
(SELECT T.*, ROW_NUMBER() OVER (PARTITION BY B,C ORDER BY A DESC) RN, COUNT(*) OVER (PARTITION BY B,C) RC
FROM TT1 T) A
WHERE A.RN <= 2 AND A.RC > 1;
Output:
A B C
---- ---- ----
v v z
u v z
x y z
w y z
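To try the ROW_NUMBER()/COUNT() query above without an Oracle instance, here is a minimal sketch that runs it in an in-memory SQLite database from Python. It assumes SQLite 3.25 or newer (for window functions); the function name first_pair_per_group, the alias numbered and the final ORDER BY are illustrative additions, not part of the original answer.

```python
import sqlite3

SAMPLE = [("x", "y", "z"), ("w", "y", "z"), ("u", "y", "z"),
          ("u", "v", "z"), ("v", "v", "z"), ("k", "w", "z")]

def first_pair_per_group(rows):
    """Return the two rows with the largest A from each (B, C) group of size >= 2."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE TT1 (A TEXT, B TEXT, C TEXT)")
    con.executemany("INSERT INTO TT1 VALUES (?, ?, ?)", rows)
    query = """
        SELECT A, B, C
        FROM (SELECT T.*,
                     ROW_NUMBER() OVER (PARTITION BY B, C ORDER BY A DESC) AS RN,
                     COUNT(*)     OVER (PARTITION BY B, C)                 AS RC
              FROM TT1 T) AS numbered
        WHERE RN <= 2 AND RC > 1      -- RC > 1 drops groups that cannot form a pair
        ORDER BY B, C, A DESC         -- added only to make the output order stable
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

print(first_pair_per_group(SAMPLE))
```

Note that this query keeps exactly one pair per (B, C) group, so a group of four matching rows still yields only two rows.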

Use the COUNT() analytic function partitioning on those rows you want to match as pairs:
SELECT A, B, C
FROM (
SELECT t.*,
COUNT(*) OVER (
PARTITION BY B, C
ORDER BY A
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS current_rn,
COUNT(*) OVER (
PARTITION BY B, C
ORDER BY A
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 FOLLOWING
) AS next_rn
FROM table_name t
)
WHERE MOD( current_rn, 2 ) = 0
OR MOD( next_rn, 2 ) = 0;
Output:
A B C
- - -
u y z
w y z
u v z
v v z
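The parity trick above can be checked the same way. The sketch below is an assumption-laden illustration: it runs the query in SQLite (3.25+) from Python, replaces Oracle's MOD() with the % operator, and adds an alias (counted) and an ORDER BY that the original does not have. The function name all_disjoint_pairs is hypothetical.

```python
import sqlite3

QUESTION_ROWS = [("x", "y", "z"), ("w", "y", "z"), ("u", "y", "z"),
                 ("u", "v", "z"), ("v", "v", "z")]

def all_disjoint_pairs(rows):
    """Keep rows 1-2, 3-4, ... of each (B, C) group; drop a trailing unpaired row."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table_name (A TEXT, B TEXT, C TEXT)")
    con.executemany("INSERT INTO table_name VALUES (?, ?, ?)", rows)
    query = """
        SELECT A, B, C
        FROM (SELECT t.*,
                     COUNT(*) OVER (PARTITION BY B, C ORDER BY A
                                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS current_rn,
                     COUNT(*) OVER (PARTITION BY B, C ORDER BY A
                                    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 FOLLOWING) AS next_rn
              FROM table_name t) AS counted
        WHERE current_rn % 2 = 0   -- second row of a pair
           OR next_rn % 2 = 0      -- first row of a pair that has a partner after it
        ORDER BY B, C, A
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

print(all_disjoint_pairs(QUESTION_ROWS))
```

Unlike a ROW_NUMBER() <= 2 filter, this keeps every complete pair in a group: five matching rows produce two pairs and leave the fifth row out.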

Related

Query to select rows with minimum distinct value of a column

I need to select the row with the minimum value of column B for each value of column A, but B must be distinct from the values already selected for earlier values of A. So the order of A matters. Also, if all B values are used up and none is left, then the later values of A should be NULL or not appear in the result.
Both A and B are numerical (or time stamp).
example:
A | B |
----+---+
1 | 3 |
1 | 5 |
1 | 6 |
2 | 3 |
2 | 5 |
9 | 3 |
9 | 5 |
So the desired result is:
A | B |
----+---+
1 | 3 |
2 | 5 |
select A, min(B) group by A obviously doesn't work because I don't want B to be repeated. Distinct also doesn't work because the rows are already distinct. I couldn't really find any question similar to this anywhere.
The actual data I am working with is the database of timeseries on redshift so A and B are timestamps. CTE's would be specifically welcome.
At first I thought this could be solved with ROW_NUMBER() OVER (PARTITION BY A ORDER BY B), but there is a problem: the values in B must not be repeated.
At the moment the only thing that comes to mind is to use temporary tables. I know this is not the best way, but you can probably improve on it:
DECLARE @Tabla1 TABLE(A INT)
DECLARE @Tabla2 TABLE(B INT)
DECLARE @Tabla3 TABLE(A INT, B INT)
DECLARE @A INT, @B INT;
-- candidate values of A, processed in ascending order
INSERT INTO @Tabla1 SELECT DISTINCT A FROM PRUEBA
WHILE (SELECT COUNT(*) FROM @Tabla1) > 0
BEGIN
SET @A = (SELECT TOP 1 A FROM @Tabla1 ORDER BY A);
-- smallest B for this A that has not been used yet (NULL if none is left)
SET @B = (SELECT MIN(B) FROM PRUEBA WHERE A = @A AND B NOT IN (SELECT B FROM @Tabla2));
-- skip NULL so that NOT IN keeps working on later iterations
IF @B IS NOT NULL INSERT INTO @Tabla2 VALUES (@B)
DELETE FROM @Tabla1 WHERE A = @A
INSERT INTO @Tabla3 SELECT A, B FROM PRUEBA WHERE A = @A AND B = @B
END
SELECT * FROM @Tabla3
Maybe you can use a cursor instead, but you would have to work out which is more computationally expensive, the cursor or the temporary tables.
This is basically a "find the diagonal" problem. You need to know the rank of B within A and the rank of A within all. I believe this works for the data given:
select A, B from (
select row_number() over (partition by A order by B) as RN,
dense_rank() over (order by A) as DR,
A, B
from <table> )
where RN = DR;
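As a quick check of the "diagonal" idea on the question's data, here is a minimal sketch that runs the same RN = DR filter in SQLite (3.25+) from Python. The table name t, the alias ranked, the ORDER BY and the function name diagonal are assumptions made for illustration; the original leaves the table as a placeholder.

```python
import sqlite3

DATA = [(1, 3), (1, 5), (1, 6), (2, 3), (2, 5), (9, 3), (9, 5)]

def diagonal(rows):
    """Pick, for the n-th distinct A, its n-th smallest B (if it has one)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (A INTEGER, B INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    query = """
        SELECT A, B
        FROM (SELECT ROW_NUMBER() OVER (PARTITION BY A ORDER BY B) AS RN,  -- rank of B within A
                     DENSE_RANK() OVER (ORDER BY A)               AS DR,  -- rank of A overall
                     A, B
              FROM t) AS ranked
        WHERE RN = DR
        ORDER BY A
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

print(diagonal(DATA))
```

A = 9 is the third distinct value of A but has only two B values, so it produces no row, which matches the desired result in the question.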
For more complex cases this solution will get more complex.
Addendum:
Because I know it will be asked and this is an interesting problem, I worked out what such a more complex solution would look like:
select min(A) as A, B from (
select decode(A <> nvl(min(A) over (order by DRB, DRA rows between unbounded preceding and 1 preceding),-1), true, 'good', 'no good') as Y,
A, B from (
select dense_rank() over (partition by B order by A) as DRA,
dense_rank() over ( order by B) as DRB,
A, B from <table>
)
where DRA <= DRB
)
where Y = 'good'
group by B
order by A, B;

SQL select only one row (with all attribute values) of each different value

In SQL Developer, I want to select only one row from my table (with all attribute values) for each different value.
It is not important which row is selected for each type; what matters is that only one row is selected per type.
for example i have this table:
| A | B | C |
X SS G
Y SB T
Z SB T
Note that in my table there aren't numbers.
The result i want is:
| A | B | C |
X SS G
Z SB T
But this is also correct:
| A | B | C |
X SS G
Y SB T
Thank you!
It isn't very clear what you want. You could get your result just with
Select distinct top 2 * from mytable
You want only 1 row from the rows with c = 'T', right?
select a, b, c from tablename where c <> 'T'
union all
select a, b, c from (
select a, b, c from tablename where c = 'T'
) where rownum <= 1
You can use the below if the value of A is not important
SELECT max(A) as A,B,C FROM your_table GROUP BY B,C
Thank you to all for your answers, but I solved in this way:
SELECT MAX(A), B, MAX(C)
FROM MY_TABLE
GROUP BY B;
With this query I can extract all values for each different type of B.
I hope this will be helpful for someone.
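For reference, the GROUP BY B, C suggestion from this thread can be tried on the question's three rows with the sketch below, which uses an in-memory SQLite database from Python. The table name my_table is taken from the thread; the function name one_row_per_type and the ORDER BY are illustrative assumptions.

```python
import sqlite3

ROWS = [("X", "SS", "G"), ("Y", "SB", "T"), ("Z", "SB", "T")]

def one_row_per_type(rows):
    """Collapse rows that share (B, C) into a single row, keeping the largest A."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE my_table (A TEXT, B TEXT, C TEXT)")
    con.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)
    query = """
        SELECT MAX(A) AS A, B, C
        FROM my_table
        GROUP BY B, C
        ORDER BY B, C
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

print(one_row_per_type(ROWS))
```

Grouping by every column except A returns rows that really exist in the table. With MAX(A), B, MAX(C) grouped by B alone, the two MAX values are computed independently, so they can come from different rows when C varies within a B group.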

SQL Transposing columns to rows

I'm attempting to transpose a column of text and values to row headers. I've researched the PIVOT and UNPIVOT function but this function relies on aggregation from what I've gathered. Below is what I'm interested in achieving.
Source Table Schema:
[ID] [Category] [TextName]
1 A u
1 B v
1 C w
2 A x
2 B y
2 C z
Resulting transpose:
[ID] [A] [B] [C]
1 u v w
2 x y z
Is this possible?
SELECT id,
MIN( CASE WHEN Category = 'A' THEN TextName END ) AS A,
MIN( CASE WHEN Category = 'B' THEN TextName END ) AS B,
MIN( CASE WHEN Category = 'C' THEN TextName END ) AS C
FROM Table
GROUP BY id;
This is still a kind of aggregation, even though we have a single value per cell (row-column combination).
MIN/MAX will give you the desired values, since every basic type, including strings, has a definition of MIN/MAX.
select *
from t pivot (min([TextName]) for [Category] in (A,B,C)) p
+----+---+---+---+
| ID | A | B | C |
+----+---+---+---+
| 1 | u | v | w |
+----+---+---+---+
| 2 | x | y | z |
+----+---+---+---+
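The conditional-aggregation form of the transpose is portable beyond SQL Server, which the following sketch illustrates by running it in SQLite from Python. The table name src and the function name pivot_categories are hypothetical; the column names come from the question.

```python
import sqlite3

SOURCE = [(1, "A", "u"), (1, "B", "v"), (1, "C", "w"),
          (2, "A", "x"), (2, "B", "y"), (2, "C", "z")]

def pivot_categories(rows):
    """Turn one (ID, Category, TextName) row per cell into one row per ID."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE src (ID INTEGER, Category TEXT, TextName TEXT)")
    con.executemany("INSERT INTO src VALUES (?, ?, ?)", rows)
    query = """
        SELECT ID,
               MIN(CASE WHEN Category = 'A' THEN TextName END) AS A,
               MIN(CASE WHEN Category = 'B' THEN TextName END) AS B,
               MIN(CASE WHEN Category = 'C' THEN TextName END) AS C
        FROM src
        GROUP BY ID
        ORDER BY ID
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

print(pivot_categories(SOURCE))
```

Each CASE yields NULL for non-matching categories, and MIN ignores NULLs, so the single matching TextName per cell is what survives the aggregation.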

Get all values that are the same across records in SQL

I need to get a list of all values that are the same across all records using SQL.
SELECT
Record,
Value
FROM Record r
INNER JOIN Value v
ON v.RecordId = r.RecordId
Record | Value
1 | a
1 | b
1 | c
2 | a
2 | b
3 | a
3 | b
3 | c
3 | d
I need the results to be:
Value
a
b
You don't need a JOIN for your query. I think this is equivalent:
SELECT v.RecordId, v.Value
FROM Value v;
From here, you want to find values that are in all records:
select v.value
from value v
group by v.value
having count(recordid) = (select count(distinct recordid) from value);
This finds values that appear in every record that has at least one value. If you want to compare against all records, including records without any value, then:
select v.value
from value v
group by v.value
having count(recordid) = (select count(*) from record r);
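The GROUP BY / HAVING comparison above can be exercised on the question's data with this minimal sketch in Python and SQLite. The table name record_value and the function name values_in_all_records are assumptions; the sketch also uses COUNT(DISTINCT ...) on both sides, which is a small deviation from the answer meant to guard against duplicate (RecordId, Value) rows.

```python
import sqlite3

PAIRS = [(1, "a"), (1, "b"), (1, "c"),
         (2, "a"), (2, "b"),
         (3, "a"), (3, "b"), (3, "c"), (3, "d")]

def values_in_all_records(pairs):
    """Return the values that occur in every record present in the table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE record_value (RecordId INTEGER, Value TEXT)")
    con.executemany("INSERT INTO record_value VALUES (?, ?)", pairs)
    query = """
        SELECT Value
        FROM record_value
        GROUP BY Value
        HAVING COUNT(DISTINCT RecordId) =
               (SELECT COUNT(DISTINCT RecordId) FROM record_value)
        ORDER BY Value
    """
    result = [row[0] for row in con.execute(query)]
    con.close()
    return result

print(values_in_all_records(PAIRS))
```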

SQL ranking query to compute ranks and median in sub groups

I want to compute the Median of y in sub groups of this simple xy_table:
 x  | y   --groups-->   gid | x   | y   --medians-->   gid | x   | y
-------                 -------------                  -------------
0.1 | 4                 0.0 | 0.1 | 4                  0.0 | 0.1 | 4
0.2 | 3                 0.0 | 0.2 | 3                      |     |
0.7 | 5                 1.0 | 0.7 | 5                  1.0 | 0.7 | 5
1.5 | 1                 2.0 | 1.5 | 1                      |     |
1.9 | 6                 2.0 | 1.9 | 6                      |     |
2.1 | 5                 2.0 | 2.1 | 5                  2.0 | 2.1 | 5
2.7 | 1                 3.0 | 2.7 | 1                  3.0 | 2.7 | 1
In this example every x is unique and the table is already sorted by x.
I now want to GROUP BY round(x) and get the tuple that holds the median of y in each group.
I can already compute the median for the whole table with this ranking query:
SELECT a.x, a.y FROM xy_table a,xy_table b
WHERE a.y >= b.y
GROUP BY a.x, a.y
HAVING count(*) = (SELECT round((count(*)+1)/2) FROM xy_table)
Output: 0.1, 4.0
But I did not yet succeed writing a query to compute the median for sub groups.
Attention: I do not have a median() aggregation function available. Please also do not propose solutions with special PARTITION, RANK, or QUANTILE statements (as found in similar but too vendor specific SO questions). I need plain SQL (i.e., compatible to SQLite without median() function)
Edit: I was actually looking for the Medoid and not the Median.
I suggest doing the computing in your programming language:
for each group:
    for each record_in_group:
        append y to array
    median of array
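The pseudocode above could look like the following in Python. This is a sketch under a few assumptions: groups are formed by rounding x half up (to mimic SQL's round() for positive values), and for an even number of rows the larger of the two middle values is taken, as in the question's expected output. The function name upper_median_per_group is hypothetical.

```python
import math
from collections import defaultdict

XY = [(0.1, 4), (0.2, 3), (0.7, 5), (1.5, 1), (1.9, 6), (2.1, 5), (2.7, 1)]

def upper_median_per_group(points):
    """Group y values by round(x) and return the (upper) median y of each group."""
    groups = defaultdict(list)
    for x, y in points:
        gid = math.floor(x + 0.5)   # round half up; Python's round() rounds half to even
        groups[gid].append(y)
    medians = {}
    for gid, ys in sorted(groups.items()):
        ys.sort()
        medians[gid] = ys[len(ys) // 2]   # middle element, upper one for even sizes
    return medians

print(upper_median_per_group(XY))
```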
But if you are stuck with SQLite, you can order each group by y and select the records in the middle like this http://sqlfiddle.com/#!5/d4c68/55/0:
UPDATE: only the bigger "median" value is important for an even number of rows, so no avg() is needed:
select groups.gid,
ids.y median
from (
-- get middle row number in each group (bigger number if even nr. of rows)
-- note the integer divisions and modulo operator
select round(x) gid,
count(*) / 2 + 1 mid_row_right
from xy_table
group by round(x)
) groups
join (
-- for each record get equivalent of
-- row_number() over(partition by gid order by y)
select round(a.x) gid,
a.x,
a.y,
count(*) rownr_by_y
from xy_table a
left join xy_table b
on round(a.x) = round (b.x)
and a.y >= b.y
group by a.x
) ids on ids.gid = groups.gid
where ids.rownr_by_y = groups.mid_row_right
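To see the self-join query above at work on the question's table, here is a minimal sketch that runs it with Python's sqlite3 module. The column types (REAL), the alias grp (used instead of groups to stay clear of SQLite keywords), the ORDER BY and the function name group_medians are assumptions added for the example.

```python
import sqlite3

XY_ROWS = [(0.1, 4), (0.2, 3), (0.7, 5), (1.5, 1), (1.9, 6), (2.1, 5), (2.7, 1)]

def group_medians(rows):
    """Return (gid, median y) per round(x) group using plain SQL without window functions."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE xy_table (x REAL, y REAL)")
    con.executemany("INSERT INTO xy_table VALUES (?, ?)", rows)
    query = """
        SELECT grp.gid, ids.y AS median
        FROM (SELECT round(x) AS gid,
                     count(*) / 2 + 1 AS mid_row_right   -- integer division
              FROM xy_table
              GROUP BY round(x)) AS grp
        JOIN (SELECT round(a.x) AS gid, a.x, a.y,
                     count(*) AS rownr_by_y              -- rows in the group with y <= a.y
              FROM xy_table a
              LEFT JOIN xy_table b
                ON round(a.x) = round(b.x) AND a.y >= b.y
              GROUP BY a.x) AS ids
          ON ids.gid = grp.gid
        WHERE ids.rownr_by_y = grp.mid_row_right
        ORDER BY grp.gid
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

print(group_medians(XY_ROWS))
```

The row count stands in for row_number() only while y values are distinct within a group; with ties, several rows share the same count and the middle position may be skipped.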
OK, this relies on a temporary table:
create temporary table tmp (x float, y float);
insert into tmp
select * from xy_table order by round(x), y
But you could potentially create this for a range of data you were interested in. Another way would be to ensure the xy_table had this sort order, instead of just ordering on x. The reason for this is SQLite's lack of row numbering capability.
Then:
select tmp4.x as gid, t.* from (
select tmp1.x,
round((tmp2.y + coalesce(tmp3.y, tmp2.y)) / 2) as y -- <- for larger of the two, change to: (case when tmp2.y > coalesce(tmp3.y, 0) then tmp2.y else tmp3.y end)
from (
select round(x) as x, min(rowid) + (count(*) / 2) as id1,
(case when count(*) % 2 = 0 then min(rowid) + (count(*) / 2) - 1
else 0 end) as id2
from (
select *, rowid from tmp
) t
group by round(x)
) tmp1
join tmp tmp2 on tmp1.id1 = tmp2.rowid
left join tmp tmp3 on tmp1.id2 = tmp3.rowid
) tmp4
join xy_table t on tmp4.x = round(t.x) and tmp4.y = t.y
If you wanted to treat the median as the larger of the two middle values, which doesn't fit the definition as @Aprillion already pointed out, then you would simply take the larger of the two y values, instead of their average, on the third line of the query.