Unpivot in PostgreSQL - sql

How can I unpivot in PostgreSQL without using UNION? I have more than 100 columns and am looking for a neat way to do it.
Given table:
id c1 c2 c3
1 X Y Z
2 A B C
3 Y C Z
Desired table:
id col
1 X
1 Y
1 Z
2 A
2 B
2 C
3 Y
3 C
3 Z

Use jsonb functions:
select id, value as col
from my_table
cross join jsonb_each_text(to_jsonb(my_table))
where key <> 'id';
id | col
----+-------
1 | X
1 | Y
1 | Z
2 | A
2 | B
2 | C
3 | Y
3 | C
3 | Z
(9 rows)
Db<>Fiddle.
In Postgres 9.3 or 9.4 use to_json() and json_each_text().
In versions 9.1 or 9.2 install hstore:
create extension if not exists hstore;
select id, value as col
from my_table
cross join each(hstore(my_table))
where key <> 'id';
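To make the idea behind these queries concrete, here is a minimal Python sketch (not PostgreSQL code) of the same steps: each row is treated as a set of key/value pairs, the id key is skipped, and every remaining value becomes its own output row. The function name `unpivot` and the in-memory sample rows are illustrative assumptions, not part of any library.

```python
def unpivot(rows, id_key="id"):
    """Turn each wide row (a dict) into one (id, value) pair per non-id column."""
    result = []
    for row in rows:
        for key, value in row.items():
            if key != id_key:  # plays the role of: where key <> 'id'
                result.append((row[id_key], value))
    return result


# Sample data copied from the question's table.
my_table = [
    {"id": 1, "c1": "X", "c2": "Y", "c3": "Z"},
    {"id": 2, "c1": "A", "c2": "B", "c3": "C"},
    {"id": 3, "c1": "Y", "c2": "C", "c3": "Z"},
]

for row_id, col in unpivot(my_table):
    print(row_id, col)
```

As in the SQL versions, the column names never have to be listed, which is what makes the approach practical for 100+ columns.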

Related

SQL cross match IDs to create new cross-platform ID -> how to optimize

I have a Redshift table with two columns that shows which IDs are connected, that is, which belong to the same person. I would like to create a mapping (an extra column) with a unique person ID using SQL.
The problem is similar to this one: SQL: creating unique id for item with several ids
However, in my case the IDs in the two columns are of different kinds, and therefore the suggested joining solution (t1.epid = t2.pid, etc.) will not work.
In below example there are 4 individual persons using 9 IDs of type 1 and 10 IDs of type 2.
ID_type1 | ID_type2
---------+--------
1 | A
1 | B
2 | C
3 | C
4 | D
4 | E
5 | E
6 | F
7 | G
7 | H
7 | I
8 | I
8 | J
9 | J
9 | B
What I am looking for is an extra column with a mapping to a unique ID for the person. The difficulty is in correctly identifying the IDs related to persons like x & z, which have multiple IDs of both types. The result could look something like this:
ID_type1 | ID_type2 | ID_real
---------+---------------------
1 | A | z
1 | B | z
2 | C | y
3 | C | y
4 | D | x
4 | E | x
5 | E | x
6 | F | w
7 | G | z
7 | H | z
7 | I | z
8 | I | z
8 | J | z
9 | J | z
9 | B | z
I wrote the query below, which runs up to 4 loops and does the job for a small dataset. It struggles with larger sets, however, because the number of rows after joining grows very fast with each loop. I am stuck finding a more efficient way to do this.
WITH
T1 AS(
SELECT DISTINCT
l1.ID_type1 AS ID_type1,
r1.ID_type1 AS ID_type1_overlap
FROM crossmatch_example l1
LEFT JOIN crossmatch_example r1 USING(ID_type2)
ORDER BY 1,2
),
T2 AS(
SELECT DISTINCT
l1.ID_type1,
r1.ID_type1_overlap
FROM T1 l1
LEFT JOIN T1 r1 on l1.ID_type1_overlap = r1.ID_type1
ORDER BY 1,2
),
T3 AS(
SELECT DISTINCT
l1.ID_type1,
r1.ID_type1_overlap
FROM T2 l1
LEFT JOIN T2 r1 on l1.ID_type1_overlap = r1.ID_type1
ORDER BY 1,2
),
T4 AS(
SELECT DISTINCT
l1.ID_type1,
r1.ID_type1_overlap
FROM T3 l1
LEFT JOIN T3 r1 on l1.ID_type1_overlap = r1.ID_type1
ORDER BY 1,2
),
mapping AS(
SELECT ID_type1,
min(ID_type1_overlap) AS mapped
FROM T4
GROUP BY 1
ORDER BY 1
),
output AS(
SELECT DISTINCT
l1.ID_type1::INT AS ID_type1,
l1.ID_type2,
FUNC_SHA1(r1.mapped) AS ID_real
FROM crossmatch_example l1
LEFT JOIN mapping r1 on l1.ID_type1 = r1.ID_type1
ORDER BY 1,2)
SELECT * FROM output
What you're trying to do is compute a transitive closure: IDs that are linked through any chain of shared IDs belong to the same person. There are articles about how to implement it in SQL.
Here is an example written in Spark's LINQ-like DSL: https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/SparkTC.scala.
The solution is iterative, and fully resolving the graph may take more iterations than the four in your query. What can be optimised is the input to each iteration. I remember working on this once but cannot recall the details.
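If the pairs can be processed outside the database, one alternative to repeated self-joins is a union-find (disjoint-set) pass over the rows. The following is a minimal Python sketch of that technique, not Redshift SQL and not the method from the linked Spark example. The function names and the choice to label each person by the smallest type-1 ID in the group are assumptions made for illustration (the question hashes its label with FUNC_SHA1 instead).

```python
def find(parent, node):
    """Return the representative of node's group, compressing the path on the way."""
    root = node
    while parent[root] != root:
        root = parent[root]
    while parent[node] != root:
        parent[node], node = root, parent[node]
    return root


def assign_person_ids(pairs):
    """Label every (ID_type1, ID_type2) row with one ID per connected group."""
    parent = {}
    for id1, id2 in pairs:
        # Tag the nodes so a type-1 ID can never collide with a type-2 ID.
        a, b = ("type1", id1), ("type2", id2)
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    # Assumption: use the smallest type-1 ID of each group as its label.
    smallest = {}
    for id1, _ in pairs:
        root = find(parent, ("type1", id1))
        smallest[root] = min(smallest.get(root, id1), id1)

    return [(id1, id2, smallest[find(parent, ("type1", id1))]) for id1, id2 in pairs]


# Pairs copied from the question's example table.
pairs = [
    (1, "A"), (1, "B"), (2, "C"), (3, "C"), (4, "D"), (4, "E"), (5, "E"),
    (6, "F"), (7, "G"), (7, "H"), (7, "I"), (8, "I"), (8, "J"), (9, "J"), (9, "B"),
]

for row in assign_person_ids(pairs):
    print(row)
```

Each row is visited a fixed number of times, so the work grows roughly linearly with the number of pairs instead of multiplying with every additional join.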

SQL MS ACCESS: selective operations between tables and filtering results

I need some help to perform the following actions in MS Access with a SQL query.
The operations I would like to perform are illustrated in the following example:
Initial tables
TABLE A
Name H1 H2 H3
A 5 10 5
B 1 2 3
C 7 3 1
TABLE B:
Name H1 H2 H3
1 1 1 1
2 2 2 2
1) First step: Results
NAME TABLE A NAME TABLE B H1 H2 H3
A 1 4 9 4
A 2 3 8 3
B 1 0 1 2
B 2 1 0 1
C 1 6 2 0
C 2 5 1 1
So, the first row of this new table is calculated as the ABSOLUTEVALUE( TABLE A (row A)-TABLE B(row1)), the second row of this table would be ABSOLUTEVALUE( TABLE A (row A)-TABLE B(row2)) and so on.
2) Second step: Results
NAME TABLE A NAME TABLE B H1 H2 H3 Total
A 1 4 9 4 17
A 2 3 8 3 14
B 1 0 1 2 3
B 2 1 0 1 2
C 1 6 2 0 8
C 2 5 1 1 7
So in this step, I need to add a field which is calculated as the sum of the values H1, H2 and H3 of each row.
3) Final step: Results
Name H1 H2 H3
A 3 8 3
B 1 0 1
C 5 1 1
And in the final step, we select those A, B & C rows from the previous table in which the field Total has the minimum value.
Thanks!
For Step 1 please try...
SELECT A.NameA AS [NAME TABLE A],
B.NameB AS [NAME TABLE B],
ABS( A.H1 - B.H1 ) AS H1,
ABS( A.H2 - B.H2 ) AS H2,
ABS( A.H3 - B.H3 ) AS H3
FROM A,
B;
For Step 2 please try...
SELECT A.NameA AS [NAME TABLE A],
B.NameB AS [NAME TABLE B],
ABS( A.H1 - B.H1 ) AS H1,
ABS( A.H2 - B.H2 ) AS H2,
ABS( A.H3 - B.H3 ) AS H3,
ABS( A.H1 - B.H1 ) + ABS( A.H2 - B.H2 ) + ABS( A.H3 - B.H3 ) AS [Total]
FROM A,
B;
For Step 3 please try...
SELECT A.NameA AS [NAME TABLE A],
MIN( ABS( A.H1 - B.H1 ) ) AS H1,
MIN( ABS( A.H2 - B.H2 ) ) AS H2,
MIN( ABS( A.H3 - B.H3 ) ) AS H3
FROM A,
B
GROUP BY A.NameA;
As per my comment to AVG, this situation uses the Cartesian product of two tables, which is where each record in the first table is joined to each of the records from the second table. This can be achieved by performing a CROSS JOIN as I have done by placing FROM A, B in each of my statements. This join gives us the following dataset...
NameA | A.H1 | A.H2 | A.H3 | NameB | B.H1 | B.H2 | B.H3
------|------|------|------|-------|------|------|-----
A | 5 | 10 | 5 | 1 | 1 | 1 | 1
A | 5 | 10 | 5 | 2 | 2 | 2 | 2
B | 1 | 2 | 3 | 1 | 1 | 1 | 1
B | 1 | 2 | 3 | 2 | 2 | 2 | 2
C | 7 | 3 | 1 | 1 | 1 | 1 | 1
C | 7 | 3 | 1 | 2 | 2 | 2 | 2
(Please note that when a field is joined to another table and its name does not exist in that other table, then you will be able to continue referring to it just by its name without needing to specify the table name (though you can still do that if you choose). If the new field does share a name with a field in the other table, then each field will need to be referred to by both the table name and the field name.)
This dataset can be used for all three tasks.
For the first task the ABS() function can be used on the difference between the H1 values, etc. Please note that if you generate a field, such as with ABS( A.H1 - B.H1 ), and do not give it a name using AS, then the new field will be given an arbitrary name, which is often the expression that generated the field (ABS( A.H1 - B.H1 ) in this case) or something else unwieldy. As such it is strongly recommended that you name all generated fields if you intend to refer to them in other parts of the query (or elsewhere).
For the second task an expression that simply adds up the computed H fields, such as H1 + H2 + H3, will suffice.
For the third task we can use the dataset generated in the first task, sans the NameB column. We can then group the rows together by the value of NameA, and use the aggregate function MIN() to choose the minimum value from each H column.
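Note that taking MIN() of each H column separately is not always the same as picking the single row with the smallest Total, which is how the question describes the final step: for row B the per-column minimums are 0, 0, 1, whereas the question's final table lists 1, 0, 1. The following Python sketch (not Access SQL) shows the row-wise selection under that reading of the question. The function name `closest_rows` and the dictionary layout are assumptions for illustration.

```python
# Sample data copied from the question's TABLE A and TABLE B.
table_a = {"A": (5, 10, 5), "B": (1, 2, 3), "C": (7, 3, 1)}
table_b = {"1": (1, 1, 1), "2": (2, 2, 2)}


def closest_rows(table_a, table_b):
    """For each row of table A, keep the absolute differences to the
    table B row that gives the smallest Total (sum of differences)."""
    result = {}
    for name_a, values_a in table_a.items():
        candidates = []
        for name_b, values_b in table_b.items():
            diffs = tuple(abs(a - b) for a, b in zip(values_a, values_b))
            candidates.append((sum(diffs), name_b, diffs))
        # min() compares the Total first; ties fall back to the table B name.
        _total, _name_b, diffs = min(candidates)
        result[name_a] = diffs
    return result


for name, diffs in closest_rows(table_a, table_b).items():
    print(name, *diffs)
```

In SQL the same selection would typically need the Step 2 result to be filtered to the rows whose Total equals the per-name minimum, rather than aggregating each column on its own.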
If you have any questions or comments, then please feel free to post a Comment accordingly.
Further Reading
How to include this SQL subquery for absolute number's value? (on ABS())
How to use cross join in access? (on using a CROSS JOIN in Access)
http://www.w3resource.com/sql/joins/cross-join.php (on SQL Cross Joins in general)

Numbering all rows by an order

I have a table with a two column PK. I'd like to add a new column, nid, which numbers each row (1,2,3...), based on a particular ORDER BY.
So:
x | y | z
3 | 7 | 2
1 | 4 | 1
When numbered by z ASC becomes:
x | y | z | nid
3 | 7 | 2 | 2
1 | 4 | 1 | 1
Can I do this in SQL (Postgres 9.4)?
If I understand correctly, you can just use row_number():
select x, y, z, row_number() over (order by z) as nid
from t;

SQL Transposing columns to rows

I'm attempting to transpose a column of text and values to row headers. I've researched the PIVOT and UNPIVOT functions, but from what I've gathered they rely on aggregation. Below is what I'm interested in achieving.
Source Table Schema:
[ID] [Category] [TextName]
1 A u
1 B v
1 C w
2 A x
2 B y
2 C z
Resulting transpose:
[ID] [A] [B] [C]
1 u v w
2 x y z
Is this possible?
SELECT id,
MIN( CASE WHEN Category = 'A' THEN TextName END ) AS A,
MIN( CASE WHEN Category = 'B' THEN TextName END ) AS B,
MIN( CASE WHEN Category = 'C' THEN TextName END ) AS C
FROM Table
GROUP BY id;
This is still a kind of aggregation, even though there is only a single value per cell (row-column combination).
MIN or MAX will return the desired values, since both are defined for any basic type, including strings.
select *
from t pivot (min([TextName]) for [Category] in (A,B,C)) p
+----+---+---+---+
| ID | A | B | C |
+----+---+---+---+
| 1 | u | v | w |
+----+---+---+---+
| 2 | x | y | z |
+----+---+---+---+
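The role of MIN in both queries can be sketched in plain Python (not SQL): rows are grouped by ID, each category gets its own cell, and MIN only decides which value wins if a cell receives more than one. The function name `pivot` and the tuple layout are assumptions for illustration.

```python
def pivot(rows, categories):
    """Group (id, category, text) rows by id, with one output cell per category."""
    table = {}
    for row_id, category, text in rows:
        cells = table.setdefault(row_id, {c: None for c in categories})
        if category in cells:
            current = cells[category]
            # Same role as MIN(CASE WHEN Category = ... THEN TextName END).
            cells[category] = text if current is None else min(current, text)
    return [
        (row_id, *(cells[c] for c in categories))
        for row_id, cells in sorted(table.items())
    ]


# Sample data copied from the question's source table.
source = [
    (1, "A", "u"), (1, "B", "v"), (1, "C", "w"),
    (2, "A", "x"), (2, "B", "y"), (2, "C", "z"),
]

for row in pivot(source, ["A", "B", "C"]):
    print(row)
```

As with the SQL versions, the list of categories has to be known up front; a category with no matching row is left empty (None here, NULL in SQL).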

How to select extra columns while using group by clause?

I have a table which contains data in this format.
productid filterName boolfilter numericfilter
1 X 1 NULL
1 Y NULL 99inch
1 Z 0 NULL
2 Y NULL 55kg
2 Y NULL 45kg
3 K NULL 20
3 M NULL 35
3 N NULL 25
4 X 1 NULL
4 K 1 NULL
I need data in this format.
I need products that have only numeric filters set up and no boolean filters:
productid filterName numericfilter
2 Y 55kg
2 Y 45kg
3 K 20
3 M 35
3 N 25
I have written this query,
SELECT productid
FROM tbl_filters
GROUP BY productid
HAVING SUM(CAST(boolfilter AS INT)) IS NULL
I am getting productid 2 and 3, but I also need the extra columns mentioned above.
When I use multiple columns in the GROUP BY clause, I do not get the required output.
-- Keep numeric-filter rows of products that have no boolean filter at all
SELECT t.productid, t.filterName, t.numericfilter
FROM tbl_filters t
WHERE t.numericfilter IS NOT NULL
  AND NOT EXISTS (SELECT 1
                  FROM tbl_filters t2
                  WHERE t2.productid = t.productid
                    AND t2.boolfilter IS NOT NULL);
Working SQL FIDDLE
| PRODUCTID | FILTERNAME | NUMERICFILTER |
|-----------|------------|---------------|
| 2 | Y | 55kg |
| 2 | Y | 45kg |
| 3 | K | 20 |
| 3 | M | 35 |
| 3 | N | 25 |
Use window functions instead:
SELECT productid, filterName, numericfilter
FROM (SELECT f.*,
MAX(boolfilter) OVER (PARTITION BY productid) as maxbf
FROM tbl_filters f
) f
WHERE maxbf is null;
Fiddle DEMO.
This calculates the maximum of boolfilter for each productid. If it is always NULL, then the result is NULL. Note that you don't need a cast() for this.
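The logic of the window-function query can be sketched in plain Python (not SQL): first find every product that has at least one non-NULL boolfilter, then keep all rows of the remaining products. The function name `numeric_only` and the tuple layout are assumptions for illustration.

```python
# Sample data copied from the question: (productid, filterName, boolfilter, numericfilter)
rows = [
    (1, "X", 1, None),
    (1, "Y", None, "99inch"),
    (1, "Z", 0, None),
    (2, "Y", None, "55kg"),
    (2, "Y", None, "45kg"),
    (3, "K", None, "20"),
    (3, "M", None, "35"),
    (3, "N", None, "25"),
    (4, "X", 1, None),
    (4, "K", 1, None),
]


def numeric_only(rows):
    """Keep the rows of products whose boolfilter is NULL on every row."""
    # A product whose MAX(boolfilter) is not NULL has at least one boolean filter.
    has_bool = {pid for pid, _name, boolfilter, _num in rows if boolfilter is not None}
    return [(pid, name, num) for pid, name, _bool, num in rows if pid not in has_bool]


for row in numeric_only(rows):
    print(row)
```

Because the check is made per product rather than per row, the extra columns survive without being added to a GROUP BY.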