Pivot with column name in Postgres - sql

I have the following table tbl:
column1 | column2 | column3
-----------------------------------
1 | 'value1' | 3
2 | 'value2' | 4
How do I "pivot" with column names to produce output like this:
column1 | 1 | 2
column2 | 'value1' |'value2'
column3 | 3 | 4

As has been commented, the issue of data types is undefined in the question.
If you are OK with all result columns being type text (every data type can be converted to text), you can use one of these:
Plain SQL
WITH cte AS (
   SELECT nu.*
   FROM   tbl t
   ,      LATERAL (
      VALUES
         (1, t.column1::text)
       , (2, t.column2)
       , (3, t.column3::text)
      ) nu(rn, c)
   )
SELECT *
FROM  (TABLE cte OFFSET 0 LIMIT 3) c1
JOIN  (TABLE cte OFFSET 3 LIMIT 3) c2 USING (rn);
The same with useful column names:
WITH cte AS (
   SELECT nu.*
   FROM   tbl t
   ,      LATERAL (
      VALUES
         ('column1', t.column1::text)
       , ('column2', t.column2)
       , ('column3', t.column3::text)
      ) nu(rn, c)
   )
SELECT * FROM (
   SELECT *
   FROM  (TABLE cte OFFSET 0 LIMIT 3) c1
   JOIN  (TABLE cte OFFSET 3 LIMIT 3) c2 USING (rn)
   ) t (key, row1, row2);
Works in any modern version of Postgres.
The SQL string has to be adapted to the number of rows and columns. See fiddles below!
Using a document type as stepping stone
Makes for shorter code.
With many rows and many columns, performance of the SQL solution may scale better because the intermediate derived table is smaller.
(The risk is limited, since you can't have more than ~ 1600 table columns in Postgres.)
Since everything is converted to text anyway, hstore seems most efficient. See:
Key value pair in PostgreSQL
SELECT key
     , arr[1] AS row1
     , arr[2] AS row2
FROM  (
   SELECT x.key, array_agg(x.value) AS arr
   FROM   tbl t, each(hstore(t)) x
   GROUP  BY 1
   ) sub
ORDER  BY 1;
Technically speaking, we would have to enforce the right sort order in array_agg(), but it should work without an explicit ORDER BY. To be absolutely sure, you can add one: array_agg(x.value ORDER BY t.ctid). This uses ctid for lack of information about a better sort column.
You can do the same with JSON functions (Postgres 9.3+). Just replace each(hstore(t)) with json_each_text(row_to_json(t)). The rest is identical.
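To make the key/value idea concrete, here is a minimal Python sketch of what the document-type approach does conceptually: expand every row into (key, value) pairs, collect the values per key, and emit one output row per original column. The sample data and the function name pivot_with_column_names are assumptions for illustration; this models the logic, not how Postgres executes it.

```python
from collections import defaultdict

# Sample rows mirroring tbl from the question (illustrative data).
rows = [
    {"column1": 1, "column2": "value1", "column3": 3},
    {"column1": 2, "column2": "value2", "column3": 4},
]


def pivot_with_column_names(rows):
    """Turn each input column into one output row: (key, row1, row2, ...)."""
    values_by_key = defaultdict(list)
    for row in rows:                      # like: FROM tbl t
        for key, value in row.items():    # like: each(hstore(t)), one pair per column
            values_by_key[key].append(str(value))  # everything becomes text
    # like: GROUP BY key ... ORDER BY key
    return [(key, *values) for key, values in sorted(values_by_key.items())]


pivoted = pivot_with_column_names(rows)
for line in pivoted:
    print(line)
```

The values land in input order here because Python lists preserve it; in SQL that order is only guaranteed with an explicit ORDER BY inside the aggregate.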
These fiddles demonstrate how to scale each query:
Original example with 2 rows of 3 columns:
db<>fiddle here
Scaled up to 3 rows of 4 columns:
db<>fiddle here

Related

SQL grouping by distinct values in a multi-value string column

I want to perform a group-by based on the distinct values in a string column that has multiple values.
The column contains a list of strings in a standard format, separated by commas. The only potential values are a, b, c, d.
For example the column collection (type: String) contains:
Row 1: ["a","b"]
Row 2: ["b","c"]
Row 3: ["b","c","a"]
Row 4: ["d"]
The expected output is a count of unique values:
collection | count
a | 2
b | 3
c | 2
d | 1
For everything below I used this table:
create table tmp (
id INT auto_increment,
test VARCHAR(255),
PRIMARY KEY (id)
);
insert into tmp (test) values
("a,b"),
("b,c"),
("b,c,a"),
("d")
;
If the only possible values are a, b, c, d, you can try one of these.
Take note that this only works if no value is a substring of another, such as test and test_new; otherwise test would also be joined with all test_new rows and the count would not match.
select collection, COUNT(*) as count from tmp JOIN (
select CONCAT("%", tb.collection, "%") as like_collection, collection from (
select "a" COLLATE utf8_general_ci as collection
union select "b" COLLATE utf8_general_ci as collection
union select "c" COLLATE utf8_general_ci as collection
union select "d" COLLATE utf8_general_ci as collection
) tb
) tb1
ON tmp.test LIKE tb1.like_collection
GROUP BY tb1.collection;
Which will give you the result you want
collection | count
a | 2
b | 3
c | 2
d | 1
or you can try this one
SELECT
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%a%') as a_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%b%') as b_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%c%') as c_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%d%') as d_count
;
The result would be like this
a_count | b_count | c_count | d_count
2 | 3 | 2 | 1
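The caveat about similar values applies to both LIKE-based queries. The sketch below reproduces it using SQLite from Python as a stand-in (an assumption; the answer targets MySQL, where the concatenation would be written CONCAT(',', test, ',')). It also shows one possible workaround: wrap the column and the pattern in the delimiter so that only whole items match. The sample rows are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmp (id INTEGER PRIMARY KEY, test TEXT)")
conn.executemany(
    "INSERT INTO tmp (test) VALUES (?)",
    [("test,other",), ("test_new",), ("other",)],
)

# Naive substring match: the row 'test_new' also matches '%test%'.
naive = conn.execute(
    "SELECT COUNT(*) FROM tmp WHERE test LIKE '%test%'"
).fetchone()[0]

# Delimiter-aware match: pad the column with commas so only whole items match.
exact = conn.execute(
    "SELECT COUNT(*) FROM tmp WHERE ',' || test || ',' LIKE '%,test,%'"
).fetchone()[0]

print(naive, exact)
```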
What you need to do is first explode the collection column into separate rows (like a flatMap operation). In Redshift the only way to generate new rows is to JOIN, so let's CROSS JOIN your input table with a static table of consecutive numbers and keep only the numbers less than or equal to the number of elements in the collection. Then we'll use the split_part function to read the item at the correct index. Once we have the exploded table, we'll do a simple GROUP BY.
If your items are stored as JSON array strings ('["a", "b", "c"]') then you can use JSON_ARRAY_LENGTH and JSON_EXTRACT_ARRAY_ELEMENT_TEXT instead of REGEXP_COUNT and SPLIT_PART respectively.
with
index as (
select 1 as i
union all select 2
union all select 3
union all select 4 -- could be substituted with 'select row_number() over () as i from arbitrary_table limit 4'
),
agg as (
select 'a,b' as collection
union all select 'b,c'
union all select 'b,c,a'
union all select 'd'
)
select
split_part(collection, ',', i) as item,
count(*)
from index,agg
where regexp_count(agg.collection, ',') + 1 >= index.i -- only get rows where number of items matches
group by 1
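The same explode-then-group logic can be traced step by step in a small Python sketch. The helper split_part below is a rough stand-in for the SQL function of the same name (1-based, empty string when out of range), written here only for illustration.

```python
from collections import Counter

collections = ["a,b", "b,c", "b,c,a", "d"]  # plays the role of the agg CTE
indexes = [1, 2, 3, 4]                      # plays the role of the index CTE


def split_part(text, delimiter, position):
    """Rough stand-in for SPLIT_PART: 1-based, '' when position is out of range."""
    parts = text.split(delimiter)
    return parts[position - 1] if 1 <= position <= len(parts) else ""


counts = Counter(
    split_part(collection, ",", i)
    for collection in collections        # CROSS JOIN: every collection ...
    for i in indexes                     # ... paired with every index
    if collection.count(",") + 1 >= i    # keep only indexes that point at an item
)

print(sorted(counts.items()))
```

Without the filter, out-of-range indexes would contribute empty strings to the counts, which is why the WHERE clause in the query matters.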

Accessing 2nd element in varray column

Let's say I have a table with a varray column, defined as follows:
create or replace TYPE VARRAY_NUMBER_LIST AS VARRAY(15) OF NUMBER;
Now, I'm trying to select the first element of each varray column of my table. It works fine:
select (select * from table(myvarraycolumn) where rownum = 1) from mytable cc
It is returning an output like:
2
1
4
4
2
2
My issue occurs when I try to get the second element of each varray column with this SQL:
select (select * from table(myvarraycolumn) where rownum = 2) from mytable cc
In this case, all output lines return null. Please let me know if I'm forgetting something or confusing something.
You need to select rows 1 and 2 and then work out a way to filter out the unwanted preceding rows - one way is to use aggregation with a CASE statement to only match the second row:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE mytable ( myvarraycolumn ) AS
SELECT SYS.ODCINUMBERLIST( 1, 2, 3 ) FROM DUAL UNION ALL
SELECT SYS.ODCINUMBERLIST( 4, 5, 6 ) FROM DUAL;
Query 1:
SELECT (
SELECT MAX( CASE ROWNUM WHEN 2 THEN COLUMN_VALUE END )
FROM TABLE( t.myvarraycolumn )
WHERE ROWNUM <= 2
) AS second_element
FROM mytable t
Results:
| SECOND_ELEMENT |
|----------------|
| 2 |
| 5 |
The query with WHERE ROWNUM = 2 does not work because ROWNUM is assigned to a row only as it passes the filter. For the first row in the correlated inner query, ROWNUM is 1, so the filter WHERE ROWNUM = 2 reduces to WHERE 1 = 2; the filter is not matched and the row is discarded. The next row is then also tested against a ROWNUM of 1 (the previous row is not in the output, so it never consumed a row number), fails the test, and is discarded. Repeat ad nauseam: all rows fail the WHERE filter and are discarded.
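The behaviour described above can be modelled in a few lines of Python. This is only a simplified sketch of the ROWNUM semantics as explained in this answer, not Oracle's actual implementation; the function name filter_with_rownum is hypothetical.

```python
def filter_with_rownum(rows, predicate):
    """Model ROWNUM: a candidate row is numbered (rows kept so far + 1),
    and the counter only advances when the row passes the filter."""
    kept = []
    for value in rows:
        rownum = len(kept) + 1
        if predicate(rownum):
            kept.append((rownum, value))
    return kept


varray = [4, 5, 6]

# WHERE ROWNUM = 2: every candidate is tested as row 1, so nothing survives.
equals_two = filter_with_rownum(varray, lambda rownum: rownum == 2)

# WHERE ROWNUM <= 2: rows 1 and 2 survive ...
first_two = filter_with_rownum(varray, lambda rownum: rownum <= 2)

# ... and MAX(CASE ROWNUM WHEN 2 THEN COLUMN_VALUE END) picks the second one.
second_element = max((v for n, v in first_two if n == 2), default=None)

print(equals_two, first_two, second_element)
```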

Determine source on COALESCE fields

I have two tables, both named table, which are identical in structure but belong to different schemas (schemas A and B). All rows in question will always appear in A.table but may or may not appear in B.table. B.table is essentially an override for the defaults in A.table.
As such my query uses a COALESCE on each field similar to:
SELECT COALESCE(B.id, A.id) as id,
COALESCE(B.foo, A.foo) as foo,
COALESCE(B.bar, A.bar) as bar
FROM A.table LEFT JOIN B.table ON (A.id = B.id)
WHERE A.id in (1, 2, 3)
This works great, but I also want to add the source of the data. In the example above, assuming id=2 existed in B.table but not 1 or 3, I would want to include some indication that A is the source for 1 and 3 and B is the source for 2.
So the data might look like the following
+---------------------------------+
| id | foo | bar | source |
+---------------------------------+
| 1 | a | b | A |
| 2 | c | d | B |
| 3 | e | f | A |
+---------------------------------+
I don't really care what the value of source is as long as I can distinguish A from B.
I am no pgsql expert (not by a long shot) but I have tinkered around with EXISTS and a subquery but have had no luck so far.
As records showing the default value (from A.table) have NULLs for B.id, all you need is to add this column specification to your query:
CASE WHEN B.id IS NULL THEN 'A' ELSE 'B' END AS Source
The USING clause would simplify the query you have:
SELECT id
, COALESCE(B.foo, A.foo) AS foo
, COALESCE(B.bar, A.bar) AS bar
, CASE WHEN b.id IS NULL THEN 'A' ELSE 'B' END AS source -- like @Terje provided
FROM a
LEFT JOIN b USING (id)
WHERE a.id IN (1, 2, 3);
But typically, this alternative query should serve you better:
SELECT x.* -- or list columns of your choice
FROM (VALUES (1), (2), (3)) t (id)
, LATERAL (
SELECT *, 'B' AS source FROM b WHERE id = t.id
UNION ALL
SELECT *, 'A' FROM a WHERE id = t.id
LIMIT 1
) x
ORDER BY x.id;
Advantages:
You don't have to add another COALESCE construct for every column you want to add to the result.
The same query works for any number of columns in a and b.
The query even works if the column names are not identical. Only number and data types of columns must match.
Of course, you can always list selected, compatible columns as well:
SELECT * -- or list columns of your choice
FROM (VALUES (1), (2), (3)) t (id)
, LATERAL (
SELECT foo, bar, 'B' AS source FROM b WHERE id = t.id
UNION ALL
SELECT foo2, bar17, 'A' FROM a WHERE id = t.id
LIMIT 1
) x
ORDER BY x.id;
The first SELECT determines names, data types and number of columns.
This query doesn't break if columns in b are not defined NOT NULL.
COALESCE cannot tell the difference between b.foo IS NULL and no row with matching id in b. So the source of any result column (except id) can still be 'A', even if the result row says 'B' - if any relevant column in b can be NULL.
My alternative returns all values from b if the row exists - including NULL values. So the result can be different if columns in b can be NULL. It depends on your requirements which behavior is desirable.
Either query assumes that id is defined as primary key (so exactly 1 or 0 rows per given id value).
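The difference between the two behaviours shows up as soon as an override row contains a NULL. The sketch below runs the COALESCE variant against SQLite from Python (a stand-in, since SQLite has no LATERAL; the tables a and b and their sample rows are assumptions for the demonstration).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, foo TEXT, bar TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, foo TEXT, bar TEXT);
    INSERT INTO a VALUES (1, 'a', 'b'), (2, 'a2', 'b2'), (3, 'e', 'f');
    INSERT INTO b VALUES (2, NULL, 'd');   -- override row with a NULL in foo
""")

result = conn.execute("""
    SELECT a.id,
           COALESCE(b.foo, a.foo) AS foo,
           COALESCE(b.bar, a.bar) AS bar,
           CASE WHEN b.id IS NULL THEN 'A' ELSE 'B' END AS source
    FROM a
    LEFT JOIN b ON b.id = a.id
    WHERE a.id IN (1, 2, 3)
    ORDER BY a.id
""").fetchall()

for row in result:
    print(row)
```

For id = 2 the row is labelled 'B', yet foo is 'a2', taken from a, because b.foo is NULL. A row-level override would return NULL for foo in that case.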
Related:
Select first record if none match
What is the difference between LATERAL and a subquery in PostgreSQL?

create a table of duplicated rows of another table using the select statement

I have a table with one column containing different integers.
For each integer in the table I would like to duplicate it as the number of digits -
For example:
12345 (5 digits):
1. 12345
2. 12345
3. 12345
4. 12345
5. 12345
I thought of doing it using with recursive t (...) as (), but I didn't manage, since I don't really understand how it works and what is happening "behind the scenes".
I don't want to use insert because I want it to be scalable and automatic for as many integers as needed in a table.
Any thoughts and an explanation would be great.
The easiest way is to join to a table with numbers from 1 to n in it.
SELECT n, x
FROM yourtable
JOIN
(
SELECT day_of_calendar AS n
FROM sys_calendar.CALENDAR
WHERE n BETWEEN 1 AND 12 -- maximum number of digits
) AS dt
ON n <= CHAR_LENGTH(TRIM(ABS(x)))
In my example I abused TD's builtin calendar, but that's not a good choice, as the optimizer doesn't know how many rows will be returned and as the plan must be a Product Join it might decide to do something stupid. So better use a number table...
Create a numbers table that will contain the integers from 1 to the maximum number of digits that the numbers in your table will have (I went with 6):
create table numbers(num int)
insert numbers
select 1 union select 2 union select 3 union select 4 union select 5 union select 6
You already have your table (but here's what I was using to test):
create table your_table(num int)
insert your_table
select 12345 union select 678
Here's the query to get your results:
select ROW_NUMBER() over(partition by b.num order by b.num) row_num, b.num, LEN(cast(b.num as char)) num_digits
into #temp
from your_table b
cross join numbers n
select t.num
from #temp t
where t.row_num <= t.num_digits
I found a nice way to perform this action. Here goes:
with recursive t (num,num_as_char,char_n)
as
(
select num
,cast (num as varchar (100)) as num_as_char
,substr (num_as_char,1,1)
from numbers
union all
select num
,substr (t.num_as_char,2) as num_as_char2
,substr (num_as_char2,1,1)
from t
where char_length (num_as_char2) > 0
)
select *
from t
order by num,char_length (num_as_char) desc
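Since the question also asks what a recursive query does behind the scenes, here is a smaller recursive CTE that produces exactly the requested duplication, run against SQLite from Python. SQLite is only a stand-in (an assumption; the thread uses Teradata and SQL Server syntax), and the table name your_table with its two sample values is borrowed from the earlier answer. The seed SELECT runs once; the second SELECT is then applied repeatedly to the rows produced by the previous pass until it returns nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (num INTEGER)")
conn.executemany("INSERT INTO your_table VALUES (?)", [(12345,), (678,)])

rows = conn.execute("""
    WITH RECURSIVE t (num, n) AS (
        -- seed: every number starts with copy number 1
        SELECT num, 1 FROM your_table
        UNION ALL
        -- step: emit the next copy until n reaches the digit count
        SELECT num, n + 1
        FROM t
        WHERE n < length(CAST(abs(num) AS TEXT))
    )
    SELECT n, num FROM t ORDER BY num, n
""").fetchall()

for row in rows:
    print(row)
```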

Adding Row Numbers To a SELECT Query Result in SQL Server Without use Row_Number() function

I need to add row numbers to a SELECT query without using the Row_Number() function,
and without using user-defined functions or stored procedures.
Select (obtain the row number) as [Row], field1, field2, fieldn from aTable
UPDATE
I am using the SAP B1 DI API to run the query; this system does not allow the use of the Row_Number() function in the SELECT statement.
I'm not sure if this will work for your particular situation or not, but can you execute this query with a stored procedure? If so, you can:
A) Create a temp table with all your normal result columns, plus a Row column as an auto-incremented identity.
B) Select-Insert your original query, sans the row column (SQL will fill this in automatically for you)
C) Select * on the temp table for your result set.
Not the most elegant solution, but will accomplish the row numbering you are wanting.
This query will give you the row_number,
SELECT
(SELECT COUNT(*) FROM @table t2 WHERE t2.field <= t1.field) AS row_number,
field,
otherField
FROM @table t1
but there are some restrictions when you want to use it. Your table must have one column (field in the example) that is unique and numeric, so that you can use it as a reference. For example:
DECLARE @table TABLE
(
field INT,
otherField VARCHAR(10)
)
INSERT INTO @table(field,otherField) VALUES (1,'a')
INSERT INTO @table(field,otherField) VALUES (4,'b')
INSERT INTO @table(field,otherField) VALUES (6,'c')
INSERT INTO @table(field,otherField) VALUES (7,'d')
SELECT * FROM @table
returns
field | otherField
------------------
1 | a
4 | b
6 | c
7 | d
and
SELECT
(SELECT COUNT(*) FROM @table t2 WHERE t2.field <= t1.field) AS row_number,
field,
otherField
FROM @table t1
returns
row_number | field | otherField
-------------------------------
1 | 1 | a
2 | 4 | b
3 | 6 | c
4 | 7 | d
This is a solution without functions or stored procedures, but as I said, it comes with restrictions. Still, it may be enough for you.
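To see both the technique and its uniqueness restriction in action, the sketch below runs the correlated COUNT(*) query against SQLite from Python. SQLite is used only as a runnable stand-in (an assumption; the answer targets SQL Server), and the table name t and alias row_num are chosen for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (field INTEGER, otherField TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "a"), (4, "b"), (6, "c"), (7, "d")],
)

QUERY = """
    SELECT (SELECT COUNT(*) FROM t t2 WHERE t2.field <= t1.field) AS row_num,
           field, otherField
    FROM t t1
    ORDER BY field, otherField
"""

# With unique values in field, the count yields 1, 2, 3, 4.
unique_rows = conn.execute(QUERY).fetchall()

# With a duplicate key, both rows with field = 4 get the same number
# and the number 2 is skipped.
conn.execute("INSERT INTO t VALUES (4, 'x')")
with_duplicate = conn.execute(QUERY).fetchall()

print(unique_rows)
print(with_duplicate)
```

The subquery runs once per outer row, so this approach is best suited to small result sets.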
RRUZ, you might be able to hide the use of a function by wrapping your query in a View. It would be transparent to the caller. I don't see any other options, besides the ones already mentioned.