How to concat columns data using loop in Postgres?
I have this table:
+------+------+------+--------+--------+--------+
| col1 | col2 | col3 | other1 | other2 | other3 |
+------+------+------+--------+--------+--------+
|    1 |    1 |    1 |      1 |      1 |      1 |
|    2 |    2 |    2 |      2 |      2 |      2 |
+------+------+------+--------+--------+--------+
and want to concat columns (col*).
Expected output:
+----------------+--------+--------+--------+
| concatedcolumn | other1 | other2 | other3 |
+----------------+--------+--------+--------+
| **1**1**1**    |      1 |      1 |      1 |
| **2**2**2**    |      2 |      2 |      2 |
+----------------+--------+--------+--------+
I can concat using:
select concat('**', col1, '**',col2, '**', col3, '**') as concatedcolumn
,other1, other2, other3
from sample_table
I have some 200 columns with the prefix "col" and don't want to spell out all the columns in SQL. How could I achieve this with a loop?
Questionable database design aside, you can generate the SELECT statement dynamically:
SELECT 'SELECT concat_ws(''**'', '
|| string_agg(quote_ident(attname), ', ') FILTER (WHERE attname LIKE 'col%')
|| ') AS concat_col, '
|| string_agg(quote_ident(attname), ', ') FILTER (WHERE attname NOT LIKE 'col%')
|| ' FROM public.tbl;' -- your table name here
FROM pg_attribute
WHERE attrelid = 'public.tbl'::regclass -- ... and here
AND attnum > 0
AND NOT attisdropped;
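For the sample table above, the generated statement comes out like this (run it as the second step):
SELECT concat_ws('**', col1, col2, col3) AS concat_col, other1, other2, other3 FROM public.tbl;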
Query the system catalog pg_attribute or, alternatively, the information schema table columns. I prefer the system catalog.
Related answer on dba.SE discussing "information schema vs. system catalogs"
Execute in a second step (after verifying it's what you want).
No loop involved. You can build the statement dynamically, but you cannot (easily) return the result dynamically, as SQL demands to know the return type at execution time.
concat_ws() is convenient, but be aware of two things: it ignores NULL values (I didn't deal with those specially; you may or may not want to - see the COALESCE sketch after the related links below), and it only places the separator between values, so you have to wrap the result yourself to get the leading and trailing ** from your expected output. Related:
Combine two columns and add into one new column
How to concatenate columns in a Postgres SELECT?
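If you do want NULLs to still produce separators instead of being skipped, one option is to wrap each column in COALESCE while building the statement. A sketch, with the other output columns omitted for brevity:
SELECT 'SELECT concat_ws(''**'', '
    || string_agg('COALESCE(' || quote_ident(attname) || '::text, '''')', ', ')
       FILTER (WHERE attname LIKE 'col%')
    || ') AS concat_col FROM public.tbl;'
FROM pg_attribute
WHERE attrelid = 'public.tbl'::regclass
AND attnum > 0
AND NOT attisdropped;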
I'm struggling to find a value that might be in different tables, but using UNION is a pain as there are a lot of tables.
A separate lookup table contains the names of the TestTable_ tables:
| ID | Name       |
| -- | ---------- |
| 1  | TestTable1 |
| 2  | TestTable2 |
| 3  | TestTable3 |
| 4  | TestTable4 |
TestTable1 content:
| id | Name    | q1              | a1             |
| -- | ------- | --------------- | -------------- |
| 1  | goose   | withFeather?    | featherID      |
| 2  | rooster | withoutFeather? | shinyfeatherID |
| 3  | rooster | age             | 20             |
TestTable2 content:
| id | Name             | q1              | a1             |
| -- | ---------------- | --------------- | -------------- |
| 1  | brazilian_goose  | withFeather?    | featherID      |
| 2  | annoying_rooster | withoutFeather? | shinyfeatherID |
| 3  | annoying_rooster | no_legs?        | dead           |
TestTable3 content:
| id | Name    | q1              | a1             |
| -- | ------- | --------------- | -------------- |
| 1  | goose   | withFeather?    | featherID      |
| 2  | rooster | withoutFeather? | shinyfeatherID |
| 3  | rooster | age             | 15             |
Common columns: q1 and a1
Is there a way to search through all of them for a specific value without using UNION, given that some of them might have different columns?
Something like: check if "q1='age'" exists in all those tables (from 1 to 50)
Select q1,*
from (something)
where q1 exists in (TestTable_*)... or something like that.
If not possible, not a problem.
You could use dynamic SQL, but something I do in situations like this, where I have a list of tables I want to quickly perform the same action on, is to paste the list of tables into a spreadsheet, type a query into a cell with a placeholder like #table, and then use the SUBSTITUTE function to replace it.
Alternatively, I just paste the list into SSMS and use SHIFT+ALT+ArrowKey to select the column and start typing.
So, with my list of tables pasted into the editor, I use that key combo to place the cursor on every row at once. Anything I type then appears on all the selected rows, and I repeat the action on the other side of the table names.
It's not a perfect solution, but it's a quick and dirty way of getting something repetitive done fast.
If you want to find all the tables with that column name, you can use the information schema:
SELECT table_name
FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME = 'q1';
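Taking that one step further, here is a sketch that generates one search query per matching table (it assumes every table with a q1 column also has a1, and 'age' is a placeholder value):
SELECT 'SELECT ''' + TABLE_NAME + ''' AS table_name, q1, a1 FROM '
     + QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME)
     + ' WHERE q1 = ''age'';'
FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME = 'q1';
Copy the generated rows into a query window and run the ones you need.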
Given the type of solution you are after, I can offer a method that I've had to use on legacy systems.
You can query sys.columns for the name of the column(s) you need to find across N tables, joining on object_id to sys.tables where type = 'U'. This will give you a list of table names.
From this list you can then build a working query for each table and, depending on your requirements (is this ad-hoc?), either execute each one manually yourself or build a procedure that does it for you using sp_executesql.
E.g.
select t.name as table_name, c.name as column_name  -- aliases needed: both views have a "name" column
into #workingtable
from sys.columns c
join sys.tables t on t.object_id = c.object_id
where c.name in .....
Pseudocode (a runnable sketch follows):
begin loop while rows exist in #workingtable
    select top 1 row from #workingtable
    set @sql = your query specific to that table and column(s)
    exec(@sql) / sp_executesql / try/catch as necessary
    delete row from #workingtable
end loop
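A runnable version of that pseudocode might look like this - a sketch that assumes #workingtable from above and hard-codes a search for q1 = 'age':
DECLARE @tbl sysname, @sql nvarchar(max);

WHILE EXISTS (SELECT 1 FROM #workingtable)
BEGIN
    -- grab any remaining table
    SELECT TOP 1 @tbl = table_name FROM #workingtable;

    -- per-table query; assumes the table lives in your default schema
    SET @sql = N'SELECT ''' + @tbl + N''' AS table_name, q1, a1 FROM '
             + QUOTENAME(@tbl) + N' WHERE q1 = ''age'';';

    BEGIN TRY
        EXEC sp_executesql @sql;
    END TRY
    BEGIN CATCH
        PRINT ERROR_MESSAGE();  -- note the failure and keep going
    END CATCH;

    DELETE FROM #workingtable WHERE table_name = @tbl;
END;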
Hopefully that gives you some ideas, at least, for how you might implement your requirements.
I have a dataset in S3 with a big int array column, and I want to filter rows efficiently based on the array values. I know we can use a GIN index on a SQL table, but I need a solution that works on an S3 dataset. I am planning to use a cluster id for each combination of elements in the array (their cardinality is not huge; max 2500) and then store it as a new column that filters can be applied to later.
Example:
Table A
+------+------+-----------+
| Col1 | Col2 | Col3      |
+------+------+-----------+
|    1 |  101 | [123,234] |
|    2 |  102 | [123]     |
|    3 |  103 | [234,345] |
+------+------+-----------+
I am trying to add a new column, like:
Table B (column Col3 will be removed from the actual schema)
+------+------+-----------+-----+
| Col1 | Col2 | Col3      | Cid |
+------+------+-----------+-----+
|    1 |  101 | [123,234] |   1 |
|    2 |  102 | [123]     |   2 |
|    3 |  103 | [234,345] |   3 |
+------+------+-----------+-----+
and there will be another table mapping Col3 to Cid, like:
Table C
+-----------+-----+
| Col3      | Cid |
+-----------+-----+
| [123,234] |   1 |
| [123]     |   2 |
| [234,345] |   3 |
+-----------+-----+
A new entry will be added to Table C whenever a new combination appears, and B will be updated whenever an array element gets added or removed. The goal is to be able to filter records from Table A efficiently based on values in the array column. Queries like
123 = ANY(Col3) could then be served as Cid = 2, and queries like [123, 345] = ANY(Col3) as Cid IN (2, 3).
Is there any better way to solve this problem?
Also, I am thinking of creating the required combinations at runtime to limit their number. Is it a good idea to create only the minimum set of combinations?
In Postgres, you can create the mapping table and use a join to calculate the values:
create table array_dim as
select col3 as arr, row_number() over (order by min(col1)) as array_id
from a
group by col3;
You can then add the new column:
select a.*, ad.array_id
from a
join array_dim ad on a.col3 = ad.arr;
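With that in place, membership queries go through the mapping table. A sketch, assuming the second query above was materialized as a table b with the array_id column:
select b.*
from b
where b.array_id in (
    select array_id
    from array_dim
    where 123 = any(arr)
);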
We have a table in BigQuery like the one below.
Input table:
Name | Interests
-----+-----------
Bob  | ["a"]
Sue  | ["a","b"]
Joe  | ["b","c"]
We want to convert the above table to below format to make it BI/Visualisation friendly.
Target/Required table:
+------+---+---+---+
| Name | a | b | c |
+------+---+---+---+
| Bob  | 1 | 0 | 0 |
| Sue  | 1 | 1 | 0 |
| Joe  | 0 | 1 | 1 |
+------+---+---+---+
Note: the Interests column is an array datatype. Is this sort of transformation possible in BigQuery? If yes, any reference query?
Thanks in advance!
Below is for BigQuery Standard SQL and uses the scripting features of BigQuery.
#standardSQL
-- one (name, interest) row per array element
create temp table ttt as (
select name, interest
from `project.dataset.table`,
unnest(interests) interest
);
-- build the pivot column list from the distinct interests, then run it
EXECUTE IMMEDIATE (
SELECT """
SELECT name, """ ||
STRING_AGG("""MAX(IF(interest = '""" || interest || """', 1, 0)) AS """ || interest, ', ' ORDER BY interest)
|| """
FROM ttt
GROUP BY name
"""
FROM (
SELECT DISTINCT interest
FROM ttt
)
);
Applied to the sample data from your question, this produces the target table shown above.
I'm using Aginity Workbench for Netezza for the first time.
Does anyone know how to list columns and column types? The typical SQL code snippets I found online don't seem to work.
Thanks!
This snippet should do what you want.
SELECT
tablename,
attname AS COL_NAME,
b.FORMAT_TYPE AS COL_TYPE,
attnum AS COL_NUM
FROM _v_table a
JOIN _v_relation_column b
ON a.objid = b.objid
WHERE a.tablename = 'ATT_TEST'
AND a.schema = 'ADMIN'
ORDER BY attnum;
 TABLENAME | COL_NAME    | COL_TYPE             | COL_NUM
-----------+-------------+----------------------+---------
 ATT_TEST  | COL_INT     | INTEGER              |       1
 ATT_TEST  | COL_NUMERIC | NUMERIC(10,2)        |       2
 ATT_TEST  | COL_VARCHAR | CHARACTER VARYING(5) |       3
 ATT_TEST  | COL_DATE    | DATE                 |       4
(4 rows)
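To list every table in the schema rather than just one, the same query works with the tablename filter dropped:
SELECT
  tablename,
  attname AS COL_NAME,
  b.FORMAT_TYPE AS COL_TYPE,
  attnum AS COL_NUM
FROM _v_table a
JOIN _v_relation_column b
  ON a.objid = b.objid
WHERE a.schema = 'ADMIN'
ORDER BY tablename, attnum;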
I have a table with N columns. Let's call them c1, c2, c3, c4, ... cN. Among multiple rows, I want to get a single row with COUNT(DISTINCT cX) for each X in [1, N].
c1 | c2 | ... | cn
---+----+-----+---
 0 |  4 | ... |  1
Is there a way I can do this (in a stored procedure) without writing every column name into the query manually?
Why?
We've had a problem where bugs in application servers meant good column values were overwritten with garbage inserted later. To work around this, I'm storing the information log-structured, where each row represents a logical UPDATE query. Then, when given a signal that the record is complete, I can determine whether any values were (erroneously) overwritten.
An example of a single correct record spread over multiple rows: there is at most one distinct value for each column.
| id | initialize_time | start_time | end_time |
| -- | --------------- | ---------- | -------- |
| 1  | 12:00am         | NULL       | NULL     |
| 1  | 12:00am         | 1:00pm     | NULL     |
| 1  | 12:00am         | NULL       | 2:00pm   |
Reconciled row:
| 1  | 12:00am         | 1:00pm     | 2:00pm   |
An example of an irreconcilable record that I want to detect:
| id | initialize_time | start_time | end_time |
| -- | --------------- | ---------- | -------- |
| 1  | 12:00am         | NULL       | NULL     |
| 1  | 12:00am         | 1:00pm     | NULL     |
| 1  | 9:00am          | 1:00pm     | 2:00pm   | -- new initialize_time => irreconcilable!
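For a fixed, known column list the check itself is simple; this is roughly what I'd write by hand (tbl is a placeholder table name), and the point of the question is avoiding spelling the columns out:
SELECT id
FROM tbl
GROUP BY id
HAVING count(DISTINCT initialize_time) > 1
    OR count(DISTINCT start_time) > 1
    OR count(DISTINCT end_time) > 1;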
You need dynamic SQL for that, which means you have to create a function or run a DO command. Since you cannot return values directly from the latter, a plpgsql function it is:
CREATE OR REPLACE function f_count_all(_tbl text
, OUT columns text[]
, OUT counts bigint[])
RETURNS record LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE (
SELECT 'SELECT
ARRAY[' || string_agg('''' || quote_ident(attname) || '''', ', ') || ']
, ARRAY[' || string_agg('count(' || quote_ident(attname) || ')' , ', ') || ']
FROM ' || _tbl
FROM pg_attribute
WHERE attrelid = _tbl::regclass
AND attnum >= 1 -- exclude tableoid & friends (neg. attnum)
AND NOT attisdropped -- exclude deleted columns
GROUP BY attrelid
)
INTO columns, counts;
END
$func$;
Call:
SELECT * FROM f_count_all('myschema.mytable');
Returns:
   columns    |   counts
--------------+------------
 {c1, c2, c3} | {17, 1, 0}
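For reference, against a three-column table the EXECUTE builds and runs a statement along these lines (column names assumed). If you want COUNT(DISTINCT cX) as in your title, swap count(...) for count(DISTINCT ...) where the string is built:
SELECT ARRAY['c1', 'c2', 'c3']
     , ARRAY[count(c1), count(c2), count(c3)]
FROM myschema.mytable;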
More explanation and links about dynamic SQL and EXECUTE can be found in related questions here on SO.
Related:
Count values for every column in a table
You could even try to return a polymorphic record type to get single columns dynamically, but that's rather complex and advanced. Probably too much effort for your case; more on that in related answers here on SO.