Imagine you have a simple table
Key | c1 | c2 | c3
----+----+----+----
id1 | x  | y  | z
id2 | q  | r  | s
What I would like is a query that gives me the result as two arrays, so something like:

Select
  id1,
  ARRAY_MAGIC_CREATOR(c1, c2, c3)
from Table

with the result being:

id1, <x,y,z>
id2, <q,r,s>
Almost everything I have searched for ends up converting rows to arrays or other similar-sounding but very different requests.
Does something like this even exist in SQL?
Please note that the data type is NOT a string, so we can't use string concat. The values are all going to be treated as floats.
It is called ARRAY:
Select id1, ARRAY[c1, c2, c3] as c_array
from Table
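A quick self-contained way to see the shape of the result (the ids and float values below are invented purely for illustration):

select id, array[c1, c2, c3] as c_array
from (
  select 'id1' as id, 1.5 as c1, 2.5 as c2, 3.5 as c3
  union all
  select 'id2', 4.5, 5.5, 6.5
)
-- id1, [1.5, 2.5, 3.5]
-- id2, [4.5, 5.5, 6.5]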
This will also work :o)
select key, [c1, c2, c3] c
from `project.dataset.table`
Consider the generic option below, which does not require you to type all column names or even to know them in advance - a more BigQuery'ish way of doing business :o)
select key,
  regexp_extract_all(
    to_json_string((select as struct * except(key) from unnest([t]))),
    r'"[^":,]+":([^":,]+)(?:,|})'
  ) c
from `project.dataset.table` t
If applied to the sample data in your question, the output is each key with the remaining values packed into an array, matching the <x,y,z> / <q,r,s> shape you asked for (the extracted values come back as strings, since they pass through a JSON representation).
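To see why the regex works, here is a hedged sketch of the intermediate JSON it parses (the key and values are invented for illustration):

select to_json_string((select as struct * except(key) from unnest([t]))) as j
from (select 'id1' as key, 1.5 as c1, 2.5 as c2, 3.5 as c3) t
-- j is {"c1":1.5,"c2":2.5,"c3":3.5}; the regex then captures each value that follows a "name": key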
I have data that looks like:
row | col1 | col2 | col3 | ... | coln
----+------+------+------+-----+------
1   | A    | null | B    | ... | null
2   | null | B    | C    | ... | D
3   | null | null | null | ... | A
I want to condense the columns together to get:
row | final
----+---------
1   | A, B
2   | B, C, D
3   | A
The order of the letters doesn't matter, and if the solution includes the nulls, e.g. A, null, B, null etc., I can work out how to remove them later. I've used up to coln as I have about 200 columns to condense.
I've tried a few things; if I were trying to condense rows I could use STRING_AGG(), for example.
Additionally, I could do this:

SELECT
  CONCAT(col1, ", ", col2, ", ", col3, ", ", coln)  # etc.
FROM mytable
However, this would involve writing out each column name by hand, which isn't really feasible. Is there a better way to achieve this, ideally for the whole table?
Additionally, CONCAT returns NULL if any value is NULL.
#standardSQL
select row,
  (select string_agg(col, ', ' order by offset)
   from unnest(split(trim(format('%t', (select as struct t.* except(row))), '()'), ', ')) col with offset
   where not upper(col) = 'NULL'
  ) as final
from `project.dataset.table` t
If applied to the sample data in your question, the output is the condensed result you described: 1 → A, B; 2 → B, C, D; 3 → A.
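In case the format('%t', ...) trick looks opaque: it renders the whole struct as a text literal like (A, NULL, B), trim() strips the parentheses, and split() breaks that into one array element per column, which string_agg() then stitches back together minus the NULLs. A minimal sketch of just that first step (column values invented for illustration):

select format('%t', (select as struct 'A' as col1, cast(null as string) as col2, 'B' as col3))
-- returns (A, NULL, B)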
Not in the exact format that you asked for, but you can see if this simplifies things for you:
SELECT TO_JSON_STRING(mytable) FROM mytable
If you want the exact format, you can write a regex to extract values from the output JSON string.
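For instance, a hedged sketch of that extraction step (the pattern assumes the data columns are all named col... and hold string values; unquoted nulls simply never match it):

select row,
  array_to_string(
    regexp_extract_all(to_json_string(t), r'"col[^"]*":"([^"]+)"'),
    ', ') as final
from mytable t
-- 1 → A, B;  2 → B, C, D;  3 → A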
I have a table with two columns. Let's call them
array_column and text_column
I'm trying to write a query to find out, for K ranging from 1 to 10, in how many rows the value in text_column appears within the first K elements of array_column.
I'm expecting results like:
k | count
________________
1 | 70
2 | 85
3 | 90
...
I did manage to get these results by simply repeating the query 10 times and uniting the results, which looks like this:
SELECT 1 AS k, count(*) FROM table WHERE array_column[1:1] #> ARRAY[text_column]
UNION ALL
SELECT 2 AS k, count(*) FROM table WHERE array_column[1:2] #> ARRAY[text_column]
UNION ALL
SELECT 3 AS k, count(*) FROM table WHERE array_column[1:3] #> ARRAY[text_column]
...
But that doesn't look like the correct way to do it. What if I wanted a very large range for K?
So my question is, is it possible to perform queries in a loop, and unite the results from each query? Or, if this is not the correct approach to the problem, how would you do it?
Thanks in advance!
You could use array_positions() which returns an array of all positions where the argument was found in the array, e.g.
select t.*,
array_positions(array_column, text_column)
from the_table t;
This returns a different result but is a lot more efficient as you don't need to increase the overall size of the result. To only consider the first ten array elements, just pass a slice to the function:
select t.*,
array_positions(array_column[1:10], text_column)
from the_table t;
To limit the result to only rows that actually contain the value you can use:
select t.*,
array_positions(array_column[1:10], text_column)
from the_table t
where text_column = any(array_column[1:10]);
To get your desired result, you could use unnest() to turn that into rows:
select k, count(*)
from the_table t, unnest(array_positions(array_column[1:10], text_column)) as k
where text_column = any(array_column[1:10])
group by k
order by k;
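If you need the cumulative counts from your expected output (rows where the value appears anywhere within the first k elements), one hedged way to build on this is to group rows by their first match position and run a window sum over the counts (note that k values at which no row has its first match simply won't appear as output rows):

select k, sum(count(*)) over (order by k) as cnt
from the_table t
cross join lateral (
  select min(p)
  from unnest(array_positions(array_column[1:10], text_column)) as p
) fp(k)
where text_column = any(array_column[1:10])
group by k
order by k;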
You can use the generate_series function to generate a table with the expected number of rows with the expected values and then join to it within the query, like so:
SELECT t.k AS k, count(text_column)
FROM table
-- right join ensures that you will get a value of 0 if there are no records meeting the criteria
-- (counting text_column rather than * so that unmatched k values count as 0)
RIGHT JOIN (SELECT generate_series(1, 10) AS k) t
  ON array_column[1:t.k] #> ARRAY[text_column]
GROUP BY t.k
This is probably the closest thing to using a loop to go through the results without using something like PL/pgSQL to do an actual loop in a user-defined function.
Here is something for your brains to bite on :D
I'm not able to solve this by myself. My table follows the same principle as the fiddle example, but with col1-col32 instead of only col1-col5 like in the example.
http://sqlfiddle.com/#!6/6f6da
Goal is to get the output:
Apples, 20120104, 9.73
Berries, 20120101, 4.00
Berries, 20120103, 3.50
Bananas, 20120101, 2.30
Kiwi, 20120103, 5.55
I know that the table has bad column names and that the data is badly stored. I'm not looking for help with changing the table; I have to work with the data as it is.
Thanks for your help
It is not as complicated as it seems:
;with cte as (
    select * from example
    unpivot(c for d in ([col2], [col3], [col4], [col5])) u
)
select c2.col1, c2.c, c1.c
from cte c1
join cte c2 on c1.d = c2.d
where c1.col1 = 'datum' and c2.col1 <> 'datum' and c2.c <> '0.00'
Fiddle http://sqlfiddle.com/#!6/6f6da/22
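For your real table the same query just needs the unpivot column list extended; a sketch, assuming the columns are literally named col2 through col32 (the middle of the list is elided here and must be written out in full):

;with cte as (
    select * from example
    -- list every data column: [col2], [col3], ..., [col32]
    unpivot(c for d in ([col2], [col3], [col4], [col32])) u
)
select c2.col1, c2.c, c1.c
from cte c1
join cte c2 on c1.d = c2.d
where c1.col1 = 'datum' and c2.col1 <> 'datum' and c2.c <> '0.00'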
I have a table (or view) in my PostgreSQL database and want to do the following:
Query the table and feed a function in my application subsequent n-tuples of rows from the query, but only those that satisfy some condition. I can do the n-tuple listing using a cursor, but I don't know how to do the condition checking at the database level.
For example, the query returns:
3
2
4
2
0
1
4
6
2
And I want triples of even numbers. Here, they would be:
(2,4,2) (4,2,0) (4,6,2)
Obviously, I cannot simply discard the odd numbers from the query result. Instead of using a cursor, a query returning arrays in a similar manner would also be an acceptable solution, but I don't have any good idea how to use them to do this.
Of course, I could check it at the application level, but I think it'd be cleaner to do it at the database level. Is it possible?
With the window function lead() (as mentioned by @wildplasser):
SELECT *
FROM (
SELECT tbl_id, i AS i1
, lead(i) OVER (ORDER BY tbl_id) AS i2
, lead(i, 2) OVER (ORDER BY tbl_id) AS i3
FROM tbl
) sub
WHERE i1%2 = 0
AND i2%2 = 0
AND i3%2 = 0;
There is no natural order of rows - assuming you want to order by tbl_id in the example.
% .. modulo operator
SQL Fiddle.
You can also use an array aggregate for this instead of using lead():
SELECT
a[1] a1, a[2] a2, a[3] a3
FROM (
SELECT
array_agg(i) OVER (ORDER BY tbl_id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
FROM
tbl
) x(a)
WHERE a[1] % 2 = 0 AND a[2] % 2 = 0 AND a[3] % 2 = 0;
No idea if this'll be better, worse, or the same as Erwin's answer, just putting it in for completeness.
Given the following:
CREATE TABLE A (A1 INTEGER, A2 INTEGER, A3 INTEGER);
INSERT INTO A(A1, A2, A3) VALUES (1, 1, 1);
INSERT INTO A(A1, A2, A3) VALUES (2, 1, 1);
I want to select the maximum A1 given specific A2 and A3 values, and have those values (A2 and A3) also appear in the returned row (e.g. so that I may use them in a join since the SELECT below is meant for a sub-query).
It would seem logical to be able to do the following, given that A2 and A3 are hardcoded in the WHERE clause:
SELECT MAX(A1) AS A1, A2, A3 FROM A WHERE A2=1 AND A3=1
However, PostgreSQL (and I suspect other RDBMSs as well) balks at that and requests an aggregate function for A2 and A3 even though their values are fixed. So instead, I either have to do a:
SELECT MAX(A1) AS A1, MAX(A2), MAX(A3) FROM A WHERE A2=1 AND A3=1
or a:
SELECT MAX(A1) AS A1, 1, 1 FROM A WHERE A2=1 AND A3=1
The first alternative I don't like because I could have used MIN instead and it would still work, whereas the second alternative doubles the number of positional parameters to provide values for when used from a programming language interface. Ideally I would have wanted a UNIQUE aggregate function which would assert that all values are equal and return that single value, or even a RANDOM aggregate function which would return one value at random (since I know from the WHERE clause that they are all equal).
Is there an idiomatic way to write the above in PostgreSQL?
Even simpler, you only need ORDER BY / LIMIT 1:
SELECT a1, a2, a3 -- add more columns as you please
FROM a
WHERE a2 = 1 AND a3 = 1
ORDER BY 1 DESC -- 1 is just a positional reference (syntax shorthand)
LIMIT 1;
LIMIT 1 is Postgres-specific syntax.
The SQL standard would be:
...
FETCH FIRST 1 ROWS ONLY
My first answer with DISTINCT ON was for the more complex case where you'd want to retrieve the maximum a1 per various combinations of (a2, a3).
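For reference, a minimal sketch of that DISTINCT ON variant (one row per (a2, a3) combination, keeping the largest a1 in each):

SELECT DISTINCT ON (a2, a3)
       a1, a2, a3
FROM   a
ORDER  BY a2, a3, a1 DESC;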
Aside: I am using lower case identifiers for a reason.
How about GROUP BY:
select
a2
,a3
,MAX(a1) as maximumVal
from a
group by a2, a3
Does this work for you?
select max(A1),A2,A3 from A GROUP BY A2,A3;
EDIT
select A1, A2, A3 from A where A2=1 and A3=1 and A1=(select max(A1) from A where A2=1 and A3=1) limit 1
A standard trick to obtain the maximal row without an aggregate function is to guarantee the absence of a larger value by means of a NOT EXISTS subquery. (This does not work when there are ties, but neither would the subquery with the max.) When needed, it would not be too difficult to add a tie-breaker condition.
Another solution would be a subquery with a window function, row_number() or rank(). The NOT EXISTS version first:
SELECT *
FROM a src
WHERE NOT EXISTS ( SELECT * FROM a nx
WHERE nx.a2 = src.a2
AND nx.a3 = src.a3
AND nx.a1 > src.a1
);
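And a minimal sketch of the row_number() variant mentioned above (one row per (a2, a3) combination; swap in rank() if you want to keep ties):

SELECT a1, a2, a3
FROM (
    SELECT a.*, row_number() OVER (PARTITION BY a2, a3 ORDER BY a1 DESC) AS rn
    FROM   a
) sub
WHERE rn = 1;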