SQL Group by on multiple columns - sql

My input is:
a b c
-------
A 5 3
A 4 2
B 3 1
B 5 3
I would like to get all a values having the same values in b and c, so the output should be as:
{A,B} 5 3
I am using the group by, but I am not achieving my goal.

In standard SQL, this would look like:
select b, c, listagg(a, ',') within group (order by a)
from t
group by b, c;
Not all databases support listagg(), but most have a method for concatenating strings.
In Hive, you would use collect_list() or collect_set(), each of which takes a single argument and returns an array:
select b, c, collect_list(a)
from t
group by b, c;
You can convert the array back to a string, but I recommend keeping it as an array.
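If you want to try the grouping locally, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite, where group_concat() stands in for listagg(). The HAVING clause is my own addition, on the assumption that only (b, c) pairs shared by more than one row are wanted, as in the expected output of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("A", 5, 3), ("A", 4, 2), ("B", 3, 1), ("B", 5, 3)],
)

# group_concat() is SQLite's counterpart to listagg().
# HAVING COUNT(*) > 1 is an assumption: keep only (b, c) pairs that occur
# in more than one row.
shared = conn.execute(
    """
    SELECT b, c, group_concat(a, ',') AS a_values
    FROM t
    GROUP BY b, c
    HAVING COUNT(*) > 1
    """
).fetchall()

for b, c, a_values in shared:
    print(b, c, a_values)
```

Without the HAVING clause the query also returns the single-row groups (4, 2) and (3, 1). SQLite does not guarantee the order of values inside group_concat(), so sort them afterwards if the order matters.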

Related

Group by into single JSON column

Using SQL, I'd like to convert a table that looks like this:
id col11 col2
-------------
1 a b
1 c d
2 e f
2 g h
Into something that looks like this:
id combined
-----------
1 [{col1: a, col2:b}, {col1: c, col2:d}]
2 [{col1: e, col2:f}, {col1: g, col2:h}]
In PostgreSQL, you can use the json_build_object function to build a JSON object from a variadic argument list, then combine the objects for each id with json_agg:
SELECT id,
json_agg(json_build_object('col1',col11,
'col2',col2)) combined
FROM t
GROUP BY id

Convert column of arrays to table of K,V pairs in Presto

I have a Presto table with a column of string arrays that I would like to convert into a table mapping each array element to its number of occurrences.
A, B, C, D, E, F are all strings
set
---------
[A,B,C,D]
[A,C,E,F]
string|count
-------------
A 2
B 1
C 2
D 1
E 1
F 1
Well, you can use unnest() and aggregate:
select char, count(*)
from t cross join
unnest(t.set) as u(char)
group by char
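
To cross-check the expected counts outside Presto, the same flatten-then-count logic can be sketched in plain Python. This is not Presto code: chain.from_iterable plays the role of unnest() and Counter plays the role of group by with count(*). The list name arrays is hypothetical and simply holds the two sample rows from the question.

```python
from collections import Counter
from itertools import chain

# The two sample rows of the "set" column from the question.
arrays = [
    ["A", "B", "C", "D"],
    ["A", "C", "E", "F"],
]

# Flatten every array into one stream of elements (like unnest),
# then count occurrences per element (like group by + count(*)).
counts = Counter(chain.from_iterable(arrays))

for element, n in sorted(counts.items()):
    print(element, n)
```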

Presto - how to compute correlations between all columns in one query

I have a table in the following format:
A B C D
7 7 2 12
2 2 3 4
2 2 2 4
2 2 2 3
5 5 2 7
I would like to calculate correlations between each of the columns using the built-in correlation function (https://prestodb.io/docs/current/functions/aggregate.html corr(y, x) → double)
I could run over all the columns and perform the corr calculation each time with:
select corr(A,B) from table
but I would like to reduce the number of times I access Presto and run it in one query if it's possible.
Would it be possible to get as a result the column names that pass a certain threshold or at least the correlation scores between all possible combinations in one query?
Thanks.
"I would like to calculate correlations between each of the columns"
Correlation involves two series of data (in SQL, two columns). So I understand your question as: how to compute the correlation for each and every possible combination of columns in the table. That would look like:
select
corr(a, b) corr_a_b,
corr(a, c) corr_a_c,
corr(a, d) corr_a_d,
corr(b, c) corr_b_c,
corr(b, d) corr_b_d,
corr(c, d) corr_c_d
from mytable
You can use a lateral join to unpivot the table, then a self join and aggregation:
with v as (
select v.*, t.id
from (select t.*,
row_number() over (order by a) as id
from t
) t cross join lateral
(values ('a', a), ('b', b), ('c', c), ('d', d)
) v(col, val)
)
select v1.col, v2.col, corr(v1.val, v2.val)
from v v1 join
v v2
on v1.id = v2.id and v1.col < v2.col
group by v1.col, v2.col;
The row_number() is only to generate a unique id for each row, which is then used for the self-join. You may already have a column with this information, so that might not be necessary.
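For anyone who wants to see the unpivot-and-self-join idea run end to end, here is a rough sketch against SQLite via Python's sqlite3 module. Several things here are assumptions and differ from the Presto query: UNION ALL replaces the lateral values clause, SQLite's rowid replaces row_number(), and because SQLite has no corr() aggregate the query only returns the per-pair sums, from which the helper pearson (a hypothetical name) computes the Pearson coefficient in Python.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a REAL, b REAL, c REAL, d REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(7, 7, 2, 12), (2, 2, 3, 4), (2, 2, 2, 4), (2, 2, 2, 3), (5, 5, 2, 7)],
)

# Unpivot each column into (id, col, val) rows, then self-join on id.
# v1.col < v2.col keeps each unordered pair of columns exactly once.
rows = conn.execute(
    """
    WITH v AS (
        SELECT rowid AS id, 'a' AS col, a AS val FROM t
        UNION ALL SELECT rowid, 'b', b FROM t
        UNION ALL SELECT rowid, 'c', c FROM t
        UNION ALL SELECT rowid, 'd', d FROM t
    )
    SELECT v1.col, v2.col,
           COUNT(*), SUM(v1.val), SUM(v2.val),
           SUM(v1.val * v1.val), SUM(v2.val * v2.val), SUM(v1.val * v2.val)
    FROM v AS v1
    JOIN v AS v2 ON v1.id = v2.id AND v1.col < v2.col
    GROUP BY v1.col, v2.col
    """
).fetchall()


def pearson(n, sx, sy, sxx, syy, sxy):
    """Pearson correlation computed from the aggregated sums."""
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sxx - sx * sx) * (n * syy - sy * sy)
    )


correlations = {(c1, c2): pearson(*sums) for c1, c2, *sums in rows}

# Hypothetical threshold filter, as asked about in the question.
strong = {pair: r for pair, r in correlations.items() if abs(r) >= 0.9}
print(strong)
```

In Presto itself you would keep corr(v1.val, v2.val) in the select list and add a having clause on it for the threshold, rather than computing the sums by hand.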

How to mix 2 tables (A, B) into 1 table (AB) with SQL (DB2 dialect) with a special order between records

How can I mix 2 tables (A, B) into 1 table (AB) with a special order?
There are 2 tables, A and B, each with only 1 column, so each is effectively a list/array.
I must order the rows like this:
A.col1,A.col1,B.col1,B.col1,A.col1,A.col1,B.col1,B.col1,A.col1,A.col1,B.col1,B.col1 and so on.
To see it easily, it must be:
A,A,B,B,A,A,B,B,A,A
So 2 rows from A, 2 rows from B, 2 rows from A, 2 rows from B, and so on.
I would prefer the DB2 SQL dialect, but an answer in any SQL dialect would be useful.
Thanks.
Try this:
SELECT Field1 FROM
(SELECT Field1, 1 AS S, ROW_NUMBER() OVER() AS N, FLOOR(ROW_NUMBER() OVER()/2) AS G FROM A
UNION ALL
SELECT Field1, 2 AS S, ROW_NUMBER() OVER() AS N, FLOOR(ROW_NUMBER() OVER()/2) AS G FROM B)
ORDER BY G, S, N
That should work in DB2. Unfortunately I don't have a DB2 database handy to test it so the code goes without any warranty.
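Since the query was not tested on DB2, here is a small sketch that runs the same idea on SQLite through Python's sqlite3 module (it needs an SQLite build with window functions, 3.25 or later). The sample values a1..a4 and b1..b4 are made up, the column is called col1 as in the question, and ordering the window by rowid is an SQLite-specific assumption to make the row numbers deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (col1 TEXT)")
conn.execute("CREATE TABLE B (col1 TEXT)")
conn.executemany("INSERT INTO A VALUES (?)", [("a1",), ("a2",), ("a3",), ("a4",)])
conn.executemany("INSERT INTO B VALUES (?)", [("b1",), ("b2",), ("b3",), ("b4",)])

# g = pair number within each table, s = source table (A first), n = row number.
# Integer division of (n - 1) by 2 maps rows 1,2 -> 0 and rows 3,4 -> 1.
mixed = [
    row[0]
    for row in conn.execute(
        """
        SELECT col1 FROM (
            SELECT col1, 1 AS s,
                   ROW_NUMBER() OVER (ORDER BY rowid) AS n,
                   (ROW_NUMBER() OVER (ORDER BY rowid) - 1) / 2 AS g
            FROM A
            UNION ALL
            SELECT col1, 2 AS s,
                   ROW_NUMBER() OVER (ORDER BY rowid) AS n,
                   (ROW_NUMBER() OVER (ORDER BY rowid) - 1) / 2 AS g
            FROM B
        ) AS ab
        ORDER BY g, s, n
        """
    )
]
print(mixed)
```

On a real DB2 table you would order the window by whatever column defines the intended sequence, since ROW_NUMBER() OVER() with no ordering gives no guarantee about which row gets which number.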

Counting the rows of multiple distinct columns

I'm trying to count the number of rows that have distinct values in both of the columns "a" and "b" in my Sybase ISQL 9 database.
What I mean is, the following dataset will produce the answer "4":
a b
1 9
2 9
3 8
3 7
2 9
3 7
Something like the following syntax would be nice:
SELECT COUNT(DISTINCT a, b) FROM MyTable
But this doesn't work.
I do have a solution:
SELECT COUNT(*) FROM
(SELECT a, b
FROM MyTable
WHERE c = 'foo'
GROUP BY a, b) SubTable
But I was wondering if there is a neater way of constructing this query?
How about:
SELECT COUNT(*)
FROM (SELECT DISTINCT a, b FROM MyTable) AS t
For more information on why this can't be done in a simpler way (besides concatenating strings, as noted in a different answer), you can refer to this Google Answers post: Sql Distinct Count.
You could concatenate a and b together into 1 string like this (T-SQL; hopefully something very similar works in Sybase):
SELECT COUNT(DISTINCT(STR(a) + ',' + STR(b)))
FROM #YourTable
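
Both approaches can be compared on the sample data with a quick sketch using Python's sqlite3 module. This assumes SQLite rather than Sybase, so the || operator replaces STR(a) + ',' + STR(b); the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(1, 9), (2, 9), (3, 8), (3, 7), (2, 9), (3, 7)],
)

# Approach 1: count the rows of a DISTINCT subquery.
via_subquery = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT a, b FROM MyTable) AS t"
).fetchone()[0]

# Approach 2: concatenate both columns with a separator and count distinct strings.
via_concat = conn.execute(
    "SELECT COUNT(DISTINCT a || ',' || b) FROM MyTable"
).fetchone()[0]

print(via_subquery, via_concat)
```

The separator matters in the concatenation variant: without it, pairs such as (1, 23) and (12, 3) would produce the same string and be counted once.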