Snowflake SQL: concat values from multiple rows based on shared key

I have a table of values with a variable number of rows per key value. I want to output a table that concatenates those row values together for each distinct key value.
INPUT TABLE

KEY_ID | SOURCE_VAL
-------+-----------
1      | a
1      | b
1      | c
2      | d
3      | e
3      | f
Target OUTPUT TABLE

KEY_ID | OUTPUT_VAL
-------+-----------
1      | a,b,c
2      | d
3      | e,f
What is the most efficient way to write this in Snowflake SQL?

It could be done with LISTAGG:
SELECT KEY_ID,
LISTAGG(SOURCE_VAL, ',') WITHIN GROUP(ORDER BY SOURCE_VAL) AS OUTPUT_VAL
FROM tab
GROUP BY KEY_ID
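
If SOURCE_VAL can repeat within a key and each value should appear only once, Snowflake's LISTAGG also accepts a DISTINCT qualifier (a sketch against the same tab table as above):
SELECT KEY_ID,
       LISTAGG(DISTINCT SOURCE_VAL, ',') WITHIN GROUP (ORDER BY SOURCE_VAL) AS OUTPUT_VAL
FROM tab
GROUP BY KEY_ID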

Related

POSTGRESQL: Join columns

I have a PostgreSQL version 12.5 database with a table that has three columns: c_id, columna and columnb. The three columns can have different values.
I need to join their values into a single object like this:
Here is the sample data. I have a table that has 3 columns with the same type:

c_id | columna | columnb
-----+---------+--------
1    | a       | b
2    | c       | d
3    | x       | y
I need to run a query that will join the columns columna and columnb like this:

c_id | merge_column
-----+---------------------------------
1    | {"columna": "a", "columnb": "b"}
2    | {"columna": "c", "columnb": "d"}
3    | {"columna": "x", "columnb": "y"}
Any ideas?
You can convert the whole row into JSON, then remove the c_id key:
select t.c_id, to_jsonb(t) - 'c_id' as merge_column
from the_table t
If there are more columns than you have shown, and you only want to get two of them, using jsonb_build_object() is probably easier:
select t.c_id,
jsonb_build_object('columna', t.columna, 'columnb', t.columnb) as merge_column
from the_table t
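
Against the sample data, either query should return something like this (output shown here is illustrative):

c_id | merge_column
-----+---------------------------------
1    | {"columna": "a", "columnb": "b"}
2    | {"columna": "c", "columnb": "d"}
3    | {"columna": "x", "columnb": "y"}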

Combining two columns into a single map column in hive table

I have a Hive table like
a b
-------------
1 2
3 4
How can I create a map column c
c
-----------
1: 2
3: 4
where column a is the keys and column b is the values?
Use map() construct in Hive:
select map(a,b) as c from mytable
In Presto you can use map(array[key], array[value])
select map(array[a],array[b]) as c from mytable
Or map_agg()
select map_agg(a,b) as c from mytable group by a,b
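
Since map_agg() is an aggregate function, dropping the GROUP BY instead folds the whole table into a single one-row map (a Presto-only sketch; the output shown is for the sample data):
select map_agg(a, b) as c from mytable
-- one row: {1=2, 3=4}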

SQL grouping by distinct values in a multi-value string column

I want to perform a group-by based on the distinct values in a string column that has multiple values.
The said column has a list of strings in a standard format separated by commas. The potential values are only a,b,c,d.
For example the column collection (type: String) contains:
Row 1: ["a","b"]
Row 2: ["b","c"]
Row 3: ["b","c","a"]
Row 4: ["d"]
The expected output is a count of unique values:
collection | count
a | 2
b | 3
c | 2
d | 1
For all the below I used this table:
create table tmp (
id INT auto_increment,
test VARCHAR(255),
PRIMARY KEY (id)
);
insert into tmp (test) values
("a,b"),
("b,c"),
("b,c,a"),
("d")
;
If the possible values are only a,b,c,d you can try one of these:
Take note that this will only work if you have no values that are substrings of one another, like test and test_new, because then the test rows would also be joined with all the test_new rows and the count would not match.
select collection, COUNT(*) as count from tmp JOIN (
select CONCAT("%", tb.collection, "%") as like_collection, collection from (
select "a" COLLATE utf8_general_ci as collection
union select "b" COLLATE utf8_general_ci as collection
union select "c" COLLATE utf8_general_ci as collection
union select "d" COLLATE utf8_general_ci as collection
) tb
) tb1
ON tmp.test LIKE tb1.like_collection
GROUP BY tb1.collection;
Which will give you the result you want
collection | count
a | 2
b | 3
c | 2
d | 1
or you can try this one
SELECT
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%a%') as a_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%b%') as b_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%c%') as c_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%d%') as d_count
;
The result would be like this
a_count | b_count | c_count | d_count
2 | 3 | 2 | 1
What you need to do is first explode the collection column into separate rows (like a flatMap operation). In Redshift the only way to generate new rows is to JOIN, so let's CROSS JOIN your input table with a static table of consecutive numbers and keep only the numbers up to the number of elements in each collection. Then we'll use the split_part function to read the item at the correct index. Once we have the exploded table, we'll do a simple GROUP BY.
If your items are stored as JSON array strings ('["a", "b", "c"]') then you can use JSON_ARRAY_LENGTH and JSON_EXTRACT_ARRAY_ELEMENT_TEXT instead of REGEXP_COUNT and SPLIT_PART respectively.
with
index as (
select 1 as i
union all select 2
union all select 3
union all select 4 -- could be substituted with 'select row_number() over () as i from arbitrary_table limit 4'
),
agg as (
select 'a,b' as collection
union all select 'b,c'
union all select 'b,c,a'
union all select 'd'
)
select
split_part(collection, ',', i) as item,
count(*)
from index,agg
where regexp_count(agg.collection, ',') + 1 >= index.i -- only get rows where number of items matches
group by 1
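
Run against the sample agg data, this yields the expected counts (row order may vary):

item | count
-----+------
a    | 2
b    | 3
c    | 2
d    | 1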

Derive groups of records that match over multiple columns, but where some column values might be NULL

I would like an efficient means of deriving groups of matching records across multiple fields. Let's say I have the following table:
CREATE TABLE cust
(
id INT NOT NULL,
class VARCHAR(1) NULL,
cust_type VARCHAR(1) NULL,
terms VARCHAR(1) NULL
);
INSERT INTO cust
VALUES
(1,'A',NULL,'C'),
(2,NULL,'B','C'),
(3,'A','B',NULL),
(4,NULL,NULL,'C'),
(5,'D','E',NULL),
(6,'D',NULL,NULL);
What I am looking to get is the set of IDs for which matching values unify a set of records over the three fields (class, cust_type and terms), so that I can apply a unique ID to the group.
In the example, records 1-4 constitute one match group over the three fields, while records 5-6 form a separate match.
The following does the job:
SELECT
DISTINCT
a.id,
DENSE_RANK() OVER (ORDER BY max(b.class),max(b.cust_type),max(b.terms)) AS match_group
FROM cust AS a
INNER JOIN
cust AS b
ON
a.class = b.class
OR a.cust_type = b.cust_type
OR a.terms = b.terms
GROUP BY a.id
ORDER BY a.id
id match_group
-- -----------
1 1
2 1
3 1
4 1
5 2
6 2
But is there a better way? Running this query on a table of over a million rows is painful...
As Graham pointed out in the comments, the above query doesn't satisfy the requirements if another record is added that would group all the records together.
The following values should be grouped together in one group:
INSERT INTO cust
VALUES
(1,'A',NULL,'C'),
(2,NULL,'B','C'),
(3,'A','B',NULL),
(4,NULL,NULL,'C'),
(5,'D','E',NULL),
(6,'D',NULL,NULL),
(7,'D','B','C');
Would yield:
id match_group
-- -----------
1 1
2 1
3 1
4 1
5 1
6 1
7 1
...because the class value D groups records 5, 6 and 7. The terms value C matches records 1, 2 and 4 to that group, and the cust_type value B (or class value A) pulls in record 3.
Hopefully that all makes sense.
I don't think you can do this with a (recursive) SELECT.
I did something similar (trying to identify unique households) using a temporary table and repeated updates, with the following logic:
For each class|cust_type|terms, get the minimum id and update the temp table:
update temp
from
(
    select
        class, -- similar for cust_type & terms
        min(id) as min_id
    from temp
    group by class
) x
set id = min_id
where temp.class = x.class
  and temp.id <> x.min_id
;
Repeat all three updates until none of them updates a row.
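
Spelled out, one pass over the three fields could look like this (a sketch in SQL Server-style UPDATE ... FROM syntax; the exact form varies by RDBMS, and temp is assumed to hold a copy of cust):

-- unify on class
UPDATE t
SET id = x.min_id
FROM temp AS t
JOIN (SELECT class, MIN(id) AS min_id FROM temp GROUP BY class) AS x
  ON t.class = x.class
WHERE t.id <> x.min_id;

-- unify on cust_type
UPDATE t
SET id = x.min_id
FROM temp AS t
JOIN (SELECT cust_type, MIN(id) AS min_id FROM temp GROUP BY cust_type) AS x
  ON t.cust_type = x.cust_type
WHERE t.id <> x.min_id;

-- unify on terms
UPDATE t
SET id = x.min_id
FROM temp AS t
JOIN (SELECT terms, MIN(id) AS min_id FROM temp GROUP BY terms) AS x
  ON t.terms = x.terms
WHERE t.id <> x.min_id;

Run the three statements in a loop until all of them report zero affected rows; the surviving minimum id per connected group then serves as the group identifier.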

How to delete only one record from a table?

I have one table in my database:

row | Id | Name
----+----+-----
1   | 1  | a
2   | 1  | a
3   | 1  | a
4   | 2  | b
5   | 2  | b
6   | 2  | b
This table has 6 rows and 2 columns, Id and Name (the row column above is just the display position, not stored data). The Id field is not a primary key, and I want to delete the second row of the table.
After deleting that row I want output like this:
row | Id | Name
----+----+-----
1   | 1  | a
3   | 1  | a
4   | 2  | b
5   | 2  | b
6   | 2  | b
Is it possible?
Your Id should be unique, but here is the SQL to delete all rows whose Id is 2:
Delete FROM table WHERE table.Id=2;
Replace 'table' with your table name.
Edit:
It appears you want to delete the second row. I don't know why, but here is the SQL:
with cte AS
(
    SELECT *, rn = ROW_NUMBER() OVER (ORDER BY (SELECT 0))
    FROM table
)
DELETE
FROM cte
WHERE rn = 2
There must be some criterion in a table by which you can identify its rows; that is the primary key. How do you know that the order of the rows stays the same? Your table has no inherent order, so you can't be sure that the same SELECT statement returns rows in the same order each time.
That's why I'd answer that you CAN'T delete only and exactly record number two: you have no order in your table, and two runs of the same SELECT could put different rows in the 2nd position.
If the Id field must keep such values, you could probably add a surrogate primary key.
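
In SQL Server, for example, a surrogate key could be added like this (a sketch; yourtable stands in for the real table name):

-- existing rows are numbered automatically as the column is added
ALTER TABLE yourtable ADD SurrogateId INT IDENTITY(1,1) NOT NULL;

After that every row is uniquely addressable, and a single row can be deleted deterministically by its SurrogateId.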