Snowflake - Count distinct values in comma separated list - sql

I basically have a column that looks like below.
"[
""what"",
""how"",
]"
"[
""how"",
""what"",
]"
"[
""project_management"",
""do it"",
""personal""
]"
"[
""do it"",
""finance"",
""events"",
""save""
]"
"[
""do it"",
""sales"",
""events""
]"
"[
""finance"",
""sales"",
""events""
]"
"[
""events""
]"
I am simply trying to get a count of each unique value within the column, outputting value counts for each comma-separated value. The output should look like the following:
What: 2
how: 2
do it: 3
Finance: 4
etc.
I tried the following, but the problem is it only counts lists that repeat themselves, not the individual values within each list:
select i.OUTCOMES, count(i.OUTCOMES)
from my_table i
GROUP BY 1;

You'll need to flatten the values.
If the variant is an array as described:
with data as (
  select parse_json('["a", "b"]') v
  union all select parse_json('["a", "a", "c"]')
)
select x.value::string val, count(*) c
from data, table(flatten(v)) x  -- one row per array element
group by 1;
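Applied to the original question, a sketch (the table name my_table is an assumption, and the OUTCOMES column is assumed to hold a VARIANT array):
select f.value::string as outcome, count(*) as cnt
from my_table,
     lateral flatten(input => my_table.outcomes) f  -- one row per array element
group by 1
order by cnt desc;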

It seems like an array, so you need to use flatten twice:
with data as (
  select ARRAY_CONSTRUCT(
           ARRAY_CONSTRUCT('what','how'),
           ARRAY_CONSTRUCT('how','what'),
           ARRAY_CONSTRUCT('project_management','do it','personal')
         ) OUTCOMES
)
select item.VALUE::string, count(*)
from data,
     lateral flatten( OUTCOMES ) v,      -- outer array
     lateral flatten( v.VALUE ) item     -- inner arrays
group by item.VALUE;
+--------------------+----------+
| ITEM.VALUE::STRING | COUNT(*) |
+--------------------+----------+
| what               |        2 |
| how                |        2 |
| project_management |        1 |
| do it              |        1 |
| personal           |        1 |
+--------------------+----------+
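If the column is stored as a string rather than a VARIANT (the post does not say which), a sketch using PARSE_JSON before flattening (my_table and OUTCOMES are assumed names):
select item.value::string, count(*)
from my_table,
     lateral flatten( parse_json(outcomes) ) item
group by 1;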

Using the SPLIT_TO_TABLE and REPLACE functions:
SELECT COL_VAL, COUNT(COL_VAL) FROM
(
  -- TRIM strips the spaces and newlines left over after removing brackets and quotes
  SELECT TRIM(REPLACE(REPLACE(REPLACE(VALUE,'['),'"'),']'), ' \n') COL_VAL FROM TABLE( SPLIT_TO_TABLE('"[
""what"",
""how"",
]"
"[
""how"",
""what"",
]"
"[
""project_management"",
""do it"",
""personal""
]"
"[
""do it"",
""finance"",
""events"",
""save""
]"
"[
""do it"",
""sales"",
""events""
]"
"[
""finance"",
""sales"",
""events""
]"
"[
""events""
]"',','))) GROUP BY COL_VAL;

Related

Split large texts into chunks in separate rows

I have a table where some texts are atrociously big. I want to make sure every row in the query output does not exceed, say, 100,000 characters. How do I do that?
Here is a quick sample:
WITH large_texts AS (
(SELECT 'humongous text goes here' AS text ,'1' AS id)
UNION ALL
(SELECT 'small one' AS text ,'2' AS id)
UNION ALL
(SELECT 'and another big one over here' AS text ,'3' AS id)
)
SELECT * FROM large_texts
Let's say I want the output text column to be less than 10 characters, so I need this result:
+----+------------+
| id | text       |
+----+------------+
| 1  | humongous  |
| 1  | text goes  |
| 1  | here       |
| 2  | small one  |
| 3  | and anothe |
| 3  | r big one  |
| 3  | over here  |
+----+------------+
It would be even better if I could also avoid splitting in the middle of words.
Consider the approach below:
create temp function split_with_limit(text STRING, len FLOAT64)
returns ARRAY<STRING>
language js AS r"""
  // Greedily pack whole words into chunks shorter than `len` characters
  let input = text.trim().split(' ');
  let [index, output] = [0, []];
  output[index] = '';
  input.forEach(word => {
    let temp = `${output[index]} ${word}`.trim();
    if (temp.length < len) {
      output[index] = temp;
    } else {
      index++;
      output[index] = word;
    }
  });
  return output;
""";
select id, small_chunk
from yourtable_with_large_texts,
unnest(split_with_limit(text, 10)) small_chunk with offset
order by id, offset
If applied to the sample data in your question, the output is:
+----+-------------+
| id | small_chunk |
+----+-------------+
| 1  | humongous   |
| 1  | text goes   |
| 1  | here        |
| 2  | small one   |
| 3  | and         |
| 3  | another     |
| 3  | big one     |
| 3  | over here   |
+----+-------------+
One option is to use the REGEXP_EXTRACT_ALL function specifying the number of characters as a pattern:
SELECT id, chunk
FROM large_texts
CROSS JOIN UNNEST(REGEXP_EXTRACT_ALL(text, '.{1,10}')) AS chunk;
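A hedged caveat: RE2's . does not match newlines by default, so if the large texts contain line breaks, add the (?s) flag:
SELECT id, chunk
FROM large_texts
CROSS JOIN UNNEST(REGEXP_EXTRACT_ALL(text, r'(?s).{1,10}')) AS chunk;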
If you would like to split word by word, then try this:
WITH large_texts AS (
(SELECT 'humongous text goes here' AS text ,'1' AS id)
UNION ALL
(SELECT 'small one' AS text ,'2' AS id)
UNION ALL
(SELECT 'and another big one over here' AS text ,'3' AS id)
)
SELECT id, newtext FROM large_texts
CROSS JOIN UNNEST(split(text,' ')) AS newtext;
The result will look like:
+----+-----------+
| id | newtext   |
+----+-----------+
| 1  | humongous |
| 1  | text      |
| 1  | goes      |
| 1  | here      |
| 2  | small     |
| 2  | one       |
| 3  | and       |
| 3  | another   |
| 3  | big       |
| 3  | one       |
| 3  | over      |
| 3  | here      |
+----+-----------+
Try this:
with cte2 as
(
  SELECT 'humongous text goes here' AS text, '1' AS id
  UNION ALL
  SELECT 'small one' AS text, '2' AS id
  UNION ALL
  SELECT 'and another big one over here' AS text, '3' AS id
),
cte3 as
(
  -- one start offset per 10-character window
  select id, text, generate_array(0, length(text), 10) arr
  from cte2
),
cte4 as
(
  select *
  from cte3, unnest(arr) as idx with offset as rn
)
select id, substr(text, idx + 1, 10) as chunk
from cte4
Output:
+----+------------+
| id | chunk      |
+----+------------+
| 1  | humongous  |
| 1  | text goes  |
| 1  | here       |
| 2  | small one  |
| 3  | and anothe |
| 3  | r big one  |
| 3  | over here  |
+----+------------+
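One caveat, with an untested sketch of a fix: when length(text) is an exact multiple of 10, generate_array(0, length(text), 10) emits one index past the end and the final substr produces an empty chunk. Bounding the array in cte3 avoids that:
select id, text, generate_array(0, greatest(length(text) - 1, 0), 10) arr
from cte2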

Equivalent function in HANA DB for json_object

I would like to return query results in JSON format in HANA DB.
There is a json_object function in Oracle to achieve this requirement, but I am not seeing any such function in HANA.
Does anyone know if this kind of function exists in HANA?
For example:
Table Author contains non-json data as follows:
-------------------------
| firstName | lastName  |
-------------------------
| Paulo     | Coelho    |
| George    | Orwell    |
-------------------------
Write a SELECT statement to return the result as JSON.
In Oracle it can be returned using query:
SELECT json_object(
KEY 'firstName' VALUE author.first_name,
KEY 'lastName' VALUE author.last_name
)
FROM author
Output looks like this:
----------------------------------------------
| json_array                                 |
----------------------------------------------
| {"firstName":"Paulo","lastName":"Coelho"}  |
| {"firstName":"George","lastName":"Orwell"} |
----------------------------------------------
Does anyone know a query or function in HANA to achieve the same result?
You can use the standard JSON_QUERY function in SAP HANA too:
JSON_QUERY (
<JSON_API_common_syntax>
[ <JSON_output_clause> ]
[ <JSON_query_wrapper_behavior> ]
[ <JSON_query_empty_behavior> ON EMPTY ]
[ <JSON_query_error_behavior> ON ERROR ]
)
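Note that JSON_QUERY extracts a fragment from an existing JSON document rather than building JSON from relational columns. A minimal usage sketch (the literal and alias are illustrative):
SELECT JSON_QUERY('{"firstName":"Paulo","books":["A","B"]}', '$.books') AS books
FROM DUMMY;
-- books: ["A","B"]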
For 2.0 SP04 and above there is a FOR JSON addition to the SELECT statement. As the documentation says, it is only permitted in subqueries, so you either select individual columns in a subselect (if you need a result set of JSON objects) or generate a JSON array as a single scalar result. Column names are inherited from subquery aliases.
Case 1:
with a as (
select 'AAA' as field1, 'Value 1' as val from dummy union all
select 'BBB' as field1, 'Value 2' as val from dummy
)
select
/*Use correlated subquery with single row*/
json_value((select a.field1, a.val from dummy for json), '$[0]') as res
from a
Or, with more typing but less dependence on the table structure:
with a as (
select 'AAA' as field1, 'Value 1' as val from dummy union all
select 'BBB' as field1, 'Value 2' as val from dummy
)
, json_source as (
/*Intermediate query to use as correlation source in JSON_TABLE*/
select (select * from a for json) as tmp_json
from dummy
)
select json_parsed.*
from json_source,
json_table(
json_source.tmp_json
/*Access individual items*/
, '$[*]'
columns (
res nvarchar(1000) format json path '$'
)
) as json_parsed
Both return:
RES
{"FIELD1":"AAA","VAL":"Value 1"}
{"FIELD1":"BBB","VAL":"Value 2"}
Or as a scalar query returning JSON array (Case 2):
with a as (
select 'AAA' as field1, 'Value 1' as val from dummy union all
select 'BBB' as field1, 'Value 2' as val from dummy
)
select *
from (select * from a for json)
JSONRESULT
[{"FIELD1":"AAA","VAL":"Value 1"},{"FIELD1":"BBB","VAL":"Value 2"}]

Creating a table with dynamic json column - postgresql

I have 3 tables containing array columns: config_table, main_table and field_table. My objective is to get an output_table that can have variable column names (and obviously values) depending on what is configured in the config_table.
config_table:
type_id | fields | field_ids
1       | {A}    | {1}
1       | {B,C}  | {2,3}
1       | {D}    | {4}
main_table:
type_id | value
1       | 12
1       | 34
2       | 56
2       | 78
3       | 99
field_table:
value | field_data
12    | {"1": "Hello",
        "2": "foo",
        "3": "bar",
        "4": "Hi",
        "5": "ignore_this",
        "6": "ignore_this_too"}
34    | {"1": "Iam",
        "2": "out",
        "3": "of",
        "4": "words",
        "5": "orange",
        "6": "banana"}
56    | ...
78    | ...
99    | ...
EDIT
Since having dynamic/variable column names will not be feasible, the ideal output_table format would be:
type_id | value | json_data
1       | 12    | {"A": "Hello", "B-C": "foo-bar", "D": "Hi"}
1       | 34    | {"A": "Iam", "B-C": "out-of", "D": "words"}
I am trying to come up with a general solution that would allow me to create output_table for N values in a single field_ids entry in config_table.
EDIT 2
Removed the redundant type column, added fields 5 and 6 to field_table.field_data, and added type_id 2 and 3 to main_table (which need to be ignored in output_table because they are absent from config_table), to make the question easier to understand.
This gave me the desired output:
select M.type_id,
M.value,
--F.field_data,
JSON_OBJECT(ARRAY( select TD.temp_keys
from ( select array_to_string(TMD.fields, '-') as temp_keys,
array_to_string(TMD.field_values, '-') as temp_values
from (select MC.fields,
MC.field_ids,
array(select TF.value
from jsonb_each_text(F.field_data) TF
where TF.key::integer = any(MC.field_ids)) as field_values
from config_table MC
--where MC.field_ids = any(F.fields)
) TMD
) TD
),
ARRAY( select TD.temp_values
from ( select array_to_string(TMD.fields, '-') as temp_keys,
array_to_string(TMD.field_values, '-') as temp_values
from (select MC.fields,
MC.field_ids,
array(select TF.value
from jsonb_each_text(F.field_data) TF
where TF.key::integer = any(MC.field_ids)) as field_values
from config_table MC
--where MC.field_ids = any(F.fields)
) TMD
) TD
)
)
from main_table M
inner join field_table F
on M.value = F.value
where M.type_id in (select distinct CC.type_id from config_table CC)
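A more compact alternative, as a sketch (it assumes fields is text[], field_ids is int[], and field_data is jsonb; jsonb_object_agg collapses one key/value pair per config row into a single object):
SELECT m.type_id,
       m.value,
       jsonb_object_agg(
         array_to_string(c.fields, '-'),
         (SELECT string_agg(f.field_data ->> k::text, '-' ORDER BY ord)
          FROM unnest(c.field_ids) WITH ORDINALITY AS u(k, ord))
       ) AS json_data
FROM main_table m
JOIN field_table f ON f.value = m.value
JOIN config_table c ON c.type_id = m.type_id
GROUP BY m.type_id, m.value;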

Postgres json alias

I am writing an SQL query in PostgreSQL. Right now I have:
SELECT T.order,
(SELECT row_to_json(item) FROM (
SELECT T.title, T.subtitle, T.text
FROM table T
WHERE T.id = 1
) AS item)
FROM table T
WHERE T.id = 1;
The result is:
order | row_to_json
---------------+------------------------------------------------------
2 | {"title":"AAA","subtitle":"aaaa","text":"aaaa"}
But I need result:
order | row_to_json
---------------+------------------------------------------------------
2 | {"item":{"title":"AAA","subtitle":"aaaa","text":"aaaa"}}
Could you tell me how I can get it?
You don't need a subquery for such a result; using the Postgres function jsonb_build_object you can achieve your goal like this:
-- Test data:
WITH "table"( id, "order", title, subtitle, "text" ) AS (
VALUES ( 1::int, 2::int, 'AAA'::text, 'aaaa'::text, 'aaaa'::text),
( 2::int, 3::int, 'BBB'::text, 'bbbb'::text, 'bbbb'::text)
)
-- The query:
SELECT "order",
jsonb_build_object(
'item',
jsonb_build_object(
'title', title,
'subtitle', subtitle,
'text', "text"
)
) AS myjson
FROM "table"
WHERE id = 1;
-- Result:
order | myjson
-------+----------------------------------------------------------------
     2 | {"item": {"text": "aaaa", "title": "AAA", "subtitle": "aaaa"}}
(1 row)
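If you would rather keep the row_to_json shape of the original query, a sketch using a LATERAL subquery (table and column names as in the question):
SELECT t."order",
       json_build_object('item', row_to_json(item)) AS row_to_json
FROM "table" t
CROSS JOIN LATERAL (SELECT t.title, t.subtitle, t."text") AS item
WHERE t.id = 1;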

BigQuery : filter repeated fields with legacy SQL

I have the following table:
row | query_params | query_values
1   | foo          | bar
    | param        | val
2   | foo          | baz
JSON:
{
"query_params" : [ "foo", "param"],
"query_values" : [ "bar", "val" ]
}, {
"query_params" : [ "foo" ],
"query_values" : [ "baz" ]
}
Using legacy SQL, I want to filter repeated fields on their value, something like:
SELECT * FROM table WHERE query_params = 'foo'
Which would output:
row | query_params | query_values
1   | foo          | bar
2   | foo          | baz
PS: this question is related to the same question answered here using standard SQL.
I can't think of any better ideas for legacy SQL aside from using a JOIN after flattening each array separately. If you have a table T with the contents indicated above, you can do:
SELECT
[t1.row],
t1.query_params,
t2.query_values
FROM
FLATTEN((SELECT [row], query_params, POSITION(query_params) AS pos
FROM T WHERE query_params = 'foo'), query_params) AS t1
JOIN
FLATTEN((SELECT [row], query_values, POSITION(query_values) AS pos
FROM T), query_values) AS t2
ON [t1.row] = [t2.row] AND
t1.pos = t2.pos;
The idea is to associate the elements of the two arrays by row and position after filtering for query_params that are equal to 'foo'.
Try the version below:
SELECT [row], query_params, query_values
FROM (
SELECT [row], query_params, param_pos, query_values, POSITION(query_values) AS value_pos
FROM FLATTEN((
SELECT [row], query_params, POSITION(query_params) AS param_pos, query_values
FROM YourTable
), query_params)
WHERE query_params = 'foo'
)
WHERE param_pos = value_pos
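For comparison, a standard SQL sketch of the same filter (the table name T and the backticked row column are carried over from the answers above):
#standardSQL
SELECT t.`row`,
       param AS query_params,
       t.query_values[OFFSET(pos)] AS query_values
FROM T AS t, UNNEST(t.query_params) AS param WITH OFFSET pos
WHERE param = 'foo';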