Put strings inside quotes with string_to_array() - sql

I am using the following query:
WITH a as (SELECT unnest(string_to_array(animals, ',')) as "pets" FROM all_animals where id = 100)
select * from a
which returns the following data:
1 Cat
2 Dog
3 Bird
My question is: how can I format the string_to_array select above so that the returned data includes single quotes, like this:
1 'Cat'
2 'Dog'
3 'Bird'

Use quote_literal() to safely single-quote strings:
WITH a AS (
SELECT unnest(string_to_array(animals, ',')) AS pets
FROM all_animals
WHERE id = 100
)
SELECT quote_literal(pets) AS pets
FROM a;
Or shorter without the CTE:
SELECT quote_literal(unnest(string_to_array(animals, ','))) AS pets
FROM all_animals
WHERE id = 100;
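As a side note (not part of the original answer): quote_literal() also doubles any single quote already embedded in the value, so the result is always a valid literal:
SELECT quote_literal('O''Reilly');  -- returns: 'O''Reilly'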
db<>fiddle here

selecting row if value of attribute in array of objects one of multiple values

I have data shaped like this: arrays of objects in a jsonb column in postgres
id | data
---|---------------------------------------------
1  | [{"a":3, "b":"green"} ,{"a":5, "b":"blue"}]
2  | [{"a":3, "b":"red"} ,{"a":5, "b":"yellow"}]
3  | [{"a":3, "b":"orange"} ,{"a":5, "b":"blue"}]
I am trying to select the rows where b is either "green" or "yellow".
I know I can unroll the data using jsonb_array_elements to get all the b values:
select jsonb_array_elements(data) ->> 'b' from table
but I am failing to use that in a WHERE clause like this:
select * from table where jsonb_array_elements(data) ->> 'b' && ARRAY["green","yellow"]::varchar[]
(not working: "set-returning functions are not allowed in WHERE")
You can use the @> (containment) operator
select *
from the_table
where data @> '[{"b": "green"}]'
or data @> '[{"b": "yellow"}]'
Or a JSON path expression (Postgres 12 or later):
select *
from the_table
where data @@ '$[*].b == "green" || $[*].b == "yellow"';
Or by unnesting the array with an EXISTS condition:
select t.*
from the_table t
where exists (select *
from jsonb_array_elements(t.data) as x(item)
where x.item ->> 'b' in ('green', 'yellow'))
You can try using a subquery with a column alias and ANY, like below:
SELECT *
FROM (
select *,jsonb_array_elements(data) ->> 'b' val
from t
) t1
WHERE t1.val = ANY (ARRAY['green','yellow'])
sqlfiddle
NOTE: the ARRAY filter values need to use single quotes instead of double quotes.

Return five rows of random DNA instead of just one

This is the code I have to create a string of DNA:
prepare dna_length(int) as
with t1 as (
select chr(65) as s
union select chr(67)
union select chr(71)
union select chr(84) )
, t2 as ( select s, row_number() over() as rn from t1)
, t3 as ( select generate_series(1,$1) as i, round(random() * 4 + 0.5) as rn )
, t4 as ( select t2.s from t2 join t3 on (t2.rn=t3.rn))
select array_to_string(array(select s from t4),'') as dna;
execute dna_length(20);
I am trying to figure out how to re-write this to give a table of 5 rows of strings of DNA of length 20 each, instead of just one row. This is for PostgreSQL.
I tried:
CREATE TABLE dna_table(g int, dna text);
INSERT INTO dna_table (1, execute dna_length(20));
But this does not seem to work. I am an absolute beginner. How do I do this properly?
PREPARE creates a prepared statement that can be used "as is". If your prepared statement returns one string, then you can only get one string. You can't use it inside other statements such as INSERT.
In your case you may create a function:
create or replace function dna_length(int) returns text as
$$
with t1 as (
select chr(65) as s
union
select chr(67)
union
select chr(71)
union
select chr(84))
, t2 as (select s,
row_number() over () as rn
from t1)
, t3 as (select generate_series(1, $1) as i,
round(random() * 4 + 0.5) as rn)
, t4 as (select t2.s
from t2
join t3 on (t2.rn = t3.rn))
select array_to_string(array(select s from t4), '') as dna
$$ language sql;
And use it in a way like this:
insert into dna_table(g, dna) select generate_series(1,5), dna_length(20)
From the official doc:
PREPARE creates a prepared statement. A prepared statement is a server-side object that can be used to optimize performance. When the PREPARE statement is executed, the specified statement is parsed, analyzed, and rewritten. When an EXECUTE command is subsequently issued, the prepared statement is planned and executed. This division of labor avoids repetitive parse analysis work, while allowing the execution plan to depend on the specific parameter values supplied.
About functions.
This can be much simpler and faster:
SELECT string_agg(CASE ceil(random() * 4)
WHEN 1 THEN 'A'
WHEN 2 THEN 'C'
WHEN 3 THEN 'T'
WHEN 4 THEN 'G'
END, '') AS dna
FROM generate_series(1,100) g -- 100 = 5 rows * 20 nucleotides
GROUP BY g%5;
random() produces a random value in the range 0.0 <= x < 1.0. Multiply by 4 and take the mathematical ceiling with ceil() (cheaper than round()), and you get a random distribution of the numbers 1-4. Convert to ACTG, and aggregate with GROUP BY g%5 - % being the modulo operator.
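If you want to convince yourself of the distribution, here is a quick ad-hoc check (my own snippet, not part of the original answer):
SELECT ceil(random() * 4) AS n, count(*)
FROM generate_series(1, 100000)
GROUP BY n
ORDER BY n;
Each of the values 1-4 should come out close to 25000.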
About string_agg():
Concatenate multiple result rows of one column into one, group by another column
As a prepared statement, taking:
$1 ... the number of rows
$2 ... the number of nucleotides per row
PREPARE dna_length(int, int) AS
SELECT string_agg(CASE ceil(random() * 4)
WHEN 1 THEN 'A'
WHEN 2 THEN 'C'
WHEN 3 THEN 'T'
WHEN 4 THEN 'G'
END, '') AS dna
FROM generate_series(1, $1 * $2) g
GROUP BY g%$1;
Call:
EXECUTE dna_length(5,20);
Result:
| dna |
| :------------------- |
| ATCTTCGACACGTCGGTACC |
| GTGGCTGCAGATGAACAGAG |
| ACAGCTTAAAACACTAAGCA |
| TCCGGACCTCTCGACCTTGA |
| CGTGCGGAGTACCCTAATTA |
db<>fiddle here
If you need it a lot, consider a function instead. See:
What is the difference between a prepared statement and a SQL or PL/pgSQL function, in terms of their purposes?
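A minimal sketch of such a function (my own name and signature, not taken from the answer), wrapping the same string_agg() query:
CREATE OR REPLACE FUNCTION random_dna(n_rows int, n_len int)
  RETURNS SETOF text
  LANGUAGE sql VOLATILE AS
$$
SELECT string_agg(CASE ceil(random() * 4)
                    WHEN 1 THEN 'A'
                    WHEN 2 THEN 'C'
                    WHEN 3 THEN 'T'
                    WHEN 4 THEN 'G'
                  END, '') AS dna
FROM generate_series(1, n_rows * n_len) g
GROUP BY g % n_rows;
$$;
SELECT * FROM random_dna(5, 20);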

How to get count of matches in field of table for list of phrases from another table in bigquery?

Given an arbitrary list of phrases phrase1, phrase2*, ... phraseN (say these are in another table Phrase_Table), how would one get the count of matches for each phrase in a field F in a BigQuery table?
Here, "*" means there must be some non-empty/non-blank string after the phrase.
Let's say you have a table with an ID field and two string fields, Field1 and Field2.
Output would look something like
id, CountOfPhrase1InField1, CountOfPhrase2InField1, CountOfPhrase1InField2, CountOfPhrase2InField2
or I guess instead of all of those output fields maybe there's a single json object field
id, [{"fieldName": Field1, "counts": {phrase1: m, phrase2: mm, ...},
{"fieldName": Field2, "counts": {phrase1: m2, phrase2: mm2, ...},...]
Thanks!
Below example is for BigQuery Standard SQL
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'foo1 foo foo40' str UNION ALL
SELECT 'test1 test test2 test'
), `project.dataset.keywords` AS (
SELECT 'foo' key UNION ALL
SELECT 'test'
)
SELECT str, ARRAY_AGG(STRUCT(key, ARRAY_LENGTH(REGEXP_EXTRACT_ALL(str, CONCAT(key, r'[^\s]'))) as matches)) all_matches
FROM `project.dataset.table`
CROSS JOIN `project.dataset.keywords`
GROUP BY str
with result
Row | str                   | all_matches.key | all_matches.matches
1   | foo1 foo foo40        | foo             | 2
    |                       | test            | 0
2   | test1 test test2 test | foo             | 0
    |                       | test            | 2
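The regular expression is what implements the "phrase*" requirement from the question: CONCAT(key, r'[^\s]') only counts the keyword when it is immediately followed by a non-whitespace character. A standalone check (my own snippet, not from the answer):
#standardSQL
SELECT ARRAY_LENGTH(REGEXP_EXTRACT_ALL('foo1 foo foo40', r'foo[^\s]')) AS matches  -- 2: 'foo1' and 'foo4'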
If you prefer the output as JSON, you can add TO_JSON_STRING() as in the example below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'foo1 foo foo40' str UNION ALL
SELECT 'test1 test test2 test'
), `project.dataset.keywords` AS (
SELECT 'foo' key UNION ALL
SELECT 'test'
)
SELECT str, TO_JSON_STRING(ARRAY_AGG(STRUCT(key, ARRAY_LENGTH(REGEXP_EXTRACT_ALL(str, CONCAT(key, r'[^\s]'))) as matches))) all_matches
FROM `project.dataset.table`
CROSS JOIN `project.dataset.keywords`
GROUP BY str
with output
Row | str                   | all_matches
1   | foo1 foo foo40        | [{"key":"foo","matches":2},{"key":"test","matches":0}]
2   | test1 test test2 test | [{"key":"foo","matches":0},{"key":"test","matches":2}]
there are endless ways of presenting outputs like above - hope you will adjust it to whatever exactly you need :o)

How to search for multiple matches using the IN operator in BigQuery?

Right now I am filtering my rows by using the WHERE operator and 2 conditional statements. It seems somewhat inefficient that I am writing 2 conditions. Would it be possible to check whether "amznbida" and "ksga" are both in the array with a single condition?
standardSQL
-- Get all the keys
SELECT
*
FROM `encoded-victory-198215.DFP_TEST.test3`
WHERE
"amznbida" IN UNNEST(ARRAY(SELECT name FROM UNNEST(keywords)))
AND
"ksga"IN UNNEST(ARRAY(SELECT name FROM UNNEST(keywords)))
Just remove the UNNEST(ARRAY( part and leave the subquery - you should be fine.
working example:
SELECT
*,
t in (select * from unnest(a)) condition
FROM unnest([
struct('a' as t, ['a', 'b', 'c'] as a),
('b',['r', 'f'])
])
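Applied to the query from the question, that would be something like the sketch below (assuming the same test3 table and keywords struct):
#standardSQL
SELECT *
FROM `encoded-victory-198215.DFP_TEST.test3`
WHERE "amznbida" IN (SELECT name FROM UNNEST(keywords))
AND "ksga" IN (SELECT name FROM UNNEST(keywords))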
Below is for BigQuery Standard SQL
#standardSQL
SELECT *
FROM `encoded-victory-198215.DFP_TEST.test3`
WHERE 2 = (SELECT COUNT(DISTINCT name) FROM UNNEST(keywords) WHERE name IN ("amznbida", "ksga"))
You can test and play with the above using dummy data as below
#standardSQL
WITH `encoded-victory-198215.DFP_TEST.test3` AS (
SELECT
ARRAY<STRUCT<value ARRAY<STRING>, name STRING>>[
STRUCT(['ksg-1', 'ksg-2'], 'ksga'), STRUCT(['amznbid-1', 'amznbid-2'], 'amznbida')
] keywords,
1 impression UNION ALL
SELECT
ARRAY<STRUCT<value ARRAY<STRING>, name STRING>>[
STRUCT(['xxx-1', 'xxx-2'], 'xxxa'), STRUCT(['amznbid-1', 'amznbid-2'], 'amznbida')
] keywords,
2 impression
)
SELECT *
FROM `encoded-victory-198215.DFP_TEST.test3`
WHERE 2 = (SELECT COUNT(DISTINCT name) FROM UNNEST(keywords) WHERE name IN ("amznbida", "ksga"))
with result
Row | keywords.value | keywords.name | impression
1   | ksg-1          | ksga          | 1
    | ksg-2          |               |
    | amznbid-1      | amznbida      |
    | amznbid-2      |               |
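If the list of required keywords grows, the same idea generalizes (my own sketch, not part of the original answer): compare the distinct-match count against the length of the keyword array.
#standardSQL
SELECT *
FROM `encoded-victory-198215.DFP_TEST.test3`
WHERE ARRAY_LENGTH(['amznbida', 'ksga']) =
(SELECT COUNT(DISTINCT name) FROM UNNEST(keywords) WHERE name IN UNNEST(['amznbida', 'ksga']))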

Count for a list of items with zero for those that do not exist

If I have a table t1 with:
my_col
------
foo
foo
bar
And I have a list with foo and hello
How can I get:
my_col | count
-------|-------
foo | 2
hello | 0
If I just do
SELECT my_col, COUNT(*)
FROM t1
WHERE my_col in ('foo', 'hello')
GROUP BY my_col
I get
my_col | count
-------|------
foo | 2
without any value for hello.
I specifically want this to work against a list of items, because this will be called from a program where the list is a variable.
Ideally you should maintain a separate table with all the possible column values which you want to appear in your report. In the absence of that, we can try using a CTE here:
WITH cte AS (
SELECT 'foo' AS my_col UNION ALL
SELECT 'bar' UNION ALL
SELECT 'hello'
)
SELECT
a.my_col,
COUNT(b.my_col) AS count
FROM cte a
LEFT JOIN t1 b
ON a.my_col = b.my_col
WHERE
a.my_col IN ('foo', 'hello')
GROUP BY
a.my_col;
Demo
Here's yet another way, using values:
select
t2.my_col, count (t1.my_col)
from
(values ('foo'), ('hello')) as t2 (my_col)
left join t1 on t1.my_col = t2.my_col
group by
t2.my_col
Note that count (t1.my_col) returns 0 for "hello" since nulls are not counted. count (*) by contrast would have returned 1 for "hello" because it would be counting the row.
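A quick side-by-side of the two counts (hypothetical, reusing the same values list and t1 from above):
select
t2.my_col,
count(t1.my_col) as count_col, -- 0 for 'hello': nulls are not counted
count(*) as count_star -- 1 for 'hello': the joined row itself is still counted
from
(values ('foo'), ('hello')) as t2 (my_col)
left join t1 on t1.my_col = t2.my_col
group by
t2.my_col;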
You can turn your list into a set of rows and use a LEFT JOIN, like:
SELECT x.val, COUNT(t.my_col)
FROM
(SELECT 'foo' val UNION SELECT 'hello') x
LEFT JOIN t ON t.my_col = x.val
GROUP BY x.val
Postgres solution:
One way is to place the 'list' into an ARRAY, and then convert the ARRAY into a column using unnest. Then perform a left join on that column with the other table and perform a count.
WITH t1 AS (
SELECT 'foo' AS my_col UNION ALL
SELECT 'foo' UNION ALL
SELECT 'bar'
)
SELECT
a.my_col,
COUNT(b.my_col) AS count
FROM unnest(ARRAY['foo', 'hello']) a (my_col)
LEFT JOIN t1 b
ON a.my_col = b.my_col
GROUP BY
a.my_col;
The issue I had with the other answers is that (while they helped me get to the solution) they did not provide a solution where the items of interest were in a single list (which isn't an actual SQL term, so the fault is on me).
However, my real use case is to perform a native query using Java and Hibernate, and unfortunately the above does not work because the typing cannot be determined. Instead I converted my list into a single string and used string_to_array in place of the ARRAY function.
So the solution that worked best for my use case is below (at this point the other answers would be just as correct, since I'm now having to do manual string manipulation, but I'm leaving this here for the sake of posterity):
WITH t1 AS (
SELECT 'foo' AS my_col UNION ALL
SELECT 'foo' UNION ALL
SELECT 'bar'
)
SELECT
a.my_col,
COUNT(b.my_col) AS count
FROM unnest(string_to_array('foo, hello', ',')) a (my_col)
LEFT JOIN t1 b
ON a.my_col = b.my_col
GROUP BY
a.my_col;
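One thing to watch out for (my own note, not from the original answers): string_to_array('foo, hello', ',') keeps the leading space in ' hello', so if the list arrives with spaces after the commas it may be safer to trim the elements before joining, for example:
WITH t1 AS (
SELECT 'foo' AS my_col UNION ALL
SELECT 'foo' UNION ALL
SELECT 'bar'
)
SELECT
btrim(a.my_col) AS my_col,
COUNT(b.my_col) AS count
FROM unnest(string_to_array('foo, hello', ',')) a (my_col)
LEFT JOIN t1 b
ON btrim(a.my_col) = b.my_col
GROUP BY
btrim(a.my_col);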