Postgres Query of an Array using LIKE - sql

I am querying a database in Postgres using psql. I have used the following query to search a field called tags that has an array of text as its data type:
select count(*) from planet_osm_ways where 'highway' = ANY(tags);
I now need to create a query that searches the tags fields for any word starting with the letter 'A'. I tried the following:
select count(*) from planet_osm_ways where 'A%' LIKE ANY(tags);
This gives me a syntax error. Any suggestions on how to use LIKE with an array of text?

Use the unnest() function to convert the array to a set of rows:
SELECT count(distinct id)
FROM (
SELECT id, unnest(tags) tag
FROM planet_osm_ways) x
WHERE tag LIKE 'A%'
The count(distinct id) should count unique entries from the planet_osm_ways table; just replace id with your primary key's name.
That being said, you should really think about storing tags in a separate table, with many-to-one relationship with planet_osm_ways, or create a separate table for tags that will have many-to-many relationship with planet_osm_ways. The way you store tags now makes it impossible to use indexes while searching for tags, which means that each search performs a full table scan.

Here is another way to do it within the WHERE clause:
SELECT COUNT(*)
FROM planet_osm_ways
WHERE (
0 < (
SELECT COUNT(*)
FROM unnest(tags) AS tag
WHERE tag LIKE 'A%'
)
);
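For intuition, here is a minimal Python sketch of what both queries compute. The ways list is a hypothetical in-memory stand-in for planet_osm_ways, and startswith() plays the role of LIKE 'A%' (both are case-sensitive). It is only a model of the logic, not something you would run against the database.

```python
# Hypothetical stand-in for planet_osm_ways: (id, tags) pairs.
ways = [
    (1, ["highway", "Avenue"]),
    (2, ["building"]),
    (3, ["Alley", "Arcade"]),  # two matching tags, but still one row
    (4, []),
]

def count_ways_with_prefix(rows, prefix):
    """Mirror the WHERE-clause subquery: keep a row if any tag matches."""
    return sum(1 for _id, tags in rows if any(t.startswith(prefix) for t in tags))

def count_distinct_ids_after_unnest(rows, prefix):
    """Mirror unnest() + count(distinct id): expand tags, filter, dedupe ids."""
    unnested = [(row_id, tag) for row_id, tags in rows for tag in tags]
    return len({row_id for row_id, tag in unnested if tag.startswith(prefix)})

print(count_ways_with_prefix(ways, "A"))
print(count_distinct_ids_after_unnest(ways, "A"))
```

Both functions return the same count. The distinct step matters only in the unnest version, where a row with several matching tags would otherwise be counted once per tag.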

Related

How to select a column whose name is a value in another column in POSTGRESQL?

I know this isn't valid SQL, but I'd like to do something like:
SELECT items.{SELECT items.preferred_column}
To elaborate, to achieve what I'm trying to achieve, I could write a long case when statement:
SELECT
CASE
WHEN items.preferred_column = 'column_a' THEN items.column_a
WHEN items.preferred_column = 'column_b' THEN items.column_b
WHEN items.preferred_column = 'column_c' THEN items.column_c
... and so on...
But that seems wrong. I would prefer to write a query that looks at the value of items.preferred_column and loads that column.
Is this possible?
My use case involves an Active Record (the ORM for Rails) query, which limits me. I'm not able to use "INTO" for example.
Doing this without creating a SQL function would be preferred, though if it's not possible without creating a SQL function that would be good to know.
Thanks in advance for lending your expertise!
You can try transforming the table rows with row_to_json() and then expanding them with jsonb_each(). You can then join the resultant "key" field on the preferred_column:
WITH CTE AS (
SELECT
row_to_json(Z.*)::jsonb as rcr,
row_number() over(partition by null order by <whatever comparator clause>) as rn,
Z.*
FROM items Z)
SELECT b.value, a.*
FROM CTE a, jsonb_each(rcr) b, CTE c
WHERE c.rn=a.rn AND b.key = ( c.preferred_column )
Note that this essentially operates as a quasi-pivot, so you'll need to maintain an index (the row_number invocation) to self-join the table when extracting the appropriate key-value pairs from jsonb_each's set-return. Casting to jsonb is helpful in that the binary form stores the key-value pairs in a fixed, normalized key order within the object itself (not necessarily alphabetical).
If you need to get the resultant value as a text string instead of a json primitive, you can do
b.value #>>'{}'
instead of using jsonb_each_text(), which will preserve any json columns.
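To make the idea concrete, here is a small Python sketch of the same lookup. The items rows and their column values are made up for illustration. Each row is turned into a JSON object (the row_to_json step), its key/value pairs are walked (the jsonb_each step), and only the pair whose key equals preferred_column is kept.

```python
import json

# Hypothetical rows of the items table.
items = [
    {"id": 1, "preferred_column": "column_a", "column_a": "red", "column_b": 10},
    {"id": 2, "preferred_column": "column_b", "column_a": "blue", "column_b": 20},
]

def preferred_values(rows):
    """Return (id, value) where value comes from the column named in preferred_column."""
    out = []
    for row in rows:
        # row_to_json analogue: the row as a JSON object keyed by column name.
        as_json = json.loads(json.dumps(row))
        # jsonb_each analogue: iterate the (key, value) pairs of that object.
        for key, value in as_json.items():
            # Join condition: b.key = preferred_column
            if key == row["preferred_column"]:
                out.append((row["id"], value))
    return out

print(preferred_values(items))
```

As in the SQL version, the picked values keep their JSON types, so one row can yield a string and another a number.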

Extract nested values as columns Google BigQuery?

I have a table with nested values, like the following:
I'd like to grab the values, with keys as columns without multiple cross joins.
i.e.
SELECT
owner_id,
owner_type,
domain,
metafields.value AS name,
metafields.value AS image,
metafields.value AS location,
metafields.value AS draw
FROM
example_table
Obviously, the above won't work for this, but the following output would be desired:
In the actual table there are hundreds of metafields per owner_id, and hundreds of owner_ids, and owner_types. Multiple joins to other tables for owner_types is fine, but for the same owner type, I don't want to have to join multiple times.
Basically, I need to be able to select the key to which the column corresponds, and display the relevant value for that column. Without, having to display every metafield available.
Any way of doing this?
Consider below approach
select * except(id) from (
select t.* except(metafields),
to_json_string(t) id, key, value
from your_table t, unnest(metafields) kv
)
pivot (min(value) for key in ('name', 'image', 'location', 'draw'))
if applied to sample data in your question - output is
You can use subqueries and the SAFE_OFFSET operator to get a value from an array at a specific position.
Also, you need to use STRING_AGG, which returns a value (either STRING or BYTES) obtained by concatenating non-null values.
With the information you shared, you can use the query below.
With this code, you will get all the columns separated by a comma:
WITH sequences AS
(
SELECT 1 as ID,"product" AS owner_type,"beta.com" AS domain,["name","image","location","draw"] AS metalfields_key, ["big","pic.png","utha","1"] AS metalfields_value
),
Val as(
SELECT distinct id, owner_type,domain, value FROM sequences, sequences.metalfields_value as value, sequences.metalfields_key
), text as(
SELECT
id, owner_type, domain,
STRING_AGG(value ORDER BY value) AS Text
FROM Val
GROUP BY owner_type, domain, id
)
The final SELECT then splits that comma-separated string and returns each element as its own column:
SELECT DISTINCT t1.id, t1.owner_type,domain,
split(t1.text, ',')[SAFE_offset(1)] as name,
split(t1.text, ',')[SAFE_offset(2)] as image,
split(t1.text, ',')[SAFE_offset(3)] as location,
split(t1.text, ',')[SAFE_offset(0)] as draw
from text as t1
You can see the result.
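For readers who want to see the key-to-column reshaping outside BigQuery, here is a rough Python sketch of the pivot idea. The row layout (a metafields list of key/value dicts) is an assumption based on the question's description, and the sample values are made up. Keeping the smallest value per key mimics min(value) in the PIVOT clause.

```python
def pivot_metafields(rows, wanted_keys):
    """Turn each row's list of {key, value} pairs into one column per wanted key."""
    result = []
    for row in rows:
        # Copy every column except the nested one (t.* except(metafields)).
        flat = {k: v for k, v in row.items() if k != "metafields"}
        lookup = {}
        for kv in row["metafields"]:
            k, v = kv["key"], kv["value"]
            # min(value) analogue: keep the smallest value if a key repeats.
            if k in wanted_keys and (k not in lookup or v < lookup[k]):
                lookup[k] = v
        for k in wanted_keys:
            flat[k] = lookup.get(k)  # keys that are absent become None (NULL)
        result.append(flat)
    return result

# Hypothetical sample row; "color" is present but not requested.
rows = [{
    "owner_id": 1, "owner_type": "product", "domain": "beta.com",
    "metafields": [
        {"key": "name", "value": "big"},
        {"key": "image", "value": "pic.png"},
        {"key": "location", "value": "utha"},
        {"key": "draw", "value": "1"},
        {"key": "color", "value": "red"},
    ],
}]

print(pivot_metafields(rows, ["name", "image", "location", "draw"]))
```

Only the requested keys become columns, which matches the goal of not having to display every metafield available.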

How to efficiently select records matching substring in another table using BigQuery?

I have a table of several million strings that I want to match against a table of about twenty thousand strings like this:
#standardSQL
SELECT record.* FROM `record`
JOIN `fragment` ON record.name
LIKE CONCAT('%', fragment.name, '%')
Unfortunately this is taking an awfully long time.
Considering that the fragment table is only 20k records, can I load it into a JavaScript array using a UDF and match it that way? I'm trying to figure out how to do this right now, but perhaps there's already some magic I could do here to make this faster. I tried a CROSS JOIN and got resource exceeded fairly quickly. I've also tried using EXISTS but I can't reference the record.name inside that subquery's WHERE without getting an error.
Example using Public Data
This seems to reflect about the same amount of data ...
#standardSQL
WITH record AS (
SELECT LOWER(text) AS name
FROM `bigquery-public-data.hacker_news.comments`
), fragment AS (
SELECT LOWER(name) AS name, COUNT(*)
FROM `bigquery-public-data.usa_names.usa_1910_current`
GROUP BY name
)
SELECT record.* FROM `record`
JOIN `fragment` ON record.name
LIKE CONCAT('%', fragment.name, '%')
Below is for BigQuery Standard SQL
#standardSQL
WITH record AS (
SELECT LOWER(text) AS name
FROM `bigquery-public-data.hacker_news.comments`
), fragment AS (
SELECT DISTINCT LOWER(name) AS name
FROM `bigquery-public-data.usa_names.usa_1910_current`
), temp_record AS (
SELECT record, TO_JSON_STRING(record) id, name, item
FROM record, UNNEST(REGEXP_EXTRACT_ALL(name, r'\w+')) item
), temp_fragment AS (
SELECT name, item FROM fragment, UNNEST(REGEXP_EXTRACT_ALL(name, r'\w+')) item
)
SELECT AS VALUE ANY_VALUE(record) FROM (
SELECT ANY_VALUE(record) record, id, r.name name, f.name fragment_name
FROM temp_record r
JOIN temp_fragment f
USING(item)
GROUP BY id, name, fragment_name
)
WHERE name LIKE CONCAT('%', fragment_name, '%')
GROUP BY id
The query above completed in 375 seconds, while the original query was still running at 2740 seconds, so I did not wait for it to complete.
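The trick in that query is a cheap equi-join on shared words before the expensive substring test. Here is a minimal Python sketch of the same two-step idea. The function name and sample strings are hypothetical. Note the trade-off it shares with the SQL: a fragment that only occurs inside a longer word (say "ann" inside "planning") never becomes a candidate, so it is not reported.

```python
import re
from collections import defaultdict

def match_fragments(records, fragments):
    """Return {record: sorted fragments it contains}, using a word-level prefilter."""
    # temp_fragment analogue: index each fragment by the words it contains.
    by_word = defaultdict(set)
    for frag in fragments:
        for word in re.findall(r"\w+", frag.lower()):
            by_word[word].add(frag.lower())

    matches = {}
    for rec in records:
        text = rec.lower()
        # JOIN ... USING(item): candidates share at least one whole word.
        candidates = set()
        for word in set(re.findall(r"\w+", text)):
            candidates |= by_word.get(word, set())
        # Final filter: name LIKE CONCAT('%', fragment_name, '%')
        found = sorted(f for f in candidates if f in text)
        if found:
            matches[rec] = found
    return matches

records = ["I met Mary yesterday", "Planning is hard", "mary ann was here"]
print(match_fragments(records, ["Mary", "Ann"]))
```

The speedup comes from replacing the all-pairs LIKE comparison with a hash lookup per word, so the substring check runs only on a short candidate list.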
Mikhail's answer appears to be faster, but let's have one that doesn't need to SPLIT or separate the text into words.
First, compute a regular expression with all the words to be searched:
#standardSQL
WITH record AS (
SELECT text AS name
FROM `bigquery-public-data.hacker_news.comments`
), fragment AS (
SELECT name AS name, COUNT(*)
FROM `bigquery-public-data.usa_names.usa_1910_current`
GROUP BY name
)
SELECT FORMAT('(%s)',STRING_AGG(name,'|'))
FROM fragment
Now you can take that resulting string, and use it in a REGEX ignoring case:
#standardSQL
WITH record AS (
SELECT text AS name
FROM `bigquery-public-data.hacker_news.comments`
), largestring AS (
SELECT '(?i)(mary|margaret|helen|more_names|more_names|more_names|josniel|khaiden|sergi)'
)
SELECT record.* FROM `record`
WHERE REGEXP_CONTAINS(record.name, (SELECT * FROM largestring))
(~510 seconds)
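The same two steps (build one big alternation, then match it case-insensitively) look like this as a Python sketch. The helper names are made up. One deliberate difference: re.escape() is applied to each fragment, which the SQL above does not do; it only matters if a fragment contains regex metacharacters.

```python
import re

def build_fragment_regex(fragments):
    """Join all fragments into one alternation, like FORMAT('(%s)', STRING_AGG(name, '|'))."""
    # re.escape is an addition in this sketch; the SQL concatenates names as-is.
    alternation = "|".join(re.escape(f) for f in fragments)
    # re.IGNORECASE plays the role of the (?i) prefix.
    return re.compile("(" + alternation + ")", re.IGNORECASE)

def filter_records(records, fragments):
    """Keep the records that contain at least one fragment (REGEXP_CONTAINS analogue)."""
    if not fragments:
        return []  # an empty alternation would otherwise match every record
    pattern = build_fragment_regex(fragments)
    return [r for r in records if pattern.search(r)]

print(filter_records(["Helen wrote this", "nothing here", "ask MARY"], ["mary", "helen"]))
```

Unlike the word-based join, this finds fragments anywhere in the text, including inside longer words.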
As alluded to in my question, I worked on a version using a JavaScript UDF which solves this, albeit in a slower way than the answer I accepted. For completeness, I'm posting it here because perhaps someone (like myself in the future) may find it useful.
CREATE TEMPORARY FUNCTION CONTAINS_ANY(str STRING, fragments ARRAY<STRING>)
RETURNS STRING
LANGUAGE js AS """
for (var i in fragments) {
if (str.indexOf(fragments[i]) >= 0) {
return fragments[i];
}
}
return null;
""";
WITH record AS (
SELECT text AS name
FROM `bigquery-public-data.hacker_news.comments`
WHERE text IS NOT NULL
), fragment AS (
SELECT name AS name, COUNT(*)
FROM `bigquery-public-data.usa_names.usa_1910_current`
WHERE name IS NOT NULL
GROUP BY name
), fragment_array AS (
SELECT ARRAY_AGG(name) AS names, COUNT(*) AS count
FROM fragment
GROUP BY LENGTH(name)
), records_with_fragments AS (
SELECT record.name,
CONTAINS_ANY(record.name, fragment_array.names)
AS fragment_name
FROM record INNER JOIN fragment_array
ON CONTAINS_ANY(name, fragment_array.names) IS NOT NULL
)
SELECT * EXCEPT(rownum) FROM (
SELECT record.name,
records_with_fragments.fragment_name,
ROW_NUMBER() OVER (PARTITION BY record.name) AS rownum
FROM record
INNER JOIN records_with_fragments
ON records_with_fragments.name = record.name
AND records_with_fragments.fragment_name IS NOT NULL
) WHERE rownum = 1
The idea is that the list of fragments is small enough that it can be processed in an array, similar to Felipe's answer using regular expressions. The first thing I do is create a fragment_array table which is grouped by the fragment lengths ... a cheap way of preventing an over-sized array, which I found can cause UDF timeouts.
Next I create a table called records_with_fragments that joins those arrays to the original records, finding only those which contain a matching fragment using the JavaScript UDF CONTAINS_ANY(). This will result in a table containing some duplicates since one record may match multiple fragments.
The final SELECT then pulls in the original record table, joins to records_with_fragments to determine which fragment matched, and also uses the ROW_NUMBER() function to prevent duplicates, e.g. only showing the first row of each record as uniquely identified by its name.
Now, the reason I do the join in the final query is because in my actual data there are more fields I want besides just the string being matched. Earlier on in my actual data I create a table of DISTINCT strings which then later need to be re-joined.
Voila! Not the most elegant but it gets the job done.
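If it helps to see the flow without BigQuery, here is a rough Python model of the three steps described above. All names are hypothetical. Matching is case-sensitive, like indexOf in the UDF. Which fragment wins for a record is arbitrary in the SQL (ROW_NUMBER() has no ORDER BY); in this sketch it is simply the first hit.

```python
from collections import defaultdict

def contains_any(text, fragments):
    """Return the first fragment found in text, or None (mirrors the JS UDF)."""
    for frag in fragments:
        if frag in text:
            return frag
    return None

def group_by_length(fragments):
    """fragment_array analogue: one smaller array per fragment length."""
    groups = defaultdict(list)
    for frag in fragments:
        groups[len(frag)].append(frag)
    return list(groups.values())

def first_match_per_record(records, fragments):
    """Map each matching record to a single fragment, dropping duplicates."""
    result = {}
    for batch in group_by_length(fragments):
        for rec in records:
            hit = contains_any(rec, batch)
            # Keep one fragment per record, like WHERE rownum = 1.
            if hit is not None and rec not in result:
                result[rec] = hit
    return result

print(first_match_per_record(["Mary and Helen", "nobody"], ["Helen", "Mary", "Al"]))
```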

SQL query with regex

I have a table vehicles that has a column 'name'.
Values are stored like car/tesla, car/honda, truck/daimler (each value is stored as type/brand)
I want to query the table using only brand. If I look up tesla, it should return the row corresponding to car/tesla. How do I do it? I'm using postgres.
There is no need for a regex in your case. Just use good old LIKE:
select name
from vehicles
where name like '%/tesla'
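Two approaches are possible: a SELECT with the LIKE operator, or a SELECT with the CONTAINS predicate. Note that CONTAINS is a SQL Server full-text predicate and is not available in Postgres, so only the LIKE version applies to your setup.
select * from vehicles where name LIKE '%tesla%'
select * from vehicles where CONTAINS(name, 'tesla');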
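The two LIKE patterns in this thread are not equivalent, which a tiny Python sketch can show. The value car/teslarati is a made-up row added to expose the difference. endswith() models '%/tesla' and the in operator models '%tesla%'. LIKE wildcards inside the brand itself are not modeled.

```python
# Values from the question plus one hypothetical row.
names = ["car/tesla", "car/honda", "truck/daimler", "car/teslarati"]

def brand_equals(name, brand):
    """Model of name LIKE '%/<brand>': the brand must be the whole last segment."""
    return name.endswith("/" + brand)

def brand_contains(name, brand):
    """Model of name LIKE '%<brand>%': the brand may appear anywhere."""
    return brand in name

print([n for n in names if brand_equals(n, "tesla")])
print([n for n in names if brand_contains(n, "tesla")])
```

The anchored pattern returns only car/tesla, while the unanchored one also picks up car/teslarati, so the '%/tesla' form is the safer match for a type/brand column.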

Get each <tag> in String - stackexchange database

Mockup code for my problem:
SELECT Id FROM Tags WHERE TagName IN '<osx><keyboard><security><screen-lock>'
The problem in detail
I am trying to get tags used in 2011 from apple.stackexchange data. (this query)
As you can see, tags in tag changes are stored as plain text in the Text field.
<tag1><tag2><tag3>
<osx><keyboard><security><screen-lock>
How can I create a unique list of the tags, to look them up in the Tags table, instead of this hardcoded version:
SELECT * FROM Tags
WHERE TagName = 'osx'
OR TagName = 'keyboard'
OR TagName = 'security'
Here is an interactive example.
Stackexchange uses T-SQL, my local copy is running under postgresql using Postgres app version 9.4.5.0.
Assuming this table definition:
CREATE TABLE posthistory(post_id int PRIMARY KEY, tags text);
Depending on what you want exactly:
To convert the string to an array, trim leading and trailing '<>', then treat '><' as separator:
SELECT *, string_to_array(trim(tags, '><'), '><') AS tag_arr
FROM posthistory;
To get list of unique tags for whole table (I guess you want this):
SELECT DISTINCT tag
FROM posthistory, unnest(string_to_array(trim(tags, '><'), '><')) tag;
The implicit LATERAL join requires Postgres 9.3 or later.
This should be substantially faster than using regular expressions. If you want to try regexp, use regexp_split_to_table() instead of regexp_split_to_array() followed by unnest() like suggested in another answer:
SELECT DISTINCT tag
FROM posthistory, regexp_split_to_table(trim(tags, '><'), '><') tag;
Also with implicit LATERAL join. Related:
Split column into multiple rows in Postgres
What is the difference between LATERAL and a subquery in PostgreSQL?
To search for particular tags:
SELECT *
FROM posthistory
WHERE tags LIKE '%<security>%'
AND tags LIKE '%<osx>%';
SQL Fiddle.
Applied to your search in T-SQL in our data explorer:
SELECT TOP 100
PostId, UserId, Text AS Tags FROM PostHistory
WHERE year(CreationDate) = 2011
AND PostHistoryTypeId IN (3 -- initial tags
, 6 -- edit tags
, 9) -- rollback tags
AND Text LIKE ('%<' + ##TagName:String?postgresql## + '>%');
(T-SQL syntax uses the non-standard + instead of ||.)
https://data.stackexchange.com/apple/query/edit/417055
I've simplified the data to the relevant column only and called it tags to present the example.
Sample data
create table posthistory(tags text);
insert into posthistory values
('<lion><backup><time-machine>'),
('<spotlight><alfred><photo-booth>'),
('<lion><pdf><preview>'),
('<pdf>'),
('<asd>');
Query to get unique list of tags
SELECT DISTINCT
unnest(
regexp_split_to_array(
trim('><' from tags), '><'
)
)
FROM
posthistory
First we remove all occurrences of leading and trailing > and < signs from each row, then use the regexp_split_to_array() function to get the values into arrays, and then unnest() to expand each array to a set of rows. Finally, DISTINCT eliminates duplicate values.
Presenting SQLFiddle to preview how it works.
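The trim, split, unnest and DISTINCT steps used in these answers map onto a few lines of Python, sketched here with the sample rows from above. The function names are made up, and the result is sorted only to make the output stable; DISTINCT in SQL does not guarantee any order.

```python
def split_tags(tags):
    """'<a><b>' -> ['a', 'b']: trim outer '<' and '>', then split on '><'."""
    trimmed = tags.strip("<>")
    return trimmed.split("><") if trimmed else []

def unique_tags(rows):
    """unnest + DISTINCT analogue over all rows."""
    return sorted({tag for row in rows for tag in split_tags(row)})

posthistory = [
    "<lion><backup><time-machine>",
    "<spotlight><alfred><photo-booth>",
    "<lion><pdf><preview>",
    "<pdf>",
    "<asd>",
]

print(unique_tags(posthistory))
```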