How do I get rid of inherited SELECTs in Postgres queries?

Postgres 9.4
I guess queries like this are not the best approach in terms of database performance:
SELECT t.name,
t.description,
t.rating,
t.readme,
t.id AS userid,
t.notifications
FROM ( SELECT "user".name,
"user".description,
"user".rating,
"user".readme,
"user".id,
( SELECT array_to_json(array_agg(row_to_json(notifications.*))) AS array_to_json
FROM ( SELECT notification.id,
notification.action_type,
notification.user_id,
notification.user_name,
notification.resource_id,
notification.resource_name,
notification.resource_type,
notification.rating,
notification.owner
FROM notification
WHERE (notification.owner = "user".id)
ORDER BY notification.created DESC) notifications) AS notifications
FROM "user") t
The notifications column contains a JSON array with all the matching rows from the notification table.
How should I rewrite this query so that it returns the data in the same shape? I suppose I should use JOINs somehow.
I have other queries that use more than one nested SELECT.
Thank you for your time!

The outermost query only aliases id to userid. You can move the alias to the inner query, and omit the outer query entirely.
Then you can create a function to create the notification JSON:
create or replace function get_user_notifications(user_id bigint)
returns json language sql as
$$
select array_to_json(array_agg(row_to_json(n)))
from (
select id
, action_type
, ... other columns from notification ...
from notification
-- Qualify the parameter with the function name so it is not mistaken for the user_id column
where notification.owner = get_user_notifications.user_id
order by
created desc
) n
$$;
Now you can write the query as:
select id as userid
, ... other columns from "user" ...
, get_user_notifications(id) as notifications
from "user" u;
Which looks a lot better, at the cost of having to maintain Postgres functions.
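For the JOIN-based shape the question asks about, one option is a single LEFT JOIN whose rows are then grouped per user. The sketch below is only an illustration: it uses Python's built-in sqlite3 module instead of Postgres so that it runs anywhere, the table contents are invented, only a few columns are included, and the grouping happens on the client rather than with array_agg.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "user" (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE notification (
        id INTEGER PRIMARY KEY, owner INTEGER, action_type TEXT, created INTEGER
    );
    INSERT INTO "user" VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO notification VALUES
        (10, 1, 'like', 100), (11, 1, 'follow', 200);
""")

# One join over both tables instead of one subquery per user row.
rows = conn.execute("""
    SELECT u.id, u.name, n.id, n.action_type
    FROM "user" u
    LEFT JOIN notification n ON n.owner = u.id
    ORDER BY u.id, n.created DESC
""").fetchall()

users = {}
for user_id, name, notif_id, action_type in rows:
    user = users.setdefault(
        user_id, {"userid": user_id, "name": name, "notifications": []}
    )
    if notif_id is not None:  # LEFT JOIN yields NULLs for users without notifications
        user["notifications"].append({"id": notif_id, "action_type": action_type})

result = list(users.values())
print(json.dumps(result, indent=2))
```

Note one difference from the original query: a user without notifications gets an empty list here, whereas array_agg over no rows yields NULL. In Postgres itself the same LEFT JOIN could presumably be aggregated on the server with GROUP BY and a JSON aggregate.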

Related

PostgreSQL subqueries as values

I am trying to use a PostgreSQL INSERT query with a subquery as a parameter value. The goal is to first find the user_id that corresponds to a given auth_token in the user_info table, and then create a new entry in a different table with that user_id.
My query looks something like this
INSERT INTO user_movies(user_id, date, time, movie, rating)
VALUES ((SELECT user_id FROM user_info where auth_token = $1),$2,$3,$4,$5)
RETURNING *
I know that a query such as this will work with a single value
INSERT INTO user_movies(user_id)
SELECT user_id FROM user_info where auth_token = $1
RETURNING *
but how do I allow for multiple input values? Is this even possible in PostgreSQL?
I am also using Node.js to run this query, hence the $ placeholders.
To expand on my comment (it is probably a solution, IIUC): Easiest in this case would be to make the inner query return all the values. So, assuming columns from the inner query have the right names, you could just
INSERT INTO user_movies(user_id, date, time, movie, rating)
SELECT user_id,$2,$3,$4,$5 FROM user_info where auth_token = $1
RETURNING *
Note this form is also without VALUES, it uses a query instead.
Edited 20220424: a_horse_with_no_name removed the useless brackets around SELECT ... that appeared in my original version; thanks!
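As a runnable illustration of the INSERT ... SELECT form, here is a minimal sketch using Python's built-in sqlite3 module. It is not the asker's Node.js/Postgres setup: the placeholders are ? instead of $1..$5, RETURNING is left out, and the add_movie helper and the sample data are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_info (user_id INTEGER PRIMARY KEY, auth_token TEXT UNIQUE);
    CREATE TABLE user_movies (
        user_id INTEGER, date TEXT, time TEXT, movie TEXT, rating INTEGER
    );
    INSERT INTO user_info VALUES (7, 'token-abc');
""")

def add_movie(auth_token, date, time, movie, rating):
    """Insert one row; user_id comes from the lookup, the rest from parameters."""
    cur = conn.execute(
        """
        INSERT INTO user_movies (user_id, date, time, movie, rating)
        SELECT user_id, ?, ?, ?, ?
        FROM user_info
        WHERE auth_token = ?
        """,
        (date, time, movie, rating, auth_token),
    )
    return cur.rowcount  # 0 when no user has this token, so nothing is inserted

inserted = add_movie("token-abc", "2022-04-24", "20:00", "Alien", 9)
missing = add_movie("no-such-token", "2022-04-24", "20:00", "Alien", 9)
stored = conn.execute("SELECT * FROM user_movies").fetchall()
print(inserted, missing, stored)
```

One behavioural difference worth knowing: with VALUES ((SELECT ...), ...) an unknown token produces a row with a NULL user_id (or a constraint error), while the INSERT ... SELECT form simply inserts no row.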
You could try using a WHERE ... IN clause:
INSERT INTO user_movies(user_id)
SELECT user_id
FROM user_info
WHERE auth_token IN ($1,$2,$3,$4,$5)
RETURNING *

How to dynamically SELECT from manually partitioned table

Suppose I have table of tenants like so;
CREATE TABLE tenants (
name varchar(50)
)
And for each tenant, I have a corresponding table called {tenants.name}_entities, so for example for tenant_a I would have the following table.
CREATE TABLE tenant_a_entities (
id uuid,
last_updated timestamp
)
Is there a way I can create a query with the following structure? (using create table syntax to show what I'm looking for)
CREATE TABLE all_tenant_entities (
tenant_name varchar(50),
id uuid,
last_updated timestamp
)
--
I do understand this is a strange DB layout, I'm playing around with foreign data in Postgres to federate foreign databases.
Did you consider declarative partitioning for your relational design? List partitioning for your case, with PARTITION BY LIST ...
To answer the question at hand:
You don't need the table tenants for the query at all, just the detail tables. And one way or another you'll end up with UNION ALL to stitch them together.
SELECT 'a' AS tenant_name, id, last_updated FROM tenant_a_entities
UNION ALL SELECT 'b', id, last_updated FROM tenant_b_entities
...
You can add the name dynamically, like:
SELECT tableoid::regclass::text, id, last_updated FROM tenant_a_entities
UNION ALL SELECT tableoid::regclass::text, id, last_updated FROM tenant_b_entities
...
See:
Get the name of a row's source table when querying the parent it inherits from
But it's cheaper to add a constant name while building the query dynamically in your case (the first code example) - like this, for example:
SELECT string_agg(format('SELECT %L AS tenant_name, id, last_updated FROM %I'
, split_part(tablename, '_', 2)
, tablename)
, E'\nUNION ALL '
ORDER BY tablename) -- optional order
FROM pg_catalog.pg_tables
WHERE schemaname = 'public' -- actual schema name
AND tablename LIKE 'tenant\_%\_entities';
Tenant names cannot contain _, or you have to do more.
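The same "build the UNION ALL text from the catalog" idea can be sketched on the client side. The example below is an assumption-laden illustration, not the Postgres solution: it uses Python's sqlite3 module, reads table names from sqlite_master instead of pg_tables, and the quote_ident and quote_literal helpers are hand-written stand-ins for %I and %L in format().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tenant_a_entities (id TEXT, last_updated TEXT);
    CREATE TABLE tenant_b_entities (id TEXT, last_updated TEXT);
    CREATE TABLE unrelated (id TEXT);
    INSERT INTO tenant_a_entities VALUES ('a-1', '2024-01-01');
    INSERT INTO tenant_b_entities VALUES ('b-1', '2024-02-01');
""")

def quote_ident(name):
    """Quote an identifier, doubling embedded double quotes (stand-in for %I)."""
    return '"' + name.replace('"', '""') + '"'

def quote_literal(value):
    """Quote a string literal, doubling embedded single quotes (stand-in for %L)."""
    return "'" + value.replace("'", "''") + "'"

# Find the per-tenant tables; the backslash makes _ match literally.
tables = [row[0] for row in conn.execute(
    r"""
    SELECT name FROM sqlite_master
    WHERE type = 'table' AND name LIKE 'tenant\_%\_entities' ESCAPE '\'
    ORDER BY name
    """
)]

# The tenant name is the second _-separated part, as with split_part(tablename, '_', 2).
union_sql = "\nUNION ALL ".join(
    f"SELECT {quote_literal(t.split('_')[1])} AS tenant_name, id, last_updated "
    f"FROM {quote_ident(t)}"
    for t in tables
)
all_entities = conn.execute(union_sql).fetchall()
print(union_sql)
print(all_entities)
```

As in the SQL version, this only works while tenant names contain no underscore.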
Related:
Table name as a PostgreSQL function parameter
How to check if a table exists in a given schema
You can wrap it in a custom function to make it completely dynamic:
CREATE OR REPLACE FUNCTION public.f_all_tenant_entities()
RETURNS TABLE(tenant_name text, id uuid, last_updated timestamp)
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE
(
SELECT string_agg(format('SELECT %L AS tn, id, last_updated FROM %I'
, split_part(tablename, '_', 2)
, tablename)
, E'\nUNION ALL '
ORDER BY tablename) -- optional order
FROM pg_tables
WHERE schemaname = 'public' -- your schema name here
AND tablename LIKE 'tenant\_%\_entities'
);
END
$func$;
Call:
SELECT * FROM public.f_all_tenant_entities();
You can use this set-returning function (a.k.a "table-function") just like a table in most contexts in SQL.
Related:
How to UNION a list of tables retrieved from another table with a single query?
Simulate CREATE DATABASE IF NOT EXISTS for PostgreSQL?
Function to loop through and select data from multiple tables
Note that RETURN QUERY does not allow parallel queries before Postgres 14. The release notes:
Allow plpgsql's RETURN QUERY to execute its query using parallelism (Tom Lane)

How to efficiently select records matching substring in another table using BigQuery?

I have a table of several million strings that I want to match against a table of about twenty thousand strings like this:
#standardSQL
SELECT record.* FROM `record`
JOIN `fragment` ON record.name
LIKE CONCAT('%', fragment.name, '%')
Unfortunately this is taking an awfully long time.
Considering that the fragment table is only 20k records, can I load it into a JavaScript array using a UDF and match it that way? I'm trying to figure out how to do this right now, but perhaps there's already some magic I could use here to make this faster. I tried a CROSS JOIN and got "resources exceeded" fairly quickly. I've also tried using EXISTS, but I can't reference record.name inside that subquery's WHERE without getting an error.
Example using Public Data
This seems to reflect about the same amount of data ...
#standardSQL
WITH record AS (
SELECT LOWER(text) AS name
FROM `bigquery-public-data.hacker_news.comments`
), fragment AS (
SELECT LOWER(name) AS name, COUNT(*)
FROM `bigquery-public-data.usa_names.usa_1910_current`
GROUP BY name
)
SELECT record.* FROM `record`
JOIN `fragment` ON record.name
LIKE CONCAT('%', fragment.name, '%')
Below is for BigQuery Standard SQL
#standardSQL
WITH record AS (
SELECT LOWER(text) AS name
FROM `bigquery-public-data.hacker_news.comments`
), fragment AS (
SELECT DISTINCT LOWER(name) AS name
FROM `bigquery-public-data.usa_names.usa_1910_current`
), temp_record AS (
SELECT record, TO_JSON_STRING(record) id, name, item
FROM record, UNNEST(REGEXP_EXTRACT_ALL(name, r'\w+')) item
), temp_fragment AS (
SELECT name, item FROM fragment, UNNEST(REGEXP_EXTRACT_ALL(name, r'\w+')) item
)
SELECT AS VALUE ANY_VALUE(record) FROM (
SELECT ANY_VALUE(record) record, id, r.name name, f.name fragment_name
FROM temp_record r
JOIN temp_fragment f
USING(item)
GROUP BY id, name, fragment_name
)
WHERE name LIKE CONCAT('%', fragment_name, '%')
GROUP BY id
The above completed in 375 seconds, while the original query was still running at 2740 seconds, so I did not wait for it to complete.
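To make the idea behind that query easier to follow, here is a small in-memory sketch in Python (not BigQuery). It mirrors the same steps under my reading of the SQL: split records and fragments into words, pair up those that share a word, then confirm with a substring check. The function name and sample strings are invented for the illustration.

```python
import re
from collections import defaultdict

def match_by_shared_word(records, fragments):
    """Return {record: sorted matching fragments} using a word index as prefilter."""
    word_re = re.compile(r"\w+")

    # Index: word -> fragments containing that word (the temp_fragment step).
    fragments_by_word = defaultdict(set)
    for fragment in fragments:
        for word in word_re.findall(fragment):
            fragments_by_word[word].add(fragment)

    matches = {}
    for record in records:
        # Candidates share at least one whole word with the record (the join on item).
        candidates = set()
        for word in set(word_re.findall(record)):
            candidates |= fragments_by_word.get(word, set())
        # Final check mirrors WHERE name LIKE CONCAT('%', fragment_name, '%').
        found = sorted(f for f in candidates if f in record)
        if found:
            matches[record] = found
    return matches

records = ["i met mary ann yesterday", "rosemary is a herb", "nothing here"]
fragments = ["mary", "mary ann", "helen"]
result = match_by_shared_word(records, fragments)
print(result)
```

The speed-up comes from replacing the all-pairs LIKE comparison with an equality join on words. The trade-off, visible in the sample data, is that a fragment appearing only inside a longer word ("mary" in "rosemary") is not found, unlike with the original LIKE join.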
Mikhail's answer appears to be faster, but let's have one that needs neither SPLIT nor separating the text into words.
First, compute a regular expression with all the words to be searched:
#standardSQL
WITH record AS (
SELECT text AS name
FROM `bigquery-public-data.hacker_news.comments`
), fragment AS (
SELECT name AS name, COUNT(*)
FROM `bigquery-public-data.usa_names.usa_1910_current`
GROUP BY name
)
SELECT FORMAT('(%s)',STRING_AGG(name,'|'))
FROM fragment
Now you can take that resulting string, and use it in a REGEX ignoring case:
#standardSQL
WITH record AS (
SELECT text AS name
FROM `bigquery-public-data.hacker_news.comments`
), largestring AS (
SELECT '(?i)(mary|margaret|helen|more_names|more_names|more_names|josniel|khaiden|sergi)'
)
SELECT record.* FROM `record`
WHERE REGEXP_CONTAINS(record.name, (SELECT * FROM largestring))
(~510 seconds)
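The two steps (build one big alternation, then test each record against it) look like this as a small Python sketch. It is an illustration of the technique rather than BigQuery code, and the sample fragments and records are invented.

```python
import re

def build_pattern(fragments):
    """Combine all fragments into one case-insensitive alternation."""
    # re.escape guards against fragments that contain regex metacharacters;
    # plain names such as 'mary' need no escaping.
    alternation = "|".join(re.escape(f) for f in fragments)
    return re.compile(f"({alternation})", re.IGNORECASE)

fragments = ["mary", "margaret", "helen"]
pattern = build_pattern(fragments)

records = ["Mary had a little lamb", "no names in this one", "ask HELEN about it"]
matching = [r for r in records if pattern.search(r)]
print(matching)
```

Each record is scanned once against a single compiled pattern instead of once per fragment, which is the point of this approach.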
As alluded to in my question, I worked on a version using a JavaScript UDF, which solves this, albeit more slowly than the answer I accepted. For completeness, I'm posting it here because perhaps someone (like myself in the future) may find it useful.
CREATE TEMPORARY FUNCTION CONTAINS_ANY(str STRING, fragments ARRAY<STRING>)
RETURNS STRING
LANGUAGE js AS """
for (var i in fragments) {
if (str.indexOf(fragments[i]) >= 0) {
return fragments[i];
}
}
return null;
""";
WITH record AS (
SELECT text AS name
FROM `bigquery-public-data.hacker_news.comments`
WHERE text IS NOT NULL
), fragment AS (
SELECT name AS name, COUNT(*)
FROM `bigquery-public-data.usa_names.usa_1910_current`
WHERE name IS NOT NULL
GROUP BY name
), fragment_array AS (
SELECT ARRAY_AGG(name) AS names, COUNT(*) AS count
FROM fragment
GROUP BY LENGTH(name)
), records_with_fragments AS (
SELECT record.name,
CONTAINS_ANY(record.name, fragment_array.names)
AS fragment_name
FROM record INNER JOIN fragment_array
ON CONTAINS_ANY(name, fragment_array.names) IS NOT NULL
)
SELECT * EXCEPT(rownum) FROM (
SELECT record.name,
records_with_fragments.fragment_name,
ROW_NUMBER() OVER (PARTITION BY record.name) AS rownum
FROM record
INNER JOIN records_with_fragments
ON records_with_fragments.name = record.name
AND records_with_fragments.fragment_name IS NOT NULL
) WHERE rownum = 1
The idea is that the list of fragments is relatively small enough that it can be processed in an array, similar to Felipe's answer using regular expressions. The first thing I do is create a fragment_array table which is grouped by the fragment lengths ... a cheap way of preventing an over-sized array which I found can cause UDF timeouts.
Next I create a table called records_with_fragments that joins those arrays to the original records, finding only those which contain a matching fragment using the JavaScript UDF CONTAINS_ANY(). This will result in a table containing some duplicates since one record may match multiple fragments.
The final SELECT then pulls in the original record table, joins to records_with_fragments to determine which fragment matched, and also uses the ROW_NUMBER() function to prevent duplicates, e.g. only showing the first row of each record as uniquely identified by its name.
Now, the reason I do the join in the final query is because in my actual data there are more fields I want besides just the string being matched. Earlier on in my actual data I create a table of DISTINCT strings which then later need to be re-joined.
Voila! Not the most elegant but it gets the job done.

Use same alias for union query SQL

So I came across a problem/question yesterday.
I am building a chat (with AJAX) and use two tables:
TABLE users -> 'name', 'username', 'password', 'time'
TABLE messages -> 'sendFrom', 'sendTo', 'message', 'time'
So an example message now would be
'foo' | 'bar' | 'Hey, how are you?' | 130611134427611
I was told the correct way to do this is, instead, to use an ID column, and use that as a Primary Key instead of the username (which, anyway, makes sense).
OK, so now this looks like
TABLE users -> 'ID', 'name', 'username', 'password', 'time'
TABLE messages -> 'sendFrom', 'sendTo', 'message', 'time'
So an example message now would be
'22' | '7' | 'Hey, how are you?' | 130611134427611
I've managed to JOIN both tables to return the rows as on the first example message, but since I am detecting user keypresses too, I need to scan the table twice, so:
SELECT *
FROM (SELECT *
FROM (SELECT *
FROM messages
WHERE sendTo = '$username'
AND time > (SELECT time FROM users
WHERE username = '$username' LIMIT 1)
AND message <> '$keypressCode'
ORDER BY time DESC LIMIT 30)
ORDER BY time ASC)
UNION
SELECT *
FROM (SELECT *
FROM messages
WHERE message = '$keypressCode'
AND time > (SELECT time FROM users
WHERE username = '$username' LIMIT 1)
AND sendTo = '$username' LIMIT 1);
But now, of course, I don't just select from messages; instead, I use a long query like
SELECT * FROM (
SELECT u1.ID as sendTo, u2.ID as sendFrom, messages.message, .....
.....
.....
.....
.....
) as messages;
which MUST BE INSERTED in place of messages (I haven't tried this yet, but I think that is how it works. I searched DuckDuckGo and Google and found nothing, so I came here).
My first question is:
Is there a way to use ALIAS for the table messages so I don't have to scan it TWICE? So, instead, I just save the above query using ALIAS as a table called messages and select data from it twice, once in each part of UNION.
In addition, the answer to the first question would also be an answer for:
Is there a way to use ALIAS to save the time selected from the table? (since, again, I am searching for it TWICE).
In practice, what I am doing may not be inefficient (since there will be at most 20 users), but what if?
Also, I am a mathematician, and like it or not, I like to worry a lot about efficiency!
Thank you so much in advance, I hope I made myself clear.
I am not sure but it does look as if you want a view.
Define that query like this:
CREATE VIEW MyMessageView
AS
SELECT ...
FROM ...
...
Now you can use that view in any context where an ordinary table can be used: in a FROM clause, in a JOIN clause, as a subquery etc.:
SELECT ...
FROM MyMessageView
WHERE ...
...
UNION
SELECT ...
FROM MyMessageView
WHERE ...
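A minimal runnable sketch of that pattern, using Python's built-in sqlite3 module. The schema is a cut-down version of the one in the question, and the sample rows and the '#typing#' keypress marker are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (ID INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE messages (sendFrom INTEGER, sendTo INTEGER, message TEXT, time INTEGER);
    INSERT INTO users VALUES (7, 'foo'), (22, 'bar');
    INSERT INTO messages VALUES
        (22, 7, 'Hey, how are you?', 100),
        (22, 7, '#typing#', 110);

    -- The long join is written once and given a name.
    CREATE VIEW MyMessageView AS
    SELECT u1.username AS sendFrom, u2.username AS sendTo, m.message, m.time
    FROM messages m
    JOIN users u1 ON u1.ID = m.sendFrom
    JOIN users u2 ON u2.ID = m.sendTo;
""")

KEYPRESS = "#typing#"  # hypothetical keypress marker
rows = conn.execute(
    """
    SELECT sendFrom, sendTo, message FROM MyMessageView
    WHERE sendTo = :user AND message <> :key
    UNION
    SELECT sendFrom, sendTo, message FROM MyMessageView
    WHERE sendTo = :user AND message = :key
    """,
    {"user": "foo", "key": KEYPRESS},
).fetchall()
print(rows)
```

The view saves repeating the join text in both halves of the UNION. Whether the database evaluates the underlying query once or twice is still up to its planner.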
Instead of using UNION, put an OR in the condition:
SELECT *
FROM messages
WHERE time > (SELECT time FROM users
WHERE username = '$username' LIMIT 1)
AND (message <> '$keypressCode' AND sendTo = '$username'
OR message = '$keypressCode')
I am answering my own question, since I think what people might be looking for is a VIEW.
First, define that query like this:
CREATE VIEW MyViewTable
AS
SELECT ...
FROM ...
...
...;
Now you can use that view (which is a separate query) in any context where an ordinary table can be used: in a FROM clause, in a JOIN clause, as a subquery etc.:
SELECT ...
FROM MyViewTable
WHERE ...
...
UNION
SELECT ...
FROM MyViewTable
WHERE ...
but with a few restrictions:
You cannot SELECT from your view using subqueries, such as
SELECT * FROM MyViewTable WHERE someColumn = (SELECT ... ...)
but (as normal) you can use subqueries when creating the VIEW and in the main query.
The SELECT statement cannot refer to prepared statement parameters.
The definition cannot refer to a TEMPORARY table, and you cannot create a TEMPORARY view.
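(There are more, but these are, in my opinion, among the most common kinds of queries, so the restrictions are likely to be among the most common errors. Restrictions differ between database engines, and the last two above are worded as in MySQL's CREATE VIEW documentation, so check the reference for your own database; see the SQLite reference for more information.)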

Alternative SQL ways of looking up multiple items of known IDs?

Is there a better solution to the problem of looking up multiple known IDs in a table:
SELECT * FROM some_table WHERE id='1001' OR id='2002' OR id='3003' OR ...
I can have several hundreds of known items. Ideas?
SELECT * FROM some_table WHERE ID IN ('1001', '1002', '1003')
and if your known IDs are coming from another table
SELECT * FROM some_table WHERE ID IN (
SELECT KnownID FROM some_other_table WHERE someCondition
)
The first (naive) option:
SELECT * FROM some_table WHERE id IN ('1001', '2002', '3003' ... )
However, we should be able to do better. IN is very bad when you have a lot of items, and you mentioned hundreds of these ids. What creates them? Where do they come from? Can you write a query that returns this list? If so:
SELECT *
FROM some_table
INNER JOIN ( your query here) filter ON some_table.id=filter.id
See Arrays and Lists in SQL Server 2005
ORs are notoriously slow in SQL.
Your question is short on specifics, but depending on your requirements and constraints I would build a look-up table with your IDs and use the EXISTS predicate:
select t.id from some_table t
where EXISTS (select * from lookup_table l where t.id = l.id)
For a fixed set of IDs you can do:
SELECT * FROM some_table WHERE id IN (1001, 2002, 3003);
For a set that changes each time, you might want to create a table to hold them and then query:
SELECT * FROM some_table WHERE id IN
(SELECT id FROM selected_ids WHERE key=123);
Another approach is to use collections - the syntax for this will depend on your DBMS.
Finally, there is always this "kludgy" approach:
SELECT * FROM some_table WHERE '|1001|2002|3003|' LIKE '%|' || id || '|%';
In Oracle, I always put the IDs into a TEMPORARY TABLE to perform massive SELECTs and DML operations:
CREATE GLOBAL TEMPORARY TABLE t_temp (id INT);
SELECT *
FROM mytable
WHERE mytable.id IN
(
SELECT id
FROM t_temp
)
You can fill the temporary table in a single client-server roundtrip using Oracle collection types.
We have a similar issue in an application written for MS SQL Server 7. Although I dislike the solution used, we're not aware of anything better...
'Better' solutions exist in 2008 as far as I know, but we have Zero clients using that :)
We created a table valued user defined function that takes a comma delimited string of IDs, and returns a table of IDs. The SQL then reads reasonably well, and none of it is dynamic, but there is still the annoying double overhead:
1. Client concatenates the IDs into the string
2. SQL Server parses the string to create a table of IDs
There are lots of ways of turning '1,2,3,4,5' into a table of IDs, but the Stored Procedure which uses the function ends up looking like...
CREATE PROCEDURE my_road_to_hell @IDs AS VARCHAR(8000)
AS
BEGIN
SELECT
*
FROM
myTable
INNER JOIN
dbo.fn_split_list(@IDs) AS [IDs]
ON [IDs].id = myTable.id
END
The fastest is to put the ids in another table and JOIN
SELECT some_table.*
FROM some_table INNER JOIN some_other_table ON some_table.id = some_other_table.id
where some_other_table would have just one field (ids) and all values would be unique
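A small sketch of the "load the IDs into a table and JOIN" approach, using Python's built-in sqlite3 module. The wanted_ids table name and the sample data are made up for the illustration; other databases have their own temporary-table syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?)",
    [(i, f"row {i}") for i in range(1000, 4000)],
)

known_ids = [1001, 2002, 3003, 9999]  # 9999 is deliberately absent from some_table

# Load the IDs into a temporary table; PRIMARY KEY keeps them unique and indexed.
conn.execute("CREATE TEMP TABLE wanted_ids (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO wanted_ids VALUES (?)", [(i,) for i in known_ids])

found = conn.execute(
    """
    SELECT some_table.id, some_table.label
    FROM some_table
    INNER JOIN wanted_ids ON some_table.id = wanted_ids.id
    ORDER BY some_table.id
    """
).fetchall()
print(found)
```

The query text stays the same no matter how many IDs there are; only the contents of the lookup table change.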