I am creating an RPC function to get the number of likes for each post for a user, so I created a function that takes userId as an argument (the uuid of the user in the session). But when I call the function from the front end, it returns:
If a new function was created in the database with this name and parameters, try reloading the schema cache.
The function:
create function get_number_of_posts_by_user(userId uuid)
returns integer
as $$
  SELECT count(pl.id)
  FROM auth.users au
  JOIN posts p ON p.user_id = au.id
  JOIN post_likes pl ON pl.post_id = p.id
  WHERE au.id = userId
$$ language sql;
get the number of likes for each post for a user
You need to group by post to get there.
CREATE OR REPLACE FUNCTION get_likes_per_post_for_user(_userid uuid)
RETURNS TABLE (post_id uuid, likes bigint) -- use actual type of post_id
LANGUAGE sql AS
$func$
SELECT p.id, count(*)
FROM posts p
JOIN post_likes pl ON pl.post_id = p.id
WHERE p.user_id = _userid
GROUP BY 1
ORDER BY 1; -- or some other order?
$func$;
Call:
SELECT * FROM get_likes_per_post_for_user(<some_uuid>);
Major points:
You don't need to involve the table users at all. Filter by posts.user_id directly. Cheaper.
count(*) >> count(pl.id) in this case. A bit cheaper, too. count(*) has a separate implementation in Postgres.
count() returns bigint, not integer. Match what RETURNS declares one way or the other.
Avoid naming conflicts between function parameters and table columns. Prefixing _ for parameters (and never using the same for column names) is one common convention.
And table-qualify column names. In this case to avoid a conflict between the OUT parameter post_id (also visible in the query) and post_likes.post_id.
When counting the number of likes, don't call your function "get_number_of_posts...".
Your original issue may have been an XY problem that goes away with a proper function definition.
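For example, if you only need one total per user and want to keep RETURNS integer, you can cast the bigint count explicitly. A sketch based on the tables above (the function name is hypothetical, following the points about naming and the _ prefix):

```sql
CREATE OR REPLACE FUNCTION get_number_of_likes_by_user(_userid uuid)
RETURNS integer
LANGUAGE sql AS
$func$
SELECT count(*)::int          -- cast bigint to match RETURNS integer
FROM   posts p
JOIN   post_likes pl ON pl.post_id = p.id
WHERE  p.user_id = _userid;
$func$;
```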
Addressing the title
If you actually need to reload the schema cache, use:
SELECT pg_stat_clear_snapshot();
The manual:
You can invoke pg_stat_clear_snapshot() to discard the current
transaction's statistics snapshot or cached values (if any). The next
use of statistical information will (when in snapshot mode) cause a
new snapshot to be built or (when in cache mode) accessed statistics
to be cached.
I have never had a need for this myself, yet. Typically, the problem lies elsewhere.
I have the following scalar function:
CREATE FUNCTION dbo.getOM
( @mskey INT,
  @category VARCHAR(2)
)
RETURNS VARCHAR(11)
AS
BEGIN
    DECLARE @om VARCHAR(11)

    SELECT @om = o.aValue
    FROM dbo.idmv_value_basic o WITH (NOLOCK)
    WHERE o.MSKEY = @mskey AND o.AttrName = 'OM'
    AND EXISTS (
        SELECT NULL
        FROM sys.sequences s WITH (NOLOCK)
        WHERE CONVERT(INT, replace(o.aValue, '1690', '')) BETWEEN s.minimum_value AND s.maximum_value AND s.name = concat('om_', @category)
    )

    RETURN @om
END
The problem with that is that o.aValue can contain non-numeric values, so the conversion can fail if it is executed on other rows of idmv_value_basic where AttrName is not 'OM'.
For some unknown reason, this morning our MSSQL Server changed the execution order of the WHERE conditions and the conversion failed.
How can I define the selection so that it is guaranteed that only the selected rows of idmv_value_basic are used for the selection on sys.sequences?
I know that for SQL the execution order is not deterministic, but there must be a way to guarantee that the conversion will not fail.
Any ideas how I could change the function, or am I doing something fundamentally wrong?
By the way, when I execute the selection manually it does not fail, but when I execute the function it fails.
We could repair the function by changing something, saving, then changing it back and saving again.
I'll try to answer the question: "Any way to guarantee execution order?"
You can, to some extent. When you write an EXISTS, what SQL Server actually does behind the scenes is a join (check the execution plan).
Now, the way joins are evaluated depends on cardinalities. Say you're joining tables A and B, with a predicate on both. SQL Server will prefer to start with the table producing the fewest rows.
In your case, SQL Server probably decided that a full scan of sys.sequences produces fewer rows than dbo.idmv_value_basic (WHERE o.MSKEY = @mskey AND o.AttrName = 'OM'), maybe because the number of rows in idmv_value_basic increased recently?
You could help things by making an index on dbo.idmv_value_basic (MSKEY, AttrName) INCLUDE (aValue). I assume the predicate produces exactly one row per MSKEY, or at least not very many, and an index would help SQL Server choose the "right way" by giving it more accurate estimates of how many rows that part of the join produces.
CREATE INDEX IDVM_VALUE_BASIC_MSKEY_ATTRNAME_VALUE
ON dbo.idmv_value_basic (MSKey, AttrName) INCLUDE (aValue);
Rewriting an EXISTS as a JOIN can be done, but requires a bit of finesse. With a JOIN, you can specify which kind (Loop, Merge or Hash) and thus force SQL Server to acknowledge that you know better, forcing the order of evaluation, e.g.:
SELECT ...
FROM dbo.idmv_value_basic o
INNER LOOP JOIN (
    SELECT name
    FROM sys.sequences
    WHERE (..between min and max)
) AS B ON (B.name = concat('om_', @category))
WHERE o.MSKey = @mskey AND o.AttrName = 'OM'
And lose the WITH (NOLOCK)
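Alternatively, you can make the conversion itself safe, regardless of evaluation order. SQL Server 2012 and later have TRY_CONVERT(), which returns NULL instead of raising an error when the value is not numeric (and NULL BETWEEN ... is simply not true, so the row is filtered out). A sketch against the query from the question:

```sql
SELECT @om = o.aValue
FROM dbo.idmv_value_basic o
WHERE o.MSKEY = @mskey
  AND o.AttrName = 'OM'
  AND EXISTS (
        SELECT NULL
        FROM sys.sequences s
        -- TRY_CONVERT yields NULL (not an error) for non-numeric values
        WHERE TRY_CONVERT(INT, REPLACE(o.aValue, '1690', ''))
                  BETWEEN s.minimum_value AND s.maximum_value
          AND s.name = CONCAT('om_', @category)
      )
```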
I'm currently using postgres in node to query all users who have a certain tag associated with their account like so (note: I'm using node-postgres):
query = `SELECT tags.*, pl.email
         FROM admin.tags tags
         LEFT JOIN gameday.player_settings pl
           ON tags.player_id = pl.id
         WHERE tags.tag = $1`
client.query(
query,
[tagName],
function(err, results) {
...
[tagName] is then passed into the WHERE clause.
What I'm aiming to do is instead query by an unknown number of tags and return all users who have all of those tags associated with their account. So instead of [tagName] I'd like to pass in an array of unknown length, [tagNames], but I'm not sure how to accomplish this.
You need to turn the question backwards. Instead of asking:
Which users have all of these tags, you need to ask which users are not missing any of these tags. It's a double negation.
You also need a way to pass the set of tags. The best way to do this, if the client language binding supports it, is as an array-valued query parameter. If the client binding doesn't support array-valued parameters you'll need dynamic SQL.
One formulation might be (untested, since you didn't provide sample schema and data):
SELECT pl.email
FROM gameday.player_settings pl
WHERE NOT EXISTS (
SELECT 1
FROM unnest(?) AS wanted_tags(tag)
LEFT JOIN admin.tags tags
       ON tags.tag = wanted_tags.tag
      AND tags.player_id = pl.id
WHERE tags.tag IS NULL
);
Doing a left join and filtering for IS NULL is called a left anti-join. It keeps the rows where the left-join condition does not match. So in this case, we retain a tag from our wanted_tags array only if there is no matching tag associated with this player. If any tags are left, the WHERE NOT EXISTS returns false, so the player is excluded.
Double-think, isn't it? It's easy to make mistakes with this so test.
Here ? should be your programming language PostgreSQL database binding's query parameter placeholder. I don't know what node.js's is. This will only work if you can pass an array as a query parameter in node. If not, you'll have to use dynamic SQL to generate an ARRAY['x','y','z'] expression or a (VALUES ('x'), ('y'), ('z')) subquery.
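If the double negation feels too mind-bending, a count-based formulation of relational division gets there too. A sketch, assuming the passed array contains no duplicate tags (cardinality() requires Postgres 9.4+, and $1 is the node-postgres placeholder for the array parameter):

```sql
SELECT pl.email
FROM gameday.player_settings pl
JOIN admin.tags tags ON tags.player_id = pl.id
WHERE tags.tag = ANY($1)                         -- only wanted tags
GROUP BY pl.id, pl.email
HAVING count(DISTINCT tags.tag) = cardinality($1);  -- player has them all
```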
P.S. Please provide sample schema and data with questions when possible. http://sqlfiddle.com/ is handy.
I'm trying to select just one account using SQL Server but am getting the following error:
ERROR: The text data type cannot be selected as DISTINCT because
it is not comparable. Error Code: 421
with the following statement:
select DISTINCT ad.*,
acc.companyname,
acc.accountnumber
from address ad
join AddressLink al on al.AddressID = ad.id
join account acc on acc.ID = al.ParentID
where acc.accountnumber like '11227'
What have I done wrong?
Edit:
New query:
select address.ID,
address.StreetAddress1,
address.StreetAddress2,
address.City,
Address.State,
Address.PostalCode,
Address.ClassTypeID,
account.companyname,
account.accountnumber,
addresslink.ID as addressLinkID,
addresslink.addresstypeid
from address
join AddressLink on address.id = addresslink.AddressID
join account on addresslink.ParentID = account.ID
where account.CompanyName like 'company name'
All the company names that I've had to blur are identical.
Try:
select ad.*,
l.companyname,
l.accountnumber
from address ad
join (select DISTINCT al.AddressID,
acc.companyname,
acc.accountnumber
from account acc
join AddressLink al on acc.ID = al.ParentID
where acc.accountnumber like '11227') l
on l.AddressID = ad.id
DISTINCT, in the context you have, tries to apply DISTINCT to ALL columns. That said, some data types, such as TEXT, are not comparable. So if your table has columns of these non-"distinctable" types, that is what is crashing your query.
However, to get around this, if you do something like
CONVERT( char(60), YourTextColumn ) as YourTextColumn,
it should get that for you... at least it now thinks the final column type is "char"acter and CAN compare it.
You should check the data types of the columns in the address table. My guess is that one or more of them has the data type text, ntext or image.
One of the restrictions of using text, ntext or image data types is that columns defined of these data types cannot be used as part of a SELECT statement that includes the DISTINCT clause.
For what it's worth, the MSDN article for ntext, text, and image (Transact-SQL) recommends avoiding these data types and using nvarchar(max), varchar(max), and varbinary(max) instead. You may want to consider changing how that table is defined.
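To find which columns are the problem, you can inspect the table definition first. A sketch (adjust the table name and add a TABLE_SCHEMA filter as needed):

```sql
SELECT COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'address'
  AND DATA_TYPE IN ('text', 'ntext', 'image');  -- types DISTINCT cannot compare
```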
The accepted answer from Mark B shows a subquery (good idea to limit the domain of the DISTINCT) on AddressLink.AddressId, Account.CompanyName, and Account.AccountNumber.
Let me ask this: Does AddressLink allow more than one record to have the same value in the ParentId and AddressId fields?
If not, and assuming that Mark B's answer works, then just remove the DISTINCT because you're never going to get any duplicates inside of that subquery.
Leaving the DISTINCT in causes a performance hit, because the DB has to create a temporary table, indexed with either a btree or a hash, and insert every value returned by the subquery into it to check whether it violates the uniqueness constraint on those three fields. Note that the optimizer doesn't know that there won't be any dupes: if you tell it to check for DISTINCT, it will check. With a btree index this causes O(n log n) work in the number of rows returned; with a hash it causes O(n) work, but who knows how big the constant factor is relative to the other work you're doing (it's probably larger than everything else, meaning this could make the query run half as fast as without the DISTINCT).
So my answer is Mark B's answer without the DISTINCT in the subquery. Let me know if AddressLink does allow repeats (can't imagine why it would).
I have a rather complicated query on my PostgreSQL database spanning 4 tables via a series of nested subqueries. Despite its slightly tricky appearance and setup, it ultimately returns two columns (from the same table, if that helps) based on matching two external parameters (two strings that need to match fields in different tables). I'm fairly new to database design in PostgreSQL, so I know that this seemingly magical thing called a View exists, and it seems like it could help me here, but perhaps not.
Is there some way I can move my complex query inside a view and somehow just pass it the two values I need to match? That would greatly simplify my code on the front end by shifting the complexity to the database. I can create a view that wraps my static example query, and that works just fine; however, it only works for one pair of string values. I need to be able to use it with a variety of different values.
Thus my question is: is it possible to pass parameters into an otherwise static view and have it become "dynamic"? Or perhaps a view is not the right way to approach this. If there's something else that would work better, I'm all ears!
Edit: As requested in comments, here's my query as it stands now:
SELECT param_label, param_graphics_label
FROM parameters
WHERE param_id IN
(SELECT param_id
FROM parameter_links
WHERE region_id =
(SELECT region_id
FROM regions
WHERE region_label = '%PARAMETER 1%' AND model_id =
(SELECT model_id FROM models WHERE model_label = '%PARAMETER 2%')
)
) AND active = 'TRUE'
ORDER BY param_graphics_label;
Parameters are set off by percent symbols above.
You could use a set returning function:
create or replace function label_params(parm1 text, parm2 text)
returns table (param_label text, param_graphics_label text)
as
$body$
select ...
WHERE region_label = $1
AND model_id = (SELECT model_id FROM models WHERE model_label = $2)
....
$body$
language sql;
Then you can do:
select *
from label_params('foo', 'bar')
Btw: are you sure you want:
AND model_id = (SELECT model_id FROM models WHERE model_label = $2)
if model_label is not unique (or the primary key) then this will throw an error eventually. You probably want:
AND model_id IN (SELECT model_id FROM models WHERE model_label = $2)
In addition to what @a_horse already cleared up, you could simplify your SQL statement with JOIN syntax instead of nested subqueries. Performance will be similar, but the syntax is much shorter and easier to manage.
CREATE OR REPLACE FUNCTION param_labels(_region_label text, _model_label text)
RETURNS TABLE (param_label text, param_graphics_label text)
LANGUAGE sql AS
$func$
SELECT p.param_label, p.param_graphics_label
FROM parameters p
JOIN parameter_links l USING (param_id)
JOIN regions r USING (region_id)
JOIN models m USING (model_id)
WHERE p.active
AND r.region_label = $1
AND m.model_label = $2
ORDER BY p.param_graphics_label;
$func$;
If model_label is not unique or something else in the query produces duplicate rows, you may want to make that SELECT DISTINCT p.param_graphics_label, p.param_label - with a matching ORDER BY clause for best performance. Or use GROUP BY.
Since Postgres 9.2 you can use the declared parameter names in place of $1 and $2 in SQL functions. (Has been possible for PL/pgSQL functions for a long time).
To avoid naming conflicts, I prefix parameter names with _ (those are visible most everywhere inside the function) and table-qualify column names in queries.
I simplified WHERE p.active = 'TRUE' to WHERE p.active, assuming the column active is type boolean.
USING in the JOIN condition only works if the column names are unambiguous across all tables to the left. Else use the more explicit syntax: ON l.param_id = p.param_id
In most cases the set-returning function is the way to go, but in the event that you want to both read and write to the set, a view may be more appropriate. And it is possible for a view to read a session parameter:
CREATE VIEW widget_sb AS
SELECT *
FROM widget
WHERE column = cast(current_setting('mydomain.myparam') as int);

SET mydomain.myparam = 0;
SELECT * FROM widget_sb;
-- [results]

SET mydomain.myparam = 1;
SELECT * FROM widget_sb;
-- [distinct results]
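One caveat: current_setting() raises an error when the parameter has not been set in the session. Since Postgres 9.6 you can pass missing_ok to get NULL instead, and set the value with set_config(). A sketch along the same lines (widget and its id column are placeholders, like in the example above):

```sql
CREATE VIEW widget_sb AS
SELECT *
FROM widget
WHERE id = cast(current_setting('mydomain.myparam', true) as int);  -- true = missing_ok

-- set the parameter for the current session (false = not transaction-local)
SELECT set_config('mydomain.myparam', '1', false);
SELECT * FROM widget_sb;
```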
I don't think a "dynamic" view as you stated is possible.
Why not write a stored procedure that takes 2 arguments instead?
I would rephrase the query as the following:
SELECT p.param_label, p.param_graphics_label
FROM parameters p
WHERE p.active = 'TRUE'
AND EXISTS (
    SELECT 1 FROM parameter_links pl
    WHERE pl.param_id = p.param_id
    AND EXISTS (
        SELECT 1 FROM regions r
        WHERE r.region_id = pl.region_id
        AND r.region_label = '%PARAMETER 1%'
        AND r.model_id = (SELECT model_id FROM models WHERE model_label = '%PARAMETER 2%')
    )
)
ORDER BY p.param_graphics_label;
Assuming that you have indexes on the various id columns, this query should be significantly faster than using the IN operator; the EXISTS subqueries here will use only the index values, without even touching the data tables except to fetch the final data from the parameters table.
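The supporting indexes for this pattern might look like the following (index names and column combinations are hypothetical; match them to your actual schema):

```sql
CREATE INDEX parameter_links_param_region_idx ON parameter_links (param_id, region_id);
CREATE INDEX regions_region_model_idx ON regions (region_id, model_id);
CREATE INDEX models_label_idx ON models (model_label);
```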
I have a case where I need to translate (lookup) several values from the same table. The first way I wrote it, was using subqueries:
SELECT
(SELECT id FROM user WHERE user_pk = created_by) AS creator,
(SELECT id FROM user WHERE user_pk = updated_by) AS updater,
(SELECT id FROM user WHERE user_pk = owned_by) AS owner,
[name]
FROM asset
As I'm using this subquery a lot (I have about 50 tables with these fields), and I might need to add some more code to the subquery (for example, AND active = 1), I thought I'd put these into a user-defined function (UDF) and use that. But the performance using that UDF was abysmal.
CREATE FUNCTION dbo.get_user ( @user_pk INT )
RETURNS INT
AS BEGIN
    RETURN ( SELECT id
             FROM ice.dbo.[user]
             WHERE user_pk = @user_pk )
END

SELECT dbo.get_user(created_by) as creator, [name]
FROM asset
The performance of #1 is less than 1 second. Performance of #2 is about 30 seconds...
Why, or more importantly, is there any way I can code in SQL server 2008, so that I don't have to use so many subqueries?
Edit:
Just a little more explanation of when this is useful. This simple query (that is, get the user id) gets a lot more complex when I want to have a text for a user, since I have to join with profile to get the language, with company to see if the language should be fetched from there instead, and with the translation table to get the translated text. And for most of these queries, performance is a secondary issue to readability and maintainability.
The UDF is a black box to the query optimiser so it's executed for every row.
You are doing a row-by-row cursor. For each row in an asset, look up an id three times in another table. This happens when you use scalar or multi-statement UDFs (In-line UDFs are simply macros that expand into the outer query)
One of many articles on the problem is "Scalar functions, inlining, and performance: An entertaining title for a boring post".
The sub-queries can be optimised to correlate and avoid the row-by-row operations.
What you really want is this:
SELECT
uc.id AS creator,
uu.id AS updater,
uo.id AS owner,
a.[name]
FROM
asset a
JOIN
user uc ON uc.user_pk = a.created_by
JOIN
user uu ON uu.user_pk = a.updated_by
JOIN
user uo ON uo.user_pk = a.owned_by
Update Feb 2019
SQL Server 2019 starts to fix this problem.
As other posters have suggested, using joins will definitely give you the best overall performance.
However, since you've stated that you don't want the headache of maintaining 50-ish similar joins or subqueries, try using an inline table-valued function as follows:
CREATE FUNCTION dbo.get_user_inline (@user_pk INT)
RETURNS TABLE AS
RETURN
(
    SELECT TOP 1 id
    FROM ice.dbo.[user]
    WHERE user_pk = @user_pk
    -- AND active = 1
)
Your original query would then become something like:
SELECT
(SELECT TOP 1 id FROM dbo.get_user_inline(created_by)) AS creator,
(SELECT TOP 1 id FROM dbo.get_user_inline(updated_by)) AS updater,
(SELECT TOP 1 id FROM dbo.get_user_inline(owned_by)) AS owner,
[name]
FROM asset
An inline table-valued function should have better performance than either a scalar function or a multistatement table-valued function.
The performance should be roughly equivalent to your original query, but any future changes can be made in the UDF, making it much more maintainable.
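With an inline TVF you can also use OUTER APPLY, which often reads more cleanly than wrapping each call in a subquery. A sketch against the function above (OUTER APPLY, like a LEFT JOIN, keeps the asset row even when no user matches):

```sql
SELECT c.id AS creator,
       u.id AS updater,
       o.id AS owner,
       a.[name]
FROM asset a
OUTER APPLY dbo.get_user_inline(a.created_by) c
OUTER APPLY dbo.get_user_inline(a.updated_by) u
OUTER APPLY dbo.get_user_inline(a.owned_by) o;
```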
To get the same result (NULL if the user is deleted or not active):
select
    u1.id as creator,
    u2.id as updater,
    u3.id as owner,
    a.[name]
FROM asset a
LEFT JOIN [user] u1 ON (u1.user_pk = a.created_by AND u1.active = 1)
LEFT JOIN [user] u2 ON (u2.user_pk = a.updated_by AND u2.active = 1)
LEFT JOIN [user] u3 ON (u3.user_pk = a.owned_by AND u3.active = 1)
Am I missing something? Why can't this work? You are only selecting the id which you already have in the table:
select created_by as creator, updated_by as updater,
owned_by as owner, [name]
from asset
By the way, when designing tables you really should avoid keywords, like name, as field names.