PostgreSQL - Check foreign key exists when doing a SELECT - sql

Suppose I have the following data:
Table some_table:
some_table_id | value | other_table_id
--------------------------------------
1 | foo | 1
2 | bar | 2
Table other_table:
other_table_id | value
----------------------
1 | foo
2 | bar
Here, some_table has a foreign key to column other_table_id from other_table into the column of some name.
With the following query in PostgreSQL:
SELECT *
FROM some_table
WHERE other_table_id = 3;
As you see, 3 does not exists in other_table This query obviously will return 0 results.
Without doing a second query, is there a way to know if the foreign key that I am using as a filter effectively does not exist in the other_table?
Ideally as an error that later could be parsed (as it happends when doing an INSERT or an UPDATE with a wrong foreign key, for example).

You can exploit a feature of PL/pgSQL to implement this very cheaply:
CREATE OR REPLACE FUNCTION f_select_from_some_tbl(int)
RETURNS SETOF some_table AS
$func$
BEGIN
RETURN QUERY
SELECT *
FROM some_table
WHERE other_table_id = $1;
IF NOT FOUND THEN
RAISE WARNING 'Call with non-existing other_table_id >>%<<', $1;
END IF;
END
$func$ LANGUAGE plpgsql;
A final RETURN; is optional in this case.
The WARNINGis only raised if your query doesn't return any rows. I am not raising an ERROR in the example, since this would roll back the whole transaction (but you can do that if it fits your needs).
We've added a code example to the manual with Postgres 9.3 to demonstrate this.

If you perform an INSERT or UPDATE on some_table, specifying an other_table_id value that does not in fact exist in other_table, then you will get an error arising from violation of the foreign key constraint. SELECT queries are therefore your primary concern.
One way you could address the issue with SELECT queries would be to transform your queries to perform an outer join with other_table, like so:
SELECT st.*
FROM
other_table ot
LEFT JOIN some_table st ON st.other_table_id = ot.other_table_id
WHERE st.other_table_id = 3;
That query will always return at least one row if any other_table row has other_table_id = 3. In that case, if there is no matching some_table row, then it will return exactly one row, with that row having all columns NULL (given that it selects only columns from some_table), even columns that are declared not null.
If you want such queries to raise an error then you'll probably need to write a custom function to assist, but it can be done. I'd probably implement it in PL/pgSQL, using that language's RAISE statement.

Related

Delete rows depending on values from another table

How to delete rows from my customer table depending on values from another table, let's say orders?
If the customer has no active orders they should be able to be deleted from the DB along with their rows (done using CASCADE). However, if they have any active orders at all, they can't be deleted.
I thought about a PLPGSQL function, then using it in a trigger to check, but I'm lost. I have a basic block of SQL shown below of my first idea of deleting the record accordingly. But it doesn't work properly as it deletes the customer regardless of status, it just needs one cancelled order in this function and not all cancelled.
CREATE OR REPLACE FUNCTION DelCust(int)
RETURNS void AS $body$
DELETE FROM Customer
WHERE CustID IN (
SELECT CustID
FROM Order
WHERE Status = 'C'
AND CustID = $1
);
$body$
LANGUAGE SQL;
SELECT * FROM Customer;
I have also tried to use a PLPGSQL function returning a trigger then using a trigger to help with the deletion and checks, but I'm lost on it. Any thoughts on how to fix this? Any good sources for further reading?
You were very close. I suggest you use NOT IN instead of IN:
DELETE FROM Customer
WHERE CustID = $1 AND
CustID NOT IN (SELECT DISTINCT CustID
FROM Order
WHERE Status = 'A');
I'm guessing here that Status = 'A' means "active order". If it's something else change the code appropriately.
Your basic function can work like this:
CREATE OR REPLACE FUNCTION del_cust(_custid int)
RETURNS int --③
LANGUAGE sql AS
$func$
DELETE FROM customer c
WHERE c.custid = $1
AND NOT EXISTS ( -- ①
SELECT FROM "order" o -- ②
WHERE o.status = 'C' -- meaning "active"?
AND o.custid = $1
);
RETURNING c.custid; -- ③
$func$;
① Avoid NOT IN if you can. It's inefficient and potentially treacherous. Related:
Select rows which are not present in other table
② "order" is a reserved word in SQL. Better chose a legal identifier, or you have to always double-quote.
③ I made it RETURNS int to signify whether the row was actually deleted. The function returns NULL if the DELETE does not go through. Optional addition.
For a trigger implementation, consider this closely related answer:
Foreign key contraints in many-to-many relationships

How to find intersecting geographies between two tables recursively

I'm running Postgres 9.6.1 and PostGIS 2.3.0 r15146 and have two tables.
geographies may have 150,000,000 rows, paths may have 10,000,000 rows:
CREATE TABLE paths (id uuid NOT NULL, path path NOT NULL, PRIMARY KEY (id))
CREATE TABLE geographies (id uuid NOT NULL, geography geography NOT NULL, PRIMARY KEY (id))
Given an array/set of ids for table geographies, what is the "best" way of finding all intersecting paths and geometries?
In other words, if an initial geography has a corresponding intersecting path we need to also find all other geographies that this path intersects. From there, we need to find all other paths that these newly found geographies intersect, and so on until we've found all possible intersections.
The initial geography ids (our input) may be anywhere from 0 to 700. With an average around 40.
Minimum intersections will be 0, max will be about 1000. Average likely around 20, typically less than 100 connected.
I've created a function that does this, but I'm new to GIS in PostGIS, and Postgres in general. I've posted my solution as an answer to this question.
I feel like there should be a more eloquent and faster way of doing this than what I've come up with.
Your function can be radically simplified.
Setup
I suggest you convert the column paths.path to data type geography (or at least geometry). path is a native Postgres type and does not play well with PostGIS functions and spatial indexes. You would have to cast path::geometry or path::geometry::geography (resulting in a LINESTRING internally) to make it work with PostGIS functions like ST_Intersects().
My answer is based on these adapted tables:
CREATE TABLE paths (
id uuid PRIMARY KEY
, path geography NOT NULL
);
CREATE TABLE geographies (
id uuid PRIMARY KEY
, geography geography NOT NULL
, fk_id text NOT NULL
);
Everything works with data type geometry for both columns just as well. geography is generally more exact but also more expensive. Which to use? Read the PostGIS FAQ here.
Solution 1: Your function optimized
CREATE OR REPLACE FUNCTION public.function_name(_fk_ids text[])
RETURNS TABLE(id uuid, type text)
LANGUAGE plpgsql AS
$func$
DECLARE
_row_ct int;
_loop_ct int := 0;
BEGIN
CREATE TEMP TABLE _geo ON COMMIT DROP AS -- dropped at end of transaction
SELECT DISTINCT ON (g.id) g.id, g.geography, _loop_ct AS loop_ct -- dupes possible?
FROM geographies g
WHERE g.fk_id = ANY(_fk_ids);
GET DIAGNOSTICS _row_ct = ROW_COUNT;
IF _row_ct = 0 THEN -- no rows found, return empty result immediately
RETURN; -- exit function
END IF;
CREATE TEMP TABLE _path ON COMMIT DROP AS
SELECT DISTINCT ON (p.id) p.id, p.path, _loop_ct AS loop_ct
FROM _geo g
JOIN paths p ON ST_Intersects(g.geography, p.path); -- no dupes yet
GET DIAGNOSTICS _row_ct = ROW_COUNT;
IF _row_ct = 0 THEN -- no rows found, return _geo immediately
RETURN QUERY SELECT g.id, text 'geo' FROM _geo g;
RETURN;
END IF;
ALTER TABLE _geo ADD CONSTRAINT g_uni UNIQUE (id); -- required for UPSERT
ALTER TABLE _path ADD CONSTRAINT p_uni UNIQUE (id);
LOOP
_loop_ct := _loop_ct + 1;
INSERT INTO _geo(id, geography, loop_ct)
SELECT DISTINCT ON (g.id) g.id, g.geography, _loop_ct
FROM _paths p
JOIN geographies g ON ST_Intersects(g.geography, p.path)
WHERE p.loop_ct = _loop_ct - 1 -- only use last round!
ON CONFLICT ON CONSTRAINT g_uni DO NOTHING; -- eliminate new dupes
EXIT WHEN NOT FOUND;
INSERT INTO _path(id, path, loop_ct)
SELECT DISTINCT ON (p.id) p.id, p.path, _loop_ct
FROM _geo g
JOIN paths p ON ST_Intersects(g.geography, p.path)
WHERE g.loop_ct = _loop_ct - 1
ON CONFLICT ON CONSTRAINT p_uni DO NOTHING;
EXIT WHEN NOT FOUND;
END LOOP;
RETURN QUERY
SELECT g.id, text 'geo' FROM _geo g
UNION ALL
SELECT p.id, text 'path' FROM _path p;
END
$func$;
Call:
SELECT * FROM public.function_name('{foo,bar}');
Much faster than what you have.
Major points
You based queries on the whole set, instead of the latest additions to the set only. This gets increasingly slower with every loop without need. I added a loop counter (loop_ct) to avoid redundant work.
Be sure to have spatial GiST indexes on geographies.geography and paths.path:
CREATE INDEX geo_geo_gix ON geographies USING GIST (geography);
CREATE INDEX paths_path_gix ON paths USING GIST (path);
Since Postgres 9.5 index-only scans would be an option for GiST indexes. You might add id as second index column. The benefit depends on many factors, you'd have to test. However, there is no fitting operator GiST class for the uuid type. It would work with bigint after installing the extension btree_gist:
Postgres multi-column index (integer, boolean, and array)
Multicolumn index on 3 fields with heterogenous data types
Have a fitting index on g.fk_id, too. Again, a multicolumn index on (fk_id, id, geography) might pay if you can get index-only scans out of it. Default btree index, fk_id must be first index column. Especially if you run the query often and rarely update the table and table rows are much wider than the index.
You can initialize variables at declaration time. Only needed once after the rewrite.
ON COMMIT DROP drops the temp tables at the end of the transaction automatically. So I removed dropping tables explicitly. But you get an exception if you call the function in the same transaction twice. In the function I would check for existence of the temp table and use TRUNCATE in this case. Related:
How to check if a table exists in a given schema
Use GET DIAGNOSTICS to get the row count instead of running another query for the count.
Count rows affected by DELETE
You need GET DIAGNOSTICS. CREATE TABLE does not set FOUND (as is mentioned in the manual).
It's faster to add an index or PK / UNIQUE constraint after filling the table. And not before we actually need it.
ON CONFLICT ... DO ... is the simpler and cheaper way for UPSERT since Postgres 9.5.
How to UPSERT (MERGE, INSERT ... ON DUPLICATE UPDATE) in PostgreSQL?
For the simple form of the command you just list index columns or expressions (like ON CONFLICT (id) DO ...) and let Postgres perform unique index inference to determine an arbiter constraint or index. I later optimized by providing the constraint directly. But for this we need an actual constraint - a unique index is not enough. Fixed accordingly. Details in the manual here.
It may help to ANALYZE temporary tables manually to help Postgres find the best query plan. (But I don't think you need it in your case.)
Are regular VACUUM ANALYZE still recommended under 9.1?
_geo_ct - _geographyLength > 0 is an awkward and more expensive way of saying _geo_ct > _geographyLength. But that's gone completely now.
Don't quote the language name. Just LANGUAGE plpgsql.
Your function parameter is varchar[] for an array of fk_id, but you later commented:
It is a bigint field that represents a geographic area (it's actually a precomputed s2cell id at level 15).
I don't know s2cell id at level 15, but ideally you pass an array of matching data type, or if that's not an option default to text[].
Also since you commented:
There are always exactly 13 fk_ids passed in.
This seems like a perfect use case for a VARIADIC function parameter. So your function definition would be:
CREATE OR REPLACE FUNCTION public.function_name(_fk_ids VARIADIC text[]) ...
Details:
Pass multiple values in single parameter
Solution 2: Plain SQL with recursive CTE
It's hard to wrap an rCTE around two alternating loops, but possible with some SQL finesse:
WITH RECURSIVE cte AS (
SELECT g.id, g.geography::text, NULL::text AS path, text 'geo' AS type
FROM geographies g
WHERE g.fk_id = ANY($kf_ids) -- your input array here
UNION
SELECT p.id, g.geography::text, p.path::text
, CASE WHEN p.path IS NULL THEN 'geo' ELSE 'path' END AS type
FROM cte c
LEFT JOIN paths p ON c.type = 'geo'
AND ST_Intersects(c.geography::geography, p.path)
LEFT JOIN geographies g ON c.type = 'path'
AND ST_Intersects(g.geography, c.path::geography)
WHERE (p.path IS NOT NULL OR g.geography IS NOT NULL)
)
SELECT id, type FROM cte;
That's all.
You need the same indexes as above. You might wrap it into an SQL function for repeated use.
Major additional points
The cast to text is necessary because the geography type is not "hashable" (same for geometry). (See this open PostGIS issue for details.) Work around it by casting to text. Rows are unique by virtue of (id, type) alone, we can ignore the geography columns for this. Cast back to geography for the join. Shouldn't cost too much extra.
We need two LEFT JOIN so not to exclude rows, because at each iteration only one of the two tables may contribute more rows.
The final condition makes sure we are not done, yet:
WHERE (p.path IS NOT NULL OR g.geography IS NOT NULL)
This works because duplicate findings are excluded from the temporary
intermediate table. The manual:
For UNION (but not UNION ALL), discard duplicate rows and rows that
duplicate any previous result row. Include all remaining rows in the
result of the recursive query, and also place them in a temporary
intermediate table.
So which is faster?
The rCTE is probably faster than the function for small result sets. The temp tables and indexes in the function mean considerably more overhead. For large result sets the function may be faster, though. Only testing with your actual setup can give you a definitive answer.*
See the OP's feedback in the comment.
I figured it'd be good to post my own solution here even if it isn't optimal.
Here is what I came up with (using Steve Chambers' advice):
CREATE OR REPLACE FUNCTION public.function_name(
_fk_ids character varying[])
RETURNS TABLE(id uuid, type character varying)
LANGUAGE 'plpgsql'
COST 100.0
VOLATILE
ROWS 1000.0
AS $function$
DECLARE
_pathLength bigint;
_geographyLength bigint;
_currentPathLength bigint;
_currentGeographyLength bigint;
BEGIN
DROP TABLE IF EXISTS _pathIds;
DROP TABLE IF EXISTS _geographyIds;
CREATE TEMPORARY TABLE _pathIds (id UUID PRIMARY KEY);
CREATE TEMPORARY TABLE _geographyIds (id UUID PRIMARY KEY);
-- get all geographies in the specified _fk_ids
INSERT INTO _geographyIds
SELECT g.id
FROM geographies g
WHERE g.fk_id= ANY(_fk_ids);
_pathLength := 0;
_geographyLength := 0;
_currentPathLength := 0;
_currentGeographyLength := (SELECT COUNT(_geographyIds.id) FROM _geographyIds);
-- _pathIds := ARRAY[]::uuid[];
WHILE (_currentPathLength - _pathLength > 0) OR (_currentGeographyLength - _geographyLength > 0) LOOP
_pathLength := (SELECT COUNT(_pathIds.id) FROM _pathIds);
_geographyLength := (SELECT COUNT(_geographyIds.id) FROM _geographyIds);
-- gets all paths that have paths that intersect the geographies that aren't in the current list of path ids
INSERT INTO _pathIds
SELECT DISTINCT p.id
FROM paths p
JOIN geographies g ON ST_Intersects(g.geography, p.path)
WHERE
g.id IN (SELECT _geographyIds.id FROM _geographyIds) AND
p.id NOT IN (SELECT _pathIds.id from _pathIds);
-- gets all geographies that have paths that intersect the paths that aren't in the current list of geography ids
INSERT INTO _geographyIds
SELECT DISTINCT g.id
FROM geographies g
JOIN paths p ON ST_Intersects(g.geography, p.path)
WHERE
p.id IN (SELECT _pathIds.id FROM _pathIds) AND
g.id NOT IN (SELECT _geographyIds.id FROM _geographyIds);
_currentPathLength := (SELECT COUNT(_pathIds.id) FROM _pathIds);
_currentGeographyLength := (SELECT COUNT(_geographyIds.id) FROM _geographyIds);
END LOOP;
RETURN QUERY
SELECT _geographyIds.id, 'geography' AS type FROM _geographyIds
UNION ALL
SELECT _pathIds.id, 'path' AS type FROM _pathIds;
END;
$function$;
Sample plot and data from this script
It can be pure relational with an aggregate function. This implementation uses one path table and one point table. Both are geometries. The point is easier to create test data with and to test than a generic geography but it should be simple to adapt.
create table path (
path_text text primary key,
path geometry(linestring) not null
);
create table point (
point_text text primary key,
point geometry(point) not null
);
A type to keep the aggregate function's state:
create type mpath_mpoint as (
mpath geometry(multilinestring),
mpoint geometry(multipoint)
);
The state building function:
create or replace function path_point_intersect (
_i mpath_mpoint[], _e mpath_mpoint
) returns mpath_mpoint[] as $$
with e as (select (e).mpath, (e).mpoint from (values (_e)) e (e)),
i as (select mpath, mpoint from unnest(_i) i (mpath, mpoint))
select array_agg((mpath, mpoint)::mpath_mpoint)
from (
select
st_multi(st_union(i.mpoint, e.mpoint)) as mpoint,
(
select st_collect(gd)
from (
select gd from st_dump(i.mpath) a (a, gd)
union all
select gd from st_dump(e.mpath) b (a, gd)
) s
) as mpath
from i inner join e on st_intersects(i.mpoint, e.mpoint)
union all
select i.mpoint, i.mpath
from i inner join e on not st_intersects(i.mpoint, e.mpoint)
union all
select e.mpoint, e.mpath
from e
where not exists (
select 1 from i
where st_intersects(i.mpoint, e.mpoint)
)
) s;
$$ language sql;
The aggregate:
create aggregate path_point_agg (mpath_mpoint) (
sfunc = path_point_intersect,
stype = mpath_mpoint[]
);
This query will return a set of multilinestring, multipoint strings containing the matched paths/points:
select st_astext(mpath), st_astext(mpoint)
from unnest((
select path_point_agg((st_multi(path), st_multi(mpoint))::mpath_mpoint)
from (
select path, st_union(point) as mpoint
from
path
inner join
point on st_intersects(path, point)
group by path
) s
)) m (mpath, mpoint)
;
st_astext | st_astext
-----------------------------------------------------------+-----------------------------
MULTILINESTRING((-10 0,10 0,8 3),(0 -10,0 10),(2 1,4 -1)) | MULTIPOINT(0 0,0 5,3 0,5 0)
MULTILINESTRING((-9 -8,4 -8),(-8 -9,-8 6)) | MULTIPOINT(-8 -8,2 -8)
MULTILINESTRING((-7 -4,-3 4,-5 6)) | MULTIPOINT(-6 -2)

Optimize INSERT / UPDATE / DELETE operation

I wonder if the following script can be optimized somehow. It does write a lot to disk because it deletes possibly up-to-date rows and reinserts them. I was thinking about applying something like "insert ... on duplicate key update" and found some possibilities for single-row updates but I don't know how to apply it in the context of INSERT INTO ... SELECT query.
CREATE OR REPLACE FUNCTION update_member_search_index() RETURNS VOID AS $$
DECLARE
member_content_type_id INTEGER;
BEGIN
member_content_type_id :=
(SELECT id FROM django_content_type
WHERE app_label='web' AND model='member');
DELETE FROM watson_searchentry WHERE content_type_id = member_content_type_id;
INSERT INTO watson_searchentry (engine_slug, content_type_id, object_id
, object_id_int, title, description, content
, url, meta_encoded)
SELECT 'default',
member_content_type_id,
web_member.id,
web_member.id,
web_member.name,
'',
web_user.email||' '||web_member.normalized_name||' '||web_country.name,
'',
'{}'
FROM web_member
INNER JOIN web_user ON (web_member.user_id = web_user.id)
INNER JOIN web_country ON (web_member.country_id = web_country.id)
WHERE web_user.is_active=TRUE;
END;
$$ LANGUAGE plpgsql;
EDIT: Schemas of web_member, watson_searchentry, web_user, web_country: http://pastebin.com/3tRVPPVi.
The main point is to update columns title and content in watson_searchentry. There is a trigger on the table that sets value of column search_tsv based on these columns.
(content_type_id, object_id_int) in watson_searchentry is unique pair in the table but atm the index is not present (there is no use for it).
This script should be run at most once a day for full rebuilds of search index and occasionally after importing some data.
Modified table definition
If you really need those columns to be NOT NULL and you really need the string 'default' as default for engine_slug, I would advice to introduce column defaults:
COLUMN | TYPE | Modifiers
-----------------+-------------------------+---------------------
id | INTEGER | NOT NULL DEFAULT ...
engine_slug | CHARACTER VARYING(200) | NOT NULL DEFAULT 'default'
content_type_id | INTEGER | NOT NULL
object_id | text | NOT NULL
object_id_int | INTEGER |
title | CHARACTER VARYING(1000) | NOT NULL
description | text | NOT NULL DEFAULT ''
content | text | NOT NULL
url | CHARACTER VARYING(1000) | NOT NULL DEFAULT ''
meta_encoded | text | NOT NULL DEFAULT '{}'
search_tsv | tsvector | NOT NULL
...
DDL statement would be:
ALTER TABLE watson_searchentry ALTER COLUMN engine_slug DEFAULT 'default';
Etc.
Then you don't have to insert those values manually every time.
Also: object_id text NOT NULL, object_id_int INTEGER? That's odd. I guess you have your reasons ...
I'll go with your updated requirement:
The main point is to update columns title and content in watson_searchentry
Of course, you must add a UNIQUE constraint to enforce your requirements:
ALTER TABLE watson_searchentry
ADD CONSTRAINT ws_uni UNIQUE (content_type_id, object_id_int)
The accompanying index will be used. By this query for starters.
BTW, I almost never use varchar(n) in Postgres. Just text. Here's one reason.
Query with data-modifying CTEs
This could be rewritten as a single SQL query with data-modifying common table expressions, also called "writeable" CTEs. Requires Postgres 9.1 or later.
Additionally, this query only deletes what has to be deleted, and updates what can be updated.
WITH ctyp AS (
SELECT id AS content_type_id
FROM django_content_type
WHERE app_label = 'web'
AND model = 'member'
)
, sel AS (
SELECT ctyp.content_type_id
,m.id AS object_id_int
,m.id::text AS object_id -- explicit cast!
,m.name AS title
,concat_ws(' ', u.email,m.normalized_name,c.name) AS content
-- other columns have column default now.
FROM web_user u
JOIN web_member m ON m.user_id = u.id
JOIN web_country c ON c.id = m.country_id
CROSS JOIN ctyp
WHERE u.is_active
)
, del AS ( -- only if you want to del all other entries of same type
DELETE FROM watson_searchentry w
USING ctyp
WHERE w.content_type_id = ctyp.content_type_id
AND NOT EXISTS (
SELECT 1
FROM sel
WHERE sel.object_id_int = w.object_id_int
)
)
, up AS ( -- update existing rows
UPDATE watson_searchentry
SET object_id = s.object_id
,title = s.title
,content = s.content
FROM sel s
WHERE w.content_type_id = s.content_type_id
AND w.object_id_int = s.object_id_int
)
-- insert new rows
INSERT INTO watson_searchentry (
content_type_id, object_id_int, object_id, title, content)
SELECT sel.* -- safe to use, because col list is defined accordingly above
FROM sel
LEFT JOIN watson_searchentry w1 USING (content_type_id, object_id_int)
WHERE w1.content_type_id IS NULL;
The subquery on django_content_type always returns a single value? Otherwise, the CROSS JOIN might cause trouble.
The first CTE sel gathers the rows to be inserted. Note how I pick matching column names to simplify things.
In the CTE del I avoid deleting rows that can be updated.
In the CTE up those rows are updated instead.
Accordingly, I avoid inserting rows that were not deleted before in the final INSERT.
Can easily be wrapped into an SQL or PL/pgSQL function for repeated use.
Not secure for heavy concurrent use. Much better than the function you had, but still not 100% robust against concurrent writes. But that's not an issue according to your updated info.
Replacing the UPDATEs with DELETE and INSERT may or may not be a lot more expensive. Internally every UPDATE results in a new row version anyways, due to the MVCC model.
Speed first
If you don't really care about preserving old rows, your simpler approach may be faster: Delete everything and insert new rows. Also, wrapping into a plpgsql function saves a bit of planning overhead. Your function basically, with a couple of minor simplifications and observing the defaults added above:
CREATE OR REPLACE FUNCTION update_member_search_index()
RETURNS VOID AS
$func$
DECLARE
_ctype_id int := (
SELECT id
FROM django_content_type
WHERE app_label='web'
AND model = 'member'
); -- you can assign at declaration time. saves another statement
BEGIN
DELETE FROM watson_searchentry
WHERE content_type_id = _ctype_id;
INSERT INTO watson_searchentry
(content_type_id, object_id, object_id_int, title, content)
SELECT _ctype_id, m.id, m.id::int,m.name
,u.email || ' ' || m.normalized_name || ' ' || c.name
FROM web_member m
JOIN web_user u USING (user_id)
JOIN web_country c ON c.id = m.country_id
WHERE u.is_active;
END
$func$ LANGUAGE plpgsql;
I even refrain from using concat_ws(): It is safe against NULL values and simplifies code, but a bit slower than simple concatenation.
Also:
There is a trigger on the table that sets value of column search_tsv
based on these columns.
It would be faster to incorporate the logic into this function - if this is the only time the trigger is needed. Else, it's probably not worth the fuss.

How to access columns on a cursor which is a join on all elements of two tables in Oracle PL/SQL

I am trying to run a cursor on full join of two tables but having problem accessing the columns in cursor.
CREATE TABLE APPLE(
MY_ID VARCHAR(2) NOT NULL,
A_TIMESTAMP TIMESTAMP,
A_NAME VARCHAR(10)
);
CREATE TABLE BANANA(
MY_ID VARCHAR(2) NOT NULL,
B_TIMESTAMP TIMESTAMP,
B_NAME VARCHAR(10)
);
I have written a Full join to return all related rows from tables A and B where any of the two timestamps are in future.
i.e. if a row in table APPLE has timestamp in future then fetch row from APPLE joined with row from BANANA on MY_ID
Similarly, if a row in table BANANA has timestamp in future then fetch row from BANANA joined with row from APPLE on MY_ID
This full join works for me.
select * from APPLE a full join BANANA b on a.MY_ID = b.MY_ID where
(
a.A_TIMESTAMP > current_timestamp
or b.B_TIMESTAMP > current_timestamp
);
Now I want to iterate over each joined record and do some processing. I am able to access the columns which are only present in one tables but getting error when trying to access the column names which are same in both tables. For ex. ID in this case.
create or replace
PROCEDURE testProc(someDate IN DATE)
AS
CURSOR c1 IS
select * from APPLE a full join BANANA b on a.MY_ID = b.MY_ID where
(
a.A_TIMESTAMP > current_timestamp
or b.B_TIMESTAMP > current_timestamp
);
BEGIN
FOR rec IN c1
LOOP
DBMS_OUTPUT.PUT_LINE(rec.A_NAME);
DBMS_OUTPUT.PUT_LINE(rec.A_TIMESTAMP);
DBMS_OUTPUT.PUT_LINE(rec.MY_ID);
END LOOP;
END testProc;
I get this error when I compile the above proc:
Error(16,28): PLS-00302: component 'MY_ID' must be declared
and I am not sure how would I access the MY_ID element. I am sure it will be pretty
straight forward but I am new to database programming and have been trying but unable to find the right way to do it.
Any help is appreciated.
Thanks
One other thing you can do in this case is to join the tables with the USING clause instead of using ON, as in:
select *
from APPLE a
full join BANANA b
USING (MY_ID)
where a.A_TIMESTAMP > current_timestamp or
b.B_TIMESTAMP > current_timestamp
USING can be used if the columns on both tables have the same name, and the comparison of the key values is made using the equality ('=') operator. In the result set there will be one column named MY_ID along with the other columns from both table (A_TIMESTAMP, B_TIMESTAMP, etc).
Share and enjoy.
I assume the problem is that MY_ID is defined in both tables, so * gets both of them. Try defining the cursor using this query:
select coalesce(A.MY_ID, B.MY_ID) as MY_ID,
A_TIMESTAMP, A_NAME, B_TIMESTAMP, B_NAME
from APPLE a full join
BANANA b
on a.MY_ID = b.MY_ID
where a.A_TIMESTAMP > current_timestamp or b.B_TIMESTAMP > current_timestamp;
EDIT:
You have two issues with conflicting columns. If this were just an inner join, you could do:
select A.*, B_TIMESTAMP, B_NAME
That is, you can select the columns from one table using * and the rest individually. However, this is a full outer join, so there is a set of columns where you want to use coalesce().
So, the best answer is that you should list out all the columns. This is good coding practice anyway, and helps protect the code from inadvertent mistakes when columns are added or removed from the table.

formula for computed column based on different table's column

Consider this table: c_const
code | nvalue
--------------
1 | 10000
2 | 20000
and another table t_anytable
rec_id | s_id | n_code
---------------------
2 | x | 1
The goal is to have s_id be a computed column, based on this formula:
rec_id*(select nvalue from c_const where code=ncode)
This produces an error:
Subqueries are not allowed in this context. Only scalar expressions are allowed.
How can I calculate the value for this computed column using another table's column as an input?
You could create a user-defined function for this:
CREATE FUNCTION dbo.GetValue(#ncode INT, #recid INT)
RETURNS INT
AS
SELECT #recid * nvalue
FROM c_const
WHERE code = #ncode
and then use that to define your computed column:
ALTER TABLE dbo.YourTable
ADD NewColumnName AS dbo.GetValue(ncodeValue, recIdValue)
This seems to be more of a job for views (indexed views, if you need fast lookups on the computed column):
CREATE VIEW AnyView
WITH SCHEMABINDING
AS
SELECT a.rec_id, a.s_id, a.n_code, a.rec_id * c.nvalue AS foo
FROM AnyTable a
INNER JOIN C_Const c
ON c.code = a.n_code
This has a subtle difference from the subquery version in that it would return multiple records instead of producing an error if there are multiple results for the join. But that is easily resolved with a UNIQUE constraint on c_const.code (I suspect it's already a PRIMARY KEY).
It's also a lot easier for someone to understand than the subquery version.
You can do it with a subquery and UDF as marc_s has shown, but that's likely to be highly inefficient compared to a simple JOIN, since a scalar UDF will need to be computed row-by-row.