Performantly update table with 2 million rows with Postgres/PostGIS

I have two tables:
properties (geo_point POINT, locality_id INTEGER, neighborhood_id INTEGER, id UUID)
places_temp (id INTEGER, poly GEOMETRY, placetype TEXT)
Note: all columns in places_temp are indexed.
properties has ~2 million rows and I would like to:
update locality_id and neighborhood_id for each row in properties with the id from places_temp where properties.geo_point is contained by a polygon in places_temp.poly
Whatever I do, it just seems to hang for hours, during which time I don't know whether it's working, whether the connection has been lost, etc.
Any thoughts on how to do this performantly?
My query:
-- drop indexes on locality_id and neighborhood_id to speed up update
DROP INDEX IF EXISTS idx_properties_locality_id;
DROP INDEX IF EXISTS idx_properties_neighborhood_id;
-- for each property find the locality and neighborhood
UPDATE properties
SET locality_id = (
        SELECT id
        FROM places_temp
        WHERE placetype = 'locality'
            -- check if geo_point is contained by the polygon; geo_point is stored as
            -- SRID 26910 so it must be transformed first
            AND st_intersects(st_transform(geo_point, 4326), poly)
        LIMIT 1),
    neighborhood_id = (
        SELECT id
        FROM places_temp
        WHERE placetype = 'neighbourhood'
            -- check if geo_point is contained by the polygon; geo_point is stored as
            -- SRID 26910 so it must be transformed first
            AND st_intersects(st_transform(geo_point, 4326), poly)
        LIMIT 1);
-- Add indexes back after update
CREATE INDEX IF NOT EXISTS idx_properties_locality_id ON properties (locality_id);
CREATE INDEX IF NOT EXISTS idx_properties_neighborhood_id ON properties (neighborhood_id);

CREATE INDEX properties_point_idx ON properties USING gist (geo_point);
CREATE INDEX places_temp_poly_idx ON places_temp USING gist (poly);
UPDATE properties p
SET locality_id = x.id
FROM (SELECT p2.id AS property_id
           , t.id
           , row_number() OVER (PARTITION BY p2.id) AS rn
      FROM properties p2
      JOIN places_temp t
        ON t.placetype = 'locality'
       AND st_intersects(st_transform(p2.geo_point, 4326), t.poly)
     ) x
WHERE x.property_id = p.id
  AND x.rn = 1;
And similar for the other field (you could combine them into one query)
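A side note on the indexes above: because the filter transforms geo_point to SRID 4326, a plain GiST index on geo_point generally cannot be used for that predicate. One option is an expression index on the transformed point (a sketch; it assumes your PostGIS version marks ST_Transform as immutable, which recent releases do):
-- Index the transformed point so st_intersects(st_transform(geo_point, 4326), poly)
-- can use GiST; the index name here is just an example.
CREATE INDEX IF NOT EXISTS properties_point_4326_idx
    ON properties USING gist (st_transform(geo_point, 4326));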

Try this (it fills both columns in a single pass):
UPDATE properties p
SET locality_id     = t.locality_id,
    neighborhood_id = t.neighborhood_id
FROM (SELECT p2.id AS property_id,
             max(pt.id) FILTER (WHERE pt.placetype = 'locality')      AS locality_id,
             max(pt.id) FILTER (WHERE pt.placetype = 'neighbourhood') AS neighborhood_id
      FROM properties p2
      JOIN places_temp pt
        ON pt.placetype IN ('locality', 'neighbourhood')
       AND st_intersects(st_transform(p2.geo_point, 4326), pt.poly)
      GROUP BY p2.id
     ) t
WHERE t.property_id = p.id;
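Separately from which form of the UPDATE you use, the "hangs for hours with no feedback" part can be addressed by batching: commit the work in slices keyed by the primary key so each transaction stays short and progress is visible. A rough sketch (the 50,000 batch size and the NULL-driven loop condition are assumptions, not taken from the answers above):
-- Sketch: process ~50,000 properties per statement and commit between runs.
-- Caveat: properties that match no locality stay NULL and would be picked up
-- again, so in practice drive the loop by an id range or a "processed" flag.
WITH batch AS (
    SELECT id
    FROM properties
    WHERE locality_id IS NULL
    LIMIT 50000
)
UPDATE properties p
SET locality_id = t.id
FROM batch, places_temp t
WHERE p.id = batch.id
  AND t.placetype = 'locality'
  AND st_intersects(st_transform(p.geo_point, 4326), t.poly);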


SQL to catch incremental entries from a view

I have a historic table that won't receive new inserts, and a view that is updated with new inserts every day, so I need to know if my SQL is correct.
This SQL needs to get all entries that are in the FATO_Proposta_Planilha table (Table 1)
and add the entries from the FATO_Proposta_View table (Table 2) that are not already in it.
So this SQL must return all entries from Table 1 plus all entries from Table 2 that are not repeated in Table 1. Can you give an opinion about this SQL, please?
SELECT vw.[DescPac] [PA]
,vw.[DescRegional] [Regional]
,vw.[DescSuperintendencia] [Superintendencia]
,vw.[NUM_CPF_CNPJ] [Documento_Numero]
,pla.[Nome] [Nome]
,pla.[Produto] [Produto]
,pla.[Modalidade] [Modalidade]
,vw.[NUM_CONTRATO_CREDITO] [Contrato]
,vw.[DESC_FINALIDADE_OPCRED] [Finalidade]
,vw.[DATA_OPERACAO] [Data_operacao]
,pla.[Data_mov_entrada] [Data_mov_entrada]
,vw.[DATA_VENC_OPCRED] [Data_vencimento]
,vw.[VALOR_CONTRATO_OPCRED] [Valor_contrato]
,pla.[Processo_Lecon] [Processo_Lecon]
,CASE WHEN ISNULL(pla.Origem, '') = ''
THEN 'Esteira Convencional'
ELSE pla.Origem
END [Origem]
FROM Proposta_View vw
LEFT JOIN FATO_Proposta_Planilha pla
ON vw.NUM_CONTRATO_CREDITO = pla.Contrato
UNION
SELECT [PA] [PA]
,[Regional] [Regional]
,[Superintendencia] [Superintendencia]
,[Documento_Numero] [Documento_Numero]
,[Nome] [Nome]
,[Produto] [Produto]
,[Modalidade] [Modalidade]
,[Contrato] [Contrato]
,[Finalidade] [Finalidade]
,[Data_operacao] [Data_operacao]
,[Data_mov_entrada] [Data_mov_entrada]
,[Data_vencimento] [Data_vencimento]
,[Valor_contrato] [Valor_contrato]
,[Processo_Lecon] [Processo_Lecon]
,CASE WHEN ISNULL(Origem, '') = ''
THEN 'Esteira Convencional'
ELSE Origem
END [Origem]
If you are only inserting rows through the view you can add an extra column with a DEFAULT value to distinguish the old rows from the new ones.
For example if you have a table t as:
create table t (a int primary key not null);
insert into t (a) values (123), (456);
You can add the extra column as:
alter table t add is_new int default 1;
update t set is_new = 0;
create view v as select a from t;
Then each insert through the view won't see that new column, so the row will get the default value of 1.
insert into v (a) values (789), (444);
Then it's easy to find the new rows:
select * from t where is_new = 1;
Result:
a is_new
---- ------
444 1
789 1
See a running example at db<>fiddle.
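If changing the table is not possible, the stated goal (all of Table 1, plus only the Table 2 rows not already in it) is usually written as a UNION ALL with NOT EXISTS. A minimal sketch with hypothetical names: Table1, Table2 and a shared key column Contrato stand in for your real tables, and in practice both SELECT lists must spell out matching columns:
SELECT t1.*                 -- every historic row from Table 1
FROM Table1 t1
UNION ALL
SELECT t2.*                 -- only the Table 2 rows whose key is not in Table 1 yet
FROM Table2 t2
WHERE NOT EXISTS (SELECT 1
                  FROM Table1 t1
                  WHERE t1.Contrato = t2.Contrato);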

Using result of select inside alter table statement in postgres

I have a postgres table defined as below:
CREATE TABLE public.Table_1
(
    id bigint NOT NULL GENERATED ALWAYS AS IDENTITY
        (INCREMENT 1 START 1 MINVALUE 1 MAXVALUE 9223372036854775807 CACHE 1)
);
Due to data migration, the id column is messed up and the value for id that is being generated on INSERT is not unique. Hence, I need to reset the id column as below
SELECT MAX(id) + 1 From Table_1;
ALTER TABLE Table_1 ALTER COLUMN id RESTART WITH 935074;
Right now I run the first query to get the Max(id) + 1 value and then I need to substitute it in the ALTER query.
Is there a way to store the result of SELECT and just use the variable inside ALTER statement?
Here is one way to do it:
select setval(pg_get_serial_sequence('Table_1', 'id'), coalesce(max(id),0) + 1, false)
from Table_1;
Rationale:
pg_get_serial_sequence() returns the name of the sequence for the given table and column.
setval() can be used to reset the sequence.
This is wrapped in a select that computes the value to set: the current maximum of id plus 1 (or 1 if the table is empty).
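If you specifically want the ALTER TABLE ... RESTART form from the question rather than touching the sequence directly, the SELECT result can be fed into it with dynamic SQL in a DO block (a sketch):
DO $$
DECLARE
    next_id bigint;
BEGIN
    -- next value = current max + 1, or 1 for an empty table
    SELECT COALESCE(MAX(id), 0) + 1 INTO next_id FROM Table_1;
    EXECUTE format('ALTER TABLE Table_1 ALTER COLUMN id RESTART WITH %s', next_id);
END
$$;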

Convert table column data type from blob to raw

I have a table designed like this:
create table tbl (
    id   number(5),
    data blob
);
It turns out that the data column holds only very small values, which could be stored in raw(200), so the new table would be:
create table tbl (
    id   number(5),
    data raw(200)
);
How can I migrate this table to the new design without losing the data in it?
This is a somewhat lengthy method, but it works if you are sure that your data column values never exceed 200 bytes.
Create a table to hold the contents of tbl temporarily
create table tbl_temp as select * from tbl;
Rem -- Ensure that tbl_temp contains all the contents
select * from tbl_temp;
Rem -- Double verify by subtracting the contents
select * from tbl minus select * from tbl_temp;
Delete the contents in tbl
delete from tbl;
commit;
Drop column data
alter table tbl drop column data;
Create a column data with raw(200) type
alter table tbl add data raw(200);
Select & insert from the temporary table created
insert into tbl select id, dbms_lob.substr(data,200,1) from tbl_temp;
commit;
We are using the substr method of the dbms_lob package, which returns raw data, so the resulting value can be inserted directly.
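A shorter in-place variant of the same idea, if you prefer not to copy the whole table (still a sketch, and it assumes every blob value really fits in 200 bytes):
-- Add the new raw column, copy the data across, then swap the columns.
alter table tbl add data_new raw(200);

update tbl
   set data_new = dbms_lob.substr(data, 200, 1);

alter table tbl drop column data;
alter table tbl rename column data_new to data;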

Why does this stored procedure return an empty set?

I'm trying to build a stored proc to encapsulate some complicated logic. Here's the basic code, anonymized a little:
SET TERM ^ ;
RECREATE PROCEDURE GET_DATA (
    USERID INTEGER,
    W INTEGER,
    X INTEGER,
    Y INTEGER)
RETURNS (
    ID INTEGER,
    NAME VARCHAR(64) CHARACTER SET UTF8)
AS
BEGIN
    select first 1
        QP.ID,
        QO.NAME
    from QP
    join QO
        on QO.ID = QP.QO_ID
    where
        (QO.W = :w) and (QO.X = :x) and (QO.Y = :y)
        and ((QP.PREREQUISITE in (
                select VALUE
                from LOOKUP_TABLE1
                where USER_ID = :userid))
            or (QP.PREREQUISITE is null))
        and (QO.Q_ID not in (
                select VALUE
                from LOOKUP_TABLE2
                where USER_ID = :userid))
    order by QP.SEQUENCE desc
    into :ID, :NAME;
    suspend;
END^
SET TERM ; ^
It's expected to return either 1 or 0 results. It's logically correct; if I take the SELECT query, substitute the parameters manually, and run it in Firebird Maestro, it gives the expected result. But if I say select ID, NAME from GET_DATA(1, 1, 2, 3), with the same parameters, I get back an empty result set.
So something's going wrong at the stored procedure level. Anyone have any idea what it is and how I can fix it?
Your procedure always returns one row, even if the select finds nothing, because the SUSPEND runs regardless of the selection.
To return 0 or 1 rows depending on what the select finds, you can use:
SET TERM ^ ;
RECREATE PROCEDURE GET_DATA (
    USERID INTEGER,
    W INTEGER,
    X INTEGER,
    Y INTEGER)
RETURNS (
    ID INTEGER,
    NAME VARCHAR(64) CHARACTER SET UTF8)
AS
BEGIN
    FOR select first 1
            QP.ID,
            QO.NAME
        from QP
        join QO
            on QO.ID = QP.QO_ID
        where
            (QO.W = :w) and (QO.X = :x) and (QO.Y = :y)
            and ((QP.PREREQUISITE in (
                    select VALUE
                    from LOOKUP_TABLE1
                    where USER_ID = :userid))
                or (QP.PREREQUISITE is null))
            and (QO.Q_ID not in (
                    select VALUE
                    from LOOKUP_TABLE2
                    where USER_ID = :userid))
        order by QP.SEQUENCE desc
        into :ID, :NAME
    DO
        suspend;
END^
SET TERM ; ^
You are treating this as a selectable procedure, but Firebird doesn't see it as a selectable procedure; it sees it as an executable one. If you add a SUSPEND at the end it will, and AFAIK that should solve your problem.
SUSPEND is like an inverted FETCH: it means "send a row as a result".
If you add four lines with SUSPEND, the result of your select * from GET_DATA(parameters) should be at least four rows with null values, which would mean that your select statement returned no rows, or one row with null values.
Do some tests with a simpler query.
select 1, 'TEST' from RDB$DATABASE into :ID, :NAME;
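Wrapped into a throwaway procedure, that test would look something like this (a sketch, reusing the header style of the procedure above):
SET TERM ^ ;
RECREATE PROCEDURE GET_DATA_TEST
RETURNS (
    ID INTEGER,
    NAME VARCHAR(64) CHARACTER SET UTF8)
AS
BEGIN
    select 1, 'TEST' from RDB$DATABASE into :ID, :NAME;
    suspend;
END^
SET TERM ; ^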

Find the last value in a "rolled-over" sequence with a stored procedure?

Suppose I had a set of alpha-character identifiers of a set length, e.g. always five letters, and they are assigned in such a way that they are always incremented sequentially (GGGGZ --> GGGHA, etc.). Now, if I get to ZZZZZ, since the length is fixed, I must "roll over" to AAAAA. I might have a contiguous block from ZZZAA through AAAAM. I want to write a sproc that will give me the "next" identifier, in this case AAAAN.
If I didn't have this "rolling over" issue, of course, I'd just ORDER BY DESC and grab the top result. But I'm at a bit of a loss now -- and it doesn't help at all that SQL is not my strongest language.
If I have to I can move this to my C# calling code, but a sproc would be a better fit.
ETA: I would like to avoid changing the schema (new column or new table); I'd rather just be able to "figure it out". I might even prefer to do it brute force (e.g. start at the lowest value and increment until I find a "hole"), even though that could get expensive. If you have an answer that does not modify the schema, it'd be a better solution for my needs.
Here's code that I think will give you your next value. I created three functions. The table is just my simulation of the table/column with your alpha IDs (I used MyTable.AlphaID). I assume, as you implied, that there is one contiguous block of five-character uppercase alphabetic strings (AlphaID):
IF OBJECT_ID('dbo.MyTable','U') IS NOT NULL
DROP TABLE dbo.MyTable
GO
CREATE TABLE dbo.MyTable (AlphaID char(5) PRIMARY KEY)
GO
-- Play with different population scenarios for testing
INSERT dbo.MyTable VALUES ('ZZZZY')
INSERT dbo.MyTable VALUES ('ZZZZZ')
INSERT dbo.MyTable VALUES ('AAAAA')
INSERT dbo.MyTable VALUES ('AAAAB')
GO
IF OBJECT_ID('dbo.ConvertAlphaIDToInt','FN') IS NOT NULL
DROP FUNCTION dbo.ConvertAlphaIDToInt
GO
CREATE FUNCTION dbo.ConvertAlphaIDToInt (@AlphaID char(5))
RETURNS int
AS
BEGIN
    RETURN 1 + ASCII(SUBSTRING(@AlphaID,5,1)) - 65
             + ((ASCII(SUBSTRING(@AlphaID,4,1)) - 65) * 26)
             + ((ASCII(SUBSTRING(@AlphaID,3,1)) - 65) * POWER(26,2))
             + ((ASCII(SUBSTRING(@AlphaID,2,1)) - 65) * POWER(26,3))
             + ((ASCII(SUBSTRING(@AlphaID,1,1)) - 65) * POWER(26,4))
END
GO
IF OBJECT_ID('dbo.ConvertIntToAlphaID','FN') IS NOT NULL
DROP FUNCTION dbo.ConvertIntToAlphaID
GO
CREATE FUNCTION dbo.ConvertIntToAlphaID (@ID int)
RETURNS char(5)
AS
BEGIN
    RETURN CHAR((@ID-1) / POWER(26,4) + 65)
         + CHAR((@ID-1) % POWER(26,4) / POWER(26,3) + 65)
         + CHAR((@ID-1) % POWER(26,3) / POWER(26,2) + 65)
         + CHAR((@ID-1) % POWER(26,2) / 26 + 65)
         + CHAR((@ID-1) % 26 + 65)
END
GO
IF OBJECT_ID('dbo.GetNextAlphaID','FN') IS NOT NULL
DROP FUNCTION dbo.GetNextAlphaID
GO
CREATE FUNCTION dbo.GetNextAlphaID ()
RETURNS char(5)
AS
BEGIN
    DECLARE @MaxID char(5), @ReturnVal char(5)

    SELECT @MaxID = MAX(AlphaID) FROM dbo.MyTable

    IF @MaxID < 'ZZZZZ'
        RETURN dbo.ConvertIntToAlphaID(dbo.ConvertAlphaIDToInt(@MaxID)+1)
    IF @MaxID IS NULL
        RETURN 'AAAAA'

    SELECT @MaxID = MAX(AlphaID)
    FROM dbo.MyTable
    WHERE AlphaID < dbo.ConvertIntToAlphaID((SELECT COUNT(*) FROM dbo.MyTable))

    IF @MaxID IS NULL
        RETURN 'AAAAA'

    RETURN dbo.ConvertIntToAlphaID(dbo.ConvertAlphaIDToInt(@MaxID)+1)
END
GO
SELECT * FROM dbo.MyTable ORDER BY dbo.ConvertAlphaIDToInt(AlphaID)
GO
SELECT dbo.GetNextAlphaID () AS 'NextAlphaID'
By the way, if you don't want to assume contiguity, you can do as you suggested and (if there's a 'ZZZZZ' row) use the first gap in the sequence. Replace the last function with this:
IF OBJECT_ID('dbo.GetNextAlphaID_2','FN') IS NOT NULL
DROP FUNCTION dbo.GetNextAlphaID_2
GO
CREATE FUNCTION dbo.GetNextAlphaID_2 ()
RETURNS char(5)
AS
BEGIN
    DECLARE @MaxID char(5), @ReturnVal char(5)

    SELECT @MaxID = MAX(AlphaID) FROM dbo.MyTable

    IF @MaxID < 'ZZZZZ'
        RETURN dbo.ConvertIntToAlphaID(dbo.ConvertAlphaIDToInt(@MaxID)+1)
    IF @MaxID IS NULL
        RETURN 'AAAAA'

    SELECT TOP 1 @MaxID = M1.AlphaID
    FROM dbo.MyTable M1
    WHERE NOT EXISTS (SELECT 1 FROM dbo.MyTable M2
                      WHERE M2.AlphaID = dbo.ConvertIntToAlphaID(dbo.ConvertAlphaIDToInt(M1.AlphaID) + 1))
    ORDER BY M1.AlphaID

    IF @MaxID IS NULL
        RETURN 'AAAAA'

    RETURN dbo.ConvertIntToAlphaID(dbo.ConvertAlphaIDToInt(@MaxID)+1)
END
GO
You'd have to store the last allocated identifier in the sequence.
For example, store it in another table that has one column & one row.
CREATE TABLE CurrentMaxId (
Id CHAR(6) NOT NULL
);
INSERT INTO CurrentMaxId (Id) VALUES ('AAAAAA');
Each time you allocate a new identifier, you'd fetch the value in that tiny table, increment it, and store that value in your main table as well as updating the value in CurrentMaxId.
The usual caveats apply with respect to concurrency, table-locking, etc.
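In sketch form, one allocation round might look like the following; dbo.NextAlphaId is a hypothetical helper that increments the string (rolling ZZZZZZ over to AAAAAA), MyMainTable is a placeholder name, and the UPDLOCK/HOLDLOCK hints are one way to serialize concurrent allocators:
BEGIN TRANSACTION;

DECLARE @next CHAR(6);

-- Read the current value and compute its successor while blocking other allocators
SELECT @next = dbo.NextAlphaId(Id)
FROM CurrentMaxId WITH (UPDLOCK, HOLDLOCK);

UPDATE CurrentMaxId SET Id = @next;

INSERT INTO MyMainTable (AlphaId) VALUES (@next);   -- placeholder main table

COMMIT TRANSACTION;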
I think I'd have tried to store the sequence as an integer, then translate it to string. Or else store a parallel integer column that is incremented at the same time as the alpha value. Either way, you could sort on the integer column.
A problem here is that you can't really tell from the data where the "last" entry is unless there is more detail as to how the old entries are deleted.
If I understand correctly, you are wrapping around at the end of the sequence, which means you must be deleting some of your old data to make space. However if the data isn't deleted in a perfectly uniform manner, you'll end up with fragments, like below:
ABCD HIJKL NOPQRS WXYZ
You'll notice that there is no obvious next value...D could be the last value created, but it might also be L or S.
At best you could look for the first or last missing element (use a stored procedure to perform a x+1 check just like you would to find a missing element in an integer sequence), but it's not going to provide any special result for rolled-over lists.
Since I don't feel like writing code to increment letters, I'd create a table of all valid IDs (AAAAAA through ZZZZZZ) with an integer from 1 to X for those IDs. Then you can use the following:
DECLARE @max_id int

SELECT @max_id = MAX(id) FROM Possible_Silly_IDs

SELECT COALESCE(MAX(PSI2.silly_id), 'AAAAAA')
FROM My_Table T1
INNER JOIN Possible_Silly_IDs PSI1 ON
    PSI1.silly_id = T1.silly_id
INNER JOIN Possible_Silly_IDs PSI2 ON
    PSI2.id = CASE WHEN PSI1.id = @max_id THEN 1 ELSE PSI1.id + 1 END
LEFT OUTER JOIN My_Table T2 ON
    T2.silly_id = PSI2.silly_id
WHERE T2.silly_id IS NULL
The COALESCE is there in case the table is empty. To be truly robust you should calculate the 'AAAAAA' (SELECT @min_silly_id = silly_id FROM Possible_Silly_IDs WHERE id = 1) in case your "numbering" algorithm changes.
If you really wanted to do things right, you'd redo the database design as has been suggested.
I think the lowest-impact solution for my needs is to add an identity column. The one thing I can guarantee is that the ordering will be such that entries that should "come first" will be added first -- I'll never add one with identifier BBBB, then go back and add BBBA later. If I didn't have that constraint, obviously it wouldn't work, but as it stands, I can just order by the identity column and get the sort I want.
I'll keep thinking about the other suggestions -- maybe if they "click" in my head, they'll look like a better option.
To return the next ID for a given ID (with rollover), use:
SELECT COALESCE
(
(
SELECT TOP 1 id
FROM mytable
WHERE id > @id
ORDER BY
id
),
(
SELECT TOP 1 id
FROM mytable
ORDER BY
id
)
) AS nextid
This query searches for the ID next to the given. If there is no such ID, it returns the first ID.
Here are the results:
WITH mytable AS
(
SELECT 'AAA' AS id
UNION ALL
SELECT 'BBB' AS id
UNION ALL
SELECT 'CCC' AS id
UNION ALL
SELECT 'DDD' AS id
UNION ALL
SELECT 'EEE' AS id
)
SELECT mo.id,
COALESCE
(
(
SELECT TOP 1 id
FROM mytable mi
WHERE mi.id > mo.id
ORDER BY
id
),
(
SELECT TOP 1 id
FROM mytable mi
ORDER BY
id
)
) AS nextid
FROM mytable mo
id nextid
----- ------
AAA BBB
BBB CCC
CCC DDD
DDD EEE
EEE AAA
That is, it returns BBB for AAA, CCC for BBB, and so on, and finally AAA for EEE, which is the last value in the table.