Distance calculation in a trigger postgresql - sql

I've got two tables "modulo1_cella" and "modulo2_campionamento".
The first, "modulo1_cella" contains polygons, while the latter, "modulo2_campionamento", contains points (samples). Now, I need to assign to each polygon the nearest sample, and the identificative of the sampler itself.
Table "public.modulo1_cella"
Column | Type | Modifiers
-------------------+-------------------+------------------------------------------------------------------
cella_id | integer | not null default nextval('modulo1_cella_cella_id_seq'::regclass)
nome_cella | character varying |
geometria | geometry |
campione_id | integer |
dist_camp | double precision |
Table "public.modulo2_campionamento"
Column | Type | Modifiers
--------------------------+-----------------------------+----------------------------------------------------------------------------------
campione_id | integer | not null default nextval('modulo2_campionamento_aria_campione_id_seq'::regclass)
x_campionamento | double precision |
y_campionamento | double precision |
codice_campione | character varying(10) |
cella_id | integer |
geometria | geometry(Point,4326) |
I'm looking for an INSERT/UPDATE trigger that for each row of "modulo1_cella" table, i.e. for each polygon, returns:
the nearest sample, "campione_id";
the corrisponding distance, "dist_camp".
I created a query that works, but I'm not able to convert it to a trigger.
CREATE TEMP TABLE TemporaryTable
(
cella_id int,
campione_id int,
distanza double precision
);
INSERT INTO TemporaryTable(cella_id, campione_id, distanza)
SELECT
DISTINCT ON (m1c.cella_id) m1c.cella_id, m2cmp.campione_id, ST_Distance(m2cmp.geometria::geography, m1c.geometria::geography) as dist
FROM modulo1_cella As m1c, modulo2_campionamento As m2cmp
WHERE ST_DWithin(m2cmp.geometria::geography, m1c.geometria::geography, 50000)
ORDER BY m1c.cella_id, m2cmp.campione_id, ST_Distance(m2cmp.geometria::geography, m1c.geometria::geography);
UPDATE modulo1_cella as mc
SET campione_id=tt.campione_id, dist_camp=tt.distanza
from TemporaryTable as tt
where tt.cella_id=mc.cella_id;
DROP TABLE TemporaryTable;
Any help? Thank you in advance.

First, if "geometria" is not geography and is instead geometry, you should make it a geography type on the table.
ALTER TABLE modulo2_campionamento
ALTER COLUMN geometria
SET DATE TYPE geography(POINT 4326)
USING (geometria::geography);
ALTER TABLE modulo1_cella
ALTER COLUMN geometria
SET DATA TYPE geography(4326)
USING (geometria::geography);
Now, I need to assign to each polygon the nearest sample, and the identificative of the sampler itself.
You would not normally do this because it's very fast to find the nearest sample using a KNN search anyway.
CREATE INDEX ON modulo1_cella USING gist (geometria);
CREATE INDEX ON modulo2_campionamento USING gist (geometria);
VACUUM FULL ANALYZE modulo1_cella;
VACUUM FULL ANALYZE modulo2_campionamento;
SELECT *
FROM modulo1_cella As m1c
CROSS JOIN LATERAL (
SELECT *
FROM modulo2_campionamento As m2cmp
WHERE ST_DWithin(m2cmp.geometria, m1c.geometria, 50000)
ORDER BY m2cmp.geometria <-> m1c.geometria,
m1c.cella_id,
m2cmp.campione_id
FETCH FIRST ROW ONLY
) AS closest_match
That's much faster than the DISTINCT ON query you wrote.
If that is fast enough, I suggest using a VIEW. If that's not fast enough, I suggest using a MATERIALIZE VIEW. If it's still not fast enough you have a very niche load and it may be worth investigating a solution with triggers. But only then.

Related

Adding column to sqlite database and distribute rows based on primary key

I have some data elements containing a timestamp and information about Item X sales related to this timestamp.
e.g.
timestamp | items X sold
------------------------
1 | 10
4 | 40
7 | 20
I store this data in an SQLite table. Now I want to add to this table. Especially if I get data about another item Y.
The item Y data might or might not have different timestamps but I want to insert this data into the existing table so that it looks like this:
timestamp | items X sold | items Y sold
------------------------------------------
1 | 10 | 5
2 | NULL | 10
4 | 40 | NULL
5 | NULL | 3
7 | 20 | NULL
Later on additional sales data (columns) must be added with the same scheme.
Is there an easy way to accomplish this with SQLite?
In the end I want to fetch data by timestamp and get an overview which items were sold at this time. Most examples consider the usecase to add a complete row (one record) or a complete column if it perfectly matches to the other columns.
Or is sqlite the wrong tool at all? And I should rather use csv or excel?
(Using pythons sqlite3 package to create and manipulate the DB)
Thanks!
Dynamically adding columns is not a good design. You could add them using
ALTER TABLE your_table ADD COLUMN the_column_name TEXT
the column, for existing rows would be populated with nulls, although you could specify a DEFAULT value and the existing rows would then be populated with that value.
e.g. the following demonstrates the above :-
DROP TABLE IF EXISTS soldv1;
CREATE TABLE IF NOT EXISTS soldv1 (timestamp INTEGER PRIAMRY KEY, items_sold_x INTEGER);
INSERT INTO soldv1 VALUES(1,10),(4,40),(7,20);
SELECT * FROM soldv1 ORDER BY timestamp;
ALTER TABLE soldv1 ADD COLUMN items_sold_y INTEGER;
UPDATE soldv1 SET items_sold_y = 5 WHERE timestamp = 1;
INSERT INTO soldv1 VALUES(2,null,10),(5,null,3);
SELECT * FROM soldv1 ORDER BY timestamp;
resulting in the first query returning :-
and the second query returning :-
However, as stated, the above is not considered a good design as the schema is dynamic.
You could alternately manage an equivalent of the above with the addition of either a new column (to also be part of the primary key) or by prefixing/suffixing the timestamp with a type.
Consider, as an example, the following :-
DROP TABLE IF EXISTS soldv2;
CREATE TABLE IF NOT EXISTS soldv2 (type TEXT, timestamp INTEGER, items_sold INTEGER, PRIMARY KEY(timestamp,type));
INSERT INTO soldv2 VALUES('x',1,10),('x',4,40),('x',7,20);
INSERT INTO soldv2 VALUES('y',1,5),('y',2,10),('y',5,3);
INSERT INTO soldv2 VALUES('z',1,15),('z',2,5),('z',9,25);
SELECT * FROM soldv2 ORDER BY timestamp;
This has replicated, data-wise, your original data and additionally added another type (column items_sold_z) without having to change the table's schema (nor having the additional complication of needing to update rather than insert as per when applying timestamp 1 items_sold_y 5).
The result from the query being :-
Or is sqlite the wrong tool at all? And I should rather use csv or excel?
SQLite is a valid tool. What you then do with the data can probably be done as easy as in excel (perhaps simpler) and probably much simpler than trying to process the data in csv format.
For example, say you wanted the total items sold per timestamp and how many types were sold then :-
SELECT timestamp, count(items_sold) AS number_of_item_types_sold, sum(items_sold) AS total_sold FROM soldv2 GROUP by timestamp ORDER BY timestamp;
would result in :-

How can I update the table in SQL?

I've created a table called Youtuber, the code is below:
create table Channel (
codChannel int primary key,
name varchar(50) not null,
age float not null,
subscribers int not null,
views int not null
)
In this table, there are 2 channels:
|codChannel | name | age | subscribers | views |
| 1 | PewDiePie | 28 | 58506205 | 16654168214 |
| 2 | Grandtour Games | 15 | 429 | 29463 |
So, I want to edit the age of "Grandtour Games" to "18". How can I do that with update?
Is my code right?
update age from Grandtour Games where age='18'
No, in update, you'll have to follow this sequence:
update tableName set columnWanted = 'newValue' where columnName = 'elementName'
In your code, put this:
update Channel set age=18 where name='Grandtour Games'
Comments below:
/* Channel is the name of the table you'll update
set is to assign a new value to the age, in your case
where name='Grandtour Games' is referencing that the name of the Channel you want to update, is Grandtour Games */
alter table changes the the schema (adding, updating, or removing columns or keys, that kind of thing).
Update table changes the data in the table without changing the schema.
So the two are really quite different.
Here is your answer -
-> ALTER is a DDL (Data Definition Language) statement
UPDATE is a DML (Data Manipulation Language) statement.
->ALTER is used to update the structure of the table (add/remove field/index etc).
Whereas UPDATE is used to update data.
Hope this helps!

Changing a column type from integer to string

Using PostgreSQL, what's the command to migrate an integer column type to a string column type?
Obviously I'd like to preserve the data, by converting the old integer data to strings.
You can convert from INTEGER to CHARACTER VARYING out-of-the-box, all you need is ALTER TABLE query chaning column type:
SQL Fiddle
PostgreSQL 9.3 Schema Setup:
CREATE TABLE tbl (col INT);
INSERT INTO tbl VALUES (1), (10), (100);
ALTER TABLE tbl ALTER COLUMN col TYPE CHARACTER VARYING(10);
Query 1:
SELECT col, pg_typeof(col) FROM tbl
Results:
| col | pg_typeof |
|-----|-------------------|
| 1 | character varying |
| 10 | character varying |
| 100 | character varying |
I suggest a four step process:
Create a new string column. name it temp for now. See http://www.postgresql.org/docs/9.3/static/ddl-alter.html for details
Set the string column. something like update myTable set temp=cast(intColumn as text) see http://www.postgresql.org/docs/9.3/static/functions-formatting.html for more interesting number->string conversions
Make sure everything in temp looks the way you want it.
Remove your old integer column. Once again, see http://www.postgresql.org/docs/9.3/static/ddl-alter.html for details
Rename temp to the old column name. Again: http://www.postgresql.org/docs/9.3/static/ddl-alter.html
This assumes you can perform the operation while no clients are connected; offline. If you need to make this (drastic) change in an online table, take a look at setting up a new table with triggers for live updates, then swap to the new table in an atomic operation. see ALTER TABLE without locking the table?

Using ALTER TABLE command in psql to add to a table

I am trying to solve this extra credit problem for my homework. So we haven't learned about this yet, but I thought I would give it a try because extra credit is always good. I am trying to write an ALTER TABLE statement to add a column to a table. The full definition is here.
Use the ALTER TABLE command to add a field to the table called rank
that is of type smallint. We’ll use this field to store a ranking of
the teams. The team with the highest points value will be ranked
number 1; the team with the second highest points value will be
ranked number 2; etc. Write a PL/pgSQL function named update rank
that updates the rank field to contain the appropriate number for
all teams. (There are both simple and complicated ways of doing this.
Think about how it can be done with very little code.) Then, define a
trigger named tr update rank that fires after an insert or update
of any of the fields {wins, draws}. This trigger should be executed
once per statement (not per row).
The table that I am using is
Table "table.group_standings"
Column | Type | Modifiers
--------+-----------------------+-----------
team | character varying(25)| not null
wins | smallint | not null
losses | smallint | not null
draws | smallint | not null
points | smallint | not null
Indexes:
"group_standings_pkey" PRIMARY KEY, btree (team)
Check constraints:
"group_standings_draws_check" CHECK (draws >= 0)
"group_standings_losses_check" CHECK (losses >= 0)
"group_standings_points_check" CHECK (points >= 0)
"group_standings_wins_check" CHECK (wins >= 0)
heres my code
ALTER TABLE group_standings ADD COLUMN rank smallint;
I need help with writing the function to rank the teams

Making PostgreSQL a little more error tolerant?

This is sort of a general question that has come up in several contexts, the example below is representative but not exhaustive. I am interested in any ways of learning to work with Postgres on imperfect (but close enough) data sources.
The specific case -- I am using Postgres with PostGIS for working with government data published in shapefiles and xml. Using the shp2pgsql module distributed with PostGIS (for example on this dataset) I often get schema like this:
Column | Type |
------------+-----------------------+-
gid | integer |
st_fips | character varying(7) |
sfips | character varying(5) |
county_fip | character varying(12) |
cfips | character varying(6) |
pl_fips | character varying(7) |
id | character varying(7) |
elevation | character varying(11) |
pop_1990 | integer |
population | character varying(12) |
name | character varying(32) |
st | character varying(12) |
state | character varying(16) |
warngenlev | character varying(13) |
warngentyp | character varying(13) |
watch_warn | character varying(14) |
zwatch_war | bigint |
prog_disc | bigint |
zprog_disc | bigint |
comboflag | bigint |
land_water | character varying(13) |
recnum | integer |
lon | numeric |
lat | numeric |
the_geom | geometry |
I know that at least 10 of those varchars -- the fips, elevation, population, etc., should be ints; but when trying to cast them as such I get errors. In general I think I could solve most of my problems by allowing Postgres to accept an empty string as a default value for a column -- say 0 or -1 for an int type -- when altering a column and changing the type. Is this possible?
If I create the table before importing with the type declarations generated from the original data source, I get better types than with shp2pgsql, and can iterate over the source entries feeding them to the database, discarding any failed inserts. The fundamental problem is that if I have 1% bad fields, evenly distributed over 25 columns, I will lose 25% of my data since a given insert will fail if any field is bad. I would love to be able to make a best-effort insert and fix any problems later, rather than lose that many rows.
Any input from people having dealt with similar problems is welcome -- I am not a MySQL guy trying to batter PostgreSQL into making all the same mistakes I am used to -- just dealing with data I don't have full control over.
Could you produce a SQL file from shp2pgsql and do some massaging of the data before executing it? If the data is in COPY format, it should be easy to parse and change "" to "\N" (insert as null) for columns.
Another possibility would be to use shp2pgsql to load the data into a staging table where all the fields are defined as just 'text' type, and then use an INSERT...SELECT statement to copy the data to your final location, with the possibility of massaging the data in the SELECT to convert blank strings to null etc.
I don't think there's a way to override the behaviour of how strings are converted to ints and so on: possibly you could create your own type or domain, and define an implicit cast that was more lenient... but this sounds pretty nasty, since the types are really just artifacts of how your data arrives in the system and not something you want to keep around after that.
You asked about fixing it up when changing the column type: you can do that too, for example:
steve#steve#[local] =# create table test_table(id serial primary key, testvalue text not null);
NOTICE: CREATE TABLE will create implicit sequence "test_table_id_seq" for serial column "test_table.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "test_table_pkey" for table "test_table"
CREATE TABLE
steve#steve#[local] =# insert into test_table(testvalue) values('1'),('0'),('');
INSERT 0 3
steve#steve#[local] =# alter table test_table alter column testvalue type int using case testvalue when '' then 0 else testvalue::int end;
ALTER TABLE
steve#steve#[local] =# select * from test_table;
id | testvalue
----+-----------
1 | 1
2 | 0
3 | 0
(3 rows)
Which is almost equivalent to the "staging table" idea I suggested above, except that now the staging table is your final table. Altering a column type like this requires rewriting the entire table anyway: so actually, using a staging table and reformatting multiple columns at once is likely to be more efficient.