How to update an XML datatype column in PostgreSQL

PostgreSQL version 9.1. I have a table:
xmltest=# \d xmltest
Table "public.xmltest"
Column | Type | Modifiers
---------+---------+-----------
id | integer | not null
xmldata | xml |
Indexes:
"xmltest_pkey" PRIMARY KEY, btree (id)
xmltest=# select * from xmltest;
id | xmldata
----+---------------------------------------
1 | <root> +
| <child1>somedata for child1 </child1>+
| <child2>somedata for child2 </child2>+
| </root>
(1 row)
Now, how do I update the value inside the element/tag child2?
I would prefer not to update the whole column at once.
Is there a way to update/add/delete that particular tag's value? If so, please share. :)

PostgreSQL's XML functions are aimed at producing and processing XML, not so much at manipulating it, I am afraid.
You can extract values with xpath(), and there are a number of functions to build XML, but I don't know of built-in functionality to update elements inside a given XML value.
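If the structure is simple and predictable, one workaround is to treat the value as text: cast the xml to text, rewrite the element with regexp_replace(), and cast the result back to xml. A sketch (it assumes child2 occurs exactly once, has no attributes, and the new content needs no XML escaping):
UPDATE xmltest
SET xmldata = regexp_replace(
        xmldata::text,
        '<child2>.*</child2>',
        '<child2>new data for child2</child2>')::xml
WHERE id = 1;
This is string surgery rather than XML-aware manipulation, so validate the result carefully.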

Related

Generating a decrementing ID while inserting data on a Teradata table

I'm trying to insert data from a query (or a volatile table) into another table that has an id column (type smallint, not null) which should be unique. On Teradata, using Teradata SQL Assistant, the current min(id) is -5, and I should insert the new data with lower ids.
This is a simple example:
table a
id| aa |bb
-3|text |text_2
-5|text_3|text_4
and the data i should insert is for example :
aa | bb
text_5|text_6
text_7|text_8
text_9|text_10
so the result should be like
id| aa |bb
-3|text |text_2
-5|text_3|text_4
-6|text_5|text_6
-7|text_7|text_8
-8|text_9|text_10
I tried creating a volatile table with a generated id (start by -5 increment by -1 no cycle).
But I get an error:
Expected something like a name or a unicode delimited identifier or a cycle keyword between an integer and ','
Is there any other way to do it, please?
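One way to sidestep the generated-identity column entirely is to compute the ids at insert time from the current minimum, for example with ROW_NUMBER(). A rough sketch (untested; source_data stands for your query or volatile table, table_a for the target):
INSERT INTO table_a (id, aa, bb)
SELECT m.min_id - ROW_NUMBER() OVER (ORDER BY s.aa) AS id  -- yields -6, -7, -8, ...
     , s.aa
     , s.bb
FROM source_data s
CROSS JOIN (SELECT MIN(id) AS min_id FROM table_a) m;
Note that this takes the minimum from the table's state at insert time, so it assumes no concurrent inserts into table_a.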

Distance calculation in a trigger (PostgreSQL)

I've got two tables, "modulo1_cella" and "modulo2_campionamento".
The first, "modulo1_cella", contains polygons, while the latter, "modulo2_campionamento", contains points (samples). Now, I need to assign to each polygon the nearest sample and the identifier of that sample.
Table "public.modulo1_cella"
Column | Type | Modifiers
-------------------+-------------------+------------------------------------------------------------------
cella_id | integer | not null default nextval('modulo1_cella_cella_id_seq'::regclass)
nome_cella | character varying |
geometria | geometry |
campione_id | integer |
dist_camp | double precision |
Table "public.modulo2_campionamento"
Column | Type | Modifiers
--------------------------+-----------------------------+----------------------------------------------------------------------------------
campione_id | integer | not null default nextval('modulo2_campionamento_aria_campione_id_seq'::regclass)
x_campionamento | double precision |
y_campionamento | double precision |
codice_campione | character varying(10) |
cella_id | integer |
geometria | geometry(Point,4326) |
I'm looking for an INSERT/UPDATE trigger that, for each row of the "modulo1_cella" table, i.e. for each polygon, returns:
the nearest sample, "campione_id";
the corresponding distance, "dist_camp".
I created a query that works, but I'm not able to convert it to a trigger.
CREATE TEMP TABLE TemporaryTable
(
cella_id int,
campione_id int,
distanza double precision
);
INSERT INTO TemporaryTable(cella_id, campione_id, distanza)
SELECT
DISTINCT ON (m1c.cella_id) m1c.cella_id, m2cmp.campione_id, ST_Distance(m2cmp.geometria::geography, m1c.geometria::geography) as dist
FROM modulo1_cella As m1c, modulo2_campionamento As m2cmp
WHERE ST_DWithin(m2cmp.geometria::geography, m1c.geometria::geography, 50000)
ORDER BY m1c.cella_id, m2cmp.campione_id, ST_Distance(m2cmp.geometria::geography, m1c.geometria::geography);
UPDATE modulo1_cella as mc
SET campione_id=tt.campione_id, dist_camp=tt.distanza
from TemporaryTable as tt
where tt.cella_id=mc.cella_id;
DROP TABLE TemporaryTable;
Any help? Thank you in advance.
First, if "geometria" is not geography and is instead geometry, you should make it a geography type on the table.
ALTER TABLE modulo2_campionamento
ALTER COLUMN geometria
SET DATA TYPE geography(Point, 4326)
USING (geometria::geography);
ALTER TABLE modulo1_cella
ALTER COLUMN geometria
SET DATA TYPE geography(Polygon, 4326)
USING (geometria::geography);
Now, I need to assign to each polygon the nearest sample and the identifier of that sample.
You would not normally store these values at all, because finding the nearest sample with a KNN search is very fast anyway.
CREATE INDEX ON modulo1_cella USING gist (geometria);
CREATE INDEX ON modulo2_campionamento USING gist (geometria);
VACUUM FULL ANALYZE modulo1_cella;
VACUUM FULL ANALYZE modulo2_campionamento;
SELECT *
FROM modulo1_cella As m1c
CROSS JOIN LATERAL (
SELECT *
FROM modulo2_campionamento As m2cmp
WHERE ST_DWithin(m2cmp.geometria, m1c.geometria, 50000)
ORDER BY m2cmp.geometria <-> m1c.geometria,
m1c.cella_id,
m2cmp.campione_id
FETCH FIRST ROW ONLY
) AS closest_match
That's much faster than the DISTINCT ON query you wrote.
If that is fast enough, I suggest using a VIEW. If that's not fast enough, I suggest using a MATERIALIZED VIEW. If it's still not fast enough, you have a very niche load and it may be worth investigating a solution with triggers. But only then.
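If it ever does come to that, a trigger could look roughly like this (a sketch only, untested; the function and trigger names are made up, and it assumes the geography columns and indexes above):
CREATE OR REPLACE FUNCTION set_nearest_campione() RETURNS trigger AS $$
BEGIN
    -- pick the closest sample within 50 km; leaves the columns NULL if none is found
    SELECT m2cmp.campione_id,
           ST_Distance(m2cmp.geometria, NEW.geometria)
      INTO NEW.campione_id, NEW.dist_camp
      FROM modulo2_campionamento AS m2cmp
     WHERE ST_DWithin(m2cmp.geometria, NEW.geometria, 50000)
     ORDER BY m2cmp.geometria <-> NEW.geometria
     FETCH FIRST ROW ONLY;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_nearest_campione
BEFORE INSERT OR UPDATE OF geometria ON modulo1_cella
FOR EACH ROW EXECUTE PROCEDURE set_nearest_campione();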

Report Query - variable number of columns

I have a table which "indexes" the captured documents for an employer's employees. Each document has a unique ID, a URI, a capture date, a staffid (FK), and a document type column named "tag".
gv2=# \d staffdoc
Table "zoozooland_789166.staffdoc"
Column | Type | Modifiers
------------+-----------------------------+-----------------------
id | uuid | not null
asseturi | text | not null
created | timestamp without time zone | not null
created_by | uuid | not null
tag | text | not null
is_active | boolean | not null default true
staffid | uuid | not null
Foreign-key constraints:
"staffdoc_staffid_fkey" FOREIGN KEY (staffid) REFERENCES staffmember(id)
Inherits: assetdoc
I want to write a report that will be used to highlight missing documents.
For each staff member the report should have a column for each document tag type, so the number of columns is unknown up front.
Currently I do all of this in the application - generate a list of all possible tags (SELECT DISTINCT tag FROM table), generate a list of all possible staff-IDs, then for each staff ID I run multiple queries to get the document with the biggest value in the created column for each tag value.
I'm pretty sure I should at least be able to optimise it to one query per document type (tag value) (most recent document for each staff id) which would be a good-enough optimisation.
The typical scenario is 4 or 5 document tag values (document types), so running 5 queries is much more acceptable than running 5 × number-of-staff queries.
In the final report I have the following columns:
Staff-member name, doctype1, doctype2, doctype3, etc.
The name is "joined" from the staffmember table. The value in each doctype column is the latest (MAX) created date for that doc tag for that staff member, or "None" if the document is missing for that staff member.
FWIW I'm using Postgres 9.5
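One way to get down to a single query is DISTINCT ON, which picks the most recent document per (staff member, tag) pair in one pass; the 4 or 5 tag columns can then be pivoted in the application or with crosstab(). A sketch (it assumes the staff member's name column is called "name"):
SELECT DISTINCT ON (d.staffid, d.tag)
       s.name, d.staffid, d.tag, d.created
FROM staffdoc d
JOIN staffmember s ON s.id = d.staffid
WHERE d.is_active
ORDER BY d.staffid, d.tag, d.created DESC;
Missing tags simply produce no row for that staff member, which maps to "None" in the report.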

Add dynamic columns to table after appending word to column name and checking for existing column?

I am using SQL Server, and have been working on this for a while for work, but running into a lot of issues.
I started with a table “Logs”
| ChangeID | UserID | LogDate | Status | Fields
| 123 | 001 | 7-12-12 | Open | (raw data)
| 456 | 001 | 7-9-14 | Complete | (raw data)
| 789 | 002 | 5-8-15 | Open | (raw data)
The column “Fields” contains data from a form in JSON format. Basically, it contains a field name, a before value and an after value.
For every row in Fields, I am able to parse the JSON in order to get a temporary table #fieldTable. So for example, one row of raw data in the Fields column would produce the following table:
|Field |Before |After
|User |ZZZ |YYY
|requestDue |7-2-13 |7-5-14
|Assigned |No |Yes
There can be any number of values for Field, and the names of the fields are not known beforehand.
What I need is for there to be a final table which combines all of the temporary tables generated with the field values as new columns, like this:
| ChangeID | UserID | LogDate | Status | Fields | UserBefore | UserAfter | requestDueBefore | requestDueAfter | … |
where, if the same field name appeared in two different rows of the JSON (and consequently the table that formats the data from it), then a new column won’t be added but rather just the data will be updated. So, for example, if the row with ChangeID 123 had the raw data
[{"field":"reqId","before":"000","after":"111"},{"field":"affected","before":"no","after":"yes"},{"field":"application","before":"xxx","after":"yyy"}]
and the ChangeID 789 had the raw data in its Fields row as
[{"field":"attachments","before":"null","after":"zzzzzzz"},{"field":"affected","before":"no","after":"yes"}]
then, because the field "affected" from ChangeID 123 already produced the columns affectedBefore and affectedAfter in the final table, no new columns would be added when the field is seen again for ChangeID 789.
If there is no data for some column of a particular row, it should just be null.
The way I thought to do this was to first try to dynamically pivot the temporary tables when they are generated so that I get the following result for the before results
|User |requestDue |Assigned
|ZZZ |7-2-13 |No
and another for the after results
|User |requestDue |Assigned
|YYY |7-5-14 |Yes
by using the following code:
declare @cols as nvarchar(max),
@query as nvarchar(max)
select @cols = stuff((select ',' + QUOTENAME(field)
from #fieldTable
group by field--, id
--order by id
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
set @query = N'SELECT ' + @cols + N' from
(
select before, field
from #fieldTable
) x
pivot
(
max(before)
for field in (' + @cols + N')
) p '
exec sp_executesql @query;
(and then the same thing again with different variables for the result with the after values)
I think that there may be some way to dynamically concatenate the dynamic column name with a “before” or “after”, and then to somehow add these columns to a final table outside of the scope of the procedure. However, I’m not sure how, and I’m also not sure if this is even the best approach to the problem. I tried to use aliases, but I think you need to know the column name to make that work, and the same goes for altering a table to add more columns.
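For example, the suffix could be appended while the column list is built, and the before/after values unpivoted into one (column, value) pair before pivoting. A sketch along the lines of the code above (untested):
declare @cols nvarchar(max), @query nvarchar(max)

-- one "<field>Before" and one "<field>After" column per distinct field name
select @cols = stuff((select ',' + QUOTENAME(field + 'Before')
                           + ',' + QUOTENAME(field + 'After')
                      from #fieldTable
                      group by field
                      FOR XML PATH(''), TYPE
                     ).value('.', 'NVARCHAR(MAX)'), 1, 1, '')

-- pivot the before and after values into those columns in one pass
set @query = N'
select ' + @cols + N'
from
(
    select field + ''Before'' as col, before as val from #fieldTable
    union all
    select field + ''After''  as col, after  as val from #fieldTable
) src
pivot
(
    max(val) for col in (' + @cols + N')
) p;'

exec sp_executesql @query;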
Also, I have seen that for many similar issues, people were advised to use openrowset, which I am unable to use.
This answer belongs better in a comment as it doesn't directly address your question - however, I don't have reputation yet to directly comment.
For what you are describing, you might consider a SQLXML data field. SQLXML allows you to store documents of arbitrary schema and SQL-Server natively supports queries against it.
It would allow you to avoid what I think you are already finding to be considerable complexity in trying to dynamically create schemas based on models that are not known beforehand.
Hope that helps.
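For instance, if the form data were stored in an xml column (say FieldsXml, with one <entry> element per changed field; this is a hypothetical schema, not the current one), it could be queried directly with the xml methods nodes() and value():
-- Hypothetical: Fields stored as xml, e.g. <entries><entry><field>User</field><before>ZZZ</before><after>YYY</after></entry></entries>
SELECT l.ChangeID,
       e.x.value('(field)[1]',  'nvarchar(100)') AS FieldName,
       e.x.value('(before)[1]', 'nvarchar(100)') AS BeforeValue,
       e.x.value('(after)[1]',  'nvarchar(100)') AS AfterValue
FROM Logs AS l
CROSS APPLY l.FieldsXml.nodes('/entries/entry') AS e(x);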

Making PostgreSQL a little more error tolerant?

This is sort of a general question that has come up in several contexts; the example below is representative but not exhaustive. I am interested in any ways of learning to work with Postgres on imperfect (but close enough) data sources.
The specific case -- I am using Postgres with PostGIS for working with government data published in shapefiles and XML. Using the shp2pgsql module distributed with PostGIS (for example on this dataset) I often get a schema like this:
Column | Type |
------------+-----------------------+-
gid | integer |
st_fips | character varying(7) |
sfips | character varying(5) |
county_fip | character varying(12) |
cfips | character varying(6) |
pl_fips | character varying(7) |
id | character varying(7) |
elevation | character varying(11) |
pop_1990 | integer |
population | character varying(12) |
name | character varying(32) |
st | character varying(12) |
state | character varying(16) |
warngenlev | character varying(13) |
warngentyp | character varying(13) |
watch_warn | character varying(14) |
zwatch_war | bigint |
prog_disc | bigint |
zprog_disc | bigint |
comboflag | bigint |
land_water | character varying(13) |
recnum | integer |
lon | numeric |
lat | numeric |
the_geom | geometry |
I know that at least 10 of those varchars -- the fips, elevation, population, etc., should be ints; but when trying to cast them as such I get errors. In general I think I could solve most of my problems by allowing Postgres to accept an empty string as a default value for a column -- say 0 or -1 for an int type -- when altering a column and changing the type. Is this possible?
If I create the table before importing with the type declarations generated from the original data source, I get better types than with shp2pgsql, and can iterate over the source entries feeding them to the database, discarding any failed inserts. The fundamental problem is that if I have 1% bad fields, evenly distributed over 25 columns, I will lose 25% of my data since a given insert will fail if any field is bad. I would love to be able to make a best-effort insert and fix any problems later, rather than lose that many rows.
Any input from people having dealt with similar problems is welcome -- I am not a MySQL guy trying to batter PostgreSQL into making all the same mistakes I am used to -- just dealing with data I don't have full control over.
Could you produce a SQL file from shp2pgsql and do some massaging of the data before executing it? If the data is in COPY format, it should be easy to parse and change "" to "\N" (insert as null) for columns.
Another possibility would be to use shp2pgsql to load the data into a staging table where all the fields are defined as just 'text' type, and then use an INSERT...SELECT statement to copy the data to your final location, with the possibility of massaging the data in the SELECT to convert blank strings to null etc.
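For example (table names are placeholders, and it assumes the non-empty values really are integers; NULLIF turns the empty strings into NULLs so the cast succeeds):
INSERT INTO places_final (gid, elevation, population, name, the_geom)
SELECT gid,
       NULLIF(elevation, '')::int,
       NULLIF(population, '')::int,
       name,
       the_geom
FROM places_staging;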
I don't think there's a way to override the behaviour of how strings are converted to ints and so on: possibly you could create your own type or domain, and define an implicit cast that was more lenient... but this sounds pretty nasty, since the types are really just artifacts of how your data arrives in the system and not something you want to keep around after that.
You asked about fixing it up when changing the column type: you can do that too, for example:
steve#steve#[local] =# create table test_table(id serial primary key, testvalue text not null);
NOTICE: CREATE TABLE will create implicit sequence "test_table_id_seq" for serial column "test_table.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "test_table_pkey" for table "test_table"
CREATE TABLE
steve#steve#[local] =# insert into test_table(testvalue) values('1'),('0'),('');
INSERT 0 3
steve#steve#[local] =# alter table test_table alter column testvalue type int using case testvalue when '' then 0 else testvalue::int end;
ALTER TABLE
steve#steve#[local] =# select * from test_table;
id | testvalue
----+-----------
1 | 1
2 | 0
3 | 0
(3 rows)
Which is almost equivalent to the "staging table" idea I suggested above, except that now the staging table is your final table. Altering a column type like this requires rewriting the entire table anyway: so actually, using a staging table and reformatting multiple columns at once is likely to be more efficient.