Update substrings using lookup table and replace function - sql

Here's my setup:
Table 1 (table_with_info): Contains a list of varchars with substrings that I'd like to replace.
Table 2 (sub_info): Contains two columns: the substring in table_with_info that I'd like to replace and the string I'd like to replace it with.
What I'd like to do is replace all the substrings in table_with_info with their substitutions in sub_info.
This works up to a point, but the issue is that select replace(...) returns a separate row for each substitution instead of applying all substitutions to an individual row.
I'm explaining this as best I can, but I'm not sure it's clear. Here's the code and an example of what's happening and what I'd like to happen.
Here's my code:
create table table_with_info
(
val varchar
);
insert into table_with_info values
('this this is test data');
create table sub_info
(
word_from varchar,
word_to varchar
);
insert into sub_info values
('this','replace1')
, ('test', 'replace2');
update table_with_info set val = (select replace("val", "word_from", "word_to")
from "table_with_info", "sub_info");
The update doesn't work because the select returns two rows:
Row 1: replace1 replace1 is test data
Row 2: this this is replace2 data
What I'd like the select statement to return instead is:
Row 1: replace1 replace1 is replace2 data
Any thoughts? I can't create UDFs on the system I'm running.

Your UPDATE statement is incorrect in multiple ways. Consult the manual before you try to run anything like this again. You introduce two cross joins that would make this statement extremely expensive, besides yielding nonsense.
To do this properly, you need to apply the UPDATEs sequentially. In a single statement, each replace() would work on the same original row version, and one resulting row version would eliminate the other. You can use a DO statement for this, or wrap it in a plpgsql function, for instance:
DO
$do$
DECLARE
r sub_info;
BEGIN
FOR r IN
TABLE sub_info
-- SELECT * FROM sub_info ORDER BY ??? -- order is relevant
LOOP
UPDATE table_with_info
SET val = replace(val, r.word_from, r.word_to)
WHERE val LIKE ('%' || r.word_from || '%'); -- avoid empty updates
END LOOP;
END
$do$;
Be aware that the order in which updates are applied can make a difference: if the first update creates a string that the second one matches, the result differs from applying them in reverse order.
So order the rows read from sub_info (ORDER BY in the loop query) if that can be relevant.
Avoid empty updates. Without the additional WHERE clause, you would write many new row versions without changing anything. Expensive and useless.
Double quotes are optional for legal, lower-case names.
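The effect of the loop, and why the order matters, can be modelled outside the database. The following is a minimal Python sketch, not PostgreSQL code: apply_sequentially is a hypothetical helper, str.replace stands in for replace(), and a plain substring test stands in for the LIKE filter (which would additionally treat % and _ inside word_from as wildcards).

```python
def apply_sequentially(value, substitutions):
    """Apply each (word_from, word_to) pair to the result of the previous one."""
    for word_from, word_to in substitutions:
        if word_from in value:  # skip pairs that would not change anything
            value = value.replace(word_from, word_to)
    return value


row = "this this is test data"
subs = [("this", "replace1"), ("test", "replace2")]

# One pass per lookup row, each building on the previous result (the DO loop).
sequential = apply_sequentially(row, subs)

# What the single SELECT in the question produced: every replace sees the original row.
independent = [row.replace(word_from, word_to) for word_from, word_to in subs]

# Order dependence: the first substitution creates text that the second one matches.
forward = apply_sequentially("a", [("a", "b"), ("b", "c")])
backward = apply_sequentially("a", [("b", "c"), ("a", "b")])

print(sequential)
print(independent)
print(forward, backward)
```

In this model the same two substitutions give different results depending on the order they are applied in, which is the reason for an explicit ORDER BY in the loop query.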

Expanding on Erwin's answer, a do block with dynamic SQL can do the trick as well:
do $$
declare
rec record;
repl text;
begin
repl := 'val'; -- quote_ident() this if needed
for rec in select word_from, word_to from sub_info
loop
repl := 'replace(' || repl || ', '
|| quote_literal(rec.word_from) || ', '
|| quote_literal(rec.word_to) || ')';
end loop;
-- now do them all in a single query
execute 'update ' || 'table_with_info'::regclass || ' set val = ' || repl;
end;
$$ language plpgsql;
Optionally, build a like parameter in a similar way to avoid updating rows needlessly.
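To see what the dynamic statement looks like before it is executed, here is a small Python sketch of the string building only. quote_literal below is a simplified stand-in (it doubles single quotes and ignores backslash handling), and build_replace_expr is a hypothetical name; neither is the PostgreSQL implementation.

```python
def quote_literal(text):
    """Simplified stand-in for quote_literal(): wrap in single quotes, double embedded ones."""
    return "'" + text.replace("'", "''") + "'"


def build_replace_expr(column, substitutions):
    """Nest one replace() call per (word_from, word_to) pair around the column."""
    expr = column
    for word_from, word_to in substitutions:
        expr = "replace({}, {}, {})".format(
            expr, quote_literal(word_from), quote_literal(word_to)
        )
    return expr


subs = [("this", "replace1"), ("test", "replace2")]
statement = "update table_with_info set val = " + build_replace_expr("val", subs)
print(statement)
```

The innermost replace() is evaluated first, so the order of the lookup rows still decides the result, just as in the loop-based version.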

Related

INSERT with dynamic column names

I have column names stored in variable colls, next I execute code:
DO $$
DECLARE
v_name text := quote_ident('colls');
BEGIN
EXECUTE 'insert into table1 select '|| colls ||' from table2 ';
-- EXECUTE 'insert into table1 select '|| v_name ||' from table2 ';
END$$;
I get the error: column "colls" does not exist. The program used colls as a name, not as a variable. What am I doing wrong?
I have found similar example in documentation:
https://www.postgresql.org/docs/8.1/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN
I have column names stored in variable colls
No, you don't. You have a variable v_name - which holds a single word: 'colls'. About variables in SQL:
User defined variables in PostgreSQL
Read the chapters Identifiers and Key Words and Constants in the manual.
And if you had multiple column names in a single variable, you could not use quote_ident() like that. It would escape the whole string as a single identifier.
I guess the basic misunderstanding is this: 'colls' is a string constant, not a variable. There are no other variables in a DO statement than the ones you declare in the DECLARE section. You might be looking for a function that takes a variable number of column names as parameter(s) ...
CREATE OR REPLACE FUNCTION f_insert_these_columns(VARIADIC _cols text[])
RETURNS void AS
$func$
BEGIN
EXECUTE (
SELECT 'INSERT INTO table1 SELECT '
|| string_agg(quote_ident(col), ', ')
|| ' FROM table2'
FROM unnest(_cols) col
);
END
$func$ LANGUAGE plpgsql;
Call:
SELECT f_insert_these_columns('abd', 'NeW Deal'); -- column names case sensitive!
SELECT f_insert_these_columns(VARIADIC '{abd, NeW Deal}'); -- column names case sensitive!
Note how I unnest the array of column names and escape them one by one.
A VARIADIC parameter should be perfect for your use case. You can either pass a list of column names or an array.
Either way, be wary of SQL injection.
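The difference between escaping names one by one and escaping a whole comma-separated string can be illustrated with a rough Python model. The quote_ident here is an approximation written for this sketch (it does not know about reserved key words), and build_insert is a hypothetical helper mirroring the string the function assembles.

```python
import re

# Names made only of lower-case letters, digits and underscores need no quoting.
_SAFE_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def quote_ident(name):
    """Approximate quote_ident(): quote only when needed, double embedded quotes."""
    if _SAFE_IDENT.fullmatch(name):
        return name
    return '"' + name.replace('"', '""') + '"'


def build_insert(columns):
    """Escape each column name separately, then join them into the statement."""
    column_list = ", ".join(quote_ident(col) for col in columns)
    return "INSERT INTO table1 SELECT " + column_list + " FROM table2"


one_by_one = build_insert(["abd", "NeW Deal"])
whole_string = quote_ident("abd, NeW Deal")

print(one_by_one)
print(whole_string)
```

Escaping the whole string yields a single quoted identifier, which is why the names have to be unnested first.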
Related, with more explanation:
Pass multiple values in single parameter
Table name as a PostgreSQL function parameter

PostgreSQL Function returning result set from dynamic tables names

In my database, I have the standard app tables and backup tables. Eg. for a table "employee", I have a table called "bak_employee". The bak_employee table is a backup of the employee table. I use it to restore the employee table between tests.
I figured I could use these "bak_" tables to see the changes that have occurred during the test like this:
SELECT * FROM employee EXCEPT SELECT * FROM bak_employee
This will show me the inserted and updated records. I'll ignore the deleted records for now.
Now, what I would like to do is go through all the tables in my database to see if there are any changes in any of the tables. I was thinking of doing this as a function so it's easy to call over and over. This is what I have so far:
CREATE OR REPLACE FUNCTION public.show_diff()
RETURNS SETOF diff_tables AS
$BODY$
DECLARE
app_tables text;
BEGIN
FOR app_tables IN
SELECT table_name
FROM information_schema.tables
WHERE table_catalog = 'myDatabase'
AND table_schema = 'public'
AND table_name not like 'bak_%' -- exclude existing backup tables
LOOP
-- somehow loop through tables to see what's changed something like:
EXECUTE 'SELECT * FROM ' || app_tables || ' EXCEPT SELECT * FROM bak_' || app_tables;
END LOOP;
RETURN;
END;
$BODY$
LANGUAGE plpgsql;
But obviously this isn't going to return me any useful information. Any help would be appreciated.
You cannot return various well-known row types from the same function in the same call. A cheap fix is to cast each row type to text, so we have a common return type.
CREATE OR REPLACE FUNCTION public.show_diff()
RETURNS SETOF text AS -- text!!
$func$
DECLARE
app_table text;
BEGIN
FOR app_table IN
SELECT table_name
FROM information_schema.tables
WHERE table_catalog = 'myDatabase'
AND table_schema = 'public'
AND table_name NOT LIKE 'bak_%' -- exclude existing backup tables
LOOP
RETURN NEXT ' ';
RETURN NEXT '=== ' || app_table || ' ===';
RETURN QUERY EXECUTE format(
'SELECT x::text FROM (TABLE %I EXCEPT ALL TABLE %I) x'
, app_table, 'bak_' || app_table);
END LOOP;
RETURN;
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM public.show_diff();
I had the test suggested by @a_horse at first, but after your comment I realized that there is no need for this. EXCEPT considers NULL values to be equal and shows all differences.
While at it, I improved and simplified your solution some more. Use EXCEPT ALL: it is cheaper and does not run the risk of folding complete duplicates.
Using EXCEPT clause in PostgreSQL
TABLE is just syntactical sugar.
Is there a shortcut for SELECT * FROM in psql?
However, if you have an index on a unique (combination of) column(s), a JOIN like I suggested before should be faster: finding the only possible duplicate via index should be substantially cheaper.
The crucial element is the cast of the row type to text (x::text).
You can even make the function work for any table - but never more than one at a time: With a polymorphic parameter type:
Refactor a PL/pgSQL function to return the output of various SELECT queries
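The difference between EXCEPT and EXCEPT ALL mentioned above can be sketched with Python multisets. This is only a model of the set semantics with made-up sample rows; except_distinct and except_all are hypothetical names, and None plays the role of NULL (compared as equal, as EXCEPT does).

```python
from collections import Counter


def except_distinct(left, right):
    """EXCEPT: distinct rows of left that do not occur in right."""
    right_rows = set(right)
    seen = set()
    result = []
    for row in left:
        if row not in right_rows and row not in seen:
            seen.add(row)
            result.append(row)
    return result


def except_all(left, right):
    """EXCEPT ALL: a row occurring m times in left and n times in right is kept max(m - n, 0) times."""
    remaining = Counter(left) - Counter(right)
    return list(remaining.elements())


employee = [(1, "ann"), (1, "ann"), (2, "bob"), (3, None)]
bak_employee = [(1, "ann"), (3, None)]

print(except_distinct(employee, bak_employee))
print(except_all(employee, bak_employee))
```

With EXCEPT, the extra copy of a fully duplicated row disappears from the result; with EXCEPT ALL it is reported.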

Update multiple columns that start with a specific string

I am trying to update a bunch of columns in a DB for testing purposes of a feature. I have a table that is built with hibernate so all of the columns that are created for an embedded entity begin with the same name. I.e. contact_info_address_street1, contact_info_address_street2, etc.
I am trying to figure out if there is a way to do something to the effect of:
UPDATE table SET contact_info_address_* = null;
If not, I know I can do it the long way, just looking for a way to help myself out in the future if I need to do this all over again for a different set of columns.
You need dynamic SQL for this. So you must defend against possible SQL injection.
Basic query
The basic query to generate the DML command needed can look like this:
SELECT format('UPDATE tbl SET (%s) = (%s)'
,string_agg (quote_ident(attname), ', ')
,string_agg ('NULL', ', ')
)
FROM pg_attribute
WHERE attrelid = 'tbl'::regclass
AND NOT attisdropped
AND attnum > 0
AND attname ~~ 'foo_%';
Returns:
UPDATE tbl SET (foo_a, foo_b, foo_c) = (NULL, NULL, NULL);
Make use of the "column-list syntax" of UPDATE to shorten the code and simplify the task.
I query the system catalogs instead of information schema because the latter, while being standardized and guaranteed to be portable across major versions, is also notoriously slow and sometimes unwieldy. There are pros and cons, see:
Get column names and data types of a query, table or view
quote_ident() for the column names prevents SQL-injection - also necessary for identifiers.
string_agg() requires 9.0+.
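As an illustration of what the generating query assembles, here is a rough Python model. It assumes a hard-coded list of column names instead of pg_attribute, always double-quotes the names, and translates only the % and _ wildcards of LIKE (no escape character); like_to_regex and build_update are hypothetical helpers.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE pattern: % matches any run of characters, _ exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def build_update(table, columns, pattern):
    """Build the column-list UPDATE for all columns matching the pattern."""
    regex = like_to_regex(pattern)
    matching = [col for col in columns if regex.fullmatch(col)]
    if not matching:
        return None  # nothing to update
    targets = ", ".join('"' + col.replace('"', '""') + '"' for col in matching)
    values = ", ".join("NULL" for _ in matching)
    return "UPDATE {} SET ({}) = ({})".format(table, targets, values)


columns = ["id", "foo_a", "foo_b", "fooxc", "bar"]
print(build_update("tbl", columns, "foo_%"))
```

Note that in this model, as in LIKE itself, the underscore is a single-character wildcard, so the pattern 'foo_%' also picks up a column such as fooxc.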
Full automation with PL/pgSQL function
CREATE OR REPLACE FUNCTION f_update_cols(_tbl regclass, _col_pattern text
, OUT row_ct int, OUT col_ct bigint)
LANGUAGE plpgsql AS
$func$
DECLARE
_sql text;
BEGIN
SELECT INTO _sql, col_ct
format('UPDATE %s SET (%s) = (%s)'
, _tbl -- use the table passed in, not a hard-coded name
, string_agg (quote_ident(attname), ', ')
, string_agg ('NULL', ', ')
)
, count(*)
FROM pg_attribute
WHERE attrelid = _tbl
AND NOT attisdropped -- no dropped columns
AND attnum > 0 -- no system columns
AND attname LIKE _col_pattern; -- only columns matching pattern
-- RAISE NOTICE '%', _sql; -- output SQL for debugging
EXECUTE _sql;
GET DIAGNOSTICS row_ct = ROW_COUNT;
END
$func$;
COMMENT ON FUNCTION f_update_cols(regclass, text)
IS 'Updates all columns of table _tbl ($1)
that match _col_pattern ($2) in a LIKE expression.
Returns the count of columns (col_ct) and rows (row_ct) affected.';
Call:
SELECT * FROM f_update_cols('myschema.tbl', 'foo%');
To make the function more practical, it returns information as described in the comment. More about obtaining the result status in plpgsql in the manual.
I use the variable _sql to hold the query string, so I can collect the number of columns found (col_ct) in the same query.
The object identifier type regclass is the most efficient way to automatically avoid SQL injection (and sanitize non-standard names) for the table name, too. You can use schema-qualified table names to avoid ambiguities. I would advise to do so if you (can) have multiple schemas in your db! See:
Table name as a PostgreSQL function parameter
There's no handy shortcut sorry. If you have to do this kind of thing a lot, you could create a function to dynamically execute sql and achieve your goal.
CREATE OR REPLACE FUNCTION reset_cols() RETURNS boolean AS $$ BEGIN
EXECUTE (select 'UPDATE table SET '
|| array_to_string(array(
select column_name::text
from information_schema.columns
where table_name = 'table'
and column_name::text like 'contact_info_address_%'
),' = NULL,')
|| ' = NULL');
RETURN true;
END; $$ LANGUAGE plpgsql;
-- run the function
SELECT reset_cols();
It's not very nice though. A better function would be one that accepts the tablename and column prefix as args. Which I'll leave as an exercise for the readers :)

Update multiple columns in a trigger function in plpgsql

Given the following schema:
create table account_type_a (
id SERIAL UNIQUE PRIMARY KEY,
some_column VARCHAR
);
create table account_type_b (
id SERIAL UNIQUE PRIMARY KEY,
some_other_column VARCHAR
);
create view account_type_a_view AS select * from account_type_a;
create view account_type_b_view AS select * from account_type_b;
I try to create a generic trigger function in plpgsql, which enables updating the views:
create trigger trUpdate instead of UPDATE on account_type_a_view
for each row execute procedure updateAccount();
create trigger trUpdate instead of UPDATE on account_type_b_view
for each row execute procedure updateAccount();
An unsuccessful effort of mine was:
create function updateAccount() returns trigger as $$
declare
target_table varchar := substring(TG_TABLE_NAME from '(.+)_view');
cols varchar;
begin
execute 'select string_agg(column_name,$1) from information_schema.columns
where table_name = $2' using ',', target_table into cols;
execute 'update ' || target_table || ' set (' || cols || ') = select ($1).*
where id = ($1).id' using NEW;
return NULL;
end;
$$ language plpgsql;
The problem is the update statement. I am unable to come up with a syntax that would work here. I have successfully implemented this in PL/Perl, but would be interested in a plpgsql-only solution.
Any ideas?
Update
As @Erwin Brandstetter suggested, here is the code for my PL/Perl solution. I incorporated some of his suggestions.
create function f_tr_up() returns trigger as $$
use strict;
use warnings;
my $target_table = quote_ident($_TD->{'table_name'}) =~ s/^([\w]+)_view$/$1/r;
my $NEW = $_TD->{'new'};
my $cols = join(',', map { quote_ident($_) } keys $NEW);
my $vals = join(',', map { quote_literal($_) } values $NEW);
my $query = sprintf(
"update %s set (%s) = (%s) where id = %d",
$target_table,
$cols,
$vals,
$NEW->{'id'});
spi_exec_query($query);
return;
$$ language plperl;
While @Gary's answer is technically correct, it fails to mention that PostgreSQL does support this form:
UPDATE tbl
SET (col1, col2, ...) = (expression1, expression2, ..)
Read the manual on UPDATE.
It's still tricky to get this done with dynamic SQL. I'll assume a simple case where views consist of the same columns as their underlying tables.
CREATE VIEW tbl_view AS SELECT * FROM tbl;
Problems
The special record NEW is not visible inside EXECUTE. I pass NEW as a single parameter with the USING clause of EXECUTE.
As discussed, UPDATE with list-form needs individual values. I use a subselect to split the record into individual columns:
UPDATE ...
FROM (SELECT ($1).*) x
(Parentheses around $1 are not optional.) This allows me to simply use two column lists built with string_agg() from the catalog table: one with and one without table qualification.
It's not possible to assign a row value as a whole to individual columns. The manual:
According to the standard, the source value for a parenthesized
sub-list of target column names can be any row-valued expression
yielding the correct number of columns. PostgreSQL only allows the
source value to be a row constructor or a sub-SELECT.
INSERT is simpler to implement. If the structure of view and table are identical we can omit the column definition list. (Can be improved, see below.)
Solution
I made a couple of updates to your approach to make it shine.
Trigger function for UPDATE:
CREATE OR REPLACE FUNCTION f_trg_up()
RETURNS TRIGGER
LANGUAGE plpgsql AS
$func$
DECLARE
_tbl regclass := quote_ident(TG_TABLE_SCHEMA) || '.'
|| quote_ident(substring(TG_TABLE_NAME from '(.+)_view$'));
_cols text;
_vals text;
BEGIN
SELECT INTO _cols, _vals
string_agg(quote_ident(attname), ', ')
, string_agg('x.' || quote_ident(attname), ', ')
FROM pg_attribute
WHERE attrelid = _tbl
AND NOT attisdropped -- no dropped (dead) columns
AND attnum > 0; -- no system columns
EXECUTE format('
UPDATE %s t
SET (%s) = (%s)
FROM (SELECT ($1).*) x
WHERE t.id = ($2).id' -- assumes a unique column "id" in every table
, _tbl, _cols, _vals)
USING NEW, OLD;
RETURN NEW; -- Don't return NULL unless you know what you're doing
END
$func$;
Trigger function for INSERT:
CREATE OR REPLACE FUNCTION f_trg_ins()
RETURNS TRIGGER
LANGUAGE plpgsql AS
$func$
DECLARE
_tbl regclass := quote_ident(TG_TABLE_SCHEMA) || '.'
|| quote_ident(substring(TG_TABLE_NAME FROM '(.+)_view$'));
BEGIN
EXECUTE format('INSERT INTO %s SELECT ($1).*', _tbl)
USING NEW;
RETURN NEW; -- Don't return NULL unless you know what you're doing
END
$func$;
Triggers:
CREATE TRIGGER trg_instead_up
INSTEAD OF UPDATE ON a_view
FOR EACH ROW EXECUTE FUNCTION f_trg_up();
CREATE TRIGGER trg_instead_ins
INSTEAD OF INSERT ON a_view
FOR EACH ROW EXECUTE FUNCTION f_trg_ins();
Before Postgres 11 the syntax (oddly) was EXECUTE PROCEDURE instead of EXECUTE FUNCTION - which also still works.
Major points
Include the schema name to make the table reference unambiguous. There can be multiple tables of the same name in one database with multiple schemas!
Query pg_catalog.pg_attribute instead of information_schema.columns. Less portable, but much faster, and it allows using the table OID.
How to check if a table exists in a given schema
Table names are NOT safe against SQLi when concatenated as strings for dynamic SQL. Escape with quote_ident() or format() or with an object identifier type. This includes the special trigger function variables TG_TABLE_SCHEMA and TG_TABLE_NAME!
Cast to the object identifier type regclass to assert the table name is valid and get the OID for the catalog look-up.
Optionally use format() to build the dynamic query string safely.
No need for dynamic SQL for the first query on the catalog tables. Faster, simpler.
Use RETURN NEW instead of RETURN NULL in these trigger functions unless you know what you are doing. (NULL would cancel the INSERT for the current row.)
This simple version assumes that every table (and view) has a unique column named id. A more sophisticated version might use the primary key dynamically.
The function for UPDATE allows the columns of view and table to be in any order, as long as the set is the same.
The function for INSERT expects the columns of view and table to be in identical order. If you want to allow arbitrary order, add a column definition list to the INSERT command, just like with UPDATE.
Updated version also covers changes to the id column by using OLD additionally.
Postgresql doesn't support updating multiple columns using the set (col1,col2) = select val1,val2 syntax.
To achieve the same in postgresql you'd use
update target_table
set col1 = d.val1,
col2 = d.val2
from source_table d
where d.id = target_table.id
This is going to make the dynamic query a bit more complex to build as you'll need to iterate the column name list you're using into individual fields. I'd suggest you use array_agg instead of string_agg as an array is easier to process than splitting the string again.
Postgresql UPDATE syntax
documentation on array_agg function

How can I get a hash of an entire table in postgresql?

I would like a fairly efficient way to condense an entire table to a hash value.
I have some tools that generate entire data tables, which can then be used to generate further tables, and so on. I'm trying to implement a simplistic build system to coordinate build runs and avoid repeating work. I want to be able to record hashes of the input tables so that I can later check whether they have changed. Building a table takes minutes or hours, so spending several seconds building hashes is acceptable.
A hack I have used is to just pipe the output of pg_dump to md5sum, but that requires transferring the entire table dump over the network to hash it on the local box. Ideally I'd like to produce the hash on the database server.
Finding the hash value of a row in postgresql gives me a way to calculate a hash for a row at a time, which could then be combined somehow.
Any tips would be greatly appreciated.
Edit to post what I ended up with: tinychen's answer didn't work for me directly, because I couldn't use 'plpgsql' apparently. When I implemented the function in SQL instead, it worked, but was very inefficient for large tables. So instead of concatenating all the row hashes and then hashing that, I switched to using a "rolling hash", where the previous hash is concatenated with the text representation of a row and then that is hashed to produce the next hash. This was much better; apparently running md5 on short strings millions of extra times is better than concatenating short strings millions of times.
create function zz_concat(text, text) returns text as
'select md5($1 || $2);' language 'sql';
create aggregate zz_hashagg(text) (
sfunc = zz_concat,
stype = text,
initcond = '');
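The state transition of this aggregate can be sketched in Python to make the difference between the two strategies concrete. This is a model only: rows are represented by made-up text values, and rolling_hash and concat_then_hash are hypothetical names for the "rolling" approach and the earlier concatenate-then-hash approach.

```python
import hashlib


def md5_hex(text):
    """Hex MD5 digest of a string, like md5(text) in SQL."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def rolling_hash(rows):
    """Fold rows into one digest: state = md5(state || row), starting from ''."""
    state = ""
    for row in rows:
        state = md5_hex(state + row)
    return state


def concat_then_hash(rows):
    """Earlier approach: concatenate all row hashes, then hash the long string once."""
    return md5_hex("".join(md5_hex(row) for row in rows))


rows = ["(1,a)", "(2,b)", "(3,c)"]
print(rolling_hash(rows))
print(rolling_hash(list(reversed(rows))))  # a different order gives a different hash
print(concat_then_hash(rows))
```

The rolling variant never holds more than one 32-character state plus the current row, whereas the concatenating variant builds a string that grows with the table.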
I know this is an old question, however this is my solution:
SELECT
md5(CAST((array_agg(f.* order by id))AS text)) /* id is a primary key of table (to avoid random sorting) */
FROM
foo f;
SELECT md5(array_agg(md5((t.*)::varchar))::varchar)
FROM (
SELECT *
FROM my_table
ORDER BY 1
) AS t
You can create an aggregate function for hashing a table like this:
create function pg_concat( text, text ) returns text as '
begin
if $1 isnull then
return $2;
else
return $1 || $2;
end if;
end;' language 'plpgsql';
create function pg_concat_fin(text) returns text as '
begin
return $1;
end;' language 'plpgsql';
create aggregate pg_concat (
basetype = text,
sfunc = pg_concat,
stype = text,
finalfunc = pg_concat_fin);
Then you can use the pg_concat aggregate to calculate the table's hash value. Note that the ORDER BY has to go inside the aggregate call:
select md5(pg_concat(md5(CAST((f.*) AS text)) order by id)) from f;
I had a similar requirement, to use when testing a specialized table replication solution.
@Ben's rolling MD5 solution (which he appended to the question) seems quite efficient, but there were a couple of traps which tripped me up.
The first (mentioned in some of the other answers) is that you need to ensure that the aggregate is performed in a known order over the table you are checking. The syntax for that is, e.g.:
select zz_hashagg(CAST((example.*)AS text) order by id) from example;
Note the order by is inside the aggregate.
The second is that using CAST((example.*) AS text) will not give identical results for two tables with the same column contents unless the columns were created in the same order. In my case that was not guaranteed, so to get a true comparison I had to list the columns separately, for example:
select zz_hashagg(CAST((example.id, example.a, example.c)AS text) order by id) from example;
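Both traps can be reproduced in a small Python model. The row_text function is only a rough stand-in for casting a row to text, the sample rows are made up, and table_hash is a hypothetical helper that sorts by id before applying the rolling MD5.

```python
import hashlib


def row_text(row, columns):
    """Rough stand-in for a row cast to text: '(v1,v2,...)' in the given column order."""
    return "(" + ",".join(str(row[col]) for col in columns) + ")"


def table_hash(rows, columns, key="id"):
    """Rolling MD5 over the rows, always processed in key order."""
    state = ""
    for row in sorted(rows, key=lambda r: r[key]):
        text = state + row_text(row, columns)
        state = hashlib.md5(text.encode("utf-8")).hexdigest()
    return state


rows = [{"id": 2, "a": "x", "c": "y"}, {"id": 1, "a": "p", "c": "q"}]

# Same data, but the two tables declared their columns in a different order.
hash_order_1 = table_hash(rows, ["id", "a", "c"])
hash_order_2 = table_hash(rows, ["id", "c", "a"])

# Explicit column list and explicit ordering: physical row order no longer matters.
hash_explicit = table_hash(list(reversed(rows)), ["id", "a", "c"])

print(hash_order_1, hash_order_2, hash_explicit)
```

In this model the hash changes when only the column order differs, and stays the same when the rows arrive in a different physical order but are sorted inside the hash computation.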
For completeness (in case a subsequent edit should remove it) here is the definition of the zz_hashagg from @Ben's question:
create function zz_concat(text, text) returns text as
'select md5($1 || $2);' language 'sql';
create aggregate zz_hashagg(text) (
sfunc = zz_concat,
stype = text,
initcond = '');
Tomas Greif's solution is nice, but for a large enough table it fails with an "invalid memory alloc request size" error. There are two ways around that.
Option 1. Without batches
If the table is not too big, use string_agg and the bytea data type.
select
md5(string_agg(c.row_hash, '' order by c.row_hash)) table_hash
from
foo f
cross join lateral(select ('\x' || md5(f::text))::bytea row_hash) c
;
Option 2. With batches
If the query in previous option ends with error like
SQL Error [54000]: ERROR: out of memory
Detail: Cannot enlarge string buffer containing 1073741808 bytes by 16 more bytes.
the row count limit is 1073741808 / 16 = 67108863 and the table should be divided to batches.
select
md5(string_agg(t.batch_hash, '' order by t.batch_hash)) table_hash
from(
select
md5(string_agg(c.row_hash, '' order by c.row_hash)) batch_hash
from
foo f
cross join lateral(select ('\x' || md5(f::text))::bytea row_hash) c
group by substring(row_hash for 3)
) t
;
Here, the 3 in the GROUP BY clause splits the row hashes by their first 3 bytes into up to 16 777 216 batches (2 bytes: 65 536, 1 byte: 256). Other batching methods (e.g. ntile) will work as well.
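The batching idea and the numbers quoted above can be checked with a short Python sketch. It models the two-level hashing on made-up row strings; batched_table_hash is a hypothetical name, and the result is not claimed to be byte-identical to what the SQL query returns.

```python
import hashlib
from collections import defaultdict


def batched_table_hash(rows, prefix_bytes=3):
    """Hash rows, group the 16-byte digests by their leading bytes, hash each batch, then hash the batch hashes."""
    row_hashes = sorted(hashlib.md5(row.encode("utf-8")).digest() for row in rows)
    batches = defaultdict(list)
    for digest in row_hashes:  # already sorted, so each batch stays sorted
        batches[digest[:prefix_bytes]].append(digest)
    batch_hashes = sorted(
        hashlib.md5(b"".join(batch)).hexdigest() for batch in batches.values()
    )
    return hashlib.md5("".join(batch_hashes).encode("ascii")).hexdigest()


# Arithmetic from the text: 16 bytes per row hash against the buffer size in the error message.
max_rows_per_aggregate = 1073741808 // 16
batch_counts = {n: 256 ** n for n in (1, 2, 3)}

rows = ["(1,a)", "(2,b)", "(3,c)"]
print(max_rows_per_aggregate, batch_counts)
print(batched_table_hash(rows))
```

Because the row hashes are sorted before they are combined, the result in this model does not depend on the order in which the rows are read.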
P.S. If you need to compare two tables this post may help.
Great answers.
In case someone needs to avoid aggregate functions while still supporting tables of several GiB, you can use the following function. It has only a small performance penalty compared to the best answers, even for the largest tables.
CREATE OR REPLACE FUNCTION table_md5(
table_name CHARACTER VARYING
, VARIADIC order_key_columns CHARACTER VARYING [])
RETURNS CHARACTER VARYING AS $$
DECLARE
order_key_columns_list CHARACTER VARYING;
query CHARACTER VARYING;
first BOOLEAN;
i SMALLINT;
working_cursor REFCURSOR;
working_row_md5 CHARACTER VARYING;
partial_md5_so_far CHARACTER VARYING;
BEGIN
order_key_columns_list := '';
first := TRUE;
FOR i IN 1..array_length(order_key_columns, 1) LOOP
IF first THEN
first := FALSE;
ELSE
order_key_columns_list := order_key_columns_list || ', ';
END IF;
order_key_columns_list := order_key_columns_list || order_key_columns[i];
END LOOP;
query := (
'SELECT ' ||
'md5(CAST(t.* AS TEXT)) ' ||
'FROM (' ||
'SELECT * FROM ' || table_name || ' ' ||
'ORDER BY ' || order_key_columns_list ||
') t');
OPEN working_cursor FOR EXECUTE (query);
-- RAISE NOTICE 'opened cursor for query: ''%''', query;
first := TRUE;
LOOP
FETCH working_cursor INTO working_row_md5;
EXIT WHEN NOT FOUND;
IF first THEN
first := FALSE;
SELECT working_row_md5 INTO partial_md5_so_far;
ELSE
SELECT md5(working_row_md5 || partial_md5_so_far)
INTO partial_md5_so_far;
END IF;
-- RAISE NOTICE 'partial md5 so far: %', partial_md5_so_far;
END LOOP;
-- RAISE NOTICE 'final md5: %', partial_md5_so_far;
RETURN partial_md5_so_far :: CHARACTER VARYING;
END;
$$ LANGUAGE plpgsql;
Used as:
SELECT table_md5(
'table_name', 'sorting_col_0', 'sorting_col_1', ..., 'sorting_col_n'
);
As for the algorithm, you could XOR all the individual MD5 hashes, or concatenate them and hash the concatenation.
If you want to do this completely server-side you probably have to create your own aggregation function, which you could then call.
select my_table_hash(md5(CAST((f.*) AS text)) order by id) from f;
As an intermediate step, instead of copying the whole table to the client, you could just select the MD5 results for all rows, and run those through md5sum.
Either way you need to establish a fixed sort order, otherwise you might end up with different checksums even for the same data.
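The two combining strategies can be compared in a minimal Python sketch. The row strings are made up and xor_hash and concat_hash are hypothetical helpers; the sketch only illustrates how each strategy reacts to row order and to duplicate rows.

```python
import hashlib


def md5_int(text):
    """MD5 digest of a string as a 128-bit integer."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest(), "big")


def xor_hash(rows):
    """XOR all row digests together; the result is formatted as 32 hex digits."""
    acc = 0
    for row in rows:
        acc ^= md5_int(row)
    return format(acc, "032x")


def concat_hash(rows):
    """Concatenate the hex digests of all rows in the given order and hash the result."""
    joined = "".join(hashlib.md5(row.encode("utf-8")).hexdigest() for row in rows)
    return hashlib.md5(joined.encode("ascii")).hexdigest()


print(xor_hash(["a", "b"]), xor_hash(["b", "a"]))
print(concat_hash(["a", "b"]), concat_hash(["b", "a"]))
print(xor_hash(["a", "a"]))
```

In this model the XOR variant gives the same value for any row order, but two identical rows cancel each other out; the concatenating variant detects duplicates and depends on a fixed sort order.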