How to md5 all columns regardless of type - sql

I would like to create a SQL query (or PL/pgSQL function) that will md5() all given rows regardless of type. However, below, if one column is NULL then the hash is NULL:
UPDATE thetable
SET hash = md5(accountid || accounttype || createdby || editedby);
I am later using the hash to compare uniqueness, so a NULL hash does not work for this use case.
The problem is the way Postgres handles concatenating NULLs: if any operand of || is NULL, the entire result is NULL. For example:
thedatabase=# SELECT accountid || accounttype || createdby || editedby
FROM thetable LIMIT 5;
1Type113225
<NULL>
2Type11751222
3Type10651010
4Type10651
I could use coalesce or CASE statements if I knew the type; however, I have many tables, and I will not know the type of every column ahead of time.
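For illustration, the per-column workaround alluded to above might look like this (a sketch only; it assumes every column casts cleanly to text, which is exactly the per-column knowledge the question says it lacks):
UPDATE thetable
SET hash = md5(coalesce(accountid::text, '') || coalesce(accounttype::text, '') ||
               coalesce(createdby::text, '') || coalesce(editedby::text, ''));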

There is a much more elegant solution for this.
In Postgres, using the table name in a SELECT list is permitted, and it has type ROW. If you cast this to type TEXT, you get all columns concatenated together into a single string in Postgres's row-literal syntax, e.g. (1,Type1,13,225).
Having this, you can get md5 of all columns as follows:
SELECT md5(mytable::TEXT)
FROM mytable
If you want to only use some columns, use ROW constructor and cast it to TEXT:
SELECT md5(ROW(col1, col2, col3)::TEXT)
FROM mytable
Another nice property of this solution is that the md5 will be different for NULL vs. empty string.
Obligatory SQLFiddle.
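To see the NULL-vs-empty-string point concretely, you can compare the row literals directly (a quick sanity check; inside a row literal Postgres renders a NULL field as nothing and an empty string as ""):
SELECT ROW(NULL)::TEXT AS null_row,   -- '()'
       ROW('')::TEXT AS empty_row,    -- '("")'
       md5(ROW(NULL)::TEXT) = md5(ROW('')::TEXT) AS hashes_equal;  -- false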

You can also use an approach similar to mvp's solution in environments where the ROW() construct is not supported, such as Amazon Redshift, where it fails with:
Invalid operation: ROW expression, implicit or explicit, is not supported in target list;
My proposition is to use the NVL2 and CAST functions to cast columns of different types to CHAR, since that type is compatible with all Redshift data types according to the documentation. Below is an example of how to achieve a null-proof MD5 in Redshift (note that md5() takes a single argument, so the NVL2 results are concatenated with ||):
SELECT md5(NVL2(col1, col1::char, '') ||
           NVL2(col2, col2::char, '') ||
           NVL2(col3, col3::char, ''))
FROM mytable
This might work without casting the second NVL2 argument to char, but it would definitely fail if you tried to get the md5 of a date column with a null value.
I hope this is helpful for someone.

Have you tried using CONCAT()? I just tried it in my PG 9.1 install:
SELECT CONCAT('aaaa',1111,'bbbb'); => aaaa1111bbbb
SELECT CONCAT('aaaa',null,'bbbb'); => aaaabbbb
Therefore, you can try:
SELECT MD5(CONCAT(column1, column2, column3, column_n)) => md5_hash string here
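One caveat worth noting (my own observation, easy to verify): because CONCAT() treats NULL as an empty string, this approach, unlike the ROW()::TEXT one above, hashes NULL and '' identically:
SELECT md5(CONCAT('a', NULL, 'b')) = md5(CONCAT('a', '', 'b'));
-- true: the NULL/empty-string distinction is lost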

select MD5(cast(p as text)) from fiscal_cfop as p
(This is the same row-to-text technique as above, applied through the table alias p.)

Related

Invalid token error when using jsonb_insert in postgresql

As a little bit of background. I want to fill a column with jsonb values using values from other columns. Initially, I used this query:
UPDATE myTable
SET column_name =
row_to_json(rowset)
FROM (SELECT column1, column2 FROM myTable)rowset
However, this query seems to run for way too long (a few hours before I stopped it) on a dataset with 9 million records. So I was looking for a solution without the second FROM clause and found the jsonb_insert function. To test it, I first ran this sample query:
SELECT jsonb_insert('{}','{column1}','500000')
Which gives {"column1": 500000} as output. Perfect, so I tried to fill the value using the actual column:
SELECT jsonb_insert('{}','{column1}',column1) FROM myTable WHERE id = <test_id>
This gives a syntax error and a suggestion to add argument types, which leads me to the following:
SELECT jsonb_insert('{}','{column1}','column1')
FROM myTable WHERE id = <test_id>
SELECT jsonb_insert('{}'::jsonb,'{column1}'::jsonb,column1::numeric(8,0))
FROM myTable WHERE id = <test_id>
Both of these queries give an invalid input syntax error, with Token 'column1' is invalid.
I really can not seem to find the correct syntax for these queries using documentation. Does anyone know what the correct syntax would be?
This is because the jsonb_insert function needs the new_value parameter to be of type jsonb:
jsonb_insert(target jsonb, path text[], new_value jsonb [, insert_after boolean])
If we want a JSON number, we can cast the column to TEXT before casting to jsonb.
If we want a JSON string, we can use the concat function to wrap the value in double quotes.
CREATE TABLE myTable (column1 varchar(50),column2 int);
INSERT INTO myTable VALUES('column1',50000);
SELECT jsonb_insert('{}','{column1}',concat('"',column1,'"')::jsonb) as JsonStringType,
jsonb_insert('{}','{column2}',coalesce(column2::TEXT,'null')::jsonb) as JsonNumberType
FROM myTable
sqlfiddle
Note
If our column value might be NULL, we can pass 'null' as the fallback for the coalesce function, as in coalesce(column2::TEXT,'null'), so that the cast to jsonb produces a JSON null.
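Against the sample row inserted above, the query should return something like (my reading of the casts; formatting approximate):
     jsonstringtype      |   jsonnumbertype
-------------------------+--------------------
 {"column1": "column1"}  | {"column2": 50000}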

SQL Server: Best way to concatenate multiple columns?

I am trying to concatenate multiple columns in a query in SQL Server 11.00.3393.
I tried the new function CONCAT() but it's not working when I use more than two columns.
So I wonder if that's the best way to solve the problem:
SELECT CONCAT(CONCAT(CONCAT(COLUMN1,COLUMN2),COLUMN3),COLUMN4) FROM myTable
I can't use COLUMN1 + COLUMN2 because of NULL values.
EDIT
If I try SELECT CONCAT('1','2','3') AS RESULT I get an error
The CONCAT function requires 2 argument(s)
Through discourse it's clear that the problem lies in using VS2010 to write the query, as it uses the canonical CONCAT() function which is limited to 2 parameters. There's probably a way to change that, but I'm not aware of it.
An alternative:
SELECT '1'+'2'+'3'
This approach requires non-string values to be cast/converted to strings, as well as NULL handling via ISNULL() or COALESCE():
SELECT ISNULL(CAST(Col1 AS VARCHAR(50)),'')
+ COALESCE(CONVERT(VARCHAR(50),Col2),'')
SELECT CONCAT(LOWER(last_name), UPPER(last_name),
       INITCAP(last_name), hire_date) AS up_low_init_hdate
FROM employees
WHERE YEAR(hire_date) = 1995
(Note: INITCAP is not a built-in SQL Server function, so this answer appears to mix in another dialect.)
Try using the below:
SELECT
(RTRIM(LTRIM(col_1))) + (RTRIM(LTRIM(col_2))) AS Col_newname,
col_1,
col_2
FROM
s_cols
WHERE
col_any_condition = ''
;
Using concatenation in Oracle SQL is very easy and interesting, though I don't know much about MS-SQL.
Here we go for Oracle :
Syntax:
SQL> select First_name||Last_Name as Employee
from employees;
Result: EMPLOYEE
EllenAbel
SundarAnde
MozheAtkinson
Here the AS keyword is used for an alias.
We can concatenate with NULL values.
e.g. : column1||NULL
If any of your columns contains a NULL value, the result will simply show the values of the columns that do have values.
You can also use a literal character string in concatenation.
e.g.
select column1||' is a '||column2
from tableName;
Result: column1 is a column2.
The literal in between must be enclosed in single quotation marks; you can exclude numbers.
NOTE: This is only for Oracle SQL.
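A quick illustration of the NULL behavior described above (runnable against DUAL):
select 'a' || NULL || 'b' as result from dual;
-- RESULT
-- ab   (Oracle treats NULL as an empty string in concatenation)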
For anyone dealing with Snowflake:
Try using CONCAT with multiple columns like so:
SELECT
CONCAT(col1, col2, col3) AS all_string_columns_together
, CONCAT(CAST(col4 AS VARCHAR(50)), col1) AS string_and_int_column
FROM table
If the fields are nullable, then you'll have to handle those nulls. Remember that in Snowflake NULL is contagious, and CONCAT('foo', NULL) simply results in NULL as well:
SELECT CONCAT(COALESCE(column1, ''), COALESCE(column2, '')) etc...
(COALESCE is used here rather than SQL Server's ISNULL, which Snowflake does not support.)
Basically test each field for nullness, and replace with an empty string if so.

Using UPCASE or Regexp in Array Column on Postgres

I am trying to query a Postgres Array Column disregarding case and perhaps even disregarding spaces as well.
SELECT "cats".* FROM "cats" WHERE ('CATS - PERSA' = ANY(UPCASE(cat_types))) ORDER BY "cats"."id" ASC LIMIT 1;
But I get this error:
You might need to add explicit type casts.
As a bonus, I would also like to be able to do a regexp where the search ignores spaces in the values of the cat_types column.
I am using Ruby on Rails to do this.
cat_type.upcase.delete(' ')
Cats.where("'#{cat_type}' = ANY(cat_types)").first
The query works using just ANY, but I want to be able to disregard spaces and upcase the values in cat_types so that there are more chances of matching. ILIKE could also be a possibility.
Thanks.
SELECT DISTINCT c.*
FROM cats c, unnest(c.cat_types) AS cat_type
WHERE upper(translate(cat_type, ' ', '')) = 'CATS-PERSA'
ORDER BY id
LIMIT 1;
The Postgres function is upper(), not upcase().
cat_types seems to be an array, assuming type text[] (info missing). I unnest() it to treat array elements individually. This cannot be done with ANY, which is only good for simple comparison.
I use an implicit LATERAL join here, which requires Postgres 9.3+ (info missing).
If multiple array elements match, you get the row multiple times here. Hence the DISTINCT.
More about pattern-matching in Postgres:
Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
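For the bonus part (ignoring spaces), a sketch that is not from the original answer, using Postgres's case-insensitive regex operator ~* on the unnested elements, might look like:
SELECT DISTINCT c.*
FROM cats c, unnest(c.cat_types) AS cat_type
WHERE cat_type ~* '^\s*cats\s*-\s*persa\s*$'
ORDER BY id
LIMIT 1;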

Can SQL SUM() function take an expression as argument?

I'm using SQLite database and I'm wondering whether I'm allowed to write queries as follows:
SELECT SUM(column1 * column2)
FROM my_table;
I googled, but references say that the SUM function has the following format:
SUM([DISTINCT|ALL] column)
And my question is: does column actually mean a column, or does it allow expressions (like the above) too?
You can always use a table expression:
SELECT SUM(Calc)
FROM (
SELECT Column1 * Column2 AS 'Calc'
FROM My_Table) t
I don't have SQLite but checking the docs indicates this should work fine.
Yes, you can use an expression like the one you mentioned, if the datatype of both columns allow it.
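A minimal self-contained check in SQLite (table and values made up for illustration):
CREATE TABLE my_table (column1 INTEGER, column2 INTEGER);
INSERT INTO my_table VALUES (2, 3), (4, 5);
SELECT SUM(column1 * column2) FROM my_table;
-- 2*3 + 4*5 = 26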

Invalid Number Error! Can't seem to get around it

Oracle 10g DB. I have a table called s_contact. This table has a field called person_uid. This person_uid field is a varchar2 but contains valid numbers for some rows and invalid numbers for other rows. For instance, one row might have a person_uid of '2-lkjsdf' and another might be 1234567890.
I want to return just the rows with valid numbers in person_uid. The SQL I am trying is...
select person_uid
from s_contact
where decode(trim(translate(person_uid, '1234567890', ' ')), null, 'n', 'c') = 'n'
The translate replaces all numbers with spaces so that a trim will result in null if the field only contained numbers. Then I use a decode statement to set a little code to filter on. n=number, c=char.
This seems to work when I run just a preview, but I get an 'invalid number' error when I add a filter of...
and person_uid = 100
-- or
and to_number(person_uid) = 100
I just don't understand what is happening! It should be filtering out all the records with invalid numbers, and 100 is obviously a number...
Any ideas anyone? Greatly Appreciated!
Unfortunately, the various subquery approaches that have been proposed are not guaranteed to work. Oracle is allowed to push the predicate into the subquery and then evaluate the conditions in whatever order it deems appropriate. If it happens to evaluate the PERSON_UID condition before filtering out the non-numeric rows, you'll get an error. Jonathan Gennick has an excellent article Subquery Madness that discusses this issue in quite a bit of detail.
That leaves you with a few options
1) Rework the data model. It's generally not a good idea to store numbers in anything other than a NUMBER column. In addition to causing this sort of issue, it has a tendency to screw up the optimizer's cardinality estimates which leads to less than ideal query plans.
2) Change the condition to specify a string value rather than a number. If PERSON_UID is supposed to be a string, your filter condition could be PERSON_UID = '100'. That avoids the need to perform the implicit conversion.
3) Write a custom function that does the string to number conversion and ignores any errors and use that in your code, i.e.
CREATE OR REPLACE FUNCTION my_to_number( p_arg IN VARCHAR2 )
RETURN NUMBER
IS
BEGIN
RETURN to_number( p_arg );
EXCEPTION
WHEN others THEN
RETURN NULL;
END;
and then use my_to_number(PERSON_UID) = 100
4) Use a subquery that prevents the predicate from being pushed. This can be done in a few different ways. I personally prefer throwing a ROWNUM into the subquery, i.e. building on OMG Ponies' solution
WITH valid_persons AS (
SELECT TO_NUMBER(c.person_uid) AS person_uid,
ROWNUM rn
FROM S_CONTACT c
WHERE REGEXP_LIKE(c.person_uid, '^[[:digit:]]+$'))
SELECT *
FROM valid_persons vp
WHERE vp.person_uid = 100
Oracle can't push the vp.person_uid = 100 predicate into the subquery here because doing so would change the results. You could also use hints to force the subquery to be materialized or to prevent predicate pushing.
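As a sketch of the hint-based variant (the MATERIALIZE hint forces the CTE to be evaluated on its own; exact optimizer behavior is version-dependent):
WITH valid_persons AS (
SELECT /*+ MATERIALIZE */ TO_NUMBER(person_uid) AS person_uid
FROM s_contact
WHERE REGEXP_LIKE(person_uid, '^[[:digit:]]+$'))
SELECT *
FROM valid_persons
WHERE person_uid = 100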
Another alternative is to combine the predicates:
where case when trim(translate(person_uid, '1234567890', ' ')) is null
then to_number(person_uid) end = 100
When you add those numbers to the WHERE clause it's still doing those checks. You can't guarantee the ordering within the WHERE clause. So, it still tries to compare 100 to '2-lkjsdf'.
Can you use '100'?
Another option is to apply a subselect
SELECT * FROM (
select person_uid
from s_contact
where decode(trim(translate(person_uid, '1234567890', ' ')), null, 'n', 'c') = 'n'
)
WHERE TO_NUMBER(PERSON_UID) = 100
Regular expressions to the rescue!
where regexp_like (person_uid, '^[0-9]+$')
Use the first part of your query to generate a temp table. Then query the temp table based on person_uid = 100 or whatever.
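A sketch of that approach with a global temporary table (the table name here is made up; ON COMMIT PRESERVE ROWS keeps the rows for the session):
CREATE GLOBAL TEMPORARY TABLE tmp_numeric_contacts
ON COMMIT PRESERVE ROWS AS
SELECT person_uid
FROM s_contact
WHERE decode(trim(translate(person_uid, '1234567890', ' ')), null, 'n', 'c') = 'n';

SELECT *
FROM tmp_numeric_contacts
WHERE to_number(person_uid) = 100;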
The problem is that Oracle tries to convert each person_uid to a number as it gets to it, due to the additional AND clause in your WHERE. This behavior may or may not show up in the preview depending on which records were picked.