PGSQL Postgres Update based on numeric range on a text column - sql

I have a table with IDs, some have letters, most do not. And associated points with those IDs. The IDs are stored as text.
I would like to add points to a given range of IDS that ARE integers and give them all the same points. 535.
I have found a way using a subquery to SELECT the data I need, but it appears that updating it is another matter. I would love to be able to only get data that is CAST-able without a subquery. however since it errors out when it touches something that isn't a number, that doesn't seem to be possible.
select * from (select idstring, amount from members where idstring ~ '^[0-9]+$') x
WHERE
CAST(x.idstring AS BIGINT) >= 10137377001
and
CAST(x.idstring AS BIGINT) <= 10137377100
What am I doing ineficiently in the above, and how an I update the records that I want to?
In a perfect world my statement would be as simple as:
UPDATE members SET amount = 535
WHERE idstring >= 10137377001
AND idstring <= 10137377100
But since Idstring both contains entries that contain letters and is stored as text, it complicates things significantly. TRY_CAST would be perfect here, however there is no easy equivalent in postgres.
An example of the ids in the table might be
A52556B
36663256
6363632900B
3000525
ETC.

You can use TO_NUMBER together with your regular expression predicate, like so:
UPDATE members
SET amount = 535
WHERE idstring ~ '^[0-9]+$'
AND to_number(idstring, '999999999999') BETWEEN 10137377001 AND 10137377100
Working example on dbfiddle

You can encase the typecast and integer comparison within a CASE statement.
UPDATE members
SET amount = COALESCE(amount, 0) + 535
WHERE CASE WHEN idstring ~ '^[0-9]+$'
THEN idstring::BIGINT BETWEEN 10137377001 AND 10137377100
ELSE FALSE END;
Here I've assumed you might want to add 535 rather than setting it explicitly to 535 based on what you said above, but if my assumption is incorrect then SET amount = 535 is just fine.

Related

Query to ignore rows which have non hex values within field

Initial situation
I have a relatively large table (ca. 0.7 Mio records) where an nvarchar field "MediaID" contains largely media IDs in proper hexadecimal notation (as they should).
Within my "sequential" query (each query depends on the output of the query before, this is all in pure T-SQL) I have to convert these hexadecimal values into decimal bigint values in order to do further calculations and filtering on these calculated values for the subsequent queries.
--> So far, no problem. The "sequential" query works fine.
Problem
Unfortunately, some of these Media IDs do contain non-hex characters - most probably because there was some typing errors by the people which have added them or through import errors from the previous business system.
Because of these non-hex chars, the whole query fails (of course) because the conversion hits an error.
For my current purpose, such rows must be skipped/ignored as they are clearly wrong and cannot be used (there are no medias / data carriers in use with the current business system which can have non-hex character IDs).
Manual editing of the data is not an option as there are too many errors and it is not clear with what the data must be replaced.
Challenge
To create a query which only returns records which have valid hex values within the media ID field.
(Unfortunately, my SQL skills are not enough to create the above query. Your help is highly appreciated.)
The relevant section of the larger query looks like this (xxxx is where your help comes in :-))
select
pureMediaID
, mediaID
, CUSTOMERID
,CONTRACT_CUSTOMERID
from
(
select concat('0x', Replace(Ltrim(Replace(mediaID, '0', ' ')), ' ', '0')) AS pureMediaID
--, CUSTOMERID
, *
from M_T_CONTRACT_CUSTOMERS
where mediaID is not null
and mediaID like '0%'
and xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
) as inner1
EDIT: As per request I have added here some good and some bad data:
Good:
4335463357
4335459809
1426427996
4335463509
4335515039
4335465134
4427370396
4335415661
4427369036
4335419089
004BB03433
004e7cf9c6
00BD23133
00EE13D8C1
00CCB5522C
00C46522C
00dbbe3433
Bad:
4564589+
AB6B8BFC.8
7B498DFCnm
DB218DFChb
d<tgfh8CFC
CB9E8AFCzj
B458DFCjhl
rytzju8DFC
BFCtdsjshj
DB9888FCgf
9BC08CFCyx
EB198DFCzj
4B628CFChj
7B2B8DFCgg
After I did upgrade the compatibility level of the SQL instance to SQL2016 (it was below 2012 before) I could use try_convert with same syntax as the original convert function as donPablo has pointed out. With that the query could run fully through and every MediaID which is not a correct hex value gets nicely converted into a null value - really, really nice.
Exactly what I needed.
Unfortunately, the solution of ALICE... didn't work out for me as this was also (strangely) returning records which had the "+" character within them.
Edit: The added comment of Alice... where you create a calculated field like this:
CASE WHEN "KEY" LIKE '%[^0-9A-F]%' THEN 0 ELSE 1 end as xyz
and then filter in the next query like this:
where xyz = 1
works also with SQL Instances with compatibility level < SQL 2012.
Great addition for people which still have to work with older SQL instances.
An option (although not ideal in terms of performance) is to check the characters in the MediaID through a case statement and regular expression
Hexadecimals cannot contain characters other than A-F and numbers between 0 and 9
CASE WHEN MediaID LIKE '%[0-9A-F]%' THEN 1 ELSE 0 END
I would recommend writing a function that can be used to evaluate MediaID first and checks if it is hexadecimal and then running the query for conversion

PostgreSQL - Assign integer value to string in case statement

I need to select one and only 1 row of data based on an ID in the data I have. I thought I had solved this (For details, see my original question and my solution, here: PostgreSQL - Select only 1 row for each ID)
However, I now still get multiple values in some cases. If there is only "N/A" and 1 other value, then no problem.. but if I have multiple values like: "N/A", "value1" and "value2" for example, then my case statement is not sufficient and I get both "value1" and "value2" returned to me. This is the case statement in question:
CASE
WHEN "PQ"."Value" = 'N/A' THEN 1
ELSE 0
END
I need to give a unique integer value to each string value and then the problem will be solved. The question is: how do I do this? My first thought is to somehow convert the character values to ASCII and sum them up.. but I am not sure how to do that and also worried about performance. Is there a way to very simply assign a value to each string so that I can choose 1 value only? I don't care which one actually... just that it's only 1.
EDIT
I am now trying to create a function to add up the ASCII values of each character so I can essentially change my case statement to something like this:
CASE
WHEN "PQ"."Value" = 'N/A' THEN 9999999
ELSE SumASCII("PQ"."Value")
END
Having a small problem with it though.. I have added it as a separate question, here: PostgreSQL - ERROR: query has no destination for result data
EDIT 2
Thanks to #Bohemian, I now have a working solution, which is as follows:
CASE
WHEN "PQ"."Value" = 'N/A' THEN -1
ELSE ('x'||LPAD(MD5("PQ"."Value"),16,'0'))::bit(64)::bigint
END DESC
This will produce a "unique" number for each value:
('x'||substr(md5("PQ"."Value"),1,8))::bit(64)::bigint
Strictly speaking, there is a chance of a collision, but it's very remote.
If the result is "too big", you could try modulus:
<above-calculation> % 10000
Although collisions would then be a 0.01% chance, you should try this formula against all known values to ensure there are no collisions.
If you don't care which value gets picked, change RANK() to ROW_NUMBER(). If you do care, do it anyway, but also add another term after the CASE statement in ORDER BY, separated by a comma, with the logic you want - for example if you want the first value alphabetically, do this:
...
ORDER BY CASE...END, "PQ"."Value")
...

Search Through All Between Values SQL

I have data following data structure..
_ID _BEGIN _END
7003 99210 99217
7003 10225 10324
7003 111111
I want to look through every _BEGIN and _END and return all rows where the input value is between the range of values including the values themselves (i.e. if 10324 is the input, row 2 would be returned)
I have tried this filter but it does not work..
where #theInput between a._BEGIN and a._END
--THIS WORKS
where convert(char(7),'10400') >= convert(char(7),a._BEGIN)
--BUT ADDING THIS BREAKS AND RETURNS NOTHING
AND convert(char(7),'10400') < convert(char(7),a._END)
Less than < and greater than > operators work on xCHAR data types without any syntactical error, but it may go semantically wrong. Look at examples:
1 - SELECT 'ab' BETWEEN 'aa' AND 'ac' # returns TRUE
2 - SELECT '2' BETWEEN '1' AND '10' # returns FALSE
Character 2 as being stored in a xCHAR type has greater value than 1xxxxx
So you should CAST types here. [Exampled on MySQL - For standard compatibility change UNSIGNED to INTEGER]
WHERE CAST(#theInput as UNSIGNED)
BETWEEN CAST(a._BEGIN as UNSIGNED) AND CAST(a._END as UNSIGNED)
You'd better change the types of columns to avoid ambiguity for later use.
This would be the obvious answer...
SELECT *
FROM <YOUR_TABLE_NAME> a
WHERE #theInput between a._BEGIN and a._END
If the data is string (assuming here as we don't know what DB) You could add this.
Declare #searchArg VARCHAR(30) = CAST(#theInput as VARCHAR(30));
SELECT *
FROM <YOUR_TABLE_NAME> a
WHERE #searchArg between a._BEGIN and a._END
If you care about performance and you've got a lot of data and indexes you won't want to include function calls on the column values.. you could in-line this conversion but this assures that your predicates are Sargable.
SELECT * FROM myTable
WHERE
(CAST(#theInput AS char) >= a._BEGIN AND #theInput < a.END);
I also saw several of the same type of questions:
SQL "between" not inclusive
MySQL "between" clause not inclusive?
When I do queries like this, I usually try one side with the greater/less than on either side and work from there. Maybe that can help. I'm very slow, but I do lots of trial and error.
Or, use Tony's convert.
I supposed you can convert them to anything appropriate for your program, numeric or text.
Also, see here, http://technet.microsoft.com/en-us/library/aa226054%28v=sql.80%29.aspx.
I am not convinced you cannot do your CAST in the SELECT.
Nick, here is a MySQL version from SO, MySQL "between" clause not inclusive?

How do I perform a LIKE query on a PostgreSQL primary id column?

If I have a number (such as 88) and I want to perform a LIKE query in Rails on a primary ID column to return all records that contain that number at the end of the ID (IE: 88, 288, etc.), how would I do that? Here's the code to generate the result, which works fine in SQLLite:
#item = Item.where("id like ?", "88").all
In PostgreSQL, I'm running into this error:
PG::Error: ERROR: operator does not exist: integer ~~ unknown
How do I do this? I've tried converting the number to a string, but that doesn't seem to work either.
Based on Erwin's Answer:
This is a very old question, but in case someone needs it, there is one very simple answer, using ::text cast:
Item.where("(id::text LIKE ?)", "%#{numeric_variable}").all
This way, you find the number anywhere in the string.
Use % wildcard to the left only if you want the number to be at the end of the string.
Use % wildcard to the right also, if you want the number to be anywhere in the string.
Simple case
LIKE is for string/text types. Since your primary key is an integer, you should use a mathematical operation instead.
Use modulo to get the remainder of the id value, when divided by 100.
Item.where("id % 100 = 88")
This will return Item records whose id column ends with 88
1288
1488
1238872388
862388
etc...
Match against arbitrary set of final two digits
If you are going to do this dynamically (e.g. match against an arbitrary set of two digits, but you know it will always be two digits), you could do something like:
Item.where(["id % 100 = ?", last_two_digits)
Match against any set or number of final digits
If you wanted to match an arbitrary number of digits, so long as they were always the final digits (as opposed to digits appearing elsewhere in the id field), you could add a custom method on your model. Something like:
class Item < ActiveRecord
...
def find_by_final_digits(num_digits, digit_pattern)
# Where 'num_digits' is the number of final digits to match
# and `digit_pattern` is the set of final digits you're looking fo
Item.where(["id % ? = ?", 10**num_digits, digit_pattern])
end
...
end
Using this method, you could find id values ending in 88, with:
Item.find_by_final_digits(2, 88)
Match against a range of final digits, of any length
Let's say you wanted to find all id values that end with digits between 09 and 12, for whatever reason. Maybe they represent some special range of codes you're looking up. To do this you could do another custom method to use Postgres' BETWEEN to find on a range.
def find_by_final_digit_range(num_digits, start_of_range, end_of_range)
Item.where(["id % ? BETWEEN ? AND ?", 10**num_digits, start_of_range, end_of_range)
end
...and could be called using:
Item.find_by_final_digit_range(2, 9, 12)
...of course, this is all just a little crazy, and probably overkill.
The LIKE operator is for string types only.
Use the modulo operator % for what you are trying to do:
#item = Item.where("(id % 100) = ?", "88").all
I doubt it "works" in SQLite, even though it coerces the numeric types to strings. Without leading % the pattern just won't work.
-> sqlfiddle demo
Cast to text and use LIKE as you intended for arbitrary length:
#item = Item.where("(id::text LIKE ('%'::text || ?)", "'12345'").all
Or, mathematically:
#item = Item.where("(id % 10^(length(?)) = ?", "'12345'", "12345").all
LIKE operator does not work with number types and id is the number type so you can use it with concat
SELECT * FROM TABLE_NAME WHERE concat("id") LIKE '%ID%'

How do I sort a VARCHAR column in PostgreSQL that contains words and numbers?

I need to order a select query using a varchar column, using numerical and text order. The query will be done in a java program, using jdbc over postgresql.
If I use ORDER BY in the select clause I obtain:
1
11
2
abc
However, I need to obtain:
1
2
11
abc
The problem is that the column can also contain text.
This question is similar (but targeted for SQL Server):
How do I sort a VARCHAR column in SQL server that contains words and numbers?
However, the solution proposed did not work with PostgreSQL.
Thanks in advance, regards,
I had the same problem and the following code solves it:
SELECT ...
FROM table
order by
CASE WHEN column < 'A'
THEN lpad(column, size, '0')
ELSE column
END;
The size var is the length of the varchar column, e.g 255 for varying(255).
You can use regular expression to do this kind of thing:
select THECOL from ...
order by
case
when substring(THECOL from '^\d+$') is null then 9999
else cast(THECOL as integer)
end,
THECOL
First you use regular expression to detect whether the content of the column is a number or not. In this case I use '^\d+$' but you can modify it to suit the situation.
If the regexp doesn't match, return a big number so this row will fall to the bottom of the order.
If the regexp matches, convert the string to number and then sort on that.
After this, sort regularly with the column.
I'm not aware of any database having a "natural sort", like some know to exist in PHP. All I've found is various functions:
Natural order sort in Postgres
Comment in the PostgreSQL ORDER BY documentation