Firebird Database Split String on Field - sql

Currently working with a Firebird 1.5 database and attempting to pull the data in the correct format natively with SQL.
Consider the following database:
ID | Full Name
1 | Jon Doe
2 | Sarah Lee
What I am trying to achieve is a simple split on the full name field (space) within a query.
ID | First Name | Last Name
1 | Jon | Doe
2 | Sarah | Lee
The issue is that Firebird's POSITION() was only introduced in v2.1, so it is not available in 1.5. Is there any known workaround to split on a space that anyone has come across?
Much appreciate your assistance!

For Firebird 1.5, a solution is to find a UDF that either combines both functions, or provides the position (I don't use UDFs, so I am not sure if one already exists). If none is available, you might have to write one.
The other solution is to write a stored procedure for this functionality, see for example: Position of substring function in SP
CREATE PROCEDURE Pos (SubStr VARCHAR(100), Str VARCHAR(100))
RETURNS (Pos INTEGER) AS
  DECLARE VARIABLE SubStr2 VARCHAR(201); /* 1 + SubStr-length + Str-length */
  DECLARE VARIABLE Tmp VARCHAR(100);
BEGIN
  IF (SubStr IS NULL OR Str IS NULL)
  THEN BEGIN Pos = NULL; EXIT; END
  SubStr2 = SubStr || '%';
  Tmp = '';
  Pos = 1;
  WHILE (Str NOT LIKE SubStr2 AND Str NOT LIKE Tmp) DO BEGIN
    SubStr2 = '_' || SubStr2;
    Tmp = Tmp || '_';
    Pos = Pos + 1;
  END
  IF (Str LIKE Tmp) THEN Pos = 0;
END
This example (taken from the link) can be extended to then use SUBSTRING to split on the space.
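To make the LIKE-based search easier to follow, here is a minimal Python model of the procedure above, followed by the SUBSTRING-style split it enables. This is only a sketch for illustration: `sql_like`, `pos` and `split_name` are hypothetical helper names, and `sql_like` is a rough approximation of SQL LIKE (only `_` and `%` are treated as wildcards), not Firebird's actual implementation.

```python
import re


def sql_like(text, pattern):
    """Rough model of SQL LIKE: '_' matches one character, '%' any run."""
    regex = "".join(
        "." if ch == "_" else ".*" if ch == "%" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, text, flags=re.DOTALL) is not None


def pos(substr, text):
    """Mirror of the Pos procedure: 1-based index, 0 if absent, None for NULL."""
    if substr is None or text is None:
        return None
    probe = substr + "%"  # plays the role of SubStr2
    pad = ""              # plays the role of Tmp
    position = 1
    # Shift the probe right by one '_' per step until it matches,
    # or until the padding is as long as the whole string.
    while not sql_like(text, probe) and not sql_like(text, pad):
        probe = "_" + probe
        pad += "_"
        position += 1
    if sql_like(text, pad):
        position = 0
    return position


def split_name(full_name):
    """Split on the first space, as SUBSTRING would with the position found."""
    p = pos(" ", full_name)
    if not p:  # NULL input or no space at all
        return full_name, None
    # SUBSTRING(name FROM 1 FOR p - 1) and SUBSTRING(name FROM p + 1)
    return full_name[:p - 1], full_name[p:]
```

For example, `pos(" ", "Jon Doe")` finds the space at position 4, so `split_name("Jon Doe")` yields `("Jon", "Doe")`.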
For searching on a single character like a space, a simpler solution than the stored procedure above can probably be devised. For your exact needs you might need to write a selectable stored procedure specifically for this purpose.
However, upgrading your database to Firebird 2.5 will give you much more powerful internal functions that simplify this query (and your life)!

I also wanted to split a full name string into first and last name, and I used the following SQL statements in a Firebird 2.1 database:
Patients is the table name.
The Name field holds the full name string, e.g. "Jon Doe". The FIRST_NAME field will store the first name and the LAST_NAME field the last name.
First get the first name (the part of the string before the first space), then run an UPDATE with TRIM to remove any surrounding spaces:
UPDATE "Patients" SET "Patients".FIRST_NAME = (SUBSTRING("Patients"."Name" FROM 1 FOR (POSITION(' ' IN "Patients"."Name"))))
UPDATE "Patients" SET "Patients".FIRST_NAME = TRIM(BOTH ' ' FROM "Patients".FIRST_NAME)
Then get the last name (the part of the string after the first space), and again run an UPDATE with TRIM to remove any surrounding spaces:
UPDATE "Patients" SET "Patients"."LAST_NAME" = (SUBSTRING("Patients"."Name" FROM (POSITION(' ' IN "Patients"."Name")+1)))
UPDATE "Patients" SET "Patients".LAST_NAME = TRIM(BOTH ' ' FROM "Patients".LAST_NAME)
The result will be:
ID | NAME | FIRST_NAME | LAST_NAME
1 | Jon Doe | Jon | Doe
2 | Sarah Lee | Sarah | Lee
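It may help to see how these statements behave on names that do not have exactly one space. The following Python sketch models POSITION, SUBSTRING and TRIM with 1-based indexing; the helper names (`position`, `substring`, `split_like_update`) and the extra sample names are assumptions made for illustration, not part of the original answer.

```python
def position(needle, haystack):
    """1-based index of needle in haystack, 0 when absent (like SQL POSITION)."""
    return haystack.find(needle) + 1


def substring(text, start, length=None):
    """Model of SUBSTRING(text FROM start [FOR length]) with a 1-based start."""
    begin = start - 1
    return text[begin:] if length is None else text[begin:begin + length]


def split_like_update(name):
    """Apply the same expressions as the four UPDATE statements to one value."""
    p = position(" ", name)
    first = substring(name, 1, p).strip(" ")   # SUBSTRING ... FROM 1 FOR p, then TRIM
    last = substring(name, p + 1).strip(" ")   # SUBSTRING ... FROM p + 1, then TRIM
    return first, last
```

Under this model a name without a space ends up entirely in the last name (`split_like_update("Cher")` gives `("", "Cher")`), and everything after the first space goes to the last name (`"Mary Ann Smith"` gives `("Mary", "Ann Smith")`), so such rows are worth checking before running the updates.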

You could use a UDF, but that isn't strictly SQL.
You could write a stored procedure to parse and split, but that's not strictly SQL either.

Related

Concatenate rows in function PostgreSQL

Assume there's a table projects containing project name, location, team id, start and end years. How can I concatenate rows so that the same names would combine the other information into one string?
name | location | team_id | start | end
Library | Atlanta | 2389 | 2015 | 2017
Library | Georgetown | 9920 | 2003 | 2007
Museum | Auckland | 3092 | 2005 | 2007
Expected output would look like this:
name    | Records
Library | Atlanta, 2389, 2015-2017
        | Georgetown, 9920, 2003-2007
Museum  | Auckland, 3092, 2005-2007
Each line should contain end-of-line / new line character.
I have a function for this, but I don't think it would work with just using CONCAT. What are other ways this can be done? What I tried:
CREATE OR REPLACE TYPE projects (name TEXT, records TEXT);
CREATE OR REPLACE FUNCTION records (INT)
RETURNS SETOF projects AS
$$
RETURN QUERY
SELECT p.name
CONCAT(p.location, ', ', p.team_id, ', ', p.start, '-', p.end, CHAR(10))
FROM projects($1) p;
$$
LANGUAGE PLpgSQL;
I tried using CHAR(10) for the new line, but it's giving a syntax error (not sure why?).
The sample above concatenates the strings but, as expected, does not combine the rows that share the same name.
You do not need PL/pgSQL for that.
First eliminate duplicate names using DISTINCT and then in a subquery you can concat the columns into a single string. After that use array_agg to create an array out of it. It will then "merge" multiple arrays, in case the subquery returns more than one row. Finally, get rid of the commas and curly braces using array_to_string. Instead of using the char value of a newline, you can simply use E'\n' (E stands for escape):
WITH j (name,location,team_id,start,end_) AS (
  VALUES ('Library','Atlanta',2389,2015,2017),
         ('Library','Georgetown',9920,2003,2007),
         ('Museum','Auckland',3092,2005,2007)
)
SELECT
  DISTINCT q1.name,
  array_to_string(
    (SELECT array_agg(concat(location,', ',team_id,', ',start,'-', end_, E'\n'))
     FROM j WHERE name = q1.name),'') AS records
FROM j q1;
  name   |           records
---------+-----------------------------
 Library | Atlanta, 2389, 2015-2017
         | Georgetown, 9920, 2003-2007
         |
 Museum  | Auckland, 3092, 2005-2007
Note: try not to use SQL keywords (e.g. end, name, start) as column names. Although PostgreSQL allows it (end has to be quoted), it is considered bad practice.
Demo: db<>fiddle
A somewhat simpler query:
select
name,
string_agg( concat(location, ', ', team_id, ', ', start, '-', "end"), E'\n') AS records
FROM t
group by name;
PostgreSQL fiddle
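The two answers differ in where the newline goes: the array_agg version appends E'\n' to every record and joins with an empty string, while string_agg puts the newline only between records. A small Python sketch of both strategies, using the sample rows from the question (the function names are hypothetical and this only models the grouping logic, not PostgreSQL itself):

```python
from collections import defaultdict

# Sample rows copied from the question: (name, location, team_id, start, end)
ROWS = [
    ("Library", "Atlanta", 2389, 2015, 2017),
    ("Library", "Georgetown", 9920, 2003, 2007),
    ("Museum", "Auckland", 3092, 2005, 2007),
]


def format_record(location, team_id, start, end):
    """Build one 'location, team_id, start-end' string."""
    return f"{location}, {team_id}, {start}-{end}"


def aggregate_like_string_agg(rows):
    """Newline used as a delimiter between records (string_agg style)."""
    grouped = defaultdict(list)
    for name, location, team_id, start, end in rows:
        grouped[name].append(format_record(location, team_id, start, end))
    return {name: "\n".join(records) for name, records in grouped.items()}


def aggregate_like_array_agg(rows):
    """Newline appended to every record, then joined with '' (array_agg style)."""
    grouped = defaultdict(list)
    for name, location, team_id, start, end in rows:
        grouped[name].append(format_record(location, team_id, start, end) + "\n")
    return {name: "".join(records) for name, records in grouped.items()}
```

In this model the array_agg style leaves a trailing newline on every group, which is consistent with the empty line after the Library records in the output shown above; the string_agg style does not.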

What does the parallel operator do in SQL SET?

The table meat_poultry_egg_inspect is being updated: column zip is set to something wherever column st matches PR or VI and the length of zip is 3. I think it's making the zip column into a five-digit value: '00' + zip.
UPDATE meat_poultry_egg_inspect
SET zip = '00' || zip
WHERE st IN('PR','VI') AND length(zip) = 3;
The || is the operator for string concatenation.
This is the SQL standard operator, although not all databases support it.
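To see the effect of the statement, here is a small sketch that runs the same UPDATE against an in-memory SQLite database from Python. SQLite is used only because it ships with Python and also supports `||` and `length()`; it stands in for whatever database the question is about, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meat_poultry_egg_inspect (st TEXT, zip TEXT)")
# Hypothetical sample rows: two truncated zips, two that should stay untouched
conn.executemany(
    "INSERT INTO meat_poultry_egg_inspect VALUES (?, ?)",
    [("PR", "605"), ("VI", "820"), ("NY", "10001"), ("PR", "00926")],
)

# Same statement as in the question: prepend '00' to three-character zips
conn.execute(
    "UPDATE meat_poultry_egg_inspect "
    "SET zip = '00' || zip "
    "WHERE st IN ('PR', 'VI') AND length(zip) = 3"
)

result = conn.execute(
    "SELECT st, zip FROM meat_poultry_egg_inspect ORDER BY rowid"
).fetchall()
print(result)
```

Only the rows for PR and VI with a three-character zip are changed; the other rows keep their original value.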

SAP HANA SQL SUBSTR_REGEXPR Match Aggregation

I am using HANA and am trying to create a new column based on the following:
Regex Example 1: SUBSTR_REGEXPR('([PpSs][Tt][Ss]?\w?\d{2,6})' in "TEXT") as "Location"
How can I get this to return all results instead of just the first? Is it a string agg of this expression repeated? There would be at most 6 matches in each text field (per row).
Regex Example 1 Current Output:
Row | Text | Location (new column)
1 | msdfmsfmdf PT2222, ST 43434 asdasdas | PT2222
Regex Example 1 Desired Output:
Row | Text | Location (new column)
1 | msdfmsfmdf PT2222, ST 43434 asdasdas | PT2222, ST43434
I also have varying formats so I need to be able to use multiple variations of that regex to be able to capture all matches and put them into the new "Location" column as a delimited aggregation. Is this possible?
One of the other variations is where I would need to pull the numbers from this series:
"Locations 1, 2, 35 & 5 lkfaskjdlsaf .282 lkfdsklfjlkdsj 002"
So far I have:
Regex Example 2: "Locations (\d{1,2}.?){1,5}"
but I know that is not working. When I remove the "Locations" it picks up the numbers but also picks up the .282 and 002 which I do not want.
Regex Example 2 Current Output:
Row | Text | Location (new column)
1 | msdfmsfmdf Locations 3,5,7 & 9" asdasdas | Locations 3
Regex Example 2 Desired Output:
Row | Text | Location (new column)
1 | msdfmsfmdf Locations 3,5,7 & 9" asdasdas | 3,5,7,9
Sometimes the "Location" in the text field is in the format that requires Example 1's regex and sometimes it is in the format that requires Example 2's regex, so I need the regex to search for both possible formats.
Example 3 Regex in Select Statement:
Select "Primary Key",
"Text",
STRING_AGG(SUBSTR_REGEXPR('([PpSs][Tt][Ss]?\w?\d{2,6})' OR '(\d{1,2}.?){1,5})' in "Text" ),',') as "Location"
FROM Table
Needs to capture both example 1 and 2 location formats using some sort of OR condition in the create column SQL
Regex Example 3 Current Output:
Not working, no output
Regex Example 3 Desired Output:
Row | Text | Location (new column)
1 | msdfmsfmdf Locations 3,5,7 & 9" asdasdas | 3,5,7,9
2 | msdfmsfmdf PT2222, ST 43434 asdasdas | PT2222, ST43434
Other Tools I have access to are SAS and python. Any alternate recommendations to simplify the process are welcome. I did already try in Tableau but same problem with only returning the first match. Aggregating them makes the calculation super slow and very long.
Please help me figure this out. Any help is much appreciated.
Thanks.
For a single input string, the following script can be used.
The article "Use of SubStr_RegExpr with Series_Generate_Integer to split string using SQLScript in HANA" describes how the series-generation function is used here.
declare pString nvarchar(5000);
pString := 'msdfmsfmdf PT2222, ST 43434 asdasdas';
select
STRING_AGG(SUBSTR_REGEXPR( '([PpSs][Tt][Ss]?\w?\d{2,6})' IN Replace(pString,' ','') OCCURRENCE NT.Element_Number GROUP 1),',') as "Location"
from
DUMMY as SplitString,
SERIES_GENERATE_INTEGER(1, 0, 10 ) as NT;
Output will return as PT2222,ST43434
Thanks for adding the necessary requirement examples. This makes it a lot easier to work through the problem.
In this case, your requirement is to match multiple strings against multiple patterns and to apply multiple formatting operations on the output.
This cannot be done in a single regular expression in SAP HANA.
Basically, SAP HANA SQL allows two kinds of regex operations:
Match against a pattern and return one occurrence
Match against a pattern and replace one or ALL occurrences of this match
That means for this transformation we basically can try to remove everything that does not match the pattern or loop over the input string and pick out everything that matches.
The problem with the remove approach (e.g. using REPLACE_REGEXPR()) is that the patterns are not guaranteed not to overlap. That means we could remove matches for other patterns in the process.
Instead, I would use the first kind of operation (match and return one occurrence) in a loop, picking out all matches against all patterns and returning those.
For that a scalar user-defined function can be created like this:
drop function extract_locators;
create function extract_locators(IN input_text NVARCHAR(1000))
returns location_text NVARCHAR(1000)
as
begin
    declare matchers NVARCHAR(100) ARRAY;
    declare part_res NVARCHAR(100) := '';
    declare full_res NVARCHAR (2000) := '';
    declare occn integer;
    declare curr_matcher integer;
    -- setting up matchers
    matchers[1] := '(PT\s*[[:digit:]]+)|(ST\s*[[:digit:]]+)'; -- matches PTxxxx, pt xxxx , St ... , STxxxx
    matchers[2] := '(?>\s)[1-9][0-9]*'; -- matches 21, 1, 23, 34
    curr_matcher :=0;
    -- loop over all matchers
    while (:curr_matcher < cardinality(:matchers)) do
        curr_matcher := :curr_matcher + 1;
        -- loop over all occurrences
        occn := 1;
        part_res := '';
        while (:part_res IS NOT NULL) do
            part_res := SUBSTR_REGEXPR(:matchers[:curr_matcher]
                                       FLAG 'i'
                                       IN :input_text
                                       OCCURRENCE :occn);
            if (:part_res IS NOT NULL) then
                occn := :occn + 1;
                full_res := :full_res
                            || MAP(LENGTH(:full_res), 0, '', ',')
                            || IFNULL(:part_res, '');
            else
                BREAK;
            end if;
        end while; -- occurrences
        -- if current matcher matched, don't apply the others
        if (:full_res !='') then
            BREAK;
        end if;
    end while; -- matchers
    -- remove spaces
    location_text := replace (:full_res, ' ', '');
end;
With your test data in a table like the following:
drop table loc_data;
create column table loc_data ("CASE" integer primary key,
"INPUT_TEXT" NVARCHAR(2000));
-- PT and ST
insert into loc_data values (1, 'msdfmsfmdf PT2222, ST 43434 asdasdas');
-- Locations
insert into loc_data values (2, 'Locations 1, 2, 35 & 5 lkfaskjdlsaf .282 lkfdsklfjlkdsj 002');
You can now simply run
select
*
, extract_locators("INPUT_TEXT") as location_text
from
loc_data;
To get
1 | msdfmsfmdf PT2222, ST 43434 asdasdas | PT2222,ST43434
2 | Locations 1, 2, 35 & 5 lkfaskjdlsaf .282 lkfdsklfjlkdsj 002 | 1,2,35,5
This approach also allows for keeping the matching rules in a separate table and use a cursor (instead of the array) to loop over them. In addition to that, it keeps the single regular expressions rather small and relatively easy to understand, which is probably the biggest benefit here.
The runtime performance obviously can be an issue, therefore I would probably try and save the results of the operation and only run the function when the data changes.
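Since the question mentions that Python is also available, the same idea (try each pattern in turn, collect every occurrence, stop at the first pattern that matches anything) can be sketched with the standard `re` module. This is an illustrative translation rather than a drop-in replacement: the second pattern is written as `\s[1-9][0-9]*` instead of the atomic group used above, which is assumed to give the same result here because spaces are stripped afterwards.

```python
import re

# Patterns tried in order; the first one that matches anything wins.
MATCHERS = [
    re.compile(r"PT\s*\d+|ST\s*\d+", re.IGNORECASE),  # PTxxxx, pt xxxx, ST xxxx ...
    re.compile(r"\s[1-9][0-9]*"),                     # numbers preceded by whitespace
]


def extract_locators(input_text):
    """Return all matches of the first matching pattern, comma-separated."""
    for matcher in MATCHERS:
        found = [m.group(0) for m in matcher.finditer(input_text)]
        if found:
            # Join the occurrences, then remove spaces as the SQL function does
            return ",".join(found).replace(" ", "")
    return ""
```

With the two test strings used above, `extract_locators` returns `PT2222,ST43434` and `1,2,35,5` respectively.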

Remove sub string from a column's text

I have the following two columns in a Postgres table:
name | last_name
----------------
AA | AA aa
BBB | BBB bbbb
.... | .....
.... | .....
How can I update the last_name by removing name text from it?
The final output should look like this:
name | last_name
----------------
AA | aa
BBB | bbbb
.... | .....
.... | .....
UPDATE table SET last_name = regexp_replace(last_name, '^' || name || ' ', '');
This only removes one copy from the beginning of the column and correctly removes the trailing space.
Edit
I'm using a regular expression here. '^' || name || ' ' builds the regular expression, so with the 'Davis McDavis' example, it builds the regular expression '^Davis '. The ^ causes the regular expression to be anchored to the beginning of the string, so it's going to match the word 'Davis' followed by a space only at the beginning of the string it is replacing in, which is the last_name column.
You could achieve the same effect without regular expressions like this:
UPDATE table SET last_name = substr(last_name, length(name) + 2);
You need to add two to the length to create the offset because substr is one-based (+1) and you want to include the space (+1). However, I prefer the regular expression solution even though it probably performs worse because I find it somewhat more self-documenting. It has the additional advantage that it is idempotent: if you run it again on the database it won't have any effect. The substr/offset method is not idempotent; if you run it again, it will eat more characters off your last name.
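The idempotency difference can be illustrated with a short Python sketch of both variants. The function names are hypothetical, and `re.escape` is an addition that the SQL version above does not have (it keeps special characters in the name from being interpreted as regex syntax).

```python
import re


def strip_with_regex(name, last_name):
    """Remove 'name ' only when it sits at the very start of last_name."""
    return re.sub("^" + re.escape(name) + " ", "", last_name)


def strip_with_offset(name, last_name):
    """Model of substr(last_name, length(name) + 2): drop a fixed-size prefix."""
    return last_name[len(name) + 1:]


once_regex = strip_with_regex("AA", "AA aa")
twice_regex = strip_with_regex("AA", once_regex)

once_offset = strip_with_offset("AA", "AA aa")
twice_offset = strip_with_offset("AA", once_offset)
```

Here `once_regex` and `twice_regex` are both `"aa"`, while the offset variant gives `"aa"` on the first run and an empty string on the second, because it cuts off three characters regardless of what they are.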
Not sure about syntax, but try this:
UPDATE table
SET last_name = TRIM(REPLACE(last_name,name,''))
I suggest checking the result first with a SELECT:
SELECT REPLACE(last_name,name,'') FROM table
You need the replace function; see http://www.postgresql.org/docs/8.1/static/functions-string.html
UPDATE table SET last_name = REPLACE(last_name,name,'')

Select all table entries which have a fully capitalized string in a specific column?

I have a database table with a few thousand entries. Part of the entries (~20%) have been entered with fully capitalized strings in the 'name' column.
Example:
id | name
---------
1 | THOMAS GOLDENBERG
2 | Henry Samuel
3 | GIL DOFT
4 | HARRY CRAFT
5 | Susan Etwall
6 | Carl Cooper
What would an SQL query look like that selects all entries with a fully capitalized string in the name column? (i.e. in the example: those with the IDs 1, 3 and 4)
In MySQL it would be:
SELECT id FROM table WHERE name = UPPER(name);
I think this would work the same way in SQL Server, DB2 and Postgres.
What database system?
In theory you can do a simple SELECT ... WHERE name = UPPER(name); but that does not always work. Depending on the collation of your data, you may find that all records satisfy this condition, because the comparison used may be case-insensitive.
You need to ensure you compare using a case sensitive collation, and the correct answer depends on the database platform you use. For example, using SQL Server syntax:
SELECT ... WHERE Name COLLATE Latin1_General_100_CS_AS = UPPER(Name);
This also works in MySQL, provided that you use a collation name that is valid in MySQL.
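The effect of the collation can be reproduced in a small, self-contained way with SQLite from Python. This is only an analogy: SQLite compares text case-sensitively by default and its built-in NOCASE collation plays the role of a case-insensitive collation here, whereas SQL Server and MySQL use their own collation names. The table name `people` is an assumption; the rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO people (id, name) VALUES (?, ?)",
    [
        (1, "THOMAS GOLDENBERG"),
        (2, "Henry Samuel"),
        (3, "GIL DOFT"),
        (4, "HARRY CRAFT"),
        (5, "Susan Etwall"),
        (6, "Carl Cooper"),
    ],
)

# Case-sensitive comparison (SQLite's default BINARY collation)
case_sensitive = [row[0] for row in conn.execute(
    "SELECT id FROM people WHERE name = UPPER(name) ORDER BY id"
)]

# Case-insensitive comparison: every row now satisfies the condition
case_insensitive = [row[0] for row in conn.execute(
    "SELECT id FROM people WHERE name COLLATE NOCASE = UPPER(name) ORDER BY id"
)]

print(case_sensitive, case_insensitive)
```

With the case-sensitive comparison only IDs 1, 3 and 4 are returned; with the case-insensitive one all six rows match, which is the pitfall described above.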
select * from your_table where name = upper(name)
Here's a MySQL function to convert uppercase to title case:
example:
update your_table set name = tcase(name) where name = upper(name);
function:
DELIMITER $$
CREATE FUNCTION `tcase`(str text) RETURNS text CHARSET latin1
DETERMINISTIC
BEGIN
  DECLARE result TEXT default '';
  DECLARE space INT default 0;
  DECLARE last_space INT default 0;
  IF (str IS NULL) THEN
    RETURN NULL;
  END IF;
  IF (char_length(str) = 0) THEN
    RETURN '';
  END IF;
  -- the first character is always upper-cased
  SET result = upper(left(str,1));
  SET space = locate(' ', str);
  WHILE space > 0 DO
    -- lower-case the rest of the current word (up to and including the space)
    SET result = CONCAT(result, LOWER(SUBSTRING(str, last_space+2, space-last_space-1)));
    -- upper-case the first character of the next word
    SET result = CONCAT(result, UPPER(SUBSTRING(str, space+1, 1)));
    SET last_space = space;
    SET space = locate(' ', str, space+2);
  END WHILE;
  -- lower-case the remainder of the last word
  SET result = CONCAT(result, LOWER(SUBSTRING(str, last_space+2)));
  RETURN result;
END $$
DELIMITER ;
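For comparison, the intended transformation (first letter of each space-separated word upper-cased, the remaining letters lower-cased) can be sketched in a few lines of Python. The function name `title_case` is hypothetical and the sketch only models the intent; it is not a translation of the MySQL function line by line.

```python
def title_case(text):
    """Upper-case the first letter of each space-separated word, lower-case the rest."""
    if text is None:
        return None
    # Splitting on a single space keeps runs of spaces intact when re-joining
    return " ".join(word[:1].upper() + word[1:].lower() for word in text.split(" "))
```

For example, `title_case("THOMAS GOLDENBERG")` gives `"Thomas Goldenberg"`.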