Split String and Mix with another string in redshift - sql

I have two columns one is the description and its corresponding tags in another column is a redshift table
description | Tags
John Plays Football | name, Verb , object
I want the output as description with tag
John name plays verb football object
The one more addition for this is where ever there is a description which contains colon ( : ) I would like to separate the words without removing :
For eg:
description | Tags
Des:John Plays Football | constant,name, Verb, object
Output
Des: constant John name plays verb football object
also need exclude : rule for numbers on both sides of colon to make sure time is getting (eg: 10:10) separated
I have no clue where I will start only

You have create UDF in Python to do this .
Sample code
create function f_py_conc (a srting, b srting)
returns srting
stable
as $$
c=a.split(" ")
d=b.split(",")
e =""
for i in range(len(c))
d=c[i]+" " +d[i]
return e
$$ language plpythonu;
Note: Python is dependent on space so may be this code can not run as it may need some change..

you can actually achieve this using Redshift functions, specifically the split part method. I am not sure if this will address the : as well but give this a go and then comment back with what you need to do next.
The idea is building a concatenated field assembling all the distinct parts of both fields. You can do this as follows:
select description, tags,
SPLIT_PART(description, ' ', 1) + ' ' + SPLIT_PART(tags, ',', 1) + ' ' +
SPLIT_PART(description, ' ', 2) + ' ' + SPLIT_PART(tags, ',', 2) + ' ' +
SPLIT_PART(description, ' ', 3) + ' ' + SPLIT_PART(tags, ',', 3)
as description_with_tag
from table_name
Find out more about this function here

Related

Concatenating columns in SQL with delimiter. If first column is null, do not want delimiter to show up after it. How do I get rid of that?

SELECT TOP 1000 places.*
, home_Desc
, CONCAT_WS(' - ', COALESCE(brand, ' '), home_Desc, location_Desc) AS homeSite
Example:
brand
home_Desc
location_Desc
blue
large
woody
NULL
small
forest
So right now I am getting:
1.' blue - large - woody '
2. ' - small - forest '
But what I want for the second set is:
small - forest
CREATE TABLE TEST(brand VARCHAR(100), home_Desc VARCHAR(100), location_Desc VARCHAR(100))
INSERT INTO TEST VALUES ('blue','large','woody'),
(NULL,'small','forest')
SELECT CONCAT_WS(' - ',brand, home_Desc, location_Desc) FROM TEST
As Squirrel say, you must replace COALESCE(brand, '') to brand
Taking advantage of the rule a non-NULL value (the dash character in this instance) concatenated with a NULL value (branch, in this instance) would yield NULL (and assuming that the other two can never be null/blank:
, COALESCE(brand + '-', ' ') + home_Desc + ' - ' + location_Desc AS homeSite

Using Upper to Capitalize the first letter of City name

I am doing some data clean-up and need to Capitalize the first letter of City names. How do I capitalize the second word in a City Like Terra Bella.
SELECT UPPER(LEFT([MAIL CITY],1))+
LOWER(SUBSTRING([MAIL CITY],2,LEN([MAILCITY])))
FROM masterfeelisting
My results is this 'Terra bella' and I need 'Terra Bella'. Thanks in advance.
Ok, I know I answered this before, but it bugged me that we couldn't write something efficient to handle an unknown amount of 'text segments'.
So re-thinking it and researching, I discovered a way to change the [MAILCITY] field into XML nodes where each 'text segment' is assigned it's own Node within the xml field. Then those xml fields can be processed node by node, concatenated together, and then changed back to a SQL varchar. It's convoluted, but it works. :)
Here's the code:
CREATE TABLE
#masterfeelisting (
[MAILCITY] varchar(max) not null
);
INSERT INTO #masterfeelisting VALUES
('terra bellA')
,(' terrA novA ')
,('chicagO ')
,('bostoN')
,('porT dE sanTo')
,(' porT dE sanTo pallo ');
SELECT
RTRIM
(
(SELECT
UPPER([xmlField].[xmlNode].value('.', 'char(1)')) +
LOWER(STUFF([xmlField].[xmlNode].value('.', 'varchar(max)'), 1, 1, '')) + ' '
FROM [xmlNodeRecordSet].[nodeField].nodes('/N') as [xmlField]([xmlNode]) FOR
xml path(''), type
).value('.', 'varchar(max)')
) as [MAILCITY]
FROM
(SELECT
CAST('<N>' + REPLACE([MAILCITY],' ','</N><N>')+'</N>' as xml) as [nodeField]
FROM #masterfeelisting
) as [xmlNodeRecordSet];
Drop table #masterfeelisting;
First I create a table and fill it with dummy values.
Now here is the beauty of the code:
For each record in #masterfeelisting, we are going to create an xml field with a node for each 'text segment'.
ie. '<N></N><N>terrA</N><N>novA</N><N></N>'
(This is built from the varchar ' terrA novA ')
1) The way this is done is by using the REPLACE function.
The string starts with a '<N>' to designate the beginning of the node. Then:
REPLACE([MAILCITY],' ','</N><N>')
This effectively goes through the whole [MAILCITY] string and replaces each
' ' with '</N><N>'
and then the string ends with a '</N>'. Where '</N>' designates the end of each node.
So now we have a beautiful XML string with a couple of empty nodes and the 'text segments' nicely nestled in their own node. All the 'spaces' have been removed.
2) Then we have to CAST the string into xml. And we will name that field [nodeField]. Now we can use xml functions on our newly created record set. (Conveniently named [xmlNodeRecordSet].)
3) Now we can read the [xmlNodeRecordSet] into the main sub-Select by stating:
FROM [xmlNodeRecordSet].[nodeField].nodes('/N')
This tells us we are reading the [nodeField] as nodes with a '/N' delimiter.
This table of node fields is then parsed by stating:
as [xmlField]([xmlNode]) FOR xml path(''), type
This means each [xmlField] will be parsed for each [xmlNode] in the xml string.
4) So in the main sub-select:
Each blank node '<N></N>' is discarded. (Or not processed.)
Each node with a 'text segment' in it will be parsed. ie <N>terrA</N>
UPPER([xmlField].[xmlNode].value('.', 'char(1)')) +
This code will grab each node out of the field and take its contents '.' and only grab the first character 'char(1)'. Then it will Upper case that character. (the plus sign at the end means it will concatenate this letter with the next bit of code:
LOWER(STUFF([xmlField].[xmlNode].value('.', 'varchar(max)'), 1, 1, ''))
Now here is the beauty... STUFF is a function that will take a string, from a position, for a length, and substitute another string.
STUFF(string, start position, length, replacement string)
So our string is:
[xmlField].[xmlNode].value('.', 'varchar(max)')
Which grabs the whole string inside the current node since it is 'varchar(max)'.
The start position is 1. The length is 1. And the replacement string is ''. This effectively strips off the first character by replacing it with nothing. So the remaining string is all the other characters that we want to have lower case. So that's what we do... we use LOWER to make them all lower case. And this result is concatenated to our first letter that we already upper cased.
But wait... we are not done yet... we still have to append a + ' '. Which adds a blank space after our nicely capitalized 'text segment'. Just in case there is another 'text segment' after this node is done.
This main sub-Select will now parse each node in our [xmlField] and concatenate them all nicely together.
5) But now that we have one big happy concatenation, we still have to change it back from an xml field to a SQL varchar field. So after the main sub-select we need:
.value('.', 'varchar(max)')
This changes our [MAILCITY] back to a SQL varchar.
6) But hold on... we still are not done. Remember we put an extra space at the end of each 'text segment'??? Well the last 'text segment still has that extra space after it. So we need to Right Trim that space off by using RTRIM.
7) And dont forget to rename the final field back to as [MAILCITY]
8) And that's it. This code will take an unknown amount of 'text segments' and format each one of them. All using the fun of XML and it's node parsers.
Hope that helps :)
Here's one way to handle this using APPLY. Note that this solution supports up to 3 substrings (e.g. "Phoenix", "New York", "New York City") but can easily be updated to handle more.
DECLARE #string varchar(100) = 'nEW yoRk ciTY';
WITH DELIMCOUNT(String, DC) AS
(
SELECT #string, LEN(RTRIM(LTRIM(#string)))-LEN(REPLACE(RTRIM(LTRIM(#string)),' ',''))
),
CIPOS AS
(
SELECT *
FROM DELIMCOUNT
CROSS APPLY (SELECT CHARINDEX(char(32), string, 1)) CI1(CI1)
CROSS APPLY (SELECT CHARINDEX(char(32), string, CI1.CI1+1)) CI2(CI2)
)
SELECT
OldString = #string,
NewString =
CASE DC
WHEN 0 THEN UPPER(SUBSTRING(string,1,1))+LOWER(SUBSTRING(string,2,8000))
WHEN 1 THEN UPPER(SUBSTRING(string,1,1))+LOWER(SUBSTRING(string,2,CI1-1)) +
UPPER(SUBSTRING(string,CI1+1,1))+LOWER(SUBSTRING(string,CI1+2,100))
WHEN 2 THEN UPPER(SUBSTRING(string,1,1))+LOWER(SUBSTRING(string,2,CI1-1)) +
UPPER(SUBSTRING(string,CI1+1,1))+LOWER(SUBSTRING(string,CI1+2,CI2-(CI1+1))) +
UPPER(SUBSTRING(string,CI2+1,1))+LOWER(SUBSTRING(string,CI2+2,100))
END
FROM CIPOS;
Results:
OldString NewString
--------------- --------------
nEW yoRk ciTY New York City
This will only capitalize the first letter of the second word. A shorter but less flexible approach. Replace #str with [Mail City].
DECLARE #str AS VARCHAR(50) = 'Los angelas'
SELECT STUFF(#str, CHARINDEX(' ', #str) + 1, 1, UPPER(SUBSTRING(#str, CHARINDEX(' ', #str) + 1, 1)));
This is a way to use imbedded Selects for three City name parts.
It uses CHARINDEX to find the location of your separator character. (ie a space)
I put an 'if' structure around the Select to test if you have any records with more than 3 parts to the city name. If you ever get the warning message, you could add another sub-Select to handle another city part.
Although... just to be clear... SQL is not the best language to do complicated formatting. It was written as a data retrieval engine with the idea that another program will take that data and massage it into a friendlier look and feel. It may be easier to handle the formatting in another program. But if you insist on using SQL and you need to account for city names with 5 or more parts... you may want to consider using Cursors so you can loop through the variable possibilities. (But Cursors are not a good habit to get into. So don't do that unless you've exhausted all other options.)
Anyway, the following code creates and populates a table so you can test the code and see how it works. Enjoy!
CREATE TABLE
#masterfeelisting (
[MAILCITY] varchar(30) not null
);
Insert into #masterfeelisting select 'terra bella';
Insert into #masterfeelisting select ' terrA novA ';
Insert into #masterfeelisting select 'chicagO ';
Insert into #masterfeelisting select 'bostoN';
Insert into #masterfeelisting select 'porT dE sanTo';
--Insert into #masterfeelisting select ' porT dE sanTo pallo ';
Declare #intSpaceCount as integer;
SELECT #intSpaceCount = max (len(RTRIM(LTRIM([MAILCITY]))) - len(replace([MAILCITY],' ',''))) FROM #masterfeelisting;
if #intSpaceCount > 2
SELECT 'You need to account for more than 3 city name parts ' as Warning, #intSpaceCount as SpacesFound;
else
SELECT
cThird.[MAILCITY1] + cThird.[MAILCITY2] + cThird.[MAILCITY3] as [MAILCITY]
FROM
(SELECT
bSecond.[MAILCITY1] as [MAILCITY1]
,SUBSTRING(bSecond.[MAILCITY2],1,bSecond.[intCol2]) as [MAILCITY2]
,UPPER(SUBSTRING(bSecond.[MAILCITY2],bSecond.[intCol2] + 1, 1)) +
SUBSTRING(bSecond.[MAILCITY2],bSecond.[intCol2] + 2,LEN(bSecond.[MAILCITY2]) - bSecond.[intCol2]) as [MAILCITY3]
FROM
(SELECT
SUBSTRING(aFirst.[MAILCITY],1,aFirst.[intCol1]) as [MAILCITY1]
,UPPER(SUBSTRING(aFirst.[MAILCITY],aFirst.[intCol1] + 1, 1)) +
SUBSTRING(aFirst.[MAILCITY],aFirst.[intCol1] + 2,LEN(aFirst.[MAILCITY]) - aFirst.[intCol1]) as [MAILCITY2]
,CHARINDEX ( ' ', SUBSTRING(aFirst.[MAILCITY],aFirst.[intCol1] + 1, LEN(aFirst.[MAILCITY]) - aFirst.[intCol1]) ) as intCol2
FROM
(SELECT
UPPER (LEFT(RTRIM(LTRIM(mstr.[MAILCITY])),1)) +
LOWER(SUBSTRING(RTRIM(LTRIM(mstr.[MAILCITY])),2,LEN(RTRIM(LTRIM(mstr.[MAILCITY])))-1)) as [MAILCITY]
,CHARINDEX ( ' ', RTRIM(LTRIM(mstr.[MAILCITY]))) as intCol1
FROM
#masterfeelisting as mstr -- Initial Master Table
) as aFirst -- First Select Shell
) as bSecond -- Second Select Shell
) as cThird; -- Third Select Shell
Drop table #masterfeelisting;

Postgresql - remove any whitespace from sub string

How can I remove any white space from substring of string?
For example I have this number '+370 650 12345'. I need all numbers to have this format country_code rest_of_the_number or in that example: +370 65012345. How could you achieve that with PostgreSQL?
I could use trim() function, but then it would remove all whitespace.
Assuming the column is named phone_number:
left(phone_number, strpos(phone_number, ' '))
||regexp_replace(substr(phone_number, strpos(phone_number, ' ') + 1), ' ', '', 'g')
It first takes everything up to the first space and then concatenates it with the result of replacing all spaces from the rest of the string.
If you also need to deal with other whitespace than just a space, you could use '\s' for the search value in regexp_replace()
If you are able to assume that a country code will always be present, you could try using a regular expression to capture the parts of interest. Assuming that your phone numbers are stored in a column named content in a table named numbers, you could try something like the following:
SELECT parts[1] || ' ' || parts[2] || parts[3]
FROM (
SELECT
regexp_matches(content, E'^\\s*(\\+\\d+)\\s+(\\d+)\\s+(\\d+)\\s*$') AS parts
FROM numbers
) t;
The following will work even if the country code is absent (see SQL Fiddle Demo here):
SELECT TRIM(REPLACE(REPLACE(REGEXP_REPLACE('+370 650 12345', '^((\+\d+)\s+)?(.*)$', '\1|\3'), ' ', ''), '|', ' '));
Returns: +370 65012345
SELECT TRIM(REPLACE(REPLACE(REGEXP_REPLACE('370 650 12345', '^((\+\d+)\s+)?(.*)$', '\1|\3'), ' ', ''), '|', ' '));
Returns: 37065012345
It looks for a country code (a set of numbers starting with a + sign) at the beginning, and replaces any whitespace following that code with a pipe |. It then replaces all the spaces in the resulting string with the empty string, then replaces occurrences of | with spaces! The choice of | is arbitrary, I suppose it could be any non-digit character.

How to glue 3 rows in one string?

In table pics I have 3 different rows, where surname, name and middle name are stored.
In query I need to glue them to one string:
pics.e_family + ' ' + pics.e_name + ' ' + pics.e_sname AS fio
All works perfectly, but one entry don't have middle name at all (no mistake, its really so). And in fio I get NULL. Is there any possibility to modify code, or make additional check if one or more of the rows are null, replace it with space symbol or just show remain rows?
If you mean columns instead of rows, you can use COALESCE or ISNULL:
pics.e_family + ' ' + COALESCE(pics.e_name, '') + ' ' + pics.e_sname AS fio

Remove inner white spaces in text data with SQL

All I have what seems to be a pretty straight forward question, that I haven't been able to figure out. For example, if I have a text string like T6L 7H5. Using SQL I need to remove the inner white space from this string so that it displays like T6L7H5.
Things to consider:
Teradata (v.13.10) is my RDBMS, so REPLACE('T6L 7H5', ' ', '') is not an
option here.
On this particular server I am a business user w/ very limited
permissions so creating a UDF is not an option either.
Can't test this, so spit-balling, but you should be able to leverage Substring and Position:
SELECT SUBSTRING('T6L 7H5', 1, POSITION (' ' IN 'T6L 7H5')-1) || SUBSTRING('T6L 7H5', POSITION (' ' IN 'T6L 7H5')+1, CHARACTER_LENGTH('T6L 7H5') - POSITION (' ' IN 'T6L 7H5') )
If the field is consistently formatted like your example then:
substring('T6L 7H5',1,3)||substring('T6L 7H5',4,3)
For single white space you can use POSITION() and SUBSTRING(). You have to offset for the whitespace location that is returned by the POSITION() function.
WITH CTE(FieldName) AS
(SELECT 'TB7 TCH' AS FieldName)
SELECT SUBSTRING(FieldName FROM 1 FOR (POSITION(' ' IN FieldName) - 1))
|| SUBSTRING(FieldName FROM (POSITION(' ' IN FieldName) + 1))
FROM CTE;
try using substr( );
Ex : 4= start at position 4 && 1 = delete one char.
substr(Table1.fieldname, 4,1)