TERADATA - How to split a character column and keep the last token? - sql

I have a table with article names and I would like to select the last word of each article of the table.
Right now I'm doing it in SAS, and my code looks like this:
PROC SQL;
CREATE TABLE last_word as
SELECT scan(names,-1) as last_w
FROM articles;
QUIT;
I am aware of the STRTOK function in Teradata, but it seems to accept only positive values as indexes, and in my case the article names don't have a constant number of words.

You could use the REGEXP_SUBSTR function to do this:
CREATE TABLE last_word as
SELECT REGEXP_SUBSTR(names, '[^,]+$') as last_w
FROM articles;
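The regex '[^,]+$' matches the final run of non-comma characters, so it grabs the last element of a comma-delimited list. If the names are separated by spaces rather than commas, use '[^ ]+$' instead.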
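To sanity-check the pattern outside the database, here is a minimal Python sketch of the same idea. The helper name last_token and the sample strings are made up for illustration, and Python's re module is only a stand-in for Teradata's REGEXP_SUBSTR, whose regex dialect may differ in details.

```python
import re

def last_token(text, delimiter=","):
    """Return the final run of non-delimiter characters, or None if there is none."""
    # Build the equivalent of '[^,]+$' for the chosen single-character delimiter.
    pattern = "[^" + re.escape(delimiter) + "]+$"
    match = re.search(pattern, text)
    return match.group(0) if match else None

# Hypothetical sample values, one comma-delimited and one space-delimited.
comma_example = last_token("alpha,beta,gamma")
space_example = last_token("red apple pie", delimiter=" ")
```

Note that a string ending in the delimiter (for example "a,b,") yields no match at all, so the column value would be NULL in the SQL version as well.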

Related

How do you query a table filtering on a substring of one of the columns?

I have a table I wish to query. It has a string variable called comment, which contains an ID along with other things (e.g. "123456;varA;varB").
rowNo | comment
1     | "123456;varA;varB"
2     | "987654;varA;varB"
I want to filter based on the first substring in the comment variable.
That is, I want to keep only the rows where the first substring of comment is "123456" (in the example, that would return the first row).
How do I do this?
I was thinking something along the lines of the code below, using the "string_split" function, but it doesn't work.
SELECT *,
FROM table
WHERE (SELECT value FROM STRING_SPLIT(comment,';',1)="123456")
Does anyone have any ideas?
Note, I am querying in SQL in SAS, and this is on a large dataset, so I don't want to create a new table with a new column to then query on instead. Ideally I'd want to query on the existing table directly.
You can use the SCAN() function to parse a string.
WHERE '123456'=scan(comment,1,';')
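As a rough illustration of what that WHERE clause does, here is a small Python sketch. The function first_token is a hypothetical helper that mimics scan(comment,1,';') only approximately: like SCAN with default modifiers it skips empty pieces caused by leading or repeated delimiters, but it does not reproduce every SCAN option.

```python
def first_token(text, delimiter=";"):
    """Return the first non-empty piece of text, similar in spirit to scan(text, 1, ';')."""
    for piece in text.split(delimiter):
        if piece:
            return piece
    return ""  # no token found

# The two sample rows from the question, as (rowNo, comment) pairs.
rows = [(1, "123456;varA;varB"), (2, "987654;varA;varB")]

# Keep only the rows whose first token equals the ID we are looking for.
matching = [row for row in rows if first_token(row[1]) == "123456"]
```

The filter runs against the existing rows directly, which mirrors the goal of not materialising a new column first.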

Match specific string format - REGEXP_CONTAINS - GBQ Language

I am trying to write a query that will only match table names that match a specific format, that format being as follows: FirstWord1_SecondWord2_ThirdWord3.
So all I am trying to get are table names that match the format of three alphanumeric words separated by underscores.
I've been struggling to work out the exact way to use REGEXP_CONTAINS to get the results I want. Below is the closest I've been able to get, but it just won't return any results, even though I know there are tables that match the format I want to query for.
SELECT table_name as tablenames
FROM project.dataset.INFORMATION_SCHEMA.TABLES
WHERE (
REGEXP_CONTAINS(table_name, '^([[:alnum:]]+_[[:alnum:]]+_[[:alnum:]])$')
)
Any assistance with this would be greatly appreciated!
Your last [[:alnum:]] is missing a + to indicate 1 or more matched characters.
SELECT table_name as tablenames
FROM project.dataset.INFORMATION_SCHEMA.TABLES
WHERE (
REGEXP_CONTAINS(table_name, '^([[:alnum:]]+_[[:alnum:]]+_[[:alnum:]]+)$')
)
or
SELECT table_name as tablenames
FROM project.dataset.INFORMATION_SCHEMA.TABLES
WHERE (
REGEXP_CONTAINS(table_name, '^[[:alnum:]]+_[[:alnum:]]+_[[:alnum:]]+$')
)
Let me know if this works for you.
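To see why the missing + matters, the two patterns can be compared in a quick Python sketch. This is an approximation: Python's re module does not support the POSIX class [[:alnum:]], so [A-Za-z0-9] is substituted here, and the names BROKEN, FIXED and matches are made up for the example.

```python
import re

# [A-Za-z0-9] stands in for [[:alnum:]], which Python's re module does not support.
BROKEN = re.compile(r"^[A-Za-z0-9]+_[A-Za-z0-9]+_[A-Za-z0-9]$")   # last word: exactly 1 char
FIXED = re.compile(r"^[A-Za-z0-9]+_[A-Za-z0-9]+_[A-Za-z0-9]+$")   # last word: 1 or more chars

def matches(pattern, name):
    """True if the compiled pattern matches the table name."""
    return pattern.search(name) is not None

name = "FirstWord1_SecondWord2_ThirdWord3"
broken_result = matches(BROKEN, name)
fixed_result = matches(FIXED, name)
```

The original pattern only accepts names whose third word is a single character, such as a_b_c, which is why it returned nothing for real table names.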

How to select column name "startwith" in proc sql query in SAS

I am looking for a way to select all columns whose names start with a specific string. My data contains the same column name multiple times with a number at the end, and I want the code to always select all of those columns regardless of the trailing number.
For example, if I have 3 kinds of apple in my column names, the dataset will contain the columns "apple_1", "apple_2" and "apple_3". I therefore want to select all columns that start with "apple_" in a proc sql statement.
Thank you
In regular SAS code you can use : as a wildcard to create a variable list. You normally cannot use variable lists in SQL code, but you can use them in dataset options.
proc sql ;
create table want as
select *
from mydata(keep= id apple_: )
;
quit;
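Use LIKE. Note that this filters rows by the value of a column (here col); it does not select columns by name: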
proc sql;
select t.*
from t
where col like 'apple%';
If you want the _ character as well, you need to use the ESCAPE clause, because _ is a wildcard character for LIKE:
proc sql;
select t.*
from t
where col like 'apple$_%' escape '$';
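The effect of the ESCAPE clause can be checked with a small sketch. It uses Python's built-in sqlite3 module purely as a stand-in for PROC SQL (SQLite also supports LIKE ... ESCAPE); the table t, the column col and the sample values are assumptions for illustration.

```python
import sqlite3

def like_matches(like_clause):
    """Return the sample values that satisfy 'col LIKE <like_clause>', sorted."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (col TEXT)")
    conn.executemany(
        "INSERT INTO t VALUES (?)",
        [("apple_1",), ("apple_2",), ("applesauce",), ("apple",)],
    )
    # The clause is concatenated only because this is a fixed demo string.
    query = "SELECT col FROM t WHERE col LIKE " + like_clause + " ORDER BY col"
    rows = conn.execute(query).fetchall()
    conn.close()
    return [r[0] for r in rows]

# Unescaped: _ is a wildcard for any single character, so 'applesauce' matches too.
unescaped = like_matches("'apple_%'")
# Escaped: $_ means a literal underscore.
escaped = like_matches("'apple$_%' ESCAPE '$'")
```

Without the escape, the underscore silently matches the "s" in "applesauce", which is the behaviour the ESCAPE clause is there to prevent.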

SQL - just view the description for explanation

I would like to ask if it is possible to do this:
For example, the search string is '009' (treat the digits as a string).
Is it possible to write a query that returns every occurrence of these characters in the database, regardless of their order?
For this example it would return
'009'
'090'
'900'
provided these exist in the database. Thanks!
Use the LIKE operator, repeating the column name for each pattern.
For example:
SELECT Marks FROM Report WHERE Marks LIKE '%009%' OR Marks LIKE '%090%' OR Marks LIKE '%900%'
Split the string into individual characters, select all rows containing the first character and put them in a temporary table, then select all rows from the temporary table that contain the second character and put these in a temporary table, then select all rows from that temporary table that contain the third character.
Of course, there are probably many ways to optimize this, but I see no reason why it would not be possible to make a query like that work.
This cannot be achieved in a straightforward way, because there is no sort() function that sorts the characters within a value the way lower() and upper() change its case.
But there are some workarounds, for example:
Suppose you are querying COL A. Maintain another column, SORTED_A, in which the application stores the value of COL A with its characters sorted.
Then, when you execute the query, sort the characters of the search token and select the rows where SORTED_A equals the sorted token.
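The sorted-column workaround can be sketched in Python as follows. Everything here is illustrative: the sample values, the names sort_key and find_permutations, and the dictionary standing in for the SORTED_A column are assumptions, not part of any real schema.

```python
def sort_key(value):
    """Canonical form of a string: its characters in sorted order."""
    return "".join(sorted(value))

# Hypothetical contents of COL A.
stored = ["009", "090", "900", "019", "0090"]

# Stand-in for the precomputed SORTED_A column, maintained by the application.
sorted_column = {value: sort_key(value) for value in stored}

def find_permutations(search_token):
    """Return stored values whose characters are a reordering of search_token."""
    key = sort_key(search_token)
    return [value for value, k in sorted_column.items() if k == key]

result = find_permutations("009")
```

Unlike the LIKE '%...%' approach, this matches whole values only, so a longer value such as "0090" is not returned.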

Sorting Table Variables by Prefix/Starting Letter

This is for a SAS table, so SQL commands would work, as well.
I have a table with 300 variables; they have 5 different prefixes, which I would like to sort them by. I want them in a particular order (mtr prefix before date prefix), but alphabetical would be acceptable.
I was thinking SQL would have something along the lines of:
Select mtr*, date* from Table
or
Select mtr%, date% from Table
As gbn says, you'll need to get the column names and dynamically build some sql (or data step code).
Here's a solution that retrieves the column names from an automatic SAS view that holds metadata about your session, ordered alphabetically, into a single macro variable which you can then use later in your code:
proc sql noprint;
select name into :orderedVarNames separated by ','
from sashelp.vcolumn
where libname='WORK' and memname='YOUR_TABLE_NAME'
order by name
;
quit;
(Obviously you'll need to replace the quoted values with the correct libname and table name for your table.) Then you can use this macro variable in another step, like this:
proc sql;
select &orderedVarNames
from YOUR_TABLE_NAME
;
quit;
Here, "&orderedVarNames" is resolved to the list of column names. You can check what is in the variable by putting it out to the log thus: %put &orderedVarNames;
There are other ways to do what you're thinking of, but this is probably the quickest and will work for any table. If you were going to use this technique for a variable list in a data step, change the separator to separated by ' '.
Once you've got the hang of this, you could then tailor the solution to get the exact order you want by generating more than one macro variable and filtering what you're retrieving from sashelp.vcolumn. Something like this:
proc sql noprint;
select name into :orderedMTRvars separated by ','
from sashelp.vcolumn
where libname='WORK' and memname='MYTABLE' and upcase(substr(name,1,3))='MTR'
order by name
;
select name into :orderedDATEvars separated by ','
from sashelp.vcolumn
where libname='WORK' and memname='MYTABLE' and upcase(substr(name,1,4))='DATE'
order by name
;
quit;
proc sql;
select &orderedMTRVars, &orderedDATEVars
from MYTABLE
;
quit;
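The ordering logic behind the two macro variables can be illustrated with a short Python sketch. The function order_by_prefix and the sample column names are hypothetical, and the sketch assumes the prefixes do not overlap; it only shows how a select list in the desired order could be assembled from a list of column names.

```python
def order_by_prefix(names, prefixes):
    """For each prefix in turn, collect the names starting with it
    (case-insensitive), sorted alphabetically within the group."""
    ordered = []
    for prefix in prefixes:
        group = [n for n in names if n.upper().startswith(prefix.upper())]
        ordered.extend(sorted(group, key=str.upper))
    return ordered

# Hypothetical column names; "id" matches neither prefix and is left out.
columns = ["date_end", "mtr_2", "id", "date_start", "mtr_1"]

# Comma-separated list, analogous to "&orderedMTRVars, &orderedDATEVars".
select_list = ", ".join(order_by_prefix(columns, ["mtr", "date"]))
```

Columns that match none of the prefixes are dropped here, just as they are omitted from the macro variables above unless another query picks them up.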