Remove all non-numeric characters from string DAX - ssas

Is it possible to remove all non-numeric characters from a String of varying length in DAX in a calculated column?
The model is an ssas-tabular model, so I'm unable to use M language to clean the data at query time.
I also want to remove leading zeros, but not trailing ones.
Consider the following input in a column:
C1
C2
C3
6F
2F
5Z
05F
A10
And this is the output that I'm looking for:
1
2
3
6
2
5
5
10

If the column is known width, and not too wide, you could use this approach. Let's assume that your column of mixed letters and numbers is called [Letters And Numbers]:
Just The Numbers = IFERROR(VALUE(MID([Letters And Numbers], 1, 1)), IFERROR(VALUE(MID([Letters And Numbers], 2, 1)), BLANK()))
Of course, this gets more complicated if you expect more than one number in the column.
In that case, it would be very inconvenient to use DAX language to do it; you'd need to write something like:
Just The Numbers = SUBSTITUTE(SUBSTITUTE([Letters And Numbers], "A", ""), "B", "")
except you'd need 26 substitutes, unless you expect more or fewer non-numeric characters.
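Rather than typing 26 nested calls by hand, the expression could be generated. The following is only a sketch in Python (not DAX); the helper name nested_substitute is made up for illustration, and it assumes only the uppercase letters A to Z need stripping. SUBSTITUTE is case-sensitive, so lowercase letters would need their own calls.

```python
import string


def nested_substitute(column: str, chars: str = string.ascii_uppercase) -> str:
    """Build a DAX expression that strips every character in `chars` from `column`."""
    expr = f"[{column}]"
    for ch in chars:
        # Wrap the expression built so far in one more SUBSTITUTE call.
        expr = f'SUBSTITUTE({expr}, "{ch}", "")'
    return expr


dax = nested_substitute("Letters And Numbers")
print("Just The Numbers = " + dax)
```

The printed text can be pasted in as the calculated column definition.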
Better by far to use M in the edit queries section. Add a custom column with the following definition:
= Table.AddColumn(#"Previous Step", "Just The Numbers", each Text.Combine(List.RemoveItems(Text.ToList([Letters And Numbers]),{"A".."z"})))
And if you're unable to use M because you are using direct query to a SSAS tabular model, then the only option is probably to modify the SQL query that loads the table into the tabular model to add an additional column.
There are probably many examples of some T-SQL that could do that, here is one:
USE AdventureWorksDW2012;
WITH split AS (
SELECT AddressLine1, v.number, character.c
FROM DimReseller AS r
JOIN master..spt_values AS v ON v.number BETWEEN 1 AND LEN(r.AddressLine1)
CROSS APPLY (VALUES(SUBSTRING(r.AddressLine1, v.number, 1))) AS character(c)
WHERE v.type = 'P'
AND character.c LIKE '[0-9]')
SELECT AddressLine1,
output = (SELECT c
FROM split
WHERE r.AddressLine1 = split.AddressLine1
ORDER BY number ASC
FOR XML PATH, TYPE).value(N'.[1]', N'bigint')
FROM dbo.DimReseller AS r
GROUP BY AddressLine1;
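Whichever approach you pick, it helps to have a reference for the expected output. Here is a small Python sketch of the transformation the question describes (keep digits only, then drop leading zeros), run over the sample input. The function name digits_only is an illustration, not part of any of the tools above, and a value consisting only of zeros would come back as an empty string.

```python
import re


def digits_only(text: str) -> str:
    """Keep only the digits of `text`, then drop leading zeros."""
    digits = re.sub(r"\D", "", text)  # remove every non-digit character
    return digits.lstrip("0")         # trailing zeros are left alone


samples = ["C1", "C2", "C3", "6F", "2F", "5Z", "05F", "A10"]
cleaned = [digits_only(s) for s in samples]
print(cleaned)
```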

Related

IIF Function returning incorrect calculated values - SQL Server

I am writing a query to show returns of placing each way bets on horse races
There is an issue with the PlaceProfit result. This should show a return if the horse's finishing position is between 1 and 4, and a loss if the position is >= 5.
It does show the correct return if the horse's finishing position is 9th or better, but 10th place and above is being counted as a win.
I include my code below along with the output.
ALTER VIEW EachWayBetting
AS
SELECT a.ID,
RaceDate,
runners,
track.NAME AS Track,
horse.NAME as HorseName,
IndustrySP,
Place AS 'FinishingPosition',
-- // calculates returns on the win & place parts of an each way bet with 1/5 place terms //
IIF(A.Place = '1', 1.0 * (A.IndustrySP-1), '-1') AS WinProfit,
IIF(A.Place <='4', 1.0 * (A.IndustrySP-1)/5, '-1') AS PlaceProfit
FROM dbo.NewRaceResult a
LEFT OUTER JOIN track ON track.ID = A.TrackID
LEFT OUTER JOIN horse ON horse.ID = A.HorseID
WHERE a.Runners > 22
This returns:
As I mention in the comments, the problem is your choice of data type for place, it's varchar. The ordering for a string data type is completely different to that of a numerical data type. Strings are sorted by character from left to right, in the order the characters are ordered in the collation you are using. Numerical data types, however, are ordered from the lowest to highest.
This means that, for a numerical data type, the value 2 has a lower value than 10, however, for a varchar the value '2' has a higher value than '10'. For the varchar that's because the ordering is completed on the first character first. '2' has a higher value than '1' and so '2' has a higher value than '10'.
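The same effect is easy to reproduce outside the database. This is a Python sketch, not T-SQL, and Python compares strings by code point rather than by a SQL collation, but for plain digits the outcome is the same: text comparison lets 10 and 12 through, numeric comparison does not.

```python
places = ["1", "2", "4", "5", "9", "10", "12"]

# Compared as text: "10" and "12" start with "1", which sorts before "4".
as_text = [p for p in places if p <= "4"]

# Compared as numbers: only 1 to 4 qualify.
as_int = [p for p in places if int(p) <= 4]

print(as_text)
print(as_int)
print(sorted(places))           # text ordering
print(sorted(places, key=int))  # numeric ordering
```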
The solution here is simple, fix your design; store numerical data in a numerical data type (int seems appropriate here). You're also breaking Normal Form rules, as you're storing other data in the column; mainly the reason a horse failed to be classified. Such data isn't a "Place" but information on why the horse didn't place, and so should be in a separate column.
You can therefore fix this by first adding a new column, then updating its value to hold the values that aren't numerical while making place contain only numerical data, and finally altering your place column.
ALTER TABLE dbo.YourTable ADD UnClassifiedReason varchar(5) NULL; --Obviously use an appropriate length.
GO
UPDATE dbo.YourTable
SET Place = TRY_CONVERT(int,Place),
UnClassifiedReason = CASE WHEN TRY_CONVERT(int,Place) IS NULL THEN Place END;
GO
ALTER TABLE dbo.YourTable ALTER COLUMN Place int NULL;
GO
If Place does not allow NULL values, you will need to ALTER the column first to allow them.
In addition to fixing the data as Larnu suggests, you should also fix the query:
SELECT nrr.ID, nrr.RaceDate, nrr.runners,
t.NAME AS Track, h.NAME AS HorseName, nrr.IndustrySP,
nrr.Place AS FinishingPosition,
-- // calculates returns on the win & place parts of an each way bet with 1/5 place terms //
(CASE WHEN nrr.Place = 1 THEN (nrr.IndustrySP - 1.0) ELSE -1 END) AS WinProfit,
(CASE WHEN nrr.Place <= 4 THEN (nrr.IndustrySP - 1.0) / 5 ELSE -1 END) AS PlaceProfit
FROM dbo.NewRaceResult nrr LEFT JOIN
track t
ON t.ID = nrr.TrackID LEFT JOIN
horse h
ON h.ID = nrr.HorseID
WHERE nrr.Runners > 22;
The important changes are removing single quotes from numbers and column names. It seems you need to understand the differences among strings, numbers, and identifiers.
Other changes are:
Meaningful table aliases, rather than meaningless letters such as a.
Qualifying all column references, so it is clear where columns are coming from.
Switching from IIF() to CASE. IIF() is specific to SQL Server; CASE is the standard SQL conditional expression (both work fine).
Being sure that the types returned by all branches of the conditional expressions are consistent.
Note: This version will work even if you don't change the type of Place. The strings will be converted to numbers in the appropriate places. I don't advocate relying on such silent conversion, so I recommend fixing the data.
If place can have non-numeric values, then you need to convert them:
(CASE WHEN TRY_CONVERT(int, nrr.Place) = 1 THEN (nrr.IndustrySP - 1.0) ELSE -1 END) AS WinProfit,
(CASE WHEN TRY_CONVERT(int, nrr.Place) <= 4 THEN (nrr.IndustrySP - 1.0) / 5 ELSE -1 END) AS PlaceProfit
But the important point is to fix the data.
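To make the intended behaviour concrete, here is a rough Python sketch of the same logic (again, not T-SQL). try_int stands in for TRY_CONVERT(int, ...), the function names are invented for illustration, and "PU" is just a hypothetical non-numeric value for a horse that did not finish.

```python
from typing import Optional, Tuple


def try_int(value: str) -> Optional[int]:
    """Return the value as an int, or None when it is not numeric."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def each_way_profit(place: str, industry_sp: float) -> Tuple[float, float]:
    """Return (WinProfit, PlaceProfit) for a 1-point each-way bet at 1/5 place terms."""
    pos = try_int(place)
    win = industry_sp - 1.0 if pos == 1 else -1.0
    # A non-numeric place falls through to the losing branch, like NULL in the CASE.
    placed = (industry_sp - 1.0) / 5 if pos is not None and pos <= 4 else -1.0
    return win, placed


for place in ["1", "3", "10", "PU"]:
    print(place, each_way_profit(place, 6.0))
```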

Teradata column not sorting correctly

I'm trying to join two tables by a column, and sort a table by the same column.
Here is some example data from the two tables:
table.x
state
00039
01156
table.y
state
39
1156
How do I join and sort the tables in SQL assistant?
The simplest solution would be to cast both sides to integer, as @Andrew mentioned. You could use a plain cast, or trycast(...), which returns NULL instead of raising an error when the cast fails:
select *
from x
inner join y on
trycast(x.state as integer) = trycast(y.state as integer)
order by y.state
Old answer (leaving this here for sake of future readers and what you can / can't do):
If you have a recent version of Teradata (you didn't specify it) you would also have LPAD function. Assuming that y.state is not text, but a number we'd also need to cast it, as lpad takes string as argument. If it is, omit cast(...):
select *
from x
inner join y on
x.state = lpad(cast(y.state as varchar(5)), 5, '0')
order by y.state
If you don't have an LPAD function, then some dirty code with substring might come in handy:
select *
from x
inner join y on
x.state = substring('00000' from char_length(cast(y.state as varchar(5))) + 1) || cast(y.state as varchar(5))
order by y.state
The above assumes that the numbers have at most 5 digits, as in your sample data. If they can be longer, you need to adjust the code.
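Both ideas (compare as integers, or left-pad the number to the width of the text key) can be checked quickly outside Teradata. This is a Python sketch using the sample values from the question; the variable names are made up, and zfill(5) plays the role of lpad(..., 5, '0').

```python
x_states = ["00039", "01156"]  # text keys with leading zeros (table x)
y_states = [39, 1156]          # numeric keys (table y)

# Option 1: pad the numbers to 5 characters and match on text.
padded = {str(s).zfill(5): s for s in y_states}
joined_by_pad = [(x, padded[x]) for x in x_states if x in padded]

# Option 2: convert the text to integers and match on numbers.
joined_by_int = [(x, y) for x in x_states for y in y_states if int(x) == y]
joined_by_int.sort(key=lambda pair: pair[1])  # order by y.state

print(joined_by_pad)
print(joined_by_int)
```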

How to find the next sequence number in oracle string field

I have a database table with document names stored as a VARCHAR and I need a way to figure out what the lowest available sequence number is. There are many gaps.
name partial seq
A-B-C-0001 A-B-C- 0001
A-B-C-0017 A-B-C- 0017
In the above example, it would be 0002.
The distinct name values total 227,705. The number of "partial" combinations is quite large A=150, B=218, C=52 so 1,700,400 potential combinations.
I found a way to iterate through from min to max per distinct value and list all the "missing" (aka available) values, but this seems inefficient given we are not using anywhere close to the max potential partial combinations (10,536 out of 1,700,400).
I'd rather have a table, based on existing data, with each partial value and its next available sequence value; a non-existent partial means 0001.
Thanks
Hmmmm, you can try this:
select coalesce(min(to_number(seq)), 0) + 1
from t
where partial = 'A-B-C-' and
not exists (select 1
from t t2
where t2.partial = t.partial and
to_number(T2.seq) = to_number(t.seq) + 1
);
EDIT:
For all partials you need a group by (you can use to_char() to convert the result back to a character, if necessary):
select partial, coalesce(min(to_number(seq)), 0) + 1
from t
where not exists (select 1
from t t2
where t2.partial = t.partial and
to_number(T2.seq) = to_number(t.seq) + 1
)
group by partial;
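The logic of that query is: for each partial, take the smallest used number whose successor is not used, and add one. Here it is as a Python sketch over a few rows. The "A-B-D-" rows are made up for illustration. Like the query, it only looks upward from existing numbers, so a partial whose lowest number is above 0001 will not get 0001 back.

```python
from collections import defaultdict


def next_sequences(rows):
    """Map each partial to its next available sequence number."""
    by_partial = defaultdict(set)
    for partial, seq in rows:
        by_partial[partial].add(int(seq))  # to_number(seq)
    result = {}
    for partial, seqs in by_partial.items():
        # Smallest used number with no successor (the NOT EXISTS part), plus one.
        result[partial] = min(s for s in seqs if s + 1 not in seqs) + 1
    return result


rows = [
    ("A-B-C-", "0001"),
    ("A-B-C-", "0017"),
    ("A-B-D-", "0001"),
    ("A-B-D-", "0002"),
]
nxt = next_sequences(rows)
print({partial: f"{n:04d}" for partial, n in nxt.items()})
```

A partial that does not exist yet can be handled by the caller, for example with nxt.get(partial, 1).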

SQL how many records start with the same letter

So I have thousands of records in a database in a column A.
I want to see how many start with each letter of the alphabet and all single digit numbers.
So I need a count and the letter associated with it. I also want to see all the two-character alphanumeric combinations (aa, ab, ac, ad, ae, etc.) and their counts.
Likewise with three and four characters, etc.
You can generally GROUP BY an expression like LEFT(columnname, 1), which allows you to perform a COUNT() aggregate grouped by an arbitrary expression. The most ideal substring function to use may depend on your RDBMS.
SELECT
UPPER(LEFT(columnname, 1)) AS first_char,
COUNT(*)
FROM yourtable
GROUP BY UPPER(LEFT(columnname, 1))
ORDER BY first_char ASC
Likewise, to get the 2 character match
SELECT
UPPER(LEFT(columnname, 2)) AS first_2char,
COUNT(*)
FROM yourtable
GROUP BY UPPER(LEFT(columnname, 2))
ORDER BY first_2char ASC
Some RDBMS will allow you to use the column alias in the GROUP BY rather than the full expression, as in the simplified GROUP BY first_char.
Note that I have upper-cased them so you don't get separate matches for Ab, AB, ab, aB if you are using a case-sensitive collation. (I believe SQL Server uses case-insensitive collations by default, however)
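If you want to sanity-check the counts on a small extract, the same grouping can be sketched in Python. The sample values and the function name prefix_counts are invented for illustration; the prefix length is a parameter, so the same function covers the one, two, three and four character cases.

```python
from collections import Counter


def prefix_counts(values, length):
    """Count values by their upper-cased first `length` characters."""
    return Counter(v[:length].upper() for v in values if v)


values = ["apple", "Avocado", "banana", "7up", "ab"]

print(prefix_counts(values, 1))
print(prefix_counts(values, 2))
```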

Splitting text in SQL Server stored procedure

I'm working with a database, where one of the fields I extract is something like:
1-117 3-134 3-133
Each of these number sets represents a different set of data in another table. Taking 1-117 as an example, 1 = equipment ID, and 117 = equipment settings.
I have another table from which I need to extract data based on the previous field. It has two columns that split equipment ID and settings. Essentially, I need a way to go from the queried column 1-117 and run a query to extract data from another table where 1 and 117 are two separate corresponding columns.
So, is there any way to split this number to run this query?
Also, how would I split those three numbers (1-117 3-134 3-133) into three different query sets?
The tricky part here is that this column can have any number of sets here (such as 1-117 3-133 or 1-117 3-134 3-133 2-131).
I'm creating these queries in a stored procedure as part of a larger document to display the extracted data.
Thanks for any help.
Since you didn't provide the DB vendor, here are two posts that answer this question for SQL Server and Oracle respectively...
T-SQL: Opposite to string concatenation - how to split string into multiple records
Splitting comma separated string in a PL/SQL stored proc
And if you're using some other DBMS, go search for "splitting text ". I can almost guarantee you're not the first one to ask, and there's answers for every DBMS flavor out there.
As you said the format is constant though, you could also do something simpler using a SUBSTRING function.
EDIT in response to OP comment...
Since you're using SQL Server, and you said that these values are always in a consistent format, you can do something as simple as using SUBSTRING to get each part of the value and assign them to T-SQL variables, where you can then use them to do whatever you want, like using them in the predicate of a query.
Assuming that what you said is true about the format always being #-### (exactly 1 digit, a dash, and 3 digits) this is fairly easy.
WITH EquipmentSettings AS (
SELECT
S.*,
Convert(int, Substring(S.AwfulMultivalue, V.number * 6 - 5, 1)) AS EquipmentID,
Convert(int, Substring(S.AwfulMultivalue, V.number * 6 - 3, 3)) AS Settings
FROM
SourceTable S
INNER JOIN master.dbo.spt_values V
ON V.number BETWEEN 1 AND (Len(S.AwfulMultivalue) + 1) / 6 -- +1 because the last set has no trailing space
WHERE
V.type = 'P'
)
SELECT
E.Whatever,
D.Whatever
FROM
EquipmentSettings E
INNER JOIN DestinationTable D
ON E.EquipmentID = D.EquipmentID
AND E.Settings = D.Settings
In SQL Server 2005+ this query will support 1365 values in the string.
If the length of the digits can vary, then it's a little harder. Let me know.
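For the variable-length case, the idea is to split on spaces first and then on the dash, instead of relying on fixed positions. This is a sketch of that logic in Python, not T-SQL; parse_sets is a made-up name, and it assumes every set contains exactly one dash with digits on both sides.

```python
def parse_sets(multivalue):
    """Split '1-117 3-134' into [(equipment_id, settings), ...]."""
    pairs = []
    for token in multivalue.split():  # one token per set, any number of sets
        equipment_id, settings = token.split("-", 1)
        pairs.append((int(equipment_id), int(settings)))
    return pairs


print(parse_sets("1-117 3-134 3-133"))
print(parse_sets("12-5 1-117 3-134 2-131"))
```

Each resulting pair corresponds to one row you would join to the destination table on its two columns.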
If there are never more than 4 sets, you can use PARSENAME to retrieve the result:
Declare @Num varchar(20)
Set @Num='1-117 3-134 3-133'
select parsename(replace (@Num,' ','.'),3)
Result :- 1-117
Now use parsename again on that result:
Select parsename(replace(parsename(replace (@Num,' ','.'),3),'-','.'),1)
Result :- 117
If there are more than 4 values, then use a split function instead.