I am currently trying to convert this query from Access into proper SQL.
The Left and Mid functions in the statement have me kind of baffled.
SELECT
name,
entnum,
IIf(Left(Mid([entnum],4,3),1)=0,Mid([entnum],5,2),Mid([entnum],4,3)) as AGENCYCODE
FROM CUSTFILE
The entnum field's type is varchar(15).
Any help with trying to understand this would be greatly appreciated.
You can use SUBSTRING in place of both MID and LEFT. IIF exists in some dialects of SQL, but a CASE expression is the safer, more portable choice.
Looking at your statement, I think it can be reduced to the following:
SELECT
name,
entnum,
CASE
WHEN SUBSTRING(entnum,4,1) = '0' THEN SUBSTRING(entnum,5,2)
ELSE SUBSTRING(entnum,4,3)
END agencycode
FROM CUSTFILE
Try this instead:
SELECT
name,
entnum,
CASE
WHEN LEFT(SUBSTRING([entnum], 4, 3), 1) = '0'
THEN SUBSTRING([entnum], 5, 2)
ELSE SUBSTRING([entnum], 4, 3)
END as AGENCYCODE
FROM CUSTFILE
SUBSTRING is used exactly as MID. The CASE statement allows you to specify multiple WHEN..THEN conditions as well as an ELSE.
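To make the position arithmetic concrete, here is a small sketch of the same logic in Python. SQL SUBSTRING positions are 1-based, so SUBSTRING(entnum, 4, 3) corresponds to the slice [3:6]. The function name and the sample values are made up for illustration and do not come from the original table.

```python
def agency_code(entnum: str) -> str:
    """Mirror the CASE expression above using 0-based Python slices."""
    # SUBSTRING(entnum, 4, 3): characters 4 through 6
    three = entnum[3:6]
    # If character 4 is '0', keep only characters 5 and 6,
    # i.e. SUBSTRING(entnum, 5, 2)
    if three[:1] == "0":
        return entnum[4:6]
    return three


# Hypothetical sample values
print(agency_code("ABC012XYZ"))  # 12
print(agency_code("ABC345XYZ"))  # 345
```

Note that the comparison is against the string '0', as in both answers, since entnum is a varchar.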
I want to group a value which is a string of decimals into named groups.
Example :
CASE WHEN CAST(X as NUMERIC)<1000 THEN "Under1000" ELSE "Over1000" END
As I've got some values missing, I would rather use safe_cast instead of cast and want a specific group for missing values.
I could go for :
CASE WHEN SAFE_CAST(X as NUMERIC) IS NULL THEN "MissingData" WHEN SAFE_CAST(X as NUMERIC)<1000 THEN "Under1000" ELSE "Over1000" END
But what annoys me here is that I'm repeating the safe_cast operation.
Is there a way to avoid that ?
I've been reading the following example:
CASE operation(X) WHEN result1 THEN "result1" WHEN result2 THEN "result2" ELSE "other_result" END
But that kind of syntax seems to work only for the equality operator in the WHEN clauses (i.e. operation(X) = result1 or operation(X) = result2, etc.).
And here I use less than (or greater than), so I don't know how to manage that.
I guess there must be a way to avoid that operation repetition but can't figure out how.
Thanks for your help.
This should help you to avoid writing SAFE_CAST() several times:
WITH your_data AS (
SELECT "Bob" as name, "150.19" as weight UNION ALL
SELECT "Tom", "2000.90" UNION ALL
SELECT "Jerry", Null)
, transform as (
SELECT name, SAFE_CAST(weight as NUMERIC) as weight
FROM your_data
)
SELECT
name,
CASE
WHEN weight IS NULL
THEN "MissingData"
WHEN weight<1000
THEN "Under1000"
ELSE "Over1000"
END as weight_agg
FROM transform
Not sure there is a syntax-based answer that would help, but one potential way to achieve "cleaner" or non-repetitive code is to break up your query into logical chunks using CTEs.
with data as (), -- raw data
casted as (), -- do your safe_cast here
transformed as () -- do your case statement here
select * from transformed
It does make for "longer" code, but it also allows for cleaner logic in the transformation stage (your stated goal).
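The "convert once, then branch" idea behind both answers can be sketched outside SQL as well. The snippet below is only an illustration in Python: safe_numeric is a hypothetical stand-in for SAFE_CAST (it returns None instead of raising), and the sample inputs echo the your_data CTE above.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def safe_numeric(text: Optional[str]) -> Optional[Decimal]:
    """Rough stand-in for SAFE_CAST(x AS NUMERIC): None instead of an error."""
    if text is None:
        return None
    try:
        return Decimal(text)
    except InvalidOperation:
        return None


def weight_group(text: Optional[str]) -> str:
    value = safe_numeric(text)  # converted once, reused in every branch
    if value is None:
        return "MissingData"
    return "Under1000" if value < 1000 else "Over1000"


for raw in ["150.19", "2000.90", None]:
    print(raw, weight_group(raw))
```

The intermediate variable plays the role of the casted/transform CTE: the conversion has one home, and the CASE logic only ever sees the converted value.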
I'm using RegEx in a View in Oracle 11g and I need to display certain codes that have an 'S' in the 8th position.
Using https://regexr.com/2v41h,
I was able to display these results with
REGEXP_SUBSTR(code, '\S{8}')
Y38.9X2S
Y38.9X2D
Y38.9X2A
Y38.9X1S
My issue is that I need to return only the values that have an 'S' in the last position which is the 8th position counting the decimal. What expression should I use?
Example:
Y38.9X2S
Y38.9X1S
I have tried:
REGEXP_SUBSTR(code, '\b[S]*[8]\b') AS CODE
Thank you in advance for your help.
I am thinking:
select substr(code, -8)
from t
where code like '%_______S'
If the code is always long enough, just use '%S'.
Or, as a case expression:
select (case when code like '%_______S' then substr(code, -8) end)
You will need regular expressions if code has other characters but they may not be necessary.
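For anyone unsure what the LIKE pattern is doing: '%_______S' means any prefix, then exactly seven arbitrary characters, then an S, so the value must be at least 8 characters long and end in S. Here is the same check sketched in Python, using the sample codes from the question (the function name is made up):

```python
from typing import Optional


def last_eight_if_s(code: str) -> Optional[str]:
    """Return the last 8 characters when code has >= 8 chars and ends in 'S'."""
    # Equivalent of: code LIKE '%_______S'
    if len(code) >= 8 and code.endswith("S"):
        return code[-8:]  # equivalent of SUBSTR(code, -8)
    return None


codes = ["Y38.9X2S", "Y38.9X2D", "Y38.9X2A", "Y38.9X1S"]
print([c for c in codes if last_eight_if_s(c) is not None])
# ['Y38.9X2S', 'Y38.9X1S']
```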
SAS CODE:
data table1;
set table2;
_sep1 = findc(policynum,'/&,');
_count1 = countc(policynum,'/&,');
_sep2 = findc(policynum,'-');
_count2 = countc(policynum,'-');
_sep3 = findc(policynum,'_*');
_count3 = countc(policynum,'_*');
How can I convert this into a select statement like below:
select
*,
/*Code converted to SQL from above*/
from table2
For example I tried the below code:
select
*,
charindex('/&,',policynum) as _sep1,
LEN(policynum) - LEN(REPLACE(policynum,'/&,','')) as _count1
from table2
But I got this error: ERROR 42S02: Function 'CHARINDEX(UNKNOWN, VARCHAR)' does not exist. Unable to identify a function that satisfies that given argument types. You may need to add explicit typecasts.
Please note that the variable pol_no is: 'character varying(50) not null'.
I am running this using Aginity Workbench for Netezza. I believe this is IBM.
Assuming SQL Server based on CHARINDEX(), this may work:
You need to apply it twice, once for each character, and take the minimum to find the first occurrence.
There may be a better-suited function, but I don't know enough to suggest one.
select
*,
min(charindex('/',policynum), charindex('&', policynum)) as _sep1
from table2
EDIT: based on OP notes.
Netezza seems like IBM which means use the INSTR function, not CHARINDEX.
select
*,
min(instr(policynum, '/'), instr(policynum, '&')) as _sep1
from table2
https://www.ibm.com/support/knowledgecenter/en/SSGU8G_12.1.0/com.ibm.sqls.doc/ids_sqs_2336.htm
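As a reference for what is being ported, here is a sketch in Python of the behaviour I understand the SAS functions to have when called without modifiers: FINDC returns the 1-based position of the first character that appears in the list (0 if none), and COUNTC counts every character that appears in the list. The helper first_hit is hypothetical. It shows one thing to watch for when combining per-character INSTR results: a 0 ("not found") must be ignored, otherwise a plain minimum returns 0 whenever any one of the characters is absent.

```python
def findc(text: str, chars: str) -> int:
    """1-based position of the first character of text found in chars, else 0."""
    for pos, ch in enumerate(text, start=1):
        if ch in chars:
            return pos
    return 0


def countc(text: str, chars: str) -> int:
    """Number of characters of text that appear in chars."""
    return sum(1 for ch in text if ch in chars)


def first_hit(*positions: int) -> int:
    """Smallest non-zero position, or 0 when every search failed."""
    hits = [p for p in positions if p > 0]
    return min(hits) if hits else 0


policy = "AB&123"  # hypothetical policy number
print(findc(policy, "/&,"), countc(policy, "/&,"))  # 3 1

# str.find returns -1 when absent, so +1 gives the INSTR-style 0
slash = policy.find("/") + 1
amp = policy.find("&") + 1
print(min(slash, amp), first_hit(slash, amp))  # 0 3
```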
FINDC & COUNTC functions are basically used for searching a character & counting them.
You can use LIKE operator from SQL to find characters with '%' and '_' wildcards
e.g. -
SELECT * FROM <table_name> WHERE <column_name> LIKE '%-%';
and
SELECT COUNT(*) FROM <table_name> WHERE <column_name> LIKE '%-%';
Note that LIKE itself only understands the '%' and '_' wildcards. Regular expressions need a dialect-specific operator or function.
I am trying to write a query in Hive with a Case statement in which the condition depends on one of the values in the current row (whether or not it is equal to its predecessor). I want to evaluate it on the fly, this way, therefore requiring a nested query, not by making it another column first and comparing 2 columns. (I was able to do the latter, but that's really second-best). Does anyone know how to make this work?
Thanks.
My query:
SELECT * ,
CASE
WHEN
(SELECT lag(field_with_duplicates,1) over (order by field_with_duplicates) FROM my_table b
WHERE b.id=a.id) = a.field_with_duplicates
THEN "Duplicate"
ELSE ""
END as Duplicate_Indicator
FROM my_table a
Error:
java.sql.SQLException: org.apache.spark.sql.AnalysisException: cannot recognize input near 'SELECT' 'lag' '(' in expression specification; line 4 pos 9
Notes:
The reason I needed the complicated 'lag' function is that the unique Id's in the table are not consecutive, but I don't think that's where it's at: I tested by substituting another simpler inner query and got the same error message.
Speaking of 'duplicates', I did search on this issue before posting, but the only SELECT's inside CASE's I found were in the THEN statement, and if that works the same, it suggests mine should work too.
You do not need the subquery inside CASE:
SELECT a.* ,
CASE
WHEN prev_field_with_duplicates = field_with_duplicates
THEN "Duplicate"
ELSE ""
END as Duplicate_Indicator
FROM (select a.*,
lag(field_with_duplicates,1) over (order by field_with_duplicates) as prev_field_with_duplicates
from my_table a
)a
Or you can even use lag() inside CASE without any subquery at all (I'm not sure whether this works in all Hive versions):
CASE
WHEN lag(field_with_duplicates,1) over (order by field_with_duplicates) = field_with_duplicates
THEN "Duplicate"
ELSE ""
END as Duplicate_Indicator
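If it helps to see what lag(...) over (order by ...) is doing, here is the same idea as a plain loop in Python: sort the values, then compare each one with the row before it. This is only an illustration with made-up data, not Hive code.

```python
from typing import List, Tuple


def flag_duplicates(values: List[str]) -> List[Tuple[str, str]]:
    """Sort values and flag each one that equals its predecessor."""
    ordered = sorted(values)  # ORDER BY field_with_duplicates
    result = []
    for i, value in enumerate(ordered):
        # LAG(..., 1) has no previous row for the first element (NULL in SQL),
        # so the first row can never be flagged
        is_dup = i > 0 and value == ordered[i - 1]
        result.append((value, "Duplicate" if is_dup else ""))
    return result


print(flag_duplicates(["b", "a", "b", "c"]))
# [('a', ''), ('b', ''), ('b', 'Duplicate'), ('c', '')]
```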
Thanks to @MatBailie for the answer in his comment. Don't I feel silly...
Resolved
I've got a query that uses several subqueries. It's about 100 lines, so I'll leave it out. The issue is that I have several rows returned as part of one subquery that need to be joined to an integer value from the main query. Like so:
Select
... columns ...
from
... tables ...
(
select
... column ...
from
... tables ...
INNER JOIN core.Type mt
on m.TypeID = mt.TypeID
where dpt.[DataPointTypeName] = 'TheDataPointType'
and m.TypeID in (100008, 100009, 100738, 100739)
and datediff(d, m.MeasureEntered, GETDATE()) < 365 -- only care about measures from past year
and dp.DataPointValue <> ''
) as subMdp
) as subMeas
on (subMeas.DataPointValue NOT LIKE '%[^0-9]%'
and subMeas.DataPointValue = cast(vcert.IDNumber as varchar(50))) -- THIS LINE
... more tables etc ...
The issue is that if I take out the cast(vcert.IDNumber as varchar(50))) it will attempt to compare a value like 'daffodil' to a number like 3245. Even though the datapoint that contains 'daffodil' is an orphan record that should be filtered out by the INNER JOIN 4 lines above it. It works fine if I try to compare a string to a string but blows up if I try to compare a string to an int -- even though I have a clause in there to only look at things that can be converted to integers: NOT LIKE '%[^0-9]%'. If I specifically filter out the record containing 'daffodil' then it's fine. If I move the NOT LIKE line into the subquery it will still fail. It's like the NOT LIKE is evaluated last no matter what I do.
So the real question is why SQL would be evaluating a JOIN clause before evaluating a WHERE clause contained in a subquery. Also how I can force it to only evaluate the JOIN clause if the value being evaluated is convertible to an INT. Also why it would be evaluating a record that will definitely not be present after an INNER JOIN is applied.
I understand that there's a strong element of query optimizer voodoo going on here. On the other hand I'm telling it to do an INNER JOIN and the optimizer is specifically ignoring it. I'd like to know why.
The problem you are having is discussed in this item of feedback on the connect site.
Whilst logically you might expect the filter to exclude any DataPointValue values that contain any non numeric characters SQL Server appears to be ordering the CAST operation in the execution plan before this filter happens. Hence the error.
Until Denali comes along with its TRY_CONVERT function the way around this is to wrap the usage of the column in a case expression that repeats the same logic as the filter.
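The shape of that workaround is "check first, convert only if the check passed, otherwise yield NULL". Here is a sketch of that guard in Python, using the values from the question. The function name is made up, and the comment shows only roughly what the SQL side might look like.

```python
def matches_id(data_point_value: str, id_number: int) -> bool:
    """Compare as integers, converting only values made up purely of digits."""
    # Roughly: CASE WHEN v NOT LIKE '%[^0-9]%' AND v <> ''
    #               THEN CAST(v AS int) END = id_number
    digits_only = bool(data_point_value) and all(
        ch in "0123456789" for ch in data_point_value
    )
    # Anything that fails the check becomes None (NULL) and is never converted
    converted = int(data_point_value) if digits_only else None
    return converted is not None and converted == id_number


print(matches_id("3245", 3245))      # True
print(matches_id("daffodil", 3245))  # False
```

Because the conversion sits inside the guarded branch, it does not matter in which order the surrounding filters and joins are evaluated.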
So the real question is why SQL would be evaluating a JOIN clause
before evaluating a WHERE clause contained in a subquery.
Because SQL engines are required to behave as if that's what they do. They're required to act like they build a working table from all of the table constructors in the FROM clause; expressions in the WHERE clause are applied to that working table.
Joe Celko wrote about this many times on Usenet. Here's an old version with more details.
First of all,
NOT LIKE '%[^0-9]%'
doesn't work well. Example:
DECLARE @Int nvarchar(20)= ' 454 54'
SELECT CASE WHEN @Int LIKE '%[^0-9]%' THEN 1 ELSE 0 END AS Is_Number
Result: 1
But it is not a number!
To check whether it is a real int value, you should use the ISNUMERIC function. Let's check this:
DECLARE @Int nvarchar(20)= ' 454 54'
SELECT ISNUMERIC(@Int) Is_Int
Result: 0
Result is correct.
So, instead of
NOT LIKE '%[^0-9]%'
try to change this to
ISNUMERIC(subMeas.DataPointValue)=1
UPDATE
How do you check whether a value is an integer?
First here:
WHERE ISNUMERIC(str) = 1 AND str NOT LIKE '%.%' AND str NOT LIKE '%e%' AND str NOT LIKE '%-%'
Second:
CREATE Function dbo.IsInteger(@Value VarChar(18))
Returns Bit
As
Begin
Return IsNull(
(Select Case When CharIndex('.', @Value) > 0
Then Case When Convert(int, ParseName(@Value, 1)) <> 0
Then 0
Else 1
End
Else 1
End
Where IsNumeric(@Value + 'e0') = 1), 0)
End
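For comparison, here is a rough Python approximation of what the function above is aiming for: accept an optional sign, digits, and an optional decimal part that consists only of zeros. This is deliberately stricter than ISNUMERIC (which also accepts things like currency symbols), so treat it as a sketch of the intent, not a faithful port.

```python
import re

# Optional sign, at least one digit, optional ".000..." tail
_INTEGER = re.compile(r"[+-]?[0-9]+(\.0*)?")


def is_integer(value: str) -> bool:
    """True when value looks like a whole number such as '42' or '12.00'."""
    return _INTEGER.fullmatch(value.strip()) is not None


for sample in ["42", "12.00", "12.5", "1e5", " 454 54", "daffodil"]:
    print(repr(sample), is_integer(sample))
```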
Filter out the non-numeric records in a subquery or CTE