Converting Informatica transformations to PySpark - apache-spark-sql

I am trying to convert an Informatica transformation to a PySpark transformation, but I am stuck replacing characters in the code shown below:
"DECODE(TRUE,
ISNULL(v_check_neg_**) OR v_check_neg_** = '',
i_default,
NOT IS_NUMBER(v_check_neg_** ,
i_default,
REPLACECHR(0,v_check_neg_**, '+-0123456789.' ,'')<>'',
i_default,
TO_DECIMAL(v_check_neg_**,5))
v_check_neg_** = IIF(INSTR(i_string_**,'-')!=0,'-'||SUBSTR(i_string_**,1,INSTR(i_string_**,'-')-1),i_string_**)"
This is what I tried:
from pyspark.sql.functions import col, isnull, udf, when
from pyspark.sql.types import BooleanType

def is_digit(value):
    if value:
        return value.isdigit()
    else:
        return False

is_digit_udf = udf(is_digit, BooleanType())

df_informatica = df_informatica.withColumn(
    column_name,
    when(isnull(col(column_name)) | (col(column_name) == ''), i_default)
    .when(is_digit_udf(col(column_name)), i_default)
)
df_informatica = df_informatica.withColumn
Please help me convert informatica to pyspark transformation.

I can't see the whole statement, but your DECODE logic is:
if (v_check_neg_** is null, or v_check_neg_** = '', or v_check_neg_** is not a number, or v_check_neg_** contains anything other than numeric characters) then i_default
else TO_DECIMAL(v_check_neg_**, 5)
Use Python to check the above cases and you should be good to go.
For example, you can use str.isnumeric() to check for a positive integer, and use try/except to handle negatives, decimals, etc.
Example to check for a negative number:
def check_negative(s):
    try:
        f = float(s)
        if f < 0:
            return True
        # zero or positive
        return False
    except ValueError:
        return False
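Putting the checks together, here is a minimal sketch of the whole DECODE as a PySpark UDF. Note that column_name, the i_default fallback, and the DecimalType(38, 5) precision are assumptions, not taken from the original mapping, and the v_check_neg_** sign-handling line is left out:

import decimal

from pyspark.sql.functions import col, udf
from pyspark.sql.types import DecimalType

i_default = None  # assumption: whatever value the mapping's i_default holds

def convert_value(s):
    # ISNULL(...) OR ... = '' -> default
    if s is None or s == '':
        return i_default
    # REPLACECHR(0, ..., '+-0123456789.', '') <> '' -> stray characters -> default
    if any(c not in '+-0123456789.' for c in s):
        return i_default
    # NOT IS_NUMBER(...) -> default, otherwise TO_DECIMAL(..., 5)
    try:
        return decimal.Decimal(s).quantize(decimal.Decimal('0.00001'))
    except decimal.InvalidOperation:
        return i_default

convert_udf = udf(convert_value, DecimalType(38, 5))
df_informatica = df_informatica.withColumn(column_name, convert_udf(col(column_name)))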

Related

Swapping characters in Strings Python

I have been trying to make something like an encoder. Here is my idea:
dict = {
    1: "!",
    2: "#"
}
in = 21  # input number
out = ?
print(out)  # returns "#!"
Is there any way I could perform this?
What you want is exactly the translate function of str:
x = "12"
y = "!#"
num = 12  # note: "in" is a reserved word in Python, so use another name
txt = str(num)
mapping = txt.maketrans(x, y)
out = txt.translate(mapping)
You can check the complete reference here.
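Applied to the encoder dict from your question, note that str.maketrans also accepts a mapping directly, so a small sketch without the two helper strings:

codes = {1: "!", 2: "#"}
# keys must be single characters, so stringify the digit keys
mapping = str.maketrans({str(k): v for k, v in codes.items()})
print(str(21).translate(mapping))  # prints "#!"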

splitting of email-address in spark-sql

code:
case when length(neutral) > 0 then regexp_extract(neutral, '(.*#)', 0) else '' end as neutral
The above query returns the output value with the # symbol; for example, if the input is 1234#gmail.com, then the output is 1234#. How do I remove the # symbol using the above query? Also, the resulting output should be validated as numeric: if it contains any non-numeric characters it should be rejected.
sample input: 1234#gmail.com output: 1234
sample input: 123adc#gmail.com output: null
You could phrase the regex as ^[^#]+, which would match all characters in the email address up to, but not including, the # character:
REGEXP_EXTRACT(neutral, '^[^#]+', 0) AS neutral
Note that this approach is also clean and frees us from having to use the bulky CASE expression.
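The numeric check can be folded into the extraction itself; here is a sketch of the same idea in PySpark (assuming a DataFrame df with the neutral column), relying on regexp_extract returning an empty string when the pattern does not match:

from pyspark.sql import functions as F

# capture the local part only when it consists of digits; '' (no match) -> NULL
extracted = F.regexp_extract(F.col("neutral"), r"^([0-9]+)#", 1)
df = df.withColumn("neutral", F.when(extracted != "", extracted))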
Try this code:
val pattern = """([0-9]+)#([a-zA-Z0-9]+\.[a-z]+)""".r
val correctEmail = "1234#gmail.com"
val wrongEmail = "1234abcd#gmail.com"

def parseEmail(email: String): Option[String] =
  email match {
    case pattern(id, domain) => Some(id)
    case _ => None
  }

println(parseEmail(correctEmail)) // prints Some(1234)
println(parseEmail(wrongEmail))   // prints None
Also, it is more idiomatic to use Option instead of null.

Karate : Trying to convert array to string using js method toString() in karate [duplicate]

This question already has an answer here:
Change type from string to float/double for a key value of any json object in an array
(1 answer)
Closed 1 year ago.
I'm trying to convert an array to a string using a simple JS function placed in a reusable feature file. I don't see any reason why the array is not getting converted to a string, since when I run the same function in the console it works without any issue.
Can anyone suggest a way to get this issue sorted?
"""
* def formatter = function(str){
var formatstring = str.toString();
return formatstring
}
"""
feature file:
* def format = call read('../common/resuable.feature')
* def result = format.formatter(value)
* print result
Input = ["ID3:Jigglypuff(NORMAL)"]
Actual result = ["ID3:Jigglypuff(NORMAL)"]
Expected result = ID3:Jigglypuff(NORMAL)
(Screenshot of the same function working in the browser console: https://i.stack.imgur.com/tAcIz.png)
Sorry, if you print an array it will have square brackets and all; that's just how it is.
Please unpack the array if you want the plain string content:
* def input = ["ID3:Jigglypuff(NORMAL)"]
* def expected = input[0]

Strings concatenation in Spark SQL query

I'm experimenting with Spark and Spark SQL and I need to concatenate a value at the beginning of a string field that I retrieve as output from a select (with a join) like the following:
val result = sim.as('s)
  .join(
    event.as('e),
    Inner,
    Option("s.codeA".attr === "e.codeA".attr))
  .select("1" + "s.codeA".attr, "e.name".attr)
Let's say my tables contain:
sim:
codeA,codeB
0001,abcd
0002,efgh
events:
codeA,name
0001,freddie
0002,mercury
And I would want as output:
10001,freddie
10002,mercury
In SQL or HiveQL I know I have the concat function available, but it seems Spark SQL doesn't support this feature. Can somebody suggest a workaround for my issue?
Thank you.
Note:
I'm using Language Integrated Queries, but a "standard" Spark SQL query would also do, if it enables a solution.
The output you show at the end does not seem to be part of your selection or your SQL logic, if I understand correctly. Why don't you format the output stream as a further step?
val results = sqlContext.sql("SELECT s.codeA, e.name FROM foobar")
results.map(t => ("1" + t(0), t(1))).collect()
It's relatively easy to implement new Expression types directly in your project. Here's what I'm using:
case class Concat(children: Expression*) extends Expression {
  override type EvaluatedType = String
  override def foldable: Boolean = children.forall(_.foldable)
  def nullable: Boolean = children.exists(_.nullable)
  def dataType: DataType = StringType
  def eval(input: Row = null): EvaluatedType = {
    children.map(_.eval(input)).mkString
  }
}
val result = sim.as('s)
  .join(
    event.as('e),
    Inner,
    Option("s.codeA".attr === "e.codeA".attr))
  .select(Concat("1", "s.codeA".attr), "e.name".attr)

Add extra field in Django QuerySet as timedelta type

I have the following model:
class UptimeManager(models.Manager):
    def with_length(self):
        """Get a queryset of uptimes sorted by length, including the current one."""
        extra_length = Uptime.objects.extra(select={'length':
            """
            SELECT
                IF(end IS NULL,
                   timestampdiff(second, begin, now()),
                   timestampdiff(second, begin, end))
            FROM content_uptime c
            WHERE content_uptime.id = c.id
            """
        })
        return extra_length

class Uptime(models.Model):
    begin = models.DateTimeField('beginning')
    end = models.DateTimeField('end', null=True)
    host = models.ForeignKey("Host")
    objects = UptimeManager()
    ...
Then I call Uptime.objects.with_length().order_by('-length')[:10] to get the list of longest uptimes.
But length reaches the template as an integer. How can I modify my code so that the length on objects returned by the manager is accessible in the template as a timedelta object?
I could almost do it by returning a list and converting the number of seconds to timedelta objects, but then I would have to do the sorting, filtering, etc. in my Python code, which is rather inefficient compared to one well-done SQL query.
Add a property to the model that looks at the actual field and converts it to the appropriate type.
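A minimal sketch of that property approach, assuming with_length() has attached the integer length attribute (the property name length_td is made up here):

import datetime

class Uptime(models.Model):
    # ... fields and manager as in the question ...

    @property
    def length_td(self):
        # `length` comes from UptimeManager.with_length(); expose it as a timedelta
        return datetime.timedelta(seconds=self.length)

In the template you would then write {{ uptime.length_td }}.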
My solution is to create a filter that determines the type of the length variable and returns a timedelta when it is an integer type:
from django import template
import datetime

register = template.Library()

def timedelta(value):
    if isinstance(value, (long, int)):  # Python 2; on Python 3 check plain int
        return datetime.timedelta(seconds=value)
    elif isinstance(value, datetime.timedelta):
        return value
    else:
        raise TypeError("expected an integer number of seconds or a timedelta")

register.filter('timedelta', timedelta)
and using it in a template is trivial:
{{ uptime.length|timedelta }}
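For completeness, on recent Django versions you can avoid both extra() and the template filter by annotating with a DurationField, which the ORM returns as a timedelta directly. A sketch, assuming the model from the question:

from django.db.models import DurationField, ExpressionWrapper, F
from django.db.models.functions import Coalesce, Now

# `end` is NULL for the current uptime, so substitute now() before subtracting
longest = (
    Uptime.objects
    .annotate(length=ExpressionWrapper(
        Coalesce(F('end'), Now()) - F('begin'),
        output_field=DurationField(),
    ))
    .order_by('-length')[:10]
)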