Counting items in Jitterbit JSON transformation

I am trying to create a Jitterbit condition that, depending on the JSON passed into it, will call one of two operations based on whether it evaluates to true or false.
The previous transformation filters the JSON using a transformation condition, so depending on whether a certain piece of data exists in the JSON, it returns either one item or none.
How do I create a condition after this that branches depending on whether one item or no items are passed to it? I was expecting something like:
json$item.Count 'GREATER THAN' 0
to work.
Thanks
Martin

Solved using scripting in Jitterbit (JavaScript rather than the Jitterbit scripting language).


Pass SQL query values to a Data Factory variable as an array for a ForEach loop

This is similar to the question of how to pass variables to an Azure Data Factory REST URL's query string.
However, I have a pipeline that queries the Graph API, where I need to pass a user ID as part of the URL to get that user's manager in order to build an Active Directory staff hierarchy. This is fine on an individual basis, or even as a predefined array variable where I insert ["xx","xxx"] into the pipeline variable, etc. My challenge is that the array variable needs to come from the results of a SQL query. So, instead of defining the list of users, I need to pass the results of a SQL query into the ForEach loop.
I can use a Lookup with a Set Variable, but the URL seems to be misconstructed and has extra characters added in for some reason.
It returns graph.microsoft.com/v1.0/users/%7B%7B%22id%22:%22xx9e7878-bwbbb-bwbwbwr-7897-414a8e60c78c%22%7D%7D/?$expand=xxxxxx, where the "%7B%7B%22id%22:%" and "%22%7D%7D/" parts are unnecessary and appear to come from the JSON rather than just the value being used.
The Lookup runs the query against SQL.
The Set Variable activity uses the Lookup's value (below) to assign it to a pipeline variable as an array.
Then the ForEach loop uses the variable value in the source:
#concat('users/{',item(),'}/?$expand=manager($levels=max;$select=id,displayName,userPrincipalName,createdDate)')
If anyone can suggest how to construct the array value dynamically that would be great.
I have used
SELECT '["' + STRING_AGG(CONVERT(NVARCHAR(MAX), t.[id]), '","') + '"]' AS id
FROM stage.extract_msgraphapi_users t
LEFT JOIN stage.extract_msgraphapi_users s ON s.id = t.id
and this returns something that looks like an array, ["xx","xxx"], but Data Factory still interprets it as a string and not an array. Any help would be appreciated.
Update, 10 minutes later:
#concat('users/{',item().id,'}/?$expand=manager($levels=max;$select=id,displayName,userPrincipalName,createdDate)')
Note the reference to item().id to use the id property of each element of the array. It works like a dream, for anyone else facing the same issue.

Dynamically execute a transformation against a column at runtime

I have a Pentaho Kettle job that can load data from x number of tables, and put it into target tables with a different schema.
Assume I have table 1, like so:
I want to load this table into a destination table that looks like this:
The columns have been renamed, the order has been changed, and the data has been transformed. The rename and reorder are easily managed with the Select Values step, which can be used within an ETL Metadata Injection step, making it dependent on configuration values loaded at runtime.
But if I need to perform some transformation logic on some of the columns, based on where they go in the target table, this seems to be less straightforward.
In my example, I want the column "CountryName" to be capitalised, and the column "Rating" to be floored (as in changing the real number to the previous integer value).
While I could do this by just manually adding a transformation to accomplish each, I want my solution to be dynamic, so it could just as easily run the "CountryName" column through a checksum component, or perform a ceiling on "Rating" instead.
I can easily wrap these transformations in another transformation so that they can be parameterised and executed when needed:
But, where I'm having trouble is, when I process a row of data, I need a way to be able to say:
Column "CountryName" should be passed through the Capitalisation transform
Column "Rating" should be passed through the Floor transform
Column(s) "AnythingElse" should be passed through the SomeOther transform
Is there a way to dynamically split out the columns in a row, and execute a different transform on each one, based on some configuration metadata that can be supplied?
Logically, it would be something like this, although I suspect there may be a way to handle it as a loop or some form of dynamic transformation, rather than mapping out a path per column:
Kettle is so flexible that it seems like there must be a way to do this, I'm just struggling to know which components to use and how to do it. Any experts out there have some suggestions?
I'm dealing with some biggish data sets here (hundreds of millions of rows), so I'm reluctant to use Row Normaliser/Denormaliser or to write to file/DB if possible.
Have you considered the Modified Java Script Value step? Start with the Data Grid step, then a Select Values step, then the Modified Java Script Value step. In that step you can transform the value of each column into whatever form you want and output the result to a file.
That of course requires some JavaScript knowledge, but given your example the required knowledge seems pretty basic.
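To illustrate the kind of metadata-driven, per-column dispatch the question describes (independent of Kettle), here is a minimal Python sketch; the column-to-transform mapping and the function names are illustrative assumptions, not Kettle configuration:

import math

# Hypothetical configuration metadata loaded at runtime: column name -> transform name.
COLUMN_TRANSFORMS = {
    "CountryName": "capitalise",
    "Rating": "floor",
}

# The available transforms; anything not configured falls back to passthrough.
TRANSFORMS = {
    "capitalise": lambda v: str(v).upper(),
    "floor": lambda v: math.floor(float(v)),
    "passthrough": lambda v: v,
}

def transform_row(row):
    """Apply the configured transform to each column of a row."""
    return {
        column: TRANSFORMS[COLUMN_TRANSFORMS.get(column, "passthrough")](value)
        for column, value in row.items()
    }

print(transform_row({"CountryName": "france", "Rating": 3.7, "Comment": "ok"}))
# {'CountryName': 'FRANCE', 'Rating': 3, 'Comment': 'ok'}

In Kettle itself the equivalent mapping would live in the configuration metadata and the dispatch would happen inside the scripted step.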

How do you write a test for dynamic API content?

I am working on a wrapper for an API, and one of the endpoints returns data that doesn't have the same results each time.
What is a good strategy to test that the endpoint is still valid?
This is a general question, although I am mostly interested in getting this to work in Python.
You need to define what you actually expect from the result. What are the statements that always hold for the result?
Popular candidates/examples are:
it is valid JSON/HTML/XML
it contains certain substrings
it has certain "fields"
certain fields can be parsed as a date using a specific format, and the resulting date is within +/-1h of now.
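For example, a minimal pytest-style check along those lines in Python might look like this; the endpoint URL, field names, and date format are assumptions for illustration only:

import json
from datetime import datetime, timedelta, timezone

import requests

# Hypothetical endpoint; substitute your wrapper's call or the real URL.
ENDPOINT = "https://api.example.com/v1/items"

def test_endpoint_invariants():
    response = requests.get(ENDPOINT, timeout=10)
    assert response.status_code == 200

    # It is valid JSON.
    payload = json.loads(response.text)

    # It has certain fields (names are illustrative).
    assert "items" in payload
    assert "generated_at" in payload

    # A field parses as a date in a specific format and is within +/- 1h of now.
    generated = datetime.strptime(payload["generated_at"], "%Y-%m-%dT%H:%M:%S%z")
    assert abs(datetime.now(timezone.utc) - generated) < timedelta(hours=1)

The point is that the test pins down the invariants of the response rather than its exact contents.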

Django: how to filter for rows whose fields are contained in passed value?

MyModel.objects.filter(field__icontains=value) returns all the rows whose field contains value. How to do the opposite? Namely, construct a queryset that returns all the rows whose field is contained in value?
Preferably without using custom SQL (i.e. using only the ORM) and without backend-dependent SQL.
field__icontains and similar lookups are coded right into the ORM. The reverse version simply doesn't exist.
You could use the where param of QuerySet.extra(), described in the QuerySet reference.
In this case, you would use something like:
MyModel.objects.extra(where=["%s LIKE CONCAT('%%',field,'%%')"], params=[value])
Of course, keep in mind that there is no standard method of string concatenation across DBMSs. So, as far as I know, there is no way to satisfy your requirement of avoiding backend-dependent SQL.
If you're okay with working with a list of dictionaries rather than a queryset, you could always do this instead:
qs = MyModel.objects.all().values()
matches = [r for r in qs if value in r[field]]
although this is of course not ideal for huge data sets.

Converting SQL Result Sets to XML

I am looking for a tool that can serialize and/or transform SQL result sets into XML. Dumbed-down XML generation from SQL result sets is simple and trivial, but that's not what I need.
The solution has to be database neutral and accept only regular SQL query results (no DB XML support used). A particular challenge for this tool is to produce nested XML matching an arbitrary schema from row-based results. Intermediate steps are too slow and wasteful; this needs to happen in one single step: no RS->object->XML, and preferably no RS->XML->XSLT->XML. It must support streaming, due to large result sets and big XML.
Anything out there for this?
With SQL Server you really should consider using the FOR XML construct in the query.
If you're using .NET, just use a DataAdapter to fill a DataSet. Once it's in a DataSet, just use its WriteXml() method. That breaks your RS->object->XML rule, but it's really how things are done. You might be able to work something out with a DataReader, but I doubt it.
Not that I know of. I would just roll my own. It's not that hard to do, maybe something like this:
#!/usr/bin/env jruby
require 'java'
java_import java.sql.DriverManager

# TODO some magic to load the driver
conn = DriverManager.get_connection(ARGV[0], ARGV[1], ARGV[2])
stmt = conn.create_statement
res = stmt.execute_query(ARGV[3])
meta = res.meta_data
puts "<result>"
while res.next
  puts "<row>"
  for n in 1..meta.column_count
    column = meta.get_column_name(n)
    # NOTE: column values are not XML-escaped here
    puts "<#{column}>#{res.get_string(n)}</#{column}>"
  end
  puts "</row>"
end
puts "</result>"
Disclaimer: I just made all of that up, I'm not even bothering to pretend that it works. :-)
In .NET you can fill a DataSet from any source and then have it write that out to disk for you as XML, with or without the schema. I can't say what performance for large sets would be like. Simple :)
Another option, depending on how many schemas you need to output, and/or how dynamic this solution is supposed to be, would be to actually write the XML directly from the SQL statement, as in the following simple example...
SELECT
    '<Record>' ||
        '<name>' || name || '</name>' ||
        '<address>' || address || '</address>' ||
    '</Record>'
FROM
    contacts
You would have to prepend and append the document element, but I think this example is easy enough to understand.
DbUnit (www.dbunit.org) does go from SQL to XML and vice versa; you might be able to modify it further for your needs.
Technically, converting a result set to an XML file is straightforward and doesn't need any tool, unless you have a requirement to convert the data structure to fit a specific export schema. In general, the result set becomes the top-level element of the XML file, and then you produce a number of record elements containing attributes, which are effectively the fields of the record.
When it comes to Java, for example, you just need an appropriate JDBC driver for interfacing with the DBMS of your choice, addressing the database-independence requirement (drivers are usually provided by the DBMS vendor), and a few lines of code to read the result set and print out an XML string per record, per field. Not a difficult task for an average Java developer, in my opinion.
Anyway, the more concretely you state your purpose, the more concrete an answer you will get.
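As an illustration only, here is that record-per-row idea sketched in Python rather than Java, using the standard library; the table, columns, and in-memory SQLite connection are assumptions, values are escaped, and no target schema is applied:

import sqlite3  # any DB-API driver would do; sqlite3 is just for illustration
import sys
from xml.sax.saxutils import escape

def result_set_to_xml(cursor, out):
    """Stream a query's result set as simple <result>/<row>/<column> XML."""
    columns = [desc[0] for desc in cursor.description]
    out.write("<result>\n")
    for row in cursor:
        out.write("  <row>\n")
        for name, value in zip(columns, row):
            text = "" if value is None else escape(str(value))
            out.write("    <%s>%s</%s>\n" % (name, text, name))
        out.write("  </row>\n")
    out.write("</result>\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, address TEXT)")
conn.execute("INSERT INTO contacts VALUES ('Ada', '1 Example St')")
result_set_to_xml(conn.execute("SELECT name, address FROM contacts"), sys.stdout)

Producing nested XML to an arbitrary schema is the hard part, which is exactly why a flat dump like this falls short of the question's requirement.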
In Java, you can just fill an object (like an entity bean) with the data and then use XMLEncoder to turn it into XML. From there you can use XSLT for further conversion, or XMLDecoder to bring it back to an object.
Greetz, GHad
PS: See http://ghads.wordpress.com/2008/09/16/java-to-xml-to-java/ for an example of the object-to-XML part. For getting from the DB to objects, multiple ways are possible: JDBC, Groovy DataSets, or GORM. Apache Commons BeanUtils may help to fill up JavaBeans via reflection-like methods.
I created a solution to this problem by using the equivalent of a mail merge, with the result set as the source and a template through which it was merged to produce the desired XML.
The template was standard XML, with a Header element, a Footer element and a Body element. Using a CDATA block in the Body element allowed me to include a complete XML structure that acted as the template for each row. In order to include fields from the result set in the template, I used markers that looked like this: <[FieldName]>. The template was then pre-parsed to isolate the markers, such that in operation the template requests each of the fields from the result set as the Body is being produced.
The Header and Footer elements are output only once at the beginning and end of the output set. The body could be any XML or text structure desired. In your case, it sounds like you might have several templates, one for each of your desired schemas.
All of the above was encapsulated in a Template class, such that after loading the Template, I merely called merge() on the template passing the resultset in as a parameter.
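A rough Python sketch of how such a template merge might look; the marker syntax follows the <[FieldName]> convention described above, while the class shape, method names, and dict-based rows are illustrative assumptions rather than the original implementation:

import re
import sys

class Template:
    """Merge a header/body/footer template with a result set, row by row."""

    MARKER = re.compile(r"<\[(\w+)\]>")

    def __init__(self, header, body, footer):
        self.header = header
        self.footer = footer
        # Pre-parse the body so the field markers are isolated up front.
        self.body_parts = self.MARKER.split(body)

    def merge(self, result_set, out):
        out.write(self.header)
        for row in result_set:  # each row is assumed to behave like a dict
            for i, part in enumerate(self.body_parts):
                # Odd indices are the field names captured by the marker pattern.
                out.write(str(row[part]) if i % 2 else part)
        out.write(self.footer)

template = Template(
    header="<Records>\n",
    body="  <Record><name><[Name]></name><city><[City]></city></Record>\n",
    footer="</Records>\n",
)
template.merge([{"Name": "Ada", "City": "London"}], sys.stdout)

A streaming result set (a cursor or JDBC-style iterator) can be passed straight to merge(), since each row is written out as it is read.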