Is there any best practice on when to use DataMapper vs. Groovy transformers, etc., when performing transformations or mappings in Mule?
For example, I need to transform XML to JSON. I can do this nicely with Groovy using the XML and JSON builders, and it's open source. It does require me to write some code, though.
DataMapper, on the other hand, is EE-only and seems a lot more opaque, being a visual drag-and-drop tool.
Are there any downsides to not using DataMapper?
As you said, DataMapper is a graphical mapping tool. Its pros:
Easier to maintain
Users don't need any special programming skills
Support for DataSense in Studio
But, as you said, there is nothing you can do with it that you cannot also do with Groovy or a Java component.
Yes, DataMapper is Enterprise Edition only, but it offers the following advantages:
1. Extraction and loading of flat and structured data formats
2. Filtering, extraction and transformation of input data using XPath and powerful scripting
3. Augmenting data with input parameters and lookups from other data sources
4. Live design-time previews of transformation results
5. High-performance, scalable data mapping operations
Full reference: https://developer.mulesoft.com/docs/display/current/Datamapper+User+Guide+and+Reference
The only issue I see with DataMapper is that you need to maintain the mapping files.
Community Edition users, however, need to find other options for transforming and mapping.
In that case, as you said, they might use custom Java classes, the Groovy component, the expression component, etc.
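For example, in the Community Edition an XML-to-JSON step can be written as a small custom Java transformer. This is only a minimal sketch, assuming Mule 3.x with the org.json library on the classpath; the package and class names are illustrative:

```java
// Minimal sketch: a custom Mule 3.x transformer that turns an XML payload into JSON.
// Assumes org.json is on the classpath; package/class names are illustrative.
package com.example.transformer;

import org.json.JSONException;
import org.json.XML;
import org.mule.api.MuleMessage;
import org.mule.api.transformer.TransformerException;
import org.mule.transformer.AbstractMessageTransformer;

public class XmlToJsonTransformer extends AbstractMessageTransformer {

    @Override
    public Object transformMessage(MuleMessage message, String outputEncoding)
            throws TransformerException {
        // Read the current payload as a String (Mule converts it if necessary).
        String xml = message.getPayload(String.class);
        try {
            // The JSON string returned here becomes the new message payload.
            return XML.toJSONObject(xml).toString();
        } catch (JSONException e) {
            throw new RuntimeException("XML to JSON conversion failed", e);
        }
    }
}
```

It can then be referenced from a flow with <custom-transformer class="com.example.transformer.XmlToJsonTransformer"/>.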
I have a question. I'm building BI for my company, and I needed to develop a data converter in the ETL because the database it connects to (PostgreSQL) is returning negative values in the time field of the CSV, for example 00:00:-50. It doesn't make much sense to pull negative data like this from the database (to which we don't have much access).
The solution I found, so that I don't have to rely exclusively on changing the database directly, would be to perform the conversion within CloudConnect. From my research, the component that seems closest to what I need is the Normalizer, but there aren't many explanations available. Could you give me a hand? I couldn't figure out how to parameterize the Normalizer to convert this data from 00:00:-50 to 00:00:50.
It might help you to review our CC documentation: https://help.gooddata.com/cloudconnect/manual/normalizer.html
However, I am not sure if the Normalizer would be able to process timestamps.
The Normalizer is basically a generic transform component with a normalization template. You might as well use the Reformat component, which is more universal.
However, what you are trying to do would require a fairly custom transform script, written in CTL (the CloudConnect transformation language) or Java.
You can find some templates and examples in the documentation: https://help.gooddata.com/cloudconnect/manual/ctl-templates-for-transformers.html
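To illustrate the kind of logic such a transform would contain, here is a plain-Java sketch (not tied to the CloudConnect API; the class and method names are made up) that simply drops the sign from each time component:

```java
// Illustrative only: the string fix a custom transform would perform on values like "00:00:-50".
public class TimeSignFixer {

    /** Turns "00:00:-50" into "00:00:50" by dropping the sign on each component. */
    public static String fixNegativeTime(String time) {
        String[] parts = time.split(":");
        StringBuilder fixed = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            int value = Math.abs(Integer.parseInt(parts[i])); // remove the negative sign
            if (i > 0) {
                fixed.append(':');
            }
            fixed.append(String.format("%02d", value));       // keep two-digit padding
        }
        return fixed.toString();
    }

    public static void main(String[] args) {
        System.out.println(fixNegativeTime("00:00:-50")); // prints 00:00:50
    }
}
```

The same idea can be ported to a CTL transform inside a Reformat step.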
I am interested in importing a Fixed Width file using Pentaho PDI.
I have used its main GUI tool that sets the widths graphically in Spoon.
BUT if the number of fields is very large, like a few hundred fields, it would be prone to error and take a lot of time.
In other ETL tools, I am able to import a meta-file that describes the column properties, such as name, size etc.
I see that Pentaho has a feature called Metadata Injection, but there are hardly any tutorials at all, just a couple, and either the use cases are really complex and rely on JavaScript for scripting, or they describe it in very abstract terms.
So I hope someone who is familiar with it can explain my particular use case of fixed-width files.
Yes, you can use the Metadata Injection step to apply dynamic properties such as the filename, fields, lengths, data types, etc.
For that:
Create one transformation with the file input step.
Create another transformation with the Metadata Injection step, where you reference the transformation created in step 1.
In the Inject Metadata tab of the Metadata Injection step you can then inject the field lengths (and other column properties) coming from an input step; see the sketch below.
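For example, the column properties could live in a small field-definition file that the transformation with the Metadata Injection step reads (e.g. with a CSV input step) and injects into the fixed-width input step of the template transformation. The file name, columns and values below are purely hypothetical:

```
# fields.csv (hypothetical): one row per fixed-width column
name,length,type
customer_id,10,String
order_date,8,Date
amount,12,Number
```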
Background:
We are using the Cloud Dataflow runner in Beam 2.0 to ETL our data into our warehouse in BigQuery. We would like to use the BigQuery Client Libraries (Beta) to create the schema of our data warehouse before the Beam pipelines populate it with data. (Reasons: full control over table definitions, e.g. partitioning; ease of creating DW instances, i.e. datasets; separation of ETL logic from DW design; and code modularisation.)
Problem:
The BigQuery IO in Beam uses the TableFieldSchema and TableSchema classes under com.google.api.services.bigquery.model to represent BigQuery fields and schemas, while the BigQuery Client Libraries use TableDefinition under the com.google.cloud.bigquery package for the same thing, so the field and schema definitions cannot be defined in one place and re-used in another.
Is there a way to define the schema at one place and re-use it?
Thanks,
Soby
p.s. we are using the Java SDK in Beam
A similar question was asked here.
I wrote some utils that might be of interest to you and published them on GitHub.
ParseToProtoBuffer.py downloads the schema from BigQuery and parses it into a Protobuf schema (you might want to look into Protobufs to boost your pipeline's performance as well). If you compile this into a Java class and use it in your project, you can use the makeTableSchema function in ProtobufUtils.java to get the TableSchema for that class. You might want to use makeTableRow as well if you decide to develop your pipeline with Protobufs.
The code I pushed there is WIP and not being used in production or anything yet, but I hope it gives you a push in the right direction.
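If you would rather not go through Protobuf, another option is to keep a single field list in your own code and derive both representations from it. The sketch below is only an illustration (the class, field names and types are made up), and the client-library calls assume a reasonably recent google-cloud-bigquery version where Field.of and Schema.of are available:

```java
// Illustrative sketch: one field list as the single source of truth,
// converted to both the Beam/BigQueryIO schema and the client-library schema.
import java.util.ArrayList;
import java.util.List;

import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableSchema;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.LegacySQLTypeName;
import com.google.cloud.bigquery.Schema;

public class WarehouseSchema {

    // Single place where field names and types are declared (hypothetical fields).
    private static final String[][] FIELDS = {
        {"user_id", "STRING"},
        {"event_time", "TIMESTAMP"},
        {"amount", "FLOAT"},
    };

    /** Schema for BigQueryIO in the Beam pipeline. */
    public static TableSchema beamTableSchema() {
        List<TableFieldSchema> fields = new ArrayList<>();
        for (String[] f : FIELDS) {
            fields.add(new TableFieldSchema().setName(f[0]).setType(f[1]));
        }
        return new TableSchema().setFields(fields);
    }

    /** Schema for the BigQuery client library, used to create the tables up front. */
    public static Schema clientSchema() {
        List<Field> fields = new ArrayList<>();
        for (String[] f : FIELDS) {
            fields.add(Field.of(f[0], toLegacyType(f[1])));
        }
        return Schema.of(fields.toArray(new Field[0]));
    }

    private static LegacySQLTypeName toLegacyType(String type) {
        // Only the types used above; extend as needed.
        switch (type) {
            case "STRING":    return LegacySQLTypeName.STRING;
            case "TIMESTAMP": return LegacySQLTypeName.TIMESTAMP;
            case "FLOAT":     return LegacySQLTypeName.FLOAT;
            default: throw new IllegalArgumentException("Unhandled type: " + type);
        }
    }
}
```

The client-library Schema can then be wrapped in a table definition (e.g. StandardTableDefinition) when creating the datasets and tables before the pipeline runs.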
I'm trying to build an interface for my tool to query a semantic/relational DB using C#/.NET.
I now need a layer above the query layer to convert natural-language input to SQL/SPARQL. I've read through papers on NLIs, but building such a layer is too much work for my project; besides, it's not the main target, it's an add-on.
I don't care whether the DLL supports guided input only or free text and handles unmatched input; I just need a DLL to start from and add some code to.
Whether it supports both SQL and SPARQL doesn't really matter, because I can manage to convert one to the other within my project's domain (something local).
Any ideas on available DLLs?
You could try my Natural Language Engine for .NET. Sample project on Bitbucket and Nuget packages available.
Using TokenPhrase in your rules lets you match any unmatched strings in the input, or quoted strings.
In the next revision that I'll be releasing soon it also supports 'production rules' and operator precedence which make it even easier to define your grammar.
Uniquely it delivers strongly-typed .NET objects and executes your rules in a manner similar to ASP.NET MVC with controllers, dependency injection and action methods. All rules are defined in code simply by writing a method that accepts the tokens you want to match. It includes tokens for common things like numbers, distances, times, weights and temporal expressions including finite and infinite temporal expressions.
I use it in various applications to build SQL queries so it shouldn't be too hard to use it to create SPARQL queries.
Check out Kueri.me
It's not a DLL but rather a server exposing an API, so currently it doesn't have a wrapper specifically for C#. There's an API exposed via XML-RPC that you can integrate with any language.
It converts English to SQL and gives Google-style suggestions if you want to implement a search box (it supports several DB providers, like MySQL, MSSQL, etc.).
At work, we design solutions for rather big entities in the financial services area, and we prefer to have our deployment mappings in XML, since it's easy to change without having to recompile.
We would like to do our development using annotations and generate from them the orm.xml mapping files. I found this proof of concept annotation processor, and something like that is what I'm looking for, but something that has support for most JPA annotations.
We're using WebSphere for development, so we would prefer something that considers the OpenJPA implementation.
Here is a possible approach:
use the annotated classes to generate the database schema
use OpenJPA's SchemaTool to reverse engineer the database schema into their XML schema file
use OpenJPA's ReverseMappingTool to generate XML mapping files from the XML schema file
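For illustration, the starting point of that chain is just an ordinary annotated entity like the (hypothetical) one below; the orm.xml produced at the end carries the equivalent mapping, so it can be changed and redeployed without recompiling:

```java
// Hypothetical annotated entity used as input to the schema/mapping generation above.
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table(name = "ACCOUNT")
public class Account {

    @Id
    @Column(name = "ACCOUNT_ID")
    private Long id;

    @Column(name = "IBAN", length = 34, nullable = false)
    private String iban;

    // getters and setters omitted for brevity
}
```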