How would I discover all field mappings in an Informatica repository?

Within an Informatica repository, how might I generate a report to show, at a high level, how every field in every target was derived? For example, I'd like to show:
Whether a field was "simply" passed through from a source
Whether a field was computed and/or transformed and/or aggregated, etc., based on one or more source fields and perhaps external factors
I know this is a vague question, but it is one I have been asked to answer myself.
Might a query of Informatica's tables yield this kind of information? I'm new both to Informatica and to this specific repository.
Thanks in advance!

Informatica provides repository database views (the MX views) for retrieving information from the repository. Please see the Informatica PowerCenter Repository Guide for more detailed information about the views. It may be possible to get the information you are looking for from them, but I don't think it is easy; the repository views give you, for example, general information about the source and target tables.
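For instance, the MX views can be queried like ordinary tables, and one of them reports field-level source-to-target lineage per mapping. Below is a minimal sketch of peeking at it from Python; the ODBC DSN, credentials, and the assumption that the relevant view is named REP_FLD_MAPPING in your version should all be verified against the Repository Guide.

import pyodbc  # assuming an ODBC DSN that points at the repository database

# Placeholder DSN and credentials for the PowerCenter repository schema.
conn = pyodbc.connect("DSN=infa_repo;UID=repo_reader;PWD=repo_password")
cursor = conn.cursor()

# REP_FLD_MAPPING is the MX view that (in the versions I have seen) exposes
# source-field-to-target-field lineage per mapping; confirm the exact view
# and column names in the Repository Guide for your release.
cursor.execute("SELECT * FROM rep_fld_mapping")

columns = [d[0] for d in cursor.description]
for row in cursor.fetchmany(20):  # just a peek at the first few rows
    print(dict(zip(columns, row)))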

It will be difficult to achieve using the repository alone. Look at the Data Lineage feature available in PowerCenter Metadata Manager. If you don't have this available and need to analyze source-to-target dependencies for a single workflow, you can try a tool I've created: xmlAnalyzer. It's a simple online tool that checks all mappings within an XML file and lists all source-target dependencies. You can find out, for example, how many source fields have an impact on a given target field. It will not, however, show what the transformations are along the way.
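If you only need a rough, one-off look at those dependencies yourself, the XML export is plain enough to inspect with a small script. Here is a hedged sketch that lists the hop-by-hop field connections in a mapping export; it assumes the usual CONNECTOR elements, the file name is a placeholder, and the attribute names should be checked against your own export.

import xml.etree.ElementTree as ET

# Parse a mapping/workflow XML export from the Repository Manager.
tree = ET.parse("workflow_export.xml")  # placeholder file name

for mapping in tree.iter("MAPPING"):
    print("Mapping:", mapping.get("NAME"))
    # Each CONNECTOR element describes one field-to-field link between
    # transformation instances within the mapping.
    for link in mapping.iter("CONNECTOR"):
        print("  {}.{} -> {}.{}".format(
            link.get("FROMINSTANCE"), link.get("FROMFIELD"),
            link.get("TOINSTANCE"), link.get("TOFIELD")))

Chaining these individual hops from the source qualifiers all the way to the targets is exactly the part a dedicated tool automates.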

You may find the following tool from the Informatica Marketplace useful for this requirement.
https://community.informatica.com/solutions/xml_reporter_for_informatica

Do semantic tools like Anzo create a copy of the data?

I'm new to semantic technologies. I understand what RDF, OWL, ontologies and other basic terminology are, and how semantic search uses them. When we create a semantic search module using Anzo with enterprise search capabilities, it connects to various data sources and creates relationships between them. Now I'm interested in knowing what a semantic tool like Anzo does internally:
1. Does it create a copy of the data on the local machine, or does it hit the data sources every time we execute a SPARQL query?
2. If it stores data, is the data stored in its raw format, or only after cleaning and creating semantic relations between the sources?
3. What happens to the data after the query is executed? How does it get current data every time?
Any thoughts on this would be valuable to me.
Thanks a lot in advance!
Based on your comments, it appears you're using the Anzo Graph Query Engine? If so, the answers to your questions are:
1. A copy of the data is held in memory.
2. Not clear from any of the published information.
3. It doesn't. You need to load the data using the 'LOAD' command.
A bit more on 3: you would be responsible for implementing a mechanism to keep the data in the graph up to date with the underlying data source (which might be as simple as rebuilding the graph from a nightly dump, or as involved as implementing change data capture against the underlying store that replicates CRUD operations onto the graph).
My answers are based on the marketing and support information available on the CambridgeSemantics site.

Dynamically add columns and adapt the schema to them

If my source table keeps getting one column added to it at a time, how do I map the new column to my query/source?
It is different from a slowly changing dimension: it is not the records that are changing, but the number of columns itself, i.e. the schema.
How do I design a job to do this? Any solution is fine, even if it requires custom functions, scripts, etc.
From my perspective it is not possible. It would effectively require a form of SQL injection (i.e. you would somehow have to play with the ATL or repository metadata), which I bet SAP would never recommend. I think Pentaho and Talend do indeed support this functionality.
In my opinion, it may be possible by using template tables.
Thanks.
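If a scripted workaround is acceptable, one generic pattern outside the ETL tool is to diff the source and target column lists before each load and add whatever is missing to the target. Here is a rough sketch against information_schema; the DSNs, table name, and the assumption that data types can be copied verbatim between the two platforms are all simplifications for illustration.

import pyodbc  # assuming ODBC DSNs exist for both databases

# Placeholder connection strings.
src_conn = pyodbc.connect("DSN=source_db;UID=user;PWD=password")
tgt_conn = pyodbc.connect("DSN=target_db;UID=user;PWD=password")

def columns(conn, table):
    # Return {column_name: data_type} for a table via information_schema.
    cur = conn.cursor()
    cur.execute(
        "SELECT column_name, data_type FROM information_schema.columns "
        "WHERE table_name = ?", (table,))
    return {name: dtype for name, dtype in cur.fetchall()}

TABLE = "customer"  # placeholder table name
src_cols = columns(src_conn, TABLE)
tgt_cols = columns(tgt_conn, TABLE)

# Add to the target any column that has newly appeared in the source.
cur = tgt_conn.cursor()
for name, dtype in src_cols.items():
    if name not in tgt_cols:
        # NOTE: real code would map data types between the two platforms.
        cur.execute("ALTER TABLE {} ADD {} {}".format(TABLE, name, dtype))
tgt_conn.commit()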

Liquibase load data in a format other than CSV

With the load data option that Liquibase provides, one can specify seed data in a CSV format. Is there a way I can provide say, a JSON or XML file with data that Liquibase would understand?
The use case is that we are trying to load some sample data which is hierarchical, e.g. a Category - Subcategory relation, which would require putting in the parent id for all related categories. Ideally there would be a way to avoid including the ids in the seed data by using, say, JSON:
{
"MainCat1": ["SubCat11", "SubCat12"],
"MainCat2": ["SubCat21", "SubCat22"]
}
Very likely this is not supported (I couldn't make Google help me), but is there a way to write a plugin or something that does this? A pointer to a guide (if any) would help.
NOTE: This is not about specifying the change log in that format.
This is not currently supported, and supporting it robustly would be pretty difficult. The main difficulty lies in the fact that Liquibase is designed to be database-platform agnostic, combined with the design goal of being able to generate the SQL required for an operation without actually performing the operation live.
Inserting data like you want without knowing the keys and just generating SQL that could be run later is going to be very difficult, perhaps even impossible. I would suggest approaching Nathan, who is the main developer for Liquibase, more directly. The best way to do that might be through the JIRA bug database for Liquibase.
If you want to have a crack at implementing it, you could start by looking at the code for the LoadDataChange class (source on GitHub), which is where the CSV support currently lives.
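In the meantime, a common workaround is to pre-process the hierarchical JSON into the flat CSV files that the existing loadData change already understands, generating the surrogate ids in the script rather than maintaining them by hand. A minimal sketch follows; the file names, column names, and id scheme are assumptions for illustration.

import csv
import json

# Read hierarchical seed data shaped like the example in the question.
with open("categories.json") as f:
    tree = json.load(f)

categories = []      # rows for the parent-category CSV
subcategories = []   # rows for the subcategory CSV

next_id = 1
for main_cat, sub_cats in tree.items():
    parent_id = next_id
    next_id += 1
    categories.append({"id": parent_id, "name": main_cat})
    for sub_cat in sub_cats:
        subcategories.append(
            {"id": next_id, "name": sub_cat, "parent_id": parent_id})
        next_id += 1

# Write CSVs that standard Liquibase loadData changes can reference.
with open("categories.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "name"])
    writer.writeheader()
    writer.writerows(categories)

with open("subcategories.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "name", "parent_id"])
    writer.writeheader()
    writer.writerows(subcategories)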

How to turn a MongoDB collection into a 'Table'

I've been given access to a cloud MongoDB (MongoLab) and need to extract some data into Excel so I can analyse it. The data isn't particularly complicated or large and is well suited to a 'normal' relational structure.
My research suggests things are trickier because the data has 'nested' aspects, although conceptually it's pretty clear how this would become a table. In a document from the collection, the top-level fields would essentially become columns in the table, while each entry in the nested "marketing_event" array would create a row, with the specifics of each event also ending up in columns.
Ideally I would use Power Query to get the data into Power Pivot but at this point anything will do!
I've tried a bunch of things, none of which have got me much closer to the end result I'm looking for:
I downloaded MongoVue which I used to successfully connect to the database and while it enabled me to see the data in a basic table form, it does nothing with the nested stuff and the documentation is minimal in terms of how it could be of more use.
I also tried Pentaho PDI based on this article: http://sqlmag.com/blog/integrating-mongodb-and-open-source-data-stores-power-pivot but the steps aren't detailed, and although I can see the collection, my attempts to replicate some sample queries I found on the web were totally unsuccessful.
I've tried to get a trial of Simba's ODBC connector but as yet the download doesn't seem to be working. I have contacted them but without response just yet.
I've even installed Mongo locally and tried to use the command prompt to connect which I was unable to do. Even if I pursued this I wouldn't be confident about knowing where to start in terms of creating the end product.
Happy to hear any suggestions or recommendations.
TIA
Jacob
Here's a solid ODBC driver that helps maintain the fidelity of your MongoDB data by exposing the nested MongoDB data model as a set of relational tables to Excel and other ODBC apps. For the sample document above, this driver will do exactly what you're looking for: the embedded documents and arrays can be extracted as separate related tables from the fields at the root level of the document.
https://www.progress.com/odbc/mongodb
I don't know if you have already found a solution, but Simba ODBC provides support for nested arrays.
Have a look here:
https://www.simba.com/resources/webinars/connect-tableau-big-data-source. This is an example of how to connect Tableau BI to MongoDB. You might find it helpful.
And some more information on handling NoSQL data in BI tools is provided in this whitepaper: http://info.mongodb.com/rs/mongodb/images/MongoDB_BI_Analytics.pdf
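If an ODBC driver turns out not to be an option, the same flattening can be scripted and the output opened in Excel or Power Query. Here is a rough sketch using pymongo; the connection string, collection name, and the assumption that the nested events live in an array field called "marketing_events" are placeholders to adjust against the real documents.

import csv

from pymongo import MongoClient

# Placeholder connection details; substitute the MongoLab URI.
client = MongoClient("mongodb://user:password@host:27017/dbname")
collection = client["dbname"]["customers"]

rows = []
for doc in collection.find():
    # Scalar top-level fields become the shared columns of every row.
    base = {k: v for k, v in doc.items()
            if not isinstance(v, (list, dict)) and k != "_id"}
    # Each element of the nested array becomes its own row.
    for event in doc.get("marketing_events", []):
        row = dict(base)
        for k, v in event.items():
            row["event_" + k] = v
        rows.append(row)

# Write a CSV that Excel / Power Query can open directly.
if rows:
    fieldnames = sorted({k for r in rows for k in r})
    with open("marketing_events.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)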

How do you maintain a library of useful SQL in a team environment?

At my work everyone has SQL snippets that they use to answer questions. Some are specific to a customer, while some are generic for a given database. I want to consolidate those queries into a library/repository that can be accessed by anyone on the team. The requirements would be:
Accessible
Searchable
Taggable (multiple tags allowed per snippet)
Exportable (create a document containing all queries with certain tags)
I'm interested in what has been found to work in other team environments.
You could use a wiki.
You could get started with something as simple as Tiddly wiki.
A wiki is a great approach.
For database-specific or project-specific snippets it's also very useful to have links to where a similar construct occurs in the code. We use Trac's wiki, which gives nice integration with our SVN for this.
Rather than pasting SQL snippets, I would consider graduating to an ORM (Object-Relational Mapper) or some other library to make representing and manipulating the data easier. It provides a layer of encapsulation to guard against schema changes and a layer of abstraction so you can think of the data in terms of business logic (i.e. a user) rather than a collection of tables (i.e. a user table, a password table, an access table...).
In Perl this would be something like DBIx::Class.
Another approach you may want to look at is creating views in your database. 'select * from some_view' can hide quite a bit of SQL. You'll still want to use a wiki to document them, but if it's a view you don't have to worry about people keeping outdated copies.
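As a quick, self-contained illustration of the view idea, here is a sketch using SQLite; the tables and the view are invented for the example.

import sqlite3

# In-memory database purely for the illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);

    -- The view captures the join and aggregation once, in the database itself.
    CREATE VIEW user_order_totals AS
    SELECT u.name, SUM(o.total) AS total_spent
    FROM users u JOIN orders o ON o.user_id = u.id
    GROUP BY u.name;

    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);
""")

# Team members only need to remember the simple query, not the join behind it.
for row in conn.execute("SELECT * FROM user_order_totals"):
    print(row)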