Loading data regularly from ServiceNow to Pentaho Kettle - pentaho

I'm working on a BI project and I want to retrieve data from ServiceNow and load it into Pentaho Data Integration so I can record it in my data warehouse. I want to do this regularly; in other words, I want to retrieve only the new records from ServiceNow that haven't yet been loaded into the data warehouse. Does anyone know how I can achieve this? Please help.

The question is too vague.
You need to set up an ETL job that incrementally loads data. That will require you to define a timestamp or incremental key to identify which records are more recent than the ones already loaded.
You will need to schedule that job, e.g., using crontab and calling kitchen from the command line.
Your question pretty much translates to "please develop my ETL project". Too wide in scope.
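That said, here is a very rough Python sketch of just the incremental-pull half, using the ServiceNow Table REST API with a sys_updated_on watermark. The instance URL, table name, credentials, and watermark file are placeholders, and writing the rows into the warehouse is left out:

```python
# Rough sketch: pull only the ServiceNow records changed since the last load,
# using the Table REST API and a sys_updated_on watermark kept in a local file.
# Instance URL, table name, credentials, and the watermark file are placeholders;
# pagination and error handling are omitted.
import json

import requests

INSTANCE = "https://your-instance.service-now.com"  # assumption
TABLE = "incident"                                  # assumption
WATERMARK_FILE = "last_load.json"

def read_watermark():
    try:
        with open(WATERMARK_FILE) as f:
            return json.load(f)["sys_updated_on"]
    except FileNotFoundError:
        return "1970-01-01 00:00:00"  # first run: take everything

def fetch_new_records(since):
    params = {
        "sysparm_query": f"sys_updated_on>{since}^ORDERBYsys_updated_on",
        "sysparm_limit": 1000,
    }
    resp = requests.get(
        f"{INSTANCE}/api/now/table/{TABLE}",
        params=params,
        auth=("etl_user", "etl_password"),  # assumption: basic auth
        headers={"Accept": "application/json"},
    )
    resp.raise_for_status()
    return resp.json()["result"]

if __name__ == "__main__":
    rows = fetch_new_records(read_watermark())
    # ... write `rows` to the warehouse here (Table Output step, SQL inserts, etc.) ...
    if rows:
        with open(WATERMARK_FILE, "w") as f:
            json.dump({"sys_updated_on": rows[-1]["sys_updated_on"]}, f)
```

In Kettle itself the same idea is usually a REST Client or Table Input step whose query is filtered by the stored watermark, wrapped in a job that cron launches through kitchen.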

Related

How to manually force a scheduled report to run in BigQuery?

I am wondering if there is an option to run a scheduled report in BigQuery manually. I've got a report in Google Data Studio whose source is a BigQuery table that is rebuilt from a BigQuery view every hour. Sometimes when I am working on the query I would like to check whether the changes I have made are correct, but I have to wait up to an hour to see the result. I read that a backfill can do this, but if I set the start date and end date to today I can't go any further. How can I solve this problem?
If you want real-time reports, just create a View with your query and create a Report in Data Studio that consumes this View.
Another approach would be to put the custom query directly in Data Studio. This way you can change the query in Data Studio and it will reprocess your data every time you refresh the report.
Obviously, this is not the most cost-effective or efficient solution, but it is a good workaround if you just want to test something while developing.
For a production scenario (with lots of concurrent users), if you're able to pre-process your data as you already do, your reports will be faster and they'll probably consume fewer BigQuery resources.
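To make the first suggestion concrete, here is a minimal sketch using the google-cloud-bigquery Python client; the project, dataset, and query are placeholders:

```python
# Minimal sketch: expose the query as a view so Data Studio always reads fresh
# results instead of waiting for the hourly materialization. Project, dataset,
# table names, and the query itself are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # assumption

view = bigquery.Table("my-project.reporting.sales_view")  # assumption
view.view_query = """
    SELECT order_date, SUM(amount) AS revenue
    FROM `my-project.raw.orders`
    GROUP BY order_date
"""
view = client.create_table(view, exists_ok=True)
print("View ready:", view.full_table_id)

# While developing, you can also just run the query on demand instead of
# waiting for the scheduled refresh:
for row in client.query("SELECT * FROM `my-project.reporting.sales_view` LIMIT 10").result():
    print(dict(row))
```

Note that with exists_ok=True an existing view is simply reused; if you change the query, drop and recreate the view rather than relying on this call.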

Best way to replicate MongoDB NoSQL into SQL tables

How can I replicate (incrementally load) MongoDB (NoSQL) data into SQL tables?
We have a web-based solution that loads data into MongoDB. The data size is almost 1 TB. We need to do BI reporting in the Looker BI tool, but Looker doesn't support MongoDB directly, so we have to replicate our data into SQL form; we have Redshift as the target database.
Main requirements for parsing NoSQL to SQL:
The parent node should be the main table
Nested nodes/arrays should become separate tables with a parent key (foreign key)
Whenever a new field is introduced in a source MongoDB document, it should automatically start replicating to the target database
Incremental refresh from source to target.
I've seen Stitch Data ETL, which fits my requirements, but I'm looking for an open-source ETL/DB tool or library.
Please help.
Posting an answer to help out others with the same requirements.
I wasn't able to find any open-source ETL tool that can fulfill all four requirements above.
I tried writing Python code to do so. In the end, a paid tool named Precog helped me fulfill all of the requirements above, and it is a little cheaper than Stitch Data ETL.
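For anyone who still wants to try the Python route, below is a rough sketch of the incremental extract and the parent/child flattening with pymongo. The connection string, collection name, timestamp field, and key names are assumptions, and actually loading the flattened rows into Redshift (typically S3 + COPY) is left out:

```python
# Rough sketch: pull documents changed since the last run and flatten them into
# a parent row plus one child table per nested array, keyed by the parent _id.
# Connection string, collection name, and the updated_at field are assumptions;
# loading the flattened rows into Redshift (usually S3 + COPY) is omitted.
from datetime import datetime, timezone

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumption
coll = client["appdb"]["orders"]                   # assumption

last_run = datetime(2024, 1, 1, tzinfo=timezone.utc)  # read from a state table in practice

parent_rows = []
child_rows = {}  # child table name -> list of rows

for doc in coll.find({"updated_at": {"$gt": last_run}}):  # assumed timestamp field
    parent = {}
    for key, value in doc.items():
        if isinstance(value, list):
            # Nested array -> separate child table with a foreign key to the parent.
            table = f"orders_{key}"
            for item in value:
                row = item if isinstance(item, dict) else {"value": item}
                child_rows.setdefault(table, []).append({"order_id": str(doc["_id"]), **row})
        elif isinstance(value, dict):
            # Nested object -> flatten into prefixed columns on the parent row.
            for sub_key, sub_value in value.items():
                parent[f"{key}_{sub_key}"] = sub_value
        else:
            parent[key] = value
    parent_rows.append(parent)

print(len(parent_rows), "parent rows;", {t: len(r) for t, r in child_rows.items()})
```

Requirement 3 is the awkward part: new fields simply show up as new keys in these dictionaries, so the loader has to keep track of the columns it has already created and issue ALTER TABLE ... ADD COLUMN when it sees one it doesn't know.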
Thanks

Is There A Way To Append Deleted and Updated Data To A History Table?

Alright, so I am working on a project at work and I need to append data to a new history table every time the data in our other table is updated or deleted. However, we get access to our SQL tables from another company, and they only gave us read-only privileges; we can only view the tables through Microsoft Power BI and Excel.
So I wanted to see if there was any way of creating a trigger of some sort.
Thank You
From your question, you are trying to do an incremental load of data so you can append new data to a table, and you are also looking for some sort of archive process into a history table via a trigger. Incremental refresh is a Power BI Premium-only feature, and moving data based on a trigger is not supported in Power BI at all.
I would recommend trying to get better access to the SQL source, or using Excel to get the data and dump it into Excel/CSV files, then creating a process (in some other database/ETL tool) that loads the new file(s), works out the changes, and outputs the results to a file or table that Power BI can read from.
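To sketch what that compare step could look like with daily CSV extracts (the file names and key column below are made up, not part of any Power BI feature):

```python
# Sketch: compare today's read-only extract with yesterday's and append the
# updated and deleted rows to a history CSV that Power BI can read.
# File names and the key column are assumptions; history.csv is assumed to
# already exist with a header row.
from datetime import date

import pandas as pd

KEY = "id"  # assumption: the source table's primary key column

old = pd.read_csv("extract_yesterday.csv")
new = pd.read_csv("extract_today.csv")

merged = old.merge(new, on=KEY, how="left", suffixes=("_old", "_new"), indicator=True)

# Rows that existed yesterday but are gone today -> deleted.
deleted = merged[merged["_merge"] == "left_only"]

# Rows present in both extracts where any non-key column changed -> updated.
both = merged[merged["_merge"] == "both"]
changed_mask = pd.Series(False, index=both.index)
for col in (c for c in old.columns if c != KEY):
    changed_mask |= both[f"{col}_old"] != both[f"{col}_new"]
updated = both[changed_mask]

history = pd.concat([
    deleted.assign(change_type="deleted"),
    updated.assign(change_type="updated"),
]).drop(columns="_merge")
history["change_date"] = date.today().isoformat()
history.to_csv("history.csv", mode="a", header=False, index=False)
```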
Hope that helps

Excel sheet to SQL table upload automation

I am trying to find the easiest, simplest, and quickest way to automatically upload a sheet from an Excel file in a folder to a table in SQL Server 2012 every morning as a job.
SSIS is the ETL tool you could use, but if it’s a very simple job you can just write a BCP command.
https://learn.microsoft.com/en-us/sql/tools/bcp-utility?view=sql-server-2017
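For example, a scheduled script along these lines would do it: convert the sheet to CSV first (bcp only loads flat files), then shell out to bcp. The paths, server, and table names below are placeholders, and -T assumes a trusted Windows connection:

```python
# Sketch: convert the Excel sheet to a flat CSV (bcp only loads flat files),
# then bulk load it with the bcp utility. Paths, server, and table names are
# placeholders; -T assumes a trusted (Windows) connection.
import subprocess

import pandas as pd  # needs openpyxl installed to read .xlsx

EXCEL_PATH = r"C:\uploads\daily_numbers.xlsx"   # assumption
CSV_PATH = r"C:\uploads\daily_numbers.csv"
TARGET_TABLE = "MyDatabase.dbo.DailyNumbers"    # assumption
SERVER = r"MYSERVER\SQL2012"                    # assumption

# Read the first sheet and write it out without the header row so bcp can
# consume it in character mode.
df = pd.read_excel(EXCEL_PATH, sheet_name=0)
df.to_csv(CSV_PATH, index=False, header=False)

subprocess.run(
    ["bcp", TARGET_TABLE, "in", CSV_PATH,
     "-S", SERVER,  # target server\instance
     "-T",          # trusted connection (use -U/-P for SQL logins instead)
     "-c",          # character data
     "-t,"],        # comma field terminator
    check=True,
)
```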
The way to schedule it is to add the task as a SQL Server Agent job on the server. A few things to bear in mind with ETL:
Will your file be named the same each day?
Do you need to retain archived versions of the file?
How do you do error handling if it’s absent or malformed?
Does the DDL need to change periodically to accommodate new date ranges (i.e., a new day/month/year)?
Will this pattern be reused in the future?
Do you need to test logically (duplicates/logical fallacies/referential integrity etc)?
Under whose account will the job run (hint, don’t use your own - get a service account)?
The more complex the answers to these questions are, the more likely it is that you'll need a real ETL tool like SSIS.

The Pentaho BI Platform Workflow Issue

I have been working with Pentaho for the last few days. I have been able to set up the Pentaho Report Designer to generate a sample report by following their documentation. Then I followed this article http://www.robertomarchetto.com/www/how_to_use_pentaho_report_designer_tutorial and managed to export the report to the Pentaho BI Server.
What I don't understand is the Pentaho workflow. What process should I follow; in other words, what is the purpose of exporting the report to the Pentaho BI Server? Why is there a Data Integration tool? Why is there a BI Server when I can export the report from the Designer tool?
Requirement
All I want to do is retrieve the data from the MySQL DB, put it into a data mart, and then generate a report from the data mart. (According to what I have read, creating a data mart is the efficient way.)
How can I get it done?
Pentaho Data Integration can be used to automate this report generation.
In Report Designer you pass a parameter or set of parameters to generate a single report output.
With Data Integration you can generate the reports for different sets of parameters. For example, if reports are generated on a daily basis, the process can be automated for the whole month, so there is no need to generate reports daily and by hand.
And using the Pentaho Business Intelligence Server, all of these operations can be scheduled.
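As a rough illustration of that kind of batch automation, the sketch below runs the same Kettle job once per day of a month by calling Kitchen with a named parameter; the Kitchen path, job file, and parameter name are assumptions:

```python
# Sketch: run the same Kettle job once for every day of a month by calling
# Kitchen with a named parameter. The Kitchen path, job file, and parameter
# name are assumptions.
import subprocess
from datetime import date, timedelta

KITCHEN = "/opt/pentaho/data-integration/kitchen.sh"  # assumption
JOB_FILE = "/etl/jobs/daily_report.kjb"               # assumption

day = date(2024, 1, 1)
while day.month == 1:
    subprocess.run(
        [KITCHEN,
         f"-file:{JOB_FILE}",
         f"-param:REPORT_DATE={day.isoformat()}",  # assumed job parameter
         "-level:Basic"],
        check=True,
    )
    day += timedelta(days=1)
```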
To generate data/tables (fact tables/dimension tables) in the MySQL DB from different sources such as files or other databases - the Data Integration tool comes into the picture.
To create a schema on top of the fact tables - Mondrian.
To handle users/roles on top of the created cubes - Metadata Editor.
To create simple reports on top of small tables - Report Designer.
For sequential execution (in one go) of DI jobs/transformations, reports, and JavaScript - Design Studio.
thanks to user surya.thanuri # forums.pentaho.com
The Data Integration tool is mostly for ETL; it's a separate tool and you can ignore it unless you are doing complex analysis of data from multiple dissimilar data sources. You don't need to 'export' reports to the Pentaho server; you can write them directly to a directory and then refresh the repository from inside the Pentaho web application. Exporting them is just one workflow technique.
You're going to find that there are about a dozen ways to do any one thing with Pentaho. For instance, I use CDA datasources with my reports instead of placing the SQL code inside the report. Alternatively, you can link up to a Data Integration server to execute Data Integration scripts and view a result set.
Just to answer your data mart question: in general, a data mart should probably be populated either by the Data Integration tool (depending on your situation, I don't exactly recommend this) or by database functions/replication streams (recommended).
Just to hazard a guess, it sounds like someone tossed you a project saying: "We need a BI system; here's the database where the data is stored, and here are the reports we're already getting. X looked at Pentaho and liked it. You should use that."
The first thing you need to do is understand the shape of the data: volume, tables, interrelations. Figure out what the real questions they want to answer are. Determine whether they need real-time reporting, and so on. Just getting the data mart together, if you even need one, can take quite a while. I think you may have jumped the gun on Pentaho itself.
thanks to user flamierd # forums.pentaho.com