PENTAHO 7.1 - Generating a large number of different reports by script

On my Pentaho CE 7.1 installation I often need to generate a large number of reports (*.prpt) with different attributes.
For example, I have a report that shows data for a day, and I need to generate those reports for each day since September 2017.
Is there any way to create a script that would execute those *.prpt files one by one for each day from September 2017 until now?
I have been checking the API in the official Pentaho documentation, but there does not seem to be such an option. Perhaps there is some kind of hack, like sending parameters in the URL?

Create your *.prpt with the Report Designer and use a parameter to select one day in your data.
Then open PDI and build a transformation whose first step generates the dates starting from 2017-09-10, and pass each date to a Pentaho Reporting Output step. Then do what you need with the report output (mail it, save it in the pentaho-solutions folder, ...).
A very similar use case ships in the samples directory of Pentaho Data Integration; the transformation is named Pentaho Reporting Output Example.ktr.
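If you would rather go the URL route mentioned in the question, the BA server can also render a published *.prpt over HTTP, so a small script can loop over the days and fetch one output per date. A minimal sketch in Python, assuming a report published at /public/daily_report.prpt, a report parameter named report_date, and basic-auth credentials (these names are placeholders; adjust them to your installation):

import datetime
import requests

BASE_URL = "http://localhost:8080/pentaho"      # assumed server address
REPORT_PATH = ":public:daily_report.prpt"       # assumed repository path, ":"-separated
AUTH = ("admin", "password")                    # assumed credentials

start = datetime.date(2017, 9, 1)
end = datetime.date.today()

day = start
while day <= end:
    params = {
        "output-target": "pageable/pdf",     # render the report as PDF
        "report_date": day.isoformat(),      # assumed report parameter name
    }
    url = f"{BASE_URL}/api/repos/{REPORT_PATH}/generatedContent"
    resp = requests.get(url, params=params, auth=AUTH)
    resp.raise_for_status()
    with open(f"report_{day.isoformat()}.pdf", "wb") as f:
        f.write(resp.content)
    day += datetime.timedelta(days=1)

The PDI route described above is usually the better fit when you also need to mail or archive the output, but a script like this is handy when you just need a folder of PDFs.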

Related

Can I automate an SSIS package that requires user input?

I've been developing a data pipeline in SSIS on an on-premises VM during my internship, and was tasked with gathering data from Marketo (see https://www.marketo.com/ ). The package runs without error: it starts with a Truncate Table Execute SQL task, followed by 5 data flow tasks that gather data from different sources within Marketo and move them to staging tables within SQL Server, and concludes with an Execute SQL task to load the processing tables with only new data.
The problem I'm having: my project lead wants this process automated to run daily. I have found plenty of resources online that show how to automate an SSIS package, but within my package I have to supply user input for the Marketo source: it requires a time frame from which to gather data.
Is it possible to automate this package to run daily even though user input is required? I was thinking there may be a way to increment the start and end dates by one day each run (so the start date could be 2018-07-01 and the end date 2018-07-02, each incremented daily), so the package can run by itself. Thank you in advance for any help!
As you are automating your extract, this suggests that you have a predefined schedule on which to pull that data. From this schedule, you should be able to work out your start and end dates based on the date that the package was run.
In SSIS there are numerous ways to achieve this, depending on the data source and your connection method. If you are using a Script Task, you can simply calculate the dates required in your .NET code. Another alternative would be to use calculated variables that return the result of an expression, such as:
DATEADD("Month", -1, GETDATE())
Assuming you schedule your extract to run on the first day of the month, the expression above would return the first day of the previous month.
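The same idea works for a daily window: derive both dates from the run date instead of prompting for input. The arithmetic is trivial; the sketch below is plain Python purely to illustrate the values you would compute (in SSIS you would produce them in the Script Task's .NET code or in two variable expressions), and the variable names are placeholders:

from datetime import date, timedelta

run_date = date.today()                      # the day the scheduler executes the package
start_date = run_date - timedelta(days=1)    # e.g. 2018-07-01
end_date = run_date                          # e.g. 2018-07-02

# These two values would be written into the SSIS variables that feed
# the Marketo source's time-frame input.
print(start_date.isoformat(), end_date.isoformat())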

Get the Last Modified date for all BigQuery tables in a BigQuery Project

I have several datasets within a BigQuery project which are populated by various job engines and applications. I would like to maintain a dashboard of the Last Modified dates for every table within our project so that I can monitor job failures.
Are there any command line or SQL commands which could provide this list of Last Modified dates?
For a SQL command you could try this one:
#standardSQL
SELECT *, TIMESTAMP_MILLIS(last_modified_time)
FROM `dataset.__TABLES__` WHERE table_id = 'table_id'
I would recommend, though, that you see whether you can log these errors at the application level. That way you can also understand why something didn't work as expected.
If you are already on GCP you can make use of Stackdriver (it works on AWS as well). We started using it in our projects and I recommend giving it a try (we only tested it with Python applications, so I'm not sure how the tool performs with other clients, but it is probably quite similar).
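If you want the list across every dataset in the project rather than one dataset at a time with __TABLES__, a small client-side script can iterate over the datasets. A minimal sketch using the google-cloud-bigquery Python client, assuming credentials and the default project are already configured in the environment:

from google.cloud import bigquery

client = bigquery.Client()  # picks up the default project and credentials

# Walk every dataset and table in the project and print its last-modified timestamp.
for dataset_item in client.list_datasets():
    dataset_id = dataset_item.dataset_id
    for table_item in client.list_tables(dataset_id):
        table = client.get_table(table_item.reference)  # fetch full table metadata
        print(f"{dataset_id}.{table.table_id}\t{table.modified}")

The output could then be loaded into whatever tool backs your dashboard.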
I've just queried stacked GA4 data using the following code:
SELECT *, TIMESTAMP_MILLIS(last_modified_time)
FROM `analytics_#########.__TABLES__`
WHERE table_id LIKE 'events_2%'
I have kept the 2 after events_ to ensure my intraday tables do not pull through as well.

Import Column transformation hangs without any indication of what is going on

Backstory
I have recently been given the task of maintaining an SSIS process that a former colleague oversaw. I have only minimal experience with BIDS/SSIS/"whatever MS marketing wants to call it now" and have an issue to which I can't seem to find a solution.
Issue
I have a Data Flow that reads image metadata from a table as well as reading the image files themselves.
For the file read, an Import Column transformation (hereby called ICT) is being used, and it hangs indefinitely.
The component is handed 2500 rows of image data (name, path, date created, etc.) and, using the 'path' column, the ICT tries to read each file. I've set the correct input column under 'Input and Output Properties' as well as setting the output column. The input column has the output column's ID in its FileDataColumnId.
When running the process it just hangs, showing yellow, and nothing happens. I can access the images in Explorer, so I know they exist (at least some of them).
Tools used
Windows 7
Visual Studio 2008 SP2
SQL Server 2012
All hints, tips or possible solutions would be appreciated.

Where are MS Reports log files located and how are they accessed?

I have two MS Reports, each with its own dataset.
The first one works, but the other does not fill anything in its table. When I debug the dataset just before showing the report, it is filled, and I used the same setup as with the first report.
I get no errors or any other output. The table just does not show any rows at all.
Are there any log files that can tell me something and, if so, where can I find them? Thanks.
Check 3 things:
See that the dataset bound to the report is the same one that you are filling.
Check whether the report shows the formatted headers (table and column headers); if it does, the table layout is OK.
If you want, you can inspect the report's .rdlc file (it is an XML-based file generated for the report).
The report server log file location can be found in the registry under:
HKLM\Software\Microsoft\Microsoft SQL Server\{Report Server Instance Name, mine is MSRS10_50.MSSQLSERVER}\CPE
The log file location is the data value associated with the ErrorDumpDir key.
When previewing in Report Designer, any error messages will be displayed in the Preview tab. It sounds like you may have a different problem that won't be reported in the logs. Double-check that the query returns data. You may want to use SQL Server Profiler (assuming your database is SQL Server) to debug the queries executed against the database.

How to automate the retrieval of files based on a datestamp

I'm new to the Pentaho suite and its automation functionality. I have files that come in on a daily basis, and two columns need to be added to each one. I have figured out how to add the columns, but now I am stuck on the automation side of things. The filename is constant but it has a datestamp at the end, e.g. LEAVER_REPORT_NEW_20110623.csv. The file will always be in the same directory. How do I go about using Pentaho Data Integration to solve this? I've tried 'Get Files' but that doesn't seem to work.
Create a variable in a previous transformation which contains 20110623 (easy with a Get System Info step to get the date, then a Select Values step to format it to a string, then a Set Variables step).
Then change the filename of the Text File Input step to use:
LEAVER_REPORT_NEW_${variablename}.csv
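For reference, the variable only needs to contain the date formatted as yyyyMMdd. A tiny Python sketch, just to illustrate the string the Get System Info / Select Values / Set Variables chain should produce (the filename prefix comes from the question):

from datetime import date

datestamp = date.today().strftime("%Y%m%d")        # e.g. "20110623"
filename = f"LEAVER_REPORT_NEW_{datestamp}.csv"    # what the Text File Input filename resolves to
print(filename)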