Our organization has used Pentaho for data integration purposes for over a decade. Due to organizational changes earlier this year, our team is to take responsibility for data integration going forward.
The problem is, our team has nearly zero experience with Pentaho and uses Frends for data integration. We want to move everything out of Pentaho and into Frends. But alas, Pentaho seems to be a fairly challenging customer.
Problem 1: There's a lot, and I mean A LOT, of shit in Pentaho. The depth of some jobs is quite staggering, at least by our standards.
Problem 2: There is zero external documentation.
Problem 3: There is almost zero documentation within the jobs.
Problem 4: Pentaho is slow. Switching between open tabs easily takes 10 seconds. Opening a new tab takes 20 seconds, closing one 10 seconds.
Problem 5: A job within a job may be named ABC, but when you open ABC, it is actually named DEF. Combine this with Problem 4, and keeping track of where you are and where you were is hard.
At first I tried to document relevant jobs in our wiki using a tabular structure. But it quickly became a mess that was nearly impossible to navigate. Considering the outcome, spending any more time manually documenting Pentaho seems stupid.
Is there a way to generate (hopefully readable) documentation automatically? I googled for answers and found this page. But given our zero experience with Pentaho, I do not know if that page describes what I am looking for, and the instructions are written in rather broad strokes that I have no idea how to apply in practice.
Thank you in advance.
Re your problem 4: there's no reason for Pentaho to take that long opening or switching tabs. Unless you're running it on particularly slow VMs with little memory, it shouldn't take that long. It's not the quickest GUI out there, but it should be fast enough to work with.
The only documentation tool I know of is the one you mention on your post. But as far as I know it's quite old and I don't know how well it'll work for your version. Give it a try, and see how it goes.
Other than that, if there's no doco available, you have little recourse other than going through the various jobs and sub-jobs, transformations and sub-transformations, and reverse-engineering them.
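If that tool doesn't work with your version, a small script can at least turn the reverse-engineering into a navigable outline. Jobs saved to files (.kjb) are XML; the sketch below assumes the usual Kettle layout (a `<job>` root with `<name>`, `<entries><entry>` elements carrying `<name>`, `<type>` and, for sub-jobs and transformations, `<filename>`, plus `<hops><hop>` with `<from>`/`<to>`). Verify those element names against your own files; database repositories are not covered, and file names may contain Kettle variables that the script leaves unresolved. Printing each job's internal name next to its file name helps with the ABC/DEF mismatch.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path


def summarize_job_xml(xml_data):
    """Return (job_name, entries, hops) parsed from .kjb XML (str or bytes).

    Assumed layout: <job><name>, <entries><entry> with <name>, <type> and,
    for sub-job/transformation entries, <filename>; <hops><hop> with <from>/<to>.
    Missing elements come back as empty strings.
    """
    root = ET.fromstring(xml_data)
    entries = [
        {
            "name": entry.findtext("name", default=""),
            "type": entry.findtext("type", default=""),
            "filename": entry.findtext("filename", default=""),
        }
        for entry in root.iterfind("entries/entry")
    ]
    hops = [
        (hop.findtext("from", default=""), hop.findtext("to", default=""))
        for hop in root.iterfind("hops/hop")
    ]
    return root.findtext("name", default=""), entries, hops


def describe(path):
    """Plain-text outline of one job file: internal name, entries and hops."""
    name, entries, hops = summarize_job_xml(Path(path).read_bytes())
    # The job's internal <name> may differ from its file name, so show both.
    lines = [f"Job '{name}' in {path}"]
    for entry in entries:
        target = f" -> {entry['filename']}" if entry["filename"] else ""
        lines.append(f"  [{entry['type']}] {entry['name']}{target}")
    for source, destination in hops:
        lines.append(f"  hop: {source} => {destination}")
    return "\n".join(lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python kjb_outline.py /path/to/exported/jobs
    for job_file in sorted(Path(sys.argv[1]).rglob("*.kjb")):
        print(describe(job_file))
        print()
```

The output is plain text, so it can be pasted into a wiki page or diffed between exports.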
I'm a civil engineer designing a program that allows the user to define a number of cross sections of a roadway and then calculate the quantity of the different materials used to build the roadway layers. I need to be able to plot a representation of the cross section that the user has defined. I'm not sure if this would be best accomplished by plotting various series on a chart, or by drawing shape objects. Does anyone have any thoughts?
Yeah, not only is Excel pretty good for this, it's also pretty common to use it for this. The Newton Excel Bach blog may be where you want to spend some good time - it's an Excel for engineers site. He's got a great series on drawing with Excel. Here's one that addresses your immediate question: Drawing in Excel 7 – Creating drawings from coordinates
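The coordinates approach also gives you the quantities almost for free: if each layer is stored as a closed polygon of (offset, elevation) points, its area follows from the shoelace formula, and the volume between two stations from the average end area method. A minimal sketch, assuming each layer is described by its top and bottom edges listed left to right; the function names and the sample layer are hypothetical.

```python
def polygon_area(points):
    """Shoelace formula: area enclosed by (x, y) vertices listed in order."""
    twice_area = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def layer_polygon(top, bottom):
    """Close a layer outline from its top and bottom edges, both listed left to right."""
    return list(top) + list(reversed(bottom))


def average_end_area_volume(area_start, area_end, length):
    """Volume between two cross sections by the average end area method."""
    return (area_start + area_end) / 2.0 * length


# Hypothetical 7 m wide base course: 0.30 m thick at station 0, 0.35 m at station 20.
base_at_0 = layer_polygon(top=[(0.0, 0.30), (7.0, 0.30)], bottom=[(0.0, 0.0), (7.0, 0.0)])
base_at_20 = layer_polygon(top=[(0.0, 0.35), (7.0, 0.35)], bottom=[(0.0, 0.0), (7.0, 0.0)])
area_0 = polygon_area(base_at_0)    # about 2.10 m^2
area_20 = polygon_area(base_at_20)  # about 2.45 m^2
volume = average_end_area_volume(area_0, area_20, 20.0)  # about 45.5 m^3
```

The same vertex lists, with the first point repeated at the end, can be written to worksheet cells and plotted as an XY (scatter) series with straight lines, one series per layer, in the spirit of drawing from coordinates.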
Since I know nothing about your problem domain or your programming skills, I can only give some general thoughts:
Excel is really good for modeling and building certain prototypes. Modeling this problem and building some charts by hand should give you and your users a good idea if the Excel solution is going to fly. If you can't get the graphics you want I would look elsewhere. Perhaps Visual Studio and Visual Basic or C#. These have mature drawing capabilities and also charting controls in recent editions.
Excel VBA has a pretty good programming layer for charts. You can also draw custom objects with VBA. I have not done this but I am sure there are references on the web. In any event, if the manually built Excel prototype looks good, it might be worthwhile to automate it with VBA.
Another factor is how many users there will be, and what their skill set is. Fewer users who know Excel pretty well make a case for using Excel. Supporting a large number of users could become onerous, as anyone can change the code in their individual copy of the file.
Finally, how long will this application be around? Versioning Excel applications can be done, but it is easier with more sophisticated programming environments. Also, if you are going to continue to add features, you might run into a wall with VBA's feature set. Hope this helps.
My employer has offered me the chance to work with SAP Business Objects, analysing the large amount of data they have.
I have the following doubts before I accept:
a. I love programming and do not want to lose touch with it. Do you think working with this tool would excite a person who loves building software? Or is most of the tool configured through a wizard-like interface?
b. Is this tool capable of working on data collected for research and testing purpose?
I tried googling, but all I could find were some videos that mention "Business Intelligence" more than 12 times a minute. Any suggestions, or even links to help me make a preliminary analysis, would be helpful. Thanks...
Business Objects is not rocket science. A competent developer should be able to figure out how to build a universe in a few days. My first experience took me about two days to figure out how to build a universe and another two days or so to get some analytic reports out of it.
However, 'research data' suggests that the actual structure of the data will vary depending on the nature of the survey so you will probably find yourself constantly making ad-hoc changes or new bespoke universes for each job. Business Objects is probably a reasonably flexible way to do this (a custom universe for a tabular set of research data could probably be set up in a few hours). However, the job would basically devolve to a reporting analyst position.
If you're not a 'tools guy' by nature you will probably find this sort of work unsatisfying. I do full life-cycle work on data warehouse systems and from time to time this involves developing front ends using Business Objects. I'm quite happy to work with it casually as part of a larger job but I wouldn't want a job solely working with just one reporting tool.
If you think of yourself as a programmer I would recommend against accepting the job if it was limited to just working with Business Objects.
I have experience working with Designer and reporting in Business Objects... Honestly, it's quite easy. I have to say I'm a total programmer at heart, and absolutely hate working with it, but that's what possessed me to write a program that uses the DLLs to automate everything. I enjoyed automating it, and ended up making a program that did in about 5 minutes what previously took me weeks. Now all the BO developers use it, and I mostly spend my time updating that.
In summary... It sucks to work with when it's +60% of your job, but you don't have to lose out on Programming. If anything, I think I've improved my programming. Now I barely do the crappy side of the work. I just run my program, and everything works out.
I'm not sure what you are asking in question "B".
Everyone I work with is obsessed with the data-centric approach to enterprise development and hates the idea of using custom collections/objects. What is the best way to convince them otherwise?
Do it by example and tread lightly. Anything stronger will just alienate you from the rest of the team.
Remember to consider the possibility that they're onto something you've missed. Being part of a team means taking turns learning & teaching.
No single person has all the answers.
If you are working on legacy code (e.g., apps ported from .NET 1.x to 2.0 or 3.5) then it would be a bad idea to depart from datasets. Why change something that already works?
If, however, you are creating new apps, there are a few things you can cite:
Appeal to the pain of maintaining apps that stick with DataSets
Cite performance benefits for your new approach
Bait them with a good middle ground. Move to .NET 3.5 and promote LINQ to SQL, for instance: while it still sticks to a data-driven architecture, it is a huge, huge departure from string-indexed DataSets, and it enforces... voila! Custom collections, in a manner that is hidden from them.
What is important is that whatever approach you use you remain consistent, and you are completely honest with the pros and cons of your approaches.
If all else fails (e.g., you have a development team that utterly refuses to budge from old practices and is skeptical of learning new things), this is a very, very clear sign that you've outgrown your team and it's time to leave your company!
Remember to consider the possibility that they're onto something you've missed. Being part of a team means taking turns learning & teaching.
Seconded. The whole idea that "enterprise development" is somehow distinct from (and usually the implication is 'more important than') normal development really irks me.
If there really is a benefit for using some technology, then you'll need to come up with a considered list of all the pros and cons that would occur if you switched.
Present this list to your co-workers along with explanations and examples for each item.
You have to be realistic when creating this list. You can't just say "Saves us lots of time!!! WIN!!" without addressing the fact that sometimes it is going to take MORE time, will require X months to come up to speed on the new tech, etc. You have to show concrete examples where it will save time, and exactly how.
Likewise, you can't just skirt over the cons as if they don't matter; your co-workers will call you on it.
If you don't do these things, or come across as just pushing what you personally like, nobody is going to take you seriously, and you'll just get a reputation for being the guy who's full of enthusiasm and energy but has no idea about anything.
BTW. Look out for this particular con. It will trump everything, unless you have a lot of strong cases for all your other stuff:
Requires 12+ months work porting our existing code. You lose.
Of course, "it depends" on the situation. Sometimes DataSets or DataTables are better suited, for example when the business logic really is pretty light, the hierarchy of entities/records is flat, or you need their versioning capabilities.
Custom object collections shine when you want to implement a deep hierarchy/graph of objects that cannot be efficiently represented in flat 2D tables. What you can demonstrate is a large object graph in which certain events propagate down the correct branches without invoking inappropriate objects in other branches. That way it is not necessary to loop or Select through each and every DataTable just to get the child records.
For example, in a project I got involved in two and a half years ago, there was a UI module that was supposed to display questions and answer controls in a single WinForms DataGrid (to be more specific, it was Infragistics' UltraGrid). Some of the trickier requirements:
The answer control for a question can be anything - text box, check box options, radio button options, drop-down lists, or even to pop up a custom dialog box that may pull more data from a web service.
Depending on what the user answered, it can trigger more sub-questions to appear directly under the parent question. If a different answer is given later, it should expose another set of sub-questions (if any) related to that answer.
The original implementation was written entirely with DataSets, DataTables, and arrays. The amount of looping through hundreds of rows across multiple tables was purely mind-bending. It did not help that the programmer came from a C++ background and attempted to ref everything (hello, objects living on the heap are already accessed through reference variables, like pointers!). Nobody, not even the original programmer, could explain why the code was doing what it did. I came onto the scene more than six months after this, and it was still flooded with bugs. No wonder the 2nd-generation developer I took over from decided to quit.
After two months of trying to fix the chaotic mess, I took it upon myself to redesign the entire module as an object-oriented graph. Yep, complete with abstract classes (to render a different answer control in a grid cell depending on question type), delegates and eventing. The end result was a 2D DataGrid bound to a deep hierarchy of questions, naturally sorted according to the parent-child arrangement. When a parent question's answer changed, it would raise an event to its child questions, and they would automatically show/hide their rows in the grid according to the parent's answer. Only question objects down that path were affected. The UI responsiveness of this solution was orders of magnitude better than that of the old method.
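A rough sketch of that design, written in Python rather than the original C#/WinForms code; class, method and question names are hypothetical. Each question keeps a list of handlers that stands in for a C# event, children subscribe to their parent, and an answer change only walks the affected branch. `visible_rows` returns what a grid would display, in parent-child order.

```python
class Question:
    """A question whose sub-questions show or hide themselves when its answer changes."""

    def __init__(self, text, trigger=None):
        self.text = text
        self.trigger = trigger          # parent answer that reveals this question
        self.visible = trigger is None  # top-level questions start visible
        self.answer = None
        self.children = []
        self.answer_changed = []        # handlers, standing in for a C# event

    def add_child(self, child):
        self.children.append(child)
        self.answer_changed.append(child.on_parent_changed)  # subscribe to the parent
        return child

    def set_answer(self, answer):
        self.answer = answer
        self._raise_answer_changed()

    def _raise_answer_changed(self):
        for handler in self.answer_changed:
            handler(self)

    def on_parent_changed(self, parent):
        # Visible only if the parent is visible and gave the triggering answer.
        self.visible = parent.visible and parent.answer == self.trigger
        # Pass the change down this branch only; sibling branches are never visited.
        self._raise_answer_changed()

    def visible_rows(self, depth=0):
        """(depth, text) pairs for the grid, in parent-child order."""
        if not self.visible:
            return []
        rows = [(depth, self.text)]
        for child in self.children:
            rows.extend(child.visible_rows(depth + 1))
        return rows


vehicle = Question("Vehicle type?")
registered = vehicle.add_child(Question("Is the car registered?", trigger="car"))
frame = vehicle.add_child(Question("Frame size?", trigger="bike"))
plate = registered.add_child(Question("Registration number?", trigger="yes"))

vehicle.set_answer("car")
registered.set_answer("yes")
print(vehicle.visible_rows())
```

Switching the top answer to "bike" hides the whole car branch, including the registration number, without touching any other rows.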
Ironically, I wanted to post a question that was the exact opposite of this. Most of the programmers I've worked with have gone with the custom data objects/collections approach. It breaks my heart to watch someone with their SQL Server table definition open on one monitor, slowly typing up a matching row-wrapper class in Visual Studio on another monitor (complete with private properties and getters-setters for each column). It's especially painful if they're also prone to creating 60-column tables. I know there are ORM systems that can build these classes automagically, but I've seen the manual approach used much more frequently.
Engineering choices always involve trade-offs between the pros and cons of the available options. The DataSet-centric approach has its advantages (a db-table-like in-memory representation of actual db data, classes written by people who know what they're doing, familiarity to a large pool of developers, etc.), as do custom data objects (compile-time checking, users don't need to learn SQL, etc.). If everyone else at your company is going the DataSet route, it's at least technically possible that DataSets are the best choice for what they're doing.
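For comparison, the automagic version of that typing exercise can be very short. A minimal Python sketch using only the standard library: it reads the column list from SQLite's `PRAGMA table_info` and builds a row class with `dataclasses.make_dataclass`. The type mapping is a simplifying assumption, and table names are interpolated straight into SQL, so this is only for trusted, internal names; real ORMs do considerably more.

```python
import sqlite3
from dataclasses import fields, make_dataclass

# Simplified mapping from SQLite declared types to Python types (an assumption).
TYPE_MAP = {"INTEGER": int, "REAL": float, "TEXT": str}


def row_class_for(conn, table):
    """Build a dataclass whose fields mirror the table's columns, in order."""
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    spec = [(name, TYPE_MAP.get(decl.upper(), object)) for _cid, name, decl, *_ in columns]
    class_name = table.title().replace("_", "")
    return make_dataclass(class_name, spec)


def load_rows(conn, table):
    """SELECT every column and wrap each row in the generated class."""
    row_class = row_class_for(conn, table)
    names = ", ".join(f.name for f in fields(row_class))
    return [row_class(*row) for row in conn.execute(f"SELECT {names} FROM {table}")]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE debtor (id INTEGER PRIMARY KEY, name TEXT, balance REAL)")
conn.execute("INSERT INTO debtor (name, balance) VALUES ('Acme Ltd', 12.5)")
debtors = load_rows(conn, "debtor")
print(debtors[0])  # Debtor(id=1, name='Acme Ltd', balance=12.5)
```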
Datasets/tables aren't so bad are they?
Best advice I can give is to use custom objects as much as you can in your own code, and hopefully, through peer reviews and bugfixes, the other developers will see how the code becomes more readable (make sure to push the point when this happens).
Ultimately, my view is that if the code works, the rest is semantics.
I guess you could try selling the idea of O/R mapping and mapper tools. The benefit of treating rows as objects is pretty powerful.
I think you should focus on performance. If you can, create an application that shows the performance difference between DataSets and custom entities. Also, try to show them Domain-Driven Design principles and how they fit with entity frameworks.
Don't make it a religion or faith discussion. Those are hard to win (and that is not what you want anyway).
Don't frame it the way you just did in your question. The issue is not getting everyone to agree that one way or the other is how they should work in general. You should talk about how each of them needs to think in order to make the right choice at any given time. Give an example of when to use a DataSet, and when not to.
I had developers using DataTables to store data they fetched from the database, with business logic code then working on that DataTable... And I showed them how I cut a page load from 7 seconds at 100% CPU (on the web server) to not being able to see the CPU line move at all, just by changing the in-memory object from a DataTable to a Hashtable.
So take an example or case that you think is better implemented differently, and win that battle. Don't fight a high-level war...
If interoperability is or will be a concern down the line, DataSet is definitely not the right direction to go in. You CAN expose DataSets/DataTables over a service, but whether you SHOULD is debatable. If you are talking .NET->.NET you're probably OK; otherwise you are going to have a very unhappy client developer on the other side of the fence consuming your service.
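The change described there boils down to replacing repeated row-by-row searches with a keyed lookup. A language-neutral illustration in Python, not the original .NET code; the row shape and sizes are made up. Scanning rows costs O(n) per lookup, while a dictionary built once gives O(1) lookups on average. Actual timings depend on data and runtime, so the script prints them rather than promising numbers.

```python
import timeit


def find_by_scan(rows, key):
    """O(n) per call: inspect rows one by one, like filtering a table for each lookup."""
    for row in rows:
        if row["id"] == key:
            return row
    return None


def build_index(rows):
    """One O(n) pass; afterwards each lookup by id is O(1) on average."""
    return {row["id"]: row for row in rows}


if __name__ == "__main__":
    rows = [{"id": i, "name": f"customer {i}"} for i in range(5_000)]
    index = build_index(rows)
    keys = list(range(0, 5_000, 10))
    scan_seconds = timeit.timeit(lambda: [find_by_scan(rows, k) for k in keys], number=1)
    index_seconds = timeit.timeit(lambda: [index.get(k) for k in keys], number=1)
    print(f"scan:  {scan_seconds:.4f} s for {len(keys)} lookups")
    print(f"index: {index_seconds:.4f} s for {len(keys)} lookups")
```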
You can't convince them otherwise. Pick a smaller challenge or move to a different organization. If your manager respects you see if you can do a project in the domain-driven style as a sort of technology trial.
If you can profile, just do it and profile. DataSets are heavier than a simple Collection<T>.
DataReaders are faster than using Adapters...
Changing behavior in an object is much easier than massaging a DataSet.
Anyway: Just Do It, ask for forgiveness not permission.
Most programmers don't like to stray out of their comfort zones (note that the intersection of the 'most programmers' set and the 'Stack Overflow' set is probably the empty set). "If it worked before (or even just worked) then keep on doing it". The project I'm currently on required a lot of argument to get the older programmers to use XML/schemas/data sets instead of just CSV files (the previous version of the software used CSVs). It's not perfect; the schemas aren't robust enough at validating the data. But it's a step in the right direction. The code I develop uses OO abstractions on the data sets rather than passing data set objects around. Generally, it's best to teach by example, one small step at a time.
There is already some very good advice here but you'll still have a job to convince your colleagues if all you have to back you up is a few supportive comments on stackoverflow.
And, if they are as sceptical as they sound, you are going to need more ammo.
First, get a copy of Martin Fowler's "Patterns of Enterprise Application Architecture", which contains a detailed analysis of a variety of data access techniques.
Read it.
Then force them all to read it.
Job done.
Data-centric means less code complexity.
Custom objects mean potentially hundreds of additional objects to organize, maintain, and generally live with. They are also going to be a bit faster.
I think it's really a code complexity vs. performance question, which can be answered by the needs of your app.
Start small. Is there a utility app you can use to illustrate your point?
For instance, at a place where I worked, the main application had a complicated build process, involving changing config files, installing a service, etc.
So I wrote an app to automate the build process. It had a rudimentary WinForms UI. But since we were moving towards WPF, I changed it to a WPF UI, while keeping the WinForms UI as well, thanks to Model-View-Presenter. For those who weren't familiar with Model-View-Presenter, it was an easily-comprehensible example they could refer to.
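The point of Model-View-Presenter in that story is that two UIs can share one presenter. A minimal sketch in Python, since the original WinForms/WPF code isn't shown; all names are hypothetical, and `ConsoleView` and `RecordingView` stand in for the two front ends. The presenter holds the logic and talks to any object with a `show_status` method.

```python
from typing import Protocol


class BuildView(Protocol):
    """What the presenter needs from any UI (WinForms, WPF, console...)."""

    def show_status(self, message: str) -> None: ...


class BuildModel:
    """Hypothetical model: named build steps, each a zero-argument callable."""

    def __init__(self, steps):
        self.steps = list(steps)


class BuildPresenter:
    """All the logic lives here; views only display what they are told."""

    def __init__(self, model: BuildModel, view: BuildView):
        self.model = model
        self.view = view

    def run(self):
        total = len(self.model.steps)
        for number, (name, action) in enumerate(self.model.steps, start=1):
            self.view.show_status(f"[{number}/{total}] {name}")
            action()
        self.view.show_status("Build finished")


class ConsoleView:
    def show_status(self, message: str) -> None:
        print(message)


class RecordingView:
    """Stand-in for a second UI; also handy for testing the presenter."""

    def __init__(self):
        self.messages = []

    def show_status(self, message: str) -> None:
        self.messages.append(message)


model = BuildModel([("Update config files", lambda: None), ("Install service", lambda: None)])
BuildPresenter(model, ConsoleView()).run()
```

Swapping the UI means writing another class with `show_status`; the presenter and model don't change.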
Similarly, find something small where you can show them what a non-DataSet app would look like without having to make a major development investment.
I have been toying with the idea of creating software “Robots” to help with different areas of the development process: repetitive tasks, automatable tasks, etc.
I have quite a few ideas where to begin.
My problem is that I work mostly alone, as a freelancer, and work tends to pile up, and I don’t like to extend or “blow” deadlines.
I have investigated and use quite a few productivity tools. I have investigated code generation and I am planning a tool to generate portions of code. I use code reuse techniques, etc.
Does anyone have thoughts about this? Are there any good articles?
I wouldn't like to use code generation, but I have developed many tools to help me do many of the repetitive tasks.
Some of these could do nice things:
Email Robots
These receive emails and do a lot of stuff with them. They need to have some kind of authentication to protect you from the bad stuff:
Automatically logs whatever was entered into a database or Excel spreadsheet.
Updates something in a database.
Saves all the attachments in a specific shared folder.
Reboots a server.
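As one example of the "saves all the attachments" robot, here is a minimal sketch using Python's standard `email` package. Fetching the mail (IMAP, POP3, a mail-drop folder) is left out, and the sender allow-list is a hypothetical placeholder: a From header is trivially forged, so it is not real authentication.

```python
from email import policy
from email.parser import BytesParser
from pathlib import Path

# Hypothetical allow-list. A From header is easily forged, so treat this as a
# placeholder for real authentication (signed mail, a shared secret, etc.).
ALLOWED_SENDERS = {"me@example.com"}


def extract_attachments(raw_message, allowed_senders):
    """Return [(safe_filename, bytes)] for each attachment, or [] if the sender is not allowed."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_message)
    from_header = msg["From"]
    if from_header is None:
        return []
    senders = {address.addr_spec.lower() for address in from_header.addresses}
    if not senders & allowed_senders:
        return []
    found = []
    for part in msg.iter_attachments():
        filename = part.get_filename() or "attachment.bin"
        # Keep only the base name so a name like "../x" cannot escape the folder.
        found.append((Path(filename).name, part.get_payload(decode=True)))
    return found


def save_attachments(raw_message, folder, allowed_senders=ALLOWED_SENDERS):
    """Write allowed attachments into folder (created if needed); return the paths."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    saved = []
    for name, data in extract_attachments(raw_message, allowed_senders):
        target = folder / name
        target.write_bytes(data)
        saved.append(target)
    return saved
```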
Productivity
These will do repetitious tasks:
Print out all the invoices for the month.
Automatically merge data from several sources.
Send reminders of GTD items.
Send reminders of late TODO items.
Automated builds
Automated testing
Administration
These automate some repetitive server administration tasks:
Summarize server logs, remove regular items and send the rest by email
Rebuild indexes in a database
Take automatic backups
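The log-summary robot can be sketched in a few lines. A minimal Python version, assuming plain-text logs; the "routine" patterns and sample lines are hypothetical and would be tuned to your own servers. Digits are collapsed so that repeats of the same message (different timestamps or ids) are counted together. Emailing the report (e.g. with `smtplib`) is left out.

```python
import re
from collections import Counter

# Hypothetical patterns for lines considered routine; tune these for your own logs.
ROUTINE_PATTERNS = [
    re.compile(r"\bINFO\b.*\bhealth check ok\b", re.IGNORECASE),
    re.compile(r"\bbackup completed\b", re.IGNORECASE),
]


def summarize_log(lines, routine=ROUTINE_PATTERNS):
    """Drop blank and routine lines; count the rest with digits normalized to '#'."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or any(pattern.search(line) for pattern in routine):
            continue
        # Collapse numbers (timestamps, ids) so repeated messages group together.
        counts[re.sub(r"\d+", "#", line)] += 1
    return counts


def format_report(counts):
    """Most frequent messages first, one per line."""
    return "\n".join(f"{n:>5} x {message}" for message, n in counts.most_common())


sample = [
    "2024-01-01 10:00:01 INFO health check OK",
    "2024-01-01 10:00:02 ERROR disk 3 at 97%",
    "2024-01-01 10:05:02 ERROR disk 3 at 98%",
    "2024-01-01 11:00:00 INFO backup completed",
]
print(format_report(summarize_log(sample)))
```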
Meta-programming is a great thing. If you can easily get access to data about the class structure, you can automate a few things. In the high-level language I use, I define a class like 'Property', for example, and add an integer for the street number, a string for the street name and a reference to the owning debtor. I then auto-generate a form that has text boxes for the street number and street name and a lookup box for the debtor reference, and the code to save and load is all auto-generated. It knows that the street number is an integer, so its text box only accepts integers. If I declare a read-only property, it also makes sure the text box is read-only.
There are software robots, but often you really don't see them. For example, consider a robot that is used to package stuff. There is a person who monitors the robot in case of a failure. When the robot fails, the person shuts the robot down and fixes things. That person is like a programmer who operates an IDE to compile, refactor, etc. When errors occur, the programmer fixes the code and runs the compiler again.
Well, compiling is not very robot-like, but there is software that compiles your project automatically. Now that is more like a kind of robot. That software robot also checks things in the code, such as whether there are enough comments.
Then we have software that generates code according to our input. For example, we can create forms in MS Access easily with wizards. The wizards do not automatically produce new forms, form after form after form, because we need every form to be different. But the form generator is a kind of robot-like tool that is operated by a person.
Of course, you could input the details of every form first and then run the generator, but people like to see each form right away. Also, the input mechanism is pretty much the form already, so you get what you create on the fly. That said, with data transformation tools you can create descriptions of forms from a list of field names, generate the forms, and call that using robots.
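A minimal Python sketch of the same idea, since the answer's own language isn't named: the form description is derived from the class's fields and type hints. The widget names, the extra `reference_code` field and the `read_only` metadata flag are hypothetical conventions of this sketch, not any framework's API.

```python
from dataclasses import dataclass, field, fields
from typing import get_type_hints


@dataclass
class Debtor:
    name: str


@dataclass
class Property:
    street_number: int
    street_name: str
    debtor: Debtor  # reference to the owning debtor -> lookup box
    # Hypothetical read-only field; the metadata flag is this sketch's own convention.
    reference_code: str = field(default="", metadata={"read_only": True})


def form_spec(cls):
    """Derive one widget description per field from the class's type hints."""
    hints = get_type_hints(cls)
    spec = []
    for fld in fields(cls):
        field_type = hints[fld.name]
        if field_type is int:
            widget = "integer text box"
        elif field_type is str:
            widget = "text box"
        else:
            widget = f"lookup box ({field_type.__name__})"
        spec.append({"field": fld.name, "widget": widget,
                     "read_only": fld.metadata.get("read_only", False)})
    return spec


def accepts(cls, field_name, text):
    """Input check the generated text box would apply: int fields take integers only."""
    if get_type_hints(cls)[field_name] is int:
        try:
            int(text)
        except ValueError:
            return False
    return True


for row in form_spec(Property):
    print(row)
```

A real generator would turn each description into an actual control; the point is that nothing here is typed in per field.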
There are even whole books about automated software production, but the biggest problem is that automating the process takes longer than the process itself.
Mostly, programmers give up on this because they try to achieve everything in one step, going straight from manual programming to automation.
Common automation in software production is done through IDEs, code generators and such; so far, almost no logic is automated.
I would appreciate any advance in this topic. Try to automate little tasks from the process, and connect those tasks afterwards. Going step by step.
I'm guessing that, just like just about every software developer on planet Earth, you want to write software that writes software by itself. Unfortunately, it's an idea that only works on paper. I mean, we have things like code generators, DSLs, transformation pipelines, Visual Studio add-ins that statically analyse code and generate derivative code, and so on. But it's nowhere near anything one would call a 'robot'.
Personally, I think more needs to be done in this area. For example, the IDE should be able to infer things and make suggestions based on what I'm actually doing. If I'm adding a property, the IDE could infer what attributes the other properties in the file have, and how the properties themselves are structured, and adjust the new property accordingly.
Any sort of AI is hard work and, regrettably, does not have such a great ROI. But it sure is fun.
Scripting away the repetitive tasks: is that what you are referring to? I guess you're a Windows developer, where scripting is not nearly as common as in the *nix world. Hence your question.
You might want to have a look at the *nix side of the software development arena, where the workflow is more or less similar to what you describe (at least more so than on Windows). Plowing your way through bash, perl, python, etc. will get you what you want.
ps. Also look at nsr81's post in comments for similar scripting tools on Windows.
Code generation is certainly a viable tool for some tasks. If done poorly it can create maintenance problems, but it doesn't have to be done poorly. See Code Generation Network for a fairly active community, with conference, papers, etc.
Code Generation in Action is one book that comes to mind.
You can try Robot Framework:
http://robotframework.org/
Robot Framework is a generic automation framework. It has an easy-to-use tabular test data syntax and utilizes the keyword-driven approach.
You can even use this tool as a software bot (RPA).
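To show what "keyword-driven" means, here is a tiny Python illustration of the idea. It is not Robot Framework's API: test data is a table of rows, each row naming a keyword followed by its arguments (cells separated by two or more spaces), and each keyword maps to a method on a library object. Robot Framework also lets you write keyword libraries as plain Python classes whose methods become keywords; the `InvoiceKeywords` library here is hypothetical.

```python
import re


class InvoiceKeywords:
    """Hypothetical keyword library: each public method is one keyword."""

    def __init__(self):
        self.log = []

    def open_invoice(self, number):
        self.log.append(f"opened {number}")

    def set_status(self, number, status):
        self.log.append(f"{number} -> {status}")


def run_keywords(table, library):
    """Run rows such as 'Set Status    INV-1    Paid' (cells split on 2+ spaces)."""
    for row in table.strip().splitlines():
        cells = re.split(r"\s{2,}", row.strip())
        if not cells[0]:
            continue  # skip blank lines
        # 'Set Status' -> set_status: keyword names map to method names.
        method_name = cells[0].lower().replace(" ", "_")
        if method_name.startswith("_"):
            raise ValueError(f"not a keyword: {cells[0]}")
        getattr(library, method_name)(*cells[1:])


library = InvoiceKeywords()
run_keywords("""
Open Invoice    INV-1
Set Status      INV-1    Paid
""", library)
print(library.log)
```

The appeal for RPA-style work is that non-programmers can edit the table while the keyword implementations stay in code.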
Robotic Process Automation
First, a little back-story... In 2011, I was the Operations Manager for Contracting Center of Excellence at Bristol-Myers Squibb. We were in the early stages of rolling out a brand new Global Contracting System. This new system was replacing a great deal of manual effort across the globe with the intention of one system to create, store and retrieve Contracting information for all of the organization. No small task to be sure, and one we certainly underestimated the scope and eventual impact of. Like most organizations getting a handle on this contract management process, we found it to be from 4 to 10 times larger than originally expected.
We did a lot of things very right, including building a support organization from the ground up that specialized in this specific application and became true subject matter experts for the organization in (7) languages and most time zones.
The application, on the other hand, brought its own challenges, which included missing features, less than stellar performance and a lot of back-end work needing to be done by the Operations team. This is where Robotic Process Automation comes into the picture.
Many of the 'features' of this software were simply too complicated for end users to use, but were required to create contracts. The first example was adding a "Contact" with whom the Contract would be made: the "Third Party", if you will. This is a seemingly simple thing, yet it took (7) screens of data entry, a cryptic point of access, twenty-two minutes and a master's degree to figure out on your own, for each one. We quickly made the business decision to have the Operations team create these 'Contacts' on behalf of our end users. We anticipated the need to be a few thousand a year. We very quickly passed 800 requests per week. With three FTEs working on it, we had an ever-growing backlog and a turnaround time of more than two weeks per request. Obviously, this would NOT do in any business environment.
The manual process was so complicated that even my staff, as subject matter experts, made a large number of errors creating them. The resulting re-work further complicated the issue and added costs. I had some previous automation experience and products that I had worked with, but this need was even more intense and complicated than anything I had encountered before. I needed something great, fast, easy to implement and that would NOT require IT assistance (as that had its own pitfalls). I investigated a number of products, all professing to do similar things. One, of course, stood out to me. It seemed to be the most capable and affordable, and had good support options. The product I selected was Automation Anywhere, at the bargain price of about $4000.00 USD.
Now, don't get me wrong, I am not here to pitch for Automation Anywhere, or any specific product, for that matter. But, my experiences with this tool, forever changed my expectations and understanding of what Robotic Process Automation really means. (see below, if you are unsure)
After my first week of buying the tool and learning some of its features, I was able to replace the manual process of creating a "Contact" in the contracting system, cutting a two-week turnaround to a (1) hour turnaround. It took the FTE effort from 22 minutes per entry to zero. I was able to run this automated process from a desktop PC and handle every request, fully automated, including the validation and confirmation steps in other external systems, ensuring better data quality than was ever previously possible. In the first week, my costs for the software were recovered by over 200% in saved labor, allowing those resources to focus on other higher-value tasks. I don't care where you are from, that is an amazing ROI!
That was just the beginning. Now that we had this tool, which in fact could do much more than the initial task I needed, it became one of the most valued resources for developing functional Proofs of Concept/prototypes of the more complex processes we needed to bridge the gaps in the contracting system. I was able to add on to the original purchase with an Enterprise License and, partnering with our IT department, secure a more robust infrastructure at an insanely low cost for total implementation. I now had (5) dedicated Corporate servers operating 24/7 and (2) development licenses for building and supporting automation tasks, and we were able to continue to support the Contracting initiative, even with the volume so much greater than anticipated, with the same number of FTEs as we started with. It became the platform for reporting, end user notification, system alerts, updating data, work-flow, job scheduling, monitoring, ETL and even data entry and migration from other systems. The cost avoidance from implementing this Robotic Process Automation tool cannot be overstated. The soft-dollar savings from delivering timely solutions to the business community, and the professional integrity we were able to demonstrate and promote, are evident in the successful implementation in more than 48 countries in under (1) year and the more than 120,000 Contracts entered each year since.
While the term Robotic Process Automation is currently all the buzz, the concepts have been around for some time. Please, however, don't assume that this means it is a build-and-forget situation. As it grows, and it will grow, you need a strong plan to manage tasks, resources and infrastructure to keep things running. These tools basically mimic anything a human can do, and much more. However, a human can rather quickly change their steps in a process if one of the 'source' systems she/he is using changes its user interface. In most cases, your Automation Tasks will need to be 'tweaked' to handle that change. Some business processes are easier to automate than others, and some might be too complex for a casual "Automation task creator" to build and/or maintain. Be very sure you have solid resources to build and maintain the tasks. If you plan to do more than one thing with your RPA tool, make sure to have solid oversight, governance, resources and a corporate 'champion', or I assure you, your efforts will not be successful.
Robotic Process Automation Defined:
(IRPA) Institute for Robotic Process Automation: “Robotic process automation (RPA) is the application of technology that allows employees in a company to configure computer software or a “robot” to capture and interpret existing applications for processing a transaction, manipulating data, triggering responses and communicating with other digital systems.”
Wikipedia: “Examples of robotic automation include the use of industrial robots in manufacturing and the use of software robots in automating clerical processes in services industries. In the latter case, the use of the term robot is metaphorical, conveying the similarity of those software products – which are produced to provide a generic automation capability and then configured within the end user environment to execute manual and repetitive tasks – to their industrial robot counterparts. The metaphor is apt in the sense that the software “robot” is now mimicking or replacing a function classically associated with a person.”