Does anybody actually use the PSP (Personal Software Process)? - process

I've been reading a bit about this recently but it looks to be a bit heavy. Does anybody have real world experience using it?
Are there any light weight alternatives?

The Personal Software Process is a personal improvement process. The full-blown PSP is quite heavy and there are several forms, templates, and documents associated with it. However, a key point is that you are supposed to tailor the PSP to your specific needs.
Typically, when you are learning about the PSP (especially if you are learning it in a course), you will use the full PSP with all of its forms. However, as Watts S. Humphrey says in "PSP: A Self-Improvement Process for Software Engineers", it's important to "use a process that both works for you and produces the desired results". Even for an individual, multiple projects will probably require variations on the process in order to achieve the results you want to.
In the book I mentioned above, "PSP: A Self Improvement Process for Software Engineers", the steps that you should follow when defining your own process are:
Determine needs and priorities
Define objectives, goals, and quality criteria
Characterize the current process
Characterize the target process
Establish a strategy to develop the process
Validate the process
Enhance the process
If you are familiar with several process models, it should be fairly easy to take pieces from all of them and create a process or workflow that works on your particular project. If you want more advice, I would suggest picking up the book. There's an entire chapter dedicated to extending and modifying the PSP as well as creating your own process.

The Personal Software Process itself is a subset of the Capability Maturity Model (CMM) processes. There are no light weight alternatives available as of now.

Related

Choosing a chat-bot framework for data science research project and understanding the hidden costs of the development and rollout?

The question is about using a chat-bot framework in a research study, where one would like to measure the improvement of a rule-based decision process over time.
For example, we would like to understand how to improve the process of medical condition identification (and treatment) using the minimal set of guided questions and patient interaction.
Medical condition can be formulated into a work-flow rules by doctors; possible technical approach for such study would be developing an app or web site that can be accessed by patients, where they can ask free text questions that a predefined rule-based chat-bot will address. During the study there will be a doctor monitoring the collected data and improving the rules and the possible responses (and also provide new responses when the workflow has reached a dead-end), we do plan to collect the conversations and apply machine learning to generate improved work-flow tree (and questions) over time, however the plan is to do any data analysis and processing offline, there is no intention of building a full product.
This is a low budget academy study, and the PHD student has good development skills and data science knowledge (python) and will be accompanied by a fellow student that will work on the engineering side. One of the conversational-AI options recommended for data scientists was RASA.
I invested the last few days reading and playing with several chat-bots solutions: RASA, Botpress, also looked at Dialogflow and read tons of comparison material which makes it more challenging.
From the sources on the internet it seems that RASA might be a better fit for data science projects, however it would be great to get a sense of the real learning curve and how fast one can expect to have a working bot, and the especially one that has to continuously update the rules.
Few things to clarify, We do have data to generate the questions and in touch with doctors to improve the quality, it seems that we need a way to introduce participants with multiple choices and provide answers (not just free text), being in the research side there is also no need to align with any specific big provider (i.e. Google, Amazon or Microsoft) unless it has a benefit, the important consideration are time, money and felxability, we would like to have a working approach in few weeks (and continuously improve it) the whole experiment will run for no more than 3-4 months. We do need to be able to extract all the data. We are not sure about which channel is best for such study WhatsApp? Website? Other? and what are the involved complexities?
Any thoughts about the challenges and considerations about dealing with chat-bots would be valuable.

Swarm Intelligence - what kinds of problems are effectively solved?

I am looking for practical problem (or implementations, applications) examples which are effectively algoritmized using swarm intelligence. I found that multicriteria optimization is one example. Are there any others?
IMHO swarm-intelligence should be added to the tags
Are you looking for toy problems or more for real-world applications?
In the latter category I know variants on swarm intelligence algorithms are used in Hollywood for CGI animations such as large (animated) armies riding the fields of battle.
Related but more towards the toy-problem end of the spectrum you can model large crowds with similar algorithms, and use it for example to simulate disaster-scenarios. AFAIK the Dutch institute TNO has research groups on this topic, though I couldn't find an English link just by googling.
One suggestion for a place to start further investigation would be this PDF book:
http://www.cs.vu.nl/~schut/dbldot/collectivae/sci/sci.pdf
That book also has an appendix (B) with some sample projects you could try and work on.
If you want to get a head start there are several frameworks (scientific use) for multi-agent systems such as swarming intelligence (most of 'em are written with Java I think). Some of them include sample apps too. For example have a look at these:
Repast:
http://repast.sourceforge.net/repast_3/
Swarm.org:
http://swarm.org/
Netlogo:
http://ccl.northwestern.edu/netlogo
Post edited, added more info.
I will take your question like: what kind of real-world problems SI can solve?
There are alot. Swarm intelligence is based on the complex behaviour of swarms, where agents in the swarm coordinate and cooperate by executing very simple rules to generate an emergent complex auto organized behaviour. Also, the agents often make a deliberation process to make efficient decisions, and also, the emergent behaviour of the swarms allows them to find patterns, learn and adapt to their environment. Therefore, real-world applications based on SI are those that often required coordination and cooperation techniques, optimization process, exploratory analysis, dynamical poblems, etc. Some of these are:
Optimization techniques (mathematical functions for example)
Coordination of a swarm of robots (to organize inventory for example)
Routing in communication networks. (This is also dynamical combinatorial optimization)
Data analysis (usually exploratory, like clustering). SI has alot of applications in data mining and machine learning. This allows SI algorithms to find interesting patterns in big sets of data.
Np problems in general
I'm sure there are alot more. You should check the book:
"Swarm Intelligence: from natural to artificial systems". This is the basic book.
Take care.

Encouraging management to scrap manual tests and do things the proper way

I am working in a project which is quite complex in terms of size (it's to make a web app). The first problem is that nobody is interested in any products which could really solve the problems surrounding the project (lack of time, no adjustments in timescales in response to ever changing requirements). Bare in mind these products are not expensive ( < $500 for a company making millions) and not products which require a lot of configuration (though the project needs products like that, such as build automation tools, to free up time).
Anyway, this means that testing is all done manually as documentation is a deliverable - this means the actual technical design, implementation and testing of the site suffers (are we developers or document writers? What are we trying to do here? are questions which come to mind). The site is quite large and complex (not on the scale of Facebook or anything like that), but doing manual tests as instructed to do so (despite my warnings) tells me this is not high quality testing and therefore not a high quality product to come out of it.
What benefits can I suggest to the relevant people to encourage automated testing (which they know I can implement)? I know it is possible to change resolution via cmd with a 3rd party app for Windows, so this could all be part of an automated build. Instead, I will probably have to run through all these permutations of browsers, screen resolutions, and window sizes manually. Also, where do recorded tests fall down on? Do they break when windows are minimised? The big problem with this is that I am doing the work in monitoring the test and the PC is not doing ALL of the work, which is my job (make the pc do all the work). And given a lack of resources, this clogs up a dev box - yes, used for development and then by me for testing. Much better to automate this for a night run when the box is free.
Thanks
Talking about money is usually the best way to get management attention, so here are a few suggestions:
Estimate how long it takes you to do your current manual testing.
Get a list of critical bugs that were found by customers - ideally with an idea of the impact cost (fixing a bug after release is always much more expensive than before), but it's usually good enough just to describe one or two particularly bad bugs. Your manual testing didn't catch these customer bugs, so this is a good way to demonstrate that your manual testing is inadequate.
Come up with a pilot project where you automate testing a certain area of the product where bugs were found in production. Estimate the cost of the pilot project - doing a restricted pilot has the advantages of being easier to scope and estimate. Then compare the ongoing cost of repeatedly running the automation versus testing every release manually; after a few release you should break even on the cost of the automation tool plus the test development. Be careful picking the automation area - try to avoid areas like a complex UI that might change significantly between releases and thus require a lot of time to be spent on updating the automated tests.
Good luck to you. I screamed for all of this and I work for a billion+ company. We still perform manual testing (including regression testing). Automated tests are finally being instituted because some of the developers went out and got demos of some of the software you're describing and began configuring a framework.
Your best bet is to come up with an actual dollars and cents documented comparison between working with a product and working without a product to prove unequivocably to the management figures in charge of spending the money and designing the processes that the ROI is not only there but people who need to perform testing and/or change their existing processes will actually find their jobs a little bit easier.
Go grassroots. Talk to your team, get them on board. Talk to your business analysts, get them on board. Talk to any QA people you have and get them on board. When the villagers attack the castle with pitchforks and torches, you can bet that the wallets will open up and you'll be performing automated testing.
I would just try to automate as much as you can, whenever you can. I don't think you need to necessarily ask for permission to do things like this. Maybe your management doesn't think of these things, and often they won't see the benefit until you show them a great example.
Is it just that capital expenditures are difficult ? I've seen places where the time of existing employees is already spent, and therefore, essentially worthless in comparison to new purchases.
As for convincing managers, cost of manual regression tests versus cost to automate. If you are running lots of manual tests, this should be an easy win. If you aren't running the tests often, try for cost of a bug. However, in many companies, the cost for a bug isn't attributed to the development department, quality and the cost of bug may not be a strong motivation (in other words, quality is just about pride and ego, not actually what it costs).
Convincing developers...if they aren't already on board...electo-shock therapy ? If they aren't there, it's going to be an up hill battle.
Have been trying to similar on my current project... I can say there's another factor - time. There's a learning curve on automated tools and automated test development. The first release that is tested with automated tools will not be tested as quickly as it was manually, because the testers are learning the tools in addition to exercising tests. The second release will be much faster and every release after that will be faster still - but the first one will be a schedule hit, if not a cost hit.
The financial case is not too hard - over time, the project saves lots of money, as resources for repetitive testing are vastly reduced.
But the hard part to find a strategy that lets you get the tool into usage with a minimum of schedule drag on the first release that uses the test tool. Testing is always squashed at the end of the schedule, so it's the thing most sensitive to schedule stress. Anything you can do to show management how to reduce or remove the learning curve and automated test setup and installation time is likely to increase your chances of using the tool.

Ideas for a distributed processing project?

I am looking for a project idea in distributed processing on Unix based systems. I wish to use only the C programming language. I have to finish the project in 4 months and it's a part of my course work. Can someone help me with an idea?
Cryptography problems
Distributed Ray Tracer
Chess AI (really, AI for any game)
Large Prime Number Search
Web crawler or other search mechanism
Generic Problem Solver (push out problem definition on the fly, followed by problem data).
Note on the last one:
An example would be if you have a gaming website with lots of board games that you were coming out with all the time. You don't want to have to install new clients on all your servers every time you write a new AI for a board game, so you have a program which you can send new AIs to and then after that you can just send the game data and the pushed AI will be used to solve the problem. This is best used for problems which can be broken into smaller chunks.
It is hard to answer without knowing anything about performance, the scale of the project, what you are trying to accomplish, etc. For example, is it one task or multiple tasks? Is the project just totally open?
4 months is pretty short, but maybe some kind of physics problem or math problem. Sorting or some kind of database work might be dull but beneficial.
Check out mapreduce for ideas! I was really motivated by this work, personally.
We used distributed processing here at work, but it's such a broad field..
Yeah.
Why not write a distributed compiler. You may then present an interface for people to compile things on the fly, and it will be passed to your distribute compilenet. Java is probably well-suited, and you'll get to do fun things, like be very mindful of security and so on.
The BOINC project is always looking for help and is very interesting:
http://boinc.berkeley.edu/
If you want to leave your mark and change the way we search the web,
look into B-Trees.
B-Trees and offspring/variants are the working horse of the internet.
Google uses them extensively to index the web.
Database indexes/indices are B-Tree offspring/variants.
Every LAMP system uses a database and indexes/indices.
Also, they are used extensively in distributed VLDB (Very Large DataBases)
Perhaps you can improve existing distributed databases (Cassandra and HBase)
These are lofty goals, but for me, this would leave a lasting mark
in the way Web data is processed, indexed and stored.
Write a distributed, fault tolerant, redundant network B+Tree or B*Tree.
Read Drozdek's book Data Structures and Algorithms in C++.
It's a good survey of B-Trees.
Read about skip trees
http://www.cs.huji.ac.il/~ittaia/papers/AAY-OPODIS05.pdf
Read about Efficient B-tree Based Indexing for Cloud Data Processing
http://www.comp.nus.edu.sg/~ooibc/vldb10-cgindex.pdf
Google search "Network B+Tree"
https://www.google.com/search?rlz=1C1CHKZ_enUS431US431&sourceid=chrome&ie=UTF-8&q=Network+B%2BTree

Profiling a VxWorks system

We've got a fairly large application running on VxWorks 5.5.1 that's been developed and modified for around 10 years now. We have some simple home-grown tools to show that we are not using too much memory or too much processor, but we don't have a good feel for how much headroom we actually have. It's starting to make it difficult to do estimates for future enhancements.
Does anybody have any suggestions on how to profile such a system? We've never had much luck getting the Wind River tools to work.
For bonus points: the other complication is that our system has very different behaviors at different times; during start-up it does a lot of stuff, then it sits relatively idle except for brief bursts of activity. If there is a profiler with some programmatic way to have to record state information, I think that'd be very useful too.
FWIW, this is compiled with GCC and written entirely in C.
I've done a lot of performance tuning of various kinds of software, including embedded applications. I won't discuss memory profiling - I think that is a different issue.
I can only guess where the "well-known" idea originated that to find performance problems you need to measure performance of various parts. That is a top-down approach, similar to the way governments try to control budget waste, by subdividing. IMHO, it doesn't work very well.
Measurement is OK for seeing if what you did made a difference, but it is poor at telling you what to fix.
What is good at telling you what to fix is a bottom-up approach, in which you examine a representative sample of microscopic units of what is being spent, and finding out the full explanation of why each one is being spent. This works for a simple statistical reason. If there is a reason why some percent (for example 40%) of samples can be saved, on average 40% of samples will show it, and it doesn't require a huge number of samples. It does require that you examine each sample carefully, and not just sort of aggregate them into bigger bunches.
As a historical example, this is what Harry Truman did at the outbreak of the U.S. involvement in WW II. There was terrific waste in the defense industry. He just got in his car, drove out to the factories, and interviewed the people standing around. Then he went back to the U.S. Senate, explained what the problems were exactly, and got them fixed.
Maybe this is more of an answer than you wanted. Specifically, this is the method I use, and this is a blow-by-blow example of it.
ADDED: I guess the idea of finding-by-measuring is simply natural. Around '82 I was working on an embedded system, and I needed to do some performance tuning. The hardware engineer offered to put a timer on the board that I could read (providing from his plenty). IOW he assumed that finding performance problems required timing. I thanked him and declined, because by that time I knew and trusted the random-halt technique (done with an in-circuit-emulator).
If you have the Auxiliary Clock available, you could use the SPY utility (configurable via the config.h file) which does give you a very rough approximation of which tasks are using the CPU.
The nice thing about it is that it does not require being attached to the Tornado environment and you can use it from the Kernel shell.
Otherwise, btpierre's suggestion of using taskHookAdd has been used successfully in the past.
I've worked on systems that have had luck using locally-built monitoring utilities based on taskSwitchHookAdd and related functions (delete hook, etc).
"Simply" use this to track the number of ticks a given task runs. I realize that this is fairly gross scale information for profiling, but it can be useful depending on your needs.
To see how much cpu% each task is using, calculate the percentage of ticks assigned to each task.
To see how much headroom you have, add a lowest priority "idle" task that just does "while(1){}", and see how much cpu% it is assigned to it. Roughly speaking, that's your headroom.