So I've already searched if there was a question like this posted before, but I wasn't able to find the answer I liked.
I've been working with some PLCs and variable frequency drives lately and thought it was about time I finally found out what cyclic and non-cyclic communication is.
So correct me if I'm wrong, but when I think of cyclic data, I think of data that is continuously being updated and is able to be sent/sampled to other devices. With relation to what I'm doing, I'm thinking that the variable frequency drive is able to update information such as speed and frequency that can be sampled/read from a PLC. This is what I would consider cyclic communication, something that is always updating a certain type of information that can be sent as data.
So I might be completely wrong with this assumption, and that leaves me with the question of what exactly would be considered non-cyclic or acyclic communication.
Any help?
Forenote: This is mostly a programming based site, and while your question does have an answer within the contexts of programming, I happen to know that in your industrial application, the importance of cyclic vs acyclic tends to be very hardware/protocol specific, and is really more of a networking problem than a programming one.
Cyclic data is not simply "continuous" data. In industry, it refers to data delivered on a guaranteed (or at least highly predictable) schedule. If the data stream were to violate the schedule, it could have disastrous consequences (a VFD misses its shutdown command by a fraction of a second, and you lose your arm!).
Acyclic data is still reliable for machine control, it is just delivered in a less deterministic way (on the order of milliseconds, sometimes up to several seconds). When accessing a single VFD with a single PLC, you will probably never notice this bursting behavior, and in fact, you may perceive smoother and quicker data transmissions. From the hardware interface perspective, acyclic data transfer does not provide as strong of a guarantee about if or when one machine will respond to the request of another.
Both forms of data transfer deliver data at speeds much faster than humans can deal with, but in certain applications they will each have their own consequences.
Cyclic networks usually must take the form of master/slave, where only one device is allowed speak at a time, and answers are always returned, even if just to confirm that the message was received. Cyclic networks usually do not allow as many devices on the same wire, and often they will pass larger amounts of data at slower rates.
Acyclic networks might be thought of as a bit more choatic, but since they skip handshaking formalities, they can often cheat more devices onto the network and get higher speeds all at the same time. This comes at the cost of occasional data collisions/bottlenecks, and sometimes even, requests for critical data are simply ignored/lost with no indication of failure or success from the target ( in the case the sender will likely be sitting and waiting desperately for a message it will not get, and often then trigger process watchdogs that will shutdown the system).
From a programmer perspective, not much is different between these two transmission types.
What will usually dictate a situation,
how many devices are running on the wire (sometimes this forces the answer right away)
how sensitive/volatile is the data they want to share (how useful are messages if they are a little late)
how much data they might be required to send at any given time ( shifting demands on a network that already produces race conditions can be hard to anticipate/avoid if you don't see it coming before hand).
Hope that helps :)
I'm starting on my first commercial sized application, and I often find myself making a design, but stopping myself from coding and implementing it, because it seems like a huge use of resources. This is especially true when it's on a piece that is peripheral (for example an enable for the output taps of a shift register). It gets even worse when I think about how large the generic implementation can get (4k bits for the taps example). The cleanest implementation would have these, but in my head it adds a great amount of overhead.
Is there any kind of rule I can use to make a quick decision on whether a design option is worth coding and evaluation? In general I worry less about the number of flip-flops, and more when it comes to width of signals. This may just be coming from a CS background where all application boundarys should be as small as possibly feasable to prevent overhead.
Point 1. We learn by playing, so play! Try a couple of things. See what the tools do. Get a feel for the problem. You won't get past this is you don't try something. Often the problems aren't where you think they're going to be.
Point 2. You need to get some context for these decisions. How big is adding an enable to a shift register compared to the capacity of the FPGA / your design?
Point 3. There's two major types of 'resource' to consider :- Cells and Time.
Cells is relatively easy in broad terms. How many flops? How much logic in identifiable blocks (e.g. in an ALU: multipliers, adders, etc)? Often this is defined by the design you're trying to do. You can't build an ALU without registers, a multiplier, an adder, etc.
Time is more subtle, and is invariably traded off against cells. You'll be trying to hit some performance target and recognising the structures that will make that hard are where to experience from point 1 comes in.
Things to look out for include:
A single net driving a large number of things. Large fan-outs cause a heavy load on a single driver which slows it down. The tool will then have to use cells to buffer that signal. Classic time vs cells trade off.
Deep clumps of logic between register stages. Again the tool will have to spend more cells to make logic meet timing if it's close to the edge. Simple logic is fast and small. Sometimes introducing a pipeline stage can decrease the size of a design is it makes the logic either side far easier.
Don't worry so much about large buses, if each bit is low fanout and you've budgeted for the registers. Large buses are often inherent in fast designs because you need high bandwidth. It can be easier to go wide than to go to a higher clock speed. On the other hand, think about the control logic for a wide bus, because it's likely to have a large fan-out.
Different tools and target devices have different characteristics, so you have to play and learn the rules for your set-up. There's always a size vs speed (and these days 'vs power') compromise. You need to understand what moves you along that curve in each direction. That comes with experience.
Is there any kind of rule I can use to make a quick decision on whether a design option is worth coding and evaluation?
Only rule I can come up with is 'Have I got time? or not?'
If I have, I'll explore. If not I better just make something work.
Ahhh, the life of doing design to a deadline!
It's something that comes with experience. Here's some pointers:
adding numbers is fairly cheap
choosing between them (multiplexing) gets big quite quickly if you have a lot of inputs to the multiplexer (the width of each input is a secondary issue also).
Multiplications are free if you have spare multipliers in your chip, they suddenly become expensive when you run out of hard DSP blocks.
memory is also cheap, until you run out. For example, your 4Kbit shift register easily fits within a single Xilinx block RAM, which is fine if you have one to spare. If not it'll take a large number of LUTs (depending on the device - an older Spartan 3 can fit 17 bits into a LUT (including the in-CLB register), so will require ~235 LUTS). And not all LUTs can be shift registers. If you are only worried about the enable for the register, don't. Unless you are pushing the performance of the device, routing that sort of signal to a few hundred LUTs is unlikely to cause major timing issues.
I am working on a point of sale (POS vending machine) project which has many images on the screen where the customer is expected to browse almost all of them. Here are my questions:
Can you please suggest me test cases for testing load time for images?
What is the acceptable load time for these images on screen?
Do we have any standards for testing these kind of acceptable load time?
"What is an acceptable loading time?" is a very broad question, one that has been studied as a research question for human computer interaction issues. In general the answer depends on:
How predictable the loading time is? (does it vary according to time of day, e.g. from 9am to 2am. unpredictable is usually the single most annoying thing about waiting)
How good is the feedback to the user? (does it look like it's broken or have a nice progress bar during the waiting? knowing it's nearly there can help ease the pain, even if the loading times are always consistent)
Who are the users and what other systems have they used previously? If it was all writing in a book before then waiting 2 minutes for images is going to be positively slow. If you're replacing something that took 3 minutes then it's pretty fast.
Ancillary input issues, e.g. does it buffer presses whilst loading and also move items around on the display so people press before it's finished and accidentally press the wrong thing? Does it annoyingly eat input soon after you've started to input it so you have to type/scan it again?
In terms of testing I'm assuming you're not planning on observing users and asking "how hard would you like to hurt this proxy for frustration?" What you can realistically test is how it copes under realistic loads and how accurate the predictions are.
We've got a fairly large application running on VxWorks 5.5.1 that's been developed and modified for around 10 years now. We have some simple home-grown tools to show that we are not using too much memory or too much processor, but we don't have a good feel for how much headroom we actually have. It's starting to make it difficult to do estimates for future enhancements.
Does anybody have any suggestions on how to profile such a system? We've never had much luck getting the Wind River tools to work.
For bonus points: the other complication is that our system has very different behaviors at different times; during start-up it does a lot of stuff, then it sits relatively idle except for brief bursts of activity. If there is a profiler with some programmatic way to have to record state information, I think that'd be very useful too.
FWIW, this is compiled with GCC and written entirely in C.
I've done a lot of performance tuning of various kinds of software, including embedded applications. I won't discuss memory profiling - I think that is a different issue.
I can only guess where the "well-known" idea originated that to find performance problems you need to measure performance of various parts. That is a top-down approach, similar to the way governments try to control budget waste, by subdividing. IMHO, it doesn't work very well.
Measurement is OK for seeing if what you did made a difference, but it is poor at telling you what to fix.
What is good at telling you what to fix is a bottom-up approach, in which you examine a representative sample of microscopic units of what is being spent, and finding out the full explanation of why each one is being spent. This works for a simple statistical reason. If there is a reason why some percent (for example 40%) of samples can be saved, on average 40% of samples will show it, and it doesn't require a huge number of samples. It does require that you examine each sample carefully, and not just sort of aggregate them into bigger bunches.
As a historical example, this is what Harry Truman did at the outbreak of the U.S. involvement in WW II. There was terrific waste in the defense industry. He just got in his car, drove out to the factories, and interviewed the people standing around. Then he went back to the U.S. Senate, explained what the problems were exactly, and got them fixed.
Maybe this is more of an answer than you wanted. Specifically, this is the method I use, and this is a blow-by-blow example of it.
ADDED: I guess the idea of finding-by-measuring is simply natural. Around '82 I was working on an embedded system, and I needed to do some performance tuning. The hardware engineer offered to put a timer on the board that I could read (providing from his plenty). IOW he assumed that finding performance problems required timing. I thanked him and declined, because by that time I knew and trusted the random-halt technique (done with an in-circuit-emulator).
If you have the Auxiliary Clock available, you could use the SPY utility (configurable via the config.h file) which does give you a very rough approximation of which tasks are using the CPU.
The nice thing about it is that it does not require being attached to the Tornado environment and you can use it from the Kernel shell.
Otherwise, btpierre's suggestion of using taskHookAdd has been used successfully in the past.
I've worked on systems that have had luck using locally-built monitoring utilities based on taskSwitchHookAdd and related functions (delete hook, etc).
"Simply" use this to track the number of ticks a given task runs. I realize that this is fairly gross scale information for profiling, but it can be useful depending on your needs.
To see how much cpu% each task is using, calculate the percentage of ticks assigned to each task.
To see how much headroom you have, add a lowest priority "idle" task that just does "while(1){}", and see how much cpu% it is assigned to it. Roughly speaking, that's your headroom.
A two parter:
1) Say you're designing a new type of application and you're in the process of coming up with new algorithms to express the concepts and content -- does it make sense to attempt to actively not consider optimisation techniques at that stage, even if in the back of your mind you fear it might end up as O(N!) over millions of elements?
2) If so, say to avoid limiting cool functionality which you might be able to optimise once the proof-of-concept is running -- how do you stop yourself from this programmers habit of a lifetime? I've been trying mental exercises, paper notes, but I grew up essentially counting clock cycles in assembler and I continually find myself vetoing potential solutions for being too wasteful before fully considering the functional value.
Edit: This is about designing something which hasn't been done before (the unknown), when you're not even sure if it can be done in theory, never mind with unlimited computing power at hand. So answers along the line of "of course you have to optimise before you have a prototype because it's an established computing principle," aren't particularly useful.
I say all the following not because I think you don't already know it, but to provide moral support while you suppress your inner critic :-)
The key is to retain sanity.
If you find yourself writing a Theta(N!) algorithm which is expected to scale, then you're crazy. You'll have to throw it away, so you might as well start now finding a better algorithm that you might actually use.
If you find yourself worrying about whether a bit of Pentium code, that executes precisely once per user keypress, will take 10 cycles or 10K cycles, then you're crazy. The CPU is 95% idle. Give it ten thousand measly cycles. Raise an enhancement ticket if you must, but step slowly away from the assembler.
Once thing to decide is whether the project is "write a research prototype and then evolve it into a real product", or "write a research prototype". With obviously an expectation that if the research succeeds, there will be another related project down the line.
In the latter case (which from comments sounds like what you have), you can afford to write something that only works for N<=7 and even then causes brownouts from here to Cincinnati. That's still something you weren't sure you could do. Once you have a feel for the problem, then you'll have a better idea what the performance issues are.
What you're doing, is striking a balance between wasting time now (on considerations that your research proves irrelevant) with wasting time later (because you didn't consider something now that turns out to be important). The more risky your research is, the more you should be happy just to do something, and worry about what you've done later.
My big answer is Test Driven Development. By writing all your tests up front then you force yourself to only write enough code to implement the behavior you are looking for. If timing and clock cycles becomes a requirement then you can write tests to cover that scenario and then refactor your code to meet those requirements.
Like security and usability, performance is something that has to be considered from the beginning of the project. As such, you should definitely be designing with good performance in mind.
The old Knuth line is "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." O(N!) to O(poly(N)) is not a "small efficiency"!
The best way to handle type 1 is to start with the simplest thing that could possibly work (O(N!) cannot possibly work unless you're not scaling past a couple dozen elements!) and encapsulate it from the rest of the application so you could rewrite it to a better approach assuming that there is going to be a performance issue.
Optimization isn't exactly a danger; its good to think about speed to some extent when writing code, because it stops you from implementing slow and messy solutions when something simpler and faster would do. It also gives you a check in your mind on whether something is going to be practical or not.
The worst thing that can happen is you design a large program explicitly ignoring optimization, only to go back and find that your entire design is completely useless because it cannot be optimized without completely rewriting it. This never happens if you consider everything when writing it--and part of that "everything" is potential performance issues.
"Premature optimization is the root of all evil" is the root of all evil. I've seen projects crippled by overuse of this concept. At my company we have a software program that broadcasts transport streams from disk on the network. It was originally created for testing purposes (so we would just need a few streams at once), but it was always in the program's spec requirements that it work for larger numbers of streams so it could later be used for video on demand.
Because it was written completely ignoring speed, it was a mess; it had tons of memcpys despite the fact that they should never be necessary, its TS processing code was absurdly slow (it actually parsed every single TS packet multiple times), and so forth. It handled a mere 40 streams at a time instead of the thousands it was supposed to, and when it actually came time to use it for VOD, we had to go back and spend a huge amount of time cleaning it up and rewriting large parts of it.
"First, make it run. Then make it run fast."
or
"To finish first, first you have to finish."
Slow existing app is usually better than ultra-fast non-existing app.
First of all peopleclaim that finishign is only thing that matters (or almost).
But if you finish a product that has O(N!) complexity on its main algorithm, as a rule of thumb you did not finished it! You have an incomplete and unacceptable product for 99% of the cases.
A reasonable performance is part of a working product. A perfect performance might not be. If you finish a text editor that needs 6 GB of memory to write a short note, then you have not finished a product at all, you have only a waste of time at your hands.. You must remember always that is not only delivering code that makes a product complete, is making it achieve capability of supplying the costumer/users needs. If you fail at that it matters nothing that you have finished the code writing in the schedule.
So all optimizations that avoid a resulting useless product are due to be considered and applied as soon as they do not compromise the rest of design and implementation proccess.
"actively not consider optimisation" sounds really weird to me. Usually 80/20 rule works quite good. If you spend 80% of your time to optimize program for less than 20% of use cases, it might be better to not waste time unless those 20% of use-cases really matter.
As for perfectionism, there is nothing wrong with it unless it starts to slow you down and makes you miss time-frames. Art of computer programming is an act of balancing between beauty and functionality of your applications. To help yourself consider learning time-management. When you learn how to split and measure your work, it would be easy to decide whether to optimize it right now, or create working version.
I think it is quite reasonable to forget about O(N!) worst case for an algorithm. First you need to determine that a given process is possible at all. Keep in mind that Moore's law is still in effect, so even bad algorithms will take less time in 10 or 20 years!
First optimize for Design -- e.g. get it to work first :-) Then optimize for performance. This is the kind of tradeoff python programmers do inherently. By programming in a language that is typically slower at run-time, but is higher level (e.g. compared to C/C++) and thus faster to develop, python programmers are able to accomplish quite a bit. Then they focus on optimization.
One caveat, if the time it takes to finish is so long that you can't determine if your algorithm is right, then it is a very good time to worry about optimization earlier up stream. I've encountered this scenario only a few times -- but good to be aware of it.
Following on from onebyone's answer there's a big difference between optimising the code and optimising the algorithm.
Yes, at this stage optimising the code is going to be of questionable benefit. You don't know where the real bottlenecks are, you don't know if there is going to be a speed problem in the first place.
But being mindful of scaling issues even at this stage of the development of your algorithm/data structures etc. is not only reasonable but I suspect essential. After all there's not going to be a lot of point continuing if your back-of-the-envelope analysis says that you won't be able to run your shiny new application once to completion before the heat death of the universe happens. ;-)
I like this question, so I'm giving an answer, even though others have already answered it.
When I was in grad school, in the MIT AI Lab, we faced this situation all the time, where we were trying to write programs to gain understanding into language, vision, learning, reasoning, etc.
My impression was that those who made progress were more interested in writing programs that would do something interesting than do something fast. In fact, time spent worrying about performance was basically subtracted from time spent conceiving interesting behavior.
Now I work on more prosaic stuff, but the same principle applies. If I get something working I can always make it work faster.
I would caution however that the way software engineering is now taught strongly encourages making mountains out of molehills. Rather than just getting it done, folks are taught to create a class hierarchy, with as many layers of abstraction as they can make, with services, interface specifications, plugins, and everything under the sun. They are not taught to use these things as sparingly as possible.
The result is monstrously overcomplicated software that is much harder to optimize because it is much more complicated to change.
I think the only way to avoid this is to get a lot of experience doing performance tuning and in that way come to recognize the design approaches that lead to this overcomplication. (Such as: an over-emphasis on classes and data structure.)
Here is an example of tuning an application that has been written in the way that is generally taught.
I will give a little story about something that happened to me, but not really an answer.
I am developing a project for a client where one part of it is processing very large scans (images) on the server. When i wrote it i was looking for functionality, but i thought of several ways to optimize the code so it was faster and used less memory.
Now an issue has arisen. During Demos to potential clients for this software and beta testing, on the demo unit (self contained laptop) it fails due to too much memory being used. It also fails on the dev server with really large files.
So was it an optimization, or was it a known future bug. Do i fix it or oprtimize it now? well, that is to be determined as their are other priorities as well.
It just makes me wish I did spend the time to reoptimize the code earlier on.
Think about the operational scenarios. ( use cases)
Say that we're making a pizza-shop finder gizmo.
The user turns on the machine. It has to show him the nearest Pizza shop in meaningful time. It Turns out our users want to know fast: in under 15 seconds.
So now, any idea you have, you think: is this going to ever, realistically run in some time less than 15 seconds, less all other time spend doing important stuff..
Or you're a trading system: accurate sums. Less than a millisecond per trade if you can, please. (They'd probably accept 10ms), so , agian: you look at every idea from the relevant scenarios point of view.
Say it's a phone app: has to start in under (how many seconds)
Demonstrations to customers fomr laptops are ALWAYS a scenario. We've got to sell the product.
Maintenance, where some person upgrades the thing are ALWAYS a scenario.
So now, as an example: all the hard, AI heavy, lisp-customized approaches are not suitable.
Or for different strokes, the XML server configuration file is not user friendly enough.
See how that helps.
If I'm concerned about the codes ability to handle data growth, before I get too far along I try to set up sample data sets in large chunk increments to test it with like:
1000 records
10000 records
100000 records
1000000 records
and see where it breaks or becomes un-usable. Then you can decide based on real data if you need to optimize or re-design the core algorithms.