I'm working on a project which will entail multiple devices, each with an embedded (ARM) processor, communicating. One development approach which I have found useful in the past with projects that only entailed a single embedded processor was to develop the code using Visual Studio, divided into three portions:
Main application code (in unmanaged C/C++ [see note])
I/O-simulating code (C/C++) that runs under Visual Studio
Embedded I/O code (C), which Visual Studio is instructed not to build and which runs on the target system. Previously this code was for the PIC; for most future projects I'm migrating to the ARM.
Feeding the embedded compiler/linker the code from parts 1 and 3 yields a hex file that can run on the target system. Running parts 1 and 2 together yields code which can run on the PC, with the benefit of better debugging tools and more precise control over I/O behavior (e.g. I can make the simulation code introduce certain types of random hiccups more easily than I can induce controlled hiccups on real hardware).
Target code is written in C, but the simulation environment uses C++ so as to simulate I/O registers. For example, I have a PortArray data structure; the header file for the embedded compiler includes a line like `unsigned char LATA @ 0xF89;` and my header file for simulation includes `#define LATA _IOBIT(f89,1)` which in turn invokes a macro that accesses a suitable property of an I/O object, so a statement like `LATA |= 4;` will read the simulated latch, "or" the read value with 4, and write the new value back. To make this work, the target code has to compile under C++ as well as under C, but this mostly isn't a problem. The biggest annoyance is probably with enum types (which behave as integers in C, but have to be coaxed to do so in C++).
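To illustrate the idea, here is a minimal sketch of the simulation-side register trick (the names and port-level granularity are illustrative, not my actual PortArray/_IOBIT implementation): a small proxy object with overloaded operators stands in for each register, so C statements like `LATA |= 4;` compile unchanged under C++ but route through the simulator.

```cpp
// Minimal sketch (illustrative names): a proxy object with overloaded
// operators stands in for each I/O register in the simulation build.
#include <cstdio>

struct SimPort {
    unsigned char state = 0;
    unsigned char read() const      { return state; }   // query simulated latch
    void write(unsigned char v)     { state = v; }      // simulator hook goes here
    operator unsigned char() const  { return read(); }
    SimPort& operator=(unsigned char v)  { write(v); return *this; }
    SimPort& operator|=(unsigned char v) { write(read() | v); return *this; }
    SimPort& operator&=(unsigned char v) { write(read() & v); return *this; }
};

static SimPort sim_ports[0x1000];       // simulated I/O space
#define _IOPORT(a) (sim_ports[0x##a])   // hypothetical helper macro
#define LATA _IOPORT(F89)               // target header: unsigned char LATA @ 0xF89;

int main() {
    LATA |= 4;                          // read-modify-write through the proxy
    printf("LATA = 0x%02X\n", (unsigned)(unsigned char)LATA);
}
```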
Previously, I've used two approaches to making the simulation interactive:
Compile and link a DLL with target-application and simulation code, and have VB code in the same project which interacts with it.
Compile the target-application code and some simulation code to an EXE with one instance of Visual Studio, and use a second instance of Visual Studio for the simulation-UI. Have the two programs communicate via TCP, so nearly all "real" I/O logic is in the simulation program. For example, the aforementioned `LATA |= 4;` would send a "read port 0xF89" command to the TCP port, get the response, process the received value, and send a "write port 0xF89" command with the result.
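For concreteness, here is a sketch of what that wire exchange might look like (the message format is made up; only the read-port/write-port commands themselves come from the description above):

```c
#include <stdio.h>
#include <sys/socket.h>   /* POSIX sockets; use winsock2.h on Windows */

/* Blocking, error handling omitted; `sock` is an already-connected socket. */
unsigned char remote_read(int sock, unsigned addr)
{
    char cmd[32];
    int n = snprintf(cmd, sizeof cmd, "R %03X\n", addr);   /* "read port 0xF89" */
    send(sock, cmd, n, 0);
    unsigned char value = 0;
    recv(sock, &value, 1, 0);          /* simulator replies with the port value */
    return value;
}

void remote_write(int sock, unsigned addr, unsigned char value)
{
    char cmd[32];
    int n = snprintf(cmd, sizeof cmd, "W %03X %02X\n", addr, (unsigned)value);
    send(sock, cmd, n, 0);
}
```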
I've found the latter approach to run a tiny bit slower than the former in some cases, but it seems much more convenient for debugging, since I can suspend execution of the unmanaged simulation code while the simulation UI remains responsive. Indeed, for simulating a single target device at a time, I think the latter approach works extremely well. My question is how I should best go about simulating a plurality of target devices (e.g. 16 of them).
The difficulty I have is figuring out how to make each simulated instance get its own set of global variables. If I were to compile to an EXE and run one instance of the EXE for each simulated target device, that would work, but I don't know any practical way to maintain debugger support while doing that. Another approach would be to arrange the target code so that everything would compile as one module joined together via #include. For simulation purposes, everything could then be wrapped into a single C++ class, with global variables turning into class-instance variables. That would be a bit more object-oriented, but I really don't like the idea of forcing all the application code to live in one compiled and linked module.
What would perhaps be ideal would be if the code could load multiple instances of the DLL, each with its own set of global variables. I have no idea how to do that, however, nor do I know how to make things interact with the debugger. I don't think it's really necessary that all simulated target devices actually execute code simultaneously; it would be perfectly acceptable for simulation instances to use cooperative multitasking. If there were some way of finding out what range of memory holds the global variables, it might be possible to have the 'task-switch' method swap out all of the global variables used by the previously-running instance and swap in the contents applicable to the instance being switched in. I'd know how to do that in an embedded context, but I have no idea how to do it on the PC.
Edit
My questions would be:
Is there any nicer way to allow simulation logic to be paused and examined in VS2010 debugger, while keeping a responsive UI for the simulator front-end, than running the simulator front end and the simulator logic in separate instances of VS2010, if the simulation logic must be written in C and the simulation front end in managed code? For example, is there a way to tell the debugger that when a breakpoint is hit, some or all other threads should be allowed to keep running while the thread that had hit the breakpoint sits paused?
If the bulk of the simulation logic must be source-code compatible with an embedded system written in C (so that the same source files can be compiled and run for simulation purposes under VS2010, and then compiled by the embedded-systems compiler for use in real hardware), is there any way to have the VS2010 debugger interact with multiple simulated instances of the embedded device? Assume performance is not likely to be an issue, but the number of instances will be large enough that creating a separate project for each instance would likely be annoying in the absence of any way to automate the process. I can think of three somewhat-workable approaches, but don't know how to make any of them work really nicely. There's also an approach which would be better if it's possible, but I don't know how to make it work.
Wrap all the simulation code within a single C++ class, such that what would be global variables in the target system become class members. I'm leaning toward this approach, but it would seem to require everything to be compiled as a single module, which would annoyingly affect the design of the target system code. Is there any nice way to have code access class instance members as though they were globals, without requiring all functions using such instances to be members of the same module? (A sketch of one possible macro-based trick follows this list.)
Compile a separate DLL for each simulated instance (so that e.g. if I want to run up to 16 instances, I would include 16 DLLs in the project, all sharing the same source files). This could work, but every change to the project configuration would have to be repeated 16 times. Really ugly.
Compile the simulation logic to an EXE, and run an appropriate number of instances of that EXE. This could work, but I don't know of any convenient way to do things like set a breakpoint common to all instances. Is it possible to have multiple running instances of an EXE attached to a single debugger instance?
Load multiple instances of a DLL in such a way that each instance gets its own global variables, while still being accessible in the debugger. This would be nicest if it were possible, but I don't know any way to do so. Is it possible? How? I've never used AppDomains, but my intuition would suggest that might be useful here.
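Regarding the macro-based trick mentioned in approach 1, here is a minimal sketch (all names here are mine and purely illustrative): former globals move into a context struct, and a macro layer maps the original names onto whichever instance is currently selected, so the target sources compile unchanged in either build. Cooperative "task switching" then reduces to repointing one pointer.

```cpp
// Minimal sketch (illustrative names): former globals live in a context
// struct; a macro layer maps the original names onto the selected instance.
#include <cstdio>

struct DeviceContext {          // one per simulated device
    unsigned char timer_count;
    unsigned char uart_state;
};

static DeviceContext devices[16];
static DeviceContext *cur = &devices[0];  // "task switch" = repoint this

// Simulation-only header rewrites the global names; the embedded build
// would instead declare real globals with these names.
#define timer_count (cur->timer_count)
#define uart_state  (cur->uart_state)

void tick(void)                 // unmodified target-style code
{
    timer_count++;
}

int main()
{
    for (int i = 0; i < 16; i++) {
        cur = &devices[i];      // select instance i, then run its code
        tick();
    }
    cur = &devices[3];
    printf("device 3 count = %u\n", (unsigned)timer_count);  // prints 1
}
```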
If I use one VS2010 instance for the front-end, and another for the simulation logic, is there any way to arrange things so that starting code in one will automatically launch the code in the other?
I'm not particularly committed to any single simulation approach; while it might be nice to know if there's some way of slightly improving the above, I'd also like to know of any other alternative approaches that could work even better.
I would think that you'd still have to run 16 copies of your main application code, but that your TCP-based I/O simulator could keep a different set of registers/state for each TCP connection that comes in.
Instead of a bunch of global variables, put them into a single structure that encompasses the I/O state of a single device. Either spawn off a new thread for each socket, or just keep a list of active sockets and dedicate a single instance of the state structure for each socket.
The simulators I have seen that handle multiple instances of the instruction set/processor are designed that way. There is usually a structure that contains a complete set of registers, and a pointer to, or an array of, these structures is used to multiply them into multiple instances of the processor.
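A minimal sketch of that per-instance register file (illustrative names, not any particular simulator's API): the TCP I/O simulator keeps one state structure per accepted socket instead of a pile of globals.

```c
/* One complete set of simulated registers per device/connection. */
typedef struct {
    unsigned char regs[0x1000];   /* I/O space; e.g. LATA lives at 0xF89 */
} DeviceState;

typedef struct {
    int         sock;             /* the TCP connection for this device */
    DeviceState state;
} SimInstance;

static SimInstance instances[16];

unsigned char handle_read(SimInstance *si, unsigned addr)
{
    return si->state.regs[addr & 0xFFF];
}

void handle_write(SimInstance *si, unsigned addr, unsigned char value)
{
    si->state.regs[addr & 0xFFF] = value;
}
```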
Related
In my project I injected a DLL (64-bit Windows 10) into an external process with manual mapping & thread hijacking, and I do some work in there.
Currently I use "RtlCreateUserThread" to create a new thread and run some extra workload there, to distribute it for better performance.
My question now is: is it possible to take over an existing thread of the current process (hijack it) and add your own workload/code to it, without creating a new thread?
I haven't found anything helpful on the internet yet, and the code I used and modified for thread hijacking seems to only work for a DLL file. Because I'm pretty new to C++ and still learning, I'm thankful for any help.
(If you want to see the source for the injector, Google GHInjector; you'll find the library on GitHub.)
It is possible, but complicated, and it may not work in all cases.
You need to splice the existing thread's machine code, so you will need write access to the code pages.
Logic:
Find the thread ID and thread handle, then suspend the thread with the SuspendThread WINAPI call.
The suspended thread may be in a wait state or inside a system DLL call, so you need to analyze the current execution stack, backtrace it, and find a return address in application space. You need the StackWalk API function, and PDB files in some cases. It also depends on the architecture (x86, amd64, ...). Walk up the stack until the EIP/RIP lands inside the application's own address space.
Decode the machine instruction at that point (it will be a 'call') and splice the following instructions into a call to your own function. You need a __declspec(naked) function, or one implemented in assembly, to execute your code plus the replaced instructions.
ResumeThread
This method may work only once, because there is no guarantee that the application code is executed in a loop.
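For orientation, a bare-bones sketch of the suspend/inspect/redirect/resume skeleton from the steps above (x64 only; `payload` is a hypothetical function already mapped into the target process, and the stack-walking and instruction-splicing steps are omitted entirely):

```cpp
#include <windows.h>

// Crude sketch: point a suspended thread's RIP at a payload function.
// A robust version must first walk the stack out of system DLLs and
// splice a real call site instead, as described in the steps above.
bool hijack_thread(DWORD tid, void (*payload)(void))
{
    HANDLE h = OpenThread(THREAD_ALL_ACCESS, FALSE, tid);
    if (!h) return false;

    if (SuspendThread(h) == (DWORD)-1) { CloseHandle(h); return false; }

    CONTEXT ctx = {};
    ctx.ContextFlags = CONTEXT_FULL;
    if (GetThreadContext(h, &ctx)) {
        ctx.Rip = (DWORD64)payload;   // payload must eventually restore flow
        SetThreadContext(h, &ctx);
    }
    ResumeThread(h);
    CloseHandle(h);
    return true;
}
```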
I'm looking for a little advice on "hacking" Mono (and in fact, .NET too).
Context: As part of the Isis2 library (Isis2.codeplex.com) I want to support very fast "zero copy" replication of memory-mapped files on machines that have the right sort of hardware (Infiniband NICs), and minimal copying for more standard Ethernet with UDP. So the setup is this: we have a set of processes {A,B....} all linked to Isis2, and some member, maybe A, has a big memory-mapped file, call it F, and asks Isis2 to replicate F onto B, D, G, and X. The library will do this very efficiently and very rapidly, even with heavy use by many concurrent initiators. The idea would be to offer this to HPC and cloud developers who are running big-data applications.
Now, Isis2 is coded in C# on .NET and cross-compiles to Linux via Mono. Both .NET and Mono are managed, so neither wants to let me do zero-copy network I/O -- the normal model would be "copy your data into a managed byte[] object, then use SendTo or SendAsync to send. To receive, same deal: Receive or ReceiveAsync into a byte[] object, then copy to the target location in the file." This will be slower than what the hardware can sustain.
Turns out that on .NET I can hack around the normal memory protections. I built my own mapped-file wrapper (in fact based on one posted years ago by a researcher at Columbia). I pull in kernel32.dll, and then use Win32 methods to map my file, initiate the socket Send and Receive calls, etc. With a bit of hacking I can mimic .NET asynchronous I/O this way, and I end up with something fairly clean and coded entirely in C# with nothing .NET even recognizes as unsafe code. I get to treat my mapped file as a big unmanaged byte array, avoiding all that unneeded copying. Obviously I'll protect all of this from my Isis2 users; they won't know.
Now we get to the crux of my question: on Linux, I obviously can't load the Win32 kernel DLL since it doesn't exist. So I need to implement some basic functionality using core Linux O/S calls: the mmap() call will map my file. Linux has its own form of asynchronous I/O too: for Infiniband, I'll use the Verbs library from Mellanox, and for UDP, I'll work with raw IP sends and signals ("interrupts") on completion. Ugly, but I can get this to work, I think. Again, I'll then try to wrap all this to look as much like standard Windows async I/O as possible, for code cleanness in Isis2 itself, and I'll hide the whole unmanaged, unsafe mess from end users.
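For the mapping piece specifically, a minimal C sketch of the mmap() call a managed wrapper would P/Invoke (error handling trimmed; the function name is mine):

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map an entire file read/write; returns NULL on failure. */
void *map_file(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return NULL; }

    /* MAP_SHARED: writes reach the file and are visible to other mappers. */
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);                     /* the mapping holds its own reference */
    if (p == MAP_FAILED) return NULL;

    *len_out = (size_t)st.st_size;
    return p;
}
```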
Since I'll be sending a gigabyte or so at a time, in chunks, one key goal is that data sent in order would ideally be received in the order I post my async receives. Obviously I do have to worry about unreliable communication (causes stuff to end up dropped, and I'll then have to copy). But if nothing is dropped I want the n'th chunk I send to end up in the n'th receive region...
So here's my question: Has anyone already done this? Does anyone have any tips on how Mono implements the asynchronous I/O calls that .NET uses so heavily? I should presumably do it the same way. And does anyone have any advice on how to do this with minimal pain?
One more question: Win32 is limited to 2 GB of mapped files. Cloud systems would often run Win64. Any suggestions on how to maximize interoperability while allowing full use of Win64 for those who are running that? (A kind of O/S reflection issue...)
From what experience I have programming, whenever a program has a problem it crashes, whether from an unhandled exception or from a piece of code that should have been checked for errors but wasn't and threw one. What would cause a program to completely freeze a system, to the point of requiring a restart?
Edit: Thanks for the answers. As for the language and OS: this question was inspired by me playing Fallout and the game freezing twice in an hour, forcing me to restart the Xbox, so I am guessing C++.
A million different things. The most common that come to mind are:
Spawning too many threads or processes, which drowns the OS scheduler.
Gobbling too much RAM, which puts the memory manager into page-fault hell.
In a .NET/Java-type environment it's quite difficult to seize a system up, because the runtime keeps your code at a distance from the OS.
Closer to the metal, say in C or C++, assembly, etc., you have to play fair with the rest of the system. If you don't have it already, grab a copy of Petzold and observe/experiment yourself with the amount of 'boilerplate' code needed to get a single window running...
Even closer, down at the driver level, all sorts of things can happen...
There are a number of reasons, internal or external, that lead to a deadlocked application. The most general case is when a program asks for something and is never given it, which leads to infinite waiting. A practical example: a program writes some text to a file, but when it is about to open the file for writing, the same file is opened by another application, so the requesting app will wait (and freeze, in some cases, if not coded properly) until it gets exclusive control of the file.
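As a hypothetical Win32 illustration of that file case (names and structure are mine): a naive retry loop around an exclusive open never returns while another process holds the file, so the app appears frozen.

```cpp
#include <windows.h>

// Open a file for exclusive writing, retrying forever -- the bug that
// makes the app "freeze" while another process holds the file.
HANDLE open_exclusive(const wchar_t *path)
{
    for (;;) {                      // no timeout, no yield, no escape path
        HANDLE h = CreateFileW(path, GENERIC_WRITE,
                               0 /* no sharing */, NULL,
                               OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        if (h != INVALID_HANDLE_VALUE) return h;
        // Typically ERROR_SHARING_VIOLATION here; we just spin and retry.
    }
}
```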
A critical freeze that forces a system restart can happen when the file being asked for is something very important to the OS. However, you may not need to restart the system to get it back to normal, unless the frozen program is written in a language that produces a native binary, C/C++ to be precise. If the application is written in a language that uses managed code, like any .NET language, it will not need a system restart to get things back to normal.
Page faults, trying to access inaccessible data or memory (access violation), incompatible data types, etc.
Would it be possible to take the source code from a SNES emulator (or any other game system emulator for that matter) and a game ROM for the system, and somehow create a single self-contained executable that lets you play that particular ROM without needing either the individual rom or the emulator itself to play? Would it be difficult, assuming you've already got the rom and the emulator source code to work with?
It shouldn't be too difficult if you have the emulator source code. You can use a method that is often used to store images in C source files.
Basically, what you need to do is create a char * variable in a header file, and store the contents of the rom file in that variable. You may want to write a script to automate this for you.
Then, you will need to alter the source code so that instead of reading the rom in from a file, it uses the in memory version of the rom, stored in your variable and included from your header file.
It may require a little bit of work if you need to emulate file pointers and such, or you may be lucky and find that the rom loading function just loads the whole file in at once. In this case it would probably be as simple as replacing the file load function with a function to return your pointer.
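A minimal sketch of that approach: `xxd -i game.rom` generates a header of roughly this shape, and the emulator's loader is then swapped for a function that returns the embedded copy (the byte values and the `load_rom` name here are placeholders):

```c
/* rom_data.h -- generated, e.g. by `xxd -i game.rom` (placeholder bytes) */
static const unsigned char game_rom[]   = { 0x78, 0x18, 0xA9, 0x00 /* ... */ };
static const unsigned int  game_rom_len = sizeof(game_rom);

/* Replace the emulator's file-loading routine with something like this: */
const unsigned char *load_rom(unsigned int *size)
{
    *size = game_rom_len;
    return game_rom;     /* no fopen/fread: the ROM is already in memory */
}
```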
However, be careful about licensing issues. If the emulator is licensed under the GPL, you may not be legally allowed to store a proprietary file in the executable, so it would be worth checking that, especially before you release / distribute it (if you plan to do so).
Yes, more than possible; it's been done many times. Google: static binary translation. Graham Toal has a good howto paper on the subject; it should show up early in the hits. I may have left some code out there as well.
Completely removing the rom may be a bit more work than you think, but not using an emulator is definitely possible. Actually, both requirements are possible, and you may be surprised how many of the handheld console games or set-top box games are translated and not emulated, especially platforms like those from Nintendo where there isn't enough processing power to emulate in real time.
You need a good emulator as a reference and/or write your own emulator as a reference. Then you need to write a disassembler, and have that disassembler generate C code (please don't try to translate directly to another target; I made that mistake once. C is portable and the compilers will take care of a lot of dead-code elimination for you). So an instruction from a make-believe instruction set might be:
add r0,r0,#2
And that may translate into:
//add r0,r0,#2
r0=r0+2;
do_zflag(r0);
do_nflag(r0);
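The `do_zflag`/`do_nflag` helpers are not spelled out above; one plausible implementation, assuming an 8-bit register, would be:

```c
/* Hypothetical flag helpers mirroring the emulated CPU's status bits. */
static unsigned char zflag, nflag;

static void do_zflag(unsigned char r) { zflag = (r == 0); }          /* zero     */
static void do_nflag(unsigned char r) { nflag = (r & 0x80) != 0; }   /* negative */
```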
It looks like the SNES is related to the 6502, which is what Asteroids used, and which is the translation I have been working on, off and on, for a while now as a hobby. The emulator you are using is probably written and tuned for runtime performance, and may be difficult at best to use as a reference and to check in lock step with the translated code. The 6502 is nice because, compared to say the Z80, there really are not that many instructions.

As with any variable-word-length instruction set, the disassembler is your first big hurdle. Do not think linearly; think execution order, think like an emulator. You cannot linearly translate instructions from zero to N or from N down to zero. You have to follow all the possible execution paths, marking bytes in the rom as being the first byte of an instruction or not the first byte of an instruction. Some bytes you can decode as data and, if you choose, mark those; otherwise assume all other bytes are data or fill.

Figuring out what to do with that data is the hard part of getting rid of the rom. Some code addresses data directly; other code uses register-indirect addressing, meaning at translation time you have no idea where the data is or how much of it there is. Once you have marked all the starting bytes for instructions, it is a trivial task to walk the rom from zero to N, disassembling and/or translating.
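A sketch of that execution-order marking pass (assuming a 64 KB address space; `insn_len`, `is_branch`, `branch_target`, and `ends_flow` are hypothetical ISA-specific helpers you would write):

```c
/* Hypothetical ISA-specific helpers you would write: */
unsigned insn_len(const unsigned char *rom, unsigned pc);      /* bytes in insn */
int      is_branch(const unsigned char *rom, unsigned pc);     /* has a target? */
unsigned branch_target(const unsigned char *rom, unsigned pc);
int      ends_flow(const unsigned char *rom, unsigned pc);     /* jmp/ret/etc.  */

static unsigned char is_code[0x10000];   /* 1 = first byte of an instruction */

static void trace(const unsigned char *rom, unsigned pc)
{
    while (pc < 0x10000 && !is_code[pc]) {
        is_code[pc] = 1;
        if (is_branch(rom, pc))
            trace(rom, branch_target(rom, pc));   /* follow the taken path */
        if (ends_flow(rom, pc))                   /* unconditional jump, return */
            return;
        pc += insn_len(rom, pc);                  /* fall through */
    }
}

/* Seed with every entry point, e.g. trace(rom, reset_vector); */
```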
Good luck, enjoy, it is well worth the experience.
I'm new to VB6, but I'm currently in charge of maintaining a horror of an editor-like tool with plenty of forms, classes, modules, and third-party tools, all chunked together like the skin faces on that guy in The Texas Chainsaw Massacre...
What I don't understand is why I get different results when I run the app in debug mode, vs. when I compile it and run it on my development PC, vs. when I install it on a different PC.
Yes, I know I'm dumb, so please direct me to where I can find out more about this. I'm hoping it's something like different linking, something registry-related, etc., some connection that I'm simply not getting right now, i.e. something like wax on, wax off :P
The main pain in the neck is when I'm trying to debug errors from my QA: I need to find a spare PC to test on, plus I can't directly debug, because I don't know where in the code the problem is if I do it that way.
Thanks.
> when I run the app in debug mode vs. when I compile it and run it on my development PC
When you compile, you have the option of compiling to native code or p-code. The debugger runs using p-code only. Under rare circumstances, when you compile to native code there will be a change in behavior. This particular problem is really rare. I have used VB6 since its release, and I may hit it once or twice a year. My application is a complex CAD/CAM package that creates shapes and runs metal-cutting machines, and it has two dozen DLLs. Not a typical situation. At home, with my hobby software, I never ran into this problem.
There is another class of errors that result from event-sequencing problems. While VB6 isn't truly multitasking, it has the ability to jump out of the current code block to process an event. If it re-enters the same block for the new event, interesting things (to say the least) can result. I think this is the likely source of your problems, as your software is an editor, which is a highly interactive type of software.
In general the problem is fixed by reordering the affected areas. You find the affected area by inserting MsgBox calls or by writing to a text file to log where you are. I recommend logging to a text file, as MsgBox tends to alter behavior that is timing- or multitasking-related.
Remember, if an event fires while VB6 is in the middle of a code block and there's a DoEvents floating around, then it will leave the code block, process the event, and return to the original code block. If it re-enters the same code block and you didn't mean for that to happen, then you will have problems. And you will have different problems on different computers, as the timing will be different on each.
The easiest way to deal with this type of issue is to create some flag variables. In multitasking parlance they are known as semaphores or mutexes. When you enter a critical section of code, you set the flag to true. When you leave the routine, you set it back to false. If it is already true when you enter that section of code, you don't execute the section.
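The shape of that guard, sketched in C for concreteness (in VB6 it would be a module-level Boolean checked and set the same way; the names are illustrative):

```c
static int in_refresh = 0;          /* the "flag variable" */

void refresh_view(void)
{
    if (in_refresh) return;         /* already inside: skip the re-entrant call */
    in_refresh = 1;
    /* ... critical section: code that may pump events (DoEvents) ... */
    in_refresh = 0;
}
```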
> when I install it on a different PC
These are usually the result of the wrong DLL being installed. Most likely you have an older version while the target has a newer version. I would download the free Virtual PC and create a clean Windows XP install to double-check this.
If your problem is event timing, this too can differ between computers. It is found by logging (not MsgBox) in the suspect regions.
If you can post a screenshot or the text of your specific errors, then I can help better.
The first thing to check would be the versions of all the DLLs that your app depends on, including the service-pack version of the VB6 DLL.
Have you any more specific details about what's behaving differently?