Are " process image" and "address space of a process" the same concept? - process

I just saw "process image" in the description for exec- functions in GNU Libc's manual. Is it the same concept as the address space of a process? Thanks.

I'd say they are related, but not quite the same.
The process image is everything you get from the program executable and load into memory, plus everything that gets added or modified at load time.
The address space is all its virtual addresses, and whatever is in them.
Same stuff - almost - but different viewpoint.
Much of the time, when I'm concerned about address space, I just want to know whether some particular address is valid for that process, and/or whatever addresses are currently available. Maybe I care about details like "this one's currently resident; that one's paged out; this other one will be initialized with zeroes when it's first accessed". But I really don't care about the contents, and I don't care about details like text/data/bss/heap/stack/mmap etc.
When I'm concerned about a process image, I'm pretty much concerned with the stuff that comes from the executable file, and getting it set up right. And if the program's already started running, I care about things like its registers, not just its memory.

A process consists of:
An address space
One or more threads of execution.
That which is sometimes called the process image consists of both.
An exec- function will:
Create a new address space
Wipe out all the threads and create a main thread.
A process image is really the same as a process.

Related

SCNkit: Issues with unhidden nodes

I am creating a 3-D game with a cave as the main environment. The cave is made of a large number of ring segments, one attached to the other, thus creating a currently small tunnel system.
If the Player is inside the cave, only a small part of the segments are visible. I am figuring that actually hiding the not-visible segments could save a lot of gpu time, which I need for other objects like buildings or enemies.
So what I try to do first is hiding the entire cave and then unhiding the visible segments by turning ‚node.isHidden’ true and false.
The particular nodes are being found and accessed by their names: ‚Node.childnode (withName: „XYZ003“, recursively: false).isHidden = true‘ (or false).
It works to the point where the segments are unhidden, but once I am trying to hide a previously unhidden segment, the renderer crashes with an EXC_BAD_ACCESS.
Doing the hiding on a hidden object (of course useless, but helping to understand the problem) is fine, so is unhiding unhidden segments.
Following the hint of another thread, I moved the routine into the renderer delegate so not doing the switching during the wrong time, but instead during the phase in which such changes are supposed to happen, but this did not help.
As an alternative, I did the hiding (and unhiding) by SCNActions, but I received the same result, which really puzzles me, as this would be kind of the ‚official way‘ to do it...
I also played around with the ‚recursively’ boolean, getting the same outcome (works for unhide, crashes on isHidden = true).
Then I tried to change opacity or other properties of the nodes - which worked perfectly. On the other hand, trying to remove the nodes from the parent resulted in the mentioned crash as well.
I need this to work, because older hardware could never cope with several thousand nodes (trying this, the frame rate dropped to 10fps, even without enemies around). And newer hardware might break down once the enemies appear...
My thinking is that the pointer is somehow messed up by the first unhiding (and hence the BAD_ACCESS error), so maybe an additional bonding (often seen with spritekit-routines) or another way to get the node-pointer could be the solution. On the other hand, if the pointer is broken, why can I still access all other properties? Maybe it‘s the subnodes that cause the problem - everyone of the nodes has 20 subnodes, which are supposed to change visibility, too.
Did anyone come across this behavior before me? I could not find anything during my google-research...
Hiding and unhiding nodes frequently is typically not a problem by itself. You can hide a main node and any sub-nodes of the main node will automatically hide themselves, so you shouldn't have to loop them individually.
I'm not an expert debugger and don't know your skill level, but BAD_ACCESS can mean that you tried to send a msg to a block of memory that can't execute the message or whenever the app tried to deference a corrupt pointer. Search "What Is EXC_BAD_ACCESS and How to Debug It" for a decent tutorial on some options for dealing with it.
I do my changes in the render delegate as well, but depending on the number of changes and how long they take, I sometimes use timers to control the amount of changes that can be made in a certain amount of time. That way, and after some adjustments, I'm pretty sure that I'm not bogging it down to a point where it just spirals out of control.
Structure can matter - personal preference, but I try to setup an array of classes that create individual nodes (and sub-nodes) and therefore have direct access to them. That way I'm not iterating through the whole node structure or finding nodes by name. Sometimes a lot is going on before I really have to make a modification to the node itself and so I can loop through my array of classes, check values, compare, etc. before taking action that involves the display. That also gives me a chance to remove particle systems, remove actions, set geometry = nil and update logic counters when I need to remove a node.
I'm sure opinions vary, but this has worked well for me. Once I standardized the structure, I just keep repeating the pattern.
Hope that helps

Having trouble making my VI work as a Sub VI

I am having trouble getting the terminals to pass any data to what they are connected to because the controls they connect to are in a while loop. My frustration level is high since I would have already had this done if I wrote it in C.
First, let me say this might get a little long so if you don't want to read it, then don't. Here goes. I have watched a couple of tutorials, read a lot, and even tried a few things out in code. I get why this can't be done directly in a while loop. Having said that, it seems that I have no choice but to use while loop(s) in my VI.
My VI is loosely based on Queued Message Handler in the Templates section of creating a new VI. I have 2 things that must take place. One - I have created a TCP client where I constantly send messages to get status from the equipment I am communicating with. This is a timed event and must be handled in a while loop so I can maintain the connection to the server. I am not doing the Open, Send, Close, Reopen, Send, Close, etc. type of message handling. Too inefficient. This is the lower half of the example template.
Second - On occasion the user will press a button on the front panel which creates a message that is sent to the equipment to make it do something. And this, it would seem, needs to be in a while loop also, hence my problem. Some/most of the controls exist with the event structure. This is the top half of the example template.
I actually have this working as a front panel, but every thing is in just one while loop and I cannot get the terminals to work. Here is where my confusion comes in, if I am passing something to the while loop, I only get its value once and if it changes, you don't get that change, and if you are passing the data out of the while loop, you only get it when the loop ends. These two things are really baffling me. How can pass data that changes while using a while loop, because I have to, but the while loop breaks using the terminals. Seems circular. The TCP communications cannot stop, and I cannot find an example of how to do this using my friend Google. Am I the only person on this planet that needs to do this? Doubt it.
Not going to show my code, as this in not a code problem. It is an understanding how LabView does things vs. how you would just write the code in C using some library. And also just being unfamiliar with all the things you can do in LabView, not to mention how things are different. I don't know what I don't know, but I can learn.
I want to be able to give the VI I have created to any user and let them use it to control my equipment. If they just want to run it as a front panel, or if they want to use it as a Sub VI that is OK too. I just need to be able to make the terminals actually pass data when used that way.
Thanks, I did order a book on LabView today, but I won't get it soon. I really need to put this problem to bed.
Cannot help that much without seeing the code. But I can try to give you a little bit of an idea of what is going on.
Dataflow is an important concept to understand in LabVIEW. elements (VIs, loops, etc.) will not start until all of their inputs (ie. terminals) have been received or set by something called before, and then they only take their inputs once. If your terminal is outside the loop, then the loop can only read it's starting value. (See "Infinite Loops" on this page). A simple way of solving this would be to put the terminal inside of the loop rather than outside, so it is then read on every iteration of the loop.
As for passing values outside of the loop, there are a number of methods for this. Again, because of dataflow, you will not usually be able to access the value of something inside the loop until the loop finishes executing. However, there are a number of ways to read those values in a different loop. Local or global variables would be the simplest way, but they are not recommended by NI. The proper way of handling this is using something on the synchronization pallet. More info on the options can be found here.
Seeing as you are basing something on the Queued Message Handler, a queue might be a good way to start. LabVIEW has built in examples of code to show you how to use these functions.
Loop synchronization and asynchronous programming are fundamental concepts for writing LabVIEW code. If these are not concepts you are familiar with, I would say that you will gain a lot from showing others your actual code and having people help you with the issues. If you are concerned about sharing something proprietary, try making a simple example and posting that code instead to understand the concepts better.
Event structure to react to evr8and functional global to pass data out.
Suggest pasting block diagram.

How are TLB's so fast?

When handling virtual memory you often use a TLB (I'm asking about software-managed TLB's) to make things faster. Instead of plugging your virtual address into a page table to get the physical address it maps to you can use a TLB in between to store recently used pages.
So, what you do is that you take your virtual address and look if the TLB contains your address, if it does it will return that physical address the virtual address maps to, but if it doesn't get a hit it will go to the page table, load that page into the TLB and return the physical address.
My question is: How can the TLB be so fast if it has to search for your virtual addresses all the time? The entries seemingly come in no certain order, so going through them would be very slow, I'd assume.
For context, this is the image of a TLB-lookup I have in my head (picture is taken from Wikipedia).
The entries seemingly come in no certain order, so going through them would be very slow, I'd assume.
That's not how things usually work in hardware, "going through" entries isn't really a thing. Well you can do it that way, but then it's not fast anymore so it misses the point.
All entries can be compared simultaneously, and the matching entry (if it exists) can be extracted with a log-depth circuit (constant-depth if you allow arbitrary fan-in, but that has its own problems).
The TLB is just a cache. The idea behind a cache is that it's easier to search a small number of recent references rather than a larger, more "distant", number of unprecendened references.
If I understand, your question is more: "why is caching fast?"
Perhaps someone can explain from an architecture standpoint better than I?

VB.net Garbage collector not releasing objects

First of all, thanks in advance for your help.
I've decided to ask for help in forums like this one because after several months of hard working, I couldn't find a solution for my problem.
This can be described as 'Why an object created in VB.net isn't released by the GC when it is disposed even when the GC was forced to be launched?"
Please consider the following piece of code. Obviously my project is much more complex, but I was able to isolate the problem:
Imports System.Data.Odbc
Imports System.Threading
Module Module1
Sub Main()
'Declarations-------------------------------------------------
Dim connex As OdbcConnection 'Connection to the DB
Dim db_Str As String 'ODBC connection String
'Sentences----------------------------------------------------
db_Str = "My ODBC connection String to my MySQL database"
While True
'Condition: Infinite loop.
connex = New OdbcConnection(db_Str)
connex.Open()
connex.Close()
'Release created objects
connex.Dispose()
'Force the GC to be launched
GC.Collect()
'Send the application to sleep half a second
System.Threading.Thread.Sleep(500)
End While
End Sub
End Module
This simulates a multithreaded application making connections to a MySQL database. As you can see, the connection is created as a new object, then released. Finally, the GC was forced to be launched. I've seen this algorithm in several forums but also in the MSDN online help, so as far as I am concerned, I am not doing anything wrong.
The problem begins when the application is launched. The object created is disposed within the code, but after a while, the availiable memory is exhausted and the application crashes.
Of course, this problem is hard to see in this little version, but on the real project, the application runs out of memory very quickly (due to the amount of connections made over the time) and as result, the uptime is only two days. Then I need to restart the application again.
I installed a memory profiler on my machine (Scitech .Net Memory profiler 4.5, downloadable trial version here). There is a section called 'Investigate memory leaks'. I was absolutely astonished when I saw this on the 'Real Time' tab. If I am correct, this graphic is telling me that none of the objects created on the code have been actually released:
The surprise was even bigger when I saw this other screen. According to this, all undisposed objects are System.Transactions type, which I assume are internally managed within the .Net libraries as I am not creating any object of this type on my code. Does it mean there is a bug on the VB.net Standard libraries???:
Please notice that in my code, I am not executing any query. If I do, the ODBCDataReader object won't be released either, even if I call the .Close() method (surprisingly enough, the number of unreleased objects of this type is exactly the same as the unreleased objects of type System.Transactions)
Another important thing is the statement GC.Collect(). This is used by the memory profiler to refresh the information to be displayed. If you remove it from the code, the profiler wont' update the real time diagram properly, giving you the false impression that everything is correct.
Finally, if you ommit the connex.Open() statement, the screenshot #1 will render a flat line (that means all the objects created have been successfully released), but unfortunatelly, we can't make any query against the database if the connection hasn't been opened.
Can someone find a logical explanation to this and also, a workaround for effectively releasing the objects?
Thank you all folks.
Nico
Dispose has nothing to do with garbage collection. Garbage collection is exclusively about managed resources (memory). Dispose has no bearing on memory at all, and is only relevant for unmanaged resources (database connections, file handles, gdi resource, sockets... anything not memory). The only relationship between the two has to do with how an object is finalized, because many objects are often implemented such that disposing them will suppress finalization and finalizing them will call .Dispose(). Explicitly Disposing() an object will never cause it to be collected1.
Explicitly calling the garbage collector is almost always a bad idea. .Net uses a generational garbage collector, and so the main effect of calling it yourself is that you'll hold onto memory longer, because by forcing the collection earlier you're likely to check the items before they are eligible for collection at all, which sends them into a higher-order generation that is collected less often. These items otherwise would have stayed in the lower generation and been eligible for collection when the GC next ran on it's own. You may need to use GC.Collect() now for the profiler, but you should try to remove it for your production code.
You mention your app runs for two days before crashing, and are not profiling (or showing results for) your actual production code, so I also think the profiler is in part misleading you here. You've pared down the code to something that produced a memory leak, but I'm not sure it's the memory leak you are seeing in production. This is partly because of the difference in time to reproduce the error, but it's also "instinct". I mention that because some of what I'm going to suggest might not make sense immediately in light of your profiler results. That out of the way, I don't know for sure what is going on with your lost memory, but I can make a few guesses.
The first guess is that your real code has try/catch block. An exception is thrown... perhaps not on every connection, but sometimes. When that happens, the catch block allows your program to keep running, but you skipped over the connex.Dispose() line, and therefore leave open connections hanging around. These connections will eventually create a denial of service situation for the database, which can manifest itself in a number of ways. The correction here is to make sure you always use a finally block for anything you .Dispose(). This is true whether or not you currently have a try/catch block, and it's important enough that I would say the code you've posted so far is fundamentally wrong: you need a try/finally. There is a shortcut for this, via a using block.
The next guess is that some of your real commands end up fairly large, possibly with large strings or image (byte[]) data involved. In this case, items end up on a special garbage collector generation called the Large Object Heap (LOH). The LOH is rarely collected, and almost never compacted. Think of compaction as analogous to what happens when you defrag a hard drive. If you have items going to the LOH, you can end up in a situation where the physical memory itself is freed (collected), but the address space within your process (you are normally limited to 2GB) is not freed (compacted). You have holes in your memory address space that will not be reclaimed. The physical RAM is available to your system for other processes, but over time this still results in the same kind of OutOfMemory exception you're seeing. Most of the time this doesn't matter: most .Net programs are short-lived user-facing apps, or ASP.Net apps where the entire thread can be torn down after a page is served. Since you're building something like a service that should run for days, you have to be more careful. The fix may involve significantly re-working some code, to avoid creating the large objects at all. That may mean re-using a single or small set of byte arrays over and over, or using streaming techniques instead of string concatenation or string builders for very large sql queries or sql query data. It may also mean you find this easier to do as a scheduled task that runs daily and shuts itself down at the end of the day, or a program that is invoked on demand.
A final guess is that something you are doing results in your connection objects still being in some way reachable by your program. Event handlers are a common source of mistakes of this sort, though I would find it strange to have event handlers on your connections, especially as this is not part of your example.
1 I suppose I could contrive a scenario that would make this happen. A simple way would be to build an object assumes a global collection for all objects of that type... the objects add themselves to the collection at construction and remove themselves at disposal. In this way, the object could not be collected before disposal, because before that point it would still be reachable... but that would be a very flawed program design.
Thank you all guys for your very helpful answers.
Joel, you're right. This code produces 'a leak' which is not necesarily the same as 'the leak' problem I have on my real project, though they reproduce the same symptoms, that is, the number of unreleased objects keep growing (and eventually will exhaust the memory) on the code mentioned above. So I wonder what's wrong with it as everything seems to be properly coded. I don't understand why they are not disposed/collected. But according to the profiler, they are still in memory and eventually will prevent to create new objects.
One of your guesses about my 'real' project hit the nail on the head. I've realized that my 'catch' blocks didn't call for object disposal, and this has been now fixed. Thanks for your valuable suggestion. However, I implemented the 'using' clause in the code in my example above and didn't actually fix the problem.
Hans, you are also right. After posting the question, I've changed the libraries on the code above to make connections to MySQL.
The old libraries (in the example):
System.Data.Odbc
The new libraries:
System.Data
Microsoft.Data.Odbc
Whith the new ones, the profiler rendered a flat line, whithout any further changes on the code, which it was what I've been looking after. So my conclussion is the same as yours, that is there may be some internal error in the old ones that makes that thing to happen, which makes them a real 'troublemaker'.
Now I remember that I originally used the new ones on my project (the System.Data and Microsoft.Data.Odbc) but I soon changed for the old ones (the System.Data.Odbc) because the new ones doesn't allow Multiple Active Recordsets (MARS) opened. My application makes a huge amount of queries against the MySQL database, but unfortunately, the number of connections are limited. So I initially implemented my real code in such a way that it made only a few connections, but they were shared accross the code (passing the connection between functions as parameter). This was great because (for example) I needed to retrieve a recordset (let's say clients), and make a lot of checks at the same time (example, the client has at least one invoice, the client has a duplicated email address, etc, which involves a lot of side queries). Whith the 'old' libraries, the same connection allowed to create multiple commands and execute different queries.
The 'new' libraries don't allow MARS. I can only create one command (that is, to execute a query) per session/connection. If I need to execute another one, I need to close the previous recordset (which isn't actually possible as I am iterating over it), and then to make the new query.
I had to find the balance between both problems. So I end up using the 'new libraries' because of the memory problems, and I recoded my application to not share the connections (so each procedure will create a new one when needed), as well as reducing the number of connections the application can do at the same time to not exhaust the connection pool.
The solution is far to ideal as it introduces spurious logic on the application (the ideal case scenario would be to migrate to SQL server), but it is giving me better results and the application is being more stable, at least in the early stages of the new version.
Thanks again for your suggestions, I hope you will find mines usefult too.
Cheers.
Nico

Most optimized way to store crawler states?

I'm currently writing a web crawler (using the python framework scrapy).
Recently I had to implement a pause/resume system.
The solution I implemented is of the simplest kind and, basically, stores links when they get scheduled, and marks them as 'processed' once they actually are.
Thus, I'm able to fetch those links (obviously there is a little bit more stored than just an URL, depth value, the domain the link belongs to, etc ...) when resuming the spider and so far everything works well.
Right now, I've just been using a mysql table to handle those storage action, mostly for fast prototyping.
Now I'd like to know how I could optimize this, since I believe a database shouldn't be the only option available here. By optimize, I mean, using a very simple and light system, while still being able to handle a great amount of data written in short times
For now, it should be able to handle the crawling for a few dozen of domains, which means storing a few thousand links a second ...
Thanks in advance for suggestions
The fastest way of persisting things is typically to just append them to a log -- such a totally sequential access pattern minimizes disk seeks, which are typically the largest part of the time costs for storage. Upon restarting, you re-read the log and rebuild the memory structures that you were also building on the fly as you were appending to the log in the first place.
Your specific application could be further optimized since it doesn't necessarily require 100% reliability -- if you miss writing a few entries due to a sudden crash, ah well, you'll just crawl them again. So, your log file can be buffered and doesn't need to be obsessively fsync'ed.
I imagine the search structure would also fit comfortably in memory (if it's only for a few dozen sites you could probably just keep a set with all their URLs, no need for bloom filters or anything fancy) -- if it didn't, you might have to keep in memory only a set of recent entries, and periodically dump that set to disk (e.g., merging all entries into a Berkeley DB file); but I'm not going into excruciating details about these options since it does not appear you will require them.
There was a talk at PyCon 2009 that you may find interesting, Precise state recovery and restart for data-analysis applications by Bill Gribble.
Another quick way to save your application state may be to use pickle to serialize your application state to disk.