Difference between struct and association - oop

I was asked 2,3 questions in an interview which didn't make any sense.
Difference between deadlock and thread ( i couldnt compare them )
Difference between struct and association ( not comparable again)
Are these even valid questions? If yes what would be the answer.

well, these are unusual questions, but one could answer:
a thread (of execution) is the smallest unit of processing that can be scheduled by an operating system and
A deadlock is a situation wherein two or more competing actions are each waiting for the other to finish, and thus neither ever does.
or shorter: a deadlock is a situation that thread could be in. That's not really a difference...
I think if you can describe thread and deadlock that would be fine. It could be an intentionally misleading question so you must really sure what you know to tell the interviewer that there is no really difference, because one can't compare the tow things.
The same goes for struct and association.

Related

How often are values duplicated on the stack?

When you have a function that accepts an array as an argument and calls another function with that array and that calls another function with it and so forth the stack will contain many copies of the pointer to that array. I just thought of an interesting way to alleviate this problem but I'm wondering whether or not it is worth implementing.
Does anyone have any idea how often stacks contain duplicate pointers in practice?
EDIT
Just to clarify, I am not optimizing a given program but, rather, am considering writing a new kind of optimization pass for my VM. My benchmarks have indicated that my current solution causes up to 70% of the total running time to be spent in stack manipulations. The optimization pass I am thinking of would generate code at compile time that would perform the same actions but pointers would (potentially) be duplicated on the stack less often. I am interested in any prior studies that have measured the number of duplicates on the stack because this would help me to quantify my optimization's potential. For example, if it is known that real programs do not push pointers already on the stack in practice then my optimization is worthless.
Moreover, these stack manipulations are due to the code generated by my VM making sure locally-held pointers are visible to the garbage collector and not due only to function parameters as both answerers have currently assumed. And they are actually operations on a shadow stack rather than the main stack.
First of all, the answer will depend on your application.
Secondly, even with high duplication, I doubt there is much sense in implementing the mechanism you describe, or even that it is possible in a general case. If you call a method and you pass it parameters, you must do it either one way or another.
There may be advantages to doing it in some specific way - for example there are several function calling conventions and many C/C++ compilers (e.g. gcc) let you choose between passing parameters on the stack or via registers. In certain cases, the latter may be faster - you can try and benchmark if it helps your application.
But in a general case, the cost of detecting duplicated values on the stack and "reusing" them would probably much exceed any gains from having a smaller stack. The code for pushing and popping values is really simple (just a few CPU instructions in an optimized case), code for finding and reusing duplicates - hardly so. You would also have to somehow store the information about which values are already on the stack and how to find them - a nontrivial data structure. Except for some really weird cases, I don't think this would be smaller than the actual copied data itself.
What you could do, would be to rewrite your algorithm in such way that some function calls are eliminated. For example, if your function's result only depends on the input arguments, you could somehow cache or memoize the results, thus avoiding repeated calls with the same values. This may indeed bring some gains, though it's usually a memory vs CPU time tradeoff. Getting an advantage both in memory and in CPU time is rarely possible. Also, rewriting your algorithm is not really "avoiding duplication of data on the stack".
Any way, for the original question, I think the idea is not viable and you should look at optimizations elsewhere.
PS: You use case may somewhat resemble tail-call optimization, so perhaps that's a direction worth looking at - but if you implement it yourself, I would also consider this to fall into the "change your algorithm" category. Maybe changing from a recursive algorithm to an iterative one could help also.
Can I suggest getting some exposure to actual performance tuning?
(Here's my canonical example.)
Between the time a program starts and the time it ends, of the cycles it uses, it obviously uses 100% of those cycles.
If it goes in and out of functions, and passes pointers to an array, but does nothing else, then there's no surprise that a high percent of time goes into function entry and exit, and passing arguments.
If a program P is written to do task T, there are a multitude of other programs P' which could also do task T. Some of them take fewer cycles than all the others, and those are the optimal ones.
The way the optimal ones differ from the non-optimal ones is that the non-optimal ones are doing things that can be done without.
So, to optimize any program, find out what cycles are being spent that don't have to be, and get rid of those activities. That link shows in great detail how I do it.
Trying to pass fewer arguments to functions might or might not be necessary, depending on what your diagnostics tell you.

SQL generalization/specialization, data redundancy

I have three tables: actions, messages, likes. It defines inheritance, messages and likes are actions' childs (specialization).
Message and Like both have column userId and createdAt. Those should be of course moved to the parrent table Action and removed from Message and Likes. But there's only one case when I need to select both messages and likes from the database, in other cases I select only one of them, either messages or likes.
Is it ok to duplicate userId and createdAt in child and parrent table? It costs disk space but saves one join - I would have to join messages, likes with actions everytime I needed userId and createdAt. Whatsmore I would need to change my current code...
What would you suggest?
In my opinion this is a case of premature optimization (or premature denormalization, if you prefer). You're guessing that the join overhead will cause significant problems, so you're guessing that duplicating the userId and createdAt columns in the dependent tables will improve performance significantly.
I suggest that you should not duplicate columns until you know there's a real problem. I keep a few observations on performance optimization tacked up on the wall to remind myself of what I should do in similar cases:
It ain’t broke ‘til it’s broke.
You can’t improve what you haven’t measured.
Programs spend surprising amounts of time in the damnedest places.
Make it run. Make it run right. Make it run right fast.
optimization is literally the last thing you should be doing.
doing things wrong faster is no great benefit.
Also a few comments on denormalization:
You can’t denormalize that which is not normalized.
Most developers wouldn’t know third-normal form if it leapt out from behind their screen, screamed like a banshee, and cracked a baseball bat over their heads.
Denormalization is suggested as a panacea for database performance issues. The problem is that too often those recommending denormalization have never normalized anything.
“Denormalization for performance reasons” is an excuse for sloppy, “do what we’ve always done” thinking, especially when such denormalization is enshrined in the design.
In my experience, I am not able to identify where performance problems will occur before writing code. Problems always seem to occur in places where I would never have thought to look. Thus, I've found that my best choice is always to write the simplest, clearest code that I can and to design the database as simply as I can, following the normalization rules to the best of my ability, and then to deal with what turns up. There may still be performance issues which need attention (but, surprisingly, not really all that often), but in the end I'll end up with simple, clear, and easily understood/maintained code, running on a simple, well-designed database.
Share and enjoy.

FIFO semaphore test

I have implemented FIFO semaphores but now I need a way to test/prove that they are working properly. A simple test would be to create some threads that try to wait on a semaphore and then print a message with a number and if the numbers are in order it should be FIFO, but this is not good enough to prove it because that order could have occurred by chance. Thus, I need a better way of testing it.
If necessary locks or condition variables can be used too.
Thanks
What you describe with your sentence "but this is not good enough to prove it because that order could have occurred by chance" is somehow a known dilema.
1) Even if you have a specification, you can not ensure that the specification match your intention. To illustrate this I will take an example from "the limit of correctness". Let's consider a specification for a factorization function that is:
Compute A and B such as A * B = C
But it's not enough as you could have an implementation that returns A=1 and B=C. Adding A,B != 1 can still lead to A=-1 and B=-C, so the only correct specification must state A,B>1. That's just to illustrate how complicated it can be to write a specification that match the real intention.
2) Even having proved an algorithm, still doesn't mean the implementation is correct in practice. This is best illustrated with this quote from Donald Knuth:
Beware of bugs in the above code; I
have only proved it correct, not tried
it.
3) Testing can only reveal the presence of bug, not their absence. This quote goes back to Dijkstra:
Testing can be used to show the
presence of bugs but never to show
their absence.
Conclusion: you are doomed and you will never be 100% sure that your code is correct according to its intent! But stuff aren't that bad. Having a high confidence about the code is usually enough. For instance, if using multiple threads is still not enough for you, you can decide to use fuzzing as well so as to randomize the test execution even more. If your tests always pass, well, you can be pretty confident that your code is good.
because that order could have occurred by chance.
You can run the test a few times, e.g. 10, and test that each time the order was correct. This will ensure that it happened not by chance.
P.S. Multiple threads in a unit test is usually avoided

How far can you really go with "eventual" consistency and no transactions (aka SimpleDB)?

I really want to use SimpleDB, but I worry that without real locking and transactions the entire system is fatally flawed. I understand that for high-read/low-write apps it makes sense, since eventually the system becomes consistent, but what about that time in between? Seems like the right query in an inconsistent db would perpetuate havoc throughout the entire database in a way that's very hard to track down. Hopefully I'm just being a worry wart...
This is the pretty classic battle between consistency and scalability and - to some extent - availability. Some data doesn't always need to be that consistent. For instance, look at digg.com and the number of diggs against a story. There's a good chance that value is duplicated in the "digg" record rather than forcing the DB to do a join against the "user_digg" table. Does it matter if that number isn't perfectly accurate? Probably not. Then using something like SimpleDB might be a good fit. However if you are writing a banking system, you should probably value consistency above all else. :)
Unless you know from day 1 that you have to deal with massive scale, I would stick to simple more conventional systems like RDBMS. If you are working somewhere with a reasonable business model, you will hopefully see a big spike in revenue if there's a big spike in traffic. Then you can use that money to help solving the scaling problems. Scaling is hard and scaling is hard to predict. Most of the scaling problems that hurt you will be ones that you never expect.
I would much rather get a site off the ground and spend a few weeks fixing scale issues when traffic picks up then spend so much time worrying about scale that we never make it to production because we run out of money. :)
Assuming you're talking about this SimpleDB, you're not being a worrywart; there are real reasons not to use it as a real world DBMS.
The properties that you get from transaction support in a DBMS can be abbreviated by the acronym "A.C.I.D.": Atomicity, Consistency, Isolation, and Durability. The A and D have mostly to do with system crashes, and the C and I have to do with regular operation. They're all things people totally take for granted when working with commercial databases, so if you work with a database that doesn't have one or more of them, you might be in for any number of nasty surprises.
Atomicity: Any transaction will either complete fully or not at all (i.e. it will either commit or abort cleanly). This applies to single statements (like "UPDATE table ...") as well as longer, more complicated transactions. If you don't have this, then anything that goes wrong (like, the disk getting full, the computer crashing, etc.) might leave something half-done. In other words, you can't ever rely on the DBMS to really do the things you tell it to, because any number of real-world problems can get in the way, and even a simple UPDATE statement might get partially completed.
Consistency: Any rules you've set up about the database will always be enforced. Like, if you have a rule that says A always equals B, then nothing anybody does to the database system can break that rule - it'll fail any operation that tries. This isn't quite as important if all your code is perfect ... but really, when is that ever the case? Plus, if you're missing this safety net, things get really yucky when you lose ...
Isolation: Any actions taken on the database will execute as if they happened serially (one at a time), even if in reality they're happening concurrently (interleaved with each other). If more than one user is going to hit this database at the same time, and you don't have this, then things you can't even dream up will go wrong; even atomic statements can interact with each other in unforeseen ways and screw things up.
Durability: If you lose power or the software crashes, what happens to database transactions that were in progress? If you have durability, the answer is "nothing - they're all safe". Databases do this by using something called "Undo / Redo Logging", where every little thing you do to the database is first logged (typically on a separate disk for safety) in a way such that you can reconstruct the current state after a failure. Without that, the other properties above are sort of useless, because you can never be 100% sure that things will stay consistent after a crash.
Do any of these things matter to you? The answer has everything to do with the types of transactions you're doing, and what guarantees you want in a failure situation. There may well be cases (like a read-only database) where you don't need these, but as soon as you start doing anything non-trivial, and something bad happens, you'll wish you had 'em. Maybe it's OK for you to just revert to a backup anytime something unexpected happens, but my guess is that it isn't.
Also note that dropping all of these protections doesn't make it a given that your database will perform better; in fact, it's probably the opposite. That's because real-world DBMS software also has tons of code to optimize query performance. So, if you write a query that joins 6 tables on SimpleDB, don't assume that it'll figure out the optimal way to run that query - you might end up waiting hours for it to complete, when a commercial DBMS could use an indexed hash join and get it in .5 seconds. There are a zillion little tricks that you can do to optimize query performance, and believe me, you'll really miss them when they're gone.
None of this is meant as a knock on SimpleDB; take it from the author of the software: "Although it is a great teaching tool, I can't imagine that anyone would want to use it for anything else."

Do I really need to use transactions in stored procedures? [MSSQL 2005]

I'm writing a pretty straightforward e-commerce app in asp.net, do I need to use transactions in my stored procedures?
Read/Write ratio is about 9:1
Many people ask - do I need transactions? Why do I need them? When to use them?
The answer is simple: use them all the time, unless you have a very good reason not to (for instance, don't use atomic transactions for "long running activities" between businesses). The default should always be yes. You are in doubt? - use transactions.
Why are transactions beneficial? They help you deal with crashes, failures, data consistency, error handling, they help you write simpler code etc. And the list of benefits will continue to grow with time.
Here is some more info from http://blogs.msdn.com/florinlazar/
Remember in SQL Server all single statement CRUD operations are in an implicit transaction by default. You just need to turn on explict transactions (BEGIN TRAN) if you need to make multiple statements act as an atomic unit.
The answer is, it depends. You do not always need transaction safety. Sometimes it's overkill. Sometimes it's not.
I can see that, for example, when you implement a checkout process you only want to finalize it once you gathered all data, etc.. Think about a payment f'up, you can rollback - that's an example when you need a transaction. Or maybe when it's wise to use them.
Do you need a transaction when you create a new user account? Maybe, if it's across 10 tables (for whatever reason), if it's just a single table then probably not.
It also depends on what you sold your client on and who they are, and if they requested it, etc.. But if making a decision is up to you, then I'd say, choose wisely.
My bottom line is, avoid premature optimization. Build your application, keep in mind that you may want to go back and refactor/optimize later when you need it. Look at a couple opensource projects and see how they implemented different parts of their app, learn from that. You'll see that most of them don't use transactions at all, yet there are huge online stores that use them.
Of course, it depends.
It depends upon the work that the particular stored procedure performs and, perhaps, not so much the "read/write ratio" that you suggest. In general, you should consider enclosing a unit of work within a transaction if it is query that could be impacted by some other, simultaneously running query. If this sounds nondeterministic, it is. It is often difficult to predict under what circumstances a particular unit of work qualifies as a candidate for this.
A good place to start is to review the precise CRUD being performed within the unit of work, in this case within your stored procedure, and decide if it a) could be affected by some other, simultaneous operation and b) if that other work matters to the end result of this work being performed (or, even, vice versa). If the answer is "Yes" to both of these then consider wrapping the unit of work within a transaction.
What this is suggesting is that you can't always simply decide to either use or not use transactions, rather you should apply them when it makes sense. Use the properties defined by ACID (Atomicity, Consistency, Isolation, and Durability) to help decide when this might be the case.
One other thing to consider is that in some circumstances, particularly if the system must perform many operations in quick succession, e.g., a high-volume transaction processing application, you might need to weigh the relative performance cost of the transaction. Depending upon the size of the unit of work, a commit (or rollback) of a transaction can be resource expensive, perhaps negatively impacting the performance of your system unnecessarily or, at least, with limited benefit.
Unfortunately, this is not an easy question to precisely answer: "It depends."
Use them if:
There are some errors that you may want to test for and catch which won't be caught except by you going out and doing the work (looking things up, testing values, etc.), usually from within a transaction so that you can roll back the whole operation.
There are multi-step operations of any sort, which should, logically, be rolled back as a group if they fail.