Best way to read a repository multiple times at the same time using libgit2, performance/memory wise? [closed]

I'm using libgit2, I'd like to read two different branches at the same time and put them in two different lists.
I'm worried about the performance/memory consumption in that case.
Quoting the git_revwalk_new API page on the libgit2 website:
This revision walker uses a custom memory pool and an internal commit cache, so it is relatively expensive to allocate.
For maximum performance, this revision walker should be reused for different walks.
This revision walker is not thread safe: it may only be used to walk a repository on a single thread; however, it is possible to have several revision walkers in several different threads walking the same repository.
My initial approach was to use two walkers for the two lists, each on a different thread. They would walk the repository at the same time, and each walker would push the commits of the branch targeted by its list.
However, from my understanding of the quote from the website (and please correct me if I'm wrong), allocating a new revision walker is expensive, so it might consume a lot of memory in the case of large repositories. Also, the walker uses its internal cache, so re-reading and looking up commits would be faster if we reused the same walker multiple times.
So my second thought was to use only one walker synchronously on the same thread: push the commits of the first branch, read them, and put them in the first list; then reset the walker, push the commits of the second branch, re-read the repository commits, and put them in the second list.
I tried both approaches. Memory-wise there wasn't much difference; both used approximately the same amount. Performance-wise there wasn't much difference either.
So what do you recommend in this case? Or is there a better solution?
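For reference, here is a minimal sketch of the second (single-walker) approach; the repository path and branch names are placeholders, and error handling is omitted for brevity:
    /* Minimal sketch: one revision walker reused for two branches. */
    #include <git2.h>
    #include <stdio.h>

    static void walk_branch(git_revwalk *walker, const char *refname)
    {
        git_oid oid;

        git_revwalk_reset(walker);               /* clear state from any previous walk */
        git_revwalk_sorting(walker, GIT_SORT_TIME);
        git_revwalk_push_ref(walker, refname);   /* start from the branch tip */

        while (git_revwalk_next(&oid, walker) == 0)
            printf("%s %s\n", refname, git_oid_tostr_s(&oid)); /* or append to a list */
    }

    int main(void)
    {
        git_repository *repo = NULL;
        git_revwalk *walker = NULL;

        git_libgit2_init();
        git_repository_open(&repo, "/path/to/repo");
        git_revwalk_new(&walker, repo);

        walk_branch(walker, "refs/heads/branch-a");  /* first list */
        walk_branch(walker, "refs/heads/branch-b");  /* second list */

        git_revwalk_free(walker);
        git_repository_free(repo);
        git_libgit2_shutdown();
        return 0;
    }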

A git_revwalk shouldn't be that expensive to build, realistically. And it sounds like you've evaluated the two options and determined that really there's no functional difference. So I would encourage you to use whichever one is code-wise the simplest, easiest to maintain, and easy to reason about.
Usually that's the one without threads, but that's just a guess.
The other advantage to doing the two walks serially is that you may be able to optimize the second revision walk based on the results of the first. For example, if you don't want to work on commits in the second branch that you already saw in the first, or you can optimize away whatever you're doing with them based on the knowledge that they were in the first branch, then hiding those commits with git_revwalk_hide may make the overall computation more efficient.
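For example (a sketch, not a drop-in solution; branch names are placeholders and error checks are omitted), hiding everything reachable from the first branch makes the second walk yield only the commits unique to the second branch:
    /* Assumes "walker" is a git_revwalk on the repository, reused after the
     * first branch has already been walked. */
    #include <git2.h>

    static void walk_branch_b_only(git_revwalk *walker)
    {
        git_oid oid;

        git_revwalk_reset(walker);
        git_revwalk_push_ref(walker, "refs/heads/branch-b");
        git_revwalk_hide_ref(walker, "refs/heads/branch-a"); /* skip everything already reachable from branch-a */

        while (git_revwalk_next(&oid, walker) == 0) {
            /* only commits reachable from branch-b but not from branch-a */
        }
    }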

Related

Good architecture for desktop client application [closed]

I've run several times into the problem of creating a desktop client app that works with some server, and every time I ended up with ugly code that becomes impossible to support after a couple of releases.
I have highlighted the following key points:
All operations must be asynchronous, without any dummy windows for relatively fast operations (i.e. less than 30 seconds)
The app has to periodically connect to the server and check, for example, the user's account
All heavy operations must be cancelable
But, most importantly, all of this must sit "naturally" in the code, without creating unnecessary complications (singletons, hacks, etc.)... only the code that is really needed, with minimal overhead.
How would you design such an app? What patterns would you use? What open-source projects with good architecture can you recommend?
This seems a little too broad, but instead of flagging I'll try and give an answer as I find the question interesting. I invite you to add more details if they come to mind.
Even though your question concerns the design of the application, there are a number of languages, patterns, and technologies that would suit your requirements.
Keeping it general,
If you want your operations to be asynchronous, you are going to need multiple threads. Their implementation and use may vary depending on the language you are using, but the concept behind them is the same. So, spawn a thread every time you need an asynchronous task, and implement a way to be notified when the task is done (with or without errors). This can be done in a number of ways; since you asked for a pattern, I suggest you have a look at the observer pattern (a minimal sketch follows this list).
The second requirement is not completely clear to me. I assume you want to periodically check that the client's data is aligned with the server's, and maybe perform security checks ("Are the session and authentication credentials still valid?"). The first solution is to actually ask the server every n seconds, again using another thread. This kind of polling might not be the best option though: how do you factor in the possibility of connectivity issues? Even if your client cannot operate without a working connection to the server, it might bother the user to be disconnected and lose their work just because their Wi-Fi router rebooted. I would suggest you perform alignment checks at I/O, perhaps distinguishing between critical and non-critical ones.
For example, if you decide the user's profile has to be aligned, then you would retrieve updated data from the server upon viewing it. On the other hand, if your app offers the user a list of cooking recipes and you don't care about them not seeing the one that was added on the server 10 minutes ago, you could simply cache these items and refresh them in a background thread every minute, without even notifying the user if the update fails. Last but not least, if you are also concerned with concurrent modifications of data, then depending on your requirements you can decide to implement locks on data being edited, to perform checks on save operations to see if the data has changed in the meantime, or to simply let the user overwrite the data on the server no matter what. All in all, hoping I interpreted your question correctly, this requirement is nontrivial and has to be adjusted to your particular use case.
Assuming the data is eventually saved in some sort of database on the server, one answer is transactions, which allow you to treat even complex sequences of operations as "all or nothing", atomic instructions. You might implement your own system to achieve the same result, but I don't really see the point of not using this powerful instrument when possible. Keep in mind one thing: I'm assuming "cancelable" means "cancelable before some point in time, and not after" (a sort of "undo"). If you're looking for complete revertibility of any operation on data, the requirement becomes far more complex, and in general impossible to guarantee.
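To make the first point a bit more concrete, here is a minimal sketch of the "spawn a thread, get notified when it finishes" idea, using POSIX threads and a plain completion callback as a bare-bones observer. The names (long_running_task, task_finished) are placeholders for your own operations:
    #include <pthread.h>
    #include <stdio.h>

    typedef void (*completion_cb)(int error);    /* the "observer": called when the task ends */

    struct task {
        completion_cb on_done;
    };

    static void long_running_task(void)
    {
        /* ... the actual (heavy, cancelable) work goes here ... */
    }

    static void *task_thread(void *arg)
    {
        struct task *t = arg;
        long_running_task();
        t->on_done(0);                            /* notify the observer; 0 = no error */
        return NULL;
    }

    static void task_finished(int error)
    {
        printf("task finished, error = %d\n", error);  /* e.g. update the UI from here */
    }

    int main(void)
    {
        struct task t = { .on_done = task_finished };
        pthread_t tid;

        pthread_create(&tid, NULL, task_thread, &t);   /* asynchronous: the main/UI thread stays free */
        /* ... the main/UI thread keeps running here ... */
        pthread_join(tid, NULL);
        return 0;
    }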
I believe I have already answered in a way that helps you minimize "hacks" in code where possible. To recap:
You are going to need threads, and the observer pattern can help you keep the code clean.
Again, you can use threads, or focus on checks at I/O operations. In the second case, you might consider an application layer specifically for client-server synchronization, embed it in one or more classes, and perform all your checks there. Have a look at the proxy pattern.
Use transactions to revert operations: issue a COMMIT only when you are sure the operation is confirmed, and a ROLLBACK in every other case (sketched below). Encapsulate this logic in your server's code so that the client is not aware of the actual transaction system being used, and your code should be quite clean.
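As a rough illustration of that last point, here is a sketch of the COMMIT/ROLLBACK logic using SQLite as a stand-in for whatever database the server actually uses; the table and statements are placeholders:
    #include <sqlite3.h>
    #include <stdio.h>

    /* Save an edit atomically: either everything is persisted, or nothing is. */
    static int save_operation(sqlite3 *db)
    {
        char *err = NULL;

        sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, NULL);

        if (sqlite3_exec(db, "UPDATE documents SET body = 'new text' WHERE id = 1",
                         NULL, NULL, &err) != SQLITE_OK) {
            fprintf(stderr, "update failed: %s\n", err);
            sqlite3_free(err);
            sqlite3_exec(db, "ROLLBACK", NULL, NULL, NULL);   /* cancel: nothing is persisted */
            return -1;
        }

        sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);         /* confirm: all changes persist */
        return 0;
    }

    int main(void)
    {
        sqlite3 *db = NULL;

        sqlite3_open(":memory:", &db);
        sqlite3_exec(db, "CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)",
                     NULL, NULL, NULL);
        sqlite3_exec(db, "INSERT INTO documents (id, body) VALUES (1, 'old text')",
                     NULL, NULL, NULL);

        save_operation(db);

        sqlite3_close(db);
        return 0;
    }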
Please comment if my answer is not satisfying or clear.

Is BDD the solution for planning a medium sized project? [closed]

I'm starting project number 8,192. Like most of my projects, it will probably either be thrown away or get canceled out of boredom, lack of time, or lack of usefulness.
But there is a project that has been on the back burner for a long time that I really want to finish. In my perfect-world mind, it should take 3 months to reach a first release.
Anyway, one of my biggest issues is taking a large project (or even a small-to-medium one) and breaking it down into manageable pieces. My mistake is always jumping right into the terminal, opening TextMate, and starting to code. This almost always fails. I get lost in feature creep, learning newer methods, framework wars, etc. Then two months have gone by with nothing to show for it.
So I was thinking: might BDD (such as Cucumber) be a solution to this? Could it be used to scope out the larger pieces, then the smaller pieces, until I have a feature list that covers most of the project? At that point, I just start coding the pieces, right?
What are your suggestions on tackling this problem, which I'm sure other developers share?
BTW, I'm using Rails 3 (sometimes Padrino).
Thanks
On which track? BDD doesn't define the track--it communicates the track.
BDD may be the only requirements you have (or need), but that doesn't address the issue of feeping creaturisms unless you have the discipline not to implement anything for which no spec exists.
Uncaptured features don't get implemented, period. If a feature is added, it gets a scope, and is prioritized with the rest of the features. It may usurp something less-desirable, it may not.
The product owner (you in this case) must decide how much can be implemented in the time allotted, and which features should be implemented. It still boils down to discipline; however, you now have a tool that (helps) make sure what you implemented is what you actually wanted.
It doesn't, however, make sure that what you get is only what you originally wanted--it won't make sure nothing else is implemented on top of the specs you bothered to implement.

Branching hell, where is the risk vs productivity tipping point? [closed]

My company is floating the idea of extending our version numbers another notch (e.g. from major.minor.servicepack to major.minor.servicepack.customerfix) to allow for customer specific fixes.
This strikes me as a bad idea on the surface as my experience is the more branching a product does (and I believe the customer fixes are branches of the code base) the more overhead, the more dilution of effort and ultimately the less productive the development group becomes.
I've seen a lot of risk vs productivity discussions but just saying "I think this is a bad idea" isn't quite sufficient. What literature is there about the real costs of becoming too risk averse and adopting a heavy, customer specific, source code branching, development model?
A little clarification. I expect this model would mean the customer has control over what bug fixes go into their own private branch. I think they would rarely upgrade to the general trunk (it may not even exist in this model). I mean why would you if you could control your own private reality bubble?
Can't help with literature, but customer-specific branching is a bad idea. Been there, done that. Debugging the stuff was pure hell, because of course you had to have all those customer-specific versions available to reproduce the error... some time later, the company had to do a complete rewrite of the application because the code base had become utterly unmaintainable. (Moving the customer-specific parts into configuration files so every customer was on the same code line.)
Don't go there.
I agree that the overhead of handling customer fixes is generally high, but I wouldn't say don't do it.
I would say charge the customer an arm and a leg (and then some) if they want that much attention. Otherwise don't do customer branches.
You describe the changes that go into the customer branch as "fixes". Because they are fixes, I am assuming that they will also be made in the trunk and are really just advanced deliveries of future bug fixes. If this is the case, why not just create a new "servicepack" (from question: major.minor.servicepack) and give that version to the customer.
For example, you release version 1.2.3.
Customer #1 needs a fix, create version 1.2.4 and give it to Customer #1.
Customer #2 needs a fix, create version 1.2.5, give it to Customer #2 and advertise that they also get an interim fix "for free".
In my travels I haven't personally seen any definite literature for most of the good practices, although I suspect that there is a lot of stuff out there.
Version numbers provide a really simple mechanism to tie specific versions in the wild back to specific sets of code changes. Technically, it doesn't matter how many levels are in the version number, so long as the developers are diligent in ensuring that for every "unique" version released, there is a "unique" version number.
Logic dictates that to limit support costs (which are huge, often worse then development ones), a reasonable organization would prefer to have the least number of "unique" versions running around in the field. One would be awesome, however there are usually quite a few in the real world. It's a cost vs. convenience issue.
Usually, the first number indicates that this series of releases is not backward compatible. The next number says that it mostly is, but a few things have changed and the last number says some stuff was fixed, but the documents all hold true. Used that way, you don't need a fourth number, even if you've done some specific fixes at the request of a subset of your customers. The choice to become more client-driven shouldn't have any effect on your numbering scheme (and thus it's a bad idea).
Branching based on customer requests is absolute madness. One main trunk is essential, so each time you branch it creates massive technical debt. Branch enough, and you can't afford the interest anymore.
Not sure about the literature, but... if there is even a chance that you will be supporting customer-specific fixes, it seems sensible to at least have a branching and versioning strategy in place, although I would hope the strategy never gets used.
I guess the danger is that you end up with a culture where customer-specific fixes become acceptable and the norm, rather than addressing the true issue that resulted in the need for the fix.
I guess the real cost will largely depend on whether it's just an interim bug fix to keep a customer happy prior to the next release, or whether it's more of a one-off customisation. If it's just the former and the quantity isn't too high, I wouldn't be too worried. However, if it's customisations, I would be scared witless.
If you can find a way to compile your one product and turn each client's features on or off in their "configuration" of a central build, that might be something worth figuring out (a small sketch follows below).
Something like this might best be done through a profile/config/role-based setup.
You may have to secure one client's customizations from another's, or maybe they can all benefit from them. That part is up to you.
This way you can build custom views, custom code, custom roles, whatever. But they're all part of one project.
Do not maintain multiple codebases of the same product at any cost. I did it once, and a one-hour change takes at least an hour for each system if it lands in the worst spot. It's suicide.
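As a minimal sketch of that configuration-driven idea (the feature names, customers, and hard-coded loader are placeholders; in a real build the flags would come from a config file, database, or license key):
    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    /* Per-customer feature toggles instead of per-customer branches. */
    struct features {
        bool custom_reports;     /* e.g. something only customer A asked for */
        bool legacy_export;      /* e.g. kept alive only for customer B */
    };

    /* Placeholder loader: a real application would read this from configuration. */
    static struct features load_features(const char *customer)
    {
        struct features f = {0};
        if (strcmp(customer, "customer-a") == 0) f.custom_reports = true;
        if (strcmp(customer, "customer-b") == 0) f.legacy_export = true;
        return f;
    }

    int main(void)
    {
        struct features f = load_features("customer-a");

        if (f.custom_reports)
            printf("custom reports enabled\n");   /* same code base, different behaviour */
        if (f.legacy_export)
            printf("legacy export enabled\n");
        return 0;
    }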
Do share what you end up doing!
In my experience, the tipping point is reached when it becomes difficult to explain how bugfixes should be propagated through the branches.
Branching hell is an issue because people lose track of what is in which branch. If propagation rules are too complex, people start making mistakes while propagating changes between branches, and that's how you create branching hell.
If the "Cisco" branch raised a defect and we fix it, should we propagate the fix to the current release of the "IBM" branch, or only to the next release of the "IBM" branch? What if IBM raised the same defect? What if IBM doesn't even use the feature that contains the defect? What if IBM later raises the same defect as high priority? With multiple customer branches propagation rules are never simple, so they pretty much guarantee branching hell.

How do I encourage code sharing and limit the bug tracking overhead while maintaining flexibility in my releases? [closed]

How are you tracking changes, testing effort for bugs that impact multiple artifacts released separately?
Code sharing is good because it reduces the total number of paths through the code, which means more impact from fewer changes and fewer bugs (or more bugs addressed with fewer changes). For example, we may build a search tool and an indexer that use the same file-handling package or model package.
We need to be able to ensure that changes get tested in all the right components and track which changes were included with which released tools. We also don't want to be forced to release the change in all applications at the same time.
Goal: one bug to be tested, scheduled, and tracked independently against each released application, with automated systems that understand the architecture guiding us to make the right choices.
Bug Split Release Scenario:
We may release a patch of the search tool that contains a performance fix in a util library. Critical for the search tool, the fix is less visible in the indexer so it can wait until the next maintenance release. We want the one bug to be scheduled-tracked-released with the search patch and deferred for the indexer's next maintenance release.
So, when I create a bug in our tracking system (JIRA) I want it to magically become multiple objects.
a primary issue describing the problem and tracking the development work
a set of tasks that let me track testing effort and how this issue has been released for each application it impacts.
How can we make the user experience of code sharing low effort to encourage more of it without becoming blind to what changes impacted which releases or forcing people to enter many duplicate bugs?
I'm sure that large-scale projects from Eclipse to Linux distros have faced this kind of problem, and I wonder how they have solved it (I'm going to poke around in them next).
Do any of you have experience with this kind of situation and how have you approached it?
In Jira you can allow sub-tasks so you could assign sub-tasks to the main task. You can also allow time tracking on the issues so you know how much time each task is taking and what the difference between estimated and actual is.
You can also enable versioning so you have a road map of what is being done in the next release with a change log. The problem with the road map is that it is only for one project so you can't have a road map that covers all of your projects.
Finally, you can create your own custom workflows to do almost anything you want to do. I've never tried this because we'd have to learn a new language to do it and the reason we got Jira was to decrease development overhead, not increase it by having to customise our bug tracker - but it is possible.
For Jira, make use of the affects-versions and fixed-in-versions fields (plus you can add multiple custom fields, like "verified by QA in versions").

Allocating resources for project documentation [closed]

What would you suggest for the following scenario:
A dozen developers need to build and design a complex system. This design needs to be documented for future developers, and the design decisions must be noted. These reports need to be produced about every two months. My question is how this project should be documented.
I see two possibilities. Each developer writes about the things they helped design and integrate, and then one person combines these documents. The final document will probably be incoherent or redundant at times, since the person tasked with assembling everything won't have much time to adjust every part.
Assume that the documentation parts from each developer arrive just a few days before deadline. A collaborative system (i.e. wiki) wouldn’t work properly since there wouldn’t be anything to read until a few days before deadline.
Or should a few people (2-3) be tasked with writing the documentation while the rest of the team works on actually developing the system? The developers would need a way to transfer their design choices and conclusions to the technical writers. How could this be done efficiently?
We approach this from two sides, using a RUP-style approach. In the first case, you'll have a domain expert who is responsible for roughing out the design of what you're going to deliver, with developers chipping in as necessary. In the second case, we use a technical author: they document the application, so they should have a good idea of how it hangs together, and you involve them right through the design and development process. In this case, they can help to polish the design and to make sure it matches what they thought was being developed.
We use Confluence (Atlassian's wiki-like thing) and document all kinds of different "things". The developers do it continuously, and we push each other for docs; we let peer pressure decide what is necessary. Whenever someone new comes along, he or she is tasked with reading through everything and finding out what is still correct. The incorrect stuff is either deleted or updated as a consequence. We're happy when we can delete stuff ;)
The nice thing about this process is that the relevant stuff stays and the irrelevant stuff is deleted. We always "got away" from the more formalized demands by claiming that we could construct the Word documents "they" wanted if they ever needed them. "They" never needed them.
I think alternative 2 is the less agile one, because it adds a new stage to the project (although it may run in parallel with testing).
If you are in an agile model, then just add documentation (following a guideline) as a story.
If you are in a staged approach, then I would nevertheless ask developers to work on documentation, following some guidelines, and review that documentation along the design and the code. Eventually, you may have a technical writer reviewing everything for proper English, but that would be a kind of "release" activity.
I think you can use Sandcastle to document your project.
Check it out: Sandcastle from Microsoft.
It's not complete documentation, but making sure that interfaces etc. are commented with Doxygen-style comments means writing code and documenting it stay closer together.
That way, developers should document what they do. I still think a review by the architect(s) is needed to ensure consistent quality, but ensuring people document what they do is the best way to ensure they follow the architecture.