Problem background
I am building a web app. Users will be able to see their balance within the app at any moment, for any custom-set period, so they can display an on-screen statement for, say, the last 90 days. The statement shows the debits (services rendered by us) and the credits (funds submitted by the user). At the bottom of the table they see the balance for that period, say "-$100" (they owe us) or "+$460" (they have funds for the future). The tricky part is that the statement must also show the preceding balance as of the very start of the period. In this example, the user may ALREADY have owed us $1,000 before the output period began, so at the bottom of the statement I need to add that preceding balance to the balance for the selected period.
The challenge / the question
How do I start building this module? I am interested in algorithm tips, DB architecture, etc. In short, what is a general algorithm (or several) for developing such a module?
P.S. My idea so far is that I must record the current balance at every balance-related event from the user or the system.
Well, this is pretty straightforward.
As you mentioned, you need a database table recording all your bookings. You actually don't need to store the current balance after each transaction, as it can easily be derived from the list of bookings (using the SUM aggregate function).
You then select data based on user input, i.e. the time period they entered. To get the balance before that period, you just sum all bookings up to the starting point (either in a separate query or as a subquery). You might also want to look into creating running totals, as described here for MS SQL: Running Totals Ms SQL
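If the statement is assembled in application code, the same two sums look roughly like this - a minimal sketch only, where the Booking record and its fields are assumptions (in the database this is just SUM(amount) with a date filter):

    import java.math.BigDecimal;
    import java.time.Instant;
    import java.util.List;

    // Hypothetical booking: signed amount, credits (payments) positive, debits (charges) negative.
    // Java 16+ record used for brevity.
    record Booking(long id, Instant bookedAt, BigDecimal amount) {}

    class StatementCalculator {

        // Balance carried over from before the statement period.
        static BigDecimal precedingBalance(List<Booking> bookings, Instant periodStart) {
            return bookings.stream()
                    .filter(b -> b.bookedAt().isBefore(periodStart))
                    .map(Booking::amount)
                    .reduce(BigDecimal.ZERO, BigDecimal::add);
        }

        // Closing balance shown at the bottom of the statement:
        // preceding balance plus everything booked inside the period.
        static BigDecimal closingBalance(List<Booking> bookings, Instant periodStart, Instant periodEnd) {
            BigDecimal periodSum = bookings.stream()
                    .filter(b -> !b.bookedAt().isBefore(periodStart) && b.bookedAt().isBefore(periodEnd))
                    .map(Booking::amount)
                    .reduce(BigDecimal.ZERO, BigDecimal::add);
            return precedingBalance(bookings, periodStart).add(periodSum);
        }
    }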
We need to perform certain calculations on a set of transactions using custom logic (which will be written in Java or Python).
The calculations will be performed on transactions for a specific period (e.g. 1st Jan to 31st Dec 2017) and as at a given calculation date, e.g. 31-Jan-2018.
It is possible for users to add (or cancel) back-dated transactions at any time. There will be hundreds of thousands of transactions, and calculation runs can be performed multiple times for the same period.
Therefore, the business needs to know which transactions were used for which calculation run.
Does anyone know of any tools that can assist with this data traceability, i.e. identifying the data that was used for a specific calculation?
I think this is difficult for any generic tool, as only our custom code knows which data it has used.
We are thinking of storing the transactions (just their identifiers) referenced by each calculation run in a database, which the business could then query with data visualisation tools. Given the volume of transactions, inserting that many records will take time (possibly hours), but that would be acceptable.
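For what it's worth, a minimal sketch of how we imagine recording this from the Java side (the table and column names are just placeholders):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;
    import javax.sql.DataSource;

    class RunTraceWriter {

        // Records which transaction ids a given calculation run used.
        // Assumes a hypothetical link table: calculation_run_transaction(run_id, transaction_id).
        static void recordRun(DataSource ds, long runId, List<Long> transactionIds) throws SQLException {
            String sql = "INSERT INTO calculation_run_transaction (run_id, transaction_id) VALUES (?, ?)";
            try (Connection con = ds.getConnection();
                 PreparedStatement ps = con.prepareStatement(sql)) {
                con.setAutoCommit(false);
                int count = 0;
                for (Long txId : transactionIds) {
                    ps.setLong(1, runId);
                    ps.setLong(2, txId);
                    ps.addBatch();
                    if (++count % 10_000 == 0) {
                        ps.executeBatch();   // flush in chunks to keep memory bounded
                    }
                }
                ps.executeBatch();
                con.commit();
            }
        }
    }

Batching the inserts like this (or bulk-loading via a staging table) is usually much faster than inserting row by row.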
I would appreciate it if anyone who has faced a similar problem could share their experience and how it was resolved. I am not sure there is a standard pattern, as this is probably not a common problem.
Thanks
I'm developing a solver for a VRPTW problem using OptaPlanner, and I have run into a problem when a large number of customers needs to be serviced; by a large number I mean up to 10,000 customers. I have tried running the solver for about 48 hours, but no feasible solution was ever reached.
I use a highly customized VRPTW domain model that introduces an additional planning entity, the so-called "Workbreak". Workbreaks are like customers, but their location is itself another planning value, because each day a worker can return home or go to a hotel. Workbreaks have a fixed departure time (usually the next morning) and a variable arrival time (because it depends on the previous entity within the chain). A hard constraint forbids "arriving" at a Workbreak after a certain point in time. There are other hard constraints too, such as:
multiple service time windows per customer
every week the last customer in a chain must be a special "storage space visit" customer (workers need to gather materials before the next week)
long job management (when a customer needs to be serviced for longer than a specified time, the job should start before a specific hour of the day)
max number of jobs per workday
max total job duration per workday (as a worker cannot work longer than a specified time)
a workbreak cannot be located at a hotel that is too close to the worker's home
jobs cannot be serviced on Sundays
... and many more - there are 19 hard constraints in total that have to be applied, plus 3 soft constraints.
All the aforementioned constraints were initially written as Drools rules, but because of the many accumulation-based constraints (max jobs per day, max hours per day, overtime hours per week) the overall solver speed (in benchmarks) was about 400 steps/sec.
At first I thought the solver's speed was simply too low to reach a feasible solution in a reasonable time, so I rewrote all the rules as an easy score calculator, which had a decent speed of about 4,600 steps/sec. I knew it would only perform well for a really small number of customers, but I wanted to know whether Drools was the cause of the poor performance. I then rewrote all the rules as an incremental score calculator (and survived the pain of corrupted-score bugs until all of them were fixed). Surprisingly, incremental score calculation is a bit slower than the easy score calculator for a small number of customers, but that is not an issue, because the overall speed is about 4,000 steps/sec no matter how many entities I have.
The thing that bugs me most is that above a certain number of customers (problems start at around 1,000 customers) the solver cannot reach a feasible solution. Currently I'm using the Late Acceptance and Step Counting algorithms, because they perform really well for this kind of problem (at least for smaller numbers of customers). I tried Simulated Annealing too, but without success, mostly because I could not find good values for its algorithm-specific parameters.
I have implemented some custom moves too:
A composite move that changes a workbreak's location when sibling entities are changed by other moves such as change/swap moves (it helps escape many score traps, as an improving step usually needs at least two moves performed within a single step)
A move factory for better long-job assignment (it generates moves that try to put customers with longer service times at the front of a workday chain)
A workbreak assignment move factory (it generates moves that help put workbreaks into the proper sequence)
Now I'm scratching my head, wondering what I should do to diagnose the source of my problem. I suspected it might be hitting a score trap, so I modified the solver to save a snapshot of the best score every minute. Reading these snapshots, I realized that the score was still decreasing. Can the number of hard constraints play a role? I suspect that many moves have to be evaluated to find one that improves the score. Maybe 48 hours simply isn't that much for this kind of problem, and it should compute for a whole week? Unfortunately, I have nothing to compare against.
I would like to know how to find out whether this is purely a performance problem, or a solver configuration problem (algorithm, custom moves, hard/soft score).
I really apologize for my bad English.
TL;DR but FWIW:
To scale above 1k locations, you need to use Nearby Selection.
To scale above 10k locations, add Partitioned Search too.
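As a rough sketch of the Nearby Selection part (the domain classes and the distance method are assumptions based on a typical chained VRP model, and the package location can differ between OptaPlanner versions):

    // Package location may differ between OptaPlanner versions.
    import org.optaplanner.core.impl.heuristic.selector.common.nearby.NearbyDistanceMeter;

    // Customer and Standstill stand in for your own planning entity / chain anchor classes,
    // and getDistanceTo() is assumed to exist on your location objects.
    public class CustomerNearbyDistanceMeter implements NearbyDistanceMeter<Customer, Standstill> {

        @Override
        public double getNearbyDistance(Customer origin, Standstill destination) {
            return origin.getLocation().getDistanceTo(destination.getLocation());
        }
    }

The meter is then referenced from the change/swap move selectors' nearbySelection configuration, and Partitioned Search is a separate phase driven by a SolutionPartitioner; the corresponding chapters of the OptaPlanner documentation show the exact solver config XML.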
Suppose I have an online store application that maintains millions of items. The application is so popular that millions of items are sold each hour. I store all of this information in a database, say Oracle DB.
Now, if I want to show the top 5 items sold in the last hour, I can write a query something like this:
Get the list of products that were sold in the last hour.
Count each product in that result, order by the count, then display the top 5 records.
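In JDBC against Oracle this might look like the following (the sales table and its column names are assumptions; the FETCH FIRST syntax needs Oracle 12c or later):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import javax.sql.DataSource;

    class TopSellers {
        // Top 5 items sold in the last hour, straight off the transactional table.
        static void printTop5LastHour(DataSource ds) throws SQLException {
            String sql =
                "SELECT item_id, COUNT(*) AS units_sold " +
                "FROM sales " +
                "WHERE sold_at >= SYSTIMESTAMP - INTERVAL '1' HOUR " +
                "GROUP BY item_id " +
                "ORDER BY units_sold DESC " +
                "FETCH FIRST 5 ROWS ONLY";
            try (Connection con = ds.getConnection();
                 PreparedStatement ps = con.prepareStatement(sql);
                 ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.printf("item %d: %d sold%n",
                            rs.getLong("item_id"), rs.getLong("units_sold"));
                }
            }
        }
    }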
This seems to be a working query, but the problem is that with millions of items sold every hour, running it against the table that contains all of the transactional information will definitely run into performance issues. How can we fix such issues? Is there another way of implementing this?
As a note, Amazon at its peak on Cyber Monday is selling a bit over a million items per hour. You must have access to an incredible data store.
Partitioning is definitely one solution, but it can be a little complicated. When you say "the last hour", that can span a partition boundary. Not a big deal, but it would mean accessing multiple partitions for some queries.
Even one million items an hour is just a few hundred items per second. This might give you enough leeway to add a trigger (or, more likely, logic in an existing trigger) that maintains a summary table of exactly what you are looking for.
I offer this as food-for-thought.
I doubt that you are actually querying the real operational system. My guess is that any environment handling even a dozen sales per second is not going to have such queries running against the operational system; the architecture is more likely a feed into a decision support system. That gives you the leeway to maintain an additional summary table as data flows into the system. This is not a question of creating triggers on the load; it is, instead, a question of loading detailed data into one table and summary information into another, based on how the information is passed from the original operational system to the decision support system.
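As a sketch of that idea (table and column names are placeholders), the feed could maintain a per-minute summary alongside the detail rows, so the "top 5 in the last hour" query aggregates at most 60 rows per item instead of scanning millions of transactions:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.sql.Timestamp;
    import java.time.Instant;
    import java.time.temporal.ChronoUnit;
    import javax.sql.DataSource;

    class SalesSummaryFeed {

        // Called by the feed for every sale loaded into the decision support system.
        // Assumes a hypothetical summary table: item_sales_per_minute(item_id, sold_minute, units_sold).
        static void recordSale(DataSource ds, long itemId, Instant soldAt) throws SQLException {
            Timestamp soldMinute = Timestamp.from(soldAt.truncatedTo(ChronoUnit.MINUTES));
            String sql =
                "MERGE INTO item_sales_per_minute t " +
                "USING (SELECT ? AS item_id, ? AS sold_minute FROM dual) s " +
                "ON (t.item_id = s.item_id AND t.sold_minute = s.sold_minute) " +
                "WHEN MATCHED THEN UPDATE SET t.units_sold = t.units_sold + 1 " +
                "WHEN NOT MATCHED THEN INSERT (item_id, sold_minute, units_sold) " +
                "VALUES (s.item_id, s.sold_minute, 1)";
            try (Connection con = ds.getConnection();
                 PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setLong(1, itemId);
                ps.setTimestamp(2, soldMinute);
                ps.executeUpdate();
            }
        }
    }

At a few hundred sales per second, a per-row upsert like this (or a small batched variant) is well within reach for the load process.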
I think you should try partitioning.
For example, you can split the data by month/week/whatever into different partitions, using, say, range partitioning; then, for the last hour, it is quite easy to run the query against only the most recent partition. See partition-wise joins to learn more about it.
Of course, you'll need to perform some specific implementation steps, but every war can require some sacrifice...
As a learning exercise (I am not in school - just an old guy trying to learn something new), I am trying to write a logic gate simulation that incorporates propagation delay. The user should also be able to group gates together to create higher-level objects.
I want to apply design patterns to my problem, but I am having a hard time.
I am reading Head First Design Patterns, and I see that the Command pattern is a good way to simulate electrical pulses through a circuit with a delay. I also see that the Composite pattern is a good way to simulate nested units. I just don't know how to mix the two.
In other words, as I loop through my gates, I see that gate 'x' should fire. It has a 15-nanosecond delay, so I create a command with a timestamp 15 ns from the current game time. Where is the dispatcher? In the diner example, with the command being the 'Order', the waitress and the cook each dispatch the command and have the option of introducing a delay. If I have a 'composite' gate, does it also have its own dispatcher? Do I need to use a Singleton to manage the queue?
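To make the question concrete, here is roughly what I have in mind for the delayed commands (a sketch only; the class names are mine):

    import java.util.PriorityQueue;

    // A command whose execution is scheduled for a particular simulation time.
    interface Command {
        void execute();
    }

    class ScheduledCommand implements Comparable<ScheduledCommand> {
        final long executeAtNanos;
        final Command command;

        ScheduledCommand(long executeAtNanos, Command command) {
            this.executeAtNanos = executeAtNanos;
            this.command = command;
        }

        @Override
        public int compareTo(ScheduledCommand other) {
            return Long.compare(this.executeAtNanos, other.executeAtNanos);
        }
    }

    // The "dispatcher" I am asking about: a single event queue ordered by timestamp.
    class Simulator {
        private final PriorityQueue<ScheduledCommand> queue = new PriorityQueue<>();
        private long nowNanos = 0;

        void schedule(long delayNanos, Command command) {
            queue.add(new ScheduledCommand(nowNanos + delayNanos, command));
        }

        void run() {
            while (!queue.isEmpty()) {
                ScheduledCommand next = queue.poll();
                nowNanos = next.executeAtNanos;   // advance simulation time
                next.command.execute();           // e.g. "set the output of gate x high"
            }
        }
    }

So when gate 'x' fires with a 15 ns delay it would call something like simulator.schedule(15, cmd), and my question is whether a composite gate shares this single queue (a Singleton?) or owns its own dispatcher.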
I read what I could find, but I still need a push in the right direction:
Using Command Design pattern
Client Server Command Design pattern with variable delays
Composite of Commands Design Pattern
How can I calculate propagation delay through series of combinational circuits using Verilog and FPGA?
I attended a payroll software demo yesterday wherein the year dropdowns throughout the software ran from 2000 to 2200. Now, we've all been down this road before with two-digit short-sightedness, but honestly - a 200-year service life for a Java & Oracle payroll system? Our Board of Directors would be thrilled if the company were even solvent for a quarter of that time.
When forced to use a dropdown year select, where do you draw the line?
It depends upon the usage. If you're trying to ascertain retirement dates for financial planning, you need to allow users to select years decades into the future. If you're asking for credit card expiration dates, current year + 10 should be more than sufficient. Either way, you would be populating these dropdowns dynamically, lest you desire touching up the user interface every year.
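For example, a dynamically built list of year options might look like this (a sketch; the horizons passed in are just the cases mentioned above):

    import java.time.Year;
    import java.util.ArrayList;
    import java.util.List;

    class YearOptions {
        // Builds the dropdown's year values at render time,
        // so the UI never needs a yearly touch-up.
        static List<Integer> build(int yearsIntoFuture) {
            int current = Year.now().getValue();
            List<Integer> years = new ArrayList<>();
            for (int y = current; y <= current + yearsIntoFuture; y++) {
                years.add(y);
            }
            return years;
        }
    }

    // e.g. YearOptions.build(10) for credit card expiry, a few decades for retirement dates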
Why not make your app end-user-configurable? Give them a config screen, let them enter a cut-off year as 4 digits and refer to that in the code?
I like to make as much as possible end-user-configurable - it means I can ship one s/w to multiple customers, and it pushes off some tricky decisions to them :-)
The drawback of such a large range is that the dropdown becomes unwieldy - there will certainly be a scrollbar, and it becomes harder to find the year you're looking for.
If it has to handle retirement dates, I'd say 55 years into the future would be sufficient (an 18 year old will probably be retired by 73). My limited experience with such systems precludes me from knowing what a reasonable limit would be otherwise - perhaps you can enlighten us?
Who's forcing you to use a dropdown year select? They're annoying as all hell.
Do a research project showing that typing in a 4-digit date takes less time than using a pulldown big enough to have a scrollbar, multiply the time difference by a vastly inflated estimate of how many people will be using the software, multiply that by a vastly inflated estimate of the pay rate for data entry, and show the company how you can save $18.7 billion over the life of the software.
The oldest confirmed human age is 115. So my bet would be to set it to 120.