I'm wondering if there is a language that supports programming in the following style that I think of as "data driven":
Imagine a language like C or Python except that it's possible to define a function whose input parameters are bound to particular variables. Then, whenever one of those input variables changes, the function is re-run.
Alternatively, the function might only ever be run when its output is needed, regardless of when its inputs are changed.
The first option is useful when things need to be kept up to date whenever something changes. The second option is useful when the computation is expensive, and you want to run it only when needed.
In my mind, this type of programming paradigm would require many stacks, perhaps one for each function that was defined in the above manner. This would allow those functions to be run in any order, and it would allow their execution to be occasionally blocked. Execution on one stack would be blocked anytime it needs the output of a function whose inputs are not yet ready.
This would be a single-threaded application. The runtime system would take care of switching from one stack to another in an appropriate manner. Deadlocks would be possible.
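To make the two options concrete, here is a minimal single-threaded sketch in Python. It is not an existing language feature: the names `Cell`, `Eager`, and `Lazy` are hypothetical, and it uses plain callbacks rather than the per-function stacks described above. `Eager` re-runs its function whenever an input changes; `Lazy` only marks itself stale and recomputes when its output is requested.

```python
class Cell:
    """A mutable input value that notifies subscribers when it changes."""

    def __init__(self, value):
        self._value = value
        self._listeners = []

    def get(self):
        return self._value

    def set(self, value):
        self._value = value
        for notify in self._listeners:
            notify()

    def subscribe(self, callback):
        self._listeners.append(callback)


class Eager:
    """Option 1: re-run the function every time any input changes."""

    def __init__(self, func, *inputs):
        self._func = func
        self._inputs = inputs
        self.runs = 0
        for cell in inputs:
            cell.subscribe(self._recompute)
        self._recompute()  # compute the initial value

    def _recompute(self):
        self.runs += 1
        self._value = self._func(*(c.get() for c in self._inputs))

    def get(self):
        return self._value


class Lazy:
    """Option 2: only mark the result stale; recompute when it is requested."""

    def __init__(self, func, *inputs):
        self._func = func
        self._inputs = inputs
        self._stale = True
        self._value = None
        self.runs = 0
        for cell in inputs:
            cell.subscribe(self._invalidate)

    def _invalidate(self):
        self._stale = True

    def get(self):
        if self._stale:
            self.runs += 1
            self._value = self._func(*(c.get() for c in self._inputs))
            self._stale = False
        return self._value
```

With two inputs `a = Cell(1)` and `b = Cell(2)`, setting both once makes an `Eager` sum run three times in total (once at construction, once per change), while a `Lazy` sum runs once, on the first `get()`.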
I have a program that reads a JSON file, calculates, and outputs a JSON file on S3.
My question is: how should I systematically check that the output calculation is okay?
I understand that writing unit tests is something I should do, but it doesn’t guarantee that the output file is safe. I’m thinking of making another program, running on Lambda, that checks the output JSON.
For example, let’s say the program is calculating dynamic pricing in an area that has an upper-bound value. Then I want to make sure that none of the calculation results in the JSON file exceed the upper bound, or at least I’d like to monitor whether they are all safe or whether there are some anomalies.
I want to build an efficient and robust anomaly detection system, so I don’t want to build the anomaly check into the same program, to avoid a single point of failure. Any suggestions are welcome.
One option is to create a second lambda function with the S3 trigger to fire when the JSON file is written into S3 from the original function.
In this second Lambda, you can verify the data, and if there is an anomaly, you can publish an SNS or EventBridge event that can be used to log, inform, or alert about the issue, or perhaps to trigger a separate process that auto-corrects anomalies.
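As a rough sketch of the verification step only, the check inside the second Lambda could look like the Python below. The JSON layout (a top-level `"prices"` list of objects with `"area"` and `"price"` keys) and the function names are assumptions for illustration; reading the object from S3 and publishing to SNS or EventBridge are left out.

```python
import json
import math


def find_anomalies(prices, upper_bound):
    """Return the entries whose price is missing, non-numeric, non-finite,
    or above upper_bound."""
    anomalies = []
    for item in prices:
        price = item.get("price")
        is_number = isinstance(price, (int, float)) and not isinstance(price, bool)
        if not is_number or not math.isfinite(price) or price > upper_bound:
            anomalies.append(item)
    return anomalies


def check_output(json_text, upper_bound):
    """Parse the output file's text and return its anomalous entries.

    Assumes a hypothetical layout: {"prices": [{"area": ..., "price": ...}]}.
    """
    data = json.loads(json_text)
    return find_anomalies(data["prices"], upper_bound)
```

The handler would call `check_output` with the body of the newly written object and raise an alert when the returned list is non-empty.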
You should use Design by Contract (also known as contract programming), that is, preconditions and postconditions.
If the output shall never exceed a certain value, then that is a postcondition of the code producing this value. The program should assert its postconditions.
If some other code relies on a value being bounded, then that is a precondition of that code. The code should assert this precondition. This is a type of Defensive Programming technique.
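A minimal Python sketch of both kinds of contract follows. The bound, the pricing formula, and the names `ensure`, `compute_price`, and `publish_price` are hypothetical. An explicit helper is used instead of the `assert` statement because Python strips `assert` when run with `-O`.

```python
UPPER_BOUND = 100.0  # hypothetical upper bound for the area


def ensure(condition, message):
    """Raise AssertionError when a contract is violated."""
    if not condition:
        raise AssertionError(message)


def compute_price(base, surge):
    # Precondition: the inputs must be sensible.
    ensure(base >= 0 and surge >= 0, "precondition: inputs must be non-negative")
    price = base * (1 + surge)  # stand-in for the real calculation
    # Postcondition: the producer guarantees the bound.
    ensure(price <= UPPER_BOUND, "postcondition: price exceeds upper bound")
    return price


def publish_price(price):
    # Precondition: the consumer relies on the bound, so it checks it too.
    ensure(0 <= price <= UPPER_BOUND, "precondition: price must be within bounds")
    return {"price": price}
```

Here `compute_price(40, 0.5)` returns `60.0`, while `compute_price(80, 0.5)` raises `AssertionError` because the result would be above the bound.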
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 years ago.
Context
By conformity check I mean eliminating queries that definitely are going to return nothing.
For example:
Consider table boxes, where one of the available columns is color CHAR(6);
A user sends this string 'abcdefg' to be queried against column color through his interaction with the front-end;
Then, the back-end would execute a query similar to SELECT * FROM boxes WHERE color = ?, using the same string mentioned above;
At least in my PostgreSQL installation I can execute this query, even knowing it's never going to return anything (the length of 'abcdefg' is 7).
Currently, both the front-end and the back-end perform conformity checks prior to accessing data from our DB (to avoid unnecessary calls).
As a matter of fact, the front-end is designed to forbid users from requesting invalid queries. But supposing that these checks didn't take place, especially at the back-end, how significant would that be to an application?
Question
How does PostgreSQL treat these queries? Does it have any type of algorithm that instantly returns nothing if such a query is executed? Or would it be better to not call the DB and just send the user something like not found or invalid request?
Further Context
We already sanitize all input acquired from our front-end interfaces, so this is not a question about the possible benefits/downsides regarding the safety gained after the execution of these checks.
The language used at our back-end is Go, which I believe has no trouble performing these checks regularly (i.e. on most HTTP requests).
PS.: I know you can cast hexadecimal to ints in PostgreSQL, this is just a hypothetical problem which I used to ease the comprehension of the problem (I hope it did).
I would perform such checks either in the frontend or in the backend, wherever it is most convenient, but not in both. The second line of defense is the database, and two is enough.
It is a good thing to find incorrect data in the application, but don't go overboard: if you hard-code something like a maximal string length in both the database and the application, you'll have to modify that limit in two places whenever you do, and code redundancy is a bad thing.
What is still sane depends a lot on taste and opinion: I think it is fine to check length limits in the application rather than relying on errors from the database, but I think it is questionable to burden the application with complicated logic that guesses at the results of SQL statements.
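For the length limit from the question, an application-side check can be very small. The sketch below is in Python rather than the asker's Go, and `COLOR_LENGTH` and `may_match_color` are hypothetical names. Note that the constant duplicates the `CHAR(6)` limit from the schema, which is exactly the redundancy cautioned against above. It also assumes that trailing spaces are insignificant when comparing against a `CHAR(n)` column, so they are ignored before measuring.

```python
COLOR_LENGTH = 6  # mirrors color CHAR(6); must be kept in sync with the schema


def may_match_color(value):
    """Return False when value can never equal a stored color.

    Shorter strings are allowed because CHAR(n) values are space-padded;
    trailing spaces are stripped on the assumption that the comparison
    treats them as insignificant.
    """
    return isinstance(value, str) and len(value.rstrip(" ")) <= COLOR_LENGTH
```

With this, `may_match_color("abcdefg")` is false, so the back-end could answer "not found" without calling the database.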
What is important is to model all your important consistency checks in the database, then nothing much can go wrong as long as you catch and gracefully handle database errors. Everything beyond that can be considered performance tuning and should only be done if it offers a demonstrable benefit.
Let me explain this question a bit :)
I'm writing a bunch of stored procedures for a new product.
They will only ever be called by the C# application, written by the developers who are following the same tech spec I've been given.
I can't go into the real tech spec, so I'll give a close enough example:
In the tech spec, we have to store file data in a couple of proprietary zip files, with a database storing the names and locations of each file within a zip (e.g., one database for each zip file).
Now, let's say that this tech spec states that, to perform "Operation A", the following steps must be done:
1: Calculate the space requirements of the file to be added
2: Get a list of zip files and their database connection strings (call stored proc "GetZips")
3: Find a suitable location within the zip file to store the file (call stored proc "GetSuitableFileLocation" against each database connection, until a suitable one is found)
4: In step 3, you will be provided with a start/end point within the zip to add your file.
Call the "AllocateLocationToFile" stored proc, passing in these values, then add your file to the zip.
OK - so the question is, should "AllocateLocationToFile" re-check the specified start/end points are still "free", and if not, raise an exception?
There was a bit of a discussion about this in the office, and whilst I believe it should check and raise, others believe that it should not, as there is no need due to the developer calling "GetSuitableFileLocation" immediately beforehand.
Can I ask for some valued opinions?
Generally, it is better to be as safe as possible. Calling code should never rely on external code (the SPs are kind of external). The idea is that you cannot predict what will happen in the future. New people join the company, the SPs are handed over to another team, and so on.
Personally, the fact that B() is right after A() doesn't guarantee anything. To change this for whatever reason is not something to be considered impossible.
A team should never make decisions based on "we are going to maintain this, no problem at all", because they might get fired, the company may sell the product, and so on.
My suggestion is to do the checking, profile the code, and, if the check really is a bottleneck, remove it, but write somewhere that THIS CAN BREAK!
Given that you're manipulating files, with all the potential havoc this can create, I'd say in this scenario the risk (damage component) is high enough to be cautious.
And Svetlozar's right: what if great success does cause re-use, or other added-on applications? Not everyone may be as well-behaved as your team is right now.
One reason why it might be a good idea would involve race conditions. Is it possible two users could call the process at the same time and get the same values? Please at least test this scenario with the currently designed process.
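The race can be illustrated with an in-memory stand-in for the allocation step. This is only a Python sketch with hypothetical names (`ZipAllocator`, `allocate`); in the real system the re-check and the allocation would have to happen atomically inside the stored procedure, for example within one transaction. The point is that the check and the write sit under the same lock, so two callers that were both offered the same range cannot both succeed.

```python
import threading


class ZipAllocator:
    """In-memory stand-in for the allocation table of one zip file."""

    def __init__(self):
        self._lock = threading.Lock()
        self._allocated = []  # list of half-open (start, end) ranges

    def _is_free(self, start, end):
        # A range is free if it overlaps no existing allocation.
        return all(end <= s or start >= e for s, e in self._allocated)

    def allocate(self, start, end):
        """Re-check that the range is still free, then reserve it atomically."""
        with self._lock:
            if not self._is_free(start, end):
                raise ValueError(f"range {start}-{end} is no longer free")
            self._allocated.append((start, end))
```

If two threads both call `allocate(0, 100)`, one succeeds and the other gets the exception instead of silently overwriting the first caller's file.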
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
Quick question: I'd like to hear your thoughts on when to use "State" versus "Status" when naming both fields such as "Foo.currentState" vs "Foo.status" and types, like "enum FooState" vs "enum FooStatus". Is there a convention discussed out there? Should we only use one? If so which one, and if not, how should we choose?
IMO:
status == how are you? [good/bad]
state == what are you doing? [resting/working]
It depends on the context.
State generally refers to the entire state of an entity - all its values and relationships at a particular point in time (usually, current)
Status is more of a time-point, say, where something is at in a process or workflow - is it dirty (therefore requiring saving), is it complete, is it pending input, etc
Typically I will use State to mean the current condition of an object or the system as a whole. I use status to represent the outcome of some action. For example, the state of an object may be saved/unsaved, valid/invalid. The status (outcome) of a method is successful/unsuccessful/error. I think this jibes pretty well with the definition of status as "state or condition with respect to circumstances," the circumstances in this case being the application of an action/method.
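As an illustration of that convention, here is a small Python sketch. The names `DocumentState`, `SaveStatus`, and `Document` are made up for the example, and the `ok` parameter is just a stand-in for whether the underlying write worked.

```python
from enum import Enum


class DocumentState(Enum):
    """The current condition of the object."""
    UNSAVED = "unsaved"
    SAVED = "saved"


class SaveStatus(Enum):
    """The outcome of one action."""
    SUCCESS = "success"
    ERROR = "error"


class Document:
    def __init__(self):
        self.state = DocumentState.UNSAVED

    def save(self, ok=True):
        """Return the status of this save; update the state only on success."""
        if ok:
            self.state = DocumentState.SAVED
            return SaveStatus.SUCCESS
        return SaveStatus.ERROR
```

A failed `save` returns `SaveStatus.ERROR` and leaves `state` at `DocumentState.UNSAVED`, so the two names answer different questions.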
Another (entirely pragmatic) reason to prefer state over status is that the plural is straightforward:
state -> states
status -> statuses
And believe me, you will sooner or later have a list or array or whatever of states in your code and will have to name the variable.
I think many people use "Status" to represent the state of an object if for no other reason than "State" refers to a political division of the United States.
I think you could add another perspective to the equation, namely 'sender-requester'.
From a sender's perspective, I'd communicate my state to anyone willing to listen, while from a requester's perspective, I'd be asking for someone's status.
The above could also be interpreted from an uncertainty point of view:
Defined = state
Undefined = status
What's your status? I'm in a relaxed state.
I'm pretty sure this is just one interpretation, which may not apply to your particular situation.
A quick dictionary check reveals that status is a synonym for state, but has an additional interpretation of a position relative to that of others.
So I would use state for a set of states that don't have any implicit ordering or position relative to one another, and status for those that do (perhaps off-standby-on ?). But it's a fine distinction.
A lot of the entities I deal with (accounts, customers) may have a State (TX, VA, etc.) and a Status (Active, Closed, etc.)
So the point about the term being misleading is possible. We have a standardized database naming convention (not my personal choice) where a state is named ST_CD and a status would be ACCT_STAT_CD.
With an enum in an OO milieu, this issue is not as important, since if you have strict type safety, the compiler will ensure that no one attempts to do this:
theCustomer.State = Customer.Status.Active;
If you are in a dynamic environment, I would be more worried!
If you are dealing with a domain where state machines or other state information are predominant, along with that terminology, then I would think State is perfectly fine.
We had this exact debate on my current project a while back. I really don't have a preference, but consistency is an important consideration.
The first (there are several) definition of "state" in my Sharp PW-E550 (an awesome dictionary, I might add) is "the particular condition that someone or something is in at a specific time." The first definition of "status" is "the relative social, professional, or other standing of someone or something". Even the second (and last) definition of "status" is inferior to "state" in this context: "the position of affairs at a particular time, esp. in political or commercial contexts."
So if we wanted it to be as easy as possible for someone using my dictionary (it uses the New Oxford American Dictionary, 2001), "state" would be the best choice.
Furthermore, there is a design pattern described in the Gang of Four's book called the State Pattern, firmly establishing the term in the computing lexicon.
For these reasons I suggest "state".
P.S. Is that you DDM? Are you still bitter about "state" versus "status" ?!!!!!!! LMAO!
Not the same thing at all. Stopped and started are states. Stopping and starting are status.
If you make them the same thing, how do you describe a vehicle that is stopped but is currently starting? Or an application that is currently lodged but hasn't yet entered the approval process, or that is being approved but is currently on hold with an error condition of awaiting signature?
Well, they do mean the same thing. I don't think it's necessary to promulgate a great preference of one over the other, but I would generally go with "status", because I like things that sound Latinate and classicist. I mean, in my world, the plural of schema is schemata, so there's pretty much no other way for it to go, with me.
Sophistifunk, I'm sure you'll get arguments for both State and Status. The most important thing is to pick one and use only that one. I'd suggest discussing this with your team to see what everyone agrees on.
That said, my suggestion is as follows.
Assuming you are using an object-oriented programming language, an object's "state" is represented by the object itself. SomeObject.state is misleading imo. I'm not sure what "status" represents in your example, but my natural intuition is to prefer this to state.