Should the back-end perform conformity checks before a SQL query? [closed] - sql

Context
By conformity check I mean eliminating queries that definitely are going to return nothing.
For example:
Consider table boxes, where one of the available columns is color CHAR(6);
A user sends this string 'abcdefg' to be queried against column color through his interaction with the front-end;
Then, the back-end would execute a query similar to SELECT * FROM boxes WHERE color = ?, using the same string mentioned above;
At least in my PostgreSQL installation I can execute this query, even knowing it's never going to return anything (the length of 'abcdefg' is 7).
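For concreteness, here is a minimal Go sketch of the kind of back-end check in question (the package, function name, and Postgres-style $1 placeholder are my own illustration; only the length test and the CHAR(6) column come from the example above):

package store

import (
	"database/sql"
	"fmt"
)

// colorWidth mirrors the CHAR(6) column: a longer value can never
// match, so there is no point in asking the database.
const colorWidth = 6

func findBoxesByColor(db *sql.DB, color string) (*sql.Rows, error) {
	// len counts bytes, which matches CHAR(6) for ASCII input.
	if len(color) > colorWidth {
		return nil, fmt.Errorf("color %q is longer than %d characters; no match is possible", color, colorWidth)
	}
	return db.Query("SELECT * FROM boxes WHERE color = $1", color)
}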
Currently, both the front-end and the back-end perform conformity checks prior to accessing data from our DB (to avoid unnecessary calls).
As a matter of fact, the front-end is designed to forbid users from requesting invalid queries. But supposing that these checks didn't take place, especially at the back-end, how significant would that be to an application?
Question
How does PostgreSQL treat these queries? Does it have any kind of logic that instantly returns nothing when such a query is executed? Or would it be better not to call the DB at all and just send the user something like not found or invalid request?
Further Context
We already sanitize all input acquired from our front-end interfaces, so this is not a question about the possible benefits/downsides regarding the safety gained after the execution of these checks.
The language used in our back-end is Go, which I believe has no trouble performing these checks regularly (i.e. on most HTTP requests).
PS.: I know you can cast hexadecimal to ints in PostgreSQL; this is just a hypothetical problem that I used to make the question easier to follow (I hope it did).

I would perform such checks either in the frontend or in the backend, wherever it is most convenient, but not in both. The second line of defense is the database, and two is enough.
It is a good thing to find incorrect data in the application, but don't go overboard: if you hard-code something like a maximal string length in both the database and the application, you'll have to modify that limit in two places whenever it changes, and code redundancy is a bad thing.
What is still sane depends a lot on taste and opinion: I think it is fine to check length limits in the application rather than relying on errors from the database, but I think it is questionable to burden the application with complicated logic that guesses at the results of SQL statements.
What is important is to model all your important consistency checks in the database, then nothing much can go wrong as long as you catch and gracefully handle database errors. Everything beyond that can be considered performance tuning and should only be done if it offers a demonstrable benefit.
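As a hedged sketch of that division of labor (the CHECK constraint, the hex-color rule, and the error handling are illustrative assumptions, not a prescription): the rule is declared once in the schema, and the Go application merely catches the error.

package store

import (
	"database/sql"
	"fmt"
)

// The consistency rule lives in exactly one place: the schema.
const schema = `
CREATE TABLE boxes (
    id    serial PRIMARY KEY,
    color char(6) NOT NULL CHECK (color ~ '^[0-9a-f]{6}$')
);`

func insertBox(db *sql.DB, color string) error {
	if _, err := db.Exec("INSERT INTO boxes (color) VALUES ($1)", color); err != nil {
		// A constraint violation surfaces here as an ordinary error;
		// handle it gracefully instead of failing the whole request.
		return fmt.Errorf("box rejected by database: %w", err)
	}
	return nil
}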

Related

RDBMS results return, ordering and returning sets/hashmaps instead of arrays/lists [closed]

Most, if not all, SQL-based RDBMS connector libraries I've come into contact with return results in array form. Why? If the order is arbitrary (absent a sorting SQL modifier), couldn't the natural return type be something like a set or hash map? These data structures would, in some cases, be more computationally favorable at scale than the typical array/list return in languages like C++ (with standard template library usage), JavaScript/Node, Go, and any other language that supports associative data types or pure sets.
In particular, do libraries such as knex.js offer such a feature in the form of a connection flag (I haven't found it yet)?
Do any of the major RDBMS systems (MySQL, PostgreSQL, ...) offer the ability to return results in set/hashmap form?
Concretely, what I think would make sense using node.js and a library like knex.js, is to specify a flag like:
knex.forceMap('keycolumnpattern') Or, knex.forceSet()...
Again, the underlying assumption here is that you are not imposing order on the SQL (or other) query by adding a sort directive, i.e. ORDER BY.
The justification for this is in environments where scaling and computational complexity are important concerns.
Good question.
This is by no means a comprehensive answer, but just my opinion on this curious question.
Usually databases return a series of rows, which most documentation refers to as a "result set".
The database
Now, this result set is assembled when the query is executed and may take different forms. Most likely the database sends it as an "enumeration": that is, a list-like entity that produces rows when you request them. To save resources, the database will try not to materialize the whole result set at once, but to produce rows as you read them from your client application. This works as long as the query can be "pipelined".
When the query cannot be pipelined, the whole data set is materialized on the database side.
The driver
Your client driver does not retrieve rows one by one, but in groups, through buffering. Even when the query cannot be pipelined, your client driver will still retrieve the rows in groups according to the "fetch size" and "buffer size".
The client technology
Your application can use the driver's basic primitive operations, or a more sophisticated ORM. It's common for ORMs to hide all the inner workings of the driver and offer you a "simple" result like an array, list, or map, i.e., hiding the "streaming" that an enumeration provides.
If you don't use an ORM, then you will probably call the driver primitives yourself, and therefore you can get access to all the inner, ugly details. The upside is that you can assemble the result set rows into any data structure you prefer.
In any case, the repertoire of data structures will depend on the specific query, since a "map" or a "set" will require some kind of unique identifier, while a list won't.
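To make that last point concrete, here is a hedged Go sketch using the standard database/sql primitives (the table and columns are invented): the driver streams rows one at a time, and nothing stops the client from accumulating them into a map keyed on a unique column.

package store

import "database/sql"

// usersByID drains the result set row by row and assembles a map.
// This only works because id is a unique identifier.
func usersByID(db *sql.DB) (map[int64]string, error) {
	rows, err := db.Query("SELECT id, name FROM users")
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	users := make(map[int64]string)
	for rows.Next() {
		var (
			id   int64
			name string
		)
		if err := rows.Scan(&id, &name); err != nil {
			return nil, err
		}
		users[id] = name
	}
	return users, rows.Err()
}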

Is it still considered a bad practice to use cursor in sql? [closed]

I've read some articles about cursors in SQL, and most of them say that cursors eat up a lot of memory, etc., but those articles are old, from around 2008. I want to know: are cursors still considered a bad practice today?
Bad practice? Good practice?
As with many aspects of many different languages, cursors have both positives and negatives. As a general rule, cursors make it much more difficult to optimize queries. So, if a query can be expressed as a set-based query, then it should be.
However, under many circumstances, cursors are necessary. There is no reasonable alternative and they are a bona fide, powerful part of the language.
It's not so much that using a cursor is bad practice (though it quite often is!); sometimes a cursor is a valid solution to a problem. It's more that thinking procedurally is generally not as efficient as thinking in terms of sets and joins.
The classic example of this is termed 'N + 1' in the ORM world: this refers to making 1 query to get (say) a list of IDs and then N further queries to retrieve the rows for those IDs. This can often be done as a single join query.
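A rough Go sketch of the contrast (the orders/order_items tables are invented): the first shape issues 1 + N round trips, the second does the same work in one set-based statement.

package store

import "database/sql"

// loadNPlusOne shows the anti-pattern: the ids slice came from a first
// query, and each iteration costs another round trip and another plan.
func loadNPlusOne(db *sql.DB, ids []int) error {
	for _, id := range ids {
		rows, err := db.Query("SELECT * FROM order_items WHERE order_id = $1", id)
		if err != nil {
			return err
		}
		rows.Close()
	}
	return nil
}

// loadWithJoin retrieves the same data in a single query, which the
// planner can optimize as a whole.
func loadWithJoin(db *sql.DB) (*sql.Rows, error) {
	return db.Query(`
		SELECT o.id, i.*
		FROM orders o
		JOIN order_items i ON i.order_id = o.id
		WHERE o.status = 'open'`)
}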
If you find yourself solving a problem using a cursor(s), stop for a moment and consider whether it could be done in a set based manner.
From MSDN: Cursor Implementations

Using a cursor is less efficient than using a default result set. In a default result set the only packet sent from the client to the server is the packet containing the statement to execute. When using a server cursor, each FETCH statement must be sent from the client to the server, where it must be parsed and compiled into an execution plan.

If a Transact-SQL statement will return a relatively small result set that can be cached in the memory available to the client application, and you know before executing the statement that you must retrieve the entire result set, use a default result set. Use server cursors only when cursor operations are required to support the functionality of the application, or when only part of the result set is likely to be retrieved.
Cursors come with both positive effects and side effects; if you use them for what they're designed for, we can't say that they are actually a bad practice.

Whitelisting user input instead of using prepared statement to prevent sql injection? [closed]

I've read a few articles about SQL injection prevention. Most of them recommend using prepared statements to prevent SQL injection, and treat whitelisting as just an additional measure. I can't get their point.
IMHO, whitelisting user input is much better, since it can also prevent XSS attacks. Whitelisting is only impossible when no character can be restricted, and that case is infrequent.
Let's consider this example in nodejs.
Prepared statement
DB.query("UPDATE user SET username = ?", username, cb);
Whitelisting
// assume that username is alphabetic
if (!/^[a-z]+$/.test(username)) {
  throw new Error('Invalid user name');
}
DB.query("UPDATE user SET username = '" + username + "'", cb);
What do you guys think? Whitelisting or prepared statements? Why don't you recommend whitelisting user input over prepared statements?
Prepared statements absolutely prevent SQL injection vulnerabilities in statements where they are used, period, paragraph.
Validating or "sanitizing" your input may appear similarly effective, but the unquestioned consensus among experts is that there is no effective alternative for prepared statements, and concatenated queries are simply unacceptable.
There are ways of getting bad data past validation attempts that you have not yet imagined, and likely never will, until your server is exploited. There are, for example, exploits involving alternate character sets, which can sail right through what seems like proper validation or escaping.
But the apparent effectiveness aside, there's also an overriding principle at play: the fundamental and vital aspect of prepared statements that you appear not to be considering is that they impose the correct separation between "code" and "data." (In other environments, a breach of the boundary between code and data is at the heart of "buffer overrun" vulnerabilities.) In SQL, the query is the code, and the values supplied are data. They are different kinds of "things," and should be kept segregated, as a matter of principle.
Prepared statements don't simply substitute the ? with the values. The query and the values are provided separately to the database server in different data structures. With this mechanism, it becomes literally impossible for the database server to get it wrong and blur the boundary.
An effective illustration of this is that you can't use ? placeholders for database object identifiers, like table or column names, by providing the name as an argument. That doesn't work because it is not supposed to work: table and column names are part of the code, not part of the data.
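Both points in a short Go sketch (the users table is hypothetical): the value travels to the server separately from the statement text, and the same mechanism deliberately refuses to parameterize an identifier.

package store

import "database/sql"

func renameUser(db *sql.DB, id int64, username string) error {
	// The SQL text and the values are shipped separately, so the server
	// can never mistake a value for code, whatever it contains.
	_, err := db.Exec("UPDATE users SET username = $1 WHERE id = $2", username, id)
	return err
}

// By design, this does NOT work; identifiers are code, not data:
//   db.Exec("UPDATE $1 SET username = $2", "users", username)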
"Impossible" to get it wrong is a term that you can't apply to your attempts at input validation.
Usernames are also an overly simplistic example, since they are easily constrained to ASCII letters. Many or most other columns are not. Make a change later, and "oops," you forgot to handle something. Perhaps it was something inconceivably remote and unlikely, but now it's just waiting to be exploited.
There have also been a number of excellent comments to your question, which you would do well to take into consideration.
Prepared statements are the correct mechanism for handing outside data to a database... but I would argue that experts and professionals don't even ask themselves whether the data source is trustworthy or not -- we use placeholders and prepared statements consistently and unconditionally, without regard to the origin of the data we're passing.
Of course, it should be obvious that my point is not to downplay protecting against other vulnerabilities, but that is not the role of prepared statements. However effective other mechanisms may appear at assisting with the task, and however useful they may be for their intended purpose, they are not a substitute for proper handling of data on its way to the dbms.

Are user-defined SQL datatypes used much? [closed]

My DBA told me to use a user-defined SQL datatype to represent addresses, and then use a single column of that new type in our users table instead of multiple address columns. I've never done this before and am wondering if this is a common approach.
Also, what's the best place to get information about this - is it product-specific?
As far as I can tell, at least in the SQL Server world, UDTs aren't used very much.
The trouble with UDTs is that you can't easily update them. Once created and used in databases, they're almost set in stone.
There's no "CREATE OR ALTER (UDT)" command :-( So to change something, you have to do a lot of shuffling around: possibly copying away the existing data, then dropping lots of columns from other tables, then dropping your UDT, re-creating it with the new structure, and reapplying the data and everything.
That's just too much hassle - and you know: there will be change!
Right now, in SQL Server land, UDTs are just a nice idea, but really badly implemented. I wouldn't recommend using them extensively.
Marc
There are a number of other questions on SO about how to represent addresses in a database. AFAICR, none of them suggests a user-defined type for the purpose. I would not regard it as a common approach; that is not to say it is not a reasonable approach. The main difficulty lies in deciding what methods to provide for manipulating the address data: those used to format the data for an envelope or for specific places on a printed form, those used to update fields, handling the many ramifications of international addresses, and so on.
Defining user-defined types is very product specific. The ways you do it in Informix are different from the ways it is done in DB2 and Oracle, for example.
I would also rather avoid user-defined datatypes, as their definition and usage will make your code dependent on a particular database.
Instead, if you are using an object-oriented language, create a composition relationship to define addresses for an employee (for example) and store the addresses in a separate table, as sketched below.
E.g. an Employees table and an Employee_Addresses table. One employee can have multiple addresses.
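A hedged sketch of that composition in Go, with a matching schema (everything beyond the Employees/Employee_Addresses naming is invented):

package hr

// Address is a plain composed type in the application, not a database UDT.
type Address struct {
	Street     string
	City       string
	PostalCode string
	Country    string
}

type Employee struct {
	ID        int64
	Name      string
	Addresses []Address // one employee, many addresses
}

// The one-to-many relationship lives in a separate table.
const schema = `
CREATE TABLE employees (
    id   serial PRIMARY KEY,
    name text NOT NULL
);
CREATE TABLE employee_addresses (
    employee_id int NOT NULL REFERENCES employees(id),
    street      text,
    city        text,
    postal_code text,
    country     text
);`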
user-defined SQL datatype to represent addresses
User-defined types can be quite useful, but a mailing address doesn't jump out as one of those cases (to me, at least). What is a mailing address to you? Is it something you print on an envelope to mail someone? If so, text is about as good as it's going to get. If you need to know what state someone is in for legal reasons, store that separately and it's not a problem.
Other posts here have criticized UDTs, but I think they do have some amazing uses. PostgreSQL had full-text search as a UDT-based plugin for a long time before full-text search was integrated into the core product. Right now PostGIS is a very successful GIS product that is entirely a UDT-based plugin (it has a GPL license, so it will never be integrated into core).

Where do you store long/complete queries used in code? [closed]

Here's my situation. I've noticed that code gets harder to maintain when you keep embedding queries in every function that uses them. Some queries tend to grow very fast and lose readability once every line is concatenated. Another issue that comes with all of the concatenation is when you must test a specific query by pasting it somewhere: you are forced to remove all of the " characters that held your query string together.
So my question is: what methods are being used to separate queries from the code? I have tried searching, but I'm not finding anything relevant, so it seems I'm not searching for the right terms.
I'd like to note that views and stored procedures are not possible, since my queries fetch data from a production database.
Thank you.
If you follow an MVC pattern, then your queries should be all in the model - i.e. the objects representing actual data.
If not, then you could just put all your repetitive queries in script files, including only those needed in each request.
However, concatenating and that kind of stuff is hard to get rid of; that's why programmers exist :)
These two words will be your best friend: Stored Procedure
I avoid this problem by wrapping queries in classes that represent the entities stored in the table. So the accounts table has an Account object. It'll have an insert/update/delete query.
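In Go, that wrapper might look roughly like this (a sketch following the accounts/Account example above; the column list is my own assumption):

package store

import "database/sql"

type Account struct {
	ID    int64
	Email string
}

// AccountStore keeps every query that touches the accounts table in one place.
type AccountStore struct {
	db *sql.DB
}

func (s *AccountStore) Insert(a *Account) error {
	return s.db.QueryRow(
		"INSERT INTO accounts (email) VALUES ($1) RETURNING id",
		a.Email,
	).Scan(&a.ID)
}

func (s *AccountStore) Delete(id int64) error {
	_, err := s.db.Exec("DELETE FROM accounts WHERE id = $1", id)
	return err
}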
I've seen places where the query is stored in a file and templates are used to replace parts of the query; see the sketch below.
Java had something called SQLJ - don't know if it ever took off.
LINQ might provide some way around this as an issue too.
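The file-based idea mentioned above is straightforward in modern Go (1.16+) with the embed package; a hedged sketch, with the .sql file name invented. The query stays a plain .sql file, so it can be pasted into a console for testing without stripping any quotation marks.

package queries

import _ "embed"

// find_accounts.sql sits next to this source file and contains the
// raw query, with $1-style placeholders for its parameters.
//
//go:embed find_accounts.sql
var FindAccounts string

// Used elsewhere as: db.Query(queries.FindAccounts, userID)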
At the risk of being accused of not answering the question, my suggestion to you would be: simply don't. Use an O/RM of your choice and you'll see this problem disappear in no time.
I usually create a data class that represents the data requirements for objects that are represented in the database. So if I have a customer class, then I create a customerData class as well that houses all the data-access logic. This keeps data logic out of your entity classes. You would add all your CRUD methods here, as well as custom data methods.
You can also use stored procedures or an ORM tool, depending on your language.
The main key is to keep your data logic away from your business and entity logic.
I use stored procedures in my production environment, and we process the rows too...