How to make dynamic logical statements in MemSQL? - sql

I am trying to figure out if there is a way of creating a query that is composed of dynamic logical statements (AND and OR operators) in a configurable and persistent manner.
Say I want to take a set of events and bundle them under an entity called a feature, so that each feature is composed of events.
For example,
featureA is eventA and eventB,
featureB is (eventB and eventC) or eventD
I was considering:
making an S-expression column, saving it in a JSON column, and parsing it to build the query
creating the WHERE clause by hand, saving it in a text column, and running it later, with a view presenting the data more readably
Then I realised I can't execute (eval-style) stored strings, as mentioned here.
So it comes down to what I was trying to avoid: building and manipulating everything via client-side querying. I need a pure SQL solution for further use by our data analysts.
Any suggestions?

You can execute dynamic SQL statements with https://docs.memsql.com/sql-reference/v6.7/execute-immediate/; see that page for some examples (prepared statements are a different topic, and I don't think they are what you are looking for).
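For the feature idea above, a minimal sketch could look like the following, assuming the predicates are stored as text. The features and events tables, their columns, and the run_feature procedure are my inventions, not MemSQL built-ins, and the exact procedural syntax (SCALAR, assignment, how dynamic SELECT results are returned) should be verified against the linked docs for your version:

CREATE TABLE features (
    name VARCHAR(64) PRIMARY KEY,
    predicate TEXT                -- e.g. '(eventB AND eventC) OR eventD'
);

INSERT INTO features VALUES
    ('featureA', 'eventA AND eventB'),
    ('featureB', '(eventB AND eventC) OR eventD');

DELIMITER //
CREATE OR REPLACE PROCEDURE run_feature(f VARCHAR(64)) AS
DECLARE
    pred TEXT;
BEGIN
    -- Look up the stored predicate for the requested feature...
    pred = SCALAR(SELECT predicate FROM features WHERE name = f);
    -- ...and splice it into a statement that is executed dynamically.
    EXECUTE IMMEDIATE CONCAT('SELECT * FROM events WHERE ', pred);
END //
DELIMITER ;

CALL run_feature('featureB');

Since the predicate text is spliced into the statement verbatim, it must only ever come from a trusted, curated table - never from raw user input - for the injection reasons discussed in the questions below.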
You may also be interested in https://docs.memsql.com/concepts/v6.7/persistent-computed-columns/, which allows you to define columns that are computed as SQL expressions from other columns - so you could define your features this way.
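If the set of features changes rarely, the computed-column route might look like this (a sketch only; the events table and its flag columns are assumptions, not something from the question):

CREATE TABLE events (
    id BIGINT PRIMARY KEY,
    eventA TINYINT NOT NULL,
    eventB TINYINT NOT NULL,
    eventC TINYINT NOT NULL,
    eventD TINYINT NOT NULL,
    -- Each feature is persisted as a SQL expression over the event flags:
    featureA AS (eventA AND eventB) PERSISTED TINYINT,
    featureB AS ((eventB AND eventC) OR eventD) PERSISTED TINYINT
);

-- Analysts can then filter on features like ordinary columns:
SELECT id FROM events WHERE featureB = 1;

The trade-off versus the dynamic approach is that defining a new feature is a schema change rather than an inserted row.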

Related

Why use full 2-part table names in SQL Server

I asked a previous question about shortening the table names in SQL Server, and came across this article: https://learn.microsoft.com/en-us/previous-versions/sql/sql-server-2012/dn148262(v=msdn.10), which suggests using the fully scoped table name, such as [dbo].[Title] instead of just Title.
What is the reason to use the fully-scoped table name? I've never come across a database with multiple identical table names (actually, I didn't even think it was possible), so to me it seems unnecessary, but I'm very new to SQL Server and was wondering why this is the preferred way to do it.
From the documentation you linked to:
"For ad-hoc queries and prepared statements, query reuse will not occur if object name resolution needs to occur. Queries that are contained within stored procedures, functions, and triggers do not have this limitation.
However, in general it should be noted that the use of two-part object names (that is, schema.object) provides more opportunities for plan reuse and should be encouraged."
query reuse will not occur if object name resolution needs to occur
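Concretely, the recommendation just means schema-qualifying every object reference, even when everything lives in dbo:

-- One-part name: resolved against each caller's default schema,
-- which can prevent plan reuse for ad-hoc queries:
SELECT * FROM [Title];

-- Two-part name: resolves identically for every caller:
SELECT * FROM [dbo].[Title];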

Netezza - reuse common sql code across multiple queries in a DRY manner?

Is it possible for Netezza queries to include SQL files (which contain specific SQL code), or is that not the right way to use it?
Here is an example.
I have some common SQL code (let's say common.sql) which creates a temp table and needs to be used across multiple other queries (let's say analysis1.sql, analysis2.sql, etc.). From a code-management perspective it is quite overwhelming to maintain if the code in common.sql is repeated across the many other queries. Is there a DRY way to do this - something like #include <common.sql> in the other queries to pull in the reused code from common.sql?
Including SQL files is not the right way to do it. If you wish to persist with this approach, you could use a preprocessor like cpp or even PHP to assemble the files for you, with a build process to generate the finished queries.
However, from a maintainability perspective you are better off creating views and functions for reusable content. Note that these can pose optimization barriers, so large inline queries are often still the way to go.
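For instance, if common.sql currently builds a temp table, the same logic can usually live in a view created once (all names here are illustrative):

-- Created once, replacing the temp-table logic from common.sql:
CREATE VIEW common_base AS
SELECT c.customer_id,
       c.region,
       SUM(o.amount) AS total_amount
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.region;

-- analysis1.sql, analysis2.sql, ... then simply reference it:
SELECT * FROM common_base WHERE region = 'EMEA';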
I agree: views, functions (table-valued if needed) or, more likely, stored procedures are the way to go.
We have had a lot of luck letting stored procedures generate complex but repeatable code patterns on the fly based on input parameters and metadata on the tables being processed.
An example: all our tables have a 'unique constraint' (which is not really unique, but that doesn't matter, since it isn't enforced in Netezza) with a fixed name of UBK_[tablename].
UBK is used as a 'signal' to the stored procedure, identifying the columns of the business key for a classic Kimball-style type 2 dimension table.
The SP can then apply the 'incoming' rows to the target table just by being supplied with the name of the target table and a 'stage' table containing all the same column names and data types.
Another example could be an SP that takes a table name and three arguments, each a 'string,of,columns', and does an Excel-style pivot: it groups by the columns in the first argument, runs a SELECT DISTINCT over the second argument to generate the new column names for the pivoted columns, and sums the column in the third argument into a target table whose name you specify...
Can you follow me?
I think the nzsql command line tool may be able to do an 'include', but a combination of strong 'building block' stored procedures and Perl/Python and/or an ETL tool will most likely prove a better choice.
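If nzsql does support an include, it would presumably be the psql-style \i meta-command, given nzsql's psql heritage - an assumption worth verifying on your version:

-- analysis1.sql: pull in the shared definitions, then do the analysis.
\i common.sql
SELECT * FROM common_base WHERE region = 'EMEA';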

SSRS: Patterns for conditional dataset definitions

I am developing SSRS reports that require a user selection (via a parameter) to retrieve either live data or historical data.
The sources for live and historical data are separate objects in a SQL Server database (views for live data; table-valued functions accepting a date parameter for historical data), but their schemas - the columns they return - are the same, so other than the dataset definition, the rest of the report doesn't need to know what its source is.
The dataset query draws from several database objects, and it contains joins and case statements in the select.
There are several approaches I can take to surfacing data from different sources based on the parameter selection I've described (some of which I've tested), listed below.
The main goal is to ensure that performance for retrieving the live data (the primary use case) is not unduly affected by the presence of logic and harnessing to support the history use case. In addition, maintainability of the solution (including database objects and rdl) is a secondary, but important, factor.
1. Use an expression in the dataset query text to conditionally return the full SQL query text, with the correct sources included via string concatenation. Pros: resolves to a straight query that isn't polluted by the 'other' use case for any given execution; all logic for the report is housed in the report. Cons: awful to work with, and has limitations for lengthy SQL.
2. Use a function in the report's code module to do the same as 1. Pros: as per 1, but a marginally better design-time experience. Cons: as per 1, but also adds another layer of abstraction that reduces ease of maintenance.
3. Implement multi-statement TVFs in the database that process the parameter and retrieve the correct data using T-SQL logic. Pros: the flexibility of T-SQL, with no string building/substitution involved; the report can SELECT * from the results and apply further report parameters in its dataset query. Cons: a big performance hit compared to in-line queries; moves some logic outside the rdl.
4. Implement stored procedures to do the same as 3. Pros: as per 3, but without the ease of SELECT *. Cons: as per 3.
5. Implement in-line TVFs that union together live and history data, using a dummy input parameter that adds something resolving to 1=0 in the WHERE clause of whichever source isn't relevant (sketched below). Pros: keeps the in-line query approach; other pros as per 3. Cons: feels like a hack, adds a performance cost just for a query component that is known to return 0 rows, and adds complexity to the query.
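For what it's worth, option 5 might be sketched like this (all object and column names are invented; the real live view and history TVF would come from your database):

CREATE FUNCTION dbo.ReportData (@UseHistory BIT, @AsOfDate DATE)
RETURNS TABLE
AS RETURN
(
    SELECT ColA, ColB
    FROM dbo.LiveDataView
    WHERE @UseHistory = 0      -- resolves to 1=0 for history executions
    UNION ALL
    SELECT ColA, ColB
    FROM dbo.HistoricalData(@AsOfDate)
    WHERE @UseHistory = 1      -- resolves to 1=0 for live executions
);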
I am leaning towards options 3 or 4 at this point, but I'm eager to hear what the preferred approach would be (even one not listed here) and why.
What's the difference between live and historical? Is "live" data data that changes, while historical data does not?
Is it not possible to replicate or push live/historical data into a Data Warehouse built specifically for reporting?

What's the common practice for constructing the WHERE clause based on user input

If we take a database table, we can query all the rows, or we can choose to apply a filter. The filter can vary depending on the user input. When there are only a few options, we can write a separate query for each specific condition. But if there are lots and lots of options that the user might or might not specify, that method does not come in handy.

I know I can compose the filter from the user input, send it as a string parameter to the corresponding stored procedure, build the query with that filter, and finally execute the query string with the help of EXECUTE IMMEDIATE (in Oracle's case). I don't know why, but I really don't like this way of query building. I think it leaves the door open for SQL injection. Besides that, I always have trouble with the query itself, as everything is just a string and I need to handle dates and numbers carefully.

What is the best and most used method of forming the WHERE clause of a query against a database table?
Using database parameters instead of attempting to quote your literals is the way forward.
This will guard you against SQL injection.
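Often you don't need dynamic SQL at all: each optional filter can be written so that a NULL parameter disables it, keeping one static, fully parameterized statement. A sketch with illustrative names, using Oracle-style binds since the question mentions Oracle:

SELECT *
FROM orders
WHERE (:p_status   IS NULL OR status      = :p_status)
  AND (:p_min_date IS NULL OR order_date >= :p_min_date)
  AND (:p_customer IS NULL OR customer_id = :p_customer);

This reads well for a modest number of optional filters; with very many of them, the optimizer may struggle and the expression-tree approach described below becomes more attractive.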
A common way of approaching this problem is building expression trees that represent your query criteria, converting them to parameterized SQL (to avoid SQL injection risks), binding parameter values to the generated SQL, and executing the resultant query against your target database.
The exact approach depends on your client programming framework: .NET has Entity Framework and LINQ2SQL, which both support expression trees; Java has Hibernate and JPA; and so on. I have seen several different frameworks used to construct customizable queries, with a great deal of success. In situations where these frameworks are not available, you can roll your own, although it requires a lot more work.

How to create dynamic and safe queries

A "static" query is one that remains the same at all times. For example, the "Tags" button on Stackoverflow, or the "7 days" button on Digg. In short, they always map to a specific database query, so you can create them at design time.
But I am trying to figure out how to do "dynamic" queries where the user basically dictates how the database query will be created at runtime. For example, on Stackoverflow, you can combine tags and filter the posts in ways you choose. That's a dynamic query albeit a very simple one since what you can combine is within the world of tags. A more complicated example is if you could combine tags and users.
First of all, when you have a dynamic query, it sounds like you can no longer use the substitution API to avoid SQL injection, since the query elements depend on what the user decided to include in the query. I can't see how else to build such a query other than with string appends.
Secondly, the query could potentially span multiple tables. For example, if SO lets users filter on both users and tags, which probably live in two different tables, building the query gets a bit more complicated than just appending columns and WHERE clauses.
How do I go about implementing something like this?
The first rule is that users are allowed to specify values in SQL expressions, but not SQL syntax. All query syntax should be literally specified by your code, not user input. The values that the user specifies can be provided to the SQL as query parameters. This is the most effective way to limit the risk of SQL injection.
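For the tags-and-users example, that means the statement text is fixed and only the values travel as parameters. A sketch using MySQL-style prepared statements (the posts and post_tags tables are invented):

PREPARE stmt FROM
    'SELECT p.id, p.title
     FROM posts p
     JOIN post_tags pt ON pt.post_id = p.id
     WHERE pt.tag = ? AND p.author_id = ?';
SET @tag = 'performance';
SET @author = 42;
EXECUTE stmt USING @tag, @author;
DEALLOCATE PREPARE stmt;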
Many applications need to "build" SQL queries through code, because as you point out, some expressions, table joins, order by criteria, and so on depend on the user's choices. When you build a SQL query piece by piece, it's sometimes difficult to ensure that the result is valid SQL syntax.
I worked on a PHP class called Zend_Db_Select that provides an API to help with this. If you like PHP, you could look at that code for ideas. It doesn't handle any query imaginable, but it does a lot.
Some other PHP database frameworks have similar solutions.
Though not a general solution, here are some steps that you can take to mitigate the dynamic yet safe query issue.
A criterion in which a column value must belong to a set of values of arbitrary cardinality does not need to be dynamic. Consider using either the instr function or a special filtering table to join against (sketched after this list). This approach is easily extended to multiple columns as long as the number of columns is known. Filtering on users and tags could easily be handled with this approach.
When the number of columns in the filtering criteria is arbitrary yet small, consider using different static queries for each possibility.
Only when the number of columns in the filtering criteria is arbitrary and potentially large should you consider using dynamic queries. In which case...
To be safe from SQL injection, either build or obtain a library that defends against that attack. Though more difficult, this is not an impossible task. This is mostly about escaping SQL string delimiters in the values to filter for.
To be safe from expensive queries, consider using views that are specially crafted for this purpose and some up front logic to limit how those views will get invoked. This is the most challenging in terms of developer time and effort.
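Here is a sketch of the filtering-table idea from the first step (all names illustrative). The user's selections become rows rather than SQL text, so the statement itself never changes regardless of how many values were chosen:

-- Populated per session, one parameterized INSERT per selected value:
CREATE TEMPORARY TABLE filter_tags (tag VARCHAR(64) PRIMARY KEY);
INSERT INTO filter_tags (tag) VALUES ('sql'), ('security');

-- The query joins against the selections instead of embedding them:
SELECT DISTINCT p.id, p.title
FROM posts p
JOIN post_tags pt ON pt.post_id = p.id
JOIN filter_tags f ON f.tag = pt.tag;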
If you are using Python to access your database, I would suggest you use the Django model system. There are many similar APIs, both for Python and for other languages (notably in Ruby on Rails). I am saving so much time by avoiding the need to talk directly to the database in SQL.
From the example link:
#Model definition
class Blog(models.Model):
    name = models.CharField(max_length=100)
    tagline = models.TextField()

    def __unicode__(self):
        return self.name
Model usage (this is effectively an insert statement)
from mysite.blog.models import Blog
b = Blog(name='Beatles Blog', tagline='All the latest Beatles news.')
b.save()
The queries get much more complex - you pass around a query object, and you can add filters and sort elements to it. When you are finally ready to use the query, Django creates an SQL statement that reflects all the ways you adjusted the query object. I think it is very cute.
Other advantages of this abstraction
Your models can be created as database tables, complete with foreign keys and constraints, by Django.
Many databases are supported (PostgreSQL, MySQL, SQLite, etc.).
Django analyses your models and creates an automatic admin site out of them.
Well, the options have to map to something.
Concatenating a SQL query string isn't a problem if you still use parameters for the option values.