I want to deploy multiple Lambda functions, each one issuing Athena SQL queries. These queries may change depending on schema changes to the table involved.
I'm considering either creating a SQL file in S3 or redeploying the Lambda function every time the queries change. Is there a recommended approach for this use case?
It depends on which of the following is more important.
Speed of modifying SQL statements
Speed of Lambda function execution
You could have the Lambda function reach out to an S3 bucket to grab a SQL file on every invocation, but that would be wildly inefficient and more expensive than it needs to be. You could improve on this slightly with a caching strategy that checks whether the file has changed (by hash/checksum or ETag) before pulling it down.
The better approach would be to include the SQL in the function and simply redeploy when you want to change it. Without further context I can't say whether this will suit your needs (perhaps you need to be able to swap the SQL very quickly), but deploying Lambda functions can be very quick if set up properly (under a minute to update a single function).
There might be a third approach where you parameterize your Lambda function and SQL commands such that you don't need to change the SQL as frequently and simply pass different parameters to the function.
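To make that concrete, here is a minimal sketch of the S3-with-caching variant combined with simple parameterization, assuming Python with boto3; the bucket, key, environment variables, and placeholder syntax are all illustrative, not taken from the question:

    import os
    import boto3

    s3 = boto3.client("s3")
    athena = boto3.client("athena")

    # A module-level cache survives warm invocations of the same Lambda container,
    # so the file is only re-downloaded when its ETag changes.
    _cache = {"etag": None, "sql": None}

    def _load_sql(bucket, key):
        head = s3.head_object(Bucket=bucket, Key=key)
        if head["ETag"] != _cache["etag"]:
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            _cache["sql"] = body.decode("utf-8")
            _cache["etag"] = head["ETag"]
        return _cache["sql"]

    def handler(event, context):
        sql = _load_sql(os.environ["SQL_BUCKET"], os.environ["SQL_KEY"])
        # Optional parameterization: the SQL file may contain {table}-style
        # placeholders that are filled in from the invocation payload.
        sql = sql.format(**event.get("params", {}))
        resp = athena.start_query_execution(
            QueryString=sql,
            QueryExecutionContext={"Database": os.environ["ATHENA_DB"]},
            ResultConfiguration={"OutputLocation": os.environ["ATHENA_OUTPUT"]},
        )
        return {"queryExecutionId": resp["QueryExecutionId"]}

If the SQL is baked into the deployment package instead, the _load_sql step simply goes away and a redeploy replaces the text.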
Related
I have a requirement to move some of the existing frontend applications running Teradata as the backend to Google BigQuery. One of the common patterns used in these frontend applications is to call a macro in Teradata, based on different inputs selected by users. Considering BigQuery doesn't have a way to create a macro entity, how can I replace this and have the frontend call BigQuery to execute something similar? The connection to BigQuery is through ODBC/JDBC or Java services.
A macro in Teradata is just a way to execute multiple SQL statements as a single request, which is in turn treated as a single transaction. It also allows you to parameterize your query.
If your new DB backend supports it, you can convert the macros into stored procedures / functions. Otherwise, you can pull out the individual SQL statements from the macro and try to run them together as a single transaction.
These links may be helpful: Functions, DML
Glancing at the documentation, it looks like writing a function may be your best bet: "There is no support for multi-statement transactions."
You can look at BigQuery scripting, which is in beta (https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting#bigquery-scripting), for migrating your macros from Teradata. With this release you can write procedures that hold all your business logic and then execute them using a CALL statement.
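To illustrate, here is a hedged sketch of creating and calling such a procedure through the google-cloud-bigquery Python client; the dataset, table, and procedure names are made up for the example:

    from google.cloud import bigquery

    client = bigquery.Client()

    # The procedure plays the role of the old Teradata macro: several statements,
    # one callable unit, with a parameter.
    create_proc = """
    CREATE OR REPLACE PROCEDURE mydataset.refresh_sales(IN run_date DATE)
    BEGIN
      DELETE FROM mydataset.daily_sales WHERE sales_date = run_date;
      INSERT INTO mydataset.daily_sales
      SELECT sales_date, SUM(amount) AS total
      FROM mydataset.raw_sales
      WHERE sales_date = run_date
      GROUP BY sales_date;
    END;
    """
    client.query(create_proc).result()

    # The frontend (over ODBC/JDBC or a service) then only needs to issue a CALL.
    client.query("CALL mydataset.refresh_sales(CURRENT_DATE())").result()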
Thanks,
Jayadeep
As mentioned above:
A macro in Teradata is just a way to execute multiple SQL statements
as a single request, which is in turn treated as a single transaction.
It also allows you to parameterize your query.
Having said that, you just need to handle the migration part from Teradata; here you can find the guide for doing this. To answer your question about connectivity: the connection is made through JDBC, whose driver jars are tdgssconfig.jar and terajdbc4.jar.
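For the extraction side, here is a rough sketch of using those jars from Python via jaydebeapi; the host, credentials, and jar paths are placeholders, and the library choice is just one option:

    import jaydebeapi

    # The Teradata JDBC driver class plus the two jars mentioned above.
    conn = jaydebeapi.connect(
        "com.teradata.jdbc.TeraDriver",
        "jdbc:teradata://td-host/DATABASE=mydb",
        ["td_user", "td_password"],
        jars=["terajdbc4.jar", "tdgssconfig.jar"],
    )
    cur = conn.cursor()
    try:
        cur.execute("SELECT TOP 10 * FROM mydb.some_table")
        print(cur.fetchall())
    finally:
        cur.close()
        conn.close()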
I'm designing a UWP app that uses an SQLite database to store its information. From previous research I have learnt that the SQLite functions SQLiteConnection.Update() and SQLiteConnection.Insert() are safe to use, as the inputs are sanitised before being entered into the database.
The next step I need to do is sync that data with an online database - in this case SQL Server - using a service layer as my go between. Given that the data was previously sanitised by the SQLite database insert, do I still need to parameterise the object values using the service layer before they are passed to my SQL Server database?
The simple assumption says yes because, despite being sanitised by the SQLite insert, they are technically still raw strings that could have an effect on the main database if not parameterised when sent there.
Should I just simply employ the idea of "If in doubt, parameterise" ?
I would say that you should always use SQL parameters; a minimal example follows the list below. There are a few reasons why you should do so:
Security. Parameters keep user-supplied values out of the SQL text, which protects against SQL injection.
Performance. If you use parameters, execution plans are more likely to be reused. For details see this article.
Reliability. It is always easier to make a mistake when you build SQL commands by concatenating strings.
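As a minimal sketch of what "always use parameters" looks like from a service layer, assuming Python with pyodbc against SQL Server (the connection string, table, and values are illustrative):

    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=mydb;"
        "UID=app_user;PWD=app_password"
    )
    cursor = conn.cursor()

    # The ? placeholders are sent to the server separately from the SQL text,
    # so the values are never spliced into the command string, whatever they contain.
    cursor.execute(
        "INSERT INTO Notes (Id, Title, Body) VALUES (?, ?, ?)",
        (42, "Synced from UWP", "Robert'); DROP TABLE Notes;--"),
    )
    conn.commit()
    conn.close()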
I'm new to Spark and would like to know whether it is possible to pass Spark a SQL script for processing.
My goal is to bring data from both MySQL (through JDBC) and Cassandra into Spark and run a SQL script file against it with no, or minimal, modifications. The reason I say minimal modifications is that I have a lot of SQL scripts (similar in structure to stored procedures) that I don't want to convert manually to RDDs.
The main purpose is to process the data (execute these SQL scripts) through Spark, taking advantage of its capabilities and speed.
This guy found a pretty general way to run SQL scripts; you just pass in the connection to your database:
https://github.com/syncany/syncany/blob/15dc5344696a800061e8b363f94986e821a0b362/syncany-lib/src/main/java/org/syncany/util/SqlRunner.java
One limitation is that each of the statements in your SQL script has to be delimited with a semi-colon. It basically just parses the script like a text document and executes each statement as it goes. You could probably modify it to take advantage of Spark's SQLContext, instead of using a Connection.
In terms of performance, it probably won't be as fast as a stored procedure because you're bottlenecked by the InputStream. But it is a workaround.
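Here is a rough adaptation of that idea to Spark in Python, assuming the source tables have already been registered as temporary views (for example from the JDBC and Cassandra reads); the file path, connection options, and view names are placeholders:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sql-script-runner").getOrCreate()

    # Example: register a MySQL table read over JDBC as a view the script can query.
    orders = (spark.read.format("jdbc")
              .option("url", "jdbc:mysql://db-host:3306/shop")
              .option("dbtable", "orders")
              .option("user", "reader")
              .option("password", "secret")
              .load())
    orders.createOrReplaceTempView("orders")

    # Same trick as SqlRunner: split the script on semicolons and run each statement.
    with open("/path/to/script.sql") as f:
        statements = [s.strip() for s in f.read().split(";") if s.strip()]

    result = None
    for stmt in statements:
        result = spark.sql(stmt)   # keep the last statement's result

    if result is not None:
        result.show()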
If all the SQL is doing is SELECTs, is there an advantage to using a view vs a SPROC?
From my point of view, it's purely organizational, but I am wondering if there is a good reason for using views when all a SPROC is doing is SELECTs and has no writes to the DB.
I'm on SQL Server 2008, but this probably applies to other SQL database products.
Views are meant to abstract out the details of the underlying table and provide a window to the data, the way you want it to appear.
Stored procedures achieve a specific task and optionally take parameters that would be used during the task execution.
If you would like to run a specific task by taking arguments from the users, then you can create a stored procedure.
If you just want to expose data in a given way and leave further filtering, if required, to the users, you can create a view.
Other than the security advantage of encapsulating specific data for specific roles, there's also the advantage of being able to create an index on the view (a sketch of this follows the list below).
Here are some specific performance advantages, from the MSDN link:
Aggregations can be precomputed and stored in the index to minimize expensive computations during query execution.
Tables can be prejoined and the resulting data set stored.
Combinations of joins or aggregations can be stored.
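As a hedged sketch of the indexed-view idea on SQL Server, driven from Python via pyodbc (the connection string, view, and table names are illustrative, and LineTotal is assumed NOT NULL because indexed views disallow SUM over nullable expressions):

    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=mydb;"
        "Trusted_Connection=yes", autocommit=True
    )
    cursor = conn.cursor()

    # The view must be schema-bound before it can be indexed.
    cursor.execute("""
    CREATE VIEW dbo.vSalesByProduct
    WITH SCHEMABINDING
    AS
    SELECT ProductId,
           COUNT_BIG(*)   AS OrderCount,
           SUM(LineTotal) AS TotalSales
    FROM dbo.OrderLines
    GROUP BY ProductId
    """)

    # The unique clustered index is what materializes (precomputes) the aggregation.
    cursor.execute("""
    CREATE UNIQUE CLUSTERED INDEX IX_vSalesByProduct
    ON dbo.vSalesByProduct (ProductId)
    """)
    conn.close()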
Given a small set of entities (say, 10 or fewer) to insert, delete, or update in an application, what is the best way to perform the necessary database operations? Should multiple queries be issued, one for each entity to be affected? Or should some sort of XML construct that can be parsed by the database engine be used, so that only one command needs to be issued?
I ask this because a common pattern at my current shop seems to be to format up an XML document containing all the changes, then send that string to the database to be processed by the database engine's XML functionality. However, using XML in this way seems rather cumbersome given the simple nature of the task to be performed.
It depends on how many you need to do, and how fast the operations need to run. If it's only a few, then doing them one at a time with whatever mechanism you have for doing single operations will work fine.
If you need to do thousands or more, and it needs to run quickly, you should re-use the connection and command, changing the arguments for the parameters to the query during each iteration. This will minimize resource usage. You don't want to re-create the connection and command for each operation.
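A minimal sketch of that reuse pattern, assuming Python with pyodbc (the table and data are illustrative):

    import pyodbc

    entities = [(1, "Alpha"), (2, "Beta"), (3, "Gamma")]

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=mydb;"
        "Trusted_Connection=yes"
    )
    cursor = conn.cursor()

    sql = "UPDATE Widgets SET Name = ? WHERE Id = ?"
    for widget_id, name in entities:
        # Same connection, same command text; only the parameter values change.
        cursor.execute(sql, (name, widget_id))

    conn.commit()
    conn.close()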
You didn't mention which database you are using, but in SQL Server 2008 you can use table-valued parameters to pass complex data like this to a stored procedure. Parse it there and perform your operations. For more info, see Scott Allen's article on OdeToCode.
Most databases support some form of bulk UPDATE or bulk DELETE operation.
From a "business entity" design standpoint, if you are doing different operations on each of a set of entities, you should have each entity handle its own persistence.
If there are common batch activities (like "delete all older than x date", for instance), I would write a static method on a collection class that executes the batch update or delete. I generally let entities handle their own inserts atomically.
The answer depends on the volume of data you're talking about. If you've got a fairly small set of records in memory that you need to synchronise back to disk then multiple queries is probably appropriate. If it's a larger set of data you need to look at other options.
I recently had to implement a mechanism where an external data feed gave me ~17,000 rows of data that I needed to synchronise with a local table. The solution I chose was to load the external data into a staging table and call a stored proc that did the synchronisation completely within the database.
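Here is a rough sketch of that staging-table approach, assuming Python with pyodbc against SQL Server; the staging table, stored proc name, and sample rows are made up:

    import pyodbc

    # In the real case the external feed supplied ~17,000 rows; a tiny sample here.
    rows = [
        (1, "Widget A", 9.99),
        (2, "Widget B", 14.50),
    ]

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=mydb;"
        "Trusted_Connection=yes"
    )
    cursor = conn.cursor()
    cursor.fast_executemany = True   # batch the inserts instead of one round trip per row

    cursor.execute("TRUNCATE TABLE dbo.ProductStaging")
    cursor.executemany(
        "INSERT INTO dbo.ProductStaging (Id, Name, Price) VALUES (?, ?, ?)", rows
    )

    # The proc reconciles (insert/update/delete) entirely inside the database.
    cursor.execute("EXEC dbo.SyncProductsFromStaging")
    conn.commit()
    conn.close()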