How to execute a macro in BigQuery

I have a requirement to move some existing frontend applications that run against Teradata as the backend over to Google BigQuery. One of the common patterns used in these frontend applications is to call a macro in Teradata, based on different inputs selected by users. Considering that BigQuery doesn't have a way to create a macro entity, how can I replace this and have the frontend call BigQuery to execute something similar? The connection to BigQuery is through ODBC/JDBC or Java services.

A macro in Teradata is just a way to execute multiple SQL statements as a single request, which is in turn treated as a single transaction. It also allows you to parameterize your query.
If your new DB backend supports it, you can convert the macros into stored procedures / functions. Otherwise, you can pull out the individual SQL statements from the macro and try to run them together as a single transaction.
These links may be helpful: Functions, DML.
Glancing at the documentation, it looks like writing a function may be your best bet: "There is no support for multi-statement transactions."

You can look at BigQuery scripting, which is in beta - https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting#bigquery-scripting - for migrating your macros from Teradata. With this release you can write stored procedures that hold all your business logic and then execute them using a CALL statement.
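For example, here is a minimal sketch of what a parameterized macro could become as a scripted procedure; the dataset, table, and column names below are hypothetical:

    CREATE OR REPLACE PROCEDURE mydataset.report_by_region(IN in_region STRING)
    BEGIN
      -- Multiple statements run as a single request, much like a macro body.
      DELETE FROM mydataset.region_report WHERE region = in_region;

      INSERT INTO mydataset.region_report (region, total)
      SELECT region, SUM(amount)
      FROM mydataset.sales
      WHERE region = in_region
      GROUP BY region;
    END;

    -- The frontend then issues a single statement over ODBC/JDBC:
    CALL mydataset.report_by_region('EMEA');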
Thanks,
Jayadeep

As mentioned above:
A macro in Teradata is just a way to execute multiple SQL statements
as a single request, which is in turn treated as a single transaction.
It also allows you to parameterize your query.
Having said that, you just need to do the migration part from Teradata; here you can find the guide for doing this. To answer your question about connectivity: the connection is made through JDBC, whose drivers are tdgssconfig.jar and terajdbc4.jar.

Related

Is parameterisation needed in this instance?

I'm designing a UWP app that uses an SQLite database to store its information. From previous research I have learnt that the SQLite functions SQLiteConnection.Update() and SQLiteConnection.Insert() are safe to use, as the inputs are sanitised before entering the database.
The next step is to sync that data with an online database - in this case SQL Server - using a service layer as my go-between. Given that the data was previously sanitised by the SQLite database insert, do I still need to parameterise the object values in the service layer before they are passed to my SQL Server database?
The simple assumption says yes because, despite being sanitised by the SQLite input, the values are technically still raw strings that could affect the main database if not parameterised when sent there.
Should I simply employ the idea of "if in doubt, parameterise"?
I would say that you should always use SQL parameters. There are a few reasons why you should do so:
Security. Parameters prevent SQL injection, because parameter values are sent as typed data and are never interpreted as part of the SQL text.
Performance. If you use parameters, the reuse of execution plans can increase. For details see this article.
Reliability. It is always easier to make a mistake when you build SQL commands by concatenating strings.
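To illustrate the idea on the SQL Server side - a minimal sketch, where the table, column, and variable names are hypothetical - a parameterized command keeps the value out of the SQL text entirely:

    DECLARE @userInput NVARCHAR(100) = N'O''Brien';  -- value arriving from the app

    -- Risky: the value becomes part of the SQL text itself.
    -- EXEC('SELECT * FROM Customers WHERE Name = ''' + @userInput + '''');

    -- Safe: sp_executesql receives the value as a typed parameter,
    -- so it can never be executed as SQL.
    EXEC sp_executesql
        N'SELECT * FROM Customers WHERE Name = @name',
        N'@name NVARCHAR(100)',
        @name = @userInput;

The same principle applies in your service layer: whichever client library talks to SQL Server should pass the values as parameters rather than concatenating them into the command string.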

SQL: Automatically copy records from one database to another database

I am trying to find an ideal way to automatically copy new records from one database to another; the databases have different structures. I achieved it by writing VBS scripts which copy the data from one to the other, triggered from another application which passes arguments to the scripts. But I ran into trouble whenever there were more than 100 triggers, i.e. 100 wscript processes trying to access the database at once; they couldn't complete the task.
I want to find a simpler solution inside SQL Server itself. I have read about triggers, stored procedures run from SQL Server Agent, replication, etc. The requirement is that I have to copy records to the other database either periodically or whenever a new record arrives.
Which method will suit me the best?
You can use CDC (Change Data Capture) for this. Create an SSIS package using CDC and run that package periodically through a SQL Server Agent job. CDC records all the changes to the source table, and the package applies those changes to the destination table when it runs. Please follow the link below.
http://sqlmag.com/sql-server-integration-services/combining-cdc-and-ssis-incremental-data-loads
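As a rough sketch of the setup side (the schema and table names here are hypothetical), enabling CDC is two calls:

    -- Enable CDC for the database, then for each source table.
    EXEC sys.sp_cdc_enable_db;

    EXEC sys.sp_cdc_enable_table
         @source_schema = N'dbo',
         @source_name   = N'Orders',   -- hypothetical source table
         @role_name     = NULL;        -- no gating role

After that, the SSIS CDC components (or queries against the generated change functions) can read the captured inserts, updates and deletes.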
The word "periodically" in your question suggests that you should go for jobs. You can schedule jobs in SQL Server using SQL Server Agent and assign them a schedule; the job will then run your script at the assigned frequency.
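For instance, here is a minimal sketch of scheduling such a job from T-SQL (the job name and the procedure it calls are hypothetical):

    -- Create a job with one T-SQL step and a daily 01:00 schedule.
    EXEC msdb.dbo.sp_add_job
         @job_name = N'CopyNewRecords';

    EXEC msdb.dbo.sp_add_jobstep
         @job_name  = N'CopyNewRecords',
         @step_name = N'Run copy',
         @subsystem = N'TSQL',
         @command   = N'EXEC dbo.CopyNewRecords;';  -- hypothetical procedure

    EXEC msdb.dbo.sp_add_jobschedule
         @job_name  = N'CopyNewRecords',
         @name      = N'Nightly',
         @freq_type = 4,               -- daily
         @freq_interval = 1,           -- every day
         @active_start_time = 10000;   -- 01:00:00 (HHMMSS)

    -- Register the job with the local server so the Agent picks it up.
    EXEC msdb.dbo.sp_add_jobserver
         @job_name = N'CopyNewRecords';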
PrabirS: Change Data Capture
This is a good option, because it uses the transaction log to create something similar to Command Query Responsibility Segregation (CQRS).
Alok Gupta: A SQL Job that runs in the SQL Agent
This too is a good option, given that you have something like a modified-date column with which you can filter the altered data. You can create a stored procedure and let it run regularly in the SQL Agent; a sketch of such a procedure follows after these options.
A third option could be triggers (the change will happen in the same transaction).
This option is useful for auditing and logging, but you should definitely avoid writing business logic in triggers, as triggers are more or less hidden and fire without being called directly (similar to CDC, actually). I actually created a trigger about half a year ago that captured the data and inserted it somewhere else in XML format, because the columns in the original table could change over time (multiple projects using the same database(s)).
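As a minimal sketch of the stored-procedure option mentioned above (all database, table, and column names are hypothetical):

    -- Copy rows changed since the last sync, based on a modified-date column.
    CREATE PROCEDURE dbo.CopyNewRecords
    AS
    BEGIN
        INSERT INTO TargetDb.dbo.Customers (Id, Name, ModifiedDate)
        SELECT s.Id, s.Name, s.ModifiedDate
        FROM   SourceDb.dbo.Customers AS s
        WHERE  s.ModifiedDate > (SELECT ISNULL(MAX(ModifiedDate), '19000101')
                                 FROM   TargetDb.dbo.Customers);
    END;

Since your databases have different structures, the SELECT list is where you would map source columns onto the target schema.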
-Edit-
By the way, your question more or less suggests a lack of a clear design pattern, and that the technique used is not the main problem. You could try to read up on how an ETL layer is built, or try to implement a separation of concerns. Note: it is hard to tell whether this is the case, but given how you formulated your question, an unclear design is something that pops up in my mind as a possible problem.

Is there a way to reference a SQL statement to the C# EF code which generated the SQL?

When I troubleshoot a large .NET app which uses only stored procedures, I capture the SQL - which includes the SP name - from SQL Server Profiler, and then it's easy to do a global search for the SP in the source files and find the exact line which produced the SQL.
When using Entity Framework, this is not possible due to the dynamic creation of SQL statements. However, there are times when I capture problematic SQL statements from production and want to know where in the code they were generated.
I know EF can generate logs and tracing on demand. This would probably be taxing for a busy server and produce too many logs. I have read about using MiniProfiler, but I am not sure it fits my needs, as I don't have access to the production server. I do, however, have access to attach SQL Server Profiler to the database server.
My idea is to find a way to have EF attach/inject a unique code into the generated SQL that doesn't affect the outcome of the SQL. I can then use it to cross-reference back to the line of code which injected it. The unique code is static, meaning a distinct static code is used for every EF LINQ statement - perhaps sent as a dummy statement or a comment along with the SQL.
I know this will add some extra traffic but in my case, it will add extra flexibility and cut a lot of troubleshooting time.
Any ideas of how to do this or any alternatives?
One very simple approach would be to execute something via ExecuteStoreCommand(): Refresh data from stored procedure. I'm not sure if you can "execute" just a comment, but at the very least you should be able to do something like:
ExecuteStoreCommand("DECLARE #MyTag VARCHAR(100) = 'some_unique_id';");
This is very simple, but you would have to find the association in two steps:
Get the SessionID (i.e. SPID) from the poorly performing query in SQL Server Profiler
Search the Profiler entries for the prior SQL statement for that same SPID
Another option that might be a little more complicated but would remove that additional step when it comes to making that association is to "intercept" the commands before they get executed and inject a comment with your unique id. Please see the following S.O. Answer for details. You shouldn't need the full extent of what they did, but even if you do, it seems like all of the code (or all the relevant stuff) is there:
Adding a query hint when calling Table-Valued Function
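As an illustration only - the tag format is hypothetical, and the query shown is merely the shape of what EF 6 tends to emit - the intercepted command would then appear in Profiler with the injected comment attached:

    /* EFTag: OrderSearchPage.LoadOrders */
    SELECT [Extent1].[Id], [Extent1].[CustomerName]
    FROM   [dbo].[Orders] AS [Extent1]
    WHERE  [Extent1].[Status] = @p__linq__0

Searching the code base for the tag string then takes you straight to the LINQ statement that produced it.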
By the way, this situation is a point in favor of using Stored Procedures instead of an ORM. And, what do you expect to be able to do in terms of performance tuning once you do find the offending app code? (another point in favor of using Stored Procedures instead of an ORM ;-).

Spark SQL - SQL scripts processing

I'm new to Spark and would like to know if there is any possibility of passing Spark an SQL script for processing.
My goal is to bring data from both MySQL (through JDBC) and Cassandra into Spark, and to pass an SQL script file without having to modify it, or with only minimal modifications applied to it. The reason I say minimal modifications is that I have a lot of SQL scripts (similar in structure to stored procedures) which I don't want to convert manually to RDDs.
Main purpose is to process the data (execute these SQL scripts) through Spark, thus taking advantage of its capabilities and speed.
This guy found a pretty general way to run SQL scripts; just pass in the connection to your database:
https://github.com/syncany/syncany/blob/15dc5344696a800061e8b363f94986e821a0b362/syncany-lib/src/main/java/org/syncany/util/SqlRunner.java
One limitation is that each of the statements in your SQL script has to be delimited with a semicolon. It basically parses the script like a text document and executes each statement as it goes. You could probably modify it to use Spark's SQLContext instead of a JDBC Connection.
In terms of performance, it probably won't be as fast as a stored procedure, because you're bottlenecked by the InputStream. But it is a workaround.
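As a rough sketch of the SQLContext route - the connection details, keyspace, and table names are hypothetical, and the Cassandra mapping assumes the spark-cassandra-connector is on the classpath - Spark can register external sources in plain SQL (this is Spark 2.x syntax; older versions used CREATE TEMPORARY TABLE ... USING), so a semicolon-delimited script like the following could be fed to sqlContext.sql() one statement at a time:

    CREATE TEMPORARY VIEW mysql_orders
    USING org.apache.spark.sql.jdbc
    OPTIONS (
      url "jdbc:mysql://dbhost:3306/sales",
      dbtable "orders",
      user "spark",
      password "secret"
    );

    CREATE TEMPORARY VIEW cassandra_customers
    USING org.apache.spark.sql.cassandra
    OPTIONS (keyspace "crm", table "customers");

    SELECT c.name, SUM(o.amount) AS total
    FROM mysql_orders o
    JOIN cassandra_customers c ON o.customer_id = c.id
    GROUP BY c.name;

Note that the view-registration statements are Spark-specific, so existing scripts would still need that much modification at the top.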

Oracle 8i trace of sql statements

I am investigating a legacy app that uses an Oracle 8i database in a test environment, specifically trying to find out what tables are accessed for read, insert, update or delete when the user performs an app function.
What is the best/easiest way to do this? Can I simply get a list of all SQL statements sent to the database? Can I see when stored procedures are called?
Having little experience with Oracle but getting help from a DBA, I'm thinking I should either use a trace or look at the redo log with LogMiner, but how?
Thanks!
What you could do is harvest the SQL from v$sql. If the SQL is properly written - using bind variables - you should be able to capture most of the statements into a table this way. I currently have no running v8 at hand, but this should be possible.
In order to get most of them, you probably need to repeat the harvesting during the various workloads that run on the database.
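A minimal sketch of such a harvesting query (the snapshot table and application schema name are hypothetical, and the exact v$sql columns available can differ on 8i):

    -- Snapshot the shared SQL area; rerun while each app function is exercised.
    CREATE TABLE sql_harvest AS
    SELECT sql_text, executions, first_load_time
    FROM   v$sql
    WHERE  parsing_user_id = (SELECT user_id
                              FROM   all_users
                              WHERE  username = 'APP_USER');

Repeating this during the various workloads, as suggested above, and diffing the snapshots shows which statements each application function issues.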