I am looking to do some row counts for a handful of tables after our deployment in our lower-level environments. I have a project that deploys a database to SQL Server and loads some data into it. I want to validate that the tables are now populated with data. I have read the MSDN documentation on creating unit tests, but I have a few outstanding questions.
Can I only create unit tests against stored procedures and functions, or can I simply get a row count from a table or view and test against that?
Can I run multiple "tests" at once? For example, if I want to get the row count for 6 tables, do I need to create a separate test for each table, or can I batch them all together?
Sorry if I missed a large part of the walkthrough, but the documentation was not very helpful on these questions.
To test a procedure or function, you simply call that procedure or function and verify the result. There is no difference between a SELECT COUNT(*) FROM xxx statement and an EXEC dbo.Procedure statement.
Yes. In the test conditions you can specify which result set to verify. You can also UNION all of the row counts in a single query and use a checksum test condition.
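For example, something along these lines (the table names are placeholders) returns all of the counts in one result set that a single test condition can verify:

SELECT 'Table1' AS table_name, COUNT(*) AS row_cnt FROM dbo.Table1
UNION ALL
SELECT 'Table2', COUNT(*) FROM dbo.Table2
UNION ALL
SELECT 'Table3', COUNT(*) FROM dbo.Table3;
-- add one UNION ALL branch per table; a Data Checksum test condition can then
-- verify the whole result set, or a Scalar Value condition can check a single count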
How do I build a general query based on a list of tables and run/export the results?
Here is my conceptual structure
[diagram: conceptual structure]
Basically, the FROM clause of the query would be filled in from the table rows (each row supplies a schema.table).
After that, each schema.table that returns a true result must be written to a text file.
Could someone help me?
As Pentaho doesn't support loops, it's a bit complicated. Basically you read a list of tables, copy the rows to the result, and then have a foreach job that runs for every row in the result.
I use this setup for all jobs like this: How to process a Kettle transformation once per filename
In the linked example it is used to process files, but it can easily be adapted to your use case.
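For example, the Table Input step inside the per-row job could run something like the following, with ${SCHEMA} and ${TABLE} set as variables from the current result row ("Replace variables in script" enabled); the column list and the COUNT-based check are only illustrative, since the actual per-table condition depends on your query:

SELECT '${SCHEMA}.${TABLE}' AS table_name,
       CASE WHEN COUNT(*) > 0 THEN 'true' ELSE 'false' END AS check_result
FROM ${SCHEMA}.${TABLE}
-- rows whose check_result is 'true' can then be appended to the text file output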
I have a list of tables and a list of packages. I need to come up with the two lists below:
The packages that use the given set of tables
The tables that are referenced by each of the given packages
The packages use dynamic SQL, so I may not be able to depend only on the DBA_DEPENDENCIES view.
The other way I can think of is using a LIKE clause against dba_source, but I would have to write an OR condition for each of the tables I need (or a function or procedure that loops through each table).
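A rough sketch of that dba_source approach, which avoids one OR per table by joining a list of table names (my_table_list is an illustrative table holding the names of interest):

SELECT DISTINCT s.owner, s.name, t.table_name
FROM dba_source s
JOIN my_table_list t
  ON UPPER(s.text) LIKE '%' || UPPER(t.table_name) || '%'
WHERE s.type IN ('PACKAGE', 'PACKAGE BODY');
-- crude: matches any occurrence of the name string, including comments and partial matches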
Is there any better way of doing this?
Any help is greatly appreciated.
Edit: rephrasing the question -
I have a package which selects/inserts/updates several tables. It uses dynamic SQL. One example is provided below.
I want to identify all the tables referenced in this package. What is the best way to achieve this?
In the below example I want to capture both table1 and table2.
if flag = 'Y' then
    final_sql := 'insert into table1 (...)';
else
    final_sql := 'insert into table2 (...)';
end if;
execute immediate final_sql;
For systems using a lot of dynamic SQL I suggest two approaches.
The first is to apply strict coding standards so you know what to look for and can then reliably parse the table names out of the rest of the code. I mean, always have the table name string written to a known variable name, and search for that variable.
This is not always easy to do, especially if you have mountains of code that do not follow the standard. It only takes a couple of people not adhering to the standard and it all falls down. However, it can be made to work, though it is probably never going to be 100% reliable.
The second approach is to write test scripts that exercise the whole code base and its logic paths. Write them in such a way that they log the procedure name. Enable SQL Trace and capture the trace files from the tests. With clever scripting you should be able to tie the trace to the procedure. This will give you the "raw" SQL, which you can then grep for matches against your list of tables. You might be able to get the same information by harvesting V$SQL tied to V$SESSION.
This is an old-school way of doing this, but one that I have used and that works.
On one of the largest systems I worked on I wrote a CRUD parser which tokenised the code and produced a CRUD matrix by source file and table access. For dynamic SQL we processed SQL Trace/tkprof files.
If the code has a good amount of debug logging that dumps out the table names, you could again run the test scripts and harvest the debug logs.
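For the V$SQL/V$SESSION harvesting mentioned earlier, a rough sketch to poll while the test scripts are running could look like this (the username filter is an assumption about how the test sessions connect):

SELECT s.module, s.action, q.sql_text
FROM v$session s
JOIN v$sql q ON q.sql_id = s.sql_id
WHERE s.username = 'APP_USER';
-- only catches statements currently executing, so poll and spool repeatedly,
-- then grep the captured sql_text for the table names on your list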
I have a web app that has a large number of tables and variables that the user can select (or not select) at run time. Something like this:
In the DB:
Table A
Table B
Table C
At run time the user can select any number of variables to return. Something like this:
Result Display = A.field1, A.Field3, B.field19
There are 100+ total fields spread across 15+ tables that can be returned in a single result set.
We have a query that currently works by creating a temp table to select and aggregate the desired fields and then selecting the desired variables from that table. However, this query takes quite some time to execute (30 seconds). I would like to find a more efficient way to return the desired results while still letting the user configure which variables to see. I know this can be done, as I have seen it done in other areas. Any suggestions?
Instead of using a temporary table, use a view and recompile the view each time you run the query (or just use a subquery or CTE instead of a view). SQL Server might be able to optimize the view based on the fields being selected.
The best reason to use a temporary table would be when intra-day updates are not needed. Then you could create the "temporary" table at night and just select from that table.
The query optimization approach (whether through a view, CTE, or subquery) may not be sufficient. This is a rather hard problem to solve in general. Typically, though, there are probably themes of variables that come from particular subqueries. If so, you can write a stored procedure to generate dynamic SQL that just has the requisite joins for the variables chosen for a given run. Then use that SQL for fetching from the database.
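A minimal sketch of that idea in T-SQL, assuming a simple flag-per-table parameterization (all procedure, table, and column names here are made up for illustration):

CREATE PROCEDURE dbo.GetSelectedFields
    @IncludeB bit, @IncludeC bit
AS
BEGIN
    DECLARE @sql nvarchar(max) = N'SELECT A.field1, A.field3';
    IF @IncludeB = 1 SET @sql += N', B.field19';
    IF @IncludeC = 1 SET @sql += N', C.field2';
    SET @sql += N' FROM dbo.TableA AS A';
    -- only join the tables whose fields were actually requested
    IF @IncludeB = 1 SET @sql += N' JOIN dbo.TableB AS B ON B.a_id = A.id';
    IF @IncludeC = 1 SET @sql += N' JOIN dbo.TableC AS C ON C.a_id = A.id';
    EXEC sys.sp_executesql @sql;
END;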
And finally, perhaps there are other ways to optimize the query regardless of the fields being chosen. If you think that might be the case, then simplify the query for human consumption and ask another question.
I'm using SSRS 2008 and wanted some advice on the best practices for handling multiple result sets.
I have a stored procedure which returns 4 result sets of related data, however each result set has a different number of records returned. Of course in SSRS only the first result set is processed, so I'm left with 2 options, as far as I can tell:
Create 4 different stored procedures which return the 4 different data sets
Insert all 4 result sets into a temporary table and return the results from that table.
The problem with the first option is that the 4 results are all derived from the same basic data (into a temp table) and then joined/grouped with other tables/data, so splitting them out into separate stored procedures seems like it would put more stress on the DB than a single sproc.
The problem with the second method is that I would have to include the same dataset into SSRS 4 times, pulling different pieces of the result set each time and filtering out the nulls on the correct columns.
For instance, let's say I have 4 result sets that return 4 columns each and 4 records each. The first 4 columns and the first 4 records are related to the first result set (the rest of the columns are null). The second result set only populates columns 5-8 and records 5-8, etc.
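A hedged sketch of that combined shape for the first two result sets (the column and temp table names are invented for illustration):

SELECT col1, col2, col3, col4,
       NULL AS col5, NULL AS col6, NULL AS col7, NULL AS col8   -- result set 1 rows
FROM #ResultSet1
UNION ALL
SELECT NULL, NULL, NULL, NULL,
       col5, col6, col7, col8                                   -- result set 2 rows
FROM #ResultSet2;
-- result sets 3 and 4 extend the pattern; each SSRS dataset then filters out the rows
-- whose relevant columns are NULL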
The question is, which way is more efficient? Multiple stored procedures or 1 stored procedure used multiple times in SSRS? Thanks!
Sounds like the data retrieval is the more expensive operation, so you probably would want to do that only once. (Although I'm not sure you would "stress" SQL Server, if that's the data source: the basic data and joined/grouped data would likely be in memory at that point, so additional physical reads probably would not happen if different stored procedures re-read the data.)
Consider an Entity Framework LINQ-to-SQL query with several entities INCLUDEd: the shape of a result set is very wide, with a column for each field of each entity. But this avoids a trip to the database for each included entity.
Unless your report has hundreds of pages or is otherwise rendering- or processing-intensive, I think option #2 probably is more efficient.
Are you able to look at the execution statistics in the SSRS execution log? I always use the ExecutionLog3 view and compare TimeDataRetrieval with TimeProcessing and TimeRendering.
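Something along these lines against the report server catalog (the database name may differ in your installation):

SELECT TOP 20 ItemPath, TimeStart,
       TimeDataRetrieval, TimeProcessing, TimeRendering
FROM ReportServer.dbo.ExecutionLog3
ORDER BY TimeStart DESC;
-- times are in milliseconds; a TimeDataRetrieval that dwarfs the other two means the
-- queries, not SSRS processing or rendering, are where the time goes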
Edit
I know including just links is frowned upon here, but I don't want to include possibly copyrighted stuff that I didn't create.
Please see the section Examining Queries Sent to the Database here http://www.asp.net/web-forms/tutorials/continuing-with-ef/maximizing-performance-with-the-entity-framework-in-an-asp-net-web-application for an example of a LINQ-to-SQL query along the lines of ...Departments.Include("Person").Include("Courses"). Note especially the picture of the Text Visualizer, which shows the query constructed: it is similar to your option #2 in that multiple entities are included in the same result set.
I am new to Spring batch.
I am currently developing a test project to learn Spring batch and I have run into an issue.
My requirement is that I need to query my Oracle database to find the IDs from one table and then use those IDs to get the details from another table. Currently I have roughly 300 IDs.
I can get the IDs, but I am not sure how to pass them all at once to the IN clause of the SQL query to get the other fields, which are stored in a different table.
I am also open to other suggestions to solve this issue.
Thanks,
Nik
I can get the IDs, but I am not sure how to pass them all at once to the IN clause of the SQL query to get the other fields, which are stored in a different table
You can create a:
first step (tasklet) that gets those IDs and puts them in the execution context
second step (chunk-oriented) that reads those IDs from the execution context and uses them in the IN clause of the reader's query
Passing data between steps is explained in detail in the Passing Data to Future Steps section of the reference documentation.
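The reader's query would then look something like this, with the ID list expanded from what was saved in the execution context (all names here are placeholders, and how the placeholder is bound depends on the reader you use):

SELECT d.id, d.col1, d.col2
FROM detail_table d
WHERE d.id IN (:ids)
-- ~300 IDs stays well under Oracle's 1000-expression limit for an IN list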
My requirement is that I need to query my Oracle database to find the IDs from one table and then use those IDs to get the details from another table
I am also open to other suggestions to solve this issue.
I suggest using a common pattern called the Driving Query Pattern, because I think it is a good fit for your requirement. The idea is that the reader gets only the IDs and a processor fetches the details of each ID from the other tables.
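In SQL terms the two queries would be roughly the following (table and column names are placeholders):

-- reader: the driving query returns only the keys
SELECT id FROM id_table;

-- processor: one lookup per ID to enrich the item with its details
SELECT col1, col2, col3 FROM detail_table WHERE id = :id;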
Hope this helps.