Is there a "Code Coverage" equivalent for SQL databases?

I have a database with many tables that get used, and many tables that are no longer used. While I could sort through each table manually to see if they are still in use, that would be a cumbersome task. Is there any software/hidden feature that can be used on a SQL Server/Oracle database that would return information like "Tables x,y,z have not been used in the past month" "Tables a,b,c have been used 17 times today"? Or possibly a way to sort tables by "Date Last Modified/Selected From"?
Or is there a better way to go about doing this? Thanks
edit: I found a "modify_date" column when executing "SELECT * FROM sys.tables ORDER BY modify_date desc", but this seems to only keep track of modifications to the table's structure, not its contents.

Replace spt_values with the name of the table you are interested in. The query returns the last time the table was used and what used it (a stored procedure or an ad hoc statement).
From here: Finding Out How Many Times A Table Is Being Used In Ad Hoc Or Procedure Calls In SQL Server 2005 And 2008
SELECT * FROM(SELECT COALESCE(OBJECT_NAME(s2.objectid),'Ad-Hoc') AS ProcName,execution_count,
    (SELECT TOP 1 SUBSTRING(s2.TEXT,statement_start_offset / 2+1 ,
        ( (CASE WHEN statement_end_offset = -1
            THEN (LEN(CONVERT(NVARCHAR(MAX),s2.TEXT)) * 2)
            ELSE statement_end_offset END) - statement_start_offset) / 2+1)) AS sql_statement,
    last_execution_time
FROM sys.dm_exec_query_stats AS s1
CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS s2 ) x
WHERE sql_statement like '%spt_values%' -- replace here
AND sql_statement NOT like 'SELECT * FROM(SELECT coalesce(object_name(s2.objectid)%'
ORDER BY execution_count DESC
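Keep in mind that these statistics come from the plan cache, so they are cleared when you restart the box, and an entry also disappears when its plan is evicted from the cache.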

In Oracle you can use ASH (Active Session History) to find information about the SQL that was run. You can also perform code coverage tests with the hierarchical profiler, which shows which parts of your stored procedures are used and which are not.
If you are wondering about updates to table data, you can also use DBA_TAB_MODIFICATIONS. It shows how many inserts, updates, and deletes have been done on a table or table partition. As soon as new object statistics are generated, the row for that table is removed from DBA_TAB_MODIFICATIONS. Even then, you can still look at the table statistics history. None of this shows anything about tables that are only queried. If you really need to know about those, you have to use ASH.
Note: for both ASH and statistics history access, you need the Diagnostics or Tuning Pack license (normally you would want this anyway).

If you use triggers, you can detect updates, inserts, or deletes on a table.
Detecting read access is probably more difficult.
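A minimal sketch of the trigger idea, using Python's built-in sqlite3 module as a stand-in because it runs anywhere. The orders table, the table_activity audit table, and the trigger names are all made up for illustration, and trigger syntax on SQL Server or Oracle differs from SQLite's. Note that a plain SELECT fires no trigger, which is why reads are harder to track this way.

```python
import sqlite3


def make_audited_db():
    """Create an in-memory database whose 'orders' table logs every write."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL);

        -- Hypothetical audit table: one row per (table, operation).
        CREATE TABLE table_activity (
            table_name TEXT,
            operation  TEXT,
            hits       INTEGER,
            last_used  TEXT,
            PRIMARY KEY (table_name, operation)
        );
        INSERT INTO table_activity VALUES ('orders', 'INSERT', 0, NULL);
        INSERT INTO table_activity VALUES ('orders', 'UPDATE', 0, NULL);
        INSERT INTO table_activity VALUES ('orders', 'DELETE', 0, NULL);

        -- SQLite triggers fire once per affected row.
        CREATE TRIGGER orders_ins AFTER INSERT ON orders BEGIN
            UPDATE table_activity
               SET hits = hits + 1, last_used = datetime('now')
             WHERE table_name = 'orders' AND operation = 'INSERT';
        END;

        CREATE TRIGGER orders_upd AFTER UPDATE ON orders BEGIN
            UPDATE table_activity
               SET hits = hits + 1, last_used = datetime('now')
             WHERE table_name = 'orders' AND operation = 'UPDATE';
        END;

        CREATE TRIGGER orders_del AFTER DELETE ON orders BEGIN
            UPDATE table_activity
               SET hits = hits + 1, last_used = datetime('now')
             WHERE table_name = 'orders' AND operation = 'DELETE';
        END;
    """)
    return conn


def activity(conn):
    """Return {operation: hits} for the audited table."""
    return dict(conn.execute("SELECT operation, hits FROM table_activity"))
```

Querying table_activity afterwards gives a per-table "last written" timestamp and a hit count, at the cost of extra work on every write.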

I use a combination of static analysis in the metadata to determine tables/columns which have no dependencies and runtime traces in SQL Server to see what activity is happening.

Some more queries that might be useful for you.
select * from sys.dm_db_index_usage_stats
select * from sys.dm_db_index_operational_stats(db_id(),NULL,NULL,NULL)
select * from sys.sql_expression_dependencies /*SQL Server 2008 and later*/
The difference between what the first two DMVs report is explained well in this blog post.

Ed Elliott's open source tool, SQL Cover, is a good bet and has built-in support for the popular unit testing tool, tSQLt.

Related

How to fix MS SQL linked server slowness issue for MariaDB

I'm using a SQL Server linked server to fetch data from MariaDB.
But I'm running into slowness when I query MariaDB through the linked server.
I used the scenarios below to fetch results (with the time each query took).
Please suggest any solutions you may have.
Total number of rows in the patient table: 62520
SELECT count(1) FROM [MariaDB]...[webimslt.Patient] -- 2.6 seconds
SELECT * FROM OPENQUERY([MariaDB], 'select count(1) from webimslt.patient') -- 47 ms
SELECT * FROM OPENQUERY([MariaDB], 'select * from webimslt.patient') -- 20 seconds

This isn’t really a fair comparison...
SELECT COUNT(1) is only returning a single number and will probably be using an index to count rows.
SELECT * is returning ALL data from the table.
Returning data is an expensive (slow) process, so it will obviously take time to return your data. Then there is the question of data transfer: are the servers connected using a high-speed connection? That is also a factor. Querying over a linked server will never be as fast as querying your database directly.
How can you improve the speed? I would start by returning only the data you need: specify the columns and add a WHERE clause. After that, you can probably use indexes in MariaDB to try to speed things up.

SQL multiple tables - very slow

I am trying to speed up a SQL Server report that runs against an IBM OS/400 system for my sales department.
A colleague of mine (who has left the company) built this report and used a ton of subselects.
The report usually takes about 30 minutes to process and often just fails to display. I already tried cutting out some tables/rows in the hope of speeding up the process, without success (everything is needed by the sales department).
It works over all relevant data (orders, customers, articles, our orders with the manufacturer, the manufacturer, and so on). Any ideas?
I can't index it because of the OS/400 system; I guess that would be a new programming task for our contractor, which means costs.
Can I use some clever joins, or somehow reduce the number of subselects?

Are you using four-part names in your query? That's probably your problem...
From SQL Server...
-- Pulls all rows from the table(s) back to MS SQL Server and applies the WHERE locally on the MS SQL Server
select * from LINKEDSVR.MYIBMI.MYLIB.MYTBL where locnbr = '00335';
-- Sends the statement to the IBM i server for processing; only the results are returned
select * from openquery(LINKEDSVR, 'select * from MYTBL where locnbr = ''00335''');
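Since OPENQUERY takes the remote statement as a string literal, every single quote inside it has to be doubled, as in the example above. A small sketch of a client-side helper that does this in Python; build_openquery is a made-up name, and the identifier check is just a conservative assumption about what a linked server name looks like.

```python
import re


def build_openquery(linked_server: str, remote_sql: str) -> str:
    """Wrap remote_sql in an OPENQUERY call, doubling embedded single quotes."""
    # The linked server name cannot be passed as a parameter, so only accept
    # a plain identifier here (a deliberately strict, hypothetical rule).
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", linked_server):
        raise ValueError(f"unexpected linked server name: {linked_server!r}")
    escaped = remote_sql.replace("'", "''")
    return f"select * from openquery({linked_server}, '{escaped}')"


print(build_openquery("LINKEDSVR", "select * from MYTBL where locnbr = '00335'"))
```

This only handles the quoting. The remote statement itself should still come from trusted code, not from user input.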

Try running the subselects first, sending the output of each to its own table.
Update statistics on the tables. Then run the rest of the SQL, replacing what were originally subselects with the tables created in the first step.
Treat multiple layers of nesting the same way: each layer is its own insert into another table.
I've found that query optimizers have a hard time with complex SQL. Breaking-out the subqueries into separate steps often resolves this.
Between runs my preference is to leave the data intact as a reference in case debugging is needed, then truncate the tables as the first step of a run.
Responding to eraser's comments
Assuming your original query takes this general form:
select [columns] from
(-- subquery
select [columns] from TableA
) as Subquery,
TableB
where mainquery_where_clause
Re-write:
-- Create a table to hold the results of your subquery:
create table SubQTable ([columns]) ;
-- Now run the subquery:
insert into SubQTable select [columns] from TableA ;
-- Update the data distribution statistics on the new table:
update statistics SubQTable ;
-- Now run the re-written main query:
select [columns]
from SubQTable, TableB
where SubQTable.joincol = TableB.joincol
and mainquery_where_clause ;
I noticed some syntax issues with the SQL you posted. Looks like something got left out. But the principle of my answer remains the same. Please note that applying my suggestion may not help, as there are potentially many variables to your scenario; you mentioned subqueries, so I chose to address that.
Halfer's suggestion is a great one: edit your original question, adding the SQL code, and putting it in the "{}" supplied by the text editing tool.
I strongly suggest that you obtain the SQL execution plan and post the results.
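To make the staging idea concrete, here is a small self-contained sketch using Python's built-in sqlite3 module, not SQL Server or OS/400. The orders and customers tables and the stage_totals name are invented for the example. It only shows that the staged version returns the same rows as the nested one; whether it is faster depends entirely on your data and optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    INSERT INTO orders VALUES (1, 10.0), (1, 5.0), (2, 7.5);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
""")

# Original shape: the aggregation lives in a subselect inside the main query.
nested = conn.execute("""
    SELECT c.name, t.total
    FROM customers c
    JOIN (SELECT customer_id, SUM(amount) AS total
          FROM orders
          GROUP BY customer_id) t
      ON t.customer_id = c.customer_id
    ORDER BY c.name
""").fetchall()

# Step 1: clear the previous run, then materialize the subselect on its own.
# Step 2: refresh statistics on the staging table (ANALYZE in SQLite).
conn.executescript("""
    DROP TABLE IF EXISTS stage_totals;
    CREATE TABLE stage_totals AS
        SELECT customer_id, SUM(amount) AS total
        FROM orders
        GROUP BY customer_id;
    ANALYZE stage_totals;
""")

# Step 3: the main query now joins against the staging table.
staged = conn.execute("""
    SELECT c.name, s.total
    FROM customers c
    JOIN stage_totals s ON s.customer_id = c.customer_id
    ORDER BY c.name
""").fetchall()

print(nested == staged)
```

The staging table is left in place after the run, so it can be inspected when debugging, and it is dropped at the start of the next run.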

Using a query that is stored in a table

My database has two tables: t_computers and t_queries.
This query shows me which computers are laptops
select *
from t_computers
where type = 'Laptop'
In the table t_queries I have stored dynamic SQL queries.
SELECT QuerySQL
from t_queries
where QueryName = 'Clients that have not been started in 30 days'
The first result is the SQL query that would give me this information.
Now for the complicated part, I want to only select computers that have the type 'Laptop' and are returned if I run the query that is stored in the table.
So something like this
select *
from t_computers
where type = 'Laptop' and
(computer is returned for (SELECT QuerySQL
from t_queries
where QueryName = 'Clients that have not been started in 30 days'))
Is this even possible? I am using SQL Server 2008 R2
I have used a very simplified example.
Some background information on why I want to use the query saved in the table: With our Client Management System (similar to SCCM) Administrators can easily create "views" of Clients. For example Filtering out all Computers that have an IP starting with 10.*. As soon as they save the view, a SQL query is created and saved in the table t_queries. This one query that I want to compare against changes quite often.

Yes, it is possible, but as the commenters said, I strongly advise against executing arbitrary code coming from your users, no matter how much you trust them. You would have very few ways to enforce security rules and may open yourself up to devastating security breaches.
The way it is properly done in other systems is to use a specific query language (custom or not) that you interpret and "translate" to SQL if needed. That allows you to limit the possible operations to what is strictly necessary.
After that disclaimer, here is an answer to your question (untested, I don't have SQL Server on this laptop so I may have messed up a bit with the quotes) :
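-- Assumes the stored query returns a single column of computer ids and that
-- t_computers has a matching id column (the column name is hypothetical).
declare @inner nvarchar(max);
select top (1) @inner = QuerySQL
from t_queries
where QueryName = 'Clients that have not been started in 30 days';
exec('select *
from t_computers
where type = ''Laptop'' and
id in (' + @inner + ')');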
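For the safer approach mentioned above (a restricted filter language that gets translated to SQL), here is a rough sketch in Python. The allowed column names, the operator set, and the translate_filters function are all assumptions made up for illustration. The point is only that user input can pick from a whitelist and supply values, never raw SQL.

```python
# Hypothetical whitelist of columns a saved "view" may filter on.
ALLOWED_COLUMNS = {"type", "ip_address", "last_started"}

# Operators exposed to users, mapped to the SQL they translate to.
ALLOWED_OPS = {"=": "=", "!=": "<>", "<": "<", ">": ">", "starts_with": "LIKE"}


def translate_filters(filters):
    """Turn [(column, op, value), ...] into (where_clause, params).

    Column and operator must be on the whitelists; values are only ever
    passed as bound parameters ('?' placeholders), never spliced into SQL.
    """
    clauses, params = [], []
    for column, op, value in filters:
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"column not allowed: {column!r}")
        if op not in ALLOWED_OPS:
            raise ValueError(f"operator not allowed: {op!r}")
        if op == "starts_with":
            # Wildcards already present in the value are not escaped in this sketch.
            value = str(value) + "%"
        clauses.append(f"{column} {ALLOWED_OPS[op]} ?")
        params.append(value)
    return " AND ".join(clauses), params


where, params = translate_filters([
    ("type", "=", "Laptop"),
    ("ip_address", "starts_with", "10."),
])
print(where, params)
```

The saved view would then store the filter list rather than a SQL string, and the application would build the final statement from it each time.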

How does Tableau run queries on Redshift? (And/or why can't Redshift display Tableau queries?)

I'm kicking tires on BI tools, including, of course, Tableau. Part of my evaluation includes correlating the SQL generated by the BI tool with my actions in the tool.
Tableau has me mystified. My database has 2 billion things; however, no matter what I do in Tableau, the query Redshift reports as having been run is "Fetch 10000 in SQL_CURxyz", i.e. a cursor operation. In the screenshot below, you can see the cursor ids change, indicating new queries are being run -- but you don't see the original queries.
Is this a Redshift or Tableau quirk? Any idea how to see what's actually running under the hood? And why is Tableau always operating on 10000 records at a time?

I just ran into the same problem and wrote this simple query to get all queries for currently active cursors:
SELECT
usr.usename AS username
, min(cur.starttime) AS start_time
, DATEDIFF(second, min(cur.starttime), getdate()) AS run_time
, min(cur.row_count) AS row_count
, min(cur.fetched_rows) AS fetched_rows
, listagg(util_text.text)
WITHIN GROUP (ORDER BY sequence) AS query
FROM STV_ACTIVE_CURSORS cur
JOIN stl_utilitytext util_text
ON cur.pid = util_text.pid AND cur.xid = util_text.xid
JOIN pg_user usr
ON usr.usesysid = cur.userid
GROUP BY usr.usename, util_text.xid;

Ah, this has already been asked on the AWS forums.
https://forums.aws.amazon.com/thread.jspa?threadID=152473
Redshift's console apparently doesn't display the query behind cursors. To get that, you can query STV_ACTIVE_CURSORS: http://docs.aws.amazon.com/redshift/latest/dg/r_STV_ACTIVE_CURSORS.html

Also, you can alter your .TWB file (which is really just an xml file) and add the following parameters to the odbc-connect-string-extras property.
UseDeclareFetch=0;
FETCH=0;
You would end up with something like:
<connection class='redshift' dbname='yourdb' odbc-connect-string-extras='UseDeclareFetch=0;FETCH=0' port='0000' schema='schm' server='any.redshift.amazonaws.com' [...] >
Unfortunately there's no way of changing this behavior through the application; you must edit the file directly.
You should be aware of the performance implications of doing so. While this greatly helps debugging, there must be a reason why Tableau chose not to allow modification of these parameters through the application.
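If you have several workbooks to change, the edit can be scripted. This is a sketch using Python's standard xml.etree.ElementTree; it assumes the .twb contains connection elements with class='redshift' as in the snippet above, and the sample document here is a stripped-down stand-in, not a real workbook. Back up the file first.

```python
import xml.etree.ElementTree as ET

EXTRAS = "UseDeclareFetch=0;FETCH=0"

# Minimal stand-in for a workbook; a real .twb has many more elements.
SAMPLE_TWB = (
    "<workbook><datasources><datasource>"
    "<connection class='redshift' dbname='yourdb' />"
    "<connection class='textscan' filename='x.csv' />"
    "</datasource></datasources></workbook>"
)


def add_odbc_extras(twb_xml: str, extras: str = EXTRAS) -> str:
    """Set odbc-connect-string-extras on every Redshift connection element."""
    root = ET.fromstring(twb_xml)
    for conn in root.iter("connection"):
        if conn.get("class") == "redshift":
            conn.set("odbc-connect-string-extras", extras)
    return ET.tostring(root, encoding="unicode")


patched = add_odbc_extras(SAMPLE_TWB)
print(patched)
```

For a file on disk you would use ET.parse(path) and tree.write(path) instead of the string functions. ElementTree may change whitespace and attribute quoting when it writes the file back.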

Documentum Query Language Syntax

I would like to know if there is a way in DQL to fetch rows based on start and end row values (for example rows 1 - 1000, then 1001 - 2000), similar to what ROWNUM allows in Oracle queries.
Any input would be a great help.

For pagination in Documentum DQL you can (and should) use the RETURN_RANGE hint, like this:
select * from dm_document where object_name like 'ABC%' enable(RETURN_RANGE 1001 2000 1000 'object_name ASC' )
It sorts the documents by object_name and then returns up to 1,000 rows, from row 1001 through row 2000, optimized for the top 1,000 sorted rows.
The syntax is RETURN_RANGE starting_row ending_row [optimize_top_row] 'sorting_clause'
It has worked since Content Server 6.6 with any underlying database.
Documentum Community Ref
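To walk through a whole result set page by page, the start and end rows can be generated on the client. A small sketch in Python, following the RETURN_RANGE syntax given above; page_ranges and paged_dql are invented helper names, and how you actually send the DQL to the Content Server is left out.

```python
def page_ranges(total_rows, page_size):
    """Yield (start_row, end_row) pairs, 1-based and inclusive."""
    for start in range(1, total_rows + 1, page_size):
        yield start, min(start + page_size - 1, total_rows)


def paged_dql(base_query, start, end, order_by, optimize_top=None):
    """Append an enable(RETURN_RANGE ...) hint to a DQL query string."""
    parts = [str(start), str(end)]
    if optimize_top is not None:
        parts.append(str(optimize_top))
    hint_args = " ".join(parts)
    return f"{base_query} enable(RETURN_RANGE {hint_args} '{order_by}')"


base = "select * from dm_document where object_name like 'ABC%'"
for start, end in page_ranges(2500, 1000):
    print(paged_dql(base, start, end, "object_name ASC", 1000))
```

Each generated statement fetches one page, with the same sort clause every time so the pages do not overlap.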

I do not believe this is possible using DQL. However, you can consult the DQL Reference Guide (check Powerlink), which contains information about DQL hints (there is a section on them). There is a discussion of passthrough hints that allow you to pass hints through to the underlying RDBMS. The hints available depend on whether it is Oracle, SQL Server, DB2, etc.
This is an excerpt from that section:
Passthrough hints are hints that are passed to the RDBMS server. They
are not handled by Content Server.
SQL Server and Sybase have two
kinds of hints: those that apply to individual tables and those that
apply globally, to the entire statement. To accommodate this, you can
include passthrough hints in either a SELECT statement’s source list
or at the end of the statement. The hints you include in the source
list must be table‑specific hints. The hints you include at the end of
the statement must be global hints. For example, the following
statement includes passthrough hints for Sybase at the table level and
the statement level:
SELECT "r_object_id" FROM "dm_document" WITH
(SYBASE('NOHOLDLOCK')) WHERE "object_name"='test' ENABLE (FORCE_PLAN)
For DB2 and Oracle, include passthrough hints only at the end of the
SELECT statement.
