I am trying to speed up a SQL Server report for our sales department that runs against an IBM OS/400 system.
A colleague of mine (who has since left the company) built this report and used a ton of sub-selects.
The report usually takes about 30 minutes to process and often simply fails to display. I already tried cutting out some tables/rows in the hope of speeding up the process, without success (the sales department needs all of it).
It works over all the relevant data (orders, customers, articles, our orders with the manufacturer, the manufacturers themselves, and so on). Any ideas?
I can't add indexes because of the OS/400 system; I guess that would be a new programming task for our contractor, which means additional costs.
Could I use some clever joins, or somehow reduce the number of sub-selects?
Are you using 4 part names in your query? That's probably your problem...
From SQL Server...
-- Pulls all rows from the table(s) back to MS SQL Server and applies the WHERE locally on the MS SQL Server
select * from LINKEDSVR.MYIBMI.MYLIB.MYTBL where locnbr = '00335';
-- Sends the statement to the IBM i server for processing; only the results are returned
select * from openquery(LINKEDSVR, 'select * from MYTBL where locnbr = ''00335''');
Try running the subselects first, sending the output of each to its own table.
Update statistics on the tables. Then run the rest of the SQL, replacing what were originally subselects with the tables created in the first step.
Treat multiple layers of nesting the same way: each layer is its own insert into another table.
I've found that query optimizers have a hard time with complex SQL. Breaking out the subqueries into separate steps often resolves this.
Between runs my preference is to leave the data intact as a reference in case debugging is needed, then truncate the tables as the first step of a run.
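For example, a run might begin with something like this (using the SubQTable staging table from the rewrite below):
-- Last run's data stays available for debugging until the next run starts
truncate table SubQTable ;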
Responding to eraser's comments
Assuming your original query takes this general form:
select [columns] from
(-- subquery
select [columns] from TableA
) as Subquery
join TableB on Subquery.joincol = TableB.joincol
where mainquery_where_clause
Re-write:
-- Create a table to hold the results of your subquery:
create table SubQTable ( [columns] ) ;
-- Now run the subquery:
insert into SubQTable select [columns] from TableA ;
-- Update the data distribution statistics:
update statistics SubQTable ;
-- Now run the re-written main query:
select [columns]
from SubQTable, TableB
where SubQTable.joincol = TableB.joincol
and mainquery_where_clause ;
I noticed some syntax issues with the SQL you posted. Looks like something got left out. But the principle of my answer remains the same. Please note that applying my suggestion may not help, as there are potentially many variables to your scenario; you mentioned subqueries, so I chose to address that.
Halfer's suggestion is a great one: edit your original question to add the SQL code, formatting it as code with the "{}" button in the text editor.
I strongly suggest that you obtain the SQL execution plan and post the results.
Related
I'm using a SQL Server linked server to fetch data from MariaDB,
but I'm running into slowness when querying MariaDB through the linked server.
I used the scenarios below to fetch results (the time taken by each query is noted).
Please suggest any solutions you may have.
Total number of rows in the patient table: 62,520
SELECT count(1) FROM [MariaDB]...[webimslt.Patient] -- 2.6 second
SELECT * FROM OPENQUERY([MariaDB], 'select count(1) from webimslt.patient') -- 47ms
SELECT * FROM OPENQUERY([MariaDB], 'select * from webimslt.patient') -- 20 second
This isn’t really a fair comparison...
SELECT COUNT(1) is only returning a single number and will probably be using an index to count rows.
SELECT * is returning ALL data from the table.
Returning data is an expensive (slow) process, so it will obviously take time to return all of it. Then there is the question of data transfer: are the servers connected using a high-speed connection? That is also a factor. It will never be as fast to query over a linked server as it is to query your database directly.
How can you improve the speed? I would start by returning only the data you need, by specifying the columns and adding a WHERE clause. After that, you can probably use indexes in MariaDB to try to speed things up.
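For example, something along these lines (the column names and filter are hypothetical, since the table's schema isn't shown):
SELECT * FROM OPENQUERY([MariaDB], 'select patient_id, first_name, last_name
from webimslt.patient
where updated_at >= ''2019-01-01''');
Because the inner statement runs on the MariaDB side, only the filtered columns and rows travel over the link, and MariaDB can use an index on the filtered column.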
I have a series of databases on the same server that I wish to query. I am using the same code to query each database and would like the results to appear in a single list.
I am using 'USE' to specify which database to query, followed by creating some temporary tables to group my data, before using a final SELECT statement to bring together all the data from the database.
I am then using UNION, followed by a second USE command for the next database and so on.
SQL Server shows a syntax error on the word 'UNION' but does not give any assistance as to the source of the problem.
Is it possible that I am missing a character? At present I am not using ( or ) anywhere.
The USE statement just redirects your session to connect to a different database on the same instance; you don't actually need to switch from database to database in this manner (there are a few rare exceptions, though).
Use the 3-part notation to join your result sets. You can do this while being connected to any database.
SELECT
SomeColumn = T.SomeColumn
FROM
FirstDatabase.Schema.TableName AS T
UNION ALL
SELECT
SomeColumn = T.SomeColumn
FROM
SecondDatabase.Schema.YetAnotherTable AS T
The engine will automatically check for your login's users on each database and validate your permissions on the underlying tables or views.
UNION adds result sets together; you can't issue another statement (like USE) between the SELECTs that a UNION combines.
You should use the database name (and schema) before the table name:
SELECT valueFromBase1
FROM [database1].[schema1].[table1]
WHERE ...
UNION
SELECT valueFromBase2
FROM [database2].[schema2].[table2]
WHERE ...
I'm kicking tires on BI tools, including, of course, Tableau. Part of my evaluation includes correlating the SQL generated by the BI tool with my actions in the tool.
Tableau has me mystified. My database has 2 billion things; however, no matter what I do in Tableau, the query Redshift reports as having been run is "Fetch 10000 in SQL_CURxyz", i.e. a cursor operation. Watching the console, the cursor IDs change, indicating new queries are being run -- but you never see the original queries.
Is this a Redshift or Tableau quirk? Any idea how to see what's actually running under the hood? And why is Tableau always operating on 10000 records at a time?
I just ran into the same problem and wrote this simple query to get all queries for currently active cursors:
SELECT
usr.usename AS username
, min(cur.starttime) AS start_time
, DATEDIFF(second, min(cur.starttime), getdate()) AS run_time
, min(cur.row_count) AS row_count
, min(cur.fetched_rows) AS fetched_rows
, listagg(util_text.text)
WITHIN GROUP (ORDER BY sequence) AS query
FROM STV_ACTIVE_CURSORS cur
JOIN stl_utilitytext util_text
ON cur.pid = util_text.pid AND cur.xid = util_text.xid
JOIN pg_user usr
ON usr.usesysid = cur.userid
GROUP BY usr.usename, util_text.xid;
Ah, this has already been asked on the AWS forums.
https://forums.aws.amazon.com/thread.jspa?threadID=152473
Redshift's console apparently doesn't display the query behind cursors. To get that, you can query STV_ACTIVE_CURSORS: http://docs.aws.amazon.com/redshift/latest/dg/r_STV_ACTIVE_CURSORS.html
Also, you can alter your .TWB file (which is really just an xml file) and add the following parameters to the odbc-connect-string-extras property.
UseDeclareFetch=0;
FETCH=0;
You would end up with something like:
<connection class='redshift' dbname='yourdb' odbc-connect-string-extras='UseDeclareFetch=0;FETCH=0' port='0000' schema='schm' server='any.redshift.amazonaws.com' [...] >
Unfortunately there's no way of changing this behavior through the application; you must edit the file directly.
You should be aware of the performance implications of doing so. While this greatly enhances debugging, there is presumably a reason why Tableau chose not to allow modification of these parameters through the application.
I have a table with about 5 million records and I only need to move the last 1 million to production (as the other 4 million are there). What is the best way to do this so I don't have to recopy the entire table each time?
A little faster will probably be:
Insert into prod.dbo.table (column1, column2....)
Select column1, column2.... from dev.dbo.table d
where not exists (
select 1 from prod.dbo.table pc where pc.pkey = d.pkey
)
But you need to tell us whether these tables are on the same server or not.
Also, how frequently is this run, and how robust does it need to be? There are alternative solutions depending on your requirements.
Given this late-arriving gem from the OP ("no need to compare as I know the IDs > X"), you do not have to do an expensive comparison. You can just use:
Insert into prod.dbo.table (column1, column2....)
Select column1, column2.... from dev.dbo.table d
where ID > x
This will be far more efficient as you are only transferring the rows you need.
Edit: (Sorry for revising so much. I'm understanding your question better now)
Insert into TblProd
Select * from TblDev where
pkey not in (select pkey from tblprod)
This should only copy over records that aren't already in your target table.
Since they are on separate servers, that changes everything. In short: in order to know what isn't yet in PROD, you need to compare everything in DEV to everything in PROD anyway, so there is no simple way to avoid comparing huge datasets.
Some different strategies used for replication between PROD and DEV systems:
A. Back up and restore the whole database across, and apply scripts afterwards to clean it up.
B. Implement triggers in the PROD database that record the changes, then copy only the changed records across (see the sketch after this list).
C. Identify some kind of partition or set of records that you know don't change (e.g. records more than 12 months old), and only refresh those that aren't in that set.
D. Copy ALL of prod into a staging table on the DEV server using SSIS. Use a query very similar to the one above to insert only the new records. Delete the staging table.
E. You might be able to find a third-party SSIS component that does this efficiently. Out of the box, SSIS is inefficient at comparative updates.
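A minimal sketch of strategy B, assuming a single-column integer key (the table and log names are hypothetical):
-- Log of keys touched on the PROD table; the copy job later moves only
-- rows whose keys appear here, then clears the log
create table dbo.TableChangeLog (
    pkey int not null,
    changed_at datetime not null default getdate()
) ;
GO
create trigger trg_YourTable_Log on dbo.YourTable
after insert, update, delete
as
begin
    set nocount on ;
    -- inserted/deleted hold the rows affected by the current statement
    insert into dbo.TableChangeLog (pkey)
    select pkey from inserted
    union
    select pkey from deleted ;
end ;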
Do you actually have an idea of what those last million records are? i.e. are they for a location or a date or something? Can you write a SELECT to identify them?
Based on this comment:
no need to compare as I know the IDs > X will work
You can run this on the DEV server, assuming you have created a linked server called PRODSERVER on the DEV server
INSERT INTO DB.dbo.YOURTABLE (COL1, COL2, COL3...)
SELECT COL1, COL2, COL3...
FROM PRODSERVER.DB.dbo.YOURTABLE
WHERE ID > X
Look up 'SQL Server Linked Servers' for more information on how to create one.
This is fine for a one-off, but if you do this regularly you might want to build something more robust.
For example, you could create a script that exports the data using BCP.EXE to a file, copies it across to DEV, and imports it again. This is more reliable, as it moves the data in one batch rather than requiring a network connection the whole time.
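A rough sketch of such a script (server, database, and table names are placeholders; X is the cutoff ID from above; -T uses a trusted connection and -n keeps SQL Server's native format):
REM Export the new rows from the source server to a file
bcp "SELECT COL1, COL2, COL3 FROM DB.dbo.YOURTABLE WHERE ID > X" queryout yourtable.dat -S PRODSERVER -T -n
REM Import the file on the target server
bcp DB.dbo.YOURTABLE in yourtable.dat -S DEVSERVER -T -n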
If the tables are on the same server, you can do something like this.
I am using MySQL, so the syntax may be a little bit different, but in my opinion everything should be the same.
INSERT INTO newTable (columnsYouWantToCopy)
SELECT columnsYouWantToCopy
FROM oldTable WHERE clauseWhichGivesYouOnlyRecordsYouNeed
If on another server, you can do something like this:
http://dev.mysql.com/doc/refman/5.0/en/select-into.html
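That page covers SELECT ... INTO OUTFILE. A hedged sketch of the round trip (the file path and delimiters are arbitrary choices):
-- On the source server: dump only the rows you need to a file
SELECT columnsYouWantToCopy
INTO OUTFILE '/tmp/rows_to_copy.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM oldTable
WHERE clauseWhichGivesYouOnlyRecordsYouNeed;
-- After copying the file to the other server: load it into the target table
LOAD DATA INFILE '/tmp/rows_to_copy.csv'
INTO TABLE newTable
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';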
I have a database with many tables that get used, and many tables that are no longer used. While I could sort through each table manually to see if they are still in use, that would be a cumbersome task. Is there any software/hidden feature that can be used on a SQL Server/Oracle database that would return information like "Tables x,y,z have not been used in the past month" "Tables a,b,c have been used 17 times today"? Or possibly a way to sort tables by "Date Last Modified/Selected From"?
Or is there a better way to go about doing this? Thanks
edit: I found a "modify_date" column when executing "SELECT * FROM sys.tables ORDER BY modify_date desc", but this seems to only keep track of modifications to the table's structure, not its contents.
Replace spt_values with the table name you are interested in; the query will give you the last time it was used and what it was used by.
From here: Finding Out How Many Times A Table Is Being Used In Ad Hoc Or Procedure Calls In SQL Server 2005 And 2008
SELECT * FROM(SELECT COALESCE(OBJECT_NAME(s2.objectid),'Ad-Hoc') AS ProcName,execution_count,
(SELECT TOP 1 SUBSTRING(s2.TEXT,statement_start_offset / 2+1 ,
( (CASE WHEN statement_end_offset = -1
THEN (LEN(CONVERT(NVARCHAR(MAX),s2.TEXT)) * 2)
ELSE statement_end_offset END) - statement_start_offset) / 2+1)) AS sql_statement,
last_execution_time
FROM sys.dm_exec_query_stats AS s1
CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS s2 ) x
WHERE sql_statement like '%spt_values%' -- replace here
AND sql_statement NOT like 'SELECT * FROM(SELECT coalesce(object_name(s2.objectid)%'
ORDER BY execution_count DESC
Keep in mind that if you restart the server, these statistics will be cleared out.
In Oracle you can use ASH (Active Session History) to find information about SQL that was executed. You can also perform code-coverage tests with the hierarchical profiler, to find which parts of your stored procedures are or are not used.
If you wonder about updates to table data, you can also use DBA_TAB_MODIFICATIONS. This shows how many inserts, updates, and deletes have been done on a table or table partition. As soon as new object statistics are generated, the row for the specified table is removed from DBA_TAB_MODIFICATIONS. You still have help here, since you can also peek into the table-statistics history. Note that this shows nothing about tables that are only queried; if you really need to know about those, use ASH.
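For example, a quick look at recent DML per table (the schema name is a placeholder):
-- Unflushed changes may lag; DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO forces a flush
SELECT table_name, inserts, updates, deletes, timestamp
FROM dba_tab_modifications
WHERE table_owner = 'MYSCHEMA'
ORDER BY timestamp DESC;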
Note that for both ASH and statistics-history access you need the Diagnostics or Tuning Pack license (normally you would want this anyway).
If you use triggers you can detect updates, inserts, and deletes on a table.
Detecting read access is probably more difficult.
I use a combination of static analysis in the metadata to determine tables/columns which have no dependencies and runtime traces in SQL Server to see what activity is happening.
Some more queries that might be useful for you.
select * from sys.dm_db_index_usage_stats
select * from sys.dm_db_index_operational_stats(db_id(),NULL,NULL,NULL)
select * from sys.sql_expression_dependencies /*SQL Server 2008 only*/
The difference between what the first two DMVs report is explained well in this blog post.
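As a starting point, a sketch like the following lists the most recent read activity recorded for each table in the current database (assuming the stats gathered since the last restart are representative):
-- Tables where all three columns are NULL have had no recorded reads since the last restart
SELECT t.name AS table_name,
       MAX(us.last_user_seek) AS last_seek,
       MAX(us.last_user_scan) AS last_scan,
       MAX(us.last_user_lookup) AS last_lookup
FROM sys.tables AS t
LEFT JOIN sys.dm_db_index_usage_stats AS us
    ON us.object_id = t.object_id
   AND us.database_id = DB_ID()
GROUP BY t.name
ORDER BY table_name;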
Ed Elliott's open source tool, SQL Cover, is a good bet and has built-in support for the popular unit testing tool, tSQLt.