Generic SQL that both Access and ODBC/Oracle can understand

I have a MS Access query that is based on a linked ODBC table (Oracle).
I'm troubleshooting the poor performance of the query here: Access not properly translating TOP predicate to ODBC/Oracle SQL.
SELECT ri.*
FROM user1_road_insp AS ri
WHERE ri.insp_id = (
select
top 1 ri2.insp_id
from
user1_road_insp ri2
where
ri2.road_id = ri.road_id
and year(ri2.insp_date) between [Enter a START year:] and [Enter an END year:]
order by
ri2.insp_date desc,
ri2.length desc,
ri2.insp_id
);
The documentation says:

When you spot a problem, you can try to resolve it by changing the local query. This is often difficult to do successfully, but you may be able to add criteria that are sent to the server, reducing the number of rows retrieved for local processing.

In many cases you will find that, despite your best efforts, Office Access still retrieves some entire tables unnecessarily and performs final query processing locally.
However, it's occurred to me that I don't really understand what sort of SQL I should be writing to make both Access and ODBC/Oracle happy.
Should I be writing some sort of generic SQL that Access can understand in a local query AND that can be easily translated to ODBC/Oracle SQL? Is generic SQL a real thing?

What kind of SQL does the ODBC driver use? It depends: MS Access typically has three types of external data connections, each interfacing with a different SQL dialect through the ODBC API.
1. Linked tables act like local tables but are ODBC-connected data sources, not stored locally. Once incorporated in an Access app, these tables can only be queried with MS Access' SQL dialect. They can be joined with local tables or even with backend tables from other sources.
Hence, TOP is available because you are writing MS Access SQL, not Oracle SQL: you are essentially using Access SQL to manipulate Oracle data. ODBC serves as the origin point of the data, while Access' Jet/ACE SQL engine does the processing and displays the resultset from cached memory.
2. Pass-through queries do not see local tables or anything else in the local app's environment. Such queries use the SQL dialect of the connected database, here Oracle.
Hence, TOP is NOT available (Oracle does not support it), while double quotes are allowed around column identifiers; such quoting would fail in MS Access. Essentially, you are using Oracle SQL to manipulate Oracle data in an Access app. You can take the output of the sqlout.txt log and run it in a pass-through query ODBC-connected to your Oracle database.
3. ADO/DAO recordsets are run entirely via code, such as VBA, and are direct connections to data sources; they use the connected database's dialect.
Here, you are using Oracle SQL to manipulate Oracle data in an Access app via the ODBC API.
With each of these types, you will have to connect to a backend ODBC data source. You do not even need to use the GUI: you can use Access' object library to create linked tables (see DoCmd.TransferDatabase) and pass-through QueryDefs (see QueryDef.Connect or .Execute).
I suspect the sqlout.txt log you see contains translations of the ODBC calls into the backend's native dialect.
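To make point 2 concrete: below is a rough sketch of how the TOP 1 query from the question might be rewritten in Oracle's dialect for a pass-through query. It assumes the same table and column names carry over unchanged; the years are hard-coded placeholders, since pass-through queries cannot prompt for parameters (you would rebuild the SQL string in code to change them).

SELECT *
FROM (
    SELECT ri.*,
           ROW_NUMBER() OVER (
               PARTITION BY ri.road_id
               ORDER BY ri.insp_date DESC, ri.length DESC, ri.insp_id
           ) AS rn
    FROM user1_road_insp ri
    -- the year filter is evaluated by Oracle, not by Jet/ACE
    WHERE EXTRACT(YEAR FROM ri.insp_date) BETWEEN 2015 AND 2020
)
WHERE rn = 1;

Here ROW_NUMBER() stands in for the correlated TOP 1 subquery: it keeps one row per road_id, ranked by the same ORDER BY as the original.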

To build on @Parfait's point #1:
From Microsoft Access Developer's Guide to SQL Server by Mary Chipman and Andy Baron:
Optimizing Access Queries:
There's a common misconception that the Jet engine always retrieves all the data in linked SQL Server tables and then processes the data locally. This is not usually true. Jet is perfectly capable of sending efficient queries to SQL Server over ODBC and retrieving only the rows required. However, in some cases, Jet will in fact be forced to fetch all the data in certain tables first and then process it. You should be aware of when you are forcing Jet to do this and be sure that it is justified. The following are some general guidelines to follow when creating your Access queries:
Using expressions that can't be evaluated by the server will cause Jet to retrieve all the data required to evaluate those expressions locally. The impact of using Access-specific expressions, such as domain aggregate functions, Access financial functions, or custom VBA functions will vary depending on where in your query the expressions are used. Using such an expression in the SELECT clause will usually not cause a problem because no extra data will be returned. However, if the expression is in the WHERE clause, that criterion cannot be applied on the server, and all the data evaluated by the expression will have to be returned.
With multiple criteria, as many as possible will be processed on the server. This means that even if you use criteria that you know include functions that will need to be processed by Jet, adding other criteria that can be handled by the server will reduce the number of records that Jet has to process. Adding criteria on indexed columns is especially helpful.
Query syntax that includes an Access-specific extension to SQL, not supported by the ODBC driver, may force processing to be done on the client by Access. For example, even though SELECT TOP 5 PERCENT is now supported by SQL Server, it is not supported by the ODBC driver. If you use that syntax in an Access query, Jet will need to retrieve all the records and calculate which ones are in the top 5 percent. On the other hand, even though crosstab queries are specific to Access, Jet will translate them into simple GROUP BY queries and fetch just the required data in one trip to the server unless problematic criteria is used.
Heterogeneous joins between local and remote tables or between remote tables that are in different data sources will, of course, have to be processed by Jet after the source data is retrieved. However, if the remote join field is indexed and the table is large, Jet will often use the index to retrieve only the required rows by making multiple calls to the remote table, one for each row required.
Jet allows you to mix data types within columns of UNION queries and within expressions, but SQL Server doesn't. Such mixing of data types will force processing to be done locally.
Multiple outer joins in one query will be processed locally.
The most important factor is reducing the total number of records being fetched. Jet will retrieve multiple batches of records in the background until the result set is complete, so even though you may seem to get results back immediately, a continuing load is being placed on the server for large result sets.
Note: this book is quite old (published in 2000) and refers to the Jet engine. I imagine things might be slightly different in newer versions of Access, which use ACE, although I don't have a source to back this up.
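To illustrate the first guideline using the table from the original question (the date literals are placeholders): a criterion wrapped in a function usually cannot be evaluated by the server, so every row comes across the wire for local filtering:

SELECT * FROM user1_road_insp WHERE Year(insp_date) = 2019;

whereas an equivalent range test on the bare column can be sent to the server, which can also use an index on insp_date:

SELECT * FROM user1_road_insp WHERE insp_date >= #1/1/2019# AND insp_date < #1/1/2020#;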

Related

Running SQL via SQLWorkbench versus via Tableau Prep

I have developed some SQL that reads from a Redshift table, does some manipulation (especially LISTAGG on some fields), and then writes to another Redshift table.
When I run the SQL using SQLWorkbench it executes successfully. When I embed it in a Tableau Prep flow (as "Complex SQL") I get several of these errors: "System error: AqlProcessor evaluation failed: [Amazon][Support] (40550) Invalid character value for cast specification." Presumably these relate to my treatment of data types. What I don't understand is what is so different about the environment that it would cause different results like this. Is it because SQLWorkbench and Tableau Prep use different SQL interpreters? Or is my question too broad to even speculate about without going through the actual code?
Best guess is that Tableau, which has knowledge of the DDL, is adding some CAST() operations to the SQL. SQLWorkbench is simpler and pushes the SQL to Redshift as written. This is based on there being no explicit CASTs in your SQL but an error message that identifies a CAST().
Look at stl_querytext for these two queries and see if they are being given to Redshift differently by the two benches. I suspect this will give you some clues to go on.
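For example, something along these lines should reconstruct the full statement text Redshift received from each client (stl_querytext stores statements in 200-character chunks; the one-hour window is just an illustrative filter):

SELECT q.query, q.starttime,
       LISTAGG(RTRIM(t.text), '') WITHIN GROUP (ORDER BY t.sequence) AS full_sql
FROM stl_query q
JOIN stl_querytext t ON t.query = q.query
WHERE q.starttime > DATEADD(hour, -1, GETDATE())
GROUP BY q.query, q.starttime
ORDER BY q.starttime DESC;

Comparing the full_sql of the SQLWorkbench run against the Tableau Prep run should show any CASTs Tableau injected.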
If there are no differences in the SQL then the issue may be with user / connection differences and more info will likely be needed about the issue.

sql temp table join between servers

So I have a summary I need to return to the end-user application.
It should accept 3 parameters DateType, StartDate, EndDate.
Date Type will determine the date field I use to filter the data.
The way I accomplished this was putting all the IDs of the records for a date type into a TEMP table and then joining my summary to the list of IDs.
This worked fine when running the query on the SQL Server that houses the data.
However, that is a replicated server, so when I compiled it into a stored proc on the server with the rest of the application data, it slowed the query down, i.e. 2 seconds vs. 50 seconds.
I think the cross-server join, from the temp table created on one SQL Server to the tables on the replication server, is causing the slowdown.
Are there any methods or techniques that I can use to get around this and build this all in one stored procedure?
If I create 3 stored procedures with their own date range, then they are fast again. However, this means maintaining multiple stored procs for the same thing.
First off, if you are running a version of SQL Server older than 2012 SP1, one problem is that users who aren't allowed to run DBCC SHOW_STATISTICS (which is most users who aren't sysadmins, see the "Permissions" section in the documentation) don't get access to statistics on remote tables. This can severely cripple the optimizer's ability to generate a good execution plan. Upgrading SQL Server or granting more permissions can help there.
If your query involves filtering or joining on a character column, make sure the remote server is flagged in the linked server options as "collation compatible". If this option is off, SQL Server can't assume strings can be compared across the servers and it will start pumping entire tables up and down just to make sure the data ends up where the comparison has to be made.
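Checking and changing that setting is a one-liner (the linked server name here is a placeholder):

-- list servers and their option flags, including collation compatibility
EXEC sp_helpserver;

-- mark the remote server as collation compatible
EXEC sp_serveroption 'RemoteServer', 'collation compatible', 'true';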
If the execution plan is as good as it gets and it's still not good enough, one general (lame) technique is to transfer all data locally first (SELECT * INTO #localtable FROM remote.db.schema.table), then run the query as a non-distributed query. Obviously, in order for this to work, the remote table cannot be "too big" and in some cases this actually has worse performance, depending on how many rows are involved. But it's always worth considering, because the optimizer does a better job with local tables.
Another approach that avoids pulling tables together across servers is packing up data in parameters to remote stored procedure calls. Entire tables can be passed as XML through an NVARCHAR(MAX) parameter, since neither XML columns nor table-valued parameters are supported in distributed queries. The basic idea is the same: avoid the need for the optimizer to figure out an efficient distributed query. The best approach greatly depends on your data and your query, obviously.
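Here is a rough sketch of that XML idea with made-up object names: the local side serializes the IDs from the temp table, and the remote procedure shreds them back into rows so the join never crosses servers.

-- local side: serialize the IDs and call the remote procedure
-- (requires RPC Out to be enabled on the linked server)
DECLARE @ids nvarchar(max);
SET @ids = (SELECT id FROM #temp FOR XML PATH(''), ROOT('ids'));
EXEC LinkedServer1.RemoteDB.dbo.usp_GetSummary @ids;

-- remote side, inside usp_GetSummary, where @ids is its nvarchar(max) parameter:
DECLARE @x xml = CAST(@ids AS xml);
SELECT r.*
FROM dbo.remotetable r
JOIN @x.nodes('/ids/id') AS n(c) ON r.id = n.c.value('.', 'int');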

Do databases besides Postgres have features comparable to foreign data wrappers?

I'm very excited by several of the more recently-added Postgres features, such as foreign data wrappers. I'm not aware of any other RDBMS having this feature, but before I try to make the case to my main client that they should begin preferring Postgres over their current cocktail of RDBMSs, and include in my case that no other database can do this, I'd like to verify that.
I've been unable to find evidence of any other database supporting SQL/MED, and things like this short note stating that Oracle does not support SQL/MED.
The main thing that gives me doubt is a statement on http://wiki.postgresql.org/wiki/SQL/MED:
SQL/MED is Management of External Data, a part of the SQL standard that deals with how a database management system can integrate data stored outside the database.
If FDWs are based on SQL/MED, and SQL/MED is an open standard, then it seems likely that other RDBMSs have implemented it too.
TL;DR:
Does any database besides Postgres support SQL/MED?
IBM DB2 claims compliance with SQL/MED (including full FDW API);
MySQL's FEDERATED storage engine can connect to another MySQL database, but NOT to other RDBMSs;
MariaDB's CONNECT engine allows access to various file formats (CSV, XML, Excel, etc.), gives access to "any" ODBC data source (Oracle, DB2, SQL Server, etc.) and can access data in the MyISAM and InnoDB storage engines.
Farrago has some of it too;
PostgreSQL implements parts of it (notably, it does not implement routine mappings, and has a simplified FDW API). It has been usable for reading since PG 9.1 and for writing since 9.3; prior to that there was DBI-Link.
The PostgreSQL community has plenty of nice FDWs, like the NoSQL FDWs (couchdb_fdw, mongo_fdw, redis_fdw), Multicorn (for writing wrappers in Python instead of C), or the crazy PGStrom (which uses the GPU for some operations!).
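As a concrete illustration of the PostgreSQL side, this is roughly what the SQL/MED syntax looks like with postgres_fdw (all names, hosts, and credentials below are invented for the example):

CREATE EXTENSION postgres_fdw;

CREATE SERVER remote_pg
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remote.example.com', dbname 'sales');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER remote_pg
    OPTIONS (user 'app_user', password 'app_password');

CREATE FOREIGN TABLE remote_orders (
    order_id integer,
    total    numeric
)
    SERVER remote_pg
    OPTIONS (schema_name 'public', table_name 'orders');

-- the foreign table is then queried like any local table
SELECT * FROM remote_orders WHERE total > 100;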
SQL Server has the concept of Linked Servers (http://technet.microsoft.com/en-us/library/ms188279.aspx), which allows you to connect to external data sources (Oracle, other SQL instances, Active Directory, File system data via the Indexing Service provider, etc.) and, if you really needed to, you can create your own Providers that can be used by a SQL Server Linked Server.
Another option within SQL Server is the CLR, in which you can write code to retrieve data from web services or other data sources as needed.
While this may not technically be "SQL/MED", it seems to accomplish the same thing.
Distributed query using a local table joined to a 4-part linked server query. I think in this case the remotetable filter might not be applied until after the entire table is pulled local (the documentation is fuzzy on this and I've found articles with conflicting opinions):
SELECT *
FROM LocalDB.dbo.table t
INNER JOIN LinkedServer1.RemoteDB.dbo.remotetable r on t.val = r.val
WHERE r.val < 1000
;
Using OPENQUERY, the remotetable filter is applied on the remote server, as long as the filter is included in OPENQUERY's second parameter:
SELECT *
FROM LocalDB.dbo.table t
INNER JOIN OPENQUERY(LinkedServer1, 'SELECT * FROM RemoteDB.dbo.remotetable r WHERE r.val < 1000') r on t.val = r.val

SQL Server 2008, Sybase - large select queries over low bandwidth

I need to pull a large amount of data from various tables across a line that has very low bandwidth. I need to minimize the amount of data that gets sent to and fro.
On that side is a Sybase database, on this side SQL Server 2008.
What I need is to pull all the tables from the Sybase database that have to do with this office. Let's say I have the following tables as an example:
Farm
Tree
Branch
etc.
(one farm has many trees, one tree has many branches etc.)
Lets say the "Farm" table has a field called "CountryID", and I only want the data for where CountryID=12. The actual table structures I am looking at are very complex (and I am also not very familiar with them) so I want to try to keep the queries simple.
So I am thinking of setting up a series of views:
CREATE VIEW vw_Farm AS
SELECT * from Farm where CountryID=12
CREATE VIEW vw_Tree AS
SELECT * from Tree where FarmID in (SELECT FarmID FROM vw_Farm)
CREATE VIEW vw_Branch AS
SELECT * from Branch where TreeID in (SELECT TreeID FROM vw_Tree)
etc.
To then pull the actual data across I would then do:
SELECT * INTO localDb.dbo.Farm FROM vw_Farm
SELECT * INTO localDb.dbo.Tree FROM vw_Tree
SELECT * INTO localDb.dbo.Branch FROM vw_Branch
etc.
Simple enough to set up. I am wondering how this will perform though? Will it perform all the SELECT statements on the Sybase side and then just send back the result? Also, since this will be an iterative process, is it possible to index the views for subsequent calls?
Any other optimisation suggestions would also be welcome!
Thanks
Karl
EDIT: Just to clarify, the views will be set up in SQL Server. I am using a linked server to Sybase ASE to set up those views. What is worrying me in particular is whether the fact that the view is in SQL Server on this side, and not in Sybase on that side, means that for each iteration the data from the preceding view will get pulled across to SQL Server first before the calculations get executed. I want Sybase to do all the calcs and just pass the results across.
It's difficult to be certain without testing, but my somewhat-relevant experience (using linked servers to platforms other than Sybase, and on SQL Server 2005) has been that using subqueries (such as your code for vw_Tree and vw_Branch) more or less guarantees that SQL Server will pull all the data for the outer table into a local temp table, then match it to the results of the inner query.
The problem is that SQL Server has no access to the linked server's table statistics, so can make no meaningful decisions about how to optimise the query.
If you want to be sure to have the work done on the Sybase server, your best bet will be to write code (could be views or stored procedures) on the Sybase side and reference them from SQL Server.
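Short of moving the code to Sybase, OPENQUERY at least guarantees where the filtering happens, because the string you pass is executed verbatim on the remote side. A sketch using the names from the question (the linked server name is a placeholder, and SELECT ... INTO requires that the target table not already exist):

-- the inner SELECT runs entirely on Sybase; only matching rows cross the wire
SELECT *
INTO localDb.dbo.Farm
FROM OPENQUERY(SYBASE_LINK, 'SELECT * FROM Farm WHERE CountryID = 12');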
Linked server connections are, in my experience, not particularly resilient over flaky networks. If it's available, you could consider using Integration Services rather than linked-server queries - but even that may not be much better. You may need to consider falling back on moving text files with robocopy and bcp.

SQL Server: is it possible to get data from another SQL server without setting linked server?

I need to do the following query (for example):
SELECT c1.CustomerName FROM Customer as c1
INNER JOIN [ExternalServer].[Database].[dbo].[Customer] as c2
ON c2.RefId = c1.RefId
For some security reason my client doesn't allow me to create a linked server. The user under whom I execute this query has access to both tables. Is it possible to make it work without using linked server? Thanks.
You could use OPENROWSET, which'll require the connection info, username & password...
While I understand that the client believes that having an always-on connection to their data is risky, that's why you lock down the account. OPENROWSET means including the connection info in plain text.
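For reference, the query from the question would look something like this with OPENROWSET (provider, server, and credentials are placeholders, and the Ad Hoc Distributed Queries option must be enabled on the server):

SELECT c1.CustomerName
FROM Customer AS c1
INNER JOIN OPENROWSET('SQLNCLI',
        'Server=ExternalServer;Database=Database;Uid=someuser;Pwd=somepassword;',
        'SELECT RefId, CustomerName FROM dbo.Customer') AS c2
    ON c2.RefId = c1.RefId;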
'Linked Server' is a very specific thing -- basically, a permanent connection between servers. I can think of all sorts of reasons not to want that, while at the same time having no problem with folks writing queries that combine data from the two different data sources.
Anyway, depending on your requirement -- if this is just for ad hoc querying, OPENROWSET is good if you're inside SQL Server; or if you want to do this in MS Access, just link to the two tables, and your Access query won't care that one comes from one server and one comes from another.
Alternatively, with a web or Windows front-end, you could independently query each table into a data object, and then build a separate query on top of that.
Http Endpoints...
WebServices...
There's a million ways. I wouldn't be so quick to assume, as @Lasse suggests, that any form of 'linking' this data together would make you some kind of rogue data linker.