SQL temporary tables in RStudio notebook's SQL chunks?

I am trying to use temp tables in a SQL code chunk in RStudio.
An example: when I select from one table and return it into an R object, things seem to work:
```{sql, output.var="x", connection='db'}
SELECT count(*) n
FROM origindb
```
When I try anything with temp tables, it seems like the commands run, but an empty R data.frame is returned:
```{sql, output.var="x", connection='db'}
SELECT count(*) n
INTO #whatever
FROM origindb
SELECT *
FROM #whatever
```
My impression is that RStudio notebook SQL chunks are set up to run one single query. So my temporary solution is to create the tables in a stored procedure in the database; then I can get the results I want with something simple. I would prefer to have a bit more flexibility in the SQL code chunks.
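For illustration, a minimal sketch of that workaround, with a hypothetical procedure name `dbo.usp_whatever` and the tables from the example above:

```
-- Hypothetical: wrap the multi-statement temp-table logic in a stored
-- procedure so the chunk only has to issue a single statement.
CREATE PROCEDURE dbo.usp_whatever
AS
BEGIN
    SET NOCOUNT ON;  -- suppress row-count messages that can confuse the client

    SELECT count(*) n
    INTO #whatever
    FROM origindb;

    SELECT *
    FROM #whatever;
END
```

The SQL chunk then reduces to the single statement `EXEC dbo.usp_whatever`.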
My DB connection looks like this:
```{r, echo=F}
db <- DBI::dbConnect(odbc::odbc(),
                     driver = "SQL Server",
                     server = 'sql',
                     database = 'databasename')
```

Like this question, it will work if you put `SET NOCOUNT ON` at the top of your chunk. R seems to get confused when it is handed back the row count for the temp table.
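For example, the failing chunk from the question should then work as-is (a sketch, assuming the same connection and tables as above):

```{sql, output.var="x", connection='db'}
SET NOCOUNT ON

SELECT count(*) n
INTO #whatever
FROM origindb

SELECT *
FROM #whatever
```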

I accomplished my goal using CTEs. As long as you define your CTEs in the order they will be used, it works. It is just like using temp tables, with one big exception: the CTEs are gone after the query finishes, whereas temp tables exist until your SPID is killed (typically via a disconnect).
```
WITH CTE_WHATEVER AS (
    SELECT COUNT(*) n
    FROM origindb
)
SELECT *
FROM CTE_WHATEVER
```
You can also chain CTEs where you would otherwise use multiple temp tables:
```
WITH CTE1 AS (
    SELECT
        STATE
        ,COUNTY
        ,COUNT(*) n
    FROM origindb
    GROUP BY
        STATE
        ,COUNTY
),
CTE2 AS (
    SELECT
        STATE
        ,AVG(n) COUNTY_AVG
    FROM CTE1
    GROUP BY
        STATE
)
SELECT *
FROM CTE2
WHERE COUNTY_AVG > 1000000
```
I hope this helps.

You could manage a transaction within the SQL chunk by using BEGIN and COMMIT clauses. For example:
```
BEGIN;
CREATE TABLE foo (id varchar);
COMMENT ON TABLE foo IS 'Foo';
COMMIT;
```

Related

BigQuery - Create a table from results of a query that uses complex CTEs?

I have a multi-CTE query with large underlying datasets that is run too frequently. I could just create a table of the results of that query for people to use instead, and refresh it daily. But I'm lost on the syntax to create such a table.
```
CREATE OR REPLACE TABLE dataset.target_table
AS
with cte_one as (
    select stuff
    from big.table
),
...
cte_five as (
    select stuff
    from other_big.table
),
final as (
    select *
    from cte_five left join cte_x on cte_five.id = cte_x.id
)
SELECT *
FROM final
```
That is basically what I have. It actually even creates the target table with the right schema, but doesn't insert any rows... Any hints? Thanks
If you really want to do this in one step, you can just do SELECT INTO...
```
with cte_one as (
    select stuff
    from big.table
),
...
cte_five as (
    select stuff
    from other_big.table
),
final as (
    select *
    from cte_five left join cte_x on cte_five.id = cte_x.id
)
SELECT *
INTO dataset.target_table
FROM final
```
That said, since this isn't just a once-off need, I recommend creating the landing table once initially and then scheduling a daily flush and fill (TRUNCATE + INSERT) to update the data. It gives you more explicit control over the data types and also lets you work with a persistent object rather than something built from scratch daily.
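A minimal sketch of that daily refresh, assuming the table and first CTE from the question (shown with a single CTE for brevity):

```
-- Scheduled daily: empty the landing table, then repopulate it.
TRUNCATE TABLE dataset.target_table;

INSERT INTO dataset.target_table
WITH cte_one AS (
    SELECT stuff
    FROM big.table
)
SELECT *
FROM cte_one;
```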

Can you force SQL Server to send the WHERE clause to Linked Server?

I'm trying to determine if a table in my SQL Server 2012 database has any records that don't exist in a table that's on a linked Oracle 11g database.
I tried to do this with the following:
```
select 1
from my_order_table ord
where not exists (select 1
                  from LINK_ORA..[SCHEMA1].[ORDERS]
                  where doc_id = ord.document_id)
and document_id = 'N2324JKL3511'
```
The issue is that it never completes because the ORDERS table on the linked server has about 100 million rows and as per the explain plan on SQL Server, it is trying to pull back the entire ORDERS table from the linked server and then apply the WHERE clause.
As per the explain plan, it views the remote table as having an estimated 10,000 rows. I assume that's some kind of default when it is unable to get statistics?
Even running something as simple as this:
```
select 1 from LINK_ORA..[SCHEMA1].[ORDERS] where doc_id = 'N2324JKL3511'
```
causes SQL Server to not send the WHERE clause and the query never completes.
I tried to use OPENQUERY, but it won't let me concatenate the doc_id into the WHERE clause of the query string.
Then I tried to build a SELECT FROM OPENQUERY string in a function, but I can't use sp_executesql in a function to run it.
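(For reference, a sketch of that dynamic-string approach as it would run in a plain batch or stored procedure, where sp_executesql is allowed; the quoting shown is illustrative:)

```
-- Build the OPENQUERY string dynamically, doubling the quotes so the
-- parameter survives both layers of string literals.
DECLARE @doc_id nvarchar(12) = N'N2324JKL3511';
DECLARE @inner nvarchar(400) =
    N'SELECT 1 FROM SCHEMA1.ORDERS WHERE doc_id = ''' + @doc_id + N'''';
DECLARE @sql nvarchar(max) =
    N'SELECT 1 FROM OPENQUERY(LINK_ORA, ''' + REPLACE(@inner, '''', '''''') + N''')';
EXEC sp_executesql @sql;
```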
Any help is greatly appreciated.
I think this would logically work for you, but it may take too long as well.
```
SELECT sql_ord.*
FROM my_order_table sql_ord
LEFT JOIN LINK_ORA..[SCHEMA1].[ORDERS] ora_ord ON sql_ord.document_id = ora_ord.doc_id
WHERE sql_ord.document_id = 'N2324JKL3511'
AND ora_ord.doc_id IS NULL
```
Since you have a problem with something as simple as `select 1 from LINK_ORA..[SCHEMA1].[ORDERS] where doc_id = 'N2324JKL3511'`, have you tried creating a table on the remote server to hold the doc_id you want to look at? Your SELECT would then involve a remote table containing only one row. I'm just not sure about the INSERT, since I can't test it right now; I'm assuming that everything will be done on the remote server.
So something like:
```
CREATE TABLE LINK_ORA..[SCHEMA1].linked_server_doc_id (
    doc_id nvarchar(12));

INSERT INTO LINK_ORA..[SCHEMA1].linked_server_doc_id (doc_id)
SELECT doc_id
FROM LINK_ORA..[SCHEMA1].[ORDERS] WHERE doc_id = 'N2324JKL3511';

select 1
from my_order_table ord
where not exists (select 1
                  from LINK_ORA..[SCHEMA1].[linked_server_doc_id]
                  where doc_id = ord.document_id)
and document_id = 'N2324JKL3511';

DROP TABLE LINK_ORA..[SCHEMA1].linked_server_doc_id
```

Openquery statement in SQL Server

I am fairly new to SQL, and I am hoping someone can help me with a problem I'm having. I haven't been able to find any answers helping me figure out this exact problem.
I have two tables in two SQL Server databases on two different servers that I want to compare using the column ItemID. I want to find records from Table1 that have an ItemID that does not exist in Table2 and insert those into a table variable. I have the following code:
```
--Create table variable to hold query results
DECLARE @ItemIDTable TABLE
(
    [itemid] [NVARCHAR](20) NULL
);

--Query data and insert results into table variable
INSERT INTO @ItemIDTable
    ([itemid])
SELECT a.[itemid]
FROM database1.dbo.table1 a
WHERE NOT EXISTS (SELECT 1
                  FROM [Database2].[dbo].[table2]
                  WHERE a.itemid = [Database2].[dbo].[table2].[itemid])
ORDER BY itemid
```
This works on a test server where the two databases are on the same server, but not in real life where they are on different servers. I tried the following using OPENQUERY, but I know I haven't got it quite right.
```
--Create table variable to hold query results
DECLARE @ItemIDTable TABLE
(
    [ItemID] [nvarchar](20) NULL
);

--Query data and insert results into table variable
INSERT INTO @ItemIDTable
    ([ItemID])
SELECT a.[ItemID]
FROM Database1.dbo.Table1 a
WHERE NOT EXISTS (SELECT 1
                  FROM OPENQUERY([Server2], 'SELECT * FROM [Database2].[dbo].[Table2]')
                  WHERE a.ItemID = [Database2].[dbo].[Table2].[ItemID])
ORDER BY ItemID
```
I'm pretty sure I need to do something in the WHERE clause, since the two databases are on two different servers; I'm just not quite sure how to structure it. Could anyone help?
You can't create an OPENQUERY that is correlated to an outer query. You could populate a temp table with the results of an OPENQUERY and do your WHERE NOT EXISTS against the temp table, or you might want to look into Synonyms.
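A minimal sketch of the temp-table route, using a hypothetical `#RemoteItems` staging table and the server and object names from the question:

```
-- Pull the remote IDs across once, into a local temp table.
SELECT *
INTO #RemoteItems
FROM OPENQUERY([Server2], 'SELECT [ItemID] FROM [Database2].[dbo].[Table2]');

-- The NOT EXISTS is now purely local, so it can be correlated as usual.
SELECT a.[ItemID]
FROM Database1.dbo.Table1 a
WHERE NOT EXISTS (SELECT 1
                  FROM #RemoteItems r
                  WHERE r.ItemID = a.ItemID)
ORDER BY a.ItemID;
```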
Openquery works like this:
```
select *
from openquery
(LINKED_SERVER_NAME,
 'select query goes here'
)
```
Note that the SQL portion is single-quoted. That means you may have to escape embedded quotes by doubling them. For example:
```
select *
from openquery
(LINKED_SERVER_NAME,
 '
 select SomeTextField
 from SomeTable
 where SomeDateField = ''20141014''
 '
)
```

Selecting a sequence NEXTVAL for multiple rows

I am building a SQL Server job to pull data from SQL Server into an Oracle database through a linked server. The table I need to populate has a sequence for the name ID, which is my primary key. I'm having trouble figuring out a way to do this simply, without some lengthy code. Here's what I have so far for the SELECT portion (some actual names obfuscated):
```
SELECT (SELECT NEXTVAL FROM OPENQUERY(MYSERVER,
        'SELECT ORCL.NAME_SEQNO.NEXTVAL FROM DUAL')),
       psn.BirthDate, psn.FirstName,
       psn.MiddleName, psn.LastName, c.REGION_CODE
FROM Person psn
LEFT JOIN MYSERVER..ORCL.COUNTRY c ON c.COUNTRY_CODE = psn.Country
```
MYSERVER is the linked Oracle server; ORCL is obviously the schema. Person is a local table in the SQL Server database where the query is being executed.
When I run this query, I get the same exact value for all records for the NEXTVAL. What I need is for it to generate a new value for each returned record.
I found this similar question, with its answers, but am unsure how to apply it to my case (if even possible): Query several NEXTVAL from sequence in one statement
Put it in a SQL scalar function. Example:
```
CREATE function [dbo].SEQ_PERSON()
returns bigint as
begin
    return
    ( select NEXTVAL
      from openquery(oraLinkedServer, 'select SEQ_PERSON.NEXTVAL FROM DUAL')
    )
end
```
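A hypothetical usage sketch, assuming the function above compiles against your linked server (SQL Server evaluates the scalar function once per returned row):

```
SELECT dbo.SEQ_PERSON() AS NAME_ID,  -- fresh sequence value for each row
       psn.BirthDate, psn.FirstName
FROM Person psn;
```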
I ended up having to iterate through all the records and set the ID value individually. Messy and slow, but it seems to be the only option in this scenario.
Very easy: just use a CURSOR to iterate with this code:
```
SELECT NEXTVAL AS SQ from OPENQUERY(MYSERVER, 'SELECT AC2012.NAME_SEQNO.NEXTVAL FROM DUAL')
```
So you can embed this SELECT statement in any SQL statement and iterate over it with the CURSOR.
PS:
```
DECLARE SQCURS CURSOR
FOR SELECT (SELECT NEXTVAL AS SQ FROM OPENQUERY(MYSERVER,
            'SELECT ORCL.NAME_SEQNO.NEXTVAL FROM DUAL')),
           psn.BirthDate, psn.FirstName, psn.MiddleName, psn.LastName, c.REGION_CODE
    FROM Person psn
    LEFT JOIN MYSERVER..ORCL.COUNTRY c ON c.COUNTRY_CODE = psn.Country

OPEN SQCURS
FETCH NEXT FROM SQCURS;
```
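For completeness, a hedged sketch of the full fetch loop this answer implies, drawing the sequence value inside the loop so each row gets a fresh one; the target table `MYSERVER..ORCL.NAME_TABLE`, the column list, and the variable types are hypothetical:

```
DECLARE @seq bigint, @birth date, @first nvarchar(50);

DECLARE SQCURS CURSOR FOR
    SELECT psn.BirthDate, psn.FirstName
    FROM Person psn;

OPEN SQCURS;
FETCH NEXT FROM SQCURS INTO @birth, @first;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- One round trip to the linked Oracle server per row.
    SELECT @seq = NEXTVAL
    FROM OPENQUERY(MYSERVER, 'SELECT ORCL.NAME_SEQNO.NEXTVAL FROM DUAL');

    INSERT INTO MYSERVER..ORCL.NAME_TABLE (NAME_ID, BIRTH_DATE, FIRST_NAME)
    VALUES (@seq, @birth, @first);

    FETCH NEXT FROM SQCURS INTO @birth, @first;
END

CLOSE SQCURS;
DEALLOCATE SQCURS;
```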
I hope that helps.

Alternative SQL ways of looking up multiple items of known IDs?

Is there a better solution to the problem of looking up multiple known IDs in a table:
```
SELECT * FROM some_table WHERE id='1001' OR id='2002' OR id='3003' OR ...
```
I can have several hundred known items. Ideas?
```
SELECT * FROM some_table WHERE ID IN ('1001', '1002', '1003')
```
and if your known IDs are coming from another table:
```
SELECT * FROM some_table WHERE ID IN (
    SELECT KnownID FROM some_other_table WHERE someCondition
)
```
The first (naive) option:
```
SELECT * FROM some_table WHERE id IN ('1001', '2002', '3003' ... )
```
However, we should be able to do better. IN performs poorly when you have a lot of items, and you mentioned hundreds of these IDs. What creates them? Where do they come from? Can you write a query that returns this list? If so:
```
SELECT *
FROM some_table
INNER JOIN ( your query here ) filter ON some_table.id = filter.id
```
See Arrays and Lists in SQL Server 2005
ORs are notoriously slow in SQL.
Your question is short on specifics, but depending on your requirements and constraints I would build a look-up table with your IDs and use the EXISTS predicate:
```
select t.id from some_table t
where EXISTS (select * from lookup_table l where t.id = l.id)
```
For a fixed set of IDs you can do:
```
SELECT * FROM some_table WHERE id IN (1001, 2002, 3003);
```
For a set that changes each time, you might want to create a table to hold them and then query:
```
SELECT * FROM some_table WHERE id IN
    (SELECT id FROM selected_ids WHERE key=123);
```
Another approach is to use collections - the syntax for this will depend on your DBMS.
Finally, there is always this "kludgy" approach:
```
SELECT * FROM some_table WHERE '|1001|2002|3003|' LIKE '%|' || id || '|%';
```
In Oracle, I always put the IDs into a TEMPORARY TABLE to perform massive SELECTs and DML operations:
```
CREATE GLOBAL TEMPORARY TABLE t_temp (id INT)

SELECT *
FROM mytable
WHERE mytable.id IN
(
    SELECT id
    FROM t_temp
)
```
You can fill the temporary table in a single client-server roundtrip using Oracle collection types.
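A sketch of the collection idea in Oracle; the type name `t_id_table` is illustrative, and in practice the client would bind the collection as a parameter rather than listing literals:

```
-- A SQL object type the client can bind in a single round trip.
CREATE TYPE t_id_table AS TABLE OF INT;
/

SELECT *
FROM mytable
WHERE mytable.id IN (
    SELECT COLUMN_VALUE
    FROM TABLE(t_id_table(1001, 2002, 3003))
);
```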
We have a similar issue in an application written for MS SQL Server 7. Although I dislike the solution used, we're not aware of anything better...
'Better' solutions exist in 2008 as far as I know, but we have Zero clients using that :)
We created a table valued user defined function that takes a comma delimited string of IDs, and returns a table of IDs. The SQL then reads reasonably well, and none of it is dynamic, but there is still the annoying double overhead:
1. Client concatenates the IDs into the string
2. SQL Server parses the string to create a table of IDs
There are lots of ways of turning '1,2,3,4,5' into a table of IDs, but the Stored Procedure which uses the function ends up looking like...
```
CREATE PROCEDURE my_road_to_hell @IDs AS VARCHAR(8000)
AS
BEGIN
    SELECT
        *
    FROM
        myTable
    INNER JOIN
        dbo.fn_split_list(@IDs) AS [IDs]
            ON [IDs].id = myTable.id
END
```
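For reference, one hypothetical implementation of such a split function, written as a plain WHILE loop so it works on older SQL Server versions (2016+ could use STRING_SPLIT instead):

```
-- Illustrative only: turns '1,2,3,4,5' into a single-column table of ints.
CREATE FUNCTION dbo.fn_split_list (@IDs VARCHAR(8000))
RETURNS @result TABLE (id INT)
AS
BEGIN
    DECLARE @pos INT;

    WHILE LEN(@IDs) > 0
    BEGIN
        SET @pos = CHARINDEX(',', @IDs);
        IF @pos = 0
        BEGIN
            -- Last (or only) item in the list.
            INSERT INTO @result (id) VALUES (CAST(@IDs AS INT));
            SET @IDs = '';
        END
        ELSE
        BEGIN
            INSERT INTO @result (id) VALUES (CAST(LEFT(@IDs, @pos - 1) AS INT));
            SET @IDs = SUBSTRING(@IDs, @pos + 1, LEN(@IDs));
        END
    END

    RETURN;
END
```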
The fastest is to put the IDs in another table and JOIN:
```
SELECT some_table.*
FROM some_table INNER JOIN some_other_table ON some_table.id = some_other_table.id
```
where some_other_table would have just one field (ids) and all values would be unique