Running SQL script within R

I set up a connection from R to Microsoft SQL Server Studio using the RODBC package. While I am able to run simple SQL queries directly from R, I find that more complex SQL queries containing special characters, such as the "#" used when declaring a temp table, tend to return an error from R. I have tried to escape the character within R itself (by placing it in quotes); however, this fails, I guess because SQL cannot interpret the escape characters.
My goal is to perform soundex/fuzzy matching of some client records against the clients in the database (~3M rows). I tried doing this directly with the stringdist package in R, but the matching process exhausts my RAM (16GB), which is why I have resorted to matching the data within SQL itself. I could easily have done this in SQL alone; however, I need to set this up in R so that non-technical individuals can run the R script, query the database, and perform further work on the resulting dataset.
I have tried the suggestion in this post but did not find it helpful in resolving this issue.
Any tips on how to escape SQL special characters like the # symbol would be useful.
I get this error in R:
1 "42000 102 [Microsoft][ODBC SQL Server Driver][SQL Server]Incorrect syntax near 'go'."
2 "[RODBC] ERROR: Could not SQLExecDirect '\nSET DATEFORMAT dmy; \ngo\nDECLARE @VerifyClientID TABLE (firstname varchar(100), middlename varchar(100), lastname varchar(100)~
The script:
SET DATEFORMAT dmy;
go
DECLARE @VerifyClientID TABLE (firstname varchar(100), middlename varchar(100), lastname varchar(100), dob date, mobile varchar(100), ID int)
INSERT INTO @VerifyClientID (firstname, lastname, mobile, ID)
VALUES
('JOHN','DOE','0444 444 444',1)
drop table if exists #clientTABLE
select v.ID, p.ABC_NCMID, P.ABC_RowID, P.ABC_FirstName, P.ABC_GuardianName, P.ABC_LastName, P.ABC_SexCode_ID, P.ABC_CellPhone, P.ABC_DOB, ABC_StreetAddress, ABC_City, ABC_Zip, ABC_SSN, ABC_HomePhone
into #clientTABLE
from @VerifyClientID V
inner join dbo.DV_Person P on soundex(V.firstname) = soundex(P.ABC_FirstName)
and soundex(V.lastname) = soundex(P.ABC_LastName)
and (convert(varchar,replace(replace(P.ABC_CELLPHONE,' ',''),0,'')) = convert(varchar,replace(replace(V.mobile,' ',''),0,''))
or convert(varchar,replace(replace(P.ABC_HomePhone,' ',''),0,'')) = convert(varchar,replace(replace(V.mobile,' ',''),0,''))
)
where 1=1
and p.ABC_NCMID in (select Per.PER_CLIENTID from AtlasPublic.View_UODS_Person Per)
and P.ABC_IsPatient = 1
select distinct ID
,firstname
,middlename
,lastname
,dob
,mobile
, (SELECT TOP 1 ABC_NCMID FROM #clientTABLE WHERE id = v.id) as MatchedID
, (SELECT TOP 1 convert(varchar, ABC_DOB, 103) FROM #clientTABLE WHERE id = v.id) as MatchedDOB
, (SELECT TOP 1 case when ABC_SexCode_ID = 8089 then 'Male'
when ABC_SexCode_ID = 8088 then 'Female'
else null end FROM #clientTABLE WHERE id = v.id) as MatchedSEX
, (SELECT TOP 1 ABC_StreetAddress FROM #clientTABLE WHERE id = v.id) as MatchedStreet
, (SELECT TOP 1 ABC_City FROM #clientTABLE WHERE id = v.id) as MatchedCity
, (SELECT TOP 1 ABC_Zip FROM #clientTABLE WHERE id = v.id) as MatchedZip
from @VerifyClientID V
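The error message points at 'go' rather than at any special character. GO is not a T-SQL statement; it is a batch separator understood by client tools such as SSMS and sqlcmd, so the server rejects it when the whole script arrives as one string over ODBC. Below is a minimal sketch of splitting a script at GO lines so that each batch can be sent separately. It is written in Python purely for illustration (the same idea carries over to R). split_batches is a hypothetical helper, not part of any library, and it assumes GO never sits on a line of its own inside a string literal or comment.

```python
import re
from typing import List

# A line that contains only GO (any case, optional surrounding whitespace)
# is treated as a batch separator. This is an assumption of the sketch:
# it does not understand comments, string literals, or "GO <count>".
_GO_LINE = re.compile(r"^\s*GO\s*$", re.IGNORECASE)


def split_batches(script: str) -> List[str]:
    """Split a T-SQL script into batches at GO separator lines."""
    batches: List[str] = []
    current: List[str] = []
    for line in script.splitlines():
        if _GO_LINE.match(line):
            batch = "\n".join(current).strip()
            if batch:  # skip empty batches, e.g. two GO lines in a row
                batches.append(batch)
            current = []
        else:
            current.append(line)
    tail = "\n".join(current).strip()
    if tail:
        batches.append(tail)
    return batches


script = """SET DATEFORMAT dmy;
go
SELECT 1 AS first_batch
GO
SELECT 2 AS second_batch
"""
batches = split_batches(script)
for number, batch in enumerate(batches, start=1):
    print(number, repr(batch))
```

Each batch would then be sent in order on the same connection. Temp tables such as #clientTABLE survive across batches on one connection, while variables created with DECLARE do not, so everything that uses a declared variable has to stay in the same batch as its DECLARE.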

Related

Loop a sql query with different value set to a variable in a query?

I have a query that compares the Customers table in the old database with the one in the new database, for customers who belong to a specific department, and retrieves the differences between the two tables. The query is below:
DECLARE @departmentid int = 2001
SELECT Distinct DB.[CUSTOMER_ID],DB.[CUSTOMER_AGE]
FROM [PROD\SQL01].[PRD_Live].[dbo].[Customers] DB
WHERE DB.[DEPARTMENT_ID]= @departmentid and
DB.[CUSTOMER_ID] NOT IN (SELECT Distinct [CUSTOMER_ID]
FROM [NEWPRD_Live].[dbo].[Customers]
WHERE [DEPARTMENT_ID]=@departmentid)
There are 40 department ID values (2001, 2002, ..., 2040) that have to be set in the variable @departmentid. Currently I change the department ID manually and execute the query 40 times, once per department. Is it possible to put all the departments into one variable, run the query for each department ID in a loop, and get all the results at once?
Try inserting the department IDs into a #temp table and using it in your WHERE clause:
Where DEPARTMENT_ID In (Select ID From #tmpDepartment)
Edit :
Since you cannot use # tables here is a loop :
DECLARE @DEPARTMENT_ID int = 2001,
@limit int = 2040
While @DEPARTMENT_ID <= @limit
Begin
SELECT Distinct DB.[CUSTOMER_ID],DB.[CUSTOMER_AGE]
FROM [PROD\SQL01].[PRD_Live].[dbo].[Customers] DB
WHERE DB.[DEPARTMENT_ID]= @DEPARTMENT_ID and
DB.[CUSTOMER_ID] NOT IN (SELECT Distinct [CUSTOMER_ID]
FROM [NEWPRD_Live].[dbo].[Customers]
WHERE [DEPARTMENT_ID]=@DEPARTMENT_ID)
Set @DEPARTMENT_ID = @DEPARTMENT_ID + 1
End
But it is not really a good design.
If I understand your requirement correctly, you want the list of CUSTOMER_IDs for each DEPARTMENT_ID. Add DEPARTMENT_ID to the DISTINCT column list and you will get the result you want:
SELECT Distinct
DB.[DEPARTMENT_ID],
DB.[CUSTOMER_ID],
DB.[CUSTOMER_AGE]
FROM [PROD\SQL01].[PRD_Live].[dbo].[Customers] DB
where DB.[DEPARTMENT_ID] IN (2001, 2002, 2040)
and DB.[CUSTOMER_ID] NOT IN
(
SELECT Distinct [CUSTOMER_ID]
FROM [NEWPRD_Live].[dbo].[Customers] n
where n.[DEPARTMENT_ID] = DB.[DEPARTMENT_ID]
)
ORDER BY DB.[DEPARTMENT_ID], DB.[CUSTOMER_ID]
Put your IDs in a comma-separated string, then use the string_split function to turn them into rows and filter DEPARTMENT_ID against them:
declare @ids varchar(max) = '2001,2002,2003'
SELECT Distinct DB.[DEPARTMENT_ID], DB.[CUSTOMER_ID], DB.[CUSTOMER_AGE]
FROM [PROD\SQL01].[PRD_Live].[dbo].[Customers] DB
WHERE DB.[DEPARTMENT_ID] IN (select value from string_split(@ids, ',')) and
DB.[CUSTOMER_ID] NOT IN (SELECT Distinct [CUSTOMER_ID]
FROM [NEWPRD_Live].[dbo].[Customers]
WHERE [DEPARTMENT_ID]=DB.[DEPARTMENT_ID])
Note: string_split requires at least SQL Server 2016. If you are using an earlier version, you must define your own split-string function, like this one: T-SQL split string
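To see the single-query idea from the answers above run end to end, here is a small self-contained sketch using Python's built-in sqlite3 module. The table names old_customers and new_customers and the sample rows are made up for illustration, and SQLite stands in for SQL Server, so the syntax is not identical to T-SQL. It also swaps NOT IN for NOT EXISTS, because NOT IN returns no rows at all if the subquery ever yields a NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE old_customers (department_id INT, customer_id INT, customer_age INT);
CREATE TABLE new_customers (department_id INT, customer_id INT, customer_age INT);
INSERT INTO old_customers VALUES (2001, 1, 30), (2001, 2, 41), (2002, 3, 25), (2002, 4, 52);
INSERT INTO new_customers VALUES (2001, 1, 30), (2002, 4, 52);
""")

# All departments are handled in one pass; the IDs are bound as parameters
# rather than pasted into the SQL text.
department_ids = [2001, 2002]
placeholders = ", ".join("?" for _ in department_ids)
query = f"""
SELECT DISTINCT o.department_id, o.customer_id, o.customer_age
FROM old_customers AS o
WHERE o.department_id IN ({placeholders})
  AND NOT EXISTS (
      SELECT 1
      FROM new_customers AS n
      WHERE n.department_id = o.department_id
        AND n.customer_id = o.customer_id
  )
ORDER BY o.department_id, o.customer_id
"""
missing = conn.execute(query, department_ids).fetchall()
print(missing)
```

Because the department ID is part of the result, one result set covers every department and no loop is needed.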

How to write if exist statement that would run a different select depending if it exists or not

I am trying to convert a SQL IF EXISTS statement into an SSRS-valid format to run a report on CRM.
CRM doesn't accept the report on upload if it contains an IF EXISTS construct, and I'm having trouble figuring out what I can use in its place.
IF EXISTS(select * from dbo.FC where dbo.FC.ContactID in (select dbo.AV.so_contactid from dbo.AV))
begin
select [STATEMENT 1]
from dbo.AV CRMAF_so_AV join
dbo.FC c
on CRMAF_so_AV.so_contactid = c.ContactID;
end
else
begin
select [STATEMENT 2]
from dbo.AV CRMAF_so_AV join
dbo.FA c
on CRMAF_so_AV.so_contactid = c.AccountID;
end;
I want to run select [STATEMENT 1] if the condition is true; otherwise I want to run select [STATEMENT 2].
I have managed to get this to work by using LEFT JOINs instead of a JOIN.
select [STATEMENT 1 + 2 all columns needed]
from dbo.AV CRMAF_so_AV
left join dbo.FC c on CRMAF_so_AV.so_contactid = c.ContactID
left join dbo.FA a on CRMAF_so_AV.so_contactid = a.AccountID;
This now runs whether it's an account or a contact.
Try this -
You have to put your entire statements in @statement1 and @statement2.
declare @statement1 as varchar(max);
declare @statement2 as varchar(max);
SET @statement1 = 'SELECT 1'
SET @statement2 = 'SELECT 2'
IF EXISTS(select * from dbo.FC where dbo.FC.ContactID in (select dbo.AV.so_contactid from dbo.AV))
BEGIN
EXEC (@statement1)
END
ELSE
BEGIN
EXEC (@statement2)
END
Instead of using IF EXISTS, can you not get a count of the records that meet the criteria and then, if it is 1 or greater, run a different query as opposed to when it equals 0?
Let me know if I am missing something about what you are trying to achieve.
Sorry, I am unable to post comments because my account is new and my reputation is low.
I think you need something like this:
WITH PreSelection AS (
SELECT
AV.ID AS AVID,
(SELECT TOP(1) c.ContactID FROM dbo.FC c WHERE c.ContactID = AV.so_contactid) AS ContactID,
(SELECT TOP(1) a.AccountID FROM dbo.FA a WHERE a.AccountID = AV.so_contactid) AS AccountID
FROM dbo.AV
)
SELECT
p.AVID,
ISNULL(
CASE WHEN p.ContactID IS NULL
THEN (SELECT TOP(1) AccountName FROM dbo.FA WHERE FA.AccountID = p.AccountID)
ELSE (SELECT TOP(1) LTRIM(RTRIM(ISNULL(FirstName, '') + ' ' + ISNULL(LastName, ''))) FROM dbo.FC WHERE FC.ContactID = p.ContactID)
END, '') AS ContactName
FROM PreSelection p
A few things to note:
When SSRS evaluates a query, it expects the results to always have the same structure in terms of column names and types.
So you CANNOT do something like this..
IF @x=@y
BEGIN
SELECT Name, Age FROM employees
END
ELSE
BEGIN
SELECT DeptID, DeptName, DeptEMpCOunt FROM departments
END
... as this will return different types and column names and column counts.
What you CAN DO is this..
DECLARE @t TABLE(resultType int, colA varchar(128), colB int, colC varchar(128), colD int)
IF @x=@y
BEGIN
INSERT INTO @t(resultType, colA, colB)
SELECT 1 as resultType, Name, Age FROM employees
END
ELSE
BEGIN
INSERT INTO @t(resultType, colB, colC, colD)
SELECT 2 AS resultType, DeptID, DeptName, DeptEmpCount FROM departments
END
SELECT * FROM @t
All we are doing is creating a table that can handle all variations of the data and putting the results into whichever columns can accommodate each data type.
This always returns the same data structure, so SSRS will be happy. You then need to decide in the report which columns to display based on what gets returned, which is why I added the result type to the results: you can test it from within the report.

Stored Proc Query Timing Out in .net but not sql server

I have the following query, and it is timing out in .NET but executing fine in SQL Server. Does .NET not like temp tables or something? When I run it from .NET I get a timeout error, and I don't understand what is going on.
SET DATEFORMAT dmy
declare @AbsenceReasonRestrictions varchar(500)
set @AbsenceReasonRestrictions ='1'
create table #absence
(
record_id INT,
emp_no int,
staff_no varchar(max),
emp_name varchar(max),
details text,
leave_reason int,
leave_reason_desc varchar(30),
current_status int,
date_added datetime,
dept int,
dept_desc varchar(100),
location int,
location_desc varchar(100),
division int,
division_desc varchar(100),
emptype int,
emptype_desc varchar(100),
contype int,
contype_desc varchar(100),
conclass int,
conclass_desc varchar(100),
line_manager int,
line_manager_name varchar(510)
)
INSERT INTO #absence (record_id,
emp_no,
staff_no,
emp_name,
details,
leave_reason,
leave_reason_desc,
current_status,
date_added,
dept,
dept_desc,
location,
location_desc,
division,
division_desc,
emptype,
emptype_desc,
contype,
contype_desc,
conclass,
conclass_desc,
line_manager,
line_manager_name)
select ua.record_id,
ua.emp_no,
e.staff_no,
rtrim(e.surname)+', '+rtrim(e.forename1),
ua.details,
ua.absence_reason,
ar.desc_,
ua.current_status,
ua.date_added,
c.dept,
rtrim(dept.desc_),
c.location,
rtrim(loc.desc_),
c.division,
rtrim(div.desc_),
c.emptype,
rtrim(emptype.desc_),
c.type,
rtrim(contype.desc_),
c.classification,
rtrim(conclass.desc_),
ua.manager_user_id,
(select rtrim(e.surname) + ', ' + rtrim(e.forename1) as emp_name from employee e
inner join userlist_mss um on e.emp_no = um.pams_id
where um.record_id = ua.manager_user_id)
from ess_absence_requests ua
inner join employee e on e.emp_no=ua.emp_no
inner join absreas ar on ar.code=ua.absence_reason
inner join contract c on ua.emp_no = c.emp_no
join dept on c.dept=dept.code
join location loc on c.location=loc.code
join division div on c.division=div.code
join emptype on c.emptype=emptype.code
join contype on c.type=contype.code
join conclass on c.classification=conclass.code
where e.emp_no like '%'
AND c.main_contract=1
AND ua.current_status in (1,2,3,4)
AND (dbo.fn_XmlElementDateValue(ua.details, 'start_date') >='1/10/2013')
AND (dbo.fn_XmlElementDateValue(ua.details, 'end_date') <='31/10/2013')
AND e.active_leaver like 'ACTIVE'
AND e.emp_no like '%'
and c.dept like '%'
and c.location like '%'
and c.division like '%'
and c.emptype like '%'
and c.classification like '%'
and c.type like '%'
and ua.emp_no in (select employee_id from userlist_mss_employee_access_rights where manager_id=2)
order by (dbo.fn_XmlElementDateValue(ua.details, 'start_date'))
if @AbsenceReasonRestrictions!=''
begin
set @AbsenceReasonRestrictions=','+@AbsenceReasonRestrictions+','
delete #absence where charindex(','+cast(leave_reason as varchar(10))+',', @AbsenceReasonRestrictions) = 0
end
select record_id,
emp_no,
staff_no,
emp_name,
details,
leave_reason,
leave_reason_desc,
current_status,
date_added,
dept_desc,
location_desc,
division_desc,
emptype_desc,
contype_desc,
conclass_desc,
line_manager,
line_manager_name from #absence
drop table #absence
select * from emp_anal
I think the problem is that Management Studio has no timeout limit for your query, but when you run it from .NET code you are limited by the connection timeout. Try changing the timeout directly in your connection string or where the connection is initialized.
More info:
http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlconnection.connectiontimeout(v=vs.110).aspx
Changing SqlConnection timeout
I guess it's because your direct SQL call (using Management Studio?) uses a different query plan than your .NET code, and the plan .NET uses seems to be no good.
To resolve this, try updating the table statistics (UPDATE STATISTICS) for the tables used by your joins, and run DBCC FREEPROCCACHE to clear the cached query plans.

What part of this short SQL script run on a production is not optimal?

I am running a script on our production database that references two tables: our table of users (3,700 of them) and the table of quotes they have made (280,000 of them). The quote is the main object in our application, a very large object for which many data tables are created and filled. My goal is to clean the database of all quotes except those made by a small group of users.
I first create a temp table containing the IDs of those users (it is also used elsewhere in the script), and then a cursor that runs through the main quotes table and, for each quote not created by the user group, does the necessary cleanup.
I see that this script is going to run for approximately 26 hours, which I find peculiar since a full database restore takes me about 15 minutes, and I would guess that is where the heaviest SQL is executed. The database, though, is larger than 100GB.
Is there some part of the script that is terribly non-optimal, or do you have a suggestion for how this could be done with a much shorter execution time?
We are running SQL Server 2008 R2.
Here's the sketch of the script.
CREATE table #UsersIdsToStay(user_id int)
INSERT INTO #UsersIdsToStay
select user_id
from users
where user_name like '%SOMESTRING '
-----
declare @QuoteId int
declare @UserId int
declare QuoteCursor cursor for
select DISTINCT QuoteId, UserId
from QuotesTable
where UserId not in
(
select * from #UsersIdsToStay
)
open QuoteCursor
while 1=1
begin
fetch QuoteCursor into @QuoteId, @UserId
if @@fetch_status != 0 break
-- all the deletions from related tables are executed here using @QuoteId and @UserId
exec('delete from QuoteHistory where QuoteId = ' + @QuoteId + ' and UserId = ' + @UserId )
exec('delete from QuoteRevisions where QuoteId = ' + @QuoteId + ' and UserId = ' + @UserId )
exec('delete from QuoteItems where QuoteId = ' + @QuoteId + ' and UserId = ' + @UserId )
....
end
close QuoteCursor;
deallocate QuoteCursor
The cursor restricts you to deleting a single User_Id/Quote_Id combination at a time from each related table. By using joins you will be able to delete en masse.
You could also replace the temp table with a common table expression (CTE). If this is a one-off script the temp table should be OK, but for production code I would use a CTE.
if OBJECT_ID('tempdb..#quotesToDelete') is not null
drop table #quotesToDelete
select distinct
ut.user_id,
qt.quote_id
into #quotesToDelete
from dbo.QuotesTable qt (nolock)
inner join dbo.UsersTable ut (nolock)
on qt.user_id = ut.user_id
where ut.user_name not like '%SOMESTRING '
-- all the deletions from related tables are executed here, joining on #quotesToDelete
-- relatedtableA
delete a
from relatedtableA a
inner join #quotesToDelete b
on a.user_id = b.user_id
and a.quote_id = b.quote_id
-- relatedtableB
...
Since you don't show the deletes, I cannot show you exactly how to avoid the cursor.
But you could do this without a temp table pretty easily:
select DISTINCT QuoteId, UserId
from QuotesTable
where UserId not in
(
select user_id
from users
where user_name like '%SOMESTRING '
)
or
select DISTINCT QuoteId, UserId
from QuotesTable
left join users
on users.user_id = QuotesTable.UserId
and user_name like '%SOMESTRING '
where users.user_id is null
The problem is the cursor, and you don't need it:
CREATE table #QuotesToDelete(QuoteId int, UserID int)
insert into #QuotesToDelete
select DISTINCT QuoteId, UserId
from QuotesTable
left join users
on users.user_id = QuotesTable.UserId
and user_name like '%SOMESTRING '
where users.user_id is null
delete QH
from QuoteHistory QH
join #QuotesToDelete
on #QuotesToDelete.QuoteId = QH.QuoteId
and #QuotesToDelete.UserID = QH.UserID
delete QR
from QuoteRevisions QR
join #QuotesToDelete
on #QuotesToDelete.QuoteId = QR.QuoteId
and #QuotesToDelete.UserID = QR.UserID
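For anyone who wants to try the key-table approach on a toy dataset first, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows and the '%KEEP' pattern are invented for illustration, and SQLite stands in for SQL Server. SQLite has no DELETE ... FROM ... JOIN form, so the delete is written with a correlated EXISTS instead, but the idea is the same: build the list of keys once, then issue one set-based delete per related table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INT, user_name TEXT);
CREATE TABLE QuotesTable (QuoteId INT, UserId INT);
CREATE TABLE QuoteHistory (QuoteId INT, UserId INT, note TEXT);
INSERT INTO users VALUES (1, 'alice KEEP'), (2, 'bob');
INSERT INTO QuotesTable VALUES (10, 1), (20, 2), (21, 2);
INSERT INTO QuoteHistory VALUES (10, 1, 'a'), (20, 2, 'b'), (20, 2, 'c'), (21, 2, 'd');

-- Key table: quotes whose owner is NOT in the group of users to keep.
CREATE TEMP TABLE QuotesToDelete AS
SELECT DISTINCT q.QuoteId, q.UserId
FROM QuotesTable AS q
LEFT JOIN users AS u
  ON u.user_id = q.UserId AND u.user_name LIKE '%KEEP'
WHERE u.user_id IS NULL;
""")

# One statement removes every matching row; no per-quote loop is needed.
cur = conn.execute("""
DELETE FROM QuoteHistory
WHERE EXISTS (
    SELECT 1
    FROM QuotesToDelete AS d
    WHERE d.QuoteId = QuoteHistory.QuoteId
      AND d.UserId = QuoteHistory.UserId
)
""")
deleted = cur.rowcount
remaining = conn.execute(
    "SELECT QuoteId, UserId, note FROM QuoteHistory ORDER BY QuoteId"
).fetchall()
print(deleted, remaining)
```

The same delete would be repeated for each related table (QuoteRevisions, QuoteItems, and so on) against the same key table.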

Invalid table or object when doing query on temp table (Pervasive SQL)

I have a SP that inserts records into a temp table, then selects the records and returns them. The SQL is this.
I troubleshot it by removing the INSERT INTO statement and minimizing the SQL. The culprit is the SELECT * FROM #WorkFlow1. I have no idea why this does not work. I have just upgraded to the latest version of Pervasive server v10, if that helps, but this issue was present in 10.3 and it is still there. I must be missing something.
CREATE PROCEDURE "Connect_Workflow"(
:StartDate DATETIME, :EndDate DATETIME)
RETURNS(Patient varchar(100) ,
AccessionNo varchar(25)
);
BEGIN
CREATE TABLE #WorkFlow1
(Patient varchar(100) null,
AccessionNo varchar(25) null
);
INSERT INTO #Workflow1(
SELECT
rtrim(p.LastName),--+ '^' + rtrim(p.FirstName) + isnull('^' + rtrim(p.Initial), ''),
v.VisitID -- equiv to EncounterID
FROM visit v
join patient p on v.patientnumber = p.patientnumber
WHERE v.VisitYY = '99'
);
SELECT * FROM #WorkFlow1;
DROP TABLE #Workflow1;
END
Update: After commenting out the SELECT * FROM #WorkFlow1; it still gives an invalid table error. If I remove both the INSERT INTO and the SELECT *, the error finally goes away. There must be an error in how the table is referenced.
Remove the DROP TABLE #Workflow1; from your query.
I believe it's dropping the table before the SP returns the data.
Okay, I figured it out. Although the procedure should work as written, Pervasive in fact recommends using SELECT INTO, something like this:
CREATE PROCEDURE "Connect_Workflow"(
:StartDate DATETIME, :EndDate DATETIME)
RETURNS(Patient varchar(100) ,
AccessionNo varchar(25)
);
BEGIN
SELECT
rtrim(p.LastName),--+ '^' + rtrim(p.FirstName) + isnull('^' + rtrim(p.Initial), ''),
v.VisitID -- equiv to EncounterID
FROM visit v
INTO #Workflow1
join patient p on v.patientnumber = p.patientnumber
WHERE v.VisitYY = '99'
);
SELECT * FROM #WorkFlow1;
END