I have a fairly long piece of SQL code that creates a number of temp tables. Some expressions occur multiple times across the different table creations. The expressions are constant, but they have an int at the end to change the result range, e.g.
WHERE getdate() between mfg_ww_begin_datetime and mfg_ww_end_datetime) -2
When I want to change my overall query, I have to go in and change each of these ints manually. Is there a way to set these ints at the top of my query, so that I change just one value and every later use references it?
Well I'm not the smartest, but this works after some more searching.
DECLARE @CurrentWW INT, @SampleSize INT, @RollingAvg INT
SET @CurrentWW = 7
SET @SampleSize = 25
SET @RollingAvg = 10
Then use those variable names in the rest of the query. They can be referenced as many times as needed.
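A minimal sketch of how one of those variables might replace the hard-coded int from the question. The @Calendar table variable and its single row are made up for illustration; only the column names come from the snippet in the question, and the inline DECLARE initialisation assumes SQL Server 2008 or later.

```sql
-- Hypothetical offset, set once at the top of the script
DECLARE @CurrentWW INT = 2;

-- Stand-in for the real work-week calendar table (assumed structure)
DECLARE @Calendar TABLE (
    mfg_ww                INT,
    mfg_ww_begin_datetime DATETIME,
    mfg_ww_end_datetime   DATETIME
);

-- One made-up work week that contains the current date
INSERT INTO @Calendar (mfg_ww, mfg_ww_begin_datetime, mfg_ww_end_datetime)
VALUES (30, DATEADD(DAY, -3, GETDATE()), DATEADD(DAY, 4, GETDATE()));

-- The variable replaces the literal "-2" wherever the offset is needed
SELECT (SELECT mfg_ww
        FROM @Calendar
        WHERE GETDATE() BETWEEN mfg_ww_begin_datetime AND mfg_ww_end_datetime) - @CurrentWW AS TargetWW;
```

Keep in mind that variables only live for the current batch, so a GO between the DECLARE and a later statement would leave them undefined.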
I am trying to update multiple rows with random 9 digit number using the following code.
UPDATE SGT_EMPLOYER
SET SSN = (CONVERT(NUMERIC(10,0),RAND() * 899999999) + 100000000)
WHERE EMPLOYER_ACCOUNT_ID = 123456789;
Expected result: the query should update 300 rows with 300 random 9 digit numbers.
Actual: the query updates 300 rows with the same number, as the RAND() function is executed only once.
Please help. Thank You.
As you already figured out yourself, RAND is a run-time constant function in SQL Server. It means that it is called once per statement and the generated value is used for each affected row.
There are other functions that are called for each row. People often use NEWID, usually together with CHECKSUM, as a substitute for a random number, but I would not recommend it because the distribution of such random numbers is likely to be poor.
There is a good function specifically designed to generate random numbers: CRYPT_GEN_RANDOM. It is available since at least SQL Server 2008.
It generates a given number of random bytes.
In your case it would be convenient to have a random number as a float value in the range of [0;1], same as the value returned by RAND.
So, CRYPT_GEN_RANDOM(4) generates 4 random bytes as varbinary.
Convert them to int, divide by the maximum value of an unsigned 32-bit integer (4294967295), and add 0.5 to shift the range from [-0.5;+0.5] to [0;1]:
(CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5)
Your query becomes:
UPDATE SGT_EMPLOYER
SET SSN =
CONVERT(NUMERIC(10,0),
(CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5) * 899999999.0 + 100000000.0)
WHERE EMPLOYER_ACCOUNT_ID = 123456789;
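To see the difference between the two functions without touching real data, something like the following sketch can be run against a throwaway table variable. The @Demo table and the column aliases are made up for illustration.

```sql
-- Hypothetical five-row table, just to have several rows in one statement
DECLARE @Demo TABLE (Id INT);
INSERT INTO @Demo (Id) VALUES (1), (2), (3), (4), (5);

SELECT Id,
       -- RAND() is evaluated once for the whole statement
       RAND() AS SameForEveryRow,
       -- CRYPT_GEN_RANDOM is evaluated for each row
       CAST(CRYPT_GEN_RANDOM(4) AS int) / 4294967295.0 + 0.5 AS DiffersPerRow
FROM @Demo;
```

SameForEveryRow should show one value repeated on all five rows, while DiffersPerRow should change from row to row.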
Yes, RAND() is evaluated only once, before the rows are updated, not once for every row that is updated.
You can use a stored procedure that updates the rows one at a time, so that (CONVERT(NUMERIC(10,0),RAND() * 899999999) + 100000000) is evaluated separately for each row.
Sean Lange is 100% correct. However, if you want to quickly mask your SSN, perhaps the following using HashBytes() may help.
Example
Declare @Table table (SSN varchar(25))
Insert into @Table values
    ('070-99-12345'),
    ('123-45-67890')
Select SSN
      ,AsInt = abs(cast(HashBytes('MD5', SSN) as int))
From @Table
Returns
SSN             AsInt
070-99-12345    508860145
123-45-67890    843256257
I am using SQL Server Management Studio 2012. I work with medical records and need to de-identify reports. The reports are structured in a table with columns Report_Date, Report_Subject, Report_Text, etc... The string I need to update is in report_text and there are ~700,000 records.
So if I have:
"patient had an EKG on 04/09/2012"
I need to replace that with:
"patient had an EKG on [DEIDENTIFIED]"
I tried
UPDATE table
SET Report_Text = REPLACE(Report_Text, '____/___/____', '[DEIDENTIFIED]')
because I need to replace anything in there that looks like a date. It runs but doesn't actually replace anything, apparently because REPLACE only matches literal strings and I can't use the _ wildcard in it.
Any recommendations on this? Thanks in advance!
You can use PATINDEX to find the location of a date, then use SUBSTRING to extract it and REPLACE to replace it.
Since there may be multiple dates in the text, you have to run a WHILE loop until all the dates are replaced.
The SQL below works for all dates of the form MM/DD/YYYY.
-- Keep looping while any row still contains something shaped like MM/DD/YYYY
WHILE EXISTS( SELECT 1 FROM dbo.MyTable WHERE PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%',Report_Text) > 0 )
BEGIN
    UPDATE t
    SET Report_Text = REPLACE(Report_Text, DateToBeReplaced, '[DEIDENTIFIED]')
    FROM ( SELECT * ,
                  -- First date found in the row (10 characters long)
                  SUBSTRING(Report_Text,PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%',Report_Text), 10) AS DateToBeReplaced
           FROM dbo.MyTable AS a
           WHERE PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%',Report_Text) > 0
         ) AS t
END
I have tested the above SQL on a dummy table with a few rows. I don't know how it will scale for your data, but I recommend giving it a try.
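For anyone who wants to check the pattern on a single string before running the loop against 700,000 rows, here is a small sketch of one pass. The variable names are made up for illustration; the sample text and the pattern are the ones used above.

```sql
-- Sample text from the question
DECLARE @Text VARCHAR(200) = 'patient had an EKG on 04/09/2012';
-- Same MM/DD/YYYY pattern as in the loop above
DECLARE @Pattern VARCHAR(100) = '%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%';
-- 1-based position of the first match, or 0 if there is none
DECLARE @Pos INT = PATINDEX(@Pattern, @Text);

SELECT @Pos AS DatePosition,
       SUBSTRING(@Text, @Pos, 10) AS DateFound,
       REPLACE(@Text, SUBSTRING(@Text, @Pos, 10), '[DEIDENTIFIED]') AS Deidentified;
```

The pattern only checks for digits and slashes, so it will also match strings such as 99/99/9999 that are not valid dates. For de-identification that is probably acceptable, but it is worth knowing.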
To keep it simple, assume that a number represents an identifying element in the string so look for the position of the first number in the string and the position of the last number in the string. Not sure if this will apply to your entire set of records but here is the code ...
I created two test strings ... the one you supplied and one with the date at the beginning of the string.
Declare @tstString varchar(100)
Set @tstString = 'patient had an EKG on 04/09/2012'
-- The second assignment overrides the first; comment one out to test the other
Set @tstString = '04/09/2012 EKG for patient'
Select @tstString
    -- Calculate 1st Occurrence of a Number
    ,PATINDEX('%[0-9]%',@tstString)
    -- Calculate last Occurrence of a Number
    ,LEN(@tstString) - PATINDEX('%[0-9]%',REVERSE(@tstString))
    ,CASE
        -- No numbers in the string, return the string
        WHEN PATINDEX('%[0-9]%',@tstString) = 0 THEN @tstString
        -- Number is the first character, so find the last number and remove everything up to it
        WHEN PATINDEX('%[0-9]%',@tstString) = 1 THEN
            CONCAT('[DEIDENTIFIED]',SUBSTRING(@tstString, LEN(@tstString)-PATINDEX('%[0-9]%',REVERSE(@tstString))+2,LEN(@tstString)))
        -- Just select string up to the first number
        ELSE CONCAT(SUBSTRING(@tstString,1,PATINDEX('%[0-9]%',@tstString)-1),'[DEIDENTIFIED]')
    END AS 'newString'
As you can see, this is messy in SQL.
I would rather achieve this with a parser service and move the data with SSIS and call the service.
If I have a select statement with a scalar function in it that is used in various calculations, does that scalar function get called multiple times? If it does, is there a way to optimize this so it only calls the function once per select? In my real query it will be called thousands of times, multiplied by 6 uses per select.
For example:
SELECT
    [dbo].[fn_Days](@Account) + u.[DayRate],
    [dbo].[fn_Days](@Account) / u.[WorkDays]
FROM [dbo].[tblUnit] u
All fn_days does is return an int of days worked.
Yes, the scalar function gets called multiple times the way you have coded it. One way around that is to wrap it in a subquery like this:
SELECT t.[days] + t.[DayRate],
       t.[days] / t.[WorkDays]
FROM (
    SELECT
        [dbo].[fn_Days](@Account) as days,
        u.[DayRate],
        u.[WorkDays]
    FROM [dbo].[tblUnit] u) as t
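This way fn_Days is referenced only once, so it should be called once per row rather than twice, or six times like you mentioned. The optimizer may still expand the subquery, so it is worth checking the execution plan.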
Hope this helps.
A deterministic function always returns the same value for a given parameter. If fn_Days is deterministic (not every function is), then, because the parameter is a variable that stays fixed for the whole query, you can call the function once before executing the query and use the result in the query instead of calling the function there.
DECLARE @Days int
SET @Days = [dbo].[fn_Days](@Account)
SELECT
    @Days + u.[DayRate],
    @Days / u.[WorkDays]
FROM [dbo].[tblUnit] u
declare @fieldForceCounter as int
declare @SaleDate as dateTime
declare @RandomNoSeed as decimal
set @fieldForceCounter = 1
set @SaleDate = '1 Jan 2009'
set @RandomNoSeed = 0.0
WHILE @fieldForceCounter <= 3
BEGIN
    while @SaleDate <= '1 Dec 2009'
    begin
        INSERT INTO MonthlySales(FFCode, SaleDate, SaleValue) VALUES(@fieldForceCounter, @SaleDate, RAND(@RandomNoSeed))
        set @SaleDate = @SaleDate + 1
        set @RandomNoSeed = Rand(@RandomNoSeed) + 1
    end
    set @SaleDate = '1 Jan 2009'
    set @fieldForceCounter = @fieldForceCounter + 1
END
GO
This T-SQL script is supposed to insert random values into the SaleValue column of the MonthlySales table.
But it inserts '1' every time.
What can be the problem?
Two problems:
Firstly, the rand() function returns a number between 0 and 1.
Secondly, when rand() is called multiple times in the same query (e.g. for multiple rows in an update statement), it usually returns the same number (which I suspect your algorithm above is trying to solve, by splitting it into multiple calls)
My favourite way around the second problem is to use a function that's guaranteed to return a unique value each time, like newid(), convert it to varbinary, and use it as the seed :)
Edit: after some testing, it seems you'll need to try a different datatype for @RandomNoSeed; float behaves somewhat differently from decimal, but still approaches a fixed value, so I'd recommend avoiding the use of @RandomNoSeed altogether and simply using:
INSERT INTO MonthlySales(FFCode, SaleDate, SaleValue)
VALUES(@fieldForceCounter, @SaleDate, RAND(convert(varbinary,newid())))
You have major issues here...
Decimal issues
The default precision/scale for decimal is (18,0), so you aren't storing any decimal part.
So you are only using RAND(0) for the 1st iteration and RAND(1) for all subsequent iterations, which return 0.943597390424144 and 0.713591993212924 respectively.
I can't recall how rounding/truncation applies, and I don't know what datatype SaleValue is, but rounding would give "1" every time.
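A quick way to see the effect of the missing scale is a sketch like the one below. The variable names and the decimal(18,6) alternative are only assumptions for illustration, not something from the question.

```sql
-- "decimal" with no arguments means decimal(18,0): no digits after the point
DECLARE @NoScale   AS decimal;
-- Hypothetical alternative with an explicit scale of 6
DECLARE @WithScale AS decimal(18,6);

SET @NoScale   = 0.713591;
SET @WithScale = 0.713591;

-- @NoScale comes back as a whole number; @WithScale keeps its fraction
SELECT @NoScale AS NoScale, @WithScale AS WithScale;
```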
Now, if you fix this and declare decimal correctly...
Seeding issues
RAND takes an integer seed. Seeding with 1.0001 or 1.3 or 1.999 gives the same value (0.713591993212924).
So, "Rand(1.713591993212924) + 1" = "RAND(1) + 1" = "1.713591993212924" for every subsequent iteration.
Back to square one...
To fix
Get rid of @RandomNoSeed
Either: Generate a random integer value using CHECKSUM(NEWID())
Or: generate a random float value using RAND() * CHECKSUM(NEWID()) (Don't care about seed now)
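As a rough sketch of the first option, the query below shows CHECKSUM(NEWID()) producing a different integer for each row, plus one possible way to squeeze it into a small range. The @Demo table, the aliases and the 0 to 99 range are made up for illustration.

```sql
-- Hypothetical five-row table, just to have several rows in one statement
DECLARE @Demo TABLE (Id INT);
INSERT INTO @Demo (Id) VALUES (1), (2), (3), (4), (5);

SELECT Id,
       -- Any int, positive or negative, evaluated for each row
       CHECKSUM(NEWID())            AS RandomInt,
       -- Modulo first (-99..99), then ABS, so ABS never sees the minimum int
       ABS(CHECKSUM(NEWID()) % 100) AS ZeroTo99
FROM @Demo;
```

Taking the modulo before ABS is a deliberate choice here: ABS applied directly to the smallest possible int would raise an overflow error, however unlikely that value is.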
Just a guess, but often rand functions generate a number from 0-1. Try multiplying your random number by 10.
Is it possible, by using a stored procedure, to fetch an integer column value from resultset into a local variable, manipulate it there and then write it back to the resultset's column?
If so what would the syntax look like?
Something along the following lines should do the trick.
DECLARE @iSomeDataItem INT
SELECT @iSomeDataItem = TableColumnName
FROM TableName
WHERE ID = ?
--Do some work on the variable
SET @iSomeDataItem = @iSomeDataItem + 21 * 2
UPDATE TableName
SET TableColumnName = @iSomeDataItem
WHERE ID = ?
The downside to an implementation of this sort is that it only operates on a specific record; however, this may be what you are looking to achieve.
What you are looking for is probably more along the lines of a user-defined function that can be used in SQL just like any other built in function.
Not sure how this works in DB2, but for Oracle it would be something like this:
Create or replace Function Decrement (pIn Integer)
return Integer
Is
Begin
return pIn - 1;
end;
You could use this in a SQL, e.g.
Select Decrement (43)
From Dual;
should return the "ultimate answer" (42).
Hope this helps.
Thanks for the replies. I went another way and solved the problem without using a procedure. The core problem was to calculate a date using various column values, and the column values had to be converted to the right format. I solved it by using large "case - when" statements in the select.
Thanks again... :-)
Why not just do the manipulation within the update statement? You don't need to load it into a variable, manipulate it, and then save it.
update TableName
SET TableColumnName=TableColumnName + 42 /* or whatever manipulation you want */
WHERE ID = ?
Also,
@iSomeDataItem + 21 * 2
is the same as:
@iSomeDataItem + 42
The function idea is an unnecessary extra step, unless most of the following are true:
1) you will need to use this calculation in many places
2) the calculation is complex
3) the calculation can change