comparable varchar "arrays" in separate fields but on same row - sql

I have a table that looks like this:
memberno(int)|member_mouth (varchar)|Inspected_Date (varchar)
-----------------------------------------------------------------------------
12 |'1;2;3;4;5;6;7' |'12-01-01;12-02-02;12-03-03' [7 members]
So, looking at how this table has been structured (poorly, yes):
The values in the member_mouth field are stored as a single string delimited by ";"
The values in the Inspected_Date field are stored as a single string delimited by ";"
So, for each delimited value in member_mouth there is a corresponding Inspected_Date value at the same position inside the string
This table has about 4 million records. We have an application written in C# that normalizes the data and stores it in a separate table. The problem is that, because of the size of the table, it takes a long time to process. (The example above is nothing compared to the actual table; it's much larger and has several of those string "array" fields.)
My question is this: what would be the best and fastest way to normalize this data in an MSSQL stored procedure, i.e. let MSSQL do the work and not a C# app?

The best way is SQL itself. The approach in the code below worked well for me with 200,000-300,000 rows of data.
I am not sure how it behaves with 4 million rows, but it may help.
Declare @table table
(memberno int, member_mouth varchar(100), Inspected_Date varchar(400))
Insert into @table Values
(12,'1;2;3;4;5;6;7','12-01-01;12-02-02;12-03-03;12-04-04;12-05-05;12-07-07;12-08-08'),
(14,'1','12-01-01'),
(19,'1;5;8;9;10;11;19','12-01-01;12-02-02;12-03-03;12-04-04;12-07-07;12-10-10;12-12-12')
Declare @tableDest table
(memberno int, member_mouth varchar(100), Inspected_Date varchar(400))
The table will look like this:
Select * from @table
See the code from here:
------------------------------------------
Declare @max_len int,
        @count int = 1
Set @max_len = (Select max(Len(member_mouth) - Len(Replace(member_mouth,';','')) + 1)
                From @table)
While @count <= @max_len
begin
    Insert into @tableDest
    Select memberno,
           SUBSTRING(member_mouth,1,charindex(';',member_mouth)-1),
           SUBSTRING(Inspected_Date,1,charindex(';',Inspected_Date)-1)
    from @table
    Where charindex(';',member_mouth) > 0
    union
    Select memberno,
           member_mouth,
           Inspected_Date
    from @table
    Where charindex(';',member_mouth) = 0

    Delete from @table
    Where charindex(';',member_mouth) = 0

    Update @table
    Set member_mouth = SUBSTRING(member_mouth,charindex(';',member_mouth)+1,len(member_mouth)),
        Inspected_Date = SUBSTRING(Inspected_Date,charindex(';',Inspected_Date)+1,len(Inspected_Date))
    Where charindex(';',member_mouth) > 0

    Set @count = @count + 1
End
------------------------------------------
Select *
from @tableDest
Order By memberno
------------------------------------------
Result.

You can take a reference here.
Splitting delimited values in a SQL column into multiple rows
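For 4 million rows, a set-based splitter along the lines of the linked approach may scale better than the row-by-row loop. The sketch below reuses the @table / @tableDest variables from above, pairs the nth member_mouth token with the nth Inspected_Date token, and assumes a helper table dbo.Numbers(n) containing integers 1..N at least as large as the longest delimited string; treat it as an illustration, not something tested at that volume.
-- Set-based split: the token position pairs member_mouth with Inspected_Date
Insert into @tableDest
Select m.memberno, m.token, d.token
from
(
    Select t.memberno,
           ROW_NUMBER() OVER (PARTITION BY t.memberno ORDER BY n.n) AS pos,
           SUBSTRING(t.member_mouth, n.n, CHARINDEX(';', t.member_mouth + ';', n.n) - n.n) AS token
    from @table t
    join dbo.Numbers n
      on n.n <= LEN(t.member_mouth)
     and SUBSTRING(';' + t.member_mouth, n.n, 1) = ';'
) m
join
(
    Select t.memberno,
           ROW_NUMBER() OVER (PARTITION BY t.memberno ORDER BY n.n) AS pos,
           SUBSTRING(t.Inspected_Date, n.n, CHARINDEX(';', t.Inspected_Date + ';', n.n) - n.n) AS token
    from @table t
    join dbo.Numbers n
      on n.n <= LEN(t.Inspected_Date)
     and SUBSTRING(';' + t.Inspected_Date, n.n, 1) = ';'
) d
  on d.memberno = m.memberno and d.pos = m.pos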

Do it on the SQL Server side; if possible, an SSIS package would be great.

Related

SQL UPDATE specific characters in string

I have a column with the following values (there are a lot more):
20150223-001
20150224-002
20150225-003
I need to write an UPDATE statement which will change the first 2 characters after the dash to 'AB'. Result has to be the following:
20150223-AB1
20150224-AB2
20150225-AB3
Could anyone assist me with this?
Thanks in advance.
Use this,
DECLARE @MyString VARCHAR(30) = '20150223-0000000001'
SELECT STUFF(@MyString, CHARINDEX('-', @MyString) + 1, 2, 'AB')
If there is a lot of data, you could consider using the .WRITE clause, but it is limited to the VARCHAR(MAX), NVARCHAR(MAX) and VARBINARY(MAX) data types.
If your column has one of those types, the .WRITE clause is the easiest option for this purpose; example below:
UPDATE Codes
SET val.WRITE('AB',9,2)
GO
Other possible choice could be simple REPLACE:
UPDATE Codes
SET val=REPLACE(val,SUBSTRING(val,10,2),'AB')
GO
or STUFF:
UPDATE Codes
SET val=STUFF(val,10,2,'AB')
GO
I based this on the information that there are always 8 characters of date followed by a dash in the column. I prepared a table and checked some of the solutions mentioned here.
CREATE TABLE Codes(val NVARCHAR(MAX))
INSERT INTO Codes
SELECT TOP 500000 CONVERT(NVARCHAR(128),GETDATE()-CHECKSUM(NEWID())%1000,112)+'-00'+CAST(ABS(CAST(CHECKSUM(NEWID())%10000 AS INT)) AS NVARCHAR(128))
FROM sys.columns s1 CROSS JOIN sys.columns s2
I ran some tests and, based on 10 million rows with an NVARCHAR(MAX) column, got the following results:
+---------+------------+
| Method | Time |
+---------+------------+
| .WRITE | 28 seconds |
| REPLACE | 30 seconds |
| STUFF | 15 seconds |
+---------+------------+
As we can see, STUFF looks like the best option for updating part of a string. .WRITE should be considered when you insert or append new data into a string, where you can take advantage of minimal logging if the database recovery model is set to bulk-logged or simple. See the MSDN article about the UPDATE statement: Updating Large Value Data Types.
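For completeness, the append case that .WRITE is intended for looks like this; a minimal sketch, assuming the same Codes table with the NVARCHAR(MAX) val column created above (a NULL offset appends to the end of the existing value):
UPDATE Codes
SET val.WRITE(N'-suffix', NULL, 0)  -- appends N'-suffix' to the end of val
GO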
According to the OP's comment:
It's always 8 characters before the dash, but the characters after the dash can vary. It has to update the first two after the dash.
use this simple code:
DECLARE @MyString VARCHAR(30) = '20150223-0000000001'
SELECT REPLACE(@MyString, SUBSTRING(@MyString, 9, 3), '-AB')
Result:
20150223-AB00000001
Try:
update [table] set [column] = stuff([column], charindex('-', [column]) + 1, 2, 'AB')
Declare @Table1 TABLE (DateValue Varchar(50))
INSERT INTO @Table1
SELECT '20150223-000000001' Union all
SELECT '20150224-000000002' Union all
SELECT '20150225-000000003'

SELECT DateValue,
       CONCAT(SUBSTRING(DateValue,0,CHARINDEX('-',DateValue)),
              REPLACE(LEFT(SUBSTRING(DateValue,CHARINDEX('-',DateValue)+1,Len(DateValue)),2),'00','-AB'),
              SUBSTRING(DateValue,CHARINDEX('-',DateValue)+1,Len(DateValue))) AS ExpectedDateValue
FROM @Table1
Output:
DateValue ExpectedDateValue
---------------------------------------------
20150223-000000001 20150223-AB000000001
20150224-000000002 20150224-AB000000002
20150225-000000003 20150225-AB000000003
To update:
Update @Table1
Set DateValue = CONCAT(SUBSTRING(DateValue,0,CHARINDEX('-',DateValue)),
                       REPLACE(LEFT(SUBSTRING(DateValue,CHARINDEX('-',DateValue)+1,Len(DateValue)),2),'00','-AB'),
                       SUBSTRING(DateValue,CHARINDEX('-',DateValue)+1,Len(DateValue)))
From @Table1

SELECT * from @Table1
Output:
DateValue
-------------
20150223-AB000000001
20150224-AB000000002
20150225-AB000000003

Customized Primary Key on SQL Server 2008 R2

I have spent several days trying to solve this problem, but my lack of knowledge is stopping me; I don't know if what I am trying to accomplish is possible.
I need to have a table like this:
The first field should be a custom primary key ID (auto incremented):
YYYYMMDD-99
where YYYYMMDD is the current day and "99" is a counter that should be incremented automatically from 01 to 99 with every new row added, and needs to be automatically reset to 01 the next day.
The second field is a regular NVARCHAR(40) text field called: Name
For example, I add three rows, entering only the "Name" of the person; the ID is added automatically:
ID Name
---------------------------
20160629-01 John
20160629-02 Katie
20160629-03 Mark
Then, the next day I add two new rows:
ID Name
-------------------------
20160630-01 Bob
20160630-02 Dave
The last two digits should restart after the day changes.
And what is all this about?
Answer: customer requirement.
If it is possible to do this in a stored procedure, that will work for me too.
Thanks in advance!
This is pretty easy to achieve, but a bit more complicated to do in a way that is safe with multiple clients.
What you need is a new table (for example named IndexHelper) that actually stores the parts of the index as it should be using two columns: One has the current date properly formatted as you want it in your index and one is the current index as integer. Example:
DateString CurrentIndex
-------------------------------
20160629 13
Now you need some code that gets the next index value atomically, i.e. in a way that also works when more than one client tries to insert at the same time, without handing out the same index more than once.
T-SQL comes to the rescue with its UPDATE ... OUTPUT clause, which allows you to update a table and at the same time output the new values as an atomic operation that cannot be interrupted.
In your case, this statement could look like this:
DECLARE @curDay NVARCHAR(10)
DECLARE @curIndex INT
DECLARE @tempTable TABLE (theDay NVARCHAR(10), theIndex INT)
UPDATE IndexHelper SET CurrentIndex = CurrentIndex + 1 OUTPUT INSERTED.DateString, INSERTED.CurrentIndex INTO @tempTable WHERE DateString = <code that converts CURRENT_TIMESTAMP into the string format you want>
SELECT @curDay = theDay, @curIndex = theIndex FROM @tempTable
Unfortunately you have to go through the table variable, as the OUTPUT ... INTO clause demands it.
This increments the CurrentIndex field in IndexHelper atomically for the current date. You can combine both into a value like this:
DECLARE #newIndexValue NVARCHAR(15)
SET @newIndexValue = @curDay + '-' + RIGHT('00' + CONVERT(NVARCHAR, @curIndex), 2)
Now the question is: how do you handle the "go back to 01 for the next day" requirement? Also easy: add entries into IndexHelper for 2 days in advance with the respective date and index 0. You can do this safely every time your code is called if you check that an entry for a day is actually missing. So for today your table might look like this:
DateString CurrentIndex
-------------------------------
20160629 13
20160630 0
20160701 0
The first call tomorrow would make this look like:
DateString CurrentIndex
-------------------------------
20160629 13
20160630 1
20160701 0
20160702 0
Wrap this up into a stored procedure that does the entire INSERT process into your original table, and what you get is:
Add missing entries for the next two days to IndexHelper table.
Get the next ID atomically as described above
Combine date string and ID from the UPDATE command into a single string
Use this in the INSERT command for your actual data
This results in the following stored procedure you can use to insert your data:
-- This is our "work date"
DECLARE @now DATETIME = CURRENT_TIMESTAMP
-- These are the date strings that we need
DECLARE @today NVARCHAR(10) = CONVERT(NVARCHAR, @now, 112)
DECLARE @tomorrow NVARCHAR(10) = CONVERT(NVARCHAR, DATEADD(dd, 1, @now), 112)
DECLARE @datomorrow NVARCHAR(10) = CONVERT(NVARCHAR, DATEADD(dd, 2, @now), 112)
-- We will need these later
DECLARE @curDay NVARCHAR(10)
DECLARE @curIndex INT
DECLARE @tempTable TABLE (theDay NVARCHAR(10), theIndex INT)
DECLARE @newIndexValue NVARCHAR(15)
-- Add entries for the next two days into the table
-- NOTE: THIS IS NOT ATOMIC! SUPPOSING YOU HAVE A PK ON DATESTRING, THIS
-- MAY EVEN FAIL! THAT'S WHY I USE BEGIN TRY
BEGIN TRY
    IF NOT EXISTS (SELECT 1 FROM IndexHelper WHERE DateString = @tomorrow)
        INSERT INTO IndexHelper (DateString, CurrentIndex) VALUES (@tomorrow, 0)
END TRY
BEGIN CATCH
    PRINT 'hmpf'
END CATCH
BEGIN TRY
    IF NOT EXISTS (SELECT 1 FROM IndexHelper WHERE DateString = @datomorrow)
        INSERT INTO IndexHelper (DateString, CurrentIndex) VALUES (@datomorrow, 0)
END TRY
BEGIN CATCH
    PRINT 'hmpf again'
END CATCH
-- Now perform the atomic update
UPDATE IndexHelper
SET
    CurrentIndex = CurrentIndex + 1
OUTPUT
    INSERTED.DateString,
    INSERTED.CurrentIndex
    INTO @tempTable
WHERE DateString = @today
-- Get the values after the update
SELECT @curDay = theDay, @curIndex = theIndex FROM @tempTable
-- Combine these into the new index value
SET @newIndexValue = @curDay + '-' + RIGHT('00' + CONVERT(NVARCHAR, @curIndex), 2)
-- PERFORM THE INSERT HERE!!
...
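The actual INSERT depends on your data table, which the answer leaves open. As a minimal sketch, assuming the table from the question is dbo.Customer (ID NVARCHAR(15), Name NVARCHAR(40)) and @name is a parameter of the stored procedure:
-- Hypothetical final step; table, column and parameter names are assumptions
INSERT INTO dbo.Customer (ID, Name)
VALUES (@newIndexValue, @name)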
One way to achieve a customised auto increment is to use an INSTEAD OF trigger in SQL Server.
https://msdn.microsoft.com/en-IN/library/ms189799.aspx
I have tested this using the code below.
It might be helpful.
It is written with the assumption that a maximum of 99 records will be inserted on a given day.
You will have to modify it to handle more than 99 records.
CREATE TABLE dbo.CustomerTb(
    ID   VARCHAR(50),
    Name VARCHAR(50)
)
GO
CREATE TRIGGER dbo.InsertCustomerTrigger ON dbo.CustomerTb INSTEAD OF INSERT
AS
BEGIN
    DECLARE @MaxID SMALLINT = 0;
    SELECT @MaxID = ISNULL(MAX(RIGHT(ID,2)),0)
    FROM dbo.CustomerTb
    WHERE LEFT(ID,8) = FORMAT(GETDATE(),'yyyyMMdd');

    INSERT INTO dbo.CustomerTb(
        ID,
        Name
    )
    SELECT FORMAT(GETDATE(),'yyyyMMdd')+'-'+RIGHT('00'+CONVERT(VARCHAR(5),ROW_NUMBER() OVER(ORDER BY Name)+@MaxID),2),
           Name
    FROM inserted;
END
GO
TEST CASE 1
INSERT INTO dbo.CustomerTb(NAME) VALUES('A'),('B');
SELECT * FROM dbo.CustomerTb;
TEST CASE 2
INSERT INTO dbo.CustomerTb(NAME) VALUES('P'),('Q');
SELECT * FROM dbo.CustomerTb;

creating a SQL table with multiple columns automatically

I must create a SQL table with 90+ fields, the majority of them bit fields like N01, N02, N03 ... N89, N90. Is there a fast way of creating multiple fields, or is it possible to have one single field contain an array of true/false values? I need a solution that can also be queried easily.
There is no easy way to do this, and it will be very challenging to run queries against such a table. Create a table with three columns - item number, bit field number and a value field. Then you will be able to write 'good', succinct T-SQL queries against the table.
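A minimal sketch of that three-column layout (all names here are assumptions, not from the question):
CREATE TABLE dbo.ItemFlag (
    ItemNumber  INT     NOT NULL,
    FieldNumber TINYINT NOT NULL,  -- 1..90 instead of columns N01..N90
    FlagValue   BIT     NOT NULL,
    CONSTRAINT PK_ItemFlag PRIMARY KEY (ItemNumber, FieldNumber)
)
-- "Which flags are set for item 1?" then becomes a plain query:
SELECT FieldNumber FROM dbo.ItemFlag WHERE ItemNumber = 1 AND FlagValue = 1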
At least you can generate ALTER TABLE scripts for bit fields, and then run those scripts.
DECLARE @COUNTER INT = 1
WHILE @COUNTER < 10
BEGIN
    PRINT 'ALTER TABLE table_name ADD N' + RIGHT('00' + CONVERT(NVARCHAR(4), @COUNTER), 2) + ' bit'
    SET @COUNTER += 1
END
TLDR: Use binary arithmetic.
For a structure like this
==============
Table_Original
==============
Id | N01| N02 |...
I would recommend an alternate table structure like this
==============
Table_Alternate
==============
Id | One_Col
This One_Col is of varchar type and will have its value set as
cast(n01 as nvarchar(1)) + cast(n02 as nvarchar(1)) + cast(n03 as nvarchar(1)) as One_Col
However, I feel you'd use C# or some other programming language to set the value in the column. You can also use bit and bit-shift operations.
Whenever you need to get a value, you can use SQL or C# syntax (treating it as a string).
In SQL query terms you can use a query like:
SELECT SUBSTRING(one_col, @pos, 1)
and @pos can be set like
DECLARE @Colname nvarchar(4)
SET @colname = N'N32'
-- ....
SET @pos = CAST(REPLACE(@colname,'N','') as INT)
You can also use binary arithmetic with ease in any programming language.
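A minimal sketch of the bit-arithmetic idea in T-SQL (the mapping of flag N03 to bit position 3 is an assumption; up to 63 flags fit in a single BIGINT):
DECLARE @flags BIGINT = 0
-- set flag N03 (bit position 3):
SET @flags = @flags | POWER(CAST(2 AS BIGINT), 3 - 1)
-- test flag N03:
SELECT CASE WHEN @flags & POWER(CAST(2 AS BIGINT), 3 - 1) <> 0 THEN 1 ELSE 0 END AS N03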
Use three columns.
Table
ID NUMBER,
FIELD_NAME VARCHAR2(10),
VALUE NUMBER(1)
Example
ID FIELD VALUE
1 N01 1
1 N02 0
.
1 N90 1
.
2 N01 0
2 N02 1
.
2 N90 1
.
You can also OR an entire column for a fieldname (or fieldnameS):
select DECODE(SUM(VALUE), 0, 0, 1) from table where field_name = 'N01';
And even perform an AND
select EXP(SUM(LN(VALUE))) from table where field_name = 'N01';
(see http://viralpatel.net/blogs/row-data-multiplication-in-oracle/)

SQL select multiple rows of data then compare

What would be the best approach in SQL Server 2008 to select something that could contain up to 10 rows of data, then compare that data with a specific value in one of its columns?
So something like this below
SELECT bType FROM WORK_STATION WHERE nFileId = 123456789
This could return between 1 and 10 values (it will always return at least one value). Then I want to compare the data just selected by the SQL statement above to a specific value, something like:
if bType = 1
--DO something
What is the best approach of doing something like this?
declare @table as table(btype int)
declare @btype int

insert into @table
SELECT bType FROM WORK_STATION WHERE nFileId = 123456789

while(exists(select top 1 'x' from @table)) -- as long as @table contains records, continue
begin
    select top 1 @btype = btype from @table
    if(@btype = 10)
        print 'something'
    delete top (1) from @table -- remove the previously processed row; also ensures no infinite loop
end
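If the only goal is to check whether any of the returned rows has a particular bType, a set-based EXISTS test avoids the loop entirely; a sketch using the names from the question:
IF EXISTS (SELECT 1 FROM WORK_STATION WHERE nFileId = 123456789 AND bType = 1)
    PRINT 'something'  -- do something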
I think you can use a stored procedure to declare variables and then compare them with the result set; if you know that you have only 10 values, you can use a temp table and insert the 10 values.
I hope this is helpful.

SQL Server cursor

I want to copy data from one table (rawdata, where all columns are VARCHAR) to another table (formatted, with the corresponding column types).
While copying data from the rawdata table into the formatted table, I'm using a cursor so that I can identify any row that fails, log that particular row in an error log table, skip it, and continue copying the remaining rows.
The copy takes a long time. Is there any other way to achieve this?
This is my query:
DECLARE @EntityId Varchar(16),
        @PerfId Varchar(16),
        @BaseId Varchar(16),
        @UpdateStatus Varchar(16)

DECLARE CursorSample CURSOR FOR
    SELECT EntityId, PerfId, BaseId, UpdateStatus
    FROM RawdataTable
    -- Returns 204,000 rows

OPEN CursorSample
FETCH NEXT FROM CursorSample INTO @EntityId, @PerfId, @BaseId, @UpdateStatus
WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY
        -- try inserting the row into the formatted table
        Insert into FormattedTable
            (EntityId, PerfId, BaseId, UpdateStatus)
        Values
            (Convert(int, @EntityId),
             Convert(int, @PerfId),
             Convert(int, @BaseId),
             Convert(int, @UpdateStatus))
    END TRY
    BEGIN CATCH
        -- capture the failing EntityId in the error log table
        Insert into ERROR_LOG
            (TableError_Message, Error_Procedure, Error_Log_Time)
        Values
            (Error_Message() + @EntityId, 'xxx', GETDATE())
    END CATCH
    FETCH NEXT FROM CursorSample INTO @EntityId, @PerfId, @BaseId, @UpdateStatus
END
CLOSE CursorSample
DEALLOCATE CursorSample -- cleanup CursorSample
You should just be able to use an INSERT INTO statement to put the records directly into the formatted table. INSERT INTO will perform much better than using a cursor.
INSERT INTO FormattedTable
SELECT
CONVERT(int, EntityId),
CONVERT(int, PerfId),
CONVERT(int, BaseId),
CONVERT(int, UpdateStatus)
FROM RawdataTable
WHERE
IsNumeric(EntityId) = 1
AND IsNumeric(PerfId) = 1
AND IsNumeric(BaseId) = 1
AND IsNumeric(UpdateStatus) = 1
Note that IsNumeric can sometimes return 1 for values that will then fail on CONVERT. For example, IsNumeric('$e0') will return 1, so you may need to create a more robust user defined function for determining if a string is a number, depending on your data.
Also, if you need a log of all records that could not be moved into the formatted table, just modify the WHERE clause:
INSERT INTO ErrorLog
SELECT
EntityId,
PerfId,
BaseId,
UpdateStatus
FROM RawdataTable
WHERE
NOT (IsNumeric(EntityId) = 1
AND IsNumeric(PerfId) = 1
AND IsNumeric(BaseId) = 1
AND IsNumeric(UpdateStatus) = 1)
EDIT
Rather than using IsNumeric directly, it may be better to create a custom UDF that will tell you if a string can be converted to an int. This function worked for me (albeit with limited testing):
CREATE FUNCTION IsInt(@value VARCHAR(50))
RETURNS bit
AS
BEGIN
    DECLARE @number AS BIT
    DECLARE @numeric AS NUMERIC(18,2)
    SET @number = 0
    IF IsNumeric(@value) = 1
    BEGIN
        SET @numeric = CONVERT(NUMERIC(18,2), @value)
        -- flag the value as an int only if it also fits into the INT range
        IF @numeric BETWEEN -2147483648 AND 2147483647
            SET @number = 1
    END
    RETURN @number
END
GO
The updated SQL for the insert into the formatted table would then look like this:
INSERT INTO FormattedTable
SELECT
CONVERT(int, CONVERT(NUMERIC(18,2), EntityId)),
CONVERT(int, CONVERT(NUMERIC(18,2), PerfId)),
CONVERT(int, CONVERT(NUMERIC(18,2), BaseId)),
CONVERT(int, CONVERT(NUMERIC(18,2), UpdateStatus))
FROM RawdataTable
WHERE
dbo.IsInt(EntityId) = 1
AND dbo.IsInt(PerfId) = 1
AND dbo.IsInt(BaseId) = 1
AND dbo.IsInt(UpdateStatus) = 1
There may be a little weirdness around handling NULLs (my function will return 0 if NULL is passed in, even though an INT can certainly be null), but that can be adjusted depending on what is supposed to happen with NULL values in the RawdataTable.
You can put a WHERE clause in your cursor definition so that only valid records are selected in the first place. You might need to create a function to determine validity, but it should be faster than looping over them.
Actually, you might want to create a temp table of the invalid records, so that you can log the errors, then define the cursor only on the rows that are not in the temp table.
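A rough sketch of that idea, reusing the column names from the question and the dbo.IsInt helper from the answer above (the #BadRows table name and the error-message text are assumptions):
-- Collect the rows that would fail conversion and log them once, set-based
SELECT EntityId, PerfId, BaseId, UpdateStatus
INTO #BadRows
FROM RawdataTable
WHERE dbo.IsInt(EntityId) = 0
   OR dbo.IsInt(PerfId) = 0
   OR dbo.IsInt(BaseId) = 0
   OR dbo.IsInt(UpdateStatus) = 0

INSERT INTO ERROR_LOG (TableError_Message, Error_Procedure, Error_Log_Time)
SELECT 'Non-numeric data for EntityId ' + EntityId, 'xxx', GETDATE()
FROM #BadRows

-- Define the cursor only over the remaining rows
DECLARE CursorSample CURSOR FOR
    SELECT r.EntityId, r.PerfId, r.BaseId, r.UpdateStatus
    FROM RawdataTable r
    WHERE NOT EXISTS (SELECT 1 FROM #BadRows b WHERE b.EntityId = r.EntityId)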
INSERT INTO will work much better than a cursor.
A cursor processes rows one at a time, which keeps SQL Server from optimizing the work as a single set-based operation and slows things down. We should avoid using cursors, but (of course) there are situations where the use of a cursor cannot be avoided.