MySQL: Split one column into multiple columns; data changes daily - sql

This is a common question, but I found my problem to be unique. I have a database in MySQL Workbench that has multiple order numbers (of inconsistent length) and status conditions (three types) in a single column. I must separate the order numbers from their statuses and the order numbers from each other.
What I have:
|NUMBER_STATUS|
|1234-START, 12323-END|
|234 - END, 12423-START, 53443-WIP|
What my final output should be:
|NUMBER_STATUS_1|NUMBER_STATUS_2|NUMBER_STATUS_3|
|1234|12323| |
|234|12423|53443|
There are over 25,000 data points. So far, I have tried writing a LEFT() function:
SELECT *,
       LEFT(NUMBER_STATUS, LOCATE('-START', NUMBER_STATUS) - 1) AS NUMBER_STATUS_1
FROM Request;
This does what I want for the first value: it creates a new column with the status removed, but it does not carry over any of the other data in its row.
I thought of three ways to attack this:
Create new columns by splitting the original into pieces. I can do this in Excel using "Text to Columns" and then bring it into MySQL Workbench, but I know SQL is more powerful, so I would like to write a script for it.
Create a pseudo-table that stores each new column (NUMBER_STATUS_1, NUMBER_STATUS_2, etc.), but the data changes daily, so I don't want to limit the number of new columns that can be created.
Ask you all.
Some other links I referenced for help:
Split one column to multiple columns but data will vary SQL
Split string and return data in multiple columns
But my knowledge of SQL is still growing and I have no idea what these functions mean, so I would greatly appreciate the help.

This is what I used to solve my problem:
I altered table and column names for personal reasons. I hope this helps other people in the future!
/* remove the status names from the data */
UPDATE `table`
SET columnSTATUS = REPLACE(columnSTATUS, '-status1', '');

UPDATE `table`
SET columnSTATUS = REPLACE(columnSTATUS, '-status2', '');

UPDATE `table`
SET columnSTATUS = REPLACE(columnSTATUS, '-status3', '');
/* split the data into columns */
UPDATE `table` SET
    column1 = IF(LOCATE(',', columnSTATUS) > 0,
                 SUBSTRING(columnSTATUS, 1, LOCATE(',', columnSTATUS) - 1),
                 columnSTATUS),
    column2 = IF(LOCATE(',', columnSTATUS) > 0,
                 SUBSTRING(columnSTATUS, LOCATE(',', columnSTATUS) + 1),
                 '');
UPDATE `table` SET
    column3 = IF(LOCATE(',', column2) > 0,
                 SUBSTRING(column2, LOCATE(',', column2) + 1),
                 '');

UPDATE `table` SET
    column4 = IF(LOCATE(',', column3) > 0,
                 SUBSTRING(column3, LOCATE(',', column3) + 1),
                 '');
/* remove data remaining in each column after the comma */
UPDATE `table` SET
    column2 = SUBSTRING_INDEX(column2, ',', 1);

UPDATE `table` SET
    column3 = SUBSTRING_INDEX(column3, ',', 1);
NB: I do not have a choice in how the data is imported.
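For reference, the same split can also be done in a single pass with nested SUBSTRING_INDEX calls. This is only a sketch against the placeholder names above, assuming at most three comma-separated entries per row; the comma count guards rows with fewer entries, and TRIM strips the stray spaces:
/* sketch: one-pass split, at most three entries per row */
UPDATE `table` SET
    column1 = TRIM(SUBSTRING_INDEX(columnSTATUS, ',', 1)),
    column2 = IF(LENGTH(columnSTATUS) - LENGTH(REPLACE(columnSTATUS, ',', '')) >= 1,
                 TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(columnSTATUS, ',', 2), ',', -1)),
                 ''),
    column3 = IF(LENGTH(columnSTATUS) - LENGTH(REPLACE(columnSTATUS, ',', '')) >= 2,
                 TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(columnSTATUS, ',', 3), ',', -1)),
                 '');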

Related

How to prevent insertion of value if value exists in other columns

Given 3 columns in a table tblEmails,
email1, nvarchar(50), NULLs not permitted
email2, nvarchar(50), NULLs permitted
email3, nvarchar(50), NULLs permitted
how do I prevent insertions or updates of a (non-NULL) value to any of the three columns if the value already exists in any of the other columns?
I was hoping to apply a CONSTRAINT by checking whether the UNION ALL of the three columns contains the value to be inserted, but it seems COUNT() can't be used in constraints.
Any solution implementable via the SSMS gui would be ideal.
I looked through at least a dozen SO posts, some SE posts, and articles online, but could not find a solution (or one that I could understand).
I would suggest creating a function which is then called by the check constraint. Here is an example:
CREATE FUNCTION dbo.fn_chkMail(@mail nvarchar(100)) RETURNS INT AS
BEGIN
    RETURN (SELECT COUNT(*) FROM mails WHERE mail1 = @mail)
         + (SELECT COUNT(*) FROM mails WHERE mail2 = @mail)
         + (SELECT COUNT(*) FROM mails WHERE mail3 = @mail)
END
and then
ALTER TABLE dbo.mails WITH CHECK ADD CONSTRAINT [CK_mails]
    CHECK ((dbo.fn_chkMail([mail1])) + (dbo.fn_chkMail([mail2])) + (dbo.fn_chkMail([mail3])) = 1)
See fiddle for details: http://sqlfiddle.com/#!18/6f375/7/1
You want to prevent modifying values using update when values are already in the table. Unfortunately, that suggests a trigger.
I think the logic looks like this:
CREATE TRIGGER trg_tblEmails_update ON tblEmails
AFTER UPDATE
AS BEGIN
    IF (EXISTS (SELECT 1
                FROM inserted i JOIN
                     deleted d
                     ON i.<primary key> = d.<primary key>
                WHERE (d.email1 IS NOT NULL OR
                       d.email2 IS NOT NULL OR
                       d.email3 IS NOT NULL
                      ) AND
                      (COALESCE(d.email1, '') <> COALESCE(i.email1, '') OR
                       COALESCE(d.email2, '') <> COALESCE(i.email2, '') OR
                       COALESCE(d.email3, '') <> COALESCE(i.email3, '')
                      )
               )
       )
    BEGIN
        RAISERROR('You cannot update emails when a value is already present', 16, 1);
        ROLLBACK TRANSACTION;
    END;
END;
I would suggest, though, that there might be a simpler data model. For instance, I would recommend storing the emails in a separate table with one row per email. You would use this table as follows:
When you insert a value into one email, you insert all three.
You have a unique index on the entity id and email number.
You don't allow updates on the table.
EDIT:
I suspect that you really want a unique constraint. You are not looking within a row but across all rows.
If that is the case, you simply have the wrong data model. You need a table with one email per row. This might require a column to identify which email, but something like this:
create table entity_emails (
    entity_email_id int identity(1, 1) primary key,
    entity_id int not null,  -- identifies the owning entity; referenced by the constraints below
    which_email int,
    email varchar(255)
);
Then you want the following constraints:
check (which_email in (1, 2, 3));
unique (entity_id, which_email);
unique (email);
The first two limit the number of emails to three per entity. The third insists that each email be unique across all rows and entities.
With the right data model, what you need to do may not require a trigger.
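For example (the entity id and addresses here are invented), the unique constraints then do the work the trigger was doing:
-- both rows succeed: one row per email
INSERT INTO entity_emails (entity_id, which_email, email)
VALUES (42, 1, 'first@example.com'),
       (42, 2, 'second@example.com');
-- this now fails with a unique-constraint violation, no trigger required
INSERT INTO entity_emails (entity_id, which_email, email)
VALUES (99, 1, 'first@example.com');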

Do not Update the Values in Merge statement if old values do not change while update in Merge

MERGE PFM_EventPerformance_MetaData AS TARGET
USING
(
    SELECT
         [InheritanceMeterID]          = @InheritanceMeterPointID
        ,[SubHourlyScenarioResourceID] = @SubHourlyScenarioResourceID
        ,[MeterID]                     = @MeterID     --internal ID
        ,[BaselineID]                  = @BaselineID  --internal ID
        ,[UpdateUtc]                   = GETUTCDATE()
)
AS SOURCE ON
    TARGET.[SubHourlyScenarioResourceID] = SOURCE.[SubHourlyScenarioResourceID]
    AND TARGET.[MeterID] = SOURCE.[MeterID]         --internal ID
    AND TARGET.[BaselineID] = SOURCE.[BaselineID]   --internal ID
WHEN MATCHED THEN UPDATE SET
     @MetaDataID = TARGET.ID  --get preexisting ID when it exists (must populate one row at a time)
    ,InheritanceMeterID = SOURCE.InheritanceMeterID
    ,[UpdateUtc] = SOURCE.[UpdateUtc]
WHEN NOT MATCHED
THEN INSERT
(
     [InheritanceMeterID]
    ,[SubHourlyScenarioResourceID]
    ,[MeterID]      --internal ID
    ,[BaselineID]   --internal ID
)
VALUES
(
     SOURCE.[InheritanceMeterID]
    ,SOURCE.[SubHourlyScenarioResourceID]
    ,SOURCE.[MeterID]       --internal ID
    ,SOURCE.[BaselineID]    --internal ID
);
In the above query I do not want to update the values in the target table if there is no change in the old values. I am not sure how to achieve this, as I have rarely used the MERGE statement. Please help me with the solution. Thanks in advance.
This is done best in two stages.
Stage 1: Merge Update on condition
SO Answer from before (Thanks to @Laurence!)
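Concretely, stage 1 amounts to adding a predicate to the WHEN MATCHED branch so the update only fires when something actually differs. A sketch of how the matching branch of the statement above might look (the compared column is illustrative; nullable columns would need COALESCE wrapping):
WHEN MATCHED AND (
        TARGET.InheritanceMeterID <> SOURCE.InheritanceMeterID
        -- OR one comparison per column that should trigger an update
     )
THEN UPDATE SET
     InheritanceMeterID = SOURCE.InheritanceMeterID
    ,[UpdateUtc] = SOURCE.[UpdateUtc]
Note that the @MetaDataID = TARGET.ID assignment in the original would then only run when a row actually changes.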
Stage 2: hash key condition to compare
Limits: max 4000 characters, including column separator characters
A rather simple way to compare multiple columns in one condition is to use a computed column on both sides, generated with HASHBYTES(<algorithm>, <column(s)>).
This moves a lot of code out of the merge statement and into the table definition.
Quick example:
CREATE TABLE dbo.Test
(
    id_column int NOT NULL,
    dsc_name1 varchar(100),
    dsc_name2 varchar(100),
    num_age   tinyint,
    flg_hash AS HashBytes('SHA1',
        Cast(dsc_name1 AS nvarchar(4000))
        + N'•' + dsc_name2
        + N'•' + Cast(num_age AS nvarchar(3))
    ) PERSISTED
);
Comparing the flg_hash columns between source and destination makes the comparison quick, as it is just a comparison between two 20-byte varbinary columns.
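Assuming both source and target carry the computed flg_hash column, the matched-row test in the merge then reduces to a single predicate, for example:
WHEN MATCHED AND TARGET.flg_hash <> SOURCE.flg_hash
THEN UPDATE SET ...  -- only rows whose hash differs get written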
A couple of caveats for working with HASHBYTES:
The function only works on a total of 4000 nvarchar characters
The trade-off for the short comparison code is having to generate the hash in the same column order in all views and tables
There is a collision chance (on the order of 2^50+ operations for SHA-1); as a security mechanism SHA-1 is now considered insecure, and a few years ago MS tried to drop it as an algorithm
Columns added to tables and views can be silently left out of the comparison if the hash expression is not amended along with the schema
Overall I found that comparing multiple columns directly could overload my server engines, but I have never had an issue with hash key comparisons

UPDATE two columns with new value under large size table

We have a table like:
mytable (pid, string_value, int_value)
This table has more than 20M rows in total. Now we have a feature that needs to mark all the rows in this table as invalid, so we need to update the columns to string_value = NULL and int_value = 0, which indicates an invalid row (we still want to keep the pid, as it is important to us).
So what is the best way?
I used the following SQL:
UPDATE mytable
SET string_value = NULL,
    int_value = 0;
but this query takes more than 4 minutes in my test environment. Is there any better way to improve it?
Updating all the rows can be quite expensive. Often, it is faster to empty the table and reload it.
In generic SQL this looks like:
create table mytable_temp as
    select pid
    from mytable;

truncate table mytable;  -- back it up first!

insert into mytable (pid, string_value, int_value)
    select pid, null, 0
    from mytable_temp;
The creation of the temporary table may use different syntax, depending on your database.
Updates can take time to complete. Another way of achieving this is to follow these steps (a sketch in SQL Server syntax follows the list):
Add new columns with the values you need set as the default value.
Drop the original columns.
Rename the new columns to the names of the original columns.
You can then drop the default values on the new columns.
This needs to be tested, as different DBMSs allow different levels of table alteration (i.e. not all DBMSs allow dropping a default or dropping a column).
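Here is that sketch (the column types are assumptions, since the question doesn't state them; on recent SQL Server versions, adding a NOT NULL column with a default can be a metadata-only change, which is what makes this approach fast):
-- 1. add replacement columns whose defaults are the values you want
ALTER TABLE mytable ADD
    string_value_new varchar(255) NULL,  -- new rows default to NULL anyway
    int_value_new    int NOT NULL CONSTRAINT DF_int_value_new DEFAULT 0;
-- 2. drop the original columns
ALTER TABLE mytable DROP COLUMN string_value, int_value;
-- 3. rename the new columns to the original names
EXEC sp_rename 'mytable.string_value_new', 'string_value', 'COLUMN';
EXEC sp_rename 'mytable.int_value_new', 'int_value', 'COLUMN';
-- 4. optionally drop the default afterwards
ALTER TABLE mytable DROP CONSTRAINT DF_int_value_new;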

Updating or inserting SQL Server destination table data based on conditions from source table data

I have two SQL Server tables: one is my destination table (LocaleStringResource), the other is the source table (TempResourceSunil).
The source table TempResourceSunil has the following columns:
[ID], [LanguageId], [ResourceName], [ResourceValue], [Burmese], [Unicode]
and the destination table LocaleStringResource has the columns
[Id], [LanguageId], [ResourceName], [ResourceValue]
I want to update the destination table's [ResourceValue] based on [ResourceName] from the source table.
Example:
[ResourceName] = 'Account.AccountActivation'
means I want to check whether it has a corresponding Burmese [ResourceValue] in the LocaleStringResource table; if it does not exist, I will take it from the Burmese column of TempResourceSunil and insert it into LocaleStringResource with LanguageId = 2.
Similarly, if the [ResourceValue] for Unicode (LanguageId = 3) does not exist for [ResourceName] = 'Account.AccountActivation', I want to insert the [ResourceValue] from TempResourceSunil with LanguageId = 3.
Can any SQL expert help me?
The description you gave isn't really fleshed out; however, you want to use a CASE expression.
A CASE expression can have multiple WHENs to cover multiple logic conditions. You can even nest one inside another, though I wouldn't do that in this situation.
The example below is just a simple version.
If l.[ResourceValue] is NULL and l.[ResourceName] = 'Account.AccountActivation', then the value of t.[Burmese] is used for the l.[ResourceValue] column. ELSE means that if no WHEN within the CASE is true, this value is used instead.
Also be aware that if you are trying to use an INT value from the first table in a string column of the second, you need to cast it as varchar.
Test out your logic and CASE expressions and see how you get on.
SELECT
    l.[Id],
    l.[LanguageId],
    l.[ResourceName],
    CASE WHEN l.[ResourceName] = 'Account.AccountActivation'
              AND l.[ResourceValue] IS NULL THEN t.[Burmese]
         ELSE l.[ResourceValue]
    END AS [ResourceValue],
    t.[ID],
    t.[LanguageId],
    t.[ResourceName],
    t.[ResourceValue],
    t.[Burmese],
    t.[Unicode]
FROM LocaleStringResource AS l
LEFT JOIN TempResourceSunil AS t
    ON t.ID = l.ID
    AND t.[LanguageId] = l.[LanguageId]
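Once the SELECT shows the values you expect, the same CASE can drive the update itself; a sketch reusing the join above (untested against your real schema):
UPDATE l
SET l.[ResourceValue] = CASE
        WHEN l.[ResourceName] = 'Account.AccountActivation'
             AND l.[ResourceValue] IS NULL THEN t.[Burmese]
        ELSE l.[ResourceValue]
    END
FROM LocaleStringResource AS l
LEFT JOIN TempResourceSunil AS t
    ON t.ID = l.ID
    AND t.[LanguageId] = l.[LanguageId];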

change ID number to smooth out duplicates in a table

I have run into a problem that I'm trying to solve: every day I import new records into a table that have an ID number.
Most of them are new (never seen in the system before), but some are coming in again. What I need to do is append a letter to the end of the ID number if the number is found in the archive, but only if the data in the row is different from the data in the archive, and this needs to be done sequentially. I.e., if 12345 is seen a second time with different data, I change it to 12345A; if 12345 is seen again, and is again different, I change it to 12345B; etc.
Originally I tried using a while loop that would put all the 'seen again' records in a temp table, then assign A the first time, delete those, assign B to what's left, delete those, and so on until the temp table was empty, but that hasn't worked out.
Alternately, I've been thinking of trying subqueries, as in:
update table
set IDNO = (select max(IDNO) from archive) + 1
Any suggestions?
How about this as an idea? Mind you, this is basically pseudocode, so adjust as you see fit.
With "SRC" as the table that all the data will ultimately be inserted into, and "TMP" as your temporary table, and presuming that the ID column in TMP is a double:
-- repeat until no rows are affected
do
    update TMP set id = id + 0.01 where id in (select id from SRC);
until no_rows_changed;

alter table TMP change id id varchar(255);

-- turn the fractional part into a letter: 0.01 -> 'A', 0.02 -> 'B', ...
update TMP set id = concat(floor(id), char((id - floor(id)) * 100 + 64));

insert into SRC select * from TMP;
What happens when you get to 12345Z?
Anyway, change the table structure slightly; here's the recipe:
Drop any indices on ID.
Split ID (apparently a varchar) into ID_Num (long int) and ID_Alpha (varchar, not null). Make the default value for ID_Alpha an empty string ('').
So, 12345B (varchar) becomes 12345 (long int) and 'B' (varchar), etc.
Create a unique, ideally clustered, index on columns ID_Num and ID_Alpha.
Make this the primary key. Or, if you must, use an auto-incrementing integer as a pseudo primary key.
Now, when adding new data, finding duplicate ID numbers is trivial, and the last ID_Alpha can be obtained with a simple MAX() operation.
Resolving duplicate IDs should now be an easier task, using either a while loop or a cursor (if you must).
But it should also be possible to avoid "row by agonizing row" (RBAR) processing and use a set-based approach. A few days of reading Jeff Moden's articles should give you ideas in that regard. A sketch of the structural change follows.
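This is only a sketch in T-SQL (the table and column names follow the question's archive table, and the types are assumptions; PATINDEX finds the first non-digit, and appending a letter guarantees it finds one even for purely numeric IDs):
-- add the split columns
ALTER TABLE archivetable ADD
    ID_Num   bigint NULL,
    ID_Alpha varchar(10) NOT NULL DEFAULT '';
-- backfill: '12345B' -> 12345 / 'B', '12345' -> 12345 / ''
UPDATE archivetable SET
    ID_Num   = CAST(LEFT(IDnum, PATINDEX('%[^0-9]%', IDnum + 'A') - 1) AS bigint),
    ID_Alpha = SUBSTRING(IDnum, PATINDEX('%[^0-9]%', IDnum + 'A'), 10);
-- unique index; the next suffix for a number is then a cheap MAX()
CREATE UNIQUE INDEX IX_archive_num_alpha ON archivetable (ID_Num, ID_Alpha);
SELECT MAX(ID_Alpha) FROM archivetable WHERE ID_Num = 12345;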
Here is my final solution:
update a
set IDnum = b.IDnum
from tempimporttable a inner join
     (select * from archivetable
      where IDnum in (select max(IDnum) from archivetable
                      where IDnum in (select IDnum from tempimporttable)
                      group by left(IDnum, 7))
     ) b
     on b.IDnum like a.IDnum + '%'
where
    *row from tempimport table = row from archive table*
to set incoming rows to the same IDnum as old rows, and then
update a
set patient_account_number = case
        when len((select max(IDnum) from archive
                  where left(IDnum, 7) = left(a.IDnum, 7))) = 7
            then a.IDnum + 'A'
        else left(a.IDnum, 7)
             + char(ascii(right((select max(IDnum) from archive
                                 where left(IDnum, 7) = left(a.IDnum, 7)), 1)) + 1)
    end
from tempimporttable a
where not exists ( *select rows from archive table* )
I don't know if anyone wants to delve too far into this, but I appreciate constructive criticism...