Remove text from column in sql database - sql

I need to remove some text from a metadata column in a SQL database; it takes up too much space and isn't necessary.
ImageDescription: Make:NIKON Model:COOLPIX L26 Orientation:1 XResolution:300 YResolution:300 ResolutionUnit:2 Software:COOLPIX L26 V1.0 DateTime:2013:01:05 05:43:12 YCbCrPositioning:2 ExposureTime:0.04 FNumber:3.2 ExposureProgram:2 ISOSpeedRatings:200 ExifVersion:3836042731,664497658,2489535484,2327246609 DateTimeOriginal:2013:01:05 05:43:12 DateTimeDigitized:2013:01:05 05:43:12 ComponentsConfiguration:185856,59901696,256,1280 CompressedBitsPerPixel:2 ExposureBiasValue:0 MaxApertureValue:3.4 MeteringMode:5 LightSource:0 Flash:24 FocalLength:4.6 UserComment:0,0,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,2105376,16973830,65539,393216,18481972,65541,62128128,18546688,65541,62652416,19398656,65539,131072,33685414,65540,63176704,33685504,65540,223674368,0,19660800,65536,19660800,65536,3640590336,2214648831,202051584,269093902,302910734,403902481,370678312,590419990,975707429,960249139,1077360691,1078877256,927291204,1366118456,1734500183,1295935336,1685092721,1734696056,303104355,404035602,790239791,1110983267,1667457891,1667457891,1667457891,1667457891,1667457891,1667457891,1667457891,1667457891,1667457891,1667457891,1667457891,1667457891,3237962595,528640,60817528,33562881,285409553,29687553,16777378,16843013,65793,0,16777216,84148994,151521030,1051402,50528514,84083714,263173,24969472,67109634,554829073,319177009,570909009,2167542897,587768209,365015362,619762002,2188534323,387320329,622467352,690497318,909456426,976828471,1178944579,1246316615,1448432723,1515804759,1717920867,1785292903,1987409011 FlashpixVersion:1687135923,3231872644,1301805154,2554547069 ColorSpace:1 PixelXDimension:640 PixelYDimension:480 FileSource:3 SceneType:1 CustomRendered:0 ExposureMode:0 WhiteBalance:0 DigitalZoomRatio:0 FocalLengthIn35mmFilm:26 
SceneCaptureType:0 GainControl:1 Contrast:0 Saturation:0 Sharpness:0 SubjectDistanceRange:0
I want to get rid of the UserComment part of that text.
So far I have this, which keeps only the text to the left of UserComment, so it also removes everything after it.
UPDATE Document
SET MetaData = LEFT(MetaData, CHARINDEX('UserComment:', MetaData) -1)
WHERE MetaData IS NOT NULL
AND MetaData like '%UserComment:%'
AND DocumentId = '480024'
But I just want to get rid of the UserComment and not anything before or after it.
Here is a sample of just the UserComment; it does not seem to contain any spaces.
UserComment:0,0,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,538976288,2105376,16973830,65539,393216,18481972,65541,62128128,18546688,65541,62652416,19398656,65539,131072,33685414,65540,63176704,33685504,65540,223674368,0,19660800,65536,19660800,65536,3640590336,2214648831,202051584,269093902,302910734,403902481,370678312,590419990,975707429,960249139,1077360691,1078877256,927291204,1366118456,1734500183,1295935336,1685092721,1734696056,303104355,404035602,790239791,1110983267,1667457891,1667457891,1667457891,1667457891,1667457891,1667457891,1667457891,1667457891,1667457891,1667457891,1667457891,1667457891,3237962595,528640,60817528,33562881,285409553,29687553,16777378,16843013,65793,0,16777216,84148994,151521030,1051402,50528514,84083714,263173,24969472,67109634,554829073,319177009,570909009,2167542897,587768209,365015362,619762002,2188534323,387320329,622467352,690497318,909456426,976828471,1178944579,1246316615,1448432723,1515804759,1717920867,1785292903,1987409011
Any help will be appreciated.

You can use CHARINDEX to find where the UserComment starts and where it ends:
http://msdn.microsoft.com/en-us/library/ms186323.aspx
After that, use the LEFT and RIGHT functions to build the string you want to keep:
http://msdn.microsoft.com/en-us/library/ms177601.aspx
http://msdn.microsoft.com/en-us/library/ms177532.aspx
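-- Preview the positions and the resulting string before turning this into an UPDATE
select
CHARINDEX('UserComment:', MetaData), -- where UserComment starts
CHARINDEX(' ', MetaData, CHARINDEX('UserComment:', MetaData)), -- first space after it, i.e. where it ends
LEFT(MetaData, CHARINDEX('UserComment:', MetaData)-1) + RIGHT(MetaData, LEN(MetaData) - CHARINDEX(' ', MetaData, CHARINDEX('UserComment:', MetaData)))
from Document
where MetaData like '%UserComment:%' -- LEFT fails with a negative length when the field is absent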
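To make the find-start, find-end, glue-together logic easier to check, here is a minimal sketch of the same idea in Python rather than T-SQL. The function name remove_field and the shortened sample string are made up for illustration. It assumes, as in the question, that the UserComment value contains no spaces.

```python
def remove_field(metadata: str, field: str = "UserComment:") -> str:
    """Drop one 'Name:value' token from a space-separated metadata string."""
    start = metadata.find(field)          # like CHARINDEX(field, MetaData), but 0-based
    if start == -1:
        return metadata                   # field absent: nothing to remove
    end = metadata.find(" ", start)       # first space after the field starts
    if end == -1:
        return metadata[:start].rstrip()  # field is the last token in the string
    # Keep everything before the field and everything after the space that ends it
    return metadata[:start] + metadata[end + 1:]


# Shortened, hypothetical sample in the same shape as the real column value
sample = "FocalLength:4.6 UserComment:0,0,538976288 FlashpixVersion:1687135923 ColorSpace:1"
print(remove_field(sample))
```

The two early returns cover cases the SQL expression does not handle by itself: a row without UserComment, and a row where UserComment is the last entry.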

Related

Parse string as JSON with Snowflake SQL

I have a field in a table of our db that works like an event-like payload, where all changes to different entities are gathered. See example below for a single field of the object:
'---\nfield_one: 1\nfield_two: 20\nfield_three: 4\nid: 1234\nanother_id: 5678\nsome_text: Hey you\na_date: 2022-11-29\nutc: this_utc\nanother_date: 2022-11-30\nutc: another_utc'
Since accessing this field with pure SQL is a pain, I was thinking of parsing it as a JSON so that it would look like this:
{
"field_one":"1",
"field_two": "20",
"field_three": "4",
"id": "1234",
"another_id": "5678",
"some_text": "Hey you",
"a_date": "2022-11-29",
"utc": "2022-11-29 15:29:28.159296000 Z",
"another_date": "2022-11-30",
"utc": "2022-11-30 13:34:59.000000000 Z"
}
And then just use a Snowflake-native approach to access the values I need.
As you can see, though, there are two fields that are called utc, since one is referring to the first date (a_date), and the second one is referring to the second date (another_date). I believe these are nested in the object, but it's difficult to assess with the format of the field.
This is a problem since I can't differentiate between one utc and another when giving the string the format I need and running a parse_json() function (due to both keys using the same name).
My SQL so far looks like the following:
select
object,
replace(object, '---\n', '{"') || '"}' as first,
replace(first, '\n', '","') as second_,
replace(second_, ': ', '":"') as third,
replace(third, ' ', '') as fourth,
replace(fourth, ' ', '') as last
from my_table
(Steps third and fourth are needed because I have some fields that have extra spaces in them)
And this actually gives me the format I need, but due to what I mentioned around the utc keys, I cannot parse the string as a JSON.
Also note that the structure of the string might change from row to row, meaning that some rows might gather two utc keys, while others might have one, and others even five.
Any ideas on how to overcome that?
Replace only one occurrence with regexp_replace():
with data as (
select '---\nfield_one: 1\nfield_two: 20\nfield_three: 4\nid: 1234\nanother_id: 5678\nsome_text: Hey you\na_date: 2022-11-29\nutc: this_utc\nanother_date: 2022-11-30\nutc: another_utc' o
)
select parse_json(last2)
from (
select o,
replace(o, '---\n', '{"') || '"}' as first,
replace(first, '\n', '","') as second_,
replace(second_, ': ', '":"') as third,
replace(third, ' ', '') as fourth,
replace(fourth, ' ', '') as last,
regexp_replace(last, '"utc"', '"utc2"', 1, 2) last2
from data
)
;
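Note that the call above renames only the second utc. Since some rows may hold three or more utc keys, the same idea has to be applied to every repeated key. Here is a minimal sketch of that logic in Python (outside Snowflake, just to show the approach). The function name and the _2, _3 suffix scheme are my own assumptions, not something from the question.

```python
import json


def payload_to_dict(payload: str) -> dict:
    """Parse '---\\nkey: value\\n...' text, numbering repeated keys (utc, utc_2, ...)."""
    seen = {}
    result = {}
    for line in payload.splitlines():
        if ": " not in line:
            continue                          # skips the leading '---' marker
        key, value = line.split(": ", 1)      # split on the first ': ' only
        key = key.strip()
        seen[key] = seen.get(key, 0) + 1
        unique_key = key if seen[key] == 1 else f"{key}_{seen[key]}"
        result[unique_key] = value.strip()
    return result


payload = ("---\nfield_one: 1\nid: 1234\na_date: 2022-11-29\nutc: this_utc\n"
           "another_date: 2022-11-30\nutc: another_utc")
print(json.dumps(payload_to_dict(payload), indent=2))
```

Because the keys are made unique while the object is being built, it does not matter whether a row has one utc or five.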
This may not be what you want, but it seems to me that your problem could be solved by letting each UTC timestamp replace the date that precedes it, so that no keys are duplicated. You can always derive the dates from the timestamps afterwards. If that makes sense, see if you can apply your parse_json solution to this output instead:
set str='---\nfield_one: 1\nfield_two: 20\nfield_three: 4\nid: 1234\nanother_id: 5678\nsome_text: Hey you\na_date: 2022-11-29\nutc: 2022-11-29 15:29:28.159296000 Z\nanother_date: 2022-11-30\nutc: 2022-11-30 13:34:59.000000000 Z';
select regexp_replace($str,'[0-9]{4}-[0-9]{2}-[0-9]{2}\nutc:')

UTF8 changed to Latin1 - umlauts are not considered

I am currently updating a table using an UPDATE statement in which a text fragment is read from another table using a substring. The statement works fine so far. The only problem is the umlauts, which are not taken into account during the update. As I found out, for some reason the substring is converted to latin1, although the corresponding column of the table (aktion) is set to utf8. The code for the update is below.
SQL:
update vms_vertrag_datei d
inner join vms_vertrag_verlauf v ON d.vertrag = v.vertrag
SET d.nutzer = v.nutzer, d.uploaddatum= v.timestamp
WHERE d.filename in (SELECT DISTINCT SUBSTRING(v.aktion,LOCATE('"',v.aktion)+1,(((LENGTH(v.aktion))-LOCATE('"', REVERSE(v.aktion))-1)-LOCATE('"',v.aktion)))FROM vms_vertrag_verlauf v)
AND v.aktion like 'Datei%hinzugefügt';
Does anyone know how I can also take text with umlauts into account? After a lot of online research I am getting desperate.

Sql server column serialization without key

I have column A with value hello.
I need to migrate it into new column AJson with value ["hello"].
I have to do this with Sql Server command.
There are different commands FOR JSON etc. but they serialize value with column name.
This is the same value that the C# call JsonConvert.SerializeObject(new List<string>(){"hello"}) would produce.
I can't simply attach [" in the beginning and end because the string value may contain characters which without proper serialization will break the json string.
My advice is to just nest a lot of REPLACE calls and do the escaping yourself.
FOR JSON is intended for producing entire JSON objects, so its output always carries keys.
Here is a simple example that replaces the endline with \n
print replace('ab
c','
','\n')
Backspace to be replaced with \b.
Form feed to be replaced with \f.
Newline to be replaced with \n.
Carriage return to be replaced with \r.
Tab to be replaced with \t.
Double quote to be replaced with \".
Backslash to be replaced with \\.
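The order of the replacements matters: the backslash has to be escaped first, otherwise the backslashes added by the other replacements get escaped a second time. Here is a minimal sketch of the chain in Python (not T-SQL), using the standard JSON escapes for the seven characters listed, where double quote becomes \" and backslash becomes \\. The name to_json_array is made up for illustration.

```python
import json

# JSON escapes for the characters listed above. Backslash comes first so the
# backslashes introduced by the later replacements are not escaped again.
# Note: other control characters below U+0020 are not handled by this sketch.
ESCAPES = [("\\", "\\\\"), ('"', '\\"'), ("\b", "\\b"), ("\f", "\\f"),
           ("\n", "\\n"), ("\r", "\\r"), ("\t", "\\t")]


def to_json_array(value: str) -> str:
    """Escape a string and wrap it as a one-element JSON array."""
    for raw, escaped in ESCAPES:
        value = value.replace(raw, escaped)
    return '["' + value + '"]'


print(to_json_array("hello"))                    # prints ["hello"]
tricky = 'a "b" \\ c\n\td'
print(json.loads(to_json_array(tricky)) == [tricky])  # prints True
```

The same ordering applies to nested REPLACE calls in SQL: the innermost REPLACE should be the one that handles the backslash.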
My approach was to use these 3 commands:
UPDATE Offers
SET [DetailsJson] =
(SELECT TOP 1 [Details] AS A
FROM Offers AS B
WHERE B.Id = Offers.Id
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER)
UPDATE Offers
SET [DetailsJson] = Substring([DetailsJson], 6, LEN([DetailsJson]) - 6)
UPDATE Offers
SET [DetailsJson] = '[' + [DetailsJson] + ']'
For the OP's answer/table, string_escape does the escaping in a single statement:
UPDATE Offers
SET [DetailsJson] = concat(N'["', string_escape([Details], 'json'), N'"]');
declare @col nvarchar(100) = N'
a b c " : [ ] ]
x
y
z'
select concat(N'["', string_escape(@col, 'json'), N'"]'), isjson(concat(N'["', string_escape(@col, 'json'), N'"]'));

XML - Extract after a String and before a certain character

I've the following XML Code:
_RCFM*=.·<form><text id="NomeTransporteSAP" label="JOB: *" mandatory="true" multiline="true" readonly="false" visible="true">AA123EDC/NB: Cheque holding v05 TESTE PT 223427</text>
I'm trying to create a statement that allows me to get the ID: AA123EDC
For that I'm using:
SUBSTRING(col1, LEN(SUBSTRING(col1, 0, LEN(col1) - CHARINDEX ('DSI Request Number', col1))) + 1,
LEN(col1) - LEN(SUBSTRING(col1, 0, LEN(col1) - CHARINDEX ('DSI Request Number', col1)))
- LEN(SUBSTRING(col1, CHARINDEX ('</text><text id=', col1), LEN(col1))))
But it gives me the wrong string...
Can anybody help me?
Thanks!
Your first line is quite unclear (but you've tagged it with tsql...). It seems that you want to read a value from within an XML. Furthermore, this value is not atomic, so you have to parse it out.
If my assumptions are correct, you should try it this way:
DECLARE @YourXML XML=
N'<form>
<text id="NomeTransporteSAP" label="JOB: *" mandatory="true" multiline="true" readonly="false" visible="true">AA123EDC/NB: Cheque holding v05 TESTE PT 223427</text>
</form>';
WITH ReadFromXML AS
(
SELECT @YourXML.value(N'(/form/text/text())[1]',N'nvarchar(max)') AS TheValue --AA123EDC/NB: Cheque holding v05 TESTE PT 223427
)
SELECT LEFT(TheValue,CHARINDEX('/',TheValue)-1)
FROM ReadFromXML;
This will use a CTE to retrieve the inner text in a derived table and cut away everything starting with the / using LEFT.
The CTE-approach is not necessary, but is much better to read.
If your XML is living within a table you can use the same approach, but in this case I'd use CROSS APPLY instead of the CTE.
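To check the two steps (read the inner text of the element, then cut at the first /) outside the database, here is a minimal sketch in Python. It assumes the stored value looks like the sample in the question: a non-XML prefix in front of <form>, and the closing </form> tag missing. The function name extract_job_id is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sample value copied from the question, including the non-XML prefix
raw = ('_RCFM*=.·<form><text id="NomeTransporteSAP" label="JOB: *" mandatory="true" '
       'multiline="true" readonly="false" visible="true">'
       'AA123EDC/NB: Cheque holding v05 TESTE PT 223427</text>')


def extract_job_id(col1: str) -> str:
    xml_part = col1[col1.index("<form>"):]       # drop everything before the XML
    if not xml_part.rstrip().endswith("</form>"):
        xml_part += "</form>"                     # assumption: only the closing tag is missing
    text = ET.fromstring(xml_part).findtext("text")  # inner text of the first <text> element
    return text.split("/", 1)[0]                  # same as LEFT(TheValue, CHARINDEX('/', TheValue) - 1)


print(extract_job_id(raw))  # prints AA123EDC
```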

How do I pull a substring of various sides?

I want to update a column by pulling out only part of a string in another column.
What I want to do is pull the name of the jpg out of the text. For example, I want imageName to be equal to "great-family.jpg", a varchar string. But the image names are all different.
update tblPetTips
set imageName = "great-family.jpg"
where articleText = "<img src="/images/imgs/great-family.jpg" alt="A Great Family Dog">"
In this case I would like to say
update tblPetTips
set imageName = "yellow-smile.jpg"
where articleText = "<img src="/images/imgs/yellow-smile.jpg" alt="A Yellow Smiley Face">"
How do I (without hardcoding) update imageName from the articleText column?
All the directories are the same - all the images live in images/imgs.
If the source of your images is always /images/imgs/, you can use PATINDEX to find the positions of '/images/imgs/' and '" alt', then extract the text between them.
Check if this works:
substring(articleText,
patindex('%/images/imgs/%', articleText) + len('/images/imgs/'), -- first character after the directory
patindex('%" alt%', articleText) - (patindex('%/images/imgs/%', articleText) + len('/images/imgs/'))) -- length up to the closing quote
If your images can have any URL, it would be feasible with regular expressions, but I don't think SQL Server provides regex directly.
In that case you could write a function that extracts the filename part of a URL and call it in the update.
You can try to get values between the last / and " with SUBSTRING_INDEX function:
UPDATE tblPetTips
SET imageName = SUBSTRING_INDEX(SUBSTRING_INDEX(articleText, '/', -1), '"', 1);
It will only work if the format of the <img src=... > HTML is consistent.
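To see what the two nested SUBSTRING_INDEX calls do, and why the format has to be consistent, here is a minimal sketch of the same logic in Python. The function name image_name and the second test string are made up for illustration.

```python
def image_name(article_text: str) -> str:
    # SUBSTRING_INDEX(articleText, '/', -1): everything after the last '/'
    after_last_slash = article_text.rsplit("/", 1)[-1]
    # SUBSTRING_INDEX(..., '"', 1): everything before the first '"'
    return after_last_slash.split('"', 1)[0]


print(image_name('<img src="/images/imgs/great-family.jpg" alt="A Great Family Dog">'))
# prints great-family.jpg

# A '/' later in the tag (here inside the alt text) moves the "last slash" and breaks the result
print(image_name('<img src="/images/imgs/a.jpg" alt="Cat/Dog">'))
# prints Dog
```

If alt texts may contain slashes, anchoring on the fixed /images/imgs/ prefix instead of the last / is safer.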