Is there a possibility to delete a string that starts with a ">" and ends with a newline in postgresql via sql update statement? - sql

Lets say I have a database with about 50k entries in a column called content.
This column contains strings which causes problems to my further work.
Now here is the thing I need to do it for all the rows inside of that table.
Any Ideas?
Here an example:
'user wrote:
-----------------------------------------------------
> Some text
> that vary too much and I dont need it actually
> here is end of the text
The text I actually need.'
I would like to remove all of the unnecessary part so the only thing that is left is in this case :
'The text I actually need.'

This should delete all lines that start with a >:
regexp_replace(textcol, E'^>.*\n', '', 'gn');
The g flag is needed to delete all such lines, and the n flag makes the ^ match the position right after each line break.
I use an “extended” string literal (the leading E) so that I can write a newline as \n.

Related

How to replace some characters after a specific character to another specific character in one big sql line in notepad++

I have a big sql file with thousand user something like this:
('someone1#mydomain.com','{SSHA512}JWHCqHzazH2vGneLPfhMKkoAamzvxdNCWYOlhZ+uDx36jHdoMXwQmbEemvUMn7ZG6c9+22noXjjb2hAb99/5A/slscDJPKav','','en_US','maildir','Maildir','/home/vmail','vmail1','mydomain.com/someone1/',0,'mydomain.com','','','normal','',0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,NULL,'1970-01-01 01:01:01',0,'',NULL,NULL,'2020-03-19 13:15:58','2015-08-03 06:11:53','2020-03-19 13:15:58','9999-12-31 00:00:00',1'someone1'),
('someone2#mydomain.com','{SSHA512}UoMeyocmdC2DxM0S7B4WFdjnCNuvkngzzLus33h9nugKVlvdhlcboKmMDDuAkCHEyLBUgf8DicKWFPJVS7EOF/ytv27MQ3Ch','','en_US','maildir','Maildir','/home/vmail','vmail1','mydomain.com/someone2/',0,'mydomain.com','','','normal','',0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,NULL,'1970-01-01 01:01:01',0,'',NULL,NULL,'2015-12-17 12:27:35','2015-08-03 06:44:10','2021-06-08 06:55:33','9999-12-31 00:00:00',1'someone2'),
('someone3#mydomain.com','{SSHA512}A6ToCf4OfP3XNEU9ngEmGN/LDquH9+s9Qxme3SoJaDyVvxiWpnwwTiAALSdnmhIxDB2VQK0zhdF+jP8ARvh0N3IDL0Xv/KmL','','en_US','maildir','Maildir','/home/vmail','vmail1','mydomain.com/someone3/',0,'mydomain.com','','','normal','',0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,NULL,'1970-01-01 01:01:01',0,'',NULL,NULL,'2018-04-03 12:31:09','2015-08-03 06:50:01','2018-04-03 12:31:18','9999-12-31 00:00:00',1'someone3'),
('someone4#mydomain.com','{SSHA512}t7/JbUPQ+rtKeRTgWRH6KlETr2JsqYORBOZouzOzs4Wo6YfHYLoy0m+U4kZXk+AeNgMep2hGZSodPZdK2l2bn9MhOKHOuF/L','','en_US','maildir','Maildir','/home/vmail','vmail1','mydomain.com/someone4/',0,'mydomain.com','','','normal',''0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,NULL,'1970-01-01 01:01:01',0,'',NULL,NULL,'2020-03-18 07:48:26','2016-11-14 06:59:04','2021-06-08 05:54:28',9999-12-31 00:00:00',1'someone4')
And now I need to delete the last word ('someone1' , 'someone2' , 'someone3' , 'someone4') for every user which adjoining to 1. It will be looks like
....9999-12-31 00:00:00',1)
not like in original
....9999-12-31 00:00:00',1'someone1')
....9999-12-31 00:00:00',1'someone2')
etc
But don't forget they are not in different lines. All this is in one big line and this makes me to ask you help. Thanks a lot.
It seems that (from your examples) the rows do not contain any parentheses except their start and end characters. So you can search for one quotation mark ', and a number of letters and/or digits, and one quotation mark ', and than ).
To do this;
Open Replace window in Notepad++ by using ctrl+h shortcut
From Search Mode section select Reqular expression
Write '[a-zA-Z0-9]*?[-,_,.]*?[a-zA-Z0-9]*?[-,_,.]*?[a-zA-Z0-9]*?[-,_,.]*?[a-zA-Z0-9]*?'\) to Find what box
Write '\) to Replace with box
Click Replace All button.
This works if user names consist of letters or digits and _, -, . at most 3 times.
Be Sure that you have a copy of original file as a backup. And also be aware of that the regular expression that we use may find unrelated parts if any row contains closing parentheses except end of it.

Getting specific rows in a Powershell variable/array

I hope I'm able to ask my question as simple as possible. I am very new to working with PowerShell.
Now to my question:
I use Invoke-Sqlcmd to run a query, which puts Data in a variable, let's say $Data.
In this case I query for triggers in an SQL Database.
Then I kind of split the array to get more specific information:
$Data2 = $Data | Where {$_.table -like 'dbo.sportswear'}
$Data3 = $Data2 | Where {$_.event -match "Delete"}
So in the end I have a variable with these Indexes(?), I'm not sure if they are called indexes.
table
trigger_name
activation
event
type
status
definition
Now all I want is to check something in the definition.
So I create a $Data4 = $Data3.definition, so far so good.
But now I have a big text and I want only the content of 2-3 specific rows.
When I used like $Data4[1] or $Data4[1..100], I realized that PowerShell sees every char as a line/row.
But when I just write $Data4 it shows me the content nice formatted with paragraphs, new lines and so on.
Has anyone an idea how I can get specific rows or lines of my variable?
Thank you all :)
It appears $Data4 is a formatted string. Since it is a single string, any indexed element lookups return single characters (of type System.Char). If you want indexes to return longer substrings, you will need to split your string into multiple strings somehow or come up with a more sophisticated search mechanism.
If we assume the rows you are after are actual lines separated by line feed and/or carriage return, you can just split on those newline characters and use indexes to access your lines:
# Array indexing starts at 0 for line 1. So [1] is line 2.
# Outputs lines 2,3,4
($Data4 -split '\r?\n')[1..3]
# Outputs lines 2,7,20
($Data4 -split '\r?\n')[1,6,19]
-split uses regex to match characters and perform a string split on all matches. It results in an array of substrings. \r matches a carriage return. \n matches a line feed. ? matches 0 or one character, which is needed in case there are no carriage returns preceding your line feeds.

Update a text column that contains a newline character '\n'

I'm encountering weird behaviour of PostgreSQL where I try to run the following query
update posts
set content = replace(content, '\n', '<br>')
where content is not null;
and it's not doing anything to the data in the database. I **tried committing manually (including trying to run this query from psql) ** as well as setting DBeaver/pgAdmin to AUTOCOMMIT but to no avail.
The result tells me 37 rows have been updated, but the changes are not there. If I try to commit it tells me 0 rows affected.
I have no triggers at all, so that's out of the question.
Am I missing something here?
Use e before the literal:
update posts
set content = replace(content, e'\n', '<br>')
where content is not null
-- or better
-- where content like e'%\n%'
From the documentation (String Constants with C-Style Escapes):
An escape string constant is specified by writing the letter E (upper or lower case) just before the opening single quote.

How to read elements from a line in VHDL?

I'm trying to use VHDL to read from a file that can have different formats. I know you're supposed to use the following two lines of code to read a line at a time, the read individual elements in that line.
readline(file, aline);
read(aline, element);
However my question is what will read(aline, element) return into element? What will it return if the line is empty? What will it return if I've used it let's say 5 times and my line only has 4 characters?
The reason I want to know is that if I am reading a file with an arbitrary number of spaces between valid data, how do I parse this valid data?
The file contains ASCII characters separated by arbitrary amounts of white space (any number of spaces, tabs, or new lines). If the line starts with a # that line is a comment and should be ignored.
Outside of these comments, the first part of the file contains characters that are only letters or numbers in combinations of variable size. In other words this:
123 ABC 12ABB3
However, the majority of the file (after a certain number of read words) will be purely numbers of arbitrary length, separated by an arbitrary amount of white space. In other words, the second part of the file is this:
255 0 2245 625 430
2222 33 111111
and I must be able to parse these numbers (and interpret them as such) individually.
As mentioned in the comments, all the read procedures in std.textio and ieee.std_logic_textio skip over leading spaces apart from the character and string versions (because a space is as much a character as any other).
You can test whether a line variable (the buffer) is empty like this:
if L'length > 0 then
where L is your line variable. There is also a set of overloaded read procedures with an extra status output:
procedure read (L : inout LINE;
VALUE: out <type> ;
GOOD : out BOOLEAN);
The extra output - GOOD - is true if the read was successful and false if it wasn't. The advantage of these if that the read is unsuccessful, the simulation does not stop (as it does with the regular procedures). Also, with the versions in std.textio, if the read is unsuccessful, the read is non-destructive (ie whatever you were trying to read remains in the buffer). This is not the case with the versions in ieee.std_logic_textio, however.
If you really do not know what format you are trying to read, you could read the entire line into a string, like this:
variable S : string(1 to <some big number>);
...
readline(F, L);
assert L'length < S'length; -- make sure S is big enough
S := (others => ' '); -- make sure that the previous line is overwritten
if L'length > 0 then
read(L, S(1 to L'length);
end if;
The line L is now in the string S. You can then write some code to parse it. You may find the type attribute 'value useful. This converts a string to some type, eg
variable I : integer;
...
I := integer'value(S(12 to 14));
would set integer I to the value contained in elements 12 to 14 of string S.
Another approach, as suggested by user1155120 below, is to peek at the values in the buffer, eg
if L'length > 0 then -- check that the L isn't empty, otherwise the next line blows up
if L.all(1) = '#' then
-- the first character of the line is a '#' so the line must be a comment

how to cut part of string

How to cut part from this string...
"abb.c.d+de.ee+f.xxx+qaa.+.,,s,"
... where i know position by this:
Result is always between "." (left side of result) and "+" (right side).
I know number of "." from left side and number of "+" from right side, to delimit resulting string.
Problem is right side, cause i need to count "+" from end.
Say...
from left side: begining is at 4th "."
( this is easy ), result is =
"xxx+qaa.+.,,s,"
from right side: end is at second "+" from end!
"xxx[here]+qaa.+.,,s,"
result is =
"xxx"
I try to do this myself with .substring and .indexOf, but with no success...
Any ideas? thanks
You could use the StrReverse function to reverse the character sequence and then count + from the left (using the same method as counting the .).
To find the start of the substring, loop through the string from the left. Count the number of .s you have seen and stop when you've hit the number you want. Store the index in some variable like start.
Similarly to find the end of the substring, loop from the right and count +s.
You can solve this problem using Regex:
Dim r As New Regex("^(.*\.){4}(?<value>.*)(\+.*){2}$")
Dim m As Match = r.Match("abb.c.d+de.ee+f.xxx+qaa.+.,,s,")
Dim result As String = m.Result("${value}")
Explanation
^ Indicates the beginning of the string
(.*\.){4} This means any character (.) repeated any number of times (*) followed by a period (\.). The period has to be escaped with the backslash because otherwise the period would be the any-character wildcard. The .*\. is enclosed in (){4} to say that pattern must repeat four times.
(?<value>.*) This specifies the placeholder for the text we are after. value is the name we are assigning to it. The .* specifies that the value is any number of any characters.
(\+.*){2} This means a plus character (has to be escaped) followed by any number of any characters, repeated twice.
$ Indicates the end of the string