Notepad++ rows to columns, in groups - datatables

I have found a ton of ways to transpose columns to text in Notepad++ and vice versa. However, where I'm struggling is that I have one column with several rows. I can't 'just' transpose these as the data winds up being in the wrong order.
Example:
RANK
COMPANY
GROWTH
REVENUE
INDUSTRY
1
Skillz
50,058.92%
$54.2m
Software
2
EnviroSolar Power
36,065.06%
$37.4m
Energy
When I transpose this, I wind up with:
RANKCOMPANYGROWTHREVENUEINDUSTRY 1Skillz50,058.92%$54.2mSoftware2EnviroSolar Power36,065.06%$37.4mEnergy
I need everything to remain in groups so I wind up with the following, noting that I also need a delimiter added:
RANK|COMPANY|GROWTH|REVENUE|INDUSTRY
1|Skillz|50,058.92%|$54.2m|Software
2|EnviroSolar Power|36,065.06%|$37.4m|Energy
As you can see with the company EnviroSolar Power, there is a space between "EnviroSolar" and "Power", and everything I've tried winds up removing the spaces that should remain intact when transposing.
I appreciate ANY help you can offer! Thank you in advance!

Assuming that your rows always start with an integer (except for the header row, of course) and that only the first column contains pure integers, you could do that with two search-and-replace operations (Ctrl+H).
Be sure to opt for 'Regular expression' search mode.
First replace all newlines with pipes. This will put everything on one line for now.
Find what: \n
Replace with: |
Next, find all purely numeric fields and make each one the start of a new line to reach the desired result.
Find what: \|([0-9]+)\|
Replace with: \n$1|
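As a rough illustration of the two replacements above, here is a minimal Python sketch that applies the same two steps to the sample data from the question. Python's re module stands in for Notepad++'s regex engine, and the variable names are only illustrative.

```python
import re

# Sample data from the question: one value per line.
text = "\n".join([
    "RANK", "COMPANY", "GROWTH", "REVENUE", "INDUSTRY",
    "1", "Skillz", "50,058.92%", "$54.2m", "Software",
    "2", "EnviroSolar Power", "36,065.06%", "$37.4m", "Energy",
])

# Step 1: replace every newline with a pipe, giving one long line.
one_line = text.replace("\n", "|")

# Step 2: a purely numeric field between two pipes starts a new row.
result = re.sub(r"\|([0-9]+)\|", r"\n\1|", one_line)

print(result)
```

Note that this relies on the stated assumption: if any other column could hold a bare integer, step 2 would split the row in the wrong place.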

If you know the number of columns (here it is 5), you can do it in two steps:
First:
Ctrl+H
Find what: (?:[^\r\n]+\R){5}
Replace with: $0\n
Replace all
Explanation:
(?: : start non capture group
[^\r\n]+ : 1 or more any character but line break
\R : any kind of line break
){5} : the group must occur 5 times;
set this to the number of columns you have
This adds an extra line break after every 5 lines.
Make sure 'Regular expression' search mode is checked.
Second:
Ctrl+H
Find what: (\R)(?!\R)|(\R\R)
Replace with: (?1|:\n)
Replace all
Explanation:
(\R) : any kind of line break, in group 1
(?!\R) : negative lookahead, make sure we have not another linebreak after
| : OR
(\R\R) : 2 line break, in group 2
Replacement:
(?1 : conditional replacement: did group 1 match?
| : yes ==> a pipe
:\n : no ==> a line break
) : end of the condition
This replaces a single line break with a pipe and 2 consecutive line breaks with a single one.
Result for given example:
RANK|COMPANY|GROWTH|REVENUE|INDUSTRY
1|Skillz|50,058.92%|$54.2m|Software
2|EnviroSolar Power|36,065.06%|$37.4m|Energy
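The same fixed-width grouping can be sketched outside Notepad++. The Python snippet below is only an illustration of the idea (join every 5 lines with a pipe); the function name rows_from_lines and its parameters are hypothetical, not part of the answer above.

```python
def rows_from_lines(text, columns=5, sep="|"):
    """Join every `columns` consecutive non-empty lines into one delimited row."""
    lines = [line for line in text.splitlines() if line.strip()]
    return [
        sep.join(lines[i:i + columns])
        for i in range(0, len(lines), columns)
    ]

# Sample data from the question: one value per line.
sample = "\n".join([
    "RANK", "COMPANY", "GROWTH", "REVENUE", "INDUSTRY",
    "1", "Skillz", "50,058.92%", "$54.2m", "Software",
    "2", "EnviroSolar Power", "36,065.06%", "$37.4m", "Energy",
])

rows = rows_from_lines(sample)
print("\n".join(rows))
```

Because the lines are joined rather than re-split, spaces inside a value such as "EnviroSolar Power" are left alone.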

Related

Impala Regex - find specific sequence of characters, with delimiters between them, some are not letters, digits or underscore

I am new to regex and need to search a string field in Impala for multiple matches of this exact sequence of characters: ~FC* followed by 11 more *, which may or may not have letters/digits between them (they are basically delimiters in this string field). The 12th * (counting the one in ~FC* as #1) should be immediately followed by Y~.
Since the asterisks are not letters or digits, I am unsure how to search for these delimiters properly.
This is my SQL so far:
select
regexp_extract(col_name, '(~FC\\*).*(\\*Y~)', 1) as "pattern_found"
from db.table
where id = 123456789
limit 1
data returned:
pattern_found
--------------
~FC*
In Impala SQL, (~FC\\*) returns ~FC*, which is great (I got it from my other question).
I have been trying (~FC\\*).*(\\*Y~), which obviously isn't counting the number of asterisks, but it is also not picking up the Y.
This is a test string, it has 2 occurrences:
N4*CITY*STATE*2155446*2120~FC*C*IND*30*MC*blah blah fjdgfeufh*27*0*****Y~FC*Z*IND*39*MC*jhlkfhfudfgsdkufgkusgfn*23*0*****Y~
The results should be these 2, which share an overlapping ~ between them, but I will settle for at least the first being found if both cannot be.
~FC*C*IND*30*MC*blah blah fjdgfeufh*27*0*****Y~
~FC*Z*IND*39*MC*jhlkfhfudfgsdkufgkusgfn*23*0*****Y~
I figured out a solution, but I'm happy to learn of a better way to accomplish this.
This is what worked in Impala SQL; it needed parentheses and double-escaped backslashes for all of the asterisks:
(~FC\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*Y)
Full SQL:
select
regexp_extract(col_name, '(~FC\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*Y)', 1) as "pattern_found"
from db.table
where id = 123456789
limit 1
and here is the RegexDemo without the additional syntax needed for Impala SQL
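One possible shortening, shown here as a Python sketch rather than tested Impala SQL: the eleven repeated "[^*]*\*" pieces can be written once with a counted group. Python's re module is used purely for illustration; whether Impala's regex engine accepts the counted group, and the lookahead in particular, is an assumption that would need checking there.

```python
import re

# Test string from the question (split only to keep the line short).
test_string = (
    "N4*CITY*STATE*2155446*2120~FC*C*IND*30*MC*blah blah fjdgfeufh*27*0*****Y"
    "~FC*Z*IND*39*MC*jhlkfhfudfgsdkufgkusgfn*23*0*****Y~"
)

# "~FC*", then 11 more fields of "anything but an asterisk, then an asterisk",
# then "Y". The lookahead (?=~) requires the closing "~" without consuming it,
# so the same "~" can also open the next occurrence.
pattern = re.compile(r"~FC\*(?:[^*]*\*){11}Y(?=~)")

# Re-attach the closing "~" that the lookahead left unconsumed.
matches = [m.group(0) + "~" for m in pattern.finditer(test_string)]

for match in matches:
    print(match)
```

The counted group keeps the pattern readable and makes the "12 asterisks in total" rule explicit in one place.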

Blank Space in every row of table SQL

Hello, I have a table with rows,
and I was running a simple
select * from table where column = 'string'
and it gives me back no result, but when I use:
select * from table where column like '%string%'
it gives me the row that exists in my table.
Then I did a select * from table and noticed that there is a blank space before my rows:
Image of my SQL result
If you look closely, there's a space at the beginning of the second row, and only the first row has no blank space.
So I thought it was simple white space at the beginning, but when I tried using this:
SELECT LTRIM(RTRIM(MATERIAL)) FROM table
nothing happened.
Then I tried to copy the result of my
select * from table
to Excel and noticed this:
Excel paste from SQL
My 2nd row got split into 2 rows right at the start of the 'material' column, so what I thought was a blank space is actually something like a line break.
I have never had or seen this problem before.
Larnu has commented how to remove all the linebreaks from the data. Here are some other things that could also work, and slightly differently depending on the effect you want:
--trim everything that is not a number or letter off the left hand side only
UPDATE table SET material = SUBSTRING(material, PATINDEX('%[0-9a-z]%', material), 99999)
--convert all linebreaks to spaces and trim off the left and right spaces
UPDATE table SET material = RTRIM(LTRIM(REPLACE(material, CHAR(10), ' ')))
Larnu's SQL isn't wrong; it just removes every line break anywhere, which may cause more formatting disruption than is wanted. I'd be tempted to replace all the line breaks with spaces, so that two words separated by a line break remain separated by a space rather than becoming one word:
some
word
-> some word (if you replace linebreak with space)
-> someword (if you replace linebreak with nothing)
If all you want is to remove line breaks from the left side of the field, the PATINDEX method searches the field for the first occurrence of a number or a letter and returns its index; SUBSTRING then takes everything from that index for a length of 99999 (use a bigger number if your field is longer). This has the effect of removing only the line breaks at the start of the field.
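To make the difference between these options concrete, here is a small Python sketch of the three behaviours described above. It only mimics the string logic; it is not SQL Server code, and the sample value is made up for illustration.

```python
import re

# Hypothetical field value: a stray leading line break plus one inside the text.
material = "\nsome\nword"

# Remove every line break: the two words run together.
removed = material.replace("\n", "")

# Replace line breaks with spaces, then trim both ends.
spaced = material.replace("\n", " ").strip()

# Drop only the leading characters that are not a letter or a digit,
# which is roughly what the PATINDEX/SUBSTRING approach does.
left_trimmed = re.sub(r"^[^0-9a-zA-Z]+", "", material)

print(repr(removed), repr(spaced), repr(left_trimmed))
```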
As to how it happened: whoever inserted the data, or the data import program, made a mistake when cutting up the data. Perhaps it was a Windows-style text file, whose line endings are CR LF (ASCII 13 followed by 10), and the program that did the import decided to split the file on the 13 only, leaving the 10 behind to become "part of" the material field:
this,is,my,data1<13><10>this,is,my,data2<13><10>
//now let's cut it up into 2 records, using <13> only to denote the end of a line:
record 1= this,is,my,data1
record 2= <10>this,is,my,data2
The program just sees a stream of bytes; it is we humans who interpret "lines". If the program treats 13 as the separator, then all the 10s get left behind as part of the data that gets inserted. The very first record in the file won't have 13/10 (CRLF) before it because it's the first line, so one of your rows (the one with ascii (49)) won't suffer this problem.
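The suspected import behaviour can be reproduced in a few lines. This Python sketch is only a guess at what the import tool did (splitting on CR alone); it shows how every record after the first ends up with a leading LF.

```python
# A Windows-style file: each line ends with CR LF ("\r\n").
raw = "this,is,my,data1\r\nthis,is,my,data2\r\n"

# Faulty import (assumed): split on CR only, so each LF stays attached
# to the start of the following record.
records = [chunk for chunk in raw.split("\r") if chunk.strip()]

# A possible repair: strip the stray LF from the left of each record.
repaired = [record.lstrip("\n") for record in records]

print(records)
print(repaired)
```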
You could "cure" the bad data with a trigger upon insert:
CREATE TRIGGER prevent_bad_data
ON yourtable
INSTEAD OF INSERT
AS
BEGIN
INSERT INTO yourtable(somecolumn,othercolumn,material)
SELECT foo,
bar,
LTRIM(REPLACE(material, CHAR(10), ' '))
FROM Inserted
END
Or you could program the db to reject bad rows and fix the tool that is inserting the bad data:
ALTER TABLE yourtable
ADD CONSTRAINT prevent_bad_material
CHECK (material LIKE '[0-9a-z]%'); --check it starts with a number or letter
Edit: though having seen your updated question with screenshots, the material column really should be a number, not a varchar type, then this wouldn't happen

Delete specific pattern between commas in text file

I have thousands of SQL queries in Notepad++, one query per line. Every query contains a comma-separated list of columns to select from the database. We now want to remove certain columns that match a specific pattern/regular expression from that list. Each SQL query follows a specific pattern:
A trimmed column is selected with the alias 'PK'.
Every query has a date-based where condition at the end.
Sometimes the pattern we wish to remove also appears in the PK expression, the where clause, or both. We don't want to remove it from those places, only from the column selection list.
Below is the example of a SQL query :
select (TRIM(TAE_TSP_REC_UPDATE)) as PK,TAE_AMT_FAIR_MV,TAE_TXT_ACCT_NUM,TAE_CDE_OWNER_TYPE,TAE_DTE_AQA_ABA,TAE_RID_OWNER,TAE_FID_OWNER,TAE_CID_OWNER,TAE_TSP_REC_UPDATE from TABLE_TAX_REP where DATE(TAE_TSP_REC_UPDATE)>='03/31/2018'
After removing the columns/patterns, the query should look like this:
select (TRIM(TAE_TSP_REC_UPDATE)) as PK,TAE_AMT_FAIR_MV,TAE_TXT_ACCT_NUM,TAE_CDE_OWNER_TYPE,TAE_DTE_AQA_ABA from TABLE_TAX_REP where DATE(TAE_TSP_REC_UPDATE)>='03/31/2018'
We want to remove the patterns below from every query, between the commas:
.FID.
.RID.
.CID.
.TSP.
If the pattern appears within the TRIM/DATE function it should not be touched. It should only be removed from the column selection list.
Could somebody please help me with this? Thanks in advance.
You may use
(?:\G(?!^)|\sas\s(?=.*'\d{2}/\d{2}/\d{4}'$))(?:(?!\sfrom\s).)*?\K,?\s*[A-Z_]+_(?:[FRC]ID|TSP)_[A-Z_]+
Details
(?:\G(?!^)|\sas\s(?=.*'\d{2}/\d{2}/\d{4}'$)) - two alternatives:
\G(?!^) - the end of the previous location, not a position at the start of the line
| - or
\sas\s(?=.*'\d{2}/\d{2}/\d{4}'$) - an as surrounded with single whitespaces that is followed with any 0+ chars other than line break chars and then ', 2 digits, /, 2 digits, /, 4 digits and ' at the end of the line
(?:(?!\sfrom\s).)*? - consumes any char other than a line break char, 0 or more repetitions, as few as possible, as long as it does not start a whitespace, from, whitespace sequence
\K - a match reset operator discarding all text matched so far
,?\s* - an optional comma followed with 0+ whitespaces
[A-Z_]+_(?:[FRC]ID|TSP)_[A-Z_]+ - uppercase ASCII letters and/or _, 1 or more occurrences, followed with _, then either FID, RID, CID or TSP, then _, and again 1 or more occurrences of uppercase ASCII letters and/or _.
See the regex demo.
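For comparison, the same clean-up can be done outside Notepad++ by splitting the query into parts instead of using \G and \K (which Python's built-in re module does not support). This is a different technique from the regex above, sketched under the assumption that every query has the "... as PK,<columns> from ..." shape; the function name drop_columns is hypothetical.

```python
import re

# Column names containing one of these infixes should be dropped.
UNWANTED = re.compile(r"_(?:[FRC]ID|TSP)_")

# head: everything up to and including "as PK"
# cols: the comma-separated column list that follows
# tail: " from ..." through the end of the query
QUERY = re.compile(r"^(?P<head>.*?\sas\s+PK)(?P<cols>.*?)(?P<tail>\sfrom\s.*)$")

def drop_columns(query: str) -> str:
    m = QUERY.match(query)
    if m is None:
        return query  # unexpected shape: leave the line untouched
    columns = [c for c in m.group("cols").split(",") if c.strip()]
    kept = [c for c in columns if not UNWANTED.search(c)]
    return m.group("head") + "".join("," + c for c in kept) + m.group("tail")

query = (
    "select (TRIM(TAE_TSP_REC_UPDATE)) as PK,TAE_AMT_FAIR_MV,TAE_TXT_ACCT_NUM,"
    "TAE_CDE_OWNER_TYPE,TAE_DTE_AQA_ABA,TAE_RID_OWNER,TAE_FID_OWNER,"
    "TAE_CID_OWNER,TAE_TSP_REC_UPDATE from TABLE_TAX_REP "
    "where DATE(TAE_TSP_REC_UPDATE)>='03/31/2018'"
)
cleaned = drop_columns(query)
print(cleaned)
```

Only the column list between "as PK" and "from" is filtered, so the TRIM(...) and DATE(...) expressions are never touched.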

Display certain sequence only in VARCHAR

I have a column error_desc with values like:
Failure occurred in (Class::Method) xxxxCalcModule::endCustomer. Fan id 111232 is not Effective or not present in BL9_XXXXX for date 20160XXX.
What SQL query can I use to display only the number 111232 from that column? The number starts at the 66th position in the VARCHAR column and ends at the 71st.
SELECT substr(ERROR_DESC,66,6) as ABC FROM bl1_cycle_errors where error_desc like '%FAN%'
This solution uses regular expressions.
The challenge I faced was pulling out alphanumerics. To detect the standalone number, we have to keep only pure numbers and filter out strings, alphanumerics, and punctuation.
Pure strings and words not containing numbers can be easily filtered out using
[^[:digit:]]
Possible combinations of alphanumerics are :
1. Begins with a character, contains numbers, may end with characters or punctuation:
[a-zA-Z]+[0-9]+[[:punct:]]*[a-zA-Z]*[[:punct:]]*
2. Begins with numbers and then contains letters, may contain punctuation:
[0-9]+[[:punct:]]*[a-zA-Z]+[[:punct:]]*
3. Begins with numbers and then contains punctuation, may contain letters:
[0-9]+[a-zA-Z]*[[:punct:]]+[a-zA-Z]*
Combining these regular expressions using | operator we get:
select trim(REGEXP_REPLACE(error_desc,'[^[:digit:]]|[a-zA-Z]+[0-9]+[[:punct:]]*[a-zA-Z]*[[:punct:]]*|[0-9]+[[:punct:]]*[a-zA-Z]+[[:punct:]]*|[0-9]+[a-zA-Z]*[[:punct:]]+[a-zA-Z]*',' '))
from error_table;
Will work in most cases.
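A simpler way to express "standalone number" is to look for a run of digits with whitespace (or the string boundary) on both sides. The Python sketch below illustrates that idea on the sample message; it is an alternative formulation, not a translation of the REGEXP_REPLACE above, and it would miss a number that is directly followed by punctuation.

```python
import re

error_desc = (
    "Failure occurred in (Class::Method) xxxxCalcModule::endCustomer. "
    "Fan id 111232 is not Effective or not present in BL9_XXXXX for date 20160XXX."
)

# Digits only, not preceded and not followed by a non-whitespace character,
# so tokens like BL9_XXXXX or 20160XXX. are ignored.
standalone_numbers = re.findall(r"(?<!\S)\d+(?!\S)", error_desc)

print(standalone_numbers)
```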

SQL - need help in parsing text of a field

I have a select query that fetches a field with complex data. I need to parse that data into a specified format. Please help with your expertise:
selected string = complexType|ChannelCode=PB - Phone In A Box|IncludeExcludeIndicator=I
expected output - PB|I
Please help me write a SQL regular expression to accomplish this output.
The first step in figuring out the regular expression is to be able to describe it in plain language. Based on what we know from your post (and as others have said, more info is really needed), some assumptions have to be made.
I'd take a stab at it by describing it like this, which is based on the sample data you provided: I want the sets of one or more characters that follow the equal signs but not including the following space or end of the line. The output should be these sets of characters, separated by a pipe, in the order they are encountered in the string when reading from left to right. My assumptions are based on your test data: only 2 equal signs exist in the string and the last data element is not followed by a space but by the end of the line. A regular expression can be built using that info, but you also need to consider other facts which would change the regex.
Could there be more than 2 equal signs?
Could there be an empty data element after the equal sign?
Could the data set after the equal sign contain one or more spaces?
All these affect how the regex needs to be designed. All that said, and based on the data provided and the assumptions as stated, next I would build a regex that describes the string (really translating from the plain language to the regex language), grouping around the data sets we want to preserve, then replace the string with those data sets separated by a pipe.
SQL> with tbl(str) as (
2 select 'complexType|ChannelCode=PB - Phone In A Box|IncludeExcludeIndicator=I' from dual
3 )
4 select regexp_replace(str, '^.*=([^ ]+).*=([^ ]+)$', '\1|\2') result from tbl;
RESU
----
PB|I
The match regex explained:
^ Match the beginning of the line
. followed by any character
* followed by 0 or more 'any characters' (refers to the previous character class)
= followed by an equal sign
( start remembered group 1
[^ ]+ which is a set of one or more characters that are not a space
) end remembered group one
.*= followed by any number of any characters but ending in an equal sign
([^ ]+) followed by the second remembered group of non-space characters
$ followed by the end of the line
The replace string explained:
\1 The first remembered group
| a pipe character
\2 The second remembered group
Keep in mind this answer is for your exact sample data as shown, and may not work in all cases. You need to analyse the data you will be working with. At any rate, these steps should get you started on breaking down the problem when faced with a challenging regex. The important thing is to consider all types of data and patterns (or NULLs) that could be present and allow for all cases in the regex so you return accurate data.
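The same match-and-replace can be tried quickly outside the database. This Python sketch reuses the regex from the answer unchanged; only the re.sub call and variable names are additions for illustration.

```python
import re

selected = "complexType|ChannelCode=PB - Phone In A Box|IncludeExcludeIndicator=I"

# Group 1: non-space characters after the first "=".
# Group 2: non-space characters after the last "=", up to the end of the line.
parsed = re.sub(r"^.*=([^ ]+).*=([^ ]+)$", r"\1|\2", selected)

print(parsed)
```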
Edit: Check this out, it parses all the values right after the equal signs and allows for nulls:
SQL> with tbl(str) as (
2 select 'a=zz|complexType|ChannelCode=PB - Phone In A Box|IncludeExcludeIndicator=I - testing|test1=|test2=test2 - testing' from dual
3 )
4 select regexp_substr(str, '=([^ |]*)( |\||$)', 1, level, null, 1) output, level
5 from tbl
6 connect by level <= regexp_count(str, '=')
7 ORDER BY level;
OUTPUT                    LEVEL
-------------------- ----------
zz                            1
PB                            2
I                             3
                              4
test2                         5
SQL>
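For a quick check of the second approach outside Oracle, the Python sketch below pulls every value that follows an equal sign, including empty ones. It assumes the terminator is meant to be a space, a pipe, or the end of the string; re.findall plays the role of the CONNECT BY loop.

```python
import re

s = (
    "a=zz|complexType|ChannelCode=PB - Phone In A Box|"
    "IncludeExcludeIndicator=I - testing|test1=|test2=test2 - testing"
)

# After each "=", capture zero or more characters that are neither a space
# nor a pipe; the value must be followed by a space, a pipe, or the end.
values = re.findall(r"=([^ |]*)(?: |\||$)", s)

for level, value in enumerate(values, start=1):
    print(level, repr(value))
```

The empty string at position 4 corresponds to the NULL row in the SQL output.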