parse text file and remove white space - sql

I have a file written by a COBOL program that produces a pipe-delimited file. The file contains white space and "null" placeholders that I need to get rid of before rewriting the file. What is the best way to do that? I have SQL Server and Visual Studio available to write the script, but I'm not sure which one to use or exactly how. The script will need to read through many different files in a folder. The data is being converted from an old system into a new one. I also need to keep spaces between words, e.g. in a business name or an address. I was going to use SQL, but I can only find examples that read fields in a database.
Example file (one line):
0000000009|LName |FName | | | | | | |1|1|0|000|000|000000000|
1||null null null| | null null|null null null null| |1|0|
Desired output:
0000000009|LName|Fname|||||||1|1|0|000|000|000000000|1||||||1|0|
Thanks!!

You said you can use Visual Studio, so this example uses C#.
I assume you will load the file content into a string; then you can apply some replaces. Replace returns a new string, so assign the result back:
s = s.Replace("null", string.Empty).Replace(" |", "|").Replace("| ", "|").Replace("| |", "||");
There are probably much more elegant solutions; this is quick and dirty, but it will output the string you need.
Hope this helps.
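For comparison, here is a minimal field-by-field sketch in Python rather than C#. It assumes that a field made up only of the word "null" (repeated or not) should become empty, that no real value is literally "null", and that the cleaned files go to a separate folder. The names clean_field, clean_line and clean_folder are made up for illustration.

```python
from pathlib import Path


def clean_field(field: str) -> str:
    """Trim a field; blank it out if it holds only 'null' placeholders."""
    stripped = field.strip()
    # all() is True for an empty list, so blank fields also end up empty.
    if all(word == "null" for word in stripped.split()):
        return ""
    return stripped  # spaces between words are left untouched


def clean_line(line: str) -> str:
    """Clean every pipe-separated field on one line."""
    return "|".join(clean_field(field) for field in line.split("|"))


def clean_folder(src: Path, dst: Path) -> None:
    """Write a cleaned copy of every file in src into dst."""
    dst.mkdir(parents=True, exist_ok=True)
    for path in src.iterdir():
        if path.is_file():
            lines = path.read_text().splitlines()
            cleaned = "\n".join(clean_line(line) for line in lines)
            (dst / path.name).write_text(cleaned + "\n")
```

Working per field avoids depending on how many spaces or "null" words appear in a row, which a fixed chain of replaces has to guess at.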

Related

Formatting code in PhpStorm does not work as I need it to work

How to use automatic line wrap and code formatting together in PhpStorm? My PhpStorm version is 2019.3.1
My SQL statements and my Strings look like the pic below.
What do I have to do so that the automatically wrapped lines do NOT jump to column 1 in the editor?
Configure it accordingly (affects soft wrapped lines only, obviously): Settings/Preferences | Editor | General | Soft Wraps | Use original line's indent for wrapped parts

IntelliJ IDEA: is there a way to format java code in a canonical way?

I have two java source files. Both represent the same class (semantically), but they were formatted differently.
For example, one of them contains the following line:
return Boolean.valueOf(Boolean.getBoolean("abc"));
While in the second file it looks like 2 lines:
return Boolean.valueOf(
Boolean.getBoolean("abc"));
In both cases, when I apply formatting (Ctrl+Alt+L), these lines do not change.
Is it possible to format them in some canonical way: that is, to get the same code if the only difference is formatting?
Equivalently: is there any way to remove all ignorable whitespace? Such a 'dried-out' program would then be easily restored using 'Reformat code'.
You should be able to do this if you turn off "Wrapping and Braces | Keep when reformatting | Line Breaks" in the Java code style settings.
Go to File > Settings > Editor > Code Style > Java.
In the Wrapping and Braces tab, uncheck Line breaks (under Keep when reformatting).
Apply and reformat (Ctrl+Alt+L) again.

How to forward logs with Splunk Forwarder for the files with no header and logs should be in form of key/Value

I have a Splunk forwarder set up already on my host.
I have certain files in the folder (/tom/mike/). The file names start with Back.
The content of a file may be one or multiple lines. Each line holds multiple fixed-position values separated by some spaces, with no header.
Content (Example: Consider "-" as one space)
Tom---516-----RTYUI------45678
Mik---345-----XYXFF------56789
I need splunk logs for each line.
like:
Key1= Tom Key2=516 Key3= RTYUI Key4= 45678
Key1= Mike Key2= 345 Key3= XYXFF Key4= 56789
I know inputs.conf changes would be like below:
[monitor:///tom/mike/Back*]
index=myIndex
blacklist=\.(gz|zip|bkz|arch|etc)$
sourcetype = BackFileData
Please suggest the changes needed in props.conf. Keep in mind that the delimiter is fixed for each value in a line, but it is not the same (e.g. 2 spaces) for all columns. These files have no headers either.
You can use kvdelims if you want a search-time extraction, or you can create a transforms.conf rule and apply it in props.conf so that it extracts at index time.
Here's a good article covering all those scenarios:
https://www.splunk.com/blog/2008/02/12/delimiter-based-key-value-pair-extraction.html
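Whichever extraction route you pick, it helps to pin down the pattern for one line first. The sketch below is plain Python, not Splunk configuration. It assumes every line has exactly four values, that no value contains a space, and that the values are separated by one or more spaces. The Key1 to Key4 names come from the question; to_key_values is a made-up helper.

```python
import re

# Four values separated by runs of whitespace of any length.
LINE_PATTERN = re.compile(
    r"^\s*(?P<Key1>\S+)\s+(?P<Key2>\S+)\s+(?P<Key3>\S+)\s+(?P<Key4>\S+)\s*$"
)
KEYS = ("Key1", "Key2", "Key3", "Key4")


def to_key_values(line: str) -> str:
    """Turn one headerless line into 'Key1=... Key2=...' form."""
    match = LINE_PATTERN.match(line)
    if match is None:
        raise ValueError(f"line does not hold four values: {line!r}")
    return " ".join(f"{key}={match[key]}" for key in KEYS)


print(to_key_values("Tom   516     RTYUI      45678"))
```

If a value can contain spaces, the pattern would have to cut on fixed column positions instead of on whitespace.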

Inconsistent line endings in SSIS Flat File import

I have a large, pipe-delimited text file with no text qualifiers, and it looks like whatever generated this file accidentally emitted stray "LF" markers in the last column every few hundred rows.
The last column is a description column, and it is not text qualified in any way like it should be.
The file looks similar to this:
id|data|data|data|data|Description[LF]
id|data|data|data|data|Description[LF]
id|data|data|data|data|Description[LF]
id|data|data|data|data|Descr[LF]
iption[LF]
id|data|data|data|data|Description[LF]
Id|data|data|data|data|Description[LF]
id|data|data|data|data|Descripti[LF]
on[LF]
id|data|data|data|data|Description[LF]
id|data|data|data|data|Description[LF]
id|data|data|data|data|Description[LF]
id|data|data|data|data|Description[LF]
id|data|data|data|data|D[LF]
escription[LF]
I'm pretty new to SSIS and SQL in general. Does anyone have any advice on how to fix this?
I did actually find a way to fix it in Notepad++, because I don't know C# and I don't know SSIS well enough.
The ID was 8 digits long, followed by 7 blank spaces. That pattern was absolutely unique to this file.
In Notepad++ I used Find in Extended mode to search for "\n" (LF) and replace it with nothing.
Then I used this expression for find:
(\d\d\d\d\d\d\d\d[[:blank:]][[:blank:]][[:blank:]][[:blank:]][[:blank:]][[:blank:]][[:blank:]])
to find all 8-digit numbers with 7 trailing spaces, and for replace I used this:
\r\n\1
to put a [CR][LF] in front of those 8 digit numbers.
Lo and behold it worked!
Either way, my boss contacted the client and is requesting a better file. Now I get kudos, and we get proper data. Thanks for the advice, all!
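The same two-step repair can be scripted. This is a rough Python sketch, assuming (as above) that every record starts with an 8-digit ID followed by exactly 7 blanks and that this pattern never occurs inside a description. Python's re module has no [[:blank:]] class, so [ \t] stands in for it. The name repair_line_breaks is made up.

```python
import re

# Record start: 8 digits (not preceded by another digit), then 7 blanks.
RECORD_START = re.compile(r"(?<!\d)(\d{8}[ \t]{7})")


def repair_line_breaks(text: str) -> str:
    """Drop every line break, then start a new line before each record ID."""
    joined = text.replace("\r", "").replace("\n", "")
    rebuilt = RECORD_START.sub(r"\r\n\1", joined)
    # The first record also gets a CRLF in front of it; remove that one.
    return rebuilt.lstrip("\r\n") + "\r\n"
```

When reading and writing the real file, opening it with newline="" keeps Python from translating the line endings on its own.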
If I had to take a guess, I would say that this is occurring because of how the file is created... you are probably having data that just happens to include certain special characters which are being incorrectly interpreted as a Line Feed.
Check this site to see if the data within your problem lines match any of these encodings. If this is the case then ultimately you have two options available:
1) Create some elaborate and complicated ETL process to detect and correct the file data before you process it. This is inadvisable as it will be a major pain to create and maintain.
2) Try changing the way this file is produced. Most text export wizards will allow you to place quotes (") around text items so that your import process can quickly detect something as a text block as opposed to a series of encoded characters to interpret.

How can I set the line limit length for different types of files?

I have the following gray line that limits the number of characters a single line can accommodate:
I'd like to change the line length (this will probably affect where this line is currently shown) for different files: one line length for js, php files etc., another for HTML template files. Is it possible? If not, how can I change it at least for all file types?
This is now possible as the bug linked by @LazyOne has been resolved. You can now set different right margins for different programming languages.
The setting is under Settings | Editor | Code Style | <Your language> on the Wrapping and Braces tab at the very top of the list.
Default value can be set at
Settings/Preferences | Editor | Code Style --> Right margin (columns)
Since end of 2014 (after IDEA-59662 ticket was implemented) most languages have an option to adjust that value in language-dedicated section. For example: PHP
For unsupported languages, or those that do not have such an option, you may try an .editorconfig file together with the EditorConfig Support plugin; that should work.
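A minimal .editorconfig sketch for that route, assuming the plugin honors the max_line_length property. The globs and the numbers are examples only, not recommendations.

```ini
# .editorconfig in the project root
root = true

# One limit for JavaScript and PHP sources
[*.{js,php}]
max_line_length = 120

# A different limit for HTML templates
[*.html]
max_line_length = 160
```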
For those who are from 2018:
Settings/Preferences | Editor | Code Style | Hard Wrap At
Or Settings/Preferences | Editor | Code Style | <Your language> (e.g. PHP) | Hard wrap at