PowerShell Code for List of Distinct Directories - sql

I have a file that contains a list such as:
tables\mytable1.sql
tables\myTable2.sql
procedures\myProc1.sql
functions\myFunction1.sql
functions\myFunction2.sql
From this data (and there will always be a path, and it will always be only one level), I want to retrieve a list of distinct paths (e.g. tables\, procedures\, functions\).
To maybe make it easier, the file that contains this data will already have been read into a list (named $fileList), so the new list ($directoryList ??) can likely be derived from it.
I've found reference to the -unique parameter, but I need to look from the start of the line up to (and including) the '\', of which there will only be one occurrence.

Assuming you already have the data on $fileList, try this:
$directoryList = $fileList | %{ $_.split("\")[0]} | select -unique
It will do a foreach (the %{}) on the elements of your list, and then split them by the \ and get you only the first part (in your case, the folder name), after that you use select -unique to get just the distinct values.
Alternatively, you could do it like this:
$fileList | %{ $_ -replace "\\.*$","" } | select -unique
Using -replace to remove everything after the \.
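For reference, running either one-liner against the sample data above should produce something like this (each directory name appears once, without the trailing backslash):
$directoryList
# PS>
# tables
# procedures
# functions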
Also, if for some reason you don't have the values of your textfile on $fileList already, you can do so using:
$fileList = Get-Content yourFile.txt

Your file may contain empty lines, and more often than not the last line is empty, so this version accounts for that.
It also uses a slightly different regular expression that matches everything from the last \ to the end of the string, so it works for paths with multiple levels, including your example.
If you have a text file with the following:
Z:\Path to somewhere\Files\some file 1.txt
Z:\Path to somewhere\Files\some file 2.txt
tables\mytable1.sql
tables\myTable2.sql
procedures\myProc1.sql
functions\myFunction1.sql
functions\myFunction2.sql
With this code which also shows the output after the function:
$fileListToProcess = "$([Environment]::GetFolderPath(""Desktop""))\list.txt"
Function Get-UniqueDirectoriesFromFile {
    Param
    (
        [Parameter(Mandatory = $true, HelpMessage = 'The file where the list of files is.')]
        [string]$LiteralPath
    )
    if (Test-Path -LiteralPath $LiteralPath -PathType Leaf) {
        $fileList = [IO.File]::ReadAllLines($LiteralPath)
        return $fileList | %{ $_ -replace '\\[^\\]*$', '' } | ? { $_.trim() -ne "" } | Select -Unique
    }
    else {
        return $null
    }
}
$uniqueDirs = Get-UniqueDirectoriesFromFile -LiteralPath $fileListToProcess
# Display the results:
$uniqueDirs
# PS>
# Z:\Path to somewhere\Files
# tables
# procedures
# functions
$uniqueDirs.count
# PS> 4
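As an aside, if you'd rather avoid the regex entirely, a rough sketch using the built-in Split-Path cmdlet should give the same result for these paths (untested against your exact data, so treat it as a starting point):
$uniqueDirs = $fileList |
    Where-Object { $_.Trim() -ne "" } |                # skip blank lines
    ForEach-Object { Split-Path -Path $_ -Parent } |   # 'tables\mytable1.sql' -> 'tables'
    Select-Object -Unique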

Related

Script to replace string of text at a end of a line

I would like to modify this script if possible:
((Get-Content -path "C:\Users\User1\OUT\Summary.txt" -Raw) -replace '</ab></cb>','</x>') | Set-Content -Path "C:\Users\User1\OUT\Summary.txt"
I would like a script that will run with Windows OS to search through one file it finds at this path:
C:\Users\User1\File\Summary.txt
And within that file, when it finds data starting with: <a><b>Data
And at the same time ending with: </ab></cb>
It would need to change the ending to: </x>
And it would need to save the file without changing the name of the file.
For instance a line showing this data:
<a><b>Data:</y> 12345678</ab></cb>
Would be changed to:
<a><b>Data:</y> 12345678</x>
The PowerShell script above will find all instances of </ab></cb> and replace it with </x>, which is not what I am hoping to accomplish.
You can use Get-Content to process the file line by line and only do the replace when you have a match on <a><b>. Something like this:
$InFile = ".\TestIn.txt"
$OutFile = ".\TestOut.txt"
If (Test-Path -Path $OutFile) {Remove-Item $OutFile}
Get-Content $InFile | ForEach-Object -Process {
    $NewLine = $_
    If ($_ -Match '<a><b>') {
        $NewLine = ($_ -Replace '</ab></cb>','</x>')
    }
    Add-Content $OutFile $NewLine
}
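If you'd rather overwrite the original file than write to a second one, a hedged variant is to buffer the result in memory first (the path here is the one from the question; adjust as needed):
$path = "C:\Users\User1\OUT\Summary.txt"
$updated = Get-Content $path | ForEach-Object {
    if ($_ -match '<a><b>') { $_ -replace '</ab></cb>','</x>' } else { $_ }
}
Set-Content -Path $path -Value $updated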

Unix/Perl/Python: substitute list on big data set

I've got a mapping file of about 13491 key/value pairs which I need to use to replace the key with the value in a data set of about 500000 lines divided over 25 different files.
Example mapping:
value1,value2
Example input: field1,field2,value1,field4
Example output: field1,field2,value2,field4
Please note that the value could be in different places on the line with more than 1 occurrence.
My current approach is with AWK:
awk -F, 'NR==FNR { a[$1]=$2 ; next } { for (i in a) gsub(i, a[i]); print }' mapping.txt file1.txt > file1_mapped.txt
However, this is taking a very long time.
Is there any other way to make this faster? Could use a variety of tools (Unix, AWK, Sed, Perl, Python etc.)
Note: see the second part for a version that uses the Text::CSV module to parse files.
Load mappings into a hash (dictionary), then go through your files and test each field for whether there is such a key in the hash, replace with value if there is. Write each line out to a temporary file, and when done move it into a new file (or overwrite the processed file). Any tool has to do that, more or less.
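For comparison, and since the rest of this page is PowerShell-centric, here is a rough PowerShell sketch of that same hash-lookup idea (file names as in the question; treat it as untested):
$map = @{}
Get-Content .\mapping.txt | ForEach-Object {
    $k, $v = $_ -split ',', 2            # one key,value pair per line
    $map[$k.Trim()] = $v.Trim()
}
Get-Content .\file1.txt | ForEach-Object {
    $fields = $_ -split ','
    ($fields | ForEach-Object { if ($map.ContainsKey($_)) { $map[$_] } else { $_ } }) -join ','
} | Set-Content .\file1_mapped.txt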
With Perl, tested with a few small made-up files
use warnings;
use strict;
use feature 'say';
use File::Copy qw(move);

my $file = shift;
die "Usage: $0 mapping-file data-files\n" if not $file or not @ARGV;

my %map;
open my $fh, '<', $file or die "Can't open $file: $!";
while (<$fh>) {
    my ($key, $val) = map { s/^\s+|\s+$//gr } split /\s*,\s*/;  # see Notes
    $map{$key} = $val;
}

my $outfile = "tmp.outfile.txt.$$";  # but better use File::Temp

foreach my $file (@ARGV) {
    open my $fh_out, '>', $outfile or die "Can't open $outfile: $!";
    open my $fh, '<', $file or die "Can't open $file: $!";
    while (<$fh>) {
        s/^\s+|\s+$//g;                  # remove leading/trailing whitespace
        my @fields = split /\s*,\s*/;
        exists($map{$_}) && ($_ = $map{$_}) for @fields;  # see Notes
        say $fh_out join ',', @fields;
    }
    close $fh_out;

    # Change to commented out line once thoroughly tested
    #move($outfile, $file) or die "can't move $outfile to $file: $!";
    move($outfile, 'new_'.$file) or die "can't move $outfile: $!";
}
Notes.
The check of data against mappings is written for efficiency: we must look at each field, there's no escaping that, but then we only check for the field as a key (no regex). For this, all leading/trailing spaces need to be stripped. Thus this code may change whitespace in the output data files; if that matters for some reason, it can of course be modified to preserve the original spaces.
It came up in comments that a field in the data can in fact differ by having extra quotes. In that case, extract the would-be key first:
for (@fields) {
    $_ = $map{$1} if /"?([^"]*)/ and exists $map{$1};
}
This starts the regex engine on every check, which affects efficiency. It would be better to instead clean the input CSV data of quotes and run the code as it is above, with no regex. This can be done by reading the files with a CSV-parsing module; see the comment at the end.
For Perls earlier than 5.14 replace
my ($key, $val) = map { s/^\s+|\s+$//gr } split /\s*,\s*/;
with
my ($key, $val) = map { s/^\s+|\s+$//g; $_ } split /\s*,\s*/;
since the "non-destructive" /r modifier was introduced only in v5.14
If you'd rather that your whole operation doesn't die for one bad file, replace or die ... with
or do {
    # print warning for whatever failed (warn "Can't open $file: $!";)
    # take care of filehandles and such if/as needed
    next;
};
and make sure to (perhaps log and) review output.
This leaves room for some efficiency improvements, but nothing dramatic.
The data, with commas separating fields, may (or may not) be valid CSV. Since the question doesn't at all address this, and doesn't report problems, it is unlikely that any properties of the CSV data format are used in data files (delimiters embedded in data, protected quotes).
However, it's still a good idea to read these files using a module that honors full CSV, like Text::CSV. That also makes things easier by taking care of extra spaces and quotes and handing us cleaned-up fields. So here's that -- the same as above, but using the module to parse the files:
use warnings;
use strict;
use feature 'say';
use File::Copy qw(move);
use Text::CSV;

my $file = shift;
die "Usage: $0 mapping-file data-files\n" if not $file or not @ARGV;

my $csv = Text::CSV->new( { binary => 1, allow_whitespace => 1 } )
    or die "Cannot use CSV: " . Text::CSV->error_diag();

my %map;
open my $fh, '<', $file or die "Can't open $file: $!";
while (my $line = $csv->getline($fh)) {
    $map{ $line->[0] } = $line->[1];
}

my $outfile = "tmp.outfile.txt.$$";  # use File::Temp

foreach my $file (@ARGV) {
    open my $fh_out, '>', $outfile or die "Can't open $outfile: $!";
    open my $fh, '<', $file or die "Can't open $file: $!";
    while (my $line = $csv->getline($fh)) {
        exists($map{$_}) && ($_ = $map{$_}) for @$line;
        say $fh_out join ',', @$line;
    }
    close $fh_out;
    move($outfile, 'new_'.$file) or die "Can't move $outfile: $!";
}
Now we don't have to worry about spaces or overall quotes at all, which simplifies things a bit.
While it is difficult to reliably compare these two approaches without realistic data files, I benchmarked them for (made-up) large data files that involve "similar" processing. The code using Text::CSV for parsing runs either around the same, or (up to) 50% faster.
The constructor option allow_whitespace makes it remove extra spaces, perhaps contrary to what the name may imply, as I do by hand above. (Also see allow_loose_quotes and related options.) There is far more; see the docs. Text::CSV defaults to Text::CSV_XS, if it is installed.
You're doing 13,491 gsub()s on every one of your 500,000 input lines - that's almost 7 billion full-line regexp search/replaces in total. So yes, that will take some time, and it's almost certainly corrupting your data in ways you just haven't noticed, as the result of one gsub() gets changed by the next gsub() and/or you get partial replacements!
I saw in a comment that some of your fields can be surrounded by double quotes. If those fields can't contain commas or newlines and assuming you want full string matches then this is how to write it:
$ cat tst.awk
BEGIN { FS=OFS="," }
NR==FNR {
    map[$1] = $2
    map["\""$1"\""] = "\""$2"\""
    next
}
{
    for (i=1; i<=NF; i++) {
        if ($i in map) {
            $i = map[$i]
        }
    }
    print
}
I tested the above on a mapping file with 13,500 entries and an input file of 500,000 lines with multiple matches on most lines in cygwin on my underpowered laptop and it completed in about 1 second:
$ wc -l mapping.txt
13500 mapping.txt
$ wc -l file500k
500000 file500k
$ time awk -f tst.awk mapping.txt file500k > /dev/null
real 0m1.138s
user 0m1.109s
sys 0m0.015s
If that doesn't do exactly what you want efficiently then please edit your question to provide a MCVE and clearer requirements, see my comment under your question.
There is some commentary below suggesting that the OP needs to handle real CSV data, whereas the question says:
Please note that the value could be in different places on the line with more than 1 occurrence.
I have taken this to mean that these are lines, not CSV data, and that a regex-based solution is required. The OP also confirmed that interpretation in a comment above.
As noted in other answers, however, it is faster to break the data into fields and simply look up the replacement in the map.
#!/usr/bin/env perl

use strict;
use warnings;

# Load mappings.txt into a Perl hash %m.
#
open my $mh, '<', './mappings.txt'
    or die "open: $!";
my %m = ();
while (<$mh>) {
    chomp;
    my @f = split ',';
    $m{$f[0]} = $f[1];
}

# Load files.txt into a Perl array @files.
#
open my $fh, '<', './files.txt';
chomp(my @files = <$fh>);

# Update each file line by line, using a temporary file,
# similar to sed -i.
#
foreach my $file (@files) {
    open my $fh, '<', $file
        or die "open: $!";
    open my $th, '>', "$file.bak"
        or die "open: $!";
    while (<$fh>) {
        foreach my $k (keys %m) {
            my $v = $m{$k};
            s/\Q$k/$v/g;
        }
        print $th $_;
    }
    rename "$file.bak", $file
        or die "rename: $!";
}
I assume of course that you have your mappings in mappings.txt and file list in files.txt.
According to your comments, you have proper CSV. The following properly handles quoting and escapes when reading from the map file, when reading from a data file, and when writing to a data file.
It seems you want to match entire fields. The following does this. It even supports fields that contain commas (,) and/or quotes ("). It does the comparisons using a hash lookup, which is much faster than a regex match.
#!/usr/bin/perl

use strict;
use warnings;
use feature qw( say );
use Text::CSV_XS qw( );

my $csv = Text::CSV_XS->new({ auto_diag => 2, binary => 1 });

sub process {
    my ($map, $in_fh, $out_fh) = @_;
    while ( my $row = $csv->getline($in_fh) ) {
        $csv->say($out_fh, [ map { $map->{$_} // $_ } @$row ]);
    }
}

die "usage: $0 {map} [{file} [...]]\n"
    if @ARGV < 1;

my $map_qfn = shift;

my %map;
{
    open(my $fh, '<', $map_qfn)
        or die("Can't open \"$map_qfn\": $!\n");
    while ( my $row = $csv->getline($fh) ) {
        $map{$row->[0]} = $row->[1];
    }
}

if (@ARGV) {
    for my $qfn (@ARGV) {
        open(my $in_fh, '<', $qfn)
            or warn("Can't open \"$qfn\": $!\n"), next;
        rename($qfn, $qfn."~")
            or warn("Can't rename \"$qfn\": $!\n"), next;
        open(my $out_fh, '>', $qfn)
            or warn("Can't create \"$qfn\": $!\n"), next;
        eval { process(\%map, $in_fh, $out_fh); 1 }
            or warn("Error processing \"$qfn\": $@"), next;
        close($out_fh)
            or warn("Error writing to \"$qfn\": $!\n"), next;
    }
} else {
    eval { process(\%map, \*STDIN, \*STDOUT); 1 }
        or warn("Error processing: $@");
    close(\*STDOUT)
        or warn("Error writing to STDOUT: $!\n");
}
If you provide no file names beyond the map file, it reads from STDIN and outputs to STDOUT.
If you provide one or more file names beyond the map file, it replaces the files in-place (though it leaves a backup behind).
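Hypothetical invocations, assuming the script above is saved as map_fields.pl:
perl map_fields.pl map.csv data1.csv data2.csv      # in-place; leaves data1.csv~ etc. as backups
cat data.csv | perl map_fields.pl map.csv > out.csv # STDIN/STDOUT mode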

Using PowerShell, how can a SQL file be split into several files based on its content?

I'm trying to use PowerShell to automate the division of a SQL file into separate files based on where the headings are located.
An example SQL file to be split is below:
/****************************************
Section 1
****************************************/
Select 1
/****************************************
Section 2
****************************************/
Select 2
/****************************************
Section 3
****************************************/
Select 3
I want the new files to be named as per the section headings in the file i.e. 'Section 1', 'Section 2' and 'Section 3'. The content of the first file should be as follows:
/****************************************
Section 1
****************************************/
Select 1
The string: /**************************************** is only used in the SQL file for the section headings and therefore can be used to identify the start of a section. The file name will always be the text on the line directly below.
You can try something like this (the split here is based on the empty lines between sections):
#create an index for our output files
$fileIndex = 1
#load SQL file contents in an array
$sqlite = Get-Content "G:\input\sqlite.txt"
#for each line of the SQL file
$sqlite | % {
    if($_ -eq "") {
        #if the line is empty, increment output file index to create a new file
        $fileindex++
    } else {
        #if the line is not empty
        #build output path
        $outFile = "G:\output\section$fileindex.txt"
        #push line to the current output file (appending to existing contents)
        $_ | Out-File $outFile -Append
    }
}
#load generated files in an array
$tempfiles = Get-ChildItem "G:\output"
#for each file
$tempfiles | % {
    #load file contents in an array
    $data = Get-Content $_.FullName
    #rename file after second line contents
    Rename-Item $_.FullName "$($data[1]).txt"
}
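If the sections in your real file are not reliably separated by empty lines, a minimal sketch keyed off the heading marker itself (reusing the hypothetical paths above) could look like this:
$lines = Get-Content "G:\input\sqlite.txt"
$outFile = $null
for ($i = 0; $i -lt $lines.Count; $i++) {
    if ($lines[$i].StartsWith("/****")) {
        # the section name is always on the line directly below the marker
        $outFile = "G:\output\$($lines[$i + 1].Trim()).sql"
    }
    if ($outFile) { $lines[$i] | Out-File $outFile -Append }
}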
The below code uses the heading names found within the comment blocks. It also splits the SQL file into several SQL files based on the location of the comment blocks.
#load SQL file contents in an array
$SQL = Get-Content "U:\Test\FileToSplit.sql"
$OutputPath = "U:\TestOutput"
#find first section name and count number of sections
$sectioncounter = 0
$checkcounter = 0
$filenames = @()
$SQL | % {
    #Add file name to array if new section was found on the previous line
    If ($checkcounter -lt $sectioncounter)
    {
        $filenames += $_
        $checkcounter = $sectioncounter
    }
    Else
    {
        If ($_.StartsWith("/*"))
        {
            $sectioncounter += 1
        }
    }
}
#return if too many sections were found
If ($sectioncounter -gt 50) { return "Too many sections found" }
$sectioncounter = 0
$endcommentcounter = 0
#for each line of the SQL file (Ref: sodawillow)
$SQL | % {
    #if a new comment block is found, point to the next section name, unless it's the start of the first section
    If ($_.StartsWith("/*") -And ($endcommentcounter -gt 0))
    {
        $sectioncounter += 1
    }
    If ($_.EndsWith("*/"))
    {
        $endcommentcounter += 1
    }
    #build output path
    $tempfilename = $filenames[$sectioncounter]
    $outFile = "$OutputPath\$tempfilename.sql"
    #push line to the current output file (appending to existing contents)
    $_ | Out-File $outFile -Append
}

PowerShell finding a file and creating a new one

The script I'm working on produces a log file every time it runs. The problem is that when the script runs in parallel, the current log file becomes inaccessible to Out-File. This is normal, because the previous script is still writing to it.
So I would like the script to be able to detect, when it starts, that there is already a log file available, and if so, create a new log file name with an increased number between the brackets [<nr>].
It's very difficult to check if a file already exists, as it can have a different number each time the script starts. It would be great if it could pick up the number between the brackets and increment it by +1 for the new file name.
The code:
$Server = "UNC"
$Destination ="\\domain.net\share\target\folder 1\folder 22"
$LogFolder = "\\server\c$\my logfolder"
# Format log file name
$TempDate = (Get-Date).ToString("yyyy-MM-dd")
$TempFolderPath = $Destination -replace '\\','_'
$TempFolderPath = $TempFolderPath -replace ':',''
$TempFolderPath = $TempFolderPath -replace ' ',''
$script:LogFile = "$LogFolder\$(if($Server -ne "UNC"){"$Server - $TempFolderPath"}else{$TempFolderPath.TrimStart("__")})[0] - $TempDate.log"
$script:LogFile
# Create new log file name
$parts = $script:LogFile.Split('[]')
$script:NewLogFile = '{0}[{1}]{2}' -f $parts[0],(1 + $parts[1]),$parts[2]
$script:NewLogFile
# Desired result
# \\server\c$\my logfolder\domain.net_share_target_folder1_folder22[0] - 2014-07-30.log
# \\server\c$\my logfolder\domain.net_share_target_folder1_folder22[1] - 2014-07-30.log
#
# Usage
# "stuff" | Out-File -LiteralPath $script:LogFile -Append
As mentioned in my answer to your previous question, you can auto-increment the number in the filename with something like this:
while (Test-Path -LiteralPath $script:LogFile) {
    $script:LogFile = Increment-Index $script:LogFile
}
where Increment-Index implements the program logic that increments the index in the filename by one, e.g. like this:
function Increment-Index($f) {
    $parts = $f.Split('[]')
    '{0}[{1}]{2}' -f $parts[0],(1 + $parts[1]),$parts[2]
}
or like this:
function Increment-Index($f) {
    $callback = {
        $v = [int]$args[0].Groups[1].Value
        $args[0] -replace $v,++$v
    }
    ([Regex]'\[(\d+)\]').Replace($f, $callback)
}
The while loop increments the index until it produces a non-existing filename. The parameter -LiteralPath in the condition is required because the filename contains square brackets, which would otherwise be treated as wildcard characters.
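For example (hypothetical path, assuming the [0] file already exists):
Increment-Index 'C:\logs\folder22[0] - 2014-07-30.log'
# PS>
# C:\logs\folder22[1] - 2014-07-30.log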

Address Variables with PowerShell and Import-CSV foreach-loop

Input file:
"Server1","lanmanserver"
"Server2","lanmanserverTest"
Program
$csvFilename = "D:\Scripts\ServerMonitorConfig.csv"
$csv = Import-Csv $csvFilename -Header @("ServerName","ServiceName")
foreach ($line in $csv) {
    Write-Host "ServerName=$line.ServerName ServiceName=$line.ServiceName"
}
What I want:
ServerName=Server1 ServiceName=lanmanserver
ServerName=Server2 ServiceName=lanmanserverTest
What I'm getting:
ServerName=#{ServerName=Server1; ServiceName=lanmanserver}.ServerName
ServiceName=#{ServerName=Server1; ServiceName=lanmanserver}.ServiceN
ame ServerName=#{ServerName=Server2;
ServiceName=lanmanserverTest}.ServerName
ServiceName=#{ServerName=Server2; ServiceName=lanmanserverTest}.
ServiceName
I really don't care if the Headers come from the first row of the CSV or not, I'm flexible there.
You usually see subexpressions or format strings used to solve that:
Subexpression:
Write-Host "ServerName=$($line.ServerName) ServiceName=$($line.ServiceName)"
Format string:
Write-Host ('ServerName={0} ServiceName={1}' -f $line.ServerName,$line.ServiceName)
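Putting it together, a corrected version of the original loop using the format-string form would be:
foreach ($line in $csv) {
    Write-Host ('ServerName={0} ServiceName={1}' -f $line.ServerName, $line.ServiceName)
}
# ServerName=Server1 ServiceName=lanmanserver
# ServerName=Server2 ServiceName=lanmanserverTest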