Save list of txt files through referencing array [Powershell]

I have a file that has some names (table_names.txt) whose contents are:
ALL_Dog
ALL_Cat
ALL_Fish
and another file that has some entries (test.txt) whose contents include the above names, like:
INSERT INTO ALL_Dog VALUES (1,2,3)
INSERT INTO ALL_Cat VALUES (2,3,4)
INSERT INTO ALL_Fish VALUES (3,4,5)
I need to write a loop in PowerShell that creates three separate files in my current directory: ALL_Dog.txt containing "INSERT INTO ALL_Dog VALUES (1,2,3)", ALL_Cat.txt containing "INSERT INTO ALL_Cat VALUES (2,3,4)", and ALL_Fish.txt containing "INSERT INTO ALL_Fish VALUES (3,4,5)".
Here's what I have so far:
[string[]]$tableNameArray = (Get-Content -Path '.\table_names.txt') | ForEach-Object { $_ + " VALUES" }
[string[]]$namingArray = (Get-Content -Path '.\table_names.txt') | ForEach-Object { $_ }
For ($i = 0; $i -lt $tableNameArray.Length; $i++) {
    Get-Content test.txt |
        Select-String -Pattern $tableNameArray[$i] -Encoding ASCII |
        Select-Object -ExpandProperty Line |
        Out-File -LiteralPath $namingArray[$i]
}
The problem with what I currently have is that I cannot define the output files as .txt files, so my output files are just "ALL_Dog", "ALL_Cat", and "ALL_Fish".
The solution I'm looking for involves iteration through this namingArray to actually name the output files.
I feel like I'm really close to a solution and would mightily appreciate anyone's assistance or guidance to the correct result.

If I understand the question properly, you would like to get all lines from one file containing a certain table name, and create a new text file with these lines, using the table name (plus a .txt extension) as the filename, correct?
In that case, I would do something like below:
$outputPath = 'D:\Test' # the folder where the output files should go
$inputNames = 'D:\Test\table_names.txt'
$inputCommands = 'D:\Test\test.txt'
# make sure the table names from this file do not have leading or trailing whitespaces
$table_names = Get-Content -Path $inputNames | ForEach-Object { $_.Trim() }
$sqlCommands = Get-Content -Path $inputCommands
# loop through the table names
foreach ($table in $table_names) {
    # prepare the regex pattern; \b (word boundary) means you are searching for a whole word
    $pattern = '\b{0}\b' -f [regex]::Escape($table)
    # construct the output file path and name
    $outFile = Join-Path -Path $outputPath -ChildPath ('{0}.txt' -f $table)
    # get the matching line(s) using the pattern and write the file
    ($sqlCommands | Select-String -Pattern $pattern).Line | Out-File -FilePath $outFile -Append
}
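To see why the word-boundary anchors matter, a quick console check (the first line is from the question; ALL_Doghouse is a made-up name for illustration):

```
# \b keeps ALL_Dog from also matching longer names that merely start with it
'INSERT INTO ALL_Dog VALUES (1,2,3)'  -match '\bALL_Dog\b'   # True
'INSERT INTO ALL_Doghouse VALUES (9)' -match '\bALL_Dog\b'   # False
```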

Powershell 5 Get-content command to select specific word from text file with select-string command I get entire line

e.g. I am running the below command and am looking to select only a specific word, but the output gives the entire line in which the word test1 exists.
PS C:\> Get-Content C:\temp\testfile.txt | Select-String test1
hostname is test1, buildhistory 3 hours
and I am looking for a command which will only write test1 in the output.
You could try this (note the pipes go at the end of each line; PowerShell 5 does not allow a line to start with a pipe):
Get-Content C:\temp\testfile.txt |
    Select-String -Pattern '(test\d+)' |
    ForEach-Object -Process { $_.Matches[0].Value }
$failures = Get-Content "C:\Users\Documents\abc.txt" | Select-String -Pattern 'Error' -Context 0, 1
This will get each line containing the word Error in abc.txt, plus the one line after it.
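If a line can contain more than one hit, -AllMatches returns them all; a minimal sketch, assuming the same testfile.txt as above:

```
# Each MatchInfo object carries a Matches collection; .Value is just the matched text
Select-String -Path C:\temp\testfile.txt -Pattern 'test\d+' -AllMatches |
    ForEach-Object { $_.Matches.Value }
```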

Sed/Awk: how to find and remove two lines if a pattern in the first line is being repeated; bash

I am processing text file(s) with thousands of records per file. Each record is made up of two lines: a header that starts with ">", followed by a line with a long string of the characters "-AGTCNR". The header has 10 fields separated by "|", the first of which is a unique identifier for each record, e.g. ">KEN096-15"; a record is termed a duplicate if it has the same identifier. Here is how a few simple records look:
>ACML500-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_-2
----TAAGATTTTGACTTCTTCCCCCATCATCAAGAAGAATTGT-------
>ACRJP458-10|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----------TCCCTTTAATACTAGGAGCCCCTGACATAGCCTTTCCTAAATAAT-----
>ASILO303-17|Dip|gs-Par|sp-Par vid|subsp-NA|co
-------TAAGATTCTGATTACTCCCCCCCTCTCTAACTCTTCTTCTTCTATAGTAGATG
>ASILO326-17|Dip|gs-Goe|sp-Goe par|subsp-NA|c
TAAGATTTTGATTATTACCCCCTTCATTAACCAGGAACAGGATGA---------------
>CLT100-09|Lep|gs-Col|sp-Col elg|subsp-NA|co-Buru
AACATTATATTTGGAATTT-------GATCAGGAATAGTCGGAACTTCTCTGAA------
>PMANL2431-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_
----ATGCCTATTATAATTGGAGGATTTGGAAAACCTTTAATATT----CCGAAT
>STBOD057-09|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
ATCTAATATTGCACATAGAGGAACCTCNGTATTTTTTCTCTCCATCT------TTAG
>TBBUT582-11|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----CCCCCTCATTAACATTACTAAGTTGAAAATGGAGCAGGAACAGGATGA
>TBBUT583-11|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
TAAGATTTTGACTCATTAA----------------AATGGAGCAGGAACAGGATGA
>AFBTB001-09|Col|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
TAAGCTCCATCC-------------TAGAAAGAGGGG---------GGGTGA
>PMANL2431-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_
----ATGCCTATTAGGAAATTGATTAGTACCTTTAATATT----CCGAAT---
>AFBTB003-09|Col|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
TAAGATTTTGACTTCTGC------CATGAGAAAGA-------------AGGGTGA
>AFBTB002-09|Cole|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
-------TCTTCTGCTCAT-------GGGGCAGGAACAGGG----------TGA
>ACRJP458-10|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----------TCCCTTTAATACTAGGAGCCCCTTTCCT----TAAATAAT-----
Now I am trying to delete repeats, like the duplicate records of "ACRJP458-10" and "PMANL2431-12".
Using a bash script I have extracted the unique identifiers and stored the repeated ones in a variable "$duplicate_headers". Currently, I am trying to find any repeated instances of their two-line records and delete them as follows:
for i in "$@"
do
    unset duplicate_headers
    duplicate_headers=`grep ">" "$i" | awk 'BEGIN { FS="|" }; { print $1 "\n"; }' | sort | uniq -d`
    for header in `echo -e "${duplicate_headers}"`
    do
        sed -i "/^.*\b${header}\b.*$/,+1 2d" "$i"
        #sed -i "s/^.*\b${header}\b.*$//,+1 2g" "$i"
        #sed -i "/^.*\b${header}\b.*$/{$!N; s/.*//2g; }" "$i"
    done
done
The final result (with thousands of records in mind) will look like:
>ACML500-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_-2
----TAAGATTTTGACTTCTTCCCCCATCATCAAGAAGAATTGT-------
>ACRJP458-10|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----------TCCCTTTAATACTAGGAGCCCCTGACATAGCCTTTCCTAAATAAT-----
>ASILO303-17|Dip|gs-Par|sp-Par vid|subsp-NA|co
-------TAAGATTCTGATTACTCCCCCCCTCTCTAACTCTTCTTCTTCTATAGTAGATG
>ASILO326-17|Dip|gs-Goe|sp-Goe par|subsp-NA|c
TAAGATTTTGATTATTACCCCCTTCATTAACCAGGAACAGGATGA---------------
>CLT100-09|Lep|gs-Col|sp-Col elg|subsp-NA|co-Buru
AACATTATATTTGGAATTT-------GATCAGGAATAGTCGGAACTTCTCTGAA------
>PMANL2431-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_
----ATGCCTATTATAATTGGAGGATTTGGAAAACCTTTAATATT----CCGAAT
>STBOD057-09|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
ATCTAATATTGCACATAGAGGAACCTCNGTATTTTTTCTCTCCATCT------TTAG
>TBBUT582-11|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----CCCCCTCATTAACATTACTAAGTTGAAAATGGAGCAGGAACAGGATGA
>TBBUT583-11|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
TAAGATTTTGACTCATTAA----------------AATGGAGCAGGAACAGGATGA
>AFBTB001-09|Col|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
TAAGCTCCATCC-------------TAGAAAGAGGGG---------GGGTGA
>AFBTB003-09|Col|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
TAAGATTTTGACTTCTGC------CATGAGAAAGA-------------AGGGTGA
>AFBTB002-09|Cole|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
-------TCTTCTGCTCAT-------GGGGCAGGAACAGGG----------TGA
$ awk -F'[|]' 'NR%2{f=seen[$1]++} !f' file
>ACML500-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_-2
----TAAGATTTTGACTTCTTCCCCCATCATCAAGAAGAATTGT-------
>ACRJP458-10|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----------TCCCTTTAATACTAGGAGCCCCTGACATAGCCTTTCCTAAATAAT-----
>ASILO303-17|Dip|gs-Par|sp-Par vid|subsp-NA|co
-------TAAGATTCTGATTACTCCCCCCCTCTCTAACTCTTCTTCTTCTATAGTAGATG
>ASILO326-17|Dip|gs-Goe|sp-Goe par|subsp-NA|c
TAAGATTTTGATTATTACCCCCTTCATTAACCAGGAACAGGATGA---------------
>CLT100-09|Lep|gs-Col|sp-Col elg|subsp-NA|co-Buru
AACATTATATTTGGAATTT-------GATCAGGAATAGTCGGAACTTCTCTGAA------
>PMANL2431-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_
----ATGCCTATTATAATTGGAGGATTTGGAAAACCTTTAATATT----CCGAAT
>STBOD057-09|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
ATCTAATATTGCACATAGAGGAACCTCNGTATTTTTTCTCTCCATCT------TTAG
>TBBUT582-11|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----CCCCCTCATTAACATTACTAAGTTGAAAATGGAGCAGGAACAGGATGA
>TBBUT583-11|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
TAAGATTTTGACTCATTAA----------------AATGGAGCAGGAACAGGATGA
>AFBTB001-09|Col|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
TAAGCTCCATCC-------------TAGAAAGAGGGG---------GGGTGA
>AFBTB003-09|Col|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
TAAGATTTTGACTTCTGC------CATGAGAAAGA-------------AGGGTGA
>AFBTB002-09|Cole|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
-------TCTTCTGCTCAT-------GGGGCAGGAACAGGG----------TGA
To run it on multiple files at once, use this to remove duplicates across all files:
awk -F'[|]' 'FNR%2{f=seen[$1]++} !f' *
or this to only remove duplicates within each file:
awk -F'[|]' 'FNR==1{delete seen} FNR%2{f=seen[$1]++} !f' *
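A quick way to convince yourself the one-liner works: a tiny made-up file (demo.fa, with shortened headers) where the second record reuses the first record's identifier:

```shell
# Build a 3-record demo file where record 2 reuses record 1's identifier
printf '%s\n' \
  '>A1|Lep|x' 'SEQ-ONE' \
  '>A1|Lep|x' 'SEQ-DUP' \
  '>B2|Dip|y' 'SEQ-TWO' > demo.fa
# Keep only the first two-line record seen for each identifier
awk -F'[|]' 'NR%2{f=seen[$1]++} !f' demo.fa
# prints:
# >A1|Lep|x
# SEQ-ONE
# >B2|Dip|y
# SEQ-TWO
```

The header lines are the odd-numbered ones (NR%2), so the flag f computed from the header decides whether the following sequence line is printed too.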

How do I find matching strings in two txt files

I have two txt files with a bunch of data. In one file there is a string on each line with a line break at the end; in the other file it is simply strings separated by spaces, with no distinct way to break apart the data. Both files consist of hyperlinks. This data doesn't currently exist in tables, and I'm not sure of the best way to import it into appropriate columns/data fields.
All I need to do is pull the data "strings" that appear in both files.
Examples File1:
http://-28uw.c.cr http://www.2xik.com http://www.365cnblog.cn http://www.blogactif.net http://www.blogactifs.com http://www.blogalbums.com http://www.blogallerys.com ... etc.
Example File2:
http://en.wikipedia.org
http://stayyoungafter50.com
http://108.160.146.227
http://10tv.com
http://110mb.com
I don't care about www or http or https; all I want to do is find the matching 10tv.com, or the cnn.com, in both files. That's it.
Any suggestions?
Here's how I'd approach it (though using PowerShell rather than SQL):
clear
pushd c:\myPath\myFolder\
# read in the contents of the files
$file1 = Get-Content "file1.txt"
$file2 = Get-Content "file2.txt"
# loop through each row of the whitespace separated file
$file1 = $file1 | %{
    # for each line, split on whitespace characters, returning the results back in a single column
    $_ -split "\s" | %{ $_ }
}
# compare the two files for matching data & output this info
Compare-Object $file1 $file2 -IncludeEqual -ExcludeDifferent | ft -AutoSize
popd
NB: to ignore the protocol, simply remove it from the string using a technique similar to our split on spaces, i.e. a regex, this time with -replace instead of -split.
clear
pushd c:\temp
$file1 = Get-Content "file1.txt"
$file2 = Get-Content "file2.txt"
$file1 = $file1 | %{
    $_ -split "\s" | %{
        $_ -replace ".*://(.*)", '$1'
    }
}
$file2 = $file2 | %{
    $_ -replace ".*://(.*)", '$1'
}
Compare-Object $file1 $file2 -IncludeEqual -ExcludeDifferent | ft -AutoSize
However, should you prefer a SQL solution, try this (MS SQL Server):
create table f1(url nvarchar(1024))
create table f2(url nvarchar(1024))
BULK INSERT f1
FROM 'C:\myPath\myFolder\file1.txt'
WITH ( ROWTERMINATOR =' ', FIRSTROW = 1 )
BULK INSERT f2
FROM 'C:\myPath\myFolder\file2.txt'
WITH ( FIRSTROW = 1 )
go
delete from f1 where coalesce(rtrim(url),'') = ''
delete from f2 where coalesce(rtrim(url),'') = ''
select x.url, x.x, y.y
from
(
select SUBSTRING(url,patindex('%://%',url)+3, len(url)) x
, url
from f1
) x
inner join
(
select SUBSTRING(url,patindex('%://%',url)+3, len(url)) y
, url
from f2
) y
on y.y = x.x

Passing multiple variables to Export-CSV in Powershell

Hey guys I'm having a Powershell 2.0 problem that is driving me crazy. My objective: Create a script that determines the size of the Documents folder along with the current user on the same row, but in two different fields in a csv file. I have tried the following scripts so far:
$startFolder= "C:\Users\$env:username\Documents"
$docinfo = Get-ChildItem $startFolder -recurse | Measure-Object -property length -sum
$docinfo | Export-Csv -Path C:\MyDocSize\docsize.csv -Encoding ascii -NoTypeInformation
This script works and exports the folder size (Sum), along with some columns that I don't need: Average, Maximum, Minimum, and a Property column that has "Length" as a value. Does anyone know how to just show the Sum column and none of the other stuff? My main question, however, is how do I pass "$env:username" into "$docinfo" and then get "$docinfo" to pass that into a CSV as an additional column and an additional value in the same row as the measurement value?
I tried this:
$startFolder= "C:\Users\$env:username\Documents"
$docinfo = Get-ChildItem $startFolder -recurse | Select-Object $env:username
$docinfo | Export-Csv -Path C:\MyDocSize\docsize.csv -Encoding ascii -NoTypeInformation
This will pass just the current username to the csv file, but without a column name, and then I can't figure out how to incorporate the measurement value with this. Also I'm not even sure why this will pass the username because if I take the "Get-ChildItem $startFolder -recurse" out it will stop working.
I've also tried this script:
$startFolder= "C:\Users\$env:username\Documents"
$docinfo = Get-ChildItem $startFolder -recurse | Measure-Object -property length -sum
New-Object -TypeName PSCustomObject -Property @{
    UserName = $env:username
    DocFolderSize = $docinfo
} | Export-Csv -Path C:\MyDocSize\docsize.csv -Encoding ascii -NoTypeInformation
This script will pass the username nicely with a column name of "UserName", however in the "DocFolderSize" column instead of the measurement values I get this string: Microsoft.PowerShell.Commands.GenericMeasureInfo
Not sure what to do now or how to get around this, I would be really appreciative of any help! Thanks for reading.
Give this a try:
Get-ChildItem $startFolder -Recurse | Measure-Object -property length -sum | Select Sum, @{Label="username";Expression={$env:username}}
The @{Label="username";Expression={$env:username}} part lets you set a custom column header and value.
You can customize the Sum column using the same technique:
Get-ChildItem $startFolder -Recurse | Measure-Object -property length -sum | Select @{Label="FolderSize";Expression={$_.Sum}}, @{Label="username";Expression={$env:username}}
And if you want to show the folder size in MB:
Get-ChildItem $startFolder -Recurse | Measure-Object -property length -sum | Select @{Label="FolderSize";Expression={$_.Sum / 1MB}}, @{Label="username";Expression={$env:username}}
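To tie this back to the original goal, the calculated properties can feed straight into Export-Csv; a sketch using the asker's paths:

```
$startFolder = "C:\Users\$env:username\Documents"
Get-ChildItem $startFolder -Recurse |
    Measure-Object -Property Length -Sum |
    Select-Object @{Label="UserName";Expression={$env:username}},
                  @{Label="FolderSizeMB";Expression={[math]::Round($_.Sum / 1MB, 2)}} |
    Export-Csv -Path C:\MyDocSize\docsize.csv -Encoding ascii -NoTypeInformation
```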

Powershell pass string to ForEach

I am trying to import a csv file and spitting out the columns in a pipe delimited format.
Import-csv file.dat | foreach {($_.'Column1')+ "|" +($_.'Column2')+ "|" +($_.'Column3')}
This works great when I am explicitly passing the column values, e.g.
($_.'ColumnX') + "|" + ($_.'ColumnY') etc.
I want to pass a string of variables generated dynamically to the foreach component.
I am able to generate a "string" which looks exactly as
"($_.'Column1')+ "|" +($_.'Column2')+ "|" +($_.'Column3')"
however, PowerShell is treating that generated string as a single column and outputs the first column ONLY.
eg.
$columns = ($_.'Column1')+ "|" +($_.'Column2')+ "|" +($_.'Column3')
Import-csv file.dat | foreach {$columns}
Any advice on how to get the column string passed into the foreach block so that I get the correct output?
Does this work for you?
Import-csv file.dat | ConvertTo-Csv -Delimiter '|' -NoTypeInformation
If I understand correctly, you want to read a CSV (comma separated value) file and output it as a vertical bar "|" separated file. If this is the case, you can do the following:
gc file.dat | % {[string]::join("|", $_.split(','))}
This just reads the line, splits it into fields at each comma and joins them with vertical bars.
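One caveat worth noting: ConvertTo-Csv in Windows PowerShell 5 quotes every field, and a plain split on ',' breaks if any field contains an embedded comma. Since Import-Csv handles the quoting, a middle ground is to parse properly and join by hand; a sketch, assuming file.dat has a header row:

```
# PSObject.Properties preserves the column order from the CSV header
Import-Csv file.dat | ForEach-Object {
    $_.PSObject.Properties.Value -join '|'
}
```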
Following your comment on @mjolnir's answer, here is how to use dynamic field names within the foreach block:
$f1 = "ColumnA"
$f2 = "ColumnB"
import-csv file.dat | % {$_.$($f1) + "|" + $_.$($f2)}
See my answer here for explanation.