Finding table names in SSIS .dtsx packages - sql

I am trying to scan SSIS .dtsx packages for table names. Yes, I know that I should use [xml] and a tool that parses the SQL language. That does not seem to be possible at this time. PowerShell can understand [xml], but SQL parsers generally cost++ and using ANTLR is more of an investment than is acceptable at this time. I am open to suggestions, but I am not asking for a tool recommendation.
There are two (2) problems.
1) `&.*;` does not appear to be recognized as separate from the table name capture item
2) TABLE5 does not appear to be found
Yes, I also know that schema names should not be hardcoded into source. It makes it difficult/impossible for DBAs to manage the database. That is the way it is done here.
How can I make the regex omit the &.*; from the capture and recognize dbo.TABLE5?
Here is the code I am using to scan the .dtsx files.
PS C:\src\sql> Get-Content .\Find-FromJoinSql.ps1
Get-ChildItem -File -Filter '*.dtsx' |
    ForEach-Object {
        $Filename = $_.Name
        Select-String -Pattern '(FROM|JOIN)(\s|&.*;)+(\S+)(\s|&.*;)+' -Path $_ -AllMatches |
            ForEach-Object {
                if ($_.Matches.Groups.captures[3].value -match 'dbo') {
                    "$Filename === $($_.Matches.Groups.captures[3].value)"
                }
            }
    }
Here is a tiny sample of the type of text from the .dtsx file.
PS C:\src\sql> Get-Content .\sls_test.dtsx
USE ADATABASE;
SELECT * FROM dbo.TABLE1 WHERE F1 = 3;
SELECT * FROM dbo.TABLE2 T2
FULL OUTER JOIN dbo.TABLEJ TJ
ON T2.KEY = TJ.KEY;
SELECT * FROM dbo.TABLE3 T3
INNER JOIN ADATABASE2.dbo.TABLEK
TK ON
T3.user_id = TK.user_id
SELECT * FROM dbo.TABLE4 T4 FULL OUTER JOIN dbo.TABLE5 T5
ON T4.F1 = T5.F1;
EXIT
Running the script on this data produces:
PS C:\src\sql> .\Find-FromJoinSql.ps1
sls_test.dtsx === dbo.TABLE1
sls_test.dtsx === dbo.TABLE2
sls_test.dtsx === dbo.TABLEJ
sls_test.dtsx === dbo.TABLE3
sls_test.dtsx === ADATABASE2.dbo.TABLEK
TK
sls_test.dtsx === dbo.TABLE4
PS C:\src\sql> $PSVersionTable.PSVersion.ToString()
7.1.5

Indeed strange that some entities (here, encoded line breaks) are not replaced in those files.
Change the regex pattern slightly so that it captures the dbo.table names, like below.
Using Get-Content
$regex = [regex] '(?im)(?:FROM|JOIN)(?:\s|&[^;]+;)+([^\s&]+)(?:\s|&[^;]+;)*'
Get-ChildItem -Path D:\Test -File -Filter '*.dtsx' |
    ForEach-Object {
        $match = $regex.Match((Get-Content -Path $_.FullName -Raw))
        while ($match.Success) {
            "$($_.Name) === $($match.Groups[1].Value)"
            $match = $match.NextMatch()
        }
    }
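To see what the revised pattern captures, here is a rough Python port; Python's `re` module accepts the same `(?im)`, `(?:...)` and character-class syntax used in the pattern. The sample text is an assumption: it reuses the question's SQL but stores one line break as the XML entity `&#xA;`, which is how the break inside `TABLEK ... TK` is presumed to appear in the raw .dtsx file. `table_names` is a made-up helper name.

```python
import re

# Same pattern as the PowerShell answer; group 1 is the table name.
PATTERN = re.compile(r"(?im)(?:FROM|JOIN)(?:\s|&[^;]+;)+([^\s&]+)(?:\s|&[^;]+;)*")

# Hypothetical .dtsx excerpt: one line break is stored as the entity &#xA;.
SAMPLE = (
    "SELECT * FROM dbo.TABLE3 T3\n"
    "INNER JOIN ADATABASE2.dbo.TABLEK&#xA;TK ON\n"
    "T3.user_id = TK.user_id\n"
    "SELECT * FROM dbo.TABLE4 T4 FULL OUTER JOIN dbo.TABLE5 T5\n"
)

def table_names(text):
    """Return every captured name, scanning the whole text at once (like -Raw)."""
    return [m.group(1) for m in PATTERN.finditer(text)]

print(table_names(SAMPLE))
```

The capture stops at the `&`, and the trailing group then consumes `&#xA;`, so the entity is no longer glued to `ADATABASE2.dbo.TABLEK`.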
Using Select-String
As to why Select-String -AllMatches skipped your TABLE5:
From the docs: "When Select-String finds more than one match in a line of text, it still emits only one MatchInfo object for the line, but the Matches property of the object contains all the matches."
That means you need another loop over the Matches property of each MatchInfo object to get all of the matches into your output:
$pattern = '(?:FROM|JOIN)(?:\s|&[^;]+;)+([^\s&]+)(?:\s|&[^;]+;)*'
Get-ChildItem -Path 'D:\Test' -File -Filter '*.dtsx' |
    ForEach-Object {
        $Filename = $_.Name
        Select-String -Pattern $pattern -Path $_.FullName -AllMatches |
            ForEach-Object {
                # loop again, because each $MatchInfo object may contain multiple
                # $Matches objects if more matches were found in the same line
                foreach ($match in $_.Matches) {
                    if ($match.Groups[1].value -match 'dbo') {
                        "$Filename === $($match.Groups[1].value)"
                    }
                }
            }
    }
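The "one MatchInfo per line" behaviour can be modelled in Python by scanning line by line. `first_per_line` approximates what indexing a single capture effectively did (only the first match of each line), while `all_per_line` visits every match like the inner `foreach`. Both helper names are made up for illustration, and `(?i)` is added to mirror Select-String's default case-insensitive matching.

```python
import re

PATTERN = re.compile(r"(?i)(?:FROM|JOIN)(?:\s|&[^;]+;)+([^\s&]+)(?:\s|&[^;]+;)*")

LINES = [
    "SELECT * FROM dbo.TABLE1 WHERE F1 = 3;",
    "SELECT * FROM dbo.TABLE4 T4 FULL OUTER JOIN dbo.TABLE5 T5",
]

def first_per_line(lines):
    """Keep only the first match of each line."""
    found = []
    for line in lines:
        m = PATTERN.search(line)
        if m:
            found.append(m.group(1))
    return found

def all_per_line(lines):
    """Keep every match of each line, like looping over MatchInfo.Matches."""
    found = []
    for line in lines:
        # One line can yield several matches; visit all of them.
        for m in PATTERN.finditer(line):
            found.append(m.group(1))
    return found

print(first_per_line(LINES))
print(all_per_line(LINES))
```

The second line holds two matches, so only the per-match loop reports `dbo.TABLE5`.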
Output:
sls_test.dtsx === dbo.TABLE1
sls_test.dtsx === dbo.TABLE2
sls_test.dtsx === dbo.TABLEJ
sls_test.dtsx === dbo.TABLE3
sls_test.dtsx === ADATABASE2.dbo.TABLEK
sls_test.dtsx === dbo.TABLE4
sls_test.dtsx === dbo.TABLE5
Regex details:
(?im)            Use case-insensitive matching and have '^' and '$' match at line breaks
(?:              Match the regular expression below
                 Match either the regular expression below (attempting the next alternative only if this one fails)
   FROM          Match the characters “FROM” literally
   |             Or match regular expression number 2 below (the entire group fails if this one fails to match)
   JOIN          Match the characters “JOIN” literally
)
(?:              Match the regular expression below
                 Match either the regular expression below (attempting the next alternative only if this one fails)
   \s            Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
   |             Or match regular expression number 2 below (the entire group fails if this one fails to match)
   &             Match the character “&” literally
   [^;]          Match any character that is NOT a “;”
      +          Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   ;             Match the character “;” literally
)+               Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(                Match the regular expression below and capture its match into backreference number 1
   [^\s&]        Match a single character NOT present in the list below
                    A whitespace character (spaces, tabs, line breaks, etc.)
                    The character “&”
      +          Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
(?:              Match the regular expression below
                 Match either the regular expression below (attempting the next alternative only if this one fails)
   \s            Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
   |             Or match regular expression number 2 below (the entire group fails if this one fails to match)
   &             Match the character “&” literally
   [^;]          Match any character that is NOT a “;”
      +          Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   ;             Match the character “;” literally
)*               Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
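The two changes that address the question's problem 1 are `&.*;` to `&[^;]+;` and `\S+` to `[^\s&]+`. A quick Python check on a made-up fragment (an assumption, modelled on the TABLEK line) shows why: greedy `.*` runs on to the last `;`, and `\S+` happily includes the entity in the table name.

```python
import re

# Made-up fragment: a table name followed by two encoded line breaks and a ';'.
text = "TABLEK&#xA;TK ON&#xA;T3.user_id = TK.user_id;"

# Question's entity alternative vs. the answer's.
greedy = re.search(r"&.*;", text).group(0)      # .* runs on to the LAST ';'
bounded = re.search(r"&[^;]+;", text).group(0)  # [^;]+ stops at the FIRST ';'

# Question's name capture vs. the answer's.
old_name = re.match(r"\S+", text).group(0)      # \S+ includes '&#xA;TK'
new_name = re.match(r"[^\s&]+", text).group(0)  # stops before '&'

print(greedy, bounded, old_name, new_name, sep="\n")
```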

Related

Using WHERE with multiple columns of different data types to match a single input in bash and PostgreSQL

Please assist with the following. I'm trying to run a script that accepts one argument, $1. The argument can be a string, a single character, or an integer. I want to use the argument in the WHERE clause to search for the element in the database.
This is the table I want to search (shown as a screenshot in the original post).
When I combine multiple conditions with OR, it only works for one kind of argument (either a number or text), not for both.
This is what my code looks like:
ELEMENT=$($PSQL "SELECT * FROM elements e FULL JOIN properties p USING(atomic_number) WHERE symbol = '$1' OR name = '$1' OR atomic_number = $1;")
And these are the results I get when I run it with different arguments (also a screenshot in the original post).
Please help.
Thank you in advance
This will always fail on any non-numeric argument.
You are passing in H for hydrogen, but whatever was passed in is also pasted into the atomic_number comparison as an unquoted value, which the DB engine then has to make sense of. H isn't a number and isn't a quoted string, so the engine takes it as the name of a column... but there is no such column, so the query is invalid.
I don't have PostgreSQL available right now, but try something like this, which compares atomic_number as text so that a non-numeric argument never has to be converted to an integer:
ELEMENT=$( $PSQL "
SELECT *
FROM elements e
FULL JOIN properties p USING(atomic_number)
WHERE symbol = '$1'
OR name = '$1'
OR CAST(atomic_number AS TEXT) = '$1'; " )
Also, as an aside... avoid all-capital variable names.
As a convention, those are supposed to be system vars.
And please - please don't embed images except as helpful clarification.
Never rely on them to provide info if it can be avoided. Copy/paste actual formatted text people can copy/paste in their own testing.
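The failure described above (an unquoted H being read as a column name) can be reproduced without PostgreSQL. This sketch uses Python's built-in `sqlite3` module as a stand-in, which is an assumption: SQLite's type handling differs from PostgreSQL's, and the one-table schema is invented. Passing the argument as a bound parameter keeps it out of the SQL text, so it is never parsed as an identifier.

```python
import sqlite3

# Hypothetical minimal schema standing in for the elements table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE elements (atomic_number INTEGER, symbol TEXT, name TEXT)")
conn.execute("INSERT INTO elements VALUES (1, 'H', 'Hydrogen')")

def lookup_by_pasting(arg):
    # Mirrors the question: the argument is pasted into the SQL text,
    # unquoted in the atomic_number comparison.
    sql = (f"SELECT name FROM elements WHERE symbol = '{arg}' "
           f"OR name = '{arg}' OR atomic_number = {arg}")
    return conn.execute(sql).fetchall()

def lookup_with_parameters(arg):
    # The value travels separately from the SQL text.
    sql = ("SELECT name FROM elements "
           "WHERE symbol = ? OR name = ? OR atomic_number = ?")
    return conn.execute(sql, (arg, arg, arg)).fetchall()

print(lookup_by_pasting("1"))
try:
    lookup_by_pasting("H")
except sqlite3.OperationalError as err:
    print("error:", err)
print(lookup_with_parameters("H"), lookup_with_parameters("1"))
```

With psql in a shell script, the closest equivalent is to pass the value as a psql variable rather than splicing it into the string, but that is outside the scope of this sketch.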
An alternative way to construct the query (requires bash):
looks_like_a_number() {
    # only contains digits
    [[ "$1" == +([[:digit:]]) ]]
}
sanitize() {
    # at a minimum, handle embedded single quotes by doubling them
    local quote="'"
    printf '%s' "${1//$quote/$quote$quote}"
}
if looks_like_a_number "$1"; then
    field="atomic_number"
    value=$1
elif [[ ${#1} -le 2 ]]; then
    # element symbols have one or two letters (H, He)
    field="symbol"
    printf -v value "'%s'" "$(sanitize "$1")"
else
    field="name"
    printf -v value "'%s'" "$(sanitize "$1")"
fi
q="SELECT *
FROM elements e
FULL JOIN properties p USING(atomic_number)
WHERE $field = $value;"
printf '%s\n' "$q"
# as in the question, $PSQL stays unquoted so it can expand to a command plus its options
result=$($PSQL "$q")
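For experimenting outside the shell, the same dispatch can be sketched in Python. `build_query` is a hypothetical name; it picks the column from the argument's shape, doubles single quotes the way `sanitize` does, and treats arguments of up to two characters as symbols, because symbols such as He have two letters while every element name is longer. Building SQL text by hand is kept here only to mirror the bash version.

```python
import re

def build_query(arg):
    """Return the SQL text for one argument, picking the column from its shape."""
    if re.fullmatch(r"[0-9]+", arg):
        # Digits only: compare with the numeric column, unquoted.
        field, value = "atomic_number", arg
    else:
        # Text: double embedded single quotes so it stays one SQL literal.
        value = "'" + arg.replace("'", "''") + "'"
        # Element symbols have one or two letters; names are longer.
        field = "symbol" if len(arg) <= 2 else "name"
    return ("SELECT * FROM elements e "
            "FULL JOIN properties p USING(atomic_number) "
            f"WHERE {field} = {value};")

for arg in ["1", "He", "Hydrogen"]:
    print(build_query(arg))
```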

Find a character in a string using Powershell?

I know I could use Contains to find it but it doesn't work.
Full Story:
I have to get the PartNo, Ver, and Rev from a SQL database and check whether they occur in the first line of the text file. I get the first line of the file and store it in $EiaContent.
The PartNo is associated with MAFN as in $partNo=Select PartNo Where MAFN=xxx. Most of the time MAFN returns one PartNo. But in some cases for one MAFN there could be multiple PartNo. So the query returns multiple PartNo(PartNo_1,PartNo_2,PartNo_3,and PartNo_4) but only one of these will be in the text file.
The issue is that each of these PartNo. is treated as a single character in PowerShell. $partNo.Length is 4. Therefore, my check If ($EiaContent.Contains("*$partNo*")) fails and it shouldn't in this case because I can see that one of the PartNo is mentioned in the file. Also, Contains wouldn't work if there was one PartNo. I use like as in If ($EiaContent -like "*$partNo*") to match the PartNo and it worked but it doesn't work when there are multiple PartNo.
The data type of $partNo is string, and so is that of $EiaContent. The data type of PartNo in SQL is varchar(50) with collation SQL_Latin1_General_CP1_CI_AS.
I am using PowerShell Core 7.2 and SQL 2005
Code:
$EiaContent = (Get-Content $aidLibPathFolder\$folderName\$fileName -TotalCount 1)
Write-host $EiaContent
#Sql query to get the Part Number
$partNoQuery = "SELECT PartNo FROM [NML_Sidney].[dbo].[vMADL_EngParts] Where MAFN = $firstPartTrimmed"
$partNoSql = Invoke-Sqlcmd -ServerInstance $server -Database $database -Query $partNoQuery
#Eliminate trailing spaces
$partNo = $partNoSql.PartNo.Trim()
If ($EiaContent.Contains("*$partNo*")) {
    Write-Host "Part Matches"
}
Else {
    #Send an email stating the PartNo discrepancy
}
Thank you in advance to those who try to help.
EDIT
Screenshot: https://i.stack.imgur.com/hIqJB.png
A1023 A1023MD C0400 C0400MD is the output of the variable $partNo and O40033( C0400 REV N VER 004, 37 DIA 4.5 BRAKE DRUM OP3 ) is the output of the variable $EiaContent
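So you first have to split $partNo and then search $EiaContent for each resulting substring. (Since $partNo.Length is 4, $partNo is most likely already an array of four strings rather than one string; -split then simply splits each element, which leaves these single-word elements unchanged.)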
If ($partNo -split ' ' | Where-Object { $EiaContent.Contains( $_ ) }) {
    Write-Host "Part Matches"
}
This is the generic form that most people are used to. We can simplify it by using the unary form of -split (which splits on whitespace) and the intrinsic .Where() method, which is faster because it avoids pipeline overhead.
If ((-split $partNo).Where{ $EiaContent.Contains( $_ ) }) {
    Write-Host "Part Matches"
}
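For comparison, the split-then-search idea in Python, using the two values shown in the question. Here `part_no` is assumed to arrive as one space-separated string; if it is already a list, the `split()` step can simply be dropped.

```python
part_no = "A1023 A1023MD C0400 C0400MD"
eia_content = "O40033( C0400 REV N VER 004, 37 DIA 4.5 BRAKE DRUM OP3 )"

# Split on whitespace (like unary -split), then test each piece as a substring.
matches = [p for p in part_no.split() if p in eia_content]

if matches:
    print("Part Matches:", matches)
else:
    print("No part number found")
```

A plain substring test would also report C0400 if the line contained only C0400MD, so a word-boundary check may be worth adding if the part numbers can be prefixes of each other.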
As correctly noted in comments, wildcards are not supported by the .Contains() string method.
Wildcards are supported by the PowerShell -like operator. The following example is only for educational purposes; I wouldn't use it in your case, as the .Contains() string method is simpler and faster.
If ((-split $partNo).Where{ $EiaContent -like "*$_*" }) {
    Write-Host "Part Matches"
}
Note that -contains would not be suitable here. A common misconception is that -contains does a substring search, when the LHS operand is a string. It doesn't! The operator tests whether a collection (such as an array) on the LHS contains the value given on the RHS.
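The three behaviours discussed above have close Python counterparts, which may help keep them apart: `in` on a string is a literal substring test (so `*` is just a character, as with `.Contains()`), `fnmatch.fnmatchcase` treats `*` as a wildcard over the whole string much like `-like` (but case-sensitively, whereas `-like` is case-insensitive by default), and `in` on a list is an element test, which is what `-contains` does.

```python
from fnmatch import fnmatchcase

eia_content = "O40033( C0400 REV N VER 004, 37 DIA 4.5 BRAKE DRUM OP3 )"

# Literal substring test: the asterisks are searched for as characters.
print("*C0400*" in eia_content)
# Wildcard match over the whole string, similar to -like "*C0400*".
print(fnmatchcase(eia_content, "*C0400*"))
# Element test on a collection, similar to -contains: no substring search.
print("C0400" in ["A1023", "C0400MD"])
```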

Raku vs. Perl5, an unexpected result

When I run this script in Raku I get the letter A with several newlines.
Why do I not get the concatenated strings as expected (and as Perl 5 does)?
EDIT: By the way, can I run this file with the Perl 5 compiler in Comma IDE? Where can I change the compiler to Perl 5's?
say "A";
my $string1 = "00aabb";
my $string2 = "02babe";
say join ("A", $string1, $string2);
print "\n";
my @strings = ($string1, $string2);
say join ("A", @strings);
print "\n";
The solution I suggest is to drop the parens:
say "A";
my $string1 = "00aabb";
my $string2 = "02babe";
say join "A", $string1, $string2; # Pass THREE arguments to `join`, not ONE
print "\n";
my @strings = $string1, $string2;
say join "A", @strings;
print "\n";
(The elements in @strings in the second call to join are flattened, thus acting the same way as the first call to join.)
The above code displays:
A
00aabbA02babe
00aabbA02babe
Your code calls join with ONE argument, which is the joiner, and ZERO strings to concatenate. So each join call produces an empty string. Hence you get blank lines.
The two say join... statements in your code print nothing but a newline because they're like the third and fourth say lines below:
say join( " \o/ ", "one?", "two?" ); # one? \o/ two?␤
say join " \o/ ", "one?", "two?" ; # one? \o/ two?␤
say join ( " \o/ ", "one?", "two?" ); # ␤
say join( " \o/ one? two?" ); # ␤
The first and second lines above pass three strings to join. The third passes a single List which then gets coerced to become a single string (a concatenation of the elements of the List joined using a single space character), i.e. the same result as the fourth line.
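As a rough analogy only, here is a Python model of the signature difference. `raku_join` is a hypothetical function shaped like Raku's `join($separator, *@list)`; it coerces a list separator to a string by joining its elements with a single space, as described above. Three arguments join two strings; one parenthesised group becomes the separator with nothing to join.

```python
def raku_join(separator, *items):
    """Hypothetical model of Raku's join: the first argument separates the rest."""
    if isinstance(separator, (list, tuple)):
        # Model of Raku's list-to-string coercion: elements joined by one space.
        separator = " ".join(separator)
    return separator.join(items)

s1, s2 = "00aabb", "02babe"
print(repr(raku_join("A", s1, s2)))    # like: join "A", $string1, $string2
print(repr(raku_join(("A", s1, s2))))  # like: join ("A", $string1, $string2)
```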
The my @strings = ($string1, $string2); incantation happens to work as you intended, because an assignment to a "plural" variable on the left hand side of the = will iterate the value, or list of values, on the right hand side.
But it's a good habit in Raku to err on the side of avoiding redundant code, in this instance only using parens if you really have to use them to express something different from code without them. This is a general principle in Raku that makes code high signal, low noise. For your code, all the parens are redundant.
The problem is the whitespace after the function name. If you leave a space there, Raku treats the expression in parentheses as a List, which is a single thing. Here, join thinks you want to use the list as the joiner.
Observe:
>raku -e "sub foo( $arg ) { say $arg.WHAT }; foo( 'abc', )"
(Str)
>raku -e "sub foo( $arg ) { say $arg.WHAT }; foo ( 'abc', )"
(List)
So in short, if you want to use parentheses to call a sub, don't put whitespace between the sub name and the opening paren.
See "Traps to avoid" in the Raku documentation.
try:
my @s=("00aabb", "02babe");
say join "A", @s;
say join |("A",@s);
say join("A",@s);
say @s.join: "A";
#or some crazy
say join\ ("A",@s);
say "A".&join: @s;
say join @s: "A";
say "A" [&join] @s;
say @s R[&join] "A";
#…
A: "do not put whitespace between a function name and the opening paren '('" ♎️KISS

Xargs, sqlplus and quote nightmare?

I have one big file containing data, for example:
123;test/x/COD_ACT_008510/descr="R08-Ballon d''eau"
456;test/x/COD_ACT_008510/descr="R08-Ballon d''eau"
In reality there are many more columns, but I simplified it here.
I want to process each line and run some sqlplus operations with it.
Let's say I have a table with two columns, containing this:
ID | CONTENT
123 | test/x/COD_ACT_333/descr="Test 1"
456 | test/x/COD_ACT_444/descr="Test 2"
Let's say I want to update the CONTENT value of both rows to get this:
ID | CONTENT
123 | test/x/COD_ACT_008510/descr="R08-Ballon d''eau"
456 | test/x/COD_ACT_008510/descr="R08-Ballon d''eau"
In reality I have a lot of data and complex queries to run, so I have to use sqlplus, not tools like SQL*Loader.
So I process the input file with 5 parallel processes, one line at a time, and set "\n" as the delimiter to avoid quote conflicts:
cat input_file.txt | xargs -n 1 -P 5 -d '\n' ./my_script.sh &
In "my_script.sh" I have :
#!/bin/bash
line="$1"
sim_id=$(echo "$line" | cut -d';' -f1)
content=$(echo "$line" | cut -d';' -f2)
sqlplus -s $DBUSER/$DBPASSWORD@$DBHOST:$DBPORT/$DBSCHEMA @updateRequest.sql "$id" "'"$content"'"
And the updateRequest.sql file (which just contains a test) has:
set heading off
set feed off
set pages 0
set verify off
update T_TABLE SET CONTENT = '&2' where ID = '&1';
commit;
As a result, I get:
01740: missing double quote in identifier
If I set the verify parameter to on in the SQL script, I see:
old 1: select '&2' from dual
new 1: select 'test/BVAL/COD_ACT_008510/descr="R08-Ballon d'eau"' from dual
It seems like one of the two single quotes (used to escape the second quote) is missing...
I tried everything, but each time I get an error with quotes or double quotes, either on the bash side or on the SQL side... it's endless :/
I need the double quotes for the "descr" part, and I need to handle the apostrophe (single quote) in the content.
For info, the input file is generated automatically, but I can modify its format.
With GNU Parallel it looks like this:
dburl=oracle://$DBUSER:$DBPASSWORD@$DBHOST:$DBPORT/$DBSCHEMA
cat big |
parallel -j5 -v --colsep ';' -q sql $dburl "update T_TABLE SET CONTENT = '{=2 s/'/''/g=}' where ID = '{1}'; commit;"
But this works only if the values contain no ;. Given this input, for example, it will do the wrong thing:
456;test/x/COD_ACT_008510/descr="semicolon;in;value"
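If the ID field itself never contains a `;` (an assumption), one way around that limitation is to split each line only at the first `;` and keep everything after it as the content. A Python sketch of that parsing step, which also doubles single quotes for the SQL string literal as the `{=2 s/'/''/g=}` expression does; `to_update` is a hypothetical name, and the table and column names are the question's. Whether the file's already-doubled `''` should be doubled again depends on what should end up in the table; this sketch stores the text exactly as it appears in the file.

```python
def to_update(line):
    """Build one UPDATE statement from an 'ID;content' line."""
    # maxsplit=1: only the first ';' separates the ID, so ';' may appear in the content.
    row_id, content = line.rstrip("\n").split(";", 1)
    # Double every single quote so the value is one valid SQL string literal.
    literal = content.replace("'", "''")
    return (f"update T_TABLE SET CONTENT = '{literal}' "
            f"where ID = '{row_id}'; commit;")

print(to_update('456;test/x/COD_ACT_008510/descr="semicolon;in;value"'))
print(to_update('123;test/x/COD_ACT_008510/descr="R08-Ballon d\'\'eau"'))
```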

How to remove space from an array (powershell)

I have a script that reads a TAB-separated .TXT file, grabs information from a table, and then creates a .SQL script based on the names in the list. But every time a $Variable[#] is called, it adds an extra space.
This space does not exist in the source data. I am looking for a method of trimming it.
$start | Out-File -filepath $target1 -append
$infile = $source1
$reader = [System.IO.File]::OpenText($infile)
$writer = New-Object System.IO.StreamWriter $file1;
$counter = 1
try {
    while (($line = $reader.ReadLine()) -ne $null)
    {
        $myarray=$line -split "\t" | foreach {$_.Trim()}
        If ($myarray[1] -eq "") {$myarray[1]="~"}
        If ($myarray[2] -eq "") {$myarray[2]="~"}
        If ($myarray[3] -eq "") {$myarray[3]="~"}
        If ($myarray[4] -eq "") {$myarray[4]="~"}
        If ($myarray[5] -eq "") {$myarray[5]="~"}
        if ($myarray[0] -Match "\d{1,4}\.\d{1,3}"){
"go
Insert into #mytable Select convert(varchar(60),replace('OSFI Name: "+$myarray[1],$myarray[2],$myarray[3],$myarray[4],$myarray[5],"')), no_,branch,name,surname,midname,usual,bname2
from cust where cust.surname in ('"+$myarray[2].,"',"+$myarray[1],"',"+$myarray[3],"',"+$myarray[4],"',"+$myarray[5],"')' and ( name in ('"+$myarray[1],"','"+$myarray[2],"','"+$myarray[3],"','"+$myarray[4],"','"+$myarray[5],"') or
midname in ('"+$myarray[1],"','"+$myarray[2],"','"+$myarray[3],"','"+$myarray[4],"','"+$myarray[5],"') or
usualy in ('"+$myarray[1],"','"+$myarray[2],"','"+$myarray[3],"','"+$myarray[4],"','"+$myarray[5],"') or
bname2 in ('"+$myarray[1],"','"+$myarray[2],"','"+$myarray[3],"','"+$myarray[4],"','"+$myarray[5],"') )
" -join "," | foreach {$_.Trim()} | Out-File -filepath $target1 -append
        }
        #$writer.WriteLine($original);
        #Write-Output $original;
        #Write-Output $newlin
    }
}
finally {
    $reader.Close()
    $writer.Close()
}
$end | Out-File -filepath $target1 -append
Every time it calls $myarray[1] or any other number it adds a space. This is not good as this will create a duplicate entry for every name it pulls in my DB.
We have an existing ".Java" script that does what I am trying to achieve so I know what my output should look like.
The output I should be getting looks like:
go
Insert into #mytable Select convert(varchar(60),replace('OSFI Name: Fake Name Faker unreal ','''''','''')), no_,branch,name,surname,midname,usual,bname2
from cust where cust.surname in ('Faker unreal','Fake Name','~','~','~') and ( name in ('Fake Name', 'Faker unreal', '~', '~', '~') or
midname in ('Fake Name', 'Faker unreal', '~', '~', '~') or
usual in ('Fake Name', 'Faker unreal', '~', '~', '~') or
bname2 in ('Fake Name', 'Faker unreal', '~', '~', '~') )
But instead I am getting
go
Insert into #mytable Select convert(varchar(60),replace('OSFI Name: Fake Name Fake Faker unreal ~ ~ ')), no_,branch,name,surname,midname,usual,bname2
from cust where cust.surname in ('Fake Name ',unreal ',~ ',~ ')' and ( name in ('Fake Name ','Fake Faker ','unreal ','~ ','~ ') or
midname in ('Fake Name ','Fake Faker ','unreal ','~ ','~ ') or
usualy in ('Fake Name ','Fake Faker ','unreal ','~ ','~ ') or
bname2 in ('Fake Name ','Fake Faker ','unreal ','~ ','~ ') )
The use of commas in your string building is contributing to this issue. I see that you are also using -join at the end of that string. That could explain why you were using commas, as -join is an array operator. In essence, you are muddying the waters by mixing array construction with basic string concatenation. Here is a fragment of your code that shows the issue:
'"+$myarray[2].,"',"+$myarray[1],"',"+$myarray[3],"'
Fix those commas and your issue should go away (see the section below about better approaches). A comma makes PowerShell treat the next string as another element of an array, as opposed to concatenating it. When PowerShell flattens an array to a string, the elements are space-delimited. So the issue is not a space in the array element but how you are building your string.
Compare these results to see what I mean
$myarray = 65..90 | %{[string]([char]$_) * 4}
"'"+$myarray[2],"'"
"'"+$myarray[2]+"'"
'CCCC '
'CCCC'
In the first output example, the two-element array $myarray[2],"'" is added to the string "'". In the second, the array element and then another quote are appended to the opening quote: pure string concatenation.
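The difference can be modelled in Python with one assumption spelled out: `ps_add` is a hypothetical helper that imitates how PowerShell converts an array on the right of `+` into a string, joining its elements with a space (the default `$OFS`). In PowerShell the comma binds tighter than `+`, so the first expression adds a whole two-element array.

```python
def ps_add(left, right):
    """Imitate PowerShell's string + value: arrays become space-joined strings."""
    if isinstance(right, list):
        right = " ".join(str(item) for item in right)
    return left + str(right)

myarray = [chr(c) * 4 for c in range(65, 91)]   # 'AAAA', 'BBBB', 'CCCC', ...

# "'"+$myarray[2],"'"  : the comma builds a two-element array first
print(ps_add("'", [myarray[2], "'"]))
# "'"+$myarray[2]+"'"  : plain concatenation
print(ps_add(ps_add("'", myarray[2]), "'"))
```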
Consider better string building approaches
Know that there are other ways to do this as well. You can use subexpressions and the format operator if that helps.
"'$($myarray[2])','$($myarray[3])'"
"'{0}','{1}'" -f $myarray[2],$myarray[3]
SQL Injection Warning
You are manually building SQL code with string concatenation. This means that an attacker could put something malicious in your input file, and you could very well execute it. You should be using parameterized commands. Look up SQL injection, as it is out of scope for this question.
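To give a flavour of parameterization, here is a Python sketch with the built-in `sqlite3` module standing in for SQL Server (an assumption; from PowerShell the analogous tool would be a parameterized command object). The `cust` table is cut down to two columns and its rows are invented. Each name becomes a `?` placeholder, so a quote in the input is treated as data, not SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust (name TEXT, surname TEXT)")
conn.executemany("INSERT INTO cust VALUES (?, ?)",
                 [("Fake Name", "Faker unreal"), ("O'Hara", "Smith")])

def find_customers(names):
    """Match any of the given names against name or surname, via placeholders."""
    placeholders = ",".join("?" for _ in names)   # e.g. "?,?,?" for three names
    sql = (f"SELECT name, surname FROM cust "
           f"WHERE name IN ({placeholders}) OR surname IN ({placeholders})")
    # Each value is passed twice: once for each IN list.
    return conn.execute(sql, list(names) * 2).fetchall()

print(find_customers(["Fake Name", "~", "~"]))
print(find_customers(["O'Hara"]))
```

Only the placeholder count is built from the input; the values themselves never become part of the SQL text.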