How to remove the double quotes and skip last 3 lines in bcp loading into SQL Server? - sql

I am loading data into SQL Server using bcp command line utility.
The problem is data is coming is like below. Every field in double quotes and I have to skip last three rows because it has some trailers. How can i solve this?
Thanks in Advance. I appreciate if you could help in this.
bcp database.schema.tablename in Filename.text -T -c -t"|" -r"0x0a" -F 3 -m 2
UHDR 20211110
"DATE","CUSIP","ISIN","SEDOL","TICKER","DESCRIPTION","QUANTITY","RATE","COMMENT","MARKET","FEE"
11/10/2021|""|"CA45826T3010"|"BMVXZT5"|"ITR"|"INTEGRA RESOURCES CORP REGISTERED SHS"|"28712"|"0.0000"|"HTB"|"CA"|"11.5000"

If you want to:
unconditionally remove all " chars.
unconditionally skip the last 3 lines:
(Get-Content -ReadCount 0 Filename.text) -replace '"' |
Select-Object -SkipLast 3 |
Set-Content Filename_CleanedUp.text
Note: -ReadCount 0 is a performance optimization that makes Get-Content read all lines into a single array instead of streaming the lines one by one.
Then pass Filename_CleanedUp.text to your bcp command.

Related

BCP update database table base on output from powershell

I have 4 files with the same csv header as following
Column1,Column2,Column3,Column4
But I only required data from Column2,Column3,Column4 for import the data into SQL database using BCP . I am using the PowerShell to select the columns that I want and import the required data using BCP but my powershell executed with no error and there are not data updated in my database table. May I know how to set the BCP to import the output from Powershell to database table. Here are my powershell script
$filePath = Get-ChildItem -Path 'D:\test\*' -Include $filename
$desiredColumn = 'Column2','Column3','Column4'
foreach($file in $filePath)
{
write-host $file
$test = import-csv $file | select $desiredColumn
write-host $test
$action = bcp <myDatabaseTableName> in $test -T -c -t";" -r"\n" -F2 -S <MyDatabase>
}
These are the output from the powershell script
D:\test\sample1.csv
#{column2=111;column3=222;column4=333} #{column2=444;column3=555;column4=666}
D:\test\sample2.csv
#{column2=777;column3=888;column4=999} #{column2=aaa;column3=bbb;column4=ccc}
First off, you can't update a table with bcp. It is used to bulk load data. That is, it will either insert new rows or export existing data into a flat file. Changing existing rows, usually called as updating, is out of scope for bcp. If that's what you need, you need to use another a tool. Sqlcmd works fine, and Powershell's got Invoke-Sqlcmd for running arbitary TSQL statements.
Anyway, the BCP utility has notoriously tricky syntax. As far as I know, one cannot bulk load data by passing the data as parameter to bcp, a source file must be used. Thus you need to save the filtered file and pass its name to bcp.
Exporting a filtered CSV is easy enough, just remember to use -NoTypeInformation switch, lest you'll get #TYPE Selected.System.Management.Automation.PSCustomObject as your first row of data. Assuming the bcp arguments are well and good (why -F2 though? And Unix newlines?).
Stripping double quotes requires another an edit to the file. Scrpting Guy has a solution.
foreach($file in $filePath){
write-host $file
$test = import-csv $file | select $desiredColumn
# Overwrite filtereddata.csv, should one exist, with filtered data
$test | export-csv -path .\filtereddata.csv -NoTypeInformation
# Remove doulbe quotes
(gc filtereddata.csv) | % {$_ -replace '"', ''} | out-file filtereddata.csv -Fo -En ascii
$action = bcp <myDatabaseTableName> in filtereddata.csv -T -c -t";" -r"\n" -F2 -S <MyDatabase>
}
Depending on your locale, column separator might be semicolon, colon or something else. Use -Delimiter '<character>' switch to pass whatever you need or change bcp's argument.
Erland's got a helpful page about bulk operations. Also, see Redgate's advice.
Without need to modify the file first, there is an answer here about how bcp can handle quoted data.
BCP in with quoted fields in source file
Essentially, you need to use the -f option and create/use a format file to tell SQL your custom field delimiter (in short, it is no longer a lone comma (,) but it is now (",")... comma with two double quotes. Need to escape the dblquotes and a small trick to handle the first doulbe quote on a line. But it works like a charm.
Also, need the format file to ignore column(s)... just set the destination column number to zero. All with no need to modify the file before load. Good luck!

Store my "Sybase" query result /output into a script variable

I need a variable to keep the results retrieved from a query (Sybase) that´s in a script.
I have built the following script, it works fine I get the desired result when I run it
Script: EXECUTE_DAILY:
isql -U database_dba -P password <<EOF!
select the_name from table_name where m_num="NUMB912" and date="17/01/2019"
go
quit
EOF!
echo "All Done"
Output:
"EXECUTE_DAILY" 97 lines, 293 characters
user#zp01$ ./EXECUTE_DAILY
the_name
-----------------------------------
NAME912
(1 row affected)
But now I would like to keep the output(the_name: NAME912) in a variable.
So far this is basically what I'm trying with no success.
variable=$(isql -U database_dba -P password -se "select the_name from table_name where m_num="NUMB912" and date="17/01/2019" ")
But, is not working. I can't save NAME912 in a variable.
You need to parse the output for the desired string/piece-of-data that you wish to store in your variable. I tend to make my life a bit easier by making sure I can easily/quickly search/parse out what I want.
Keeping a few issues in mind ...
I tend to use isql -s"|" -w10000 to ensure (most of the time) that a) the result set has all columns delimited with the pipe ('|') and b) a single row of data does not span multiple rows; the pipe delimiter makes it easier to parse out columns that may contain white space; obviously (?) use a different delimiter if a pipe may be part of your actual data
to make parsing of the isql output a bit easier I tend to add a unique, grep-able (literal) string to the rows that I'm looking to search/parse
some databases (eg, SQLAnywhere, Oracle) tend to mimic a literal value as the column header if said literal string has not been assigned an explicit alias/header; this means that if you do a simple search on your literal string then you'll get a match for the result set header as well as the actual data row
I tend to capture all isql output to a temporary file; this allows for easier follow-on processing, eg, error checking, data parsing, dumping contents to a logfile, etc
So, with the above in mind my code typically looks something like:
$ outfile=/tmp/.$$.isql.outfile
$ isql -s"|" -w10000 -U database_dba -P password <<-EOF > ${outfile} 2>&1
-- 'GREP'||'ME' ensures that 'GREPME' only shows up in the data row
select 'GREP'||'ME',the_name
from table_name
where m_num = "NUMB912"
and date = "17/01/2019"
go
EOF
$ cat ${outfile}
... snip ...
|'GREP'||'ME'|the_name | # notice the default column header = 'GREP'||'ME' which won't match my search for 'GREPME'
|------------|----------|
|GREPME |NAME912 | # this is the line I want to search/parse
... snip ...
$ read -r namevar < <(egrep GREPME ${outfile} | awk -F"|" '{print $3}')
$ echo ${namevar}
NAME912

How can I delete a specific line (e.g. line 102,206,973) from a 30gb csv file?

What method can I use to delete a specific line from a csv/txt file that is too big too load into memory and edit manually?
Background
My question is actually an indirect solution to a problem related with importing csv into sql databases.
I have a series of 10-30gb csv files I want to import and populate an sqlite table from within R (Since they are too large to import as data frames as a whole into R). I am using the 'RSQlite' package for this.
A couple fail because of an error related to one of the lines being badly formatted. The populating process is then cancelled. R returns the line number which caused the process to fail.
The error given is:
./csvfilename line 102206973 expected 9 columns of data but found 3)
So I know exactly the line which causes the error.
I see 2 potential 'indirect' solutions which I was hoping someone could help me with.
(i) Deleting the line causing the error in 20+gb files. e.g. line 102,206,973 in the example above.
I am not concerned with 'losing' the data in line 102,206,973 by just skipping or deleting it. However I have tried and failed to somehow access the csv file and to remove the line.
(ii) Using sqlite directly (or anything else?) to import an csv which does allow you to skip lines or an error.
Although not likely to be related directly to the solution, here is the R code used.
db <- dbConnect(SQLite(), dbname=name_of_table)
dbWriteTable(conn = db, name ="currentdata", value = csvfilename, row.names = FALSE, header = TRUE)
Thanks!
To delete a specific line you can use sed:
sed -e '102206973d' your_file
If you want the replacement to be done in-place, do
sed -i.bak -e '102206973d' your_file
This will create a backup names your_file.bak and your_file will have the specified line removed.
Example
$ cat a
1
2
3
4
5
$ sed -i.bak -e '3d' a
$ cat a
1
2
4
5
$ cat a.bak
1
2
3
4
5

How to delete last row in output file generated by nzsql

I am trying to delete last row in the file generated by nzsql.Please find the below query.
nzsql -A -c "SELECT * FROM AM_MAS_DIVISION_DIM" > abc.out
When I execute this query the output will be generated and stored in abc.out.This will include both header columns as well as some time information at the bottom.But I don't need the bottom metadata and want to keep only my header columns. How can I do this using only nzsql.Please help me.Thanks in advance.
use -r flag in the nzsql command to avoid getting that row [assuming the metadata referred in question is the row count summary line, ex: (3 rows)]
-r Suppresses the row count that is displayed at the end of the SQL output.
reference: http://pic.dhe.ibm.com/infocenter/ntz/v7r0m3/index.jsp?topic=%2Fcom.ibm.nz.adm.doc%2Fr_sysadm_nzsql_command.html
Why don't you just pipe the output to a unix command to remove it? I think something like this will work:
nzsql -A -c "SELECT * FROM AM_MAS_DIVISION_DIM" | sed '$d' > abc.out
Seems to be a recommended solution for getting rid of the last line (although ed, gawk, and other tools can handle it).

Parse sqlite query line by line

i have a problem which i cant figure out myself, i have a SQLite database which contains data
to retrieve the data i use a bash script and the command sqlite3 db "SELECT prkey FROM printers"
the output is something like this:
1
3
4
5
6
7
i need to parse each line and use it for my next command. and with out the use of a > file
regards Marco
The for loops work, but they are not very robust: they break if a line in the output has a space, and the shell must store the whole output of the query in memory before it starts running the loop. These two issues are easily dealt with:
sqlite3 db "SELECT ..." | while read prkey; do
echo "do stuff with $prkey"
done
#!/bin/bash
for line in $(sqlite3 db "SELECT prkey FROM printers"); do
echo "line is --> $line" # do stuff with $line
done
You can use for loop on the return value of the sqlite command.
for prkey in `sqlite3 db "SELECT prkey FROM printers"`; do
echo $prkey; #replace with your command
done;