The code below works, but all of the data displays in one row(but different columns) when opened in Excel. The query SHOULD display the data headings, row 1, and row 2. Also, when I open the file, I get a warning that says "The file you are trying to open,'xxxx.csv', is in a different format than specified by the file extension. Verify that the file is not corrupted...etc. Do you want to open the file now?" Anyway to fix that? That may be the cause too.
tldr; export to csv with multiple rows - not just one. fix Excel error. Thanks!
#!/usr/bin/perl
use warnings;
use DBI;
use Text::CSV;
# local time variables
($sec,$min,$hr,$mday,$mon,$year) = localtime(time);
$mon++;
$year += 1900;
# set name of database to connect to
$database=MDLSDB1;
# connection to the database
my $dbh = DBI->connect("dbi:Oracle:$database", "", "")
or die "Can't make database connect: $DBI::errstr\n";
# some settings that you usually want for oracle 10
$dbh->{LongReadLen} = 65535;
$dbh->{PrintError} = 0;
# sql statement to run
$sql="select * from eg.well where rownum < 3";
my $sth = $dbh->prepare($sql);
$sth->execute();
my $csv = Text::CSV->new ( { binary => 1 } )
or die "Cannot use CSV: ".Text::CSV->error_diag ();
open my $fh, ">:raw", "results-$year-$mon-$mday-$hr.$min.$sec.csv";
$csv->print($fh, $sth->{NAME});
while(my $row = $sth->fetchrow_arrayref){
$csv->print($fh, $row);
}
close $fh or die "Failed to write CSV: $!";
while(my $row = $sth->fetchrow_arrayref){
$csv->print($fh, $row);
$csv->print($fh, "\n");
}
CSV rows are delimited by newlines. Just simply add a newline after each row.
I think another solution is to use the instantiation of the Text::CSV object and pass along the desired line termination there...
my $csv = Text::CSV->new ( { binary => 1 } )
or die "Cannot use CSV: " . Text::CSV->error_diag();
becomes:
my $csv = Text::CSV->new({ binary => 1, eol => "\r\n" })
or die "Cannot use CSV: " . Text::CSV->error_diag();
Related
I'm trying to read a gz file line by line in Perl6, however, I'm getting blocked:
How to read gz file line by line in Perl6 however, this method, reading everything into :out uses far too much RAM to be usable except on very small files.
I don't understand how to use Perl6's Compress::Zlib to get everything line by line, although I opened an issue on their github https://github.com/retupmoca/P6-Compress-Zlib/issues/17
I'm trying Perl5's Compress::Zlib to translate this code, which works perfectly in Perl5:
use Compress::Zlib;
my $file = "data.txt.gz";
my $gz = gzopen($file, "rb") or die "Error reading $file: $gzerrno";
while ($gz->gzreadline($_) > 0) {
# Process the line read in $_
}
die "Error reading $file: $gzerrno" if $gzerrno != Z_STREAM_END ;
$gz->gzclose() ;
to something like this using Inline::Perl5 in Perl6:
use Compress::Zlib:from<Perl5>;
my $file = 'chrMT.1.vcf.gz';
my $gz = Compress::Zlib::new(gzopen($file, 'r');
while ($gz.gzreadline($_) > 0) {
print $_;
}
$gz.gzclose();
but I can't see how to translate this :(
I'm confused by Lib::Archive example https://github.com/frithnanth/perl6-Archive-Libarchive/blob/master/examples/readfile.p6 I don't see how I can get something like item 3 here
There should be something like
for $file.IO.lines(gz) -> $line { or something like that in Perl6, if it exists, I can't find it.
How can I read a large file line by line without reading everything into RAM in Perl6?
Update Now tested, which revealed an error, now fixed.
Solution #2
use Compress::Zlib;
my $file = "data.txt.gz" ;
my $handle = try open $file or die "Error reading $file: $!" ;
my $zwrap = zwrap($handle, :gzip) ;
for $zwrap.lines {
.print
}
CATCH { default { die "Error reading $file: $_" } }
$handle.close ;
I've tested this with a small gzipped text file.
I don't know much about gzip etc. but figured this out based on:
Knowing P6;
Reading Compress::Zlib's README and choosing the zwrap routine;
Looking at the module's source code, in particular the signature of the zwrap routine our sub zwrap ($thing, :$zlib, :$deflate, :$gzip);
And trial and error, mainly to guess that I needed to pass the :gzip adverb.
Please comment on whether my code works for you. I'm guessing the main thing is whether it's fast enough for the large files you have.
A failed attempt at solution #5
With solution #2 working I would have expected to be able to write just:
use Compress::Zlib ;
.print for "data.txt.gz".&zwrap(:gzip).lines ;
But that fails with:
No such method 'eof' for invocant of type 'IO::Path'
This is presumably because this module was written before the reorganization of the IO classes.
That led me to #MattOates' IO::Handle like object with .lines ? issue. I note no response and I saw no related repo at https://github.com/MattOates?tab=repositories.
I am focusing on the Inline::Perl5 solution that you tried.
For the call to $gz.gzreadline($_): it seems like gzreadline tries to return the line read from the zip file by modifying its input argument $_ (treated as an output argument, but it is not a true Perl 5 reference variable[1]), but the modified value is not returned to the Perl 6 script.
Here is a possoble workaround:
Create a wrapper module in the curent directory, e.g. ./MyZlibWrapper.pm:
package MyZlibWrapper;
use strict;
use warnings;
use Compress::Zlib ();
use Exporter qw(import);
our #EXPORT = qw(gzopen);
our $VERSION = 0.01;
sub gzopen {
my ( $fn, $mode ) = #_;
my $gz = Compress::Zlib::gzopen( $fn, $mode );
my $self = {gz => $gz};
return bless $self, __PACKAGE__;
}
sub gzreadline {
my ( $self ) = #_;
my $line = "";
my $res = $self->{gz}->gzreadline($line);
return [$res, $line];
}
sub gzclose {
my ( $self ) = #_;
$self->{gz}->gzclose();
}
1;
Then use Inline::Perl5 on this wrapper module instead of Compress::Zlib. For example ./p.p6:
use v6;
use lib:from<Perl5> '.';
use MyZlibWrapper:from<Perl5>;
my $file = 'data.txt.gz';
my $mode = 'rb';
my $gz = gzopen($file, $mode);
loop {
my ($res, $line) = $gz.gzreadline();
last if $res == 0;
print $line;
}
$gz.gzclose();
[1]
In Perl 5 you can modify an input argument that is not a reference, and the change will be reflected in the caller. This is done by modifying entries in the special #_ array variable. For example: sub quote { $_[0] = "'$_[0]'" } $str = "Hello"; quote($str) will quote $str even if $str is not passed by reference.
I have coded a perl script which connects to Oracle database and updates the data. The execution of perl script and connections to Database works fine.
But, I want to export the logs of the updated data into a Temporary Text file. Please suggest a solution. Below is my code,
use strict;
use DBD::Oracle;
use DBI;
my $driver = "Oracle";
my $database = "host=xxxxx;port=xxx;sid=xxxx";
my $dsn = "DBI:$driver:$database";
my $userid = "xxxxxx";
my $password = "xxxxx";
# Database Connection
my $dbh = DBI->connect($dsn, $userid, $password,{RaiseError => 1}) or die "Can't connect to the Database: $DBI::errstr";
my $sth = $dbh->prepare("UPDATE XXXX SET ABCD=1233 WHERE LOGIN BETWEEN SYSDATE-24*30 and SYSDATE-12*30") or die "$DBI::errstr";
$sth->execute() or die "couldn't execute statementn$!";
$sth->rows;
# End of Program
$sth->finish();
$dbh->disconnect();
I would propose to replace
$sth->execute() or die "couldn't execute statementn$!";
by
unless ( $sth->execute() ) {
open LOGFILE, ">>", /tmp/logs/perl.log or die "Could not open logfile\n";
print LOGFILE $ora->errstr;
close LOGFILE;
die "couldn't execute statementn$!";
}
I am trying to load 160gb csv file to sql and I am using powershell script I got from Github and I get this error
IException calling "Add" with "1" argument(s): "Input array is longer than the number of columns in this table."
At C:\b.ps1:54 char:26
+ [void]$datatable.Rows.Add <<<< ($line.Split($delimiter))
+ CategoryInfo : NotSpecified: (:) [], MethodInvocationException
+ FullyQualifiedErrorId : DotNetMethodException
So I checked the same code with small 3 line csv and all of the columns match and also have header in first row and there are no extra delimiters not sure why I am getting this error.
The code is below
<# 8-faster-runspaces.ps1 #>
# Set CSV attributes
$csv = "M:\d\s.txt"
$delimiter = "`t"
# Set connstring
$connstring = "Data Source=.;Integrated Security=true;Initial Catalog=PresentationOptimized;PACKET SIZE=32767;"
# Set batchsize to 2000
$batchsize = 2000
# Create the datatable
$datatable = New-Object System.Data.DataTable
# Add generic columns
$columns = (Get-Content $csv -First 1).Split($delimiter)
foreach ($column in $columns) {
[void]$datatable.Columns.Add()
}
# Setup runspace pool and the scriptblock that runs inside each runspace
$pool = [RunspaceFactory]::CreateRunspacePool(1,5)
$pool.ApartmentState = "MTA"
$pool.Open()
$runspaces = #()
# Setup scriptblock. This is the workhorse. Think of it as a function.
$scriptblock = {
Param (
[string]$connstring,
[object]$dtbatch,
[int]$batchsize
)
$bulkcopy = New-Object Data.SqlClient.SqlBulkCopy($connstring,"TableLock")
$bulkcopy.DestinationTableName = "abc"
$bulkcopy.BatchSize = $batchsize
$bulkcopy.WriteToServer($dtbatch)
$bulkcopy.Close()
$dtbatch.Clear()
$bulkcopy.Dispose()
$dtbatch.Dispose()
}
# Start timer
$time = [System.Diagnostics.Stopwatch]::StartNew()
# Open the text file from disk and process.
$reader = New-Object System.IO.StreamReader($csv)
Write-Output "Starting insert.."
while ((($line = $reader.ReadLine()) -ne $null))
{
[void]$datatable.Rows.Add($line.Split($delimiter))
if ($datatable.rows.count % $batchsize -eq 0)
{
$runspace = [PowerShell]::Create()
[void]$runspace.AddScript($scriptblock)
[void]$runspace.AddArgument($connstring)
[void]$runspace.AddArgument($datatable) # <-- Send datatable
[void]$runspace.AddArgument($batchsize)
$runspace.RunspacePool = $pool
$runspaces += [PSCustomObject]#{ Pipe = $runspace; Status = $runspace.BeginInvoke() }
# Overwrite object with a shell of itself
$datatable = $datatable.Clone() # <-- Create new datatable object
}
}
# Close the file
$reader.Close()
# Wait for runspaces to complete
while ($runspaces.Status.IsCompleted -notcontains $true) {}
# End timer
$secs = $time.Elapsed.TotalSeconds
# Cleanup runspaces
foreach ($runspace in $runspaces ) {
[void]$runspace.Pipe.EndInvoke($runspace.Status) # EndInvoke method retrieves the results of the asynchronous call
$runspace.Pipe.Dispose()
}
# Cleanup runspace pool
$pool.Close()
$pool.Dispose()
# Cleanup SQL Connections
[System.Data.SqlClient.SqlConnection]::ClearAllPools()
# Done! Format output then display
$totalrows = 1000000
$rs = "{0:N0}" -f [int]($totalrows / $secs)
$rm = "{0:N0}" -f [int]($totalrows / $secs * 60)
$mill = "{0:N0}" -f $totalrows
Write-Output "$mill rows imported in $([math]::round($secs,2)) seconds ($rs rows/sec and $rm rows/min)"
Working with a 160 GB input file is going to be a pain. You can't really load it into any kind of editor - or at least you don't really analyze such a data mass without some serious automation.
As per the comments, it seems that the data has some quality issues. In order to find the offending data, you could try binary searching. This approach shrinks the data fast. Like so,
1) Split the file in about two equal chunks.
2) Try and load first chunk.
3) If successful, process the second chunk. If not, see 6).
4) Try and load second chunk.
5) If successful, the files are valid, but you got another a data quality issue. Start looking into other causes. If not, see 6).
6) If either load failed, start from the beginning and use the failed file as the input file.
7) Repeat until you narrow down the offending row(s).
Another a method would be using an ETL tool like SSIS. Configure the package to redirect invalid rows into an error log to see what data is not working properly.
I have this perl script which takes the data from sqlplus database... this database adds a new entry every time when there is a change in the value of state for a particular serial number. Now we need to pick the entries at every state change and prepare a csv file with old state, new state and other fields. db table sample.
SERIALNUMBER STATE AT OPERATORID SUBSCRIBERID TRANSACTIONID
51223344558899 Available 20081008T10:15:47 vsuser
51223344558857 Available 20081008T10:15:49 vsowner
51223344558899 Used 20081008T10:20:25 vsuser
51223344558860 Stolen 20081008T10:15:49 vsanyone
51223344558857 Damaged 20081008T10:50:49 vsowner
51223344558899 Damaged 20081008T10:50:25 vsuser
51343253335355 Available 20081008T11:15:47 vsindian
my script:
#! /usr/bin/perl
#use warnings;
use strict;
#my $circle =
#my $schema =
my $basePath = "/scripts/Voucher-State-Change";
#my ($sec, $min, $hr, $day, $month, $years) = localtime(time);
#$years_+=1900;$mont_+=1;
#my $timestamp=sprintf("%d%02d%02d",$years,$mont,$moday);
sub getDate {
my $daysago=shift;
$daysago=0 unless ($daysago);
#my #months=qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time-(86400*$daysago));
# YYYYMMDD, e.g. 20060126
return sprintf("%d%02d%02d",$year+1900,$mon+1,$mday);
}
my $filedate=getDate(1);
#my $startdate="${filedate}T__:__:__";
my $startdate="20081008T__:__:__";
print "$startdate\n";
##### Generating output file---
my $outputFile = "${basePath}/VoucherStateChangeReport.$filedate.csv";
open (WFH, ">", "$outputFile") or die "Can't open output file $outputFile for writing: $!\n";
print WFH "VoucherSerialNumber,Date,Time,OldState,NewState,UserId\n";
##### Generating log file---
my $logfile = "${basePath}/VoucherStateChange.$filedate.log";
open (STDOUT, ">>", "$logfile") or die "Can't open logfile $logfile for writing: $!\n";
open (STDERR, ">>", "$logfile") or die "Can't open logfile $logfile for writing: $!\n";
print "$logfile\n";
##### Now login to sqlplus-----
my $SQLPLUS='/opt/oracle/product/11g/db_1/bin/sqlplus -S system/coolman7#vsdb';
`$SQLPLUS \#${basePath}/VoucherQuery1.sql $startdate> ${basePath}/QueryResult1.txt`;
open (FH1, "${basePath}/QueryResult1.txt");
while (my $serial = <FH1>) {
chomp ($serial);
my $count = `$SQLPLUS \#${basePath}/VoucherQuery2.sql $serial $startdate`;
chomp ($count);
$count =~ s/\s+//g;
#print "$count\n";
next if $count == 1;
`$SQLPLUS \#${basePath}/VoucherQuery3.sql $serial $startdate> ${basePath}/QueryResult3.txt`;
# print "select * from sample where SERIALNUMBER = $serial----\n";
open (FH3, "${basePath}/QueryResult3.txt");
my ($serial_number, $state, $at, $operator_id);
my $count1 = 0;
my $old_state;
while (my $data = <FH3>) {
chomp ($data);
#print $data."\n";
my #data = split (/\s+/, $data);
my ($serial_number, $state, $at, $operator_id) = #data[0..3];
#my $serial_number = $data[0];
#my $state = $data[1];
#my $at = $data[2];
#my $operator_id = $data[3];
$count1++;
if ($count1 == 1) {
$old_state = $data[1];
next;
}
my ($date, $time) = split (/T/, $at);
$date =~ s/(\d{4})(\d{2})(\d{2})/$1-$2-$3/;
print WFH "$serial_number,$date,$time,$old_state,$state,$operator_id\n";
$old_state = $data[1];
}
}
close(WFH);
query in VoucherQuery1.sql:
select distinct SERIALNUMBER from sample where AT like '&1';
query in VoucherQuery2.sql:
select count(*) from sample where SERIALNUMBER = '&1' and AT like '&2';
query in VoucherQuery2.sql:
select * from sample where SERIALNUMBER = '&1' and AT like '&2';
and my sample output:
VoucherSerialNumber,Date,Time,OldState,NewState,UserId
51223344558857,2008-10-08,10:50:49,Available,Damaged,vsowner
51223344558899,2008-10-08,10:20:25,Available,Used,vsuser
51223344558899,2008-10-08,10:50:25,Used,Damaged,vsuser
Script is working pretty fine. But problem is that actual db table has millions of records for a specific day... and therefore it is raising performance issues... could you please advise how can we improve the efficiency of this script in terms of time & load. Only restriction is that I can't use DBI module for this...
Also in case of any error in the sql queries, error msg is coming to QueryResult?.txt files. I want to handle and receive these errors in my log file. how this can be accomplished? thanks
I think you need to tune your query. A good starting point is to use the EXPLAIN PLAN, if it is an Oracle database.
here i have 3 variable in perl and i m passing those variable in one db.sql file in sql query line then executing that db.sql file in same perl file but it's not fetching those variable.
$patch_name="CS2.001_PROD_TEST_37987_spyy";
$svn_url="$/S/B";
$ftp_loc="/Releases/pkumar";
open(SQL, "$sqlfile") or die("Can't open file $sqlFile for reading");
while ($sqlStatement = <SQL>)
{
$sth = $dbh->prepare($sqlStatement) or die qq("Can't prepare $sqlStatement");
$sth->execute() or die ("Can't execute $sqlStatement");
}
close SQL;
$dbh->disconnect;
following is the db.sql file query
insert into branch_build_info(patch_name,branch_name,ftp_path)
values('$patch_name','$svn_url','$ftp_loc/$patch_name.tar.gz');
pls help me how can i pass the variable so that i can run more the one insert query at a time.
If you can influence the sql-file, you could use placeholders
insert into ... values(?, ?, ?);
and then supply the parameters at execution:
my $ftp_path = "$ftp_loc/$patch_name.tar.gz";
$sth->execute( $patch_name, $svn_url, $ftp_path) or die ...
That way you would only need to prepare the statement once and can execute it as often as you want. But I'm not sure, where that sql-file comes from. It would be helpful, if you could clarify the complete workflow.
You can use the String::Interpolate module to interpolate a string that's already been generated, like the string you're pulling out of a file:
while ($sqlStatement = <SQL>)
{
my $interpolated = new String::Interpolate { patch_name => \$patch_name, svn_url => \$svn_url, ftp_loc => \$ftp_loc };
$interpolated->($sqlStatement);
$sth = $dbh->prepare($interpolated) or die qq("Can't prepare $interpolated");
$sth->execute() or die ("Can't execute $interpolated");
}
Alternately, you could manually perform replacements on the string to be interpolated:
while ($sqlStatement = <SQL>)
{
$sqlStatement =~ s/\$patch_name/$patch_name/g;
$sqlStatement =~ s/\$svn_url/$svn_url/g;
$sqlStatement =~ s/\$ftp_loc/$ftp_loc/g;
$sth = $dbh->prepare($sqlStatement) or die qq("Can't prepare $sqlStatement");
$sth->execute() or die ("Can't execute $sqlStatement");
}
there's a few problems there - looks like you need to read in all of the SQL file to make the query, but you're just reading it a line at a time and trying to run each line as a separate query. The <SQL> just takes the next line from the file.
Assuming there is just one query in the file, something like the following may help:
open( my $SQL_FH, '<', $sqlfile)
or die "Can't open file '$sqlFile' for reading: $!";
my #sql_statement = <$SQL_FH>;
close $SQL_FH;
my $sth = $dbh->prepare( join(' ',#sql_statement) )
or die "Can't prepare #sql_statement: $!";
$sth->execute();
However, you still need to expand the variables in your SQL file. If you can't replace them with placeholders as already suggested, then you'll need to either eval the file or parse it in some way...