Perl DBI - transfer data between two SQL servers - fetchall_arrayref

I have two servers, dbh1 and dbh2: I query dbh1 and pull data via the fetchall_arrayref method. Once I execute the query, I want to insert the output from dbh1 into a temp table on server dbh2.
I can connect to both servers at the same time and pull data from both.
1. I pull data from dbh1:
while ($row = shift(@$rowcache) || shift(@{ $rowcache = $sth1->fetchall_arrayref(undef, $max_rows) })) {
    # call to sub insert2tempData
    &insert2tempData(values @{$row});
}
2. Then on dbh2 I have an insert query:
INSERT INTO ##population (someid, Type, anotherid)
VALUES ('123123', 'blah', '634234');
Question:
How can I insert the bulk result of the fetchall_arrayref from dbh1 into the temp table on server dbh2 (without looping through individual records)?
OK - so I was able to resolve this issue and implemented the following code:
my $max_rows = 38;
my $rowcache = [];
my $sum = 0;
if ($fldnames eq "ALL") { $fldnames = join(',', @{ $sth1->{NAME} }); }
my $ins = $dbh2->prepare("insert into $database2.dbo.$tblname2 ($fldnames) values $fldvalues");
my $fetch_tuple_sub = sub { shift(@$rowcache) || shift(@{ $rowcache = $sth1->fetchall_arrayref(undef, $max_rows) }) };
my @tuple_status;
my $rc = $ins->execute_for_fetch($fetch_tuple_sub, \@tuple_status);
my @errors = grep { ref $_ } @tuple_status;
The transfer works, but it is still slower than transferring the data manually through the SQL Server export/import wizard. What I notice is that the data flows row by row into the destination, and I was wondering whether it is possible to increase the bulk transfer size. The download alone is extremely fast, but when I combine download and upload the speed decreases dramatically, and it takes up to 10 minutes to transfer a 5000-row table between servers.

It would be better if you said what your goal was (speed?) rather than asking a specific question about avoiding looping.
For a Perl/DBI way:
Look at DBI's execute_array and execute_for_fetch; however, as you've not told us which DBD you are using, it is impossible to say more. Not all DBDs support bulk insert, and when they don't, DBI emulates it. DBD::Oracle does, and DBD::ODBC does (in recent versions; see odbc_array_operations), but in the latter it is off by default.
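For illustration, a minimal sketch of the execute_array route with DBD::ODBC, assuming the temp table from the question and one data array per column (the DSN and the connect-time odbc_array_operations attribute are assumptions to check against your DBD::ODBC version):

use DBI;

my $dbh2 = DBI->connect('dbi:ODBC:DSN=server2', $user, $pass,
    { RaiseError => 1, odbc_array_operations => 1 });   # off by default in DBD::ODBC

my $ins = $dbh2->prepare('INSERT INTO ##population (someid, Type, anotherid) VALUES (?, ?, ?)');

# column-wise binding: one array per column, holding the values for the whole batch
$ins->bind_param_array(1, \@someids);
$ins->bind_param_array(2, \@types);
$ins->bind_param_array(3, \@anotherids);

my @tuple_status;
my $tuples = $ins->execute_array({ ArrayTupleStatus => \@tuple_status });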

You didn't mention which version of SQL Server you are using. First, I would look into that version's "BULK INSERT" support.
You also didn't mention how many rows are involved. I'll assume they fit into memory; otherwise a bulk insert won't work.
From there it's up to you to translate the output of fetchall_arrayref into the syntax needed for the "BULK INSERT" operation.
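For example, something along these lines (a rough sketch only: the share path, the delimiters, and the requirement that the file be readable from the SQL Server machine are all assumptions):

# dump the fetched rows to a tab-delimited file on a share the server can read
my $rows = $sth1->fetchall_arrayref;
open my $fh, '>', '//fileserver/share/population.tsv' or die "open: $!";
print {$fh} join("\t", @$_), "\n" for @$rows;
close $fh;

# then load the whole file with a single statement on the target server
$dbh2->do(q{
    BULK INSERT ##population
    FROM '\\fileserver\share\population.tsv'
    WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n')
});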

Related

Is there a way to execute a text Gremlin query with PartitionStrategy

I'm looking for an implementation to run a text query, e.g. "g.V().limit(1).toList()", while using the PartitionStrategy in Apache TinkerPop.
I'm attempting to build a REST interface to run queries on selected graph partitions only. I know how to run a raw query using Client, but I'm looking for an implementation where I can create a multi-tenant graph (https://tinkerpop.apache.org/docs/current/reference/#partitionstrategy) and query only selected tenants using a raw text query instead of a GLV. I'm able to query only selected partitions using gremlin-python, but there is no reference implementation I could find to run a text query on a tenant.
Here is the tenant query implementation:
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.strategies import PartitionStrategy

connection = DriverRemoteConnection('ws://megamind-ws:8182/gremlin', 'g')
g = traversal().withRemote(connection)
partition = PartitionStrategy(partition_key="partition_key",
                              write_partition="tenant_a",
                              read_partitions=["tenant_a"])
partitioned_g = g.withStrategies(partition)
x = partitioned_g.V().limit(1).next()  # query on partition only
Here is how I execute a raw query on the entire graph, but I'm looking for an implementation to run text-based queries on only selected partitions.
from gremlin_python.driver import client

client = client.Client('ws://megamind-ws:8182/gremlin', 'g')
results = client.submitAsync("g.V().limit(1).toList()").result().one()  # runs on the entire graph
print(results)
client.close()
Any suggestions appreciated. TIA
It depends on how the backend store handles text-mode queries, but for the query itself you essentially just need to use the Groovy/Java-style formulation. This will work with Gremlin Server and Amazon Neptune. For other backends you will need to make sure this syntax is supported. From Python you would use something like:
client.submit("""
    g.withStrategies(new PartitionStrategy(partitionKey: "_partition",
                                           writePartition: "b",
                                           readPartitions: ["b"])).V().count()""")
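Put together with the client from the question, reading the result back might look like this (a sketch; submit returns a ResultSet, and all().result() blocks until the values arrive):

from gremlin_python.driver import client

c = client.Client('ws://megamind-ws:8182/gremlin', 'g')
result = c.submit("""
    g.withStrategies(new PartitionStrategy(partitionKey: "_partition",
                                           writePartition: "b",
                                           readPartitions: ["b"])).V().count()""").all().result()
print(result)
c.close()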

Can't figure out how to insert keys and values of nested JSON data into SQL rows with NiFi

I'm working on a personal project and very new (learning as I go) to JSON, NiFi, SQL, etc., so forgive any confusing language used here or a potentially really obvious solution. I can clarify as needed.
I need to take the JSON output from a website's API call and insert it into a table in my local MariaDB server. The issue is that the JSON data is nested, and two of the key pieces of data I need to insert are used as variable keys rather than values, so I don't know how to extract them and put them in the database table. Essentially, I think I need to identify different pieces of the JSON expression and insert them as values, but I'm clueless how to do so.
I've played around with the EvaluateJsonPath, SplitJson, and FlattenJson processors in particular, but I can't make it work. All I can ever do is get the result of the whole expression rather than each piece of it.
{"5381":{"wind_speed":4.0,"tm_st_snp":26.0,"tm_off_snp":74.0,"tm_def_snp":63.0,"temperature":58.0,"st_snp":8.0,"punts":4.0,"punt_yds":178.0,"punt_lng":55.0,"punt_in_20":1.0,"punt_avg":44.5,"humidity":47.0,"gp":1.0,"gms_active":1.0},
"1023":{"wind_speed":4.0,"tm_st_snp":26.0,"tm_off_snp":82.0,"tm_def_snp":56.0,"temperature":74.0,"off_snp":82.0,"humidity":66.0,"gs":1.0,"gp":1.0,"gms_active":1.0},
"5300":{"wind_speed":17.0,"tm_st_snp":27.0,"tm_off_snp":80.0,"tm_def_snp":64.0,"temperature":64.0,"st_snp":21.0,"pts_std":9.0,"pts_ppr":9.0,"pts_half_ppr":9.0,"idp_tkl_solo":4.0,"idp_tkl_loss":1.0,"idp_tkl":4.0,"idp_sack":1.0,"idp_qb_hit":2.0,"humidity":100.0,"gp":1.0,"gms_active":1.0,"def_snp":23.0},
"608":{"wind_speed":6.0,"tm_st_snp":20.0,"tm_off_snp":53.0,"tm_def_snp":79.0,"temperature":88.0,"st_snp":4.0,"pts_std":5.5,"pts_ppr":5.5,"pts_half_ppr":5.5,"idp_tkl_solo":4.0,"idp_tkl_loss":1.0,"idp_tkl_ast":1.0,"idp_tkl":5.0,"humidity":78.0,"gs":1.0,"gp":1.0,"gms_active":1.0,"def_snp":56.0},
"3396":{"wind_speed":6.0,"tm_st_snp":20.0,"tm_off_snp":60.0,"tm_def_snp":70.0,"temperature":63.0,"st_snp":19.0,"off_snp":13.0,"humidity":100.0,"gp":1.0,"gms_active":1.0}}
This is a snapshot of an output with a couple thousand lines. Each of the numeric keys that you see above (5381, 1023, 5300, etc) are player IDs for the following stats. I have a table set up with three columns: Player ID, Stat ID, and Stat Value. For example, I need that first snippet to be inserted into my table as such:
Player ID   Stat ID      Stat Value
5381        wind_speed   4.0
5381        tm_st_snp    26.0
5381        tm_off_snp   74.0
And so on, for each piece of data. But I don't know how to have NiFi select the right pieces of data to insert in the right columns.
I believe it's possible to use Jolt to transform your JSON into this format:
[
  {"playerId": "5381", "statId": "wind_speed", "statValue": 0.123},
  {"playerId": "5381", "statId": "tm_st_snp", "statValue": 0.456},
  ...
]
then use PutDatabaseRecord with a JSON record reader.
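A JoltTransformJSON shift spec along these lines should produce that shape (an untested sketch: "$1" captures the player-ID key one level up, "$" the stat-ID key, and "@" the stat value):

[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "*": {
          "$1": "[#2].playerId",
          "$": "[#2].statId",
          "@": "[#2].statValue"
        }
      }
    }
  }
]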
Another approach is to use the ExecuteGroovyScript processor:
Add a new parameter to it with the name SQL.mydb and link it to your DBCP controller service.
Then use the following script as the Script Body parameter:
import groovy.json.JsonSlurper
import groovy.json.JsonBuilder

def ff = session.get()
if (!ff) return

// read flow file content and parse it
def body = ff.read().withReader("UTF-8") { reader ->
    new JsonSlurper().parse(reader)
}

def results = []

// use the defined sql connection to create a batch
SQL.mydb.withTransaction {
    def cmd = 'insert into mytable(playerId, statId, statValue) values(?,?,?)'
    results = SQL.mydb.withBatch(100, cmd) { statement ->
        // run through all keys/subkeys in the flow file body
        body.each { pid, keys ->
            keys.each { k, v ->
                statement.addBatch(pid, k, v)
            }
        }
    }
}

// write results as the new flow file content
ff.write("UTF-8") { writer ->
    new JsonBuilder(results).writeTo(writer)
}

// transfer to success
REL_SUCCESS << ff

Groovy SQL Multiple ResultSets

I am calling a stored procedure from my Groovy code. The stored proc looks like this
SELECT * FROM blahblahblah
SELECT * FROM suchAndsuch
So basically, two SELECT statements and therefore two ResultSets.
sql.eachRow("dbo.testing 'param1'") { rs ->
    println rs
}
This works fine for a single ResultSet. How can I get the second one (or an arbitrary number of ResultSets for that matter).
You would need callWithAllRows() or one of its variants. Its return type is List<List<GroovyRowResult>>; per the documentation, use it "when calling a stored procedure that utilizes both output parameters and returns multiple ResultSets".
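With the procedure from the question, that might look like this (a sketch; the JDBC call syntax and the empty out-parameter closure are assumptions):

// returns List<List<GroovyRowResult>>, one inner list per ResultSet
def allResultSets = sql.callWithAllRows("{call dbo.testing(?)}", ['param1']) {
    // the closure receives any OUT parameters; this procedure has none
}
allResultSets.eachWithIndex { rows, i ->
    println "Result set ${i + 1}:"
    rows.each { println it }
}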
This question is kind of old, but I will answer since I came across the same requirement recently, and it may be useful for future reference for me and others.
I'm working on a Spring application with SphinxSearch. When you run a query in Sphinx, you get results, and then you need to run a second query to get the metadata (number of records, etc.).
import groovy.sql.Sql
import groovy.sql.GroovyResultSetExtension
import java.sql.PreparedStatement

// the query (note allowMultiQueries=true in the JDBC URL below)
String query = """
SELECT * FROM INDEX_NAME WHERE MATCH('SEARCHTERM')
LIMIT 0,25 OPTION MAX_MATCHES=25;
SHOW META LIKE 'total_found';
"""

// create an instance of groovy Sql (sphinx doesn't use a username or password, the jdbc url is all we need)
// the connection can be created from java, you don't have to use groovy for it
Sql sql = Sql.newInstance('jdbc:mysql://127.0.0.1:9306/?characterEncoding=utf8&maxAllowedPacket=512000&allowMultiQueries=true', 'sphinx', 'sphinx123', 'com.mysql.jdbc.Driver')

// create a prepared statement so we can execute multiple result sets
PreparedStatement ps = sql.getConnection().prepareStatement(query)

// execute the prepared statement
ps.execute()

// get the first result set and pass it to GroovyResultSetExtension
GroovyResultSetExtension rs1 = new GroovyResultSetExtension(ps.getResultSet())
rs1.eachRow {
    println it
}

// call getMoreResults on the prepared statement to activate the 2nd set of results
ps.getMoreResults()

// get the second result set and pass it to GroovyResultSetExtension
GroovyResultSetExtension rs2 = new GroovyResultSetExtension(ps.getResultSet())
rs2.eachRow {
    println it
}
This is just some test code and needs some improvement; you can loop over the result sets and do whatever processing you need.
The comments should be self-explanatory. Hope it helps others in the future!

TCL List or variables in SQL

I am running into an issue trying to use a list with orasql. Is there another way to do it? I know I could loop through with foreach and set the value to a string variable, but I think it would take a lot longer with all of those little DB pulls. I could also run one query, but I am no DBA and I can't seem to get the query time down; it's a lot more complex than what I am laying out here, and I need users to pull it as quickly as possible.
How can I do something like this:
orasql $DB(db) "select this from this_table"
orafetch $DB(db) {
    lappend list1 @1
}
orasql $DB(db) "select that from that_table where this in ($list1)"
orafetch $DB(db) {
    lappend that @1
}
Is this what you are trying to do?
select that
from that_table
where this in (select this from this_table);
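In Oratcl terms, that collapses the two round trips into a single query (a sketch, reusing the handle and column substitution from the question):

orasql $DB(db) "select that from that_table where this in (select this from this_table)"
orafetch $DB(db) {
    lappend that @1
}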

Perl DBI modifying Oracle database by creating a VIEW

I wrote a Perl script to check the data in an Oracle database. Because the query process is very complex, I chose to create a VIEW in the middle; using this view, the code can be largely simplified.
The Perl code runs well when I use it to query the database starting from a file, like perl mycode.pl file_a. The Perl code reads lines from file_a and creates/updates the view until the end of the input. The results I achieved are completely right.
The problem came when I simultaneously ran
perl mycode.pl file_a
and
perl mycode.pl file_b
to access the same database. According to my observation, the VIEW used by the first process is modified by the second process; the two processes become intertwined on the same view.
Is there any suggestion to make these two processes not conflict with each other?
The Perl code for querying database is normally like this, but the details in each real query is more complex.
my ($gcsta, $gcsto, $cms) = @t;   # (details of @t are read from a line in file a or b)
my $VIEWSS = 'CREATE OR REPLACE VIEW VIEWSS AS SELECT ID,GSTA,GSTO,GWTA FROM TABLEA WHERE GSTA='.$gcsta.' AND GSTO='.$gcsto.' AND CMS='.$cms;
my $querying = q{ SELECT COUNT(*) FROM VIEWSS WHERE VIEWSS.ID=1 };
my $inner_sth = $dbh->prepare($VIEWSS);
my $inner_rv = $inner_sth->execute();
$inner_sth = $dbh->prepare($querying);
$inner_rv = $inner_sth->execute();
You must:
1. Create the view only once, and use it everywhere
2. Use placeholders in your SQL statements, and pass the actual parameters with the call to execute
Is this the full extent of your SQL? Probably not, but if so it really is fairly simple.
Take a look at this refactoring for some ideas. Note that it uses a here-document to express the SQL; the END_SQL marker for the end of the text must have no whitespace before or after it.
If your requirement is more complex than this then please describe it to us so that we can better help you.
my $stmt = $dbh->prepare(<<'END_SQL');
SELECT count(*)
FROM tablea
WHERE gsta = ? AND gsto = ? AND cms = ? AND id = 1
END_SQL

my $rv = $stmt->execute($gcsta, $gcsto, $cms);
If you must use a view then you should use placeholders in the CREATE VIEW as before, and make every set of changes into a transaction so that other processes can't interfere. This involves disabling AutoCommit when you create the database handle $dbh and adding a call to $dbh->commit when all the steps are complete
use strict;
use warnings;

use DBI;

my $dbh = DBI->connect('dbi:Oracle:mydbase', 'user', 'pass',
    { AutoCommit => 0, RaiseError => 1 });

my $make_view = $dbh->prepare(<<'END_SQL');
CREATE OR REPLACE VIEW viewss AS
SELECT id, gsta, gsto, gwta
FROM tablea
WHERE gsta = ? AND gsto = ? AND cms = ? AND id = 1
END_SQL

my $get_count = $dbh->prepare(<<'END_SQL');
SELECT count(*)
FROM viewss
WHERE id = 1
END_SQL

while (<>) {
    my ($gcsta, $gcsto, $cms) = split;
    my $rv = $make_view->execute($gcsta, $gcsto, $cms);
    $rv = $get_count->execute;
    my ($count) = $get_count->fetchrow_array;
    $dbh->commit;
}
Is the view going to be the same or different?
If the views are all the same then create it only once, or check whether it already exists with the ALL_VIEWS table: http://docs.oracle.com/cd/B12037_01/server.101/b10755/statviews_1202.htm#i1593583
You can easily create a view whose name includes your pid, using the $$ variable for the pid, but it won't be unique across computers; Oracle also has some unique IDs (see http://docs.oracle.com/cd/B14117_01/server.101/b10759/functions150.htm, for example the SESSIONID).
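For example, a sketch of the pid-suffixed view idea (interpolating the name into the DDL is safe here because $$ is purely numeric):

my $view = "VIEWSS_$$";   # e.g. VIEWSS_12345, unique per process on one machine
$dbh->do("CREATE OR REPLACE VIEW $view AS SELECT ID, GSTA, GSTO, GWTA FROM TABLEA");
my $sth = $dbh->prepare("SELECT COUNT(*) FROM $view WHERE ID = 1");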
But do you really need to do this? Why don't you prepare a statement and then execute it? http://search.cpan.org/dist/DBI/DBI.pm#prepare
thanks,
mike