I'm using tiny_tds to pull data from a few different databases. As of right now, I have a Ruby file with multiple methods, each one devoted to a specific query (since the databases are very large, and not all of the scripts I'm using require the same kind/amount of data). To make things cleaner and easier, I wanted to separate out the SQL queries themselves into a single file, rather than have them embedded into the Ruby file containing the functions. But the SQL queries depend on certain fields having specific values. In essence, what I'm trying to do is send a variable to the SQL query, get data based on that particular value in a field, and feed that data back into the Ruby file.
So a simplified version of what I'm currently doing is this:
def initialize
  @client = TinyTds::Client.new(:username => '', :password => '', :host => '', timeout: 0)
end

def query_example(value)
  results = @client.execute("SELECT DISTINCT field1, field2, field3
    FROM db
    WHERE field1 = '#{value}'
  ")
  results.each { |x| return x }
end
The script calls the query_example(value) function, and based on the value variable, gets the relevant data for that case.
But what I would like is to essentially have one file that has nothing but the raw SQL queries, like:
SELECT DISTINCT field1, field2, field3
FROM db
WHERE field1 = '#{value}'
where the #{value} is populated by an external value fed to it (although I'm not sure how that kind of wildcard would be declared here). Let's say I save this file as "query.sql", then I would just want to read that file into the Ruby function like:
def query_example(value)
  query = File.read("query.sql")
  results = @client.execute(query)
end
The problem is I don't actually know how to pass that value argument into the SQL file itself, so that the data that is pulled with the execute command is the specific data for that value. Is this possible with tiny_tds, or is tiny_tds simply not made for this kind of two-way interaction between external SQL queries, and the Ruby functions that call them? I'm open to considering other SQL libraries, I'm just very unfamiliar with the options as I deal mostly with the Ruby side of things.
You can use the format method to replace placeholders with actual values. Here is the simplest example:
template = "Hello, %{name}!"
format(template, name: "World")
# => "Hello, World!"
And your code may look like this:
# query.sql
SELECT DISTINCT field1, field2, field3
FROM db
WHERE field1 = '%{value}'
# ruby file
def query_example(value)
  query = File.read("query.sql")
  results = @client.execute(format(query, value: value))
end
I'm a fan of Yesql-style templates that consist of raw SQL with placeholders for parameters.
Here is an example from the docs:
SELECT *
FROM users
WHERE country_code = ?
This snippet is stored in a file and pulled into the application like so:
(defquery users-by-country "some/where/users_by_country.sql")
(users-by-country db-spec "GB")
Is there any gem with the same functionality in Ruby? Or is there a way to at least load raw SQL from a file and execute it, storing the result in an array or json?
Absolutely, and this (a separate SQL file) is a good technique for longer queries. Any of the database adapter gems can do this. In pg, which I'm most familiar with, you would:
Read your query from a SQL file, e.g. File.read
Open a connection, e.g. PG::Connection.open
Call exec_params
Example from pg documentation:
require 'pg'
conn = PG::Connection.open(:dbname => 'test')
res = conn.exec_params('SELECT $1 AS a, $2 AS b, $3 AS c', [1, 2, nil])
# Equivalent to:
# res = conn.exec('SELECT 1 AS a, 2 AS b, NULL AS c')
http://deveiate.org/code/pg/PG/Connection.html
As you can see, the placeholders here are sequentially numbered, an improvement on simple question marks, I think.
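Putting those steps together, a minimal sketch might look like the following (the file path and its contents are assumptions, with the query rewritten to use pg's numbered placeholders):
require 'pg'

# Assumed file some/where/users_by_country.sql containing:
#   SELECT * FROM users WHERE country_code = $1
query = File.read('some/where/users_by_country.sql')

conn = PG::Connection.open(:dbname => 'test')
res  = conn.exec_params(query, ['GB'])

rows = res.to_a   # each row is a hash keyed by column name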
We have a weekly backup process which exports our production Google Appengine Datastore onto Google Cloud Storage, and then into Google BigQuery. Each week, we create a new dataset named like YYYY_MM_DD that contains a copy of the production tables on that day. Over time, we have collected many datasets, like 2014_05_10, 2014_05_17, etc. I want to create a data set Latest_Production_Data that contains a view for each of the tables in the most recent YYYY_MM_DD dataset. This will make it easier for downstream reports to write their query once and always retrieve the most recent data.
To do this, I have code that gets the most recent dataset and the names of all the tables that dataset contains from the BigQuery API. Then, for each of these tables, I fire a tables.insert call to create a view that is a SELECT * from the table I am looking to create a reference to.
This fails for tables that contain a RECORD field, due to what looks like a pretty benign column-naming rule.
For example, I have a table named AccountDeletionRequest, whose schema includes a RECORD field called __key__, for which I issue this API call:
{
    'tableReference': {
        'projectId': 'redacted',
        'tableId': u'AccountDeletionRequest',
        'datasetId': 'Latest_Production_Data'
    },
    'view': {
        'query': u'SELECT * FROM [2014_05_17.AccountDeletionRequest]'
    },
}
This results in the following error:
HttpError: https://www.googleapis.com/bigquery/v2/projects//datasets/Latest_Production_Data/tables?alt=json returned "Invalid field name "__key__.namespace". Fields must contain only letters, numbers, and underscores, start with a letter or underscore, and be at most 128 characters long.">
When I execute this query in the BigQuery web console, the columns are renamed to translate the . to an _. I kind of expected the same thing to happen when I issued the create view API call.
Is there an easy way I can programmatically create a view for each of the tables in my dataset, regardless of their underlying schema? The problem I'm encountering now is for record columns, but another problem I anticipate is for tables that have repeated fields. Is there some magic alternative to SELECT * that will take care of all these intricacies for me?
Another idea I had was doing a table copy, but I would prefer not to duplicate the data if I can at all avoid it.
Here is the workaround code I wrote to dynamically generate a SELECT statement for each of the tables:
def get_leaf_column_selectors(dataset, table):
    schema = table_service.get(
        projectId=BQ_PROJECT_ID,
        datasetId=dataset,
        tableId=table
    ).execute()['schema']

    return ",\n".join([
        _get_leaf_selectors("", top_field)
        for top_field in schema["fields"]
    ])

def _get_leaf_selectors(prefix, field):
    if prefix:
        format = prefix + ".%s"
    else:
        format = "%s"

    if 'fields' not in field:
        # Base case
        actual_name = format % field["name"]
        safe_name = actual_name.replace(".", "_")
        return "%s as %s" % (actual_name, safe_name)
    else:
        # Recursive case
        return ",\n".join([
            _get_leaf_selectors(format % field["name"], sub_field)
            for sub_field in field["fields"]
        ])
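For completeness, here is roughly how those selectors can be fed back into the view-creation call. This is only a sketch that reuses the tables.insert body from the question and assumes the same table_service and BQ_PROJECT_ID names as the workaround above:
selectors = get_leaf_column_selectors('2014_05_17', 'AccountDeletionRequest')
view_query = "SELECT\n%s\nFROM [2014_05_17.AccountDeletionRequest]" % selectors

table_service.insert(
    projectId=BQ_PROJECT_ID,
    datasetId='Latest_Production_Data',
    body={
        'tableReference': {
            'projectId': BQ_PROJECT_ID,
            'datasetId': 'Latest_Production_Data',
            'tableId': 'AccountDeletionRequest',
        },
        'view': {'query': view_query},
    }
).execute()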
We had a bug where you needed to select out the individual fields in the view and use an 'as' to rename the fields to something legal (i.e. without '.' in the name).
The bug is now fixed, so you shouldn't see this issue any more. Please ping this thread or start a new question if you see it again.
I wrote a Perl script to check the data in an Oracle database. Because the query process is very complex, I chose to create a VIEW as an intermediate step. Using this view, the code could be greatly simplified.
The Perl code ran well when I used it to query the database starting from a file, like perl mycode.pl file_a. The Perl code reads lines from file_a and creates/updates the view until the end of the input. The results I achieved were completely correct.
The problem came when I simultaneously ran
perl mycode.pl file_a
and
perl mycode.pl file_b
to access the same database. According to my observation, the VIEW used by the first process will be modified by the second process. These two processes were intertwined on the same view.
Is there any suggestion to make these two processes not conflict with one another?
The Perl code for querying the database normally looks like this, but the details of each real query are more complex.
my ($gcsta, $gcsto, $cms) = @t;  # (details of @t are read from a line in file a or b)

my $VIEWSS = 'CREATE OR REPLACE VIEW VIEWSS AS SELECT ID,GSTA,GSTO,GWTA FROM TABLEA WHERE GSTA='.$gcsta.' AND GSTO='.$gcsto.' AND CMS='.$cms;
my $querying = q{ SELECT COUNT(*) FROM VIEWSS WHERE VIEWSS.ID=1 };

my $inner_sth = $dbh->prepare($VIEWSS);
my $inner_rv  = $inner_sth->execute();

$inner_sth = $dbh->prepare($querying);
$inner_rv  = $inner_sth->execute();
You must
Create the view only once, and use it everywhere
Use placeholders in your SQL statements, and pass the actual parameters with the call to execute
Is this the full extent of your SQL? Probably not, but if so it really is fairly simple.
Take a look at this refactoring for some ideas. Note that it uses a here document to express the SQL. The END_SQL marker for the end of the text must have no whitespace before or after it.
If your requirement is more complex than this then please describe it to us so that we can better help you
my $stmt = $dbh->prepare(<<'END_SQL');
SELECT count(*)
FROM tablea
WHERE gsta = ? AND gsto = ? AND cms= ? AND id = 1
END_SQL
my $rv = $stmt->execute($gcsta, $gcsto, $cms);
If you must use a view then you should use placeholders in the CREATE VIEW as before, and make every set of changes into a transaction so that other processes can't interfere. This involves disabling AutoCommit when you create the database handle $dbh and adding a call to $dbh->commit when all the steps are complete
use strict;
use warnings;
use DBI;
my $dbh = DBI->connect('dbi:Oracle:mydbase', 'user', 'pass',
    { AutoCommit => 0, RaiseError => 1 });
my $make_view = $dbh->prepare(<<'END_SQL');
CREATE OR REPLACE VIEW viewss AS
SELECT id, gsta, gsto, gwta
FROM tablea
WHERE gsta = ? AND gsto = ? AND cms= ? AND id = 1
END_SQL
my $get_count = $dbh->prepare(<<'END_SQL');
SELECT count(*)
FROM viewss
WHERE id = 1
END_SQL
while (<>) {
    my ($gcsta, $gcsto, $cms) = split;

    my $rv = $make_view->execute($gcsta, $gcsto, $cms);

    $rv = $get_count->execute;
    my ($count) = $get_count->fetchrow_array;

    $dbh->commit;
}
Is the view going to be the same or different?
If the views are all the same then create it only once, or check whether it already exists in the ALL_VIEWS table: http://docs.oracle.com/cd/B12037_01/server.101/b10755/statviews_1202.htm#i1593583
You can easily create a view whose name includes your pid using the $$ variable, but it won't be unique across computers. Oracle also has some unique ids, see http://docs.oracle.com/cd/B14117_01/server.101/b10759/functions150.htm, for example SESSIONID.
But do you really need to do this? Why don't you just prepare a statement and then execute it? http://search.cpan.org/dist/DBI/DBI.pm#prepare
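If a view really is required, a rough sketch of the pid-based idea above might look like this (it keeps the question's style of interpolating the values into the DDL, and the per-process view would still need to be dropped when the process finishes):
# Sketch only: give each process its own view so concurrent runs don't
# overwrite one another's definition. $$ is the current process id.
my $view_name = "VIEWSS_$$";

my $create_view = "CREATE OR REPLACE VIEW $view_name AS
                   SELECT ID, GSTA, GSTO, GWTA FROM TABLEA
                   WHERE GSTA = $gcsta AND GSTO = $gcsto AND CMS = $cms";
$dbh->do($create_view);

my ($count) = $dbh->selectrow_array("SELECT COUNT(*) FROM $view_name WHERE ID = 1");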
I have this site with the following parameters:
http://www.example.com/pagination.php?page=4&order=comment_time&sc=desc
I use the values of each of the parameters as a value in a SQL query.
I am trying to test my application and ultimately hack my own application for learning purposes.
I'm trying to inject this statement:
http://www.example.com/pagination.php?page=4&order=comment_time&sc=desc' or 1=1 --
But it fails, and PHP gives this warning:
Warning: mysql_fetch_assoc() expects parameter 1 to be resource,
boolean given in /home/dir/public_html/pagination.php on line 132
Is my application completely free from SQL injection, or is it still possible?
EDIT: Is it possible for me to find a valid sql injection statement to input into one of the parameters of the URL?
An application secured from SQL injection never produces invalid queries, so obviously you still have some issues. A well-written application produces valid and expected output for any input.
That's completely vulnerable, and the fact that you can cause a syntax error proves it.
There is no function to escape column names or ORDER BY directions. Those functions do not exist because it is bad style to expose the DB logic directly in the URL: it makes the URLs dependent on changes to your database logic.
I'd suggest something like an array mapping the "order" parameter values to column names:
$order_cols = array(
    'time'    => 'comment_time',
    'popular' => 'comment_score',
    // ... and so on ...
);

if (!isset($order_cols[$_GET['order']])) {
    $_GET['order'] = 'time';
}

$order = $order_cols[$_GET['order']];
Restrict "sc" manually:
if ($_GET['sc'] == 'asc' || $_GET['sc'] == 'desc') {
    $order .= ' ' . $_GET['sc'];
} else {
    $order .= ' desc';
}
Then you're guaranteed safe to append that to the query, and the URL is not tied to the DB implementation.
I'm not 100% certain, but I'd say it still seems vulnerable to me -- the fact that it's accepting the single-quote (') as a delimiter and then generating an error off the subsequent injected code says to me that it's passing things it shouldn't on to MySQL.
Any data that could possibly be taken from somewhere other than your application itself should go through mysql_real_escape_string() first. This way the whole ' or 1=1 part gets passed as a value to MySQL... unless you're passing "sc" straight through for the sort order, such as
$sql = "SELECT * FROM foo WHERE page='{$_REQUEST['page']}' ORDER BY data {$_REQUEST['sc']}";
... which you also shouldn't be doing. Try something along these lines:
$page = mysql_real_escape_string($_REQUEST['page']);

if ($_REQUEST['sc'] == "desc")
    $sortorder = "DESC";
else
    $sortorder = "ASC";

$sql = "SELECT * FROM foo WHERE page='{$page}' ORDER BY data {$sortorder}";
I still couldn't say it's TOTALLY injection-proof, but it's definitely more robust.
I am assuming that your generated query does something like
select <some number of fields>
from <some table>
where sc=desc
order by comment_time
Now, if I were to attack the order by statement instead of the WHERE, I might be able to get some results... Imagine I added the following
comment_time; select top 5 * from sysobjects
the result returned to your front end would be the top 5 rows from sysobjects, rather than the results of the query you tried to generate (depending a lot on the front end)...
It really depends on how PHP validates those arguments. If MySQL is giving you a warning, it means an attacker has already passed through your first line of defence, which is your PHP script.
Use if(!preg_match('/^regex_pattern$/', $your_input)) to filter all your inputs before passing them to MySQL.
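A small sketch of that idea (the exact patterns here are my own assumptions; tighten them to whatever values your application actually accepts):
<?php
// Validate each URL parameter against a strict pattern before it ever
// reaches the query; reject the request outright if anything looks off.
$page  = isset($_GET['page'])  ? $_GET['page']  : '1';
$order = isset($_GET['order']) ? $_GET['order'] : 'comment_time';
$sc    = isset($_GET['sc'])    ? $_GET['sc']    : 'desc';

if (!preg_match('/^\d+$/', $page) ||
    !preg_match('/^[a-z_]+$/i', $order) ||
    !preg_match('/^(asc|desc)$/i', $sc)) {
    die('Invalid parameters');
}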
Setting the DBIC_TRACE environment variable to true:
BEGIN { $ENV{DBIC_TRACE} = 1 }
generates very helpful output, especially showing the SQL query that is being executed, but the SQL query is all on one line.
Is there a way to push it through some kinda "sql tidy" routine to format it better, perhaps breaking it up over multiple lines? Failing that, could anyone give me a nudge into where in the code I'd need to hack to add such a hook? And what the best tool is to accept a badly formatted SQL query and push out a nicely formatted one?
"nice formatting" in this context simply means better than "all on one line". I'm not particularly fussed about specific styles of formatting queries
Thanks!
As of DBIx::Class 0.08124 it's built in.
Just set $ENV{DBIC_TRACE_PROFILE} to console or console_monochrome.
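So the BEGIN block from the question only needs one extra line, something like:
BEGIN {
    $ENV{DBIC_TRACE}         = 1;
    $ENV{DBIC_TRACE_PROFILE} = 'console';   # or 'console_monochrome'
}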
From the documentation of DBIx::Class::Storage:

If DBIC_TRACE is set then trace information is produced (as when the debug method is set). ...

debug
    Causes trace information to be emitted on the debugobj object (or STDERR if debugobj has not specifically been set).

debugobj
    Sets or retrieves the object used for metric collection. Defaults to an instance of DBIx::Class::Storage::Statistics that is compatible with the original method of using a coderef as a callback. See the aforementioned Statistics class for more information.
In other words, you should set debugobj in that class to an object that subclasses DBIx::Class::Storage::Statistics. In your subclass, you can reformat the query the way you want it to be.
First, thanks for the pointers! Partial answer follows ....
What I've got so far ... first some scaffolding:
# Connect to our db through DBIx::Class
my $schema = My::Schema->connect('dbi:SQLite:/home/me/accounts.db');
# See also BEGIN { $ENV{DBIC_TRACE} = 1 }
$schema->storage->debug(1);
# Create an instance of our subclassed (see below)
# DBIx::Class::Storage::Statistics class
my $stats = My::DBIx::Class::Storage::Statistics->new();
# Set the debugobj object on our schema's storage
$schema->storage->debugobj($stats);
And the definition of My::DBIx::Class::Storage::Statistics being:
package My::DBIx::Class::Storage::Statistics;
use base qw<DBIx::Class::Storage::Statistics>;
use Data::Dumper qw<Dumper>;
use SQL::Statement;
use SQL::Parser;
sub query_start {
    my ($self, $sql_query, @params) = @_;

    print "The original sql query is\n$sql_query\n\n";

    my $parser = SQL::Parser->new();
    my $stmt   = SQL::Statement->new($sql_query, $parser);
    #printf "%s\n", $stmt->command;

    print "The parameters for this query are:";
    print Dumper \@params;
}
Which solves the problem about how to hook in to get the SQL query for me to "pretty-ify".
Then I run a query:
my $rs = $schema->resultset('SomeTable')->search(
    {
        'email'           => $email,
        'others.some_col' => 1,
    },
    { join => 'others' }
);

$rs->count;
However SQL::Parser barfs on the SQL generated by DBIx::Class:
The original sql query is
SELECT COUNT( * ) FROM some_table me LEFT JOIN others other_table ON ( others.some_col_id = me.id ) WHERE ( others.some_col_id = ? AND email = ? )
SQL ERROR: Bad table or column name '(others' has chars not alphanumeric or underscore!
SQL ERROR: No equijoin condition in WHERE or ON clause
So ... is there a better parser than SQL::Parser for the job?
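One module that might be worth a try (I haven't verified it against DBIC-generated SQL, so treat this as an assumption) is SQL::Beautify from CPAN, which only re-indents the statement rather than fully parsing it:
use SQL::Beautify;

sub query_start {
    my ($self, $sql_query, @params) = @_;

    # Re-flow the single-line query across multiple lines instead of parsing it.
    my $beautifier = SQL::Beautify->new;
    $beautifier->query($sql_query);
    print $beautifier->beautify, "\n";

    print "The parameters for this query are:";
    print Dumper \@params;
}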