Data transformation in Logstash - SQL

I have a data source, a read-only SQL server, and by combining tables from this server I am building a log that I need to upload to Elasticsearch.
To do this, I make an API call to the data source, do the data transformation in Logstash, and then upload the result to ES.
I have done this kind of data transformation several times before in SQL: I would JOIN several tables and INSERT the query results into a log table. I don't have that option in this setup, so I need to do the transformation in Logstash.
What I am asking for is best-practice suggestions for Logstash.

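input {
  jdbc {
    jdbc_driver_library => "mysql-connector-java-5.1.38-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # credentials can go in the URL or in jdbc_user/jdbc_password; avoid conflicting values in both
    jdbc_connection_string => "jdbc:mysql://localhost/student?user=root&password="
    jdbc_user => "Croos"
    # run once a minute
    schedule => "* * * * *"
    # any SELECT works here, including the JOINs you would otherwise have written in SQL;
    # ORDER BY keeps :sql_last_value equal to the highest id fetched so far
    statement => "SELECT * FROM subject WHERE id > :sql_last_value ORDER BY id"
    use_column_value => true
    tracking_column => "id"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}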
This link may be helpful to you.
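Because the source only has to answer a SELECT, one option is to keep the JOIN in the jdbc `statement` and let Logstash handle the scheduling and incremental loading. The sketch below imitates one scheduled run in Python against an in-memory SQLite database. The `student` table, the `student_id` column and the `name` columns are hypothetical (only `subject.id` comes from the config); the `?` placeholder stands in for `:sql_last_value`, and `run_once` mimics `use_column_value`/`tracking_column`.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema: only `subject` and its `id` column appear in the config above;
# `student`, `student_id` and `name` are invented for illustration.
SCHEMA = """
CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE subject (id INTEGER PRIMARY KEY, student_id INTEGER, name TEXT);
INSERT INTO student VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO subject VALUES (1, 1, 'Math'), (2, 2, 'History'), (3, 1, 'Physics');
"""

# The JOIN lives in the SELECT, as it could in the jdbc `statement`;
# the `?` placeholder plays the role of :sql_last_value.
QUERY = """
SELECT sub.id, st.name AS student, sub.name AS subject
FROM subject AS sub
JOIN student AS st ON st.id = sub.student_id
WHERE sub.id > ?
ORDER BY sub.id
"""


def run_once(conn: sqlite3.Connection, last_value: int) -> Tuple[List[tuple], int]:
    """One scheduled run: fetch new joined rows and return the updated tracking value."""
    rows = conn.execute(QUERY, (last_value,)).fetchall()
    # Like use_column_value + tracking_column: remember the id of the last row fetched.
    new_last = rows[-1][0] if rows else last_value
    return rows, new_last


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows, last = run_once(conn, 0)      # first run picks up ids 1..3
    conn.execute("INSERT INTO subject VALUES (4, 2, 'Art')")
    rows, last = run_once(conn, last)   # the next run only sees id 4
    print(rows, last)
```

The ORDER BY matters: the tracking value is taken from the last row returned, so without it a run could record a lower id than the highest one it actually fetched.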

Related

KnexJS raw query in migration

I have a problem with the following migration in KnexJS, working with PostgreSQL:
exports.up = (knex) => {
  knex.raw('CREATE EXTENSION IF NOT EXISTS "uuid-ossp"');
  return knex.schema.createTable('car_brands', (table) => {
    table.uuid('brandId').unique().notNullable().primary().defaultTo(knex.raw('uuid_generate_v4()'));
    table.string('name').notNullable().unique();
    table.timestamp('created_at').notNullable().defaultTo(knex.raw('now()'));
    table.timestamp('updated_at').notNullable().defaultTo(knex.raw('now()'));
  });
};
exports.down = (knex) => {
  knex.raw('drop extension if exists "uuid-ossp"');
  return knex.schema.dropTable('car_brands');
};
I am generating UUID default values with
defaultTo(knex.raw('uuid_generate_v4()')).
However, when running the above migration with:
knex migrate:latest --env development --knexfile knexfile.js --debug true
I get the error:
function uuid_generate_v4() does not exist
Do you know why the knex.raw() call is not working?
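The problem is that the result of
knex.raw('CREATE EXTENSION IF NOT EXISTS "uuid-ossp"');
is neither awaited nor returned, so nothing makes it run before
knex.schema.createTable('car_brands', ...);
Knex queries are lazy: they are sent to the database only when you await them or call .then() on them. As written, the CREATE EXTENSION statement is not guaranteed to run at all, let alone before the table is created.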
Rewrite it using async/await:
exports.up = async (knex) => {
  await knex.raw('CREATE EXTENSION IF NOT EXISTS "uuid-ossp"');
  return knex.schema.createTable('car_brands', (table) => {
    table.uuid('brandId').unique().notNullable().primary().defaultTo(knex.raw('uuid_generate_v4()'));
    table.string('name').notNullable().unique();
    table.timestamp('created_at').notNullable().defaultTo(knex.raw('now()'));
    table.timestamp('updated_at').notNullable().defaultTo(knex.raw('now()'));
  });
};
or using promises (the promise chain has to be returned so that Knex waits for it):
exports.up = (knex) => {
  return knex.raw('CREATE EXTENSION IF NOT EXISTS "uuid-ossp"')
    .then(() => {
      return knex.schema.createTable('car_brands', (table) => {
        table.uuid('brandId').unique().notNullable().primary().defaultTo(knex.raw('uuid_generate_v4()'));
        table.string('name').notNullable().unique();
        table.timestamp('created_at').notNullable().defaultTo(knex.raw('now()'));
        table.timestamp('updated_at').notNullable().defaultTo(knex.raw('now()'));
      });
    });
};
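To make the failure mode concrete, here is a small Python analogy (not Knex code). `LazyQuery` is a hypothetical class that, like a Knex query builder, does nothing until it is awaited, so building it without `await` means it never runs. This is the same shape of bug as the original migration.

```python
import asyncio
from typing import List


class LazyQuery:
    """Hypothetical stand-in for a Knex query: nothing runs until it is awaited."""

    def __init__(self, sql: str, log: List[str]) -> None:
        self.sql = sql
        self.log = log

    def __await__(self):
        async def run() -> str:
            # "Execute" the query by recording it; a real builder would hit the database here.
            self.log.append(self.sql)
            return self.sql

        return run().__await__()


async def up_broken(log: List[str]) -> None:
    LazyQuery('CREATE EXTENSION IF NOT EXISTS "uuid-ossp"', log)  # built, never awaited
    await LazyQuery("CREATE TABLE car_brands (...)", log)


async def up_fixed(log: List[str]) -> None:
    await LazyQuery('CREATE EXTENSION IF NOT EXISTS "uuid-ossp"', log)  # finishes first
    await LazyQuery("CREATE TABLE car_brands (...)", log)


if __name__ == "__main__":
    for up in (up_broken, up_fixed):
        log: List[str] = []
        asyncio.run(up(log))
        print(up.__name__, log)
```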
These are the steps I took to resolve this issue in my app with PostgreSQL, Objection and Knex.
Connect to your database and check which extensions are installed (extensions are installed per database, so check the one your app uses):
postgres=# SELECT * FROM pg_extension;
If "uuid-ossp" is not listed, create it in the database you need:
database_name=# CREATE EXTENSION "uuid-ossp";
Back in your app, open the Knex migration file where you are altering your table and set the default:
t.uuid('user_id').defaultTo(knex.raw('uuid_generate_v4()'));
Run the migration batch with the Knex CLI:
knex migrate:latest
Insert a new row into your table and verify that its UUID has been generated automatically.
I hope these steps are helpful.

LINQ: Split Where OR conditions

So I have the following Where conditions:
sessions = sessions.Where(y => y.session.SESSION_DIVISION.Any(x => x.DIVISION.ToUpper().Contains(SearchContent)) ||
    y.session.ROOM.ToUpper().Contains(SearchContent) ||
    y.session.COURSE.ToUpper().Contains(SearchContent));
I want to split this into separate conditions, each applied only when the corresponding string is non-empty, for example:
if (!String.IsNullOrEmpty(Division)) {
    sessions = sessions.Where(y => y.session.SESSION_DIVISION.Any(x => x.DIVISION.ToUpper().Contains(SearchContent)));
}
if (!String.IsNullOrEmpty(Room)) {
    // this should be OR
    sessions = sessions.Where(y => y.session.ROOM.ToUpper().Contains(SearchContent));
}
if (!String.IsNullOrEmpty(course)) {
    // this should be OR
    sessions = sessions.Where(y => y.session.COURSE.ToUpper().Contains(SearchContent));
}
As you can see, I want these to be OR conditions, each one added only if the corresponding Division, Room, or course string is non-empty.
There are a few ways to go about this:
1. Apply the Where to the original query each time, and then Union() the resulting queries:
var queries = new List<IQueryable<Session>>();
if (!String.IsNullOrEmpty(Division)) {
    queries.Add(sessions.Where(y => y.session.SESSION_DIVISION.Any(x => x.DIVISION.ToUpper().Contains(SearchContent))));
}
if (!String.IsNullOrEmpty(Room)) {
    queries.Add(sessions.Where(y => y.session.ROOM.ToUpper().Contains(SearchContent)));
}
if (!String.IsNullOrEmpty(course)) {
    queries.Add(sessions.Where(y => y.session.COURSE.ToUpper().Contains(SearchContent)));
}
// Union() ORs the individual queries together; with no conditions, keep the unfiltered query
if (queries.Count > 0) {
    sessions = queries.Aggregate((q1, q2) => q1.Union(q2));
}
2. Manipulate the expression trees: merge the bodies of your lambda expressions, joined by OrElse expressions. This is complicated unless you already have libraries to help you: after joining the bodies, you also have to traverse the expression tree to replace the parameter expressions, which can get sticky. See this post for details.
3. Use a tool like PredicateBuilder to do #2 for you.
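The PredicateBuilder idea (collect only the applicable conditions, then OR them together) can be sketched in plain Python over in-memory data. This is an analogy only: it filters Python objects instead of building an expression tree that a LINQ provider could translate to SQL, and the dict keys `divisions`, `room` and `course` are hypothetical stand-ins for SESSION_DIVISION, ROOM and COURSE.

```python
from typing import Any, Callable, Dict, List

Session = Dict[str, Any]  # hypothetical shape: {"divisions": [...], "room": str, "course": str}
Predicate = Callable[[Session], bool]


def or_all(predicates: List[Predicate]) -> Predicate:
    """OR-combine predicates; with none, accept everything (no filter applied)."""
    if not predicates:
        return lambda s: True
    return lambda s: any(p(s) for p in predicates)


def build_filter(search: str, division: str, room: str, course: str) -> Predicate:
    """Add each condition only if its switch string is non-empty, then OR them."""
    term = search.upper()
    preds: List[Predicate] = []
    if division:
        preds.append(lambda s: any(term in d.upper() for d in s["divisions"]))
    if room:
        preds.append(lambda s: term in s["room"].upper())
    if course:
        preds.append(lambda s: term in s["course"].upper())
    return or_all(preds)


if __name__ == "__main__":
    sessions = [
        {"divisions": ["North"], "room": "A1", "course": "Math"},
        {"divisions": [], "room": "North Hall", "course": "History"},
    ]
    keep = build_filter("north", division="x", room="x", course="")
    print([s["room"] for s in sessions if keep(s)])
```

Returning an always-true predicate when no condition applies mirrors the original sequential Where code, which leaves the query unfiltered in that case.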
Chained .Where() calls combine with logical AND and, as far as I know, there's no out-of-the-box way to OR them instead. If you want separate OR conditions, you may want to look into PredicateBuilder or Dynamic LINQ.
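You can create an extension method to conditionally apply the filter:
using System;
using System.Linq;
using System.Linq.Expressions;

// Extension methods must live in a static class; the class name is arbitrary.
public static class QueryableExtensions
{
    // Applies predicate only when condition is true; otherwise returns the query unchanged.
    public static IQueryable<T> WhereIf<T>(
        this IQueryable<T> source, bool condition,
        Expression<Func<T, bool>> predicate)
    {
        return condition ? source.Where(predicate) : source;
    }
}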
And use it like this (note that chained WhereIf calls still combine with AND, not OR):
using static System.String;
...
var res = sessions
    .WhereIf(!IsNullOrEmpty(Division), y => y.session.SESSION_DIVISION.Any(x => x.DIVISION.ToUpper().Contains(SearchContent)))
    .WhereIf(!IsNullOrEmpty(Room), y => y.session.ROOM.ToUpper().Contains(SearchContent))
    .WhereIf(!IsNullOrEmpty(course), y => y.session.COURSE.ToUpper().Contains(SearchContent));
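The semantics of WhereIf can be sketched in a few lines of Python: a filter is applied only when its condition holds, and each applied filter narrows the previous result, which is why the chain behaves as AND. The function name `where_if` and the sample room names are hypothetical.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def where_if(items: Iterable[T], condition: bool, predicate: Callable[[T], bool]) -> List[T]:
    """Filter only when condition is true; otherwise pass items through unchanged."""
    return [x for x in items if predicate(x)] if condition else list(items)


rooms = ["NORTH HALL", "SOUTH HALL", "NORTH ANNEX"]
result = where_if(rooms, True, lambda r: "NORTH" in r)
result = where_if(result, True, lambda r: "HALL" in r)   # ANDed with the previous filter
result = where_if(result, False, lambda r: "GYM" in r)   # skipped: condition is false
print(result)  # ['NORTH HALL']
```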

CakePHP: see the compiled SQL query before execution

My query times out on every run. It's a paginated query with joins.
I want to debug the SQL, but since I get a timeout, I can't see it.
How can I see the compiled SQL Query before execution?
Some Cake code:
$this->paginate = array(
    'limit' => '16',
    'joins' => array(array(
        'table' => 'products',
        'alias' => 'Product',
        'type' => 'LEFT',
        'conditions' => array('ProductModel.id = Product.product_model_id')
    )),
    'fields' => array(
        'COUNT(Product.product_model_id) as Counter',
        'ProductModel.name'
    ),
    'conditions' => array(
        'ProductModel.category_id' => $category_id,
    ),
    'group' => array('ProductModel.id')
);
First, set the debug level to 2 in app/config/core.php (app/Config/core.php in CakePHP 2.x).
Then add:
<?php echo $this->element('sql_dump'); ?>
at the end of your layout. Your default Cake layout may already contain this line, commented out.
You will now be able to see all SQL queries sent to the database.
Now copy the query and run it with the SQL EXPLAIN command (the link is for MySQL) to see how the DBMS executes it. For more on CakePHP debugging, see here.
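As a rough illustration of the EXPLAIN step, here is a Python sketch that uses SQLite's EXPLAIN QUERY PLAN in place of MySQL's EXPLAIN; the plan text differs between databases, but the idea of checking for full scans is the same. The SQL is an approximation of what the paginate settings above should generate, the `product_models` table name is assumed from Cake's naming conventions, and the index `idx_products_model` is hypothetical. An index on the join column is often the first thing to check for a slow join with COUNT, but that is a guess to verify against your own plan.

```python
import sqlite3
from typing import List

# Hypothetical schema: `products` and `product_model_id` come from the question,
# `product_models` is assumed from Cake's naming conventions for ProductModel.
SCHEMA = """
CREATE TABLE product_models (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
CREATE TABLE products (id INTEGER PRIMARY KEY, product_model_id INTEGER);
"""

# Roughly the SQL the paginate settings above should produce.
QUERY = """
SELECT COUNT(Product.product_model_id) AS Counter, ProductModel.name
FROM product_models AS ProductModel
LEFT JOIN products AS Product ON ProductModel.id = Product.product_model_id
WHERE ProductModel.category_id = ?
GROUP BY ProductModel.id
LIMIT 16
"""


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple) -> List[str]:
    """Return the human-readable steps of SQLite's plan (the last column of each row)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    print(query_plan(conn, QUERY, (1,)))
    # An index on the join column lets the planner look rows up instead of scanning.
    conn.execute("CREATE INDEX idx_products_model ON products (product_model_id)")
    print(query_plan(conn, QUERY, (1,)))
```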
Since your page doesn't even render, you can try to get the latest query directly from the datasource:
function getLastQuery()
{
    $dbo = $this->getDatasource();
    $logs = $dbo->getLog();
    $lastLog = end($logs['log']);
    return $lastLog['query'];
}
This needs to be in a model, since getDatasource() is a model method.
Inspect the whole $logs variable to see what else is in there.
One more thing you can do:
Open lib/Cake/Model/Datasource/DboSource.php, locate the execute() function and print the $sql variable.
That prints each SQL statement before it is sent to the database.
This is certainly not the cleanest way (you are changing files in the Cake core directory), but it is probably the quickest way to debug when something is not working with the SQL.
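The same "log the SQL before it runs" idea exists outside CakePHP. As a minimal Python analogy, SQLite's `set_trace_callback` hands each statement's text to a callback as the statement begins executing, so the query is recorded even if it then hangs or fails. The helper names `connect_with_sql_log` and `last_query` are hypothetical.

```python
import sqlite3
from typing import List, Optional, Tuple


def connect_with_sql_log(path: str = ":memory:") -> Tuple[sqlite3.Connection, List[str]]:
    """Open a connection that records every SQL statement SQLite runs."""
    conn = sqlite3.connect(path)
    log: List[str] = []
    # The callback receives the statement text as the statement begins executing,
    # so the SQL is captured even if the statement later hangs or fails.
    conn.set_trace_callback(log.append)
    return conn, log


def last_query(log: List[str]) -> Optional[str]:
    """Like the getLastQuery() helpers above: the most recent statement, if any."""
    return log[-1] if log else None


if __name__ == "__main__":
    conn, log = connect_with_sql_log()
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("SELECT x FROM t WHERE x > 1").fetchall()
    print(last_query(log))
```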
Try:
function getLastQuery($model) {
    $dbo = $model->getDatasource();
    $logData = $dbo->getLog();
    $getLog = end($logData['log']);
    echo $getLog['query'];
}
A simple way to show all queries executed through a given model:
$sqllog = $this->ModelName->getDataSource()->getLog(false, false);
debug($sqllog);
Or, from a controller, through ConnectionManager:
class YourController extends AppController {
    function testfunc() {
        // $options: your usual find() options
        $this->Model->find('all', $options);
        echo 'SQL: ' . $this->getLastQuery();
    }
    function getLastQuery()
    {
        $dbo = ConnectionManager::getDataSource('default');
        $logs = $dbo->getLog();
        $lastLog = end($logs['log']);
        return $lastLog['query'];
    }
}
Or you can dump every query by adding the following line to the execute() function in lib/Cake/Model/Datasource/DboSource.php:
Debugger::dump($sql);
Set the debug level to 2 in app/config/core.php, then:
echo $this->Payment->save();
The output looks like: SQL Query: INSERT INTO photoora_photoorange.payments VALUES (*******)

NHibernate/LINQ - Aggregate query on subcollection

Querying child collections has been a recurring issue in our applications where we use NHibernate (via LINQ). I want to figure out how to do it right. I just tried forever to get this query to work efficiently using LINQ, and gave up. Can someone help me understand the best way to do something like this?
Model: ServiceProvider
HasMany->ServicesProvided
The gotcha here is that the HasMany is mapped as a component, so I can't directly query the ServicesProvided. For posterity's sake, here's the mapping:
public ServiceProviderMap()
{
    DiscriminatorValue(ProfileType.SERVICE_PROVIDER.ID);
    HasMany(p => p.ServicesProvided)
        .Table("ServiceProvider_ServicesProvided")
        .KeyColumn("ProfileID")
        .Component(spMapping =>
        {
            spMapping.Map(service => service.ID)
                .Not.Nullable();
        })
        .AsBag();
}
The query I am trying to create would return the count of each service that is provided, e.g. Service1 -> 200, Service2 -> 465, etc.
I was able to get the query working using HQL, so here it is. Note that it just returns the ID of the service that is provided:
select service.ID, count(service)
from ServiceProvider as profile
inner join profile.ServicesProvided as service
group by service.ID
I was able to get the query "working" using LINQ, but it performed atrociously. Here's the code I used (warning - it's ugly).
Func<ServiceProvider, IEnumerable<ServicesProvided>> childSelector = sp => sp.ServicesProvided;
var counts = this._sessionManager.GetCurrentSession().Linq<ServiceProvider>()
    .Expand("ServicesProvided")
    .SelectMany(childSelector, (t, c) => new { t = t, c = c })
    .Select(child => child.c)
    .GroupBy(sp => sp.ID)
    .Select(el => new { serviceID = el.Key, count = el.Count() });
I would love to learn how to do this correctly, please.
Short of going with HQL, the most elegant solution I can think of would be using a Criteria object. The following will give you what you need and with very low overhead:
ICriteria criteria = this._sessionManager.GetCurrentSession().CreateCriteria(typeof(ServiceProvider), "sp");
// set projections for the field and aggregate, making sure to group by the appropriate value
criteria.CreateAlias("sp.ServicesProvided", "s", JoinType.LeftOuterJoin)
    .SetProjection(Projections.ProjectionList()
        .Add(Projections.Property("s.ID"), "serviceID")
        .Add(Projections.Count("sp.ID"), "count")
        .Add(Projections.GroupProperty("s.ID")));
// the generic List<object[]>() returns typed rows; the non-generic List() returns a plain IList
IList<object[]> results = criteria.List<object[]>();
foreach (object[] entry in results)
{
    // entry[0] = service ID, entry[1] = count
    int id = (int)entry[0], qty = (int)entry[1];
    // Do stuff with the values
}
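Whichever API builds it (HQL, Criteria or LINQ), the query boils down to a join plus GROUP BY. Here is a minimal Python/SQLite sketch of the SQL the HQL above corresponds to, under stated assumptions: the collection table and its `ProfileID`/`ID` columns come from the mapping, while the `Profile` table, its `ProfileType` column and the `'SERVICE_PROVIDER'` discriminator value are hypothetical. Note that the Criteria answer uses a left outer join, which would also produce a group with a NULL service ID for providers that offer no services.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical tables: the collection table and its ProfileID/ID columns come from the
# mapping above; the Profile table and its ProfileType discriminator column are assumed.
SCHEMA = """
CREATE TABLE Profile (ID INTEGER PRIMARY KEY, ProfileType TEXT);
CREATE TABLE ServiceProvider_ServicesProvided (ProfileID INTEGER NOT NULL, ID INTEGER NOT NULL);
"""

# Inner join + GROUP BY: one row per service ID with the number of providers offering it.
COUNTS = """
SELECT s.ID AS serviceID, COUNT(*) AS providerCount
FROM Profile AS p
JOIN ServiceProvider_ServicesProvided AS s ON s.ProfileID = p.ID
WHERE p.ProfileType = 'SERVICE_PROVIDER'
GROUP BY s.ID
ORDER BY s.ID
"""


def service_counts(conn: sqlite3.Connection) -> List[Tuple[int, int]]:
    return conn.execute(COUNTS).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Profile VALUES (?, ?)",
                     [(1, "SERVICE_PROVIDER"), (2, "SERVICE_PROVIDER"), (3, "CUSTOMER")])
    conn.executemany("INSERT INTO ServiceProvider_ServicesProvided VALUES (?, ?)",
                     [(1, 10), (1, 20), (2, 10)])
    print(service_counts(conn))  # [(10, 2), (20, 1)]
```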

How do I make DBIx::Class join tables using other operators than `=`?

Summary
I've got a table of items that come in pairs. I'd like to self-join it so I can retrieve both sides of a pair in a single query. It's valid SQL (I think), and the SQLite engine does accept it, but I'm having trouble getting DBIx::Class to bite the bullet.
Minimal example
package Schema::Half;
use parent 'DBIx::Class';
__PACKAGE__->load_components('Core');
__PACKAGE__->table('half');
__PACKAGE__->add_columns(
whole_id => { data_type => 'INTEGER' },
half_id => { data_type => 'CHAR' },
data => { data_type => 'TEXT' },
);
__PACKAGE__->has_one(dual => 'Schema::Half', {
    'foreign.whole_id' => 'self.whole_id',
    'foreign.half_id'  => 'self.half_id',
    # previous line results in a '='
    # I'd like a '<>'
});
package Schema;
use parent 'DBIx::Class::Schema';
__PACKAGE__->register_class( 'Half', 'Schema::Half' );
package main;
unlink 'join.db';
my $s = Schema->connect('dbi:SQLite:join.db');
$s->deploy;
my $h = $s->resultset('Half');
$h->populate([
    [qw/whole_id half_id data  /],
    [qw/1        L       Bonnie/],
    [qw/1        R       Clyde /],
    [qw/2        L       Tom   /],
    [qw/2        R       Jerry /],
    [qw/3        L       Batman/],
    [qw/3        R       Robin /],
]);
$h->search({ 'me.whole_id' => 42 }, { join => 'dual' })->first;
The last line generates the following SQL:
SELECT me.whole_id, me.half_id, me.data
FROM half me
JOIN half dual ON ( dual.half_id = me.half_id AND dual.whole_id = me.whole_id )
WHERE ( me.whole_id = ? )
I'm trying to use DBIx::Class join syntax to get a <> operator between dual.half_id and me.half_id, but haven't managed to so far.
Things I've tried
The documentation hints towards SQL::Abstract-like syntax.
I tried writing the has_one relationship as such:
__PACKAGE__->has_one(dual => 'Schema::Half', {
    'foreign.whole_id' => 'self.whole_id',
    'foreign.half_id'  => { '<>' => 'self.half_id' },
});
# Invalid rel cond val HASH(0x959cc28)
Straight SQL behind a stringref doesn't make it either:
__PACKAGE__->has_one(dual => 'Schema::Half', {
    'foreign.whole_id' => 'self.whole_id',
    'foreign.half_id'  => \'<> self.half_id',
});
# Invalid rel cond val SCALAR(0x96c10b8)
Workarounds and why they're insufficient to me
I could get the correct SQL generated with a complex search() invocation and no defined relationship. It's quite ugly, with (too) much hardcoded SQL, and it has to be imitated in a non-factorable way for each specific case where the relationship is traversed.
I could work around the problem by adding an other_half_id column and joining with = on that. It's obviously redundant data.
I even tried to avoid said redundancy by adding it through a dedicated view (CREATE VIEW AS SELECT *, opposite_of(side) AS dual FROM half...). Instead of the database schema, it's the code that got redundant and ugly, more so than with the search()-based workaround. In the end I wasn't brave enough to get it working.
Wished SQL
Here's the kind of SQL I'm looking for. Please note it's only an example: I really want it done through a relationship so I can use it as a Half ResultSet accessor too in addition to a search()'s join clause.
sqlite> SELECT *
FROM half l
JOIN half r ON l.whole_id=r.whole_id AND l.half_id<>r.half_id
WHERE l.half_id='L';
1|L|Bonnie|1|R|Clyde
2|L|Tom|2|R|Jerry
3|L|Batman|3|R|Robin
Side notes
I really am joining to self in my full expanded case too, but I'm pretty sure that's not the problem. I kept it this way for the reduced case here because it also helps keep the code size small.
I'm persisting with the join/relationship path instead of a complex search() because I've got multiple uses for the association, and I didn't find any "one size fits all" search expression.
Late update
Answering my own question two years later: this used to be missing functionality, and it has since been implemented.
For those still interested in this, it's been implemented as of 0.08192 or earlier (I'm on 0.08192 currently).
One correct syntax would be:
__PACKAGE__->has_one(dual => 'Schema::Half', sub {
    my $args = shift;
    # hash slice: the aliases DBIC assigns to the two sides of the join
    my ($foreign, $self) = @$args{qw(foreign_alias self_alias)};
    return {
        "$foreign.whole_id" => { -ident => "$self.whole_id" },
        "$foreign.half_id"  => { '<>' => { -ident => "$self.half_id" } },
    };
});
Trackback: DBIx::Class Extended Relationships on fREW Schmidt's blog where I got to first read about it.
I think you could do it by creating a new type of relationship extending DBIx::Class::Relationship::Base, but that doesn't seem very well documented. Have you considered just adding a convenience method to the resultset class for Half that does a ->search({}, { join => ... }) and returns the resulting resultset? It's not introspectable like a relationship, but other than that it works pretty much as well, and it uses DBIC's ability to chain queries to your advantage.
JB, notice that instead of:
SELECT *
FROM half l
JOIN half r ON l.whole_id=r.whole_id AND l.half_id<>r.half_id
WHERE l.half_id='L';
You can write the same query using:
SELECT *
FROM half l
JOIN half r ON l.whole_id=r.whole_id
WHERE l.half_id<>r.half_id AND l.half_id='L';
For an inner join like this one, that returns the same data, and it is definitely easier to express using DBIx::Class.
Of course, this doesn't answer the question "How do I make DBIx::Class join tables using other operators than =?", but the example you showed doesn't justify such a need.
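A quick way to check the equivalence claim is to run both queries against the sample rows from the question. The Python/SQLite sketch below does that; the ORDER BY clauses are added only to make the comparison deterministic. For an inner join, moving the `<>` from ON to WHERE gives the same rows; with a LEFT JOIN it would not, because WHERE would then also discard the unmatched, NULL-padded rows.

```python
import sqlite3

ROWS = [(1, "L", "Bonnie"), (1, "R", "Clyde"), (2, "L", "Tom"),
        (2, "R", "Jerry"), (3, "L", "Batman"), (3, "R", "Robin")]

# `<>` inside the ON clause (what the relationship is meant to produce)...
IN_ON = """
SELECT * FROM half l
JOIN half r ON l.whole_id = r.whole_id AND l.half_id <> r.half_id
WHERE l.half_id = 'L'
ORDER BY l.whole_id
"""

# ...versus `<>` moved into WHERE, as suggested above.
IN_WHERE = """
SELECT * FROM half l
JOIN half r ON l.whole_id = r.whole_id
WHERE l.half_id <> r.half_id AND l.half_id = 'L'
ORDER BY l.whole_id
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE half (whole_id INTEGER, half_id CHAR, data TEXT)")
    conn.executemany("INSERT INTO half VALUES (?, ?, ?)", ROWS)
    return conn


if __name__ == "__main__":
    conn = make_db()
    on_rows = conn.execute(IN_ON).fetchall()
    where_rows = conn.execute(IN_WHERE).fetchall()
    print(on_rows == where_rows)  # True
    for row in on_rows:
        print("|".join(str(v) for v in row))  # e.g. 1|L|Bonnie|1|R|Clyde
```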
Have you tried:
__PACKAGE__->has_one(dual => 'Schema::Half', {
    'foreign.whole_id' => 'self.whole_id',
    'foreign.half_id'  => { '<>' => 'self.half_id' },
});
I believe the matching criteria in a relationship definition use the same syntax as those used for searches.
Here is how to do it:
...
field => 1, # =
otherfield => { '>' => 2 }, # >
...
'foreign.half_id' => \'<> self.half_id'