Bad performance when filtering Azure logs - WCF Data Services filters - wcf

Azure Diagnostics is pushing Windows events into a storage table, "WADWindowsEventLogsTable".
I would like to query this table using Visual Studio 2015 and Cloud Explorer.
Because the table is huge, I end up waiting indefinitely for the results.
Here is a sample query:
EventId eq 4096 and Timestamp gt datetime'2016-06-24T08:20:00' and Timestamp lt datetime'2016-06-24T10:00:00'
I assume this query is correct?
Is there a way to improve performance?
Filter the result columns?
Return only the top X results?
Any other useful tips?
I know that a better way would be to script this, for example in Python, but I would like to use the UI as much as possible.
(Edit) Following Gaurav Mantri's answer, I used this small C# program to build my query. The results now come back quickly, which solves my initial performance issue:
using System;
using System.Diagnostics;

class Program
{
    static void Main(string[] args)
    {
        string startDate = "24 June 2016 8:20:00 AM";
        string endDate = "24 June 2016 10:00:00 AM";
        string startPKey = convertDateToPKey(startDate);
        string endPKey = convertDateToPKey(endDate);
        // Write the filter to the debug output so it can be pasted into Cloud Explorer.
        Debug.WriteLine("(PartitionKey gt '" + startPKey + "'"
            + " and PartitionKey le '" + endPKey + "')"
            + " and (EventId eq 4096)");
    }

    // Partition keys in this table are the date/time in ticks, prefixed with "0".
    private static string convertDateToPKey(string myDate)
    {
        // Note: Convert.ToDateTime parses the string using the current culture.
        DateTime dt = Convert.ToDateTime(myDate);
        return "0" + dt.Ticks;
    }
}
NB: for those who, like me, searched far and wide for a way to export the results to a CSV file: this icon is your answer (and it is not an 'undo' ;) ):

In your query, you're filtering on the Timestamp attribute, which is not indexed (only the PartitionKey and RowKey attributes are indexed). Your query therefore performs a full table scan (i.e. it goes from the first record until it finds a matching record) and hence is not optimized.
To avoid the full table scan, use PartitionKey in your query. In WADWindowsEventLogsTable, the PartitionKey essentially represents the date/time value in ticks. You need to convert the date/time range you are interested in into ticks, prepend a 0 to each value, and then use those values in the query.
So your query would be something like:
(PartitionKey gt 'from date/time value in ticks prepended with 0' and PartitionKey le 'to date/time value in ticks prepended with 0') and (EventId eq 4096)
I wrote a blog post about it some time ago that you may find useful: http://gauravmantri.com/2012/02/17/effective-way-of-fetching-diagnostics-data-from-windows-azure-diagnostics-table-hint-use-partitionkey/
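Since the question mentions Python as a scripting option, here is a minimal sketch of the same conversion in Python. It assumes the usual .NET definition of ticks (100-nanosecond intervals since 0001-01-01) and that the input date/times are already in the time zone the table uses. The names to_ticks, to_partition_key and build_filter are made up for this example.

```python
from datetime import datetime

# .NET DateTime.Ticks counts 100-nanosecond intervals since 0001-01-01T00:00:00.
DOTNET_EPOCH = datetime(1, 1, 1)


def to_ticks(dt: datetime) -> int:
    """Convert a naive datetime to .NET-style ticks (hypothetical helper)."""
    delta = dt - DOTNET_EPOCH
    # Integer arithmetic avoids floating-point rounding on such large values.
    return (delta.days * 86_400 + delta.seconds) * 10_000_000 + delta.microseconds * 10


def to_partition_key(dt: datetime) -> str:
    """Ticks prefixed with '0', the format described in the answer above."""
    return "0" + str(to_ticks(dt))


def build_filter(start: datetime, end: datetime, event_id: int) -> str:
    """Build the filter string: partition key range first, then EventId."""
    return (
        f"(PartitionKey gt '{to_partition_key(start)}'"
        f" and PartitionKey le '{to_partition_key(end)}')"
        f" and (EventId eq {event_id})"
    )


if __name__ == "__main__":
    print(build_filter(datetime(2016, 6, 24, 8, 20), datetime(2016, 6, 24, 10, 0), 4096))
```

The printed string can be pasted into the query box in the same way as the output of the C# program in the question.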

Related

How to compare string data to time

I have time data stored as strings, such as 06:00A, 09:00P, etc. I would like to query data from 6 am to 12 pm. How do I convert the string data to a time format and query it in LINQ to SQL?
Use DateTime.ParseExact or DateTime.TryParseExact to convert the string to a date. If you can't guarantee that the string version of your time is always going to be correct, stick to the TryParseExact version.
Once you have it converted to date, query as normal.
Example at: https://dotnetfiddle.net/MDnERt
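To illustrate the "convert, then query as normal" idea on rows that are already in memory, here is a small Python sketch. It is not LINQ and does not touch a database; parse_time and between are hypothetical helpers, and the sample values are invented.

```python
from datetime import time
from typing import Optional


def parse_time(text: str) -> Optional[time]:
    """Parse strings like '06:00A' or '09:00P'; return None if malformed."""
    if len(text) != 6 or text[2] != ":" or text[5] not in "AP":
        return None
    try:
        hours, minutes = int(text[:2]), int(text[3:5])
    except ValueError:
        return None
    if not (1 <= hours <= 12 and 0 <= minutes <= 59):
        return None
    # 12-hour to 24-hour clock: 12:xxA is 00:xx, 12:xxP stays 12:xx.
    hours = hours % 12 + (12 if text[5] == "P" else 0)
    return time(hours, minutes)


def between(rows, start: time, end: time):
    """Keep the rows whose parsed time falls within [start, end]."""
    kept = []
    for row in rows:
        parsed = parse_time(row)
        if parsed is not None and start <= parsed <= end:
            kept.append(row)
    return kept


if __name__ == "__main__":
    sample = ["06:00A", "09:00P", "11:30A", "12:00P", "12:15A", "bogus"]
    print(between(sample, time(6, 0), time(12, 0)))
```

Returning None for malformed input plays the same role as choosing TryParseExact over ParseExact: bad rows are skipped instead of raising.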
Edited after response:
If you are using the code as written against EntityFramework then no, this will not work. (Please also note that there is a big difference between Linq To SQL and Entity Framework, but the same concepts apply, to some degree)
ORMs that support LINQ convert your where clauses into an expression tree, which the ORM then translates into SQL. A custom parsing call cannot be translated, so you will get a NotSupportedException or something similar.
Is there some reason why the table in question is using that time format? Why would you not just use a datetime in the table? There is also the option of using the time data type in SQL Server (assuming you are targeting SQL Server), which is mapped to the TimeSpan type in .NET.
You would define your table in SQL Server like:
create table log ( data varchar(20), logtime time )
and the LINQ expression would look something like:
from x in Logs
where x.Logtime >= new TimeSpan(6,0,0) && x.Logtime <= new TimeSpan(12,0,0)
select x
Now we are getting into actual design questions, though, which is off topic. :)
I'd suggest writing your own parser and representing times as TimeSpan:
// Requires: using System; using System.Linq;
TimeSpan? ToTimeSpan(string str)
{
    char amPm;
    int hrs, mins;
    try
    {
        // get A or P at the end
        amPm = str.Last();
        hrs = int.Parse(str.Substring(0, 2));
        mins = int.Parse(str.Substring(3, 2));
    }
    catch
    {
        return null;
    }
    // Map the 12-hour clock to 24 hours: 12:xxA is 00:xx, 12:xxP stays 12:xx.
    switch (amPm)
    {
        case 'P': hrs = hrs % 12 + 12; break;
        case 'A': hrs = hrs % 12; break;
        default: return null;
    }
    return new TimeSpan(hrs, mins, 0);
}

Tuples are not inserted sequentially in database table?

I am trying to insert 10 values of the format "typename_" + i where i is the counter of the loop in a table named roomtype with attributes typename (primary key of SQL type character varying (45)) and samplephoto (it can be NULL and I am not dealing with this for now). What seems strange to me is that the tuples are inserted in different order than the loop counter increments. That is:
typename_1
typename_10
typename_2
typename_3
...
I suppose it's not very important but I can't understand why this is happening. I am using PostgreSQL 9.3.4, pgAdmin III version 1.18.1 and Eclipse Kepler.
The Java code that creates the connection (using JDBC driver) and makes the query is:
import java.sql.*;
import java.util.Random;

public class DBC {
    Connection _conn;

    public DBC() throws Exception {
        try {
            Class.forName("org.postgresql.Driver");
        } catch (java.lang.ClassNotFoundException e) {
            java.lang.System.err.print("ClassNotFoundException: Postgres Server JDBC");
            java.lang.System.err.println(e.getMessage());
            throw new Exception("No JDBC Driver found in Server");
        }
        try {
            _conn = DriverManager.getConnection("jdbc:postgresql://localhost:5432/hotelreservation", "user", "0000");
            // ZipfGenerator is not defined in this snippet and is not used by the insert loop.
            ZipfGenerator p = new ZipfGenerator(new Random(System.currentTimeMillis()));
            _conn.setCatalog("jdbcTest");
            Statement statement = _conn.createStatement();
            String query;
            for (int i = 1; i <= 10; i++) {
                String roomtype_typename = "typename_" + i;
                // Note: the quoted 'NULL' inserts the text NULL, not a SQL NULL.
                query = "INSERT INTO roomtype VALUES ('" + roomtype_typename + "','" + "NULL" + "')";
                System.out.println(i);
                statement.execute(query);
            }
        } catch (SQLException E) {
            java.lang.System.out.println("SQLException: " + E.getMessage());
            java.lang.System.out.println("SQLState: " + E.getSQLState());
            java.lang.System.out.println("VendorError: " + E.getErrorCode());
            throw E;
        }
    }
}
But what I get in pgAdmin table is:
This is a misunderstanding. There is no "natural" order in a relational database table. While rows are normally inserted in sequence to the physical file holding a table, a wide range of activities can reshuffle physical order. And queries doing anything more than a basic (non-parallelized) sequential scan may return rows in any opportune order. That's according to standard SQL.
The order you see is arbitrary unless you add ORDER BY to the query.
pgAdmin3 by default orders rows by the primary key (unless specified otherwise). Your column is of type varchar and rows are ordered alphabetically (according to your current locale). All by design, all as it should be.
To sort rows like you seem to be expecting, you could pad some '0' in your text:
...
typename_0009
typename_0010
...
The proper solution would be to have a numeric column with just the number, though.
You may be interested in natural-sort. You may also be interested in a serial column.
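The three orderings discussed above can be compared outside the database with a short Python sketch. It uses Python's plain string comparison as a stand-in for the database's alphabetical ordering (the actual collation depends on your locale), and natural_key is a hypothetical helper written for this example.

```python
import re


def natural_key(name: str):
    """Split into text and digit chunks so digit runs compare numerically."""
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", name)]


names = [f"typename_{i}" for i in range(1, 11)]

alphabetical = sorted(names)                                 # plain string comparison
padded = sorted(f"typename_{i:04d}" for i in range(1, 11))   # zero-padded names
natural = sorted(names, key=natural_key)                     # numeric-aware ordering

if __name__ == "__main__":
    print(alphabetical[:3])
    print(padded[-2:])
    print(natural[-2:])
```

Plain string sorting places typename_10 right after typename_1, while both the zero-padded names and the natural sort key keep 10 last.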
I guess the output is ordered alphabetically. If you create typename_1 through typename_9, everything should be OK. You can also use typename_01 (padded with zeros) to get the correct order.
If you are unsure about that, you can also add a sleep between the insert statements and record the insert time in the database (as a column).
You are not seeing the order in which PostgreSQL stores the data, but rather the order in which pgAdmin displays it.
The edit table feature of pgAdmin sorts the data by the primary key by default. That is what you are seeing.
In general, databases store table data in whatever order is convenient. Since you did not supply an ORDER BY, you cannot rely on any particular order.

LINQ to SQL selecting records and converting dates

I'm trying to select records from a table based on a date using Linq to SQL. Unfortunately the date is split across two tables - the Hours table has the day and the related JobTime table has the month and year in two columns.
I have the following query:
Dim qry = From h As Hour In ctx.Hours Where Convert.ToDateTime(h.day & "/" & h.JobTime.month & "/" & h.JobTime.year & " 00:00:00") > Convert.ToDateTime("01/01/2012 00:00:00")
This gives me the error "Arithmetic overflow error converting expression to data type datetime."
Looking at the SQL query in SQL server profiler, I see:
exec sp_executesql N'SELECT [t0].[JobTimeID], [t0].[day], [t0].[hours]
FROM [dbo].[tbl_pm_hours] AS [t0]
INNER JOIN [dbo].[tbl_pm_jobtimes] AS [t1] ON [t1].[JobTimeID] = [t0].[JobTimeID]
WHERE (CONVERT(DateTime,(((((CONVERT(NVarChar,[t0].[day])) + @p0) + (CONVERT(NVarChar,COALESCE([t1].[month],NULL)))) + @p1) + (CONVERT(NVarChar,COALESCE([t1].[year],NULL)))) + @p2)) > @p3',N'@p0 nvarchar(4000),@p1 nvarchar(4000),@p2 nvarchar(4000),@p3 datetime',@p0=N'/',@p1=N'/',@p2=N' 00:00:00',@p3='2012-01-31 00:00:00'
I can see that it's not passing in the date to search for correctly but I'm not sure how to correct it.
Can anyone please help?
Thanks,
Emma
The direct cause of the error may have to do with this issue.
As said there, the conversions you use are a very inefficient way to build a query. On top of that, the expressions are not sargable: you compare against a value computed from database columns, which prevents the query optimizer from using indexes to jump to individual column values. So you could try to fix the error by patching the direct cause, but I think it's better to rewrite the query so that only the individual column values are used in comparisons.
I've worked this out in C#:
var cfg = new DateTime(12, 6, 12);
int year = 12, month = 6, day = 13; // Try some more values here.
// Date from components > datetime value?
bool gt = (
    year > cfg.Year || (
        (year == cfg.Year && month > cfg.Month) || (
            year == cfg.Year && month == cfg.Month && day > cfg.Day)
    )
);
You see that it's not as straightforward as it may look at first, but it works. There are many more comparisons to work out, but I'm sure that the ability to use indexes will easily outweigh this.
A more straightforward, but not sargable, way is to use sortable dates, like 20120101, and compare those (as integers).
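The two approaches can be checked against each other in a few lines of Python. This is only a sketch of the comparison logic, not of the LINQ query itself; is_after and sortable are hypothetical names.

```python
from datetime import date


def is_after(year: int, month: int, day: int, cutoff: date) -> bool:
    """Component-wise 'greater than', mirroring the nested conditions above."""
    return (
        year > cutoff.year
        or (year == cutoff.year and month > cutoff.month)
        or (year == cutoff.year and month == cutoff.month and day > cutoff.day)
    )


def sortable(year: int, month: int, day: int) -> int:
    """Pack a date into a sortable integer such as 20120101."""
    return year * 10_000 + month * 100 + day


if __name__ == "__main__":
    cutoff = date(2012, 1, 1)
    for y, m, d in [(2011, 12, 31), (2012, 1, 1), (2012, 1, 2), (2013, 1, 1)]:
        packed = sortable(y, m, d) > sortable(cutoff.year, cutoff.month, cutoff.day)
        print((y, m, d), is_after(y, m, d, cutoff), packed)
```

Both columns of the output agree for every sample date. The difference is that the component-wise form compares raw column values, while the packed form has to compute a value from three columns first.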

NHibernate Like with integer

I have a NHibernate search function where I receive integers and want to return results where at least the beginning coincides with the integers, e.g.
received integer: 729
returns: 729445, 7291 etc.
The database column is of type int, as is the property "Id" of Foo.
But
int id = 729;
var criteria = session.CreateCriteria(typeof(Foo));
criteria.Add(NHibernate.Criterion.Expression.InsensitiveLike("Id", id.ToString() + "%"));
return criteria.List<Foo>();
does result in an error (Could not convert parameter string to int32). Is there something wrong in the code, a work around, or other solution?
How about this:
int id = 729;
var criteria = session.CreateCriteria(typeof(Foo));
// MatchMode.Start matches only values that begin with the given digits.
criteria.Add(Expression.Like(Projections.Cast(NHibernateUtil.String, Projections.Property("Id")), id.ToString(), MatchMode.Start));
return criteria.List<Foo>();
Have you tried something like this:
int id = 729;
var criteria = session.CreateCriteria(typeof(Foo));
criteria.Add(NHibernate.Criterion.Expression.Like(Projections.SqlFunction("to_char", NHibernate.NHibernateUtil.String, Projections.Property("Id")), id.ToString() + "%"));
return criteria.List<Foo>();
The idea is to convert the column to a string first, using a to_char function. Some databases do this conversion automatically.
AFAIK, you'll need to store your integer as a string in the database if you want to use the built in NHibernate functionality for this (I would recommend this approach even without NHibernate - the minute you start doing 'like' searches you are dealing with a string, not a number - think US Zip Codes, etc...).
You could also do it mathematically in a database-specific function (or convert to a string as described in Thiago Azevedo's answer), but I imagine these options would be significantly slower, and also have potential to tie you to a specific database.
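As a sketch of what "doing it mathematically" could look like, the Python below tests a digit prefix with integer arithmetic only and also lists the numeric ranges that share a prefix. It is independent of NHibernate and of any database; has_digit_prefix and prefix_ranges are hypothetical names, and positive integers are assumed.

```python
def has_digit_prefix(value: int, prefix: int) -> bool:
    """True if the decimal digits of value start with those of prefix.

    Assumes both numbers are positive.
    """
    if value <= 0 or prefix <= 0:
        return False
    # Drop trailing digits until value is no larger than prefix.
    while value > prefix:
        value //= 10
    return value == prefix


def prefix_ranges(prefix: int, max_digits: int):
    """Half-open ranges [lo, hi) of all numbers with at most max_digits
    digits whose decimal form starts with prefix."""
    ranges = []
    scale = 1
    while len(str(prefix * scale)) <= max_digits:
        ranges.append((prefix * scale, (prefix + 1) * scale))
        scale *= 10
    return ranges


if __name__ == "__main__":
    print([n for n in (729445, 7291, 730, 1729, 72) if has_digit_prefix(n, 729)])
    print(prefix_ranges(729, 6))
```

The ranges could in principle be turned into a handful of plain integer range conditions on the column, which avoids converting it to a string, at the cost of one condition per possible digit count.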

Lucene Field Grouping

Say I have the fields stud_roll_number and date_leave.
select stud_roll_number,count(*) from some_table where date_leave > some_date group by stud_roll_number;
How do I write the same query using Lucene? After querying date_leave > some_date, I tried:
for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
    Document doc = search.doc(scoreDoc.doc);
    String value = doc.get(fieldName);
    Integer key = mapGrouper.get(value);
    if (key == null) {
        key = 1;
    } else {
        key = key + 1;
    }
    mapGrouper.put(value, key);
}
But I have a huge data set, and this takes a long time to compute. Is there any other way to do it? Thanks in advance.
Your performance bottleneck is almost certainly the I/O it takes to perform the document and field value lookups. What you want to do in this situation is use a FieldCache for the field you want to group by. Once you have a field cache, you can look up the values by Lucene doc ID, which will be fast because all the values are in memory.
Also remember to give your HashMap an initial capacity to avoid array resizing.
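The idea can be shown with a language-neutral sketch in Python. This is not the Lucene API: field_values merely stands in for the in-memory array a field cache would provide (one value per document ID), and count_by_group is a hypothetical name.

```python
from collections import Counter


def count_by_group(matching_doc_ids, field_values):
    """Count hits per group value using a preloaded doc-ID-to-value array."""
    # One in-memory list lookup per hit instead of loading a stored document.
    return Counter(field_values[doc_id] for doc_id in matching_doc_ids)


if __name__ == "__main__":
    # Index position = document ID, value = stud_roll_number (sample data).
    field_values = ["r1", "r2", "r1", "r3", "r1"]
    hits = [0, 2, 3, 4]  # document IDs that matched the date filter
    print(count_by_group(hits, field_values))
```

Each hit costs one array lookup rather than a document fetch, which is where the speed-up described above comes from.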
There is a very new grouping module, on https://issues.apache.org/jira/browse/LUCENE-1421 as a patch, that will do this.