Loop or DB query, Which one causes more time? - sql

I have a table which has 8 column, but for a certain functionality I need only 4 column values out of the table. So I am currently selecting only the 4 column and mapping it to my entity and returning it. When I am mapping I believe there is a loop running on the retrieved value. So what is the best practice ? should I just retrieve all 8 values from DB or get only what I need and map it to entity and return it?
var users= (from u in context.Users
join r in base.DatabaseContext.Users
where s.userId== id
select new {
Id=u.Id,
FirstName=u.FirstName,
LastName =u.LastName,
IsActive=u.IsActive,
}).GroupBy(x => x.Id).Select(y => y.FirstOrDefault()).AsEnumerable()
.Select(x => new WebCore.Model.User // Here i believe there will be an iteration on the retrieved result
{
Id = x.Id,
FirstName = x.FirstName,
LastName = x.LastName,
IsActive = x.IsActive,
}).AsQueryable();
return users;
My understanding is that IO operation causes the main performance issue. but at this scenario I am a bit confused. If possible please provide an detail explanation for me to understand the issue clearly.

As a rule of thumb, you should only query the fields you need in your application. Querying more fields than needed will result in an overhead reading these fields from these database and transporting the data over the network. So in your case you should just query 4 columns, because that is what you need for now.
Regarding your concerns about your LINQ query, these queries are generally relatively efficient and I strongly advise against premature optimisation until it turns out to be a bottleneck. See Donald Knuth's opinion about this topic in my comment.

Related

Query fast without search, slow with search, but with search fast in SSMS

I have this function that takes data from the database and also has search. The problem is that when I search with Entity framework it's slow, but if I use the same query I got from the log and use it in SSMS it's fast. I must also say that there are allot of movies, 388262. I also tried adding an index on title at movie, but didn't help.
Query I use in SSMS:
SELECT *
FROM Movie
WHERE title LIKE '%pirate%'
ORDER BY ##ROWCOUNT
OFFSET 0 ROWS FETCH NEXT 30 ROWS ONLY
Entity code (_movieRepository.GetAll() returns Queryable not all movies):
public IActionResult Index(MovieIndexViewModel vm) {
IQueryable<Movie> query = _movieRepository.GetAll().AsNoTracking();
if (!string.IsNullOrWhiteSpace(vm.Search)) {
query = query.Where(m => m.title.ToLower().Contains(vm.Search.ToLower()));
}
vm.TotalItemCount = query.Count();
vm.Movies = query.Skip(_pageSize * (vm.Page - 1)).Take(_pageSize);
vm.PageSize = _pageSize;
return View(vm);
}
Caveat: I don't have much experience with the Entity framework.
However, you might find useful debugging tips available in the Entity Framework Performance Article from Simple talk. Looking at what you've posted you might be able to improve your query performance by:
Choosing only the specific column you're interested in (it sounds like you're only interested in querying for the 'Title' column).
Pay special attention to your data-types. You might want to convert your NVARCHAR variables to VARCHAR(40) (or some appropriate character limit)
try removing all of the ToLower() stuff,
if (!string.IsNullOrWhiteSpace(vm.Search)) {
query = query.Where(m => m.title.Contains(vm.Search)));
}
sql server (unlike c#) is not case sensitive by default (though you can configure it to be that way). Your query is forcing sql server to lower case every record in the table and then do the comparison.

How to implement deduplication in a billion rows table ssis

which is the best option to implement distinct operation in ssis?
I have a table with more than 200 columns and contain more than 10 million rows.
I need to get the ditinct rows from this table.Is it wise to use a execute sql task (with select query to deduplicate the rows) or is there any other way to achieve this in ssis
I do understood that the ssis sort component deduplicate the rows..but this is a blocking component it is not at all a good idea to use ...Please let me know your views on this
I had done it in 3 steps this way:
Dump the MillionRow table into HashDump table, which has only 2 columns: Id int identity PK, and Hash varbinary(20). This table shall be indexed on its Hash column.
Dump the HashDump table into HashUni ordered by Hash column. In between would be a Script Component that check whether the current row's Hash column value is same as the previous row. If same, direct row to Duplicate output, else Unique. This way you can log the Duplicate even if what you need is just the Unique.
Dump the MillionRow table into MillionUni table. In between would be a Lookup Component that uses HashUni to tell which row is Unique.
This method allows me to log each duplicates with a message such as: "Row 1000 is a duplicate of row 100".
I have not found a better way than this. Earlier, I made a unique index on MillionUni, to dump directly the MillionRow into it, but I was not able to use "fast load", which was way too slow.
Here is one way to populate the Hash column:
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
StringBuilder sb = new StringBuilder();
sb.Append(Row.Col1String_IsNull ? "" : Row.Col1String); sb.Append("|");
sb.Append(Row.Col2Num_IsNull ? "" : Row.Col2Num.ToString()); sb.Append("|");
sb.Append(Row.Col3Date_IsNull ? "" : Row.Col3Date.ToString("yyyy-MM-dd"));
var sha1Provider = HashAlgorithm.Create("SHA1");
Row.Hash = sha1Provider.ComputeHash(Encoding.UTF8.GetBytes(sb.ToString()));
}
If 200 columns prove to be a chore for you, part of this article shall inspire you. It is making a loop for the values of all column objects into a single string.
And to compare the Hash, use this method:
byte[] previousHash;
int previousRowNo;
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
if (StructuralComparisons.StructuralEqualityComparer.Equals(Row.Hash, previousHash))
{
Row.DupRowNo = previousRowNo;
Row.DirectRowToDuplicate();
}
else
{
Row.DirectRowToUnique();
}
previousHash = Row.Hash;
previousRowNo = Row.RowNo;
}
I won't bother SSIS for it, a couple of queries will do; also you have a lot of data, so i suggest you check the execution plan before running the queries, and optimize your indexes
http://www.brijrajsingh.com/2011/03/delete-duplicate-record-but-keep.html
Check out a small article i wrote on the same topic
As far as I know, the Sort Component is the only transformation which allows you to distinct the duplcities. Or you could use SQL-like command.
If the sorting operation is problem, then you should use (assuming your source is DB) "SQL Command" in Data Access Mode specification. Select distinct your data and that's it .. you may also save a bit time as the ETL wont have to go through the Sort Component.

NHibernate Query parent and child objects eagerly without join

I have simple domain with Order and OrderLines. Is it possible to load the Order and associated OrderLine objects without a join? I'm trying to use the Future/FutureValue to perform two simple queries. I'm hoping that NHibernate knows how to combine these in the cache. I'm using NHibernate 3.2 with code only mapping.
So far here is what I have:
// Get all the order lines for the order
var lineQuery = session.QueryOver<OrderLine>()
.Where(x => x.WebOrder.Id == id).Future<OrderLine>();
// Get the order
var orderQuery = session.QueryOver<WebOrder>()
.Where(x => x.Id == id)
.FutureValue<WebOrder>();
var order = orderQuery.Value;
This works as expected sending two queries into the database. However when I use loop to go through order.OrderLines NHibernate send another query to get the order lines. My guess is that because I'm using constraints (Where(x => ...) NHibernate doesn't know how to get the objects from the session cache.
Why do I want to do this without join
I know I can use Fetch(x => x.OrderLines).Eager but sometimes the actual parent (in this case Order) is so large that I don't want to perform the join. After all the result set contains all the order columns for each orderline if I perform the join. I don't have any raw numbers or anything I'm just wondering if this is possible.
it's quite possilbe. see nHib's fetching strategies.
you can choose either 'select' (if you're only dealing with one Order at a time) or 'subselect'.

nHibernate QueryOver Projects - are Select and Where the same thing?

Using nHibernate QueryOver, if I want to enforce a projection for performance, are "Select" and "Where" the same thing? In other words, will ..
var member = session.QueryOver<Member>()
.Select( projections => projections.Email == model.Email )
.Take(1).SingleOrDefault();
Run the same as
var member = session.QueryOver<Member>()
.Where( context => context.Email == model.Email )
.Take(1).SingleOrDefault();
Or is there a difference in the two?
Select projects (you could also say maps); Where filters. This is the same as SQL and all LINQ providers (and QueryOver is also sort of a LINQ provider). It seems that in this case you want to filter, not project, so you need Where
No offense intended, but I think the best way to answer a question like the one you asked is to try it. Sometimes things become clearer when you can see the output.
That said, when you use Select, you're telling NHibernate how to project your data. This determines the final makeup of the data resulting from the query. There's a little bit more to this, but that's the general idea. You use Where when you want to specify the criteria that the data that you are querying should satisfy.

CakePHP query additions in controller

I am migrating raw PHP code to CakePHP and have some problems. As I have big problems with query to ORM transformation I temporary use raw SQL. All is going nice, but I met the ugly code and don't really know how to make it beautiful. I made DealersController and added function advanced($condition = null) (it will be called from AJAX with parameters 1-15 and 69). function looks like:
switch ($condition) {
case '1':
$cond_query = ' AND ( (d.email = \'\' OR d.email IS NULL) )';
break;
case '2':
$cond_query = ' AND (d.id IN (SELECT dealer_id FROM dealer_logo)';
break;
// There are many cases, some long, some like these two
}
if($user_group == 'group_1') {
$query = 'LONG QUERY WITH 6+ TABLES JOINING' . $cond_query;
} elseif ($user_group == 'group_2'){
$query = 'A LITLE BIT DIFFERENT LONG QUERY WITH 6+ TABLES JOINING' . $cond_query;
} else {
$query = 'A LITLE MORE BIT DIFFERENT LONG QUERY WITH 10+ TABLES JOINING' . $cond_query;
}
// THERE IS $this->Dealer->query($query); and so on
So.. As you see code looks ugly. I have two variants:
1) get out query addition and make model methods for every condition, then these conditions seperate to functions. But this is not DRY, because main 3 big queries is almost the same and if I will need to change something in one - I will need to change 16+ queries.
2) Make small reusable model methods/queries whitch will get out of DB small pieces of data, then don't use raw SQL but play with methods. It would be good, but the performance will be low and I need it as high as possible.
Please give me advice. Thank you!
If you're concerned about how CakePHP makes a database query for every joined table, you might find that the Linkable behaviour can help you reduce the number of queries (where the joins are simple associations on the one table).
Otherwise, I find that creating simple database querying methods at the Model level to get your smaller pieces of information, and then combining them afterwards, is a good approach. It allows you to clearly outline what your code does (through inline documentation). If you can migrate to using CakePHP's find methods instead of raw queries, you will be using the conditions array syntax. So one way you could approach your problem is to have public functions on your Model classes which append their appropriate conditions to an inputted conditions array. For example:
class SomeModel extends AppModel {
...
public function addEmailCondition(&$conditions) {
$conditions['OR'] = array(
'alias.email_address' => null,
'alias.email_address =' => ''
);
}
}
You would call these functions to build up one large conditions array which you can then use to retrieve the data you want from your controller (or from the model if you want to contain it all at the model layer). Note that in the above example, the conditions array is being passed by reference, so it can be edited in place. Also note that any existing 'OR' conditions in the array will be overwritten by this function: your real solution would have to be smarter in terms of merging your new conditions with any existing ones.
Don't worry about 'hypothetical' performance issues - if you've tried to queries and they're too slow, then you can worry about how to increase performance. But for starters, try to write the code as cleanly as possible.
You also might want to consider splitting up that function advanced() call into multiple Controller Actions that are grouped by the similarity of their condition query.
Finally, in case you haven't already checked it out, here's the Book's entry on retrieving data from models. There might be some tricks you hadn't seen before: http://book.cakephp.org/view/1017/Retrieving-Your-Data
If the base part of the query is the same, you could have a function to generate that part of the query, and then use other small functions to append the different where conditions, etc.