Search Query Not Handling Large Number of Users - sql

We have an ASP.NET Core website which handles user search as follows:
public async Task<ICollection<UserSearchResult>> SearchForUser(string name, int page)
{
    return await db.ApplicationUsers
        .Where(u => u.Name.Contains(name) && !u.Deleted && u.AppearInSearch)
        .OrderByDescending(u => u.Verified)
        .Skip(page * recordsInPage)
        .Take(recordsInPage)
        .Select(u => new UserSearchResult()
        {
            Name = u.Name,
            Verified = u.Verified,
            PhotoURL = u.PhotoURL,
            UserID = u.Id,
            Subdomain = u.Subdomain
        })
        .ToListAsync();
}
The query translates to something similar to the following:
SELECT [t].[Name], [t].[Verified], [t].[PhotoURL], [t].[Id], [t].[Subdomain]
FROM (
    SELECT [u0].*
    FROM [AspNetUsers] AS [u0]
    WHERE (((CHARINDEX('khaled', [u0].[Name]) > 0) OR ('khaled' = N'')) AND ([u0].[Deleted] = 0)) AND ([u0].[AppearInSearch] = 1)
    ORDER BY [u0].[Verified] DESC
    OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY
) AS [t]
On the client side we use typeahead.js and Bloodhound as follows:
engine = new Bloodhound({
    identify: function (user) {
        return user.UserID;
    },
    queryTokenizer: Bloodhound.tokenizers.whitespace,
    datumTokenizer: Bloodhound.tokenizers.obj.whitespace('name'),
    dupDetector: function (a, b) { return a.UserID === b.UserID; },
    remote: {
        cache: false,
        url: '/account/Search?name=%QUERY&page=0',
        wildcard: '%QUERY'
    }
});
and we configure typeahead as follows:
$('#demo-input').typeahead(
    {
        hint: $('.Typeahead-hint'),
        menu: $('.Typeahead-menu'),
        minLength: 3,
        classNames: {
            open: 'is-open',
            empty: 'is-empty',
            cursor: 'is-active',
            suggestion: 'Typeahead-suggestion',
            selectable: 'Typeahead-selectable'
        }
    },
    {
        source: engineWithDefaults,
        displayKey: 'name',
        templates: {
            suggestion: template,
            empty: empty,
            footer: all
        },
        limit: 5
    });
The search works just fine on localhost and the query runs quickly when executed as a plain SQL query.
I have also created an index on Verified, which cut the query time to one second or less.
Our website has millions of registered users. The problem is that as soon as we make search available to all users, the DTU percentage on Azure goes to 100% and the queries time out.
We also have a Redis cache to speed up repeated queries, but it didn't help with this issue.
Your support is appreciated :)

It's quite likely to be u.Name.Contains(name), i.e. CHARINDEX('khaled', [u0].[Name]) > 0, which has to scan the entire table or, at best, the index. That will be slow and there's not much you can do about it.
If your data is heavily biased towards Deleted or AppearInSearch you might be able to use a conditional (filtered) index, but these types of searches are notoriously slow. You will need some special constructs to make this work.
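As an illustration only (my sketch, not part of the answer above): one common workaround for the non-sargable Contains is to search on a prefix instead, since EF Core typically translates StartsWith into a LIKE 'term%' pattern that can seek on an index over Name. Assuming the same ApplicationUsers model and recordsInPage field as in the question:

// Hypothetical prefix-search variant of SearchForUser.
// StartsWith translates to a sargable LIKE @name + '%', which can use an index on Name,
// unlike Contains/CHARINDEX, which forces a scan of every row.
public async Task<ICollection<UserSearchResult>> SearchForUserByPrefix(string name, int page)
{
    return await db.ApplicationUsers
        .Where(u => u.Name.StartsWith(name) && !u.Deleted && u.AppearInSearch)
        .OrderByDescending(u => u.Verified)
        .Skip(page * recordsInPage)
        .Take(recordsInPage)
        .Select(u => new UserSearchResult()
        {
            Name = u.Name,
            Verified = u.Verified,
            PhotoURL = u.PhotoURL,
            UserID = u.Id,
            Subdomain = u.Subdomain
        })
        .ToListAsync();
}

Whether a prefix match is acceptable is a UX question; for true substring matching at this scale, something like SQL Server full-text search or a dedicated search service is the usual route.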

How to make complex nested where conditions with typeORM?

I have multiple nested where conditions and want to generate them with TypeORM without too much code duplication.
The SQL WHERE condition should be something like this:
WHERE "Table"."id" = $1
AND
"Table"."notAvailable" IS NULL
AND
(
"Table"."date" > $2
OR
(
"Table"."date" = $2
AND
"Table"."myId" > $3
)
)
AND
(
"Table"."created" = $2
OR
"Table"."updated" = $4
)
AND
(
"Table"."text" ilike '%search%'
OR
"Table"."name" ilike '%search%'
)
But with FindConditions it doesn't seem possible to nest them, so I would have to put every possible combination of ANDs into a FindConditions array. It also isn't possible to split the query into .where() and .andWhere(), because andWhere can't take an object literal.
Is there another way to achieve this query with TypeORM without using raw SQL?
When using the queryBuilder I would recommend using Brackets, as described in the TypeORM docs: https://typeorm.io/#/select-query-builder/adding-where-expression
You could do something like:
createQueryBuilder("user")
.where("user.registered = :registered", { registered: true })
.andWhere(new Brackets(qb => {
qb.where("user.firstName = :firstName", { firstName: "Timber" })
.orWhere("user.lastName = :lastName", { lastName: "Saw" })
}))
which will result in:
SELECT ...
FROM users user
WHERE user.registered = true
AND (user.firstName = 'Timber' OR user.lastName = 'Saw')
I think you are mixing two ways of retrieving entities with TypeORM: find from the repository and the query builder. FindConditions are used in the find function; the andWhere function is used by the query builder. When building more complex queries it is generally better/easier to use the query builder.
Query builder
When using the query builder you have much more freedom to make the query exactly what you need it to be. In the where clause you are free to add any SQL you please:
const desiredEntity = await connection
    .getRepository(User)
    .createQueryBuilder("user")
    .where("user.id = :id", { id: 1 })
    .andWhere("user.date > :date OR (user.date = :date AND user.myId = :myId)",
        {
            date: specificCreatedAtDate,
            myId: mysteryId,
        })
    .getOne();
Note that the SQL you write here has to be compatible with the database you are using; a possible drawback of this method is that it ties your project to a specific database. If you are using relations, make sure to read up on the aliases you can set for tables; they come in handy.
Repository
You already saw that this is much less comfortable. That is because the find function, or more specifically the FindOptions, uses objects to build the where clause. This makes it harder to offer a proper interface for nesting AND and OR clauses side by side. Therefore (I assume) they have chosen to split AND and OR clauses, which makes the interface much more declarative and means you have to pull your OR clauses to the top:
const desiredEntity = await repository.find({
    where: [{
        id: id,
        notAvailable: Not(IsNull()),
        date: MoreThan(date)
    }, {
        id: id,
        notAvailable: Not(IsNull()),
        date: date,
        myId: myId
    }]
})
Looking at the size of the desired query, I cannot imagine that this code would be very performant.
Alternatively you could use the Raw find helper. This would require you to rewrite your clause per field, since you only get access to one alias at a time. You could guess the column names or aliases, but that would be poor practice and very unstable, since you cannot directly control them.
If you want to nest andWhere statements when a condition is met, here is an example:
async getTasks(filterDto: GetTasksFilterDto, user: User): Promise<Task[]> {
    const { status, search } = filterDto;

    // create a query using the query builder; 'task' refers to the Task entity
    const query = this.createQueryBuilder('task');

    // only get the tasks that belong to the user
    query.where('task.userId = :userId', { userId: user.id });

    // if status is defined, add a where clause to the query
    if (status) {
        // :<variable-name> is a placeholder for the value in the second argument
        query.andWhere('task.status = :status', { status });
    }

    // if search is defined, add a where clause to the query
    if (search) {
        query.andWhere(
            /*
              LIKE finds a similar (not necessarily exact) match: https://www.w3schools.com/sql/sql_like.asp
              LOWER is a SQL function: https://www.w3schools.com/sql/func_sqlserver_lower.asp
              bug: the search clause could bypass the userId filter; fix: wrap the whole condition in
              parentheses, because andWhere stitches the WHERE clauses together, so the OR/LIKE pair
              becomes a single condition
            */
            '(LOWER(task.title) LIKE LOWER(:search) OR LOWER(task.description) LIKE LOWER(:search))',
            // :search is a parameter placeholder; the key in the object must match it
            { search: `%${search}%` },
        );
    }

    // execute the query; getMany returns an array of results
    let tasks;
    try {
        tasks = await query.getMany();
    } catch (error) {
        this.logger.error(
            `Failed to get tasks for user "${user.username}", Filters: ${JSON.stringify(filterDto)}`,
            error.stack,
        );
        throw new InternalServerErrorException();
    }
    return tasks;
}
I have a list of
{
    date: specificCreatedAtDate,
    userId: mysteryId
}
My solution is:
.andWhere(
    new Brackets((qb) => {
        qb.where(
            'userTable.date = :date0 AND userTable.userId = :userId0',
            {
                date0: dates[0].date,
                userId0: dates[0].userId,
            }
        );
        for (let i = 1; i < dates.length; i++) {
            qb.orWhere(
                `userTable.date = :date${i} AND userTable.userId = :userId${i}`,
                {
                    [`date${i}`]: dates[i].date,
                    [`userId${i}`]: dates[i].userId,
                }
            );
        }
    })
)
That will produce something similar to:
const userEntity = await repository.find({
    where: [{
        userId: id0,
        date: date0
    }, {
        userId: id1,
        date: date1
    }
    ....
    ]
})

Sails query by joined table

In Sails I need a function to get rows filtered by a referenced table's column, something like
Child.find({ parent.age: 30 })
or in SQL:
SELECT child.* FROM child JOIN parent ON child.parent_id = parent.id WHERE parent.age = 30
So far I have written an ugly function which does the work in three steps:
get_children_in_celebration: function(req, res) {
    // 1. Get parent ids
    Parent.find({ select: ['id'], age: 30 }).exec(function(err_parents, res_parents) {
        // 2. Collect them into an array suitable for the next query
        var parent_ids = [];
        for (var i = 0; i < res_parents.length; i++) {
            parent_ids[i] = res_parents[i].id;
        }
        // 3. Get their children
        Child.find({ parent_id: parent_ids }).exec(function(err_children, res_children) {
            return res.json(res_children);
        });
    });
},
The first query returns ~5000 parents, so together the two queries put a heavy load on the database. Does Sails offer any nicer solution?
No, Sails does not support filtering on an associated table. I had a similar issue and used raw queries for it:
Model.query("select * from table", function (error, response) {})

Counting cascaded distincts in RavenDB

I'm facing an index problem for which I can't see a solution yet.
I have the following document structure per board:
{
    "Name": "Test Board",
    ...
    "Settings": {
        "Admins": [ "USER1", "USER2" ],
        "Members": [ "USER3", "USER4", "USER5" ]
        ...
    },
    ...
    "CreatedBy": "USER1",
    "CreatedOn": "2014-09-26T18:14:20.0858945"
    ...
}
Now I'd like to be able to retrieve the count of all users that are registered somewhere in a board. This should not just count the number of user occurrences but rather the number of distinct users; one user can be a member of multiple boards.
This operation should perform as fast as possible, since it is displayed in a global statistics dashboard visible on each page. Therefore I chose to try an index instead of retrieving all boards and their users and doing the work client-side.
I'm trying to achieve this with a Map/Reduce index:
Map = boards => from board in boards
                select new
                {
                    Aggregation = "ALL",
                    Users = new object[]
                    {
                        board.CreatedBy,
                        board.Settings.Admins,
                        board.Settings.Members
                    },
                    NumberOfUsers = 1
                };

Reduce = results => from res in results
                    group res by new
                    {
                        res.Aggregation
                    }
                    into g
                    select new
                    {
                        g.Key.Aggregation,
                        Users = g.Select(x => x.Users),
                        NumberOfUsers = g.Sum(x => x.Users.Length)
                    };
Obviously this results in a wrong count. I don't have any experience with Reduce yet, so I'd appreciate any tips! The solution is probably pretty easy...
What would be the best way to get the distinct CreatedBy, Admins and Members across all documents and return their count?
Use an index like this:
// map
from board in docs.Boards
select new
{
    Users = board.Settings.Admins.Count + board.Settings.Members.Count + 1 /* created by */
}

// reduce
from r in results
group r by "all" into g
select new
{
    Users = g.Sum(x => x.Users)
}
The best I could come up with so far is:
Map = boards => from board in boards
                select new
                {
                    Users = new object[]
                    {
                        board.CreatedBy,
                        board.Settings.Admins,
                        board.Settings.Members
                    }
                };

Reduce = results => from r in results
                    group r by "all" into g
                    select new
                    {
                        Users = g.SelectMany(x => x.Users)
                    };
And then query for the distinct user count:
var allUsersQuery = _documentSession.Query<AllUsersIndex.Result, AllUsersIndex>();
return allUsersQuery.Any() ? allUsersQuery.First().Users.Distinct().Count() : 0;
At least the query only returns a list of all usernames on all boards instead of larger object trees. But the deduplication still has to be done client-side.
If there is any better way please let me know. It would be beautiful to have only one integer returned from the server...
Then use this:
// map
from board in docs.Boards
from user in board.Settings.Admins.Concat(board.Settings.Members).Concat(new[] { board.CreatedBy })
select new
{
    User = user,
    Count = 1
}

// reduce
from r in results
group r by r.User into g
select new
{
    User = g.Key,
    Count = g.Sum(x => x.Count)
}
I'm not really happy about the fanout, but this will give you all the distinct users and the number of times they appear.
If you want just the number of distinct users, just get the total results from the index.
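For illustration (my sketch, not from the answer above): the total can be read from the query statistics without streaming the users back, assuming the index is deployed under a hypothetical name such as Users/DistinctCount with a result class matching the reduce output:

// Hypothetical result class matching the reduce output above.
public class DistinctUserResult
{
    public string User { get; set; }
    public int Count { get; set; }
}

// TotalResults on the statistics is the number of reduce results,
// i.e. the number of distinct users, so only one integer of interest comes back.
RavenQueryStatistics stats;
_documentSession.Query<DistinctUserResult>("Users/DistinctCount")
    .Statistics(out stats)
    .Take(1)
    .ToList();
return stats.TotalResults;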

RavenDB Get document count after BulkInsertOperations

I am using RavenDB to bulk load some documents. Is there a way to get the count of documents loaded into the database?
For insert operations I am doing:
BulkInsertOperation _bulk = docStore.BulkInsert(null,
    new BulkInsertOptions { CheckForUpdates = true });
foreach (MyDocument myDoc in docCollection)
    _bulk.Store(myDoc);
_bulk.Dispose();
And right after that I call the following:
session.Query<MyDocument>().Count();
but I always get a number which is less than the count I see in Raven Studio.
By default, the query you are doing limits itself to a sane number of results; this is part of RavenDB's promise to be safe by default and not stream back millions of records.
In order to get the number of documents of a specific type in your database, you need a special map-reduce index whose job it is to track the counts for each document type. Because this type of index deals directly with document metadata, it's easier to define it in Raven Studio than to create it with code.
The source for that index is in this question but I'll copy it here:
// Index Name: Raven/DocumentCollections

// Map Query
from doc in docs
let Name = doc["#metadata"]["Raven-Entity-Name"]
where Name != null
select new { Name, Count = 1 }

// Reduce Query
from result in results
group result by result.Name into g
select new { Name = g.Key, Count = g.Sum(x => x.Count) }
Then to access it in your code you would need a class that mimics the structure of the anonymous type created by both the Map and Reduce queries:
public class Collection
{
public string Name { get; set; }
public int Count { get; set; }
}
Then, as Ayende notes in the answer to the previously linked question, you can get results from the index like this:
session.Query<Collection>("Raven/DocumentCollections")
    .Where(x => x.Name == "MyDocument")
    .FirstOrDefault();
Keep in mind, however, that indexes are updated asynchronously, so after bulk-inserting a bunch of documents the index may be stale. You can force the query to wait by adding .Customize(x => x.WaitForNonStaleResults()) right after the .Query(...).
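Put together, that combination would look something like this (a sketch based on the query above):

session.Query<Collection>("Raven/DocumentCollections")
    .Customize(x => x.WaitForNonStaleResults()) // wait for the index to catch up after the bulk insert
    .Where(x => x.Name == "MyDocument")
    .FirstOrDefault();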
Raven Studio actually gets this data from the index Raven/DocumentsByEntityName which exists for every database, by sidestepping normal queries and getting metadata on the index. You can emulate that like this:
QueryResult result = docStore.DatabaseCommands.Query("Raven/DocumentsByEntityName",
    new Raven.Abstractions.Data.IndexQuery
    {
        Query = "Tag:MyDocument",
        PageSize = 0
    },
    includes: null,
    metadataOnly: true);
var totalDocsOfType = result.TotalResults;
That QueryResult contains a lot of useful data:
{
    Results: [ ],
    Includes: [ ],
    IsStale: false,
    IndexTimestamp: "2013-11-08T15:51:25.6463491Z",
    TotalResults: 3,
    SkippedResults: 0,
    IndexName: "Raven/DocumentsByEntityName",
    IndexEtag: "01000000-0000-0040-0000-00000000000B",
    ResultEtag: "BA222B85-627A-FABE-DC7C-3CBC968124DE",
    Highlightings: { },
    NonAuthoritativeInformation: false,
    LastQueryTime: "2014-02-06T18:12:56.1990451Z",
    DurationMilliseconds: 1
}
A lot of that is the same data you get on any query if you request statistics, like this:
RavenQueryStatistics stats;
Session.Query<Course>()
    .Statistics(out stats)
    // Rest of query
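For example (my completion, not part of the original answer), the statistics expose the total match count even though only a page of results is materialized:

RavenQueryStatistics stats;
var page = Session.Query<Course>()
    .Statistics(out stats)
    .Take(15)      // only one page of documents comes back...
    .ToList();
var totalCourses = stats.TotalResults; // ...but TotalResults reflects the full match count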

jqGrid pager only returning first page

I want to have client side paging.
But for some reason I only seem to get back the first page, even though I know I have two pages' worth of data (i.e. I step through my code and I definitely have two). What is more baffling is that the links to navigate through the pages never seem to be correct: for instance, I would expect the pager to say 1 of 2.
Also, I would expect the bottom right-hand side to say "View 1-15 of 21".
My feeling is that I am doing something wrong in my data layer when giving this pager its info.
So it only returns the first page.
public static string JsonifyEnc(IEnumerable<TemplateModel> model, int popId, int page, int rows) {
    TemplateModel variable = model.ToArray()[0];
    ArrayList al = new ArrayList();
    //foreach (PatientACOModel patMod in variable.Template) {
    int i = 1;
    int rowstart = (page * rows + 1) - rows;
    int rowend = page * rows;
    // Here is where I create the rows... nothing special here
    var griddata = new {
        total = variable.Template.Count % rows > 0 ? (variable.Template.Count / rows) + 1 : (variable.Template.Count / rows),
        page = page,
        records = al.Count,
        rows = al.ToArray()
    };
When I quick-watch the total variable it says two.
This is the first part of the JSON string that is returned:
{"total":2,"page":1,"records":15,"rows":
So it's there. Also, this is how I am building up my jqGrid:
$(document).ready(function () {
    jQuery("#frTable").jqGrid({
        cmTemplate: { sortable: false },
        caption: '#TempData["POPNAME"]' + ' Population',
        datatype: 'json',
        mtype: 'GET',
        url: '/Encounters/GetAjaxPagedGridData/', //'Url.Action("GetAjaxPagedGridData", "Encounters", new { popId = TempData["POPULATIONID"] })',//
        postData: { popId: '#TempData["POPULATIONID"]' },
        pager: '#pager',
        jsonReader: { repeatitems: false },
        loadonce: true,
        height: 'auto',
        gridview: true,
        viewrecords: true,
        rowNum: 15,
        shrinkToFit: false,
        autowidth: true,
If you use loadonce: true on the client side, you should change the server code so that it ignores the page and rows options and returns all the data. You should just sort the data according to the sidx and sord parameters (see sortname and sortorder in jqGrid). You don't need to fill the total, page and records parts of the response.
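For illustration (my sketch, not from the answer): under that advice, the server method from the question could be reduced to something like the following. The method name, the omitted row-building loop and the serialization call are assumptions.

// Hypothetical variant of JsonifyEnc for loadonce: true - no server-side paging.
public static string JsonifyAll(IEnumerable<TemplateModel> model, int popId) {
    TemplateModel variable = model.ToArray()[0];
    ArrayList al = new ArrayList();

    // ...build one row per item in variable.Template, exactly as before,
    // but for the whole collection rather than a single page...

    var griddata = new {
        // total, page and records are not needed when jqGrid pages locally
        rows = al.ToArray()
    };
    return JsonConvert.SerializeObject(griddata); // assuming Json.NET; the original serialization step isn't shown
}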
If you use loadonce: true, jqGrid loads the data and saves it in its internal data and _index parameters. After that, jqGrid changes its datatype option to "local", so all later sorting, filtering (searching) and paging of the data are done locally.