I use a count() function many times in my program:
var housewith2floor = from qry in houses
where qry.floor == 2
select qry;
var counthousewith2floor = housewith2floor.count();
var housecolorwhite = from qry in house
where qry.color == "white"
select qry;
var countwhotehouse = housecolorwhite.count();
Each count method takes a lot of time to be executed. The database has 2 millions rows of data. I already put an unclustered index for floor column and color column, but the count still takes too long. Is there any other way I can make my count run much faster?
It's not so much the count that takes time. The initial statement won't actually execute until it needs to (called deferred execution). So it's the query generated by
var housewith2floor = from qry in houses
where qry.floor == 2
select qry;
that takes the time.
Edit to remove statment about indexes as I see you have already created them.
Are there any tables which reference or are referenced by "houses", and do they use Lazy Loading?
Related
I have a ten million level database. The client needs to read data and perform calculation.
Due to the large amount of data, if it is saved in the application cache, memory will be overflow and crash will occur.
If I use select statement to query data from the database in real time, the time may be too long and the number of operations on the database may be too frequent.
Is there a better way to read the database data? I use C++ and C# to access SQL Server database.
My database statement is similar to the following:
SELECT TOP 10 y.SourceName, MAX(y.EndTimeStamp - y.StartTimeStamp) AS ProcessTimeStamp
FROM
(
SELECT x.SourceName, x.StartTimeStamp, IIF(x.EndTimeStamp IS NOT NULL, x.EndTimeStamp, 134165256277210658) AS EndTimeStamp
FROM
(
SELECT
SourceName,
Active,
LEAD(Active) OVER(PARTITION BY SourceName ORDER BY TicksTimeStamp) NextActive,
TicksTimeStamp AS StartTimeStamp,
LEAD(TicksTimeStamp) OVER(PARTITION BY SourceName ORDER BY TicksTimeStamp) EndTimeStamp
FROM Table1
WHERE Path = N'App1' and TicksTimeStamp >= 132165256277210658 and TicksTimeStamp < 134165256277210658
) x
WHERE (x.Active = 1 and x.NextActive = 0) OR (x.Active = 1 and x.NextActive = null)
) y
GROUP BY y.SourceName
ORDER BY ProcessTimeStamp DESC, y.SourceName
The database structure is roughly as follows:
ID Path SourceName TicksTimeStamp Active
1 App1 Pipe1 132165256277210658 1
2 App1 Pipe1 132165256297210658 0
3 App1 Pipe1 132165956277210658 1
4 App2 Pipe2 132165956277210658 1
5 App2 Pipe2 132165956277210658 0
I use the ExecuteReader of C #. The same SQL statement runs on SQL Management for 4s, but the time returned by the ExecuteReader is 8-9s. Does the slow time have anything to do with this interface?
I don't really 'get' the entire query but I'm wondering about this part:
WHERE (x.Active = 1 and x.NextActive = 0) OR (x.Active = 1 and x.NextActive = null)
SQL doesn't really like OR's so why not convert this to
WHERE x.Active = 1 and ISNULL(x.NextActive, 0) = 0
This might cause a completely different query plan. (or not)
As CharlieFace mentioned, probably best to share the query plan so we might get an idea of what's going on.
PS: I'm also not sure what those 'ticksTimestamps' represent, but it looks like you're fetching a pretty wide range there, bigger volumes will also cause longer processing time. Even though you only return the top 10 it still has to go through the entire range to calculate those durations.
I agree with #Charlieface. I think the index you want is as follows:
CREATE INDEX idx ON Table1 (Path, TicksTimeStamp) INCLUDE (SourceName, Active);
You can add both indexes (with different names of course) and see which one the execution engine chooses.
I can suggest adding the following index which should help the inner query using LEAD:
CREATE INDEX idx ON Table1 (SourceName, TicksTimeStamp, Path) INCLUDE (Active);
The key point of the above index is that it should allow the lead values to be rapidly computed. It also has an INCLUDE clause for Active, to cover the entire select.
I have two queries, first query return top 10/20 recoords, second query return total record count from first query. Both queries need to use same filter condition.
How can I write filter condition and parameter used in filter condition in one place and use in both the queries.
Condition I can store in string variable and use in both the queries but how to share parameters?
I am using HQL
Check this similar Q & A: Nhibernate migrate ICriteria to QueryOver
There is a native support in NHiberante for row count. Let's have some query
// the QueryOver
var query = session.QueryOver<MyEntity>();
It could have any amount of where parts, projections... Now we just take its underlying criteria and use a transformer to create brand new criteria - out-of-box ready to get total rowcount
// GET A ROW COUNT query (ICriteria)
var rowCount = CriteriaTransformer.TransformToRowCount(query.UnderlyingCriteria);
Next step is to use FUTURE to get both queries in one round trip to DB
// ask for a list, but with a Future, to combine both in one SQL statement
var list = query
.Future<MyEntity>()
.ToList();
// execute the main and count query at once
var count = rowCount
.FutureValue<int>()
.Value;
// list is now in memory, ready to be used
var list = futureList
.ToList();
Ok so I'm having a bit of a learning moment here and after figuring out A way to get this to work, I'm curious if anyone with a bit more postgres experience could help me figure out a way to do this without doing a whole lotta behind the scene rails stuff (or doing a single query for each item i'm trying to get)... now for an explaination:
Say I have 1000 records, we'll call them "Instances", in the database that have these fields:
id
user_id
other_id
I want to create a method that I can call that pulls in 10 instances that all have a unique other_id field, in plain english (I realize this won't work :) ):
Select * from instances where user_id = 3 and other_id is unique limit 10
So instead of pulling in an array of 10 instances where user_id is 3 and you can get multiple instances with the other_id is 5, I want to be able to run a map function on those 10 instances and get back something like [1,2,3,4,5,6,7,8,9,10].
In theory, I can probably do one of two things currently, though I'm trying to avoid them:
Store an array of id's and do individual calls making sure the next call says "not in this array". The problem here is I'm doing 10 individual db queries.
Pull in a large chunk of say, 50 instances and sorting through them in ruby-land to find 10 unique ones. This wouldn't allow me to take advantage of any optimizations already done in the database and I'd also run the risk of doing a query for 50 items that don't have 10 unique other_id's and I'd be stuck with those unless I did another query.
Anyways, hoping someone may be able to tell me I'm overlooking an easy option :) I know this is kind of optimizing before it's really needed but this function is going to be run over and over and over again so I figure it's not a waste of time right now.
For the record, I'm using Ruby 1.9.3, Rails 3.2.13, and Postgresql (Heroku)
Thanks!
EDIT: Just wanted to give an example of a function that technically DOES work (and is number 1 above)
def getInstances(limit, user)
out_of_instances = false
available = []
other_ids = [-1] # added -1 to avoid submitting a NULL query
until other_ids.length == limit || out_of_instances == true
instance = Instance.where("user_id IS ? AND other_id <> ALL (ARRAY[?])", user.id, other_ids).limit(1)
if instance != []
available << instance.first
other_ids << instance.first.other_id
else
out_of_instances = true
end
end
end
And you would run:
getInstances(10, current_user)
While this works, it's not ideal because it's leading to 10 separate queries every time it's called :(
In a single SQL query, it can be achieved easily with SELECT DISTINCT ON... which is a PostgreSQL-specific feature.
See http://www.postgresql.org/docs/current/static/sql-select.html
SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of
each set of rows where the given expressions evaluate to equal. The
DISTINCT ON expressions are interpreted using the same rules as for
ORDER BY (see above). Note that the "first row" of each set is
unpredictable unless ORDER BY is used to ensure that the desired row
appears first
With your example:
SELECT DISTINCT ON (other_id) *
FROM instances
WHERE user_id = 3
ORDER BY other_id LIMIT 10
I have a featured section in my website that contains featured posts of three types: normal, big and small. Currently I am fetching the three types in three separate queries like so:
#featured_big_first = Post.visible.where(pinged: 1).where('overlay_type =?', :big).limit(1)
#featured_big_first = Post.visible.where(pinged: 1).where('overlay_type =?', :small).limit(1)
#featured_big_first = Post.visible.where(pinged: 1).where('overlay_type =?', :normal).limit(5)
Basically I am looking for a query that will combine those three in to one and fetch 1 big, 1 small, 5 normal posts.
I'd be surprised if you don't want an order. As you have it, it is supposed to find a random small, random large, and 5 random normal.
Yes, you can use a UNION. However, you will have to do an execute SQL. Look at the log for the SQL for each of your three queries, and do an execute SQL of a string which is each of the three queries with UNION in between. It might work, or it might have problems with the limit.
It is possible in SQL by joining the table to itself, doing a group by on one of the aliases for the table, a where when the other aliased table is <= the group by table, and adding a having clause where count of the <= table is under the limit.
So, if you had a simple query of the posts table (without the visible and pinged conditions) and wanted the records with the latest created_at date, then the normal query would be:
SELECT posts1.*
FROM posts posts1, posts posts2
WHERE posts2.created_at >= posts1.create_at
AND posts1.overlay_type = 'normal'
AND posts2.overlay_type = 'normal'
GROUP BY posts1.id
HAVING count(posts2.id) <= 5
Take this SQL, and add your conditions for visible and pinged, remembering to use the condition for both posts1 and posts2.
Then write the big and small versions and UNION it all together.
I'd stick with the three database calls.
I don't think this is possible but you can use scope which is more rails way to write a code
Also it may just typo but you are reassigning the #featured_big_first so it will contain the data of the last query only
in post.rb
scope :overlay_type_record lamda{|type| joins(:visible).where(["visible.pinged=1 AND visible.overlay_type =", type])}
and in controller
#featured_big_first = Post.overlay_type_record(:big).limit(1)
#featured_small_first = Post.overlay_type_record(:small).limit(1)
#featured_normal_first = Post.overlay_type_record(:normal).limit(5)
When I need to get the number of row(data) inside a SQLite database, I run the following pseudo code.
cmd = "SELECT Count(*) FROM benchmark"
res = runcommand(cmd)
read res to get result.
But, I'm not sure if it's the best way to go. What would be the optimum way to get the number of data in a SQLite DB? I use python for accessing SQLite.
Your query is correct but I would add an alias to make it easier to refer to the result:
SELECT COUNT(*) AS cnt FROM benchmark
Regarding this line:
count size of res
You don't want to count the number of rows in the result set - there will always be only one row. Just read the result out from the column cnt of the first row.