SQL Server 2012 Database Slowness

We have a system that uses two columns as the unique ID: the user ID and a date. We have to keep every record that has ever been associated with a particular subject, so no records are ever deleted, and one subject can have 50 records. The database designer created views to get the latest row for a subject. The database is not that large in terms of record count; we are at roughly 750,000 records.
A view is written for every table, very similar to:
SELECT u.Username,
u.UserID
FROM users u
WHERE u.UserID = 000
AND u.UserUpdatedDate = (
SELECT MAX(a.UserUpdatedDate)
FROM users a
WHERE a.UserID = u.UserID
)
We are seeing major slowness. Any suggestions would be welcome.
We are rewriting some queries using temp tables, and that seems to be quicker. Is this a good or a bad approach in the long haul?

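Replace this subquery
(Select MAX(UserUpdatedDate) FROM users a WHERE a.USerID = UserID)
with a join against a derived table that computes MAX(UserUpdatedDate) once per UserID. A correlated subquery like this may be re-evaluated for every outer row, which is often slow.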
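To make the join suggestion concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the sample rows and the index name ix_users_id_date are assumptions made up for illustration. The SQL itself is plain enough that it should carry over, but check the execution plan on the real database.

```python
import sqlite3

# Hypothetical miniature of the users history table; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (UserID INTEGER, Username TEXT, UserUpdatedDate TEXT);
    -- An index on (UserID, UserUpdatedDate) helps the engine find each maximum.
    CREATE INDEX ix_users_id_date ON users (UserID, UserUpdatedDate);
    INSERT INTO users VALUES
        (1, 'alice_v1', '2015-01-01'),
        (1, 'alice_v2', '2015-03-01'),
        (2, 'bob_v1',   '2015-02-01');
""")

# Join every row to its user's newest date instead of using a correlated subquery.
latest_rows = conn.execute("""
    SELECT u.UserID, u.Username
    FROM users u
    JOIN (SELECT UserID, MAX(UserUpdatedDate) AS MaxDate
          FROM users
          GROUP BY UserID) m
      ON m.UserID = u.UserID
     AND m.MaxDate = u.UserUpdatedDate
    ORDER BY u.UserID
""").fetchall()

print(latest_rows)
```

Only the newest row per UserID survives the join, which is what the views are meant to return.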

Related

Best approach to occurrences of ids on a table and all elements in another table

The query I need is simple, and may already be covered in another question, but I have a performance concern:
I have a table of users with 10,000 rows; the table contains id, email and more data.
In another table called orders I have many more rows, maybe 150,000.
In orders I have the id of the user that made the order, and also a status of the order. The status can be a number from 0 to 9 (or null).
My final requirement is to have every user with the id, email, some other column, and the number of orders with status 3 or 7. It does not matter whether the status is 3 or 7; I just need the count.
But I need to do this query in a low-impact (performant) way.
What is the best approach?
I need to run this in Redash with Postgres 10.
This sounds like a join and group by:
select u.*, count(*)
from users u join
orders o
on o.user_id = u.user_id
where o.status in (3, 7)
group by u.user_id;
Postgres is usually pretty good about optimizing these queries -- and the above assumes that users(user_id) is the primary key -- so this should work pretty well.
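One caveat with the inner join above: users with no orders in status 3 or 7 drop out of the result. If every user should appear, with a count of 0 where nothing matches, one option is a LEFT JOIN with the status filter moved into the ON clause. A minimal sketch, using Python's sqlite3 as a stand-in for Postgres; the column names order_id and email and the sample rows are assumptions.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Postgres 10 here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, status INTEGER);
    INSERT INTO users VALUES (1, 'a@example.com'), (2, 'b@example.com');
    INSERT INTO orders VALUES (10, 1, 3), (11, 1, 7), (12, 1, 5), (13, 2, 1);
""")

# The status filter lives in the ON clause, so users without a matching
# order still produce one row; COUNT(o.order_id) ignores the NULLs and gives 0.
order_counts = conn.execute("""
    SELECT u.user_id, u.email, COUNT(o.order_id) AS matching_orders
    FROM users u
    LEFT JOIN orders o
      ON o.user_id = u.user_id
     AND o.status IN (3, 7)
    GROUP BY u.user_id, u.email
    ORDER BY u.user_id
""").fetchall()

print(order_counts)
```

Putting the status test in WHERE instead would turn the LEFT JOIN back into an inner join, because the NULL rows for users without matches would be filtered out.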

Getting all entries from N unique users

I am working with Google BigQuery and I have two tables as described by the following link:
[Tables].
user_metric contains entries with the lifetime information of all users.
user_daily_metric contains entries for each user and each of the days they have been active
My challenge is that I wish to take the first 500 unique users (represented by the candidate key user_metric.userid) and I want to create a table with entries for each of these 500 unique users and all of their days active. Resulting in a table similar to this: [Resulting table]
(Consider the user with userid = 0690894780 as not being a part of the first 500 unique users)
My current query works for creating the table I desire, in terms of columns, but I have not been able to figure out how to limit it to only entries from the 500 unique users.
Current query:
SELECT
user_metric.userid, user_metric.userProgression, user_daily_metric.missionSecondsPlayed_sum, user_daily_metric.missionMovesUsed_sum
FROM
user_metric
JOIN user_daily_metric
ON user_metric.userid = user_daily_metric.userid
ORDER BY
user_metric.userid
In advance, thank you very much for taking the time to read my question (and if I'm lucky, even reply to it) :)
Use a subquery:
SELECT um.userid, um.userProgression, udm.missionSecondsPlayed_sum,
udm.missionMovesUsed_sum
FROM (SELECT um.*
FROM user_metric um
ORDER BY um.userid
LIMIT 500
) um JOIN
user_daily_metric udm
ON um.userid = udm.userid
ORDER BY um.userid
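To see why the limit belongs in the subquery rather than on the outer query, here is a small sketch with a limit of 2 instead of 500. It runs against SQLite through Python's sqlite3 rather than BigQuery, and the sample rows are made up, so treat it as an illustration of the shape only.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for BigQuery.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_metric (userid TEXT PRIMARY KEY, userProgression INTEGER);
    CREATE TABLE user_daily_metric (
        userid TEXT, missionSecondsPlayed_sum INTEGER, missionMovesUsed_sum INTEGER);
    INSERT INTO user_metric VALUES ('a', 5), ('b', 9), ('c', 2);
    INSERT INTO user_daily_metric VALUES
        ('a', 100, 10), ('a', 200, 20), ('b', 300, 30), ('c', 400, 40);
""")

first_n_users = 2  # would be 500 in the question

# The LIMIT applies to users before the join, so each selected user
# keeps all of their daily rows.
daily_rows = conn.execute("""
    SELECT um.userid, um.userProgression,
           udm.missionSecondsPlayed_sum, udm.missionMovesUsed_sum
    FROM (SELECT userid, userProgression
          FROM user_metric
          ORDER BY userid
          LIMIT ?) um
    JOIN user_daily_metric udm
      ON um.userid = udm.userid
    ORDER BY um.userid, udm.missionSecondsPlayed_sum
""", (first_n_users,)).fetchall()

print(daily_rows)
```

User 'a' contributes two rows and user 'b' one; user 'c' is outside the first two users and does not appear. A LIMIT on the outer query would instead cut off after N joined rows, not N users.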

How to design for higher scalability, in SQL Server databases that need set operations?

Imagine a movie application that recommends the next movie to users based on this very simple algorithm:
Movie should be new to user
User has not marked the movie as "not interested"
This is a simple design of SQL Server's database:
Movies:
Id bigint
Name nvarchar(100)
SeenMovies:
Id bigint
UserId bigint
MovieId bigint
NotInterestedFlags:
Id bigint
UserId bigint
MovieId bigint
To get the next movie we run this query:
select top 1 *
from Movies
where Id not in
(
select MovieId
from SeenMovies
where UserId = 89283
)
and Id not in
(
select MovieId
from NotInterestedFlags
where UserId = 89283
)
This design is getting slower and slower as the application sees more usage and more data.
So, for an imaginary database with 100K movies and over 10 million customers, how should this design change to make it scale horizontally?
The following is something like the code I would recommend.
I assume SeenMovies and NotInterestedFlags are clustered, or at the very least indexed, on UserId, and that Movies is clustered on Id. If not, adding such indexes is the first place to start.
I certainly don't see any reason why there should be poor performance per individual query with the sort of volumes you're talking about, because once we have confined the query to a specific user, both SeenMovies and NotInterestedFlags should only have at most a few thousand rows each for that user.
SELECT TOP 1
Movies.*
FROM
Users
CROSS JOIN
Movies
WHERE
NOT EXISTS
(
SELECT NULL
FROM SeenMovies
WHERE
SeenMovies.UserId = Users.Id
AND
SeenMovies.MovieId = Movies.Id
)
AND
NOT EXISTS
(
SELECT NULL
FROM NotInterestedFlags
WHERE
NotInterestedFlags.UserId = Users.Id
AND
NotInterestedFlags.MovieId = Movies.Id
)
AND
Users.Id = 89283
If this still performs poorly even with appropriate indexes, the only other thing I can imagine is first UNIONing the MovieId entries in SeenMovies and NotInterestedFlags for that UserId, and then EXCEPTing these against Movies, which might provide better performance.
On the other hand, if the problem is that the overall performance of the system is degrading under the load of many users, you might have to look at pre-preparing a list for every user of un-seen and un-blacklisted movies, from which you query the TOP 1.
And then, when a user watches a movie or blacklists it (or a new movie is added), this new table is modified at the same time as the separate SeenMovies and NotInterestedFlags tables.
Again, if that doesn't help performance enough, then you'd have to look at implementing a daily batch job, maybe, that pre-prepares a list of say 10 unseen and non-blacklisted movies per user, and this table is then queried and offered to the user one-at-a-time.
I think frankly though, if there is a prospect of you having 10 million users, you could probably afford an expert to write the code or evaluate the existing system.
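The UNION-then-EXCEPT idea mentioned above could look roughly like the following. This is only a sketch: it runs on SQLite via Python's sqlite3 (so LIMIT 1 takes the place of TOP 1), the sample rows are invented, and the index names are hypothetical. Whether it beats the NOT EXISTS form on SQL Server is something only the real execution plans can tell.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movies (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE SeenMovies (Id INTEGER PRIMARY KEY, UserId INTEGER, MovieId INTEGER);
    CREATE TABLE NotInterestedFlags (Id INTEGER PRIMARY KEY, UserId INTEGER, MovieId INTEGER);
    -- Indexes on UserId keep the per-user lookups small.
    CREATE INDEX ix_seen_user ON SeenMovies (UserId, MovieId);
    CREATE INDEX ix_flags_user ON NotInterestedFlags (UserId, MovieId);
    INSERT INTO Movies VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
    INSERT INTO SeenMovies (UserId, MovieId) VALUES (89283, 1), (89283, 2), (5, 3);
    INSERT INTO NotInterestedFlags (UserId, MovieId) VALUES (89283, 3);
""")

# Build the user's excluded set once (UNION), subtract it from all movies
# (EXCEPT), then take the first remaining Id.
next_movie = conn.execute("""
    SELECT Id FROM (
        SELECT Id FROM Movies
        EXCEPT
        SELECT MovieId FROM (
            SELECT MovieId FROM SeenMovies WHERE UserId = :uid
            UNION
            SELECT MovieId FROM NotInterestedFlags WHERE UserId = :uid
        )
    )
    ORDER BY Id
    LIMIT 1
""", {"uid": 89283}).fetchone()

print(next_movie)
```

Movie 3 was seen only by another user, but user 89283 flagged it as not interested, so the first movie left for them is 4.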
Create a cache for each user with "shortlisted" movies. An indexed view might work well for that. The point is not to run the full query each time a user wants to see the list but occasionally update the shortlist. Those individual lists together with User flags tables can be scaled horizontally by some user attribute. User Location might be a good choice here for cloud migration in the future.

What's the most efficient way to exclude possible results from an SQL query?

I have a subscription database containing Customers, Subscriptions and Publications tables.
The Subscriptions table contains ALL subscription records and each record has three flags to mark the status: isActive, isExpired and isPending. These are Booleans and only one flag can be True - this is handled by the application.
I need to identify all customers who have not renewed any magazines to which they have previously subscribed and I'm not sure that I've written the most efficient SQL query. If I find a lapsed subscription I need to ignore it if they already have an active or pending subscription for that particular magazine.
Here's what I have:
SELECT DISTINCT Customers.id, Subscriptions.publicationName
FROM Subscriptions
LEFT JOIN Customers
ON Subscriptions.id_Customer = Customers.id
LEFT JOIN Publications
ON Subscriptions.id_Publication = Publications.id
WHERE Subscriptions.isExpired = 1
AND NOT EXISTS
( SELECT * FROM Subscriptions s2
WHERE s2.id_Publication = Subscriptions.id_Publication
AND s2.id_Customer = Subscriptions.id_Customer
AND s2.isPending = 1 )
AND NOT EXISTS
( SELECT * FROM Subscriptions s3
WHERE s3.id_Publication = Subscriptions.id_Publication
AND s3.id_Customer = Subscriptions.id_Customer
AND s3.isActive = 1 )
I have just over 50,000 subscription records and this query takes almost an hour to run which tells me that there's a lot of looping or something going on where for each record the SQL engine is having to search again to find any 'isPending' and 'isActive' records.
This is my first post so please be gentle if I've missed out any information in my question :) Thanks.
I don't have your complete database structure, so I can't test the following query, but it may contain some optimization. I will leave it to you to test, but I will explain what I changed and why.
select Distinct Customers.id, Subscriptions.publicationName
from Subscriptions
join Customers on Subscriptions.id_Customer = Customers.id
join Publications
ON Subscriptions.id_Publication = Publications.id
Where Subscriptions.isExpired = 1
And Not Exists
(select * from Subscriptions s2
where s2.id_Customer = Subscriptions.id_Customer
and s2.id_Publication = Subscriptions.id_Publication
and (s2.isPending = 1 or s2.isActive = 1))
If a subscription has no matching row in Customers or Publications, the subscription information isn't useful, so I replaced the LEFT joins with plain joins.
I also combined the two Exists subqueries into one, so the Subscriptions table is probed once per row instead of twice. These are pretty intensive if I recall, so the fewer the better.
One more thing that may be worth looking into: can you return specific fields from the subquery instead of Select *? Most engines ignore the select list inside Exists, so this is unlikely to change much, but I can't confirm it for your database because I don't have an equivalent one available to test on.
I suspect there are further optimizations that could be made on this query. Eliminating the Exists clause in favor of an 'IN' clause may help, but I can't think of a way right now, seeing how you've got to match two fields (customer id and publication id). Let me know if this helps at all.
With a table of 50k rows, you should be able to run a query like this in seconds.
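For reference, a single correlated NOT EXISTS backed by a composite index is usually what gets a query like this down to seconds. Below is a minimal sketch of that shape, run on SQLite through Python's sqlite3; the index name and the sample rows are assumptions, and the Customers and Publications joins are left out to keep it short.

```python
import sqlite3

# Hypothetical sample data; the real engine and schema may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Subscriptions (
        id INTEGER PRIMARY KEY, id_Customer INTEGER, id_Publication INTEGER,
        isActive INTEGER, isExpired INTEGER, isPending INTEGER);
    -- Without an index like this, each NOT EXISTS probe scans the whole table.
    CREATE INDEX ix_sub_cust_pub ON Subscriptions (id_Customer, id_Publication);
    INSERT INTO Subscriptions
        (id_Customer, id_Publication, isActive, isExpired, isPending) VALUES
        (1, 10, 0, 1, 0),  -- customer 1 let publication 10 expire ...
        (1, 10, 1, 0, 0),  -- ... but renewed it
        (1, 11, 0, 1, 0),  -- customer 1 let publication 11 lapse
        (2, 10, 0, 1, 0),  -- customer 2 expired ...
        (2, 10, 0, 0, 1),  -- ... with a renewal pending
        (3, 12, 1, 0, 0);  -- customer 3 is simply active
""")

# One NOT EXISTS covers both the active and the pending case.
lapsed = conn.execute("""
    SELECT DISTINCT s.id_Customer, s.id_Publication
    FROM Subscriptions s
    WHERE s.isExpired = 1
      AND NOT EXISTS (
          SELECT 1
          FROM Subscriptions s2
          WHERE s2.id_Customer = s.id_Customer
            AND s2.id_Publication = s.id_Publication
            AND (s2.isActive = 1 OR s2.isPending = 1))
    ORDER BY s.id_Customer, s.id_Publication
""").fetchall()

print(lapsed)
```

Only customer 1 with publication 11 comes back, since the other expired subscriptions have an active or pending record for the same customer and publication.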

Oracle database check reservation with SQL

Hi, I am creating a database which allows users to make a reservation at a restaurant. Below is my data model for the database.
I am a little confused about how I would check for tables that are available on a given night. The restaurant has 15 tables for any night, with 4 people to a table (groups can be 4 to 6 people; groups larger than 4 take up two tables).
How would I query the database to return the tables that are available on a given night?
Thanks.
EDIT:
This is what I have tried. (Some of it is pseudocode, as I am not quite sure how to do it.)
SELECT tables.table_id
FROM tables
LEFT JOIN table_allocation
ON tables.table_id = table_allocation.table_id
WHERE table_allocation.table_id is NULL;
This returns the empty tables, since it checks for the absence of the table in table_allocation. I am not quite sure how I would do the date part of the test.
To find TABLE rows that have no TABLE_ALLOCATION rows on a given THEMED_NIGHT.THEME_NIGHT_DATE, you should be able to do something like this:
SELECT *
FROM TABLES
WHERE
TABLE_ID NOT IN (
SELECT TABLE_ALLOCATION.TABLE_ID
FROM
TABLE_ALLOCATION
JOIN RESERVATION
ON TABLE_ALLOCATION.RESERVATION_ID = RESERVATION.RESERVATION_ID
JOIN THEMED_NIGHT
ON RESERVATION.THEME_ID = THEMED_NIGHT.THEME_ID
WHERE
THEME_NIGHT_DATE = :the_date
)
In plain English:
Join TABLE_ALLOCATION, RESERVATION and THEMED_NIGHT and accept only those that are on the given date (:the_date).
Discard the TABLE rows that are related to the tuples above (NOT IN).
Those TABLE rows that remain are free for the night.
Try:
SELECT t.table_id
FROM tables t
WHERE NOT EXISTS
(SELECT NULL
FROM table_allocation a
JOIN reservation r
ON a.reservation_id = r.reservation_id and
r."TIME" between :Date and :Date+1
WHERE t.table_id = a.table_id)
Note: will only return tables that are not booked at any point on the day in question.
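To try the NOT EXISTS approach without an Oracle instance, here is a small sketch on SQLite via Python's sqlite3. The data model image is not available here, so the column name reserved_at and the sample rows are assumptions, and the date arithmetic uses SQLite's date() function where Oracle would use :Date + 1. It also uses a half-open range (>= start of day, < start of next day) so that a booking at exactly midnight of the following day is not counted.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for Oracle.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tables (table_id INTEGER PRIMARY KEY);
    CREATE TABLE reservation (reservation_id INTEGER PRIMARY KEY, reserved_at TEXT);
    CREATE TABLE table_allocation (table_id INTEGER, reservation_id INTEGER);
    INSERT INTO tables VALUES (1), (2), (3);
    INSERT INTO reservation VALUES
        (100, '2014-03-10 19:00'),   -- a group of 6 ...
        (101, '2014-03-11 20:00');
    INSERT INTO table_allocation VALUES
        (1, 100), (2, 100),          -- ... takes two tables
        (3, 101);
""")

# A table is free on :day if it has no allocation whose reservation
# falls within that day.
free_tables = conn.execute("""
    SELECT t.table_id
    FROM tables t
    WHERE NOT EXISTS (
        SELECT 1
        FROM table_allocation a
        JOIN reservation r
          ON a.reservation_id = r.reservation_id
         AND r.reserved_at >= :day
         AND r.reserved_at <  date(:day, '+1 day')
        WHERE a.table_id = t.table_id)
    ORDER BY t.table_id
""", {"day": "2014-03-10"}).fetchall()

print(free_tables)
```

On 2014-03-10 tables 1 and 2 are taken by the group of six, so only table 3 is returned.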