What's the most efficient way to exclude possible results from an SQL query?

I have a subscription database containing Customers, Subscriptions and Publications tables.
The Subscriptions table contains ALL subscription records and each record has three flags to mark the status: isActive, isExpired and isPending. These are Booleans and only one flag can be True at a time; this is enforced by the application.
I need to identify all customers who have not renewed any magazines to which they have previously subscribed and I'm not sure that I've written the most efficient SQL query. If I find a lapsed subscription I need to ignore it if they already have an active or pending subscription for that particular magazine.
Here's what I have:
SELECT DISTINCT Customers.id, Subscriptions.publicationName
FROM Subscriptions
LEFT JOIN Customers
ON Subscriptions.id_Customer = Customers.id
LEFT JOIN Publications
ON Subscriptions.id_Publication = Publications.id
WHERE Subscriptions.isExpired = 1
AND NOT EXISTS
( SELECT * FROM Subscriptions s2
WHERE s2.id_Publication = Subscriptions.id_Publication
AND s2.id_Customer = Subscriptions.id_Customer
AND s2.isPending = 1 )
AND NOT EXISTS
( SELECT * FROM Subscriptions s3
WHERE s3.id_Publication = Subscriptions.id_Publication
AND s3.id_Customer = Subscriptions.id_Customer
AND s3.isActive = 1 )
I have just over 50,000 subscription records and this query takes almost an hour to run which tells me that there's a lot of looping or something going on where for each record the SQL engine is having to search again to find any 'isPending' and 'isActive' records.
This is my first post so please be gentle if I've missed out any information in my question :) Thanks.

I don't have your complete database structure, so I can't test the following query, but it may contain some optimizations. I will leave it to you to test, and will explain what I changed and why.
select Distinct Customers.id, Subscriptions.publicationName
from Subscriptions
join Customers on Subscriptions.id_Customer = Customers.id
join Publications
ON Subscriptions.id_Publication = Publications.id
Where Subscriptions.isExpired = 1
And Not Exists
-- one correlated subquery: same customer, same publication, still active or pending
(select * from Subscriptions s2
where s2.id_Customer = Subscriptions.id_Customer
and s2.id_Publication = Subscriptions.id_Publication
and (s2.isPending = 1 or s2.isActive = 1))
If a subscription has no matching row in Customers or Publications, it isn't useful here, so I replaced the LEFT JOINs with plain (inner) JOINs. I also combined the two NOT EXISTS subqueries into one; each is a correlated subquery, and these are pretty intensive if I recall, so the fewer the better. One thing I did not change above but that may be worth looking into: the subquery uses SELECT *, and you could return a specific field (or a constant, as in SELECT 1) instead. I'm not sure it makes a measurable difference, because I don't have an equivalent DB to test on and many optimizers ignore the select list inside EXISTS, so check your engine's documentation.
I suspect there are further optimizations that could be made on this query. Replacing the NOT EXISTS clause with a NOT IN clause may help, but I can't think of a way right now, seeing how you have to match on two fields at once (the customer id and the publication id). Let me know if this helps at all.
With a table of 50k rows, you should be able to run a query like this in seconds.
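To make the combined NOT EXISTS concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the sample rows and the index name ix_sub_cust_pub are all assumptions made up for illustration (the real schema isn't shown in the question), and the composite index is only a guess at what might help the correlated lookup on a real engine.

```python
import sqlite3

# In-memory database with a hypothetical, cut-down Subscriptions table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Subscriptions (
    id             INTEGER PRIMARY KEY,
    id_Customer    INTEGER NOT NULL,
    id_Publication INTEGER NOT NULL,
    isActive       INTEGER NOT NULL DEFAULT 0,
    isExpired      INTEGER NOT NULL DEFAULT 0,
    isPending      INTEGER NOT NULL DEFAULT 0
);

-- Assumed index: lets the correlated subquery seek on (customer, publication).
CREATE INDEX ix_sub_cust_pub ON Subscriptions (id_Customer, id_Publication);

INSERT INTO Subscriptions
    (id_Customer, id_Publication, isActive, isExpired, isPending)
VALUES
    (1, 10, 0, 1, 0),  -- customer 1 lapsed on publication 10 ...
    (1, 10, 1, 0, 0),  -- ... but has an active renewal
    (2, 10, 0, 1, 0),  -- customer 2 lapsed and never renewed
    (3, 20, 0, 1, 0),  -- customer 3 lapsed on publication 20 ...
    (3, 20, 0, 0, 1),  -- ... but has a pending renewal
    (3, 10, 0, 1, 0);  -- customer 3 lapsed on publication 10, no renewal
""")

# One NOT EXISTS, correlated on both customer and publication.
lapsed = conn.execute("""
SELECT DISTINCT s.id_Customer, s.id_Publication
FROM Subscriptions s
WHERE s.isExpired = 1
  AND NOT EXISTS (
      SELECT 1
      FROM Subscriptions s2
      WHERE s2.id_Customer = s.id_Customer
        AND s2.id_Publication = s.id_Publication
        AND (s2.isActive = 1 OR s2.isPending = 1)
  )
ORDER BY s.id_Customer, s.id_Publication
""").fetchall()

print(lapsed)
```

Only the (customer, publication) pairs with an expired record and no active or pending record for the same pair come back. Whether the index actually helps depends on the engine and the data, so compare the query plans before and after adding it.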

Related

SQLite selecting transactions that do / do not meet a particular criteria

I am trying to extract data from a GnuCash SQLite database. Relevant tables include accounts, transactions, and splits. Simplistically, accounts contain transactions which contain splits, and each split points back to an account.
The transactions need to be processed differently depending on whether each one does or does not include a particular kind of transaction fee—in this case whether or not the transaction contains a split linked to account 8190-000.
I've set up two queries, one that handles transactions with the transaction fee, and one that handles transactions without the transaction fee. The queries work, but they are awkward and wordy, and I'm sure there is a better way to do this. I did see not exists in this answer, but could not figure out how to make it work in this situation.
My current queries look like this:
-- Find all transactions containing a split with account code 8190-000
select tx_guid from transactions
inner join
(select tx_guid from
(splits inner join accounts on splits.account_guid = accounts.guid)
where accounts.code = "8190-000") fee_transactions
on fee_transactions.tx_guid = transactions.guid;
-- Find all transactions not containing a split with account code 8190-000
select guid from transactions
except
select tx_guid from transactions
inner join
(select tx_guid from
(splits inner join accounts on splits.account_guid = accounts.guid)
where accounts.code = "8190-000") fee_transactions
on fee_transactions.tx_guid = transactions.guid;
Given that I need to use these results in other queries, what is a simpler and more succinct way to obtain these lists of transactions?
You can use EXISTS for your 1st query like this:
SELECT t.*
FROM transactions t
WHERE EXISTS (
SELECT 1
FROM splits s INNER JOIN accounts a
ON s.account_guid = a.guid
WHERE a.code = '8190-000' AND s.tx_guid = t.guid
);
This assumes tx_guid is a column of splits, which is what your own queries suggest. If it actually lives in accounts, change s.tx_guid to a.tx_guid.
Also, change to NOT EXISTS for your 2nd query.
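A small runnable sketch of both variants, using Python's sqlite3 module. The three tables are stripped down to the columns mentioned in the question, the guid values are invented, and the helper name transactions_by_fee is hypothetical; it assumes tx_guid is a column of splits.

```python
import sqlite3

# Minimal stand-in for the GnuCash tables (only the columns used here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts     (guid TEXT PRIMARY KEY, code TEXT);
CREATE TABLE transactions (guid TEXT PRIMARY KEY);
CREATE TABLE splits       (guid TEXT PRIMARY KEY, tx_guid TEXT, account_guid TEXT);

INSERT INTO accounts VALUES ('a-fee', '8190-000'), ('a-bank', '1000-000');
INSERT INTO transactions VALUES ('t1'), ('t2');
INSERT INTO splits VALUES
    ('s1', 't1', 'a-bank'),
    ('s2', 't1', 'a-fee'),   -- t1 contains a fee split
    ('s3', 't2', 'a-bank');  -- t2 does not
""")

# Correlated subquery shared by both queries; t is the outer transactions alias.
FEE_SPLIT = """
    SELECT 1
    FROM splits s
    INNER JOIN accounts a ON s.account_guid = a.guid
    WHERE a.code = :code AND s.tx_guid = t.guid
"""

def transactions_by_fee(has_fee, code="8190-000"):
    """Return transaction guids that do (or do not) contain a split on `code`."""
    op = "EXISTS" if has_fee else "NOT EXISTS"
    sql = f"SELECT t.guid FROM transactions t WHERE {op} ({FEE_SPLIT}) ORDER BY t.guid"
    return [row[0] for row in conn.execute(sql, {"code": code})]

with_fee = transactions_by_fee(True)
without_fee = transactions_by_fee(False)
print(with_fee, without_fee)
```

Because the two queries differ only in EXISTS versus NOT EXISTS, either one can be dropped into a larger query as a WHERE condition instead of being joined in as a derived table.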

Multiply total number of values in column by value in a different table

I am trying to count all the values in one column and then multiply this number by a value in a different table. So far I have:
SELECT CLUB_FEE * COUNT(MEMBER_ID) AS VALUE
FROM CLUB, SUBSCRIPTION
WHERE CLUB_ID = 'CLUB1';
This is not working however, can anyone please help?
I also need help doing this for multiple clubs. Is it possible to do it all in one statement for all clubs and then get the average?
Presumably, you intend something like this:
SELECT MAX(c.CLUB_FEE) * COUNT(MEMBER_ID) AS VALUE
FROM CLUB c JOIN
SUBSCRIPTION s
ON c.CLUB_ID = s.CLUB_ID
WHERE c.CLUB_ID = 'CLUB1';
You can also write this as:
SELECT SUM(c.CLUB_FEE) AS VALUE
FROM CLUB c JOIN
SUBSCRIPTION s
ON c.CLUB_ID = s.CLUB_ID
WHERE c.CLUB_ID = 'CLUB1';
I thought the first version would be clearer, because the OP specifies COUNT() in the question.
If you want it for all clubs that have subscribers:
SELECT c.CLUB_ID, SUM(c.CLUB_FEE) AS VALUE
FROM CLUB c JOIN
SUBSCRIPTION s
ON c.CLUB_ID = s.CLUB_ID
GROUP BY c.CLUB_ID;
From inspecting the explain plans, it seems the following version may be a bit more efficient (since it avoids a join and uses only one aggregation). If you need this for ALL clubs at the same time, then probably all solutions will have the same "optimizer cost" (they will all do a join at some point).
select club_fee * (select count(member_id) from subscription where club_id = 'CLUB1')
from club
where club_id = 'CLUB1'
So now the only aggregate function is pushed into a subquery and the rest does not need either a join or another aggregate function.
Of course, this only matters if performance is important; it may very well not be.
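For the "all clubs, then the average" part of the question, one possible approach is to compute the per-club totals in a derived table and average over it. This is only a sketch with Python's sqlite3 module; the column types and sample rows are assumptions, since the real CLUB and SUBSCRIPTION definitions aren't shown.

```python
import sqlite3

# Hypothetical CLUB / SUBSCRIPTION tables with a few sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CLUB         (CLUB_ID TEXT PRIMARY KEY, CLUB_FEE INTEGER);
CREATE TABLE SUBSCRIPTION (MEMBER_ID INTEGER, CLUB_ID TEXT);

INSERT INTO CLUB VALUES ('CLUB1', 10), ('CLUB2', 25), ('CLUB3', 40);
INSERT INTO SUBSCRIPTION VALUES
    (1, 'CLUB1'), (2, 'CLUB1'), (3, 'CLUB1'),
    (1, 'CLUB2'), (4, 'CLUB2');   -- CLUB3 has no subscribers
""")

# Fee multiplied by member count, one row per club that has subscribers.
per_club = conn.execute("""
SELECT c.CLUB_ID, c.CLUB_FEE * COUNT(s.MEMBER_ID) AS VALUE
FROM CLUB c
JOIN SUBSCRIPTION s ON c.CLUB_ID = s.CLUB_ID
GROUP BY c.CLUB_ID, c.CLUB_FEE
ORDER BY c.CLUB_ID
""").fetchall()

# Average of those per-club totals, using the same query as a derived table.
average = conn.execute("""
SELECT AVG(VALUE)
FROM (
    SELECT c.CLUB_FEE * COUNT(s.MEMBER_ID) AS VALUE
    FROM CLUB c
    JOIN SUBSCRIPTION s ON c.CLUB_ID = s.CLUB_ID
    GROUP BY c.CLUB_ID, c.CLUB_FEE
) AS totals
""").fetchone()[0]

print(per_club, average)
```

Note that the inner join leaves out clubs with no subscribers (CLUB3 here), so they don't pull the average down. If they should count as zero, a LEFT JOIN from CLUB would be the thing to try.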

Simple.Data - How to apply WHERE clauses to joined tables

I'm trying to use Simple.Data as my ADO, but I've run into a problem trying to put together a query that joins a couple of tables, then filters the results based on values in the non-primary tables.
The scenario is a job application app (but jobs are like a specific task to be done on a given day). There are 3 relevant tables: jobs, applications and application_history. There can be many applications for each record in the jobs table, and many application_history records for each application. In the application_history table, there's a status column that changes as each application gets sent, offered and finally accepted.
So I want a query that returns all the accepted applications that are for jobs in the future; i.e. where the date column in the jobs table is in the future and where there's an associated record in the application_history table where the status column is 5 (meaning accepted).
If this was plain old SQL, I'd use this query:
SELECT A.* FROM application AS A
INNER JOIN application_history AS AH ON AH.application_id = A.id
INNER JOIN job AS J ON J.id = A.job_id
WHERE AH.status_id = 3 AND J.date > date('now')
But I want to know how to achieve the same thing using Simple.Data. For bonus points, if you could start by ignoring the 'job must be in the future' step, that will help me understand what's going on.
As a reference: Simple.Data documentation especially the part about explicit joins.
You should be able to do something like this:
//db is your Simple.Data Database object
db.application
.Join(db.application_history)
.On(db.application.id == db.application_history.application_id)
.Join(db.job)
.On(db.application.job_id == db.job.id)
.Where(db.application_history.status_id == 3 && db.job.date > DateTime.Now);
I'm not sure whether or not Simple.Data knows how to handle the Date part.

Activerecord query returning doubles while using uniq

I am running the following query with the goal of returning a unique set of customer objects:
Customer.joins(:projects).select('customers.*, projects.finish_date').where("projects.closed = false").uniq
However, this code will generate duplicates if a customer has more than one active project (i.e. closed = false). If I remove projects.finish_date from the select clause, this query works as intended. However, I need it in there to be able to sort on that column.
How can I make this query return a unique set of customers?
Asking for a unique set of customers from this query doesn't completely make sense, and probably isn't what you want.
The problem is that you're joining against the projects table, at which point there may be several rows for the same customer with different project finish_dates. These rows are unique and will be returned as multiple unique Customer objects, each with a different finish_date.
If you only want one of these, how is Rails to determine which one? Wouldn't it be a problem if you only had one customer object with one finish_date returned if there are really 10 projects for that customer, each with a different finish_date?
Instead, you probably want something like this:
customers = Customer.joins(:projects).select('customers.*, projects.finish_date').where("projects.closed = false").uniq
customers.group_by(&:id)
This groups all of your same customers together.
OR, you might want:
projects = Project.where(closed: false).includes(:customer)
customers = projects.map(&:customer).uniq
In either case, you're producing a unique set of customers from the superset of all customer-project joins.
RE Your comments:
If you want to get a list of customers with their most recent associated project, you could use a sub query in your where:
select customers.*, projects.finish_date from customers
inner join projects on projects.customer_id = customers.id
where projects.id = (
select p2.id from projects p2
where p2.customer_id = customers.id
and p2.closed = false
order by p2.finish_date desc
limit 1
)
You can express this using ActiveRecord by embedding the sub-query in a where:
Customer.joins(:projects)
.select('customers.*, projects.finish_date as finish_date')
.where('projects.id = (select p2.id from projects p2 where p2.customer_id = customers.id and p2.closed = false order by p2.finish_date desc limit 1)')
I have no idea how this will perform for you, but I suspect poorly.
I would always stick to a simple includes and in-Ruby filter before attempting to optimize with SQL.
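To check what the correlated sub-query actually returns, here is a minimal sketch in Python's sqlite3 module rather than Rails. The customers and projects tables and their rows are invented for illustration, and closed is stored as 0/1 because that is how SQLite represents booleans.

```python
import sqlite3

# Hypothetical customers/projects tables; closed is 0 (open) or 1 (closed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE projects (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER,
    finish_date TEXT,
    closed      INTEGER
);

INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
INSERT INTO projects VALUES
    (1, 1, '2024-01-10', 0),
    (2, 1, '2024-03-05', 0),  -- Acme's latest open project
    (3, 2, '2024-02-01', 0),  -- Globex's only open project
    (4, 2, '2024-06-01', 1),  -- closed, so ignored
    (5, 3, '2024-04-01', 1);  -- Initech has no open project at all
""")

# One row per customer: the open project with the latest finish_date.
rows = conn.execute("""
SELECT customers.id, customers.name, projects.finish_date
FROM customers
INNER JOIN projects ON projects.customer_id = customers.id
WHERE projects.id = (
    SELECT p2.id
    FROM projects p2
    WHERE p2.customer_id = customers.id
      AND p2.closed = 0
    ORDER BY p2.finish_date DESC
    LIMIT 1
)
ORDER BY projects.finish_date
""").fetchall()

print(rows)
```

Each customer appears at most once, so the result can be sorted on finish_date without duplicates. Customers with no open project drop out entirely.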

Oracle database check reservation with SQL

Hi, I am creating a database which allows users to make a reservation at a restaurant. Below is my data model for the database.
My question is: I am a little confused about how I would check for tables that are available on a given night. The restaurant has 15 tables on any night, with 4 people to a table (groups can be 4 to 6 people; groups larger than 4 will take up two tables).
How would I query the database to return the tables which are available on a given night?
Thanks.
EDIT::
This is what I have tried. (Some of it is pseudocode as I am not quite sure how to do it.)
SELECT tables.table_id
FROM tables
LEFT JOIN table_allocation
ON tables.table_id = table_allocation.table_id
WHERE table_allocation.table_id is NULL;
This returns only the tables that have no allocation at all, as it is checking for the absence of the table in table_allocation. I am not quite sure how I would do the date part of the test.
To find TABLES rows that have no TABLE_ALLOCATION rows on a given THEMED_NIGHT.THEME_NIGHT_DATE, you should be able to do something like this:
SELECT *
FROM TABLES
WHERE
TABLE_ID NOT IN (
SELECT TABLE_ALLOCATION.TABLE_ID
FROM
TABLE_ALLOCATION
JOIN RESERVATION
ON TABLE_ALLOCATION.RESERVATION_ID = RESERVATION.RESERVATION_ID
JOIN THEMED_NIGHT
ON RESERVATION.THEME_ID = THEMED_NIGHT.THEME_ID
WHERE
THEME_NIGHT_DATE = :the_date
)
In plain English:
Join TABLE_ALLOCATION, RESERVATION and THEMED_NIGHT and accept only those that are on the given date (:the_date).
Discard the TABLE rows that are related to the tuples above (NOT IN).
Those TABLE rows that remain are free for the night.
Try:
SELECT t.table_id
FROM tables t
WHERE NOT EXISTS
(SELECT NULL
FROM table_allocation a
JOIN reservation r
ON a.reservation_id = r.reservation_id and
r.`TIME` between :Date and :Date+1
WHERE t.table_id = a.table_id)
Note: will only return tables that are not booked at any point on the day in question.
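The NOT EXISTS approach can be tried out on a toy schema. This sketch uses Python's sqlite3 module instead of Oracle, and the data model image from the question isn't available, so every table and column name here (including reserved_on) and the helper free_tables are assumptions made for illustration.

```python
import sqlite3

# Hypothetical, simplified reservation schema; dates stored as ISO strings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tables           (table_id INTEGER PRIMARY KEY);
CREATE TABLE reservation      (reservation_id INTEGER PRIMARY KEY, reserved_on TEXT);
CREATE TABLE table_allocation (table_id INTEGER, reservation_id INTEGER);

INSERT INTO tables VALUES (1), (2), (3), (4);
INSERT INTO reservation VALUES
    (100, '2024-05-01'),
    (101, '2024-05-01'),  -- a group of 5 or 6, so it gets two tables
    (102, '2024-05-02');
INSERT INTO table_allocation VALUES (1, 100), (2, 101), (3, 101), (1, 102);
""")

def free_tables(night):
    """Return ids of tables with no allocation for a reservation on `night`."""
    sql = """
    SELECT t.table_id
    FROM tables t
    WHERE NOT EXISTS (
        SELECT 1
        FROM table_allocation a
        JOIN reservation r ON a.reservation_id = r.reservation_id
        WHERE a.table_id = t.table_id
          AND r.reserved_on = :night
    )
    ORDER BY t.table_id
    """
    return [row[0] for row in conn.execute(sql, {"night": night})]

print(free_tables("2024-05-01"))
print(free_tables("2024-05-02"))
```

The date test sits inside the subquery, next to the correlation on table_id. That is the piece missing from the LEFT JOIN ... IS NULL attempt in the question, which can only find tables that have never been allocated.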