Perform SQL query and then join - sql

Let's say I have two tables:
ticket with columns [id, date, userid]; userid is a foreign key that references user.id
user with columns [id, name]
Because the tables are very large, I would like to first filter the ticket table by date:
SELECT id FROM ticket WHERE date >= 'some date'
Then I would like to do a left join with the user table. Is there a way to do this? I tried the following, but it doesn't work:
select ticket.id, user.name from ticket where ticket.date >= '2015-05-18' left join user on ticket.userid=user.id;
Apologies if it's a stupid question. I have searched on Google, but most answers involve subqueries after the join, instead of what I want, which is to perform the query first and then do the join on the rows it returns.
To make things a little clearer: the problem I am facing is that I have large tables and the join takes time. I am joining 3 tables and the query takes almost 3 seconds. What's the best way to reduce the time? Instead of joining and then applying the WHERE clause, I figured I should first select a small subset and then join.

Simply put everything in the right order:
select - from - where - group by - having - order by
select ticket.id, user.name
from ticket left join user on ticket.userid=user.id
where ticket.date >= '2015-05-18'
Or put the filter in a derived table:
select ticket.id, user.name
from
(
select * from ticket
where ticket.date >= '2015-05-18'
) as ticket
left join user on ticket.userid=user.id
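As a quick check that the two forms return the same rows, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for the asker's database, and the rows are hypothetical. Most query planners push the date predicate down into the ticket scan either way, so the derived table is mainly a readability choice. An index on ticket.date is what typically makes the filter cheap.

```python
import sqlite3

# In-memory database with the two tables from the question (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "user" (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE ticket (
        id INTEGER PRIMARY KEY,
        "date" TEXT,                      -- ISO-8601 strings compare correctly as text
        userid INTEGER REFERENCES "user"(id)
    );
    INSERT INTO "user" VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO ticket VALUES
        (10, '2015-05-17', 1),
        (11, '2015-05-18', 1),
        (12, '2015-05-20', 2),
        (13, '2015-05-21', 99);           -- no matching user: LEFT JOIN keeps it with a NULL name
""")

# Form 1: join first, filter in WHERE.
plain = conn.execute("""
    SELECT ticket.id, "user".name
    FROM ticket LEFT JOIN "user" ON ticket.userid = "user".id
    WHERE ticket."date" >= '2015-05-18'
    ORDER BY ticket.id
""").fetchall()

# Form 2: filter tickets in a derived table, then join.
derived = conn.execute("""
    SELECT t.id, "user".name
    FROM (SELECT * FROM ticket WHERE "date" >= '2015-05-18') AS t
    LEFT JOIN "user" ON t.userid = "user".id
    ORDER BY t.id
""").fetchall()

print(plain)             # [(11, 'alice'), (12, 'bob'), (13, None)]
print(plain == derived)  # True
```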

Related

PostgreSQL - Optimize subquery by referencing outer query

I have two tables: users and orders. Orders is a massive table (>100k entries) and users is relatively small (around 400 entries).
I want to find the number of orders per user. The column linking both tables is the email column.
I can achieve this with the following query:
SELECT sub_1.num, u.id FROM users AS u,
(SELECT cust_email AS email, COUNT(purchaseid) AS num
FROM orders AS o
WHERE o.status = 'COMPLETED'
GROUP BY cust_email) sub_1
WHERE u.email = sub_1.email
ORDER BY createdate DESC NULLS LAST
However, as mentioned previously, the orders table is very large, so I would ideally want to add another condition to the WHERE clause in the subquery to only retrieve those emails that exist in the users table.
I can simply add the user table to the subquery like this:
SELECT sub_1.num, u.id FROM users AS u,
(SELECT cust_email AS email, COUNT(purchaseid) AS num
FROM orders AS o, users AS u
WHERE o.status = 'COMPLETED'
and o.cust_email = u.email
GROUP BY cust_email) sub_1
WHERE u.email = sub_1.email
ORDER BY createdate DESC NULLS LAST
This does speed up the query, but sometimes the outer query is much more complex than just selecting all entries from the users table, so this solution does not always work. The goal would be to somehow link the outer and the inner query. I've thought of using joins but cannot figure out how to get it to work.
I noticed that the first query seems to perform faster than I expected, so perhaps PostgreSQL is already smart enough to connect the outer and inner tables. However, I was hoping that someone could shed some light on how this works and what the best way to perform these types of subqueries is.
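One way to link the inner query to the outer one is a correlated subquery. The count is computed for each user row using that row's email, so only orders belonging to users that the outer query returns are ever counted. Below is a minimal sketch with Python's sqlite3 and hypothetical data. Table and column names follow the question, and createdate is omitted. In PostgreSQL, a LEFT JOIN LATERAL subquery can express the same per-row link. Unlike the original inner join, this version returns 0 for users with no completed orders. Whether it beats the GROUP BY version depends on the planner and on an index on orders(cust_email), so compare both with EXPLAIN ANALYZE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE orders (purchaseid INTEGER PRIMARY KEY, cust_email TEXT, status TEXT);
    -- An index on the correlation column lets each per-user lookup avoid a full scan.
    CREATE INDEX orders_cust_email ON orders (cust_email);
    INSERT INTO users VALUES (1, 'a@x.com'), (2, 'b@x.com'), (3, 'c@x.com');
    INSERT INTO orders VALUES
        (100, 'a@x.com', 'COMPLETED'),
        (101, 'a@x.com', 'COMPLETED'),
        (102, 'a@x.com', 'CANCELLED'),
        (103, 'b@x.com', 'COMPLETED'),
        (104, 'zz@x.com', 'COMPLETED');   -- email not in users: never counted
""")

# The inner query references u.email from the outer query (a correlated subquery).
rows = conn.execute("""
    SELECT u.id,
           (SELECT COUNT(o.purchaseid)
            FROM orders AS o
            WHERE o.cust_email = u.email
              AND o.status = 'COMPLETED') AS num
    FROM users AS u
    ORDER BY u.id
""").fetchall()

print(rows)  # [(1, 2), (2, 1), (3, 0)]
```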

trouble with inner joining 2 tables

I have a database with two tables, 'enlistments' and 'users'. The enlistments table has a user_id column and the users table has a name column. I want to get the name of the user that belongs to each id.
I know I need to do this with an inner join like this:
SELECT enlistments.round_id, users.name
FROM enlistments
INNER JOIN users
ON enlistments.user_id=users.name
WHERE enlistments.activity_id = 1;
However I get this error: Warning: #1292 Truncated incorrect DOUBLE value
I did some research and found out it has to do with comparing an int with a string but I don't know how to solve the problem.
The ON clause is the condition used to join the tables. Here it should be enlistments.user_id = users.id, not users.name.
select e.round_id
,u.name
from enlistments e join users u on u.id = e.user_id
where activity_id = 1
Result:
round_id | name
1        | test2
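Here is a minimal sketch of the corrected join using Python's sqlite3. The rows are hypothetical, chosen to reproduce the result above, because the original fiddle data is not available. Unlike MySQL, SQLite does not warn when an integer is compared with text, so the broken condition simply matches nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE enlistments (round_id INTEGER, user_id INTEGER, activity_id INTEGER);
    INSERT INTO users VALUES (1, 'test1'), (2, 'test2');
    INSERT INTO enlistments VALUES (1, 2, 1), (2, 1, 2);
""")

# Correct: match the foreign key against the primary key.
fixed = conn.execute("""
    SELECT e.round_id, u.name
    FROM enlistments e JOIN users u ON u.id = e.user_id
    WHERE e.activity_id = 1
""").fetchall()

# Original mistake: comparing an integer id with a name.
broken = conn.execute("""
    SELECT e.round_id, u.name
    FROM enlistments e JOIN users u ON e.user_id = u.name
    WHERE e.activity_id = 1
""").fetchall()

print(fixed)   # [(1, 'test2')]
print(broken)  # []
```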
To validate that you are pulling back exactly the data you want, I usually give every returned column an alias and make sure to return the join columns as well. It's good practice to label which table each returned column came from.
SELECT
enlistments.user_id as Enlistments_UserID,
users.id as Users_ID,
enlistments.round_id as Enlistments_RoundID,
users.name as Users_Name
FROM enlistments
INNER JOIN users
ON enlistments.user_id=users.id
WHERE enlistments.activity_id = 1;
SELECT EN.round_id, US.name
FROM enlistments EN
INNER JOIN users US
ON US.name= CAST(EN.user_id AS CHAR)
WHERE EN.activity_id = 1
What you need is the CAST function, which converts a value from one data type to another: pass the value, followed by AS and the target data type. In MySQL (where error #1292 comes from), the string target type is CHAR; MySQL's CAST does not accept VARCHAR, although SQL Server's does.
In your case:
SELECT CAST(123456 AS CHAR)
-- RETURNS : '123456'
That said, I'm not sure you can join these two tables with the join condition you are using.
For more help, please share some sample data.
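For reference, here is what CAST does, sketched with Python's sqlite3 (SQLite spells the target type TEXT; MySQL uses CHAR). The sketch also shows why casting alone does not rescue this join. After the cast, an id such as 2 becomes the string '2', which only equals a name that is literally '2'. The data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CAST converts a value to another type; here an integer to text.
as_text = conn.execute("SELECT CAST(123456 AS TEXT)").fetchone()[0]
print(repr(as_text))  # '123456'

conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE enlistments (round_id INTEGER, user_id INTEGER, activity_id INTEGER);
    INSERT INTO users VALUES (1, 'test1'), (2, 'test2');
    INSERT INTO enlistments VALUES (1, 2, 1);
""")

# Comparing the cast id with the name: '2' never equals 'test2'.
cast_join = conn.execute("""
    SELECT en.round_id, us.name
    FROM enlistments en
    INNER JOIN users us ON us.name = CAST(en.user_id AS TEXT)
    WHERE en.activity_id = 1
""").fetchall()
print(cast_join)  # []
```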

Most efficient way to get records from a table for which a record exists in another table for each month

I have two tables as below:
User: User_ID, User_name and some other columns (has approx 1000 rows)
Fee: Created_By_User_ID, Created_Date and many other columns (has 17 million records)
Fee table does not have any index (and I can't create one).
I need a list of users for each month of a year (say 2016) who have created at least one fee record.
I do have a working query, shown below, but it takes a long time to execute. Can someone help me with a better query? Maybe using an EXISTS clause (I tried one, but it still takes time because it scans the Fee table).
SELECT MONTH(f.Created_Date), f.Created_By_User_ID
FROM Fees f
JOIN [User] u ON f.Created_By_User_ID= u.User_ID
WHERE f.Created_Date BETWEEN '2016-01-01' AND '2016-12-31'
Without an index, any query will need at least one full scan of the Fee table. If you join directly, as in your original query, every matching fee row goes through the join, and many of those rows are redundant for this purpose. The same happens when you use an inner query as suggested by Mansoor.
An optimization could be to decrease the number of rows on which the joins are happening.
Assuming that the User table contains only one record per user and the Fee table has multiple records per user, we can use a CTE to first find the distinct months in which each user created a fee record.
We can then join on top of this CTE. This reduces the work done by the join and should give a slightly better run time on a large data set.
Try this:
WITH CTE_UserMonthwiseFeeRecords AS
(
SELECT DISTINCT Created_By_User_ID, MONTH(Created_Date) AS FeeMonth
FROM Fee
WHERE Created_Date BETWEEN '2016-01-01' AND '2016-12-31'
)
SELECT User_name, FeeMonth
FROM CTE_UserMonthwiseFeeRecords f
INNER JOIN [User] u ON f.Created_By_User_ID= u.User_ID
Also, you haven't said whether you need the user names. If only the id is required to find the distinct users creating fee records per month, you can use just the query inside the CTE, without the JOIN:
SELECT DISTINCT Created_By_User_ID, MONTH(Created_Date) AS FeeMonth
FROM Fee
WHERE Created_Date BETWEEN '2016-01-01' AND '2016-12-31'
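Here is a minimal sketch of both variants using Python's sqlite3 and hypothetical rows. SQLite has no MONTH() function, so strftime('%m', ...) is swapped in for it. The sketch shows that DISTINCT collapses repeated fees from the same user and month before the join. It also shows that the ids-only version keeps ids that have no row in [User].

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE [User] (User_ID INTEGER PRIMARY KEY, User_name TEXT);
    CREATE TABLE Fee (Created_By_User_ID INTEGER, Created_Date TEXT);  -- no index, as in the question
    INSERT INTO [User] VALUES (1, 'ann'), (2, 'ben'), (3, 'cat');
    INSERT INTO Fee VALUES
        (1, '2016-01-05'), (1, '2016-01-20'),   -- two fees, same user and month
        (1, '2016-03-02'), (2, '2016-03-15'),
        (2, '2015-12-31'),                      -- outside 2016
        (99, '2016-02-01');                     -- user id not present in [User]
""")

# CTE: reduce Fee to distinct (user, month) pairs, then join the much smaller result.
with_names = conn.execute("""
    WITH CTE_UserMonthwiseFeeRecords AS (
        SELECT DISTINCT Created_By_User_ID,
               CAST(strftime('%m', Created_Date) AS INTEGER) AS FeeMonth
        FROM Fee
        WHERE Created_Date BETWEEN '2016-01-01' AND '2016-12-31'
    )
    SELECT u.User_name, f.FeeMonth
    FROM CTE_UserMonthwiseFeeRecords f
    INNER JOIN [User] u ON f.Created_By_User_ID = u.User_ID
    ORDER BY f.FeeMonth, u.User_name
""").fetchall()

# Ids only: no join needed (note that it also keeps ids missing from [User]).
ids_only = conn.execute("""
    SELECT DISTINCT Created_By_User_ID,
           CAST(strftime('%m', Created_Date) AS INTEGER) AS FeeMonth
    FROM Fee
    WHERE Created_Date BETWEEN '2016-01-01' AND '2016-12-31'
    ORDER BY FeeMonth, Created_By_User_ID
""").fetchall()

print(with_names)  # [('ann', 1), ('ann', 3), ('ben', 3)]
print(ids_only)    # [(1, 1), (99, 2), (1, 3), (2, 3)]
```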
Try the query below:
SELECT MONTH(f.Created_Date), f.Created_By_User_ID
FROM Fees f
WHERE EXISTS(SELECT 1 FROM [User] u WHERE f.Created_By_User_ID= u.User_ID
AND DATEDIFF(DAY,f.Created_Date,'2016-01-01') <= 0 AND
DATEDIFF(DAY,f.Created_Date,'2016-12-31') >= 0)
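Below is a sketch of the EXISTS idea in Python's sqlite3, using hypothetical data. It includes two labeled variations. First, the date filter sits in the outer WHERE as a half-open range instead of DATEDIFF calls inside the subquery, so EXISTS only checks that the user exists. The half-open range also keeps rows timestamped later on 31 December. Second, DISTINCT is added so that each user appears once per month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE [User] (User_ID INTEGER PRIMARY KEY, User_name TEXT);
    CREATE TABLE Fees (Created_By_User_ID INTEGER, Created_Date TEXT);
    INSERT INTO [User] VALUES (1, 'ann'), (2, 'ben');
    INSERT INTO Fees VALUES
        (1, '2016-01-05'), (1, '2016-01-20'),
        (2, '2016-03-15'), (2, '2015-12-31'),
        (99, '2016-02-01');                     -- no such user: filtered out by EXISTS
""")

# Semi-join: each fee row is kept only if a matching user exists.
rows = conn.execute("""
    SELECT DISTINCT CAST(strftime('%m', f.Created_Date) AS INTEGER) AS FeeMonth,
           f.Created_By_User_ID
    FROM Fees f
    WHERE f.Created_Date >= '2016-01-01' AND f.Created_Date < '2017-01-01'
      AND EXISTS (SELECT 1 FROM [User] u WHERE u.User_ID = f.Created_By_User_ID)
    ORDER BY FeeMonth, f.Created_By_User_ID
""").fetchall()

print(rows)  # [(1, 1), (3, 2)]
```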
You could try this approach to reduce the query run time. However, it duplicates a large amount of data into a separate table (Temp_Fees), and every DML operation on Fees or User then requires truncating and reloading Temp_Fees.
Select * into Temp_Fees from (SELECT MONTH(f.Created_Date) as Created_MONTH, f.Created_By_User_ID
FROM Fees f
WHERE f.Created_Date BETWEEN '2016-01-01' AND '2016-12-31' ) AS src
SELECT f.Created_MONTH, f.Created_By_User_ID
FROM Temp_Fees f
JOIN [User] u ON f.Created_By_User_ID= u.User_ID
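Here is a sketch of the same idea in Python's sqlite3. It assumes CREATE TEMP TABLE ... AS SELECT as SQLite's counterpart of SELECT ... INTO, and DELETE in place of TRUNCATE, which SQLite lacks. The data is hypothetical. The sketch also shows the maintenance cost described above: a fee inserted after the snapshot is invisible until Temp_Fees is rebuilt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE [User] (User_ID INTEGER PRIMARY KEY, User_name TEXT);
    CREATE TABLE Fees (Created_By_User_ID INTEGER, Created_Date TEXT);
    INSERT INTO [User] VALUES (1, 'ann'), (2, 'ben');
    INSERT INTO Fees VALUES (1, '2016-01-05'), (2, '2016-03-15'), (2, '2015-12-31');

    -- Materialize the filtered 2016 rows once (SQLite's analogue of SELECT ... INTO).
    CREATE TEMP TABLE Temp_Fees AS
    SELECT CAST(strftime('%m', Created_Date) AS INTEGER) AS Created_MONTH,
           Created_By_User_ID
    FROM Fees
    WHERE Created_Date BETWEEN '2016-01-01' AND '2016-12-31';
""")

def monthly_users():
    """Join the small snapshot table instead of the full Fees table."""
    return conn.execute("""
        SELECT f.Created_MONTH, f.Created_By_User_ID
        FROM Temp_Fees f
        JOIN [User] u ON f.Created_By_User_ID = u.User_ID
        ORDER BY f.Created_MONTH, f.Created_By_User_ID
    """).fetchall()

before = monthly_users()

# New DML on Fees is not reflected until Temp_Fees is rebuilt.
conn.execute("INSERT INTO Fees VALUES (1, '2016-06-01')")
stale = monthly_users()

conn.executescript("""
    DELETE FROM Temp_Fees;   -- SQLite has no TRUNCATE; DELETE without WHERE empties the table
    INSERT INTO Temp_Fees
    SELECT CAST(strftime('%m', Created_Date) AS INTEGER), Created_By_User_ID
    FROM Fees
    WHERE Created_Date BETWEEN '2016-01-01' AND '2016-12-31';
""")
refreshed = monthly_users()

print(before)           # [(1, 1), (3, 2)]
print(stale == before)  # True
print(refreshed)        # [(1, 1), (3, 2), (6, 1)]
```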

Merge 2 columns from different SQL server table with empty lines

I have this query to achieve my goal:
SELECT
Datum, LeavetypeID
FROM
Kalender
INNER JOIN
VerlofLijn ON Kalender.ID = VerlofLijn.DagID
WHERE
VerlofLijn.Persnummer = #pnummer;
So basically I have a calendar in the table Kalender.
Every day has a unique ID (Kalender.ID). In a second table (VerlofLijn), the matching Kalender.ID is stored in VerlofLijn.DagID, together with the leave type and the unique employee number Persnummer.
What I want to achieve is a query that loads all dates from the calendar and, if the currently logged-on employee has leave in the database, shows it next to the correct date.
So even if there is no leave at all in the database for this employee, I still need the calendar to show up, so that in a next step he can add leave to his personal calendar.
I could create a personal calendar for every employee, but there has to be a better way, without the overhead of storing far too much data in the database, which would make the query take much longer to complete.
Please try this query:
SELECT
Datum, LeavetypeID
FROM
Kalender
LEFT JOIN
VerlofLijn
ON Kalender.ID = VerlofLijn.DagID AND VerlofLijn.Persnummer = #pnummer;
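Explanation: you mentioned that
what you want to achieve is "a query that loads all dates from the calendar",
so you should use Kalender LEFT JOIN VerlofLijn. The Persnummer condition also has to go in the ON clause. In a WHERE clause, it would discard the calendar days that have no matching leave row, because their Persnummer is NULL, and that turns the LEFT JOIN back into an inner join.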
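Here is a minimal sketch with Python's sqlite3 that compares placing the Persnummer filter in ON versus in WHERE. The data is hypothetical, and a bound parameter (?) stands in for #pnummer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Kalender (ID INTEGER PRIMARY KEY, Datum TEXT);
    CREATE TABLE VerlofLijn (DagID INTEGER, LeavetypeID INTEGER, Persnummer INTEGER);
    INSERT INTO Kalender VALUES (1, '2024-01-01'), (2, '2024-01-02'), (3, '2024-01-03');
    INSERT INTO VerlofLijn VALUES
        (2, 7, 1001),     -- employee 1001 has leave on day 2
        (3, 5, 2002);     -- another employee's leave: must not appear for 1001
""")

pnummer = 1001  # stands in for #pnummer

# Filter in ON: every calendar day is kept; leave shows only where it matches.
in_on = conn.execute("""
    SELECT Datum, LeavetypeID
    FROM Kalender
    LEFT JOIN VerlofLijn
      ON Kalender.ID = VerlofLijn.DagID AND VerlofLijn.Persnummer = ?
    ORDER BY Datum
""", (pnummer,)).fetchall()

# Filter in WHERE: unmatched days have NULL Persnummer and are discarded.
in_where = conn.execute("""
    SELECT Datum, LeavetypeID
    FROM Kalender
    LEFT JOIN VerlofLijn ON Kalender.ID = VerlofLijn.DagID
    WHERE VerlofLijn.Persnummer = ?
    ORDER BY Datum
""", (pnummer,)).fetchall()

print(in_on)     # [('2024-01-01', None), ('2024-01-02', 7), ('2024-01-03', None)]
print(in_where)  # [('2024-01-02', 7)]
```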
Have you tried LEFT OUTER JOIN like this?
SELECT
Datum, LeavetypeID
FROM
Kalender
LEFT OUTER JOIN
VerlofLijn ON Kalender.ID = VerlofLijn.DagID
WHERE
VerlofLijn.Persnummer = #pnummer;
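Note that because the Persnummer filter is in the WHERE clause, calendar days without a matching VerlofLijn row (where Persnummer is NULL) are filtered out. To get all the values from Kalender even when there are no matches in VerlofLijn, plus the matching values from VerlofLijn, move that condition into the ON clause.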
You can find some examples here: What is the difference between "INNER JOIN" and "OUTER JOIN"?

Combining table information

I have a simple database with three tables:
contributes
payment
user
Here, contributes is a relationship table between the user and payment tables. My problem is that when I run an SQL statement to retrieve relationship properties, such as the 'paid' value, and therefore include the contributes table in the statement, the results seem to be returned twice. For example, the rows produced by SELECT * FROM user, payment; each appear twice in the output of SELECT * FROM user, payment, contributes;.
My only guess is that the SELECT statement is simply combining EVERY row of users with EVERY row of payments with EVERY row of contributes, much like a power set?
Forgive me if I'm missing anything obvious; any help would be much appreciated.
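Your guess is close: listing tables separated by commas with no join condition produces a Cartesian product (not a power set), in which every row of each table is combined with every row of the others. Join the tables through contributes instead:
SELECT u.id, u.email, u.first_name, u.last_name, c.host, c.paid, p.name, p.total, p.portion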
FROM user u
INNER JOIN contributes c
ON u.id = c.user_id
INNER JOIN payment p
ON c.payment_id = p.id
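The row counts make the difference visible. Below is a minimal sketch with Python's sqlite3, using hypothetical rows and only the columns the join needs. The comma-separated FROM list returns users × payments × contributes rows. The joined query returns one row per contributes row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "user" (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE payment (id INTEGER PRIMARY KEY, name TEXT, total REAL);
    CREATE TABLE contributes (user_id INTEGER, payment_id INTEGER, paid INTEGER);
    INSERT INTO "user" VALUES (1, 'Ann'), (2, 'Ben');
    INSERT INTO payment VALUES (1, 'Rent', 800.0);
    INSERT INTO contributes VALUES (1, 1, 1), (2, 1, 0);
""")

def count(sql):
    """Return the single value produced by a COUNT(*) query."""
    return conn.execute(sql).fetchone()[0]

# No join condition: every combination of rows (a Cartesian product).
cross = count('SELECT COUNT(*) FROM "user", payment, contributes')
pairs = count('SELECT COUNT(*) FROM "user", payment')

# Joined through the relationship table: one row per contributes row.
joined = conn.execute("""
    SELECT u.id, u.first_name, c.paid, p.name, p.total
    FROM "user" u
    INNER JOIN contributes c ON u.id = c.user_id
    INNER JOIN payment p ON c.payment_id = p.id
    ORDER BY u.id
""").fetchall()

print(cross, pairs)  # 4 2
print(joined)        # [(1, 'Ann', 1, 'Rent', 800.0), (2, 'Ben', 0, 'Rent', 800.0)]
```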