How to optimize a TSQL query? - sql

"activity" is a bit field. I need to set it to true if one of the rows with this client_id has value true
SELECT c.client_id, u.branch_id, a.account_id, activity
FROM Clients c INNER JOIN
accounts a ON c.id=a.client_id INNER JOIN uso u ON a.uso_id = u.uso_id,
(SELECT MAX(CONVERT(int,accounts.activity)) as activity, client_id
FROM accounts GROUP BY client_id) activ
WHERE activ.client_id = c.id
This query takes about 2 minutes to execute. Please help me optimize it.

It seems the activity field is a BIT, and you cannot apply MIN or MAX to a BIT directly.
Instead, use TOP:
SELECT c.client_id, u.branch_id, a.account_id,
(
SELECT TOP 1 activity
FROM accounts ai
WHERE ai.client_id = c.id
ORDER BY
activity DESC
)
FROM clients c
JOIN accounts a
ON c.id = a.client_id
JOIN uso u
ON a.uso_id = u.uso_id
Create an index on accounts (client_id, activity) for this to work fast.
You may want to read this article:
Minimum and maximum on bit fields: SQL Server
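To make the TOP 1 idea above concrete, here is a minimal sketch run against SQLite through Python's sqlite3 module, so it uses LIMIT 1 where T-SQL would use TOP 1. The table is cut down to the three columns that matter, the sample rows and the index name ix_accounts_client_activity are made up for illustration, and SQLite stores the flag as an integer rather than a BIT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (
        account_id INTEGER PRIMARY KEY,
        client_id  INTEGER NOT NULL,
        activity   INTEGER NOT NULL   -- 0/1 stand-in for a T-SQL BIT
    );
    -- Hypothetical index name; the column order follows the answer above.
    CREATE INDEX ix_accounts_client_activity ON accounts (client_id, activity);
    INSERT INTO accounts VALUES (10, 1, 0), (11, 1, 1), (20, 2, 0), (21, 2, 0);
""")

# For every account, look up the highest activity flag among all accounts
# of the same client. LIMIT 1 plays the role of TOP 1 here.
rows = conn.execute("""
    SELECT a.client_id, a.account_id,
           (SELECT ai.activity
              FROM accounts ai
             WHERE ai.client_id = a.client_id
             ORDER BY ai.activity DESC
             LIMIT 1) AS activity
      FROM accounts a
     ORDER BY a.account_id
""").fetchall()
print(rows)
```

Client 1 has one active account, so both of its rows report activity 1, while client 2 has none and reports 0.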

Join is expensive. Instead of Join, use memcache and make separate requests.

Related

Writing a subquery instead of using a spreadsheet

I'm a pretty basic SQL user: I usually do some basic joins and then pull the data into Sheets to pivot or filter it to get what I want, but I know I could do it more quickly entirely in SQL.
For this query, I want to return data only if the c2.id count is greater than 0. I tried writing a subquery in the where clause, but it feels like I need to group by task_id for this to be right. Can someone help me understand what I should do and why?
select t.inserted_at::date, count (distinct c2.id), t.id, t.conversation_id
from tasks t
left join users u on u.id = t.creator_id
left join "comments" c2 on t.id = c2.task_id
left join conversations c on c.id = t.conversation_id
where u.include_in_metrics = true
and c.type = 'PROJECT_FEED'
group by 1,3,4
order by t.inserted_at::date desc;
Just add this after the GROUP BY and before the ORDER BY:
having count(distinct c2.id)>0
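As a rough illustration of why HAVING is the right place for this filter (WHERE runs before grouping, HAVING runs after the counts exist), here is a small sketch using SQLite via Python's sqlite3. The tables are trimmed to the id columns and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tasks    (id INTEGER PRIMARY KEY);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, task_id INTEGER);
    INSERT INTO tasks    VALUES (1), (2), (3);
    INSERT INTO comments VALUES (100, 1), (101, 1), (102, 3);
""")

# Task 2 has no comments, so its group has a count of 0 and HAVING drops it.
rows = conn.execute("""
    SELECT t.id, COUNT(DISTINCT c2.id) AS n_comments
      FROM tasks t
      LEFT JOIN comments c2 ON c2.task_id = t.id
     GROUP BY t.id
    HAVING COUNT(DISTINCT c2.id) > 0
     ORDER BY t.id
""").fetchall()
print(rows)
```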

Too much Data using DISTINCT MAX

I want to see the last activity of each individual handset and the user that used that handset. I have a table UserSessions that stores the last activity of a particular user as well as what handset they used in that activity. There are roughly 40 handsets, yet I always get back way too many records, like 10,000 rows, when I only want the last activity of each handset. What am I doing wrong?
SELECT DISTINCT MAX(UserSessions.LastActivity), Handsets.Name,Users.Username
FROM UserSessions
INNER JOIN Handsets on Handsets.HandsetId = UserSessions.HandsetId
INNER JOIN Users on Users.UserId = UserSessions.UserId
WHERE
Handsets.Name in (1000,1001,1002,1003,1004....)
AND Handsets.Deleted = 0
GROUP BY UserSessions.LastActivity, Handsets.Name,Users.Username
I expect to get one record per handset, showing the user's last activity with that handset. What I get is multiple records for every handset and date, over 10,000 rows.
You typically GROUP BY the same columns as you SELECT, except those that are arguments to set functions.
This GROUP BY returns no duplicates, so SELECT DISTINCT isn't needed.
SELECT MAX(UserSessions.LastActivity), Handsets.Name, Users.Username
FROM UserSessions
INNER JOIN Handsets on Handsets.HandsetId = UserSessions.HandsetId
INNER JOIN Users on Users.UserId = UserSessions.UserId
WHERE Handsets.Name in (1000,1001,1002,1003,1004....)
AND Handsets.Deleted = 0
GROUP BY Handsets.Name, Users.Username
There is no such thing as DISTINCT MAX. You have SELECT DISTINCT, which ensures that all columns referenced in the SELECT are not duplicated (as a group) across multiple rows. And there is MAX(), an aggregation function.
As a note: SELECT DISTINCT is almost never appropriate with GROUP BY.
You seem to want:
SELECT *
FROM (SELECT h.Name, u.Username, MAX(us.LastActivity) as last_activity,
RANK() OVER (PARTITION BY h.Name ORDER BY MAX(us.LastActivity) desc) as seqnum
FROM UserSessions us JOIN
Handsets h
ON h.HandsetId = us.HandsetId INNER JOIN
Users u
ON u.UserId = us.UserId
WHERE h.Name in (1000,1001,1002,1003,1004....) AND
h.Deleted = 0
GROUP BY h.Name, u.Username
) h
WHERE seqnum = 1
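Here is a small sketch of that "aggregate per handset and user, then keep the top-ranked row per handset" pattern, run on SQLite via Python's sqlite3 (window functions need SQLite 3.25 or later). To keep it short it assumes a single flattened sessions table instead of the three joined tables, and it does the aggregation in an inner subquery before ranking. The table name, column names, and rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sessions (name TEXT, username TEXT, last_activity TEXT);
    INSERT INTO sessions VALUES
        ('1000', 'alice', '2020-01-01'),
        ('1000', 'bob',   '2020-01-05'),
        ('1000', 'alice', '2020-01-03'),
        ('1001', 'carol', '2020-02-01');
""")

# Inner query: latest activity per (handset, user).
# Middle query: rank the users of each handset by that latest activity.
# Outer query: keep only the most recent user per handset.
rows = conn.execute("""
    SELECT name, username, last_activity
      FROM (SELECT name, username, last_activity,
                   RANK() OVER (PARTITION BY name
                                ORDER BY last_activity DESC) AS seqnum
              FROM (SELECT name, username,
                           MAX(last_activity) AS last_activity
                      FROM sessions
                     GROUP BY name, username) per_user) ranked
     WHERE seqnum = 1
     ORDER BY name
""").fetchall()
print(rows)
```

Note that RANK() keeps ties, so two users with the same latest timestamp on one handset would both be returned.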

BigQuery SQL code to pull earliest contact

I have a copy of our Salesforce data in BigQuery, and I'm trying to join the contact table with the account table.
I want to return every account in the dataset but I only want the contact that was created first for each account.
I've gone around and around in circles today googling and trying to cobble a query together but all roads either lead to no accounts, a single account or loads of contacts per account (ignoring the earliest requirement).
Here's the latest query, which produces no results. I think I'm nearly there but still struggling. Any help would be most appreciated.
SELECT distinct
c.accountid as Acct_id
,a.id as a_Acct_ID
,c.id as Cont_ID
,a.id AS a_CONT_ID
,c.email
,c.createddate
FROM `sfdcaccounttable` a
INNER JOIN `sfdccontacttable` c
ON c.accountid = a.id
INNER JOIN
(SELECT a2.id, c2.accountid, c2.createddate AS MINCREATEDDATE
FROM `sfdccontacttable` c2
INNER JOIN `sfdcaccounttable` a2 ON a2.id = c2.accountid
GROUP BY 1,2,3
ORDER BY c2.createddate asc LIMIT 1) c3
ON c.id = c3.id
ORDER BY a.id asc
LIMIT 10
The solution shared above is very BigQuery specific: it does have some quirks you need to work around like the memory error you got.
I once answered a similar question here that is more portable and easier to maintain.
Essentially you need to create a smaller table (even better, make it a view) with the ID and its first transaction. It's similar to what you shared but slightly different, as you need to group ONLY in the topmost query.
It looks something like this:
select
# contact ids that are first time contacts
b.id as cont_id,
b.accountid
from `sfdccontacttable` as b inner join
( select accountid,
min(createddate) as first_tx_time
FROM `sfdccontacttable`
group by 1) as a on (a.accountid = b.accountid and b.createddate = a.first_tx_time)
group by 1, 2
You need to do it this way because otherwise you can end up with multiple IDs per account (if there are any other dimensions associated with it). This approach is also fairly future-proof: dimensions can be added to the underlying tables without affecting the result, and you can use a WHERE clause in the inner query to define a "valid" contact, and so on. You can then save it as a view and simply reference it in any subquery or join operation.
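As a sketch of how the MIN(createddate) subquery can be combined with the account table so that every account comes back, here is a small example on SQLite via Python's sqlite3. The LEFT JOINs are an addition on my part to cover accounts with no contacts, and the table names, columns, and rows are simplified stand-ins for the Salesforce tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id TEXT PRIMARY KEY);
    CREATE TABLE contacts (id TEXT PRIMARY KEY, accountid TEXT, createddate TEXT);
    INSERT INTO accounts VALUES ('A1'), ('A2'), ('A3');
    INSERT INTO contacts VALUES
        ('c1', 'A1', '2020-01-02'),
        ('c2', 'A1', '2020-01-01'),
        ('c3', 'A2', '2020-03-01');
""")

# f holds the earliest createddate per account; joining contacts back on
# (accountid, createddate) recovers the id of that earliest contact.
# LEFT JOINs keep accounts that have no contacts at all (A3).
rows = conn.execute("""
    SELECT a.id AS account_id, c.id AS first_contact_id
      FROM accounts a
      LEFT JOIN (SELECT accountid, MIN(createddate) AS first_created
                   FROM contacts
                  GROUP BY accountid) f
             ON f.accountid = a.id
      LEFT JOIN contacts c
             ON c.accountid = f.accountid
            AND c.createddate = f.first_created
     ORDER BY a.id
""").fetchall()
print(rows)
```

If two contacts on one account share the same earliest createddate, both are returned, so a tie-breaker would be needed in that case.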
Set up a view/subquery for client_first or client_last as:
SELECT * except(_rank) from (
select rank() over (partition by accountid order by createddate ASC) as _rank,
*
FROM `prj.dataset.sfdccontacttable`
) where _rank=1
Basically, it uses a window function to rank the rows within each account and returns the top-ranked row: with ASC that's the first client entry, with DESC that's the last client entry.
You can do the same for accounts as well; joining the two is then simple, since there will be exactly one record for each entity.
UPDATE
You could also try using ARRAY_AGG, which has a smaller memory footprint.
#standardSQL
SELECT e.* FROM (
SELECT ARRAY_AGG(
t ORDER BY t.createddate ASC LIMIT 1
)[OFFSET(0)] e
FROM `dataset.sfdccontacttable` t
GROUP BY t.accountid
)

Oracle SQL MINUS on 3 tables

I need to create an Oracle SQL query possibly using MINUS
Booking(BookID, MotelID, ClientID, Date)
Motel(MotelID, MotelName)
Client(ClientID, ClientName)
I can show the names of clients who have stayed at either motel (I think!!!)
SELECT DISTINCT ClientName
FROM (Client INNER JOIN Booking
ON Client.ClientID = Booking.ClientID)
INNER JOIN Motel
ON Booking.MotelID = Motel.MotelID
WHERE (MotelName = 'MotelOne' OR MotelName='MotelTwo');
But I now need to show the clients who have stayed at MotelOne but NOT MotelTwo.
Very new to this, and trying to get my head around it so any help will be gratefully accepted!
Oracle has a MINUS operator --> http://docs.oracle.com/cd/B19306_01/server.102/b14200/queries004.htm
It returns only unique rows returned by the first query but not by the second.
SELECT c.clientid, c.clientname
FROM booking b JOIN client c
ON b.clientid = c.clientid JOIN motel m
ON b.motelid = m.motelid
WHERE m.motelname = 'MotelOne'
MINUS
SELECT c.clientid, c.clientname
FROM booking b JOIN client c
ON b.clientid = c.clientid JOIN motel m
ON b.motelid = m.motelid
WHERE m.motelname = 'MotelTwo'
The MINUS operator sorts rows and eliminates duplicates, so SELECT DISTINCT is not required.
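For a quick check of the set-difference logic, here is a sketch on SQLite via Python's sqlite3. SQLite has no MINUS keyword, so it uses EXCEPT, which is the standard SQL spelling of the same operation. The sample clients and bookings are invented, and the Date column is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE motel   (motelid INTEGER PRIMARY KEY, motelname TEXT);
    CREATE TABLE client  (clientid INTEGER PRIMARY KEY, clientname TEXT);
    CREATE TABLE booking (bookid INTEGER PRIMARY KEY, motelid INTEGER, clientid INTEGER);
    INSERT INTO motel  VALUES (1, 'MotelOne'), (2, 'MotelTwo');
    INSERT INTO client VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    -- Ann stayed at both motels, Bob at MotelOne twice, Cy at MotelTwo only.
    INSERT INTO booking VALUES (1, 1, 1), (2, 2, 1), (3, 1, 2), (4, 1, 2), (5, 2, 3);
""")

# Clients of MotelOne, minus clients of MotelTwo.
rows = conn.execute("""
    SELECT c.clientid, c.clientname
      FROM booking b
      JOIN client c ON b.clientid = c.clientid
      JOIN motel  m ON b.motelid  = m.motelid
     WHERE m.motelname = 'MotelOne'
    EXCEPT
    SELECT c.clientid, c.clientname
      FROM booking b
      JOIN client c ON b.clientid = c.clientid
      JOIN motel  m ON b.motelid  = m.motelid
     WHERE m.motelname = 'MotelTwo'
     ORDER BY 1
""").fetchall()
print(rows)
```

Bob appears once even though he has two MotelOne bookings, which shows the duplicate elimination mentioned above.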

Order by join column but use distinct on another

I'm building a system in which there are the following tables:
Song
Broadcast
Station
Follow
User
A user follows stations, which have songs on them through broadcasts.
I'm building a "feed" of songs for a user based on the stations they follow.
Here's the query:
SELECT DISTINCT ON ("broadcasts"."created_at", "songs"."id") songs.*
FROM "songs"
INNER JOIN "broadcasts" ON "songs"."shared_id" = "broadcasts"."song_id"
INNER JOIN "stations" ON "broadcasts"."station_id" = "stations"."id"
INNER JOIN "follows" ON "stations"."id" = "follows"."station_id"
WHERE "follows"."user_id" = 2
ORDER BY broadcasts.created_at desc
LIMIT 18
Note: shared_id is the same as id.
As you can see I'm getting duplicate results, which I don't want. I found out from a previous question that this was due to selecting distinct on broadcasts.created_at.
My question is: How do I modify this query so it will return only unique songs based on their id but still order by broadcasts.created_at?
Try this solution:
SELECT a.maxcreated, b.*
FROM
(
SELECT bb.song_id, MAX(bb.created_at) AS maxcreated
FROM follows aa
INNER JOIN broadcasts bb ON aa.station_id = bb.station_id
WHERE aa.user_id = 2
GROUP BY bb.song_id
) a
INNER JOIN songs b ON a.song_id = b.id
ORDER BY a.maxcreated DESC
LIMIT 18
The FROM subselect retrieves the distinct song_ids broadcast by any of the stations the user follows, along with the latest broadcast date for each song. We wrap this in a subquery because the GROUP BY has to be on song_id alone: we only want each unique song_id and its max date, regardless of the station.
We then join that result in the outer query to the songs table to get the song information associated with each unique song_id
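To see the subquery-then-join approach on concrete data, here is a small sketch on SQLite via Python's sqlite3. The schema is reduced to the columns the query touches (the stations table is skipped, as in the query above), the title column is hypothetical, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE songs      (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE broadcasts (song_id INTEGER, station_id INTEGER, created_at TEXT);
    CREATE TABLE follows    (user_id INTEGER, station_id INTEGER);
    INSERT INTO songs VALUES (1, 'A'), (2, 'B'), (3, 'C');
    -- Song 1 is broadcast on two followed stations; song 3 only on an
    -- unfollowed one.
    INSERT INTO broadcasts VALUES
        (1, 10, '2020-01-01'),
        (1, 11, '2020-01-05'),
        (2, 10, '2020-01-03'),
        (3, 12, '2020-01-09');
    INSERT INTO follows VALUES (2, 10), (2, 11), (7, 12);
""")

# One row per song with its latest broadcast among followed stations,
# then join to songs and order by that latest broadcast.
rows = conn.execute("""
    SELECT b.id, b.title, a.maxcreated
      FROM (SELECT bb.song_id, MAX(bb.created_at) AS maxcreated
              FROM follows aa
              INNER JOIN broadcasts bb ON aa.station_id = bb.station_id
             WHERE aa.user_id = 2
             GROUP BY bb.song_id) a
      INNER JOIN songs b ON a.song_id = b.id
     ORDER BY a.maxcreated DESC
     LIMIT 18
""").fetchall()
print(rows)
```

Song 1 shows up once, with its most recent broadcast date, even though it was broadcast on two followed stations.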
You can use Common Table Expressions (CTE) if you want a cleaner query (nested queries make things harder to read)
It would look like this:
WITH a as (
SELECT bb.song_id, MAX(bb.created_at) AS maxcreated
FROM follows aa
INNER JOIN broadcasts bb ON aa.station_id = bb.station_id
INNER JOIN songs cc ON bb.song_id = cc.shared_id
WHERE aa.user_id = 2
GROUP BY bb.song_id
)
SELECT
a.maxcreated,
b.*
FROM a INNER JOIN
songs b ON a.song_id = b.id
ORDER BY
a.maxcreated DESC
LIMIT 18
Using a CTE offers the advantages of improved readability and ease in maintenance of complex queries. The query can be divided into separate, simple, logical building blocks. These simple blocks can then be used to build more complex, interim CTEs until the final result set is generated.
Try adding GROUP BY songs.id.
I had a very similar query I was doing between listens, tracks and albums and it took me a long while to figure it out (hours).
If you use GROUP BY songs.id, you can get it to work by ordering by MAX(broadcasts.created_at) DESC.
Here's what the full SQL looks like:
SELECT songs.* FROM "songs"
INNER JOIN "broadcasts" ON "songs"."shared_id" = "broadcasts"."song_id"
INNER JOIN "stations" ON "broadcasts"."station_id" = "stations"."id"
INNER JOIN "follows" ON "stations"."id" = "follows"."station_id"
WHERE "follows"."user_id" = 2
GROUP BY songs.id
ORDER BY MAX(broadcasts.created_at) desc
LIMIT 18;