BigQuery query extremely slow when adding JOIN

This is my first post on here, so please let me know if I've done anything wrong when posting my question.
I started learning SQL from scratch about three weeks ago, so I'm fairly new to the whole concept and community, and I've probably made a lot of mistakes in my code, but here goes.
I'm struggling with a query that I'm writing in BigQuery. BigQuery's validator has accepted the code, so on paper it seems good, but it takes forever to run - I end up stopping it once it has passed an hour. I've been looking into streamlining my SQL so the process runs smoother and faster, but I've hit a wall and I'm out of questions that could get me a useful answer.
(Edit)
What I want from this query is a dataset that can help me make a visualisation: a timeline based on the dates/timestamps that read_started_at provides.
On this timeline I want a distinct count of reader_ids on the given day (DATE_TRUNC of the timestamp). Google Data Studio can make a distinct count of the reader_ids itself, so I'm in doubt whether making the distinct count in my query will slow down or speed up the process in the long run.
Lastly, I want to divide the reader_ids into two groups (dimensions) based on whether they are on a monthly or yearly subscription, to see if one group is more represented at the given read_started_at values, and therefore more active on the website, than the other. This division comes from chargebee_plan_id, where multiple subscriptions are available, hence the 'yearly' or 'monthly' condition. reader_id and membership_id contain the same data and are therefore what I JOIN on.
(Edit end)
I really hope that somebody here can help me out. Any advice is appreciated.
My query is the following:
WITH memberships AS (
  SELECT im.chargebee_plan_id, im.membership_id
  FROM postgres.internal_reporting_memberships AS im
  WHERE (im.chargebee_plan_id LIKE 'yearly' OR im.chargebee_plan_id LIKE 'monthly')
    AND im.started_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP, INTERVAL 365 DAY)
),
readers AS (
  SELECT ip.reader_id, DATE_TRUNC(CAST(ip.read_started_at AS DATE), DAY) AS read_start
  FROM postgres.internal_reporting_read_progresses AS ip
  WHERE ip.reader_id LIKE '%|%'
    AND ip.read_started_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP, INTERVAL 365 DAY)
)
SELECT reader_id, read_start, m.chargebee_plan_id
FROM readers AS r
JOIN memberships AS m
  ON r.reader_id LIKE m.membership_id
Cheers

Reposting my comment as an answer as it solved the problem.
Use an = instead of a LIKE for the join condition.
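With that change (and, since the plan-id filters have no wildcards, = or IN works for those too), the final select would look something like this - a sketch against the same CTEs, untested:
SELECT r.reader_id, r.read_start, m.chargebee_plan_id
FROM readers AS r
JOIN memberships AS m
  ON r.reader_id = m.membership_id
An equality join lets BigQuery hash-join the two sides; a LIKE join forces it to compare every reader row against every membership row. And if you do want the distinct count done in SQL rather than in Data Studio, the same join aggregates naturally:
SELECT r.read_start, m.chargebee_plan_id,
       COUNT(DISTINCT r.reader_id) AS active_readers
FROM readers AS r
JOIN memberships AS m
  ON r.reader_id = m.membership_id
GROUP BY r.read_start, m.chargebee_plan_id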

Related

Problems with TempDb on the SQL Server

I have some problems with my SQL Server: some external queries write into tempdb, and every 2-3 days it is full and we have to restart the SQL Server. I have WhoIsActive logging on it, and we can also monitor it with Grafana, so I know the exact time when a query starts to write a lot of data into tempdb. Can someone give me a tip on how I can find the user when I have the exact time?
select top 40 User_Account, start_date, tempdb_allocations
from Whoisactive
where start_date between '2023-02-15 14:12:14.13' and '2023-02-15 15:12:14.13'
order by tempdb_allocations desc
User_Account  Start_Date               tempdb_allocations
kkarla1       15-02-2023 14:12:14.13   12
bbert2        11-02-2023 12:12:14.13   0
ubert5        15-02-2023 15:12:14.13   888889
I would add this as a comment but I don’t have the necessary reputation points.
At any rate - you might find this helpful.
https://dba.stackexchange.com/questions/182596/temp-tables-in-tempdb-are-not-cleaned-up-by-the-system
It isn’t without its own drawbacks but I think that if the alternative is restarting the server every 2 or 3 days this may be good enough.
It might also be helpful if you add some more details about the jobs that are blowing up your tempdb.
Is this problematic job calling your database once a day? Once a minute? More?
I ask because if it’s more like once a day then I think the answer in the link is more likely to be helpful.
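In the meantime, if you can run something while tempdb is actually filling up, the standard DMVs will tell you which session (and login) is doing the allocating. A sketch, using sys.dm_db_session_space_usage joined to sys.dm_exec_sessions:
-- who is allocating tempdb space right now (run while the problem is happening)
select s.session_id,
       s.login_name,
       u.user_objects_alloc_page_count,
       u.internal_objects_alloc_page_count
from sys.dm_db_session_space_usage as u
join sys.dm_exec_sessions as s
  on s.session_id = u.session_id
order by u.user_objects_alloc_page_count + u.internal_objects_alloc_page_count desc;
The counts are 8 KB pages, so multiply by 8 for KB.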

How to get a count of the number of times a sql statement has executed in X hours?

I'm using oracle db. I want to be able to count the number of times that a SQL statement was executed in X hours. For instance, how many times has the statement Select * From ExampleTable been executed in the past 5 hours?
I tried looking in V$SQL, V$SQLSTATS, and V$SQLAREA, but they only keep a record of a statement's total number of executions; they don't store when the individual executions occurred. Is there any view I missed, or something else that keeps track of each individual statement execution with a timestamp, so that I can query which executions occurred in the past X hours? Thanks for the help.
The views in the Active Workload Repository store historical SQL execution information, specifically the view DBA_HIST_SQLSTAT.
The view is not perfect; it contains a summary of the top SQL statements. This is almost perfect information for performance tuning - in practice, sampling will catch any performance problem. But if you're looking for a perfect record of every SQL execution, as far as I know the only way to get that information is through tracing, which is buggy and slow.
Hopefully this query is good enough:
select begin_interval_time, end_interval_time, executions_delta, dba_hist_sqlstat.*
from dba_hist_sqlstat
join dba_hist_snapshot
on dba_hist_sqlstat.snap_id = dba_hist_snapshot.snap_id
and dba_hist_sqlstat.instance_number = dba_hist_snapshot.instance_number
order by begin_interval_time desc, sql_id;
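If you only care about the past X hours - say 5 - you can filter on the snapshot window. A sketch, assuming AWR snapshots cover that period (by default they are taken hourly and kept for 8 days):
select sn.begin_interval_time, sn.end_interval_time, st.sql_id, st.executions_delta
from dba_hist_sqlstat st
join dba_hist_snapshot sn
  on st.snap_id = sn.snap_id
 and st.instance_number = sn.instance_number
where sn.begin_interval_time > systimestamp - interval '5' hour
order by sn.begin_interval_time desc;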
Apologies for putting this in an answer instead of a comment (I don't have the required reputation), but I think you may be out of luck. Here is an AskTOM thread asking basically the same question: AskTOM. Tom says that unless you are using ASH, that just isn't something the database is designed to do.

Oracle SQL -- Can I timestamp a query?

I have a simple query:
Select Count(p.Group_ID)
From Player_Source P
Inner Join Feature_Group_Xref X On P.Group_Id=X.Group_Id
where x.feature_name ='Try this site'
which spits out the current number of people in a specific test group at the current moment in time.
If I wanted to see what this number was on, say, 9/10/12 instead, could I add something to the query to time-phase this information as the database had it 2 days ago?
No. If you want to store historical information, you will need to incorporate that into your schema. For example, you might extend Feature_Group_Xref with the columns Effective_Start_Timestamp and Effective_End_Timestamp. To find which groups currently have a given feature, you would write AND Effective_End_Timestamp > CURRENT_TIMESTAMP() (or AND Effective_End_Timestamp IS NULL, depending on how you want to define the column); to find which groups had a given feature at a specific time, you would write AND ... BETWEEN Effective_Start_Timestamp AND Effective_End_Timestamp (or AND Effective_Start_Timestamp < ... AND (Effective_End_Timestamp > ... OR Effective_End_Timestamp IS NULL)).
Wikipedia has a good article on various schema designs that people use to tackle this sort of problem: see http://en.wikipedia.org/wiki/Slowly_changing_dimension.
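To make that concrete, here is a hypothetical sketch of the as-of query against such a schema, using the 9/10/12 date from the question (the Effective_* columns are the proposed additions above, not existing columns):
Select Count(p.Group_ID)
From Player_Source P
Inner Join Feature_Group_Xref X On P.Group_Id = X.Group_Id
Where X.feature_name = 'Try this site'
And X.Effective_Start_Timestamp <= TIMESTAMP '2012-09-10 00:00:00'
And (X.Effective_End_Timestamp > TIMESTAMP '2012-09-10 00:00:00'
     Or X.Effective_End_Timestamp Is Null)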
It depends...
It is at least theoretically possible that you could use flashback query
Select Count(p.Group_ID)
From Player_Source as of timestamp( date '2012-09-10' ) P
Join Feature_Group_Xref as of timestamp( date '2012-09-10' ) X
On P.Group_Id=X.Group_Id
where x.feature_name ='Try this site'
This requires, though, that you have the privileges necessary to do a flashback query and that there is enough UNDO for Oracle to apply to be able to get back to the state those tables were in at midnight two days ago. It is unlikely that the database is configured to retain that much UNDO though it is generally possible. This query would also work if you happen to be using Oracle Total Recall.
More likely, though, you will need to modify your schema definition so that you are storing historical information that you can then query as of a point in time. There are a variety of ways to accomplish this; adding effective and expiration date columns to the table, as @ruakh suggests, is one of the more popular options. Which option(s) are appropriate in your particular case will depend on a variety of factors, including how much history you want to retain, how frequently data changes, etc.
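As a quick check of whether flashback even has a chance, you can look at how long the database is configured to keep UNDO (in seconds) - assuming you have access to v$parameter:
select name, value from v$parameter where name = 'undo_retention';
Bear in mind undo_retention is a target, not a guarantee, unless the undo tablespace has a retention guarantee set.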

Using 'HINTS' in sql query

I am sorry if I sound silly asking, but I haven't been using SQL hints for long and I am going over some chapter review work for school. I am having trouble getting my head wrapped around them.
For instance, one question I did in Oracle on a test database I had made was "Show the top 10% of the daily total number of auctions." My answer (which worked) was:
SELECT DAYOFWEEK, DAILY_TOTAL
FROM (
SELECT T.DAYOFWEEK,
SUM(AF.TOTAL_NUM_OF_AUCTIONS) AS DAILY_TOTAL,
CUME_DIST() OVER (ORDER BY SUM(AF.TOTAL_NUM_OF_AUCTIONS) ASC) AS Percentile
FROM TIME_DIM T, AUCT_FACT AF
WHERE AF.TIME_ID = T.TIME_ID
GROUP BY T.DAYOFWEEK)
WHERE Percentile > .9
ORDER BY Percentile DESC;
The problem I have now is that it asks me to try to achieve this output with a different query. I asked my teacher, and they said that they mean for me to use hints. I looked over the notes I have on hints, and they really don't explain thoroughly enough how to optimise this query with them, or how to do it in a simpler manner.
Any help would really be appreciated
=) thanks guys!
Hints are directives you include in your query (in a special comment) to steer the cost-based optimizer, for example towards particular indexes, join orders, or join methods.
It looks like the daily total is something you could implement a summary index on.
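To give you the flavour, here is a sketch of the syntax: hints live in a /*+ ... */ comment right after the SELECT keyword. LEADING and USE_HASH are real Oracle hints, but whether they (or an INDEX hint) actually help depends on your tables and indexes:
SELECT /*+ LEADING(T) USE_HASH(AF) */
       T.DAYOFWEEK,
       SUM(AF.TOTAL_NUM_OF_AUCTIONS) AS DAILY_TOTAL
FROM TIME_DIM T, AUCT_FACT AF
WHERE AF.TIME_ID = T.TIME_ID
GROUP BY T.DAYOFWEEK;
Note that the optimizer silently ignores hints it can't apply, so compare execution plans before and after to see whether a hint took effect.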

Select optimization in Access

I'm in serious trouble: I have a huge, subtle query that takes a huge amount of time to execute. It actually freezes Access, and sometimes I have to kill it. The query looks like:
SELECT
ITEM.*,
ERA.*,
ORDR.*,
ITEM.COnTY1,
(SELECT TOP 1 New FROM MAPPING WHERE Old = ITEM.COnTY1) AS NewConTy1,
ITEM.COnValue1,
(SELECT TOP 1 KBETR FROM GEN_KUMV WHERE KNUMV = ERA.DOCCOND AND KSCHL = (SELECT TOP 1 New FROM MAPPING WHERE Old = ITEM.COnTY1)) AS NewCOnValue1
--... etc: this continues until ConTy40
FROM
GEN_ITEMS AS ITEM,
GEN_ORDERS AS ORDR,
GEN_ERASALES AS ERA
WHERE
ORDR.ORDER_NUM = ITEM.ORDER_NUM AND -- link between ITEM and ORDR
ERA.concat = ITEM.concat -- link between ERA and ITEM
I won't provide you with the table schemas since the query works; what I'd like to know is if there's a way to add the NewConTy1 and NewConValue1 columns using another technique, to make it more efficient. The thing is that the Con* fields go from 1 to 40, so I have to align them (NewConTy1 next to ConTy1, NewConValue1 next to ConValue1... etc. until 40).
ConTy# and ConTyValue# are in ITEMS (each in a field)
NewConty# and NewConValue# are in ERA (each in a record)
I really hope my explanation is enough to figure out my issue,
Looking forward to hearing from you guys
EDIT:
Ignore the TOP 1 in the SELECTs; it's there because the current dumps of data I have aren't accurate, and it's going to be removed later.
EDIT 2:
Another thing: my query also returns up to 230 fields, lol.
Thanks
Miloud
Have you considered a union query to normalize items?
SELECT "ConTy1" As CTName, Conty1 As CTVal,
"ConTyValue1" As CTVName, ConTyValue1" As CTVVal
FROM ITEMS
UNION ALL
SELECT "ConTy2" As CTName, Conty2 As CTVal,
"ConTyValue2" As CTVName, ConTyValue2" As CTVVal
FROM ITEMS
<...>
UNION ALL
SELECT "ConTy40" As CTName, Conty40 As CTVal,
"ConTyValue40" As CTVName, ConTyValue40" As CTVVal
FROM ITEMS
This can either be a separate query that links in to your main query, or a sub query of your main query, if that is more convenient. It should then be easy enough to draw in the relationship to the NewConty# and NewConValue# in ERA.
Remou's answer gives what you want - a significantly different approach. It's been a while since I've meddled with MS Access query optimization, and I had forgotten the details of its planner, but you might want to try a trivial suggestion: turn your
WHERE conditions
into
INNER JOIN ... ON conditions
(a sketch follows below).
You are firing 40-ish correlated subqueries, so the above probably will not help (again, Remou's answer takes a significantly different approach and you might see real improvements there), but do let us know, as it is trivial to test.
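For reference, the join version would look something like this (Access wants parentheses when chaining more than one join; the forty correlated subquery columns stay in the SELECT list as before):
SELECT ITEM.*, ERA.*, ORDR.*
FROM (GEN_ITEMS AS ITEM
INNER JOIN GEN_ORDERS AS ORDR ON ORDR.ORDER_NUM = ITEM.ORDER_NUM)
INNER JOIN GEN_ERASALES AS ERA ON ERA.concat = ITEM.concat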
Another approach that you can take is to materialize expensive part and take Remou's idea but split it into different parts where you can join directly.
For example your first subquery is correlated on ITEM.COnTY1, your second is correlated on ERA.DOCCOND and ITEM.ConTY1.
If you classify your subqueries according to their correlated keys, then you can save them as queries (or materialize them with make-table queries) and join on them (or on the newly created tables) directly. Saved queries might perform much faster, and make-tables will, at the expense of materializing - you'll have to run some queries before getting the latest data, which can be encapsulated in a macro or a VBA function/sub.
Otherwise (for example, if you run the above query regularly as part of your normal business use case) - redesign your DB.