I'm sorry if I sound silly asking, but I haven't been using SQL hints long and I'm going over some chapter review work for school. I'm having trouble getting my head wrapped around them.
For instance, one question I did in Oracle on a test database I had made was "Show the top 10% of the daily total number of auctions." My answer, which worked, was:
SELECT DAYOFWEEK, DAILY_TOTAL
FROM (
    SELECT T.DAYOFWEEK,
           SUM(AF.TOTAL_NUM_OF_AUCTIONS) AS DAILY_TOTAL,
           CUME_DIST() OVER (ORDER BY SUM(AF.TOTAL_NUM_OF_AUCTIONS) ASC) AS Percentile
    FROM TIME_DIM T, AUCT_FACT AF
    WHERE AF.TIME_ID = T.TIME_ID
    GROUP BY T.DAYOFWEEK
)
WHERE Percentile > .9
ORDER BY Percentile DESC;
The problem I have now is that it asks me to try to achieve this output with a different query. I asked my teacher, and they said they mean for me to use hints. I looked over the notes I have on them, but they don't explain thoroughly enough how to optimise this query with hints, or how to do it in a simpler manner.
Any help would really be appreciated
=) thanks guys!
Hints are directives you include in your query to tell the cost-based optimizer which access paths or indexes to use.
It looks like the daily total is something you could support with a summary structure, such as a materialized view, that the optimizer can then be pointed at.
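To give a flavour of the syntax: a hint lives in a special /*+ ... */ comment immediately after the SELECT keyword. A minimal sketch against the tables in the question, assuming an index on AUCT_FACT.TIME_ID (the index name AUCT_FACT_TIME_IDX here is hypothetical):
SELECT /*+ INDEX(AF AUCT_FACT_TIME_IDX) */ -- ask the optimizer to reach AUCT_FACT via this index
       T.DAYOFWEEK,
       SUM(AF.TOTAL_NUM_OF_AUCTIONS) AS DAILY_TOTAL
FROM TIME_DIM T, AUCT_FACT AF
WHERE AF.TIME_ID = T.TIME_ID
GROUP BY T.DAYOFWEEK;
Note that a hint changes how the optimizer executes the query, not what it returns, so the result set stays the same; hints are a tuning tool rather than a way to write a logically different query.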
I'm trying to query the 10th, 25th, and 75th percentiles for each row in Druid from an integer column value. I came across some solutions (http://druid.io/docs/latest/development/extensions-core/datasketches-quantiles.html) but I'm not sure how they can be implemented. Can somebody explain it in simpler terms?
If you're using Druid SQL, it's pretty easy (once you load the extension). E.g., you can use
SELECT APPROX_QUANTILE_DS(myNum, .25) FROM myData
to get the 25th percentile of myNum. (If you search for 'approx_quantile_ds' on that doc page, there's also a link to a known issue.)
For native queries, I'm not sure offhand, but maybe this will help: https://www.druidforum.org/t/quantile-calculation/4929
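Since the question asks for the 10th, 25th, and 75th percentiles, note that in Druid SQL all three calls can go in one statement; a sketch against the same hypothetical myData datasource:
SELECT APPROX_QUANTILE_DS(myNum, 0.10) AS p10,
       APPROX_QUANTILE_DS(myNum, 0.25) AS p25,
       APPROX_QUANTILE_DS(myNum, 0.75) AS p75
FROM myData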
This is my first post here, so please let me know if I've done anything wrong when posting my question.
I started learning SQL from scratch about three weeks ago, so I'm fairly new to the whole concept and community, and I've probably made a lot of mistakes in my code, but here goes.
I'm struggling with a query that I'm writing in BigQuery. BigQuery's "validator" has validated the code, so on paper it seems good, but it takes forever to run; I end up stopping it once it has passed an hour. I've been looking into streamlining my SQL so the process runs smoother and faster, but I've hit a wall and I'm out of questions that could lead me to a useful answer.
(Edit)
What I want from this query is a dataset I can use to build a visualisation: a timeline based on the dates/timestamps that read_started_at provides.
On this timeline I want a distinct count of reader_ids on each given day (DATE_TRUNC of the timestamp). Google Data Studio can make a distinct count of the reader_ids itself, so I'm in doubt whether doing the distinct count in my query will slow down or speed up the process in the long run.
Lastly, I want to divide the reader_ids into two groups (dimensions) based on whether they are on a monthly or a yearly subscription, to see if one group is more represented at the given read_started_at values, and therefore more active on the website, than the other. This division comes from chargebee_plan_id, where multiple subscriptions are available, hence the condition 'yearly' or 'monthly'. reader_id and membership_id contain the same data, so the tables are joined on them.
(Edit end)
I really hope that somebody here can help me out. Any advice is appreciated.
My query is the following:
WITH memberships AS (
    SELECT im.chargebee_plan_id, im.membership_id
    FROM postgres.internal_reporting_memberships AS im
    WHERE (im.chargebee_plan_id LIKE 'yearly' OR im.chargebee_plan_id LIKE 'monthly')
      AND im.started_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP, INTERVAL 365 day)
),
readers AS (
    SELECT ip.reader_id, DATE_TRUNC(CAST(ip.read_started_at AS DATE), DAY) AS read_start
    FROM postgres.internal_reporting_read_progresses AS ip
    WHERE ip.reader_id LIKE '%|%'
      AND ip.read_started_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP, INTERVAL 365 day)
)
SELECT reader_id, read_start, m.chargebee_plan_id
FROM readers AS r
JOIN memberships AS m
ON r.reader_id LIKE m.membership_id
Cheers
Reposting my comment as an answer as it solved the problem.
Use an = instead of a LIKE for the join condition.
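For clarity, a sketch of the final SELECT with that fix applied; the COUNT(DISTINCT ...) and GROUP BY are my addition based on the per-day distinct count the question describes, not part of the original query:
SELECT r.read_start,
       m.chargebee_plan_id,
       COUNT(DISTINCT r.reader_id) AS daily_readers  -- distinct readers per day and plan
FROM readers AS r
JOIN memberships AS m
ON r.reader_id = m.membership_id  -- = instead of LIKE
GROUP BY r.read_start, m.chargebee_plan_id
The same applies to chargebee_plan_id LIKE 'yearly': with no wildcards, LIKE behaves like =, so = is the clearer choice there too.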
I'm using Oracle DB. I want to be able to count the number of times a SQL statement was executed in the past X hours. For instance, how many times has the statement Select * From ExampleTable been executed in the past 5 hours?
I tried looking in V$SQL, V$SQLSTATS, and V$SQLAREA, but they only keep a record of a statement's total number of executions. They don't store when the individual executions occurred. Is there any view I missed, or something else that keeps track of each individual statement execution with a timestamp, so that I can query for executions in the past X hours? Thanks for the help.
The views in the Automatic Workload Repository (AWR) store historical SQL execution information, specifically the view DBA_HIST_SQLSTAT.
The view is not a complete record; it contains a summary of only the top SQL statements. That is almost always enough for performance tuning - in practice, sampling will catch any real performance problem. But if you're looking for a complete record of every SQL execution, as far as I know the only way to get that information is through tracing, which is buggy and slow.
Hopefully this query is good enough:
select begin_interval_time, end_interval_time, executions_delta, dba_hist_sqlstat.*
from dba_hist_sqlstat
join dba_hist_snapshot
    on dba_hist_sqlstat.snap_id = dba_hist_snapshot.snap_id
    and dba_hist_sqlstat.instance_number = dba_hist_snapshot.instance_number
order by begin_interval_time desc, sql_id;
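Since the question asks specifically about the past 5 hours, you can filter on the snapshot window; a sketch, assuming AWR is licensed (Diagnostics Pack) and its retention covers the period you care about:
select begin_interval_time, end_interval_time, sql_id, executions_delta
from dba_hist_sqlstat
join dba_hist_snapshot
    on dba_hist_sqlstat.snap_id = dba_hist_snapshot.snap_id
    and dba_hist_sqlstat.instance_number = dba_hist_snapshot.instance_number
where begin_interval_time >= systimestamp - interval '5' hour  -- snapshots from the last 5 hours
order by begin_interval_time desc, sql_id;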
Apologies for putting this in an answer instead of a comment (I don't have the required reputation), but I think you may be out of luck. There is an AskTOM thread asking basically the same question; Tom says that unless you are using ASH, this just isn't something the database is designed to do.
I'm looking to find out how to write a SQL query that finds the store locations contributing to 80% of inventory adjustments, along with each location's inventory accuracy calculation. I'm not quite sure how to go about it. So far I have the total absolute value of the adjustments, which the calculation will be based on. Here's what I have so far. Any help would be appreciated.
SELECT sum(abs(Details.ValueDifference)) As writeoff,
       (sum(Details.NumberofPartsCounted) - sum(Details.NumberofPartsCountedwithErrors)) / sum(Details.NumberofPartsCounted) As Accuracy
FROM Details;
Alright, well, I'm not 100% confident about what you're looking for, but here's my best guess. It looks like you want to select the location number along with what you're referring to as "writeoff" and "Accuracy" in your query. You'll have to group by the location number in order to get the sums for each location.
It sounds like the writeoff column is supposed to tell you how much a particular location has contributed towards total inventory adjustments? Assuming that's true, I would also order by writeoff in descending order so you see the locations with the highest values first.
SELECT N_LOCATION
      ,sum(abs(ValueDifference)) As writeoff
      ,(sum(NumberofPartsCounted) - sum(NumberofPartsCountedwithErrors)) / sum(NumberofPartsCounted) As Accuracy
FROM Details
GROUP BY N_LOCATION
ORDER BY writeoff DESC;
Honestly, I think the best way to accomplish what you're trying to do would be with a stored procedure. I would use the query above, plus a query that selects the sum of the absolute value of ValueDifference across all locations. Then I would save that grand total in a variable and use it while looping through the results of the first query: dividing each location's "writeoff" by the grand total gives that location's percentage. (A set-based sketch of the same idea follows below.)
I'm guessing you would then add locations to the result set until the locations you've added have percentages totaling 80%? I'm not sure exactly how you're defining that 80% rule, but hopefully you can adjust as needed.
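As mentioned, the grand total can also be folded into the query itself as a scalar subquery, avoiding the procedure loop; a sketch reusing the Details table and N_LOCATION column from above (pct_of_total is a name I've made up):
SELECT N_LOCATION
      ,sum(abs(ValueDifference)) As writeoff
      ,sum(abs(ValueDifference)) / (SELECT sum(abs(ValueDifference)) FROM Details) As pct_of_total  -- this location's share of all adjustments
FROM Details
GROUP BY N_LOCATION
ORDER BY writeoff DESC;
Accumulating pct_of_total down the sorted result until it reaches 0.80 would then give you the set of locations contributing 80% of adjustments.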
I have an sqlite3 table that tells when I gain/lose points in a game. Sample query and result:
SELECT time,p2 FROM events WHERE p1='barrycarter' AND action='points'
ORDER BY time;
1280622305|-22
1280625580|-9
1280627919|20
1280688964|21
1280694395|-11
1280698006|28
1280705461|-14
1280706788|-13
[etc]
I now want my running point total. Given that I start with 1000 points, here's one way to do it:
SELECT DISTINCT(time),
       (SELECT 1000 + SUM(p2)
        FROM events e
        WHERE p1='barrycarter' AND action='points'
          AND e.time <= e2.time) AS points
FROM events e2
WHERE p1='barrycarter' AND action='points'
ORDER BY time
but this is highly inefficient. What's a better way to write this?
MySQL has @variables, so there you can do things like:
SELECT time, @tot := @tot + points ...
but I'm using sqlite3, and the above isn't ANSI-standard SQL anyway.
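(For reference, the usual MySQL idiom seeds the variable in a derived table and sorts before accumulating; a sketch against the events table above, not valid in SQLite:)
SELECT time, @tot := @tot + p2 AS points
FROM (SELECT time, p2 FROM events
      WHERE p1='barrycarter' AND action='points'
      ORDER BY time) s,          -- sort first so the accumulation order is defined
     (SELECT @tot := 1000) init  -- seed the running total at 1000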
More info on the db if anyone needs it: http://ccgames.db.94y.info/
EDIT: Thanks for the answers! My dilemma: I let anyone run any single SELECT query on http://ccgames.db.94y.info/. I want to give them useful access to my data, but not to the point of allowing scripting or allowing multiple queries with state. So I need a single SQL query that can do accumulation. See also:
Existing solution to share database data usefully but safely?
SQLite is meant to be a small embedded database, and given that definition it is not unreasonable to find limitations in it. The task at hand is not solvable in SQLite alone without being terribly slow, as you have found: the query you have written is a triangular self-join (every row rescans all earlier rows), which scales badly.
The most efficient way to tackle the problem is in the program that is making use of SQLite; e.g. if you were using Web SQL in HTML5, you could easily accumulate in JavaScript.
There is a discussion about this problem on the SQLite mailing list.
Your two options are:
Iterate through all the rows with a cursor and calculate the running sum on the client.
Store sums instead of, or as well as, storing points. (If you only store sums, you can get the points back by doing sum(n) - sum(n-1), which is fast.)
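For what it's worth, SQLite 3.25 and later also support window functions, which turn the running total into a single SELECT (and so also fit the one-query constraint from the question's edit); a sketch using the question's events table:
SELECT time,
       1000 + SUM(p2) OVER (ORDER BY time) AS points  -- running total, seeded with the 1000 starting points
FROM events
WHERE p1='barrycarter' AND action='points'
ORDER BY time;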