For a school exercise, I've been asked to write the following query:
SELECT A.ACCOUNT_ID, AVG(AMOUNT)
FROM ACCOUNTS A
INNER JOIN TRANSACTIONS T ON A.ACCOUNT_ID = T.ACCOUNT_ID
GROUP BY A.ACCOUNT_ID
The point of the exercise is to run the whole statement without including new transactions that are registered while it is in progress. My first thought was to use the highest isolation level, SERIALIZABLE, or to set the transaction READ ONLY. However, I'm wondering whether that's really necessary.
Can a single statement like this be considered atomic? If so, does that mean the engine works on the data as it was when the query started?
If not, what is the correct way to do it?
I've been given a schema for this problem, and I'm trying to do two things:
Update the ACCOUNT table's average_eval column with the average of the evaluation column from the POST_EVAL table, per account_id.
Update the ACCOUNT table with a count of the number of posts per account_id, with a default value of 0 if the account_id has no post_id associated with it.
Here's the kicker: I MUST use an UPDATE statement, and I'm not allowed to use triggers for these specific problems.
I've tried WITH clauses and GROUP BY but haven't gotten anywhere. I'm using PostgreSQL's pgAdmin, for reference.
Any help setting up these queries?
The first question can be done using something like this:
update account a
set average_eval = t.avg_eval
from (
select account_id, avg(evaluation) as avg_eval
from post_eval
group by account_id
) t
where t.account_id = a.account_id
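You can try the first statement without a PostgreSQL server. Here is a minimal sketch that uses Python's built-in sqlite3 module and an in-memory database to run the same UPDATE ... FROM shape. The cut-down account/post_eval tables and the sample rows are hypothetical. SQLite only accepts UPDATE ... FROM from version 3.33 onward.

```python
import sqlite3

def update_average_eval(conn: sqlite3.Connection) -> None:
    # Same shape as the PostgreSQL statement: aggregate per account in a
    # derived table, then join it to account through the FROM clause.
    conn.execute("""
        UPDATE account AS a
        SET average_eval = t.avg_eval
        FROM (
            SELECT account_id, AVG(evaluation) AS avg_eval
            FROM post_eval
            GROUP BY account_id
        ) AS t
        WHERE t.account_id = a.account_id
    """)

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema: only the columns named in the question.
conn.executescript("""
    CREATE TABLE account (account_id INTEGER PRIMARY KEY, average_eval REAL);
    CREATE TABLE post_eval (account_id INTEGER, evaluation REAL);
    INSERT INTO account VALUES (1, NULL), (2, NULL);
    INSERT INTO post_eval VALUES (1, 4), (1, 2);
""")
update_average_eval(conn)
rows = conn.execute(
    "SELECT account_id, average_eval FROM account ORDER BY account_id"
).fetchall()
print(rows)  # [(1, 3.0), (2, None)]
```

Account 2 has no rows in post_eval, so it never appears in the derived table t and keeps its previous value (NULL here).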
The second question needs a correlated subquery, as there is no direct way to express an outer join in an UPDATE ... FROM statement like the one above:
update account a
set num_posts = (select count(*)
from post p
where p.account_id = a.account_id);
count(*) returns zero (0) if there are no posts for that account. If a join were used (as in the first statement), those accounts would not be updated at all, because the join condition wouldn't match.
I have not tested either of those statements, so they may contain typos (or even logical errors).
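Here is a minimal sketch of the correlated-subquery version that shows the zero for an account without posts. It again uses Python's sqlite3 with a hypothetical two-table schema and made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema for illustration only.
conn.executescript("""
    CREATE TABLE account (account_id INTEGER PRIMARY KEY, num_posts INTEGER);
    CREATE TABLE post (post_id INTEGER PRIMARY KEY, account_id INTEGER);
    INSERT INTO account VALUES (1, NULL), (2, NULL);
    INSERT INTO post VALUES (10, 1), (11, 1), (12, 1);
""")

# The subquery runs for every account row, so accounts with no posts are
# still updated, and COUNT(*) over zero matching rows yields 0.
conn.execute("""
    UPDATE account
    SET num_posts = (SELECT COUNT(*)
                     FROM post AS p
                     WHERE p.account_id = account.account_id)
""")
rows = conn.execute(
    "SELECT account_id, num_posts FROM account ORDER BY account_id"
).fetchall()
print(rows)  # [(1, 3), (2, 0)]
```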
Unrelated, but: I understand that this is some kind of assignment, so you have no choice. But as RiggsFolly has mentioned, in general you should avoid storing information in a relational database that can be derived from existing data. Both values can easily be calculated in a view, and would then always be up to date.
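Here is a sketch of the view suggestion that computes both values on read, using SQLite via Python with hypothetical minimal tables. It uses scalar subqueries rather than joining post and post_eval to account together. Joining both child tables at once would multiply rows and inflate the post count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables cut down to the columns the question mentions.
conn.executescript("""
    CREATE TABLE account (account_id INTEGER PRIMARY KEY);
    CREATE TABLE post (post_id INTEGER PRIMARY KEY, account_id INTEGER);
    CREATE TABLE post_eval (account_id INTEGER, evaluation REAL);

    -- Both derived values are computed when the view is read,
    -- so there is nothing to keep in sync.
    CREATE VIEW account_stats AS
    SELECT a.account_id,
           (SELECT AVG(e.evaluation) FROM post_eval AS e
             WHERE e.account_id = a.account_id) AS average_eval,
           (SELECT COUNT(*) FROM post AS p
             WHERE p.account_id = a.account_id) AS num_posts
    FROM account AS a;

    INSERT INTO account VALUES (1), (2);
    INSERT INTO post VALUES (10, 1);
    INSERT INTO post_eval VALUES (1, 5);
""")

STATS = "SELECT * FROM account_stats ORDER BY account_id"
before = conn.execute(STATS).fetchall()
conn.execute("INSERT INTO post VALUES (11, 2)")  # no UPDATE needed afterwards
after = conn.execute(STATS).fetchall()
print(before)  # [(1, 5.0, 1), (2, None, 0)]
print(after)   # [(1, 5.0, 1), (2, None, 1)]
```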
I have this SQL diagram:
I want to get all the subscribers who like exactly 0 reporters.
I started with: SELECT * FROM subscriber HAVING count(...)
How do I count how many reporters a subscriber likes?
Should a relationship get its own table in an SQL database?
I'm not completely sure I understand your last question, but this sounds like a good use case for a NOT IN clause.
SELECT *
FROM Subscriber
WHERE Id NOT IN (SELECT SubscriberId
FROM Likes
INNER JOIN Reporter ON Reporter.Id = Likes.ReporterId)
The inner query finds the IDs of all subscribers who like at least one reporter; the outer query then returns all the other subscribers. You might be able to make this query more efficient by changing that INNER JOIN into another IN, but you'd have to experiment with it.
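Here is a runnable sketch of the NOT IN query in SQLite via Python. The table layout (Subscriber, Reporter, and a Likes table with SubscriberId/ReporterId) is inferred from the query itself, and the rows are made up. Subscriber 3 likes a reporter ID that has no Reporter row. The INNER JOIN therefore drops that like, and subscriber 3 is treated as liking zero reporters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables inferred from the query: Likes links subscribers to reporters.
conn.executescript("""
    CREATE TABLE Subscriber (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Reporter (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Likes (SubscriberId INTEGER, ReporterId INTEGER);
    INSERT INTO Subscriber VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO Reporter VALUES (100, 'rita');
    INSERT INTO Likes VALUES (1, 100), (3, 999);  -- 999 has no Reporter row
""")
zero_likes = conn.execute("""
    SELECT Name FROM Subscriber
    WHERE Id NOT IN (SELECT SubscriberId
                     FROM Likes
                     INNER JOIN Reporter ON Reporter.Id = Likes.ReporterId)
    ORDER BY Id
""").fetchall()
print(zero_likes)  # [('bob',), ('cy',)]
```

One general caveat with NOT IN: if the subquery can return a NULL SubscriberId, the NOT IN test is never true and the query returns no rows. NOT EXISTS does not have that problem.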
As for counting them, I'd probably just do this. You could use GROUP BY and so on, but this is simple:
SELECT *, (SELECT COUNT(*)
FROM Likes
INNER JOIN Reporter ON Reporter.Id = Likes.ReporterId
WHERE Likes.SubscriberId = Subscriber.Id) AS ReportersCount
FROM Subscriber
Note that for your stated task of finding the ones with zero reporters, the first query will likely be faster, because it can short-circuit rather than count every reporter for every row. Of course, neither should be too bad as long as you have the appropriate indexes.
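The counting query can be exercised the same way. This sketch uses SQLite via Python with the same inferred schema and made-up rows. It also filters on ReportersCount = 0 in an outer query, which gives the same answer as the NOT IN version but counts every subscriber's likes first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema inferred from the answer's queries; rows are made up.
conn.executescript("""
    CREATE TABLE Subscriber (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Reporter (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Likes (SubscriberId INTEGER, ReporterId INTEGER);
    INSERT INTO Subscriber VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO Reporter VALUES (100, 'rita'), (101, 'raj');
    INSERT INTO Likes VALUES (1, 100), (1, 101), (3, 100);
""")

# Correlated scalar subquery: one COUNT per subscriber row.
COUNTED = """
    SELECT Subscriber.Name,
           (SELECT COUNT(*)
            FROM Likes
            INNER JOIN Reporter ON Reporter.Id = Likes.ReporterId
            WHERE Likes.SubscriberId = Subscriber.Id) AS ReportersCount
    FROM Subscriber
    ORDER BY Subscriber.Id
"""
counts = conn.execute(COUNTED).fetchall()

# The same count answers the "exactly 0" task once filtered in an outer query.
zero_reporters = conn.execute(
    f"SELECT Name FROM ({COUNTED}) WHERE ReportersCount = 0"
).fetchall()
print(counts)          # [('ann', 2), ('bob', 0), ('cy', 1)]
print(zero_reporters)  # [('bob',)]
```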
I often find myself running a query to get the number of people who meet a certain criterion, the total number of people in that population, and then the percentage that meets that criterion. I've been doing it the same way for a while, and I was wondering how SO would solve this type of problem. Here is how I wrote the query:
select m.state_cd
,m.injurylevel
,COUNT(distinct m.patid) as pplOnRx
,x.totalPatientsPerState
,round((COUNT(distinct m.patid) /cast(x.totalPatientsPerState as float))*100,2) as percentPrescribedNarcotics
from members as m
inner join rx on rx.patid=m.PATID
inner join DrugTable as dt on dt.drugClass=rx.drugClass
inner join
(
select m2.state_cd, m2.injurylevel, COUNT(distinct m2.patid) as totalPatientsPerState
from members as m2
inner join rx on rx.patid=m2.PATID
group by m2.STATE_CD,m2.injuryLevel
) x on x.state_cd=m.state_cd and m.injuryLevel=x.injurylevel
where drugText like '%narcotics%'
group by m.state_cd,m.injurylevel,x.totalPatientsPerState
order by m.STATE_CD,m.injuryLevel
In this example, not everyone who appears in the members table is in the rx table. The derived table counts everyone who is in both rx and members, without the drugText like '%narcotics%' condition. From what little I've played with it, it seems that an OVER (PARTITION BY ...) clause might work here. I have no idea whether it does; it just seems that way to me. How would someone else tackle this problem?
This is exactly what MDX and SSAS are designed to do. If you insist on doing it in SQL (nothing wrong with that), are you asking for a way to do it with better performance? If so, that depends on how the tables are indexed, on tempdb speed and, if the tables are partitioned, on the partitioning as well.
Also, the distinct count is going to be one of the larger performance hits. The leading wildcard in like '%narcotics%' rules out an index seek on drugText, so it will force a full scan and should be avoided if at all possible (could this be an integer key in the data model?).
To answer your question, I'm not really sure windowing (OVER (PARTITION BY ...)) is going to perform any better. I would test it and see, but there is nothing "wrong" with the query.
You could rewrite the COUNT(DISTINCT ...) expressions as virtual (derived) tables or temp tables with GROUP BY, or a combination of the two.
To illustrate, this is a stub for windowing that you could grow into the same query:
select a.state_cd,a.injurylevel,a.totalpatid, count(*) over (partition by a.state_cd, a.injurylevel)
from
(select state_cd,injurylevel,count(*) as totalpatid, count(distinct patid) as patid
from
#members
group by state_cd,injurylevel
) a
See what I mean about it not really being that helpful? Then again, rewriting a query slightly can sometimes improve performance by letting the optimizer choose a better execution plan. But rather than taking stabs in the dark, I'd first find the bottlenecks in the query you have, since you already took the time to write it.
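The question floats OVER (PARTITION BY ...), so here is a rough sketch of how the stub might grow into the full calculation. It runs on SQLite (3.25 or later, for window functions) from Python, with tiny made-up versions of members, rx and DrugTable. SQL Server (and SQLite) do not allow DISTINCT inside a windowed aggregate. The windowed query therefore first collapses to one row per patient with a 0/1 narcotics flag. The second query swaps in a different technique, conditional aggregation, which gets numerator and denominator in one GROUP BY without the derived table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature versions of the question's tables and sample rows.
conn.executescript("""
    CREATE TABLE members (patid INTEGER, state_cd TEXT, injurylevel INTEGER);
    CREATE TABLE rx (patid INTEGER, drugClass TEXT);
    CREATE TABLE DrugTable (drugClass TEXT, drugText TEXT);
    INSERT INTO members VALUES (1, 'NY', 1), (2, 'NY', 1), (3, 'NY', 1), (4, 'CA', 2);
    INSERT INTO rx VALUES (1, 'A'), (2, 'B'), (4, 'A'), (4, 'B');  -- patient 3 has no rx
    INSERT INTO DrugTable VALUES ('A', 'opioid narcotics'), ('B', 'antibiotics');
""")

# Window version: one row per patient, then SUM/COUNT ... OVER a partition.
WINDOWED = """
    SELECT DISTINCT state_cd, injurylevel,
           SUM(on_narcotics) OVER (PARTITION BY state_cd, injurylevel) AS pplOnRx,
           COUNT(*) OVER (PARTITION BY state_cd, injurylevel) AS totalPatientsPerState
    FROM (
        SELECT m.state_cd, m.injurylevel, m.patid,
               MAX(CASE WHEN dt.drugText LIKE '%narcotics%' THEN 1 ELSE 0 END) AS on_narcotics
        FROM members AS m
        INNER JOIN rx ON rx.patid = m.patid
        LEFT JOIN DrugTable AS dt ON dt.drugClass = rx.drugClass
        GROUP BY m.state_cd, m.injurylevel, m.patid
    ) AS p
    ORDER BY state_cd, injurylevel
"""

# Conditional aggregation: a single pass, no derived table for the denominator.
CONDITIONAL = """
    SELECT m.state_cd, m.injurylevel,
           COUNT(DISTINCT CASE WHEN dt.drugText LIKE '%narcotics%' THEN m.patid END) AS pplOnRx,
           COUNT(DISTINCT m.patid) AS totalPatientsPerState,
           ROUND(100.0 * COUNT(DISTINCT CASE WHEN dt.drugText LIKE '%narcotics%'
                                            THEN m.patid END)
                 / COUNT(DISTINCT m.patid), 2) AS percentPrescribedNarcotics
    FROM members AS m
    INNER JOIN rx ON rx.patid = m.patid
    LEFT JOIN DrugTable AS dt ON dt.drugClass = rx.drugClass
    GROUP BY m.state_cd, m.injurylevel
    ORDER BY m.state_cd, m.injurylevel
"""

windowed = conn.execute(WINDOWED).fetchall()
conditional = conn.execute(CONDITIONAL).fetchall()
print(windowed)     # [('CA', 2, 1, 1), ('NY', 1, 1, 2)]
print(conditional)  # [('CA', 2, 1, 1, 100.0), ('NY', 1, 1, 2, 50.0)]
```

Unlike the original query, the conditional-aggregation version also returns state/injury groups where nobody is on narcotics (pplOnRx = 0), because no WHERE clause filters them out. Whether either rewrite is faster on SQL Server is something to measure rather than assume.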
In my application I have a table of application events that are used to generate a user-specific feed of application events. Because the feed is generated using an OR query, I'm concerned about the performance of this heavily used query and am wondering whether I'm approaching this wrong.
In the application, users can follow both other users and groups. When an action is performed (e.g., a new post is created), a feed_item record is created with actor_id set to the user's id and subject_id set to the id of the group in which the action was performed, and actor_type and subject_type are set to the class names of the models. Since users can follow both groups and users, I need to generate a query that checks both actor_id and subject_id, and it needs to select distinct records to avoid duplicates. Since it's an OR query, I can't use a normal index. And since a record is created every time an action is performed, I expect this table to grow quickly.
Here's the current query (the followings table joins users to feeders, i.e., users and groups):
SELECT DISTINCT feed_items.* FROM "feed_items"
INNER JOIN "followings"
ON (
(followings.feeder_id = feed_items.subject_id
AND followings.feeder_type = feed_items.subject_type)
OR
(followings.feeder_id = feed_items.actor_id
AND followings.feeder_type = feed_items.actor_type)
)
WHERE (followings.follower_id = 42) ORDER BY feed_items.created_at DESC LIMIT 30 OFFSET 0
So my questions:
Since this is a heavily used query, is there a performance problem here?
Is there any obvious way to simplify or optimize this that I'm missing?
What you have is called an exclusive arc and you're seeing exactly why it's a bad idea. The best approach for this kind of problem is to make the feed item type dynamic:
Feed Items: id, type (A or S for Actor or Subject), subtype (replaces actor_type and subject_type)
and then your query becomes
SELECT DISTINCT fi.*
FROM feed_items fi
JOIN followings f ON f.feeder_id = fi.id AND f.feeder_type = fi.type AND f.feeder_subtype = fi.subtype
or similar.
This may not completely or exactly represent what you need to do, but the principle is sound: you need to eliminate the reason for the OR condition by changing your data model so that it lends itself to performant queries.
Use EXPLAIN ANALYZE and time the query to see whether there is a problem.
Also, you could try expressing the query as a UNION:
SELECT x.* FROM
(
SELECT feed_items.* FROM feed_items
INNER JOIN followings
ON followings.feeder_id = feed_items.subject_id
AND followings.feeder_type = feed_items.subject_type
WHERE (followings.follower_id = 42)
UNION
SELECT feed_items.* FROM feed_items
INNER JOIN followings
ON followings.feeder_id = feed_items.actor_id
AND followings.feeder_type = feed_items.actor_type
WHERE (followings.follower_id = 42)
) AS x
ORDER BY x.created_at DESC
LIMIT 30
But again, run EXPLAIN ANALYZE and benchmark.
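Before benchmarking, it is worth checking that the UNION rewrite returns the same rows as the OR join. Here is a small sketch against SQLite from Python, with made-up rows and a parameter placeholder instead of the literal 42. The composite indexes are an assumption about what the UNION branches could use, since each branch is a plain equality join. Whether PostgreSQL's planner actually uses them is what EXPLAIN ANALYZE would tell you.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE feed_items (
        id INTEGER PRIMARY KEY,
        actor_id INTEGER, actor_type TEXT,
        subject_id INTEGER, subject_type TEXT,
        created_at TEXT
    );
    CREATE TABLE followings (follower_id INTEGER, feeder_id INTEGER, feeder_type TEXT);
    -- Hypothetical indexes: each UNION branch is an equality join on one pair.
    CREATE INDEX fi_subject ON feed_items (subject_type, subject_id);
    CREATE INDEX fi_actor ON feed_items (actor_type, actor_id);
    CREATE INDEX f_follower ON followings (follower_id);

    INSERT INTO followings VALUES (42, 7, 'User'), (42, 3, 'Group'), (43, 8, 'User');
    INSERT INTO feed_items VALUES
        (1, 7, 'User', 3, 'Group', '2024-01-01'),  -- matches actor and subject
        (2, 7, 'User', 9, 'Group', '2024-01-02'),  -- matches actor only
        (3, 8, 'User', 3, 'Group', '2024-01-03'),  -- matches subject only
        (4, 8, 'User', 9, 'Group', '2024-01-04'),  -- followed by 43, not 42
        (5, 3, 'User', 9, 'Group', '2024-01-05');  -- id 3 but wrong type
""")

OR_QUERY = """
    SELECT DISTINCT feed_items.* FROM feed_items
    INNER JOIN followings
      ON (followings.feeder_id = feed_items.subject_id
          AND followings.feeder_type = feed_items.subject_type)
      OR (followings.feeder_id = feed_items.actor_id
          AND followings.feeder_type = feed_items.actor_type)
    WHERE followings.follower_id = :me
    ORDER BY feed_items.created_at DESC LIMIT 30
"""

UNION_QUERY = """
    SELECT x.* FROM (
        SELECT feed_items.* FROM feed_items
        INNER JOIN followings
          ON followings.feeder_id = feed_items.subject_id
         AND followings.feeder_type = feed_items.subject_type
        WHERE followings.follower_id = :me
        UNION  -- UNION (not UNION ALL) removes the duplicate for item 1
        SELECT feed_items.* FROM feed_items
        INNER JOIN followings
          ON followings.feeder_id = feed_items.actor_id
         AND followings.feeder_type = feed_items.actor_type
        WHERE followings.follower_id = :me
    ) AS x
    ORDER BY x.created_at DESC LIMIT 30
"""

or_ids = [row[0] for row in conn.execute(OR_QUERY, {"me": 42})]
union_ids = [row[0] for row in conn.execute(UNION_QUERY, {"me": 42})]
print(or_ids, union_ids)  # [3, 2, 1] [3, 2, 1]
```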
To find out whether there is a performance problem, measure it. PostgreSQL can EXPLAIN it for you.
I don't think the query needs simplifying; if you identify a performance problem, you may need to revise your indexes.
I'm almost done with this, just a few last hiccups. I now need to delete all records from a table except for the top 1, where readings_miu_id is the "DISTINCT" column. In other words, I need to delete all records from the table other than the first DISTINCT readings_miu_id. I am assuming all I need to do is modify the basic DELETE statement:
DELETE FROM analyzedCopy2
WHERE readings_miu_id = some_value
But I can't figure out how to change the readings_miu_id = some_value part to something like:
where some_column notequal to (select top 1 from analyzedCopy2 as A
where analyzedCopy2.readings_miu_id = A.readings_miu_id)
Then I need to figure out how to use an UPDATE statement to update a table (analyzedCopy2) from a query (testQuery3, which is where all of the values I want stored in the RSSI column of analyzedCopy2 are currently located). I've tried this:
UPDATE analyzedCopy2 from testQuery3 SET analyzedCopy2.RSSI =
(select AvgOfRSSI from testQuery3 INNER JOIN analyzedCopy2 on analyzedCopy2.readings_miu_id = testQuery3.readings_miu_id where analyzedCopy2.readings_miu_id = testQuery3.readings_miu_id)
where analyzedCopy2.readings_miu_id = testQuery3.readings_miu_id
but apparently I can't use FROM inside an UPDATE statement. Any thoughts?
I'm sure I'm going about this in a very nonstandard (and possibly, if not probably, flat-out wrong) way. But I'm not allowed to use VB.NET 2008 to pull, manipulate and then store the data as I would like, so for now I'm stuck using SQL statements in MS Access. That is a good learning experience, even if trying to do such odd things in SQL statements is making me beat my head against my desk (figuratively, of course).
MS Access UPDATE statements generally can't take their values from a query like this (an aggregate query is not updateable), but they can reference tables. So the thing to do is store the query results in a table.
SELECT YourQuery.*
INTO TempTable1
FROM YourQuery
Now you can use TempTable1 in an UPDATE query:
UPDATE TargetTable
INNER JOIN TempTable1 ON TempTable1.TargetTableId = TargetTable.Id
SET TargetTable.TargetField = TempTable1.SourceField
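SQLite has neither SELECT ... INTO nor UPDATE ... INNER JOIN, so the following is not Access syntax. It only sketches the same staging idea in Python's sqlite3: materialize the aggregate query into a table, then update from that table. The readings source table and its rows are hypothetical, and testQuery3 is modeled as a view that averages RSSI per readings_miu_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical source table; testQuery3 modeled as an aggregate view.
    CREATE TABLE readings (readings_miu_id INTEGER, RSSI REAL);
    CREATE VIEW testQuery3 AS
        SELECT readings_miu_id, AVG(RSSI) AS AvgOfRSSI
        FROM readings GROUP BY readings_miu_id;
    CREATE TABLE analyzedCopy2 (readings_miu_id INTEGER, RSSI REAL);

    INSERT INTO readings VALUES (1, -70), (1, -80), (2, -60);
    INSERT INTO analyzedCopy2 VALUES (1, NULL), (2, NULL), (3, NULL);

    -- Step 1: store the query results in a real table
    -- (the Access answer does this with SELECT ... INTO TempTable1).
    CREATE TABLE TempTable1 AS SELECT * FROM testQuery3;

    -- Step 2: update the target from the stored table, touching only rows
    -- that have a match (like the INNER JOIN in the Access version).
    UPDATE analyzedCopy2
    SET RSSI = (SELECT AvgOfRSSI FROM TempTable1
                WHERE TempTable1.readings_miu_id = analyzedCopy2.readings_miu_id)
    WHERE readings_miu_id IN (SELECT readings_miu_id FROM TempTable1);
""")
result = conn.execute(
    "SELECT readings_miu_id, RSSI FROM analyzedCopy2 ORDER BY readings_miu_id"
).fetchall()
print(result)  # [(1, -75.0), (2, -60.0), (3, None)]
```

SQLite itself could update straight from the view. The intermediate table is there only to mirror the Access workaround.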
I don't have a copy of Access on this machine, and it's been a few years since I dabbled in Access, so I'm taking a wild stab here, but can you do something like this:
delete from analyzedCopy2
where readings_miu_id not in (select top 1 readings_miu_id from analyzedCopy2 order by...)
(You'll need the ORDER BY to get the proper top 1 record; order by the id, maybe?)
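The goal might instead be to keep one row per readings_miu_id rather than one row in the whole table. A common pattern for that is to keep the lowest id in each group and delete the rest. This sketch runs on SQLite from Python. It assumes a hypothetical unique id column on analyzedCopy2, which the question doesn't show, and it is untested in Access.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical: assumes analyzedCopy2 has a unique id column to pick "first" by.
conn.executescript("""
    CREATE TABLE analyzedCopy2 (id INTEGER PRIMARY KEY, readings_miu_id INTEGER, RSSI REAL);
    INSERT INTO analyzedCopy2 VALUES
        (1, 100, -70), (2, 100, -71),
        (3, 200, -60), (4, 200, -61),
        (5, 300, -50);

    -- Keep the lowest id per readings_miu_id, delete every other row.
    DELETE FROM analyzedCopy2
    WHERE id NOT IN (SELECT MIN(id) FROM analyzedCopy2 GROUP BY readings_miu_id);
""")
remaining = conn.execute(
    "SELECT id, readings_miu_id FROM analyzedCopy2 ORDER BY id"
).fetchall()
print(remaining)  # [(1, 100), (3, 200), (5, 300)]
```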
I've got no hope of helping you with the second one without a copy of Access. I know how I'd do it in T-SQL, but Access is a whole different kettle of wtf's :-)
I was trying to make it too complicated. Since all of the records I needed to pull had the same information in each field I needed, all I had to do was use:
SELECT DISTINCT readings_miu_id, DateRange, RSSI, ColRSSI, Firmware, CFGDate, FreqCorr, Active, OriginCol, ColID, Ownage, SiteID, PremID, prem_group1, prem_group2
FROM analyzedCopy2
ORDER BY readings_miu_id;
in order to pull the top 1 record per readings_miu_id.
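Here is a quick sketch of why this works, using SQLite via Python and a hypothetical three-column cut of analyzedCopy2. DISTINCT collapses rows that are identical in every selected column. It therefore yields one row per readings_miu_id only while the other columns really are identical within each id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical three-column cut of analyzedCopy2 with duplicated rows.
conn.executescript("""
    CREATE TABLE analyzedCopy2 (readings_miu_id INTEGER, RSSI REAL, Firmware TEXT);
    INSERT INTO analyzedCopy2 VALUES
        (100, -70, 'fw1'), (100, -70, 'fw1'),  -- identical duplicates
        (200, -60, 'fw2');
""")
QUERY = """
    SELECT DISTINCT readings_miu_id, RSSI, Firmware
    FROM analyzedCopy2
    ORDER BY readings_miu_id
"""
distinct_rows = conn.execute(QUERY).fetchall()

# If one column differs within an id, DISTINCT keeps both variants.
conn.execute("INSERT INTO analyzedCopy2 VALUES (200, -61, 'fw2')")
with_variant = conn.execute(QUERY).fetchall()
print(distinct_rows)      # [(100, -70.0, 'fw1'), (200, -60.0, 'fw2')]
print(len(with_variant))  # 3
```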