Making a query more efficient for reads - sql

I have a data model like the following:
username | product1 | product2
-------------------------------
harold   | abc      | qrs
harold   | abc      | def
harold   | def      | abc
kim      | abc      | def
kim      | lmn      | qrs
...

username | friend_username
---------------------------
john     | harold
john     | kim
...
I want to build a histogram of the most frequent product1-to-product2 pairs, restricted to a given product1 id and to friends of john. So something like:
What do friends of john link to for product1, when product1 = 'abc'?
Select all of john's friends from the friends table. For each friend, find the records where product1 = 'abc', then group by product2, count, and sort the results in descending order:
Results:
abc -> def (2 instances)
abc -> qrs (1 instance)
I know we can do this in a relational database, but there will be some threshold where this kind of query starts using a lot of resources. Users might have a large number of friend records (500+). If this query is running 5 times every time a user loads a page, I'm worried I'll run out of resources quickly.
Is there some other table I can introduce to my model to relieve the overhead of running the above query every time users want to see the histogram breakdown? All I can think of is to precompute the histograms when possible so that reads are optimized.
Thanks for any ideas

Here's your query:
SELECT p.product2,
COUNT(p.product2) AS num_product
FROM PRODUCTS p
JOIN FRIENDS f ON f.friend_username = p.username
AND f.username = 'john'
WHERE p.product1 = 'abc'
GROUP BY p.product2
ORDER BY num_product DESC
To handle 5 products, use:
SELECT p.product1,
p.product2,
COUNT(p.product2) AS num_product
FROM PRODUCTS p
JOIN FRIENDS f ON f.friend_username = p.username
AND f.username = 'john'
WHERE p.product1 IN ('abc', 'def', 'ghi', 'jkl', 'mno')
GROUP BY p.product1, p.product2
ORDER BY num_product DESC
It's pretty simple, and the more you can filter the records down, the faster it will run because of being a smaller dataset.
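To make the first query concrete, here is a minimal sketch that runs it against the sample rows from the question using Python's built-in sqlite3 module. The lower-case table names, the helper name friend_histogram and the two indexes are assumptions for illustration, not something the question specified.

```python
import sqlite3

def friend_histogram(conn, user, product1):
    """Count product1 -> product2 pairs among the friends of one user."""
    sql = """
        SELECT p.product2, COUNT(*) AS num_product
        FROM products p
        JOIN friends f ON f.friend_username = p.username
        WHERE f.username = ? AND p.product1 = ?
        GROUP BY p.product2
        ORDER BY num_product DESC, p.product2
    """
    return conn.execute(sql, (user, product1)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (username TEXT, product1 TEXT, product2 TEXT);
    CREATE TABLE friends (username TEXT, friend_username TEXT);
    -- Hypothetical indexes: one to find a user's friends, one to find
    -- each friend's rows for a given product1.
    CREATE INDEX idx_friends_user ON friends (username, friend_username);
    CREATE INDEX idx_products_user_p1 ON products (username, product1);
""")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("harold", "abc", "qrs"),
        ("harold", "abc", "def"),
        ("harold", "def", "abc"),
        ("kim", "abc", "def"),
        ("kim", "lmn", "qrs"),
    ],
)
conn.executemany(
    "INSERT INTO friends VALUES (?, ?)",
    [("john", "harold"), ("john", "kim")],
)

histogram = friend_histogram(conn, "john", "abc")
print(histogram)  # [('def', 2), ('qrs', 1)]
```

The output matches the expected result in the question: abc -> def twice, abc -> qrs once.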
"If this query is running 5 times every time a user loads a page, I'm worried I'll run out of resources quickly."
My first question is why you'd run this query more than once per page. If it's to cover more than one friend, the query I posted can be updated to expose counts for products on a per friend or user basis.
After that, I'd wonder if the query can be cached at all. How fresh do you really need the data to be - is 2 hours acceptable? How about 6 or 12... We'd all like the data to be instantaneous, but you need to weigh that against performance and make a decision.
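If some staleness is acceptable, the caching idea could look roughly like the sketch below. It is an in-process, time-based cache, assuming a single application process; the class name TTLCache, the 2-hour lifetime and the stand-in expensive_query function are all hypothetical. A shared cache or a precomputed summary table would follow the same pattern.

```python
import time

class TTLCache:
    """Keep computed values for ttl_seconds, then recompute on demand."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so the demo can fake time
        self._store = {}            # key -> (stored_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]           # still fresh: skip the database
        value = compute()
        self._store[key] = (now, value)
        return value

# Demo with a fake clock and a stand-in for the real histogram query.
calls = []

def expensive_query():
    calls.append(1)
    return [("def", 2), ("qrs", 1)]

fake_now = [0.0]
cache = TTLCache(ttl_seconds=2 * 60 * 60, clock=lambda: fake_now[0])

first = cache.get_or_compute(("john", "abc"), expensive_query)
second = cache.get_or_compute(("john", "abc"), expensive_query)  # cache hit
fake_now[0] = 3 * 60 * 60                                        # 3 hours later
third = cache.get_or_compute(("john", "abc"), expensive_query)   # recomputed
print(len(calls))  # 2
```

The key is the (user, product1) pair, so each histogram on a page is cached independently.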

Related

Using Count and Group By in Power BI

I have a table that contains data about different benefit plans and users enrolled in one or more of those plans. So basically the table contains two columns representing the benefit plan counts and total users enrolled in those plans.
I need to create a visualization in Power BI to represent the total number of users enrolled in 1 plan, 2 plans, 3 plans, etc.
I wrote the query in SQL to get the desired result, but I'm not sure how to do the same in Power BI.
Below is my SQL query:
SELECT S.PlanCount, COUNT(S.UserName) AS Participants
FROM (
SELECT A.Username, COUNT(*) AS PlanCount
FROM [dbo].[vw_BenefitsCount_Plan_Participants] AS A
GROUP BY A.username
)AS S
GROUP BY S.PlanCount
ORDER BY S.PlanCount
The query result is shown in the image below:
So here, the PlanCount column represents the total number of different benefit plans that users are enrolled in. For example, the first row means that a total of 6008 members are enrolled in only 1 plan, whereas row 2 shows that a total of 3030 members are enrolled in 2 plans, and similarly row 5 means there are only 10 users who are enrolled in a total of 6 plans.
I am new to Power BI and trying to understand DAX functions, but I couldn't find a reasonable example that could help me create my visualization.
I found something similar here and here, but they seem to be geared more towards single count and group by usage.
Here is a simple example. I have a table of home owners who have homes in multiple cities.
Now in this table, Alex, Dave and Julie each have a home in 1 city (basically, these 3 people own just 1 home each). Similarly, Jim owns a total of 2 homes, and Bob and Pam each have 3 homes in total.
Now the output that I need is a table with total number of home owners that own 1 home, 2 homes and so on. So the resulting table in SQL is this.
Where NameCount is basically count of total home owners and Homes is the count of total homes these home owners have.
Please let me know if this helps.
Thanks.
If I understood correctly, you have a table like this:
BenefitPlan | User
1 | Max
1 | Joe
2 | Max
3 | Anna
If so, you can simply use a bar chart (for example) where the Axis is BenefitPlan and the Value is User. When you drag a column into the Value field, it is grouped automatically (like GROUP BY in SQL), and by default the grouping method is count.
Hope it helps.
Regards.
You can use DAX to create a summary table from your data table:
https://community.powerbi.com/t5/Desktop/Creating-a-summary-table-out-of-existing-table-assistance/td-p/431485
Once you have counted plans by customer you will then have a field that will enable you to visualize the # of customers with each count.
Mock-up of the code:
PlanSummary = SUMMARIZE('vw_BenefitsCount_Plan_Participants', 'vw_BenefitsCount_Plan_Participants'[Username], "PlanCount", COUNT('vw_BenefitsCount_Plan_Participants'[PLAN_ID]))
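The underlying logic is a "count of counts": first count rows per user, then count users per count. The sketch below checks that logic outside Power BI, using Python's sqlite3 and the home-owner example from the question. The table name home_owners and the city values are made up for illustration; only the owner names and their home counts come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE home_owners (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO home_owners VALUES (?, ?)",
    [
        # City names are placeholders; only the per-owner counts matter.
        ("Alex", "City1"),
        ("Dave", "City1"),
        ("Julie", "City2"),
        ("Jim", "City1"), ("Jim", "City2"),
        ("Bob", "City1"), ("Bob", "City2"), ("Bob", "City3"),
        ("Pam", "City1"), ("Pam", "City2"), ("Pam", "City3"),
    ],
)

# Inner query: homes per owner. Outer query: owners per home count.
count_of_counts = conn.execute("""
    SELECT s.homes, COUNT(s.name) AS name_count
    FROM (
        SELECT name, COUNT(*) AS homes
        FROM home_owners
        GROUP BY name
    ) AS s
    GROUP BY s.homes
    ORDER BY s.homes
""").fetchall()

print(count_of_counts)  # [(1, 3), (2, 1), (3, 2)]
```

In Power BI, the inner query corresponds to the summary table, and the outer grouping is what the visual does when the per-user count is used as the axis.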

Multiple query vs Single (multiple has many joins)

I recently stumbled on this situation. Both approaches might be "light" in my case; I just want to know which is better overall (performance, speed, etc.) when it comes to a big dataset.
Currently I run a single query that joins two 1:N (has-many) relationships and reduce/transform the data in the application.
It looks like this transformed/reduced:
[
'field' => 'value',
'hasMany-1' => [],
'hasMany-2' => []
]
I'm tempted to just run separate queries, since that eliminates the pain of reducing the result if I had more than 2 has-many relationships, and it is more readable. But the code currently works, so maybe I'll just do it next time.
Is the compromise worth it? Again, in my situation it might be very "light", as I only have a few rows (< 100) and the structure is not complex, since it is at an early stage.
But I'm asking in case I stumble upon this again when the dataset grows larger.
** EDIT **
So the has-many relationship I'm talking about are: A customer has-many phones and pets.
My current query returns me this result (simplified):
customer_id | pet_name | phone
1 | john | 1234
1 | john | 5678
2 | jane | 1357
2 | jane | 2468
2 | joe | 1357
2 | joe | 2468
I think my query is fine. It seems logical for some rows to repeat because the other field has different value.
In general, you should issue a single query and let the optimizer do the work for you. At the very least, this saves multiple round-trips to the database and query compilation.
There are cases where multiple queries can have better performance, but I think it is better to start with a single query.
You have a particular issue regarding joins along multiple many-to-many dimensions. There is no need to do the joins "generally" and then "reduce" the results. There are more efficient methods.
I would suggest that you ask another question. Provide sample data, desired results, and an explanation of the logic you are attempting. You may be able to learn a more efficient way to write a single query.
You did not describe your table structures, so I am assuming a few things.
If you want to have pets and phones as one row do:
select c.customer_id, c.name,
-- distinct: the pet x phone join repeats each value once per matching row
array_to_string(array_agg(distinct p.pet_name), ',') pet_names,
array_to_string(array_agg(distinct ph.phone), ',') phones
from customer c, pet p, phones ph
where p.customer_id=1
and p.customer_id=c.customer_id
and ph.customer_id=c.customer_id
group by c.customer_id, c.name
If you want to have row per pet_name with all possible phone numbers:
select c.customer_id, c.name,
p.pet_name,
array_to_string((array_agg(ph.phone)),',') phones
from customer c, pet p, phones ph
where p.customer_id=1
and p.customer_id=c.customer_id
and ph.customer_id=c.customer_id
group by c.customer_id, c.name, p.pet_name
As for performance, it will be faster to run 2 separate queries against pets and phones by customer_id. But until you have millions of rows, it is not that important.
Of course you should have indexes on customer_id.
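The row repetition in the question comes from joining two independent has-many tables at once: each customer yields (number of pets) x (number of phones) rows. The sketch below shows that effect, and the separate-queries alternative, with Python's sqlite3. The table and column names are assumptions based on the question's sample output, and load_customer is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE pets (customer_id INTEGER, pet_name TEXT);
    CREATE TABLE phones (customer_id INTEGER, phone TEXT);
    CREATE INDEX idx_pets_customer ON pets (customer_id);
    CREATE INDEX idx_phones_customer ON phones (customer_id);
""")
conn.executemany("INSERT INTO customers VALUES (?)", [(1,), (2,)])
conn.executemany(
    "INSERT INTO pets VALUES (?, ?)",
    [(1, "john"), (2, "jane"), (2, "joe")],
)
conn.executemany(
    "INSERT INTO phones VALUES (?, ?)",
    [(1, "1234"), (1, "5678"), (2, "1357"), (2, "2468")],
)

# Single query: pets x phones per customer, so 1*2 + 2*2 = 6 rows.
joined_rows = conn.execute("""
    SELECT c.customer_id, p.pet_name, ph.phone
    FROM customers c
    JOIN pets p ON p.customer_id = c.customer_id
    JOIN phones ph ON ph.customer_id = c.customer_id
""").fetchall()

def load_customer(conn, customer_id):
    """Two small queries; no de-duplication needed in the application."""
    pets = [row[0] for row in conn.execute(
        "SELECT pet_name FROM pets WHERE customer_id = ? ORDER BY pet_name",
        (customer_id,))]
    phones = [row[0] for row in conn.execute(
        "SELECT phone FROM phones WHERE customer_id = ? ORDER BY phone",
        (customer_id,))]
    return {"customer_id": customer_id, "pets": pets, "phones": phones}

print(len(joined_rows))        # 6
print(load_customer(conn, 2))
```

With three or more has-many relationships the joined row count multiplies again, which is where separate queries (or per-relationship aggregation) start to pay off.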

Is this SELECT and ORDER BY query the most efficient way I could have done it?

In my journey to learn SQL, I'm writing various queries on an old database of mine, but getting into more complex things, I want to make sure I'm not over engineering this. I have a table Agent, with different agents offering different prices for cities. Multiple agents can serve the same city, each with different prices. I wanted to run a query which would return the total cost of hiring all of the agents for any given city, ordered by the most expensive.
WITH orderedPrices AS (
SELECT SUM(agtFMPrice)
OVER (PARTITION BY agtCity)
AS IX FROM Agent)
SELECT IX
FROM orderedPrices
ORDER BY IX DESC
I found that doing it without the view returned by orderedPrices, it wouldn't order the prices (I assume because it's an aggregate function, or whatever they're called). Did I do this in the best way I could have, or could it be simplified?
Also, if you're feeling particularly bored, go ahead and give me a new assignment/query to do on this table. I could use the practice.
What you have written in English doesn't quite seem to match what you have written in SQL.
English:
- One record per City
- One field per record, showing the total cost of all associated agents
SQL:
- One record per Agent
- One field per record, showing the total cost of all agents in the same city
AgentID | agtCity | agtFMPrice
---------+---------+------------
1 | 1 | 10
2 | 1 | 20
3 | 2 | 30
4 | 2 | 10
5 | 2 | 25
Results of SQL version Results of English version
------------------------ ----------------------------
30 30
30 65
65
65
65
If you want the English version, I'd do this...
SELECT
agtCity,
SUM(agtFMPrice) AS IX
FROM
Agent
GROUP BY
agtCity
ORDER BY
SUM(agtFMPrice) DESC
To assist performance, the table could (should?) also have an Index on (agtCity)
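As a quick check of the two shapes described above, the sketch below loads the five sample agents into an in-memory SQLite database through Python's sqlite3. The second query only imitates the one-row-per-agent result of the windowed version by joining back to the grouped totals; that rewrite is an assumption made here so the example does not depend on window-function support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Agent (AgentID INTEGER, agtCity INTEGER, agtFMPrice INTEGER);
    CREATE INDEX idx_agent_city ON Agent (agtCity);
""")
conn.executemany(
    "INSERT INTO Agent VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 20), (3, 2, 30), (4, 2, 10), (5, 2, 25)],
)

# "English version": one row per city.
city_totals = conn.execute("""
    SELECT agtCity, SUM(agtFMPrice) AS IX
    FROM Agent
    GROUP BY agtCity
    ORDER BY IX DESC
""").fetchall()

# "SQL version" shape: one row per agent, each carrying its city's total.
per_agent_totals = conn.execute("""
    SELECT t.IX
    FROM Agent a
    JOIN (
        SELECT agtCity, SUM(agtFMPrice) AS IX
        FROM Agent
        GROUP BY agtCity
    ) AS t ON t.agtCity = a.agtCity
    ORDER BY t.IX DESC
""").fetchall()

print(city_totals)       # [(2, 65), (1, 30)]
print(per_agent_totals)  # [(65,), (65,), (65,), (30,), (30,)]
```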

SQL filter search according to multiple column values

I am dealing with one table (3+ million rows, SQL Server).
I need to filter results according to the two columns below:
... FromID | ToID | Column5 | ...
... 1001   | 2001
... 1002   | 2020
... 1003   | 5000
... 1001   | 3000
... 2001   | 1001
Now User1 can access records with FromID or ToId 1001.
FromID|ToID
1001|2001
1001|3000
2001|1001
User2 can access records with FromID or ToID 1002,1003,3000
FromID|ToID
1002|2020
1003|5000
1001|3000
What is the most efficient way to do this?
Do I need to create a view for each user? (This runs on enterprise; the user count will be at most 100.)
Thanks.
PS. My very first question. O.o
Your access criteria seem to be fairly arbitrary. User1 gets 1001, user2 gets 1002, 1003, and 3000, and I assume users 3 through 99 have arbitrary access as well. In that case, I recommend that you create a table, call it useraccess for this example:
user |accessID
---------------
user1|1001
user2|1002
user2|1003
user2|3000
... |...
Now when you want to know what rows a user has, you can do this:
SELECT t.FromID, t.ToID, [[other columns you care about]]
FROM yourtable t
JOIN useraccess a ON t.FromID = a.accessID OR t.ToID = a.accessID
WHERE a.user = 'user2'
You can either run that query dynamically or create a view based on it. The usual tradeoffs between views and direct queries apply.
Edit: I just saw your note that you already have a UserRights table, so you already have step 1 completed.
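One thing to watch with the OR join: if a user has access to both the FromID and the ToID of the same row, the join returns that row twice. The sketch below, run in SQLite via Python's sqlite3, uses an EXISTS filter instead so each row comes back once. The table name records, the column names and the extra user3 are hypothetical; user1 and user2 follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (from_id INTEGER, to_id INTEGER);
    CREATE TABLE useraccess (user_name TEXT, access_id INTEGER);
    CREATE INDEX idx_access_user ON useraccess (user_name, access_id);
""")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [(1001, 2001), (1002, 2020), (1003, 5000), (1001, 3000), (2001, 1001)],
)
conn.executemany(
    "INSERT INTO useraccess VALUES (?, ?)",
    [
        ("user1", 1001),
        ("user2", 1002), ("user2", 1003), ("user2", 3000),
        ("user3", 1001), ("user3", 3000),  # hypothetical: matches both columns
    ],
)

def visible_rows(conn, user_name):
    """Rows where FromID or ToID is in the user's access list, once each."""
    return conn.execute("""
        SELECT t.from_id, t.to_id
        FROM records t
        WHERE EXISTS (
            SELECT 1
            FROM useraccess a
            WHERE a.user_name = ?
              AND a.access_id IN (t.from_id, t.to_id)
        )
        ORDER BY t.from_id, t.to_id
    """, (user_name,)).fetchall()

def joined_row_count(conn, user_name):
    """Row count of the plain OR join, duplicates included."""
    return conn.execute("""
        SELECT COUNT(*)
        FROM records t
        JOIN useraccess a
          ON t.from_id = a.access_id OR t.to_id = a.access_id
        WHERE a.user_name = ?
    """, (user_name,)).fetchone()[0]

print(visible_rows(conn, "user2"))      # [(1001, 3000), (1002, 2020), (1003, 5000)]
print(joined_row_count(conn, "user3"))  # 4, although only 3 rows are visible
```

Adding DISTINCT to the join would also remove the duplicates, at the cost of collapsing rows that are genuinely identical in the selected columns.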

Use Access SQL to do a grouped ranking

How do I rank salespeople by # customers grouped by department (with ties included)?
For example, given this table, I want to create the Rank column on the right. How should I do this in Access?
SalesPerson | Dept  | #Customers | Rank
----------------------------------------
Bill        | DeptA | 20         | 1
Ted         | DeptA | 30         | 2
Jane        | DeptA | 40         | 3
Bill        | DeptB | 50         | 1
Mary        | DeptB | 60         | 2
I already know how to do a simple ranking with this SQL code. But I don't know how to rework this to accept grouping.
(Select Count(*) from [Tbl] Where [#Customers] < [Tblx]![#Customers]) + 1
Also, there's plenty of answers for this using SQL Server's Rank() function, but I need to do this in Access. Suggestions, please?
SELECT *, (select count(*) from tbl as tbl2 where
tbl.customers > tbl2.customers and tbl.dept = tbl2.dept) + 1 as rank from tbl
Just add the dept field to the subquery...
Great solution with the subquery! However, for huge recordsets the subquery solution gets very slow. It's better (quicker) to use a self join, as in the following solution:
SELECT tbl1.SalesPerson, tbl1.Dept, Count(tbl2.SalesPerson) + 1 AS Rank
FROM tbl AS tbl1 LEFT JOIN tbl AS tbl2
ON (tbl1.Dept = tbl2.Dept AND tbl1.[#Customers] > tbl2.[#Customers])
GROUP BY tbl1.SalesPerson, tbl1.Dept
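To compare the two approaches on the sample data, here is a sketch that runs a correlated-subquery ranking and a self-join ranking in SQLite through Python's sqlite3. This is not Access SQL: the table name sales and the column name customers (in place of #Customers) are assumptions. The self join here is a LEFT JOIN with COUNT + 1 and groups by department as well, so the lowest row in each department still gets rank 1 and the two Bills stay separate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (salesperson TEXT, dept TEXT, customers INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Bill", "DeptA", 20),
        ("Ted", "DeptA", 30),
        ("Jane", "DeptA", 40),
        ("Bill", "DeptB", 50),
        ("Mary", "DeptB", 60),
    ],
)

# Rank = 1 + number of rows in the same dept with fewer customers.
subquery_ranks = conn.execute("""
    SELECT t.salesperson, t.dept,
           (SELECT COUNT(*)
            FROM sales t2
            WHERE t2.dept = t.dept
              AND t2.customers < t.customers) + 1 AS rnk
    FROM sales t
    ORDER BY t.dept, rnk
""").fetchall()

# Same ranking as a self join; LEFT JOIN keeps rows with no smaller peer.
join_ranks = conn.execute("""
    SELECT t1.salesperson, t1.dept, COUNT(t2.salesperson) + 1 AS rnk
    FROM sales t1
    LEFT JOIN sales t2
           ON t2.dept = t1.dept
          AND t2.customers < t1.customers
    GROUP BY t1.salesperson, t1.dept
    ORDER BY t1.dept, rnk
""").fetchall()

print(subquery_ranks)
print(subquery_ranks == join_ranks)  # True
```

Both queries give tied rows the same rank, because ties are not counted as "fewer".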
I know this is an old thread. But since I spent a great deal of time on a very similar problem and was greatly helped by the former answers given here, I would like to share what I have found to be a MUCH faster way. (Beware, it is more complicated.)
First make another table called "Individualizer". This will have one field containing a list of numbers 1 through the-highest-rank-that-you-need.
Next create a VBA module and paste this into it:
'Global Declarations Section.
Option Explicit
Global Cntr
'*************************************************************
' Function: Qcntr()
'
' Purpose: This function will increment and return a dynamic
' counter. This function should be called from a query.
'*************************************************************
Function QCntr(x) As Long
Cntr = Cntr + 1
QCntr = Cntr
End Function
'**************************************************************
' Function: SetToZero()
'
' Purpose: This function will reset the global Cntr to 0. This
' function should be called each time before running a query
' containing the Qcntr() function.
'**************************************************************
Function SetToZero()
Cntr = 0
End Function
Save it as Module1.
Next, create Query1 like this:
SELECT Table1.Dept, Count(Table1.Salesperson) AS CountOfSalesperson
FROM Table1
GROUP BY Table1.Dept;
Create a MakeTable query called Query2 like this:
SELECT SetToZero() AS Expr1, QCntr([ID]) AS Rank, Query1.Dept,
Query1.CountOfSalesperson, Individualizer.ID
INTO Qtable1
FROM Query1
INNER JOIN Individualizer
ON Query1.CountOfSalesperson >= Individualizer.ID;
Create another MakeTable query called Query3 like this:
SELECT SetToZero() AS Expr1, QCntr([Identifier]) AS Rank,
[Salesperson] & [Dept] & [#Customers] AS Identifier, Table1.Salesperson,
Table1.Dept, Table1.[#Customers]
INTO Qtable2
FROM Table1;
If you have another field already that uniquely identifies every row you wouldn't need to create an Identifier field.
Run Query2 and Query3 to create the tables.
Create a fourth query called Query4 like this:
SELECT Qtable2.Salesperson, Qtable2.Dept, Qtable2.[#Customers], Qtable1.ID AS Rank
FROM Qtable1
INNER JOIN Qtable2 ON Qtable1.Rank = Qtable2.Rank;
Query4 returns the result you are looking for.
Practically, you would want to write a VBA function to run Query2 and Query3 and then call that function from a button placed in a convenient location.
Now I know this sounds ridiculously complicated for the example you gave. But in real life, I am sure your table is more complicated than this. Hopefully my examples can be applied to your actual situation. In my database with over 12,000 records this method is by FAR the fastest (as in: 6 seconds with 12,000 records compared to over 1 minute with 262 records ranked with the subquery method).
The real secret for me was the MakeTable query because this ranking method is useless unless you immediately output the results to a table. But, this does limit the situations that it can be applied to.
P.S. I forgot to mention that in my database I was not pulling results directly from a table. The records had already gone through a string of queries and multiple calculations before they needed to be ranked. This probably contributed greatly to the huge difference in speed between the two methods in my situation. If you are pulling records directly from a table, you might not notice nearly as big an improvement.
You need to do some math. I typically take advantage of the combination of a counter field and an "offset" field. You're aiming for a table which looks like this (#Customers isn't necessary, but will give you a visual that you're doing it properly):
SalesPerson | Dept  | #Customers | Ctr | Offset
------------------------------------------------
Bill        | DeptA | 20         | 1   | 1
Ted         | DeptA | 30         | 2   | 1
Jane        | DeptA | 40         | 3   | 1
Bill        | DeptB | 50         | 4   | 4
Mary        | DeptB | 60         | 5   | 4
So, to give rank, you'd do [Ctr]-[Offset]+1 AS Rank:
1. Build a table with SalesPerson, Dept, Ctr, and Offset.
2. Insert into that table, ordered by Dept and #Customers (so that they're all sorted properly).
3. Update Offset to be the MIN(Ctr), grouping on Dept.
4. Perform your math calculation to determine Rank.
5. Clear out the table so you're ready to use it again next time.
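The counter-and-offset arithmetic can be sketched in a few lines of Python, outside Access, just to show how Ctr, Offset and Rank relate. The function name rank_with_offset and the dictionary keys are hypothetical. Note that this approach gives tied rows consecutive ranks rather than the same rank.

```python
def rank_with_offset(rows):
    """Assign Ctr, Offset and Rank following the counter/offset steps."""
    # Insert order: sorted by department, then by customer count.
    ordered = sorted(rows, key=lambda r: (r["dept"], r["customers"]))
    offsets = {}  # dept -> smallest Ctr seen in that dept (MIN(Ctr))
    ranked = []
    for ctr, row in enumerate(ordered, start=1):
        # The first row of each dept has the minimum counter for that dept.
        offset = offsets.setdefault(row["dept"], ctr)
        ranked.append({**row, "ctr": ctr, "offset": offset,
                       "rank": ctr - offset + 1})
    return ranked

sample = [
    {"salesperson": "Bill", "dept": "DeptA", "customers": 20},
    {"salesperson": "Ted", "dept": "DeptA", "customers": 30},
    {"salesperson": "Jane", "dept": "DeptA", "customers": 40},
    {"salesperson": "Bill", "dept": "DeptB", "customers": 50},
    {"salesperson": "Mary", "dept": "DeptB", "customers": 60},
]

ranked = rank_with_offset(sample)
for row in ranked:
    print(row["salesperson"], row["dept"], row["ctr"], row["offset"], row["rank"])
```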
To add to this and any other related Access ranking or rank tie-breaker how-tos for other versions of Access: ranking should not be performed when your FROM clause contains not a table but a query that is either a crosstab query or that itself references a crosstab query somewhere.
The code referenced above, where a SELECT statement is nested within a SELECT statement (a subquery),
"SELECT *, (select count(*) from tbl as tbl2 where tbl.customers > tbl2.customers and tbl.dept = tbl2.dept) + 1 as rank from tbl"
will not work in that case and will always fail with an error saying that the "tbl.customers > tbl2.customers" portion of the code cannot be found.
On a past project, I was referencing a query instead of a table, and that query referenced a crosstab query, so it failed with an error. I resolved this by first creating a table from the crosstab query; once I referenced the newly created table in the FROM clause, it started working.
So, in summary: normally you can reference either a query or a table in the FROM clause of the SELECT statement shared above to do ranking, but be careful. If you reference a query instead of a table, that query must not be a crosstab query or reference another query that is a crosstab query.
Hope this helps anyone who is looking for a possible reason why the statements above fail when the FROM clause in their own project does not reference a table. Also, performing subqueries on aliases with crosstab queries in Access probably isn't a good idea or best practice either, so stay away from that if/when possible.
If you found this useful, and wish that Access would allow the use of a scrolling mouse in a passthru query editor, give me a like please.
I normally pick tips and ideas from here and sometimes end up building amazing things from it!
Today (well, let's say for the past week), I have been tinkering with ranking data in Access, and I did not anticipate that it would be complex enough to take me a week to figure out! I picked up titbits from two main sources:
https://usefulgyaan.wordpress.com/2013/04/23/ranking-in-ms-access/ (see that clever '>=' part, and the self joins? Amazing. It helped me build my solution from just one query, as opposed to the complex method suggested above by asonoftheMighty. I'm not discrediting you; I just didn't want to try it for now. Maybe when I get to large data I will want to try that as well.)
Right here, from Paul Abott above ('and tbl.dept = tbl2.dept'). I was lost after ranking because I was placing AND YearID = 1, etc., so the ranking would end up happening only for subsets (you guessed right, when YearID = 1)! But I had a lot of different scenarios...
Well, I told that story partly to thank the contributors mentioned. What I did is, to me, one of the most complex rankings, and I think it can help you in almost any situation. Since I benefited from others, I would like to share here what I hope may benefit others as well.
Forgive me that I am not able to post my table structures here; there are a lot of related tables. I will only post the query, so if you need to, you can develop your tables to end up with that kind of query. But here is my scenario:
You have students in a school. They go through classes 1 to 4 and can be in stream A or B, or in none when the class is too small. They each take 4 exams (this part is not important now), so in my case you get the total score. That's it. Huh??
OK. Let's rank them this way:
We want to know the ranking of
• all students who ever passed through this school (best ever student)
• all students in a particular academic year (student of the year)
• students of a particular class (but remember a student will have passed through all classes, so basically his/her rank in each of those classes for the different years) this is the usual ranking that appears in report cards
• students in their streams (above comment applies)
• I would also like to know the population against which we ranked this student in each category
… all in one table/query. Now you get the point?
(I normally like to do as much of my 'programming' as possible in the database/queries, to give me visuals and to reduce the amount of code I will later have to write. I actually won't use this query in my application :), but it lets me know where and how to send my parameters to the query it came from, and what results to expect in my rdlc.)
Don't you worry, here it is:
SELECT Sc.StudentID, Sc.StudentName, Sc.Mark,
(SELECT COUNT(Sch.Mark) FROM [StudentScoreRankTermQ] AS Sch WHERE (Sch.Mark >= Sc.Mark)) AS SchoolRank,
(SELECT Count(s.StudentID) FROM StudentScoreRankTermQ AS s) As SchoolTotal,
(SELECT COUNT(Yr.Mark) FROM [StudentScoreRankTermQ] AS Yr WHERE (Yr.Mark >= Sc.Mark) AND (Yr.YearID = Sc.YearID) ) AS YearRank,
(SELECT COUNT(StudentID) FROM StudentScoreRankTermQ AS Yt WHERE (Yt.YearID = Sc.YearID) ) AS YearTotal,
(SELECT COUNT(Cl.Mark) FROM [StudentScoreRankTermQ] AS Cl WHERE (Cl.Mark >= Sc.Mark) AND (Cl.YearID = Sc.YearID) AND (Cl.TermID = Sc.TermID) AND (Cl.ClassID=Sc.ClassID)) AS ClassRank,
(SELECT COUNT(StudentID) FROM StudentScoreRankTermQ AS C WHERE (C.YearID = Sc.YearID) AND (C.TermID = Sc.TermID) AND (C.ClassID = Sc.ClassID) ) AS ClassTotal,
(SELECT COUNT(Str.Mark) FROM [StudentScoreRankTermQ] AS Str WHERE (Str.Mark >= Sc.Mark) AND (Str.YearID = Sc.YearID) AND (Str.TermID = Sc.TermID) AND (Str.ClassID=Sc.ClassID) AND (Str.StreamID = Sc.StreamID) ) AS StreamRank,
(SELECT COUNT(StudentID) FROM StudentScoreRankTermQ AS St WHERE (St.YearID = Sc.YearID) AND (St.TermID = Sc.TermID) AND (St.ClassID = Sc.ClassID) AND (St.StreamID = Sc.StreamID) ) AS StreamTotal,
Sc.CalendarYear, Sc.Term, Sc.ClassNo, Sc.Stream, Sc.StreamID, Sc.YearID, Sc.TermID, Sc.ClassID
FROM StudentScoreRankTermQ AS Sc
ORDER BY Sc.Mark DESC;
You should get something like this:
+-----------+-------------+------+------------+-------------+----------+-----------+-----------+------------+------------+-------------+------+------+-------+--------+
| StudentID | StudentName | Mark | SchoolRank | SchoolTotal | YearRank | YearTotal | ClassRank | ClassTotal | StreamRank | StreamTotal | Year | Term | Class | Stream |
+-----------+-------------+------+------------+-------------+----------+-----------+-----------+------------+------------+-------------+------+------+-------+--------+
| 1 | Jane | 200 | 1 | 20 | 2 | 12 | 1 | 9 | 1 | 5 | 2017 | I | 2 | A |
| 2 | Tom | 199 | 2 | 20 | 1 | 12 | 3 | 9 | 1 | 4 | 2016 | I | 1 | B |
+-----------+-------------+------+------------+-------------+----------+-----------+-----------+------------+------------+-------------+------+------+-------+--------+
Use the separators | to reconstruct the result table
Just to give an idea of the tables: each student is related to a class. Each class relates to years. Each stream relates to a class. Each term relates to a year. Each exam relates to a term, a student, a class and a year; a student can be in class 1A in 2016 and move on to class 2B in 2017, etc.
Let me also add that this is a beta result. I have not tested it well enough, and I have not yet had an opportunity to create a lot of data to see how it performs. My first glance at it told me that it is good. So if you find reasons or alerts you want to point my way, please do so in the comments so I can keep learning!
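A cut-down version of the same idea, with only the school-wide and per-year ranks and totals, can be tried in SQLite through Python's sqlite3. The table name scores, its columns, and the rows for Ann and Sam are invented for illustration; only Jane and Tom with their marks and years come from the sample output above. One property worth knowing: because the rank counts rows with a mark >= the current mark, tied students all receive the lowest position of their tie group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (student TEXT, mark INTEGER, year_id INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [
        ("Jane", 200, 2017),
        ("Tom", 199, 2016),
        ("Ann", 150, 2017),   # hypothetical row
        ("Sam", 120, 2016),   # hypothetical row
    ],
)

# Each rank is a count of rows at or above this mark within some scope;
# each total is the size of that scope.
ranking = conn.execute("""
    SELECT s.student, s.mark,
           (SELECT COUNT(*) FROM scores x
            WHERE x.mark >= s.mark) AS school_rank,
           (SELECT COUNT(*) FROM scores x) AS school_total,
           (SELECT COUNT(*) FROM scores x
            WHERE x.mark >= s.mark
              AND x.year_id = s.year_id) AS year_rank,
           (SELECT COUNT(*) FROM scores x
            WHERE x.year_id = s.year_id) AS year_total
    FROM scores s
    ORDER BY s.mark DESC
""").fetchall()

for row in ranking:
    print(row)
```

Narrower scopes (class, stream) follow the same pattern: add the extra equality conditions to both the rank subquery and its matching total subquery.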