How to improve the speed of this SQL update query?

Sorry, this is my first time using this forum. Apparently people can edit my post, which, although helpful, has removed some information.
I will try to make it more understandable.
I am using SQL Compact 3.5 as a local database.
The program is written in VB.NET.
The problem is that a query against one of my tables is taking too long.
The player table has, among other things, id, skill, school, weight, starter.
id is the player's id
skill is the player's skill level
school is a foreign key pointing to the id of the school table
weight is one of 14 different numbers
What I am trying to do is set the starter value = 'true' for the player with the highest skill at a given weight for a given school.
So if there are 100 players at a school, there will be 14 starters, one for each weight.
The player table has 170,000 players, each having 1 of 14 different weights, and each belongs to 1 of 4500 schools.
Someone commented below and showed this statement, which appears to be on the right track. I am a novice and have not gotten it implemented quite yet.
"UPDATE p " &
"SET starter = 'TRUE' " &
"FROM player p " &
"JOIN (" &
"SELECT DISTINCT school, weight, MAX(skill) AS MaxSkill " &
"FROM player " &
"GROUP BY school, weight" &
") q ON q.school = p.school AND q.weight = p.weight AND q.MaxSkill = p.skill"

Instead of doing a group-by-group, row-by-row approach, this update query does it all at once:
First, it gathers the highest skill for each school / weight combination.
It then joins that to the player that has the matching school / weight / skill combination, and then sets that player to the starter.
UPDATE p
SET starter = 'TRUE'
FROM player p
JOIN (
    SELECT school, weight, MAX(skill) AS MaxSkill
    FROM player
    GROUP BY school, weight
) maxResults
    ON maxResults.school = p.school
    AND maxResults.weight = p.weight
    AND maxResults.MaxSkill = p.skill
However, in the case of a tie in skill, all players with the highest skill would be set to a starter...
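If exactly one starter per school and weight is required, some tie-breaker has to be chosen. A minimal sketch, assuming the lowest id wins a tie; it uses SQLite through Python purely for illustration (SQL Compact syntax may differ), and the sample rows are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE player (
        id INTEGER PRIMARY KEY,
        skill INTEGER,
        school INTEGER,
        weight INTEGER,
        starter TEXT DEFAULT 'FALSE'
    );
    INSERT INTO player (id, skill, school, weight) VALUES
        (1, 90, 1, 103),
        (2, 90, 1, 103),  -- ties with player 1
        (3, 70, 1, 103),
        (4, 80, 1, 112),
        (5, 60, 2, 103);
""")

# Pick one id per (school, weight): highest skill first, lowest id on a tie.
# The tie-break rule is an assumption, not something the question specifies.
conn.execute("""
    UPDATE player
    SET starter = 'TRUE'
    WHERE id = (
        SELECT b.id
        FROM player AS b
        WHERE b.school = player.school
          AND b.weight = player.weight
        ORDER BY b.skill DESC, b.id ASC
        LIMIT 1
    )
""")

starters = [row[0] for row in conn.execute(
    "SELECT id FROM player WHERE starter = 'TRUE' ORDER BY id")]
print(starters)  # [1, 4, 5]
```

Player 2 has the same skill as player 1 but is not marked, because the subquery returns a single id per group.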

There's some minor confusion over the use of weight, as I'm assuming you're not doing this on a per-unit basis. You may want to extract the ranges out to another table, then use the ids there instead of a numeric weight.
In any case, here's a query that should work on most RDBMSs, although the alias on the UPDATE target and the TRUE literal may need adjusting for your database:
UPDATE player a SET starter = TRUE
WHERE NOT EXISTS (SELECT '1'
FROM player b
WHERE b.school = a.school
AND b.weight = a.weight
AND b.skill > a.skill)
The inner query returns no rows (so NOT EXISTS is true and starter is set) if:
There are no other players at the school
There are no other players at the same school in the same weight class
There are no players with a higher skill level for the same school and weight class
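For every player, the NOT EXISTS form has to look for a row with the same school and weight and a higher skill, so a composite index on those three columns is likely to matter for speed. A small sketch, assuming SQLite via Python for illustration; the index name is hypothetical, 1/0 stand in for the boolean, and the rows are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE player (
        id INTEGER PRIMARY KEY,
        skill INTEGER,
        school INTEGER,
        weight INTEGER,
        starter INTEGER DEFAULT 0
    );
    -- Hypothetical index name; covers the columns the subquery filters on.
    CREATE INDEX ix_player_school_weight_skill
        ON player (school, weight, skill);
    INSERT INTO player (id, skill, school, weight) VALUES
        (1, 90, 1, 103),
        (2, 70, 1, 103),
        (3, 80, 1, 112),
        (4, 60, 2, 103),
        (5, 60, 2, 103);  -- tie: both 4 and 5 become starters
""")

# A player is a starter when nobody in the same school and weight is better.
conn.execute("""
    UPDATE player
    SET starter = 1
    WHERE NOT EXISTS (
        SELECT 1
        FROM player AS b
        WHERE b.school = player.school
          AND b.weight = player.weight
          AND b.skill > player.skill
    )
""")

starter_ids = [row[0] for row in conn.execute(
    "SELECT id FROM player WHERE starter = 1 ORDER BY id")]
print(starter_ids)  # [1, 3, 4, 5]
```

As with the join version, tied players are all marked as starters.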

Related

Having SQL Server choose and show one record over other

Ok, hopefully I can explain this accurately. I work in SQL Server, and I am trying to get one row from a table that will show multiple rows for the same person for various reasons.
There is a column called college_attend which will show either New or Cont for each student.
My issue: my initial query narrows down the rows I'm pulling by Academic Year, which consists of two semesters: Fall of one year, and Spring of the following to create an academic year. This is why there are two rows returned for some students.
Basically, I need to generate an accurate count of those that are "New" and those that are "Cont", but I don't want both records for the same student counted. They will have two records because they will have one for spring and one for fall (usually). So if a student is "New" in fall, they will have a "Cont" record for spring. I want the query to show ONLY the "New" record if they have both a "New" and a "Cont" record, and count it (which I will do in Report Builder). The other students will basically have two records that are "Cont", one for fall and one for spring, and so those would be considered the continuing ones or "Cont".
Here is the basic query I have so far:
SELECT DISTINCT
    people.people_id,
    people.last_name,
    people.first_name,
    academic.college_attend AS NewORCont,
    academic.academic_year,
    academic.academic_term
FROM
    academic
INNER JOIN
    people ON people.people_id = academic.people_id
INNER JOIN
    academiccalendar acc ON acc.academic_year = academic.academic_year
    AND acc.academic_term = academic.academic_term
    AND acc.true_academic_year = @Academic_year
I'm not sure if this can be done with a CASE statement? I thought of a GROUP BY, but then SQL Server will want me to add all of my columns to the GROUP BY clause, and that ends up negating the purpose of the grouping in the first place.
Just a sample of what I work with for each student:
People ID   Last     First   NeworCont
---------   ------   -----   ---------
12345       Soanso   Guy     New
12345       Soanso   Guy     Cont
32345       Person   Nancy   Cont
32345       Person   Nancy   Cont
55555       Smith    John    New
55555       Smith    John    Cont
Hopefully this sheds some light on the duplicate record issue I mentioned.

Without sample data it's awkward to visualize the problem, and without the expected results it's also unclear what you want as the outcome. Perhaps this will help: it limits the results to only those who have both 'New' and 'Cont' in a single "true academic year". The count seems redundant, though, as (I imagine) it will always be 2 (1 New term and 1 Cont term).
SELECT
      people.people_id
    , people.last_name
    , people.first_name
    , acc.true_academic_year
    , count(*) AS count_of
FROM academic
INNER JOIN people ON people.people_id = academic.people_id
INNER JOIN academiccalendar acc ON acc.academic_year = academic.academic_year
    AND acc.academic_term = academic.academic_term
    AND acc.true_academic_year = @Academic_year
GROUP BY
      people.people_id
    , people.last_name
    , people.first_name
    , acc.true_academic_year
-- 'New' sorts after 'Cont', so MAX/MIN together detect that both values are present
HAVING MAX(academic.college_attend) = 'New'
    AND MIN(academic.college_attend) = 'Cont'
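For the count the question describes (one row per student, with "New" taking precedence over "Cont"), grouping by student and collapsing the status with a conditional aggregate is one option. A rough sketch in SQLite via Python; the single `enrollment` table is a hypothetical stand-in for the joined academic/people rows of one academic year, and the rows mirror the sample in the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified stand-in for the joined rows of one year.
    CREATE TABLE enrollment (
        people_id INTEGER,
        term TEXT,
        college_attend TEXT
    );
    INSERT INTO enrollment VALUES
        (12345, 'Fall',   'New'),
        (12345, 'Spring', 'Cont'),
        (32345, 'Fall',   'Cont'),
        (32345, 'Spring', 'Cont'),
        (55555, 'Fall',   'New'),
        (55555, 'Spring', 'Cont');
""")

# One row per student: 'New' if any term is 'New', otherwise 'Cont'.
per_student = conn.execute("""
    SELECT people_id,
           CASE WHEN SUM(CASE WHEN college_attend = 'New' THEN 1 ELSE 0 END) > 0
                THEN 'New' ELSE 'Cont' END AS new_or_cont
    FROM enrollment
    GROUP BY people_id
    ORDER BY people_id
""").fetchall()

# Tally the collapsed statuses (the question does this step in Report Builder).
counts = {}
for _, status in per_student:
    counts[status] = counts.get(status, 0) + 1

print(per_student)  # [(12345, 'New'), (32345, 'Cont'), (55555, 'New')]
print(counts)       # {'New': 2, 'Cont': 1}
```

Because only people_id is grouped, the other descriptive columns would need to be added to the GROUP BY or wrapped in an aggregate such as MAX.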

Find entities sorted by aggregated property of their relations

Question is quite similar to Find entity with most relations filtered by criteria, but slightly different.
model Player {
  id   String @id
  name String @unique
  game Game[]
}
model Game {
  id       String  @id
  isWin    Boolean
  playerId String
  player   Player  @relation(fields: [playerId], references: [id])
}
I would like to find the 10 players with the best win rate (games with isWin=true divided by the total number of games played by that player).
The direct and slow way to do that is to find all the players who won at least once and count their wins (first query), then count the total number of games for each of them (second query), and then do the math and sorting on the application side while holding the results in memory.
Is there a simpler way to do that? How would I do that with Prisma? If there is no Prisma "native" way to do it, what is the most efficient way to do this with raw SQL?

This is simple aggregation:
SELECT p.id, p.name
     , COUNT(CASE WHEN isWin THEN 1 END) AS wins
     , COUNT(g.playerId) AS played
     , 100 * COUNT(CASE WHEN isWin THEN 1 END)
           / COUNT(g.playerId) AS rate
  FROM player AS p
  JOIN game AS g
    ON g.playerId = p.id
 GROUP BY p.id, p.name -- name is unique; id is included so both selected columns are grouped.
 ORDER BY rate DESC
 LIMIT 10
;
Adjust as needed for your database. I adjusted in case the join becomes a LEFT JOIN, to handle players with no games played.
Executable example with PG, as indicated by the comments below.
Working test case - no data
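If the join is turned into a LEFT JOIN, players without games produce a zero denominator, and integer division truncates the percentage. A small sketch of one way to handle both, assuming SQLite via Python for illustration; the table and column names follow the Prisma model, the rows are made up, and isWin is stored as 1/0 because SQLite has no boolean type:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE player (id TEXT PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE game (
        id TEXT PRIMARY KEY,
        isWin INTEGER,            -- 1 = win, 0 = loss
        playerId TEXT REFERENCES player(id)
    );
    INSERT INTO player VALUES ('p1', 'ann'), ('p2', 'bob'), ('p3', 'cy');
    INSERT INTO game VALUES
        ('g1', 1, 'p1'), ('g2', 1, 'p1'), ('g3', 0, 'p1'), ('g4', 1, 'p1'),
        ('g5', 1, 'p2'), ('g6', 0, 'p2');
""")

rows = conn.execute("""
    SELECT p.name,
           COUNT(CASE WHEN g.isWin = 1 THEN 1 END) AS wins,
           COUNT(g.playerId) AS played,
           -- 100.0 forces real division; NULLIF avoids dividing by zero.
           100.0 * COUNT(CASE WHEN g.isWin = 1 THEN 1 END)
               / NULLIF(COUNT(g.playerId), 0) AS rate
    FROM player AS p
    LEFT JOIN game AS g ON g.playerId = p.id
    GROUP BY p.id, p.name
    -- SQLite sorts NULL last under DESC; PostgreSQL would need NULLS LAST.
    ORDER BY rate DESC
    LIMIT 10
""").fetchall()
print(rows)
# [('ann', 3, 4, 75.0), ('bob', 1, 2, 50.0), ('cy', 0, 0, None)]
```

From Prisma, a statement like this would typically be sent as a raw query, with identifiers quoted to match the generated table and column names.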

How can I select the highest counts attributes from different groups?

So I have a table with players data(name, team, etc..) and a table with goals (player who scored it, local team, etc...). What I need to do is, get from each team the highest scorer. So the result I'm getting is something like:
germany - whatever name - 1
germany - another dude - 5
spain - another name - 8
italy - one more name - 6
As you can see teams repeat, and I want them not to, just get the highest scorer of each team.
Right now I have this:
SELECT P.TEAM_PLAYER, G.PLAYER_GOAL, COUNT(*) AS "TOTAL GOALS" FROM PLAYER P, GOAL G
WHERE TO_CHAR(G.DATE_GOAL, 'YYYY')=2002
AND P.NAME = G.PLAYER_GOAL
GROUP BY G.PLAYER_GOAL, P.TEAM_PLAYER
HAVING COUNT(*)>=ALL (SELECT COUNT(*) FROM PLAYER P2 where P.TEAM_PLAYER = P2.TEAM_PLAYER GROUP BY P2.TEAM_PLAYER)
ORDER BY COUNT(*) DESC;
I am 100% sure I'm close, and I'm pretty sure I have to do this with the HAVING feature, but I can't get it right.
Without the HAVING it returns a list of all the players, their teams and how many goals have they scored, now I want to cut it down to only one player for each team.
PS: the teams in the GOAL table are the local and visiting teams, so I have to use the PLAYER table to get the player's team. Also, the GOAL table is not a list of players and how many goals they have scored, but a list of every individual goal and the player who scored it.

If I understand correctly, you can try this query.
Take the MAX of the PLAYER_GOAL column and use SUM(G.PLAYER_GOAL) instead of COUNT(*):
SELECT P.TEAM_PLAYER,
MAX(G.PLAYER_GOAL) "PLAYER_GOAL",
SUM(G.PLAYER_GOAL) AS "TOTAL GOALS"
FROM PLAYER P
INNER JOIN GOAL G
ON P.NAME = G.PLAYER_NAME
WHERE TO_CHAR(G.DATE_GOAL, 'YYYY')=2002
GROUP BY P.TEAM_PLAYER
ORDER BY SUM(G.PLAYER_GOAL) DESC;
NOTE:
Avoid using commas to join tables; it's an old join style. Use INNER JOIN instead.
Edit
I don't know your table schema, but this query might work.
Use a subquery to contain your current result set, then apply MAX and GROUP BY:
SELECT T.TEAM_PLAYER,
T.PLAYER_GOAL,
MAX(TOTAL_GOALS) AS "TOTAL GOALS"
FROM
(
SELECT P.TEAM_PLAYER, G.PLAYER_GOAL, COUNT(*) AS "TOTAL_GOALS" FROM
PLAYER P, GOAL G
WHERE TO_CHAR(G.DATE_GOAL, 'YYYY')=2002
AND P.NAME = G.PLAYER_GOAL
GROUP BY G.PLAYER_GOAL, P.TEAM_PLAYER
HAVING COUNT(*)>=ALL (SELECT COUNT(*) FROM PLAYER P2 where P.TEAM_PLAYER = P2.TEAM_PLAYER GROUP BY P2.TEAM_PLAYER)
) T
GROUP BY T.TEAM_PLAYER,
T.PLAYER_GOAL
ORDER BY MAX(TOTAL_GOALS) DESC
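Since the GOAL table holds one row per goal, another way to read the problem is: count goals per player first, then keep the players whose total equals the maximum for their team. A hedged sketch in SQLite via Python; the column names follow the question, the rows are made up, and strftime stands in for Oracle's TO_CHAR:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE player (name TEXT PRIMARY KEY, team_player TEXT);
    CREATE TABLE goal (player_goal TEXT, date_goal TEXT);  -- one row per goal
    INSERT INTO player VALUES
        ('g1', 'germany'), ('g2', 'germany'),
        ('s1', 'spain'), ('i1', 'italy');
    INSERT INTO goal VALUES
        ('g1', '2002-06-01'), ('g1', '2002-06-05'), ('g1', '2002-06-11'),
        ('g2', '2002-06-21'),
        ('s1', '2002-06-02'), ('s1', '2002-06-07'),
        ('i1', '2002-06-03'),
        ('i1', '2001-03-28');  -- outside 2002, must not count
""")

top_scorers = conn.execute("""
    WITH totals AS (
        -- Goals per player in 2002, with the team taken from PLAYER.
        SELECT p.team_player, g.player_goal, COUNT(*) AS total_goals
        FROM player AS p
        INNER JOIN goal AS g ON g.player_goal = p.name
        WHERE strftime('%Y', g.date_goal) = '2002'
        GROUP BY p.team_player, g.player_goal
    )
    SELECT t.team_player, t.player_goal, t.total_goals
    FROM totals AS t
    WHERE t.total_goals = (
        -- Best total within the same team.
        SELECT MAX(t2.total_goals)
        FROM totals AS t2
        WHERE t2.team_player = t.team_player
    )
    ORDER BY t.total_goals DESC, t.team_player
""").fetchall()
print(top_scorers)
# [('germany', 'g1', 3), ('spain', 's1', 2), ('italy', 'i1', 1)]
```

Players tied for the team maximum would all be returned, so a team can still appear more than once in that case.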

SQL - Remove Duplicates in Single Field

SELECT Company.CompanyName
,Student.Status
,Student.Level
,Student.PlacementYear
,Company.CompanyCode
,Company.HREmail
,Company.Telephone
,Company.HRContact
,PlacedStudents.DateAdded
FROM Student
RIGHT JOIN (Company INNER JOIN PlacedStudents
ON Company.CompanyCode = PlacedStudents.CompanyCode)
ON Student.StudentNo = PlacedStudents.StudentNo
WHERE (((Student.PlacementYear)=" & Year & "))
AND((Student.Status)<>'Still Seeking YOPE')
ORDER BY Company.CompanyName
I have this SQL query, which pulls HR contacts from companies where students are currently placed. However, there are multiple students at one company, so when I run the query there are duplicates. I'm fairly new to SQL. I tried DISTINCT, but it didn't seem to do anything; the duplicates remained.
How can I remove duplicates in the CompanyCode field so that each company appears only once when the query is run?
Below is an image of what happens when I run query. Hopefully this makes sense?
Any help would be appreciated.

This query should give you companies that have placed students:
SELECT Company.CompanyName
,Company.CompanyCode
,Company.HREmail
,Company.Telephone
,Company.HRContact
FROM Company
WHERE EXISTS (SELECT * FROM PlacedStudents INNER JOIN
Student ON Student.StudentNo = PlacedStudents.StudentNo
WHERE Company.CompanyCode = PlacedStudents.CompanyCode
AND Student.PlacementYear =" & Year & "
AND Student.Status <>'Still Seeking YOPE')
ORDER BY Company.CompanyName;
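The difference between the join and the EXISTS form can be seen on a tiny data set. The sketch below uses SQLite via Python for illustration, with made-up rows and only the columns needed; it also binds the year as a parameter instead of concatenating it into the SQL string, which is a substitution on my part rather than something the original code does:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company (CompanyCode TEXT PRIMARY KEY, CompanyName TEXT);
    CREATE TABLE Student (StudentNo INTEGER PRIMARY KEY, Status TEXT,
                          PlacementYear INTEGER);
    CREATE TABLE PlacedStudents (StudentNo INTEGER, CompanyCode TEXT);
    INSERT INTO Company VALUES ('A', 'Acme'), ('B', 'Beta');
    INSERT INTO Student VALUES
        (1, 'Placed', 2015), (2, 'Placed', 2015), (3, 'Placed', 2014);
    INSERT INTO PlacedStudents VALUES (1, 'A'), (2, 'A'), (3, 'B');
""")

year = 2015  # bound as a parameter instead of concatenated into the SQL text

join_sql = """
    SELECT Company.CompanyName
    FROM Company
    INNER JOIN PlacedStudents
            ON PlacedStudents.CompanyCode = Company.CompanyCode
    INNER JOIN Student ON Student.StudentNo = PlacedStudents.StudentNo
    WHERE Student.PlacementYear = ?
      AND Student.Status <> 'Still Seeking YOPE'
"""
exists_sql = """
    SELECT Company.CompanyName
    FROM Company
    WHERE EXISTS (
        SELECT 1
        FROM PlacedStudents
        INNER JOIN Student ON Student.StudentNo = PlacedStudents.StudentNo
        WHERE PlacedStudents.CompanyCode = Company.CompanyCode
          AND Student.PlacementYear = ?
          AND Student.Status <> 'Still Seeking YOPE'
    )
"""
join_rows = conn.execute(join_sql, (year,)).fetchall()
exists_rows = conn.execute(exists_sql, (year,)).fetchall()
print(join_rows)    # [('Acme',), ('Acme',)]  one row per placed student
print(exists_rows)  # [('Acme',)]             one row per company
```

EXISTS only tests whether a matching student exists, so it cannot multiply company rows the way a join does.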

Your question is asking for HR Contacts from Companies where students are placed. I assume this means if you have 1, 2 or 1,000,000 students at a single company, you only want to see the company listed once?
Your current query is returning information from STUDENT and PLACEDSTUDENTS which is going to result in output like
COMPANY_A STUDENT01 .........
COMPANY_A STUDENT02 .........
COMPANY_A STUDENT03 .........
and so on.
If so, and taking a best guess (since I can't know what's in STUDENT or PLACEDSTUDENTS tables), try not including anything related to STUDENT in the SELECT.
SELECT DISTINCT Company.CompanyName, Company.CompanyCode, Company.HREmail,
Company.Telephone, Company.HRContact FROM
I'll be happy to help more if you can provide more information about the structure of the tables and some examples of data, AND what you actually want from the query.

Convert a SQL subquery into a join when looking at another record in the same table Access 2010

I have read that joins are more efficient than subqueries. I have a query that is extremely slow and uses lots of subqueries, so I would like to improve it, but I do not know how.
I have the following tables:
People \\this table stores lists of individual people with the following fields
(
    ID, \\Primary Key
    aacode Text, \\represents an individual house
    PERSNO number, \\represents the number of the person in the house e.g. person number 1
    HRP number, \\the PERSNO of the Housing Reference Person (HRP), the "main" person in the house
    DVHsize number, \\the number of people in the house
    R01 number, \\the person's relationship to the person who is PERSNO=1
    R02 number, \\the person's relationship to the person who is PERSNO=2
    R03 number, \\the person's relationship to the person who is PERSNO=3
    AgeCat text, \\the age range of the person e.g. 30-44
    xMarSta number, \\represents the marital status of the person
)
Relatives \\this table stores the possible R01 numbers and their text equivalents
(
    ID Primary Key, \\all possible R01 values
    Relationship text, \\meaning of the corresponding R01 values
)
xMarSta \\this table stores the possible xMarSta values and their text equivalents
(
    ID Primary Key, \\all possible xMarSta values
    Marital text, \\meaning of the corresponding xMarSta values
)
The query is:
HsHld - the goal of this query is to produce, for each house (i.e. each aacode), a text string describing the house in the form [Marital][AgeCat][Relationship][AgeCat][Relationship][AgeCat] etc. So the output for a three-person house might look like Married(30-44)Spouse(30-44)Child(1-4)
I know my current code for HsHld is terrible, but it is included below:
SELECT People.ID, People.aacode, People.PERSNO,
People.HRP, People.DVHsize, xMarSta.Marital,
[Marital] & " (" & [AgeCat] & ")" & [RAL2] & [RAge2] &
[RAL3] & [RAge3] & [RAL4] & [RAge4] & [RAL5] & [RAge5] &
[RAL6] & [RAge6] & [RAL7] & [RAge7] & [RAL8] & [RAge8] AS HsTyp,
(SELECT Fam2.R01 FROM People AS Fam2 WHERE Fam2.aacode = People.aacode
AND Fam2.PERSNO = 2) AS Rel2,
(SELECT Fam3.R01 FROM People AS Fam3 WHERE Fam3.aacode = People.aacode
AND Fam3.PERSNO = 3) AS Rel3,
Switch([Rel2] Is Null,Null,[Rel2]=-9,'DNA',[Rel2]=-8,'NoAns',
[Rel2]=1,'Spouse',[Rel2]=2,'Cohabitee',[Rel2]<7,'Child',
[Rel2]<10,'Parent',[Rel2]<15,'Sibling',[Rel2]=15,'Grandchild',
[Rel2]=16,'Grandparent',[Rel2]=17,'OtherRelative',
[Rel2]=20,'CivilPartner',True,'Other') AS RAL2,
Switch([Rel3] Is Null,Null,[Rel3]=-9,'DNA',[Rel3]=-8,'NoAns',
[Rel3]=1,'Spouse',[Rel3]=2,'Cohabitee',[Rel3]<7,'Child',
[Rel3]<10,'Parent',[Rel3]<15,'Sibling',[Rel3]=15,'Grandchild',
[Rel3]=16,'Grandparent',[Rel3]=17,'OtherRelative',
[Rel3]=20,'CivilPartner',True,'Other') AS RAL3,
(Select FAge2.AgeCat FROM People AS FAge2
WHERE FAge2.aacode = People.aacode
AND FAge2.PERSNO = 2
) AS RAge2,
(Select FAge3.AgeCat FROM People AS FAge3
WHERE FAge3.aacode = People.aacode AND FAge3.PERSNO = 3
) AS RAge3
FROM Relatives
RIGHT JOIN (xMarSta RIGHT JOIN People ON xMarSta.ID=People.xMarSta)
ON Relatives.ID=People.R01
WHERE (((People.HRP)=[People.PERSNO]))
ORDER BY People.aacode;
There are several key things that need to change.
At the moment I can't get a join from the Rel field to the Relatives table to work, so I am using a Switch function (aliased RAL). There must be a better way.
For simplicity in the post I have only included Rel2 and Rel3, but in the actual code it goes up to Rel13! So the performance problem is even worse.
I want to replace these subqueries with joins, but as each subquery looks into another record in the same table, I am unsure how to go about this.
I'm very out of my depth with this. I know a little SQL, but the complexity of this problem is too much for my limited knowledge.

The first thing to note is that you have a relational situation, but your table structure uses columns to represent relationships. That is what gives you the R01, R02, R03 ... R13 columns on your table. Unfortunately, you will not be able to improve performance dramatically, because the table structure is repetitive and denormalized rather than relational. This means that your query will need all of this repetitive code, repeated 13 times as you mentioned. It also means that your Switch function can be replaced by a join, but that join will again be repeated 13 times.
Now, back to your query. You have multiple sub-selects in your SELECT list; instead, LEFT JOIN the related tables in the FROM clause and use the new aliases in your SELECT. In the example below, each relative gets its own alias of People (Fam2, Fam3), which you will need to do 13 times in your case, and each one is linked to the Relatives table (I called these Relat2, Relat3, etc.). If you can change your database to a normalized structure, you could really simplify this query and use much simpler joins.
See if this one helps you understand the process:
SELECT People.ID, People.aacode, People.PERSNO,
People.HRP, People.DVHsize, xMarSta.Marital,
[Marital] & " (" & [People.AgeCat] & ")" & [RAL2] & [RAge2] &
[RAL3] & [RAge3] AS HsTyp,
Fam2.R01 AS Rel2,
Fam3.R01 AS Rel3,
Relat2.Relationship as RAL2,
Relat3.Relationship as RAL3,
Fam2.AgeCat AS RAge2,
Fam3.AgeCat AS RAge3
FROM (((((People
LEFT JOIN (People AS Fam2) ON (Fam2.aacode = People.aacode and Fam2.PERSNO = 2))
LEFT JOIN (Relatives as Relat2) on Relat2.Id = Fam2.R01)
LEFT JOIN (People as Fam3) ON (Fam3.aacode = People.aacode AND Fam3.PERSNO = 3))
LEFT JOIN (Relatives as Relat3) on Relat3.Id = Fam3.R01)
LEFT JOIN xMarSta ON xMarSta.ID=People.xMarSta)
WHERE (People.HRP=[People.PERSNO])
ORDER BY People.aacode;
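To check that the self-join returns the same values as the correlated subqueries, the two forms can be compared on a couple of households. A minimal sketch, assuming SQLite via Python rather than Access (so no nested parentheses around the joins); only the columns involved are created, and the rows, including the NULL R01 for person 1, are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE People (ID INTEGER PRIMARY KEY, aacode TEXT, PERSNO INTEGER,
                         HRP INTEGER, R01 INTEGER, AgeCat TEXT);
    CREATE TABLE Relatives (ID INTEGER PRIMARY KEY, Relationship TEXT);
    INSERT INTO Relatives VALUES (1, 'Spouse'), (3, 'Child');
    INSERT INTO People VALUES
        (1, 'H1', 1, 1, NULL, '30-44'),
        (2, 'H1', 2, 1, 1,    '30-44'),
        (3, 'H1', 3, 1, 3,    '1-4'),
        (4, 'H2', 1, 1, NULL, '65+');  -- single-person house
""")

# Original style: one correlated subquery per looked-up value.
subquery_sql = """
    SELECT p.aacode,
           (SELECT f.AgeCat FROM People AS f
             WHERE f.aacode = p.aacode AND f.PERSNO = 2) AS RAge2
    FROM People AS p
    WHERE p.HRP = p.PERSNO
    ORDER BY p.aacode
"""
# Join style: alias People a second time and look the relationship up too.
join_sql = """
    SELECT p.aacode, Fam2.AgeCat AS RAge2, Relat2.Relationship AS RAL2
    FROM People AS p
    LEFT JOIN People AS Fam2
           ON Fam2.aacode = p.aacode AND Fam2.PERSNO = 2
    LEFT JOIN Relatives AS Relat2
           ON Relat2.ID = Fam2.R01
    WHERE p.HRP = p.PERSNO
    ORDER BY p.aacode
"""
sub_rows = conn.execute(subquery_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()
print(sub_rows)   # [('H1', '30-44'), ('H2', None)]
print(join_rows)  # [('H1', '30-44', 'Spouse'), ('H2', None, None)]
```

The LEFT JOIN keeps the single-person house with NULLs, matching what the subquery returns when no second person exists.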

Joining a table to itself is done with an alias,
e.g.
Select * From [Table1] Join [Table1] t1 on T1.SomeField = Table1.SomeOtherField
etc.
I probably won't have time to fix it, but the real problem is where you've denormalised with R01, R02, etc.
You should have another table:
RelationshipID
PersonFrom
PersonTo
You would need to manage that when creating relations, though, and it will mean changes to your UI and logic.
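With a relationship table of that shape, one join returns every member of a household regardless of how many people it has. A rough sketch, assuming SQLite via Python; the `RelativeID` column naming the kind of relationship is my own assumption (the answer lists only the three columns above), the rows are made up, and the final string is assembled in application code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE People (ID INTEGER PRIMARY KEY, aacode TEXT, PERSNO INTEGER,
                         AgeCat TEXT);
    CREATE TABLE Relatives (ID INTEGER PRIMARY KEY, Relationship TEXT);
    -- RelativeID is an assumed extra column naming the kind of relationship.
    CREATE TABLE Relationship (
        RelationshipID INTEGER PRIMARY KEY,
        PersonFrom INTEGER REFERENCES People(ID),
        PersonTo INTEGER REFERENCES People(ID),
        RelativeID INTEGER REFERENCES Relatives(ID)
    );
    INSERT INTO Relatives VALUES (1, 'Spouse'), (3, 'Child');
    INSERT INTO People VALUES
        (1, 'H1', 1, '30-44'), (2, 'H1', 2, '30-44'), (3, 'H1', 3, '1-4');
    INSERT INTO Relationship VALUES (1, 2, 1, 1), (2, 3, 1, 3);
""")

# One row per person related to person number 1 of each house.
rows = conn.execute("""
    SELECT hrp.aacode, other.PERSNO, rel.Relationship, other.AgeCat
    FROM People AS hrp
    INNER JOIN Relationship AS r ON r.PersonTo = hrp.ID
    INNER JOIN People AS other ON other.ID = r.PersonFrom
    INNER JOIN Relatives AS rel ON rel.ID = r.RelativeID
    WHERE hrp.PERSNO = 1
    ORDER BY hrp.aacode, other.PERSNO
""").fetchall()

# Build the [Relationship][AgeCat] part of the household description.
description = "".join(
    f"{relationship}({age})" for _, _, relationship, age in rows)
print(description)  # Spouse(30-44)Child(1-4)
```

The query no longer changes when a household has more members, which is the main gain over repeating a join per R01..R13 column.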