How write sql query group by? - sql

I have a ERD. But I want to write a sql query.
The meaning is that you can select all columns of artgrp of regroupid 11 grouped by artdept.
I have this:
Select *
From artgrp
Where regroudid = "11"
Group by artdept;
My question is: how can I write: select all columns of artgrp group by the columns of artdept?
Here is my model

SELECT d.description, d.lifetime, d.name, COUNT(d.artdeptid) as Departments
FROM artgrp g INNER JOIN arddept d ON g.artdeptid = d.artdeptid
WHERE regroudid = "11"
GROUP BY d.description, d.lifetime, d.name

Group by is used to identify the count, max, min, avg, etc in a cluster of data. To be more clear say for instance you have Cars table with fields make, color, price. And you want to see the count of cars in different colors(this will be cluster of different colors) you can use the following query
select count(1),color from cars group by color;
Output will look like this
Blue 3
Grey 17
Red 5
Note: whatever column you use in group by will be used in select columns as well. In the above example I grouped by using color if add more fields say for instance make it will have two clusters(color,make) output would be
Ford Blue 3
Ford Grey 7
Honda Red 5
Honda Grey 10
So you can identify what is the function you need to perform before grouping your data like count, min, max, avg, rank etc. And if you want all the fields in your select clause you will have to use analytical function. Edit your question with sample data also with expected output, I can give you the answer with analytical query as well if required.
Thanks for editing your question sample data will be more clear, still I will go ahead with what I understood. I am using analytical function here as solution
Select artgrpid,description,descid,relgroupid,
artdeptid,default,name,brand,questions,
ROW_NUMBER() over (Prtition by artdeptid order by artdeptid) from
artgrp;
you can use rank() or count(1) etc instead of ROW_NUMBER().

Related

error when Im using group by in select query

I have this query:
select id, convert(nvarchar(10), pubdate, 102) as pubdate,
channel_title, title, description, link, vertinimas
from table1
where statusid > 0
and channel_title = 'channel1'
group by title
order by pubdate desc
to exclude duplicate entries in the field "title" i added group by title in the end, but an error occurs:
"is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause."
GROUP BY clause can only be used with aggregate functions like count(), min(), max(), sum() etc. The select query can only select the columns which are part of GROUP BY clause or on which you are applying an aggregate function.
For example you have a STUDENT table like below:
ID
NAME
SUBJECT
MARKS
1
FOO
ENGLISH
80
2
FOO
MATH
70
3
BAR
ENGLISH
100
4
BAR
MATH
50
5
ZIL
ENGLISH
90
6
ZIL
MATH
75
you can write a query like:
SELECT NAME, SUM(MARKS) AS TOTAL FROM STUDENT GROUP BY NAME;
Hear in the above query NAME is part of your GROUP BY clause and we are applying sum() aggregate function on column on MARKS. This will give us a result like below:
NAME
MARKS
FOO
150
BAR
150
ZIL
165
In your query above in the post, only title is part of GROUP BY column. Rest all the column like id, pubdate, channel_title, title, description, link, vertinimas, they are neither part of GROUP BY clause nor passed as a parameter in any aggregate function.
If you want to find / exclude / delete duplicate rows, you can checkout this blog post. This guy has explained it pretty well. Here is the like to find and delete duplicate records!

How do I SELECT minimum set of rows to cover all possible values of each columns in SQL?

I am running a SQL query to get data from a table to map all different possible values of all categories represented by each columns.
How do I run the SELECT query such that it returns the minimum number of rows just enough to include all possible values of all columns?
For example, if I have a table of 10 rows and 3 columns, each column containing 3 possible values:
TABLE sales
--------------------------------
brandID color size
--------------------------------
2 red big
3 blue big
2 blue big
2 red small
2 blue medium
3 green small
3 red big
1 green medium
2 red medium
2 blue big
Of course I could SELECT all rows from table without filter, but that would be an expensive query of 10 rows.
However, as you can see, if we filter the SELECT query to only return the following rows below, it is possible to cover all the possible values of all columns:
1,2,3 for brandID
red,blue,green for color
big,small,medium for size
--------------------------------
brandID color size
--------------------------------
3 blue big
2 red small
1 green medium
How do I do that in SQL query?
This one does what you expect:
select b.brandid, c.color, s.size
from (
select brandid, row_number() over (order by brandid) as rn
from sales
group by brandid
) b
full join (
select color, row_number() over (order by color) as rn
from sales
group by color
) c on b.rn = c.rn
full join (
select size, row_number() over (order by size) as rn
from sales
group by size
) s on b.rn = s.rn;
Online example: https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=e72e7d1dfed43825025c5703b5d3671a
But this only works properly, if you have the same number of (distinct) brands, colors and sizes. If you have e.g. 5 brands, 6 colors and 7 sizes the result is rather "strange":
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=4417a4d97ecf7601364f09d65f6522fa
First, a query that returns ten rows is not "expensive".
Second, this is a very hard problem. It involves looking at all combinations of rows to see if the set has all combinations of columns. I suspect that any algorithm will need to basically search through all possible combinations -- although there may be some efficiencies, such as automatically including all rows with a unique value in any column.
As a hard problem involving comparing zillions of sets, SQL is not really an appropriate language for addressing the issue.
This is a rather weird requirement... But you might try something along this:
DECLARE #sales TABLE(BrandID INT, color VARCHAR(10),size VARCHAR(10));
INSERT INTO #sales VALUES
(2,'red', 'big'),
(3,'blue', 'big'),
(2,'blue', 'big'),
(2,'red', 'small'),
(2,'blue', 'medium'),
(3,'green', 'small'),
(3,'red', 'big'),
(1,'green', 'medium'),
(2,'red', 'medium'),
(2,'blue', 'big');
WITH AllBrands AS (SELECT ROW_NUMBER() OVER(ORDER BY BrandID) AS RowInx, BrandID FROM #sales GROUP BY BrandID)
,AllColors AS (SELECT ROW_NUMBER() OVER(ORDER BY color) AS RowInx, color FROM #sales GROUP BY color)
,AllSizes AS (SELECT ROW_NUMBER() OVER(ORDER BY size) AS RowInx, size FROM #sales GROUP BY size)
SELECT COALESCE(b.RowInx,c.RowInx,s.RowInx) AS RowInx
,b.BrandID
,c.color
,s.size
FROM AllBrands b
FULL OUTER JOIN AllColors c ON COALESCE(b.RowInx,c.RowInx)=c.RowInx
FULL OUTER JOIN AllSizes s ON COALESCE(b.RowInx,c.RowInx,s.RowInx)=s.RowInx;
This solution is similar to #a_horse_with_no_name's, but avoids gaps in the result in case of unequal counts of values per column.
The idea in short:
We create a numbered set of all distinct values per column and join all sets on this number. As we don't know the count in advance I use COALESCE to pick the first value, which is not null.
This is not a good problem if you demand ONE AND ONLY ONE query and ONE AND ONLY ONE of each result set, and ONE AND ONLY ONE instance of each result. As Gordon Linoff accurately put: that is not a problem for SQL.I get that maybe you have a MUCH larger table, but he's absolutely right.
But add another layer, and you can have exactly what you want, with all the efficiency you want, and a readable output. Use a cursor and some basic SELECT from dynamic SQL with a SELECT columns.name from sys.tables JOIN sys.columns ON tables.object_id = columns.object_id, if you absolutely have to do this with TSQL alone.
And if you're willing to build a basic application with any framework with a SQL driver, you can just SELECT DISTINCT FROM < and put the various results into arrays.
Alternatively: reword your question, with the understanding that the results of any SQL query are gonna be x rows by x columns. Not an array for each column.
I think your example confuses things by having exactly 3 values for each field, which makes the requested result seem like a reasonable thing to expect. But what happens when two more brands are added, or a new colour? Then what would you expect to be returned?
Really you are asking three questions, so I feel this should be done as three queries:
"What are the different brands?"
"What are the different colours?"
"What are the different sizes?"
If they need to be displayed in a neat table, stitch them together afterwards in your application layer. You could maybe do it in the SQL with something like a_horse_with_no_name suggests, but really its the wrong place.

sql count or sum?

I'm a newbie programmer, I want to sum a value of employee's attendance record
Anyway, what should I choose? COUNT or SUM?
I tried to use COUNT functions like this...
SELECT COUNT(jlh_sakit) AS sakit FROM rekap_absen
It shows value changed to "1" for 1 Record only.
And I try to use SUM functions like this...
SELECT SUM(jlh_sakit) AS sakit FROM rekap_absen
It shows all values changed ALL value to "1"
I want to display only 1 person for each sum
(e.g : John (2 sick, 2 permissions, 1 alpha)
Can you help me please?
If you are using any aggregate function like min/max/sum/count you should use group by. Now your question says "what should I choose? COUNT or SUM?" Assuming you have person_name, jlh_sakit which means sick/permission/alpha in your case you could use
select person_name, count(jhl_sakit) as attribute
from rekap_absen
group by person
This will give you output like:
person_name attribute
John 2
King 5
In order to sum by specified column, use group by statement.
SELECT SUM(sick),SUM(alpha),SUM(permissions),person FROM rekap_absen group by person
It will group your sums according to person.
You may name your sums like:
SELECT SUM(sick) as sick,SUM(alpha) as alpha,SUM(permissions) as permissions,person FROM rekap_absen group by person
Assuming that you have table rekap_absen with columns: person,sick,alpha,permissions

Retrieve names by ratio of their occurrence

I'm somewhat new to SQL queries, and I'm struggling with this particular problem.
Let's say I have query that returns the following 3 records (kept to one column for simplicity):
Tom
Jack
Tom
And I want to have those results grouped by the name and also include the fraction (ratio) of the occurrence of that name out of the total records returned.
So, the desired result would be (as two columns):
Tom | 2/3
Jack | 1/3
How would I go about it? Determining the numerator is pretty easy (I can just use COUNT() and GROUP BY name), but I'm having trouble translating that into a ratio out of the total rows returned.
SELECT name, COUNT(name)/(SELECT COUNT(1) FROM names) FROM names GROUP BY name;
Since the denominator is fixed, the "ratio" is directly proportional to the numerator. Unless you really need to show the denominator, it'll be a lot easier to just use something like:
select name, count(*) from your_table_name
group by name
order by count(*) desc
and you'll get the right data in the right order, but the number that's shown will be the count instead of the ratio.
If you really want that denominator, you'd do a count(*) on a non-grouped version of the same select -- but depending on how long the select takes, that could be pretty slow.

MySQL: Getting highest score for a user

I have the following table (highscores),
id gameid userid name score date
1 38 2345 A 100 2009-07-23 16:45:01
2 39 2345 A 500 2009-07-20 16:45:01
3 31 2345 A 100 2009-07-20 16:45:01
4 38 2345 A 200 2009-10-20 16:45:01
5 38 2345 A 50 2009-07-20 16:45:01
6 32 2345 A 120 2009-07-20 16:45:01
7 32 2345 A 100 2009-07-20 16:45:01
Now in the above structure, a user can play a game multiple times but I want to display the "Games Played" by a specific user. So in games played section I can't display multiple games. So the concept should be like if a user played a game 3 times then the game with highest score should be displayed out of all.
I want result data like:
id gameid userid name score date
2 39 2345 A 500 2009-07-20 16:45:01
3 31 2345 A 100 2009-07-20 16:45:01
4 38 2345 A 200 2009-10-20 16:45:01
6 32 2345 A 120 2009-07-20 16:45:01
I tried following query but its not giving me the correct result:
SELECT id,
gameid,
userid,
date,
MAX(score) AS score
FROM highscores
WHERE userid='2345'
GROUP BY gameid
Please tell me what will be the query for this?
Thanks
Requirement is a bit vague/confusing but would something like this satisfy the need ?
(purposely added various aggregates that may be of interest).
SELECT gameid,
MIN(date) AS FirstTime,
MAX(date) AS LastTime,
MAX(score) AS TOPscore.
COUNT(*) AS NbOfTimesPlayed
FROM highscores
WHERE userid='2345'
GROUP BY gameid
-- ORDER BY COUNT(*) DESC -- for ex. to have games played most at top
Edit: New question about adding the id column to the the SELECT list
The short answer is: "No, id cannot be added, not within this particular construct". (Read further to see why) However, if the intent is to have the id of the game with the highest score, the query can be modified, using a sub-query, to achieve that.
As explained by Alex M on this page, all the column names referenced in the SELECT list and which are not used in the context of an aggregate function (MAX, MIN, AVG, COUNT and the like), MUST be included in the ORDER BY clause. The reason for this rule of the SQL language is simply that in gathering the info for the results list, SQL may encounter multiple values for such an column (listed in SELECT but not GROUP BY) and would then not know how to deal with it; rather than doing anything -possibly useful but possibly silly as well- with these extra rows/values, SQL standard dictates a error message, so that the user can modify the query and express explicitly his/her goals.
In our specific case, we could add the id in the SELECT and also add it in the GROUP BY list, but in doing so the grouping upon which the aggregation takes place would be different: the results list would include as many rows as we have id + gameid combinations the aggregate values for each of this row would be based on only the records from the table where the id and the gameid have the corresponding values (assuming id is the PK in table, we'd get a single row per aggregation, making the MAX() and such quite meaningless).
The way to include the id (and possibly other columns) corresponding to the game with the top score, is with a sub-query. The idea is that the subquery selects the game with TOP score (within a given group by), and the main query's SELECTs any column of this rows, even when the fieds wasn't (couldn't be) in the sub-query's group-by construct. BTW, do give credit on this page to rexem for showing this type of query first.
SELECT H.id,
H.gameid,
H.userid,
H.name,
H.score,
H.date
FROM highscores H
JOIN (
SELECT M.gameid, hs.userid, MAX(hs.score) MaxScoreByGameUser
FROM highscores H2
GROUP BY H2.gameid, H2.userid
) AS M
ON M.gameid = H.gameid
AND M.userid = H.userid
AND M.MaxScoreByGameUser = H.score
WHERE H.userid='2345'
A few important remarks about the query above
Duplicates: if there the user played several games that reached the same hi-score, the query will produce that many rows.
GROUP BY of the sub-query may need to change for different uses of the query. If rather than searching for the game's hi-score on a per user basis, we wanted the absolute hi-score, we would need to exclude userid from the GROUP BY (that's why I named the alias of the MAX with a long, explicit name)
The userid = '2345' may be added in the [now absent] WHERE clause of the sub-query, for efficiency purposes (unless MySQL's optimizer is very smart, currently all hi-scores for all game+user combinations get calculated, whereby we only need these for user '2345'); down side duplication; solution; variables.
There are several ways to deal with the issues mentioned above, but these seem to be out of scope for a [now rather lenghty] explanation about the GROUP BY constructs.
Every field you have in your SELECT (when a GROUP BY clause is present) must be either one of the fields in the GROUP BY clause, or else a group function such as MAX, SUM, AVG, etc. In your code, userid is technically violating that but in a pretty harmless fashion (you could make your code technically SQL standard compliant with a GROUP BY gameid, userid); fields id and date are in more serious violation - there will be many ids and dates within one GROUP BY set, and you're not telling how to make a single value out of that set (MySQL picks a more-or-less random ones, stricter SQL engines might more helpfully give you an error).
I know you want the id and date corresponding to the maximum score for a given grouping, but that's not explicit in your code. You'll need a subselect or a self-join to make it explicit!
Use:
SELECT t.id,
t.gameid,
t.userid,
t.name,
t.score,
t.date
FROM HIGHSCORES t
JOIN (SELECT hs.gameid,
hs.userid,
MAX(hs.score) 'max_score'
FROM HIGHSCORES hs
GROUP BY hs.gameid, hs.userid) mhs ON mhs.gameid = t.gameid
AND mhs.userid = t.userid
AND mhs.max_score = t.score
WHERE t.userid = '2345'