Creating views with group by not grouping properly - sql

I have data that looks like Music92838, Entertainment298928, SPORTS2837 etc. in my Event_type column, and I'm trying to create a view that groups the number of performances by event_type
I tried to do
CREATE VIEW Performances_Type_Cnt
AS SELECT regexp_replace(E.event_type, '[^a-zA-Z]', '', 'g') AS Event_Type,
COUNT(*)
FROM Event_Type E, Performance P
WHERE E.event_id = P.event_id
GROUP BY Event_Type;
The regex keeps only the letters, and I expected the GROUP BY Event_Type to then group by Music, Sports, etc. But something isn't working, because in my results I'm getting
event_type | count
---------------+-------
MUSIC | 1
SPORTS | 5
MUSIC | 8
MUSIC | 3
where Music appears more than once in event_type, which isn't the correct result.
Any and all help appreciated!

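The "problem" is that Postgres allows column aliases in the GROUP BY, and here the alias Event_Type has the same name as the table column E.event_type. When a GROUP BY name is ambiguous, Postgres interprets it as the input column rather than the output alias, so the query groups by the raw values (Music92838 and so on) instead of the cleaned-up expression. One simple solution is to use positional notation: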
CREATE VIEW Performances_Type_Cnt AS
SELECT regexp_replace(E.event_type, '[^a-zA-Z]', '', 'g') AS Event_Type,
COUNT(*) as cnt
FROM Event_Type E JOIN
Performance P
ON E.event_id = P.event_id
GROUP BY 1;
I made some other changes:
Replaced the implicit join with an explicit JOIN. This is the standard way to write joins in SQL.
Added a column alias for COUNT(*).
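As a rough, runnable illustration of the two ways around the name clash (group by position, or pick an alias that no table column uses), here is a minimal sketch using Python's built-in sqlite3 module. It is not the Postgres view itself: the sample rows and the perf_id column are made up, and because SQLite has no built-in regexp_replace, rtrim() is used as a stand-in to strip the trailing digits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event_type (event_id INTEGER PRIMARY KEY, event_type TEXT);
    CREATE TABLE performance (perf_id INTEGER PRIMARY KEY, event_id INTEGER);
    INSERT INTO event_type VALUES (1, 'Music92838'), (2, 'Music111'), (3, 'SPORTS2837');
    INSERT INTO performance (event_id) VALUES (1), (1), (2), (3);
""")

# Stand-in for regexp_replace (an assumption for this sketch): strip trailing digits.
CLEAN = "rtrim(e.event_type, '0123456789')"

# Fix 1: group by the position of the first output column.
by_position = conn.execute(f"""
    SELECT {CLEAN} AS event_type, COUNT(*) AS cnt
    FROM event_type e JOIN performance p ON e.event_id = p.event_id
    GROUP BY 1
    ORDER BY 1
""").fetchall()

# Fix 2: give the output column a name that no table column uses.
by_alias = conn.execute(f"""
    SELECT {CLEAN} AS type_name, COUNT(*) AS cnt
    FROM event_type e JOIN performance p ON e.event_id = p.event_id
    GROUP BY type_name
    ORDER BY type_name
""").fetchall()

print(by_position)
print(by_alias)
```

Both queries return one row per cleaned-up type, which is the behaviour the view was meant to have.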

Related

Postgres: Count multiple events for distinct dates

People of Stack Overflow!
Thanks for taking the time to read this question. What I am trying to accomplish is to pivot some data all from just one table.
The original table has multiple datetime entries of specific events (e.g. when the customer was added add_time and when the customer was lost lost_time).
This is one part of two rows of the deals table:
id  | add_time            | last_mail_time      | lost_time
----+---------------------+---------------------+---------------------
5   | 2020-03-24 09:29:24 | 2020-04-03 13:20:29 | NULL
310 | 2020-03-24 09:29:24 | NULL                | 2020-04-03 13:20:29
I want to create a view of this table. A view that has one row for each distinct date and counts the number of events at this specific time.
This is the goal (times do not match with the example!):
I have working code, like this:
SELECT DISTINCT
change_datetime,
(SELECT COUNT(add_time) as add_time_count FROM deals WHERE add_time::date = change_datetime),
(SELECT COUNT(lost_time) as lost_time_count FROM deals WHERE lost_time::date = change_datetime)
FROM (
SELECT
add_time::date AS change_datetime
FROM
deals
UNION ALL
SELECT
lost_time::date AS change_datetime
FROM
deals
) AS foo
WHERE change_datetime IS NOT NULL
ORDER BY
change_datetime;
but this has some ugly O(n^2) subqueries and takes a lot of time.
Is there a better, more performant way to achieve this?
Thanks!!
You can use a lateral join to unpivot and then aggregate:
select t::date as change_date,
count(*) filter (where which = 'add') as add_time_count,
count(*) filter (where which = 'mail') as last_mail_time_count,
count(*) filter (where which = 'lost') as lost_time_count
from deals d cross join lateral
(values (add_time, 'add'),
(last_mail_time, 'mail'),
(lost_time, 'lost')
) v(t, which)
where t is not null
group by t::date
order by t::date;
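To see the unpivot-then-aggregate idea run end to end, here is a small sketch with Python's built-in sqlite3 module and the two sample rows from the question. SQLite has no LATERAL, so this version swaps in a UNION ALL for the unpivot step and SUM(CASE ...) for the filtered counts; the shape of the result is the same, but it is an approximation of the Postgres query, not a copy of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE deals (id INTEGER PRIMARY KEY, add_time TEXT,
                        last_mail_time TEXT, lost_time TEXT);
    INSERT INTO deals VALUES
        (5,   '2020-03-24 09:29:24', '2020-04-03 13:20:29', NULL),
        (310, '2020-03-24 09:29:24', NULL, '2020-04-03 13:20:29');
""")

rows = conn.execute("""
    SELECT date(t) AS change_date,
           SUM(CASE WHEN which = 'add'  THEN 1 ELSE 0 END) AS add_count,
           SUM(CASE WHEN which = 'mail' THEN 1 ELSE 0 END) AS mail_count,
           SUM(CASE WHEN which = 'lost' THEN 1 ELSE 0 END) AS lost_count
    FROM (
        -- Unpivot: one (timestamp, event kind) row per column of each deal.
        SELECT add_time AS t, 'add' AS which FROM deals
        UNION ALL
        SELECT last_mail_time, 'mail' FROM deals
        UNION ALL
        SELECT lost_time, 'lost' FROM deals
    ) AS events
    WHERE t IS NOT NULL          -- skip events that never happened
    GROUP BY date(t)
    ORDER BY change_date
""").fetchall()

for row in rows:
    print(row)
```

Each event column is read once and the counting happens in a single grouping pass, instead of one correlated subquery per date and per column.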

Select latest from joined table excluding duplicates

I have two joined tables, parent one shows unit's name, child shows recording temperatures, that can be inserted either by automatic process (AUTO) or by user. So for given unit reading records from simple join would look like
UNIT TEMP TIMESTAMP DATA_SOURCE
ABC -20 10:26 AUTO
ABC -19 11:27 USER
ABC -19 11:27 AUTO
The goal is to select the latest temp reading. I can use subquery to do so:
SELECT A.UNIT, B.TEMP, B.TIMESTAMP,B.DATA_SOURCE
FROM units_table A left outer join readings_table B on A.Gkey=B.unit_gkey
WHERE B.TIMESTAMP=
(SELECT MAX(TIMESTAMP) FROM readings_table B1
WHERE A.Gkey=B1.unit_gkey)
It would be simple but in the example above there are two exact timestamps, so I will get TWO readings. In such case I'd like to ignore the AUTO source. Is there an elegant way to do it?
Edit: to be clear I want only ONE ROW result:
ABC -19 11:27 USER
You can do this with row_number() instead:
SELECT ut.UNIT, rt.TEMP, rt.TIMESTAMP, rt.DATA_SOURCE
FROM units_table ut left outer join
(SELECT rt.*,
row_number() over (partition by rt.unit_Gkey
order by timestamp desc,
(case when rt.data_source = 'AUTO' then 1 else 0 end)
) as seqnum
FROM readings_table rt
) rt
on rt.unit_Gkey = ut.gkey
WHERE rt.seqnum = 1;
Note: if you wanted the duplicates, you would use rank() or dense_rank() instead of row_number() (and remove the second clause in the order by).
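The tie-breaking can be checked on the three sample readings with a short sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The column names ts and temperature are simplified stand-ins for TIMESTAMP and TEMP, and the seqnum = 1 test is placed in the ON clause here, which is a choice made for this sketch so that units without any readings would still be returned by the left join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE units_table (gkey INTEGER PRIMARY KEY, unit TEXT);
    CREATE TABLE readings_table (unit_gkey INTEGER, temperature INTEGER,
                                 ts TEXT, data_source TEXT);
    INSERT INTO units_table VALUES (1, 'ABC');
    INSERT INTO readings_table VALUES
        (1, -20, '10:26', 'AUTO'),
        (1, -19, '11:27', 'USER'),
        (1, -19, '11:27', 'AUTO');
""")

latest = conn.execute("""
    SELECT ut.unit, rt.temperature, rt.ts, rt.data_source
    FROM units_table ut
    LEFT JOIN (
        SELECT r.*,
               row_number() OVER (
                   PARTITION BY r.unit_gkey
                   -- newest first; on equal timestamps USER (0) sorts before AUTO (1)
                   ORDER BY r.ts DESC,
                            CASE WHEN r.data_source = 'AUTO' THEN 1 ELSE 0 END
               ) AS seqnum
        FROM readings_table r
    ) rt ON rt.unit_gkey = ut.gkey AND rt.seqnum = 1
""").fetchall()

print(latest)
```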
Just use the DISTINCT keyword; look at the example at http://www.w3schools.com/sql/sql_distinct.asp :)

Oracle - group by of joined tables

I looked for an answer and found several suggestions, but none of them helped, so I'm asking here.
I have two tables, one with distributors (columns: distributorid, name) and the second one with delivered products (columns: distributorid, productid, corruptcount, date) - the column corruptcount contains the number of corrupted deliveries. I need to select the first five distributors with the most corrupted deliveries in the last two months. I need to select distributorid, name and the sum of corruptcount. Here is my query:
SELECT del.distributorid, d.name, SUM(del.corruptcount) AS corrupt
FROM distributor d, delivery del
WHERE d.distributorid = del.distributorid
AND d.distributorid IN
(SELECT distributorid
FROM (SELECT distributorid, SUM(corruptcount) AS corrupt
FROM delivery
WHERE storeid = 1
AND "date" BETWEEN ADD_MONTHS(SYSDATE, -2) AND SYSDATE
AND ROWNUM <= 5
GROUP BY distributorid
ORDER BY corrupt DESC))
GROUP BY del.distributorid
But Oracle returns the error message "not a GROUP BY expression". And when I edit my query to this:
SELECT del.distributorid, d.name, del.corruptcount-- , SUM(del.corruptcount) AS corrupt
FROM distributor d, delivery del
WHERE d.distributorid = del.distributorid
AND d.distributorid IN
(SELECT distributorid
FROM (SELECT distributorid, SUM(corruptcount) AS corrupt
FROM delivery
WHERE storeid = 1
AND "date" BETWEEN ADD_MONTHS(SYSDATE, -2) AND SYSDATE
AND ROWNUM <= 5
GROUP BY distributorid
ORDER BY corrupt DESC))
--GROUP BY del.distributorid
It works as expected and returns the correct data:
1 IBM 10
2 DELL 0
2 DELL 1
2 DELL 6
3 HP 3
8 ACER 2
9 ASUS 1
I'd like to group this data. Where and why is my query wrong? Can you help please? Thank you very, very much.
I think the problem is just the d.name in the select list; you need to include it in the group by clause as well. Try this:
SELECT del.distributorid, d.name, SUM(del.corruptcount) AS corrupt
FROM distributor d join
delivery del
on d.distributorid = del.distributorid
WHERE d.distributorid IN
(SELECT distributorid
FROM delivery
WHERE storeid = 1 AND
"date" BETWEEN ADD_MONTHS(SYSDATE, -2) AND SYSDATE AND
ROWNUM <= 5
GROUP BY distributorid
ORDER BY SUM(corruptcount) DESC
)
GROUP BY del.distributorid, d.name;
I also switched the query to using explicit join syntax with an on clause, instead of the outdated implicit join syntax using a condition in the where.
I also removed the additional layer of subquery. It is not really necessary.
EDIT:
"Why does d.name have to be included in the group by?" The easy answer is that SQL requires it because it does not know which value to include from the group. You could instead use min(d.name) in the select, for instance, and there would be no need to change the group by clause.
The real answer is a wee bit more complicated. The ANSI standard does actually permit the query as you wrote it. This is because distributorid is (presumably) declared as a primary key on the distributor table. When you group by a primary key (or unique key), the other columns of the same table are functionally dependent on it, so you can use them just as you did. Although ANSI supports this, most databases do not yet. So, the real reason is that Oracle doesn't support the ANSI standard functionality that would allow your query to work.
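The two options mentioned above (add d.name to the GROUP BY, or wrap it in an aggregate such as MIN) can be compared on the sample rows from the question. This is only a sketch using Python's built-in sqlite3 module: SQLite is lenient about non-grouped columns, so it will not reproduce Oracle's "not a GROUP BY expression" error, and the top-five and date filtering are left out to keep the focus on the grouping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE distributor (distributorid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE delivery (distributorid INTEGER, corruptcount INTEGER);
    INSERT INTO distributor VALUES (1,'IBM'), (2,'DELL'), (3,'HP'), (8,'ACER'), (9,'ASUS');
    INSERT INTO delivery VALUES (1,10), (2,0), (2,1), (2,6), (3,3), (8,2), (9,1);
""")

# Option 1: list every non-aggregated column in the GROUP BY.
grouped = conn.execute("""
    SELECT del.distributorid, d.name, SUM(del.corruptcount) AS corrupt
    FROM distributor d
    JOIN delivery del ON d.distributorid = del.distributorid
    GROUP BY del.distributorid, d.name
    ORDER BY del.distributorid
""").fetchall()

# Option 2: keep GROUP BY on the id and wrap the name in an aggregate.
with_min = conn.execute("""
    SELECT del.distributorid, MIN(d.name) AS name, SUM(del.corruptcount) AS corrupt
    FROM distributor d
    JOIN delivery del ON d.distributorid = del.distributorid
    GROUP BY del.distributorid
    ORDER BY del.distributorid
""").fetchall()

print(grouped)
print(with_min)
```

Because each distributorid maps to exactly one name, both forms produce the same rows.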

INNER JOIN with position in second table

I'm running a CodeIgniter backend for a game, where the user info and a highscore table are saved in the database. Every user can only enter the highscore with his personal best, so there is only a single entry per user.
Now I need to export that data for the customer so he can pick a random winner.
Basically I want to give him all the user data plus the users points and his position in the highscore table.
I'm having problems figuring out how to get the position in the highscore table.
Let's say my tables look like this
user_table
| id|name|
|...| ...|
highscore_table
|id |user_id|points|
|...| ... | ... |
The SQL statement, not including the highscore position, looks like this
SELECT user_table.id, name, points
FROM user_table
INNER JOIN highscore_table ON user_table.id = highscore_table.user_id
ORDER BY name
The exported data should look like this
|id | name|points|hs_position|
|...| ... | ... | ... |
Usually when I request the highscore, I do something similar but sort the data by points. This is just not an option here.
Can somebody tell me how to achieve this, lead me in the right direction or just plain and simple tell me that this is not possible?
Any help is greatly appreciated.
"I'm having problems figuring out how to get the position in the highscore table."
If you're using SQL-Server you can use DENSE_RANK for the HS-Rank and ROW_NUMBER to get the row with the highest point per user:
WITH cte
AS (SELECT user_table.id,
name,
points,
hs_position=Dense_rank()
OVER(
ORDER BY points DESC),
Personalbest_Num=Row_number()
OVER(
partition BY user_table.id
ORDER BY points DESC)
FROM user_table
INNER JOIN highscore_table
ON user_table.id = highscore_table.user_id)
SELECT id,
name,
points,
hs_position
FROM cte
WHERE personalbest_num = 1
ORDER BY hs_position ASC,
name ASC
If you are using an RDBMS that does not support ranking functions (such as SQLite before version 3.25), then one way of achieving this is via a subquery, like so:
SELECT u.id, u.name, h.points,
(select count(*) + 1
from highscore_table h2
WHERE h2.points > h.points) hs_position
FROM user_table u
INNER JOIN highscore_table h ON u.id = h.user_id
ORDER BY u.name
(If you want a dense-ranked high score, change count(*) + 1 to count(distinct h2.points) + 1.)
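The difference between the two counting expressions shows up as soon as two users tie. Below is a minimal sketch with Python's built-in sqlite3 module; the four users and their points are invented for illustration, and the query text is otherwise the subquery approach described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_table (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE highscore_table (id INTEGER PRIMARY KEY, user_id INTEGER, points INTEGER);
    INSERT INTO user_table VALUES (1,'ann'), (2,'bob'), (3,'cat'), (4,'dan');
    INSERT INTO highscore_table (user_id, points) VALUES (1,100), (2,100), (3,90), (4,80);
""")

# {counter} is swapped between count(*) and count(DISTINCT h2.points).
QUERY = """
    SELECT u.name, h.points,
           (SELECT {counter} + 1
            FROM highscore_table h2
            WHERE h2.points > h.points) AS hs_position
    FROM user_table u
    INNER JOIN highscore_table h ON u.id = h.user_id
    ORDER BY u.name
"""

# Position = number of strictly better scores + 1 (leaves gaps after ties).
ranked = conn.execute(QUERY.format(counter="count(*)")).fetchall()
# Position = number of distinct better scores + 1 (no gaps after ties).
dense = conn.execute(QUERY.format(counter="count(DISTINCT h2.points)")).fetchall()

print(ranked)
print(dense)
```

With count(*) the user after the tie lands on position 3; with count(DISTINCT ...) that user lands on position 2.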
This is a solution for MySQL:
SELECT user_table.id, name, points, hs_table.`rank`
FROM user_table
INNER JOIN
(
-- rank is a reserved word in MySQL 8.0, hence the backticks;
-- <=> is NULL-safe, so the first row gets position 1 rather than 0
SELECT hs.*, IF(@prev <=> points, @pos, @pos:=@pos + 1) as `rank`
, @prev:=points
FROM highscore_table hs,
(SELECT @pos:=0, @prev:=NULL) vars
ORDER BY points DESC
) hs_table
ON user_table.id = hs_table.user_id
ORDER BY name
People with the same amount of points will have the same rank.

how to create this query

How do I write a query that needs two aggregate functions in the SELECT list, where each function needs a different GROUP BY and WHERE condition?
In my example I need to return the playerName, how many times the player won (a win is a row in table game with result = first), and how many times he played,
but I do not know how to deal with two aggregate functions.
Simply put, I want to join the results of these two queries:
1.
select playerName,count(*)
from player,game
where player.playerId=game.playerId and result="first"
group by game.playerId
2.
select count(*)
from game, player
where game.playerId=player.playerId
group by game.playerId
the set of attributes for table game are
playerId , result
the set of attributes for table player are
playerName,playerId
Any ideas?
Use:
SELECT p.playername,
SUM(CASE WHEN g.result = 'first' THEN 1 ELSE 0 END),
COUNT(*)
FROM PLAYER p
JOIN GAME g ON g.playerid = p.playerid
GROUP BY p.playername
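To check that a single pass really yields both numbers, here is a small sketch of the same conditional-aggregation query run through Python's built-in sqlite3 module. The two players and their four games are invented sample data, and the column aliases wins and games_played are added only for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE player (playerId INTEGER PRIMARY KEY, playerName TEXT);
    CREATE TABLE game (playerId INTEGER, result TEXT);
    INSERT INTO player VALUES (1,'Ann'), (2,'Bob');
    INSERT INTO game VALUES (1,'first'), (1,'second'), (1,'first'), (2,'second');
""")

stats = conn.execute("""
    SELECT p.playerName,
           -- counts only the rows where the player finished first
           SUM(CASE WHEN g.result = 'first' THEN 1 ELSE 0 END) AS wins,
           -- counts every game the player took part in
           COUNT(*) AS games_played
    FROM player p
    JOIN game g ON g.playerId = p.playerId
    GROUP BY p.playerName
    ORDER BY p.playerName
""").fetchall()

print(stats)
```

The CASE expression plays the role of the per-aggregate WHERE condition, so one GROUP BY is enough. A player with no wins still appears, with wins equal to 0.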
Along with the solutions proposed by OMG Ponies and Bnjmn, you can also get the desired results by using WITH ROLLUP:
select game.playerId, result, count(*)
from game, player
where game.playerId=player.playerId
group by game.playerId, result WITH ROLLUP
Then, on the client side, for each playerId find the record where result equals 'first' (the number of wins) and the record where result is null (the number of games played).