How to extract repeated values from a database table - sql

Is it possible to extract the values of 'score' as shown below when multiple rows share the same passageId for a given userId?
userId passageId score
1 1 2
1 2 3
1 1 4
1 1 5
2 1 3
2 3 3
2 3 4
Result:
userId passageId scores
1 1 2, 4, 5
1 2 3
2 1 3
2 3 3, 4
I was advised to use the following code, but I need more than two values to be extracted:
SELECT
userId,
passageId,
min(score) as score_1,
max(score) as score_2
FROM mytable
GROUP BY
userId,
passageId
HAVING COUNT(*)>=2;
I was also advised to use string_agg but could not make it work in pgAdmin.

The mention of pgAdmin suggests that you are using Postgres, in which case you should use string_agg() or array_agg():
SELECT userId, passageId, ARRAY_AGG(score)
FROM mytable
GROUP BY userId, passageId
HAVING COUNT(*) >= 2;
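As a rough illustration of the aggregation, here is a minimal sketch run through Python's built-in sqlite3 module. SQLite is only a stand-in for Postgres here, and group_concat plays the role of string_agg. In Postgres itself, string_agg expects text input, so an integer score column would typically need a cast such as string_agg(score::text, ', '). The sketch leaves out the HAVING clause because the expected result in the question also lists groups with a single score. The function name scores_per_passage is made up for the example.

```python
import sqlite3

# Sample rows from the question: (userId, passageId, score).
ROWS = [(1, 1, 2), (1, 2, 3), (1, 1, 4), (1, 1, 5), (2, 1, 3), (2, 3, 3), (2, 3, 4)]


def scores_per_passage(rows):
    """Return one (userId, passageId, [scores]) tuple per group."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (userId INT, passageId INT, score INT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    # group_concat is SQLite's counterpart to Postgres string_agg (assumption:
    # SQLite is used only so the sketch runs without a database server).
    # There is no HAVING clause, so single-score groups are kept as well.
    query = """
        SELECT userId, passageId, group_concat(score, ',') AS scores
        FROM mytable
        GROUP BY userId, passageId
        ORDER BY userId, passageId
    """
    # The order inside the concatenated string is not guaranteed, so sort here.
    result = [
        (user_id, passage_id, sorted(int(s) for s in scores.split(",")))
        for user_id, passage_id, scores in conn.execute(query)
    ]
    conn.close()
    return result


for row in scores_per_passage(ROWS):
    print(row)
```

Adding HAVING COUNT(*) >= 2 back to the query would restrict the output to the groups that actually contain repeated passageId values.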

Related

SUM a column in SQL, based on DISTINCT values in another column, GROUP BY a third column

I'd appreciate some help on the following SQL problem:
I have a table of 3 columns:
ID Group Value
1 1 5
1 1 5
1 2 10
1 2 10
1 3 20
2 1 5
2 1 5
2 1 5
2 2 10
2 2 10
3 1 5
3 2 10
3 2 10
3 2 10
3 4 50
I need to group by ID, and I would like to SUM the values based on DISTINCT values in Group, so that the value for a group is counted only once even though it may appear multiple times for a particular ID.
So for IDs 1, 2 and 3, it should return 35, 15 and 65, respectively.
ID SUM
1 35
2 15
3 65
Note that each Group doesn't necessarily have a unique value
Thanks
The CTE removes all exact duplicate rows, so if the same ID and Group appear with different values, each distinct value is still counted.
The outer SELECT then groups by ID.
For Postgres you would get:
WITH CTE as
(SELECT DISTINCT "ID", "Group", "Value" FROM tablA
)
SELECT "ID", SUM("Value") FROM CTE GROUP BY "ID"
ORDER BY "ID"
ID | sum
-: | --:
1 | 35
2 | 15
3 | 65
Given what we know at the moment this is what I'm thinking...
The CTE (or inline view) eliminates duplicates before the sum occurs. Group is a reserved word, so it has to be quoted when listed explicitly:
WITH CTE AS (SELECT DISTINCT ID, "Group", Value FROM TableName)
SELECT ID, Sum(Value)
FROM CTE
GROUP BY ID
or
SELECT ID, Sum(Value)
FROM (SELECT DISTINCT * FROM TableName) CTE
GROUP BY ID
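To check the deduplicate-then-sum approach against the sample data, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is an assumption made so the example runs without a server, and the function name sum_distinct_groups is hypothetical. The table and column names follow the answers above.

```python
import sqlite3

# Sample rows from the question: (ID, Group, Value).
ROWS = [
    (1, 1, 5), (1, 1, 5), (1, 2, 10), (1, 2, 10), (1, 3, 20),
    (2, 1, 5), (2, 1, 5), (2, 1, 5), (2, 2, 10), (2, 2, 10),
    (3, 1, 5), (3, 2, 10), (3, 2, 10), (3, 2, 10), (3, 4, 50),
]


def sum_distinct_groups(rows):
    """Sum Value per ID after removing exact duplicate rows."""
    conn = sqlite3.connect(":memory:")
    # "Group" is a reserved word, so the column names are double-quoted.
    conn.execute('CREATE TABLE TableName ("ID" INT, "Group" INT, "Value" INT)')
    conn.executemany("INSERT INTO TableName VALUES (?, ?, ?)", rows)
    query = """
        WITH CTE AS (SELECT DISTINCT "ID", "Group", "Value" FROM TableName)
        SELECT "ID", SUM("Value")
        FROM CTE
        GROUP BY "ID"
        ORDER BY "ID"
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(sum_distinct_groups(ROWS))
```

Because DISTINCT covers all three columns, a Group that carries two different values for the same ID contributes both of them to the sum.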

How to find the most frequently repeated column?

ID UserID LevelID
1 1 1
2 1 2
3 1 2
4 1 2
5 2 1
6 2 3
7 3 2
8 4 1
9 4 1
The query should return: LevelID: 1 (3 times), that is, the LevelID value that is shared by the largest number of different users (UserID).
I have the following query:
SELECT LevelID, COUNT(LevelID) AS 'Occurrence'
FROM
(
SELECT DISTINCT * FROM
(
SELECT UserID, LevelID
FROM SampleTable
) cv
) levels
GROUP BY LevelID
ORDER BY 'Occurrence' DESC
Which returns:
LevelID Occurrence
1 3
2 2
3 1
But it doesn't let me add LIMIT 1; at the bottom to retrieve only the top row of the result. What's wrong with the query?
There is no need for these several levels of nesting. Consider using aggregation, count(distinct ...), ordering the results and using a row-limiting clause to keep the top record only:
select top(1) levelID, count(distinct userID) cnt
from mytable
group by levelID
order by cnt desc
If you want to allow possible top ties, then use top (1) with ties instead of just top (1).
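The TOP syntax above is the SQL Server form of a row-limiting clause. For databases that use LIMIT instead, the same idea might look like the sketch below. It runs on SQLite through Python's sqlite3 module, which is an assumption made only so the example is self-contained, and the function name most_shared_level is hypothetical.

```python
import sqlite3

# Sample rows from the question: (ID, UserID, LevelID).
ROWS = [
    (1, 1, 1), (2, 1, 2), (3, 1, 2), (4, 1, 2), (5, 2, 1),
    (6, 2, 3), (7, 3, 2), (8, 4, 1), (9, 4, 1),
]


def most_shared_level(rows):
    """Return (LevelID, number of distinct users) for the top level."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SampleTable (ID INT, UserID INT, LevelID INT)")
    conn.executemany("INSERT INTO SampleTable VALUES (?, ?, ?)", rows)
    # COUNT(DISTINCT UserID) replaces the nested SELECT DISTINCT subqueries.
    # The alias is referenced without quotes in ORDER BY; a quoted 'cnt'
    # would be a string constant rather than the column.
    query = """
        SELECT LevelID, COUNT(DISTINCT UserID) AS cnt
        FROM SampleTable
        GROUP BY LevelID
        ORDER BY cnt DESC
        LIMIT 1
    """
    result = conn.execute(query).fetchone()
    conn.close()
    return result


print(most_shared_level(ROWS))
```

Unlike top (1) with ties, LIMIT 1 returns a single row even when several levels share the highest count.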

SQL: how to use row_number() function to assign the same number for rows with duplicate ids in a repeating format

I have a table with two columns, personid and taskid, and want to use the ROW_NUMBER function to add a column that counts up to 3 and then restarts, repeating the same number for every row that shares a personid.
The code below orders by personid only and restarts after the number 3, so rows for the same personid can get different numbers. I need it to move to the next number only after all the taskids for a personid have been assigned the current one; in other words, all rows with the same personid should get the same number.
Select
personid,
taskid,
1 + ( (row_number() over (order by personid) - 1) % 3) as numberCount
from taskTable
Current Table Being Queried From:
PersonId Taskid
1 1
1 2
1 6
2 3
3 8
3 10
4 9
4 4
4 5
5 7
5 11
5 12
Expected Results After Query:
PersonId Taskid numberCount
1 1 1
1 2 1
1 6 1
2 3 2
3 8 3
3 10 3
4 9 1
4 4 1
4 5 1
5 7 2
5 11 2
5 12 2
Try the script below, which uses DENSE_RANK:
SELECT *,
(DENSE_RANK() OVER(ORDER BY PersonId)-1)%3 + 1 AS numberCount
FROM your_table
I think you want dense_rank() and modulo arithmetic:
select t.*,
((dense_rank() over (order by personId) - 1) % 3) + 1 as numberCount
from t;
Note: The syntax for modulo arithmetic may vary in your database. Typically it is one of mod(), the % operator, or using mod as an operator.
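To see why dense_rank() gives every row of a person the same number, here is a minimal sketch using Python's sqlite3 module. It assumes the bundled SQLite is version 3.25 or later, since window functions were added in that release, and the function name numbered_tasks is made up for the example.

```python
import sqlite3

# Sample rows from the question: (PersonId, Taskid).
ROWS = [
    (1, 1), (1, 2), (1, 6), (2, 3), (3, 8), (3, 10),
    (4, 9), (4, 4), (4, 5), (5, 7), (5, 11), (5, 12),
]


def numbered_tasks(rows, cycle=3):
    """Return (PersonId, Taskid, numberCount) with numberCount cycling 1..cycle."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE taskTable (PersonId INT, Taskid INT)")
    conn.executemany("INSERT INTO taskTable VALUES (?, ?)", rows)
    # DENSE_RANK gives tied PersonId rows the same rank with no gaps,
    # so the modulo wraps once per person rather than once per row.
    query = """
        SELECT PersonId, Taskid,
               ((DENSE_RANK() OVER (ORDER BY PersonId) - 1) % ?) + 1 AS numberCount
        FROM taskTable
        ORDER BY PersonId, Taskid
    """
    result = conn.execute(query, (cycle,)).fetchall()
    conn.close()
    return result


for row in numbered_tasks(ROWS):
    print(row)
```

With row_number() each row gets its own rank, which is why the original query changed numbers in the middle of a person's tasks.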

SQL query to take top elements of ordered list on Apache Hive

I have the table below in an SQL database.
user rating
1 10
1 7
1 6
1 2
2 8
2 3
2 2
2 2
I would like to keep only the best two ratings by user to get:
user rating
1 10
1 7
2 8
2 3
What would be the SQL query to do that? I am not sure how to do it.
This should work:
with cte as
(select user, rating, row_number() over (partition by user order by rating desc) as rn
from yourtable)
select user, rating
from cte
where rn <= 2
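The same ranking query can be tried locally with the sketch below. It uses SQLite through Python's sqlite3 module rather than Hive, which is an assumption made purely for convenience, and it needs SQLite 3.25 or later for window functions. The function name top_ratings is hypothetical, and the user column is double-quoted because it is a reserved word in some SQL dialects.

```python
import sqlite3

# Sample rows from the question: (user, rating).
ROWS = [(1, 10), (1, 7), (1, 6), (1, 2), (2, 8), (2, 3), (2, 2), (2, 2)]


def top_ratings(rows, n=2):
    """Return the n highest ratings for each user."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE yourtable ("user" INT, rating INT)')
    conn.executemany("INSERT INTO yourtable VALUES (?, ?)", rows)
    # ROW_NUMBER restarts at 1 for every user and counts down the ratings.
    query = """
        WITH cte AS (
            SELECT "user", rating,
                   ROW_NUMBER() OVER (PARTITION BY "user" ORDER BY rating DESC) AS rn
            FROM yourtable
        )
        SELECT "user", rating
        FROM cte
        WHERE rn <= ?
        ORDER BY "user", rating DESC
    """
    result = conn.execute(query, (n,)).fetchall()
    conn.close()
    return result


print(top_ratings(ROWS))
```

Because ROW_NUMBER numbers tied ratings separately, a user whose two best ratings are equal still gets exactly two rows.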

Sum rows with the same ID

The following is an example of what I have to work with.
Sample data :
ID RANK
---------
1 2
1 3
2 4
2 1
3 2
2 3
4 2
I am trying to combine the rows with like IDs and sum the RANKs for these IDs into a single row:
ID SUM(rank)
1 5
2 8
3 2
4 2
You can use sum aggregate function together with the group by clause:
select [ID]
, sum([RANK])
from [STUFF]
group by [ID]
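For a quick check against the sample data, the query can be run with the sketch below. It uses SQLite through Python's sqlite3 module, which is an assumption since the square-bracket quoting above suggests SQL Server, and the function name sum_ranks is made up for the example.

```python
import sqlite3

# Sample rows from the question: (ID, RANK).
ROWS = [(1, 2), (1, 3), (2, 4), (2, 1), (3, 2), (2, 3), (4, 2)]


def sum_ranks(rows):
    """Return (ID, summed RANK) with one row per ID."""
    conn = sqlite3.connect(":memory:")
    # Identifiers are double-quoted here instead of bracketed.
    conn.execute('CREATE TABLE "STUFF" ("ID" INT, "RANK" INT)')
    conn.executemany('INSERT INTO "STUFF" VALUES (?, ?)', rows)
    query = """
        SELECT "ID", SUM("RANK") AS rank_sum
        FROM "STUFF"
        GROUP BY "ID"
        ORDER BY "ID"
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(sum_ranks(ROWS))
```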