Complex SQL query aggregation and grouping on Athena

I have a table like this:
| db | chat_id | Admin | user |
+-------------+-------------------+------------+---------------+
| db_1 | chat_id1 | max | greg |
| db_1 | chat_id2 | max | bob |
| db_1 | chat_id3 | max | greg |
| db_1 | chat_id2 | helen | greg |
| db_2 | chat_id1 | alan | greg |
I would like to retrieve the number of chats per user for each database (db) and, the part where I fail, also retrieve a list of all admins for each user.
The final output should look like this, for example (notice that max appears only once for greg in the admins column):
| db | user | nb_of_chat | admins |
+-------------+---------------+--------------+---------------+
| db_1 | greg | 3 | max, helen |
| db_1 | bob | 1 | max |
| db_2 | greg | 1 | alan |
I wrote the following query, but it doesn't aggregate the admins, so the chat counts are split into one row per admin.
SELECT db, user, COUNT(chat_id), admins
FROM "chat_db"."chats"
GROUP BY db, user, admins;
As expected I am getting the following result (but I only want it on one line by db/user with grouped admin in the same column):
| db | user | nb_of_chat | admins |
+-------------+---------------+--------------+---------------+
| db_1 | greg | 2 | max |
| db_1 | greg | 1 | helen |
| ... | ... | ... | ... |
Do you have an idea how to do this?
Thank you for your time !
Regards.

Firstly, remove admins from the group by clause, since you want to aggregate it. Then, in Presto, you can do string aggregation as follows:
-- distinct keeps each admin only once per (db, user), as in the expected output
select db, user, count(*) no_of_chats,
array_join(array_agg(distinct admins), ', ') all_admins
from "chat_db"."chats"
group by db, user;
You can add an order by clause to array_agg() if needed:
select db, user, count(*) no_of_chats,
array_join(array_agg(distinct admins order by admins), ', ') all_admins
from "chat_db"."chats"
group by db, user;
Note that I changed count(chat_id) to count(*): both are equivalent (since chat_id is probably a non-nullable column), and the latter is (slightly) more efficient and makes the intent clearer in my opinion.
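To check the grouping logic locally, here is a minimal sketch that runs the same idea against the sample rows using Python's sqlite3 module. SQLite is only a stand-in for Athena here: group_concat(DISTINCT ...) is used as a rough counterpart of array_join(array_agg(distinct ...)), and the in-memory table name chats is an assumption.

```python
import sqlite3

# In-memory stand-in for the "chat_db"."chats" table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chats (db TEXT, chat_id TEXT, admins TEXT, user TEXT)")
conn.executemany(
    "INSERT INTO chats VALUES (?, ?, ?, ?)",
    [
        ("db_1", "chat_id1", "max", "greg"),
        ("db_1", "chat_id2", "max", "bob"),
        ("db_1", "chat_id3", "max", "greg"),
        ("db_1", "chat_id2", "helen", "greg"),
        ("db_2", "chat_id1", "alan", "greg"),
    ],
)

# One row per (db, user); DISTINCT collapses repeated admins within a group.
# SQLite does not guarantee the order of the concatenated values.
rows = conn.execute(
    """
    SELECT db, user, COUNT(*) AS nb_of_chat,
           group_concat(DISTINCT admins) AS admins
    FROM chats
    GROUP BY db, user
    """
).fetchall()

# Sort the admin names in Python so the result is deterministic.
result = {(db, user): (n, sorted(a.split(","))) for db, user, n, a in rows}
for key in sorted(result):
    print(key, result[key])
```

Each (db, user) pair comes back once, with the chat count and the de-duplicated admins side by side.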

Try using array_agg():
select db, user, count(chat_id), array_agg(admins)
from "chat_db"."chats"
group by db, user;
If you want one row per db:
select db, count(*) as num_chats, count(distinct user) as num_users, array_agg(admins)
from "chat_db"."chats"
group by db;

Related

How do I get values that are themselves not unique, but are linked to unique fields (in SQL)?

I can't give the actual table, but my problem is something like this:
Assuming that there is a table called Names with entries like these:
+--------------+
| name | id |
+--------------+
| Jack | 1001 |
| Jack | 1022 |
| John | 1010 |
| Boris | 1092 |
+--------------+
I need to select all the unique names from that table, and display them (only names, not ids). But if I do:
SELECT DISTINCT name FROM Names;
Then it will return:
+-------+
| name |
+-------+
| Jack |
| John |
| Boris |
+-------+
But as you can see in the table, the 2 people named "Jack" are different, since they have different ids. How do I get an output like this one:
+-------+
| name |
+-------+
| Jack |
| Jack |
| John |
| Boris |
+-------+
?
Assume that some ids can or will be repeated (id is not marked as a primary key in the question).
Also, in the question, the result will have 1 column and some number of rows (the exact number is given, it's 18,013). Is there a way to check if I have the right number of rows? I know I can use COUNT(), but while selecting the unique values I used GROUP BY, so using COUNT() would return the counts for how many names have that unique id, as in:
SELECT COUNT(name), id FROM Names GROUP BY id;
+------------------+
| COUNT(name) | id |
+------------------+
| 2 | 1001 |
| 1 | 1022 |
| 1 | 1092 |
| 3 | 1003 |
+------------------+
So, is there something to help me verify my output?
You can use group by:
select name
from names
group by name, id;
You can get all the distinct persons with:
SELECT DISTINCT name, id
FROM names
and you can select only the names from the above query (many databases require an alias for the derived table):
SELECT name
FROM (
SELECT DISTINCT name, id
FROM names
) AS t
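For the row-count check asked about in the question, one option is to wrap the same DISTINCT subquery in COUNT(*). The sketch below tries this with Python's sqlite3 module on the sample rows; the extra duplicate (Jack, 1001) row is an assumption added to show that exact repeats are collapsed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (name TEXT, id INTEGER)")
conn.executemany(
    "INSERT INTO names VALUES (?, ?)",
    [
        ("Jack", 1001),
        ("Jack", 1022),
        ("John", 1010),
        ("Boris", 1092),
        ("Jack", 1001),  # assumed exact duplicate, not in the original table
    ],
)

# One name per distinct (name, id) pair, so the two different Jacks both appear.
person_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM (SELECT DISTINCT name, id FROM names) AS t"
    )
]

# The number of rows the query above returns, computed by the database itself.
person_count = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT name, id FROM names) AS t"
).fetchone()[0]

print(sorted(person_names), person_count)
```

The count can then be compared against the expected number of rows (18,013 in the question).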

Audit data migration into Oracle

I have a task to migrate data from another database to an Oracle database.
The data from the previous database has audit information, i.e. tracking of the creation/update of records with update_time and update_user. For simplicity, let's assume the previous database I am talking about is an Excel file of the following format:
Key | Value | Update_Time | Update_User |
----|-------|-------------|-------------|
a | 1 | 23/04/2020 | user1 |
b | 2 | 21/04/2020 | user2 |
a | 3 | 20/04/2020 | user1 |
a | 4 | 19/04/2020 | user5 |
a | 5 | 18/04/2020 | user2 |
What is the best practice for moving the data into Oracle so that users can still query the old audit information along with the new audit records, given that the data is now being saved to a new table in Oracle (shown below)? Does Oracle provide any native solution for this? I tried Oracle Flashback, but I am not sure how to include the previous audit records because, as I understand it, Flashback can only be queried for data changes made from now on. Ideally, I want to store only the latest data in the Oracle table, like this, since these rows are the actual active data:
Key | Value | Last_Update_Time | Last_Update_User |
----|-------|------------------|------------------|
a | 1 | 23/04/2020 | user1 |
b | 2 | 21/04/2020 | user2 |
Let's say a user then edits the row with key b on 24/04/2020. I then want to fetch the following result for UI display (I am currently using Python SQLAlchemy to access the db, but a solution with a SQL query is fine to start with):
Key | Value | Update_Time | Update_User |
----|-------|-------------|-------------|
b | 7 | 24/04/2020 | user2 | ---> this is an update on the new oracle table above
a | 1 | 23/04/2020 | user1 | ---> those rows below I want to somehow load into the oracle without explicitly create a new table for it
b | 2 | 21/04/2020 | user2 |
a | 3 | 20/04/2020 | user1 |
a | 4 | 19/04/2020 | user5 |
a | 5 | 18/04/2020 | user2 |
After the change, the main data table in Oracle should look like this:
Key | Value | Last_Update_Time | Last_Update_User |
----|-------|------------------|------------------|
a | 1 | 23/04/2020 | user1 |
b | 7 | 24/04/2020 | user2 |
You can use the select query below to get the latest row per key, assuming the full history has been loaded into a table named Audit_table with the columns shown above:
SELECT ad.*
FROM Audit_table ad
JOIN (SELECT Key, MAX(Update_Time) AS Update_Time
FROM Audit_table
GROUP BY Key) rec
ON ad.Key = rec.Key
AND ad.Update_Time = rec.Update_Time;
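As a rough illustration of the "latest row per key" idea, the sketch below loads the sample history into an in-memory SQLite table through Python's sqlite3 module and joins it against the maximum update time per key. SQLite stands in for Oracle here, the table name audit_table is hypothetical, and the dates are stored as ISO strings (an assumption) so that MAX compares them chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit_table "
    "(key TEXT, value INTEGER, update_time TEXT, update_user TEXT)"
)
conn.executemany(
    "INSERT INTO audit_table VALUES (?, ?, ?, ?)",
    [
        ("a", 1, "2020-04-23", "user1"),
        ("b", 2, "2020-04-21", "user2"),
        ("a", 3, "2020-04-20", "user1"),
        ("a", 4, "2020-04-19", "user5"),
        ("a", 5, "2020-04-18", "user2"),
    ],
)

# Keep only the row whose update_time is the newest one for its key.
latest = conn.execute(
    """
    SELECT ad.key, ad.value, ad.update_time, ad.update_user
    FROM audit_table ad
    JOIN (SELECT key, MAX(update_time) AS update_time
          FROM audit_table
          GROUP BY key) rec
      ON ad.key = rec.key
     AND ad.update_time = rec.update_time
    ORDER BY ad.key
    """
).fetchall()

for row in latest:
    print(row)
```

The full history stays available in the same table by selecting from it directly and ordering by update_time in descending order.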

Set inclusion in SQL

The task is to check whether one set fully includes another. As a simplified example we can take four tables:
worker (id, name),
worker_skills (worker_id, skill),
job (id, type)
job_required_skills (job_id, skill)
I want to match a worker to a job, but only if the job's required skills are fully covered by the worker's skills, i.e. if the worker has some skills which are not required for the job it's OK, but if the job requires at least one skill which the worker doesn't have, then they don't match.
All I can think of includes a ridiculous number of joins and can't be used as a serious solution, so any advice is highly appreciated. The database is Postgres 9.6. Thanks!
EDIT:
Some sample data:
+------+---------------+
| name | worker_skills |
+------+---------------+
| John | java |
| John | sql |
| John | ruby |
| Jane | js |
| Jane | html |
+------+---------------+
+---------------------+-------------+
| type | job_skills |
+---------------------+-------------+
| Writing_queries | sql |
| Writing_queries | black_magic |
| Generic_programming | java |
| Frontend_stuff | js |
| Frontend_stuff | html |
+---------------------+-------------+
Result:
+------+---------------------+
| John | Generic_programming |
+------+---------------------+
| Jane | Frontend_stuff |
+------+---------------------+
John is perfectly qualified for Generic_programming (the only needed skill is in his skillset) but can't do Writing_queries as it requires some black_magic; Jane can do Frontend_stuff as she has both required skills.
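You can use a left join and aggregation. Pair every worker with every job, left join each required skill to the worker's skills, and keep the pairs where no required skill is missing:
select w.id as worker_id, j.id as job_id
from worker w cross join
job j join
job_required_skills jrs
on jrs.job_id = j.id left join
worker_skills ws
on ws.worker_id = w.id and ws.skill = jrs.skill
group by w.id, j.id
having count(*) = count(ws.skill)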
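The set-inclusion check can be tried on the sample data with a small sketch using Python's sqlite3 module (SQLite standing in for Postgres). It pairs every worker with every job, left joins each required skill to the worker's skills, and keeps the pairs where no required skill came back NULL. The numeric ids are assumptions, since the question only shows names and types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE worker (id INTEGER, name TEXT);
    CREATE TABLE worker_skills (worker_id INTEGER, skill TEXT);
    CREATE TABLE job (id INTEGER, type TEXT);
    CREATE TABLE job_required_skills (job_id INTEGER, skill TEXT);

    INSERT INTO worker VALUES (1, 'John'), (2, 'Jane');
    INSERT INTO worker_skills VALUES
        (1, 'java'), (1, 'sql'), (1, 'ruby'), (2, 'js'), (2, 'html');
    INSERT INTO job VALUES
        (1, 'Writing_queries'), (2, 'Generic_programming'), (3, 'Frontend_stuff');
    INSERT INTO job_required_skills VALUES
        (1, 'sql'), (1, 'black_magic'), (2, 'java'), (3, 'js'), (3, 'html');
    """
)

# COUNT(*) counts every required skill of the job; COUNT(ws.skill) counts only
# those the worker actually has. They are equal only when nothing is missing.
matches = conn.execute(
    """
    SELECT w.name, j.type
    FROM worker w
    CROSS JOIN job j
    JOIN job_required_skills jrs ON jrs.job_id = j.id
    LEFT JOIN worker_skills ws
           ON ws.worker_id = w.id AND ws.skill = jrs.skill
    GROUP BY w.id, j.id
    HAVING COUNT(*) = COUNT(ws.skill)
    ORDER BY w.name, j.type
    """
).fetchall()

print(matches)
```

Note that a job with no required skills at all never appears in the result with this formulation, because the inner join to job_required_skills drops it.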

Postgres: How to combine columns into the same row value

How would I combine multiple columns that could fit into the same row instead of having the same row display many times?
flight | Manager | Lead | Worker
---------------------|-----------|-------|--------
Arizona_BGS_Flight_2 | John | |
Arizona_BGS_Flight_2 | | Will |
Arizona_BGS_Flight_2 | | | James
Utah_UTS_Flight_5 | John | |
Into:
flight | Manager | Lead | Worker
---------------------|-----------|-------|--------
Arizona_BGS_Flight_2 | John | Will | James
You can use aggregation:
select flight, max(manager) as manager, max(lead) as lead, max(worker) as worker
from t
group by flight;
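A quick way to see why max() works here: aggregate functions skip NULLs, so each column collapses to its single non-NULL value per flight. The sketch below checks this with Python's sqlite3 module (SQLite standing in for Postgres), assuming the empty cells in the sample are NULL and the table is named t as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (flight TEXT, manager TEXT, lead TEXT, worker TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("Arizona_BGS_Flight_2", "John", None, None),
        ("Arizona_BGS_Flight_2", None, "Will", None),
        ("Arizona_BGS_Flight_2", None, None, "James"),
        ("Utah_UTS_Flight_5", "John", None, None),
    ],
)

# max() ignores NULLs, so each role keeps the one value present for the flight.
combined = conn.execute(
    """
    SELECT flight, max(manager) AS manager, max(lead) AS lead, max(worker) AS worker
    FROM t
    GROUP BY flight
    ORDER BY flight
    """
).fetchall()

for row in combined:
    print(row)
```

If a flight could have two different non-NULL values in the same column, max() would silently keep only the alphabetically last one, so this approach assumes at most one value per role and flight.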

Aggregate function in Tuple Relational Calculus

How do you translate a COUNT, a GROUP BY, or any other aggregate function found in SQL into TRC? I can't find any way on the internet.
So I have a table User
+----+----------+--+
| | User | |
+----+----------+--+
| pk | email | |
| | password | |
| | ... | |
+----+----------+--+
And a table FriendShip
+----+-------------+--+
| | FriendShip | |
+----+-------------+--+
| pk | user1_email | |
| pk | user2_email | |
| | date | |
| | accepted | |
+----+-------------+--+
And the following query in SQL:
SELECT *
FROM user u
LEFT OUTER JOIN friendship f ON (f.user1_email = u.email
OR f.user2_email = u.email)
GROUP BY u.email
HAVING COUNT(u.email) < 3
I would like to transform this query into tuple relational Calculus, the JOIN and the SELECT are pretty straightforward, but for the GROUP BY and the COUNT I don't know.
Thanks,
As Lennart says, it's not possible to express those functions, so I decided to transform the count in another way.
First let's assert the following predicate:
Then we can say that having 2 or fewer friends is having 0 friends, 1, or 2. To have 1 friend is like saying that there exists a friend (friend1) for which Friends(me, friend1) is true.
To have 2 friends, you must have 1 friend and another, different one. And finally you must not have any more friends.
All this can be expressed like this:
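The description above can be sketched in LaTeX as follows. This is a hedged reconstruction from the prose, not necessarily the notation the answer originally used: the predicate name Friends comes from the text, the compact "no three pairwise distinct friends" form is an assumption that is logically equivalent to the 0, 1, or 2 case split, and the accepted column is ignored.

```latex
% Friends(u, v): some FriendShip tuple links u and v, in either direction.
\[
\mathrm{Friends}(u, v) \;\equiv\; \exists f \in \mathrm{FriendShip}\;
\big( (f.\mathit{user1\_email} = u.\mathit{email} \land f.\mathit{user2\_email} = v.\mathit{email})
\lor (f.\mathit{user1\_email} = v.\mathit{email} \land f.\mathit{user2\_email} = u.\mathit{email}) \big)
\]

% Users with at most two friends: there are no three pairwise distinct friends.
\[
\{\, u \mid \mathrm{User}(u) \land \lnot\, \exists a\, \exists b\, \exists c\;
\big( \mathrm{User}(a) \land \mathrm{User}(b) \land \mathrm{User}(c)
\land a.\mathit{email} \neq b.\mathit{email}
\land a.\mathit{email} \neq c.\mathit{email}
\land b.\mathit{email} \neq c.\mathit{email}
\land \mathrm{Friends}(u, a) \land \mathrm{Friends}(u, b) \land \mathrm{Friends}(u, c) \big) \,\}
\]
```

The same pattern generalizes to any fixed bound n by denying the existence of n + 1 pairwise distinct friends, which is why a constant threshold can be expressed without a COUNT operator.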
I don't think you can express aggregate functions in either TRC or RA. However, there have been proposals to extend them, see for example:
http://cis.csuohio.edu/~matos/notes/cis-612/NestedRelations/Extending%20Relational%20Algebra%20and%20Relational%20Calculus%20with%20Se.pdf