how to sql query 2 tables and group by?

i have 2 tables: activities and users.
users has columns: id, name, active
activities: id, name, type, time, user_id.
for example i have these tables:
users
-----
id | name | active
1 | marc | true
2 | john | true
3 | mary | true
4 | nico | true
activities
-----
id | name | type | time | user_id
1 | morn | walk | 90 | 2
2 | morn | walk | 22 | 2
3 | morn | run | 12 | 2
4 | sat | walk | 22 | 1
5 | morn | run | 13 | 1
6 | mond | walk | 22 | 3
7 | morn | walk | 22 | 2
8 | even | run | 42 | 1
9 | morn | walk | 22 | 3
10 | morn | walk | 62 | 1
11 | morn | run | 22 | 3
now i would like to get table that would sum time spent on each type of activity and would group it by user name. so:
result
------
user name | type | time
marc | walk | 84
marc | run | 55
john | walk | 134
john | run | 12
mary | walk | 44
mary | run | 22
nico | walk | 0
nico | run | 0
how should i write this query to get this result?
thanks in advance
gerard

you can use coalesce to get 0 for users with no activities, and distinct to get all possible activity types
select
u.name, c.type,
coalesce(sum(a.time), 0) as time
from (select distinct type from activities) as c
cross join users as u
left outer join activities as a on a.user_id = u.id and a.type = c.type
group by u.name, c.type
order by u.name, c.type
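To check the query above against the sample data from the question, here is a minimal sketch that loads the two tables into an in-memory SQLite database from Python. The function name `activity_totals` and the use of SQLite are assumptions for illustration only; the question does not say which database is in use.

```python
import sqlite3

def activity_totals():
    """Run the cross join + left join query on the question's sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table users (id integer primary key, name text, active integer);
        create table activities (
            id integer primary key, name text, type text,
            time integer, user_id integer
        );
        insert into users values
            (1,'marc',1),(2,'john',1),(3,'mary',1),(4,'nico',1);
        insert into activities values
            (1,'morn','walk',90,2),(2,'morn','walk',22,2),(3,'morn','run',12,2),
            (4,'sat','walk',22,1),(5,'morn','run',13,1),(6,'mond','walk',22,3),
            (7,'morn','walk',22,2),(8,'even','run',42,1),(9,'morn','walk',22,3),
            (10,'morn','walk',62,1),(11,'morn','run',22,3);
    """)
    rows = conn.execute("""
        select u.name, c.type, coalesce(sum(a.time), 0) as time
        from (select distinct type from activities) as c
        cross join users as u
        left outer join activities as a
            on a.user_id = u.id and a.type = c.type
        group by u.name, c.type
        order by u.name, c.type
    """).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in activity_totals():
        print(row)
```

The cross join builds every (user, type) pair first, so nico still gets one row per activity type, and coalesce turns the missing sums into 0.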

Select u.name, a.type, SUM(a.time) FROM
activities a
LEFT JOIN users u
ON a.user_id = u.id
GROUP BY u.name, a.type
Use this to get zero count as well
SELECT c.name,c.type,COALESCE(aa.time, 0) AS time FROM
(Select u.id,u.name, b.type FROM
users u
CROSS JOIN (SELECT DISTINCT type FROM activities) b) c
LEFT JOIN (SELECT a.user_id, a.type, SUM(a.time) as time FROM
activities a
GROUP BY a.user_id, a.type) aa ON
aa.user_id = c.id and c.type = aa.type

this might work :
select users.name,activities.type,sum(activities.time)
from users left join activities on users.id = activities.user_id
where users.active group by users.name,activities.type

Related

Query to show top 3 records per user where the user has submitted a minimum of 3?

I have a table in MS SQL with multiple entries per user. I am trying to get the top 3 entries by date for each user. I have a query that returns at most the top 3 entries per user, but it also returns users who have submitted only 1 or 2 entries. I join another table only to get the email address. I would like it to return only the entries by john and dave, as they have 3 entries. If a user has more than 3, just return the top 3 by submitmonth.
select * from (
select m.Email, q.submitmonth, q.A2, q.A7, q.C7, q.C8, q.C16, q.F9, q.F10, q.G4, q.H1, q.H2, q.J2, q.J13, q.K18, q.N1, q.P6,
row_number() over (partition by q.userid order by q.submitmonth desc) as Submitted
from dbo.submission q
left join dbo.users m
on q.UserId = m.UserId ) ranks
where Submitted < 4
this returns
| Email | submitmonth | A2 | A7 | Submitted
|-------|-------------|----|----|----------
| john@yahoo.com | 01/08/2020 | 2 | 4 | 1
| john@yahoo.com | 01/07/2020 | 8 | 8 | 2
| john@yahoo.com | 01/06/2020 | 2 | 1 | 3
| bob@gmail.com | 01/08/2020 | 1 | 3 | 1
| bob@gmail.com | 01/07/2020 | 9 | 7 | 2
| pete@yahoo.co.uk | 01/08/2020 | 8 | 5 | 1
| dave@gmail.com | 01/06/2020 | 3 | 6 | 1
| dave@gmail.com | 01/04/2020 | 5 | 6 | 2
| dave@gmail.com | 01/02/2020 | 1 | 6 | 3
Thanks for your help.
Add the count window function and then filter on it.
select *
from (
select m.Email, q.submitmonth, q.A2, q.A7, q.C7, q.C8, q.C16, q.F9, q.F10, q.G4, q.H1, q.H2, q.J2, q.J13, q.K18, q.N1, q.P6
, row_number() over (partition by q.userid order by q.submitmonth desc) as Submitted
, count(*) over (partition by q.userid) TotalSubmitted
from dbo.submission q
left join dbo.users m on q.UserId = m.UserId
) ranks
where Submitted < 4 and TotalSubmitted >= 3
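The effect of the extra count window can be tried on the question's sample rows. The sketch below is only an approximation of the original setup: it uses SQLite (3.25 or later, for window functions) instead of MS SQL, keeps just the A2 and A7 columns, invents numeric user ids, and stores the dates as ISO strings so they sort correctly. The function name `top3_for_active_users` is hypothetical.

```python
import sqlite3

def top3_for_active_users():
    """Top 3 submissions per user, only for users with at least 3 submissions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table users (userid integer primary key, email text);
        create table submission (userid integer, submitmonth text, a2 integer, a7 integer);
        -- user ids are made up for this sketch; emails and values follow the question
        insert into users values
            (1,'john@yahoo.com'),(2,'bob@gmail.com'),
            (3,'pete@yahoo.co.uk'),(4,'dave@gmail.com');
        insert into submission values
            (1,'2020-08-01',2,4),(1,'2020-07-01',8,8),(1,'2020-06-01',2,1),
            (2,'2020-08-01',1,3),(2,'2020-07-01',9,7),
            (3,'2020-08-01',8,5),
            (4,'2020-06-01',3,6),(4,'2020-04-01',5,6),(4,'2020-02-01',1,6);
    """)
    rows = conn.execute("""
        select email, submitmonth, a2, a7, submitted
        from (
            select m.email, q.submitmonth, q.a2, q.a7,
                   row_number() over (partition by q.userid
                                      order by q.submitmonth desc) as submitted,
                   count(*) over (partition by q.userid) as total_submitted
            from submission q
            left join users m on q.userid = m.userid
        ) ranks
        where submitted < 4 and total_submitted >= 3
        order by email, submitted
    """).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in top3_for_active_users():
        print(row)
```

Because count(*) over the partition is computed before the outer filter runs, every row of a user carries that user's total, so bob and pete drop out entirely.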

Remove Duplicate Result on Query

Could you help me solve this duplication problem? My query returns more than one result for the same record. I want only one result for each id, and only the last history entry of each record.
My Query:
SELECT DISTINCT ON(tickets.ticket_id,ticket_histories.created_at)
ticket.id AS ticket_id,
tickets.priority,
tickets.title,
tickets.company,
tickets.ticket_statuse,
tickets.created_at AS created_ticket,
group_user.id AS group_id,
group_user.name AS user_group,
ch_history.description AS ch_description,
ch_history.created_at AS ch_history
FROM
tickets
INNER JOIN company ON (company.id = tickets.company_id)
INNER JOIN (SELECT id,
tickets_id,
description,
user_id,
MAX(tickets.created_at) AS created_ticket
FROM
ch_history
GROUP BY id,
created_at,
ticket_id,
user_id,
description
ORDER BY created_at DESC LIMIT 1) AS ch_history ON (ch_history.ticket_id = ticket.id)
INNER JOIN users ON (users.id = ch_history.user_id)
INNER JOIN group_users ON (group_users.id = users.group_user_id)
WHERE company = 15
GROUP BY
tickets.id,
ch_history.created_at DESC;
Result of my query: it returns 3 or 5 rows with the same id but different histories.
I want to return only one row per ticket id, with only the last recorded history of each ticket.
ticket_id | priority | title | company_id | ticket_statuse | created_ticket | company | user_group | group_id | ch_description | ch_history
-----------+------------+--------------------------------------+------------+-----------------+----------------------------+------------------------------------------------------+-----------------+----------+------------------------+----------------------------
49713 | 2 | REMOVE DATA | 1 | t | 2019-12-09 17:50:35.724485 | SAME COMPANY | people | 5 | TEST 1 | 2019-12-10 09:31:45.780667
49706 | 2 | INCLUDE DATA | 1 | f | 2019-12-09 09:16:35.320708 | SAME COMPANY | people | 5 | TEST 2 | 2019-12-10 09:38:52.769515
49706 | 2 | ANY TITLE | 1 | f | 2019-12-09 09:16:35.320708 | SAME COMPANY | people | 5 | TEST 3 | 2019-12-10 09:39:22.779473
49706 | 2 | NOTING ELSE MAT | 1 | f | 2019-12-09 09:16:35.320708 | SAME COMPANY | people | 5 | TESTE 4 | 2019-12-10 09:42:59.50332
49706 | 2 | WHITESTRIPES | 1 | f | 2019-12-09 09:16:35.320708 | SAME COMPANY | people | 5 | TEST 5 | 2019-12-10 09:44:30.675434
wanted to return as below
ticket_id | priority | title | company_id | ticket_statuse | created_ticket | company | user_group | group_id | ch_description | ch_history
-----------+------------+--------------------------------------+------------+-----------------+----------------------------+------------------------------------------------------+-----------------+----------+------------------------+----------------------------
49713 | 2 | REMOVE DATA | 1 | t | 2019-12-09 17:50:10.724485 | SAME COMPANY | people | 5 | TEST 1 | 2020-01-01 18:31:45.780667
49707 | 2 | INCLUDE DATA | 1 | f | 2019-12-11 19:22:21.320701 | SAME COMPANY | people | 5 | TEST 2 | 2020-02-05 16:38:52.769515
49708 | 2 | ANY TITLE | 1 | f | 2019-12-15 07:15:57.320950 | SAME COMPANY | people | 5 | TEST 3 | 2020-02-06 07:39:22.779473
49709 | 2 | NOTING ELSE MAT | 1 | f | 2019-12-16 08:30:28.320881 | SAME COMPANY | people | 5 | TESTE 4 | 2020-01-07 11:42:59.50332
49701 | 2 | WHITESTRIPES | 1 | f | 2019-12-21 11:04:00.320450 | SAME COMPANY | people | 5 | TEST 5 | 2020-01-04 10:44:30.675434
I want the result to look like the table above: the ch_description and ch_history fields should show only the most recent record for each ticket listed, without duplication. Could you help me get it this way?
Two things jump out at me:
1. You have listed "created at" as part of your "distinct on," which is going to inherently give you multiple rows per ticket id (unless there happens to be only one).
2. The distinct on should make the subquery on the ticket history unnecessary... and even if you chose to do it this way, you again are going on the "created at" column, which will give you multiple results. The ideal subquery, should you choose this approach, would have been to group by ticket_id and only ticket_id.
Slightly related:
An alternative approach to the subquery would be an analytic function (windowing function), but I'll save that for another day.
I think the query you want, which will give you one row per ticket_id, based on the history table's created_at field would be something like this:
select distinct on (t.id)
<your fields here>
from
tickets t
join company c on t.company_id = c.id
join ch_history ch on ch.ticket_id = t.id
join users u on ch.user_id = u.id
join group_users g on u.group_user_id = g.id
where
company = 15
order by
t.id, ch.created_at desc -- this is what tells distinct on which record to choose (desc = most recent history first)
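The analytic-function alternative mentioned above can be sketched as follows. This is not the distinct on query itself: SQLite has no DISTINCT ON, so the sketch swaps in row_number() to keep the newest history row per ticket. The schema is cut down to the columns needed, the ticket and history rows are loosely based on the question's output, and the function name `latest_history_per_ticket` is hypothetical.

```python
import sqlite3

def latest_history_per_ticket():
    """Return one row per ticket, carrying its most recent history entry."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- simplified tables; the real ones have more columns
        create table tickets (id integer primary key, title text);
        create table ch_history (
            id integer primary key, ticket_id integer,
            description text, created_at text
        );
        insert into tickets values (49713,'REMOVE DATA'),(49706,'INCLUDE DATA');
        insert into ch_history (ticket_id, description, created_at) values
            (49713,'TEST 1','2019-12-10 09:31:45.780667'),
            (49706,'TEST 2','2019-12-10 09:38:52.769515'),
            (49706,'TEST 3','2019-12-10 09:39:22.779473'),
            (49706,'TESTE 4','2019-12-10 09:42:59.50332'),
            (49706,'TEST 5','2019-12-10 09:44:30.675434');
    """)
    rows = conn.execute("""
        select ticket_id, title, description, created_at
        from (
            select t.id as ticket_id, t.title, ch.description, ch.created_at,
                   row_number() over (partition by t.id
                                      order by ch.created_at desc) as rn
            from tickets t
            join ch_history ch on ch.ticket_id = t.id
        ) ranked
        where rn = 1          -- rn = 1 is the newest history row of each ticket
        order by ticket_id
    """).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in latest_history_per_ticket():
        print(row)
```

In PostgreSQL, distinct on (t.id) with the matching order by is shorter; the row_number() form is the portable equivalent.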

SQL: Cascading conditions on Join

I have found a few similar questions to this on SO but nothing which applies to my situation.
I have a large dataset with hundreds of millions of rows in Table 1 and am looking for the most efficient way to run the following query. I am using Google BigQuery but I think this is a general SQL question applicable to any DBMS?
I need to apply an owner to every row in Table 1. I want to join in the following priority:
1: if item_id matches an identifier in Table 2
2: if no item_id matches try match on item_name
3: if no item_id or item_name matches try match on item_division
4: if no item_division matches, return null
Table 1 - Datapoints:
| id | item_id | item_name | item_division | units | revenue
|----|---------|-----------|---------------|-------|---------
| 1 | xyz | pen | UK | 10 | 100
| 2 | pqr | cat | US | 15 | 120
| 3 | asd | dog | US | 12 | 105
| 4 | xcv | hat | UK | 11 | 140
| 5 | bnm | cow | UK | 14 | 150
Table 2 - Identifiers:
| id | type | code | owner |
|----|---------|-----------|-------|
| 1 | id | xyz | bob |
| 2 | name | cat | dave |
| 3 | division| UK | alice |
| 4 | name | pen | erica |
| 5 | id | xcv | fred |
Desired output:
| id | item_id | item_name | item_division | units | revenue | owner |
|----|---------|-----------|---------------|-------|---------|-------|
| 1 | xyz | pen | UK | 10 | 100 | bob | <- id
| 2 | pqr | cat | US | 15 | 120 | dave | <- name
| 3 | asd | dog | US | 12 | 105 | null | <- none
| 4 | xcv | hat | UK | 11 | 140 | fred | <- id
| 5 | bnm | cow | UK | 14 | 150 | alice | <- division
My attempts so far have involved multiple joining the table onto itself and I fear it is becoming hugely inefficient.
Any help much appreciated.
Another option for BigQuery Standard SQL
#standardSQL
SELECT ARRAY_AGG(a)[OFFSET(0)].*,
ARRAY_AGG(owner
ORDER BY CASE
WHEN type = 'id' THEN 1
WHEN type = 'name' THEN 2
WHEN type = 'division' THEN 3
END
LIMIT 1
)[OFFSET(0)] owner
FROM Datapoints a
JOIN Identifiers b
ON (a.item_id = b.code AND b.type = 'id')
OR (a.item_name = b.code AND b.type = 'name')
OR (a.item_division = b.code AND b.type = 'division')
GROUP BY a.id
ORDER BY a.id
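It leaves out entries which have no owners, as in the result below (id=3 is out as it has no owner)
| Row | id | item_id | item_name | item_division | units | revenue | owner |
|-----|----|---------|-----------|---------------|-------|---------|-------|
| 1 | 1 | xyz | pen | UK | 10 | 100 | bob |
| 2 | 2 | pqr | cat | US | 15 | 120 | dave |
| 3 | 4 | xcv | hat | UK | 11 | 140 | fred |
| 4 | 5 | bnm | cow | UK | 14 | 150 | alice |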
I am using the following query (thanks @Barmar) but want to know if there is a more efficient way in Google BigQuery:
SELECT a.*, COALESCE(b.owner,c.owner,d.owner) owner FROM datapoints a
LEFT JOIN identifiers b on a.item_id = b.code and b.type = 'id'
LEFT JOIN identifiers c on a.item_name = c.code and c.type = 'name'
LEFT JOIN identifiers d on a.item_division = d.code and d.type = 'division'
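To see how the three left joins plus COALESCE implement the priority order, the query above can be run on the question's sample tables. The sketch below uses SQLite through Python rather than BigQuery, so it says nothing about BigQuery performance; the function name `owners_by_priority` is hypothetical and only the id, item_id and owner columns are selected to keep the output short.

```python
import sqlite3

def owners_by_priority():
    """Assign an owner per datapoint: id match first, then name, then division."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table datapoints (
            id integer primary key, item_id text, item_name text,
            item_division text, units integer, revenue integer
        );
        create table identifiers (
            id integer primary key, type text, code text, owner text
        );
        insert into datapoints values
            (1,'xyz','pen','UK',10,100),(2,'pqr','cat','US',15,120),
            (3,'asd','dog','US',12,105),(4,'xcv','hat','UK',11,140),
            (5,'bnm','cow','UK',14,150);
        insert into identifiers values
            (1,'id','xyz','bob'),(2,'name','cat','dave'),
            (3,'division','UK','alice'),(4,'name','pen','erica'),
            (5,'id','xcv','fred');
    """)
    rows = conn.execute("""
        select a.id, a.item_id,
               coalesce(b.owner, c.owner, d.owner) as owner
        from datapoints a
        left join identifiers b on a.item_id = b.code and b.type = 'id'
        left join identifiers c on a.item_name = c.code and c.type = 'name'
        left join identifiers d on a.item_division = d.code and d.type = 'division'
        order by a.id
    """).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in owners_by_priority():
        print(row)
```

Row 1 matches on id (bob), on name (erica) and on division (alice); COALESCE returns the first non-null argument, so the id match wins. Row 3 matches nothing and keeps a null owner, which the left joins preserve.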
I'm not sure if BigQuery optimizes today a query like this - but at least you would be writing a query that gives strong hints to not run the subqueries when not needed:
#standardSQL
SELECT COALESCE(
null
, (SELECT MIN(payload)
FROM `githubarchive.year.2016`
WHERE actor.login=a.user)
, (SELECT MIN(payload)
FROM `githubarchive.year.2016`
WHERE actor.id = SAFE_CAST(user AS INT64))
)
FROM (SELECT '15229281' user) a
4.2s elapsed, 683 GB processed
{"action":"started"}
For example, the following query took a long time to run, but BigQuery could optimize its execution massively in the future (depending on how frequently users needed an operation like this):
#standardSQL
SELECT COALESCE(
"hello"
, (SELECT MIN(payload)
FROM `githubarchive.year.2016`
WHERE actor.login=a.user)
, (SELECT MIN(payload)
FROM `githubarchive.year.2016`
WHERE actor.id = SAFE_CAST(user AS INT64))
)
FROM (SELECT actor.login user FROM `githubarchive.year.2016` LIMIT 10) a
114.7s elapsed, 683 GB processed
hello
hello
hello
hello
hello
hello
hello
hello
hello
hello

SQL get all Data from one table connected with another

I have a little DB with two tables, Weeks and Users.
Every User can have a Week, but does not have to.
Every Week has a WeekNr. I want to build a table like the one below for a specific WeekNr, where all users are shown, even those without a Week with that WeekNr:
-------------------------------------------+
| Users | KM Driven | Goal | CarID |
+----------+------------+------------------+
| Driver1 | 555 | Spain | 1 |
+----------+------------+------------------+
| Driver2 | 0 | 0 | 0 |
+----------+------------+------------------+
| Driver3 | 777 | Germany | 9 |
+----------+------------+------------------+
| Driver4 | 888 | UK | 86 |
+----------+------------+------------------+
If a user has no Week for a WeekNr, I want all columns except his name filled with 0. See Driver2 in the table above for an example.
My query looks actually like that:
SELECT *
FROM User
INNER JOIN Week ON Week.UserId = User.UserID
WHERE WeekNr = 22;
I totally understand why I'm getting only the drivers with weeks for the specific WeekNr, but I have no clue how to fill the empty ones with 0, all in one query.
I hope my question got clear.
Thanks for your help in advance!
EDIT:
My table look like this
Users:
---------------------------------+
| User | PW | UserID |
+----------+------------+--------+
| Driver1 | *** | 1 |
| Driver2 | *** | 2 |
| Driver3 | *** | 3 |
| Driver4 | *** | 4 |
+----------+------------+--------+
Weeks:
------------------------------------------------------+
| WeekNr | KM Driven | Goal | WeekID | UserID |
+----------+------------+-----------------------------+
| 22 | 555 | Spain | 1 | 1 |
| 22 | 0 | USA | 2 | 3 |
| 22 | 777 | Germany | 3 | 4 |
| 23 | 888 | UK | 44 | 2 |
+----------+------------+-----------------------------+
I hope to have correctly understood your question: your problem is in the INNER JOIN, you get only rows present in both the tables. Try with LEFT OUTER:
EDIT:
That should do the trick, given the additional info :)
;WITH Drivers AS (SELECT u.[User], w.*
FROM Users u LEFT OUTER JOIN Weeks w On w.UserId = u.UserID
WHERE WeekNr = 22)
SELECT u.[User], ISNULL(d.[KM Driven], 0) [KM Driven] , ISNULL(d.Goal, 0) Goal --, *
FROM Users u LEFT OUTER JOIN Drivers d ON u.UserID = d.UserID
Is there a typo in your question? I presume CarID is meant to be a column of Weeks. If so,
SELECT
u.[User]
, ISNULL(w.[KM Driven], 0) [KM Driven]
, ISNULL(w.[Goal], '0') [Goal]
, ISNULL(w.[CarID], 0) [CarID]
FROM dbo.[Users] u
LEFT JOIN Weeks w ON u.UserID = w.UserID AND WeekNr = 22
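The key detail in the query above is that the WeekNr filter sits in the ON clause rather than in a WHERE clause. A small sketch on the sample data from the question's EDIT shows the difference. It assumes SQLite through Python instead of SQL Server, so COALESCE stands in for ISNULL, the columns are renamed to avoid spaces (km_driven, week_nr and so on), and CarID is left out because the sample Weeks table has no such column. The function name `week_report` is hypothetical.

```python
import sqlite3

def week_report(week_nr):
    """Return (filter in ON, filter in WHERE) results for one week number."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table users (user_id integer primary key, name text);
        create table weeks (
            week_nr integer, km_driven integer, goal text,
            week_id integer primary key, user_id integer
        );
        insert into users values
            (1,'Driver1'),(2,'Driver2'),(3,'Driver3'),(4,'Driver4');
        insert into weeks values
            (22,555,'Spain',1,1),(22,0,'USA',2,3),
            (22,777,'Germany',3,4),(23,888,'UK',44,2);
    """)
    # Filter in the ON clause: users without a matching week are kept.
    in_on = conn.execute("""
        select u.name, coalesce(w.km_driven, 0), coalesce(w.goal, '0')
        from users u
        left join weeks w on u.user_id = w.user_id and w.week_nr = ?
        order by u.user_id
    """, (week_nr,)).fetchall()
    # Filter in WHERE: rows with a null week_nr are discarded after the join.
    in_where = conn.execute("""
        select u.name, coalesce(w.km_driven, 0), coalesce(w.goal, '0')
        from users u
        left join weeks w on u.user_id = w.user_id
        where w.week_nr = ?
        order by u.user_id
    """, (week_nr,)).fetchall()
    conn.close()
    return in_on, in_where

if __name__ == "__main__":
    on_rows, where_rows = week_report(22)
    print(on_rows)
    print(where_rows)
```

With the filter in WHERE, Driver2 disappears for week 22, which is the same outcome as the original inner join.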

SQL query to find the friends in a table

I have two tables, Users and Friends. The tables look like:
USERS
| ID | Name |
| 1 | Sam |
| 2 | Harry |
| 3 | Vivi |
| 4 | sabrina |
FRIENDS
| UId | FriendID|
| 1 | 2 |
| 2 | 3 |
| 4 | 1 |
| 5 | 4 |
| 1 | 3 |
I need to find the names of all the friends for Sam. I tried doing the same using a Union in an SQL query, but I couldn't get the desired output. Can I possibly get the required output doing the same?
declare
@answer nvarchar(max)='{'
select @answer=@answer+u1.Name+',' from USERS u
inner join FRIENDS f on f.UId=u.ID
inner join USERS u1 on u1.ID=f.FriendID
where u.ID=<what ever you want> -- 1 or 2 or 3 or 4
set @answer=LEFT(@answer,len(@answer)-1)+'}' -- drop the trailing comma
select @answer
select u.name from users
join friends f on users.id=f.uid
join users u on u.id=f.friendid
where users.name='Sam';
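The query above follows the FRIENDS rows in one direction only (rows where Sam is the UId). If friendship is meant to be mutual, which is an assumption the question does not confirm, the Union the asker mentions can combine both directions. A minimal sketch on the sample data, using SQLite through Python, with a hypothetical function name `friends_of`:

```python
import sqlite3

def friends_of(name):
    """Names of everyone linked to `name` in FRIENDS, in either direction."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table users (id integer primary key, name text);
        create table friends (uid integer, friendid integer);
        insert into users values (1,'Sam'),(2,'Harry'),(3,'Vivi'),(4,'sabrina');
        insert into friends values (1,2),(2,3),(4,1),(5,4),(1,3);
    """)
    rows = conn.execute("""
        -- rows where the person is the UId: take the FriendID side
        select u.name
        from friends f
        join users me on me.id = f.uid
        join users u on u.id = f.friendid
        where me.name = ?
        union
        -- rows where the person is the FriendID: take the UId side
        select u.name
        from friends f
        join users me on me.id = f.friendid
        join users u on u.id = f.uid
        where me.name = ?
        order by 1
    """, (name, name)).fetchall()
    conn.close()
    return [r[0] for r in rows]

if __name__ == "__main__":
    print(friends_of("Sam"))
```

For Sam this adds sabrina (from the row 4 | 1) to Harry and Vivi. Union also removes duplicates, so a pair stored in both directions is listed once. The row 5 | 4 has no matching user and is dropped by the inner joins.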