I'm new to SQL and keen to learn, but I'm having trouble with this problem. I would appreciate any help. I have a login table with the following columns, which I explain below.
id  cur_date   user_id  stream  user_signup  tot_sec  mov_sec  ser_sec
1   19-MAY-20  5right   TV      12-MAY-20    73430    73430    0
2   19-MAY-20  5right   TV      12-JAN-16    3430     3430     0
3   19-MAY-20  5left    MOBILE  03-JAN-20    3457     3430     45
4   19-MAY-20  7left    MOBILE  04-JAN-20    4980     100      4880
5   19-MAY-20  7right   TV      04-FEB-20    15731    0        15731
6   19-MAY-20  7right   WEB     04-APR-20    16731    1000     15731
7   19-MAY-20  7left    TV      04-MAR-20    2731     1000     1731
8   19-MAY-20  5left    TV      03-APR-20    12731    11000    1731
cur_date: the date the user metrics were taken
user_id: the user's id (the same id is used across stream services)
user_signup: the user's sign-up date
stream: the streaming platform; contains 3 values: MOBILE, WEB, TV
tot_sec: total active time in seconds
mov_sec: movie watching time in seconds
ser_sec: TV series watching time in seconds
Question: write the SQL to find which two streams have the greatest overlap of users. The users have the same id across stream services.
I wrote this:
select r1.user_id, r1.stream
from login as r1
inner join login as r2
on r1.user_id = r2.user_id
order by r1.stream;
However, this is not quite what I'm asking for. Any ideas?
Thanks!
Hmmm . . . If you mean the counts of logged in users, then you need some sort of aggregation to count. So:
select s1.stream, s2.stream, count(*)
from login s1 join
login s2
on s1.user_id = s2.user_id and s1.stream <> s2.stream
group by s1.stream, s2.stream
order by count(*) desc
fetch first 1 row only;
If the users can be repeated for a given stream, then you want count(distinct s1.user_id) instead of count(*).
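A minimal sketch of that variant, assuming the same login table (the s1.stream < s2.stream comparison is a small tweak to the query above so each pair of streams is counted once rather than mirrored):
select s1.stream, s2.stream, count(distinct s1.user_id) as shared_users
from login s1 join
     login s2
     on s1.user_id = s2.user_id and s1.stream < s2.stream
group by s1.stream, s2.stream
order by shared_users desc
fetch first 1 row only;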
Related
I am working on a SQL query (Azure Databricks environment), where I am considering the following dataset:
clientid  visited  channel    purchase  visit_order
123       abc133   google     0         1
123       efg446   facebook   0         2
123       gij729   instagram  1         3
456       klm183   google     0         1
456       nop496   linkedin   0         2
456       qrs729   pinterest  1         3
456       tuv894   google     0         1
456       wyz634   instagram  0         2
I want to get the following output:
clientid  user_journey                 conversion
123       google, facebook, instagram  1
456       google, linkedin, pinterest  1
456       google, instagram            0
where the user_journey column is composed of the channels that participated in a conversion journey. Note that the journeys of users who, until then, have not made a purchase are also built.
Looking for commands that could help with this task, I found concat_ws, and wrote the code below:
select
clientid,
concat_ws(',', collect_list(channel)) as user_journey,
sum(purchase) as conversion
from table_name group by clientid;
I get this result:
clientid  user_journey                                    conversion
123       google, facebook, instagram                     1
456       google, linkedin, pinterest, google, instagram  1
Now I'm trying to work out a condition that produces the desired result, but so far I haven't been able to find one.
Could you help me solve this task?
Note: you are missing a very important data point that is most likely available in your data: a timestamp, a date, or something else that determines the global order of visits.
With this in mind, consider the query below (ts refers to the column that is missing from your question):
select clientid,
       string_agg(channel, ', ' order by visit_order) user_journey,
       sum(purchase) as conversion
from (
    -- grp increments each time a new journey starts (visit_order = 1)
    select *, countif(visit_order = 1) over(partition by clientid order by ts) grp
    from your_table
)
group by clientid, grp
If applied to the sample data in your question, the output matches the desired result shown above.
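Since the environment is Databricks, note that string_agg and countif above are BigQuery functions. A Spark SQL spelling of the same idea might look like the sketch below; ts is still a stand-in for the missing ordering column, and the collected channels are sorted by visit_order before joining, since Spark does not guarantee collection order:
select clientid,
       array_join(transform(array_sort(collect_list(struct(visit_order, channel))),
                            x -> x.channel), ', ') as user_journey,
       sum(purchase) as conversion
from (
    -- running count of journey starts (visit_order = 1) splits each client's rows into journeys
    select *,
           sum(case when visit_order = 1 then 1 else 0 end)
               over (partition by clientid order by ts) as grp
    from your_table
) t
group by clientid, grp;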
I tried to repro your scenario.
Instead of the original table, I gave that query a subquery which selects the original table along with one extra column, RowNumber: a row number assigned to every row, partitioned by the visit_order column and ordered by the visited column.
My Query:
select
    clientid,
    concat_ws(',', collect_list(channel)) as user_journey,
    sum(purchase) as conversion
from (
    select *, row_number() over (partition by visit_order order by visited) as RowNumber
    from docs
) as docstb
group by clientid, RowNumber
order by clientid asc
Execution and output: run against the sample data in the question, this returns the three desired journeys shown above.
So I've looked through a lot of SQL questions about subtraction and the like, but haven't found this exact use case.
I'm using a single table and trying to find an average response time between two people talking on my site. Here's the data sample:
id created_at conversation_id sender_id receiver_id
307165 2017-05-03 20:03:27 96557 24 1755
307166 2017-05-03 20:04:22 96557 1755 24
303130 2017-04-20 18:03:53 102458 2518 4475
302671 2017-04-18 20:11:20 102505 3100 1079
302670 2017-04-18 20:09:38 103014 3100 2676
350570 2017-09-18 20:59:56 103496 5453 929
290458 2017-02-16 13:38:47 103575 2841 2282
300001 2017-04-08 16:42:16 104159 2740 1689
304204 2017-04-24 17:31:25 104531 5963 1118
284873 2017-01-12 22:33:19 104712 3657 3967
284872 2017-01-12 22:31:38 104712 3967 3657
What I want is to find an average response time based on the conversation_id.
Hmmm . . . You can get the "response" for a given row by getting the next row between the two conversers. The rest is getting the average -- which is database dependent.
Something like this:
select avg(next_created_at - created_at) -- exact syntax depends on the database
from (select m.*,
(select min(m2.created_at)
from messages m2
where m2.sender_id = m.receiver_id and m.sender_id = m2.receiver_id and
m2.conversation_id = m.conversation_id and
m2.created_at > m.created_at
) next_created_at
from messages m
) mm
where next_created_at is not null;
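For instance, in PostgreSQL the date arithmetic could be spelled like this (a sketch only; subtracting two timestamps yields an interval, and extract(epoch from ...) converts it to seconds):
-- PostgreSQL spelling of the same query
select avg(extract(epoch from (next_created_at - created_at))) as avg_response_seconds
from (select m.*,
             (select min(m2.created_at)
              from messages m2
              where m2.sender_id = m.receiver_id and m.sender_id = m2.receiver_id and
                    m2.conversation_id = m.conversation_id and
                    m2.created_at > m.created_at
             ) as next_created_at
      from messages m
     ) mm
where next_created_at is not null;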
A CTE will take care of bringing the conversation start and end into the same row.
Then use DATEDIFF to compute the response time, and average it.
This assumes there are only ever two entries per conversation (conversations with one entry, or more than two, are ignored).
WITH X AS (
SELECT conversation_id, MIN(created_at) AS convstart, MAX(created_at) AS convend
FROM theTable
GROUP BY conversation_id
HAVING COUNT(*) = 2
)
SELECT AVG(DATEDIFF(second,convstart,convend)) AS AvgResponse
FROM X
I have a simple SQL log table (named market_history) in SQLite for US markets; it looks something like this:
Sample table (market_history)
id datetime market percent
1 9/5/2014 7:50 ARIZONA 50.0
2 9/5/2014 7:50 ATLANTA 97.4
3 9/5/2014 7:50 AUSTIN 78.8
4 9/5/2014 7:50 BOSTON 90.9
6 9/5/2014 7:50 CHARLOTTE 100.0
7 9/5/2014 7:50 CHICAGO 90.3
This table is an hourly snapshot of network capacity in various systems in each market. What I would like to do is set up an alert system: if any one market is below a threshold percent (say 50) for more than 2 consecutive hours (each row is recorded every hour), it triggers an alert email. So the query should show me a unique list of market names where percent is < 50.0 for more than the last 2 consecutive entries.
Here's the SQL I'm trying, but it's not working:
Sample SQL (not working):
SELECT
    mh.datetime, mh.market, mh.percent
FROM market_history mh
WHERE
    (SELECT mh1.percent FROM market_history mh1
     WHERE mh1.datetime BETWEEN "2015-03-23 00:00:00" AND "2015-03-23 00:59:59"
       AND mh.market = mh1.market) < 50
AND (SELECT mh2.percent FROM market_history mh2
     WHERE mh2.datetime BETWEEN "2015-03-23 01:00:00" AND "2015-03-23 01:59:59"
       AND mh.market = mh2.market) < 50
ORDER BY mh.datetime
I know I'm missing something... any suggestions?
If the time windows are fixed and reliable, just make sure the largest percent in the window is below the threshold. It wouldn't really matter how far back you look, either, if you needed to extend this to more than two hours.
select market
from market_history mh
where mh.datetime between <last_two_hours> and <now>
group by mh.market
having max(percent) < 50.0
-- and count(*) = 2 /* if you need to be sure of two ... */
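Since the question is SQLite, the window placeholders could be filled in with datetime arithmetic. A sketch, assuming datetime is stored in an ISO format that SQLite compares correctly (the sample shows 9/5/2014 7:50, which would need normalizing first):
select market
from market_history mh
-- datetime('now', '-2 hours') gives the start of the two-hour window
where mh.datetime between datetime('now', '-2 hours') and datetime('now')
group by mh.market
having max(percent) < 50.0;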
Here is an approach that should work in SQLite. Find the last good id (if any) in each market, then count the number of rows larger than that id.
select mh.market,
       sum(case when lastgood.maxid is null then 1
                when lastgood.maxid < mh.id then 1
                else 0
           end) as NumInRow
from market_history mh left join
     (select market, max(id) as maxid
      from market_history
      where percent >= 50.0
      group by market
     ) as lastgood
     on lastgood.market = mh.market
group by mh.market;
This query is a little bit complicated because it needs to take into account the possibility of there not being any good id. If that is the case, then all rows for the market count.
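To turn that count into the alert list the question asks for, one option is to move the sum into a HAVING clause. A sketch reusing the query above; the cutoff of 2 is an assumption, adjust it to match "more than 2 consecutive hours":
select mh.market
from market_history mh left join
     (select market, max(id) as maxid
      from market_history
      where percent >= 50.0
      group by market
     ) as lastgood
     on lastgood.market = mh.market
group by mh.market
having sum(case when lastgood.maxid is null then 1
                when lastgood.maxid < mh.id then 1
                else 0
           end) >= 2;  -- hypothetical cutoff: at least 2 bad readings since the last good one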
I'm working on a project for my university with Rails 3/PostgreSQL, where we have Users, Activities and Venues. A user has many activities, and a venue has many activities. An activity belongs to a user and to a venue, and therefore has a user_id and a venue_id.
What I need is a SQL query (or even a method from Rails itself?) to find mutual venues between several users. For example, I have 5 users who have visited different venues, and only 2 venues were visited by all 5 users. So I want to retrieve those 2 venues.
I've started by retrieving all activities from the 5 users:
SELECT a.user_id as user, a.venue_id as venue
FROM activities AS a
WHERE a.user_id=116 OR a.user_id=227 OR a.user_id=229 OR a.user_id=613 OR a.user_id=879
But now I need a way to find out the mutual venues.
Any idea?
thx,
tux
I'm not entirely familiar with SQL syntax for PostgreSQL, but try this:
select venue_id, COUNT(distinct user_id) from activities
Where user_id in (116,227,229,613,879)
group by venue_id
having COUNT(distinct user_id) = 5
EDIT:
You will need to change the '5' to however many users you care about (how many you are looking for).
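If you'd rather not keep the id list and the count in sync by hand, one PostgreSQL-specific option (an assumption on my part, not tested against your schema) is to state the ids once as an array and let cardinality() supply the count:
-- the user ids appear exactly once, as an array; cardinality() counts them
with wanted(ids) as (
    select array[116, 227, 229, 613, 879]
)
select a.venue_id
from activities a, wanted w
where a.user_id = any(w.ids)
group by a.venue_id, w.ids
having count(distinct a.user_id) = cardinality(w.ids);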
I tested this on a table structure like so:
user_id venue_id id
----------- ----------- -----------
1 1 1
2 6 2
3 3 3
4 4 4
5 5 5
1 2 6
2 2 7
3 2 8
4 2 9
5 2 10
The output was:
venue_id    count
----------- -----------
2           5
You would have to come up with some parameters for your search. For example, 5 users may have 2 venues in common, but not 3.
If you want to see what Venues these five users have in common, you can start by doing this:
SELECT a.venue_id, count(1) as NoOfUsers
FROM activities AS a
WHERE a.user_id=116 OR a.user_id=227 OR a.user_id=229 OR a.user_id=613 OR a.user_id=879
group by a.venue_id
That would show you, for each of those venues, how many of the users visited it. So you have degrees of "venue sharing".
But if you want to see ONLY the venues that were visited by all five users, you'd add a line at the end:
SELECT a.venue_id, count(1) as NoOfUsers
FROM activities AS a
WHERE a.user_id=116 OR a.user_id=227 OR a.user_id=229 OR a.user_id=613 OR a.user_id=879
group by a.venue_id
having count(1) = 5 --the number of users in the query
You should also consider changing your WHERE statement from
WHERE a.user_id=116 OR a.user_id=227 OR a.user_id=229 OR a.user_id=613 OR a.user_id=879
to
WHERE a.user_id in (116, 227, 229, 613, 879)
In SQL it would be something like:
select distinct v.venue_id
from venues v
join activities a on a.venue_id = v.venue_id
join users u on u.user_id = a.user_id
where a.user_id in (116,227,229,613,879)
You need to join up your tables to get all the venues that have had activities that have had users. When you are just learning, it is sometimes simpler to visualize it if you use subqueries. At least that's what I found for me.
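A sketch of that subquery style for this problem (each IN subquery keeps only the venues the given user also visited, so only venues common to all five survive):
select distinct a.venue_id
from activities a
where a.user_id = 116
  and a.venue_id in (select venue_id from activities where user_id = 227)
  and a.venue_id in (select venue_id from activities where user_id = 229)
  and a.venue_id in (select venue_id from activities where user_id = 613)
  and a.venue_id in (select venue_id from activities where user_id = 879);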