How do I query previous rows? - sql

I have a page audit table that records which pages a user has accessed. Given a specific page, I need to find which page each user visited immediately before it, and which of those previous pages is the most common.
For example, the FAQ Page_ID is 3. I want to know if it is more frequently accessed from the First Access page (ID 1) or Home page (ID 5).
Example:
Page Audit Table (SQL Server)
ID | Page_ID | User_ID
1 | 1 | 6
2 | 3 | 6
3 | 5 | 4
4 | 3 | 4
5 | 1 | 7
6 | 3 | 7
7 | 1 | 5
8 | 3 | 2 --Note: previous user is not 2
9 | 3 | 5 --Note: previous page for user 5 is 1 and not 3
Looking for Page_ID = 3, I want to retrieve:
Previous Page | Count
1 | 3
5 | 1
Note: I've looked at some similar questions here (like that one), but they didn't help me solve this problem.

You can use window functions as one way to figure this out:
with UserPage as (
select
User_ID,
Page_ID,
row_number() over (partition by User_ID order by ID) as rn
from
PageAudit
)
select
p1.Page_ID,
count(*)
from
UserPage p1
inner join
UserPage p2
on p1.User_ID = p2.User_ID and
p1.rn + 1 = p2.rn
where
p2.Page_ID = 3
group by
p1.Page_ID;
SQLFiddle Demo
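To sanity-check the row_number() self-join against the sample data, here is a small sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. It assumes SQLite 3.25 or later (needed for window functions) rather than SQL Server, and the `?` parameter for the target page is an illustrative choice.

```python
import sqlite3

# The question's sample rows: (ID, Page_ID, User_ID).
ROWS = [(1, 1, 6), (2, 3, 6), (3, 5, 4), (4, 3, 4), (5, 1, 7),
        (6, 3, 7), (7, 1, 5), (8, 3, 2), (9, 3, 5)]

# The row_number() self-join from the answer, in SQLite syntax.
QUERY = """
WITH UserPage AS (
    SELECT User_ID, Page_ID,
           row_number() OVER (PARTITION BY User_ID ORDER BY ID) AS rn
    FROM PageAudit
)
SELECT p1.Page_ID AS previous_page, count(*) AS cnt
FROM UserPage p1
JOIN UserPage p2
  ON p1.User_ID = p2.User_ID AND p1.rn + 1 = p2.rn   -- p1 is the visit right before p2
WHERE p2.Page_ID = ?
GROUP BY p1.Page_ID
ORDER BY cnt DESC
"""


def previous_page_counts(target_page):
    """Return {previous_page: count} for visits to target_page."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE PageAudit (ID INTEGER, Page_ID INTEGER, User_ID INTEGER)")
    con.executemany("INSERT INTO PageAudit VALUES (?, ?, ?)", ROWS)
    result = dict(con.execute(QUERY, (target_page,)).fetchall())
    con.close()
    return result


print(previous_page_counts(3))  # {1: 3, 5: 1}
```

User 2's only visit to page 3 has no earlier row for that user, so the join simply finds no partner for it.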
If you have SQL Server 2012 or later, the answers using LAG will likely be more efficient, since they avoid the self-join. This one also works on SQL Server 2008.
For reference, since I think one of the LAG solutions is overcomplicated and another is wrong, here is a LAG version:
with prev as (
select
page_id,
lag(page_id,1) over (partition by user_id order by id) as prev_page
from
PageAudit
)
select
prev_page,
count(*)
from
prev
where
page_id = 3 and
prev_page is not null -- exclude people who landed on page 3 without a previous page
group by
prev_page
SQLFiddle Example of Lag
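For intuition, the pairing that LAG(page_id, 1) OVER (PARTITION BY user_id ORDER BY id) performs can be mirrored in plain Python: group each user's visits in ID order, then pair every visit with the one before it. This is an illustrative sketch on the question's sample rows, not a replacement for doing the work in the database.

```python
from collections import Counter, defaultdict

# The question's sample rows: (ID, Page_ID, User_ID).
ROWS = [(1, 1, 6), (2, 3, 6), (3, 5, 4), (4, 3, 4), (5, 1, 7),
        (6, 3, 7), (7, 1, 5), (8, 3, 2), (9, 3, 5)]


def previous_page_counts(rows, target_page):
    """Count which page each user visited right before target_page."""
    # PARTITION BY user_id ORDER BY id: collect each user's pages in ID order.
    pages_by_user = defaultdict(list)
    for _id, page_id, user_id in sorted(rows):
        pages_by_user[user_id].append(page_id)

    counts = Counter()
    for pages in pages_by_user.values():
        # zip(pages, pages[1:]) pairs every visit with the one before it,
        # which is what LAG(page_id, 1) returns; a user's first visit has no
        # predecessor, matching the rows where prev_page IS NULL.
        for prev_page, page in zip(pages, pages[1:]):
            if page == target_page:
                counts[prev_page] += 1
    return dict(counts)


print(previous_page_counts(ROWS, 3))  # {1: 3, 5: 1}
```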

select prev_page, count(*)
from (select id,
page_id,
user_id,
lag(page_id, 1) over(partition by user_id order by id) as prev_page
from page_audit_table) x
where page_id = 3
and prev_page <> page_id
group by prev_page
Fiddle:
http://sqlfiddle.com/#!6/c0037/23/0

You could use the LAG function (it is available only in SQL Server 2012 and later).
Test with this fiddle.
Query:
SELECT
previous_page, count(previous_page) as count
FROM
(SELECT
Page_id,
LAG(Page_ID, 1, NULL) OVER (PARTITION BY User_ID ORDER BY ID) as previous_page,
User_ID as current_usr,
LAG(User_ID, 1, NULL) OVER (PARTITION BY User_ID ORDER BY ID) as previous_usr
FROM
Page_Audit) p
WHERE
Page_ID = 3 AND current_usr = previous_usr
GROUP BY
previous_page
ORDER BY
count DESC

Related

Find the count of IDs that have the same value

I'd like to get a count of all the IDs that share the same value (Drops) with other IDs. For instance, in the illustration below, IDs 1 and 3 both have drops A, so the query should count them. Similarly, IDs 7 and 18 both have drops B, which adds two more IDs, for a total of 4 IDs that share a value; 4 is what my query should return.
+------+-------+
| ID | Drops |
+------+-------+
| 1 | A |
| 2 | C |
| 3 | A |
| 7 | B |
| 18 | B |
+------+-------+
I've tried several approaches; the following query was my last attempt.
With cte1 (Id1, D1) as
(
select Id, Drops
from Posts
),
cte2 (Id2, D2) as
(
select Id, Drops
from Posts
)
Select count(distinct c1.Id1) newcnt, c1.D1
from cte1 c1
left outer join cte2 c2 on c1.D1 = c2.D2
group by c1.D1
The final result should be a single value (4), but the records the query should be counting are these:
+------+-------+
| ID | Drops |
+------+-------+
| 1 | A |
| 3 | A |
| 7 | B |
| 18 | B |
+------+-------+
Any advice would be great. Thanks
You can use a CTE to generate a list of Drops values that have more than one corresponding ID value, and then JOIN that to Posts to find all rows which have a Drops value that has more than one Post:
WITH CTE AS (
SELECT Drops
FROM Posts
GROUP BY Drops
HAVING COUNT(*) > 1
)
SELECT P.*
FROM Posts P
JOIN CTE ON P.Drops = CTE.Drops
Output:
ID Drops
1 A
3 A
7 B
18 B
If desired you can then count those posts in total (or grouped by Drops value):
WITH CTE AS (
SELECT Drops
FROM Posts
GROUP BY Drops
HAVING COUNT(*) > 1
)
SELECT COUNT(*) AS newcnt
FROM Posts P
JOIN CTE ON P.Drops = CTE.Drops
Output
newcnt
4
Demo on SQLFiddle
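A quick way to check the GROUP BY ... HAVING COUNT(*) > 1 approach is to run it on the question's rows in SQLite via Python's sqlite3. This sketch assumes the table is named Posts with columns Id and Drops, as in the question's attempt; the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# The question's rows: (Id, Drops).
ROWS = [(1, "A"), (2, "C"), (3, "A"), (7, "B"), (18, "B")]

QUERY = """
WITH CTE AS (
    SELECT Drops
    FROM Posts
    GROUP BY Drops
    HAVING COUNT(*) > 1          -- Drops values used by more than one row
)
SELECT P.Id, P.Drops
FROM Posts P
JOIN CTE ON P.Drops = CTE.Drops
ORDER BY P.Id
"""


def shared_drops(rows):
    """Return the (Id, Drops) rows whose Drops value appears more than once."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Posts (Id INTEGER, Drops TEXT)")
    con.executemany("INSERT INTO Posts VALUES (?, ?)", rows)
    matching = con.execute(QUERY).fetchall()
    con.close()
    return matching


matching = shared_drops(ROWS)
print(matching)       # [(1, 'A'), (3, 'A'), (7, 'B'), (18, 'B')]
print(len(matching))  # 4
```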
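You can use dense_rank() to solve this. Within each drops partition, dense_rank() ordered by id gives each distinct ID its own rank, so count(distinct rnk) is the number of distinct IDs per drops value, and the HAVING clause keeps only the values shared by more than one ID.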
Here is the demo.
with cte as
(
select
drops,
count(distinct rnk) as newCnt
from
( select
*,
dense_rank() over (partition by drops order by id) as rnk
from myTable
) t
group by
drops
having count(distinct rnk) > 1
)
select
sum(newCnt) as newCnt
from cte
Output:
|newcnt |
|------ |
| 4 |
First count the IDs for each drops value, then sum the counts that are greater than 1.
select sum(countdrops) as total from
(select drops , count(id) as countdrops from yourtable group by drops) as temp
where countdrops > 1;
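The same two-step logic (count per drops value, then sum the counts above 1) can be expressed with collections.Counter. This is a minimal sketch on the question's rows, with the (id, drops) tuple layout as an assumption. Like count(id), it counts rows, so a repeated (id, drops) pair would be counted twice.

```python
from collections import Counter

# The question's rows: (id, drops).
ROWS = [(1, "A"), (2, "C"), (3, "A"), (7, "B"), (18, "B")]


def count_shared_ids(rows):
    # GROUP BY drops with count(id): number of rows per drops value.
    per_drop = Counter(drops for _id, drops in rows)
    # WHERE countdrops > 1, then SUM(countdrops).
    return sum(n for n in per_drop.values() if n > 1)


print(count_shared_ids(ROWS))  # 4
```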

sql aggregate data

This is not a DBMS-specific question, but a generic SQL problem.
I have this dataset:
userid | objecteid| count
--------------------------
1 | 1 | 12
1 | 2 | 15
1 | 3 | 6
2 | 4 | 30
2 | 1 | 1
2 | 5 | 9
With one query, I need to find, for each user, the object with the maximum count.
I'm looking for a result like this:
userid | objecteid| count
--------------------------
1 | 2 | 15
2 | 4 | 30
because the object 2 has the max count for user 1 and the object 4 has the max count for user 2
This can easily be solved using window functions.
The following is standard ANSI SQL:
select userid, objecteid, "count"
from (
select userid, objecteid, "count",
max("count") over (partition by userid) as max_cnt
from the_table
) t
where "count" = max_cnt;
If there are two objects with the same count, both will be returned.
Alternatively, this can be done using row_number():
select userid, objecteid, "count"
from (
select userid, objecteid, "count",
row_number() over (partition by userid order by "count" desc) as rn
from the_table
) t
where rn = 1;
Unlike the first query, this will pick only one row if a user has more than one object with the same maximum count. If you want those duplicates returned, use dense_rank() instead of row_number().
SQLFiddle: http://sqlfiddle.com/#!15/f02a9/1
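To see the tie behaviour described above, the sketch below runs the max() OVER variant and the ranking variants in SQLite (3.25+ assumed) through Python's sqlite3. The extra row (1, 6, 15) is hypothetical and exists only to create a tie for user 1.

```python
import sqlite3

# The question's rows: (userid, objecteid, count).
ROWS = [(1, 1, 12), (1, 2, 15), (1, 3, 6), (2, 4, 30), (2, 1, 1), (2, 5, 9)]
TIED = ROWS + [(1, 6, 15)]  # hypothetical: user 1 now has two objects with count 15

MAX_OVER = """
SELECT userid, objecteid, "count"
FROM (SELECT userid, objecteid, "count",
             max("count") OVER (PARTITION BY userid) AS max_cnt
      FROM the_table) t
WHERE "count" = max_cnt
ORDER BY userid, objecteid
"""


def ranked(rank_fn):
    """Build the rn = 1 query for row_number or dense_rank."""
    return f"""
SELECT userid, objecteid, "count"
FROM (SELECT userid, objecteid, "count",
             {rank_fn}() OVER (PARTITION BY userid ORDER BY "count" DESC) AS rn
      FROM the_table) t
WHERE rn = 1
ORDER BY userid, objecteid
"""


def run(query, rows):
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE the_table (userid INTEGER, objecteid INTEGER, "count" INTEGER)')
    con.executemany("INSERT INTO the_table VALUES (?, ?, ?)", rows)
    result = con.execute(query).fetchall()
    con.close()
    return result


print(run(MAX_OVER, ROWS))                   # [(1, 2, 15), (2, 4, 30)]
print(run(MAX_OVER, TIED))                   # [(1, 2, 15), (1, 6, 15), (2, 4, 30)]
print(len(run(ranked("row_number"), TIED)))  # 2: one (arbitrary) row per user
print(run(ranked("dense_rank"), TIED))       # [(1, 2, 15), (1, 6, 15), (2, 4, 30)]
```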
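Try this (a correlated subquery, so each row is compared only with its own user's maximum):
Select * from tableName t1
where t1.count = (
Select Max(t2.count)
from tableName t2
where t2.userid = t1.userid
)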
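Note that an uncorrelated `count IN (SELECT max(count) ... GROUP BY userid)` compares each row with every user's maximum, so a row can match another user's maximum. The sketch below uses hypothetical rows where user 1's non-maximum count equals user 2's maximum, and contrasts that form with a correlated subquery, run in SQLite through Python's sqlite3.

```python
import sqlite3

# Hypothetical rows: user 1's non-maximum count (30) equals user 2's maximum.
ROWS = [(1, 1, 30), (1, 2, 40), (2, 3, 30), (2, 4, 5)]

UNCORRELATED = """
SELECT * FROM tableName
WHERE "count" IN (SELECT max("count") FROM tableName GROUP BY userid)
ORDER BY userid, objecteid
"""

CORRELATED = """
SELECT * FROM tableName t1
WHERE t1."count" = (SELECT max(t2."count") FROM tableName t2
                    WHERE t2.userid = t1.userid)   -- same user only
ORDER BY userid, objecteid
"""


def run(query):
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE tableName (userid INTEGER, objecteid INTEGER, "count" INTEGER)')
    con.executemany("INSERT INTO tableName VALUES (?, ?, ?)", ROWS)
    result = con.execute(query).fetchall()
    con.close()
    return result


print(run(UNCORRELATED))  # [(1, 1, 30), (1, 2, 40), (2, 3, 30)]  <- (1, 1, 30) is wrong
print(run(CORRELATED))    # [(1, 2, 40), (2, 3, 30)]
```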

Group by groups of 3 by column

Say I have a table that looks like this:
| id | category_id | created_at |
| 1 | 3 | date... |
| 2 | 4 | date... |
| 3 | 1 | date... |
| 4 | 2 | date... |
| 5 | 5 | date... |
| 6 | 6 | date... |
And imagine there are a lot more entries. I'd like to fetch the freshest entries first, so ordering by created_at DESC, but I'd also like to group them by category, in groups of 3!
So in pseudocode it looks something like this:
Go to category 1
-> Pick last 3
Go to category 2
-> Pick last 3
Go to category 3
-> Pick last 3
And so forth, starting over from category_id 1 when there's no other category to grab from. This will also be paginated, so I need to make it work with OFFSET and LIMIT somehow.
I'm not at all sure where to start or what the keywords to google for are. I'd be happy with some nudges in the right direction so I can find the answer myself, or with a full answer.
Another case for the window function row_number().
Just the latest 3 rows per category
SELECT id, category_id, created_at
FROM (
SELECT id, category_id, created_at
, row_number() OVER (PARTITION BY category_id
ORDER BY created_at DESC) AS rn
FROM tbl
) sub
WHERE rn < 4
ORDER BY category_id, rn;
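Here is a minimal sketch of the per-category query, run in SQLite 3.25+ through Python's sqlite3 with hypothetical rows (ISO-8601 text dates). Appending LIMIT ? OFFSET ? after the ORDER BY is one way to page through the result, as the question asks; that part is an assumption about how you want to paginate.

```python
import sqlite3

# Hypothetical rows: (id, category_id, created_at).
ROWS = [
    (1, 1, "2024-01-01"), (2, 1, "2024-01-02"), (3, 1, "2024-01-03"),
    (4, 1, "2024-01-04"), (5, 2, "2024-01-01"), (6, 2, "2024-01-05"),
    (7, 2, "2024-01-06"),
]

TOP3 = """
SELECT id, category_id, created_at
FROM (SELECT id, category_id, created_at,
             row_number() OVER (PARTITION BY category_id
                                ORDER BY created_at DESC) AS rn
      FROM tbl) sub
WHERE rn < 4                      -- newest 3 rows per category
ORDER BY category_id, rn
LIMIT ? OFFSET ?
"""


def latest_three(limit, offset):
    """Return the ids of one page of the newest-3-per-category result."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (id INTEGER, category_id INTEGER, created_at TEXT)")
    con.executemany("INSERT INTO tbl VALUES (?, ?, ?)", ROWS)
    ids = [row[0] for row in con.execute(TOP3, (limit, offset))]
    con.close()
    return ids


print(latest_three(10, 0))  # [4, 3, 2, 7, 6, 5]
print(latest_three(2, 2))   # [2, 7]
```

Because the ORDER BY is fully determined by (category_id, rn), consecutive pages do not overlap or skip rows as long as the data does not change between requests.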
The latest 3 rows per category, plus later rows
If you want to append the rest of the rows (your question is unclear on whether and how):
SELECT *
FROM (
SELECT id, category_id, created_at
, row_number() OVER (PARTITION BY category_id
ORDER BY created_at DESC) AS rn
FROM tbl
) sub
ORDER BY (rn > 3), category_id, rn;
One can sort by the outcome of a boolean expression (rn > 3), which sorts in this order:
FALSE (0)
TRUE (1)
NULL (last, because the default is NULLS LAST; not applicable here)
This way, the latest 3 rows per category come first and all the rest later.
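The sort key used by ORDER BY (rn > 3), category_id, rn can be mimicked in Python, where False also sorts before True. This sketch uses hypothetical rows and only illustrates the ordering; it is not how you would do it against a real table.

```python
from collections import defaultdict

# Hypothetical rows: (id, category_id, created_at) with ISO-8601 text dates.
ROWS = [
    (1, 1, "2024-01-01"), (2, 1, "2024-01-02"), (3, 1, "2024-01-03"),
    (4, 1, "2024-01-04"), (5, 2, "2024-01-01"), (6, 2, "2024-01-05"),
    (7, 2, "2024-01-06"),
]


def latest_three_first(rows):
    """Order ids like ORDER BY (rn > 3), category_id, rn."""
    by_category = defaultdict(list)
    for row in rows:
        by_category[row[1]].append(row)

    numbered = []  # (rn, row) pairs
    for category_rows in by_category.values():
        # row_number() OVER (PARTITION BY category_id ORDER BY created_at DESC)
        category_rows.sort(key=lambda r: r[2], reverse=True)
        numbered.extend(enumerate(category_rows, start=1))

    # False < True in Python, just as FALSE sorts before TRUE in the SQL version.
    numbered.sort(key=lambda item: (item[0] > 3, item[1][1], item[0]))
    return [row[0] for _rn, row in numbered]


print(latest_three_first(ROWS))  # [4, 3, 2, 7, 6, 5, 1]
```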
Or use a CTE and UNION ALL:
WITH cte AS (
SELECT id, category_id, created_at
, row_number() OVER (PARTITION BY category_id
ORDER BY created_at DESC) AS rn
FROM tbl
)
(
SELECT id, category_id, created_at
FROM cte
WHERE rn < 4
ORDER BY category_id, rn
)
UNION ALL
(
SELECT id, category_id, created_at
FROM cte
WHERE rn >= 4
ORDER BY category_id, rn
);
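Same result (in practice; strictly speaking, without an outer ORDER BY the SQL standard does not guarantee the order in which UNION ALL returns its rows).
All parentheses are required to attach ORDER BY to the individual legs of a UNION query.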

Count distinct rows via a pair of known values

I wasn't even sure how to phrase this question, so I'll give example content and the wanted output; I'm looking for a query that produces it.
Let's say I have table called "flagged" with this content:
content_id | user_id
1 | 1
1 | 2
1 | 3
2 | 1
2 | 3
2 | 4
3 | 2
3 | 3
4 | 1
4 | 2
5 | 1
6 | 1
6 | 4
I also have an asymmetric relationship between content_ids:
master_content_id | slave_content_id
1 | 2
3 | 4
5 | 6
For each "master" content_id (1, 3 and 5), I want to count how many distinct users have flagged either the master or the slave content, counting a user who flagged both only once. In the example above, content_id=1 is therefore flagged by user_id=1 (via content_id=1 and content_id=2), user_id=2 (via content_id=1), user_id=3 (via content_id=1 and content_id=2), and user_id=4 (via content_id=2!)
An example of the output of the query I want to make is:
content_id | user_count
1 | 4 # users 1, 2, 3, 4
3 | 3 # users 1, 2, 3
5 | 2 # users 1, 4
I can't assume that related content_ids are always consecutive odd/even pairs (e.g. 66 can be the master of slave 58).
I am using MySQL and don't mind using its SQL extensions (but I'd rather the query be ANSI SQL, or at least portable to most databases).
The query below worked for me.
I'm using a sub-query with a UNION ALL to treat your mapped contents equal to the direct contents.
SELECT master_content_id AS content_id,
COUNT(DISTINCT user_id) AS user_count
FROM (
SELECT master_content_id, slave_content_id
FROM relationship
UNION ALL
SELECT master_content_id, master_content_id
FROM relationship
) r
JOIN flagged f ON ( f.content_id = r.slave_content_id )
GROUP BY master_content_id
Result:
content_id user_count
1 4
3 3
5 2
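To check the UNION ALL query against the question's data, this sketch loads both tables into an in-memory SQLite database via Python's sqlite3. The table names flagged and relationship follow the answer; the ORDER BY was added only to make the output deterministic.

```python
import sqlite3

# The question's data: flagged (content_id, user_id) and
# relationship (master_content_id, slave_content_id).
FLAGGED = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 3), (2, 4), (3, 2),
           (3, 3), (4, 1), (4, 2), (5, 1), (6, 1), (6, 4)]
RELATIONSHIP = [(1, 2), (3, 4), (5, 6)]

QUERY = """
SELECT master_content_id AS content_id,
       COUNT(DISTINCT user_id) AS user_count
FROM (SELECT master_content_id, slave_content_id FROM relationship
      UNION ALL
      -- map each master to itself so its own flags are counted too
      SELECT master_content_id, master_content_id FROM relationship) r
JOIN flagged f ON f.content_id = r.slave_content_id
GROUP BY master_content_id
ORDER BY master_content_id
"""


def user_counts():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE flagged (content_id INTEGER, user_id INTEGER)")
    con.execute("CREATE TABLE relationship (master_content_id INTEGER, slave_content_id INTEGER)")
    con.executemany("INSERT INTO flagged VALUES (?, ?)", FLAGGED)
    con.executemany("INSERT INTO relationship VALUES (?, ?)", RELATIONSHIP)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(user_counts())  # [(1, 4), (3, 3), (5, 2)]
```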
I think something like this will work for you (GROUP_CONCAT is MySQL-specific, but similar concatenation can be achieved in other RDBMSs):
SELECT COALESCE(Master_Content_ID, Content_ID) AS Content_ID,
COUNT(DISTINCT User_ID) AS Users,
CONCAT('#Users ', GROUP_CONCAT(DISTINCT User_ID ORDER BY User_ID)) AS UserList
FROM Flagged
LEFT JOIN MasterContent
ON Content_ID = Slave_Content_ID
GROUP BY COALESCE(Master_Content_ID, Content_ID)
Sample SQL Fiddle here: http://www.sqlfiddle.com/#!2/d09be/2
Output:
CONTENT_ID USERS USERLIST
1 4 #Users 1,2,3,4
3 3 #Users 1,2,3
5 2 #Users 1,4
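The COALESCE(Master_Content_ID, Content_ID) idea amounts to mapping every slave to its master and every other content to itself, then collecting distinct users per group. A plain-Python sketch of that mapping on the question's data (function and variable names are illustrative):

```python
from collections import defaultdict

# The question's data: flagged (content_id, user_id) and
# relationship (master_content_id, slave_content_id).
FLAGGED = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 3), (2, 4), (3, 2),
           (3, 3), (4, 1), (4, 2), (5, 1), (6, 1), (6, 4)]
RELATIONSHIP = [(1, 2), (3, 4), (5, 6)]


def users_per_master(flagged, relationship):
    """Map each master content_id to the sorted distinct users who flagged it or its slave."""
    # Slaves map to their master; anything else maps to itself, like
    # COALESCE(Master_Content_ID, Content_ID) after the LEFT JOIN.
    master_of = {slave: master for master, slave in relationship}
    users = defaultdict(set)  # a set keeps each user once, like COUNT(DISTINCT ...)
    for content_id, user_id in flagged:
        users[master_of.get(content_id, content_id)].add(user_id)
    return {cid: sorted(u) for cid, u in sorted(users.items())}


result = users_per_master(FLAGGED, RELATIONSHIP)
print(result)                                      # {1: [1, 2, 3, 4], 3: [1, 2, 3], 5: [1, 4]}
print({cid: len(u) for cid, u in result.items()})  # {1: 4, 3: 3, 5: 2}
```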
From the samples given, does this do the job (I don't have MySQL available to test)?
SELECT
ms.master_content_id,
(SELECT COUNT(DISTINCT f.user_id) FROM flagged f WHERE
f.content_id = ms.slave_content_id OR
f.content_id = ms.master_content_id)
FROM
master_slave ms
It would be better not to have the DISTINCT, but I can't see a way around it.
SELECT master_content_id AS content_id
, COUNT(*) AS user_count
, GROUP_CONCAT(user_id) AS flagging_users
FROM
( SELECT r.master_content_id
, f.user_id
FROM relationship AS r
JOIN flagged AS f
ON f.content_id = r.master_content_id
UNION
SELECT r.master_content_id
, f.user_id
FROM relationship AS r
JOIN flagged AS f
ON f.content_id = r.slave_content_id
) AS un
GROUP BY master_content_id

Find previous/next rows with order by when querying for a specific id

I have a table like this (simplified to the extreme to make it clearer):
create table mytable (
id integer not null,
owner text not null,
order_field_1 integer not null,
order_field_2 integer not null
)
I'm trying to get the next and previous elements' ids every time I get a row from the database, to allow navigation. The rows are not ordered by id, but by ORDER BY order_field_1 DESC, order_field_2 DESC.
When getting the latest entries for an owner, I have no problem finding what I want using a window and lead/lag:
SELECT
id,
owner,
lag(id) over w AS previous_id,
lead(id) over w AS next_id
FROM
mytable
WHERE
owner = 'someuser'
WINDOW w AS (
ORDER BY order_field_1 DESC,
order_field_2 DESC
)
ORDER BY
order_field_1 DESC,
order_field_2 DESC
LIMIT
5
This is written from memory but that's the gist of it, and it works perfectly.
My problem is when I want to get a specific row, using owner AND id, while still finding the previous and next ids. I can no longer use a window function, since the WHERE clause returns only one row, and my current solution, a subquery for each navigation id, does not perform well.
For example (I only show previous_id, since next_id is analogous):
SELECT
m1.id,
m1.owner,
(
SELECT
m2.id
FROM
mytable m2
WHERE
m2.owner = m1.owner
AND m2.id != m1.id
AND (
m2.order_field_1 < m1.order_field_1
OR (
m2.order_field_1 = m1.order_field_1
AND m2.order_field_2 <= m1.order_field_2
)
)
ORDER BY
m2.order_field_1 DESC,
m2.order_field_2 DESC
LIMIT
1
) AS previous_id
FROM
mytable m1
WHERE
owner = 'someuser'
AND id = 12345
So I'm selecting my row, then selecting the first row from the same user, with a different id, that has either a lower order_field_1, or the same order_field_1 and a lower (or equal) order_field_2.
This is not really efficient and I am getting poor performance. Does anyone have an idea how I could improve it?
Example dataset:
id | owner | order_field_1 | order_field_2
1 | someuser | 4 | 2
2 | someuser | 2 | 8
3 | someuser | 4 | 3
4 | someuser | 3 | 2
5 | someuser | 4 | 6
6 | someuser | 4 | 5
Ordered:
id | owner | order_field_1 | order_field_2
5 | someuser | 4 | 6
6 | someuser | 4 | 5
3 | someuser | 4 | 3
1 | someuser | 4 | 2
4 | someuser | 3 | 2
2 | someuser | 2 | 8
If I select owner = 'someuser' and id = 3, previous_id should be 1, next_id should be 6.
If I select owner = 'someuser' and id = 1, previous_id should be 4, next_id should be 3.
Thanks in advance for any help
With window functions and CTE
It is much cheaper to apply WHERE owner = 'someuser' inside the CTE already, so the window only has to process that owner's rows:
WITH t AS (
SELECT id
,owner
,lag(id) over w AS previous_id
,lead(id) over w AS next_id
FROM mytable
WHERE owner = 'someuser'
WINDOW w AS (ORDER BY order_field_1 DESC, order_field_2 DESC)
)
SELECT *
FROM t
WHERE id = 3
Also, as you only select a single row, there is no need for an ORDER BY in the final SELECT.
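Below is a sketch of the CTE approach in SQLite 3.28+ (needed for the named WINDOW clause), run through Python's sqlite3. The otheruser row is hypothetical and shows that filtering by owner inside the CTE keeps other owners out of the neighbours. Note that with this DESC window, lag() returns the row listed just above: for id = 3 it yields previous_id = 6 and next_id = 1, the reverse of the question's stated expectation, so swap lag and lead if you want that convention.

```python
import sqlite3

# The question's example rows, plus one hypothetical row for another owner
# that would land between ids 6 and 3 if the owner filter were missing.
ROWS = [(1, "someuser", 4, 2), (2, "someuser", 2, 8), (3, "someuser", 4, 3),
        (4, "someuser", 3, 2), (5, "someuser", 4, 6), (6, "someuser", 4, 5),
        (7, "otheruser", 4, 4)]

QUERY = """
WITH t AS (
    SELECT id, owner,
           lag(id) OVER w AS previous_id,
           lead(id) OVER w AS next_id
    FROM mytable
    WHERE owner = :owner          -- filter before the window is computed
    WINDOW w AS (ORDER BY order_field_1 DESC, order_field_2 DESC)
)
SELECT id, previous_id, next_id FROM t WHERE id = :id
"""


def neighbours(owner, row_id):
    """Return (id, previous_id, next_id) for one row of one owner."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (id INTEGER, owner TEXT, "
                "order_field_1 INTEGER, order_field_2 INTEGER)")
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", ROWS)
    row = con.execute(QUERY, {"owner": owner, "id": row_id}).fetchone()
    con.close()
    return row


print(neighbours("someuser", 3))  # (3, 6, 1)
print(neighbours("someuser", 1))  # (1, 3, 4)
print(neighbours("someuser", 5))  # (5, None, 6)
```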
__
Old school with subqueries
It's rather ugly, but it might be faster if there are a lot of rows per owner. You'll have to test ...
SELECT id
, owner
,(SELECT id
FROM tbl p
WHERE p.owner = t.owner -- same owner
AND p.id <> t.id -- different id
AND (p.order_field_1 < t.order_field_1 -- lower in the sort order
OR (p.order_field_1 = t.order_field_1
AND p.order_field_2 <= t.order_field_2))
ORDER BY order_field_1 DESC
, order_field_2 DESC
LIMIT 1) AS previous_id
,(SELECT id
FROM tbl n
WHERE n.owner = t.owner
AND n.id <> t.id
AND (n.order_field_1 > t.order_field_1 -- higher in the sort order
OR (n.order_field_1 = t.order_field_1
AND n.order_field_2 >= t.order_field_2))
ORDER BY order_field_1
, order_field_2
LIMIT 1) AS next_id
FROM tbl t
WHERE owner = 'someuser'
AND id = 3
This one works for older versions of PostgreSQL, too.
Proper indexes are the key to performance, of course.
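Why the neighbour comparison has to be lexicographic (order_field_1 first, order_field_2 only as a tie-breaker) is easy to see with Python tuples, which compare the same way as ORDER BY order_field_1, order_field_2. This sketch on the question's dataset contrasts it with a column-wise check that requires both fields to be lower or equal; the function name is illustrative.

```python
# The question's dataset: id -> (order_field_1, order_field_2).
DATA = {1: (4, 2), 2: (2, 8), 3: (4, 3), 4: (3, 2), 5: (4, 6), 6: (4, 5)}


def lower_neighbour(target, data, lexicographic=True):
    """Id of the row sorting just below `target` in ORDER BY f1 DESC, f2 DESC."""
    key = data[target]
    if lexicographic:
        # Tuple comparison: f1 < t.f1 OR (f1 = t.f1 AND f2 <= t.f2).
        candidates = [i for i, k in data.items() if i != target and k <= key]
    else:
        # Column-wise: f1 <= t.f1 AND f2 <= t.f2 (not the same ordering).
        candidates = [i for i, k in data.items()
                      if i != target and k[0] <= key[0] and k[1] <= key[1]]
    # ORDER BY f1 DESC, f2 DESC LIMIT 1 picks the largest remaining key.
    return max(candidates, key=lambda i: data[i], default=None)


print(lower_neighbour(3, DATA))                       # 1
print(lower_neighbour(4, DATA))                       # 2
print(lower_neighbour(4, DATA, lexicographic=False))  # None
```

With the column-wise check, id 4 (3, 2) finds no lower neighbour, because id 2 (2, 8) has a larger order_field_2 even though it sorts lower.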
How about finding the lag and lead values before applying the WHERE clause?
WITH T as (
SELECT
id,
owner,
lag(id) over w AS previous_id,
lead(id) over w AS next_id
FROM
mytable
WINDOW w AS (
PARTITION BY owner -- keep neighbours within the same owner
ORDER BY order_field_1 DESC,
order_field_2 DESC
)
)
SELECT * FROM T
WHERE
owner = 'someuser' AND id = 3
-- no outer ORDER BY: T has no order_field columns, and only one row is returned