SQL query to group message conversations

I am trying to select the most recent record between 2 users based on a to_number and a from_number and created date/time.
Once the record is found, display the message and time stamp. As long as either the to_number or from_number have the same pairing, then that is the message I want to display.
I'm really getting stuck on finding unique to/from OR from/to records with the same number combinations AND that haven't been listed before.
My data:
Messages table:
"id","to_number","from_number","message","created_at","dm_user_id"
"1","7325551212","5705551234","new update","2011-12-17T11:26:33-05:00","1"
"2","5705551234","3015551212","next update","2011-12-17T11:26:53-05:00","1"
"3","6095559876","4695551212","trying messages.","2011-12-19T19:20:47-05:00","2"
"4","5705551234","4155551212","did i get this?","2011-12-19T20:04:40-05:00","1"
"5","9075551212","5705551234","Where did this go?","2011-12-19T20:05:51-05:00","1"
"6","9075551212","5705551234","testing","2011-12-19T20:12:53-05:00","1"
"7","3015551212","5705551234","Are you here ","2011-12-19T20:13:34-05:00","1"
"8","6175554567","4695551212","test from app","2011-12-19T22:51:32-05:00","2"
From the above data, I only want the following records, listed newest to oldest.
NOTE: Not all records will be returned because there are duplicate to/from combinations. For example, id 2 and id 7 are messages between the same 2 numbers. Only the most recent will be returned, id 7.
Another example is id 5 and id 6: both are to/from the same numbers, so only the most recent, id 6, is returned.
for dm_user_id=1
"3015551212", "Are you here ", "2011-12-19T20:13:34-05:00" # id 7
"9075551212", "testing", "2011-12-19T20:12:53-05:00" # id 6
"4155551212", "did i get this?", "2011-12-19T20:04:40-05:00" # id 4
"7325551212", "new update", "2011-12-17T11:26:33-05:00" # id 1
for dm_user_id=2
"6175554567", "test from app", "2011-12-19T22:51:32-05:00" # id 8
"6095559876", "trying messages.", "2011-12-19T19:20:47-05:00" # id 3
I'm trying different combinations of GROUP BY and DISTINCT, but not getting the results I'm looking for.
select * from messages where dm_user_id = 1
group by to_number, from_number
select * from (
select DISTINCT to_number, from_number, dm_user_id
from messages) where dm_user_id = 1

With a dm_users table, you want this:
select
m.*
from
dm_users u1
cross join dm_users u2
inner join messages m on
u1.phone_number in (m.to_number, m.from_number)
and u2.phone_number in (m.to_number, m.from_number)
where
u1.dm_user_id = 1
and u2.dm_user_id = 2
order by
m.created_at desc

This is a common question: basically, you want the most recent message from each number, either TO or FROM, but you don't want duplicates. You may find something useful in the greatest-n-per-group category.

The working SQL below returns only the single most recent message for a given to/from, from/to number pair and sorts with the most recent first. I modified the SQL from this link.
SELECT
fullMessage.id,
fullMessage.to_number,
fullMessage.from_number,
fullMessage.message,
fullMessage.dm_user_id
FROM
messages fullMessage JOIN
(
SELECT max(id) as MAX_ID, to_number, from_number
FROM messages WHERE dm_user_id = 1 # this can be changed for any dm_user_id
GROUP BY from_number, to_number
) maxMessage ON maxMessage.MAX_ID = fullMessage.id
ORDER BY fullMessage.id desc;
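Note that GROUP BY from_number, to_number treats a to/from pair and its reversed from/to pair as two separate groups, so ids 2 and 7 from the sample data would both come back. One possible way to collapse both directions, offered as a sketch rather than as the poster's method, is to group on the ordered pair (smaller number first). The example below is an assumption-laden illustration: it runs on SQLite through Python's built-in sqlite3 module instead of the poster's database, loads the sample rows from the question, and the function name latest_per_pair is hypothetical. Like the query above, it assumes a higher id always means a later created_at.

```python
import sqlite3

# Sample rows copied from the question:
# (id, to_number, from_number, message, created_at, dm_user_id)
ROWS = [
    (1, "7325551212", "5705551234", "new update", "2011-12-17T11:26:33-05:00", 1),
    (2, "5705551234", "3015551212", "next update", "2011-12-17T11:26:53-05:00", 1),
    (3, "6095559876", "4695551212", "trying messages.", "2011-12-19T19:20:47-05:00", 2),
    (4, "5705551234", "4155551212", "did i get this?", "2011-12-19T20:04:40-05:00", 1),
    (5, "9075551212", "5705551234", "Where did this go?", "2011-12-19T20:05:51-05:00", 1),
    (6, "9075551212", "5705551234", "testing", "2011-12-19T20:12:53-05:00", 1),
    (7, "3015551212", "5705551234", "Are you here ", "2011-12-19T20:13:34-05:00", 1),
    (8, "6175554567", "4695551212", "test from app", "2011-12-19T22:51:32-05:00", 2),
]

# Group on the ordered pair so that (a, b) and (b, a) land in the same group.
# The CASE expressions are portable; MySQL's LEAST()/GREATEST() would do the same job.
QUERY = """
SELECT m.id, m.to_number, m.from_number, m.message, m.created_at
FROM messages AS m
JOIN (
    SELECT MAX(id) AS max_id
    FROM messages
    WHERE dm_user_id = ?
    GROUP BY
        CASE WHEN to_number < from_number THEN to_number ELSE from_number END,
        CASE WHEN to_number < from_number THEN from_number ELSE to_number END
) AS latest ON latest.max_id = m.id
ORDER BY m.id DESC
"""


def latest_per_pair(conn, dm_user_id):
    """Return the newest message per number pair for one dm_user_id (hypothetical helper)."""
    return conn.execute(QUERY, (dm_user_id,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, to_number TEXT, "
    "from_number TEXT, message TEXT, created_at TEXT, dm_user_id INTEGER)"
)
conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?, ?, ?)", ROWS)

for row in latest_per_pair(conn, 1):
    print(row)
```

With the sample data this yields ids 7, 6, 4, 1 for dm_user_id 1 and ids 8, 3 for dm_user_id 2, which matches the list the question asks for.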

Related

How do I stop my query from pulling duplicates?

Yes, I know this seems simple:
SELECT DISTINCT(...)
Except, it apparently isn't
Here is my actual Query:
SELECT
DeclinationReasons.Reason,
EmployeeInformation.ID,
EmployeeInformation.Employee,
EmployeeInformation.Active,
CompletedTrainings.DecShotDate,
CompletedTrainings.DecShotLocation,
CompletedTrainings.DecReason,
CompletedTrainings.DecExplanation,
IIf([DecShotLocation]="MCS","Yes","No") AS YesMCS,
IIf([DecReason]=1,1,0) AS YesAllergy,
IIf([DecReason]=2,1,0) AS YesImmune,
IIf([DecReason]=3,1,0) AS YesAdverse,
IIf([DecReason]=4,1,0) AS YesMedical,
IIf([DecReason]=5,1,0) AS YesSpiritual,
IIf([DecReason]=6,1,0) AS YesOther,
IIf([DecReason]=7,1,0) AS YesAlready
FROM
EmployeeInformation
INNER JOIN (CompletedTrainings
LEFT JOIN DeclinationReasons ON CompletedTrainings.DecReason = DeclinationReasons.ReasonID)
ON EmployeeInformation.ID = CompletedTrainings.Employee
GROUP BY
DeclinationReasons.Reason,
EmployeeInformation.ID,
EmployeeInformation.Employee,
EmployeeInformation.Active,
CompletedTrainings.DecShotDate,
CompletedTrainings.DecShotLocation,
CompletedTrainings.DecReason,
CompletedTrainings.DecExplanation,
IIf([DecShotLocation]="MCS","Yes","No"),
IIf([DecReason]=1,1,0),
IIf([DecReason]=2,1,0),
IIf([DecReason]=3,1,0),
IIf([DecReason]=4,1,0),
IIf([DecReason]=5,1,0),
IIf([DecReason]=6,1,0),
IIf([DecReason]=7,1,0)
HAVING
((((EmployeeInformation.Active) Like -1)
AND ((CompletedTrainings.DecShotDate + 365 >= DATE())
OR (CompletedTrainings.DecShotDate IS NULL))));
This joins a few tables (obviously) in order to get a number of records. The problem is that if someone is duplicated in the table with a NULL in one of the date fields and a date in another, the query pulls both the NULL and the date, or pulls multiple NULLs. It might also pull multiple dates, but none are present at the moment.
I need the NULLs; they are actual data in this particular case. But if someone has a date and a NULL, I need to pull only the newest record. I thought I could add MAX(RecordID) from the table, but that didn't change the results of the query either.
That code:
SELECT
DeclinationReasons.Reason,
EmployeeInformation.ID,
EmployeeInformation.Employee,
EmployeeInformation.Active,
MAX(CompletedTrainings.RecordID),
CompletedTrainings.DecShotDate
...
And it returned the same issue, Duplicated EmployeeInformation.ID with different DecShotDate values.
Currently it returns:
| ID | Active | DecShotDate | etc. x a bunch |
| 1 | -1 | date date | whatever goes |
| 2 | -1 | | in these |
| 2 | -1 | date date | columns |
These are being used in a report that determines the total number of employees who fit the criteria of the report. The NULLs in DecShotDate are needed, as they show people who did not refuse to get a flu vaccine in the current year, while the dates are people who did refuse.
Now I have come up with one simple solution: I could add a column to the CompletedTrainings table that contains a date or other value, and add that to the HAVING clause. This might be the right solution, as this is a yearly training questionnaire that employees have to fill out. But I am asking for advice before doing this.
Am I right in thinking I need to add a column to filter by so that older data isn't being pulled, or should I be able to do this by pulling recordID, and did I just bork that part of the query up?
Edited to add raw table views:
EmployeeInformation Table:
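| ID | Last | First | empID | Active | Termdate | DoH | Title | PT/FT/PD | PI |
| 1 | Doe | Jane | 982 | -1 | | date | Sr | PD | X |
| 2 | Roe | John | 278 | 0 | date | date | Jr | PD | X |
| 3 | Moe | Larry | 1232 | -1 | | date | Sr | FT | X |
| 4 | Zoe | Debbie | 1424 | -1 | | date | Sr | PT | X |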
DeclinationReasons Table:
| ReasonID | Reason |
| 1 | Allergy |
| 2 | Already got it |
| 3 | Illness |
CompletedTrainings Table:
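| RecordID | Employee | Training | ... | DecShotdate | DecShotLocation | DecShotReason | DecExp |
| 1 | 1 | 4 | | date | location | 2 | text |
| 2 | 1 | 4 | | | | | |
| 3 | 2 | 4 | | | | | |
| 4 | 3 | 4 | | date | location | 3 | text |
| 5 | 3 | 4 | | date | location | 1 | text |
| 6 | 4 | 4 | | | | | |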

After some serious soul searching, I decided to use another column and filter by that.
In the end my query looks like this:
SELECT *
FROM (
(
SELECT RecordID, DecShotDate, DecShotLocation, DecReason, DecExplanation, Employee,
IIf([DecShotLocation]="MCS","Yes","No") AS YesMCS, IIf([DecReason]=1,1,0) AS YesAllergy,
IIf([DecReason]=2,1,0) AS YesImmune, IIf([DecReason]=3,1,0) AS YesAdverse,
IIf([DecReason]=4,1,0) AS YesMedical, IIf([DecReason]=5,1,0) AS YesSpiritual,
IIf([DecReason]=6,1,0) AS YesOther, IIf([DecReason]=7,1,0) AS YesAlready
FROM CompletedTrainings WHERE (CompletedDate > DATE() - 365 ) AND (Training = 69)) AS T1
LEFT JOIN
(
SELECT ID, Active FROM EmployeeInformation) AS T2 ON T1.Employee = T2.ID)
LEFT JOIN
(
SELECT Reason, ReasonID FROM DeclinationReasons) AS T3 ON T1.DecReason = T3.ReasonID;
This may not have been the best solution, but it did exactly what I needed, which is to get the information by latest entry into the database.
Previously I had tried to use MAX(), DISTINCT(), etc., but always had the problem of multiple records being retrieved. In this case, I intentionally SELECT the most recent records first, then join them to the results of the next query, and so on, until I have all the required data for my report.
I write this in the hope that someone else finds it useful, or even better, that someone tells me why this is wrong so I can improve my own skills.
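For comparison, the MAX(RecordID) idea from the question can work when the maximum is computed per employee in its own subquery and then joined back, instead of being added to a GROUP BY that still lists DecShotDate (which keeps one group per distinct date). The following is only a sketch under stated assumptions: it assumes RecordID grows with each new entry, it runs on SQLite through Python's sqlite3 module rather than on Access, the table is cut down to four columns, the dates are placeholders invented for the demo, and the name latest_per_employee is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CompletedTrainings ("
    "RecordID INTEGER PRIMARY KEY, Employee INTEGER, Training INTEGER, DecShotDate TEXT)"
)
# Same shape as the sample table in the question; the dates are placeholder values.
conn.executemany(
    "INSERT INTO CompletedTrainings VALUES (?, ?, ?, ?)",
    [
        (1, 1, 4, "2020-10-01"),
        (2, 1, 4, None),
        (3, 2, 4, None),
        (4, 3, 4, "2020-10-02"),
        (5, 3, 4, "2020-10-03"),
        (6, 4, 4, None),
    ],
)

# The subquery finds the newest RecordID per employee; the join fetches that whole row,
# so a NULL DecShotDate survives when it belongs to the newest record.
QUERY = """
SELECT ct.Employee, ct.RecordID, ct.DecShotDate
FROM CompletedTrainings AS ct
JOIN (
    SELECT Employee, MAX(RecordID) AS MaxRecordID
    FROM CompletedTrainings
    GROUP BY Employee
) AS latest ON latest.MaxRecordID = ct.RecordID
ORDER BY ct.Employee
"""


def latest_per_employee(conn):
    """Return one (Employee, RecordID, DecShotDate) row per employee (hypothetical helper)."""
    return conn.execute(QUERY).fetchall()


for row in latest_per_employee(conn):
    print(row)
```

Each employee appears exactly once here, and employee 1 keeps the NULL from record 2 because that is the newest record.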

Specific Column is not showing up, when it should?

I don't know what's going on with my code here. I am trying to return the highest number of calls received by particular phone numbers, and find the top 7 numbers by calls received, but I am only getting the count column in my results. The code is:
SELECT COUNT (call_id) FROM call_test
GROUP BY receiver_id
ORDER BY COUNT(call_id) DESC
LIMIT 7;
But all it is returning is:
COUNT(call_id)
3
2
2
2
2
1
1
I think my code is right, but how do you show the particular numbers that correspond to the respective counts? This is SQLPro for MAC.

Is it as simple as including the number in the select?
SELECT receiver_id, COUNT(call_id)
FROM call_test
GROUP BY receiver_id
ORDER BY COUNT(call_id) DESC
LIMIT 7;

Determine records which held particular "state" on a given date

I have a state machine architecture, where a record will have many state transitions, the one with the greatest sort_key column being the current state. My problem is to determine which records held a particular state (or states) for a given date.
Example data:
items table
id
1
item_transitions table
id item_id created_at to_state sort_key
1 1 05/10 "state_a" 1
2 1 05/12 "state_b" 2
3 1 05/15 "state_a" 3
4 1 05/16 "state_b" 4
Problem:
Determine all records from items table which held state "state_a" on date 05/15. This should obviously return the item in the example data, but if you query with date "05/16", it should not.
I presume I'll be using a LEFT OUTER JOIN to join the item_transitions table to itself and narrow down the possibilities until I have something to query on that will give me the items that I need. Perhaps I am overlooking something much simpler.

Your question rephrased means "give me all items which were changed to state_a on 05/15 or before and had not changed to another state by that date". Please note that for the example I added 2001 as the year to get a valid date. If your "created_at" column is not a datetime, I strongly suggest changing it.
So first you can retrieve the last sort_key for all items before the threshold date:
SELECT item_id,max(sort_key) last_change_sort_key
FROM item_transitions it
WHERE created_at<='05/15/2001'
GROUP BY item_id
Next step is to join this result back to the item_transitions table to see which state the item was switched to at this specific sort_key:
SELECT *
FROM item_transitions it
JOIN (SELECT item_id,max(sort_key) last_change_sort_key
FROM item_transitions it
WHERE created_at<='05/15/2001'
GROUP BY item_id) tmp ON it.item_id=tmp.item_id AND it.sort_key=tmp.last_change_sort_key
Finally you only want those that switched to 'state_a', so just add a condition:
SELECT DISTINCT it.item_id
FROM item_transitions it
JOIN (SELECT item_id,max(sort_key) last_change_sort_key
FROM item_transitions it
WHERE created_at<='05/15/2001'
GROUP BY item_id) tmp ON it.item_id=tmp.item_id AND it.sort_key=tmp.last_change_sort_key
WHERE it.to_state='state_a'
You did not mention which DBMS you use, but I think this query should work with the most common ones.
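The steps above can be tried end to end with a small script. This is a minimal sketch, assuming SQLite via Python's sqlite3 module, created_at stored as ISO text (so that string comparison orders dates correctly), and 2001 as the year as in the answer; the helper name items_in_state is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item_transitions ("
    "id INTEGER PRIMARY KEY, item_id INTEGER, created_at TEXT, "
    "to_state TEXT, sort_key INTEGER)"
)
# Example data from the question, with 2001 assumed as the year.
conn.executemany(
    "INSERT INTO item_transitions VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "2001-05-10", "state_a", 1),
        (2, 1, "2001-05-12", "state_b", 2),
        (3, 1, "2001-05-15", "state_a", 3),
        (4, 1, "2001-05-16", "state_b", 4),
    ],
)

# Inner query: last transition per item on or before the date.
# Outer query: keep the items whose last transition went to the wanted state.
QUERY = """
SELECT DISTINCT it.item_id
FROM item_transitions AS it
JOIN (
    SELECT item_id, MAX(sort_key) AS last_change_sort_key
    FROM item_transitions
    WHERE created_at <= ?
    GROUP BY item_id
) AS tmp
  ON it.item_id = tmp.item_id AND it.sort_key = tmp.last_change_sort_key
WHERE it.to_state = ?
"""


def items_in_state(conn, state, on_date):
    """Return ids of items whose latest transition up to on_date is `state` (hypothetical helper)."""
    return [row[0] for row in conn.execute(QUERY, (on_date, state))]


print(items_in_state(conn, "state_a", "2001-05-15"))
print(items_in_state(conn, "state_a", "2001-05-16"))
```

The first call returns the example item and the second returns nothing, which is the behaviour the question asks for.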

combine results from different selects

I have one table that contains a field "ID", "mailSent" and "serviceUsed". "mailSent" contains the time when a mail was sent and "serviceUsed" contains a counter that just says if the user has used the service for the particular mail that I have sent.
I am trying to do a report that gives me back for each ID the following two facts:
1. The last time when a user has used the service, i.e., the time when for a particular user serviceUsed != 0
2. The total number of times a user has used the service, i.e., sum(serviceUsed) for each user
I would like to display this in one view and map the result always to the particular user. I can build each of the two queries separately but do not know how to combine it into one view. The two queries look as follows:
1. Select ID, max(mailSent) from Mails where serviceUsed > 0 group by ID
2. Select ID, sum(serviceUsed) from Mails group by ID
Notice that I cannot just combine them both because I also want to show the IDs that have never used my service, i.e., where serviceUsed = 0. Hence, if I just eliminate the where clause in my first query, then I will get wrong results for max(mailSent). Any idea how I can combine both?
In other words what I want is then something like this:
ID, max(mailSent), sum(serviceUsed)
where max(mailSent) is from the first query and sum(serviceUsed) from the second query.
Regards!

Try like this
SELECT * FROM
(
Select ID, max(mailSent) from Mails where serviceUsed > 0 group by ID
UNION ALL
Select ID, sum(serviceUsed) from Mails group by ID
) AS T

You can write it within one Query:
SELECT ID, sum(serviceUsed), max(mailSent) from Mails group by ID;
It doesn't matter that your second query lacks the serviceUsed > 0 condition. You can include those rows in the sum too, because their value is 0.
If you have the following input:
id serviceUsed mailSent
--------------------------
1 0 1.1.1970
1 4 3.1.1970
1 3 4.1.1970
2 0 2.1.1970
The Query should return this result:
id serviceUsed mailSent
--------------------------
1 7 4.1.1970
2 0 2.1.1970
But I wonder, where your primary key is?

You want to do this with conditional aggregation:
select ID, max(case when serviceUsed > 0 then mailSent end),
sum(serviceUsed)
from Mails
group by ID;
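A quick way to see what the conditional aggregation does is to run it against the sample rows from the previous answer. This is a sketch only: it assumes SQLite through Python's sqlite3 module, reads the sample dates as day.month.year and stores them as ISO text, and the column aliases lastUsed and totalUsed are names added for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Mails (ID INTEGER, serviceUsed INTEGER, mailSent TEXT)")
# Sample rows from the answer above, dates rewritten as ISO text (assumed day.month.year).
conn.executemany(
    "INSERT INTO Mails VALUES (?, ?, ?)",
    [
        (1, 0, "1970-01-01"),
        (1, 4, "1970-01-03"),
        (1, 3, "1970-01-04"),
        (2, 0, "1970-01-02"),
    ],
)

# The CASE yields NULL for rows where the service was not used, and MAX ignores NULLs,
# so an ID that never used the service gets NULL instead of an unrelated send date.
QUERY = """
SELECT ID,
       MAX(CASE WHEN serviceUsed > 0 THEN mailSent END) AS lastUsed,
       SUM(serviceUsed) AS totalUsed
FROM Mails
GROUP BY ID
ORDER BY ID
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

ID 2 still appears in the result with a total of 0, but its last-used time is NULL (None in Python), which is the difference from a plain max(mailSent).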

How to group by a column

Hi, I know how to use the GROUP BY clause in SQL. I am not sure how to explain this, so I'll draw some charts. Here is my original data:
Name Location
----------------------
user1 1
user1 9
user1 3
user2 1
user2 10
user3 97
Here is the output I need
Name Location
----------------------
user1 1
9
3
user2 1
10
user3 97
Is this even possible?

The normal method for this is to handle it in the presentation layer, not the database layer.
Reasons:
The Name field is a property of that data row
If you leave the Name out, how do you know what Location goes with which name?
You are implicitly relying on the order of the data, which in SQL is a very bad practice (since there is no inherent ordering to the returned data)
Any solution will need to involve a cursor or a loop, which is not what SQL is optimized for - it likes working in SETS not on individual rows
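As an illustration of handling this in the presentation layer, here is a minimal sketch in Python. It assumes the rows have already been fetched in name order (for example with ORDER BY Name) and simply blanks the name whenever it repeats; the function name suppress_repeats is hypothetical.

```python
def suppress_repeats(rows):
    """Blank out the name on every row whose name equals the previous row's name."""
    out = []
    previous = None
    for name, location in rows:
        out.append(("" if name == previous else name, location))
        previous = name
    return out


# Rows as they might come back from the database, already ordered by name.
rows = [
    ("user1", 1),
    ("user1", 9),
    ("user1", 3),
    ("user2", 1),
    ("user2", 10),
    ("user3", 97),
]

for name, location in suppress_repeats(rows):
    print(f"{name:<8}{location}")
```

The database still returns complete rows, so every Location keeps its Name; only the printed report hides the repeats.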

Hope this helps
SELECT A.FINAL_NAME, A.LOCATION
FROM (SELECT DISTINCT DECODE((LAG(YT.NAME, 1) OVER(ORDER BY YT.NAME)),
YT.NAME,
NULL,
YT.NAME) AS FINAL_NAME,
YT.NAME,
YT.LOCATION
FROM YOUR_TABLE_7 YT) A
As Jirka correctly pointed out, I was using the outer SELECT, DISTINCT and the raw Name unnecessarily. My mistake was that when I used DISTINCT, I got the result sorted like
1 1
2 user2 1
3 user3 97
4 user1 1
5 3
6 9
7 10
I wanted to avoid output like this.
Hence I added the raw name and the outer select.
However, removing the DISTINCT solves the problem.
Hence only this much is enough:
SELECT DECODE((LAG(YT.NAME, 1) OVER(ORDER BY YT.NAME)),
YT.NAME,
NULL,
YT.NAME) AS FINAL_NAME,
YT.LOCATION
FROM YOUR_TABLE_7 YT
Thanks Jirka

If you're using straight SQL*Plus to make your report (don't laugh, you can do some pretty cool stuff with it), you can do this with the BREAK command:
SQL> break on name
SQL> WITH q AS (
SELECT 'user1' NAME, 1 LOCATION FROM dual
UNION ALL
SELECT 'user1', 9 FROM dual
UNION ALL
SELECT 'user1', 3 FROM dual
UNION ALL
SELECT 'user2', 1 FROM dual
UNION ALL
SELECT 'user2', 10 FROM dual
UNION ALL
SELECT 'user3', 97 FROM dual
)
SELECT NAME,LOCATION
FROM q
ORDER BY name;
NAME LOCATION
----- ----------
user1 1
9
3
user2 1
10
user3 97
6 rows selected.
SQL>

I cannot but agree with the other commenters that this kind of problem does not look like it should ever be solved using SQL, but let us face it anyway.
SELECT
CASE WHEN preceding.name IS NULL THEN main.name ELSE NULL END,
main.location
FROM mytable main LEFT JOIN mytable preceding
ON main.name = preceding.name AND preceding.id < main.id
GROUP BY main.id, main.name, main.location, preceding.name
ORDER BY main.id
The GROUP BY clause is not responsible for the grouping job, at least not directly. In the first approximation, an outer join to the same table (LEFT JOIN above) can be used to determine on which row a particular value occurs for the first time. This is what we are after. This assumes that there are some unique id values that make it possible to arbitrarily order all the records. (The ORDER BY clause does NOT do this; it orders the output, not the input of the whole computation, but it is still necessary to make sure that the output is presented correctly, because the remaining SQL does not imply any particular order of processing.)
As you can see, there is still a GROUP BY clause in the SQL, but with a perhaps unexpected purpose. Its job is to "undo" a side effect of the LEFT JOIN, which is duplication of all main records that have many "preceding" ( = successfully joined) records.
This is quite normal with GROUP BY. The typical effect of a GROUP BY clause is a reduction in the number of records, together with the inability to query or test columns NOT listed in the GROUP BY clause except through aggregate functions like COUNT, MIN, MAX, or SUM. This is because these columns really represent "groups of values" due to the GROUP BY, not just specific values.
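The self-join idea can be checked with a small runnable sketch. The assumptions are mine: SQLite through Python's sqlite3 module, a table mytable with a unique id column added to the question's data, and the aliases cur and prev chosen for the demo (they avoid identifiers that have a special meaning in SQLite). The name shown only where no earlier row with the same name exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, location INTEGER)")
# The question's data, with an id column assumed so the rows have a defined order.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "user1", 1),
        (2, "user1", 9),
        (3, "user1", 3),
        (4, "user2", 1),
        (5, "user2", 10),
        (6, "user3", 97),
    ],
)

# LEFT JOIN finds earlier rows with the same name; a row with no such match is the
# first of its name. GROUP BY collapses the duplicates the join creates for rows
# that have several earlier rows.
QUERY = """
SELECT CASE WHEN prev.name IS NULL THEN cur.name ELSE NULL END AS shown_name,
       cur.location
FROM mytable AS cur
LEFT JOIN mytable AS prev
  ON cur.name = prev.name AND prev.id < cur.id
GROUP BY cur.id, cur.name, cur.location, prev.name
ORDER BY cur.id
"""

result = conn.execute(QUERY).fetchall()
for shown_name, location in result:
    print(shown_name or "", location)
```

Row 3 (user1, 3) matches two earlier rows in the join, and the GROUP BY is what brings it back to a single output row.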

If you are using SQL*Plus, use the BREAK command. In this case, break on NAME.
If you are using another reporting tool, you may be able to compare the "name" field to the previous record and suppress printing when they are equal.

In MySQL 5.6, if you use GROUP BY, output rows are sorted according to the GROUP BY columns as if you had an ORDER BY for the same columns. To avoid the overhead of sorting that GROUP BY produces, add ORDER BY NULL:
SELECT a, COUNT(b) FROM test_table GROUP BY a ORDER BY NULL;
Relying on implicit GROUP BY sorting in MySQL 5.6 is deprecated. To achieve a specific sort order of grouped results, it is preferable to use an explicit ORDER BY clause. GROUP BY sorting is a MySQL extension that may change in a future release; for example, to make it possible for the optimizer to order groupings in whatever manner it deems most efficient and to avoid the sorting overhead.
For full information - http://academy.comingweek.com/sql-groupby-clause/

SQL GROUP BY STATEMENT
The SQL GROUP BY clause is used together with the SELECT statement to arrange identical data into groups.
Syntax:
SELECT column_nm, aggregate_function(column_nm) FROM table_nm WHERE column_nm operator value GROUP BY column_nm;
Example:
To understand the GROUP BY clause, refer to the sample database. The table below shows the fields of the “orders” table:
| EMPORD_ID | employee1ID | customerID | shippers_ID |
The table below shows the fields of the “shipper” table:
| shippers_ID | shippers_Name |
The table below shows the fields of the “table_emp1” table:
| employee1ID | first1_nm | last1_nm |
Example:
To find the number of orders sent by each shipper:
SELECT shipper.shippers_Name, COUNT(orders.EMPORD_ID) AS No_of_orders FROM orders LEFT JOIN shipper ON orders.shippers_ID = shipper.shippers_ID GROUP BY shippers_Name;
| shippers_Name | No_of_orders |
Example:
To use the GROUP BY statement on more than one column:
SELECT shipper.shippers_Name, table_emp1.last1_nm, COUNT(orders.EMPORD_ID) AS No_of_orders FROM ((orders INNER JOIN shipper ON orders.shippers_ID = shipper.shippers_ID) INNER JOIN table_emp1 ON orders.employee1ID = table_emp1.employee1ID)
GROUP BY shippers_Name, last1_nm;
| shippers_Name | last1_nm | No_of_orders |
For more clarification, refer to my link:
http://academy.comingweek.com/sql-groupby-clause/
