Select well-spread points from a big table - SQL

I'm trying to write a stored procedure that selects X well-spread points in time from a big table.
I have a table points:
"Userid" integer
, "Time" timestamp with time zone
, "Value" integer
It contains hundreds of millions of records, and about a million records per user.
I want to select X points (let's say 50) that are all well spread from time A to time B. The problem is that the points are not spread equally (if one point is at 6:00:00, the next point may come after 15 seconds, 20 seconds, or 4 minutes, for example).
Selecting all the points for an id can take up to 60 seconds (because there are about a million points).
Is there any way to select the exact number of points I want, as well spread as possible, in a fast way?
Sample data:
+----+--------+---------------------+-------+
|  # | UserId | Time                | Value |
+----+--------+---------------------+-------+
|  1 | 1      | 2017-04-10 14:00:00 |     1 |
|  2 | 1      | 2017-04-10 14:00:10 |    10 |
|  3 | 1      | 2017-04-10 14:00:20 |    32 |
|  4 | 1      | 2017-04-10 14:00:35 |    80 |
|  5 | 1      | 2017-04-10 14:00:58 |   101 |
|  6 | 1      | 2017-04-10 14:01:00 |   203 |
|  7 | 1      | 2017-04-10 14:01:30 |   204 |
|  8 | 1      | 2017-04-10 14:01:40 |   205 |
|  9 | 1      | 2017-04-10 14:02:02 |    32 |
| 10 | 1      | 2017-04-10 14:02:15 |     7 |
| 11 | 1      | 2017-04-10 14:02:30 |   900 |
| 12 | 1      | 2017-04-10 14:02:45 |    22 |
| 13 | 1      | 2017-04-10 14:03:00 |    34 |
| 14 | 1      | 2017-04-10 14:03:30 |    54 |
| 15 | 1      | 2017-04-10 14:04:00 |    54 |
| 16 | 1      | 2017-04-10 14:06:00 |    60 |
| 17 | 1      | 2017-04-10 14:07:20 |   654 |
| 18 | 1      | 2017-04-10 14:07:40 |    32 |
| 19 | 1      | 2017-04-10 14:08:00 |    33 |
| 20 | 1      | 2017-04-10 14:08:12 |    32 |
| 21 | 1      | 2017-04-10 14:10:00 |     8 |
+----+--------+---------------------+-------+
I want to select 11 "best" points from the list above, for the user with Id 1,
from time 2017-04-10 14:00:00 to 2017-04-10 14:10:00.
Currently it's done on the server, after selecting all the points for the user.
I calculate the "best times" by dividing the time difference evenly, getting a list such as: 14:00:00, 14:01:00, ..., 14:10:00 (11 "best times", one per point). Then I look for the closest point to each "best time" that has not been selected yet.
The result will be points: 1, 6, 9, 13, 15, 16, 17, 18, 19, 20, 21
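In SQL terms, that grid of "best times" could be generated like this (a sketch in Postgres; dividing the interval by 10 yields the 11 grid points of this example):
SELECT generate_series(timestamptz '2017-04-10 14:00:00'
                     , timestamptz '2017-04-10 14:10:00'
                     , (timestamptz '2017-04-10 14:10:00'
                      - timestamptz '2017-04-10 14:00:00') / 10) AS best_time;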
Edit:
I'm trying something like this:
SELECT * FROM "points"
WHERE "Userid" = 1 AND
(("Time" =
(SELECT "Time" FROM
"points"
ORDER BY abs(extract(epoch from '2017-04-10 14:00:00' - "Time"))
LIMIT 1)) OR
("Time" =
(SELECT "Time" FROM
"points"
ORDER BY abs(extract(epoch from '2017-04-10 14:01:00' - "Time"))
LIMIT 1)) OR
("Time" =
(SELECT "Time" FROM
"points"
ORDER BY abs(extract(epoch from '2017-04-10 14:02:00' - "Time"))
LIMIT 1)))
The problems here are that:
A) It doesn't take into account points that have already been selected.
B) Because of the ORDER BY, each additional "best time" increases the running time of the query by ~1 second, and for 50 points I'm back at the 1-minute mark.

There is an optimization problem behind your question that's hard to solve with just SQL.
That said, your attempted approximation can be implemented to use an index and show good performance regardless of table size. You need this index if you don't have it already:
CREATE INDEX ON points ("Userid", "Time");
Query:
SELECT *
FROM   generate_series(timestamptz '2017-04-10 14:00:00+0'
                     , timestamptz '2017-04-10 14:09:00+0'  -- 1 min *before* end!
                     , interval '1 minute') grid(t)
LEFT   JOIN LATERAL (
   SELECT *
   FROM   points
   WHERE  "Userid" = 1
   AND    "Time" >= grid.t
   AND    "Time" <  grid.t + interval '1 minute'  -- same interval
   ORDER  BY "Time"
   LIMIT  1
   ) t ON true;
dbfiddle here
Most importantly, the rewritten query can use the above index and will be very fast, solving problem B).
It also addresses problem A) to some extent, as no point is returned more than once. If there is no row between two adjacent points in the grid, you get no row in the result. Using LEFT JOIN .. ON true keeps all grid rows and appends NULL in this case. Eliminate those NULL rows by switching to CROSS JOIN. You may get fewer result rows this way.
I only search ahead of each grid point. You might append a second LATERAL join to also search behind each grid point (just another index scan), and take the closer of the two results, ignoring NULL; see the sketch after this list. But that introduces two problems:
If one match is behind and the next is ahead, the gap widens.
You need special treatment for the lower and/or upper bounds of the outer interval.
And you need two LATERAL joins with two index scans.
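For illustration, a sketch of that two-sided variant (untested; the caveats above still apply, and the final CASE simply picks the nearer of the two candidates):
SELECT grid.t
     , CASE WHEN behind."Time" IS NULL THEN ahead."Time"
            WHEN ahead."Time"  IS NULL THEN behind."Time"
            WHEN grid.t - behind."Time" <= ahead."Time" - grid.t THEN behind."Time"
            ELSE ahead."Time"
       END AS best_time
FROM   generate_series(timestamptz '2017-04-10 14:00:00+0'
                     , timestamptz '2017-04-10 14:09:00+0'
                     , interval '1 minute') grid(t)
LEFT   JOIN LATERAL (  -- first point at or after the grid point
   SELECT "Time"
   FROM   points
   WHERE  "Userid" = 1
   AND    "Time" >= grid.t
   AND    "Time" <  grid.t + interval '1 minute'
   ORDER  BY "Time"
   LIMIT  1
   ) ahead ON true
LEFT   JOIN LATERAL (  -- last point before the grid point
   SELECT "Time"
   FROM   points
   WHERE  "Userid" = 1
   AND    "Time" <  grid.t
   AND    "Time" >= grid.t - interval '1 minute'
   ORDER  BY "Time" DESC
   LIMIT  1
   ) behind ON true;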
You could use a recursive CTE to search 1 minute ahead of the last time actually found, but then the total number of rows returned varies even more.
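A minimal sketch of that recursive variant, assuming the same index (untested):
WITH RECURSIVE walk AS (
   (SELECT "Time"  -- first point at or after the lower bound
    FROM   points
    WHERE  "Userid" = 1
    AND    "Time" >= timestamptz '2017-04-10 14:00:00+0'
    ORDER  BY "Time"
    LIMIT  1)
   UNION ALL
   SELECT nxt."Time"
   FROM   walk w
   CROSS  JOIN LATERAL (  -- first point at least 1 minute after the last hit
      SELECT "Time"
      FROM   points
      WHERE  "Userid" = 1
      AND    "Time" >= w."Time" + interval '1 minute'
      AND    "Time" <  timestamptz '2017-04-10 14:10:00+0'
      ORDER  BY "Time"
      LIMIT  1
      ) nxt
   )
SELECT * FROM walk;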
It all comes down to an exact definition of what you need, and where compromises are allowed.
Related:
What is the difference between a LATERAL JOIN and a subquery in PostgreSQL?
Aggregating the most recent joined records per week
MySQL/Postgres query 5 minutes interval data
Optimize GROUP BY query to retrieve latest row per user

Use generate_series('2017-04-10 14:00:00','2017-04-10 14:10:00','1 minute'::interval) and join against it for comparison.
To save others time with the data set:
t=# create table points(i int,"UserId" int,"Time" timestamp(0), "Value" int,b text);
CREATE TABLE
Time: 13.728 ms
t=# copy points from stdin delimiter '|';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
1 | 1 | 2017-04-10 14:00:00 | 1 |
2 | 1 | 2017-04-10 14:00:10 | 10 |
3 | 1 | 2017-04-10 14:00:20 | 32 |
4 | 1 | 2017-04-10 14:00:35 | 80 |
5 | 1 | 2017-04-10 14:00:58 | 101 |
6 | 1 | 2017-04-10 14:01:00 | 203 |
7 | 1 | 2017-04-10 14:01:30 | 204 |
8 | 1 | 2017-04-10 14:01:40 | 205 |
9 | 1 | 2017-04-10 14:02:02 | 32 |
10 | 1 | 2017-04-10 14:02:15 | 7 |
11 | 1 | 2017-04-10 14:02:30 | 900 |
12 | 1 | 2017-04-10 14:02:45 | 22 |
13 | 1 | 2017-04-10 14:03:00 | 34 |
14 | 1 | 2017-04-10 14:03:30 | 54 |
15 | 1 | 2017-04-10 14:04:00 | 54 |
16 | 1 | 2017-04-10 14:06:00 | 60 |
17 | 1 | 2017-04-10 14:07:20 | 654 |
18 | 1 | 2017-04-10 14:07:40 | 32 |
19 | 1 | 2017-04-10 14:08:00 | 33 |
20 | 1 | 2017-04-10 14:08:12 | 32 |
21 | 1 | 2017-04-10 14:10:00 | 8 |
\.
COPY 21
Time: 7684.259 ms
t=# alter table points rename column "UserId" to "Userid";
ALTER TABLE
Time: 1.013 ms
Frankly, I don't understand the request. This is how I understood it from the description; the results differ from what the OP expects:
t=# with r as (
      with g as (
        select generate_series('2017-04-10 14:00:00','2017-04-10 14:10:00','1 minute'::interval) s
      )
      select *, abs(extract(epoch from '2017-04-10 14:02:00' - "Time"))
      from g
      join points on g.s = date_trunc('minute', "Time")
      order by abs
      limit 11
    )
    select i, "Time", "Value", abs
    from r
    order by i;
  i |        Time         | Value | abs
----+---------------------+-------+-----
  4 | 2017-04-10 14:00:35 |    80 |  85
  5 | 2017-04-10 14:00:58 |   101 |  62
  6 | 2017-04-10 14:01:00 |   203 |  60
  7 | 2017-04-10 14:01:30 |   204 |  30
  8 | 2017-04-10 14:01:40 |   205 |  20
  9 | 2017-04-10 14:02:02 |    32 |   2
 10 | 2017-04-10 14:02:15 |     7 |  15
 11 | 2017-04-10 14:02:30 |   900 |  30
 12 | 2017-04-10 14:02:45 |    22 |  45
 13 | 2017-04-10 14:03:00 |    34 |  60
 14 | 2017-04-10 14:03:30 |    54 |  90
(11 rows)
I added the abs column to show why I thought those rows fit the request better.

Related

How to group data within a range of contiguous timestamps

I have a table made up of rows of data collected through a nondeterministic polling process. Each row has a start and end timestamp denoting the time period in which the data was collected. In some cases the data was collected contiguously, in which case the end timestamp of one row has the same value as the start timestamp of the next row. In other cases there is a break in time between one row and the next.
For example, in the table below, rows 1, 2, 3 and 4 are all part of one time series of data. Similarly for rows 5, 6, 7 and 8, and again for rows 9 and 10. In between are time periods for which I do not have data.
Row Start_Timestamp End_Timestamp Data_Item
--- --------------- -------------- ---------
1 2019-08-12_22:07:53 2019-08-12_22:09:57 100
2 2019-08-12_22:09:57 2019-08-12_22:12:01 203
3 2019-08-12_22:12:01 2019-08-12_22:13:03 487
4 2019-08-12_22:13:03 2019-08-12_22:16:19 113
5 2019-08-12_22:24:34 2019-08-12_22:26:37 632
6 2019-08-12_22:26:37 2019-08-12_22:27:40 532
7 2019-08-12_22:27:40 2019-08-12_22:28:42 543
8 2019-08-12_22:28:42 2019-08-12_22:31:57 142
9 2019-08-13_19:56:06 2019-08-13_19:57:08 351
10 2019-08-13_19:57:08 2019-08-13_19:58:10 982
I would like to group these contiguous time series, ideally as follows:
Row Series Start_Timestamp End_Timestamp Data_Item
--- ------ --------------- -------------- -----------
1 1 2019-08-12_22:07:53 2019-08-12_22:09:57 100
2 1 2019-08-12_22:09:57 2019-08-12_22:12:01 203
3 1 2019-08-12_22:12:01 2019-08-12_22:13:03 487
4 1 2019-08-12_22:13:03 2019-08-12_22:16:19 113
5 2 2019-08-12_22:24:34 2019-08-12_22:26:37 632
6 2 2019-08-12_22:26:37 2019-08-12_22:27:40 532
7 2 2019-08-12_22:27:40 2019-08-12_22:28:42 543
8 2 2019-08-12_22:28:42 2019-08-12_22:31:57 142
9 3 2019-08-13_19:56:06 2019-08-13_19:57:08 351
10 3 2019-08-13_19:57:08 2019-08-13_19:58:10 982
I am new to SQL and have been struggling with this problem. I appreciate any insights or advice on how I might achieve this.
This is a simplified gaps-and-islands problem. Assuming that your RDBMS supports window functions, you can approach this with a window sum. When the Start_Timestamp of a record is different from the End_Timestamp of the previous record, a new group starts:
select
    t.Row,
    sum(case when Start_Timestamp = lag_End_Timestamp then 0 else 1 end)
        over(order by End_Timestamp) series,
    t.Start_Timestamp,
    t.End_Timestamp,
    t.Data_Item
from (
    select
        t.*,
        lag(End_Timestamp) over (order by End_Timestamp) lag_End_Timestamp
    from mytable t
) t
Demo on DB Fiddle:
Row | series | Start_Timestamp | End_Timestamp | Data_Item
--: | -----: | :------------------ | :------------------ | --------:
1 | 1 | 2019-08-12 22:07:53 | 2019-08-12 22:09:57 | 100
2 | 1 | 2019-08-12 22:09:57 | 2019-08-12 22:12:01 | 203
3 | 1 | 2019-08-12 22:12:01 | 2019-08-12 22:13:03 | 487
4 | 1 | 2019-08-12 22:13:03 | 2019-08-12 22:16:19 | 113
5 | 2 | 2019-08-12 22:24:34 | 2019-08-12 22:26:37 | 632
6 | 2 | 2019-08-12 22:26:37 | 2019-08-12 22:27:40 | 532
7 | 2 | 2019-08-12 22:27:40 | 2019-08-12 22:28:42 | 543
8 | 2 | 2019-08-12 22:28:42 | 2019-08-12 22:31:57 | 142
9 | 3 | 2019-08-13 19:56:06 | 2019-08-13 19:57:08 | 351
10 | 3 | 2019-08-13 19:57:08 | 2019-08-13 19:58:10 | 982
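The subquery is needed because window functions cannot be nested directly: lag() first computes each row's previous End_Timestamp, and the outer window sum() then increments the running series number whenever that previous end does not match the current start.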

How to join transactional data with customer data tables and perform case-based operations in SQL

I'm trying to run a query across two different tables and apply case-by-case logic, producing a list of call records for a specific month.
Here are my tables:
Customer table:
+----+----------------+------------+
| id | name | number |
+----+----------------+------------+
| 1 | John Doe | 8973221232 |
| 2 | American Dad | 7165531212 |
| 3 | Michael Clean | 8884731234 |
| 4 | Samuel Gatsby | 9197543321 |
| 5 | Mike Chat | 8794029819 |
+----+----------------+------------+
Transaction data:
+----------+------------+------------+----------+---------------------+
| trans_id | incoming | outgoing | duration | date_time |
+----------+------------+------------+----------+---------------------+
| 1 | 8973221232 | 9197543321 | 64 | 2018-03-09 01:08:09 |
| 2 | 3729920490 | 7651113929 | 276 | 2018-07-20 05:53:10 |
| 3 | 8884731234 | 8973221232 | 382 | 2018-05-02 13:12:13 |
| 4 | 8973221232 | 9234759208 | 127 | 2018-07-07 15:32:30 |
| 5 | 7165531212 | 9197543321 | 852 | 2018-08-02 07:40:23 |
| 6 | 8884731234 | 9833823023 | 774 | 2018-07-03 14:27:52 |
| 7 | 8273820928 | 2374987349 | 120 | 2018-07-06 05:27:44 |
| 8 | 8973221232 | 9197543321 | 79 | 2018-07-30 12:51:55 |
| 9 | 7165531212 | 7651113929 | 392 | 2018-05-22 02:27:38 |
| 10 | 5423541524 | 7165531212 | 100 | 2018-07-21 22:12:20 |
| 11 | 9197543321 | 2983479820 | 377 | 2018-07-20 17:46:36 |
| 12 | 8973221232 | 7651113929 | 234 | 2018-07-09 03:32:53 |
| 13 | 7165531212 | 2309483932 | 88 | 2018-07-16 16:22:21 |
| 14 | 8973221232 | 8884731234 | 90 | 2018-09-03 13:10:00 |
| 15 | 3820838290 | 2093482348 | 238 | 2018-04-12 21:59:01 |
+----------+------------+------------+----------+---------------------+
What am I trying to accomplish?
I'm trying to compile a list of "costs" for each of the customers that made calls in July 2018. The costs are based on:
1) If the customer received a call (incoming), the cost of the call is equal to the duration;
2) if the customer made a call (outgoing), the cost of the call is 100 if the call is 30 or less in duration. If it exceeds 30 duration, then the cost is 100 plus 5 * duration of the exceeded period.
If the customer didn't make any calls during that month he shouldn't be on the list.
Examples:
1) Customer American Dad has 3 incoming calls and 1 outgoing call, however only trans_id 10 and 13 are for the month of July. He should be paying a total of 538:
for trans_id 10 = 450 (100 for the first 30s + 5 * 70 for the remaining)
for trans_id 13 = 88
2) Customer Samuel Gatsby has 1 incoming call and 3 outgoing calls, however only trans_id 8 and 11 are for the month of July. He should be paying a total of 722:
for trans_id 8 = 345 (100 for the first 30s + 5 * 49 for the remaining)
for trans_id 11 = 377
Considering only these two examples, the output would be:
+----+----------------+------------+------------+
| id | name | number | billable |
+----+----------------+------------+------------+
| 2 | American Dad | 7165531212 | 538 |
| 4 | Samuel Gatsby | 9197543321 | 722 |
+----+----------------+------------+------------+
Note: Mike Chat shouldn't be on the list as he didn't make or receive any calls for that specific month.
What have I tried so far?
I've been playing cat and mouse with this one. I'm using the number as a unique ID. I already attempted both a full outer join and combining where incoming or outgoing is not null and then applying rules by case, and I tried a left join and applying cases, but I keep going in circles and can't get to a final list. Whenever I get incoming or outgoing, I'm either not able to apply the case or not able to combine both. I really appreciate the help!
select customer_name.name, customer_name.number, bill = (CASE
WHEN customer_name.number = transaction_data.incoming then 'sum bill'
else 'multiply and add'
end)
from customer_name
left join transaction_data on customer_name.number = transaction_data.incoming or customer_name.name = transaction_data.outgoing
where strftime('%Y-%m', transaction_data.date_time) = '2018-07'
Note: I'm using SQLite to try it out online, but the database is on SQL Server 2012, so I know I can use an easier date format there; I'd like to keep as close to T-SQL as possible.
I also tried creating a case to determine whether a call is incoming or outgoing, but I'm only getting incoming as a result, even though trans_id 10 is outgoing:
select name, number, duration, case
when customer_name.number = transaction_data.incoming then 'incoming'
when customer_name.number = transaction_data.outgoing then 'outgoing'
END direction
from customer_name
left join transaction_data on customer_name.number = transaction_data.incoming or customer_name.name = transaction_data.outgoing
where strftime('%Y-%m', transaction_data.date_time) = '2018-07'
Try this:
SELECT
c."name", c.number,
SUM(CASE c.number
WHEN t.incoming THEN t.duration
ELSE IIF(t.duration - 30 < 0, 0, t.duration - 30) * 5 + 100
END) AS billable
FROM Customer AS c INNER JOIN [Transaction] AS t
ON c.number IN(t.incoming, t.outgoing)
WHERE t.date_time >= '20180701' AND t.date_time < '20180801'
GROUP BY c."name", c.number
Output:
| name | number | billable |
+---------------+------------+----------+
| John Doe | 8973221232 | 440 |
| American Dad | 7165531212 | 538 |
| Michael Clean | 8884731234 | 774 |
| Samuel Gatsby | 9197543321 | 722 |
Test it online with SQL Fiddle.
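The ON c.number IN(t.incoming, t.outgoing) condition matches a call in either direction (note that the original attempt joined name to outgoing, which is why only incoming rows were matched), and the CASE then prices each matched row as incoming (the duration) or outgoing (100 plus 5 per second over 30). Customers with no calls in the date range simply produce no join rows, so the INNER JOIN drops them.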

Postgres Query Based on Previous and Next Rows

I'm trying to solve a bus routing problem in PostgreSQL, which requires visibility of previous and next rows. Here is my solution.
Step 1) Have one edges table which represents all the edges (the source and target columns represent vertices, i.e. bus stops):
postgres=# select id, source, target, cost from busedges;
id | source | target | cost
----+--------+--------+------
1 | 1 | 2 | 1
2 | 2 | 3 | 1
3 | 3 | 4 | 1
4 | 4 | 5 | 1
5 | 1 | 7 | 1
6 | 7 | 8 | 1
7 | 1 | 6 | 1
8 | 6 | 8 | 1
9 | 9 | 10 | 1
10 | 10 | 11 | 1
11 | 11 | 12 | 1
12 | 12 | 13 | 1
13 | 9 | 15 | 1
14 | 15 | 16 | 1
15 | 9 | 14 | 1
16 | 14 | 16 | 1
Step 2) Have a table which represents bus details like from time, to time, edge etc.
NOTE: I have used integer format for the "from" and "to" columns for faster results, as I can do an integer query, but I can replace it with any better format if available.
postgres=# select id, "busedgeId", "busId", "from", "to" from busedgetimes;
id | busedgeId | busId | from | to
----+-----------+-------+-------+-------
18 | 1 | 1 | 33000 | 33300
19 | 2 | 1 | 33300 | 33600
20 | 3 | 2 | 33900 | 34200
21 | 4 | 2 | 34200 | 34800
22 | 1 | 3 | 36000 | 36300
23 | 2 | 3 | 36600 | 37200
24 | 3 | 4 | 38400 | 38700
25 | 4 | 4 | 38700 | 39540
Step 3) Use Dijkstra's algorithm to find the nearest path.
Step 4) Get the upcoming buses from the busedgetimes table, earliest first, for the nearest path detected by Dijkstra's algorithm.
Problem: I am finding it difficult to write the query for step 4.
For example, suppose I get the path as edges 2, 3, 4 to travel from source vertex 2 to target vertex 5 in the records above. Getting the first bus for the first edge is not so hard, as I can simply query with from < 'expected departure' order by from desc. But for the second edge, the from condition requires the to time of the first result row. The query also requires a filter on the edge ids.
How can I achieve this in a single query?
I am not sure if I understood your problem correctly, but getting values from other rows can be done with window functions (https://www.postgresql.org/docs/current/static/tutorial-window.html):
demo: db<>fiddle
SELECT
    id,
    lag("to") OVER (ORDER BY id) as prev_to,
    "from",
    "to",
    lead("from") OVER (ORDER BY id) as next_from
FROM bustimes;
The lag function moves the value of the previous row into the current one. The lead function does the same with the next row. So you are able to calculate the difference between the last arrival and the current departure, or something like that.
Result:
 id | prev_to | from  | to    | next_from
----+---------+-------+-------+-----------
 18 |         | 33000 | 33300 | 33300
 19 | 33300   | 33300 | 33600 | 33900
 20 | 33600   | 33900 | 34200 | 34200
 21 | 34200   | 34200 | 34800 | 36000
 22 | 34800   | 36000 | 36300 |
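Building on that, the wait between the previous arrival and the current departure could be computed like this (a sketch; "from" and "to" hold seconds here, so the result is in seconds):
SELECT id,
       "from" - lag("to") OVER (ORDER BY id) AS wait_seconds
FROM bustimes;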
Please notice that "from" and "to" are reserved words in PostgreSQL. It would be better to choose other names.

A single query to count the number of distinct rows in one table and the highest value of a column from another table

I have two SQL tables. Table 1 is as follows:
Row | SALEREF
1 | 40303020
2 | 40303021
3 | 40303021
4 | 40303021
5 | 41210028
6 | 4120302701
7 | 41210030
8 | 4112700803
9 | 4112700803
10 | 41215030
11 | 41215026
12 | 41215026
13 | 41215026
14 | 41215026
15 | 41215026
16 | 41215026
17 | 41215026
18 | 41215027
19 | 41215027
20 | 41215027
Table 2 ("LEDGER") is as follows:
Row | SALESREF | SALEDATE
0 | 4081200201 | 20140804
1 | 40303020 | 20141015
2 | 40303021 | 20141017
3 | 40303021 | 20141017
4 | 40303021 | 20141017
5 | 41210028 | 20121214
6 | 4120302701 | 20130926
7 | 41210030 | 20130926
8 | 4112700803 | 20131107
9 | 4112700803 | 20131107
10 | 41215030 | 20120720
What I am looking for is a single line that outputs the following:
TotalDistinctSalesRefsInTable1 | HighestSaleDateValueInTable2 (that has a matching value in Table 1)
9 | 20141017
That is, the total number of distinct SALESREFs in Table 1 and the latest SALEDATE value from Table 2.
I've tried selecting within a query but quickly found the limitations of my knowledge, although I know I can get the latest overall sale date by doing:
SELECT MAX(LEDGER.SALEDATE) AS LAST_DATE FROM LEDGER
I just need help piecing the whole thing together.
You can use a left join with count and max to get your desired result. count(distinct ...) is needed because the join repeats each Table 1 row for every matching ledger row:
select count(distinct t1.salesref) as TotalDistinctSalesRefsInTable1,
       ifnull(max(l.saledate), 0) as HighestSaleDateValueInTable2
from table1 t1
left join ledger l
  on t1.salesref = l.salesref

Return Max(Recent_Date) for each month, Duplicate value and inner join issue - SQL

Basically, I want to get, for each month (in this example from January until March 2013), the Max(Most_Recent_Day) for each user.
For example, from January to March, every month the system captures the Most_Recent_Day for each user in the database.
Below are the expected results:
User | Most_Recent_Day
--------------------------------
afolabi.banu | 1/31/2013
afolabi.banu | 2/7/2013
afolabi.banu | 3/21/2013
mario.sapiter | 1/22/2013
mario.sapiter | 2/7/2013
mario.sapiter | 3/11/2013
However, I want some additional DB columns to be displayed as well. Below are the columns:
User | Total_Hits | Recent_Month | Most_Recent_Day | Most_Recent_Days_Hits
I tried to use an inner join, but the results are not what I expect: I get duplicated user names and duplicated recent days. Basically, I want no duplicated records for the same user name.
Below is the result that I got. Please ignore the recent_month value, since it's data from the database.
User | Total_Hits | Recent_Month | Most_Recent_Day | Most_Recent_Days_Hits
-------------------------------------------------------------------------------------
afolabi.banu | 223 | 25 | 2/7/2013 | 5
afolabi.banu | 223 | 25 | 2/7/2013 | 5
afolabi.banu | 211 | 13 | 1/31/2013 | 3
afolabi.banu | 223 | 25 | 2/7/2013 | 5
afolabi.banu | 296 | 31 | 3/21/2013 | 1
afolabi.banu | 296 | 31 | 3/21/2013 | 1
mario.sapiter | 95 | 7 | 2/7/2013 | 5
mario.sapiter | 7 | 7 | 3/21/2013 | 1
mario.sapiter | 7 | 37 | 3/22/2013 | 1
mario.sapiter | 249 | 37 | 2/7/2013 | 5
This is my SQL code:
SELECT t.[User],
       t.Total_Hits,
       t.Recent_Month,
       t.Most_Recent_Day,
       t.Most_Recent_Day_Hits
FROM UserUsageMonthly t
INNER JOIN
(
    select [User]
         , max(Most_Recent_Day) as Most_Recent_Day
    from UserUsageMonthly (NoLock)
    where Application_Name = 'Daily Production Review'
      and Site_Collection = 'wrm13'
      and Most_Recent_Day between '1/1/2013' and '3/31/2013'
    group by [User], datepart(month, Most_Recent_Day)
) table2
   ON t.[User] = table2.[User]
  AND t.Most_Recent_Day = table2.Most_Recent_Day
order by t.[User]
You should add the month value to your SQL SELECT:
SELECT MONTH(t.Most_Recent_Day) as 'MyMonth',
       t.[User],
       t.Total_Hits,
       t.Recent_Month,
       t.Most_Recent_Day,
       t.Most_Recent_Day_Hits
FROM UserUsageMonthly t
Then you can group by the month expression (SQL Server does not allow the column alias in GROUP BY):
GROUP BY MONTH(t.Most_Recent_Day)
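Putting both pieces together, a complete version might look like this (a sketch only; non-grouped columns such as Total_Hits would need their own aggregates or another join back to the table):
SELECT MONTH(t.Most_Recent_Day) AS MyMonth,
       t.[User],
       MAX(t.Most_Recent_Day) AS Most_Recent_Day
FROM UserUsageMonthly t
WHERE t.Most_Recent_Day BETWEEN '1/1/2013' AND '3/31/2013'
GROUP BY t.[User], MONTH(t.Most_Recent_Day)
ORDER BY t.[User], MyMonth;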