SQL / Oracle Aggregation Buckets Between Dates

I have a SQL related question I would love some help with as a suitable answer has been eluding me for some time.
Background
I'm working with a vendor product that uses an Oracle database as its backend. I can write any ad hoc SQL to query the underlying tables, but I cannot change their structure (or the data model itself). The table I'm interested in currently has about 1M+ rows and essentially tracks user sessions. It has 4 columns of interest: session_id (the primary key, unique per session), user_name, start_date (a date marking the beginning of the session), and stop_date (a date marking the end of the session). My goal is to aggregate active sessions by month, day, and hour, given a start date and an end date. I need to create a view (or 3 separate views) that either performs the aggregation itself or serves as an intermediate object I can query to perform the aggregation. I understand the eventual SQL may need to be 3 different views (one for month, one for day, one for hour), but it seems to me that the concept, once achieved, should be the same regardless of the time period.
Current table example
Table Name = web_session
| Session_id | user_name | start_date | stop_date
----------------------------------------------------------------------------
| 1 | joe | 4/20/2017 10:42:10 PM | 4/21/2017 2:42:10 AM |
| 2 | matt | 4/20/2017 5:43:10 PM | 4/20/2017 5:59:10 PM |
| 3 | matt | 4/20/2017 3:42:10 PM | 4/20/2017 5:42:10 PM |
| 4 | joe | 4/20/2017 11:20:10 AM | 4/20/2017 4:42:10 PM |
| 5 | john | 4/20/2017 8:42:10 AM | 4/20/2017 11:42:10 AM |
| 6 | matt | 4/20/2017 7:42:10 AM | 4/20/2017 11:42:10 PM |
| 7 | joe | 4/19/2017 11:20:10 PM | 4/20/2017 1:42:10 AM |
Ideal Output For Hour View
-12:00 can be either 0 or 24 for the example
| Date | HR | active_sessions | distinct_users |
------------------------------------------------------------
| 4/21/2017 | 2 | 1 | 1 |
| 4/21/2017 | 1 | 1 | 1 |
| 4/20/2017 | 0 | 1 | 1 |
| 4/20/2017 | 23 | 1 | 1 |
| 4/20/2017 | 22 | 1 | 1 |
| 4/20/2017 | 17 | 2 | 1 |
| 4/20/2017 | 16 | 2 | 2 |
| 4/20/2017 | 15 | 2 | 2 |
| 4/20/2017 | 14 | 1 | 1 |
| 4/20/2017 | 13 | 1 | 1 |
| 4/20/2017 | 12 | 1 | 1 |
| 4/20/2017 | 11 | 3 | 3 |
| 4/20/2017 | 10 | 2 | 2 |
| 4/20/2017 | 9 | 2 | 2 |
| 4/20/2017 | 8 | 2 | 2 |
| 4/20/2017 | 7 | 1 | 1 |
| 4/20/2017 | 1 | 1 | 1 |
| 4/20/2017 | 0 | 1 | 1 |
| 4/19/2017 | 23 | 1 | 1 |
End Goal and Other Options
What I am eventually trying to achieve with this output is to populate a line chart which displays the number of active sessions for either a month, day, or hour (used in the example output) between two dates. In the hour example, the date in combination with the HR would be used along the X-axis and the active sessions would be used along the Y-axis. The distinct user count would be available if a user hovered over the point on the chart. FYI Active sessions are the total number of sessions that were open at any point during the interval. Distinct users are the total number of distinct users during the interval. If I logged on and off twice in the same hour, it would be 2 active sessions, but only 1 distinct user.
Alternative Solutions
This seems to be a problem which must have come up many times before, but from all of my googling and Stack Overflow research I cannot seem to find the correct approach. If I am thinking about the query or the ideal output incorrectly, I AM OPEN TO ALTERNATE SUGGESTIONS which allow me to get the desired output to populate the chart appropriately on the front end.
Some SQL I Have Tried (Good Faith Effort)
There are many queries I've tried, but I'll start with this one: it is the closest I got, but it is extremely slow (unusably so) and it still does not produce the result I need.
Select * FROM (
SELECT
u.YearDt, u.MonthDt, u.DayDt, u.HourDt, u.MinDt,
COUNT(Distinct u.session_id) as unique_sessions,
COUNT(Distinct u.user_name) as unique_users,
LISTAGG(u.user_name, ', ') WITHIN GROUP (ORDER BY u.user_name ASC) as users
FROM
(SELECT EXTRACT(year FROM l.start_date) as YearDt,
EXTRACT(month FROM l.start_date) as MonthDt,
EXTRACT(day FROM l.start_date) as DayDt,
EXTRACT(HOUR FROM CAST(l.start_date AS TIMESTAMP)) as HourDt,
EXTRACT(MINUTE FROM CAST(l.start_date AS TIMESTAMP)) as MinDt,
l.session_id,
l.user_name,
l.start_date as act_date,
1 as is_start
FROM web_session l
UNION ALL
SELECT EXTRACT(year FROM l.stop_date) as YearDt,
EXTRACT(month FROM l.stop_date) as MonthDt,
EXTRACT(day FROM l.stop_date) as DayDt,
EXTRACT(HOUR FROM CAST(l.stop_date AS TIMESTAMP)) as HourDt,
EXTRACT(MINUTE FROM CAST(l.stop_date AS TIMESTAMP)) as MinDt,
l.session_id,
l.user_name,
l.stop_date as act_date,
0 as is_start
FROM web_session l
) u
GROUP BY CUBE ( u.YearDt, u.MonthDt, u.DayDt, u.HourDt, u.MinDt)
) c

You can use a CTE (Query 1) or a correlated hierarchical query (Query 2) to generate the hours within the time ranges and then aggregate. This only requires a single table scan:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE Web_Session ( Session_id, user_name, start_date, stop_date ) AS
SELECT 1, 'joe', CAST( TIMESTAMP '2017-04-20 22:42:10' AS DATE ), CAST( TIMESTAMP '2017-04-21 02:42:10' AS DATE ) FROM DUAL UNION ALL
SELECT 2, 'matt', TIMESTAMP '2017-04-20 17:43:10', TIMESTAMP '2017-04-20 17:59:10' FROM DUAL UNION ALL
SELECT 3, 'matt', TIMESTAMP '2017-04-20 15:42:10', TIMESTAMP '2017-04-20 17:42:10' FROM DUAL UNION ALL
SELECT 4, 'joe', TIMESTAMP '2017-04-20 11:20:10', TIMESTAMP '2017-04-20 16:42:10' FROM DUAL UNION ALL
SELECT 5, 'john', TIMESTAMP '2017-04-20 08:42:10', TIMESTAMP '2017-04-20 11:42:10' FROM DUAL UNION ALL
SELECT 6, 'matt', TIMESTAMP '2017-04-20 07:42:10', TIMESTAMP '2017-04-20 23:42:10' FROM DUAL UNION ALL
SELECT 7, 'joe', TIMESTAMP '2017-04-19 23:20:10', TIMESTAMP '2017-04-20 01:42:10' FROM DUAL;
Query 1:
WITH hours ( session_id, user_name, hour, duration ) AS (
SELECT session_id,
user_name,
CAST( TRUNC( start_date, 'HH24' ) AS DATE ),
( TRUNC( stop_date, 'HH24' ) - TRUNC( start_date, 'HH24' ) ) * 24
FROM web_session
UNION ALL
SELECT session_id,
user_name,
hour + INTERVAL '1' HOUR, -- There is a bug in SQLFiddle that subtracts
-- hours instead of adding so -1 is used there.
duration - 1
FROM hours
WHERE duration > 0
)
SELECT hour,
COUNT( session_id ) AS active_sessions,
COUNT( DISTINCT user_name ) AS distinct_users
FROM hours
GROUP BY hour
ORDER BY hour
Results:
| HOUR | ACTIVE_SESSIONS | DISTINCT_USERS |
|----------------------|-----------------|----------------|
| 2017-04-19T23:00:00Z | 1 | 1 |
| 2017-04-20T00:00:00Z | 1 | 1 |
| 2017-04-20T01:00:00Z | 1 | 1 |
| 2017-04-20T07:00:00Z | 1 | 1 |
| 2017-04-20T08:00:00Z | 2 | 2 |
| 2017-04-20T09:00:00Z | 2 | 2 |
| 2017-04-20T10:00:00Z | 2 | 2 |
| 2017-04-20T11:00:00Z | 3 | 3 |
| 2017-04-20T12:00:00Z | 2 | 2 |
| 2017-04-20T13:00:00Z | 2 | 2 |
| 2017-04-20T14:00:00Z | 2 | 2 |
| 2017-04-20T15:00:00Z | 3 | 2 |
| 2017-04-20T16:00:00Z | 3 | 2 |
| 2017-04-20T17:00:00Z | 3 | 1 |
| 2017-04-20T18:00:00Z | 1 | 1 |
| 2017-04-20T19:00:00Z | 1 | 1 |
| 2017-04-20T20:00:00Z | 1 | 1 |
| 2017-04-20T21:00:00Z | 1 | 1 |
| 2017-04-20T22:00:00Z | 2 | 2 |
| 2017-04-20T23:00:00Z | 2 | 2 |
| 2017-04-21T00:00:00Z | 1 | 1 |
| 2017-04-21T01:00:00Z | 1 | 1 |
| 2017-04-21T02:00:00Z | 1 | 1 |
Execution Plan:
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 364 | 7 | 00:00:01 |
| 1 | SORT GROUP BY | | 14 | 364 | 7 | 00:00:01 |
| 2 | VIEW | VW_DAG_0 | 14 | 364 | 7 | 00:00:01 |
| 3 | HASH GROUP BY | | 14 | 364 | 7 | 00:00:01 |
| 4 | VIEW | | 14 | 364 | 6 | 00:00:01 |
| 5 | UNION ALL (RECURSIVE WITH) BREADTH FIRST | | | | | |
| 6 | TABLE ACCESS FULL | WEB_SESSION | 7 | 245 | 3 | 00:00:01 |
| * 7 | RECURSIVE WITH PUMP | | | | | |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 7 - filter("DURATION">0)
Note
-----
- dynamic sampling used for this statement
Query 2:
SELECT t.COLUMN_VALUE AS hour,
COUNT( session_id ) AS active_sessions,
COUNT( DISTINCT user_name ) AS distinct_users
FROM web_session w
CROSS JOIN
TABLE(
CAST(
MULTISET(
SELECT TRUNC( w.start_date, 'HH24' ) + ( LEVEL - 1 ) / 24
FROM DUAL
CONNECT BY TRUNC( w.start_date, 'HH24' ) + ( LEVEL - 1 ) / 24 < w.stop_date
) AS SYS.ODCIDATELIST
)
) t
GROUP BY t.COLUMN_VALUE
ORDER BY hour
Results:
| HOUR | ACTIVE_SESSIONS | DISTINCT_USERS |
|----------------------|-----------------|----------------|
| 2017-04-19T23:00:00Z | 1 | 1 |
| 2017-04-20T00:00:00Z | 1 | 1 |
| 2017-04-20T01:00:00Z | 1 | 1 |
| 2017-04-20T07:00:00Z | 1 | 1 |
| 2017-04-20T08:00:00Z | 2 | 2 |
| 2017-04-20T09:00:00Z | 2 | 2 |
| 2017-04-20T10:00:00Z | 2 | 2 |
| 2017-04-20T11:00:00Z | 3 | 3 |
| 2017-04-20T12:00:00Z | 2 | 2 |
| 2017-04-20T13:00:00Z | 2 | 2 |
| 2017-04-20T14:00:00Z | 2 | 2 |
| 2017-04-20T15:00:00Z | 3 | 2 |
| 2017-04-20T16:00:00Z | 3 | 2 |
| 2017-04-20T17:00:00Z | 3 | 1 |
| 2017-04-20T18:00:00Z | 1 | 1 |
| 2017-04-20T19:00:00Z | 1 | 1 |
| 2017-04-20T20:00:00Z | 1 | 1 |
| 2017-04-20T21:00:00Z | 1 | 1 |
| 2017-04-20T22:00:00Z | 2 | 2 |
| 2017-04-20T23:00:00Z | 2 | 2 |
| 2017-04-21T00:00:00Z | 1 | 1 |
| 2017-04-21T01:00:00Z | 1 | 1 |
| 2017-04-21T02:00:00Z | 1 | 1 |
Execution Plan:
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 57176 | 2115512 | 200 | 00:00:03 |
| 1 | SORT GROUP BY | | 57176 | 2115512 | 200 | 00:00:03 |
| 2 | NESTED LOOPS | | 57176 | 2115512 | 195 | 00:00:03 |
| 3 | TABLE ACCESS FULL | WEB_SESSION | 7 | 245 | 3 | 00:00:01 |
| 4 | COLLECTION ITERATOR SUBQUERY FETCH | | 8168 | 16336 | 27 | 00:00:01 |
| * 5 | CONNECT BY WITHOUT FILTERING | | | | | |
| 6 | FAST DUAL | | 1 | | 2 | 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 5 - filter(TRUNC(:B1,'fmhh24')+(LEVEL-1)/24<:B2)
Note
-----
- dynamic sampling used for this statement
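To make the expansion step easier to follow, here is a minimal sketch in plain Python (not Oracle SQL) of what both queries appear to do: turn each session into one row per hour it touches, then count sessions and distinct users per hour. The names `trunc_hour` and `hourly_buckets` are made up for this illustration, and the loop follows Query 1's rule of stepping from the truncated start hour to the truncated stop hour.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Sample rows from the question: (session_id, user_name, start_date, stop_date)
SESSIONS = [
    (1, "joe",  datetime(2017, 4, 20, 22, 42, 10), datetime(2017, 4, 21, 2, 42, 10)),
    (2, "matt", datetime(2017, 4, 20, 17, 43, 10), datetime(2017, 4, 20, 17, 59, 10)),
    (3, "matt", datetime(2017, 4, 20, 15, 42, 10), datetime(2017, 4, 20, 17, 42, 10)),
    (4, "joe",  datetime(2017, 4, 20, 11, 20, 10), datetime(2017, 4, 20, 16, 42, 10)),
    (5, "john", datetime(2017, 4, 20, 8, 42, 10),  datetime(2017, 4, 20, 11, 42, 10)),
    (6, "matt", datetime(2017, 4, 20, 7, 42, 10),  datetime(2017, 4, 20, 23, 42, 10)),
    (7, "joe",  datetime(2017, 4, 19, 23, 20, 10), datetime(2017, 4, 20, 1, 42, 10)),
]


def trunc_hour(ts):
    """Python stand-in for TRUNC(ts, 'HH24')."""
    return ts.replace(minute=0, second=0, microsecond=0)


def hourly_buckets(sessions):
    """Return {hour: (active_sessions, distinct_users)} for every touched hour."""
    sessions_per_hour = defaultdict(set)
    users_per_hour = defaultdict(set)
    for session_id, user_name, start, stop in sessions:
        hour = trunc_hour(start)
        last_hour = trunc_hour(stop)
        # One generated row per hour between start and stop, inclusive.
        while hour <= last_hour:
            sessions_per_hour[hour].add(session_id)
            users_per_hour[hour].add(user_name)
            hour += timedelta(hours=1)
    return {
        hour: (len(sessions_per_hour[hour]), len(users_per_hour[hour]))
        for hour in sorted(sessions_per_hour)
    }
```

Calling `hourly_buckets(SESSIONS)` should give the same 23 buckets as the result tables above, for example 3 sessions and 1 distinct user at 17:00 on 2017-04-20. Hours in which no session was open do not appear at all, which is also the case in both queries.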

I think something like this will work:
WITH ct ( active_dt ) AS (
-- Build the query for the "table" of hours
-- The sample data is from April 2017, so the calendar covers that range
SELECT DATE'2017-04-19' + (LEVEL-1)/24 AS active_dt FROM dual
CONNECT BY DATE'2017-04-19' + (LEVEL-1)/24 < DATE'2017-04-22'
)
SELECT active_dt AS "Date", active_hr AS "HR"
, COUNT(session_id) AS active_sessions
, COUNT(DISTINCT user_name) AS distinct_users
FROM (
SELECT TRUNC(ct.active_dt) AS active_dt
, TO_CHAR(ct.active_dt, 'HH24') AS active_hr
, ws.session_id, ws.user_name
FROM ct LEFT JOIN web_session ws
ON ct.active_dt + 1/24 > ws.start_date -- session starts before the hour ends
AND ct.active_dt < ws.stop_date -- session stops after the hour begins
) GROUP BY active_dt, active_hr
ORDER BY active_dt DESC, active_hr DESC;
I may not have the conditions for the LEFT JOIN 100% correct.
Hope this helps.
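As a rough illustration of the calendar approach, the sketch below builds the list of hours first and then counts overlapping sessions for each one, so hours with no sessions still show up with a count of 0 (the effect of the LEFT JOIN). It is plain Python rather than SQL, uses only three of the question's sample rows, and assumes the overlap test "session starts before the hour ends and stops after the hour begins". The function name `calendar_counts` is hypothetical.

```python
from datetime import datetime, timedelta

# Three of the question's sample rows: (session_id, user_name, start_date, stop_date)
SESSIONS = [
    (4, "joe",  datetime(2017, 4, 20, 11, 20, 10), datetime(2017, 4, 20, 16, 42, 10)),
    (5, "john", datetime(2017, 4, 20, 8, 42, 10),  datetime(2017, 4, 20, 11, 42, 10)),
    (7, "joe",  datetime(2017, 4, 19, 23, 20, 10), datetime(2017, 4, 20, 1, 42, 10)),
]


def calendar_counts(sessions, first_hour, end_exclusive):
    """Return {hour: (active_sessions, distinct_users)} for every calendar hour."""
    one_hour = timedelta(hours=1)
    result = {}
    hour = first_hour
    while hour < end_exclusive:
        # Assumed overlap test: starts before the hour ends, stops after it begins.
        active = [
            (session_id, user_name)
            for session_id, user_name, start, stop in sessions
            if start < hour + one_hour and stop > hour
        ]
        result[hour] = (len(active), len({user for _, user in active}))
        hour += one_hour
    return result
```

With `calendar_counts(SESSIONS, datetime(2017, 4, 19), datetime(2017, 4, 22))` every one of the 72 hours gets an entry, which is convenient for a line chart because gaps are drawn as zeros instead of being skipped.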

Matt,
What you need to do is generate a time dimension either as a static table or dynamically at run time:
create table time_dim (
ts date primary key,
year number not null,
month number not null,
day number not null,
wday number not null,
dy varchar2(3) not null,
hr number not null
);
insert into time_dim (ts, year, month, day, wday, dy, hr)
select ts
, extract(year from ts) year
, extract(month from ts) month
, extract(day from ts) day
, to_char(ts,'d') wday
, to_char(ts,'dy') dy
, to_number(to_char(ts,'HH24')) hr
from (
select DATE '2017-01-01' + (level - 1)/24 ts
FROM DUAL connect by level <= 365*24) a;
Then outer join that to your web_session table:
select t.ts, t.year, t.month, t.wday, t.dy, t.hr
, count(session_id) sessions
, count(distinct user_name) users
from time_dim t
left join web_session w
on t.ts between trunc(w.start_date, 'hh24') and w.stop_date
where trunc(t.ts) between date '2017-04-19' and date '2017-04-21'
group by rollup (t.year, t.month, (t.wday, t.dy), (t.hr, t.ts));
You can change up the group by clause to get the various aggregates you're interested in.
In the above code, I'm truncating the start_date to the hour in the ON clause so that the start hour is included in the results. Otherwise, sessions that don't start exactly at the top of the hour would not be counted in that hour.
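The effect of the truncation can be checked with a small Python sketch (an illustration only, not part of the SQL). It takes session 2 from the question, which starts and stops inside the 17:00 hour, and tests the BETWEEN condition with and without truncating the start. The helper `trunc_hour` is a made-up stand-in for trunc(..., 'hh24').

```python
from datetime import datetime


def trunc_hour(ts):
    """Python stand-in for trunc(ts, 'hh24')."""
    return ts.replace(minute=0, second=0, microsecond=0)


# Session 2 from the question, and the time_dim row for its hour.
start_date = datetime(2017, 4, 20, 17, 43, 10)
stop_date = datetime(2017, 4, 20, 17, 59, 10)
ts = datetime(2017, 4, 20, 17, 0, 0)

# ts BETWEEN start_date AND stop_date: no hourly tick falls inside the session.
matched_raw = start_date <= ts <= stop_date

# ts BETWEEN trunc(start_date, 'hh24') AND stop_date: the 17:00 row now matches.
matched_truncated = trunc_hour(start_date) <= ts <= stop_date
```

Here `matched_raw` is False and `matched_truncated` is True, so without the truncation this short session would not be counted in any hour.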

Related

SQL - Calculate number of occurrences of previous day?

For each day, I want to calculate the number of people who also had an occurrence on the previous day, but I'm not sure how to do this.
Sample Table:
| ID | Date |
+----+-----------+
| 1 | 1/10/2020 |
| 1 | 1/11/2020 |
| 2 | 2/20/2020 |
| 3 | 2/20/2020 |
| 3 | 2/21/2020 |
| 4 | 2/23/2020 |
| 4 | 2/24/2020 |
| 5 | 2/22/2020 |
| 5 | 2/23/2020 |
| 5 | 2/24/2020 |
+----+-----------+
Desired Output:
| Date | Count |
+-----------+-------+
| 1/11/2020 | 1 |
| 2/21/2020 | 1 |
| 2/23/2020 | 1 |
| 2/24/2020 | 2 |
+-----------+-------+
Edit: Added desired output. The output count should be unique to the ID, not the number of date occurrences. i.e. an ID 5 can appear on this list 10 times for dates 2/23/2020 and 2/24/2020, but that would count as "1".
Use lag():
select date, count(*)
from (select t.*, lag(date) over (partition by id order by date) as prev_date
from t
) t
where prev_date = dateadd(day, -1, date)
group by date;
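For comparison, here is a small Python sketch of the same idea on the sample table. It is not a translation of lag(): it simply checks, for each (ID, date) pair, whether the same ID also appears on the previous calendar day, which is what the query's filter ends up selecting. The function name `returning_ids_per_day` is invented for this example.

```python
from collections import Counter
from datetime import date, timedelta

# Sample table from the question: (ID, Date)
ROWS = [
    (1, date(2020, 1, 10)), (1, date(2020, 1, 11)),
    (2, date(2020, 2, 20)),
    (3, date(2020, 2, 20)), (3, date(2020, 2, 21)),
    (4, date(2020, 2, 23)), (4, date(2020, 2, 24)),
    (5, date(2020, 2, 22)), (5, date(2020, 2, 23)), (5, date(2020, 2, 24)),
]


def returning_ids_per_day(rows):
    """Count, per day, the IDs that also have a row on the previous day."""
    seen = set(rows)  # a set also collapses repeated (ID, date) rows
    counts = Counter()
    for person_id, day in seen:
        if (person_id, day - timedelta(days=1)) in seen:
            counts[day] += 1
    return dict(sorted(counts.items()))
```

On the sample rows this yields one ID each for 1/11/2020, 2/21/2020 and 2/23/2020, and two IDs for 2/24/2020, matching the desired output.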

Counting events only once if an event happens more than once every X minutes

I have a table that gets a new row every time a user starts a session in my app. But I don't want to count a user's session more than once if they start another one within 10 minutes. How can I do it?
Here's an example of what is returned from the table
select
*
from table
limit 100
+----------+--------+---------+----------------+
| event_ID | userid | city_id | created_at |
+----------+--------+---------+----------------+
| 1 | a | 1 | 15/08/19 10:10 |
| 2 | b | 1 | 15/08/19 10:11 |
| 3 | a | 1 | 15/08/19 10:14 |
| 4 | a | 1 | 15/08/19 10:25 |
| 5 | b | 1 | 15/08/19 10:27 |
| 6 | c | 1 | 15/08/19 10:30 |
| 7 | c | 1 | 15/08/19 10:35 |
| 8 | d | 1 | 15/08/19 10:40 |
| 9 | d | 1 | 15/08/19 10:49 |
| 10 | c | 1 | 15/08/19 10:55 |
+----------+--------+---------+----------------+
In the end, I want to count the events for each user, where events by the same user that happen within 10 minutes of each other count only once.
So it should be something like this in the end:
+--------+------------------+
| userid | unique_event_ids |
+--------+------------------+
| a | 2 |
| b | 2 |
| c | 2 |
| d | 1 |
+--------+------------------+
+--------+------------------+
| Total | 7 |
+--------+------------------+
Any suggestion on how to start?
Use lag() to determine when the previous event was created for the user. Then some date filtering and aggregation:
select userid, count(*)
from (select t.*,
lag(created_at) over (partition by userid order by created_at) as prev_created_at
from t
) t
where prev_created_at is null or prev_created_at < created_at - interval '10 minute'
group by userid
I would do:
select
userid,
sum(case when created_at - interval '10 minute' < prev then 0 else 1 end)
as unique_events_ids
from (
select
*,
lag(created_at) over(partition by userid order by created_at) as prev
from t
) x
group by userid
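The filtering logic of the first query can be sketched in Python as follows. This is only an illustration: the dates are assumed to be day/month/year (15 August 2019), the function name `count_spaced_events` is made up, and, like lag(), it compares each event with the user's previous event, whether or not that previous event was itself counted.

```python
from collections import Counter
from datetime import datetime, timedelta

# Sample rows from the question: (event_ID, userid, created_at)
EVENTS = [
    (1, "a", datetime(2019, 8, 15, 10, 10)), (2, "b", datetime(2019, 8, 15, 10, 11)),
    (3, "a", datetime(2019, 8, 15, 10, 14)), (4, "a", datetime(2019, 8, 15, 10, 25)),
    (5, "b", datetime(2019, 8, 15, 10, 27)), (6, "c", datetime(2019, 8, 15, 10, 30)),
    (7, "c", datetime(2019, 8, 15, 10, 35)), (8, "d", datetime(2019, 8, 15, 10, 40)),
    (9, "d", datetime(2019, 8, 15, 10, 49)), (10, "c", datetime(2019, 8, 15, 10, 55)),
]


def count_spaced_events(events, gap=timedelta(minutes=10)):
    """Count events per user whose previous event is missing or older than gap."""
    counts = Counter()
    previous = {}  # userid -> created_at of that user's previous event
    for _, userid, created_at in sorted(events, key=lambda e: (e[1], e[2])):
        prev = previous.get(userid)
        # Same test as: prev_created_at is null or prev_created_at < created_at - 10 min
        if prev is None or prev < created_at - gap:
            counts[userid] += 1
        previous[userid] = created_at
    return dict(counts)
```

On the sample rows this gives 2 for a, b and c and 1 for d, 7 in total. One difference between the two queries is worth knowing: an event exactly 10 minutes after the previous one is skipped by the first query but counted by the second.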

Count of records between min date range and other date

I'm trying to get the count of records of users who appear between a certain date range, specifically the min(date) for each unique user and that min(date) + 14 days. I've checked this link
SQL HAVING BETWEEN a date range
but it's not what I'm looking for. Here's an example of what I'm working with and what I've tried to do
+----+------------+
| ID | ServiceDt |
+----+------------+
| 10 | 2017-03-02 |
| 10 | 2017-03-05 |
| 10 | 2017-03-06 |
| 10 | 2017-03-14 |
| 10 | 2017-03-27 |
| 11 | 2017-03-10 |
| 11 | 2017-03-19 |
| 11 | 2017-04-02 |
| 11 | 2017-04-14 |
| 11 | 2017-04-23 |
| .. | .. |
The query is:
SELECT ID, COUNT(ServiceDt) AS date_count
FROM (
SELECT ID, ServiceDt
FROM tbl
GROUP BY ID, ServiceDt
HAVING ServiceDt BETWEEN MIN(ServiceDt) AND DATEADD(day, +14, MIN(ServiceDt))
) AS R1
GROUP BY ID
When I do the above query I get the following result.
+----+------------+
| ID | date_count |
+----+------------+
| 10 | 5 |
| 11 | 5 |
| .. | .. |
I also tried using CONVERT(date, ...), but I get the same resulting table above. I want the result to be
+----+------------+
| ID | date_count |
+----+------------+
| 10 | 4 |
| 11 | 2 |
| .. | .. |
Can someone please guide me on what I can do to get my desired output, thanks
Use window functions:
select id, count(*)
from (select t.*, min(servicedt) over (partition by id) as min_sd
from tbl t
) t
where servicedt <= dateadd(day, 14, min_sd)
group by id;
Another option is to use cross apply() to get the first ServiceDt for each id and use that in your where clause.
select id, count(*) as date_count
from t
cross apply (
select top 1
i.ServiceDt
from t i
where i.Id = t.Id
order by i.ServiceDt
) x
where t.ServiceDt <= dateadd(day,14,x.ServiceDt)
group by id
rextester demo: http://rextester.com/WXA46698
returns:
+----+------------+
| id | date_count |
+----+------------+
| 10 | 4 |
| 11 | 2 |
+----+------------+
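Both answers follow the same two steps: find each ID's first ServiceDt, then count the rows that fall within 14 days of it. A minimal Python sketch of those steps, using only the rows shown in the question (the elided rows are left out) and a hypothetical function name `count_within_first_days`:

```python
from datetime import date, timedelta

# Rows shown in the question: (ID, ServiceDt)
ROWS = [
    (10, date(2017, 3, 2)), (10, date(2017, 3, 5)), (10, date(2017, 3, 6)),
    (10, date(2017, 3, 14)), (10, date(2017, 3, 27)),
    (11, date(2017, 3, 10)), (11, date(2017, 3, 19)), (11, date(2017, 4, 2)),
    (11, date(2017, 4, 14)), (11, date(2017, 4, 23)),
]


def count_within_first_days(rows, days=14):
    """Count rows per ID dated no later than that ID's first date plus `days`."""
    # Step 1: first service date per ID (the windowed MIN / TOP 1 lookup).
    first_date = {}
    for row_id, service_dt in rows:
        if row_id not in first_date or service_dt < first_date[row_id]:
            first_date[row_id] = service_dt
    # Step 2: keep rows inside the window and count them per ID.
    counts = {}
    for row_id, service_dt in rows:
        if service_dt <= first_date[row_id] + timedelta(days=days):
            counts[row_id] = counts.get(row_id, 0) + 1
    return counts
```

For the rows above this returns 4 for ID 10 and 2 for ID 11. The original query returned 5 for both because grouping by ID and ServiceDt makes MIN(ServiceDt) equal to the row's own date, so every row passes the HAVING test.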

How to partition by a customized sum value?

I have a table with the following columns: customer_id, event_date_time
I'd like to figure out how many times a customer triggers an event every 12 hours from the start of an event. In other words, aggregate the time between events for up to 12 hours by customer.
For example, if a customer triggers an event (in order) at noon, 1:30pm, 5pm, 2am, and 3pm, I would want to return the noon, 2am, and 3pm record.
I've written this query:
select
cust_id,
event_datetime,
nvl(24*(event_datetime - lag(event_datetime) over (partition BY cust_id ORDER BY event_datetime)),0) as difference
from
tbl
I feel like I'm close with this. Is there a way to add something like
over (partition BY cust_id, sum(difference)<12 ORDER BY event_datetime)
EDIT: I'm adding some sample data:
+---------+-----------------+-------------+---+
| cust_id | event_datetime | DIFFERENCE | X |
+---------+-----------------+-------------+---+
| 1 | 6/20/2015 23:35 | 0 | x |
| 1 | 6/21/2015 0:09 | 0.558611111 | |
| 1 | 6/21/2015 0:49 | 0.667777778 | |
| 1 | 6/21/2015 1:30 | 0.688333333 | |
| 1 | 6/21/2015 9:38 | 8.133055556 | |
| 1 | 6/21/2015 10:09 | 0.511111111 | |
| 1 | 6/21/2015 10:45 | 0.600555556 | |
| 1 | 6/21/2015 11:09 | 0.411111111 | |
| 1 | 6/21/2015 11:32 | 0.381666667 | |
| 1 | 6/21/2015 11:55 | 0.385 | x |
| 1 | 6/21/2015 12:18 | 0.383055556 | |
| 1 | 6/21/2015 12:23 | 0.074444444 | |
| 1 | 6/22/2015 10:01 | 21.63527778 | x |
| 1 | 6/22/2015 10:24 | 0.380555556 | |
| 1 | 6/22/2015 10:46 | 0.373611111 | |
+---------+-----------------+-------------+---+
The "x" are the records that should be pulled since they're the first records in the 12 hour block.
If I understand correctly, you want the first record in each 12-hour block where the blocks of time are defined by the first event time.
If so, you need to modify your query to get the difference from the first time for each customer. The rest is just arithmetic. The query would look something like this:
with t as (
select cust_id, event_datetime,
-- hours elapsed since the customer's first event
24 * (event_datetime -
min(event_datetime) over (partition by cust_id)
) as difference
from tbl
)
select t.*
from (select t.*,
row_number() over (partition by cust_id, floor(difference / 12)
order by difference) as seqnum
from t
) t
where seqnum = 1;
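The block arithmetic can be tried out on the question's sample data with a short Python sketch. It is an illustration under the same reading of the problem as the query (blocks are fixed 12-hour windows counted from each customer's first event), and the function name `first_event_per_block` is made up.

```python
from datetime import datetime

# Sample rows from the question: (cust_id, event_datetime)
EVENTS = [(1, datetime(2015, 6, d, h, m)) for d, h, m in [
    (20, 23, 35), (21, 0, 9), (21, 0, 49), (21, 1, 30), (21, 9, 38),
    (21, 10, 9), (21, 10, 45), (21, 11, 9), (21, 11, 32), (21, 11, 55),
    (21, 12, 18), (21, 12, 23), (22, 10, 1), (22, 10, 24), (22, 10, 46),
]]


def first_event_per_block(events, block_hours=12):
    """Return the earliest event in each fixed block, per customer."""
    first_seen = {}
    for cust_id, ts in events:
        if cust_id not in first_seen or ts < first_seen[cust_id]:
            first_seen[cust_id] = ts
    earliest = {}  # (cust_id, block number) -> earliest event in that block
    for cust_id, ts in events:
        hours = (ts - first_seen[cust_id]).total_seconds() / 3600
        key = (cust_id, int(hours // block_hours))  # floor(difference / 12)
        if key not in earliest or ts < earliest[key]:
            earliest[key] = ts
    return sorted((cust_id, ts) for (cust_id, _), ts in earliest.items())
```

For customer 1 this picks 6/20/2015 23:35, 6/21/2015 11:55 and 6/22/2015 10:01, the three rows marked "x". If each block should instead restart at the previously selected event, the result can differ on other data, and that variant needs an iterative or recursive approach.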

Oracle SQL Count of a field Grouped by another field

I have a small query below in which I want a count of HOUSE_IDs grouped by LOCATION_ID. However, it is not grouping by LOCATION_ID because the HOUSE_IDs are different. I want it to count the HOUSE_IDs per LOCATION_ID, regardless of the HOUSE_ID value.
QUERY
SELECT
COUNT(HOUSE_ID) AS Count,
LOCATION_ID,
ZONE,
AREA
FROM TABLE
WHERE SITE_ID = 'ABC'
AND LOCATION_ID NOT LIKE ('%LAND%')
GROUP BY LOCATION_ID, HOUSE_ID, ZONE, AREA
Expected Result
_____________________________
|Count|LOCATION_ID|ZONE|AREA|
|¯¯¯¯¯|¯¯¯¯¯¯¯¯¯¯¯|¯¯¯¯|¯¯¯¯|
| 4 | LOCA | 2 | 1 |
| 7 | LOCB | 6 | 2 |
| 3 | LOCC | 3 | 1 |
| 9 | LOCD | 5 | 7 |
| 6 | LOCE | 7 | 4 |
| 2 | LOCF | 2 | 1 |
| 8 | LOCG | 7 | 5 |
| 7 | LOCH | 9 | 1 |
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
Actual Result
_____________________________
|Count|LOCATION_ID|ZONE|AREA|
|¯¯¯¯¯|¯¯¯¯¯¯¯¯¯¯¯|¯¯¯¯|¯¯¯¯|
| 1 | LOCA | 2 | 1 |
| 1 | LOCA | 6 | 2 |
| 1 | LOCA | 3 | 1 |
| 1 | LOCA | 5 | 7 |
| 1 | LOCA | 7 | 4 |
| 1 | LOCA | 2 | 1 |
| 1 | LOCA | 7 | 5 |
| 1 | LOCA | 9 | 1 |
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
You have to remove HOUSE_ID from the GROUP BY clause.
Without source data I can only guess that you also need aggregate functions for the ZONE and AREA columns, MAX for example. Try the solution below:
SELECT
COUNT(HOUSE_ID) AS Count,
LOCATION_ID,
MAX(ZONE),
MAX(AREA)
FROM TABLE
WHERE SITE_ID = 'ABC'
AND LOCATION_ID NOT LIKE ('%LAND%')
GROUP BY LOCATION_ID
Got it, Needed to Count(*)!!
SELECT
COUNT(*) AS Count,
SUM(AREA) AS AREA,
LOCATION_ID,
ZONE
FROM TABLE
WHERE SITE_ID = 'ABC'
AND LOCATION_ID NOT LIKE ('%LAND%')
GROUP BY LOCATION_ID, ZONE
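Why the original query returned a count of 1 on every row can be shown with a tiny Python sketch. The rows below are hypothetical, since the question does not include source data: grouping by both LOCATION_ID and HOUSE_ID creates one group per house, while grouping by LOCATION_ID alone collects all houses of a location into one group.

```python
from collections import Counter

# Hypothetical rows (not from the question): (LOCATION_ID, HOUSE_ID)
ROWS = [
    ("LOCA", "H1"), ("LOCA", "H2"), ("LOCA", "H3"),
    ("LOCB", "H4"),
]

# GROUP BY LOCATION_ID, HOUSE_ID: one group per distinct pair, so each count is 1.
per_location_and_house = Counter(ROWS)

# GROUP BY LOCATION_ID: one group per location, counting all of its rows.
per_location = Counter(location_id for location_id, _ in ROWS)
```

Here every value in `per_location_and_house` is 1, while `per_location` holds 3 for LOCA and 1 for LOCB. The same applies to any other column left in the GROUP BY list: each distinct combination becomes its own output row.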