SQL for Microsoft Access 2013 - sql

I am trying to get a count of customers who return for additional services, taking into account a comparison of the assessment scores recorded before each service, grouped by the type of service. (Ultimately, I also want to be able to ignore returns that are within a month of the original service, but I'm pretty sure I can get that wrinkle sorted myself.)
When counting results for a particular service, it should look at returns to any service type, not just the original service type. (Edit: *It should also look at all future returns, not just the next or the most recent.*)
It does not need to be run often, but there are 15000+ lines of data and computational resources are limited by an underpowered machine (this is for a nonprofit organization), so efficiency would be nice but not absolutely needed.
Sample Data
ServiceTable
CustomerID Service Date ScoreBefore
A Service1 1/1/2017 1
A Service2 1/3/2017 1
A Service1 1/1/2018 4
B Service3 3/1/2018 3
B Service1 6/1/2018 1
B Service1 6/2/2018 1
C Service2 1/1/2019 4
C Service2 6/1/2019 1
Results should be (not taking into account the date padding option):
Service1
ReturnedWorse 0
ReturnedSame 2
ReturnedBetter 1
Service2
ReturnedWorse 1
ReturnedSame 0
ReturnedBetter 1
Service3
ReturnedWorse 2
So far, I have tried creating make table queries that could then be queried to get the aggregate info, but I am a bit stuck and suspect there may be a better route.
What I have tried:
SELECT CustomerID, Service, Date, ScoreBefore INTO ReturnedWorse
FROM ServiceTable AS FirstStay
WHERE ((((SELECT COUNT(*)
FROM ServiceTable AS SecondStay
WHERE FirstStay.CustomerID=SecondStay.CustomerID
AND
FirstStay.ScoreBefore> SecondStay.ScoreBefore
AND
SecondStay.Date > FirstStay.Date))));
Any help would be greatly appreciated.

This would have been easier to do with window functions, but they are not available in MS Access.
Here is a query that implements my understanding of your question:
- t0: pick a record in the table (a customer buying a service)
- t1: pull out the record for the next time the same customer contracted any service, using an INNER JOIN and a correlated subquery (if there is no such record, the initial record is not counted)
- compare the score of the earlier record (t0) to the score of the later one (t1)
- group the results by service
You can see it in action in this db fiddle. The results are slightly different from your expectation (see my comments), but they are consistent with the above explanation; you might want to adapt some of the rules to match your exact expected result, using the same principles. Note that Access SQL has no CASE expression, so in Access each CASE WHEN condition THEN 1 ELSE 0 END below has to be written as IIf(condition, 1, 0).
SELECT
t0.service,
SUM(CASE WHEN t1.scorebefore < t0.scorebefore THEN 1 ELSE 0 END) AS ReturnedWorse,
SUM(CASE WHEN t1.scorebefore = t0.scorebefore THEN 1 ELSE 0 END) AS ReturnedSame,
SUM(CASE WHEN t1.scorebefore > t0.scorebefore THEN 1 ELSE 0 END) AS ReturnedBetter
FROM
mytable t0
INNER JOIN mytable t1
ON t0.customerid = t1.customerid
AND t0.date < t1.date
AND NOT EXISTS (
SELECT 1
from mytable
WHERE
customerid = t1.customerid
AND date < t1.date
AND date > t0.date
)
GROUP BY t0.service
| service | ReturnedWorse | ReturnedSame | ReturnedBetter |
| -------- | ------------- | ------------ | -------------- |
| Service1 | 0 | 2 | 0 |
| Service2 | 1 | 0 | 1 |
| Service3 | 1 | 0 | 0 |
From your comments, I understand that you want to take into account all future returns and not only the next one. This eliminates the need for a correlated subquery, and actually yields your expected output. See this db fiddle:
SELECT
t0.service,
SUM(CASE WHEN t1.scorebefore < t0.scorebefore THEN 1 ELSE 0 END) AS ReturnedWorse,
SUM(CASE WHEN t1.scorebefore = t0.scorebefore THEN 1 ELSE 0 END) AS ReturnedSame,
SUM(CASE WHEN t1.scorebefore > t0.scorebefore THEN 1 ELSE 0 END) AS ReturnedBetter
FROM
mytable t0
INNER JOIN mytable t1
ON t0.customerid = t1.customerid
-- AND t0.service = t1.service
AND t0.date < t1.date
GROUP BY t0.service
| service | ReturnedWorse | ReturnedSame | ReturnedBetter |
| -------- | ------------- | ------------ | -------------- |
| Service1 | 0 | 2 | 1 |
| Service2 | 1 | 0 | 1 |
| Service3 | 2 | 0 | 0 |
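As a quick check of the second query against the sample data from the question, here is a minimal sketch that runs it in an in-memory SQLite database from Python. SQLite stands in for the fiddle's database here, not for Access; the table and column names follow the answer's query, and the sample dates are assumed to be month/day/year and are rewritten as ISO strings so that they compare correctly as text.

```python
import sqlite3

# Sample rows from the question; dates assumed to be M/D/YYYY, stored as ISO text.
ROWS = [
    ("A", "Service1", "2017-01-01", 1),
    ("A", "Service2", "2017-01-03", 1),
    ("A", "Service1", "2018-01-01", 4),
    ("B", "Service3", "2018-03-01", 3),
    ("B", "Service1", "2018-06-01", 1),
    ("B", "Service1", "2018-06-02", 1),
    ("C", "Service2", "2019-01-01", 4),
    ("C", "Service2", "2019-06-01", 1),
]

# Self-join: pair every visit (t0) with every later visit (t1) of the same customer,
# then count the pairs per original service with conditional aggregation.
QUERY = """
SELECT t0.service,
       SUM(CASE WHEN t1.scorebefore < t0.scorebefore THEN 1 ELSE 0 END) AS ReturnedWorse,
       SUM(CASE WHEN t1.scorebefore = t0.scorebefore THEN 1 ELSE 0 END) AS ReturnedSame,
       SUM(CASE WHEN t1.scorebefore > t0.scorebefore THEN 1 ELSE 0 END) AS ReturnedBetter
FROM mytable t0
INNER JOIN mytable t1
        ON t0.customerid = t1.customerid
       AND t0.date < t1.date
GROUP BY t0.service
ORDER BY t0.service
"""


def count_returns(rows):
    """Load the rows into a throwaway database and run the aggregate query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable "
        "(customerid TEXT, service TEXT, date TEXT, scorebefore INTEGER)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


results = count_returns(ROWS)
for row in results:
    print(row)
```

Each tuple is (service, ReturnedWorse, ReturnedSame, ReturnedBetter), so the output can be compared line by line with the table above.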

Related

Count specific values that belong to the same column SQL Server

I would like to ask for help. I would like to count the number of records, but with a condition on 2 specific values from a single column:
Count the records first:
count(service) as contacts
service | state | address
service1| 1 |123
service2| 1 |321
service3| 3 |332
service1| 2 |333
service2| 2 |111
service3| 3 |333
1st result
Service | Contacts | status
service1| 1 | 1
service1| 1 | 2
service1| 1 | 3
service2| 1 | 1
service2| 1 | 2
service2| 1 | 3
If status = 1 or 2, then add to the count, else 0 (only count rows whose "status" is equal to 1 or 2).
Result:
Final result
Service | Contacts
Service1 | 2
Service2 | 2
sorry for the confusion
Thanks for your big help
This should work for all major databases
SELECT service,
SUM(CASE WHEN status IN(1,2) THEN contacts ELSE 0 END) as Contacts
FROM (your query) as x
GROUP BY service
select service,
sum(case when status in(1,2) then contacts else 0 end) as contact_count
from MyTable
group by service
What this will do is evaluate each row and include the contacts value in the sum only where the status is one of the required values. As it is an aggregate, you need to group by the service.
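To make the behaviour concrete, here is a small sketch using Python's built-in sqlite3 module and the sample rows from the question. It assumes the raw table is called MyTable with columns service and status (the question's sample header says "state"), and it counts rows directly instead of summing a precomputed contacts column. It also compares the conditional sum with a plain WHERE filter, which differs only in whether services with no matching rows still appear.

```python
import sqlite3

# Sample rows from the question: (service, status)
ROWS = [
    ("service1", 1), ("service2", 1), ("service3", 3),
    ("service1", 2), ("service2", 2), ("service3", 3),
]


def run(sql):
    """Run one query against a fresh in-memory copy of the sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (service TEXT, status INTEGER)")
    conn.executemany("INSERT INTO MyTable VALUES (?, ?)", ROWS)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


# Conditional aggregation: every service is listed, non-matching rows add 0.
conditional = run("""
    SELECT service,
           SUM(CASE WHEN status IN (1, 2) THEN 1 ELSE 0 END) AS contact_count
    FROM MyTable
    GROUP BY service
    ORDER BY service
""")

# Filtering first: services without any matching row disappear from the result.
filtered = run("""
    SELECT service, COUNT(*) AS contact_count
    FROM MyTable
    WHERE status IN (1, 2)
    GROUP BY service
    ORDER BY service
""")

print(conditional)
print(filtered)
```

The filtered form matches the "Final result" in the question, which omits service3; the conditional form is the one to use when other aggregates over all rows are needed in the same query.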

Most efficient way to count number of matching rows with multiple criterias at once

I have a very large table (called device_operation, with 50 million rows) which holds all the operations of a product in its lifecycle (such as "start", "stop", "refill", ...) and the status of these operations (column status: Success, Failed), with the ID of the associated device (column device_id) and a timestamp for each operation (column create_date).
Something like this :
/------+-----------+------------------+---------\
| ID | Device ID | Create_Date | Status |
+------+-----------+------------------+---------+
| 1 | 1 | 2012-03-04 01:43 | Success |
| 2 | 4 | 2012-04-04 02:34 | Failed |
| 3 | 9 | 2013-01-01 01:23 | Failed |
| 4 | 4 | 2013-12-12 12:34 | Success |
| 5 | 23 | 2014-02-01 03:45 | Success |
| 6 | 1 | 2014-05-03 08:34 | Failed |
\------+-----------+------------------+---------/
I also have another table (called subscription) that tells me when the warranty has started (row create_date) for the product (row device_id). Warranty lasts one year.
/-----------+------------------\
| Device ID | Create_Date |
+-----------+------------------+
| 2 | 2011-04-03 05:00 |
| 4 | 2012-03-05 03:45 |
| 5 | 2012-03-05 06:07 |
| ... | ... |
\-----------+------------------/
I am using PostgreSQL.
I want to do the following :
List all device IDs which had at least one successful operation before a given date (2014-07-06)
For each of those devices, count :
The number of failed operations after that date + 2 days (2014-07-08), and the device was under warranty when the operation was attempted
The number of failed operations after that date + 2 days (2014-07-08), and the device was outside warranty when the operation was attempted
The number of successful operations after that date (device being under warranty or not)
I had some limited success with the following (the query has been simplified a little for readability: there are other joins involved to get to the subscription table, and other criteria for including devices in the list):
SELECT distinct device_operation.device_id as did, subscription.create_date,
(
SELECT COUNT(*)
FROM device_operation dop
WHERE dop.device_id = device_operation.device_id and
dop.create_date > '2014-07-08' and
dop.status = 'Success'
) as success,
(
SELECT COUNT(*)
FROM device_operation dop2
WHERE
dop2.device_id = subscription.device_id and
dop2.create_date > '2014-07-08' and
dop2.status = 'Failed' and
dop2.create_date <= subscription.create_date + interval '1 year'
) as failed_during_warranty,
(
SELECT COUNT(*)
FROM device_operation dop2
WHERE
dop2.device_id = subscription.device_id and
dop2.create_date > '2014-07-08' and
dop2.status = 'Failed' and
dop2.create_date > subscription.create_date + interval '1 year'
) as failed_after_warranty
FROM device_operation, subscription
WHERE
device_operation.status = 'Success' and -- list operations which are successful
device_operation.create_date <= '2014-07-06' and -- list operations before that date
device_operation.device_id = subscription.device_id -- get warranty start for each operation
ORDER BY success DESC, failed_during_warranty DESC, failed_after_warranty DESC
As you can guess, it's so slow I cannot run the query. However it gives you an idea of the structure.
I have tried to use NULLIF to combine the requests into one, in the hope it's going to make PostgreSQL only list the subquery once instead of 3, but it returns "subquery must return only one column" :
SELECT distinct device_operation.device_id as did, subscription.create_date,
(
SELECT COUNT(NULLIF(dop2.status != 'Success', true)) as completed,
COUNT(NULLIF(dop2.status != 'Failed' or not (dop2.create_date <= subscription.create_date + interval '1 year'), true)) as failed_in_warranty,
COUNT(NULLIF(dop2.status != 'Failed' or (dop2.create_date <= subscription.create_date + interval '1 year'), true)) as failed_after_warranty
FROM device_operation dop2
WHERE
dop2.device_id = device_operation.device_id and
dop2.device_id = subscription.device_id and
dop2.create_date > '2014-07-08'
) as subq
FROM device_operation, subscription
WHERE
device_operation.status = 'Success' and -- list operations which are successful
device_operation.create_date <= '2014-07-06' and -- list operations before that date
device_operation.device_id = subscription.device_id -- get warranty start for each operation
ORDER BY success DESC, failed_in_warranty DESC, failed_outside_warranty DESC
I also tried to move the subquery to the FROM clause but that doesn't work as I need to run the subquery for each row of the main query (or do I ? maybe there's a better way)
What I expect is something like this :
/-----------+---------+------------------------+-----------------------\
| Device ID | Success | Failed during warranty | Failed after warranty |
+-----------+---------+------------------------+-----------------------+
| 194853 | 10 | 0 | 0 |
| 7853 | 5 | 5 | 0 |
| 5848 | 3 | 0 | 56 |
| 8546455 | 0 | 45 | 0 |
| 102 | 0 | 4 | 1 |
| 69329548 | 0 | 0 | 9 |
| 17 | 0 | 0 | 0 |
\-----------+---------+------------------------+-----------------------+
Can someone help me find the most efficient way to do it ?
EDIT: Corner cases: You can consider all devices have an entry in subscription.
Thank you very much !
I think you just require conditional aggregation. I find the data structure and logic a bit hard to follow, but I think the following is basically what you need:
SELECT d.device_id,
-- all failed operations after the given date + 2 days
SUM(CASE WHEN d.status = 'Failed' AND d.create_date > date '2014-07-06' + interval '2 day'
THEN 1 ELSE 0
END) as NumFails,
-- of those, the ones that happened after the one-year warranty had ended
SUM(CASE WHEN d.status = 'Failed' AND d.create_date > date '2014-07-06' + interval '2 day' AND
d.create_date > s.create_date + interval '1 year'
THEN 1 ELSE 0
END) as NumFailsNoWarranty,
-- successful operations after the same point in time
SUM(CASE WHEN d.status = 'Success' AND d.create_date > date '2014-07-06' + interval '2 day'
THEN 1 ELSE 0
END) as NumSuccesses
FROM device_operation d JOIN
subscription s
ON d.device_id = s.device_id
GROUP BY d.device_id
-- keep only devices with at least one successful operation before the given date
HAVING SUM(CASE WHEN d.status = 'Success' AND d.create_date <= '2014-07-06' THEN 1 ELSE 0 END) > 0;
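The single-pass idea can be tried on a toy data set. The sketch below is an adaptation, not the PostgreSQL query itself: it runs on SQLite through Python's sqlite3 module, stores timestamps as ISO text, and uses SQLite's datetime(x, '+1 year') where PostgreSQL would use + interval '1 year'. The three devices and their operations are invented purely for illustration, and the output columns follow the layout the question asks for (success, failed during warranty, failed after warranty).

```python
import sqlite3

# Invented sample data: (device_id, create_date, status)
OPERATIONS = [
    (1, "2014-05-01 00:00:00", "Success"),  # success before the cutoff
    (1, "2014-08-01 00:00:00", "Failed"),   # failed, still under warranty
    (1, "2014-09-01 00:00:00", "Success"),  # success after the cutoff
    (1, "2015-02-01 00:00:00", "Failed"),   # failed, warranty expired
    (2, "2014-01-01 00:00:00", "Success"),
    (2, "2014-07-10 00:00:00", "Failed"),   # warranty expired long ago
    (3, "2014-08-01 00:00:00", "Failed"),   # no earlier success: device excluded
]
# (device_id, warranty start)
SUBSCRIPTIONS = [
    (1, "2014-01-01 00:00:00"),
    (2, "2012-01-01 00:00:00"),
    (3, "2014-01-01 00:00:00"),
]

# One scan of the joined rows; each SUM(CASE ...) counts one category.
QUERY = """
SELECT d.device_id,
       SUM(CASE WHEN d.status = 'Success' AND d.create_date > :after
                THEN 1 ELSE 0 END) AS success,
       SUM(CASE WHEN d.status = 'Failed' AND d.create_date > :after
                 AND d.create_date <= datetime(s.create_date, '+1 year')
                THEN 1 ELSE 0 END) AS failed_during_warranty,
       SUM(CASE WHEN d.status = 'Failed' AND d.create_date > :after
                 AND d.create_date > datetime(s.create_date, '+1 year')
                THEN 1 ELSE 0 END) AS failed_after_warranty
FROM device_operation d
JOIN subscription s ON d.device_id = s.device_id
GROUP BY d.device_id
HAVING SUM(CASE WHEN d.status = 'Success' AND d.create_date <= :cutoff
                THEN 1 ELSE 0 END) > 0
ORDER BY d.device_id
"""


def device_report(cutoff, after):
    """Return (device_id, success, failed_during_warranty, failed_after_warranty)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE device_operation (device_id INTEGER, create_date TEXT, status TEXT)"
    )
    conn.execute("CREATE TABLE subscription (device_id INTEGER, create_date TEXT)")
    conn.executemany("INSERT INTO device_operation VALUES (?, ?, ?)", OPERATIONS)
    conn.executemany("INSERT INTO subscription VALUES (?, ?)", SUBSCRIPTIONS)
    rows = conn.execute(QUERY, {"cutoff": cutoff, "after": after}).fetchall()
    conn.close()
    return rows


report = device_report(cutoff="2014-07-06 00:00:00", after="2014-07-08 00:00:00")
print(report)
```

Because the counts come from one join and one GROUP BY, the table is read once rather than once per correlated subquery per row, which is where the original attempt loses its time.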

Postgres: How to use value of field in subquery

I have db table like this:
id | user | service | pay_date | status
---+------+---------+--------------+------------
1 | john | 1 | 2014-08-20 | 1
2 | john | 3 | 2014-07-24 | 0
I am trying to build a query which returns all users who have not paid for service 1 or service 3 between 2 dates and have no active services (status = 0 for service 1 and service 2). The problem is that the query returns service 3, but this is not correct because service 1 is active.
SELECT *
FROM services
WHERE date > 2014-07-01
AND date < 2014-07-30
AND service = 1 OR service = 3
My solution is to check whether service 3 is active while I am checking service 1. Something like this:
SELECT *
FROM services
WHERE date > 2014-07-01
AND date < 2014-07-30
AND ((service = 1 AND "SERVICE 3 IS NOT ACTIVE") OR service = 3)
but I can't check whether "SERVICE 3 IS NOT ACTIVE" for the current user.
I think you want NOT EXISTS, i.e. checking that another active record does not exist for the same user, possibly something like:
SELECT *
FROM services AS s
WHERE s.service IN (1, 3)
AND s.pay_date > '2014-07-01'
AND s.pay_date < '2014-07-30'
AND NOT EXISTS
( SELECT 1
FROM services AS s2
WHERE s2.user = s.user -- same user as the outer row
AND s2.status = 1
AND s2.service IN (1, 3)
AND s2.pay_date > '2014-07-01'
AND s2.pay_date < '2014-07-30'
);
There is no single record that meets all those criteria.
The id 1 record cannot be both service = 1 and service = 3 at the same time:
id | user | service | pay_date | status
---+------+---------+--------------+------------
1 | john | 1 | 2014-08-20 | 1
The id 2 record cannot be both service = 1 and service = 3 at the same time:
id | user | service | pay_date | status
---+------+---------+--------------+------------
2 | john | 3 | 2014-08-24 | 0 -- nb: I changed this date!
So your criteria require comparing record 1 to record 2 (or, more broadly, comparing "across records"). Below I use EXISTS to compare across records.
SELECT
*
FROM services
WHERE (pay_date >= '2014-08-01'
AND pay_date < '2014-08-30')
AND service = 1
AND EXISTS (
SELECT
NULL
FROM services s3
WHERE (s3.pay_date >= '2014-08-01'
AND s3.pay_date < '2014-08-30')
AND s3.service = 3
and s3.status = 1
AND s3.user = services.user
)
see SQLfiddle Demo
By the way, it is more conventional to express a date range with >= on the lower bound and < on the upper bound.

Selecting Historical Record Outside of Date Range

I have a situation where I need to select an address that was current for a particular date and time from an address history table. Some sample records might be as follows:
Address/Client JOIN Table (Address_Client_JOIN):
-------------------------
|AddressId | ClientId |
-------------------------
|5 | 8888887 |
-------------------------
|6 | 8888887 |
-------------------------
History Table (Address_History):
-------------------------------------------------------------------------------------------
|HistoryId | AddressId | AddTypeId | StreetAddress | CreatedDate | ModifiedDate |
-------------------------------------------------------------------------------------------
|1 | 5 | 1 | 123 Home Street| 2013-03-11 21:08 | 2013-04-02 13:18|
-------------------------------------------------------------------------------------------
|2 | 5 | 2 | 456 My Avenue | 2013-03-11 21:08 | 2013-04-08 15:00|
-------------------------------------------------------------------------------------------
|3 | 6 | 1 | 789 Cat Road | 2013-05-17 12:00 | 2013-05-17 12:00|
-------------------------------------------------------------------------------------------
The requirements for this query are that I have to grab the earliest record where @dateOfService falls between the CreatedDate and the ModifiedDate and where the AddTypeId is "1", if there is one, otherwise any other AddTypeId. The query I've created so far is:
SELECT TOP 1 ah.HistoryId, ah.AddTypeId, ah.AddressId, ah.StreetAddress,
ah.CreatedDate, ah.ModifiedDate
FROM Address_Client_JOIN acj WITH (NOLOCK)
INNER JOIN Address_History ah WITH (NOLOCK) ON ah.AddressId = acj.AddressId
WHERE acj.ClientId = @clientId
AND (ah.CreatedDate <= @dateOfService
AND (@dateOfService <= ah.ModifiedDate ))
ORDER BY
ah.HistoryId ASC, CASE WHEN ah.AddTypeId = 1 THEN 0 ELSE 1 END
This works fine as long as @dateOfService falls between the CreatedDate and ModifiedDate. However, when I've got a @dateOfService that occurs after the ModifiedDate, I get nothing, obviously. I need to be able to account for a situation where (using the above data) @dateOfService is after the ModifiedDate of 5/17/2013. For example, where @dateOfService = '2013-08-01 12:30'.
Thanks in advance.
You are only selecting the top row. That means that you can move the where filter into the order by clause. Then, it becomes a priority rather than a filter.
So, if nothing matches the filter, you will still be able to get a row. I think the query you want is something like:
SELECT TOP 1 ah.HistoryId, ah.AddTypeId, ah.AddressId, ah.StreetAddress,
ah.CreatedDate, ah.ModifiedDate
FROM Address_Client_JOIN acj WITH (NOLOCK) INNER JOIN
Address_History ah WITH (NOLOCK)
ON ah.AddressId = acj.AddressId
WHERE acj.ClientId = @clientId
ORDER BY -- the former WHERE filter comes first, so it acts as the top priority
(case when ah.CreatedDate <= @dateOfService AND @dateOfService <= ah.ModifiedDate then 1
when @dateOfService > ah.ModifiedDate then 2
else 3
end),
(CASE WHEN ah.AddTypeId = 1 THEN 0 ELSE 1 END),
ah.HistoryId ASC -- unique, so it must come last or the other keys never matter
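The "filter as priority" idea can be checked on the sample history rows with a short sketch. It is an adaptation under stated assumptions: it runs on SQLite through Python's sqlite3 module, so LIMIT 1 replaces TOP 1, the NOLOCK hints are dropped, timestamps are compared as ISO text, and named parameters replace the T-SQL variables. The date-match expression is placed first in the ORDER BY, followed by the address-type preference and then HistoryId as a tie-breaker.

```python
import sqlite3

# Sample rows from the question.
CLIENT_ADDRESSES = [(5, 8888887), (6, 8888887)]  # (AddressId, ClientId)
HISTORY = [
    # (HistoryId, AddressId, AddTypeId, StreetAddress, CreatedDate, ModifiedDate)
    (1, 5, 1, "123 Home Street", "2013-03-11 21:08", "2013-04-02 13:18"),
    (2, 5, 2, "456 My Avenue", "2013-03-11 21:08", "2013-04-08 15:00"),
    (3, 6, 1, "789 Cat Road", "2013-05-17 12:00", "2013-05-17 12:00"),
]

QUERY = """
SELECT ah.HistoryId, ah.StreetAddress
FROM Address_Client_JOIN acj
JOIN Address_History ah ON ah.AddressId = acj.AddressId
WHERE acj.ClientId = :client
ORDER BY
    -- 1: service date inside the validity window, 2: after it, 3: before it
    CASE WHEN ah.CreatedDate <= :dos AND :dos <= ah.ModifiedDate THEN 1
         WHEN :dos > ah.ModifiedDate THEN 2
         ELSE 3 END,
    -- prefer address type 1
    CASE WHEN ah.AddTypeId = 1 THEN 0 ELSE 1 END,
    -- earliest record as the final tie-breaker
    ah.HistoryId
LIMIT 1
"""


def address_for(client_id, date_of_service):
    """Return the best-matching (HistoryId, StreetAddress), or None if the client has no rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Address_Client_JOIN (AddressId INTEGER, ClientId INTEGER)")
    conn.execute(
        "CREATE TABLE Address_History (HistoryId INTEGER, AddressId INTEGER, "
        "AddTypeId INTEGER, StreetAddress TEXT, CreatedDate TEXT, ModifiedDate TEXT)"
    )
    conn.executemany("INSERT INTO Address_Client_JOIN VALUES (?, ?)", CLIENT_ADDRESSES)
    conn.executemany("INSERT INTO Address_History VALUES (?, ?, ?, ?, ?, ?)", HISTORY)
    row = conn.execute(QUERY, {"client": client_id, "dos": date_of_service}).fetchone()
    conn.close()
    return row


in_window = address_for(8888887, "2013-04-05 00:00")     # only record 2 is current
after_all = address_for(8888887, "2013-08-01 12:30")     # nothing is current
print(in_window, after_all)
```

With a service date after every ModifiedDate, all rows tie on the first key, so the type preference and then the lowest HistoryId decide. If the most recent address is wanted in that case instead, the last key would need to be reversed; that choice is not specified in the question.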

How to retrieve data from different rows of the same table based on different criteria

I'm trying to write a plain SQL statement for building an Oracle report but I'm stuck at some point. x_request table stores the requests made and different tasks related to specific requests that have been done are stored in x_request_work_log. To summarize the structure of these tables:
X_request
-id
-name
-requester
-request_date
x_request_work_log
-id
-request_id (foreign key)
-taskId
-start_date
-end_date
Now let's assume that these tables are filled with sample data as follows:
x_request
id name requester request_date
1 firstReq John 01/01/2012
2 secondReq Steve 21/01/2012
x_request_work_log
id requestId taskId startDate endDate
1 1 0 01/01/2012 03/01/2012
2 1 1 04/01/2012 04/01/2012
3 1 2 05/01/2012 15/01/2012
4 2 0 24/01/2012 02/02/2012
The template of my report is as follows:
requestName timeSpent(task(0)) timeSpent(task(1)) timeSpent(task(2))
| | | | | | | |
So, that's where I'm stuck. I need a SQL SELECT statement that will return each row formatted as described above. How can I retrieve and display the start and end dates of the different tasks? By the way, timeSpent = endDate(task(x)) - startDate(task(x)).
Note: Using different select subqueries for each spent time calculation is not an option due to performance constraints. There must be another way.
It sounds like you just want something like
SELECT r.name request_name,
SUM( (CASE WHEN l.taskId = 0
THEN l.endDate - l.StartDate
ELSE 0
END) ) task0_time_spent,
SUM( (CASE WHEN l.taskId = 1
THEN l.endDate - l.StartDate
ELSE 0
END) ) task1_time_spent,
SUM( (CASE WHEN l.taskId = 2
THEN l.endDate - l.StartDate
ELSE 0
END) ) task2_time_spent
FROM x_request_work_log l
JOIN x_request r ON (l.requestId = r.Id)
GROUP BY r.name
If you happen to be using 11g, you could also use the PIVOT operator.
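To see the conditional-aggregation pivot produce the report layout, here is a minimal sketch on the question's sample data. It is not Oracle: it uses SQLite through Python's sqlite3 module, where subtracting two dates directly is not available, so julianday() differences stand in for Oracle's date arithmetic. The sample dates are assumed to be day/month/year and are stored as ISO text, and the column names follow the sample data (requestId, taskId, startDate, endDate).

```python
import sqlite3

REQUESTS = [
    # (id, name, requester, request_date)
    (1, "firstReq", "John", "2012-01-01"),
    (2, "secondReq", "Steve", "2012-01-21"),
]
WORK_LOG = [
    # (id, requestId, taskId, startDate, endDate)
    (1, 1, 0, "2012-01-01", "2012-01-03"),
    (2, 1, 1, "2012-01-04", "2012-01-04"),
    (3, 1, 2, "2012-01-05", "2012-01-15"),
    (4, 2, 0, "2012-01-24", "2012-02-02"),
]

# One output row per request; each task's duration (in days) lands in its own column.
QUERY = """
SELECT r.name AS request_name,
       SUM(CASE WHEN l.taskId = 0
                THEN CAST(julianday(l.endDate) - julianday(l.startDate) AS INTEGER)
                ELSE 0 END) AS task0_time_spent,
       SUM(CASE WHEN l.taskId = 1
                THEN CAST(julianday(l.endDate) - julianday(l.startDate) AS INTEGER)
                ELSE 0 END) AS task1_time_spent,
       SUM(CASE WHEN l.taskId = 2
                THEN CAST(julianday(l.endDate) - julianday(l.startDate) AS INTEGER)
                ELSE 0 END) AS task2_time_spent
FROM x_request_work_log l
JOIN x_request r ON l.requestId = r.id
GROUP BY r.name
ORDER BY r.name
"""


def time_spent_report():
    """Build the two sample tables in memory and return the pivoted report rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE x_request (id INTEGER, name TEXT, requester TEXT, request_date TEXT)"
    )
    conn.execute(
        "CREATE TABLE x_request_work_log "
        "(id INTEGER, requestId INTEGER, taskId INTEGER, startDate TEXT, endDate TEXT)"
    )
    conn.executemany("INSERT INTO x_request VALUES (?, ?, ?, ?)", REQUESTS)
    conn.executemany("INSERT INTO x_request_work_log VALUES (?, ?, ?, ?, ?)", WORK_LOG)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


report = time_spent_report()
for row in report:
    print(row)
```

A task that was never logged for a request shows up as 0 here, which is indistinguishable from a task that started and ended on the same day; using ELSE NULL instead of ELSE 0 would keep the two cases apart.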
If you need to display all members of a group in one row, you can accomplish this in MySQL with the GROUP_CONCAT aggregate function (I don't know what the equivalent is in Oracle):
> SELECT requestID,
GROUP_CONCAT(DATEDIFF(endDate,startDate)) AS length
FROM request_work_log
GROUP BY requestId;
+-----------+--------+
| requestID | length |
+-----------+--------+
| 1 | 2,0,10 |
| 2 | 9 |
+-----------+--------+
(and then add in the inner join to your other table to replace requestID with the request name)