Postgres: How to use value of field in subquery - sql

I have a db table like this:
id | user | service | pay_date | status
---+------+---------+--------------+------------
1 | john | 1 | 2014-08-20 | 1
2 | john | 3 | 2014-07-24 | 0
I am trying to build a query that returns all users who have not paid for service 1 or service 3 between two dates and who have no active services (status = 0 for service 1 and service 2). The problem is that the query returns service 3, which is not correct because service 1 is active.
SELECT *
FROM services
WHERE date > 2014-07-01
AND date < 2014-07-30
AND service = 1 OR service = 3
My solution is to check whether service 3 is active while I am checking service 1. Something like this:
SELECT *
FROM services
WHERE date > 2014-07-01
AND date < 2014-07-30
AND ((service = 1 AND "SERVICE 3 IS NOT ACTIVE") OR service = 3)
But I can't check whether "SERVICE 3 IS NOT ACTIVE" holds for the current user.

I think you want NOT EXISTS, i.e. checking that another active record does not exist for the same user, possibly something like:
SELECT *
FROM Services AS s
WHERE Service IN (1, 3)
AND pay_date > '2014-07-01'
AND pay_date < '2014-07-30'
AND NOT EXISTS
( SELECT 1
FROM services AS s2
WHERE s2."user" = s."user"
AND s2.Status = 1
AND s2.Service IN (1, 3)
AND s2.pay_date > '2014-07-01'
AND s2.pay_date < '2014-07-30'
);
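To make the correlation concrete, here is a minimal sketch that runs the NOT EXISTS pattern against an in-memory SQLite database through Python's sqlite3 module. It is an illustration under assumptions, not the answer's exact query: the inner date filter is dropped on the assumption that "active" means any row with status = 1 regardless of pay date, and the row for "mary" is hypothetical data added so that at least one user qualifies.

```python
import sqlite3


def users_without_active_service():
    """Rows for service 1/3 paid in July 2014 whose user has no active service 1 or 3."""
    con = sqlite3.connect(":memory:")
    con.execute(
        'CREATE TABLE services '
        '(id INTEGER, "user" TEXT, service INTEGER, pay_date TEXT, status INTEGER)'
    )
    con.executemany(
        "INSERT INTO services VALUES (?, ?, ?, ?, ?)",
        [
            (1, "john", 1, "2014-08-20", 1),  # from the question: active
            (2, "john", 3, "2014-07-24", 0),  # from the question: inactive
            (3, "mary", 3, "2014-07-10", 0),  # hypothetical: no active service
        ],
    )
    query = """
        SELECT s.id, s."user", s.service
        FROM services AS s
        WHERE s.service IN (1, 3)
          AND s.pay_date >= '2014-07-01'
          AND s.pay_date <  '2014-07-30'
          AND NOT EXISTS (
              SELECT 1
              FROM services AS s2
              WHERE s2."user" = s."user"   -- correlate on the outer row's user
                AND s2.status = 1          -- assumption: active at any date
                AND s2.service IN (1, 3)
          )
        ORDER BY s.id
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows


print(users_without_active_service())  # [(3, 'mary', 3)]
```

John's service 3 row is filtered out because his service 1 row is active, which is the behaviour the question asks for.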

There is no single record that meets all those criteria.
The record with id 1 cannot be both service = 1 and service = 3 at the same time:
id | user | service | pay_date | status
---+------+---------+--------------+------------
1 | john | 1 | 2014-08-20 | 1
The record with id 2 cannot be both service = 1 and service = 3 at the same time:
id | user | service | pay_date | status
---+------+---------+--------------+------------
2 | john | 3 | 2014-08-24 | 0 -- nb: I changed this date!
So your criteria require comparing record 1 to record 2 (or, more broadly, comparing across records). Below I use EXISTS to compare across records.
SELECT
*
FROM services
WHERE (pay_date >= '2014-08-01'
AND pay_date < '2014-08-30')
AND service = 1
AND EXISTS (
SELECT
NULL
FROM services s3
WHERE (s3.pay_date >= '2014-08-01'
AND s3.pay_date < '2014-08-30')
AND s3.service = 3
and s3.status = 1
AND s3.user = services.user
)
See the SQLfiddle demo.
By the way, it is more conventional to express a date range with >= and <.

Related

SQL for Microsoft Access 2013

I am trying to get a count of customers who return for additional services, taking into account a comparison of assessment scores prior to the service, grouped by the type of service. (Ultimately, I also want to be able to ignore returns that are within a month of the original service, but I'm pretty sure I can get that wrinkle sorted myself.)
When counting results for a particular service, it should look at returns to any service type, not just the original service type. (Edit: *It should also look at all future returns, not just the next or the most recent *).
It does not need to be run often, but there are 15000+ lines of data and computational resources are limited by an underpowered machine (this is for a nonprofit organization), so efficiency would be nice but not absolutely needed.
Sample Data
ServiceTable
CustomerID Service Date ScoreBefore
A Service1 1/1/2017 1
A Service2 1/3/2017 1
A Service1 1/1/2018 4
B Service3 3/1/2018 3
B Service1 6/1/2018 1
B Service1 6/2/2018 1
C Service2 1/1/2019 4
C Service2 6/1/2019 1
Results should be (not taking into account the date padding option):
Service1
ReturnedWorse 0
ReturnedSame 2
ReturnedBetter 1
Service2
ReturnedWorse 1
ReturnedSame 0
ReturnedBetter 1
Service3
ReturnedWorse 2
So far, I have tried creating make table queries that could then be queried to get the aggregate info, but I am a bit stuck and suspect there may be a better route.
What I have tried:
SELECT CustomerID, Service, Date, ScoreBefore INTO ReturnedWorse
FROM ServiceTable AS FirstStay
WHERE ((((SELECT COUNT(*)
FROM ServiceTable AS SecondStay
WHERE FirstStay.CustomerID=SecondStay.CustomerID
AND
FirstStay.ScoreBefore> SecondStay.ScoreBefore
AND
SecondStay.Date > FirstStay.Date))));
Any help would be greatly appreciated.
This would have been easier to do with window functions, but they are not available in ms-access.
Here is a query that implements my understanding of your question:
t0: pick a record in the table (a customer buying a service)
t1: pull out the record corresponding to the next time the same customer contracted any service, using an INNER JOIN and a correlated subquery (if there is no such record, the initial record is not taken into account)
compare the score of the previous record to the current one
group the results by service id
You can see it in action in this db fiddle. The results are slightly different from what you expected (see my comments), but they are consistent with the above explanation; you might want to adapt some of the rules to match your exact expected result, using the same principles.
SELECT
t0.service,
SUM(CASE WHEN t1.scorebefore < t0.scorebefore THEN 1 ELSE 0 END) AS ReturnedWorse,
SUM(CASE WHEN t1.scorebefore = t0.scorebefore THEN 1 ELSE 0 END) AS ReturnedSame,
SUM(CASE WHEN t1.scorebefore > t0.scorebefore THEN 1 ELSE 0 END) AS ReturnedBetter
FROM
mytable t0
INNER JOIN mytable t1
ON t0.customerid = t1.customerid
AND t0.date < t1.date
AND NOT EXISTS (
SELECT 1
from mytable
WHERE
customerid = t1.customerid
AND date < t1.date
AND date > t0.date
)
GROUP BY t0.service
| service | ReturnedWorse | ReturnedSame | ReturnedBetter |
| -------- | ------------- | ------------ | -------------- |
| Service1 | 0 | 2 | 0 |
| Service2 | 1 | 0 | 1 |
| Service3 | 1 | 0 | 0 |
From your comments, I understand that you want to take into account all future returns and not only the next one. This eliminates the need for a correlated subquery, and actually yields your expected output. See this db fiddle:
SELECT
t0.service,
SUM(CASE WHEN t1.scorebefore < t0.scorebefore THEN 1 ELSE 0 END) AS ReturnedWorse,
SUM(CASE WHEN t1.scorebefore = t0.scorebefore THEN 1 ELSE 0 END) AS ReturnedSame,
SUM(CASE WHEN t1.scorebefore > t0.scorebefore THEN 1 ELSE 0 END) AS ReturnedBetter
FROM
mytable t0
INNER JOIN mytable t1
ON t0.customerid = t1.customerid
-- AND t0.service = t1.service
AND t0.date < t1.date
GROUP BY t0.service
| service | ReturnedWorse | ReturnedSame | ReturnedBetter |
| -------- | ------------- | ------------ | -------------- |
| Service1 | 0 | 2 | 1 |
| Service2 | 1 | 0 | 1 |
| Service3 | 2 | 0 | 0 |
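Both variants can be checked against the sample data with a small sketch. It uses Python's sqlite3 module rather than MS Access, so it only illustrates the join logic; the ISO date strings assume the sample dates are month/day/year, and the helper name return_counts is made up for this example.

```python
import sqlite3

# Sample data from the question, dates rewritten as ISO strings
# (assumption: the original dates are month/day/year).
ROWS = [
    ("A", "Service1", "2017-01-01", 1),
    ("A", "Service2", "2017-01-03", 1),
    ("A", "Service1", "2018-01-01", 4),
    ("B", "Service3", "2018-03-01", 3),
    ("B", "Service1", "2018-06-01", 1),
    ("B", "Service1", "2018-06-02", 1),
    ("C", "Service2", "2019-01-01", 4),
    ("C", "Service2", "2019-06-01", 1),
]

# Extra join condition that keeps only the immediately following visit.
NEXT_ONLY = """
    AND NOT EXISTS (
        SELECT 1 FROM mytable m
        WHERE m.customerid = t1.customerid
          AND m.date < t1.date
          AND m.date > t0.date
    )
"""


def return_counts(next_only):
    """Per service: (service, ReturnedWorse, ReturnedSame, ReturnedBetter)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE mytable "
        "(customerid TEXT, service TEXT, date TEXT, scorebefore INTEGER)"
    )
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", ROWS)
    query = f"""
        SELECT t0.service,
               SUM(CASE WHEN t1.scorebefore < t0.scorebefore THEN 1 ELSE 0 END),
               SUM(CASE WHEN t1.scorebefore = t0.scorebefore THEN 1 ELSE 0 END),
               SUM(CASE WHEN t1.scorebefore > t0.scorebefore THEN 1 ELSE 0 END)
        FROM mytable t0
        INNER JOIN mytable t1
                ON t0.customerid = t1.customerid
               AND t0.date < t1.date
               {NEXT_ONLY if next_only else ""}
        GROUP BY t0.service
        ORDER BY t0.service
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows


print(return_counts(next_only=True))
print(return_counts(next_only=False))
```

With next_only=False every later visit by the same customer is paired with the earlier one, which reproduces the second result table; with next_only=True only the immediately following visit is paired, which reproduces the first.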

SQL query that finds dates between a range and takes values from another query & iterates range over them?

Sorry if the wording for this question is strange. Wasn't sure how to word it, but here's the context:
I'm working on an application that shows how often individual applications are used when users make a request to my web server. The way we collect data is that every time the start page loads, it increments a data table called WEB_TRACKING for the date on which it loaded. So there are a lot of holes in the data; for example, an application might have been used heavily on September 1st but not at all on September 2nd. What I want to do is fill those holes with a hits value of 0. This is what I came up with.
Select HIT_DATA.DATE_ACCESSED, HIT_DATA.APP_ID, HIT_DATA.NAME, WORKDAYS.BENCH_DAYS, NVL(HIT_DATA.HITS, 0) from (
select DISTINCT( TO_CHAR(WEB.ACCESS_TIME, 'MM/DD/YYYY')) as BENCH_DAYS
FROM WEB_TRACKING WEB
) workDays
LEFT join (
SELECT TO_CHAR(WEB.ACCESS_TIME, 'MM/DD/YYYY') as DATE_ACCESSED, APP.APP_ID, APP.NAME,
COUNT(WEB.IP_ADDRESS) AS HITS
FROM WEB_TRACKING WEB
INNER JOIN WEB_APP APP ON WEB.APP_ID = APP.APP_ID
WHERE APP.IS_ENABLED = 1 AND (APP.APP_ID = 1 OR APP.APP_ID = 2)
AND (WEB.ACCESS_TIME > TO_DATE('08/04/2018', 'MM/DD/YYYY')
AND WEB.ACCESS_TIME < TO_DATE('09/04/2018', 'MM/DD/YYYY'))
GROUP BY TO_CHAR(WEB.ACCESS_TIME, 'MM/DD/YYYY'), APP.APP_ID, APP.NAME
ORDER BY TO_CHAR(WEB.ACCESS_TIME, 'MM/DD/YYYY'), app_id DESC
) HIT_DATA ON HIT_DATA.DATE_ACCESSED = WORKDAYS.BENCH_DAYS
ORDER BY WORKDAYS.BENCH_DAYS
It returns all the dates that fall within the date range and even converts null hits to 0. However, it returns null for the app id and app name. That makes sense, and I understand how to give a default value for one application. I was hoping someone could help me figure out how to do it for multiple applications.
Basically, I am getting this (in the case of using just one application):
| APP_ID | NAME | BENCH_DAYS | HITS |
| ------ | ---------- | ---------- | ---- |
| NULL | NULL | 08/04/2018 | 0 |
| 1 | test_app | 08/05/2018 | 1 |
| NULL | NULL | 08/06/2018 | 0 |
But I want this(with multiple applications):
| APP_ID | NAME | BENCH_DAYS | HITS |
| ------ | ---------- | ---------- | ---- |
| 1 | test_app | 08/04/2018 | 0 |<- these 0's are converted from null
| 1 | test_app | 08/05/2018 | 1 |
| 1 | test_app | 08/06/2018 | 0 | <- these 0's are converted from null
| 2 | prod_app | 08/04/2018 | 2 |
| 2 | prod_app | 08/05/2018 | 0 | <- these 0's are converted from null
So again to reiterate the question in this long post. How should I go about populating this query so that it fills up the holes in the dates but also reuses the application names and ids and populates that information as well?
You need a list of dates, and that should probably come from a number generator rather than a table (if that table has holes, your report will too).
Example, every date for the past 30 days, ending today:
select trunc(sysdate-30) + level as bench_days from dual connect by level <= 30
Use TRUNC to cut the time off instead of turning a date into a string.
Now that you have a list of dates, you want to add in the repeating app id and name:
select * from
(select trunc(sysdate-30) + level as bench_days from dual connect by level <= 30) dat
CROSS JOIN
(select app_id, name from WEB_APP WHERE IS_ENABLED = 1 AND APP_ID in (1, 2)) app
Now you have all your dates, crossed with all your apps. 2 apps and 30 days will make a 60 row resultset via a cross join. Left join your stat data onto it, and group/count/sum/aggregate ...
select app.app_id, app.name, dat.artificialday, COALESCE(stat.ct, 0) as hits from
(select trunc(sysdate-30) + level as artificialday from dual connect by level <= 30) dat
CROSS JOIN
(select app_id, name from WEB_APP WHERE IS_ENABLED = 1 AND APP_ID in (1, 2)) app
LEFT JOIN
(SELECT app_id, trunc(access_time) accdate, count(ip_address) ct from web_tracking group by app_id, trunc(access_time)) stat
ON
stat.app_id = app.app_id AND
stat.accdate = dat.artificialday
You don't have to write the query this way or do your grouping in a subquery. I'm presenting it like this to encourage you to think about your data in blocks that you build in isolation and then join together to build more comprehensive blocks.
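The "dates crossed with apps, then left join the stats" idea can be tried out with a small sketch. It uses Python's sqlite3 module instead of Oracle, so a recursive CTE stands in for CONNECT BY LEVEL and date() stands in for TRUNC; the three-day range, the tracking rows and the IP addresses are hypothetical values chosen to mirror the desired output in the question.

```python
import sqlite3


def hits_per_app_per_day():
    """One row per (app, day), with 0 hits where no tracking rows exist."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE web_app (app_id INTEGER, name TEXT, is_enabled INTEGER);
        CREATE TABLE web_tracking (app_id INTEGER, access_time TEXT, ip_address TEXT);
        INSERT INTO web_app VALUES (1, 'test_app', 1), (2, 'prod_app', 1);
        -- hypothetical tracking rows
        INSERT INTO web_tracking VALUES
            (1, '2018-08-05 09:15:00', '10.0.0.1'),
            (2, '2018-08-04 10:00:00', '10.0.0.2'),
            (2, '2018-08-04 17:30:00', '10.0.0.3');
    """)
    query = """
        WITH RECURSIVE dat(day) AS (          -- generated calendar, no holes
            SELECT '2018-08-04'
            UNION ALL
            SELECT date(day, '+1 day') FROM dat WHERE day < '2018-08-06'
        )
        SELECT app.app_id, app.name, dat.day, COALESCE(stat.ct, 0) AS hits
        FROM dat
        CROSS JOIN (SELECT app_id, name
                    FROM web_app
                    WHERE is_enabled = 1 AND app_id IN (1, 2)) AS app
        LEFT JOIN (SELECT app_id, date(access_time) AS accdate,
                          COUNT(ip_address) AS ct
                   FROM web_tracking
                   GROUP BY app_id, date(access_time)) AS stat
               ON stat.app_id = app.app_id
              AND stat.accdate = dat.day
        ORDER BY app.app_id, dat.day
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows


for row in hits_per_app_per_day():
    print(row)
```

Because the app id and name come from the cross join rather than from the tracking data, every app gets a row for every day, and only the hit count falls back to 0.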

Count specific values that belong to the same column SQL Server

I would like to ask for help. I want to count the number of records, but with a condition that only two specific values from a single column are counted:
Count the records first:
count(service) as contacts
service | state | address
service1| 1 |123
service2| 1 |321
service3| 3 |332
service1| 2 |333
service2| 2 |111
service3| 3 |333
1st result
Service | Contacts | status
service1| 1 | 1
service1| 1 | 2
service1| 1 | 3
service2| 1 | 1
service2| 1 | 2
service2| 1 | 3
If status is 1 or 2, add to the count, else 0 (only count rows whose "status" is equal to 1 or 2).
Result:
Final result
Service | Contacts
Service1 | 2
Service2 | 2
sorry for the confusion
Thanks for your big help
This should work for all major databases
SELECT service,
SUM(CASE WHEN status IN(1,2) THEN contacts ELSE 0 END) as Contacts
FROM (your query) as x
GROUP BY service
select service,
sum(case when status in(1,2) then contacts else 0 end) as contact_count
from MyTable
group by service
What this will do is evaluate each row and include the contacts value in the sum where the status is one of the required values. As it is an aggregate, you need to group by the service.
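As a quick check of the conditional sum, here is a sketch that runs it on the raw rows from the question with Python's sqlite3 module. It counts rows directly (adding 1 per matching row) instead of summing a pre-aggregated contacts column, and it assumes the column is called status and the table contacts; both names are placeholders.

```python
import sqlite3


def contacts_per_service():
    """Count, per service, only the rows whose status is 1 or 2."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE contacts (service TEXT, status INTEGER, address TEXT)")
    con.executemany(
        "INSERT INTO contacts VALUES (?, ?, ?)",
        [
            ("service1", 1, "123"),
            ("service2", 1, "321"),
            ("service3", 3, "332"),
            ("service1", 2, "333"),
            ("service2", 2, "111"),
            ("service3", 3, "333"),
        ],
    )
    query = """
        SELECT service,
               SUM(CASE WHEN status IN (1, 2) THEN 1 ELSE 0 END) AS contacts
        FROM contacts
        GROUP BY service
        ORDER BY service
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows


print(contacts_per_service())  # [('service1', 2), ('service2', 2), ('service3', 0)]
```

Note that service3 still appears, with a count of 0. If it should be left out, as in the final result shown in the question, a WHERE status IN (1, 2) filter or a HAVING clause on the sum would drop it.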

Most efficient way to count number of matching rows with multiple criteria at once

I have a very large table (called device_operation, with 50 million rows) which holds all the operations of a product in its lifecycle (such as "start", "stop", "refill", ...) and the status of these operations (row status: Completed, Failed), with the ID of the associated device (row device_id) and a timestamp for each operation (row create_date).
Something like this :
/------+-----------+------------------+---------\
| ID | Device ID | Create_Date | Status |
+------+-----------+------------------+---------+
| 1 | 1 | 2012-03-04 01:43 | Success |
| 2 | 4 | 2012-04-04 02:34 | Failed |
| 3 | 9 | 2013-01-01 01:23 | Failed |
| 4 | 4 | 2013-12-12 12:34 | Success |
| 5 | 23 | 2014-02-01 03:45 | Success |
| 6 | 1 | 2014-05-03 08:34 | Failed |
\------+-----------+------------------+---------/
I also have another table (called subscription) that tells me when the warranty has started (row create_date) for the product (row device_id). Warranty lasts one year.
/-----------+------------------\
| Device ID | Create_Date |
+-----------+------------------+
| 2 | 2011-04-03 05:00 |
| 4 | 2012-03-05 03:45 |
| 5 | 2012-03-05 06:07 |
| ... | ... |
\-----------+------------------/
I am using PostgreSQL.
I want to do the following :
List all device IDs which had at least one successful operation before a given date (2014-07-06)
For each of those devices, count :
The number of failed operations after that date + 2 days (2014-07-08), and the device was under warranty when the operation was attempted
The number of failed operations after that date + 2 days (2014-07-08), and the device was outside warranty when the operation was attempted
The number of successful operations after that date (device being under warranty or not)
I had some limited success with the following (the query has been simplified a little for readability: there are other joins involved to get to the subscription table, and other criteria to include the devices in the list):
SELECT distinct device_operation.device_id as did, subscription.create_date,
(
SELECT COUNT(*)
FROM device_operation dop
WHERE dop.device_id = device_operation.device_id and
dop.create_date > '2014-07-08' and
dop.status = 'Success'
) as success,
(
SELECT COUNT(*)
FROM device_operation dop2
WHERE
dop2.device_id = subscription.device_id and
dop2.create_date > '2014-07-08' and
dop2.status = 'Failed' and
dop2.create_date <= subscription.create_date + interval '1 year'
) as failed_during_warranty,
(
SELECT COUNT(*)
FROM device_operation dop2
WHERE
dop2.device_id = subscription.device_id and
dop2.create_date > '2014-07-08' and
dop2.status = 'Failed' and
dop2.create_date > subscription.create_date + interval '1 year'
) as failed_after_warranty
FROM device_operation, subscription
WHERE
device_operation.status = 'Success' and -- list operations which are successful
device_operation.create_date <= '2014-07-06' and -- list operations before that date
device_operation.device_id = subscription.device_id -- get warranty start for each operation
ORDER BY success DESC, failed_during_warranty DESC, failed_after_warranty DESC
As you can guess, it's so slow I cannot run the query. However it gives you an idea of the structure.
I have tried to use NULLIF to combine the requests into one, in the hope it's going to make PostgreSQL only list the subquery once instead of 3, but it returns "subquery must return only one column" :
SELECT distinct device_operation.device_id as did, subscription.create_date,
(
SELECT COUNT(NULLIF(dop2.status != 'Success', true)) as completed,
COUNT(NULLIF(dop2.status != 'Failed' or not (dop2.create_date <= subscription.create_date + interval '1 year'), true)) as failed_in_warranty,
COUNT(NULLIF(dop2.status != 'Failed' or (dop2.create_date <= subscription.create_date + interval '1 year'), true)) as failed_after_warranty
FROM device_operation dop2
WHERE
dop2.device_id = device_operation.device_id and
dop2.device_id = subscription.device_id and
dop2.create_date > '2014-07-08'
) as subq
FROM device_operation, subscription
WHERE
device_operation.status = 'Success' and -- list operations which are successful
device_operation.create_date <= '2014-07-06' and -- list operations before that date
device_operation.device_id = subscription.device_id -- get warranty start for each operation
ORDER BY success DESC, failed_in_warranty DESC, failed_outside_warranty DESC
I also tried to move the subquery to the FROM clause but that doesn't work as I need to run the subquery for each row of the main query (or do I ? maybe there's a better way)
What I expect is something like this :
/-----------+---------+------------------------+-----------------------\
| Device ID | Success | Failed during warranty | Failed after warranty |
+-----------+---------+------------------------+-----------------------+
| 194853 | 10 | 0 | 0 |
| 7853 | 5 | 5 | 0 |
| 5848 | 3 | 0 | 56 |
| 8546455 | 0 | 45 | 0 |
| 102 | 0 | 4 | 1 |
| 69329548 | 0 | 0 | 9 |
| 17 | 0 | 0 | 0 |
\-----------+---------+------------------------+-----------------------+
Can someone help me find the most efficient way to do it ?
EDIT: Corner cases: You can consider all devices have an entry in subscription.
Thank you very much !
I think you just require conditional aggregation. I find the data structure and logic a bit hard to follow, but I think the following is basically what you need:
SELECT d.device_id,
SUM(CASE WHEN d.status = 'Failed' AND d.create_date > date '2014-07-06' + interval '2 day'
THEN 1 ELSE 0
END) as NumFails,
SUM(CASE WHEN d.status = 'Failed' AND d.create_date > date '2014-07-06' + interval '2 day' AND
d.create_date > s.create_date + interval '1 year'
THEN 1 ELSE 0
END) as NumFailsNoWarranty,
SUM(CASE WHEN d.status = 'Success' AND d.create_date > date '2014-07-06' + interval '2 day'
THEN 1 ELSE 0
END) as NumSuccesses
FROM device_operation d JOIN
subscription s
ON d.device_id = s.device_id
GROUP BY d.device_id
HAVING SUM(CASE WHEN d.status = 'Success' AND d.create_date <= '2014-07-06' THEN 1 ELSE 0 END) > 0;
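The same single-pass idea can be sketched with the three counters the question asks for (successes, failures during warranty, failures after warranty). This version runs on SQLite through Python's sqlite3 module, so datetime(..., '+1 year') replaces the PostgreSQL interval arithmetic and the cutoff is written directly as '2014-07-08'; all rows are hypothetical test data.

```python
import sqlite3


def operation_counts():
    """Per device with a success on or before 2014-07-06: counts after 2014-07-08."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE subscription (device_id INTEGER, create_date TEXT);
        CREATE TABLE device_operation (device_id INTEGER, create_date TEXT, status TEXT);
        -- hypothetical data
        INSERT INTO subscription VALUES
            (1, '2014-01-01 00:00:00'),
            (2, '2013-01-01 00:00:00'),
            (3, '2014-01-01 00:00:00');
        INSERT INTO device_operation VALUES
            (1, '2014-07-01 12:00:00', 'Success'),  -- qualifies device 1
            (1, '2014-07-09 08:00:00', 'Success'),
            (1, '2014-07-10 08:00:00', 'Failed'),   -- inside warranty
            (1, '2015-02-01 08:00:00', 'Failed'),   -- outside warranty
            (2, '2014-06-01 12:00:00', 'Success'),  -- qualifies device 2
            (2, '2014-07-09 08:00:00', 'Failed'),   -- warranty already over
            (3, '2014-07-10 08:00:00', 'Failed'),   -- device 3 never qualifies
            (3, '2014-07-20 08:00:00', 'Success');
    """)
    query = """
        SELECT d.device_id,
               SUM(CASE WHEN d.status = 'Success'
                         AND d.create_date > '2014-07-08'
                        THEN 1 ELSE 0 END) AS success,
               SUM(CASE WHEN d.status = 'Failed'
                         AND d.create_date > '2014-07-08'
                         AND d.create_date <= datetime(s.create_date, '+1 year')
                        THEN 1 ELSE 0 END) AS failed_during_warranty,
               SUM(CASE WHEN d.status = 'Failed'
                         AND d.create_date > '2014-07-08'
                         AND d.create_date > datetime(s.create_date, '+1 year')
                        THEN 1 ELSE 0 END) AS failed_after_warranty
        FROM device_operation d
        JOIN subscription s ON d.device_id = s.device_id
        GROUP BY d.device_id
        HAVING SUM(CASE WHEN d.status = 'Success'
                         AND d.create_date <= '2014-07-06'
                        THEN 1 ELSE 0 END) > 0
        ORDER BY d.device_id
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows


print(operation_counts())  # [(1, 1, 1, 1), (2, 0, 0, 1)]
```

The table is scanned once and every counter is filled in the same GROUP BY, which is what replaces the three correlated subqueries per row in the original attempt. This sketch assumes exactly one subscription row per device; with several, the join would multiply the operation rows.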

Selecting only the very last set of rows matching the criteria

Considering following data
test1=# select * from sample order by created_at DESC;
id | status | service | created_at
----+--------+---------+---------------------
8 | OK | 1 | 2015-09-16 11:54:00
7 | OK | 1 | 2015-09-16 11:53:00
6 | FAIL | 1 | 2015-09-16 11:52:00
5 | OK | 1 | 2015-09-16 11:51:00
How can I select only the rows with ID 7 and 8? Using window functions I can get row numbers partitioned over status, but so far I have not figured out how to limit the results to only the last rows identifying the 'successful period' for a given service.
You need to find the time of the most recent status = 'FAIL' for each service, then select those records of the same service that are more recent:
SELECT *
FROM sample
LEFT JOIN (
SELECT service, max(created_at) AS last_fail
FROM sample
WHERE status = 'FAIL'
GROUP BY service) f USING (service)
WHERE created_at > last_fail
OR last_fail IS NULL; -- also show services without ever failing
This assumes there are only two status codes. If there are more, add a status = 'OK' filter to the WHERE clause.
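A small sketch of this approach, run on the question's four rows with Python's sqlite3 module. The extra row for service 2 is hypothetical and is only there to exercise the "never failed" branch; the join is written with ON instead of USING so the columns can be qualified.

```python
import sqlite3


def rows_after_last_fail():
    """Ids of rows newer than the latest FAIL of their service."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE sample (id INTEGER, status TEXT, service INTEGER, created_at TEXT)"
    )
    con.executemany(
        "INSERT INTO sample VALUES (?, ?, ?, ?)",
        [
            (8, "OK", 1, "2015-09-16 11:54:00"),
            (7, "OK", 1, "2015-09-16 11:53:00"),
            (6, "FAIL", 1, "2015-09-16 11:52:00"),
            (5, "OK", 1, "2015-09-16 11:51:00"),
            (9, "OK", 2, "2015-09-16 11:50:00"),  # hypothetical: service 2 never failed
        ],
    )
    query = """
        SELECT s.id
        FROM sample AS s
        LEFT JOIN (SELECT service, max(created_at) AS last_fail
                   FROM sample
                   WHERE status = 'FAIL'
                   GROUP BY service) AS f
               ON f.service = s.service
        WHERE s.created_at > f.last_fail
           OR f.last_fail IS NULL      -- services that never failed
        ORDER BY s.id
    """
    ids = [row[0] for row in con.execute(query)]
    con.close()
    return ids


print(rows_after_last_fail())  # [7, 8, 9]
```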
The simplest approach, assuming id increases with created_at, would be this:
SELECT s.*
FROM sample AS s
LEFT JOIN (SELECT service, max(id) AS last_fail_id
FROM sample
WHERE status = 'FAIL'
GROUP BY service) AS q
ON s.service = q.service
WHERE s.id > q.last_fail_id
OR q.last_fail_id IS NULL;