SQL - How to find entries where a value is missing between two rows - sql

We are using Presto SQL at my job. I have spent hours trying to search for the answer to this question but can't find an answer and it's quite difficult to search for. Solving this issue opens the door to fixing a lot of problems.
I need to write a query that tries to find all entries where REQUEST_CANCEL & CHARGED exist but CANCEL_ACCOUNT is missing.
CHARGED & CANCEL_ACCOUNT should always come after REQUEST_CANCEL.
Table Name: CUSTOMER_INFO
|DATE_TIME|CUST_ID |ACTION |
|20180726 |1234 |CHARGED |
|20180726 |1234 |CANCEL_ACCOUNT|
|20180726 |1234 |REQUEST_CANCEL|
All these values exist in the same table. Here's what I have so far.
SELECT *
FROM
(SELECT *
FROM CUSTOMER_INFO
WHERE
DATE_TIME = 20180726
AND ACTION = 'REQUEST_CANCEL') as a
JOIN
(SELECT *
FROM CUSTOMER_INFO
WHERE
DATE_TIME = 20180726
AND ACTION = 'CHARGED') as b
ON a.CUST_ID = b.CUST_ID
WHERE
a.TIME < b.TIME
Let me explain it in a way that makes sense.
A = REQUEST_CANCEL
B = CANCEL_ACCOUNT
C = CHARGED
How do you query for the case where A and C exist but B is missing? The sequence needs to be exactly A > B > C. It's essentially querying for something that doesn't exist between two values that do exist. In my current query, B can still occur between the two values, and that's NOT what I want.

I think you're looking for NOT EXISTS and a correlated subquery.
SELECT *
FROM (SELECT *
FROM customer_info
WHERE action = 'REQUEST_CANCEL') rc
INNER JOIN (SELECT *
FROM customer_info
WHERE action = 'CHARGED') c
ON c.cust_id = rc.cust_id
AND c.date_time >= rc.date_time
WHERE NOT EXISTS (SELECT *
FROM customer_info ca
WHERE ca.cust_id = rc.cust_id
AND ca.action = 'CANCEL_ACCOUNT'
AND ca.date_time >= rc.date_time
AND ca.date_time <= c.date_time);
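
To see the NOT EXISTS approach above in action, here is a minimal sketch that runs the same idea against SQLite from Python. It is not Presto, and the rows and the finer-grained date_time values are made up for illustration (the sample in the question only has day-level values).

```python
import sqlite3

# Hypothetical sample data. The date_time values carry an extra two-digit
# sequence so that the order of events within one day can be compared.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_info (date_time INTEGER, cust_id INTEGER, action TEXT)"
)
conn.executemany(
    "INSERT INTO customer_info VALUES (?, ?, ?)",
    [
        (2018072601, 1234, "REQUEST_CANCEL"),
        (2018072602, 1234, "CANCEL_ACCOUNT"),
        (2018072603, 1234, "CHARGED"),
        (2018072601, 5678, "REQUEST_CANCEL"),
        (2018072602, 5678, "CHARGED"),
    ],
)

# Pair each REQUEST_CANCEL with a later CHARGED for the same customer and keep
# the pair only if no CANCEL_ACCOUNT row lies between the two.
query = """
SELECT DISTINCT rc.cust_id
FROM customer_info rc
JOIN customer_info c
  ON c.cust_id = rc.cust_id
 AND c.action = 'CHARGED'
 AND c.date_time >= rc.date_time
WHERE rc.action = 'REQUEST_CANCEL'
  AND NOT EXISTS (SELECT 1
                  FROM customer_info ca
                  WHERE ca.cust_id = rc.cust_id
                    AND ca.action = 'CANCEL_ACCOUNT'
                    AND ca.date_time >= rc.date_time
                    AND ca.date_time <= c.date_time)
ORDER BY rc.cust_id
"""
missing_cancel = [row[0] for row in conn.execute(query)]
print(missing_cancel)
```

With this data only customer 5678 comes back, because customer 1234 has a CANCEL_ACCOUNT row between the request and the charge.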

Use group by and having:
select cust_id
from customer_info ci
where date_time = 20180726 and
action in ('REQUEST_CANCEL', 'CHARGED', 'CANCEL_ACCOUNT')
group by cust_id
having sum(case when action = 'REQUEST_CANCEL' then 1 else 0 end) > 0 and
sum(case when action = 'CHARGED' then 1 else 0 end) > 0 and
sum(case when action = 'CANCEL_ACCOUNT' then 1 else 0 end) = 0 ;
Each sum() counts the number of matching records for the customer with that action. The > 0 says that one exists. The = 0 says that none exist.
The database doesn't matter for this logic. Here is a SQL Fiddle using MySQL.
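
As a rough illustration of the GROUP BY / HAVING version, the sketch below runs it on SQLite from Python. The three customers are invented test data, not taken from the question.

```python
import sqlite3

# Hypothetical rows: 1234 has all three actions, 5678 lacks CANCEL_ACCOUNT,
# 9999 only has the request.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_info (date_time INTEGER, cust_id INTEGER, action TEXT)"
)
conn.executemany(
    "INSERT INTO customer_info VALUES (?, ?, ?)",
    [
        (20180726, 1234, "REQUEST_CANCEL"),
        (20180726, 1234, "CANCEL_ACCOUNT"),
        (20180726, 1234, "CHARGED"),
        (20180726, 5678, "REQUEST_CANCEL"),
        (20180726, 5678, "CHARGED"),
        (20180726, 9999, "REQUEST_CANCEL"),
    ],
)

# One conditional sum per action; HAVING keeps customers with a request and a
# charge but no account cancellation.
query = """
SELECT cust_id
FROM customer_info
WHERE date_time = 20180726
  AND action IN ('REQUEST_CANCEL', 'CHARGED', 'CANCEL_ACCOUNT')
GROUP BY cust_id
HAVING SUM(CASE WHEN action = 'REQUEST_CANCEL' THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN action = 'CHARGED' THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN action = 'CANCEL_ACCOUNT' THEN 1 ELSE 0 END) = 0
ORDER BY cust_id
"""
flagged = [row[0] for row in conn.execute(query)]
print(flagged)
```

Note that this form only checks which actions are present for a customer on that day. It does not check the order in which they happened.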

Related

Grouped / pivot query in SQL

I am looking for a query that links data from one table to another. Both tables contain the same order ID and part ID, but one table has 4 lines for every piece. The PRFNAME field of each of those lines should go into a separate column.
Table 1 : IDBGPL
ID;ORDERID;CNT;NAME1;MATNAME;MATGRID;SURFTLEN;SURFTWIDTH
16385;Project_Name_1;1;Corpuszijde;EG_ED_Px_W1001_ST9_18;0;2146;138
16386;Project_Name_1;1;Corpuszijde;EG_ED_Px_W1001_ST9_18;0;2146;50
16385;Project_Name_2;1;Zijde Rechts;EG_ED_Px_W1001_ST9_18;0;888;519,2
Table 2: IDBPRF
ID;ORDERID;PRFNO;PRFID
16385;Project_Name_1;1;PRF_Verstek_Overmaat_25
16385;Project_Name_1;2;PRF_EG_ABS_W1000_ST9_2
16385;Project_Name_1;3;PRF_EG_ABS_W1000_ST9_2
16385;Project_Name_1;4;PRF_EG_ABS_W1000_ST9_2
16386;Project_Name_1;1;PRF_Verstek_Overmaat_25
16386;Project_Name_1;2;PRF_EG_ABS_W1000_ST9_2
16386;Project_Name_1;3;PRF_00_Overmaat_25
16386;Project_Name_1;4;PRF_EG_ABS_W1000_ST9_2
16385;Project_Name_2;1;EG_ABS_H3335_ST28_08_75
16385;Project_Name_2;2;PRF_EG_ABS_W1000_ST9_2
16385;Project_Name_2;3;PRF_00
16385;Project_Name_2;4;PRF_EG_ABS_W1000_ST9_2
This is the desired result from the query:
ID;ORDERID;NAME1;Kant 1 (PRFNO = 1);Kant 2 (PRFNO = 2);Kant 3 (PRFNO = 3);Kant 4 (PRFNO = 4)
16385;Project_Name_1;Corpuszijde;PRF_Verstek_Overmaat_25;PRF_EG_ABS_W1000_ST9_2;PRF_EG_ABS_W1000_ST9_2;PRF_EG_ABS_W1000_ST9_2
16386;Project_Name_1;Corpuszijde;PRF_Verstek_Overmaat_25;PRF_EG_ABS_W1000_ST9_2;PRF_00_Overmaat_25;PRF_EG_ABS_W1000_ST9_2
16385;Project_Name_2;Zijde Rechts;EG_ABS_H3335_ST28_08_75;PRF_EG_ABS_W1000_ST9_2;PRF_00;PRF_EG_ABS_W1000_ST9_2
Here is a link to some example data in Excel:
https://rasgroup-my.sharepoint.com/:x:/g/personal/maarten_de_potter_ras-group_eu/Ec-PvcsV5GhFuademkU83JcBiob28FicrUr3Kl9-VkPE7Q?e=sqOYUu
The closest I got to a result was this query, but I was not able to group the 4 part lines into one.
SELECT
a.ID AS A_ID ,
a.ORDERID AS A_ORDERID,
b.ID AS B_ID ,
b.ORDERID AS B_ORDERID,
b.NAME1,
(CASE WHEN a.PRFNO = 1 THEN a.PRFID END) AS Kant1,
(CASE WHEN a.PRFNO = 2 THEN a.PRFID END) AS Kant2,
(CASE WHEN a.PRFNO = 3 THEN a.PRFID END) AS Kant3,
(CASE WHEN a.PRFNO = 4 THEN a.PRFID END) AS Kant4
FROM
IDBPRF a, IDBGPL b
WHERE
a.ORDERID = b.ORDERID
AND a.ID = b.ID
Hopefully someone could help me with solving this puzzle.
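
One possible way to collapse the four lines into one, sketched here on SQLite from Python, is to wrap each CASE in MAX() and group by the part columns. The table and column names follow the question, only a subset of the sample rows is loaded, and the target database may need small syntax changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IDBGPL (ID INTEGER, ORDERID TEXT, NAME1 TEXT)")
conn.execute("CREATE TABLE IDBPRF (ID INTEGER, ORDERID TEXT, PRFNO INTEGER, PRFID TEXT)")

# Subset of the sample rows from the question.
conn.executemany(
    "INSERT INTO IDBGPL VALUES (?, ?, ?)",
    [
        (16385, "Project_Name_1", "Corpuszijde"),
        (16385, "Project_Name_2", "Zijde Rechts"),
    ],
)
conn.executemany(
    "INSERT INTO IDBPRF VALUES (?, ?, ?, ?)",
    [
        (16385, "Project_Name_1", 1, "PRF_Verstek_Overmaat_25"),
        (16385, "Project_Name_1", 2, "PRF_EG_ABS_W1000_ST9_2"),
        (16385, "Project_Name_1", 3, "PRF_EG_ABS_W1000_ST9_2"),
        (16385, "Project_Name_1", 4, "PRF_EG_ABS_W1000_ST9_2"),
        (16385, "Project_Name_2", 1, "EG_ABS_H3335_ST28_08_75"),
        (16385, "Project_Name_2", 2, "PRF_EG_ABS_W1000_ST9_2"),
        (16385, "Project_Name_2", 3, "PRF_00"),
        (16385, "Project_Name_2", 4, "PRF_EG_ABS_W1000_ST9_2"),
    ],
)

# Each CASE yields the PRFID on one of the four lines and NULL on the others;
# MAX() ignores the NULLs, so grouping leaves one row per part.
query = """
SELECT g.ID, g.ORDERID, g.NAME1,
       MAX(CASE WHEN p.PRFNO = 1 THEN p.PRFID END) AS Kant1,
       MAX(CASE WHEN p.PRFNO = 2 THEN p.PRFID END) AS Kant2,
       MAX(CASE WHEN p.PRFNO = 3 THEN p.PRFID END) AS Kant3,
       MAX(CASE WHEN p.PRFNO = 4 THEN p.PRFID END) AS Kant4
FROM IDBGPL g
JOIN IDBPRF p
  ON p.ORDERID = g.ORDERID
 AND p.ID = g.ID
GROUP BY g.ID, g.ORDERID, g.NAME1
ORDER BY g.ORDERID, g.ID
"""
pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```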

TSQL Select element with only one value

I have a project in which I need to save request for file. I have to save the state of the request with the date of each step.
I have two tables in my database :
FileRequest :
FileRequestStatus :
I would like to create a request to get each FileName from the table FileRequest having only the Status == 'NEW'. The desired output would be in this case C12345, LIVE.
If you have a better idea in order to build my database, I'll take it
I tried something like that :
SELECT [FileName] FROM [FileRequest]
INNER JOIN [FileRequestStatus] ON [FileRequestStatus].[RequestId] = [FileRequest].[RequestId]
GROUP BY [FileRequestStatus].[RequestId]
HAVING COUNT([FileRequestStatus].[RequestStatus]) < 2
SELECT FR.FileName
FROM [FileRequest] FR
INNER JOIN [FileRequestStatus] FRS ON FRS.[RequestId] = FR.[RequestId]
GROUP BY FR.FileName
HAVING COUNT(CASE WHEN FRS.RequestStatus <> 'New' THEN 1 END) = 0 --No statuses other than `NEW`
AND COUNT(CASE WHEN FRS.RequestStatus = 'New' THEN 1 END) >= 1 --At least one status `NEW`
I think the simplest method is aggregation:
select fr.FileName
from nep.FileRequest fr join
nep.FileRequestStatus frs
on frs.RequestId = fr.RequestId
group by fr.FileName
having min(frs.RequestStatus) = max(frs.RequestStatus) and
min(frs.RequestStatus) = 'New';
Note: This assumes that the request status is never NULL (although that is easy to take into account).
What this does is aggregate by file name and then check that the statuses for a given file are all equal (the first condition) and equal to 'New' (the second condition).
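
A small sketch of the min/max check, run on SQLite from Python. The rows are invented because the original table contents are not shown, and the nep schema prefix is dropped since SQLite has no such schema here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FileRequest (RequestId INTEGER, FileName TEXT)")
conn.execute("CREATE TABLE FileRequestStatus (RequestId INTEGER, RequestStatus TEXT)")

# Hypothetical data: C12345 only ever has status 'New', A00001 moved on to 'Done'.
conn.executemany(
    "INSERT INTO FileRequest VALUES (?, ?)",
    [(1, "C12345"), (2, "A00001")],
)
conn.executemany(
    "INSERT INTO FileRequestStatus VALUES (?, ?)",
    [(1, "New"), (2, "New"), (2, "Done")],
)

# min = max means every status in the group is identical; the second condition
# pins that single value to 'New'.
query = """
SELECT fr.FileName
FROM FileRequest fr
JOIN FileRequestStatus frs
  ON frs.RequestId = fr.RequestId
GROUP BY fr.FileName
HAVING MIN(frs.RequestStatus) = MAX(frs.RequestStatus)
   AND MIN(frs.RequestStatus) = 'New'
"""
only_new = [row[0] for row in conn.execute(query)]
print(only_new)
```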
You need to include [FileName] in your GROUP BY; then you can select it in the result:
SELECT [nep].[FileRequest].[FileName] FROM [nep].[FileRequest]
INNER JOIN [nep].[FileRequestStatus] ON [nep].[FileRequestStatus].[RequestId] = [nep].[FileRequest].[RequestId]
GROUP BY [nep].[FileRequestStatus].[RequestId], [nep].[FileRequest].[FileName]
HAVING COUNT([nep].[FileRequestStatus].[RequestStatus]) < 2

SQL AVG statement another table

I'm having trouble with a SQL query. The goal is to see only the certain entries on a specific date (I got this already) which have an average score below 1 in their last 5 home games.
You can see the tables here:
http://dbup2date.uni-bayreuth.de/downloads/bundesliga/Klassendiagramm_Bundesliga.pdf
I have this code so far:
SELECT
A.Spieltag, A.Datum, A.Uhrzeit, B.Name AS Heim
FROM
Spiel AS A
JOIN
Verein AS B ON A.Heim = B.V_ID AND B.Liga = 1
WHERE
Spieltag = 5
HAVING
AVG(SELECT Tore_Heim
FROM Spiel AS A
JOIN Verein AS B
WHEN A.Heim = B.V_ID) < 1
Sorry for my bad English
Thank you
Make sure you group by any fields that you are not aggregating when you use HAVING. You can also simplify that HAVING clause, since you are already referencing those exact tables in your FROM:
SELECT
A.Spieltag, A.Datum, A.Uhrzeit, B.Name AS Heim
FROM
Spiel AS A
JOIN
Verein AS B ON A.Heim = B.V_ID AND B.Liga = 1
WHERE
Spieltag = 5
GROUP BY A.Spieltag, A.Datum, A.Uhrzeit, B.Name
HAVING
AVG(Tore_Heim) < 1

Is it possible to get several COUNT in one single SQL request?

I need help to write a simple procedure. Let me explain what I'm trying to do.
I have 3 tables
tJobOffer
tApplication
tApplicationStatus
I would like to create a procedure that return me a list of tJobOffer with the statistics of different status of this tJobOffer. tApplicationStatus is linked to tApplication that is linked to tJobOffer. An application can be CANDIDATE / ACCEPTED / REFUSED / IGNORED / ...
I created this query :
SELECT
[T].[JobOfferId],
[T].[JobOfferTitle],
COUNT([A].[ApplicationId]) AS [CandidateCount]
FROM [tJobOffer] AS [T]
LEFT JOIN [tApplication] AS [A]
INNER JOIN [tApplicationStatus] AS [S]
ON [S].[ApplicationStatusId] = [A].[ApplicationStatusId]
AND [S].[ApplicationStatusTechnicalName] = 'CANDIDATE'
ON [A].[JobOfferId] = [T].[JobOfferId]
GROUP BY
[T].[JobOfferId],
[T].[JobOfferTitle]
ORDER BY [T].[JobOfferTitle] ;
The result is
> 52ED7C67-21E1-49BB-A1F8-0601E6EED1EA Announce a 0
> F26B228D-0C81-4DA8-A287-F8F997CC1F9C Announce b 0
> 9DA60B23-F113-4C7F-9707-2B90C1556D5D Announce c 2
> 258E11A7-79C1-47B6-8C61-413AA54E2360 Announce d 0
> DA582383-5DF4-4E1D-837C-382371BDEF57 Announce e 1
The result is correct. I get my tJoboffers with statistic on status candidate. I have 2 candidates for Announce c and 1 candidate for announce e. If I change my string 'CANDIDATE' to 'ACCEPTED' or 'REFUSED' I can get the statistic on these status. Is it possible to get everything in one request?
Something like
> 52ED7C67-21E1-49BB-A1F8-0601E6EED1EA Announce a 0 0 2
> F26B228D-0C81-4DA8-A287-F8F997CC1F9C Announce b 0 0 1
> 9DA60B23-F113-4C7F-9707-2B90C1556D5D Announce c 2 0 0
> 258E11A7-79C1-47B6-8C61-413AA54E2360 Announce d 0 0 0
> DA582383-5DF4-4E1D-837C-382371BDEF57 Announce e 1 1 0
use SUM and CASE
SELECT
[T].[JobOfferId],
[T].[JobOfferTitle],
SUM(CASE WHEN [S].[ApplicationStatusTechnicalName] = 'CANDIDATE' THEN 1 ELSE 0 END) AS [CandidateCount],
SUM(CASE WHEN [S].[ApplicationStatusTechnicalName] = 'ACCEPTED' THEN 1 ELSE 0 END) AS [ACCEPTEDCount],
SUM(CASE WHEN [S].[ApplicationStatusTechnicalName] = 'REFUSED' THEN 1 ELSE 0 END) AS [REFUSEDCount]
FROM [tJobOffer] AS [T]
LEFT JOIN [tApplication] AS [A]
ON [A].[JobOfferId] = [T].[JobOfferId]
LEFT JOIN [tApplicationStatus] AS [S]
ON [S].[ApplicationStatusId] = [A].[ApplicationStatusId]
GROUP BY
[T].[JobOfferId],
[T].[JobOfferTitle]
ORDER BY [T].[JobOfferTitle] ;
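
For illustration, here is the same SUM/CASE pattern on SQLite from Python. The table and column names follow the question, but the key columns are simplified to integers and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tJobOffer (JobOfferId INTEGER, JobOfferTitle TEXT)")
conn.execute(
    "CREATE TABLE tApplication (ApplicationId INTEGER, JobOfferId INTEGER, "
    "ApplicationStatusId INTEGER)"
)
conn.execute(
    "CREATE TABLE tApplicationStatus (ApplicationStatusId INTEGER, "
    "ApplicationStatusTechnicalName TEXT)"
)

# Hypothetical data: two candidates for 'Announce c', none for 'Announce d',
# one candidate and one accepted application for 'Announce e'.
conn.executemany(
    "INSERT INTO tJobOffer VALUES (?, ?)",
    [(1, "Announce c"), (2, "Announce d"), (3, "Announce e")],
)
conn.executemany(
    "INSERT INTO tApplicationStatus VALUES (?, ?)",
    [(1, "CANDIDATE"), (2, "ACCEPTED"), (3, "REFUSED")],
)
conn.executemany(
    "INSERT INTO tApplication VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 1), (3, 3, 1), (4, 3, 2)],
)

# One conditional sum per status. The LEFT JOINs keep offers without any
# application, and ELSE 0 turns their NULL status into a zero count.
query = """
SELECT t.JobOfferTitle,
       SUM(CASE WHEN s.ApplicationStatusTechnicalName = 'CANDIDATE' THEN 1 ELSE 0 END),
       SUM(CASE WHEN s.ApplicationStatusTechnicalName = 'ACCEPTED' THEN 1 ELSE 0 END),
       SUM(CASE WHEN s.ApplicationStatusTechnicalName = 'REFUSED' THEN 1 ELSE 0 END)
FROM tJobOffer t
LEFT JOIN tApplication a
  ON a.JobOfferId = t.JobOfferId
LEFT JOIN tApplicationStatus s
  ON s.ApplicationStatusId = a.ApplicationStatusId
GROUP BY t.JobOfferId, t.JobOfferTitle
ORDER BY t.JobOfferTitle
"""
stats = conn.execute(query).fetchall()
for row in stats:
    print(row)
```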
Yes, it is. One way to do that is to use the PIVOT function. Another is to add a LEFT OUTER JOIN for each count you need, something like this:
SELECT a.JobID, COUNT(b.JobID), COUNT(c.JobID)
FROM AllVacancies as a
LEFT OUTER JOIN
(SELECT JobID from AllVacancies WHERE ApplicationStatus = 'CANDIDATE') as b
ON a.JobID = b.JobID
LEFT OUTER JOIN
(SELECT JobID FROM AllVacancies WHERE ApplicationStatus = 'ACCEPTED') as c
ON a.JobID = c.JobID
GROUP BY a.JobID
Add one such join for each category you need.
Yes, you can compute as many counts as you want in one query.
Try this:
SELECT COUNT(col1), COUNT(col2) FROM demoTable;
This gives you the number of non-NULL values in each of the two columns (col1 and col2 stand for your own column names). Note that COUNT(1) and COUNT(2) do not count columns 1 and 2: they count a constant, so both simply return the total number of rows.
The two counts are usually the same, unless NULL values are allowed and present in one of the columns.
If a column contains NULL values, its count is lower, so to count rows, apply COUNT to the primary key column.
SELECT COUNT(*) FROM demoTable;
This also returns a count, but it covers every row of the table regardless of NULLs. It is usually at least as fast as counting a particular column.
For an accurate row count based on a column, that column must have a primary key or NOT NULL constraint.
You are also not restricted to a single table:
SELECT COUNT(col1), COUNT(col2) FROM ( joins or any selection from any number of tables );
Just make sure the columns you count exist in the selection.
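
The effect of NULL values on the different COUNT forms can be checked with a tiny sketch on SQLite from Python. The column names id and col1 are placeholders for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demoTable (id INTEGER PRIMARY KEY, col1 TEXT)")
# Three rows, one of them with a NULL in col1.
conn.executemany(
    "INSERT INTO demoTable VALUES (?, ?)",
    [(1, "a"), (2, None), (3, "c")],
)

# COUNT(*) counts rows; COUNT(column) counts only the non-NULL values.
counts = conn.execute(
    "SELECT COUNT(*), COUNT(id), COUNT(col1) FROM demoTable"
).fetchone()
print(counts)
```

Here the first two counts cover all three rows, while COUNT(col1) skips the row whose col1 is NULL.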

how to write this query using joins?

i have a table campaign which has details of campaign mails sent.
campaign_table: campaign_id campaign_name flag
1 test1 1
2 test2 1
3 test3 0
another table campaign activity which has details of campaign activities.
campaign_activity: campaign_id is_clicked is_opened
1 0 1
1 1 0
2 0 1
2 1 0
I want to get all campaigns with flag value 1, together with the number of activity rows where is_clicked is 1 and the number of rows where is_opened is 1, in a single query.
ie. campaign_id campaign_name numberofclicks numberofopens
1 test1 1 1
2 test2 1 1
I did this using sub-query with the query:
select c.campaign_id,c.campaign_name,
(SELECT count(campaign_id) from campaign_activity WHERE campaign_id=c.campaign_id AND is_clicked=1) as numberofclicks,
(SELECT count(campaign_id) from campaign_activity WHERE campaign_id=c.campaign_id AND is_opened=1) as numberofopens
FROM
campaign c
WHERE c.flag=1
But people say that using sub-queries is not a good coding convention and that you should use a join instead. I don't know how to get the same result using a join. I consulted some of my colleagues and they say that it's not possible to use a join in this situation. Is it possible to get the same result using joins? If yes, please tell me how.
This should do the trick. Replace the INNER JOIN with a LEFT OUTER JOIN if you want to include campaigns which have no activity.
SELECT
c.Campaign_ID
, c.Campaign_Name
, SUM(CASE WHEN a.Is_Clicked = 1 THEN 1 ELSE 0 END) AS NumberOfClicks
, SUM(CASE WHEN a.Is_Opened = 1 THEN 1 ELSE 0 END) AS NumberOfOpens
FROM
dbo.Campaign c
INNER JOIN
dbo.Campaign_Activity a
ON a.Campaign_ID = c.Campaign_ID
WHERE
c.Flag = 1
GROUP BY
c.Campaign_ID
, c.Campaign_Name
Assuming is_clicked and is_opened are only ever 1 or 0, this should work:
select c.campaign_id, c.campaign_name, sum(d.is_clicked), sum(d.is_opened)
from campaign c inner join campaign_activity d
on c.campaign_id = d.campaign_id
where c.flag = 1
group by c.campaign_id, c.campaign_name
No sub-queries.
Hmm. Is what you want as simple as this? I'm not sure I'm reading the question right...
SELECT
campaign_table.campaign_id, SUM(is_clicked), SUM(is_opened)
FROM
campaign_table
INNER JOIN campaign_activity ON campaign_table.campaign_id = campaign_activity.campaign_id
WHERE
campaign_table.flag = 1
GROUP BY
campaign_table.campaign_id
Note that with an INNER JOIN here, you won't see campaigns where there's nothing corresponding in the campaign_activity table. In that circumstance, you should use a LEFT JOIN, and convert NULL to 0 in the SUM, e.g. SUM(IFNULL(is_clicked, 0)).
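
A minimal sketch of that LEFT JOIN variant, run on SQLite from Python (SQLite also provides IFNULL). Campaigns 1 to 3 and the activity rows come from the question; campaign 4, which has no activity, is a made-up addition to show the zero counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE campaign_table (campaign_id INTEGER, campaign_name TEXT, flag INTEGER)"
)
conn.execute(
    "CREATE TABLE campaign_activity (campaign_id INTEGER, is_clicked INTEGER, "
    "is_opened INTEGER)"
)
conn.executemany(
    "INSERT INTO campaign_table VALUES (?, ?, ?)",
    [(1, "test1", 1), (2, "test2", 1), (3, "test3", 0), (4, "test4", 1)],
)
conn.executemany(
    "INSERT INTO campaign_activity VALUES (?, ?, ?)",
    [(1, 0, 1), (1, 1, 0), (2, 0, 1), (2, 1, 0)],
)

# LEFT JOIN keeps campaigns without activity rows; IFNULL turns the NULLs that
# the join produces for them into 0 before summing.
query = """
SELECT c.campaign_id, c.campaign_name,
       SUM(IFNULL(a.is_clicked, 0)) AS numberofclicks,
       SUM(IFNULL(a.is_opened, 0)) AS numberofopens
FROM campaign_table c
LEFT JOIN campaign_activity a
  ON a.campaign_id = c.campaign_id
WHERE c.flag = 1
GROUP BY c.campaign_id, c.campaign_name
ORDER BY c.campaign_id
"""
totals = conn.execute(query).fetchall()
for row in totals:
    print(row)
```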
I suppose this should do it :
select * from campaign_table inner join campaign_activity on campaign_table.id = campaign_activity.id where campaign_table.flag = 3 and campaign_activity.is_clicked = 1 and campaign_activity.is_opened = 1
Attn : this is not tested in a live situation
The SQL in its simplest and most robust form is this (formatted for readability):
SELECT
campaign_table.campaign_ID, campaign_table.campaign_name, Sum(campaign_activity.is_clicked) AS numberofclicks, Sum(campaign_activity.is_opened) AS numberofopens
FROM
campaign_table INNER JOIN campaign_activity ON campaign_table.campaign_ID = campaign_activity.campaign_ID
GROUP BY
campaign_table.campaign_ID, campaign_table.campaign_name, campaign_table.flag
HAVING
campaign_table.flag=1;