SQL: tricky question for finding lockout dates

Hope you can help. We have a table with two columns Customer_ID and Trip_Date. The customer receives 15% off on their first visit and on every visit where they haven't received the 15% off offer in the past thirty days. How do I write a single SQL query that finds all days where a customer received 15% off?
The table looks like this:
+-------------+----------+
| Customer_ID | date     |
+-------------+----------+
| 1           | 01-01-17 |
| 1           | 01-17-17 |
| 1           | 02-04-17 |
| 1           | 03-01-17 |
| 1           | 03-15-17 |
| 1           | 04-29-17 |
| 1           | 05-18-17 |
+-------------+----------+
The desired output would look like this:
+-------------+----------+-------------------+
| Customer_ID | date     | received_discount |
+-------------+----------+-------------------+
| 1           | 01-01-17 | 1                 |
| 1           | 01-17-17 | 0                 |
| 1           | 02-04-17 | 1                 |
| 1           | 03-01-17 | 0                 |
| 1           | 03-15-17 | 1                 |
| 1           | 04-29-17 | 1                 |
| 1           | 05-18-17 | 0                 |
+-------------+----------+-------------------+
We are doing this work in Netezza. I can't think of a way using just window functions, only using recursion and looping. Is there some clever trick that I'm missing?
Thanks in advance,
GF

You didn't tell us what your backend is, nor did you give sample data, expected output, or a sensible data schema :( Here is an example based on a guess at the schema, using PostgreSQL as the backend (it would be too messy as a comment):
(I think you have Customer_Id, Trip_Date and LocationId in a trips table?)
select *
from trips t1
where not exists (
    select *
    from trips t2
    where t1.Customer_id = t2.Customer_id
      and t1.Trip_Date > t2.Trip_Date
      and t1.Trip_date - t2.Trip_Date < 30
);
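One thing to watch with the query above: it keeps a trip only when the customer made no other trip in the preceding 30 days, whereas the rule in the question depends on the last trip that actually received the discount. On the sample data it would return only 01-01-17 and 04-29-17. The sequential rule is easiest to see in procedural form. Below is a minimal Python sketch of it, not a Netezza solution; the function names are made up for illustration, and treating a visit exactly 30 days later as eligible again is an assumption.

```python
from datetime import date, timedelta

# Assumption: a visit exactly 30 days after the last discount qualifies again.
LOCKOUT = timedelta(days=30)

def flag_discounts(trip_dates):
    """Return (trip_date, received_discount) pairs for one customer."""
    flagged = []
    last_discount = None
    for trip in sorted(trip_dates):
        eligible = last_discount is None or trip - last_discount >= LOCKOUT
        if eligible:
            last_discount = trip  # only discounted visits restart the lockout
        flagged.append((trip, int(eligible)))
    return flagged

def no_trip_in_prior_window(trip_dates):
    """Mirror of the NOT EXISTS filter: trips with no earlier trip inside the window."""
    return [t for t in trip_dates
            if not any(timedelta(0) < t - other < LOCKOUT for other in trip_dates)]

trips = [date(2017, 1, 1), date(2017, 1, 17), date(2017, 2, 4), date(2017, 3, 1),
         date(2017, 3, 15), date(2017, 4, 29), date(2017, 5, 18)]

for trip, received in flag_discounts(trips):
    print(trip.isoformat(), received)
```

Because each row's flag depends on the previous flagged row, a plain window function cannot express this directly; a recursive query or a loop carries the "last discount date" forward the same way the variable does here.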

Related

What Clause would most optimally create this query?

I don't have much experience with SQL and am trying to learn. I came across the following interview question, and I can't tell whether I'm missing a piece of information needed to solve it or approaching the problem the wrong way.
This is the question:
We have the following two tables; their definitions are below:
POLICY (id as int, policy_content as varchar2)
POLICY_VOTES (vote as boolean, policy_id as int)
Write a single query that returns the policy_id, the number of yes (true) votes and the number of no (false) votes, with a row for each policy up for a vote stored.
My first thought when approaching this was to use a WITH clause to get the policy_ids and use an inner join to get the votes for yes and no but I can't find a way to make it work, which is what leads me to believe that there's another clause in SQL I'm not aware of or couldn't find that would make it easier. Either that or I'm thinking of the problem in the wrong way.
Good question.
I cannot answer too specifically, since you did not specify a DBMS, but what you will want to do is count or situationally sum based on criteria. When you use an aggregate function like that, you also need GROUP BY.
Here are two example tables I made with test data:
policy
| id | policy_content |
|----|----------------|
| 1 | foo |
| 2 | foo |
| 3 | foo |
| 4 | foo |
| 5 | foo |
policy_votes
| vote | policy_id |
|------|-----------|
| yes | 1 |
| no | 1 |
| yes | 2 |
| yes | 2 |
| no | 3 |
| no | 3 |
| no | 4 |
| yes | 4 |
| yes | 5 |
| yes | 5 |
Using the below query:
SELECT
    policy_votes.policy_id,
    SUM(CASE WHEN vote = 'yes' THEN 1 ELSE 0 END) AS yes_votes,
    SUM(CASE WHEN vote = 'no' THEN 1 ELSE 0 END) AS no_votes
FROM
    policy_votes
GROUP BY
    policy_votes.policy_id
You get:
| POLICY_ID | YES_VOTES | NO_VOTES |
|-----------|-----------|----------|
| 1 | 1 | 1 |
| 2 | 2 | 0 |
| 4 | 1 | 1 |
| 5 | 2 | 0 |
| 3 | 0 | 2 |
Here is an SQL Fiddle for you to try it out.
Try this:
select p.id, p.policy_content,
       count(case when pv.vote = 'true' then 1 end) as number_of_yes,
       count(case when pv.vote = 'false' then 1 end) as number_of_no
from policy p
join policy_votes pv on p.id = pv.policy_id
group by p.id, p.policy_content
Cheers!!
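If "a row for each policy" is meant to include policies that have no votes yet, an inner join (or grouping policy_votes alone) will drop them. Here is a small sketch of the same conditional aggregation with a LEFT JOIN, run through Python's sqlite3 module so it can be tried anywhere. Storing the boolean as 1/0 and the three sample rows are assumptions made for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE policy (id INTEGER PRIMARY KEY, policy_content TEXT);
    CREATE TABLE policy_votes (vote INTEGER, policy_id INTEGER);  -- assumption: 1 = yes, 0 = no
    INSERT INTO policy VALUES (1, 'foo'), (2, 'foo'), (3, 'foo');
    INSERT INTO policy_votes VALUES (1, 1), (0, 1), (1, 2), (1, 2);
""")

# LEFT JOIN keeps policy 3, which has no votes; its CASE expressions see NULL and count 0.
rows = conn.execute("""
    SELECT p.id,
           SUM(CASE WHEN pv.vote = 1 THEN 1 ELSE 0 END) AS yes_votes,
           SUM(CASE WHEN pv.vote = 0 THEN 1 ELSE 0 END) AS no_votes
    FROM policy AS p
    LEFT JOIN policy_votes AS pv ON pv.policy_id = p.id
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()

for row in rows:
    print(row)
```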

T-SQL - Turn table with current page and previous pages into a sequential order per session

I'm trying to create a table that shows the activity per session on a website.
It should look something like this.
Preferred table:
+-----------+---------+---------+----------+
| SessionID | PageSeq | Page    | Duration |
+-----------+---------+---------+----------+
| 1         | 1       | Home    | 5        |
| 1         | 2       | Sales   | 10       |
| 1         | 3       | Contact | 9        |
| 2         | 1       | Sales   | 5        |
| 3         | 1       | Home    | 30       |
| 3         | 2       | Sales   | 5        |
+-----------+---------+---------+----------+
Unfortunately, my current dataset doesn't have a session_id, but the session can be deduced from the time and the path.
Current table:
+------------------+---------+------------+---------------+----------+
| DATE_HOUR_MINUTE | Page | Prev_page | Total_session | Duration |
+------------------+---------+------------+---------------+----------+
| 201801012020 | Home | (entrance) | 24 | 5 |
| 201801012020 | Sales | Home | 24 | 10 |
| 201801012020 | Contact | Sales | 24 | 9 |
| 201801012020 | Sales | (entrance) | 5 | 5 |
| 201801012020 | Home | (entrance) | 35 | 30 |
| 201801012020 | Sales | Home | 35 | 5 |
+------------------+---------+------------+---------------+----------+
What is the best way to turn the current table into the preferred table format?
I've tried searching for nested tables and looped tables, but haven't found anything related to this problem yet.
If you can accept the risk of two sessions starting at the same time with the same total duration, this should be easy enough to do with a recursive query.
;WITH sessionTree AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS sessionId
         , 1 AS PageSeq
         , *
    FROM Session
    WHERE PrevPage = '(entrance)'
    UNION ALL
    SELECT prev.sessionId
         , prev.PageSeq + 1
         , next.*
    FROM sessionTree prev
    JOIN Session next
      ON next.TotalDuration = prev.TotalDuration
     AND next.PrevPage = prev.Page
     AND next.date_hour_minute >= prev.date_hour_minute
)
SELECT * FROM sessionTree
ORDER BY sessionId, PageSeq
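A sessionId is generated for each row that has (entrance) as PrevPage, with PageSeq = 1. The recursive part then joins each page view whose timestamp is not earlier than the previous page's, whose total session duration is the same, and whose PrevPage equals the previous row's Page. (The query assumes a table named Session with columns PrevPage and TotalDuration, which correspond to Prev_page and Total_session in the question.)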
Here's a working example on dbfiddle
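For anyone who wants to experiment without SQL Server, here is a rough sketch of the same recursive idea using Python's sqlite3 module and the rows from the question. The table name page_views, the column names, and the use of SQLite's rowid as a temporary session key are assumptions made for this demo, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page_views (date_hour_minute TEXT, page TEXT, prev_page TEXT,
                             total_session INTEGER, duration INTEGER);
    INSERT INTO page_views VALUES
        ('201801012020', 'Home',    '(entrance)', 24, 5),
        ('201801012020', 'Sales',   'Home',       24, 10),
        ('201801012020', 'Contact', 'Sales',      24, 9),
        ('201801012020', 'Sales',   '(entrance)', 5,  5),
        ('201801012020', 'Home',    '(entrance)', 35, 30),
        ('201801012020', 'Sales',   'Home',       35, 5);
""")

# Anchor: every entrance row starts a session. Recursive step: follow prev_page -> page
# within rows that share the same total session duration.
rows = conn.execute("""
    WITH RECURSIVE session_tree(session_key, page_seq, date_hour_minute,
                                page, total_session, duration) AS (
        SELECT rowid, 1, date_hour_minute, page, total_session, duration
        FROM page_views
        WHERE prev_page = '(entrance)'
        UNION ALL
        SELECT prev.session_key, prev.page_seq + 1, nxt.date_hour_minute,
               nxt.page, nxt.total_session, nxt.duration
        FROM session_tree AS prev
        JOIN page_views AS nxt
          ON nxt.total_session = prev.total_session
         AND nxt.prev_page = prev.page
         AND nxt.date_hour_minute >= prev.date_hour_minute
    )
    SELECT session_key, page_seq, page, duration
    FROM session_tree
    ORDER BY session_key, page_seq
""").fetchall()

# Renumber the session keys 1, 2, 3, ... in order of first appearance.
session_ids = {}
result = [(session_ids.setdefault(key, len(session_ids) + 1), seq, page, dur)
          for key, seq, page, dur in rows]

for row in result:
    print(row)
```

The same caveat applies as in the answer: two sessions with identical total duration in the same minute cannot be told apart by this join.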

Calculate Final outcome based on Results/ID

For a Table T1
+----------+-----------+-----------------+
| PersonID | Date | Employment |
+----------+-----------+-----------------+
| 1 | 2/28/2017 | Stayed the same |
| 1 | 4/21/2017 | Stayed the same |
| 1 | 5/18/2017 | Stayed the same |
| 2 | 3/7/2017 | Improved |
| 2 | 4/1/2017 | Stayed the same |
| 2 | 6/1/2017 | Stayed the same |
| 3 | 3/28/2016 | Improved |
| 3 | 5/4/2016 | Improved |
| 3 | 4/19/2017 | Worsened |
| 4 | 5/19/2016 | Worsened |
| 4 | 2/16/2017 | Improved |
+----------+-----------+-----------------+
I'm trying to calculate a Final Result field per PersonID from the Employment field, based on each person's latest result relative to their prior results. The logic behind Final Result is as follows.
For every person:
(1) If all of that person's results are "Stayed the same", then and only then should the final result be "Stayed the same".
(2) If Worsened/Improved appear in that person's results, the final result should be the latest Worsened/Improved result for that person, irrespective of any "Stayed the same" after a W/I result.
Eg:
Person 1 Final result -> Stayed the same, as per (1)
Person 2 Final result -> Improved, as per (2)
Person 3 Final result -> Worsened, as per (2)
Person 4 Final result -> Improved, as per (2)
Desired Result:
+----------+-----------------+
| PersonID | Final Result |
+----------+-----------------+
| 1 | Stayed the same |
| 2 | Improved |
| 3 | Worsened |
| 4 | Improved |
+----------+-----------------+
I know this might involve Window functions or Sub-queries but I'm struggling to code this.
Hmmm. This is a prioritization query. That sounds like row_number() is called for:
select t1.personid, t1.employment
from (select t1.*,
             row_number() over (partition by personid
                                order by (case when employment <> 'Stayed the same' then 1 else 2 end),
                                         date desc
                               ) as seqnum
      from t1
     ) t1
where seqnum = 1;
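To check the ordering logic against the sample data, here is a small sketch that runs the same row_number() idea through Python's sqlite3 module (window functions need SQLite 3.25 or newer). The column name result_date and the ISO date strings are assumptions for the demo, chosen so that text comparison sorts chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (personid INTEGER, result_date TEXT, employment TEXT);
    INSERT INTO t1 VALUES
        (1, '2017-02-28', 'Stayed the same'),
        (1, '2017-04-21', 'Stayed the same'),
        (1, '2017-05-18', 'Stayed the same'),
        (2, '2017-03-07', 'Improved'),
        (2, '2017-04-01', 'Stayed the same'),
        (2, '2017-06-01', 'Stayed the same'),
        (3, '2016-03-28', 'Improved'),
        (3, '2016-05-04', 'Improved'),
        (3, '2017-04-19', 'Worsened'),
        (4, '2016-05-19', 'Worsened'),
        (4, '2017-02-16', 'Improved');
""")

# Improved/Worsened rows sort ahead of 'Stayed the same'; within a group the latest date wins.
rows = conn.execute("""
    SELECT personid, employment
    FROM (SELECT t1.*,
                 ROW_NUMBER() OVER (
                     PARTITION BY personid
                     ORDER BY CASE WHEN employment <> 'Stayed the same' THEN 1 ELSE 2 END,
                              result_date DESC
                 ) AS seqnum
          FROM t1) AS ranked
    WHERE seqnum = 1
    ORDER BY personid
""").fetchall()

for row in rows:
    print(row)
```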

Select rows where one column is within a day of another column

I have two tables from a site similar to SO: one with posts, and one with up/down votes for each post. I would like to select all votes cast on the day that a post was modified.
My table layout is shown below:
Posts:
-----------------------------------------------
| post_id | post_author | modification_date |
-----------------------------------------------
| 0 | David | 2012-02-25 05:37:34 |
| 1 | David | 2012-02-20 10:13:24 |
| 2 | Matt | 2012-03-27 09:34:33 |
| 3 | Peter | 2012-04-11 19:56:17 |
| ... | ... | ... |
-----------------------------------------------
Votes (each vote is only counted at the end of the day for anonymity):
-------------------------------------------
| vote_id | post_id | vote_date |
-------------------------------------------
| 0 | 0 | 2012-01-13 00:00:00 |
| 1 | 0 | 2012-02-26 00:00:00 |
| 2 | 0 | 2012-02-26 00:00:00 |
| 3 | 0 | 2012-04-12 00:00:00 |
| 4 | 1 | 2012-02-21 00:00:00 |
| ... | ... | ... |
-------------------------------------------
What I want to achieve:
-----------------------------------
| post_id | post_author | vote_id |
-----------------------------------
| 0 | David | 1 |
| 0 | David | 2 |
| 1 | David | 4 |
| ... | ... | ... |
-----------------------------------
I have been able to write the following, but it selects all votes on the day before the post modification, not on the same day (so, in this example, an empty table):
SELECT Posts.post_id, Posts.post_author, Votes.vote_id
FROM Posts
LEFT JOIN Votes ON Posts.post_id = Votes.post_id
WHERE CAST(Posts.modification_date AS DATE) = Votes.vote_date;
How can I fix it so the WHERE clause takes the day before Votes.vote_date? Or, if not possible, is there another way?
Depending on which database you are using (SQL Server, Oracle, etc.), you can usually just subtract 1 from a date and it will subtract exactly one day. To compare against the day before vote_date:
Where Cast(Posts.modification_date as Date) = Votes.vote_date - 1
or, if modification_date is already in date format, just:
Where Posts.modification_date = Votes.vote_date - 1
If you have a site similar to Stack Overflow, then perhaps you also use SQL Server:
SELECT p.post_id, p.post_author, v.vote_id
FROM Posts p LEFT JOIN
     Votes v
     ON p.post_id = v.post_id
WHERE DATEADD(day, 1, CAST(p.modification_date AS DATE)) = v.vote_date;
Different databases have different ways of shifting a date by one day. If this doesn't work, then your database has something similar.
I found another solution, which is to add a day to Posts.modification_date:
...
WHERE CAST(CEILING(CAST(p.modification_date AS FLOAT)) AS datetime) = v.vote_date
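Since the exact date arithmetic differs between databases, here is a small sketch of the comparison using Python's sqlite3 module and the sample rows from the question. SQLite's date(value, '+1 day') stands in for whatever your database offers; the lowercase table names are only for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (post_id INTEGER, post_author TEXT, modification_date TEXT);
    CREATE TABLE votes (vote_id INTEGER, post_id INTEGER, vote_date TEXT);
    INSERT INTO posts VALUES
        (0, 'David', '2012-02-25 05:37:34'),
        (1, 'David', '2012-02-20 10:13:24'),
        (2, 'Matt',  '2012-03-27 09:34:33'),
        (3, 'Peter', '2012-04-11 19:56:17');
    INSERT INTO votes VALUES
        (0, 0, '2012-01-13 00:00:00'),
        (1, 0, '2012-02-26 00:00:00'),
        (2, 0, '2012-02-26 00:00:00'),
        (3, 0, '2012-04-12 00:00:00'),
        (4, 1, '2012-02-21 00:00:00');
""")

# Votes are stamped at the following midnight, so match vote_date to modification day + 1.
rows = conn.execute("""
    SELECT p.post_id, p.post_author, v.vote_id
    FROM posts AS p
    JOIN votes AS v ON v.post_id = p.post_id
    WHERE date(p.modification_date, '+1 day') = date(v.vote_date)
    ORDER BY p.post_id, v.vote_id
""").fetchall()

for row in rows:
    print(row)
```

A plain JOIN is used here because the WHERE condition on the votes column discards unmatched posts anyway, so a LEFT JOIN would return the same rows.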

Create a pivot table from two tables based on dates

I have two MS Access tables sharing a one to many relationship. Their structures are like the following:
tbl_Persons
+----------+------------+-----------+
| PersonID | PersonName | OtherData |
+----------+------------+-----------+
| 1        | PersonA    | etc.      |
| 2        | PersonB    |           |
| 3        | PersonC    |           |
+----------+------------+-----------+
tbl_Visits
+---------+----------+-----------+------------------------+
| VisitID | PersonID | VisitDate | dozens of other fields |
+---------+----------+-----------+------------------------+
| 1       | 1        | 09/01/13  |                        |
| 2       | 1        | 09/02/13  |                        |
| 3       | 2        | 09/03/13  |                        |
| 4       | 2        | 09/04/13  | etc...                 |
+---------+----------+-----------+------------------------+
I wish to create a new table based on the VisitDate field, the column headings of which are Visit-n where n is 1 to the number of visits, Visit-n-Data1, Visit-n-Data2, Visit-n-Data3 etc.
MergedTable
+----------+----------+-------------+----------------+----------+----------------+
| PersonID | Visit1   | Visit1Data1 | Visit1Data2... | Visit2   | Visit2Data1... |
+----------+----------+-------------+----------------+----------+----------------+
| 1        | 09/01/13 |             |                | 09/02/13 |                |
| 2        | 09/03/13 |             |                | 09/04/13 |                |
| 3        | etc.     |             |                |          |                |
+----------+----------+-------------+----------------+----------+----------------+
I am really not sure how to do this, whether with a SQL query or by using DAO and looping through records and columns. It is essential that there is only one row per PersonID and that all of that person's data appears chronologically across the columns.
Start off by ranking the visits with something like
SELECT PersonID, VisitID,
       (SELECT COUNT(VisitID) FROM tbl_Visits AS C
        WHERE C.PersonID = tbl_Visits.PersonID
          AND C.VisitDate < tbl_Visits.VisitDate) AS RankNumber
FROM tbl_Visits
Use this query as the base for the 'pivot'.
If a person can have several visits on the same day, the WHERE clause needs to be a bit more sophisticated (for example, by breaking ties on VisitID). But I hope you get the basic concept.
Pivoting can be done with multiple LEFT JOINs.
I have not tested this solution, so I cannot vouch for its performance. This is easier to accomplish in SQL Server than in MS Access.
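To make the rank-then-LEFT-JOIN idea concrete, here is a minimal sketch run through Python's sqlite3 module rather than Access, so the syntax will need adapting. The view name ranked_visits, the VisitID tie-breaker, the 1-based rank, and reading 09/01/13 as 1 September 2013 are all assumptions for the demo; only two visit columns are pivoted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_Persons (PersonID INTEGER, PersonName TEXT);
    CREATE TABLE tbl_Visits (VisitID INTEGER, PersonID INTEGER, VisitDate TEXT);
    INSERT INTO tbl_Persons VALUES (1, 'PersonA'), (2, 'PersonB'), (3, 'PersonC');
    INSERT INTO tbl_Visits VALUES
        (1, 1, '2013-09-01'), (2, 1, '2013-09-02'),
        (3, 2, '2013-09-03'), (4, 2, '2013-09-04');

    -- Rank = 1 + number of earlier visits by the same person (VisitID breaks same-day ties).
    CREATE VIEW ranked_visits AS
        SELECT v.PersonID, v.VisitID, v.VisitDate,
               (SELECT COUNT(*) FROM tbl_Visits AS c
                WHERE c.PersonID = v.PersonID
                  AND (c.VisitDate < v.VisitDate
                       OR (c.VisitDate = v.VisitDate AND c.VisitID < v.VisitID))) + 1
               AS RankNumber
        FROM tbl_Visits AS v;
""")

# One LEFT JOIN per visit slot; persons without that visit get NULL in the slot.
rows = conn.execute("""
    SELECT p.PersonID, v1.VisitDate AS Visit1, v2.VisitDate AS Visit2
    FROM tbl_Persons AS p
    LEFT JOIN ranked_visits AS v1 ON v1.PersonID = p.PersonID AND v1.RankNumber = 1
    LEFT JOIN ranked_visits AS v2 ON v2.PersonID = p.PersonID AND v2.RankNumber = 2
    ORDER BY p.PersonID
""").fetchall()

for row in rows:
    print(row)
```

Each extra visit slot needs another join, so the number of columns has to be fixed in advance; for an unknown number of visits, building the query text in code (DAO/VBA in the Access case) is the usual workaround.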