SQL StrataScratch Facebook interview question

SMS Confirmations From Users
Facebook sends SMS texts when users attempt to 2FA (2-factor authenticate) to log into the platform. In order to successfully 2FA they must confirm they received the SMS text. Confirmation texts are only valid on the date they were sent. Unfortunately, there was an ETL problem where friend requests and invalid confirmation records were inserted into the logs which are stored in the 'fb_sms_sends' table. Fortunately, the 'fb_confirmers' table contains valid confirmation records so you can use this table to identify confirmed SMS texts.
Calculate the percentage of confirmed SMS texts for August 4, 2020.
fb_sms_sends
ds datetime
country varchar
carrier varchar
phone_number int
type varchar
fb_confirmers
date datetime
phone_number int
My solution -
Select s.ds, (count(c.phone_number)::Float/count(s.phone_number)::Float)*100 as perc
from fb_sms_sends s
left join fb_confirmers c
on s.phone_number = c.phone_number
where s.ds = c.date
group by s.ds
Not sure what is wrong here. Can someone please explain?

I think what you're not handling is this scenario from the problem statement:
Unfortunately, there was an ETL problem where friend requests and invalid confirmation records were inserted into the logs which are stored in the 'fb_sms_sends' table. Fortunately, the 'fb_confirmers' table contains valid confirmation records so you can use this table to identify confirmed SMS texts.
You need to remove the friend requests and invalid confirmation records from the table.
If you add
type NOT IN ('confirmation', 'friend_request')
to your WHERE clause, you should get the right answer.

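Instead of putting the date comparison in the WHERE clause, put it in the join condition so that all rows from the left table are kept. Filtering on c.date in the WHERE clause discards the rows where the join found no match (c.date is NULL), which effectively turns the left join into an inner join. I slightly modified your solution: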
Select s.ds, (count(c.phone_number)::Float/count(s.phone_number)::Float)*100 as perc
from fb_sms_sends s
left join fb_confirmers c
on s.phone_number = c.phone_number and s.ds = c.date
where s.type not in ('confirmation', 'friend_request')
and s.ds = '2020-08-04'
group by 1
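To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. The rows are invented for illustration, the dates are stored as plain text, and the type value 'message' is an assumption about how regular SMS rows are labelled; the actual StrataScratch data and the Postgres cast syntax are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fb_sms_sends (ds TEXT, phone_number INTEGER, type TEXT);
CREATE TABLE fb_confirmers (date TEXT, phone_number INTEGER);
-- Made-up sample rows: four real texts plus two ETL leftovers.
INSERT INTO fb_sms_sends VALUES
  ('2020-08-04', 1, 'message'),
  ('2020-08-04', 2, 'message'),
  ('2020-08-04', 3, 'message'),
  ('2020-08-04', 4, 'message'),
  ('2020-08-04', 5, 'friend_request'),
  ('2020-08-04', 6, 'confirmation');
INSERT INTO fb_confirmers VALUES
  ('2020-08-04', 1),
  ('2020-08-05', 2);  -- confirmed on a different day, so not valid
""")

# Date test in WHERE: unmatched rows have c.date = NULL and are dropped,
# so only confirmed texts remain and the ratio is inflated.
where_version = """
SELECT 100.0 * COUNT(c.phone_number) / COUNT(s.phone_number)
FROM fb_sms_sends s
LEFT JOIN fb_confirmers c ON s.phone_number = c.phone_number
WHERE s.ds = c.date
"""

# Date test in ON: every sent text survives the join, matched or not.
on_version = """
SELECT 100.0 * COUNT(c.phone_number) / COUNT(s.phone_number)
FROM fb_sms_sends s
LEFT JOIN fb_confirmers c
  ON s.phone_number = c.phone_number AND s.ds = c.date
WHERE s.type NOT IN ('confirmation', 'friend_request')
  AND s.ds = '2020-08-04'
"""

pct_where = conn.execute(where_version).fetchone()[0]
pct_on = conn.execute(on_version).fetchone()[0]
print(pct_where, pct_on)  # 100.0 25.0
```

With this sample data the WHERE version reports 100 percent because only the one confirmed text survives the filter, while the ON version counts one confirmation out of four sent texts.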

Related

MS Access SQL code - query issue with NOT IN

I'm trying to find out which partners have not paid the monthly tuition in a particular month.
I have a table called Socios containing all partner names (SocioNome) and another table called RegistroPagamento containing all payments made. (This table is populated by a form where the user enters the partner name, the amount paid, and the month/year the payment relates to.)
I have created a query where I used the SQL code below:
SELECT [SocioNome]
FROM [Socios] NOT IN
(SELECT [SocioNome] FROM [RegistroPagamento] WHERE [MesBoleto] = [Forms]![Selecionar_MCobranca]![TBoxMes] AND [AnoBoleto] = [Forms]![Selecionar_MCobranca]![TBoxAno]);
[Selecionar_MCobranca] is the form I have mentioned before and the [TBoxMes] & [TBoxAno] are the combo boxes from the form which the user can select the month and the year the payment refers to.
When I run the code, an error message pops up indicating that there is a FROM clause syntax issue, and I don't know exactly what is causing the problem.
NOT IN is a comparison operator that belongs in the WHERE clause, not in the FROM clause. I strongly recommend using NOT EXISTS instead. The idea is:
SELECT s.SocioNome
FROM Socios as s
WHERE NOT EXISTS (SELECT 1
FROM RegistroPagamento as rp
WHERE rp.MesBoleto = [Forms]![Selecionar_MCobranca]![TBoxMes] AND
rp.AnoBoleto = [Forms]![Selecionar_MCobranca]![TBoxAno] AND
rp.SocioNome = s.SocioNome
);
NOT IN returns no rows if any value returned by the subquery is NULL. To protect against this, just use NOT EXISTS, which behaves as expected in this case.
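The NULL behaviour can be checked with a small sketch in Python's sqlite3 module. The partner names and payment rows are invented, and the form references are replaced by ordinary query parameters, since the Access form objects do not exist outside Access. SQLite follows the same three-valued logic for NOT IN here, but this is an illustration and not a test of Access itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Socios (SocioNome TEXT);
CREATE TABLE RegistroPagamento (SocioNome TEXT, MesBoleto INTEGER, AnoBoleto INTEGER);
INSERT INTO Socios VALUES ('Ana'), ('Bruno'), ('Carla');
-- One real payment and one payment row with a missing name.
INSERT INTO RegistroPagamento VALUES ('Ana', 5, 2020), (NULL, 5, 2020);
""")

params = (5, 2020)  # stands in for the month/year chosen on the form

not_in_rows = conn.execute("""
SELECT s.SocioNome
FROM Socios AS s
WHERE s.SocioNome NOT IN (
    SELECT rp.SocioNome
    FROM RegistroPagamento AS rp
    WHERE rp.MesBoleto = ? AND rp.AnoBoleto = ?)
ORDER BY s.SocioNome
""", params).fetchall()

not_exists_rows = conn.execute("""
SELECT s.SocioNome
FROM Socios AS s
WHERE NOT EXISTS (
    SELECT 1
    FROM RegistroPagamento AS rp
    WHERE rp.MesBoleto = ? AND rp.AnoBoleto = ?
      AND rp.SocioNome = s.SocioNome)
ORDER BY s.SocioNome
""", params).fetchall()

print(not_in_rows)      # []
print(not_exists_rows)  # [('Bruno',), ('Carla',)]
```

The single NULL name makes every NOT IN comparison evaluate to unknown, so no partner is returned, whereas NOT EXISTS still lists the two partners without a payment.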

Retrieve the most recent person record added to an account

I am trying to build a table of Contact IDs (Primary Keys) of the most recently created records assigned to each Account of a certain type in our Salesforce org.
Working in Salesforce Marketing Cloud, I'm trying to build a sample list that I can setup to update automatically so the records I'm testing against are never stale. I only need one example from each account to do my testing. Since I want to make sure the record isn't stale, I want to select the most recent record assigned to each Account.
Every Contact is assigned to one and only one Account. The Account ID lives as a foreign key on the Contact Record. The created date of the Contact is also a field on the Contact record.
The Sample list needs to contain the email address, ContactID, and the name of the Management Company, which lives on the Account record.
I figured doing a directional JOIN toward the Account table would do the trick, but that didn't work. I figure that's because there's nothing distinguishing which record to pick.
This is what I've got for code, which is pretty useless...
SELECT
C.Email AS Email,
C.Id AS Id18,
C.AccountID AS AccountID,
A.Management_Company AS ManagementCompany
FROM
ENT.Contact_Salesforce_DE AS C
RIGHT JOIN ENT.Account_Salesforce_DE AS A ON C.AccountID = A.Id
WHERE
A.RecordTypeId = '1234567890ABCDEFGH' AND
A.Management_Company IS NOT NULL AND
C.Email IS NOT NULL
The syntax checks out, but I get a system error every time I run it.
Marketing Cloud runs on an older version of SQL Server, so some more recent query functions won't always work.
And yes, I'm a relative noob to SQL. Won't surprise me if this has a really simple solution, but I couldn't find another entry describing the solution, so...
If I followed you correctly, you want to pull out the latest contact associated with each account.
On database servers that do not support window functions (which seems to be the case for your RDBMS), one typical solution is to add a special condition to the JOIN. This NOT EXISTS condition uses a subquery to ensure that the record picked from the child table is the most recent one (in other words: there is no other child record with a later creation date than the one being joined):
SELECT
c.Email AS Email,
c.Id AS Id18,
c.AccountID AS AccountID,
a.Management_Company AS ManagementCompany
FROM
ENT.Account_Salesforce_DE AS a
INNER JOIN ENT.Contact_Salesforce_DE AS c
ON c.AccountID = a.Id
AND c.email IS NOT NULL
AND NOT EXISTS (
SELECT 1
FROM ENT.Contact_Salesforce_DE AS c1
WHERE
c1.email IS NOT NULL
AND c1.AccountID = a.Id
AND c1.CreatedDate > c.CreatedDate
)
WHERE
a.RecordTypeId = '1234567890ABCDEFGH'
AND a.Management_Company IS NOT NULL
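As a rough check of the NOT EXISTS pattern, the sketch below runs the same logic in Python's sqlite3 module. The table names are shortened to Account and Contact (SQLite has no ENT schema here), the record type 'RT' and all rows are invented, and CreatedDate is stored as ISO text so that string comparison orders the dates correctly. Marketing Cloud itself is not involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Account (Id TEXT, Management_Company TEXT, RecordTypeId TEXT);
CREATE TABLE Contact (Id TEXT, AccountID TEXT, Email TEXT, CreatedDate TEXT);
INSERT INTO Account VALUES
  ('A1', 'Acme',    'RT'),
  ('A2', 'Globex',  'RT'),
  ('A3', NULL,      'RT'),     -- no management company: excluded
  ('A4', 'Initech', 'OTHER');  -- wrong record type: excluded
INSERT INTO Contact VALUES
  ('C1', 'A1', 'old@acme.test',    '2020-01-01'),
  ('C2', 'A1', 'new@acme.test',    '2020-06-01'),
  ('C3', 'A1', NULL,               '2020-09-01'),  -- newest, but no email
  ('C4', 'A2', 'only@globex.test', '2020-03-01'),
  ('C5', 'A3', 'x@none.test',      '2020-02-01'),
  ('C6', 'A4', 'y@initech.test',   '2020-02-01');
""")

latest = conn.execute("""
SELECT c.Email, c.Id, c.AccountID, a.Management_Company
FROM Account AS a
INNER JOIN Contact AS c
   ON c.AccountID = a.Id
  AND c.Email IS NOT NULL
  AND NOT EXISTS (            -- no newer contact with an email on this account
        SELECT 1
        FROM Contact AS c1
        WHERE c1.Email IS NOT NULL
          AND c1.AccountID = a.Id
          AND c1.CreatedDate > c.CreatedDate)
WHERE a.RecordTypeId = 'RT'
  AND a.Management_Company IS NOT NULL
ORDER BY a.Id
""").fetchall()

print(latest)
```

One caveat with this pattern: if two contacts on the same account share the exact same latest CreatedDate, both are returned, so a tie-breaker on Id may be needed when strictly one row per account is required.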

How to tightly contain an SQL query result

I'm writing an application that implements a message system through a 'memos' table in a database. The table has several fields that look like this:
id, date_sent, subject, senderid, recipients,message, status
When someone sends a new memo, it will be entered into the memos table. A memo can be sent to multiple people at the same time and the recipients userid's will be inserted into the 'recipients' field as comma separated values.
It would seem that an SQL query like this would work to see if a specific userid is included in a memo:
SELECT * FROM memos WHERE recipients LIKE '%15%'
But I'm not sure this is the right solution. If I use the SQL statement above, won't that return everything that "contains" 15? For example, using the above statement, user 15, 1550, 1564, 2015, would all be included in the result set (and those users might not actually be on the recipient list).
What is the best way to resolve this so that ONLY the user 15 is pulled in if they are in the recipient field instead of everything containing a 15? Am I misunderstanding the LIKE statement?
I think you would be better off storing your recipients in a child table of the memos table. Your memos table has a memo ID, which the child table references like this:
MemoRecipients
-----
MemoRecipientId INT PRIMARY KEY, IDENTITY,
MemoId INT FK to memos NOT NULL
UserId INT NOT NULL
To query the memos sent to a specific user, you would do something like
SELECT *
FROM MEMOS m
INNER JOIN memoRecipients mr on m.Id = mr.memoId
WHERE mr.UserId = 15
No, you aren't misunderstanding it; that's how LIKE works. But to achieve what you want, it would be better not to combine the recipients into one field. Instead, create a separate table that stores the recipient list for each memo.
For me I will use below schema, for your need:
Table_Memo
id, date_sent, subject, senderid, message, status
Table_Recipient
id_memo FK Table_Memo(id), recipient
By doing so, if you want to get specific recipients from a memo, you can do such query:
SELECT a.* FROM Table_Memo a, Table_Recipient b
WHERE a.id = "memo_id" AND a.id = b.id_memo AND b.recipient = 15
I am not sure exactly how your application pulls these messages, but I imagine a better way would be to create a table message_recepient, which represents the many-to-many relationship between recipients and memos
id, memoId, recepientId
Then your application could pull messages like this
SELECT m.*
FROM memos m inner join message_recepient mr on m.id = mr.memoId
WHERE recepientId = 15
This way you will get the messages for the specific user. Again, I don't know what your status field is for, but if it holds new/read/unread, you could add this to your WHERE clause:
and m.status = 'new'
Order by date_sent desc
This way you retrieve only the messages that are new, most recent first.
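The over-matching of LIKE and the exact match of a child table can be compared side by side. This is a minimal sketch in Python's sqlite3 module; the memo rows are invented, and the child table follows the MemoRecipients layout from the first answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE memos (id INTEGER PRIMARY KEY, subject TEXT, recipients TEXT);
CREATE TABLE MemoRecipients (
    MemoRecipientId INTEGER PRIMARY KEY,
    MemoId INTEGER NOT NULL REFERENCES memos(id),
    UserId INTEGER NOT NULL);
-- The comma-separated column and the child table hold the same information.
INSERT INTO memos VALUES
  (1, 'Budget',  '15,22'),
  (2, 'Lunch',   '1550,2015'),
  (3, 'Offsite', '7,15');
INSERT INTO MemoRecipients (MemoId, UserId) VALUES
  (1, 15), (1, 22), (2, 1550), (2, 2015), (3, 7), (3, 15);
""")

# Substring match: also hits memo 2, which only went to users 1550 and 2015.
like_ids = [row[0] for row in conn.execute(
    "SELECT id FROM memos WHERE recipients LIKE '%15%' ORDER BY id")]

# Exact match on the child table: only memos actually sent to user 15.
join_ids = [row[0] for row in conn.execute("""
    SELECT m.id
    FROM memos m
    INNER JOIN MemoRecipients mr ON m.id = mr.MemoId
    WHERE mr.UserId = ?
    ORDER BY m.id""", (15,))]

print(like_ids)  # [1, 2, 3]
print(join_ids)  # [1, 3]
```

An integer UserId column can also be indexed, which a leading-wildcard LIKE on a text column generally cannot use.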

Trying to write SQL Self Join?

I have a user table for a website where customers who forget their password create a new account rather than bother retrieving the forgotten one.
I would like to see how many times a customer has joined the website by joining the user table to itself on the customer's email address. The ID is unique each time they join, so I added a condition checking that the Account IDs are different.
Here is my query:
Select Distinct
T1.Email as "eMail-01", T2.Email as "eMail-02", T1.AccountID as "AccountID-01", T2.AccountID as "AccountID-02", T1.UserID as "UserID-01", T2.UserID as "UserID-02"
From User T1
Left Join User T2 on T1.eMail = T2.eMail
Where ( T2.eMail is not null ) and ( T2.eMail <> '' )
and ( T1.AccountID <> T2.AccountID )
The table has about 60,000 records in it, and I seem to be getting a great number of records returned, based on the number of AccountID permutations.
For instance, one customer registered 5 times with the same email address, so I'm getting 25 records back (5 x 5). I'm not sure if I'm writing this query correctly.
The query is running very long.
If I understand correctly, what you most probably want is a count of AccountIDs per email address, so there's no need for a self join here. The query would be:
SELECT Email, count(AccountID)
FROM User
GROUP BY Email
and it should run quite quickly even with 60,000 records.
Either way, you should think about putting a UNIQUE index on the email column after cleaning the table. You would then benefit from faster email lookups and prevent users from creating multiple accounts with the same email address. That should push them to retrieve their password instead.
Suggestions:
First, the left join is pointless because you filter out the rows where the right side is null, which is exactly what a simple inner join does. Either use an inner join or remove the condition T2.eMail is not null.
Secondly, is your table properly indexed? If it is not, add the appropriate indexes.
Thirdly, you can use a very simple query to track down the emails that have more than one accountId:
select email, count(accountId) as accounts
from user
group by email
having count(accountId) > 1
Then you can work using only the emails that have more than one account.
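To see why the self join multiplies rows while the grouped query does not, here is a small sketch in Python's sqlite3 module. The five rows are invented, and the table name is quoted as "User" only to stay clear of reserved words in other engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "User" (UserID INTEGER, AccountID INTEGER, Email TEXT);
-- One customer registered three times, one once, one with a blank email.
INSERT INTO "User" VALUES
  (1, 101, 'a@x.test'),
  (2, 102, 'a@x.test'),
  (3, 103, 'a@x.test'),
  (4, 104, 'b@x.test'),
  (5, 105, '');
""")

# Self join: every ordered pair of distinct accounts per email is a row.
pair_count = conn.execute("""
SELECT COUNT(*)
FROM "User" t1
INNER JOIN "User" t2 ON t1.Email = t2.Email
WHERE t1.AccountID <> t2.AccountID
""").fetchone()[0]

# Grouped query: one row per email that has more than one account.
duplicates = conn.execute("""
SELECT Email, COUNT(AccountID) AS accounts
FROM "User"
WHERE Email <> ''
GROUP BY Email
HAVING COUNT(AccountID) > 1
""").fetchall()

print(pair_count)  # 6
print(duplicates)  # [('a@x.test', 3)]
```

Three registrations already produce six pair rows in the self join, and the number grows quadratically with the registrations per email, while the grouped query stays at one row per email.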
Hope this helps

Extract Both Counts AND Earliest Instance from my Dataset

Using Microsoft SQL Server 2000
I have a requirement to be able to email a monthly report that details a number of events.
(I have got the email bit sussed.)
Among the data I need to email is a report on the number of certain courses people have attended. (So far so easy: a couple of inner joins and a Count() and I'm there.)
To add to that, some of the internal courses that we run have an expiry date, which prompts a refresher course. I have been able to crudely get the data I need by using the SQL code for part one and sticking the result set into a temp table, then iterating over each row in that table, getting the user ID, querying the user's course attendances, sorting them by date so that the earliest is at the top, and just taking the TOP 1 record.
This seems so inefficient, so is there any way I can amend my current query so that I can also get the date of just the earliest course that the user attended?
i.e.
SELECT uName, COUNT(uId), [ not sure what would go in here] FROM UserDetails
INNER JOIN PassDates
ON PassDates.fkUser = uId
GROUP BY uName, uId
where, for example's sake
UserDetails
uId
uName
and
PassDates
fkUser
CourseId
PassDate
Hope I've explained this well enough for someone to help me.
To put an answer to the question..
SELECT uName, COUNT(uId), MIN(PassDate)
FROM UserDetails
INNER JOIN PassDates ON PassDates.fkUser = uId
GROUP BY uName, uId
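You can turn it into a left join if you have users without any courses (yet). In that case, count PassDates.fkUser instead of uId so that those users show a count of 0 rather than 1.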
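The combined COUNT and MIN query can be tried out with a minimal sketch in Python's sqlite3 module. The users and pass dates are invented, and dates are stored as ISO text so that MIN picks the earliest one. The example uses the left join variant and counts PassDates.fkUser, which is NULL for users without courses and is therefore not counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserDetails (uId INTEGER PRIMARY KEY, uName TEXT);
CREATE TABLE PassDates (fkUser INTEGER, CourseId INTEGER, PassDate TEXT);
INSERT INTO UserDetails VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Cara');
INSERT INTO PassDates VALUES
  (1, 10, '2003-05-01'),
  (1, 11, '2002-01-15'),
  (2, 10, '2004-07-20');  -- Cara has no courses yet
""")

report = conn.execute("""
SELECT uName,
       COUNT(PassDates.fkUser) AS courses,  -- NULLs from the left join are skipped
       MIN(PassDate) AS first_pass
FROM UserDetails
LEFT JOIN PassDates ON PassDates.fkUser = uId
GROUP BY uName, uId
ORDER BY uName
""").fetchall()

for row in report:
    print(row)
```

Both aggregates are computed in the same pass over each group, so the temp table and the row-by-row loop from the question are no longer needed.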