Oracle SQL find popular query - sql

I have got a table called Questionnaire in SQL where there are columns named ID, Newspaper and CreditCards. I need to output the newspaper that is most popular among IDs who has at least 3 creditcards.
Example:
ID Credit Cards Newspaper
----------------------------------------
10354 3 The Independent
12154 4 The Independent
11354 2 The Times
14587 3 The Daily Mail
19874 5 The Sunday news
16847 1 The Independent
Can you please help with an sql command to output the query stated above?

select *
from (
select newspaper,
rank() over (order by count(*) desc) as rnk
from Questionnaire
where credit_cards >= 3
group by newspaper
) t
where rnk = 1
If two newspapers have the same "popularity" both will be returned.
SQLFiddle demo: http://sqlfiddle.com/#!4/16dcb/1

If you are looking to get only the most popular Newspaper[s], then this can solve the query.
select
newspaper
, count(1) as fct
from
Questionnaire
where
CreditCards >= 3
group by newspaper
having fct =
(
select
max(ct)
from
(
select
newspaper
,count(1) as ct
from
Questionnaire
where
CreditCards >= 3
group by newspaper
)
)
/

Related

Postgres, groupBy and count for table and relations at the same time

I have a table called 'users' that has the following structure:
id (PK)
campaign_id
createdAt
1
123
2022-07-14T10:30:01.967Z
2
1234
2022-07-14T10:30:01.967Z
3
123
2022-07-14T10:30:01.967Z
4
123
2022-07-14T10:30:01.967Z
At the same time I have a table that tracks clicks per user:
id (PK)
user_id(FK)
createdAt
1
1
2022-07-14T10:30:01.967Z
2
2
2022-07-14T10:30:01.967Z
3
2
2022-07-14T10:30:01.967Z
4
2
2022-07-14T10:30:01.967Z
Both of these table are up to millions of records... I need the most efficient query to group the data per campaign_id.
The result I am looking for would look like this:
campaign_id
total_users
total_clicks
123
3
1
1234
1
3
I unfortunately have no idea how to achieve this while minding performance and most important of it all I need to use WHERE or HAVING to limit the query in a certain time range by createdAt
Note, PostgreSQL is not my forte, nor is SQL. But, I'm learning spending some time on your question. Have a go with INNER JOIN after two seperate SELECT() statements:
SELECT * FROM
(
SELECT campaign_id, COUNT (t1."id(PK)") total_users FROM t1 GROUP BY campaign_id
) tbl1
INNER JOIN
(
SELECT campaign_id, COUNT (t2."user_id(FK)") total_clicks FROM t2 INNER JOIN t1 ON t1."id(PK)" = t2."user_id(FK)" GROUP BY campaign_id
) tbl2
USING(campaign_id)
See an online fiddle. I believe this is now also ready for a WHERE clause in both SELECT statements to filter by "createdAt". I'm pretty sure someone else will come up with something better.
Good luck.
Hope this will help you.
select u.campaign_id,
count(distinct u.id) users_count,
count(c.user_id) clicks_count
from
users u left join clicks c on u.id=c.user_id
group by 1;
See here query output

SQL interview question: select, join, grouping

At the interview got question:
What products clients bought before first order of brand "Brand 1". Select top 5 by orders.
Tables:
Items:
RezonItemID;BrandName
5555613;Brand 1
2315946;Brand 2
9132648;Brand 3
3125847;Brand 1
3126548;Brand 5
Orders:
ClientID;ClientOrderID;RezonItemID;FactMoment
00611847;4562145;5555613;2021-01-09
00798451;7987465;1321321;2021-08-10
00914751;3154844;9132648;2021-07-01
00975418;9797451;1312125;2021-09-09
00978461;9413235;9754512;2021-10-29
My decision:
WITH first_order AS (
SELECT ClientID, MIN(FactMoment) as o_date
FROM orders
JOIN items
USING(RezonItemID)
WHERE BrandName = 'Brand 1'
GROUP BY ClientID
)
SELECT RezonItemID, COUNT(*) AS n_orders
FROM orders
JOIN items
USING(RezonItemID)
JOIN first_order
USING(ClientID)
WHERE FactMoment < o_date
GROUP BY RezonItemID
ORDER BY n_orders DESC
LIMIT 5
Is it possible to solve this by window functions? Maybe there is better decision?
Given the following tables
1 RezonItemID;BrandName
2 5555613;Brand 1
3 2315946;Brand 2
4 9132648;Brand 3
5 3125847;Brand 1
6 3126548;Brand 5
7
8 ClientID;ClientOrderID;RezonItemID;FactMoment
9 00611847;4562145;5555613;2021-01-09
10 00798451;7987465;1321321;2021-08-10
11 00914751;3154844;9132648;2021-07-01
12 00975418;9797451;1312125;2021-09-09
13 00978461;9413235;9754512;2021-10-29
It seems that if the question is "Which products did the clients buy before the first order for an item of Brand 1," a sql query may not be necessary. Assuming that FactMoment is a timestamp for the order, we can see that the first order has the earliest date (01/09/21) and has a "RezonItemID" 5555613. That item has the brand "Brand 1".
So the answer would be that no items were purchased before the first purchase of an item with BrandName = 'Brand 1'.
This is a puzzling question, the sample test data is not particularly useful since it yields no testable results so is pretty much useless.
If you want to use window functions then that's certainly possible.
The following successfully yields no rows and should work, but without proper test data it's hard to actually be sure!
Note using() is not supported by some databases, ansi join syntax is preferred.
select top(5) BrandName from (
select o.ClientID, i.BrandName, o.FactMoment,
Min(case when i.BrandName='Brand 1' then FactMoment end) over() earliest,
Count(*) over(partition by ClientID) qty
from Orders o left join Items i on i.RezonItemID=o.RezonItemID
)o
where FactMoment<earliest
order by qty desc

SQL To delete number of items is less than required item number

I have two tables - StepModels (support plan) and FeedbackStepModels (feedback), StepModels keeps how many steps each support plan requires.
SELECT [SupportPlanID],COUNT(*)AS Steps
FROM [StepModels]
GROUP BY SupportPlanID
SupportPlanID (Steps)
-------------------------------
1 4
2 9
3 3
4 10
FeedbackStepModels keeps how many steps employee entered the system
SELECT [FeedbackID],SupportPlanID,Count(*)AS StepsNumber
FROM [FeedbackStepModels]
GROUP BY FeedbackID,SupportPlanID
FeedbackID SupportPlanID
---------------------------------------------
1 1 3 --> this suppose to be 4
2 2 9 --> Correct
3 3 0 --> this suppose to be 3
4 4 10 --> Correct
If submitted Feedback steps total is less then required total amount I want to delete this wrong entry from the database. Basically i need to delete FeedbackID 1 and 3.
I can load the data into List and compare and delete it, but want to know if we can we do this in SQL rather than C# code.
You can use the query below to remove your unwanted data by SQL Script
DELETE f
FROM FeedbackStepModels f
INNER JOIN (
SELECT [FeedbackID],SupportPlanID, Count(*) AS StepsNumber
FROM [FeedbackStepModels]
GROUP BY FeedbackID,SupportPlanID
) f_derived on f_derived_FeedbackID=f.FeedBackID and f_derived.SupportPlanID = f.SupportPlanID
INNER JOIN (
SELECT [SupportPlanID],COUNT(*)AS Steps
FROM [StepModels]
GROUP BY SupportPlanID
) s_derived on s_derived.SupportPlanID = f.SupportPlanID
WHERE f_derived.StepsNumber < s_derived.Steps
I think you want something like this.
DELETE FROM [FeedbackStepModels]
WHERE FeedbackID IN
(
SELECT a.FeedbackID
FROM
(
SELECT [FeedbackID],
SupportPlanID,
COUNT(*) AS StepsNumber
FROM [FeedbackStepModels]
GROUP BY FeedbackID,
SupportPlanID
) AS a
INNER JOIN
(
SELECT [SupportPlanID],
COUNT(*) AS Steps
FROM [StepModels]
GROUP BY SupportPlanID
) AS b ON a.SupportPlanID = b.[SupportPlanID]
WHERE a.StepsNumber < b.Steps
);

SQL with nested condition

EDIT: added third requirement after playing with solution from Tim Biegeleisen
EDIT2: modified Robbie's DOB to be before his parent's marriage date
I am trying to create a query that will look at two tables and determine the difference in dates based on a percentage. I know, super confusing... Let me try and explain using the tables below:
Bob and Mary are married on 2010-01-01 and expect 4 kids (Parent table)
I want to know how many years it took until they met 50% of their expected kids (i.e. 2/4 kids). Using the Child table to see the DOB of their 4 kids, we know that Frankie is the second child which meets our 50% threshold so we use Frankie's DOB and subtract it from Frankie's parent's marriage date and end up with 3 years!
If the goal isn't reached then display no value e.g. Mick and Jo only had 1 child so far so they haven't yet reached their goal
Hoping this is doable using BigQuery standard SQL.
Parent table
id married_couple married_at expected_kids
--------------------------------------
1 Bob and Mary 2010-01-01 4
2 Mick and Jo 2010-01-01 4
Child table
id child_name parent_id date_of_birth
--------------------------------------
1 Eddie 1 2012-01-01
2 Frankie 1 2013-01-01
3 Robbie 1 2005-01-01
4 Duncan 1 2015-01-01
5 Rick 2 2014-01-01
Expected SQL result
parent_id half_goal_reached(years)
--------------------------------------
1 3
2
Below both soluthions for BigQuery Standard SQL
First one is more in classic sql way, the second one is more of BigQuery style (I think)
First Solution: with analytics function
#standardSQL
SELECT
parent_id,
IF(
MAX(pos) = MAX(CAST(expected_kids / 2 AS INT64)),
MAX(DATE_DIFF(date_of_birth, married_at, YEAR)),
NULL
) AS half_goal_reached
FROM (
SELECT c.parent_id, c.date_of_birth, expected_kids, married_at,
ROW_NUMBER() OVER(PARTITION BY c.parent_id ORDER BY c.date_of_birth) AS pos
FROM `child` AS c
JOIN `parent` AS p
ON c.parent_id = p.id
)
WHERE pos <= CAST(expected_kids / 2 AS INT64)
GROUP BY parent_id
Second Solution: with use of ARRAY
#standardSQL
SELECT
parent_id,
DATE_DIFF(dates[SAFE_ORDINAL(CAST(expected_kids / 2 AS INT64))], married_at, YEAR) AS half_goal_reached
FROM (
SELECT
parent_id,
ARRAY_AGG(date_of_birth ORDER BY date_of_birth) AS dates,
MAX(expected_kids) AS expected_kids,
MAX(married_at) AS married_at
FROM `child` AS c
JOIN `parent` AS p
ON c.parent_id = p.id
GROUP BY parent_id
)
Dummy Data
You can test / play with both solutions using below dummy data
#standardSQL
WITH `parent` AS (
SELECT 1 id, 'Bob and Mary' married_couple, DATE '2010-01-01' married_at, 4 expected_kids UNION ALL
SELECT 2, 'Mick and Jo', DATE '2010-01-01', 4
),
`child` AS (
SELECT 1 id, 'Eddie' child_name, 1 parent_id, DATE '2012-01-01' date_of_birth UNION ALL
SELECT 2, 'Frankie', 1, DATE '2013-01-01' UNION ALL
SELECT 3, 'Robbie', 1, DATE '2014-01-01' UNION ALL
SELECT 4, 'Duncan', 1, DATE '2015-01-01' UNION ALL
SELECT 5, 'Rick', 2, DATE '2014-01-01'
)
Try the following query, whose logic is too verbose to explain it well. I join the parent and child tables, bringing into line the parent id, number of years elapsed since marriage, running number of children, and expected number of children. With this information in hand, we can easily find the first row whose running number of children matches or exceeds half of the expected number.
SELECT parent_id, num_years AS half_goal_reached
FROM
(
SELECT parent_id, num_years, cnt, expected_kids,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY num_years) rn
FROM
(
SELECT
t2.parent_id,
YEAR(t2.date_of_birth) - YEAR(t1.married_at) AS num_years,
(SELECT COUNT(*) FROM child c
WHERE c.parent_id = t2.parent_id AND
c.date_of_birth <= t2.date_of_birth) AS cnt,
t1.expected_kids
FROM parent t1
INNER JOIN child t2
ON t1.id = t2.parent_id
) t
WHERE
cnt >= expected_kids / 2
) t
WHERE t.rn = 1;
Note that there may be issues with how I computed the yearly differences, or how I compute the threshhold for half the number of expected children. Also, if we were using a recent enterprise database we could have used an analytic function to get the running number of children instead of a correlated subquery, but I was unsure if Big Query would support that, so I used the latter.

Query to select distinct values from different tables and not have them repeat (show them as a flat file)

I'm trying to get all phones, emails, and organizations for a person and show it in a flat file format. There should be n number of rows, where n is the max count of organizations, emails, or phones. NULL values will be shown once all values have been shown in the rows, with NULL being the last values. The emails and phones can only have 1 PreferredInd per person. I want these to be on the same row (1 of them can be NULL). I've tried to do this on a more complex query, but couldn't get it to work, so I've started over using this simpler example.
Example tables and values:
#ContactPerson
Id Name
1 John Doe
#ContactEmail
Id PersonId Email PreferredInd
1 1 johndoe#us.gov 0
2 1 jdoe#us.gov 1
3 1 johndoe#gmail.com 0
#ContactPhone
Id PersonId Phone PreferredInd
1 1 888-867-5309 0
2 1 305-476-5234 1
#ContactOrganization
Id PersonId Organization
1 1 US Government
2 1 US Army
I want a resulting set to look like:
Name Organization PreferredInd Email Phone
John Doe US Government 1 jdoe#us.gov 888-867-5309
John Doe US Army 0 johndoe#us.gov 305-467-5234
John Doe NULL 0 johndoe#gmail.com NULL
The complete sql code that I have for this example is here on pastebin. It also includes code to create the sample tables. It works when the count of emails exceeds the count of organizations or phones, but that won't always be true. I can't seem to figure out how to get the result that I'm looking for. The actual tables I'm working with can have 0 or infinity emails, phones, or organizations per person. There will also be many more values, but I can fix that myself.
Can you help me fix my query or show me a simpler way to do it? If you have any questions, just let me know and I can try to answer them.
something like this?
with cte_e as (
select
*,
row_number() over(order by PreferredInd desc, Id) as rn
from ContactEmail
), cte_p as (
select
*,
row_number() over(order by PreferredInd desc, Id) as rn
from ContactPhone
), cte_o as (
select
*,
row_number() over(order by Organization) as rn
from ContactOrganization
), cte_d as (
select distinct rn, PersonId from cte_e union
select distinct rn, PersonId from cte_p union
select distinct rn, PersonId from cte_o
)
select
pr.Name, o.Organization, e.Email, p.Phone
from cte_d as d
left outer join ContactPerson as pr on pr.Id = d.PersonId
left outer join cte_e as e on e.PersonId = d.PersonId and e.rn = d.rn
left outer join cte_p as p on p.PersonId = d.PersonId and p.rn = d.rn
left outer join cte_o as o on o.PersonId = d.PersonId and o.rn = d.rn
sql fiddle demo
it's a bit clumsy, I can think of couple of other possible ways to do this, but I think this one is most readable one
Step 1
Write a query that does the full join of all the tables, which will end up with lots of duplicate rows for each person (for each email or phone number)
Step 2
Write a second query that uses GroupBy to group the rows, and that uses the Case or Decode keywords (like a c# switch statement) to find the preferred row value and select it as the value to display