SQLite: Getting multiple results with LIMIT 1

I have the following problem.
Part of a task is to determine the visitor(s) with the most money spent between 2000 and 2020.
It just looks like this.
SELECT UserEMail FROM Visitor
JOIN Ticket ON Visitor.UserEMail = Ticket.VisitorUserEMail
where Ticket.Date> date('2000-01-01') AND Ticket.Date < date ('2020-12-31')
Group by Ticket.VisitorUserEMail
order by SUM(Price) DESC;
Is it possible to output more than one person if both have spent the same amount?

Use rank():
SELECT VisitorUserEMail
FROM (SELECT VisitorUserEMail, SUM(PRICE) as sum_price,
RANK() OVER (ORDER BY SUM(Price) DESC) as seqnum
FROM Ticket t
WHERE t.Date >= date('2000-01-01') AND t.Date <= date('2020-12-31')
GROUP BY t.VisitorUserEMail
) t
WHERE seqnum = 1;
Note: You don't need the JOIN, assuming that ticket buyers are actually visitors. If that assumption is not true, then use the JOIN.
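To see the tie handling in action, here is a minimal sketch that runs the RANK() approach through Python's sqlite3 module. The sample rows and e-mail addresses are made up for illustration, and it assumes an SQLite build with window functions (3.25 or later).

```python
import sqlite3

def top_spenders(conn):
    """Return every visitor tied for the highest total spend in 2000-2020."""
    query = """
        SELECT VisitorUserEMail
        FROM (SELECT VisitorUserEMail,
                     RANK() OVER (ORDER BY SUM(Price) DESC) AS seqnum
              FROM Ticket
              WHERE Date >= '2000-01-01' AND Date <= '2020-12-31'
              GROUP BY VisitorUserEMail) AS ranked
        WHERE seqnum = 1
        ORDER BY VisitorUserEMail
    """
    return [row[0] for row in conn.execute(query)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Ticket (VisitorUserEMail TEXT, Date TEXT, Price INTEGER)")
# Hypothetical sample data: a and b both spend 50 inside the range.
conn.executemany(
    "INSERT INTO Ticket VALUES (?, ?, ?)",
    [
        ("a@example.com", "2005-06-01", 30),
        ("a@example.com", "2010-06-01", 20),
        ("b@example.com", "2015-03-15", 50),
        ("c@example.com", "2018-09-09", 10),
        ("c@example.com", "2021-02-02", 500),  # outside the range, ignored
    ],
)
print(top_spenders(conn))
```

Because RANK() gives tied rows the same rank, both a@example.com and b@example.com come back, which LIMIT 1 would not do.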

Use a CTE that returns all the total prices for each email and with NOT EXISTS select the rows with the top total price:
WITH cte AS (
SELECT VisitorUserEMail, SUM(Price) SumPrice
FROM Ticket
WHERE Date >= '2000-01-01' AND Date <= '2020-12-31'
GROUP BY VisitorUserEMail
)
SELECT c.VisitorUserEMail
FROM cte c
WHERE NOT EXISTS (
SELECT 1 FROM cte
WHERE SumPrice > c.SumPrice
)
or:
WITH cte AS (
SELECT VisitorUserEMail, SUM(Price) SumPrice
FROM Ticket
WHERE Date >= '2000-01-01' AND Date <= '2020-12-31'
GROUP BY VisitorUserEMail
)
SELECT VisitorUserEMail
FROM cte
WHERE SumPrice = (SELECT MAX(SumPrice) FROM cte)
Note that you don't need the function date() because the result of date('2000-01-01') is '2000-01-01'.
Also I think that the conditions in the WHERE clause should include the =, right?
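Both notes can be checked with a small sketch using Python's sqlite3 module. The table contents and the helper name winners are assumptions made for illustration; the query is the MAX(SumPrice) variant with the comparison operators swapped in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# date() applied to an ISO-8601 date string returns the same string.
same = conn.execute("SELECT date('2000-01-01') = '2000-01-01'").fetchone()[0]

conn.execute("CREATE TABLE Ticket (VisitorUserEMail TEXT, Date TEXT, Price INTEGER)")
# Hypothetical rows: two purchases sit exactly on the range boundaries.
conn.executemany(
    "INSERT INTO Ticket VALUES (?, ?, ?)",
    [
        ("a@example.com", "2000-01-01", 40),
        ("b@example.com", "2020-12-31", 40),
        ("c@example.com", "2010-05-05", 10),
    ],
)

def winners(lower_op, upper_op):
    """Top spenders using the given comparison operators for the date bounds."""
    query = f"""
        WITH cte AS (
            SELECT VisitorUserEMail, SUM(Price) AS SumPrice
            FROM Ticket
            WHERE Date {lower_op} '2000-01-01' AND Date {upper_op} '2020-12-31'
            GROUP BY VisitorUserEMail
        )
        SELECT VisitorUserEMail
        FROM cte
        WHERE SumPrice = (SELECT MAX(SumPrice) FROM cte)
        ORDER BY VisitorUserEMail
    """
    return [row[0] for row in conn.execute(query)]

print(same)                # 1 means the two values compare equal
print(winners(">=", "<="))  # boundary days included
print(winners(">", "<"))    # boundary days excluded
```

With the strict operators the two boundary-day purchases drop out and a different visitor wins, which is why the inclusive comparisons matter here.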

Related

SQL - One Table with Two Date Columns. Count and Join

I have a table (vOKPI_Tickets) that has the following columns:
|CreationDate | CompletionDate|
I'd like to get a count on each of those columns, and group them by date. It should look something like this when complete:
| Date | Count-Created | Count-Completed |
I can get each of the counts individually, by doing something like this:
SELECT COUNT(TicketId)
FROM vOKPI_Tickets
GROUP BY CreationDate
and
SELECT COUNT(TicketId)
FROM vOKPI_Tickets
GROUP BY CompletionDate
How can I combine the output into one table? I should also note that this will become a View.
Thanks in advance
Simple generic approach:
select
coalesce(crte.creationdate, cmpl.CompletionDate) as theDate,
crte.cnt as created,
cmpl.cnt as completed
from
(select creationdate, count (*) as cnt from vOKPI_Tickets where creationdate is not null group by creationdate) crte
full join
(select CompletionDate, count (*) as cnt from vOKPI_Tickets where CompletionDate is not null group by CompletionDate) cmpl
on crte.creationdate = cmpl.CompletionDate
You can unpivot and aggregate. A general method is:
select dte, sum(created), sum(completed)
from ((select creationdate as dte, 1 as created, 0 as completed
from vOKPI_Tickets
) union all
(select CompletionDate as dte, 0 as created, 1 as completed
from vOKPI_Tickets
)
) t
group by dte;
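As a rough illustration of the unpivot-and-aggregate idea, the sketch below runs it against an in-memory SQLite database from Python. The ticket rows are invented, and the IS NOT NULL filter is an addition of this sketch so that unfinished tickets do not produce a NULL date group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vOKPI_Tickets (TicketId INTEGER, CreationDate TEXT, CompletionDate TEXT)"
)
# Hypothetical sample data; ticket 3 is not completed yet.
conn.executemany(
    "INSERT INTO vOKPI_Tickets VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", "2024-01-02"),
        (2, "2024-01-01", "2024-01-03"),
        (3, "2024-01-02", None),
        (4, "2024-01-03", "2024-01-03"),
    ],
)

query = """
    SELECT dte, SUM(created) AS count_created, SUM(completed) AS count_completed
    FROM (SELECT CreationDate AS dte, 1 AS created, 0 AS completed
          FROM vOKPI_Tickets
          UNION ALL
          SELECT CompletionDate AS dte, 0 AS created, 1 AS completed
          FROM vOKPI_Tickets) AS t
    WHERE dte IS NOT NULL   -- assumption: skip tickets without a completion date
    GROUP BY dte
    ORDER BY dte
"""
counts = conn.execute(query).fetchall()
for row in counts:
    print(row)
```

Each ticket contributes one "created" row and one "completed" row, so a single GROUP BY over the stacked rows yields both counts per date.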
In SQL Server, you can use cross apply for this:
select d.dt, sum(d.is_created) count_created, sum(d.is_completed) count_completed
from vokpi_tickets t
cross apply (values (CreationDate, 1, 0), (CompletionDate, 0, 1)) as d(dt, is_created, is_completed)
where d.dt is not null
group by d.dt

conditional running sum

I'm trying to return the number of unique users that converted over time.
So I have the following query:
WITH CTE
As
(
SELECT '2020-04-01' as date,'userA' as user,1 as goals Union all
SELECT '2020-04-01','userB',0 Union all
SELECT '2020-04-01','userC',0 Union all
SELECT '2020-04-03','userA',1 Union all
SELECT '2020-04-05','userC',1 Union all
SELECT '2020-04-06','userC',0 Union all
SELECT '2020-04-06','userB',0
)
select
date,
COUNT(DISTINCT
IF
(goals >= 1,
user,
NULL)) AS cad_converters
from CTE
group by date
I'm trying to count distinct users, but I need the distinct count to accumulate across all dates so far. I probably need something like a cumulative sum...
expected result would be something like this
date, goals, total_unique_converted_users
'2020-04-01',1,1
'2020-04-01',0,1
'2020-04-01',0,1
'2020-04-03',1,2
'2020-04-05',1,2
'2020-04-06',0,2
'2020-04-06',0,2
Below is for BigQuery Standard SQL
#standardSQL
SELECT t.date, t.goals, total_unique_converted_users
FROM `project.dataset.table` t
LEFT JOIN (
SELECT a.date,
COUNT(DISTINCT IF(b.goals >= 1, b.user, NULL)) AS total_unique_converted_users
FROM `project.dataset.table` a
CROSS JOIN `project.dataset.table` b
WHERE a.date >= b.date
GROUP BY a.date
)
USING(date)
I would approach this by tagging when the first goal is scored for each user. Then simply do a cumulative sum:
select cte.* except (seqnum), countif(seqnum = 1) over (order by date)
from (select cte.*,
(case when goals = 1 then row_number() over (partition by user, goals order by date) end) as seqnum
from cte
) cte;
I realize this can be expressed without the case in the subquery:
select cte.* except (seqnum), countif(seqnum = 1 and goals = 1) over (order by date)
from (select cte.*,
row_number() over (partition by user, goals order by date) as seqnum
from cte
) cte;
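The same first-conversion idea can be sketched outside BigQuery. The example below is an approximation in SQLite via Python: countif is replaced by SUM(CASE ...), and the columns are renamed to day and user_name (names chosen for this sketch). It assumes SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (day TEXT, user_name TEXT, goals INTEGER)")
# Sample rows taken from the question.
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("2020-04-01", "userA", 1),
        ("2020-04-01", "userB", 0),
        ("2020-04-01", "userC", 0),
        ("2020-04-03", "userA", 1),
        ("2020-04-05", "userC", 1),
        ("2020-04-06", "userC", 0),
        ("2020-04-06", "userB", 0),
    ],
)

query = """
    SELECT day, user_name, goals,
           -- running count of rows that are a user's first conversion
           SUM(CASE WHEN seqnum = 1 AND goals = 1 THEN 1 ELSE 0 END)
               OVER (ORDER BY day) AS total_unique_converted_users
    FROM (SELECT e.*,
                 ROW_NUMBER() OVER (PARTITION BY user_name, goals ORDER BY day) AS seqnum
          FROM events e) AS tagged
    ORDER BY day, user_name
"""
running = conn.execute(query).fetchall()
for row in running:
    print(row)
```

On this data userA is counted once, on 2020-04-01, so the running total stays at 1 on 2020-04-03 and reaches 2 when userC first converts on 2020-04-05.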

How to get the validity date range of a price from individual daily prices in SQL

I have some prices for the month of January.
Date,Price
1,100
2,100
3,115
4,120
5,120
6,100
7,100
8,120
9,120
10,120
Now, the output I need is a non-overlapping date range for each price.
price,from,To
100,1,2
115,3,3
120,4,5
100,6,7
120,8,10
I need to do this using SQL only.
For now, if I simply group by and take min and max dates, I get the below, which is an overlapping range:
price,from,to
100,1,7
115,3,3
120,4,10
This is a gaps-and-islands problem. The simplest solution is the difference of row numbers:
select price, min(date), max(date)
from (select t.*,
row_number() over (order by date) as seqnum,
row_number() over (partition by price order by date) as seqnum2
from t
) t
group by price, (seqnum - seqnum2)
order by min(date);
Why this works is a little hard to explain. But if you look at the results of the subquery, you will see how the adjacent rows are identified by the difference in the two values.
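A small runnable sketch may make the trick easier to see. It loads the question's ten prices into an in-memory SQLite table (named prices here, with the date column renamed to day; both names are choices of this sketch) and prints the two row numbers next to their difference before grouping on it. It assumes SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (day INTEGER, price INTEGER)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?)",
    [(1, 100), (2, 100), (3, 115), (4, 120), (5, 120),
     (6, 100), (7, 100), (8, 120), (9, 120), (10, 120)],
)

numbered = """
    SELECT day, price,
           ROW_NUMBER() OVER (ORDER BY day) AS seqnum,
           ROW_NUMBER() OVER (PARTITION BY price ORDER BY day) AS seqnum2
    FROM prices
"""

# Inspect the subquery: within a run of equal prices both counters advance
# together, so seqnum - seqnum2 stays constant until the price changes.
for row in conn.execute(
    f"SELECT day, price, seqnum, seqnum2, seqnum - seqnum2 FROM ({numbered}) ORDER BY day"
):
    print(row)

islands = conn.execute(f"""
    SELECT price, MIN(day), MAX(day)
    FROM ({numbered}) AS t
    GROUP BY price, (seqnum - seqnum2)
    ORDER BY MIN(day)
""").fetchall()
print(islands)
```

Grouping by price together with the difference separates the two runs of 100 (days 1-2 and 6-7) even though they share a price.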
SELECT Lag.price,Lag.[date] AS [From], MIN(Lead.[date]-Lag.[date])+Lag.[date] AS [to]
FROM
(
SELECT [date],[Price]
FROM
(
SELECT [date],[Price],LAG(Price) OVER (ORDER BY DATE,Price) AS LagID FROM #table1 A
)B
WHERE CASE WHEN Price <> ISNULL(LagID,1) THEN 1 ELSE 0 END = 1
)Lag
JOIN
(
SELECT [date],[Price]
FROM
(
SELECT [date],Price,LEAD(Price) OVER (ORDER BY DATE,Price) AS LeadID FROM [#table1] A
)B
WHERE CASE WHEN Price <> ISNULL(LeadID,1) THEN 1 ELSE 0 END = 1
)Lead
ON Lag.[Price] = Lead.[Price]
WHERE Lead.[date]-Lag.[date] >= 0
GROUP BY Lag.[date],Lag.[price]
ORDER BY Lag.[date]
Another method using ROWS UNBOUNDED PRECEDING
SELECT price, MIN([date]) AS [from], [end_date] AS [To]
FROM
(
SELECT *, MIN([abc]) OVER (ORDER BY DATE DESC ROWS UNBOUNDED PRECEDING ) end_date
FROM
(
SELECT *, CASE WHEN price = next_price THEN NULL ELSE DATE END AS abc
FROM
(
SELECT a.* , b.[date] AS next_date, b.price AS next_price
FROM #table1 a
LEFT JOIN #table1 b
ON a.[date] = b.[date]-1
)AA
)BB
)CC
GROUP BY price, end_date

how to filter data in sql based on percentile

I have two tables. The first contains customer information such as id, age, and name. The second contains the customer id, the product they purchased, and the purchase_date (dates range from 2016 to 2018).
Table 1
-------
customer_id
customer_age
customer_name
Table2
------
customer_id
product
purchase_date
My desired result is a table of customer_name and product for customers who made a purchase in 2017 and are older than 75% of the customers who made a purchase in 2016.
Depending on your flavor of SQL, you can get quartiles using the more general ntile analytical function. This basically adds a new column to your query.
SELECT MIN(customer_age) as min_age FROM (
SELECT customer_id, customer_age, ntile(4) OVER(ORDER BY customer_age) AS q4 FROM table1
WHERE customer_id IN (
SELECT customer_id FROM table2 WHERE purchase_date >= '2016-01-01' AND purchase_date < '2017-01-01')
) q
WHERE q4=4
This returns the lowest age of the 4th-quartile customers, which can be used in a subquery against the customers who made purchases in 2017.
The argument to ntile is how many buckets you want to divide into. In this case 75%+ equals 4th quartile, so 4 buckets is OK. The OVER() clause specifies what you want to sort by (customer_age in our case), and also lets us partition (group) the data if we want to, say, create multiple rankings for different years or countries.
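As a rough check of the ntile idea, the sketch below builds two tiny tables in SQLite from Python, computes the 4th-quartile age threshold among 2016 buyers, and then applies it to 2017 purchases. All rows, names, and products are invented for illustration, and the year filter is written as a date range on ISO-formatted text dates. It assumes SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (customer_id INTEGER, customer_age INTEGER, customer_name TEXT);
    CREATE TABLE table2 (customer_id INTEGER, product TEXT, purchase_date TEXT);
""")
# Hypothetical customers 1..8 with ages 20..90, all buying in 2016.
ages = [20, 30, 40, 50, 60, 70, 80, 90]
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(i, age, f"cust{i}") for i, age in enumerate(ages, start=1)],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, 'widget', '2016-05-01')",
    [(i,) for i in range(1, 9)],
)
# Two of them also buy in 2017: a young one and an old one.
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?)",
    [(2, "gadget", "2017-03-01"), (8, "gizmo", "2017-07-15")],
)

# Lowest age in the top quartile of 2016 buyers.
min_age = conn.execute("""
    SELECT MIN(customer_age)
    FROM (SELECT customer_age, NTILE(4) OVER (ORDER BY customer_age) AS q4
          FROM table1
          WHERE customer_id IN (SELECT customer_id FROM table2
                                WHERE purchase_date >= '2016-01-01'
                                  AND purchase_date < '2017-01-01')) AS q
    WHERE q4 = 4
""").fetchone()[0]

# 2017 purchases by customers at or above that age.
buyers_2017 = conn.execute("""
    SELECT c.customer_name, p.product
    FROM table1 c
    JOIN table2 p ON p.customer_id = c.customer_id
    WHERE p.purchase_date >= '2017-01-01' AND p.purchase_date < '2018-01-01'
      AND c.customer_age >= ?
    ORDER BY c.customer_name
""", (min_age,)).fetchall()
print(min_age, buyers_2017)
```

With eight 2016 buyers, ntile(4) puts two customers in each bucket, so the threshold is the younger of the two oldest customers.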
Age is a horrible field to include in a database. Every day it changes. You should have date-of-birth or something similar.
To get the 75% oldest value in 2016, there are several possibilities. I usually go for row_number() and count(*):
select min(customer_age)
from (select c.*,
row_number() over (order by customer_age) as seqnum,
count(*) over () as cnt
from customers c
where exists (select 1
from customer_products cp
where cp.customer_id = c.customer_id and
cp.purchase_date >= '2016-01-01' and
cp.purchase_date < '2017-01-01'
)
) c
where seqnum >= 0.75 * cnt;
Then, to use this for a query for 2017:
with a2016 as (
select min(customer_age) as customer_age
from (select c.*,
row_number() over (order by customer_age) as seqnum,
count(*) over () as cnt
from customers c
where exists (select 1
from customer_products cp
where cp.customer_id = c.customer_id and
cp.purchase_date >= '2016-01-01' and
cp.purchase_date < '2017-01-01'
)
) c
where seqnum >= 0.75 * cnt
)
select c.*, cp.product_id
from customers c join
customer_products cp
on cp.customer_id = c.customer_id and
cp.purchase_date >= '2017-01-01' and
cp.purchase_date < '2018-01-01' join
a2016 a
on c.customer_age >= a.customer_age;

writing a sql query in MySQL with subquery on the same table

I have a table svn1:
id | date | startdate
23 2002-12-04 2000-11-11
23 2004-08-19 2005-09-10
23 2002-09-09 2004-08-23
select id,startdate from svn1 where startdate>=(select max(date) from svn1 where id=svn1.id);
Now the problem is how to make the subquery match its id against the id in the outer query. Obviously id=svn1.id won't work. Thanks!
If you have the time to read more:
This is a simplified version of what I am really trying to do. My actual query is something like this:
select
id, count(distinct archdetails.compname)
from
svn1,svn3,archdetails
where
svn1.name='ant'
and svn3.name='ant'
and archdetails.name='ant'
and type='Bug'
and svn1.revno=svn3.revno
and svn3.compname=archdetails.compname
and
(
(startdate>=sdate and startdate<=edate)
or
(
sdate<=(select max(date) from svn1 where type='Bug' and id=svn1.id)
and
edate>=(select max(date) from svn1 where type='Bug' and id=svn1.id)
)
or
(
sdate>=startdate
and
edate<=(select max(date) from svn1 where type='Bug' and id=svn1.id)
)
)
group by id LIMIT 0,40;
As you notice select max(date) from svn1 where type='Bug' and id=svn1.id has to be calculated many times.
Can I calculate this just once, store it using AS, and then use that value later? The main problem is correcting id=svn1.id so that it compares against the id in the outer table.
I'm not sure you can eliminate the repetition of the subquery, but the subquery can reference the main query if you use a table alias, as in the following:
select id,
count(distinct archdetails.compname)
from svn1 s1,
svn3 s3,
archdetails a
where s1.name='ant' and
s3.name='ant' and
a.name='ant' and
type='Bug' and
s1.revno=s3.revno and
s3.compname = a.compname and
( (startdate >= sdate and startdate<=edate) or
(sdate <= (select max(date)
from svn1
where type='Bug' and
id=s1.id) and
edate>=(select max(date)
from svn1
where type='Bug' and
id=s1.id)) or
(sdate >= startdate and edate<=(select max(date)
from svn1
where type='Bug' and
id=s1.id)) )
group by id LIMIT 0,40;
Share and enjoy.
You should be able to left join to a sub-select so you only run the query once. Then you can do a join condition to pull out the maximum for the ID on each record as shown below:
SELECT id,
COUNT(DISTINCT archdetails.compname)
FROM (svn1,
svn3,
archdetails)
LEFT JOIN (
SELECT id, MAX(date) AS MaximumDate
FROM svn1
WHERE TYPE = 'Bug'
GROUP BY id
) AS MaxDate ON MaxDate.id = svn1.id
WHERE svn1.name = 'ant'
AND svn3.name = 'ant'
AND archdetails.name = 'ant'
AND TYPE = 'Bug'
AND svn1.revno = svn3.revno
AND svn3.compname = archdetails.compname
AND (
(startdate >= sdate AND startdate <= edate)
OR (
sdate <= MaxDate.MaximumDate
AND edate >= MaxDate.MaximumDate
)
OR (
sdate >= startdate
AND edate <= MaxDate.MaximumDate
)
)
GROUP BY id
LIMIT 0, 40;
Try using alias, something like this should work:
select s.id,s.startdate from svn1 s where s.startdate>=(select max(s2.date) from svn1 s2 where s.id=s2.id);
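A small sketch can show why the alias matters. It uses SQLite through Python rather than MySQL, on the assumption that name resolution in the subquery behaves the same way for this case, and it adds a hypothetical second id (7) to the question's three rows so the difference becomes visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE svn1 (id INTEGER, date TEXT, startdate TEXT)")
conn.executemany(
    "INSERT INTO svn1 VALUES (?, ?, ?)",
    [
        (23, "2002-12-04", "2000-11-11"),
        (23, "2004-08-19", "2005-09-10"),
        (23, "2002-09-09", "2004-08-23"),
        (7, "2010-01-01", "2003-01-01"),  # hypothetical extra id, not in the question
    ],
)

# With aliases, s2.id = s.id ties the subquery to the current outer row.
correlated = conn.execute("""
    SELECT s.id, s.startdate
    FROM svn1 AS s
    WHERE s.startdate >= (SELECT MAX(s2.date) FROM svn1 AS s2 WHERE s2.id = s.id)
    ORDER BY s.id, s.startdate
""").fetchall()

# Without aliases, both sides of id = svn1.id resolve to the inner table,
# so the subquery returns the maximum date over all ids.
uncorrelated = conn.execute("""
    SELECT id, startdate
    FROM svn1
    WHERE startdate >= (SELECT MAX(date) FROM svn1 WHERE id = svn1.id)
    ORDER BY id, startdate
""").fetchall()

print(correlated)
print(uncorrelated)
```

In the unaliased version the condition id = svn1.id is always true, so every row is compared against the overall latest date (2010-01-01 here) and nothing qualifies.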